# MODELING PLAY IN EARLY INFANT DEVELOPMENT

EDITED BY: Mark H. Lee, Patricia Shaw, Kathy Hirsh-Pasek, Karen E. Adolph, Qiang Shen, Pierre-Yves Oudeyer and Jill Popp

PUBLISHED IN: Frontiers in Neurorobotics and Frontiers in Psychology

#### Frontiers eBook Copyright Statement

The copyright in the text of individual articles in this eBook is the property of their respective authors or their respective institutions or funders. The copyright in graphics and images within each article may be subject to copyright of other parties. In both cases this is subject to a license granted to Frontiers. The compilation of articles constituting this eBook is the property of Frontiers.

Each article within this eBook, and the eBook itself, are published under the most recent version of the Creative Commons CC-BY licence. The version current at the date of publication of this eBook is CC-BY 4.0. If the CC-BY licence is updated, the licence granted by Frontiers is automatically updated to the new version.

When exercising any right under the CC-BY licence, Frontiers must be attributed as the original publisher of the article or eBook, as applicable.

Authors have the responsibility of ensuring that any graphics or other materials which are the property of others may be included in the CC-BY licence, but this should be checked before relying on the CC-BY licence to reproduce those materials. Any copyright notices relating to those materials must be complied with.

Copyright and source acknowledgement notices may not be removed and must be displayed in any copy, derivative work or partial copy which includes the elements in question.

All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For further information please read Frontiers' Conditions for Website Use and Copyright Statement, and the applicable CC-BY licence.

ISSN 1664-8714 ISBN 978-2-88966-045-2 DOI 10.3389/978-2-88966-045-2

#### About Frontiers

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

#### Frontiers Journal Series

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

#### Dedication to Quality

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

#### What are Frontiers Research Topics?

Frontiers Research Topics are very popular trademarks of the Frontiers Journal Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

# MODELING PLAY IN EARLY INFANT DEVELOPMENT

Topic Editors:

- Mark H. Lee, Aberystwyth University, United Kingdom
- Patricia Shaw, Aberystwyth University, United Kingdom
- Kathy Hirsh-Pasek, Temple University, United States
- Karen E. Adolph, New York University, United States
- Qiang Shen, Aberystwyth University, United Kingdom
- Pierre-Yves Oudeyer, Institut National de Recherche en Informatique et en Automatique (INRIA), France
- Jill Popp, The Lego Foundation, Denmark

Citation: Lee, M. H., Shaw, P., Hirsh-Pasek, K., Adolph, K. E., Shen, Q., Oudeyer, P.-Y., Popp, J., eds. (2020). Modeling Play in Early Infant Development. Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88966-045-2

# Table of Contents


Paul Muentener, Elise Herrig and Laura Schulz

Suresh Kumar, Patricia Shaw, Alexandros Giagkos, Raphaël Braud, Mark Lee and Qiang Shen

Yue Yu, Patrick Shafto, Elizabeth Bonawitz, Scott C.-H. Yang, Roberta M. Golinkoff, Kathleen H. Corriveau, Kathy Hirsh-Pasek and Fei Xu

*Contribution of Developmental Psychology to the Study of Social Interactions: Some Factors in Play, Joint Attention and Joint Action and Implications for Robotics*

Hélène Cochet and Michèle Guidetti

*Changes in Posture and Interactive Behaviors as Infants Progress From Sitting to Walking: A Longitudinal Study*

Sabrina L. Thurman and Daniela Corbetta

# Editorial: Modeling Play in Early Infant Development

Patricia Shaw<sup>1</sup> \*, Mark Lee<sup>1</sup> , Qiang Shen<sup>1</sup> , Kathy Hirsh-Pasek <sup>2</sup> , Karen E. Adolph<sup>3</sup> , Pierre-Yves Oudeyer <sup>4</sup> and Jill Popp<sup>5</sup>

*<sup>1</sup> Intelligent Robotics, Department of Computer Science, Aberystwyth University, Aberystwyth, United Kingdom, <sup>2</sup> Psychology, Temple University, Philadelphia, PA, United States, <sup>3</sup> Infant Action Lab, Psychology Department, New York University, New York, NY, United States, <sup>4</sup> Institut National de Recherche en Informatique et en Automatique (INRIA), Rocquencourt, France, <sup>5</sup> The Lego Foundation, Billund, Denmark*

Keywords: play, computational modeling, development, infant, learning

**Editorial on the Research Topic**

**Modeling Play in Early Infant Development**

### 1. INTRODUCTION

This Frontiers Research Topic focuses on the question: Can we develop computers or robots that play and develop like children? Approaches to this question involve the elaboration and study of computational models of infant play from the perspectives of two complementary disciplines. Firstly, developmental psychology uses such models to formulate theories and conjectures about infant play that can be tested and evaluated through experimental studies. Secondly, the new field of developmental robotics looks to infant development for inspiration, data, and guidance in order to build models of learning that may be useful both for better understanding human development and for engineering autonomous learning in robots and other systems.

These fields have common ground in this very active and significant research area, investigating how babies learn and grow cognitively, and testing our knowledge in the concrete world of computer models. A major characteristic of early human development is the open-ended acquisition of new abilities and competencies. Human infants are born helpless yet they actively become familiar with their environment and their own body through spontaneous exploration and interaction with others. Within a few months of rapid learning and development, they have acquired quite sophisticated sensory-motor and social competences. New skills appear to sprout from current competences as experience builds along a continuous trajectory of action and interaction. In particular, such open-ended learning is readily seen in the ubiquitous behavior known as play.

Play can be used to describe an expansive range of exploratory activities, but the concept currently lacks a sufficiently unifying theoretical framework. Here we focus on forms of play that involve free, spontaneous, intrinsically motivated exploration of actions, objects, places, or tasks and activities in varying contexts. This exploration lies outside the motivation to fulfill basic physiological needs such as feeding, and it has no external goals set by social peers. Such exploration may involve the search for novelty or surprise. It can be goal-free, but it can also involve self-generated goals that are pursued for their intrinsic "interestingness." For example, when encountering novel objects or events, infants will often display pleasure in the interaction, try to repeat the experience, and show enjoyment of their own activity. This suggests an enactive approach, which Jerome Bruner called "learning by doing." Von Hofsten describes play as "the purposeful seeking of enjoyable action possibilities," and Vicky Bruce stresses its immersive aspects in terms of several features of "free-flow" play.

Edited and reviewed by: *Minoru Asada, Osaka University, Japan*

> \*Correspondence: *Patricia Shaw phs@aber.ac.uk*

Received: *05 May 2020* Accepted: *24 June 2020* Published: *06 August 2020*

#### Citation:

*Shaw P, Lee M, Shen Q, Hirsh-Pasek K, Adolph KE, Oudeyer P-Y and Popp J (2020) Editorial: Modeling Play in Early Infant Development. Front. Neurorobot. 14:50. doi: 10.3389/fnbot.2020.00050*

In developmental robotics, work on these ideas has explored both solitary play with objects and early interactive play with others as a generative behavior that combines fragments of past experience with new sensory-motor events in differing contexts. Computational models of play have been proposed, for example based on forms of novelty or information gain as an intrinsic driver, leading to investigations of "curious robots."

The aim of this Frontiers Research Topic is to present international state-of-the-art research on early infant play behavior, drawn from naturalistic or experimental infant studies and from computational/robot modeling. The focus is on the very earliest forms of play, because these develop alongside the infant's increasing perception and understanding of the "physics of the world," e.g., perception of objects, causality, and interactions. Many interesting questions arise. For example, how does play emerge, and what is its relation to goal-free motor babbling? How does play relate to object understanding and world knowledge? How does intrinsically motivated self-generation of goals relate to later extrinsically motivated goal generation and goal attribution? How far can the world be explored through the paradigm of play? How can modeling these concepts on robots best help us understand infant cognition? This topic includes leading contributions presenting original research on computational models of psychological experiments about these topics. It also includes experimental and theoretical papers that increase understanding of these important issues and core concepts in infants and machines.

### 2. OVERVIEW OF THE CONTENTS OF THE E-BOOK

The papers in this Research Topic fall into three categories. Firstly, there are studies from developmental psychology that observe infants during play, helping to define the broad spectrum of play. Next come theoretical models that explore different aspects of this observed play behavior. Finally, models of playful learning are applied to robotics, enabling robots to learn how to perform various tasks. Below is a summary of the papers in this Research Topic.

### 2.1. From Developmental Psychology

Previous attempts have either tried to give a single definition for all types of play or have been restricted to the concept of free play. Zosh et al. instead provide a new definition that describes play as a continuum running from free play through to directed play. The level of engagement or direction from adults increases along the continuum. This allows the new definition to better represent these different types of play and to review the importance of each.

At its core, the definition of play stems from the work of Vygotsky (1967) and Piaget (1952). Zosh et al. expand on this foundation and give working definitions and literature reviews for the common characteristics of play: active, engaged, meaningful, social, iterative, and joyful. Overall, this gives a multi-dimensional space in which different types of play can be defined, opening up avenues for future research.

#### 2.1.1. Social Interaction for Play

Related to this, Cochet and Guidetti review two decades of research into joint action and the importance of the social element for Human-Robot Interaction (HRI). As part of the review, they focus on how joint attention develops through play in infants, and they break the interaction down into three dimensions: motor precision, coordination, and anticipatory planning. By considering each dimension in isolation, they aim to support developmental roboticists in modeling and learning the behavior. They also aim to give developmental psychologists a platform on which to disentangle these dimensions and assess the "manipulability" of each one individually.

The dimension of motor precision requires the robot to understand not only its own motor skills but also the kinematics of the human participant, e.g., the reachability or graspability of an object. Gaze and pointing are also key elements here, since they identify the object or event on which joint attention is based. Note that to support the involvement of the human, the gaze should shift between the human and the target.

The second dimension, coordination, considers the synchronization of behaviors and the multi-modal communicative signals (gaze, gesture, vocalizations, facial expressions, etc.). It also considers how the robot uses these signals to adjust its own behavior.

Finally, the dimension of anticipatory planning considers an individual's ability to predict the partner's behaviors as part of a sequence of actions. This enables better coordination and anticipatory behaviors in support of the other person. Whether inner states, such as representations of the partner's beliefs, are needed is still debated. What is clear is the need for quick responses (on the order of 100 ms) to maintain the feeling of effective interaction.

Overall, this review provides a roadmap toward robots that use human-like communicative modalities to invite more natural interactive behaviors from people.

#### 2.1.2. Guided Play

Meanwhile, Yu et al. focus specifically on guided play. They offer a perspective on the existing literature and show how it could be used to form theoretical models, both for studying this type of play in humans and for developing models for robots.

By observing effective methods adapted to individual learners, researchers could apply data-analytic approaches to predict more accurately the current state of the learner and the effectiveness of the guidance. Such approaches could also begin to inform and improve automated tutors. The essential feature of any such model is that it must be dynamic, adapting to the interactions with each individual over time. By building on social cues such as gaze direction, a more naturalistic interaction can be achieved, leading to better engagement and learning.

Gliga explores the literature on the importance of variability in behavior for promoting learning, specifically considering motor acts for reaching, locomotion, and vocal behaviors in a variety of species. Gliga compares the types and variability of motor actions in typically developing infants with those in infants whose conditions lead to atypical development (e.g., cerebral palsy or brain damage). From this comparison, they identify the importance of certain types of variability in motor actions that support development and learning.

They start by differentiating between planned noise (variability generated in the central nervous system) and execution noise (variability resulting from the randomness of biological processes). These are then classified into three main sources of variability present during infancy: hypothesis testing, learning-expectant variability, and sensory-motor noise. The first two are clearly linked directly to learning, but the third still needs finer-grained investigation. Studies such as that by Thurman and Corbetta can start to investigate some of the finer details. Chastain, by contrast, considers variability on an evolutionary scale, in the form of phenotypic variation demonstrated through motor babbling, by re-evaluating the "Baldwin Effect."

Neale et al. investigate the potential for finer-grained analysis of play behavior. To develop a truly multi-modal definition of play behavior, behavioral, cognitive, and neurological measures would need to be combined. Currently, most measures of behavior and cognition are very coarse-grained (tens of seconds, hours, days, weeks, or months), whilst neurological measures operate in milliseconds (ms). If they are ever to be aligned, all of them need to be measurable on the same scale. Neale et al. develop a framework for measuring sensorimotor, cognitive, and socio-emotional play on a millisecond timescale, for future alignment with EEG recordings, building on interdisciplinary studies of play behavior by Miller (2017). They observed adult-infant interactions in play and non-play conditions, defined a precise coding system for each of the three measures, and applied it in 33 ms intervals (one video frame at 30 fps). When the three measures are combined, a clear separation is visible between play and non-play behaviors, and further subcoding within each measure enables finer-grained evaluation. Incorporating the EEG data collected during the study is left to future work, and the study concludes with a summary of how that analysis could be done.
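To make the alignment problem concrete, the sketch below shows one way to bin coded behavior into 33 ms frames and map those frames onto EEG sample indices. It is a minimal illustration built on stated assumptions. Each coding stream is taken to be a list of hypothetical `(start_ms, end_ms, label)` tuples in integer milliseconds, and the EEG sampling rate is taken to be 500 Hz. Neither the data format nor the rate comes from Neale et al.

```python
FPS = 30  # video frame rate used for coding: one frame is ~33.3 ms

def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative ints (avoids float rounding)."""
    return -(-a // b)

def frame_range(start_ms: int, end_ms: int) -> range:
    """Frames whose onset time f * 1000 / FPS lies in [start_ms, end_ms)."""
    return range(ceil_div(start_ms * FPS, 1000), ceil_div(end_ms * FPS, 1000))

def code_frames(events, n_frames):
    """events: (start_ms, end_ms, label) tuples from one coding stream (hypothetical format).
    Returns one label (or None) per frame; later events overwrite earlier ones."""
    timeline = [None] * n_frames
    for start_ms, end_ms, label in events:
        for f in frame_range(start_ms, end_ms):
            if 0 <= f < n_frames:
                timeline[f] = label
    return timeline

def frame_to_eeg_sample(frame: int, eeg_hz: int = 500) -> int:
    """Index of the EEG sample at the onset of a video frame (eeg_hz is an assumption)."""
    return round(frame * eeg_hz / FPS)

sensorimotor = code_frames([(0, 100, "reach")], n_frames=5)
```

Running one `code_frames` call per measure (sensorimotor, cognitive, socio-emotional) produces three frame-aligned timelines. They can be zipped together frame by frame, and each frame can then be looked up in the EEG record through `frame_to_eeg_sample`.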

In the study by Markova, the relationship between play and the hormone oxytocin was evaluated. Oxytocin, sometimes referred to as the "cuddle hormone," is recognized as supporting cooperation in adults. Mothers and their 4-month-old infants engaged in a period of natural play.

The types of play considered were highly structured and involved both verbal and non-verbal communication, the latter in the form of facial expressions and gestures. For young infants, it was previously unclear how much they respond to disruptions in this structured play, e.g., missing actions from a song.

By taking swab samples before and after a period of structured natural play, they were able to identify a strong correlation between oxytocin and engagement in play. This indicates that social games are an important part of early mother-infant interaction.

#### 2.1.3. Longitudinal Studies

Thurman and Corbetta review data from one of their previous longitudinal studies to examine postural changes in mothers and infants as the infants develop from sitting to walking, and how these postural changes are linked to exploratory behaviors with objects. Specifically, they ask: Do infants and mothers alike shift their interactive behaviors as infants acquire locomotion? Do interactive behaviors depend on the posture adopted in the moment? And do transitions between targets occur while maintaining or while changing posture?

Postures were analyzed using predefined techniques (Touwen, 1976), which coarsely classify them as, for example, sitting, kneeling, crawling, or standing. The types of interaction were classified as targeted/untargeted, interaction, passive, fine motor, or gross motor.

Infants were observed every 2 weeks during 10 min of free play, from 6 months of age until five sessions after the onset of walking. Over this period, the infants made significant and increasingly varied use of the full body to explore and interact with their environment. Throughout this developmental period, mothers produced little or no activity, or purely passive activity, during the sitting, kneeling/squatting, and standing phases.
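The third question comes down to counting transitions in a coded sequence. The minimal sketch below assumes each coded interval is a hypothetical `(posture, target)` pair. The labels are illustrative and are not the study's actual coding scheme.

```python
from collections import Counter

def classify_target_transitions(intervals):
    """intervals: list of (posture, target) codes, one per coded interval (hypothetical format).
    Counts changes of interaction target that happen with vs. without a posture change."""
    counts = Counter()
    for (prev_posture, prev_target), (posture, target) in zip(intervals, intervals[1:]):
        if target != prev_target:
            if posture == prev_posture:
                counts["target change, same posture"] += 1
            else:
                counts["target change, posture change"] += 1
    return counts

session = [("sit", "ball"), ("sit", "mother"), ("stand", "toy"), ("stand", "toy")]
transitions = classify_target_transitions(session)
```

Comparing these counts across sessions (sitting, crawling, walking) is one simple way to ask whether target switches become more tied to posture changes as locomotion develops.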

In another longitudinal study, Muentener et al. evaluated five different measures of play in relation to cognitive development over a 9-month period. These measures were attention to novelty, inductive generalization, face preference, imitative learning, and efficiency of exploration.

Infants aged 5–19 months were observed four times over the 9-month period during 15-min sessions of exploratory play, with a variety of objects provided by the tester for each of the measures being investigated. A subset of the participants was assessed again at 3 years of age for vocabulary size and IQ.

Of the measures considered, efficiency of exploration was correlated with higher IQ scores at the final assessment.

Tian et al. perform a cross-sectional study of preschool children from a public kindergarten on a block-building task. The variables in the methodology include group size (1, 5, or 10 children), the form in which the model to be built was presented (3D model vs. 2D pictures), and the age of the participants (K1–K3). Task performance was broken down into three skills relevant to the task (block building, structural balance, and structural features), and gender differences were considered alongside the other variables.

Significant gender differences were identified in each of the block-building categories except structural features. Block-building skills improved with age, and the 3D model elicited more representational play than the 2D pictures. Finally, children working in small groups performed slightly better than those working individually or in large groups, possibly because of interference when the group was too large.

### 2.2. Modeling Play in Infants

Each of the contributions described above includes direct observations of infant behavior in various situations. In many of them, the details provide an outline from which models can be constructed and against which they can be compared. The following contributions each take a different approach to modeling the observations made by developmental psychologists.

#### 2.2.1. Theoretic Models

Chastain presents an information-theoretic approach to modeling learning, building on theories by Baldwin (1902). They discuss the many divergent interpretations of the Baldwin Effect in evolutionary theory and attempt to recover the interpretations in the original work. This brings the Baldwin Effect back toward developmental psychology, specifically in relation to the role of abstraction in phenotypes. These theories consider how complex skills evolve and develop over generations through phenotypic plasticity. Plasticity allows organisms to try out motor actions to obtain reward signals in their juvenile state, which makes motor babbling a learning mechanism that smooths the fitness landscape. This can be observed in the development of skills such as handwriting, where a level of imitation from one generation to the next speeds up the learning and development of these complex skills.
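For readers from robotics, the classic computational illustration of how lifetime learning can smooth a fitness landscape is the simulation by Hinton and Nowlan (1987). The sketch below computes expected fitness analytically in a simplified version of that model. It is offered as background only and is not Chastain's information-theoretic approach.

```python
def expected_fitness(genotype: str, trials: int = 1000) -> float:
    """Expected fitness in a simplified Hinton-Nowlan (1987) model.

    genotype: string over {'1', '0', '?'} (the classic model uses 20 loci).
      '1' = innate correct allele, '0' = innate wrong allele,
      '?' = plastic allele, set by random guessing during the lifetime.
    Fitness is 1 + 19 * n / trials, where n is the number of learning
    trials left after the correct configuration is first found.
    """
    if "0" in genotype:
        return 1.0                      # a wrong innate allele can never be corrected
    k = genotype.count("?")
    if k == 0:
        return 20.0                     # correct from birth: all trials remain
    p = 0.5 ** k                        # chance that one guess sets every plastic locus right
    expected_remaining = 0.0
    p_not_yet = 1.0                     # probability the target was not found before trial t
    for t in range(1, trials + 1):
        expected_remaining += (trials - t) * p * p_not_yet
        p_not_yet *= 1.0 - p
    return 1.0 + 19.0 * expected_remaining / trials

# Without learning, only the all-correct genotype is fit (a "needle in a haystack").
# With plastic loci, genotypes close to the target get intermediate fitness instead.
for k in (0, 1, 5, 10, 15):
    print(k, round(expected_fitness("1" * (20 - k) + "?" * k), 2))
```

The printed values fall gradually as more loci are left to learning. This gradient gives selection something to climb, which is the sense in which plasticity "smooths" the landscape.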

Schank et al. also take a theoretical approach to modeling learning, this time using game theory to demonstrate how fair play in juvenile animals can lead to fair behavior in adults. Fair play is observed in many species (Burghardt, 2005), with behaviors such as self-handicapping (e.g., an individual not biting as hard as it can) and role reversal (e.g., alternately switching between dominant and submissive positions). In adults, fair behavior can be seen when the social group shares food instead of an individual hoarding all of it. The authors model a "play" gene that is either on or off, two stages of development (juvenile and adult), and a "gestation" period between reproduction cycles, and they evaluate how often the play gene becomes activated. Compared with control simulations, the play gene evolved to be activated significantly more often across a wide range of conditions. This supports the argument that one of the benefits of play is learning social skills and acquiring the skills needed to behave fairly as adults.

#### 2.2.2. Robotic Modeling

Mannella et al. investigate how Competence-based Intrinsic Motivation (CB-IM) can drive the discovery of goals and maintain focus on learning a behavior until a goal is satisfied. This approach is applied to learning a body model and the kinematics of a 6-DoF robot arm through self-touch, using a simulated robot arm with evenly distributed touch sensors. The model combines a neuro-inspired RNN (Mannella and Baldassarre, 2015) with a random trajectory generator and an associative memory. The "easy" contact points are learnt first, and the goals then gradually grow in complexity toward more challenging configurations, refining the reaching actions. The contribution finishes with three predictions from the model for developmental psychology. These concern the efficiency of reaching as infants develop, and how reaching to points on the body relates both to the complexity of the reach and to the uneven distribution of tactile receptors across the body.
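The core idea of competence-based intrinsic motivation is that goals are chosen according to how fast competence on them is improving. The minimal sketch below illustrates that idea under several assumptions. Goal selection is greedy on a running average of competence gain, and the `practice` function is a hypothetical toy learner. Mannella et al.'s RNN, trajectory generator and associative memory are not modeled.

```python
from dataclasses import dataclass

@dataclass
class Goal:
    name: str
    learning_rate: float        # hypothetical: how quickly practice improves competence
    competence: float = 0.0     # 0 = never achieved, 1 = always achieved
    interest: float = 1.0       # running estimate of competence gain (optimistic start)

def practice(goal: Goal) -> float:
    """Toy stand-in for attempting a goal; returns the competence gain (the intrinsic reward)."""
    gain = goal.learning_rate * (1.0 - goal.competence)
    goal.competence += gain
    return gain

def run_cb_im(goals, steps=40, alpha=0.3):
    """Repeatedly practice the goal whose competence is currently improving fastest."""
    history = []
    for _ in range(steps):
        goal = max(goals, key=lambda g: g.interest)                  # most "interesting" goal
        gain = practice(goal)
        goal.interest = (1 - alpha) * goal.interest + alpha * gain   # update running estimate
        history.append(goal.name)
    return history

easy = Goal("easy contact point", learning_rate=0.3)
hard = Goal("hard contact point", learning_rate=0.05)
history = run_cb_im([easy, hard])
```

In this toy run, the easy goal saturates first. The selector then shifts its practice to the harder goal once gains on the easy goal dry up, mirroring the easy-to-hard progression described above.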

Related to this, Kumar et al. also demonstrate a schema-based learning approach on a robotic platform, constructing increasingly complex actions from chains of simpler actions. The work is inspired by the ideas of Piaget (1952) and builds on a schema-based model by Sheldon and Lee (2011). The model is extended so that chains can be built hierarchically and can themselves become reusable unit actions, with both partial and complete generalization. Rather than being run in simulation, the learning takes place online on an iCub humanoid robot. The robot learns to grasp objects and to move a specific object to a key point to unlock a toy, constructing new schemas and chains of schemas hierarchically as it goes. Object properties are taken into account so that similar schemas can be reused and generalized, reducing the overall number of schemas required. The experiments also model individual variation in preferences between infants, rather than attempting to model the average results from observations. A set of preferences is defined, with weightings to shift between them. Currently these weightings are static, but future work will consider how they may change with the current situation, e.g., with internal measures such as happiness or satiety.
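The idea of chaining schemas into reusable units can be sketched with a STRIPS-style representation of preconditions, added facts and deleted facts. This representation is an assumption made for illustration. It is not Sheldon and Lee's schema format, the fact names are hypothetical, and generalization over object properties is omitted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Schema:
    """Hypothetical schema: an action described by its effect on a set of state facts."""
    name: str
    pre: frozenset                    # facts that must hold before acting
    add: frozenset                    # facts made true by the action
    delete: frozenset = frozenset()   # facts made false by the action

    def applicable(self, state: frozenset) -> bool:
        return self.pre <= state

    def apply(self, state: frozenset) -> frozenset:
        return (state - self.delete) | self.add

def chain(name: str, *steps: Schema) -> Schema:
    """Compose schemas run in sequence into one reusable schema.
    Simplified: assumes no step deletes a fact that a later step needs."""
    pre, added, deleted = set(), set(), set()
    for s in steps:
        pre |= s.pre - added                   # needs not supplied by earlier steps
        added = (added - s.delete) | s.add
        deleted = (deleted - s.add) | s.delete
    return Schema(name, frozenset(pre), frozenset(added), frozenset(deleted))

reach = Schema("reach", frozenset({"object_visible"}), frozenset({"hand_at_object"}))
grasp = Schema("grasp", frozenset({"hand_at_object"}), frozenset({"object_in_hand"}))
move_to_key = Schema("move_to_key", frozenset({"object_in_hand"}),
                     frozenset({"object_at_key"}), frozenset({"object_in_hand"}))
unlock = Schema("unlock", frozenset({"object_at_key"}), frozenset({"toy_unlocked"}))

pick_up = chain("pick_up", reach, grasp)                      # a learnt chain ...
open_toy = chain("open_toy", pick_up, move_to_key, unlock)    # ... reused as a single unit
```

Because a chain is itself a `Schema`, it can appear as one step inside a larger chain, which is the hierarchical reuse described above.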

Meanwhile, the ability to play football has long been a golden target for humanoid robotics. Ossmy et al. trained Nao robots on toddler movement patterns to improve how quickly the robots could navigate around the playing field. Some simulated robots were trained on movement paths generated by toddlers, which included intermittent stopping and starting, and others on less varied geometric paths. Games played between the two groups demonstrated that the greater variability of the infant movement patterns led to better performance in the matches. This paper shows not only how robots can benefit from observations of infant development, but also how robots can be used to test hypotheses about infant development.

The simulation was based on the Nao humanoid robot and on an existing system for training a walk (MacAlpine et al., 2012). Within it, the authors used a reward-based system to tune a set of parameters that refine the walking behavior. In tests, the robots trained on infant patterns consistently beat all those trained on more traditional geometric patterns. The infant patterns were further broken down by level of exploration, and the robots trained on the patterns showing the most exploration went on to win the most games in a tournament. Overall, this emphasizes that variability is a feature of infant development, rather than a stumbling "bug" in the process.
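The source does not specify the reward-based tuning step. The sketch below therefore uses a (1+1) evolution strategy, a simple stand-in chosen for illustration, and the cited walk-optimization work may use a different optimizer. `toy_reward` and `TARGET` are hypothetical placeholders for "score of a simulated walk along a training path."

```python
import random

def tune_parameters(reward, initial, iterations=300, step=0.1, seed=0):
    """(1+1) evolution strategy: perturb the parameters and keep the candidate
    whenever its reward is at least as good as the current best."""
    rng = random.Random(seed)
    best = list(initial)
    best_reward = reward(best)
    for _ in range(iterations):
        candidate = [p + rng.gauss(0.0, step) for p in best]
        r = reward(candidate)
        if r >= best_reward:
            best, best_reward = candidate, r
    return best, best_reward

# Hypothetical reward: a real setup would run the simulator on the training paths.
TARGET = [0.3, -0.2, 0.8]

def toy_reward(params):
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))

params, score = tune_parameters(toy_reward, [0.0, 0.0, 0.0])
```

In this framing, the choice between toddler paths and geometric paths would enter only through the reward, since it determines which trajectories the walk is evaluated on.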

The final contribution to the topic, by Wu et al., shows the benefit of applying developmental learning and play to another robot. This time it is a mobile robot that learns, in stages, to look at balls in its environment, reach for and grasp them, and move toward them. The study is structured as a game with a simple and a complex mode. It uses the Lift-Constraint, Act, Saturate (LCAS) algorithm (Lee et al., 2007) to help the robot learn the stages of the game and ultimately succeed in the complex mode, in which the robot must drive around to locate the balls visually and pick them up. Grasping the balls requires coordinated "hand-eye" movements, which are also learnt in stages. The model is implemented as a Radial Basis Function (RBF) network trained on data collected by the robot. The training samples are limited by the constraints that apply at the current stage of development. A comparison with learning without the constraints shows that the constraints enable the robot to learn faster.
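As a rough picture of the learning component, the sketch below fits a Gaussian RBF network with fixed centers. Its output weights are solved by linear least squares (`numpy.linalg.lstsq`). The hand-eye data, the 5 x 5 center grid, the width, and the "only samples near the image centre" constraint are all hypothetical and are not taken from Wu et al.

```python
import numpy as np

def rbf_features(X, centers, sigma):
    """Gaussian RBF activations plus a bias column. X: (n, d), centers: (m, d)."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    phi = np.exp(-sq_dist / (2.0 * sigma ** 2))
    return np.hstack([phi, np.ones((X.shape[0], 1))])

def fit_rbf(X, Y, centers, sigma):
    """Least-squares fit of the output weights (centers and width are fixed)."""
    weights, *_ = np.linalg.lstsq(rbf_features(X, centers, sigma), Y, rcond=None)
    return weights

def predict_rbf(X, weights, centers, sigma):
    return rbf_features(X, centers, sigma) @ weights

# Hypothetical hand-eye data: image position (u, v) -> two joint angles.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
Y = np.column_stack([0.5 * X[:, 0] + 0.2 * X[:, 1] ** 2,
                     np.sin(X[:, 1]) - 0.3 * X[:, 0]])

# Assumed developmental constraint: at this stage only samples near the image centre are used.
stage_mask = np.linalg.norm(X - 0.5, axis=1) < 0.3
grid = np.linspace(0.0, 1.0, 5)
centers = np.array([[a, b] for a in grid for b in grid])
SIGMA = 0.25
w = fit_rbf(X[stage_mask], Y[stage_mask], centers, SIGMA)
```

Lifting the constraint in a later stage would amount to refitting with a wider mask (or all of `X`). This is one simple way the stage-dependent limits on training samples could be expressed.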

### 3. CONCLUSION

Here we have brought together three kinds of work: infant studies that help to further define the broad spectrum of play, theoretical models formulated from these definitions, and applications of such models on robotic platforms for developmentally inspired learning. Of course, the process does not stop there, as the theoretical and robotic models will ultimately feed back into psychology to help us better understand the behaviors observed.

The studies from Developmental Psychology provide a framework and roadmap for implementing theoretical and robotic models. By applying developmental stages, the studies here have demonstrated gains in both final performance and rate of learning. Not only that, but they have also provided a test bed for evaluating hypotheses about development in infants.

#### REFERENCES

Baldwin, J. M. (1902). Development and Evolution. New York, NY: Macmillan.

### AUTHOR CONTRIBUTIONS

All authors acted as guest editors for the related Research Topic. This editorial was produced by PS.

### FUNDING

This Research Topic was supported by funding from the UK Engineering and Physical Sciences Research Council (EPSRC), grant No. EP/M013510/1.

### ACKNOWLEDGMENTS

The authors gratefully acknowledge the contributions of participants in this Special issue.


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Shaw, Lee, Shen, Hirsh-Pasek, Adolph, Oudeyer and Popp. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# A Developmental Learning Approach of Mobile Manipulator via Playing

Ruiqi Wu<sup>1</sup> , Changle Zhou<sup>1</sup> , Fei Chao<sup>1</sup> \*, Zuyuan Zhu<sup>2</sup> , Chih-Min Lin<sup>1, 3</sup> and Longzhi Yang<sup>4</sup>

*<sup>1</sup> Fujian Provincial Key Lab of Brain-Inspired Computing, Department of Cognitive Science, School of Informatics, Xiamen University, Xiamen, China, <sup>2</sup> Department of Computer Science, School of Computer Science and Electronic Engineering, University of Essex, Colchester, United Kingdom, <sup>3</sup> Department of Electrical Engineering, Yuan Ze University, Tao-Yuan, Taiwan, <sup>4</sup> Department of Computer and Information Sciences, Faculty of Engineering and Environment, Northumbria University, Newcastle upon Tyne, United Kingdom*

Inspired by infant development theories, a robotic developmental model combined with game elements is proposed in this paper. This model does not require specific developmental goals to be defined for the robot; instead, the developmental goals are implied in the goals of a series of game tasks. The games are organized into a sequence of game modes ordered from simple to complex by task complexity, and the task complexity is determined by the application of developmental constraints. When the robot can no longer find any new salient stimuli in the current mode, it switches to a more complex game mode. In this way, the robot gradually achieves its developmental goals by playing different game modes. In the experiment, the game was instantiated on a mobile robot whose playing task is to pick up toys, and it was designed with a simple game mode and a complex game mode. A developmental algorithm, "Lift-Constraint, Act and Saturate," is employed to drive the mobile robot to move from the simple mode to the complex one. The experimental results show that the mobile manipulator successfully learns mobile grasping ability after playing the simple and complex games, which is promising for developing robotic abilities to solve complex tasks through games.

#### Edited by:

*Patricia Shaw, Aberystwyth University, United Kingdom*

#### Reviewed by:

*Eiji Uchibe, Advanced Telecommunications Research Institute International, Japan*

*Daniele Caligiore, Consiglio Nazionale Delle Ricerche (CNR), Italy*

#### \*Correspondence:

*Fei Chao fchao@xmu.edu.cn*

Received: *05 June 2017* Accepted: *19 September 2017* Published: *04 October 2017*

#### Citation:

*Wu R, Zhou C, Chao F, Zhu Z, Lin C-M and Yang L (2017) A Developmental Learning Approach of Mobile Manipulator via Playing. Front. Neurorobot. 11:53. doi: 10.3389/fnbot.2017.00053*

Keywords: developmental robotics, mobile manipulator, robotic hand-eye coordination, neural network control, sensory-motor coordination

## 1. INTRODUCTION

Intelligent robots have been widely applied to support or even replace human work in many social activities, such as assembly lines, family services, and social entertainment. Many methods for making these robots intelligent have been proposed in the literature, the most common being mathematical and dynamics modeling, as in Yan et al. (2013), Galbraith et al. (2015), and Grinke et al. (2015). These methods rely on cognitive architectures predefined in the intelligent system, which cannot accommodate significant changes arising from interaction with the environment. If the intelligent system is applied in a new environment, it must be reconstructed. Moreover, the complexity of the model increases exponentially as the complexity of the task increases. In addition, enabling a robot to learn complex skills and to integrate a variety of skills into one intelligent system remains very challenging in robotics.

Asada et al. (2001), Lungarella et al. (2003), and Weng (2004) attempted to let robots learn intricate skills using so-called developmental robotics approaches. These approaches enable robots to gradually develop multiple basic skills and thus learn to handle complex tasks (Berthouze and Lungarella, 2004; Jiang et al., 2014). In other words, the learning target of a complex set of skills is divided into a number of stage targets (Wang et al., 2013; Zhu et al., 2015), and the robot achieves the ultimate learning goal by completing a series of learning sub-goals. This method reduces the difficulty of learning new skills (Shaw et al., 2014) and gives the robot the ability to accumulate learning, since the basic skills learned during development are retained and built upon to arrive at the final skill (Lee et al., 2013). When a robot learns new skills with a developmental robotics method, the target of every developmental phase must be clearly defined (Stoytchev, 2009). However, this is very challenging in practice for phases involving a large number of complex tasks, which limits the applicability of developmental robotics.

Infant development researchers have observed that infants and young children do not need specific developmental goals to be defined when developing skills (Adolph and Joh, 2007), and emergence is the primary way in which infants acquire skills (Morse and Cangelosi, 2017). In particular, play often accompanies infant skill development (Cangelosi et al., 2015), which has led to the infant development theory that infants develop relevant skills during play. Early infant play is driven primarily by intrinsic motivation (Oudeyer et al., 2007; Baldassarre and Mirolli, 2013; Caligiore et al., 2015), and an infant's developmental goal is implied in the game that the infant plays. This theory has not yet been applied and verified in developmental robotics. Therefore, this paper proposes a robotic developmental model that combines this infant developmental theory with developmental robotics. In this model, a robot's skill learning is viewed as game playing by an infant, and the developmental target is implied in the game goals. A developmental robotics method is then utilized by the model to accomplish the robot's skill development and learning. The proposed system not only reduces the difficulty of robot learning and allows accumulated learning, but also mitigates the applicability limitation discussed above, namely the need to clearly define the goal of every developmental phase.

In contrast to other developmental learning methods (Yang and Asada, 1996; Berthouze and Lungarella, 2004), the proposed approach embeds the role of play in early infant development into developmental learning. Through two game modes, our robot developed mobile reaching and grasping abilities without any external reward in either mode. The robot uses only its learning status to switch from one game mode to the next, so the approach adopts an intrinsic-motivation-driven learning method. The main contribution of this work is therefore a combined developmental algorithm that allows a robot to acquire new abilities by applying the infant developmental theory that skills are developed through play. With the inclusion of game elements, the robot's mobile reaching and grasping skills emerge through play.

The remainder of this paper is organized as follows: section 2 introduces the background knowledge of developmental robotics and the "Lift-Constraint, Act and Saturate" developmental algorithm. Section 3 outlines our model and designs the developmental strategy of robots. Section 4 describes the experimentation and analyzes the results. Section 5 concludes the paper and points out possible future work.

## 2. DEVELOPMENTAL ROBOTICS

As a research field with an interdisciplinary background spanning developmental psychology, neuroscience, computer science, and other disciplines (Earland et al., 2014; Law et al., 2014a; Gogate, 2016), developmental robotics aims to provide solutions for the design of behavior and cognition in artificial intelligence systems (Marocco et al., 2010; Baillie, 2016; Salgado et al., 2016). Developmental robotics is inspired by the developmental principles and mechanisms observed in infants and children (Chao et al., 2014a), so its main idea is to let a robot imitate the human developmental process (Adolph and Joh, 2007; Oudeyer, 2017). The robot incrementally acquires sensorimotor and cognitive abilities according to inherent developmental principles and through real-time interaction with the external environment (Cangelosi et al., 2015). Developmental robotics focuses on two primary challenges in robotics: (1) learning new knowledge and skills in a constantly changing environment; and (2) understanding the robot's relationship with its physical environment and with other agents.

Drawing on the general mechanism of sensorimotor development and on knowledge representation in terms of action-object relationships, Guerin et al. (2013) suggested that, in developmental robotics, most patterns need to be learned from a few initial patterns and that represented knowledge must be developed gradually. Law et al. (2014a) implemented staged development on an iCub robot, building a developmental model of infants from birth to 6 months that is driven by a new control system. Starting from uncontrolled movements and passing through several distinct behavioral stages, the iCub robot, like an infant, finally reaches for and simply manipulates objects. Cangelosi et al. (2015) used an action-centered approach to draw a large number of parallel comparisons between human development and artificial systems. They found that human development and artificial developmental systems share some common practices from which each can learn. These studies inspired the robotic system proposed in this paper, which builds on key features and important theories of human infant development.

One of the two most important research focuses in developmental robotics is the development of skills corresponding to a particular stage of infant development (Chao et al., 2010; Law et al., 2013); the other is the modeling of the multi-stage developmental process (Hülse et al., 2010; Law et al., 2014b). However, studying the impact of play in early infant development is also of significant importance and may also provide solutions in developmental robotics. Hart and Grupen (2011) proposed a robot that organizes its own sensorimotor space for incremental growth. Their solution uses intrinsic motivation to allow robots to assimilate new skills learned in a new learning phase or a new environment, which then become the basis for the next learning phase. This research has been applied to humanoid robots. Benefiting from important theories in developmental psychology, such humanoid platforms can easily reproduce several behavioral patterns or validate new hypotheses. Savastano and Nolfi (2013) used a neurorobotic model to imitate an infant's learning process, demonstrated by a humanoid robot incrementally learning to grasp. In the experiment, the maturational constraints were systematically controlled, producing a variety of developmental strategies. Through a comparative study, the experiment also showed that research on humans and on robots can inform each other. Unlike these studies, the experimental platform presented in this work is a wheeled mobile robot, and the aim is to study the influence of play in developmental learning methods.

The "Lift-Constraint, Act and Saturate" (LCAS) approach (Lee et al., 2007) is a developmental learning algorithm that has been widely applied in developmental robot systems (Chao et al., 2013; Wang et al., 2014). LCAS is a loop with three segments: (1) Lift-Constraint, (2) Act, and (3) Saturate. First, all possible (or available) constraints are stated explicitly and the times at which they are released are formulated. Then, the robot learns under all existing constraints. When the saturation rate of the robot's learning system becomes stable, a new constraint is released, and the robot learns the new knowledge or skills made possible by removing that constraint. When the robot has lifted all constraints through learning and all saturation rates of its learning system are stable, the robot has successfully learned a series of skills.
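The loop just described can be sketched in Python as below. This is a minimal illustration, not the authors' implementation: `train_epoch` and `is_saturated` are hypothetical callbacks standing in for one learning epoch and for the saturation test, and the five constraint names simply echo the constraints lifted later in this paper. In this sketch the first constraint is lifted before any learning takes place, matching step (1) of the developmental strategy in Section 3.2.

```python
from typing import Callable, List, Sequence, Tuple


def lcas(constraints: Sequence[str],
         train_epoch: Callable[[Tuple[str, ...]], float],
         is_saturated: Callable[[List[float]], bool],
         max_epochs_per_stage: int = 10_000) -> List[Tuple[str, int]]:
    """Illustrative Lift-Constraint, Act and Saturate loop.

    Constraints are lifted one at a time in the given order. After each lift the
    robot "acts" (one call of the hypothetical train_epoch, which returns a scalar
    learning signal) until is_saturated reports a stable stage.
    Returns (constraint, epochs spent) for each stage.
    """
    lifted: List[str] = []
    log: List[Tuple[str, int]] = []
    for constraint in constraints:
        lifted.append(constraint)                        # Lift-Constraint
        history: List[float] = []
        for _ in range(max_epochs_per_stage):
            history.append(train_epoch(tuple(lifted)))   # Act
            if is_saturated(history):                    # Saturate: stage is stable
                break
        log.append((constraint, len(history)))
    return log


# Toy usage: the constraint names mirror the five lifted in this paper.
stages = ["visual_resolution", "eye_joint", "arm_joint", "tactile_sensor", "wheel_joint"]
log = lcas(stages,
           train_epoch=lambda lifted: 1.0 / len(lifted),
           is_saturated=lambda history: len(history) >= 3)
```

Passing the constraints as an ordered sequence reflects the fixed release schedule that LCAS formulates up front; the toy callbacks only exercise the control flow.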

## 3. THE PROPOSED METHOD

### 3.1. Model Overview

A developmental algorithm is proposed in this work by designing a game that allows robots to develop skills by playing. Infants develop stable grasping abilities by grasping objects around them, and they are not satisfied until they can do so stably. Because of bodily constraints, infants focus most of their attention on the range of their physical activities while learning skills, which makes skill learning more efficient. In the process of learning, infants do not have a clear learning goal; all their activities are simply driven by intrinsic motivation and interest. Inspired by this, our model uses a pick-up-toys game for a mobile manipulator, in which the manipulator wins by successfully picking up the surrounding toys. By playing this game, a robot with a mobile manipulator gradually develops mobile grasping ability. The game has two modes based on task complexity:

1. **Simple mode.** The toys are scattered within the work range of the manipulator, so the robot can pick them up without moving.
2. **Complex mode.** The toys are scattered within the robot's visual range but outside the work range of the manipulator, so the robot must relocate itself to pick them up.
The robot's skill development process is illustrated in **Figure 1**. The robot with the mobile manipulator is initialized to play in the simple game mode until it successfully completes the game. After the robot acquires near-body grasping ability, the game switches to the complex mode. The game is over when the robot acquires mobile grasping ability in this game mode.

TABLE 1 | Infant's developmental stages and the corresponding development abilities.

The key to implementing grasping ability on a mobile robot is the coordination of the visual system, the manipulator, and the mobile system. This is mainly achieved by two basic non-linear mappings: (1) from the robot's visual space to the manipulator's movement space, and (2) from the visual space to the robot's mobile space. In this work, the two mappings are approximated by two radial basis function (RBF) networks, which are trained while the robot plays the game.

### 3.2. Developmental Strategy

Before acquiring mobile grasping ability, infants pass through a number of developmental stages, as listed in **Table 1** (Law et al., 2010, 2011), during which they develop a variety of basic abilities. A developmental strategy is designed to support the development of the robot's mobile grasping ability, and it is implemented by the LCAS algorithm (Algorithm 1) (Chao et al., 2014b). In this algorithm, i is the number of learning epochs under the current constraints, and Sat(i) is the saturation rate in the ith learning epoch. If Sat(i) is true, the algorithm ends training under the current constraint and releases a new constraint. The value of Sat(i) is determined by Equation (1), where i is the number of training epochs; G(i) is the model's global excitation value at epoch i; ψ controls the sampling interval; ε is a fixed threshold on the global excitation's amplitude of variation; and φ is a fixed threshold on the global excitation itself. If G(i) is <φ and the variation of the global excitation is <ε, a new constraint is lifted. In this work, the values of the parameters ψ, ε, and φ are empirically set to 10, 0.5, and 0.02, respectively; a theoretical study of these parameters remains future work.

$$\text{Sat}(i) = \begin{cases} \text{true}, & \text{if } |G(i) - G(i - \psi)| < \epsilon \text{ and } G(i) < \phi, \quad i = \psi, \ldots, n \\ \text{false}, & \text{otherwise} \end{cases} \tag{1}$$
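Equation (1) can be checked directly in code. The sketch below is a hedged illustration that assumes the global excitation history is stored in a 0-based Python list `G` (so `G[i]` plays the role of G(i)) and that the test returns False while i < ψ, a case the equation leaves undefined. The default arguments are the empirical values ψ = 10, ε = 0.5, and φ = 0.02 reported above; the decaying curve in the usage lines is synthetic.

```python
from typing import Sequence


def sat(G: Sequence[float], i: int,
        psi: int = 10, eps: float = 0.5, phi: float = 0.02) -> bool:
    """Saturation test of Equation (1).

    True when the global excitation has changed by less than eps over the last
    psi epochs and has itself fallen below phi. Not defined for i < psi, so this
    sketch returns False there (an assumption).
    """
    if i < psi:
        return False
    return abs(G[i] - G[i - psi]) < eps and G[i] < phi


# Synthetic global excitation that decays as 1 / (k + 1)
G = [1.0 / (k + 1) for k in range(100)]
saturated_epochs = [i for i in range(len(G)) if sat(G, i)]
```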

In the LCAS algorithm, a robot's constraints are first substantiated, as shown in **Table 2**. Once the constraints are substantiated, the development of the robot proceeds according to the lift-constraint strategy as listed in **Table 3**.

The LCAS algorithm in this work is executed in the following five steps:

1. The target object is placed in the robot's external environment. The robot acquires image information about the environment once the "visual resolution" constraint of its eyes is removed.
2. After the robot attains fixation ability, the eye joint constraint is lifted, and the robot learns saccade ability through eye joint movement.
3. The arm joint constraint is then lifted, enabling the movement and sensory abilities of the robot's arm. After the motor babbling stage, the robot builds hand-eye coordination and can then execute the reaching action.
4. Once hand-eye coordination is built, the tactile sensor constraint in the arm is removed. Building on hand-eye coordination, the robot uses the tactile sensor to detect whether the object is in the gripper. At this stage, the robot learns near-body grasping.
5. Finally, the wheel joint constraint is removed, giving the robot mobility. The robot then learns mobile grasping by building the mapping between the visual and mobile spaces.

TABLE 2 | The constraint instantiation for mobile robot.

TABLE 3 | The robot's lift-constraint strategy.

After the first two steps of training, the robot has developed fixation and saccade abilities and can therefore play the simple game (Chao et al., 2016). In the simple game mode, the third and fourth steps of the lift-constraint strategy are executed, and the robot develops hand-eye coordination and near-body grasping. The fifth step is executed in the complex game mode, where the robot develops mobile grasping ability. The entire procedure for model training and execution is illustrated in **Figure 2**.

### 3.3. Mobile Robot Hardware System

The mobile robot's intelligent system is mainly composed of three subsystems: the visual subsystem, the manipulator, and the motor, as shown in **Figure 3**. Their functions are described below.

1. **The visual subsystem.** The robot's visual system performs two tasks: finding the object and locating it. First, the visual system analyzes the color information in the images captured by the robot's two eyes and detects whether the target is in the field of view. Second, if the target is in the field of view, the visual system uses its fixation ability to acquire the retinal position of the target, S(x1, y1, x2, y2), where s<sub>l</sub>(x1, y1) and s<sub>r</sub>(x2, y2) are the coordinates of the target in the left and right eyes, respectively; together, the left- and right-eye coordinates form the target's retinal position. Finally, using its saccade ability, the visual system combines the retinal coordinates S(x1, y1, x2, y2) with the eye joint values Sh(j5, j6) to generate the visual-space coordinates P(γ, θ).


### 3.4. Game Processes

#### 3.4.1. Simple Game Process

The simple game mode requires the robot to pick up balls scattered within the work range of the manipulator. In this game mode, the robot's eyes focus only on the work range of the manipulator. After it implements the first two steps of the lift-constraint strategy and develops fixation and saccade abilities, the robot can participate in the simple game mode, as shown in the center frame of **Figure 2**. To play the simple game, the robot must build a mapping from the visual space to the manipulator's movement space. This mapping is approximated by the first of the two RBF networks, R<sub>1</sub>. Unlike the experimental hardware used by Caligiore et al. (2014), the robot in this work is not humanoid. Therefore, real data collected from infants cannot be used; instead, the training samples are collected on the wheeled robot. Each training sample of this network consists of the visual space coordinates, Sh(p, t), and the manipulator's joint values, (j1, j2, j3, j4). In the RBF network, the centers of the RBF neurons are determined by a K-means algorithm, and the number of RBF neurons is set to twice the number of input dimensions. The Gaussian kernel is used as the radial basis function. The RBF network computes its output as shown in Equations (2) and (3), where y(x) denotes the network's output joint value, w<sub>i</sub> denotes the weights of the hidden layer, φ<sub>i</sub> denotes a radial basis function with center x<sub>i</sub>, and δ denotes the width of the Gaussian kernel, which is empirically set to √5.

$$y(\mathbf{x}) = \sum_{i} w_i \, \phi_i(\mathbf{x}) \tag{2}$$

$$\phi_i(\mathbf{x}) = \exp\left(-\frac{\left\|\mathbf{x} - \mathbf{x}_i\right\|^2}{2\delta^2}\right), \quad \delta > 0 \tag{3}$$
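A Gaussian RBF network of the kind described by Equations (2) and (3) can be sketched with NumPy as follows. This is a hedged sketch, not the authors' code: the centers come from a plain Lloyd's K-means, the hidden layer has twice as many units as input dimensions, δ defaults to √5, and only the output weights are updated by full-batch gradient descent (what backpropagation reduces to when centers and width are held fixed). The learning rate, epoch count, and the synthetic data standing in for visual coordinates and joint values are all assumptions.

```python
import numpy as np


def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's K-means; returns k cluster centers used as RBF centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign every sample to its nearest center
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


class RBFNet:
    """Gaussian RBF network, y(x) = sum_i w_i * phi_i(x) (Equations 2 and 3)."""

    def __init__(self, X, n_out, delta=np.sqrt(5.0), seed=0):
        k = 2 * X.shape[1]                   # hidden units = 2 x input dimension
        self.centers = kmeans(X, k, seed=seed)
        self.delta = delta                   # Gaussian width
        self.W = np.zeros((k, n_out))        # output weights w_i

    def hidden(self, X):
        sq = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-sq / (2.0 * self.delta ** 2))   # phi_i(x), Equation (3)

    def predict(self, X):
        return self.hidden(X) @ self.W                  # y(x), Equation (2)

    def train_epoch(self, X, Y, lr=0.1):
        """One full-batch gradient step on the output weights; returns the MSE before the step."""
        H = self.hidden(X)
        err = H @ self.W - Y
        self.W -= lr * H.T @ err / len(X)
        return float((err ** 2).mean())


# Usage with synthetic data (hypothetical 2-D visual coordinates -> 2 joint values)
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
Y = np.column_stack([np.sin(X[:, 0]), X[:, 0] * X[:, 1]])
net = RBFNet(X, n_out=2)
losses = [net.train_epoch(X, Y) for _ in range(500)]
```

Fixing the centers by K-means keeps the output layer linear in the weights, so each epoch is a simple least-squares gradient step; adapting centers and widths as well would need the full chain rule through Equation (3).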

To train the robot's hand-eye coordination, a yellow ball is placed as the target object in the manipulator's gripper. The ball moves randomly with the manipulator, and a sample e(P, Ma) is collected at each movement. Each sample is placed in the sample pool E(e1, e2, e3, ..., en). When the number of samples in the pool reaches a fixed number, some samples are randomly selected as training samples for the R<sub>1</sub> network, which is then trained using the backpropagation algorithm. During sample collection, the values of the wrist, j3, and the gripper, j4, are fixed, because the shoulder, j1, and the elbow, j2, already cover most of the robot's movement space, while the wrist, j3, is more involved in the grasping action (Marini et al., 2016). When the R<sub>1</sub> network's training is saturated, the robot has developed a basic hand-eye coordination capability, and the wrist, j3, and the gripper, j4, can be released. After this constraint is removed, the samples in the sample pool must be re-collected. The R<sub>1</sub> network is further trained on these newly collected samples, and the robot finally develops near-body grasping ability.
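The sample-collection procedure above can be sketched as a small sample pool. This is an illustrative sketch under assumptions: joint values are normalized to [-1, 1], the pool capacity and batch size are placeholders for the unspecified "fixed number", and `observe_ball` is a hypothetical stand-in for the visual system that returns P(γ, θ) for a given arm posture.

```python
import random
from typing import List, Tuple

Joints = Tuple[float, float, float, float]        # (j1, j2, j3, j4)
Sample = Tuple[Tuple[float, float], Joints]       # e(P, Ma)


class SamplePool:
    """Pool E(e1, ..., en); training can start once it holds `capacity` samples."""

    def __init__(self, capacity: int, batch_size: int, seed: int = 0):
        self.capacity = capacity
        self.batch_size = batch_size
        self.samples: List[Sample] = []
        self.rng = random.Random(seed)

    def add(self, sample: Sample) -> None:
        self.samples.append(sample)

    def ready(self) -> bool:
        return len(self.samples) >= self.capacity

    def draw_batch(self) -> List[Sample]:
        # random subset (without replacement) used as R1 training samples
        return self.rng.sample(self.samples, self.batch_size)

    def reset(self) -> None:
        # samples must be re-collected after a constraint is lifted
        self.samples.clear()


def babble(rng: random.Random, wrist_gripper_locked: bool) -> Joints:
    """Random arm posture; wrist j3 and gripper j4 are held at 0.0 while locked."""
    j1, j2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
    if wrist_gripper_locked:
        return (j1, j2, 0.0, 0.0)
    return (j1, j2, rng.uniform(-1, 1), rng.uniform(-1, 1))


def observe_ball(ma: Joints) -> Tuple[float, float]:
    """Hypothetical stand-in for the visual system: returns P(gamma, theta)."""
    return (ma[0] + 0.5 * ma[1], ma[1] - 0.5 * ma[0])


rng = random.Random(42)
pool = SamplePool(capacity=100, batch_size=32)
while not pool.ready():                      # constrained phase: j3, j4 fixed
    ma = babble(rng, wrist_gripper_locked=True)
    pool.add((observe_ball(ma), ma))
batch = pool.draw_batch()                    # would be fed to R1 training
pool.reset()                                 # wrist/gripper released: collect again
```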

#### 3.4.2. Complex Game Process

When its learning status is stable in the simple mode, the robot switches to the complex game mode, using its whole field of view. The procedure for the complex mode is shown in the right-hand box of **Figure 2**. In the complex mode, the balls are scattered within the visual range of the robot but outside the work range of the manipulator. To pick up these balls, the robot must relocate itself. The mapping between the visual space and the robot's mobile space is built in the complex mode. This mapping is approximated by a second RBF network, R<sub>2</sub>. The training method and parameter settings are the same as those of the R<sub>1</sub> network.

The training samples of the R<sub>2</sub> network are based on two sequences: (1) the movement trajectory of the target in the robot's visual space, PS(ps<sub>1</sub>, ps<sub>2</sub>, ..., ps<sub>t</sub>), and (2) the sequence of changes in the robot's moving motor values, MS(M<sub>1</sub>, M<sub>2</sub>, ..., M<sub>t</sub>). In PS, ps<sub>t</sub> denotes the displacement of the ball in the robot's visual space when the robot moves from step t to step t + 1. The value of ps<sub>t</sub> is determined by Equation (4), where p<sub>t</sub>(γ, θ) denotes the position of the target in the visual space at step t, and p<sub>t+1</sub>(γ, θ) denotes its position at step t + 1. Likewise, the change in motor value from step t to step t + 1 is denoted M<sub>t</sub>. The n-step movement trajectory and the accumulated change of the motor are then expressed by Equations (5) and (6), respectively, where PA<sub>n</sub> denotes the accumulated displacement of the target relative to the robot after the robot moves n steps, and MA<sub>n</sub> denotes the accumulated change of the motor.

$$ps_t = p_{t+1}(\gamma, \theta) - p_t(\gamma, \theta) \tag{4}$$

$$PA_n = \sum_{t=1}^{n} ps_t = p_{n+1}(\gamma, \theta) - p_1(\gamma, \theta) \tag{5}$$

$$MA_n = \sum_{t=1}^{n} M_t, \quad n \le D \tag{6}$$
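Equations (4) to (6) can be computed directly from a recorded trajectory. The sketch below is an illustration under assumptions: positions and motor changes are given as 2-D arrays, the motor value has two components (one per wheel), and the threshold D defaults to the value 7 used in the experiments.

```python
import numpy as np


def accumulate(positions, motor_changes, D=7):
    """Build one R2 training sample e(PA_n, MA_n) from an n-step trajectory.

    positions: p_1 ... p_{n+1}, the target's visual position (gamma, theta) at each step.
    motor_changes: M_1 ... M_n, the change of the two wheel motor values at each step.
    D: step threshold on n (Equation 6).
    """
    positions = np.asarray(positions, dtype=float)
    motor_changes = np.asarray(motor_changes, dtype=float)
    n = len(motor_changes)
    if len(positions) != n + 1:
        raise ValueError("an n-step trajectory needs n + 1 positions")
    if n > D:
        raise ValueError("trajectory longer than the threshold D")
    ps = np.diff(positions, axis=0)   # Equation (4): ps_t = p_{t+1} - p_t
    PA = ps.sum(axis=0)               # Equation (5): equals p_{n+1} - p_1
    MA = motor_changes.sum(axis=0)    # Equation (6)
    return PA, MA


PA, MA = accumulate(positions=[[0, 0], [1, 2], [3, 3], [4, 5]],
                    motor_changes=[[1, 1], [2, 0], [0, 3]])
```

Summing the per-step displacements and taking the endpoint difference give the same PA<sub>n</sub>, which is why Equation (5) only needs the first and last positions.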

In the complex game, a target is placed within the robot's field of view rather than within the manipulator's work range. The robot then moves randomly for n steps. If the ball happens to enter the manipulator's work range during the n steps, the entire movement trajectory is kept as a sample. However, if the ball leaves the robot's field of view during the n steps, the trajectory is abandoned, and the target is placed randomly in a new position to start the iteration again.
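The random exploration episode described above can be sketched as one function. This is a hedged sketch, not the authors' procedure: `observe`, `move`, `in_view`, and `in_reach` are hypothetical callbacks for the robot's vision, wheels, field-of-view test, and work-range test; each random motor change is drawn uniformly from [-1, 1]; and at most D steps are taken, following n ≤ D in Equation (6). The toy world treats the target's position relative to the robot as its visual position.

```python
import random
from typing import Callable, List, Optional, Tuple

Vec = Tuple[float, float]


def random_rollout(rng: random.Random,
                   observe: Callable[[], Vec],
                   move: Callable[[Vec], None],
                   in_view: Callable[[Vec], bool],
                   in_reach: Callable[[Vec], bool],
                   D: int = 7) -> Optional[Tuple[List[Vec], List[Vec]]]:
    """One random-movement episode for collecting an R2 training trajectory.

    Returns (positions, motor_changes) if the target enters the work range within
    D steps; returns None if it leaves the field of view or is never reached.
    """
    positions = [observe()]
    motors: List[Vec] = []
    for _ in range(D):
        m = (rng.uniform(-1, 1), rng.uniform(-1, 1))   # random change of the two wheel motors
        move(m)
        p = observe()
        if not in_view(p):
            return None                                # target lost: abandon trajectory
        positions.append(p)
        motors.append(m)
        if in_reach(p):
            return positions, motors                   # keep the whole trajectory
    return None


# Toy world (hypothetical): the target's position relative to the robot
state = {"target": (2.0, 1.0)}


def observe() -> Vec:
    return state["target"]


def move(m: Vec) -> None:
    x, y = state["target"]
    state["target"] = (x - m[0], y - m[1])


result = random_rollout(random.Random(0), observe, move,
                        in_view=lambda p: abs(p[0]) <= 5 and abs(p[1]) <= 5,
                        in_reach=lambda p: (p[0] ** 2 + p[1] ** 2) ** 0.5 <= 1.0)
```

A non-None result can be passed to a routine that accumulates PA<sub>n</sub> and MA<sub>n</sub>; a None result corresponds to resetting the game with a new target position.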

This stage also imposes two additional restrictions on the robot's mobility. (1) Because the accuracy of the robot hardware is limited, the number of mobile steps n in a task must be less than a threshold, denoted D; if the number of steps is too large, the accumulated motor error becomes very large. (2) If the target disappears from the visual field while the robot is moving, the task is considered a failure, and a new task is started by resetting the game. When the robot reaches the target position within fewer steps than the threshold, PA<sub>n</sub> and MA<sub>n</sub> are combined into a sample e(PA<sub>n</sub>, MA<sub>n</sub>). Training of the R<sub>2</sub> network begins once enough samples have been collected in the sample pool E(e1, e2, e3, ..., en). After the mapping is properly established by training the R<sub>2</sub> network, the robot has developed mobile grasping ability. When performing a mobile grasping task, the robot completes the task in two steps. First, using the trained R<sub>2</sub> network, the robot moves toward the target until the target is within the work range of the manipulator. Then, the robot uses R<sub>1</sub> to complete near-body grasping.

FIGURE 8 | The procedure in complex game mode.

## 4. EXPERIMENTS AND ANALYSIS

### 4.1. The Simple Game

To train the hand-eye coordination network, 2,200 training samples and 512 test samples were collected. The change in error during training is shown in **Figure 4**, where each circle denotes the average error over 50 runs of the experiment and each vertical bar denotes its standard deviation. The training error decreased quickly before the wrist and gripper joints were released. The average error fell below 0.05 after just 500 training epochs, at which point the robot had successfully learned hand-eye coordination. The constraints on the wrist and gripper joints were then released, and training of the robot's near-body grasping capability began. **Figure 4** shows that, in the near-body grasping training stage, the training error first increases rapidly and then declines slowly, eventually converging to about 0.06. After training, the network was tested with the 512 test samples; the results are shown in **Figure 5**. The overall average error over the 512 tests is about 0.07. Since an average error below 0.1 is generally considered a success for robot hardware systems, the robot has successfully learned the near-body grasping skill.

**Figure 6** shows the robot's performance during the game after it has learned the near-body grasping skill. In the first step, the robot detects whether the target is within the working range of the manipulator. If it is, the robot proceeds to the second step, in which it fixates on the target via a saccade and obtains the exact position of the target in its field of vision. The robot then maps the target's visual position into the movement space of the manipulator and drives the manipulator toward the target position. Finally, the manipulator reaches and grasps the target.

### 4.2. The Complex Game

The eye-mobile network was trained after the hand-eye network had been successfully trained. In this stage, 600 samples were collected, of which 500 were used for training and the remaining 100 for testing. The threshold D was set to 7. This experiment was also run 50 times. Changes in the average error during training are shown in **Figure 7**. As seen in **Figure 7**, the training error declines rapidly from the start and reaches a stable minimum, indicating that the mapping between the robot's visual and mobile spaces is not very complicated.

**Figure 8** illustrates the robot's performance in the complex game mode after the eye-mobile network has been trained. In Step 1, the robot uses the visual system to obtain the position of the target and detect whether the target lies within the working range of the manipulator. In Step 2, if the target is not within the working range of the manipulator, the robot drives the mobile system toward the target position until the target appears in the working range of the manipulator. In Step 3, after the target appears in the working range of the manipulator, the robot stops moving and reacquires the visual position of the target. In Step 4, the robot drives the manipulator toward the target, and in Step 5, the robot reaches and grasps the target.
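The five steps above can be sketched as a small control routine. This is an illustrative sketch under loudly stated assumptions: every callback is hypothetical, `r2_predict` is assumed to map the target's current visual position to a wheel command that moves it toward the work range, `r1_predict` stands in for the trained hand-eye network, and `max_moves` is a safeguard that the text does not mention.

```python
from typing import Callable, Tuple

Vec = Tuple[float, float]


def mobile_grasp(observe: Callable[[], Vec],
                 in_reach: Callable[[Vec], bool],
                 r2_predict: Callable[[Vec], Vec],
                 r1_predict: Callable[[Vec], tuple],
                 drive: Callable[[Vec], None],
                 move_arm: Callable[[tuple], None],
                 grasp: Callable[[], bool],
                 max_moves: int = 10) -> bool:
    p = observe()                       # Step 1: locate the target
    moves = 0
    while not in_reach(p):              # Step 2: drive the base toward the target (R2)
        if moves >= max_moves:
            return False                # hypothetical safeguard: give up
        drive(r2_predict(p))
        p = observe()
        moves += 1
    p = observe()                       # Step 3: stop and reacquire the position
    move_arm(r1_predict(p))             # Step 4: drive the manipulator (R1)
    return grasp()                      # Step 5: reach and grasp


# Toy usage: the base command halves the target's distance on each move.
state = {"target": (4.0, 0.0)}
arm_commands = []


def drive(m: Vec) -> None:
    x, y = state["target"]
    state["target"] = (x - m[0], y - m[1])


ok = mobile_grasp(
    observe=lambda: state["target"],
    in_reach=lambda p: (p[0] ** 2 + p[1] ** 2) ** 0.5 <= 1.0,
    r2_predict=lambda p: (0.5 * p[0], 0.5 * p[1]),   # toy stand-in for R2
    r1_predict=lambda p: (p[0], p[1], 0.0, 1.0),     # toy stand-in for R1
    drive=drive,
    move_arm=arm_commands.append,
    grasp=lambda: True,
)
```

Separating the base loop (Step 2) from the single arm command (Step 4) mirrors the two-network design: R<sub>2</sub> only has to bring the target into the work range, after which R<sub>1</sub> handles near-body grasping.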

### 4.3. Performance Analysis and Comparative Study

The experiment discussed above lets the robot learn the near-body grasping skill step by step using a developmental approach. For a comparative study, another experiment with the same target of near-body grasping was designed using conventional direct training in the simple game mode. The performance of the two methods is compared in **Figure 9**, where the dotted line shows the change in average training error for direct learning in the simple game mode, and the solid line shows the change in average error for the developmental method. As **Figure 9** shows, in the first 2,000 epochs, the developmental method trains more efficiently than the direct training method. After 2,000 training epochs, the two methods reach similar error values. This result indicates that our approach is superior to the conventional method in learning efficiency. In the experiment using the developmental method, the network was trained in the first 500 epochs on samples collected from a robot whose wrist and gripper joints were constrained. After the error fell below 0.05, the constraint was removed and the full range of samples was used to retrain the network. As shown in **Figure 9**, the error for both the two-step developmental approach and the direct training approach decreases as the number of epochs increases. However, over the entire range of epochs, the error of the developmental approach decreases faster than that of the direct approach, indicating that using the developmental method in a game improves the robot's learning efficiency.

To summarize, in these game-playing experiments the robot successfully learned mobile grasping ability by playing the simple and complex games. From these experiments, we draw two conclusions: (1) the proposed approach enables robots to learn skills by modeling the play activities of human infant development; and (2) the developmental method with game elements improves the robot's learning efficiency.

### 4.4. Comparison and Discussion

A comparison of **Figure 4** with **Figure 7** shows that the error for the eye-mobile network decreases more rapidly than that for the hand-eye network, because the mobile system has only two joints whereas the manipulator has four; mapping from visual space to mobile space is therefore simpler than mapping to the manipulator's movement space. However, in the early training epochs, the error is higher for the eye-mobile network than for the hand-eye network, since fewer dimensions also make the output more sensitive to the input values. In **Figure 7**, the error of R<sub>2</sub> decreases rapidly because this work uses a simplified motor model that maps the visual space to the robot's motor position space. The target's visual coordinates and the robot's wheel movement trajectory are collected as training samples, in which the motor value has only two dimensions, so the mapping between the robot's visual and mobile spaces is not complicated. If a dynamic control model of the mobile platform supporting acceleration control were used instead, the network would require more learning time for the more complicated control. In **Figure 4**, the error rises rapidly after the wrist and gripper joints are released at epoch 500, because the mapping becomes more complex. However, the error after this rapid increase is still lower than that of the direct learning approach, indicating that learning in the previous stage helps the next learning stage.

After testing the robot's near-body grasping ability, we further analyze the test results from the perspective of the manipulator's movement space. Because the gripper joint has only two states (open and closed), it has little effect on the manipulator's variation in the movement space; we therefore analyze only the first three joints of the manipulator. The analysis results are shown in **Figure 10**, where triangles represent test samples with an error rate >0.1 and circles represent the others. **Figure 10** shows that most high-error actions have at least two joints with angle values near their extremes. In other words, these actions are distributed around the edge of the movement space, and their errors may be due to unstable control when the manipulator's servos are near their maximum and minimum angles. Elsewhere, the robot's grasping errors are generally below 0.1. Setting this hardware factor aside, we consider that the robot has built the mapping from the visual space to the manipulator movement space.
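The edge-of-workspace analysis above can be expressed as a small check: flag a pose as an edge pose when at least two of the first three joints lie near a limit, then ask what fraction of high-error samples (error > 0.1) are edge poses. This is a hedged sketch; the servo limits, the 10-degree margin, and the sample data below are hypothetical, not values from the experiment.

```python
# Hypothetical servo limits (degrees) for the first three manipulator joints;
# the gripper is excluded, as in the analysis above.
JOINT_LIMITS = [(0.0, 180.0), (0.0, 180.0), (0.0, 180.0)]


def near_limit(angle, lo, hi, margin):
    """True if the angle lies within `margin` of either end of its range."""
    return angle - lo <= margin or hi - angle <= margin


def is_edge_pose(angles, limits=JOINT_LIMITS, margin=10.0, min_joints=2):
    """A pose is at the workspace edge if at least `min_joints` joints are near a limit."""
    count = sum(near_limit(a, lo, hi, margin) for a, (lo, hi) in zip(angles, limits))
    return count >= min_joints


def edge_fraction(samples, threshold=0.1):
    """Fraction of high-error samples (error > threshold) that are edge poses."""
    high = [angles for angles, err in samples if err > threshold]
    if not high:
        return 0.0
    return sum(is_edge_pose(a) for a in high) / len(high)


# Hypothetical test samples: (first three joint angles, grasping error).
samples = [
    ((5.0, 175.0, 90.0), 0.25),
    ((90.0, 90.0, 90.0), 0.03),
    ((2.0, 100.0, 178.0), 0.18),
    ((60.0, 170.0, 80.0), 0.05),
]
print(edge_fraction(samples))  # 1.0
```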

For comparison, many developmental models have used humanoid robots as experimental platforms (Marocco et al., 2010; Shaw et al., 2014; Morse and Cangelosi, 2017); in particular, several of them are infant-like robots. These humanoid platforms benefit directly from important theories in developmental psychology, so they can easily reproduce behavioral patterns or validate new hypotheses. In contrast, the shape and configuration of a mobile manipulator are very different from those of a human, so developmental theories need to be adjusted to fit the robot system. However, mobile manipulators are already capable of practical applications, which also require the robots to have cognitive abilities; new developmental theories validated on mobile manipulators can therefore be applied rapidly in real-life applications. Moreover, our work focuses on using the "Play" strategy, through two game modes, to create the mobile reaching ability of our robot. Our system not only involves spontaneous and intrinsically motivated exploration of actions and objects in varying contexts (Lee, 2011) but also has developmental characteristics, by applying the "LCAS" developmental learning algorithm. Without setting specific goals, the robot uses its learning status to progress from the simple game mode to the complex one. The combination of developmental robotics and play modeling gives our robot a faster learning rate.
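The idea that the robot advances from the simple to the complex game mode according to its own learning status, rather than preset goals, can be sketched in the Lift-Constraint, Act, Saturate pattern that the LCAS acronym usually denotes: act under the current constraint, detect saturation when the error has stopped falling over a window of epochs, then lift the constraint by moving to the next stage. This is an illustration under stated assumptions, not the authors' implementation; the class name `StagedLearner`, the window length, and the minimum-gain threshold are hypothetical.

```python
from collections import deque


class StagedLearner:
    """Hypothetical LCAS-style controller: act under the current constraint until
    learning saturates, then lift the constraint by advancing to the next stage."""

    def __init__(self, stages, window=5, min_gain=0.01):
        self.stages = stages          # e.g. ["simple game", "complex game"]
        self.stage = 0
        self.window = window          # epochs over which improvement is measured
        self.min_gain = min_gain      # smaller drop in error than this = saturated
        self.history = deque(maxlen=window + 1)

    def saturated(self):
        """True once a full window shows less than `min_gain` error reduction."""
        if len(self.history) <= self.window:
            return False
        return self.history[0] - self.history[-1] < self.min_gain

    def record(self, error):
        """Record one epoch's error; return True if a constraint was lifted."""
        self.history.append(error)
        if self.saturated() and self.stage < len(self.stages) - 1:
            self.stage += 1
            self.history.clear()      # start measuring progress afresh in the new stage
            return True
        return False

    @property
    def current(self):
        return self.stages[self.stage]


learner = StagedLearner(["simple game", "complex game"], window=3, min_gain=0.01)
flags = [learner.record(0.2) for _ in range(4)]  # error has stopped improving
print(flags, learner.current)  # [False, False, False, True] complex game
```

Clearing the history after each transition matters: the error typically jumps when a constraint is lifted (as when the wrist and gripper joints are released), and old values would otherwise distort the saturation test.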

## 5. CONCLUSION

Infant developmental research has found that infants develop a number of skills when playing games. In this paper, we combined these infant development theories with the LCAS algorithm to generate a developmental algorithm that does not require specifically defined developmental goals for a robot. We designed two game modes and two RBF neural networks to simulate the procedures necessary for a robot to play in these game modes, supported by a developmental strategy. The experiments demonstrated that a robot successfully learned moving and grasping skills. From the analysis and comparison of the results, we conclude that (1) a robot can successfully learn near-body grasping and mobile grasping skills through play, and (2) for a robot learning these skills, the developmental approach reduces complexity and accelerates learning.

Our model also has some limitations, which can be addressed in future work. For instance, to implement the hand-eye coordination system rapidly, our model uses an open-loop method, which may lead to some failed grasps; a closed-loop method may be used to improve the grasping success rate. In addition, our model does not use data obtained from real infants. Furthermore, as an infant develops skills through play, the infant's intrinsic motivation and ability to imitate are closely related to that play (Santucci et al., 2013; Oudeyer et al., 2016). Infants achieve unsupervised learning in their environment through intrinsic motivation, which plays an important role in controlling the transitions between infant learning stages. Moreover, infants with a strong ability to imitate learn new skills faster than other infants. Therefore, the applicability of the proposed system can be extended in the future by incorporating intrinsic motivation and imitation ability.

## AUTHOR CONTRIBUTIONS

RW performed the experiments and wrote the manuscript; ZZ and FC designed the robotic learning approach; CZ provided psychological analysis on the experimental data; CL designed the robotic control system; and LY analyzed the experimental results and edited the manuscript.

## ACKNOWLEDGMENTS

This work was supported by the Major State Basic Research Development Program of China (973 Program) (No. 2013CB329502), the Fundamental Research Funds for the Central Universities (No. 20720160126), the National Natural Science Foundation of China (No. 61673322 and 61673326), and Natural Science Foundation of Fujian Province of China (No. 2017J01129). The authors would like to thank the reviewers for their invaluable comments and suggestions, which greatly helped to improve the presentation of this paper.

## REFERENCES


on Information Science, Electronics and Electrical Engineering (ISEEE), Vol. 3. (Sapporo: IEEE), 1613–1618.


on Intelligent Robotics and Applications, (Portsmouth, UK: Springer), 284–294.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Wu, Zhou, Chao, Zhu, Lin and Yang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Impact of Individual Differences, Types of Model and Social Settings on Block Building Performance among Chinese Preschoolers

Mi Tian<sup>1</sup>, Zhu Deng<sup>2</sup>, Zhaokun Meng<sup>3</sup>, Rui Li<sup>4</sup>\*, Zhiyi Zhang<sup>5</sup>, Wenhui Qi<sup>6</sup>, Rui Wang<sup>7</sup>, Tingting Yin<sup>7</sup> and Menghui Ji<sup>7</sup>

<sup>1</sup> Department of Psychology, The Chinese University of Hong Kong, Hong Kong, China, <sup>2</sup> School of Psychology, Nanjing Normal University, Nanjing, China, <sup>3</sup> School of Art and Literature, Shihezi University, Xinjiang, China, <sup>4</sup> School of Foreign Languages, Huazhong University of Science and Technology, Wuhan, China, <sup>5</sup> School of Foreign Languages and Cultures, Nanjing Normal University, Nanjing, China, <sup>6</sup> School of Foreign Studies, Nanjing Forestry University, Nanjing, China, <sup>7</sup> Nanjing Liuyi Kindergarten, Nanjing, China

#### Edited by:

Qiang Shen, Aberystwyth University, United Kingdom

#### Reviewed by:

Wanze Xie, Boston Children's Hospital, Harvard University, United States

Thea Ionescu, Babeș-Bolyai University, Romania

> \*Correspondence: Rui Li liruidianzi@hotmail.com

#### Specialty section:

This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology

Received: 25 October 2017 Accepted: 10 January 2018 Published: 30 January 2018

#### Citation:

Tian M, Deng Z, Meng Z, Li R, Zhang Z, Qi W, Wang R, Yin T and Ji M (2018) The Impact of Individual Differences, Types of Model and Social Settings on Block Building Performance among Chinese Preschoolers. Front. Psychol. 9:27. doi: 10.3389/fpsyg.2018.00027

Children's block building performance is used as an indicator of other abilities in multiple domains. In the current study, we examined individual differences, types of model and social settings as influences on children's block building performance. Chinese preschoolers (N = 180) participated in a block building activity in a natural setting, and performance was assessed with multiple measures in order to identify a range of specific skills. Using scores generated across these measures, three dependent variables were analyzed: block building skills, structural balance and structural features. An overall MANOVA showed significant main effects of gender and grade level across most measures. Types of model showed no significant effect on children's block building. There was a significant main effect of social settings on structural features, with the best performance in the 5-member group, followed by individual building and then building in the 10-member group. These findings suggest that boys performed better than girls in the block building activity. Block building performance increased significantly from the 1st to the 2nd year of preschool, but not from the 2nd to the 3rd. The preschoolers created more representational constructions when presented with a wooden model rather than with a picture. There was partial evidence that children performed better when working with peers in a small group than when working alone or in a large group. Future studies should examine modalities other than the visual one, diversify the samples and adopt a longitudinal design.

Keywords: block building, individual differences, types of model, social settings, Chinese preschoolers

## INTRODUCTION

Children's block building has been investigated for over a century (Froebel, 1895), and its relevance is documented in recent studies (Casey et al., 2012; Ramani et al., 2014; Newman et al., 2016). In preschool settings, children are provided with wooden unit blocks of varying shapes and sizes for free play; they are also sometimes asked to copy a model or a picture, with more difficult tasks requiring symbolic representation (Otsuka and Jay, 2016). Such building activity is generally recognized as an effective way to promote children's overall development (Rogers, 1985), literacy skills (Isbell and Raines, 1991; Wellhousen and Giles, 2005; Cohen and Uhry, 2011), social skills (Cohen and Uhry, 2007), mathematical skills (Casey et al., 2012) and spatial skills (Ramani et al., 2014; Cohen and Emmons, 2017).

One area of interest has been block building as an indicator of the development of symbolic representation, which involves a complex process associated with problem-solving, calculation and abstract thinking abilities (Diana and Test, 2011; Uhry and Cohen, 2011; Otsuka and Jay, 2016). There are several gaps in the literature. One aspect of block building that has not been studied concerns preschoolers' ability to copy models, either a wooden model or a picture. An understanding of preschoolers' block building skills under these two conditions might help shed some light on their psychological and cognitive development. More research is also needed on individual differences in block building skills based on gender (e.g., Goodson, 1982; Saracho and Spodek, 1995; Hanline et al., 2001; Casey et al., 2008; Kersh et al., 2008) and year in preschool (e.g., Stiles-Davis, 1988; Stiles and Stern, 2001), as the extant literature is far from reaching consensus. Lastly, the role of social settings (individual or group) (e.g., Hanline et al., 2001; Casey et al., 2008) has not been fully identified.

To this end, the present study examines gender differences, grade level (K1, K2, and K3, i.e., 1st, 2nd, and 3rd year in preschool or kindergarten, e.g., Shu et al., 2008), types of model (wooden model or pictures), and social settings (working individually or in a group) as predictors of block building performance. Block building was assessed with multiple measures covering a wide range of skills. The following sections review the relevant research on individual differences, types of model and social settings in relation to preschoolers' block building.

### Individual Differences

Gender differences in block building performance have been investigated since the late 1950s (e.g., Farrell, 1957; Margolin et al., 1961; Clark et al., 1969). Early work with samples of younger children suggested that more boys than girls play with blocks, that boys spend more time in the block area (Farrell, 1957), and that girls are more interested in non-block activities than boys (Margolin et al., 1961). More recent studies (e.g., Caldera et al., 1999; Snow et al., 2016) have obtained inconsistent results with regard to gender differences in children's preference for block building. Importantly for the current study, these studies focused on preference rather than process. That is, they did not compare boys and girls on block building skills in terms of spatial reasoning, for example as seen in the structural features and representational quality of children's constructions.

There also appear to be individual differences based on developmental change in block building skills, documented in research from the 1930s (e.g., Hulson, 1930; Guanella, 1934) as well as in more recent research (Stiles-Davis, 1988; Stiles and Stern, 2001). Several stage models have been proposed. For example, Guanella (1934) described five stages: non-structural use of blocks in late infancy; piles or rows of blocks; bi-dimensional use of blocks; tri-dimensional use of blocks; and representational play. Johnson (1983) described seven stages: carrying blocks around; making rows or piles; bridging; enclosures; decorative patterns with symmetry; naming of block constructions; and dramatic play with block constructions.

Other research has focused on children's block building skills in relation to their cognitive development (e.g., Goodson, 1982; Stiles-Davis, 1988). For instance, Goodson (1982) found a positive correlation among children's skills in building arches, planning, and perception. Reifel and Greenfield (1982) also noted that integration and dimensionality in children's block constructions are in line with the complexity of cognitive structures. However, these studies concerned only a subset of block construction skills, without a comprehensive focus on the construction's spatial, balance and structural features.

### Types of Model

Traditional semiotics defined two components of a sign: the "signifier" and the "signified" (Saussure, 1916). "The 'signifier' is the physical form of the sign in words, images or sounds. The 'signified' is the mental concept referred to, its meaning" (Marsh and Millard, 2000, p. 78). By contrast, social semiotics "interprets language within a socio-cultural context, in which culture itself is interpreted in semiotic terms–as an information system" (Halliday, 1978, p. 2). From the social semiotics perspective, block building activities, to some extent, can be taken as the interplay between "signifier" (i.e., the symbolic representation of concrete objects or pictures in the real world) and "signified" (i.e., the realization of abstract meaning or mental concepts with wooden blocks in different shapes and sizes) in a social context. Block building becomes a special approach for preschoolers to convey abstract mental concepts by mapping symbolic representations into unit blocks they build based on their experiences.

Thus, from the perspective of semiotics, preschoolers' block building performance can be understood as a reflection of their interpretation and expression of abstract meaning. However, only a few studies (e.g., Sluss and Stremmel, 2004; Cohen and Uhry, 2011) have examined symbolic representation in block building. For example, Cohen and Uhry (2011, p. 80) asked 4-year-old preschoolers in a culturally diverse classroom "to name and describe completed block structures to consider the meaning and learning represented through play experiences." The results showed that preschoolers built the block structures based on their personal life experiences in a social context; for example, children described their block building in terms of home or school experiences. In the current study, we extended this research by testing children's block building skills when presented with two types of symbolic representation (a wooden model vs. pictures). Developmentally, it is assumed to be easier to replicate a wooden model than the symbolic representation provided by pictures (e.g., Greenfield and Schneider, 1977; Beagles-Roos and Greenfield, 1979).

### Social Settings

Social settings refer to the presence or absence of peers during block building. Although children sometimes play alone with blocks, blocks often encourage group play, and children are more likely to engage in large-cluster play in the block corner than in other areas of the classroom (Kinsman and Berk, 1979). Block building in a group setting has been shown to encourage preschoolers' prosocial behaviors such as smiling and helping, to promote communication and cooperation, and to discourage antisocial behaviors such as throwing blocks and fighting with others (Rogers, 1985). However, there may be gender differences: boys have been found to be more influenced by the social context than girls during block building activities (Sluss and Stremmel, 2004).

Different studies have used various sizes to define a "group," such as four children (e.g., Rogers, 1985), five children (e.g., Isbell and Raines, 1991) and ten children (e.g., Hanline et al., 2001). These studies compared individual and group block building but did not further compare the effects of group size (Hanline et al., 2001). Other studies (e.g., Cohen, 2006, 2015; Cohen and Uhry, 2007) attempted to compare different group sizes on block building performance while allowing group members to cooperate with each other. Berk (1976) identified five levels of group size in children's play: individual, 2-member group, 3-to-5-member group, 6-or-more-member group and total class group. Kinsman and Berk (1979) classified four levels of social settings in children's block building activity: individual block building, 2-member group, small cluster and large cluster. It should be noted that peer cooperation in groups might confound the effect of the social setting itself, since members of a group might either work together or alone. For children working together in groups, it would be difficult to tell whether the social setting or peer cooperation contributes to their block building performance. We therefore followed the classification of earlier studies (e.g., Berk, 1976; Kinsman and Berk, 1979) by examining three levels of social settings (individual block building, building in a 5-member group and building in a 10-member group), but in every setting each child worked alone without peer cooperation.

### The Present Study

The present study examines the impact of gender and school level as individual differences, types of model (wooden model or picture) and social settings (individual, in a 5-member group, in a 10-member group) on block building performance among Chinese preschoolers. Multiple established measures were used to assess the full range of block building skills. There were three research questions:


## MATERIALS AND METHODS

### Participants

A total of 180 preschoolers from K1 to K3 at a public kindergarten in Nanjing, Jiangsu province, China, volunteered to participate in the block building study. Participants were recruited from middle-class families in order to keep family socioeconomic status (SES) homogeneous. Signed consent forms were obtained from parents. The experiment was approved by the ethics committee of Nanjing Normal University. Participants' detailed demographic information is reported in **Table 1** below.

### Materials

#### Blocks

Children had at least 2,000 unit blocks available to them, in 23 different shapes and sizes: short board, medium plate, long board, small semicircle, semicircle, small curved surface, big circle, small triangles, triangles, sector, semi arches, small arch, small cubes, small rectangle, Gothic gate, small square column, square column, thin cylinder, small cylinder, middle cylinder, big cylinder, small curve A, and small curve B.

#### Wooden Model of the Tower

A three-dimensional wooden model of Yueyang Tower was presented for the children to construct (see **Figure 1**). Yueyang Tower is a famous ancient tower in Hunan province, China. It was chosen because the preschoolers had no prior block building experience with it and because its building features range from simple to complex. The model was made from six wooden plates measuring 34 cm (length) × 21 cm (breadth) × 0.3 cm (thickness). The finished model weighed 0.65 kg and measured 19 cm (length) × 19 cm (breadth) × 23 cm (height).

TABLE 1 | Summary of participants' demographic information (N = 180).


#### Pictures of the Tower

For the second reference material, two color pictures of the wooden model of Yueyang Tower (one front view, one side view) were printed on an A4 sheet of paper.

### Measures

Block building performance was assessed using multiple measures. Using scores generated across these measures, three dependent variables were analyzed: block building skills, structural balance, and structural features. For detailed information about these three measures, see the Appendix.

#### Block Building Skills

The measure of block building skills combined scores from two scales. First, constructions were rated with the Block Construction Scoring Scale (Phelps and Hanline, 1999; Hanline et al., 2001, 2010): non-construction use of blocks (score of 0.5), linear constructions (scores from 1 to 1.5), bidimensional/areal constructions (2–4.5), and tridimensional constructions (5–6.5). The Block Construction Scoring Scale was designed to measure the complexity of block constructions; its interrater reliability was between 0.83 and 1.00 (M = 0.95) when assessed across 65 children aged 16 to 75 months (Hanline et al., 2001). Second, constructions were rated for tridimensional enclosure (7–9) using the Block Building Measure (Casey et al., 2008), which has high inter-rater reliability (0.90–0.93). The measure of block building skills therefore yields five classifications: non-construction use of blocks, linear constructions, bidimensional/areal constructions, tridimensional constructions, and tridimensional enclosure.
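To make the scoring bands concrete, the function below maps a block building skills score onto the five classifications just listed. It is a sketch, not the authors' scoring procedure: the band boundaries come from the text, while the function name and the assumption that scores fall on 0.5 steps (so that comparing against each band's upper bound is enough, and any in-between value goes to the next band up) are ours.

```python
def classify_block_skill(score):
    """Map a block building skills score to its construction category.

    Bands follow the text: 0.5, 1-1.5, 2-4.5, 5-6.5 (Block Construction Scoring
    Scale) and 7-9 (tridimensional enclosure, Block Building Measure).
    Assumes scores are multiples of 0.5, so upper bounds separate the classes.
    """
    if not 0.5 <= score <= 9.0:
        raise ValueError(f"score {score} outside the 0.5-9 range")
    bands = [
        (0.5, "non-construction use of blocks"),
        (1.5, "linear construction"),
        (4.5, "bidimensional/areal construction"),
        (6.5, "tridimensional construction"),
        (9.0, "tridimensional enclosure"),
    ]
    for upper, label in bands:
        if score <= upper:
            return label


print(classify_block_skill(3.0))  # bidimensional/areal construction
```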

#### Structural Balance

Structural balance was rated with a combination of the Measure of Structural Balance and the Structural Balance Rating Scale from Studies 1 and 2 of Casey et al. (2012), whose inter-rater reliabilities were 0.91 and 0.87, respectively. The combined scale retains six levels: stacking (rating of 1), bridging (rating of 2), bridging on a non-flat surface (rating of 3), scaffolding (rating of 4), balancing using counter-weights (rating of 5), and balancing using center- and counter-weights (rating of 6).

#### Structural Features

The measure of structural features was developed for the present study based on the block building reference object, Yueyang Tower. Yueyang Tower is a three-story rectangular building made entirely of wood, comprising the base, the main tower, three layers of upturned eaves, and the roof. The scale of structural features has four classifications: bi-/tridimensional structure (scores of 1–1.5), basic structure (1–1.5), structural details (0.75–1.25), and representational play (1–1.5). Because each classification was scored independently, the four scores were summed to create a composite score of total structural features.

### Procedure

#### Pre-test Preparation

Before the formal experiment, experimenters made classroom visits to observe block building across the different grade levels and coordinated with the preschool teachers to facilitate the forthcoming data collection. The observation stage lasted 1 month. Three graduate students majoring in psychology received systematic training to become familiar with the data collection procedure.

#### Formal Test Procedure

Two rooms were used for assessment: a larger room for preschoolers in K2 and K3, and a smaller room for preschoolers in K1. Each room contained unit blocks of varying shapes and sizes for the preschoolers to use. The experiment was conducted in either the morning or the afternoon, according to the respective schedules of the three grade levels. The block activity was self-paced by the preschoolers. Half of the participants were given the wooden model of the tower as the reference material, and the other half were given the pictures of the tower.

During the assessment, participants were given instructions that varied depending on the social setting (work individually, work in a 5-member group, or work in a 10-member group). The instructions for the individual condition were as follows: "Let's play block building games! You see there, it is a Yueyang Tower model for your reference, which requires you to build the Yueyang Tower using blocks. After you finish, you'll receive a gift as a reward." The only difference in the instructions for the 5-group and 10-group settings was the addition of the sentence "Please notice that each of you should build Yueyang Tower alone without the cooperation of others."

#### Scoring

For the purpose of offline scoring, children's final block constructions were recorded with photographs taken from various angles, e.g., front, back, left, right, top, bottom and interior space. Three raters who were blind to the aims and hypotheses of the current study independently scored the photographs of the 180 block constructions. Interrater reliability of the three measures, namely block building skills, structural balance, and structural features, was established using the Kendall coefficient of concordance (Kendall's W) among the three raters. Kendall's W ranged from 0.952 to 0.992 (M = 0.974), p < 0.001, indicating high interrater reliability for the three measures. Five senior preschool teachers who had taught block construction to preschoolers for 13 to 20 years rated the content validity of each measure on a 5-point Likert scale (1 = low content validity, 5 = high content validity). The mean rating was 4.89 ± 0.15, indicating high content validity.
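For readers unfamiliar with Kendall's W, the sketch below computes it for m raters scoring the same n constructions: each rater's scores are converted to ranks, S is the sum of squared deviations of the per-object rank sums from their mean, and W = 12S / (m²(n³ − n) − m·ΣT), where ΣT = Σ(t³ − t) over every group of t tied scores within each rater. Because ordinal ratings such as these produce many ties, the sketch assumes the tie-corrected form; the paper does not state which form, or which software, was used, and the significance test is omitted.

```python
def average_ranks(scores):
    """Rank scores from 1..n, giving tied scores their average rank.

    Also returns sum(t**3 - t) over tie groups, used in the tie correction.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    tie_term = 0
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        size = end - start + 1
        tie_term += size ** 3 - size
        start = end + 1
    return ranks, tie_term


def kendalls_w(ratings):
    """Tie-corrected Kendall's W; `ratings` holds one list of n scores per rater."""
    m, n = len(ratings), len(ratings[0])
    ranked = [average_ranks(r) for r in ratings]
    rank_sums = [sum(ranks[j] for ranks, _ in ranked) for j in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    ties = sum(t for _, t in ranked)
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * ties)


# Three raters in perfect agreement give W = 1; two opposed raters give W = 0.
print(kendalls_w([[1, 2, 3, 4]] * 3), kendalls_w([[1, 2, 3], [3, 2, 1]]))  # 1.0 0.0
```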

## RESULTS

Outliers more than 3 SDs above or below the mean were trimmed during pre-processing of the data (e.g., Li et al., 2017a,b). A series of 2 (Gender: male, female) × 3 (Grade Level: K1, K2, K3) × 2 (Types of Model: wooden model, picture) × 3 (Social Settings: individual, 5-member group, 10-member group) multivariate analyses of variance (MANOVA) were carried out for three dependent variables: block building skills, structural balance and structural features. The MANOVA yielded Wilks' Lambda = 0.926, p = 0.149 for gender; Wilks' Lambda = 0.058, p < 0.001 for grade level; Wilks' Lambda = 0.949, p = 0.399 for types of model; and Wilks' Lambda = 0.926, p < 0.001 for social settings.
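The 3-SD trimming rule can be sketched as follows. The paper does not say whether the sample or population SD was used, whether trimming was applied per dependent variable or per cell, or whether it was repeated, so this sketch assumes a single pass over one variable using the sample SD from Python's `statistics` module; the scores are hypothetical.

```python
from statistics import mean, stdev


def trim_outliers(values, k=3.0):
    """Keep only values within k sample standard deviations of the mean (one pass)."""
    centre = mean(values)
    spread = stdev(values)
    return [v for v in values if abs(v - centre) <= k * spread]


# Hypothetical scores: 21 ordinary values and one extreme entry.
scores = [4.0, 5.0, 6.0] * 7 + [40.0]
trimmed = trim_outliers(scores)
print(len(scores), len(trimmed))  # 22 21
```

A single extreme value inflates the SD it is judged against, so in small samples a one-pass rule can miss outliers; iterating until nothing is removed is a common alternative.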

### Block Building Skills

There were significant main effects of gender, F(1,144) = 5.028, p = 0.026, η<sup>2</sup> = 0.034, and grade level, F(2,144) = 159.670, p < 0.001, η<sup>2</sup> = 0.689. Bonferroni post hoc pairwise comparisons showed that boys' scores for block building skills (M = 5.81, SD = 1.88) were significantly higher than girls' (M = 5.48, SD = 1.84), p = 0.026. Scores in the K1 group (M = 3.50, SD = 1.75) were significantly lower than in K2 (M = 6.68, SD = 0.50), p < 0.001, and in K3 (M = 6.75, SD = 0.48), p < 0.001. There was neither a significant main effect of types of model, F(1,144) = 0.004, p = 0.950, nor of social settings, F(2,144) = 1.021, p = 0.363. However, these results were subsumed under a two-way interaction between types of model and social settings, F(2,144) = 3.049, p = 0.050, η<sup>2</sup> = 0.041, and a three-way interaction between grade level, types of model and social settings, F(4,144) = 8.322, p < 0.001, η<sup>2</sup> = 0.188.
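The reported effect sizes can be cross-checked from the F statistics. Assuming the η<sup>2</sup> values are partial eta squared (the paper does not say so explicitly), η<sup>2</sup><sub>p</sub> = F·df<sub>1</sub> / (F·df<sub>1</sub> + df<sub>2</sub>). For a fully crossed 2 × 3 × 2 × 3 between-subjects design with N = 180, the error df is 180 − 36 = 144, which matches the denominators reported above. The helper names below are hypothetical.

```python
from math import prod


def error_df(n_total, levels):
    """Within-cells (error) df for a fully crossed between-subjects design."""
    return n_total - prod(levels)


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio: F*df1 / (F*df1 + df2)."""
    return f_value * df_effect / (f_value * df_effect + df_error)


df2 = error_df(180, (2, 3, 2, 3))  # 36 cells, so 180 - 36 = 144
print(df2)                                             # 144
print(round(partial_eta_squared(5.028, 1, df2), 3))    # 0.034 (gender)
print(round(partial_eta_squared(159.670, 2, df2), 3))  # 0.689 (grade level)
```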

Analysis of the simple effects showed that when the K1 group was presented with a wooden model, building skills in the 10-group setting (M = 5.983, SD = 1.277) were significantly higher than in the individual setting (M = 5.522, SD = 1.974) or 5-group setting (M = 5.644, SD = 2.422), with no significant difference between the individual and 5-group setting. When the K1 group was presented with a picture as a model, building skills in the 5-group setting (M = 5.944, SD = 1.477) were significantly higher than in the individual setting (M = 5.574, SD = 1.904) and in the 10-group setting (M = 5.433, SD = 2.075), with no significant difference between the individual and 10-group setting.

### Structural Balance

Significant main effects of gender, F(1,144) = 6.675, p = 0.011, η<sup>2</sup> = 0.044, and grade level, F(2,144) = 219.803, p < 0.001, η<sup>2</sup> = 0.753, were both observed, with boys (M = 4.34, SD = 1.16) performing better than girls (M = 4.14, SD = 1.37), p = 0.011, and K2 (M = 5.00, SD = 0.000) and K3 (M = 5.00, SD = 0.000) performing better than K1 (M = 2.71, SD = 1.16), p < 0.001. The interaction between gender and grade level was also significant, F(2,144) = 6.675, p = 0.002, η<sup>2</sup> = 0.085. Specifically, no difference between boys and girls was observed in K2 and K3 (M = 5.000, SD = 0.000), but in K1, boys' structural balance scores (M = 2.943, SD = 1.144) were significantly higher than girls' scores (M = 2.495, SD = 1.154). There was neither a significant main effect of types of model, F(1,144) = 0.014, p = 0.906, nor of social settings, F(2,144) = 0.326, p = 0.722. However, these results are subsumed under a two-way interaction between types of model and social settings, F(2,144) = 5.157, p = 0.007, η<sup>2</sup> = 0.067, and a three-way interaction between types of model, social settings, and grade level, F(4,144) = 5.157, p = 0.001, η<sup>2</sup> = 0.125.

Analysis of the simple effects showed that, for K1 but not K2 or K3, children who were given a wooden model performed worse in the individual setting (M = 2.311, SD = 1.123) than in both the 10-group (M = 3.100, SD = 0.994) and 5-group (M = 2.800, SD = 0.837) settings. When the K1 children were given a picture as a model, structural balance in the individual setting (M = 3.244, SD = 1.172) was significantly higher than in both the 5-group (M = 2.667, SD = 1.886) and 10-group (M = 2.100, SD = 0.738) settings.

### Structural Features

There was no main effect of gender, F(1,144) = 0.260, p = 0.611. There was a significant main effect of grade level, F(2,144) = 257.556, p < 0.001, η<sup>2</sup> = 0.782, with K3 (M = 10.37, SD = 1.94) scoring significantly higher than K2 (M = 7.49, SD = 1.68), p < 0.001, and K2 scoring significantly higher than K1 (M = 3.10, SD = 1.46), p < 0.001. There was no main effect of types of model, F(1,144) = 0.844, p = 0.360. However, there was a significant main effect of social settings, F(2,144) = 3.165, p = 0.045, η<sup>2</sup> = 0.042, with the best performance in the 5-member group, followed by the individual setting (M = 7.18, SD = 3.36) and then the 10-group setting (M = 6.56, SD = 3.34).

However, these main effects were subsumed under interaction effects. First, there was a two-way interaction between social settings and types of model, F(2,144) = 4.903, p = 0.009, η <sup>2</sup> = 0.064. Analysis of the simple effects showed that, when given a wooden model, the structural features score in the individual setting (M = 7.28, SD = 3.68) was higher than in both the 5-group (M = 6.87, SD = 4.13) and 10-group (M = 7.26, SD = 3.25) settings. When given a picture as a model, the structural features score in the 5-group setting (M = 7.63, SD = 3.80) was higher than that of both the individual (M = 7.09, SD = 3.04) and 10-group (M = 5.87, SD = 3.33) settings.

### DISCUSSION

The present study examined the impact of individual differences, type of model and social setting on three measures of block building performance (i.e., block building skills, structural balance and structural features). Performance varied with gender, grade level, and social setting, but not with the type of model used.

### Individual Differences

fpsyg-09-00027 January 27, 2018 Time: 14:31 # 6

#### Gender Differences

Boys performed better than girls in block building skills and structural balance. This is consistent with studies showing that boys are significantly more likely than girls to engage in block building activities (Rubin, 1977) and choose to play in the block area more often than girls (Snow et al., 2016). It contradicts other research suggesting a lack of gender differences (Moyer and Gilmer, 1956; Hanline et al., 2001). For instance, recent research showed that boys did not outperform girls on a measure of structural complexity, although girls tended to build structures that included more symbolic features (Ramani et al., 2014). It should be noted that, unlike previous studies, we used multiple measures to assess block building, making it possible to detect gender differences in specific skills. Block building skills and structural balance have been reported to be associated with spatial development (e.g., Cohen and Emmons, 2017) and mathematics skills (e.g., Casey et al., 2012), and gender differences in these areas would be consistent with research showing that boys outperform girls in logical thinking and abstract awareness (Fennema et al., 1998). However, we found no significant gender differences in the remaining measure of block building, namely structural features. Structural features are closely related to preschoolers' spatial imagination, which might not be assessed well by the measures of block building used in this study. The fact that children in the current study were asked to copy a model rather than engage in free play might also have limited the chance to detect gender differences in these specific skills.

#### Grade Level Difference

Block building scores differed significantly across years in preschool. We found three-dimensional constructions in K2 and K3, whereas K1 children produced linear or two-dimensional constructions, consistent with other research (e.g., Reifel, 1984; Casey et al., 2008; Cohen and Emmons, 2017) showing developmental trends in the dimensionality of young children's block building. Combining blocks in only one dimension appears to be the most common form of block play before 2 years of age. Between the ages of 2 and 3, children begin to build in two dimensions. Between 3 and 4, they gradually begin to build in three dimensions. It is not until 4 to 5 years of age that children build multicomponent constructions and show considerable flexibility in block building. Thus, there is general agreement that changes in spatial dimensionality emerge in an organized fashion and increase with age.

One factor that appears to influence the increase in block building skills is that older children spend more time with blocks than younger children (Clark et al., 1969), and the amount of time involved in block play has a positive effect on the complexity of block constructions (Halford et al., 1998; Hanline et al., 2001), including more spatial dimensions (Stiles-Davis, 1988; Stiles and Stern, 2001). Peer and teacher interactions in the block area also appear to promote block building performance (Trawick-Smith et al., 2016), and systematic teaching of block building skills accounts in part for block structure complexity (Casey et al., 2008).

### Types of Model

Vygotsky (1967, 1978) argued that play may be children's chief means of developing and understanding symbols. Thus, block play may be a way for preschoolers to map the "signified" onto the "signifier." In the present study, we presented children with two types of model, namely a wooden model and a picture, and asked them to make a replica. Interestingly, we found that the wooden model elicited more representational play than the picture, but children's responses to the two types of representation did not differ in block building skills, structural balance, or structural features.

Representational play refers to block constructions that represent preschoolers' detailed real-world experience, which requires imagination and places cognitive demands on the child (e.g., Norman and Bobrow, 1975; Duncan, 1980). Preschoolers' processing difficulty is closely related to the detail and precision of the symbolic representation: the more detailed the representation, the easier it may be for preschoolers to process, which in turn affects their block building performance. This perspective is consistent with Piaget's model of cognitive development (Piaget, 1962a,b; Lourenço, 2016), in which preschoolers, typically in the preoperational stage, begin to engage in symbolic play and learn to manipulate symbols, but do not yet understand concrete logic. Thus, creating a three-dimensional structure based on a three-dimensional model (the model made of blocks) would be easier than creating one based on a two-dimensional model (the picture). When children were presented with a three-dimensional model, they showed better representational play, but the dimensionality of the model did not affect other aspects of block building that might consume fewer cognitive resources.

### Social Settings

In the current study, we measured block building performance in three social settings: building alone, in a group of 5 children, and in a group of 10 children. We found that block building performance was stronger when working in a small group than when working alone or in a large group. This is consistent with early research showing that playing in pairs or small clusters elicited more intimate social interactions than were seen in larger clusters (Kinsman and Berk, 1979). The possibility that children would show better block building performance in smaller rather than larger groups is consistent with the "population interference effect"; that is, when members of a population are engaged in a cognitively demanding task, mutual peer influence interferes with the efficiency of each member's performance.

In the current study, the "population interference effect" was seen in the quality of structural features rather than in the quality of the basic structure. Structural features refer to the degree of resemblance between the children's construction and the model or picture they consulted. Presumably, children in the 5-member groups encountered less mutual interference from peers than children in the 10-member group, allowing fuller expression of skills related to structural details. However, children in groups of 5 and in groups of 10 showed similar block building performance in terms of the basic structure of the Yueyang Tower, and both groups performed better than children working alone. Compared with other aspects of block building, creating a basic structure does not make as many cognitive demands, so this skill may be less affected by interference from peers.

#### Limitations and Future Research

The present study has several limitations that should be addressed in future research. First, the effects of symbolic representation might have been confounded by task difficulty: both types of model, the wooden model and the picture, were presented in the visual modality, which may have been too easy for the participants to process. Would the effects of symbolic representation be more prominent with tasks in different modalities, e.g., verbal vs. visual? Future research could take a multimodal approach, comparing the verbal modality (e.g., the naming task; Cohen and Uhry, 2011) with the visual modality (e.g., model or picture). Second, the sample was drawn from a single Chinese kindergarten, which may limit the generalizability of our results. Future studies should use diverse samples from different areas of China (e.g., Hornung et al., 2017; Lo et al., 2017) and from other countries. Third, we used a cross-sectional design; longitudinal data will be important for capturing developmental change in future studies.

### CONCLUSION

To the best of our knowledge, no previous study has used multiple measures to examine factors influencing block building among Chinese preschoolers. The present study makes the following contributions to the block building literature. First, it used multiple measures of block building to identify a range of specific skills. Second, it clarified the role of individual differences (gender, year in preschool) and methodology (type of model for children to copy, number of children working together) in predicting block building performance among Chinese preschoolers. Third, it provides a reference point for both future scale development and future research.

### AUTHOR CONTRIBUTIONS

MT, RL, and ZM were responsible for research design, draft writing, and editing. ZZ, WQ, and ZD were responsible for draft editing. RW, TY, and MJ were responsible for participant recruitment.

### FUNDING

This research was supported by a China Postdoctoral Science Foundation funded project (grant number 2017M622395) and the National Social Science Foundation of China (grant number 14BYY060).

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2018.00027/full#supplementary-material

### REFERENCES

Action and Thought: From Sensorimotor Schemes to Symbolic Operations, ed. G. Forman (New York, NY: Academic Press), 167–201.

with the development of abstract thinking. Early Child Dev. Care 187, 990–1003. doi: 10.1080/03004430.2016.1234466


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Tian, Deng, Meng, Li, Zhang, Qi, Wang, Yin and Ji. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.


## Toward a Neuroscientific Understanding of Play: A Dimensional Coding Framework for Analyzing Infant–Adult Play Patterns

Dave Neale<sup>1,2</sup>, Kaili Clackson<sup>1</sup>, Stanimira Georgieva<sup>1</sup>, Hatice Dedetas<sup>1</sup>, Melissa Scarpate<sup>1</sup>, Sam Wass<sup>3</sup> and Victoria Leong<sup>1,4</sup>\*

<sup>1</sup> Department of Psychology, University of Cambridge, Cambridge, United Kingdom, <sup>2</sup> School of Education, University of Delaware, Newark, DE, United States, <sup>3</sup> Division of Psychology, University of East London, London, United Kingdom, <sup>4</sup> Division of Psychology, Nanyang Technological University, Singapore, Singapore

Play during early life is a ubiquitous activity, and an individual's propensity for play is positively related to cognitive development and emotional well-being. Play behavior (which may be solitary or shared with a social partner) is diverse and multifaceted. A challenge for current research is to converge on a common definition and measurement system for play – whether examined at a behavioral, cognitive or neurological level. Combining these different approaches in a multimodal analysis could yield significant advances in understanding the neurocognitive mechanisms of play, and provide the basis for developing biologically grounded play models. However, there is currently no integrated framework for conducting a multimodal analysis of play that spans brain, cognition and behavior. The proposed coding framework uses grounded and observable behaviors along three dimensions (sensorimotor, cognitive and socioemotional) to compute inferences about playful behavior in a social context, and related social interactional states. Here, we illustrate the sensitivity and utility of the proposed coding framework using two contrasting dyadic corpora (N = 5) of mother–infant object-oriented interactions during experimental conditions that were either non-conducive (Condition 1) or conducive (Condition 2) to the emergence of playful behavior. We find that the framework accurately identifies the modal form of social interaction as being either non-playful (Condition 1) or playful (Condition 2), and further provides useful insights about differences in the quality of social interaction and temporal synchronicity within the dyad. It is intended that this fine-grained coding of play behavior will be easily assimilated with, and inform, future analysis of neural data that is also collected during adult–infant play.
In conclusion, we present a novel framework for analyzing the continuous time-evolution of adult–infant play patterns, underpinned by biologically informed state coding along sensorimotor, cognitive and socio-emotional dimensions. We expect that the proposed framework will have wide utility amongst researchers wishing to employ an integrated, multimodal approach to the study of play, and lead toward a greater understanding of the neuroscientific basis of play. It may also yield insights into a new biologically grounded taxonomy of play interactions.

Keywords: play, mother–infant interaction, neuroscience, coding, social interactions

#### Edited by:

Qiang Shen, Aberystwyth University, United Kingdom

#### Reviewed by:

Sarah Jessen, University of Lübeck, Germany

Ora Oudgenoeg-Paz, Utrecht University, Netherlands

> \*Correspondence: Victoria Leong vvec2@cam.ac.uk

#### Specialty section:

This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology

Received: 13 October 2017 Accepted: 19 February 2018 Published: 21 March 2018

#### Citation:

Neale D, Clackson K, Georgieva S, Dedetas H, Scarpate M, Wass S and Leong V (2018) Toward a Neuroscientific Understanding of Play: A Dimensional Coding Framework for Analyzing Infant–Adult Play Patterns. Front. Psychol. 9:273. doi: 10.3389/fpsyg.2018.00273

## INTRODUCTION

fpsyg-09-00273 March 19, 2018 Time: 17:23 # 2

### Challenges to a Neuroscientific Understanding of Play

Play during early life is a ubiquitous activity. Engaging in play is positively associated with the development of social skills, cognitive skills, language and emotional well-being (Lyytinen et al., 1999; Pellegrini et al., 2002; St George et al., 2016; Thibodeau et al., 2016; Fung and Cheng, 2017). Current conceptualizations of play in the behavioral sciences view it in broad terms as behavior that is voluntary, engaging, nonfunctional, and associated with the expression of positive affect (Burghardt, 2005; Lillard et al., 2013; Miller, 2017). Play can also be categorized by its focus, that is, by what the individual is playing with. For example, physical play is play with one's own body and other people, such as climbing, sliding and chasing (Power, 1999; Pellegrini et al., 2002; St George et al., 2016); sociodramatic or pretend play is play with a make-believe world, in which the focus of play goes beyond a concrete, observable entity (Lillard et al., 2013); games with rules involve playing with a set of rules that participants agree to abide by in order to partake in the play experience, for example, board games or playground games such as tag (Hassinger-Das et al., 2017); and object play involves playing with physical objects (Power, 1999; Pellegrini and Gustafson, 2005). Object play can be further subdivided depending on the activity conducted with the object. For example, objects may be used in relational play, where multiple objects are combined or joined together, and in object-pretense, where the object is used to represent something else (Belsky and Most, 1981).

These diverse categorizations and definitions show that play in humans is diverse, multi-faceted, and defined by a set of broad terms encompassing motivational, cognitive, social and emotional aspects of behavior and psychology. Consequently, a challenge for current research is to converge on a common definition and measurement system for play – whether examined at a behavioral, cognitive or neurological level. This is a timely challenge to address, as advanced brain imaging techniques now permit the concurrent capture of neural activity from adult–infant dyads during naturalistic social interactions, such as joint play (Wass and Leong, 2016; Leong et al., 2017). Central to this challenge is the fact that it is difficult to exert the level of experimental control and temporal precision required for investigation at the neurological level, while also retaining the freeform, diverse quality which many consider to be a defining feature of play.

### Development of Human Play Behavior

Play behavior changes substantially across the life-span (Power, 1999). These changes occur in line with one's developmental level (e.g., progression from solitary play to cooperative play) and with interactions with others in an effort to achieve developmental goals. During infancy, mothers engage in one-on-one play with their baby to model and promote skills necessary for their child's development, such as communication and language skills, cognitive capacities, autonomy, and other important skills required for social interaction and well-being (e.g., Valentino et al., 2011; Bernier et al., 2016). For example, Mermelshtine and Barnes (2016) found that mothers' responsiveness to their infants during play at 10 months of age positively predicted cognitive capacities and skills (e.g., problem solving, knowledge and memory) at 18 months. This effect remained after accounting for maternal education, home adversity and infant advanced object play. While dyadic interaction (e.g., mother–infant) is present throughout the first year of life, it is not until around the end of the first year that infants' ability to engage in triadic interaction (i.e., mother–infant–object) becomes consolidated (Bakeman and Adamson, 1984; de Barbaro et al., 2013a). This progression to triadic interaction, focused around an object, is considered important for many aspects of psychological development, including symbolic awareness and language (Tomasello, 1999; De Schuymer et al., 2011). As Rodríguez (2009) points out, objects are symbols of their uses within a culture (a cup, for example, can represent drinking), and an understanding of these object–use relations represents the early acquisition of cultural norms and the adoption of a fundamental symbolic system. Adults and infants communicate about objects and with objects.
Furthermore, there is substantial crossover between the literatures on object play and object exploration in infancy, and exploration is viewed as a fundamental part of early childhood play (Belsky and Most, 1981). The evidence suggests that object exploration in infancy plays a role in the development of problem-solving and attention (Caruso, 1993; Poon et al., 2012; Clearfield et al., 2014) and individual differences are observed between children from different socioeconomic backgrounds (Clearfield et al., 2014). Consequently, focusing our model around a physical object was deemed the best approach for studying behavioral and neural activity during parent–infant play.

In addition to the use of object play, the mother–infant context of play is equally important for the current study. The importance of mother–infant interactions for early development is well documented (e.g., Belsky and de Haan, 2011; Bernier et al., 2016; Mermelshtine and Barnes, 2016). However, the interactions measured are often parenting processes (e.g., parental support/affect, sensitivity, communication, responsiveness) that occur in a play context, and these studies do not include quantitative coding of the actual play interactions. While these studies do provide insights into how early development is influenced by maternal parenting processes, they cannot explain the specific role of mother–infant play. Therefore, an important next step is to examine whether and how mother–infant play affects early development, particularly neural development. To achieve this goal, a play coding scheme with a temporal resolution compatible with that of brain imaging measures needs to be developed.

### Insights Into the Neuroscience of Play From Animal Models

Due to the challenges of experimental control (e.g., standardization of participants' behavior and environment), neuroscience studies on play have primarily focused on animal models (in particular rats) and rough-and-tumble social play behavior (see reviews by Pellis and Pellis, 2009; Cooke and Shukla, 2011; Siviy and Panksepp, 2011; Vanderschuren et al., 2016). Rodent models have proven to be particularly useful because rats show predictable and stereotypical forms of play-related behavior (e.g., one animal 'pins' the other on its back, emission of ultrasonic vocalizations, etc.), which are readily quantifiable and amenable to experimental and pharmacological manipulation. Consequently, a relatively rich literature now exists on the neuroanatomical and neurochemical substrates of rough-and-tumble play behavior in rats, using (invasive) methods such as brain lesioning, intracranial administration of neuroactive compounds, and gene expression assays. Namely, the key neural circuits that are now known to work in concert to support rats' play fighting behavior are: (1) a cortical executive circuit (particularly the prefrontal cortex (PFC) and orbitofrontal cortex (OFC)) which mediates the developmental fine-tuning and complexity of play, such as the ability to coordinate with or modify movements in response to the social status of a play partner (Moore, 1985; Pellis et al., 1999; Pellis et al., 2006; Bell et al., 2009; Siviy and Panksepp, 2011); (2) a subcortical limbic circuit (amygdala, hypothalamus and striatum) which moderates the motivation for, and affective response to, play (Meaney et al., 1981; Wolterink et al., 2001; Daenen et al., 2002; Burgdorf et al., 2007), potentially via dopaminergic and opioid pathways (Vanderschuren et al., 2016); and (3) somatosensory circuits (somatosensory cortex, thalamus, cerebellum) which control motor play patterns and performance (Siviy and Panksepp, 1985, 1987a,b; Panksepp et al., 1994; Byers and Walker, 1995).

Animal studies have further shown that play induces neural plasticity in brain areas involved in sensorimotor processing (e.g., parietal cortex, colliculi and striatum, Gordon et al., 2002), and also in the medial prefrontal cortex (mPFC, Cheng et al., 2008), an area which sends strong modulatory inputs to limbic circuits that control social behavior. In humans, the mPFC inhibits aggression and monitors approach/avoidance behavior (Bufkin and Luttrell, 2005; Hall et al., 2010). Therefore, increased plasticity in the mPFC following play could indicate that play helps to improve control of social behavior networks. Rough-and-tumble play in rats also seems to promote brain development by increasing the expression of brain-derived neurotrophic factor (BDNF) in the amygdala and prefrontal cortex (Gordon et al., 2003), and that of insulin-like growth factor 1 (IGF-1) in the frontal and posterior cortices (Burgdorf et al., 2010). Accordingly, it has been suggested that play-induced neural plasticity could support the emergence of adult-like behaviors (Cooke and Shukla, 2011). Although caution must be applied in extrapolating findings from animal work to humans, the current data do suggest that across species, play may be a fundamental neurobehavioral process that is underpinned by (and produces changes in) major cortical and subcortical neural circuits that support cognition, emotion and sensorimotor function. However, the neuroscientific methods that have successfully been used with animal models are too invasive to be performed on human subjects. Further, even when ostensibly comparing "motor-based" play, human play behavior is far more complex and less stereotypical than animal play-fighting behavior, as described above.
Consequently, neuroscience research into play in humans tends to either assess neurological change using a pre-test, post-test design (Newman et al., 2016), or to study neural activity while the participant observes, but does not engage in, play behavior (Smith et al., 2013). Going beyond these empirical constraints to identify the neural mechanisms that underlie ongoing, complex play behavior in humans presents a considerable challenge. To address this challenge, non-invasive human neuroimaging (e.g., EEG, fMRI) and psychological behavioral coding approaches could be combined into a multimodal analysis that may yield advances in understanding the human neurocognitive mechanisms of play. However, there is currently no methodological framework that is suitable for conducting a multimodal analysis of play that spans brain, cognition and behavior.

### Limitations of Current Measures of Play for Neural Analyses

In order to combine neural and behavioral analyses of play, the behavior of interest must be identified with precise temporal resolution. In addition, the behavioral coding should be able to identify changes between various play and non-play states, to facilitate the intra- and inter-individual analysis of corresponding changes in neural activity. However, existing play coding schemes are predominantly based around global ratings, checklists, or frequency counts of play behaviors, and so do not capture temporal information about when specific play behaviors occur, or information about non-play behavior. Examples of global rating schemes include Poon et al. (2012), who rated parent–infant play sessions on a scale of 1–5 for joint attention, imitation and object play, and St George et al. (2016), who gave each parent a global score on 10 different dimensions, including sensitivity (how responsive the parent was to the child's signals), positive regard (demonstrations of love and affection), and stimulation of cognitive development (teaching). Checklist approaches include the Symbolic Play Test (Lowe and Costello, 1976), which captures behaviors that children display when playing with a specific set of toys, such as 'feeds doll' and 'moves truck or trailer about.' A similar checklist approach is found in many bespoke measures of play, such as that used by Pellegrini (1992), where children were observed in the playground and the behaviors displayed were recorded, including peer interaction and object play. In an analysis of infant play, Belsky and Most (1981) applied a checklist approach to time-sampled data, using a checklist to record the 'most competent' level of play observed in each 10-s period. The authors acknowledge that this approach obscures information about the frequency of play behaviors, as any 'lower level' play behaviors occurring in the same 10-s period cannot be captured by the coding scheme.
But information about 'high level' behaviors is also obscured, including their precise timing and frequency within each 10-s period. From a neuroscience perspective, knowing that one type of play occurred at some point within a 10-s window does not provide sufficient temporal precision for event-locked analyses to be conducted.
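The information loss described above can be made concrete with a small sketch. Assuming a hypothetical list of time-stamped play events, each with an ordinal competence level (the `PlayEvent` class and the level values are illustrative, not Belsky and Most's actual scale), summarizing each 10-s window by its highest level discards both the lower-level events and the exact onsets of the higher-level ones.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlayEvent:
    onset_s: float  # onset in seconds from the start of the session
    level: int      # hypothetical ordinal play level (higher = more competent)


def most_competent_per_window(events: List[PlayEvent], session_s: float,
                              window_s: float = 10.0) -> List[Optional[int]]:
    """Mimic a time-sampled checklist: keep only the highest level per window.

    Windows with no events are reported as None.
    """
    n_windows = math.ceil(session_s / window_s)
    summary: List[Optional[int]] = [None] * n_windows
    for ev in events:
        if not 0 <= ev.onset_s < session_s:
            raise ValueError("event onset falls outside the session")
        idx = int(ev.onset_s // window_s)
        if summary[idx] is None or ev.level > summary[idx]:
            summary[idx] = ev.level
    return summary


# Hypothetical events: five distinct onsets within a 20-s excerpt
events = [PlayEvent(0.4, 1), PlayEvent(2.1, 3), PlayEvent(7.8, 1),
          PlayEvent(8.9, 3), PlayEvent(14.2, 2)]
print(most_competent_per_window(events, session_s=20.0))
```

Here five events with distinct onsets collapse into two numbers, [3, 2]; nothing in the summary records that a level-3 event occurred at 2.1 s and again at 8.9 s, which is exactly the timing an event-locked neural analysis would need.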

A few studies have captured more precise temporal information in the context of mother/parent–infant play (e.g., Courage et al., 2010; James et al., 2012; Zuccarini et al., 2017). However, no play-specific coding schemes that we are aware of measure play behavior between the parent and infant at a time resolution that is fine-grained enough (i.e., tens of milliseconds) to be compatible with neural (e.g., EEG) analyses. Furthermore, these existing schemes are not designed to capture and analyze the temporal evolution of a range of behavioral states as a fluid continuum. For example, Zuccarini et al. (2017) coded 'motor object exploration' in infant play, where the infant explored an object with their hands or mouth, and Koterba et al. (2014) coded infant looking and mouthing during play with a rattle. While such schemes reflect our emphasis on the continuous fine-grained coding of play behavior, they do so by coding one or two specific actions and then analyzing how the duration or frequency of those actions vary between infants. Our coding scheme, by contrast, is designed to track the continuously evolving behavior of participants as they move through play, teaching/learning, joint attention, and other such states, and to facilitate analysis within, as well as between, participants. A multimodal coding system for mother–infant play, suitable for analyzing co-occurring patterns in real-time behavioral and neurological data, requires the flexibility to capture a wide variety of behavioral states, combined with a very high degree of temporal precision, and such a combination does not exist in established play coding schemes.
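One way to obtain a fine-grained, continuous representation of this kind is to store coded behaviors as onset/offset intervals and then resample them into a binary series with one value per fixed time step. The sketch below is an illustration under stated assumptions: the 20 ms frame size and the `intervals_to_frames` name are hypothetical choices, and times are kept in integer milliseconds to avoid floating-point drift when computing frame times.

```python
from typing import List, Sequence, Tuple


def intervals_to_frames(intervals_ms: Sequence[Tuple[int, int]], session_ms: int,
                        frame_ms: int = 20) -> List[int]:
    """Resample coded (onset_ms, offset_ms) intervals into a 0/1 frame series.

    A frame is 1 when its start time t satisfies onset <= t < offset for any
    interval (half-open intervals), otherwise 0.
    """
    if frame_ms <= 0:
        raise ValueError("frame_ms must be positive")
    n_frames = session_ms // frame_ms
    frames = []
    for i in range(n_frames):
        t = i * frame_ms  # start time of frame i, in milliseconds
        frames.append(int(any(on <= t < off for on, off in intervals_ms)))
    return frames


# Hypothetical "infant looks at toy" intervals within a 200 ms excerpt
looking = [(0, 60), (100, 140)]
print(intervals_to_frames(looking, session_ms=200))
```

Because every frame has a known time stamp, a series like this could in principle be aligned sample by sample with a neural recording downsampled to the same rate.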

Second (as indicated by animal studies), play is a highly complex social interactive activity that activates a combination of sensorimotor, cognitive and socio-emotional neural circuits, each of which may support separable dimensions of behavior. Importantly, no single behavioral dimension by itself is sufficient to define play, since behavior in each dimension can occur in both playful and non-playful situations. Rather, it is the co-occurrence of activity along multiple dimensions that defines a playful episode. Here, we contribute to the formation of a neuroscientific understanding of play by presenting a model and methodological framework that captures behavior at a high temporal resolution and as a continuously evolving multidimensional state, rather than as a set of discrete actions or as a global summary of type or quality. In this way, behavioral coding is well matched to the high temporal resolution of EEG data, maximizing the acuity with which brain-behavior correlates can be explored.

### Overview and Considerations of the Play Coding Methodological Framework

As described in the previous section, current neuroscientific research suggests that play behavior is underpinned by three major neural circuits that control motivation and affect (i.e., limbic structures), motor performance (i.e., somatosensory structures), and higher-order executive function (i.e., frontal cortical structures) respectively. Following from this, the proposed coding framework captures object-oriented play behavior along three corresponding dimensions: socioemotional (SE), sensorimotor (SM), and cognitive (C). Infants' or adults' behavior is coded according to the presence or absence [1/0] of play-congruent activity in each dimension. The intention of the coding scheme is to reliably capture common forms of playful behavior whilst retaining clarity of coding for each dimension (grounding the scheme in clear, observable, behaviors). With this in mind, play-congruent activity was defined for each dimension as follows:

Play-congruent activity in the SE dimension occurs when there is a display of positive or neutral affect, consistent with the idea that play leads to an internal sense of reward (Burghardt, 2005; Miller, 2017). Play-congruent activity in the SM dimension occurs when the partner (mother or infant) is voluntarily manipulating and/or touching the object in an exploratory manner. This criterion reflects the central place of self-directed, voluntary behavior in definitions of play (Burghardt, 2005; Lillard et al., 2013; Miller, 2017; Sawyer, 2017). Finally, the C dimension captures the presence of attentional engagement, as well as the level of complexity of this cognitive engagement. Therefore, our analysis is intended to explore 'minds-on' play, rather than 'minds-off' play. By 'minds-on play,' we mean play where cognition and attention are engaged through, for example, observation of object/partner behavior, exploration of the object, or communication. 'Minds-off play,' by contrast, refers to play behavior where cognition and attention disengage from the play object and partner, and the play goal is more sensory in nature, for example, chewing a toy or hitting it on the table while looking elsewhere. According to our framework, a play-congruent state is one in which the infant (or adult) concurrently exhibits play-congruent activity across all three dimensions (i.e., [1 1 1]).
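The state rule above lends itself to a direct encoding. The minimal sketch below assumes each dimension has already been coded as 0 or 1 for a given moment; the names `DimensionCodes`, `state_label`, and `is_play_congruent` are hypothetical, and the graded complexity of cognitive engagement mentioned above is deliberately left out.

```python
from typing import NamedTuple


class DimensionCodes(NamedTuple):
    se: int  # socioemotional: 1 = positive or neutral affect
    sm: int  # sensorimotor: 1 = voluntary, exploratory touch/manipulation of the object
    c: int   # cognitive: 1 = attentional engagement (complexity level not modeled here)


def state_label(codes: DimensionCodes) -> str:
    """Format the codes in the bracket notation used in the text, e.g. '[1 1 1]'."""
    return "[" + " ".join(str(v) for v in codes) + "]"


def is_play_congruent(codes: DimensionCodes) -> bool:
    """A play-congruent state requires play-congruent activity on all three dimensions."""
    if any(v not in (0, 1) for v in codes):
        raise ValueError("each dimension must be coded 0 or 1")
    return all(v == 1 for v in codes)


moments = [DimensionCodes(1, 1, 1), DimensionCodes(1, 0, 1), DimensionCodes(0, 1, 1)]
for m in moments:
    print(state_label(m), is_play_congruent(m))
```

Only the first moment, coded [1 1 1], counts as play-congruent; a moment with positive affect and attention but no voluntary object manipulation, [1 0 1], does not.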

An important feature of this framework is that it does not assume any one definition of play. Instead, we have grounded the framework in specific observable behaviors that are considered important factors across different conceptualizations of play – namely, the display of affect and voluntary physical and cognitive engagement with the object of play (Lillard et al., 2013; Miller, 2017). By analyzing the co-occurrence patterns of these basic behaviors, our framework can be used to assess similarities and potential groupings of different play-related social states (and, eventually, their neural substrates), which may in future lead to a definition of play behavior that is grounded in neuroscience. Another strength of the proposed framework is that it reduces the burden of subjective judgment about whether or not playful activity is occurring. Rather, objective and observable behaviors are coded (e.g., touching a toy, looking at a toy, smiling, etc.), and the presence or absence of play (and other related social states) is inferred from temporally co-occurring patterns of behavior. Grounding coding in specific observable behaviors tends to result in higher levels of inter-rater agreement (Bakeman and Gottman, 1997).

### Aims and Predictions

The goal of the current study is to develop a new methodological framework for coding infants' and adults' playful behavior that would be compatible, in future, with EEG analysis. As mentioned previously, current research on mother–infant interactions and infant development often measures the parenting processes that occur within a play context rather than the play itself, and current mother–infant play coding schemes typically lack the temporal precision to be integrated with neural measures. Therefore, we illustrate the application of our proposed dimensional play coding framework using examples from two contrasting dyadic corpora of mother–infant object-oriented interactions during experimental conditions that were either non-conducive (Condition 1) or conducive (Condition 2) to eliciting playful behavior. In Condition 1, playful behavior was discouraged by asking mothers to focus on teaching infants about the social value (desirable or non-desirable) of the objects. In Condition 2, playful behavior was encouraged by asking mothers to use the objects in spontaneous, fun and natural interactions with their child. These corpora comprise both behavioral and electroencephalography (EEG) measurements that were collected concurrently from mothers and their infants. However, for this study, we focus on behavioral analyses. It is intended that the coding of play behavior under the proposed methodological framework will be easily assimilated with, and inform, future analysis of the neural data that were also collected during adult–infant play. We have two specific sets of predictions regarding the behavioral differences between conditions that should emerge following application of the coding framework:

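	- (1) In Condition 2, infants' modal (most frequent) social state will be the play-congruent state [1 1 1]
	- (2) Relative to Condition 1, Condition 2 will show:
		- (a) Decreased negative affect
		- (b) Increased sensorimotor engagement
		- (c) Equivalent cognitive (attentional) engagement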

The first prediction pertains to the sensitivity of the coding framework in detecting play-related behavior. Simply put, if mothers were instructed to play with their infants, then (although coders do not make direct judgments about whether participants were playing or not) we expect the coding framework to reveal that a play-congruent state is indeed the most frequent social state that infants display. The second set of predictions pertains to the utility of the framework in identifying differences in the quality of social interaction and temporal synchronicity within the dyad.

### MATERIALS AND METHODS

#### Participants

Five mother–infant dyads participated in the study (3M, 2F infants). Infants were aged 326.6 days (10.7 months) on average [range = 292–377 days (9.6–12.4 m), SD = 31.5 days (1.0 m)]. All mothers reported no neurological problems and normal hearing and vision for themselves and their infants. Although this sample size appears small, note that the aim of the present study was to assess how well the coding scheme captures intra-dyadic variation (variation in dyadic states over time and between conditions) rather than inter-dyadic variation (variation between dyads). Further, each dyad participated in two different conditions and generated over 20 min of data, which is substantial for infancy studies.

### Materials

For Condition 1 (non-conducive to play), 4 pairs of ambiguous novel objects were used. Within each pair, objects were matched to be globally similar in size and texture, but different in color. Ambiguous novel objects were chosen to ensure that infants would not have their own previous (playful) experience with these objects, and would rely on their mothers' instruction to guide their interactions with the objects.

For Condition 2 (conducive to play), a set of 8 different small toys was used. These were appropriate for the infants' age and included toys of differing shapes, textures and colors to encourage infants' interest in playing with them.

### Tasks

Each mother-infant dyad took part in experimental Conditions 1 and 2 in a counterbalanced order. In each condition, mothers and infants interacted with objects together, with the major difference being whether the nature of social interaction between mother and infant was conducive to eliciting playful behavior (as determined by the task instructions provided to the mother). In both tasks, the infant sat in a high chair, with the adult facing him/her across a table. The distance between the infant and adult was the same in each task, and each task lasted approximately 10 min.

#### Condition 1 (Not Play-Conducive)

In this condition, mothers were asked to teach their infants about the social value of pairs of ambiguous novel objects. For each pair of objects, mothers were instructed to describe one object with positive affect ("This is great, we really like this one!") and the other object with negative affect ("This is bad, we don't like this one"), as shown in **Figure 1**. Mothers were asked to limit their verbal descriptions to four simple formulaic sentences per object (which they repeated for each pair of objects), and to model positive or negative emotions in a prescribed manner (e.g., smiling versus frowning). The order of object presentation (positive or negative) was counterbalanced across trials. After observing their mothers' teaching about both objects, infants were then allowed to interact briefly with the objects themselves before the objects were retrieved. During the session, an experimenter was present to ensure that participants were interacting as instructed. She provided new pairs of objects as required, but explicitly avoided making prolonged social contact with either participant.

FIGURE 1 | Illustration of experimental setup for Condition 1. (Left) Negative object demonstration by adult; (Middle) positive object demonstration by adult; (Right) infants' interaction with objects. Written informed consent was obtained for the publication of this image.

…although the actors here are not wearing EEG caps, EEG signals were also collected during this condition. Written informed consent was obtained for the publication of this image.

#### Condition 2 (Play-Conducive)

In this condition, mothers were asked to play with their infant using a set of attractive toys (see **Figure 2**). Mothers were instructed to use the toy objects in a spontaneous, fun and natural way, to actively engage the infant's attention, but to play quietly whilst avoiding large physical motions (in order to minimize EEG motion artifacts). During the session, an experimenter was present to ensure that participants were playing as instructed. She provided new toys as required (approximately every 2 min, or more frequently if the child threw the object to the floor) to sustain their attention and interest. The experimenter avoided making prolonged social contact with either participant.

#### Video Recordings

To record the actions of the participants, two Logitech High Definition Professional Web-cameras (30 frames per second) were used, directed at the adult and infant respectively. Afterward, each video recording was manually coded for the timing of the behaviors of interest, using the coding scheme outlined in the Section "A Dimensional Framework for Analyzing Adult–Infant Play Patterns." EEG data were also collected concurrently from mothers and infants during social interactions, but these data are not reported here, as the primary focus of the current study is to develop a framework for assessing play behavior.

### A Dimensional Framework for Analyzing Adult–Infant Play Patterns

#### Coding Scheme

The intention of the coding scheme is to reliably capture common forms of playful behavior whilst retaining clarity of coding (grounding the scheme in clear, observable behaviors). Accordingly, the minimization of false positives (non-play states coded as play-congruent) was prioritized over the minimization of false negatives (play states coded as play-incongruent). This was aided by our observations that false negatives in infants' play tended to be rare and short in duration (e.g., throwing an object, or directing focused attention to something other than the play partner or play object). With this in mind, the coding scheme captures object-oriented play behavior in three dimensions: the socioemotional dimension (SE), the cognitive dimension (C), and the sensorimotor dimension (SM). On each dimension, simple, observable behavior at each timepoint (here, at the temporal resolution of 33 ms, corresponding to 30 frames per second) is coded using a [1/0] main code which indicates the presence or absence of play-congruent activity along the target dimension. Additionally, and where relevant, a further sub-code [1/0.x] may be assigned to indicate the level/type of activity that is occurring. These sub-codes (although not the focus of the current analysis) permit the capture and differentiation of more complex patterns of behavior in each dimension. The term 'play-congruent' is used to refer to behaviors and states in each dimension during which play might be occurring, and where the individual might be in a playful mental frame. In each dimension, the presence of play-congruent behavior is allocated a code of 1 and the absence of play-congruent behavior is allocated a code of 0. When play-congruent behavior is concurrently observed across all three dimensions (i.e., [1 1 1]), the resulting state is termed a 'play-congruent state.' The coding scheme is summarized in **Table 1**, and described further in the following text.
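The per-frame main coding described above can be sketched in Python. This is a minimal illustration, not the authors' pipeline: it assumes (hypothetically) that the annotations for one dimension are stored as (onset, offset) intervals in seconds during which the main code is 1, and converts them into one 0/1 value per video frame at 30 fps. The helper names `intervals_to_frames` and `play_congruent` are illustrative.

```python
import numpy as np

FPS = 30  # video frame rate; one timepoint is roughly 33 ms

def intervals_to_frames(intervals, duration_s, fps=FPS):
    """Convert (onset_s, offset_s) intervals coded as 1 into a per-frame 0/1 vector.

    Hypothetical input format: intervals in seconds during which play-congruent
    activity was observed on one dimension. All other frames are coded 0.
    Boundaries are rounded to the nearest frame; offsets are exclusive.
    """
    n_frames = int(round(duration_s * fps))
    codes = np.zeros(n_frames, dtype=np.int8)
    for onset, offset in intervals:
        start = max(0, int(round(onset * fps)))
        stop = min(n_frames, int(round(offset * fps)))
        codes[start:stop] = 1
    return codes

def play_congruent(se, sm, c):
    """Boolean vector: True where SE, SM and C are all coded 1, i.e. state [1 1 1]."""
    return (se == 1) & (sm == 1) & (c == 1)

# Usage: a hypothetical 2-second excerpt (60 frames)
se = intervals_to_frames([(0.0, 2.0)], duration_s=2.0)  # positive/neutral affect throughout
sm = intervals_to_frames([(0.5, 1.5)], duration_s=2.0)  # touching the toy from 0.5 s to 1.5 s
c = intervals_to_frames([(0.0, 1.0)], duration_s=2.0)   # looking at toy/partner for the first second
state = play_congruent(se, sm, c)
print(state.sum() / FPS)  # seconds spent in [1 1 1]: 0.5
```

Working at frame resolution keeps the three dimensions aligned on a common time base, so the play-congruent state is just an element-wise AND of the three code vectors.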

#### **Socioemotional (SE)**

The presence of positive affect and the idea that play is done for its own sake, leading to an internal sense of reward rather than any form of external reward, are both central to most conceptualizations of play behavior (Burghardt, 2005; Miller, 2017). However, whilst negative affect is considered antithetical to the presence of a mental 'play-state,' a neutral display of affect could also be present during play (Miller, 2017). Therefore, the expression of positive or neutral affect was taken as congruent with a play-state in the socioemotional dimension.

#### **Sensorimotor (SM)**

The sensorimotor dimension captures whether or not there is voluntary physical contact with the object that is free from external constraint. It is not possible to engage in object play without physical contact with the object, so this dimension encodes a necessary condition for one of the main play behaviors of interest during infancy. Furthermore, the fact that only voluntary contact is coded reflects the central place of self-directed, voluntary behavior in definitions of play (Burghardt, 2005; Lillard et al., 2013; Miller, 2017; Sawyer, 2017). Therefore, voluntary physical contact with the object was deemed as congruent with a play-state in the sensorimotor dimension. The primary limitation of this criterion is that it will not capture play behavior with no physical contact with the object – for example, when throwing or dropping objects. However, in infancy, play behavior without physical contact is rare: even if an infant drops or throws a toy, they tend to either pick it up again soon after, or cease playing with it. It is only later in life that forms of play appear which can involve sustained lack of contact, e.g., sociodramatic play and games with rules. Consequently, confining positive coding in the SM dimension to physical contact may be conservative but only rarely erroneous.

#### **Cognitive (C)**

The cognitive dimension captures the level of cognitive complexity and engagement, by coding whether or not there is visual attention on the object and/or play partner and what kind of behavior is occurring in relation to the object. While playful behavior may occur without active attention on the object or play partner (for example, an infant swinging a toy around while not looking at anything in particular, or with eyes closed), we decided that, as we were interested in play's effects at the neural level, even our broadest criteria for a play-congruent state should include some level of cognitive engagement. In other words, our model is intended to explore 'minds-on' play, rather than 'minds-off' play that is purely physical or sensory in nature. During infancy, looking behavior is closely related to visual attention, and is frequently used as an index of early emerging cognitive function and development (Colombo, 2001). Therefore, visual attention on either the object or the play partner (as determined by participants' looking behavior) was deemed as congruent with a play-state in the cognitive dimension.

#### TABLE 1 | Coding scheme.

Please see text for a detailed description and explanation.

In the cognitive dimension, five sub-codes were also developed to delineate whether the individual is also engaged in exploratory behavior (object-general or object-specific), pretense or acting, or rule-based behavior. The distinction between object-general and object-specific exploration is intended to capture two different levels of cognitive engagement which may relate to observable differences in neural activity. Object-general exploration is any kind of activity with the object that does not involve appreciation of the object's particular properties, i.e., the action could be done with almost any object. The main examples of object-general exploration include shaking, banging, or mouthing the object, and these behaviors are often done in a repetitive or circular fashion. Object-general exploration may provide sensory stimulation and coarse, 'global' information, such as object weight or texture, but seems unlikely to lead to specific conceptual information about an object's functions and uses. Object-specific exploration, by contrast, involves an appreciation of that object's unique properties – for example, spinning the blades of a toy helicopter, or pulling on parts of the object to see if they can be removed. It is this kind of exploration that seems most likely to involve the processing of more complex information, and lead to more advanced conceptual learning about an object's functions and uses. This distinction between object-general and object-specific behavior is parallel to the distinction made by Belsky and Most (1981), between mouthing or simple manipulation and 'functional play' which is appropriate for the specific object. Belsky and Most (1981) found object-specific play to be more developmentally advanced than more general, non-specific object play, supporting our decision to encode object-specific play as a more cognitively engaged form of play in our coding scheme.
**Figure 3** shows an example of the resulting codes for each separate dimension over time during a social interaction episode for an infant.

#### Coders and Reliability

Coding of all videos was performed by one trained coder. To establish inter-rater reliability, approximately 20% (48 min) of the infant and adult video data was coded by a second coder, who was trained independently from the first coder. Percentage agreement was over 90% on all 3 dimensions for the infant (SE = 91%, SM = 98%, C = 95%) as well as for the adult (SE = 97%, SM = 94%, C = 93%), indicating a high level of agreement.
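Frame-wise percentage agreement, the statistic reported above, is the share of timepoints on which two coders assigned the same main code. The following is a minimal sketch, assuming both coders' main codes for one dimension are aligned frame by frame; the function name and the toy data are illustrative.

```python
import numpy as np

def percent_agreement(coder_a, coder_b):
    """Frame-wise percentage agreement between two coders' 0/1 main codes."""
    a = np.asarray(coder_a)
    b = np.asarray(coder_b)
    if a.shape != b.shape:
        raise ValueError("both coders must code the same frames")
    return 100.0 * float(np.mean(a == b))

# Toy example: 10 frames, the coders disagree on a single frame.
coder_a = [1, 1, 1, 0, 0, 1, 1, 1, 1, 1]
coder_b = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1]
print(percent_agreement(coder_a, coder_b))  # 90.0
```

Applied separately to the SE, SM and C code vectors for the double-coded subset, this yields one agreement figure per dimension and participant.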

## Analysis of Social States

#### Mean Dimensional Scores

For each dimension (socioemotional [SE], sensorimotor [SM], cognitive [C]), separate SE, SM and C mean dimensional scores can be computed by taking the average over all timepoints in the session. Each mean score ranges between 0 (if all timepoints were coded as 0) and 1 (if all timepoints were coded as 1). Accordingly, if infants generally displayed more positive/neutral affect than negative affect, their SE mean score would be greater than 0.5 (i.e., a higher proportion of 1s than 0s overall). Similarly, the mean SM score indicates the proportion of time during which the infant has active "hands-on" possession of the toy (e.g., mean SM score of 0.7 = infant has active possession of the toy 70% of the time). Finally, the mean C score indicates the relative attentiveness of the infant during the session (e.g., mean C score of 0.6 = infant is attentive toward the object or partner 60% of the time). It is important to emphasize that a high score on a single dimension (or indeed across several dimensions) does not in itself indicate that the infant is highly engaged in play, since these mean dimensional scores are computed separately for each dimension and therefore provide no information about temporal co-occurrence across dimensions (i.e., social state). For example, it would, in theory, be possible for the infant to show positive affect only when inattentive/not touching the toy and yet still produce a high mean score on the SE dimension.
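The point that mean dimensional scores ignore co-occurrence can be made concrete with a small numerical example. The session below is hypothetical: affect is positive or neutral at 80% of timepoints, but only while the infant is neither touching nor attending to the toy, so the SE mean is high even though the play-congruent state [1 1 1] never occurs.

```python
import numpy as np

def mean_dimensional_scores(se, sm, c):
    """Proportion of timepoints coded 1 on each dimension (range 0 to 1)."""
    return {name: float(np.mean(x)) for name, x in (("SE", se), ("SM", sm), ("C", c))}

# Hypothetical 10-timepoint session: positive/neutral affect only while hands-off and inattentive.
se = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0])
sm = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
c = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

print(mean_dimensional_scores(se, sm, c))  # {'SE': 0.8, 'SM': 0.2, 'C': 0.2}

# Proportion of timepoints in the play-congruent state [1 1 1]
print(float(np.mean((se == 1) & (sm == 1) & (c == 1))))  # 0.0
```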

#### Social States (Per Timepoint)

At each timepoint, an infant's current social state can be defined by the temporal co-occurrence and valence of existing codes on each dimension, signifying affect (SE), touch (SM) and attention (C) respectively. As SE, SM and C codes could each take a value of 0 or 1, this allows a total of 8 distinct social states (i.e., 2<sup>3</sup>), as outlined in **Table 2** and **Figure 4**.

Further, as social states were computed for every timepoint in the session, this permits the tracking of infants' dynamic evolution between social states over time, as illustrated in **Figure 5A**, as well as the relative proportion of time that infants spend in each state (**Figure 5B**). Finally, using the state frequency histogram, it is possible to identify the modal social state for a given interaction session.
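The per-timepoint state coding, the state frequency histogram and the modal state can be sketched as follows, assuming per-frame 0/1 codes are already available for each dimension. States are represented as (SE, SM, C) tuples, so (1, 1, 1) is the play-congruent state; the function names and example data are hypothetical.

```python
from collections import Counter
from itertools import product

ALL_STATES = list(product((0, 1), repeat=3))  # the 2**3 = 8 possible (SE, SM, C) states

def social_states(se, sm, c):
    """Combine the three per-timepoint codes into one (SE, SM, C) state per timepoint."""
    return [(int(a), int(b), int(d)) for a, b, d in zip(se, sm, c)]

def state_proportions(states):
    """Proportion of timepoints spent in each of the 8 states (zero if never observed)."""
    counts = Counter(states)
    n = len(states)
    return {s: counts[s] / n for s in ALL_STATES}

def modal_state(states):
    """Most frequent state; ties go to the state encountered first."""
    return Counter(states).most_common(1)[0][0]

# Hypothetical 6-timepoint session
se = [1, 1, 1, 1, 0, 1]
sm = [0, 1, 1, 1, 0, 0]
c = [1, 1, 1, 1, 1, 1]
states = social_states(se, sm, c)
props = state_proportions(states)
print(modal_state(states))  # (1, 1, 1)
print(props[(1, 1, 1)])     # 0.5
```

The `states` list is also the time series from which state transitions can be plotted over the session.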

#### Adult–Infant Joint States (Behavioral State Synchrony)

The joint (i.e., concurrent) social state of adults and infants can also be assessed using this scheme. For example, during didactic teaching, only the cognitive dimension may be concurrently engaged in both partners, whilst their sensorimotor and socioemotional states may be discordant (e.g., the mother's state is [1 0 1] whilst the infant's state is [0 1 1]). Joint states can also be examined within a single dimension (e.g., affect) considered alone. Such joint state analysis could be useful to address research questions pertaining to parent-child synchrony (since if both parent and child display the same state at the same time, they are behaving synchronously), contingency and responsiveness.

Finally, this framework permits an empirical discrimination between similar/related social interactional states such as teaching versus play. Although the play-congruent state [1 1 1] is of greatest interest here, a total of 8 different individual states for infants and adults (and 64 joint adult-infant states) may be discriminated under the proposed framework, which may yield insights into a new biologically grounded taxonomy of play interactions.
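Joint states follow directly from pairing the two partners' per-timepoint states. A minimal sketch, assuming both state sequences are aligned on the same frames; the 64 joint states are all (mother, infant) pairs of the 8 individual states, and the function names and example data are hypothetical.

```python
from collections import Counter
from itertools import product

STATES = list(product((0, 1), repeat=3))       # 8 individual (SE, SM, C) states
JOINT_STATES = list(product(STATES, STATES))   # (mother, infant) pairs: 8 x 8 = 64

def joint_state_proportions(mother_states, infant_states):
    """Proportion of timepoints in each of the 64 joint states."""
    if len(mother_states) != len(infant_states):
        raise ValueError("state sequences must be aligned frame by frame")
    counts = Counter(zip(mother_states, infant_states))
    n = len(mother_states)
    return {j: counts[j] / n for j in JOINT_STATES}

def synchronous_play(mother_states, infant_states):
    """Proportion of timepoints where both partners are in [1 1 1]."""
    play = (1, 1, 1)
    both = sum(m == play and i == play for m, i in zip(mother_states, infant_states))
    return both / len(mother_states)

def joint_dimension(mother_states, infant_states, dim):
    """(mother, infant) codes on a single dimension: 0 = SE, 1 = SM, 2 = C."""
    return [(m[dim], i[dim]) for m, i in zip(mother_states, infant_states)]

# Hypothetical 4-timepoint excerpt; timepoint 3 is the teaching example [1 0 1] vs [0 1 1].
mother = [(1, 1, 1), (1, 1, 1), (1, 0, 1), (1, 1, 1)]
infant = [(1, 1, 1), (0, 1, 1), (0, 1, 1), (1, 1, 1)]
print(len(JOINT_STATES))                 # 64
print(synchronous_play(mother, infant))  # 0.5
```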

### RESULTS

Here, we report the results from the main codes assigned along each dimension (e.g., 1 or 0 – see **Table 1**). However, if desired, more fine-grained information about the quality of social interaction may be gleaned by examining participants' sub-codes, as detailed in the Supplementary Materials.

#### Mean Dimensional Scores

Infants' (N = 5) and mothers' (N = 5) mean dimensional scores obtained during Conditions 1 and 2 are shown in **Figure 6**. All scores were normally distributed (Kolmogorov–Smirnov test, p > 0.20 for all dimensions). The data were assessed statistically using a Repeated Measures ANOVA taking Dimension (3 levels, SE/SM/C) and Condition (2 levels) as within-subjects factors, and participant (infant or mother) as a between-subjects factor.

The ANOVA revealed a significant main effect of Condition [F(1,8) = 18.82, p < 0.01, η<sub>p</sub><sup>2</sup> = 0.70], where, across all dimensions, mean dimensional scores for Condition 2 (play-conducive) exceeded those for Condition 1 (not play-conducive). Importantly, there was also a significant interaction between Condition and Dimension, suggesting that not every dimension differed between Conditions [F(2,16) = 8.42, p < 0.01, η<sub>p</sub><sup>2</sup> = 0.51]. Tukey HSD post hoc analysis of this interaction indicated that Socioemotional and Sensorimotor dimensional scores were both significantly higher during Condition 2 than Condition 1 (p < 0.001 and p < 0.05, respectively), but Cognitive dimensional scores did not differ (p = 0.99). Therefore, during social interactions that supported playful behavior, both infants and their mothers (on average) showed more positive/neutral affect and active possession of the toy, but they were equally attentive across both conditions. There was also a significant main effect of Participant [F(1,8) = 54.7, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.87], with mothers showing higher dimensional scores overall than their infants, which is consistent with high compliance and task engagement by adults.
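The reported effect sizes can be recovered from each F statistic and its degrees of freedom using the standard identity η<sub>p</sub><sup>2</sup> = (F · df<sub>effect</sub>) / (F · df<sub>effect</sub> + df<sub>error</sub>). The sketch below is only a consistency check on the values reported above, not the authors' analysis code.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared from an F statistic and its numerator/denominator df."""
    return (f * df_effect) / (f * df_effect + df_error)

reported = {  # effect: (F, df_effect, df_error, reported partial eta squared)
    "Condition": (18.82, 1, 8, 0.70),
    "Condition x Dimension": (8.42, 2, 16, 0.51),
    "Participant": (54.7, 1, 8, 0.87),
}
for effect, (f, df1, df2, eta_reported) in reported.items():
    print(f"{effect}: {partial_eta_squared(f, df1, df2):.2f} (reported {eta_reported:.2f})")
```

All three recomputed values round to the reported ones (0.70, 0.51 and 0.87).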

Whilst dimensional scores are able to capture time-averaged differences in overall social interactional quality, they cannot reveal whether qualitatively-different types of social interaction are occurring (as well as their timing and frequency of occurrence). Accordingly, we next assessed infants' social states (calculated for each timepoint) in each condition.

### Frequency Distribution of Social States During Play and Teaching

The frequency distributions of different social states observed in infants and mothers during Conditions 1 and 2 are shown in **Figure 7**. Given that there were only 5 data points, there were insufficient degrees of freedom to conduct an omnibus Repeated Measures ANOVA. Accordingly, to assess whether there were statistical differences between conditions in social state frequency, we conducted paired t-tests (Benjamini–Hochberg FDR-corrected p-values, α = 0.05) for each social state, separately for infants and mothers.
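The Benjamini–Hochberg (BH) procedure controls the false discovery rate across a family of tests, here the per-state t-tests. A minimal, dependency-free sketch of BH-adjusted p-values follows; the example p-values are hypothetical, not the study's, and in practice a library routine such as `statsmodels.stats.multitest.multipletests(..., method='fdr_bh')` would normally be used.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Each sorted p-value p_(k) is scaled by m / k; a running minimum taken from the
    largest p-value downward keeps the adjusted values monotone in p.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # rank m (largest p) down to rank 1
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def bh_reject(p_values, alpha=0.05):
    """True where the BH-adjusted p-value is at or below alpha."""
    return [q <= alpha for q in benjamini_hochberg(p_values)]

# Hypothetical p-values from 8 paired t-tests (one per social state)
p = [0.001, 0.02, 0.03, 0.04, 0.20, 0.50, 0.70, 0.90]
print(bh_reject(p))  # [True, False, False, False, False, False, False, False]
```

Note that the three tests with raw p < 0.05 other than the first do not survive correction, which illustrates why trends such as BH-FDR p = 0.12 are reported as non-significant.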

#### Infants

The t-test results revealed that there was a large and significant increase in the frequency of the play-congruent [1 1 1] social state during Condition 2 as compared to Condition 1 [t(4) = 8.97, BH-FDR p < 0.01, d = 4.01]. On average, during Condition 1, infants were in a [1 1 1] state 24.9% of the time but during Condition 2, this frequency doubled to 50.7% (i.e., half) of the total time spent in social interaction. Similarly, during Condition 2, there was a trend toward an increase in the frequency of the [1 1 0] social state [t(4) = 2.90, BH-FDR p = 0.12, d = 1.30], and a similar trend toward a decrease in the frequency of the [0 0 1] social state [t(4) = −3.11, BH-FDR p = 0.12, d = 1.39]. Together, these results suggest that during Condition 2 (which was conducive to playful behavior), infants spent significantly more time in social states characterized by "positive-affect hands-on interaction" (i.e., [1 1 1] or [1 1 0]), and proportionately less time in a negative-affect passive observational state [0 0 1].

#### Mothers

Mothers' t-test results revealed only one significant difference between conditions. There was a significant decrease in the frequency of the play-incongruent social state of [0 1 1] (negative affect, contact, attention) in Condition 2 as compared to Condition 1 [t(4) = −9.85, BH-FDR p < 0.01, d = 4.41]. However, although there was a trend toward an increase in play-congruent behavior ([1 1 1]) in mothers for Condition 2, this increase was not significant [t(4) = 1.28, BH-FDR p = 0.43, d = 0.57]. Therefore, although mothers displayed less play-incongruent behavior during Condition 2 than Condition 1, we did not observe significantly more play-congruent behavior.
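The reported Cohen's d values are consistent with d<sub>z</sub> = |t| / √n, the standardized mean difference for paired data, with n = 5 dyads. The text does not state which variant of d was computed, so treating it as d<sub>z</sub> is an assumption; the sketch below simply checks the reported values against that identity.

```python
import math

N_DYADS = 5  # paired observations per test, giving t with n - 1 = 4 degrees of freedom

def cohens_dz(t, n):
    """Cohen's d_z for a paired t-test: |t| divided by the square root of n."""
    return abs(t) / math.sqrt(n)

reported = [  # (participant and state, t(4), reported d)
    ("infant [1 1 1]", 8.97, 4.01),
    ("infant [1 1 0]", 2.90, 1.30),
    ("infant [0 0 1]", -3.11, 1.39),
    ("mother [0 1 1]", -9.85, 4.41),
    ("mother [1 1 1]", 1.28, 0.57),
]
for label, t, d_reported in reported:
    print(f"{label}: d_z = {cohens_dz(t, N_DYADS):.2f} (reported {d_reported:.2f})")
```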

### Individual Modal States

#### Infants

During Condition 1, the modal (most frequently occurring) state was [1 0 1] for 4 infants, and [0 0 1] for 1 infant. However, during Condition 2, the modal state for all 5 infants was the play-congruent state of [1 1 1]. Therefore, there was a clear difference in the characteristic state of infants between conditions. During social interactions that were non-conducive to playful behavior, infants were predominantly passive ("hands-off") but attentive. During social interactions that supported playful behavior, infants were predominantly active ("hands-on"), positive and attentive.

#### Mothers

By contrast, mothers displayed almost no difference in their modal states across experimental conditions. Four out of five mothers showed a modal state of [1 1 1] for both Conditions 1 and 2. One mother showed a modal state of [1 1 1] during Condition 1, and a modal state of [1 0 1] during Condition 2, with [1 1 1] being her next most frequently occurring social state.

### Mother–Infant Joint States (Behavioral State Synchrony)

Finally, we assessed the joint probability distribution of infants' and mothers' social states during Conditions 1 and 2, as shown in **Figure 8**. Perhaps the most relevant difference is that during Condition 1, mothers and infants were in a joint play-congruent social state (i.e., [1 1 1] – [1 1 1], or synchronous play) only 5.7% of the time on average. By contrast, during Condition 2, mothers and infants showed synchronous play 24.9% of the time, a more than fourfold increase. Therefore, in the play-conducive context (Condition 2), infants not only entered the play-congruent state more often themselves, but also occupied it concurrently (i.e., synchronously) with their mothers more often.

## DISCUSSION AND CONCLUSION

Play behavior is diverse and multi-faceted, and a major challenge for current research is to converge on a common definition and measurement system for play that integrates behavioral, cognitive and neurological levels of analysis. Here we present and test a new methodological framework that captures different social interactional states (play-congruent or play-incongruent) and permits an empirical discrimination between similar and related social interactional states (such as joint activities in situations both conducive and non-conducive to the emergence of playful behavior).

A priori, we made two sets of predictions about the differences in infants' behavior between these conditions. Our coding results supported both predictions. First, we observed that, during social interactions designed to be conducive to play (Condition 2), infants' modal state was indeed coded as play-congruent (i.e., [1 1 1]). Further, infants spent significantly more time in social states characterized by "positive-affect hands-on interaction," and proportionately less time in a negative-affect passive observational state. This result demonstrates the sensitivity of the coding scheme in correctly identifying the intended mode of social interaction as either playful or non-playful, even though coders did not explicitly code for play itself. Our data also highlight the fact that, although mothers were instructed to play with their infants during Condition 2, infants themselves did not display playful behavior all of the time (on average, only 50.7% of the time). Rather, infants showed a heterogeneous mixture of social states characterized variously by positive affect and "hands-on" engagement. This behavioral finding has important practical implications for the analysis of infants' neural data collected during such play sessions. If infants' neural data during play are assumed to be homogeneous and analyzed as such (e.g., by computing averages of neural indices across all timepoints), such an analysis would be erroneous, as infants may be engaging in different forms of play (as well as other related types of social interactions) over a period of time. The proposed coding framework therefore lends itself well to time-sensitive neural analyses, because it permits the automatic extraction of discrete time periods when a certain social state is observed (e.g., [1 1 1]), as well as separate analyses of the periods leading up to, and away from, these moments.
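Extracting discrete periods in a given state, together with the windows leading up to and away from them, is a run-length operation over the per-timepoint state sequence. A minimal sketch follows; states are (SE, SM, C) tuples, and the context window lengths (`pre`, `post`) and the minimum episode length are hypothetical parameters (30 frames corresponds to 1 s at 30 fps).

```python
def find_episodes(states, target=(1, 1, 1), min_frames=1):
    """Return (start, stop) frame indices of uninterrupted runs where states == target.

    `stop` is exclusive, so states[start:stop] is one episode.
    Runs shorter than `min_frames` are discarded.
    """
    episodes = []
    start = None
    for i, s in enumerate(states):
        if s == target and start is None:
            start = i  # a run begins
        elif s != target and start is not None:
            if i - start >= min_frames:
                episodes.append((start, i))
            start = None  # the run has ended
    if start is not None and len(states) - start >= min_frames:
        episodes.append((start, len(states)))  # run continues to the last frame
    return episodes

def add_context(episodes, n_frames, pre=30, post=30):
    """Widen each episode by `pre` frames before and `post` frames after, clipped to the session."""
    return [(max(0, a - pre), min(n_frames, b + post)) for a, b in episodes]

# Hypothetical 11-frame state sequence
states = [(1, 0, 1)] * 3 + [(1, 1, 1)] * 4 + [(0, 1, 1)] * 2 + [(1, 1, 1)] * 2
eps = find_episodes(states)
print(eps)                                           # [(3, 7), (9, 11)]
print(add_context(eps, len(states), pre=2, post=2))  # [(1, 9), (7, 11)]
```

The resulting frame ranges can then be mapped onto time-aligned neural recordings to select epochs for analysis.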

As predicted, our coding results also showed that, when infants were engaged in social interactions that were designed to be conducive to playful behavior (Condition 2), they showed decreased negative affect, increased sensorimotor involvement with objects, but equivalent attentional engagement. Similarly, mothers also showed significantly decreased negative affect during social interactions that supported playful behavior. Mothers' decrease in negative affect during Condition 2 was expected as they had been instructed to model both positive and negative affect in Condition 1. However, it is interesting to note that their infants also showed a similar decrease in negative affect. This result also demonstrates the potential utility of the coding scheme in highlighting key differences between different forms of early social interactions. Specifically, the finding that infants were no less cognitively engaged during interactions where the mother was not explicitly teaching her infant suggests that playful interactions might provide an equally (if not more) effective social context for early learning as compared to direct didactic instruction from parents.

Finally, we observed that during play, parent–child dyads showed greater temporal synchrony in their social states: mothers and infants were concurrently (jointly) in a playful state (i.e., [1 1 1] – [1 1 1]) five times more frequently than during teaching. This strong alignment of social-affective state between parent and child during playful scenarios is consistent with previous work. For example, temporally synchronous activity between parent and child during social interaction has been noted for gaze (Kaye and Fogel, 1980), affect (Cohn and Tronick, 1988; Feldman et al., 2011) and even autonomic arousal (Feldman et al., 2011; Waters et al., 2014). Our coding scheme not only allows the identification of specific time periods when play is occurring synchronously between mother and child (e.g., for neural analyses), but also allows comparison with periods when the dyad is socially asynchronous, or 'out of tune' with each other. Such parent–child asynchrony is known to occur more frequently, and to be of particular clinical relevance, in affective disorders such as maternal depression (Goldsmith and Rogoff, 1997; Jameson et al., 1997), which is known to affect the quantity and quality of children's own play (Murray et al., 1999). However, two caveats should be noted when interpreting these synchrony data. First, 'social state synchrony' (as defined by our coding scheme) does not necessarily imply that mother and infant are jointly engaged in the same activity. The current scenario involved play with a single toy object; if multiple toy objects were present, parent and child could be interacting separately with different objects yet still be coded as being in a joint state of play. Second, successful social interactions also include more complex temporal contingencies (e.g., turn-taking) in which partners' actions are not concurrent (de Barbaro et al., 2013b; Leclère et al., 2014). Because the current coding scheme captures the temporal evolution of different states, these non-synchronous temporal contingencies between parents and children could also be identified and examined in future work.
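The synchrony measures discussed above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis pipeline. It assumes that each partner's coded state has been resampled onto a common frame-by-frame time base and stored as a 3-tuple of binary codes. The frame representation and function names are hypothetical. `joint_play_proportion` gives the fraction of frames in which both partners are in [1 1 1]. `lagged_joint_play` shifts the infant's sequence by a fixed number of frames, which is one simple way to look for the non-concurrent contingencies mentioned above.

```python
from typing import Sequence, Tuple

# Hypothetical representation: one binary code per dimension, one tuple per frame.
State = Tuple[int, int, int]
PLAY: State = (1, 1, 1)


def joint_play_proportion(mother: Sequence[State], infant: Sequence[State]) -> float:
    """Fraction of frames in which mother and infant are both coded [1 1 1]."""
    if len(mother) != len(infant) or not mother:
        raise ValueError("sequences must be non-empty and share one time base")
    both = sum(1 for m, i in zip(mother, infant) if m == PLAY and i == PLAY)
    return both / len(mother)


def lagged_joint_play(mother: Sequence[State], infant: Sequence[State], lag: int) -> float:
    """Fraction of frames t where the mother is in [1 1 1] at t and the infant at t + lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative; swap the arguments for infant-led lags")
    if len(mother) != len(infant):
        raise ValueError("sequences must share one time base")
    n = len(mother) - lag
    if n <= 0:
        return 0.0
    hits = sum(1 for t in range(n) if mother[t] == PLAY and infant[t + lag] == PLAY)
    return hits / n


if __name__ == "__main__":
    mother = [PLAY, PLAY, (1, 0, 0), PLAY]
    infant = [PLAY, (1, 0, 0), PLAY, PLAY]
    print(joint_play_proportion(mother, infant))     # 0.5
    print(lagged_joint_play(mother, infant, lag=1))  # one third of the shifted frames
```

Comparing `joint_play_proportion` across conditions would give a ratio analogous to the play-versus-teaching comparison reported above. A real analysis would also need a chance baseline, for example from shuffled pairings, which this sketch omits.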

#### Limitations

One major limitation of the current study is its small sample size and the restricted movement of participants (which was necessary for concurrent EEG measurements but could reduce ecological validity). However, our intention has been to illustrate the sensitivity of the proposed framework in discriminating between different play-related states, using a coding scheme grounded in simple, observable behavior and with a temporal resolution suited to neuroscientific research. Five dyads provided sufficient data to assess the efficacy of the framework as a means of capturing variations in individual and joint behavior as continuously evolving states, rather than as discrete actions or a subjective global assessment. Nevertheless, research applying our framework to larger samples is needed to ensure that the contextual differences we identified between play and teaching scenarios are generalisable.

A second limitation is our focus on one specific type of play which revolves around a physical object. However, the focus on a physical object does not limit our model entirely to earlier and more basic types of play, because a physical object can be used in play with more symbolic content. For example, object substitution – using an object as if it is something else – is often coded in established play coding schemes as an indicator of pretend play. Many games with rules involve physical objects, so participants could engage in such a game, either spontaneously or because they are asked to do so. We decided to capture these more complex types of behavior in our model, with the acknowledgment that in infancy, these types of behavior will be very rare, and most likely observed on the part of the parent. Also, with appropriate development (e.g., the elaboration of sub-code options) our framework could be applied to more abstract forms of play that do not revolve around physical objects. It may also be possible to analyze play with multiple objects, although this would make the coding of dyadic states more complex, because it could no longer be assumed that the dyad were playing together if they showed the same state (for example, parent and child may each be touching and engaging with a different toy yet both show the [1 1 1] state). However, the scheme can be used in its present form to code solo play. Comparing behavioral and neurological results from solo play with multiple objects to solo play with one object could provide important developmental insights, as there is evidence that play with multiple objects is developmentally distinct from play with a single object and arises later in ontogeny (Belsky and Most, 1981).

A third potential limitation is the use of looking behavior as the primary index of cognitive engagement. Infants may stare at something without much cognitive engagement, and several researchers have discussed the limitations of using looking behavior to index cognitive engagement during infancy (Ruff and Rothbart, 1996; Aslin, 2007; Richards, 2010). Richards has, for example, distinguished looks that are accompanied by concomitant physiological changes from those that are not, and found that the degree of physiological change is a better indicator than the mere presence of looking behavior of whether the information presented is retained (e.g., Richards and Casey, 1991). Ruff has distinguished different qualities of visual attention based on detailed video coding (Ruff and Capozzoli, 2003). Nonetheless, during object play, looking behavior can be a useful indicator of infants' level of cognitive engagement. For example, in an analysis of infant exploratory behavior with objects, Caruso (1993) found that 'sophisticated exploration,' which included visual examination and manipulating an object to look at it, was negatively related to mouthing and gross motor manipulation of objects (termed 'unsophisticated exploration'). Thus, in periods when the infant exhibits sustained attention to the parent and/or toy object, it seems likely that they are engaging in social interaction and/or attempting to understand or manipulate the toy object. Conversely, a child could have an object in their mouth and therefore not be looking at it, but this is what we regard as a more sensory, 'minds-off' form of play, in which the action (mouthing) and visual attention are directed to different stimuli; that is, there is a divided sensory focus, compared with an infant whose action and attention are congruent, through both looking at a toy and reaching for or holding it.
Therefore, despite limitations, for practical reasons, and in common with numerous other researchers in the field (e.g., Yu and Smith, 2016), we have taken the simple presence or absence of looking behavior toward an object as an indicator of cognitive engagement. However, it should be noted that for adults, this might be an unnecessarily restrictive definition of cognitive activity.

A final limitation is that, as play behavior changes significantly across the life-span (Power, 1999), we chose to focus primarily on infants. Therefore, adult play behavior may not have been optimally captured by the framework. Nonetheless, our coding revealed an interesting result: mothers (unlike their infants) did not show a clear shift toward greater playfulness for Condition 2 as compared to Condition 1, although their negative affect decreased overall. One possible reason for this may be that mothers approached the teaching exercise in Condition 1 as pretend play. Since the objects used in Condition 1 had no intrinsic social value, mothers had to act out a 'good' or 'bad' response to the objects, by pretending that the objects had a particular social significance. As this was not a classic pretend play situation (in that infants might not know that their mothers were pretending), this could explain why infants (aged on average 10.7 months) responded less playfully to their mother's social pretend play. The result would also be consistent with the late emergence of pretend play capabilities during the second year of life (Fein, 1981).

### Toward a Neuroscientific Understanding of Play

Although the EEG data collected during the parent–child social interactions were not analyzed here, the proposed framework represents an important first step toward analyses of the concomitant neural data, which are planned for future investigation. Specifically, because the proposed framework fractionates play behavior along dimensions that have previously been associated (in animal studies) with well-defined neural circuits, it provides the potential to generate specific hypotheses regarding the neural activation patterns predicted to accompany each of the eight possible social states. An analysis of these underlying neural substrates may, in future, lead to a definition of play behavior that is more closely grounded in neuroscience.
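The eight states follow from three binary dimensions (2³ = 8). The sketch below enumerates them and derives the kind of toy prediction illustrated in **Figure 9**: one active node per dimension coded 1, and a connectivity link between every pair of active nodes. The bit order (SE, SM, C) and the all-pairs connectivity rule are illustrative assumptions. They are not claims about the coding scheme's actual ordering or about real neural data.

```python
from itertools import combinations, product
from typing import List, Tuple

# Assumed bit order, for illustration only.
DIMENSIONS = ("SE", "SM", "C")

# All 2**3 = 8 possible social states; (1, 1, 1) is the play-congruent state.
ALL_STATES: List[Tuple[int, ...]] = list(product((0, 1), repeat=len(DIMENSIONS)))


def predicted_pattern(state):
    """Return (active nodes, connectivity links) under the toy all-pairs hypothesis."""
    if len(state) != len(DIMENSIONS) or any(bit not in (0, 1) for bit in state):
        raise ValueError("state must be a tuple of three binary codes")
    active = [dim for dim, bit in zip(DIMENSIONS, state) if bit == 1]
    links = list(combinations(active, 2))  # every pair of co-active circuits
    return active, links


if __name__ == "__main__":
    for state in ALL_STATES:
        active, links = predicted_pattern(state)
        print(state, active, links)
```

Under this rule, [1 1 1] is the only state with three active nodes and three links, which matches the expectation in the text that it would show the highest activation and connectivity.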

**Figure 9** provides an illustration of potential analyses that could be conducted on the adult–infant EEG data, and (highly simplified) examples of neural activation patterns that could underpin three different social states (including the play-congruent [1 1 1] state). The top panel of **Figure 9** depicts EEG signals concurrently measured from infant and adult during the joint play session. After behavioral coding of the accompanying video, the EEG data may be divided into three different time periods (A, B, C) based on the infant's social state (for simplicity, here the adult remains in the play-congruent state [1 1 1] throughout). The bottom panels show the patterns of neural activation that would be expected during each time period, for the infant (left) and the adult (right). For simplicity (and purely for illustrative purposes), each circled node represents a neural circuit that subserves one dimension (socioemotional [SE], sensorimotor [SM], or cognitive [C]). It may be hypothesized that each neural circuit will show activation [1] or inactivation [0] depending on the concomitant behavioral state of the participant. Further, when more than one neural circuit is activated, these circuits may show mutual patterns of functional connectivity, as represented by the connecting arrows in the figure. In this example, the play-congruent [1 1 1] state would also be associated with the highest levels of neural activation and connectivity across brain regions. Note that in this example, changes in the infant's behavioral state (e.g., from [1 0 0] to [1 1 1]) occur only every 2 s. However, the coding scheme permits the start and end points of each social state to be identified with a precision of tens of milliseconds (e.g., 2.1 s versus 2.4 s), which is crucial for phase-based connectivity analyses of EEG oscillatory signals.
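Because the coded onsets and offsets are continuous times, they can be converted directly into EEG sample ranges. A minimal sketch follows. It assumes that intervals are stored as (onset in s, offset in s, state) tuples and that both EEG streams share one clock with the video coding. The 500 Hz sampling rate in the example is hypothetical. At that rate, a 10 ms boundary shift corresponds to 5 samples.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

State = Tuple[int, int, int]
Interval = Tuple[float, float, State]  # (onset_s, offset_s, coded state)


def state_epochs(intervals: List[Interval], fs: float) -> List[Tuple[int, int, State]]:
    """Convert coded intervals into half-open [start, stop) sample index ranges."""
    epochs = []
    for onset, offset, state in intervals:
        if offset <= onset:
            raise ValueError(f"offset must follow onset: {onset} -> {offset}")
        epochs.append((round(onset * fs), round(offset * fs), state))
    return epochs


def samples_per_state(epochs: List[Tuple[int, int, State]]) -> Dict[State, int]:
    """Total number of EEG samples attributed to each coded state."""
    totals: Dict[State, int] = defaultdict(int)
    for start, stop, state in epochs:
        totals[state] += stop - start
    return dict(totals)


if __name__ == "__main__":
    coded = [(0.0, 2.1, (1, 0, 0)), (2.1, 4.4, (1, 1, 1))]
    epochs = state_epochs(coded, fs=500.0)
    print(epochs)                     # [(0, 1050, (1, 0, 0)), (1050, 2200, (1, 1, 1))]
    print(samples_per_state(epochs))  # {(1, 0, 0): 1050, (1, 1, 1): 1150}
```

The resulting index ranges could then be used to slice each participant's EEG array before computing per-state power or connectivity.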

#### Conclusion

In conclusion, we have presented a novel dimensional framework for analyzing the continuous time-evolution of adult-infant play patterns, underpinned by biologically informed state coding along sensorimotor, cognitive and socio-emotional dimensions. We expect that the proposed framework will have wide utility amongst researchers wishing to employ an integrated, multimodal approach to the study of play, and lead toward a greater understanding of the neuroscientific basis of play.

### ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the Cambridge Psychology Research Ethics Committee with written informed consent from all subjects. Parents gave written informed consent on behalf of their children in accordance with the Declaration of Helsinki. The protocol was approved by the Cambridge Psychology Research Ethics Committee.

### AUTHOR CONTRIBUTIONS

DN designed the coding scheme and wrote the article. KC and SG collected the data. HD analyzed the data. MS helped to design the coding scheme and wrote the article. SW designed the experiment and wrote the article. VL designed the experiment, analyzed the data, and wrote the article.

### FUNDING

This research was funded by a UK Economic and Social Research Council (ESRC) Transforming Social Sciences Grant ES/N006461/1 to VL and SW, a Nanyang Technological University start-up Grant M4081585.SS0 to VL, and an ESRC Future Research Leaders Fellowship ES/N017560/1 to SW.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2018.00273/full#supplementary-material





**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Neale, Clackson, Georgieva, Dedetas, Scarpate, Wass and Leong. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Variety Wins: Soccer-Playing Robots and Infant Walking

Ori Ossmy<sup>1</sup>\*†, Justine E. Hoch<sup>1</sup>†, Patrick MacAlpine<sup>2</sup>, Shohan Hasan<sup>1</sup>, Peter Stone<sup>2</sup> and Karen E. Adolph<sup>1</sup>

<sup>1</sup> Department of Psychology, New York University, New York, NY, United States, <sup>2</sup> Department of Computer Science, University of Texas at Austin, Austin, TX, United States

Although both infancy and artificial intelligence (AI) researchers are interested in developing systems that produce adaptive, functional behavior, the two disciplines rarely capitalize on their complementary expertise. Here, we used soccer-playing robots to test a central question about the development of infant walking. During natural activity, infants' locomotor paths are immensely varied. They walk along curved, multi-directional paths with frequent starts and stops. Is the variability observed in spontaneous infant walking a "feature" or a "bug?" In other words, is variability beneficial for functional walking performance? To address this question, we trained soccer-playing robots on walking paths generated by infants during free play and tested them in simulated games of "RoboCup." In Tournament 1, we compared the functional performance of a simulated robot soccer team trained on infants' natural paths with teams trained on less varied, geometric paths—straight lines, circles, and squares. Across 1,000 head-to-head simulated soccer matches, the infant-trained team consistently beat all teams trained with less varied walking paths. In Tournament 2, we compared teams trained on different clusters of infant walking paths. The team trained with the most varied combination of path shape, step direction, number of steps, and number of starts and stops outperformed teams trained with less varied paths. This evidence indicates that variety is a crucial feature supporting functional walking performance. More generally, we propose that robotics provides a fruitful avenue for testing hypotheses about infant development; reciprocally, observations of infant behavior may inform research on artificial intelligence.

Keywords: infant walking, locomotion, bipedal robotics, robot soccer, natural gait

#### Edited by:

Tom Ziemke, University of Skövde, Sweden

#### Reviewed by:

Rajiv Ranganathan, Michigan State University, United States

Jason Scott Metcalfe, US Army Research Laboratory Human Research and Engineering Directorate, United States

> \*Correspondence: Ori Ossmy oo8@nyu.edu

†Shared first authorship.

Received: 19 November 2017 Accepted: 11 April 2018 Published: 09 May 2018

#### Citation:

Ossmy O, Hoch JE, MacAlpine P, Hasan S, Stone P and Adolph KE (2018) Variety Wins: Soccer-Playing Robots and Infant Walking. Front. Neurorobot. 12:19. doi: 10.3389/fnbot.2018.00019

### INTRODUCTION

Both infancy and artificial intelligence (AI) researchers are interested in developing systems that produce adaptive, functional behavior. Infancy researchers have the benefit of starting with infants, one of nature's most flexible and generative learning machines. Through observation, infancy researchers work backward to reverse engineer infants' underlying learning mechanisms and develop formal theories. These theories, however, are often difficult to test experimentally; controlled rearing environments and training regimens are notoriously slow, burdensome, and in some cases, outright impossible. AI researchers have the benefit of building models, but can gain insights into the processes of change by studying natural learning systems (Gómez et al., 2004; Cangelosi et al., 2015). Here, we use the computational power of AI to test an otherwise intractable developmental question: What is the best way to learn a generative skill like walking?

### VARIETY IN SPONTANEOUS INFANT WALKING: A FEATURE OR A BUG?

Variety is essential for functional motor behavior. Movements must be tailored to the changing constraints of the body, environment, and task (Gibson, 1979; Newell, 1986; Bernstein, 1996). Functional walking, for example, is a highly creative process. It requires more than alternating leg movements to get from A to B. No step is ever repeated in exactly the same way or under exactly the same conditions. To successfully navigate the environment, walking must be continually modified to suit changes in local conditions—different surfaces (e.g., walking on pavement or sand), changes in layout (e.g., walking uphill or over flat ground), and obstacles along the path (e.g., clutter, elevations, and other agents who move). Thus, functional walking requires agents to navigate varied paths to adapt to moment-to-moment changes in body-environment relations (Adolph, 2008; Adolph and Robinson, 2015). How does anyone, let alone an infant, learn such a generative skill? What sort of training regimen facilitates the acquisition of flexible, creative, adaptive motor action?

Decades of research on the development of walking have focused on the acquisition of periodic gait, the ability to maintain steady-state velocity in a straight line using a series of alternating steps (Adolph et al., 2003; Ivanenko et al., 2004; Chang et al., 2006; Hallemans et al., 2006; Bisi and Stagni, 2015; Bril et al., 2015). With straight-line walking as the "gold standard," research on motor learning and rehabilitation has focused on training uniform, alternating steps (Cherng et al., 2007; Ivanenko et al., 2007; Ulrich et al., 2008; Reisman et al., 2009; Willoughby et al., 2010). Although such training leads to improvements in strength, and indeed improvements in straight-line walking, it does little to improve the functional, flexible, adaptive walking skills needed to navigate a real-world environment. So, what does? A growing literature on motor learning recognizes the beneficial role of variable practice (Moxley, 1979; Catalano and Kleiner, 1984; Van Rossum, 1990; Schmidt, 2003; Davids et al., 2006; Ranganathan and Newell, 2013). The principle at the heart of this line of research is that more variability in practice leads to greater flexibility outside the training environment.

Initially, infant walking is highly variable. Infants' gait is inconsistent from step to step (Clark et al., 1988; Bonneuil and Bril, 2012). Infants cannot reproduce leg movements consistently, they cannot walk quickly, they cannot walk far, and they fall a lot (Adolph et al., 2012). New walkers are bad walkers, but they get better with experience (Adolph and Robinson, 2015). Moreover, individual infants display a tremendous variety of path shapes during spontaneous walking in free play. They produce both short and long bouts; they generate curving, serpentine, and zigzag paths; they double back on themselves; they step in every direction and sometimes take multiple steps on the same foot (Adolph et al., 2012; Lee et al., 2017). These varied paths steer infants around toys and people, but infants also take varied paths over open ground, when nothing is in the way (Hoch et al., 2017).

Is the variety in infant walking paths a feature or a bug? If variety is a feature, then infants' early experience with varied walking paths may be beneficial for learning functional walking. If variety is a bug, then infants' varied paths may add noise that impedes or, at best, has no consequences for learning. More likely, it is both. Learning on varied walking paths presumably has both costs and benefits depending on the task. Recent work suggests that early experience with varied walking paths may be an essential component of infants' natural training regimen. Short bouts, curving paths, and omnidirectional steps are endemic from infants' first steps until many months after walk onset (Lee et al., 2017). Inconsistency goes away with walking experience. Varied paths do not.

### HUMANOID ROBOTS LEARNING TO WALK: ROBOCUP!

For robots, as for infants, functional movement in a realistic physical environment (simulated or real world) requires behavioral flexibility. In the robot world, successful, functional locomotor performance is assessed with robot soccer competitions. Why soccer? Historically, computer scientists believed that a truly intelligent artificial agent might be able to beat a human at chess (1997; Deep Blue), at trivia (2011; Watson), or at more complicated strategy games such as Go (2017; AlphaGo). However, in 1997, the same year Deep Blue defeated chess grandmaster and former world chess champion Garry Kasparov, a new breed of AI researchers decided that rather than learning and implementing a set of rules, true intelligence might look something more like generative, adaptive, embodied motor action. To meet this challenge, they created RoboCup, the world's premier robot soccer competition (Visser and Burkhard, 2007). The original call of the RoboCup initiative was to create a team of autonomous humanoid robots that could beat the human soccer World Cup champions by the year 2050 (Kitano et al., 1997; Burkhard et al., 2002).

Soccer competitions are a good measure of functional locomotor performance because players cannot simply enact a set of rules or merely produce repetitive movements. Seeing many "moves" into the future, as in chess, is not sufficient. Instead, soccer players must take rapid steps in every direction along curved and sharply turning paths—all while the locations of the ball, players on both teams, and the relative positions of the goals are changing. Thus, soccer-playing robots, like infants, must learn in a way that facilitates flexible, goal-directed locomotion in a continually changing environment.

Previous studies showed that training robots with omnidirectional walking paths decreased falls and increased speed and distance traveled, leading to smoother and faster turns compared to training on unidirectional walking (Urieli et al., 2011). Likewise, training robots on infants' walking paths may improve robots' locomotor performance.

### CURRENT STUDIES

In the current studies, we used simulated soccer-playing robots as a model system to ask whether infants' naturally varied walking paths are beneficial for learning functional walking. Although the full variety of infants' walking experiences is unknown, the quantity is massive. Infants take an estimated 2,400 steps and travel the length of 7.7 American football fields in 1 h of free play with caregivers (Adolph et al., 2012). Thus, any experimental training regimen with infants would likely be swamped by the sheer quantity of their everyday experiences. Given that it is not feasible to control infants' everyday walking experience (or even record their walking paths over a waking day), we exploited the computational power of RoboCup to experimentally test the hypothesis that paths varying in shape, step direction, number of steps, and number of starts and stops are better training for functional walking than less varied paths. Specifically, we compared the outcomes of different robot training regimens using simulated robot-soccer competitions. By using simulated robots as models of real-world infant walking, we could control the training regimen and obtain robust estimates of performance over thousands of games of RoboCup. In the current studies, we aimed to: (1) experimentally examine the role of varied paths in learning functional walking, and (2) test whether differences in the natural variety of infant walking paths affect functional performance. We addressed these aims in two simulated robot soccer tournaments.

To address our first aim, in Tournament 1, we trained one team of robots on a training course composed of infants' natural, highly varied walking paths. The "opposing" teams were trained using uniform geometric paths: straight lines, squares, and circles. To evaluate the success of the different training regimens, each pair of teams played off in a series of head-to-head soccer games. We predicted that the robot team trained on infant paths would outperform the teams trained on less variable geometric paths (infant-trained robots would score more goals and win more games).

To address our second aim, in Tournament 2, we compared robots trained on infant walking paths that varied in several aspects—shape, step direction, number of steps, and number of starts and stops. Variety in path shape—some straighter and some curvier paths—reflects the ability to control the two sides of the body independently. Variety in step direction—forward, backward, and sideways—reflects the ability to produce steps in every direction. Variety in the number of steps reflects the ability to produce both short and long bouts of locomotion. Finally, the number of starts and stops reflects the ability to initiate and control disequilibrium. We clustered infants into five groups based on these measures of path variety and trained soccer teams according to the five sets of paths. It is important to note that soccer involves more than just walking. Players must also have the ability to kick the ball and collaborate with others. However, because the current studies focus on walking, all other skills remained constant and equal across teams. Therefore, if one team performed significantly better than another, the advantage was due to differences in walking training.

### GENERAL METHODS

#### Infant Walking Paths

We observed the walking paths of 90 infants (49 girls, 41 boys) from the New York City area during free play in a large laboratory playroom (6 × 9 m) as shown in **Figure 1A**. Play sessions lasted 20 min. Infants' ages ranged from 10.75 to 19.53 months (M = 15.28), and their walking experience ranged from 0.10 to 9.01 months (M = 3.09). The study protocol was approved by the New York University Institutional Review Board. Infants' parents gave written consent for participation. For those parents who gave additional permission, videos from the session are shared on Databrary.org. We recorded infants' walking paths from four camera views: a fixed overhead view captured the entire playroom, two fixed cameras recorded side views of the room, and a camera held by an experimenter recorded a close-up view of the infant. The experimenter did not interact with infants or caregivers during the session.

To define the training paths for the infant-trained robots, we first identified bouts of walking. Using Datavyu (datavyu.org), a primary coder scored the onset of each walking bout (when the infant's foot lifted off the floor) and its offset (when the infant was stationary for ≥500 ms). A second coder independently scored 25% of each session to ensure inter-observer agreement (rs > 0.96, ps < 0.001 for number of bouts, bout duration, and number of steps per bout). To define the shape of each path and the angle between consecutive steps, a coder used MATLAB software (DLTDataViewer; https://www.unc.edu/~thedrick/software1.html) to manually digitize the location of each step from the overhead camera view that covered the entire playroom. If infants' feet were momentarily occluded, coders estimated their location based on the preceding and following steps. We used the xy coordinates of these points to map the paths infants took through the playroom (adjusting for lens and perspective distortion). Using known distances, we verified that the digitizing method returned < 1% error per bout.
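The bout definition and the step-angle measure can be sketched as follows. This is a simplified illustration, not the authors' coding procedure, since bouts were scored by hand in Datavyu. It assumes that each digitized step has a timestamp and xy floor coordinates in meters. It treats any gap of at least 500 ms between consecutive steps as a stationary period that ends a bout. It defines the angle between consecutive steps as the signed change in heading between successive step-to-step displacements. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # digitized step location (x, y) in meters


def split_bouts(step_times: Sequence[float], min_pause: float = 0.5) -> List[List[int]]:
    """Group step indices into bouts; a gap >= min_pause seconds starts a new bout."""
    bouts: List[List[int]] = []
    current: List[int] = []
    for idx, t in enumerate(step_times):
        if current and t - step_times[current[-1]] >= min_pause:
            bouts.append(current)
            current = []
        current.append(idx)
    if current:
        bouts.append(current)
    return bouts


def turning_angles(steps: Sequence[Point]) -> List[float]:
    """Signed heading change (degrees, in [-180, 180)) between consecutive displacements."""
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(steps, steps[1:], steps[2:]):
        h1 = math.atan2(y1 - y0, x1 - x0)
        h2 = math.atan2(y2 - y1, x2 - x1)
        delta = math.degrees(h2 - h1)
        angles.append((delta + 180.0) % 360.0 - 180.0)  # wrap into [-180, 180)
    return angles


if __name__ == "__main__":
    print(split_bouts([0.0, 0.3, 0.6, 1.5, 1.8]))    # [[0, 1, 2], [3, 4]]
    print(turning_angles([(0, 0), (1, 0), (1, 1)]))  # [90.0], a left turn
```

Near-zero angles throughout a bout indicate a straight path, whereas large or alternating angles indicate the curving and zigzag paths described in the introduction.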

### Robot Simulations

To ensure robustness, each pair of teams competed in 1,000 head-to-head matches. Because such a large number of real-world competitions is impractical, we used a computer simulation environment, RoboCup 3D, as a low-cost, high-efficiency alternative to testing with physical robots (Boedecker and Asada, 2008; Xu and Vatankhah, 2013). In addition, previous work showed that walking parameters learned through RoboCup simulations can be translated into effective walking parameters for physical robots (Farchy et al., 2013). The RoboCup 3D simulation environment is based on SimSpark (http://simspark.sourceforge.net/), a generic, physical, multiagent system simulator that uses the Open Dynamics Engine library (ODE; http://www.ode.org/). The library provides rigid body dynamics with collision detection, friction, and support for the modeling of advanced motorized hinge joints used in the humanoid agents.

The robots used in the simulation are loosely modeled after the Aldebaran Nao robot (http://www.aldebaran-robotics.com). All robots have a height of 57 cm, a mass of 4.5 kg, and 22 degrees of freedom (six in each leg, four in each arm, and two in the neck). Each robot has proprioception of all joints, pressure sensors on its feet, two gyrometers, and an accelerometer. The joint perceptors and effectors enable monitoring and control of the hinge joints. Joint effectors allow the robot to specify the torque and direction in which to move.

#### Robot Walk Engine and Optimization

To walk, a request for velocity and a destination for the feet and torso are sent to a walk engine, which uses this request, together with inverse kinematics and sensor information, to determine the next desired joint positions. The engine sends these joint positions to proportional-integral-derivative (PID) controllers that convert the positions into torque commands, which are then sent to the simulator for processing.
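The last stage of this pipeline, in which each joint's desired position is turned into a torque command, can be illustrated with a textbook discrete PID controller. This is a generic sketch, not the walk engine's or SimSpark's own controller code. The gains, time step, and class name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JointPID:
    """Discrete PID controller for one hinge joint (gains are illustrative)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def torque(self, desired: float, measured: float, dt: float) -> float:
        """Return a torque command from the desired and measured joint positions."""
        if dt <= 0:
            raise ValueError("dt must be positive")
        error = desired - measured
        self.integral += error * dt
        # No derivative term on the first call, when there is no previous error.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


if __name__ == "__main__":
    knee = JointPID(kp=2.0, ki=0.0, kd=0.0)
    print(knee.torque(desired=1.0, measured=0.25, dt=0.02))  # 1.5
```

One controller instance would be kept per joint, so that each joint's integral and previous error persist across simulation steps.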

We used an open source parameterized walk engine (MacAlpine and Stone, 2016) that first selects a path for the torso to follow, and then determines where the feet should be with respect to the torso's location. More than 40 parameters are used to calculate the position of the feet with respect to the torso. A full description of the technical and mathematical details of the walk can be found in MacAlpine et al. (2012a).

The parameters for the walk engine are initialized based on previous testing on an actual Nao robot (MacAlpine et al., 2012a). Robots that use walk engines with these values, without any further parameter optimization (i.e., training to walk), are stable but slow walkers. We refer to these robots as "no-training" and used them as a baseline. All other teams were trained through walking optimization.

In the walking optimization, we wished to improve robots' stability during various situations encountered during soccer game play and to increase their speed. In this procedure, the robot learns a set of parameters by walking toward a series of destinations on the field (goToTarget optimization sub-task; MacAlpine et al., 2012a). The robot is rewarded based on the distance traveled toward the destination. If the robot reaches a destination ahead of time, it receives extra reward based on the distance it could have traveled given the remaining time. The robot also has "stop destinations," where it is penalized for overshooting the destination. Finally, the robot receives a penalty if it falls during the optimization run (for full equations describing the robot reward system, see MacAlpine et al., 2012a). Over the course of the optimization, robots learn to walk increasingly faster, with fewer errors. Because it is impractical to optimize all 40 parameters, we selected a subset of 25 parameters, based on their high potential impact on the speed and stability of the robots (see **Tables 1**, **4** for the list of selected parameters and further details in MacAlpine et al., 2012a). Moreover, because we focused on walking optimization, all phases of optimization that relate to other skills (e.g., teaching robots how to dribble or kick the ball) were similar to previous work and were held constant across teams (Urieli et al., 2011; MacAlpine et al., 2012a).
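The reward structure described above can be summarized in a short function. This is a hypothetical simplification for illustration only. The published equations are in MacAlpine et al. (2012a). The argument names, the assumed walking speed used to credit early arrivals, and the fall-penalty value are all invented here.

```python
from typing import Dict, Iterable


def target_reward(
    start_dist: float,
    end_dist: float,
    reached: bool = False,
    time_left: float = 0.0,       # s remaining when the destination was reached
    assumed_speed: float = 0.5,   # m/s, hypothetical value for the early-arrival credit
    is_stop_target: bool = False,
    overshoot: float = 0.0,       # m travelled past a stop destination
    fell: bool = False,
    fall_penalty: float = 5.0,    # hypothetical constant
) -> float:
    """Toy reward for one destination in a goToTarget-style optimization run."""
    reward = start_dist - end_dist  # progress toward the destination
    if reached and time_left > 0.0:
        # Reached early: credit the distance it could have covered in the remaining time.
        reward += assumed_speed * time_left
    if is_stop_target:
        reward -= overshoot  # penalize overshooting a stop destination
    if fell:
        reward -= fall_penalty
    return reward


def run_fitness(targets: Iterable[Dict]) -> float:
    """Sum toy rewards over a sequence of destinations (each a dict of keyword args)."""
    return sum(target_reward(**t) for t in targets)
```

An optimizer would evaluate `run_fitness` for each candidate parameter set and keep the sets that score highest, which is how faster and more stable walks come to be favored over the course of optimization.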

#### Soccer Game Procedure

We evaluated the success of each training regimen using a tournament of soccer games among teams of eleven simulated robot players. All players on a team were trained in the same way. Each team competed to get a ball into the other team's goal (**Figure 1B**). The games consisted of two 5-min halves (without stopping the clock). Each half began with a kick-off, with all players located on their own team's side of the field.

We calculated the number of goals scored per team per match and the number of wins in each set of 1,000 head-to-head matches. As in human soccer, the team that scored more goals by the end of the game won; if the score was even, we declared a tie. To evaluate the success of each team (and thereby the success of its training regimen), we focused on the magnitude and consistency of their wins. Magnitude is expressed by a team's average goal difference, that is, the average number of goals scored minus the number of goals conceded per match. Consistency is expressed by a high number of league points across the tournament. Using the standard league point system in human soccer, a team gains 3 points for a win, 1 point for a tie, and 0 points for a loss. Importantly, the motion targets used during the soccer matches are similar no matter what walk is used for training. That is, robots walk to the same target positions near the ball even if they struggle to do so given their current walking capability. Therefore, an analysis of small differences in locomotion during the matches is not informative for determining differences in functional walking. However, differences in locomotion between teams can accumulate over time to produce differences in scoring.
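Both outcome measures reduce to simple arithmetic over match scores. A minimal sketch follows; the `Record` class and the function names are illustrative and are not taken from the study's own analysis code.

```python
from dataclasses import dataclass


@dataclass
class Record:
    wins: int
    ties: int
    losses: int

    def league_points(self) -> int:
        # Standard league scoring: 3 per win, 1 per tie, 0 per loss.
        return 3 * self.wins + self.ties

    def games(self) -> int:
        return self.wins + self.ties + self.losses


def match_outcome(goals_for: int, goals_against: int) -> str:
    """Classify one match from the perspective of the team scoring goals_for."""
    if goals_for > goals_against:
        return "win"
    if goals_for < goals_against:
        return "loss"
    return "tie"


def average_goal_difference(matches: list[tuple[int, int]]) -> float:
    """Mean of (goals scored - goals conceded) over a team's matches."""
    return sum(gf - ga for gf, ga in matches) / len(matches)


def tally(matches: list[tuple[int, int]]) -> Record:
    """Build a win/tie/loss record from (goals_for, goals_against) pairs."""
    outcomes = [match_outcome(gf, ga) for gf, ga in matches]
    return Record(outcomes.count("win"), outcomes.count("tie"),
                  outcomes.count("loss"))
```

As a consistency check, the infant-trained team's Tournament 1 record (2,888 wins, 1,037 ties, 75 losses) gives `Record(2888, 1037, 75).league_points() == 9701` over 4,000 games, that is, 1,000 matches against each of the four other teams.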

### TOURNAMENT 1: INFANT PATHS VS. GEOMETRIC PATHS

#### Training Regimens

Our first aim was to examine the role of varied paths in learning functional walking. We compared a team trained on natural, varied infant walking paths to four teams trained on uniform, geometric walking paths. To create the infant training course, we randomly selected 15 infant play sessions. We then took the coordinates of each infant path and mapped those points onto the soccer field, where each grid space is 1 m × 1 m. For each session, we capped stationary periods at 2 s, and then randomly sampled a 4-min block of walking time plus stops. Although infants stop for longer periods, the robot is usually fully stabilized after 2 s, so longer pauses have no additional merit. Three infants had fewer than 4 min of walking plus stops, so their paths were repeated until 4 min accumulated. Then, we concatenated the randomly sampled 4-min blocks from each of the 15 infants to create a 1-h training course (a realistic duration for training in terms of computational time complexity). This training path was used to optimize the infant-trained team in Tournament 1. During training, the robots walked sequentially toward each step specified by infants' paths. Whenever the infant stopped walking, the robot also stopped walking and stood in place.
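The course-construction steps can be sketched in Python under two stated assumptions: each session is reduced to a hypothetical list of `("walk" | "stop", duration_s)` segments (the real course also carries the mapped step coordinates), and "randomly sampled a 4-min block" is read as one contiguous window starting at a uniformly random offset. The constants and function names are illustrative.

```python
import math
import random

STOP_CAP_S = 2.0   # stationary periods capped at 2 s
BLOCK_S = 240.0    # one 4-min block per infant


def cap_stops(session, cap=STOP_CAP_S):
    """Shorten every stop to at most `cap` seconds; walks are unchanged."""
    return [(kind, min(d, cap) if kind == "stop" else d) for kind, d in session]


def sample_block(session, rng, block=BLOCK_S):
    """Cut one contiguous `block`-second window from a capped session."""
    segs = cap_stops(session)
    total = sum(d for _, d in segs)
    if total <= 0:
        raise ValueError("session has no duration")
    # Short sessions are repeated until they cover a full block.
    reps = math.ceil(block / total)
    segs = segs * reps
    total *= reps
    start = rng.uniform(0.0, total - block)
    out, t = [], 0.0
    for kind, d in segs:
        lo, hi = max(t, start), min(t + d, start + block)
        if hi > lo:  # keep only the part of this segment inside the window
            out.append((kind, hi - lo))
        t += d
    return out


def build_course(sessions, seed=0):
    """Concatenate one sampled block per infant session (15 x 4 min = 1 h)."""
    rng = random.Random(seed)
    course = []
    for session in sessions:
        course.extend(sample_block(session, rng))
    return course
```

Fixing the seed makes a given course reproducible, which matters if several teams are meant to be compared on the same sampled path.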

For the less varied training regimens, we optimized the walking engine parameters by training the robots on either a straight-line, circle, or square path. The straight-line team walked continually forward for 10 walking segments in which the robots walked for 7 s and then stopped for 2 s. The straight-line team's walking parameters were fit using the average of these 10 walking attempts. The circle team walked along a fixed-size circular path where the target heading was updated every second for 20 s and then stopped for 2 s. The square team walked once around the square before stopping for 2 s and then once around the square stopping for 2 s at each corner, in alternation (the size of the square was determined by the robot's walk: 5 s of walking per side, 20 s total). Both the square team and the circle team repeated their walks 7 times. All teams' walking parameters were fit using the average of all repetitions. In previous work, the fitness values of robots trained on geometric paths plateaued after 200 generations of learning. In the current study, the duration of each training regimen was sufficient to include 300 generations of learning, so additional training time was unnecessary. The final team used the initial parameters of the walk engine without any optimization (the no-training team; see Methods).

After the training phase, the five teams competed in a RoboCup 3D simulation.

#### Results and Discussion

Overall, more variety in training led to better performance. Final values of the walking parameters (**Table 1**; see MacAlpine et al., 2012b for more details) indicate that training on varied paths leads to improvements in the optimization process in terms of stability (e.g., larger step size applied to the forward position of the torso, smaller foot angle at ground contact, higher proportion of stationary time for the swing foot), speed (e.g., shorter duration of single steps), and shifts in direction (e.g., smaller steps). The infant-trained team, which had the most varied paths, beat all other training regimens in terms of consistency (as measured by League points) and magnitude (as measured by average goal difference scores).

The infant-trained team won Tournament 1 with 9,701 League points, winning 2,888 games, tying 1,037, and losing only 75. The square-trained team came in second, followed by the circle-trained team, the line-trained team, and the no-training team, respectively (**Figure 2A**; see **Table 2** for a full description of the competition results). As in previous studies (MacAlpine et al., 2012a), the no-training team never beat a trained team (0 wins, see **Table 2**), demonstrating the essential value of optimizing the walk engine. **Figure 2B** depicts the wins of each team (rows) against all possible opponents (columns). The blue gradient in the infant team row shows that as the variety of the opponent's path increased, the number of infant team wins decreased. These findings suggest that more varied training regimens generalized to the new task constraints of RoboCup and led to better functional performance.

FIGURE 2 | Tournament 1 results: Infant paths vs. Geometric paths. (A) Accumulated league points, indicating consistency of training success. (B) Each team's wins (rows) against all possible opponents (columns). Color denotes the number of wins and does not include ties between teams. (C) Average goal difference, indicating magnitude of training success. The infant-trained team scored more goals and conceded fewer than all other teams. (D) The average number of goals scored by each team (rows) against all other opponents (columns). The infant-trained team scored fewer goals against more variably trained teams (squares, circles).

TABLE 2 | Scoring table for Tournament 1 describing results across all games.

TABLE 3 | Pairwise comparisons for the average goal differences in Tournament 1.

The infant-trained team also won in terms of magnitude by achieving a larger average goal difference across the tournament [**Figure 2C**; F(4, 19995) = 5595.91, p < 0.001, one-way ANOVA on average goal difference]. As shown in **Figure 2C**, the infant team had the highest average goal difference, followed by the circle and square teams (which did not differ, p = 1.00), the line team, and the no-training team, respectively (all other Bonferroni post-hoc tests, ps < 0.001). **Figure 2D** depicts the average number of goals scored against each possible opponent. The blue gradient in the infants' row shows that as the variety of the opponent's path increased, the number of goals infants scored decreased (see **Table 3** for pairwise comparisons). Taken together, the results of Tournament 1 indicate that the variety in infants' paths is a feature that leads to better functional walking as indexed by success in robot soccer. Moreover, path variety promotes generalization to new, untrained paths.
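The reported degrees of freedom are consistent with one goal-difference observation per team per match: with k teams and N observations, df_between = k - 1 and df_within = N - k, so 5 teams x 4,000 games gives (4, 19995). The sketch below implements the textbook one-way ANOVA F statistic and a Bonferroni adjustment in plain Python; it illustrates the standard formulas and is not the authors' analysis code.

```python
from itertools import combinations
from statistics import fmean


def anova_df(group_sizes: list[int]) -> tuple[int, int]:
    """Degrees of freedom (between, within) for a one-way ANOVA."""
    return len(group_sizes) - 1, sum(group_sizes) - len(group_sizes)


def one_way_anova_f(groups: list[list[float]]) -> float:
    """F = MS_between / MS_within for k independent groups."""
    means = [fmean(g) for g in groups]
    grand = fmean(x for g in groups for x in g)
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_b, df_w = anova_df([len(g) for g in groups])
    return (ss_between / df_b) / (ss_within / df_w)


def bonferroni(p_value: float, n_groups: int) -> float:
    """Bonferroni-adjusted p for one of all pairwise group comparisons."""
    n_pairs = len(list(combinations(range(n_groups), 2)))
    # Multiply by the number of comparisons and cap at 1.
    return min(1.0, p_value * n_pairs)
```

Because adjusted values are capped at 1, any unadjusted p above 0.1 among the 10 pairwise comparisons of 5 teams is reported as 1.00, which is one way a value such as "p = 1.00" can arise.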

### TOURNAMENT 2: INDIVIDUAL DIFFERENCES IN THE VARIETY OF INFANT PATHS

#### Training Regimens

Our second aim was to test whether differences in the natural variety of infant walking paths affect functional performance. To ensure that team differences in variety did not depend on the number of infants contributing to the robot-training regimen, we created 5 equal-sized groups of 15 infants by clustering the paths of the 75 infants who did not contribute to the training regimen for Tournament 1. We used a k-means clustering algorithm with k = 5 (Spath, 1985). To maintain equal-sized groups, we applied an equal cardinality constraint to the clusters while keeping them as spatially cohesive as possible (Zhu et al., 2010).
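For readers who want to experiment, equal-size clustering can be approximated with a simple heuristic: ordinary k-means center updates combined with an assignment step that fills clusters greedily, closest (point, center) pairs first, up to a fixed capacity of n / k. This is a swapped-in sketch, not the Zhu et al. (2010) algorithm used in the study; the feature matrix and the choice to z-score features beforehand are assumptions.

```python
import numpy as np


def zscore(X):
    """Put features on a common scale before computing distances."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def balanced_kmeans(X, k, n_iter=50, seed=0):
    """Heuristic equal-size k-means (NOT the Zhu et al., 2010 method)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    if n % k:
        raise ValueError("equal cardinality requires n divisible by k")
    cap = n // k
    centers = X[rng.choice(n, size=k, replace=False)]
    labels = np.full(n, -1)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # (n, k)
        new_labels = np.full(n, -1)
        counts = np.zeros(k, dtype=int)
        # Visit all (point, cluster) pairs from closest to farthest and
        # assign each point to the nearest cluster that still has room.
        for flat in np.argsort(d, axis=None):
            i, j = divmod(int(flat), k)
            if new_labels[i] == -1 and counts[j] < cap:
                new_labels[i] = j
                counts[j] += 1
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
        centers = np.stack([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers
```

Because total capacity equals the number of points, every point is guaranteed a cluster, and each of the k clusters ends up with exactly n / k members (15 of 75 infants when k = 5).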

Clusters were based on variation in four interdependent aspects of walking: path shape, step direction, number of steps, and number of starts and stops. We calculated variety in path shape as the standard error of path curvature. For bouts of ≥4 steps, we calculated path curvature by averaging the overall path curvature (the shortest distance between the start and end points of the bout divided by the total distance traveled) and step-to-step curvature (calculated the same way from each series of 3 points in the bout). We calculated variety in step direction as the standard error of the change in degrees of the plane angle between each pair of steps. We calculated variety in path length as the standard error of the number of infant steps per walking bout. Finally, we calculated the number of starts and stops as the total number of bouts.
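The four variety measures can be sketched as follows, assuming each bout is a list of (x, y) step positions in meters. The helper names are hypothetical; averaging the 3-point values into one step-to-step score and using signed heading changes are our reading of the description, not confirmed details. Note that the curvature ratio as defined equals 1 for a perfectly straight bout and falls below 1 as the path bends.

```python
import math
from statistics import stdev


def straightness(points):
    """Start-to-end distance divided by total distance traveled."""
    traveled = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    return math.dist(points[0], points[-1]) / traveled


def bout_curvature(points):
    """Mean of overall and step-to-step (3-point) values for one bout."""
    if len(points) < 4:
        raise ValueError("defined only for bouts of >= 4 steps")
    overall = straightness(points)
    local = [straightness(points[i:i + 3]) for i in range(len(points) - 2)]
    return (overall + sum(local) / len(local)) / 2


def heading_changes_deg(points):
    """Signed change in walking direction between successive steps (assumption)."""
    headings = [math.degrees(math.atan2(b[1] - a[1], b[0] - a[0]))
                for a, b in zip(points, points[1:])]
    # Wrap each difference into [-180, 180).
    return [(h2 - h1 + 180) % 360 - 180 for h1, h2 in zip(headings, headings[1:])]


def standard_error(values):
    """Standard error of the mean: sample SD / sqrt(n)."""
    return stdev(values) / math.sqrt(len(values))


def variety_features(bouts):
    """Four variety measures for one infant; bouts = list of step-position lists."""
    long_bouts = [b for b in bouts if len(b) >= 4]
    turns = [c for b in bouts for c in heading_changes_deg(b)]
    return {
        "path_shape": standard_error([bout_curvature(b) for b in long_bouts]),
        "step_direction": standard_error(turns),
        "path_length": standard_error([len(b) for b in bouts]),
        "starts_stops": len(bouts),
    }
```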

Following the same procedure used for the infant-trained team in Tournament 1, we created 5 robot-training courses using the paths of the 15 infants in each group. Thus, the robot training courses represented the combination of dimensions in each group of infant paths. **Figure 3** shows the 5 infant-trained teams, distinguished by color. The green team was characterized by high variation in step direction (SD of the change in degrees between each pair of steps), a high number of stops, low variation in path shape (SD of path curvature), and relatively low variation in path length (SD of the number of steps per bout). The yellow team was characterized by relatively high variation in path shape, a high number of stops, and low variation in step direction and path length. The blue team was characterized by high variation in path shape and path length, a low number of stops, and relatively low variation in step direction. The red team was characterized by high variation in path shape and length, a relatively low number of stops, and low variation in step direction. The purple team had relatively high variation along all dimensions. **Figure 3A** depicts examples of paths from each training course. As a baseline, we trained an additional team on a straight-line training course, just as for the line-trained team in Tournament 1.

FIGURE 3 | Exemplar robot training paths. (A) Exemplar paths from each of the five robot training courses built from clustered infants' walking paths. Colored lines show the path trajectory; dashes indicate steps; black dots indicate stops. (B) Bars showing relative combinations of walking features for each team's training course. Values are scaled from the minimum to the maximum across teams.

TABLE 4 | Final values of optimized parameters after each training regimen in Tournament 2.

#### Results and Discussion

Overall, natural differences in the variety of infants' paths resulted in a consistent pattern of wins and losses in RoboCup, suggesting that some combinations of variation are more beneficial for functional walking than others. Final values of the optimized walking parameters (**Table 4**) indicate that although all teams were trained on variable paths, variability in more aspects of walking leads to improved whole-body control (e.g., longer constant offset between the torso and the feet, higher proportion of time the swing foot spends in the air, torso higher from the ground) and faster movement (e.g., shorter duration of single steps).

The purple-trained team won Tournament 2 with 11,786 League points, winning 3,420 games, tying 1,526, and losing only 54. The red-trained team came in second, followed by the blue-trained team, the yellow-trained team, the green-trained team, and the line-trained team. As expected, the line-trained team performed worse than any team trained on infant paths. The line-trained team never beat an infant-trained team, scored no goals, and accumulated 155 ties (see **Table 5** and number of league points in **Figure 4A**). **Figure 4B** depicts the wins of each team (rows) against all possible opponents (columns). The blue gradients across rows show the patterns of wins and losses. The win/loss matrix is not symmetrical because teams may tie.

TABLE 5 | Scoring table for Tournament 2 describing results across all games.

The purple-trained team also had the highest average goal difference across the tournament [F(5, 29994) = 5281.72, p < 0.001, one-way ANOVA on average goal difference]. As shown in **Figure 4C**, the purple-trained team was followed by the red-trained team, the blue-trained team, the yellow-trained team, the green-trained team, and the line-trained team, respectively (all Bonferroni post-hoc tests, ps < 0.001). **Figure 4D** shows the goals scored (rows) and conceded (columns) for each set of competitions (see **Table 6** for pairwise comparisons). Taken together, the results from Tournament 2 suggest that teams trained on a course with high variability across most features fared better than teams trained on a course that had low variability on at least one feature.

### GENERAL DISCUSSION

We combined the power of robotic modeling with the power of behavioral observation in infancy research. Specifically, we tested the functional utility of varied paths in infant walking using simulated soccer-playing robots, a model that shares many of the critical components of real-world infant walking (embodied agents moving purposefully through a changing environment). We found that optimizing simulated robot walking using more varied paths in a solitary, uniform training environment led to better functional outcomes in the new context of soccer, where the robots moved through a changeable environment filled with other agents. We suggest that infants' early experience with varied walking paths constitutes a natural training set that is a feature—not a bug—of learning functional walking.

#### The Importance of Variety for Functional Performance: Tournament 1

With a changing body in a changing environment, learning fixed motor solutions is maladaptive (Adolph and Robinson, 2015). Instead, infants must learn to tailor their motor actions to dynamic body-environment relations. Indeed, experienced walking infants display tremendous flexibility and generativity. They distinguish safe from risky ground to within two degrees of slant when navigating slopes and to within one centimeter when crossing drop-offs, gaps, and bridges (for reviews, see Adolph and Robinson, 2015; Adolph and Franchak, 2016). They update their assessment of whether slopes are walkable to take heavy shoulder-packs or slippery-soled shoes into account (Adolph and Avolio, 2000; Adolph et al., 2010). They modify their walking patterns (e.g., by altering step length and velocity) while approaching and crossing obstacles (Gill et al., 2009; Kretch and Adolph, 2017). And they find new solutions on the fly, such as scooting down steep slopes, backing down drop-offs, and using handrails to cross narrow bridges (Adolph and Robinson, 2015).

How do infants learn such flexible, functional motor behaviors? A central principle in motor control is that variable practice minimizes the tendency to learn a fixed motor solution for a specific motor problem and encourages generalization to new variants of the task (Schmidt, 1975). But few laboratory training studies have focused on infant motor skill acquisition, and none involved a training regimen comparable to the magnitude and variety of infants' everyday walking experiences. Outside the laboratory, the flux of everyday life is replete with varied walking paths, varied footwear and clothing, varied ground surfaces and layouts, and varied tasks and activities. Infants' natural walking experience—"variable practice" writ large—may ensure that they learn flexible rather than fixed behaviors.

In the current studies, varied practice was operationally defined as variations in walking paths. Accordingly, in Tournament 1, teams with no training or teams trained to walk along a straight line performed worst. Their narrow experiences did not prepare them to deal with the variety of movements needed to succeed in soccer. Robots trained on more varied paths (circles, squares) fared better. These teams had more experience turning, controlling the two sides of the body differently, and stopping to change direction. The infant-trained team experienced the most varied paths and performed best. Experiencing a wider variety of paths during training led to more functional and adaptive performance in soccer.

#### Variety Is a Feature of Learning to Walk in Infants: Tournament 2

Every infant walking path was varied, and each dimension of variation was present in every robot team. However, because the dimensions are interdependent, high variability on all dimensions is unlikely. For example, a high number of stops likely limits the number of steps in a path, and consequently limits the variability in path shape and step direction. This interdependence among aspects of path variation is a fundamental characteristic of infant walking. Thus, no single feature of variation can explain the pattern of results in Tournament 2, and no single feature was more important than any other. Instead, the relative combination of dimensions differed among training regimens, and these differences were crucial for functional performance. Teams that performed best showed high variability on multiple dimensions of path variation and did not show low variability on any dimension. It is important to note that we tested variability in path features and not their average values. For example, high variability in path curvature does not imply more curved bouts overall, but rather a wide range of path shapes, some straighter and some more curved. Findings from Tournament 2 show that varied experience with multiple walking dimensions results in better functional walking.

indicating magnitude of training success. The purple team scored more goals and conceded fewer than all other teams. (D) The average number of goals scored by each team (rows) against all other opponents (columns). Teams that had high variability in path shape, step direction, and bout length, and had a higher number of starts and stops were more likely to win.

TABLE 6 | Pairwise comparisons for the average goal differences in Tournament 2.

Is training on the most varied infant paths sufficient to beat the current RoboCup world champions? Possibly. The winning 2017 robot soccer team in the relevant division (3D simulation league), "UT-Austin Villa," was also optimized for varied walking, using a hand-selected training course (MacAlpine and Stone, 2018). There are many ways to manipulate training paths to optimize variability. Future work should investigate which specific aspects of variable walking helped our infant team outscore the geometrically trained teams. Simulated "infant-based" training paths that isolate one aspect of variability may help to parse the necessarily interdependent aspects of variability found in real infant walking paths. Future studies along these lines may provide important insights for AI researchers and roboticists about how to improve walking in robots. Regardless, our findings focus on infants and suggest that their everyday walking experience serves as a useful training set for functional walking. Through incidental learning in the course of free play, infants likely learn to walk using a highly adaptive natural training regimen.

### CONCLUSION

What is the best way to learn a generative skill, like functional walking? Answers to this kind of developmental question require appropriate models. Walking and other flexible, adaptive motor skills develop in real bodies, performing real tasks, in real environments. Robots are good models for development because they, like infants, must learn to cope with a body embedded in an environment (Adolph and Robinson, 2015). Similarly, RoboCup is a good domain to test functional walking performance because it requires robots to update their actions in response to a dynamically changing environment. Using robots allowed us to demonstrate that variety in everyday spontaneous activity leads to improved functional performance. Reciprocally, we suggest that AI researchers may benefit by observing everyday learning in human infants and other animals that acquire functional, adaptive performance.

### AUTHOR NOTE

A portion of this work took place at New York University and was supported by NICHD grant # R37HD033486 to KA. Human subjects participation was approved by the NYU IRB (IRB-FY2016-825). A portion of this work took place in the Learning Agents Research Group (LARG) at UT Austin and was supported by NSF (CNS-1305287, IIS-1637736, IIS-1651089, IIS-1724157), Intel, Raytheon, and Lockheed Martin awards to PS. PS serves on the Board of Directors of Cogitai, Inc. The terms of this arrangement have been reviewed and approved by the University of Texas at Austin in accordance with its policy on objectivity in research. We are grateful to Do Kyeong Lee, Orit Herzberg-Keller, Carli Heiman, Joshua Schneider, Rose Egan, and Sinclaire O'Grady for their help with data coding and processing.

#### AUTHOR CONTRIBUTION

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Ossmy, Hoch, MacAlpine, Hasan, Stone and Adolph. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Efficiency of Infants' Exploratory Play Is Related to Longer-Term Cognitive Development

Paul Muentener<sup>1</sup>\*, Elise Herrig<sup>2</sup> and Laura Schulz<sup>2</sup>

*<sup>1</sup> Department of Psychology, Tufts University, Medford, MA, United States, <sup>2</sup> Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, United States*

In this longitudinal study we examined the stability of exploratory play in infancy and its relation to cognitive development in early childhood. We assessed infants' (*N* = 130, mean age at enrollment = 12.02 months, *SD* = 3.5 months; range: 5–19 months) exploratory play four times over 9 months. Exploratory play was indexed by infants' attention to novelty, inductive generalizations, efficiency of exploration, face preferences, and imitative learning. We assessed cognitive development at the fourth visit for the full sample, and again at age three for a subset of the sample (*n* = 38). The only measure that was stable over infancy was the efficiency of exploration. Additionally, infants' efficiency score predicted vocabulary size and distinguished at-risk infants recruited from early intervention sites from those not at risk. Follow-up analyses at age three provided additional evidence for the importance of the efficiency measure: more efficient exploration was correlated with higher IQ scores. These results suggest that the efficiency of infants' exploratory play can be informative about longer-term cognitive development.

#### Edited by:

*Kathy Hirsh-Pasek, Temple University, United States*

#### Reviewed by:

*Jennifer B. Wagner, College of Staten Island, United States*

*Ora Oudgenoeg-Paz, Utrecht University, Netherlands*

#### \*Correspondence:

*Paul Muentener paul.muentener@tufts.edu*

#### Specialty section:

*This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology*

Received: *30 June 2017* Accepted: *16 April 2018* Published: *31 May 2018*

#### Citation:

*Muentener P, Herrig E and Schulz L (2018) The Efficiency of Infants' Exploratory Play Is Related to Longer-Term Cognitive Development. Front. Psychol. 9:635. doi: 10.3389/fpsyg.2018.00635*

Keywords: exploratory play, cognitive development, IQ, infancy, longitudinal design

## INTRODUCTION

Parents, educators, and researchers (Groos, 1901; Vygotsky, 1934/1962; Piaget, 1962; Berlyne, 1969; Bruner et al., 1976; Rubin et al., 1983; Power, 2000) all tend to believe that children learn through exploratory play; however, understanding the relation between play and cognitive development remains an ongoing challenge. The causal relation between exploration and cognitive development has been proposed in both directions: smarter, more behaviorally flexible species are more likely to play (Groos and Baldwin, 1898; Bjorklund, 1997; Pellegrini et al., 2007), and play may support the acquisition of motor (Bjorklund and Brown, 1998; Pellegrini and Smith, 1998), cognitive (Hutt and Bhavani, 1972; Singer et al., 2006), and social skills (Leslie, 1987; Astington and Jenkins, 1995; Youngblade and Dunn, 1995; Taylor and Carlson, 1997). Play has also been used to assess epistemic curiosity, with several studies showing that children selectively engage in exploratory play given opportunities for information gain (Schulz and Bonawitz, 2007; Schulz et al., 2008; Cook et al., 2011; Bonawitz et al., 2012; Buchsbaum et al., 2012; Legare, 2012, 2014; Gopnik and Walker, 2013; Gweon et al., 2014; Stahl and Feigenson, 2015; van Schijndel et al., 2015). Research also suggests that early advances in exploratory play or direct facilitation of exploratory play may have cascading effects on children's learning about the physical and social world (Needham et al., 2002; Sommerville et al., 2005; Libertus and Needham, 2010; Rakison and Krogh, 2012; Schwarzer et al., 2013; Gerson and Woodward, 2014; Oudgenoeg-Paz et al., 2015).

Although a large body of research has attempted to characterize exploratory play over the first few years of life, defining exploratory play remains a challenge. Indeed, although the clinical diagnosis of developmental disorders such as Autism Spectrum Disorders (ASD) and Attention Deficit Hyperactivity Disorder (ADHD) is partly based upon the judgment that children engage in atypical exploratory play (e.g., restricted/repetitive play in ASD, and distracted/disorganized play in ADHD; American Psychiatric Association, 2013), distinguishing typical and atypical exploratory play remains largely a matter of intuition. Studies that have tried to characterize exploratory play more rigorously have focused largely on how simple object manipulation changes with age. Such studies have assessed, for instance, the number of objects children play with, the amount of time children play with each object, and the types of actions they engage in (e.g., spinning, touching, dropping, and banging) (McCall, 1974; Fenson et al., 1976; Ruff, 1984; Palmer, 1989; Rochat, 1989; Whyte et al., 1994; Morange-Majoux et al., 1997; Bourgeois et al., 2005; Kahrs et al., 2013). Other work has focused on visual exploration, documenting changes in scan patterns and rate of habituation to novel stimuli over infancy (Fagan, 1974; Rose et al., 1982, 2001; Rose, 1983; Bornstein and Benasich, 1986; Colombo et al., 1987, 1988, 2004; Rose and Feldman, 1987; Richards, 1997). Finally, other work has focused on the relation between visual attention and manual action during object exploration (Fenson et al., 1974; Johnson and Brody, 1977; Ruff, 1986; Ruff and Dubiner, 1987; Oakes et al., 1991, 2002; Ruff et al., 1992; Oakes and Tellinghuisen, 1994; Cassia and Simion, 2002; Perone and Oakes, 2006; Soska et al., 2010; Baumgartner and Oakes, 2013). Such research suggests that over development, children's visual exploration becomes more efficient (e.g., reflected in faster encoding of visual information), their manual exploration becomes more complex, and the link between their visual and motor systems becomes more integrated. These developments may represent increasingly sophisticated cognitive skills, more opportunities for learning, or both.

It is also the case that relatively few studies have looked at whether individual differences in infants' exploratory play are related to longer-term cognitive development. Rather, studies have looked either at proxy measures, arguably related to but not necessarily specific to exploratory play, or at single-time-point correlations between measures of exploration and measures of cognition. As a result of such work, we now know, for instance, that one of the most basic measures of visual exploration in infancy, rate of visual habituation, is a better predictor of IQ than standard developmental assessments such as the Bayley Scales of Infant Development, the Battelle Developmental Inventory, and the Gesell Developmental Assessment (see McCall and Carriger, 1993 for review and meta-analysis). Similarly, measures from a detailed multivariate analysis of 60 min of typically-developing infants' object exploration and overall motor development at 5 months correlate with children's academic achievement at 14 years of age (Bornstein et al., 2013). Additionally, infants categorized as high risk for developmental delay (e.g., infants born prematurely, with Down Syndrome, or with an older sibling with ASD) differ from full-term infants in their simple object interactions (e.g., touching, rotating, and transferring) (Sigman, 1976; Kopp and Vaughn, 1982; Ruff et al., 1984; Loveland, 1987; Kavsek and Bornstein, 2010; de Almeida Soares et al., 2012; de Campos et al., 2013; Koterba et al., 2014; Kaur et al., 2015; Zuccarini et al., 2016).

Collectively, this research suggests that something about early exploratory play correlates with cognitive development, but which precise aspects of exploratory play correlate with cognitive development remains unclear. Across studies, researchers have looked variously at discursive vs. focused exploratory play in preschoolers and divergent and convergent thinking in 7- to 10-year-olds (Hutt and Bhavani, 1972), stimulation seeking (including, but not limited to, exploratory play) in 3-year-olds and IQ in 11-year-olds (Raine et al., 2002), fine and gross motor development in infancy and literacy at age seven (Viholainen et al., 2006), and variables indexing both exploratory activity and motor maturity (upper and lower body coordination, locomotion, and balance) in infants and IQ in adolescence (Bornstein et al., 2013). The diversity of such studies speaks to a compelling relation between early exploration and later cognitive development, but raises questions about whether more active infants are more likely to thrive overall, whether any particular aspects of exploratory play might be especially informative, and how any particular aspects of exploratory play may be related to each other. In the current study we attempt to address each of these questions. Specifically, using naturalistic measures of exploratory play (i.e., measures that could be easily used in educational, clinical or home environments) we aimed to see (1) whether we could identify diverse, non-overlapping measures of exploration; (2) whether any of these measures were stable longitudinally over infancy, and if so, (3) whether any stable measure of exploratory play correlated with shorter- and longer-term measures of cognitive development.

In choosing which aspects of exploratory play to assess, we were motivated by prior theoretical and empirical work on the role of exploratory play in early cognitive development, and therefore took a rather broad approach in designing our measures. Our choice of measures was motivated by two overarching perspectives: rational constructivist accounts of children's learning (e.g., Gopnik and Wellman, 2012; Schulz, 2012; Xu and Kushnir, 2013) and social learning theories (Vygotsky, 1934/1962; see also, Tomasello, 2000; Csibra and Gergely, 2006; Meltzoff, 2007). Below, we briefly discuss some of the work underlying the choice of each of the five items in the exploration assessment. Critically, the five items were chosen to be distinctive rather than exhaustive. Our goal was not to fully characterize exploratory play in infancy but to capture components of play that seemed likely to draw on distinct cognitive skills, across different phases of exploratory play (i.e., choosing which objects to explore as well as engaging in different actions on those objects), all while requiring approximately equivalent motor skills (i.e., reaching for and manipulating objects). If exploratory play in early childhood relies upon a single cognitive process, we would expect some or all of these measures of exploratory play to correlate with each other. If, on the other hand, as hypothesized, exploratory play comprises a distinct, non-overlapping set of cognitive processes, and our measures effectively assess this, then there should be no correlations among our diverse measures of exploratory play.

Rational constructivist theories propose that at least in simple contexts, children integrate prior knowledge and data to guide their inferences in ways that can be characterized by formal accounts of learning (Tenenbaum et al., 2011). These accounts view children's exploration as an effective means of gathering evidence to inform and update learners' beliefs about the world (see Schulz, 2012 for discussion and review). Here we focus on three aspects of rational exploration: attention to novelty, inductive generalization, and efficiency of exploration.

As noted, infants' attention to novelty has been shown to be one of the most robust predictors of cognitive development. Studies of visual attention have shown that faster rates of visual habituation (e.g., fewer trials to reach a habituation criterion, or a greater decrement in looking time across habituation trials), as well as a greater degree of novelty preference during looking-time studies (e.g., longer looking at novel than at familiar images), are correlated with higher IQ and distinguish full-term infants from pre-term infants at risk for developmental delay (for review, see McCall and Carriger, 1993; Kavšek, 2004; Fagan et al., 2007). These studies support the argument that encoding and storing visual information in memory more quickly might allow more opportunities both to integrate this information with existing knowledge and to encode new information. To the extent that these measures index visual exploration, these findings support the hypothesis that early measures of exploration might index broader cognitive abilities. Because we were interested in play per se, we used manual exploration rather than looking time to assess children's attention to novelty.

The inductive generalization measure was similarly motivated by rational constructivist approaches to early learning. Research suggests that infants can draw rich generalizations from sparse data (Dewar and Xu, 2010; Gweon et al., 2010; Téglás et al., 2011) and that the ability to make inductive generalizations supports much of children's theory-building over the first several years of life (for review, see Schulz, 2012). Thus, it seemed plausible that children's ability to make inductive generalizations would be positively related to cognitive development. Here we assessed infants' ability to extend non-obvious properties demonstrated on a target toy to a novel object that had a similar shape but a different color or pattern (e.g., Baldwin et al., 1993; Welder and Graham, 2001).

The efficiency of exploration measure was motivated by work on the increasing sophistication of exploratory play over infancy (e.g., Ruff et al., 1992) and by the idea that this sophistication might play a role in rational exploration (e.g., Bonawitz et al., 2011; Gopnik and Walker, 2013; Legare, 2014; Stahl and Feigenson, 2015; van Schijndel et al., 2015; Sim and Xu, 2017). Further support for this measure comes from the longitudinal work mentioned above, which suggests that a factor combining motor coordination and efficient exploratory behavior in infancy correlates with longer-term cognitive development (Bornstein et al., 2013). In the current study, efficient exploration was indexed by the ability to find different target functions on a multi-function toy.

In addition to these three measures focused on rational constructivist learning, we included two measures intended to assess social aspects of early exploration. First, motivated by considerable evidence that selective attention to faces and face-like stimuli emerges early (for review, see Morton and Johnson, 1991; Johnson et al., 2015; see also Fantz, 1963; Farroni et al., 2002; Johnson, 2005; Frank et al., 2009, 2012; Reid et al., 2017), we thought it possible that such selective attention might encourage selective exploration. Previous work on exploratory play has focused almost exclusively on object exploration; however, it seemed possible that selective exploration of faces might correlate with later cognitive development. Thus, because we were interested in exploratory play, we assessed infants' preferential exploration of stimuli with faces over stimuli without faces in a reaching task, rather than in a traditional preferential looking task.

The second social aspect we assessed was children's imitative learning. We reasoned that although infants' exploratory play is typically assessed as spontaneous, self-directed exploration, in the cultures in which these assessments typically occur, caregivers routinely use ostensive, pedagogical cues to demonstrate object properties to children. Researchers have suggested that infants' responsiveness to pedagogical cuing plays a critical role in cultural transmission (Tomasello, 2000; Csibra and Gergely, 2006), and empirical evidence suggests that the presence or absence of such social cuing changes the way children explore their environment (Senju and Csibra, 2008; Bonawitz et al., 2011; Butler and Markman, 2014; Gweon et al., 2014; Butler and Tomasello, 2016; Shneidman et al., 2016). Motivated by the idea that the ability to use these cues to filter out distractors and constrain initial exploration might be an important index of cognitive development, we assessed children's imitation of an object function from an adult's pedagogical demonstration.

Thus, to address our first two aims, we assessed the distinctiveness and stability of children's performance on five aspects of exploratory play: attention to novelty, inductive generalization, efficiency of exploration, face preference, and imitative learning. To capture a broad and representative view of exploratory play over development, we assessed infants over a relatively large age range (5–19 months of age) and across differing levels of risk for developmental delay (i.e., a subset of infants were recruited from early intervention sites). In total, throughout the first phase of our study (Phase 1), we assessed children's performance on the five exploratory play tasks four times over a 9-month period.

Given that researchers have theorized that the five aspects of exploratory play measured in the current study contribute to learning over the first few years of life, we hypothesized that children's performance on the exploratory play measures might also be indicative of longer-term cognitive development and intelligence. To address this third aim, we assessed the relation between children's exploratory play behaviors and their cognitive development at two time points: in the shorter term at the end of Phase 1 (shorter-term cognitive development assessments described below in Methods) and in the longer term at 3 years of age (Phase 2, described below in Methods). We looked only at those exploratory play behaviors that were stable over Phase 1. Of course, we anticipated that significant differences would emerge across development at any given time point (e.g., we might expect older children to engage in more efficient exploration than younger children), as well as within participants across Phase 1 (e.g., we might expect children to become more efficient in their exploration over time). Thus, rather than compare children's raw exploratory behavior on each task with other cognitive measures, we looked at how each child performed relative to similar-aged peers at each time point; although significant developmental changes were likely to occur in our battery of tasks, assessing individual children's abilities relative to their peers should normalize any group-level developmental differences. We reasoned that if children's exploration relative to their peers at one time point failed to predict their exploration relative to their peers at another time point, it was also unlikely to correlate with broader cognitive development.
However, to the degree that any measures of exploratory behavior remained stable relative to peers over development, we could then ask whether exploratory play correlates with shorter-term measures of cognitive development and whether exploratory play in infancy correlates with cognitive outcomes later in childhood.

In choosing measures of cognitive development, we focused on broad cognitive abilities that seemed likely to index overall learning and knowledge construction. Specifically, for shorter-term cognitive development we focused on vocabulary size and the ability to delay gratification. Both receptive and productive language abilities contribute to IQ tests such as the Wechsler Preschool and Primary Scale of Intelligence (WPPSI; Wechsler, 2012), and vocabulary size in infancy and toddlerhood is correlated with later IQ (Bornstein, 1985; Marchman and Fernald, 2008). Several researchers have also argued that the development of executive function plays a role in conceptual change and theory development across childhood (Carlson and Moses, 2001; Carey et al., 2015; Powell and Carey, 2017). Within the set of abilities that make up executive function (e.g., inhibition, set shifting, working memory), we focused on the ability to delay gratification in early childhood, as it has been shown to correlate with higher IQ later in development (Mischel et al., 1989; Shoda et al., 1990). For the longer-term cognitive development measures, in addition to measuring children's IQ and ability to delay gratification, we included an assessment of children's social communication abilities, because we had also focused on social aspects of exploratory play.

To summarize, we assessed the stability and distinctiveness of five aspects of exploratory play in infancy, as well as their potential relation to shorter- and longer-term cognitive development. The study had two phases: in Phase 1 (Exploratory Play Assessment and Shorter-term Cognitive Development Assessment), we assessed infants' exploratory play four times over a 9-month period, and in Phase 2 (Longer-term Cognitive Development Assessment), these children returned for follow-up cognitive assessments at age three. Our overall hypothesis was that components of exploratory play in infancy would be related to cognitive development later in childhood; however, because there is broad agreement among researchers that the individual components tested here may be important for early learning but little consensus as to their relative importance, we remained agnostic as to which specific components of infants' exploratory play would correlate with cognitive development. Phase 1 allowed us to assess the independence of the exploratory tasks from each other, their stability across testing sessions, and their sensitivity to group differences between at-risk and typically developing infants. As an exploratory analysis, it also allowed us to investigate possible correlations between items on the exploratory play assessment and shorter-term cognitive development in order to motivate a targeted hypothesis for Phase 2. Following these exploratory analyses, we restricted our analyses of the longer-term relations between exploratory play and cognitive development to the specific components of early exploration that were correlated with shorter-term measures of cognitive development in Phase 1. To draw conclusions about the overall relation between exploratory play and cognitive development, we then assessed the relation between these components and cognitive development using both average performance across Phase 1 and performance at the first Phase 1 visit.

### METHODS

#### Participants

We recruited infants between 5 and 19 months of age to participate in this longitudinal study of exploratory play. To increase variability in the sample, we recruited both infants from a local children's museum and infants in early intervention programs. We refer to the former subset of infants as "typically developing" because these infants were not born premature, were not enrolled in early intervention programs, and had parents who did not report any health concerns for them. We refer to the latter subset of infants as "at-risk," because these children were enrolled in early intervention services due to birth complications and social risk factors and were expected to be at increased risk for developmental delay.

For the typically-developing sample, 262 infants were initially recruited at a local children's museum and asked to participate in Visit 1 of the exploratory play assessment (i.e., the first session of this longitudinal study; full procedure described below). At the conclusion of this session, all families were asked if they were interested in continuing in the remainder of the longitudinal study. Of these 262 infants, 196 (74.81%) families agreed to be contacted for subsequent visits; however, only 120 infants (45.80%) were scheduled and participated past Visit 1. These 120 infants were contacted every 3 months to participate in Visits 2–4 of Phase 1 of the study. Infants needed to complete at least 3 of the 4 Phase 1 visits in order to be included in the final sample; 96 infants (80.00%) met this criterion, while the remaining infants had families who moved during Phase 1 (n = 7), were no longer interested in participating after Visit 2 (n = 4), or expressed interest in participating but were unable to schedule 3 or more visits (n = 13).

For the at-risk sample, infants were recruited for participation from early intervention programs. Infants had been referred to the early intervention programs due to a combination of risk factors including prematurity, low birth weight, birth complications, and social risk factors (in particular, low socioeconomic status and risk for maternal depression). Contacted families were concurrently enrolled in a separate study assessing maternal problem-solving strategies. Forty-two infants were recruited initially; 38 (90.48%) were scheduled and participated past Visit 1. Of these 38 infants, 34 infants (89.47%) were assessed at least three of the four Phase 1 visits; the remaining infants had families who moved during Phase 1 (n = 1), were no longer interested in participating after Visit 2 (n = 1), or were interested in participating but unable to schedule 3 or more visits (n = 2).

Thus, the final sample of participants who participated in at least three of four Phase 1 visits over the 9-month period included 130 children (69 female): 96 typically developing infants (n = 51 female) and 34 at-risk infants (n = 18 female) (overall mean age at enrollment: 12.02 months, SD = 3.5 months; range: 5–19 months).

Families were contacted again when their child turned three to participate in Phase 2 of the study. All follow-up visits were completed within approximately 6 months of the child's third birthday. Of the initial sample of 130 infants, 38 children returned for Phase 2 (29.23%; mean age at Phase 2 assessment: 3.23 years, SD = 0.15 years; range 36–43 months); two of these children were from the at-risk sample.
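As a quick arithmetic check, the retention percentages reported above can be recomputed from the raw counts. This is a minimal sketch; the helper `pct` and the dictionary labels are our own illustrative names, and every count is taken directly from the Participants section.

```python
# Recruitment and retention counts as reported in the Participants section.
def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)

typical = {"recruited": 262, "agreed": 196, "past_visit_1": 120, "final": 96}
at_risk = {"recruited": 42, "past_visit_1": 38, "final": 34}

# Attrition reasons should account for everyone who continued past Visit 1
# but did not reach the 3-of-4-visit criterion.
assert typical["final"] + 7 + 4 + 13 == typical["past_visit_1"]
assert at_risk["final"] + 1 + 1 + 2 == at_risk["past_visit_1"]

final_sample = typical["final"] + at_risk["final"]  # children in the Phase 1 sample

print(pct(typical["agreed"], typical["recruited"]))        # agreed to recontact
print(pct(typical["past_visit_1"], typical["recruited"]))  # typical: past Visit 1
print(pct(typical["final"], typical["past_visit_1"]))      # typical: met criterion
print(pct(at_risk["past_visit_1"], at_risk["recruited"]))  # at-risk: past Visit 1
print(pct(at_risk["final"], at_risk["past_visit_1"]))      # at-risk: met criterion
print(pct(38, final_sample))                                # returned for Phase 2
```

Each printed value matches the corresponding percentage in the text (74.81, 45.80, 80.00, 90.48, 89.47, and 29.23).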

#### Procedure

The study has two phases: the Exploratory Play Assessment and Shorter-term Cognitive Development Assessments (Phase 1) and Longer-term Cognitive Development Assessment (Phase 2). See **Figure 1** for study design. All procedures were approved by the MIT Institutional Review Board with written informed consent provided by the parents of all participants in this study.

In Phase 1 of the study (**Figure 1**), we administered an exploratory play assessment to infants four times over a 9-month period. Children began Phase 1 when they were 5–19 months of age and ended Phase 1 when they were 14–28 months of age. After the exploratory play assessment was administered at the final (fourth) Phase 1 visit, parents were asked to complete the MacArthur-Bates Communicative Development Inventory (MCDI; Fenson et al., 2000). To assess the specificity of any significant relation between exploratory play and shorter-term cognitive development, children's executive function skills were assessed on a modified delay of gratification task, and parents were asked to fill out a questionnaire about assessment and diagnosis of developmental disorders as well as parental concerns.

Children returned for Phase 2 of the study at 3 years of age, at which time an independent lab, with no knowledge of the children's performance on the exploratory play assessment, assessed the children's IQ using the Wechsler Preschool and Primary Scale of Intelligence (WPPSI; Wechsler, 2012). To determine the specificity of any relation between exploratory play and IQ, the children's executive functioning (Mischel et al., 1989) and social communication abilities (Rutter et al., 2003) were also assessed.

FIGURE 2 | Sample stimuli images. All four Efficiency stimuli are shown below. Sample stimuli from the remaining tasks are shown below; see Table 1 for a description of the full stimulus set.

#### Phase 1: Exploratory Play Assessment

The exploratory play assessment took approximately 15 min to complete. Infants were tested in a quiet room in their own homes, a private testing room in our laboratory, or an onsite laboratory at a children's museum; a preliminary assessment early in the data collection process showed that the procedure could be implemented equally well across testing locations. Parents were present throughout the procedure but were not told the dependent measures or the directional hypotheses for any task or for the study overall. A striped red tablecloth was placed between the experimenter and the child in order to control stimulus placement throughout the study. The procedure described below was the same at each of the four Phase 1 visits; however, we used different stimuli at each visit (see **Figure 2** for example stimuli; see **Table 1** for full details). The same experimenter administered the exploratory play assessment at each visit across Phase 1. All sessions were videotaped, and all behaviors were coded from videotape. Although this experimenter was present across all Phase 1 visits, the experimenter did not code children's performance on these tasks and did not view coded data for individual children when conducting Phase 1 visits.

#### **Warm-up phase**

This phase helped familiarize the children with the experimenter and established the extent of each child's reach. During this phase, the experimenter established the child's furthest reach to the left, right, and center of the tablecloth with a toy not in the stimulus set. Whenever children had to choose between stimuli during Phase 1, the experimenter placed the items at the limits of each child's reach.

#### **Attention to novelty task**

We assessed children's exploration of novel toys on two trials (Fenson et al., 1974; Sigman, 1976; Oakes et al., 1991, 2002). At the start of each trial, the experimenter said, "Look at this!" while holding up a toy (familiar toy). The experimenter then placed the familiar toy within the child's reach and allowed the child to play for 30 s. The experimenter then retrieved the familiar toy and showed the child the familiar toy alongside a new toy (novel toy). The experimenter then placed both toys equidistant to the left and right of the child (counterbalanced across children) and allowed the child to play for up to 90 s. The experimenter then repeated this procedure with a new pair of stimuli on the second trial. We coded the child's latency to touch the novel toy on each trial and averaged the two latencies to compute an average latency.
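The scoring rule for this task is a simple mean of per-trial latencies. The sketch below illustrates it; the function name is hypothetical, and because the text does not say how a trial with no touch of the novel toy was scored, the sketch assumes such trials are scored at the 90-s play limit.

```python
from __future__ import annotations

def mean_novelty_latency(latencies_s: list[float | None], ceiling_s: float = 90.0) -> float:
    """Average latency (s) to first touch the novel toy across trials.

    None marks a trial on which the novel toy was never touched. Scoring such
    trials at the 90-s play limit is an assumption, not the authors' stated rule.
    """
    if not latencies_s:
        raise ValueError("need at least one trial")
    scored = [ceiling_s if t is None else t for t in latencies_s]
    return sum(scored) / len(scored)

# Two trials, as in the task: touched after 4 s and after 10 s.
print(mean_novelty_latency([4.0, 10.0]))
```

Lower values indicate a faster approach to the novel toy.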

#### **Efficiency of exploration task**

We assessed how long children explored a novel multi-function toy on a single trial and how many functions of the toy they discovered (adapted from Bonawitz et al., 2011; Gweon et al., 2014; Shneidman et al., 2016). At the start of the trial, the experimenter said, "Look at this!", placed the toy within the child's reach, and allowed the child to play. Play ended at whichever of the following came first: (1) the child stopped contacting the toy for 5 s, the toy was re-introduced to the child, and the child again stopped contacting the toy for 5 s; (2) the child verbally indicated being finished; or (3) 5 min of play time elapsed. The functions of each toy were pre-specified. We coded the total time the child was in contact with the toy as well as the number of pre-specified functions the child discovered. We divided the number of functions the child found by the total amount of time the child played with the toy to yield an efficiency score. Because this measure does not compensate for the fact that later-discovered functions may be more difficult to find, it is a relatively conservative measure of the efficiency of children's exploration.
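The efficiency score is a ratio of functions found to play time. A minimal sketch follows, assuming play time is the coded contact time measured in seconds (the text gives the ratio but not its units); the function and constant names are hypothetical.

```python
MAX_PLAY_S = 300.0  # 5-min ceiling on play time, from the task description

def efficiency_score(functions_found: int, contact_time_s: float) -> float:
    """Pre-specified functions discovered per second of play with the toy.

    Assumption: the denominator is the coded contact time in seconds.
    """
    if not 0 < contact_time_s <= MAX_PLAY_S:
        raise ValueError("contact time must be positive and at most 5 min")
    if functions_found < 0:
        raise ValueError("function count cannot be negative")
    return functions_found / contact_time_s

# Same number of functions, different play time: the faster explorer scores higher.
print(efficiency_score(3, 60.0))
print(efficiency_score(3, 120.0))
```

As the text notes, every function counts equally here, so a child who finds one hard-to-reach function gets no extra credit, which is what makes the score conservative.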

#### **Inductive generalization task**

We tested children's ability to generalize non-obvious properties of objects (Baldwin et al., 1993; Welder and Graham, 2001). At the start of each trial, the experimenter said, "Look at this!" while holding up a novel toy. She then demonstrated a target action on the toy (e.g., shaking it to make a rattle noise) six times. The experimenter then gave the child a new toy that was the same shape but differed in color and pattern. The child's toy was inert (e.g., it did not make a noise when shaken). The child was allowed to play for up to 30 s, and we coded the number of target actions the child produced. The experimenter repeated this procedure on a second trial with new toys and outcomes. On the second trial, the child's toy did produce the target outcome, so that the child could not infer that the toys would never produce the target outcome. The experimenter then repeated the procedure on a third trial, again with new toys and outcomes; as in the first trial, the child's toy did not produce the target outcome. We averaged the number of target actions the child produced on the first and third trials to yield the average number of attempts.
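The key detail in this scoring rule is that only the two inert-toy trials count. A minimal sketch with a hypothetical function name:

```python
def generalization_attempts(target_actions_per_trial: list[int]) -> float:
    """Average target actions on the inert-toy trials (trials 1 and 3).

    Trial 2, on which the child's toy did produce the outcome, is excluded
    from the score; it serves only to keep children from concluding that
    the toys never work.
    """
    if len(target_actions_per_trial) != 3:
        raise ValueError("expected counts for exactly three trials")
    first, _working_toy_trial, third = target_actions_per_trial
    return (first + third) / 2

# 4 attempts on trial 1, 9 on the working-toy trial, 2 on trial 3.
print(generalization_attempts([4, 9, 2]))
```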

#### **Face preference task**

We assessed whether children preferred toys with schematic upright faces to toys with schematic scrambled faces using a forced-choice paradigm (adapted from Morton and Johnson, 1991). At the start of each trial, the experimenter said, "Look at this one!" while holding up a schematic face and then a scrambled face, both mounted on discs. The experimenter then placed the discs equidistant to the left and right of the child (counterbalanced across children) and allowed the child to make a choice. This procedure was repeated twice more with new stimuli. We coded whether the child chose the face on each trial, yielding a percentage preference for face stimuli.
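With three forced-choice trials, the face-preference score can take only the values 0, 33.3, 66.7, or 100%. A minimal sketch with a hypothetical function name; how trials without a clear choice were handled is not stated, so the sketch assumes every trial yields a choice.

```python
def face_preference_pct(chose_face: list[bool]) -> float:
    """Percentage of forced-choice trials on which the upright face was chosen.

    Assumption: each entry is one trial with a definite choice.
    """
    if not chose_face:
        raise ValueError("need at least one trial")
    return 100 * sum(chose_face) / len(chose_face)

print(face_preference_pct([True, True, False]))  # two of three trials
```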

#### **Imitative learning task**

We assessed the extent to which children would imitate a pedagogically demonstrated target action (e.g., Southgate et al., 2009). Pilot testing on each toy was used to identify children's initial actions at baseline (e.g., playing with the feet and antennae of a plush caterpillar toy); the experimenter's target actions were always actions never produced by children at baseline. At the start of each trial, the experimenter said, "Look at my toy! This is my toy. I am going to show you how my toy works. Watch!" and then demonstrated a target action (e.g., pushing the center of the caterpillar toy to make a squeaking noise). The experimenter then said, "Wow! That's how my toy works. Watch, this is how my toy works," and demonstrated the same target action two additional times. The experimenter then said, "Do you want to play with my toy?" and placed the toy within the child's reach. We coded whether the child imitated the experimenter's action on the first interaction with the toy (1 or 0). This procedure was repeated on a second trial with a new toy. We summed across the two trials to yield a total imitation score.
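The imitation score is the sum of two binary codes, so it ranges from 0 to 2. A minimal sketch with a hypothetical function name that also guards against codes other than 0 or 1:

```python
def imitation_score(first_action_imitated: list[int]) -> int:
    """Sum of per-trial codes: 1 if the child's first action matched the demo, else 0."""
    if any(code not in (0, 1) for code in first_action_imitated):
        raise ValueError("each trial is coded 1 (imitated) or 0 (did not imitate)")
    return sum(first_action_imitated)

print(imitation_score([1, 0]))  # imitated on the first toy only
```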

#### Phase 1: Shorter-Term Cognitive Development Assessment

We assessed children's vocabulary and executive function abilities and asked parents about any developmental concerns as measures of shorter-term cognitive development. These assessments occurred at the final (fourth) Phase 1 visit, when children were between 14 and 28 months of age. For two participants, the vocabulary measure and parent questionnaire were completed over the phone because the participants did not complete a fourth visit; these participants did not provide data for the delay of gratification task.

FIGURE 3 | Visual depiction of coding procedure. Coders coded no more than one task within a visit and no more than one visit for a given task. For example, if a coder coded the Visit 1 Attention to Novelty task for a participant, then that coder did not code any other Visit 1 task or the Attention to Novelty task on any other visit for that participant.

#### **Vocabulary**

To assess children's vocabulary size, parents completed the short form of the MacArthur-Bates Communicative Development Inventory (MCDI), which assesses children's receptive and productive vocabulary (Fenson et al., 2000). The inventory was scored according to the child's age corrected for prematurity. Children whose corrected age was under 18 months were assessed using the CDI: Words and Gestures form; children whose corrected age was over 18 months were assessed using the CDI: Words and Sentences form. We determined children's percentile scores based on the productive vocabulary measure across both forms.
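Form assignment depends on corrected age, which the text does not define. The sketch below assumes one common convention, subtracting the weeks born before a 40-week term from chronological age; the month-length constant, the function names, and the handling of a corrected age of exactly 18 months (which the text leaves unspecified) are all assumptions.

```python
TERM_WEEKS = 40          # assumption: full-term reference for age correction
DAYS_PER_MONTH = 30.44   # assumption: average month length used for conversion

def corrected_age_months(chronological_days: int, gestational_weeks: float) -> float:
    """Chronological age minus the weeks born before term, expressed in months."""
    weeks_early = max(0.0, TERM_WEEKS - gestational_weeks)
    return (chronological_days - 7 * weeks_early) / DAYS_PER_MONTH

def cdi_form(corrected_months: float) -> str:
    """Pick the MCDI form by corrected age.

    Exactly 18 months is not covered by the text; this sketch assigns it to
    Words and Sentences, which is an arbitrary choice.
    """
    return "Words and Gestures" if corrected_months < 18 else "Words and Sentences"

# Same chronological age (600 days), full-term vs. born at 32 weeks.
print(cdi_form(corrected_age_months(600, 40)))
print(cdi_form(corrected_age_months(600, 32)))
```

The example shows why correction matters near the boundary: the same chronological age can fall on either side of 18 months once prematurity is taken into account.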

#### **Delay of gratification task**

Children were shown that when a ball was placed down a chute, a jingle noise would occur. Children were very interested in this outcome, and most children spontaneously reached for the ball to place it down the chute. The experimenter, however, kept the ball and chute at a distance from the child. The experimenter then placed the ball under a transparent cup, and children were told that they needed to wait to retrieve the ball until the experimenter rang a bell. The experimenter increased the wait time on successive trials (5, 10, 20, 40, and 80 s), and we averaged the time it took for children to retrieve the ball across trials.
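The Phase 1 delay score is the mean retrieval time across trials. A minimal sketch with hypothetical names; the text does not say whether all five trials were always administered, so the sketch accepts between one and five trial times.

```python
from __future__ import annotations

# Required waits on successive trials, in seconds (from the task description).
WAIT_SCHEDULE_S = (5, 10, 20, 40, 80)

def mean_retrieval_time(retrieval_times_s: list[float]) -> float:
    """Average time (s) before the child retrieved the ball, across administered trials.

    Assumption: one entry per administered trial, in schedule order.
    """
    if not 1 <= len(retrieval_times_s) <= len(WAIT_SCHEDULE_S):
        raise ValueError("expected between 1 and 5 trial times")
    return sum(retrieval_times_s) / len(retrieval_times_s)

print(mean_retrieval_time([5.0, 10.0, 12.0]))
```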

#### **Parental concerns checklist**

Parents reported whether their child had ever spent time in a neonatal intensive care unit, had ever been assessed for any developmental disorder, and whether they had any concerns about their child's motor, social, language, or cognitive development. Children who spent time in the neonatal intensive care unit or whose parents reported any concern about their development were given a score of 1; all other children were given a score of 0.
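The concern score is a binary flag. The text's scoring rule mentions only NICU stays and reported concerns, so this minimal sketch uses only those two fields; the class and function names are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class ParentReport:
    nicu_stay: bool
    concerns: set[str]  # subset of {"motor", "social", "language", "cognitive"}

def concern_flag(report: ParentReport) -> int:
    """1 if the child was in the NICU or any developmental concern was reported, else 0."""
    return int(report.nicu_stay or bool(report.concerns))

print(concern_flag(ParentReport(nicu_stay=False, concerns={"language"})))
```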

#### Phase 1: Administration and Coding

A single experimenter administered the exploratory play assessment throughout Phase 1. This experimenter neither coded nor saw any of the Phase 1 data. Eighteen different coders independently coded the videotapes from the Phase 1 exploratory play assessment. The coders were unaware of Phase 2 and of the fact that some children were at risk for developmental delay, and did not know the directional hypotheses for any task or for the overall study. To mitigate any bias from coding repeated tasks for a given child, the coders' responsibilities were distributed such that any given coder coded only one of the five tasks in a single visit and did not code the same task across visits (e.g., a coder who coded the Visit 1 Attention to Novelty task did not code this child on any other Visit 1 task and did not code the Visits 2–4 Attention to Novelty task for that child) (**Figure 3**). All coders were initially trained to code performance on all five exploratory play tasks in this study, using testing sessions from children (n = 20) who had completed only the first Phase 1 visit. All coders achieved high inter-rater reliability (all r's > 0.9) with experienced coders on each of the five exploratory play tasks. An additional two coders coded the delay of gratification task; both were unaware of whether the children were at risk for developmental delay and had no knowledge of children's Phase 1 performance.

TABLE 2 | Descriptive statistics for the 6-month-old cohort's performance on the Phase 1 Exploratory Play Tasks.

*Visits occurred at 3-month-intervals across Phase 1: infants were 6, 9, 12, and 15 months of age for Visits 1–4, respectively.*

TABLE 3 | Descriptive statistics for the 9-month-old cohort's performance on the Phase 1 Exploratory Play Tasks.

*Visits occurred at 3-month-intervals across Phase 1: infants were 9, 12, 15, and 18 months of age for Visits 1–4, respectively.*

#### Phase 2: Longer-Term Cognitive Development Assessment

We contacted families for a follow-up visit within 6 months of the child's third birthday. A researcher from an independent clinical lab who was not involved in any of the previous research, and who was unaware of children's risk status and of their performance in Phase 1, administered the Phase 2 assessments: the IQ test and the delay of gratification task. Parents completed the Social Communication Questionnaire (SCQ) (Rutter et al., 2003) while the children were completing the other tasks. The independent researcher coded all tasks.

#### **IQ task**

We assessed IQ at age 3 with the Wechsler Preschool and Primary Scale of Intelligence (WPPSI, 4th edition). This test assessed children's receptive and productive vocabulary, their general world knowledge, and their visual-spatial abilities. We used the full-scale composite score, derived from the individual subscales of the WPPSI, as an index of children's cognitive development; we also conducted post-hoc analyses using the individual WPPSI verbal comprehension, visual spatial, and working memory subscales.

#### **Delay of gratification task**

This task was modeled after the standard marshmallow delay of gratification task (Mischel et al., 1989). Children first practiced ringing a bell to make an experimenter return to the room after leaving. Children were then left alone in the testing room with a small amount of a preferred snack and told that they could ring the bell immediately to have the small snack or wait until the experimenter returned (without ringing the bell) to have a larger amount of snack. Children remained alone in the testing room for up to 15 min, or until they rang the bell or requested that the experimenter return.

*Visits occurred at 3-month-intervals across Phase 1: infants were 12, 15, 18, and 21 months of age for Visits 1-4, respectively.*

TABLE 5 | Descriptive statistics for the 15-month-old cohort's performance on the Phase 1 Exploratory Play Tasks.

#### **Social communication abilities**

While the children were completing these tasks, parents completed the Social Communication Questionnaire (SCQ) (Rutter et al., 2003). This questionnaire assesses children's basic social communication abilities (e.g., emotional expressions, turn-taking, pretend play). Although this checklist questionnaire was designed primarily as a screening tool to assist in the diagnosis of autism spectrum disorders in children aged 4 years and older, it has been used successfully to screen for social communication abilities more broadly at 3 years of age (Allen et al., 2007; Snow and Lecavalier, 2008). For diagnostic purposes, the SCQ has a cutoff point of 15 for children older than 4 years of age; a lower cutoff point (e.g., 13) has been recommended for younger children (Snow and Lecavalier, 2008). In the current study we used children's raw scores as a continuous measure of their social communicative abilities; however, as we also note below, no child received a score greater than the diagnostic cutoff of 13 on this measure.

*Visits occurred at 3-month-intervals across Phase 1: infants were 15, 18, 21, and 24 months of age for Visits 1-4, respectively.*

### RESULTS

### Preliminary Analyses

Preliminary analyses revealed that performance on the Attention to Novelty, Inductive Generalization, Efficiency of Exploration, and Imitative Learning tasks, as well as Delay of Gratification scores in the shorter-term cognitive development assessment, were all correlated with age: performance increased with age for each task (all ps < 0.05). Since we were primarily interested in individual differences rather than age-related differences, participants were split into 3-month cohorts based on their age at enrollment (6-month-old cohort, range: 5–7 months, n = 21; 9-month-old cohort, range: 8–10 months, n = 35; 12-month-old cohort, range: 11–13 months, n = 35; 15-month-old cohort, range: 14–16 months, n = 27; 18-month-old cohort, range: 17–19 months, n = 12), and a standard score for infants' performance on each task was computed relative to children in their age cohort, separately for each visit; premature infants were assigned to cohorts based on their age corrected for prematurity. We then computed the average of the standard scores across visits for each task to obtain a measure of infants' average performance on each task relative to similar-aged peers. Subsequent correlational analyses of the average standard scores of each task with participant age, separately by cohort (i.e., 5 task analyses per cohort, 5 cohorts in total), did not reveal any systematic relations, suggesting that the age cohorts mitigated any age effects present in the exploratory play data. **Tables 2**–**7** report the descriptive statistics for all of the raw data for each task, separately by age cohort and visit, as well as for the shorter- and longer-term cognitive development measures. These tables show that children's performance yielded a wide range of raw scores, suggesting that we had sufficient variability to detect potential relations between the measures in the current study.

TABLE 6 | Descriptive statistics for the 18-month-old cohort's performance on the Phase 1 Exploratory Play Tasks.

*Visits occurred at 3-month-intervals across Phase 1: infants were 18, 21, 24, and 27 months of age for Visits 1–4, respectively.*

TABLE 7 | Descriptive statistics for the Shorter- and Longer-term measures.

*Shorter-term assessments were conducted at Visit 4, after participants completed the exploratory play assessment. Longer-term assessments were conducted when children were 3 years of age.*

Additional preliminary analyses showed that gender, parent socioeconomic status, and testing location had no significant effect on children's performance. This held for the exploratory play assessment, the shorter-term cognitive development assessment, and the Phase 2 cognitive development measures. We therefore collapsed across these factors and did not consider them in any subsequent analyses.
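The within-cohort standardization described in the Preliminary Analyses can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code. The record layout (`id`, `cohort`, `visit`, `score`), the helper names, and the use of the sample standard deviation are all assumptions, and the toy scores are invented.

```python
from collections import defaultdict
from statistics import mean, stdev


def standardize_within_cohort(records):
    """Return {(infant_id, visit): z}, where each z-score is computed relative
    to all infants in the same age cohort at the same visit.

    Assumption: the sample SD (statistics.stdev) is used, so every
    cohort-by-visit cell needs at least two infants."""
    cells = defaultdict(list)
    for rec in records:
        cells[(rec["cohort"], rec["visit"])].append(rec)

    z_scores = {}
    for members in cells.values():
        scores = [m["score"] for m in members]
        mu, sd = mean(scores), stdev(scores)
        for m in members:
            # A cell with no variability carries no individual-difference information.
            z_scores[(m["id"], m["visit"])] = 0.0 if sd == 0 else (m["score"] - mu) / sd
    return z_scores


def average_over_visits(z_scores):
    """Average each infant's z-scores across the visits they completed."""
    per_infant = defaultdict(list)
    for (infant_id, _visit), z in z_scores.items():
        per_infant[infant_id].append(z)
    return {infant_id: mean(zs) for infant_id, zs in per_infant.items()}


# Toy data (not from the study): three infants in one cohort, two visits.
example_records = [
    {"id": "a", "cohort": "6mo", "visit": 1, "score": 1.0},
    {"id": "b", "cohort": "6mo", "visit": 1, "score": 2.0},
    {"id": "c", "cohort": "6mo", "visit": 1, "score": 3.0},
    {"id": "a", "cohort": "6mo", "visit": 2, "score": 2.0},
    {"id": "b", "cohort": "6mo", "visit": 2, "score": 4.0},
    {"id": "c", "cohort": "6mo", "visit": 2, "score": 6.0},
]
z_scores = standardize_within_cohort(example_records)
averages = average_over_visits(z_scores)
```

Each visit is standardized separately, so raw scores can rise with age from one visit to the next while an infant's z-score stays the same. The averaged score therefore reflects standing among same-cohort peers rather than absolute skill.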

#### Phase 1 Analyses

We conducted three separate analyses in Phase 1. First, we looked at the items in the exploratory play assessment to determine their independence from one another and their stability across testing sessions. Second, we looked at whether the sample of infants recruited from the early intervention sites performed differently than the infants not at-risk for developmental delay on any particular exploratory play assessment item. Finally, we conducted exploratory analyses looking at the relation between the five measures in the exploratory play assessment and the shorter-term cognitive development assessment.

#### **The exploratory play assessment**

Our first set of analyses focused on infants' performance on the exploratory play assessment. As intended, the assessment tapped distinct components of exploratory play, and only performance on the efficiency measure was stable across development. Three sets of analyses supported this conclusion.

First, we computed pairwise correlations between children's scores on all Phase 1 tasks. To control for multiple comparisons across these 10 analyses, we applied a Bonferroni correction, which yielded a significance threshold of p < 0.005. None of the correlations among the tasks reached significance (**Table 8**).

Second, we conducted a principal components factor analysis on children's scores on each task to determine whether a smaller set of components described the data better. The results indicated that the five Phase 1 items should not be collapsed onto fewer components. The analysis did yield three components with eigenvalues > 1, a standard threshold for extracting components. However, the scree plot of eigenvalues across components showed a relatively linear decrease. Each component also accounted for a similar share of the overall variance (from 25 to 15%). We therefore retained all five measures independently in subsequent analyses.

Finally, we assessed whether infants' performance was consistent across the four Phase 1 visits by correlating scores across visits within each task. We applied a Bonferroni correction for multiple comparisons within each task's analysis, which yielded a significance threshold of p < 0.008. Only the Efficiency task was relatively stable across visits (r between 0.25 and 0.39 in four of six comparisons; see **Table 9**). On the other tasks in the exploratory play assessment, children did not show consistent patterns of play across visits.
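The variance figures for the principal components analysis follow from a standard identity. When components are extracted from the correlation matrix of p standardized variables, the eigenvalues sum to p. Component k therefore accounts for a proportion λ_k / p of the total variance, and the Kaiser criterion keeps the components with λ > 1. The sketch below assumes that setup. The eigenvalues are hypothetical numbers chosen to mimic the pattern described in the text: three values above 1, a roughly linear scree, and shares running from 25% down to 15%. They are not the study's values.

```python
def variance_explained(eigenvalues):
    """Proportion of total variance carried by each component.

    For a PCA on a correlation matrix the eigenvalues sum to the number of
    variables, so dividing by their sum is the same as dividing by p."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


def kaiser_count(eigenvalues, cutoff=1.0):
    """Number of components the Kaiser criterion (eigenvalue > cutoff) would retain."""
    return sum(1 for ev in eigenvalues if ev > cutoff)


# Hypothetical eigenvalues for five standardized tasks (not the study's values).
eigenvalues = [1.25, 1.12, 1.01, 0.87, 0.75]
proportions = variance_explained(eigenvalues)
successive_drops = [a - b for a, b in zip(eigenvalues, eigenvalues[1:])]

for k, share in enumerate(proportions, start=1):
    print(f"component {k}: {share:.1%}")
print("retained by Kaiser criterion:", kaiser_count(eigenvalues))
```

In this example, `successive_drops` stays nearly constant. That is the numeric counterpart of a "relatively linear" scree plot: no single component stands out. This is why retaining all five measures is defensible even though three eigenvalues exceed 1.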

TABLE 8 | Summary of intercorrelations for the Phase 1 Exploratory Play tasks.


*Pearson correlation r-values. N* = *130 for all correlations. No correlations are significant after Bonferroni-correcting for multiple comparisons.*


*Pearson correlation r-values (and n for each comparison).* \**p* < *0.008 after Bonferroni-correcting for multiple comparisons.*

#### **Risk status of infants**

Next, we assessed whether infants recruited from the early intervention sites differed from infants not at risk on any item of the exploratory play assessment. To control for multiple comparisons across the five assessment items, we applied a Bonferroni correction, which yielded a threshold of p < 0.01. Only the average Efficiency score differed significantly between the two populations. Independent-samples t-tests showed that at-risk infants explored less efficiently than typically-developing infants [Efficiency: typically-developing: M = 0.10, SD = 0.71; at-risk: M = −0.26, SD = 0.54; t(128) = 2.72, p = 0.007, two-tailed]. Typically-developing and at-risk infants did not differ significantly on any other task in the Exploratory Play Assessment (see **Table 10**).
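Each Bonferroni threshold in the Phase 1 analyses is the family-wise α divided by the number of tests in the family. The short sketch below assumes α = 0.05, the conventional level; the text does not state it explicitly. It reproduces three thresholds:

- 10 pairwise task correlations.
- 6 visit-to-visit correlations within a task.
- 5 risk-status comparisons.

```python
from math import comb

ALPHA = 0.05  # assumed family-wise error rate


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


families = {
    "task intercorrelations": comb(5, 2),  # 5 tasks -> 10 pairwise correlations
    "visit-to-visit stability": comb(4, 2),  # 4 visits -> 6 correlations per task
    "risk-status comparisons": 5,  # one t-test per assessment item
}
thresholds = {name: bonferroni_threshold(ALPHA, n) for name, n in families.items()}

for name, cutoff in thresholds.items():
    print(f"{name}: {families[name]} tests, p < {cutoff:.4f}")
```

The visit-stability threshold works out to 0.05 / 6 ≈ 0.0083, which the text reports rounded down to 0.008.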

#### **Shorter-term cognitive development**

To motivate the hypotheses for Phase 2, we performed an exploratory analysis of the relation between each Phase 1 measure and the shorter-term cognitive development measures. Because this analysis was exploratory and intended only to motivate hypothesis testing in Phase 2, we did not correct for multiple comparisons. Children produced a wide range of scores on both the MCDI and the delay of gratification tasks, but the scores for both tasks were not normally distributed. We therefore used non-parametric Spearman rank-order correlations.

The only significant relation between the exploratory play tasks and the shorter-term cognitive development measures was between infants' average Efficiency score and their MCDI score [rs(111) = 0.23, p = 0.012; **Table 11**]. This correlation suggests that infants who explored more efficiently had larger vocabularies. Infants' Efficiency score did not correlate with executive function abilities, and it did not distinguish parents with and without concerns about their child's development. Similarly, no other exploratory play measure predicted any other shorter-term cognitive development measure.
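Spearman's rank-order correlation is Pearson's correlation computed on ranks. Because it uses only the ordering of scores, it tolerates non-normal distributions such as the MCDI and delay-of-gratification scores. Below is a minimal standard-library sketch. It assumes that tied values receive the average of their ranks, which is the usual convention. The function names are illustrative. In practice, a library routine such as `scipy.stats.spearmanr` would also supply the p-value.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson's r on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotone but curved relation still gives a perfect rank correlation.
print(spearman([1, 2, 3, 4], [1, 4, 9, 16]))  # prints 1.0
```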

#### Phase 2 Analyses

A subset of children from Phase 1 (38 of 130 infants) returned for Phase 2 at 3 years of age (mean age at Phase 2 assessment: 3.23 years, SD = 0.15 years; range 36–43 months). Preliminary analyses revealed that this subset of children was representative of the initial sample; children who returned for Phase 2 did not differ significantly from those who did not return on either average Efficiency scores or Phase 1 vocabulary scores [Efficiency scores: Returners: M = 0.05, SD = 0.67, non-returners: M = −0.01, SD = 0.69, t(128) = 0.45, p = n.s., two-tailed; Vocabulary scores: Returners: M = 51.12, SD = 28.02, non-returners: M = 48.21, SD = 31.87, t(111) = 0.46, p = n.s., two-tailed].
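The representativeness check compares returners with non-returners using independent-samples t-tests. Suppose the test was a pooled-variance (Student's) t-test; this assumption fits the reported df of 128 = 38 + 92 − 2 for the Efficiency comparison. Under that assumption, the t value can be recomputed from the summary statistics in the preceding paragraph. The group sizes come from the 38 of 130 infants who returned, and the function name is illustrative. The vocabulary comparison cannot be recomputed the same way, because the group sizes for that test are not given here.

```python
from math import sqrt


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's independent-samples t from summary statistics (pooled variance).

    Returns (t, df) with df = n1 + n2 - 2."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Efficiency scores: returners (n = 38) vs. non-returners (n = 130 - 38 = 92).
t_eff, df_eff = pooled_t(0.05, 0.67, 38, -0.01, 0.69, 92)
print(f"t({df_eff}) = {t_eff:.2f}")
```

The recomputed value rounds to the reported t(128) = 0.45, which is consistent with the pooled-variance assumption.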

Preliminary inspection of the longer-term developmental measures showed that children's IQ scores were high (M = 120.1, SD = 11.92; range 94–142). No child received an SCQ score above the standard diagnostic cutoff point (i.e., 15). Three children received an SCQ score of 12, which is still below the lower cutoff point recommended for younger populations (i.e., 13; Snow and Lecavalier, 2008). These findings suggest that our sample comprised children with relatively high cognitive and social communication abilities, a point to which we return in the general discussion. Nonetheless, early exploratory play abilities could still be related to longer-term development even within this relatively high-achieving sample.

We focused our final analyses only on the relation between the efficiency of children's exploration and longer-term cognitive development, for three reasons. First, infants' Efficiency score yielded the most stable performance across Phase 1. Second, it was the only measure on which typically-developing infants differed significantly from at-risk infants. Third, it showed a preliminary correlation with vocabulary size. Specifically, we hypothesized that more efficient exploration in infancy would be related to higher IQ scores in Phase 2. Because this was a specific, a priori prediction, we did not correct for multiple comparisons in the analysis of the Phase 2 measures.

TABLE 10 | Mean performance in the Phase 1 Exploratory Play Assessment as a function of risk status.

*Mean (and Standard Deviation) for each Exploratory Play Assessment Task for the typically-developing children and children at risk for developmental delay. Note that z-scores, standardized relative to age-binned cohorts including both infants at risk and not at risk, were used for the Attention to Novelty, Inductive Generalization, Efficiency of Exploration, and Imitative Learning tasks, as they were all correlated with age. We used the raw % scores for the Social Preference task since the children's performance was not correlated with age.*

TABLE 11 | Relation between the Exploratory Play Assessment tasks and the Phase 1 Shorter-term Developmental Assessment.

\**p* < *0.05.*

*<sup>a</sup>Spearman's rank order correlation r-values.*

*<sup>b</sup>Independent-samples t-tests.*

Our analyses supported our prediction. Infants who contacted more parts of the toy relative to the time they spent playing had higher IQ scores at age three [r(34) = 0.37, p = 0.028]. The corresponding r² value (0.37² ≈ 0.14) suggests a medium effect size (**Figure 4**).

We then examined individual components of IQ. Infants' average Efficiency score was significantly correlated with verbal comprehension on the WPPSI [r(34) = 0.35, p = 0.038] and marginally correlated with visual spatial skills [r(34) = 0.28, p = 0.094]. It was not correlated with working memory abilities [r(34) = −0.02, p = 0.895]. Infants' average Efficiency score across Phase 1 also did not predict children's delay of gratification or SCQ scores (both ps > 0.05). Post-hoc analyses showed that no other item in the Phase 1 exploratory play assessment predicted IQ, delay of gratification, or SCQ scores (all ps > 0.05). In addition, Phase 1 MCDI scores did not predict Phase 2 IQ scores (p > 0.05).
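The effect-size statement for the full-scale IQ correlation can be made concrete. A correlation of r = 0.37 implies r² ≈ 0.14, or about 14% of shared variance. The test of H0: ρ = 0 uses t = r√df / √(1 − r²), with df = n − 2 = 34. The sketch below is illustrative only. Converting t to a p-value requires a t-distribution CDF (for example, `scipy.stats.t.sf`). Small differences from the reported p = 0.028 are expected, because r is rounded to two decimals.

```python
from math import sqrt


def shared_variance(r):
    """Coefficient of determination for a bivariate correlation."""
    return r * r


def t_from_r(r, df):
    """t statistic for testing a Pearson correlation against zero (df = n - 2)."""
    return r * sqrt(df) / sqrt(1 - r * r)


# Average Efficiency vs. full-scale IQ at age three, as reported: r(34) = 0.37.
r_iq, df_iq = 0.37, 34
print(f"r^2 = {shared_variance(r_iq):.2f}")
print(f"t({df_iq}) = {t_from_r(r_iq, df_iq):.2f}")
```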

To determine whether these results held even for the youngest infants assessed, we examined correlations between infants' Phase 1 Visit 1 scores and all cognitive development measures from both Phase 1 and Phase 2. Infants with higher Efficiency scores at their very first visit had marginally higher MCDI scores at the end of Phase 1 [Visit 1 score: r(110) = 0.17, p = 0.08]. The first-visit Efficiency score was also higher for typically-developing infants than for at-risk infants [typically-developing: M = 0.14, SD = 1.04; at-risk: M = −0.41, SD = 0.66; t(126) = 2.89, p = 0.005, two-tailed].

Finally, infants' first-visit Efficiency score predicted their full-scale IQ at age three [r(34) = 0.43, p = 0.009]. It was significantly correlated with verbal comprehension skills [r(34) = 0.38, p = 0.021] and visual spatial skills [r(34) = 0.39, p = 0.02], but not with working memory abilities [r(34) = −0.03, p = 0.876]. No other Visit 1 measure predicted any cognitive development measure (all ps > 0.05).

#### DISCUSSION

The current study used a longitudinal design to assess the stability of multiple aspects of infants' exploratory play, the relations among them, and their relation to longer-term cognitive development. The results suggest that infants' exploratory play has distinct, non-overlapping aspects. They also suggest that the efficiency of infants' exploration is relatively stable, at least over a 9-month period in infancy.

The efficiency measure is also informative in three ways:

- Typically-developing infants' performance differed from that of infants at risk for developmental delays.
- The measure correlated with parental report of toddlers' vocabulary.
- The measure was correlated with IQ at age three.

Its relation to cognitive outcomes also appears to be specific to IQ. Efficiency was not correlated with children's executive function at either time point, nor with their social-communicative competence. In sum, a 5-min assessment of infants' free play showed two things. Infants who explore efficiently at one time point are likely to do so again, and the efficiency of their exploration is correlated with both near- and longer-term cognitive development.

There are several limitations to the conclusions we can draw from this study. First, we cannot make strong claims about the other exploratory play behaviors measured in the current study: attention to novelty, inductive generalization, face preference, and imitative learning. None of these was stable over the 9-month period in Phase 1, and none correlated with any shorter- or longer-term cognitive development measure. Critically, this failure to find stable effects should not be taken to imply that the underlying abilities are unstable. Nor does it imply that those abilities have no implications for long-term cognitive development. We restricted ourselves to tasks that were easy both to administer and to code. As a consequence, the simplicity of our measures may have limited our ability to capture fine-grained individual differences on these tasks, or their relation to longer-term cognitive development.

In particular, at least one other study has found that latency to respond to a novel vs. a familiar toy distinguishes premature infants from full-term infants (Sigman, 1976). There are several possible reasons why we found no evidence for this effect.

First, the studies differed methodologically (e.g., in the specific stimuli used). Second, the care provided to premature infants has changed dramatically over the past few decades, so the behavioral profiles of premature infants in the 1970s may differ from those of premature infants today.

Third, the earlier study assessed infants at a single time point (8 months). The current study instead recruited infants from 5 to 19 months, assessed them at four time points, and analyzed each infant's average score on each task across visits. Measures that are predictive at a single point in time may not be predictive when averaged across 9 months of infancy. We did assess the relation between exploratory play at the first Phase 1 visit and longer-term developmental outcomes. However, that analysis included the full age range recruited for the study, rather than only young infants.

Finally, the stability of some exploratory play constructs (e.g., attention to novelty) may be captured more clearly by age-calibrated measurements than by a single uniform measurement across development (e.g., time to contact a novel toy). Age-calibrated measurements can change in complexity with age, for example by pairing looking-time measures in early infancy with action-based measures in toddlerhood.

It is also possible that, although our attention to novelty measure was intended to be comparable to visual attention measures of novelty preference, our efficiency measure may have better indexed infants' ability to process information efficiently and detect changes in their environment. As our efficiency measure was computed based on the number of parts of the toy that children contacted over their total playtime, infants with higher scores in this task may have been better able to visually detect, process, and encode novel aspects of the toy. Thus, the findings we report here may serve as supporting evidence for the positive relation between these skills and later cognitive development and suggest that the efficiency of children's manual exploration might be a proxy for measuring intelligence early in development. Future research could directly compare rate of habituation measures with our efficiency of exploration measure to determine whether they index the same cognitive abilities and whether they are related similarly to cognitive development.
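The efficiency measure is described above as the number of toy parts contacted relative to total playtime, which amounts to a simple rate. The sketch below is minimal and rests on assumptions not spelled out in this section. First, each distinct part counts once, however often it is touched. Second, play time is measured in seconds. The function and argument names are hypothetical, and the study's actual coding scheme is given in its Methods. Scores like this would then be standardized within age cohort, as in the Preliminary Analyses.

```python
def exploration_efficiency(parts_contacted, play_time_s):
    """Hypothetical efficiency score: distinct toy parts contacted per second of play.

    parts_contacted: sequence of part labels in the order they were touched
    play_time_s: total time the infant spent playing with the toy, in seconds
    """
    if play_time_s <= 0:
        raise ValueError("play time must be positive")
    # Repeated touches of the same part do not add to the count.
    return len(set(parts_contacted)) / play_time_s


# Hypothetical coding of one session: the lever is touched twice but counts once.
session = ["lever", "button", "lever", "knob"]
print(exploration_efficiency(session, 60))
```

Under this definition, an infant who finds the same three parts in 30 s scores twice as high as one who takes 60 s. This captures the "relative to the time that they played" part of the description.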

Our design also cannot capture the full complexity of how children's exploratory play develops. As noted in the introduction, infants' manual exploration becomes more complex, and more integrated with other cognitive processes, over development. As children's motor repertoire grows, they can engage with more objects at once. They both explore interactions between objects and use objects as tools to explore their environment, which can facilitate the acquisition of new knowledge (e.g., Lockman, 2000). Future research could assess behaviors across the full range of contexts and actions that define children's developing exploratory play, from simple exploration of single objects to the use of multi-affordance objects as tools.

In addition, we standardized children's exploratory play behaviors relative to age-matched peers to reduce age effects across our sample. This motivates future research with larger samples that could investigate three further questions:

- The time course of developmental changes within each component of exploratory play, both for individual children and within smaller developmental windows.
- How developmental changes compare across components of exploratory play.
- How these components interact to affect cognitive development outcomes.

The current results are also limited because the children retained through Phase 2 had relatively high IQ scores (M = 120.1, SD = 11.9; range 94–142). We do not know whether the correlation between exploratory play and IQ holds for the broader population. Nor do we know whether infants' exploratory play predicts intelligence after age three, even in relatively high-IQ children.

Future research might also examine whether children's home environment mediates or moderates the relation between exploratory play and cognitive development. For example, having more toys in the home might independently facilitate both children's exploration and their later cognitive development. Alternatively, the relation between exploration and cognitive development might hold only in homes with many toys to explore (see Storch and Whitehurst, 2001, for a similar approach in literacy development).

Finally, this study leaves the question of causation unresolved: smarter infants might explore more efficiently, or efficient exploration might contribute to intelligence. Future research might identify the particular processes underlying the correlation between efficient exploratory play and intelligence.

Despite these limitations, our results suggest a positive relation between the efficiency of exploratory play and cognitive development. Several mechanisms might contribute to this correlation.

The first is motor skill. Our exploratory play assessment was designed to involve comparable motor demands across tasks (reaching for and manipulating objects). In addition, infants did not differ on other measures of motor capability (e.g., latency to reach for novel objects). It is nonetheless possible that infants who discovered more functions of a toy relative to their total play time had more advanced motor skills overall (see e.g., Bornstein et al., 2013). If so, there are several possibilities. Infants who are relatively advanced in motor development may also be relatively advanced in cognitive development. Advances in motor development may contribute to cognitive development by creating more opportunities for interaction and exploration. Or exploratory play may affect children differently at different stages of motor development (e.g., Bushnell and Boudreau, 1993; Karasik et al., 2011; Schwarzer et al., 2013; Kretch et al., 2014).

Second, motor skills are probably not the only factor affecting the efficiency of infants' exploratory play, and the free exploration measure may have taxed a number of other cognitive abilities. Efficient exploration plausibly requires the ability to flexibly engage and disengage attention and to plan sequences of actions. It also requires integrating these abilities with sensitivity to the rate of information gain. Arguably, the cognitive skills that let infants rapidly discover novel functions of a toy could support learning in many domains.

Finally, motivational factors may underlie children's performance on both the efficiency measure and the cognitive measures.

Future research might clarify the relative contributions of motor skills, cognitive abilities, and affective engagement to the correlation between efficient exploratory play and later cognitive development. We found evidence of a specific relation between efficient exploration and verbal abilities. Future work might study more broadly how efficient exploration relates across development to different components of IQ (e.g., verbal and spatial abilities) and of executive function (e.g., inhibition, set shifting, working memory).

The current study suggests that continued research on individual differences in early exploration may have important implications for our understanding of longer-term cognitive development. It is also encouraging that stable, predictive differences in infants' exploratory play can be assessed with stimuli and measures that are easy to administer outside the lab. Such measures could link basic science on children's exploratory play with applied efforts to identify children at risk and to intervene in their cognitive development. Insofar as infants' free exploration predicts longer-term cognitive development, children's play is worth taking seriously.

#### AUTHOR CONTRIBUTIONS

PM and LS designed the study. PM oversaw data collection for the study. EH was responsible for Phase 1 data collection. PM and EH oversaw coding of all data. PM, EH, and LS all contributed to data analysis and interpretation, and to the drafting and revision of the manuscript.

#### FUNDING

This work was supported by a grant from the Simons Foundation to the Simons Center for the Social Brain at MIT, through a postdoctoral fellowship to PM. It was also supported by a John Merck Scholar Award, an NSF Faculty Early Career Development Award, and a grant from the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216, to LS.

#### ACKNOWLEDGMENTS

We are grateful to members of the Early Childhood Cognition Laboratory at MIT for assistance in coding the data and for helpful feedback on the study. The at-risk sample was recruited with help from Emily Feinberg at the Boston University School of Public Health. We are grateful for their support and collaboration. We are also grateful to John Gabrieli and the MIT Clinical Research Center for support in administering the Phase 2 cognitive development measures. Thanks to Andrew Gelman, Rebecca Saxe, and Josh Tenenbaum for helpful comments on prior drafts of the manuscript.

#### REFERENCES

in preschoolers suspected of having pervasive developmental disorders. Autism 12, 627–644. doi: 10.1177/1362361308097116


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Muentener, Herrig and Schulz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Games Infants Play: Social Games During Early Mother–Infant Interactions and Their Relationship With Oxytocin

#### Gabriela Markova\*

Department of Applied Psychology: Health, Development, Enhancement and Intervention, Faculty of Psychology, University of Vienna, Vienna, Austria

The present study examined early social game routines during natural face-to-face mother–infant interactions and their relationship with oxytocin. Forty-three mother–infant dyads were observed when infants were 4 months old. The procedure involved a baseline and a natural interaction, in which mothers were instructed to interact with their infants as they would at home. During this procedure, four saliva samples were collected from mothers and infants to determine oxytocin levels at different time points. Social game routines and infant social engagement (gaze, positive affect, and negative affect) were coded during the natural interaction.

Social games were observed in 76.7% of the mother–infant dyads, and 46 different types of games were identified. Mothers initiated games significantly more often to re-engage their infants than when infants were already engaged with them. During the games, infants showed more positive affect and less negative affect than during the rest of the interaction. Finally, the maternal increase in oxytocin from before to after the natural interaction was positively correlated with game rate and with time spent in games. In contrast, the infant increase in oxytocin over the same period was inversely related to game rate. These results indicate that social games are an inherent part of early mother–infant interactions, and that their occurrence is associated with the oxytocin levels of both infants and mothers.

#### Edited by:

Jill Popp, The Lego Foundation, Denmark

#### Reviewed by:

Ben Nephew, Tufts University, United States

Robert C. Froemke, New York University, United States

> \*Correspondence: Gabriela Markova gabriela.markova@univie.ac.at

#### Specialty section:

This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology

Received: 14 December 2017 Accepted: 04 June 2018 Published: 25 June 2018

#### Citation:

Markova G (2018) The Games Infants Play: Social Games During Early Mother–Infant Interactions and Their Relationship With Oxytocin. Front. Psychol. 9:1041. doi: 10.3389/fpsyg.2018.01041

Keywords: social play, game routines, mother–infant interactions, oxytocin, engagement

### INTRODUCTION

Social play consists of social activities whose goal is "to have fun, to interest and be with one another" (Stern, 1977, p. 71). Throughout the 1st year of life, infants and their caregivers begin to co-construct vocal, gestural, and multimodal social game routines, such as peek-a-boo (e.g., Bruner and Sherwood, 1976; Gustafson et al., 1979; Fantasia et al., 2014b). Recent research suggests that even 3-month-old infants actively participate in such game routines and recognize when their structure is violated (Fantasia et al., 2014b). Despite this evidence, we know little about the contexts and formats of early social game routines or about their underlying mechanisms. One candidate mechanism is oxytocin, which plays a vital role in human social behavior and seems particularly influential during social interactions between infants and their caregivers. Consequently, the goal of the present study was to investigate naturally occurring social game routines during early face-to-face mother–infant interactions and their relationship with oxytocin.

Early social interactions between infants and their caregivers are characterized by a face-to-face context, close physical contact and a turn-taking structure, where cycles of mutual attention between mothers and infants (engagement) and cycles of non-attention (disengagement) alternate (e.g., Brazelton et al., 1974; Field, 1978; Stern, 1985; Trevarthen, 1993; Papoušek and Papoušek, 1995). Research suggests that these pre-verbal communicative exchanges between adults and infants take place across different modalities, such as through vocalizations, facial expressions, gazes, touch or gestures (e.g., Condon and Sander, 1974; Murray and Trevarthen, 1986; Kobayashi and Kohshima, 2001; Schore, 2001). Early interactions are often characterized as a dialog or a mutual, bidirectional process (Tronick, 1989), in which both partners modulate the timing, the form and the intensity of interaction and their own emotional expression to achieve complementary interactive exchanges (Ammaniti and Trentini, 2009). Thus, the purpose of early engagements is to share meaning, particularly affect with another person (Stern, 1977; Trevarthen and Aitken, 2001; Markova and Legerstee, 2006; Legerstee et al., 2007). Interestingly, these early affective communicative interactions have been described as playful, because their sole purpose is to share experiences with another person (Stern, 1977). Consequently, early social interactions seem to be the optimal context for social play to develop. While some forms of interactions between infants and adults can be clearly characterized as play (e.g., tickling, blowing raspberries), the distinction between what constitutes social interaction as compared to social play is difficult to make (see e.g., Burghardt, 2011). Investigating early social game routines may be one way to distinguish between the two largely overlapping constructs.

Social game routines have a clear recurring structure (Stern, 1977; Crawley et al., 1978; Gustafson et al., 1979; Crawley and Sherrod, 1984; Tamis-LeMonda et al., 2002; Fantasia et al., 2014b), which allows infants to follow elementary rules in their social interactions (Ross and Kay, 1980). Social game routines also follow explicit rules and sequences (Ratner and Bruner, 1978; Ross and Kay, 1980) that are observable at the vocal as well as the motor level of the game. Such games usually include simple rhymes or songs that are associated with motor movements to achieve a coordination of behavior and vocal expressions (Crawley et al., 1978). This visual language (see Hay et al., 1979; Goldman and Ross, 1978) is depicted in **Table 1** for the three most common game routines used by mothers in the sample examined in the current study. Thus, social game routines are characterized by a multimodality that includes a vocal-kinetic format in the form of a rhyme or song, and hand gestures or physical manipulation of the child's body corresponding to the context of the given rhyme or song (Crawley et al., 1978; Fantasia et al., 2014b).

Various authors have argued that infants are initially passive during a playful interaction, which is initiated by the adult (e.g., Bruner and Sherwood, 1976; Crawley et al., 1978; Ratner and Bruner, 1978; Gustafson et al., 1979; Bruner, 1983; Hodapp et al., 1984; Ross and Lollis, 1987). In the second half of the 1st year infants then assume a more active role in the game that is associated with their growing tendency toward structured games as well as motor capabilities, which allow them to contribute to the games more actively (Bruner and Sherwood, 1976; Crawley et al., 1978; Ratner and Bruner, 1978; Ross and Kay, 1980; Crawley and Sherrod, 1984). In contrast, recent research investigating routine activities suggests that infants actively participate in and contribute to daily routines from early on in life (e.g., Nomikou and Rohlfing, 2011; R ˛aczaszek-Leonardi et al., 2013; Reddy et al., 2013). Routines, such as picking up or changing a diaper, enable infants to recognize their structure and, consequently, built expectations about and thus anticipate others' behaviors (Reddy et al., 2013). By participating in these shared and meaningful social routines, infants practice their ability to make sense of and coordinate with the others' actions. In this context, Fantasia et al. (2014b) examined infants' sensitivity to violations of social game routines. The authors found that structured game routines take place already at 3 months of age and infants are sensitive to modifications of these multimodal routines, such as when mothers leave out the rhyme or gestures of a particular game (Fantasia et al., 2014b). Specifically, infants reduced their body movements and positive vocalizations, and avoided their mothers' gazes when a game structure was violated (Fantasia et al., 2014b). These findings suggest that already 3 month-old infants have expectations of the structure of early social game routines and recognize when this structure is modified. 
Unlike free, unstructured play, game routines include fixed action patterns, which make them highly predictable. This predictability makes it easy even for very young infants to participate actively in the games (Fantasia et al., 2014b). Understanding the structure of play as a sequence of tasks, based on a practical, procedural understanding of the routine (Lerner et al., 2011), rather than as something that requires a cognitive representation of the game (Tomasello, 2009), enables infants to be fully capable partners in this shared activity.

Research reviewed thus far suggests that participation in social game routines is cooperative. Interestingly, there is evidence showing that oxytocin (OT) underlies cooperation in adult populations (see Bartz et al., 2011, for review), and this neuropeptide also seems particularly influential during early social interactions in that it enhances social competence of both infants and adults (e.g., Feldman et al., 2007; Levine et al., 2007). For example, an abundance of research has shown that OT promotes parental caregiving behaviors (e.g., Feldman et al., 2010a; Gordon et al., 2010a,b,c; Naber et al., 2010, 2013), and also infant social engagement (Feldman et al., 2010b; Weisman et al., 2012). Most importantly, it is particularly the match between infants' and parental behaviors that seems to be reflected in the workings of the OT system (Feldman, 2006, 2007; Feldman et al., 2010a, 2011; Gordon et al., 2010b,c). Playing social games involves coordinated activities resulting from a continued attempt to construct and maintain a shared purpose (Tuomela, 2000). Thus, there may be interesting links between playing early social game routines and OT.

In the present study, we set out to examine (1) the occurrence of early social game routines during natural face-to-face mother–infant interactions, (2) infant engagement in these game routines, and (3) their relationship with the salivary OT of both mothers and infants. To this end, we observed mothers and their 4-month-old infants during a procedure involving a baseline and a natural interaction, during which saliva samples were collected from both mothers and infants to assess OT. During the natural interaction, we observed naturally occurring social game routines as well as infant social engagement throughout the games and the rest of the interaction. We expected that dyads would spend a considerable amount of time in social game routines and that infants would enjoy these game routines more than the rest of the interaction. We also hypothesized that maternal and infant participation in the game routines would be associated with their levels of OT.

TABLE 1 | Multimodal sequence for the three most common social game routines in the present study.

#### MATERIALS AND METHODS

#### Participants

Overall, 43 mothers and their infants (24 girls) participated in the present study. Mothers were recruited in prenatal childbirth classes and in mother–infant activity classes. Their visit to the laboratory was arranged when infants were 4 months old (M = 139.43 days, SD = 19.415 days). All infants were born healthy (5-min Apgar ratings 6–10) and at term, with a gestation period of at least 36 weeks. The majority of infants (93.3%) had no siblings. Mothers were on average 31.60 years old at the infants' birth (SD = 3.578 years) and had on average 5.08 years of higher education (SD = 2.853 years). The majority of mothers were primiparous (90.7%) and were breastfeeding their infants (86.7%). All dyads were of European Caucasian origin (Czech) and came from middle- to upper-class families based on parental education. Mothers and infants received a small gift for participating.

#### Procedure and Materials

The Institutional Ethics Committee approved the study. In line with previous research, laboratory visits were scheduled between 1 and 4 pm (see Feldman et al., 2010a, 2011). Mothers were asked to come at least 30 min after breastfeeding and to refrain from eating or drinking (other than water) for at least 1 h before testing. The mean time between the last feeding and the first saliva collection was 90 min (SD = 34.41, range = 15–193 min).

Initially, mothers were informed about the experimental procedure and saliva extraction, after which they signed an informed consent form. Mothers were then instructed to rinse their mouths with water to remove food residue, and the first saliva sample was collected from mothers and infants. Infants were placed in an infant seat on a table (95 cm × 65 cm × 50 cm), and mothers sat facing the infant (the approximate distance between mothers' and infants' eyes was 30 cm). The experimenter was present in the room, out of sight of mothers and infants, and did not communicate with them during the procedure (except when explaining the procedure). Interactions were filmed using two digital cameras.

There were two main parts of the procedure that were analyzed for the purpose of the present study: a condition without communication (i.e., baseline) and a natural interaction. A third condition followed the natural interaction: a modified interaction, in which mothers were instructed to change their interaction style (e.g., use adult-directed speech). Behavior during this condition was not analyzed in the present investigation. The three conditions were presented in a fixed order, each lasting approximately 10 min; consequently, the procedure lasted between 30 and 40 min in total. To obtain a baseline in which no interaction between mothers and infants took place, mothers were asked to fill out various questionnaires and to refrain from communicating with infants, while infants watched a Baby Einstein® DVD designed for children from 3 months of age. During the natural interaction, mothers were instructed to interact with their infants as they usually would at home, without toys. No specific instructions were given pertaining to play in general or to structured game routines in particular.

#### Saliva Samples

During the visit, a total of four saliva samples were collected from each mother and infant using oral swabs to determine the concentration of OT: OT1 was collected after mothers and infants came to the laboratory and were informed about the experimental procedure; OT2 was collected after the baseline; OT3 was collected after the natural interaction; and OT4 was collected after the modified interaction (see **Figure 1**). Mothers were instructed to keep the swabs (Salimetrics Oral Swab) under their tongue for 2 min. A research assistant collected saliva samples from the infants (Salimetrics Infant's Swab). The swabs were placed into collection tubes immediately after collection and kept on ice in a polystyrene (thermocol) ice box during the whole procedure. After the procedure, the collection tubes were frozen and stored at −20°C.

#### Measures

All sessions were videotaped and behaviors during the natural interaction were coded from the videos at a later point in time. Behaviors were coded separately and at different times. To determine inter-rater reliability, one rater coded all data and a second rater independently coded 30% of randomly selected data. There was high inter-rater reliability, computed as intra-class correlations, for all behavioral measures (see **Table 2**).
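The ICC form used for inter-rater reliability is not reported. As an illustration only, the following Python sketch assumes the two-way random-effects, absolute-agreement, single-rater form (ICC(2,1) in Shrout and Fleiss's notation), applied to the subset of data coded by both raters. The function name `icc_2_1` and the input layout (one row per coded unit, one column per rater) are hypothetical.

```python
def icc_2_1(ratings: list) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the score that rater j gave to coded unit i
    (hypothetical layout; every unit must be scored by every rater).
    """
    n = len(ratings)      # number of coded units
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: units (rows), raters (columns), residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Rater 2 scores every unit exactly 1 point higher than rater 1:
# consistency is perfect, but the systematic offset lowers absolute agreement.
print(round(icc_2_1([[1, 2], [2, 3], [3, 4]]), 4))  # 0.6667
```

Absolute agreement is assumed here because a systematic bias between coders (one rater consistently coding longer gazes, for example) would matter for the reported durations; a consistency-type ICC would ignore such offsets.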

#### Social Game Routines

A social game routine was defined as an infant-directed activity. Because early game routines usually contain nursery rhymes in combination with gestures that are dependent on the rhymes' context, we were particularly looking for games complying with this vocal-kinetic format. Social game routines were coded when: (a) they were recurring and universal across and within dyads; (b) they were individually varied (e.g., gestures/rhymes varied across and/or within dyads), but adhered to the vocal-kinetic format; or (c) their structure was individual (e.g., song that usually does not go along with gestures, but individually performed gestures matched the song's context), but they adhered to the vocal-kinetic format. Coding of a social game routine commenced when every element (i.e., verbal and non-verbal expressions) of a particular game was present and a game structure was clearly recognizable. Coding was discontinued when the game activity was interrupted or the game was completed. Games that violated the vocal-kinetic format (e.g., mother sang without corresponding gestures), or lacked a recognizable and universal structure (e.g., unstructured play) were not coded. The onset and offset of each game was noted, and the following variables were then computed: (1) game rate was defined as the total number of games, adjusted to the individual duration of each interaction; (2) relative duration of a game was defined as the duration of a game (in seconds) adjusted to the total number of games played; and (3) percentage of time spent in games was defined as the total duration of time spent in games, adjusted to the individual duration of each interaction.
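The three game variables can be computed directly from the coded onsets and offsets. The following Python sketch is illustrative only: it assumes times in seconds, expresses game rate per minute of interaction (the unit is not stated in the text), and uses hypothetical names (`Game`, `game_measures`).

```python
from dataclasses import dataclass


@dataclass
class Game:
    onset_s: float   # game onset, seconds from the start of the interaction
    offset_s: float  # game offset, seconds from the start of the interaction


def game_measures(games: list, interaction_s: float) -> dict:
    """Game rate, relative duration, and percentage of time in games for one dyad."""
    n_games = len(games)
    total_game_s = sum(g.offset_s - g.onset_s for g in games)
    return {
        # (1) number of games per minute of interaction (unit is an assumption)
        "game_rate_per_min": n_games / (interaction_s / 60.0),
        # (2) total game time divided by the number of games, i.e. mean game length
        "relative_duration_s": total_game_s / n_games if n_games else 0.0,
        # (3) share of the interaction spent in games, in percent
        "percent_time_in_games": 100.0 * total_game_s / interaction_s,
    }


# Hypothetical dyad: two games within a 10-min (600 s) natural interaction
print(game_measures([Game(60, 70), Game(200, 212)], interaction_s=600))
```

Normalizing by each dyad's own interaction length matters because the conditions lasted only approximately 10 min, so raw counts and durations would not be comparable across dyads.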

#### Infant Social Engagement

Infant gazes (i.e., gazes at mother, gazes away), facial expressions (i.e., positive, negative), and vocalizations (i.e., positive, negative) were coded second by second. The durations of behaviors were adjusted to the duration of each individual interaction (for similar coding see Peláez-Nogueras et al., 1996; Markova and Legerstee, 2006; Legerstee and Markova, 2007). We coded gazes at mother as infant gazes at their mother's face, and gazes away as infant gazes away from their mother's face toward something else in their surroundings. To be coded, infant gazes had to last a minimum of 1 s. Positive facial expressions were defined as smiles with the mouth (open or closed) turned upward. Negative facial expressions were coded when infants showed negative emotions such as distress, fretting, anger, or discontent, with the mouth curled or grimacing. Vocalizations were coded when a discrete sound occurred within one respiration cycle. Two separate sounds were coded if the sound was segmented by a 1-s silence. Vegetative sounds, such as wheezes, sneezes, coughs, and hiccups, and effort sounds, such as grunting and panting, were excluded. Infant positive vocalizations were produced with a composed facial expression and defined as sounds containing varied pitch contours, produced in a relaxed and syllable-like manner (often called babbling), and containing oral resonance. Infant negative vocalizations were produced with an agitated facial expression and defined as vocal sounds that were produced somewhat forcefully or with effort and were often series of vowel-like sounds, somewhat nasal and with uniform pitch, such as whimpers, fusses, cry sounds, and wails. Consistent with previous research (Peláez-Nogueras et al., 1996; Legerstee and Markova, 2007), we computed composite scores for positive and negative affect. Positive affect included positive facial expressions and vocalizations; negative affect included negative facial expressions and vocalizations. Infant gazes were considered as a separate category.

TABLE 2 | Inter-rater reliability computed as intra-class correlations (ICC) for all behavioral measures.
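With second-by-second coding, each measure above reduces to a proportion of coded seconds. The Python sketch below is a hypothetical illustration: code labels such as `'pos'`, `'neg'`, and `'none'` are invented, and it assumes that a second counts toward a composite affect score if either the face or the voice shows that valence. The text does not say whether the composites were unions, sums, or means of the component scores.

```python
def proportional_durations(gaze: list, face: list, voc: list) -> dict:
    """Proportion of interaction seconds spent in each behavior category.

    gaze: per-second codes, 'mother' or 'away'
    face, voc: per-second codes, 'pos', 'neg', or 'none' (hypothetical labels)
    """
    n = len(gaze)
    if n == 0 or not (len(face) == n == len(voc)):
        raise ValueError("code streams must be non-empty and equally long")
    # Assumption: composite affect = seconds where either modality shows the valence
    pos = sum(1 for f, v in zip(face, voc) if "pos" in (f, v))
    neg = sum(1 for f, v in zip(face, voc) if "neg" in (f, v))
    at_mother = sum(1 for g in gaze if g == "mother")
    away = sum(1 for g in gaze if g == "away")
    return {
        "gaze_at_mother": at_mother / n,
        "gaze_away": away / n,
        "positive_affect": pos / n,
        "negative_affect": neg / n,
    }


print(proportional_durations(
    gaze=["mother", "mother", "away", "mother"],
    face=["pos", "none", "neg", "none"],
    voc=["none", "pos", "none", "none"],
))
```

If every second carries exactly one of the two gaze codes, the two gaze proportions sum to 1 for each infant, which is one way the perfect negative correlation between gazes at mother and gazes away could arise.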

#### Oxytocin Analysis

The present study used a method similar to that of previous research validating the measurement of OT in saliva (for the exact methodology see Carter et al., 2007). However, because measuring OT in saliva was a relatively new approach without a standardized protocol, a pre-experiment was conducted. The results of the pre-experiment showed that all test samples (5 adults and 5 infants) had sufficient volume (at least 1 mL) and were above the limit of detection of the assay. Thus, unlike in previous research (Carter et al., 2007; Feldman et al., 2010a,b, 2011), it was not deemed necessary to concentrate the samples before the assay. We used a commercially available kit (Oxytocin EIA kit, ADI-901-153, Enzo Life Sciences) to determine the concentration of OT. The limit of detection of the assay was 11.7 pg/mL. Saliva was recovered from the swabs by centrifugation (2500 × g for 10 min at 4°C). Samples were measured directly without any further modification, and the assay procedure strictly followed the kit's instructions. All test samples were run in duplicate, and a separate standard curve was constructed for each plate. After the first part of the assay procedure, the plate with reagents was incubated overnight at 4°C. On the following day, the plate was incubated at room temperature for 1 h. The reaction was then stopped, and the optical density of the samples was immediately read on a microplate reader at 405 nm. The OT concentrations (in pg/mL) were calculated from the relevant standard curve using Softmax Pro 5.2. Each standard curve was checked against the quality control parameters stated in the instructions. The intra-assay coefficient of variation was 13.28%.
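The curve model fitted in Softmax Pro is not reported. Assuming a four-parameter logistic (4PL) curve, a common choice for competitive EIAs, concentrations could be back-calculated from optical density as in the Python sketch below. The curve parameters in the example are hypothetical, not values from this assay. The sketch also computes an intra-assay coefficient of variation as the mean of the per-sample CVs across duplicate wells. This is one common definition; the text does not say which definition produced the reported value.

```python
import statistics


def four_pl_inverse(od: float, a: float, b: float, c: float, d: float) -> float:
    """Back-calculate concentration from optical density with a 4PL curve.

    Curve: od = d + (a - d) / (1 + (x / c) ** b), where a is the response at
    zero concentration, d the response at infinite concentration, c the
    inflection point, and b the slope factor.
    """
    if not (min(a, d) < od < max(a, d)):
        raise ValueError("optical density lies outside the curve's range")
    return c * ((a - d) / (od - d) - 1.0) ** (1.0 / b)


def intra_assay_cv(duplicates: list) -> float:
    """Mean of the per-sample coefficients of variation (%) of duplicate wells."""
    cvs = [100.0 * statistics.stdev(pair) / statistics.mean(pair) for pair in duplicates]
    return statistics.mean(cvs)


# Hypothetical plate: in a competitive assay OD falls as OT concentration rises
a, b, c, d = 2.0, 1.0, 100.0, 0.1
print(round(four_pl_inverse(1.0, a, b, c, d), 2))          # 111.11 (pg/mL)
print(intra_assay_cv([(90.0, 110.0), (50.0, 50.0)]))      # CV (%) over two duplicate pairs
```

Rejecting optical densities outside the curve's asymptotes avoids returning meaningless or complex-valued concentrations for wells that fall beyond the standards.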

### RESULTS

Prior to the analyses, all data were screened for deviations from normality and for univariate outliers (|z| > 3). Outliers were assigned a new score one unit higher/lower than the next highest/lowest score in the distribution (Tabachnick and Fidell, 2001). Because most of the data were not normally distributed, non-parametric tests were used. Data were also screened for possible confounding variables (maternal age, education, breastfeeding, primiparity, symptoms of depression, time between last feeding and first saliva extraction, infants' age and gender); none of these was associated with any of the behavioral or OT variables.
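The outlier rule takes only a few lines to implement. This Python sketch assumes z-scores computed with the sample standard deviation and an adjustable `unit` (1 for integer-valued scores). The function name is hypothetical, and because the text does not specify the grouping, the sketch handles one variable at a time.

```python
import statistics


def replace_outliers(values: list, z_cut: float = 3.0, unit: float = 1.0) -> list:
    """Replace scores with |z| > z_cut by one unit beyond the most extreme non-outlier."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return list(values)  # no spread, hence no outliers
    z = [(v - mean) / sd for v in values]
    inliers = [v for v, zi in zip(values, z) if abs(zi) <= z_cut]
    high, low = max(inliers) + unit, min(inliers) - unit
    return [high if zi > z_cut else low if zi < -z_cut else v
            for v, zi in zip(values, z)]


# 100 lies about 4.1 SD above the mean, so it becomes 19 + 1 = 20
print(replace_outliers(list(range(1, 20)) + [100]))
```

Unlike deleting cases, this approach keeps every dyad in the sample while preserving the extreme score's rank order.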

Descriptive statistics for behavioral measures are shown in **Table 3**. As expected, gazes at mother were perfectly negatively correlated with gazes away. Infant gazes at mother were significantly positively correlated with positive affect displays, r(43) = 0.595, p < 0.001 (correspondingly, gazes away were negatively correlated with positive affect to the same degree), and positive affect was negatively correlated with negative affect, r(43) = −0.372, p = 0.014.

### Occurrence of Social Game Routines

Social games were observed in 76.7% of the mother–infant dyads. A comparison between playing and non-playing dyads revealed significantly more years of maternal education for playing (M = 5.52, SD = 2.62) than for non-playing mothers (M = 3.70, SD = 2.16; z = 2.210, p = 0.031). Comparisons on the study's main variables or other background variables were nonsignificant. Mothers and infants spent on average 12% of their interaction time playing games, and games lasted on average 11 s. The first game occurred on average 1.13 min after the onset of the interaction (SD = 1.23 min). Overall, 46 different game routines were identified (see Supplementary Table S1 in the supplementary materials for a list of all observed games and their frequency of occurrence), and 37% of these games were played by at least two different dyads.

<sup>a</sup>Rate = total frequency of games, adjusted to the individual duration of each interaction. <sup>b</sup>Relative duration = duration of a game (in seconds), adjusted to the total number of games played. <sup>c</sup>Proportional duration = total duration of a particular behavior, adjusted to the individual duration of each interaction.

The three most common game routines were Paci, paci, pacičky (39.53%), Vařila myšička kašičku (30.23%), and Kovej, kovej, kováříčku (20.93%; see **Table 1** for the detailed sequences of these game routines). During Paci, paci, pacičky, the infant's whole body is used. The game begins with the mother clapping her infant's hands and indicating how hands can be used. The mother then moves on to stamping the infant's feet, lightly pulling his/her ears, or tapping his/her mouth with her finger, each time indicating what these body parts are used for. During Vařila myšička kašičku, the mother first draws a circle in the palm of her infant's hand and then successively moves the individual fingers of the infant's hand, beginning with the thumb. At the same time, the mother tells a story about a mouse feeding her hungry children. The game ends when the smallest mouse (i.e., the little finger) runs to the pantry to steal some food (i.e., the mother runs her fingers along the infant's body toward the mouth or the armpit). During Kovej, kovej, kováříčku, the infant becomes a horse. The game begins with the horse being shod. To do this, the mother gently taps the sole of the infant's foot with her hand. Thereafter, the horse is given different cereals, which the mother symbolically indicates by putting them in her infant's hands.

### Infant Engagement in Social Game Routines

To find out in which situations mothers initiated social game routines, we first examined infant behaviors in the period before the first game occurred. A Wilcoxon signed-rank test showed that infants looked significantly longer away (M = 55.44, SD = 34.31) than at the mother (M = 34.53, SD = 30.74) in the period leading up to the first game, z = 2.035, p = 0.042. There were no significant differences in infant affect displays. In addition, we grouped infant social engagement behaviors in the 2 s before each game into three categories: engagement, disengagement, and ambivalent engagement. Engagement included gazes at mother, alone or in combination with positive affect. Disengagement included gazes away from the mother, alone or in combination with negative affect. Ambivalent engagement included gazes at mother in combination with negative affect, or gazes away from the mother in combination with positive affect. The frequencies of these combined behavioral categories were adjusted to the number of games for each dyad. A Friedman test showed significant differences among the repeated measures, χ<sup>2</sup> = 19.183, p < 0.001. Wilcoxon pairwise comparisons showed that mothers initiated games significantly more often when infants were disengaged (M = 58%, SE = 6.77) than when mothers and infants were already engaged with each other (M = 33.98%, SE = 5.37), z = −1.959, p = 0.050, or when infant engagement was ambivalent (M = 7.82%, SE = 6.77), z = −3.645, p < 0.001. Moreover, mothers initiated games significantly more often when they were already engaged with their infants than when engagement was ambivalent, z = −3.905, p < 0.001.
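The three-way classification of pre-game behavior is a deterministic mapping from gaze and affect codes. The Python sketch below assumes that each 2-s pre-game window has already been reduced to one gaze code and one affect code; how mixed windows were resolved is not reported. All names are hypothetical. The profile expresses each category as a percentage of the dyad's games, in line with the adjustment to game frequency.

```python
from collections import Counter


def classify_pre_game(gaze: str, affect: str) -> str:
    """Map one pre-game window to 'engaged', 'disengaged', or 'ambivalent'.

    gaze: 'mother' or 'away'; affect: 'positive', 'neutral', or 'negative'.
    """
    if gaze == "mother":
        return "ambivalent" if affect == "negative" else "engaged"
    if gaze == "away":
        return "ambivalent" if affect == "positive" else "disengaged"
    raise ValueError(f"unknown gaze code: {gaze!r}")


def pre_game_profile(windows: list) -> dict:
    """Percentage of a dyad's games preceded by each engagement category."""
    if not windows:
        raise ValueError("dyad played no games")
    counts = Counter(classify_pre_game(g, a) for g, a in windows)
    return {cat: 100.0 * counts[cat] / len(windows)
            for cat in ("engaged", "disengaged", "ambivalent")}


# Hypothetical dyad with four games
print(pre_game_profile([("away", "neutral"), ("away", "negative"),
                        ("mother", "positive"), ("away", "positive")]))
```

The per-dyad percentages produced this way are the repeated measures that a Friedman test and pairwise Wilcoxon tests would then compare across dyads.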

We further examined the conditional probabilities of infant social engagement behaviors occurring during a game routine versus during other forms of social interaction (see **Figure 2**). Wilcoxon pairwise comparisons showed that infants displayed marginally more positive affect, z = −1.884, p = 0.06, and significantly less negative affect, z = −2.935, p = 0.003, during game routines than during the rest of the interaction.
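The computation of these conditional probabilities is not detailed in the text. A minimal Python sketch, assuming per-second Boolean streams (one marking whether each second falls inside a game, one marking whether the behavior of interest, e.g. positive affect, is present), could look like this. The function name is hypothetical.

```python
def conditional_probabilities(in_game: list, behavior: list) -> tuple:
    """Return P(behavior | game second) and P(behavior | non-game second)."""
    if len(in_game) != len(behavior):
        raise ValueError("streams must have equal length")
    game_s = sum(1 for g in in_game if g)
    other_s = len(in_game) - game_s
    hits_game = sum(1 for g, b in zip(in_game, behavior) if g and b)
    hits_other = sum(1 for g, b in zip(in_game, behavior) if not g and b)
    # NaN marks an undefined probability (e.g., a dyad that never played)
    p_game = hits_game / game_s if game_s else float("nan")
    p_other = hits_other / other_s if other_s else float("nan")
    return p_game, p_other


# Six hypothetical seconds: three inside games, three outside
print(conditional_probabilities(
    in_game=[True, True, False, False, False, True],
    behavior=[True, False, False, True, False, True],
))
```

Conditioning on time in games, rather than comparing raw durations, keeps the comparison fair even though games occupied only a small share of each interaction.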

Finally, we compared infant social engagement before and after game routines to examine whether games were instrumental in changing infants' behaviors. A Friedman ANOVA revealed no significant changes in infants' behaviors, suggesting that social game routines did not have a lasting effect on infants' engagement.

### Relationship Between Social Game Routines and OT

There was a substantial amount of missing OT data for both mothers (OT1 = 25.6%; OT2 = 14%; OT3 = 18.6%; OT4 = 32.6%) and infants (OT1 = 39.5%; OT2 = 30.2%; OT3 = 34.9%; OT4 = 48.8%), due either to an insufficient volume of saliva or to an error in computing the OT standard curve. Little's MCAR test was nonsignificant (p = 0.540), consistent with the OT data being missing completely at random. Moreover, we found no differences between dyads with and without missing data on any of the background or main behavioral variables. Consequently, multiple imputation was used to replace missing OT values. All subsequent analyses were computed with the original data as well as the imputed data (pooled results), and results for both data sets are reported.
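The procedure used to pool results across imputed data sets is not specified. A common approach, assumed here, applies Rubin's rules to correlations on Fisher's z scale. The z-transformed coefficients are averaged, and the within-imputation variance W and between-imputation variance B are combined as T = W + (1 + 1/m)B for m imputations. The sampling variance 1/(n − 3) is the usual large-sample approximation for Pearson's r and only a rough one for Spearman's rho. The function name is hypothetical.

```python
import math
import statistics


def pool_correlation(rs: list, n: int) -> tuple:
    """Pool correlations from m imputed data sets with Rubin's rules.

    Returns the pooled r (back-transformed) and the total variance on the z scale.
    """
    m = len(rs)
    zs = [math.atanh(r) for r in rs]               # Fisher's z transform
    z_bar = statistics.mean(zs)                    # pooled point estimate
    within = 1.0 / (n - 3)                         # approx. sampling variance of z
    between = statistics.variance(zs) if m > 1 else 0.0
    total = within + (1.0 + 1.0 / m) * between     # Rubin's total variance
    return math.tanh(z_bar), total


# Hypothetical correlations from five imputed data sets, n = 43 dyads
print(pool_correlation([0.41, 0.35, 0.44, 0.38, 0.40], n=43))
```

Averaging on the z scale rather than averaging r directly avoids the bias that arises because the sampling distribution of r is skewed away from zero.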

We employed three different ways of measuring OT (see also **Figure 1**). First, we used the individual OT values at each of the four time points. Second, we calculated the area under the curve of all four OT measurements with respect to ground (AUC<sup>G</sup>). AUC<sup>G</sup> gives information about the total hormonal output and takes into account both the differences between the individual measurements and their distance from zero. This approach is frequently used to summarize the information contained in repeated measurements (Pruessner et al., 2003). Third, to assess individual changes in OT, we computed the area under the curve with respect to increase (AUC<sup>I</sup>) for the time interval between the samples taken before (OT2) and after (OT3) the natural interaction. AUC<sup>I</sup> is the AUC computed with reference to the first value. It ignores the distance from zero and thereby emphasizes the sensitivity of the system and changes over time (Pruessner et al., 2003). AUC<sup>I</sup> allows individual changes in reactivity to be compared, rather than simply comparing means over time, and therefore represents a more suitable approach to measuring interindividual differences. Negative values indicate a decrease and positive values an increase in OT. Descriptive statistics for original and imputed data of all OT values are reported in **Table 4**, and bivariate correlations of all OT measures within and between mothers and infants are provided in **Table 5**. Maternal and infant AUC<sup>G</sup> were significantly positively correlated [original data: r(24) = 0.450, p = 0.027; imputed data: r(43) = 0.433, p = 0.004], while their AUC<sup>I</sup> were significantly negatively correlated [original data: r(21) = −0.440, p = 0.046; imputed data: r(43) = −0.553, p < 0.001].
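Pruessner et al. (2003) define both areas with the trapezoid rule. For measurements $m_1, \dots, m_n$ with $t_i$ the time interval between $m_i$ and $m_{i+1}$:

$$AUC_G = \sum_{i=1}^{n-1} \frac{(m_{i+1} + m_i)\,t_i}{2}, \qquad AUC_I = AUC_G - m_1 \sum_{i=1}^{n-1} t_i$$

The Python sketch below implements these two formulas. The sampling intervals are not reported here, so the example assumes, purely for illustration, 10 min between samples (roughly one condition); the OT values are made up.

```python
def auc_ground(values: list, intervals: list) -> float:
    """AUC with respect to ground (trapezoid rule, Pruessner et al., 2003)."""
    if len(intervals) != len(values) - 1:
        raise ValueError("need one interval between each pair of samples")
    return sum((values[i] + values[i + 1]) * intervals[i] / 2
               for i in range(len(intervals)))


def auc_increase(values: list, intervals: list) -> float:
    """AUC with respect to increase: AUC_G minus the area below the first value."""
    return auc_ground(values, intervals) - values[0] * sum(intervals)


# Hypothetical OT values (pg/mL) and an assumed 10-min spacing between samples
ot = [40.0, 35.0, 45.0, 38.0]           # OT1..OT4
print(auc_ground(ot, [10, 10, 10]))     # 1190.0: total output across all four samples
print(auc_increase(ot[1:3], [10]))      # 50.0: OT2 -> OT3 change, positive = increase
```

For two samples, AUC<sup>I</sup> reduces to $(m_2 - m_1)\,t_1 / 2$, so its sign directly reflects whether OT rose or fell across the natural interaction.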

Spearman correlational analyses between the game variables and OT variables showed that, for the original data, game rate was significantly negatively correlated with infant OT3, r(28) = −0.413, p = 0.029, and infant AUC<sup>I</sup>, r(24) = −0.425, p = 0.038, and positively correlated with maternal AUC<sup>I</sup>, r(34) = 0.531, p = 0.001. Similarly, for the imputed data, game rate was marginally negatively correlated with infant OT3, r(43) = −0.296, p = 0.054, and significantly positively correlated with maternal AUC<sup>I</sup>, r(43) = 0.376, p = 0.013.
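Spearman's rho is Pearson's correlation computed on ranks. The handling of tied values is not reported; the following Python sketch assumes the common convention of assigning tied values their average rank. Function names are hypothetical. A statistics package would also supply p-values, which this sketch omits.

```python
import math
import statistics


def average_ranks(xs: list) -> list:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    start = 0
    while start < len(order):
        end = start
        # extend the block while the next sorted value ties with the current one
        while end + 1 < len(order) and xs[order[end + 1]] == xs[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: list, y: list) -> float:
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical game rates and infant OT3 values for five dyads
print(spearman_rho([0.1, 0.4, 0.4, 0.8, 1.2], [52.0, 47.5, 49.0, 40.2, 38.9]))
```

Rank-based correlation fits the decision to use non-parametric tests, since it depends only on the ordering of scores and is insensitive to the skewed OT distributions.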

Moreover, for the original data, time spent playing games during the interaction was negatively related to infant OT3, r(28) = −0.405, p = 0.032, and positively related to maternal AUC<sup>I</sup>, r(34) = 0.368, p = 0.032. For the imputed data, there was also a trend for a relationship between time spent playing games during the interaction and maternal AUC<sup>I</sup>, r(43) = 0.279, p = 0.070. No other correlations reached significance.

TABLE 4 | Descriptive statistics for original and imputed OT measures (in pg/mL) of the present study.

AUC<sup>G</sup>, the area under the curve with respect to ground. AUC<sup>I</sup>, the area under the curve with respect to increase.

#### DISCUSSION

Results of the present study suggest that social game routines are an inherent part of early mother–infant interactions. In this sample, almost 77% of mother–infant dyads spontaneously engaged in game routines during their interactions without being instructed to do so. Thus far, no research has examined naturally occurring game routines during early mother–infant interaction, and therefore the present results cannot be compared with existing evidence. However, the high prevalence and variety of game routines found in the present study may be attributable to the particulars of the examined sample. Specifically, the sample consisted of Czech mothers and their infants, and the Czech language has a particularly large repertoire of social game routines that are well known to the general public. Moreover, playing mothers reported, on average, more years of education than non-playing mothers, suggesting that engaging in structured game routines with infants may depend on maternal educational status. Thus, it remains an open question whether these results are representative of other linguistic, societal, and/or cultural backgrounds.

TABLE 5A | Bivariate relations between maternal and infant baseline OT at the four time points (original data).

<sup>+</sup>p < 0.06, <sup>∗</sup>p < 0.05, <sup>∗∗</sup>p < 0.01, <sup>∗∗∗</sup>p < 0.001. i, infant; m, mother.

TABLE 5B | Bivariate relations between maternal and infant baseline OT at the four time points (imputed data).

<sup>∗</sup>p < 0.05, <sup>∗∗</sup>p < 0.01, <sup>∗∗∗</sup>p < 0.001. i, infant; m, mother.

Despite these limitations, the present findings show that social game routines occurred naturally during interactions between mothers and their 4-month-old infants and were clearly distinguishable from the remaining activities during the interaction. A major problem of play research remains the difficulty of defining what constitutes a play activity, with the consequence that social activities in general cannot be clearly differentiated from play. In an attempt to identify play in various species, including humans, Burghardt (2011) proposed five criteria of play. Accordingly, playful behavior is (1) not necessary for current survival; (2) spontaneous, voluntary, intentional, pleasurable, rewarding, reinforcing, or autotelic; (3) not fully functional, because it incorporates incomplete, exaggerated, awkward, or precocious elements, or involves modified or sequenced behavior patterns; (4) repeated in a similar form; and (5) initiated in a "relaxed field," when all basic needs are provided for. All of these criteria apply equally well to social interactive activities in general and to playful activities. The lack of a good operational definition of play makes it almost impossible to systematically examine early playful activities. Interestingly, we recognize play instantly when we see it, and the present study shows that focusing on particularly structured playful behavior (i.e., game routines) could provide a way to circumvent the definitional challenges. Of course, the research assistants coding the data for the present study were Czech, and, as argued above, knowledge of game routines is widespread in the general Czech population. Thus, it is unclear whether coders who do not speak Czech could identify the game routines in this study. Nevertheless, we have provided some reference points for a possible operational definition of game routines, and it remains for future research to ascertain its applicability across different samples.

Next, we were interested in the context in which mothers initiated social game routines during the interaction. Our findings suggest that mothers most often began playing game routines when infants were not engaged with them, even more often than when mothers and infants were already engaged with each other. It seems that mothers used game routines as a strategy to regain their infants' attention or interest in the interaction. This could have been a function of the observation situation: mothers were asked to interact with their infants as they would at home, but in an unfamiliar environment, with cameras directed at them, under pressure to 'perform' and, most importantly, without the possibility of using any objects (e.g., toys). In such a potentially stressful context, engaging in a structured playful routine with their infants may have provided a way for mothers to create a comfort zone for themselves as well as to redirect the infant's attention back to them when the interaction went astray. Again, it remains to be examined how social game routines are used in other contexts, for example during interactions at home.

Consistent with our hypothesis, infants tended to be more likely to display positive affect during game routines than during other social activities (a marginal effect, p = 0.06), and they were significantly less likely to display negative affect during game routines than during the rest of the interaction. Thus, playing games may have fulfilled one of the main goals of a playful interaction, namely creating enjoyment (Stern, 1974; Ratner and Bruner, 1978; Papoušek and Papoušek, 1995), which corroborates research examining older children (Crawley et al., 1978; Ross and Kay, 1980). Fantasia et al. (2014b) also showed a general tendency toward positive affectivity in 3-month-old infants during game routines as compared to modified games. Thus, it is possible that infant positive affect during a game results from infants' recognition and understanding of the familiar structure of the ongoing game routine (Ratner and Bruner, 1978; Fantasia et al., 2014b). In fact, our finding that infant behaviors did not change from before to after playing game routines with their mothers would support such an argument. That is, while the probability of positive affect is higher during social game routines than during other types of social activities, playing games does not seem to have a lasting impact on the infants' mood. In line with the argument presented above, playing social game routines may create a comfort zone in an otherwise stressful social environment, which then elicits positive affect in infants that serves as a feedback loop to the mother, signaling to her that the interaction is enjoyable. This interpretation could also explain the large number of social game routines observed.

Interestingly, the results pertaining to the third goal of the present study seem to support such theorizing. Specifically, we found that the number of game routines played and the time spent playing them during the interaction were positively related to the maternal increase in OT from before to after the interaction (the latter only as a trend in the imputed data). Because the current data are correlational, any implications about directionality remain speculative. Nevertheless, the findings of the present study are consistent with the current literature on the role of OT in early social interactions, and particularly its associations with interactional synchrony (e.g., Feldman et al., 2010a, 2011; Gordon et al., 2010b,c). The concept of synchrony focuses on time as a central parameter and is characterized by 'co-occurrence' or 'match' between infant and parental behaviors (Feldman, 2006, 2007). This research has consistently shown that synchrony between parental and infant interactive behaviors is linked to both parental and infant OT (e.g., Feldman et al., 2010a, 2011; Gordon et al., 2010b,c). In the present study, mothers who took more initiative to play structured game routines with their infants may have felt more comfortable during the laboratory observation, especially when receiving positive affect from their infants as feedback, and this may have been reflected in an increase in maternal OT from before to after the interaction. This argument would also be consistent with the finding that mothers were more inclined to initiate game routines when infants were disengaged from the interaction. Thus, the motivation for playing structured game routines during laboratory parent–infant interactions may be to connect, coordinate, and thus establish behavioral patterns and bonds between parents and infants, as an active strategy to bring the interaction to a new level rather than simply for the sake of playing.

Additionally, it could be hypothesized that playing social game routines activates maternal caretaking behaviors that are associated with increased levels of OT and prolactin (see also Panksepp, 1998). This hypothesis further suggests that caregiving quality may play a role in the use of game routines, particularly as a strategy in potentially stressful situations. On the one hand, previous research has shown that the degree of synchrony between parents and infants moderated the relation between parental and infant levels of salivary OT: under conditions of high synchrony, infants whose parents had high OT levels also had significantly higher OT levels than infants whose parents had relatively low OT levels (Feldman et al., 2010b). In contrast, no differences were observed in infant OT levels when the interactional synchrony of the dyad was low (Feldman et al., 2010b). On the other hand, there is evidence that both play and OT have stress-reducing qualities (Watamura et al., 2003; Marazziti et al., 2006; Numan and Woodside, 2010; Guzman et al., 2013; Elmadih et al., 2014). Specifically, OT released during interpersonal stress modulates psychological reactivity to these experiences, inducing calmness and increasing the motivation for social interaction (Uvnäs-Moberg, 1998). Interestingly, Elmadih et al. (2014) suggested that OT is released to reduce interpersonal stress and anxiety particularly when maternal sensitivity is low. Correspondingly, it is possible that mothers used game routines to relieve the stress arising from the observation situation, which could have resulted in an increase in their salivary OT and, in turn, heightened their caretaking behaviors. Recent observations of parents' use of mobile devices during interactions with their children, showing the adverse effects of distracted parenting, would be consistent with this interpretation. That is, distracted parents showed an overall reduction in active engagement (e.g., Kirkorian et al., 2009), slow or absent responsiveness, and reduced sensitivity (see Kildare and Middlemiss, 2017, for review). Such interference with caretaking behaviors would not only increase the overall stress of the experience but also prevent parents from playing social game routines to reconnect with the infant. Further research needs to consider the factors and circumstances that may facilitate or impede the occurrence of social game routines.

Panksepp (1998) proposes what he calls the PLAY system, located in subcortical brain regions rich in opioids and dopamine, and there is now substantial evidence that opioids can increase playful behaviors (see Panksepp, 1998, for review). In contrast, OT was found to suppress play behavior in juvenile rats (Panksepp, 1998), which seems at odds with the findings discussed above of an increase in maternal OT after interactions rich in social game routines. However, the results of the present study also showed that particularly playful mother–infant interactions (i.e., more games and more time spent playing game routines) were associated with lower infant OT sampled after the interaction (i.e., OT 3) as well as a decrease in infant OT from before to after the interaction. There seems to be an interesting interaction between play and social interaction at the neurochemical level: while opioids can increase play, they can simultaneously decrease the desire for social interaction (Panksepp, 1998). Thus, if there is play, then there is no motivation for social interaction, and vice versa, which could explain the inverse relationship between playing games and infant OT in the present study. Because dyads in the present study played a great deal, this may have affected infant OT levels, which is particularly evident in the OT sampled after the natural interaction. Relatedly, it is also possible that the decrease in infant OT from before to after playful interactions reflects a stress-regulating mechanism of the dyad. That is, if mothers and infants experienced the observation situation in the laboratory as particularly stressful, then mothers may have initiated game routines as an attempt to coordinate behaviors with their infant. While in mothers this strategy could have heightened their caretaking behaviors and thus increased their OT levels (see discussion above), in infants it could be responsible for a reduction in stress and a corresponding decrease in OT. Thus, maternal caretaking efforts via playing may have moderated infants' stress response.

The present study's specifications may limit the conclusions presented here. First, the study took place in a laboratory setting, which, as discussed above, may explain the high number of game routines observed. Maternal behavior toward infants is affected by a laboratory context (Belsky, 1980), and although care was taken for mothers and their infants to feel comfortable, we cannot rule out maternal performance anxiety, which would affect not only their behaviors but also their OT levels (e.g., Marazziti et al., 2006). Second, the linguistic and cultural background of the examined sample strongly limits the possibility of generalizing these results to other populations. As already suggested, the Czech language has a rich repertoire of widely known social game routines. The occurrence of social game routines during natural interactions thus remains to be examined in future research to corroborate the game rates found in the present study. Third, the sampling of salivary OT may have affected the present findings, because the coordination mechanism between OT release in the central and peripheral nervous systems is not fully understood. Moreover, unlike previous studies using saliva samples (e.g., Carter et al., 2007; Feldman et al., 2010a,b, 2011), we analyzed the samples directly without extracting and concentrating them. This might have caused the missing OT data and the elevated absolute OT values. Thus, further research validating ELISA analysis of OT in saliva is critically required (e.g., Bosch, 2014). Other potentially limiting factors include the recruitment strategy, which resulted in a sample representing middle- to upper-class, highly educated families and an overrepresentation of first-time mothers, as well as the correlational nature of the data and the small sample size.
Despite these limitations, the present results provide the first compelling evidence of naturally occurring social game routines and of the different roles of the maternal and infant OT systems during playful interactions.

Animal research has shown that play is deeply embedded in the mammalian brain (Panksepp, 1998), and, similarly, play is argued to be essential to core biological functioning in humans (Kestly, 2014). Thus, there may be an evolutionary advantage for human infants to engage in play. During play, infants can encounter and experiment with different emotions, thoughts, roles and rules, which enables them to form particular expectations about their exchanges with others and thus be intrinsically cooperative. Mainstream views on cooperation see cooperative actions as a result of infants' ability to infer others' thoughts and plans and to combine them to build their coactions in some shared way (e.g., Tomasello, 2009). This view presupposes high-level mentalizing abilities, which very young infants do not possess. An alternative view sees cooperation as a property of interaction processes, not as an individual attitude toward another person (Fantasia et al., 2014a). Accordingly, no high-level mental abilities are needed for cooperation to take place. Recent evidence supports this view by showing that very early in life infants form expectations about, and adjust to, others' actions during daily routines (e.g., Nomikou and Rohlfing, 2011; Rączaszek-Leonardi et al., 2013; Reddy et al., 2013). Playing structured social games may be considered another form of such joint routines that helps infants to assume different roles and experience variations of social exchanges (e.g., Fantasia et al., 2014b), and consequently to become skilled cooperative agents as they participate in them (Rączaszek-Leonardi et al., 2013). Thus, early social games may support the development of complex social competencies, because they enable infants to become increasingly skilled in their social participation without the need for higher-level mentalizing.

### ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the Ethics Committee of the Institute of Psychology, Czech Academy of Sciences with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Ethics Committee of the Institute of Psychology, Czech Academy of Sciences.

### AUTHOR CONTRIBUTIONS

GM has made a substantial contribution to the study conception, design, formulation of the theoretical arguments, data analyses, and interpretation of data.

### FUNDING

This research was funded by a postdoctoral grant of the Czech Science Foundation (P47/10/P610).

### ACKNOWLEDGMENTS

I am grateful to the mothers and infants who participated in the study. I would also like to thank Martina Brožová, Zuzana (Špircová) Hřivíková, Michaela Krátká, Vlasta Radová, Barbora Šipošová, Katrin Steinbrück, and Sophie Weyer for their help with data collection and coding.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2018.01041/full#supplementary-material



**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Markova. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Developing Hierarchical Schemas and Building Schema Chains Through Practice Play Behavior

Suresh Kumar<sup>1,2</sup>\*, Patricia Shaw<sup>1</sup>\*, Alexandros Giagkos<sup>1</sup>, Raphaël Braud<sup>1</sup>, Mark Lee<sup>1</sup> and Qiang Shen<sup>1</sup>

<sup>1</sup> Department of Computer Science, Aberystwyth University, Aberystwyth, United Kingdom, <sup>2</sup> Department of Electrical Engineering, Sukkur IBA University, Sukkur, Pakistan

Examining the different stages of learning through play in humans during early life has been a topic of interest for various scholars. Play evolves from practice play to symbolic play and later to play with rules. During practice play, infants develop knowledge as they interact with surrounding objects, facilitating the creation of new knowledge about objects and object-related behaviors. Such knowledge is used to form schemas that capture sensorimotor experiences. Through subsequent play, certain schemas are further combined to generate chains able to achieve behaviors that require multiple steps. The chains of schemas demonstrate the formation of higher level actions in a hierarchical structure. In this work, we present a schema-based play generator for artificial agents, termed Dev-PSchema. With the help of experiments in a simulated environment and with the iCub robot, we demonstrate the ability of our system to create schemas of sensorimotor experiences from playful interaction with the environment. We show the creation of schema chains consisting of a sequence of actions that allow an agent to autonomously perform complex tasks. In addition to demonstrating the ability to learn through playful behavior, we demonstrate the capability of Dev-PSchema to simulate different infants with different preferences toward novel vs. familiar objects.

Keywords: Dev-PSchema, practice play, schemas, action sequencing, schema chains, play and playthings, modeling of behavior

### 1. INTRODUCTION

Humans are capable of learning within different environments and of extending their knowledge to new situations. As new experiences are gained, our capacity to understand the world and to adapt to changes within it strengthens. We are also capable of generalizing experiences and of repeating, in related situations, successful behaviors that were previously expressed. Developing this capability in robots is one of the major goals of roboticists, and modeling it requires an in-depth understanding of how we learn from experiences and how we develop our knowledge.

We learn different behaviors throughout our entire life, beginning with initial sensorimotor experiences that develop into high-level cognitive reasoning over time and through a series of stages. Piaget's cognitive theory (Piaget and Cook, 1952) proposes different learning stages in humans, supporting the idea of constructivism. He believed that children develop a variety of cognitive skills at different ages. The first stage of the cognitive developmental theory is referred to as the sensorimotor stage, where learning is focused on the sensorimotor experiences of the infants. Experiences gained at this stage are related to the infants' own actions and the associated sensory outcomes.

#### Edited by:

Mehdi Khamassi, UMR7222 Institut des Systèmes Intelligents et Robotiques (ISIR), France

#### Reviewed by:

Rahul Goel, Baylor College of Medicine, United States

Maxime Petit, Imperial College London, United Kingdom

#### \*Correspondence:

Suresh Kumar: suk9@aber.ac.uk; suresh@iba-suk.edu.pk

Patricia Shaw: phs@aber.ac.uk

Received: 19 December 2017; Accepted: 05 June 2018; Published: 25 June 2018

#### Citation:

Kumar S, Shaw P, Giagkos A, Braud R, Lee M and Shen Q (2018) Developing Hierarchical Schemas and Building Schema Chains Through Practice Play Behavior. Front. Neurorobot. 12:33. doi: 10.3389/fnbot.2018.00033

At the early stage of their life infants spend much of their time playing, a behavior closely coupled with their ability to learn (Pramling Samuelsson and Johansson, 2006). They explore their own actions and understand the resulting effects. Owing to this strong correlation, play is seen as an important part of cognitive development (Nicolopoulou, 2010). In addition, play provides a foundation for academic and social learning (Hirsh-Pasek and Golinkoff, 2008).

Infants appear to be very interested in their surrounding environment and tend to perform a wide variety of free play activities in order to explore it. Their actions are not constrained by any predefined rules other than those related to physical capabilities. Nevertheless, physical constraints do help them to scaffold learning, as the infants gradually understand the different elements related to their behaviors. At the sensorimotor stage, infants learn all relevant elements of the actions and sensory information that are associated with their experiences (Baillargeon, 1994). Apart from exploratory play, infants demonstrate exploitation behaviors during play. They explore the environment and the objects in it, extending their learning into novel and similar environments through a process of generalization (Baldwin et al., 1993; Welder and Graham, 2001).

In robotics, we aim to develop robots that are autonomous and capable of operating within dynamic environments and adapting to the changes that occur. Robotic agents ought to be able to re-use any previously acquired experiences in order to perform adequately in novel situations and under new circumstances. They should also be capable of learning from different experiences and through performing different tasks. Indeed, developmental robotics concentrates on modeling infant learning so that robots learn and adapt in similar ways to humans. More specifically, modeling the play behavior of infants provides a mechanism for robots to explore and discover new knowledge, acting as a driver for learning (Lee, 2011).

To develop robots that learn from experience and adapt, several learning systems have been proposed that learn from active and passive experiences (Drescher, 1991; Montesano et al., 2008; Krüger et al., 2011; Aguilar and y Pérez, 2015; Petit et al., 2016; Kansky et al., 2017). Drescher (1991) proposed a learning mechanism based on Piaget's schema mechanism. Following Drescher's proposed system, Sheldon (2013) introduced PSchema, a schema-based system that focuses on learning from sensorimotor associations, where learning outputs are formulated as schemas that contain the sensory information received before and after an action is performed. At the beginning, the system learns a set of basic actions by considering only the proprioception of the hand. This process is referred to as bootstrapping and is inspired by the reflexive movements of infants toward distant stimuli (Piaget and Cook, 1952). The next action to be performed is selected by a mechanism that calculates the excitation associated with each action, in the manner of intrinsic motivation. Being an open-ended learning system, PSchema is capable of creating action sequences in addition to generalizing experiences (Sheldon and Lee, 2011). However, such sequences are only created when the target conditions are provided by the user.

Krüger et al. (2011) introduced a learning model for autonomous agents based on sensorimotor experiences, allowing an agent to interact with the real world and to develop hierarchical knowledge. This knowledge takes the form of Object Action Complexes (OACs), which constitute the means by which the system enables behavior planning. OACs are essentially tuples of (i) an action, (ii) the sensory-state transition (from the initial to the predicted final state) caused by the action, and (iii) the reliability of predicting the resulting state in the environment. Different problem-specific learning algorithms can be used to learn OACs. Wörgötter et al. (2009) presented an implementation of the OAC model for their robotic application. In their work, OACs are learned through a supervised learning method and are tested on a robotic arm. The goal of the experiment they presented is to move an object from one point to another by removing obstacles along the path. Their results show that the OAC model is capable of planning and making predictions. However, the goals for planning are set by the user rather than by the agent itself. This limits the agent's capability for open-ended learning and for continuous play behavior. Moreover, the capability to generalize experiences is left as future work, limiting its performance within novel environments.
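To make the OAC triple concrete, the sketch below encodes it as a Python dataclass. It is an illustrative assumption, not the implementation of Krüger et al. (2011) or Wörgötter et al. (2009): states are modeled as frozen sets of symbolic attributes, and the hypothetical methods `applicable`, `predict` and `updated` use a simple set-replacement transition and a running success rate for reliability.

```python
from dataclasses import dataclass
from typing import FrozenSet

State = FrozenSet[str]  # assumption: a state is a set of symbolic attributes


@dataclass(frozen=True)
class OAC:
    action: str
    initial_state: State    # attributes required before the action
    predicted_state: State  # attributes expected after the action
    reliability: float      # how often the prediction has held, in [0, 1]

    def applicable(self, state: State) -> bool:
        # The OAC applies when all of its initial attributes are present.
        return self.initial_state <= state

    def predict(self, state: State) -> State:
        # Hypothetical transition: swap the initial attributes for the predicted ones.
        return (state - self.initial_state) | self.predicted_state

    def updated(self, prediction_held: bool, trials: int) -> "OAC":
        # Running mean over trials + 1 executions (`trials` = earlier executions).
        outcome = 1.0 if prediction_held else 0.0
        new_rel = (self.reliability * trials + outcome) / (trials + 1)
        return OAC(self.action, self.initial_state, self.predicted_state, new_rel)


grasp = OAC("grasp", frozenset({"hand_empty", "ball_reachable"}),
            frozenset({"ball_in_hand"}), 1.0)
scene = frozenset({"hand_empty", "ball_reachable", "table_clear"})
print(grasp.applicable(scene))        # True
print(sorted(grasp.predict(scene)))   # ['ball_in_hand', 'table_clear']
grasp2 = grasp.updated(prediction_held=False, trials=1)
print(grasp2.reliability)             # 0.5
```

Note that, as the text points out, nothing in such a structure sets its own goals: a planner using these OACs would still need the goal state to be supplied from outside.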

Also inspired by Piaget's theory, Aguilar and y Pérez (2015) developed a schema-based learning system called Developmental Engagement-Reflection (Dev-ER) for autonomous agents. Learning consists of schemas that contain preconditions, an action and postconditions, the latter being the results of applying the action under the preconditions. The model is employed by a virtual agent in a 3D virtual environment, where it can passively observe the environment by moving its head and fixating on interesting objects. Such objects are found by an attention process based on an interest value of the perceived objects. Interest values depend upon three aspects: pre-programmed preferences, the number of object features (properties), and virtual emotional interest in the object. The agent is initially provided with two reflexive saccade schemas, from which it develops its knowledge by interacting with the environment to create more schemas. The attention system helps the agent to demonstrate playful behavior. However, the model is not capable of planning in order to achieve a goal within the environment, or of exploiting a sequence of actions in order to achieve a state in the environment that is not reachable with a single action.

Most recently, Kansky et al. (2017) developed a schema-based deep learning network, based on the generative model of a Markov Decision Process (MDP). Objects are represented by lists of fixed binary properties, where an object may or may not have a given property in the environment. The network can plan from the initial state toward a goal state in the environment so as to maximize reward. To evaluate the network, an experiment is performed using the environment of the classic arcade video game Breakout. In the game, a ball is used to gradually break a brick wall positioned at the top of the screen by being repeatedly bounced between the bricks and the player's paddle, which moves horizontally along the bottom of the screen. Points are awarded every time the ball hits a brick, which is enough to break it from the wall, and the player must avoid missing the ball. The performance of the schema network is compared with two other deep learning models, Asynchronous Advantage Actor-Critic (A3C) and Progressive Networks (PNs), in experiments containing different variations of the environment. The results show that the proposed network outperforms the others in all variations of the environment and is capable of generalizing and adapting what it has learnt to these variations. However, the network still needs a large amount of training to achieve better results.

The above learning models are comparable to the schema-based mechanism proposed in this work. However, some of these systems do not offer open-ended play behavior (Krüger et al., 2011; Kansky et al., 2017), and some do not offer planning behaviors to achieve a desired or given state in the environment (Sheldon, 2013; Aguilar and y Pérez, 2015). Here, we present an intrinsically motivated, open-ended learning and play generator system, termed Dev-PSchema. Using it, an agent plays and learns that a ball can be grasped and moved to a different location, and that it disappears when dropped in a hole. The system can learn from a small number of experiences and can combine them to construct higher level, reusable chains of actions that represent more complex hierarchical behaviors. An excitation mechanism drives learning through exploratory play, during which the system generalizes schemas and re-uses them in novel situations. Moreover, by changing the excitation parameters, different individual infants can be simulated, a feature that is absent from all of the learning models discussed above. Finally, the system is sufficiently abstract to be used with different platforms without any major design changes.

In Dev-PSchema, each schema consists of the pre- and post-states of the environment (i.e., the world) related to a high-level action. Here, the term high-level refers to actions described without their underlying motor/joint movements.

The work presented in this paper draws inspiration from Piaget's schema mechanism. An initial implementation of this mechanism is given in Drescher (1991), with a model based on the sensorimotor stage, i.e., the first of the four stages of cognitive development outlined by Piaget. Learning at this stage is believed to be associated with the motor actions performed by the developing infant. Based on this idea, the schema system simulates an agent that learns from the sensory experiences resulting from its motor actions, and uses previously acquired knowledge to interact with the environment. The mechanism has no concept of object persistence while associating the sensory cues, i.e., touch, sound and vision, with the performed actions in order to generate new behaviors.

In Section 2, we present Dev-PSchema and the experiments along with the results. In Section 3, we discuss the system's capability to express different behaviors through variations in the excitation parameters and to learn high-level actions by developing schema chains. Finally, in Section 4, we draw conclusions about our findings in the light of developmental psychology.

### 2. DEV-PSCHEMA AND EXPERIMENTS

Dev-PSchema builds on PSchema, a system previously developed by Sheldon (2013), and simulates an agent that is capable of interacting with its environment. By considering simulated sensory information as well as the actions that the agent can perform, the system is capable of learning action-effect correlations. These are represented as schemas and constitute the knowledge the agent gains by interacting with objects within the environment. At the beginning, the system starts with a basic set of action schemas, referred to as bootstrap schemas (details can be found in Kumar et al., 2016a,b), which state the actions that can be performed without describing their associated preconditions. Subsequently, the system is free to start applying the schemas in the environment and, by interacting with objects, to learn new ones while expressing playful behaviors. As such, the system is considered a play generator that allows infant-like behaviors and learning to emerge through play.

As the agent interacts with the environment, new schemas are added to record new experiences or unexpected outcomes of actions, incorporating the preconditions under which the effect was experienced. These new schemas contain a set of sensory information, the behavior and its predictions in the environment. We refer to the sensory information as preconditions, the behavior as the action and the sensory predictions or results as postconditions. Thus, a schema is a tuple that consists of an action and the sensory information from both before and after the execution of the action, as preconditions and postconditions respectively. Any unpredicted effect of an action, relative to the schema used by the agent at that time, leads to new experiences that are also captured as new schemas. For instance, this happens when the postconditions of a schema do not match the actual outcome of the schema's action. Note that Dev-PSchema operates in discrete time: the system records observations before and after the execution of an action, and the time-step is the count of actions performed since the beginning of an experiment. During a single time-step, the system records all available observations to form the preconditions, executes an action, and finally records observations again to form the postconditions. A chain of schemas is also executed within a single time-step.
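The time-step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Dev-PSchema code: perceptions are hypothetical strings such as `"ball:in_hand"`, and `run_time_step`, `observe` and `execute` are invented names standing in for the low-level system. A new schema is recorded whenever the observed postconditions differ from those predicted by the executed schema, as happens when a bootstrap schema (action only, no conditions) is first used.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class Schema:
    preconditions: FrozenSet[str]   # perceptions recorded before the action
    action: str                     # high-level action name
    postconditions: FrozenSet[str]  # perceptions predicted/recorded after it


def run_time_step(schema: Schema,
                  observe: Callable[[], FrozenSet[str]],
                  execute: Callable[[str], None],
                  memory: List[Schema]) -> Schema:
    """One discrete time-step: observe, act, observe again.
    An unpredicted outcome is stored as a new schema."""
    pre = observe()
    execute(schema.action)
    post = observe()
    experienced = Schema(pre, schema.action, post)
    if post != schema.postconditions and experienced not in memory:
        memory.append(experienced)
    return experienced


# Toy world (hypothetical): a ball that can be grasped from a table.
world: Dict[str, str] = {"hand": "empty", "ball": "on_table"}


def observe() -> FrozenSet[str]:
    return frozenset(f"{k}:{v}" for k, v in world.items())


def execute(action: str) -> None:
    if action == "grasp" and world["ball"] == "on_table":
        world["ball"], world["hand"] = "in_hand", "full"


memory: List[Schema] = []
bootstrap = Schema(frozenset(), "grasp", frozenset())  # action only, no conditions
learned = run_time_step(bootstrap, observe, execute, memory)
print(sorted(learned.preconditions), "->", sorted(learned.postconditions))
# ['ball:on_table', 'hand:empty'] -> ['ball:in_hand', 'hand:full']
```

Exact set equality is used here as the prediction test purely for simplicity; the paper's own matching criterion may be more permissive.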

**Table 1** shows an example of a schema that was learnt after grasping an object using an initial bootstrap schema. Here the sensory information and the actions are defined as high-level abstractions, rather than as the sets of raw sensor data and motor commands that they reflect. When in use, the system is connected to a body<sup>1</sup> via a low-level system that is responsible for generating perceptions and actions and making them available to the schemas. In the case of real robotic hardware, the low-level system translates the schema actions into appropriate motor activities, allowing the agent to interact with the environment. Although schemas could be used to represent low-level actions and sensory information, the focus here is on high-level playful behavior.

TABLE 1 | An example of the concrete "Grasp" schema.

In order to generate play behaviors within an environment, attention and novelty are important (Mather, 2013). Dev-PSchema employs an excitation mechanism that performs action selection by identifying the object-action pairs that are most interesting to the agent, considering their postconditions. The selection of interesting object-action pairs depends upon the agent's preferences, which in turn are affected by the novelty of, and habituation (i.e., familiarity) to, the environment. The system provides exploratory play behaviors to interact with objects and learn the outcomes of the different actions performed on them. Note that objects in the system are defined by visual perceptions containing underlying properties. On the one hand, the system is capable of exploring an object by performing the actions associated with it. On the other hand, the system has the ability to switch between objects as necessary, ensuring the evaluation of the transferability of any learned knowledge while encouraging further exploration.

Furthermore, the system is able to create sequences of schemas in order to achieve a distant state (i.e., a set of postconditions) that may not be reachable with a single schema (chains are discussed later in Section 2.2). The agent creates new schemas and chains of schemas from existing schemas wherever possible following the execution of a schema or chain. The process of creating new schemas following interaction resembles the adaptation process, in which a subject learns new knowledge by building upon an existing knowledge base, as described by Piaget and Cook (1952).
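Section 2.2 describes Dev-PSchema's actual chaining mechanism. Purely as an illustration of what a schema chain achieves, the sketch below finds one with a breadth-first search over learned schemas. The STRIPS-like state update in `apply`, the helper `build_chain`, and the grasp/move/release schemas are all hypothetical; they echo the ball-in-a-hole scenario from the introduction but are not taken from the paper.

```python
from collections import deque
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Schema:
    preconditions: FrozenSet[str]
    action: str
    postconditions: FrozenSet[str]


def apply(schema: Schema, state: FrozenSet[str]) -> FrozenSet[str]:
    # Assumed update rule: preconditions are consumed, postconditions added.
    return (state - schema.preconditions) | schema.postconditions


def build_chain(start: FrozenSet[str], goal: FrozenSet[str],
                schemas: List[Schema], max_len: int = 5) -> Optional[List[Schema]]:
    """Breadth-first search for the shortest schema sequence whose
    cumulative effect yields a state containing every goal perception."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, chain = frontier.popleft()
        if goal <= state:
            return chain
        if len(chain) >= max_len:
            continue
        for s in schemas:
            if s.preconditions <= state:  # schema is applicable here
                nxt = apply(s, state)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, chain + [s]))
    return None  # goal not reachable within max_len steps


# Hypothetical schemas for the ball-in-a-hole scenario.
fs = frozenset
grasp = Schema(fs({"ball:at_A", "hand:empty"}), "grasp", fs({"ball:in_hand"}))
move = Schema(fs({"ball:in_hand"}), "move", fs({"ball:in_hand", "hand:above_hole"}))
release = Schema(fs({"ball:in_hand", "hand:above_hole"}), "release",
                 fs({"hand:empty", "ball:gone"}))

chain = build_chain(fs({"ball:at_A", "hand:empty"}), fs({"ball:gone"}),
                    [grasp, move, release])
print([s.action for s in chain])  # ['grasp', 'move', 'release']
```

Once found, such a chain could be stored and reused as a single higher-level action, which is the hierarchical behavior the text describes.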

Below we describe the key components that allow the generation of schemas and schema chains, and therefore the development of learning: in particular, the excitation calculator (Section 2.1) and the chaining mechanism (Section 2.2).

#### 2.1. Excitation Calculator

Considering all objects in the environment as they are perceived via sensory information, the agent calculates the excitation of each available schema in order to find the most interesting one to execute with respect to the currently perceived environment, referred to as the world state. The excitation is based on the similarity, novelty and habituation assigned to each schema; the total excitation of a schema is a weighted combination of these three factors. Varying the weights allows the generation of different play behaviors (Oudeyer et al., 2007; Ugur et al., 2007), which could correspond to different simulated infants or to behaviors expressed under varying external environmental conditions (e.g., playing in a familiar or unfamiliar setup).

In particular, similarity is designed to favor schemas related to previous interactions with a given object, whereas novelty increases the excitation value for new objects or for objects that have not been interacted with. Habituation, in turn, decreases the agent's interest in an object that is frequently used for interactions over time. Novelty and habituation therefore pull in opposite directions, which causes the agent to switch its attention from objects that have been explored to those that offer novel interactions. Note that although the terminology used in this work is based on that of developmental psychology, the meanings are not an exact match. Therefore, precise definitions of all three factors of excitation are given below.

#### 2.1.1. Similarity

This factor is used to describe the degree of resemblance between the object-specific perceptions that are captured at the end of an action and those that constitute the postconditions in each of the previously learned schemas. It is calculated by matching individual properties of an object, such as color or shape.

Such that

$$\text{Similarity} = \frac{\sum_{i=1}^{C(\rho)} \max_{1 \le j \le C(\zeta)} \left[\mathrm{Sim}(\rho_i, \zeta_j)\right]}{C(\rho)} \tag{1}$$

where

$$\mathrm{Sim}(\rho_i, \zeta_j) = \begin{cases} 1, & \rho_i \cong \zeta_j \\ 0.5, & \rho_i \sim \zeta_j \\ 0, & \text{otherwise} \end{cases}$$

returns the similarity between the i-th property of the object's perception ρ, that is ρ<sub>i</sub>, and the j-th property of the schema's object perception, ζ<sub>j</sub>. C(ρ) is the number of properties in the perceived object and C(ζ) is the number of properties in the schema's object perception. If a property appears in both states but with different values, Sim returns a partial match, i.e., 0.5.<sup>2</sup> In short, the result is the ratio between the sum of all maximum similarities calculated by Sim and the total number of properties in the perceived object.

The result is a number between 0 and 1, with 1 indicating an exact match. Although each property is compared with all properties found in all schemas, only the one with the maximum similarity measure is considered.
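A direct transcription of Eq. 1 is sketched below, assuming each object perception is a dictionary mapping property names (e.g., color, shape) to values; the function names are illustrative rather than taken from Dev-PSchema. A property with the same name and value scores 1, the same name with a different value scores 0.5, anything else scores 0, and the best score for each perceived property is averaged over all perceived properties.

```python
from typing import Dict

Perception = Dict[str, str]  # assumption: property name -> value


def sim(name_p: str, value_p: str, name_z: str, value_z: str) -> float:
    if name_p == name_z and value_p == value_z:
        return 1.0  # exact match
    if name_p == name_z:
        return 0.5  # same property, different value (partial match)
    return 0.0      # unrelated property


def similarity(rho: Perception, zeta: Perception) -> float:
    """Eq. 1: mean over perceived properties of the best Sim against the schema's properties."""
    if not rho:
        return 0.0  # guard against division by zero (an assumption, not in the text)
    total = sum(
        max((sim(n, v, zn, zv) for zn, zv in zeta.items()), default=0.0)
        for n, v in rho.items()
    )
    return total / len(rho)


perceived = {"color": "red", "shape": "sphere", "size": "small"}
schema_obj = {"color": "blue", "shape": "sphere"}
print(similarity(perceived, schema_obj))  # (0.5 + 1 + 0) / 3 = 0.5
```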

#### 2.1.2. Novelty

This is calculated by considering how frequently the perceptions that describe an object are confirmed as postconditions in schemas, with respect to the running time-step:

$$\text{Novelty} = \frac{1 + \cos(4.75\,\tau_1)}{2} \tag{2}$$

where

$$\tau_1 = \frac{C(O_s)}{C(O_e)} \tag{3}$$

with C(O<sub>s</sub>) being the number of times the object perception O appeared in schemas and C(O<sub>e</sub>) the number of times O was captured in the environment.

<sup>1</sup>Either a simulator or a real robot.

<sup>2</sup>Where the parameters are numeric and the range is known, the Euclidean distance can be used to give a similarity measure between 0.5 and 1.0.

The novelty factor is designed to express a smooth curve for values of τ<sub>1</sub> between 0 and 1, as shown in **Figure 1**. The cosine is rescaled to lie between 0 and 1, with its period reduced such that at τ<sub>1</sub> = 1.0 the value is approximately 50%. As τ<sub>1</sub> goes from 0 to 1, the novelty of the perceived object transitions from the maximum to the minimum and then back up to about 50%. Initially, the novelty of a newly perceived object is at its maximum. As the object is played with more frequently, or appears more often in schemas, its novelty decreases. If the object is not played with for a long period of time, its novelty increases again.
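Eqs. 2 and 3 can be evaluated directly, as in the sketch below (function and argument names are hypothetical). It reproduces the shape described above: novelty starts at 1, reaches its minimum near τ<sub>1</sub> = π/4.75 ≈ 0.66, and returns to roughly 0.52 at τ<sub>1</sub> = 1. The handling of an object never captured in the environment is an assumption, since the text does not cover that case.

```python
import math


def novelty(count_in_schemas: int, count_in_env: int) -> float:
    """Eqs. 2-3: tau1 = C(O_s) / C(O_e); novelty = (1 + cos(4.75 * tau1)) / 2."""
    if count_in_env == 0:
        return 1.0  # assumption: a never-captured object is treated as maximally novel
    tau1 = count_in_schemas / count_in_env
    return (1.0 + math.cos(4.75 * tau1)) / 2.0


# Sweep tau1 = 0.0, 0.2, ..., 1.0: falls from 1.0 to near 0, then back to about 0.52.
for c_s in range(6):
    print(c_s / 5, round(novelty(c_s, 5), 3))
```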

#### 2.1.3. Habituation

This factor depends on how recently schemas containing the object perception have been used in the environment. The agent is expected to become more habituated, and hence less interested, in a situation that recurs after interacting with the environment. This is inspired by developmental psychology, where infants become habituated to objects or events after a period of exploration or observation (Sigman, 1976; Hunter et al., 1983; Kirkham et al., 2002; Colombo et al., 2004). Habituation at a given time-step is given by

$$\tau_2 = \begin{cases} \frac{1}{n} \sum_{i=1}^{n} \frac{T_{s_i}}{T_c}, & \text{if } n > 0\\ 0.0, & \text{otherwise} \end{cases} \tag{4}$$

where $n$ is the number of schemas that contain the object perception and have been executed at least twice, $T_{s_i}$ is the time step at which schema $s_i$ was last executed, and $T_c$ is the current time step. If no schema containing the object perception has been executed at least twice, or the object perception has never appeared in a schema, then $\tau_2 = 0$ and the habituation for the perceived object remains 0. Because $\tau_2$ averages the last-execution time steps relative to the current time step, its value increases when schemas containing the object perception have been executed recently, as shown in **Figure 2**. Conversely, $\tau_2$ decreases when the object perception does not occur for a period of time steps, or when schemas containing it have not been used for a long time. The overall habituation is then computed as

$$\text{Habituation} = 1.0 - e^{-5\tau_2} \tag{5}$$

As with novelty, the coefficient in the exponent is chosen to smooth the curve over the range 0–1. Habituation is expected to increase as frequent interactions with the environment lead to the same object perceptions being captured, which in turn allows the agent to select actions that promote interaction with different areas of the environment.
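The following sketch combines Equations 4 and 5. It assumes that the caller has already selected the schemas that contain the object perception and have been executed at least twice, and passes in their last-execution time steps $T_{s_i}$; the function name and the guard against a non-positive current time step are our own additions.

```python
import math
from typing import Sequence


def habituation(last_exec_steps: Sequence[int], current_step: int) -> float:
    """Habituation for one object perception (Equations 4 and 5).

    last_exec_steps -- T_s for each relevant schema (already filtered by the caller)
    current_step    -- T_c, the current time step
    """
    n = len(last_exec_steps)
    if n == 0 or current_step <= 0:
        tau2 = 0.0  # no relevant schema yet, so no habituation
    else:
        tau2 = sum(t_s / current_step for t_s in last_exec_steps) / n  # Equation 4
    return 1.0 - math.exp(-5.0 * tau2)  # Equation 5
```

A schema last executed at the current step ($T_s = T_c$) gives $\tau_2 = 1$ and a habituation of about 0.993, whereas one last executed long ago gives a value close to 0.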

#### 2.1.4. Total Excitation

The total excitation is calculated by combining similarity, novelty and habituation, such that

$$\phi = \omega_1 \times \text{Similarity} + \omega_2 \times (\text{Novelty} - \text{Habituation}) \tag{6}$$

where the weights $\omega_1$ and $\omega_2$ satisfy

$$\omega_1 + \omega_2 = 1$$

This allows the agent to select an appropriate object to interact with by using previous experiences associated with all objects in the environment.

In particular, novelty and habituation are combined directly because both relate to experiences with the currently perceived object, whereas similarity considers all experienced perceptions of the objects that the system has previously interacted with. By varying the weights, we can simulate different artificial infants with different preferences (e.g., novel vs. favorite toy). Applying a higher weight to $\omega_1$ makes the agent more likely to interact with similar objects, whereas with higher values of $\omega_2$ the agent is more likely to interact with novel or less familiar objects. This can also be seen as a preference for exploitation or exploration, respectively. With $\omega_1$ and $\omega_2$ both set to 0.5, and all other parameters held constant, the robot directs its attention toward a novel object.
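Equation 6 can be written directly as a small function. In the sketch below the names are hypothetical, and the similarity, novelty and habituation values in the example are illustrative numbers, not results from the experiments.

```python
def object_excitation(similarity: float, novelty: float, habituation: float,
                      w1: float = 0.5, w2: float = 0.5) -> float:
    """Total object excitation phi (Equation 6), with w1 + w2 = 1."""
    if abs(w1 + w2 - 1.0) > 1e-9:
        raise ValueError("the weights must satisfy w1 + w2 = 1")
    return w1 * similarity + w2 * (novelty - habituation)


# Illustrative values only: a familiar, habituated object vs. a new, less similar one.
familiar = dict(similarity=1.0, novelty=0.2, habituation=0.6)
new = dict(similarity=0.5, novelty=1.0, habituation=0.0)

for w1 in (0.5, 0.9):
    w2 = 1.0 - w1
    f = object_excitation(**familiar, w1=w1, w2=w2)
    n = object_excitation(**new, w1=w1, w2=w2)
    print(f"w1={w1}: familiar={f:.2f}, new={n:.2f}")
```

With equal weights the new object is more exciting; shifting the weight toward similarity ($\omega_1 = 0.9$) makes the familiar object win, which mirrors the exploitation/exploration trade-off described above.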

Alongside the object-related excitation, the agent calculates the excitation of each schema in the system in order to select an appropriate schema to employ. This excitation is therefore related to the possible actions that could be performed on each object, rather than to the object perception alone. If the perceptions in the environment following an action match the postconditions of the schema, the execution is considered successful. A success rate $S_r$ is maintained to record the proportion of executions in which the expected outcome of a given schema has been achieved. This can also be considered a reliability measure for each schema, such that

$$
\lambda = S_r \times e^{-1.1\frac{T_s}{T_c}} \tag{7}
$$

where $T_s$ is the last time step at which the schema was executed and $T_c$ is the current time step. The coefficient in the exponent acts as a smoothing factor, producing an exponential response over the ratio between the schema's last execution time step and the current time step. Finally, the overall excitation of each schema is calculated by considering every object present in the environment, so that

$$\text{Excitation} = \left(\omega_3 \times \frac{\sum_{i=1}^{m} \phi_i}{m}\right) + \left(\omega_4 \times \lambda\right) \tag{8}$$

with the weights satisfying

$$\omega_3 + \omega_4 = 1$$

where $m$ is the number of perceived objects, $\phi_i$ is the excitation of the $i$th object and $\lambda$ is the schema's own excitation. Note that, because of Equation 7, a schema that is executed repeatedly (so that $T_s$ stays close to $T_c$) obtains a lower value of $\lambda$, which in turn contributes less to the final excitation. Similarly, schemas that are never used become more excited than their recently executed counterparts, enabling the agent to explore the environment by performing different actions. With $\omega_3$ and $\omega_4$ both set to 0.5, and all other parameters held constant, the robot switches its behavior.
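A minimal sketch of Equations 7 and 8 follows. The function names are hypothetical; treating a schema that has not yet been executed as having $T_s = 0$ (so that $\lambda = S_r$) is our assumption, chosen to match the statement that unused schemas become more excited. The default weights 0.6 and 0.4 are the values used for $\omega_3$ and $\omega_4$ in the experiments below.

```python
import math
from typing import Sequence


def schema_reliability(success_rate: float, last_exec_step: int,
                       current_step: int) -> float:
    """lambda = S_r * exp(-1.1 * T_s / T_c)  (Equation 7).

    Assumption: a never-executed schema is passed last_exec_step = 0.
    """
    ratio = last_exec_step / current_step if current_step > 0 else 0.0
    return success_rate * math.exp(-1.1 * ratio)


def schema_excitation(object_excitations: Sequence[float], lam: float,
                      w3: float = 0.6, w4: float = 0.4) -> float:
    """Final schema excitation (Equation 8): w3 * mean(phi_i) + w4 * lambda."""
    m = len(object_excitations)
    mean_phi = sum(object_excitations) / m if m else 0.0
    return w3 * mean_phi + w4 * lam
```

With a perfect success rate, a schema executed at the current step gets $\lambda = e^{-1.1} \approx 0.33$, while one executed only at the start keeps $\lambda$ close to 1, so $\omega_4$ controls how strongly recent use pushes the agent toward other actions.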

Algorithm 1 describes the process of calculating the excitation for a given environment state, referred to as the world state WS in the system. It computes the excitation of schemas and schema chains (introduced next) and returns the schema or chain with the highest excitation, following the winner-takes-all principle. In the case of equal excitation, schema chains are preferred, to encourage the system to explore more complex behaviors.

The function Diff (line 32) returns an excitation based on the change from the preconditions of schema $s_i$ to the postconditions of the next schema in the chain, $s_{i+1}$, and $C_r$ (line 36) is the success rate of the chain. While calculating schema excitations, the system also generates schema chains, as described in Section 2.2. Once finished, Algorithm 1 returns the schema or chain with the highest excitation.
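The winner-takes-all selection, with its tie-break in favor of chains, can be sketched as follows. The `Candidate` type is a hypothetical stand-in for a schema or chain with its computed excitation; exact floating-point ties are assumed, since the text does not specify a tolerance.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Candidate:
    name: str
    excitation: float
    is_chain: bool


def select_winner(candidates: Sequence[Candidate]) -> Candidate:
    """Winner takes all; on equal excitation, a chain beats a single schema."""
    if not candidates:
        raise ValueError("no candidates to select from")
    # Tuples compare element by element, and True > False breaks ties.
    return max(candidates, key=lambda c: (c.excitation, c.is_chain))
```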

#### 2.2. Schema Chains

As an agent gains more experience and skills, certain skills can be linked together to form higher-level skills in a hierarchical structure. For example, individual actions such as reach and grasp can become linked into a single reach→grasp action. Through playful exploration, more complex chains can be learned that combine basic actions to form more sophisticated high-level actions.

Chains are sequences of schemas, which the agent discovers by finding links between the preconditions and postconditions of the schemas in memory. Chaining helps to achieve distant states of the environment that cannot be reached by employing a single schema. For example, picking up an object at a reachable position requires two different actions: (i) reaching for the object and (ii) grasping it. **Figure 3** shows an example of a two-schema chain obtained by linking the postconditions of one schema to the preconditions of another.

Algorithm 2 is responsible for chain generation. As previously mentioned, chains are created while the excitation of schemas is being calculated. Longer chains are discouraged during chaining in order to reduce computational cost and to avoid overly complicated chains, which are more likely to be unsuccessful. Here, chains are limited to a maximum of 5 schemas.

In Algorithm 2, the starting schemas $S_s$ are those whose preconditions are a subset of the current world state, WS. The algorithm adds all possible chains for the given state of the environment to memory and returns the most reliable among them. The reliability of a chain is calculated as the average success probability of the schemas that make up the chain.
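The chaining step can be illustrated with a small depth-first search. This is a simplified sketch, not a transcription of Algorithm 2: schemas are modeled as hypothetical sets of perception labels, a link is assumed to exist when the next schema's preconditions are contained in the previous schema's postconditions, and a schema is not repeated within a chain. As in the text, starting schemas are those whose preconditions are a subset of WS, chains are capped at 5 schemas, and reliability is the mean success rate of a chain's schemas.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence, Tuple

MAX_CHAIN_LEN = 5  # chains are limited to 5 schemas


@dataclass(frozen=True)
class Schema:
    name: str
    pre: FrozenSet[str]   # preconditions (perception labels)
    post: FrozenSet[str]  # postconditions
    success_rate: float


Chain = Tuple[Schema, ...]


def generate_chains(schemas: Sequence[Schema], world_state: FrozenSet[str]) -> List[Chain]:
    """Enumerate chains of 2..MAX_CHAIN_LEN schemas that can start from world_state."""
    chains: List[Chain] = []

    def extend(chain: Chain) -> None:
        if len(chain) >= 2:
            chains.append(chain)
        if len(chain) == MAX_CHAIN_LEN:
            return
        for nxt in schemas:
            # Simplified link: the previous postconditions satisfy the next preconditions.
            if nxt not in chain and nxt.pre <= chain[-1].post:
                extend(chain + (nxt,))

    for start in schemas:
        if start.pre <= world_state:  # starting schemas S_s
            extend((start,))
    return chains


def chain_reliability(chain: Chain) -> float:
    """Mean success rate of the schemas in the chain."""
    return sum(s.success_rate for s in chain) / len(chain)


def most_reliable(chains: Sequence[Chain]) -> Optional[Chain]:
    return max(chains, key=chain_reliability, default=None)
```

For example, with a reach schema whose postconditions include touching the cube and a grasp schema that requires that touch, the search returns the single chain reach→grasp.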


Schemas in a chain are executed sequentially. A chain is considered successful if the WS resulting from the preceding schema's action matches that schema's postconditions. A chain is executed either in chain reflex mode or as a motor program, as described below.

#### 2.2.1. Chain Reflex

Initially, chains are executed in chain reflex mode. The world state (sensory information from the environment) is checked at the end of every executed schema in the chain. If it does not match the expected postconditions of the executed schema, the schema chain is considered unsuccessful. Here, "match" means that all the observations in the postconditions are obtained as an outcome. An unsuccessful chain is then excluded from schema selection at the next step.
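Chain reflex execution, with a world-state check after every schema, might look like the sketch below. The `execute` callback, which performs a schema's action and returns the observed world state as a set of perception labels, is a hypothetical interface; "match" is implemented as in the text, i.e., every postcondition observation must be present.

```python
from typing import Callable, FrozenSet, Sequence, Tuple

Step = Tuple[str, FrozenSet[str]]  # (schema name, expected postconditions)


def run_chain_reflex(chain: Sequence[Step],
                     execute: Callable[[str], FrozenSet[str]]) -> bool:
    """Execute a chain, verifying the world state after every schema."""
    for name, expected_post in chain:
        observed = execute(name)
        if not expected_post <= observed:
            return False  # unsuccessful: exclude this chain at the next selection step
    return True
```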

#### 2.2.2. Motor Program

If a chain is successfully executed multiple times, it is considered reliable and becomes automatic, in the sense that it behaves as a single continuous higher-level action called a motor program. As such, the chain is used to achieve a condition that results from a hierarchy of actions.

At least four successful repetitions and a probability of success higher than 80% make a chain sufficiently repeatable to be considered a motor program. Motor programs are executed sequentially without intermediate verification of the world state; only the postconditions resulting from the last action are used to evaluate the motor program. Consequently, if this evaluation fails, the motor program's success probability is reduced, and if the chain no longer meets the criteria (at least 4 successes and a success rate above 80%), it reverts to a standard chain.
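The promotion rule and the reduced checking of a motor program can be sketched as follows. The thresholds (at least four successes and a success rate above 80%) come from the text; the `ChainStats` class, the `execute` callback and recomputing the status from running counts after every execution are our assumptions.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Sequence, Tuple

MIN_SUCCESSES = 4
MIN_SUCCESS_RATE = 0.8

Step = Tuple[str, FrozenSet[str]]  # (schema name, expected postconditions)


@dataclass
class ChainStats:
    successes: int = 0
    attempts: int = 0

    @property
    def success_rate(self) -> float:
        return self.successes / self.attempts if self.attempts else 0.0

    def is_motor_program(self) -> bool:
        return self.successes >= MIN_SUCCESSES and self.success_rate > MIN_SUCCESS_RATE

    def record(self, succeeded: bool) -> None:
        self.attempts += 1
        self.successes += int(succeeded)


def run_motor_program(chain: Sequence[Step],
                      execute: Callable[[str], FrozenSet[str]]) -> bool:
    """Run every action without intermediate checks; verify only the final outcome."""
    observed: FrozenSet[str] = frozenset()
    for name, _ in chain:
        observed = execute(name)
    return chain[-1][1] <= observed
```

Note that four successes out of five attempts gives exactly 80%, which is not *higher* than 80%, so such a chain is not (or no longer) treated as a motor program.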

Algorithm 3 describes the execution process of an exciting schema or a chain. Note that executing a chain is considered as taking a single time step. For further details on this mechanism of Dev-PSchema, please see Kumar et al. (2016a,b).


During the execution of a motor program, although the external state of the environment may not be directly monitored by the high-level agent, the internal proprioceptive system remains active. When Dev-PSchema is interfaced with a low-level system that monitors all the sensors, the chain can still be interrupted if something unexpected is perceived.

The concept of schema chains is inspired by developmental psychology, where the ability to plan, and hence to produce action sequences, has been investigated (Willatts and Rosie, 1989; McCarty et al., 1999; Rosenbaum et al., 2007). McCarty et al. (1999) investigated planning in 9-, 14-, and 19-month-old infants. A spoon loaded with food was placed in various orientations in front of the infant. It was observed that 9- and 14-month-old infants reached for and grasped the spoon with their preferred hand. When the spoon was in a difficult orientation, 9-month-old infants were found to grasp it at the wrong end, i.e., the food end rather than the handle, before a corrective change of grasp was required. The 14-month-old infants always made corrections to ensure that the food reached the mouth, whereas the 19-month-old infants were found to switch to their non-preferred hand when the orientation of the spoon was difficult. The authors identified a series of planned strategies employed by the infants, each with the goal of eating the food, which can be considered as chains of action schemas.

The concept of a motor program is also inspired by developmental psychology, such as the work of Lashley (1951) on the hierarchical organization of behavioral plans. Lashley believed that the concept of a motor program was being neglected in favor of the concept of chain reflexes. The theory of chain reflexes explains the serial order of behavior through sensory feedback, which contributes to the excitation of each sequential building block of the chain. In contrast, motor program theory proposes that the actions in a behavior are serially ordered in advance and that sensory feedback from the intermediate actions is ignored. Lashley (1951) observed that more time is spent at the beginning of a sequence, with shorter intervals between the subsequent behavioral elements, and argued that this, together with the errors observed in behavior, supports the theory of motor programs. The longer time at the beginning allows the entire sequence to be planned, leaving gaps between the behavioral elements that are too short to receive feedback and plan the next step. More recently, his work has been reviewed by Rosenbaum et al. (2007). Although the review discounts the idea of pure motor programs that ignore all sensory feedback, it identifies key-frames in behaviors where motor adaptation can be performed. The authors also observe that the execution time of actions between key-frames is significantly reduced after 4 to 6 repetitions. The behaviors displayed between these key-frames could be considered as short chains executed as our motor programs, which supports limiting the length of any generated chains.

### 2.3. Experiments and Results

We present two experiments that demonstrate the capabilities of the proposed schema system. In the first experiment, we show the impact of varying the four weights used in the excitation calculation during play. As previously mentioned, this is equivalent to simulating the different behaviors of different individuals in infancy. In the second experiment, which is performed both in a simulator and on a real iCub humanoid robot (Metta et al., 2008), we examine the capacity of the system for playful exploration and for the application of chaining.

In the simulator, the environment consists of a 5 × 5 grid of regions in which an end effector, representing a hand, and several objects are situated. The end effector and each object occupy a single indexed region of the environment, but objects can form a stack in which multiple objects occupy the same region. The regions are labeled with horizontal and vertical coordinates, x and y respectively. An example of the simulated environment is shown in **Figure 4**.

Dev-PSchema receives sensory information as observations representing relevant sensor data; e.g., a visual observation represents color and shape. An object is represented by an observation containing relevant sensory information from a single world position in the case of the simulator, and from a single gaze position in the case of the iCub. Three sensor cues are simulated: visual, touch and proprioception. The visual sensor provides object perceptions whose properties include the color, shape and coordinates of the objects in the environment. These properties are used in the similarity calculations of perceived objects for excitation. The touch sensor indicates contact between the hand and the objects and, coupled with proprioception, is used to determine whether an object is being held. The proprioceptive sensor provides the coordinates of the hand on the grid and a value of 0 or 1 representing the state of the hand's grip, where 0 represents an open hand and 1 a fully closed hand. The simulator returns the perceptions of all the objects present in the environment, alongside the proprioceptive perception and, when contact occurs, the touch perception.

The end effector can perform several different actions in the environment resulting in different perceptions. Both the actions and sensory perceptions are specified at a higher level to maintain the focus on playful interaction rather than the low level sensorimotor control. Actions used in this work are defined as follows:


• Grasp: This action closes the hand to grasp an object at the current hand's position. If there is no object present, the hand will fully close. In both cases, the corresponding touch-related perceptions are expected to be captured.

• Release: This action is the reverse of a grasp. It triggers the hand to fully open and, if the hand already holds an object, to drop it on the surface of the grid. A dropped object is expected to be found at the same position as the hand, producing the corresponding touch perceptions.
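To make the simulated setting concrete, the sketch below models the 5 × 5 grid, the hand's position and grip, stacking, and the grasp and release behavior described above, together with the hole used later in Experiment 2. It is an illustrative toy, not the authors' simulator: the class and method names are hypothetical, the reach action is assumed simply to move the hand to a grid cell, the grip is kept binary as in the proprioceptive sensor, the last-listed object at a cell is assumed to be on top of a stack, and an object is assumed to fall through a hole only when their shapes are equal.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

GRID_SIZE = 5  # 5 x 5 grid of regions
Pos = Tuple[int, int]


@dataclass(eq=False)  # identity comparison: two identical cubes are still two objects
class Obj:
    color: str
    shape: str
    pos: Pos
    is_hole: bool = False  # perceived like an object, but cannot be grasped


@dataclass
class GridWorld:
    objects: List[Obj]
    hand: Pos = (0, 0)
    grip: int = 0  # 0 = open hand, 1 = fully closed (assumed binary)
    held: Optional[Obj] = None

    def objects_at(self, pos: Pos) -> List[Obj]:
        return [o for o in self.objects if o.pos == pos]

    def reach(self, pos: Pos) -> None:
        """Hypothetical reach: move the hand (and any held object) to a cell."""
        if not all(0 <= c < GRID_SIZE for c in pos):
            raise ValueError("position is outside the grid")
        self.hand = pos
        if self.held is not None:
            self.held.pos = pos

    def grasp(self) -> None:
        """Close the hand; take the top graspable object at the hand, if any."""
        self.grip = 1
        graspable = [o for o in self.objects_at(self.hand) if not o.is_hole]
        if graspable and self.held is None:
            self.held = graspable[-1]  # assumption: last-listed object is on top

    def release(self) -> None:
        """Open the hand; a held object drops at the hand's position."""
        self.grip = 0
        if self.held is None:
            return
        dropped, self.held = self.held, None
        dropped.pos = self.hand
        if any(o.is_hole and o.shape == dropped.shape for o in self.objects_at(self.hand)):
            # The object falls through the hole and disappears from the environment.
            self.objects = [o for o in self.objects if o is not dropped]

    def touching(self) -> bool:
        """Touch perception: contact with any graspable object at the hand."""
        return any(not o.is_hole for o in self.objects_at(self.hand))
```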

Although simplified, this set of actions is sufficient to demonstrate playful capabilities of the agent similar to an infant's play. The actions serve here as an initial predefined set to bootstrap the process; in a developmental system, they could be learnt through a combination of reflexive and exploratory behaviors.

### 2.4. Experiment 1 (A): Novel vs. Familiar Preference

This experiment is inspired by the study "Young children's preference for unique owned objects" by Gelman and Davidson (2016). The study investigates whether infants prefer a well-known object (a favorite toy) to a new identical object or a novel, non-identical object. In the study, infants mostly selected their own object when given a choice of two. Interestingly, they were found to select the identical or novel object when asked to select an object for the experimenter.

To replicate the behavior of the infants in that study, only the reach action (and hence reach schemas) is used. The agent's preference is expected to be demonstrated through several reach-related schemas that are gradually learned by interacting with the objects on the grid.

At first, a single object, a red cube, is presented to the agent. The environment and the perceived world state are shown in **Figure 4**. With a single reach schema in memory, the agent is most excited to interact with the object by reaching toward it. Once reaching is performed successfully, we reset the environment and return the hand to its initial position. The experiment is divided into two stages. Stage one is for familiarization: the agent reaches for the same object at least three times. Stage two is the test condition, in which both the familiar object and a novel object are presented to the agent. This stage is further divided into four parts, one for each novel object introduced. For each object combination, the weights for similarity ($\omega_1$) and for novelty and habituation ($\omega_2$) are varied to show the change in preference. Note that $\omega_3$ and $\omega_4$ remain at 0.6 and 0.4, respectively, in all variations of this experiment. This slight bias of $\omega_3$ over $\omega_4$ keeps the excitation dependent mainly on similarity and novelty/habituation rather than on schema statistics.

### 2.5. Experiment 1 (B): Action Preferences

By varying the excitation parameters described in Section 2.1, several different behaviors emerge from interaction with the environment. In particular, here we vary the weights $\omega_3$ and $\omega_4$, keeping $\omega_1$ and $\omega_2$ constant (0.5 each). We examine the agent's preference either to favor recently executed actions or to switch to different actions over a series of executions. For this experiment, we use the same agent and environment described in Section 2.4. However, the agent has only two different actions here, "Press" and "Squish", which produce the same outcome in the environment. Having the same outcome/postconditions for both actions gives the same similarity and novelty/habituation. Hence, the excitation of both schemas depends only on the schema statistics.

We use only one object in the environment for this experiment, to control the variation in object excitation, and place the end effector at the same position as the object to remove the reach action from this experiment. Each action, squish or press, produces the same new observation, press, in the environment. Because both actions produce the same outcome, they yield the same values for similarity and for the novelty/habituation pair, so the two actions can be compared using schema excitation only. We let the agent play with the object using these actions and record which action is selected at each execution.

### 2.6. Results of Experiment: 1 (A)

Following the familiarization stage, we introduce four different objects one by one, alongside the original object (i.e., the red cube). Except for the blue ball, each new object shares at least one property with the red cube, such as color or shape<sup>3</sup>. A blue cube, a red ball, a red cube and a blue ball are used, the last being the only object with no properties in common with the object the system is familiar with. We expect the agent to prefer reaching for the novel object when it is introduced; however, by changing the parameter values, we expect the agent to reach for the familiar object instead. The weights for similarity, $\omega_1$, and for novelty and habituation, $\omega_2$, are both initially set to 0.5; $\omega_1$ is then increased in steps of 0.1, while maintaining $\omega_1 + \omega_2 = 1$, until the observed behavior flips toward the familiar object. Below is a discussion of the agent's observed behavior, following the initial experience, when perceiving the novel object under different values of the excitation parameters.
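The weight sweep used in this experiment can be expressed as a short loop. The sketch below is a simplification: it compares only the object excitations of Equation 6 for the familiar and the novel object, ignoring the schema term of Equation 8, and the input values in the example are hypothetical rather than taken from the simulations. It returns the first $\omega_1$ at which the familiar object wins.

```python
from typing import Optional, Tuple

Triple = Tuple[float, float, float]  # (similarity, novelty, habituation)


def phi(obj: Triple, w1: float) -> float:
    """Equation 6 with w2 = 1 - w1."""
    similarity, novelty, habituation = obj
    return w1 * similarity + (1.0 - w1) * (novelty - habituation)


def flip_weight(familiar: Triple, novel: Triple) -> Optional[float]:
    """Increase w1 from 0.5 in steps of 0.1 until the familiar object wins."""
    for step in range(5, 11):  # w1 = 0.5, 0.6, ..., 1.0 (integers avoid float drift)
        w1 = step / 10
        if phi(familiar, w1) > phi(novel, w1):
            return w1
    return None  # the preference never flips within the range


# Hypothetical inputs, not values from the experiments.
print(flip_weight(familiar=(1.0, 0.2, 0.6), novel=(0.5, 1.0, 0.0)))  # prints 0.8
```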

#### 2.6.1. Novel Object With no Change in the Parameters (Same Color & Shape)

When an object identical<sup>4</sup> to the red cube is placed in the environment, the agent directs its attention to it, as similarity and novelty/habituation are equally weighted. With an increase of just 0.1 in the similarity weight ($\omega_1 = 0.6$), the agent's preference switches to reaching toward the familiar object. **Figure 5** shows the excitations of the two reaching decisions, one toward the familiar object and one toward the novel (identical) one, after the agent has only experienced reaching to the familiar object during stage one.

For each weighting, the executed action is the one with the highest excitation. The first three executions in the figure represent the familiarization stage of the experiment. The dotted lines represent the reach for the familiar object and the continuous lines represent the reach for the novel object. Note that the novel object is only introduced after the familiarization stage is complete. The inset shows that, for the novel object at equal weightings (red star), the excitation of "reach for novel object" is higher, whereas with a similarity weighting of 0.6 (blue circle) the excitations are almost the same, with a marginally higher value for "reach for familiar object." At this point, the agent prefers to reach for the familiar object rather than the novel one<sup>5</sup>, unlike its behavior when both $\omega_1$ and $\omega_2$ were 0.5. Thus, increasing the weight for similarity ($\omega_1$) led the agent to prefer the familiar object over the novel one.

<sup>3</sup>Here the different types of properties are weighted equally.

<sup>4</sup>All properties have the same values; only the position differs.

#### 2.6.2. Novel Object With Change in Single Parameter

When only $\omega_1$ is varied between 0.5 and 0.7, the agent interacts with the novel object, i.e., the blue cube or the red ball, after being familiarized with the red cube. When $\omega_1$ is set to 0.8 and $\omega_2$ to 0.2, the agent's behavior switches from interacting with the novel object to interacting with the familiar one. Here, interacting means reaching toward the object.

Thus, the additional variation in object properties results in the agent interacting with the novel object instead of the familiar one, until a higher weighting on the similarity parameter is applied to draw the agent's attention toward the familiar object. At this level (similarity weight $\omega_1 = 0.8$), the low weight on the novelty/habituation parameters ($\omega_2 = 0.2$) counters the excitation generated by the differing properties. **Figure 6** shows the excitation of the "reach for novel vs. familiar object" schemas for the different values of the excitation parameters.

Changing the similarity weight allows several individuals to be simulated. For $\omega_1$ in the range 0.5–0.7, the agent interacts with the novel object, although each setting yields different excitations for the "reach toward novel object" and "reach toward familiar object" actions. When the similarity weight is set to 0.8 or above, the agent is more likely to interact with the familiar object than with the novel one. As anticipated, with the object and schema excitation weights (i.e., $\omega_3$ and $\omega_4$), the agent becomes habituated to the same object and action if it is allowed to interact with the world for a longer period of time.

#### 2.6.3. Novel Object With Change in Both Properties (Color & Shape)

When an object whose color and shape both differ from those previously experienced is introduced, the agent requires a greater weighting on similarity to draw its attention to the familiar object. The introduction of a completely novel object generated a high level of excitation, triggering interaction with it. The simulation results show that the agent reaches for the novel object with the similarity weight set to 0.5, 0.6, 0.7, and 0.8. Only when it is set to 0.9 does the agent's behavior switch to interest in the familiar object instead of the novel one. **Figure 7** shows the schema excitations for reaching toward the familiar and novel objects.

From **Figures 5**–**7**, it is evident that the agent's preference changes with the excitation weights $\omega_1$ and $\omega_2$. A weighting bias toward $\omega_1$ increases the preference for familiar objects. However, as the difference between the familiar and novel objects increases, so does the weighting toward $\omega_1$ required to produce this preference.

#### 2.7. Results of Experiment: 1 (B)

In this experiment, the agent has the option to perform two different actions on the object. Both actions are controlled to provide the same outcomes, ensuring that they both receive the same object excitation based on similarity, novelty and habituation. Thus, which schema (and hence which action) is most excited depends on the schema excitation described in Equation 7 and its weight, $\omega_4$. The agent's observed behaviors for different values of $\omega_3$ and $\omega_4$ are shown in **Figure 8**.

<sup>5</sup>Winner takes all; therefore, the size of the gap is not important.

FIGURE 6 | Reach actions for Familiar vs. Novel (change in either color or shape) object. The inset shows the excitations at the 4th execution.

In particular, this figure shows the most excited schema, and hence the action, at each of 10 executions for different pairs of values of $\omega_3$ and $\omega_4$. The results show that the agent exhibits different behaviors as the weights vary. As the weight shifts toward $\omega_4$, the agent becomes increasingly inclined to switch frequently between actions rather than to explore the effects of the previous action further.

### 2.8. Experiment 2: Playful Discovery of Action Sequences-Chaining

In the second experiment, we demonstrate how playful behavior can be used to explore an environment, discover action outcomes, and then create schema chains that form higher-level behaviors in a hierarchical manner. For the first part of this experiment, we use the same environment with different objects and all the actions described at the beginning of Section 2. The experiment is then repeated on an iCub humanoid robot (Metta et al., 2008) to show the application of Dev-PSchema in a real-world scenario.

This experiment has two stages. In the first stage, we introduce an object (a red cube in the simulator) and a hole into the environment and let the agent play with them. The hole is perceived as an object with color and shape, but it cannot be grasped. The agent receives no touch perception when it reaches toward the hole, and when it attempts to grasp, the hand closes fully into a fist. When an object with a shape similar to that of the hole is released over the hole, the object disappears from the environment. **Figure 9** shows the environment for this stage of the experiment and the state perceived by the agent.

During this first stage, the agent is allowed to play freely with the objects in the environment. The stage ends when the agent drops the object in the hole. Note that the aim of dropping the object in the hole is set by us (the experimenters) and is not specified to the agent: the agent is neither programmed with this aim nor contains any schema to perform this specific action. At the start, the agent only has the raw actions (Reach, Grasp, Release), without any understanding of the effects these actions will have on either object in the environment. We expect that during a period of playful exploratory behavior using high-level motor babbling, the agent will be able to achieve the aim of the experiment, i.e., create sequences of actions.

In the second stage of the experiment, the environment is reset to evaluate the agent's ability to exploit the knowledge gained during stage 1 and to apply chains of higher-level actions. We anticipate that the agent will be able to create a chain of four actions (reach for cube, grasp, reach for hole, release) to pick up an object and drop it in the hole in a single execution, rather than through exploratory play as in the first stage. Note that the agent is still able to generate and reuse chains as it did during the first stage. The weights used in this experiment for the simulator are 0.5, 0.5, 0.6, and 0.4 for $\omega_1$, $\omega_2$, $\omega_3$ and $\omega_4$, respectively. We made a slight change to the weights $\omega_3$ and $\omega_4$, compared with those used in Experiment 1, to encourage the agent to become habituated to schema actions quickly during play and therefore to try different schemas, and hence different actions.

#### 2.8.1. Humanoid Robot and Environment

The above experiment was also conducted using the iCub humanoid robot (Metta et al., 2008), where a low-level system is responsible for (i) providing high-level action commands, and (ii) preparing and maintaining visual, proprioceptive and tactile perceptions. The only changes made to Dev-PSchema were adjusting the weights and slightly increasing the similarity tolerance to account for variations in the robot's sensor readings. The expected sensory information is not predefined, which enables the system to respond to new and previously unknown states or actions that may become available from the low-level system. This demonstrates that Dev-PSchema can be applied to different and more complex settings.

In terms of actions, the reach, grasp and release commands become available after being learnt using developmental approaches documented in previous work by Law et al. (2014b), Shaw et al. (2015), and Lewkowicz et al. (2016). Reaching is learnt using an approach inspired by hand regard in infancy (Rochat, 1992). This approach consists of random arm movements that trigger eye saccades toward the visually stimulating hands. Once the hands are fixated, mappings are learned between the reaching space and the visual space, i.e., the gaze space of the robot (Giagkos et al., 2017b). Further information on learning the reaching space can be found in Earland et al. (2014). Learning associations between the reaching and gaze spaces has two results. First, the robot is capable of reaching to a given set of coordinates within its reach space. Second, by knowing the exact hand position in the reach space, the robot knows where its hands are located in its gaze space by following the previously learned associations. Thus, the robot is able to perform eye and head movements in order to visually visit its hands, if necessary.

For grasping, the robot currently employs a mechanism inspired by reflexive grasping in infancy (Giagkos et al., 2017a). When a touch sensation occurs on any tactile-sensitive area of the hand, motor commands are sent to all digit joints to form a power grip. Joints that have reached their maximum are excluded from further motor activity. Digit joints are also excluded when tactile sensation is constantly received from the associated fingertip, indicating that an object is firmly grasped. Conversely, a release command opens the fingers iteratively, as long as their joints have not reached their minimum values.
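The reflexive grasp can be summarized as an iterative loop. In the sketch below, joint angles are in arbitrary units, joint names follow a hypothetical `"<finger>_<joint>"` convention, and `read_fingertips` is a hypothetical callback returning the fingertips currently reporting contact; "constant" fingertip contact is simplified to contact at the current iteration. The iCub's actual control interface is not shown.

```python
from typing import Callable, Dict, Set


def reflex_grasp(joints: Dict[str, float], joint_max: Dict[str, float],
                 hand_touched: bool, read_fingertips: Callable[[], Set[str]],
                 step: float = 5.0, max_iters: int = 100) -> None:
    """Close all digit joints into a power grip once any part of the hand is touched."""
    if not hand_touched:
        return  # the reflex is triggered by touch on any tactile-sensitive area
    for _ in range(max_iters):
        touching = read_fingertips()
        moved = False
        for name, angle in joints.items():
            finger = name.split("_")[0]
            if angle >= joint_max[name] or finger in touching:
                continue  # excluded: joint at its maximum, or fingertip in contact
            joints[name] = min(angle + step, joint_max[name])
            moved = True
        if not moved:
            break  # every joint is excluded, so the grip is complete
```

A release could be sketched symmetrically by decreasing each joint until it reaches its minimum value.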

All perceptions are prepared by monitoring and grouping information received from the robot's sensory channels. At the beginning of the experiment, the robot is given time to visually explore its intermediate space by performing saccades to stimulating targets. In this experiment, green and red patches on the retina attract the robot's visual attention. The coordinates of the fixation target are calculated from the kinematic model of the robot, given the head configuration at fixation. The gaze coordinates act as the equivalent of the world coordinates in the simulator. All color information found within the foveal area of the retina (i.e., the circular area depicted in **Figure 11**) is then grouped as part of the same visual perception, because at this stage visual targets found in the fovea are considered part of the same object in the world. Along with the HSV color model values (i.e., Hue, Saturation and Value), the area is calculated, followed by the depth of the fixation target. HSV is preferred over other color models such as RGB because of its robustness to changes in external lighting, with Hue varying relatively little in real-world environments. In brief, raw images from the robot's DragonFly2 cameras are processed to identify stimulating targets of interest. Color detection is achieved by comparing the perceived HSV values against the range that defines each detectable color. The centroid of each target, its mean HSV values and its area in pixels are then reported. This approach allows the system to identify potentially stimulating areas in the scene and to characterize them by their attributes. The low-level feature extraction mechanism employed in this experiment is discussed in Giagkos et al. (2017b).

Although the gaze space is two-dimensional, the depth of the fixation is estimated to enrich the visual perception with information about three-dimensional space. Depth is calculated after the eyes converge or diverge so that both eyes fixate the target.
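The color detection and depth estimation described above can be illustrated with a small sketch. It is not the feature extractor of Giagkos et al. (2017b): the HSV ranges are hypothetical values in the 8-bit convention used by OpenCV (H in 0 to 179, S and V in 0 to 255), all pixels of one color are grouped into a single target, and depth assumes symmetric vergence with a known eye baseline, which the robot may compute differently through its kinematic chain. `COLOR_RANGES`, `detect_targets` and `depth_from_vergence` are illustrative names.

```python
import math
from dataclasses import dataclass

# Hypothetical HSV ranges; red wraps around hue 0, so it needs two hue intervals.
COLOR_RANGES = {
    "red": [((0, 10), (100, 255), (70, 255)), ((170, 179), (100, 255), (70, 255))],
    "green": [((40, 80), (100, 255), (70, 255))],
}


@dataclass
class VisualTarget:
    color: str
    centroid: tuple   # (row, col) in pixels
    mean_hsv: tuple   # arithmetic mean; ignores hue wrap-around for simplicity
    area: int         # number of matching pixels


def in_ranges(hsv, ranges):
    """True if every channel of `hsv` lies inside one of the (lo, hi) interval triples."""
    return any(all(lo <= ch <= hi for ch, (lo, hi) in zip(hsv, r)) for r in ranges)


def detect_targets(image, color_ranges=COLOR_RANGES):
    """Group all pixels of each detectable color into one target (image: rows of (h, s, v))."""
    targets = []
    for color, ranges in color_ranges.items():
        hits = [(r, c, hsv) for r, row in enumerate(image)
                for c, hsv in enumerate(row) if in_ranges(hsv, ranges)]
        if not hits:
            continue
        n = len(hits)
        centroid = (sum(h[0] for h in hits) / n, sum(h[1] for h in hits) / n)
        mean_hsv = tuple(sum(h[2][i] for h in hits) / n for i in range(3))
        targets.append(VisualTarget(color, centroid, mean_hsv, n))
    return targets


def depth_from_vergence(vergence_deg, baseline_m):
    """Distance to a point fixated straight ahead, assuming symmetric vergence."""
    if vergence_deg <= 0:
        return math.inf  # parallel (or diverging) gaze: no finite intersection
    return (baseline_m / 2) / math.tan(math.radians(vergence_deg) / 2)
```

The depth formula follows from the isosceles triangle formed by the two eyes and the fixated point: each eye rotates inward by half the vergence angle, so the distance is half the baseline divided by the tangent of that half-angle.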

As with the visual perceptions, tactile information is analyzed by the low-level system to prepare tactile perceptions for Dev-PSchema. A tactile perception consists of the identity of the touching hand and the areas on it that received tactile information (i.e., the 5 fingertips and the palm). Finally, proprioceptive perceptions are sent for each hand of the robot, consisting of the position of the hand in the gaze space and a value related to the current hand grip. The latter expresses how open or closed the hand is as a percentage, with 0% defined as fully open and 100% as fully closed.
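The tactile and proprioceptive perceptions described above could be represented as plain records. This sketch is illustrative only: the field names, the `grip_percentage` aggregation (mean closure of each joint between hypothetical open and closed limits, clamped to [0, 1]) and the validation rule are assumptions, not the data format used by Dev-PSchema.

```python
from dataclasses import dataclass

TACTILE_AREAS = frozenset({"thumb", "index", "middle", "ring", "little", "palm"})


@dataclass(frozen=True)
class TactilePerception:
    hand: str            # "left" or "right"
    touched: frozenset   # subset of TACTILE_AREAS that received tactile input

    def __post_init__(self):
        unknown = self.touched - TACTILE_AREAS
        if unknown:
            raise ValueError(f"unknown tactile areas: {sorted(unknown)}")


@dataclass(frozen=True)
class ProprioceptivePerception:
    hand: str
    gaze_position: tuple  # hand position expressed in gaze space
    grip: float           # 0 = fully open, 100 = fully closed


def grip_percentage(joint_angles, open_angles, closed_angles):
    """Mean closure across hand joints, each mapped to [0, 1] between its open and closed angle."""
    fractions = []
    for a, lo, hi in zip(joint_angles, open_angles, closed_angles):
        frac = (a - lo) / (hi - lo)
        fractions.append(min(max(frac, 0.0), 1.0))  # clamp sensor overshoot
    return 100.0 * sum(fractions) / len(fractions)


# Usage with hypothetical joint readings in degrees.
grip = grip_percentage([0, 45], [0, 0], [90, 90])
state = ProprioceptivePerception("right", (12.0, -3.5), grip)
print(state.grip)  # 25.0
```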

Unlike in the simulator, where the world state is provided by the software, visual changes in the real world cannot be fully captured unless the robot visually revisits the areas of interest. Previously generated visual perceptions may no longer be valid because of several real-life phenomena. For instance, an object is perceived differently when it is partially or fully hidden from the eye cameras by the arms moving within the reach space, or when the object has moved while an action is performed. Moreover, not all visual perceptions are within the retina at all times, so substantial head movement may be required to update their information; the robot therefore needs a way to update the world-state perceptions after each action. To tackle this practical issue, the low-level system keeps a short-term memory of the gaze targets with which it previously engaged, and iterates through them at the end of every action. With such access to up-to-date world-state perceptions and actions, the associated Dev-PSchema mechanisms can operate efficiently.
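A short-term gaze memory of this kind can be sketched as a bounded, insertion-ordered map from target identifiers to gaze coordinates. The capacity, the eviction of the oldest target, the `look_at` callback and the decision to keep a target even when it is not re-found are all assumptions for illustration; `GazeMemory` is a hypothetical name.

```python
from collections import OrderedDict


class GazeMemory:
    """Bounded memory of previously fixated targets, revisited after each action."""

    def __init__(self, capacity=10):
        self.capacity = capacity
        self.targets = OrderedDict()  # target id -> gaze coordinates, oldest first

    def remember(self, target_id, gaze):
        """Store (or refresh) a target, evicting the oldest once capacity is exceeded."""
        self.targets[target_id] = gaze
        self.targets.move_to_end(target_id)
        while len(self.targets) > self.capacity:
            self.targets.popitem(last=False)

    def refresh(self, look_at):
        """Saccade to every remembered target and return the perceptions still available.

        look_at(gaze) returns a perception, or None if nothing is found there;
        such targets stay in memory so they can be re-checked after the next action.
        """
        world_state = {}
        for target_id, gaze in self.targets.items():
            perception = look_at(gaze)
            if perception is not None:
                world_state[target_id] = perception
        return world_state
```

Calling `refresh` at the end of every action yields the fresh world state against which postconditions can be matched.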

The experimental set-up used for this experiment is illustrated in **Figure 10**. A red soft toy is placed on a wooden board that contains a hole big enough to ensure a successful drop. The hole is marked with green tape so that it is visible to the robot. Visual perceptions of both targets, containing their coordinates in the gaze space, are sent to Dev-PSchema. To match the simulator experiment, only one robotic arm is used, limiting the proprioceptive and tactile perceptions to the right hand. **Figure 11** shows how the targets in the environment are perceived by the eye cameras. Using the iterative mechanism described above, the visual perceptions of both the red and green targets are updated to constitute a fresh world state for Dev-PSchema's postcondition matching and excitation computations.

FIGURE 10 | Experimental set up for the iCub and perceived sensory information.

In this experiment we used 0.5 for all the parameter weights (ω1, ω2, ω3, and ω4). Equal weights for the similarity and novelty/habituation pair encourage the agent to interact with the less habituated, more novel object when the objects are equally similar. Equal values of ω3 and ω4 encourage the agent to switch objects and schemas, and hence actions, frequently. Values from the iCub perceptions were all normalized to the range [0, 100], with a 10% tolerance to account for noise from the raw sensors.
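The normalization and tolerance described above might be applied as follows when matching expected postconditions against observed perceptions. The sketch assumes that the 10% tolerance is measured on the normalized [0, 100] scale (i.e., 10 points) and that raw readings are clamped to known sensor limits; `normalize`, `values_match` and `state_matches` are hypothetical helpers.

```python
def normalize(value, lo, hi):
    """Map a raw reading from the sensor range [lo, hi] onto [0, 100], clamping outliers."""
    value = min(max(value, lo), hi)
    return 100.0 * (value - lo) / (hi - lo)


def values_match(expected, observed, tolerance=10.0):
    """Two normalized values match if they differ by at most `tolerance` points."""
    return abs(expected - observed) <= tolerance


def state_matches(expected, observed, tolerance=10.0):
    """Every expected property must be observed and match within tolerance."""
    return all(k in observed and values_match(v, observed[k], tolerance)
               for k, v in expected.items())


# Usage: an expected grip of 90% is satisfied by a noisy reading of 95%.
print(state_matches({"grip": 90.0}, {"grip": 95.0, "x": 3.0}))  # True
```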

### 2.9. Results of Experiment: 2 (Simulator)

During the first stage of the experiment, the agent playfully explored the two objects and the actions available in the environment. As new experiences were gained, new schemas describing them were formed. These new schemas had high novelty and were therefore often selected as the next action, resulting in a playful behavior that repeats interesting actions, thereby also confirming their effects. Initially the agent focused its attention on the cube, learning the effects of reaching, grasping and releasing it. These actions were then combined into various chains that were tested, before the attention switched to the hole. At this point the agent was still holding the object, which it discovered had moved with its hand. Attempts to grasp the hole made no difference, allowing the release action to become the most excited again, and finally the object was dropped in the hole. **Figure 12** shows the excitations of the different schemas and chains created during the playful behavior.

Before each action execution, the agent calculates the excitation of all the actions and executes the action (schema) with the highest excitation. **Figure 12** shows the winning action at each execution in the experiment. During play, the agent also created and executed chains of schemas. In **Figure 12**, continuous lines show the excitations of the schemas and dotted lines represent chain excitations. Initially, no chains are available to the agent. Once the agent performed the grasp action, it created the "Reach and Grasp" chain and executed it at the 8th execution. Similarly, once the agent released the object, it discovered the "Reach, Grasp, and Release" chain. This chain was then executed twice, as it had the highest excitation at the 9th and 10th executions.
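The section does not spell out how chains are formed, so the following is only a hypothetical sketch consistent with the order of discovery above: after every execution, each contiguous run of recent schemas ending at the latest one is registered as a chain, provided consecutive schemas "link". Here `links` stands in for the check that one schema's postconditions satisfy the next one's preconditions and, purely as an assumption, rejects immediate repetitions of the same schema; `update_chains` and `max_len` are illustrative names.

```python
def update_chains(history, chains, links=lambda a, b: a != b, max_len=4):
    """Register each linked run of recent schemas that ends at the latest one.

    history: schema names in execution order; chains: set of known chains (tuples).
    Returns the chains created by this call, shortest first.
    """
    created = []
    for k in range(2, min(max_len, len(history)) + 1):
        # Growing the run from k-1 to k adds the pair (history[-k], history[-k + 1]).
        if not links(history[-k], history[-k + 1]):
            break
        chain = tuple(history[-k:])
        if chain not in chains:
            chains.add(chain)
            created.append(chain)
    return created


# Usage replaying the order of discovery described above.
known = set()
for history in (["reach"], ["reach", "grasp"], ["reach", "grasp", "release"]):
    print(update_chains(history, known))
# []
# [('reach', 'grasp')]
# [('grasp', 'release'), ('reach', 'grasp', 'release')]
```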

Once the agent reached the aim of the first stage, we reset the environment for the second stage of the experiment by placing the object back at the same position as shown in **Figure 9** and returning the hand to its starting position. At this point the agent already had experience of dropping the object in the hole, so this stage evaluates the agent's ability to reuse that knowledge. Following stage 1, the agent had created the 4-schema chain "Reach, Grasp Cube, Reach Hole and Release". It calculated the excitations of all the schemas and chains, and this 4-schema chain (dropping the cube in the hole) was found to be the most excited, because it was a new chain and also made the largest difference to the environment. Execution 19 (Reset on X-axis) in **Figure 12** shows the excitations of all the schemas and chains for the given environment.

**Figure 12** also shows that at the final execution, the excitations of all the schemas were lower than that of the 4-schema chain. However, two 3-schema chains, i.e., "Reach, Grasp Cube & Reach Hole" and "Reach, Grasp Cube, and Release at Hole", have the same excitation as the 4-schema chain. In this situation, the agent picks the longest chain (the 4-schema chain) to execute. During chain execution, the agent checks the sensory feedback to confirm that it obtains the expected postconditions at the end of each action (schema) in the 4-schema chain. Thus, the chain is executed here in the "Chain Reflex" mode.
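The selection rule stated above (the highest excitation wins, and among tied candidates the longest chain is preferred) and the two execution modes can be sketched as follows. `Candidate`, `select`, `run_chain_reflex` and `run_motor_program` are hypothetical names. In "Chain Reflex" mode the sketch checks each step's outcome against its expected postconditions before continuing; the "Motor program" variant is assumed here to run the whole sequence without intermediate checks.

```python
from typing import Callable, NamedTuple, Sequence


class Candidate(NamedTuple):
    name: str
    steps: tuple        # a single schema is a chain of length 1
    excitation: float


def select(candidates: Sequence[Candidate]) -> Candidate:
    """Pick the most excited candidate; ties go to the longest chain."""
    return max(candidates, key=lambda c: (c.excitation, len(c.steps)))


def run_chain_reflex(steps, execute: Callable, matches: Callable) -> int:
    """Closed-loop execution: stop as soon as a step's outcome misses its postconditions.

    Returns how many steps completed as expected.
    """
    for i, step in enumerate(steps):
        outcome = execute(step)
        if not matches(step, outcome):
            return i
    return len(steps)


def run_motor_program(steps, execute: Callable) -> list:
    """Open-loop execution: run every step without checking intermediate feedback."""
    return [execute(step) for step in steps]
```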

### 2.10. Results of Experiment: 2 (iCub)

The experiment starts with the robotic arm at what we refer to as the home position. Because the arm is raised next to the head, and thus outside the robot's visual field, the initially acquired visual perceptions reflect a world state of inactivity. At the beginning, both targets are equally exciting for the robot, and it initially selects to reach toward the hole target. Grasp happens to be the next most exciting action, and because of the perceptual changes at both the visual and proprioceptive levels, new schemas are generated. These new experiences are repeated and followed by a release action, an order which leads to the creation of a schema chain in the system: "Grasp→Release." The related excitations are plotted on the Y-axis of **Figure 13**, whereas the X-axis shows the order of schema (i.e., action) execution.

After a number of executions related to the hole target, habituation occurred and the robot therefore reached toward the ball (10th execution). After a successful grasp action, the world state was updated, and the red ball was perceived differently because the grasping hand partially covered it. This sudden change in the visual perceptions offered a lot of new stimulation, fostering the creation of new schemas. As a result, grasping again became the most exciting action to perform while holding the object. This repeated behavior is akin to squeezing an object, which in turn results in several changes in visual and proprioceptive perceptions. However, after a number of grasp actions, the system habituated and a release was selected for the 17th execution.

Once released, the object dropped onto the wooden board again, giving different visual perceptions. A new post-release schema reflecting the new world state was learned and repeated by the iCub, and after a few executions the robot ultimately used the "Grasp→Release" chain to interact with the object. The robot then moved its arm to the hole coordinates while holding the ball at the 25th execution, followed by a release command at the 28th execution, which caused the ball to drop successfully into the hole.

For the second stage of the experiment, the robot is expected to use the previously learned schemas and schema chains to express similar play behavior. Thus, without specifying a particular goal, the aim is to evaluate the ability of the system to link past experiences and actions from its repertoire with the environment, and to succeed in dropping the ball into the hole. The robot's performance in this stage differs from that of the simulator. The amount of noise in the real world is found to play an important role in delaying the selection of appropriate schemas for execution: the significant variation between schemas makes it difficult for the robot to link them directly. However, it is anticipated that with generalization over the variation in perceptions, the generation of a full chain for dropping the ball in the hole would be possible given sufficient time for exploration. Although the number of schemas will gradually increase with more exploration, the process of generalization will limit the total number of schemas (Law et al., 2014a; Kumar et al., 2016a). Thus, even with the noise in the perception of the real environment, we anticipate that an affordable number of executions will be needed to achieve the desired chain. Even without the full chain, subsets of it, such as the "Grasp→Reach" and "Grasp→Release" chains, are generated and repeated by the system.

Each solid line shows the excitation of the corresponding schema, whereas each dashed line shows that of a chain.

### 3. DISCUSSIONS

Dev-PSchema is expected to provide an interesting and appealing approach for developmental robotics. To demonstrate the abilities of the system we performed two sets of experiments.

### 3.1. Discussion of Experiment 1

In the first experiment (1A), the agent is shown to express different behaviors for the novel object, while the weights of the similarity and excitation parameters change. A summary of the points at which the changes occur is given in **Table 2**.

TABLE 2 | Summary of the weightings at which the observed behavior changed preference from familiar to novel.

Excitation and attention are seen as important factors for individual behaviors in developmental psychology. Although vision is the least developed sense at birth, humans have evolved to rely heavily on this sense (Slater and Bremner, 1989). Colombo (2001) considered alertness, object features, spatial orientation and endogenous control as the basic factors that affect visual attention in the environment. In this work, we are concerned with the last three factors of visual attention. Object features and relevant spatial orientation are inseparable: the question "what" is related to the question "where" in the visual field. When the eyes fixate a target, moving it from peripheral to foveal vision, the direction of attention shifts to keep attention locked on the target and thus to engage with it. The endogenous control factor in visual attention is responsible for holding attention and engaging. The novelty-familiarization pair is used in developmental psychology to investigate visual attention in humans. To investigate visual attention in this experiment, the simulated infant is initially familiarized (habituated) with a visual stimulus or event and is then presented with novel and familiar objects side by side (Wilcox, 1999; Sann and Streri, 2007; Schmuckler et al., 2007; Gelman and Davidson, 2016).

A habituation paradigm is widely used in developmental psychology experiments to test infants' ability to identify or recognize objects (Wilcox, 1999; Sann and Streri, 2007; Schmuckler et al., 2007), or events (Rosander and von Hofsten, 2004; Kellman and Spelke, 1983) based on visual cues. These examples show that infants tend to look longer at novel objects, or at novel and unexpected events, than at those which are familiar or predicted. However, infants have been observed to have favorite objects for interaction and play (Furby and Wilke, 1982; Jonsson et al., 1993). It has also been observed that young children prefer their own favorite toy over new toys, even when the new toy is identical in form (Gelman and Davidson, 2016). In the experiments by Gelman and Davidson (2016), young children were asked to choose between their own toy and a new toy (either identical or non-identical). They preferred their own toy when choosing for themselves, and preferred the new toy when choosing for the experimenter.

Sigman (1976) investigated the exploratory behavior of the pre-term and full-term infants at the same conceptional age. Both birth groups were familiarized with an object, a small ball. Following the familiarization period, the infants were provided with the same object along with other novel objects, each for 1 min. It was observed that both groups explored the novel objects more than the familiar objects. However, the pre-term infants explored the familiar object for longer than the full-term infants.

Ruff (1986) examined the behaviors of 7- and 12-month-old infants with a set of objects over a period of time. Six different objects were presented to each infant, each for a period of 1 min. Different activities, such as examining, mouthing and banging, were recorded during the trials. It was observed that examining of each object decreased over time. In addition, examining occurred before the other activities when a new object was presented. Furthermore, the 7-month-old infants spent more time examining and mouthing than the 12-month-old infants.

From these examples it is evident that children show different behaviors for novel and familiar objects depending upon their experiences. This effect was reproduced within Experiment 1 by changing weights of the excitation parameters. The results showed the capability of the system to demonstrate different behaviors when interacting with a novel vs. familiar object. When the features match, the habituation effect from the first object can be considered as transferring to the new object, resulting in low novelty-excitation. Therefore, only a small change in favor of the similarity triggers a change in observed behavior. However, as the novel object becomes increasingly different, the novelty/habituation value of it becomes increasingly higher, requiring a greater weighting on similarity to cause the change in behavior.

This behavior of the artificial agent can be compared with infants' behaviors. Steele and Pederson (1977) investigated visual fixation and manipulation of toys across 10 consecutive trials in 26-week-old infants. The infants were presented with the same toy in trials 1 to 7 and 10, and a novel object was introduced in trials 8 and 9. Fixation and manipulation times were found to decrease across trials. However, fixation time increased in the 8th trial, when a novel object differing in color, shape, texture, or shape and texture was introduced. Similarly, manipulation time increased when the novel object differed in shape and texture. However, manipulation time continued to decrease when the novel object differed only in color.

Although the parameters were controlled in pairs, particularly in Experiment 1B, within each pair of weights a higher weighting on ω1 will drive the agent to spend longer exploring the same object, and a higher weighting on ω4 will encourage the agent to try different actions. By adjusting each of the weights, different behaviors can be simulated. This could be considered as modeling the preferences of different infants, or different external conditions under which the agent is acting. Currently the weights are fixed at the start of an individual experiment, but allowing the agent to vary them in the future could generate a shift from exploratory play behavior to more exploitative or focused behavior.

### 3.2. Discussion of Experiment 2

In the second experiment, both agents (simulator and iCub) have shown the capability of playfully exploring their environment to discover object-action behaviors and of constructing hierarchical actions through the use of schema chains. In the first stage, both agents demonstrated exploratory play behaviors in their respective environments. The second stage highlights the exploitation capability of the agents by demonstrating direct re-use of the behaviors learned during exploratory play. The chains in the experiment represent high-level behaviors containing several intermediate steps to achieve a more distant state, i.e., the postconditions of the last schema in the chain.

The simulated individuals in our experiments have shown the capability of generating action sequences to achieve more distant states in the environment. The action sequences are obtained through exploratory play behavior. This behavior is followed by exploitative behavior in stage 2, where the agents reused the learned behaviors to attain a state in the environment that had previously been reached through individual actions, selected on the basis of excitation at each time step, thereby reaching the experimenter's aim of dropping the ball in the hole. The two modes of chain execution in the agents' execution mechanism are modeled on the "Chain reflex" and "Motor program" theories discussed previously.

### 4. CONCLUSION AND FUTURE WORK

In this work we have presented a schema-based play generator for artificial agents inspired by Piaget, termed Dev-PSchema. With experiments in both a simulated environment and with the iCub robot, we have demonstrated the ability of the system to create schemas of sensorimotor experiences from playful interaction with the environment. In particular, the proposed model captures concepts related to similarity, novelty and habituation, as a result of the agent interacting with objects, leading to the expression of different exploratory behaviors.

The first experiment has demonstrated the variations in the behaviors of the agent by changing the weights of parameters (ω1, ω2, ω3, and ω4). Experiment 1(A) illustrates the variation in behaviors of the agent by changing the weights of the similarity and novelty/habituation pair (ω1 and ω2), while keeping the object and schema excitation weights constant (ω3 and ω4). Similarly, experiment 1(B) demonstrates the variations in behaviors of the agent by changing the weights of the object and schema excitation (ω3 and ω4), keeping the similarity and novelty/habituation weight pair (ω1 and ω2) constant. This aspect of the system enables us to simulate different individuals with individual behaviors rather than a single simulated agent with average behavior. It also enables the agent to switch behaviors from exploratory to more focused behavior and vice versa.

With the experimental results reported above, we have demonstrated the capability of the proposed learning system, Dev-PSchema, to simulate different individuals. By varying the weights of the excitation parameters, the agent has shown different preferences within the experimental environment. For instance, regarding the second experiment, the agent's preferences were based on its experiences in the environment, while fixing other model parameters. We have focused on the excitation mechanism and its parameters to demonstrate its importance in the agent's behaviors. The agents' behaviors show attention, interest and their preferences in the environments.

The second experiment, presented in Section 2.8, has demonstrated the play behavior of the agent in the environment and examined the potential effects of the actions on different objects. The agent was able to create a new schema while grasping the ball in the simulator, and multiple different grasp schemas were learned by the iCub due to changes in perception and the environment. For both the simulator and the iCub, the agents did not create any new schemas for grasping the hole as this does not make any change in the environment. This behavior shows that the agent is capable of learning the effects of its actions on different objects. Thus the agent learns the behaviors with objects through exploration. Furthermore, the agent reuses learnt schemas during the exploitation stage. This stage reflects the sensorimotor stage of Piaget's theory (Piaget and Cook, 1952), where infants are described as re-using or repeating their learnt behaviors involving their bodies on the interesting objects.

The second experiment also demonstrated that the system can be integrated with different platforms: it was transferred from the simulator to the iCub robot in a laboratory environment without any changes to the system. In both experiments we demonstrated that the agent shows playful and exploratory behaviors. While Dev-PSchema also enables the agent to simulate different individuals with different preferences, within the current system the weights of the excitation parameters remain constant during a run, and all properties are weighted equally in the object excitation.

In the future, extensions to the system will be carried out to allow the agent to dynamically adjust the weights and to learn the importance of different properties of the object, such as shape vs. color (Kumar et al., 2016a), in order to adjust the property weights accordingly. In addition, we plan to further develop the generalization mechanism to address the noise associated with real-world environments.

We also intend to develop the capability of the system to create and use chains with generalized schemas, so that the chains can be used with novel objects and in different situations. We will also develop the system to use chains as high-level actions within other chains, creating chains of chains. The chaining system will be further improved to provide an optimal action/schema sequence to achieve a user-defined target state. This will help to evaluate learning by testing the system's ability to find solutions to user-defined problems using schemas learned through play behaviors. In addition, we plan to develop the system to learn from demonstrations and interactions with other agents (human or robot). Alongside this, the system will be developed to generalize the properties of objects and learn their limitations. For example, with generalized reach schemas, the agent could be expected to learn the limits of the reach space in the environment.

#### AUTHOR CONTRIBUTIONS

This work is predominantly based on the Ph.D. research of SK, supervised by PS. AG and RB are responsible for the low-level system for the iCub robot described in sections 4.6 and 4.8 as part of a project managed by PS, ML and QS. All authors contributed to the editing of the paper.

#### ACKNOWLEDGMENTS

This research is supported by the Aberystwyth University Doctoral Training Program, Faculty Development Program Sukkur IBA University Pakistan and the UK Engineering and Physical Sciences Research Council (EPSRC), grant No. EP/M013510/1. We are grateful for contributions from our recent research colleagues, in particular Dr Michael Sheldon, for the development of the PSchema tool.

#### REFERENCES

Slater, A., and Bremner, J. G. (eds.). (1989). Infant Development. Psychology Press.

Steele, D., and Pederson, D. R. (1977). Stimulus variables which affect the concordance of visual and manipulative exploration in six-month-old infants. Child Dev. 48, 104–111. doi: 10.2307/1128887


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Kumar, Shaw, Giagkos, Braud, Lee and Shen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Know Your Body Through Intrinsic Goals

Francesco Mannella<sup>1</sup>\*, Vieri G. Santucci<sup>1</sup>, Eszter Somogyi<sup>2</sup>, Lisa Jacquey<sup>2</sup>, Kevin J. O'Regan<sup>2</sup> and Gianluca Baldassarre<sup>1</sup>

<sup>1</sup> Institute of Cognitive Sciences and Technologies, National Research Council - CNR, Rome, Italy, <sup>2</sup> Laboratoire Psychologie de la Perception (UMR 8242), Paris Descartes - CPSC, Paris, France

The first "object" that newborn children play with is their own body. This activity allows them to autonomously form a sensorimotor map of their own body and a repertoire of actions supporting future cognitive and motor development. Here we propose the theoretical hypothesis, operationalized as a computational model, that this acquisition of body knowledge is not guided by random motor-babbling, but rather by autonomously generated goals formed on the basis of intrinsic motivations. Motor exploration leads the agent to discover and form representations of the possible sensory events it can cause with its own actions. When the agent realizes the possibility of improving the competence to re-activate those representations, it is intrinsically motivated to select and pursue them as goals. The model is based on four components: (1) a self-organizing neural network, modulated by competence-based intrinsic motivations, that acquires abstract representations of experienced sensory (touch) changes; (2) a selector that selects the goal to pursue, and the motor resources to train to pursue it, on the basis of competence improvement; (3) an echo-state neural network that controls and learns, through goal-accomplishment and competence, the agent's motor skills; (4) a predictor of the accomplishment of the selected goals generating the competence-based intrinsic motivation signals. The model is tested as the controller of a simulated simple planar robot composed of a torso and two kinematic 3-DoF 2D arms. The robot explores its body, covered by touch sensors, by moving its arms. The results, which might be used to guide future empirical experiments, show how the system converges to goals and motor skills allowing it to touch the different parts of its own body, and how the morphology of the body affects the formed goals. The convergence is strongly dependent on competence-based intrinsic motivations affecting not only skill learning and the selection of formed goals, but also the formation of the goal representations themselves.

Keywords: developmental robotics, developmental psychology, intrinsic motivations, goals, body

#### Edited by:

Frank Van Der Velde, University of Twente, Netherlands

#### Reviewed by:

Subramanian Ramamoorthy, University of Edinburgh, United Kingdom

Guido Schillaci, Humboldt-Universität zu Berlin, Germany

\*Correspondence:

Francesco Mannella francesco.mannella@istc.cnr.it

Received: 30 November 2017 Accepted: 17 May 2018 Published: 03 July 2018

#### Citation:

Mannella F, Santucci VG, Somogyi E, Jacquey L, O'Regan KJ and Baldassarre G (2018) Know Your Body Through Intrinsic Goals. Front. Neurorobot. 12:30. doi: 10.3389/fnbot.2018.00030

### 1. INTRODUCTION

The first "object" that newborns start to play with is their own body, in particular through self-touch activities. Body activity starts in the fetus at 8 weeks of gestation with spontaneous movements called general movements (Piontelli et al., 2014). These movements continue to be part of the motor activity of infants during their first months of life, but gradually more controlled movements become dominant (Thelen, 1995). This controlled motor activity (Piontelli et al., 2014) continues for many years after birth (Bremner et al., 2008) and presumably determines the formation of a "body schema" (Rochat and Striano, 2000), a sensorimotor map and a repertoire of actions that constitute the core of future cognitive and motor development.

The importance of self-touch activity for infants is supported by empirical evidence. Soon after birth, infants react differently to external touch events than to self-touch events: for example, Rochat and Hespos (1997) found that head-turning in response to a tactile stimulation in the mouth area was three times more frequent when the stimulation was externally produced than when it was self-produced, thus showing the unique status of self-touch for infants. Moreover, it seems plausible to consider self-touching as a self-sufficient activity: for instance, we do not need to include vision as part of the sensory input that determines early self-touch events. This is justified first by the very poor use of vision by fetuses in the womb, and second by the fact that infants before 10 months of age seem not to use vision to localize external tactile stimulation on their body (Bremner et al., 2008; Ali et al., 2015).

In this work we propose the theoretical hypothesis, operationalized as a computational model, that early body knowledge in infants is not acquired through random motor-babbling, but is guided by self-generated goals, autonomously set on the basis of intrinsic motivations (IMs). The concept of IMs was introduced in animal psychology during the 1950s and then extended to human psychology (Berlyne, 1950, 1960; White, 1959; Deci and Ryan, 1985; Ryan and Deci, 2000) to describe a set of motivations that were incompatible with Hull's theory of drives (Hull, 1943), in which motivations were strictly connected to the satisfaction of primary needs. Different experiments (e.g., Harlow, 1950; Montgomery, 1954; Kish, 1955; Glow and Winefield, 1978) showed how exploration, novel or surprising neutral stimuli, and even the possibility to affect the environment can modify the behavior of agents, thereby driving the acquisition of knowledge and skills in the absence of tasks directly required for biological fitness. Further neurophysiological research (e.g., Chiodo et al., 1980; Horvitz, 2000; Redgrave and Gurney, 2006) showed how IMs can be linked to neuromodulator activity, in particular to dopamine. These results highlighted the role of IMs in enhancing neural plasticity and driving the learning of new skills (Mirolli et al., 2013; Fiore et al., 2014).

Following biological inspiration, IMs have also been introduced in machine learning (e.g., Barto et al., 2004; Schmidhuber, 2010) and developmental robotics (e.g., Oudeyer et al., 2007; Baldassarre and Mirolli, 2013) to foster the autonomous development of artificial agents and the open-ended learning of repertoires of skills. Depending on their functions and mechanisms, different typologies of IMs have been identified (Oudeyer and Kaplan, 2007; Barto et al., 2013; Santucci et al., 2013) and classified broadly into two main groups (Baldassarre et al., 2014): (1) knowledge-based IMs (KB-IMs), divided into (1a) novelty-based IMs, related to novel non-experienced stimuli, and (1b) prediction-based IMs, related to the violation of the agent's predictions; and (2) competence-based IMs (CB-IMs), related to action, i.e., to the agent's competence to change the world and accomplish self-defined goals. While in their first computational implementations KB-IMs and CB-IMs were used interchangeably to drive autonomous skill acquisition (e.g., Schmidhuber, 1991; Oudeyer et al., 2007), different authors underlined that the signal generated by CB-IMs is to be preferred when developing agents that have to learn to accomplish new tasks (Oudeyer and Kaplan, 2007; Santucci et al., 2012; Mirolli et al., 2013). In particular, while KB-IM mechanisms generate learning signals based on the acquisition of knowledge, for example based on the improvement of a forward model of the world, CB-IM mechanisms generate learning signals based on the acquisition of competence, for example based on the capacity of achieving a certain desired state (e.g., the capacity of an inverse model or of a state-action controller to achieve a goal state).

Based on these insights, authors started to use CB-IMs for autonomous skill acquisition (Barto et al., 2004; Oudeyer et al., 2007; Schembri et al., 2007a,b; Hart and Grupen, 2011; Santucci et al., 2014b; Kompella et al., 2015). Recent research has started to use CB-IMs for the autonomous generation and/or selection of goals, which can then drive the acquisition of skills (Merrick, 2012; Baldassarre et al., 2013; Baranes and Oudeyer, 2013; Santucci et al., 2016), and for the optimization of learning processes in high-dimensional action spaces with redundant robot controllers (Rolf et al., 2010; Baranes and Oudeyer, 2013; Forestier and Oudeyer, 2016). The present research has been developed within the CB-IM framework, and in particular the model presented here uses competence measures to select goals. In line with empirical and computational perspectives (Balleine and Dickinson, 1998; Russell and Norvig, 2003; Thill et al., 2013; Mannella et al., 2016), and also with most works reviewed above, here goals are intended as the agent's internal representations of a world/body state or event (or of a set of them) having these properties: (a) the agent can internally activate the representation of the goal even in the absence of the corresponding world state or event; (b) the activated goal representation has the power of focussing the behavior of the agent toward the accomplishment of the goal and of generating a learning signal when the world state matches the goal ("goal-matching").

Given the connection between CB-IMs and goals, in this paper we present a new hypothesis in which these two elements play an important role in the early phases of body knowledge acquisition, i.e., in the first months after birth. In particular, under our hypothesis the infant's initial exploration determines the formation of proto-representations of sensory events. As soon as the baby discovers the possibility of re-activating those proto-representations, a CB-IM signal for obtaining those specific sensory events is generated. This signal improves the information about the current competence (the probability of obtaining a sensory event given an action), and the discovered events become intrinsic goals that guide both the learning and the selection of the motor commands to achieve them. Importantly, under the presented hypothesis this "goal-matching" signal also modulates the encoding and consolidation of the outcome representations themselves, so that the learning processes defining sensory encoding and motor control are coupled together into an integrated sensorimotor learning system.

Do we really need the notion of goal to account for the development of body knowledge? An important alternative hypothesis might rely on the direct use of IMs to drive the acquisition of stimulus-response behavior, in particular on the basis of trial-and-error behaviors (Sutton and Barto, 1998; Mannella and Baldassarre, 2007; Caligiore et al., 2014; Williams and Corbetta, 2016), which are model-free reinforcement learning strategies strengthened by intrinsic rewards. We shall not evaluate such possible alternatives here, since the primary purpose of this paper is to fully articulate and operationalize the goal-directed hypothesis, which is a model-based reinforcement learning framework, for future simulation studies and empirical tests. Nevertheless, the goal-directed hypothesis challenges hypotheses based on stimulus-response/trial-and-error learning processes for at least two reasons. First, learning multiple actions (e.g., touching different body parts) through a stimulus-response mechanism seems to require different stimuli able to trigger those different actions. However, actions that allow an infant to touch different parts of her own body would start from the same sensory state (touch, sight, proprioception, etc.). Their acquisition thus seems to require some internally generated patterns/stimuli to which to link them: we hypothesize that these patterns/stimuli are represented by different goals (goal-based learning, Baldassarre et al., 2013). Model-free strategies alone cannot guide behavior when the environment does not provide enough information to make a choice, whereas model-based solutions allow decision making through the information stored within an internal model.
Second, once the infant has acquired those different actions, she should be able to recall a specific one at will, independently of the current sensory and body state: again, model-free reinforcement learning would not seem to allow this, whereas internally activated goals could (goal-based action recall). In the discussion (section 5) we will consider the differences between our model and other goal-directed approaches.

In this paper we present our hypothesis by implementing a computational model that allows us to investigate the details of the proposed theory and to provide quantitative measures that could be useful for future experimental validation. The model is used as a controller (sections 2.2 and 4) for a simulated planar robot composed of a torso and two kinematic 3-DoF arms exploring its own body in a 2D environment (section 2.1). Sensory information from self-touch activity is used by the system to form goals and drive skill learning. The results of the tests of the model are presented (section 3) together with their possible implications for ongoing empirical experiments with human infants (section 3.3). Section 4 presents a detailed description of the model equations. The final section of the paper (section 5) discusses relevant related literature and possible future developments of the presented model.

### 2. THE MODEL

This section describes the functioning and learning mechanisms, pivoting on goals, that allow the model to autonomously acquire knowledge of its own body.

### 2.1. Agent's Body

The model is tested with a simulated body living in a two-dimensional space. The body is formed by two arms, each composed of 3 links, attached to a "torso" (**Figure 1**). The resulting 6 degrees of freedom (DoF) of the body receive motor commands from the model and perform movements accordingly. The movements are simulated by considering only kinematics (changes of the joint angles) and no dynamics (the body has no inertia).

The body is covered by 30 touch sensors that can activate when touched by a "hand." Note that the sensors, which are uniformly distributed over the body, lie in a one-dimensional space. The activation of the sensors is caused by the two "hands" (arm end-points): in particular, the activation of each sensor is computed as the sum of two Gaussian functions, each taking as input the distance of the sensor from one of the two hands. Sensors near the extremity of one "hand," including the one on the end-point itself, are sensitive only to the other hand, to avoid their permanent activation.
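The sensor model just described can be illustrated with a minimal sketch. It assumes 2D coordinates for sensors and hands, an illustrative Gaussian width `sigma`, and a hypothetical per-sensor list `own_hand` marking which hand (if any) a sensor sits on; none of these names or values come from the model itself.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def sensor_activations(
    sensors: Sequence[Point],
    hands: Sequence[Point],
    own_hand: Sequence[Optional[int]],
    sigma: float = 0.05,  # illustrative Gaussian width (assumption)
) -> List[float]:
    """Activation of each touch sensor as a sum of Gaussians of its distance to each hand.

    own_hand[i] is the index of the hand sensor i sits on (hypothetical helper data),
    or None; such a sensor ignores that hand so it is not permanently active.
    """
    activations = []
    for i, (sx, sy) in enumerate(sensors):
        total = 0.0
        for h, (hx, hy) in enumerate(hands):
            if own_hand[i] == h:
                continue  # sensors at a hand's extremity only feel the other hand
            dist_sq = (sx - hx) ** 2 + (sy - hy) ** 2
            total += math.exp(-dist_sq / (2.0 * sigma ** 2))
        activations.append(total)
    return activations


# Usage: sensor 0 lies on hand 0; sensor 1 lies on the chest, right under hand 1.
acts = sensor_activations(
    sensors=[(0.0, 0.0), (0.5, 0.0)],
    hands=[(0.0, 0.0), (0.5, 0.0)],
    own_hand=[0, None],
)
```

Sensor 0 stays essentially silent even though hand 0 is on top of it, while sensor 1 is fully activated by hand 1.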

The simulation is divided into trials. Each trial ends after a fixed time interval has elapsed or when the agent reaches the selected goal ("goal-matching" event, see section 2.2).

### 2.2. Overview and Core Aspects of the Model Functioning and Learning

Section 4 illustrates the functioning and learning of the model in mathematical detail. Instead, this section overviews such aspects at a level of detail sufficient to understand the results presented below.

The system is composed of four main components (**Figure 2**): the Goal Generator, the Goal Selector, the Motor Controller and the Predictor. The Goal Generator is responsible for the autonomous generation of the mapping from the domain of sensory input patterns to the domain of internally encoded representations. These representations encode possible states of the world, in particular outcomes of actions, that can later be internally activated as goals. In particular, the Goal Generator receives as input the positive change of the activation of the touch sensors distributed over the body. This change is encoded into two-dimensional patterns where the first dimension represents the spatial location of the sensors on the body and the second represents the sensor activation amplitude (see section 4 for details). The Goal Generator then performs an unsupervised clustering of the perceived changes using a self-organizing neural map (SOM). Each output unit of the SOM learns to respond to sensory input patterns that best fit the prototype stored in the afferent weights of that unit. The output layer of the SOM tends to preserve in its topology the similarity present in the sensory input space: units that are closer to each other in the SOM output layer acquire prototypes that correspond to patterns that are closer to each other in the sensory input space. The unsupervised learning process driving the online clustering in the Goal Generator, which takes place at each time step of the simulation, is modulated by a measure of the current competence based on the prediction of the occurrence of an action-outcome contingency. This contingency is detected internally as a match between the encoded sensory representations (the outputs of the SOM) triggered at any time step of the trial and the goal representation, internally activated from the beginning of each trial. The competence measure is computed by the Predictor and is further described later in this section. The higher the competence prediction related to a given goal, the lower the learning rate of the update of the related outcome prototype. Moreover, the higher the average competence prediction over all stored goals, the lower the learning rate of all prototypes. This two-fold modulation tends to freeze an outcome prototype as the related goal is accomplished more reliably and as the system becomes able to accomplish all discovered goals.

The Goal Selector is formed by a vector of units, corresponding one-to-one to the SOM output layer units, that localistically encode goals. At the beginning of each trial, the component selects the goal to be pursued by means of a softmax function. The resulting output is a one-hot vector with the winning unit switched on. The input to the softmax function used to decide the winning unit is based on the difference (error) between the competence prediction for each goal, given by the Predictor, and the actual goal-outcome match; in particular, a decaying average of this error is used. Thus, goal selection is also modulated by the agent's current competence.
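The decaying average that feeds the softmax can be sketched as an exponential moving average. This is a minimal sketch under assumptions: the error is taken as the absolute difference between the Predictor's output and the 0/1 match, and the smoothing uses the common form ξ ← ξ + (e − ξ)/τ, where `tau` plays the role of the time constant τ<sup>ξ</sup> (with τ = 1 the average reduces to the latest error, i.e., no smoothing). The model's exact update rule may differ.

```python
from typing import List


def update_competence_improvement(
    xi: List[float], goal: int, prediction: float, match: int, tau: float = 10.0
) -> float:
    """Exponential moving average of the Predictor's error for the selected goal.

    tau is an illustrative time constant (assumption); tau = 1 keeps only the
    latest error, i.e. no smoothing.
    """
    error = abs(match - prediction)  # assumption: absolute prediction error
    xi[goal] += (error - xi[goal]) / tau
    return xi[goal]


# Usage: goal 1 was predicted at 0.2 but was actually matched.
xi = [0.0, 0.0, 0.0]
update_competence_improvement(xi, goal=1, prediction=0.2, match=1, tau=4.0)
```

A large `tau` makes the signal change slowly, so a freshly discovered goal keeps a high value for several trials instead of flickering.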

The Motor Controller is composed of three components: a dynamic-reservoir recurrent neural network (Jaeger, 2001; Jaeger et al., 2007; Mannella and Baldassarre, 2015), a random trajectory generator and an associative memory. The dynamic reservoir is a recurrent network whose dynamics are regulated by the goal received as input from the Goal Selector. The random trajectory generator outputs a trajectory at each trial based on a sinusoidal oscillator with randomly chosen parameters. Both the read-out units of the recurrent network and the output of the random trajectory oscillator contribute to controlling the two arms, with the competence for the currently chosen goal defining their relative weight. The Motor Controller is trained by means of a novel model-based reinforcement learning algorithm exploiting the goal-based reward. The algorithm relies on two processes: (1) the associative memory stores and updates the end-point posture for each goal based on the occurrence of goal-outcome contingencies; (2) the end-point postures stored in the associative memory are then used as models to train the read-out of the recurrent network. In particular, the currently chosen goal recalls the end-point posture to which it is related in the associative memory, and the read-out units of the recurrent network are trained to acquire an attractor dynamics corresponding to that end-point posture. Learning in the associative memory is also guided by the competence for the currently chosen goal: when the competence for a goal is low, the learning rate for the update of the corresponding end-point posture is high, and the higher the competence for that goal becomes, the lower this learning rate gets. Overall, when competence is low the random generation of motor trajectories prevails, while goal-outcome contingency events drive the learning of the end-point postures. As competence gets higher, the learning processes slow down and the so-far learned read-out of the recurrent network increasingly prevails in defining the motor trajectories. When the agent eventually achieves the maximum competence for a goal, the related motor skill is frozen.
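The interplay between random exploration, the reservoir read-out, and the associative memory can be illustrated as follows. This is a simplified sketch, not the model's implementation: the oscillator parameter ranges, the base learning rate `eta`, the linear blending rule `competence * readout + (1 - competence) * random`, and storing the first observed posture verbatim are all assumptions chosen to match the qualitative description above. The reservoir itself is omitted; `readout` stands for its current output.

```python
import math
import random
from typing import Callable, Dict, List, Sequence


def make_random_oscillator(rng: random.Random, n_joints: int = 6) -> Callable[[float], List[float]]:
    """Sinusoidal joint trajectory with random amplitude, frequency, phase (illustrative ranges)."""
    params = [
        (rng.uniform(0.1, 1.0), rng.uniform(0.5, 2.0), rng.uniform(0.0, 2.0 * math.pi))
        for _ in range(n_joints)
    ]
    return lambda t: [a * math.sin(f * t + p) for a, f, p in params]


def blend_commands(readout: Sequence[float], random_cmd: Sequence[float], competence: float) -> List[float]:
    """Low competence: random exploration dominates; high competence: learned read-out dominates."""
    return [competence * r + (1.0 - competence) * q for r, q in zip(readout, random_cmd)]


def update_end_posture(
    memory: Dict[int, List[float]], goal: int, posture: Sequence[float], competence: float, eta: float = 0.5
) -> None:
    """On a goal-outcome match, move the stored end-point posture toward the current one.

    The first match simply stores the posture; afterwards the step shrinks as
    competence grows and vanishes at full competence (skill frozen).
    """
    stored = memory.setdefault(goal, list(posture))
    rate = eta * (1.0 - competence)
    for k, value in enumerate(posture):
        stored[k] += rate * (value - stored[k])


# Usage: blend a (dummy) read-out with a random trajectory, then learn from a match.
rng = random.Random(0)
oscillator = make_random_oscillator(rng)
command = blend_commands(readout=[0.0] * 6, random_cmd=oscillator(0.5), competence=0.2)
memory = {3: [0.0, 0.0]}
update_end_posture(memory, goal=3, posture=[1.0, 1.0], competence=0.0)
```

Scaling the update by `1 - competence` is one simple way to obtain the "frozen" skill at maximum competence described in the text.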

As we have seen, all the learning processes within the system depend on the detection of goal-outcome matches corresponding to external action-outcome contingencies. This detection relies on the one-to-one correspondence between the units of the generated goals and those of the selected goal. In particular, when a corresponding pair of outcome unit and selected-goal unit co-activate, a goal-outcome matching signal is delivered to the whole system. The CB-IMs that guide the exploration and the learning of the system are determined by the activity of the Predictor. This component is a linear neural network that gets as input the activation pattern of the Goal Selector encoding the selected goal, and is trained with a supervised learning algorithm to predict the matching signal for that goal (0 in case of failure and 1 in case of success). The output of the Predictor represents an estimate of the probability of accomplishing the selected goal. This estimate is used to compute two measures of competence. The first measure indicates the system's competence for the selected goal and corresponds to the actual output of the Predictor. The second measure indicates the rate of accomplishment of the selected goal per trial and is given by a decaying average of the error of the Predictor in predicting the goal-outcome match. These two CB-IM measures modulate the system's learning processes.
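Because the Predictor's input is the one-hot selected-goal vector, a linear network trained toward the 0/1 match signal reduces to one weight per goal that tracks that goal's success rate. The sketch below assumes the delta rule (LMS) and an illustrative learning rate; the text states only that a supervised algorithm is used.

```python
from typing import Sequence


class Predictor:
    """Linear network mapping the one-hot selected-goal vector to P(goal-outcome match)."""

    def __init__(self, n_goals: int, lr: float = 0.1) -> None:
        self.weights = [0.0] * n_goals
        self.lr = lr  # illustrative learning rate (assumption)

    def predict(self, goal_vec: Sequence[int]) -> float:
        return sum(w * g for w, g in zip(self.weights, goal_vec))

    def train(self, goal_vec: Sequence[int], match: int) -> float:
        """Delta-rule step toward the observed match signal (0 = failure, 1 = success)."""
        error = match - self.predict(goal_vec)
        for i, g in enumerate(goal_vec):
            self.weights[i] += self.lr * error * g
        return error


# Usage: goal 1 is always accomplished, so its predicted competence approaches 1.
predictor = Predictor(n_goals=3)
goal = [0, 1, 0]
for _ in range(200):
    predictor.train(goal, match=1)
```

The value returned by `train` is the prediction error from which a decaying-average competence signal can be computed.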

Summarizing, there are three interacting optimization processes involving three different functions of the system: (1) encoding of action outcomes; (2) motor control learning; (3) competence prediction learning. The optimization of the outcome encoding (1) and the optimization of the motor control (2) are guided by the competence of the agent, which is itself acquired through the optimization of the agent's goal-outcome matching prediction (3).

### 3. RESULTS

This section presents some simulation tests showing that the model manages to explore the one-dimensional space of the agent's body and autonomously build knowledge of it. In particular, the autonomous learning processes involving the acquisition of goals, and of the motor capabilities to accomplish them, under the guidance of contingency detection and competence-based intrinsic motivations, converge to a steady equilibrium. This consolidates the agent's bodily knowledge, which allows it to reach at will all the different parts of the body with one of the two hands.

The tests involve simulations where the grid of goals (both in the Goal Generator and the Goal Selector layer) is formed by 5 × 5 units (25 possible goals) and the agent's body is uniformly covered by 30 touch sensors. We now illustrate the results of the tests in detail.

### 3.1. Coverage of the Body Space by the Acquired Knowledge

We performed 20 different simulations lasting 8,000 trials each. At the beginning of each trial, a unit in the Goal Selector layer was recruited as the representation of the desired goal, while the units in the Goal Generator layer were triggered by the online encoding of the sensory inputs. During learning, the matching Goal Generator/Goal Selector pairs became related to touch events centered on different points of the body space.

We now focus on the analysis of the data referring to one simulation representative of the average performance of the system.

**Figure 3A** shows which sensors are activated when the different goals are pursued. After learning, the different goals produce touch events that cover the whole body space. This shows that the goals formed by the agent overlap only partially and together succeed in covering the whole body space.

**Figure 3B** shows the average activation of the sensors over all the learning trials. Different parts of the body are touched with different frequencies. In particular, the "chest" area is touched very frequently, whereas the areas around the sensors at distances 0.28 and 0.72 from the left hand (where 1 is the length of the whole body) are touched less frequently. These different frequencies are due to the topology of the agent's body. Random exploration favors touching the chest, which is within reach of both arms; disfavors touching the "shoulders," which are "hidden" in the angle formed by the chest and one arm; and touches the rest of the arms, fully exposed to the reach of the contralateral hand, with medium frequency. Touch events activating the sensors on the "hands" always involve both hands, so these sensors tend to show peak frequencies.

**Figure 4A** analyses the SOM receptive fields related to the different outcomes encoded by the Goal Generator after the completion of the learning process. Recall that each 20 × 20 field represents the activity of a map where the horizontal axis refers to the different sensors located on the one-dimensional body space and the vertical axis refers to the intensity of their activation. **Figure 4B** shows the posture of the two arms learnt to reproduce the touch event related to each goal. The figures show a tendency of the grid of goals to represent multiple aspects of the touch events. In particular, going from the bottom-left to the top-right of the grid, the receptive fields tend to represent touch events involving the chest and one arm, and then both arms. Instead, the bottom-right dimension of the grid tends to represent touches involving mainly the left or the right part of the body.

### 3.2. Stability of the Acquired Knowledge

The system reaches a steady equilibrium guided by the increase of competence for the different goals. **Figure 6** shows how the mean of the competence for each goal, self-estimated by the agent as the prediction to activate the target sensors corresponding to the selected goal (see section 4.1.1.4.), grows until the agent reaches the maximum competence for each goal. At that point, all learning processes regarding body knowledge halt.

The raster plot at the top of **Figure 6** illustrates the positions in time of the matching events (corresponding to the touch of the sensors related to the selected goal). Each row of the raster plot refers to one of the 25 goals. At the beginning of the learning process the system selects different goals and focuses on each of them until it has properly learnt how to achieve it (see the bottom-left plot of **Figure 6**).

Note how the system tends to focus on single goals with some persistence after they are discovered (see also **Figure 5**). This is due to the fact that the competence signal used to select goals changes slowly. This feature turns out to be important for the convergence of the system. Indeed, if one uses a non-smoothed version of the signal (by setting τ<sup>ξ</sup> = 1 in Equation 20), the focussing disappears and the system fails to converge. A possible interpretation of these results is as follows: the focus on a specific goal leads the system to acquire a high competence for that goal; the high competence for the goal stabilizes both the goal representation and the related motor skill; the acquired goals/skills furnish a sufficiently stable "structure" that the system can leverage to build the other goals and skills.

When the learning process converges, the system continues to test each goal and the prediction is maintained at its maximum (see the bottom-right plot of **Figure 6**). At this point all goals start to be selected equally and randomly, as they are no longer interesting for the system, which, in its current state, must still engage in some activity (see the bottom-left plot of **Figure 6**). This means that the intrinsic motivation and contingency detection capabilities would be ready for the exploration of other sources of knowledge if these were available to the agent.

**Figure 7** shows the history of goal formation during the learning process. At different stages of development some goals are formed, then temporarily "deleted," and then replaced in the following stages. This indicates that the system searches for an overall goal configuration and motor skill repertoire that settles only when the acquired knowledge covers the whole body space, as shown in **Figure 3**. The graph also shows that goals tend to form starting from the outer ring of the map units and then involve the inner units of the map. This might reflect the formation of broad goal categories (and related motor skills) followed by more refined categories. Further investigations are needed to confirm this interpretation.

### 3.3. Features of the Behavior of the Model

Some peculiar features of the model's behavior emerge during its development. These could possibly be compared with the behavior of children in future empirical experiments.

#### 3.3.1. The Time to Accomplish the Goals Diminishes as Learning Progresses

During development, the time taken by the movements to cause the desired touch sensation set by the different goals progressively decreases. In this respect, **Figure 8** shows that the mean trial duration needed to accomplish the goals decreases with the "age" of the goals. This reflects the fact that the motor accuracy of the system improves with the time dedicated to learning the motor skill of each goal. **Figure 9** confirms this interpretation. The figure shows the relation between the time needed to accomplish a goal and the competence for that goal (the system's goal-matching prediction probability). Lower trial durations are positioned in the bottom-right part of the plot, where the competence for the goals is very high. The color of the dots, related to the "age" of the goals, also indicates that performance time and competence improve with the amount of learning dedicated to each goal. In this respect, note how the competence tends to reach values close to 100% after about 100 successes (matching events). The motor skill, however, continues to improve, as shown by the lower trial durations after 200 trials. It might be surprising that motor ability for a goal continues to improve even when the competence-based intrinsic motivation signal becomes low. This is because, while this signal continues to affect the selection of the goals, whenever a goal is selected the related motor skill is still trained as much as possible, as it should be, by the mechanism that drives the echo-state network to produce the goal-related desired posture.

FIGURE 5 | An example of the initial focussing on goals that have just been experienced. Top: a raster plot with each row indicating the match events for a goal. Bottom: a plot indicating the current changes in the weights of the motor controller (Euclidean distance from the weights at the previous time step). The initial match event produces a large change in the weights. The following ones refine the motor skill, and the corresponding outcome sensory abstraction, until the competence for the goal is completely acquired.

#### 3.3.2. Easy Postures Are Acquired Before Hard Ones

During the development of the sensorimotor behavior of the agent there is also a change in which postures are explored. Indeed, postures that are easier to reach due to the physical constraints of the actuators are discovered from the first trials of the simulations, while postures that are more difficult to achieve are acquired later on. **Figure 10** shows this phenomenon. From bottom to top, the plots show the mean activation of the touch sensors during different 10,000-timestep intervals. During the initial intervals the curve of sensor activations clearly follows the white line, which represents the mean sensor activations recorded in a simulation where the agent's motor behavior is kept strictly random. This indicates that at the beginning of the experiment, postures that are common during random behavior (and can thus be considered less difficult to reach) are more likely to be chosen than others. Moving toward the top of the plot series, instead, the curve of sensor activations departs from the white line, confirming that the agent is more likely to focus on postures that are rarer during random behavior (and can thus be considered more difficult to reach).

FIGURE 6 | History of goal predictions, indicating the probability of success when pursuing a given goal, during the learning process. Top plot: the black line indicates the average prediction over all the 25 goals, the dark gray shadow indicates the standard deviation, and the light gray shadow indicates the worst and best skill; the raster plot in the upper part shows the matching events for each goal, where different rows correspond to different goals. Bottom plots: zoomed visualization of the initial phase and of the convergence phase of the learning process, respectively.

#### 3.3.3. Areas With a Higher Density of Sensors Are Explored Before Other Ones

The influence of the density of sensors within different regions of the body on the development of self-touching behaviors was also explored. To this end, a different simulation was run in which 10 sensors (one third of the total) were uniformly distributed within the first two thirds of the one-dimensional body space of the agent, while the remaining 20 sensors (two thirds of the total) were uniformly distributed within the last third of the body space. **Figure 11** shows the overall effect, consisting in a different distribution of the receptive fields of each sensor with respect to the standard simulations (**Figure 11A**; compare it with **Figure 3A**) and a different curve of mean sensor activations after learning, in which activations are shifted to the right part of the body space (**Figure 11B**; compare it with **Figure 3B**). More importantly, **Figure 12** shows that during the initial phases of development (bottom plots) the mean sensor activations are shifted to the left with respect to the standard development (see **Figure 10** for a comparison), and this shift is reverted only later on in development.

### 4. METHODS

### 4.1. Model Detailed Implementation

#### 4.1.1. Goal Generator

The Goal Generator performs the unsupervised formation of the abstract representations of the touch-sensor activation patterns that the system can select as goals. The activation of the touch sensors is filtered so that only the positive changes in the somatosensory activations are considered. The change pattern is transformed into a two-dimensional map of units where the horizontal dimension encodes the different sensors and the vertical dimension spatially encodes the activation intensity of each sensor change: this is done by determining the height of a Gaussian function used to activate the column units related to a certain sensor. **Figure 13** shows this process with an example.
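The encoding step can be sketched as follows, under stated assumptions: one map column per sensor, change amplitudes clipped to [0, 1], the Gaussian centred at a row proportional to the amplitude (our reading of "height"), an illustrative width `sigma`, and silent columns for sensors with no positive change.

```python
import math
from typing import List, Sequence


def encode_positive_change(
    prev: Sequence[float], curr: Sequence[float], n_rows: int = 20, sigma: float = 1.0
) -> List[List[float]]:
    """Map positive sensor changes onto an n_rows x n_sensors grid.

    Column j encodes sensor j; the row at which its Gaussian bump peaks
    encodes the size of that sensor's positive change.
    """
    n_sensors = len(curr)
    grid = [[0.0] * n_sensors for _ in range(n_rows)]
    for j in range(n_sensors):
        change = min(max(curr[j] - prev[j], 0.0), 1.0)  # keep only positive changes
        if change == 0.0:
            continue  # assumption: no positive change leaves the column silent
        centre = change * (n_rows - 1)
        for r in range(n_rows):
            grid[r][j] = math.exp(-((r - centre) ** 2) / (2.0 * sigma ** 2))
    return grid


# Usage: sensor 0 jumps to full activation, sensor 1 is unchanged, sensor 2 decreases.
grid = encode_positive_change(prev=[0.0, 0.0, 0.5], curr=[1.0, 0.0, 0.2])
```

Flattening such a grid gives the input vector **x** used by the SOM equations that follow.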

The Goal Generator is implemented as a self-organising map (SOM, Kohonen, 1998). SOMs are a particular kind of neural network able to categorise all the patterns of a given dataset in an unsupervised manner (Kohonen, 1982). Each node of the output layer of a SOM learns to detect the distance of input patterns from a prototype pattern stored in the connection weights of the unit. SOMs also acquire information about the distance between the different cluster prototypes by storing it in the n-dimensional topology of their output layer.

More in detail, we refer to the case in which the input to the SOM is a vector $\mathbf{x} \in \mathbb{R}^{n}$ and its output is organised in a two-dimensional map unrolled into the vector $\mathbf{y} \in \mathbb{R}^{m}$. In standard SOMs, each output unit $y\_{j}$ computes the distance of the input $\mathbf{x}$ from its weight vector $\mathbf{w}\_{j} \in \mathbb{R}^{n}$, one of the rows of the connection weight matrix $\mathbf{W} = [\mathbf{w}\_{1}, \cdots, \mathbf{w}\_{j}, \cdots, \mathbf{w}\_{m}]^{T}$:

$$\mathbf{y}\_{j} = \|\mathbf{x} - \mathbf{w}\_{j}\|\_{2}^{2} \tag{1}$$

FIGURE 3 | (A) … shows the mean activation of each sensor over the trials where a given goal is chosen. (B) Mean activation and standard deviation of each sensor over all the trials after convergence of learning (black curves and gray area). The white curve indicates the touch frequency with random movements. Data collected over 1,000 trials after the learning process stabilized to maximum competence for all goals.

where $\|\cdot\|\_{2}^{2}$ is the squared Euclidean norm of a vector. The weight vector $\mathbf{w}\_{j}$ is the prototype of the cluster represented by output unit $y\_{j}$. The best matching ("winning") unit $y\_{win}$ is the output unit whose prototype is closest to the current input:

$$
\text{win} = \arg\min\_{j} y\_{j} \tag{2}
$$

However, here the activation of the map units was computed in a different and more biologically plausible way: as a standard weighted sum of the input signals, minus a bias that depends on the squared norm of the prototype weights. To show that this yields the same winner, we transform the selection of the winning unit as follows:

$$\begin{split} \text{win} &= \underset{j}{\arg\min} \|\mathbf{x} - \mathbf{w}\_{j}\|\_{2}^{2} \\ &= \underset{j}{\arg\min} \left( (\mathbf{x} - \mathbf{w}\_{j})^{T} (\mathbf{x} - \mathbf{w}\_{j}) \right) \\ &= \underset{j}{\arg\min} \left( \mathbf{x}^{T}\mathbf{x} - 2\mathbf{w}\_{j}^{T}\mathbf{x} + \mathbf{w}\_{j}^{T}\mathbf{w}\_{j} \right) \end{split} \tag{3}$$

and since the term $\mathbf{x}^{T}\mathbf{x}$ can be ignored because it is constant with respect to the minimization (and dividing the remaining terms by 2 does not change the minimizer), we have:

$$\begin{aligned} \text{win} &= \underset{j}{\arg\min} \left( -\mathbf{w}\_{j}^{T}\mathbf{x} + \frac{1}{2}\mathbf{w}\_{j}^{T}\mathbf{w}\_{j} \right) \\ &= \underset{j}{\arg\max} \left( \mathbf{w}\_{j}^{T}\mathbf{x} - \frac{1}{2}\mathbf{w}\_{j}^{T}\mathbf{w}\_{j} \right) \end{aligned}$$

This leads to computing the activation of each output unit as a standard weighted sum of the input minus a weight-dependent term:

$$y\_{j} = \mathbf{w}\_{j}^{T}\mathbf{x} - \frac{1}{2}\mathbf{w}\_{j}^{T}\mathbf{w}\_{j}. \tag{4}$$

FIGURE 12 | Mean activation of each sensor over several intervals (10,000 timesteps each) from the beginning (bottom) up to 100,000 timesteps. In the initial part of the simulation the distribution of touches tends to be greater in the right region of the body.

This formulation of the output layer activation has some advantages (Martín-del Brío and Blasco-Alberto, 1995), so we use it here. In particular, it is biologically plausible and allows the units' activations to be compared with a threshold (see below in this section).
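The equivalence derived above can be checked numerically: the unit with the smallest squared distance (Eqs. 1 and 2) is the unit with the largest activation w<sub>j</sub><sup>T</sup>x − ½ w<sub>j</sub><sup>T</sup>w<sub>j</sub> (Eq. 4). A minimal sketch on random data; the input size (8) and number of units (25) are arbitrary choices.

```python
import random
from typing import Sequence


def winner_by_distance(x: Sequence[float], W: Sequence[Sequence[float]]) -> int:
    """Classic SOM winner: argmin of the squared Euclidean distance (Eqs. 1 and 2)."""
    return min(range(len(W)), key=lambda j: sum((xi - wi) ** 2 for xi, wi in zip(x, W[j])))


def winner_by_activation(x: Sequence[float], W: Sequence[Sequence[float]]) -> int:
    """Weighted sum minus half the squared prototype norm (Eq. 4); argmax gives the same winner."""
    def activation(w: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(w, x)) - 0.5 * sum(wi * wi for wi in w)
    return max(range(len(W)), key=lambda j: activation(W[j]))


rng = random.Random(42)
mismatches = 0
for _ in range(1000):
    x = [rng.random() for _ in range(8)]
    W = [[rng.random() for _ in range(8)] for _ in range(25)]
    # Apart from floating-point near-ties, both rules must pick the same unit.
    mismatches += winner_by_distance(x, W) != winner_by_activation(x, W)
print(mismatches)
```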

The sets of connection weights reaching a given output unit j (unit prototype) are updated at each iteration as follows:

$$
\Delta \mathbf{w}\_{j} = \eta\_{som}\, \Theta\left(\text{win}, j, \theta\_{n}\right)(\mathbf{x} - \mathbf{w}\_{j}) \tag{5}
$$

where $\eta\_{som}$ is a learning rate and $\Theta(i, j, \theta\_{n})$ is a function of the distance of unit $j$ from unit $i$ within the output layer. In the classic SOM algorithm, a threshold distance $\theta\_{n}$ defines the "winning neighbourhood," such that $\Theta = 1$ if the distance of output unit $y\_{j}$ from $y\_{win}$ within the output neural space is below $\theta\_{n}$, and $\Theta = 0$ otherwise. Both the distance threshold $\theta\_{n}$ and the learning rate $\eta\_{som}$ are then exponentially decreased at each iteration, so that increasingly fewer units surrounding the winning unit undergo learning.

We deviate from this standard learning algorithm in one important way, so as to cope with the open-ended nature of learning in the architecture, where new goals can be continuously discovered: we link goal formation to the competence in accomplishing the goals. In particular, both the learning rate $\eta_{som}$ and the neighbourhood threshold $\theta_n$ are set on the basis of the competence-based intrinsic motivation measure as follows:

$$
\Delta \mathbf{w}_{j} = (1 - \bar{\psi})(1 - \psi_{j})\, \Theta\left(\mathrm{win}, j, (1 - \bar{\psi})\right) (\mathbf{x} - \mathbf{w}_{j}) \tag{6}
$$

where $\psi_j$ is the competence of SOM output unit $j$ (i.e., of the corresponding goal) and $\bar{\psi}$ is the average of this measure over all units.
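The competence-modulated rule of Equation (6) can be sketched by replacing $\eta_{som}$ with $(1 - \bar{\psi})(1 - \psi_j)$ and $\theta_n$ with $(1 - \bar{\psi})$. This is a hypothetical illustration, not the authors' implementation; it assumes grid coordinates rescaled to [0, 1] so that a radius of at most 1 is meaningful, and the function name is an assumption.

```python
from __future__ import annotations

import math


def competence_som_step(x: list[float], weights: list[list[float]],
                        psi: list[float], side: int) -> int:
    """Competence-modulated SOM update (Eq. 6), modifying prototypes in place.

    psi[j] is the competence of unit j and psi_bar its mean over all units.
    Assumption: unit coordinates on the side x side grid are rescaled to
    [0, 1] so that the neighbourhood radius (1 - psi_bar) is meaningful.
    """
    psi_bar = sum(psi) / len(psi)
    radius = 1.0 - psi_bar
    scale = max(side - 1, 1)

    def pos(j: int) -> tuple[float, float]:
        row, col = divmod(j, side)
        return row / scale, col / scale

    acts = [sum(wk * xk for wk, xk in zip(w, x)) - 0.5 * sum(wk * wk for wk in w)
            for w in weights]
    win = max(range(len(acts)), key=acts.__getitem__)
    for j, w in enumerate(weights):
        in_neighbourhood = math.dist(pos(win), pos(j)) < radius
        # Learning rate (1 - psi_bar)(1 - psi_j): low competence -> large update.
        step = (1.0 - psi_bar) * (1.0 - psi[j]) if in_neighbourhood else 0.0
        for k in range(len(w)):
            w[k] += step * (x[k] - w[k])
    return win
```

When every goal is fully mastered ($\bar{\psi} = 1$), both the rate and the radius vanish and the goal map is frozen, which is the stabilizing effect described in the text.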

To compute the matching signal, the output of the SOM is filtered into a binary pattern **o** whose elements are all set to 0, with the possible exception of the element corresponding to the winner unit: this is set to 1 if its activation exceeds a threshold $\theta_o$.
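The filtering step can be written in a few lines. The sketch below is illustrative only; `matching_pattern` is a hypothetical name and the activations would be those of Equation (4).

```python
from __future__ import annotations


def matching_pattern(activations: list[float], theta_o: float) -> list[int]:
    """Binary pattern o: 1 only at the winner, and only if it exceeds theta_o."""
    o = [0] * len(activations)
    win = max(range(len(activations)), key=activations.__getitem__)
    if activations[win] > theta_o:
        o[win] = 1
    return o
```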

#### 4.1.2. Goal Selector

The Goal Selector is responsible for the autonomous selection of goals at the beginning of each trial. The selected-goal pattern is sent to the Motor controller to generate a movement and is also used to compute the matching signal. The Goal Selector is implemented as a layer of units **g** corresponding one-to-one to the units of the output layer of the Goal Generator. All elements of **g** are set to 0 except one, which is set to 1. This element is chosen by probabilistic sampling, with probabilities computed by a softmax function that takes as input the current competence improvement $\boldsymbol{\xi}$ of the goals:

$$p(g_{j} \mid \boldsymbol{\xi}) = \frac{e^{\xi_{j}/\gamma}}{\sum_{i} e^{\xi_{i}/\gamma}} \tag{7}$$

where γ is the "temperature" parameter of the softmax regulating how much the generated probabilities tend to favour goals with a higher competence improvement.
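Goal selection with Equation (7) amounts to sampling from a temperature softmax and writing the result as a one-hot pattern. The following is a minimal, hypothetical sketch; the function names are assumptions and the maximum is subtracted only for numerical stability, which leaves the probabilities unchanged.

```python
from __future__ import annotations

import math
import random


def goal_probabilities(xi: list[float], gamma: float) -> list[float]:
    """Softmax of Eq. (7) with temperature gamma."""
    top = max(xi)  # subtracting the max avoids overflow without changing the result
    exps = [math.exp((v - top) / gamma) for v in xi]
    total = sum(exps)
    return [e / total for e in exps]


def select_goal(xi: list[float], gamma: float, rng: random.Random) -> list[int]:
    """Sample one goal and return the one-hot selected-goal pattern g."""
    probs = goal_probabilities(xi, gamma)
    j = rng.choices(range(len(xi)), weights=probs, k=1)[0]
    g = [0] * len(xi)
    g[j] = 1
    return g
```

Lowering `gamma` concentrates the probability mass on the goals with the highest competence improvement, while a large `gamma` makes the choice nearly uniform.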

#### 4.1.3. Motor Controller

The **g** pattern is used as input to the Motor controller, which sends commands to the joints of the arms. The Motor controller is formed by three components: (1) an echo-state neural network ("dynamic reservoir network") whose 6 output ("readout") units encode the desired angles of the joints of the arms; (2) a structured noise that produces a random trajectory, averaged with the echo-state network commands to support exploration; (3) an associative memory that learns to pair each goal $g_j$ with a desired final posture that causes the matching event related to the goal, acquired by reinforcement learning: when the selected goal is matched, this posture is learned in a supervised fashion by the echo-state network as a desired attractor posture. These components are now illustrated in detail.

#### **The motor controller**

Dynamic reservoirs are sparse recurrent networks that respond to inputs with dynamics close to chaotic behaviour, meaning that their activation is very rich but still non-chaotic (the "reservoir property," Jaeger, 2001). Similar inputs produce similar dynamics. Moreover, when the network is fed with a constant input, its activity goes through a transient dynamic activation and then settles to a stable attractor. This attractor is formed by zero values when the input is a vector of zeros, and by a certain pattern when the input has some non-zero elements (different input patterns cause different attractors). The transient activation allows dynamic reservoir networks to learn to produce temporal sequences in output. Their convergence to attractors under constant input patterns allows them to produce movements that converge to specific postures (Mannella and Baldassarre, 2015). Moreover, reservoir networks have a large capacity for storing different responses to patterns, because they can expand the dimensionality of the input patterns when the number of internal units is high with respect to the number of input-layer units.

The units of the reservoir network used here, a leaky echo-state network, have a leaky activation potential **r** and an activation **a** as follows:

$$\tau_{dr}\dot{\mathbf{r}} = -\mathbf{r} + \mathbf{W}_{g\to r}\mathbf{g} + \mathbf{W}_{r\to r}\mathbf{a} \tag{8}$$

$$\mathbf{a} = \left[ \tanh \left( \mathbf{r} \right) \right]^{+} \tag{9}$$

where $\tau_{dr}$ is a temporal factor, $\mathbf{W}_{g\to r}$ is the matrix of weights connecting the selected-goal units **g** to the reservoir, and $\mathbf{W}_{r\to r}$ is the matrix of internal connections. The initial values of $\mathbf{W}_{r\to r}$ are generated with Gaussian noise.

After being generated, the matrix is normalised to satisfy the reservoir property (Jaeger, 2001):

$$1 - \epsilon < \rho\left(\frac{\delta t}{\tau_{dr}} \mathbf{W}_{r \to r} + \left(1 - \frac{\delta t}{\tau_{dr}}\right) \mathbf{I}\right) < 1 \tag{10}$$

where $\rho(\mathbf{M}) = \max_j |\lambda_j|$ is the spectral radius of a matrix $\mathbf{M}$ with eigenvalues $\lambda_j$, and $\mathbf{I}$ is the identity matrix.
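Equations (8) and (9) can be simulated with a forward Euler step of size $\delta t$. The sketch below is a hypothetical illustration under that assumption (the integration scheme and names are not taken from the source); the Gaussian initialisation and the spectral-radius normalisation of Equation (10) are not shown.

```python
from __future__ import annotations

import math


def relu_tanh(v: float) -> float:
    """Rectified hyperbolic tangent [tanh(v)]^+ used in Eqs. (9) and (11)."""
    return max(0.0, math.tanh(v))


def reservoir_step(r: list[float], g: list[float],
                   w_in: list[list[float]], w_rec: list[list[float]],
                   tau_dr: float, dt: float) -> tuple[list[float], list[float]]:
    """One Euler step of Eq. (8), returning the new potentials r and activations a."""
    a = [relu_tanh(v) for v in r]
    new_r = []
    for i, ri in enumerate(r):
        drive = (sum(w * gk for w, gk in zip(w_in[i], g))
                 + sum(w * ak for w, ak in zip(w_rec[i], a)))
        new_r.append(ri + (dt / tau_dr) * (-ri + drive))
    return new_r, [relu_tanh(v) for v in new_r]
```

With a zero input and a zero initial state the network stays at the zero attractor, while a constant non-zero input drives the potentials to a fixed point, matching the attractor behaviour described above.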

The reservoir internal units are connected to a layer of readout units **z** that set the joint angles of the arms:

$$\mathbf{z} = \left[ \tanh \left( \mathbf{W}_{a \to z} \mathbf{a} \right) \right]^{+} \tag{11}$$

Reservoir learning involves the weights $\mathbf{W}_{a\to z}$ and is directed to produce a mapping from the selected goal **g** received as input to the desired postures **D** produced as output and stored in the motor associative memory (see below). Each desired posture in **D** is the posture experienced when the corresponding goal, received as input by the reservoir, was matched. To this end, the weights are modified at each step of the trial as follows:

$$
\Delta \mathbf{W}_{a \to z} = \alpha \left( (\mathbf{d}_s - \mathbf{z}_t) \odot \mathbf{z}'_t \right) \mathbf{a}_t^{T} \tag{12}
$$

where $\alpha$ is a learning rate, $\mathbf{d}_s$ is the desired posture stored in the associative memory **D** and corresponding to the selected goal $\mathbf{g}_s$ sent as input to the reservoir, $\mathbf{z}_t$ is the output pattern of the reservoir at time step $t$ of the trial and $\mathbf{z}'_t$ its element-wise first derivative, $\odot$ is the element-wise product, and $\mathbf{a}_t$ is the activation of the reservoir internal units at time step $t$. If $\mathbf{d}_s$ has not yet been generated, because the selected goal has never been matched, learning does not take place.
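The readout of Equation (11) and the delta rule of Equation (12) can be sketched as follows. This is a hypothetical illustration, not the authors' code: it assumes that $\mathbf{z}'_t$ is the derivative of the rectified tanh with respect to its input, i.e. $1 - \tanh^2(u)$ where $u > 0$ and 0 elsewhere, and it represents a goal that has never been matched with `None`.

```python
from __future__ import annotations

import math


def readout(w_out: list[list[float]], a: list[float]) -> list[float]:
    """Eq. (11): z = [tanh(W_{a->z} a)]^+."""
    return [max(0.0, math.tanh(sum(w * ak for w, ak in zip(row, a)))) for row in w_out]


def readout_learning_step(w_out: list[list[float]], a: list[float],
                          d_s: list[float] | None, alpha: float) -> None:
    """Eq. (12), applied in place for one time step of the trial.

    Assumption: z' is the derivative of [tanh(u)]^+ with respect to its input u,
    i.e. 1 - tanh(u)^2 for u > 0 and 0 otherwise.
    """
    if d_s is None:  # the selected goal was never matched: no learning
        return
    for i, row in enumerate(w_out):
        u = sum(w * ak for w, ak in zip(row, a))
        z = max(0.0, math.tanh(u))
        dz = 1.0 - z * z if u > 0 else 0.0
        err = (d_s[i] - z) * dz
        for k in range(len(row)):
            row[k] += alpha * err * a[k]
```

Repeating the step with a fixed reservoir activation drives the readout towards the stored posture, as expected from gradient descent on the squared error.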

#### **The random trajectory generator**

During learning, the output of the reservoir is merged with the output of a random trajectory generator to foster motor exploration. To this end, at each trial the random trajectory generator produces, for each joint $j$, a sinusoidal trajectory whose frequency is randomly drawn from a given range:

$$n_{j} = \cos\left(2\pi f \frac{t}{\beta} + \pi\right) \tag{13}$$

where f is a random frequency in the interval [0, 1] and β is a scale factor.
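A sketch of the random trajectory generator of Equation (13) is given below. It assumes, as a hypothetical reading of the text, that one frequency per joint is drawn at the start of each trial and that $t$ counts integer time steps; the function name is illustrative.

```python
from __future__ import annotations

import math
import random


def random_trajectory(n_joints: int, n_steps: int, beta: float,
                      rng: random.Random) -> list[list[float]]:
    """Eq. (13): one cosine per joint, each with a frequency f drawn in [0, 1].

    Returns a list of n_steps rows, each holding the value n_j for every joint.
    """
    freqs = [rng.uniform(0.0, 1.0) for _ in range(n_joints)]
    return [[math.cos(2 * math.pi * f * t / beta + math.pi) for f in freqs]
            for t in range(n_steps)]
```

Because of the phase offset $\pi$, every joint trajectory starts at $-1$ and then oscillates within $[-1, 1]$ at its own frequency.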

The final motor command issued to the joints, **m**, is a weighted sum of the reservoir output **z** and the output **n** of the random trajectory generator, using as weight the competence $\psi_j$ of the selected goal:

$$\mathbf{m} = \pi \left( \psi_{j} \mathbf{z} + (1 - \psi_{j}) \mathbf{n} \right) \tag{14}$$
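The competence-weighted mixing of Equation (14) is a one-line operation per joint; the hypothetical helper below simply makes the roles of $\psi_j$, **z** and **n** explicit.

```python
from __future__ import annotations

import math


def motor_command(z: list[float], n: list[float], psi_j: float) -> list[float]:
    """Eq. (14): competence-weighted mix of reservoir output z and noise n, scaled by pi."""
    return [math.pi * (psi_j * zk + (1.0 - psi_j) * nk) for zk, nk in zip(z, n)]
```

With $\psi_j = 0$ the joints follow the random trajectory alone; as competence grows, the learned reservoir output progressively takes over and exploration noise fades.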

#### **The associative memory**

Every time the selected goal $g_j$ is matched, its associated target posture $\mathbf{d}_j$ is updated as a decaying average of the experienced postures $\mathbf{p}$:

$$
\tau_d \dot{\mathbf{d}}_j = -\mathbf{d}_j + \mathbf{p} \tag{15}
$$

where $\tau_d$ is a decay factor. This factor is modulated by the competence of the selected goal:

$$
\tau_d = \frac{1}{1 - \psi_j} \tag{16}
$$

This implies that, with low competence, the target posture of the selected goal is strongly updated towards the experienced posture that caused the goal to be accomplished, whereas with high competence $\tau_d$ becomes very large and the target posture freezes at its current value.
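Combining Equations (15) and (16) with a forward Euler step gives an update rate of $\delta t / \tau_d = \delta t (1 - \psi_j)$. The sketch below uses that form, which avoids dividing by zero when $\psi_j = 1$. The Euler discretisation, the clipping of the rate at 1, and the initialisation of $\mathbf{d}_j$ to the first matched posture are assumptions made for illustration.

```python
from __future__ import annotations


def update_target(d_j: list[float] | None, p: list[float],
                  psi_j: float, dt: float) -> list[float]:
    """One Euler step of Eqs. (15)-(16) after goal j has been matched.

    With tau_d = 1 / (1 - psi_j), the step dt / tau_d equals dt * (1 - psi_j),
    which avoids dividing by zero when psi_j = 1.
    """
    if d_j is None:  # assumption: the first matching initialises d_j to p
        return list(p)
    rate = min(1.0, dt * (1.0 - psi_j))  # clipped to keep the Euler step stable
    return [d + rate * (pk - d) for d, pk in zip(d_j, p)]
```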

#### 4.1.4. Competence Measures

This section shows how the model computes the competence for goals through the online optimization of the outcome-goal contingency prediction.

During each trial, a goal matching occurs (match = 1) if the unit activated by the Goal Generator at a given timestep corresponds to the goal selected by the Goal Selector at the beginning of the trial; otherwise, match is set to 0 at the end of the trial:

$$
\mathit{match} = \mathbf{o}^{T}\mathbf{g} \tag{17}
$$

A linear neural network that receives the selected-goal pattern as input predicts the goal matching (0 in the case of failure, 1 in the case of success):

$$\mathit{pred} = \boldsymbol{\psi}^{T}\mathbf{g} \tag{18}$$

Initially, the values of $\boldsymbol{\psi}$ are set to zero, so the prediction is equal to 0. The predictor is then trained on the difference between the current match and pred values:

$$
\Delta\boldsymbol{\psi} = \eta_{pred} \left( \mathit{match} - \mathit{pred} \right) \odot \mathbf{g} \tag{19}
$$

Given that match takes the value 0 or 1 and pred is trained to predict it, the elements of $\boldsymbol{\psi}$ tend to lie in the range [0, 1]. Each element $\psi_j$ is then a measure of the competence for goal $g_j$: given the 0/1 values of match, it tends to represent the probability of achieving that goal when it is selected.
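Equations (17) to (19) form a simple delta-rule predictor, sketched below with hypothetical function names. Because **g** is one-hot, only the competence of the selected goal changes, and with a stochastic 0/1 match its value hovers around the empirical success rate.

```python
from __future__ import annotations

import random


def match_signal(o: list[int], g: list[int]) -> int:
    """Eq. (17): 1 if the winner of the Goal Generator is the selected goal, else 0."""
    return sum(ok * gk for ok, gk in zip(o, g))


def competence_step(psi: list[float], g: list[int], match: int,
                    eta_pred: float) -> tuple[list[float], float]:
    """Eqs. (18)-(19): predict the matching, then apply the delta rule to psi."""
    pred = sum(pk * gk for pk, gk in zip(psi, g))
    error = match - pred
    # Only the selected goal (g_j = 1) has its competence updated.
    return [pk + eta_pred * error * gk for pk, gk in zip(psi, g)], pred


if __name__ == "__main__":
    # Hypothetical example: goal 0 is achieved in about 70% of the trials.
    rng = random.Random(1)
    psi = [0.0, 0.0]
    for _ in range(20000):
        match = 1 if rng.random() < 0.7 else 0
        psi, _ = competence_step(psi, [1, 0], match, eta_pred=0.01)
    print(round(psi[0], 2))  # expected to lie close to 0.7; psi[1] stays 0
```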

The model also uses the (match − pred) error to compute a second measure, the competence improvement $\xi$, which changes more slowly than the first one, by applying an exponentially decaying average to the error:

$$
\tau_{\xi}\dot{\xi} = -\xi + \left[ \mathit{match} - \mathit{pred} \right]^{+} \tag{20}
$$

The choice of using only the positive part of the prediction error, $[\cdot]^{+}$ (cf. Santucci et al., 2013), is due to the fact that the intrinsic motivation signal is related to competence: thus, when the system fails to accomplish a goal, the leaky value (and the motivation) converges towards zero rather than towards negative values. The exponentially decaying average makes the signal change slowly: as we shall see, this is important for keeping the system focused on the discovered goals for some trials, which in turn affects the convergence of the model. The parameters of the model are shown in **Tables 1** and **2**.
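Equation (20) can be sketched as a leaky integrator of the rectified prediction error. The sketch assumes a forward Euler step and that only the $\xi_j$ of the selected goal is updated after each trial; both are illustrative assumptions, and the function name is hypothetical.

```python
def improvement_step(xi_j: float, match: int, pred: float,
                     tau_xi: float, dt: float) -> float:
    """One Euler step of Eq. (20) for the selected goal j."""
    target = max(0.0, match - pred)  # positive part [match - pred]^+
    return xi_j + (dt / tau_xi) * (-xi_j + target)
```

Repeated failures make the target zero, so $\xi_j$ decays towards zero without ever becoming negative, while repeated unexpected successes pull it towards the size of the positive error.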

#### 4.2. Source Code

The model was developed using the Python programming language. Simulations to find the best parameters were run on the computers of the Grid'5000 system, which allows free access to and use of high-performance computing resources. Analyses and plots were made using the R programming language. The source code of the simulations is available at: https://github.com/GOAL-Robots/CNRUPD_010618_sensorimotorcontingencies.

### 5. DISCUSSION

In this work we investigated the hypothesis that self-generated goals and Intrinsic Motivations (IMs) may play an important role even in the early development of knowledge about one's own body and of basic motor skills. This hypothesis, supported by empirical data (section 1), has been implemented in a 2D simulated robot composed of two arms and endowed with touch sensors. The results confirm the computational soundness of the hypothesis (section 3), showing how the model is able to autonomously form a map of self-generated goals, encoded in terms of touch sensations, and to learn the motor skills to reach the different areas of such a map. The learning processes that allow the model to acquire this knowledge are completely autonomous and rely on two key processes: the autonomous generation of goals, and the use of competence-based intrinsic motivations to select them.

The model autonomously generates goals based on the capacity of its movements to change its own sensations, specifically when it discovers a contingency between a motor behavior (the achievement of a specific end-posture of the two arms) and the detection of a perceptual change (the activation of the touch sensors). Once generated, goals can serve important functions both during learning and during functioning. During learning, they can guide the refinement of the motor behavior leading to them, in this case the movements that produce the perceptual change (as in the GRAIL architecture, cf. Santucci et al., 2016). In particular, the activation of the internal representation of a goal allows the model to learn the motor skill to accomplish it even though the contextual input from the environment, here the possible states of its own body, is always the same. This would not be possible within a stimulus-response reactive framework, e.g., with standard reinforcement learning models (Sutton and Barto, 1998), as the constant context ("stimulus") would not allow the system to perform different motor behaviors. Instead, the model can learn different motor behaviors because it can associate them with different internally activated goals. Moreover, goals support a second function during learning, namely the generation of a "matching signal," produced when the experienced sensation (here the touch sensation) matches the internally activated goal representation: such a signal produces a reward that guides the trial-and-error learning process refining the motor skill that pursues the currently active goal.

TABLE 1 | Parameters used in the model for all simulations.

TABLE 2 | Sizes of the layers of all the model's components.

During functioning, goals can serve as "pointers" to recall the acquired skills. Indeed, the activation of a goal can trigger the performance of the motor skill that accomplishes it even if the context does not change. Again, reactive models cannot do this, as they cannot recall different skills unless information from the outside is provided (e.g., in the form of a pointer somehow associated with each skill). Here, the goals used to test this functionality of the model are activated by hand, but in the future, enhancing the model within a developmental framework might endow it with the capacity to autonomously employ goals to recall the related motor skills in order to accomplish similar desired goals or to facilitate their learning (Seepanomwan et al., 2017), or to compose more than one goal/skill to form more complex policies accomplishing goals at a coarser granularity (Vigorito and Barto, 2010; Hart and Grupen, 2011).

The second important process guiding the acquisition of body knowledge in the model is related to competence-based intrinsic motivations linked to the acquisition of the motor skills leading to the desired goals. This motivation is computed on the basis of a mechanism measuring the probability that a skill accomplishes the goal to which it is linked. As is typical of intrinsic motivations, this mechanism is related to the acquisition of information (in this case, the capacity to reach one's own body) and has a transient nature (Baldassarre, 2011): the agent's interest in an activity decreases once competence in that activity has been acquired. In the model, competence-based intrinsic motivations play several different functions. First, a low competence favors the update of the representations of goals, whereas a high competence stabilizes them. Second, the opportunity to gain competence guides the selection of the goals on which the agent focuses its exploration and learning resources. Third, high levels of competence for a goal reduce the motor noise used to search for the motor behavior that accomplishes it. Fourth, a low competence for a certain goal leads to a substantial update of the related movement target (and hence of the related movement), whereas a high competence leads to its stabilization. Overall, when integrated, these mechanisms lead the agent to converge to stable action-outcome contingencies, namely to both effective movements for accomplishing goals and stable goal representations.

The dependence of the autonomously formed goal representations on competence is particularly innovative. Introducing this dependence was a critical step that allowed the model to form stable goals and skills, with goals covering the whole body space in a homogeneously distributed fashion. To our knowledge, this is the first work that uses competence-based intrinsic motivations to modulate the formation of the perceptual representations related to goals, and that shows the importance of this modulation for the overall stability of the discovered action-outcome ("skill-goal") contingencies.

In the computational literature, other works have shown the power of self-generated goals and IMs to boost the autonomous learning of knowledge and competences. While most of these works focused on acquiring some sort of control over the environment (e.g., Vigorito and Barto, 2010; Santucci et al., 2014a; Kulkarni et al., 2016; Forestier et al., 2017; Seepanomwan et al., 2017), here we wanted to test how similar principles could be used to drive the learning of low-level motor skills based on the interaction with one's own body. In the goal-babbling literature, some works (e.g., Rolf et al., 2011; Baranes and Oudeyer, 2013; Rolf and Steil, 2014) use the autonomous generation of intrinsic goals to learn a mapping between different end-points in the goal space and the corresponding configurations of the robots' redundant effectors. Unlike these systems, our model jointly maps two different dimensions of the agent, the proprioception of its arms (the postures) and the activation of the touch sensors, thus supporting a more sophisticated learning of the contingencies related to the agent's body.

Another relevant computational framework concerns knowledge gradients (Frazier et al., 2008; Scott et al., 2011; Wu et al., 2017). This is a Bayesian method that optimizes the exploration of a set of alternative options, each carrying a stochastic reward, based on the information-gain gradient related to them, with the objective of later choosing the best option. Our model has some similarities with the idea of knowledge gradients, since it uses a value function over the space of the internal representations of goals to bias their selection. The major difference is that in our model the knowledge gain giving rise to competence-based IMs concerns the competence of the motor controller, whereas in knowledge gradients the knowledge gain regards the increase in confidence of the estimates of the options' rewards. Moreover, in our model the value function for goal exploration is built upon a non-stationary "utility function," namely the competence for each goal, which varies with motor learning, whereas knowledge gradients are built upon a value function related to the information gain concerning the estimates of the rewards of alternative options, which are fixed. Given these similarities, an interesting line of research would be to cast our hypothesis within a probabilistic framework such as that of knowledge gradients, where exploration is guided by a measure of competence gain computed through Bayesian optimization.

The hypothesis that goals might also be used to learn low-level, fine-grained motor skills is consistent with neuroscientific evidence showing that, alongside high-level goals encoded in the prefrontal cortex (Miller and Cohen, 2001; Mannella et al., 2013), premotor and motor cortical areas might encode movements in terms of goals related to desired end-movement postures (Graziano et al., 2002) or body-object relations (Umilta et al., 2001).

Empirical evidence from developmental psychology relevant to the present model of the acquisition of self-touch behavior is not plentiful. However, some evidence can be drawn at a more general level from experiments showing the role played by sensorimotor contingencies in development. In particular, it has been shown that contingencies related to producing relevant changes in the environment, such as the movement of a mobile toy attached with a ribbon to one arm of a baby but not to the other, can lead to increased movement of the relevant arm (Rovee-Collier et al., 1980).

More recently, it has been shown that learning progress in reaching for a toy can be enhanced when the object produces a sound contingent on being touched (Williams and Corbetta, 2016). Overall, this evidence indicates the importance of contingencies for the development of motor skills, and in particular of the fact that actions lead to a change in the world. This is also a key assumption of the model, although it remains to be ascertained, through appropriately designed empirical experiments, whether those contingencies drive the learning of motor skills through the mediation of goal formation, as proposed here, or directly within a stimulus-response framework.

Another source of relevant research concerns the development of reaching toward one's own body and toward objects. This research shows that infants progress from spontaneous to goal-directed arm and hand movements. Wallace and Whishaw (2003) examined hand movements in infants aged 1–5 months and described a development from closed fists to open-hand movements that progress in complexity toward self-directed grasping. Such "hand babbling" may correspond to the goal-directed babbling of the model's behavior. Also, about a month before infants execute their first successful reaches, they increase the number of arm movements in the presence of a toy and raise their shoulders and arms in approximation of a reach (Bhat and Galloway, 2006; Lee et al., 2008), which again suggests that spontaneous arm movements, or "arm babbling," prepare the emergence of purposeful reaching. Thomas et al. (2015) documented self-touching behavior in developing human infants over the first 6 months of life. In the initial weeks, they mainly observed movements around the shoulders with the digits in a closed-fist configuration, resulting in incidental contacts with the body. From about 12 weeks, movements included palmar contacts, giving a goal-directed, exploratory quality to self-touch.

We shall now review how more specific predictions of the model may be linked to existing data in developmental psychology, as well as the perspectives that these predictions open for future research. A first prediction from the model is that movement duration should decrease with learning progress, in particular for specific goals, as the competence to accomplish them increases. Infants' reaches do indeed become smoother and straighter over development. First reaches have irregular, inefficient, curved paths, with several changes in direction and multiple bursts of speed, making the path up to four times longer than a straight line to the object (von Hofsten, 1991). Evidence from infants reaching in the dark shows that this initial inefficiency is not due to continuous visual tracking of the hand relative to the target in a series of corrections: infants produce similar hand paths and reach characteristics when reaching for glowing objects in the dark (Clifton et al., 1991). It is therefore possible that the inefficient movement phase reflects motor babbling. Similarly, between 6 and 15 months of age, infants' arm movements while banging a block or wielding a hammer become increasingly straight and efficient (Kahrs et al., 2013). We are currently investigating how reaching toward vibrotactile targets on the infant's own body develops between 4 and 6 months of age. Based on the model, we expect that reaches toward locations on the body will gradually become faster and more efficient.

A second, obvious prediction of the model is that skill acquisition should progress from easy to hard skills. In our series of observations of self-touch, we expect to observe a developmental sequence from reaching for parts of the body that are easier to attain, such as the mouth or the hips, toward reaching for targets that are more difficult to find, such as the forehead or the earlobes. Our current investigations with infants between the ages of 2 and 6 months, along with the results reported by Chinn et al. (2017) with older infants, appear to confirm this developmental trend.

Finally, a third prediction from the model regarding human development holds that the uneven density of tactile receptors throughout the body should contribute to determining which areas of the body are contacted earlier. This prediction is partially confirmed by existing empirical data showing that infants' and fetuses' first self-touch behaviors involve areas with high tactile receptor density, such as the mouth or the thumb (De Vries et al., 1982). These regions also elicit approach motions of the hands with faster dynamics than other regions do (Zoia et al., 2007). It should be noted, however, that alternative models, such as the one designed by Mori and Kuniyoshi (2010) to account for fetal behavior, may make similar predictions.

### AUTHOR CONTRIBUTIONS

FM, VS, GB designed the model and the computational framework. FM carried out the simulation and the analysis of the results. KO, ES, LJ contextualized the computational model within the framework of empirical literature. FM, VS, ES, LJ, KO, GB contributed to the writing of the paper.

### FUNDING

This project has received funding from the European Union's Horizon 2020 Research and Innovation Program under Grant Agreement no. 713010 (GOAL-Robots—Goal-based Open-ended Autonomous Learning Robots). KO, ES, and LJ were also partially funded by ERC Advanced Grant FEEL, number 323674.

### ACKNOWLEDGMENTS

Experiments presented in this paper were carried out using the Grid'5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see https://www.grid5000.fr).

### REFERENCES


Conference on Epigenetic Robotics (EpiRob2007), eds L. Berthouze, G. Dhristiopher, M. Littman, H. Kozima, and C. Balkenius (Lund), 141–148.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Mannella, Santucci, Somogyi, Jacquey, O'Regan and Baldassarre. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Theoretical and Methodological Opportunities Afforded by Guided Play With Young Children

Yue Yu<sup>1</sup> \* † , Patrick Shafto<sup>2</sup> , Elizabeth Bonawitz<sup>1</sup> , Scott C.-H. Yang<sup>2</sup> , Roberta M. Golinkoff<sup>3</sup> , Kathleen H. Corriveau<sup>4</sup> , Kathy Hirsh-Pasek<sup>5</sup> and Fei Xu<sup>6</sup>

<sup>1</sup> Department of Psychology, Rutgers University-Newark, Newark, NJ, United States, <sup>2</sup> Department of Mathematics and Computer Science, Rutgers University-Newark, Newark, NJ, United States, <sup>3</sup> School of Education, University of Delaware, Newark, DE, United States, <sup>4</sup> School of Education, Boston University, Boston, MA, United States, <sup>5</sup> Department of Psychology, Temple University, Philadelphia, PA, United States, <sup>6</sup> Department of Psychology, University of California, Berkeley, Berkeley, CA, United States

#### Edited by:

Pierre-Yves Oudeyer, Institut National de Recherche en Informatique et en Automatique (INRIA), France

#### Reviewed by:

Katharina J. Rohlfing, University of Paderborn, Germany; Clément Moulin-Frier, Cogitai, Inc., United States; Vanessa R. Simmering, ACT, Inc., United States

#### \*Correspondence:

Yue Yu pkuyuyue@gmail.com

#### †Present address:

Yue Yu, Centre for Research in Child Development, National Institute of Education, Singapore, Singapore

#### Specialty section:

This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology

Received: 31 October 2017; Accepted: 15 June 2018; Published: 17 July 2018

#### Citation:

Yu Y, Shafto P, Bonawitz E, Yang SC-H, Golinkoff RM, Corriveau KH, Hirsh-Pasek K and Xu F (2018) The Theoretical and Methodological Opportunities Afforded by Guided Play With Young Children. Front. Psychol. 9:1152. doi: 10.3389/fpsyg.2018.01152

For infants and young children, learning takes place all the time and everywhere. How children learn best both in and out of school has been a long-standing topic of debate in education, cognitive development, and cognitive science. Recently, guided play has been proposed as an integrative approach for thinking about learning as a child-led, adult-assisted playful activity. The interactive and dynamic nature of guided play presents theoretical and methodological challenges and opportunities. Drawing upon research from multiple disciplines, we discuss the integration of cutting-edge computational modeling and data science tools to address some of these challenges, and highlight avenues toward an empirically grounded, computationally precise and ecologically valid framework of guided play in early education.

Keywords: guided play, computational modeling, data science, direct instruction, free play

## INTRODUCTION

Learning in school is often characterized by structured courses and tasks with discrete and explicit objectives. Yet, learning is a continuous process that also takes place outside the classroom, where explicit objectives are not always evident. This is especially true in early childhood interactions at home, where children often learn from everyday interactions with both the physical environment and social partners (Bruner, 1961; Csibra and Gergely, 2009). How best to navigate between explicit, objective-directed learning and more flexibly driven exploration has been a longstanding topic of debate in education, developmental psychology, and cognitive science (Kirschner et al., 2006; Tobias and Duffy, 2009). This debate surfaces in a number of forms, such as direct instruction vs. discovery learning or work vs. play (Bonawitz et al., 2011; Hirsh-Pasek and Golinkoff, 2011; Clements and Sarama, 2014). Pitting these two interests against each other has neither optimized our understanding of learning nor produced optimal methods of learning (Wise and O'Neill, 2009). Here, we discuss an integrated approach, guided play, that enables us to rethink learning as a child-led, adult-assisted activity (Weisberg et al., 2013, 2014, 2016). Focusing on everyday interactions in early childhood, guided play is operationally defined as learning that is active and engaged, where the child takes initiative in a playful learning environment and the adult supports, rather than directs, the learning experience. Sitting between free play, where children explore by themselves, and direct instruction, where the interaction is led by an adult and children take a passive role, guided play takes advantage of the latest research in the science of learning.

Educational research indicates that student-led discovery learning that is facilitated by teachers outperforms both direct instruction and unassisted discovery (Mayer, 2004; Honomichl and Chen, 2012). In a meta-analysis comparing explicit instruction, unassisted discovery, and assisted discovery (Alfieri et al., 2011), learning outcomes were more favorable for assisted discovery than for other forms of instruction. These results held for learners of different ages and across different learning domains. Similarly, developmental studies have shown an advantage of adult guidance over both direct instruction and free play, even before children start formal schooling (Han et al., 2010; Fisher et al., 2013; Ridge et al., 2015; Haden et al., 2016; Sim and Xu, 2017; Yu et al., 2018). In both bodies of literature, "guidance" has referred to a variety of practices including modeling, questioning, encouragement, and feedback, and thus it is unclear what particular aspects of guidance are associated with learning (Wise and O'Neill, 2009; Honomichl and Chen, 2012).

In guided play, learning opportunities may be explicitly structured, but importantly the activity is child-led. Specifically, we define "guidance" as adults' involvement that subtly channels the dyadic interactions to fulfill certain pedagogical objectives, while not interfering so much that the activities cease to be child-led. The pedagogical objectives can be multi-level: they can focus on specific content knowledge, but can also focus on the emotional, motivational, and metacognitive aspects of the learning process, such as cultivating children's love of learning, promoting their engagement, or making them aware of their own learning process (Weisberg et al., 2014). Our concept of guidance is inspired by the Vygotskian concept of scaffolding (Vygotsky, 1934/1987; Wood et al., 1976; Fernández et al., 2001) and Barbara Rogoff's theory of guided participation (Rogoff et al., 1993). In addition to guidance being tailored to fit individual children's needs and skill level (which is similar to scaffolding), in guided play we also emphasize that guidance should never shift children away from controlling their own learning process. The pedagogical objectives of guidance are therefore broader: besides helping children to master particular knowledge or skills, guided play also aims to provide children with an opportunity to enjoy, control, and reflect upon their own learning process, which may facilitate independent inquiry and discovery in the future.

Because guided play requires seamless integration between the adult's objectives to support learning and child-led activity that can be highly fluid, characterizing appropriate guidance requires an understanding of the dynamic nature of an adult–child interaction in context. First, guided play is interactive. How well children can learn from a playful interaction depends on their mental state at the moment, including their level of knowledge, goals, attention, emotion, and trust toward the play partner. Therefore, effective guidance should take into account and be contingent upon the mental state of the child. This requires theories to consider the dyad as a system moving toward a joint objective (Fogel and Garvey, 2007; Lavelli et al., 2015; Heller and Rohlfing, 2017), and requires experimental designs and analytical tools that go beyond between-group comparisons to focus on individual dyads. Second, guided play is dynamic. Timing is critical for the guidance to be effective. Providing a label, for example, can be educational at a moment when a child is focusing on the target object, but can be confusing when the child is focusing on multiple objects (Pereira et al., 2014). Similarly, demonstrating an object's function when an infant points to it supports learning (Begus et al., 2014). For preschoolers, revealing causal features of objects right before, but not after, a demonstration of categorization facilitates children's category learning (Yu and Kushnir, 2016). Existing theories, such as those of direct instruction and free play, and methodological tools, such as standard statistical tests, are optimized for discrete interventions and are usually applied uniformly across groups of individuals. Characterizing the dynamic nature of guided play will require the development of new theories and tools that capture interventions along a continuous timeline.
In what follows, we detail these theoretical and methodological matters, the tools that may be used to address them, and the prospects for a theory of guided play.

### THEORETICAL CHALLENGES AND OPPORTUNITIES FOR GUIDED PLAY

Free play and direct instruction have long been contrasted in education and cognitive development (Dewey, 1933; Mayer, 2004; Kirschner et al., 2006; Hirsh-Pasek et al., 2008), and existing mathematical and computational models for the two scenarios have likewise been developed separately because they typically focus on different aspects of learning (Nelson, 2005; Shafto et al., 2014). Free play is based on constructivist views of learning, which portray learning as an active process during which the learner repeatedly intervenes on their environment and updates their beliefs based on information gathered from these experiences (Piaget, 1952). Correspondingly, computational models of free play have largely focused on how to sequentially choose evidence during learning (Nelson, 2005; Settles, 2010; Markant and Gureckis, 2014; McCormack et al., 2016). These models generate predictions about how the optimal next step depends on the current state and are therefore dynamic. However, such models are inadequate to capture the interactive aspect of guided play because they do not usually simulate a social partner whose behavior is contingent on the learner.

In contrast, direct instruction emphasizes the necessity of outside instruction for learners to successfully navigate a learning task (Kirschner et al., 2006), and focuses on what content instruction should deliver (Mayer, 2004). Correspondingly, computational models of direct instruction have focused on the evidence teachers should select to lead learners to the correct answer, given the learner's current beliefs (Shafto and Goodman, 2008; Shafto et al., 2012b, 2014; Frank, 2014; Zhu, 2015; Rafferty et al., 2016). Some of these models simulate the interactive nature of teaching and learning by modeling how the teacher and the learner each reason about the other's knowledge and objectives (Shafto and Goodman, 2008; Shafto et al., 2012b, 2014). However, these models are not dynamic; they select evidence with the immediate goal of the learner arriving at the correct inference. When dynamic extensions have been proposed, they encounter significant computational challenges that render the models of limited use for modeling real-life scenarios (Rafferty et al., 2016; Yang and Shafto, 2017).

Theories and models of epistemic trust may inform modeling of dynamic interactions between a teacher and a learner. The literature on epistemic trust has investigated the dynamics of reasoning, focusing on a learner's sensitivity to both a teacher's prior knowledge in a given domain (Pasquini et al., 2007; Sobel and Corriveau, 2010) and her social group membership when deciding whom to trust (Kinzler et al., 2011; Chen et al., 2013). Models of epistemic trust (Eaves and Shafto, 2012, 2017; Shafto et al., 2012a) tend to build upon the aforementioned models of direct instruction. Although both of these bodies of work predict that children's epistemic and social evaluation of a teacher should influence their trust in her (and therefore their sensitivity to her guidance), to date both the experimental and the computational work have focused on the dynamics of trust rather than on learning.

Finally, ecological psychology and dynamic systems approaches have been applied to analyze dynamic interactions between adults and children (Bronfenbrenner, 1986; Thelen and Smith, 1996; Fogel and Garvey, 2007). These approaches were foundational in emphasizing the need to view adult–child interactions as a system that evolves through time, as well as the need to situate these interactions in the immediate environment. They also provided invaluable computational tools to analyze patterns of co-activity that emerge over time. Because formal dynamic systems models often focus on overt behavior, applying them to guided play may require an extension that takes into account the mental states and inferential capacities of both learners and guiding adults.

A unified theory of guided play must combine strengths from previous research to capture the interactive and dynamic nature of learning. A key challenge for proposing such a theory is the development of theoretical frameworks that avoid simulating every possible mental state of the teacher and the learner, which would create intractable computational problems. Even the simplest learning situations involve many potential choices by both learners and guiding adults. For example, when an adult guides a child to learn the name of an object, the adult could choose from a variety of actions (e.g., pointing to the object, holding it, looking at the child, or looking at the object) as well as utterances (e.g., naming the object, or asking a question), and the child could also respond in a variety of ways (e.g., reaching for the object, repeating the word, or displaying a puzzled face). Adults and children nevertheless navigate such situations, making choices while balancing short- and long-term objectives. To simulate these capacities, one approach is to adopt simplified computational models similar to those employed in the educational technology literature. One example is Bayesian knowledge tracing, which, instead of modeling the learner's full belief state, estimates only whether the learner has acquired the correct concepts (Corbett and Anderson, 1995; Yudelson et al., 2013). A second approach is to use task-specific information to limit the set of relevant actions. For example, pairing observation of naturalistic adult–child interaction during a task with an experiment that measures the learning outcome of that task could help to identify the task-relevant subset of information (Yu et al., 2017). Subsequent experimental studies could then test predictions of the model on this reduced set of relevant information rather than on the whole set of logical possibilities.
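As a concrete illustration of this simplified modeling approach, the sketch below implements the standard four-parameter Bayesian knowledge tracing update (Corbett and Anderson, 1995). It tracks a single probability that a concept is mastered rather than a full belief state. The parameter values, the names `BKTParams`, `bkt_update`, and `trace`, and the binary correct/incorrect coding of child responses are illustrative assumptions, not estimates from any study.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class BKTParams:
    p_init: float     # P(L0): concept already mastered before the interaction
    p_transit: float  # P(T): chance of mastering the concept at each opportunity
    p_guess: float    # P(G): correct response without mastery
    p_slip: float     # P(S): incorrect response despite mastery


def bkt_update(p_known: float, correct: bool, params: BKTParams) -> float:
    """Posterior P(mastered) after one observed response, followed by a learning step."""
    if correct:
        evidence_known = p_known * (1 - params.p_slip)
        evidence_unknown = (1 - p_known) * params.p_guess
    else:
        evidence_known = p_known * params.p_slip
        evidence_unknown = (1 - p_known) * (1 - params.p_guess)
    posterior = evidence_known / (evidence_known + evidence_unknown)
    # A concept not yet mastered may be acquired on this opportunity.
    return posterior + (1 - posterior) * params.p_transit


def trace(responses: Iterable[bool], params: BKTParams) -> List[float]:
    """Run the update over a sequence of responses and return P(mastered) after each."""
    p_known = params.p_init
    history = []
    for correct in responses:
        p_known = bkt_update(p_known, correct, params)
        history.append(p_known)
    return history


if __name__ == "__main__":
    params = BKTParams(p_init=0.2, p_transit=0.1, p_guess=0.2, p_slip=0.1)  # illustrative values
    print([round(p, 3) for p in trace([False, True, True], params)])
```

Because each update touches only one number per concept, a tracker of this kind is cheap enough to run online; the cost is that it cannot represent which particular wrong idea a child holds.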

### METHODOLOGICAL CHALLENGES AND OPPORTUNITIES FOR GUIDED PLAY

The interactive and dynamic properties of guided play also raise questions for experimental design and analysis that may require modifying existing tools and developing new ones. One source of methodological challenges is that the effectiveness of guidance varies with the individual characteristics of the child: guidance content that is effective for one child may not be effective for another. For example, two children may have different misconceptions about what constitutes a triangle (Fisher et al., 2013). One may think a triangle needs to have the point at the top, whereas the other may think a triangle needs to have all acute angles. In this case, different examples should be presented to guide these two children away from their respective misconceptions: it would be more effective to show the first child a real triangle with the point at the bottom, and to show the second child an obtuse triangle. This intuition is supported by research in category learning showing that a set of evidence that effectively facilitates one person's learning may be less effective when presented to another person (Markant and Gureckis, 2014; Sim et al., 2015). In addition, individual differences in children's background knowledge, cognitive style, and experiences with different sociocultural practices can all influence the effectiveness of presenting certain content to them (McNamara et al., 1996; Gutiérrez and Rogoff, 2003; Price, 2004). Individual differences remain an important topic for further research.

The timing of guidance is also important: well-timed guidance that is contingent upon the child's prior actions may affect learning outcomes differently than the same guidance delivered at another moment (Pereira et al., 2014). Such variability in guidance content and timing poses challenges to typical random-assignment controlled experiments, as uniform interventions applied to groups of randomly assigned individuals do not necessarily test the interactive and dynamic predictions of guided play. Yet observational designs are insufficient to tease apart the causal relations between components of guided play and children's learning outcomes. Therefore, new methods and analytical tools are required to select the content and timing of guidance so as to maximally inform our understanding of the mechanisms involved in guided play.

Advances in data science and technology may help address some of these challenges by enabling real-time analysis and feedback, as well as (semi-)automatic analysis of large amounts of time-series data. For example, in word-learning scenarios, children look at the experimenter more when they are uncertain about an object label (Hembacher et al., 2017). Thus, an overt behavior, here eye gaze, reveals important information about the learner's mental state and could signal opportunities for guidance. Technological advances in eye-tracking equipment and data-sharing mechanisms have allowed for the collection and sharing of large-scale, live-stream video data from naturalistic adult–child interactions (Franchak et al., 2011; Databrary, 2012). However, coding and analysis of children's looks are usually conducted manually, which restricts the amount of data that can be used and precludes real-time feedback during the interaction. Applying tools for automatic decoding of eye movements and looking, such as those used in vision research (Duc et al., 2008; Gottlieb et al., 2013; Borji and Itti, 2014), may allow for online recognition of the referent of the child's gaze, which, in turn, may help to nominate a range of appropriate guidance "moves" that are contingent upon the child's attention and mental state. Indeed, research in social robotics has implemented gaze and action detection in robot learners to infer human teachers' pedagogical intent from their gaze and actions, and to react contingently (e.g., when the teacher showed an object with pedagogical cues, the robot turned its head toward the same object; when the teacher then looked back at the robot's eyes and labeled the object, the robot looked at the teacher and smiled). Human teachers were more engaged and more likely to attribute human-like traits to the robot when it displayed these contingent reactions (Lohan et al., 2012). Similar algorithms may also support teachers in providing guidance contingent on the learner's behavior.
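A minimal sketch of how automatically decoded gaze could be turned into candidate moments for guidance, assuming an upstream gaze-coding step already labels each video frame with the child's gaze target. Following the finding that children look at the experimenter more when uncertain (Hembacher et al., 2017), it flags frames where the share of looks to the adult in a trailing window reaches a threshold. The labels, the `guidance_moments` function, the window size, and the threshold are hypothetical and would need to be calibrated against real data.

```python
from collections import deque
from typing import Iterable, List


def guidance_moments(gaze_labels: Iterable[str], window: int = 5,
                     threshold: float = 0.6) -> List[int]:
    """Return indices of frames at which the proportion of looks to the adult over the
    last `window` frames is at least `threshold` (a hypothetical uncertainty cue)."""
    recent = deque(maxlen=window)  # the oldest frame drops out automatically
    flagged = []
    for i, label in enumerate(gaze_labels):
        recent.append(label == "adult")
        if len(recent) == window and sum(recent) / window >= threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    frames = ["object"] * 5 + ["adult"] * 3 + ["object"] * 4
    print(guidance_moments(frames))  # [7, 8, 9]
```

A trailing window keeps the detector causal (it uses only past frames), which is what real-time feedback during the interaction would require.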

Similarly, the learner's affect and engagement play an important role (Greene and Noice, 1988; Rader and Hughes, 2005). In guided play, the joy that accompanies play helps to sustain motivation, interest, and excitement, which should be associated with enhanced learning outcomes (Hirsh-Pasek and Golinkoff, 2003; Weisberg et al., 2016). Unfortunately, because affect coding is time-intensive, evidence relating affective states to improved learning outcomes remains limited. Data science tools may be used to automatically identify affect and engagement in real-time video streams for analysis, and to time guidance so as to foster affective states that predict positive short- and long-term learning (Littlewort et al., 2006; Yao et al., 2015; Baker et al., 2017). Such analytical tools would allow for direct tests of guided play predictions related to the timing of learning, while employing experimental designs similar to those typically used in the developmental and educational literature.

### COUPLING COMPUTATIONAL MODELS AND DATA SCIENCE TOOLS

A more ambitious possibility is to couple models and data science tools to design experiments that target the moments when interventions may yield the strongest test of the theory. Attempts at interactive, dynamic approaches to teaching can be found in the literature on social robotics and intelligent tutoring systems (Anderson et al., 1985; Breazeal, 2002; Thomaz and Breazeal, 2008; Lohan et al., 2012; Nguyen and Oudeyer, 2014; Vollmer et al., 2014; Clement et al., 2015), in which data from expert teachers have been used to train algorithms to learn the contingencies between learners' behavior and teachers' appropriate responses (Ruvolo et al., 2008). Such data-driven approaches can serve as a first step for identifying patterns in guided-play interactions. However, to understand the characteristics of effective guidance, we also need theory-driven computational models that can represent children's mental states based on their behavior. Such models differ from existing intelligent tutoring systems in that, instead of teaching knowledge in specific domains, they are designed to capture the general principles of effective guidance in a wide range of child-led activities that may or may not have an explicit learning goal. Coupling such models with empirical data could inform an algorithm that predicts appropriate guidance based on children's behavior, which could in turn be used in experiments to verify the effect of guidance on children's learning. These experiments would have significant advantages over classic training studies, as the intervention would be driven by an online algorithm that adapts to children's moment-by-moment behavior.

Consider how such computational models could be applied to a recent study of guided play (Fisher et al., 2013). This study compared the effects of different pedagogical methods on preschoolers' learning of geometric shapes and found greater learning in guided play than in didactic instruction or free play. In the guided play condition, the experimenter presented two typical examples (e.g., upright triangles) and two atypical examples (e.g., inverted triangles) in a playful manner, and asked children to determine what makes them the same shape. During children's active exploration, the experimenter used questions, encouragement, and feedback to guide them toward the correct answer. Yet, because the interaction was dynamic, the manner and timing of adult guidance were not prespecified in the experimental design, which makes it difficult to pinpoint what aspects of guidance resulted in the enhanced learning outcomes.

Following the aforementioned framework, existing videos of guided play interactions could be used to train a computational model of learning geometric shapes in four steps (**Figure 1**). First, data science tools can identify a set of common task-relevant behaviors during children's active exploration and cluster these behaviors into categories (e.g., children's looking and pointing may be categorized as seeking guidance from the experimenter; their emotion as confident vs. doubtful; their language as statements or questions). Tools at this stage could build upon advances in (semi-)automatic recognition of eye gaze (e.g., Lohan et al., 2012; Smith et al., 2015), emotion (e.g., Baker et al., 2017), and natural language, including information-seeking questions (e.g., Rothe et al., 2016), among others.
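The first step can be sketched, under strong simplifying assumptions, as a mapping from recognized behavior events to the categories named above. Here a hand-written rule table stands in for the data-driven clustering the framework envisions; the event format, the label set in `DOUBTFUL_EXPRESSIONS`, and the functions `categorize` and `summarize` are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# (modality, value) pairs, assumed to come from upstream gaze, gesture, emotion,
# and speech recognizers; this event format is hypothetical.
Event = Tuple[str, str]

# Hypothetical set of facial-expression labels treated as signs of doubt.
DOUBTFUL_EXPRESSIONS = {"frown", "brow_furrow", "lip_press"}


def categorize(event: Event) -> str:
    """Map one recognized event to a behavior category (rule-based stand-in for clustering)."""
    modality, value = event
    if modality == "gaze":
        # Looks toward the adult are read as seeking guidance; other looks as exploration.
        return "seeking_guidance" if value == "adult" else "exploring"
    if modality == "point":
        return "seeking_guidance"
    if modality == "emotion":
        return "doubtful" if value in DOUBTFUL_EXPRESSIONS else "confident"
    if modality == "speech":
        return "question" if value.strip().endswith("?") else "statement"
    return "uncategorized"


def summarize(events: Iterable[Event]) -> Counter:
    """Count categories over a window of events, e.g., one episode of exploration."""
    return Counter(categorize(e) for e in events)


if __name__ == "__main__":
    episode = [
        ("point", "upright_triangle"),
        ("gaze", "adult"),
        ("emotion", "frown"),
        ("speech", "Is this a triangle because the point is at the top?"),
    ]
    print(summarize(episode))
```

In practice the categories would more plausibly be learned from annotated video than written by hand; the rule table only shows what the output of this step might look like.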

Second, a computational model can be used to simulate children's moment-to-moment beliefs about geometric shapes based on these behavioral patterns. For example, if children point to an upright triangle, look doubtfully at the experimenter, and ask "Is this a triangle because the point is at the top?," their presumed belief about triangles would shift toward the wrong hypothesis of "point at the top," with a relatively flat distribution reflecting their uncertainty. The model at this stage could build upon existing work that links behavior with mental states on a microgenetic scale, including models of shifting hypotheses (e.g., Bonawitz et al., 2014), epistemic trust (e.g., Eaves and Shafto, 2017), and automatic goal inference (inverse reinforcement learning; e.g., Baker et al., 2009).
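The second step can be illustrated with a small Bayesian update over candidate hypotheses about what makes a shape a triangle. The hypothesis names, the observation labels, and every likelihood value below are hypothetical placeholders; in the proposed framework they would be estimated from coded interactions rather than set by hand. Entropy (in bits) is used here as one possible summary of how flat, and hence how uncertain, the inferred belief is.

```python
import math
from typing import Dict

Belief = Dict[str, float]

HYPOTHESES = ("three_sides", "point_at_top", "all_acute")

# Hypothetical likelihoods P(observation | hypothesis). "three_sides" is the correct rule;
# the other two are the misconceptions discussed in the text.
LIKELIHOOD: Dict[str, Belief] = {
    "points_at_upright_triangle": {"three_sides": 0.40, "point_at_top": 0.45, "all_acute": 0.40},
    "asks_about_point_at_top":    {"three_sides": 0.20, "point_at_top": 0.40, "all_acute": 0.20},
    "doubtful_look":              {"three_sides": 0.30, "point_at_top": 0.40, "all_acute": 0.40},
}


def update(belief: Belief, observation: str) -> Belief:
    """Bayes' rule: multiply by the likelihood of the observation and renormalize."""
    unnormalized = {h: p * LIKELIHOOD[observation][h] for h, p in belief.items()}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


def entropy_bits(belief: Belief) -> float:
    """Shannon entropy in bits; log2(3), about 1.585, is the maximum for three hypotheses."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)


if __name__ == "__main__":
    belief = {h: 1 / len(HYPOTHESES) for h in HYPOTHESES}  # uniform prior
    for obs in ("points_at_upright_triangle", "asks_about_point_at_top", "doubtful_look"):
        belief = update(belief, obs)
    print({h: round(p, 3) for h, p in belief.items()})
    print(round(entropy_bits(belief), 2))
```

With these made-up numbers, the three observations leave "point_at_top" as the most probable hypothesis (0.5625) while the distribution stays fairly spread out, mirroring the example in the text.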

FIGURE 1 | We propose a framework that integrates computational modeling and data science to address challenges posed by the interactive and dynamic nature of guided play. By modeling children's moment-to-moment mental state from their task-relevant behavior, the proposed framework identifies guidance that is optimized in terms of timing and form, with the objective of sustaining children's interest toward the learning goal. The italic text provides an example of learning geometric shapes (Fisher et al., 2013) to show how the framework could be applied to a specific guided play interaction. This framework can facilitate research on guided play by identifying key aspects of guidance within the dynamic and complex interactions children experience in their everyday environment.

Third, a model of guidance can identify the most effective intervention given children's current belief. For example, in the aforementioned scenario, to shift children's belief away from the wrong hypothesis and toward the correct one, the best example to show may be a real triangle with the point at the bottom. Existing models of teaching, such as the model presented in Rafferty et al. (2016), have used partially observable Markov decision processes to optimize teaching actions given the learner's observed behaviors as well as previous teaching actions. Similar approaches could be used to build models that optimize guidance based on children's current belief. Importantly, the model is not intended to immediately lead the child to the correct hypothesis as in direct instruction (e.g., "Triangles are shapes bounded by three edges and three vertices"); rather, it draws on the child's interest to guide them toward the correct hypothesis. In this way, guided play remains child-led.
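The third step can be sketched as a one-step greedy choice: for each candidate example, simulate how a hypothetical learner would update on seeing it labeled correctly, and pick the example that most increases belief in the correct rule. This is a deliberate simplification, not the partially observable Markov decision process of Rafferty et al. (2016), which plans over sequences of actions. The shape features, the `noise` parameter, and the names `child_update` and `choose_example` are assumptions for illustration.

```python
from typing import Callable, Dict

Shape = Dict[str, object]
Belief = Dict[str, float]

# Hypothetical hypotheses about what makes a shape a triangle ("three_sides" is correct).
RULES: Dict[str, Callable[[Shape], bool]] = {
    "three_sides": lambda s: s["sides"] == 3,
    "point_at_top": lambda s: s["sides"] == 3 and s["point_up"],
    "all_acute": lambda s: s["sides"] == 3 and not s["obtuse"],
}
CORRECT = "three_sides"

# Hypothetical candidate examples the adult could show next.
CANDIDATES: Dict[str, Shape] = {
    "upright acute triangle": {"sides": 3, "point_up": True, "obtuse": False},
    "inverted acute triangle": {"sides": 3, "point_up": False, "obtuse": False},
    "upright obtuse triangle": {"sides": 3, "point_up": True, "obtuse": True},
    "square": {"sides": 4, "point_up": False, "obtuse": False},
}


def child_update(belief: Belief, shape: Shape, noise: float = 0.05) -> Belief:
    """Simulated learner: after seeing `shape` with its true label, any hypothesis that
    mislabels it keeps only a `noise` fraction of its weight."""
    label = RULES[CORRECT](shape)
    weighted = {h: p * (1.0 if RULES[h](shape) == label else noise) for h, p in belief.items()}
    total = sum(weighted.values())
    return {h: p / total for h, p in weighted.items()}


def choose_example(belief: Belief) -> str:
    """Greedy one-step choice of the example that most raises belief in the correct rule."""
    return max(CANDIDATES, key=lambda name: child_update(belief, CANDIDATES[name])[CORRECT])


if __name__ == "__main__":
    print(choose_example({"three_sides": 0.1, "point_at_top": 0.8, "all_acute": 0.1}))
    # inverted acute triangle
    print(choose_example({"three_sides": 0.1, "point_at_top": 0.1, "all_acute": 0.8}))
    # upright obtuse triangle
```

Given a belief concentrated on "point at the top," the sketch selects the inverted triangle; given a belief concentrated on "all acute angles," it selects the obtuse triangle, matching the intuition that different children need different examples.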

Finally, the recommended intervention can be carried out by the experimenter in a way that is consistent with the principles of guided play (e.g., through questions like "What about this one [pointing to the inverted triangle]? Does it have point at the top? Is it a real triangle?").

Once trained, this model will significantly advance our understanding of (1) how individual children grasp concepts of geometric shapes; (2) common misconceptions along the way; and (3) optimal interventions. The resulting model-based interventions allow for guidance tailored to the learner's moment-by-moment belief states.

### PROSPECTS AND DIRECTIONS FOR A THEORY OF GUIDED PLAY

For children, learning takes place everywhere, all the time, and often involves interactions with more knowledgeable individuals. This ubiquity of learning opportunities can be exploited by providing subtle guidance that is contingent on the environment and children's current mental state (Ridge et al., 2015). Although research has highlighted the advantage of guided play, as compared to direct instruction or free play, for facilitating learning (Alfieri et al., 2011; Fisher et al., 2013; Haden et al., 2016; Sim and Xu, 2017; Yu et al., 2018), pinpointing the optimum content and timing of guidance requires an understanding of the interactive and dynamic nature of an adult–child interaction.

We suggest that integrating computational models and data science tools may help lay out an avenue toward an empirically grounded and computationally precise framework of guided play. By modeling children's moment-to-moment mental state and the responsive behavior of adults, the proposed model has the potential to identify different components of guided play from dynamic and individualized interactions, and to recommend model-based interventions that are optimized in terms of timing and form, with the objective of sustaining the child's interest toward the learning goal. The resulting theory of guided play could identify key aspects of guidance that make guided play effective in a particular context, while maintaining the complexity and ecological validity that come with the interactive and dynamic nature of the theory. The goal is to use this framework to understand how learning proceeds and when it succeeds, which will also depend on the cultural context and the individual learner. Future work could extend the framework from one-on-one interactions in early childhood to more complex learning scenarios and topics, such as those in a classroom setting. We hope such a framework will shed light on the principles of optimal environments and practices for facilitating children's learning, and present an example of using new approaches to study cognitive development.

#### AUTHOR CONTRIBUTIONS

YY, PS, and EB drafted the manuscript. All authors were involved in editing the manuscript.

### FUNDING

This study was supported by NSF Award SMA-1640816 to PS, EB, RG, KC, KH-P, and FX.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The handling Editor is currently co-organizing a Research Topic with one of the authors, KH-P, and confirms the absence of any other collaboration.

Copyright © 2018 Yu, Shafto, Bonawitz, Yang, Golinkoff, Corriveau, Hirsh-Pasek and Xu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Toward a Theory of the Evolution of Fair Play

#### Jeffrey C. Schank<sup>1</sup> \*, Gordon M. Burghardt<sup>2</sup> and Sergio M. Pellis<sup>3</sup>

<sup>1</sup> Department of Psychology, University of California, Davis, Davis, CA, United States, <sup>2</sup> Departments of Psychology and Ecology & Evolutionary Biology, University of Tennessee, Knoxville, Knoxville, TN, United States, <sup>3</sup> Department of Neuroscience, University of Lethbridge, Lethbridge, AB, Canada

Juvenile animals of many species engage in social play, but its functional significance is not well understood. This is especially true for a type of social play called fair play (Fp). Social play often involves behavioral patterns similar to adult behaviors (e.g., fighting, mating, and predatory activities), but young animals often engage in Fp behaviors such as role-reversals and self-handicapping, which raises the evolutionary problem of why Fp exists. A long-held working hypothesis, tracing back to the 19th century, is that social play provides contexts in which social skills needed for adulthood can be learned or, at least, refined. On this hypothesis, Fp may have evolved for adults to acquire skills for behaving fairly in the sense of equitable distribution of resources or treatment of others. We investigated the evolution of Fp using an evolutionary agent-based model of populations of social agents that learn adult fair behavior (Fb) by engaging in Fp as juveniles. In our model, adults produce offspring by accumulating resources over time through foraging. Adults can either behave selfishly by keeping the resources they forage or pool them, subsequently dividing the pooled resources after each round of foraging. We found that fairness as equitability was beneficial, especially when resources were large but difficult to obtain, and led to the evolution of Fp. We conclude by discussing the implications of this model for developing more rigorous theory on the evolution of social play, and future directions for theory development by modeling the evolution of play.

#### Edited by:

Patricia Shaw, Aberystwyth University, United Kingdom

#### Reviewed by:

Chen Yu, Indiana University Bloomington, United States; Ori Ossmy, New York University, United States

#### \*Correspondence:

Jeffrey C. Schank jcschank@ucdavis.edu

#### Specialty section:

This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology

Received: 13 October 2017; Accepted: 18 June 2018; Published: 24 July 2018

#### Citation:

Schank JC, Burghardt GM and Pellis SM (2018) Toward a Theory of the Evolution of Fair Play. Front. Psychol. 9:1167. doi: 10.3389/fpsyg.2018.01167

Keywords: social play, fairness, cooperation, evolutionary game theory, equitability, social development

### INTRODUCTION

Many species of animals engage in social play as juveniles and even in adulthood (Fagen, 1981; Palagi, 2011), but its functional significance is not well understood and accounting for its evolution has proven challenging (Caro, 1988; Burghardt, 2005). Social play appears not to be adaptive, especially in immature animals, because typically no immediate functions are apparent (Martin and Caro, 1985) and it is costly due to increased mortality from predation, injury, and disease (e.g., Harcourt, 1991; Kuehl et al., 2008). However, it can have immediate benefits in terms of exercise, metabolism, and perceptual-motor coordination, among other possibilities (see Burghardt, 2005 for review). A common working hypothesis, going back to the instinct-practice views of Groos (1898), is that social play must have further adaptive benefits and that these benefits come from learning, as juveniles, specific social skills that will be useful during adulthood (Pellis and Pellis, 2009; Pellis et al., 2010). While there is some limited empirical evidence supporting this working hypothesis (Pellis et al., 2014; Vanderschuren and Trezza, 2014), until recently there has been little theoretical support for it. Durand and Schank (2015), using an evolutionary agent-based modeling approach, showed that learning to cooperate by engaging in social play could stably evolve even with relatively high costs of mortality. Their model demonstrated that the synergistic benefits of learning to cooperate as adults via social play as juveniles can outweigh the costs of social play.

That the synergistic benefits of adult cooperation can outweigh the costs of social play is intuitively clear, but a more controversial idea is that young animals engaging in fair play (Fp) acquire skills needed to behave fairly as adults (Bekoff, 2001; Palagi et al., 2016). Bekoff (2001) has argued that behaviors such as self-handicapping (e.g., an individual not biting as hard as it can) and role-reversal (e.g., alternately switching between dominant and submissive positions) are Fp behaviors that have evolved to facilitate the acquisition of skills for behaving fairly as adults. For humans, Bekoff (2001) has suggested that Fp may be the basis not only for fair behavior (Fb) but also for morality. By focusing on self-handicapping and role-reversal in juvenile play and their implications for morality, Bekoff (2001) argued for an equitability interpretation of fairness (i.e., fairness as the equitable distribution of resources or treatment of others).

From an evolutionary perspective, the hypothesis that Fp is beneficial for acquiring skills for behaving equitably (and possibly for acquiring moral behavior) as adults is problematic. While previous research (Durand and Schank, 2015) demonstrated that social play could evolve by facilitating adult cooperation, it is not clear that these theoretical results extend to fairness as equitable behavior. Fairness and cooperation are closely connected, but they are not synonymous. Consider, for example, two hunters who cooperate to hunt elk instead of individually hunting for rabbits or squirrels. Taking down an adult elk provides considerably more meat than either hunter could bring home hunting individually for rabbits and squirrels; cooperative hunting thus provides a synergistic payoff greater than the individual payoffs from hunting rabbits and squirrels. For cooperation to persist, the payoff of cooperative hunting must benefit both hunters, but this does not imply that the elk has to be divided fairly in the sense of an even distribution (assuming the hunters have the same abilities and needs and contributed the same effort). Even though both hunters contributed equally to the elk hunt, if one takes 75% of the elk, leaving 25% for the other, it is still in the interest of the hunter receiving the lesser amount to participate in future cooperative hunts if 25% of an elk is more meat than a couple of rabbits or squirrels. That is, if the expected payoff from cooperating is greater than the expected payoff from not cooperating (even when the distribution of gains is not fair), unfair cooperation will be favored by evolution. This example suggests that the evolution of cooperation does not guarantee the evolution of fairness.
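The hunters' reasoning can be made explicit with a small expected-payoff comparison. All quantities below (amounts of meat, success probabilities) are hypothetical, and the function names are illustrative; the point is only that cooperation remains worthwhile for the disadvantaged hunter whenever its expected share exceeds its expected solo payoff, whether or not the split is even.

```python
def expected_payoff(success_prob: float, amount: float) -> float:
    """Expected amount gained from one hunt."""
    return success_prob * amount


def cooperation_favored(p_coop: float, total: float, share: float,
                        p_solo: float, solo_amount: float) -> bool:
    """True when a hunter's expected share of a cooperative catch beats hunting alone."""
    return expected_payoff(p_coop, share * total) > expected_payoff(p_solo, solo_amount)


def minimum_share(p_coop: float, total: float, p_solo: float, solo_amount: float) -> float:
    """Share of the cooperative catch at which cooperating and hunting alone pay the same."""
    return expected_payoff(p_solo, solo_amount) / expected_payoff(p_coop, total)


if __name__ == "__main__":
    # Hypothetical numbers: an elk yields 200 kg and is caught half the time by a pair;
    # a lone hunter catches 4 kg of small game 90% of the time.
    print(cooperation_favored(0.5, 200, 0.25, 0.9, 4))  # True: 25 kg expected vs. 3.6 kg
    print(minimum_share(0.5, 200, 0.9, 4))
```

Under these assumed numbers, any share above a few percent keeps the disadvantaged hunter cooperating, which is why selection for cooperation alone places little pressure toward a 50:50 split.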

A model of the evolution of Fp in juveniles must show that fairness is selected for in adults. Dugatkin and Bekoff (2003) developed a game-theoretical model aimed at showing that Fp in juveniles could stably evolve by promoting fairness in adults. Their model consisted of two developmental stages. As young animals, individuals either engage in Fp or they do not (NFp). As adults, they either engage in Fb or they do not (NFb). This results in four possible strategies, Fp/Fb, Fp/NFb, NFp/Fb, and NFp/NFb, and each pairwise combination of strategies (e.g., Fp/Fb vs. Fp/Fb, in which an individual plays another with the same strategy) had a distinct probability of obtaining a resource R. Fb learned during juvenile Fp is represented by the strategy Fp/Fb, and Dugatkin and Bekoff (2003) assumed that when Fp/Fb plays itself, fairness is a 50:50 split of the resource. When Fp/Fb plays an unfair strategy (i.e., Fp/NFb or NFp/NFb), the resource split favors the unfair strategy. In addition, their model assumed that Fp/Fb playing itself had the highest probability of obtaining a resource. They found that Fp/Fb is a pure evolutionarily stable strategy as long as the payoff from playing against Fp/NFb is less than the payoff for Fp/Fb when playing itself. However, this result depends on the assumption that the total payoff of Fp/Fb against itself is higher than that of any other combination. Without this synergistic assumption, fairness (as a 50:50 split of the resource) cannot evolve.
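To check the role of the synergistic assumption against one's own payoff choices, the sketch below applies the standard Maynard Smith conditions for a pure evolutionarily stable strategy to a 4 × 4 payoff matrix over the four strategies. The matrix entries are hypothetical and are not Dugatkin and Bekoff's (2003) values. They contrast a synergistic case, where Fp/Fb against itself earns the most, with a fixed-resource case, where an unfair player takes the larger share from Fp/Fb.

```python
STRATEGIES = ("Fp/Fb", "Fp/NFb", "NFp/Fb", "NFp/NFb")

def is_pure_ess(payoff: list[list[float]], i: int) -> bool:
    """Maynard Smith conditions for strategy i to be a pure ESS.

    payoff[a][b] is the payoff to a player using strategy a against strategy b.
    """
    for j in range(len(payoff)):
        if j == i:
            continue
        if payoff[i][i] > payoff[j][i]:
            continue  # mutant j does strictly worse against the resident i
        if payoff[i][i] == payoff[j][i] and payoff[i][j] > payoff[j][j]:
            continue  # tie broken: resident beats mutant j in encounters with j
        return False  # mutant j can invade
    return True

# Hypothetical payoffs (rows = focal strategy, columns = opponent), order as STRATEGIES.
synergistic = [
    [6, 2, 4, 2],   # Fp/Fb: two fair players share a larger total resource
    [5, 3, 5, 3],   # Fp/NFb
    [4, 2, 4, 2],   # NFp/Fb
    [5, 3, 5, 3],   # NFp/NFb
]
# Same total resource (10) in every encounter: 5:5 between fair players,
# 7:3 in favor of an unfair player.
no_synergy = [
    [5, 3, 5, 3],
    [7, 5, 7, 5],
    [5, 3, 5, 3],
    [7, 5, 7, 5],
]

for name, matrix in (("synergistic", synergistic), ("no synergy", no_synergy)):
    print(name, STRATEGIES[0], "is a pure ESS:", is_pure_ess(matrix, 0))
```

In the fixed-resource matrix, an Fp/NFb mutant earns 7 against Fp/Fb residents, more than the 5 residents earn against each other, so Fp/Fb is not an ESS.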

It might be expected that social play in animals would display a 50:50 win-loss ratio in social play encounters, as assumed in Dugatkin and Bekoff's (2003) model, but many factors could affect equitability. Empirical studies have shown that social play can deviate markedly from a 50:50 win-loss ratio due to factors such as age, sex, dominance, and species differences (e.g., Pellis et al., 1993; Biben, 1998; Bauer and Smuts, 2007; Cordoni and Palagi, 2011), and even in similarly matched juveniles, role reversals occur at a rate of around 30% (Himmler et al., 2016; Pellis and Pellis, 2016). Indeed, a similar below-50% rate of role reversals in juveniles, reflecting a degree of reciprocity in play, is present across species that use very different behavioral mechanisms to ensure that some reciprocal exchanges occur during social play (Pellis and Pellis, 2017). Importantly, though, while the reciprocity in such play need not be equitable, excessive deviation toward one partner persistently gaining the upper hand leads to unstable play partnerships. Typically, an individual that is too overbearing becomes ostracized by potential play partners in the group (e.g., Wilmer, 1991; Suomi, 2005).

While empirical studies provide some evidence that social play in juveniles can be fair (as described above), we still have no evidence that Fb in adults can be beneficial in itself. Because fairness does not imply a synergistic gain, it is difficult to conceive of how fairness could be selected for. To illustrate this point, consider a group of hunter-gatherers. Assume all are exactly the same in ability, needs, and the effort they put into obtaining resources. Fairness in this case implies an even distribution of resources, because no individual is entitled to any more than the others. There are two strategies these hunter-gatherers can use: selfish and fair. Individuals using a selfish strategy keep the resources they obtain each day, while individuals following a fair strategy pool their resources with others who also adopt the fair strategy and divide them evenly at the end of the day. There is no synergistic gain in pooling. Assuming individuals using both strategies are equally successful, with probability p, in obtaining a resource R on a given day, the expected payoff over time for individuals using the selfish strategy is pR. For fair individuals who pool their resources, the expected total pool is mpR, where m is the number of individuals using the fair strategy. After equitable division (mpR/m), each fair individual's expected payoff is also pR. Thus, the long-term expected payoffs for individuals adopting either the fair or the selfish strategy are exactly the same, pR. There is apparently no clear benefit to Fb in adults, and if Fp in juveniles is costly, then there appears to be no theoretical basis for the evolution of Fp as a learning or skill-refining context for adult fairness and moral behavior.
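A short Monte Carlo sketch confirms that pooling leaves the long-run expected payoff unchanged: both strategies average about pR per individual per day. The group size m, number of days, and random seed below are arbitrary assumptions; p = 0.0875 and R = 40 are the example values used later in the text.

```python
import random

def mean_daily_payoffs(m: int = 20, p: float = 0.0875, R: float = 40.0,
                       days: int = 20000, seed: int = 1) -> tuple[float, float]:
    """Return (selfish, fair) mean payoff per individual per day."""
    rng = random.Random(seed)
    selfish_total = 0.0
    fair_total = 0.0
    for _ in range(days):
        # Selfish group: each individual keeps whatever it finds (R or nothing).
        for _ in range(m):
            if rng.random() < p:
                selfish_total += R
        # Fair group: finds are pooled, then each of the m members gets pool / m.
        pool = sum(R for _ in range(m) if rng.random() < p)
        fair_total += m * (pool / m)
    return selfish_total / (m * days), fair_total / (m * days)

selfish, fair = mean_daily_payoffs()
print(f"selfish = {selfish:.2f}, fair = {fair:.2f}, pR = {0.0875 * 40:.2f}")
```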

Expected payoff is not the only way to characterize payoffs for fair and selfish strategies. There is also variance in payoffs among individuals adopting fair or selfish strategies. Individuals adopting the fair strategy pool and equitably divide their resources each day, so daily variance in payoffs within a group of fair individuals is easy to calculate: at the end of the day, it is zero. For selfish individuals, although their expected payoff in the long run is pR, on each day each individual has only a probability p of success. Some selfish individuals will succeed in obtaining R resources, but a proportion (1 – p) of them will fail to obtain any. The expected variance among individuals adopting the selfish strategy can be calculated on the assumption that, in a group of m individuals, pm of them will obtain a resource and (1 – p)m of them will fail, yielding Eq. 1.

$$\operatorname{var}(R, p) = (1 - p)\,p\,R^2 \tag{1}$$
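Eq. 1 follows directly from treating one selfish individual's daily payoff X as R with probability p and 0 otherwise (equivalently, pm of the m individuals receive R and the rest receive nothing):

$$
\begin{aligned}
\mathbb{E}[X] &= pR + (1 - p)\cdot 0 = pR,\\
\operatorname{var}(R, p) &= p\,(R - pR)^2 + (1 - p)\,(0 - pR)^2\\
&= p\,(1 - p)^2 R^2 + (1 - p)\,p^2 R^2\\
&= (1 - p)\,p\,R^2\,\big[(1 - p) + p\big]\\
&= (1 - p)\,p\,R^2 .
\end{aligned}
$$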

For example, if R = 40 units and p = 0.0875, the variance is 127.75. Thus, even though there is no difference in the expected long-term payoff of the fair and selfish strategies, there is a large difference in the daily variance of their payoffs. Could differences in payoff variance play a role in fitness differences? If so, then it may be possible to show that the apparently worst-case scenario for fairness as equitability (i.e., even distributions of resources) has fitness benefits.
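The worked example can be reproduced with a one-line implementation of Eq. 1, plus an empirical check that simulated Bernoulli payoffs (R with probability p, else 0) have the same variance. The sample size and seed are arbitrary choices for this sketch.

```python
import random
import statistics

def payoff_variance(R: float, p: float) -> float:
    """Eq. 1: variance of a daily payoff that is R with probability p and 0 otherwise."""
    return (1.0 - p) * p * R ** 2

selfish_var = payoff_variance(40.0, 0.0875)
fair_var = 0.0  # after even division, every fair individual receives the same share

# Empirical check: sample many selfish daily payoffs and compute their variance.
rng = random.Random(7)
draws = [40.0 if rng.random() < 0.0875 else 0.0 for _ in range(200_000)]
empirical = statistics.pvariance(draws)

print(f"selfish variance = {selfish_var:.2f} (simulated {empirical:.2f}), fair variance = {fair_var:.2f}")
```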

To illustrate how payoff variance may play a role in fitness, consider the dictator game. The dictator game is a simple two-person game in which one player, the dictator, decides how to divide a resource with a second player. Since the second player has no leverage or counter-strategy, the rational decision for the dictator is to keep all of the resource and give nothing. However, numerous empirical studies have found that dictators give on average 30% to the other player (Engel, 2011). Thus, while dictators surprisingly behave far more equitably than predicted, they also do not, on average, divide resources evenly. Schank et al. (2015), using an agent-based model, showed that when population structure emerges from agent aggregation, clusters or groups of agents that distribute resources more equitably produce more offspring than those that do not. According to their analysis, the advantage of more equitable distributions is due to the more efficient conversion of resources into offspring when there are constraints on the flow of resources to offspring. Interestingly, the sharing of resources need not be an even split to gain the benefit of more efficient conversion of resources into offspring.

In this paper, we developed an approach along the lines of Schank et al. (2015) to model the evolution of fair social play. Our model aimed to investigate the evolutionary plausibility of social Fp having its adaptive benefit in facilitating the learning of adult Fb. Like the model of Dugatkin and Bekoff (2003), ours has two developmental stages. In the juvenile stage, agents can engage in social Fp with a mortality cost c, which is the probability of dying when engaged in social play (e.g., being killed by a predator due to increased exposure while playing). Our model is also similar to that of Auerbach et al. (2015) in that it is based on asexual reproduction involving a single gene, but differs in that they modeled asocial play. As adults, agents forage for resources, R, at each time step with probability p of success. Agents that have learned to play fairly as juveniles pool their resources with other fair agents (if any) and then evenly divide the pooled resources at the end of each simulation step. Agents that have not learned to be fair simply keep the resources (if any) they obtain. We hypothesized that Fp would evolve, even with juvenile mortality due to social play, when there is considerable variance in foraging for resources (i.e., the likelihood of obtaining a resource is relatively low but the value of the resource is relatively high, as in foraging in hunter-gatherer societies; see Discussion). We show that Fp can evolve under these reasonable conditions and that our model can serve as a first step in the development of a rigorous theory of the evolution of social play.

#### MODEL AND SIMULATION METHODS

Our aim was to develop a generically realistic model of the evolution of social play rather than a model for a particular species. By generic, we mean a model that represents very general biological properties of animal social systems in which social play can evolve. A generic model of Fp can facilitate future extensions of the model to specific species and social play systems. Although our model is not, strictly speaking, a game-theoretical model (i.e., fair and selfish strategies do not directly affect each other's payoffs), it does share features with other game-theoretical models that use agent-based modeling (for a recent review, see Adami et al., 2016).

In our model, animals reproduce and invest some of their resources in their offspring. There are many ways organisms can reproduce, but we selected a very simple mode of asexual reproduction with a single gene for social play. Variation is constantly introduced at a low frequency by random mutation of the play gene (i.e., a mutation flips the on or off state inherited from the parent to the opposite state). We assumed agents have an average lifespan with Gaussian variation about the mean to model the myriad causes of death without modeling these causes in specific detail. Finally, we assumed that agents live in small social groups and that juvenile agents can play with other juveniles in their group.

Development is simplified to two stages, juvenile and adult, similar to the assumption made by Dugatkin and Bekoff (2003). During the juvenile stage, agents can engage in social play, at a potential cost of death, while learning to behave fairly as adults. As adults, all agents accumulate resources; those that have learned to behave fairly share their resources on each round with other fair agents. Resources are converted into offspring by reproduction. An adult agent can reproduce if it has accumulated sufficient resources and a minimum "gestational" period has elapsed since its previous reproductive event. To our knowledge, a delay between reproductive events has not previously been introduced in an evolutionary model, although such delays are a generic characteristic of all multicellular organisms. Based on Schank et al. (2015), delays between reproductive events should constrain the flow of resources into offspring, resulting in fitness benefits for fairness.

### Model Details

Adult agents are assumed to live in groups with their offspring. The number of groups, n, in a population is limited to a maximum of G groups with a total maximum population size of K. For example, if G = 50 and K = 1000, then the maximum average group size is 20 adults. When all the members of a group die, the group is extinguished. When a group reaches its fission size f and the number of groups is less than G, the parent group fissions, producing an offspring group formed from g adult members randomly selected from the parent group.
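The group rules can be sketched as follows. This is a minimal Python illustration, not the authors' Java/MASON code; the function name, the representation of a group as a list of adult ids, and the example values of f and g are hypothetical (the actual values are given in Table 1).

```python
import random

def update_groups(groups: list[list[int]], f: int, g: int, G: int,
                  rng: random.Random) -> None:
    """Apply fission and extinction to a list of groups (each a list of adult ids).

    A group that has reached fission size f splits while fewer than G groups
    exist: g adults, drawn at random without replacement, found a new group.
    """
    for parent in list(groups):              # snapshot: new groups are not re-split
        if len(groups) >= G:
            break
        if len(parent) >= f:
            founders = rng.sample(parent, g)
            for agent in founders:
                parent.remove(agent)
            groups.append(founders)
    groups[:] = [grp for grp in groups if grp]  # groups with no members go extinct
```

For example, with hypothetical values f = 25 and g = 10, a 30-member group splits into groups of 20 and 10, while a 5-member group and an empty group are respectively left alone and removed.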

Agents have two developmental stages: juvenile and adult. Juvenile agents engage in social play. Adult agents forage for resources to reproduce. The juvenile stage is j steps long and during this period, agents can engage in social play if their social play gene is on, otherwise they do nothing (below we will refer to the play gene in the on state as the play gene). Juvenile agents can learn to behave fairly as adults if they engage in at least α bouts of Fp. Only one bout of play with another juvenile can occur on a given simulation step. Each bout of Fp comes with a potential cost, c, of mortality. That is, there is a random chance with probability c that an agent dies during a bout of play. Agents that do not play suffer no mortality cost. A juvenile agent finds a play partner by randomly querying (analogous to directing a play invitation signal) other juvenile agents in its group until it finds another juvenile that will play (i.e., has the play gene) or until it has queried all juveniles in its social group and found none that will play. If a juvenile agent does not find any other juvenile agents to play with, it does not play. Thus, when the frequency of the play gene in a population is low, some agents with the play gene may not learn to play fairly but also will not suffer the cost c of engaging in Fp.
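The partner search and the per-bout mortality cost can be sketched in Python as below. The names are hypothetical, and one detail is an assumption not stated in the text: because only one bout can occur per step, juveniles already paired on the current step are treated as unavailable to later seekers.

```python
import random

def play_step(juveniles: list[int], has_play_gene: dict[int, bool], c: float,
              bouts: dict[int, int], alive: dict[int, bool],
              rng: random.Random) -> int:
    """One simulation step of juvenile play within a single group; returns bouts played."""
    busy = set()                       # at most one bout per juvenile per step
    seekers = list(juveniles)
    rng.shuffle(seekers)
    n_bouts = 0
    for seeker in seekers:
        if seeker in busy or not has_play_gene[seeker]:
            continue                   # agents without the play gene do nothing
        # Query the other (not yet paired) juveniles in random order.
        candidates = [j for j in juveniles if j != seeker and j not in busy]
        rng.shuffle(candidates)
        partner = next((j for j in candidates if has_play_gene[j]), None)
        if partner is None:
            continue                   # nobody will play: no learning, no cost
        busy.update((seeker, partner))
        n_bouts += 1
        for agent in (seeker, partner):
            bouts[agent] += 1          # counts toward the alpha bouts needed
            if rng.random() < c:
                alive[agent] = False   # mortality cost c of this bout
    return n_bouts

def learned_fairness(agent: int, bouts: dict[int, int], alpha: int) -> bool:
    """A juvenile becomes a fair adult after at least alpha bouts of Fp."""
    return bouts[agent] >= alpha
```

If only one juvenile in a group carries the play gene, it finds no partner and therefore neither learns fairness nor pays the cost c, which is the low-frequency effect described above.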

When a juvenile reaches the jth simulation step after birth, it becomes an adult and enters its social group if the total number of adult agents in the population is less than K. If there are K or more adult agents in the population, the juvenile agent dies. This method holds the number of adult agents in the population at or below K by assuming that juvenile mortality occurs at a higher rate than adult mortality, which is biologically reasonable (Caughley, 1966). This method introduces no bias into the simulation at the juvenile stage because, other than mortality due to play, whether a juvenile becomes an adult is entirely random with respect to K.
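The maturation rule amounts to a population cap applied at the juvenile-to-adult transition. In the hypothetical Python sketch below, juveniles that reach age j on the same step are admitted in random order, which is an assumption made here so that admission stays random with respect to K.

```python
import random

def mature_juveniles(ages: dict[int, int], j: int, n_adults: int, K: int,
                     rng: random.Random) -> tuple[list[int], list[int]]:
    """Juveniles reaching age j become adults while adults < K; the rest die.

    Returns (ids promoted to adulthood, ids that die at the cap).
    """
    ready = [agent for agent, age in ages.items() if age >= j]
    rng.shuffle(ready)                 # admission order is random
    promoted, died = [], []
    for agent in ready:
        if n_adults < K:
            promoted.append(agent)
            n_adults += 1
        else:
            died.append(agent)
    return promoted, died
```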

During the adult stage, reproductive output depends on resource acquisition: the more resources an agent obtains, the greater its reproductive output. Adult agents forage for resources, R, on each simulation step with success probability p, so the expected payoff for each agent is pR per step (e.g., if R = 40 units and p = 0.0875, the expected payoff is 3.5 units per step). The resource R is the mean of the resources agents can obtain on a given step; the quantity actually obtained is R plus a random Gaussian deviate with standard deviation SD<sub>R</sub> = 0.1R (10% of the mean resource). Agents that learn to be fair by engaging in social play and those that do not have the same success rate, p, of obtaining resources, R. Fair adult agents pool their resources with other fair adult agents in their social group on each simulation step and then divide the pooled resources evenly at the end of the step. Because fair agents pool and then divide their resources on each round, their expected payoff is exactly the same as that of selfish agents, pR. Thus, based on expected payoffs, there is no apparent reproductive advantage to fair agents pooling their resources.
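One foraging step for the adults of a single group might look like the following Python sketch. The function and dictionary names are hypothetical; the success probability p, the Gaussian deviate with SD = 0.1R, and the even division of the pool among fair agents follow the description above.

```python
import random

def forage_step(is_fair: dict[int, bool], stores: dict[int, float],
                p: float, R: float, rng: random.Random) -> None:
    """One foraging step for the adults of one group; updates stores in place."""
    pool, pool_members = 0.0, []
    for agent, fair in is_fair.items():
        gain = 0.0
        if rng.random() < p:
            gain = rng.gauss(R, 0.1 * R)   # R plus a Gaussian deviate, SD = 10% of R
        if fair:
            pool += gain                   # fair agents contribute their find to the pool
            pool_members.append(agent)
        else:
            stores[agent] += gain          # selfish agents keep what they find
    for agent in pool_members:             # even division at the end of the step
        stores[agent] += pool / len(pool_members)
```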

Adult agents can reproduce when their accumulated resources reach or surpass a threshold T. The timing between reproductive events is constrained by a reproductive delay d; that is, if an agent reproduces at step t, the earliest it can reproduce again is t + d. The reproductive delay, d, can be interpreted, for example, as a fixed gestational period. Reproductive delays constrain the number of offspring that can be produced in a lifetime, so unlimited resources cannot result in unlimited reproduction in this model.
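The reproduction rule combines two conditions, a resource threshold and a minimum spacing between births. A hypothetical Python predicate (names are illustrative):

```python
from typing import Optional

def can_reproduce(resources: float, step: int, last_birth: Optional[int],
                  T: float, d: int) -> bool:
    """Resources must reach threshold T, and at least d steps must separate births."""
    if resources < T:
        return False
    return last_birth is None or step >= last_birth + d
```

An agent that reproduced at step 0 with d = 15 must wait until step 15, however large its stores are; this is what caps lifetime reproduction.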

Agents have only one gene, a social play gene that is in one of two possible states: on or off. Offspring inherit the state of their play gene from their parent, but the state can be flipped to the opposite state by mutation at rate r. A parent contributes a portion P of its accumulated resources (i.e., the total amount of resources it has accumulated up to that step) to its offspring and keeps the remaining portion (1 – P). When an agent is born, it is assigned a lifespan, which is a random integer composed of the mean lifespan l plus a randomly generated integer (±) drawn from a Gaussian distribution with standard deviation SD<sub>l</sub>. When an agent reaches the end of its lifespan, it dies and is removed from the simulation together with its current juvenile offspring. The underlying assumption is that juvenile agents are dependent on the parent and do not survive a parent's death. Alternatively, it could have been assumed that dependent juvenile offspring survive the death of a parent. For this model, since adult death depends not on the resources collected but on a randomly assigned death date, the choice of assumption is not crucial. If, however, the lifespan of an agent depends on its behavior, then such assumptions do matter (see **Figure 1** for an agent's decisions and possible events during a simulation step).
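A birth event can be sketched as below: the offspring inherits the play-gene state (flipped with probability r), receives a portion P of the parent's stores, and draws a lifespan of l plus an integer Gaussian deviate. The `Agent` class and function names are hypothetical, and rounding the Gaussian draw to the nearest integer is an assumption about how the integer deviate is produced.

```python
import random
from dataclasses import dataclass

@dataclass
class Agent:
    play_gene: bool      # True = the play gene is on
    resources: float     # accumulated resources
    lifespan: int        # assigned at birth

def give_birth(parent: Agent, P: float, r: float, l: int, sd_l: float,
               rng: random.Random) -> Agent:
    gene = parent.play_gene
    if rng.random() < r:
        gene = not gene                             # mutation flips the inherited state
    endowment = P * parent.resources                # offspring receives portion P
    parent.resources -= endowment                   # parent keeps the remaining (1 - P)
    lifespan = l + round(rng.gauss(0.0, sd_l))      # mean lifespan plus integer deviate
    return Agent(gene, endowment, lifespan)
```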

#### Simulations

The parameter values used in all simulations are listed in **Table 1**, initial conditions in **Table 2**, and the parameter sweeps used in the two experimental sets of simulations in **Table 3**. Control simulations were run, which differed from the experimental simulations in that agents did not learn fairness from engaging in Fp as juveniles. In this model, the dependent variable is the frequency of the Fp gene (i.e., the play gene in the on state). Because the frequency of the play gene in a population correlated very closely, as expected, with the frequency of fair adults in the population, the frequency of the Fp gene is also a very accurate proxy for the frequency of adults that learned to behave fairly in these simulations.

Before running the main simulations reported here, we ran preliminary simulations to determine how many simulation steps were required to reach equilibrium frequencies of the play gene. Based on these simulations, we estimated that 10000 simulation steps were required to reach equilibrium frequencies. We then ran all simulations for an additional 15000 steps, giving a total of 25000 steps. This allowed us to calculate the frequency of the play gene as the number of agents born with the play gene over the total number of agents born in the interval from 10000 to 25000 steps. Based on these calculations, frequency estimates of the play gene were based on at least 21000 agents for each simulation experiment. For each set of parameter conditions, we ran 20 simulation experiments. Thus, the play gene frequencies reported for each set of parameter conditions were based on at least 420000 agents, so the frequency results reported here rest on very large numbers of observations.

TABLE 1 | Fixed parameters, values, and descriptions.

TABLE 2 | Initial conditions.

The theoretically interesting parameters in this model are parental investment, P, the average resources R obtained in a successful foraging bout, the foraging success rate p, and the juvenile mortality cost of play, c. Variance in foraging was hypothesized to create the opportunity for selection on adult fairness. We generated different levels of variance in two ways. First, we held the expected foraging payoff constant at pR = 3.5 (the expected payoff should be relatively small so that a substantial number of simulation steps is required to accumulate sufficient resources to reproduce) and systematically varied combinations of p and R (see **Table 3**, first set of simulations). Second, we held R = 40 constant and varied p to produce a range of expected payoffs, pR (see **Table 3**, second set of simulations). The mortality cost, c, is the probability that a juvenile agent dies as a result of engaging in social play. We investigated different values of c (see **Table 3**) that generated different percentages of juvenile mortality due to play. Finally, we simulated four levels of parental investment P (see **Table 3**) to assess the effect of parental investment on the evolution of the play gene.
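The variance levels produced by the two sweeps can be computed directly from Eq. 1. This Python sketch uses the sweep values stated in the text (R = 10 to 50 with pR = 3.5; R = 40 with pR = 2 to 5 in steps of 0.5).

```python
def payoff_variance(R: float, p: float) -> float:
    """Eq. 1: variance of a per-step payoff that is R with probability p, else 0."""
    return (1.0 - p) * p * R ** 2

# First set: pR held at 3.5, so each R is paired with p = 3.5 / R.
first_set = {R: payoff_variance(R, 3.5 / R) for R in (10, 20, 30, 40, 50)}
# Second set: R held at 40 and pR stepped from 2 to 5, so p = pR / 40.
second_set = {pR: payoff_variance(40, pR / 40) for pR in (2, 2.5, 3, 3.5, 4, 4.5, 5)}

for R, var in first_set.items():
    print(f"first set:  R = {R:2d}, p = {3.5 / R:.4f}, var = {var:7.2f}")
for pR, var in second_set.items():
    print(f"second set: pR = {pR}, p = {pR / 40:.4f}, var = {var:7.2f}")
```

Because pR is fixed in the first set, Eq. 1 reduces there to 3.5R(1 - p), which grows with R, from 22.75 at R = 10 to 162.75 at R = 50. In the second set, the variances span 76 to 175, a narrower spread.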

More precisely, for the first set of simulations, the expected payoff pR was held constant at 3.5, and we investigated payoffs R = 10 to 50 in increments of 10 (see **Table 3**). For each combination of R and p, we ran 20 simulations for each reproductive delay from 0 to 50 (in increments of 1) under each of the seven mortality conditions. This resulted in 5 × 51 × 7 = 1785 sets of simulations, for a total of 20 × 1785 = 35700 simulations. For each of the four parental investment values, we repeated these 35700 simulations, for a total of 142800 simulations. For the second set of simulations, we held R constant at 40 and varied the foraging success rate, p, of obtaining R such that the expected payoffs ranged from 2 to 5 in increments of 0.5 (see **Table 3**). For each expected payoff, we again ran 20 simulations for each reproductive delay from 0 to 50 (in increments of 1) under each of the seven juvenile mortality conditions. This resulted in 7 × 51 × 7 = 2499 sets of simulations, for a total of 20 × 2499 = 49980 simulations. For each of the four parental investment values, we repeated these 49980 simulations, for a total of 199920 simulations. Thus, we ran a total of 342720 simulations, each lasting up to 25000 steps with populations of 1000 agents.
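The simulation counts follow from enumerating the parameter grid. In this Python sketch the seven mortality conditions are represented only by their indices, since their c values are given in Table 3 rather than in the text.

```python
from itertools import product

DELAYS = range(0, 51)                           # reproductive delays d = 0, 1, ..., 50
N_MORTALITY = 7                                 # seven mortality conditions (values in Table 3)
PARENTAL_INVESTMENT = (0.1, 0.3, 0.5, 0.7)
REPLICATES = 20

R_VALUES = (10, 20, 30, 40, 50)                 # first set: pR fixed at 3.5
EXPECTED_PAYOFFS = (2, 2.5, 3, 3.5, 4, 4.5, 5)  # second set: R fixed at 40

def n_runs(levels) -> int:
    """Runs = parameter conditions x replicates x parental-investment values."""
    conditions = len(list(product(levels, DELAYS, range(N_MORTALITY))))
    return conditions * REPLICATES * len(PARENTAL_INVESTMENT)

first, second = n_runs(R_VALUES), n_runs(EXPECTED_PAYOFFS)
print(first, second, first + second)  # 142800 199920 342720
```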

We also ran control simulations, which were exactly the same as the experimental simulations except that agents did not learn adult fairness from juvenile Fp. Control simulations were required because the expected payoffs for fair and selfish agents were the same, so in the absence of selection the frequency of the play gene should evolve to 50% when there is no mortality cost for social play. A positive mortality cost for juvenile agents that engage in social play without learning to behave fairly as adults does not guarantee that the frequency of the play gene will drop to zero. This is because, at low frequencies, a small group will contain few, if any, juvenile agents with the play gene. Thus, at low frequencies, the play gene will suffer little if any mortality cost due to social play, and mutation will continue to reintroduce the play gene at a low rate (see **Table 1** for the mutation rate).

Mortality cost, c, is the probability of dying when engaged in social play. Values of c were selected to generate mortality rates ranging from 0% to just over 10% in juveniles engaged in social play. Because how often juvenile agents actually play depends on complex factors (e.g., the frequency of the play gene within each group), mortality percentages can only be calculated by recording how many juvenile agents die during a simulation. The values of c used (see **Table 3**) generated mortality percentages that varied somewhat among simulation sets. For example, a probability of dying during a play bout of c = 0.003 typically resulted in a 10% mortality rate; in some sets of simulations, the recorded mortality may have been 10.2% and in others 9.9%.

For simulations with parental investment of P = 0.1, populations often went extinct for large reproductive delays (d > 45). Agents, on average, live for an additional 100 time steps after they become adults. Reproductive delays greater than 45 steps imply that adults can reproduce at most twice. When parental investment is very low (P = 0.1), new adult agents may require more than 25 steps to accumulate sufficient resources, which, combined with positive juvenile mortality costs, reduces the average individual reproductive rate below sustainable levels. Thus, in the results reported below, the evolved frequencies of the play gene for parental investment of P = 0.1 were averaged over values of d from 0 to 45.

The agent-based model was written in Java using the agent-based modeling library provided in MASON (Luke et al., 2005). All simulations were run on computers using Scientific Linux<sup>1</sup>.

#### RESULTS

We found that the play gene evolved to frequencies greater than in control simulations across a wide range of conditions as the variance in payoffs increased. **Figure 2A** illustrates the evolved frequencies of the play gene for different values of parental investment, including the corresponding control simulations. Each point is averaged over reproductive delays and different values of p and R. For all values of parental investment, the evolved frequency of the play gene was above the control simulation values. For these simulations, parental investment had a relatively small effect on the overall evolution of Fp. **Figure 2B** illustrates results for a representative parental investment of P = 0.5. Simulation results were again averaged over reproductive delays, but the values of p and R were varied to produce different degrees of variance (see **Figure 2C**) while holding pR = 3.5 constant. As variance (calculated using Eq. 1) increased with greater values of R (**Figure 2C**), the stably evolved frequency of the play gene increased until R = 50, where it was slightly lower than for R = 30 and 40. The lowest variance occurred for R = 10, as expected, and the frequency of the play gene barely evolved above chance (**Figure 2B**).

FIGURE 2 | Mean frequencies of the play gene plotted against mortality costs for four different values of parental investment, P = 0.1, 0.3, 0.5, and 0.7 (A). Expected payoffs were held constant at pR = 3.5 by pairing each value of R with a value of p such that their product was 3.5. Mean play-gene frequencies for parental investment of P = 0.5 only, plotted separately for R = 10, 20, 30, 40, and 50 (B). Although expected payoffs were held constant, varying R and p generated different degrees of expected resource variance on each round (C). Resource variance was calculated using Eq. 1 for values of p and R and then plotted against the probability p of obtaining a resource payoff on a given round. The combination of a lower probability of payoff and a higher resource quantity (compare colors in A and B) generated considerably different levels of variance. For R = 10, p = 0.35 (green), variance was very low and the play gene evolved to frequencies barely above chance, and only for the lowest social play mortality costs. In contrast, for R = 40, p = 0.0875 (black), variance was high and the play gene stably evolved well above chance levels even for the highest rate of social play mortality. However, the frequency of the social play gene did not increase monotonically with increasing variance: the values R = 50, p = 0.07 (red) produced the highest variance but did not yield the highest frequency of play genes.

FIGURE 3 | … reproductive delays. Panels (B–F) correspond to different resource variances. When there was no mortality cost for social play, the frequency of the play gene typically evolved to about 80%, except in the lowest variance condition [var(10, 0.35) = 22.75; B], in which the play gene evolved to about 80% only at the reproductive delay d = 15. As mortality due to social play increased, the evolved frequency of the play gene rapidly decreased except close to the reproductive delay d = 15 (B–F). Play gene evolution was most favored in the second-highest variance condition (R = 40, E). In panel (E), near the d = 15 delay, the play gene was maintained at over 70% even with 10% mortality.

<sup>1</sup>www.scientificlinux.org

**Figure 3** illustrates the same simulations as in **Figure 2A**, but not averaged over reproductive delays. **Figure 3A** shows the control simulations, which, as expected, did not vary as a function of reproductive delay. **Figures 3B–F** plot the evolved frequencies of the play gene as a function of reproductive delay and thus also illustrate the intensity of selection for the Fp gene as a function of reproductive delay. Reproductive delays interact with expected payoffs so that the intensity of selection for the play gene is very high within a narrow range of d. For these sets of simulations, peak selection intensity occurred at a reproductive delay of 15. At peak selection intensity, high frequencies of the play gene could be maintained in the face of juvenile mortality ranging from 8 to 10% (e.g., **Figure 3E** with R = 40 and p = 0.0875). In contrast with the averaged results in **Figure 2**, even in the lowest variance condition, var(10, 0.35) = 22.75, the play gene evolved to 80% at d = 15 when social play mortality was 1.9% (**Figure 3B**). In **Figure 3E**, for var(40, 0.0875) = 127.75, the play gene evolved to 76% at d = 15 even with a 10% juvenile mortality rate (see **Figure 4** for the evolution over time steps of the simulations illustrated in **Figure 3E** for d = 15).

FIGURE 4 | … agent (A), but even for 10% mortality, the mean percentage of agents that evolved the play gene was over 70%.

In **Figure 5**, we held the mean payoff constant at R = 40 but varied the foraging success rate p, so that the expected payoff pR varied from 2 to 5. In **Figure 5A**, parental investment P was varied, and each point is averaged over reproductive delays and different expected payoffs. In these simulations, the play gene evolved to higher frequencies than in the control simulations for all values of parental investment. Under low parental investment (P = 0.1), the play gene evolved to the highest frequencies; parental investment of P = 0.5 was approximately in the middle, and the play gene evolved to the lowest levels when P = 0.7. **Figure 5B** illustrates the results for parental investment of P = 0.5, with the results for each value of the expected payoff pR plotted individually. In all of these simulations, the play gene evolved well above control simulation frequencies for all levels of juvenile mortality. Results were very similar across expected payoffs, even though expected payoffs spanned a considerable range (2 to 5; **Figure 5B**), because the range of resource variances was smaller (**Figure 5C**) than in the first set of simulations (**Figure 2C**).

FIGURE 5 | … rates, p, yielding expected payoffs ranging from 2 to 5. Mean play-gene frequencies for parental investment of 0.5 with expected payoffs, pR, ranging from 2 to 5 are plotted in (B). The expected variances for each expected payoff in (A) and (B) are plotted in (C). Although panel (A) illustrates that social play can robustly evolve even under the highest social play mortality conditions, averaging equilibrium values over individual sets of simulations provides only a crude depiction of the complexity of the multilevel evolutionary process (see Figure 6).

As with **Figure 3**, examining the evolution of the play gene with respect to reproductive delays reveals a more complex structure. In **Figures 6B–H**, the play gene evolves above chance levels (**Figure 6A**) for low to moderate juvenile mortality. In **Figure 6B**, where the expected payoff pR = 2 was the lowest of all the conditions, short reproductive delays (d = 0, . . ., 5) resulted in play gene frequencies at the level of chance (cf. **Figure 6A**). However, as the reproductive delay, d, increased beyond 5, play gene frequencies began to increase. For example, in the 8% mortality condition, the play gene frequency in the population reached 70% with reproductive delays of 26 to 27 (**Figure 6B**). Interestingly, **Figures 6B–H** illustrate that the intensity of selection for Fp is a function of the reproductive delay for expected payoffs ranging from 2 through 5 in increments of 0.5. The delays resulting in peak selection intensity occur at d = 26 (**Figure 6B**), 20 (**Figure 6C**), 17 (**Figure 6D**), 14 (**Figure 6E**), 13 (**Figure 6F**), 11 (**Figure 6G**), and 10 (**Figure 6H**).

The increased selection for adult Fb that peaks around specific values of d in **Figures 3** and **6** can be explained in terms of parental investment, P. **Figure 7A** plots the cumulative expected payoffs during reproductive delays of d = 26, 20, 17, 14, 13, 11, and 10 with the corresponding expected payoffs per round of pR = 2, 2.5, 3, 3.5, 4, 4.5, and 5. In each case, dpR is close to 50 (e.g., for d = 10, p = 0.125, and R = 40, dpR = 50). With a reproductive threshold of T = 100 units of resources and parental investment of P = 0.5, a parent is expected to retain about 50 units of its resources after a reproductive event. Thus, about 50 more units of resources are required to reproduce again. A fair agent with pR = 2.5 will require, on average, about 20 rounds of foraging to accumulate about 50 units of resources, whereas a fair agent with pR = 5 will require only about 10 rounds. Selfish agents have the same cumulative expected payoff, except that their payoffs come in chunks of size R. Thus, there is a higher probability that selfish agents will not accumulate the required resources during the delay period (i.e., t + 1 to t + d). For example, a selfish agent requires at least two payoffs of R = 40 during a reproductive delay d to reach the reproductive threshold. The binomial probability of a selfish agent achieving this threshold in d rounds is less than 0.4, as illustrated in **Figure 7B**. On the other hand, fair agents have a slow but steady accumulation of payoffs that, on average, reaches the reproductive threshold in d simulation steps.
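The binomial argument can be checked numerically. The Python sketch below takes the peak delays and expected payoffs listed above, computes dpR (the fair agent's expected accumulation during the delay), and computes the probability that a selfish agent obtains at least two successes of R = 40 in d rounds. As in the text, it assumes a parent retains about 50 units after reproducing and ignores the Gaussian noise on R.

```python
from math import comb

def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

R = 40
peaks = {2.0: 26, 2.5: 20, 3.0: 17, 3.5: 14, 4.0: 13, 4.5: 11, 5.0: 10}  # pR -> peak d

for pR, d in peaks.items():
    p = pR / R
    fair_expected = d * pR                 # steady accumulation, close to 50 units
    selfish_ok = prob_at_least(2, d, p)    # needs two successes of R = 40
    print(f"pR = {pR}: d = {d}, d*pR = {fair_expected:.1f}, P(>= 2 successes) = {selfish_ok:.3f}")
```

For every peak delay, the selfish agent's chance of collecting two payoffs within the delay stays below 0.4, whereas the fair agent's expected intake over the same interval is about the 50 units it needs.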

#### DISCUSSION

We found that juvenile Fp could evolve by facilitating the acquisition of skills for equitable behavior in adulthood. This provides theoretical support for the working hypothesis that adult fairness could be beneficial and, as Groos (1898) long ago proposed, a benefit of social play comes from learning specific adult social skills as juveniles, in this case fairness. These results also provide support for the more controversial idea proposed by Bekoff (2001) and Palagi et al. (2016) that young animals engaging in Fp acquire skills needed to behave fairly as adults. Our results indicate that Fp behaviors, such as self-handicapping (e.g., an individual not biting as hard as it can) and role-reversal (e.g., alternately switching between dominant and submissive positions), could have evolved to facilitate the acquisition of skills for behaving fairly as adults.

In our model, adult agents could either keep what they foraged or pool it with other fair agents and then distribute the pooled resources evenly among themselves after each round of foraging. Selfish and fair agents had the same expected payoffs, but the variance in accumulated resources was lower for fair agents. This allowed resources to flow more efficiently into the production of offspring. We found that imposing a "gestation" period (reproductive delay) on agents greatly affected the intensity of selection for fairness, even in the face of high juvenile mortality costs (see **Figures 3**, **6**). Such constraints enhance the advantages of fairness because unfair agents cannot convert all of their resources into offspring due to "gestational" delays. In other words, assuming no constraints on the rate of reproduction is equivalent to assuming that feeding a female rat twice as much will either double her litter size or halve the gestation period of her pups. Neither is a biologically plausible assumption, yet such assumptions are implicit in all evolutionary game-theoretical models. We have demonstrated for the first time that gestation may be an important parameter in the theoretical analysis of fair and cooperative behavior.
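The variance argument can be made concrete with an exact binomial calculation. This Python sketch is a simplification, not the authors' simulation: it assumes a hypothetical pool of n = 10 fair agents that forage independently with the same success probability p and payoff R and split the pooled total evenly each round, and it ignores mortality and other model details. Both strategies earn pR per round on average, but an even split divides the per-round variance by n, which, in the d = 10 example, raises the chance of accumulating the 50 units needed within the delay.

```python
import math


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    if k <= 0:
        return 1.0
    return 1.0 - sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


def prob_reach(needed: float, d: int, p: float, R: float, pool_size: int) -> float:
    """Probability that an agent receives at least `needed` units in d rounds.

    pool_size == 1 is a selfish agent that keeps its own payoffs. pool_size == n
    is a fair agent whose pool of n agents forages independently and splits the
    total evenly each round, so its income is R * X / n, X ~ Binomial(n * d, p).
    """
    k = math.ceil(needed * pool_size / R)  # pooled successes needed
    return prob_at_least(k, pool_size * d, p)


def per_round_variance(p: float, R: float, pool_size: int) -> float:
    """Variance of one round's income; an even split divides it by pool_size."""
    return p * (1 - p) * R**2 / pool_size


# The d = 10, pR = 5 condition from the Results; the pool size n = 10 is hypothetical.
p, R, d, needed, n = 0.125, 40, 10, 50, 10
selfish = prob_reach(needed, d, p, R, pool_size=1)
fair = prob_reach(needed, d, p, R, pool_size=n)
print(f"expected income per round (both strategies): {p * R}")
print(f"per-round variance: selfish={per_round_variance(p, R, 1)}, "
      f"fair={per_round_variance(p, R, n)}")
print(f"P(>= {needed} units in {d} rounds): selfish={selfish:.3f}, fair={fair:.3f}")
```

Setting `pool_size=1` recovers the selfish calculation from the Results, so the two strategies differ only in how the same expected income is spread across rounds.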

We found that these "gestational" constraints interacted with expected payoffs and foraging success rates to generate differential selection intensities for fairness. Selection for fairness was most intense when the reproductive delay d multiplied by the expected payoff per simulation step equaled the expected resources needed to reach the reproductive threshold in d steps (i.e., d × pR ≈ 50 for parental investment P = 0.5). When selection was most intense, juvenile social play mortality rates of 10% could still support the stable evolution of Fp. However, even when selection was not at peak intensity, Fp still evolved with approximately 2% juvenile mortality, especially when the probability of obtaining resources was low but the reward was relatively high (see **Figures 3**, **6**).
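The rule of thumb in the preceding paragraph, d × pR ≈ P × T, can be turned into a rough prediction of the peak delay, d* ≈ P × T / pR. The sketch below compares that prediction with the peak delays reported for P = 0.5 in the Results and for P = 0.7 and P = 0.3 in the next paragraph, assuming T = 100 throughout. It is a back-of-the-envelope check derived from the text, not part of the simulation model, and the function name `predicted_peak_delay` is hypothetical.

```python
T = 100  # reproductive threshold (units of resources)


def predicted_peak_delay(P: float, pR: float, T: float = T) -> float:
    """Delay d at which expected income over the delay, d * pR, equals P * T."""
    return P * T / pR


# (parental investment P, expected payoff pR, reported peak delay range)
REPORTED = [
    (0.5, 2.0, (26, 26)), (0.5, 2.5, (20, 20)), (0.5, 3.0, (17, 17)),
    (0.5, 3.5, (14, 14)), (0.5, 4.0, (13, 13)), (0.5, 4.5, (11, 11)),
    (0.5, 5.0, (10, 10)),
    (0.7, 3.5, (20, 22)),  # P = 0.7: most intense selection over d = 20 to 22
    (0.3, 3.5, (9, 9)),    # P = 0.3: most intense selection at d = 9
]

gaps = []
for P, pR, (lo, hi) in REPORTED:
    d_star = predicted_peak_delay(P, pR)
    # Distance from the prediction to the nearest delay in the reported range.
    gap = max(lo - d_star, 0.0, d_star - hi)
    gaps.append(gap)
    print(f"P={P} pR={pR} predicted d*={d_star:5.2f} reported={lo}-{hi} gap={gap:.2f}")
```

For these cases the prediction lands within one simulation step of every reported peak; the largest gap is at pR = 2, where it predicts 25 against the reported 26.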

These conclusions also held for other values of parental investment. For parental investment of P = 0.7, an adult agent on average needs to accumulate at least 70 units of resources to reproduce again. For example, in simulations with an expected payoff of pR = 3.5, the most intense selection for fairness occurred for reproductive delays, d, ranging from 20 to 22, which would be expected to yield 70 to 77 units of resources. For parental investment of P = 0.3, an adult agent on average needs to accumulate only about 30 units of resources to reproduce again. We found that for an expected payoff of pR = 3.5, the most intense selection for fairness occurred at a reproductive delay of d = 9, which would be expected to yield 31.5 units of resources. This suggests that there may be a previously unrecognized theoretical relationship among the evolution of fairness and cooperation in a social system, parental investment in offspring, and the minimum delay in the production of offspring. Further research will be required to more fully elucidate these relationships and their importance.

The evolution of Fp varies greatly with reproductive delay and social play mortality rates. Fp may evolve only when play mortality is relatively low, variance in payoffs is relatively high, or reproductive delays and expected payoffs are optimal for the evolution of Fp. When expected payoffs from foraging are relatively low, the gestational periods that optimally support Fp are relatively long, with correspondingly longer juvenile periods. Could longer periods of development facilitate the acquisition of more sophisticated or refined social skills? Interestingly, the experience of social play in the juvenile period appears to improve sexual performance and reproductive success in adulthood (e.g., Nunes, 2014; Ahloy Dallaire and Mason, 2017). In part, this may arise from the effects of play experience on the skills influencing social competency noted above (Marks et al., 2017). Moreover, species with more complex social play tend to have a more protracted juvenile phase (Pellis and Iwaniuk, 2000; Diamond and Bond, 2003). If so, the reproductive delays arising from the present model may reflect the longer juvenile period needed for play to train social skills. Indeed, Groos famously claimed that the purpose of youth was to allow the young to learn or practice, through play, the skills they would need as adults. Although the reproductive delay is characterized as gestation in our model, it could also include postnatal parental care and investment.

The selection processes that emerged in these simulations were multilevel, but not group selection in the classical sense. In classical group selection, a phenotype evolves because groups with individuals that possess phenotype X outreproduce groups without X, leading to a proliferation of groups with X. In these simulations, group structure was not essential; only the social behavior of pooling resources and then dividing the pooled resources with other fair agents was essential. Similar results to those presented here can be obtained with a single large population of agents that pool resources with other fair agents in the population. Agents that pool resources outreproduce those that do not, even though all agents have the same long-term expected payoffs. Thus, selection in these simulations emerged from the social interactions of agents and occurred at both the individual (selfish agents) and social (fair agents that pool resources) levels.

As noted in the Introduction, when animals engage in social play, they may deviate substantially from equity. In this model, we assumed that all agents have the same rate of foraging success, which justified our assumption that pooled resources are distributed evenly among the fair agents participating in the pool. In real-world contexts, animals differ in foraging success and in other respects, which may raise questions about how well this model and its results generalize to more realistic contexts. We believe that the results are likely generalizable because the benefit of equitability in resource distribution is the more efficient flow of resources into offspring under constraints. In more realistic models that include individual differences in foraging success, division of resources may be based on ability or contribution, but as long as some portion of the distribution is based on equitability, resources will flow into offspring more efficiently than if there is no equitability at all. Thus, any strategies that tend toward equitability, and thereby reduce inter-individual variation in resources, should be selected for at the social level. This is what Schank et al. (2015) found in their evolutionary model of the dictator game: equitability evolved among agents even though it did not evolve all the way to even splits of resources. Future models could investigate these complexities more fully by introducing individual differences in foraging success or personality into fairness contexts. Empirically, we need a deeper and more precise understanding of how adult skills are acquired by engaging in social play as juveniles. For example, research could correlate the frequency of self-handicapping or role-reversal behaviors in juvenile play with adult behaviors such as tolerance.

If the evolution of fairness and Fp often deviates from even-split equitability, how common is Fp in species that engage in social play? In no case that we know of is play sustainable if it is completely inequitable (e.g., no role reversals or no self-handicapping). This means that there may be variation in what particular pairs of playmates agree to be equitable, but whatever that level may be, it affords ample opportunity to train social skills. Although rare, play fights can escalate to serious fighting (Fagen, 1981), and this typically occurs when one of the partners fails to follow the species-typical rules that ensure that these contests remain reciprocal (Pellis and Pellis, 1998).

The proximate mechanisms regulating Fp and the acquisition of adult social skills are beyond the scope of this paper, but empirical progress has been made. During social play, any given event may lead to loss of bodily control and some pain, but playing animals have to decide whether it arose as a one-off due to excessive exuberance by the partner or from systematic rule breaking. Such a decision requires that play partners monitor the actions of the partner (attention), keep track of wins and losses in successive play bouts (short-term memory), not overreact to minor transgressions (emotional regulation), and, when confronted with a major transgression, take appropriate action: forgive the partner, terminate the play bout, or escalate to serious aggression (decision making). In this way, playing with peers engages the executive functions of the frontal areas of the brain, and there is now growing evidence that such play in the juvenile period facilitates neural development and the refinement of these skills, resulting in more socially skilled adults (e.g., Bell et al., 2010; Baarendse et al., 2013; Burleson et al., 2016; Schneider et al., 2016a,b). Moreover, it is not simply the performance of combat-like actions during play that is critical, but the modulatory adjustments needed to ensure that play fights remain reciprocal (Schneider et al., 2016a; Pellis et al., 2017). Future models could focus on more realistic learning contexts in which agents engage in play fights (see Bell et al., 2015) and so identify how they may learn the rules that maintain reciprocal play.

The evolution of play and social play is likely more complicated than just whether it facilitates the acquisition of adult social skills such as cooperation (Durand and Schank, 2015) or fairness. Another recent agent-based model of the origins of play (Auerbach et al., 2015) is worth comparing to the current one, as it has important similarities and differences as well as differing results. In that model, asexually reproducing agents could forage, rest, and play, and could reproduce once a certain level of energy was acquired. It differed from the current model in being a two-locus model, with one locus being an on-off play trait and the other a quantitative trait for how often the agent plays. A mutation turns play on or off. The results of various simulations showed that under conditions of ample resources in the environment, play with no fitness benefits can evolve and be maintained indefinitely, whereas play with benefits becomes both more common and more variable and thus prone to extinction. Unlike in the present model, population size was not held constant in the Auerbach et al. (2015) model, but was limited by available resources. Play, being energetically costly, led to greater exploitation of resources in the environment and, in a resource-limited environment, counterintuitively resulted in lower survival of nonplaying agents. Future models could investigate the potential benefits of acquiring cooperative or fairness skills in limited-resource environments where the quantity of play affects resource demand. Such models would allow us to test predictions about the environmental circumstances that do or do not favor the evolution of Fp.

In the present model, we did not consider scenarios in which agents cheat as adults. Learning how to deal with cheats could also be an important function of social play in the context of fairness. Through social play, individuals can learn who plays fairly and who does not, and avoid those that do not. Individuals could also learn how to punish cheats and thereby reduce the fitness of cheats at some cost to themselves. Future models could investigate the social-play acquisition of strategies for dealing with cheats, such as punishment in public goods games (Fehr and Gächter, 2000).

Our model shows that juvenile Fp could evolve in contexts in which large packages of resources are relatively rare. In human hunter-gatherer societies, there has long been considerable evidence that relatively rare, large resource packages (e.g., meat from a large mammal) are associated with food sharing (Kaplan et al., 2000). Indeed, chimpanzees, who acquire meat much less often than human hunter-gatherers, are more likely to share meat (which usually comes in relatively larger quantities) than any other food resource they acquire (Kaplan et al., 2000). These results strongly suggest that our investigation of low-frequency, large-package resources is consistent with human evolution.

Our model may also have broad application for understanding social behavior beyond what are traditionally considered social species. Elbroch et al. (2017) reported that a solitary carnivore, the puma, often shares kills with other pumas. Puma kills are relatively rare but often involve prey several times the mass of a puma (Elbroch et al., 2017). This fits the scenarios we modeled, in which large but rare resources generate considerable variance among individuals, here even among solitary pumas. According to our model, such conditions are ideal for pooling resources. Puma litters typically contain 2 to 3 cubs (Logan and Sweanor, 2001), which would allow ample opportunity for littermates to learn to behave fairly via rough-and-tumble play. Although only a working hypothesis at this point, our model suggests that tolerance of other pumas at a kill site could be related to the degree of social play within a litter and selected for because sharing large kills is most favored when resource variance is high.

Another example that could test the limits of our model involves Komodo dragons. Occasionally, they can bring down deer or even water buffalo by themselves. Such kills attract other Komodos, which sometimes peaceably join in consuming the kill; at other times hierarchical disputes arise, but sharing of kills benefits the local population (Auffenberg, 1981). Not much is known about juvenile social play in Komodos, but in captivity they engage in playful interactions with keepers and in extensive object play, becoming more solitary as they age (Burghardt et al., 2002). This suggests that it may be worthwhile to investigate empirically to what extent, if any, juvenile Komodos engage in social play and, if so, whether aspects of their social play correlate with increased tolerance of others at their kill sites.

We are now beginning to develop a more precise and quantitative theoretical understanding of the evolution of social play and more generally, play. Play can evolve to facilitate adult cooperation even when social play among juveniles is costly (Durand and Schank, 2015). Here we have shown that Fp can evolve to facilitate fairness in adults and this may provide further theoretical insights for empirical studies investigating Fp in different species.

However, the model of Auerbach et al. (2015) suggests that understanding the evolution of play is not as simple as weighing adult benefits against social play costs. Play may evolve without any functional benefit under conditions of abundant resources, although, once present, it may readily be co-opted for novel functional benefits (Pellis et al., 2015).

Future models could investigate richer and more detailed social play contexts in which juveniles not only learn to behave fairly and cooperate as adults, but also learn through their play interactions strategies for dealing with cheats and unfair individuals. Models could also investigate individual differences such as foraging abilities in adults or age difference interactions in juvenile play. Models such as Auerbach et al. (2015) can be extended to further investigate the conditions under which play can evolve. Differences in personalities or behavioral syndromes both in juvenile play and adult behavior also may be important to include in future models (e.g., see Sih et al., 2004). In our view, developing a theory of the evolution of play involves developing a family of related and increasingly testable models. Our model takes us another step toward this long-term goal.

### AUTHOR CONTRIBUTIONS

JS developed the model, helped conceptualize the project, and wrote the first draft. GB and SP helped conceptualize the project and helped write and edit the manuscript.

### FUNDING

In part, this contribution was supported by a Natural Sciences and Engineering Research Council (Canada) grant to SP (grant #40058).

### ACKNOWLEDGMENTS

This work was inspired by participants in the working group on Play, Evolution, and Sociality sponsored by National Institute for Mathematical and Biological Synthesis, an Institute sponsored by the National Science Foundation through NSF Award #DBI-1300426 with support from The University of Tennessee, Knoxville.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Schank, Burghardt and Pellis. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Accessing the Inaccessible: Redefining Play as a Spectrum

Jennifer M. Zosh<sup>1</sup>\*, Kathy Hirsh-Pasek<sup>2,3</sup>, Emily J. Hopkins<sup>2</sup>, Hanne Jensen<sup>4</sup>, Claire Liu<sup>5</sup>, Dave Neale<sup>6</sup>, S. Lynneth Solis<sup>5</sup> and David Whitebread<sup>7</sup>

<sup>1</sup> The Pennsylvania State University Brandywine, Media, PA, United States, <sup>2</sup> Department of Psychology, Temple University, Philadelphia, PA, United States, <sup>3</sup> The Brookings Institution, Washington, DC, United States, <sup>4</sup> The LEGO Foundation, Billund, Denmark, <sup>5</sup> Graduate School of Education, Harvard University, Cambridge, MA, United States, <sup>6</sup> School of Education, University of Delaware, Newark, DE, United States, <sup>7</sup> Homerton College, University of Cambridge, Cambridge, United Kingdom

Defining play has plagued researchers and philosophers for years. From describing play as an inaccessible concept due to its complexity, to providing checklists of features, the field has struggled with how to conceptualize and operationalize "play." This theoretical piece reviews the literature about both play and learning and suggests that by viewing play as a spectrum that ranges from free play (no guidance or support) to guided play and games (including purposeful adult support while maintaining playful elements), we better capture the true essence of play and explain its relationship to learning. Insights from the Science of Learning allow us to better understand why play supports learning across social and academic domains. By changing the lens through which we conceptualize play, we account for previous findings in a cohesive way while also proposing new avenues of exploration for the field to study the role of learning through play across age and context.

#### Edited by:

Ann Dowker, University of Oxford, United Kingdom

#### Reviewed by:

Ora Oudgenoeg-Paz, Utrecht University, Netherlands

Rachel M. Flynn, Northwestern University, United States

> \*Correspondence: Jennifer M. Zosh jzosh@psu.edu

#### Specialty section:

This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology

Received: 05 April 2018 Accepted: 12 June 2018 Published: 02 August 2018

#### Citation:

Zosh JM, Hirsh-Pasek K, Hopkins EJ, Jensen H, Liu C, Neale D, Solis SL and Whitebread D (2018) Accessing the Inaccessible: Redefining Play as a Spectrum. Front. Psychol. 9:1124. doi: 10.3389/fpsyg.2018.01124

Keywords: play, playful learning, cognitive development, children, games, pedagogy

The most irritating feature of play is not the perceptual incoherence, as such, but rather, that play taunts us with its inaccessibility. We feel that something is behind it all, but we do not know, or have forgotten how to see it. [scholar Robert Fagen (1981) as cited in Sutton-Smith, 1997].

Play is a roomy subject, broad in human experience, rich and various over time and place, and accommodating pursuits as diverse as peekaboo and party banter, sandlot baseball and contract bridge, scuba diving and Scrabble. Play welcomes opposites, too. Play can be free—ungoverned by anything more complicated than choosing which stick is best to improvise a light saber—or fixed and codified, as in those instances when soccer players submit to scrupulous "laws." Play can take active or passive form and can be vicarious or engaging—and so we recognize play in both the spectator and the actor (Eberle, 2014, p. 214).

Play is often defined as activity done for its own sake, characterized by means rather than ends (the process is more important than any end point or goal), flexibility (objects are put in new combinations or roles are acted out in new ways), and positive affect (children often smile, laugh, and say they enjoy it). These criteria contrast play with exploration (focused investigation as a child gets more familiar with a new toy or environment, that may then lead into play), work (which has a definite goal), and games (more organized activities in which there is some goal, typically winning the game) (Smith and Pellegrini, 2013).

For something so easy to observe, and that occurs in monkeys, humans and even octopuses (Burghardt, 2011), play has been notoriously difficult to define. An infant playing with vocalizations while engaged in primary interactions with his parent is engaged in play, as is a 7-year-old deeply focused on a game of checkers. Children can play by themselves or with a group of 15 others.

The widely variable, inaccessible nature of play has not been lost on those writing on the topic. As the quotes above exemplify, play is complex, and a number of researchers have attempted an all-encompassing operational definition of the construct. For instance, Vygotsky (1967) stressed that, through sociodramatic play in particular, both children's cognitive development and higher mental functions (e.g., inhibition) are strengthened as they navigate through the play situation and operate within the zone of proximal development (see Bodrova and Leong, 2015 for a review). Piaget (1962), however, focused on play for its own sake and conceptualized play as the way that children assimilated the external world to match their own concepts rather than to learn something new. Stuart Brown (2010) argues that play is evolutionary and has the following properties: apparently purposeless/done for its own sake, voluntary, inherent attraction, freedom from time, diminished consciousness of self, improvisational potential, and continuation desire. Gray (2013) also provides a list of features to describe play, with some overlap. His conceptualization maintains that play (1) is directed and chosen by the child, (2) is an activity in which the focus is not the end-state or a goal, but the means themselves, (3) consists of structure that comes from the minds of the players and not from external constraints, (4) is imaginative and separate from real life, and (5) involves mental, non-stressed activity.

Garvey (1990) adds a widely cited list of characteristics, suggesting that play is pleasurable, has no extrinsic goals, is spontaneous and voluntary, involves engagement on the part of the player, and is related to other cognitive and social functions that exist outside of play. Similarly, Smith and Pellegrini (2013) suggest that play is activity that is done with no extrinsic goal, with the focus being the play itself, is flexible, and involves positive affect. They separate play from exploration and games. Weisberg et al. (2013) used previous research (Garvey, 1990; Sutton-Smith, 1997; Johnson et al., 1999; Hirsh-Pasek and Golinkoff, 2003; Hirsh-Pasek et al., 2009; Pellegrini, 2009; Burghardt, 2011; Fisher et al., 2011) to suggest four criteria that emerge from the flood of definitions to demarcate whether or not an activity is play. Play (1) has no specific purpose and is not linked to survival, (2) can often be exaggerated (e.g., playful experiences are not necessarily related to how things work in regular, everyday life), (3) requires both joyful and voluntary participation, and (4) is child-led, not adult-directed.

Though it is easy to find the overlap, it should not mask key points of disagreement, especially when it comes to the purpose of play and the role of structure or scaffolding from others. For example, the idea that play requires that there be no goal suggests that children playing a pretend play scenario that was crafted to build vocabulary are not playing, even if the children are leading their exploration and having fun pretending to be at a grocery store. It also suggests that children playing a board game are not really playing. Further, child-directed play would lose its cachet as play if adults suggested that the children become explorers who hunted down magic bugs in the backyard. These conceptual differences lead to definitional confusion within the fields of education and developmental psychology. The criterial features of play are hard to pin down.

In this piece, we suggest a reason why discussions of play are often vague and conclude that broad definitions of the construct are too encompassing and, as such, void of the nuance that this field demands. Rather than holding to one set of criteria, it might be better to conceptualize play as unfolding along a spectrum, or continuum, that ranges from free play, captured in definitions provided by Garvey (1990), Pellegrini (2009), Stuart Brown (2010), and Weisberg et al. (2013), to forms of play that are nonetheless child-directed but have inherent goals, such as guided play and games (Hassinger-Das et al., 2017). Defining play as a continuum might also allow us to better specify not only the types of play but also the outcomes that emerge from each genre. For example, free play, with no extrinsic goal, might prove optimal for social development, whereas guided play, in which adults take supportive (rather than leading) roles in service of a learning goal, has repeatedly been demonstrated to be effective for more academic types of learning. Here we attempt to chart a definition of play that allows us to capture the Play Spectrum and thus to make more refined hypotheses about how play relates to varied aspects of development – from traditional academic outcomes to the newer conceptualizations of skills needed for 21st century success [e.g., Golinkoff and Hirsh-Pasek's (2016) 6C's: collaboration, communication, content, critical thinking, creative innovation, and confidence].

Thinking about play as a spectrum enables us to retain a play essence where children experience joy and have agency in their play contexts while also recognizing that play may take many different forms and serve many different functions. We acknowledge that while there is little disagreement in the function of play as an avenue for social interaction and enjoyment, there is disagreement about the functions of play for learning. Here, we first review the range of experiences that would fall along our proposed play continuum, taking a more bird's eye view of the literature while suggesting that this more nuanced view provides cohesion amongst seemingly contradictory views of play. Then, we will use evidence generated from the Science of Learning, a multi-disciplinary approach that seeks to characterize how learning occurs and is supported through lessons learned across education, machine learning, linguistics, cognitive science, neurobiology, psychology, and other fields (Bransford et al., 1999) to spotlight guided play as a context that clearly demonstrates learning through play, but would not technically fit the global definition of play by Pellegrini and others. Finally, we end with the suggestion that a more inclusive and nuanced understanding of play allows the field to better understand findings to date about play and learning and generate new hypotheses moving forward.

### A MORE NUANCED DEFINITION OF PLAY

Free play, in which adults do not guide or scaffold, and in which there is no goal, is often hailed as the gold standard of play and is the focus of most of the traditional definitions we mentioned above (Gray, 2013). During free play, the child initiates and directs play. This happens when children sit in front of a mountain of building blocks that are not designed to build a particular outcome, or when children construct a fort in the living room. There is no pre-determined learning goal. There is a large body of research on how this type of play may benefit children and lead to positive developmental outcomes. But this laser focus on one type of play has prevented scholars and researchers from examining a wider range of experiences that are adult-scaffolded but remain playful in essence.

We argue that one can begin to add more specificity and nuance to the definition of play by imagining free play as one end of a spectrum (see **Figure 1**). In effect, we attempt to answer the call of Pyle et al. (2017) for "a need to move away from a binary stance regarding play and toward an integration of perspectives and practices, with different types of play perceived as complementary rather than incompatible" (p. 311). In free play, the child initiates the play context and also directs the play within that context. In contrast, if the adult chooses or arranges a context for learning, but the child directs the play within that context, we have guided play. Guided play can take the form of an adult playing with a child and offering scaffolding and guidance or an adult setting up a space or activity in such a way as to provide support as a child plays on their own (e.g., games). Children's museums are an excellent example of the latter (see Sobel and Jipson, 2016 for a review). Guided play differs from free play in two ways: an adult helps to structure the activity, and the activity is centered around a learning goal. Critically, however, the child must still retain agency to direct the activity.

If a child initiates a context for play and then an adult intervenes to direct the play within that context, we enter co-opted play, not guided play. The child might have been interested in building a circus out of blocks, yet the well-intentioned parent swept in to declare that the animals were at the zoo, redirecting the child's vision and robbing her of some agency in the play experience. When adults both initiate and direct an activity using playful elements, the scene more closely resembles direct instruction, even if it is dressed up in playful "clothing." Habgood and Ainsworth (2011) cite Bruckman's (1999) term "chocolate-covered broccoli" for this. Here, a well-meaning adult decides that today her child is learning about shapes and keeps the child on task by arranging the different shapes, counting sides, and encouraging the child to place the blocks in appropriately shaped holes, missing the opportunity to go on a "shape hunt" around the house.

The idea that discovery-based, active learning might prove a powerful pedagogical approach has been discussed for some time (e.g., Hirsh-Pasek et al., 2009; Bonawitz et al., 2011). Alfieri et al. (2011) conducted a meta-analysis of 164 studies and found that assisted discovery methods (those similar in nature to guided play, in which adults support but children lead) resulted in the best learning outcomes, in domains as varied as math, computer skills, science, physical/motor skills, and verbal and social skills, when compared to either free play or direct instruction. Research over the last few decades (see Hirsh-Pasek et al., 2015 for a review) has repeatedly shown that learning is optimized when adults scaffold the environment or provide feedback toward a learning goal while the environment still encourages fun, child-led exploration and discovery.

The expansion of our definition of play to include guided play widens the range of contexts and topic areas where play might have a beneficial impact on learning. Past research has found that free play was less effective in academic settings than direct instruction (Pianta et al., 2009; Fuller et al., 2017), but that does not mean that playful learning has no place in education. Rather, guided play, with its adult support and focus on a particular learning goal, may offer an optimal pedagogical approach in academic contexts. In domains ranging from STEM [spatial thinking (Fisher et al., 2013)] to literacy (Han et al., 2010; Nicolopoulou et al., 2015; Hassinger-Das et al., 2016; Cavanaugh et al., 2017; Toub et al., 2018), children perform better in guided play than in free play and equal to or better than in direct instruction (though see Jirout and Klahr, 2012). Even studies of causal reasoning in infancy echo this idea. Sim and Xu (2015) found that 19-month-old toddlers were more likely to figure out what caused a machine to activate, and to generalize that causal information, in a guided-play condition than in a free-play condition.

### LESSONS FROM THE SCIENCE OF LEARNING

#### Why Guided Play Primes Learning

In 2015, several of the authors of this piece suggested that the science of learning – a newly minted amalgamation of research in psychology, education, neuroscience, machine learning, linguistics, and other fields – has reached some consensus on the features that comprise optimal learning environments (Hirsh-Pasek et al., 2015). Though first presented in the context of educational apps, these features are general across contexts rather than task specific. That paper suggested that children learn best when learning is active (minds-on), engaged (not distracted), meaningful (connected to prior knowledge and transferred to the outside world), and socially interactive.

Two additional characteristics of learning in playful contexts may also help explain why this pedagogical approach increases educational value: the joy and iteration that are inherent in play. Joy, or positive affect, has been linked to increased executive functions and academic outcomes (see Diamond, 2014 for a review) and even brain flexibility (Betzel et al., 2017). Iteration, or the mindful construction of new knowledge based on hypothesis testing and revising one's own knowledge over time, is a hallmark of learning and play (Piaget, 1962). Each of these characteristics is supported by the learning literature and is inherent in playful learning contexts.

It is important to note that these characteristics align with learning across the play spectrum – from supporting the development of executive functions through free play (Elias and Berk, 2002) to the development of mathematics when playing games and engaging in guided play and/or exploration (see Ginsburg, 2009 for a review; e.g., Siegler and Ramani, 2008; DeCaro and Rittle-Johnson, 2012; Zosh et al., 2016). However, different types of play will embody the characteristics to different extents, which will then lead to different benefits for learning and other outcomes. For example, free play with friends may be high on joy and social interaction, which could lead to the development of socio-emotional skills. In contrast, guided discovery learning at a science museum may be high on iteration and meaning-making, which could support STEM learning. We argue that guided play harnesses active, minds-on thinking, engagement, meaning-making, joy, and iteration more fully than other types of play, which helps it maximize learning, particularly for academic skills.

#### Active "Minds-On"

The study of early cognitive development centers on the idea that children play active roles in the construction of knowledge (e.g., Piaget, 1962). The crucial activity here, though, is mental activity: the active manipulation and processing of information rather than mere observation or rote responding. Active learning, in which people are focused and engaged and are making decisions about the flow of incoming information, always outpaces passive learning, in which the information presented is merely meant to be absorbed. There is a rich and growing literature in this area.

While teaching new information directly may seem efficient, it may discourage further discovery, deeper processing, and ultimately learning, leading some researchers to dub this phenomenon the "double-edged sword" of pedagogy. Bonawitz et al. (2011) offer an excellent example of the power of active mental manipulation for learning. In their study, children were given the opportunity to play with and learn from a novel toy that had a number of non-obvious functions. In one condition, a knowledgeable adult demonstrated a subset of those functions; children passively watched and were then allowed to play with the toy. In another condition, a non-knowledgeable adult "accidentally" demonstrated a hidden function, inspiring an active mindset, and children were again allowed to explore as they saw fit. Children in the first condition were less likely than children in the second condition to explore the toy and discover its additional features. Inspiring minds-on thinking led to discovery and learning. Yang and Shafto (2017) provide computational evidence that this type of discovery-based, active learning is especially important when the teacher and learner have different assumptions and knowledge.

In other direct comparisons, Zosh et al. (2013) compared toddlers' word learning in a condition in which they were directly told the meaning of a novel word versus one in which they used a process of elimination to determine the referent of a novel word. Even though the toddlers looked longer at the referent of the novel word in the first condition, they showed greater retention of the novel label when they had to engage in the active processing task. Fisher et al. (2013) similarly compared how well children learned the defining properties of shapes through discovery versus being directly told (e.g., a triangle is any shape that has three connected sides, whether or not it is symmetrical; it is not merely a shape with a point at the top). Four- and five-year-olds were shown examples of various triangles and asked to discover the secret of how they were related. Much like the toddlers in Zosh et al. (2013), children who had to discover the information for themselves showed better immediate and long-term (1 week later) retention of this information than children who were directly told.

As noted in the introduction, some researchers (e.g., Pellegrini) separate exploration from play. Here, we conceptualize exploration as minds-on thinking, in either playful or non-playful contexts. Exploration on its own does not make a context playful, but playful exploration represents minds-on thinking in an enjoyable and child-directed context, and as such can help support learning. This is one reason that play in general, and guided play in particular, is so effective.

Active, minds-on thinking is intrinsic to play. When such thinking is coupled with guidance toward a learning goal in a playful setting, as in guided play, children are more likely to be minds-on with the information that adults hope they will learn, and that information is more likely to be retained than information shared in more passive contexts.

#### Engagement

Although being "minds-on" is an important first step, staying "minds-on" is crucial for learning. This is, perhaps, one of the greatest challenges that children face: Their ability to resist distraction and stay on task develops over childhood. Ruff and Lawson (1990) examined children's sustained, focused attention during free play and found that the ability to maintain focused attention increased over the first 5 years. Kannass and Colombo (2007) found similar results in a comparison of children's susceptibility to distraction between 3 and 4 years of age. Further, children vary greatly in their susceptibility to distraction (Choudhury and Gorman, 2000; Dixon et al., 2006), and attention in earlier childhood is related to attention problems later in childhood (Martin et al., 2012). Even simple things such as pop-up books (Tare et al., 2010), instrumental music (Barr et al., 2010), and decorated classrooms (Fisher et al., 2014) distract young children and interfere with their learning. But crucially, susceptibility to distraction is, to some degree, malleable (Kannass et al., 2010; Neville et al., 2013).

Though play is often characterized as a context with an absence of constraints, play naturally requires children to stay on task, to balance their own wants with those of their social partners, and, in the case of pretend play, to inhibit distractions from the immediate environment that conflict with the play narrative. As such, make-believe play has been linked to increased self-regulation, although more data are needed (see Berk and Meyers, 2013 for a review). For example, preschoolers who exhibited more socio-dramatic play early in the year showed increased self-regulation abilities later in the academic year (Elias and Berk, 2002). In a recent intervention study, Thibodeau et al. (2016) investigated the impact of a 5-week play-based intervention with preschoolers and found that children in a fantastical pretend-play condition showed greater gains in executive function than children in a nonimaginative play condition or a business-as-usual control. This again speaks to the importance of a nuanced conceptualization of play. Not all play is created equal. Guided play in particular, where an adult scaffolds a situation toward a specific learning goal, may be especially helpful at maximizing engagement, particularly for younger children who are more susceptible to distraction.

#### Meaningful

Meaningful information is information that is relevant, connected to something familiar, and transferable to new situations or problems. For example, there is a difference between memorizing the fact that a triangle has three sides and understanding that pizza slices, tortilla chips, and sailboat sails in the real world resemble triangles.

The challenge of meaning-making is the work of the early years. Even a young child who, on the surface, knows the count list does not necessarily understand the principle of cardinality; this deeper knowledge unfolds over time (Wynn, 1990; Sarnecka and Carey, 2008). That is, knowing the word "three" is not synonymous with understanding that the word "three" maps onto (or indicates) a quantity of three objects, and that a set of three can be made up of things as varied as cups or books. A child might know that she is supposed to share toys with her brother because her mother has told her to, but not understand why (her brother also wants to play with the toys and will be upset if he cannot). Similarly, being able to recognize printed alphabet letters does not by itself provide the phonological awareness that is necessary for reading (Blair and Savage, 2006).

The distinction between surface and deeper learning has a long history in the scientific literature. From Einstein's statement "The value of an education . . . is not the learning of many facts but the training of the mind to think something that cannot be learned from textbooks" to Ausubel's (1968) distinction between rote and meaningful learning, the idea that learning goes beyond basic content or knowledge to transferable, generalizable, deeper thought persists today. Shuell (1990) adds to this idea by stating that rote learning (e.g., knowing the count list or being able to recite the alphabet) is a precursor to "real" learning (e.g., having true numerical knowledge or being able to read). Chi (2009) expands on this further, emphasizing the use of previous knowledge to actively construct new knowledge for conceptual change. Deeper learning in number or vocabulary requires that the learner not only store information but also connect it to prior information (see Hadley and Dickinson, 2018, for an example in vocabulary development).

Comparing multiple examples and drawing analogies between situations and systems are some of the most powerful learning mechanisms available to young children. For example, children tend to rely on surface properties when comparing objects unless they are given multiple examples. Seeing multiple examples prompts children to compare and examine the features that are common to each (Gentner and Namy, 1999). Making analogies between situations can also lead to new insights about problem-solving (e.g., Holyoak et al., 1984; Brown et al., 1986, 1989; Daehler and Chen, 1993; Chen, 1996), understanding scientific principles (e.g., Ganea et al., 2011; Kelemen et al., 2014; Shtulman et al., 2016), or learning moral lessons (see Mares and Woodard, 2005 for a review).

When children play, they choose themes, objects, and people that are relevant and interesting to them. Thus, they are motivated to make meaning out of the information in their play. Guided play or games can teach effectively by presenting information that is contextualized in ways that make sense to children. In one study, Habgood and Ainsworth (2011) created two versions of an educational computer game. In one version, 7- to 11-year-old children had to use an understanding of division to "divide" zombies and defeat them. In the other version, children defeated the zombies using standard game methods and then solved division problems at the end of each level. When the information about division and factors was made meaningful within the game, it led to better learning: Children who played the integrated version of the game outperformed the other group on a division test 2 weeks later.

Guided play may particularly support this type of meaning-making because young children may struggle to do it on their own. Mares and Acosta (2008) found that kindergarteners who watched a story about dogs who befriended another dog with a missing leg tended to draw the very narrow lesson to "be kind to three-legged dogs," rather than the broader moral lesson about accepting people who are different. However, children learn more from reading a book when a parent or other adult asks questions that encourage them to connect the story to their existing knowledge, a process known as dialogic reading (e.g., Hargrave and Sénéchal, 2000). For example, the reader might ask children to think about how a character might be feeling or point out how an element in the story is similar to something from the child's own life. Thus, adults can scaffold children's connections between new information and what they already know, making the new information more meaningful and supporting learning. This type of meaning-making supports learning in more informal, play-based contexts as well. For example, research in children's museums suggests that instructing caregivers to ask questions such as "why?" helps children learn more from their experience (Benjamin et al., 2010), and studies investigating how to increase learning identify scaffolding as crucial (Wolf and Wood, 2012; see Andre et al., 2017 for a review), especially for taking learning beyond the museum. Guided play harnesses the power of children's own agency and discovery but couples it with the adult-supported scaffolding that maximizes learning through meaning-making.

#### Socially Interactive

Although children can play on their own, they also frequently play with parents, siblings, friends, or classmates. Playing with others also adds social meaning to the activity at hand. Chi (2009) describes how peer interactions can involve "building on each other's contribution, defending and arguing a position, challenging and criticizing each other on the same concept or point, asking and answering each other's questions" (p. 83). Through these processes, each partner contributes to the conversation in a way that helps construct new shared knowledge. Indeed, it has been suggested that the sharing of information between individuals acts as a type of "natural pedagogy" (Csibra and Gergely, 2009, p. 148); in other words, social interaction is, in itself, a mechanism for learning.

Entire theories are centered on the role and importance of social partners, not just for learning but for the lifelong attainment of outcomes such as independence, self-worth, and fulfillment (e.g., Vygotsky, 1967). Perhaps nowhere is this more important than in infancy and early childhood, and infants seem to be born looking for this interaction (e.g., Meltzoff and Moore, 1983). Infants and children prioritize input from, and learn more from, social cues compared with non-social presentations of the same information [e.g., a human arm versus a robotic arm (Wu et al., 2011); a face versus a flashing cue (Wu and Kirkham, 2010); and even a communicative point versus non-communicative reaching (Yoon et al., 2008)].

Social interaction in infancy and childhood centers on interactions with parents/caregivers and peers<sup>1</sup>. Both have been shown to be important resources for children. Parents and/or caregivers are an infant's initial social partners, and the quality of this early caregiver-infant relationship has been linked to many different positive outcomes. For example, a parent's contingent responses to a child's vocal play support language development (see Tamis-LeMonda et al., 2014 and Reed et al., 2016 for reviews). Recent work suggests that direct gaze between a parent and infant promotes bidirectional neural connectivity and communication (Leong et al., 2017).

Parent/child interaction can also promote healthy socioemotional regulation critical for academic achievement and can even serve as a protective factor against the negative physical and cognitive effects of stress (see Center on the Developing Child at Harvard University, 2016 for a review; Nelson et al., 2014; Nelson, 2017).

Play encourages social interaction for young children in a number of ways. Playing with peers has been shown to support learning. For example, Ramani (2012) found that children built larger, more complicated structures when they were engaged with a peer in a playful building activity compared to when they were presented with the same materials in an adult-directed and adult-structured activity. Similarly, social interaction among preschoolers was related to increased complexity of building with blocks (Trawick-Smith et al., 2017). Although Park and Lee (2015) suggest that one of the advantages of working with a peer is benefiting from a higher-ability peer or one with higher social skills, even the illusion of working collaboratively has positive effects. Preschoolers who were told they were collaboratively working on a puzzle with a child in the next room persisted longer on the task and reported liking it more compared to children who knew they were working alone or were told that they were taking turns (Butler and Walton, 2013). And crucially, children seem to be rather discerning and take into account the knowledge and reliability of their social partners (Bonawitz and Shafto, 2016).

While free play has traditionally been recognized as optimal for promoting social interaction, it is important for adults to scaffold and protect playful peer interaction even then (Ghafouri and Wien, 2005), because even young children are susceptible to social loafing (Arterberry et al., 2007), bullying (Kirves and Sajaniemi, 2012), and exclusion (Fanger et al., 2012). This suggests that guided play may have a role in the development of positive social skills.

It is important to note that the presence of social interaction does not necessarily make a situation playful. A teacher using didactic methods interacts with the class, yet the lesson may be devoid of any play. Conversely, children can engage in solo play that is joyful and child-directed but not social at all. We suggest here that playful pedagogies are effective because they often harness the power of high-quality social interaction, in combination with the other characteristics outlined here (joy, active thinking, engagement, meaning-making, and iteration), to support learning<sup>2</sup>.

<sup>1</sup>Note that parasocial relationships, in which children form emotionally connected relationships with characters, also have been linked to increased learning potential. See Calvert (2017) and Hirsh-Pasek et al. (2015) for reviews of how parasocial relationships can utilize the benefits of social interaction to maximize learning.

<sup>2</sup>Note that not all of the examples provided in this section would necessarily fall somewhere along our playful continuum (e.g., watching a video). Instead, what we suggest is that social interaction, in general, has the power to support learning and that, when combined with the other characteristics, helps explain why playful pedagogies are so effective.

#### Iterative

Acquiring knowledge requires more than the deposit of facts from the more educated into the less educated; instead, learning resembles the scientific process. As Piaget (1964) notes:

Knowledge is not a copy of reality. To know an object, to know an event, is not simply to look at it and make a mental copy or image of it. To know an object is to act on it. To know is to modify, to transform the object, and to understand the process of this transformation, and as a consequence to understand the way the object is constructed. (p. 176)

To learn, even young infants, whom Gopnik et al. (1999) called "scientists in the crib," generate hypotheses, test those hypotheses, and then use the resulting data to inform their own understanding. In other words, they construct knowledge using the methods described by Piaget. Learning requires that knowledge generation be an iterative process in which a child uses what he or she knows to generate new hypotheses, tests those hypotheses using minds-on thinking, and updates his or her understanding based on those tests. A striking example comes from decades of research on young infants' reasoning about physical objects and relationships: Infants have expectations about what objects can and cannot do, but they revise this knowledge over infancy, and it becomes more accurate and nuanced as they acquire additional experience (see Baillargeon, 2004 for a review; Wang et al., 2016; Baillargeon and DeJong, 2017).

Indeed, children explore more when their expectations are violated (e.g., Schulz and Bonawitz, 2007; van Schijndel et al., 2015). Stahl and Feigenson (2015) presented 11-month-old infants with events in which expectations about ordinary objects were violated (e.g., an object passed through a solid surface or seemingly vanished) and compared the infants' immediate learning with their learning after events in which no expectations were violated. Infants who saw an object appear to violate physical laws were more likely to learn about a hidden property of that object, and they spent more time exploring it. They even appeared to test the apparent violation; for example, infants who saw an object appear to pass through a solid wall spent more time banging it on surfaces than infants who did not witness the violation. Preschoolers, likewise, preferentially explore a toy whose causal evidence is confounded over toys whose evidence is unambiguous (Schulz and Bonawitz, 2007). Thus, young children not only generate rules based on evidence but also actively seek to revise these rules over time.

Play inspires iteration. Guided play, in particular, can be described as "constrained tinkering," in which children have the freedom to test out different hypotheses within a bounded exploration space. Unlike direct instruction contexts, in which active exploration and discovery are often thwarted, playful contexts make exploration and discovery a focus. For example, in one study, children were more likely to play with a toy when its causal structure (i.e., which lever caused a toy to pop up) was ambiguous than when it was clearly demonstrated to them (Schulz and Bonawitz, 2007; see also Cook et al., 2011; Buchsbaum et al., 2012). Pretend play also invites iterative processing, as children must not only keep in mind conceptual premises that exist outside of reality but also adapt to changing circumstances as the play session continues (e.g., Harris and Kavanaugh, 1993; Weisberg and Gopnik, 2013). As with the other characteristics, iteration is not a necessary feature of play. Yet play often inspires iteration. And although all types of play may inspire simple iteration, some adult support in the form of guided play may be necessary for more advanced types of hypothesis testing, such as those involved in higher-level scientific thinking (e.g., generating hypotheses about which variables cause a particular effect and testing those hypotheses in the presence of confounding variables). Research has found that children are not very good at designing appropriate experiments on their own (e.g., Klahr et al., 1993) but can do so in a child-directed and fun way if offered some adult guidance (see Lazonder and Harmsen, 2016 for a meta-analysis).

#### Joyful

Joy is an essential element of play: Early scientific writings on play mention "positive affect" and "intrinsic motivation" as defining features of what makes an activity playful (Krasnor and Pepler, 1980). Joy is inherent in play and required for an activity to be considered play. Even in the most constrained definitions of play cited above, joy is a central characteristic.

The idea that positive affect influences cognition is not new (Isen, 1984). Ashby et al. (1999) proposed a neuropsychological theory relating positive affect to both long-term and working memory as well as creativity in problem solving. Indeed, positive affect is linked to increased creativity (Isen et al., 1987), and creative thinking is linked to increased learning (Resnick, 2007; Zosh et al., 2017).

The idea that emotion and cognition are linked is only growing in popularity. Fischer and Bidell (2006) highlight the dynamic nature of development, with emotion and cognition as "two sides of the same coin as characteristics of control systems for human activity. Emotion is together with cognition at the center of mind and activity." (p. 370). Recent research in psychology and neuroscience further supports this idea (Immordino-Yang and Damasio, 2007). In fact, an entire field, positive psychology, touts the benefits of positive emotion for a variety of outcomes, including learning (Seligman, 2002).

Positive affect is not the only aspect of joy implicated in learning. Surprise also seems to play a role in increasing curiosity and exploration, leading to increased learning potential. In the Stahl and Feigenson (2015) study described above, infants learned more when their expectations were violated, and they also engaged in more information-seeking and hypothesis-testing behavior consistent with the violations they observed. Neuroscientists are beginning to unravel the neural correlates of the effects of affect and surprise on learning (Betzel et al., 2017), potentially involving increased dopamine levels, which are implicated in the brain's reward system and motivation (e.g., Cools, 2011; Dang et al., 2012).

Beyond positive affect, intrinsic motivation is a key distinguishing feature of play, even in the traditional definitions offered above. The definition of intrinsic motivation offered by Ryan and Deci (2000) overlaps considerably with our conceptualization of play.

Perhaps no single phenomenon reflects the positive potential of human nature as much as intrinsic motivation, the inherent tendency to seek out novelty and challenges, to extend and exercise one's capacities, to explore, and to learn. . . . The construct of intrinsic motivation describes this natural inclination toward assimilation, mastery, spontaneous interest, and exploration that is so essential to cognitive and social development and that represents a principal source of enjoyment and vitality throughout life (Csikszentmihalyi and Rathunde, 1993; Ryan, 1995) (Ryan and Deci, 2000, p. 70).

This definition stresses minds-on thinking, engagement in the material to be learned, exploration/iterative thinking, assimilation (meaningfulness), and learning. These concepts align with precisely the features that support learning. The key here, however, is that children engage in this process with agency. They are intrinsically motivated to learn and discover. Playful learning contexts, in which children lead the play experience with or without adult support (see **Figure 1**), thus capitalize on this intrinsic motivation to harness children's own learning potential. Decades of research have investigated the role of intrinsic motivation and support its importance for learning and creativity, among other positive outcomes (Ryan and Deci, 2017). In a recent review of programs that improved children's executive functions, Diamond (2012) highlighted a potential mechanistic explanation for why play and learning may be mutually supportive: "Children devote time and effort to activities they love; therefore, EF interventions might use children's motivation to advantage" (p. 335). Play, an inherently positive experience for children, has the potential to be the context that provides this advantage.

### Outstanding Issues

Active, engaged, meaningful, social, iterative and joyful are characteristics that individually and collectively appear in a number of scientific articles that highlight processes involved in optimal learning. These same characteristics coalesce in play. Thus, playful learning – and in particular guided play – should confer real learning advantages for academic and social outcomes.

Adopting a more nuanced understanding of the play construct allows us to better understand how play might support learning: Different types of play may be optimal for different types of learning. This speaks not only to the theory of why guided play works, but also to long-standing debates in the field regarding optimal pedagogical approaches to learning. It also raises new questions that can guide further research.

To the issue of why guided play works, we offer not only the parallel between the characteristics of play and high-quality learning, but also a theoretical argument for why guided play, in particular, feeds the learning of specific information. Bonawitz et al. (2009) argue that the possibility space of learning new information is vast: "Learning the affordances of a novel artifact is challenging because for any object, there are an unknown, and potentially large, number of causal properties." (p. 1575). However, the danger of direct pedagogy is that it limits exploration and discovery: ". . .in natural learning contexts, pedagogical demonstrations cannot demonstrate all there is to know, and teaching will necessarily be limited." (p. 1580). Because there are too many degrees of freedom in a free play situation, a child who is actively engaged with shapes, for example, but is not "guided" toward a learning outcome might conclude that triangles have "points on top," come in "primary colors," or are "2 inches in length," rather than focusing on key variables like the number of sides and angles. Direct instruction, by contrast, teaches children that particular exemplars are triangles, and a child may then fail to recognize that each half of a square split along its diagonal is a triangle, because that was not what was taught. Thus, when free play has been compared with direct instruction, direct instruction has often proved better suited to learning (Pianta et al., 2009; Fuller et al., 2017). Bonawitz et al. (2009) suggest "Understanding how to combine the efficiency of pedagogical knowledge transmission while encouraging curiosity and exploratory play is an important direction for future work." (p. 1580).

The introduction of a play spectrum with nuanced categories like guided play, however, changes the dynamic and answers this call. Guided play, like direct instruction techniques, constrains the contexts in which children generate hypotheses, effectively helping them to home in on what is to be learned and avoid distraction. For example, it invites children to play with a set of triangles that can be compared and contrasted such that key properties "fall out" of the context. It might also offer a "coach" who helps children direct attention to these defining characteristics. Thus, in Bonawitz's theory, guided play points children in a direction that allows for "constrained tinkering."

Guided play, however, also shares the characteristics of play noted above and allows, or even motivates, the child to direct the learning in a joyful way. Thus, guided play, sitting midway between direct instruction and free play, offers the best of both pedagogical approaches. In this context, it no longer makes sense to say that pedagogy must be either play or direct instruction. High-quality schools can have rich curricular goals and at the same time deliver them through guided play techniques. Indeed, this is precisely the formula recommended in a number of recent papers (Bustamante et al., 2017; Fuller et al., 2017; Jenkins and Duncan, 2017). Jenkins and Duncan (2017) write of the most effective pre-K curricula that ". . . these curricula provide teachers with lesson plans to follow in which playful activities are strategically organized to present children with learning opportunities that are focused, sequential and cumulative" (p. 39; see also Burchinal, 2018).

By recognizing a continuum of play categories, we can better understand why play in general, and guided play in particular, is related to learning and we can accelerate learning outcomes by designing targeted play pedagogies.

This spectrum-based view of play also allows us to better formulate questions about where and when particular types of playful learning might prove predictive of particular outcomes. As Jirout and Klahr (2012) argue, for second-grade math curricula, direct instruction might prove a more effective way to help children settle on the right formulas (though see Weisberg et al., 2016). Might dramatic play be an optimal way for young children to learn socioemotional skills (Copple and Bredekamp, 2009; Goldstein and Lerner, 2017; but see Lillard et al., 2013)? Might guided play lead to stronger outcomes in literacy (Han et al., 2010; Hassinger-Das et al., 2016; Cavanaugh et al., 2017; Toub et al., 2018) and STEM [e.g., spatial thinking (Fisher et al., 2013)]? While there are hints in the literature that support each of these hypotheses, more work needs to be done. Indeed, this work is beginning. In an impressive review and analysis of pretend play in particular, Lillard et al. (2013) explore whether pretend play has a causal role in supporting development. They find that, when carefully examined, the evidence to date does not allow us to draw any firm conclusions about the causal role of pretend play, and they suggest that more and better research is needed. We could not agree more, and we suggest that viewing play as a spectrum allows us to create new hypotheses and design new studies to help tease apart the role of play and playful learning in developing a whole host of skills across childhood and beyond. In fact, Lillard et al. (2013) explicitly state that the lack of firm research on the impact of pretend play on development does not amount to a call for teacher-centered approaches to education and learning. Instead, "The hands-on, child-driven educational methods sometimes referred to as "playful learning" (Hirsh-Pasek et al., 2009) are the most positive means yet known to help young children's development." (Lillard et al., 2013, pp. 27–28).

It is possible that, as we sketch out a suite of 21st-century skills such as those suggested by Golinkoff and Hirsh-Pasek (2016), different types of playful learning will prove more or less effective for different skills. It will also be critical to explore whether different types of play afford these advantages in the same way across contexts and time. As Schindler et al. (2017) remind us,

"This rapidly advancing science calls for a new early childhood agenda that builds on current investments in quality improvement and system building and seeks new models and methods in the quest for greater impacts. To this end, there is a need for enhanced theories of change and more effective strategies that move beyond the general question of "what works?" and seek a more nuanced understanding of what works (and what does not) for whom and why, and in what contexts (Shonkoff and Fisher, 2013)" (p. 1436).

#### CONCLUSION

While many would agree that play is an important part of childhood that supports social interaction and growth, questions about the relationship between play and learning abound, and there is renewed energy around the study of play. To better harness this energy, though, we need a working definition of play that is less broadly construed than those found in the global literature. Here, building on the newest research as well as on earlier studies of playful learning, we propose a multidimensional definition of play that creates a spectrum of play opportunities from free play through guided play to games and then playful direct instruction (a form of direct instruction with minor playful elements intended to keep children engaged). This more nuanced definition allows us to better define the mechanisms for playful learning: how and why different types of play are related to various types of outcomes. It also challenges us to raise new questions in the field that should enhance our understanding of how play relates to varied outcomes across time and in varied contexts.

#### AUTHOR CONTRIBUTIONS

JZ was responsible for the first draft of the manuscript. KH-P and JZ were primarily responsible for main revisions. EH, HJ, CL, DN, SS, and DW were responsible for writing sections of the manuscript. All authors contributed to manuscript revision, read and approved the submitted version.







**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Zosh, Hirsh-Pasek, Hopkins, Jensen, Liu, Neale, Solis and Whitebread. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Information Theory, Developmental Psychology, and the Baldwin Effect

#### Erick J. Chastain\*

*Department of Ecology and Evolutionary Biology, University of Tennessee Knoxville, Knoxville, TN, United States*

As part of the extended evolutionary synthesis, there has recently been a new emphasis on the effects of biological development on genetic inheritance and variation. The exciting new directions taken by those in the community have been preceded by a pre-history filled with related ideas that were never given a rigorous foundation or combined coherently. Part of the historical background of the extended synthesis is the work of James Mark Baldwin on his so-called "Baldwin Effect." Many variant re-interpretations of his work obscure the original meaning of the Baldwin Effect. This paper emphasizes a new approach to the Baldwin Effect, focusing on his work in developmental psychology and how that would impact evolution. We propose a novel population genetics model of the Baldwin Effect and use it to study two effects: first, the impact on evolution of a kind of learning process motivated by motor babbling, as described in the developmental psychology literature; second, the speed-up in evolution that information-theoretic phenotype reshaping provides compared to populations without this kind of learning. The basic idea behind the model is to allow the organism to apply abstraction to its initial phenotype, situating it within one of a few different classes of phenotypes in the local neighborhood of a fitness maximum. This reshaping of the phenotype space allows the organism to reach a nearby fitness maximum. In this way, valleys in the fitness landscape are leveled out, turning a rugged fitness landscape into a set of mesas and plateaus of increasing height. Using this model we show the first sizeable speed-up for the Baldwin Effect compared to ordinary population genetics. We also introduce an information-theoretic foundation for the Baldwin Effect, which may be of independent interest.

#### Edited by:

*Patricia Shaw, Aberystwyth University, United Kingdom*

#### Reviewed by:

*Valery E. Karpov, National Research University Higher School of Economics, Russia*

*Samuel Scheiner, National Science Foundation (NSF), United States*

\*Correspondence:

*Erick J. Chastain chastain.erick@gmail.com*

Received: *22 December 2017* Accepted: *08 August 2018* Published: *04 September 2018*

#### Citation:

*Chastain EJ (2018) Information Theory, Developmental Psychology, and the Baldwin Effect. Front. Neurorobot. 12:52. doi: 10.3389/fnbot.2018.00052*

Keywords: evolutionary biology, developmental psychology, phenotypic plasticity, population genetics, information theory

### 1. INTRODUCTION

Phenotypic plasticity (DeWitt and Scheiner, 2004), in its different aspects (learning, social and cultural innovation), has recently come into sharp focus as a possible source of genotypic change, with some going so far as to call this set of new ideas an extended evolutionary synthesis (Laland et al., 2015). In the classical evolutionary synthesis, genetic change arises from mutation, drift, or selection. The extended evolutionary synthesis seeks means by which phenotypic plasticity, or phenotypic change more broadly, can affect genetic change, through processes like epigenetic inheritance and niche construction. Antecedents of the extended synthesis were pondered by James Mark Baldwin in his work on the Baldwin Effect. Baldwin showed how ordinary genetic change could occur in a direction controlled or "ratified" by phenotypic plasticity of a very specific form: abstraction of inputs, leading to a different kind of space in which phenotypes can be generated. Abstraction is the formation of more general concepts from specific examples, in contrast with learning. Learning is the process of associating stimuli with rewards, or the process of making good predictions (of new examples). Abstraction is the formation of novel representations that broadly generalize more abstract features of stimuli in order to "understand" or simplify dealing with them in the future. In other words, abstraction is more like compression or unsupervised learning than ordinary learning (which is normally called supervised learning) (Bishop, 2006). Through the various forms of abstraction and developmental stages of organisms (as further elucidated by Piaget et al., 2013), the phenotype could change, often radically, leading to new representations and means for phenotypic variation. Finally, according to Baldwin, the form of plasticity itself could evolve and was heritable.

But in a quirk of history, much of his original work, which was in developmental psychology, went unread. Instead, a new "Baldwin Effect" was presented to evolution researchers by Simpson (1953): a process by which non-heritable plastic traits would be replaced by specific fixed genetic factors (Scheiner, 2014). Others have mainly followed Simpson's lines, including Waddington's work on canalization (Waddington, 1953) and those who consider the effect of learning on genetic change (Hinton and Nowlan, 1987).

In this paper, the Baldwin Effect as understood by Baldwin himself is analyzed through a close reading of his work in developmental psychology. We conclude that his intention was to show how new kinds of behaviors could develop via abstraction, allowing very advanced information-processing by an organism with the rudiments of phenotypic plasticity. Secondarily, Baldwin showed how abstraction could affect the direction of genetic change. A crucial assumption he introduced is that organisms with phenotypic plasticity can try out motor actions in their juvenile state and obtain reward signals that are a good approximation to the fitness of the behavior (if the behavior is considered as a genetically-determined reflex).

Besides the problems of interpreting the Baldwin Effect, there are questions about whether it even makes sense in ordinary population genetics. The usual interpretation of the Baldwin Effect, as elaborated by Hinton and Nowlan, is that by using learning algorithms one can reach higher fitness during one's lifetime, thus smoothing the fitness landscape for animals born with bad reflexes (and thus paving the way for better reflexes to emerge quickly in the population). Recent work has shown that the examples used by Hinton and Nowlan are easily and quickly solved by ordinary evolutionary dynamics (Santos et al., 2015). Therefore, it has yet to be shown that the learning-based Baldwin Effect (as in Hinton's work) is more efficacious than ordinary genetic change via mutation, drift, and selection. The impact of phenotypic plasticity on evolution, and its connection to the Baldwin Effect, has been shown by Scheiner et al. (2017), Waddington (1953), and others. In this paper we focus specifically on two novel influences of phenotypic plasticity on evolution, both analyzed using ordinary population genetics: first, the impact on evolution of a kind of learning process motivated by motor babbling in the developmental psychology literature (information-theoretic phenotype reshaping); second, the speed-up in evolution that information-theoretic phenotype reshaping provides compared to populations without this kind of learning. As Szathmary showed, a known flaw of the older models of how learning influences the speed of evolution was that their speed-ups were not much larger than those associated with ordinary genetic drift.

A fitness landscape assigns to each phenotype in the space of all possible phenotypes a height equal to its fitness. An example of an easy fitness landscape is a single hill, which can be climbed by natural selection. A medium-difficulty fitness landscape is a flat landscape with a single peak (reaching the peak requires some randomness, but that randomness does not work against natural selection). A hard fitness landscape is a rugged landscape with many peaks (moving between peaks requires randomness that works against natural selection). In this paper, the Baldwin Effect introduced above is formally modeled and shown to significantly speed up evolution in a rugged fitness landscape using ordinary evolutionary dynamics. Specifically, without the Baldwin Effect, the time for the most fit mutant to fix in the population grows exponentially as fitness valleys get deeper. With the Baldwin Effect, the time to fixation does not depend on the depth of the fitness valleys. The core insight is that the ability of organisms to undergo reward-based sensorimotor abstraction during their youth allows them to effectively flatten the hills in a rugged landscape. In human neonates, this process of play (as motor babbling; Meltzoff and Moore, 1997), combined with intrinsic and extrinsic reward, allows them to reach novel representations for goals. Abstraction does this by allowing the organism to "cluster" all possible phenotypes into those that are close to the same fitness maximum, and then "decode" or "assign" the initial phenotype to its cluster. Under the Baldwin Effect, the reshaped rugged fitness landscape becomes a set of neighboring plateaus and mesas of increasing height. Effectively, then, a transitional mutant on the path to the one with maximal fitness can arise as a neutral intermediate mutation rather than as a deleterious one.

Of possible independent interest is a link established between the Baldwin Effect and a certain kind of information-processing that is nearly optimal (in an information-theoretic sense) for the Gaussian channel.
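
The reshaping of a rugged landscape into plateaus of increasing height, described above, can be illustrated with a toy example. The sketch below is not the paper's population genetics model: it assumes a hypothetical one-dimensional landscape over integer phenotypes, treats every local maximum as a "peak," and gives each phenotype the fitness of its nearest peak (its cluster). It then counts how many single-step mutations along the path to the global maximum lower fitness, before and after reshaping.

```python
"""Toy illustration (not the paper's model): reshaping a rugged 1-D fitness
landscape into plateaus by assigning each phenotype to its nearest peak."""

# Hypothetical rugged landscape over integer phenotypes 0..12:
# peaks of increasing height at 0, 4, 8, 12, separated by valleys.
FITNESS = [1.0, 0.6, 0.2, 0.6, 2.0, 1.2, 0.2, 1.2, 3.0, 1.8, 0.2, 1.8, 4.0]


def local_peaks(fitness):
    """Indices that are at least as fit as both neighbours."""
    n = len(fitness)
    return [
        i for i in range(n)
        if (i == 0 or fitness[i] >= fitness[i - 1])
        and (i == n - 1 or fitness[i] >= fitness[i + 1])
    ]


def reshape(fitness):
    """Give every phenotype the fitness of its nearest peak (its 'cluster').
    Ties go to the higher peak, an arbitrary choice made for this sketch."""
    peaks = local_peaks(fitness)
    out = []
    for x in range(len(fitness)):
        nearest = min(peaks, key=lambda p: (abs(p - x), -fitness[p]))
        out.append(fitness[nearest])
    return out


def deleterious_steps(fitness):
    """Count single-step mutations on the path 0 -> 12 that lower fitness."""
    return sum(1 for a, b in zip(fitness, fitness[1:]) if b < a)


reshaped = reshape(FITNESS)
print(reshaped)
print(deleterious_steps(FITNESS), deleterious_steps(reshaped))
```

In the reshaped landscape every step along the path is either neutral (within a plateau) or beneficial (onto a higher plateau), which is the qualitative picture the text describes; the landscape values and tie-breaking rule are assumptions made only for illustration.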

### 2. MATHEMATICAL BACKGROUND

The formal model draws on the models and tools of information theory. To understand information theory, consider the following communication game. Alice is communicating with Bob using a continuous-valued signal. At each point in time, the signal is corrupted by Gaussian noise with zero mean. Bob receives the noisy signal and must recover Alice's message with high accuracy. In information theory this is called communication over a Gaussian channel. It might seem difficult for Alice to communicate in such a way that Bob can reconstruct her message. Intuitively, however, Alice can exploit redundancy, encoding each message as a long codeword so that the many slightly different noisy versions Bob might receive all remain closer to that codeword than to any other. Bob, knowing this coding scheme, can then find the codeword, and hence the message, that a received signal corresponds to. The process Bob uses to find the corresponding codeword is called decoding.

In particular, Alice could send a numerical codeword of length n whose average power, that is, the average of the squares of its n entries, is bounded by ω. An example of such a code is the random codebook, that is, one chosen at random. A Gaussian random codebook assigns to each message a random sequence of n values sampled independently from a zero-mean Gaussian distribution with variance ω − ε, with ε positive and small (Cover and Thomas, 2012).
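
For reference, the power constraint and the codebook distribution just described can be written as below. The second line is the standard Shannon capacity of a Gaussian channel, added for context; N denotes the noise variance, a symbol introduced here and not used elsewhere in the paper. Choosing the codebook variance slightly below ω is what makes long random codewords satisfy the power constraint with high probability.

$$\frac{1}{n}\sum\_{i=1}^{n} x\_i^2 \le \omega, \qquad x\_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,\ \omega - \varepsilon)$$

$$C = \frac{1}{2}\log\_2\!\left(1 + \frac{\omega}{N}\right) \quad \text{bits per channel use}$$

For example, if the signal power ω equals the noise variance N, the capacity is ½ log₂ 2 = 0.5 bits per channel use.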

For decoding, Bob can take the corrupted codeword z and find the nearest codeword, that is, the codeword x in the codebook that minimizes the Euclidean distance to z or, equivalently, the squared distance $D(x, z) = \sum\_{i=1}^{n} (x\_i - z\_i)^2$. For the Gaussian random codebook, Bob must also declare an error if the power of the nearest codeword is not less than ω (Smith and Morowitz, 2016). The coding and decoding algorithm just described is optimal for communication between Alice and Bob over the Gaussian channel: as the codeword length n grows, it communicates messages accurately at the maximal rate possible.
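
The communication game can be simulated in a few lines. The sketch below assumes NumPy and uses hypothetical parameter values (16 messages, codeword length 128, ω = 1, ε = 0.3, noise variance 0.25); `make_codebook`, `decode`, and `simulate` are illustrative names, not part of any library. Alice draws a Gaussian random codebook, the channel adds zero-mean Gaussian noise, and Bob applies nearest-codeword decoding with the power check described above.

```python
import numpy as np


def make_codebook(num_messages, n, omega, eps, rng):
    """Gaussian random codebook: one length-n row per message, with entries
    drawn i.i.d. from N(0, omega - eps)."""
    return rng.normal(0.0, np.sqrt(omega - eps), size=(num_messages, n))


def decode(codebook, z, omega):
    """Nearest-codeword decoding. Returns None (an error) if the nearest
    codeword's average power is not less than omega."""
    sq_dist = np.sum((codebook - z) ** 2, axis=1)  # D(x, z) for every codeword
    best = int(np.argmin(sq_dist))
    if np.mean(codebook[best] ** 2) >= omega:
        return None
    return best


def simulate(num_messages=16, n=128, omega=1.0, eps=0.3, noise_var=0.25,
             trials=500, seed=0):
    """Fraction of messages Bob recovers correctly over a Gaussian channel."""
    rng = np.random.default_rng(seed)
    codebook = make_codebook(num_messages, n, omega, eps, rng)
    correct = 0
    for _ in range(trials):
        m = int(rng.integers(num_messages))                     # Alice's message
        z = codebook[m] + rng.normal(0.0, np.sqrt(noise_var), size=n)  # channel noise
        correct += decode(codebook, z, omega) == m
    return correct / trials


print(simulate())                 # low noise: accuracy close to 1
print(simulate(noise_var=100.0))  # very high noise: accuracy drops sharply
```

The rate used here (4 bits per 128 channel uses) is far below capacity at the default noise level, so decoding is reliable; raising the noise far above the signal power pushes the rate above capacity and accuracy collapses, in line with the capacity formula for the Gaussian channel.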

### 3. BACKGROUND

This section describes a novel approach to interpreting the Baldwin Effect within the framework of the "evolution of learning," using developmental psychology.

#### 3.1. Interpretations of the Baldwin Effect

In this part of the paper we describe Baldwin's work and how it relates to the literature on the "Baldwin Effect." Specifically, we define and interpret the terminology Baldwin used to describe the Baldwin Effect. We also connect Baldwin's work in developmental psychology to the way the life history of the organism contributes to the Baldwin Effect and to the evolution of the species. Finally, we connect the learning-based Baldwin Effect literature with an approach inspired more by the kinds of learning processes highlighted by Baldwin in his developmental psychology work.

#### 3.1.1. Baldwin Effect qua Baldwin or Impact of Learning in a General Sense Acting Genetically

Baldwin's ideas were conditioned by his time and place, so we should be cautious about using his theories as-is. Rather, we should formulate our models using modern ideas about genetics and development, giving credit to Baldwin for having the germ of some of these ideas in a pre-genetic context.

Baldwin gives a very explicit example of how Organic selection in its general sense can influence the direction of natural selection and variation. His example concerns the origin of grasping and how functional selection can influence it:

"We may imagine creatures, whose hands were used for holding on with the thumb and fingers on the same side of the object held, to have first discovered, under stress of circumstances and with variations which permitted the further adaptation, how to make intelligent use of the thumb for grasping opposite to the fingers, as we do now. Then let us suppose that this proved of such utility that all the young that did not do it were killed off; the next generation following would be intelligent or imitative enough to do it also. They would use the same coordinations intelligently or imitatively, prevent natural selection getting into operation, and so instinctive "thumb-grasping" might be waited for indefinitely by the species and then arise by accumulated variation" Baldwin (1902).

Inspired by the preceding, and adapted for modern genetics, what Baldwin describes is thus a two-stage process of


The first stage is thus a kind of phenotypic plasticity associating instincts, and the second is a learning process.

After mutation acts on this high-viability population, one obtains a new population which is also retained, starting from variants of the same instincts that lead to good phenotypes under functional selection. The new population could have new instincts that do better than the old ones and are closer to the phenotype produced by functional selection. If such a mutant arises in the population, it would have higher fitness and thus create a new population, after which the two-stage process continues. Baldwin points out that this process terminates with a population whose instincts match what functional selection produced at step (1) of the first chain of two-stage processes kicked off by functional selection. We call this the Baldwin Effect qua Baldwin. It is not identical to what Baldwin described, because it is framed in the context of modern genetics; the name honors the fact that his writing on learning in developmental psychology was a major inspiration for its formulation.

If we compare this mechanism with the variety of Baldwin effects identified in the literature, we can say the following:

1. Niche construction (Griffiths, 2003)

Niche construction takes the fitness landscape of the genetic variants that exist in the population and reshapes it. A special case of this is "Social Heredity," in which cultural selection reshapes the fitness landscape. In fact, because the process Baldwin describes involves real novelty of phenotypes, it reshapes the whole space of phenotypes and then the fitness landscape. The mechanism above, however, focuses more on learning and its effects on the fitness landscape.

2. Smoothing the fitness landscape with learning (Hinton and Nowlan, 1987)

Closely related to the niche construction view, but with a stronger connection to the Baldwin Effect qua Baldwin, is the work of Hinton & Nowlan. They showed that when one evolves in the space of bitstrings (strings of ones and zeros such as 11010), if one starts out with a bitstring at medium Hamming distance from the optimum (medium error), then such bitstrings will be retained, because backpropagation learning can set the other positions accurately and thus increase viability. This increases the fitness of the medium-error types, smoothing the fitness landscape. The Baldwin Effect qua Baldwin differs from Hinton & Nowlan in two respects: the learning mechanism (as outlined in the previous section), and the fact that in Baldwin's case the phenotype space itself is reshaped in such a way as both to generate the optimal phenotype and to make the fitness landscape in this space much easier (either smoother or with a shorter mutational distance to the optimal type). To place the Baldwin Effect qua Baldwin in the same setting, imagine that one can come up with a different representation for bitstrings, one that is useful for the environment. The Baldwin Effect qua Baldwin then reshapes the phenotype space from the original bitstring representation to the new representation, in which the fitness landscape is smooth or the number of mutations needed to reach an optimal bitstring is smaller. We describe a more advanced example of how the phenotype-reshaping version of the Baldwin Effect can speed up evolution in the Results section (section 4.1.1).

The mechanism described above (Baldwin qua Baldwin) is a model of the impact of learning on evolution, like Hinton and Nowlan's work, but it focuses on learning processes found in developmental psychology that differ from the backpropagation neural networks they considered. Moreover, the genetic framework considered is modern genetics rather than genetic algorithms. In what follows we discuss the relation between Baldwin's developmental psychology work on motor learning, as formalized here, and more recent work by Meltzoff on motor babbling.

3. Genetic assimilation (Simpson, 1953; Waddington, 1953)

In Genetic Assimilation (according to Livnat et al., 2014), the phenotype has some structure (modeled by, say, a boolean function f) such that combining the expression levels of various proteins with environmental variables yields novel phenotypes. One generates phenotypes with varying expression levels for proteins which, when presented with novel environmental inputs, can produce novel responses (e.g., assignments to some inputs of the boolean function). The assignments to the inputs of f that lead to high viability are then generally retained in the population. Here the mechanism of variation is to change the genetically controlled assignments to f, but the change in phenotype is due to phenotypic plasticity, an environmentally induced change that can be far from random. In the Baldwin Effect qua Baldwin, the variation itself is based on learning mechanisms and actually reshapes the phenotype space, whereas Genetic Assimilation involves a different kind of phenotypic plasticity. That is, in Genetic Assimilation the phenotype space does not change, say, from the space of bitstrings to the space of even or odd bitstrings, whereas in the Baldwin Effect qua Baldwin it does.
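
The landscape-smoothing argument attributed to Hinton and Nowlan in item 2 can be made concrete numerically. The sketch below is a simplified, hypothetical version: to keep it short it assumes that unfixed ("plastic") positions are learned by random guessing over a fixed number of lifetime trials rather than by a trained network, and that wrongly fixed bits can never be corrected. The names `N_BITS`, `TRIALS`, and both fitness functions are illustrative only.

```python
"""Toy sketch of landscape smoothing by lifetime learning. Assumption for
illustration: plastic positions are learned by random guessing, and fitness
is the probability of hitting the target bitstring within the lifetime."""

N_BITS = 20      # length of the target bitstring (hypothetical)
TRIALS = 1000    # learning attempts per lifetime (hypothetical)


def fitness_without_learning(n_wrong, n_plastic):
    """Needle in a haystack: only the exact target has nonzero fitness."""
    return 1.0 if n_wrong == 0 and n_plastic == 0 else 0.0


def fitness_with_learning(n_wrong, n_plastic, trials=TRIALS):
    """Probability that random guessing of the plastic positions hits the
    target within `trials` attempts. Wrongly fixed bits cannot be learned."""
    if n_wrong > 0:
        return 0.0
    p_single = 0.5 ** n_plastic          # chance one guess sets all plastic bits
    return 1.0 - (1.0 - p_single) ** trials


# Genotypes with no wrong bits, from mostly plastic to fully correct.
for n_plastic in range(N_BITS, -1, -5):
    print(n_plastic,
          fitness_without_learning(0, n_plastic),
          round(fitness_with_learning(0, n_plastic), 4))
```

Without learning, every genotype except the exact target sits on a flat plain of zero fitness; with learning, fitness rises gradually as more positions become correctly fixed, giving selection a slope to climb.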

#### 3.1.2. The Impact of the Baldwin Effect qua Baldwin in a General Sense on Variation and Generation of Novel Phenotypes

Now for the origin of variations or novel phenotypes, Baldwin gives the example of a child learning how to write:

"Every child has to learn how to write. If he depended upon chance movements of his hands, he would never learn how to write. But on the other hand, he cannot write simply by willing to do so. . . . What he actually does is to use his hand in a great many possible ways as near as he can to the way required; and from these excessively produced movements, and after excessively varied and numerous trials, he gradually selects and fixes the slight successes made in the direction of correct writing" Baldwin (1902).

Note that in the above case, we have a decidedly nonrandom set of behavioral variations to choose from. In fact, the child tries to approximate the best way to write and, of these approximations, chooses the best one. Then, according to the mechanics of the previous section, one would expect the child to vary her movements more if she is closer to writing well. The picture given here by Baldwin accords with our model of the previous section.

In the next section we present a formal approach to modeling the Baldwin Effect, covering both organic selection and functional selection. We then discuss how the Baldwin Effect can affect population genetics.

### 4. RESULTS

This section describes a formal model based on the new interpretation of the Baldwin Effect described in the Background section.

#### 4.1. Formal Model of the Baldwin Effect

Consider that for organisms with phenotypic plasticity, the initial phenotype can change in response to environmental and other factors. In the context of Baldwin's observations, we introduce a two-stage model of phenotypic change that incorporates a life history of rewards and a changed phenotype. Baldwin describes a process of phenotypic change that starts with the initial phenotype $P\_0$, containing the instincts alone. The organism then changes to phenotype $P\_T$ in response to the rewards $R\_T$ received over the life history (of length T). The iterative dynamics of how $P\_0$ changes throughout each epoch of the life history are related to $P\_T$ as a difference equation is related to its solution. For simplicity, we omit a thorough treatment of the iterative dynamics and instead focus on the final state $P\_T$ (for background on difference equations, see Sandefur, 1993). In accord with the connection between reward and fitness assumed by Baldwin, the reward history $R\_T$ is at each epoch an approximation to the fitness of the corresponding phenotype. For instance, the last reward in $R\_T$ approximates the fitness of phenotype $P\_{T-1}$. The basic model of Baldwin's Organic selection is then given by the equation

$$P\_T = \Phi(P\_0, R\_T) \tag{1}$$

The approximation of the fitness by the rewards gives us the following relationship between the final phenotype and the history $F\_T$ of fitnesses:

$$P\_T \approx \Phi(P\_0, F\_T) \tag{2}$$

With the introduction of fitness histories into the approximate dynamics, the penultimate phenotype $P\_{T-1}$ could have a fitness different from that of $P\_0$. If a number of initial phenotypes $\mathcal{P} = (P\_0, P'\_0, \ldots)$ converge to the same $P\_T$ under $\Phi$ for some T, then they form a natural class of $\Phi$-equivalent phenotypes. If indeed a good number of phenotypes are $\Phi$-equivalent, then the fitness landscape is effectively defined over a new space of phenotypes, namely these equivalence classes. This assumes that phenotypes change under $\Phi$ rapidly enough to affect the fitness of the organism (rapid phenotypic change does not necessarily rule out significant later-life plasticity). The effective change in the space of phenotypes occasioned by the process inspired by Baldwin and formalized by Equation (2) could therefore fundamentally change the way that populations evolve in the long run. The impact on population genetics will be explored later. We now present a specific reshaping function $\Phi$ based on information theory.
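
Equations (1) and (2) and the notion of $\Phi$-equivalence can be illustrated with a small sketch in which $\Phi$ is computed by iterating a per-epoch update T times, as a difference equation is iterated to its solution. The update rule here is a hypothetical stand-in for motor babbling, not the paper's rule: each epoch the organism tries its two neighbouring phenotypes and moves to the better one only if it earns more reward. The reward values, `step`, `phi`, and `equivalence_classes` are all illustrative assumptions.

```python
"""Toy sketch of Equations (1)-(2): Phi maps an initial phenotype P_0 to a
final phenotype P_T by iterating a per-epoch update. The update rule is a
hypothetical stand-in for reward-driven babbling, not the paper's rule."""
from collections import defaultdict

# Hypothetical reward (approximate fitness) over integer phenotypes 0..12.
REWARD = [1.0, 0.6, 0.2, 0.6, 2.0, 1.2, 0.2, 1.2, 3.0, 1.8, 0.2, 1.8, 4.0]


def step(p):
    """One epoch: try the neighbouring phenotypes and move to the better one
    if it earns more reward than staying put (max() breaks ties toward the
    lower phenotype)."""
    candidates = [q for q in (p - 1, p + 1) if 0 <= q < len(REWARD)]
    best = max(candidates, key=lambda q: REWARD[q])
    return best if REWARD[best] > REWARD[p] else p


def phi(p0, T):
    """P_T = Phi(P_0, R_T): iterate the per-epoch update for T epochs."""
    p = p0
    for _ in range(T):
        p = step(p)
    return p


def equivalence_classes(T):
    """Group initial phenotypes by the final phenotype they converge to."""
    classes = defaultdict(list)
    for p0 in range(len(REWARD)):
        classes[phi(p0, T)].append(p0)
    return dict(classes)


print(equivalence_classes(T=5))
```

With enough epochs, thirteen initial phenotypes collapse into four $\Phi$-equivalence classes, one per local reward maximum, so selection effectively acts on the classes rather than on the original phenotypes. With too few epochs (for example, `phi(2, 1)` returns 1 rather than 0), phenotypes have not yet converged, which is why the argument requires phenotypic change under $\Phi$ to be fast relative to the life history.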

#### 4.1.1. Information Theory and Phenotype Reshaping

Now we turn to the reshaping function for Equation (2) that we wish to characterize. We call it the info-theoretic reshaping function $\Phi\_i$. For the communication game and all other background details about information theory, refer to the Mathematical Background section. Assume that $P\_0$ for the organism is defined by motor parameters for the initial instincts. Each choice of motor parameters corresponds to a way of encoding an abstract class c ∈ C of functions that the animal may perform in its niche (with C being the set of all such classes). Posit that the organism is trying to communicate to its environment, through its instincts, which class c of function it would like to perform in the environment. Now consider that the environment, during the organism's life history, is trying to decode the class c of functions corresponding to the motor parameters, and that the appropriateness of those functions for the current environment determines fitness. Then assume that the motor parameters are subjected to additional Gaussian noise in their execution (for instance, noise due to wear and tear, or thermal noise). For the model that gives rise to the reshaping function $\Phi\_i$, we merely assume that communication between the organism and the environment is close to optimal: the organism communicates to the environment, for the purpose of natural selection, the class c of ecologically relevant functions it wishes to perform, near-optimally in both communication rate and accuracy. What kind of strategy should the organism then use? The communication game over the Gaussian channel gives us the answer.

Recall that the proposed encoding function for the Gaussian channel was based on a random Gaussian codebook. For the near-optimal code, the n motor parameters θ encoding the class c are therefore chosen at random according to a Gaussian distribution (with zero mean and variance ω − ε, as in the communication game). Each randomly chosen set of motor parameters θ gives a way of instinctually executing the corresponding function class c. Such a code is similar in spirit to models found in neural coding theory (Pouget et al., 2000), but here we view motor parameters (and the neural populations that code for them) as encoding more abstract functions than, for example, saccades in response to motion direction (Shadlen and Newsome, 2001). More abstract neural codes can be found, for instance, in the literature on value coding in LIP (Platt and Glimcher, 1999). The optimal decoding mechanism, according to information theory, is given by the nearest-codeword Gaussian channel decoder used by Bob in the communication game (if we rule out decoded codewords that give rise to decoding errors by having power greater than ω). How can we model this decoding mechanism when the environment is trying to decode which function class a noisy set of executed motor parameters belongs to, and how appropriate it is, for the sake of natural selection?
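A minimal Python sketch of the random-codebook scheme described above may help. It draws one Gaussian codeword per function class, perturbs a codeword with additive Gaussian execution noise, and decodes by nearest Euclidean distance. The number of classes, the dimension n, both variances and all function names are illustrative assumptions, and the power check that rules out codewords with power greater than ω is omitted.

```python
import math
import random


def make_codebook(num_classes, n, variance, seed=0):
    """Draw one random codeword of n motor parameters per function class c.

    Every coordinate is i.i.d. Gaussian with zero mean and the given variance
    (playing the role of omega - epsilon in the communication game).
    """
    rng = random.Random(seed)
    sd = math.sqrt(variance)
    return [[rng.gauss(0.0, sd) for _ in range(n)] for _ in range(num_classes)]


def execute_with_noise(theta, noise_variance, rng):
    """Add i.i.d. Gaussian execution noise (e.g. wear and tear) to each parameter."""
    sd = math.sqrt(noise_variance)
    return [x + rng.gauss(0.0, sd) for x in theta]


def decode(theta, codebook):
    """Nearest-codeword decoder: index c of the codeword closest to theta."""
    return min(range(len(codebook)), key=lambda c: math.dist(theta, codebook[c]))


if __name__ == "__main__":
    rng = random.Random(1)
    codebook = make_codebook(num_classes=8, n=50, variance=1.0)
    trials = 200
    correct = 0
    for _ in range(trials):
        c = rng.randrange(len(codebook))
        noisy = execute_with_noise(codebook[c], noise_variance=0.1, rng=rng)
        correct += decode(noisy, codebook) == c
    print(f"decoded correctly: {correct}/{trials}")
```

With the noise variance small relative to the codeword variance, random codewords in 50 dimensions are far apart compared with the noise, so decoding is almost always correct.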

The decoding mechanism requires a suitable fitness function, defined according to the initial random choice of motor parameters θ<sup>c</sup> encoding each class c. For a near-optimal decoder, the fitness of a set of motor parameters θ can be set to decrease with the Euclidean distance D(θ, θ<sup>c</sup>) between θ and θ<sup>c</sup>, where θ<sup>c</sup> is the nearest codeword. In words, the closer one is to the nearest codeword θ<sup>c</sup>, the higher one's fitness. Note also that every motor parameter setting corresponds to an abstract class c of function for the organism: the class whose codeword θ<sup>c</sup> is closest in Euclidean distance in the space of motor parameters. A suitable fitness function for instincts is thus:

$$f(\theta) = \max\_{c \in C} \exp\left(-D(\theta, \theta^c)\right)\tag{3}$$

which is a special kind of Gaussian fitness function, as introduced by Fisher (Fisher, 1999; Martin and Lenormand, 2006, 2015).
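Equation (3) translates almost directly into code. In this sketch (an illustration, not the author's implementation) the codebook is a hypothetical list of parameter vectors and `math.dist` supplies the Euclidean distance D. The fitness is exp(−D) evaluated at the nearest codeword, so it equals 1 exactly at a codeword and decays with distance from it.

```python
import math


def fitness(theta, codebook):
    """Fitness exp(-D(theta, theta^c)) for the nearest codeword theta^c.

    Taking the largest exp(-D) over all codewords is the same as using the
    smallest Euclidean distance D, so fitness is 1 at a codeword and decays
    smoothly with distance from it.
    """
    nearest_distance = min(math.dist(theta, w) for w in codebook)
    return math.exp(-nearest_distance)


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [3.0, 4.0]]  # hypothetical two-class codebook
    for theta in ([0.0, 0.0], [1.0, 0.0], [1.5, 2.0], [3.0, 4.0]):
        print(theta, round(fitness(theta, codebook), 4))
```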

Given the fitness model, we can now define Φ<sub>i</sub>. Let Φ<sub>i</sub> be a function which, when given an increasing sequence R<sub>T</sub> of reward values, outputs a phenotype that gives at least the same reward as the last value of R<sub>T</sub>. Any kind of dynamics that increases R<sub>T</sub> with respect to the phenotype could do this (for example, multiplicative weight updates, Arora et al., 2012; gradient ascent, Boyd and Vandenberghe, 2004; etc.). Then, by Equation (2), such a Φ<sub>i</sub> combined with the fitness function of Equation (3) will output a P<sub>T</sub> such that f(P<sub>T</sub>) > f(P<sub>T−1</sub>), where P<sub>T−1</sub> = Φ<sub>i</sub>(P<sub>0</sub>, R<sub>T−1</sub>) and R<sub>T−1</sub> is the reward history R<sub>T</sub> excluding its last element. Therefore, since the output phenotype increases the fitness at each iteration, for some T, P<sub>T</sub> will be a local maximum of the fitness. Since the local maxima of the fitness in Equation (3) are exactly the codewords, this means that for some T, P<sub>T</sub> = θ<sup>c</sup>, the codeword of the original class of P<sub>0</sub> = θ.

For dynamics that increase the reward R<sub>T</sub> over time, all of the phenotypes that lead to the same local maximum of the fitness are Φ-equivalent. Therefore the Φ<sub>i</sub>-equivalent phenotypes are those closest to the same codeword θ<sup>c</sup>, according to the definition of the fitness function (Equation 3). So the Φ<sub>i</sub>-equivalent phenotypes are all parametrized by θ<sup>c</sup>, and we denote them by P<sub>θ<sup>c</sup></sub>.
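The argument that Φ<sub>i</sub> climbs to θ<sup>c</sup>, and thereby partitions phenotypes into equivalence classes, can be illustrated with one of the dynamics mentioned above: gradient ascent. The sketch below is hypothetical. It takes the reward to be the fitness of Equation (3) itself, moves a fixed step length along the gradient direction (which points from θ toward its nearest codeword), and groups starting phenotypes by the codeword they end at. The codebook, step size and function names are illustrative assumptions.

```python
import math


def nearest_codeword(theta, codebook):
    """Codeword theta^c closest to theta in Euclidean distance."""
    return min(codebook, key=lambda w: math.dist(theta, w))


def reshape(theta, codebook, step=0.1, max_iter=10_000):
    """Hypothetical Phi_i: normalized gradient ascent on f = exp(-D(theta, theta^c)).

    The gradient of f points from theta straight at its nearest codeword, so
    each iteration moves a fixed distance `step` in that direction and snaps
    onto the codeword once it is within one step. Returns the final phenotype
    P_T and the fitness recorded at every iteration.
    """
    theta = list(theta)
    history = []
    for _ in range(max_iter):
        target = nearest_codeword(theta, codebook)
        d = math.dist(theta, target)
        history.append(math.exp(-d))
        if d == 0.0:
            break
        if d <= step:
            theta = list(target)
        else:
            theta = [x + step * (t - x) / d for x, t in zip(theta, target)]
    return tuple(theta), history


def equivalence_classes(phenotypes, codebook):
    """Group initial phenotypes P_0 by the codeword their reshaping converges to."""
    classes = {}
    for p0 in phenotypes:
        final, _ = reshape(p0, codebook)
        classes.setdefault(final, []).append(tuple(p0))
    return classes


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [5.0, 5.0]]  # hypothetical codewords
    starts = [[1.0, 0.5], [-0.3, 0.2], [4.0, 6.0]]
    for codeword, members in equivalence_classes(starts, codebook).items():
        print(codeword, members)
```

Because each step moves straight toward the current nearest codeword, the fitness history is strictly increasing, and phenotypes in the same nearest-codeword region end at the same θ<sup>c</sup>.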

Before proposing our model specifically, we present some background on play and its role in child development. The primary phenomenon we introduce is body babbling. Body babbling was introduced by Meltzoff and Moore (1997) to account for the process by which babies move in a free and non-directed way in order to develop deliberate reaching at the age of 8 months. As defined by Meltzoff and Moore (1997),

"In body babbling, infants move their limbs and facial organs in repetitive body play analogous to vocal babbling. In the more familiar notion of vocal babbling the muscle movements are mapped to the resulting auditory consequence; infants are learning this articulatory–auditory relation. Our notion of body babbling works in the same way, a principal difference being that the process can begin in utero. What is acquired through body babbling is a mapping between movements and the organ-relation end states that are attained."

In particular, we propose the following Baldwin-inspired model of motor babbling during play (called body babbling by Meltzoff and Moore, 1997).

We propose a Φ<sub>i</sub>-approximate reshaping function based on the Multiplicative Weight Updates Algorithm (MWUA) (Arora et al., 2012). MWUA selects one of k different experts, choosing experts with higher probability when their advice leads to higher reward. The reward for expert i at time t is r<sub>i</sub><sup>(t)</sup>. Note that the reward r<sub>i</sub><sup>(t)</sup> is a function that varies across applications of MWUA; for the application in this paper it is specified by Equation (5). The probability p<sub>i</sub><sup>(t+1)</sup> assigned to expert i at time t + 1 is given by:

$$p\_i^{(t+1)} = p\_i^{(t)} \frac{1 + \eta r\_i^{(t)}}{\sum\_j p\_j^{(t)}(1 + \eta r\_j^{(t)})} \tag{4}$$

with η > 0 being the learning rate. When η is small, experts are chosen more based on long-term increases in reward, and when it is large they are chosen based on immediate reward.

The particular model proposed for a Φ<sub>i</sub>-approximate reshaping function takes the rewards r<sub>i</sub><sup>(t)</sup> for MWUA to be the reward expected from exploratory play during some motor or navigation task. Each "expert" is a motor behavior. The reward function is assumed to be as follows:

$$r\_i^{(t)} = -\arg\min\_c \exp(i - c)^2\tag{5}$$

where c is a particular target motor behavior that is the "goal" of the exploratory play, defined as the nearest local maximum, on the fitness function of Equation (3), to the instinctual initial motor behavior (the motor behavior with the highest probability p<sub>i</sub><sup>(t)</sup> at time t = 0). It is a property of MWUA that it converges in linear time T to the expert i that maximizes the cumulative reward Σ<sub>t=1</sub><sup>T</sup> r<sub>i</sub><sup>(t)</sup> (Arora et al., 2012). Therefore our model of motor play is a Φ<sub>i</sub>-approximate reshaping function: the motor behavior P<sub>T∗</sub> that the MWUA model converges to is the same as the local maximum, and thus satisfies the criteria for an approximate information-theoretic reshaping function Φ<sub>i</sub>.
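The MWUA dynamics of Equations (4) and (5) can be simulated in a few lines. This sketch makes assumptions that are not in the text: rewards are held fixed over time, the instinctual behavior is given a hypothetical 3:1 head start in initial probability, and Equation (5) is replaced by the bounded Gaussian reward exp(−(i − c)<sup>2</sup>). Like Equation (5) this reward peaks at the target behavior c, but it stays in (0, 1], so every factor 1 + ηr<sub>i</sub><sup>(t)</sup> remains positive. Behavior indices, target, instinct and step counts are illustrative.

```python
import math


def mwua_update(p, rewards, eta):
    """Equation (4): multiply each p_i by (1 + eta * r_i), then renormalize."""
    weights = [pi * (1.0 + eta * ri) for pi, ri in zip(p, rewards)]
    total = sum(weights)
    return [w / total for w in weights]


def motor_play(num_behaviors, target, instinct, eta=0.1, steps=200):
    """Run MWUA over motor behaviors 0..num_behaviors-1 toward a target c.

    The reward exp(-(i - c)^2) is a hypothetical bounded stand-in for
    Equation (5): it is largest at i = c but lies in (0, 1].
    """
    rewards = [math.exp(-((i - target) ** 2)) for i in range(num_behaviors)]
    # At t = 0 the instinctual behavior is the most probable one (assumed 3:1).
    weights = [3.0 if i == instinct else 1.0 for i in range(num_behaviors)]
    total = sum(weights)
    p = [w / total for w in weights]
    for _ in range(steps):
        p = mwua_update(p, rewards, eta)
    return p


if __name__ == "__main__":
    for eta in (0.01, 0.1):
        p = motor_play(num_behaviors=11, target=5, instinct=3, eta=eta)
        print(f"eta={eta}: P(target) = {p[5]:.3f}")
```

With η = 0.1 the probability mass moves from the instinctual behavior to the target within a few hundred updates; with η = 0.01 the distribution is still spread out after the same number of steps.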

Many useful models of body babbling have been proposed recently in the robotics literature (Lee, 2011), making new advances in solving the inverse problems involved in motor planning (Rolf et al., 2010) and in sensorimotor representation (Law et al., 2013). We view our model as a simplified model of body babbling that allows us to ask what impact body babbling has on the evolution of animals that engage in it, and of robots that combine genetic algorithms with body babbling.

Motor and object play (Smith, 2010) are relevant to us, since they are a set of open-ended, non-goal-directed actions, like those that would be found in the MWUA model with medium or low values of the learning rate η (for motor behaviors involving arm movement or object manipulation). Also, the MWUA model assumes there are internal reward signals associated with different motor behaviors, and that behaviors closer to goal-directed ones are internally rewarded. Lee (2011) likewise emphasizes the importance of internal rewards and intrinsic motivation as a way to model play, with the latter originally introduced by Furth (1969).

Sensorimotor development happens in stages, which set progressively harder learning problems to solve. Past algorithmic approaches have used such stage-wise sensorimotor constraints to model infant development during play, and their models learned using appropriate constraints (Law et al., 2013). We have considered only a single stage of constraints (modeled by the nearest goal-directed action), corresponding to one round of phenotype reshaping; with multiple stages, there could be multiple rounds of phenotype reshaping for complex goals.

Now we turn to the impact of phenotype reshaping using Φ<sub>i</sub> on the rate of evolution.

#### 4.1.2. Impact on Population Genetics

As reviewed in the Introduction, there is currently no mathematical proof that something like the learning-based Baldwin Effect (as in Hinton's work) can speed up evolution in non-trivial ways. In this section we show that under the phenotype-reshaping account of the Baldwin Effect we can prove that there is a significant speed-up in the evolution of a complex trait. Phenotype reshaping can speed up evolution by effectively removing the stochastic element of crossing fitness valleys in a rugged fitness landscape. The mechanism for this is to reshape the fitness valley so that it increases in fitness to that of the local maximum, effectively just leaving a series of plateaus of ever-increasing fitness (see **Figure 1**).

Normally, in the evolution of complex traits, there are three regimes: the fast, near-deterministic regime of evolving a trait with greater fitness after a single mutation; the intermediate regime of evolving a trait after a few steps of neutral or near-neutral evolution; and the slow, stochastic regime of evolving a beneficial trait that requires a large decrease in fitness as an intermediate step to achieving the larger fitness (Weissman et al., 2009). For small population sizes, the first is called a beneficial mutant, the second is called sequential neutral fixation, and the third (for a k-allele gene) is called a beneficial k-mutant that results from sequential deleterious fixation. Sequential fixation regimes are so named because they involve small populations that have to sequentially fix intermediate mutations along the way to the final, beneficial mutation.

If the difference in fitness between the original phenotype and the beneficial mutation is s, then the beneficial mutant arises in γ/s time on average, where γ = 0.577 . . . is Euler's constant (Desai and Fisher, 2007). For k = 2, assume that the population size N is small, that a single mutation reduces the fitness by a large amount δ, and that the double mutant increases the fitness by s (the deleterious sequential fixation regime). Then, with all mutation rates equal to µ, a beneficial double mutant takes approximately 1/(Nµρ<sub>1</sub>) time to arise on average, where ρ<sub>1</sub> = (e<sup>δ</sup> − 1)/(e<sup>Nδ</sup> − 1). One can also consider a situation in which there are almost-neutral (δ < 1/N) or neutral intermediate phenotypes whose combination yields a beneficial complex trait. In this setting, with k = 2 necessary mutations, the time for a beneficial mutant to arise is approximately 1/µ on average; this regime is called neutral sequential fixation (Weissman et al., 2009). According to Weissman et al. (2009), the deleterious sequential fixation regime takes more time to produce a beneficial double mutant than the neutral sequential fixation regime, due to negative selection on the intermediate mutant.
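The waiting-time expressions above are easy to compare numerically. The functions below transcribe them directly (γ/s, 1/(Nµρ<sub>1</sub>) and 1/µ); the function names and the parameter values in the usage lines are hypothetical. As a sanity check, Nρ<sub>1</sub> tends to 1 as δ → 0, so the deleterious-regime waiting time approaches the neutral one, 1/µ.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, gamma = 0.577...


def rho1(delta, N):
    """rho_1 = (e^delta - 1) / (e^(N delta) - 1); tends to 1/N as delta -> 0."""
    if delta == 0:
        return 1.0 / N
    return math.expm1(delta) / math.expm1(N * delta)


def wait_beneficial_mutant(s):
    """Single beneficial mutation with fitness advantage s: gamma / s."""
    return EULER_GAMMA / s


def wait_deleterious_sequential(N, mu, delta):
    """Double mutant through a deleterious intermediate: 1 / (N mu rho_1)."""
    return 1.0 / (N * mu * rho1(delta, N))


def wait_neutral_sequential(mu):
    """Double mutant through a neutral intermediate: 1 / mu."""
    return 1.0 / mu


if __name__ == "__main__":
    N, mu = 10, 1e-6  # hypothetical population size and mutation rate
    for delta in (0.0, 0.1, 0.5, 1.0):
        ratio = wait_deleterious_sequential(N, mu, delta) / wait_neutral_sequential(mu)
        print(f"delta={delta}: deleterious/neutral waiting time = {ratio:.1f}")
```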

Phenotype reshaping puts all steps of complex trait evolution into the beneficial mutant or sequential neutral fixation regimes, with each step reducing to a single mutation or a set of neutral steps to a beneficial mutant (traveling from the neighborhood of one local maximum to an adjacent neighborhood in one step). Effectively, evolution within a neighborhood of a local maximum is neutral, so an individual on the boundary can arise and then cross over without any delay. In contrast, more time is required for the evolution of a beneficial k-mutant in the sequential deleterious regime, due to negative selection on the intermediates as they arise sequentially. The next section gives a worked-out analysis of the speed-up for a simple example fitness landscape.

#### 4.1.3. Phenotype Reshaping's Impact on the Speed of Evolution for a Simple Example

The model of the last section, information-theoretic phenotype reshaping (using Φ<sub>i</sub>), is applied here as a means of speeding up evolution on a specific example fitness landscape. We show on a simple example that if information-theoretic phenotype reshaping is used during the lifetimes of individuals in the population, it can effectively flatten bumps in the fitness landscape and thus avoid fitness valleys (which slow down evolution), so that evolution speeds up.

The following example is not meant to model the genetic basis of behavioral traits or of learning in general. It is a simple model that serves as a proof of concept: for behavioral traits that involve rugged fitness landscapes and information-theoretic phenotype reshaping, one can find a simple genetic mechanism, based on population genetics, that speeds up evolution considerably.

Consider a fitness landscape in which each phenotype is a bit-string of length k. For example, for k = 5 a phenotype would be 01101. All but one phenotype x will be either an optimal type, with fitness Σ<sub>i</sub> x<sub>i</sub>, or a suboptimal type, with fitness (1 − c)(1 + Σ<sub>i</sub> x<sub>i</sub>)/e, where c is a positive constant. This is a special case of the fitness function proposed in the last section (Equation 3), with a Hamming distance in place of the Euclidean distance; the fitness function would behave similarly, without loss of generality, with a Euclidean distance. The optimal phenotypes are the even bit-strings and the suboptimal ones are the odd bit-strings (formally, a bit-string is even if its number of 1's is even, and odd if its number of 1's is odd).
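The parity structure of this landscape can be checked by brute force for small k. The sketch below enumerates all bit-strings of length k, labels each by the parity of its number of 1's, and confirms that a single bit flip always moves between the optimal and suboptimal classes. It covers only the parity structure, not the fitness values, and the helper names are illustrative.

```python
from itertools import product


def parity_class(x):
    """'optimal' for an even number of 1's, 'suboptimal' for an odd number."""
    return "optimal" if sum(x) % 2 == 0 else "suboptimal"


def hamming(a, b):
    """Number of positions at which two bit-strings differ."""
    return sum(ai != bi for ai, bi in zip(a, b))


def neighbors(x):
    """All bit-strings exactly one bit flip away from x."""
    return [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(len(x))]


if __name__ == "__main__":
    k = 5
    space = list(product((0, 1), repeat=k))
    print(parity_class((0, 1, 1, 0, 1)))  # the example 01101 has three 1's
    # Every single flip changes parity, so optimal phenotypes are separated
    # by suboptimal ones and the closest pair of optima is two flips apart.
    assert all(parity_class(n) != parity_class(x) for x in space for n in neighbors(x))
    optima = [x for x in space if parity_class(x) == "optimal"]
    print(len(optima), min(hamming(a, b) for a in optima for b in optima if a != b))
```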

For the analysis, the relative fitness is used rather than the fitness as-is (this renormalizes the fitness of the optimal phenotypes to 1 and divides the fitness of a suboptimal phenotype by the original fitness of the optimal phenotype). After phenotype reshaping with Φ<sub>i</sub> as described above, evolution takes at least k/µ time on average, since it happens in the sequential neutral fixation regime for double mutants and only k such steps are necessary. The beneficial mutants are two mutations away because an even bit-string and any adjacent odd bit-string differ in exactly one bit, so the Hamming distance between two adjacent optimal phenotypes is two (one bit flip from the optimal phenotype to a suboptimal one, and another from the suboptimal phenotype to the next optimal one). Along the same lines, in the deleterious sequential fixation regime the double mutant will take at least k/(Nµρ<sub>1</sub>) time on average to arise, where N is the population size and ρ<sub>1</sub> is as above, with δ = 1 − (1 − c)/e. For N = 2, the waiting time per double-mutant step without phenotype reshaping simplifies to (1/(2µ))(1 + exp[(c + e − 1)/e]), which increases exponentially in c. Comparing the two, the rate of evolution with phenotype reshaping is independent of the depth δ of the fitness valley, whereas the ordinary evolutionary rate slows down exponentially as a function of δ (see **Figure 2**). Analytically, the difference between the two regimes' waiting times is (k/(2µ))(exp[(c + e − 1)/e] − 1), which grows exponentially in c. Phenotype reshaping therefore has a large impact on the speed of evolution in this simple example, and for these biologically realistic parameter settings the effect grows linearly in the dimensionality k of the phenotype space.
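The closed forms for this example can be checked numerically. This sketch evaluates k/µ and k/(Nµρ<sub>1</sub>) with δ = 1 − (1 − c)/e = (c + e − 1)/e. For N = 2, ρ<sub>1</sub> reduces to 1/(e<sup>δ</sup> + 1), so the waiting time without reshaping is (k/(2µ))(1 + e<sup>δ</sup>) and its gap to k/µ is (k/(2µ))(e<sup>δ</sup> − 1). The values of k, µ and c are illustrative assumptions, with c kept below 1 so that the suboptimal fitness stays positive.

```python
import math


def waiting_times(k, mu, c, N=2):
    """Waiting times (with, without) phenotype reshaping for the bit-string example.

    delta = 1 - (1 - c)/e is the relative-fitness cost of a suboptimal
    intermediate. With reshaping, each of the k steps is neutral: k / mu.
    Without it, each step goes through a deleterious intermediate:
    k / (N mu rho_1) with rho_1 = (e^delta - 1) / (e^(N delta) - 1).
    """
    delta = 1 - (1 - c) / math.e
    rho1 = math.expm1(delta) / math.expm1(N * delta)
    return k / mu, k / (N * mu * rho1)


if __name__ == "__main__":
    k, mu = 5, 1e-3  # hypothetical phenotype length and mutation rate
    for c in (0.1, 0.5, 0.9):
        with_reshaping, without = waiting_times(k, mu, c)
        print(f"c={c}: with={with_reshaping:.0f}, without={without:.0f}")
```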

#### 5. DISCUSSION

The Baldwin Effect is probably one of the most multifarious topics in Evolution. In this paper the many divergent interpretations were reviewed and a novel one was proposed. It seems that much of the literature has under-estimated the Baldwin Effect, due to the over-emphasis on genetics and the under-emphasis on developmental psychology. The role of abstraction over phenotypes in particular has been left out of most accounts of Baldwin's work on genetics.

Moreover, a phenotypic plasticity and abstraction-based account of the Baldwin Effect has other benefits. Notably, using some insights from Baldwin for the impact of play on evolution we were able to show the first Baldwin Effect-induced dramatic speedup of evolution on a fitness landscape using ordinary population genetics. The result is a notable improvement over prior work, which was based on neural networks theory and genetic algorithms and did not show dramatic improvement over ordinary evolution.

The most salient aspect of the Baldwin Effect we did not touch on in great detail was its emphasis on consciousness and its impact on genetics. We attempted to interpret what Baldwin meant by these effects by emphasizing the role of abstraction. There is nonetheless a gap between abstraction and what Baldwin seems to mean by consciousness, since he says that reason "ratifies" the moves proposed by genetics, and attention also has a role. But most of all Baldwin emphasizes the role of conscious experience in first-person control of innovation and behavior, and we have not explored those in any detail. It would be fascinating to explore the role of conscious experience in the Baldwin Effect in more detail.

Abstraction-based accounts of reason, though, are very old indeed, and go back all the way to Aristotle (2015) and Aquinas (1947). In addition, there is a rich tradition of abstraction-based structures informing the origin of biological innovations in the medieval literature on the scala naturae (Lovejoy, 2011). The scala naturae posits a set of major transitions based on new abstractions introduced at ever-higher "rungs" of the ladder (with each rung being a kind of organism, for instance animals with sentience or plants with the ability to grow and self-repair). Along these lines, recent work has tried to find a rapprochement between Piaget's stages of child development and new formulations of the Baldwin Effect (Burman, 2013).

James Mark Baldwin was a pioneer in Evolution, but his primary advance was to explore the effect of developmental psychology on biological theory and function. Perhaps the most important work yet to be done is to bring more recent theory from developmental psychology (such as Gopnik's work on Bayesian theory; Gopnik et al., 2004; Gopnik and Tenenbaum, 2007) to bear on genetics. Such an update of the Baldwin Effect would be an interesting and natural direction left open by this work.

#### AUTHOR CONTRIBUTIONS

EC ran experiments, did analysis, and wrote the manuscript.

#### ACKNOWLEDGMENTS

Thanks to Lee Altenberg for early discussions on this paper, to Austin Choate for feedback on the early stages of this work, and to Nina Fefferman for guidance in the process of writing the manuscript.

#### Chastain Development and the Baldwin Effect

#### REFERENCES


Smith, P. (2010). Children and Play: Understanding Children's Worlds. Chichester


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Chastain. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

fpsyg-09-01939 October 10, 2018 Time: 14:47 # 1

# Telling Apart Motor Noise and Exploratory Behavior, in Early Development

#### Teodora Gliga<sup>1,2</sup>\*

<sup>1</sup> School of Psychology, University of East Anglia, Norwich, United Kingdom, <sup>2</sup> Centre for Brain and Cognitive Development, Birkbeck, University of London, London, United Kingdom

Infants' minutes-long babbling bouts, or their repetitive reaching for or mouthing of whatever they can get their hands on, give very much the impression of active exploration, a building block for early learning. But how can we tell apart active exploration from the activity of an immature motor system, attempting but failing to achieve goal-directed behavior? I will focus here on evidence that infants increase motor activity and variability when faced with opportunities to gather new information (about their own bodies or the world) and propose this as a guiding principle for separating variability generated for exploration from noise. I will discuss mechanisms generating movement variability, and suggest that, in the various forms it takes, from deliberate hypothesis testing to increasing environmental variability, it could be exploited for learning. However, understanding how variability in motor acts contributes to early learning will require more in-depth investigations of both the nature and the contextual modulation of this variability.

#### Edited by:

Kathy Hirsh-Pasek, Temple University, United States

#### Reviewed by:

Floris Tijmen Van Vugt, McGill University, Canada

Jean-Baptiste Leca, University of Lethbridge, Canada

#### \*Correspondence:

Teodora Gliga T.Gliga@uea.ac.uk

#### Specialty section:

This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology

Received: 21 December 2017 Accepted: 20 September 2018 Published: 12 October 2018

#### Citation:

Gliga T (2018) Telling Apart Motor Noise and Exploratory Behavior, in Early Development. Front. Psychol. 9:1939. doi: 10.3389/fpsyg.2018.01939

Keywords: variability, infants, reaching, vocal behavior, exploration

### INTRODUCTION

Watching infants move about, interact with objects or attempt to communicate, one cannot but observe the great variability of their motor acts. A 7-month-old's repeated banging of an object on a hard surface takes many trajectories. From one movement to the next, she might be grasping the object differently, as if exploring both the motor affordances of the object and her own motor abilities. Exploratory behavior, or the focused investigation children engage in as they get familiar with new environments (Hughes, 1978; Zosh et al., 2018), was proposed as a driving force for learning. However, deciding whether variability in motor acts is actively produced to serve learning is not straightforward. Gibson, in her 1988 monograph, had already noted the difficulty of interpreting early motor acts. While suggesting that "The active obtaining of information that results from the spontaneous actions of the infant is a kind of learning," she also raised the question of whether "this activity is in any way controlled by the infant," rather than a "compulsory response to stimulation" (Gibson, 1988). This is further complicated by the fact that variability in motor behavior has often been described as the manifestation of an immature system, which is attempting but failing to achieve goal-directed behavior (e.g., Yan et al., 2000). In the adult skill acquisition literature as well, movement variability is an index of error or noise in sensory-motor systems, something the organism strives to eliminate as a new skill is acquired (Harris and Wolpert, 1998; Todorov and Jordan, 2002). Understanding under which conditions variability reflects exploration rather than noise is critical both for those interested in identifying and intervening in atypical development and for those invested more generally in creating environments that offer opportunities for learning. This review aims to create a framework for the study of early motor acts as exploratory behavior.

I will start by reviewing evidence suggesting that increased motor variability is not always a manifestation of an impaired or immature motor system, since (1) variability sometimes increases in development and (2) decreased, not increased, variability has been documented in certain developmental disorders. I suggest (joining others, e.g., Thelen and Smith, 1994) that variability is upregulated when an organism faces new learning opportunities. I will propose this as a defining principle that sets apart variability as exploration from variability that is simply noise. This will be supported by evidence for (3) an increase in the amount and variability of motor output with information availability, in experimental situations, and for (4) direct associations between variability in motor activity and learning outcomes. I will then move on to discussing the mechanisms that might support the upregulation of motor variability and those linking variability to learning. Given that the aim of this review is to illustrate a general principle, evidence will be drawn from a variety of motor acts (reaching, locomotion and vocal behavior) and from a variety of species. Variability that supports exploration of an organism's own motor abilities and variability that supports exploration of the surrounding environment will both be considered. Finally, I will ask which of these mechanisms might be at play in early human development.

### Measuring Variability

This review does not draw on a rich literature. Although many studies have characterized the amount of movement or the types of movements infants produce, few have focused on the manner in which acts are realized, for example on the variability in acceleration, trajectory or the combination of articulators (**Figure 1**). Even fewer have investigated this variability as exploratory behavior. Those who have done so have sometimes distinguished between variability, calculated, for example, as the sum of the variance at each point in the path of a reaching hand (Wu et al., 2014), and complexity, which takes into account the temporal dimension of this variation, such as the amount of repetition of the same type of sway movement when standing (Dusing et al., 2013). It remains unknown which of these measures better captures variation targeted at exploration. By inquiring into the putative neural mechanisms generating variability, this review hopes to offer guiding methodological and theoretical principles that can fuel a new avenue of investigation.
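One way to compute a variability measure of the kind mentioned above (the sum of the variance at each point in the path of a reaching hand) is sketched below. It assumes the reaches have already been time-normalized to the same number of samples and sums the population variance across reaches over every coordinate and time point; the exact normalization and variance definition used by Wu et al. (2014) may differ, and the function name is illustrative.

```python
from statistics import pvariance


def path_variability(reaches):
    """Sum over time points of the across-reach positional variance.

    `reaches` is a list of trajectories, each a list of (x, y, ...) samples.
    All trajectories are assumed to be time-normalized to the same length.
    """
    n_points = len(reaches[0])
    if any(len(r) != n_points for r in reaches):
        raise ValueError("trajectories must be time-normalized to equal length")
    n_dims = len(reaches[0][0])
    total = 0.0
    for t in range(n_points):
        for d in range(n_dims):
            # Variance, across reaches, of coordinate d at time point t.
            total += pvariance([r[t][d] for r in reaches])
    return total


if __name__ == "__main__":
    straight = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
    curved = [(0.0, 0.0), (1.0, 3.0), (2.0, 2.0)]
    print(path_variability([straight, straight]))  # identical reaches
    print(path_variability([straight, curved]))
```

A complexity measure would additionally need the temporal ordering of the samples within each trajectory, which this sum of pointwise variances deliberately ignores.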

### EVIDENCE THAT VARIABLE BEHAVIOR REFLECTS EXPLORATION

#### Increased Variability at Key Points in Development

If variability is a nuisance, then we would expect development to always proceed from more variable to less variable behavior. Conversely, observing increasing variability at certain moments in development might point to it having a functional role. Dynamic systems accounts of development have already highlighted the need for transitions between stability and variability whenever new skills emerge (Thelen and Smith, 1994). Increased variability has been observed at various points in development. Motor activity starts early in fetal development. Fetal movements are varied and structured; rodents, for example, exhibit coordinated motor patterns antecedent to postnatal locomotion, suckling, maternal–infant communication and grooming behavior (Robinson and Smotherman, 1992). This activity decreases toward 40 weeks after gestation, whether the pup is born at term or pre-term, suggesting that this decrease does not reflect space limitation toward the end of the pregnancy but a pre-programmed pattern of up- and down-regulating variability (Robinson and Smotherman, 1992). Indeed, after birth, although the newborn must cope with the constraints of gravity, there is an increase in the variability of movements. In human infants, we see the emergence of writhing general movements (Prechtl, 1993). These variable sequences of arm, leg, neck and trunk movements, often with slight changes in the direction of movement, "make the movements fluent and elegant and create the impression of complexity and variability" (Prechtl, 1993). An increase in combinatorial variability, in terms of a decrease in the locking of movement of different limbs, is also observed from 6 to 18 weeks (Piek et al., 2002).
Despite the repeated suggestion that an increase in variability reflects an active process of exploration, allowing the selection of the most efficient movement strategies (Edelman, 1987; Stulp and Oudeyer, 2018), this process of increasing variability followed by selection has not yet been captured in development. This limitation is most likely methodological, since new skills appear at different points in time in different infants (e.g., infants may start crawling anywhere from 6 to 12 months, and some skip this locomotor stage altogether), and capturing these transition points would require frequent sampling before and after the new skill emerges. New wearable technologies (see **Figure 2**) might make this research easier to carry out. Alternatively, one can attempt to train new skills in the lab (see further on).

### Variability in Atypical Development

Another piece of evidence in support of the idea that variability promotes development and learning comes from observations of decreased rather than increased variability of motor outputs in many pathologies of movement (e.g., Parkinson's Disease - Freund and Hefter, 1993; stuttering - Grosjean et al., 1997), but also more generally whenever development is compromised. This is the case in infants with documented brain damage, who display monotonous and more stereotypical movements that are less fluent and lack complexity (Newell et al., 1993; Prechtl, 1993). Cerebral palsy has also been linked with decreased movement variability in the first few months of life (e.g., Prechtl, 1997). Vaal et al. (2002) showed that 18- and 26-week-old infants with periventricular leukomalacia had tighter intra-limb locking during spontaneous kicking compared with infants with no evidence of brain damage. Nine-month-old high-risk preterm infants engaged in less fingering, rotation or transfer of objects, and a summary exploration score predicted cognitive functioning at 24 months (Ruff, 1984). Preterm-born infants showed decreased variability in reaching movements, both when producing distal (e.g., reaching with one or both hands) and proximal adjustments (e.g., various hand openings; Soares et al., 2014), and lower levels of exploratory movements of toys (Soares et al., 2012; Guimarães et al., 2013). Between 12 and 18 months of age, when infants start standing up unsupported, variability in the execution of this motor act is the norm: the rotation of the foot, the degree of knee flexion and hip abduction, and the foot leading the movement vary from one standing up to the next (Towen, 1993). Atypically developing infants make fewer attempts to stand up, but the most striking difference is the decreased variability of their gestures (Towen, 1993).

However, some pathologies are associated with increased variability. In Tourette's there is poor motor learning but increased variability (e.g., Draper et al., 2015). Hyperkinesia and extreme clumsiness are often observed in development and are characterized by increased variability (Towen, 1993). Towen (1993) noted that this apparently discordant evidence probably reflects a poor understanding of the mechanisms generating and making use of variability. He advanced the idea that in some pathologies it may not be the mechanisms generating variability but the selective process (of optimal motor strategies) that is impaired. Alternatively, it may be the nature of the variability, reflecting decreased exploration or increased noise, that differs between pathologies. Investigating whether variability increases, or fails to increase, in learning contexts may help tease apart these hypotheses.

### Variability Increases With Information Availability

Since exploratory behavior is behavior aimed at acquiring information, an increase in variability when new information becomes available would be a key indicator that variability indexes exploration. Indeed, when infants are engaged in reaching training regimes, they initially produce distal adjustments that increase in variability (Soares et al., 2013). Across a number of studies, introducing infants to an object with a new property increased object-directed movement and the variability of movement types. Steele and Pederson (1977) observed that 6-month-old infants increased their touching and looking behavior when introduced to an object that differed in temperature from previous ones, but no change in behavior occurred when the object changed color. Infants aged 9 to 12 months engaged in more banging when exploring objects that had a new weight, and in more rotating and transferring when exploring objects that had a new shape (Ruff, 1984). When given an object with a new texture, newborns increased the frequency of their hand pressure movements (Molina and Jouen, 2004). In another study, information content modified infants' mouthing of artificial nipples: newborns produced more variable movements (and less sucking per se) when they experienced a new nipple texture (Rochat, 1983). Rochat notes that this activity could not have been reflexive, since it was modulated in character and varied according to context.

Later in development, vocal articulators have been observed to increase in movement variability following cochlear implantation. Goffman et al. (2002) compared the stability of movement trajectories for correctly produced speech before and after implantation in a 7-year-old child. Pre-implantation, the participant had slightly higher movement variability than age-matched controls. Two and four months after implantation, variability increased further, but by 6 months post-implantation this child produced speech movements as stable as those of the controls. The authors comment that variable movements of correctly articulated speech 'may reflect a system that is being modified in response to new auditory input provided by the cochlear implant' (p. 892).

FIGURE 2 | Motion tracking systems allow precise tracking of an infant's limb or head movements. The positions of light-reflecting spheres attached to the infant's body are triangulated by a system of surrounding cameras. This system is light and does not interfere with the infant's movements.

Although these studies are compatible with the idea that variation is upregulated to help learning, it remains possible that an increase in variability in learning situations simply reflects failed attempts to achieve a new goal rather than a process of exploration. Showing that increased variability actually leads to knowledge accumulation would provide the strongest evidence for its adaptive role in development or skill acquisition. This type of evidence remains scarce.

#### Variability Leads to Learning

One of the most compelling studies linking variability in motor activity to learning had adult participants learn a new motor routine. Wu et al. (2014) measured the variability of arm trajectories before and during a motor learning task in which participants had to draw subtly curved shapes with fast arm movements. Variability during a baseline period (in which no feedback was given for tracing a model curve) was positively correlated with how close to the target curve participants got in the training phase. It was variability in a task-relevant dimension that best predicted learning. In another study, Byun et al. (2014) assessed the role of motor exploration in vocal learning and found that children enrolled in an ultrasound biofeedback intervention for /r/ misarticulation only made progress when they were allowed to try out a variety of tongue shapes for /r/, rather than being set a specific shape by the therapist. More recently, Lee et al. (2018) showed poorer learning of a new motor skill in children than in adults. Participants had to use their upper-body movements to control a cursor on a screen. The authors attributed these findings to children's limited exploration of their movement repertoire. Exploration was quantified here as the ratio between the two principal components that explained most of the variance in movement. This metric was considered to capture exploration of the two dimensions of the screen better than variation within each dimension.
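A principal-component ratio of this kind can be illustrated with a short sketch. It assumes two-dimensional cursor positions and takes the variance along the second principal component divided by the variance along the first (the two eigenvalues of the 2-by-2 covariance matrix), so that values near 1 indicate movement spread evenly over both screen dimensions and values near 0 indicate movement confined to one direction. The function name, the choice of putting the smaller eigenvalue on top, and the toy data are assumptions for illustration, not Lee et al.'s (2018) exact computation.

```python
import math
from typing import Sequence, Tuple


def pc_variance_ratio(points: Sequence[Tuple[float, float]]) -> float:
    """Second- over first-principal-component variance for 2-D positions.

    Hypothetical helper: assumes exploration is indexed by how evenly
    movement variance is spread over the two screen dimensions.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance matrix [[a, b], [b, c]]
    a = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    c = sum((y - my) ** 2 for _, y in points) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix
    half_trace = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + radius, half_trace - radius
    if lam1 == 0:
        return 0.0  # no movement at all
    # Clamp tiny negative values caused by floating-point rounding
    return max(lam2, 0.0) / lam1


square = [(0, 0), (1, 0), (0, 1), (1, 1)]   # spread evenly over both axes
line = [(t, 2 * t) for t in range(10)]       # movement along one direction only
print(pc_variance_ratio(square))             # 1.0
print(pc_variance_ratio(line))               # approximately 0
```

Using the ratio rather than the raw variance along each axis means a child who moves a lot, but always along the same diagonal, still scores low, which matches the intuition behind the metric described above.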

No study has yet shown that progress in a particular learning task is greater in infants who show more variable behavior (e.g., better discrimination of weight in infants who had shown the most variability in banging objects, or faster acquisition of reaching in infants who started off with higher reach variability). A recent study took a different approach to demonstrating this relationship, by simulating the learning of football playing under conditions of variable or non-variable walking practice. Rather than testing human infants, Ossmy et al. (2018) used robots. As predicted, training that varied in path shape, step direction, and number of steps helped teams win "RoboCup" tournaments. Although this first study did not investigate the role played by different types of variability, this approach clearly has the potential to deepen our understanding of the mechanisms linking variability to learning.

### THE MECHANISMS DRIVING VARIABILITY IN MOTOR ACTS AND ITS CONTRIBUTIONS TO LEARNING

A mechanistic understanding of how variability is actively generated may also help us identify it and understand how it supports learning. One strategy is to look at where in the nervous system variability originates. In a recent review, Dhawale et al. (2017) differentiate between planned noise (variability generated in the central nervous system) and execution noise (variability resulting from the randomness of biological processes such as spike generation and propagation, synaptic transmission, and changes in muscle proteins). However, execution noise may originate in both the central and the peripheral nervous system. Thus, variation in cortical activity does not necessarily reflect actively generated variability.
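Why the distinction between planned and execution noise matters for learning can be illustrated with a toy model. This is a minimal sketch under stated assumptions: a one-dimensional reach toward a target, where each trial adds a planned (exploratory) perturbation to the current motor plan and then execution noise that the learner cannot observe. The learner moves its plan toward the command it issued whenever the outcome beats a running reward baseline. This reward-thresholded hill-climbing rule is chosen for illustration only; it is not the model of any study cited here, and the function name and all parameter values are hypothetical.

```python
import random


def learn_reach(target: float = 10.0, trials: int = 2000,
                explore_sd: float = 1.0, muscle_sd: float = 0.3,
                lr: float = 0.2, baseline_rate: float = 0.1,
                seed: int = 1) -> float:
    """Toy 1-D reach learner with planned and execution noise (hypothetical).

    Returns the learner's motor plan (intended endpoint) after training.
    """
    rng = random.Random(seed)
    plan = 0.0                          # current intended reach endpoint
    baseline = -abs(target - plan)      # running estimate of typical reward
    for _ in range(trials):
        # Planned (exploratory) noise: the learner knows which command it issued.
        command = plan + rng.gauss(0.0, explore_sd)
        # Execution noise: added after planning and invisible to the learner.
        executed = command + rng.gauss(0.0, muscle_sd)
        reward = -abs(target - executed)
        # Credit the known command only when the outcome beats the baseline.
        if reward > baseline:
            plan += lr * (command - plan)
        baseline += baseline_rate * (reward - baseline)
    return plan


print(round(learn_reach(), 2))        # should end near the target of 10.0
print(learn_reach(explore_sd=0.0))    # 0.0: execution noise alone never moves the plan
```

With `explore_sd=0.0` the plan never moves, however large `muscle_sd` is, because in this rule execution noise changes outcomes without changing anything the learner can credit. That is a property of the toy update rule, not an empirical claim about infants.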

However, specific mechanisms have been suggested to generate variable behavior, and these may point to specific manifestations of variability. In its highest-level form, planned variability may reflect deliberate hypothesis testing. Children figuring out how to activate a hidden mechanism with the help of wooden blocks try various combinations of blocks and often verbalize the hypothesis they are testing (Gopnik et al., 2001). This process of hypothesis or theory testing is seen by some as critical for advancing learning, especially for generalizing knowledge beyond the particulars being experienced at a given moment (e.g., see Annette Karmiloff-Smith's "If you want to get ahead, get a theory"; Karmiloff-Smith and Inhelder, 1974). Discrete instantiation of each hypothesis, especially when accompanied by verbal explanations, clearly identifies this process as exploration. How exactly hypotheses are generated remains largely unknown. In their study of balancing objects, Karmiloff-Smith and Inhelder (1974) observed that past experience heavily influences which hypothesis children will try out (try balancing in the middle if that worked before), and that formulating a new theory, and therefore trying out a new balancing point, did not necessarily emerge from encountering counter-examples to the former theory, but from a process of insight that is difficult to capture from children's behavior.

There are, however, cases in which an individual does not have enough background knowledge to formulate explicit theories. To take a simple example, we might know where alternative sources of food could be if we don't find any in our fridge (try the corner shop); but while in a forest and hungry, we might not even recognize what food looks like. Adopting quasi-stochastic behavior, e.g., sampling anything that looks vaguely edible, might be our best bet in these situations. This trial-and-error approach is critical for reinforcement learning. Rats faced with an unpredictable competitor for food, whose actions they try but fail to counteract, adopt a random pattern of choices between two food sources (Tervo et al., 2014). Another example comes from song learning in zebra finches. Male zebra finches produce song syllables with a normal distribution of pitch values (Tumer and Brainard, 2007). Tumer and Brainard (2007) showed that, by contingently presenting white noise whenever a syllable was sung at the upper end of this pitch distribution, the pitch of that syllable can be shifted. Interestingly, the shift resulted in a distribution with a new mean but the same degree of variability around the mean. Thus, this variability is actively maintained to enable the learning of new songs through reinforcement of particular ranges of the distribution, just as genetic variability is generated for natural selection to act upon. Arm reach angles of adult human participants learning a new motor task are also initially normally distributed around an optimal value (Pekny et al., 2015). I will call this learning expectant variability.
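The logic of the pitch-shifting experiment can be sketched as a toy simulation. The assumptions are strong ones: each simulated day draws renditions of one syllable from a normal distribution whose spread is held fixed (standing in for actively maintained variability), renditions above the current mean plus half a standard deviation are treated as hit by white noise, and the mean moves part of the way toward the average of the renditions that escaped. The function name, threshold rule, learning rate, and arbitrary pitch units are all hypothetical and do not reproduce Tumer and Brainard's (2007) protocol.

```python
import random
import statistics


def simulate_pitch_shift(days: int = 20, renditions: int = 200,
                         start_mean: float = 600.0, sd: float = 10.0,
                         lr: float = 0.5, seed: int = 0):
    """Toy model: punish high renditions of a syllable, keep spread fixed.

    Returns (mean pitch after training, SD of the last day's renditions).
    Pitch units are arbitrary; every parameter is hypothetical.
    """
    rng = random.Random(seed)
    mean = start_mean
    sung = []
    for _ in range(days):
        # The spread is held constant by assumption ("actively maintained").
        sung = [rng.gauss(mean, sd) for _ in range(renditions)]
        threshold = mean + 0.5 * sd          # upper tail gets white noise
        escaped = [p for p in sung if p <= threshold]
        if escaped:
            # Move the mean part of the way toward the unpunished renditions.
            mean += lr * (statistics.fmean(escaped) - mean)
    return mean, statistics.stdev(sung)


shifted_mean, spread = simulate_pitch_shift()
print(f"mean {shifted_mean:.1f}, SD {spread:.1f}")  # expect: mean well below 600, SD near 10
```

Because the spread is a fixed parameter here, the sketch only shows that selectively penalizing one tail is enough to move the mean while variability persists; it says nothing about how the finch brain maintains that spread.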

Despite the seemingly stochastic nature of this variability, some have argued that it does not simply reflect execution noise, but is actively produced at the motor planning stage. Churchland et al. (2006) showed that about half of the variability in reach speed (in monkeys) originates in the premotor and motor cortex. The neural structures and physiological mechanisms through which pitch variability is produced in the finch brain are also well characterized (Budzillo et al., 2017). However, as stated before, cortical origin does not necessarily imply active modulation of variability. The strongest evidence for the active generation of variability in these cases comes from the fact that variability is contextually modulated and increases in situations conducive to learning. For example, the song of young male zebra finches increases in spectral variability when they sing in isolation, compared to when they sing to a female (Kao and Brainard, 2006; Budzillo et al., 2017). Thus, males take advantage of solitary moments to explore vocal productions, with a view to improving their song. In rats, it is the presence of a novel, uncertain environment that activates noradrenergic input from the locus coeruleus into the anterior cingulate cortex (ACC). This suppresses ACC activity (the ACC being responsible for accessing previous models of the world), leading to an upregulation of motor variability (Tervo et al., 2014).

However, some variability in motor acts is simply noise. Could learning take advantage of this type of variability as well? An example from learning sensorimotor contingencies shows why this is difficult. In a system with high execution noise, erroneous contingencies between intended motor plans and the actual (incorrect) motor output may be created. However, studies that have used passively generated variability suggest that sensory feedback is sufficient for reinforcement learning to occur. For example, in Bernardi et al. (2015), adult participants learned a new motor contingency after only having been given passive exposure to a variety of trajectories to a particular target, some of which were reinforced as successful hits. Passive exposure was achieved by moving participants' limbs using a robot arm, and resulted in the same learning success as active training. Thus, even in the absence of a motor plan, participants could discover successful motor sequences simply based on the sensory feedback they received from their limbs. However, in this case, recovery of the motor plan was made possible by the existence of known sensorimotor contingencies. Participants were adults who had a lifetime of experience with arm movements, and therefore a fairly good idea of which motor plans could lead to a particular sensory feedback. These assumptions will not hold during at least part of infancy.

### WHICH OF THESE MECHANISMS COULD GENERATE EXPLORATORY VARIABILITY IN INFANCY?

Where might infants' variable motor outputs fall on the continuum between hypothesis testing and sensory-motor noise (see **Table 1**)? Gibson (1988) suggested that infant exploratory activity "continues as play through the preschool years and as deliberate learning later in life," and this possibly reflects the view of many others. However, even for a gesture as simple as reaching, we have little evidence for developmental continuity between the mechanisms driving the various paths an arm takes when a 4-month-old reaches for an object, when a 12-month-old tries to activate a new mechanical toy, or when, a year later, she figures out how to build a tower of blocks. It is highly likely that the balance between noise and deliberate exploration, as sources of variability, shifts during development.

TABLE 1 | Potential sources of variable behavior in early motor output.

### Hypothesis Testing


Many have been captivated by the metaphor of infants as little scientists (see Alison Gopnik's "Scientist in the crib"), and indeed some early descriptions of object exploration do seem compatible with primitive hypothesis testing. Infants' use of different action patterns when reacting to changes in object properties (e.g., Steele and Pederson, 1977; Ruff, 1984) could reflect deliberate testing of a "perceptual" hypothesis; e.g., banging might reflect infants deliberately testing object weight, and fingering might be the optimal way of testing an object's temperature. These behaviors are very similar to the exploratory procedures used by adults when they have to discriminate objects based on various properties (Lederman and Klatzky, 1993). However, these behaviors need not reflect infants' a priori appreciation that banging is a better way of learning about weight than about temperature. It may simply be that the unexpected change in temperature triggers exploratory behavior, just as unexpected environmental changes increase randomness in motor choices in rats (e.g., Tervo et al., 2014). An increase in a variety of object-directed actions (banging, fingering, mouthing) would eventually allow infants to discover that some of these actions bring more information than others, e.g., that a new object's temperature is better perceived when fingering it. Fingering would therefore be gradually selected over other behaviors as infants interact with objects, but may not initially be stored in long-term memory as an explicit strategy for learning about temperature. Younger infants may have to make this discovery at each encounter with a temperature change. Rather than hypothesis testing, variable movements in object exploration may therefore, at least initially, reflect learning expectant variability.

Hypothesis testing was directly investigated in a recent study by Stahl and Feigenson (2015). Here, 12-month-olds manipulated objects differently following solidity vs. support violations: they banged objects that had passed through walls, but dropped objects that had not obeyed gravity. In this case, the objects themselves did not give away any cues about their properties, which means infants must have chosen a priori which actions were best suited to test their previous observations. In another study, Needham and Baillargeon (2000) observed an intriguing association between the percentage of time 3.5-month-old infants looked at or mouthed objects they were holding and their ability to visually parse objects based on their surface features. While this might simply reflect that motorically advanced infants also have better visual processing skills, an alternative interpretation is that object manipulation, which involves breaking contact between objects, had helped infants formulate hypotheses about object structure, for example the hypothesis that a discontinuity in surface features means objects can easily be taken apart.

Is talking about hypothesis testing, in the above cases, too rich an interpretation of infants' behavior? In its simplest form, the hypotheses infants test involve acting on the world and expecting a particular outcome (e.g., when I bang this object, I will perceive its weight). But is it possible to demonstrate that infants build up specific expectations during exploration? In an EEG study in which infants could build specific expectations about learning either object functions or labels, theta-band activity was measured over frontal areas in anticipation of object functions, but over temporal areas when labels were expected (Begus et al., 2016). Frontal theta-band activity was also measured while infants explored objects (Begus et al., 2015). These neural correlates of information expectation offer an opportunity to investigate the earliest forms of hypothesis testing driving infant object exploration.

### Learning Expectant Variability

Is there evidence that infants produce the type of learning expectant variability that supports reinforcement learning? The increasing variability in reaching behavior during the first year of life may be a good candidate for this mechanism at play in infancy (Thelen, 1979; Prechtl, 1993). Infants given reaching and grasping practice, which included reinforcement of successful reaches, increased the frequency of this behavior (Soares et al., 2013). Interestingly, training only increased grasping success in infants born at term (Soares et al., 2014), i.e., in those infants who already showed higher variability in grasping behavior before the intervention. This suggests that increased variability may give term infants more opportunities to discover optimal reaching strategies. However, only one published study reports an attempt to directly reinforce a subset of the spatial positions that 5-month-old infants' hands took during reaching (Darcheville et al., 2004), a manipulation similar to the reinforcement of particular pitches in zebra finch song. In this study, the arrival of the infant's hand within particular spatial positions was automatically detected and triggered a recording of the mother's voice. This manipulation increased reaching behavior; we do not know, however, whether this was accompanied by an increase in reaching using the reinforced trajectory. Selective reinforcement of either consonants or vowels (through smiling, vocal responses, and touch) increases infants' production of these phoneme classes (Routh, 1969). There is some evidence that mothers themselves selectively reinforce infant vocalizations, for example by imitating infant consonant productions more than vowel productions (Gros-Louis et al., 2006). Again, this evidence for reinforcement of vocal behavior falls short of telling us whether learning takes advantage of the increased variability in infants' vocal productions.

### Environmental Variability

One obvious source of variability in behavior is the environment itself. For example, when an infant reaches for an object, another object might block her way and change the trajectory of her reach; reaching might also shift her center of gravity, which in turn could affect the trajectory her arm takes toward the object. Reinforcement learning is central to computational models of reaching (Caligiore et al., 2014) and vocal development (Moulin-Frier and Oudeyer, 2012), and these models critically depend on an initial pool of variable behavior. To model reaching development, Caligiore et al. (2014) used what they call exploratory noise, i.e., random perturbations in the motor output, to which muscular noise is added. Interestingly, these authors suggest that much of the exploratory noise is not actually planned by the child, but is a consequence of the child interacting with her (unpredictable) environment. However, is there evidence that the child herself could generate environmental variability with the aim of exploring her body or the environment?

Fagan and Iverson (2007) suggested that one of the functions served by object mouthing during early play is to add variability to vocal output, creating a kind of lucky accident. These researchers went on to show that a larger variety of glottal and sub-glottal sounds were produced when infants vocalized while mouthing objects. However, none of the sounds produced were new, in the sense that they were well within the repertoire of infants of that age. Mechanistically, this type of variability is no different from variability produced by noisy motor outputs, since the child is not in control of it, i.e., not in possession of the motor plans that yielded the final behavior. In principle, these motor plans could still be retrieved by using the sensory output of these actions and mapping it back onto the motor plan(s). Of course, had the sound produced been a new sound, a corresponding motor plan would not exist. This strategy of increasing vocal variability is therefore unlikely to be a driver of phonological development. The best a learner can do, if a new sound results from mouthing objects, is to access the nearest motor plan available, i.e., the motor plan corresponding to the closest sound in their repertoire. Given young infants' poor memory, this search through motor plans would need to happen quickly, before the infant forgets the sound she wanted to re-enact. The solution to that problem may lie in another feature of early exploratory activity: in addition to being highly variable, exploratory behavior is also highly repetitive, in the sense that the same motor act may be activated many times in a row.
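The "nearest motor plan" idea amounts to a nearest-neighbor lookup from a heard sound to the closest sound the learner can already produce. A minimal sketch, assuming sounds can be summarized by two acoustic features (rough first and second formant values in Hz, illustrative adult-like numbers rather than infant measurements) and that each known sound is paired with a stored motor plan; the repertoire, the plan descriptions, and the function name are all hypothetical.

```python
import math

# Hypothetical repertoire: sound -> ((F1 Hz, F2 Hz), motor plan description).
# Formant values are rough, illustrative adult-like figures, not infant data.
REPERTOIRE = {
    "i": ((300.0, 2300.0), "tongue high and front"),
    "a": ((750.0, 1300.0), "jaw open, tongue low"),
    "u": ((320.0, 800.0), "tongue high and back, lips rounded"),
}


def nearest_motor_plan(heard, repertoire=REPERTOIRE):
    """Return (sound, plan) for the known sound closest to the heard features."""
    # Euclidean distance in formant space picks the closest familiar sound.
    best = min(repertoire, key=lambda s: math.dist(heard, repertoire[s][0]))
    return best, repertoire[best][1]


# An unfamiliar sound produced while mouthing a toy, close to /a/:
print(nearest_motor_plan((680.0, 1400.0)))   # ('a', 'jaw open, tongue low')
```

Whatever sound is heard, the lookup can only return a plan that is already in the repertoire, which is the limitation the paragraph points out: a genuinely new sound gets mapped onto an old gesture.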

Repetitive patterns of behavior are present in both limb and vocal movement, and are more frequent at particular time points in development. Cyclical grasping is elicited in 3-day-old infants when they are handed objects with new textures (Molina and Jouen, 2004). Repetitive actions with objects occur at high frequency during infancy and are ubiquitous at 12 months (Fyfield, 2014). Repetitions per vocalization increase and peak around 9.5 months (Fagan, 2009) but decline with word production (only 18% of first words contain two reduplicated syllables; Vihman, 1996). With increased motor control, infants could actually produce more reduplication, but they do not. Thus, the amount of reduplication does not reflect competence, but seems to serve a particular function during particular windows of development. I suggest here that this type of repetitive behavior may help infants recover the correct sensory-motor mappings. A detailed analysis of reduplicated behavior will reveal that repetitions are not identical. Although the same motor plan is activated, variability in outcome results from added execution noise. Given that this noise is normally distributed, with the most common outcome at zero noise, repetition should allow the motor plan to be mapped onto the correct output. The role of reduplication in learning new sensory-motor mappings is suggested by the fact that reduplication decreases in the absence of sensory feedback. Deaf infants show delayed or absent reduplication (Oller and Eilers, 1988; Koopmans-van Beinum et al., 2001). Reduplication does appear in vocal production weeks after cochlear implantation and, interestingly, precedes an increase in the quality of the consonant-vowel vocalizations themselves (Fagan, 2015). However, strong evidence in support of this hypothesis will come from precise measurements of the motor parameters of repetitive motor acts. This has now become possible thanks to motion tracking technology (**Figure 2**).

#### CONCLUSION

We set out to answer the question of whether the variability characteristic of infants' motor acts is actively generated, rather than being the signature of an immature motor system. Evidence for contextual modulation of motor variability, especially evidence that variability increases with information availability, together with a better understanding of the neural sources of variability, suggests that, even early in development, variability might be upregulated in support of learning. However, strong support for this hypothesis still awaits a better characterization of infant motor variability per se, of the kind achieved in bird vocal learning and adult motor skill acquisition. A better characterization of how variability in motor outputs is modulated in learning contexts will allow us to understand to what extent it reflects hypothesis testing, learning expectant variability, or merely infants actively creating lucky accidents.

#### AUTHOR CONTRIBUTIONS

TG wrote this review paper.

#### ACKNOWLEDGMENTS

I wish to thank the UK Medical Research Council (G0701484), who funded me while writing this article. Thanks to Clare Press for commenting on an earlier draft of this manuscript and to Ivan for inspiring me to investigate this topic.

#### REFERENCES

subsequent object recognition. Biol. Lett. 11:20150041. doi: 10.1098/rsbl.2015.0041

Bernardi, N. F., Darainy, M., and Ostry, D. J. (2015). Somatosensory contribution to the initial stages of human motor learning. J. Neurosci. 35, 14316–14326. doi: 10.1523/JNEUROSCI.1344-15.2015

Budzillo, A., Duffy, A., Miller, K. E., Fairhall, A. L., and Perkel, D. J. (2017). Dopaminergic modulation of basal ganglia output through coupled excitation–inhibition. Proc. Natl. Acad. Sci. U.S.A. 114, 5713–5718. doi: 10.1073/pnas.1611146114




**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Gliga. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.


# Contribution of Developmental Psychology to the Study of Social Interactions: Some Factors in Play, Joint Attention and Joint Action and Implications for Robotics

#### Hélène Cochet\* and Michèle Guidetti

CLLE, Université de Toulouse, CNRS, UT2J, Toulouse, France

#### Edited by:

Jill Popp, The LEGO Foundation, Denmark

#### Reviewed by:

Nicolas Cuperlier, Université de Cergy-Pontoise, France; Gautier Durantin, The University of Queensland, Australia; Eiji Uchibe, Advanced Telecommunications Research Institute International (ATR), Japan

> \*Correspondence: Hélène Cochet helene.cochet@univ-tlse2.fr

#### Specialty section:

This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology

Received: 13 October 2017; Accepted: 28 September 2018; Published: 19 October 2018

#### Citation:

Cochet H and Guidetti M (2018) Contribution of Developmental Psychology to the Study of Social Interactions: Some Factors in Play, Joint Attention and Joint Action and Implications for Robotics. Front. Psychol. 9:1992. doi: 10.3389/fpsyg.2018.01992

Children exchange information through multiple modalities, including verbal communication, gestures, and social gaze, and they gradually learn to plan their behavior and coordinate successfully with their partners. The development of joint attention and joint action, especially in the context of social play, provides rich opportunities for describing the characteristics of interactions that can lead to shared outcomes. In the present work, we argue that human–robot interactions (HRI) can benefit from these developmental studies by influencing the human's perception and interpretation of the robot's behavior. We thus endeavor to describe some components that could be implemented in the robot to strengthen the feeling of dealing with a social agent, and therefore improve the success of collaborative tasks. Focusing in particular on motor precision, coordination, and anticipatory planning, we discuss the question of complexity in HRI. In the context of joint activities, we highlight the necessity of (1) considering multiple speech acts involving multimodal communication (both verbal and non-verbal signals), and (2) analyzing separately the forms and functions of communication. Finally, we examine some challenges related to robot competencies, such as the issue of language and symbol grounding, which might be tackled by bringing together the expertise of researchers in developmental psychology and robotics.

Keywords: human–robot interaction, human development, joint attention, joint action, coordination, complexity, gestures

### INTRODUCTION

Developmental psychologists aim to describe and explain changes across the life span in a wide range of areas, such as social, emotional, and cognitive abilities. Focusing on childhood is a way of grasping numerous changes, especially in terms of communication: infants gradually learn to identify the common ground they have with others and engage in social interactions. The development of such abilities relies on the personal experiences shared between partners in specific contexts (Liebal et al., 2013), among which social play may offer particularly rich opportunities for children to acquire joint action and joint attention skills. Studying the different forms and functions of communication in this context paves the way for identifying the necessary ingredients for effective joint activities, and therefore for better understanding the architecture of human social interactions. Even though the concept of effectiveness may cover different theoretical frameworks, these objectives have several applications, for example in supporting children with atypical development, especially when they have difficulty communicating both verbally and non-verbally (e.g., children with autism spectrum disorders, ASD), but also in the field of artificial intelligence. The role of robots in society indeed raises many debates and challenges, as robots share more and more space and tasks with humans, for instance in service robotics to assist elderly people. Robots' ability to initiate and respond to social interactions is one of the key factors that will shape their integration into our everyday lives in the future. Researchers in social robotics have been working on the question of joint action for over two decades now, sometimes in collaboration with developmental psychologists (e.g., Scassellati, 2000), in order to improve robots' motor and communicative skills. Developmental models of human communicative behavior can indeed help define the components to implement in human–robot interactions (HRI), so as to build rich and natural joint activities (Breazeal et al., 2004; Lemaignan et al., 2017).

The objective of this paper is twofold. First, we intend to present the point of view and some research perspectives of developmental psychologists on joint attention and joint action, in particular in the context of social play. To this end, we will also define, starting from studies on non-human primates, what can be regarded as complex (or rich) and natural (or effective) interactions in both human communication and HRI. Second, we aim to show the extent to which the above-mentioned issues may be of interest to roboticists, in helping conceptualize and implement some variables associated with joint attention and joint action in the context of HRI. Collaborative tasks involving robot and human partners, regarded as tantamount to children's social play, will thus be considered through the prism of pragmatic communication, allowing researchers to dissociate the forms and the functions of communication.

### HOW DOES COMMUNICATION DEVELOP IN THE CONTEXT OF SOCIAL PLAY?

The definitions of play include a wide range of activities, which makes it difficult to determine where play begins and where it ends, even though play is traditionally associated with positive affective valence (Garvey, 1990). Play, which occurs in several animal species (most notably in mammals), has been argued to allow "practice of real-world skills in a relatively safe environment" (Byrne, 2015). We focus here on social play in human children, which, as highlighted by Bruner (1973), may also enable them to "learn by doing" as they interact with one or several partners. At the individual level, children can explore and enhance specific skills such as motor control and creativity, while at the social level they develop, for example, cooperation abilities. The concepts of artifact-mediated and object-oriented action, originally formulated by Vygotsky (1999), are particularly relevant to describe these situations: the relationship between the child and the surrounding objects is mediated by cultural means, tools, and signs. Studying the development of play can therefore reveal how children come to represent and think about their environment.

Social attention is a crucial capacity for the emergence of these play situations, allowing children to focus on some of the other's characteristics, such as facial expressions, gaze direction, gestures, and vocalizations. Once the direction of another's attention has been identified (for example, through gaze following or point following), we can shift our own attention to focus on the same external object or event as our partner at the same time. This process of joint attention is usually inferred from behavioral cues, mainly gaze alternation between one's partner and a specific referent (Bourjade, 2017). Joint attention therefore seems necessary for individuals to perform joint action, i.e., to coordinate their actions in space and time to produce a joint outcome, whether this involves symbolic play (with or without objects), construction toys, board games, or any other form of play.

Joint attention and joint action begin to appear at the end of the first year of human development (Carpenter et al., 1998), gradually allowing children to integrate the notion of common ground and to engage in social interactions. The development of gaze understanding, which has been widely studied, plays a key role in this regard. For example, a study using a habituation-of-looking-time procedure showed that at 10 months of age infants start to understand ecologically valid instances of social gaze between two interacting adults and to have expectations concerning the gaze target (Beier and Spelke, 2012). Moreover, responding joint attention skills (e.g., gaze following and point following) have been reported to emerge from 8 months of age, before initiating joint attention skills (Corkum and Moore, 1998; Beuker et al., 2013).

However, depending on the authors, the definitions of these social-cognitive skills can be more or less demanding, the main difference lying in whether or not individuals have mutual understanding of their shared focus of attention. The ability to "know together" that we are attending to the same thing as our partner has sometimes been referred to as shared attention (Emery, 2000; Shteynberg, 2015), which would develop in parallel with shared intentionality (Tomasello and Carpenter, 2007). The latter involves the motivation to share goals and intentions with the other, as well as forms of cognitive representation for doing so. This ability has been argued to constitute a hallmark of the human species (Tomasello et al., 2005), even though it is particularly difficult to assess when verbal language is not available as a clue to these representations (in pre-linguistic children or non-human primates). Similarly, joint action may rely solely on the learning of the cues that appear significant (e.g., gestures and eye contact) to coordinate actions in space and time with a partner, or it may also involve, in a more demanding perspective, the common and explicit knowledge of the objectives of the activity and of the way to achieve them (Tomasello and Carpenter, 2007).

Joint attention and joint action, whether or not they are accompanied by shared and explicit intentions, thus allow children to participate with others in collaborative activities in which each partner benefits from the joint outcome and/or from the interaction itself. In a series of experiments, the ability to coordinate with a partner in social games was shown to improve significantly between 18 and 24 months of age, whether the games involved complementary or similar roles (Warneken et al., 2006). In the first game of this study, one person had to send a wooden block down one of two tubes mounted on a box on a 20-degree incline, while the other person had to catch it at the other end with a tin can that made a rattling sound. The two tubes were mounted in parallel so that individuals could take turns performing the different roles. In the second game, two people had to make a wooden block jump on a small trampoline (a 67 cm diameter ring covered with cloth) by holding the rim on opposite sides. The trampoline collapsed when held on only one side. Children successfully participated in both games, although the 24-month-olds were more proficient than the 18-month-olds, and all of them produced at least one communicative attempt to reengage the adult partner when the latter stopped participating in the activity. Children, for example, pointed at the object and/or vocalized while looking at the adult, which was regarded as evidence for a uniquely human form of cooperation involving shared intentionality (Warneken et al., 2006). A less "mentalistic" interpretation could be proposed (D'Entremont and Seamans, 2007), but these results nevertheless highlight children's motivation to reinstate joint action toward a shared goal.

The development of this capacity has received much attention from researchers, as the initiation of joint attention appears to be strongly related to language comprehension and production in the second and third years of life (Colonnesi et al., 2010; Cochet and Byrne, 2016), as well as to theory of mind ability (e.g., Charman et al., 2000; Milward et al., 2017) in both typical and atypical development (e.g., Adamson et al., 2017).

In addition, observing children's behavior during collaborative activities can yield a thorough description of multimodal communication (e.g., gaze, facial expressions, gestures, and verbal language) and of the way its components become coordinated. For example, the production of gestures gradually becomes coordinated with gaze in the course of development. Children start to produce pointing gestures to orient another person's attention around 12 months of age; an object, a person, or an event can become the shared focus of attention, but at this age children do not usually look at their partner while they point (Franco and Butterworth, 1996). A couple of months later, they are able to alternate their gaze between their partner and the object of interest, a key feature of intentional triadic interactions (Cochet and Vauclair, 2010). At 16 months of age, gaze toward the adult can precede the production of pointing (Franco and Butterworth, 1996), suggesting that children may take their partner's attentional state into account before initiating communication (Lamaury et al., 2017).

Children also gradually learn to take their partner's facial expressions into account to infer the partner's emotional state and adjust their response accordingly. Infants are sensitive to the characteristics of faces from very early on; newborns, for example, look significantly longer at happy expressions than at fearful ones, demonstrating some discrimination skills (Farroni et al., 2007). The still-face paradigm, initially designed by Tronick et al. (1978), also suggests that infants have expectations about interactional reciprocity from a few months of age, partly relying on emotional expression. This sensitivity manifests itself in specific behavioral and physiological responses (e.g., reduced positive affect and gazing at the parent, increased negative affect, rise in facial skin temperature) when the mother puts on a neutral and unresponsive face after a period of spontaneous play with her infant (Aureli et al., 2015). The ability to recognize and identify facial expressions of basic emotions further develops in preschool children, who a few months later come to understand the external causes of emotions and then, around 5 years of age, the role of others' desires or beliefs in emotional expression (Pons et al., 2004).

During play interactions, being attentive to the other's facial expressions allows each partner to consider the emotional nature of the signals (e.g., joy, surprise, and frustration) and possibly to modify his/her own behavior to change or maintain this emotional state. The development of facial expression perception thus plays a key role in the emergence of joint actions, in coordination with other communicative modalities. Indeed, from infancy onward, facial expressions are usually synchronized with vocalizations and/or gestures.

The vocal and gestural modalities also become increasingly coordinated as children grow older, which represents a key feature of human communication, since we gesture as we speak throughout our lives. Communicative gestures are first complemented by vocalizations, whose prosodic patterns may already code for semantic and pragmatic functions (Leroy et al., 2009). In the second year of life, children then produce their first gesture-word combinations, which play an important role in the transition to the two-word stage (e.g., Butcher and Goldin-Meadow, 2000). Pointing and conventional gestures (e.g., waving goodbye, gestural agreement, and refusal: Guidetti, 2002, 2005) remain in children's repertoire after the two-word stage, but other forms of gestural-vocal coordination are observed from 3 years of age with the emergence of co-speech gestures. Although we are usually not aware of producing or perceiving them, co-speech gestures can lend rhythm to speech, emphasize it, and sometimes serve deictic or iconic functions. The deictic presentation of the pointing gesture can, for example, be combined with vocal pointing, performed through syntactic or prosodic means (Lœvenbruck et al., 2008). Such coordination between the vocal and gestural modalities is omnipresent in adults and plays a crucial role in face-to-face communication for both speaker and listener (e.g., McNeill, 2000; Kendon, 2004).

Moreover, the characteristics of gaze, gestures, and vocalizations, and their coordination, may vary according to the communicative function of the signal. A gesture can indeed serve different purposes, starting with the traditional distinction between imperative and declarative functions (Bates et al., 1975). Imperative gestures are used to request a specific object or action from a partner, whereas declarative gestures are used to share interest with the other about some referent or to provide him/her with information that might be useful. Imperative and declarative pointing, which both represent powerful means of establishing joint attention, have been extensively studied and compared: hand shape and body posture were shown to differ according to the communicative function of the pointing gesture (Cochet et al., 2014), as were the frequency of gaze alternation between the partner and the referent and the frequency of vocalizations (Cochet and Vauclair, 2010). These comparisons (see section "Pragmatics in HRI: Which Ingredients Are Necessary for Effective Interactions?" for more detailed results) thus highlight the strong relationship between the form of gestures (in the broad sense, i.e., including visual and vocal behavior in addition to movement kinematics and hand shape) and pragmatic features in children, and even semantic features in adults (Cochet and Vauclair, 2014).

To sum up, when two children play together or when a child plays with an adult, they do so in the framework of joint action: they attend to a common situation and use multimodal communication to initiate, maintain, or respond to the interaction. These three different roles in the interaction can be assessed with the Early Social Communication Scales, in particular with the French version (Guidetti and Tourrette, 2017). In an evaluation situation, giving the child the opportunity to initiate the interaction is particularly crucial in atypical development, for example in children with ASD. The initiation of shared attention is a key ability in this context, as it allows joint action coordination (Vesper et al., 2016) and also has significant consequences for the development of cognitive and emotional processes (Shteynberg, 2015). Whether this coordination relies on representing and understanding the other's intentions or only on behavioral cues is a challenging question, as we have no direct access to the other's subjectivity. In the field of HRI, an objective that appears sufficiently ambitious for now, or at least the one we chose to focus on in the present review, is to design robots able to identify observable changes in the human's behavior, in order to make the right inferences and thus the appropriate decisions in the interaction. This appears to be an essential condition for a successful exchange between a robot and a human, which can depend on the joint outcome (has the common goal been reached?), but also on the way the interaction has been perceived by each individual, for example in terms of coordination between gaze and gesture and fluidity of movement (Hough and Schlangen, 2016). The richness of communication indeed lies in the ability of each partner to integrate multiple communicative cues in a way that will seem natural to humans, i.e., close to peer interaction in everyday life.

This is a complex ability and probably the most challenging one to replicate in HRI. In pursuit of this objective, we now need to further describe the concept of appropriateness and propose a framework to determine the relative importance and relative complexity of the different behaviors observed during joint activities such as social play.

### TO WHAT EXTENT CAN INTERACTIONS BE CHARACTERIZED AS COMPLEX?

Smith (2015) has argued that "development, like evolution and culture, is a process that creates complexity by accumulating change." This perspective applies to the development of social interactions, from the emergence of joint attention to the coordinated, multimodal communication that enables joint action. Several attempts have been made in developmental robotics to explore the cognitive, social, and motivational dynamics of human interactions (Oudeyer, 2017); algorithmic and robotic models can then be used to study the developmental processes involved, for instance, in imitation (Demiris and Meltzoff, 2008) or language (Cangelosi et al., 2010). In this context, roboticists aim to design systems allowing for a self-organized and "progressive increase in the complexity" of the robot's behavior (Oudeyer et al., 2007).

To benefit further from their exchanges, developmentalists and roboticists may therefore need to frame the study of HRI by disambiguating the concept of complexity. Because "complicated systems will be best understood at the lowest possible level" (Smith, 2015), we aim to distinguish levels of complexity depending on the nature of the elements to be taken into account in decision making. This analysis will allow us to move forward in the study of joint attention and joint action and to define what is implied by the qualifying terms "complex" (or rich) and "appropriate" (or effective) when referring to interactions.

To this end, we used a categorization recently proposed in research on animal behavior, including human and non-human primates, to define the concept of complexity (Cochet and Byrne, 2015). Three dimensions have been described: motor precision, coordination, and anticipatory planning, which can relate to both individual and social activities. The authors argue that "the complexity of a given mechanism/behavior can be assessed by distinguishing which of these three dimensions are involved and to what degree," which may "clarify our understanding of animal behavior and cognition." Although there may be other ways of untangling the question of complexity, such an analysis applied to joint attention and joint action may allow researchers to dissect the different factors involved in social interactions for each dimension, and thus help them assess the "manipulability" of these factors in HRI.

In order to make appropriate decisions in a collaborative task, i.e., decisions leading to the desired joint outcome and/or decisions that approach the characteristics of human interactions, the robot first needs to recognize specific patterns in its partner's behavior, without having to ask for agreement or information before every action. The robot can, for example, rely on gaze direction, manual movements, or body posture to identify the human's attentional and intentional states and thus define the most useful role it can play in the interaction. By way of illustration, if a human and a robot share the common goal of building a pile of four cubes in a definite order with a triangle at the top, each of them can perform different actions: they can grasp an object (a cube or the triangle) on the table, grasp an object on the pile, give an object to the partner, support the pile while the partner places a cube on it, etc. Other actions can emerge, for example if the pile collapses or if one agent does not pile the cubes in the correct order (Clodic et al., 2014). Individuals can then blame each other or give each other instructions. In addition to perceiving its own environment, the robot thus has to observe the human's activity and take his/her perspective (e.g., to determine whether an object is reachable for the other).
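The cube-pile example can be made more concrete with a minimal Python sketch of how a robot might choose its contribution from observable cues. It assumes a hypothetical target order (`TARGET`), hypothetical action labels, and that the robot already perceives which object the human is holding and which objects the human can reach; the decision rules are illustrative only and are not those of Clodic et al. (2014).

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical shared goal: four cubes in a fixed order, then the triangle on top.
TARGET = ["cube_1", "cube_2", "cube_3", "cube_4", "triangle"]


@dataclass
class PileState:
    pile: list[str] = field(default_factory=list)  # objects from bottom to top

    def next_needed(self) -> str | None:
        """Next object required by the shared plan, or None once the pile is complete."""
        if len(self.pile) >= len(TARGET):
            return None
        return TARGET[len(self.pile)]

    def is_on_track(self) -> bool:
        """True while the pile built so far is a prefix of the target order."""
        return self.pile == TARGET[: len(self.pile)]


def robot_decide(state: PileState, human_holding: str | None,
                 reachable_for_human: set[str]) -> str:
    """Pick the robot's contribution from what it observes of the human (hypothetical labels)."""
    if not state.is_on_track():
        return "signal_error"            # e.g., a cube was placed out of order
    needed = state.next_needed()
    if needed is None:
        return "acknowledge_goal"        # joint outcome reached
    if human_holding == needed:
        return "support_pile"            # human is about to place the right object
    if needed not in reachable_for_human:
        return f"give_{needed}"          # perspective taking: the human cannot reach it
    return f"point_to_{needed}"          # direct attention rather than act for the human
```

With `PileState(["cube_1", "cube_2"])`, a human holding `"cube_3"` leads to `"support_pile"`, whereas an out-of-order pile such as `PileState(["cube_2"])` leads to `"signal_error"`. Pointing is preferred over handing the object whenever the human can reach it; this is an arbitrary choice made here to illustrate a role other than pure assistance.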

Motor precision is therefore necessary in this context to obtain flexible and human-aware shared plan execution (Devin and Alami, 2016), as it enables a selective shift of attention toward aspects of the environment that will become shared knowledge, which has also been described as the accuracy of shared attention states (Shteynberg, 2015). First, the emergence of joint attention requires the proper use of gaze and/or pointing gestures to localize the object or event referred to. Verbal cues also demand particularly fine motor control of the speech articulators. Second, joint action requires some motor control to reach the expected outcome, hence the importance of evaluating human motor skills beforehand, especially during development, as well as the technical capabilities of the robot. Following on from the previous example, children's grasping skills in relation to the size of the cubes, as well as the characteristics of the robotic gripper used to handle objects, have to be finely described.

Moreover, recent experimental findings have shown that the execution of object-oriented actions is influenced by the social context, such as the relative position of another person and the degree of familiarity with this person (Gianelli et al., 2013). For example, individuals perform more fluent reach-to-grasp movements, with lower acceleration peaks and longer reaction times, when a partner is located close enough to be able to intervene on the same object than when he/she is farther away (Quesque et al., 2013). In addition, there is a significant relationship between the kinematic features of actions and the actor's explicit social intention: movements have longer durations, higher elevations, and longer reaction times when individuals place an object on a table for another person than when they place the object for later personal use (Quesque and Coello, 2015). These variations, although they do not seem to be intentionally produced, have been suggested to facilitate the partner's detection of planned actions, thus enhancing potential interactions. Indeed, these kinematic effects were shown to influence the subsequent motor productions of an observer (Quesque et al., 2015). The motor characteristics of actions performed in a social context may therefore prime the perceiver to prepare and anticipate appropriate motor responses in the interaction.

The second dimension that can help us understand the complexity of joint activities pertains to the coordination between several communicative modalities and between interacting individuals. Whether joint action involves complementary or similar roles, it can be performed through several coordination processes, which can determine the efficiency of shared attention states (Shteynberg, 2015). Here, efficiency requires a representational shift from the first-person singular to the first-person plural, as the partners attend to the same referent at the same time. The ability to monitor each other's attention and action, using behavioral cues such as gaze direction, facial expressions, gestures, and speech, is essential for successful coordination. The intentional production of communicative signals, representing hints for one's partner, is also an efficient way of achieving joint outcomes.

Coordination is therefore necessary first at the individual level, so that the different communicative modalities, such as gestures and gaze, synchronize or follow one another in a natural order, i.e., one that is acceptable with regard to human interaction patterns (see above). Each agent can then make decisions based on these signals, adjust their behavior accordingly, and thus coordinate at the social level to reach a common objective. The ability to adjust one's behavior to others' actions during collaborative activities (including play) has been argued to "reach a higher degree of complexity when intentional and referential signals are directly addressed to specific individuals" (Cochet and Byrne, 2015). To build the pile of cubes, for example, interacting partners can point toward a specific cube or ask the other to wait before placing another cube.

In those cases, coordination processes can be enhanced by predicting the effects of each other's actions on joint outcomes and by distributing tasks effectively (Vesper et al., 2016). This ability involves the third dimension characterizing complexity, namely anticipatory planning (Cochet and Byrne, 2015). It requires going beyond the immediate perception of the environment and representing the relationship between a sequence of actions and a precise goal. At the individual level, planning implies mentally reviewing an action sequence in anticipation of a future need (e.g., selecting a specific cube in one room in order to build a pile of cubes in another room). At the social level, planning allows individuals to predict the other's behavior and adjust their own sequence of actions, leading to better coordination. Whether making such inferences requires mentalizing about others' inner states (e.g., beliefs and preferences) is still a subject of debate, but again, this question may not be central in the context of joint attention and joint action between a robot and a human.

The above-described categorization can therefore provide a common ground between ethologists, psychologists, and roboticists that may clarify which dimensions need to be considered in an attempt to implement the characteristics of motor precision, coordination and anticipatory planning in human–robot joint activities (see **Table 1** for an overview). The objective is to approach the complexity (or richness) of human interactions and obtain appropriate (or effective) responses from robots with regard to these different dimensions.
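One way to make this categorization usable in a robotics pipeline is to encode, for each behavior, which dimensions it involves at which level. The Python sketch below is a hypothetical encoding: it records only involvement (not degree), and the codings of the three examples (grasping a cube, pointing to a specific cube, fetching a cube in one room to build in another) are our simplified reading of examples mentioned in the text, not the contents of Table 1.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Dimension(Enum):
    MOTOR_PRECISION = "motor precision"
    COORDINATION = "coordination"
    PLANNING = "anticipatory planning"


class Level(Enum):
    INDIVIDUAL = "individual"
    SOCIAL = "social"


@dataclass(frozen=True)
class Behavior:
    name: str
    # Which (dimension, level) cells the behavior involves; degree is not encoded.
    involves: frozenset[tuple[Dimension, Level]]

    def dimensions(self) -> set[Dimension]:
        return {dim for dim, _ in self.involves}


def extra_dimensions(simpler: Behavior, other: Behavior) -> set[Dimension]:
    """Dimensions involved in `other` but not in `simpler`."""
    return other.dimensions() - simpler.dimensions()


# Hypothetical codings of examples from the text (not the contents of Table 1).
GRASP = Behavior("grasp a cube",
                 frozenset({(Dimension.MOTOR_PRECISION, Level.INDIVIDUAL)}))
POINT = Behavior("point to a specific cube for the partner",
                 frozenset({(Dimension.MOTOR_PRECISION, Level.INDIVIDUAL),
                            (Dimension.COORDINATION, Level.SOCIAL)}))
FETCH = Behavior("select a cube in one room to build a pile in another",
                 frozenset({(Dimension.MOTOR_PRECISION, Level.INDIVIDUAL),
                            (Dimension.PLANNING, Level.INDIVIDUAL)}))
```

For instance, `extra_dimensions(GRASP, POINT)` returns `{Dimension.COORDINATION}`, making explicit what pointing adds to grasping. Adding a degree per cell (e.g., an ordinal score) would be a natural extension, but any such scale would have to be defined by the researchers.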

### PRAGMATICS IN HRI: WHICH INGREDIENTS ARE NECESSARY FOR EFFECTIVE INTERACTIONS?

The increasing complexity of communicative abilities (involving the three above-mentioned dimensions) in the course of human development leads to a rich potential for interaction. Children actively go through different stages that allow them to engage successfully in joint activities, i.e., to operate within their physical environment, coordinate with other people, plan their own behavior, and anticipate their partners' behavior. Attempting to model, at least partially, the human developmental pathway seems a fruitful way of designing robots that can effectively initiate and respond to communicative situations. Such an enterprise, although still recent, has given rise to a substantial literature in robotics, especially since the 2000s, covering several sub-fields such as developmental and epigenetic robotics, cognitive systems, and social robotics. Several journals, publishing both HRI experimental studies and computational modeling, focus entirely on these questions (e.g., IEEE Transactions on Cognitive and Developmental Systems, Journal of Human-Robot Interaction, Journal of Social Robotics), and numerous conferences take place every year, whose proceedings are usually available online<sup>1</sup>.

TABLE 1 | Complexity in HRI: illustration of three dimensions at the individual and social levels (adapted from Cochet and Byrne, 2015).

The data from developmental psychology described in the first section, coupled with the framework proposed in the second section to help researchers define complex and effective HRI, may contribute to this growing body of work. To this end, it seems necessary (1) to consider the multimodality of interactions and (2) to adopt a pragmatic perspective grounded in an accurate representation of human communicative behaviors. Indeed, children learn to communicate through joint activities with adults who combine various forms of expression serving various functions. In the course of development, children gradually integrate the dissociation between the form and the function of language: they become more and more flexible in understanding that a single form can serve different functions and, reciprocally, that a single function can be expressed through several forms. Language is here regarded as more than a medium to convey information, in agreement with a proposition developed in speech act theory (Austin, 1962; Searle and Vanderveken, 1985). Language would be a way of acting on the environment, of "doing things with words," independently of its structural properties. Initially aimed at describing the relationships between the forms and functions of linguistic utterances, this theory defines several speech acts, depending on whether one intends to assert, comment, warn, request, deplore, etc. This theory was later adapted to non-verbal behavior (e.g., McNeill, 1998; Guidetti, 2002). The form still refers to the message structure, but applies to the whole body, including posture, the structure of communicative gestures (kinematic features and hand shape), gaze, and facial expressions. These nonverbal signals can be used to complement speech or be used alone, for example in the case of conventional gestures (see Guidetti, 2002). The function refers to the illocutionary force of the speech act (what one achieves by speaking), in other words, here, to the effect of these communicative acts in a specific context, thus giving some insight into the signaller's intention. Gestures, and especially the conventional gestures produced by children during the prelinguistic period, are thus regarded as genuine communicative acts, with a propositional content that can equal the one expressed by words. For instance, agreeing and refusing can be expressed gesturally by nodding or shaking one's head. The separate analysis of the forms and functions of communication, together with the description of the different modalities involved during interactions, therefore provides a key framework to help define what capacities the robot should be equipped with to ensure efficient collaboration with humans.

In this perspective, Mavridis (2015) has proposed a list of "ten desiderata that human–robot systems should fulfill" to maximize communication effectiveness. One of these guidelines relates to the importance of considering multiple speech acts, for both verbal and non-verbal communication, and not restricting the robot's competencies to "motor command requests." Just as imperative gestures (see section "How Does Communication Develop in the Context of Social Play?") are generally understood and produced earlier than declarative gestures in human development (Camaioni et al., 2004), robotic systems initially aimed to assign the robot a servant role, with the human driving the interaction. Broadening robots' pragmatic abilities is a first step toward the design of human–robot shared plans. The robot may, for example, comment on the pile of cubes as it is being built (see the example in section "To What Extent Can Interactions Be Characterized as Complex?") to support or correct the human's action, rather than just producing a motor response to the human's request. The dimension of social coordination is thus added to that of motor precision (see **Table 1**).

Similarly, flexibility in HRI also requires "mixed initiative dialog" (Mavridis, 2015), so that the robot can both initiate and respond to the interaction. Integrating models based on human adaptation and probabilistic decision processes, Nikolaidis et al. (2017) have shown that the performance of human–robot teams in collaborative tasks improves when the robot guides the human toward an effective strategy, compared with the common approach of having the robot strictly adapt to the human. The human's trust in the robot was also facilitated by a greater symmetry in role distribution and adaptation between the robot and the human, which might in turn lead to greater acceptability of HRI.
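The trade-off between guiding and adapting can be illustrated with a deliberately simplified Python sketch. It is not the model of Nikolaidis et al. (2017): it substitutes a one-step expected-reward comparison and a Beta-Bernoulli estimate of how often the human follows the robot's lead. The parameter `alpha` (the human's estimated willingness to switch) and all reward values are hypothetical.

```python
from __future__ import annotations


class AdaptabilityEstimate:
    """Beta-Bernoulli estimate of how often the human follows the robot's lead."""

    def __init__(self, followed: float = 1.0, resisted: float = 1.0) -> None:
        self.followed = followed  # pseudo-counts; (1, 1) is a uniform prior
        self.resisted = resisted

    def update(self, human_followed: bool) -> None:
        if human_followed:
            self.followed += 1
        else:
            self.resisted += 1

    @property
    def mean(self) -> float:
        return self.followed / (self.followed + self.resisted)


def choose_robot_move(alpha: float, r_robot: float, r_human: float,
                      r_conflict: float) -> str:
    """One-step choice between guiding the human ("insist") and adapting ("adapt").

    alpha: estimated probability that the human switches to the robot's strategy.
    r_robot / r_human: team reward if both follow the robot's / the human's strategy.
    r_conflict: team reward if the robot insists and the human does not switch.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be a probability")
    expected_insist = alpha * r_robot + (1.0 - alpha) * r_conflict
    return "insist" if expected_insist > r_human else "adapt"
```

After each round, a robot could call `est.update(...)` and then `choose_robot_move(est.mean, ...)`: with a uniform prior, two follows and one refusal give an estimate of 0.6. The sketch ignores multi-step planning and uncertainty about `alpha`, both of which a full probabilistic decision process would model.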

<sup>1</sup>For example, http://www.lucs.lu.se/epirob/

Designing such "socially intelligent and cooperative robots" (Breazeal et al., 2004) requires specific temporal dynamics of the interaction, which represents a considerable challenge, especially at the computational level. These dynamics convey social meanings to such an extent that any delay in the interaction can sometimes compromise its effectiveness. Researchers here face a dilemma that seems to pit interaction complexity (which requires taking numerous parameters into account) against interaction timing. The implementation of fast timescales (on the order of 100 ms) is usually considered necessary for robots to integrate (i.e., detect, interpret, and predict) and react to social stimuli in a timely manner during interactions (Durantin et al., 2017). Researchers developing a storytelling robot interacting with children aged 4–5 years have confirmed the importance of temporal features in the pragmatics of interactions: contingent responses from the robot, in relation to the attentional and social cues signaled by the children, were found to facilitate the children's engagement (Heath et al., 2017).
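Whether a robot meets such a timing constraint can be checked offline on interaction logs. The Python sketch below assumes hypothetical timestamped logs (in seconds) of human social cues and robot responses, and uses an operational definition of our own: a cue counts as answered contingently if the first robot response at or after it falls within the budget. The 100 ms default only reflects the order of magnitude cited above; this is not the measure used by Heath et al. (2017).

```python
from __future__ import annotations

from bisect import bisect_left

BUDGET_S = 0.100  # order of magnitude cited in the text (~100 ms)


def contingent_fraction(cue_times: list[float], response_times: list[float],
                        budget: float = BUDGET_S) -> float:
    """Fraction of social cues followed by a robot response within `budget` seconds."""
    if not cue_times:
        return 0.0
    responses = sorted(response_times)
    hits = 0
    for t in cue_times:
        i = bisect_left(responses, t)  # index of the first response at or after the cue
        if i < len(responses) and responses[i] - t <= budget:
            hits += 1
    return hits / len(cue_times)
```

For cues at 0.0, 1.0, and 2.0 s and responses at 0.05, 1.2, and 2.09 s, two of the three cues are answered within 100 ms, so the function returns about 0.67. Binary search keeps the check cheap on long logs.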

The variation of some characteristics of the robot's behavior according to the action performed may further illustrate the question of pragmatics in HRI, moving us one step closer to human-like interactions. For example, the morphological differences that have been reported in young children between pointing and reaching (Cochet et al., 2014) could be applied to the robot. First, regarding body posture, we might expect robots to lean closer to a given object when they intend to grasp it than when they want to communicate about that object. Second, depending on the robot's technical possibilities (e.g., two- or three-finger grippers, biomimetic anthropomorphic hands), differences in the form of the manual gestures produced should be observed between imperative and declarative pointing. The former is typically characterized by whole-hand gestures (all the fingers are extended in the direction of the referent), while the latter is mostly associated with index-finger gestures (the index finger is extended toward the referent and the other fingers are curled inside the hand) (Cochet and Vauclair, 2010; Liszkowski and Tomasello, 2011). Hand shape is also influenced by precision constraints: imperative gestures are likely to shift from whole-hand pointing to index-finger pointing when the target is surrounded by distractors (Cochet et al., 2014), which can be the case when the robot has to identify a specific object among several (e.g., when the human asks the robot to give him/her the red cube). Here, the notion of iconicity, which plays a role in both oral and sign languages, may help researchers precisely analyze the structure of gestures and better understand the interface between gestures and signs (Guidetti and Morgenstern, 2017). The importance of motor precision is thus directly related to the dimensions of coordination and anticipatory planning, providing a comprehensive framework to assess the complexity and effectiveness of HRI.
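The developmental findings summarized above translate into simple selection rules for a robot's posture and hand shape. The Python sketch below encodes them; the function names, the string labels, and the `can_isolate_index` flag (a hypothetical fallback for grippers that cannot extend a single finger) are assumptions for illustration, not an existing robot API.

```python
from __future__ import annotations

from enum import Enum


class Function(Enum):
    IMPERATIVE = "imperative"    # request an object or action
    DECLARATIVE = "declarative"  # share interest or provide information


def hand_shape(function: Function, distractors_near_target: bool,
               can_isolate_index: bool = True) -> str:
    """Manual configuration mirroring the child findings summarized above."""
    if not can_isolate_index:
        return "whole_hand"      # hypothetical fallback for simple grippers
    if function is Function.DECLARATIVE:
        return "index_finger"
    # Imperative: whole hand by default, index finger when precision is needed.
    return "index_finger" if distractors_near_target else "whole_hand"


def torso_lean(intends_to_grasp: bool) -> str:
    """Lean closer to the object when reaching than when communicating about it."""
    return "lean_toward_object" if intends_to_grasp else "upright"
```

A request for the red cube among other cubes would thus yield `hand_shape(Function.IMPERATIVE, True) == "index_finger"`, while the same request for an isolated object yields `"whole_hand"`.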

Moreover, the importance of implementing responsive social gaze in robots has previously been highlighted (e.g., Yoshikawa et al., 2006), but this response might also vary depending on the communicative function involved. To mirror child development, gaze alternation between the partner and the referent should be more frequent in declarative situations than in imperative ones (Cochet and Vauclair, 2010). The coordination between gestures and gaze (see also section "How Does Communication Develop in the Context of Social Play?") is also an important factor: it can help the robot estimate the state of goals, plans, and actions from the human's point of view, and allow the human to feel involved in fluid interactions with the robot, both of which facilitate the emergence of joint outcomes. For example, if a robot alternates its gaze between an object and its partner before initiating a pointing gesture, the human may interpret this behavior as the robot's willingness to take his/her attentional state into account before gesturing, thus favoring the exchange of information. Broadly speaking, coordinated gaze behavior could be considered the most fundamental modality for effective HRI, or at least a key prerequisite in collaborative tasks.

The consideration of facial expressions may also facilitate turn-taking dynamics and limit miscommunication by allowing some inferences about the other's affective state. Integrating the emotional component into HRI gives each partner additional cues to decide on the most appropriate response in a given situation. The development of methods for facial expression analysis raises several issues, however (e.g., Kanade et al., 2000). Even if there have been some attempts to design facial expression mechanisms in humanoid robots (e.g., Hashimoto et al., 2006; Gao et al., 2010), the facial features of most current robots are still far from the extremely rich motor possibilities of the human face. In parallel, real-time coding of emotional expressions seems to be an achievable goal (Bartlett et al., 2003), which would allow robots to directly perceive some changes in human facial expressions.

In addition to visual information, the auditory modality can also influence robots' and humans' decisions and coordination processes. In children at around 2 years of age, vocalizations accompany declarative gestures more frequently than imperative ones (Cochet and Vauclair, 2010). More recently, the prosody of these vocalizations was shown to gradually match the function of pointing during the second year of life (Tiziana et al., 2017), making it possible to differentiate imperative from declarative gestures (Grünloh and Liszkowski, 2015). Other features, such as the position of the object and the attentional state of the partner, have also been suggested to influence the rising and falling tones of the vocal productions that accompany gestures (Leroy et al., 2009). Prosody can therefore serve pragmatic purposes, and changes in the pitch, intensity, or duration of speech or vocalizations can in this regard be considered a full-fledged component of multimodal communication.

Beyond prosody, language content may be the most effective way for human–robot teams to coordinate. However, designing robots with language comprehension and production abilities that could lead to fluid conversations with humans raises several issues. Verbal language indeed requires symbolic representations, which need to be connected not only to the robot's sensory system, but also to "mental models" of the world internalized within its cognitive system. Mavridis (2015) has highlighted here the question of "situated language and symbol grounding." For example, the relation between the verbal label "cube" uttered by the human and the physical cube it refers to in front of the robot can be mediated through sensory data, but the use of conventional signs should allow robots to go beyond the here-and-now and extend symbol grounding to abstract entities in addition to objects, people, or events. To implement an architecture comparable to human interactions, this relation should be bidirectional: the visual perception of a cube should activate the right symbol in the robot's cognitive system, leading to the production of the word "cube"; reciprocally, a request addressed to the robot to give the human the cube should create a precise representation, allowing the robot to identify the right object.

Moreover, the identification of emotion labels in the verbal modality could also contribute, in addition to the recognition of emotional facial expressions and acoustic properties of speech (see Breazeal, 2004 for a complete review of emotion systems in robots), to better coordination between the partners in the interaction. The haptic modality, which plays an important role in social interactions, is also regarded as a valuable medium for expressing emotion (Yohanan and MacLean, 2012). Using motion capture systems and tactile sensors, a robot may rely on its human partner's positions and on such "affective touch" to estimate human intentions (Miyashita et al., 2005). This modality, essential in human development, may be a particularly good candidate for studying the complexity of HRI, as it simultaneously involves motor precision, coordination, and planning (see section "To What Extent Can Interactions Be Characterized as Complex?").

Finally, in addition to the coordination dimension, verbal dialog between a robot and a human would ideally imply purposeful speech and planning (Mavridis, 2015), in order to avoid a fixed mapping between stimuli and responses. Anticipatory planning abilities, as described in section "To What Extent Can Interactions Be Characterized as Complex?", would enable the robot to make the most appropriate or efficient decisions in a given shared activity, in conjunction with its perspective-taking skills and the goal of the activity. If the robot can represent which information the human needs to perform a specific action (and therefore identify which information the human is missing), it can decide to express a verbal request or comment on the situation, and/or plan a sequence of actions to coordinate with its partner.

This last example raises the question of intrinsic motivation in interactions: why is each partner engaged in this multimodal coordination, and to what extent does this influence the characteristics of the interaction? Studies in developmental robotics have shown that intrinsic motivation systems based on curiosity can directly affect skill learning and lead to autonomous mental development in robots (Oudeyer et al., 2007). Such a mechanism is clearly involved in human development, and in social play in particular: children discover and create new possibilities by exploring their physical and social environment. Through the development of social referencing, self-consciousness, or cooperation, human social interactions may even sometimes constitute a motivated goal per se (Tomasello, 2009), which suggests ways to shape robots' intrinsic motivation with a "social reward" function.
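To make curiosity-driven intrinsic motivation more concrete, the sketch below shows one common way of operationalizing it in developmental robotics: the robot tracks its prediction error in each activity and prefers the activity in which that error is currently decreasing fastest ("learning progress"). This is a simplified illustration in the spirit of Oudeyer et al. (2007), not their full architecture; among other things, it omits random exploration and the splitting of the sensorimotor space into regions. The class and function names, the window size, the greedy selection rule, and the error values are all hypothetical assumptions chosen for illustration.

```python
from collections import deque


class LearningProgressTracker:
    """Hypothetical helper: stores recent prediction errors for one activity
    and reports learning progress as the drop in mean error between the older
    half and the newer half of a sliding window."""

    def __init__(self, window: int = 5) -> None:
        self.window = window
        # Keep only the last 2 * window errors; older values are discarded.
        self.errors: deque = deque(maxlen=2 * window)

    def record(self, error: float) -> None:
        """Store the prediction error observed after one more trial."""
        self.errors.append(error)

    def progress(self) -> float:
        """Mean error of the older half minus mean error of the newer half.

        Returns 0.0 until enough errors have been recorded. The value is
        positive when errors are decreasing, i.e., when the robot is learning.
        """
        if len(self.errors) < 2 * self.window:
            return 0.0
        errs = list(self.errors)
        older = sum(errs[: self.window]) / self.window
        newer = sum(errs[self.window :]) / self.window
        return older - newer


def choose_activity(trackers: dict) -> str:
    """Greedy rule (a simplification): pick the activity whose learning
    progress is currently highest."""
    return max(trackers, key=lambda name: trackers[name].progress())


# Illustrative usage with made-up error values.
trackers = {
    "stack_blocks": LearningProgressTracker(window=2),
    "watch_static_noise": LearningProgressTracker(window=2),
}
for err in [1.0, 0.8, 0.5, 0.3]:  # errors shrink: the activity is learnable
    trackers["stack_blocks"].record(err)
for err in [0.9, 0.9, 0.9, 0.9]:  # errors stay high: nothing is being learned
    trackers["watch_static_noise"].record(err)

print(choose_activity(trackers))  # stack_blocks
```

Because the reward is a decrease in error rather than the error itself, an activity that is already mastered and one that cannot be learned (constant high error) both yield little learning progress, which is the usual intuition for using learning progress rather than novelty alone.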

We can see here that the relationship between theories in developmental psychology and robotics offers bidirectional benefits. In a nutshell, some models in developmental robotics are based on psychological theories, which are then formalized and implemented in robots, while developmental robotics allows researchers in psychology to go further in elaborating their theories through thorough experimentation and hypothesis testing. This applies to a variety of questions addressed in this review, from the conditions that influence the learning process during interactions (Boucenna et al., 2014) to the description of stages in language development (Morse and Cangelosi, 2017). Advances in developmental robotics may thus provide valuable help in the analysis and implementation of the processes involved in interactions.

### CONCLUSION AND PERSPECTIVES

The question at stake in the present work was how to improve the effectiveness of human–robot interactions in collaborative tasks, first in terms of joint outcomes (has the task been completed?), but also with regard to the human's perception and interpretation of the interaction: is the robot's behavior appropriate, i.e., acceptable, within the frame of human communication? We argue here that observing the development and structure of interactions between the child and the adult, especially in the context of social play, can help answer this question. To shape a shared common space between the human and the robot that could reflect the complexity of human interactions, we have also proposed to focus on three dimensions: motor precision, coordination, and anticipatory planning. The specific examples developed in section "Pragmatics in HRI: Which Ingredients Are Necessary for Effective Interactions?" suggest that the more robots use human-like communicative modalities (e.g., facial expressions, gestures, and language) with respect to these three dimensions, the more they invite interactive behaviors that are natural to people. The impression of dealing with a social agent is strengthened, which in turn facilitates interaction with robots. In this sense, and to paraphrase Cangelosi et al. (2010), the integration of action and language may constitute a roadmap to better frame and assess HRI from a developmental point of view and with a pragmatic perspective.

However, numerous obstacles remain before the level of detail pictured in the present article can be achieved, mainly technological challenges, given the motor and cognitive correlates of the above-mentioned behaviors. To put it bluntly, developmental psychologists cannot expect roboticists to implement in robots all the subtleties of multimodal communication that occur in human children. There may also be conceptual difficulties: attempts to approach human realism, aimed at maintaining the human's trust in the robot, can sometimes be confronted with the uneasy feeling of viewing and/or hearing a robot that looks imperfectly human. This uncanny valley effect (Mitchell et al., 2011; Mori, 1970, 2012), which was shown to emerge in middle childhood in relation to developing expectations about humans and machines (Brink et al., 2017), may complicate the design of socially interactive robots, in terms of both appearance and behavior. Empirical evidence for the uncanny valley nevertheless seems inconsistent or restricted to specific conditions (Kätsyri et al., 2015), with the definition of human-likeness mostly involving physical realism.

By contrast, anthropomorphic behavior (see Duffy, 2003), in addition to its facilitating role in the interaction with humans (see above), also results in better and faster learning by robots. For example, in a task in which robots have to learn the meaning of words, their performance is enhanced when they provide humans with social cues communicating a learning preference, as these cues influence the tutoring of the human teacher (de Greeff and Belpaeme, 2015). We observe the same phenomenon when human children start to learn new concepts: according to Bruner's constructivist theory, children need scaffolding from adults (or from children who have already acquired the concept) in the form of active support, which may at first consist of reducing the choices a child faces. Such learning processes clearly play an important role in human development, and may also enable quick and effective application of robotic systems. Multi-level learning may indeed constitute a key line of research for HRI (Mavridis, 2015), which might again benefit from research in developmental psychology.

Reciprocally, the field of robotics offers interesting perspectives for psychologists, especially for research on atypical development. Atypical development might be a direct window on typical development and vice versa: "development is the key to understanding developmental disorders" (Karmiloff-Smith, 1998). For example, joint action and joint attention are usually impaired in children with ASD; comparison with typical development has revealed a different use of social gaze and often a lack of the declarative function, in both verbal and non-verbal communication. Exchanges between robotics and developmental psychology could help conceptualize the stages of joint attention, in order to better understand how children develop joint attention and go through the whole sequence leading to declarative pointing. This should in turn inform the design of intervention programs for children with neurodevelopmental disorders. Moreover, numerous intervention programs have recently been proposed showing the added value of therapy robots for the development of communication, play, or emotional skills (e.g., Robins et al., 2009; Huijnen et al., 2016).

In conclusion, combining insights and methods from robotics and developmental psychology allows researchers to conceive models of HRI in which robots can come to develop motor, social, and cognitive skills. These models may benefit not only fundamental research on joint attention and joint action in typical development, but also early evaluation and intervention programs for atypical development (e.g., Dautenhahn, 2007). Continuing these interdisciplinary discussions, possibly integrating some of the elements proposed in the present article, will undoubtedly lead to increasingly robust HRI models in the coming decades.

#### AUTHOR CONTRIBUTIONS

HC and MG devised the conceptual ideas presented in the article. HC drafted the manuscript. MG revised it critically and gave final approval of the version to be submitted.

#### FUNDING

This article is part of the project JointAction4HRI, funded by the French National Agency for Research (n° 16-CE33-0017).

#### ACKNOWLEDGMENTS

Many ideas presented in this paper stem from fruitful discussions with R. Alami, A. Clodic, and E. Pacherie, all involved in the Joint Action for Human-Robot Interaction project funded by the French National Agency for Research (Project No. 16-CE33-0017-01).

#### REFERENCES

a human partner. IEEE Trans. Auton. Mental Dev. 6, 213–225. doi: 10.1109/TAMD.2014.2319861

Garvey, C. (1990). Play. Cambridge, MA: Harvard University Press.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Cochet and Guidetti. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Changes in Posture and Interactive Behaviors as Infants Progress From Sitting to Walking: A Longitudinal Study

Sabrina L. Thurman<sup>1</sup> \* and Daniela Corbetta<sup>2</sup>

<sup>1</sup> Department of Psychology, Elon University, Elon, NC, United States, <sup>2</sup> Department of Psychology, The University of Tennessee, Knoxville, Knoxville, TN, United States

#### Edited by:

Karen E. Adolph, New York University, United States

### Reviewed by:

Lana Karasik, College of Staten Island, United States

Caitlin Fausey, University of Oregon, United States

Mark Schmuckler, University of Toronto Scarborough, Canada

> \*Correspondence: Sabrina L. Thurman sthurman2@elon.edu

#### Specialty section:

This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology

Received: 18 January 2018 Accepted: 27 March 2019 Published: 12 April 2019

#### Citation:

Thurman SL and Corbetta D (2019) Changes in Posture and Interactive Behaviors as Infants Progress From Sitting to Walking: A Longitudinal Study. Front. Psychol. 10:822. doi: 10.3389/fpsyg.2019.00822

This longitudinal study assessed how infants and mothers used different postures and modulated their interactions with their surroundings as the infants progressed from sitting to walking. Thirteen infants and their mothers were observed every other week throughout this developmental period during 10-min laboratory free-play sessions. For every session, we tracked the range of postures mothers and infants produced (e.g., sitting, kneeling, and standing), assessed the type of interactions they naturally engaged in (no interaction, passive involvement, fine motor manipulation, or gross motor activity), and documented all target transitions. During the crawling transition period, when infants used sitting postures, they engaged mainly in fine motor manipulation of targets and often maintained their activity on the same target. As infants became mobile, their rate of fine motor manipulation declined during sitting but increased while kneeling/squatting. During the walking transition, their interactions with targets became more passive, particularly when sitting and standing, but they also engaged in more gross motor activity while continuing to use squatting/kneeling postures for fine motor manipulation. The walking period was also marked by an increase in target changes and more frequent posture changes during object interactions. Throughout this developmental period, mothers mainly displayed no activity or passive involvement during sitting, kneeling/squatting, and standing. As expected, during this developmental span, infants used their bodies in increasingly varied ways to explore and interact with their environment; more importantly, progression in posture variations significantly altered how infants manually interacted with the world around them.

Keywords: infancy, locomotion, posture, interactive behaviors, exploration, longitudinal study

## INTRODUCTION

Infants develop curiosity about the world, which encourages them to interact with and explore the objects and people in it of their own volition (Piaget, 1936). By acting on the environment with their own bodies, infants come to understand their surroundings and learn the interrelationships between their own action capabilities and the features of the environment that support those actions (Gibson, 1988). Postures, the particular body and limb configurations used at any moment, mediate action development in meaningful ways (Rochat and Bullinger, 1994). For example, the acquisition of each new posture provides a unique lens through which infants can view the world, and it allows them to accrue a range of possibilities for moving about and physically interacting with the environment (e.g., Adolph, 2008; Pierce et al., 2009; Thurman and Corbetta, 2017). Such interactions contribute to psychological change (Campos et al., 2000) and lay a foundation for future cognitive skills (e.g., Bornstein et al., 2013; Libertus and Violi, 2016) and for long-term brain development (Bernier et al., 2016).

Infants' motor and interactive behaviors also often occur in an environment attended by their caregiver (e.g., Campos et al., 2000; Bigelow et al., 2004; Lobo and Galloway, 2012; Karasik et al., 2014; Fukuyama et al., 2015). The interactive activities that mother and child each produce in the environment may change as infants acquire new motor skills. Little research has described the posture and physical interaction patterns that mother and child display in free-play activities over the first 2 years of life. This study aims to capture how infants and their mothers use their bodies to manipulate targets in a playroom as the infants transition from sitting to crawling and then to walking.

### The Role of Posture in Infant Interaction and Exploration

Postures can be seen as a means through which infants use their bodies to interact with their surroundings. Depending on motor skill level and the posture used, physical interactions with objects can be facilitated or reduced. For example, when sitting, infants' hands are free, allowing them to manipulate and explore objects, sometimes in sophisticated ways (Rochat, 1989; Soska et al., 2010; Lobo and Galloway, 2013; Lobo et al., 2014; Soska and Adolph, 2014). When prone, however, infants are limited to using one hand to lift their torso off the ground while using the other to reach for an object, which can reduce the range of actions (Rocha and Tudella, 2008). When standing and walking, infants' hands are free again and can further expand their range of possibilities, whereas hands-and-knees crawlers are less likely than walking infants to carry objects (Karasik et al., 2012). Thus, each posture provides a unique problem space and constrains how the body can be used (Adolph, 2008).

Postures can also alter the stability of the body, the demand on attentional resources, and what can be perceived in the surroundings (e.g., Kretch et al., 2014; Franchak et al., 2018). Certain postures and their relative stability can even influence the use of the limbs and hands. For example, transitioning from sitting to crawling, and from crawling to walking, affects the way infants use their arms for reaching and retrieving objects (Corbetta and Bojczyk, 2002; Corbetta and Thelen, 2002). Unstable postures often require more effort from infants to maintain balance. For example, the hands are needed to hold onto surfaces during cruising (Berger et al., 2014), or they may simply be needed to balance the body when newly standing (Ledebt, 2000). Attention to balancing one's posture may limit object manipulation; however, holding objects can help stabilize standing in infants (Claxton et al., 2013). Finally, energetically expensive forms of locomotion such as crawling can affect object interactions (Dosso and Boudreau, 2014).

Thus, research has shown that postural progression and postural control can modulate infants' experiences with objects, people, and their wider environments. Furthermore, these posture-specific working spaces not only change depending on the posture adopted, but may also affect object interactions at any given moment and over the course of motor development. In this study, we examine more closely how the expanding repertoire of postural skills that infants develop as they acquire locomotor skills affects their manipulatory behaviors and interactive activities with objects in their surroundings. Prior research has provided single snapshots of infants' interactive behaviors at certain developmental times. Here, we track how infants' interactive behaviors reorganize as they progress from sitting through walking. In doing so, we also document mothers' postures and interactive behaviors as their infants acquire new motor milestones.

### The Role of Locomotion in Infant Target-Directed Behavior

The emergence of self-produced locomotion elicits significant improvements in target-directed behaviors (e.g., Gustafson, 1984; Campos et al., 2000). Before infants can produce hands-and-knees crawling, they almost exclusively interact with objects, people, and contexts in their close proximity (Pierce, 2000; Zachry and Mitchell, 2012). With the onset of self-produced locomotion, however, infants become active agents in their own expanding world. Their interactions shift to the wider space, and they engage in more target-directed actions such as object manipulation (Gustafson, 1984). Zachry and Mitchell (2012), who observed infant behaviors in childcare centers, found that crawling infants displayed more goal-directed actions (e.g., activity initiation) than did pre-crawling infants. This expansion in interactions with objects was found to relate to increases in spatial exploration of their surroundings. In free play, mother and infant travel greater distances and spread their activities more widely around a room following the onset of crawling, but infants' spatial exploration of the room broadens significantly more than their mothers' (Thurman and Corbetta, 2017). Infants' expanding spatial exploration patterns also correlate strongly with the number of bouts of interaction they perform in the room (Thurman and Corbetta, 2017). Clearly, the emergence of self-produced locomotion brings many changes in infants' patterns of interaction with objects and in their target-directed behaviors.

Interestingly, walking infants' behavior appears more deliberate and target-directed than that of crawling infants. Walkers travel even greater distances to obtain a desired toy or reach a destination than crawlers, whose interactions appear more opportunistic (Pierce et al., 2009; Dosso and Boudreau, 2014). Walkers also become more interactive with toys and people in the room than crawlers (Campos et al., 2000; Clearfield et al., 2008; Clearfield, 2011; Karasik et al., 2011).

These studies together highlight the role of different forms of self-produced locomotion in infants' expanding exploration of their surroundings. But how do the acquisition and expanding repertoire of new postures relate to target interactions? The behavioral rhythm of infant action patterns has been characterized by frequent and abrupt movements (Reed, 1988), but we know little about how infants actually use their bodies as they transition between moments of target interaction and moments of pause. To our knowledge, no study has longitudinally examined how pre-locomotor, crawling, and walking infants' expanding postural skills modulate the way they interact with targets in their environment. We are also not aware of any studies investigating these patterns in mothers, although prior work has indicated that caregivers seem to mirror certain patterns of movement and postural behavior displayed by their infants (Thurman and Corbetta, 2017; Franchak et al., 2018). It remains unclear whether mothers' physical interactive behaviors also change as their infants acquire new locomotor skills.

#### The Current Study

This study is an extension of a previous report (Thurman and Corbetta, 2017) using the same longitudinal dataset. In that report, we delineated patterns of mother-infant spatial exploration, the number of object interaction bouts and posture changes displayed, and the proportion of time intervals infants and mothers spent in certain postures during free play. We found that over time, infants increased their interactive behaviors, traveled further, and spread their exploration of the room more widely than their mothers. These trends in spatial exploration were highly correlated with the number of posture changes infants and mothers performed, but interactive behavior and posture changes were positively correlated only in the infants. This seemed to suggest that infants used their postures for movement and discovery, whereas mothers seemed to play a more supportive role.

In this report, we aim to further address how the postures adopted in the moment, at different periods of development, affected the type of interactive behaviors performed on objects in the room. While in certain postures, were infants passively holding objects, finely manipulating them, or performing gross motor actions? We also wanted to capture how frequently infants transitioned between targets, whether they had moments of no interaction, and whether they changed their postures during target transitions, both within sessions and as they progressed through developmental time. Active exploration of one's surroundings is not limited to manipulating objects with the hands or traveling to different locations. It may also entail shifting posture while interacting with objects or furniture in the surroundings. From an embodiment perspective, using the body as a whole provides new opportunities for interaction and discovery, and can provide new means-end possibilities to explore, achieve novel activities, and learn about objects in the environment. If this assumption is correct, we should find greater postural diversity when infants are engaging in targeted behaviors than in non-targeted behaviors.

We tracked interactive and postural activities in 13 infants and their mothers as the infants acquired crawling and walking skills. Our sampling, spanning several months across the first 2 years of life, allowed us to investigate natural changes in these behaviors in both infants and mothers, to determine how they occurred in a less controlled environment, and most importantly, to address how physical interactive activities reorganized as infants developed expanding postural forms over time. We asked, do infants and mothers alike shift interactive behaviors as infants acquire locomotion? Do interactive behaviors depend on the posture performed in the moment? And, do transitions between targets occur while maintaining or changing posture?

During the pre-locomotor period, we expected that infants would demonstrate more fine motor manipulations of objects while sitting. Given their limited range of postural skills, their posture would not differ greatly between moments of interactions with targets and moments of no interactions. Also, when transitioning between targets, they would do so while maintaining the same posture.

During the crawling period, infants begin squatting and kneeling. Because these postures will be novel and somewhat unstable, we expect that infants will engage in more passive involvement with targets (e.g., holding, hands on) when in those postures. During this period, infants orient more to their wider surroundings (Campos et al., 2000). As a result, we expect that infants will engage in less fine motor manipulation when sitting, since sitting can now be used as a transition posture (Kretch et al., 2014; Soska et al., 2015). Crawling infants also have more postural options available. These postural options may be more widely used when engaged with targets, compared to when not engaged with targets, or they may be used when transitioning from one target to another.

Finally, during the walking period, as infants have gained more postural experience and stability in squatting/kneeling postures, we expect that they may rely less on sitting postures for fine and gross motor activities, but increase their reliance on squatting/kneeling postures for these interactive behaviors. Furthermore, as infants now spend more time standing upright, they may display more passive involvement with objects when in that posture, as this newly acquired and unpracticed posture places more demands on balance. During this period, infants may begin to show an even wider range of postures when interacting with targets compared to when they are not interacting with targets. We also expect continued shifts in posture during transitions from one target to another.

We anticipate that mothers will not display significant changes in their use of postures for interaction over time. Particularly, as their infants gain more autonomy and become able to transition between targets more independently, mothers may take a more laid-back role with a decrease in interactive behaviors.

### MATERIALS AND METHODS

#### Participants

Participants in this study were the same as in Thurman and Corbetta (2017). Thirteen firstborn infants (6 females) and their mothers were followed every other week in our laboratory from the time infants were 6 months of age (M = 6.0 months, SD = 0.3 at first session) until they had 2 months of walking experience (M = 14.9 months, SD = 1.2 at final session). Visits totaled 247 sessions across all participants. There was no attrition. **Table 1** reports the number of sessions each dyad contributed to the study.

All participants were recruited from a human subject database maintained by the Child Development Research Group at the University of Tennessee. We sent invitations for participation by mail when infants were around 5 months old. Interested families were invited to attend a non-committal information session in our laboratory before deciding to participate. Thirteen out of 14 families who attended the information session decided to be a part of the study and signed informed consent forms. All infants were healthy throughout the duration of the study and were free of physical impairments. All families were non-Hispanic White, and over half fell into categories consistent with middle socioeconomic status (e.g., college degree(s), middle-income households). At each session, families were compensated \$10, and at the end of the study, they were given a certificate of completion, a photo book with photographs and milestones from each session, and copies of all DVD recordings.

### Materials

#### Room and Toys

Infants participated in 10 min free-play sessions with their mothers, held in a brightly lit, temperature-controlled laboratory space. The free-play room measured 3.3 m × 3.7 m, the size of a standard bedroom. The space was accessed through a small walkway that could be closed off with a baby gate at the mother's request. The room contained a couch against one wall, a small set of infant-sized stairs (54 cm tall) directly across from the couch, and a large metal cabinet, a chair, and a bookshelf against another wall. Colorful foam tiles covered most of the floor, and posters were placed on the walls around the room (see **Figure 1**). In addition to these room features and large furniture items, the room was equipped with a variety of gender-neutral, colorful objects to elicit fine and/or gross motor exploration (e.g., the pull-string toy resembling a traditional phone could be rolled across the floor or manipulated by pushing its buttons). **Table 2** lists all possible targets present and available in the playroom, including the mother and infant. As shown, the room contained between 28 and 29 possible targets at all times (including the furniture, walls, and flooring; sometimes the mothers used their own items). Of those items, 23 (82%) were present in the room for the entire study. Five objects better suited for younger infants were switched out at some point during the study depending on infants' locomotor skill progression, and six new ones were added (e.g., the sit-on rocking horse was replaced with the rolling melody push toy).

TABLE 1 | Number of sessions used for the analyses, ranging from 5 sessions (when available) before crawling onset up to 5 sessions following walking onset. The infant ages for crawling and walking onsets according to Touwen's (1976) Group III Neurological Assessment Scale are also reported.

Sessions were recorded with two Canon Vixia HFR32 digital video cameras that were positioned on opposite sides of the room. Together, the two camera views captured all activity in the room.

#### Assessment of Postural Control

Touwen's Group III Neurological scale is an infant assessment technique designed for the evaluation of posture, muscle tone regulation, reflexes and reactions, trunk coordination, and fine and gross motor coordination (Touwen, 1976). The technique has been reported to have good reliability and validity and takes about 15 min to administer (Heineman and Hadders-Algra, 2008; Hadders-Algra et al., 2010). The assessment was administered at the end of each session in the presence of the parents.

For the purposes of the current analyses, we used two items from this scale: locomotion in prone position (crawling) and walking. Locomotion in prone position was scored as follows: 0 for no change in spatial position, 1 for wriggling or pivoting movements, 2 for abdominal progression using the arms only, 3 for abdominal progression using the arms and legs, 4 for a mixed pattern of abdominal progression using the arms and legs and hands-and-knees crawling, and 5 for hands-and-knees crawling. We used the score of 5 as the cutoff point between pre- and post-crawling. For walking, the scores were: 0 for unable to walk, 1 for walking when held by both hands, 2 for walking when held by one hand, 3 for walking a few (fewer than seven) independent steps, and 4 for walking at least seven independent steps. We used the score of 4 as the cutoff point between pre- and post-walking. When infants first reached these cutoff scores in the laboratory, the consistency of the new locomotor skill was reassessed in subsequent sessions. **Table 1** reports the ages at which those milestones were attained.
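The cutoff rule can be illustrated with a short Python sketch. The function name `onset_session`, the per-session score list, and the confirmation rule (the cutoff must be met again at the next recorded session, when there is one) are hypothetical choices for illustration; the text states only that consistency was reassessed in subsequent sessions.

```python
from typing import Optional, Sequence

# Cutoff scores described in the text: 5 = hands-and-knees crawling,
# 4 = walking at least seven independent steps.
CRAWL_CUTOFF = 5
WALK_CUTOFF = 4


def onset_session(scores: Sequence[int], cutoff: int, confirm: bool = True) -> Optional[int]:
    """Return the index of the first session whose score reaches `cutoff`.

    Hypothetical confirmation rule: when `confirm` is True, the next session
    (if any) must also reach the cutoff; otherwise the search continues.
    Returns None if no qualifying session exists.
    """
    for i, score in enumerate(scores):
        if score < cutoff:
            continue
        is_last = i == len(scores) - 1
        if not confirm or is_last or scores[i + 1] >= cutoff:
            return i
    return None


# Hypothetical per-session scores for one infant (0-based session indices).
crawl_scores = [0, 1, 2, 3, 5, 4, 5, 5, 5]
walk_scores = [0, 0, 0, 0, 0, 1, 1, 2, 3, 4, 4]
print(onset_session(crawl_scores, CRAWL_CUTOFF))  # 6
print(onset_session(walk_scores, WALK_CUTOFF))  # 9
```

In this made-up example the first score of 5 (index 4) is not confirmed at the next session, so the crawling onset falls at index 6.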

#### Procedure

Before each session, objects were positioned in consistent locations in the room (e.g., the sit-on pushcart was always placed on the floor in the bend of the stairs), but all objects except the furniture could be moved freely around the room by the participants.

Upon arrival, dyads were given time to settle into the laboratory space. An experimenter turned on both cameras and bounced a small rubber ball in the center of the room to provide an event that could be easily identified in both recordings for later video synchronization. There were three 10 min conditions, whose order was randomized across sessions and dyads. In one condition, mothers were given a problem-solving toy (e.g., fit-the-shape toy). Another condition involved a startle toy (e.g., jack-in-the-box), and the third condition was free play. Only data from the free-play condition are included in the current report.

During the free-play condition, mothers were asked to play with their infants as they normally would. An experimenter monitored each session from an adjacent location that was not visible to the participants. Of the 247 recordings, only 14 were paused at the mother's request (e.g., for diaper changes, feedings, or fussiness) until the mother indicated that the session could resume.

### Coding and Dependent Measures

Session recordings from both video cameras were imported into The Observer XT and synchronized for behavioral coding (see **Figure 1** for the view from both cameras). We used time sampling at 15 s intervals to capture general trends in these behaviors over the first 2 years. At each 15 s sample point across the 10 min free-play session, we coded infants' and mothers' postures and the types of physical interactive behaviors they produced with targets. To give coders sufficient information to code behaviors accurately, they examined the 2 s of video preceding each sample point to interpret the behavior (e.g., standing vs. walking). Thus, behaviors occurring between 6:58.0 and 7:00.0 were examined to code the interval at 7:00.0. The total corpus corresponded to 41 h of video recordings and 9,880 15 s intervals of free play. Codes are explained below.
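The sampling arithmetic can be checked with a small sketch. It assumes the first sample point falls at 0:15 and the last at 10:00, which yields 40 points per session and reproduces the 9,880-interval total for 247 sessions; `coding_windows` is a hypothetical helper, not part of The Observer XT.

```python
SESSION_S = 10 * 60  # 10 min free-play session, in seconds
INTERVAL_S = 15      # time-sampling step
LOOKBACK_S = 2       # seconds of video examined before each sample point


def coding_windows(session_s: int = SESSION_S, interval_s: int = INTERVAL_S,
                   lookback_s: int = LOOKBACK_S) -> list[tuple[int, int]]:
    """Return (start, end) seconds of video examined for each sample point.

    Assumption: sample points fall at 15 s, 30 s, ..., 600 s.
    """
    return [(t - lookback_s, t) for t in range(interval_s, session_s + 1, interval_s)]


def mmss(seconds: int) -> str:
    """Format seconds as m:ss."""
    return f"{seconds // 60}:{seconds % 60:02d}"


windows = coding_windows()
print(len(windows))  # 40
print([mmss(s) + "-" + mmss(e) for s, e in windows if e == 420])  # ['6:58-7:00']
print(247 * len(windows), round(247 * SESSION_S / 3600, 1))  # 9880 41.2
```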

#### Posture

We adapted the posture coding scheme used by Thurman and Corbetta (2017). This coding scheme delineates nine posture categories from a range of positions and movements: being repositioned, held, or carried by the mother; lying down; sitting; stationary on all fours; kneeling/squatting; crawling; standing; cruising; and stepping. The posture displayed at each 15 s interval in the session was coded. If a participant's body was not fully visible during an interval, that interval was excluded from the analyses; this represented on average less than 1% of infants' intervals (M = 0.92%, SD = 0.83%, range = 0.00–2.45%).

#### Interactions With Targets

We considered whether participants were directly and actively engaging with a target in a physical way (i.e., in direct contact with the target). Targets included toys, furniture, or the other person. Instances when participants were not contacting a target were coded as "nothing" (e.g., an infant simply sitting on the floor and looking at the mother). We derived several variables from this coding. First, we counted in how many intervals in each session participants physically interacted with a target vs. not. From this, we derived the proportion of intervals that participants spent in targeted vs. untargeted behavior.
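As a minimal sketch of this proportion, assume each coded interval is stored as the name of the contacted target, with `None` standing for "nothing"; this representation and the function name are illustrative assumptions.

```python
from typing import Optional, Sequence


def targeted_proportions(targets: Sequence[Optional[str]]) -> tuple[float, float]:
    """Return (targeted, untargeted) proportions of coded intervals.

    Each element is the target contacted in one 15 s interval, or None for
    "nothing". Intervals excluded from coding should be dropped beforehand.
    """
    if not targets:
        raise ValueError("no coded intervals")
    targeted = sum(t is not None for t in targets)
    n = len(targets)
    return targeted / n, (n - targeted) / n


# Hypothetical session excerpt: 6 of 8 intervals involve a target.
session = ["ball", "ball", None, "mother", "stairs", None, "couch", "couch"]
print(targeted_proportions(session))  # (0.75, 0.25)
```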

We also derived information about how targets changed across successive intervals. We classified five types of target transitions. Target-to-same-target transitions occurred when participants continued to interact with the same target from one interval to the next (e.g., climbing the stairs in one interval, then pulling up on the stairs in the next). Target-to-new-target transitions occurred when participants switched from one target to a different target from one interval to the next (e.g., bouncing the ball, then patting the mother). Target-to-nothing transitions occurred when participants engaged with a target in one interval but not in the next (e.g., hand on the stairs, then simply sitting on the floor). Nothing-to-target transitions occurred when participants went from not interacting with a target in one interval to engaging with a target in the following interval (e.g., standing on the floor, then climbing on the couch). Finally, nothing-to-nothing transitions were coded when participants did not interact with a target in either of two successive intervals (e.g., sitting on the floor, then lying down). We counted how many times each of the five target transitions occurred in each session, then normalized the counts by the total number of possible interval-to-interval transitions in each session. Using the posture coding described above, we also derived whether infants simultaneously changed or maintained their postures during target transitions.
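The five transition types and their normalization can be sketched as follows, again assuming one target label per interval with `None` for "nothing" and no excluded intervals inside the sequence (how excluded intervals were handled is not specified here). The helper names are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence

TRANSITION_TYPES = (
    "target-to-same-target",
    "target-to-new-target",
    "target-to-nothing",
    "nothing-to-target",
    "nothing-to-nothing",
)


def classify_transition(prev: Optional[str], curr: Optional[str]) -> str:
    """Classify one interval-to-interval transition (None means "nothing")."""
    if prev is None and curr is None:
        return "nothing-to-nothing"
    if prev is None:
        return "nothing-to-target"
    if curr is None:
        return "target-to-nothing"
    return "target-to-same-target" if prev == curr else "target-to-new-target"


def transition_proportions(targets: Sequence[Optional[str]]) -> dict[str, float]:
    """Count each transition type and normalize by the number of transitions."""
    pairs = list(zip(targets, targets[1:]))
    if not pairs:
        raise ValueError("need at least two coded intervals")
    counts = Counter(classify_transition(a, b) for a, b in pairs)
    return {kind: counts[kind] / len(pairs) for kind in TRANSITION_TYPES}


# Hypothetical sequence of 7 intervals, giving 6 transitions.
session = ["stairs", "stairs", "ball", "mother", None, None, "couch"]
for kind, share in transition_proportions(session).items():
    print(f"{kind}: {share:.2f}")
```

Normalizing by the number of transitions, rather than by raw counts, keeps sessions with a few excluded or paused intervals comparable.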

#### Interactive Behaviors

Interactive behaviors with targets were coded for each interval using four categories. No activity indicated that the participant was not physically interacting with a target (e.g., standing in the middle of the floor). Participants could also engage in passive and/or minimal involvement (e.g., hand on a toy, sitting stationary on the cart). More involved, complex movements were classified as either general fine motor manipulation (e.g., pressing buttons, spinning) or general gross motor activity (e.g., climbing on the couch, pushing the wagon, throwing the ball), which corresponded to all behaviors not involving fine motor manipulation. We investigated these interactive behaviors during intervals in which participants were sitting, squatting/kneeling, or standing, because those were the three most commonly displayed postures (Thurman and Corbetta, 2017). For each session, we counted how many intervals of sitting, squatting/kneeling, and standing corresponded to each category of interactive behavior, and then derived the proportion of intervals participants engaged in each type of interactive behavior while in each of those postures.

TABLE 2 | List of all targets used in study, their classifications, and the length of time they were available in the playroom.
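A sketch of the within-posture normalization, assuming each coded interval is recorded as a (posture, interactive behavior) pair; the category labels and helper name are shorthand chosen for illustration. Because proportions are computed within each posture, the four behavior proportions for a posture sum to 1.

```python
from collections import Counter, defaultdict
from typing import Iterable

POSTURES = ("sitting", "squatting/kneeling", "standing")
BEHAVIORS = ("no activity", "passive/minimal", "fine motor", "gross motor")


def behavior_by_posture(intervals: Iterable[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """Proportion of each interactive behavior within each analyzed posture.

    `intervals` holds one (posture, behavior) pair per coded 15 s interval.
    Intervals in other postures (e.g. crawling) are ignored.
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for posture, behavior in intervals:
        if posture in POSTURES:
            counts[posture][behavior] += 1
    result = {}
    for posture, c in counts.items():
        total = sum(c.values())
        result[posture] = {b: c[b] / total for b in BEHAVIORS}
    return result


# Hypothetical coded intervals from one session.
session = [
    ("sitting", "fine motor"), ("sitting", "fine motor"),
    ("sitting", "no activity"), ("sitting", "passive/minimal"),
    ("standing", "passive/minimal"), ("standing", "gross motor"),
    ("crawling", "gross motor"),
]
print(behavior_by_posture(session)["sitting"])
```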

#### Inter-Rater Reliability

Pairs of trained coders independently coded between 20 and 23% of the data, depending on the analysis. Video segments were selected randomly throughout the entire developmental period and across dyads. For the infants, kappa agreement values (and inter-rater correlations) were 0.73 (r = 0.91) for posture, 0.94 (r = 0.81) for targets, and 0.84 (r = 0.74) for interactive behaviors. For the mothers, kappa values (and inter-rater correlations) for these codes were 0.89 (r = 0.81), 0.92 (r = 0.90), and 0.86 (r = 0.80), respectively. Disagreements in these reliability sessions were resolved through discussion.
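The kappa variant is not named above; assuming Cohen's kappa for two coders labeling the same intervals, the statistic compares observed agreement with the agreement expected from each coder's marginal category frequencies. A minimal standard-library sketch (category labels are illustrative):

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two coders' category labels on the same intervals."""
    if len(a) != len(b) or not a:
        raise ValueError("codings must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement from the product of each coder's marginal frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:  # both coders used one identical category throughout
        return 1.0
    return (observed - expected) / (1 - expected)


coder1 = ["sit", "sit", "stand", "stand", "sit", "crawl"]
coder2 = ["sit", "stand", "stand", "stand", "sit", "crawl"]
print(round(cohens_kappa(coder1, coder2), 3))  # 0.739
```

Here the coders agree on 5 of 6 intervals, but because chance agreement is 13/36, kappa (17/23) is lower than raw percent agreement.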

#### Analyses

Infants in our study learned to crawl and walk at different times and therefore were followed for different lengths of time. To structure our analyses, we included data from 5 sessions prior to crawling onset up to 5 sessions following walking onset (see **Table 1**). For some analyses, we computed Pearson correlations across sessions on each infant's and mother's data over this entire period from sitting to walking, to test developmental trends independently of the number of sessions each infant contributed. Pearson correlation was chosen because it fits a linear trend on the data points while maintaining the developmental order of the sessions (Spearman's correlation rank-orders the data, hence potentially altering the developmental order). The individual correlation values obtained were then entered into the non-parametric Friedman test to compare general developmental trends between variables. If the Friedman test yielded significant differences (two-tailed), we performed Wilcoxon tests (two-tailed) to determine where the differences lay.
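The two-step trend analysis (a per-participant Pearson correlation across sessions, then a Friedman test on those coefficients) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code: the Friedman statistic below omits the tie correction that `scipy.stats.friedmanchisquare` applies, its p-value would come from a chi-square distribution with k − 1 degrees of freedom (e.g. `scipy.stats.chi2.sf`), and the follow-up comparisons could use `scipy.stats.wilcoxon`. All data values are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..k within one row; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def friedman_statistic(rows: Sequence[Sequence[float]]) -> float:
    """Friedman chi-square without tie correction.

    rows: one row per participant, one column per condition (e.g. behavior).
    """
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12 * sum(R * R for R in rank_sums) / (n * k * (k + 1)) - 3 * n * (k + 1)


# Step 1 (hypothetical data): one infant's proportion of sitting intervals
# in fine motor manipulation, by session; r summarizes the linear trend.
sessions = [1, 2, 3, 4, 5, 6]
fine_motor = [0.52, 0.47, 0.50, 0.38, 0.33, 0.30]
print(round(pearson_r(sessions, fine_motor), 2))

# Step 2: per-participant r values (rows) for three behaviors (columns).
r_values = [
    [-0.50, 0.10, 0.30],
    [-0.40, 0.00, 0.20],
    [-0.60, -0.10, 0.40],
]
print(friedman_statistic(r_values))  # 6.0
```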

We further analyzed the developmental changes in infants' and mothers' interaction patterns around the onset of hands-and-knees crawling and upright locomotion by running Generalized Estimating Equations (GEE), with a Bonferroni correction for multiple comparisons, on segments of data covering 5 sessions prior to and 5 sessions following the onset of each locomotor skill. GEEs are well suited to longitudinal data because they take into account the dependency and ordering of the data within subjects in repeated measures. GEEs assessed differences between mothers and infants, determined which behaviors were produced significantly more, and tested whether they changed as a function of session. Because infants had between 6 and 9 crawling sessions before walking onset, some sessions contributed both to the 5 post-crawling sessions and to the 5 pre-walking sessions. As a result, we did not run statistical tests to assess changes between post-crawling and pre-walking sessions.
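The overlap that prevented a post-crawling vs. pre-walking comparison can be made concrete. The sketch assumes 0-based session indices, counts the onset session as the first post-onset session, and clips windows at the first and last available sessions; `segment` is a hypothetical helper.

```python
def segment(n_sessions: int, onset: int, before: int = 5, after: int = 5) -> range:
    """Session indices from `before` sessions prior to onset through `after` from onset.

    Assumptions: 0-based indices, the onset session is the first post-onset
    session, and the window is clipped to the sessions that exist (e.g. an
    infant already crawling at the first visit has no pre-crawling sessions).
    """
    start = max(0, onset - before)
    stop = min(n_sessions, onset + after)
    return range(start, stop)


# Hypothetical infant: 21 sessions, crawling onset at index 5, walking at 11.
n_sessions, crawl_onset, walk_onset = 21, 5, 11
post_crawl = [s for s in segment(n_sessions, crawl_onset) if s >= crawl_onset]
pre_walk = [s for s in segment(n_sessions, walk_onset) if s < walk_onset]
print(sorted(set(post_crawl) & set(pre_walk)))  # [6, 7, 8, 9]
```

Under these assumptions, a gap of 6 to 9 sessions between the two onsets means the two 5-session windows share 10 minus the gap, i.e. 1 to 4 sessions.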

### RESULTS

### Postural Skills Progression

Infants sat independently at an average of 6.6 months (SD = 0.6), crawled on all fours at 8.8 months (SD = 1.4), stood independently at 11.2 months (SD = 1.2), and walked at least seven paces at 12.4 months (SD = 1.6). Four infants could sit independently for at least 30 s at their first session in the study, and one infant (ID#8, **Table 1**) already crawled on all fours.

### Targeted Behavior

We first examined the extent to which infants and mothers each engaged in target-directed behaviors. A GEE on the percent of intervals in targeted behavior around the crawling period, using dyad (mothers vs. infants) and session (10) as predictors, revealed a main effect of dyad and a dyad × session interaction. On average, infants engaged in targeted behaviors significantly more (79.47%) than their mothers [60.64%, Wald χ²(1) = 33.108, p < 0.0001]. Furthermore, over the 10 sessions, infants' targeted behaviors increased up to 85.02%, while mothers' only increased to 63.30% [Wald χ²(9) = 18.349, p < 0.031].

During the walking transition, a GEE using the same predictors similarly revealed a main effect of dyad and a dyad × session interaction. Again, infants displayed on average a higher percentage of targeted behaviors (85.37%) than their mothers [56.69%, Wald χ²(1) = 9.698, p < 0.0001]. Over the 10 sessions, infants' targeted behaviors increased on average from 80.99 to 85.33%, while mothers' decreased from 63.87 to 56.60% [Wald χ²(9) = 23.885, p < 0.004].

### Number of Posture Configurations Displayed

As infants developed mobility, they also widened their range of posture use, but they did so mainly during targeted interactions. **Figure 2** represents the number of different posture configurations (not the number of posture changes) infants displayed over the 10 sessions around crawling and walking onsets during intervals of targeted or untargeted behavior. Over the crawling transition period, a GEE run on this variable using behavior (targeted vs. untargeted) and session as predictors confirmed a main effect of both. **Figure 2** shows that, on average, infants displayed more varied posture configurations during intervals of targeted (3.02) than untargeted behavior [2.39, Wald χ²(1) = 4.185, p < 0.041]. Further, the number of posture configurations displayed increased significantly over the 10 sessions [Wald χ²(9) = 40.497, p < 0.0001].
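The posture-variety measure counts distinct posture categories, not posture changes. A sketch under the same one-label-per-interval assumption (`None` for no target); the helper name is hypothetical.

```python
from typing import Optional, Sequence


def posture_variety(postures: Sequence[str], targets: Sequence[Optional[str]]) -> dict[str, int]:
    """Number of distinct posture categories during targeted vs untargeted intervals.

    This counts different postures seen, not how often posture changed.
    """
    targeted = {p for p, t in zip(postures, targets) if t is not None}
    untargeted = {p for p, t in zip(postures, targets) if t is None}
    return {"targeted": len(targeted), "untargeted": len(untargeted)}


# Hypothetical session excerpt.
postures = ["sitting", "sitting", "kneeling/squatting", "standing", "sitting", "crawling"]
targets = ["ball", "ball", "stairs", "couch", None, None]
print(posture_variety(postures, targets))  # {'targeted': 3, 'untargeted': 2}
```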

Around the emergence of walking, the GEE with the same predictors revealed a main effect of behavior, but not of session. During this transition period, infants displayed an average of 5.46 different posture configurations during intervals of targeted behavior compared to 3.20 during untargeted ones [Wald χ²(1) = 60.231, p < 0.0001].

The different types of posture infants displayed during each 10-session period surrounding the onset of crawling and walking are reported in **Figure 3**, separately for targeted and untargeted behavior. The colors correspond to the number of infants that displayed each of the postures listed by session. This figure illustrates that, from the first to the second 10-session period, more infants diversified the range of postures they used to interact with their environment, but they did so mainly while actively engaging with targets.

To verify whether posture diversity was related to the frequency of targeted behaviors, we ran Pearson correlations on each infant's data, pairing their number of posture configurations with their number of targeted behaviors by session (infants provided between 12 and 19 sessions for each correlation, see **Table 1**). All correlations were positive (see **Figure 4**). Nine of the 13 infants' correlations were significant at the 0.05 level (range across all infants: r = 0.284, p < 0.239 to r = 0.751, p < 0.001). For the majority of infants, as postural diversity increased, so did the frequency of targeted behaviors.

### Interactive Behaviors in Postures

Did interactive behaviors change as infants transitioned from sitting to walking and developed new postural forms? We focused on the three most commonly produced postures (sitting, kneeling/squatting, and standing) and examined the types of interactive behaviors infants and mothers each produced when in those postures within and across sessions.

#### While Sitting

Sitting was the only posture performed throughout the entire study. We first ran an analysis on the entire data span to test the overall developmental trends. Then, we ran analyses by 10-session period to focus more closely on the changes occurring around the crawling and walking transitions.

**Figure 5** displays the developmental trends for four types of interactive behaviors while infants and mothers were in sitting postures. Each regression line, derived from the Pearson correlations, corresponds to one of the 13 infants or mothers. Friedman tests comparing the correlation values of those trend lines by interactive behavior, for the infants and mothers separately, revealed significant differences [infants: χ²(3) = 21.277, p < 0.0001; mothers: χ²(3) = 8.723, p < 0.033]. For the infants, the near-zero correlations for no activity (mean r = −0.093) were significantly different from both the negative correlations in fine motor manipulation (mean r = −0.469; Z = −2.900, p < 0.004) and the positive correlations in gross motor activity (mean r = 0.316; Z = −2.760, p < 0.006). The negative correlations in fine motor manipulation were also significantly different from both the positive correlations in passive/minimal involvement (Z = −2.830, p < 0.005) and gross motor activity (Z = −3.180, p < 0.001). Thus, when sitting, over the study period, infants decreased the proportion of intervals they engaged in fine motor manipulation and increased their intervals of passive/minimal involvement and gross motor activity, while intervals of no activity remained about the same.

For the mothers, differences in correlation trends for interactive behaviors during sitting intervals were significant only between fine motor manipulations (mean r = −0.271) and both no activity (mean r = 0.255; Z = −2.341, p < 0.019) and gross motor activity (mean r = 0.030; Z = −2.132, p < 0.033). Mothers also displayed a decline in fine motor manipulations over developmental time, while increasing no activity and maintaining gross motor activity.

GEE analyses using dyad (infant vs. mother), interactive behavior, and session as predictors allowed us to assess differences between the interactive behaviors displayed during sitting intervals more finely, and to capture infant/mother differences at those transition times. Because our data were normalized within postures (the proportions of the four interactive behaviors sum to 100% for each posture, so they are not independent), each GEE included only three interactive behaviors: we first selected those that displayed the largest developmental changes over the 10-session period and then, as a second criterion, those most represented in the combined infant and mother data.

During the transition to crawling (**Figure 6**, top), a GEE run on the percent intervals of interactive behaviors performed in sitting, using dyad (infant vs. mother), interactive behavior (fine manipulation, gross motor activity, and no activity), and session as predictors, revealed a significant main effect of interactive behavior [Wald χ²(2) = 69.594, p < 0.0001]. Pairwise comparisons revealed that, as a whole, the average proportion of intervals of gross motor activity while sitting was significantly lower (13.58%) than that of no activity (29.60%) and fine manipulation (30.55%, all ps < 0.0001). A dyad × interactive behavior interaction [Wald χ²(2) = 108.658, p < 0.0001] indicated that the infants spent on average 40.89% of their sitting intervals engaging in fine motor manipulation, while the mothers spent an average of 42.35% of their sitting intervals in no activity. Further, an interactive behavior × session interaction [Wald χ²(18) = 69.384, p < 0.0001] revealed that intervals of fine motor manipulation while sitting decreased over this 10-session period, while intervals of gross motor activity increased during this same period. Finally, a significant 3-way dyad × interactive behavior × session interaction [Wald χ²(18) = 36.421, p < 0.006] indicated that the observed decline in fine motor manipulation and increase in gross motor activity while sitting were more pronounced for the infants than for their mothers.


During the transition to walking, a similar GEE using no activity, fine manipulation, and passive involvement as the three selected interactive behaviors again revealed a significant main effect of interactive behavior [Wald χ²(2) = 31.865, p < 0.0001], a dyad × interactive behavior interaction [Wald χ²(2) = 143.834, p < 0.0001], an interactive behavior × session interaction [Wald χ²(18) = 39.030, p < 0.003], and a dyad × interactive behavior × session interaction [Wald χ²(18) = 52.345, p < 0.0001]. Pairwise comparisons indicated that, during this period, the average proportion of sitting intervals of fine motor manipulation had become significantly lower overall (19.84%) than the average proportion of sitting intervals of no activity (33.02%) and passive involvement (27.79%, all ps < 0.001). The mothers and infants, however, differed greatly in their respective distribution of interactive activities. Mothers continued to spend on average a high percentage of their sitting intervals in no activity (51.64%), while infants spent only 14.38% in no activity (p < 0.0001). Further, during this period, infants on average spent the largest share of their sitting intervals in passive involvement (33.48%), and much less in fine motor manipulation (24.41%) than in the previous crawling period. In fact, the significant interactive behavior × session interaction over this walking transition revealed that infants continued to decrease their rate of fine motor manipulation during sitting intervals over the 10-session period, while they increased their rate of passive involvement. Mothers further increased their rate of no activity while decreasing their rate of passive involvement during sitting intervals.

In sum, during the crawling transition, infants' fine motor manipulation – which was their most frequent activity during sitting intervals – declined progressively, while their gross motor activity increased. During the transition to walking, infants' fine motor manipulation during sitting intervals declined further, but now intervals of passive involvement increased. The mothers, when sitting, mainly performed no activity throughout the study period. Over time, they decreased their rate of intervals of all other forms of activity.

#### While Kneeling/Squatting

Kneeling and squatting postures began to appear in the infants' behavioral repertoire after they began to crawl; thus, we examined interactive behaviors in those postures only around the transition to walking (see **Figure 6**, middle). GEE analyses on the interval percentage of interactive behaviors performed in kneeling/squatting, using dyad (infant vs. mother), interactive behavior (fine manipulation, passive involvement, no activity), and session as predictors, revealed a main effect of interactive behavior [Wald χ²(2) = 34.257, p < 0.0001]. During kneeling/squatting intervals, infants and mothers on average engaged more in passive/minimal involvement (34.68%) than in fine motor manipulation (26.37%) and no activity (19.06%, all ps < 0.019). However, a dyad × interactive behavior interaction [Wald χ²(2) = 88.959, p < 0.0001] revealed that infants performed on average more fine motor manipulation (40.31%) and passive involvement (33.38%) than no activity (7.94%, all ps < 0.0001), while mothers, during kneeling/squatting intervals, engaged on average more in no activity (30.18%) and passive involvement (35.97%) than in fine motor manipulation (12.44%, both ps < 0.0001). A 3-way dyad × interactive behavior × session interaction [Wald χ²(18) = 31.718, p < 0.024] further showed that, while infants' kneeling/squatting intervals displayed a decrease in passive involvement and an increase in fine motor manipulation during the transition to walking, mothers displayed an increase in passive involvement.

Thus, during the walking transition period, infants' fine motor manipulations occurred mainly during kneeling/squatting intervals, and not so much during sitting intervals. Mothers continued to maintain a relatively high level of no activity or minimal/passive involvement even when in kneeling/squatting.

#### While Standing

Around the transition to walking, infants also learned to stand (**Figure 6**, bottom). A GEE analysis on the percent intervals of interactive behaviors performed in standing, using dyad (infant vs. mother), interactive behavior (passive involvement, fine motor manipulation, and gross motor activity), and session as predictors, again revealed a main effect of interactive behavior [Wald χ²(2) = 101.988, p < 0.0001]. The proportion of intervals of passive interactions during standing was on average higher (46.36%) than that of fine motor manipulation (14.85%) and gross motor activity (23.73%; all ps < 0.0001). However, a dyad × interactive behavior interaction [Wald χ²(2) = 9.147, p < 0.010] revealed that while both mothers and infants produced on average high rates of passive behaviors during standing intervals (mothers = 42.52%; infants = 50.20%), they differed in their rates of fine motor manipulation. Infants spent on average 21.63% of their standing intervals in fine motor manipulation, compared to 8.07% for the mothers (p < 0.043). Finally, a 3-way dyad × interactive behavior × session interaction [Wald χ²(18) = 46.669, p < 0.0001] indicated a decrease in passive involvement and an increase in gross motor activity during standing in infants, while no clear developmental trend reflected changes in the mothers' interactive behaviors over those 10 sessions.

proportion of intervals spent engaging in each type of interactive behavior (left to right: no activity, passive/minimal involvement, fine motor manipulation, and gross motor activity) while sitting. Regression lines for each participant fit the proportion of intervals of a given interactive behavior from 5 sessions before crawling onset, up to 5 sessions after walking onset. Light gray and darker gray shaded areas span the range of session numbers during which different infants learned to crawl and walk, respectively.

Together, these results suggest that infants' passive involvement with targets, mainly performed during standing intervals around the transition to walking, decreased over the sessions as their gross motor activities increased.

#### Transitions Between Targets

To understand more about mothers' and infants' interactive behaviors in their environment, we tracked whether they changed targets between successive intervals, maintained the same target across successive intervals, went from a target to nothing in the next interval (or the reverse), or did not engage with targets in either of two successive intervals. These different types of target transitions were normalized by the total number of possible target transitions. **Figure 7** displays the developmental trends as regression lines from Pearson correlations for all 13 infants and 13 mothers for each type of target transition.

Friedman tests comparing the correlation values of those trend lines across target transition types revealed significant differences for the infants [χ²(4) = 29.846, p < 0.0001], but not for the mothers [χ²(4) = 4.862, p < 0.302]. Infants had positive correlations for target-to-new-target transitions (mean r = 0.697) that were significantly different from the negative or near-zero correlations of all other target transition categories (all ps < 0.001). Thus, over the entire study period, infants not only increased their bouts of interaction but also increasingly transitioned to new targets between consecutive time intervals, while all other target transition types either declined or remained about the same over time. The mothers did not show significant changes in target transitions over the duration of the study.

Since little developmental variation was found for the target-to-nothing and nothing-to-target transitions, we ran the GEE analyses, using dyad (mothers vs. infants), target transition type, and session as predictors, only on the three categories of target transitions showing developmental change (i.e., target-to-same-target, target-to-new-target, and nothing-to-nothing; see **Figure 8**). Around the crawling period, the GEE revealed a main effect of target transition type [Wald χ²(2) = 249.354, p < 0.0001]. The proportion of successive time intervals in which mothers and infants interacted with the same target was on average significantly higher (41.63%) than those of the two other target transition types (15.62 and 16.61%, ps < 0.0001). However, a significant dyad × transition type interaction [Wald χ²(2) = 179.799, p < 0.0001] indicated that this effect was mainly driven by the infants. On average, infants interacted with the same target across successive time intervals significantly more (57.71%) than they transitioned to new targets (11.96%) or from nothing to nothing (9.37%, all ps < 0.0001), while mothers did not show this pattern. The GEE also revealed a target transition × session interaction [Wald χ²(18) = 42.318, p < 0.001] and a dyad × target transition × session interaction [Wald χ²(18) = 38.112, p < 0.004]. Infants' proportion of successive intervals interacting with the same target declined over this crawling transition period, while the proportion of transitions to new targets increased. Again, mothers did not show much change over time.

across the transition to crawling and walking, and by dyad member. The vertical lines on the graphs indicate the onsets of crawling and walking, respectively. The lines that are grayed out were not entered in the GEE analyses but are still plotted for illustration purposes.

A GEE analysis on percent intervals of target transitions using the same predictors and the same target transition types over the transition to walking revealed similar trends (**Figure 8**). Main effects of dyad [Wald χ²(1) = 12.618, p < 0.0001] and target transition type [Wald χ²(2) = 163.543, p < 0.0001], and a significant dyad × target transition type interaction [Wald χ²(4) = 459.489, p < 0.0001], indicated that target-to-same-target transitions still occurred on average more frequently over successive time intervals than the other two target transition types (34.83% compared to 23.45% and 14.53%, all ps < 0.0001). However, this was again mainly the case for the infants, who produced on average 49.83% target-to-same-target transitions compared to 26.15% target-to-new-target transitions and 3.93% nothing-to-nothing transitions (all ps < 0.0001). Mothers did not demonstrate significant differences between target transition types.

In sum, compared with their mothers, infants produced many target-to-same-target and target-to-new-target transitions over the observed developmental period, and very few target-to-nothing, nothing-to-target, or nothing-to-nothing transitions over successive time intervals. Over time, infants gradually decreased their rate of target-to-same-target transitions and increased their rate of target-to-new-target transitions, suggesting that with the acquisition of mobility, infants explored their environment more widely and interacted with more targets. Mothers did not show much change in their target transitions over time; neither did they display a predominant type of target transition.
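
The five transition types compared above can be made concrete with a short sketch. It assumes, as a simplification not stated in the study, that each 15-s interval is coded with a single target label (or `None` when no target was involved), and that percentages are computed over all successive-interval pairs in a session. The names `transition_type` and `transition_percentages` are hypothetical and do not describe the authors' actual coding pipeline.

```python
from collections import Counter
from typing import Dict, Optional, Sequence

# Hypothetical coding scheme: one target label per 15-s interval,
# or None when the participant was not involved with any target.
Interval = Optional[str]


def transition_type(prev: Interval, curr: Interval) -> str:
    """Classify a pair of successive intervals into one of five types."""
    if prev is None and curr is None:
        return "nothing-to-nothing"
    if prev is None:
        return "nothing-to-target"
    if curr is None:
        return "target-to-nothing"
    if prev == curr:
        return "target-to-same-target"
    return "target-to-new-target"


def transition_percentages(intervals: Sequence[Interval]) -> Dict[str, float]:
    """Percentage of successive-interval pairs falling into each type."""
    pairs = list(zip(intervals, intervals[1:]))
    if not pairs:
        return {}
    counts = Counter(transition_type(a, b) for a, b in pairs)
    return {kind: 100.0 * n / len(pairs) for kind, n in counts.items()}


# Toy session of six coded intervals (illustrative, not study data).
session = ["ball", "ball", "cup", None, None, "block"]
print(transition_percentages(session))
```

In the toy session, each of the five pairs falls into a different type, so every type receives 20% of the pairs; computed per session and per participant, such percentages are the kind of dependent variable the GEE models above take as input.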

### Target Transitions and Posture Changes in Infants

Given that infants were the only ones showing high rates of transitions between same and new targets, we also examined, for the infants only, whether these two types of target transitions corresponded to a change or maintenance of posture over the same successive time intervals. **Figure 9** displays regression lines from Pearson's correlations for each of the 13 infants, indicating the developmental trends for maintaining vs. changing posture during target-to-same-target and target-to-new-target transitions. A Friedman test on the obtained correlation coefficients revealed that the developmental trends for these four scenarios differed significantly [χ²(3) = 31.985, p < 0.0001]. In target-to-same-target transition intervals, posture maintenance declined over time (mean r = −0.551), whereas posture changes increased (mean r = 0.594, p < 0.002). In target-to-new-target transition intervals, posture maintenance did not change over time (mean r = 0.098), but posture changes again increased (mean r = 0.691, p < 0.001).
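
The trend analysis above can be sketched in plain Python: one Pearson correlation per infant and scenario, followed by a Friedman test across the four scenarios. This is an illustrative sketch, not the authors' analysis code. `pearson_r`, `average_ranks`, and `friedman_chi2` are hypothetical helpers. Tied values receive average ranks, no tie correction is applied, and the p-value, which would come from a χ² distribution with k − 1 = 3 degrees of freedom, is not computed here.

```python
from math import sqrt
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to k, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for m in range(i, j + 1):
            ranks[order[m]] = mean_rank
        i = j + 1
    return ranks


def friedman_chi2(blocks: Sequence[Sequence[float]]) -> float:
    """Friedman statistic without tie correction (compare to chi-square, df = k - 1).

    Each block holds one infant's k correlation coefficients, one per scenario.
    """
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for block in blocks:
        for j, r in enumerate(average_ranks(block)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)


# Toy example (not study data): one infant's trend for one scenario,
# correlating session number with the percentage of intervals.
sessions = [1, 2, 3, 4, 5]
print(pearson_r(sessions, [50.0, 45.0, 41.0, 38.0, 30.0]))  # a declining trend
```

With 13 infants, each contributing four coefficients, `friedman_chi2` would receive 13 blocks of length 4, which matches the 3 degrees of freedom reported above. When every infant ranks the scenarios identically, the statistic reaches its maximum of n(k − 1).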

A GEE on these percentages of intervals with posture/target changes, using target transition type combined with posture change type, and session, as predictors around the emergence of crawling (**Figure 10**), revealed a main effect of posture change/target transition [Wald χ²(3) = 400.703, p < 0.0001] and a significant posture change/target transition × crawling session interaction [Wald χ²(27) = 89.543, p < 0.0001]. Posture maintenance during target-to-same-target transitions was the behavior infants performed most frequently (45.73%), and it differed significantly, on average, from all three other posture/target transition combinations (range = 5.75–9.19%, all ps < 0.0001). However, the interaction indicated that posture maintenance during target-to-same-target transitions declined significantly over the 10-session crawling period (ps < 0.003 from session 7), while posture changes increased. For target-to-new-target transitions, posture maintenance and posture change occurred relatively rarely over this crawling period and represented less than 20% of the successive intervals.
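
The four posture change/target transition combinations analyzed by the GEE can be derived from interval codes along the following lines. The sketch makes three assumptions: each interval records a target (or `None`) and a single posture label, only target-to-target pairs are cross-classified, and percentages use all successive-interval pairs in a session as the denominator, so the four values need not sum to 100. All names are hypothetical. The GEE itself is not reproduced here, since fitting it would require a dedicated statistics package.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical interval code: (target label or None, posture label).
Interval = Tuple[Optional[str], str]

COMBINATIONS = (
    "maintain posture / same target",
    "change posture / same target",
    "maintain posture / new target",
    "change posture / new target",
)


def combination(prev: Interval, curr: Interval) -> Optional[str]:
    """Cross-classify a target-to-target pair by posture and target change."""
    (t0, p0), (t1, p1) = prev, curr
    if t0 is None or t1 is None:
        return None  # pairs involving "nothing" are not cross-classified
    posture = "maintain posture" if p0 == p1 else "change posture"
    target = "same target" if t0 == t1 else "new target"
    return f"{posture} / {target}"


def combination_percentages(session: Sequence[Interval]) -> Dict[str, float]:
    """Percent of all successive-interval pairs falling in each combination."""
    pairs = list(zip(session, session[1:]))
    counts = Counter(combination(a, b) for a, b in pairs)
    total = len(pairs)
    return {c: (100.0 * counts[c] / total if total else 0.0) for c in COMBINATIONS}


# Toy session (illustrative, not study data).
toy = [("ball", "sit"), ("ball", "sit"), ("ball", "kneel"),
       ("cup", "kneel"), (None, "stand"), ("cup", "stand")]
print(combination_percentages(toy))
```

Computing these four percentages for every infant at every session would yield the session-by-combination data that the interaction term above tests.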

A GEE on this same variable using the same predictors over the walking transition period (**Figure 10**) returned only a main effect of posture change/target transition [Wald χ²(3) = 99.12, p < 0.0001]; the interaction with walking sessions did not reach significance [Wald χ²(27) = 38.555, p < 0.07]. Posture maintenance during target-to-same-target transitions was again, on average, more frequent (26.96%) than the other posture change/target transition combinations (7.18%, 17.32%, and 20.6%, all ps < 0.008).

Thus, the early period, corresponding to the transition to crawling, showed the greatest developmental change in posture during target transitions: posture changes increased while posture maintenance declined, especially during same-target transitions.

### DISCUSSION

A growing body of literature underscores the importance of infants' action experience for their understanding of the world (e.g., Sheya and Smith, 2010; Soska et al., 2010). Developmental researchers have also examined how infants' sense of agency, locomotor experience, and environmental characteristics shape infants' actions (e.g., Dosso and Boudreau, 2014), and how those actions both influence and are influenced by interactions with caregivers (e.g., Karasik et al., 2014). Prior findings from this dataset revealed that the rate of posture changes was related to the number of bouts of interaction infants performed during free play, but this was not true for mothers (Thurman and Corbetta, 2017). The current study extended this work by investigating how mothers and infants adopted various postures during play, how they used postures to interact with targets in their environment, and how infants' repertoire of postural skills expanded over the course of locomotor development.

As one would expect, infants broadened their postural diversity and interactive behaviors during object interaction as they gained locomotor skills. Interestingly, as they did so, they also reorganized the way they used postures they had already acquired to manipulate their environment. For example, sitting, which was mostly used for fine motor manipulation during the crawling transition period, was increasingly used for passive holding during the walking transition period. At that same time, kneeling/squatting became the preferred posture for fine motor manipulation. Overall, infants engaged in targeted behavior most of the time and interacted with the same target from interval to interval more frequently than they changed to a new target. They also tended to maintain the same posture when attending to the same target across intervals, despite developing a growing range of postures. Mothers, on the other hand, remained passive or minimally engaged most of the time, even though they were free to move about the room and interact with their infants. With mobility, infants' bodies seemed to become tools for exploration, allowing a growing diversification of their behavior and interactions with their surroundings, interspersed with moments of posture maintenance with the same target.

### Infants' and Mothers' Interactions and Postures

Mothers spent much less time interacting with targets than their infants did, and this changed little longitudinally as their infants acquired locomotor skills. This finding may be consistent with prior work in home settings, which has shown that mothers tend to respond similarly to their infants over time (e.g., Masur and Turner, 2001) and often arrange play spaces for self-initiated infant play (Pierce, 2000).

While mothers' interactions seemed more predictable and stable, infants' interactions developed and reorganized in concert with their growing postural options (Thelen and Smith, 1994). The early fine motor manipulations of objects performed while sitting progressively morphed into holding patterns later in the study. Early sitting frees the hands to manipulate objects and has been associated with fine haptic exploration and differential functioning of the hands (Rochat, 1989). But as infants increasingly varied their postures and were free to play, kneeling and squatting emerged as the new postures for fine motor manipulation of targets, along with passive involvement with targets. Indeed, when infants stop moving around the room and want to examine an object, kneeling or squatting are the next easiest postures to adopt, and recent research suggests squatting postures even enhance postural control (Bril, 2018). In the later developmental period, sitting and kneeling/squatting, which gave rise to more passive involvement, may have become more transitional postures than standing: infants can adopt them for short moments on their way to their next object destination. A more in-depth examination of these data in future studies will allow us to address these questions more directly.

As infants began to stand, they showed the highest rate of passive interactions with targets. Standing is initially a very unstable posture. Holding an object helps stabilize the upright posture (Claxton et al., 2013), but when standing, infants may be busy maintaining balance, which may temporarily limit their ability to perform detailed object manipulations or gross motor activity on objects. Metcalfe and Clark (2000) have shown that when infants keep a hand on a surface while standing, they use the surface contact as a source of postural stabilization, but also as a way to explore their own developing postural coordination. Consistent with that study, we found that gross motor activity, which was seldom observed in the early period, progressively increased during the later kneeling/squatting and standing postures. Here too, future research could investigate more closely how infants learn to control their bodies and postures in relation to acting on their environments over time.

This work supports previous claims that postural development and postural control both play important roles in the execution of skilled and target-directed actions in infancy (Rochat and Bullinger, 1994). At every stage of development, as infants learn to sit, kneel, crawl, and walk, they learn about their bodies' resources and action capabilities, and this greatly affects their ability to interact with the resources and opportunities provided by the environment (Gibson, 1988; Rochat and Bullinger, 1994; Adolph, 2008). Importantly, in developmental pathway approaches, cumulative change builds complexity in developmental systems, and rudimentary exploratory skills lay a foundation upon which later-appearing skills can be built (Thelen and Smith, 1994; Smith, 2013). For example, Bornstein et al. (2013) found that early motor and exploratory behaviors in infancy lay a foundation for later intellectual functioning and academic achievement in childhood. This is because opportunities for interacting with the environment, which motor and exploratory behaviors promote, can lead to crucial learning opportunities for the infant. In our analyses, we did not examine whether mothers also provided learning opportunities as infants and mothers interacted with each other. We know mothers often scaffold infants' play, which leads infants to display more advanced functional play during joint attention moments (Bigelow et al., 2004). We plan to examine this question in future analyses.

### Postural Development and Target-Directedness

Investigating self-directed infant locomotion provides some insight into the information that infants select from their environments (Dosso and Boudreau, 2014). Our data show that as infants developed locomotion, they not only produced an increasingly higher rate of targeted behaviors, but also transitioned to new targets in addition to maintaining interactions with the same target. Others have shown that infants spend a great deal of time interacting with objects in their surroundings (e.g., Cole et al., 2016; Hoch et al., 2018), but the way in which infants arrive at those targets has recently been contested. Prior characterizations of infant sensorimotor and functional play patterns drew on Piaget's early descriptions of infant behavior, which describe infant play as more intentional and goal-directed, such that infants tend to seek out certain kinds of stimulation (Burghardt, 2006). Recent work by Cole et al. (2016) suggests otherwise. They investigated whether bouts of infant walking ended with infants making contact with toys. They found that when infants ended a bout of walking, they most often stopped in the middle of the floor, and many interactions with objects occurred after infants were already in motion. They concluded that infants' behavior is not goal-directed in the sense that an infant sees a goal in the distance and then travels to it. Instead, because infants cover so much ground, they happen upon opportunities for interaction along the way, while already in motion.

Further, the presence of toys seems to elicit different patterns of locomotor exploration in infants. Recent research compared infants' exploration patterns in toy-filled vs. empty rooms. Although infants traveled similar distances and took about the same number of steps in both conditions, infants in the toy-filled room spread their exploration more widely than infants in the empty room. These differences may be related to how infants interacted with locomotor toys, which are designed to be rolled or carried (Hoch et al., 2018).

Our data, which considered all infant postures and movements and not just those that occurred in bouts of walking, similarly suggest that infants' behaviors are highly target-directed. However, we found that, particularly in the later period, infants increasingly involved their whole bodies when interacting with objects, changing body posture both when remaining with the same target and when switching to new ones.

### Implications

An infant's ability to learn new things about their environment is strongly related to the exploratory skills that arise with locomotion (Bornstein et al., 2013). Here, we have shown that, throughout locomotor development, infants gain more postural options, which in turn affect how they use their bodies and postures to interact with and transition between targets in their environment. Conditions such as Down syndrome and cerebral palsy can severely delay or completely prevent mobility in infancy (Cobo-Lewis et al., 1996; Ghazi et al., 2016), which in turn can reduce the range of opportunities for interaction available to infants.

Furthermore, more general motor impairments can also affect how infants use their posture and manipulate objects. In a study by Nickel et al. (2013), infants who later received an autism diagnosis had shown slower development of sitting and standing postures and exhibited fewer posture changes during play. Delays in postural and motor development, such as those seen in infants at high risk for autism, limit infants' opportunities to explore objects and their surroundings. This early disruption in interaction patterns can set the stage for further atypical experiences, not only in postural and locomotor skills but also in cognitive development and social interactions (Thelen, 2004).

### CONCLUSION

Our observations were conducted in free-play sessions, with no instructions as to which toys participants should choose. Furthermore, to provide developmentally appropriate toys, we occasionally swapped out a small number of objects as infants progressed through motor milestones. This may have affected the likelihood that infants engaged in particular interactive behaviors in a given session. However, our intent was to capture variations in behavior in a free-play format, and varied opportunities for interaction could arise at each session whether or not the objects in the room were identical. We are confident that our findings are unaffected by the small variations in toy selection across sessions, as 68% of infants' target interactions were with items that remained in the room throughout the whole study.

The current study used an extensive longitudinal approach to investigate infants' and mothers' use of posture for playful interaction with their environments, both within sessions and across infant locomotor development. We found that with locomotor development, infants' interactions with their environments changed depending on which postures they adopted in the moment, the range of postures they displayed, and how they used their postures to transition between targets. Mothers, however, remained largely inactive and did not significantly alter their patterns of interaction as their infants did. Our approach provides further evidence that infants' use of posture is a dynamic and essential part of their action repertoire during exploration. Our observations were limited to 15-s interval time sampling over 10-min free-play sessions. While one could argue that this is not sufficient to capture these developmental dynamics, our observations map logically onto expected patterns of development.
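
For a sense of the sampling granularity, here is the arithmetic under two assumptions: each session was coded in contiguous, non-overlapping 15-s intervals, and transitions were counted only within a session. Under these assumptions, a 10-min session yields:

$$
\frac{10\ \text{min} \times 60\ \text{s/min}}{15\ \text{s/interval}} = 40\ \text{intervals}, \qquad 40 - 1 = 39\ \text{successive-interval transitions per session.}
$$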

#### ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the Institutional Review Board at the University of Tennessee - Knoxville with written informed consent from all parents of the infants. All parents gave written informed consent for their infants in accordance with the Declaration of Helsinki. The protocol was approved by the Institutional Review Board at the University of Tennessee - Knoxville.

#### AUTHOR CONTRIBUTIONS

This work is part of ST's dissertation project; DC was her advisor. ST had a primary role in designing the study and collecting the data, and also took the lead in coding and analyzing the data. DC provided guidance in planning the design and data collection, and provided input throughout the coding and data analysis process. ST and DC contributed to the writing of this manuscript.

#### FUNDING

This work was supported by a University of Tennessee Professional Development grant to DC and Graduate Student Summer Research Funds to ST.

### REFERENCES


infant neurological examination: strengths and limitations. Dev. Med. Child Neurol. 52, 87–92. doi: 10.1111/j.1469-8749.2009.03305.x


J. R. Stewart, O. Gapenne, and E. A. Di Paolo (Cambridge: MIT Press), 123–144. doi: 10.7551/mitpress/9780262014601.003.0005


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Thurman and Corbetta. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.