# DYNAMICS OF JOINT-ACTION, SOCIAL COORDINATION AND MULTI-AGENT ACTIVITY

EDITED BY: Michael J. Richardson, Richard C. Schmidt, Rick Dale, Rachel W. Kallen and Joanna Raczaszek-Leonardi PUBLISHED IN: Frontiers in Psychology

#### *Frontiers Copyright Statement*

*© Copyright 2007-2018 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

> *All copyright, and all rights therein, are protected by national and international copyright laws.*

*The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88945-420-4 DOI 10.3389/978-2-88945-420-4

# **DYNAMICS OF JOINT-ACTION, SOCIAL COORDINATION AND MULTI-AGENT ACTIVITY**

Topic Editors:

**Michael J. Richardson,** Macquarie University, Australia **Richard C. Schmidt,** College of the Holy Cross, United States **Rick Dale,** University of California, Los Angeles, United States **Rachel W. Kallen,** Macquarie University, Australia **Joanna Raczaszek-Leonardi,** University of Warsaw, Poland

**Citation:** Richardson, M. J., Schmidt, R. C., Dale, R., Kallen, R. W., Raczaszek-Leonardi, J., eds. (2018). Dynamics of Joint-Action, Social Coordination and Multi-Agent Activity. Lausanne: Frontiers Media. doi: 10.3389/978-2-88945-420-4

# Table of Contents


*Modeling Multi-Agent Self-Organization through the Lens of Higher Order Attractor Dynamics*

Jonathan E. Butner, Travis J. Wiltshire and A. K. Munion

*Sensorimotor Coarticulation in the Execution and Recognition of Intentional Actions*

Francesco Donnarumma, Haris Dindo and Giovanni Pezzulo

Akifumi Kijima, Hiroyuki Shima, Motoki Okumura, Yuji Yamamoto and Michael J. Richardson

Liam Cross, Andrew D. Wilson and Sabrina Golonka

*Coordination and Collective Performance: Cooperative Goals Boost Interpersonal Synchrony and Task Outcomes*

Jamie S. Allsop, Tomas Vaitkus, Dannette Marie and Lynden K. Miles

*Performance of Language-Coordinated Collective Systems: A Study of Wine Recognition and Description*

Julian Zubek, Michał Denkiewicz, Agnieszka Dębska, Alicja Radkowska, Joanna Komorowska-Mach, Piotr Litwin, Magdalena Stępień, Adrianna Kucińska, Ewa Sitarska, Krystyna Komorowska, Riccardo Fusaroli, Kristian Tylén and Joanna Rączaszek-Leonardi

*Impairments of Social Motor Synchrony Evident in Autism Spectrum Disorder*

Paula Fitzpatrick, Jean A. Frazier, David M. Cochran, Teresa Mitchell, Caitlin Coleman and R. C. Schmidt

# Likability's Effect on Interpersonal Motor Coordination: Exploring Natural Gaze Direction

Zhong Zhao<sup>1,2</sup>\*, Robin N. Salesse<sup>2</sup>, Ludovic Marin<sup>2</sup>, Mathieu Gueugnon<sup>2</sup> and Benoît G. Bardy<sup>2,3</sup>

<sup>1</sup> Institute of Human Factors and Ergonomics, Shenzhen University, Shenzhen, China, <sup>2</sup> EuroMov, University of Montpellier, Montpellier, France, <sup>3</sup> Institut Universitaire de France, Paris, France

Although existing studies indicate a positive effect of interpersonal motor coordination (IMC) on likability, no consensus has been reached on the effect of likability back onto IMC. The present study specifically investigated the causal effect of likability on IMC and explored, by tracking natural gaze direction, the possible underlying mechanisms. Twenty-two participants engaged in an interpersonal finger-tapping task with a confederate in three likability conditions (baseline, likable, and unlikable) while wearing an eye tracker. They tapped at their comfortable tempo with the confederate, who tapped either at the participant's preferred frequency or at 1.5 times that frequency. Results showed that when tapping at the same frequency, the effect of likability on IMC varied with time. Participants coordinated at a higher level in the baseline condition at the beginning of the coordination task, and a facilitative effect of likability on IMC was revealed in the last session. As a novel finding, our results showed a positive correlation between IMC and the amount of gaze directed onto the coordination partner's movement only in the likable condition. No effect of likability was found when the confederate tapped at 1.5 times the participant's preferred frequency. Our research suggests that the psychosocial properties of the coordinating partner should be taken into consideration when investigating IMC performance and that IMC is sensitive to multiple factors.

#### Edited by:

Rick Dale, University of California, Merced, United States

#### Reviewed by:

Daniel Richardson, University College London, United Kingdom; Alexandra Paxton, University of California, Berkeley, United States

> \*Correspondence: Zhong Zhao zhaozhong838@hotmail.com

#### Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

Received: 06 October 2016; Accepted: 09 October 2017; Published: 26 October 2017

#### Citation:

Zhao Z, Salesse RN, Marin L, Gueugnon M and Bardy BG (2017) Likability's Effect on Interpersonal Motor Coordination: Exploring Natural Gaze Direction. Front. Psychol. 8:1864. doi: 10.3389/fpsyg.2017.01864

Keywords: motor coordination, likability, gaze direction, interpersonal relationship, eye tracking

# INTRODUCTION

In social interaction, psychological processes and behavioral activities unfold simultaneously. People communicate verbally with each other, appraise the likability of their interaction partner, and coordinate their behavior with that person. The present study aimed to explore whether the likability of an individual influences interpersonal motor coordination (IMC).

Likability refers to the degree of preference of a target individual by another individual (Reysen, 2005), and it indicates the quality of the interpersonal relationship. Literature also refers to affiliation and rapport as synonyms of likability (Bernieri, 1988; Hove and Risen, 2009; Miles et al., 2011).

In the current paper, we adopted Bernieri and Rosenthal's (1991) definition of IMC, which can be broadly classified as behavioral matching and interactional synchrony (Bernieri and Rosenthal, 1991). Behavioral matching, also known as behavioral mimicry, refers to the phenomenon that individuals adopt the postures, gestures, and mannerisms of interaction partners (Chartrand and Bargh, 1999). Interactional synchrony mainly emphasizes the congruency in the temporal aspects of behavior, and illustrates how two people act simultaneously (Bernieri et al., 1988). Behavioral mimicry and interpersonal synchrony are regular forms of IMC. A large body of research from social psychology, neuroscience, and coordination dynamics indicates that during social interaction individuals do not act independently from each other; instead, their movements coordinate as long as there is perceptual contact (Bernieri, 1988; Bernieri et al., 1988; Bernieri and Gillis, 1995; Schmidt and O'Brien, 1997; Schmidt et al., 2012).

Motivated by the notion that human behavior and psychological states are tightly intertwined with each other, the relation between IMC and likability has attracted a good amount of research interest. Already in the 1960s, psychologists were intrigued by the correspondence between level of mimicry and likability of partners in interaction. For instance, Charny (1966) reported a positive correlation between postural congruency and rapport between the psychotherapist and the client (Charny, 1966). Strong correlation was also found between IMC and teacher-student rapport (Bernieri, 1988). Recent research analyzed video clips of interactions between therapists and patients (Ramseyer and Tschacher, 2011) and found that non-verbal synchrony was associated with the outcome of the therapy, suggesting a positive correlation between IMC and likability.

Beyond a simple correlation between IMC and likability, past research has also suggested that IMC leads to a higher level of likability between interactants. Chartrand and Bargh (1999) reported that participants who were mimicked by the confederate liked the confederate more than participants who were not mimicked (Chartrand and Bargh, 1999), suggesting that mimicry facilitates likability. Lakin and Chartrand (2003) also observed that people who attempted to affiliate with the partner mimicked the person more, inferring that mimicry might be an unconscious vehicle individuals use to affiliate with others during social interactions (Lakin and Chartrand, 2003). Hove and Risen (2009) even demonstrated a causal effect of IMC on likability. Using a finger-tapping task, they obtained a positive correlation between likability and IMC and, more importantly, found that likability was significantly higher in a synchronous condition than in asynchronous and control conditions. Finally, they showed that likability was higher when synchronizing with another human than with an inanimate object, suggesting that likability arises from the interpersonal relationship (Hove and Risen, 2009).

Existing research has therefore reached a common agreement on a positive relation between likability and IMC and, as a consequence, it can be claimed that IMC leads to an increase in likability. However, although previous research has shown a causal relationship from IMC to likability, no studies have yet proven a causal relationship from likability to IMC. Several previous studies suggest that this may be the case. For example, to examine whether social context modulates how people coordinate with each other, Miles et al. (2010) manipulated the confederate's punctuality or tardiness, and found a lower degree of IMC with the tardy confederate (Miles et al., 2010), indicating that manipulating likability induced IMC changes. The work conducted by Cesario et al. (2006) provided evidence that mimicry is modulated in some way by the likability of the interacting partner (Cesario et al., 2006). Recent studies also found that a divergence of arguments between interactants can disrupt in-phase bodily coordination (Paxton and Dale, 2013).

All of these studies support the idea that the level of bodily coordination is influenced by the likability of the interaction partner. However, because IMC can be used as a means to establish rapport (Chartrand and Bargh, 1999; Lakin and Chartrand, 2003; Hove and Risen, 2009), the possibility remains that individuals would coordinate at a higher level with an unlikable individual when they desire to be affiliated with this person. This idea was supported by the study conducted by Miles et al. (2011), who explored whether group membership influenced IMC. They found a higher percentage of in-phase coordination with the out-group than with the in-group confederate. This study inferred that individuals were more coordinated with members of the out-group in order to gain likability and search for affiliation, suggesting that low levels of current likability may lead to higher IMC if the interlocutors are trying to bond with one another (Miles et al., 2011). Similarly, Lakin et al. (2008) found that participants coordinated more with individuals who had just ostracized them, and this study also suggested the possibility of coordinating more with an unlikable person (Lakin et al., 2008).

Therefore, the main objective of the present study was to test for a causal effect of likability on IMC. We reasoned that if such an effect exists, then even with the same interacting partner, a higher level of IMC should be observed with higher likability, and a lower level of IMC with lower likability. Moreover, we attempted to investigate the role of gaze in the relation between likability and IMC.

Our study was conceived in the theoretical framework of the dynamical approach to IMC. In this context, IMC is a self-organized phenomenon, which follows basic dynamic principles (Schmidt et al., 1990; Schmidt and O'Brien, 1997; Richardson et al., 2007). The majority of these studies required participants to perform rhythmic oscillatory movement. Each individual was considered as an oscillator, and the level of IMC depended on the level of entrainment between the two oscillators. The above-mentioned relation between likability and IMC suggested that likability might influence the strength of the entrainment. But once again, it remains an open question whether likability would increase or decrease IMC. The present study specifically aimed to address the following two questions:


For (1) we emphasized that if a causal relationship exists, IMC would follow the change of likability. To test this, we arranged for participants to interact with a confederate whom they had not known before the experiment started. Conversations were arranged to manipulate the likability of the confederate. An interpersonal finger-tapping task was carried out right after the conversation. Participants had to tap with their index finger while the confederate was performing the same movement in their visual field. We expected that the coordination level would be higher in the likable condition.

For (2) gaze toward the partner's movement was hypothesized as a mediator between likability and IMC, for several reasons. First, IMC cannot possibly be established without perception. Although coordination can be established via a variety of perceptual modalities [e.g., visual (Schmidt and O'Brien, 1997; Richardson et al., 2007), auditory (Shockley et al., 2003; Baumann et al., 2007), tactile (Marin et al., 2009)], here we focused only on the role that visual perception plays in establishing IMC. The perceptual basis of IMC has been confirmed by studies of both intentional (Schmidt et al., 1990) and unintentional motor coordination (Schmidt and O'Brien, 1997; Oullier et al., 2008). Second, the amount of available visual information is positively correlated with the level of entrainment in unintentional rhythmic coordination. For instance, Richardson et al. (2007) tested whether the extent to which participants fixated the partner's movement influenced the level of coordination. During unintentional coordination, they found a higher level of in-phase coordination when participants fixated their partner's rocking movement with focal vision than when they viewed it with peripheral vision (Richardson et al., 2007). This suggests that more visual perceptual information leads to a greater extent of coordination. Third, coordination seems inevitable as long as visual perception is available. Issartel et al. (2007) asked participants to intentionally not coordinate while looking at each other's movement. Results showed that participants' intrinsic oscillatory frequencies tended to converge when visual information was shared, revealing that they could not avoid influencing each other as soon as visual contact was available (Issartel et al., 2007). In sum, all these studies suggest the importance of visual perception in determining the level of motor coordination.

Some studies have also focused on the role eye contact plays during social interaction (Wang et al., 2011; Wang and Hamilton, 2014), and found that eye contact facilitated mimicry. In contrast, our study tested whether likability influenced IMC simply through looking at the partner's movement. Moreover, we investigated how natural gaze was oriented in a continuous interpersonal interaction situation. The above-mentioned studies provide reasonable justification to hypothesize that the amount of gaze directed onto the partner's movement determines the level of coordination.

In the experiment reported below, we captured the natural gaze direction of our participants during IMC. Of particular interest was the amount of gaze directed toward the partner's movement. Eye tracking techniques have been extensively documented as valid tools to detect visual focus (Arolt et al., 1996; Dalton et al., 2005). In our study, in order to ensure that visual perception was the only source of interpersonal entrainment, auditory cues were masked with white noise. Based on the critical role visual perception plays in coupling interactants, we expected that a higher level of motor coordination might be attributed to a greater amount of visual fixation on the partner's movement.

# MATERIALS AND METHODS

# Participants

Twenty-two participants (10 female and 12 male; age 26.9 ± 6.6 years) were recruited from the University of Montpellier and other universities in Montpellier by asking whether they would like to participate in a finger-tapping experiment studying individual tapping characteristics. Each participant signed the informed consent form prior to the start of the experiment. The protocol conformed to the Declaration of Helsinki, followed the guidelines of the University of Montpellier, and was approved by the EuroMov IRB. Participants had normal or corrected-to-normal vision, and they were not told the exact purpose of the experiment until all sessions of the experiment were completed.

# Confederate and Likability Manipulation

A female confederate was employed and conversations were arranged to manipulate the level of likability toward the confederate. An interpersonal finger-tapping task was arranged right after each conversation in order to assess the level of IMC.

The confederate was a 24-year-old female college student. She was asked to adopt a similar style of dress and makeup in order to maintain the same level of physical attractiveness across the likability conditions. In this way, a potential difference in IMC could not be attributed to physical attractiveness, which has been reported to influence IMC (Zhao et al., 2015). The confederate was not naïve to the study hypothesis. She was paid for the job and highly motivated to accomplish the task, and she did not know any of the participants.

Three levels of likability were tested: baseline, likable, and unlikable. The baseline condition captured the first impression, requiring the confederate and the participant to meet and say "Hi" to each other without further communication. In the likable condition, the participant and the confederate were told to have a conversation about their hobbies and studies. The confederate behaved in a friendly and outgoing manner so that the participants would like her. She engaged fully in the conversation, listening attentively and responding appropriately to the participant. Her phone was switched off to avoid incoming calls or messages. In the unlikable condition, both persons were asked to have a conversation on controversial topics such as gay marriage. The experimenter indicated in this particular condition that they were allowed to discuss debated topics. In order to learn the participant's opinion, the confederate always raised the question first. After learning the participant's opinion, the confederate intentionally took the opposite position. Moreover, she avoided eye contact and acted inattentively when the participant was speaking. To further ensure the success of the unlikable manipulation, the confederate set an alarm on her phone to ring during the conversation (as if a message had come in). Afterward she switched off the alarm but continued playing with her phone (as if texting). This technique was intended to annoy participants enough that the likability level would be low. Both people were allowed to ask each other questions in both the likable and unlikable conditions. The conversations in these two conditions lasted around 5 min, and the experimenter stopped them at an appropriate time.

#### Likability Questionnaire

In order to confirm that likability was successfully manipulated, participants completed a likability questionnaire after the conversation in each of the three likability conditions. The questionnaire was tailored by incorporating eight items of the Reysen likability scale. The original Reysen likability scale is an 11-item measure, and is a valid and reliable tool to assess likability (Reysen, 2005). It uses a 7-point Likert scale format, with −3 representing "strongly disagree" and +3 "strongly agree." A higher score on each item indicates a higher likability level. The 7th, 10th, and 11th items of the Reysen likability scale were not included in the likability questionnaire for empirical reasons. For example, the seventh item, "I would like this person as a roommate," was not chosen because it might have been viewed as inappropriate, especially between a male participant and the (female) confederate. Moreover, the decision to eliminate items was taken in consultation with the questionnaire developer Stephane Reysen, who believed that skipping a few items would not affect the validity and reliability of the questionnaire. To encourage honest ratings of the confederate, the participant and the confederate were seated in two corners when filling out the questionnaire, so that neither of them knew the other's appraisal. Meanwhile, they were strongly encouraged to rate their genuine feelings. The questionnaire took about 2 min to answer.

# Experimental Procedure

Each participant underwent the three likability conditions. The baseline condition always came before the likable and unlikable conditions, whose order was counterbalanced across number and gender of the participants. Likable and unlikable conditions were conducted at least 2 days apart because the confederate behaved in a completely different way in these two conditions. If both conditions had been arranged on the same day, participants would have been surprised by the great change in the confederate's attitude, and would thus have been suspicious about the goal of the experiment.

After each likability manipulation and questionnaire, an interpersonal finger-tapping task was conducted to measure the level of IMC in the corresponding likability condition (**Figure 1A**). The tapping task session lasted around 15 min in each condition.

There were six trials of interpersonal finger tapping in each of the three likability conditions. Each trial lasted 90 s and comprised two parts: the first 30 s and the last 60 s. The participant tapped alone for the first 30 s, and these data were used to generate an auditory metronome, which beeped at 100% or 150% of the participant's tapping frequency for the confederate to follow. Then both persons tapped simultaneously for the last 60 s (**Figure 1B**). The 100% and 150% frequencies were each repeated three times, and the six trials were presented in random order. Participants wore an eye tracker throughout the IMC task. They were instructed to tap at a constant and comfortable tempo. They were free to look wherever they wanted, but were instructed not to close their eyes (except for blinking) during the IMC task. The confederate looked straight ahead and was careful to express no emotion during the finger-tapping task. She was specifically instructed to maintain the same performance at all times.
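As a rough illustration of the trial structure described above, the following Python sketch derives a metronome inter-beat interval from a participant's solo taps and shuffles the six trials. It is not the Matlab/Psychtoolbox code used in the study: the function names are hypothetical, and estimating the preferred frequency from the mean inter-tap interval is an assumption, since the text does not state how the tapping frequency was computed.

```python
import random
import statistics


def metronome_interval(solo_tap_times, ratio):
    """Inter-beat interval (s) of a metronome at `ratio` times the tapping frequency.

    Assumption: the preferred frequency is 1 / (mean inter-tap interval)
    of the solo-tapping phase.
    """
    intervals = [b - a for a, b in zip(solo_tap_times, solo_tap_times[1:])]
    preferred_frequency = 1.0 / statistics.mean(intervals)  # taps per second
    return 1.0 / (preferred_frequency * ratio)


def trial_order(rng):
    """Six trials: the 100% and 150% conditions three times each, shuffled."""
    ratios = [1.0] * 3 + [1.5] * 3
    rng.shuffle(ratios)
    return ratios


if __name__ == "__main__":
    solo_taps = [0.0, 0.5, 1.0, 1.5, 2.0]  # hypothetical tap times in seconds
    for ratio in trial_order(random.Random()):
        print(ratio, metronome_interval(solo_taps, ratio))
```

With a participant tapping every 0.5 s (2 Hz), the 150% condition gives the confederate a 3 Hz metronome, so the two tappers cannot settle into a fixed in-phase or anti-phase relation.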

# Apparatus

A MacBook Pro (15-inch, Mid 2012, OS X 10.9.5) connected to two keyboards and an eye tracker (PupilLab©) was used. Matlab (Matlab\_R2013a) together with Psychtoolbox (Kleiner et al., 2007) was run to generate and deliver the auditory metronome to the confederate, to initiate the recording of the eye tracker data, and to collect the tapping data. The confederate and the participant tapped on two separate keyboards, which recorded the finger tapping data. The participant's keyboard was covered with a shield in order to block the confederate's peripheral view of their finger tapping. Participants tapped on the "left arrow" key on the participant's keyboard and the confederate on the "right arrow" key on her keyboard.

Gaze direction of the participant was collected with a commercial head-mounted eye tracker, which consisted of two cameras: a scene camera and an eye-tracking camera. The scene camera captured the environmental scene in front of the participant, and the eye camera recorded eye movements. The average recording frequency of both cameras was 30 Hz. The device is a reliable eye-tracking tool for estimating natural gaze direction, with decent spatiotemporal accuracy and precision (Kassner et al., 2014). Data recording was initiated by the first tap of the participant, and was paused manually after each trial was completed. Participants were all naïve with respect to the eye tracking device. They were told that it was used to count the number of eye blinks during the task. This cover story about the purpose of the eye tracker was added in order to avoid possible unnatural gaze behavior during tracking.

# Experimental Setup

In the interpersonal finger tapping task, the participant was situated at a 90° angle from the confederate (**Figure 1C**). The particular position ensured that only the participant had a full view of the confederate's finger tapping, instead of the other way around.

In order to block auditory cues, both the participant and the confederate wore earphones, through which white noise was delivered. Noise was delivered to the participant through a cellphone. For the confederate, it was delivered together with the auditory stimulus via the computer. The volume of the white noise was tuned to a level that was not uncomfortable but was sufficient to mask the tapping sound. Both the seating arrangement and the white noise were adopted to establish a unidirectional coupling. In this way, the difference in motor coordination could only be explained by the likability manipulation, and the underlying mechanism could be attributed solely to vision instead of other forms of perception.

So that participants would not realize the genuine objective of the experiment, they were told that they would perform the task together with another participant (in reality the confederate) in order to record experimental data faster. They were also informed that the computer had assigned the seats randomly, and that it was entirely possible for them to remain at the same position throughout the entire experiment. In this case, it was also likely that the same person would wear the glasses (eye tracker) all the time. In fact, participants remained in the same seat and wore the eye tracker during all experimental sessions.

At the very end of the experiment, the experimenter held a debriefing to explain to the participants why the confederate had acted in such different ways, and to find out whether they had been aware of the genuine purpose of the experiment. Participants were instructed not to discuss the purpose or the conditions of the experiment during the entire study. Two participants (not included in the 22 Ss) correctly guessed the real objective of the experiment; hence, their data were discarded from further analysis.

# Data Analysis

#### Relative Phase Calculation

In the calculation of relative phase, previous studies examined the distribution of relative phase across the range of 0–180 degrees (Schmidt and O'Brien, 1997; Richardson et al., 2007). This is an efficient way of showing that relative phase values are not evenly distributed, with a dominance around the in-phase and anti-phase patterns. However, this methodology does little to capture what percentage of a trial consisted of in-phase and anti-phase coordination segments. It assigns all relative phase values lower than 20 degrees to the region of in-phase coordination. But a single point with a relative phase lower than 20 degrees does not necessarily indicate the occurrence of in-phase coordination; it could also be a sample in the middle of phase drifting. Instead, we regard in-phase and anti-phase coordination segments as stable periods in which relative phase values dwell around these two patterns of coordination. Therefore, in order to detect the intrinsic (i.e., in-phase and anti-phase) patterns of coordination segments, discrete finger tapping data were converted into continuous signals with a sinusoidal function (Varlet and Richardson, 2011).

During the conversion, because both persons performed rhythmic oscillatory movements, two consecutive taps were treated as one full oscillatory cycle, and the tapping moment was assigned the value "−1" in the simulated sinusoidal function. Once the continuous signals were obtained, both the participant's and the confederate's signals were filtered with a second-order Butterworth filter with a cutoff frequency of 10 Hz. The Hilbert transform was then used to calculate the relative phase between the participant and the confederate. The first 3 s and the last 2 s were discarded because of the transient at the beginning and the abnormal values at the end of the Hilbert transform.
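The tap-to-sinusoid conversion described above can be illustrated with a minimal Python sketch. It assumes a cosine shape in which each inter-tap interval spans one full cycle and the signal equals −1 at every tap, as stated in the text. The function name `taps_to_sinusoid`, the sampling rate, and the example tap times are hypothetical and are not taken from the study; the Butterworth filtering and Hilbert transform steps are not shown.

```python
import math


def taps_to_sinusoid(tap_times, sample_rate_hz):
    """Convert discrete tap times (in s) into a uniformly sampled cosine signal.

    Assumption (illustrative): each pair of consecutive taps spans one full
    cycle, and the signal equals -1 at the moment of every tap.
    Returns (times, values), sampled from the first tap to the last tap.
    """
    if len(tap_times) < 2:
        raise ValueError("need at least two taps")
    n_samples = int((tap_times[-1] - tap_times[0]) * sample_rate_hz) + 1
    times, values = [], []
    cycle = 0  # index of the tap that starts the current cycle
    for k in range(n_samples):
        t = tap_times[0] + k / sample_rate_hz
        # Advance to the cycle that contains time t.
        while cycle < len(tap_times) - 2 and t >= tap_times[cycle + 1]:
            cycle += 1
        start, end = tap_times[cycle], tap_times[cycle + 1]
        fraction = (t - start) / (end - start)  # 0 at a tap, 1 at the next tap
        # The phase runs from pi to 3*pi within a cycle, so cos(phase) = -1 at taps.
        values.append(math.cos(math.pi + 2.0 * math.pi * fraction))
        times.append(t)
    return times, values


# Hypothetical example: three taps, 0.5 s apart, sampled at 100 Hz.
times, values = taps_to_sinusoid([0.0, 0.5, 1.0], 100)
```

Because the phase is interpolated linearly within each inter-tap interval, uneven tapping simply stretches or compresses individual cycles of the resulting signal.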

#### Dependent Variables of Coordination

We used different variables to quantify the coordination level in the 100 and 150% frequency conditions because both persons tapped at the same frequency in the 100% condition and at different frequencies in the 150% condition. In- or anti-phase coordination could therefore not be computed in the 150% condition.

In the 100% condition, we tested whether the percentage of in-phase coordination, anti-phase coordination, and/or the sum of these two patterns would be higher in the likable condition than in the other two conditions. The reason for computing the sum of in- and anti-phase coordination is given in the "Discussion" section.

The criteria for defining both in-phase and anti-phase patterns of coordination were (1) a coordination segment lasting no fewer than five consecutive tapping cycles, and (2) no relative phase value within the segment deviating by more than 60 degrees from the intrinsic pattern of coordination (**Figure 2**). The criteria were set empirically to capture as many genuine coordination segments as possible while discarding out-of-coordination segments such as phase drifting. The percentage of coordination was calculated as the ratio of the total length of the segments of a specific pattern to the length of the trial.
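The two criteria and the percentage measure can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: relative phase is assumed to be folded into 0–180 degrees and sampled uniformly, and the five-cycle criterion is approximated by a fixed number of samples per cycle. The function name `coordination_percentage` and the parameter `samples_per_cycle` are hypothetical.

```python
def coordination_percentage(rel_phase_deg, target_deg, samples_per_cycle,
                            min_cycles=5, tolerance_deg=60.0):
    """Percentage of a trial spent in qualifying coordination segments.

    rel_phase_deg: relative phase per sample, folded into 0-180 degrees.
    target_deg: 0 for in-phase, 180 for anti-phase.
    samples_per_cycle: assumed (hypothetical) constant samples per tapping cycle.
    A segment counts only if it lasts at least min_cycles cycles and every
    sample in it stays within tolerance_deg of the target pattern.
    """
    if not rel_phase_deg:
        return 0.0
    min_len = min_cycles * samples_per_cycle
    total = 0  # samples inside qualifying segments
    run = 0    # length of the current within-tolerance run
    for value in rel_phase_deg:
        if abs(value - target_deg) <= tolerance_deg:
            run += 1
        else:
            if run >= min_len:
                total += run
            run = 0
    if run >= min_len:  # close a run that reaches the end of the trial
        total += run
    return 100.0 * total / len(rel_phase_deg)


# Hypothetical trial: a long in-phase run, a drift, a short in-phase run
# (too short to count), and a long anti-phase run.
phases = [10.0] * 50 + [90.0] * 10 + [20.0] * 30 + [170.0] * 60
in_phase = coordination_percentage(phases, 0.0, samples_per_cycle=10)
anti_phase = coordination_percentage(phases, 180.0, samples_per_cycle=10)
sum_of_both = in_phase + anti_phase
```

In this example the 30-sample in-phase run is discarded because it is shorter than five cycles, which is how the duration criterion excludes brief passages through the in-phase region during phase drifting.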

In the 150% condition, we measured the changes in tapping frequency across the likability conditions. With the confederate tapping in the participant's field of view, we expected the participant's tapping frequency to be entrained to some extent. Specifically, we hypothesized that the participant's tapping frequency would increase more in the likable condition than in the other two conditions. In the unlikable condition, participants might even tap more slowly because they might intend to be "asynchronous" with the unlikable person (who was tapping much faster in this condition). The tapping frequency change rate was computed as (Freq60 − Freq30)/Freq30, where Freq30 and Freq60 are the median tapping frequencies during the first 30 s and the last 60 s, respectively.
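The frequency change rate can be illustrated with a short Python sketch. It assumes that tapping frequency is obtained as the reciprocal of each inter-tap interval (consistent with two consecutive taps forming one cycle) and that the trial duration is known; the function names, the 90 s trial duration, and the example tap times are hypothetical.

```python
from statistics import median


def median_frequency(tap_times, window_start, window_end):
    """Median tapping frequency (Hz) over inter-tap intervals inside a window."""
    freqs = [1.0 / (b - a)
             for a, b in zip(tap_times, tap_times[1:])
             if a >= window_start and b <= window_end]
    if not freqs:
        raise ValueError("no complete inter-tap interval in the window")
    return median(freqs)


def frequency_change_rate(tap_times, trial_duration_s):
    """(Freq60 - Freq30) / Freq30 for one trial."""
    freq30 = median_frequency(tap_times, 0.0, 30.0)  # first 30 s
    freq60 = median_frequency(tap_times, trial_duration_s - 60.0,
                              trial_duration_s)      # last 60 s
    return (freq60 - freq30) / freq30


# Hypothetical 90 s trial: 2 Hz tapping for 30 s, then 2.5 Hz tapping.
taps = [i * 0.5 for i in range(61)] + [30.0 + j * 0.4 for j in range(1, 151)]
rate = frequency_change_rate(taps, 90.0)  # about 0.25, i.e., a 25% increase
```

A positive rate indicates that the participant sped up toward the faster confederate, and a negative rate indicates slowing down.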

#### Gaze Direction

The eye tracker registered the participant's natural visual scanning during the motor coordination task. Three areas of interest were defined and examined: head, trunk, and finger (**Figure 3**). The size of each area was chosen to cover the region of interest as fully as possible while excluding extra areas, even when the confederate moved slightly. Of primary interest was whether the amount of gaze directed toward the three defined areas would differ with likability. For this purpose, we computed the percentage of time during the last 60 s in which gaze was directed at each area of interest.

Among the three areas, we merged the head and trunk areas into a new "body" area since it was difficult to clearly separate the two. The confederate's head moved occasionally during the coordination task, and her chin very often dipped into the trunk area.

By examining the gaze directed at these two areas, finger and body [termed Gaze (finger) and Gaze (body), respectively, in the following text], it was possible to test whether likability exerted a general effect through looking at the whole body, or favored entrainment specifically through looking at the finger area.
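The gaze measure can be sketched in Python as the share of gaze samples that fall inside an area of interest during a time window. The sketch assumes uniformly sampled gaze points and rectangular areas, neither of which is specified in the text; the function name `gaze_percentage`, the coordinates, and the rectangles are hypothetical.

```python
def gaze_percentage(samples, areas, window_start, window_end):
    """Percentage of gaze samples within a time window that hit any given area.

    samples: iterable of (time_s, x, y) gaze points, assumed uniformly sampled
             so that the share of samples approximates the share of time.
    areas: list of (x_min, y_min, x_max, y_max) rectangles (hypothetical shape).
    """
    def inside(x, y):
        return any(x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in areas)

    in_window = [(x, y) for t, x, y in samples if window_start <= t <= window_end]
    if not in_window:
        return 0.0
    hits = sum(1 for x, y in in_window if inside(x, y))
    return 100.0 * hits / len(in_window)


# Hypothetical 90 s recording at 1 Hz, alternating between two gaze targets.
samples = [(t, 5, 5) if t % 2 == 0 else (t, 50, 50) for t in range(90)]
finger = [(0, 0, 10, 10)]
body = [(40, 40, 60, 60), (40, 60, 60, 100)]  # head and trunk merged
gaze_finger = gaze_percentage(samples, finger, 30, 89)  # last 60 samples
gaze_body = gaze_percentage(samples, body, 30, 89)
```

Passing the head and trunk rectangles together as one list mirrors the merging of the two areas into a single "body" area, so a sample is counted once even if the rectangles overlap.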

### Expected Results

Because most previous studies favored a positive correlation between likability and IMC, and because participants were not instructed to bond with the confederate, we hypothesized that the coordination level in the likable condition would be higher than in the other two conditions. We also hypothesized that the higher level of coordination would be mediated by a greater amount of gaze directed toward the confederate's finger. We deliberately formulated different hypotheses for the 100 and 150% conditions. In the 100% condition, we hypothesized a higher percentage of in-phase coordination, anti-phase coordination, and/or the sum of these two intrinsic patterns in the likable condition than in the other two conditions. In the 150% condition, we hypothesized a larger frequency increase in the likable condition than in the other two conditions.

# RESULTS

# Likability Questionnaire

To assess likability across conditions, the mean of the eight items in the likability questionnaire was calculated. A repeated-measures ANOVA revealed a significant difference in the likability score (F<sub>2,42</sub> = 26, p < 0.01, η<sup>2</sup><sub>p</sub> = 0.553). Fisher's LSD post hoc test demonstrated that the level of likability in the likable condition (2.06 ± 0.18) was significantly higher than in the baseline (1.15 ± 0.22) and unlikable conditions (0.688 ± 0.32): both p < 0.01; and the baseline was significantly higher than the unlikable condition: p < 0.05. This result confirmed that the likability manipulation was successful.
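As a cross-check, partial eta squared can be recovered from an F statistic and its degrees of freedom through the standard identity η²p = (F × df_effect) / (F × df_effect + df_error). The short Python sketch below applies it to the values reported above; the function name `partial_eta_squared` is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Likability questionnaire ANOVA: F(2, 42) = 26
print(round(partial_eta_squared(26, 2, 42), 3))  # 0.553
```

The result matches the reported effect size of 0.553.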

# Predicting IMC

In this section, we first built linear mixed-effect models (LMEMs) to explore which predefined factors were significant predictors of the dependent variables while accounting for random effects. We included the maximal random effects structure justified by the experimental design and assumptions (Barr et al., 2013). In all of the LMEMs listed below, we specified random slopes for the by-subject effect of Likability and Trial. As a complement to the LMEMs, repeated-measures ANOVAs or non-parametric tests were conducted to perform pairwise comparisons. LMEMs were fitted with the package lme4 (Bates et al., 2014) for R (R Development Core Team, 2014), whereas ANOVAs and non-parametric tests were conducted in SPSS (22.0).

# Testing Gaze as a Mediator between Likability and IMC

As the first step and one of our main objectives, we tested whether Gaze (finger) was a mediator between likability and IMC. According to Baron and Kenny (1986), at least two prerequisites must be fulfilled for Gaze (finger) to be the mediator between likability and IMC: (1) Likability and Gaze (finger) are each, independently, significant predictors of IMC; (2) only Gaze (finger), and not Likability, is significant when both are entered into the model to predict IMC. To test prerequisite 1, we built two LMEMs to explore whether Likability or Gaze (finger) alone exerted a significant effect on IMC:

Model A: IMC ∼ Likability

Model B: IMC ∼ Gaze (finger)

In both models, we entered Likability or Gaze (finger) alone as the fixed effect, and participant, participant's gender, and likability order (whether the likable condition was arranged before or after the unlikable condition) as random effects. The dependent variables were the occurrence of in-phase coordination, anti-phase coordination, and the sum of in- and anti-phase coordination in the 100% condition, and the frequency change rate in the 150% condition (**Table 1**). The significance of Likability or Gaze (finger) was assessed with the likelihood ratio test (Giampaoli and Singer, 2009). As shown in **Table 1**, for none of the four IMC parameters were Likability and Gaze (finger) both independently significant predictors. Prerequisite 1 was therefore not fulfilled, indicating that likability alone did not influence IMC, and that Gaze (finger) was not the mediator between Likability and IMC.

# Likability: Trial and Likability: Gaze Interaction Effect on IMC

According to the results presented in **Table 1**, our initial hypotheses regarding the effect of likability on IMC and the mediating effect of gaze between likability and IMC seemed to be rejected. This might be because the effect of likability on IMC varied with time (trial), and/or because the effect of gaze on IMC was moderated by likability. In order to test these possibilities and to explore the effect of other factors on IMC, we created Model C with LMEMs:

Model C: IMC ∼ Likability + Trial + Gaze (finger) + Gaze (body) + Likability:Trial + Likability:Gaze (finger)

In Model C, the dependent variables were the occurrence of in-phase coordination, anti-phase coordination, and the sum of in- and anti-phase coordination in the 100% condition, and the frequency change rate in the 150% condition. We entered Likability, Trial, Likability:Trial (interaction), Gaze (finger), Gaze (body), and Likability:Gaze (finger) as fixed effects, and participant, participant's gender, and likability order as random effects. The p-value of a fixed effect was determined with the Kenward–Roger approximation to the degrees of freedom (Halekoh and Höjsgaard, 2014).

In the 100% condition, results showed a significant interaction effect of Likability:Trial on the sum of in- and anti-phase coordination (p < 0.01), and significant interaction effects of Likability:Trial and Likability:Gaze (finger) on in-phase coordination (both p < 0.05). The main effect of Likability and the interaction effect of Likability:Gaze (finger) on the sum of in- and anti-phase coordination approached but did not reach significance (0.05 < p < 0.1). These results indicated that the effect of Likability on IMC varied with time (Trial), and that likability moderated how Gaze (finger) affected IMC. Further analyses, reported in the following sections, examined how IMC varied with Likability and Gaze (finger).

In the 150% condition, results showed that Gaze (finger) exerted a significant effect on frequency change rate (p < 0.05).

### **Likability's effect on IMC in the 100% condition**

In order to explore how IMC varied with likability across trials, we performed repeated-measures ANOVAs with a 3 Likability (baseline, likable, and unlikable) × 3 Trial (first, second, and third trial) design. The dependent variables were the occurrence of in-phase coordination, anti-phase coordination, and the sum of in- and anti-phase coordination.

Results revealed no main or interaction effect of Likability on the occurrence of in-phase or anti-phase coordination (In-phase: Baseline 34.21% ± 3.51, Likable 32.66% ± 4.43, Unlikable 26.16% ± 3.86; F<sub>2,42</sub> = 1.26, p > 0.05, η<sup>2</sup><sub>p</sub> = 0.056; Anti-phase: Baseline 10.94% ± 1.77, Likable 8.39% ± 2.20, Unlikable 8.74% ± 1.59; F<sub>2,42</sub> = 0.57, p > 0.05, η<sup>2</sup><sub>p</sub> = 0.027). However, an interaction effect of Likability:Trial was found for the sum of in- and anti-phase coordination: F<sub>4,84</sub> = 3.45, p < 0.05, η<sup>2</sup><sub>p</sub> = 0.141. This result was consistent with that obtained with the LMEMs shown in **Table 2**. Further, post hoc tests (Fisher's LSD) demonstrated that the percentage of the sum was significantly higher in the baseline condition than in the other two conditions (both p < 0.05) in the first trial of the coordination task, and that it was significantly higher in the likable condition than in the unlikable condition (p < 0.05) in the third trial. The difference between the likable and baseline conditions in the third trial approached but did not reach statistical significance (p = 0.051).

TABLE 1 | Results of the linear mixed-effect models (LMEMs) predicting interpersonal motor coordination (IMC) (100% condition: occurrence of in-phase, anti-phase and sum of both in- and anti-phase coordination; 150%: frequency change rate) with Likability or Gaze (finger) as the fixed effect.

Model A: IMC ∼ Likability. Model B: IMC ∼ Gaze. <sup>∗</sup>p < 0.05.

Examining the change in coordination over practice time in each likability condition, we found that the level of coordination dropped significantly from the first to the third trial in the baseline condition (p < 0.05). It dropped slightly in the unlikable condition and increased in the likable condition; however, the increase in the likable condition approached but did not reach statistical significance (p = 0.084) (**Figure 4**).

In short, our results showed that likability led to a greater extent of IMC in the last portion of the finger tapping task, and that the level of coordination varied with practice time in the 100% condition.

#### **Gaze (finger)'s effect on IMC in 100% condition**

The Likability:Gaze (finger) interaction effect on the occurrence of in-phase coordination (p < 0.05) and on the sum of in- and anti-phase coordination (0.05 < p < 0.1) indicated that likability moderated the impact of Gaze (finger) on IMC, suggesting that looking at the confederate's finger had a different effect on IMC depending on the level of likability. In order to understand the relation between Gaze (finger) and IMC in each of the likability conditions, we examined the correlation between Gaze (finger) and IMC (in-phase and sum of in- and anti-phase) in the three conditions separately.

As for the relation between Gaze (finger) and the occurrence of in-phase coordination, results showed a positive correlation between these two variables in the likable condition (r = 0.377, p < 0.05), but not in the other two conditions (baseline: r = −0.051, p = 0.671; unlikable: r = −0.135, p = 0.271). A comparison of the three correlation strengths (Raghunathan et al., 1996) showed that the correlation was significantly stronger in the likable condition than in the baseline (z = 2.50, p = 0.012) and unlikable conditions (z = 3.12, p = 0.002). No significant difference was found between the baseline and unlikable conditions (z = 0.47, p = 0.64).

Similar results were obtained for the correlation between Gaze (finger) and the sum of in- and anti-phase coordination (baseline: r = −0.042, p = 0.738; likable: r = 0.308, p < 0.05; unlikable: r = −0.182, p = 0.143; **Figure 5**). The comparison tests also showed that the correlation in the likable condition was significantly higher than in the baseline (z = 1.99, p = 0.046) and unlikable conditions (z = 2.25, p = 0.024), with no significant difference between the baseline and unlikable conditions (z = 0.14, p = 0.89). Together, these results suggested that focal visual information uptake of the partner's movement led to IMC only when the interaction partner was likable.

#### **Gaze (finger)'s effect on frequency change rate in 150% condition**

In the 150% condition, the LMEM showed that likability was not a significant predictor of the frequency change rate (Baseline: 4.3% ± 1.23, Likable: 4.3% ± 1.54, Unlikable: 2.3% ± 1.39), and that only Gaze (finger) was a significant predictor. A Pearson correlation showed that Gaze (finger) was significantly and positively correlated with the frequency change rate in the 150% condition (r = 0.158, p < 0.05), suggesting that looking at the confederate's finger tapping increased the participant's tapping frequency regardless of the level of likability.

The dependent variables were the occurrence of in-phase, anti-phase and sum of both in- and anti-phase coordination in the 100% condition, and the frequency change rate in the 150% condition. Likability, Trial, Gaze (finger), Gaze (body), Likability:Trial, and Likability:Gaze were entered as fixed effects to see which factor was the significant predictor. ˆ0.05 < p < 0.1, <sup>∗</sup>p < 0.05, ∗∗p < 0.01.

# Likability's Effect on Gaze (Finger)

In this step, we also explored whether likability would influence the amount of gaze directed at the confederate's movement. We used an LMEM to predict the amount of Gaze (finger) by entering Likability, Trial, and Likability:Trial as fixed effects, with participant, participant's gender, and likability order as random effects. Results showed that in both the 100 and 150% conditions, Likability was not a significant predictor (100%: F<sub>2,20</sub> = 3.08, p > 0.05; 150%: F<sub>2,20</sub> = 2.01, p > 0.05). Friedman's tests also failed to show a significant difference in the amount of Gaze (finger) between likability conditions (100%: Mdn\_Baseline = 2.9%, Mdn\_Likable = 5.5%, Mdn\_Unlikable = 3.0%, p = 0.277; 150%: Mdn\_Baseline = 1.7%, Mdn\_Likable = 5.0%, Mdn\_Unlikable = 1.4%, p = 0.203). These results indicated that the amount of gaze directed at the confederate's finger did not depend on the likability level.

# Results Summary

(1) The post-conversation likability questionnaire showed that the level of likability was highest in the likable condition and lowest in the unlikable condition, with the baseline in between. This suggested that the manipulation of likability was successful.


# DISCUSSION

Our study explored whether likability influences how individuals behaviorally coordinate with each other while exploring natural gaze direction. We found that when the confederate was tapping at the same tempo as the participant, likability affected IMC in interesting ways over time: while likability had no influence on IMC early in the motor synchronization task, participants who liked their partner (owing to an induced friendly conversation in the interaction) showed higher IMC as the interaction wore on, compared with participants who had neutral or unfriendly interactions with their partner. More interestingly, we found that the likability of the partner moderated how focal visual information uptake influenced IMC.

Previous studies demonstrated that vision is an essential element in coupling two individuals (Schmidt and O'Brien, 1997; Richardson et al., 2007; Oullier et al., 2008), and that the amount of available visual information is positively correlated with the level of unintentional coordination (Richardson et al., 2007). That conclusion was drawn without taking into account the likability of the coordination partner, which is a key psychosocial feature of the person. In contrast to these findings, and as a novelty of our study, we found a positive correlation between focal visual information uptake and IMC only when the confederate's likability was high, implying that whether looking at the partner's movement leads to coordination depends on the likability of the person. This implies that the likability of the interaction partner might have been a confounder in those studies, and suggests that likability needs to be taken seriously in future studies on IMC. Furthermore, the moderating effect of likability shaped the participant's tendency to coordinate more when the partner was likable, and it partially helps to explain the higher level of coordination in the last portion of the task in the likable condition.

We noticed, however, that the variation of Gaze (finger) as a function of likability over time did not correspond exactly to that of IMC in the 100% condition, indicating that gaze alone could not explain IMC performance. It remains possible, however, that IMC was determined by the amount of visual information uptake, since vision incorporates both focal and peripheral vision. In the present study, our eye tracker registered only focal vision and did not take the peripheral view into account. Recent studies that instructed participants not to focus their vision on the coordination object suggest that peripheral vision also leads to some level of coordination (Richardson et al., 2007), particularly when the object oscillates at the same frequency. Similarly, our results indicated that participants coordinated with the confederate by using peripheral information. As shown in **Figure 3**, the "finger" area was restricted to a confined area, and it was not located in the participants' natural straight-ahead visual field. Moreover, in several cases the level of IMC was high even though gaze was not directed at the confederate's finger (**Figure 5**). This observation indicates that the participant's peripheral vision captured the confederate's tapping information; otherwise no IMC could have been established in the current paradigm, since no other form of perceptual information about the partner (e.g., sound, touch) was available to the participant (Richardson et al., 2007). Therefore, we can conclude only that focal visual information uptake was not the mediator between likability and IMC. Further investigation is needed into how focal and peripheral visual information uptake influence IMC.

As for why IMC varied with likability and time, apart from the moderating effect of likability, we assume that motivation might also have been involved in the interplay between likability and IMC. This is mainly because IMC was high in the first trial of the baseline condition and in the third trial of the likable condition. The first trial of the baseline condition always took place right after the participant and the confederate first met. Participants might have been curious about the confederate and hence motivated to interact further with her. As motor coordination serves as a useful tool to establish affiliation (Lakin and Chartrand, 2003; Hove and Risen, 2009), the high level of coordination at the beginning may reflect the participant's motivation to affiliate with the confederate (Miles et al., 2011). The decreasing trend of IMC in the baseline and unlikable conditions might be due to a general decrease in motivation with time, although participants were engaged in tapping for a total of only 4.5 min during each of the three visits to the lab. In the likable condition, however, the decrease was compensated by the confederate's likableness. This finding was consistent with our expectation. It corresponds to our daily experience that interactions with unlikable people are cut short, leading to a reduced amount of coordination. By contrast, with persons we genuinely like, we attempt to maintain affiliation, and this may lead to a persistently high level of motor coordination, since motor coordination can increase affiliation (Hove and Risen, 2009). However, the claim that motivation genuinely played a role in this process needs further investigation. For instance, future research could record the level of motivation trial by trial in order to test whether changes in IMC can be explained by motivation.

When the confederate tapped at 1.5 times the participant's frequency, our results demonstrated a positive correlation between the amount of gaze directed at the confederate's movement and the frequency change rate, regardless of the likability level. This finding was consistent with Issartel et al.'s (2007) finding that available visual information can lead to frequency entrainment whether or not individuals are willing to coordinate. However, our initial hypothesis was rejected: participants did not show a greater frequency increase when the level of likability was high. One possible reason is that during the IMC task, participants did not look more at the confederate's tapping in the likable condition than in the other two conditions. Because the extent of the frequency increase depended only on the amount of focal vision on the movement in this particular condition, the frequency increase rate did not differ between the likability conditions.

It should be noted that likability moderated the relation between Gaze (finger) and IMC in the 100% but not the 150% condition. It is not clear why the moderating effect of likability occurred only when the partner was oscillating at the same tempo. It might be due to the nature of the coordination task, since past research showed that the level of coordination differs with the task and that people are less entrained when the partner's oscillating frequency exceeds their own preferred frequency (Richardson et al., 2007). Previous findings from the unintentional IMC paradigm suggested that individuals are more likely to be unintentionally entrained into coordination regimes when the tapping frequency is within ±10% of their preferred frequency (Schmidt and Richardson, 2008). In our experiment, the confederate was tapping at 1.5 times the participant's current tapping frequency, which could be perceived as too fast compared with their own tempo. In order to follow the instruction to keep a "constant" frequency, participants might have restrained themselves from being influenced by the confederate. We therefore reckon that the effect of likability might have been masked by the participants' willingness to follow the instruction to maintain their own tempo.

In the 100% condition, we examined the occurrence of in-phase coordination, anti-phase coordination, and the sum of these two patterns, whereas previous studies treated the occurrence of in- and anti-phase coordination separately (Schmidt and O'Brien, 1997; Richardson et al., 2007). Some calculated only in-phase coordination. For example, Hove and Risen (2009) found an effect of phase entrainment on likability ratings, but phase relation in their study referred only to in-phase coordination, since synchrony was calculated as the co-occurrence of the two persons' taps within 100 ms (Hove and Risen, 2009). In our study, we computed the sum of the in- and anti-phase patterns. The phenomenon that individuals are entrained into these two patterns of coordination could be explained by two main theories. In one theory, the finding of mirror neurons might explain why people engage in in-phase coordination (performing the same movement) (Rizzolatti et al., 1996). However, it does not explain anti-phase coordination well, since anti-phase coordination requires individuals to perform a temporally opposite movement. We believe that the second theory, the ecological approach to perception and action, provides a more reasonable account. According to this approach, a person is able to directly perceive both the environment and the self in relation to the environment (Gibson, 1979). Existing work has shown that relative phase (an index of the relation between self and the environment) exists in the visual information and can be directly harnessed to coordinate with the perceived movement (Schmidt et al., 1990; Bingham et al., 1999). The main reason why individuals are entrained into in- and anti-phase coordination might be that rhythmic movement near the preferred frequency contains particular visual information that triggers individuals to spontaneously perform the corresponding coordinating behavior. The characteristic of being triggered by external stimuli (be they social or not) might represent one's overall sensitivity to the visual information of the external stimuli. Studies on mimicry suggest that this general sensitivity is critical for establishing affiliation with others (Chartrand and Bargh, 1999; Lakin and Chartrand, 2003), and it may also be affected by one's personality traits (e.g., pro-social trait, extraversion) or clinical diagnosis (e.g., autism, depression) (Condon and Ogston, 1966; Lumsden et al., 2012; Marsh et al., 2013; Duffy and Chartrand, 2015). Therefore, if we consider both in- and anti-phase patterns of coordination as representing one's general sensitivity to the visual information, it is not unreasonable to take the sum of these two intrinsic patterns as an index of the level of coordination. Moreover, taking the sum of in- and anti-phase coordination does not contradict the results of previous work (Schmidt and O'Brien, 1997; Richardson et al., 2007), in which the sum of these two patterns of coordination was also statistically higher than the chance level.

In sum, our study indicated that the coordination task itself influenced how individuals behaved. The effect of likability became apparent only when the coordination partner was oscillating at the participant's preferred frequency. Our study explored natural gaze direction during the coordination task, and it pointed to the importance of investigating the role of peripheral vision and motivation during the interaction. Overall, our study suggests that IMC is a complex phenomenon that is sensitive to multiple factors.

# Strengths and Weaknesses

Our study adopted a within-subject design in which participants interacted with the same person in three different likability conditions. This approximated real social situations well, because in daily life the likability of the same person can change with time and events. If IMC varies with likability even for the same person, it may be possible to assess the level of likability by measuring IMC performance. This experimental design might therefore provide empirical evidence for those interested in evaluating interpersonal relationships through behavioral assessment.

Motivated by previous studies reporting the close relation between visual perception of the partner's movement and IMC, we explored how participants directed their gaze during the coordination task. Unlike studies that required participants to close their eyes or look in a specific direction (Richardson et al., 2007; Oullier et al., 2008), our study released the visual constraints by allowing participants to look naturally. Together with other studies investigating natural gaze during interaction (Broz et al., 2012; Gironzetti et al., 2016), our study extends the investigation of natural gaze to IMC specifically.

One weakness lies in the lack of naturalness of the IMC task. Here we adopted a finger-tapping task, which is not a common daily human activity. Recent studies based on advances in image analysis techniques have demonstrated the possibility of measuring coordination in more natural settings (Ramseyer and Tschacher, 2011; Schmidt et al., 2012, 2014; Paxton and Dale, 2013; Kupper et al., 2016). Kupper et al. (2016) used motion energy analysis to obtain the time series of the activity of a pre-defined area of a person during a conversation by detecting pixel changes between two consecutive images (Ramseyer and Tschacher, 2011; Kupper et al., 2016). Their studies indicated that this technique is a valid tool for capturing the coordination level during natural social interactions. Schmidt et al. (2012, 2014) implemented a similar image analysis technique to compare the phase relation between the two time series in a joke-telling task, and found a dominant presence of the intrinsic in-phase and anti-phase patterns of coordination. Paxton and Dale (2013) recorded how participants interacted during conversations and analyzed their bodily synchrony with frame differencing analysis. Complexity matching was also reported as a means to capture coordination in a natural dyadic conversation (Abney et al., 2014). In addition, Grammer et al. (1998) reported using behavioral pattern searching algorithms to look for behavioral correlates of coordination during natural conversation. These studies showed the possibility of directly measuring coordination in an ecological setting. However, implementing purely natural conversational situations in our case would have made it considerably more difficult to reveal whether the level of coordination was influenced by the amount of perceptual information uptake. First, interactants move in a gross way during natural interaction and simultaneously exhibit various gestures, postural sway, head movements, and so on. Second, eye tracking does not guarantee a specific relation between gaze direction and source of entrainment, as stated above. Third, the control of other types of perceptual information uptake, such as auditory perception, is difficult to achieve during natural conversation. In our study, we tried to adopt the best compromise between task naturalness and mechanism exploration.

Due to technical problems, although the recording of the eye tracker was intended to be launched by the first tap of the participant, the eye tracker sometimes started slightly late (within 2 s). For this reason, we examined the overall distribution of the coordinated behavior instead of the moment-to-moment dynamics of coordination. This limitation prevented us from testing the hypothesis that gaze onto the partner's movement preceded the coordinated behavior.

Another issue pertains to the ongoing concern regarding the use of a confederate in our experiment. Recent studies indicated that a confederate's behavior can differ from the spontaneous behavior of naïve participants because confederates are familiar with the study hypothesis and procedure (Brennan et al., 2010), and this might influence the results. A recent meta-analysis also found that involving confederates in an experiment might influence how participants perceive them and the relationship between them (Vicaria and Dickens, 2016), which might affect IMC performance. The confederate was aware of the hypothesis in our study, and this might have affected the participant's performance in IMC, although we tried to reduce this possibility to a minimum. In the experiment, she was specifically instructed to express a neutral emotion when tapping with the auditory metronome in all conditions. Considering the simplicity of the task the confederate was performing (finger tapping without communicating with the participant), we assumed that her performance in the interpersonal finger tapping task could be considered equivalent across likability conditions. In this sense, the difference in IMC between likability conditions is unlikely to be attributable to the employment of the confederate. Even so, adding another condition with a naïve participant might be ideal to determine whether the employment of the confederate affected our results.

# CONCLUSION

As human behavior can be both the output of cognitive processes and the vehicle one uses to achieve one's purpose, the impact of likability on IMC is far from straightforward. Individuals may coordinate at either a high or a low level with a likable person depending on multiple factors such as likability, motivation, gaze direction, and so on. Our study indicates that psychosocial properties such as the likability of the interaction partner should be treated cautiously when investigating IMC.

# AUTHOR CONTRIBUTIONS

ZZ, RS, LM, MG, and BB designed the experiment. ZZ and RS wrote the MATLAB code for running the experiment. ZZ, RS, LM, and BB analyzed the data. ZZ, RS, LM, and BB wrote the manuscript.

# FUNDING

This experiment was financially supported by the European Project AlterEgo, FP7 ICT 2.9 – Cognitive Sciences and Robotics, Grant Number 600610.

# ACKNOWLEDGMENTS

We thank Stephane Reysen for his suggestion on designing our likability questionnaire. We are grateful to Frank Bernieri for his useful and enthusiastic suggestions on how to manipulate likability. We also thank the European Commission and the AlterEgo Project (FP7 Grant #600610) for their financial support.





**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Zhao, Salesse, Marin, Gueugnon and Bardy. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Taking Up an Active Role: Emerging Participation in Early Mother–Infant Interaction during Peekaboo Routines

Iris Nomikou<sup>1,2</sup>\*, Giuseppe Leonardi<sup>2,3</sup>, Alicja Radkowska<sup>4</sup>, Joanna Rączaszek-Leonardi<sup>4</sup> and Katharina J. Rohlfing<sup>2</sup>

*<sup>1</sup> Psychology Department, University of Portsmouth, Portsmouth, United Kingdom, <sup>2</sup> Department of German Studies and Comparative Literature Studies, Paderborn University, Paderborn, Germany, <sup>3</sup> Faculty of Psychology, University of Finance and Management in Warsaw, Warsaw, Poland, <sup>4</sup> Faculty of Psychology, University of Warsaw, Warsaw, Poland*

#### Edited by:

*Hanne De Jaegher, University of the Basque Country (UPV/EHU), Spain*

#### Reviewed by:

*Akira Takada, Kyoto University, Japan; Kaya de Barbaro, University of Texas at Austin, United States*

> \*Correspondence: *Iris Nomikou iris.nomikou@port.ac.uk*

#### Specialty section:

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology*

Received: *30 October 2016* Accepted: *08 September 2017* Published: *10 October 2017*

#### Citation:

*Nomikou I, Leonardi G, Radkowska A, Rączaszek-Leonardi J and Rohlfing KJ (2017) Taking Up an Active Role: Emerging Participation in Early Mother–Infant Interaction during Peekaboo Routines. Front. Psychol. 8:1656. doi: 10.3389/fpsyg.2017.01656*

Dynamical systems approaches to social coordination underscore how participants' local actions give rise to and maintain global interactive patterns and how, in turn, they are also shaped by them. Developmental research can deliver important insights into both processes: (1) the stabilization of ways of interacting, and (2) the gradual shaping of the agentivity of the individuals. In this article we propose that infants' agentivity develops out of participation, i.e., acting a part in an interaction system. To investigate this development this article focuses on the ways in which participation in routinized episodes may shape infants' agentivity in social events. In contrast to existing research addressing more advanced forms of participating in social routines, our goal was to assess infants' early participation as evidence of infants' agentivity. In our study, 19 Polish mother–infant dyads were filmed playing peekaboo when the infants were 4 and 6 months of age. We operationalized infants' participation in the peekaboo in terms of their use of various behaviors across modalities during specific phases of the game: We included smiles, vocalizations, and attempts to cover and uncover themselves or their mothers. We hypothesized that infants and mothers would participate actively in the routine by regulating their behavior so as to adhere to the routine format. Furthermore, we hypothesized that infants who experienced more scaffolding would be able to adopt a more active role in the routine. We operationalized scaffolding as mothers' use of specific peekaboo structures that allowed infants to anticipate when it was their turn to act.
Results suggested that infants as young as 4 months of age engaged in peekaboo and took up turns in the game, and that their participation increased at 6 months of age. Crucially, our results suggest that infants' behavior was organized by the global structure of the peekaboo game, because smiles, vocalizations, and attempts to uncover occurred significantly more often during specific phases rather than being evenly distributed across the whole interaction. Furthermore, the way mothers structured the game at 4 months predicted infant participation at both 4 and 6 months of age.

Keywords: mother-infant interaction, social routines, scaffolding, agentivity, coordination

# INTRODUCTION

Social interaction requires the coordination of agents' independent behavior in a manner that is appropriate within a given culture, relevant to a situation, and efficient in a task at hand. Whereas the most important question when thinking about adult interaction seems to be how independent agents come to co-construct a given functional interaction, the focus on the developmental time scale leads us to ask: How do infants become agents in the first place?

Traditional approaches to the development of social skills focus mostly on age-dependent transformations of individual cognitive abilities in children. They view development as a unidirectional trajectory with specific milestones to be achieved on the way toward a particular end point. Viewing agentivity from this perspective positions the process of its development within the infant's mind. These approaches stand in contrast to ecological approaches that focus on continuous individual–environment interactions in which development is bidirectional: Infants not only shape their environment but, at the same time, are also shaped by it. Viewing the development of agency from this perspective means trying to characterize the complex interactional structures in which children are immersed and the transformative role they might possess (Fogel and Thelen, 1987; Reed, 1996). One such approach is the dynamic systems approach, with its notion of reciprocal causality between local and global systems or levels. Reciprocal causality underscores how individual behaviors give rise to and maintain global interactive patterns and how, in turn, they are shaped by them (Riley et al., 2011; Richardson et al., 2014). In the developmental context, one global level seems to play a crucial role in shaping individual skills: the level of structured interaction reenacted for and with the child (Rączaszek-Leonardi et al., 2013; Rohlfing et al., 2016).

Early interactions comprise activities that can be characterized by their high repetitiveness: Repetition of themes (and their modification) occurs not only within single interaction episodes (Stern, 1977; Stern and Gibbon, 1979) but also across multiple interactions in time. In this article, we consider a special form of recurrent interactions, namely social routines. Social routines operate by presenting predictable elements so frequently that the child comes to recognize the structure they constitute. In contrast to coordination through contingent responsiveness to infants' initiatives that are performed locally in a turn-taking manner, social routines facilitate coordination through the predictability of series of caregiver-driven actions as a whole. In well-practiced routines, successive actions follow one another as "moves" distributed between the participants because that particular sequence is given by the format (Snow et al., 1987). The interesting aspect of routines is that it is not crucial for a child to "understand" the individual moves as elements of the routine in order to perform them. For early routines, such as Hello; How are you? Fine, thanks, and you? Fine, Gleason and Weintraub (1976) propose that learning the routine does not require knowing what it means to feel fine. This is because the predictability of the appropriate actions provides an adequate basis for the child to perform correctly: It is more about saying and doing the right things at the right time than about any deeper semantic processing. During the first several runs of a routine, the infant's participation might be limited; but, in time, infants learn their moves as well as the roles involved and adults start to demand participation. In this way, responsibility for some parts of the sequence shifts eventually to the infant (Snow et al., 1987; Heller and Rohlfing, 2017). Thus, social routines provide a context in which to observe the development of coordination of activities.
Social routines also provide a context in which to observe the process of shaping agentivity, because infants are treated as participants from early on (Ochs, 1988; Zukow-Goldring, 1996; De León, 1998; Takada, 2012; Rączaszek-Leonardi et al., 2013; Nomikou et al., 2016). It is within these interactions that infants learn "to coordinate their engagement, that is, to adjust their behavior in response to and in anticipation of each other's actions" (Rossmanith et al., 2014, p. 3). This happens because the modes of interacting with caregivers instill values of agency (Rączaszek-Leonardi and Nomikou, 2015). Thus, the search for the origins of infants' ability to coordinate with others is none other than the quest for the origins of agentivity within interaction, because interindividual relations shape the individual agents on which they depend (De Jaegher and Froese, 2009).

With respect to the global and local structures shaping agentivity mentioned above, social routines are an ideal context in which to observe how a global format of interactional moves, when repeated often enough, shapes the local behavior of the child; that is, how to perform the correct next step in a sequence and how to act her or his part in an interaction. Given the amount of time caregivers and infants spend every day on various kinds of routines, it might be reasonable to assume that they constitute culturally transmitted practices that scaffold the development of agentivity. Our main goals are, therefore, (1) to document the active role infants take so that their actions fit the routine format, (2) to characterize the properties of such routinized interactions that seem to facilitate emergent agentivity of an infant, and (3) to identify whether early in their development infants are engaging in the routine as a whole (orienting toward its global structure) rather than reacting to individual elements of it (acting at a local level).

In this article, we focus on peekaboo play (see also Bruner and Sherwood, 1976) as a restricted "action format" (Ratner and Bruner, 1978; Bruner, 1983) involving a limited number of elements (Ratner and Bruner, 1978), which makes the game easy to repeat. Through repetition, there is a "clear-cut task structure [that] permits a high degree of prediction of the order of events" (Ratner and Bruner, 1978, p. 392). Ratner and Bruner (1978) point to the fact that these games have a clearly demarcated and reversible role structure. Thus, due to its interactive nature, the activity of a peekaboo game not only entails a particular temporal order of individual actions ("what to do next") and specific junctures ("when is my turn") but also a particular social organization toward a joint goal: Participants assume certain interactive roles and take responsibility for role-related tasks ("who does what") (Nomikou et al., 2016). The constituents of the game are the hidden person (mother or infant), the device for hiding (cloth or hands), the agent effecting the hiding, and the agent effecting reappearance. Ratner and Bruner (1978) report that the important phatic stages in the game, the presequence and the subsequence, are intended to keep players in contact with each other. According to Bruner and Sherwood (1976), there is a basic "syntax" of necessary constituents: contact–disappearance–appearance–contact. Taken together, the games comprise a global structure in the form of an interaction protocol that can be negotiated between the participants when targeting a joint goal (Rohlfing et al., 2016).

Such early games have been reported to be played when infants are around 2–3 months of age (e.g., Fantasia et al., 2014). They have been characterized as fundamental, allowing the nature of early communication to be explored (Bates, 1979; Bruner, 1983). Fernald and O'Neill (1993) report that during peekaboo, infants show pleasure when they can predict the next step in the actions. However, existing literature has described infants' participation in peekaboo in terms of their ability to change semantic elements: These are, for example, the appearance or disappearance in the sequence (Bruner and Sherwood, 1976; Ratner and Bruner, 1978; Bruner, 1983). These studies showed that, in time, infants understood the semantics of these elements of the game and could vary, for example, who disappears (caregiver, child, or object), how the disappearance is carried out (behind the palms of hands, a cloth, or a chair), or where the reappearance will take place (e.g., same side or different). Other studies have described infants' participation as the production of consistent, speech-like phonological forms in specific phases of early games (e.g., Ratner and Bruner, 1978; Hsu et al., 2014). This is due to the fact that Bruner's and others' original work on peekaboo explicitly related it to language acquisition. The idea is that within such a constrained rule-like interaction format, infants learn to use conventionalized behaviors; that is, not any kind of vocalization but a particular one, and this resembles what happens in language acquisition. Because of the relation to language development, most studies on peekaboo have focused on infants' development of vocalizations within peekaboo routines, investigating infant behavior in the second half of their first year and their second year of life (e.g., Bruner and Sherwood, 1976; Rome-Flanders and Cronk, 1995; Hsu et al., 2014). 
While taking these behaviors into account convincingly relates early games to later language development (see also Snow et al., 1987; Rome-Flanders and Cronk, 1995), they represent quite advanced forms of participating in a social routine. Ignoring more basic behaviors in research on early games makes infants from birth to 7 months appear passive (Parrot and Gleitman, 1989; Rochat et al., 1999). Clearly, there is a need to develop measures allowing us to assess infants' early participation.

Early participation has also been investigated in experimental setups that manipulated the structure of the peekaboo game. For example, Parrot and Gleitman (1989) investigated 2-, 6-, 7-, and 8-month-old infants' smile, laughter, and eyebrow movements, and Rochat et al. (1999) investigated 2-, 4-, and 6-month-old infants' use of gaze and smile. In both studies, the infants used these modalities when their expectations about the game were violated and/or confirmed. Yet, due to the scripted nonresponsive nature of their design, it could be argued that these studies put the infant in a spectator stance (Reddy and Uithol, 2016) in which their participation, although perhaps to some extent observable, was not really demanded. A step away from these controlled observations was taken by Fantasia et al. (2014) who used a semi-experimental setting. The authors investigated infants' participation and expectations in familiar early play routines and in violated forms thereof (no sound or no gesture). Infants as young as 3 months showed overall decreased participation (less smiling, laughing, and body movement) and more stunned face expressions in altered play in comparison to the known play routine. The above studies are interesting, because they show that although infants may not use verbal modalities earlier in development, they are already capable of selecting behaviors from a repertoire of other resources such as smile, body movement, or gaze. Another study addressing the shortcomings of experimental manipulation was carried out by Szufnarowska and Rohlfing (2014). They filmed mothers playing peekaboo with their very young infants in a more natural setting. They found that 2-month-old infants engaged in the activity by smiling back at their mother after she reappeared. The interesting finding from this analysis was that it took more than one repetition of a peekaboo round for the infants to show this response. 
This underscores the importance of the repeatability of the interaction patterns. Furthermore, for the mother, the smile had an important motivational effect, supporting her in continuing the game. Interestingly, the analyses revealed that infant smiles were, to a large degree, embedded in episodes of mutual gaze. The value of sustaining attention for social interaction with older children has recently been recognized by Yu and Smith (2016). It seems, however, that an interaction with young infants can already benefit from this: In dyads that managed to establish mutual gaze, a smile initiated a series of turns (Szufnarowska and Rohlfing, 2014). These insights clearly speak in favor of the interactive nature of early games in which both participants need to engage. However, in the current literature, the circumstances under which infants gain a grasp of the structure of the peekaboo game are still nebulous. As already mentioned above, existing studies focus on advanced forms of participation, use experimental designs that do not really capture infants' participation in naturalistic environments, and, finally, those few studies that do investigate more natural early interactions have not yet provided a developmental account of early, initial forms of participation. To sum up, although existing results may increasingly lend support to the idea that the global structure is built up, the question how infants become capable of maintaining it remains unanswered.

Pursuing the question how infants acquire the global structure of a game, Bruner (1983; see also Ratner and Bruner, 1978) focused on the role of caregivers adjusting to the child's developing sensory and motor abilities and the way this allows a more vivid engagement in and control of an interaction. The argument behind the focus on the role of caregivers is that caregiver scaffolding behavior operates on different timescales: On a short-term timescale, it provides structure to the ongoing interaction. On a long-term timescale, recurring instances or features of the provided structure lead to the emergence and stabilization of interaction frames that shape current and later development (Nomikou, 2015). This is because development is shaped by cumulative experience (Hsu and Fogel, 2003; Fogel et al., 2006), suggesting that variability in the way in which caregivers act on early interactions will be reflected in the later behavior of the infant. This assumption has its roots in socio-cultural theory and (among others) the work of Vygotsky (1978) who suggested that parent–child interaction characterizes development prospectively and is consistent with studies suggesting that different qualities of interactions will lead to different developmental outcomes (e.g., Keller and Gauda, 1987; Bornstein and Tamis-LeMonda, 1997). Bruner and Sherwood (1976) emphasize the caregivers' role in teaching infants the global structure that will result in their more active participation.

Some evidence on the relationship between routine structures and infant participation comes from the work of Ross and Lollis (1987) who found that 9-month-old infants reveal knowledge of the content of a routine and both their roles and those of their partners by taking their turns at appropriate times and by repeating that role during interruptions of the routine. They suggest that understanding aspects of the structure of games may precede the ability or desire to assume certain roles. This, we argue, might underestimate younger infants' abilities to participate. Yet it does provide an interesting approach for looking into early participation and recognition of the global structure of routines by focusing on the individual steps of the peekaboo game and how the infants fit their behavior into these. In concert with the evidence suggesting that sequential structure affects early participation in interactions (Fantasia et al., 2015), it seems plausible that mothers who create more opportunities for their infants to take up their turn will have infants who participate more actively than other infants.

In sum, there is a need for studies that focus on infants younger than 6 months and their communicative means if we are to understand the basis for their increasingly active participation. In line with research on early interactional participation (Rączaszek-Leonardi et al., 2013; Reddy et al., 2013; Fantasia et al., 2014), we do not agree with the statement that infants are "too young to take an active part in... peekaboo" (Rome-Flanders and Cronk, 1995, p. 343). Instead, we argue that interactional behavior in general (i.e., knowing what to do next, the awareness of the interactive role, and how to distribute the work in order to reach a joint goal; see Rohlfing et al., 2016) is a prerequisite of understanding the global structure of the game and the driving force in social coordination. "Many of the forms that later occur in practical situations make their first appearance in the safe confines of structured games" (Ratner and Bruner, 1978, p. 401). Hence, in the present study, we were interested in the development of infants' early participation in a social game and we focused on 4- and 6-month-old infants. More specifically, we were interested in their emerging participation in the routine, manifested in their attempts to take an active role at specific phases of the game as well as the use of social signals within the structured interaction.
Given the simple recurrent structure of the peekaboo format, and the fact that previous research has already documented infants' sensitivity to perturbations in the sequence of the actions in the game (e.g., Rochat et al., 1999; Fantasia et al., 2014; Hsu et al., 2014), we hypothesized that infants would attempt to take up an active role at key points in the activity: The use of their behaviors at specific parts of the activity would evidence their sensitivity to the local structure of the game; and their use of different modalities at different parts of the game would evidence their more global recognition of the routine and the role-related tasks. Also, we hypothesized that this participation would increase longitudinally. Furthermore, we predicted that infants' participation would be moderated by the properties of mothers' scaffolding. With scaffolding, we refer to the mothers' way of structuring the activity. This was assessed mostly by the frequency of using specific game phases, although we also explored the duration of the phases as a further possible variable. More specifically, we hypothesized that the way in which mothers structure the activity (e.g., the game phases they use) would relate to the active role that the infants take up in the game. Finally, assuming the cumulative nature of development, we hypothesized that the scaffolding at an earlier age would predict infants' participation at a later time point.

# METHODS

# Participants

The data for the present analysis came from a sample of 20 Polish mother–infant dyads (see Szufnarowska and Rohlfing, 2014). We coded interactions of 19 dyads (11 boys and 8 girls) for this study. The data for one dyad was not available for both time points and could not be analyzed. Infants were 4 months old during the first visit (M = 126 days, SD = 8.79) and 6 months old during the second one (M = 186 days, SD = 9.63). Participants were recruited in the maternity ward of a hospital in Warsaw.

# Procedure

Data were collected in the families' homes. Mothers and infants were filmed at a temporal resolution of 25 fps with three HD cameras positioned on mountings (see **Figure 1**). Mothers were asked to place their infant on a table in a supine position and stand in front of her or him. A supine baby position has been shown to enhance mutual gaze (Fogel et al., 1993). The first camera was placed opposite where the mother was standing, filming from below and arranged to capture the mother's face and upper body. The second one was positioned behind the mother, more to one side and registering the infant's face and body from a higher position over her back. The third camera was located laterally on one of the sides, capturing the participants in profile and giving an image of the whole scene (see **Figure 1**). Sound was recorded through built-in microphones.

FIGURE 1 | Camera setup.

The cameras were set up at the beginning of the session. Mothers were asked to play with their infants as they normally do for 3 min, and subsequently to play peekaboo for as long as they wished. The aim of the free play was to familiarize the dyad with the new situation and especially with the cameras. After 3 min, the experimenter reentered the room, asked the mothers to play peekaboo, and left the room once again so as not to distract the dyad. Peekaboo is a social game known and played by Polish mothers (the main phrase "Peekaboo!" translates as "A-ku-ku!"). The mothers were told to play peekaboo any way they wanted to (see Szufnarowska and Rohlfing, 2014). This is a difference between the current study (Szufnarowska and Rohlfing, 2014) and previous studies in which parents were asked to play a rather strict form of peekaboo games (Rochat et al., 1999; Bigelow and Rochat, 2006). When the dyads finished, the mother called the experimenter back into the room.

# Data Analysis and Coding

We initially familiarized ourselves with the data through repeatedly viewing the videos and collecting single cases that we described qualitatively. This led to the development of coding categories that we then applied to the entire data corpus. To address our questions, we needed to focus on the structure of the peekaboo game and the ways in which (or the resources with which) the infants participated in the peekaboo game.

### Peekaboo Structure

As already mentioned in the introduction, the constituents of the game are the person hidden (mother or infant), the device for hiding (cloth or hands), the agent effecting the hiding, and the agent effecting reappearance. Ratner and Bruner (1978) and Bruner and Sherwood (1976) provide details on the structure of the peekaboo game that we used as an initial guide when viewing the data.

**Figure 2** presents the opening sequence of an interaction with a 4-month-old infant. At the beginning of the sequence, the infant is looking toward the side. The mother looms over the infant, touches him, and the infant turns his gaze toward the mother. It is only then that the mother lifts the cloth to cover herself. After uncovering her face, mother and infant resume contact with each other through mutual gaze.

We observed that some caregivers did not allow for variation (initial contact and reestablishment of contact), whereas others allowed for variation of this structure in, for example, the way they carried out the covering and uncovering of the infant.

**Figure 3** illustrates three consecutive appearances of the mother. Each time the mother varies the location from which her face reappears. Furthermore, variability could be introduced into the game by varying the duration of uncover from very fast and unexpected to very slow and extended as in the two following examples.

In **Figure 4**, the duration of the uncover phase is around 0.3 s. The mother drops the cloth, looming over the infant to reveal her face. A different case is illustrated in **Figure 5**. In this case, the mother has covered the infant with the cloth and is stretching the uncovering action, slowly pulling the cloth off the infant's face. Here, the uncover phase lasts more than 3 s.
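Because the videos were recorded at 25 fps, phase durations such as those above can be read off frame-level annotations. The sketch below illustrates the arithmetic only. That phases were annotated as start and end frame indices is an assumption, and the function name and frame numbers are hypothetical.

```python
FPS = 25  # recording rate reported in the Procedure section


def phase_duration_s(start_frame, end_frame, fps=FPS):
    """Duration in seconds of a phase running from start_frame up to end_frame."""
    if end_frame < start_frame:
        raise ValueError("end_frame must not precede start_frame")
    return (end_frame - start_frame) / fps


# Hypothetical annotations: a fast uncover (8 frames) and a slow one (80 frames).
fast_uncover = phase_duration_s(100, 108)
slow_uncover = phase_duration_s(100, 180)
print(fast_uncover, slow_uncover)
```

At this frame rate one frame corresponds to 0.04 s, so a phase of around 0.3 s spans only seven or eight frames, whereas a phase of more than 3 s spans more than 75.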

FIGURE 2 | Basic "syntax" of the peekaboo game.

Further analysis revealed variability in the way the dyads structured the peekaboo. There were cases in which the main constituents were connected with each other through pauses (e.g., at the transition from hiding to reappearance); whereas there were other cases in which this was omitted. We named these intervals "waiting," in the sense that the mothers were waiting at transition points for the infants to take action, creating slots for infants to take their turn. Yet, mothers actively used these sequences in various ways, so as to engage the infant while their face was invisible to her or him as in the following examples.

In **Figure 6**, the mother accompanies the entire waiting phase with her verbal behavior, pretending she is looking for the infant because she is hidden by the cloth (see transcript below).

P02; 4 months old (01:44–01:46)

1 M: Nie ma nie ma nie ma Asi

There's No There's No There's No Asia

snapshots of the video presented. Arrows indicate the exact moment in time when the snapshots were taken.

In another case, the mother is holding up the cloth like a barrier/curtain between herself and the infant and she moves the cloth from left to right for the entire duration of the waiting phase, sustaining the infant's attention to the location of the mother's face while this is being hidden by the cloth (see **Figure 7**).

A further observation was that mothers sometimes clearly marked upcoming phases of the peekaboo in both their actions and their verbal behavior. In the example below, the mother has unfolded the cloth and is holding it on the infant's body. As illustrated in **Figure 8**, the mother lifts the cloth to an intermediate position and stops there. She accompanies the lifting movement by saying "Uwaga," which can be translated as "attention," thus setting the stage for the next action of covering her face with the cloth. In other similar cases, the mother rearranged the infant's body or the cloth in her hands while asking for the infant's acknowledgment to continue by saying "Jeszcze raz?" (one more time?), or by explicitly announcing the next action by saying, for example, "Teraz mama zniknie co?" (now mummy will be gone, hm?).

We called these types of sequences "preparation" phases, because they mark the upcoming phases of the game and potentially help infants anticipate them. Finally, some dyads inserted other sequences, such as tickling games, between phases. These differences in structure gave the impression of some peekaboo games being very fast and tightly structured, whereas others were more playful and loose. To account for these differences in the ways peekaboo games were structured, we extended the structure proposed by Ratner and Bruner (1978) and Bruner and Sherwood (1976) (see **Table 1**).

**Figure 9** exemplifies different potential structures of peekaboo rounds and how different round structures could provide different opportunities for the infant to participate by fitting her or his behavior to the format of the game. The first round at the top of **Figure 9** is a minimal round containing only the basic phases of peekaboo (as in Bruner and Sherwood, 1976). The orange line represents the junctures after which the next phase follows. There are two slots in this minimal round. This means there are two opportunities in which the infant could become active: either after the cover phase, in which the infant would uncover, or after the uncover phase, in which the infant could initiate the acknowledgment. The rounds represented in the middle and bottom of the figure are examples of more varied peekaboo rounds. The one in the middle includes a waiting phase after the first juncture. In this case, the addition of this extra phase might prolong the time available for the infant to take an active turn, thus providing more opportunity for her or him. Finally, the round illustrated at the bottom contains an optional phase both before the cover and after it. By embedding optional phases before and after the basic constituents of the game, the structure provides more opportunities to participate. It becomes clear that through the inclusion or omission of phases, many variations of the game are possible, both within a specific interaction and across multiple interactions and participants.

Having defined the above types of peekaboo phases, we coded the entire data corpus using frame-to-frame coding of onset and offset of events with ELAN transcription software (Wittenburg et al., 2006). The structure of the peekaboo game was coded in terms of the phases of a single peekaboo sequence—which we call a round of peekaboo—in which one of the participants was covered (mother or infant). The phases of a peekaboo sequence were coded continuously in time and were mutually exclusive. The end of one phase is the beginning of the next. Initially we distinguished between full and short rounds, with a short round lacking the acknowledgment phase.
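The coding constraints described above (phases coded continuously, mutually exclusive, each phase ending where the next begins, and short rounds lacking the acknowledgment phase) can be illustrated with a minimal sketch. The tuple layout, the phase labels, the function names, and the timings are hypothetical and are not taken from the ELAN annotation files.

```python
from typing import List, Tuple

# Hypothetical layout for one coded phase: (label, onset_ms, offset_ms).
Phase = Tuple[str, int, int]


def is_contiguous(phases: List[Phase]) -> bool:
    """Check that every phase has positive duration and that each phase
    starts exactly where the previous one ends (no gaps, no overlaps)."""
    if any(offset <= onset for _, onset, offset in phases):
        return False
    return all(prev[2] == nxt[1] for prev, nxt in zip(phases, phases[1:]))


def round_type(phases: List[Phase]) -> str:
    """Label a round 'full' if it contains an acknowledgment phase,
    otherwise 'short'."""
    labels = [label for label, _, _ in phases]
    return "full" if "acknowledgment" in labels else "short"


# Made-up timings for two rounds, for illustration only.
full_round = [
    ("cover", 0, 500),
    ("waiting", 500, 2100),
    ("uncover", 2100, 2600),
    ("acknowledgment", 2600, 5400),
]
short_round = [("cover", 0, 450), ("uncover", 450, 900)]

print(is_contiguous(full_round), round_type(full_round))
print(is_contiguous(short_round), round_type(short_round))
```

A check of this kind would flag any round in which two phases overlap or a stretch of time is left uncoded.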

The total time of analyzed video material at 4 months was 73 min; at 6 months, 62 min. The average duration of the video recordings at 4 months was 3:49 min (SD = 2:30 min). The shortest recorded session was 1:30 min and the longest was 11:57 min. At 6 months, the average duration was 3:14 min (SD = 1:17 min). The shortest recorded session was 0:51 min and the longest was 6:40 min.

The total number of rounds played in the Peekaboo game was 925 (448 at 4 months, and 477 at 6 months). The average number of rounds played by the dyads at 4 months was 23.58 (SD = 14.36, min = 6, max = 62) and at 6 months it was 25.11 (SD = 12.84, min = 8, max = 64). **Figure 10** uses a scatterplot to summarize the above data relative to the session duration and number of rounds played by each dyad.

The distribution of the rounds played per dyad (see **Figure 10**) also illustrates the degree of variability in the way the peekaboo game was structured. Some dyads played the game for a short length of time, whereas others extended the game over longer periods. Also, some dyads played more rounds within a comparable amount of time than others, suggesting that rounds were sometimes performed very quickly and other times at a slower tempo.

FIGURE 8 | Detail from ELAN transcript. Highlighted in blue is a preparation interval. The letters (A,B) refer to the snapshots of the video presented. The arrows indicate the exact moment in time when the snapshots were taken.
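To give a rough sense of tempo, the aggregate totals reported above (448 rounds in 73 min at 4 months, 477 rounds in 62 min at 6 months) can be converted into rounds per minute. This is only a sketch: the function names are hypothetical, and the totals include any session time spent outside peekaboo rounds, so the values approximate overall pace rather than the duration of individual rounds.

```python
def rounds_per_minute(n_rounds: int, minutes: float) -> float:
    """Average number of rounds played per minute of recorded video."""
    return n_rounds / minutes


def seconds_per_round(n_rounds: int, minutes: float) -> float:
    """Average recorded time per round, in seconds."""
    return minutes * 60 / n_rounds


# Totals as reported in the text: (number of rounds, minutes of video).
totals = {"4 months": (448, 73), "6 months": (477, 62)}

for age, (n_rounds, minutes) in totals.items():
    print(
        age,
        round(rounds_per_minute(n_rounds, minutes), 2),
        round(seconds_per_round(n_rounds, minutes), 1),
    )
# 4 months 6.14 9.8
# 6 months 7.69 7.8
```

The same two functions applied per dyad, rather than to the totals, would express the spread visible in the scatterplot as a single tempo value per session.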

## Infant Behavior

Having described the structure of the game, the next step was to observe the ways in which infants participated in it, showing awareness of the game structure by taking an active role and behaving appropriately in the various phases of the game. More specifically, after the covering phase, the infant is required to uncover (either her/himself or the mother); whereas after the uncovering phase, the infant is required to initiate a new cover. Hence, different phases of the game require different actions from the infant. Such a behavior is illustrated in **Figure 11**: In **Figure 11A**, the mother has positioned the cloth on the infant's face and releases it from her hands. **Figures 11B,C** show how the infant then grasps the cloth and manages to pull it downward, partially uncovering her face.

A very common observation was that infants often attempted to grasp and pull the cloth, but did not succeed in uncovering themselves on their own. In the case illustrated in **Figure 12**, we can see the infant's attempt beginning in the waiting phase. In **Figure 12A** the infant is moving her hands toward the cloth, embracing it with open palms while the mother has her hands right on the cloth but is not acting on it in any way. In **Figure 12B**, which is toward the end of the waiting phase, the infant has grasped the cloth. The mother synchronously grasps the cloth, preparing to pull it. In **Figure 12C**, the mother and infant together pull the cloth, the mother carefully supporting the infant's downward movement.

Another attempt is illustrated in **Figure 13**. This time, the infant attempts to uncover the mother's face. In **Figure 13A**, the mother leans forward to enable the infant to uncover her. As a response to the mother's looming motion, the infant stretches her arms and touches her mother's hand (**Figure 13B**). In **Figure 13C**, the infant reaches over the cloth to grasp it. At the same time, we can see the mother already starting to lift her head to uncover herself, assisting the uncover. In **Figure 13D**, the mother, while still holding the cloth, lowers her hand supporting the infant's downward movement and continuing to lift her head upward, she reveals her face.

Another infant behavior indicating an active role in the routine is the attempt to initiate or effectuate a new cover. In **Figure 14**, a 6-month-old infant pulls the cloth over his head. In this sequence, we can once more observe the fine scaffolding of the mother enabling the infant to succeed in the cover. In **Figure 14A**, the infant extends his arms holding the cloth and he starts moving them backward. The mother facilitates this action by lifting the back side of the cloth (**Figure 14B**), and following the infant's lead, keeps the cloth raised until the infant rests his arms (and cloth) behind his head (**Figure 14D**). These observations led to the decision to include infants' attempts to grasp and move the cloth in the right direction in our analysis. An attempt both to uncover and to cover signals participation in the game, even when it is not successful.

FIGURE 12 | Detail from ELAN transcript. The green box marks the waiting phase and the red box the uncovering phase of the peekaboo structure. Highlighted in blue is the infant's attempt to uncover. Panels (A–C) refer to the snapshots of the video presented. The arrows indicate the exact moment in time when the snapshots were taken. Data is from a 6-month-old infant.

Participation can also be signaled by other behaviors that the infants can manifest at specific phases. What seems to be particularly relevant is a smile after the uncovering phase (**Figure 15**) reflecting reestablishment of engagement with the caregiver (Bruner and Sherwood, 1976) or expectation (Szufnarowska and Rohlfing, 2014).

Furthermore, an increased level of vocalizations, appearing in the final phases of the game, also reinforces the reestablishment of engagement after reappearance (Ratner and Bruner, 1978). We then considered such behaviors as additional indices of infants' participation in the game.

The second level of coding, thus, involved the infant's actions in a peekaboo round. It includes infants' responses at key junctures of the activity (**Table 2**). Here again, the onset and offset of the various coding categories were coded in ELAN. Coding of infant vocalizations was carried out using PRAAT phonetics transcription software (Boersma and Weenink, 2010) and then imported into ELAN.

# Quantitative Data Analysis

The analytical strategy was (1) to characterize the structure of the peekaboo game, its variability as provided by the mothers, and the variability of infants' behavior quantitatively; (2) to relate infant behaviors to the phases occurring during the game in order to evidence their structuring by the routine; and (3) to use multivariate multiple regression models to check whether the general use and duration of any of the phases of the game was predictive of infant behaviors.

infant's head. Panels (A–D) represent the phases of the infant initiation of cover.

FIGURE 15 | Mother and infant (4 months old) smiling during the acknowledgment phase. Panels (A,B) represent the development of the smile.

More specifically, in the first step, we calculated descriptive statistics to reveal the structure (the sequencing and duration of phases) of the peekaboo game. Mothers had considerable freedom in structuring and timing the game, and although some phases follow one another logically (e.g., uncovering after covering), they could repeat any of them or introduce some variability both in terms of sequencing and in terms of phase duration. For every round of the game, we registered the duration and sequence of the phases used and computed a frequency distribution of these sequences.
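The frequency distribution of phase sequences can be sketched as follows. The single-letter codes (P = Preparation, C = Covering, W = Waiting, U = Uncovering, A = Acknowledgment), the function name, and the four example rounds are assumptions made for illustration and do not reproduce the coded corpus.

```python
from collections import Counter
from typing import Dict, List


def sequence_distribution(rounds: List[List[str]]) -> Dict[str, float]:
    """Return the relative frequency of each phase sequence,
    ordered from most to least frequent."""
    counts = Counter("".join(phases) for phases in rounds)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {seq: n / total for seq, n in counts.most_common()}


# Made-up rounds, each given as its ordered list of phase codes.
example_rounds = [
    ["C", "W", "U", "A"],
    ["P", "C", "W", "U", "A"],
    ["C", "W", "U", "A"],
    ["C", "U"],
]

print(sequence_distribution(example_rounds))
# {'CWUA': 0.5, 'PCWUA': 0.25, 'CU': 0.25}
```

Representing each round as a string makes identical sequences directly comparable, so the distribution reduces to counting equal strings.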

We then turned to infants' behaviors. Every action could occur at different points of the game and more than one time for each phase or round. For every phase in every round, we counted whether a specific behavior occurred at least once, and computed the percentage occurrence of an infant behavior divided by the number of each of the phases. To check whether a specific infant behavior is more likely to occur in a specific phase than in others, which would indicate a recognition of the structure of the game and an infant's active participation, and to check whether this recognition depends on age, we ran a repeated measures ANOVA (one for every behavior analyzed) on this data, using the phase of the game and the age of the infant as within-subjects factors.

We subsequently tested our hypotheses on the relationships among the properties of the reenacted routines and infants' behavior using a multivariate multiple regression model. The number of infant behaviors registered within a session was standardized for each session for every dyad by dividing it by the number of rounds played. The resulting standardized measures constituted the outcome variables of the regression model in which we checked whether they related to the standardized measure of the number of phases used by mothers during the game. The hypothesis was that the relative use of certain phases and their durations might scaffold agentivity. More specifically, the use of the preparation phase might mark the next step in the sequence, allowing infants to anticipate what will happen next. It provides room for an action after the uncover and before the next cover. Moreover, the use of the waiting phase in some way "freezes time." It stops the game until the infant acts by attempting to uncover her or his face. At the same time, it creates a "slot" for her or his behavior to take place, inviting the infant to act. The acknowledgment phase reestablishes contact between mother and infant and is the phase in which the joint pleasure of playing the game is manifested. In this phase, one would expect the infants to participate by using smiles and vocalizations. Finally, the topic change may provide for an extended period of released tension providing room for the infant to initiate the next peekaboo round. In addition to the relative use of the phases, we checked whether the duration of these phases has a scaffolding effect on infants' participation. Thus, the predictor variables considered in the fitted regression models were in one case, the ratios of the phases; and in another, the average duration of the phases. The outcome variables were the ratios of infants' behaviors.

# RESULTS

# Peekaboo Structure

**Table 3** presents the count of full rounds and phases standardized over the total number of rounds. **Figure 16** uses a bar chart to illustrate the sequencing structure of the rounds played by dyads. Being routinized social games, peekaboo games are certainly quite restricted in the possible sequencing or combination of phases used. This shows up in the frequency of the two most used sequences, which account for almost 90% of all recorded sequences (N = 925). In these sequences, together with the three basic phases of the game, Covering (C), Uncovering (U), and Acknowledgment (A), we always found the Waiting (W) phase in between. The only difference between them was the Preparation phase (P) that was either used or not used at the beginning of the game. This distribution did not differ between the two age groups.

Another observation concerns the distinction between basic and optional phases. The use of the basic phases of peekaboo was quite stable around a ratio of 1 (which means one phase per round of a peekaboo; see **Table 3**) and with little variation across dyads. At the same time, we found substantial variation in the optional phases: For example, the values for the preparation phase ranged from a minimum of 0.06 to over 1.46, indicating that while some mothers rarely included a preparation phase in a peekaboo round, other mothers used it more than once within a single round. The same holds for the topic change phase. An interesting observation is that the mothers showed little variation in their use of the waiting phase at 4 months (SD = 0.05), with both the minimum rate and maximum rate close to 1. Yet, at 6 months, there was more variation (SD = 0.13), with some dyads using it in less than 50% of rounds (i.e., often omitting this phase).

To further explore the way mothers shape the structure of the peekaboo routine, we analyzed the durations of the phases. **Figure 17** shows that the covering and uncovering phases were short (lasting around 500 ms) and basically invariant; other phases showed greater variability. The acknowledgment phase, although mothers used it consistently (see **Table 3**), varied in duration across dyads. The range at both ages was quite large, with some mothers spending as little as 1.5 s on acknowledgment and others up to 5 or even 7 s.



TABLE 3 | Ratio of occurrence of phases over peekaboo rounds.

As can be seen in **Figure 17**, a comparison across ages revealed only a minimal difference in the duration of the covering and uncovering phases in the peekaboo games played with 4- and 6-month-old infants. This difference was larger for the other phases. Nevertheless, across the two time points, the average durations of the phase intervals did not differ significantly from each other for any of the phases apart from the duration of the acknowledgment phase [paired-groups t(18) = 2.59, p = 0.019]. This was shorter at 6 months (M = 2.6 s) than at 4 months (M = 3.3 s).
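For readers who wish to see how a paired-groups comparison of this kind is computed, the following is a minimal sketch using only the Python standard library. The function name and the example values are hypothetical, and the difference is taken as the 4-month value minus the 6-month value, which is an assumption consistent with the sign of the reported statistic.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Paired-samples t statistic and its degrees of freedom.

    x and y hold one value per dyad, in the same dyad order.
    Raises ZeroDivisionError if all differences are identical.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    standard_error = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1


# Made-up mean acknowledgment durations (s) for five dyads.
at_4_months = [3.1, 3.6, 2.9, 3.8, 3.3]
at_6_months = [2.7, 2.9, 2.8, 3.0, 2.6]

t_value, df = paired_t(at_4_months, at_6_months)
print(round(t_value, 2), df)
```

With one mean duration per dyad at each age, the degrees of freedom equal the number of dyads minus one, which is why the reported test has 18 degrees of freedom.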

# Infant Behaviors

Next, we focused on the extent to which infants react and participate in the peekaboo game by counting the relevant behaviors recorded during the game. **Figure 18** presents the average number and variance of infants' coded behaviors within each age group. For behaviors such as smiles and vocalizations, we noticed that already at 4 months of age, they were quite numerous and variably distributed in the various dyads. If we look at behaviors determined more specifically by the context of the peekaboo game, Attempts to uncover were frequent both at 4 and 6 months, whereas the Initiations of covering were indeed quite occasional in both age groups. We observed successful Covering and Uncovering even more rarely; the only exception being the successes in uncovering for infants at 6 months of age.

FIGURE 17 | Boxplot of the dyads' averaged duration times (in seconds) for Peekaboo game phases across visits. The black diamonds overlaid on the boxplots indicate the phase mean duration. \**p* < 0.05. The black dots indicate outstanding observations.

Given the very low number of Successes in uncovering and covering behaviors (see **Figure 18**), in all the following analyses, we collapsed Attempts and Successes (to cover and uncover) into the respective categories.

Another observation in these data was a quite systematic increase of activity with age as seen both in absolute values (see **Figure 18**) and in the frequency of coded infant behaviors standardized on the number of rounds played—that is, the ratios (see **Table 4**). We used separate t-tests on the ratios to evaluate our hypothesis that frequency of behaviors would be greater with age. For Attempts to uncover, t(18) = −3.1, p < 0.01, and for Smiles, t(18) = −2.07, p < 0.05, there was a significant age difference, whereas Initiation-of-Cover, t(18) = 0.49, p = 0.68, and Vocalizations, t(18) = 0.39, p = 0.64, did not differ in the two age groups.

Next, we investigated whether infants' behavior related to the particular phases of the peekaboo game. For this purpose, we computed the number of phases enacted by the dyad during the interaction that were accompanied by the various behaviors. For example, if during one interaction, we recorded 10 Preparation phases and the infant smiled in 8 of them, the incidence of Smile during Preparation would be 80%. In this way, we controlled for the variable number of enacted phases in a given interaction. **Table 5** shows these percentages averaged across all the dyads at 4 and 6 months. Given that the incidence of every behavior was computed on the total number of each of the phases enacted during the interaction (which varied across phases and dyads), the total of the cells, either in columns or in rows, does not sum up to 100%.
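The incidence measure described above can be sketched in a few lines. The record layout (one entry per enacted phase, holding the set of behaviors observed in it) and the function name are hypothetical; using a set means that repeated occurrences of a behavior within one phase are counted only once.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical record: (phase label, set of behaviors observed in that phase).
PhaseRecord = Tuple[str, Set[str]]


def incidence(records: List[PhaseRecord], phase: str, behavior: str) -> Optional[float]:
    """Percentage of enacted phases of the given type in which the
    behavior occurred at least once; None if the phase never occurred."""
    observed = [behaviors for label, behaviors in records if label == phase]
    if not observed:
        return None
    hits = sum(1 for behaviors in observed if behavior in behaviors)
    return 100.0 * hits / len(observed)


# The example from the text: 10 Preparation phases, smiles in 8 of them.
session = [("preparation", {"smile"})] * 8 + [("preparation", set())] * 2
session += [("waiting", {"attempt_uncover", "vocalization"})]

print(incidence(session, "preparation", "smile"))  # 80.0
```

Because each behavior is normalized by the count of its own phase type, the percentages for one session need not sum to 100 across phases or across behaviors.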

For each of the selected behaviors in infants (see **Table 5**), we ran a two-way repeated measures ANOVA with Phase (6 levels) and Age (2 levels) as the two within-subjects factors.
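The degrees of freedom of such a fully within-subjects design follow directly from the number of dyads and factor levels. The sketch below assumes 19 dyads observed at both ages (a number inferred here from the reported degrees of freedom, not stated in this section) and no sphericity correction; the function name is hypothetical.

```python
from typing import Dict, Tuple


def rm_anova_df(n_subjects: int, levels_a: int, levels_b: int) -> Dict[str, Tuple[int, int]]:
    """(effect df, error df) for a two-way repeated measures ANOVA
    in which both factors are within-subjects."""
    s = n_subjects - 1
    a = levels_a - 1
    b = levels_b - 1
    return {
        "A": (a, a * s),
        "B": (b, b * s),
        "AxB": (a * b, a * b * s),
    }


# Phase has 6 levels, Age has 2 levels; 19 dyads is an assumption.
print(rm_anova_df(n_subjects=19, levels_a=6, levels_b=2))
# {'A': (5, 90), 'B': (1, 18), 'AxB': (5, 90)}
```

Under these assumptions the Phase effect and the Phase by Age interaction are tested on (5, 90) degrees of freedom and the Age effect on (1, 18).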

For Attempt to uncover, both main effects of Phase, F(5, 90) = 40.18, p < 0.001, and Age, F(1, 18) = 10.95, p < 0.01, as well as their interaction, F(5, 90) = 2.32, p < 0.05, were significant. Post-hoc analyses (with Bonferroni correction) clarified the nature of these effects: Attempts to uncover were concentrated clearly during the Waiting phase (significantly greater percentage than in all other phases; see **Table 5** and **Figure 19**), and the significant interaction effect was due to an increased occurrence of this behavior during the Waiting phase at 6 months, whereas in the other phases, there was no change in the incidence of this behavior between age groups.

TABLE 4 | Ratio of occurrence of infant behaviors over the peekaboo rounds.

Regarding Initiation of Covering, the ANOVA indicated only one statistically significant main effect of Phase, F(5, 90) = 3.55, p < 0.01, but no significant differences in post-hoc comparisons. This was probably due to the choice of a very conservative method for the family-wise control of the alpha level (Bonferroni correction).

We then considered the incidence of Smile in the various phases. The ANOVA indicated a significant main effect of Phase, F(5, 90) = 20.3, p < 0.001, and Age, F(1, 18) = 8.56, p < 0.01, but no significant interaction. The Bonferroni post-hoc analysis clarified that Smiles occurred significantly more often during Acknowledgment, Topic change, and Covering than in the other phases (see **Figure 19**), whereas they occurred significantly less often in Preparation than in Acknowledgment and significantly more often in Preparation than in Uncovering. Moreover, the same pattern was present at a significantly higher level when the infants were 6 months old.

When looking at Vocalizations, the ANOVA revealed a significant main effect of Phase, F(5, 90) = 21.11, p < 0.001, but no main effect of Age or any significant interaction. The post-hoc analysis indicated that the effect derived from a difference between Covering and Uncovering phases and all the others (see **Figure 19**), with a significantly lower incidence of Vocalization in these two phases compared to the remaining four.

Taken together, results indicated that infants' behavior was organized by the structure of the peekaboo game, because Smiles, Vocalizations, and Attempts to uncover were found to occur significantly more often in specific phases than in others.

# The Relation of Maternal Play and Infants' Participation

To test the hypothesis that mothers' structuring of the game would scaffold infant participation, we explored the relation between the extent to which mothers used the various phases of the game (their frequency and their duration) and those infants' behaviors that we identified as indices of participation in the game: Attempts to uncover, Initiation of Covering, Smiles, and Vocalizations.

*If more than one behavior of the same kind occurred within the same phase, only one was counted.*

To control for the varying length of dyads' interactions, we divided the number of times a phase was used by the number of rounds played by the dyad. We entered this ratio, computed for all the phases and age (4 and 6 months), into a multivariate multiple regression model as explanatory variables while using the standardized number of infants' behaviors as the dependent (or outcome) variables. Standardization was achieved, as above, by dividing the occurrence of infants' behavior within a certain interaction by the number of rounds played by the dyad, again to control for the varying number of peekaboo rounds played within an interaction.
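The per-round standardization applied to both predictors and outcomes can be sketched as follows. The function names, dictionary keys, and counts are hypothetical; the sketch only shows how one row of the regression data set (one dyad at one age) could be assembled.

```python
from typing import Dict


def per_round_ratio(count: int, n_rounds: int) -> float:
    """Number of occurrences divided by the number of rounds played."""
    if n_rounds <= 0:
        raise ValueError("a session must contain at least one round")
    return count / n_rounds


def session_row(n_rounds: int, phase_counts: Dict[str, int],
                behavior_counts: Dict[str, int]) -> Dict[str, float]:
    """Build one row of standardized predictors (phase ratios) and
    outcomes (behavior ratios) for a single dyad and session."""
    row = {f"phase_{k}": per_round_ratio(v, n_rounds) for k, v in phase_counts.items()}
    row.update({f"behavior_{k}": per_round_ratio(v, n_rounds) for k, v in behavior_counts.items()})
    return row


# Made-up counts for one session with 25 rounds.
row = session_row(
    n_rounds=25,
    phase_counts={"preparation": 30, "waiting": 24},
    behavior_counts={"attempt_uncover": 10, "smile": 15},
)
print(row)
```

A phase ratio above 1, as for the made-up preparation count here, means that the phase was used more than once per round on average.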

The test for multivariate model comparisons yielded significant results for the Preparation phase, Pillai's V = 0.61, F(4, 27) = 10.56, p < 0.001, and for the Acknowledgment phase, Pillai's V = 0.29, F(4, 27) = 2.80, p < 0.05, suggesting that these two phases significantly explained the variation in infant behaviors. This means that there was indeed an effect of the predictor variables on the combined outcome of infant behaviors taken together (all four outcome variables at once), and that this was due specifically to the frequency with which the Preparation Phase and the Acknowledgment Phase were used during the interaction. In other words, the frequency of these phases was significantly related to infants' behavior overall. When we analyzed the outcome variables separately, to determine which of them carried this effect, the multiple regression models were significant for Attempts to uncover, F(5, 32) = 7.34, p < 0.001, adjusted R² = 0.46; Smiles, F(5, 32) = 2.77, p < 0.05, adjusted R² = 0.19; and Vocalizations, F(5, 32) = 3.84, p < 0.01, adjusted R² = 0.28. This finding suggests that some combinations of the phases of the game were related to these behaviors when considered individually, and the relation was stronger in the case of the Attempts to uncover and Vocalizations, in which the explained variance was greater.
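As a consistency check on the univariate fit statistics, the adjusted R² can be recovered from an overall F statistic and its degrees of freedom. The sketch below assumes ordinary least squares models with an intercept; the function names are hypothetical and this is an illustration, not part of the original analysis.

```python
def r2_from_f(f_value: float, df_model: int, df_resid: int) -> float:
    """R-squared implied by the overall F test of a regression model."""
    return f_value * df_model / (f_value * df_model + df_resid)


def adjusted_r2(r2: float, df_model: int, df_resid: int) -> float:
    """Adjusted R-squared; df_model + df_resid equals n - 1 when the
    model includes an intercept."""
    return 1 - (1 - r2) * (df_model + df_resid) / df_resid


# Overall F statistics reported in the text, all with df = (5, 32).
reported = {"Attempts to uncover": 7.34, "Smiles": 2.77, "Vocalizations": 3.84}

for outcome, f_value in reported.items():
    r2 = r2_from_f(f_value, 5, 32)
    print(outcome, round(adjusted_r2(r2, 5, 32), 2))
# Attempts to uncover 0.46
# Smiles 0.19
# Vocalizations 0.28
```

The recovered values agree with the reported adjusted R² for all three models.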

We then checked which coefficients in these models differed significantly from zero. In the model with Attempts to uncover as the outcome variable, the only significant coefficient was the one for the Preparation phase, b = 0.63, t(32) = 4.28, p < 0.001. Preparation was also the only significant predictor in the model with Smiles as the outcome, b = 0.96, t(32) = 2.55, p < 0.05, whereas both Preparation, b = 2.77, t(32) = 3.82, p < 0.001, and Acknowledgment, b = −5.63, t(32) = −2.25, p < 0.05, attained significance in the model for Vocalizations. This means that the presence of Preparation seemed to relate significantly to an increase in Attempts to uncover, Smiles, and infant Vocalizations. Additionally, the use of Acknowledgment seemed to be associated with an overall decrease in Vocalizations.

This model did not yield any effect of age. To explore more closely the relation between the way in which the game was structured at a given time point and infants' participation at a later time, we fitted a new multivariate regression model using the values for Preparation, Waiting, Acknowledgment, and Topic change phases at 4 months of age as potential predictors of infants' behavior at 6 months of age as outcome variables. According to this analysis, both Preparation, Pillai's V = 0.72, F(4, 11) = 7.19, p < 0.01, and Acknowledgment, Pillai's V = 0.71, F(4, 11) = 6.69, p < 0.01, contributed significantly to the multivariate model. However, individual analyses revealed that only one regression model attained significance, namely, the one with Attempts to uncover as the outcome variable, F(4, 14) = 5.96, p < 0.01, adjusted R² = 0.52. Here, the only significant coefficient was for the Preparation phase, b = 0.31, t(14) = 3.78, p < 0.01. In other words, the use of the Preparation phase at 4 months seemed to be the main variable in maternal behavior predicting the frequency with which Attempts to uncover would be enacted by infants at 6 months.
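For a predictor with a single degree of freedom, Pillai's V and its F statistic are tied by an exact relation, so each reported V can be recovered from the corresponding F and its degrees of freedom. The sketch below rests on that single-degree-of-freedom assumption (each phase ratio entered as one continuous predictor); the function names are hypothetical and this is an illustration, not part of the original analysis.

```python
def pillai_from_f(f_value: float, df_num: int, df_den: int) -> float:
    """Pillai's V implied by F when the tested predictor has one
    degree of freedom, so that F = (V / (1 - V)) * (df_den / df_num)."""
    return f_value * df_num / (f_value * df_num + df_den)


def f_from_pillai(v: float, df_num: int, df_den: int) -> float:
    """Inverse of pillai_from_f under the same assumption."""
    return v / (1 - v) * df_den / df_num


# (F, numerator df, denominator df) as reported in the text.
reported = {
    "Preparation, concurrent model": (10.56, 4, 27),
    "Acknowledgment, concurrent model": (2.80, 4, 27),
    "Preparation, 4 to 6 months": (7.19, 4, 11),
    "Acknowledgment, 4 to 6 months": (6.69, 4, 11),
}

for label, (f_value, df_num, df_den) in reported.items():
    print(label, round(pillai_from_f(f_value, df_num, df_den), 2))
# Preparation, concurrent model 0.61
# Acknowledgment, concurrent model 0.29
# Preparation, 4 to 6 months 0.72
# Acknowledgment, 4 to 6 months 0.71
```

Going in the other direction, from the two-decimal V to F, reproduces the reported F values only up to the rounding of V.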

Another multivariate regression model explored the possible relationship between the duration of the phases used in the game and infant behaviors. The outcome variables in this model were the same as above (i.e., the standardized number of behaviors over the total number of rounds in a session), whereas the predictors were the averaged time durations of all the phases in each dyad and session. In this case, however, the multivariate test for the model comparison did not yield significant results, and this did not justify additional univariate multiple regression analyses on each separate outcome variable. Hence, we did not see any clear relationship between the duration of phases and the frequency of infant behaviors.

# DISCUSSION

The study of human development offers a unique window on the emergence of joint activity formats. By looking at the ways in which infants learn to coordinate their actions in relation to other people we can observe how they come to grasp themselves as agents contributing to a joint goal of an interaction. In many studies, infants' ability to vocalize or make use of conventional means of communication is taken as an indication for their emerging active role in an interaction (Hsu et al., 2014). These include for example the use of vocalizations with specific phonological properties depending on whether the infants are playing games with parents or not. By showing that infants can regulate the types of vocalizations they use depending on interaction context, these studies demonstrate that they have grasped the different interaction structures and how they should behave in them. This, we argue, underestimates infants' early participatory behaviors. By analyzing behaviors which are rather advanced for infants in the first months after birth, existing studies might be making younger infants seem less capable. In our study, we therefore looked for further modalities of early infant participation that could indicate infants' emerging grasp of the structure of social interactions and their role. We examined this by assessing participation in social routines and its development longitudinally. Our goals were (1) to document the active role infants adopt so that their actions fit the routine format, (2) to characterize the properties of such routinized interactions that seem to facilitate emergent agentivity of the infant, and (3) to identify whether early in their development infants are engaging in the routine as a whole (orienting toward its global structure) rather than reacting to individual elements of it (acting at a local level). 
We also studied the caregivers' ways of shaping this joint activity as predictors of the development of active participation and possible origins of agentivity rather than focusing on the individual mental machinery that makes it possible. As the context for our investigation, we chose a repetitive action format that, as a simple rule-governed activity, enables infants to develop expectancies about the interaction and to display their participation early on. More specifically, we observed mothers and their infants playing peekaboo when the infants were 4 and 6 months old. Guided by existing research on early intersubjectivity and engagement (Markova and Legerstee, 2008; Reddy et al., 2013; Fantasia et al., 2014), we hypothesized that even young infants at the age of 4 months will attempt to take up an active turn in the key phases of the game, and that this will become evident in the modalities they use during particular phases of the game. Extending this research we provided longitudinal comparisons of early routine interactions and expected that this participation would increase with age (Ratner and Bruner, 1978) and that the mother will play a crucial role in scaffolding infants' active participation (Vygotsky, 1978). We therefore explored the relationship between the ways in which mothers structured the peekaboo game and infants' multimodal participation.

The initial exploration of the data provided insights into the multiple ways in which mothers structured the game, varying the elements of the game, their frequency, and their duration. Moreover, infants participated in the game by employing multiple resources: They attempted and succeeded in covering and uncovering themselves or the mother, they smiled and showed excitement through body movement, and they also used vocalizations. From these initial qualitative observations, we developed a coding scheme and operationalized infants' participation as the use of certain behaviors at specific phases of the game: The behaviors included were their attempts to and successes in covering and uncovering themselves or the mother and the use of smiles and vocalizations.

Regarding mothers' structuring of the peekaboo, we found little variation across dyads in the use of the obligatory phases of the game (Cover, Uncover, and Acknowledgment). At the same time, we found substantial variation in some of the optional phases of the game (Preparation and Topic Change). Comparing the interactions at 4 and 6 months, we found that mothers also showed variation in the use of the Waiting phase, which at 4 months was used consistently but was more likely to be omitted at 6 months by some mothers. However, the durations of the phases did not change significantly from 4 to 6 months (with the exception of the Acknowledgment phase).

Regarding infant behaviors, we found that already at 4 months of age, all but one infant attempted to uncover during at least one peekaboo round. Successful uncovers were scarce (albeit existing). Attempts to uncover occurred significantly more during the Waiting phase. Interestingly, infants did not vocalize and smile equally across phases, but rather during some particular phases, suggesting a selective contribution according to the structure of the interaction. More specifically, smiles occurred mostly during Covering and Acknowledgment. The use of smiling during Covering could indicate some kind of anticipatory behavior. It could be that the infants recognize what is coming next, be it the tactile sensation of being covered by the cloth or the anticipation of the next phase of the game—namely, the uncovering of the face and engagement with the mother. Infants' use of smile during the Acknowledgment phase indicates their participation in what the Acknowledgment phase is for, namely reestablishing the visual connection with the mother and expressing one's enjoyment of playing the game or simply of being together. At this point we cannot disentangle whether the infant is smiling because of the specific phase of the game or due to the fact that she or he is imitating the mother. Our qualitative observation was that the mothers also smiled during this phase, and that their smiles may actually precede those of the infants. Future analyses should therefore focus on uncovering the fine temporal structuring of mothers' and infants' smiles. It could be that earlier in development the mother uses her smile to elicit a smile, which is a phase-proper behavior. At a later age smiles could appear without a smile from the mother (local cue) but because of the game (global structure). For the vocalizations, we found that they occurred less during Covering and Uncovering. This could be due to the duration of these phases, which were quite short. 
Yet, the fact that smiles did occur during these short intervals but vocalizations did not, suggests that the intervals were long enough for some reaction but this reaction was less likely to be a vocalization. This poses the question of whether infants chose to use one modality instead of the other. Moreover, the decreased use of vocalizations in these phases is in itself interesting, because it possibly suggests that infants might choose not to vocalize in transitional phases or phases with increased movement. These first results speak in favor of our hypothesis that infants show active participation in the routine by regulating their behavior according to the structure of the game. Furthermore, they point to the fact that the infant may not be locally reacting to the previous behavior of the mother but is sensitive to the fact that different phases of the game require different behaviors. This, we suggest, could be evidence of a more global understanding of the structure of the game as a whole.

Comparing across the two data points, infants' attempts to uncover and smiles increased significantly, lending support to the hypothesis that infants' participation increases as they become familiar with the rules of the game. Concerning vocalizations, we found no difference between the interactions when the infants were 4 and 6 months old.

More crucially, the mothers' way of organizing the game related to infants' behavior. Here, we found that across ages, the use of the Preparation and the Acknowledgment phase significantly predicted the variance in all infants' behaviors. More specifically, the use of the Preparation phase related positively to infants' smiles and attempts to uncover. Also, the use of the Preparation phase together with the phase of Acknowledgment explained a significant portion of variance in infant vocalizing. In this case, the use of the Preparation phase related positively to infants' vocalizations, whereas Acknowledgment made a negative contribution, indicating that the use of this phase was associated with decreased infant vocalizations.

When analyzing the phases at the infants' age of 4 months as potential predictors of infants' behaviors at 6 months, we found that the frequency of occurrence of the Preparation phase was predictive of infants' attempts to uncover. No other regression model involving other phases and infant behaviors attained significance. Finally, analyses exploring the possible relationship between the duration of the phases used in the game and infant behaviors did not yield any significant results.

The results of the multivariate multiple regression models suggest that the structure of peekaboo can be viewed as a kind of scaffold enabling infants to participate. The preparation phases, for example, included mothers' preparing the setting for the cover such as sorting the cloth in their hands or bringing the cloth in position to cover and stopping there. It was this phase that differentiated the two most frequent sequences of peekaboo. Preparation phases can be thought of as initiations of presequences (Schegloff, 1968, 2007). Filipi (2009, p. 3) proposes that their function is to create the conditions for the "entry" of paired actions and to project further action (Schegloff, 1968, 2007). With respect to the structure of peekaboo, Preparation phases project the covering phases that follow. The evidence presented here suggests that the structuring of activities can foster infants' participation in them. Some further evidence on the relevance of structure has been presented recently by Fantasia et al. (unpublished) who found weaker sequential structuring of early interactions in mothers diagnosed with postpartum depression. Similar to our results with young infants, Hodapp et al. (1984) described mothers' scaffolding behaviors and their effectiveness in early social games for 8- to 14-month-old infants. They reported that in early stages in which the infants had not yet mastered the game, mothers used "attention-getting" and "stage-setting" scaffolds to facilitate play. This behavior has also been observed in other settings such as book reading. Rossmanith et al. (2014) reported on mothers' use of "action arcs," that is, ways of building up tension at key junctures during the book-reading activity such as just before turning the page. Similarly, Zukow-Goldring (1996, p. 220), in what she called "attention-gathering" interactions, presented an account of how caregivers attract infants' attention to subsequently direct it toward the perceptual structure they have selected. This includes preparatory actions such as an "inbreathe" (Zukow-Goldring, 1997, p. 229; Nomikou, 2015; Heller and Rohlfing, 2017) but also behaviors marking completion of actions, goals, or intermediate action steps (Meyer et al., 2011; Nomikou and Rohlfing, 2011). In our qualitative observations, we did find cases corroborating the findings of these studies. Also, the finding that the Preparation phase at 4 months predicted infants' attempts to uncover at 6 months not only provides additional evidence for the scaffolding role of pre-sequences but also links to the role of long-term timescales of recurring interactions and how these might shape both current and later development (Nomikou, 2015). Yet, it is also conceivable that the use of the preparation phases was regulated by the infants' behavior. For example, it could be the case that the mothers chose to use preparation phases to attract infants' attention to the game before covering when infants were losing interest in the game. We would need to further expand our existing qualitative analyses and quantitative measures to investigate this interactional loop in more detail.

For the phase of Acknowledgment, it is possible that this part of the game concerns the management of child's consolidation processes. It complements the function of the Preparation phase that addresses the child's attention and perception. Ratner and Bruner (1978) considered this phase of reestablishing contact in what they called phatic stages of the game to be essential for it to be called a peekaboo game. Our findings suggest that during acknowledgment, mother and infant share the experience of playing the game, confirming to each other that they are involved in it; and this confirmation supports infants' agentivity. Alternatively, the acknowledgment phase might also be important for the emotional exchange to establish social attunement (Markova and Legerstee, 2008; Rossmanith et al., 2014). Nonetheless, our findings suggest a negative relationship between the Acknowledgment phase and infants' vocalizations. This relationship is somewhat puzzling. Looking at the descriptive data on the use of this phase, it is striking to see that although all mothers used this phase very regularly, some mothers used it in a much more exaggerated manner (i.e., used more than one acknowledgment phase within a single peekaboo round). There are two possible explanations of this finding: One possibility could be that these mothers might be potentially trying to elicit a response from their infants. In line with the idea of an interactive loop, it is possible that mothers might be trying to create an experience of social attunement to elicit a vocalization if their infant is not vocalizing very often. Another possibility could be that the exaggerated use of the Acknowledgment phase might cause increased verbal behavior on behalf of the mother. 
In concert with recent work on the development of infants' sensitivity to turn-taking (Gratier et al., 2015) and complementary roles in vocalization (Leonardi et al., 2017) this could lead to infants not vocalizing to avoid overlap with the mother.

Overall, our study not only demonstrated early signs of active participation as possible origins of agentivity in game routines but also attempted to pinpoint some characteristics of the way routines are enacted that might facilitate agentivity. Our findings suggest that the use of some of the phases of the game was indeed related to a higher probability of displaying active behavior on behalf of the infants. For example, the Preparation phase, which is optional to the game itself, turned out to be associated strongly with infants' participation in the game. Likewise, the Acknowledgment phase, which may function on an emotional level, may be used in an attempt to facilitate the way in which interactional practices "draw infants from birth into forms of responsible corporeal engagement" (Takada, 2012; p. 76). When interpreting our findings, we further suggested that infants are probably also playing an active role in regulating the structure provided by their mothers. This speaks to the bidirectionality of the forces shaping interaction. Further research needs to analyze what kind of coordination within which phases of the game is necessary to shape successful game participation.

Most importantly, we can use the research presented here to derive the suggestion that active participation, and thus agentivity, in game routines can be scaffolded by caregivers preparing and acknowledging a peekaboo round. This example contributes to the idea that infants' actions (and development) are embedded in and also generated by the context of caregiving. Expanding from the very constrained setting of peekaboo which we observed, our study could further contribute to understanding how this early form of agentivity could relate to later intersubjective forms such as joint engagement in routines which transcend the here-and-now of the dyad. Although we cannot answer this question directly, our data suggest a continuity account for the ways in which more mature forms of agentivity could develop out of repetitive interactions in which an infant first emerges as an agent in relation to others.

# ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the Ethics committee of the University of Muenster, Germany with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Ethics committee of the University of Muenster.

# AUTHOR CONTRIBUTIONS

IN prepared the initial draft. KR supervised the data acquisition. IN, AR, KR, and JR developed the coding schema. IN and AR coded and supervised the data coding. IN provided microanalytical results while GL together with IN conducted the statistical analyses. All authors analyzed and interpreted the data and wrote the manuscript.

# ACKNOWLEDGMENTS

This work was funded by the NCN-DFG collaborative Beethoven 2119 project EASE (2014/15/G/HS1/04536). We would like to thank our participants, as well as Joanna Szufnarowska for data acquisition and our research assistants (Monique Koke, Janina Werner, Sam Cosper, Bettina Wagner and Urszula Kalinowska-Drozd) for coding the data.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Nomikou, Leonardi, Radkowska, Rączaszek-Leonardi and Rohlfing. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Interpersonal Movement Synchrony Responds to High- and Low-Level Conversational Constraints

Alexandra Paxton<sup>1,2</sup>\* and Rick Dale<sup>3</sup>

*<sup>1</sup> Institute of Cognitive and Brain Sciences, University of California, Berkeley, Berkeley, CA, United States, <sup>2</sup> Berkeley Institute for Data Science, University of California, Berkeley, Berkeley, CA, United States, <sup>3</sup> Cognitive and Information Sciences, University of California, Merced, Merced, CA, United States*

Much work on communication and joint action conceptualizes interaction as a dynamical system. Under this view, dynamic properties of interaction should be shaped by the context in which the interaction is taking place. Here we explore *interpersonal movement coordination* or *synchrony*—the degree to which individuals move in similar ways over time—as one such context-sensitive property. Studies of coordination have typically investigated how these dynamics are influenced by either *high-level constraints* (i.e., slow-changing factors) or *low-level constraints* (i.e., fast-changing factors like movement). Focusing on nonverbal communication behaviors during naturalistic conversation, we analyzed how interacting participants' head movement dynamics were shaped simultaneously by high-level constraints (i.e., conversation type; friendly conversations vs. arguments) and low-level constraints (i.e., perceptual stimuli; non-informative visual stimuli vs. informative visual stimuli). We found that high- and low-level constraints interacted non-additively to affect interpersonal movement dynamics, highlighting the context sensitivity of interaction and supporting the view of joint action as a complex adaptive system.

Keywords: interpersonal coordination, synchrony, joint action, conversation, movement dynamics, cross-recurrence quantification analysis, working memory, dual-task performance

# 1. INTRODUCTION

Human interaction is a complex and dynamic process. From the subtle modulation of speech to the dynamic displacement of the body in posture or gesture, humans must fluidly organize behavior in time across multiple modalities to interact effectively with one another. Contributing to the ongoing debate about the underlying mechanisms of interpersonal processes (for reviews, see Brennan et al., 2010; Dale et al., 2013; Barr, 2014; Paxton et al., 2016), we here build on previous work (Paxton et al., 2016) to propose that context is critical for understanding how interaction unfolds. By using advances in wearable technology (Paxton et al., 2015) to manipulate task parameters during an interactive experiment, we explore the influence of context on dynamics of body movement during conversation and turn to a particular theoretical framework to help understand it: dynamical systems theory (DST).

#### Edited by:

*Hanne De Jaegher, University of the Basque Country, Spain*

#### Reviewed by:

*Wolfgang Tschacher, University of Bern, Switzerland; Hannes Matuschek, University of Potsdam, Germany*

#### \*Correspondence:

*Alexandra Paxton paxton.alexandra@gmail.com*

#### Specialty section:

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology*

Received: *05 October 2016* Accepted: *21 June 2017* Published: *28 July 2017*

#### Citation:

*Paxton A and Dale R (2017) Interpersonal Movement Synchrony Responds to High- and Low-Level Conversational Constraints. Front. Psychol. 8:1135. doi: 10.3389/fpsyg.2017.01135*

From biomes to hurricanes, many physical and biological systems are recognized as complex dynamical systems. These systems exhibit what are called emergent properties, that is, characteristic behaviors that emerge not by instructions from some top-down controller but as a function of local interactions among the component parts within given contextual pressures. A famous example of this is the so-called "butterfly effect." This principle suggests that subtle factors in a present context may cascade into larger effects, which themselves serve as a context that constrains ongoing behavior (e.g., Lorenz, 1963). While it began in the physical and mathematical sciences, DST has become a powerful lens for understanding human behavior and cognition as well (Barton, 1994; Mathews et al., 1999).

DST—along with other complexity sciences (cf. Mathews et al., 1999)—provides a conceptual and analytic framework to capture the context-sensitive, soft-assembled, emergent properties of cognitive, behavioral, and affective phenomena. Though its influence is still growing in psychology more broadly, DST principles and analyses have led to novel insights into such phenomena as reading (e.g., Van Orden and Goldinger, 1994), gaze (e.g., Engbert et al., 2005), cultural evolution (e.g., Kenrick et al., 2003), general cognitive function (e.g., Van Orden et al., 2003), and more. DST—and, more specifically, a branch called synergetics (Haken, 1977)—has significantly influenced the understanding of self-organizing principles in cognition (e.g., Haken, 1990; Stadler and Kruse, 1990; Haken and Portugali, 1996).

Increasingly, cognitive scientists interested in social phenomena are recognizing the value of DST to understanding human interaction (e.g., Vallacher et al., 2002; Coleman et al., 2007). Within this area, DST may be uniquely equipped to explore interpersonal coordination—the idea that individuals influence one another's behavior, cognition, and emotion as a result of their interaction. By shifting analysis away from the individual and conceptualizing the dyad as the focus of analysis, we can begin to explore the behavioral, cognitive, and emotional dynamics that emerge from the contextual pressures constraining the dyadic system—like the specific task or type of conversation in which the dyad is engaging.

Interpersonal coordination has been an increasingly influential way to capture interpersonal dynamics over the last few decades (Condon and Sander, 1974). This phenomenon has been studied under a variety of names, like accommodation, alignment, the "chameleon effect," contagion, coordination, coupling, mimicry, synchrony, and synergy<sup>1</sup>. Interestingly, the idea that coordination and other behaviors are adaptive in the DST sense extends to even some of the earliest works in this domain (Sander, 1975).

Within interpersonal coordination research, the interpersonal synergies perspective has perhaps the strongest connection to DST ideas (Riley et al., 2011). Historically, most work on interpersonal coordination has tended to be characterized by what we have called a "more is better" perspective (see Abney et al., 2015). This perspective holds that individuals tend to become more similar over time as a result of their interaction and that this increased similarity tends to be better for a variety of interaction outcomes (e.g., Pickering and Garrod, 2004).

However, the interpersonal synergies perspective posits that interpersonal dynamics are fundamentally shaped by a variety of factors that exert pressure on the interpersonal system. Under this view, interacting participants will not necessarily become uniformly more similar over time. Instead, different contextual factors—like interactants' relationship, goals, physical or perceptual environment, affordances (in the Gibsonian sense; e.g., Gibson et al., 1999) and conversation type—will lead to different configurations of behavioral channels (e.g., Fusaroli and Tylén, 2016).

Inspired by research on DST, we have elsewhere proposed a classification system for different components of an interaction (Paxton et al., 2016), dividing the influences on communication dynamics into top-level and bottom-level systems. Top-level systems function at a lower frequency, change over longer timescales, and tend to have fewer degrees of freedom; bottom-level systems, by contrast, function at a higher frequency, can change over very short timescales, and tend to have more degrees of freedom<sup>2</sup>. Examples of top-level systems would include conversational contexts and interpersonal relationships; bottom-level systems would include body movement or phonetics.

Studies of coordination often focus on only one of these systems at a time—like how coordination influences rapport (e.g., Hove and Risen, 2009) or how perceptual information influences coordination (e.g., Richardson et al., 2007a). In this paper, we explore how simultaneous constraints on both systems influence coordination: high-level contextual constraints (i.e., those affecting the overarching top-level systems) and low-level contextual constraints (i.e., those affecting the rapidly changing bottom-level systems).

Approaching nonverbal social behavior during conversation from the synergies perspective, the present study focuses on how high- and low-level contextual constraints can change interpersonal coordination over time in naturalistic interaction. Specifically, we explored how conversation type (whether argument or a friendly conversation) and perceptual information (either informative or noisy perceptual signals<sup>3</sup>) altered coordination of interacting participants' head movements. We proposed four hypotheses, guided by previous findings.

Keeping with our earlier work (Paxton and Dale, 2013a,b), we use "coordination" as a general term for the idea that individuals affect one another's behavior over time as a result of their interaction. We use "synchrony" as a specific case of coordination: Interacting individuals are synchronized to the extent that they tend to exhibit the same behavior at the same time. Although we do not explore time-locked phase synchrony here (cf. Richardson et al., 2007a), we use time series analyses to quantify whether interacting individuals generally tend to behave similarly in time.

<sup>1</sup> It is outside the scope of the current article to outline the differences in these terms. For more on terminology within this domain, see Paxton and Dale (2013b) and Paxton et al. (2016).

<sup>2</sup>We recognize that this "top" vs. "bottom" categorization is a simplification, as it likely approximates a spectrum of spatial or temporal scales; we nevertheless feel this organizing scheme is useful for emphasizing the differing role of either end of this spectrum.

<sup>3</sup>By "informative" we mean to say simply that participants must attend to the stimulus for a secondary task. We do not mean that the stimulus will be informative for the conversation.

H1: Overall, head movement will be synchronized.

Previous work suggests that interacting individuals' gross body movements (Nagaoka and Komori, 2008; Paxton and Dale, 2013a) and head movements specifically (Ramseyer and Tschacher, 2014; Paxton et al., 2015) become more similar over time as a result of their interaction. Therefore, we expect that we will find that participants' head movement will be synchronized. That is, we anticipate that participants will be more likely to move (or not move) their heads at the same time than not.
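The prediction that participants will "move (or not move) their heads at the same time" can be made concrete with a toy calculation. The following Python sketch is an illustration only and is not the time series analysis used in this study: it assumes each person's head movement has already been reduced to a binary series (1 = moving, 0 = still), and the function names and data values are hypothetical.

```python
def match_rate(a, b):
    """Proportion of samples at which both series are in the same state."""
    if len(a) != len(b) or not a:
        raise ValueError("series must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def lagged_match_rate(a, b, lag):
    """Match rate after offsetting b by `lag` samples relative to a.

    A positive lag compares a[t] with b[t + lag]; a negative lag
    compares a[t] with b[t - |lag|].
    """
    if lag >= 0:
        a_part, b_part = a[:len(a) - lag], b[lag:]
    else:
        a_part, b_part = a[-lag:], b[:len(b) + lag]
    return match_rate(a_part, b_part)


# Toy data (hypothetical): 1 = head moving, 0 = head still.
person_a = [0, 1, 1, 0, 0, 1, 0, 1, 1, 0]
person_b = [0, 1, 1, 0, 1, 1, 0, 1, 0, 0]

print(match_rate(person_a, person_b))  # 0.8
```

A match rate well above what would be expected for unrelated series is the kind of pattern H1 anticipates; the lagged variant shows how the same comparison could be repeated at small time offsets.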

H2: Dynamics of nonverbal communication signals will be sensitive to conversation type (as a high-level contextual constraint).

H2A: Argument—compared to friendly conversation—will decrease head movement synchrony.

Mounting evidence suggests that coordination dynamics are sensitive to high-level contextual perturbations (Miles et al., 2011), including conversation type (Paxton and Dale, 2013a; Abney et al., 2014; Main et al., 2016). Despite some exceptions, for example when analyzing gaze patterns (Paxton et al., under review) and when discussing assigned (rather than personally held) beliefs (Tschacher et al., 2014), conflict has been found to decrease interpersonal synchrony (Paxton and Dale, 2013a; Abney et al., 2014). We therefore expect to find some difference in movement synchrony between the two conversation types (H2); directionally, we expect that argument will decrease synchrony (H2A).

H3: Dynamics of nonverbal communication signals will be sensitive to perceptual information (as a low-level contextual constraint).

H3A: Changing visual information interpreted as noise—rather than a meaningful signal to be remembered—will increase head movement synchrony.

Low-level contextual constraints, like perceptual information, have been relatively less studied in coordination research. This may have been due to limitations in previous experiment tools: Any perturbations to the dyadic system have had to expose both participants in a naturalistic, face-to-face interaction to the same environmental stimulus. A previous study found that holding a conversation over loud ambient noise, as compared with an otherwise silent room, led to an increase in head movement synchrony (Boker et al., 2002). This supports the idea that interpersonal coordination may serve to boost the "signal" in communication within the "noise" of the environment (Richardson and Dale, 2005; Shockley et al., 2009).

Although the concept of "information" has a variety of meanings within cognitive science, we here simply mean that the signal is imbued by the participant as having relevance to some task. For the present study, this is not a signal that is relevant to the conversation itself but to another memory task. It is contrasted with signals in the environment that are not directly relevant to any task at hand—signals that we may call "noise." Crucially, in the current study, both sets of stimuli are otherwise identical, allowing us to disentangle the effects of the stimulus itself and the information imbued in the signal by the interlocutor.

The current study extends previous work to see whether visual "noise" can serve the same function as auditory noise—boosting synchrony and, possibly, comprehension. We hypothesize that nonverbal communication signals will respond to low-level contextual constraints or perturbations (H3). Directionally, we expect that noise will increase synchrony (H3A).

H4: Dynamics of nonverbal communication signals will be non-additively sensitive to conversation type (as a high-level contextual constraint) and to perceptual information (as a low-level contextual constraint).

While previous studies have focused on the effects of either high-level constraints (e.g., Miles et al., 2011; Paxton and Dale, 2013a; Abney et al., 2014; Main et al., 2016) or low-level constraints (e.g., Boker et al., 2002; Richardson et al., 2007a), we are unaware of any studies to date that have combined the two. We see our work as providing a vital step in the exploration of interaction and coordination under the DST perspective: If communication is a dynamical system, we would expect to see that behavior is context sensitive and does not uniformly react to all constraints (cf. Riley et al., 2011; Paxton et al., 2016). Therefore, we hypothesize that head movement synchrony will be non-additively sensitive to both high- and low-level constraints; however, as the first such study of these simultaneous dynamics (of which we are aware), we do not have a directional hypothesis.

# 2. METHODS

# 2.1. Participants

Forty-two undergraduate students from the University of California, Merced participated as 21 dyads. Dyads were created as participants individually signed up for experiment appointments per their own availability through the online subject pool system. Each participant received course credit in return for participation. By chance, dyads included some pairs of women (n = 9; 43%), some pairs of men (n = 3; 14%), and mixed-gender pairs (n = 9; 43%) according to participants' self-reported gender identities. Participants in 2 dyads reported being acquainted with one another prior to the experiment (10%).

Additional dyads (not included in the counts above) participated but were not analyzed here. Two (2) additional dyads were excluded due to lack of conflict in the argumentative conversation, as we have done in previous work using a similar paradigm (Paxton and Dale, 2013a).

We also experienced technical difficulties with the servers running our data collection program for a number of additional dyads. In order to be included in the present analysis, each participant in the dyad must have had recorded movement data for at least 4.5 min (including the calibration period; see Section 2.3) of each of the two conversations described in Section 2.2. In an additional 21 dyads (not included in the counts above), the server failed to record the minimum 4.5 min of movement data for at least 1 of the 2 participants in at least 1 of the 2 conversations. This occurred because the program used to run the data collection software prioritizes fidelity of the connection to the data collection server above all (see Paxton et al., 2015); any perturbation of that connection causes the program to be terminated. However, until the point of termination, data were continuously and regularly sampled.

For example, assume that participants A and B are participating in the experiment. For the first 3 min, the movements of participants A and B are sampled regularly (according to Section 2.2). At minute 4 of the first conversation, participant A's connection to the server is perturbed, causing the server to disconnect participant A's movement tracker. Participant A's regularly sampled data for the first 3 min are saved, but no further data for participant A are recorded, although the conversation continues as usual. Participant B's tracker, however, remains connected to the server, and after being regularly sampled for the rest of the 8-min conversation, participant B's data are saved. Even if all 8 min of movement data were successfully saved for both participants in the second conversation, this dyad would be excluded from our analysis. Although participant A has an unbroken 3-min movement time series from the first conversation and an unbroken 8-min movement time series from the second conversation, this dyad would not have the minimum 4.5 min of movement data for both participants in both conversations.

Although this prioritization led us to discard a number of dyads due to insufficient data, it also allowed us to ensure that the behavior of the included dyads was continuously and regularly sampled during the experiment, leading to very few missing samples in the included dyads. We chose this cutoff prior to analysis and did not explore other thresholds for inclusion.
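The inclusion rule described above can be written as a short check. The following Python sketch is illustrative only: the `Recording` structure, the dyad labels, and the function name are hypothetical and are not part of the original data collection software. Only the 4.5-min threshold and the worked example of participants A and B come from the text.

```python
from dataclasses import dataclass

# Threshold stated in the text (includes the calibration period).
MIN_MINUTES = 4.5


@dataclass
class Recording:
    dyad: str          # hypothetical dyad identifier
    participant: str   # "A" or "B"
    conversation: int  # 1 or 2
    minutes: float     # length of the unbroken movement series that was saved


def dyad_is_included(recordings, min_minutes=MIN_MINUTES):
    """True only if both participants have enough data in both conversations."""
    saved = {(r.participant, r.conversation): r.minutes for r in recordings}
    required = [(p, c) for p in ("A", "B") for c in (1, 2)]
    # A missing recording counts as 0 min and therefore fails the check.
    return all(saved.get(key, 0.0) >= min_minutes for key in required)


# Worked example from the text: participant A's tracker disconnects
# after 3 min of the first conversation; everything else is complete.
example = [
    Recording("d01", "A", 1, 3.0),
    Recording("d01", "B", 1, 8.0),
    Recording("d01", "A", 2, 8.0),
    Recording("d01", "B", 2, 8.0),
]

print(dyad_is_included(example))  # False
```

The check is deliberately all-or-nothing: a single short or missing series in either conversation excludes the whole dyad, which mirrors the example of participants A and B.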

# 2.2. Materials and Procedure

This study was carried out in accordance with the recommendations of Institutional Review Board of the University of California, Merced, with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Institutional Review Board of the University of California, Merced.

As noted below (see Section 2.2.2), the informed consent process did not give any foreknowledge of the specific phenomenon (i.e., similarity of movement), conversation prompts or topics, nor the hypotheses of the study. Data were collected by research assistants blind to study hypotheses for 20 of the 21 dyads; data for the remaining (1) dyad was collected by the first author. The first author also assisted in data collection for 4 dyads.

#### 2.2.1. Experiment Design

The experiment had one within-dyads and one between-dyads element. Conversation type was a within-dyads condition and was adapted from Paxton and Dale (2013a): Each dyad had one argumentative conversation and one affiliative conversation. Conversation order was counterbalanced (randomly assigned) to prevent order effects. For the between-dyads condition, each dyad was randomly assigned to a "noise" (n = 9; 43%) or a "dual-task" (n = 12; 57%) condition<sup>4</sup>. Both conditions are described in greater detail below.

#### 2.2.2. Data Collection

Upon arriving, participants were separated and led to private (semi-enclosed) areas with desks within the lab. Each was then given a series of questionnaires, including a sociopolitical opinion questionnaire. The opinion questionnaire neutrally inquired about the participant's opinion on a variety of issues (e.g., abortion, death penalty, marriage equality<sup>5</sup> , whether Spanish should be an official U.S. language, whether student loans should be partially forgiven by the U.S. government). The participant responded to each question in a brief, open-ended response area and by indicating opinion strength on a 1 (feel very weakly) to 4 (feel very strongly) scale.

After both participants completed the questionnaires, they were brought together in a small, private space. Participants were seated facing one another in stationary chairs approximately 0.97 m (3.17 feet) apart (measured at the front legs). Participants were told that they would be having "two conversations for about 8 min each" for a study "about how people hold conversations," but no information about the nature or emotional valence of the prompts was given. (If asked, participants were told that they would be given the conversation topic immediately before beginning each conversation.) The experimenter then told the participants to take a few minutes to introduce themselves to one another while the experimenter stepped outside of the room to complete some last-minute paperwork before beginning the experiment.

The experimenter then left the room for approximately 3 min. Unknown to the participants, the experimenter spent this time comparing the two participants' opinion surveys to identify up to 3 topics for which participants (a) wrote the most differing opinions and (b) indicated the strongest opinions. We refer to these as "candidate argumentative topics" below.

After 3 min, the experimenter re-entered the room and gave each participant a Google Glass (Alphabet, Inc.), a piece of wearable technology worn like glasses that features a small quartz screen over the wearer's right eye and an on-board processor on the wearer's right temple. The experimenter then explained the device to the participants, adjusted the Glass (as necessary) to fit each participant, and tested to ensure that each participant could fully see the screen. (For complete fitting procedure, see Paxton et al., 2015.) Participants were reminded that they would be having "a couple of conversations about different topics" and that they would "[be given] the topic for each conversation right before [they] start." They were told that the Google Glass would be "recording information about the conversation," but the nature of the recorded information was not described in detail to avoid drawing participants' attention to their head movements<sup>6</sup>.

<sup>4</sup>A subset of the affiliative conversations across both between-dyads conditions served as a brief proof-of-concept study in an earlier methodological paper (Paxton et al., 2015). Only the affiliative conversation of dyads who were assigned to the affiliative-first conversation order were included in that analysis.

<sup>5</sup>Described to the participants as "gay and lesbian marriage."

Each Google Glass ran the PsyGlass program (Paxton et al., 2015). Once initialized, each participant's PsyGlass program randomly generated a screen color at 1 Hz (i.e., 1 color per s), with a 0.9 probability of generating a blue screen and a 0.1 probability of generating a red screen. (If color<sub>t</sub> were the same as color<sub>t+1</sub>, the screen did not flicker or otherwise indicate that it was refreshing.) PsyGlass also recorded participants' head movements by measuring the three-dimensional accelerometer data (i.e., x, y, z axes, with the origin point set as the position at the time of initialization) at 250 Hz (i.e., 1 sample per 4 ms), which were transmitted to and stored on the experiment server at 4 Hz (i.e., 1 transmission per 250 ms). All code for the program can be downloaded from the GitHub (GitHub, Inc.; http://www.github.com) repository for PsyGlass (http://www.github.com/a-paxton/PsyGlass).
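To make the stimulus schedule concrete, the sketch below simulates the color stream described above (one draw per second, red with probability 0.1). It is an illustration only and is not taken from the PsyGlass source; the function names are hypothetical. The onset counter additionally assumes that a run of consecutive red draws is perceived as a single red screen, since the display did not flicker between identical colors.

```python
import random


def screen_colors(n_seconds, p_red=0.1, seed=None):
    """Draw one screen color per second: red with probability p_red, else blue."""
    rng = random.Random(seed)
    return ["red" if rng.random() < p_red else "blue" for _ in range(n_seconds)]


def count_red_onsets(colors):
    """Count transitions into red (assumption: consecutive reds look like one screen)."""
    onsets = 0
    previous = None
    for color in colors:
        if color == "red" and previous != "red":
            onsets += 1
        previous = color
    return onsets


# An 8-min conversation corresponds to 480 one-second draws.
colors = screen_colors(480, seed=1)
print(len(colors), colors.count("red"), count_red_onsets(colors))
```

With 480 draws and a red probability of 0.1, roughly 48 red draws are expected per conversation, with somewhat fewer distinct onsets.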

#### 2.2.3. Task Condition (Between-Dyads)

Before initializing PsyGlass on each participant's device, both participants were given instructions about their between-dyads condition. All dyads—across both task conditions—were exposed to the same stimuli through PsyGlass, the same red-and-blue screens. The two conditions differed only in the interpretation or significance of the colored screens. In the noise condition, the dyad was told that the flashing screens were a result of a bug in the program and that participants should have conversations as normal. In the dual-task condition, the dyad was told to remember the number of times that the screen turned red while having their conversation and that they would be asked to write that number down after the conversation was finished (similar to the oddball paradigm; Squires et al., 1975). After answering any participant questions, the PsyGlass program was initialized.

#### 2.2.4. Conversation Type (Within-Dyads)

Again, each dyad held two 8-min conversations: one affiliative conversation and one argumentative conversation. In both cases, participants were instructed to stay on the assigned topics or on topics very similar to the assigned topic. After assigning each prompt, the experimenter remained seated behind a computer outside of the participants' immediate peripheral vision, surreptitiously monitoring the conversation.

The affiliative prompt was identical across all dyads, asking them to discuss media that they both enjoyed, find something that they both enjoyed, and talk about why they liked it. The goal of the affiliative prompt was to emphasize similarity and engender rapport between participants.

The argumentative prompt relied on the candidate argumentative topics identified from the opinion surveys. The prompt asked participants to discuss their views on the top-rated candidate topic (again using neutral phrasing) and asked participants to "try to convince one another of [their] opinions." If the conversation stopped altogether or shifted away from being argumentative in nature (e.g., if both participants came to a consensus), the next highest-rated candidate topic was assigned. If the second candidate topic again failed to produce sustained argumentative conversation, the third candidate topic was assigned.

After initializing PsyGlass for the first time, the prompt for the first (randomly assigned) conversation type was given. After 8 min of conversation, the experimenter informed the participants that their conversation was over, and PsyGlass was terminated. Participants then removed their Google Glass and were led to their private desks to complete two brief questionnaires about the conversation (not analyzed here), including—for dual-task condition dyads—the number of times they had seen a red screen. Once both participants had completed the questionnaires, participants were brought back to the joint space and re-fitted with the Google Glass. After ensuring that both participants could again see the entire screen, PsyGlass was initialized, and the remaining prompt was given.

Participants were not given any foreknowledge about the topics or type of conversation before being assigned the relevant prompt. That is, if participants were assigned to have the affiliative conversation first, they had no knowledge that their second conversation would have an argumentative prompt; the same applied if the participants had had the argumentative conversation first.

# 2.3. Data Preparation

Each participant produced one movement time series for each conversation. The time series captured timestamped accelerometer values along x, y, z axes. After applying an anti-aliasing zero-phase fourth-order Butterworth filter, we downsampled the data to 10 Hz, a sampling rate similar to those utilized in our previous movement coordination work (Paxton and Dale, 2013a; Abney et al., 2015; Paxton et al., 2015). We transformed these three-dimensional values into a single value for acceleration at each time point by taking the three-dimensional Euclidean distance of the time series. We then applied a smoothing zero-phase second-order Butterworth filter to the acceleration signal for each participant.
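The reduction from three axes to a single acceleration value can be sketched as follows. This is a simplified illustration that covers only the Euclidean step; the Butterworth filtering and downsampling described above are omitted, and the function name is hypothetical.

```python
import math


def acceleration_magnitude(x, y, z):
    """Collapse per-axis accelerometer samples into one Euclidean magnitude per time point."""
    return [math.sqrt(xi * xi + yi * yi + zi * zi) for xi, yi, zi in zip(x, y, z)]


# Two illustrative samples (values are arbitrary, not study data).
print(acceleration_magnitude([3.0, 1.0], [4.0, 2.0], [0.0, 2.0]))  # [5.0, 3.0]
```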

We then trimmed the movement data to remove the time between the PsyGlass initialization and the beginning of the conversation data. Immediately before beginning each conversation (i.e., after having been given the appropriate prompt), participants were asked to produce a brief bout of high-velocity head movement (nodding and shaking their heads rapidly). This was done under the guise of "initializing the program" but was used as a marker for the beginning of the conversation data.

Because each dyad took 60–120 s to test PsyGlass and hear the conversation prompt, we used derivatives of acceleration to identify the latest moment of intense movement by both participants during that window. We explored both acceleration's first-order (jerk) and second-order (jounce) derivatives to identify possible markers. The cutoff points identified by jerk and jounce were significantly correlated, r = 0.62, t(40) = 4.94, p < 0.0001. However, because jounce produced more conservative (i.e., later) estimates of cutoff times, we used jounce.

<sup>6</sup>Participants were also video- and audio-recorded with separate equipment, but this information is not under consideration in the current paper.

Cutoff times for each conversation were created at the dyad level. For each participant in each dyad, we identified the time of largest jounce in the first 60–120 s of the conversation. We then chose the more conservative (i.e., later) cutoff point of the two participants, which we applied as both participants' cutoff points.
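The cutoff procedure can be sketched as below. This is an illustration under stated assumptions rather than the analysis code from the project repositories: jounce is approximated with a central second difference of the 10 Hz acceleration signal, "largest" is taken to mean largest absolute value, and the search window is assumed to run from the start of the recording to 120 s. All function names are hypothetical.

```python
SAMPLE_RATE_HZ = 10  # sampling rate after downsampling


def jounce(acceleration, sample_rate_hz=SAMPLE_RATE_HZ):
    """Approximate the second derivative of acceleration with central differences."""
    dt = 1.0 / sample_rate_hz
    out = [0.0] * len(acceleration)
    for i in range(1, len(acceleration) - 1):
        out[i] = (acceleration[i + 1] - 2 * acceleration[i] + acceleration[i - 1]) / dt ** 2
    return out


def participant_cutoff_s(acceleration, window_end_s=120, sample_rate_hz=SAMPLE_RATE_HZ):
    """Time (s) of the largest absolute jounce within the assumed search window."""
    j = jounce(acceleration, sample_rate_hz)
    stop = min(int(window_end_s * sample_rate_hz), len(j))
    best = max(range(stop), key=lambda i: abs(j[i]))
    return best / sample_rate_hz


def dyad_cutoff_s(acc_a, acc_b, **kwargs):
    """Use the later (more conservative) of the two participants' cutoffs for both."""
    return max(participant_cutoff_s(acc_a, **kwargs), participant_cutoff_s(acc_b, **kwargs))


# Synthetic example: A's marker movement occurs at 70 s, B's at 90 s.
acc_a = [0.0] * 1300
acc_a[700] = 1.0
acc_b = [0.0] * 1300
acc_b[900] = 1.0
print(dyad_cutoff_s(acc_a, acc_b))  # 90.0
```

Taking the maximum of the two participant-level cutoffs guarantees that neither time series retains pre-conversation marker movement.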

Finally, we truncated both participants' time series in each conversation to the shorter of the two lengths, if they were not already identical (e.g., due to server failure). Because PsyGlass initializes data collection simultaneously for both participants (Paxton et al., 2015), we did not need to time-align the beginning of the conversation.

After applying these cutoffs, conversations had an average of 6.54 min (range = 2.62–9.26 min) of recorded movement data. It is important to emphasize that dyads completed the full experimental conditions even when they did not have complete movement records: Connectivity issues with the experimental server only resulted in a failure to record the movement time series after the perturbation to the data collection server occurred. We find it important to note because if participants had experienced different experimental conditions (e.g., if some had held only a 5-min first conversation while others had held an 8-min first conversation), we could not infer that the intended manipulations (i.e., conversation type and task condition) were the cause of any effects, rather than any of the unintended conditions (e.g., shorter conversations).

# 2.4. Data Analysis

We measured coordination by combining cross-recurrence quantification analysis (CRQA) and growth curve analysis (GCA). This combination allows us to quantify the amount of moment-to-moment coordination occurring between interacting participants, along with longer-scale trends. We describe these techniques briefly below, but a more detailed explanation of the benefits of using CRQA and GCA together can be found in Main et al. (2016)<sup>7</sup>. We then used a linear mixed-effects model to analyze the resulting data.

#### 2.4.1. Cross-Recurrence Quantification Analysis

CRQA is an outgrowth of recurrence quantification analysis (RQA), a nonlinear time series analysis that captures the structure and patterns of states visited by a single dynamical system over time (Eckmann et al., 1987). CRQA extends RQA by capturing the amount to which two different systems covisit similar states in time and has become a staple for analyzing human data from a dynamical systems perspective (e.g., Shockley et al., 2003; Dale and Spivey, 2006; Richardson et al., 2007b; Gorman et al., 2012; Anderson et al., 2013; Fusaroli et al., 2014; Vallacher et al., 2015). Detailed explanations of CRQA and its applications in a variety of settings are available in Marwan et al. (2007), Coco and Dale (2014), and Main et al. (2016).

In our case, CRQA allows us to quantify when two participants moved in similar ways during conversation. Unlike studies of more rhythmic movements (e.g., tapping to a metronome), head movement dynamics during conversation comprise both periodic (e.g., underlying postural sway) and non-periodic (e.g., nodding intermittently during conversation) components, leaving phase-coupling analyses (e.g., Richardson et al., 2007a) less suitable for our current purposes. We chose CRQA as a method that does not assume or require periodicities and that can be more resilient to the noise inherent in a new method (i.e., measuring interpersonal dynamics with head-mounted accelerometers in Google Glass).

Current best practices for continuous CRQA include reconstructing the phase space for each pair of signals using time-delay embedding (Shockley et al., 2003; Riley and Van Orden, 2005) and then calculating recurrent points by identifying the radius size at which overall recurrence rate (RR) of the plot is equal to 5% (cf. Marwan et al., 2007; Konvalinka et al., 2011). More detailed information on phase space reconstruction and embedding is available from March et al. (2005) and Iwanski and Bradley (1998). We follow these best practices to calculate CRQA for each conversation of each dyad<sup>8</sup>. The parameters for each dyad are available in the OSF and GitHub repositories for the project (see Section 3).
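The two steps named above (time-delay embedding, then choosing a radius that yields a target recurrence rate) can be illustrated with a small sketch. It is not the crqa implementation: the function names are hypothetical, the embedding dimension and delay in the example are arbitrary, any rescaling or normalization of the signals is omitted, and the radius is simply taken as the distance quantile that makes at least 5% of point pairs recurrent.

```python
import math


def embed(series, dim, delay):
    """Time-delay embedding: point i is (x[i], x[i + delay], ..., x[i + (dim - 1) * delay])."""
    n_points = len(series) - (dim - 1) * delay
    return [tuple(series[i + k * delay] for k in range(dim)) for i in range(n_points)]


def distance(p, q):
    """Euclidean distance between two embedded points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def radius_for_rate(points_a, points_b, target_rate=0.05):
    """Smallest radius at which at least target_rate of all point pairs are recurrent."""
    distances = sorted(distance(p, q) for p in points_a for q in points_b)
    k = max(1, math.ceil(target_rate * len(distances)))
    return distances[k - 1]


def cross_recurrence_rate(points_a, points_b, radius):
    """Proportion of (A, B) point pairs that fall within the radius of each other."""
    hits = sum(1 for p in points_a for q in points_b if distance(p, q) <= radius)
    return hits / (len(points_a) * len(points_b))


# Synthetic signals standing in for two participants' acceleration series.
signal_a = [math.sin(0.3 * i) for i in range(60)]
signal_b = [math.sin(0.3 * i + 1.0) for i in range(60)]
points_a = embed(signal_a, dim=2, delay=3)
points_b = embed(signal_b, dim=2, delay=3)
radius = radius_for_rate(points_a, points_b, target_rate=0.05)
print(round(cross_recurrence_rate(points_a, points_b, radius), 3))
```

Fixing the recurrence rate rather than the radius puts dyads on a common footing, so that differences between conditions appear in how recurrence is distributed across lags.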

CRQA was implemented in R (R Core Team, 2016) using the crqa library (Coco and Dale, 2014). We obtained the diagonal recurrence profile (DRP) for each conversation of each dyad. The DRP captures how much coordination occurs within a "window" of relative time between participants. Here, we target a window of ±5 s, consistent with previous work on body movement coordination generally (Paxton and Dale, 2013a) and head movement specifically (Ramseyer and Tschacher, 2014). With a sampling rate of 10 Hz, this creates a window of interest of ±50 samples. Intuitively, the DRP can be read much like a cross-correlation profile (Paxton and Dale, 2013a), with some differences. (For more on the differences between DRPs and cross-correlation profiles, see Main et al., 2016.)

Essentially, the DRP allows us to explore similarities in patterns of movement that are independent of absolute time while revealing patterns of relative time. The DRP captures leading and following patterns along with simultaneous movement. In other words, we are able to use DRPs to see, at any given time in the conversation, whether participants are more likely to be moving in similar ways (i.e., higher rate of recurrence or RR) or in dissimilar ways (i.e., lower rate of recurrence or RR).

<sup>7</sup>Although Main et al. (2016) present the categorical case, the same principles apply to the continuous case, which we employ here.

<sup>8</sup>We do recognize that there are open questions about how to best handle phase space reconstruction for RQA that can extend to CRQA, particularly with regard to the choice of embedding dimension (e.g., Marwan et al., 2007). While some previous work suggests that an embedding dimension of 1 (m = 1) is sufficient, we follow current recommendations to determine embedding dimension with false nearest neighbors for each participant in the dyad (Riley and Van Orden, 2005) and selecting the higher embedding dimension of the two (Marwan et al., 2007).

Because both participants will have the same length time series (because of identical sampling rates within the experiment), Participant A and Participant B will both have samples for all time points, t. The DRP compares Participant A's head movement at t with Participant B's head movement at t − 50, ..., t, ..., t + 50. When Participant B's lag relative to t is negative, the DRP captures the degree to which Participant B leads the movement state for Participant A at t; when Participant B's lag is positive, the DRP captures the degree to which Participant A at t leads the movement state for Participant B. When we compare Participant A's movement at t with Participant B's movement at t, the DRP captures the amount to which both participants engaged in movement at the same time. The DRP also captures the reverse, comparing Participant B's head movement at t with Participant A's head movement at t − 50, ..., t, ..., t + 50.
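The lag logic of the DRP can be illustrated with a simplified sketch. It is not the crqa routine used in the analysis: it works directly on scalar series (i.e., without phase space reconstruction), uses a fixed radius, and the function name and synthetic data are hypothetical. For each lag it reports the proportion of time points at which A's state at t is within the radius of B's state at t + lag.

```python
def diagonal_recurrence_profile(a, b, radius, max_lag):
    """Recurrence rate between a[t] and b[t + lag] for each lag in [-max_lag, max_lag]."""
    n = min(len(a), len(b))
    profile = {}
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(a[t], b[t + lag]) for t in range(n) if 0 <= t + lag < n]
        hits = sum(1 for x, y in pairs if abs(x - y) <= radius)
        profile[lag] = hits / len(pairs)
    return profile


# Synthetic example: B repeats A's states 3 samples later, so A leads B.
series_a = [(i * 37) % 101 for i in range(200)]
series_b = [0, 0, 0] + series_a[:-3]
profile = diagonal_recurrence_profile(series_a, series_b, radius=0.5, max_lag=50)

peak_lag = max(profile, key=profile.get)
print(peak_lag, profile[peak_lag])  # 3 1.0
```

In this toy case recurrence peaks at a positive lag for Participant B, which corresponds to Participant A leading; a peak at lag 0 would instead indicate simultaneous movement.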

#### 2.4.2. Growth Curve Analyses

GCA is a time series analysis used to quantify the degree to which changes over time can be best described by various orthogonal polynomials (Mirman et al., 2008). Rather than assuming that data are described by a linear relationship, GCA determines how well the data are fit by polynomial relationships (e.g., linear, quadratic, cubic) and disentangles the contribution of each polynomial independently. In the current analysis, we focus only on the first- and second-order orthogonal polynomials.

In other words, GCA allows us to distinguish how much the linear and quadratic forms separately contribute to the overall shape of the data. As a result, GCA is a powerful technique for quantitatively comparing DRPs, allowing us to explore leading/following patterns (with the linear lag term) and coordination patterns (with the quadratic lag term).
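To illustrate what orthogonal polynomial lag terms look like, the sketch below builds first- and second-order terms over the ±50-sample window by Gram-Schmidt orthogonalization. This is an assumption-laden illustration, not the project's code: the function names are hypothetical, and the scaling and sign conventions may differ from those of the routine used in the actual analysis.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthogonal_polynomials(x, degree=2):
    """Orthonormal polynomial terms of x (degree 1..degree), each orthogonal to the intercept."""
    basis = [[1.0] * len(x)]  # start from the constant term
    terms = []
    for d in range(1, degree + 1):
        v = [float(xi ** d) for xi in x]
        for b in basis:  # remove the part explained by lower-order terms
            coef = dot(v, b) / dot(b, b)
            v = [vi - coef * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(dot(v, v))
        v = [vi / norm for vi in v]
        basis.append(v)
        terms.append(v)
    return terms


lags = list(range(-50, 51))  # ±5 s at 10 Hz
linear_lag, quadratic_lag = orthogonal_polynomials(lags, degree=2)
print(round(dot(linear_lag, quadratic_lag), 12))  # approximately 0
```

Because the two terms are uncorrelated, a coefficient on the linear term reflects asymmetry across lags (leading/following), while a coefficient on the quadratic term reflects curvature around lag 0 (a U-shaped or inverted-U-shaped profile).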

#### 2.4.3. Model Specifications

All data analysis was performed in R (R Core Team, 2016). Using the lme4 library (Bates et al., 2015), we created a linear mixed-effects model to quantify the effects of linear lag (LL; leading/following) and quadratic lag (QL; coordination) with conversation type (within-dyads; dummy-coded: affiliative [0] or argumentative [1]) and task (between-dyads; dummy-coded: dual-task [0] or noise [1]) on head movement recurrence rate (RR). Dyad and conversation number were included as random intercepts; for both random intercepts, we included the maximal random slope structure that permitted model convergence using backwards selection per current best practices for linear mixed-effects models (Barr et al., 2013). Compared against the random-intercepts-only model, the maximal model justified by the data better fits the data; these results are provided in the supplemental repositories for the project (see Section 3).

As discussed below (see Section 3), our data and analysis materials—including code with the precise specifications for all models—are freely available in public repositories for the project. For interested readers, we here provide the single-equation mathematical form of our linear mixed-effects model using Barr et al.'s (2013) conventions:

$$\begin{aligned} RR_{dt} &= \beta_0 + D_{0d} + N_{0d} + (\beta_1 + D_{1d} + N_{1d})c_d + \beta_2 k_d \\ &+ (\beta_3 + D_{3d} + N_{3d})l_{dt} + (\beta_4 + D_{4d} + N_{4d})q_{dt} \\ &+ \beta_5 c_d k_d + \beta_6 l_{dt} q_{dt} + \beta_7 c_d l_{dt} + \beta_8 k_d l_{dt} \\ &+ (\beta_9 + D_{9d} + N_{9d})k_d c_d l_{dt} + \beta_{10} c_d q_{dt} + \beta_{11} k_d q_{dt} \\ &+ \beta_{12} k_d c_d q_{dt} + \beta_{13} c_d l_{dt} q_{dt} + \beta_{14} k_d l_{dt} q_{dt} + \beta_{15} k_d c_d l_{dt} q_{dt} \\ &+ e_{dt} \end{aligned} \tag{1}$$

Equation (1) estimates the recurrence rate RR for any dyad d at lag t. It does so by estimating the global coefficients (notated as β<sub>1</sub>, ..., β<sub>15</sub>) for each fixed effect: conversation type c, task condition k, linear (i.e., orthogonal first-order polynomial) lag l, quadratic (i.e., orthogonal second-order polynomial) lag q, and all interaction terms. Random intercepts for dyad identity D<sub>0</sub> and conversation number N<sub>0</sub> are included. We also include the maximal slope structure that permits model convergence using backwards selection from the fully maximal model in accordance with current best practices (Barr et al., 2013). The fixed effects included in the maximal slope structure for random intercepts are noted above (β<sub>n</sub> + D<sub>nd</sub> + N<sub>nd</sub>).
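As a quick check on the structure of Equation (1), the fully crossed set of fixed effects for four predictors can be enumerated. The sketch below is illustrative only (the variable names are ours and the ordering does not follow the numbering of the coefficients in the equation); it confirms that four main effects and all of their interactions yield 15 terms beyond the intercept.

```python
from itertools import combinations

# The four predictors in the model: conversation type, task condition,
# linear lag (LL), and quadratic lag (QL).
predictors = ["conversation", "task", "LL", "QL"]

# Every non-empty subset of predictors is one fixed-effect term:
# 4 main effects + 6 two-way + 4 three-way + 1 four-way interaction.
terms = [
    " × ".join(combo)
    for size in range(1, len(predictors) + 1)
    for combo in combinations(predictors, size)
]

print(len(terms))  # 15
for term in terms:
    print(term)
```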

Although we report effects of LL in the model (noted l above), we are cautious in interpreting them. Participants were paired by a fairly random process (i.e., by individual sign-ups for open experimental timeslots that did not allow participants to see their partner's identity) and were randomly assigned to their seat in the interaction space (i.e., by arrival time; each chair was closer to one or the other of the private questionnaire spaces). Unlike previous studies (Main et al., 2016), we had no a priori expectations about or reasons to expect leading/following behaviors; therefore, we refrain from deeply interpreting any LL results.

#### 2.4.4. Comparing to Baseline

In keeping with recommended baselines for nonlinear analyses, we also create a baseline using a Fourier phase-randomization analysis (Theiler et al., 1992; Kantz and Schreiber, 2004). Phase randomization creates a surrogate dataset that contains the same power spectrum as the real data but differs in phases, retaining the autocorrelations of the original time series. Here, we use the nonlinearTseries package (Garcia, 2015) in R (R Core Team, 2016) to create 10 phase-randomized surrogate time series for each conversation of each dyad to provide a more robust baseline analysis. We then perform CRQA over these new time series using the same parameters as the real data. Essentially, the resulting recurrence dynamics capture the amount of similarity that emerges by chance between the two time series (in this case, interacting individuals)<sup>9</sup>.
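The idea behind phase randomization can be shown with a small, self-contained sketch. It is not the nonlinearTseries implementation: it uses a naive discrete Fourier transform for clarity (suitable only for short series), and the function names and synthetic signal are hypothetical. Each non-zero frequency keeps its amplitude but receives a random phase, with conjugate symmetry preserved so that the surrogate is real-valued.

```python
import cmath
import math
import random


def dft(x):
    """Naive discrete Fourier transform (O(n^2); for illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def inverse_dft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def phase_randomized_surrogate(x, rng):
    """Keep each frequency's amplitude, randomize its phase, return a real series."""
    n = len(x)
    original = dft(x)
    surrogate = list(original)  # the mean (k = 0) and Nyquist term stay unchanged
    for k in range(1, (n - 1) // 2 + 1):
        phase = rng.uniform(0.0, 2.0 * math.pi)
        surrogate[k] = abs(original[k]) * cmath.exp(1j * phase)
        surrogate[n - k] = surrogate[k].conjugate()  # conjugate symmetry
    return [value.real for value in inverse_dft(surrogate)]


rng = random.Random(0)
signal = [math.sin(2 * math.pi * t / 8) + 0.5 * math.cos(2 * math.pi * t / 3)
          for t in range(32)]
surrogate = phase_randomized_surrogate(signal, rng)

# The amplitude spectrum (and hence the autocorrelation) is preserved.
max_amp_diff = max(abs(abs(a) - abs(b)) for a, b in zip(dft(signal), dft(surrogate)))
print(max_amp_diff < 1e-6)  # True
```

Because the autocorrelation function is determined by the power spectrum, the surrogate retains each individual's own temporal structure while losing any moment-to-moment alignment with the partner's series.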

In our Supplementary Materials on GitHub and the OSF (see Section 3), we also perform a baseline analysis using a sample-wise shuffled baseline, a more common baseline technique in interpersonal coordination research that breaks temporal correspondence between two time series by separately randomizing (or shuffling) the order of each sample from the real behavior time series (Dale et al., 2011; Louwerse et al., 2012). Although this destroys more inter-sample dependencies, the sample-wise shuffled baseline also destroys the autocorrelation of the time series. This creates a somewhat less conservative baseline, as shuffled baselines cannot strongly account for the hysteresis of the system. Because the samples are shuffled independently, the temporal dynamics of shuffled baselines through their reconstructed phase-spaces are not influenced by their previous time-steps. By retaining the autocorrelation of the individual time series in the phase-randomization surrogate analysis, we are able to account for the chance that two individual time series might "live" in similar regions for some amount of time simply due to their own dynamics, rather than the influence of the other time series.
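The contrast between the two baselines can be seen in a short sketch of the sample-wise shuffle. The code is illustrative only (hypothetical function names, synthetic data): shuffling preserves the set of observed values but removes the lag-1 autocorrelation that a smooth movement signal would otherwise show.

```python
import math
import random


def shuffled_surrogate(series, rng):
    """Sample-wise shuffled baseline: same values, randomized order."""
    shuffled = list(series)
    rng.shuffle(shuffled)
    return shuffled


def lag1_autocorrelation(series):
    """Lag-1 autocorrelation, as a simple index of temporal dependence."""
    mean = sum(series) / len(series)
    centered = [value - mean for value in series]
    numerator = sum(a * b for a, b in zip(centered, centered[1:]))
    denominator = sum(value * value for value in centered)
    return numerator / denominator


rng = random.Random(42)
smooth_signal = [math.sin(2 * math.pi * t / 50) for t in range(200)]
shuffled_signal = shuffled_surrogate(smooth_signal, rng)

print(round(lag1_autocorrelation(smooth_signal), 2))    # close to 1
print(round(lag1_autocorrelation(shuffled_signal), 2))  # close to 0
```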

<sup>9</sup>We thank a reviewer for suggesting this more robust analysis.

We provide the results from analyses using the sample-wise shuffled baseline in our Supplementary Materials (see Section 3). The results are highly similar to those performed against the phase-randomization baseline, although our results suggest that the phase-randomization baseline provides a more conservative metric for the amount of synchrony that might occur by chance.

# 3. DATA AND CODE SHARING

We have made data and code (including code for data preparation and analysis) for the project freely available according to current best practices for data stewardship. Due to the nature of self-disclosure in the conversation data (especially in the argumentative context), we were permitted to release only limited information about each dyad: de-identified movement time series for each participant in each conversation, the dyad's assigned experimental condition, and the dyad's gender makeup.

Current best practices for open science include the sharing of data and code in public repositories (see Nosek et al., 2015; Blohowiak et al., 2016; Gewin, 2016; Kidwell et al., 2016). Two prominent venues for storing and sharing materials are the Open Science Framework (OSF; http://osf.io) and GitHub (GitHub, Inc.; https://www.github.com/). Both OSF and GitHub serve as platforms to share materials, promote community contribution, and facilitate open re-use (and re-analysis) of materials by others through appropriate attribution. Furthermore, the OSF allows researchers to "freeze" specific versions of the project—for example, at the point of publication (as we have done here)—providing a crystallized, unmodifiable snapshot of all files at that time.

All data and code for the project are freely available through our OSF repository (Paxton and Dale, 2017): https://osf.io/4yqz8/

All code can also be freely accessed through our project's GitHub repository: https://www.github.com/a-paxton/dualconversation-constraints

# 4. RESULTS

All analyses were performed in accordance with the model specifications described in Section 2.4.3. We here present only the standardized model, as it allows us to interpret estimates as effect sizes (see Keith, 2005). (The unstandardized model is available in the project's OSF and GitHub repositories; see Section 3.) Full standardized model results are presented in **Table 1**. For clarity within the text, we reference main and interaction terms in parentheses within the text so that readers can easily find the relevant values in **Table 1**.

Results indeed suggested that high- and low-level constraints influence coordination dynamics—even in some unexpected ways. Contrary to our hypothesis H1, we did not find evidence of overall time-locked synchrony. Participants' head movements were, in fact, better described by a turn-taking pattern with slight leading-following dynamics (LL × QL).

TABLE 1 | Results from the standardized linear mixed-effects model (implemented with lme4; Bates et al., 2015) predicting recurrence of head movement between participants (RR) with conversation (within-dyads; dummy-coded: affiliative [0] or argumentative [1]), task (between-dyads; dummy-coded: dual-task [0] or noise [1]), linear lag (LL; leading/following), and quadratic lag (QL).

*The model's fixed effects alone accounted for 37% of the variance (marginal R<sup>2</sup> = 0.37), while the fixed and random effects accounted for 94% of the variance (conditional R<sup>2</sup> = 0.94).* .p < 0.10; \*p < 0.05; \*\*p < 0.01; \*\*\*p < 0.001.

Consistent with our hypotheses H2 and H2A (and replicating our previous findings; Paxton and Dale, 2013a), we found that argument significantly decreased RR compared to affiliative conversations (conversation; see **Figure 1**). Conversation also affected moment-to-moment coupling dynamics: Recurrence during the affiliative conversations was higher but more diffuse, while recurrence in the argumentative conversation was lower and showed a distinct turn-taking pattern (conversation × QL).

Interestingly, although we hypothesized that the noise condition would increase RR compared to the dual-task condition, we did not find a significant main effect of task condition (task). Instead, we found that task affected the dynamics of coordination only in conjunction with other pressures (task × conversation × QL). We explored these patterns in greater depth by analyzing each of the conversation types (i.e., affiliative and argumentative conversations) separately.

# 4.1. Post-hoc Analyses of Interaction Terms

Results for the standardized models exploring the complex interaction term are presented in **Table 2**. For clarity, we again refer in the text only to the model variables so that readers can find the relevant statistics in the model. As with the first model, we ran both standardized and unstandardized versions of these models, but we present only the standardized models in the text. Additional information—including the unstandardized models—can be found in the OSF and GitHub repositories for the project (see Section 3).

As in the main model, both follow-up models indicated that head movement showed turn-taking patterns with some leader-follower dynamics (LL × QL). No other effects reached significance in the post-hoc analyses of the affiliative conversations.

The results of the post-hoc analyses of argumentative conversations, however, showed context-sensitive responses to low-level constraints. Overall, participants demonstrated a much stronger turn-taking pattern of head movement during argumentative conversations (QL). These effects were much more pronounced during the dual-task condition than in the noise condition (task × QL), with recurrence exhibiting the characteristic U-shaped DRP of turn-taking behavior.

# 4.2. Comparisons to Phase-Randomized Baseline

The patterns outlined above hold even compared to baseline measures of synchrony. For brevity, tables of results comparing real data to phase-randomized surrogate baseline data are included in **Appendix**. **Table A1** is the companion to **Table 1**; **Table A2** is the companion to **Table 2**. In these tables, the "data" variable refers to either the baseline surrogate data (data = −0.5) or the real experimental data (data = 0.5).


*To follow up on the four-way interaction term in the main model (see Table 1), we targeted each conversation type in separate models, using their own standardized datasets. The affiliative model's fixed effects alone accounted for 7% of the variance (marginal R<sup>2</sup> = 0.07), while the fixed and random effects accounted for 91% of the variance (conditional R<sup>2</sup> = 0.91). The argumentative model's fixed effects alone accounted for 5% of the variance (marginal R<sup>2</sup> = 0.05), while the fixed and random effects accounted for 94% of the variance (conditional R<sup>2</sup> = 0.94).* .p < 0.10; \*p < 0.05; \*\*p < 0.01; \*\*\*p < 0.001.

Again, only the standardized models are presented in the text of the current paper. Unstandardized models (along with standardized and unstandardized models performed with the sample-wise shuffled baseline) are available on the project's GitHub and OSF repositories (see Section 3). Due to the complexity of the overall model (**Table A1**), we use the post-hoc models (**Table A2**) as a framework for discussing the results.

#### 4.2.1. Affiliative Conversation Post-hoc Analyses for Comparison to Baseline

Strikingly, these results suggested that the level of recurrence observed in the affiliative conversations was not overall significantly different from baseline (data), although the two datasets did differ in their dynamics (data × LL × QL). The surrogate data showed significantly weaker leading-following patterns (data × LL) and exhibited neither turn-taking nor synchrony patterns (data × QL significant, but not QL).

The affiliative conversations also differed from baseline across task conditions. The results suggested a trend toward higher overall recurrence in the noise condition than we would expect to see by chance, although it did not reach significance (data × task). We did, however, find a significant difference in the coordination dynamics between the two task conditions (data × task × QL): Compared to the flat recurrence profile of the baseline in both conditions, the dual-task condition demonstrated more of the inverted-U shape of synchrony, while the noise condition demonstrated more of the U shape of turn-taking.

### 4.2.2. Argumentative Conversation Post-hoc Analyses for Comparison to Baseline

Unlike in the affiliative conversations, levels of recurrence in the argumentative conversations were, overall, significantly lower than baseline (data). In other words, participants coordinated with one another even less than would be expected by chance, and that decreased recurrence was more likely to appear in a turn-taking pattern (data × QL) with some leader-follower effects (data × LL).

Task constraints also exerted significant effects on the dynamics of recurrence. In addition to showing different leader-follower behaviors across the two tasks (data × task × LL), the data revealed differences in the temporal patterning of movement across the two tasks (data × task × QL). Essentially, participants' head movements showed much stronger turn-taking patterns in the argumentative conversations in the dual-task condition than in the noise condition, which had a relatively flat recurrence profile.

# 5. DISCUSSION

Communication is a rich, complex phenomenon that plays a central role in daily human life. We use conversation flexibly, allowing us to engage in mundane transactions, bond over shared interests, collaborate to complete joint tasks, and argue about our political opinions. Although these different communicative contexts are part of our everyday experiences, the scientific study of these dynamics has largely centered on friendly or collaborative contexts. The current study aimed to contribute to a fuller picture of communicative dynamics by investigating how conflict (a high-level contextual perturbation) and rapidly changing visual information (a low-level contextual perturbation) interact to affect the dyadic system.

Here, we specifically targeted interpersonal synchrony of head movement, that is, the similarity of participants' head movement over time during their interaction. We used PsyGlass (Paxton et al., 2015), a stimulus-presenting and movement-recording application on Google Glass, to capture the acceleration time series of participants' head motion during naturalistic conversations shaped by high-level (i.e., argumentative or affiliative conversational context) and low-level (i.e., noise or dual-task visual information condition) constraints. From the theoretical position that human interaction is a complex adaptive system, we hypothesized that interaction dynamics should be sensitive to each of these constraints.

Our analyses found support for some—but not all—of our hypotheses. Taken together, our results support the idea of interaction as a complex adaptive system while highlighting inconsistencies within previous literature and suggesting avenues for future research.

# 5.1. Head Movement Synchrony

Perhaps most unexpectedly, we did not find support for our hypothesis that participants would be synchronized in their head movement patterns (H1). Instead, participants' head movement tended to exhibit time-lagged synchrony or turn-taking dynamics (cf. Butler, 2011). These results stand in contrast with previous work on head movement synchrony, which has shown that individuals tend to synchronize their head movements during conversation.

Interestingly, these patterns resemble those observed in speech signals during friendly and argumentative conversations (Paxton and Dale, 2013c). Of course, this suggests that the current measure of head movement may be influenced by speaking. Future work should disentangle the ways that intrapersonal coupling of head movement and speaking may influence interpersonal head movement coordination.

However, relatively little research has targeted head movement synchrony, and the existing work in this area has used very different methods and analyses. For example, Boker et al. (2002) (a) tracked head movements with passive three-dimensional motion-tracking sensors at 80 Hz, (b) analyzed Euclidean velocity, (c) did not mention whether a filter was used on the movement time series, (d) calculated synchrony through windowed cross-correlation (i.e., a linear time series analysis) and (e) did not use a baseline. On the other hand, Ramseyer and Tschacher (2014) (a) tracked head movements through video (i.e., by quantifying displaced pixels from frame to frame in a region of interest around the head) at an unspecified sampling rate, (b) analyzed a "flattened" velocity (i.e., 2D projection of 3D movement), (c) filtered movement signals with an unspecified filter, (d) calculated synchrony as the absolute value of the windowed cross-correlation coefficients between participants, and (e) used a "window-wise" shuffled baseline (i.e., preserving local structure within the data by shuffling 1 min chunks rather than shuffling all samples independently). By contrast, we (a) tracked head movements with active head-mounted sensors at 10 Hz (after downsampling), (b) analyzed Euclidean acceleration, (c) filtered movement signals with a low-pass Butterworth filter, (d) calculated synchrony with cross-recurrence quantification analysis (i.e., a nonlinear time series analysis without windowing), and (e) used a phase-randomized baseline (and, in our Supplementary Materials, a sample-wise shuffled baseline). Future work should explore the degree to which these and other factors may influence findings of head movement synchrony. Our task also differed by integrating high- and low-level constraints. We turn to these next.
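To make the contrast in point (d) more concrete: a diagonal recurrence profile (DRP) records, for each lag, how often one person's signal at time t falls within some radius of the other person's signal at time t + lag. The minimal Python sketch below illustrates only that idea. It omits the time-delay embedding used in full cross-recurrence quantification analysis, and the function name `diagonal_recurrence_profile`, the `radius` value, and the toy signals are illustrative assumptions rather than the parameters or code of the present study.

```python
def diagonal_recurrence_profile(a, b, max_lag, radius):
    """Proportion of recurrent points on each diagonal of a cross-recurrence plot.

    A point (i, j) counts as recurrent when |a[i] - b[j]| <= radius. The
    diagonal for a given lag pairs a[t] with b[t + lag], so a peak at a
    positive lag means that b tends to echo a after that many samples.
    """
    profile = {}
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(a[t], b[t + lag])
                 for t in range(len(a))
                 if 0 <= t + lag < len(b)]
        hits = sum(1 for x, y in pairs if abs(x - y) <= radius)
        profile[lag] = hits / len(pairs)
    return profile


# Toy example (hypothetical data): b is a copy of a delayed by 2 samples.
a = [0, 1, 4, 9, 2, 7, 3, 8, 5, 6, 1, 0]
b = [5, 5] + a[:-2]
drp = diagonal_recurrence_profile(a, b, max_lag=3, radius=0.5)
peak_lag = max(drp, key=drp.get)  # lag at which recurrence is highest
```

Unlike windowed cross-correlation, this count does not assume a linear relationship between the two signals; it only asks whether the two states are close, which is one reason results from the two approaches may not be directly comparable.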

# 5.2. Differences in High-Level Contextual Constraints

Conversational context modulated these patterns of coordination (supporting H2). Consistent with previous research (Paxton and Dale, 2013a), we also found support for our directional hypothesis. Argument decreased synchrony (supporting H2A): Participants moved in more dissimilar ways during argumentative conversations relative to affiliative ones.

The way in which the two high-level contexts influenced synchrony was particularly interesting. Synchrony during friendly conversations was indistinguishable from chance, while synchrony during argumentative conversations was significantly lower than what would be expected by chance. This contrasted with our previous work (Paxton and Dale, 2013a), which found that overall body movement synchrony during friendly conversations was higher than expected by chance and that synchrony during arguments was not significantly different from chance. However, this again would be consistent with the patterns observed in speech rather than movement (Paxton and Dale, 2013c), as mentioned earlier.

# 5.3. Differences in Low-Level Contextual Constraints

We also found that low-level contextual constraints influenced coordination dynamics (supporting H3), but the results surrounding our directional hypothesis were more nuanced (H3A). We found no significant differences in the overall levels of synchrony in the presence of informative or uninformative visual input, instead finding differences in the moment-to-moment dynamics of coordination across high-level contextual constraints. The effects of task condition emerged only during arguments, again supporting the idea that emergent behaviors (like synchrony) are context-dependent: Head movements exhibited a marked turn-taking pattern during argumentative conversations in the dual-task condition but had relatively flat temporal correspondence in the noise condition.

In finding an interaction effect for the low-level contextual constraint, the current study may highlight the importance of the cognitive interpretation of the perceptual information in the environment. Previous work on auditory perceptual information simply introduced a noisy background stimulus (Boker et al., 2002); no additional interpretation was needed. The current work, by contrast, presented the same perceptual stimulus to participants (i.e., changing blue and red screens), and the two conditions differed by the significance (or lack thereof) of that stimulus.

Although we found no main effect between task conditions (see Section 5.5), the differences of these two conditions relative to one another can meaningfully inform some of our understanding of these phenomena. The turn-taking coordination dynamics during arguments in the dual-task condition (compared with the flat profile in the noise condition; see **Figure 1**) may suggest a slight reworking of the influences seen in previous work. The auditory noise of Boker et al. (2002) would have presented task-relevant difficulties, since hearing and speaking are directly affected by ambient noise. By contrast, our "noise" condition—a flashing screen—may not have directly impacted conversation, compared with the increased cognitive load of performing a working memory task while having a complex conversation. This suggests a slight change in what may boost coordination: Like the auditory noise of Boker et al. (2002) and the dual-task condition of the present study, perhaps constraints must be task-relevant in order to influence movement coordination.

# 5.4. Conversation as a Complex System

The partial support for H3A provided the strongest evidence for context sensitivity of conversation to high- and low-level constraints. Our results both supported and failed to support our directional hypothesis, depending on the context. The effects of high- and low-level contextual constraints were neither uniform nor additive; instead, high- and low-level contextual effects interacted to produce unique patterns. We interpret these results as fitting with the idea that conversation can be fruitfully conceptualized through dynamical systems theory (DST), supporting our final hypothesis (H4).

While previous work explored only perceptual noise within "free conversation" (p. 350; Boker et al., 2002), the present study asked participants to engage in two distinct discourse activities: argument or friendly conversation. This allowed us not only to explore the effects of low-level contextual constraints in a new modality (i.e., vision) but also to combine that exploration with a growing emphasis on exploring coordination dynamics in different conversational contexts.

Our results add nuance to previous findings about perceptual noise: Rather than uniformly increasing coordination (cf. Boker et al., 2002), low-level contextual pressures alter coordination dynamics only in some conversational contexts. Our results also add nuance to previous findings about conversational context: Rather than uniformly decreasing synchrony (Paxton and Dale, 2013a; Abney et al., 2014), argument's effects can be modulated by low-level perturbations. Moreover, these low-level perturbations affect behavior differently depending on the overarching high-level context, exerting a stronger influence on coordination dynamics during argument compared to affiliative conversations. Most strikingly, this particular combination of high- and low-level context has led to unique behavioral dynamics, leading synchrony in both friendly and argumentative conversations to decrease (relative to chance) and to reorganize its temporal dynamics.

Of the contributions of the current study, we believe that our results most compellingly speak to the importance of recognizing conversation as a complex dynamical system. Consistent with the interpersonal synergies perspective on coordination (e.g., Riley et al., 2011), we find that coordination is sensitive to contextual constraints. Put simply, coordination (as one property of interaction, which we view as a complex dynamical system) is simultaneously sensitive to low-level perceptual information, cognitive interpretation of this low-level information, and high-level interpersonal goals.

# 5.5. Limitations and Future Directions

The current paper provides one of the first simultaneous explorations of high- and low-level contextual constraints in naturalistic conversation. As a result, the study has several limitations that are opportune areas for future directions.

First, we found that the difference in recurrence between affiliative and argumentative conversations was modulated by task: Argumentative conversations were more strongly affected by task condition than affiliative conversations (see **Table 2**). However, this pattern could have emerged in a variety of ways: For example, compared to non-visually-disrupted conversation, noise could have decreased coordination; the dual-task condition could have increased coordination; both could have decreased, with noise simply leading to a greater decrease; both could have increased, with dual-task simply leading to a greater increase; or some other pattern may be at work. Simply put, although we can address relative differences between the two conditions, we cannot make strong claims as to the precise mechanism behind the differences in absolute coordination from the current study. Future work should include a baseline condition without any visual noise (holding all other experimental pressures equal) in order to target these possibilities. (A baseline condition would also help choose among similar causes behind the difference in peakedness between noise and dual-task conditions in argument.)

Second, we here only investigated linear (i.e., leading/following) and quadratic (i.e., synchrony or turn-taking) patterns across all dyads. As we have observed in our previous work, these data appear to exhibit interesting dyad-specific effects (see **Figure 2**), and future work should investigate them as dyad-level analogs to individual differences. It may be of interest to include higher-order polynomial patterns (e.g., cubic, quartic) in future analyses, both in describing the observed data and in understanding what they might mean psychologically or interpersonally.
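The roles of the linear and quadratic lag terms can be illustrated with a small Python sketch. It projects a recurrence profile onto a linear contrast (odd in lag, so sensitive to leading/following asymmetry) and a centered quadratic contrast (even in lag, so sensitive to the U shape of turn-taking versus the inverted-U shape of synchrony). This is a simplified stand-in, not the mixed-effects models reported here: the function name `lag_terms`, the toy profiles, and the use of plain centered contrasts (which assume lags symmetric around zero) are all assumptions made for illustration.

```python
def lag_terms(profile):
    """Return (slope, curvature) of a recurrence profile over symmetric lags.

    profile maps each lag (..., -1, 0, 1, ...) to a recurrence rate. With lags
    symmetric around zero, the linear and centered quadratic contrasts are
    orthogonal to each other and to the mean, so each coefficient can be
    obtained by a simple projection.
    """
    lags = sorted(profile)
    mean_sq = sum(l * l for l in lags) / len(lags)
    linear = [float(l) for l in lags]            # odd in lag
    quadratic = [l * l - mean_sq for l in lags]  # even in lag, mean-centered
    y = [profile[l] for l in lags]
    slope = sum(u * v for u, v in zip(linear, y)) / sum(u * u for u in linear)
    curvature = (sum(u * v for u, v in zip(quadratic, y))
                 / sum(u * u for u in quadratic))
    return slope, curvature


# Hypothetical profiles: recurrence lowest at lag 0 (U shape) versus
# recurrence highest at lag 0 (inverted U).
u_shaped = {l: 0.10 + 0.01 * l * l for l in range(-5, 6)}
inverted_u = {l: 0.30 - 0.01 * l * l for l in range(-5, 6)}
u_slope, u_curve = lag_terms(u_shaped)
inv_slope, inv_curve = lag_terms(inverted_u)
```

In this coding a positive curvature corresponds to a U-shaped profile and a negative curvature to an inverted-U profile, while a nonzero slope indicates that recurrence is higher on one side of lag 0 than the other. Cubic or quartic contrasts could be added in the same way, although they would need to be orthogonalized against the lower-order terms.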

Third, research should continue across additional modalities and contexts. Not all constraints should affect conversation equally; therefore, there should be no expectation that the same dynamics will emerge across all modalities. The effect of low-level constraints in a joint task-performance environment may be quite different than in naturalistic conversation. Similarly, introducing perturbations of varying severity to different perceptual modalities may unequally affect interpersonal dynamics. Future work should continue to map out these effects to better understand interaction.

Finally, we present only a first exploration of these dynamics; our findings should be replicated, especially in larger samples. The sample included here is fairly normative for conversational coordination research (for discussion of sample sizes, see Paxton and Dale, 2013a); the only other study exploring the effects of perceptual perturbations on conversation dynamics (to the authors' knowledge) included only 4 dyads (Boker et al., 2002). Issues of open science and reproducibility are particularly salient at this time to psychology and cognitive science (cf. Open Science Collaboration, 2015), so we provide (1) open-source code for our data collection techniques (on the PsyGlass GitHub repository: http://www.github.com/a-paxton/PsyGlass), (2) a high level of methodological detail about our procedure (in Section 2.2), (3) our data (on OSF: https://osf.io/4yqz8/), and (4) open-source code for our data preparation and analysis techniques (on OSF, https://osf.io/4yqz8/, and GitHub, https://www.github.com/a-paxton/dual-conversation-constraints).

These tools will help us and other researchers interested in interpersonal coordination and communication dynamics to integrate our practices, resources, and findings so that we can—together—better refine our understanding of human social behavior.

# 6. CONCLUSION

In this paper, we explore the dynamics of human interaction in an experiment and analyses inspired by ideas from complex adaptive systems. Patterns of nonverbal behavior during conversation change based on both high-level contextual constraints (like what kind of conversation people are having) and low-level contextual constraints (like the significance of visual information in the environment). Replicating previous work, we find that argument decreases movement synchrony. Interestingly, we find that high-level constraints interact with low-level ones, mitigating or exacerbating the effects of argument depending on the cognitive interpretation of the perceptual stimuli. We see our results as contributing to the growing view that patterns of communication (even subtle signatures of body movement) are shaped by the host of contextual factors that surround the conversation.

# AUTHOR CONTRIBUTIONS

Experiment design: AP and RD. Data collection: AP. Data analysis: AP and RD. Manuscript writing and editing: AP and RD.

# REFERENCES


# ACKNOWLEDGMENTS

Our thanks go to the undergraduate research assistants from the University of California, Merced who helped with data collection (in alphabetical order): Neekole Acorda, Kyle Carey, Nicole Hvid, Krina Patel, and Keith Willson. We also thank Aaron Culich for his assistance with computation resources on Jetstream, which was made possible through the XSEDE Campus Champion program and the Berkeley Research Computing (BRC) program of Research IT at the University of California, Berkeley. This project was funded in part by a Moore-Sloan Data Science Environments Fellowship to AP, thanks to the Gordon and Betty Moore Foundation through Grant GBMF3834 and the Alfred P. Sloan Foundation through Grant 2013-10-27 to the University of California, Berkeley.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Paxton and Dale. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# APPENDIX

# Comparisons to Baseline

This appendix provides the results of analyses comparing the real experimental data to the phase-randomized surrogate baseline data. **Table A1** compares the real and baseline data using the analysis scheme provided in Section 2.4. **Table A2** provides post-hoc analyses diving into the differences between the affiliative and argumentative conversations.

TABLE A1 | Results from the standardized linear mixed-effects model comparing the real data to the phase-randomized surrogate baseline (implemented with lme4; Bates et al., 2015).


*The model's fixed effects alone accounted for 8% of the variance (marginal R<sup>2</sup>* = *0.08), while the fixed and random effects accounted for 51% of the variance (conditional R<sup>2</sup>* = *0.51).* .*p* < *0.10;* \**p* < *0.05;* \*\**p* < *0.01;* \*\*\**p* < *0.001.*


TABLE A2 | Results from two standardized linear mixed-effects models comparing real data to phase-randomized surrogate baseline (implemented with lme4; Bates et al., 2015).

*To follow up on the interaction terms in the main model (see* Table A1*), we targeted each conversation type in separate models, using their own standardized datasets. The affiliative model's fixed effects alone accounted for 2% of the variance (marginal R<sup>2</sup>* = *0.02), while the fixed and random effects accounted for 42% of the variance (conditional R<sup>2</sup>* = *0.42). The argumentative model's fixed effects alone accounted for 10% of the variance (marginal R<sup>2</sup>* = *0.10), while the fixed and random effects accounted for 64% of the variance (conditional R<sup>2</sup>* = *0.64).* .*p* < *0.10;* \**p* < *0.05;* \*\**p* < *0.01;* \*\*\**p* < *0.001.*
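For readers less familiar with the two R<sup>2</sup> values reported in these notes, the distinction can be written out as follows. This assumes the widely used variance-partitioning definitions for mixed-effects models associated with Nakagawa and Schielzeth; the notes above do not state which computation was used, so the symbols here (fixed-effect variance, summed random-effect variance, and residual variance) are illustrative rather than taken from the analysis code.

```latex
% \sigma^2_f: variance explained by the fixed effects
% \sigma^2_\alpha: summed variance of the random effects
% \sigma^2_\varepsilon: residual variance
R^2_{\text{marginal}} =
  \frac{\sigma^2_f}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon},
\qquad
R^2_{\text{conditional}} =
  \frac{\sigma^2_f + \sigma^2_\alpha}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon}
```

Under these definitions, the difference between the two values is the share of variance attributable to the random effects: for the affiliative baseline model, 0.42 − 0.02 = 0.40, and for the argumentative baseline model, 0.64 − 0.10 = 0.54.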

# Wild Bodies Don't Need to Perceive, Detect, Capture, or Create Meaning: They ARE Meaning

J. Scott Jordan\*, Vincent T. Cialdella, Alex Dayer, Matthew D. Langley and Zachery Stillman

*Department of Psychology, Institute for Prospective Cognition, Illinois State University, Normal, IL, United States*

Keywords: theory of mind (ToM), embodiment, embodied cognition, perception, relational properties, intrinsic properties

For years, experimental psychologists have assumed it is difficult for one person to know the mental states of another because all we can directly experience about each other is observable behavior. As a result, mental states need to be inferred via what has come to be known as a theory of mind. According to contemporary embodiment theorists, however, some of whom refer to themselves as enactivist theorists, the mental states of others are not internally isolated at all, with some arguing social cognition is direct (Gallagher, 2008, 2015) while others propose it can sometimes be constituted by social interaction (De Jaegher et al., 2010).

#### Edited by:

*Joanna Raczaszek-Leonardi, University of Warsaw, Poland*

#### Reviewed by:

*Ezequiel Di Paolo, Ikerbasque—Basque Foundation for Science, Spain*

> \*Correspondence: *J. Scott Jordan jsjorda@ilstu.edu*

#### Specialty section:

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology*

Received: *12 December 2016* Accepted: *23 June 2017* Published: *14 July 2017*

#### Citation:

*Jordan JS, Cialdella VT, Dayer A, Langley MD and Stillman Z (2017) Wild Bodies Don't Need to Perceive, Detect, Capture, or Create Meaning: They ARE Meaning. Front. Psychol. 8:1149. doi: 10.3389/fpsyg.2017.01149*

While we are sympathetic to the complex systems approach embodiment theorists tend to take on the issues of cognition and social interaction, we are concerned their theorizing about subjective properties (i.e., meaning, feelings, experiences, and emotions) leaves such properties vulnerable to epiphenomenalism. That is, the actual work of cognition and social interaction is described in terms of complex, multi-scale, causal dynamics among objective phenomena such as neurons, brains, bodies, and worlds, and the meanings, feelings, experiences, and emotions are said to be emergent from, caused by, identical with, or an informational aspect of, the objective phenomena. In short, the embodiment-driven scientific description of cognition and social interaction renders subjective properties logically unnecessary to the scientific description.

While some embodiment theorists approach the reality of subjective properties via a phenomenological perspective that pretty much assumes the reality of subjective properties without being concerned with potential epiphenomenalism (Gallagher, 2008, 2015; De Jaegher et al., 2010), those who work to establish the non-epiphenomenal reality of experience in a complex systems framework tend to define experience in terms of relational properties (Holt et al., 1910; Charles, 2011; Gallagher and Zahavi, 2014; Silberstein and Chemero, 2015), the most popular perhaps being Gibson (1966) and his notion of affordances. According to this view, organisms perceive their environment, including other organisms, in terms of behavioral possibilities (i.e., affordances). These possibilities are simultaneously about both the organism and the environment. Given they are constituted of bi-directional aboutness, they are considered to be inherently meaningful. Meaning, in this sense, is being defined in terms of aboutness.

The practice of using complex systems theory to describe relational properties has been around for some time (Rosen, 1958; Varela et al., 1991; Kauffman, 1996; Emmeche, 2002). And when we conceptualize relational properties as vehicles of subjective properties via concepts such as affordances, we make good progress toward establishing the non-epiphenomenal status of experience (Silberstein and Chemero, 2015). However, despite the introduction of a relational property (e.g., an affordance) at one level of reality, we leave open the possibility that reality is also constituted of non-relational properties; that is, properties that are in no way constituted of their relations with other aspects of reality, what one might refer to as an intrinsic property (e.g., weight is a relational property, while mass is an intrinsic property). Such a possibility proves problematic because the notion of intrinsic properties has come under increasing attack by contemporary philosophers of science. According to Jammer (2000), inertial mass emerges from a particle's interaction with the Higgs field: "...a scalar field that 'permeates all of space' and 'endows particles with mass' (p. 162)." Bauer (2011) asserts this type of interactive dependence renders mass externally grounded, which means the particle's mass is partially constituted by its relations to its context. Others have rendered similar criticisms of the notion of intrinsic properties via concepts such as ultra-grounding (Harré, 1986) and Global Groundedness (Prior et al., 1982). In a similar vein, Schaffer (2003) and Dehmelt (1989) claim that there may be no fundamental level to reality at all.

Such an assault on intrinsic properties challenges the idea that some properties are relational and others are not, which in turn problematizes the idea of defining one level of reality (i.e., the internal dynamics of a single cell, or an organism-environment coordination) as being meaningful because it entails a relational property. According to Wild Systems Theory (WST; Jordan and Day, 2015), all properties are constituted of and by their relations with context. As a result, all properties are inherently meaningful because they are naturally and necessarily about the contexts within which they persist. From this perspective, meaning is ubiquitous. In short, reality is inherently meaningful.

Given this notion of an inherently relational, meaningful reality, WST goes beyond the notion of affordances and proposes instead that organisms are meaning because they are inherently relational in that they constitute embodiments of the constraints (i.e., contexts) they have had to phylogenetically, as well as ontogenetically, embody in order to sustain themselves (Jordan and Ghin, 2006, 2007). Bones, muscles, and brains, for example, constitute embodiments of the constraints involved in propelling a body as a whole through a gravity field. At every level of scale, from the single cell up through the organism-environment coordination, such wild bodies are inherently relational and, therefore, inherently meaningful (Streeck and Jordan, 2009). As a result, wild bodies are not information detectors or information processors, but rather, modulators of context.

WST's ontology of ubiquitous, multi-scale relationality firmly establishes the reality of subjective properties by revealing the intrinsic-relational dualism that lies at the heart of most contemporary takes on relational properties. If reality is inherently relational, all the way down, we do not need to posit vehicles of content. And given that other organisms were part of the contextual constraints that organisms had to embody to sustain themselves, social interaction is only special in that it constitutes yet another level of the inherently meaningful relationality in which all wild bodies are nested.

To be sure, some may feel that by making meaning ubiquitous, WST ultimately renders it meaningless. Jordan and Day (2015) propose, however, that because everything is meaningful, nothing is meaningless. Jordan and Vinson (2012) propose that non-living systems also constitute embodiments of context and, as a result, are also inherently meaningful (i.e., inherently about the contexts they embody). What distinguishes the aboutness entailed in living and non-living systems is the dynamics by which such systems sustain their integrity. Non-living systems exist as "systems" in a persistent state of tension between strong and weak forces, and their micro-macro structures are not coupled in ways that sustain any particular aspect of the coupling in response to changes in these forces. The micro-macro dynamics of living systems, however (e.g., the chemicals that constitute a single cell, and the cell as a whole, respectively), are dynamically coupled in ways that generate work (i.e., energy transformations) that continually bring energy into the system and allow it to generate and sustain ordered states (e.g., organelle maintenance, genetic transcription, and the Krebs cycle) capable of resisting, to some extent, the strong and weak forces within which such systems are perpetually nested. Regardless of the dynamical differences between living and non-living systems, however, both constitute embodiments of context and, as a result, are inherently relational and meaningful. From this perspective, phenomena such as self-awareness, qualia, and consciousness are phylogenetically scaled-up recursions of the meaning inherent in all embodiments of context. In short, one might regard phylogenetic history as the evolution of meaning.

In conclusion, it is perhaps a bit unfair to hold embodiment theorists responsible for overcoming epiphenomenalism. Cognitive science as a whole has been working to ground experience and subjectivity for quite some time. Much to their credit, contemporary enactivists pay close attention to phenomenology and develop research methods that include phenomenology as an important aspect of the research. And according to WST, this extremely valuable research will definitely advance our understanding of the relations that exist between brains, bodies, environments, and phenomenology. In the end however, such research will not prove necessary to grounding phenomena we refer to as "experience" and "subjectivity" because such phenomena are phylogenetically scaled-up versions of the same inherent relationality that constitutes all phenomena. Human consciousness, human subjectivity, and human meaning constitute evolved forms of inherent relationality—evolved forms of meaning. In essence, one might say that meaning is reality interacting with itself.

# AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct, and intellectual contribution to the work, and approved it for publication.

# ACKNOWLEDGMENTS

We'd like to thank the reviewer for the very insightful comments made on previous versions of this manuscript.

# REFERENCES


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Jordan, Cialdella, Dayer, Langley and Stillman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Understanding the Impact of Expertise in Joint and Solo-Improvisation

#### Johann Issartel<sup>1</sup> \*, Mathieu Gueugnon<sup>2</sup> and Ludovic Marin<sup>2</sup>

<sup>1</sup> Multisensory Motor Learning Laboratory, School of Health and Human Performance, Dublin City University, Dublin, Ireland, <sup>2</sup> EuroMov – University of Montpellier, Montpellier, France

Joint-improvisation is not only an open-ended creative action that two or more people perform together in the context of an artistic performance (e.g., theatre, music or dance). Joint-improvisation also takes place in daily life activities when humans take part in collective performance, such as toddlers at play or adults engaged in a conversation. In this article, joint-improvisation is examined from a social motor coordination perspective. In the literature, the nature of the social motor coordination characteristics of joint-improvisation, for either its creative aspect or its daily life features, remains unclear. Additionally, solo-improvisation and joint-improvisation need to be studied together to establish the influence of the social element of improvisation on the emergence of multi-agent motor coordination. To better understand these two types of improvisation, we compared three levels of expertise in dance improvisation (novice, intermediate, and professional) to identify the movement characteristics of each group. Pairs of the same level were asked to improvise together. Each individual was also asked to perform an improvisation on his/her own. We found that each of the three groups presents a specific movement organization, with movement complexity increasing with the level of expertise. Experts performed movements of shorter duration in conjunction with an increased range of movement. The direct comparison of the individual and paired Conditions highlighted that joint-improvisation reduced the complexity of the movement organization for all three levels while maintaining the differences between the groups. This direct comparison amongst those three distinct groups provides an original insight into the nature of movement patterns in joint-improvisation situations. Overall, it reveals the role of both individual and collective properties in the emergence of social coordination.

Keywords: expertise, dance improvisation, joint-action, wavelet transform, interpersonal coordination

#### Edited by:

Michael J. Richardson, University of Cincinnati, United States

#### Reviewed by:

Lior Noy, Weizmann Institute of Science, Israel; Motonori Yamaguchi, Edge Hill University, United Kingdom

> \*Correspondence: Johann Issartel johann.issartel@dcu.ie

#### Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

Received: 23 December 2016; Accepted: 12 June 2017; Published: 30 June 2017

#### Citation:

Issartel J, Gueugnon M and Marin L (2017) Understanding the Impact of Expertise in Joint and Solo-Improvisation. Front. Psychol. 8:1078. doi: 10.3389/fpsyg.2017.01078

# INTRODUCTION

Human behavior does not only consist of set goals. We plan our actions but need to constantly change this plan to fit the requirements of the situation. At the same time, if any unplanned events emerge from our interaction with the environment, we immediately react to them. This constant interaction with the world around us is quite efficient and accurate. In other words, improvising is an action humans tend to do on a daily basis. Interestingly, we do not consider "what we do" as an improvisation. Improvisation is not a concept that is paramount to our daily thoughts, even if one could consider that this is "what we actually do." In the late 1980s, Agre and Chapman started to question the concept of improvisation and its role in daily life activities. From their perspective, everyday life is a constant moment-to-moment improvisation: "life is a continual improvisation" (Agre and Chapman, 1987, p. 287). Usually, the term improvisation is used in the arts to describe an off-the-cuff performance, without anticipation or planning, that takes the audience (i.e., the environment) into account. Although it is possible to improvise alone, in our everyday life improvisation almost always requires an interaction, either between a human and the non-human environment (e.g., synchronizing with music, with a video game and so on) or between people (e.g., collaboration, competition, and synchronization). The latter, called social interaction, is one of the most important sources of improvisation. In other words, we all improvise in our daily life. In that sense, joint-improvisation can be seen as a form of cooperation between performers (Seham, 2001) to create a moment frequently reported as "being in the zone." Those moments of togetherness (Hart et al., 2014; Noy et al., 2015) express the integration of individual and collective properties.

The notions of individual and collective properties come from von Holst (1937), who claimed that individual components possess intrinsic properties that tend to persist even when these components are coordinated with others (i.e., collective properties). For any biological component, there is a joint effect of the individual properties resisting change (the maintenance tendency) and the magnet effect attracting those components together (i.e., the collective properties). In the context of an improvisation (when movements are not constrained), one would see the individual properties as the characteristics of the performers' creative movements, whereas collective properties would be related to the interaction between these movements. In a previous study, we investigated the organization of the individual and collective properties during improvisation (Issartel et al., 2007). Participants were asked to freely move their forearm in the sagittal plane by exploring, without constraint, the whole range of frequencies. Using a wavelet analysis, we found an individual motor signature expressing the intrinsic dynamics that confine the motor behavior to a specific and limited range of frequencies. However, when two people interacted in an improvisation task, the individual motor signatures changed and were partially modulated to fit each other. More precisely, this emergence of collective properties between participants was observed in terms of movement frequencies that could lead to coordination.

Furthermore, using the well-known mirror game paradigm (Noy et al., 2011; Gueugnon et al., 2016), Hart et al. (2014) investigated the specific moments of togetherness in improvisation. Participants were asked to mirror each other and create interesting synchronized motion with and without a designated leader. They observed that each leader produced movements with a specific velocity profile (characterized by its skewness and kurtosis). Interestingly, in specific moments of togetherness, both players changed their motor signatures toward a universal signature (resembling the velocity profile of a sine wave) in order to be coordinated and improvise together. Finally, work on the organization of the individual and collective properties has been extended by Słowiński et al. (2016). They confirmed the presence of individual properties in terms of the velocity distribution of the improvised movements during the mirror game. By comparing the motor signatures and coordination of interactants, they showed that individual properties have to be taken into account in social coordination. Indeed, their results suggest that the similarity between individual signatures promotes interpersonal coordination during joint improvised action, leading to better "social glue," affiliation or social exchange (Wilson and Knoblich, 2005; Ashton-James et al., 2007; Semin, 2007; Hove and Risen, 2009; Miles et al., 2009; Semin and Cacioppo, 2015).

Overall, those joint-action characteristics are highly dependent upon individual capabilities. One common way to identify individual characteristics is to compare novices with experts. The idea is to quantify and qualify what makes an expert, that is, someone able to perform unique, optimized, efficient, and proficient movement patterns (Kiefer et al., 2011, 2013). To characterize individual movement expertise, researchers have targeted a specific population: expert dancers. For example, Kiefer et al. (2011, 2013) have highlighted that the balance skills of expert dancers lead to greater balance ability without compromising the adaptability and flexibility of the coordinative structure. Jarvis et al. (2014) reported higher trunk variability for experts prior to landing in a "sauté," while observing lower variability for this same group in all other kinematic and inter-segmental coordination measures. These results reveal the key role of individual variability in understanding movement pattern expertise. These individual characteristics were also considered in joint-action dance situations.

In joint-action situations, three main characteristics can be examined: (i) subjective, (ii) physiological, and (iii) kinematic markers of joint-action. The subjective measures evaluate the sense of togetherness experienced by the performers (Nachmanovitch, 1990; Seham, 2001). Those instants, referred to as "being in the zone" (in the context of an improvisation), are considered the peak moments in terms of performance and/or synchrony amongst performers. They tend to be accompanied by physiological responses, with increased heart rate associated with subjective ratings of togetherness (Noy et al., 2015). The kinematic markers of joint-action also reveal that a high level of togetherness between performers is characterized by smooth and symmetric movement properties (Hart et al., 2014). For example, those kinematic properties can be expressed in terms of movement amplitude, movement frequency or relative phase between the performers (Gueugnon et al., 2016). Along the same lines, Washburn et al. (2014) demonstrated that trained dancers have developed better visuo-motor coordination capabilities than untrained dancers. Experts are better at discriminating their partner's ongoing movement and anticipating future behavior (Calvo-Merino et al., 2010). Overall, in the context of complex actor-environment interaction, experts' better synchronization capabilities seem to play a role in activities of daily living. These capabilities would act as facilitators of social awareness and social entrainment as well as adaptive behavior.

This article investigates expertise in an improvisation task with the aim of identifying movement characteristics that reflect expertise in dance improvisation. This identification can be done at both the individual and collective levels, where we expect the markers of improvisation to change with expertise. We would then be able to examine how expertise modifies the joint effect of the maintenance tendency and the magnet effect. The experimental manipulation of two dimensions (individual versus collective characteristics, and expertise) allows a double comparison of the influence of an improvisation task on each of these dimensions. It also allows us to untangle the influence of expertise on individual and collective characteristics in an improvisation task. One would expect to observe a clear difference between the levels of expertise, with expert dancers individually performing a wider variety of movements. These differences would be magnified in the context of a joint-improvisation, where the magnet effect would tend to reduce the variety of movement produced at all levels of expertise while maintaining a clear difference between groups.

# MATERIALS AND METHODS

# Participants

Thirty-six participants were randomly paired within one of three groups of dance expertise. In the first group, called "Novice Dancers," participants had no experience of dance other than what most people would have had in their personal leisure time. The second group, called "Intermediate Dancers," had 4–5 years of experience in contemporary dance. Typically, they attended classes 2–3 times a week while also taking part in public performances as part of a troupe. The third group, called "Expert Dancers," had at least 10 years of experience as professional contemporary dancers. Informed written consent was obtained from all participants on the day of data collection. All participants were free to withdraw from the study at any stage. Full ethical approval was granted by the University Research Ethics Committee.

# Procedure and Design

Participants were seated on a chair with their right elbow resting on a table in front of them. They were instructed to look at a black dot placed at eye level on a wall located 2 m in front of them. For all experimental Conditions, participants were asked to move their right forearm in the sagittal plane while keeping their wrist and fingers constantly aligned with their forearm (i.e., no movement of the wrist or fingers). Their left hand rested on their left leg. Participants were instructed not to move their head or trunk and not to raise their elbow off the table. They were invited to freely move their forearm in the sagittal plane by exploring, without constraint, the full range of amplitude, phase, and frequency. Those free movements were performed in two Conditions ("Paired" and "Alone"). In the "Paired" Condition, participants were seated across from each other in such a way that their forearms were directly aligned with the black dot located in front of them. In this Condition, each participant was asked to take the movement of the other participant into account when performing his/her own movements. This setup was conceived to ensure that participants would only have peripheral vision of the other participant's forearm. In the second experimental Condition, called "Alone," the participant was on his/her own and was told, as mentioned above, to freely move his/her right arm in the sagittal plane. Each participant performed 1 block of 6 trials for each Condition (i.e., "Paired" and "Alone"). The Conditions were randomized across the pairs. The duration of each trial was 3 min, with a 2 min rest interval between trials. The experimental setup was similar to the one used by Issartel et al. (2007).

# Materials

Biometrics SG 110 elbow goniometers (Biometrics, Oxford, England) measured the flexion and extension of the forearm. Starting from the elbow's center of rotation, one end of the goniometer was attached to the forearm and the other end to the upper arm. The sampling rate was set at 50 Hz.

# Data Analysis

As participants were able to freely move their forearm, the collected time-series were non-stationary, preventing us from using traditional human movement signal processing methods (**Figure 1**). The method had to take into account the multi-frequency nature of the signal as well as the changes in phase that are usually observed in an improvisation-like task (see Issartel et al., 2007 for an example of improvisation-like data). The wavelet transform (WT) and the cross-wavelet transform (CWT) methods were used to quantify the signals in terms of frequency and phase (Schmidt et al., 2014). With these methods, multiple frequencies can be observed at the same time and over time, while also considering the relative phase for each of those frequencies. This approach opens the door to multi-scale signal analyses over finite spatial and temporal domains.

The WT and CWT methods transform traditional time-series into scalograms: an expression of the signal in frequency as a function of time. Those scalograms are obtained by the convolution of the time-series with an analyzing function (see Issartel et al., 2006, 2015 for more details). The scaling of this analyzing function determines the characteristic frequency of the signal at a given time. This analyzing function is also swept over time, giving us an analysis of the whole time-series for a set frequency range as a function of time. To cover the frequency range of participants' movement, the band of frequencies chosen for this analysis was 0.04–6.35 Hz. The analysis of the signals was performed with the Morlet analyzing function (order of 8, see Issartel et al., 2006).
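As an illustration of the convolution described above, the following minimal Python sketch computes a single Morlet wavelet coefficient by direct summation. It is not the authors' implementation. Reading "order of 8" as the Morlet parameter ω0 = 8, truncating the wavelet at four scales, the 32-value log-spaced frequency grid, and all function names are assumptions made for illustration; only the 50 Hz sampling rate and the 0.04–6.35 Hz band come from the text.

```python
import cmath
import math

FS = 50.0      # sampling rate (Hz) reported in the Materials section
OMEGA0 = 8.0   # assumption: "order of 8" read as the Morlet parameter omega_0


def scale_for(freq, omega0=OMEGA0):
    """Wavelet scale (in seconds) whose centre frequency equals `freq` (Hz)."""
    return omega0 / (2.0 * math.pi * freq)


def morlet(t, scale, omega0=OMEGA0):
    """Scaled complex Morlet wavelet evaluated at time offset t (seconds)."""
    x = t / scale
    envelope = math.exp(-0.5 * x * x)
    norm = math.pi ** -0.25 / math.sqrt(scale)
    return norm * envelope * cmath.exp(1j * omega0 * x)


def wavelet_coefficient(signal, index, freq, fs=FS, omega0=OMEGA0):
    """Coefficient at sample `index` and frequency `freq`, by direct summation."""
    scale = scale_for(freq, omega0)
    half = int(4.0 * scale * fs)  # assumption: ignore the wavelet beyond 4 scales
    lo = max(0, index - half)
    hi = min(len(signal), index + half + 1)
    total = 0j
    for n in range(lo, hi):
        total += signal[n] * morlet((n - index) / fs, scale, omega0).conjugate()
    return total / fs  # Riemann-sum approximation of the convolution integral


def log_spaced(f_min=0.04, f_max=6.35, count=32):
    """Log-spaced analysis frequencies over the band used in the article (count >= 2)."""
    ratio = (f_max / f_min) ** (1.0 / (count - 1))
    return [f_min * ratio ** k for k in range(count)]


# Usage: a 20 s test oscillation at 1 Hz, analysed at its midpoint.
signal = [math.sin(2.0 * math.pi * 1.0 * n / FS) for n in range(1000)]
power = {f: abs(wavelet_coefficient(signal, 500, f)) ** 2 for f in (0.5, 1.0, 2.0)}
```

Evaluating the coefficient at every sample and every analysis frequency would fill a scalogram; a practical implementation would typically rely on FFT-based convolution rather than this direct loop.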

For the "Alone" Condition, one scalogram was analyzed as described above. For the "Paired" Condition, the CWT analysis provides us with two separate scalograms. The first one is a scalogram that is a representation of the common frequencies between the two participants. The second one represents the relative phase for each of those common frequencies.
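The two outputs of the cross-wavelet analysis can be sketched from a pair of complex wavelet coefficients taken at the same time and frequency. The sketch below uses the common definition of the cross-wavelet coefficient as one coefficient multiplied by the complex conjugate of the other; the function names and the folding of the phase into the 0°–180° range are assumptions for illustration, not a description of the authors' software.

```python
import cmath
import math


def cross_wavelet(wx, wy):
    """Cross-wavelet coefficient of two complex wavelet coefficients."""
    return wx * wy.conjugate()


def common_power(wx, wy):
    """Magnitude of the cross-wavelet coefficient (shared energy)."""
    return abs(cross_wavelet(wx, wy))


def relative_phase_deg(wx, wy):
    """Relative phase in degrees, folded into the range 0 to 180 (assumption)."""
    return abs(math.degrees(cmath.phase(cross_wavelet(wx, wy))))


# Usage: participant y lags participant x by a quarter cycle.
wx = cmath.rect(2.0, 0.3)                   # amplitude 2, phase 0.3 rad
wy = cmath.rect(3.0, 0.3 - math.pi / 2.0)   # amplitude 3, lagging by 90 degrees
quarter_cycle = relative_phase_deg(wx, wy)
shared = common_power(wx, wy)
```

Applying these two functions at every time and frequency point would give one scalogram of common frequencies and one of relative phase, matching the two outputs described above.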

To characterize the performance of the participants, five variables were used. (i) We extracted the number of frequencies performed by the participants in each trial from the WT and CWT spectra. Along the same lines, (ii) we calculated the spread of the frequency range covered in each trial. The frequency range provides information about movement speed, so that we can determine whether some groups performed a wider range of frequencies and also slower and/or faster movements. To consider the energy content of the signal, an atomic reconstruction analysis was performed. The idea was to scan the whole WT spectrum to extract specific pocket-like events representing key moments during each trial. The reconstruction iterates over the spectrum to reveal the atoms containing local maxima within a 1 s vicinity (Bardainne et al., 2006). The stopping criterion was set at 90% of the reconstruction level to avoid the inclusion of local maxima that would arise as mathematical artifacts of the WT and CWT analysis. Those artifacts are mainly caused by the tradeoff between accuracy in time and accuracy in frequency that is inherent to such computations. Hence, the output of those analyses allows us to characterize (iii) the number of atoms, which represents the number of events occurring during the improvisation, as well as (iv) an estimate of their duration. Finally, to assess coordination in the "Paired" Condition, we extracted (v) the distribution of the relative phase.

# Statistics

Five ANOVAs were applied: to the number of frequencies, the frequency range, the number of Atoms, the duration of the Atoms, and the distribution of the relative phase. Sphericity was assessed for each of these variables. When sphericity was not met, the Greenhouse–Geisser correction for the degrees of freedom was applied. Bonferroni-corrected post hoc analyses were used where necessary to assess the direction of significant effects.
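For readers who wish to relate the effect sizes reported in the Results to the corresponding F statistics, partial eta squared can be recovered from an F value and its degrees of freedom through the standard identity η²p = (F × df_effect) / (F × df_effect + df_error). The short Python sketch below applies this identity and also shows the simplest form of the Bonferroni adjustment (each p value multiplied by the number of comparisons, capped at 1). The function names are hypothetical, the example p values are made up for illustration, and the statistical software used by the authors may differ in detail.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    weighted = f_value * df_effect
    return weighted / (weighted + df_error)


def bonferroni(p_values):
    """Bonferroni-adjusted p values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Usage: the Groups effect on Number of Frequencies, F(2,33) = 19.83.
eta = partial_eta_squared(19.83, 2, 33)      # about 0.546
adjusted = bonferroni([0.004, 0.02, 0.5])    # three illustrative pairwise p values
```

With three groups there are three pairwise comparisons, so each raw p value is multiplied by three in this simple form of the correction.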

# RESULTS

# Number of Frequencies

The 3 (Groups) × 2 (Conditions) repeated-measures ANOVA on Number of Frequencies yielded a significant main effect for Groups [F(2,33) = 19.83, p < 0.01, η<sub>p</sub><sup>2</sup> = 0.55]. There was no main effect for Conditions [F(1,33) = 0.6, p > 0.05, η<sub>p</sub><sup>2</sup> = 0.02] and no interaction effect between Conditions and Groups [F(2,33) = 0.46, p > 0.05, η<sub>p</sub><sup>2</sup> = 0.03]. Post hoc comparisons revealed significant differences between Novice and Intermediate Dancers (p < 0.01), Novice and Expert Dancers (p < 0.01), and Intermediate and Expert Dancers (p < 0.05), revealing that Intermediate Dancers performed more frequencies than Novice Dancers and that Expert Dancers performed more frequencies than Intermediate and Novice Dancers (**Figure 2**).

# Spread of Frequencies

The 3 (Groups) × 2 (Conditions) repeated-measures ANOVA on Spread of Frequencies yielded a significant main effect for Groups [F(2,33) = 7.71, p < 0.01, η<sub>p</sub><sup>2</sup> = 0.32]. There was no main effect for Conditions [F(1,33) = 1.32, p > 0.05, η<sub>p</sub><sup>2</sup> = 0.04] and no interaction effect between Conditions and Groups [F(2,33) = 1.56, p > 0.05, η<sub>p</sub><sup>2</sup> = 0.09]. Post hoc comparisons revealed a significant difference between Novice and Expert Dancers (p < 0.01), revealing that Expert Dancers explored a larger range of frequencies than Novice Dancers (**Figure 3**). There were no significant differences between Intermediate and Novice Dancers (p > 0.05) or Intermediate and Expert Dancers (p > 0.05), indicating that the Intermediate Dancers' behavior is situated between that of the Novice and the Expert Dancers.

# Number of Atoms

The 3 (Groups) × 2 (Conditions) repeated-measures ANOVA on Number of Atoms did not yield any significant main effect for Conditions [F(1,33) = 1.54, p > 0.05, η<sub>p</sub><sup>2</sup> = 0.05] or Groups [F(2,33) = 1.45, p > 0.05, η<sub>p</sub><sup>2</sup> = 0.08]. Also, there was no interaction effect between Conditions and Groups [F(2,33) = 0.41, p > 0.05, η<sub>p</sub><sup>2</sup> = 0.02]. This result indicates that the expertise level does not influence the number of events performed by the participants (**Figure 4**).

# Duration of Atoms

The 3 (Groups) × 2 (Conditions) repeated-measures ANOVA on Atoms Duration yielded a significant main effect for Groups [F(2,33) = 15.34, p < 0.01, η<sub>p</sub><sup>2</sup> = 0.48]. There was also a main effect for Conditions [F(1,33) = 9.94, p < 0.01, η<sub>p</sub><sup>2</sup> = 0.23] and an interaction effect between Conditions and Groups [F(2,33) = 3.7, p < 0.05, η<sub>p</sub><sup>2</sup> = 0.18]. Post hoc comparisons indicated significant differences between Novice and Intermediate Dancers (p < 0.01) and between Novice and Expert Dancers (p < 0.01) for both Conditions, revealing that both Intermediate and Expert Dancers tend to perform each atom for a shorter duration than Novice Dancers (**Figure 5**). Also, Novice Dancers in the Alone Condition performed each movement for a longer period of time than in the Paired Condition (p < 0.01). At the Condition level, there was no significant difference between Intermediate and Expert Dancers (p > 0.05).

# Distribution of the Relative Phase

The relative phase values were extracted from the CWT spectrum. The distribution of the relative phase angles was determined across six 30° regions of relative phase between 0° and 180°. A 3 (Groups) × 6 (Phase regions) ANOVA yielded a significant group difference for the 30°–60° region [F(2,15) = 4.61, p < 0.05, η<sub>p</sub><sup>2</sup> = 0.49] and for the 150°–180° region [F(2,15) = 14.87, p < 0.05, η<sub>p</sub><sup>2</sup> = 0.58]. Post hoc analyses revealed two significant differences between Intermediate and Expert Dancers. Firstly, Expert Dancers explored the 30°–60° region more often than the Intermediate Dancers. Secondly, the results suggest a higher entrainment of Intermediate Dancers toward the anti-phase region (150°–180°) in comparison with the Expert Dancers. No other significant differences were found (**Figure 6**).
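The binning of relative phase into six 30° regions can be sketched as follows. This is an illustrative Python example under stated assumptions: the input is taken to be a list of relative phase angles in degrees, angles outside 0°–180° are folded back into that range, and the output is the proportion of samples in each region. The function name and these conventions are hypothetical and are not taken from the authors' analysis code.

```python
def phase_distribution(phases_deg, width=30.0, n_bins=6):
    """Proportion of relative phase samples in each 30-degree region of 0 to 180."""
    if not phases_deg:
        raise ValueError("at least one relative phase value is required")
    counts = [0] * n_bins
    for phase in phases_deg:
        # Fold any angle into 0..180 degrees (e.g., 190 -> 170, -45 -> 45).
        folded = abs(((phase + 180.0) % 360.0) - 180.0)
        # A value of exactly 180 is assigned to the last region.
        index = min(int(folded // width), n_bins - 1)
        counts[index] += 1
    total = len(phases_deg)
    return [count / total for count in counts]


# Usage: six illustrative samples, half of them close to anti-phase.
distribution = phase_distribution([0.0, 10.0, 45.0, 170.0, 180.0, 190.0])
```

In this example the first region (0°–30°) holds two samples, the second (30°–60°) holds one, and the last (150°–180°) holds three, so the proportions sum to one across the six regions.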

# DISCUSSION

This study aimed to investigate the movement characteristics reflecting expertise in dance improvisation. Three levels of expertise were considered (novice, intermediate, and expert dancers). To identify the individual characteristics, each dancer performed an improvisation on his/her own. To analyze the collective properties, dancers performed an improvisation task in pairs. The results clearly show a pathway from novice to expert in the type of movement performed by dancers. This pathway was found in both individual and collective improvisation.

When scrutinizing the experts' specific behavior, the larger number of frequencies performed (**Figure 2**) illustrates a richer movement production, as they explore a larger and more widely spread range of frequencies. In other words, they can produce a wider range of actions while also exploring more frequencies within this wider spectrum. Experts perform slower movements (lower frequencies) in comparison with novices and intermediates. It is important to highlight that, in terms of "difficulty/complexity," those movements could have been performed by novices and intermediates. There are no mechanical, physiological or neuromuscular constraints that could explain the absence of certain types of movement. This observation crystallizes the unique capability of expert dancers to produce, on their own but also in interaction with others, certain movements that everyone could perform but that only experts actually perform. In other words, everyone is capable of performing this wide range of actions, but only experts manage to explore it in the context of this improvisation. This trait is central to our understanding of dance expertise and, more widely, of movement expertise in general. Expert dancers are able to produce a unique motor performance within the same range of possibilities available to novice and intermediate dancers. Expert and Intermediate dancers also tend to move on from one type of action to the next more often than Novice dancers (i.e., shorter atom duration), while going through intermediate phases that lead to the next phase of joint-action. Overall, those findings demonstrate that the amount of experience in moment-to-moment improvisation enhances the capability and capacity of the performers.

FIGURE 2 | Number of Frequencies for the 3 Groups (Novice, Intermediate, and Expert) and the 2 Conditions (Alone and Paired). Asterisks (∗) indicate significant differences (p < 0.05).

As classically reported in the literature, behavioral synchrony has been described as a marker of expertise (Noy et al., 2011; Sofianidis et al., 2012, 2014; Washburn et al., 2014). Expertise can be characterized as an ability to be more attuned to the "information about sequence structure and upcoming movement possibilities" (Washburn et al., 2014, p. 11). It involves a better ability to distinguish grammatical sequences (Opacic et al., 2009) and to read current and future events. It is an ability to jointly consider the performer's own movement capabilities and the expected capabilities of the confederate. Dance expertise favors the emergence of moment-to-moment coupling (in both frequency and phase) and better movement discrimination, such as deciphering what a partner will perform while also being able to anticipate future events. This in turn facilitates the synchronization between the performers (Calvo-Merino et al., 2010). Those two elements, anticipation and discrimination of the moment-to-moment coordinated performance, would occur concomitantly in an improvisation task. The interaction between anticipation and discrimination can be discussed in line with the concepts of maintenance tendency and magnet effect. Being able to discriminate a partner's movement would in turn facilitate the magnet effect and therefore the social entrainment between the two performers. At the same time, being able to better anticipate a partner's movement would enhance the performer's choice of action. The maintenance tendency would then be at play, guiding the performer to continue exploring with his/her own individual movement characteristics (von Holst, 1937). In other words, the more the dancers anticipate, the more they can keep their own motor signature. The same principle applies when a couple of salsa dancers are perfectly in phase but the female partner adds extra little moves with her head or leg. It is because she anticipates the movement of her co-actor that she can maintain her own motor signature and add other ancillary movements. In addition, when the female dancer is able to anticipate, the male dancer is more inclined to maintain his own performance (maintenance tendency). This point in a way contradicts Washburn et al.'s (2014) argument, as they suggest that dancers' higher level of coordination could be due either to a better ability at (i) discriminating movement properties or (ii) anticipating confederate actions, independently of their own action capabilities. Based on the specific expert behavior observed in this study, expert improvisation seems to reflect the conjunction of the individual and collective properties (the alliance of maintenance tendency and magnet effect) rather than a dissociation between the performers' action capabilities and their ability to discriminate and anticipate the actions of others.

The unique characteristics of expertise can also be interpreted in terms of experts' ability to optimize task constraints (Newell, 1986; Sofianidis et al., 2012), enabling the emergence of complex physical movement (Kiefer et al., 2011). Also, as proposed by Sofianidis et al. (2015, p. 216), expert dancers may have an "improved multisensory integration capacity." The authors made this point in the context of an interpersonal ankle/hip synchronization task in which expert dancers displayed a more stable ankle/hip phase relationship. The unique characteristics of expertise observed in our study are in line with the findings of Sofianidis et al. (2015) and those of Washburn et al. (2014) described above. On the one hand, expert dancers have the capacity to produce unique movements while taking into account the movements proposed by their partner. The observed coordinated behavior reflects the combination of their own movement capabilities and their ability to discriminate the information in the confederate's action while also anticipating future movements. On the other hand, novices were less capable of anticipating and discriminating while also having reduced movement capabilities, resulting in a reduced variety of movement, a lower range of frequencies and a tendency to maintain any performed frequency for longer.

As for intermediate participants, it seems they are "on the way to becoming experts" in the sense that, across all key variables, they do not behave as novices but are not yet experts. However, the relative phase results are unique and raise an interesting discussion point. Intermediate dancers explore the anti-phase region more than both experts and novices. Why aren't experts using this kind of coordination? Is it a lack of expertise? This argument does not appear very convincing, as the experts have five more years of experience. They have been employed by professional choreographers for years to create and perform public performances. If their expertise does not explain those differences, then we should consider the nature of the relationship between frequency and relative phase in movement production. To contextualize this interaction, it seems important to refer to the HKB model (Haken et al., 1985), which demonstrates that a modification of the control parameter (e.g., frequency) alters the order parameter (e.g., relative phase). More specifically, in Bardy et al.'s (2002) experiment, participants stood in front of a large video screen and were asked to track the front-to-back oscillations of a video graphic target that varied in frequency in a stepwise manner. The authors observed a qualitative change in the order parameter (the relative phase between the ankle and the hip) due to the increased frequency of target motion. In the current improvisation task, we observed that expert dancers produced a larger range of movement frequencies as well as a higher number of frequencies. Those unique frequencies, developed only by experts, seem to characterize dance expertise. As a consequence, it seems possible that this unique set of frequencies has a knock-on effect on their ability to also produce a wide range of relative phases (even non-natural ones, as when performing a 30°–60° relative phase).
This argument is in line with the performance of the intermediate dancers. This group performed more anti-phase movement than the expert dancers while being unable to perform the same range of frequencies as the expert group. This finding opens the door to future research: could practice/learning bring the expert dancers to the next level, where they would be able to maintain their range of movement frequencies while performing more anti-phase coordination? Likewise, would expert dancers be better at coordinating in an unusual range of relative phase (30°–60°) that is only possible after learning such a non-spontaneous range of coordination (Zanone and Kelso, 1992)?

Overall, the improvisation situation proposed in this study revealed that expert dancers are able to come up with a unique creative performance through movement patterns in space and time. Not only are those creative performance characteristics present in a solo improvisation; unique expertise traits were also found in the joint-improvisation. Results of this study revealed that experts developed specific non-verbal communication, through their unique movement patterns, as observed with the behavioral markers discussed above. Expert dancers are attuned to their own movement patterns (Opacic et al., 2009) and also to those of their partners during a creative performance. This acquired double propensity to perform a unique set of movements while taking into account the confederate's movement seems to be a signature of dance experts in the context of a joint-improvisation. Overall, better social coordination ability coupled with higher action capabilities (and/or creativity) could enhance daily life social activities by increasing cohesion and communication (Dale et al., 2014). In that sense, this expertise could also bring better adaptive behavior in the workplace and/or during any type of group physical activity.

# ETHICS STATEMENT

All authors acknowledge ethical responsibility for the content of the manuscript and will accept the consequences of any ethical violation. This work received full ethical approval from University of Montpellier (France).

# AUTHOR CONTRIBUTIONS

JI and LM conceived and designed the experiment. JI performed the data collection and data analysis. JI, LM, and MG wrote the article.

# FUNDING

This experiment was supported by the European Project AlterEgo, FP7 ICT 2.9 – Cognitive Sciences and Robotics, Grant Number 600610.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Issartel, Gueugnon and Marin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# To Pass or Not to Pass: Modeling the Movement and Affordance Dynamics of a Pick and Place Task

Maurice Lamb<sup>1</sup>\*, Rachel W. Kallen<sup>1</sup>, Steven J. Harrison<sup>2</sup>, Mario Di Bernardo<sup>3,4</sup>, Ali Minai<sup>5</sup> and Michael J. Richardson<sup>1</sup>

<sup>1</sup> Center for Cognition, Action and Perception, University of Cincinnati, Cincinnati, OH, United States, <sup>2</sup> Department of Kinesiology, University of Connecticut, Connecticut, CT, United States, <sup>3</sup> Department of Electrical Engineering and Information Technology, University of Naples Federico II, Naples, Italy, <sup>4</sup> Department of Engineering Mathematics, University of Bristol, Bristol, United Kingdom, <sup>5</sup> Department of Electrical Engineering and Computing Science, University of Cincinnati, Cincinnati, OH, United States

#### Edited by:

Hanne De Jaegher, University of the Basque Country, Spain

#### Reviewed by:

Varun Dutt, Indian Institute of Technology Mandi, India Loïc Deschamps, University of Technology of Compiègne, France

> \*Correspondence: Maurice Lamb maurice.lamb@uc.edu

#### Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

Received: 14 December 2016 Accepted: 08 June 2017 Published: 28 June 2017

#### Citation:

Lamb M, Kallen RW, Harrison SJ, Di Bernardo M, Minai A and Richardson MJ (2017) To Pass or Not to Pass: Modeling the Movement and Affordance Dynamics of a Pick and Place Task. Front. Psychol. 8:1061. doi: 10.3389/fpsyg.2017.01061

Humans commonly engage in tasks that require or are made more efficient by coordinating with other humans. In this paper we introduce a task dynamics approach for modeling multi-agent interaction and decision making in a pick and place task where an agent must move an object from one location to another and decide whether to act alone or with a partner. Our aims were to identify and model (1) the affordance related dynamics that define an actor's choice to move an object alone or to pass it to their co-actor and (2) the trajectory dynamics of an actor's hand movements when moving to grasp, relocate, or pass the object. Using a virtual reality pick and place task, we demonstrate that both the decision to pass or not pass an object and the movement trajectories of the participants can be characterized in terms of a behavioral dynamics model. Simulations suggest that the proposed behavioral dynamics model exhibits features observed in human participants including hysteresis in decision making, non-straight line trajectories, and non-constant velocity profiles. The proposed model highlights how the same low-dimensional behavioral dynamics can operate to constrain multiple (and often nested) levels of human activity and suggests that knowledge of what, when, where and how to move or act during pick and place behavior may be defined by these low dimensional task dynamics and, thus, can emerge spontaneously and in real-time with little a priori planning.

Keywords: behavioral dynamics, affordance dynamics, joint-action, pick and place, dynamical systems theory

# INTRODUCTION

Living and working in shared spaces often requires that individuals coordinate their actions together to accomplish shared behavioral goals. From a busy family preparing for the day to a couple casually loading a dishwasher together after a dinner party, interpersonal coordination often results in tasks being achieved more quickly and efficiently. Indeed, the addition of other individuals within a task action space constructively increases the complexity of (sub-)task behaviors over time by creating new (and destroying old) opportunities for action. Previous attempts to understand how the behavioral order of such joint-action coordination emerges over time have largely focused on identifying the representational and neural structures that support successful joint-action, including social action understanding and the perception of others' intentional states (e.g., Rizzolatti and Craighero, 2004; Newman-Norlund et al., 2007; Graf et al., 2009; Sebanz and Knoblich, 2009). Equally important, however, is identifying the dynamical processes or laws that not only operate to constrain what and when behavioral actions are afforded during joint-activity, but also naturally shape the movement patterns or trajectories employed in the actualization of task relevant action possibilities.

Interestingly, previous research investigating the dynamical processes of coordinated joint-action and multiagent activity has demonstrated that the behavioral order of such activity is often self-organized and synergistic, naturally emerging from the task-relevant physical, biomechanical, and informational couplings and constraints that exist between co-actors and within a joint-action task space (e.g., Schmidt et al., 1990, 2012; Schmidt and O'Brien, 1997; Marsh et al., 2006; Frank and Richardson, 2010; Richardson et al., 2010; Riley et al., 2011; Anderson et al., 2012; Richardson and Kallen, 2015; Washburn et al., 2015). In turn, a growing number of researchers have also argued that multiagent activity is best conceptualized as a complex dynamical system and, moreover, that the behavioral order of self-organized, synergistic multiagent coordination can be understood and modeled using low-dimensional task or behavioral dynamics principles (e.g., Schmidt et al., 1990, 1998; Warren, 2006; Lagarde, 2013; Dumas et al., 2014; Richardson and Kallen, 2015; Richardson et al., 2015).

Motivated by this latter claim, the objective of the current study was to identify and model the dynamics that are relevant to social and joint-action object moving and passing tasks. As an initial exploration of these dynamics, a relatively simple object pick and place task was employed, in which one actor had to move objects from one tabletop location to another either alone or by passing the object to a co-actor. Of particular concern was identifying and modeling the affordance related dynamics that defined an actor's choice to move an object alone or to pass it to their co-actor and the trajectory dynamics of an actor's hand movements when moving to grasp, relocate, or pass the object. With regard to the latter aim, we were interested in determining whether the simple behavioral dynamics model of route selection and locomotory path navigation previously developed by Fajen and Warren (2003, 2004; also see Warren, 2006; Warren and Fajen, 2008) could be successfully generalized to model the smaller scale hand movement trajectories that occur during object pick and place tasks. We were also interested in determining whether an actor's choice of pass/release location is modulated by the location of the intended target and/or the location of a co-actor's hand. Below, we briefly review research and theory most relevant to these issues, prior to further detailing the specifics of the current study and the hypotheses being investigated.

# Affordances and Affordance Dynamics

Affordances are opportunities for action within an agent-environment system (Gibson, 1979; Michaels and Carello, 1981; Shaw and Turvey, 1981; Turvey et al., 1981; Reed, 1996; Chemero, 2003). More specifically, affordances are lawful agent-environment action potentials that capture the complementary relation (the "fit") between an agent and the environment. For instance, a surface of a given height affords climbing (or not) in relation to an individual's body height and leg length (Warren, 1984). When sitting, an object is reachable (or not) based on the distance of the object relative to the arm-torso extension capabilities of the reaching agent.

Of course, if a human agent is allowed to stand and walk then any object is reachable and affords grasping so long as its size and weight are within the strength and grasping capabilities of the agent concerned. In addition to standing and walking over to grasp an object, a human agent could also use a stick or a pole to move an object within reaching distance. Similarly, if another agent with sufficient lifting capabilities is standing closer to a goal object, the human agent who wishes to reach and grasp the object in question could always ask that other agent to move the object to a location within their reach or simply pass it directly into their hand. The significance of these latter examples is that they highlight how affordances are not only defined in relation to the bodily capabilities of an individual agent, but are also defined in relation to human-tool systems (Shaw et al., 1995; Smitsman, 1997; Bongers et al., 2004) and joint-action or multiagent systems (Stoffregen et al., 1999; Richardson et al., 2007, 2010). Extending or increasing the degrees-of-freedom of one's perceiving-acting system via the embodiment of tools and cooperative co-action not only increases the number of different ways in which a certain affordance can be actualized, but can also increase the number of action possibilities or affordances that are available within an agent-environment system. For instance, a nail only affords hammering for a hammer-hand system. A large sofa only affords lifting and moving for a two- or more-person system.

With regard to understanding the dynamics of human and multiagent coordination, affordance research has revealed that action- or body-scaled ratios that capture the intrinsic relation between action relevant properties of an agent or multiagent system, A, and an environmental surface or object, E, can be used to predict critical shifts in the perception and/or actualization of affordances (e.g., Warren, 1984; Mark, 1987; Warren and Whang, 1987; Kinsella-Shaw et al., 1992; Richardson et al., 2007). For example, individuals spontaneously transition from reaching by extending their arm, to reaching by bending at the hip and extending their arm, to reaching by bending from an upright posture while extending their arm at critical action-scaled (E/A) ratios characterizing relevant relations between object distance and height in terms of the agent-environment system (e.g., Carello et al., 1989; Mark et al., 1997). Similarly, individuals exhibit abrupt transitions between one-hand and two-hand grasping, and between one-person and two-person grasping at critical object-size/hand-size and object-size/arm-span ratios, respectively; typically at an E/A ratio of 0.75 (e.g., van der Kamp et al., 1998; Richardson et al., 2007). Accordingly, E/A (where E is a measured action relevant environmental property and A is the measured action relevant property of the agent) represents a generic control parameter that not only defines the afforded state(s) of an agent-environment system, but also characterizes the stability of the behavioral modes employed to actualize those afforded states (e.g., Warren, 1984; Mark et al., 1997).

With this control parameter in hand, subsequent research investigating the dynamics of affordance actualization has revealed that individuals do not always transition from one behavioral mode to another at the same critical E/A ratio (i.e., exhibit critical point transitions). Rather, individuals typically exhibit hysteresis, in that they transition between different affordance related behavioral modes at different E/A values depending on whether E/A is increased over time or decreased over time (e.g., Fitzpatrick et al., 1994; van der Kamp et al., 1998; Richardson et al., 2007). For instance, individuals transition between one-hand and two-hand, and between one-person and two-person grasping at higher E/A ratios when object size is scaled from small to large (approximately 0.85) than when object size is scaled from large to small (approximately 0.65; e.g., van der Kamp et al., 1998; Richardson et al., 2007). The significance of hysteresis with regard to understanding the dynamics of human behavior is that it implies multi-stability (two or more states or modes of behavior are stable over a range of control parameter settings), as well as nonlinearity (e.g., Strogatz, 1994; Kelso, 1995; Richardson et al., 2014). As such, affordance transitions can be conceptualized as bifurcation events, with affordance dynamics modeled as a nonlinear dynamical system (e.g., Frank et al., 2009; Lopresti-Goodman et al., 2011; Harrison et al., 2016).
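
The link between hysteresis and multi-stability described above can be illustrated with a minimal numerical sketch. The bistable system used here, dx/dt = k(r − r_c) + x − x³, is a generic textbook form chosen purely for illustration; it is not the model used in the studies cited above. In this sketch r stands for the E/A ratio, negative and positive x stand for two behavioral modes (e.g., one-person and two-person grasping), and the function names `settle`, `sweep`, and `switch_ratio`, as well as the values of k and r_c, are hypothetical choices tuned so that both modes coexist for r between roughly 0.65 and 0.85.

```python
import numpy as np

def settle(x, drive, dt=0.01, steps=5000):
    """Euler-integrate dx/dt = drive + x - x**3 for a fixed drive."""
    for _ in range(steps):
        x += dt * (drive + x - x**3)
    return x

def sweep(ratios, x0, k=3.85, r_c=0.75):
    """Step the E/A ratio through `ratios`, carrying the state forward."""
    x, modes = x0, []
    for r in ratios:
        x = settle(x, k * (float(r) - r_c))
        modes.append(x > 0)  # True = "two-person" mode (assumed label)
    return modes

def switch_ratio(ratios, modes):
    """Return the first ratio at which the mode differs from the starting mode."""
    for r, mode in zip(ratios, modes):
        if mode != modes[0]:
            return float(r)
    return None

# Ascending sweep starts in the "one-person" mode, descending in the other.
ratios_up = np.arange(0.40, 1.1001, 0.01)
ratios_down = ratios_up[::-1]
up = switch_ratio(ratios_up, sweep(ratios_up, x0=-1.0))
down = switch_ratio(ratios_down, sweep(ratios_down, x0=1.0))
print(f"ascending switch near E/A = {up:.2f}, descending near {down:.2f}")
```

Because the state is carried forward from one ratio to the next, the switch occurs at a higher ratio on the ascending sweep than on the descending sweep, which is the signature of hysteresis in a bistable system.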

# Joint-Action Pick and Place Behavior

In its simplest form a pick and place task involves an individual picking up a specified object and moving that object to a specified location. Understanding the nested sequencing of sub-action movements entailed by such behavior is non-trivial, however, given the large number of redundant degrees-of-freedom of the human movement systems and the underdeterminacy in endpoint trajectories and/or joint angle configurations that this redundancy creates. Accordingly, there has been an extensive amount of research on such behavior, including research on the relationship between movement time, velocity, distance, and target goal size, path or trajectory length minimization, end-state comfort dynamics, end-effector vs. limb-joint control, hand-eye coordination, and so on (e.g., Fitts, 1954; Flash and Hogan, 1985; MacKenzie et al., 1987; Dean and Brüwer, 1994, 1997; Wolpert, 1997; Flash and Sejnowski, 2001; Jax et al., 2007; Rosenbaum et al., 2012). Of particular relevance here is the well-established finding that, given an obstacle free environment, humans tend to reach for and move hand-held objects along (i) a relatively straight line trajectory between pickup and drop-off locations, with (ii) a non-stationary, bell-shaped velocity profile that minimizes jerk and has a peak velocity between 1/3 and 1/2 of the way through a movement (e.g., Fitts, 1954; MacKenzie et al., 1987; Dean and Brüwer, 1997; Flash and Sejnowski, 2001; Jax and Rosenbaum, 2007).
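
The bell-shaped velocity profile mentioned above can be made concrete with the standard minimum-jerk polynomial for a straight point-to-point movement (the model associated with Flash and Hogan, 1985). The sketch below is only an illustration: the function name `min_jerk` and the 0.4 m distance and 1 s duration are assumptions, not values from the studies cited. Note that the pure minimum-jerk profile is symmetric, so its peak falls exactly at the midpoint of the movement, the upper end of the 1/3 to 1/2 range reported above.

```python
import numpy as np

def min_jerk(distance, duration, n=1001):
    """Position and velocity of a minimum-jerk reach of given distance and duration."""
    t = np.linspace(0.0, duration, n)
    tau = t / duration  # normalized time in [0, 1]
    pos = distance * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)
    vel = (distance / duration) * (30 * tau**2 - 60 * tau**3 + 30 * tau**4)
    return t, pos, vel

# Illustrative reach: 0.4 m in 1 s (assumed values).
t, pos, vel = min_jerk(distance=0.4, duration=1.0)
peak_index = int(np.argmax(vel))
print(f"peak velocity {vel[peak_index]:.3f} m/s at t = {t[peak_index]:.2f} s")
```

The velocity starts and ends at zero and peaks at 1.875 times the mean velocity, which gives the characteristic non-constant, bell-shaped profile.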

There is also a growing body of literature on joint-action pick and place behavior, including the effects of action observation on an actor's hand movement trajectories and grasping behavior (e.g., Becchio et al., 2008, 2012; Costantini et al., 2011; Ellis et al., 2013), the movement and action decision dynamics of individuals working independently of one another in a shared task space (Meulenbroek et al., 2007; Lorenz et al., 2014; Meyer et al., 2016; Scharoun et al., 2016), and when and how participants grasp, hold, and move objects together (e.g., Georgiou et al., 2007; Richardson et al., 2007; Vesper et al., 2009). As detailed above, joint-action pick and place behavior can also involve one agent passing an object to another agent when there is sufficient interaction between co-actors (Becchio et al., 2008; Meyer et al., 2013), with such interaction further increasing the constructive under-determinacy of how individuals are able to move an object from one location to another. Interestingly, although there is some recent evidence to suggest that individuals tend to pass objects to co-actors in a manner that maximizes the beginning-state comfort of the co-actor (so called third-order motor planning; e.g., Ray and Welsh, 2011; Meyer et al., 2013), little is known about the location where actors choose to place or release an object for another co-actor in an under-constrained joint-action pick and place task. Indeed, when one actor chooses to pass, place, or release an object for another individual to move within a real-world context, a specific release/passing location is rarely pre-defined or specified prior to the passing action. A modest number of studies have started to examine this latter question within the context of human-robot interaction (e.g., Cakmak et al., 2011; Strabala et al., 2013) and have found that individuals prefer predictable pass locations and orientations. However, the highly constrained nature of the task contexts and object hand-over manipulations employed in these latter studies means that it is hard to generalize the results of these studies to human-human pick and place behavior (also see Shibata et al., 1995). Accordingly, a sub-aim of the current study was to begin to address this gap in the literature and, in particular, begin to identify the degree to which individuals spontaneously choose object pass and release locations as a function of a waiting co-actor's hand location and/or the final target goal location of the to-be-moved object.

Both the previous research outlined in this section and our own piloting indicated that when neither co-actor was constrained, it was not clear whether pass decisions and locations depended on co-actor movements while awaiting a pass, on co-actor movements once they received a pass (i.e., the passer's perception of the receiver's action capabilities), or only on features of the task environment. Often the person waiting on the pass would move prior to receiving the pass, though what drove that movement was not clear from the data. Notably, this means that in order to interpret and model joint-action pick and place behavior, we first needed to model and understand features of pass decisions in a social pick and place task, where interactions between co-actors are minimized. Thus, while the current task involved social action, it was not a joint-action task (Becchio et al., 2008, 2012). By starting with the current social action task, the results of the current experiment will facilitate understanding and modeling joint-action pick and place behaviors when pass decisions and behaviors are relatively unconstrained and co-actor behaviors become more interdependent. As such, the task we present in this paper is important for joint-action research because it fills significant gaps in the literature on joint-action pick and place tasks, including understanding where and when individuals pass to a human co-actor in an otherwise unconstrained task space.

# Modeling Behavioral Dynamics

The term "behavioral dynamics" refers to a general framework for understanding and modeling the complex movement dynamics that characterize the behavior of actors within an agent-environment system. First detailed by Warren (2006) in order to understand the complex movement patterns of individuals performing solo-action tasks, the approach employs task specific models (Saltzman and Kelso, 1987) to discern the dynamics of coordinated behavior, and is equally applicable to joint-action and multiagent activity (e.g., Dachner and Warren, 2014; Rio and Warren, 2014; Richardson et al., 2015). Consistent with the more general dynamical and complex systems approach to human behavior (e.g., Kugler et al., 1980; Saltzman and Kelso, 1987; Thelen et al., 1994; Richardson et al., 2014), it places a strong emphasis on self-organization and contextual emergence, and, in turn, attempts to formally (mathematically) model human and multiagent behavior as emerging from the lawful interaction of physical and informational processes, biomechanical couplings, and contextual constraints.

A key requirement for modeling the behavioral dynamics of a specific action or movement task effectively is to define a functional, yet low-dimensional description of the corresponding task space. This includes appropriately defining (i) the task goal in terms of the relevant terminal objective, (ii) the minimal number of task dimensions (i.e., axes and task variables) required to express this terminal objective, and (iii) the task dynamic topology (equations of motion) for each task dimension and degree-of-freedom (Saltzman and Kelso, 1987; Warren, 2006). A foundational example of such task dynamics modeling is provided by the work of Fajen and Warren (Fajen and Warren, 2003, 2004; also see Warren, 2006; Warren and Fajen, 2008 for a review), in which the authors successfully modeled the self-organized behavioral dynamics of human locomotory navigation and route selection. Although the complete model proposed by Warren and Fajen is able to successfully capture route switching dynamics in relation to moving and stationary environmental goal locations and obstacles, of primary relevance here is the simple manner by which they modeled the locomotory trajectories of agents moving from an arbitrary start location to a fixed goal position. In this (sub)model, a locomoting agent was defined abstractly (at the whole-body level) as a directional point-mass within a Euclidean (x, y) planar task environment, with the agent's heading direction, $\varphi$, and the angle of the target goal location, $\theta_g$, defined with respect to one of the planar task axes (i.e., an exocentric reference frame was employed). The terminal objective of the locomoting agent was then defined as simply turning toward a target goal location by changing their heading direction or turning rate, $\dot{\varphi}$, until $\varphi - \theta_g = 0$. The topology of this terminal objective was captured using the adapted mass-spring system:

$$
\ddot{\varphi} = -b_g \dot{\varphi} - k_g \left( \varphi - \theta_g \right) f(d_g), \tag{1}
$$

where $\dot{\varphi}$ and $\ddot{\varphi}$ correspond to the velocity and acceleration of the agent's heading angle, $\varphi$, and $b_g$ and $k_g$ are damping and spring/stiffness terms, such that $-b_g\dot{\varphi}$ acts as a friction force on the turning rate, and the function $-k_g(\varphi - \theta_g)$ operates to minimize the difference between the agent's current heading angle, $\varphi$, and the angle, $\theta_g$, that will lead the agent toward the goal. Finally, $f(d_g)$ is a function that modulates the rate of change in heading angle as a function of the distance, $d_g$, to the goal; typically this is set such that the closer the goal, the more rapidly deviations of $\varphi$ away from $\theta_g$ are minimized.
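
To make the behavior of Equation (1) concrete, the following is a minimal simulation sketch of a point-mass agent steering toward a fixed goal at constant speed. Several ingredients are assumptions made only for illustration and are not specified in the text above: the distance function is taken to be f(d_g) = exp(−c1·d_g) + c2, the parameter values (b, k, c1, c2, speed) are arbitrary illustrative choices, the heading error is wrapped to (−π, π], and the function name `simulate_steering` is hypothetical.

```python
import math

def simulate_steering(start, goal, heading0, speed=1.0, b=3.25, k=7.5,
                      c1=0.4, c2=0.4, dt=0.01, max_steps=5000, tol=0.05):
    """Integrate Equation (1) for an agent moving at constant speed toward a goal."""
    x, y = start
    phi, phi_dot = heading0, 0.0
    path = [(x, y)]
    for _ in range(max_steps):
        dx, dy = goal[0] - x, goal[1] - y
        d_g = math.hypot(dx, dy)
        if d_g < tol:  # close enough to the goal: stop
            break
        theta_g = math.atan2(dy, dx)  # goal angle in the exocentric frame
        # Heading error phi - theta_g, wrapped to (-pi, pi]
        err = math.atan2(math.sin(phi - theta_g), math.cos(phi - theta_g))
        f = math.exp(-c1 * d_g) + c2  # assumed form of f(d_g)
        phi_ddot = -b * phi_dot - k * err * f  # Equation (1)
        phi_dot += phi_ddot * dt
        phi += phi_dot * dt
        x += speed * math.cos(phi) * dt
        y += speed * math.sin(phi) * dt
        path.append((x, y))
    return path

# The agent starts at the origin heading along the x axis; the goal is up and to the right.
goal = (3.0, 4.0)
path = simulate_steering(start=(0.0, 0.0), goal=goal, heading0=0.0)
final_gap = math.hypot(goal[0] - path[-1][0], goal[1] - path[-1][1])
print(f"steps taken: {len(path) - 1}, final distance to goal: {final_gap:.3f}")
```

Because the agent initially heads away from the goal direction, the damped-spring term turns it smoothly toward the goal, so the resulting path is curved rather than a pre-planned straight line.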

Although it might be hard to imagine that a simple system such as Equation (1) could effectively capture any form of complex human movement behavior, the ability of Equation (1) to successfully predict the steering and locomotory navigation behavior of human agents has been verified across numerous experimental procedures and environmental task contexts. With the addition of a similar obstacle avoidance function<sup>1</sup>, the model has provided strong evidence that such behavior can emerge without a priori planning as a self-organized result of interacting environmental attractors and repellers (see Warren, 2006; Warren and Fajen, 2008 for reviews). Recent research has also demonstrated how similar route selection equations can be extended to a range of complex multi-agent locomotion or pedestrian tasks (e.g., Dachner and Warren, 2014; Rio and Warren, 2014) and that the behavioral dynamics approach more generally can be employed to understand and identify the low-dimensional dynamical laws that underlie a wide range of joint-action and multiagent movement coordination tasks (e.g., Lucas et al., 2015; Richardson et al., 2015, 2016).

# Current Study

As stated above, the objective of the current study was to begin to explore the behavioral dynamics that underlie social and joint-action object moving and passing tasks using a relatively simple object pick and place task, in which one participant had to move objects from one tabletop location to another either alone or by passing the object to a co-actor. The key manipulation was the relative distance of the starting (appearance) and target goal (drop-off) locations of the to-be-moved object with respect to the standing position of the participant and co-actor, with a specific range of appearance and drop-off locations chosen to identify and model three central facets of social and joint-action pick and place behaviors, namely: (1) the affordance dynamics that characterized an actor's choice to move an object alone (i.e., not passing) or to pass it to the co-actor; (2) where a participant chooses to pass/release an object and the degree to which this pass location is modulated by the location of the intended target; and (3) the trajectory dynamics of the participant's hand movements when moving toward, with, or passing an object.

<sup>1</sup>Fajen and Warren (2003) modeled the change in $\varphi$ with respect to a stationary point-mass obstacle by adding the function $+\sum_{i}^{N} k_o (\varphi - \theta_{o_i})\, e^{-|\varphi - \theta_{o_i}|} f(d_{o_i})$ to Equation (1), where $+k_o(\varphi - \theta_o)$ operates to push the agent's heading direction, $\varphi$, away from the heading angle, $\theta_o$, that leads toward the obstacle as a function of distance, $f(d_o)$. Here, the addition of the exponential function, $e^{-|\varphi - \theta_o|}$, ensures that the angular acceleration away from an obstacle quickly rises near the obstacle and results in a positive (right) turning rate when heading to the right of $\theta_o$ and a negative (left) turning rate when heading to the left of $\theta_o$.

Based on the previous research outlined above, we expected that participants would transition between passing and not-passing behavior as a function of their arm/torso reach capabilities. Of more interest was determining what environmental variables operated to define the corresponding E/A control parameter. For the current task, we expected that target location would largely moderate a participant's passing decision. However, it was possible that an object's appearance distance might also operate to constrain passing decisions. We also expected that participants would exhibit hysteresis when the relative distance of the target location from the grasping agent was increased vs. decreased over time, indicative of a multistable, nonlinear dynamical process that could be modeled accordingly.

We had no a priori predictions with regard to the location that participants would choose to pass/release objects for their co-actor to pick up, given the lack of previous research on this question. In general, however, we did expect that participants would exhibit a stationary and highly predictable pattern of behavior (Shibata et al., 1995; Cakmak et al., 2011; Strabala et al., 2013), either choosing a single pass/release location or passing/releasing objects in a position functionally related to the intended target location and the co-actor's hand position.

With regard to the hand-movement trajectories of participants, we expected that the spatial dynamics of these movements would be qualitatively similar to the goal directed locomotory movements observed by Fajen and Warren (2003, 2004) and, thus, could be modeled using an adapted (extended) version of Equation (1). Note, however, that in contrast to the constant velocity assumption underlying the Fajen and Warren behavioral dynamics model of locomotory movements, we expected participant movements to exhibit a non-constant velocity profile and that a corresponding non-constant velocity function would need to be developed in order to successfully model the pick and place movements investigated here.

# MATERIALS AND METHODS

# Participants

Sixteen University of Cincinnati students (eight male and eight female, aged 18–28 years) were recruited to participate in the experiment. Participants received credit as a part of a class requirement for an undergraduate Psychology course. All participants provided written consent prior to completing the study, with the procedures and methodology employed reviewed and approved by the University of Cincinnati Institutional Review Board.

# Materials and Apparatus

An illustration of the experimental task setup is displayed in **Figure 1**. As can be seen from an inspection of this figure, the participant and co-actor stood in front of a 1.5 × 0.89 × 1.15 m table in a 3 × 4.9 m laboratory room and completed the object moving and passing task in a room-scaled virtual environment in which the virtual laboratory and table were isomorphic in size and location. The co-actor (henceforth confederate co-actor) in this experiment was a lab assistant and was known to the participant to be a lab member. The physical table acted as a solid surface that both limited the participant and confederate co-actor movements within the virtual environment and created a surface on which the participant and confederate co-actor could move a handheld wireless Polhemus Latus motion-sensor (Polhemus Ltd, Vermont, USA) that tracked their right hand movements within the virtual environment at 96 Hz. The participant was positioned on one side of the table, standing halfway between the middle of the table and the pickup location, with the confederate co-actor positioned in the middle of the table on the opposite side.

The virtual environment, task objects, and task controllers were designed using the Unity 3D game engine (version 5.2.0; Unity Technologies, San Francisco, California) and Sketchup 2015 (Trimble Navigation Technologies, Sunnyvale, California). The virtual environment and task objects were presented to participants using an Oculus Rift DK2 headset (Oculus VR, Irvine, California), which had a vertical field of view of 105° and a horizontal field of view of 94°. The participant and confederate co-actor's head movements were also tracked using the Oculus Rift DK2 head tracking system. Separate computers connected by a LAN connection powered the Oculus Rift DK2 HMDs, with each computer handling the rendering of the virtual environment and controlling the head movements for the participant and confederate co-actor. The Host computer (participant) handled the motion tracking inputs, task controllers, and data recording. The maximum display latency between the participant and confederate co-actor real-world movements and their movements in the virtual environment was 33 ms. The experimental task states, including positions of the participant and confederate co-actor's hands and heads, the appearance state and position of the target objects, and which individual was in possession of a target object, were continuously recorded at 70 Hz.

Virtual reality was employed for the current experiment because it offered two immediate advantages over a real world pick and place task: (1) the task reset time between trials can be instantaneous when using virtual reality allowing for a large number of trials to be completed in a timely manner and (2) the virtual environment allows for improved control of possible confounds, limiting visual task and behavioral information available to each co-actor to only that which is being explicitly tracked during the task, i.e., task states, right hand movements, and head movements. Moreover, as a future goal of this line of research is implementation of the proposed dynamical model in artificial agents, the virtual reality paradigm provides an ideal apparatus for obscuring the identity/origin of co-actor behaviors.

Within the virtual environment, the participant and confederate co-actor were represented as identical virtual avatars modeled after a crash test dummy with a height of 1.8 m, with the virtual environment being identical for the participant and confederate co-actor except for the fact that they were positioned on the opposite sides of the virtual table. The height of the participant and confederate co-actor's visual field was also calibrated such that their viewing height was equivalent regardless of their actual height. Both the participant and confederate co-actor's right hands were represented by a semi-transparent blue sphere at the end of the dummy's right wrist in order to simplify interaction with the task environment.

The participant and confederate co-actor's hand-held wireless Polhemus Latus motion-sensors controlled the movements of this sphere. An inverse kinematics controller (model and controller supplied by Root Motion, Tartu, Estonia) driven by these motion sensor movements and the head movements of the participant and confederate co-actor controlled the right arm and body movements of the participant and confederate co-actor's virtual avatar, respectively. The resulting arm and body movements were not identical to the real world arm and body movements of the participant and confederate co-actor, but were close enough to render any differences between the real and virtual body postures of the participant and confederate co-actor unnoticeable or not functionally relevant.

# Experimental Task

The experimental task required a participant to move virtual disc objects that appeared on one side of the virtual tabletop to an indicated target location on the opposite side of the virtual table, with a choice of either moving the object alone or passing the object to the confederate co-actor. The disc objects always appeared on the participant's left side and the target location, specified by a red square, always appeared on the participant's right. A trial began when both the participant and confederate co-actor indicated they were ready by placing their sphere/hand in a blue ready location (blue square) displayed directly in front of them on the virtual table. When both the participant and confederate co-actor's virtual hands were ready, the ready locations would disappear and a disc would appear in one of 5 pickup locations along with one of 20 red target locations. The participant was instructed to pick up the disc when it appeared and attempt to move it to the target location. A pickup occurred when the participant's sphere came in contact with the disc. When picked up, the disc moved with the participant's hand until it reached the target or the participant passed the disc. The participant was informed that if the reach to the target was either too far or uncomfortable, they could pass it to the confederate co-actor. A pass involved picking up the disc and then releasing it somewhere on the table by lifting their hand from the table. Importantly, the confederate co-actor was instructed to remain at the ready position unless the participant initiated a pass (i.e., they were instructed not to move prior to a participant initiating a pass by releasing the object for them to pick up). This instruction ensures that pass decisions and locations are not influenced by anticipatory or communicative movements initiated by the pass receiver.
While such movements may be important to more complicated pick-and-place tasks, they can obscure how task features and passer preference affect pass behavior<sup>2</sup>. To complete a pass, the confederate co-actor would pick up the disc and move it to the target. A trial was completed when the disc reached the target. Upon trial completion the disc and target would disappear and the ready boxes for the next trial would appear.

<sup>2</sup> In particular, note that interpretation of pass receiver movements as playing either a communicative or anticipatory role in task performance requires some understanding of the task dynamics driving the passer's decisions and behaviors independent of those movements.

The participant's preferred reach was recorded before completing the experiment by asking the participant (inside the virtual environment) to reach to the farthest comfortable point along a blue line that appeared along the left side of the table. This reach distance was then used to scale the 5 appearance/pickup locations to each participant's preferred reach distance. The 5 disc appearance/pickup positions, illustrated as yellow circles in **Figure 1**, were located along the same axis as the calibration line on the table extending perpendicular to the participant. These appearance/pickup locations corresponded to 20, 40, 60, 80, and 100% of a participant's preferred reach distance (i.e., E/A ratios of 0.2, 0.4, 0.6, 0.8, and 1.0); the average reach distance of participants was 52.2 cm (SD = 6.98 cm). Relative to the ready/start location these object pickup locations were positioned at a negative x-distance of 32.3 cm and had mean y-positions of −1.4, 7.2, 15.8, 24.4, and 33 cm, respectively.

The same 20 unscaled target locations on the right side of the table were used for all participants. These target locations were equally spaced from the near to the far edge of the participants' side of the table. Relative to the start/ready location these had positive x-distance of 103.7 cm and y-positions from −7 cm to 59.5 cm in 3.5 cm steps.

# Procedure

Participants were told that the experiment was investigating the dynamics of object pick and place behavior and that they would be completing a simple pick and place task with a confederate co-actor. The participants and confederate co-actor were then embedded within the virtual environment using the HMD and viewing height, sensor, and appearance location calibration was performed. Task instructions were then provided to the participant and after participants indicated that they understood the task procedure and goal, experimental trials began. Participants were told that the task would involve 600 trials and that if the reach to the target was either too far or uncomfortable, they could pass the object to the confederate co-actor. Moreover, participants were encouraged not to strain themselves in order to reach a target. No further instructions regarding when or where to pass were given to participants.

Experimental trials were broken up into 3 blocks of 200 trials (i.e., 5 appearance/pickup locations × 20 target locations × 2 trials for each appearance-target location combination). In the first and third blocks of trials, the discs appeared sequentially, either progressively moving away from the participant (ascending) or toward the participant (descending) over trials with appearance order counterbalanced across participants. During these blocks, each pickup/appearance location was presented 40 times in a row with each presentation occurring twice for each of the 20 target locations, once while target locations appeared in an ascending order and once when they appeared in a descending order. Participants always experienced the same ascending-descending or descending-ascending order across appearance locations in the first and third trial blocks, with these target appearance conditions counterbalanced across participants. In the second block of trials, each pickup-target location pair was presented twice in a random order from trial to trial. After each 200 trial block, the participant and confederate co-actor were given an opportunity to rest before continuing to the next block. Blocks lasted between 10 and 15 min.
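To make the trial structure concrete, the following minimal sketch enumerates the 20 unscaled target positions and builds one randomized block of the kind used in the second block. It is an illustration only, not the experiment's control code: the variable and function names are hypothetical, the seed is arbitrary, and the sequential (ascending/descending) blocks are not reproduced.

```python
import random
from collections import Counter

# Values taken from the task description; names are illustrative only.
PICKUP_EA_RATIOS = (0.2, 0.4, 0.6, 0.8, 1.0)       # pickup distance / preferred reach
TARGET_X_CM = 103.7                                 # x-offset from the ready location
TARGET_Y_CM = [-7.0 + 3.5 * i for i in range(20)]   # -7.0, -3.5, ..., 59.5 cm


def random_block(rng, repeats=2):
    """Return one shuffled block: every pickup-target pair `repeats` times."""
    trials = [
        (ratio, y)
        for ratio in PICKUP_EA_RATIOS
        for y in TARGET_Y_CM
        for _ in range(repeats)
    ]
    rng.shuffle(trials)
    return trials


block = random_block(random.Random(0))  # seed chosen arbitrarily
pair_counts = Counter(block)            # each (ratio, target y) pair occurs twice
print(len(block))                       # 5 * 20 * 2 trials
```

Each trial is represented here as a (pickup E/A ratio, target y-position) pair; converting the ratio to a physical pickup position would additionally require the participant's calibrated reach.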

# RESULTS AND DISCUSSION

The current pick-and-place task was designed to address three related questions. First, what task variables determined the participants' decision to pass or not pass an object and what were the associated affordance related dynamics of these behavioral events? Second, where did participants choose to release the object when passing the object to a co-actor and to what degree was the pass location functionally related to the intended object goal location and/or the confederate co-actor's hand location? Third, what were the trajectory dynamics of the participant's hand movements when moving to grasp, relocate, or pass an object within a two-dimensional task space? Below we consider each of these questions in turn.

# What Drove Pass Decisions?

For the pick and place task investigated here, there were essentially two relevant distance-related task variables that were likely to have influenced the participant's pass/no-pass behavior: the distance from the participant's ready location to the object pickup location and the distance from the participant's ready location to the object target drop-off location. Note that, by instructing the confederate co-actor to passively wait for passes, we have effectively eliminated the possible complicating (but potentially important) role of anticipatory or communicative movements on behalf of the pass receiver. As a preliminary examination of the relationship between these two task variables and the participants' dichotomous pass or not-pass decisions, separate point-biserial correlations were conducted on the trial-by-trial pass/no-pass data series for each participant for each trial block (i.e., ascending-descending, random, and descending-ascending target location trial blocks). As can be seen from an inspection of **Table 1**, only target location was significantly correlated with the participant's pass/no-pass behavior across trials, with an overall average correlation between the participant's trial-by-trial pass/no-pass behavior and target location of 0.796 (SD = 0.074; p < 0.001)<sup>3</sup>. In other words, the distance of the object pickup location appeared to have no effect on pass/no-pass behavior, with pass/no-pass behavior almost completely driven by the distance of the target goal location<sup>4</sup>.
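For readers unfamiliar with the statistic, a point-biserial correlation is simply a Pearson correlation in which one variable is coded 0/1. The sketch below computes it for invented toy data in which passing depends only on target distance; the function name and all data values are hypothetical and are not taken from the experiment.

```python
import math


def point_biserial(values, passed):
    """Point-biserial r between a continuous variable and a 0/1 outcome.

    Computed as the Pearson correlation with the outcome coded 0/1,
    which is algebraically the same quantity.
    """
    n = len(values)
    mean_v = sum(values) / n
    mean_p = sum(passed) / n
    cov = sum((v - mean_v) * (p - mean_p) for v, p in zip(values, passed))
    var_v = sum((v - mean_v) ** 2 for v in values)
    var_p = sum((p - mean_p) ** 2 for p in passed)
    return cov / math.sqrt(var_v * var_p)


# Toy data (invented): passes occur only for far targets,
# regardless of pickup distance.
target_y = [10.0, 20.0, 30.0, 40.0, 10.0, 20.0, 30.0, 40.0]
pickup_y = [5.0, 5.0, 5.0, 5.0, 25.0, 25.0, 25.0, 25.0]
passed = [0, 0, 1, 1, 0, 0, 1, 1]

r_target = point_biserial(target_y, passed)  # strongly positive
r_pickup = point_biserial(pickup_y, passed)  # zero for this balanced design
```

The toy pattern mirrors the qualitative result reported here (a strong relation with target distance, none with pickup distance), but the numerical values carry no empirical meaning.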

<sup>3</sup>Hierarchical logistic regression analyses were also performed to confirm that target distance was the only task variable to significantly predict participant pass/no-pass behavior. Not only did this analysis further confirm that there was no correlation between participant trial-by-trial pass/no-pass decisions and object pickup location, but it also verified that the variable interaction between pickup and target location did not predict participant trial-by-trial pass/no-pass behavior beyond that predicted by target distance alone.

<sup>4</sup>A participant-by-participant hierarchical logistic regression analysis with decision to pass on the current trial as the dependent variable and the current target location and previous pass decision as independent variables resulted in average Nagelkerke's R<sup>2</sup> values of 0.903 and 0.943 for the ascending and descending conditions, respectively (all χ<sup>2</sup> > 200.00, p < 0.001).

With regard to the target distance at which participants transitioned between passing and moving the object alone, this occurred at an average y-target distance of 42.4 cm (SD = 9.17), which corresponded to an E/A ratio (i.e., y-target-distance/participant comfort reach distance) of 0.823 (SD = 0.19). Consistent with previous affordance research (e.g., Fitzpatrick et al., 1994; van der Kamp et al., 1998; Richardson et al., 2010), participants also exhibited hysteresis with the pass/no-pass transition occurring at an average E/A ratio of 0.853 (SD = 0.24; target y-distance of 43.7 cm) for the ascending target distances and 0.797 (SD = 0.21; target y-distance of 40.8 cm) for descending target distances, indicating that the relative stability of passing and non-passing behavior was more or less equivalent across this E/A parameter range (see **Figure 2**). To verify that this hysteretic effect was significant, a one-way repeated measures ANOVA comparing the distance (target location) at which participants switched between passing and non-passing behaviors as a function of target location order (i.e., ascending, descending, and random) was conducted. Using a Greenhouse-Geisser correction this analysis revealed a significant effect of target location order, F(1.44, 21.606) = 8.908, p = 0.003, η<sup>2</sup><sub>p</sub> = 0.373, with Bonferroni post hoc analysis indicating that the pass/no-pass transition distance for the ascending target order was significantly higher than the pass/no-pass transition distance for the descending target order (p = 0.027). There was no difference between the ascending and random target location orders (p = 0.541), but there was a significant difference between the descending and random target location orders (p = 0.015).
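The logic of the hysteresis comparison can be illustrated with a short sketch. The text does not state exactly how the transition distance was extracted from each sweep, so the rule used here (the midpoint between the two target distances that bracket the first change in behavior) is an assumption, as are the function name and the invented pass/no-pass sequences.

```python
def transition_point(targets_cm, passed):
    """Midpoint between the two consecutive targets where behavior first switches.

    Assumed rule for illustration; returns None if behavior never changes.
    """
    for i in range(1, len(passed)):
        if passed[i] != passed[i - 1]:
            return 0.5 * (targets_cm[i] + targets_cm[i - 1])
    return None


targets = [-7.0 + 3.5 * i for i in range(20)]   # the 20 target y-positions (cm)

# Invented participant: starts passing late on the way up,
# and keeps passing longer on the way down (hysteresis).
ascending = targets
descending = targets[::-1]
asc_passed = [y >= 45.5 for y in ascending]
desc_passed = [y >= 38.5 for y in descending]

asc_t = transition_point(ascending, asc_passed)     # 43.75 cm
desc_t = transition_point(descending, desc_passed)  # 36.75 cm

reach_cm = 52.2   # group mean preferred reach, used here as a stand-in
asc_ea, desc_ea = asc_t / reach_cm, desc_t / reach_cm
```

A higher transition value for the ascending sweep than for the descending sweep is the signature of hysteresis described in the text; in the actual analysis each participant's own preferred reach would be used to form the E/A ratio.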

# Where Did Participants Release/Pass Objects?

A Pearson correlation analysis revealed that, for a majority of participants, the (x, y) tabletop location where they released (passed) objects for the confederate co-actor during passing trials was significantly correlated with (i) the pass location chosen on the previous passing trial, (ii) the target location, and (iii) to a much lesser extent, the object pickup location (see **Table 2**). Separate hierarchical linear regression analyses were conducted on each participant's passing trial event series as a function of trial block, with trial pass location as the dependent variable and location of the previous pass, target location, and pickup location sequentially entered as independent variables. As can be seen from an inspection of **Table 3**, this analysis revealed that on any given passing trial a participant's previous object release/pass location was the dominant predictor of a participant's current object release/pass location, with current target location and pickup location only slightly increasing the percentage of variance accounted for. This suggests that participants tended to more or less pick a location to release/pass the object for the confederate co-actor during early passing trials and then stick with that location across passing trials. To further verify the latter possibility, a cluster analysis was conducted using the K-means cluster analysis algorithm, which finds cluster centers that minimize the sum of squared error (SSE) for a given number of clusters, k. We analyzed the release/pass locations to determine whether these locations typically clustered around 1, 2, or 3 cluster centroids. The optimal number of clusters was defined as the value of k for which the difference between the SSE of the observed locations and the SSE for a reference distribution, determined by Monte Carlo sampling, was greatest compared to the other values of k.
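The cluster-number criterion described above resembles a gap-statistic procedure, and a minimal sketch of that idea is given here. Several details are assumptions rather than reported methods: the reference distribution is taken to be uniform over the bounding box of the data, the comparison is made on log SSE, k-means is run with random restarts, and all names and default parameters are hypothetical.

```python
import math
import random


def _d2(p, c):
    """Squared Euclidean distance between two (x, y) points."""
    return (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2


def kmeans_sse(points, k, rng, restarts=20, max_iter=100):
    """Lowest SSE found by Lloyd's k-means over several random restarts."""
    best = float("inf")
    for _ in range(restarts):
        centers = rng.sample(points, k)
        for _ in range(max_iter):
            clusters = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda j: _d2(p, centers[j]))
                clusters[nearest].append(p)
            updated = [
                (sum(q[0] for q in cl) / len(cl), sum(q[1] for q in cl) / len(cl))
                if cl else centers[j]          # keep an empty cluster's center
                for j, cl in enumerate(clusters)
            ]
            if updated == centers:
                break
            centers = updated
        best = min(best, sum(min(_d2(p, c) for c in centers) for p in points))
    return best


def optimal_k(points, ks=(1, 2, 3), n_ref=20, seed=0):
    """Pick k maximizing mean log(SSE_reference) - log(SSE_observed)."""
    rng = random.Random(seed)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    gaps = {}
    for k in ks:
        log_obs = math.log(max(kmeans_sse(points, k, rng), 1e-12))
        log_ref = 0.0
        for _ in range(n_ref):
            # Assumed reference: uniform over the data's bounding box.
            ref = [(rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
                   for _ in points]
            log_ref += math.log(max(kmeans_sse(ref, k, rng, restarts=5), 1e-12))
        gaps[k] = log_ref / n_ref - log_obs
    return max(gaps, key=gaps.get), gaps
```

Calling `optimal_k(pass_locations)` on a list of (x, y) release points returns the selected k together with the gap value for each candidate. Taking the maximum gap follows the wording of the text; other published variants of the gap statistic use a more conservative selection rule.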

The results of this K-means cluster analysis can be seen in **Table 4**. As expected given the preliminary correlation and regression analysis reported above, for the majority of participants the optimal number of clusters was 1 within the same trial block. However, as can be seen from an inspection of **Figure 3**, participants appeared to adopt one of two object release/pass location strategies. That is, release/pass locations tended to occur in one of two general areas of the task space, with some participants exhibiting a tendency to release/pass objects nearer to the confederate co-actor's hand, while other participants tended to release/pass the objects nearer to the object target (drop-off) locations. This is particularly clear from an inspection of the 3D histograms of all participant pass locations in the bottom panel of **Figure 3**, where two distinct peaks appear in the histograms corresponding to the two passing regions. Using k-means cluster analysis to define these 2 location clusters (i.e., specifying k = 2 clusters for all participant pass locations) we observed that 8 participants made more than 50% of their release/passes in the cluster region closest to the confederate co-actor's ready/start location (near-confederate co-actor region; see middle panel of **Figure 3**) and 6 participants made more than 50% of their release/passes in the cluster region closest to the targets (near-target region; see top panel of **Figure 3**). The remaining two participants began the experiment releasing/passing in the near-target region, but then in blocks 2 and 3 released/passed most of their passes in the near-confederate co-actor region. For those who always released/passed in the same region, the near-target participants (n = 6) released/passed objects in the near-target region on average 94.8% of the time and the near-confederate co-actor participants (n = 8) released/passed objects in the near-confederate co-actor region on average 89.1% of the time, further indicating that individuals tended to pick a general table location to pass/release objects for the confederate co-actor and then continued to pass to that region across passing trials.

#### TABLE 2 | Average correlations between participants' trial-by-trial pass locations and object pickup and target locations, as well as participants' previous pass decision, as a function of trial block.

% sig. <0.05 equals the percentage of participants who exhibited a significant relationship between pass location and the corresponding task variable.

The centers of the near-confederate co-actor and near-target cluster regions had (x, y) locations of (46.4 cm, 46.07 cm) and (66.95 cm, 58.93 cm), respectively. This corresponded to an average distance of 61.5 and 89.6 cm from the participants, respectively, and 19.8 and 50.7 cm from the confederate co-actor's position, respectively. It remains unclear whether these locations represent a comfort-mode location, either with respect to the participant or the confederate co-actor. Consistent with previous research on third-order motor planning, it is possible that the reason why the distances of the two release/pass locations are beyond the participants' comfort reach distance (i.e., correspond to E/A ratios of 1.18 and 1.72, respectively) is because the actors (consciously or unconsciously) are attempting to maximize the beginning state comfort of the confederate co-actor (e.g., Gonzalez et al., 2011; Ray and Welsh, 2011; Meyer et al., 2013). For the current task, however, determining what constitutes the comfort-mode location or location of least-energy expenditure for the confederate co-actor is non-obvious and likely corresponds to a manifold of possible release/pass locations. Thus, it seems more likely that participants employed very little third-order motor planning from trial-to-trial and more or less picked a release/pass location very close to the confederate co-actor or within the reach of the confederate co-actor but closer to the target location. Accordingly, while participants tended to settle into one of two stable passing locations, it is unclear from the current experiment what about the participants or task-space drives the selection of a given pass location.

#### TABLE 3 | Average hierarchical linear regression results for participants' trial-by-trial pass locations as a function of trial block.

[% sig.] equals the percentage of participants who exhibited a significant relationship between pass location and the corresponding task variable.

<sup>a</sup>p-value > 0.1 indicates good fit.

<sup>b</sup>Distribution fit determined from the average center of drop locations in a given participant block.

<sup>c</sup>Optimality defined as the number of clusters (1, 2, or 3) which results in the greatest reduction in variability of individual drop location distances from the cluster center.

optimal number of clusters was calculated as above, using either 1 or 2 clusters, and k-means cluster analysis was performed. Conditions with more than 1 cluster have red and blue drop locations. The bottom plots in each section provide a 3-d histogram of the drop locations in order to illustrate frequency of drops in a given region and location. The red circle in the bottom right corner of each plot illustrates the size of the disc object.

Finally, in order to better understand the within-cluster trial-to-trial pass/release location variability that was not clearly accounted for by variation in target location or previous pass location, we classified the distribution of pass locations around the average center of pass locations for each participant and block. This was done by first calculating the squared Euclidean distance of each pass from the average center of all pass locations for each participant in a given condition. The probability distribution of these data was then estimated using kernel density estimation and was fit to Gaussian, exponential, and log-normal distributions. A one-sample Kolmogorov-Smirnov test was used to determine the probability that the distribution of distances from the average center came from each of the candidate distributions. Results of this analysis are displayed in **Table 4** and illustrated in **Figure 4**. Consistent with recent research demonstrating how human behavioral variability over time exhibits significant degrees of persistence (e.g., Holden, 2002, 2005; Stephen and Mirman, 2010, for reviews), this analysis revealed that the distribution of pass locations around the average center tended to be log-normal (**Table 4**).
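The distribution comparison can be sketched as follows. This is a simplified stand-in for the reported procedure: it skips the kernel density estimation step, fits each candidate by simple moment or log-moment estimates, and reports only the Kolmogorov-Smirnov distance D (smaller is a closer fit) rather than a p-value. All function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, pstdev


def squared_distances(points):
    """Squared Euclidean distance of each (x, y) point from the mean center."""
    cx = fmean([p[0] for p in points])
    cy = fmean([p[1] for p in points])
    return [(p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in points]


def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov distance between a sample and a CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def fit_candidates(distances):
    """KS distance of strictly positive data to three fitted candidates."""
    mean = fmean(distances)
    gauss = NormalDist(mean, pstdev(distances))
    logs = [math.log(d) for d in distances]
    log_norm = NormalDist(fmean(logs), pstdev(logs))
    return {
        "gaussian": ks_statistic(distances, gauss.cdf),
        "exponential": ks_statistic(distances, lambda x: 1.0 - math.exp(-x / mean)),
        "log-normal": ks_statistic(distances, lambda x: log_norm.cdf(math.log(x))),
    }
```

For example, `fit_candidates(squared_distances(pass_locations))` returns one D value per candidate. Because the parameters are estimated from the same data being tested, p-values from the standard Kolmogorov-Smirnov tables would be optimistic, which is one reason this sketch stops at the distance itself.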

# How Did the Participants Move?

To determine the trajectory dynamics of participant movement we separated the participant's pick and place movements into 3 sub-task movements: (1) object pickup movements or movements from the ready/start location to the object pickup location; (2) object pass movements or movements from object pickup to object release/pass; and (3) object target movements or movements from the object pickup to the object target drop-off location. The beginning and end of pickup and target movements corresponded to the first sample at which the center of the participant's hand-held motion sensor crossed the outer boundary of the corresponding start/object/target location. The beginning and end of pass movements corresponded to the first sample at which the center of the participant's hand-held motion sensor crossed the object pickup location (after picking up the object) and the moment the participant released the object for the confederate co-actor.

An illustration of the spatial trajectories observed for the different sub-task movements is provided in **Figure 5** (left). These heat-map plots were created by dividing the table into a 310 × 170 grid for pass and target trajectories and a 930 × 510 grid for pickup trajectories due to the greater number of pickup trajectories. For each sub-task movement the number of times the participant's location was recorded in a given grid cell was counted to create a histogram of trajectory locations in table coordinates. Colors were assigned to each cell from a color map with 64 colors. Overall, these heat-map plots revealed a consistent pattern of sub-task movement trajectories across participants. What is most apparent is that during pass and target movements participants consistently deviated from a straight-line path. More often than not, target and pass sub-task trajectories curved down toward the participant's standing position before curving back to the corresponding goal pass/release or target position. Although pickup movement trajectories were much closer to straight-line paths, there was also a consistent curve to the pickup movements for the closest and furthest pickup locations, albeit to a much lesser degree compared to pass and target sub-task movement curvature. Accordingly, the analysis of the sub-task movements focused on (a) the degree to which participants' total trajectories curved away from the shortest, straight-line path between the start and end locations of the movement, (b) the deviation of the participants' initial heading or movement angle from the angle of the straight-line path, and (c) the initial heading or movement angle (direction) of movement, as well as (d) the peak velocity and velocity profile of the sub-task movements. These trajectory measures were also important for determining whether the behavioral dynamics of these sub-task movements could be captured by an adapted version of the Fajen and Warren (2003, 2004) model described above.

The magnitude of movement curvature was quantified for each sub-task movement trajectory by calculating the area (m<sup>2</sup>) between the actual sub-task trajectory and the straight-line trajectory calculated from the first and last (x, y) location of the corresponding movement time-series. The area between the actual trajectory and straight-line trajectory was determined using the trapezoidal method of numerical integration. Prior to computing trajectory curvature, a spline interpolation procedure was employed to time-normalize the movement trajectories (to length of 512 points) in order to minimize variation in area estimations due to movement time variations. The initial movement or heading angle of each sub-task movement was calculated as the angle between the 1st and 9th points of the time-normalized movement trajectories. The angle (in degrees) was calculated with reference to the positive x-axis of the tabletop, such that horizontal straight-line movements directly across the tabletop from left to right would have an initial heading angle of 0° and horizontal straight-line movements directly across the tabletop from right to left would have an initial heading angle of 180°. The deviation from the straight-line angle was calculated as the initial participant movement angle minus the straight-line path angle, such that negative values corresponded to participant movement angles that were less than (under shot) the straight-line path angle and positive values corresponded to participant movement angles that were greater than (over shot) the straight-line path angle.
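These trajectory measures can be sketched in a few lines. The sketch makes several simplifying assumptions that differ from the reported procedure: linear interpolation over sample index stands in for the spline interpolation, angles are returned in the range (−180°, 180°], and the sign of the area is flipped for right-to-left movements so that positive values mean the path lies above the straight line in table coordinates. All function names are hypothetical.

```python
import math


def resample(traj, n=512):
    """Time-normalize a trajectory of (x, y) samples to n points.

    Linear interpolation over sample index (a stand-in for a spline).
    """
    m = len(traj)
    out = []
    for i in range(n):
        t = i * (m - 1) / (n - 1)
        j = min(int(t), m - 2)
        f = t - j
        out.append((traj[j][0] + f * (traj[j + 1][0] - traj[j][0]),
                    traj[j][1] + f * (traj[j + 1][1] - traj[j][1])))
    return out


def signed_area(traj):
    """Trapezoid-rule area between the path and its start-to-end chord."""
    (x0, y0), (x1, y1) = traj[0], traj[-1]
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    ux, uy = dx / length, dy / length
    along = [(x - x0) * ux + (y - y0) * uy for x, y in traj]   # distance along chord
    perp = [-(x - x0) * uy + (y - y0) * ux for x, y in traj]   # offset from chord
    area = sum(0.5 * (perp[i] + perp[i + 1]) * (along[i + 1] - along[i])
               for i in range(len(traj) - 1))
    return area if dx >= 0 else -area   # assumed sign convention: positive = above


def heading_deg(traj, first=0, last=8):
    """Angle from the 1st to the 9th point, relative to the positive x-axis."""
    return math.degrees(math.atan2(traj[last][1] - traj[first][1],
                                   traj[last][0] - traj[first][0]))


def heading_deviation_deg(traj):
    """Initial heading minus straight-line angle, wrapped to (-180, 180]."""
    straight = heading_deg(traj, 0, len(traj) - 1)
    dev = (heading_deg(traj) - straight) % 360.0
    return dev - 360.0 if dev > 180.0 else dev
```

For a raw trajectory `raw`, the measures would be obtained as `signed_area(resample(raw))` and `heading_deviation_deg(resample(raw))`. A path that rises to a peak and returns to the chord yields a positive area and a positive initial deviation under the convention assumed here.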

As can be seen from an inspection of **Figure 6**, the average degree of movement curvature for pickup and pass movements exhibited a somewhat linear change from positive to negative values as the action-scaled distance of pickup location increased, where positive curvature corresponded to movements that curved above the straight-line trajectory between the beginning and end locations of the movement and negative curvature corresponded to movements that curved below the straight-line trajectory between the beginning and end locations of the movement. Separate one-way repeated measures ANOVAs comparing the participant mean curvature values as a function of pickup location for pickup and pass sub-task movements revealed that this change was statistically significant [all F(4, 60) > 120.97, p < 0.001, η<sup>2</sup><sub>p</sub> > 0.90].

The data plotted in **Figure 6** also indicate that the degree and direction (positive vs. negative) of trajectory curvature for all sub-task movement types was directly related to the deviation of the initial movement angle from the straight-line angle between the beginning and end points of a movement. More importantly, although there was a change in initial movement angle as a function of the action-scaled pickup location for all sub-task movements [all F(4, 60) > 25.54, p < 0.001, η<sup>2</sup><sub>p</sub> > 0.63], initial movement angle for the pass and target sub-task movements was largely independent of the end state distance or location of the movement. Specifically, for pass movements there was no significant difference between the participant mean initial movement angle for near-confederate co-actor and near-target participants [F(1, 12) = 2.24, p > 0.16, η<sup>2</sup><sub>p</sub> = 0.16]. Similarly, for target movements there was no change in participant mean initial movement angle as a function of target distance. This latter finding can be clearly discerned from inspection of **Figure 7**, where the overall mean initial movement angle is plotted for each pickup-target location combination for which target movements occurred. Taken together, this suggests that the trajectories exhibited by participants for each sub-task movement type were a result of participants employing a fixed, non-straight-line initial movement angle for each pickup location.

(B,D) correspond to the best-fit line detailed in each plot. Error bars represent standard errors of the mean.

The highly predictable relationship between pickup location and initial movement angle for each sub-task movement type is illustrated in **Figures 6B,D**, **7C**. For pickup movements this relationship was linear, with the range or change in the overall mean initial movement angle (185.88°–201.59°) much smaller than the range of mean straight-line angles (171.83°–236.15°) between the start/ready location and the five action-scaled pickup locations. Again, this accounts for the positive-to-negative change in movement curvature as the pickup distance increased (see **Figure 7A** and right-top panel of **Figure 6**). For the pass and target sub-task movements, the relationship between pickup location and the overall mean initial movement angle was nonlinear, with the magnitude of change in initial movement angle decreasing as the distance of the pickup location increased. In addition, the initial movement angles employed when moving away from each pickup location were nearly exactly the same for the pass (range: 31.21° to −35.15°) and target movements (range: 31.35° to −33.89°), further emphasizing the fact that for the current task the intended end-point location played, on average, very little role in determining the initial movement angle when moving the object away from the pickup location. From the current study it is not clear what accounts for the observed initial trajectory angles. One possibility is that the observed initial angle ranges are the result of biomechanical constraints imposed on participant movements while reaching across the table.

Finally, the velocity of each sub-task movement was calculated from the non-normalized trajectory time-series. The resulting velocity time-series were then time normalized using the same 512 point spline interpolation procedure defined above. The overall average time-normalized velocity profiles for each sub-task movement are displayed in **Figure 5** (right). As expected, participants exhibited non-constant, positively skewed velocity profiles for all sub-task movement types. There was no meaningful effect of pickup, release/pass, or target location, with minimal variation in peak velocity across sub-task movements: pickup (Mdn = 1.473, Q1 = 1.43, Q3 = 1.494); pass (Mdn = 1.757, Q1 = 1.731, Q3 = 1.76); target (Mdn = 1.798, Q1 = 1.758, Q3 = 1.833). However, a Greenhouse-Geisser corrected one-way ANOVA did reveal a significant difference in peak velocity between the sub-task movements [F(1.187, 1.039) = 10.013, p = 0.004], with Bonferroni post-hoc analysis revealing that the peak velocity for the shorter distance pickup movements was significantly lower (M = 1.46 m/s, SD = 0.04 m/s) compared to the pass (M = 1.75 m/s, SD = 0.32 m/s) and target (M = 1.8 m/s, SD = 0.31 m/s) sub-task movements (both p < 0.025). There was no significant difference in peak velocity between the pass and target sub-task movements (p > 0.05).
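As a brief illustration of the velocity measures, the sketch below derives a speed profile from a position time-series by finite differences and locates the peak. It assumes positions in meters sampled at the 70 Hz recording rate mentioned in the Method; the function names and the example trajectory are hypothetical, and the time-normalization step is omitted.

```python
import math


def speed_profile(traj, rate_hz=70.0):
    """Speed (m/s) between successive (x, y) samples given in meters."""
    dt = 1.0 / rate_hz
    return [math.hypot(x1 - x0, y1 - y0) / dt
            for (x0, y0), (x1, y1) in zip(traj, traj[1:])]


def peak_speed(speeds):
    """Return (peak speed, fraction of the movement at which it occurs)."""
    i = max(range(len(speeds)), key=speeds.__getitem__)
    return speeds[i], i / (len(speeds) - 1)


# Invented straight-line movement that accelerates quickly and decelerates slowly.
example = [(0.0, 0.0), (0.01, 0.0), (0.04, 0.0), (0.06, 0.0), (0.07, 0.0), (0.075, 0.0)]
peak, when = peak_speed(speed_profile(example))
```

In this invented example the peak falls a quarter of the way through the movement, which is the kind of early-peaking, positively skewed profile described in the text.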

# MODELING BEHAVIORAL DYNAMICS

The current study had two overall aims. The first aim was to identify the behavioral dynamics that underlie a relatively simple object pick and place task, in which one participant had to move objects from one tabletop location to another either alone or by passing the object to another co-actor. Of particular interest was how the changes in relative distance of the starting (appearance) and target goal (drop-off) locations of the to-be-moved objects with respect to a participant's standing position would influence (1) the affordance dynamics that characterized an actor's choice to move an object alone or to pass it to a confederate co-actor, (2) the location that a participant would choose to release an object when passing it to the confederate co-actor, and (3) the trajectory dynamics of the participant's hand movements when moving toward, with, or passing an object.

With regard to the affordance dynamics that characterized a participant's choice to move an object alone or to pass it to a confederate co-actor, results revealed that the participant's decision to pass or not-pass an object was a function of the intended target distance, with participants exhibiting a nonlinear phase transition between passing and not-passing at an average E/A ratio of 0.82 (i.e., ratio of y-distance of target/comfort reach distance of participant). Moreover, participants exhibited hysteresis, transitioning at a higher E/A ratio when target distance was increasing over trials compared to when target distance was decreasing over trials (i.e., 0.85 and 0.80, respectively), implying that the dynamics underlying this affordance actualization process were not only nonlinear, but were also multi-stable. Interestingly, although each participant was somewhat consistent with regard to the location that they chose to release/pass the objects to the confederate co-actor during passing events, the specific location chosen did not appear to depend strongly on the pickup location of the objects or on the end target location. Rather, it appeared that participants either picked a location relatively close to the confederate co-actor's hand or relatively closer to the drop-off target locations and simply continued to release/pass objects in that same general location over the course of a trial block. Finally, participants exhibited a consistent pattern of curved movement trajectories across pickup, pass, and target movements, with movement curvature a result of participants employing a stable set of non-straight-line initial movement angles that co-varied with pickup location. In addition, participants exhibited nonstationary velocity profiles, with peak velocity occurring within the first ½ of a corresponding pickup, pass, or target movement.

The second aim of the current study was to determine whether a simple behavioral dynamics model could be employed to capture these dynamics. More specifically, we were interested in whether an adapted version of the Fajen and Warren (2003, 2004) behavioral dynamics model of human locomotory navigation to a stationary target goal could be employed to capture the pick and place movements investigated here. We anticipated that at least two extensions would be required: (i) a non-stationary velocity function would have to be employed when modeling the hand-movement trajectories of participants; and (ii) a nonlinear action selection process would be needed to define whether participants passed or not. Below, we detail a preliminary model that not only incorporates these extensions, but exhibits the same qualitative movement and affordance dynamics exhibited by participants.

# Hand-Movement Dynamics

To model the dynamics of the participant's hand movements during object pickup, pass and target movements, a task specific parameterization of Equation (1) was employed. More specifically, the heading direction or angle, $\varphi_A$, of a participant's (from this point on referred to as "agent," A) hand or end-effector during pickup, pass and target movements was defined by

$$
\ddot{\varphi}_A = -b_g \dot{\varphi}_A - k_g \left( \varphi_A - \theta_g \right) \left( e^{-c_1 d_g} + c_2 \right),
\tag{2}
$$

where $\dot{\varphi}_A$ and $\ddot{\varphi}_A$ correspond to the velocity and acceleration of the agent's end-effector heading angle, respectively, and $b_g$ and $k_g$ are damping and spring/stiffness terms, such that $-b_g \dot{\varphi}_A$ acts as a friction force on turning rate, and the function $-k_g \left( \varphi_A - \theta_g \right)$ operates to minimize the difference between the current heading angle, $\varphi_A$, and the angle, $\theta_g$, of the corresponding sub-task goal/target location (i.e., the pickup location for pickup movements, the release/pass location for passing movements, and the target/drop-off location for target movements). A novel feature of Equation (2) is the presence of the factor $\left( e^{-c_1 d_g} + c_2 \right)$ in the second addend of the right-hand side. This factor modulates the effect of the term in Equation (2) operating to minimize the difference between the heading angle and the target angle. Specifically, it introduces an exponentially decaying function characterized by a constant offset parameter $c_2$ and an exponential decay rate which is a function of the constant parameter $c_1$ and the function

$$d_g = \left[ \left( X_g - x_A \right)^2 + \left( Y_g - y_A \right)^2 \right]^{1/2},\tag{3}$$

where $X_g$, $Y_g$ and $x_A$, $y_A$ are the coordinates of the current sub-task goal location and the current location of the agent's end-effector (hand), respectively (see Fajen and Warren, 2004, for more details). The parameter $c_2$ simply ensures that the rate of change in heading direction never goes to zero (Fajen and Warren, 2004).

It is important to appreciate that $\theta_g$ and $d_g$ (the latter defined in Equation 3) change as the position of the agent's hand/end-effector changes, with $\theta_g$ defined by

$$\theta_g = \cos^{-1}\left[\frac{\left(Y_g - y_A\right)}{d_g}\right].\tag{4}$$

Now, recasting Equation (2) as a system of first-order differential equations and adding two extra equations defining the change in the $x_A$, $y_A$ position of the agent's end-effector over time results in the following system of equations,

$$\begin{aligned} \dot{z}_1 &= z_2 = \dot{\varphi}_A \\ \dot{z}_2 &= \ddot{z}_1 = \ddot{\varphi}_A = -b_g z_2 - k_g \left( z_1 - \theta_g \right) \left( e^{-c_1 d_g} + c_2 \right) \\ \dot{z}_3 &= \dot{x}_A = v_A \sin z_1 \\ \dot{z}_4 &= \dot{y}_A = v_A \cos z_1, \end{aligned} \tag{5}$$

where $v_A$ is the movement velocity of the agent's end-effector (hand). In order for the model to capture the non-constant velocity profile observed in participants, $v_A$ is defined by means of the additional second-order differential equation

$$\ddot{v}_A = -b_v \dot{v}_A - k_v \left( v_A - C_v \left( 1 - e^{-d_g} \right) \right),\tag{6}$$

where $b_v$ and $k_v$ operate as damping and stiffness terms on the rate of change of $v_A$, which increases and decreases as a function of the target (goal) distance, $d_g$. When the agent's end-effector or hand is far away from the target location, $\left( 1 - e^{-d_g} \right)$ approaches 1 and $v_A$ increases. As the distance to the goal location decreases, however, $\left( 1 - e^{-d_g} \right)$ begins to approach zero and $v_A$ decreases accordingly. $C_v$ is a constant parameter that specifies the maximum velocity in m/s, such that the same equation can be used for a wide range of different movement distances, with differential peak velocities resulting for shorter and longer distances. Combining Equations (5) and (6) into a system of first-order differential equations results in the end-effector (hand) movements or trajectories of an agent being captured by

$$\begin{aligned} \dot{z}_1 &= z_2 = \dot{\varphi}_A \\ \dot{z}_2 &= \ddot{z}_1 = \ddot{\varphi}_A = -b_g z_2 - k_g \left( z_1 - \theta_g \right) \left( e^{-c_1 d_g} + c_2 \right) \\ \dot{z}_3 &= \dot{x}_A = z_5 \sin z_1 \\ \dot{z}_4 &= \dot{y}_A = z_5 \cos z_1 \\ \dot{z}_5 &= z_6 = \dot{v}_A \\ \dot{z}_6 &= \ddot{v}_A = -b_v z_6 - k_v \left( z_5 - C_v \left( 1 - e^{-d_g} \right) \right). \end{aligned} \tag{7}$$
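To make the structure of the six-state system concrete, the following is a minimal Python sketch of a single sub-task movement integrated with a forward Euler scheme. It is not the MATLAB code used in the study. The function name `simulate_reach`, the start and goal coordinates, the 0.01 step, the 4 cm stopping radius, and every gain value (`b_g`, `k_g`, `c1`, `c2`, `b_v`, `k_v`, `C_v`) are illustrative assumptions rather than fitted parameters. The goal angle is computed with `atan2`, which agrees with the inverse-cosine definition of the goal angle whenever the goal lies to the right of the hand and additionally preserves the sign when it does not; that substitution is ours.

```python
import math


def simulate_reach(start, goal, heading0, dt=0.01, stop_radius=0.04,
                   b_g=10.0, k_g=50.0, c1=0.4, c2=0.4,
                   b_v=20.0, k_v=100.0, C_v=2.0, max_steps=2000):
    """Euler-integrate the heading/position/speed system for one movement.

    Returns a list of (x, y, speed) samples. All gains are placeholder
    values chosen for illustration, not the values fitted in the study.
    """
    x, y = start
    X_g, Y_g = goal
    phi, dphi = heading0, 0.0   # z1, z2: heading angle and turning rate
    v, dv = 0.0, 0.0            # z5, z6: speed and its rate of change
    samples = [(x, y, v)]
    for _ in range(max_steps):
        d_g = math.hypot(X_g - x, Y_g - y)   # distance to the goal
        if d_g < stop_radius:
            break
        # Signed goal angle measured from the +y axis (assumption: atan2
        # is used instead of the inverse cosine to keep the sign).
        theta_g = math.atan2(X_g - x, Y_g - y)
        ddphi = (-b_g * dphi
                 - k_g * (phi - theta_g) * (math.exp(-c1 * d_g) + c2))
        ddv = -b_v * dv - k_v * (v - C_v * (1.0 - math.exp(-d_g)))
        # Forward Euler update; every derivative uses the old state.
        x += dt * v * math.sin(phi)
        y += dt * v * math.cos(phi)
        phi += dt * dphi
        dphi += dt * ddphi
        v += dt * dv
        dv += dt * ddv
        samples.append((x, y, v))
    return samples


path = simulate_reach(start=(0.0, 0.0), goal=(0.3, 0.5), heading0=0.0)
speeds = [s for _, _, s in path]
print("steps:", len(path) - 1)
print("peak speed index:", speeds.index(max(speeds)))
```

Because the initial heading (0 rad, straight along the y axis) differs from the direction of the goal, the sketch should produce a curved path, and with these placeholder gains the speed should rise quickly and then decay as the goal distance shrinks.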

# Action Selection Dynamics

The dynamics of action selection observed in the current experiment were modeled using the equation

$$
\dot{x} = -\alpha + x - x^3 \tag{8}
$$

where $x$ represents the state variable for action selection (i.e., affordance mode) and $\alpha$ corresponds to the re-normalized E/A ratio calculated as

$$\alpha = \left( \sigma - \frac{d_g}{R_A} \right) \delta \tag{9}$$

Here, $d_g$ is the distance of the agent's end-effector (hand) to the target location, and $R_A$ is a measure of the agent's maximal preferred reach. $\alpha$ is the E/A ratio at which participants typically switch between behavioral modes, and $\sigma$ and $\delta$ are constant scaling factors. As can be seen from an inspection of **Figure 8**, where Equation (8) is plotted as the potential function

$$V(x) = \alpha x - \frac{x^2}{2} + \frac{x^4}{4} \tag{10}$$

this system results in a saddle-node bifurcation as $\alpha$ is scaled up or down past $\pm\alpha_c$ (approximately $\alpha_c = 0.35$). Moreover, the system exhibits a region of bi-stability between $\pm\alpha_c$ and corresponding hysteretic behavior. More specifically, for $\alpha < -\alpha_c$ and $\alpha > +\alpha_c$ the system has a single stable fixed point at $-x_{st}$ and $+x_{st}$, respectively. For $-\alpha_c < \alpha < +\alpha_c$, however, the system has two stable fixed points at $-x_{st}$ and $+x_{st}$, respectively, as well as an unstable fixed point between the two. This system has previously been employed to capture the nonlinear transitions in categorical speech perception (Tuller et al., 1994; Tuller, 2005), attitude change (Richardson et al., 2014) and conciliation dynamics during conflict situations (Coleman et al., 2007), and appears to represent a generic nonlinear decision or action selection process (van Rooij et al., 2013). For the current pick and place task, we arbitrarily defined convergence on a stable fixed point at $-x_{st}$ to specify non-passing (i.e., moving alone) and convergence on a stable fixed point at $+x_{st}$ to specify passing. Accordingly, when $\alpha < -\alpha_c$ or $\alpha > +\alpha_c$ the system is mono-stable and the agent always converges on the one stable corresponding action mode. However, when $-\alpha_c < \alpha < +\alpha_c$ the action selection dynamics are bi-stable, with the likelihood of converging on one of the two corresponding action modes (i.e., passing or not-passing) a function of the relative stability of the two fixed points and the previous state of the system.
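The change from one to two stable modes can be checked numerically. The sketch below is a hedged illustration, not part of the original analysis: it scans the right-hand side of Equation (8), exactly as printed, for points where the flow changes sign from positive to non-positive, which marks a stable fixed point. The function names, the scan range of ±2, the grid resolution, and the three sample values of α are all assumptions made for the example.

```python
def flow(x, alpha):
    """Right-hand side of Equation (8)."""
    return -alpha + x - x ** 3


def stable_fixed_points(alpha, lo=-2.0, hi=2.0, n=4000):
    """Approximate the stable fixed points of Equation (8).

    A stable fixed point sits where the flow is positive just below it
    and non-positive just above it. Each hit is reported as the midpoint
    of the grid cell that contains the sign change.
    """
    step = (hi - lo) / n
    grid = [lo + i * step for i in range(n + 1)]
    points = []
    for left, right in zip(grid[:-1], grid[1:]):
        if flow(left, alpha) > 0.0 and flow(right, alpha) <= 0.0:
            points.append(0.5 * (left + right))
    return points


for alpha in (-0.6, 0.0, 0.6):
    found = stable_fixed_points(alpha)
    print(alpha, len(found), [round(p, 2) for p in found])
```

For the two values of α with large magnitude the scan should return a single stable state, and for α = 0 it should return two, one on each side of zero.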

# MODEL SIMULATION

To determine whether the movement trajectory dynamics defined by Equation (7) and the action selection dynamics defined by Equation (8) were able to qualitatively capture the behavioral dynamics exhibited by participants in the current pick and place task, a MATLAB (2014a) simulation was conducted. A flow diagram illustrating the structure of the simulation is provided in **Figure 9**. The simulated environment consisted of a 1.50 × 0.89 meter rectangular space matching the experimental table's dimensions. Pickup locations were calculated based on the average participant comfort reach distance of 52.2 cm. The initial model and simulation target locations matched the ready and target locations in the original task setup. Eight different simulation sequences were conducted, with each simulation sequence consisting of 3 blocks (ordered, random, ordered) of 200 trials (600 trials in total for each simulation sequence). For four of the simulations the passing location corresponded to the near-target passing location (0.7695, 0.5893) observed in the experimental data. For the other four simulations the passing location corresponded to the overall average near-confederate co-actor passing location (0.464, 0.5607) observed in the experimental data. Experimentally observed pass location variability is likely due to the many complex interactions from which this passing behavior emerges (Holden, 2002, 2005; Stephen and Mirman, 2010). However, in our model this variability is simulated using a sequence of random values generated from a lognormal distribution that were added to the passing location in order to produce a pass location distribution that was similar to the original data.

The action selection dynamics (Equation 8) were integrated for 1,500 steps using the MATLAB ODE45 function, with the end state of the integration used to drive the decision to pass or go to the target. The output state of the action selection equation was stored as an input for integration of the action selection equation in the next trial (x = 0 for the first trial in a sequence). Based on the results of the original experiment, the initial trajectory angles for each sub-task movement type for each trial were calculated using the regression equations in **Figures 6**, **7** for pickup and pass/target movements, respectively. Random noise was added to the initial angle from a uniform distribution with min/max values of ±20°. The movement dynamics (Equation 7) were integrated separately for each sub-task movement using Euler integration (0.01 time step), with integration terminated when the model location was within 4 cm of the target location. Random noise was added to the model heading direction, $\varphi_A$, at each time step of the integration using a uniform distribution with min/max values of ±1.14°.
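The trial-to-trial carry-over of the action selection state is what produces hysteresis, and it can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's MATLAB code: fixed-step Euler integration with `dt = 0.05` stands in for ODE45, σ is set to the observed mean transition ratio of 0.82, δ = 10 is a hypothetical scale factor, the target ratios run from 0.50 to 1.20 in steps of 0.01, and no noise is added. Signs follow Equations (8) and (9) exactly as printed, with x > 0 read as a pass.

```python
def settle(x, alpha, steps=1500, dt=0.05):
    """Relax Equation (8) from state x using fixed-step Euler integration.

    Assumption: Euler with dt = 0.05 replaces the ODE45 solver named in
    the text; only the number of steps (1,500) is taken from it.
    """
    for _ in range(steps):
        x += dt * (-alpha + x - x ** 3)
    return x


def run_block(ratios, x0, sigma=0.82, delta=10.0):
    """Simulate one ordered block of trials.

    ratios: target distance divided by preferred reach, one per trial.
    Returns (decisions, final_state); a decision is True when x > 0.
    """
    x = x0
    decisions = []
    for r in ratios:
        alpha = (sigma - r) * delta   # Equation (9), hypothetical scaling
        x = settle(x, alpha)          # end state seeds the next trial
        decisions.append(x > 0.0)
    return decisions, x


ratios_up = [i / 100 for i in range(50, 121)]   # targets moving away
ratios_down = ratios_up[::-1]                   # targets moving closer

passes_up, x_end = run_block(ratios_up, x0=0.0)
passes_down, _ = run_block(ratios_down, x0=x_end)

first_pass = ratios_up[passes_up.index(True)]
first_solo = ratios_down[passes_down.index(False)]
print("switch to passing (ascending):", first_pass)
print("switch to not passing (descending):", first_solo)
```

In this sketch the switch to passing on the ascending sweep should occur at a larger ratio than the switch back on the descending sweep, which is the direction of the hysteresis reported for participants. The size of the gap depends on the assumed δ.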

Heat-maps were created using the same method as in the original experiment; however, due to the reduced variation in the model, a 1,240 × 680 grid was used for pickup trajectories. As can be seen from an inspection of **Figure 10**, the overall heat-map plots revealed patterns of sub-task movement trajectories similar to those observed in the original experiment. As observed in actual participants, the model deviates away from a straight-line path during pass and target movements, with a trajectory that tends to curve down toward the bottom of the task space before curving back to the corresponding goal pass/release or target position. This curvature is driven in the model by the initial trajectory angle set at the beginning of each sub-task movement. When the initial angle is calculated using the straight-line angle between the initial trajectory location and the sub-task goal location, the model does not exhibit this curving behavior, even with noise added to the heading direction. This suggests that when participants pick up the object they immediately start toward the other side of the table but do not decide exactly where they are going until later in the trajectory. The observed curved trajectories emerge from the initial conditions of the sub-task trajectory and the dynamics of the system. Velocity also plays a role in the curvature of the trajectory, with trajectories tending to curve more and longer when the velocity is high. As can be seen in the heat-map of the passing trajectories, the curve toward the passing locations tends to be less abrupt in the simulations than observed in the original experiment. One possible account of this difference is that the decision to pass occurs at some point after pickup, before the participant has located the target location. Future studies could look at factors that further affect trajectory curvature, including the possibility that action selection occurs online and not at a single point within a task-goal trajectory.

FIGURE 8 | Illustrations of the potential function plots for Equation (8) for changes in the value of α. In (A), the value of α increases from α < 0 to α > 0. As α approaches 0, the system becomes bi-stable but continues to converge on a stable solution at −x<sub>st</sub>. As α increases and −x<sub>st</sub> becomes less stable, the system eventually converges on the solution at +x<sub>st</sub>. In (B), the value of α decreases from α > 0 to α < 0, exhibiting the same characteristics as illustrated in (A) but in the opposite direction.

**Figure 11** illustrates the percentage of passes performed for each target location depending on the appearance order of the targets (ascending, descending, or random). As can be seen in **Figure 11**, the action selection dynamics of the model exhibit hysteresis similar to observations in the original experiment (see **Figure 2**). To verify that the hysteretic effect observed in the simulation experiment was significant, a one-way repeated measures ANOVA was conducted comparing the distance (target location) that the model switched between passing and not passing as a function of target location order (i.e., ascending, descending, and random) in each simulation run. This analysis revealed a significant effect of target location order, [F(2, 12.007) = 13.946, p < 0.001, $\eta_p^2$ = 0.666], with Bonferroni post hoc analysis indicating that pass/no-pass transition distance for the ascending target order was significantly higher compared to the pass/no-pass transition distance for the descending target order (p = 0.005). There was no significant difference between either the ascending or descending and random target location orders (p > 0.05).

Finally, as can be seen in **Figure 10** (right), the shape of the average velocity profile is qualitatively similar to the average velocity profile observed in the original experiment. The peak velocity occurs around the first 1/3rd of the trajectory, with a difference in the magnitude of average peak velocities between the pickup sub-task goal and the target and pass sub-task goals.

# CONCLUSION

The current study identified and modeled the affordance and nested sub-task movement dynamics of a simple pick and place task. As expected, the results revealed a consistent pattern of behavioral action across participants, with the transition between social (object passing) and solo action (not passing or moving the objects alone) determined by an intrinsic relation between the participant's action capabilities and the physical task-relevant constraints (Warren, 1984; Mark, 1987; Warren and Whang, 1987; Kinsella-Shaw et al., 1992; Richardson et al., 2007; Harrison et al., 2016). The hysteretic nature of the transition from solo- to social-action was also expected and provided further evidence that the perception and actualization of mutually destructive (and constructive) affordance possibilities is governed by nonlinear, multi-stable dynamical processes (Kelso, 1995; Frank et al., 2009; Richardson and Kallen, 2015). The verified implication of these findings was that a simple nonlinear bifurcation function (Equation 8), parameterized by a normalized E/A ratio of the participant's comfort reach capabilities relative to the distance of the intended object target goal location (Equation 9), could be employed to effectively capture the affordance dynamics exhibited by participants (see **Figure 11**).

Interestingly, participants consistently released/passed the object in roughly the same location throughout the experiment, either near the targets or near the co-actor. Although nearly all participants settled on one of these two pass location strategies, it remains unclear why any particular participant chose one passing location over the other, and further research is needed to investigate how and why these location preferences emerged. It is significant, however, that the pass location chosen by a given participant was dependent on task-invariant features of the task-space, namely, the confederate co-actor's hand location or the confederate co-actor's hand location relative to the target locations. Together with the fact that a participant's chosen pass location was independent of changes in trial-to-trial object appearance and target distance locations, this suggests that participants chose their pass location with respect to the global structure of the entire task context. It follows that predictions about a participant's pass location can be made without reference to smaller scale fluctuations that occur as the task unfolds. Moreover, precise prediction about the specific release/pass location chosen by a given participant appears to be of little importance with regard to functional task completion or with regard to modeling the behavioral dynamics observed. That is, so long as an object is released/passed in a location that can be easily reached by the confederate co-actor, the object can be picked up and moved effectively by the co-actor. This is not to say that there are not locations that would result in more efficient or optimal patterns of behavior (and less overall energy expenditure); rather, this appears to be less important than the predictability of current and future release/pass locations (Cakmak et al., 2011; Strabala et al., 2013). Indeed, the specification of a pass in the current task context was defined by the invariance of returning to the same chosen release/pass location, not the degree to which the release/pass location corresponds to some optimal pass location. Accordingly, the degree to which third-order motor planning (Ray and Welsh, 2011; Meyer et al., 2013) operated to constrain the behavior of participants appeared to be minimal in the current task.

moving toward participant. Random target appearances are represented by the black dotted line with handles. Asterisks represent the point at which 50% of decisions were passes and 50% were not (note that this point could occur between target locations). Each target location was presented 5 times each per Ascending and Descending conditions and 10 times for the Random condition.

The results of the current study also demonstrated how the trajectory dynamics of the participant's sub-task hand movements, including movement velocity, could be effectively captured by an adapted version of the Fajen and Warren (2003, 2004) behavioral dynamics model of locomotory path navigation. The significance of this finding is twofold. First, it highlights how the same low-dimensional behavioral dynamics can operate to constrain multiple (and often nested) levels of human activity. Second, it suggests that, with the exception of pass locations that require further investigation, knowledge of what, when, where and how to move or act during a social interaction is often lawfully defined by these low-dimensional task dynamics and, thus, can emerge spontaneously and in real-time with little a priori planning. Indeed, participants in the current task did not appear to plan out their sub-task movement trajectories from the outset, nor did they even appear to plan their sub-task movement with regard to the shortest path of the final end state or task goal. In fact, participants did not adjust their initial angle to the specific sub-task goal location on a given trial, even when the location of the sub-task goal was predictable. Instead, participants essentially moved in the general direction of the next sub-task goal, shaping the needed trajectory over the course of movement. As a result, the movement trajectory and velocity profiles that occurred were simply an emergent product of historically dependent initial conditions (parameterizations) operating within a set of well-defined task constraints.

Clearly, the confederate co-actor in the current pick and place task played a minimal role. It is therefore possible that the observed dynamics would have been different if the confederate co-actor were more engaged in the task (e.g., picked up and passed objects also). In particular, when two or more agents are simultaneously active in a shared task space, the decisions on whether to pass and where to pass are dependent on the behavioral movements and action possibilities of both actors together. Although future research is planned to investigate the behavioral dynamics of a more complex joint-action pick and place scenario, it is possible that very minimal changes to the current pick and place model will be required to capture the dynamics of such joint-action behavior. That is, it seems likely that the movement trajectory dynamics of actors in a truly joint-action pick and place task would be almost identical to those observed in the current task, with the only addition needed to Equation (2) being an obstacle avoidance coupling to prevent the actors bumping into each other. The action selection dynamics of the actors would also need to be coupled, such that the affordance dynamics of each actor are mutually dependent. However, these minimal changes are easily implemented and would not increase the dimensionality of the system of equations detailed above. Of major interest would be whether such minimal changes could produce patterns of behavioral joint-action as complex as those that would be expected during real human-human behavior (i.e., the emergence of complexity from non-complexity).

Finally, the Fajen and Warren model of path navigation has been successfully implemented in robotic systems for local obstacle avoidance and path navigation in novel environments (Huang et al., 2006; Nemec and Lahajnar, 2009). Building on this previous work and the current research, a future next step is to explore the application of the proposed model in human-robot and human-virtual avatar joint-action pick and place tasks. Demonstrating how this and other task or behavioral dynamics models can be employed for the development of robust human-machine systems will not only further validate the effectiveness of such models for capturing human multiagent behavior, but will also further emphasize the degree to which such models are able to provide a grounded explanation of multiagent behavior in general.

# ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the University of Cincinnati Institutional Review Board with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the University of Cincinnati Institutional Review Board.

# AUTHOR CONTRIBUTIONS

ML: Collected and analyzed data, developed, tested and parameterized model. RK: Contributed to experimental design as well as theory and model development. SH: Contributed to model development and testing. Contributed to data analysis and interpretation. MD: Contributed to model development (instrumental in development of velocity model). Contributed to data interpretation. AM: Contributed to experimental design. Contributed to data analysis and interpretation as well as model development and testing. MR: Contributed to model development, characterization, and testing. Contributed to data analysis, interpretation, and presentation. Contributed to experimental design.

# ACKNOWLEDGMENTS

This research was funded by the National Science Foundation (NSF#1513801) and the National Institutes of Health (R01GM105045-01). We would also like to thank Patrick Nalepka and Conner Wolfe for help with data collection and Krasimira Tsaneva-Atanasova, Richard Schmidt, and Elliot Saltzman for helpful comments.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Lamb, Kallen, Harrison, Di Bernardo, Minai and Richardson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Quantifying and Modeling Coordination and Coherence in Pedestrian Groups

Adam W. Kiefer<sup>1,2,3,4\*</sup>, Kevin Rio<sup>1</sup>, Stéphane Bonneaud<sup>1</sup>, Ashley Walton<sup>4</sup> and William H. Warren<sup>1</sup>

<sup>1</sup> Department of Cognitive, Linguistic and Psychological Sciences, Brown University, Providence, RI, United States, <sup>2</sup> Division of Sports Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States, <sup>3</sup> Department of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, OH, United States, <sup>4</sup> Center for Cognition, Action and Perception, Department of Psychology, University of Cincinnati, Cincinnati, OH, United States

#### Edited by:

Rick Dale, University of California, Merced, United States

#### Reviewed by:

Verónica C. Ramenzoni, National Scientific and Technical Research Council, Argentina

Daniel Richardson, University College London, United Kingdom

> \*Correspondence: Adam W. Kiefer adam.kiefer@cchmc.org

#### Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

Received: 02 December 2016 Accepted: 23 May 2017 Published: 28 June 2017

#### Citation:

Kiefer AW, Rio K, Bonneaud S, Walton A and Warren WH (2017) Quantifying and Modeling Coordination and Coherence in Pedestrian Groups. Front. Psychol. 8:949. doi: 10.3389/fpsyg.2017.00949

Coherent collective behavior emerges from local interactions between individuals that generate group dynamics. An outstanding question is how to quantify group coordination of non-rhythmic behavior, in order to understand the nature of these dynamics at both a local and global level. We investigate this problem in the context of a small group of four pedestrians walking to a goal, treating their speed and heading as behavioral variables. To measure the local coordination between pairs of pedestrians, we employ cross-correlation to estimate coupling strength and cross-recurrence quantification (CRQ) analysis to estimate dynamic stability. When compared to reshuffled virtual control groups, the results indicate lower-dimensional behavior and a stronger, more stable coupling of walking speed in real groups. There were no differences in heading alignment observed between the real and virtual groups, due to the common goal. By modeling the local speed coupling, we can simulate coordination at the dyad and group levels. The findings demonstrate spontaneous coordination in pedestrian groups that gives rise to coherent global behavior. They also offer a methodological approach for investigating group dynamics in more complex settings.

Keywords: group locomotion, group coordination, cross-recurrence quantification, principal components analysis

# INTRODUCTION

Collective behavior in humans and other animals is thought to arise from local interactions between individuals that are coupled by sensory information. This coupling may be modulated by factors such as environmental context (e.g., presence of predators, food sources), motivation (e.g., metabolic state, goals), and cognitive or social constraints (e.g., strategies, group membership, dominance relations). To understand the emergence of collective behavior, researchers must characterize both the local coupling between individuals and the global patterns of coordination. Such an approach calls for a set of analytic tools that can quantify the degree and stability of spatiotemporal coordination at both the individual and collective levels. The purpose of this paper is to investigate coordination in human collective behavior, beginning with the analysis of local and global coordination in small pedestrian groups.

By way of introduction, consider the flocking behavior of a murmuration of starlings. Each bird is visually coupled to nearby neighbors, and this local coupling influences an individual's behavior in accordance with a particular set of "rules;" we call them control laws to emphasize their continuous dynamical as opposed to logical form. These local interactions give rise to coordinated behavior between neighbors, which in turn feeds back to involve more individuals, so the coordination pattern propagates through the flock. The end result is a self-organized pattern of global motion that emerges from local interactions. The exact nature of the control laws that govern these local interactions and how they generate coherent flocking behavior is an active area of research (Ballerini et al., 2008; Cavagna et al., 2010; Hildenbrandt et al., 2010; Lukeman et al., 2010).

It is difficult to infer the local control laws based solely on the observed global behavior, however. An important theoretical result is that different sets of interaction rules can generate the same pattern of coherent flocking (Vicsek and Zafeiris, 2012); thus, the local control laws are underdetermined by analysis of the global behavior. This finding implies that direct experimental study of interactions between individuals is required to model the control laws, which can then be used to simulate coordination patterns. Therefore, a complete account of collective behavior demands an approach that combines a local-to-global (bottom-up) perspective, in which empirically-grounded control laws are used to predict global behavior, and a global-to-local (top-down) perspective, in which measurements on global behavior are analyzed and compared with the predictions (Sumpter et al., 2012).

We are pursuing this dual approach to understand the collective behavior of human crowds. The program of research includes characterizing the control laws by which visual information guides locomotion, a pedestrian model that generates locomotor trajectories, and multi-agent simulations of the emergent crowd dynamics. Warren (2006) proposed a behavioral dynamics framework that aims to characterize how stable low-dimensional behavior emerges on-line from the interactions between an agent and its environment. Goal-directed behavior such as locomotion is regulated by perceptual information in accordance with task-specific control laws (Gibson, 1979; Warren et al., 2001; Warren and Fajen, 2004). Within this framework, Fajen and Warren (2003, 2007) and Warren and Fajen (2008) developed a pedestrian model that successfully characterizes locomotor behavior such as steering to stationary and moving goals, and avoiding stationary and moving obstacles. This model has recently been extended from agent-environment interactions to interactions between pairs of pedestrians (dyads), including pursuit and evasion, following, and walking side-by-side (Cohen et al., 2010; Bonneaud and Warren, 2012; Page and Warren, 2013; Rio et al., 2014).

In certain contexts, two pedestrians may have the goal of walking together, in which case they visually coordinate their velocity, i.e., walking speed and direction of travel (heading). During pedestrian following, Rio et al. (2014) found that the follower matches the leader's speed, independent of their interpersonal distance (1–3 m); this is accomplished by nulling the optical expansion of the leader (see also Lemercier et al., 2012; Bruneau et al., 2014). A similar speed-matching strategy was observed in side-by-side walking, with a similar coupling strength (Page and Warren, 2013). In addition, Dachner and Warren (2014) found that pedestrians match the walking direction of a neighbor, independent of interpersonal distance (1, 2, 4 m), with a comparable coupling strength in following and side-by-side walking. They recently proposed that speed and heading are jointly controlled by nulling both the optical expansion and the change in bearing direction of the leader (Dachner and Warren, 2017). These results indicate that pedestrian dyads utilize visual information to adopt a common speed and direction over a range of distances and positions.

This research has established a preliminary set of control laws that govern pedestrian interactions. An outstanding question is whether they scale from dyads to groups, and ultimately, can account for the self-organization of collective crowd behavior. Answering this question requires methods for quantifying the emergent patterns of coordination at both the local and global scales. This is a particularly difficult problem given that pedestrian locomotor trajectories are a continuously evolving, aperiodic behavior. Accordingly, it requires analysis tools that can identify the temporal pattern of non-rhythmic coordination between dyads at a local level, as well as group coherence at a global level.

As a first step, the system must be operationalized. In previous work, two behavioral variables have been used to describe a locomotor trajectory: (1) the agent's direction of heading (Φ), and (2) the agent's speed (s), which together define the agent's velocity in an allocentric coordinate frame. This operationalizes a pedestrian as having two degrees of freedom (DoF), which may be coupled between neighbors. Similarly, Riley et al. (2011) proposed that behavioral coordination between two agents arises from the coupling of their DoF. It is believed that agents couple the DoF of a system via shared information variables, so that the DoF directly regulate one another. Hence, the control of behavior at the level of the group emerges via functional, information-based linkages between the behavioral variables of individual agents. When framed in terms of behavioral dynamics, collective behavior can be considered a problem of informationally coupling the appropriate behavioral variables to yield a stable solution of the global behavioral dynamics. For the task of locomotion, each pedestrian is operationalized as a two DoF system with the state variables Φ and s. Each additional individual in a group of N pedestrians would add two more state variables to the collective system, so the total DoF = 2N. Thus, the state space of the system has 2N dimensions.

Once the behavioral variables are identified, the next step is to quantify the degree of coordination at the collective level. From a global perspective, the degree of coordination among a set of pedestrians would be reflected in a reduction of the effective DoF of the system to a value between 2N, such that all individuals move independently, and 2, such that all individuals move with the identical speed and direction. One way to measure the reduction in a system's DoF is to quantify the dimensional compression of the observed behavior. Principal Components Analysis (PCA) is a valuable tool in this regard (Riley et al., 2011). PCA can be used to identify collective variables, or principal components, based on the relations among observations in a high-dimensional state space (cf. Haken and Wunderlin, 1990). It also indexes the load magnitude of each state variable on the identified principal components, which can help uncover the coupling between behavioral variables. The strength of PCA is its ability to include many variables of a complex system in a single analysis and to provide an output that quantifies the degree of relation, or even coordination, between the component variables. Its limitation is that PCA is a linear analysis, and therefore assumes linear relations among the system's variables. PCA provides the first part of the analysis by quantifying group coherence at the global level.

At the local level, the next step is to quantify the degree of coordination between pairs of individuals in a group, to reveal the coupling strength as a function of variables such as neighbor distance and position. One approach is to compute the linear cross-correlation between the time series of speed (or heading) for two pedestrians. The limitation of this analysis is that it assumes that individuals are coupled at a single time-scale and that behavior is stationary (i.e., a constant delay). It therefore has limited utility in analyzing more complex systems, such as bidirectional coupling at multiple time-scales and non-stationary behavior that evolves over time.

Cross-recurrence quantification (CRQ) is well-suited to the latter type of data and has proven useful in analyzing interpersonal coordination (cf. Shockley et al., 2003; Richardson, D. C. et al., 2007; Ramenzoni et al., 2012). CRQ is a non-linear analysis that indexes repeating patterns in a pair of time series at multiple temporal scales (Webber and Zbilut, 1994; Shockley et al., 2002). In particular, the output measure "cross-maxline" (CML) has proven to be a reliable estimate of the temporal stability of coordination, associated with coupling strength, between two movements (Richardson, M. J. et al., 2007; Page and Warren, 2013). However, these local analyses are limited to a pairwise comparison of dyads in a group.

Finally, to determine whether a model of the local coupling can account for the observed patterns of coordination, agent-based simulation methods can be used to try to reproduce the data. In particular, we investigate the mechanism of coordination by testing whether our model of the local "rule" for speed matching, derived from data on pairs of pedestrians, generalizes to coordination in a group, and can explain the adoption of a common collective speed and heading.

Our goal in the present paper is to measure the degree of coordination in pedestrian groups at the global and local levels, and to model the local coupling that generates such coordination. Establishing the emergence of coordinated behavior is prerequisite to modeling the informational control laws, characterizing the conditions for the emergence of such behavior, and eventually investigating the roles of other cognitive and social variables. In the present experiment, groups of four pedestrians walked toward one of three goals, while the group's initial density (interpersonal distance) was varied on each trial (see **Figure 1**). The role of density is important due to its potential contribution to self-organization: if coupling strength is distance-dependent, higher densities would create stronger local interactions and promote coherent crowd formation. Previous results have shown that, for an individual pedestrian, the coupling to obstacles decays exponentially with distance, asymptoting at 3–4 m (Fajen and Warren, 2003), but on the other hand, the coupling between pairs of pedestrians appears to be independent of distance, at least up to 3–4 m (Dachner and Warren, 2014; Rio et al., 2014). In the present experiment, we explored interpersonal distances of 0.5–2.5 m within groups of four people.

As described above, we analyzed two behavioral variables: the walking speed s and walking direction Φ for each agent. This resulted in a total of eight state variables, or DoF, for the four-agent system. To determine whether the observed coordination is a consequence of the informational coupling between individuals and is not due to other task constraints, we compared the real groups with virtual groups that were constructed by randomly sampling the same four pedestrians from four different trials. At the global level, we hypothesized that the real groups would exhibit dimensional compression in all conditions, compared to the virtual groups. We also investigated whether dimensionality would be reduced more in the higher density conditions. At the local level, we hypothesized that the coupling strength would be greater between real dyads than virtual dyads, and we asked whether it would increase as a function of group density. Finally, we tested whether Rio et al.'s (2014) speed-matching model generalizes to the observed speed coordination between individuals in a group and can explain the emergence of a common speed.

# METHOD

# Participants

Five groups of four participants (N = 20; M age 23.57 ± 0.93 years; 12 female, 8 male), students at Brown University, were compensated \$15 for their participation. Participants had normal or corrected-to-normal vision and no history of cognitive deficits, lower extremity injury, or neuromuscular disorders that would inhibit normal locomotor activity. This study was carried out in accordance with the recommendations of the Brown University Institutional Review Board with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Institutional Review Board.

# Materials and Apparatus

The experiment was conducted in the VENLab at Brown University, a 12 × 14 m open room. The head position of each participant was tracked with a MicroTrax inertial tracker affixed atop a lightweight bicycle helmet on the head. Each tracker communicated with an IS-900 ultrasonic overhead grid tracking system (InterSense, Billerica MA, USA) and provided 6 DoF position (4 mm RMS error) and orientation (0.1° RMS error) data at 60 Hz. Three cardboard goal poles (∼2 m tall and 0.5 m in diameter) were placed at an initial distance of 8 m from the "trigger line" for the front two participants, and spaced 2 m apart, with goal 2 straight ahead, goal 1 to the left, and goal 3 to the right (see **Figure 1**). Colored tape was used to mark four possible starting positions in a square configuration, with initial interpersonal spacing of 0.5, 1.0, 1.5, or 2.5 m on a side.

# Design and Procedure

Each group completed eight trials in each of 12 conditions (see **Figure 1**), with four densities (interpersonal distances of 0.5, 1.0, 1.5, 2.5 m) crossed with three goal positions (left, straight, right). This resulted in a total of 96 trials, presented in a random order, in each experimental session. Goal position was manipulated in order to vary the heading direction between trials, and thus was not included as a factor in the statistical analyses.

At the beginning of each trial the four participants were randomly assigned to the four positions in the square configuration: (1) front right, (2) front left, (3) back right, or (4) back left (**Figure 1**). Once they were standing in the correct location, an experimenter gave a verbal "go" signal and the group began to walk straight ahead. As the last participant crossed a notional "trigger line" 1 m after the starting line, the experimenter gave a verbal command of goal 1, 2, or 3. The only instruction given to the participants was to walk to the specified goal at a comfortable pace without stopping. Participants were not told to stay together as a group or to maintain their initial configuration. Each trial lasted ∼6–8 s.

# Data Reduction and Analysis

The tracking system recorded the medial-lateral and anterior-posterior head position (x- and z-coordinates, respectively) of each participant at a sampling rate of 60 Hz. The raw (unfiltered) position data were used to compute the participant's s and Φ from the displacement between successive samples, according to the following equations:

$$s_i = \frac{\left( (x_i - x_{i-1})^2 + (z_i - z_{i-1})^2 \right)^{0.5}}{\Delta t}, \tag{1}$$

$$\Phi_i = \tan^{-1} \left( \frac{x_i - x_{i-1}}{z_i - z_{i-1}} \right), \tag{2}$$

where x<sub>i</sub> and z<sub>i</sub> are the head position coordinates on the ith frame, in room coordinates, and Δt is the interval between successive samples (1/60 s at the 60 Hz sampling rate). The Φ and s time series were used for all subsequent analyses.
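As an illustration of Equations (1) and (2), the following is a minimal Python sketch, not the authors' analysis code. The function name `speed_and_heading` and the toy trajectory are hypothetical. One assumption to note: it uses the quadrant-aware `arctan2(dx, dz)`, which agrees with the inverse tangent of Equation (2) whenever the participant moves forward along z (positive z displacement).

```python
import numpy as np

def speed_and_heading(x, z, fs=60.0):
    """Speed (m/s) and heading (rad) from successive head positions.

    x, z : medial-lateral and anterior-posterior positions (m).
    fs   : sampling rate in Hz (60 Hz in the study).
    """
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    dt = 1.0 / fs
    dx = np.diff(x)                      # x_i - x_{i-1}
    dz = np.diff(z)                      # z_i - z_{i-1}
    s = np.hypot(dx, dz) / dt            # Equation (1)
    # Equation (2); arctan2 is an assumption that matches tan^-1(dx/dz) when dz > 0
    phi = np.arctan2(dx, dz)
    return s, phi

# Toy trajectory (hypothetical): walking straight ahead along z at 1.2 m/s
t = np.arange(0, 1, 1 / 60)
s, phi = speed_and_heading(np.zeros_like(t), 1.2 * t)
```

Because the study used raw, unfiltered positions, differencing at 60 Hz will amplify tracker noise; any smoothing step would be an addition to the procedure described here.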

#### Virtual Group Construction

Certain aspects of the procedure—such as a common goal, a simultaneous go signal, a simultaneous goal command, and walking at preferred speed—may have yielded correlations between participants that were not due to the visual coupling. To isolate the effect of the coupling from these task constraints, the data from real groups were compared with control data from constructed virtual groups that were not visually coupled. For each real group trial, a paired virtual group trial was created by randomly selecting a time series for the same four participants in the same condition, but from four different trials. Thus, all task constraints were matched, except that the participants in the virtual group were not perceptually coupled with each other. The four randomly selected time series were temporally aligned based on the goal command, and their lengths equated by cropping the beginning and/or end of the time series, to match the length of the shortest time series (a requirement of both PCA and CRQ analysis). This resulted in four randomly selected time series of equal length that were aligned by the goal command. Using these virtual groups as a control ensured that any significant coordination between participants was due to the perceptual coupling, not the task constraints.
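The construction of a virtual group can be sketched as follows. This is a hypothetical Python illustration under stated assumptions, not the authors' implementation: `trials[k][p]` is assumed to hold the time series of participant `p` on trial `k` of one condition, and `goal_idx[k]` the sample index of the goal command on that trial. Each participant's series is drawn from a different trial, the series are aligned on the goal command, and all are cropped to the shortest available span before and after it.

```python
import numpy as np

def build_virtual_group(trials, goal_idx, rng):
    """Assemble one virtual group from different trials of the same condition.

    trials   : list of trials; trials[k][p] is a 1-D array for participant p.
    goal_idx : list of ints; sample index of the goal command in each trial.
    rng      : numpy random Generator.
    Returns an (n_participants, n_samples) array and the trial chosen per participant.
    """
    n_participants = len(trials[0])
    # one distinct trial per participant, so no two series were recorded together
    picks = rng.choice(len(trials), size=n_participants, replace=False)
    series = [np.asarray(trials[k][p], dtype=float) for p, k in enumerate(picks)]
    goals = [goal_idx[k] for k in picks]
    # crop to the shortest span available before and after the goal command
    n_pre = min(goals)
    n_post = min(len(x) - g for x, g in zip(series, goals))
    aligned = [x[g - n_pre: g + n_post] for x, g in zip(series, goals)]
    return np.vstack(aligned), picks

# Toy data (hypothetical): each sample stores its offset from the goal command
rng = np.random.default_rng(0)
lengths = [400, 420, 380, 450, 410, 430, 390, 440]
goal_idx = [60, 75, 50, 90, 65, 80, 55, 85]
trials = [[np.arange(n) - g for _ in range(4)] for n, g in zip(lengths, goal_idx)]
virtual, picks = build_virtual_group(trials, goal_idx, rng)
```

In the toy data every row of `virtual` is identical after alignment, which gives a quick check that the goal command falls on the same column for all four participants.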

#### Principal Components Analysis (PCA)

PCA identifies linear relationships within multi-dimensional datasets and then maps the original data into a newly defined space, with the principal components as its axes. The principal components represent the dataset's primary dimensions of variation, but do not necessarily map directly onto the original dimensions of the actual measurement. The end result is a representation of potentially new, important collective variables that best account for the variance within the observed system.

In the context of the present experiment, eight variables of interest representative of the 8 DoF of the observed system (i.e., Φ and s for each of the four participants in each group) were submitted to a single PCA. The data were normalized using a z-score transform prior to analysis. PCA was performed in Matlab using the princomp function and the results were examined in a similar fashion to Ramenzoni et al. (2012). First, the number of components that together account for 90% or more of the variance in the data set was determined. To investigate dimensional compression in the real vs. virtual group, a 4 × 2 mixed-model ANOVA was conducted on the number of components, with initial density as a within-subjects factor and group (real vs. virtual) as a between-subjects factor, averaged across goal position. Next, the amount of variance accounted for by the first principal component (PC) in the real vs. virtual group was compared using an identical mixed-model ANOVA. The analysis was limited to the first two PCs because (a) the subsequent components were dependent on the first PC, and (b) the second PC provides additional context about the subsequent loadings. Greater variance accounted for by the first PC in the real group indicates dimensional compression, and thus greater coherence, in the visually coupled system. Finally, the mean correlation coefficient (r) for the loading of each behavioral variable on the first PC was examined to investigate which of the eight variables were most influential in characterizing the group's behavior. The r-values were transformed using a Fisher's z' transform and submitted to a 4 × 8 × 2 mixed-model ANOVA with initial density and agent position as within-subjects factors, and group as a between-subjects factor, again averaged across goal position for PC1. The aim of this analysis was to examine whether the speed or heading of an agent in a particular position more strongly influenced the group's behavior and whether this influence depended on density.
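The PCA steps described above (z-scoring, counting the components needed to reach 90% of the variance, and correlating each variable with PC1) can be sketched in Python with NumPy. This is an illustrative stand-in for the Matlab princomp analysis, not a reproduction of it; the function name `pca_summary` and the toy data are hypothetical, and the toy "coupled" set simply gives four speed-like columns a shared signal.

```python
import numpy as np

def pca_summary(data, threshold=0.90):
    """PCA on z-scored columns of an (n_samples, n_variables) array.

    Returns the number of components needed to reach `threshold` of the
    variance, the proportion of variance per component, and the correlation
    (loading) of each variable with the PC1 scores.
    """
    data = np.asarray(data, dtype=float)
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)   # z-score transform
    _, sv, vt = np.linalg.svd(z, full_matrices=False)
    explained = sv**2 / np.sum(sv**2)
    cumulative = np.cumsum(explained)
    # first component count whose cumulative variance reaches the threshold
    n_components = min(int(np.searchsorted(cumulative, threshold)) + 1, explained.size)
    pc1_scores = z @ vt[0]
    loadings = np.array([np.corrcoef(z[:, j], pc1_scores)[0, 1]
                         for j in range(z.shape[1])])
    return n_components, explained, loadings

# Toy data (hypothetical): 4 speed-like columns share a signal, 4 heading-like columns do not
rng = np.random.default_rng(1)
n = 400
common = rng.standard_normal(n)
speeds = common[:, None] + 0.1 * rng.standard_normal((n, 4))
headings = rng.standard_normal((n, 4))
coupled = np.hstack([speeds, headings])
independent = rng.standard_normal((n, 8))

n_c, ev_c, load_c = pca_summary(coupled)
n_i, ev_i, load_i = pca_summary(independent)
```

The sign of a principal component is arbitrary, so loadings from different groups or trials are best compared after fixing a sign convention or by magnitude.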

#### Cross-Correlations

At the local level, linear cross-correlation was used to measure the strength of the relation and the time delay between pairs of Φ time series (and, separately, pairs of s time series) for each of the six dyads in a group (illustrated in **Figure 1**, right). On each trial, the cross-correlation between the two time series for each dyad was computed, varying the time delay from −2,000 to +2,000 ms (where positive delays imply that the back participant lags behind the front participant, or the left participant lags behind the right participant in side-by-side dyads). For statistical comparisons, mean r-values for each participant were computed using Fisher's z transform to correct for non-normality and submitted to a 4 × 6 × 2 mixed-model ANOVA (density × dyad × group); the mean z-values were transformed back into the mean r-values reported below. A similar ANOVA was performed on the optimal delay for each pair of time series.
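A lagged cross-correlation of this kind can be sketched as follows. The code is a hypothetical Python illustration under stated assumptions: the function names are invented for the example, the ±2,000 ms window is converted to ±120 samples at 60 Hz, and a positive lag means the second series (`back`) lags behind the first (`front`), following the sign convention above. Averaging of r-values goes through Fisher's z transform and back.

```python
import numpy as np

def lagged_xcorr(front, back, fs=60.0, max_lag_ms=2000.0):
    """Pearson r between two series across lags; positive lag = `back` lags `front`.

    Returns the optimal delay (s), the r at that delay, all lags (s), and all r.
    """
    front = np.asarray(front, dtype=float)
    back = np.asarray(back, dtype=float)
    n = front.size
    max_lag = int(round(max_lag_ms / 1000.0 * fs))
    lags = np.arange(-max_lag, max_lag + 1)
    r = np.empty(lags.size)
    for i, k in enumerate(lags):
        if k >= 0:
            a, b = front[:n - k], back[k:]      # compare front[t] with back[t + k]
        else:
            a, b = front[-k:], back[:n + k]     # compare front[t] with back[t - |k|]
        r[i] = np.corrcoef(a, b)[0, 1]
    best = int(np.argmax(r))
    return lags[best] / fs, r[best], lags / fs, r

def mean_r_fisher(r_values):
    """Average correlations via Fisher's z transform, then transform back to r."""
    z = np.arctanh(np.clip(r_values, -0.999999, 0.999999))
    return float(np.tanh(z.mean()))

# Toy dyad (hypothetical): `back` reproduces `front` 18 samples (300 ms) later
rng = np.random.default_rng(2)
sig = np.cumsum(rng.standard_normal(438))
front, back = sig[18:], sig[:-18]
delay, r_max, lag_s, r_all = lagged_xcorr(front, back)
```

Note that the overlap between the two series shrinks as the lag grows, so correlations at the extreme lags rest on fewer samples than those near zero.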

#### Cross-Recurrence Quantification (CRQ)

A non-linear, two-dimensional CRQ analysis was used to quantify the time-correlated activity between pairs of Φ time series (and pairs of s time series) for each dyad in a group. Referring to **Figure 2**, a CRQ analysis is conducted by first embedding the pair of normalized time series in a multidimensional, time-delayed phase space (see Webber and Zbilut, 1994; Shockley et al., 2002; Marwan et al., 2007). Because not all variables that make up the behavior in a dynamical system are necessarily knowable a priori, phase space reconstruction allows for the behavior of these potentially "hidden" variables in the dynamical system to be evaluated via their interaction with, or influence on, the known variable (in this case the Φ or s time series). Hence, the structure of the reconstructed phase space can reveal the underlying dynamics of the dynamical system as a whole. Specifically, the "neighborliness" of points within some tolerance or radius in phase space can indicate recurrent points in the two time series. These points represent states in one time series that closely correspond to previous, current or future states in the other time series, and can illustrate behavioral patterns of coordination in the observed system. The recurrent points are identified and represented in a cross-recurrence plot (see **Figure 2**, bottom), from which a suite of measures can be computed to quantify these patterns (see Shockley et al., 2002; Marwan et al., 2007 for a review of analysis procedures).

The present experiment focused on cross-maxline (CML): specifically, the longest diagonal line of consecutive recurrent points on a cross-recurrence plot. This provides a measure of the longest time interval that the heading (or speed) of two participants was coupled (i.e., the two participants maintained the same direction of travel or walking speed, as specified by a predetermined threshold, viz. the radius) during a given trial, and this interval could occur at any point during the trial. CML is known to be sensitive to the temporal stability of coordination between two time series, associated with coupling strength. The parameters used for CRQ were as follows. For Φ: embedding dimension = 6; delay = 4 data points; radius within which points are counted as recurrent = 0.7% of the actual distance separating points in reconstructed phase space. For s: embedding dimension = 5; delay = 3 data points; radius within which points are counted as recurrent = 1.0% of the actual distance separating points in reconstructed phase space.
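The CRQ pipeline (normalize, embed with time delays, threshold pairwise distances, find the longest diagonal) can be sketched in Python as below. This is a simplified illustration, not the software used in the study. All function names are hypothetical, and one assumption should be flagged: the radius is interpreted here as a percentage of the maximum distance between embedded points, which is one common rescaling convention; the source does not state which rescaling was applied.

```python
import numpy as np

def embed(x, dim, delay):
    """Time-delay embedding: rows are [x(t), x(t + delay), ..., x(t + (dim-1)*delay)]."""
    x = np.asarray(x, dtype=float)
    n = x.size - (dim - 1) * delay
    return np.column_stack([x[i * delay: i * delay + n] for i in range(dim)])

def cross_recurrence(x, y, dim, delay, radius_pct):
    """Boolean cross-recurrence matrix for two z-scored, embedded time series."""
    zx = (np.asarray(x, float) - np.mean(x)) / np.std(x)
    zy = (np.asarray(y, float) - np.mean(y)) / np.std(y)
    ex, ey = embed(zx, dim, delay), embed(zy, dim, delay)
    d = np.linalg.norm(ex[:, None, :] - ey[None, :, :], axis=2)
    # ASSUMPTION: radius expressed as a percentage of the maximum distance
    return d <= (radius_pct / 100.0) * d.max()

def cross_maxline(crp):
    """Length (in samples) of the longest diagonal run of recurrent points."""
    n_rows, n_cols = crp.shape
    longest = 0
    for offset in range(-(n_rows - 1), n_cols):
        run = 0
        for v in np.diagonal(crp, offset=offset):
            run = run + 1 if v else 0
            longest = max(longest, run)
    return longest

# Toy check (hypothetical data) using the speed parameters: dim = 5, delay = 3, radius = 1.0%
rng = np.random.default_rng(3)
x = np.cumsum(rng.standard_normal(200))
y_ind = rng.standard_normal(200)
cml_same = cross_maxline(cross_recurrence(x, x, dim=5, delay=3, radius_pct=1.0))
cml_ind = cross_maxline(cross_recurrence(x, y_ind, dim=5, delay=3, radius_pct=1.0))
```

A series compared with itself is recurrent along the entire main diagonal, so `cml_same` equals the number of embedded points; an unrelated series should yield a much shorter longest line.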

# RESULTS

# Principal Components Analysis

See **Figure 3** for sample biplots—a representation of both the observations and variables—of PC coefficients for a real group (left panel) and a virtual group (right panel). The clustering of the speed variables along the positive x axis of the real group (left) indicates a consistent, positive loading of those variables on PC1, as contrasted with the virtual group (right) where the variables exhibit greater variance around both the positive x (PC1) and positive y (PC2) axes.

#### Number of Components

The number of components required to account for 90% of the variance was significantly lower in real groups (M = 3.61 ± 0.12) compared to virtual groups (M = 6.18 ± 0.07), F(1, 8) = 583.95, p < 0.001, η<sup>2</sup> = 0.99 (see **Figure 4**). Thus, the external task constraints appear to reduce the group DoF from 8.0 to 6.18, and the perceptual coupling between participants further reduced the DoF to 3.61, consistent with the emergence of global coordination. There was a significant interaction between group and density, F(3, 24) = 3.46, p = 0.032, η<sup>2</sup> = 0.30; post-hoc tests revealed that this was driven by the group difference, with the real groups requiring fewer components to account for 90% of the variance. No other main effects of dyad or density were found (p > 0.05).

FIGURE 2 | A schematic of the steps in the CRQ analysis. For each trial, the speed time series of one agent (FR = top left) and a second agent (BR = top right) are unfolded separately into a shared reconstructed phase space via time-delayed copies of each measured time series, denoted as sFR,BR (center, left). Recurrent points within a given radius and strings of recurrent points are identified with respect to each point in phase space and represented in a cross-recurrence plot (center, right), in which each axis represents the sFR and sBR time series at each time step. Each pixel indicates a recurrent point on a recurrence plot (bottom), and the diagonal line structures indicate the length of a string of recurrent points, or the co-evolution of the two time series at different time delays. The longest diagonal line, cross-maxline (CML), was computed for each dyad in the group.

#### PC1

The first principal component accounted for significantly more variance in real groups (M = 59.29% ± 0.79) than in virtual groups (M = 31.47% ± 0.45), F(1, 8) = 142.60, p < 0.001, η<sup>2</sup> = 0.95. This result confirms dimensional compression in group behavior due to the visual coupling. There was no main effect of initial density on the variance accounted for by PC1, and there were no interactions.

#### Contribution of Variables to PC1

The composition of the first principal component was further examined to determine the relative contribution of each of the eight behavioral variables, by computing the loading (r) of each variable on PC1. Overall, the s and Φ variables for all agent positions in the real group exhibited a stronger correlation with PC1 than they did in the virtual group (M = 0.36 ± 0.006 and M = 0.31 ± 0.008), F(1, 8) = 31.23, p < 0.001, η<sup>2</sup> = 0.78, suggesting that the behavior of real groups was more coherent than that of virtual groups. There was also a main effect of position, F(7, 56) = 52.27, p < 0.001, η<sup>2</sup> = 0.867. Follow-up t-tests (Bonferroni corrected p ≤ 0.01) indicated that across all agent positions, the s variable was more strongly correlated with PC1 in the real groups than in the virtual groups (all p < 0.001), whereas there were no group differences for the Φ variable (all p > 0.01). Within the real groups, the s variable had a higher correlation than the Φ variable (p < 0.001), whereas in the virtual groups, s and Φ did not significantly differ (all p > 0.01). Greater group coordination was, therefore, primarily due to the visual coupling of walking speed; in contrast, individual headings were generally aligned whether or not participants were visually coupled, presumably due to the presence of a common goal. See **Figures 5A,C** for the distribution of correlation coefficients for the loading of speed on PC1 in the real and virtual groups, and **Figures 6A,C** for the corresponding distributions for heading. The descriptive values of skewness, kurtosis and variance for all coefficients loading on PC1 appear in **Table 1**.

#### PC2

The second principal component was also examined to determine the amount of variance accounted for in each group. The results indicated that PC2 accounted for significantly more variance in real groups (M = 20.23% ± 0.68) compared to virtual groups (M = 17.81% ± 0.68), F(1, 8) = 21.88, p = 0.002, η<sup>2</sup> = 0.73. There was no main effect of density nor significant interaction effects (p > 0.05).

#### Contribution of Variables to PC2

Negative correlation coefficients were prevalent for PC2. Because of this, analyses were limited to qualitative observations and descriptive characteristics of the distribution of coefficients: skewness, kurtosis, and variance. The distribution of correlation coefficients for speed as it loaded on PC2 exhibited a negatively skewed, unimodal distribution for the real group compared to a somewhat bimodal distribution with almost no skew in the virtual group (see **Figures 5B,D** for the distribution of coefficients for speed in the real and virtual group, respectively). Similarly, the distribution for heading as it loaded on PC2 exhibited a bimodal distribution with less skew than the virtual group (see **Figures 6B,D** for the distribution of coefficients for heading). See **Table 1** for skewness, kurtosis, and variance descriptive values for all coefficients loading on PC2.

# Cross-Correlations

#### Speed (s)

ANOVA on transformed r revealed a main effect of group, F(1, 8) = 57.76, p < 0.001, η<sup>2</sup> = 0.88, such that the real group was more strongly coupled than the virtual group (M = 0.832 ± 0.021 vs. 0.358 ± 0.166, respectively). There was also a significant group × density × dyad interaction, F(9, 72) = 2.88, p = 0.006, η<sup>2</sup> = 0.22. Bonferroni-corrected t-tests revealed that all real group dyads had a significantly higher correlation compared to the virtual group dyads (p < 0.001). No other comparisons were significantly different. ANOVA on the optimal delay revealed a group × dyad interaction, F(3, 24) = 3.02, p = 0.05, η<sup>2</sup> = 0.22, with follow-up tests indicating that the optimal delay for the real group back side-to-side dyad was significantly lower (M = 0.00 ± 0.00 s) compared to the corresponding virtual group dyad (M = 0.04 ± 0.07 s). No other significant differences were found with respect to group, density or dyad.

#### Heading (Φ)

ANOVA on r revealed no significant effects of group, dyad, density or interactions between/among these factors. ANOVA on delay revealed a significant main effect of dyad, F(3, 24) = 3.16, p = 0.04, η<sup>2</sup> = 0.24; however, Bonferroni corrected post-hoc tests did not reveal any significant differences between the various dyads. No other effects of group, density or dyad were observed. As mentioned above, because all participants turned to walk to a common goal, their heading directions were highly correlated in the virtual group as well as the real group.

# Cross Recurrence Quantification

#### Cross-Maxline for s

Representative cross-recurrence plots for speed from a trial with a real dyad and a virtual dyad appear in **Figure 7** (left and right, respectively). Prior to inferential analyses a log10 transform was conducted to correct for positive skewness in the data. A significant main effect of group was observed on CML, F(1, 8) = 87.90, p < 0.001, η<sup>2</sup> = 0.917. Specifically, the real group exhibited an average CML (M = 111.13 ± 10.92 samples) more than twice as long as the virtual group (M = 48.78 ± 2.79 samples), irrespective of dyad or initial density. This result demonstrates that the speed coupling is significantly more stable in the real than the virtual groups. There were no main effects of density or dyad, but a significant density × dyad × group interaction was found, F(9, 72) = 3.16, p = 0.003, η<sup>2</sup> = 0.283. Follow-up t-tests (Bonferroni corrected p ≤ 0.01) indicated that the real groups were more strongly coupled than the virtual groups for all densities and dyads, but no other effects were significant (see **Figure 8**). These results imply that the speed coupling is equally stable at high and low densities, and for leader-follower and side-by-side dyads.

# MODELING

Given that speed coordination was significantly greater in real than virtual groups, whereas heading coordination was not, we proceeded to simulate speed coordination in real groups based on Rio et al.'s (2014) model of the local coupling. A dyad was simulated by using the time series of speed for one participant (the "leader") as input, and computing the time series of acceleration for a model "follower," according to Equation (3):

$$
\ddot{x}_f = c \cdot \left[ \dot{x}_l - \dot{x}_f \right] \tag{3}
$$

where ẋ<sub>l</sub> is the leader's speed, ẋ<sub>f</sub> is the follower's speed, and c is a gain parameter. We adopted c = 1.87, the best-fit parameter value from Rio et al. (2014), and the initial speeds of the leader and follower were zero. The simulation was evaluated by comparing the time series of the model "follower" with that of the human "follower."

TABLE 1 | Descriptive properties of the Φ and s PC coefficient distributions for PC1 and PC2.

Simulations were performed for each dyad on each trial. The six dyads were classified into three dyad types: front-back, side-by-side, and diagonal (see **Figure 1**). Front-back dyads were symmetrical relative to the group's walking direction, so they were analyzed together; the same held for diagonal dyads. By contrast, the side-by-side dyads were fundamentally different from one another; pedestrians in the front side-by-side dyad were visually coupled only to each other, while those in the back side-by-side dyad could potentially receive visual information from all three neighbors in the group. For this reason, the front side-by-side and back side-by-side dyads were analyzed separately.

For front-back and diagonal dyads, the front participant served as the "leader" and the back participant as the modeled "follower;" side-by-side dyads were simulated twice, with the left (right) participant as the "leader" and the right (left) participant as the modeled "follower." Performance was evaluated by computing the correlation coefficient (Pearson's r) between the simulated "follower" time-series and the observed time-series of the human "follower" on each trial; root-mean-squared-error (RMSE) between the two time series was also analyzed.
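The simulate-then-compare procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the forward-Euler integration scheme, the 60 Hz time step, the synthetic leader profile, and the stand-in "observed" follower (the leader delayed by 0.5 s) are all hypothetical choices made for the example. Only the model equation, the gain c = 1.87, the zero initial speed, and the two evaluation measures (Pearson's r and RMSE) come from the text.

```python
import math

def simulate_follower(leader_speed, c=1.87, dt=1.0 / 60.0, v0=0.0):
    """Forward-Euler integration of dv_f/dt = c * (v_l - v_f).

    Returns the model follower's speed and acceleration at each sample.
    The integration scheme and dt are assumptions made for this sketch.
    """
    v_f = v0
    speeds, accels = [], []
    for v_l in leader_speed:
        a_f = c * (v_l - v_f)      # model acceleration from the speed difference
        speeds.append(v_f)
        accels.append(a_f)
        v_f += a_f * dt            # advance the follower's speed by one time step
    return speeds, accels

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rmse(x, y):
    """Root-mean-squared error between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Hypothetical leader: accelerates from rest to 1.2 m/s over 2 s, then holds (6 s total).
dt = 1.0 / 60.0
leader = [min(1.2, 0.6 * i * dt) for i in range(360)]

model_speed, model_accel = simulate_follower(leader, dt=dt)

# Hypothetical stand-in for a human follower: the leader's speed delayed by 30 samples.
observed = [0.0] * 30 + leader[:-30]

r = pearson_r(model_speed, observed)
err = rmse(model_speed, observed)
print(f"r = {r:.3f}, RMSE = {err:.3f} m/s")
```

The same two functions apply unchanged to acceleration series (`model_accel`) when those are the quantities being compared.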

# Simulations of Speed Coordination

Sample time series of the simulated and observed "follower" acceleration (both in red), together with the observed "leader" acceleration (in blue), for four dyads appear in **Figure 9**. The mean correlation for the front-back dyads was r = 0.89 ± 0.33 (RMSE = 0.26 m/s<sup>2</sup>), for the diagonal dyads was r = 0.87 ± 0.01 (RMSE = 0.26 m/s<sup>2</sup>), for the front side-by-side dyad was r = 0.79 ± 0.30 (RMSE = 0.29 m/s<sup>2</sup>), and for the back side-by-side dyad was r = 0.74 ± 0.30 (RMSE = 0.28 m/s<sup>2</sup>; **Figure 10** top). A two-way ANOVA on transformed r revealed a main effect of dyad, F(3, 64) = 8.00, p < 0.001, η<sup>2</sup> = 0.27. Post-hoc comparisons with Bonferroni correction indicated that the model performs significantly better on front-back dyads and diagonal dyads than on the back side-by-side dyad (p < 0.001 and p < 0.01, respectively), probably because back dyads are less strongly coupled to each other and influenced by the front dyad. Post-hoc comparisons with Bonferroni correction showed no significant pairwise differences in correlation (p > 0.05) as a function of density.

FIGURE 7 | Sample cross-recurrence plots for speed time series from a real (Left) and a virtual (Right) leader-follower dyad. Note the presence of a main diagonal line (i.e., line of synchronization) and the additional diagonal lines that are visible in the cross-recurrence plot for the real dyad. These are indicative of a temporally stable speed coupling between agents.

A similar pattern of results holds for statistical tests on RMSE of speed (see **Figure 10**, bottom). A two-way ANOVA revealed a main effect of dyad on RMSE, F(3, 64) = 6.86, p < 0.001, η <sup>2</sup> = 0.24, and a main effect of density, F(3, 64) = 6.81, p < 0.001, η <sup>2</sup> = 0.24, but no interaction, F(6, 48) = 0.48, p > 0.05. Bonferroni-corrected post-hoc comparisons confirmed that the model performs better on front-back dyads and diagonal dyads than on both side-by-side dyads (p < 0.05).

In sum, the speed-matching model generalizes from pairs of pedestrians to small groups. It provides a close approximation of the local speed coupling, and successfully explains both pairwise coordination and an emergent group speed.

# DISCUSSION

The present experiment investigated the degree of coordination in pedestrian groups during goal-directed walking, with the aim of analyzing the effects of a visual coupling, group density, and neighbor position on collective behavior. We analyzed the behavioral variables heading Φ and speed s in a four-pedestrian group, yielding an eight DoF system. We then submitted the behavioral variables to a global (collective) analysis: (1) PCA to index the dimensional compression of group behavior; and to local (pairwise) analyses: (2) linear cross-correlation to estimate the coupling strength between dyads in a group, and (3) nonlinear CRQ to measure the dynamic stability of the local coupling.

Our main finding is that most analyses yielded evidence of spontaneous coordination in walking speed due to the visual coupling in real groups, compared to reshuffled virtual groups. It is important to point out that the external task constraints in this experiment (common goal, simultaneous go signal, simultaneous goal command, similar preferred walking speeds) by themselves induced similar behavior across individuals, which we estimated using the shuffled virtual groups. We expect that emergent heading and speed coordination would be observed in less restricted contexts, and research is under way to study spontaneous coordination in both heading and speed.

At the global level of analysis, the PCA indicated that visually coupled pedestrian groups exhibited significant dimensional compression across all experimental conditions. Note that the external task constraints accounted for a reduction of ∼1.8 DoF (from 8 to 6.2) in the virtual groups, a 23% reduction in DoF. Yet the visual coupling produced a further reduction of ∼2.6 DoF (from 6.2 to 3.6) in the real groups, or an additional 33% reduction in DoF. This is indicative of a functional reorganization of DoF via the informational coupling of behavioral variables, consistent with the emergence of collective coordination. These results are similar to those of Ramenzoni et al. (2012), who demonstrated dimensional compression in an interpersonal supra-postural task, and support the reduction of DoF in interpersonal coordination proposed by Riley et al. (2011).

The analysis of the composition of PC1 offers preliminary evidence of a new collective variable underlying the emergence of group coordination in the context of the current task. The loading of behavioral variables on PC1 suggests that speed coordination is a primary contributor to the collective behavior, whereas heading coordination was no greater in the real than the virtual group. Further, the analysis of the composition of PC2 demonstrated that the heading and speed loading is not simply dichotomous, as evidenced by the bimodal distribution for the heading coefficients in the real group (**Figure 5B**) and the negatively skewed unimodal distribution of the speed coefficients (**Figure 6B**). This indicates that the heading behavioral variable was a relatively weak contributor to the first two PCs overall. Thus, the remaining discussion focuses on the analysis and modeling of speed coordination.

At the local level of analysis, the cross-correlations for speed indicated a high visual coupling strength within the groups. Specifically, a significantly higher mean correlation was found for the real group (r = 0.84) compared to the virtual group (r = 0.36), independent of dyad. This can be explained similarly to the PCA results, in that the visual coupling increased the speed correlation for all dyads. It appears that local coupling strengths can be reliably estimated by pairwise linear correlations. However, the pairwise cross-correlations did not reveal a significant difference between types of dyads. This could be due, in part, to the possibility that back participants were influenced by more than one neighbor at a time. We are currently developing a neighborhood model that allows us to estimate the combined influence of multiple neighbors.

The non-linear CRQ analysis provided further evidence regarding the strength and stability of the local coupling. Speed coordination exhibited a longer CML in real groups than in virtual groups, indicating that the visual coupling was dynamically stable. Specifically, real dyads were stably coupled for almost two full seconds (i.e., 111.13 samples at 60 Hz), at some point in each 6–8 s trial.
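To make the CML measure and its conversion to seconds concrete, the following is a minimal Python sketch. It assumes that CML denotes the longest diagonal line in the cross-recurrence plot (often called maxline), and it uses a deliberately simplified recurrence rule: two samples are recurrent when their raw values differ by no more than a fixed `radius`, with no time-delay embedding or normalization. The function names, the toy data, and the radius are hypothetical; the actual CRQ parameters are not restated in this section.

```python
def max_diagonal_line(x, y, radius):
    """Length (in samples) of the longest diagonal run of recurrent points.

    Point (i, j) is treated as recurrent when |x[i] - y[j]| <= radius.
    This un-embedded rule is a simplification assumed for illustration.
    """
    longest = 0
    prev = [0] * (len(y) + 1)          # prev[j] = run length ending at (i-1, j-1)
    for i in range(len(x)):
        cur = [0] * (len(y) + 1)
        for j in range(len(y)):
            if abs(x[i] - y[j]) <= radius:
                cur[j + 1] = prev[j] + 1   # extend the diagonal run through (i, j)
                longest = max(longest, cur[j + 1])
        prev = cur
    return longest

def samples_to_seconds(n_samples, rate_hz=60.0):
    """Convert a line length in samples to a duration at the given sampling rate."""
    return n_samples / rate_hz

# Toy example: y repeats x with a one-sample lag, so five points line up on one diagonal.
x = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
y = [9.0, 0.0, 0.1, 0.2, 0.3, 0.4]
print(max_diagonal_line(x, y, radius=0.05))

# Mean CML values reported in the text, expressed in seconds at 60 Hz.
print(round(samples_to_seconds(111.13), 2))   # real groups
print(round(samples_to_seconds(48.78), 2))    # virtual groups
```

At 60 Hz the reported means correspond to roughly 1.85 s for real groups and 0.81 s for virtual groups.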

Taken together, the PCA, cross-correlation, and CRQ results indicate that the global coordination in the present task is due in large part to the local coordination of speed, which in turn emerges from the visual coupling between individual pedestrians. Finally, we tested whether an empirical model of the local speed coupling could reproduce the observed coordination patterns. The simulation results supported this interpretation, for the coordination of dyads in a group is reproduced by the speed-matching model. The simulation results show that the speed-matching model generalizes from pairs of pedestrians to pedestrian groups, and imply that the local coupling is sufficient to explain the adoption of a common speed. We conclude that the local visual coupling can account for the pattern of global coordination.

Somewhat to our surprise, we did not observe a consistent effect of density on the degree of coordination. In fact, no measures yielded significant density effects, consistent with our previous finding that speed coordination in following is independent of interpersonal distance over 1–3 m (Rio et al., 2014). It is possible that the range of densities tested (0.5–2.5 m spacing) was insufficient to reveal an effect, or that the external task constraints, combined with a short walking distance, limited the degree of variation in the data. Research is in progress to test a wider range of densities (up to 4 m spacing) over longer walking distances, without a common goal or timing signals.

Finally, we would like to mention that we also performed an uncontrolled manifold (UCM) analysis on the eight-dimensional Φ and s data (Scholz and Schöner, 1999), as another way to estimate the reduction in effective DoF. This approach was unsuccessful, and it is instructive to consider why that was the case. A UCM analysis depends on the existence of reciprocal compensation between two or more behavioral variables in the system, which is considered a signature of motor synergies. But in retrospect, there is no reason to expect reciprocal compensation in collective group behavior: the acceleration of one agent would not be expected to produce a compensatory deceleration by a coupled agent to maintain the mean speed, but rather a coordinated acceleration; similarly, a change in heading direction by a subset of agents would not be expected to yield compensatory heading changes in the other direction, but a coordinated turn by the group. This observation suggests that reciprocal compensation may not be a general characteristic of all forms of interpersonal coordination in human groups (cf. Riley et al., 2011).

The present work is a starting point for understanding collective behavior in pedestrian groups. We began by analyzing the local coupling in dyads, on the hypothesis that this generic coordination mechanism would scale up to small groups, large crowds, and even flocks or schools in other species. Expanding the methodological framework of interpersonal coordination (Riley et al., 2011; Ramenzoni et al., 2012) to the behavior of small groups, we obtained evidence of dimensional compression and speed coupling. The present framework provides a foundation for the analysis and modeling of local and global coordination in future research. It is likely that other factors may also constrain group coordination. For example, cognitive processes such as decision-making and motivation, and social factors such as group membership, dominance relations, and social communication, may influence the selection of goals, neighbors, walking speeds, and control laws and shape the emergent crowd dynamics. The present experiment evaluates ways of quantifying local and global coordination in many of these contexts, and offers an approach to characterizing emergent collective behavior.

# AUTHOR CONTRIBUTIONS

AK led all behavioral analyses on the project, and was the lead contributor on the manuscript. KR and SB participated in the experimental design, led all data collection, data post-processing, modeling and simulation efforts. They also contributed to the write-up of the manuscript. AW contributed to the behavioral analyses and the write-up of the manuscript. WW was the leader on the project and provided guidance on the project method and analyses, and also provided support during write-up of the manuscript.

# ACKNOWLEDGMENTS

This research was funded by NIH 5R01 EY010923-25 (WW). The authors would also like to thank Henry Harrison for his help with subject recruitment.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Kiefer, Rio, Bonneaud, Walton and Warren. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Functional Synchronization: The Emergence of Coordinated Activity in Human Systems

Andrzej Nowak<sup>1,2</sup>\*, Robin R. Vallacher<sup>2</sup>, Michal Zochowski<sup>3</sup> and Agnieszka Rychwalska<sup>4</sup>

<sup>1</sup> Department of Psychology, SWPS University of Social Sciences and Humanities, Warsaw, Poland, <sup>2</sup> Department of Psychology, Florida Atlantic University, Boca Raton, FL, United States, <sup>3</sup> Department of Physics and Biophysics Program, University of Michigan, Ann Arbor, MI, United States, <sup>4</sup> The Robert Zajonc Institute for Social Studies, University of Warsaw, Warsaw, Poland

The topical landscape of psychology is highly compartmentalized, with distinct phenomena explained and investigated with recourse to theories and methods that have little in common. Our aim in this article is to identify a basic set of principles that underlie otherwise diverse aspects of human experience at all levels of psychological reality, from neural processes to group dynamics. The core idea is that neural, behavioral, mental, and social structures emerge through the synchronization of lower-level elements (e.g., neurons, muscle movements, thoughts and feelings, individuals) into a functional unit, a coherent structure that functions to accomplish tasks. The coherence provided by the formation of functional units may be transient, persisting only as long as necessary to perform the task at hand. This creates the potential for the repeated assembly and disassembly of functional units in accordance with changing task demands. This perspective is rooted in principles of complexity science and non-linear dynamical systems and is supported by recent discoveries in neuroscience and recent models in cognitive and social psychology. We offer guidelines for investigating the emergence of functional units in different domains, thereby honoring the topical differentiation of psychology while providing an integrative foundation for the field.

Keywords: synchronization, function, self-organization, mind, brain, social systems

# INTRODUCTION

Humans perform an astonishing array of activities with varying degrees of complexity, and they do so at a wide range of operational levels. On even the most mundane day, people prepare and consume meals, engage in physical exercise, plan activities, socialize with acquaintances and friends, drive a car and navigate traffic patterns, compose messages and letters, play games, accommodate their behavior to meet the demands of informal and formal social situations, daydream, and think about their personal qualities and weaknesses. On less mundane days, they may create music, write an essay or compose a poem, develop a theory, attempt to resolve a conflict, coordinate with other people to accomplish complex tasks, or play Pokémon Go. Each of these activities represents operations involving brain function, movement, perception, and higher-order cognition, and many of them also involve social interaction and coordination with other people who have their own personal and interpersonal agendas.

#### Edited by:

Joanna Raczaszek-Leonardi, University of Warsaw, Poland

#### Reviewed by:

Riccardo Fusaroli, Aarhus University, Denmark Iris Nomikou, University of Portsmouth, United Kingdom

> \*Correspondence: Andrzej Nowak nowak@fau.edu

#### Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

Received: 28 October 2016 Accepted: 22 May 2017 Published: 13 June 2017

#### Citation:

Nowak A, Vallacher RR, Zochowski M and Rychwalska A (2017) Functional Synchronization: The Emergence of Coordinated Activity in Human Systems. Front. Psychol. 8:945. doi: 10.3389/fpsyg.2017.00945

These activities and levels of operation are typically investigated in terms of their local dynamics, an established approach to understanding that has given rise to the highly compartmentalized discipline of psychology. Neuroscience, judgment and decision-making, and group dynamics, for example, tackle very different facets of human experience, and do so with little attention to possible underlying principles that provide integration for them. With this lack of theoretical integration in mind, our aim in this article is to suggest that the different activities and operational levels characterizing human experience can be understood in terms of a common process that has potential for forging a unified account of psychological functioning.

The core idea is that all operational levels of human activity, from brain function to group dynamics, represent the formation of functional units that result from the tendency for lower-level elements to achieve coordination and operate in concert to accomplish tasks. More specifically, we propose that functions in neural, psychological, and social structures emerge by the dynamic creation of functional units that are established by assembling a set of synchronizing lower-level elements into a coherent structure. This hypothesis has its wellspring in principles of complexity science and non-linear dynamical systems and receives tentative support from recent discoveries in neurophysiology and recently developed models in psychological and social science.

# PROCESSES OF SYNCHRONIZATION

Brains, motor behaviors, minds, dyads, and social groups are clearly very different from one another. Brains are composed of neurons, motor behavior involves muscle contractions and limb movements, human minds represent the expression of thoughts, perceptions, and emotions, dyads consist of interacting individuals, and groups consist of many individuals in interaction. The elements in each case—neurons, muscle movements, thoughts and feelings, individuals—are clearly distinct by almost any criterion. From another perspective, however, these phenomena share important features. Each represents a complex system composed of many lower-level elements, and the operation of each system involves mutual influences among these elements.

We propose that these similarities across levels can be conceptualized in terms of common mechanisms by which any complex system performs a function. In broad terms, cooperative activity among elements is the essence of effective performance in any system. In more precise terms, the performance of a function requires the synchronization of specific elements, and changes in the configuration of these elements as the function unfolds in response to task demands.

# The Meaning of Synchronization

Synchronization can be described from two perspectives: at the level of system dynamics and at the level of influence among system elements. At the system level, synchronization refers to the coordination in time among the states or dynamics of the elements comprising the system (e.g., Schmidt and Richardson, 2008). With respect to the brain, this aspect of synchronization is manifest as in-phase relations in the activation of neural elements, or locking to an externally driven oscillatory signal (Buzsaki, 2006), although more complex forms of coordination are possible and have been observed. With respect to motoric behavior, the contraction of different muscle groups must be coordinated in time to coalesce into an activity (e.g., Bernstein, 1967; Turvey, 1990; Thelen, 1995; Kelso, 1997). With respect to the mind, an ensemble of cognitive and affective elements must be mutually consistent to generate a higher-order mental state such as an attitude, belief, or value (e.g., Thagard and Nerb, 2002). With respect to dyadic interaction, the overt behavior and internal states (e.g., emotions, attitudes) of the individuals must achieve coordination in time in order for the interaction to proceed smoothly (e.g., Newtson, 1994; Fusaroli et al., 2014). And with respect to social groups, collective performance of any task requires the coordination in time of individuals' activities (e.g., Arrow et al., 2000).

At the level of elements, synchronization can be viewed in terms of mutual influence, with consistent signals arriving at an element from other elements (Singer, 1999; Engel and Singer, 2001; Uhlhaas et al., 2009, and references therein). In the simplest attractor neural networks, for example, correct recognition of an incoming pattern is associated with each neuron receiving relatively congruent signals regarding its state from all the neurons with which it has connections (Zochowski et al., 1993). With respect to motoric behavior, each muscle relevant to the behavior must receive congruent signals from the other relevant muscles in order to perform the behavior (e.g., Bernstein, 1967). With respect to the mind, a coherent view or attitude is experienced when the thoughts that arise in consciousness call to mind other thoughts that support the same view or attitude (e.g., Abelson et al., 1968; Tesser, 1978). In dyads, the separate components of each person's behavior (e.g., posture, facial gestures, postural cues, tone of voice, and speech content) coalesce into a coherent message (e.g., expressing an internal state, conveying an expectation, etc.) (e.g., Fusaroli et al., 2014). With respect to social groups, effective collective action depends on each group member receiving clear signals from other group members regarding his or her contribution to the group effort (e.g., Forsyth, 1990). For example, attempting to synchronize one's walking with others marching in a parade is an easy task when the others are synchronized because the signals from them concerning one's suggested movements are consistent. If, however, the group is not synchronized, the signals arriving from different individuals are conflicting.

Both perspectives on synchronization, the temporal coordination of dynamics and congruence in signaling among elements, represent the binding of dynamics (i.e., the dynamics of one element is dependent on the dynamics of another element). Such binding does not necessarily involve performing the same action at the same time, but rather may involve compensatory dynamics. A group, for example, can have complex forms of synchronization if there are different tasks to be performed. This is clear in a band, for example, where each member plays a different instrument, yet each instrument informs the other instruments where it is in the musical piece and what sound should be made at each moment.

The basic hypothesis that synchronization plays a crucial role in the emergence of functions, both within and between levels, is consistent with several lines of research from complex systems, social and cognitive psychology, and social science. The present model, however, extends existing models by identifying mechanisms by which the synchronization of elements occurs. In particular, it identifies a dynamic scenario in which synchronization is an intermittent phenomenon characterized by the repeated assembly and disassembly of elements in accordance with shifting tasks and challenges faced by the system.

# Assembly of Functional Units

Functional units may be mobilized in three ways that reflect the emergence of synchronization. First, synchronization may result from the structural connections among system elements; some of the elements of the system may be connected to other elements in a manner that is more or less stable, which creates the potential for communication and therefore mutual influence. Mutual influences through these links can establish synchronization, even if the links are relatively weak (Pikovsky et al., 2003, and references therein; Strogatz, 2004). Activation of each of the elements sends signals to other connected elements, resulting in synchronization among the elements of the whole assembly. Each instance in which a functional unit is assembled strengthens the connections between the elements, paving the way for the next appearance of the same configuration. In effect, if functional units arise on the basis of structural connections, they tend to recreate the same configuration of elements in consecutive emergence of the units.

This assembly process can be observed at the level of the brain, the mind, and social groups. In the brain, neural structures that possess anatomically systematic and direct connections with each other will tend to synchronize. Such connections facilitate synchronization either by transmitting excitatory and inhibitory impulses (Buzsáki and Draguhn, 2004; Buzsaki, 2006) or by modulating intrinsic neuronal properties of connected neurons (Bogaard et al., 2009; Fink et al., 2012, 2013; Knudstrup et al., 2016). In mental systems, the co-occurrence of cognitive elements creates new associative links and strengthens existing links between elements. At the social level, repeated synchronization between individuals increases their liking for one another and strengthens their interpersonal relations. Family and friendship ties, for example, can serve to synchronize the thoughts and actions of the individuals involved. In similar fashion, close friends are likely to cooperate in the achievement of diverse goals.

If the interactions between elements are reflected in structural connections, the stability of these connections will facilitate the recreation of similar (or identical) assemblies of elements. If a highly trained mechanism is disrupted, it is easy to re-establish. This is easy to appreciate in stable social groups. If the members of a family take a vacation in different places, for example, they are likely to reunite once their vacations have ended. However, if the elements are connected by quick-changing bindings of dynamics, a momentary alteration of the functioning of relations between the elements may contribute to an emergence of distinct functional units. Even a small disturbance of a newly formed mechanism may cause qualitative changes in its performance. Therefore, if someone or something divides a group of persons who randomly had a conversation on a street, they may never reunite again.
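The claim that stable links can establish synchronization even when they are relatively weak can be illustrated with a toy simulation. The sketch below uses the standard Kuramoto phase-oscillator model for two coupled elements; this is a swapped-in textbook illustration, not a model the present article itself specifies, and all names and parameter values (natural frequencies 1.0 and 1.1 rad/s, coupling strengths 0.1 and 0.02, Euler step 0.01 s) are hypothetical. For two such oscillators the phase difference settles to a constant (phase locking) whenever the frequency mismatch does not exceed twice the coupling strength.

```python
import math

def phase_difference_after(k, w1=1.0, w2=1.1, dt=0.01, steps=20000):
    """Simulate two Kuramoto oscillators and return the final phase difference.

    d(theta1)/dt = w1 + k * sin(theta2 - theta1)
    d(theta2)/dt = w2 + k * sin(theta1 - theta2)
    Both start at phase 0; k is the (symmetric) coupling strength.
    """
    th1, th2 = 0.0, 0.0
    for _ in range(steps):
        d1 = w1 + k * math.sin(th2 - th1)   # influence of element 2 on element 1
        d2 = w2 + k * math.sin(th1 - th2)   # influence of element 1 on element 2
        th1 += d1 * dt
        th2 += d2 * dt
    return th2 - th1

# Coupling of 0.1 is small relative to the frequencies (~1), yet sufficient to lock:
locked = phase_difference_after(k=0.1)

# Coupling of 0.02 is too weak for this mismatch, so the phase difference keeps growing:
drifting = phase_difference_after(k=0.02)

print(f"k = 0.10: final phase difference = {locked:.3f} rad")
print(f"k = 0.02: final phase difference = {drifting:.3f} rad")
```

With the stronger of the two links the elements settle into a fixed phase relation (near π/6 rad for these values), whereas with the weaker link they drift apart continuously, which mirrors the contrast drawn above between stable structural connections and bindings that are easily disrupted.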

Second, elements are likely to achieve mutual synchronization if they become salient in some manner at the same time. This mechanism is likely to be used to synchronize elements that are instrumental to the achievement of a goal. Activation of these elements by an internal control process (e.g., attention) can result in their emergent synchronization. On the level of the brain, attention can momentarily bind the dynamics of elements (Lopes da Silva, 1991). As an example of this mechanism, Wróbel (2014) hypothesized that during perception, attention is mediated through activation of selected neural groups through oscillations in the beta band, which in turn are synchronized in the gamma band to form specific representations. On the level of mind, recalling elements that are relevant to a judgment or a decision activates these elements, which are then likely to be synchronized into a judgment or become the basis for a decision. In social groups, individuals who have skills instrumental to solving a group problem or the achievement of a goal are often explicitly or implicitly called upon, promoting the formation of a team that synchronizes to perform the function.

External factors may also induce momentary synchronization among a set of elements by selectively activating them. At the level of the brain, sensory input can activate distinct neural assemblies in the brain, with this heightened activation creating the potential for mutual influence among the respective assemblies. Impression formation exemplifies this mechanism at the level of the mind. Thus, those features that distinguish a person in a given context will be integrated into the resultant impression, while other features are likely to be neglected (e.g., Asch, 1946). At the social level, meanwhile, if a few people stand out as the most active and expressive in a large group, they are likely to become coordinated in some fashion because the activity of each is most visible to the others. Therefore, the persons who are active in a given situation begin to act spontaneously and have greater chances of creating a functional unit, in this case a subgroup performing a task. There is a positive feedback loop between momentary synchronization and momentary influence among elements, such that coherent elements influence one another more strongly and elements that influence one another become increasingly synchronized (Waddell and Żochowski, 2006).

In the third mechanism, the state or the actions of each element suggests the possible range of states and actions of other elements. In neural networks, this phenomenon is described as multiple constraints and it is one of the basic mechanisms by which artificial neural networks function (McClelland and Rumelhart, 1986). This can be observed at the level of dyads and social groups; it is described as social codependency in game theory and as affordance categories in the ecological approach (Gibson, 2014). An example of codependency is a situation in which an individual stepping right or left makes this position unattainable for the other person. An analysis of reciprocal delimiting of one's own affordances is an important mechanism in the dynamic analysis of codependency in sport. For example, synchronization of soccer players is partially a result of the fact that players of one team block their opponents in order to prevent them from performing certain actions, thus reducing their affordance (Vilar et al., 2013). Synchronization of elements can thus emerge not only as the result of some elements inducing others to be in a specific state, but also by elements dynamically limiting the ensemble of states that the other elements can adopt. This mechanism can provide for complex patterns of synchronization in functional units.

# Dynamics of Functional Units

Most models emphasizing the emergence of functions through synchronization of lower-level elements assume a static framework, in which the dynamics (if any) are limited to simple externally or internally imposed tasks. Dynamic processes play a more prominent role in the present model, promoting sustained change in the structure and functioning of the system in question. The core idea is that in carrying out higher-order functions, various configurations of elements are composed and decomposed along with the development and achievement of the function. Once a function is accomplished, the set of elements may be disassembled, ready to be reassembled in a different manner to perform a different function. New functional units may also be subject to decomposition by a control mechanism; this takes place when the elements are unable to achieve sufficient coherence necessary for the unit to carry out its functions. The present model, in other words, emphasizes the intermittent nature of synchronization, with the repeated assembly and disassembly of functional units in response to changing tasks, challenges, and environmental constraints. Thus, synchronization is not a mere consequence of functioning but also an important component of self-regulatory control (Żochowski and Liebovitch, 1997, 1999; Żochowski and Dzakpasu, 2004; Waddell and Żochowski, 2006).

The dynamics underlying the assembly and disassembly of functional units mirror one another. Whereas increasing synchronization strengthens momentary influence among elements and thus creates a functional unit, decreasing synchronization weakens the momentary influence among elements and thus disintegrates the functional unit. Regardless of whether the initial factor is weakening of momentary influence or breakdown of synchronization, the functional unit disintegrates. These elements then may become integrated into different functional units.

Whether the system will organize the same elements into the same functional units depends on the degree to which the emergence of the functional unit is dictated by the structural properties (i.e., couplings between elements) as opposed to the temporary binding of dynamics induced by momentary synchronization. If the elements influence one another primarily by structural linkages, the relative stability of the connections will result in the re-emergence of similar, if not identical ensembles of elements. A highly automatic or overlearned response, for example, may be temporarily disrupted but is easily re-established in the same form. In like manner, synchronizing neural groups form different spatial patterns in different tasks, reassembling their coordination whenever the function performed requires it (e.g., Kelso and DeGuzman, 1991). If, however, the elements are coupled primarily by fast-changing bindings of dynamics, momentary changes in the functional relations between elements can make the re-emergence of the original configuration unlikely, promoting instead a vastly different functional unit. In performing a relatively novel act, for instance, even a slight disruption can promote a wholesale change in the action (Vallacher and Wegner, 1987).

Function imposes constraints on synchronization. Even the same act might involve different configurations of lower-level elements in order to perform a particular function. When hitting a chisel with a hammer, for example, professional blacksmiths unconsciously coordinate arm muscles to maintain precision from strike to strike. However, such precision is not present on the level of a single muscle. In one strike, a particular muscle might be more engaged than in another strike, with another muscle compensating for the muscle's lack of engagement (Bernstein, 1967).

# SYNCHRONIZATION IN PSYCHOLOGICAL PROCESSES

The functional role of synchronization can be seen at all levels of psychological reality: brain function, perception, motor behavior, higher-order action, mental processes, dyadic behavior, and collective action in social groups.

# Stimulus Representation and Consciousness

Synchronization plays a crucial role in how the brain performs its functions. Brain function requires both the segregation and integration of information, whether sensory or retrieved from memory. With the development of techniques for visualizing brain activity, we know relatively well how the brain segregates such information by specifying distinct regions for processing specific types of information. Our knowledge about how the brain integrates information, however, is much more limited. The leading hypothesis relates information integration to synchronization between regions processing different types of information (e.g., von der Malsburg, 1994; Singer and Gray, 1995). Synchronized activity of neural assemblies in the brain is theorized to be important to the performance of sensory and perceptual functions (von der Malsburg, 1994). Synchronized oscillations between brain regions have been observed in motor and cognitive functions, specifically in conscious processing (von der Malsburg, 1994; Tononi et al., 1998). Sensation of the simplest object requires the synchronized activity of neural ensembles (cf. Tononi and Edelman, 1998; Sauvé, 1999; Engel and Singer, 2001). Moreover, long-range synchrony between distant brain regions is observed in multiple forms of behavior (Harris and Gordon, 2015). Correlation code is also thought to underlie selective attention (Niebur et al., 2002; Gomez-Ramirez et al., 2016).

To understand how synchronization of neural activity could fulfill the role of information integration, we need to realize what a daunting task it is to combine inputs from so many dispersed and functionally distinct sources. The binding problem represents the prototypical challenge for integration of information in the brain. If a person is perceiving a blue circle and a red square, for example, how does the brain bind the shape and color features to form a representation of the object? In other words, how does the brain know that the circle is blue and the square is red?

Singer and Gray (1995) proposed that temporal characteristics of the neural activity are responsible for the binding, such that all the neuronal groups coding different features of the same object will synchronize their activity to within the range of milliseconds. This process enables integration of multiple features and the concurrent performance of multiple perceptual functions, such as the integration of features into several distinct objects. This can be achieved by using distinct temporal patterns (e.g., frequency and phase differences) for the performance of each function (i.e., integration of each object's features). The same mechanism may explain hierarchical organization, where one group of neurons belongs to more than one integrative unit at the same time (e.g., through synchronization on harmonic frequencies). The temporal correlation hypothesis also explains how integrated wholes may interact at higher levels of information processing, as synchronized neural assemblies form a functional unit at a higher level, which is distinguishable from other neural assemblies because of its particular temporal pattern. Synchronized neural assemblies are more visible than are unsynchronized assemblies, even if the former are smaller, because a neuron is much more likely to produce an action potential if the incoming signals from its input neurons are synchronized.
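The last point, that a small synchronized assembly can drive a downstream neuron more effectively than a larger unsynchronized one, can be sketched with a toy leaky integrator. This is an illustrative assumption and not a model from the cited work; the function `count_spikes`, the threshold, and the decay factor are hypothetical choices.

```python
def count_spikes(arrivals, threshold=6.0, decay=0.7):
    """Count output spikes of a leaky integrator.

    arrivals[t] is the number of unit-weight input spikes arriving at step t.
    The membrane potential leaks by `decay` each step and resets after a spike.
    """
    potential = 0.0
    spikes = 0
    for n_inputs in arrivals:
        potential = potential * decay + n_inputs
        if potential >= threshold:
            spikes += 1
            potential = 0.0
    return spikes


synchronized_small = [8] + [0] * 19  # 8 inputs, all arriving in the same step
dispersed_large = [1] * 20           # 20 inputs, spread one per step

print(count_spikes(synchronized_small))  # 1
print(count_spikes(dispersed_large))     # 0
```

With the dispersed input, the leak keeps the potential below about 3.33, so the threshold is never reached even though more input spikes arrive in total.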

Such binding must occur across virtually all modalities: auditory binding may be needed to discriminate the sound of a single voice in the crowd, and binding across time is required to perceive the motion of the object. Cross-modal binding is required to associate the sound of a ball striking a bat with the visual percept of it, so both can be perceived as different aspects of the same event. Cognitive binding, for example, must link visual perception of an object with its semantic knowledge, memory reconstruction, and cross-modal identification (see Neuron, 24, 1999, for a review). Synchronized activity is mostly visible (and recorded) as synchronous oscillations in the electrical activity between various brain regions. Interestingly, gamma-band synchronous oscillations (GSO) of neural-electrical activity are believed to bind sensory sensations to represent distinct objects (Buzsaki, 2006; Buzsaki and Wang, 2012), and attention is mediated through activation of selected neural groups through oscillations in the beta band (Wróbel, 2014).

At each level of information processing, synchronized groups form functional units that integrate into increasingly complex structures. These neuronal groups from different brain regions may correspond, for example, to personal memories, affective reactions, and so forth, with respect to the object. Each assembly at a lower level may be responsible for detecting specific features of the stimulus, but it is the synchronized representation of the various assemblies that gives rise to conscious awareness of the object. Such a synchronized neural group is similar to the notion of cell assembly, as proposed by Hebb (1949), in which intragroup connections facilitate activation of the entire group when a single neuron is activated. This, in turn, strengthens the within-group connections, as epitomized by the phrase, "cells that fire together wire together." In effect, the strength of coordination partially depends on the history of learning and is represented by changes in the strength of synaptic connections (i.e., changes that occur on a relatively slow time-scale) that accompany learning.
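The "fire together, wire together" principle can be sketched with the simplest form of a Hebbian update, in which a connection grows in proportion to the joint activity of the two units it links. The function `hebbian_weights`, the learning rate, and the activity patterns below are illustrative assumptions.

```python
def hebbian_weights(patterns, n_units, rate=0.1):
    """Accumulate connection weights with a plain Hebbian rule.

    Each pattern is a tuple of 0/1 activities; the weight between two
    different units grows whenever both are active in the same pattern.
    """
    weights = [[0.0] * n_units for _ in range(n_units)]
    for pattern in patterns:
        for i in range(n_units):
            for j in range(n_units):
                if i != j:
                    weights[i][j] += rate * pattern[i] * pattern[j]
    return weights


# Units 0 and 1 always fire together; unit 2 fires only when they are silent.
patterns = [(1, 1, 0), (0, 0, 1)] * 10
weights = hebbian_weights(patterns, n_units=3)
print(round(weights[0][1], 2), weights[0][2])
```

The connection between the co-active units grows with each joint activation, while the connection to the unit that never fires with them stays at zero, so the learning history shapes which units later activate as a group.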

The temporal correlation hypothesis does not require the formation of stable structural connections, but rather proposes that temporal strengthening of synapses (long-term potentiation, LTP) may also be responsible for the creation of a synchronized functional unit. Functional units are therefore dynamical formations appearing for a short time and disassembling shortly thereafter, allowing for the creation of new functional units (Rychwalska, 2013).

To a certain extent, the interaction among elements may also change on an even more intermittent basis due to changes in focus of attention (e.g., Friston, 1994; Maunsell, 1995). Attention, in other words, brings together diverse groups of neurons that then have the opportunity to synchronize with one another.

The functional unit highest in the hierarchy that can be described in the brain activity is possibly a unified conscious "scene" (Tononi and Edelman, 1998), that is, a representation of a time frame in the stream of consciousness. Such high integration requires long-range correlations and complex temporal patterns of coordination. In other words, functional binding between distinct neural assemblies has to be highly flexible, enabling the functional cluster to move through a sequence of distinct states without losing its synchronization (Koch et al., 2016; Palva, 2016; Ward, 2016; cf. Nakatani et al., 2013). At the same time, loss of consciousness itself (e.g., due to anesthesia) is generally associated with "cognitive unbinding" (Mashour, 2013 and references therein) and is thought to be mediated by loss of long-range synchrony in the brain (Lewis et al., 2012).

# Higher-Order Mental Process and Structure

Once conscious representations are formed (in accordance with the scenario outlined above), they become elements subject to further integration processes that result in higher-order mental structures such as action representations, judgments, and self-concepts. As with the brain, synchronization plays a crucial role in this process. If the process of progressive integration can maintain synchronization among a subset of elements, it proceeds until a cognitive function is performed (e.g., a judgment, a meaningful action, a new insight into self), which in turn is subject to further integration processes, and so on.

Considerable research has established that coherence is indeed a basic principle in cognitive function and structure (cf. Abelson et al., 1968). Within this framework, a variety of mechanisms have been identified whose function is to maintain coherence in the face of incongruent information or social influence (e.g., dissonance reduction, discounting, selective memory, etc.) (cf. Tesser et al., 1996; Swann, 1997).

The nature of the cognitive function dictates the specific metric by which coherence is assessed. In forming a judgment of someone, the function is the establishment of an unequivocal behavior orientation toward the person (cf. Jones and Gerard, 1967). In self-understanding, the function is self-assessment (cf. Tesser and Campbell, 1983). In action representation, the function is effective performance (cf. Vallacher and Wegner, 1987). In each case, the issue of coherence is how well the elements support each other (i.e., coordinate) in achieving their respective function. Thus, a coherent social judgment is one in which all the activated cognitive elements are consistent in their implications for evaluation of the target. In self-understanding, meanwhile, a coherent self-concept is one in which activated self-relevant information paints the same evaluative portrait. And in action, a representation is effective to the extent that the lower-level action features synchronize to produce a fluid performance (cf. Vallacher et al., 1989; Csikszentmihalyi, 1990).

When coherence among elements cannot be achieved in the process of progressive integration, control mechanisms disassemble the emerging structure and attempt to coordinate the elements or a new set of elements. This process may be repeated until the function is achieved (i.e., a coherent judgment is reached or an effective action is performed) or, alternatively, it is possible for the disassembled elements to become reconfigured into an entirely different functional unit. A new function, in other words, may emerge from the disassembly and subsequent reconfiguration of cognitive elements (Vallacher et al., 1998). In action, for example, an inability to maintain the act of "persuading someone" may lead to a reconfiguration of one's speech acts as "expressing oneself."

The functioning of mind may thus be described as the continual assembly and disassembly of cognitive elements in the search for coherence. The stream of consciousness may ultimately be a tumbling ground for whimsies (James, 1890), but this very feature of thought enables the emergence of structure and effective function. The progressive assembly and disassembly of system elements is reflected in the temporal trajectory of emergent thought. In social judgment, for example, univalent (evaluatively congruent) information is organized into progressively higher-level structures reflecting increased coherence, a scenario that is reflected in thought-induced attitude polarization (Tesser, 1978). Mixed valence information, however, tends to result in the repeated assembly and disassembly of differently valenced elements in a process of dynamic integration (cf. Vallacher et al., 1994; Vallacher and Nowak, 1997). The process of progressive integration has also been observed with respect to self-reflection, with individuals who are instructed to focus on the details of their action displaying increasing oscillations in their self-evaluations during self-reflective thought, indicative of the assembly of progressively higher-order evaluatively coherent structures (Vallacher and Nowak, 1999; Vallacher et al., 2002).

From the perspective of synchronization, coherence of cognitive representations is fundamental. Coherent representations will be integrated into higher-order representations, while incoherent ones will either be disintegrated or will have their incoherent parts eliminated in the process of integration. From this standpoint, the signals of coherence are global cross-modal signals. Coherence in one sensory modality favors progressive information integration in other modalities; incoherence in one modality disrupts signal integration taking place in a different modality. Research has shown that watching incoherent figures evokes a sensation that a musical selection does not follow familiar principles, while watching coherent figures facilitates the feeling that such music is familiar (Ziembowicz et al., 2013; Winkielman et al., 2015).

Despite the deep roots of this perspective in classic treatments of mind (e.g., James, 1890; Kohler, 1929; Wertheimer and Riezler, 1944; Asch, 1946), the traditional approaches to modeling cognitive function have typically portrayed the mind as a stable organization of knowledge. Connectionism has emerged in recent years as the tool of choice in investigating how systems resolve conflict and maximize coherence (cf. Read and Miller, 1998). Thus, the function of cognitive networks is assumed to be the satisfaction of multiple constraints (represented by connections), such that the network achieves a configuration in which the states of nodes are least conflictful. Although connectionist models can solve the coherence problem, they have an important limitation with respect to modeling the scenario we have described. In particular, most models are limited to a single step, in that once a coherent solution has been achieved, the system is trapped in this state and does not evolve further.
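The single-step limitation can be made concrete with a minimal sketch of a constraint-satisfaction network. It assumes a small Hopfield-style network with symmetric weights and states of +1 or -1, which is one common connectionist formulation and not a specific model from the cited literature; the names `energy` and `settle` and the weight values are illustrative assumptions.

```python
def energy(weights, state):
    """Total conflict in the network: lower values mean more satisfied constraints."""
    n = len(state)
    return -sum(weights[i][j] * state[i] * state[j]
                for i in range(n) for j in range(i + 1, n))


def settle(weights, state, max_sweeps=100):
    """Update nodes one at a time until no node wants to change."""
    state = list(state)
    n = len(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            net = sum(weights[i][j] * state[j] for j in range(n) if j != i)
            new_value = 1 if net >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


# Nodes 0 and 1 support each other, as do nodes 2 and 3; the two pairs conflict.
weights = [
    [0, 1, -1, -1],
    [1, 0, -1, -1],
    [-1, -1, 0, 1],
    [-1, -1, 1, 0],
]
start = [1, -1, 1, 1]
settled = settle(weights, start)
print(settled, energy(weights, start), energy(weights, settled))
```

Running `settle` again from the settled state changes nothing: once the least conflictful configuration is reached, the network stays there unless something outside the model perturbs it.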

# Action Control

Minds do not exist for their own sake, leaving people "buried in thought" (Tolman, 1951). The mental content and structures that emerge in line with the synchronization scenario outlined above provide the basis for overt behavior in the context of environmental constraints, challenges, concerns, and personal goals. Because the local environment for action is subject to noteworthy and continual changes, people's mental representations must be dynamic as well, undergoing reconfiguration when necessary to promote and maintain effective action and to repair ineffective action. This scenario of repeated assembly and disassembly of mental representations in service of effective action is central to action identification theory (Vallacher and Wegner, 1987). The theory holds that effective performance of an action is associated with progressive integration of the lower-level structural elements of the action. This integration of elements into a higher-level functional unit promotes a corresponding shift in the person's mental representation of what he or she is doing. A novice tennis player, for example, is likely to identify his or her behavior in terms of the basic acts involved—adjusting body position, swinging the racket, and so forth. As these basic acts become sufficiently synchronized to promote effective play on the tennis court, the person's identification of the action will change accordingly to a more integrative (higher-level) representation—"playing tennis," "getting exercise," or perhaps "competing against an opponent."

By the same token, if the action becomes ineffective when identified at a particular level of identification, the person is likely to shift to a lower-level identification that reflects the basic structural elements of the action. The tennis player who fails to play tennis effectively, for example, may regain mental control of the action by refocusing his or her conscious attention on shifting his or her body position and swinging the racket. Through this scenario of repeated assembly and disassembly of mental representations of action, people eventually converge on an optimal level of action identification that reflects the degree to which the action's structural elements are synchronized and constitute an effective functional unit (e.g., Vallacher et al., 1989).

The emphasis on the cognitive representation of action in this scenario may seem at odds with a large body of research on behavioral coordination (e.g., Bernstein, 1967; Kelso and DeGuzman, 1991; van Wijk et al., 2012). Researchers in this area have emphasized that reactions to changing environmental circumstances and skill acquisition do not require conscious mental representations. Instead, there is a direct coupling of perception and action, such that environmental affordances are registered at a perceptual level without the need for higher-level cognitive interpretation. Environmental affordances also shape motor reactions through coupling of behavior and perception, such that refined and skillful enactment of behavior leads to finer distinctions in the perception of the context in which the action unfolds.

This perspective holds that in developing a motor skill, the specific movements become coupled, so that the system as a whole loses degrees of freedom (e.g., Bernstein, 1967; Turvey, 1990). Hundreds of muscles are involved in even such an act as shaking hands, for example, and it is unlikely that the central nervous system could cognitively cope with the control of each muscle. Bernstein (1967) suggested instead that muscles form function-specific synergies: self-organizing assemblies in which muscles locally couple and constrain each other's contractions. These patterns of mutual constraint are flexible, changing in accordance with the requirements of the function. The pattern of coordination among hand muscles, for example, is different when hitting than when grasping. The patterns of coordination are also context-specific. So even when performing the same task, the pattern of coordination may be quite different. Operating a wrench may require different muscle configurations when it occurs in a confined space (e.g., under the hood of a car) than when it occurs in an open space (e.g., on a workbench).

From the perspective of action identification theory, skills acquired at the motor level (e.g., the coordination of inter-limb movement configurations) correspond to the lowest levels of action identification. As the action becomes progressively mastered or habitual, patterns of motor coordination become non-conscious elements in higher-order units that are increasingly accessible to conscious representation. Once conscious representation of an action's higher-level meaning is achieved, however, the lower-level automated elements can, in principle, become subject to conscious representation as well. Learning to walk, for example, occurs without thinking about how to move one's legs; rather, it involves trial and error in service of navigating the physical environment. Although walking remains largely automatic once it is learned, such that its elements (e.g., shifting weight) are not mentally represented, circumstances may arise that bring these elements into consciousness. Thus, a slippery floor might focus a person's conscious attention on how he or she is shifting his or her weight and moving his or her legs. So although the mutual constraints promoting patterns of movement coordination may develop without conscious control, they may subsequently become subject to conscious control and modification.

# Dyads

In dyads, any interaction (e.g., conversing) or task (e.g., problem solving or moving a box) requires synchronization at various levels, including motoric behavior and internal states (emotions, thoughts) (e.g., Nowak et al., 2000). The development of interpersonal synchronization is well documented. In conversations, for example, individuals spontaneously synchronize their facial expressions (e.g., Stel and Vonk, 2010). This effect is so prevalent that people will even mimic the facial expressions of an inanimate object—for example, a robot (Hofree et al., 2014). Synchronization of facial expressions, in turn, tends to promote the corresponding emotional state in each member of the dyad, in line with the facial feedback hypothesis (e.g., Laird, 1974; Strack et al., 1988).

Computer simulations of dyadic interaction have shown that the relationship between synchronization patterns and the inner properties of the two coupled units (individuals) takes diverse, and often quite unexpected, forms (Nowak et al., 2002). Although small changes in the dynamical properties of either unit may promote correspondingly small differences in synchronization, sometimes even very minor changes in these properties will produce qualitative changes that can be interpreted as phase transitions in the form of coordination.

When we take into account the complex dynamics associated with each individual, the higher-order system created by two individuals can become capable of especially rich dynamic properties, generating rich and complex patterns of coordination. The observed forms of coordination go beyond simple in-phase synchronization and anti-phase synchronization to include considerably more complex forms (Nowak et al., 2005). The complexity of two coupled systems may greatly exceed the complexity of each of the component systems (i.e., individuals) or it may become drastically simplified in a scenario resembling the control of chaos (Ott et al., 1990).
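How the coupling between two complex units shapes their joint behavior can be sketched with two coupled logistic maps. This is a generic toy system chosen for illustration and should not be read as a reproduction of the simulations cited above; the function `mean_gap`, the control parameter `r`, and the starting values are assumptions.

```python
def mean_gap(coupling, r=3.9, steps=2000, discard=500):
    """Average |x - y| between two coupled logistic maps after a transient.

    coupling = 0.0: each unit follows only its own (chaotic) dynamics.
    coupling = 0.5: each unit weighs its partner's state as much as its own.
    """
    x, y = 0.2, 0.7
    total = 0.0
    for t in range(steps):
        # Each unit's next state depends on a mix of its own and its partner's state.
        mix_x = (1.0 - coupling) * x + coupling * y
        mix_y = (1.0 - coupling) * y + coupling * x
        x, y = r * mix_x * (1.0 - mix_x), r * mix_y * (1.0 - mix_y)
        if t >= discard:
            total += abs(x - y)
    return total / (steps - discard)


print(f"uncoupled gap: {mean_gap(0.0):.3f}")
print(f"fully mixed gap: {mean_gap(0.5):.3f}")
```

In this sketch the uncoupled units wander independently, whereas with equal mixing both units receive the same input and their trajectories coincide. Intermediate coupling values can be explored with the same function to look for more complex coordination regimes.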

Conversation is an especially important form of dyadic interaction. Fusaroli et al. (2014) argue that function is critical in organizing interpersonal synergy in a dialog. Beyond simple in-phase synchronization, the individuals in a dialog display complementary dynamics, with one person compensating for the other with respect to mistakes and perturbations. The two individuals become integrated into a higher-order unit that, in turn, influences their respective cognitive, linguistic, and motor processes aimed at achieving a common goal. Synchronization, in other words, occurs at multiple levels, both within and between the individuals.

The pattern of synchronization is modulated by the function of the interaction and by the interaction context. Thus, the mode of synchronization that is functional in one context might be dysfunctional in another context. For example, repeating simple utterances of a partner might be functional in a highly structured situation (e.g., repeating commands to ensure accuracy of communication), but would be awkward and redundant, and hence dysfunctional, in an unstructured social conversation.

Interactions in a dialog serve to distribute cognitive processes and actions between the individuals following the demands of the task and each individual's capacities. The dyad, then, becomes a higher-order unit capable of achieving more than what can be achieved by the individuals behaving alone. The function is defined at the level of the emergent dyadic whole rather than at the level of each individual. Fusaroli et al. (2014) argue that this process of organizing interpersonal interactions in a dialog is structured in service of a joint function rather than in the separate cognitive systems of the individuals. The interaction patterns are characterized by stability and clear ordering of the dynamics of both individuals (e.g., the rhythm of a conversation). The functionality of dyadic dialog is clearly visible in dimensional compression (Bernstein, 1967). This means that the collective variability in joint coordinative tasks is less than the variability of each individual's movements, in analogy to the coordination involved in an individual's performance of a task, as described earlier.
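Dimensional compression can be illustrated numerically with a minimal sketch in which one partner compensates for the other's deviations from a shared target. The data are synthetic and the noise levels are arbitrary assumptions; the names `person_a`, `person_b`, and `joint` are hypothetical.

```python
import random
import statistics

rng = random.Random(0)
trials = 500

# Person A deviates from the shared target (0) from trial to trial.
person_a = [rng.gauss(0.0, 1.0) for _ in range(trials)]
# Person B offsets A's deviation, up to a small amount of independent noise.
person_b = [-a + rng.gauss(0.0, 0.1) for a in person_a]
# The joint outcome is what the dyad produces together.
joint = [a + b for a, b in zip(person_a, person_b)]

var_a = statistics.pvariance(person_a)
var_b = statistics.pvariance(person_b)
var_joint = statistics.pvariance(joint)
print(f"A: {var_a:.2f}  B: {var_b:.2f}  joint: {var_joint:.3f}")
```

Each individual's contribution varies considerably from trial to trial, yet the joint outcome is far more stable, because the two sources of variability are coupled rather than independent.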

# Groups

A social group is not only a set of people, the relations between them, and the social structure, but also the continuous process of synchronization of gestures, looks, acts, and communication (cf. Arrow et al., 2000). The achievement of a group task depends upon such synchronization (cf. Forsyth, 1990; Schmidt and Richardson, 2008; Marsh et al., 2009). Decision-making requires the coordination of information and opinions, for example, while the performance of a group action requires the synchronization of the actions of the group members. Synchronization also establishes group structure. Social relations, in fact, may be defined in terms of categories of synchronization (Baron et al., 1994; Newtson, 1994; Nowak et al., 1998; Marsh et al., 2009; Miles et al., 2009). Synchronization with other group members leads to the formation of social ties and promotes a feeling of connectedness (e.g., Chartrand and Bargh, 1999; Lakin and Chartrand, 2003; Dijksterhuis, 2005), while the inability to achieve synchronization evokes feelings of solitude (Nowak and Vallacher, 2007).

In the pursuit of coordination, individually conditioned behaviors merge into regular patterns of joint action (Guastello and Guastello, 1998; Marsh et al., 2009). The emergence of coordinated behaviors may be operationalized as a correlation in time between the internal states and the behaviors of individual members of a group. A group is more predictable (i.e., it has fewer degrees of freedom) than any of the individuals considered separately. This means that the behavior of group members both limits and is limited by the behavior of other members. Although participants of a group discussion take the floor independently, for example, they do so in the context of what has already been said.

Different challenges and tasks may require different patterns of coordination. A task may require negative feedback (reciprocal dampening of reactions), enacted by criticism, for example, or by reducing the number of possible decision variants. Alternatively, the task may require positive feedback intended to generate many ideas, motivate one another to work, or otherwise contribute to the group effort. When a group focuses on making a final decision between two options, for example, a discussion may involve a sequence of statements alternately expressing arguments for each of the options. Also, an increased number of "we" messages may appear in participants' references to the task at hand, since the group functions as a whole to make a collective decision or an action plan.

Momentary coordination of group members engaged in a discussion or a collaborative activity is a sinusoidal process—it rises and falls from moment to moment along with the work of the group. In a given moment of a group's duration, the behavior of its members organizes itself around a task to be performed or an issue to be discussed. The members of the group commence collaboration in order to carry out a task or to convince others to agree with a particular opinion. Temporary increases of coordination may be described as an emergence of functional units serving the purpose of carrying out micro-tasks. A given pattern of coordination between the participants breaks down immediately after a given objective is reached or a thread of the discussion runs out.

It is not necessary for the entire group to be synchronized; rather, different subsets of individuals will synchronize to accomplish a task and then de-synchronize once the task is completed (e.g., Sawyer, 2005). Over time, then, a group can be characterized by the emergence and disassembly of different interaction patterns reflecting the synchronization of various subsets of group members. Ziembowicz (2015), for example, demonstrated that in task-oriented groups, the momentary emergence of dyadic interaction structures tended to characterize the appearance and resolution of interpersonal conflict. Interactions involving more than two individuals, however, tended to be associated with more positive affect, weaker opinions, and greater inquiry. Different emergent social structures, then, carry out different functions in social groups.

The coordination of group members' behaviors occurs through their reactions to one another, and through the exchange of gestures, looks, and messages. But coordination can also occur on a deeper level with respect to emotions, judgments, beliefs, and action plans (cf. Nowak et al., 1998). Group-level synchronization is sometimes manifest as emotional contagion, for example, whether in face-to-face contact (e.g., Hatfield et al., 1993) or in social networks (e.g., Kramer et al., 2014). Research (Nowak et al., 2005; Johnson, 2006) has shown that synchronization on a behavioral level is fundamental for the possibility of deeper levels of synchronization. Visual synchronization is especially important for the emergence of mutual positive emotions and empathy.

Several mechanisms promoting positive synchronization in interpersonal relations and groups have been identified. Similarity in attitudes, for example, is a basic principle of interpersonal attraction (e.g., Byrne et al., 1986), promoting the development of social ties between two or more individuals. Computer simulations of social influence (Nowak et al., 1990) have demonstrated that locally defined influence principles (e.g., social impact, Latane, 1981) lead to the emergence of locally coherent clusters of like-minded individuals (e.g., those with similar opinions or beliefs). Computer simulations of social interdependence have also demonstrated the emergence of locally coherent structures, where coherence is defined as similarity in strategies of interpersonal relations (e.g., Hegselmann, 1998; Nowak and Vallacher, 1998, Chapter 7; Axelrod, 2006). Mechanisms have also been identified that preserve and enhance interpersonal and group coherence, such as the rejection of deviates and the emergence of group norms (e.g., Festinger, 1950; Clore and Gormly, 1974; Latane, 1981).
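The emergence of locally coherent clusters from local influence can be sketched with a deliberately simplified rule. The example below assumes a ring of individuals who each adopt the majority opinion among themselves and their two neighbors; it is far simpler than the social impact simulations cited above, and the names `step` and `boundaries` and the starting configuration are illustrative assumptions.

```python
def step(opinions):
    """Everyone simultaneously adopts the local majority (self plus two neighbors)."""
    n = len(opinions)
    return [1 if opinions[i - 1] + opinions[i] + opinions[(i + 1) % n] >= 2 else 0
            for i in range(n)]


def boundaries(opinions):
    """Number of adjacent pairs on the ring that disagree."""
    return sum(opinions[i] != opinions[i - 1] for i in range(len(opinions)))


initial = [0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0]
state = initial
for _ in range(10):
    state = step(state)

print(initial, boundaries(initial))
print(state, boundaries(state))
```

Starting from an interleaved arrangement with 10 disagreeing neighbor pairs, the ring settles into two contiguous clusters with only 2 disagreeing pairs. Both opinions survive, but each is now held by a coherent local group.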

The social ties that result from deeper levels of synchronization provide for increased influence among group members, analogous to synaptic connections in the brain and to associations in the mental system. A variety of factors apart from social ties, however, affect coordination in a group. For example, physical proximity momentarily magnifies the effective influence among individuals. The momentary salience of particular individuals (e.g., by virtue of physical appearance or behavior) can also affect the temporary configuration of links between individuals, magnifying some and weakening others. Momentary coherence (e.g., a shared mood or activity) can also reconfigure the links between subsets of individuals. Such coherence might be induced, for example, by some external signal such as music or highly salient events. In work groups, meanwhile, different structures of communication among group members tend to be associated with the emergence of correspondingly distinct modes of task solution and problem solving (Leavitt, 1951; Shaw, 1951; Guetzkow and Simon, 1955).

Even in the context of existing social relations, not all interpersonal or communication links are activated at the same time. A person clearly has stable links to his or her family, for example, but these are not active when he or she is in some other social setting (e.g., work). In combination with the factors that operate independently of social ties (proximity, etc.), this suggests that social groups, much like mental and neural structures, have an assembly and disassembly aspect to them, reconfiguring themselves continually in response to changing environmental demands and contingencies.

Coordination among group members is typically associated with effective collective action. Beyond promoting strong and enduring bonds (i.e., cohesiveness) in a group (Forsyth, 1990), coordination has been identified as a critical factor in optimizing performance in work groups (Steiner, 1972) and sports teams (Vilar et al., 2013). At the same time, though, research has traced certain forms of dysfunctional group dynamics to global synchronization among interacting individuals. In "groupthink," for example, a heightened concern with group cohesion can stifle dissent and thereby short-circuit natural self-correction tendencies (e.g., critical feedback, desire for individuation) that might otherwise prevent ill-conceived group decisions and actions (Janis, 1982).

Although existing relationships among the individuals in a group can promote the emergence of a collective functional unit, group-level synchronization can emerge in the absence of social ties. The phenomenon of "deindividuation" (Zimbardo, 1969; Diener, 1980), for example, refers to the loss of individual identity and self-awareness in large, unstructured groups engaged in a common action. This phenomenal state tends to produce heightened coordination of moods, thoughts, and actions among all the individuals in the group, which can promote irrational and sometimes violent behavior. The most extreme manifestation of global group synchronization is panic, where each individual tries to perform the same action (e.g., leaving through a single door from a burning building) without adopting a more functional mode of coordination (e.g., turn-taking). In their model of collective action, Turner and Killian (1957) noted that in unstructured group situations that today are seen as breeding grounds for deindividuation, there is often the spontaneous emergence of a group norm that synchronizes and maintains the actions of the group as a whole.

In sum, the coordination of individuals' actions in a group or collective context—whether productive as in problem solving or seemingly irrational as in groupthink or deindividuation—represents the emergence of functional units. In this scenario, individuals represent lower-level elements that become synchronized, either through their mutual influence or through their common response to an external signal (e.g., a leader, a perceived threat or an opportunity). Groups are certainly distinct from neural systems, conscious representations, individual actions, and dyadic interactions, but they conform to the same formal scenario we have described for these other basic levels of psychological functioning.

# SYNCHRONIZATION BETWEEN LEVELS OF PSYCHOLOGICAL REALITY

This model can be used to understand the emergence of higher-order functional units at progressively higher levels of integration, linking neural, psychological, and social processes in a larger dynamical system. The idea that similar dynamical principles operate at different levels—from neural to behavioral to social—and that these levels influence one another in both a bottom-up and top-down manner has been articulated by complex systems theorists (e.g., Kelso et al., 2013).

In the bottom-up mode, synchronization of elements creates a functional unit that can then function as an element in further synchronization. By this means, synchronization at the level of the brain underlies the creation of thoughts and feelings. The synchronization of thoughts and feelings in an individual can then promote the emergence of his or her judgments and action plans. Once judgments and action plans are created within an individual, these higher-order mental states can synchronize with the judgments and action plans of other individuals with whom the individual is interacting. In a different route, patterns of synchronization among neurons in the brain can induce corresponding patterns of synchronization between muscle movements (Kelso et al., 2013), so synchronization of neuronal groups in the brain can induce behavioral synchronization in a direct way, bypassing cognitive representation. In both cases personal synchronization serves as the platform for dyadic synchronization. In a continuation of this process, synchronized dyads can synchronize with each other to promote effective group performance. On a dance floor, for example, dyads consisting of well-synchronized dance partners can navigate the dance floor, coordinating with other dyads and avoiding collisions with them.

In this process, there are two types of transition between the lower-level and higher-level units. First, the synchronized ensembles of elements become unified into a single functional unit. Individuals who synchronize on a task, for example, become a team, which can then become an element in a higher level of organization, the work group. In this form of synchronization, a higher-order unit can be decomposed into its component lower-order units. In the second form of transition, a pattern of synchronization among elements at one level, which can be described by an order parameter (Haken, 1987), may become an element on the higher level. Roughly speaking, an order parameter is a global variable that describes patterns of dependency among the elements of a system. Organization of system elements, as described by an order parameter, becomes an element in a higher-level system. The same set of elements, in other words, may be synchronized in different ways to produce correspondingly different values of the resultant order parameter. For example, the specific pattern of synchronization among neurons gives rise to specific thoughts and feelings. So in contrast to the first type of transition, it is the type of synchronization rather than the particular subset of elements that gives rise to the higher-level unit.
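To make the notion of an order parameter concrete, the sketch below uses one widely used instance that the text itself does not name: the Kuramoto order parameter, i.e., the magnitude of the mean unit phasor of a set of oscillator phases. The phase values and the function name `order_parameter` are illustrative assumptions. The example shows that the same four elements can yield different values of the global variable depending on how they are synchronized.

```python
import cmath
import math


def order_parameter(phases):
    """Return r in [0, 1]: the magnitude of the mean unit phasor of the phases (radians)."""
    mean_phasor = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(mean_phasor)


# Same four elements, two different patterns of synchronization (assumed values).
in_phase = [0.3, 0.3, 0.3, 0.3]              # all elements aligned
two_clusters = [0.0, 0.0, math.pi, math.pi]  # two pairs locked in anti-phase

r_in_phase = order_parameter(in_phase)          # close to 1: global coherence
r_two_clusters = order_parameter(two_clusters)  # close to 0: the clusters cancel out
```

A single scalar of this kind discards which element belongs to which cluster, which is why it can stand in for the "type of synchronization" rather than for a particular subset of elements.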

It is also the case that synchronization at a higher level can promote patterns of synchronization at a lower level. Social interaction, for example, induces thoughts and feelings in individuals, which can in turn influence their expectancies and patterns of attention, which can then induce synchronization at the level of neuronal activity. Attention, for example, induces synchronization in the beta frequency, which sensitizes the appropriate set of neurons to synchronize more readily in the gamma frequency in the process of perception (Wróbel, 2014).

The bottom-up and top-down processes interact with each other in reciprocal feedback fashion, which creates a synchronizing dynamical system that can promote continual modification and adjustment within and between levels. Synchronizing elements on the lower level self-organize into wholes with emergent properties on the higher levels. These emergent wholes, in turn, influence patterns of synchronization of lower-level elements. Two individuals interacting in a dialog, for example, form a dyad with properties that cannot be reduced to the minds of the interacting individuals. The dyad, as an emergent whole, influences the individuals' movements, language, cognitions, and emotions, which in turn influence the properties of the dyad (Fusaroli et al., 2014).

# MEASURING SYNCHRONY

The model we have presented brings new understanding of how functions are performed by systems at different levels from mind to social groups. But the model has another benefit as well: the idea that functional units are assembled and disassembled to follow the demands of the task points to a novel way of defining and measuring functions. We can analyze what particular configurations of coordinating elements—be they neurons, concepts, or individuals—are required to perform specific functions.

# Functional Connectivity

To analyze the composition of a functional unit, we treat each synchronized pair of elements as a functional link. For a given period of time adjusted for the system under scrutiny (i.e., milliseconds for neural activity, seconds for the coordination of memories, or minutes for group discussion), we can then combine such existing functional links into a network. In this depiction, the properties of functional units can be analyzed with the help of network analysis. For example, we can measure the density of the functional unit: if the density is high (i.e., there are many coordinating pairs), we can assume that the task performed either is complex or requires redundancy. If the density is low, we can hypothesize that either the task is simple or the performing system has well-defined roles for its elements. Other network measures, such as diameter or path lengths, can be used in a similar way to understand both the dynamic requirements of the task and the system's efficiency in performing it.
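The procedure described above can be sketched in a few lines. The example below is only an illustration under stated assumptions: synchronization between two elements is operationalized as the absolute Pearson correlation of their activity within one time window, the threshold of 0.8 is arbitrary, and the four synthetic signals and all function names are hypothetical rather than taken from any study cited here.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def functional_links(series, threshold):
    """Treat each pair whose |correlation| reaches the threshold as a functional link."""
    return {
        (i, j)
        for i, j in combinations(sorted(series), 2)
        if abs(pearson(series[i], series[j])) >= threshold
    }


def density(n_nodes, links):
    """Fraction of all possible pairs that are functionally linked."""
    return len(links) / (n_nodes * (n_nodes - 1) / 2)


# One analysis window of synthetic activity for four elements (assumed data).
t = range(40)
series = {
    "A": [math.sin(2 * math.pi * k / 10) for k in t],
    "B": [0.5 * math.sin(2 * math.pi * k / 10) for k in t],  # same rhythm, smaller amplitude
    "C": [math.sin(2 * math.pi * k / 10 + 0.2) for k in t],  # same rhythm, small phase lag
    "D": [math.sin(2 * math.pi * k / 4) for k in t],         # different rhythm
}

links = functional_links(series, threshold=0.8)
unit_density = density(len(series), links)
```

Here A, B, and C form a fully linked functional unit while D stays outside it, so half of the possible pairs are linked. Repeating the computation over successive windows would give a time series of density values, which is one way to track the assembly and disassembly of a unit.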

To date, network analysis has been the primary method for analyzing the structure of various systems. Possible dynamics and functions are usually inferred from the properties of structure (Watts and Dodds, 2007; Baronchelli et al., 2013; Weng et al., 2013). However, in very many systems—from brain through the cognitive system up to whole societies—the same structure of connections permits the system to perform various, sometimes diametrically different functions. Therefore, structural network analysis is not enough to understand how function is performed. Network science only partially acknowledges the problem through analyzing the changing structure of networks (Capocci et al., 2006; Holme and Saramäki, 2012). What we propose is to complement standard network analysis with the analysis of functional links dynamically formed by elements coordinating through stable, structural connections.

The dynamical approach to network analysis has to some degree tackled this issue by proposing the paradigm of time-dependent or temporal networks (Holme and Saramäki, 2012). This approach has evolved from the observation that most of the network systems analyzed do not "exist" for most of the time. For example, the huge networks of phone contacts only form a connected component (i.e., a network) if they are aggregated over many units of time (hours, days, months). If one were to analyze a single minute, at best the network would consist of many pairs of connected nodes. What has so far been analyzed as a network is usually just a set of possible (latent) connections that effectively exist only for limited periods of time.
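The effect of the aggregation window can be illustrated with a toy contact log. The sketch below is a minimal example under assumed data: events are hypothetical `(minute, caller, callee)` tuples, and `largest_component` is an illustrative helper that uses a union-find structure to measure the largest connected component among the contacts falling inside a window.

```python
def largest_component(events, t_start, t_end):
    """Size of the largest connected component among contacts with t_start <= t < t_end."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for t, a, b in events:
        if t_start <= t < t_end:
            parent[find(a)] = find(b)  # merge the two callers' components

    sizes = {}
    for node in list(parent):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values(), default=0)


# Hypothetical call log: (minute, caller, callee).
calls = [(0, "A", "B"), (0, "C", "D"), (1, "B", "C"), (2, "D", "E"), (3, "E", "F")]

single_minute = largest_component(calls, 0, 1)  # only isolated pairs exist
aggregated = largest_component(calls, 0, 4)     # the pairs merge into one network
```

Within a single minute the log contains only disconnected pairs, whereas aggregating four minutes connects all six nodes, which mirrors the distinction between momentary and latent connections drawn in the text.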

Temporal network analysis is suitable in all those cases where the dynamics of the process unfolding on the network has a time resolution similar to that of the formation of the network structure. All networks dependent on face-to-face contacts (epidemics, opinion dynamics, etc.) or tele-contacts (phone, social media, texting, etc.) will fall in this category. In those cases, while it is still valuable to understand the network of latent connections, such analysis should be complemented by analyzing the dynamics of connection change, as it severely affects various network measures (e.g., path lengths, reciprocity of connections, connected components).

Temporal network analysis has discovered that the network evolution over time in telephone contacts displays an interesting regularity. Certain connection sequences reappear more frequently than they should by chance (Braha and Bar-Yam, 2009; Kovanen et al., 2011). Such temporal network patterns (dynamical motifs) in phone calls are thought to reflect the dynamics of the most common social processes over the underlying, stable structure of social acquaintance links (e.g., scheduling and feedback confirmation of meetings in a triad: A->B->C->A).
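As a rough illustration of what counting such a motif involves, the sketch below counts time-ordered triadic cycles of the form A->B->C->A in a list of directed contact events. It is a simplified brute-force count, not the procedure of the cited studies: the function name, the `max_gap` constraint on the time between consecutive events, and the call log are all assumptions made for the example, and no comparison against a chance baseline is attempted.

```python
def count_triad_cycles(events, max_gap):
    """Count sequences a->b, b->c, c->a over three distinct nodes, strictly ordered
    in time, with consecutive events at most max_gap time units apart."""
    events = sorted(events)
    count = 0
    for i, (t1, a, b) in enumerate(events):
        for j in range(i + 1, len(events)):
            t2, src2, c = events[j]
            if t2 - t1 > max_gap:
                break  # events are sorted, so later ones are even further away
            if t2 == t1 or src2 != b or c in (a, b):
                continue
            for k in range(j + 1, len(events)):
                t3, src3, dst3 = events[k]
                if t3 - t2 > max_gap:
                    break
                if t3 > t2 and src3 == c and dst3 == a:
                    count += 1
    return count


# Hypothetical directed call log: (time, caller, callee).
calls = [(1, "A", "B"), (2, "B", "C"), (3, "C", "A"), (4, "B", "A"), (40, "A", "B")]

n_cycles = count_triad_cycles(calls, max_gap=5)
```

To decide whether a motif is over-represented, a count of this kind would still have to be compared with counts obtained from suitably randomized event sequences.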

Dynamical motifs are a first step toward analyzing not only the structure, but also the dynamics of a system as a network, which could help us understand how a certain social process can be inferred from the changing structure of a network. We push this idea much further: we propose that certain spatiotemporal patterns of coordinating elements can be extracted from interacting elements and analyzed with network measures to show how a (relatively stable) structure of a system gives rise to many different functions, on different temporal scales.

# Functional Connectivity in Neural Systems

So far, this type of analysis has been used to study neural systems. There, it is especially easy to differentiate between structure and dynamics. Structural links are the (relatively) stable anatomical connections and functional links are the temporal dependencies between the activities of different neural regions (Friston, 1994; Baronchelli et al., 2013). Such links can be extracted at different temporal and spatial scales: from matching spike trains of single neurons or small neural assemblies, from correlated local field potentials of cortical columns as well as from phase locked EEG/MEG recordings from large cortical areas. Although the structural connections limit the possible functional connections, the relation is not unidirectional. Neurons that fire together, wire together (Hebb, 1949). That is, structural links are formed to strengthen the coordination patterns resulting from concurrent activation (i.e., common stimuli), sometimes distinguishing between millisecond differences in synchrony (Bi and Poo, 2001; Caporale and Dan, 2008).

Network analysis of functional connectivity has been successfully applied to brain function (Salvador et al., 2005; Stam and Reijneveld, 2007; Rychwalska, 2013). It has proven to be a useful methodology for the understanding and diagnosis of particular pathologies of brain function, such as Alzheimer's disease (Stam et al., 2007), epilepsy (Ponten et al., 2007), and aging (Meunier et al., 2009). What is particularly promising is that analysis of functional connectivity reflects and differentiates between specific tasks—for example, singing from counting (Shirer et al., 2012) or passive observation from classification tasks (Krienen et al., 2014).

In this area, it is also clear how this method of analysis can be used to measure the dynamics of functional unit assembly and disassembly. By depicting a functional unit as a dynamic network of functional links, we can analyze the change of network measures in time and understand how the demands of the tasks evolve. Functional connectivity networks in the brain change over time (Valencia et al., 2008), which suggests that they indeed evolve with the needs of the task.

# Future Directions

Although functional connectivity analysis has not yet been applied to mental processes or group functioning, it could prove to be a promising direction. In connectionist models of activation spread over a memory or semantic network, for example, synchronization of the activation of concepts can be easily portrayed as a functional network. The synchronous activation of various elements of the self-concept can also form a graph, with congruencies depicted as positive links and incongruencies as negative links.

In the analysis of groups, social network analysis is a rapidly developing research approach. However, it is rarely recognized that configurations are meaningful for the function of the social system (Johnson, 2013) or that links can be formed not only through structural connections (e.g., Facebook friends, contacts list on the phone), but also through functional ones (coordinated activity). Applying functional connectivity analysis to group function could illuminate how collective tasks present constraints on the required coordination patterns and how these patterns evolve to enable the group to flexibly switch between different functions.

The challenge for future research in this paradigm is to define meaningful markers of coordination in the respective areas (e.g., cognition, social interaction) that could be used to extract functional links with meaningful temporal resolution (i.e., allowing dynamical assembly and disassembly of functional units). Concurrent activation of certain concepts in the semantic network could be measured by combining physiological (e.g., eye-tracking) methodologies with computer-based methods (e.g., mouse tracking). In the social domain, the vast amounts of data traces collected by social networking through new media (the so-called Big Data) that often contain timestamps of activity could provide a valuable source of possible markers of coordinated activity.

# CONCLUSION

The model we have described offers a way to reframe distinct phenomena in terms of basic principles of synchronization dynamics. Whether the focus is the brain, cognition, social judgment, action, or group behavior, effective functioning is achieved through the synchronization of the lower-level elements at issue (neurons, thoughts, movements, opinions) to form functional units relevant to the task at hand. The coherence provided by the formation of functional units is often temporary, in place only as long as is necessary to perform the task. With changing task demands, then, there is repeated assembly and disassembly of different functional units, each providing the coordination necessary to meet a particular task demand. In this view, neural, mental, action, and social processes do not represent the output of static structures, but rather represent inherently dynamic systems that operate in accordance with a press for coherent functioning.

Although the importance of coherence in psychological systems is widely acknowledged across disciplines, the mechanisms by which coherence is achieved and maintained are not well understood, nor has there been an attempt to identify such mechanisms that are scalable across different levels of psychological functioning. The model we have presented is an attempt to provide this integration. Although there is tantalizing evidence in favor of this integration, the model is in its nascent stage and thus should be viewed as a heuristic for research agendas. With the appropriate degree of coordination of such research efforts, a comprehensive theory of psychological processes may emerge that can establish a functional scientific paradigm for the understanding of human experience.

# AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

# FUNDING

We acknowledge the support of grant NCN 2011/03/B/HS6/05084 from the Polish National Science Centre.

# REFERENCES

Hebb, D. O. (1949). The Organization of Behavior. New York, NY: Wiley.


Kohler, W. (1929). Gestalt Psychology. New York, NY: Liveright.



Turner, R. H., and Killian, L. M. (1957). Collective Behavior. Oxford: Prentice-Hall.

Turvey, M. T. (1990). Coordination. Am. Psychol. 45, 938–953. doi: 10.1037/0003-066X.45.8.938



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The handling Editor declared a shared affiliation, though no other collaboration, with the author AR and states that the process nevertheless met the standards of a fair and objective review.

Copyright © 2017 Nowak, Vallacher, Zochowski and Rychwalska. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# A Novel Computer-Based Set-Up to Study Movement Coordination in Human Ensembles

Francesco Alderisio<sup>1†</sup>, Maria Lombardi<sup>2†</sup>, Gianfranco Fiore<sup>1</sup> and Mario di Bernardo<sup>1,2\*</sup>

*<sup>1</sup> Department of Engineering Mathematics, University of Bristol, Bristol, United Kingdom, <sup>2</sup> Department of Electrical Engineering and Information Technology, University of Naples Federico II, Naples, Italy*

#### Edited by:

*Joanna Raczaszek-Leonardi, University of Warsaw, Poland*

### Reviewed by:

*Andrew D. Wilson, Leeds Beckett University, United Kingdom Paul Baxter, University of Lincoln, United Kingdom Daniel Richardson, University College London, United Kingdom*

> \*Correspondence: *Mario di Bernardo*

*m.dibernardo@bristol.ac.uk*

*† These authors have contributed equally to this work and as first authors.*

#### Specialty section:

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology*

Received: *07 October 2016* Accepted: *26 May 2017* Published: *09 June 2017*

#### Citation:

*Alderisio F, Lombardi M, Fiore G and di Bernardo M (2017) A Novel Computer-Based Set-Up to Study Movement Coordination in Human Ensembles. Front. Psychol. 8:967. doi: 10.3389/fpsyg.2017.00967*

Existing experimental works on movement coordination in human ensembles mostly investigate situations where each subject is connected to all the others through direct visual and auditory coupling, so that unavoidable social interaction affects their coordination level. Here, we present a novel computer-based set-up to study movement coordination in human groups so as to minimize the influence of social interaction among participants and implement different visual pairings between them. In so doing, players can only take into consideration the motion of a designated subset of the others. This allows the evaluation of the exclusive effects on coordination of the structure of interconnections among the players in the group and their own dynamics. In addition, our set-up enables the deployment of virtual computer players to investigate dyadic interaction between a human and a virtual agent, as well as group synchronization in mixed teams of human and virtual agents. We show how this novel set-up can be employed to study coordination both in dyads and in groups over different structures of interconnections, in the presence as well as in the absence of virtual agents acting as followers or leaders. Finally, in order to illustrate the capabilities of the architecture, we describe some preliminary results. The platform is available to any researcher who wishes to unfold the mechanisms underlying group synchronization in human ensembles and shed light on its socio-psychological aspects.

Keywords: multiplayer games, social interaction, human ensembles, coordination, group synchronization, human-robot interaction, computer software, client-server

# 1. INTRODUCTION

Interpersonal coordination between the motion of two individuals performing a joint task has been extensively studied over the past few decades (Schmidt and Turvey, 1994; Richardson et al., 2007; Oullier et al., 2008; Schmidt and Richardson, 2008; Marsh et al., 2009; Varlet et al., 2011; Walton et al., 2015; Słowiński et al., 2016); a recent example is the mirror game, presented as a paradigmatic case study in which human participants (HP) imitate each other's movements in a pair (Noy et al., 2011). In general, multiplayer scenarios have been investigated less than those involving only two participants, because of practical problems in running the experiments and the lack of models accounting for movement coordination in human groups, in contrast to the numerous studies dealing with animal groups (Couzin et al., 2005; Nagy et al., 2010, 2013; Zienkiewicz et al., 2015).

In this methodological paper we present a novel computer-based set-up that we named "Chronos" (a tool to study synCHRONizatiOn and coordination in human ensembleS), which allows participants to perform a joint task from a distance, both in dyads and in groups. The platform extends the mirror game to multiplayer scenarios, allowing each subject in the group to run a serious computer game where s/he can move the position of an object on her/his computer screen and see traces of the objects moved by the other players. The software makes it possible to show on each screen only the traces of a designated subset of the players in the group (as decided by an Administrator), so that different interaction patterns among its members can be implemented. To prevent any form of social interaction when in the same environment, participants are visually separated by barriers and hear white noise (through headphones connected to their computers) isolating them from any external sound. Alternatively, players can join the group remotely. Therefore, subjects have no information on the identity of those they are interacting with and receive no direct visual or behavioral cues from other members in the group.

Existing results on multiplayer human coordination include studies on rocking chairs (Frank and Richardson, 2010; Richardson et al., 2012; Alderisio et al., 2016b), group synchronization of arm movements and respiratory rhythms (Codrons et al., 2014), music (Glowinski et al., 2013; Badino et al., 2014; Volpe et al., 2016), and sport activities (Wing and Woodburn, 1995; Yokoyama and Yamamoto, 2011). However, in these papers the features and the level of coordination are not explicitly correlated to the way the players interact (i.e., the structure of their connections), as all the subjects involved share direct visual and auditory coupling with all the others, and no other patterns are considered. Moreover, inevitable social interaction affects the level of coordination in the group (Healey et al., 2005; Kauffeld and Meyers, 2009; Passos et al., 2011; D'Ausilio et al., 2012; Duarte et al., 2012, 2013; Glowinski et al., 2013; Cardillo et al., 2014). Indeed, body movements, friendship relationships, shared feelings, particular affinities, and levels of hierarchy have a significant impact on how each individual in the ensemble chooses her/his preferred partner(s) to interact with the most (Baumeister and Leary, 1995; Mäs et al., 2010; Stark et al., 2013). By making it possible to implement different visual pairings and minimize social interactions, our computer-based architecture instead makes it possible to assess the impact on the group coordination level of solely varying the structure of interconnections among its members, a factor that has been suggested to be crucial in determining the level of coordination arising in a human group (Passos et al., 2011; Duarte et al., 2012, 2013; Alderisio et al., 2016c).

In addition, Chronos allows the trace of some objects on the players' screen to be moved by virtual players (VP) driven by a computational cognitive architecture which is described in the rest of the paper. Simulated agents make it possible to further analyze the mechanisms underlying human coordination, explore features that are not easily accessible in ordinary human interactions, and point out interesting aspects of the task that are not immediately obvious from experimental observations (Di Paolo et al., 2008; Froese and Di Paolo, 2010). In so doing, we take inspiration from the human dynamic clamp (HDC) originally introduced in Dumas et al. (2014) as a paradigm to control the interactions between a human and a surrogate (virtual agent). By changing the mathematical description of the latter and appropriately tuning its parameters, it is possible to explore different behaviors and tasks, as well as test hypotheses and shed light on how humans interact. In our work we extend the HDC to a multiplayer scenario, thus allowing virtual agents to interact within a group of people.

To illustrate and validate the use of the set-up, we present preliminary experimental results obtained in a group of 5 individuals performing a joint oscillatory task. Specifically, we use the platform to vary the interaction patterns between players and observe the effects of such variations on the coordination level. Rather than being exhaustive, the experiments are reported here to illustrate the capabilities of the new methodology implemented via the platform. Therefore, we leave to future publications a more thorough experimental confirmation of the preliminary findings reported here for illustrative purposes. We also demonstrate that the platform allows the deployment of virtual computer players driven by feedback control algorithms. Specifically, we run preliminary experiments where a VP is enabled to interact with either one or a group of 4 human players, showing that its presence has an effect on their coordination level. Again, these experiments are not meant to be exhaustive but are only included to demonstrate the ability of the novel set-up to run trials involving multiple virtual players (a feature that has not been presented anywhere else in other methods in the literature on movement coordination in groups).

The new platform we present is available for download from https://dibernardogroup.github.io/Chronos to every interested reader.

# 2. CHRONOS ARCHITECTURE

The proposed computer-based platform is a hardware/software set-up consisting of input/output devices, a centralized unit (server and client-administrator) processing data, broadcasting movement information to the various client-players and implementing virtual agents, and a Wi-Fi apparatus connecting all the components together. The central server unit receives position data from the client-players and broadcasts to each of them the position data of a subset of the others, according to the desired structure of interconnections being implemented. For example, in a ring network each client-player will only receive position data from two neighboring client-players. The movements of each human agent are detected by a low-cost position sensor, and individuals interact with each other through their own personal computer, on whose screens the central unit broadcasts the appropriate position trajectories according to the assigned topology (visual interaction patterns). The central unit is also responsible for data management and storage.
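The selective broadcasting described above can be pictured with a small sketch. It is not the Chronos implementation: the function names `ring_neighbors` and `dispatch`, the dictionary-based data layout, and the position values are all hypothetical and serve only to show how a topology can determine which positions each client receives.

```python
def ring_neighbors(n_players):
    """Map each player index to the two players adjacent to it on a ring."""
    return {i: [(i - 1) % n_players, (i + 1) % n_players] for i in range(n_players)}


def dispatch(positions, topology):
    """For each client, select only the positions of the players it is allowed to see."""
    return {
        player: {other: positions[other] for other in visible}
        for player, visible in topology.items()
    }


# Latest positions received by the server from five client-players (assumed values).
positions = {0: 0.10, 1: 0.35, 2: 0.50, 3: 0.72, 4: 0.90}

ring = ring_neighbors(5)
outgoing = dispatch(positions, ring)  # outgoing[i] is what client i gets to display
```

Swapping `ring` for another mapping (for example, one in which every player sees all the others) would change the visual interaction pattern without touching the dispatch logic.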

The proposed set-up is shown in detail in **Figure 1** for the case of N human participants and M virtual agents, and all its components are described below (for more information on how to use the set-up and to download the software, see https://dibernardogroup.github.io/Chronos and Section 1 of Supplementary Material).

**FIGURE 1** | [...] (WLAN) by means of a dedicated Wi-Fi router.

# 2.1. Hardware Equipment

The hardware equipment consists of:


according to the position detected by the position sensor: one of them is kept fixed, while the other corresponds to the input received by the sensor. One additional computer is needed to run the server and a GUI that allows the administrator to set the experimental parameters and the desired visual interaction patterns (see Section 2.2). No further machines are needed to implement the M virtual players, as the cognitive architecture driving their motion is run by the central server that dispatches their position data to the various clients as required.


Furthermore, barriers are employed to separate the players and prevent them from being directly visually coupled (**Figure 2**).

# 2.2. Software Architecture

The software architecture, which is based on a client-server model (Berson, 1996), consists of:


# 2.3. Virtual Player Implementation

Let x(t) ∈ ℝ be the state variable representing the position of the virtual player at time t. Its behavior is described by the following dynamical system:

$$
\ddot{x}(t) = f(x(t), \dot{x}(t)) + u(t) \tag{1}
$$

where f represents the vector field modeling the inner dynamics of the VP when disconnected from any other agent, ẋ and ẍ represent the velocity and acceleration of the VP, and u is the control signal modeling how the VP interacts with other players, i.e., its coupling function.

Chronos allows the user to select different combinations of inner dynamics models and control signals to describe the motion of the virtual player. In what follows, different alternatives are proposed for the VP to exhibit human-like motion features when interacting with one or more partners (Alderisio et al., 2016a,b). Further details on why each of these mathematical models successfully enables a VP to behave in a human-like manner can be found in Zhai et al. (2014a,b, 2015, 2016, 2017).

### 2.3.1. Inner Dynamics Models

The alternative models describing the inner dynamics f of the virtual player can be listed as follows.

• Harmonic oscillator, a linear system given by

$$f(x, \dot{x}) = -(a\dot{x} + bx) \tag{2}$$

where a and b represent the viscous damping coefficient and the elastic coefficient, respectively.

• HKB equation, a nonlinear oscillator given by

$$f(x, \dot{x}) = - (\alpha x^2 + \beta \dot{x}^2 - \gamma)\dot{x} - \omega^2 x \tag{3}$$

where α, β, and γ characterize the damping coefficient, while ω is related to the oscillation frequency.

Chronos also makes it possible to describe the behavior of the VP as a double integrator, that is, a system without any inner dynamics (f = 0) whose motion is entirely determined by the coupling with the other agents via the control input u(t). In this case the system describing the behavior of the VP becomes:

$$
\ddot{x}(t) = u(t) \tag{4}
$$
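As an illustration of Equations (1)–(4), the following Python sketch integrates the VP dynamics with a classical fourth-order Runge–Kutta scheme. It is only a sketch under stated assumptions: the function names (`harmonic`, `hkb`, `double_integrator`, `simulate`), the parameter values and the choice of integrator are hypothetical and are not claimed to reflect how Chronos integrates the model.

```python
import math
from typing import Callable, List, Tuple

Field = Callable[[float, float], float]


def harmonic(a: float, b: float) -> Field:
    """Inner dynamics of Equation (2): f = -(a*v + b*x)."""
    return lambda x, v: -(a * v + b * x)


def hkb(alpha: float, beta: float, gamma: float, omega: float) -> Field:
    """Inner dynamics of Equation (3): f = -(alpha*x^2 + beta*v^2 - gamma)*v - omega^2*x."""
    return lambda x, v: -(alpha * x**2 + beta * v**2 - gamma) * v - omega**2 * x


def double_integrator() -> Field:
    """No inner dynamics (f = 0), as in Equation (4)."""
    return lambda x, v: 0.0


def simulate(f: Field, u: Callable[[float], float], x0: float, v0: float,
             dt: float, n_steps: int) -> List[Tuple[float, float]]:
    """Integrate x'' = f(x, x') + u(t) with RK4; returns the list of (x, v)."""
    def deriv(t: float, x: float, v: float) -> Tuple[float, float]:
        return v, f(x, v) + u(t)

    x, v, t = x0, v0, 0.0
    states = [(x, v)]
    for _ in range(n_steps):
        k1x, k1v = deriv(t, x, v)
        k2x, k2v = deriv(t + dt / 2, x + dt / 2 * k1x, v + dt / 2 * k1v)
        k3x, k3v = deriv(t + dt / 2, x + dt / 2 * k2x, v + dt / 2 * k2v)
        k4x, k4v = deriv(t + dt, x + dt * k3x, v + dt * k3v)
        x += dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        t += dt
        states.append((x, v))
    return states


if __name__ == "__main__":
    # Uncoupled HKB oscillator (u = 0) started close to rest.
    trajectory = simulate(hkb(1.0, 1.0, 1.0, 1.0), lambda t: 0.0, 0.1, 0.0, 0.01, 5000)
    print(trajectory[-1])
```

With α = β = γ = ω = 1 and u = 0, the quantity x² + ẋ² is driven toward 1, so the uncoupled oscillator settles onto a limit cycle of unit amplitude regardless of the (nonzero) initial condition.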

### 2.3.2. Control Signal

The different options for the control signal u describing the coupling of the virtual player with other agents can be listed as follows.

• PD control, a linear control law given by

$$
u = K_p(y - x) + K_{\sigma}(\dot{\sigma} - \dot{x}) \tag{5}
$$

where y is the position of the other agent coupled to the VP, σ̇ is its reference motor signature as defined in Słowiński et al. (2016), that is, a velocity trajectory characterizing some desired human-like kinematic features to be assigned to the VP, and K<sub>p</sub> and K<sub>σ</sub> are two control gains. According to the values of K<sub>p</sub> and K<sub>σ</sub>, the VP acts as a leader (more weight given to K<sub>σ</sub>, so that the VP's priority is to minimize the mismatch between its own velocity and that of the prerecorded motor signature) or as a follower (more weight given to K<sub>p</sub>, and hence higher priority given to reducing the mismatch between the VP position and that of the other player).

	- When the VP acts as a follower, it is given by

$$u = \left[\psi + \chi(x - y)^2\right](\dot{x} - \dot{y}) - Ce^{-\delta(\dot{x} - \dot{y})^2}(x - y) \tag{6}$$

with

$$\dot{\psi} = -\frac{1}{\psi} \left[ (x - y)(\dot{x} - \dot{y}) + (x - y)^2 \right] \tag{7}$$

$$\dot{\chi} = -\frac{1}{\chi} (\dot{x} - \dot{y}) \left[f(x, \dot{x}) + u\right] \tag{8}$$

where y and ẏ are the position and velocity of the other agent coupled to the VP, C and δ are control parameters, and ψ and χ are adaptive parameters. Note that in this case no motor signature can be assigned to the VP.


$$u = \lambda \left( \left[\psi + \chi(x - \sigma)^2\right](\dot{x} - \dot{\sigma}) - Ce^{-\delta(\dot{x} - \dot{\sigma})^2}(x - \sigma) \right) + (1 - \lambda)K(y - x) \tag{9}$$

where λ := e<sup>−δ|x−y|</sup>, K is a control parameter, σ and σ̇ are the desired position and velocity profiles (motor signature) that allow the VP to generate spontaneous motion, and all the other quantities have been previously defined.

Note that when the VP is influenced by the motion of two or more agents, as may happen in the Group interaction trials (see Section 2.4), y and ẏ are replaced by the average position and velocity, respectively, of all the agents connected to the VP.
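The coupling laws of this section can be sketched in Python as follows. The sketch covers the PD law of Equation (5), the follower-mode adaptive law of Equations (6)–(8), with x, ẋ read as the VP state and y, ẏ as the partner state, and the averaging used when several agents are connected to the VP. All function and argument names are hypothetical, and the code evaluates the laws at a single time instant rather than reproducing the Chronos implementation.

```python
import math
from typing import Sequence, Tuple


def pd_control(x: float, v: float, y: float, sigma_dot: float,
               k_p: float, k_sigma: float) -> float:
    """Equation (5): position tracking of the partner plus velocity
    tracking of the motor signature."""
    return k_p * (y - x) + k_sigma * (sigma_dot - v)


def adaptive_follower(x: float, v: float, y: float, y_dot: float,
                      psi: float, chi: float, f_val: float,
                      c: float, delta: float) -> Tuple[float, float, float]:
    """Equations (6)-(8): returns (u, psi_dot, chi_dot).

    f_val is the inner dynamics f(x, v) evaluated at the current VP state;
    psi and chi are assumed to be nonzero.
    """
    e, e_dot = x - y, v - y_dot
    u = (psi + chi * e**2) * e_dot - c * math.exp(-delta * e_dot**2) * e
    psi_dot = -(e * e_dot + e**2) / psi
    chi_dot = -e_dot * (f_val + u) / chi
    return u, psi_dot, chi_dot


def group_reference(positions: Sequence[float],
                    velocities: Sequence[float]) -> Tuple[float, float]:
    """Average position and velocity of the agents connected to the VP,
    used in place of y and y_dot when there are two or more partners."""
    return sum(positions) / len(positions), sum(velocities) / len(velocities)


if __name__ == "__main__":
    y, y_dot = group_reference([0.2, 0.4], [0.0, 0.1])
    print(pd_control(x=0.0, v=0.0, y=y, sigma_dot=0.5, k_p=2.0, k_sigma=4.0))
```

In a closed-loop simulation, `psi_dot` and `chi_dot` would be integrated alongside the VP state, so that the adaptive parameters evolve together with the tracking error.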

# 2.4. Types of Possible Experiments That Can Be Run

All types of experiments that can be performed through the proposed technology are listed below and summarized in **Figure 3**.

1. Solo experiments. These experiments involve only one agent at a time. Participants are separately asked to generate some spontaneous movement of their preferred hand, so that their individual motor signature as defined in Słowiński et al. (2016) can be acquired.

	- HP-HP trials: human participants can either interact in a Leader-Follower condition (one of them leads the game and the other tracks her/his hand movements), or in a Joint Improvisation condition (there is no designation of leader and follower, the two participants are asked to create an interesting and synchronized motion of their preferred hands).
	- HP-VP trials: a human participant is asked to either lead or follow a virtual agent, whose mathematical description for its dynamics can be chosen among different models (see Section 2.3).
	- HP networks: human participants are asked to synchronize the motion of their preferred hand with that of the others they are topologically connected with.
	- mixed HP-VP networks: one or more participants of the group are virtual agents, which can be set to act either as followers or leaders, according to how much attention they pay to tracking the motion of the other group members they are connected with or generating spontaneous movements, respectively (see Section 2.3).

Note that, unlike in Dyadic interaction, in Group interaction trials between only two agents it is possible to assign directions to the coupling between players, as well as to perform in-silico trials between two virtual players.

# 3. APPLICATION

# 3.1. Participants

A total of 9 people participated in the experiments: 1 female and 8 males (all the participants were right-handed, and none of them had physical or mental illnesses or disabilities). The participants, who volunteered to take part in the experiments, were master's students, Ph.D. students, and Postdoctoral Research Associates from the University of Bristol. The experiments took place in three separate sessions.

This study was reviewed and approved by the Ethics Office of the University of Bristol. All subjects gave written informed consent in accordance with the Declaration of Helsinki. Ethical harm was minimized: due care was taken to avoid coercion or exploitation, protect confidentiality, minimize the risk of physical and psychological harm and respect autonomy. Any information obtained in connection with this study remained confidential, and participants' identity is kept anonymous.

# 3.2. Synchronization Metrics

Before describing in detail some representative case studies illustrating the features and capabilities of Chronos, we report the metrics used in this work to assess players' performance in Dyadic and Group interaction experiments. Note that such metrics are independent of the architecture we propose. Indeed, depending on the hypotheses a researcher is interested in investigating, other metrics can be employed to analyze the data stored through Chronos, such as those proposed in Di Paolo et al. (2008), Froese and Di Paolo (2010), and Snapp-Childs et al. (2011).

Let x<sub>k</sub>(t) ∈ R ∀t ∈ [0, T] be the continuous time series representing the motion of the kth agent's preferred hand, with k ∈ {1, 2, ..., N}, where N is the number of individuals and T is the duration of the experiment. Let x<sub>k</sub>[t<sub>i</sub>] ∈ R, with k ∈ {1, 2, ..., N} and i ∈ {1, 2, ..., N<sub>T</sub>}, be the discrete time series of the position of the kth agent, obtained after sampling x<sub>k</sub>(t) at time instants t<sub>i</sub>, where N<sub>T</sub> is the number of time steps of duration ΔT := T/N<sub>T</sub>, that is, the sampling period. Let θ<sub>k</sub>(t) ∈ [−π, π] be the phase of the kth agent, which can be estimated by making use of the Hilbert transform of the signal x<sub>k</sub>(t) as detailed in Kralemann et al. (2008).
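A minimal way to obtain a Hilbert-based phase estimate from a sampled position signal is sketched below in pure Python. It builds the analytic signal through a direct discrete Fourier transform and takes its argument; it does not include the further protophase-to-phase correction discussed by Kralemann et al. (2008). The names `dft` and `hilbert_phase`, and the choice to remove the mean before the transform, are assumptions of this sketch; in practice a library routine such as `scipy.signal.hilbert` would normally be used to build the analytic signal.

```python
import cmath
import math
from typing import List, Sequence


def dft(values: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Direct O(n^2) discrete Fourier transform (adequate for short trials)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(values[m] * cmath.exp(sign * 2j * math.pi * k * m / n)
                  for m in range(n))
        out.append(acc / n if inverse else acc)
    return out


def hilbert_phase(samples: Sequence[float]) -> List[float]:
    """Phase in [-pi, pi] of the analytic signal of a mean-centred signal."""
    n = len(samples)
    mean = sum(samples) / n
    spectrum = dft([s - mean for s in samples])
    # Keep the zero frequency, double the positive ones, drop the negative ones.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        upper = n // 2
    else:
        upper = (n + 1) // 2
    for k in range(1, upper):
        weights[k] = 2.0
    analytic = dft([w * c for w, c in zip(weights, spectrum)], inverse=True)
    return [cmath.phase(z) for z in analytic]


if __name__ == "__main__":
    # Five full cycles of a cosine sampled at 100 points.
    signal = [math.cos(2 * math.pi * 5 * i / 100) for i in range(100)]
    print(hilbert_phase(signal)[:5])
```

For a pure cosine with an integer number of cycles in the window, the estimated phase grows linearly with time (modulo 2π), which gives a convenient sanity check before applying the function to recorded trajectories.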

In Dyadic interaction experiments, the relative phase φ<sub>d<sub>h,k</sub></sub>(t) := θ<sub>h</sub>(t) − θ<sub>k</sub>(t) ∈ [−π, π] was used to check whether the assigned roles of leader and follower were respected by participants h and k at time t. Indeed, by defining φ<sub>d<sub>h,k</sub></sub> as the difference between the phase of the leader (player h) and that of the follower (player k), positive values indicate that the designated leader is effectively leading the game while interacting with the follower (Zhai et al., 2016).

In addition, the symmetric dyadic synchronization index ρ<sub>d<sub>h,k</sub></sub> ∈ [0, 1] originally introduced in Richardson et al. (2012) and defined as

$$\rho_{d_{h,k}} := \left| \frac{1}{T} \int_0^T e^{j\phi_{d_{h,k}}(t)} \, dt \right| \simeq \left| \frac{1}{N_T} \sum_{i=1}^{N_T} e^{j\phi_{d_{h,k}}[t_i]} \right| \tag{10}$$

was used to quantify the average coordination level between agents h and k over time: the closer ρ<sub>d<sub>h,k</sub></sub> = ρ<sub>d<sub>k,h</sub></sub> is to 1, the lower the phase mismatch is between agents h and k over the whole trial.
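The discrete form of Equation (10) translates almost directly into code. The short Python sketch below assumes that the two phase time series have already been estimated and have equal length; the function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def relative_phase(theta_h: Sequence[float], theta_k: Sequence[float]) -> List[float]:
    """Relative phase theta_h - theta_k, wrapped to [-pi, pi]."""
    return [cmath.phase(cmath.exp(1j * (a - b))) for a, b in zip(theta_h, theta_k)]


def dyadic_sync_index(theta_h: Sequence[float], theta_k: Sequence[float]) -> float:
    """Discrete form of Equation (10): modulus of the mean phasor of the relative phase."""
    n = len(theta_h)
    return abs(sum(cmath.exp(1j * (a - b)) for a, b in zip(theta_h, theta_k))) / n


if __name__ == "__main__":
    leader = [0.1 * i + 0.5 for i in range(200)]
    follower = [0.1 * i for i in range(200)]
    print(dyadic_sync_index(leader, follower), relative_phase(leader, follower)[0])
```

A constant phase lag still yields an index of 1, since the index measures the stability of the relative phase rather than its value; the sign of the relative phase is what indicates who is leading.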

The root mean square (RMS) of the normalized position error ε<sub>h,k</sub> ∈ [0, 100]% defined as

$$\epsilon_{h,k} := \frac{1}{L} \sqrt{\frac{1}{T} \int_0^T \left(x_h(t) - x_k(t)\right)^2 dt} \simeq \frac{1}{L} \sqrt{\frac{1}{N_T} \sum_{i=1}^{N_T} \left(x_h[t_i] - x_k[t_i]\right)^2} \tag{11}$$

where L refers to the range of admissible position (e.g., the range of motion detected by the Leap Motion controller), was employed as a measure of the position mismatch (expressed in percentage) between the two agents: the lower ε<sub>h,k</sub> is, the lower the position mismatch is between agents h and k.
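A direct transcription of the discrete form of Equation (11) is given below as a Python sketch. The function and argument names are hypothetical; the value is returned as a fraction of the admissible range L and can be multiplied by 100 to express it as a percentage.

```python
import math
from typing import Sequence


def rms_position_error(x_h: Sequence[float], x_k: Sequence[float],
                       motion_range: float) -> float:
    """Discrete form of Equation (11): RMS position mismatch divided by L."""
    n = len(x_h)
    mean_square = sum((a - b) ** 2 for a, b in zip(x_h, x_k)) / n
    return math.sqrt(mean_square) / motion_range


if __name__ == "__main__":
    player_h = [0.00, 0.10, 0.20, 0.10]
    player_k = [0.05, 0.12, 0.15, 0.10]
    print(100 * rms_position_error(player_h, player_k, motion_range=0.6), "%")
```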

When N > 2 (Group interaction), further indices can be used to measure the coordination level of each participant in the group, as well as that of the entire ensemble. Firstly, the cluster phase or Kuramoto order parameter is defined both in its complex form q′(t) ∈ C and in its real form q(t) ∈ [−π, π] as

$$q'(t) := \frac{1}{N} \sum_{k=1}^{N} e^{j\theta_k(t)}, \qquad q(t) := \operatorname{atan2} \left( \Im(q'(t)), \Re(q'(t)) \right) \tag{12}$$

which can be regarded as the average phase of the group at time t. Secondly, denoting with φ<sub>k</sub>(t) := θ<sub>k</sub>(t) − q(t) the relative phase between the kth participant and the group phase at time t, the relative phase between the kth participant and the group averaged over the time interval [0, T] is defined both in its complex form φ̄′<sub>k</sub> ∈ C and in its real form φ̄<sub>k</sub> ∈ [−π, π] as

$$\begin{aligned} \bar{\phi}'_k &:= \frac{1}{T} \int_0^T e^{j\phi_k(t)} \, dt \simeq \frac{1}{N_T} \sum_{i=1}^{N_T} e^{j\phi_k[t_i]},\\ \bar{\phi}_k &:= \operatorname{atan2}\left( \Im(\bar{\phi}'_k), \Re(\bar{\phi}'_k) \right) \end{aligned} \tag{13}$$

The individual synchronization index ρ<sub>k</sub> ∈ [0, 1] originally introduced in Richardson et al. (2012) and defined as

$$\rho_k := \left| \bar{\phi}'_k \right| \tag{14}$$

was then used to quantify the synchronization level of the kth participant over the whole trial duration: the closer ρ<sub>k</sub> is to 1, the smaller the average phase mismatch between agent k and the group. Similarly, the group synchronization index ρ<sub>g</sub>(t) ∈ [0, 1] defined as

$$\rho_g(t) := \frac{1}{N} \left| \sum_{k=1}^{N} e^{j \left(\phi_k(t) - \bar{\phi}_k\right)} \right| \tag{15}$$

was used to quantify the synchronization level of the entire group at time t: the closer ρ<sub>g</sub>(t) is to 1, the smaller the average phase mismatch of the agents in the group is at time t. The mean synchronization level of the group ρ<sub>g</sub> ∈ [0, 1] over the total duration of the performance can consequently be estimated as:

$$\rho_g := \frac{1}{T} \int_0^T \rho_g(t) \, dt \simeq \frac{1}{N_T} \sum_{i=1}^{N_T} \rho_g[t_i] \tag{16}$$
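The group metrics of Equations (12)–(16) can be computed together from a matrix of phases, as in the following Python sketch. The function name `group_metrics`, the layout `theta[k][i]` (agent k, sample i) and the returned dictionary keys are assumptions made for illustration only.

```python
import cmath
import math
from typing import Dict, List, Sequence


def group_metrics(theta: Sequence[Sequence[float]]) -> Dict[str, object]:
    """Discrete forms of Equations (12)-(16); theta[k][i] is the phase of
    agent k at sample i."""
    n_agents, n_samples = len(theta), len(theta[0])
    agents, samples = range(n_agents), range(n_samples)

    # Equation (12): cluster phase q(t) as the argument of the mean phasor.
    q = [cmath.phase(sum(cmath.exp(1j * theta[k][i]) for k in agents) / n_agents)
         for i in samples]
    # Relative phase of each agent with respect to the group.
    phi = [[theta[k][i] - q[i] for i in samples] for k in agents]
    # Equation (13): time-averaged relative phase (complex and real forms).
    phi_bar_c = [sum(cmath.exp(1j * p) for p in phi[k]) / n_samples for k in agents]
    phi_bar = [cmath.phase(z) for z in phi_bar_c]
    # Equation (14): individual synchronization indices.
    rho_k = [abs(z) for z in phi_bar_c]
    # Equation (15): group synchronization index at each sample.
    rho_g_t = [abs(sum(cmath.exp(1j * (phi[k][i] - phi_bar[k])) for k in agents))
               / n_agents for i in samples]
    # Equation (16): mean group synchronization over the trial.
    rho_g = sum(rho_g_t) / n_samples
    return {"q": q, "phi_bar": phi_bar, "rho_k": rho_k,
            "rho_g_t": rho_g_t, "rho_g": rho_g}


if __name__ == "__main__":
    offsets = [0.0, 0.2, -0.2]
    phases = [[0.3 * i + off for i in range(50)] for off in offsets]
    result = group_metrics(phases)
    print(result["rho_k"], result["rho_g"])
```

Note that agents oscillating with constant phase offsets from one another still obtain ρ<sub>k</sub> = 1 and ρ<sub>g</sub> = 1, because each agent's mean offset from the group phase is subtracted in Equation (15).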

# 3.3. Representative Study Cases

To better illustrate the features and capabilities of Chronos, we apply the platform to some representative scenarios. Specifically, we first consider the case of a dyadic interaction between two players, and then move to studying group coordination in an ensemble of 5 players with and without the presence of virtual players. Illustrations of the interfaces exhibited in the different scenarios can be found in Figures S2–S9.

All the experiments involved participants sitting around a table and moving the index finger of their preferred hand as smoothly as possible over a Leap Motion controller, along a direction required to be straight and parallel to the floor. The instruction to move smoothly was given to keep the attentional level of the participants as high as possible throughout all the experiments. Data were originally stored at a sampling rate of 10 Hz, and then upsampled to 100 Hz by cubic interpolation (see Figure S1).
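The upsampling step can be illustrated with the Python sketch below. The text does not state which cubic interpolant was used, so the sketch assumes a Catmull-Rom (cubic Hermite) scheme with finite-difference tangents purely as an example; `upsample_cubic` and `factor` are hypothetical names, and a spline routine such as `scipy.interpolate.CubicSpline` would be an equally plausible choice.

```python
from typing import List, Sequence


def upsample_cubic(samples: Sequence[float], factor: int = 10) -> List[float]:
    """Upsample uniformly sampled data by `factor` (e.g., 10 Hz to 100 Hz)
    with piecewise cubic Hermite interpolation (Catmull-Rom tangents)."""
    n = len(samples)
    if n < 2:
        return list(samples)

    def tangent(i: int) -> float:
        # One-sided differences at the ends, central differences elsewhere.
        if i == 0:
            return samples[1] - samples[0]
        if i == n - 1:
            return samples[-1] - samples[-2]
        return (samples[i + 1] - samples[i - 1]) / 2

    out: List[float] = []
    for i in range(n - 1):
        p0, p1 = samples[i], samples[i + 1]
        m0, m1 = tangent(i), tangent(i + 1)
        for j in range(factor):
            s = j / factor
            h00 = 2 * s**3 - 3 * s**2 + 1
            h10 = s**3 - 2 * s**2 + s
            h01 = -2 * s**3 + 3 * s**2
            h11 = s**3 - s**2
            out.append(h00 * p0 + h10 * m0 + h01 * p1 + h11 * m1)
    out.append(samples[-1])
    return out


if __name__ == "__main__":
    recorded = [0.0, 0.3, 0.5, 0.4, 0.1]   # hypothetical positions at 10 Hz
    print(len(upsample_cubic(recorded)))    # number of samples at 100 Hz
```

The interpolated series passes exactly through the recorded samples, so the upsampling adds temporal resolution for the phase analysis without altering the measured positions.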

Remark 1. In general, the Leap Motion controller can detect the 3D position of both hands by providing several triplets (x, y, z) for each of them. In particular, 4 triplets are provided for the thumb, whereas 5 triplets are provided for the other fingers, in addition to two more triplets representing wrist and palm position, respectively. Given the nature of the task considered here (1D motion of the preferred hand's index finger along the x-axis of the sensor), only the x-coordinate of the index finger's tip position was recorded for each participant.

Remark 2. An excessively high sampling frequency could cause delays in the communication among different machines. Indeed, regardless of the input device employed as position sensor, increasing the sample rate would lead to a larger quantity of data to be acquired, stored, and then sent to the server from different machines at the same time, and hence to possible undesired communication delays deteriorating the effectiveness of the task. Although the Leap Motion controller provides a sampling frequency of up to 40 Hz (Guna et al., 2014), we found that 10 Hz was low enough to avoid delays, yet sufficiently high to guarantee an adequate number of samples to be analyzed. Upsampling was performed a posteriori for the sake of a more accurate analysis.

Remark 3. To make sure that the chosen sampling frequency allowed for synchrony among all the machines involved in the experiments, we ran a mock group synchronization trial and verified that the position acquired on the least computationally powerful machine was broadcast in real time to all the others. Any undesired delay when broadcasting the position of the human players could lead to additional phase mismatches, thus deteriorating the metrics introduced in Section 3.2. Furthermore, note that VPs do not introduce delays as they are locally implemented on the administrator's machine.

### 3.3.1. Solo Experiments

Four participants were asked to separately perform 4 trials, each of duration 60 s. Specifically, each participant was told to perform 2 trials while producing a sinusoidal-like wave at their own natural oscillation frequency, and then 2 more trials while producing an interesting non-periodic motion representing their motor signature (Słowiński et al., 2016).

These experiments were carried out in the first session (see Section 1.1 of Supplementary Material for more details on how to perform solo trials via Chronos, and Figure S11 for an example of individual motor signature).

FIGURE 5 | Experimental results in the *Dyadic interaction* experiments. RMS of the normalized position error ε<sub>h,k</sub> (A), dyadic synchronization indices ρ<sub>d<sub>h,k</sub></sub> (B), and relative phase φ<sub>d<sub>h,k</sub></sub> between the two participants (C) are shown for each pair (Dyad 1 and Dyad 2), where different shades of gray refer to different pairs and players. The height of each bar represents the mean value averaged over the 3 trials for each pair, whereas the black error bar represents its averaged standard deviation. The PDFs of the relative phase φ<sub>d<sub>h,k</sub></sub> between the two participants of Dyad 1 (D) and Dyad 2 (E) are shown for the first trial of each pair, where the black solid line refers to HP-HP interaction, and the black dashed line refers to HP-VP interaction.

undirected links between a VP and 1, 2, or 4 HPs (B–E, respectively) are shown. The virtual player acts as a leader in topologies (B–D) and as a follower in topology (E).

### 3.3.2. Dyadic Interaction Experiments

The same four participants were grouped in two pairs: players 1 and 2 formed Dyad 1, while players 3 and 4 formed Dyad 2.


Interestingly, the relationships between the metrics obtained for the two dyads in HP-HP interaction are replicated when substituting one of the two human players in each pair with a virtual agent (**Figure 5**). This seems to confirm that the VP, as designed in Alderisio et al. (2016a) and implemented in our novel software set-up, is able to interact in a human-like fashion with the other player, becoming a kinematic avatar of the person it is substituting in the game (Zhai et al., 2016). In particular, the RMS of the normalized position error ε<sub>1,2</sub> obtained in Dyad 1 is lower than ε<sub>3,4</sub> obtained in Dyad 2 (**Figure 5A**), and the same applies to the dyadic synchronization indices ρ<sub>d<sub>1,2</sub></sub> and ρ<sub>d<sub>3,4</sub></sub> (**Figure 5B**), and to the relative phases φ<sub>d<sub>1,2</sub></sub> and φ<sub>d<sub>3,4</sub></sub> (**Figure 5C**). Notably, for both dyads, the probability density function (PDF) of the relative phase obtained for the two players in HP-VP interaction resembles that obtained in HP-HP interaction (**Figures 5D–E**). Indeed, the PDFs related to Dyad 1 are broader and centered around 0, whereas those related to Dyad 2 are tighter and shifted to the right.

These experiments were carried out in the second session (see Section 1.2 of Supplementary Material for more details on how to perform dyadic interaction trials via Chronos and replace one of the human players with a VP).

### 3.3.3. Group Interaction Experiments

Two different groups were tested separately: Group 1, made up of the same 4 participants as in the Solo and Dyadic interaction experiments, and Group 2, made up of 5 other participants. Participants in each group were asked to synchronize their motion with that of the circles shown on their respective computer screens, representing the movements of the other agents topologically connected with them. However, players had no global information on the topology of their interactions.

#### **3.3.3.1. Mixed HP-VP network**

Four participants (Group 1) were involved in this session. Firstly, 3 trials of 30 s each were performed where all participants saw on their respective screens traces of the objects moved by all the others (all-to-all configuration, **Figure 6A**). Secondly, a VP (modeled by HKB equation and adaptive control) fed with the sinusoidal motion of a different player was introduced in the network; participants were told that a fifth human player was interacting with them. The virtual agent was first connected in leader mode to either 1, 2, or 4 HPs (**Figures 6B–D**, respectively), and then in follower mode to all of them (**Figure 6E**). For each topology including the virtual player, once again 3 trials of duration 30 s were performed.

The highest value of group synchronization observed experimentally is obtained in the HP network, while lower values are obtained when a VP is introduced as leader. However, the group synchronization index ρ<sub>g</sub> increases again when a VP is introduced as follower (**Figure 7A**). These results are confirmed by the dyadic synchronization indices ρ<sub>d<sub>h,k</sub></sub> obtained in each of the five topologies of interest (**Figures 7B–F**). For each pair of human players, high values (**Figure 7B**) are observed for the topology shown in **Figure 6A**. On the other hand, when a virtual leader is introduced in the interaction (topologies shown in **Figures 6B–D**), the lowest values of dyadic synchronization are obtained for each human player in correspondence to the VP (player 5 in **Figures 7C–E**). Finally, when the VP acts as a follower (topology shown in **Figure 6E**), the highest values of the dyadic synchronization indices ρ<sub>d<sub>h,k</sub></sub> for each human player are observed in correspondence to the VP (player 5 in **Figure 7F**). For more details see Tables S1, S2.

FIGURE 7 | Experimental results in the *Group interaction* experiments (Group 1). The group synchronization indices obtained for the players in the five different topologies of Figure 6 are shown, with different shades of gray representing different topologies (A). The height of each bar represents the mean value over time of the group synchronization index ρ<sub>g</sub>(t), averaged over the 3 trials for each topology, whereas the black error bar represents its averaged standard deviation. The corresponding dyadic synchronization indices ρ<sub>d<sub>h,k</sub></sub> obtained for all the pairs of players in the topologies of Figures 6A–E are respectively shown in (B–F). Different symbols and colors refer to mean and standard deviation averaged over the 3 trials performed for each topology, respectively. As ρ<sub>d<sub>h,k</sub></sub> are symmetric by definition, only half of them are depicted.

These experiments were carried out in the second session (see Section 1.3 of Supplementary Material for more details on how to perform group interaction trials via Chronos and deploy virtual agents within the human ensemble).

#### **3.3.3.2. HP network**

Five participants (Group 2) were involved in this session. Eight different topologies of interactions were implemented among them (**Figure 8**): undirected complete (**Figure 8A**), ring (**Figure 8B**), path (**Figure 8C**), and star graph (**Figure 8D**), and their respective directed versions (**Figures 8E–H**). As for the undirected topologies:


FIGURE 8 | Topology of connections among participants in the *Group interaction* experiments (Group 2). (A–D) represent undirected complete, ring, path and star graph, respectively. (E–H) represent the respective directed versions. Edges without arrows represent *undirected* connections (if participant *i* sees the motion of participant *j*, then participant *j* also sees the motion of participant *i*), whereas in the *directed* case, an edge going out of node *i* and coming into node *j* (the direction of the edge is given by its corresponding arrow) indicates that participant *j* sees the motion of participant *i*.

players 2 and 4), and as a consequence were not connected to each other.


For each topology, 6 trials of duration 30 s were performed.
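The four undirected graph families of **Figure 8** can be encoded as "who sees whom" maps, as in the Python sketch below. Function names, 0-based player indices and the default hub of the star graph are assumptions of this sketch. The helper `from_directed_edges` follows the convention stated in the caption of Figure 8 (an edge from i to j means that j sees i); no particular orientation of the directed topologies is assumed beyond the illustrative cycle in the usage example.

```python
from typing import Dict, Iterable, Set, Tuple

Visibility = Dict[int, Set[int]]  # player -> set of players whose motion it sees


def complete(n: int) -> Visibility:
    return {i: set(range(n)) - {i} for i in range(n)}


def ring(n: int) -> Visibility:
    return {i: {(i - 1) % n, (i + 1) % n} - {i} for i in range(n)}


def path(n: int) -> Visibility:
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}


def star(n: int, hub: int = 0) -> Visibility:
    return {i: (set(range(n)) - {hub}) if i == hub else {hub} for i in range(n)}


def from_directed_edges(n: int, edges: Iterable[Tuple[int, int]]) -> Visibility:
    """Edge (i, j) means that participant j sees the motion of participant i."""
    sees: Visibility = {k: set() for k in range(n)}
    for i, j in edges:
        sees[j].add(i)
    return sees


def is_undirected(sees: Visibility) -> bool:
    """True if every visual connection is mutual."""
    return all(i in sees[j] for i, neighbours in sees.items() for j in neighbours)


if __name__ == "__main__":
    for name, graph in [("complete", complete(5)), ("ring", ring(5)),
                        ("path", path(5)), ("star", star(5))]:
        print(name, graph, is_undirected(graph))
    # Illustrative directed cycle 0 -> 1 -> 2 -> 0 (hypothetical example).
    print(from_directed_edges(3, [(0, 1), (1, 2), (2, 0)]))
```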

The values of the individual synchronization indices ρ<sub>k</sub> of the participants were first averaged over the total number of trials for each kth player and for each topology (both in the undirected and in the directed case), and then underwent a one-way ANOVA with repeated measures. Their mean value and standard deviation over the total number of participants are represented for each topology in **Figure 9**. In the undirected case, the ANOVA performed with Greenhouse–Geisser correction revealed a statistically significant effect of the topology [F(1.201, 4.805) = 8.859, p < 0.05, η<sup>2</sup> = 0.689], suggesting an advantage of Complete graph and Star graph (Bonferroni post-hoc test, p < 0.05). Albeit preliminary, this result seems to confirm independently the observations reported in Alderisio et al. (2016c) showing that undirected interaction patterns among participants affect their coordination level. Also, for all the topologies, higher mean values and lower standard deviations of group synchronization index are observed in the directed case (with the only exception of the former in the Complete graph). For more details see Table S3.

As expected for both undirected and directed topologies, in most cases (83% for undirected and 91% for directed topologies) the highest mean values of dyadic synchronizations over the total number of trials are observed within topologically connected participants (**Figure 10**). Statistically, visually paired dyads across both undirected and directed topologies were indeed found to exhibit higher synchronization than non-visually coupled dyads [t(78) = −4.544, p < 0.01]. For more details see Tables S4, S5.

These experiments were performed in the third session (see Section 1.3 of Supplementary Material for more details on how to perform group interaction trials via Chronos and set different interaction patterns among participants, and Figure S12 for an example of trajectories recorded in a group interaction trial performed by a human ensemble).

# 4. DISCUSSION

In this work we presented a novel ad hoc computer-based set-up for investigating human coordination, both in dyads and in groups, and showed preliminary results on coordination in human ensembles in order to validate its effectiveness. The proposed set-up makes it possible to remove the effects of social interactions among the players and to implement different structures of interconnections. In addition, it allows virtual agents to be deployed in the group, thus opening the possibility of further investigating the mechanisms that underlie human group coordination through an extension of the human dynamic clamp to multiplayer scenarios (Dumas et al., 2014).

We envisage that the computer set-up presented in this paper can be used in Social Psychology to elucidate what the effects of social interactions are in dyadic or group movement coordination. Indeed, joint action tasks might first be performed while allowing participants to share direct visual and auditory coupling (participants directly look at each other instead of the screen of their personal computers, and do not wear headphones so that they know who they are interacting with), and then while removing them (or vice versa). Moreover, since some of the players can be replaced with one or more virtual agents, our computer technology can also be exploited for the development of artificial agents able to merge and interact within a group of humans (Boucenna et al., 2016; Iqbal et al., 2016), both for recreational (Alac et al., 2011) and rehabilitation purposes (Zhai et al., 2015; Bono et al., 2016; Słowiński et al., 2017).

In order to illustrate the features and capabilities of Chronos, we applied the platform to some representative scenarios. Specifically, we validated the use of a virtual player as designed in Alderisio et al. (2016a) in a dyadic interaction task. We found that the behavior exhibited by each dyad, in terms of the metrics introduced in Section 3.2, was the same for both HP-HP and HP-VP interaction. This suggests that the human players involved in our experiments did not change the way of interacting with their partner according to the nature of the latter. In particular, we observed that if a human participant to whom a follower role was assigned in duo interaction ended up leading her/his human partner (in spite of the instruction given), s/he did so also when interacting with a virtual leader (Dyad 1). On the other hand, if a human leader was successfully leading her/his human partner, s/he did so also in the interaction with a virtual follower (Dyad 2). Although interesting, such preliminary results are specific to the trials we performed here to validate Chronos, hence they call for more experiments in order to be confirmed and extended.

Moreover, we illustrated the possibility of implementing different interaction patterns in larger ensembles. We observed that, only in the case of undirected topologies, coordination levels in a human ensemble are affected by the specific structure of interconnections among group members when any form of direct visual, auditory or social interaction is removed, a result found also in Alderisio et al. (2016c) yet in the presence of visual and social cues. This leads to open questions on what topology has to be implemented in order to enhance synchronization in the group, what the effects of removing some connections are, and whether the presence of social interaction further increases coordination.

Also, we validated the deployment of virtual agents in a group, and observed that they can decrease coordination levels when acting as leaders. Higher values of group and dyadic synchronization indices were instead observed either when no virtual player was interacting within the human ensemble, or when it was following the motion of all the subjects. These results only suggest that virtual players can be used to vary the level of coordination in a human group, although further work is required to better understand this effect and its implications.

Despite these results being promising, our experiments were presented in this methodological paper mainly to show the features of the computer-based set-up we propose. Rather than being exhaustive, they only illustrate the capabilities of Chronos and the analysis that can be carried out, hence we leave to future publications a more thorough experimental confirmation of the preliminary findings reported here for illustrative purposes.

Some further extensions to our work include the possibility of implementing time-varying topologies to study the effects of dynamically adding/removing connections among interacting participants (Cardillo et al., 2014), and enabling the administrator to provide the players with social cues in real time, based on the quality of their performance (i.e., as measured by the group synchronization index). In addition, it is possible to implement new mathematical models (Snapp-Childs et al., 2011; Zhai et al., 2016) for the VP to perform as a joint improviser with other virtual or human agents. Finally, we are exploring the possibility of extending Chronos over the Internet, where it is also necessary to deal with network latency issues.

# AUTHOR CONTRIBUTIONS

Conceived and designed the experiments: FA, GF, MdB. Performed the experiments: FA, ML, GF. Analyzed the data: FA, GF. Contributed analysis tools: FA, GF. Developed the software: ML. Wrote the paper: FA, GF, MdB.

# FUNDING

This work was supported by the European Project AlterEgo FP7 ICT 2.9 (Cognitive Sciences and Robotics), Grant Number 600610.

# ACKNOWLEDGMENTS

We wish to thank all the people taking part in the experiments.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2017.00967/full#supplementary-material

# REFERENCES

Forouzan, B. A. (2002). TCP/IP Protocol Suite. New York, NY: McGraw-Hill, Inc.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Alderisio, Lombardi, Fiore and di Bernardo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Stance Leads the Dance: The Emergence of Role in a Joint Supra-Postural Task

Tehran J. Davis<sup>1</sup> \*, Gabriela B. Pinto1,2 and Adam W. Kiefer3,4,5

<sup>1</sup> Center for the Ecological Study of Perception and Action, Department of Psychological Sciences, University of Connecticut, Storrs, CT, USA, <sup>2</sup> CAPES Foundation, Ministry of Education of Brazil, Brasília, Brazil, <sup>3</sup> Division of Sports Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA, <sup>4</sup> Department of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, OH, USA, <sup>5</sup> Center for Cognition, Action and Perception, Department of Psychology, University of Cincinnati, Cincinnati, OH, USA

Successfully meeting a shared goal usually requires co-actors to adopt complementary roles. However, in many cases, who adopts what role is not explicitly predetermined, but instead emerges as a consequence of the differences in the individual abilities and constraints imposed upon each actor. Perhaps the most basic of roles are leader and follower. Here, we investigated the emergence of "leader-follower" dynamics in interpersonal coordination using a joint supra-postural task paradigm (Ramenzoni et al., 2011; Athreya et al., 2014). Pairs of actors were tasked with holding two objects in alignment (each actor manually controlled one of the objects) as they faced different demands for stance (stable vs. difficult) and control (which actor controlled the larger or smaller object). Our results indicate that when actors were in identical stances, neither led the inter-personal (between actors) coordination in any systematic fashion. In contrast, when asymmetries in postural demands were introduced, the actor with the more difficult stance led the coordination (as determined using cross-recurrence quantification analysis). Moreover, changes in individual stance difficulty resulted in similar changes in the structure of both intra-personal (individual) and inter-personal (dyadic) coordination, suggesting a scale invariance of the task dynamics. Implications for the study of interpersonal coordination are discussed.

Keywords: interpersonal coordination, joint action, movement dynamics, recurrence quantification analyses, self-organization

# INTRODUCTION

Two friends passing a cup of coffee involves the coordination of no fewer than 2 arms, 8 joints, and 50 muscles spread across two separate bodies. To avoid a mishap, each person must, at minimum, continuously track the positions and orientation of one another's hands, and act so as to mutually align their movements within a very narrow window of space and time. Indeed, these sorts of exacting perceptual and motor demands are necessary in even the most basic of joint tasks. And yet (perhaps quite remarkably) waiters frequently pass plates, workers routinely pass tools, and children successfully pass toys with little thought or concerted effort.

It is argued that successful joint actions, such as passing a cup, result from the formation of softly assembled, coordinative structures between multiple actors (Black et al., 2007; Shockley et al., 2009; Riley et al., 2011; Saltzman and Caplan, 2015). A central theme of this framework is the appeal to principles of emergent self-organization when explaining how a given number of degrees of freedom (e.g., joints and neuromuscular groups) might become functionally coordinated. For individual actors, "solving" this problem involves the recruitment or reduction of these degrees of freedom in accord with the constraints placed upon the system during the execution of action (Bernstein, 1967; Gelfand and Tsetlin, 1971; Turvey, 2007). According to the inter-personal synergy hypothesis (Riley et al., 2011), the coordination of joint actions between individuals is the result of similar processes: mutual constraint and synergistic organization across two or more people's bodily and cognitive states.

#### Edited by:

Rick Dale, University of California, Merced, USA

#### Reviewed by:

J. Scott Jordan, Illinois State University, USA

Pedro Passos, Universidade de Lisboa, Portugal

> \*Correspondence: Tehran J. Davis tehran.davis@uconn.edu

#### Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

Received: 19 November 2016 Accepted: 21 April 2017 Published: 09 May 2017

#### Citation:

Davis TJ, Pinto GB and Kiefer AW (2017) The Stance Leads the Dance: The Emergence of Role in a Joint Supra-Postural Task. Front. Psychol. 8:718. doi: 10.3389/fpsyg.2017.00718

Some support for this position comes by way of research investigating the organization of body segments (e.g., hand and torso) when two people engage in a joint supra-postural precision task (a task where two standing co-actors must make very precise and exacting movements while maintaining upright balance). It has been argued that successes in these sorts of tasks are built upon a nested hierarchy of intra-personal and inter-personal coordinative structures between hand and postural control (Riley et al., 2011; Ramenzoni et al., 2012) that emerge to meet and continuously adapt to the evolving task demands. For example, when a single actor performs a precision grasping or aiming task, the activity of separate body segments within the actor shows signs of mutual interdependence: activity from other body segments, postural control systems and even respiration (Balasubramaniam et al., 2000; Kuznetsov et al., 2011; Ramenzoni et al., 2012) acts in a compensatory manner to facilitate the actor's goal.

Ramenzoni et al. (2011) found evidence that analogous synergistic processes occur across individuals who are cooperating to complete a shared task. In their experiment, one member of a dyad held a pointer-like object while a partner held a ring-like object. Dyads were then tasked with manually aligning their respective objects so that the pointer remained within the perimeter of the ring (without touching) for the duration of each trial. The task's difficulty was manipulated at the inter-personal level by varying the diameter of the ring: smaller rings placed more exacting precision demands on the actors. Intra-personal task difficulty was manipulated by independently changing the stance of each actor: actors either stood with a normal base of support (their feet shoulder-width apart) or stood in a heel-to-toe tandem stance that narrowed the base of support and required more effort to maintain upright balance. Both challenges were typically met with increases in the degree and the stability of coordination within and between actors. For example, decreases in ring size corresponded to increases in the amount of shared activity between actors' hands and postures as well as increases in intra-personal hand-posture coordination (as measured by cross-recurrence quantification analysis (CRQA), see Section "CRQA: Global Dynamics and Leader-Follower Analyses"). While inter-personal coordination between actors' hands was compromised (but not eliminated) in conditions when one or both actors were in the tandem stance condition, a compensatory increase in the coordination of co-actors' postural activity was observed.

More recently, Athreya et al. (2014) demonstrated that comparable patterns of coordination emerge even in instances where information about the movements of co-actors is limited. Pairs of participants were asked to perform a similar manual alignment task, this time aligning two laser pointers on a black screen. Participants performed this task under conditions where they could see their partner as well as conditions where their partner remained unseen. In the unseen condition, participants still had task-specific information about their confederate via the movement and position of the confederate's laser. Athreya et al. (2014) observed that measures of inter-personal hand coordination remained consistent across both conditions. Importantly, inter-personal torso coordination in the unseen condition was still present (though reduced compared to the seen condition). This result suggests that the postural coordination observed in this task was not entirely an incidental product of visual entrainment (Varlet et al., 2011), but instead may have been closely tied to the detection of information related to the individual and shared task demands.

The two aforementioned investigations questioned how manipulating task demands leads to changes in the coordination dynamics within and across actors. In the present study, we asked an equally important question: how do differences in individual demands result in the spontaneous organization of distinct roles between actors? It is rarely the case that individuals performing a shared task mirror one another's actions. Instead, many joint tasks demand that individuals perform complementary actions, or adopt roles, to reach their desired ends. For example, "Having a conversation" implies a "speaker" and a "listener" and "Passing a ball" requires a "thrower" and "catcher." More generally, during joint tasks there are actors that lead or initiate an interaction (e.g., speakers and throwers), and actors that follow (listeners and catchers). In many cases these leader-follower roles are not defined a priori, but instead emerge spontaneously provided asymmetries in the individual abilities, constraints, and goals of each actor. In Ramenzoni et al.'s (2011) study, such task asymmetries were present in each individual's manual and postural demands. Actors were either "rings" setting the boundary constraints of the joint task, or they were "pointers" tasked with maintaining their position within the bounds of the ring. At the same time, the actors faced different challenges to stance: actors standing with their feet in tandem stance encountered greater difficulty maintaining postural stability than actors with their feet shoulder-width apart.

Recently, Bosga et al. (2010) proposed that a framework developed for describing the coordination between multiple joints may be useful in understanding leader-follower dynamics between multiple agents (or multiple body segments across actors). The leading joint hypothesis (LJH) (Dounskaia, 2005, 2010) suggests that individual joints in a multi-joint action play different roles in the production of the global movement, where the leading joint acts as a linchpin for the organization of movement for the remaining joints. Typically, the leading joint emerges out of the interplay of task constraints and the functional and bio-mechanical linkages between body segments. During a multi-articular movement, mechanical interactions between interdependent body segments produce varying amounts of torque at each joint. The leading joint, typically the joint with a mechanical advantage, exerts additional movement torques on the subordinate joints. This results in greater movement variability and increased complexity in movements around the subordinate joints while maintaining relatively low variability and complexity of movements around the leading joint.

Bosga et al. (2010) demonstrated that analogs to the "leading joint" may be found in multi-agent rhythmic coordination. They tasked pairs of actors with cooperatively moving a rocking board side-to-side, while measuring the enclosed angles of various joints about the actors' bodies. Kinematic analysis focused on the angular displacements and continuous relative phase angles of actors' joints, and leader-follower relationships were determined using time-lagged cross-correlations of these values. Analysis of the inter-personal coordination dynamics often revealed the presence of a leading actor whose movement kinematics were consistent with those predicted by the LJH: namely, measures of angular displacement variability about the joints of leading rockers tended to be lower than variability about the joints of followers.

With this in mind, we had two specific aims for this study. First, we investigated how patterns of intra-personal and inter-personal coordination changed as a function of each individual's stance demands and disk control. At the intra-personal level, we expected that increases in stance difficulty would result in increases in movement variability about the torso of individual actors. Moreover, we expected that these increases would be met with increases in the regularity (Paterno et al., 2015), complexity and intermittency (associated with functional flexibility, Kiefer and Myer, 2015) of intra-personal coordination between hand and torso. Such changes would reflect the reorganization of the available degrees of freedom to insulate the hand from the effects of increased torso variability. At the level of inter-personal coordination, we predicted a similar pattern of effects: increases in shared-stance difficulty would result in increases in the regularity, complexity, and intermittency of coordination between the two actors as they encounter greater difficulties in completing the task.

Our second, more central aim was to investigate the degree to which asymmetries in individual task demands corresponded to the emergence of leader-follower dynamics in inter-personal coordination. For example, consistent with the LJH, we predicted that the member of the dyad whose body segments exhibited greater movement variability would have an increased likelihood of acting as a "subordinate joint" or follower in the joint task. Also, given that the LJH makes claims regarding complexity, we were inclined to predict that the actor showing greater complexity in intra-personal coordination would likely emerge as a follower in the task. Given our above predictions about the relative effects of stance difficulty, this leader-follower dynamic was expected to be most pronounced in conditions when the actors faced asymmetrical stance demands. That is, we predicted that actors facing fewer challenges to upright stance would tend to lead the coordination when their partners were in the more difficult tandem stance. When both actors faced similar stance demands, the regularity and systematic nature of this leader-follower relationship were expected to be reduced.

# Participants

We recruited 24 undergraduates (14 women and 10 men), who participated as 12 pairs. All participants reported being free of recent injury and had normal or corrected-to-normal vision. Informed consent was obtained in agreement with the University of Connecticut Institutional Review Board's standards and practices. Undergraduates received course credit for their participation.

# Apparatus

We used a short throw projector to display a computer-generated scene onto a vertical white screen. The screen was translucent so that the projected scene could be seen on both sides. Pairs of participants stood facing one another on opposite sides of the screen (see **Figure 1**). Each participant stood approximately 1.2 m away from the screen.

In addition, we used a wireless 6DoF magnetic motion tracking system (Liberty Latus; Polhemus LTD, Colchester, VT, USA) to capture the position and orientation of participants' body segments in 3-dimensional space. Each participant held one motion sensor in their dominant hand while another sensor was attached to their waist, providing densely sampled (94 Hz) data about the hand and torso movements. The position of each participant's handheld sensor was mapped to a computer-generated avatar, a uniquely colored disk in the virtual scene constructed using custom software. By moving their hands, participants moved their respective disks: a displacement of the hand in the medial-lateral (ML) and superior-inferior (SI) axes resulted in an equal displacement of the disk on the screen [anterior-posterior (AP) hand movements did not affect the display]. As participants stood on opposite sides of the screen they could not see one another, but only the positions of one another's disks. The projected disks were of two sizes, 5 cm and 8 cm in diameter, and the alignment task required that the participant with the smaller disk maintain their position within the perimeter of the larger disk. We selected these relative disk sizes as they demanded that participants precisely coordinate their movements to be successful in the alignment task, but allowed for enough flexibility that the task was not extraordinarily difficult (piloting suggested a less than 5% error rate for these sizes). The relative size of each participant's disk was counterbalanced across trials.

# Procedure

During each trial, participants were asked to align their disks such that the smaller disk stayed within the perimeter of the larger disk (as in **Figure 1**). On a given trial each participant stood either with their feet shoulder-width apart (Easy) or in a tandem heel-to-toe stance that provided an additional challenge to maintaining upright stance (Hard). This resulted in four possible dyad stance conditions (Participant 1's stance – Participant 2's stance): Easy-Easy, Easy-Hard, Hard-Easy, and Hard-Hard. Stance conditions were crossed with control of the larger disk. Condition-trials were presented in random order in two blocks, resulting in 16 total trials. Each trial lasted for 45 s. To reduce the likelihood of arm fatigue, inter-trial intervals were a minimum of 30 s, at which time the experimenter would ask both participants if they were ready to continue. Participants were granted additional time if they so requested. Typically, breaks between trials lasted no longer than 60 s.
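The trial count follows from the design: 4 dyad stances × 2 disk-control assignments × 2 blocks = 16 trials. The sketch below illustrates one way such a schedule could be generated. It assumes that each block contains every condition exactly once in a freshly shuffled order, which the text implies but does not state outright; the function name and labels are hypothetical and are not taken from the authors' custom software.

```python
import itertools
import random

STANCES = ("Easy", "Hard")
LARGE_DISK_HOLDERS = ("Person 1", "Person 2")  # who controls the larger disk


def build_trial_schedule(n_blocks=2, seed=None):
    """Return a list of (block, dyad_stance, large_disk_holder) tuples.

    Assumption: every block presents all 8 conditions once, in random order.
    """
    rng = random.Random(seed)
    # Dyad stance is written as "<Person 1 stance>-<Person 2 stance>".
    dyad_stances = [f"{p1}-{p2}" for p1, p2 in itertools.product(STANCES, repeat=2)]
    conditions = list(itertools.product(dyad_stances, LARGE_DISK_HOLDERS))
    schedule = []
    for block in range(1, n_blocks + 1):
        order = conditions[:]
        rng.shuffle(order)
        schedule.extend((block, stance, holder) for stance, holder in order)
    return schedule
```

Calling `build_trial_schedule(seed=1)` yields 16 entries, 8 per block, with each dyad stance paired once with each disk-control assignment inside every block.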

Participants were given real-time feedback about performance—a red dot appeared in the center of each disk when participants were out of alignment. In addition, participants were provided feedback about their overall performance via a meter on the edge of the projection. This meter decreased anytime the two participants were not in alignment with one another at a rate of 5% reduction for each second of error. Participants were told that the bar represented a performance score and that they should strive to keep the bar as full as possible and not allow it to become completely empty.

# Movement Analyses

Movement time series were collected for the x, y, z positions of each of the four markers. Movements in x, y, z corresponded to movements along the ML axis, AP axis, and SI axis, respectively. Prior to analysis, these data were smoothed using a 10-Hz Butterworth filter. The first 4 s of each time series was truncated to remove transients (as the participants settled into the alignment task).

When analyzing hand movements, we initially focused on positions along the ML and SI axes (movements in AP had no effect on the position of the avatar disk). When analyzing torso movements, we focused on ML and AP (as participants stood the entire time, no appreciable changes were expected in SI). However, meaningful effects in hand and torso were only found in the ML axis; therefore, for all reported data we focus on our measures as they relate to movement in the ML axis.

# CRQA: Global Dynamics and Leader-Follower Analyses

Time series data were submitted to CRQA. CRQA is a nonlinear modeling technique that captures patterns of coordination between two interacting time series by indexing instances of their co-visitation in a shared, multidimensional phase space (Zbilut et al., 1998; Marwan and Kurths, 2004; Coco and Dale, 2014; Fusaroli et al., 2014) (see **Figure 1c**). The time series may be from different body segments of a single actor as in intra-personal coordination, or from two actors as in inter-personal coordination. Most natural systems have preferred states that they (approximately) revisit in stretches of repeating behavioral patterns, or recurrences (Poincaré, 1890). When dealing with dual time series, cross-recurrences may be interpreted as instances when one series is visiting a state that was occupied by the other at a previous point in time. The resulting structure of these cross-recurrences reveals important information about the organization and coordination dynamics of the system(s) under observation.

Cross-recurrence quantification analysis begins with the identification of cross-recurrent points and proceeds with several other measures that describe their relative number, density, distribution, and structure (Shockley, 2005). Visualizing these characteristics in reconstructed phase space is difficult when it has more than three dimensions. To this end, a simplified method involves indexing the cross-recurrent points between the embedded time series in an N × M binary matrix where N is the first time series and M is the second. Each point N<sub>i</sub> that is determined cross recurrent with M<sub>j</sub> is denoted with a mark at (i, j). CRQA is a quantitative analysis of this cross-recurrence plot (Eckmann et al., 1987; Marwan and Kurths, 2004) (see **Figure 1d**), and includes measures that highlight the density of cross-recurrent points, as well as their deterministic structure. For instance, the recurrence rate (RR) is the ratio of cross-recurrent points to all points in the phase space. RR is often used as an index of global coordination between two systems. When conducting CRQA, a non-trivial matter is the selection of the appropriate delay, embedding dimension, and radius parameters for the reconstructed phase space. Here, we selected the appropriate parameters for each trial based upon an optimization routine (Coco and Dale, 2014) using the average mutual information (Fraser and Swinney, 1986) and false nearest neighbors (Kennel et al., 1992) methods. The optimal radius was selected based upon the criterion that the final RR was between 3 and 5% (Shockley, 2005).
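To make the construction of the cross-recurrence plot concrete, the following is a minimal sketch rather than the analysis pipeline used in this study. It assumes time-delay embedding and a Euclidean distance threshold; the function names are hypothetical, and the delay, embedding dimension, and radius are passed in directly instead of being estimated by the optimization routine described above.

```python
import math


def embed(series, dim, delay):
    """Time-delay embedding: state t is (x[t], x[t+delay], ..., x[t+(dim-1)*delay])."""
    n_states = len(series) - (dim - 1) * delay
    return [tuple(series[t + k * delay] for k in range(dim)) for t in range(n_states)]


def cross_recurrence_matrix(n_series, m_series, dim, delay, radius):
    """Binary matrix with a 1 at (i, j) when embedded state N_i lies within radius of M_j."""
    n_states = embed(n_series, dim, delay)
    m_states = embed(m_series, dim, delay)
    return [[1 if math.dist(a, b) <= radius else 0 for b in m_states]
            for a in n_states]


def recurrence_rate(matrix):
    """Proportion of cells in the plot that are cross-recurrent."""
    n_cells = sum(len(row) for row in matrix)
    return sum(sum(row) for row in matrix) / n_cells
```

In practice the radius would be adjusted until `recurrence_rate` falls inside the target band (3 to 5% in the present analysis); how that search is carried out is left open here.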

Successive or adjacent recurrent points form lines that reflect the structure of the coordination between the time series. Diagonal lines mark instances where the two series are co-evolving or moving parallel with one another through phase space. DET, or determinism, is a measure of the percentage of cross-recurrent points that form these diagonal line structures. Assuming a relatively constant RR, greater DET suggests stronger (i.e., more frequent) coupling between the time series. To assess changes in the complexity of coordination, we used a measure related to the Shannon information entropy of the diagonal line lengths in the recurrence plot. The Shannon information entropy is sensitive to the number of lines in the recurrence plot. Relative entropy (rENTR) accounts for this bias by normalizing the entropy value against the number of lines in the recurrence plot (Coco and Dale, 2014). This allowed us to more faithfully compare across trials and conditions.
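The diagonal-line measures can be illustrated with a short sketch that operates on any binary cross-recurrence matrix (a list of rows of 0s and 1s). Several choices here are assumptions rather than details reported in the text: a minimum line length of 2, the natural logarithm, and, for the relative entropy, division by the log of the number of distinct line lengths (the maximum entropy attainable for that histogram). The normalization used by the authors' software may differ.

```python
import math
from collections import Counter


def diagonal_line_lengths(matrix):
    """Lengths of all runs of consecutive 1s along every diagonal of the plot."""
    rows, cols = len(matrix), len(matrix[0])
    lengths = []
    for offset in range(-(rows - 1), cols):  # offset = j - i
        i, j = max(0, -offset), max(0, offset)
        run = 0
        while i < rows and j < cols:
            if matrix[i][j]:
                run += 1
            elif run:
                lengths.append(run)
                run = 0
            i, j = i + 1, j + 1
        if run:
            lengths.append(run)
    return lengths


def determinism(matrix, min_len=2):
    """Share of recurrent points that sit on diagonal lines of at least min_len."""
    lengths = diagonal_line_lengths(matrix)
    total = sum(lengths)  # every recurrent point belongs to exactly one run
    return sum(n for n in lengths if n >= min_len) / total if total else 0.0


def line_entropy(matrix, min_len=2):
    """Return (Shannon entropy, relative entropy) of the diagonal line lengths."""
    counts = Counter(n for n in diagonal_line_lengths(matrix) if n >= min_len)
    n_lines = sum(counts.values())
    probs = [c / n_lines for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    # Assumed normalization: maximum entropy for the observed number of bins.
    max_entropy = math.log(len(counts)) if len(counts) > 1 else 0.0
    relative = entropy / max_entropy if max_entropy else 0.0
    return entropy, relative
```

Under this normalization the relative entropy lies between 0 and 1, which is what makes values comparable across plots that contain different numbers of lines.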

The percentage of recurrent points forming vertical lines (laminarity or LAM), as well as the average vertical line length (trapping time or TT), index the proportion and average duration of laminar states. In auto-recurrence (when a time series is compared against itself), these vertical line measures are typically interpreted as capturing the degree of intermittency or rigidity ("stickiness") in a system, that is, how often and how long a system gets stuck in one or more states for a given behavior (Kiefer and Myer, 2015). When considered in the context of overt behavior, an actor that can smoothly and efficiently transition between and among stable states of behavior would exhibit lower rigidity values than an actor that does not transition effectively. Indeed, decreases in both LAM and TT have been associated with greater functional flexibility in skill acquisition or development (Wallot and Grabowski, 2013; Kiefer and Myer, 2015). However, in cross-recurrence (Cox and van Dijk, 2013), these measures take on a slightly different meaning. In the context of two actors, M and N (see **Figure 1d**), the vertical line measures speak to actor M visiting a single point in phase space, and then actor N visiting that same space over consecutive temporal samples even as actor M has moved on. This could indicate that actor M led or constrained actor N into a certain movement pattern, before moving on, with the result that actor N maintains or is stuck in that movement pattern for a certain length of time longer than the duration of time actor M spent in that same space, or trajectory.
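As an illustration of the vertical-line measures, the sketch below computes LAM and TT from a binary cross-recurrence matrix stored as a list of rows. It assumes that rows index series N and columns index series M, so that a "vertical" line is a run of consecutive N samples recurring with a single M sample; it also assumes a minimum line length of 2. Both conventions are assumptions and may differ from the software used in this study.

```python
def vertical_line_lengths(matrix):
    """Lengths of runs of consecutive 1s within each column (fixed M_j, consecutive N_i)."""
    rows, cols = len(matrix), len(matrix[0])
    lengths = []
    for j in range(cols):
        run = 0
        for i in range(rows):
            if matrix[i][j]:
                run += 1
            elif run:
                lengths.append(run)
                run = 0
        if run:
            lengths.append(run)
    return lengths


def laminarity_and_trapping_time(matrix, min_len=2):
    """Return (LAM, TT): share of points on vertical lines and their mean length."""
    lengths = vertical_line_lengths(matrix)
    total = sum(lengths)  # total number of recurrent points
    lines = [n for n in lengths if n >= min_len]
    lam = sum(lines) / total if total else 0.0
    tt = sum(lines) / len(lines) if lines else 0.0
    return lam, tt
```

TT is expressed in samples; dividing by the sampling rate (94 Hz here) converts it to seconds.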

Leader-follower relationships may be further assessed by taking note of the symmetry properties of the cross-recurrence plot. In the cross-recurrence plot, an imaginary line of incidence (LOI) runs along the diagonal where i = j, that is, where N<sub>i</sub> and M<sub>j</sub> are sampled at the same time. This line represents points where both time series are exhibiting a 0-lag synchronization over consecutive samples. Cross-recurrent points in the triangular regions above and below the LOI represent points in time when one time series is revisiting a state previously occupied by the other at a given time delay (i.e., >0 lag in either direction). For example, cross-recurrent points where i > j indicate that N is visiting a state previously occupied by M, and cross-recurrent points where i < j indicate the opposite. While CRQA measures regarding the entire cross-recurrence plot provide metrics of the global dynamics, evaluating these regions separately allowed us to compare the structure of coordination as a consequence of which time series was ahead of the other. For example, greater DET in the upper region compared to the lower region in **Figure 1d** would suggest that the coordination between the two series is more tightly coupled when time series M is entering states at consecutive time points previously occupied by time series N over consecutive time points, but with a time-lag greater than 0.

Time lags can be quantified via the orthogonal distance from any cross-recurrent point to the LOI. A measure of the diagonal-wise RR, then, provides a measure of the density of recurrent points at a particular time lag (a measure analogous to cross-correlation). A simple summary of the diagonal-wise RR involves indexing the lag at which it is greatest within a selected window around the LOI, providing a measure of the degree of leader-follower relationships between the time series (Dale and Spivey, 2006; Warlaumont et al., 2014). Here, we refer to this value as LAGMAX.
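The diagonal-wise recurrence profile and the lag of its maximum can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the matrix is a list of rows with rows indexing N and columns indexing M, lag is defined as j − i (so a positive lag means M revisits states that N occupied earlier), and ties are resolved toward the smallest absolute lag. The sign convention in the reported results may be the reverse, and the function names are hypothetical.

```python
def diagonal_recurrence_profile(matrix, max_lag):
    """Recurrence rate along each diagonal j - i = lag, for lags in [-max_lag, max_lag]."""
    rows, cols = len(matrix), len(matrix[0])
    profile = {}
    for lag in range(-max_lag, max_lag + 1):
        cells = [matrix[i][i + lag] for i in range(rows) if 0 <= i + lag < cols]
        profile[lag] = sum(cells) / len(cells) if cells else 0.0
    return profile


def lag_of_max_recurrence(matrix, max_lag, sample_rate_hz):
    """Return (lag in samples, lag in ms) at which diagonal-wise RR peaks."""
    profile = diagonal_recurrence_profile(matrix, max_lag)
    # Prefer the highest RR; among equal RRs prefer the lag closest to zero.
    best = max(profile, key=lambda lag: (profile[lag], -abs(lag)))
    return best, 1000.0 * best / sample_rate_hz
```

With the 94 Hz sampling rate used here, one sample of lag corresponds to roughly 10.6 ms, so a window of ±94 samples would cover lags of up to one second in either direction.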

# Measures and Design


Intra-personal analysis included measures of movement variability of hand and torso, as well as CRQA measures of within-individual hand-torso coordination. Inter-personal analyses focused on the coordination between the two actors' hands, including both CRQA measures and task performance measures. For both levels of analyses, we tested for differences in actors' behavior as a function of their relative stances (dyad stance: Easy-Easy, Easy-Hard, Hard-Easy, and Hard-Hard) and which actor controlled the larger disk. When one actor controlled the smaller disk, the other, by definition, controlled the larger disk. We considered that this manipulation may have defined distinct, a priori roles for the dyad members—for example the larger disk may be interpreted as a boundary for the smaller disk to remain within.

Both intra-personal and inter-personal analyses included a 2 (disk control) × 4 (dyad stance) repeated measures design. In the case of intra-personal coordination, we were also concerned with identifying differences between co-actors. As such, in our intra-personal analyses we crossed disk control and dyad stance with a between-factor for actor (Person 1 or Person 2 of the dyad). In the case of inter-personal coordination, our concern was not in differences between actors, per se, but instead differences in coordination as a function of which actor was moving ahead of, or leading, the other. Therefore, for these analyses, disk control and dyad stance were instead crossed with an additional within-factor to account for differences in CRQA measures as a function of triangular region (upper triangle or lower triangle).

# RESULTS

# Individual Level Analyses

#### Movement Variability

We quantified movement variability as the standard deviation of effector (hand and torso) position during each trial. While hand movement variability tended to be greater in the Hard-Hard dyad stance relative to the Easy-Easy dyad stance, no significant main effects nor any interactions were observed for either hand or torso (ps > 0.05).
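As a concrete reading of this measure, the sketch below drops the initial transient and takes the standard deviation of the remaining ML positions. It assumes the 94 Hz sampling rate and 4 s truncation described in the Methods, uses the sample (n − 1) standard deviation, and omits the low-pass filtering step; whether the authors used the sample or population estimator is not stated, and the function name is hypothetical.

```python
import statistics


def movement_variability(positions, sample_rate_hz=94.0, transient_s=4.0):
    """Standard deviation of effector position after removing the initial transient."""
    n_drop = int(round(sample_rate_hz * transient_s))  # 376 samples at 94 Hz
    steady = positions[n_drop:]
    if len(steady) < 2:
        raise ValueError("time series too short after removing the transient")
    return statistics.stdev(steady)
```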

# Hand-Torso Coordination

Analysis of variance (ANOVA) of intra-personal hand-torso DET revealed a dyad stance × actor interaction [F(3,66) = 16.92, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.43]. When dyads were in identical stances (Easy-Easy and Hard-Hard) there was no difference in DET between actors. When actors were in different stances (Easy-Hard and Hard-Easy) the actor in the Hard stance condition exhibited greater DET between hand and torso. Overall, individuals' DET was greater when actors were in the Hard stance compared to the Easy stance. ANOVA also revealed a dyad stance × control interaction [F(3,66) = 5.84, p = 0.001, η<sub>p</sub><sup>2</sup> = 0.21].

Similar patterns of significant effects (see **Table 1**) were observed for rENTR, LAM, and TT; however, we note that the dyad stance × control interaction was non-significant (p = 0.060) for TT. A graphical representation of these effects may be found in **Figure 2**.

#### Summary of Individual-Level Analyses

Before continuing to our inter-personal data, we briefly revisit the intra-personal results and their implications. First, we hypothesized that our stance manipulation would impact actors' postural stability. However, our initial analysis of the data did not yield the anticipated increases in torso sway variability. In light of previous conflicting findings (Ramenzoni et al., 2011), we considered the possibility that any individual changes in movement variability may have been obscured by the design of our original analysis. For example, Ramenzoni et al. (2011) analyzed movement variability as a function of each actor's own stance (e.g., Easy stance) independent of the stance of their partner (e.g., Hard stance), while our primary concern was how each actor's demands related to their partner's within a particular dyad stance (e.g., dyad stance: actor in Easy and partner in Hard). To address this possibility, we re-analyzed each actor's hand movement and torso movement variability only as a function of their own stance (Easy or Hard) and their prescribed control (larger or smaller disk). While torso movement variability tended to be greater when actors were in the Hard stance (p = 0.058), hand movement variability remained indifferent to these two factors. Indeed, it has been observed that variability in goal-directed arm movements may remain immune to the effects of increased postural challenges (Voudouris et al., 2013). Here, the observation that hand movement variability was relatively immune to the independent effects of one's own stance suggests that individuals' hands and torsos were behaving in a synergistic fashion to meet the precision demands of the task.

Our data offer additional support for the intra-personal synergy hypothesis: actors in our task reorganized the coordination between their hand and torso to compensate for increased challenges to stance. More specifically, individual actors exhibited greater regularity (DET) and complexity (rENTR) of intra-personal coordination when in the more difficult stance condition. These changes were accompanied by increased intermittency (increases in LAM and TT) in the coordination between hand and torso. Moreover, consistent with our hypotheses, when pairs of actors faced different stance demands, these measures differentiated pairs of actors in mixed-stance conditions (Easy-Hard and Hard-Easy), but were similar across actors when they performed the task while in identical stances (Easy-Easy and Hard-Hard).

#### TABLE 1 | Intra-personal coordination effects.

<sup>+</sup>p < 0.05; <sup>∗</sup>p < 0.01; <sup>∗∗</sup>p < 0.001.

FIGURE 2 | Measures of intra-personal hand-torso coordination. Here and in all remaining figures (unless otherwise noted), error bars represent 95% confidence intervals. (<sup>∗</sup>) denotes p < 0.05; (+) denotes p < 0.10.

Taken together, these intra-personal results had important implications for our inter-personal analyses. First, they supported the broad hypothesis that our actors faced differing task demands due to our experimental manipulations. These differences resulted in systematic changes in patterns of intra-personal coordination between hand and torso. We further hypothesized that differences in intra-personal constraints and coordination would result in differences in the observed patterns of inter-personal coordination—most notably the leader-follower relationship between members of the dyad.

For example, one interpretation of the LJH framework would predict that the actor exhibiting greater complexity in intra-personal coordination should be the follower in the joint task. In our present study, complexity was indexed by the relative entropy of cross-recurrences between hand and torso (rENTR). Given our intra-personal results, this hypothesis would predict that when actors were in mixed stances, the actor in the more difficult stance (greater rENTR) would be more likely to be the follower in coordinating to meet the joint precision task. Conversely, in the same-stance conditions no differences were observed between actors' intra-personal rENTR, suggesting that leader-follower relationships in these conditions were less likely to be systematic. In what follows, we test this hypothesis as it relates to our data.

# Dyad Level Analyses

#### Leader-Follower Analyses

When considering inter-personal coordination, LAGMAX indexed the degree to which one of the participants led the other in coordinated movement. Here, positive LAGMAX indicated that Person 2 led the coordination, and negative LAGMAX indicated that Person 1 led. ANOVA confirmed that dyad stance had a significant effect on inter-personal LAGMAX for hands [F(3,33) = 26.02, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.70]. As we were concerned with whether this manipulation produced a meaningful lead-lag, we tested whether each resulting condition mean was different from zero by using 95% confidence intervals, where 0 ∉ Mean ± 95% CI was considered significant. Consistent with our predictions of leader-follower emergence, mean LAGMAX was significantly greater than zero in the Easy-Hard (Person 1-Person 2) stance condition (447 ± 182 ms) and less than zero in the Hard-Easy condition (−517 ± 184 ms). When actors were in identical stances there was no significant lag (see **Figure 3**). No statistically significant effect was observed for disk control, nor was there any observed interaction (ps > 0.05).
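The lead-lag logic described above can be illustrated with a minimal sketch. It is not the analysis pipeline used in the study: it assumes one-dimensional position series without phase-space embedding, a fixed recurrence radius, and a sign convention chosen to match the text (a peak at a positive lag means Person 2 led). The function names, the toy signals, and the critical value `t_crit` (2.201, the two-tailed 95% value for 11 degrees of freedom, assuming 12 dyads) are illustrative assumptions.

```python
import math
import statistics


def diagonal_profile(p1, p2, max_lag, radius):
    """Recurrence rate along each diagonal of a cross-recurrence plot.

    profile[k] is the fraction of samples t for which p1[t + k] lies within
    `radius` of p2[t]. Under this assumed convention, a peak at k > 0 means
    Person 1 revisits states that Person 2 occupied k samples earlier,
    i.e., Person 2 leads. Requires max_lag < min(len(p1), len(p2)).
    """
    n = min(len(p1), len(p2))
    profile = {}
    for k in range(-max_lag, max_lag + 1):
        pairs = [(p1[t + k], p2[t]) for t in range(n) if 0 <= t + k < n]
        hits = sum(1 for a, b in pairs if abs(a - b) < radius)
        profile[k] = hits / len(pairs)
    return profile


def lagmax(profile, dt_ms):
    """Lag (in ms) at which the profile peaks; ties resolve to the earliest lag."""
    best = max(profile, key=profile.get)
    return best * dt_ms


def ci_excludes_zero(values, t_crit):
    """Return (mean, CI half-width, True if 0 lies outside mean ± half-width)."""
    m = statistics.mean(values)
    half = t_crit * statistics.stdev(values) / math.sqrt(len(values))
    return m, half, abs(m) > half


# Toy signals: Person 1 reproduces Person 2's movement 5 samples later.
p2 = [math.sin(2 * math.pi * t / 50) for t in range(200)]
p1 = [math.sin(2 * math.pi * (t - 5) / 50) for t in range(200)]
profile = diagonal_profile(p1, p2, max_lag=20, radius=0.05)
print(lagmax(profile, dt_ms=10))  # positive value: Person 2 leads
```

A per-condition test then amounts to collecting one LAGMAX value per dyad and passing the list to `ci_excludes_zero`.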

#### Interpersonal Hand Coordination

Main effects of dyad stance were found for DET [F(3,33) = 9.04, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.45], rENTR [F(3,33) = 10.26, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.48], LAM [F(3,33) = 3.72, p < 0.021, η<sub>p</sub><sup>2</sup> = 0.25], and TT [F(3,33) = 4.37, p < 0.011, η<sub>p</sub><sup>2</sup> = 0.28]. Each measure increased as a function of dyad stance difficulty: Easy-Easy was lowest, Easy-Hard and Hard-Easy were intermediary, and Hard-Hard was highest (see **Figure 4**). Moreover, dyad stance × triangle interactions were observed for TT [F(3,33) = 3.18, p < 0.037, η<sub>p</sub><sup>2</sup> = 0.22] and rENTR [F(3,33) = 5.43, p = 0.004, η<sub>p</sub><sup>2</sup> = 0.33]. In the asymmetrical dyad stance conditions, TT was greater in periods when the time series of the actor in the Hard stance was ahead of the actor in the Easy stance. No significant differences were observed when actors were in identical stances. The interaction effect for rENTR was primarily driven by a simple effect for triangle in the Hard-Easy stance condition. No simple effects for triangle were observed in the remaining dyad stance conditions.

FIGURE 3 | (Top) LAGMAX as a function of dyad stance. Lags less than zero indicate that Person 1 led the coordination at the hands; greater than zero indicates Person 2 led. As illustrated here, a leader-follower dynamic emerged in conditions where actors were in mixed stances, where the actor in the Hard stance tended to lead the actor in the Easy stance. (Bottom) Mean lag profiles (±1000 ms) of each stance condition. Mixed dyad stance conditions (Easy-Hard and Hard-Easy) are in black. Note that these conditions produce greater asymmetries in diagRR about 0 compared to the relatively flat curves in conditions when actors were in similar stances (in gray). Error bars represent standard error.

Notably, disk control did not have a significant main effect on any of our output measures.

### Task Performance

We measured task performance on each trial in two manners. An overall score was provided by the height of the performance meter, which was in turn a function of the amount of time spent successfully performing the alignment task. In addition, a continuous time series of inter-disk distances was used to analyze the precision with which participants performed the task.

Neither dyad stance nor control had any significant effect on overall task performance (the amount of time spent in alignment). However, dyad stance did have an effect on the average distance between the centers of the two actors' disks [F(3,33) = 3.53, p = 0.025, η<sub>p</sub><sup>2</sup> = 0.24]. Overall, when participants were in the Easy-Easy stance condition, they kept their avatar disks in tighter alignment (mean distance: 0.69 cm; SD: 0.12 cm) compared to the remaining three stance conditions (mean distances all greater than 0.75 cm). That said, participants were able to perform the task exceptionally well in all conditions and overall task performance was preserved in spite of increases in stance difficulty.

#### Summary of Joint-Level Analyses

Consistent with previous work (Ramenzoni et al., 2011) we found our measures of inter-personal coordination varied as a function of our dyads' shared stances. The regularity (DET) and complexity (rENTR) of coordination between actors' hands increased from Easy-Easy to mixed stance (Easy-Hard and Hard-Easy) to Hard-Hard stance conditions. Analogous increases in our laminar measures (LAM and TT) indicate that flexibility decreased in a commensurate manner. Put another way, these changes tracked with the increases in the combined stance difficulty—when both actors were in the Easy stance their combined stance difficulty was relatively lower than the challenges faced when both actors were in the Hard stance, while mixed conditions were intermediary. Viewed in this light, the pattern of inter-personal coordination effects is consistent with our observed intra-personal effects. When faced with additional challenges to completing the task, actors compensated in similar manners at both levels of coordination.

Importantly, as indicated by the observed LAGMAX data, we found evidence of leader-follower dynamics between actors. These observations were consistent with our general working hypothesis: leader-follower relationships in interpersonal hand coordination were most pronounced in conditions when actors faced asymmetric stance demands. This result was supported by TT measures indicating that in mixed-stance conditions the average duration in which one actor was stuck in a state previously occupied by another was greater in regions where the actor in the Hard stance entered those states first. Notably, the direction of this relationship was not as we predicted given observed differences in the complexity of intra-personal coordination. Motivated by findings that extend the LJH to joint action, we predicted that actors exhibiting greater complexity in intra-personal coordination would be more likely to follow in the joint task. However, our results indicate the opposite: actors in the Hard stance, though typically exhibiting greater intra-personal rENTR, tended to be leaders in the interpersonal hand coordination.

# Analysis by Role

#### Motivation and Model Definition

Our results indicated that the emergence of leader-follower roles in interpersonal hand coordination was most pronounced in conditions where co-actors faced asymmetrical stance demands. In light of this result, we re-analyzed the intra-personal dependent measures as a function of each actor's role (leading vs. following), actor's control (smaller disk vs. larger disk) and the dyad's stance symmetry (different stances vs. same stances). A leader and a follower were determined for each trial using the interpersonal hand LAGMAX values. Because participants were not experimentally assigned to "role" and, therefore, our groups were unbalanced, we determined the relationship between our dependent measures and factors using a linear mixed effects regression model with emergent role, actor's control, and stance demands as fixed effects and dyad as a random effect. For brevity we present only the F-tests from the results here (type III Wald F-tests with Kenward–Roger degrees of freedom approximation).
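As a minimal sketch of how emergent roles could be read off the sign of LAGMAX, and of why the resulting cells are unbalanced, consider the following. The trial values are hypothetical, the function name is illustrative, and the handling of a LAGMAX of exactly zero (no role assigned) is an assumption, since the text does not say how such trials were treated.

```python
from collections import Counter


def assign_roles(lagmax_ms):
    """Return (Person 1 role, Person 2 role) for one trial.

    Follows the sign convention in the text: positive LAGMAX means
    Person 2 led, negative LAGMAX means Person 1 led.
    """
    if lagmax_ms > 0:
        return ("follower", "leader")
    if lagmax_ms < 0:
        return ("leader", "follower")
    return (None, None)  # assumption: zero lag leaves roles undefined


# Hypothetical trials: (Person 1 stance, Person 2 stance, LAGMAX in ms).
trials = [
    ("Easy", "Hard", 450.0),
    ("Hard", "Easy", -520.0),
    ("Easy", "Easy", 30.0),
    ("Hard", "Hard", -10.0),
    ("Easy", "Hard", 380.0),
]

# Tally which stance the leader occupied on each trial.
leader_stances = Counter()
for stance1, stance2, lag in trials:
    role1, role2 = assign_roles(lag)
    if role1 == "leader":
        leader_stances[stance1] += 1
    elif role2 == "leader":
        leader_stances[stance2] += 1

print(dict(leader_stances))
```

Because role is an outcome rather than an assigned factor, the role-by-stance counts will generally differ across cells, which is the situation a mixed effects model is suited to handle.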

## Intra-Personal Coordination as a Function of Leader-Follower

As we anticipated, intra-personal coordination measures were observed to vary along with emergent role. Overall, leaders exhibited stronger hand-torso coupling (DET) than followers [F(1,25.3) = 15.67, p < 0.001]. This difference was exaggerated when actors faced different stance demands [interaction effect: F(1,206.7) = 7.22, p = 0.007]. Similar relationships were also observed for LAM [leaders greater than followers: F(1,26.5) = 18.28, p < 0.001; interaction effect: F(1,207.3) = 10.83, p = 0.001] and TT [leaders greater than followers: F(1,26.5) = 12.54, p = 0.002; interaction effect: F(1,202.0) = 12.52, p < 0.001]. Leaders also exhibited greater rENTR than followers [F(1,23.3) = 6.13, p = 0.021]; however, no interaction was observed (see **Figure 5**). Notably, similar linear mixed effects regression models for hand movement and torso movement variability did not reveal significant results: neither varied according to emergent role.

# GENERAL DISCUSSION

When two people organize their actions to achieve a shared goal, their combined efforts reflect a nested structure of intra-personal and inter-personal coordination. Combined efforts, however, almost never equate to identical efforts. Differences between actors' skill and physical abilities often result in asymmetries in task demands and, as a result, individuals working together often need to perform distinct and complementary actions in order to complete a shared task. In the present study, we investigated how these asymmetries influence both intra-personal and inter-personal coordination during a joint supra-postural task, focusing on the spontaneous emergence of leader-follower roles when performing this cooperative task.

To briefly revisit our hypotheses, at the outset we predicted that (1) actors in the tandem stance would face greater individual challenges to postural stability (as indexed by movement variability) compared to actors in the feet-apart stance, and (2) these increases in stance difficulty would be reflected in the coordination between hand and torso to meet the task's precision demands. More specifically, we anticipated that (3) actors in the difficult, tandem stance condition would exhibit greater complexity and intermittency in intra-personal coordination compared to actors in the easier feet-apart condition. Provided this result, we predicted that (4) actors in the Hard stance condition would be more likely to follow their Easy-stance confederates; that is, their hand movements would slightly lag behind the movements of partners in the easier stance. We predicted that (5) these lead-lag relationships would be most systematic during trials when actors were in different stances. When actors were in identical stances we anticipated that the presence of lead-lag would be less pronounced.

Our data support many of our original hypotheses, with a caveat regarding hypothesis 1 and an exception for hypothesis 4. At the level of individuals, we found evidence that actors in tandem stance faced additional challenges during the joint task. While our analyses revealed no changes in hand movement variability, they hinted at increased postural sway variability for actors in tandem stance. More notably, actors in tandem stance did exhibit changes in the organization of intra-personal coordination in line with our prediction and consistent with previous literature. These changes included increases in the regularity, intermittency, and complexity of coordination between hand and torso in order to meet the precision task demands. One interpretation of this result is that actors in the more difficult stance condition faced a reduction in the number of available states (degrees of freedom) that they could occupy, or were willing to occupy, and still complete their task. For example, when in the tandem stance, an actor's movements needed to be more tightly constrained lest the actor lose their balance.

When framed as above, it is perhaps not surprising that, contrary to our predicted direction, actors in the more difficult stance tended to lead their Easy-stance confederates. Actors with compromised postural stability may have had less opportunity or flexibility to adapt to the activity of their partners. In turn, the more stable and meta-flexible actor, that is, the actor who was able to respond to their partner by optimizing their rigidity and flexibility without becoming stuck or falling apart (Pincus and Metten, 2010), used their flexibility for the benefit of the dyad. Indeed, analogs to these sorts of counterbalancing relations abound in the motor literature with respect to injury and compensatory reorganization of other body segments, much like when one's right leg bears an additional load if the knee or ankle of the left leg is sprained. Our dyads were able to organize their actions to meet the shared task demands: the challenges and changes had no appreciable effect on the degree to which pairs of individuals were able to complete the task. Thus, while individuals were able to work together with similar competence across our experimental conditions, they organized their intra-personal and inter-personal activity in very different manners depending on the prevailing task constraints.

It is also worth noting potential distinctions between the complexity measures used here and those often employed in research regarding the LJH. In particular, the LJH predicts that the increased variability and complexity about the subordinate joints is the result of the subordinate joints resolving interacting torques that are produced during action. As such, the hypothesis makes specific claims about components that are mechanically linked. This was also the case in Bosga et al.'s (2010) extension to joint action, in which actors' movements were mechanically linked via a rocking board. Here, no such linkages were present between actors. Instead, our actors were informationally linked. Though informational couplings have been shown to produce constraints similar in kind to mechanical couplings, it is routinely the case that there are important differences in the characteristics of the coupling produced (Schmidt and Richardson, 2008). Our results suggest that the LJH may not be the appropriate framework for addressing tasks of these sorts. At the same time, the rENTR measure may not be synonymous with the complexity measures typically employed in the LJH literature. Rather than focusing on movement fluctuations between body segments, rENTR speaks to the complexity/homogeneity of their coupling through time.

Our central focus was identifying relationships between individual task demands, individual task dynamics, and the self-organization of leader-follower roles in joint tasks. To this end, our data demonstrate systematic relationships between individual task difficulty, the organization of action within individuals (intra-personal coordination) and the organization of action across individuals (inter-personal coordination). It is notable that the pattern of effects for both intra- and inter-personal coordination was similar in kind. Measures of inter-personal hand coordination tended to increase as a function of the dyad's shared-stance difficulty: lowest when both members of the dyad were in an Easy stance, greatest when both members were in the Hard stance, and intermediate when the individual stance difficulties were mixed. These results suggest similar compensatory processes occurring within and across individuals in order to meet changes in individual and joint task demands. Whether these increases were a functional response to the task demands or a result of a reduction in available degrees of freedom remains an open question.

Interestingly, the a priori assignment of role (disk size) appeared to have no appreciable effect on emergent role; that is, who controlled the larger disk or the smaller disk did not have any statistically significant influence on who led and who followed. We note, however, that using a similar paradigm, Davis et al. (2016) were able to identify a relationship between disk control and dyadic performance using a complementary nonlinear analysis technique, multi-fractal detrended fluctuation analysis. That no effect was found here may be due to a lack of sensitivity of our CRQA measures, or a lack of additional manipulations specifically targeting this factor.

Most germane to our study, we observed that members of the dyad organized their activity into leader-follower roles when facing asymmetrical task demands. While lead-lag relationships have been investigated using CRQA methods in conversational settings (Richardson and Dale, 2005; Dale and Warlaumont, 2011), only recently have similar methods been directed at the lead-lag analysis of body movements during goal-directed joint activities (Abney et al., 2015). Here we employed an analysis that allowed us to further compare the deterministic structure of inter-personal coordination depending upon when one actor "took the lead" compared to the other. In particular, our results regarding the laminar states of intra-personal and interpersonal coordination are revealing. Increases in LAM and TT in intra-personal coordination suggest that actors in more difficult stance conditions may have had more difficulty (or reluctance) transitioning among available stable states of behavior. Scaling up to the level of the dyad, inter-personal coordination measures indicated a lack of flexibility in the coordination between the pair, with one visiting and becoming trapped in a state that the other had previously occupied. In this regard, the multi-agent coordination was driven by the relative flexibility of each of the actors in achieving their individual task demands. Thus, it may be possible in the future to assess the performance weight of each component of a coupled system as an indicator of who would lead and who would follow in the group, with overall group performance indexed via the laminarity measure of the dyad.

Our results, by extension, also suggest that whatever dynamics are observed at larger scales may also be observed (although, perhaps not always manifestly apparent) at smaller scales within multi-agent activity. When considering interpersonal synergies, the character of the synergy should be identifiable at any insertion point of measurement: we may characterize the collective behavior of a multi-agent system through measurement of overall group dynamics. While such an approach may not allow for the direct comparison between all members, it may provide a level of prediction that would suffice for probabilistic behavior (or behavioral capacities) of the group. More broadly, this result is consonant with recent efforts to address interpersonal activity within the framework of interaction-dominant dynamics (Van Orden et al., 2003; Diniz et al., 2011; see, for example, Riley et al., 2011). In contrast to component-dominant dynamics, which characterize systems in terms of local-scale effects between relatively static structures, interaction-dominant dynamics are characterized by effects across a range of scales. The observed similarities between intra- and inter-personal coordination dynamics do not by themselves provide conclusive evidence that the dyadic coordination in the present task represents an interaction-dominant system. Indeed, our analysis does not directly test for this possibility. However, when couched with recent investigations of interpersonal coordination that more explicitly address this possibility (e.g., Bedia et al., 2014; Dumas et al., 2014), including in a similar task (Davis et al., 2016), our finding bolsters this hypothesis. An important takeaway from these results, then, is that a proper characterization of interpersonal behavior may necessitate looking across scales, as it is likely that multiple scales of activity are contributing to the global dynamic.

# CONCLUSION

To successfully engage in a joint action, individual actors must often resolve their own, local task demands. How individuals meet these demands may, at times, be wholly intrinsic, but more than likely is due to the influence of the activity of other co-actors. Here, we showed how individual task demands influenced coordination at the intra-personal and inter-personal scales, most prominently resulting in the organization of leader-follower roles in the joint action. Given that our actors did not have any specific knowledge of one another's task demands, this raises the possibility that the observed activity was organized around some informational variable related to the visual display; that is, there may have been something in the way the disks moved that influenced the emergence of roles in the present task. Future directions may seek to explore this possibility, and may offer further avenues of inquiry in the relationships between individuals and groups in joint actions.

# ETHICS STATEMENT

All procedures were conducted in agreement and accordance with the guidelines and approval of the University of Connecticut Institutional Review Board. Each participant provided verbal informed consent prior to the start of the study.

# AUTHOR CONTRIBUTIONS

TD conceived of the study, designed the experiments, and drafted the manuscript. GP collected the data and performed the CRQA and statistical analyses. AK drafted the manuscript and made important theoretical contributions. All authors have read and approved the final version of the manuscript, and agree with the order and presentation of the authors.

# FUNDING

We wish to acknowledge the support of the National Science Foundation (INSPIRE: BCS-SBE-1344275) and CAPES Foundation, Ministry of Education of Brazil (BEX 0803-14-6) in the preparation of this manuscript.

# ACKNOWLEDGMENTS

We are grateful to Mikol Nguyen for his help with data collection and all around good research assistantship. We also thank James Dixon and Michael Turvey for their suggestions and recommendations on previous versions of this draft.

# REFERENCES

Abney, D. H., Paxton, A., Dale, R., and Kello, C. T. (2015). Movement dynamics reflect a functional role for weak coupling and role structure in dyadic problem solving. Cogn. Process. 16, 325–332. doi: 10.1007/s10339-015-0648-2

Athreya, D. N., Riley, M. A., and Davis, T. (2014). Visual influences on postural and manual interpersonal coordination during a joint precision task. Exp. Brain Res. 232, 2741–2751. doi: 10.1007/s00221-014-3957-2

Balasubramaniam, R., Riley, M. A., and Turvey, M. T. (2000). Specificity of postural sway to the demands of a precision task. Gait Posture 11, 12–24. doi: 10.1016/S0966-6362(99)00051-X


coordination in female athletes who sustain a second anterior cruciate ligament injury after anterior cruciate ligament reconstruction and return to sport. Clin. Biomech. 30, 1094–1101. doi: 10.1016/j.clinbiomech.2015.08.019


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Davis, Pinto and Kiefer. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Synchrony in Joint Action Is Directed by Each Participant's Motor Control System

Lior Noy<sup>1,2</sup>, Netta Weiser<sup>3</sup> and Jason Friedman<sup>3,4</sup>\*

<sup>1</sup> Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel, <sup>2</sup> The Theatre Lab, Weizmann Institute of Science, Rehovot, Israel, <sup>3</sup> Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel, <sup>4</sup> Department of Physical Therapy, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel

In this work, we ask how the probability of achieving synchrony in joint action is affected by the choice of motion parameters of each individual. We use the mirror game paradigm to study how changes in the leader's motion parameters, specifically frequency and peak velocity, affect the probability of entering the state of co-confidence (CC) motion: a dyadic state of synchronized, smooth and co-predictive motions. In order to systematically study this question, we used a one-person version of the mirror game, where the participant mirrored piece-wise rhythmic movements produced by a computer on a graphics tablet. We systematically varied the frequency and peak velocity of the movements to determine how these parameters affect the likelihood of synchronized joint action. To assess synchrony in the mirror game we used the previously developed marker of co-confident (CC) motions: smooth, jitter-less and synchronized motions indicative of co-predictive control. We found that when mirroring movements with low frequencies (i.e., long duration movements), the participants never showed CC, and as the frequency of the stimuli increased, the probability of observing CC also increased. This finding is discussed in the framework of motor control studies showing an upper limit on the duration of smooth motion. We confirmed the relationship between motion parameters and the probability of performing CC with three sets of data from open-ended two-player mirror games. These findings demonstrate that when performing movements together, there are optimal movement frequencies to use in order to maximize the possibility of entering a state of synchronized joint action. They also show that the ability to perform synchronized joint action is constrained by the properties of our motor control systems.

Keywords: visuomotor tracking, mirror game, intermittent control, joint action, motor control

# INTRODUCTION

In order to succeed in performing a joint action, for example, lifting a heavy object together, the individual actors need to be coordinated (Sebanz et al., 2006). This social coordination can be challenging, in particular when the performed joint action is open-ended, as in the case of jointly improvised motion (Dumas et al., 2010; Noy et al., 2011; Watanabe and Miwa, 2012; Noy, 2014; Hari et al., 2015; Gueugnon et al., 2016a; Feniger-Schaal and Lotan, 2017; Słowiński et al., 2017).

#### Edited by:

Michael J. Richardson, University of Cincinnati, USA

#### Reviewed by:

Johann Issartel, Dublin City University, Ireland Lincoln John Colling, University of Cambridge, UK

> \*Correspondence: Jason Friedman jason@tau.ac.il

#### Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

Received: 03 November 2016 Accepted: 23 March 2017 Published: 10 April 2017

#### Citation:

Noy L, Weiser N and Friedman J (2017) Synchrony in Joint Action Is Directed by Each Participant's Motor Control System. Front. Psychol. 8:531. doi: 10.3389/fpsyg.2017.00531

One strategy that reduces the challenge of social coordination is seeking common ground. For example, when two people are asked to independently choose a meeting point in a foreign city (e.g., Paris, the so-called Schelling game), they often manage to pick the same salient location, for example, the Eiffel Tower, from their common ground (Schelling, 1960; Clark, 1996; Vesper et al., 2011). In the context of conversation, common ground is defined as the knowledge, beliefs and assumptions of the participants about what they mutually know (Clark, 1996; Schober and Spiro, 2014). During a conversation, participants develop a hierarchy of aligned representations, the implicit common ground (Garrod and Pickering, 2004). This common ground is used to align meaning through a process of interactive alignment at lower levels such as particular choices of words or the alignment of body postures (Garrod and Pickering, 2009).

When performing joint action, people converge to an implicit common ground by moving in a more predictable way than when moving alone. For example, participants reduce the variability of their movement when they need to coordinate key presses in a reaction time task with a partner (Vesper et al., 2011) or to perform joint hopping (Vesper et al., 2013). Making your behavior more predictable is one mechanism for achieving successful joint action.

A recent finding on improvised joint motion can be interpreted according to the mechanism of convergence to an implicit common ground. In previous studies, we examined improvised joint motion using the mirror game paradigm – a theater-based practice in which two actors improvise synchronized and interesting motion together (Noy et al., 2011; Noy, 2014). In the experimental one-dimensional mirror game, pairs of participants create synchronized motion together by moving handles on parallel tracks (Noy et al., 2011, 2015b; Hart et al., 2014; Zhai et al., 2015; Zhao et al., 2015; Dahan et al., 2016; Gueugnon et al., 2016a; Słowiński et al., 2017). A main finding from these mirror game studies is that players can enter a dyadic pattern of synchronous movement using predictive control. This pattern of synchronized motion is characterized by smooth and jitter-less motion, without the typical jitter resulting from reactive control in a leader-follower dynamic. This dyadic pattern was termed co-confident motion (CC motion) (Noy et al., 2011, 2015a) and has been suggested as an experimental proxy for the state of togetherness (Hart et al., 2014; Hari et al., 2015; Noy et al., 2015b), a dyadic state of high synchrony and high performance, related to the notions of group flow (Sawyer, 2008) and being in the zone (Seham, 2001; Noy, 2014).

In a recent work, we analyzed the kinematic properties of basic movement elements (motion strokes between stopping events) during CC motion (Hart et al., 2014). We found that different players converge to a canonical pattern when they enter the dyadic state of CC motion. This canonical pattern consists of symmetrical basic movements, resembling a sine wave. These movements do not have the individual tendencies observed when players are in a leader-follower dynamic, for example, the tendency to move in a non-symmetric way with high skewness. It seems that during CC motion segments, participants shed their individual motion style in order to reach a common ground that supports synchronized joint action.

Interestingly, the canonical motion pattern that was observed in synchronized CC motion was identical to the optimal solution of a well-known computational motor control model. According to the minimum jerk model – a classical motor control model that describes a wide variety of human movements (Flash and Hogan, 1985) – the optimal solution for rhythmic motion (as opposed to point-to-point motion) is a sine wave (Hogan and Sternad, 2007). It is possible that during CC periods, the two players converge to a canonical pattern stemming from an optimal state of each participant's motor control system. This connection suggests a general mechanism for achieving synchrony in joint action: finding the common ground stemming from the similar motor control systems of the two participants.
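For reference, the minimum jerk criterion in its standard one-dimensional form, and its well-known solution for a point-to-point movement that starts and ends at rest, can be written as follows. The symbols ($x_0$, $x_f$, $T$, $\tau$) are notation introduced here for illustration; the rhythmic (sine wave) case discussed above is not derived in this sketch.

```latex
% Cost: time integral of squared jerk over a movement of duration T
J = \frac{1}{2}\int_{0}^{T}\left(\frac{d^{3}x}{dt^{3}}\right)^{2}\,dt

% Point-to-point solution (zero velocity and acceleration at both ends)
x(t) = x_0 + (x_f - x_0)\left(10\tau^{3} - 15\tau^{4} + 6\tau^{5}\right),
\qquad \tau = \frac{t}{T}

% Resulting velocity: symmetric and bell-shaped, with a single peak at \tau = 1/2
\dot{x}(t) = \frac{x_f - x_0}{T}\,30\,\tau^{2}(1-\tau)^{2},
\qquad \dot{x}\!\left(\tfrac{T}{2}\right) = 1.875\,\frac{x_f - x_0}{T}
```

The single, symmetric velocity peak of this solution is the sense in which a stroke is called smooth in the passages that follow.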

To test this idea, we looked for a feature of participants' motor control systems that will direct the choice of motion parameters during synchronized joint action to a specific 'sweet spot.' One clue was an auxiliary finding in Hart et al. (2014). In the supporting information, we analyzed the peak velocity and frequency of motion segments within and outside CC motion. We found that CC segments tend to occupy a different region of the velocity-frequency space to leader segments. In particular, CC motions tend to have shorter durations, with motion frequencies in the range of 0.6 – 1 Hz (see Supplementary Figure S5 in Hart et al., 2014). It seems that the 'sweet spot' for achieving synchronization in the mirror game is for movements at relatively high frequencies.

Several studies from the field of motor control suggest a mechanism that explains this preference for achieving synchronization at higher frequencies. It turns out that people cannot perform smooth motions (i.e., with a single peak in the velocity profile, or equivalently, without jitter) that are longer than a certain duration (Morasso et al., 1983; Milner, 1992; Vikne et al., 2013). In the context of motor control, a smooth motion without jitter is often considered a submovement, a central concept in the theory of intermittent control (Navas and Stark, 1968; Miall et al., 1986, 1993; Burdet and Milner, 1998; Morasso et al., 2010). According to this theory, for point-to-point movements with a duration longer than a certain threshold (that is, below a certain frequency of motion), the motor control system cannot produce a single smooth motion (with a single peak in the velocity profile) but rather divides the motion into multiple, overlapping submovements, which results in jitter and nonsmooth motion. For example, van der Wel et al. (2009) showed that people can produce smooth motions only up to a duration of approximately 1000 ms (corresponding to a frequency of 0.5 Hz).

To summarize, we hypothesize that participants cannot perform CC using relatively long duration motions (low frequencies), as these motions cannot be performed with a single velocity peak due to limitations of the motor control system. Supporting this hypothesis is a recent finding from our lab where we analyzed the motor control mechanisms underlying the mirror game using controlled perceptual-manual tracking tasks (Noy et al., 2015a). In that work, we found that the rate of participants' jitter motion increases at lower frequencies of the tracked stimuli (see Figure 4B in Noy et al., 2015a). As CC motion requires no jitter, this finding supports the notion that participants will perform more CC motion as the frequency of the tracked stimuli increases.

To test this hypothesis we followed a dual-track route, analyzing both tracking experiments using fixed stimuli, and the more ecological dyadic mirror games. The mirror game is an open-ended task and hence is challenging for testing specific hypotheses, as the experimenters do not have control over the range and variation of performed motion. To overcome this, we previously suggested supplementing the mirror game with controlled experiments focusing on the perceptual-manual tracking facet of the game (Miall et al., 1986, 1993; Noy et al., 2015a). Here, we follow this route by asking participants to manually track continuous one-dimensional movements that were displayed on a computer screen, in a setup similar to the experimental mirror game (Hogan and Sternad, 2007; Elliott et al., 2009; Degallier and Ijspeert, 2010). This enables us to create an evenly designed set of stimuli with different combinations of frequencies and velocities. The same set of stimuli was presented to all participants, and we hypothesized that the probability of CC motion (synchronized and smooth motions produced by the participants in the manual tracking) should increase as a function of the frequency of the presented stimuli.

In addition, to connect our findings to the field of joint action and social coordination, we performed the same analysis on a series of datasets from two-player experimental mirror games collected in previous studies (Noy et al., 2011, 2015b; Hart et al., 2014; Feniger-Schaal et al., 2016). These datasets include pairs of expert improvisers, and pairs of a repeated expert and a novice, in different conditions (e.g., round duration, leader/follower role). The current study therefore examines the effect of stimuli frequency on the rate of CC motion both in a well-controlled single-person tracking task, and in a more ecological and open-ended two-person task.

# MATERIALS AND METHODS

# Participants

Eighteen right-handed participants from the student population at Tel Aviv University took part in the experiment (age 21–29, 12 females). Right handedness was confirmed using the Edinburgh inventory (Oldfield, 1971). The study was approved and carried out in accordance with the Tel Aviv University Human Ethics committee, and all participants gave written informed consent in accordance with the Declaration of Helsinki. The participants were paid for their participation.

# Apparatus

Data was collected using a digital graphics tablet (30.5 cm × 45.5 cm, Intuos2, Wacom Ltd), with a Samsung computer monitor (29.5 cm × 53.3 cm) used to display feedback of the hand position in the various conditions. Data collection was carried out using the RepeatedMeasures software (Friedman, 2014), and the data was analyzed using custom Matlab (MathWorks, Inc.) scripts.

# Experiment Setup

The participant was seated in front of a table, on which the graphics tablet rested (see **Figure 1A**). A custom-made shelf (made by cutting a hole in the top of an IKEA™ LACK coffee table and trimming the legs) was placed directly above the tablet; it held the computer monitor that displayed feedback, positioned 20 cm above the tablet. The seat height was adjusted so that the participant could move their hand freely on the tablet. The participant held a stylus in their dominant (right) hand; movements were restricted to 1D (left–right movements) by creating a track with two metal rulers. The location of the tablet and the screen was calibrated such that the feedback shown on the screen was exactly above the actual position of the stylus (under the stand). Participants could see only this feedback presented on the screen, and not their hand movements directly. We estimated the delay between movement of the hand and visual feedback of its location at approximately 80 ms, using a high-speed (120 Hz) camera and comparing the first frame in which the hand moved with the first frame in which the ellipse moved. This is comparable to the values found in similar setups (Zopf et al., 2015). This delay was not noticeable to the subjects, particularly as subjects could not see their hands moving, only the feedback.

# Experimental Protocol

An oscillating stimulus, consisting of half-sine waves, was shown as a red ellipse moving horizontally (**Figure 1B**), with each trial beginning with the red ellipse appearing in the center of the screen followed by a gong sound. As the stylus touched the tablet a blue ellipse appeared above its location (**Figure 1C**). The participants were instructed to imitate the movement of the red ellipse with the blue ellipse by moving the stylus left and right. The task included 11 one-minute trials, with breaks between each trial. The frequencies of the movements were selected from the frequencies 0.25, 0.375, 0.5, 0.625, 0.75, and 0.875 Hz, and the peak velocities were selected from 20.0, 26.7, 33.3, 40.0, and 46.6 cm/s, such that each frequency and peak velocity occurred approximately the same number of times. Each trial consisted of three <frequency, peak velocity> combinations (e.g., see **Figures 1D–F**), apart from trials 1 and 6, which consisted of only two combinations. The complete set of stimuli is described in **Table 1**, and is available for download (Noy et al., 2016). To prevent discontinuities in the velocity profiles, we replaced the position and velocity from 250 ms before to 250 ms after each join (the points where the prescribed frequency and/or amplitude changes) with a third order polynomial fit to match the position and velocity at its start and end, thus ensuring that the position and velocity were continuous throughout the trial. The order of the trials was randomized for each participant.
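The stimulus construction described above can be illustrated with a minimal sketch. It is not the RepeatedMeasures or Matlab code used in the study. It assumes that the stated frequency refers to the full sine cycle, so that position follows x(t) = A sin(2πft) and the peak velocity equals 2πfA; the function names `half_sine_section` and `cubic_blend`, and the 10 ms sampling step, are hypothetical. The blend uses the cubic Hermite form, which is the unique third order polynomial matching position and velocity at both ends of an interval.

```python
import math

def half_sine_section(freq_hz, peak_vel_cm_s, duration_s, dt=0.01):
    """Sample one stimulus section; returns (times, positions) as lists.

    Assumption: x(t) = A*sin(2*pi*f*t), hence peak velocity = 2*pi*f*A.
    """
    amplitude = peak_vel_cm_s / (2.0 * math.pi * freq_hz)
    half_period = 1.0 / (2.0 * freq_hz)
    # Use a whole number of half-sines so the section ends at the midline.
    n_half = max(1, round(duration_s / half_period))
    n_samples = round(n_half * half_period / dt)
    times = [i * dt for i in range(n_samples + 1)]
    positions = [amplitude * math.sin(2.0 * math.pi * freq_hz * t) for t in times]
    return times, positions

def cubic_blend(p0, v0, p1, v1, span_s, t):
    """Cubic Hermite polynomial on [0, span_s] matching position and
    velocity (p0, v0) at t = 0 and (p1, v1) at t = span_s."""
    s = t / span_s
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    return h00 * p0 + h10 * span_s * v0 + h01 * p1 + h11 * span_s * v1

# Example: a 20 s section at 0.5 Hz with a 20.0 cm/s peak velocity.
times, positions = half_sine_section(0.5, 20.0, 20.0)
amplitude = max(abs(x) for x in positions)  # about 6.37 cm under the assumption above
```

Under this assumption, lower frequencies at a fixed peak velocity imply larger movement amplitudes, which is why frequency and peak velocity can be varied independently in the stimulus set.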

FIGURE 1 | Experimental setup. (A) The experimental setup consisted of a Wacom Intuos 2 tablet situated under a table, such that the participant could not see their moving hand. The participants moved the stylus left and right within a channel formed by two metal rulers. (B) Feedback on the position of the stylus was provided by a blue oval, which moved left and right exactly the same amount as the hand moved left and right. The participants were instructed to follow the movement of the red oval, which also moved only left-right. (C) A screenshot of the experiment, showing the red, computer-controlled oval, and the blue, participant-controlled oval. (D–F) Three examples of the stimulus (trials 2, 3, and 7), consisting of concatenated half-sine waves. The numbers on the graphs indicate the frequency of that part of the movement (separated by the dashed lines). The peak velocities for the three segments were (D: trial 2) 20.0, 40.0, and 26.7 cm/s; (E: trial 3) 46.6, 33.3, and 46.6 cm/s; and (F: trial 7) 33.3, 20.0, and 46.6 cm/s.

#### TABLE 1 | Stimulus properties.

The stimuli consisted of repeated half-sine waves, which started and ended at the horizontal midline of the screen, and alternated between movements to the left and right. Each trial consisted of three combinations of frequencies and peak velocities, with the exception of trials 1 and 6, where a single frequency/peak velocity combination was shown for two thirds of the trial. Due to the different durations of the half-sine waves, the frequency/peak velocity combinations were not exactly the same length; rather, they were selected to be approximately one third of the trial duration (i.e., 20 s).

# Data Analysis

We calculated the relative position error (dX), relative velocity error (dV), and mean timing error (dT) using the techniques described in Noy et al. (2015a). These values are reported in **Table 2**. The jitter and co-confident (CC) periods were computed using the same techniques described previously (Noy et al., 2015a,b). Briefly, we found the best registration of the data with the stimulus (Tang and Müller, 2008). We determined the locations of acceleration zero crossings (AZC), and removed those that corresponded to AZC in the stimuli. The remaining AZCs were defined as the jitter points. The jitter frequency is calculated as half the reciprocal of the distance between jitter points. Segments of movements were classified as CC if they contained exactly one AZC (i.e., no jitter), and the stimuli and response were fairly similar [dV < 0.95, dT < 0.15 s; see Noy et al. (2015a) for definitions of these measures]. **Figure 2** shows examples of jitter and CC regions. Values are presented as means ± standard deviation. 95% confidence intervals are presented for all parameter estimates.
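The jitter and CC criteria in the Data Analysis description can be sketched as follows. This is a simplified illustration, not the procedure of Noy et al. (2015a,b): it omits the registration step and the matching of response AZCs to stimulus AZCs, and it estimates acceleration by a plain finite difference. The function names and the two example velocity profiles are hypothetical; only the thresholds (dV < 0.95, dT < 0.15 s) and the jitter frequency formula are taken from the text.

```python
import math

def acceleration_zero_crossings(velocity, dt):
    """Indices where the finite-difference acceleration changes sign."""
    accel = [(velocity[i + 1] - velocity[i]) / dt for i in range(len(velocity) - 1)]
    return [i + 1 for i in range(len(accel) - 1) if accel[i] * accel[i + 1] < 0]

def jitter_frequency(t_first, t_second):
    """Half the reciprocal of the time between two consecutive jitter points."""
    return 0.5 / (t_second - t_first)

def is_cc(n_azc, dV, dT):
    """CC criteria from the text: exactly one AZC, dV < 0.95 and dT < 0.15 s."""
    return n_azc == 1 and dV < 0.95 and dT < 0.15

# Hypothetical velocity profiles for one 1 s motion segment sampled at 100 Hz.
smooth = [math.sin(math.pi * i / 100) for i in range(101)]
jittery = [math.sin(math.pi * i / 100) + 0.3 * math.sin(5 * math.pi * i / 100)
           for i in range(101)]

n_smooth = len(acceleration_zero_crossings(smooth, 0.01))    # single velocity peak
n_jittery = len(acceleration_zero_crossings(jittery, 0.01))  # several velocity peaks
```

A segment with a single velocity peak has exactly one acceleration zero crossing, so it can qualify as CC if the error thresholds are also met; the superimposed faster component in the second profile adds extra crossings and rules it out.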

#### Similarity between Participants' CC Segments

We measured the similarity between participants' CC segments. We first separated the 11 trials into sections of fixed stimuli (a specific pair of frequency and peak velocity). This resulted in 31 sections (from nine trials with three sections and two trials with two sections, see **Table 1**). We converted the motion traces in each section (position vectors) from all participants to CC vectors with the same length, containing 1 for time points that were inside motion segments that were detected by the automatic CC algorithm (CC segments) and 0 otherwise. We next compared, for each trial section, all possible pairs of CC vectors from different participants (yielding 153 comparisons, from our N = 18 participants). We computed the Hamming distance for each comparison of two CC vectors (coming from different players responding to the same stimuli). We averaged the Hamming distance over the 153 pairs of players to arrive at a distance score (between 0 and 1). The resulting 31 distance scores reflect the average distance between CC responses for a given stimulus (trial section).
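The distance score for a single trial section can be sketched as below. The sketch assumes that the Hamming distance is normalized by vector length, which is what keeps the score between 0 and 1; the function names and the toy vectors are hypothetical.

```python
from itertools import combinations

def hamming_fraction(a, b):
    """Fraction of time points at which two equal-length CC vectors differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def distance_score(cc_vectors):
    """Mean normalized Hamming distance over all pairs of participants."""
    pairs = list(combinations(cc_vectors, 2))
    return sum(hamming_fraction(a, b) for a, b in pairs) / len(pairs)

# With 18 participants there are 18 * 17 / 2 = 153 distinct pairs.
n_pairs = len(list(combinations(range(18), 2)))

# Toy example with three participants and four time points.
score = distance_score([[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0]])
```

A score near 0 means that participants were in CC at the same time points of the section, whereas unrelated CC vectors give larger scores.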

To test the statistical significance of these distance scores, we compared them to distance scores of shuffled data. To create a single shuffled dataset, we repeated the above procedure with one difference: when stacking together the CC vectors of our 18 participants, we randomly chose for each participant a CC vector from the same participant but from any of the 11 trials. Note that we did not shuffle the order of the sections; that is, the shuffled data compared the responses of players to the same section (first, second, or third) in different trials. For a single shuffled dataset, this procedure resulted in one set of 31 simulated distance scores, analogous to the real distance scores. We repeated this procedure 10,000 times, and averaged across all simulations to get a set of simulated distance scores from the shuffled data. We then computed the statistical difference between the real distance scores and the simulated distance scores using a matched-pair t-test.
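The shuffling step and the matched-pair comparison can be sketched as follows. The nested-list layout `cc[p][t][s]` (participant, trial, section), the function names, and the toy numbers are hypothetical, and the sketch ignores the fact that two of the trials contain only two sections. The t statistic uses the standard paired formula, the mean of the differences divided by its standard error.

```python
import math
import random

def shuffled_vectors(cc, section, rng):
    """For each participant, draw the CC vector of the given section position
    from a randomly chosen trial. cc[p][t][s] is a list of 0/1 values."""
    return [rng.choice(trials)[section] for trials in cc]

def paired_t(real, simulated):
    """Matched-pair t statistic for two equal-length lists of scores."""
    diffs = [r - s for r, s in zip(real, simulated)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)

# Toy data: 3 participants, 2 trials, 2 sections of 2 time points each.
cc = [[[[1, 0], [0, 0]], [[1, 0], [0, 0]]] for _ in range(3)]
drawn = shuffled_vectors(cc, 0, random.Random(0))

# Toy scores: real distances lower than shuffled ones give a negative t.
t_stat = paired_t([1.0, 2.0, 3.0], [2.0, 4.0, 5.0])
```

In the study this draw was repeated 10,000 times and the resulting distance scores were averaged before the comparison; a negative t statistic indicates that the real distances are smaller than the shuffled ones.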

# Dependence of CC Probability on Frequency and Peak Velocity

We plotted a histogram of CC probability as a function of stimulus frequency, using the CC values described above. The stimuli frequency could only take one of six values, due to the experimental design. We similarly plotted the CC probability as a function of the peak velocity (one of five values).

# CC Probability in Two-player Mirror Games

We computed the CC probability in two-player games, taken from previous studies, as a function of motion frequency. These data sets were collected in previous studies on the two-player mirror game (Noy et al., 2011; Hart et al., 2014; Feniger-Schaal et al., 2016). We looked at three data sets: "Expert–Expert (EE)," "Novice-Expert 1 (NE1)," and "Novice-Expert 2 (NE2)." Description of the three data sets appears in **Table 3**. Note that in contrast to this study, the frequency of the motion can take any value. To allow easy comparison with the current study, we used the frequencies selected in this study as the bin centers in the histogram, which means that the number of entries in each bin will differ.
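One way to bin open-ended motion frequencies around the six stimulus frequencies is to assign each motion segment to the nearest bin center, as in the sketch below. Nearest-center assignment, the per-segment `(frequency, is_cc)` representation, and the function name are assumptions made for illustration; the exact bin edges used in the study are not stated in the text.

```python
BIN_CENTERS = [0.25, 0.375, 0.5, 0.625, 0.75, 0.875]  # Hz, from the tracking task

def cc_probability_by_frequency(segments, centers=BIN_CENTERS):
    """segments: iterable of (frequency_hz, is_cc) pairs.

    Returns one CC proportion per bin, or None for an empty bin.
    """
    counts = [0] * len(centers)
    cc_counts = [0] * len(centers)
    for freq, is_cc in segments:
        # Index of the bin center closest to this segment's frequency.
        k = min(range(len(centers)), key=lambda i: abs(centers[i] - freq))
        counts[k] += 1
        cc_counts[k] += bool(is_cc)
    return [c / n if n else None for c, n in zip(cc_counts, counts)]

# Toy example with four segments from a hypothetical two-player game.
probs = cc_probability_by_frequency(
    [(0.26, False), (0.70, True), (0.78, False), (0.90, True)]
)
```

Because players choose their own motions, some bins may receive many segments and others few or none, which is why the number of entries per bin differs between data sets.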

# Comparison of CC Probability across Different Experiments

We compared the CC probability in the different experiments using a mixed-design ANOVA, with between-subjects factor of experiment [four experiments – experiment from this paper (TP), and the three two-player games: EE, NE1, and NE2], and a within-subject factor of frequency (six values). Tukey's honest significant difference test was used for post hoc comparisons.

# RESULTS

# Participants Successfully Tracked Mirror-Game-Like Motion

As expected, the participants could successfully track the stimuli, with relatively little error. The tracking errors are shown in **Table 2**; comparison with Table 2 of Noy et al. (2015a) shows that the errors are of a similar order of magnitude. It should be noted, however, that in the Noy et al. (2015a) study, the stimuli were unpredictable, whereas in this study they were largely predictable. This may explain why in this study we found lower dX and mean timing errors (dT), as well as lower jitter frequency rates and much higher %CC values.

#### TABLE 2 | The values shown are the mean and standard error over the 18 participants.

Relative position error (dX) and relative velocity error (dV) are unitless.

#### TABLE 3 | Details of the data used to calculate CC proportion from two-player games from previous studies.

# CC Segments Are Similar across Participants

During CC segments, the participants move in synchrony with the stimuli, and show little or no corrective jitter movements. Two examples of stimulus and response are shown in **Figure 3**. In the CC segments, shown in gray, there are almost no jitter corrections (i.e., AZCs, shown as black stars), and the participant's velocity profile is very close to the velocity of the stimulus. Different trials showed different amounts of CC motion (see **Table 2**, last column), because of the different stimulus properties. The CC segments for all stimuli and participants are shown in **Figure 4**, with the dotted lines indicating the time of the change in frequency and/or peak velocity of the stimuli (different trial sections). It can be observed that there is much overlap between participants in their CC regions.

To test this, we computed the distance score of CC vectors of different participants in each trial section and compared it to simulated data (see Materials and Methods). As expected, the distance scores from the real data (mean ± SD: 0.27 ± 0.16) were lower than the average distance scores from the simulated shuffled data (0.47 ± 0.03), and this difference was statistically significant (matched-pair t-test: t(10) = −6.67, p < 0.001, 95% CI = [0.22 −0.33]).

# Probability of CC Is Predicted by the Frequency of the Stimuli

In the previous section, it was shown that CC segments are relatively consistent across participants, which implies that the probability of observing CC is a function of stimulus properties. Using data binned for all participants and trials, we showed that the probability of CC is a function of the frequency of the stimuli (see **Figure 5A**). Specifically, the probability of observing CC increases dramatically with stimulus frequency, with no CC observed for any participant at the lowest stimulus frequency used in this experiment (0.25 Hz). To test whether this result is significant, we performed the same comparison individually for each participant, and tested whether the slopes of the regression lines were significantly greater than zero. The slope was greater than zero for all participants, a result supported by a t-test (t(17) = 24.48, p < 0.0001, 95% CI = [1.09 1.30]). In the Supplementary Material, we show that this finding is not simply a result of the CC detection algorithm used.
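The per-participant test described above can be sketched in two steps: an ordinary least-squares slope of CC probability against stimulus frequency for each participant, followed by a one-sample t statistic on the slopes. The sketch assumes a simple linear fit; the function names and the toy values are hypothetical and are not taken from the data.

```python
import math

def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def one_sample_t(values, mu=0.0):
    """t statistic for testing whether the mean of values differs from mu."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return (mean - mu) / math.sqrt(var / n)

# Toy participant: CC probability rising with stimulus frequency (Hz).
slope = ols_slope([0.25, 0.5, 0.75], [0.0, 0.3, 0.6])

# Toy slopes for three hypothetical participants.
t_stat = one_sample_t([1.0, 2.0, 3.0])
```

With 18 participants the t statistic has 17 degrees of freedom, which matches the t(17) reported in the text.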

A similar comparison can be performed with peak velocity, shown in **Figure 5B**. While the probability of observing CC does increase as a function of increasing peak velocity, the change of probability is much less dramatic (approximately from 0.4 to 0.6). This increase is observed consistently across participants, with all participants showing slopes of regression lines greater than zero, supported by a t-test (t(17) = 11.72, p < 0.0001, 95% CI = [0.007 0.009]).

# Relationship between Movement Frequency and CC Is Also Found in Two-player Mirror Games

In the previous section, we showed that the probability of CC can be predicted by the frequency of the tracked stimuli, for a one-player version of the mirror game with largely predictable stimuli shown on a computer screen. In contrast, in the regular two-player version of the mirror game, the motions (movements of a handle) are chosen in an open-ended manner by the players. To test whether the effect of stimuli frequency on the probability of achieving CC generalizes to this version of the mirror game, we performed a similar analysis with data from three additional data sets, described in **Table 3**.

The relationship between motion frequency and CC probability is shown in **Figure 6**. For all three experiments, the probability of CC is zero at low motion frequencies, and increases as the motion frequency increases. Unlike the results from the current study, there is a drop-off at a higher motion frequency. To determine whether this result is seen across subjects, we again fitted a regression line for each participant, and tested whether the slopes were positive using t-tests. For all three groups, we found positive slopes for all subjects, supported by t-tests (EE: t(8) = 4.04, p = 0.004, 95% CI = [0.27 1.00]; NE1: t(23) = 9.76, p < 0.0001, 95% CI = [0.32 0.49]; NE2: t(38) = 12.45, p < 0.0001, 95% CI = [0.54 0.74]).

# Comparison between the Experiments

We compared the four experiments using a mixed-design ANOVA. The CC probability differed between the groups, as shown by a main effect of experiment [F(3,85) = 42.9, p < 0.001]. Pairwise comparisons revealed that the percentage of CC in the experiment described in this paper (TP: 39.1 ± 2.1%) was significantly higher than those in the other three groups (EE: 22.5 ± 3.1%; NE1: 10.1 ± 1.8%; NE2: 15.3 ± 1.4%; p < 0.001 for all three). Additionally, the EE group showed significantly higher CC probabilities than the NE1 group (p = 0.005), but the NE1 group was not significantly different from the NE2 group. There was also a main effect of frequency [F(5,425) = 168.4, p < 0.001], with each subsequent frequency showing a CC probability significantly higher than the previous frequency (p < 0.001), apart from the last pair (0.75 and 0.875 Hz), which were not significantly different (p = 0.326). Finally, there was an interaction of experiment and frequency [F(15,425) = 13.85, p < 0.001], which demonstrates that the slopes were different for each experiment. In particular, the differences are very small for low frequency stimuli (0.25 Hz), with the differences between groups ranging from 0% (TP and NE1/NE2; not significant) to 1.9 ± 0.7% (TP and EE; p = 0.04), whereas for the higher frequencies there is a greater difference between the groups. For example, at 0.875 Hz, the differences range between 7.3 ± 6.5% (TP and EE; not significant) and 32.3% (TP and NE1; p < 0.001).

# DISCUSSION

We analyzed participants' ability to manually track piecewise constant stimuli, simulating the behavior of a follower in a mirror game. The 'virtual leader' produced the same movements across different participants. By using the same stimuli (which is not the case in the regular mirror game), we were able to expose recurring patterns in human motion synchronization behavior. In particular, we focused on participants' co-confident (CC) motion periods, and their relationship to the tracked stimuli frequency and peak velocity.

We found that participants successfully tracked the virtual leader's motion. The manual tracking was done with lower errors compared to Noy et al. (2015a). This difference is probably due to the fact that the stimuli in the current study were more predictable and less complex than in the previous work. CC regions were strikingly similar across participants (**Figure 4**), a fact that can be observed due to the repeated stimuli used in the current, one-player version of the mirror game.

The main finding of this work is that the probability of CC was well predicted by the frequency of the stimulus. At low frequencies (slow movements), there was no CC at all, and the amount of CC increased as the frequency increased. The effect of the magnitude of the peak velocity of the stimulus on the probability of CC was much smaller. This finding was corroborated by the analysis of three data sets from studies employing the two-player mirror game. While there is an imbalance in the two experimental designs (one person vs. two people; predetermined stimuli vs. individually selected stimuli), we suggest that the similar findings strengthen our claim that this is a general principle and not specific to the type of game.

Numerous studies have examined the question of perception-action coupling (Kelso et al., 1990; Prinz, 1997; Wolpert and Ghahramani, 2000; Rizzolatti and Sinigaglia, 2010), i.e., the inter-relatedness or common coding of perception and action. Observing a movement being performed can trigger a representation of the necessary movement to be made, potentially as a result of mirror neurons in the brain (Rizzolatti and Sinigaglia, 2010). In this task, the participants need to predict the future location of the stimuli in order to succeed in producing smooth movements. This may be achieved through a process of neural simulation (Wolpert et al., 2003). In this study, we found that the participants were unable to generate smooth movements at low frequencies. Based on the action-perception framework, this may be a result of either an inability to predict such movements (as they are not part of our natural repertoire), an inability of the motor system to produce them, or a combination of the two.

Similar tasks have been studied in the past, including tracking tasks (e.g., Miall et al., 1993), tapping to an external cue (Repp, 2005; Repp and Su, 2013) and music tasks (Novembre and Keller, 2014). A wide variety of analysis techniques have been used, including comparing power spectra (Miall et al., 1993), error magnitudes, neuroimaging, measures of synchrony to specific events such as metronome beats (Repp, 2005), and variability (Elliott et al., 2009), to name a few. In this task, as we were specifically looking at the question of which stimuli can be successfully copied in a smooth manner, we chose to focus our analysis on the CC measure.

The current findings demonstrate the usefulness of our approach of using controlled, single-player mirror game studies to complement studies on two-player mirror games. The mirror game is a useful paradigm that allows for a quantified analysis of synchronization in an open-ended joint action task. The usefulness of the task is demonstrated by the large number of published studies that have employed the mirror game since its origin as an experimental paradigm in 2011 (Hart et al., 2014; Słowiński et al., 2014; Noy et al., 2015b; Feniger-Schaal et al., 2016; Gueugnon et al., 2016a,b; Słowiński et al., 2016). However, the open-ended nature of the task makes it difficult to perform repeated and well-controlled experiments, as each game has different motion patterns. Using a single-person mirror game with a virtual (and fixed) leader overcomes this challenge (Noy et al., 2015a). Other groups have taken this approach a step further by developing and testing models of following, leading and joint improvisation in the mirror game using well-controlled avatars and robots (Zhai et al., 2014; Zhao et al., 2015; Khoramshahi et al., 2016; Słowiński et al., 2016).

Our approach also integrates methods and findings from the fields of motor control and joint action, for studying the motor control layer of jointly improvised action. This integration is in line with recent works showing the interplay of joint action and motor control, for example, studies using motor control concepts such as synergies in the context of joint action (Riley et al., 2011; Romero et al., 2015). The current work contributes to this literature by highlighting the role of an individual's motor control system in guiding and possibly limiting joint action.

The current work offers several contributions to the field of motor control. First, we add to previous findings showing an upper limit on the duration (or lower limit on the frequency) of smooth motion segments (van der Wel et al., 2009). By systematically manipulating both the frequency and the peak velocity of the stimuli, we replicated in a systematic way the strong effect of stimuli frequency (and, to a much lesser extent, of peak velocity) on the possibility of moving in a smooth way. In addition, we showed this effect in a continuous repetitive tracking task, while previous works used point-to-point motion guided by a metronome. It will be interesting in the future to study the smoothness of participants' movements in response to stimuli at different frequencies, presented either visually as in our manual tracking task, or using auditory cues, as in the metronome-driven tasks of van der Wel et al. (2009).

In general, it seems that humans prefer not to make slow, long duration movements, although these movements may use less energy (Berret and Jean, 2016). This is likely because there is also a cost to making longer duration movements, for example attentional or metabolic costs. Shadmehr (2010) suggested temporal discounting as an explanation for the tendency to avoid slow movements. Temporal discounting holds that, given a particular movement to make, making a faster movement will lead to a larger reward; this reward can overcome the additional costs involved in making a faster movement (e.g., greater energy expenditure).

The notion of intermittent control (Navas and Stark, 1968; Miall et al., 1986, 1993; Burdet and Milner, 1998; Morasso et al., 2010; Gawthrop et al., 2011) implies that complex movements (like the movements in the current experiment) are composed of multiple submovements that are concatenated together. Each submovement is generally assumed to be smooth, for example following a minimum jerk velocity profile. Whilst the stimuli in this experiment are maximally smooth (consisting of sine waves), the participants do not generate sine waves themselves when the frequency of motion is low. Rather, they concatenate multiple submovements to approximate the shape of the sine wave, but in doing so, they produce jittery movements. In this case, as the ideal duration of the movement is fixed by the stimuli, temporal discounting cannot explain why subjects do not produce smooth and long duration submovements instead of jittery movements consisting of several submovements. The best strategy to mirror a player who uses long duration submovements is to move in a similar way, also using long duration submovements. According to the speed-accuracy trade-off (Wickelgren, 1977), these longer duration submovements should also be more accurate. Avoiding these movements, and making more intermittent corrections instead, leads to worse performance and a reduction in reward. It remains an open question whether avoiding long duration submovements stems from a neural constraint, a biomechanical constraint, a lack of practice in performing such movements, or a combination of these factors.

The issue of practice raises an interesting question that can be studied experimentally. It is likely that, as with most other perceptual-manual tasks, performance in the online tracking task of the current experiment can be improved with practice. Previous research has shown a clear distinction between the performances of experts and novices in the mirror game (e.g., Noy et al., 2011, and see also **Table 3**). The higher performance of experts in the mirror game can be the result of learning through different routes: better execution, better perception and factors related to the joint improvisation per se (e.g., the ability to leave a stable pattern, see Dahan et al., 2016). The current paradigm offers the opportunity to test one of these possible routes of performance improvement.

The current work also offers several contributions for the field of joint action. The mirror game is recognized as an important paradigm for joint action and social neuroscience (Hari et al., 2015) and is used as a tool for measuring and developing interventions for different social disorders (Bardy et al., 2014; Brezis et al., 2015). The analysis of CC periods is central for mirror game studies, due to its theoretical underpinning as a marker of co-predictive controllers (Noy et al., 2011; Dahan et al., 2016), and its presumed connection to the experience of 'togetherness' (Noy, 2014; Noy et al., 2015b). It is therefore important to understand the limits of this measure. We find that achieving CC is much easier in medium-to-fast frequency motions. During low frequency motions, there is a relatively high amount of jitter, that stems not from a dyadic failure in performing improvised joint action but from limits of the motor control systems of each individual. This is an important observation for researchers using the mirror game as an experimental and interventional paradigm.

More generally, this observation highlights the need to be extremely careful when moving from theoretical concepts ('togetherness') to a well-defined operational metric (CC motions). We have previously noted that the CC measure captures only a 'thin slice' of the phenomenon of togetherness (Noy, 2014). For example, in a previous work participants in the mirror game produced little CC at low frequencies (in line with the findings here) but sometimes reported a high level of subjective togetherness at these moments (Noy et al., 2015b). Togetherness and CC should not be treated interchangeably, and the current work further highlights this notion.

In the context of theater improvisation, the finding that motion synchronization is easier to obtain using high frequency movements is somewhat surprising. In theater improvisation the mirror game is used as an exercise for bringing actors into a state of togetherness (Noy, 2014). To enhance the chances of getting into this state of togetherness a teacher might suggest that participants should move slowly (i.e., long duration movements) and use simple and repetitive motions (Boal, 2000). In contrast, the current work shows that in the experimental one dimensional mirror game participants are better able to achieve synchronization when avoiding long duration movements.

Future studies can further analyze and explain the differences between the one dimensional and whole body mirror games. The enrichment in synchronized movements at high frequencies in the one-dimensional game vs. low frequencies in the whole body mirror game might stem from different sources. One possible explanation involves the different perceptual complexity in the two setups. In the whole body mirror game, participants freely move different body parts, including their arms, torso and legs, and their partners have to simultaneously move the same parts. In the experimental mirror game, participants perform only back and forth motions of a single end-effector. Perhaps the more complex multi-part motions in the whole body mirror game cannot be tracked when movements are at high frequencies, due to increased perceptual demands. In other words, depending on the task difficulty, either slowing down or accelerating the motion could be beneficial in a synchronization task.

Another possible route can model the different costs and rewards in the two setups. In the mirror game task, participants have different costs (e.g., energy consumption, cognitive load) that are related, among other things, to the speed and the complexity of the performed motions. The relationships between these different factors can be task dependent. For example, in the one-dimensional mirror game the physical motion is constrained in a track with clear boundaries, and it is possible that cognitive or biomechanical effects reduce the costs of high-frequency motions in this setup. In a similar vein the mirror game task induces different rewards, including an inner feeling of togetherness that might be related to the state of CC motions. A future model can try to tie together these different factors. As a small step toward this goal we have recently tested the subjective experience of participants in the mirror game, and found a higher level of subjective togetherness in CC periods, reported using a continuous togetherness-dial, when participants watch a video recording of their own games (Noy et al., 2015b).

Finally, it is possible that in the whole body mirror game, participants achieve the state of togetherness with motion patterns that differ from the operational CC measure developed for the experimental mirror game. Future studies can examine these questions, and measure the kinematic patterns of synchronized motion in the whole body mirror game. It will be interesting to discover whether players similarly converge to a 'sweet spot' of motions when they get into synchronized motion.

Part of the inability to perform slow movements may be due to the difference between the frequencies of these movements and the resonant frequency of the body parts being moved. Limbs possess mechanical properties, which determine their resonant frequencies (Turvey et al., 1988). Making movements at close to the resonant frequency results in lower metabolic costs (Holt et al., 1995), greater stability and maximal predictability of movements (Goodman et al., 2000). The slow movements described here (0.5 Hz) are significantly slower than the resonant frequency of the muscle-limb complex of the forearm, which was observed to range from 1.1 to 2.0 Hz (Hatsopoulos and Warren, 1996), although we note that this is not a perfect model of the arm as used in this experiment. Similarly, when coordinating pendulum movements, subjects are best able to coordinate their movements when the resonant frequencies of the pendulums are similar (Schmidt and Turvey, 1994).

The main claim of the current work is that a specific limit of individuals' motor control systems (the inability to perform long duration, smooth motions) dampens the two-person synchronization: achieving CC at low frequencies is simply not possible. There is, however, a silver lining for this limitation. As both individuals have similar bodies, which are controlled in a similar way, we can speculate that their similar motor control systems impose similar limitations on their joint action. In this sense, the similarity of the dyad's bodies provides a common ground that supports their joint action.

This interpretation raises interesting questions about the importance of similarity between actors' motor controls and bodies in joint action. It was suggested that observers use a model of their own movement kinematics to predict the actions of others (Prinz, 1997; Sebanz et al., 2003; Colling et al., 2014). If so, a similarity of body proportions between two agents might be helpful in achieving synchronization in joint action. Previous work supported this idea by showing that people synchronize better with recordings of their own actions (Flach et al., 2003; Keller et al., 2007). In the context of the mirror game, one can therefore speculate that it will be easier to perform mirroring between similar agents, for example, between two adults vs. an adult and a child. Recent studies have started to unpack these questions by showing, for example, that people with similar motion repertoires perform better together in the mirror game (Słowiński et al., 2016).

Despite the importance suggested here for the similarity of motor control systems in synchronized joint actions, it is possible that mirroring can be achieved between agents with very different bodies and motor control systems. One example is cross-species mirroring. It was shown that dolphins are able to mirror human motions by using different body configurations, for example by lifting their tail from the water in response to a sitting human lifting her leg (Herman, 2002). In other words, while we suggest here that synchrony in improvised joint action is directed by the individuals' motor control systems, we believe that such synchrony is not totally dictated by the interacting motor control systems, and that mirroring and togetherness can be achieved via multiple routes (Rumiati and Tessari, 2002).

# AUTHOR CONTRIBUTIONS

LN and JF conceived and designed the experiments. NW performed the data collection. LN and JF participated in the statistical analysis and interpretation of the data. LN, NW, and JF wrote the article.

# ACKNOWLEDGMENT

We thank members of the Theater Lab at the Weizmann Institute for stimulating discussions and helpful comments on this work.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2017.00531/full#supplementary-material

# REFERENCES

Flach, R., Knoblich, G., and Prinz, W. (2003). Off-line authorship effects in action perception. Brain Cogn. 53, 503–513. doi: 10.1016/S0278-2626(03)00211-2



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Noy, Weiser and Friedman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Modeling Multi-Agent Self-Organization through the Lens of Higher Order Attractor Dynamics

Jonathan E. Butner\*, Travis J. Wiltshire and A. K. Munion

Department of Psychology, University of Utah, Salt Lake City, USA

Social interaction occurs across many time scales and varying numbers of agents; from one-on-one to large-scale coordination in organizations, crowds, cities, and colonies. These contexts are characterized by emergent self-organization that implies higher order coordinated patterns occurring over time that are not due to the actions of any particular agents, but rather due to the collective ordering that occurs from the interactions of the agents. Extant research to understand these social coordination dynamics (SCD) has primarily examined dyadic contexts performing rhythmic tasks. To advance this area of study, we elaborate on attractor dynamics, our ability to depict them visually, and our ability to model them quantitatively. Primarily, we combine difference/differential equation modeling with mixture modeling as a way to infer the underlying topological features of the data, which can be described in terms of attractor dynamic patterns. The advantage of this approach is that we are able to quantify the self-organized dynamics that agents exhibit, link these dynamics back to activity from individual agents, and relate it to other variables central to understanding the coordinative functionality of a system's behavior. We present four examples that differ in the number of variables used to depict the attractor dynamics (1, 2, and 6) and range from simulated to non-simulated data sources. We demonstrate that this is a flexible method that advances scientific study of SCD in a variety of multi-agent systems.

#### Edited by:

Joanna Raczaszek-Leonardi, University of Warsaw, Poland

#### Reviewed by:

Robert J. Lowe, University of Skövde, Sweden Hecke Schrobsdorff, Max Planck Institute for Dynamics and Self Organization, Germany

#### \*Correspondence:

Jonathan E. Butner jonathan.butner@psych.utah.edu

#### Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

Received: 08 August 2016 Accepted: 27 February 2017 Published: 20 March 2017

#### Citation:

Butner JE, Wiltshire TJ and Munion AK (2017) Modeling Multi-Agent Self-Organization through the Lens of Higher Order Attractor Dynamics. Front. Psychol. 8:380. doi: 10.3389/fpsyg.2017.00380

Frontiers in Psychology | www.frontiersin.org March 2017 | Volume 8 | Article 380 |

Keywords: dynamical systems, social coordination dynamics, multi-agent coordination, attractors, agent-based modeling

# INTRODUCTION

For many animals and humans, social interaction is pervasive in daily life. Social interaction occurs across many time scales and varying numbers of agents; from one-on-one to large-scale coordination in organizations, crowds, cities, and colonies. Since social interactions occur at different scales, and in ways that change dynamically over time, they can be quite a complex phenomenon to study without appropriate guiding theoretical and methodological frameworks.

In dynamical systems theory, complexity arguably occurs due to the emergent, self-organizational nature of the system. Emergent self-organization here implies that there are higher order macroscopic patterns occurring over time that are not necessarily due to the actions of any particular controlling agents or components, but rather due to the collective ordering that occurs from the individual interactions of the agents or components of the system (Halley and Winkler, 2008). Taken together these diffuse interactions contribute to more macro-scale phenomena that are observed over time. Common examples of this type of emergent self-organization of social behavior occur in flocking birds and in schools of fish that appear to move in a highly coordinated fashion (e.g., Couzin and Krause, 2003).

Because of the multitude of agents (or system components) that give rise to emergent patterns, it is often difficult to determine how one should depict the resultant system. In line with approaches to social coordination dynamics (SCD), we aim to uncover the dynamic processes that underlie the ways in which agents are able to organize their behavior and change together in time (Oullier and Kelso, 2009). This emergence is a form of coordination that specifically implies the occurrence of a functional ordering of components that interact across spatial and temporal dimensions, often with multi-directional relationships (Kelso, 2009; Butner et al., 2014a). We aim to model this emergent, multi-agent coordination through attractor dynamics depictions (which we discuss in detail in the next section).

From the SCD perspective, two or more agents are able to coordinate their behavior based on some form of mutual information exchange. This information exchange generates coordinative structures with higher order patterns not easily identifiable from the lower order interactions. The resultant higher order patterns are then depicted through attractor dynamics in which the patterns are attributed with stability properties implied by an underlying topology (Kelso, 1995). SCD is thus consistent with notions of weak emergence (Bedau and Humphreys, 2008), but the resultant patterns have then been modeled using attractor dynamics descriptions depicting patterning over time that have stable properties under perturbations. One major distinction between SCD and examples of weak emergence (usually through agent-based or cellular automata models) is in the scale of the social systems involved.

Agent-based models are usually quite large-scale social systems, while SCD has often focused on a dyadic scale of analysis. SCD has excelled in generating models of intentional and spontaneous dyadic interpersonal rhythmic behavior such as finger or limb oscillations (e.g., Haken et al., 1985; Schmidt et al., 1990; Oullier et al., 2008), swinging pendula (Schmidt and O'Brien, 1997), and rocking in chairs (Richardson et al., 2007). Some recent work has provided ways to assess social interactions in larger scales such as coordination of groups bigger than dyads (e.g., Richardson et al., 2012; Duarte et al., 2013). One challenge is to generate models of emergent, multi-agent coordination in social systems where the agents may not behave rhythmically per se, but are following some organizing rules or structures that give rise to coordinated behavior serving a functional purpose.

In the current paper we build on SCD approaches, by modeling the results from large-scale agent-based systems as a function of attractor dynamics. Our chosen technique utilizes mixture modeling in conjunction with topological equations to represent attractor dynamics. This approach is particularly attractive in that topological representations of phase space can yield more qualitative information in comparison to other time series approaches (Strogatz, 2014), generating a more complete picture of the underlying system dynamics. Specifically, we examine a series of agent-based examples, and model each set of time series as a function of their changes through time. We show how sets of linear equations can depict the higher order emergent patterns in ways consistent with attractor dynamics. The advantage of this approach is that we are able to quantify the self-organized dynamics that agents exhibit, link these dynamics back to activity from individual agents, and relate it to other variables central to understanding the coordinative functionality of a system's behavior. Our goal is to exemplify the strategy. In all, we present four examples that differ in the number of variables used to depict the attractor dynamics (i.e., the dimensionality of the systems) and range from simulated to non-simulated data sources.

# Attractor Dynamics

In dynamical systems theory, the focus is often on which states a system is drawn toward, or away from, as it changes over time (e.g., Richardson et al., 2014). This epitomizes the notion of attractor dynamics. Attractor dynamics are merely a mathematical way of expressing repetitious behavior in the face of constant disruptions to those repetitions. The constant disruptions are inherently part of the system in that open systems are dissipative and function far from equilibrium to maintain patterns (Prigogene and Stangers, 1984). By only examining a portion of a system, as is common in empirical research, the unexamined variables are treated as constant disruptions or perturbations to those patterns. These repetitious behaviors describe the most probable system states, and their ability to remain in these states (while facing perturbations) conveys the inherent stability of those states. These attractor dynamics can then be modeled using differential/difference equations, allowing for exploration of the inferred dynamics and theorizing the manifolds in which the system functions. (Differential equations are based on idealized models for when change in time approaches zero, while difference equations estimate models using the observed discrete differences; Butner et al., 2015.)

Assuming a system exhibits stability, the emergence of a limited set of patterns, which can be described in terms of topological features, is plausible. These topological features can be described using map analogies, because there is a strong tie between topology and maps. In fact, differential topology is the math behind maps. Traditionally, topographical maps convey elevation of a landscape. But the notion of topologies can also be applied as a graphical representation of how data are changing over time.

To ease the interpretation of differential topology, we will temporarily link movement on maps to different topological features. This interpretation is directly relevant to several of the examples (although more general definitions are extant; Butner et al., 2015). An Attractor is when the agents are drawn toward a particular coordinate over time or a particular directional heading. This is akin to a topographical valley. A Repeller is a coordinate that agents move away from. These would be reflected topographically as a mountain peak. A Saddle occurs when agents are attracted in one dimension and repelled in another. It is analogous to a topographical ridgeline because it can separate different patterns such as two attractors (Abraham and Shaw, 1983; Butner et al., 2015). A Cycle corresponds to a push/pull of two dimensions on one another. Combined versions of the patterns described can also be observed, such as spiral attractors, where agents circle as they converge toward an attractor. Saddles and cycles require at least two dimensions and thus will only be possible in the later examples (not with heading alone, as it is a one dimensional example).

To continue with the link to maps, we will begin with agent-based models that function spatially. As a simplification, we can reduce their behavior to movement along an X and Y axis or merely the directional heading of agents (when we only require a single dimension to depict the system). We can then model the simultaneous change of these variables over time. In this way, we capture the movement of many agents and can characterize them with attractor descriptions. With this information it is possible to examine and identify patterns of change for the overall system using the particular topological features defined above to describe how multiple agents are moving over time (Butner et al., 2015). It is in these terms that we gain an understanding of the emergent coordination of many agents.

As a beginning example, consider a Flocking agent-based model (Wilensky, 1998) in NetLogo v5.2.1 (Wilensky, 1999) designed to emulate the self-organized behavior of how flocks of birds might come to match one another's movements, creating complex group behavior. Agents start with a random heading and constant velocity in a wrapped environment (i.e., a torus). The heading for each agent is determined by three rules: (1) alignment states that each agent tends to turn to be moving in the same direction as nearby agents; (2) separation states that each agent will turn to avoid an agent when it gets too close; and (3) cohesion states that agents tend to move toward other agents. As the agents "fly" through the two dimensional environment they update their headings over time. **Figure 1** shows time series of the headings for all (300) agents simultaneously over one thousand iterations. It is clear that early in the simulation the full range of headings is observed, yet at later times the range of headings becomes more restricted and shared by the agents. This is an example of the emergent coordination that occurs within the Flocking model.

To depict these results topologically requires identifying the underlying map in which the agents are interacting. An attractor, in this case, would be the heading(s) toward which the agents move, and the stability would be the resistance exhibited by the system when an agent begins to diverge from this attractive heading and is pulled back. The map is not one of actual hills and valleys, but instead one of the resultant decrease in heading directions arising from the emergent self-organization between agents. Thus, the trajectories for agents imply an underlying pattern that we can infer. One assumption of dynamical systems theory is that there are one or more underlying patterns emergent from the interactions of individual agents over time. Interactions result in a consistent pattern that the system flexibly returns to when interactions or outside forces briefly move the system away from its primary pattern. This notion of consistency in the face of perturbations is stability. While it is easy to observe the convergence of heading amongst the agents in **Figure 1**, little information can be drawn in regards to the number of underlying patterns and their inherent stabilities.

The flocking example is a useful one in that the implied map is not a map of X and Y coordinates, but one of heading: it is a one dimensional map. One dimensional maps are not very interesting to draw; they are a line showing where the data converges over time. That is, attractor dynamics are time implicit models rather than time explicit ones and thus are akin to collapsing the X axis in **Figure 1**, while adding in notions of where each agent goes next to determine the map.

Dynamical systems theory has long provided the theoretical framework and terminology for describing multi-agent self-organized patterning. Returning to **Figure 1**, an apt depiction of the Flocking simulation is one that begins with many attractors, most of which cease to exist over time, leaving a limited set of stable attractors. This qualitative description captures the evolving process, without any of the quantitative dynamics. By quantifying them through topological equation representations, we can further differentiate aspects of the system and specify the strength of the attractors. We therefore next cover the step of quantifying the dynamic.

#### A Vector Based Approach

There are several ways to estimate differential topological equations. In all cases, we must first express the data in terms of data vectors rather than values. For the heading data illustrated in **Figure 1**, the data is structured such that two or more points in time are used to define a data vector, known as a time delay or Toeplitz data structure (e.g., Boker and Laurenceau, 2006). The data is structured so that a value at time t and a value at time t+1 are two variables within the model. Further, our models are all estimated in structural equation modeling wherein change was built into the models themselves as latent variables (McArdle, 2009). One can also estimate change directly through a discrete difference or various methods for estimating derivatives and thus while we use structural equations to build our models, this is far from a necessity (Boker et al., 2010).
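The time-delay structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the heading values are hypothetical, a forward difference stands in for the latent change scores estimated in the structural equation models, and the wrap-around of headings at 360 degrees is ignored.

```python
def time_delay_vectors(series, lag=1):
    """Pair each value x_t with its successor x_{t+lag} (a time-delay data vector)."""
    return [(series[t], series[t + lag]) for t in range(len(series) - lag)]


def discrete_velocity(series, dt=1.0):
    """Forward difference (x_{t+1} - x_t) / dt, a rough stand-in for x-dot."""
    return [(nxt - cur) / dt for cur, nxt in time_delay_vectors(series)]


# Hypothetical heading series (degrees) for a single agent; not data from the paper.
headings = [250.0, 262.0, 268.0, 271.0, 272.0]

vectors = time_delay_vectors(headings)      # [(x_t, x_{t+1}), ...]
velocities = discrete_velocity(headings)    # one change estimate per data vector
```

Each data vector then contributes one (position, velocity) pair to the equations that follow, so a series of T observations yields T - 1 usable pairs at a lag of one.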

Different attractor dynamics are then captured through expressions of change predicted by value. For example, Equation (1) expresses the potential dynamics for the headings of the various agents.

$$
\dot{x}_t = b_0 + b_1 x_t + e_t \tag{1}
$$

The current heading of a given agent at time t is x<sub>t</sub>, x-dot is an estimate of its derivative with respect to time, b<sub>0</sub> is the intercept, b<sub>1</sub> is the slope with respect to x, and e<sub>t</sub> is error. For clarity, this equation is written in regression form, where velocity in heading at each point in time is treated as the criterion and position (current heading) is the predictor. When the slope in Equation (1) is negative, we observe an attractor where the time series are attracted toward a value of −b<sub>0</sub>/b<sub>1</sub>, known as the set point (Butner et al., 2015). A repeller occurs when the slope is positive instead of negative. The strength of attraction/repulsion is defined by the steepness of the slope relative to zero.
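To make the reading of Equation (1) concrete, the following Python sketch regresses velocity on position by ordinary least squares and then derives the set point and the type of feature from the estimated coefficients. It is a simplified stand-in for the structural equation estimation used here; the helper names and the noise-free data are hypothetical and chosen so that the set point is 273.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def describe_dynamic(b0, b1):
    """Classify the one-dimensional feature implied by Equation (1)."""
    if b1 == 0:
        return "no fixed point", None      # velocity does not depend on position
    kind = "attractor" if b1 < 0 else "repeller"
    return kind, -b0 / b1                  # set point is -b0 / b1


# Hypothetical noise-free data generated from x_dot = 136.5 - 0.5 * x.
positions = [200.0, 240.0, 260.0, 280.0, 300.0]
velocities = [136.5 - 0.5 * x for x in positions]

b0, b1 = fit_line(positions, velocities)
kind, set_point = describe_dynamic(b0, b1)
```

With these values the slope is negative, so the feature is an attractor, and a steeper negative slope would indicate a stronger pull back toward the set point.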

Equation (1) is limited in that it can only capture a single topological feature (Butner et al., 2015). While the system may converge to a single heading, this convergence is developed over time. Consistent with the qualitative description of the flocking model, we should observe several patterns that cease to exist as time continues. This results in a much more limited set of dynamic patterns that occur at later times. We therefore expand our approach to allow for multiple sets of Equation (1). We did this through an analytic technique known as mixture modeling.

### Mixture Modeling Methods

Mixture modeling is a taxonomic approach that can be combined with structural equation modeling (Enders, 2006) as an alternative way to capture interactions (Jung and Wickrama, 2008). Non-linear dynamical systems allow for multiple topological patterns by building in non-linear transformations, such as interactions. Mixture modeling can therefore be used to capture the different topological features by slicing up the overarching state space under the assumption that each dynamic is locally linear. One description of mixture modeling is as a multiple group analysis (stacked model), where assignment to group is unknown (Muthen, 2001). Multiple group models allow for different parameters across groups. We can extract different equation sets by allowing key parameters to differ across these groups while equating others. Specifically, we allowed the slope coefficients characterizing how position predicted each velocity, the intercepts for the velocity factors, the means for the position factors, the residual variances for the velocity factors, and the variances for the position factors to vary across sets of equations [see Appendix A (Supplementary Material) for an example in Mplus (Muthén and Muthén, 1998–2012)].

As previously described, the sign and magnitude of the slope coefficients capture the type (e.g., attractor, repeller, limit cycle) and strength of attraction for the dynamic implied by the equations (see also Butner et al., 2015). In addition, the velocity intercepts help determine the set point, or relative position to which the dynamics can be described (e.g., the location of the attractor). Following logic laid out under notions of centering and simple slopes analysis (Cohen et al., 2003), the means and variances for the position factors help depict common trajectories implied by the pattern and thus help identify the basin of attraction. By allowing for variation in these parameters across latent classes, we can infer a number of varying topological features, as opposed to a single feature.

Mixture modeling can be used as a confirmatory or exploratory method. In either case, there must be established criteria for fit. The current preferred methods are through forms of the Bayesian Information Criterion (BIC) or through forms of model testing such as log likelihood or chi-square comparisons to see if the current number of extracted groups improves description of the data beyond the previous number of groups. Specifically, the BIC and sample size adjusted BIC tend to minimize when the proper number of mixture groups has been extracted (Sclove, 1987; Nagin and Tremblay, 1999; Nylund et al., 2007) and both have been used under different circumstances usually relating to the sample size (sample size adjusted BIC is preferred when n < Bauer and Curran, 2003; Lubke and Neale, 2006; Enders and Tofighi, 2007).
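The BIC-based selection described above can be illustrated with a short Python sketch. The formulas are the commonly used definitions (BIC = −2 ln L + k ln n, with the sample size adjusted version replacing n by (n + 2)/24); the log-likelihoods, parameter counts, and function names below are hypothetical and are not taken from the analyses reported here.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: -2 ln L + k ln n (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def sample_size_adjusted_bic(log_likelihood, n_params, n_obs):
    """Sample size adjusted BIC, replacing n with (n + 2) / 24."""
    return -2.0 * log_likelihood + n_params * math.log((n_obs + 2) / 24.0)


def pick_number_of_classes(fits, n_obs):
    """fits maps class count -> (log_likelihood, n_params); returns the lowest-BIC count."""
    return min(fits, key=lambda k: bic(*fits[k], n_obs))


# Hypothetical fits. The parameter counts assume five free parameters per class
# (slope, intercept, position mean, residual variance, position variance)
# plus K - 1 class proportions; this count is an illustrative assumption.
fits = {1: (-5200.0, 5), 2: (-4900.0, 11), 3: (-4890.0, 17)}
best = pick_number_of_classes(fits, n_obs=1000)
```

In this made-up case the third class improves the log-likelihood too little to offset its extra parameters, so the BIC reaches its minimum at two classes.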

Model identification can also be informed by various likelihood ratio tests (LRT), which are used to test relative model fit by testing the null hypothesis that competing models demonstrate comparable fit (Vuong, 1989). Within latent variable models such as the present one, the Vuong-Lo-Mendell-Rubin test (Lo et al., 2001) is an accepted methodology for testing the equivalence of two associated probability density functions (Henson et al., 2007). Simulation studies have indicated that the VLMR test favors selection of more components when used with small samples, resulting in increased Type I error rates; this suggests the need for an adjusted test (aVLMR) with samples less than 300 (Lo et al., 2001). For our purposes, we chose to rely on the BIC.

Note that our data had an inherent dependency: the nesting of multiple measures through time within each agent. Ignoring a data dependency is known to produce biased standard errors, with large alpha inflation as the common result (Cohen et al., 2003). However, current mixture modeling practices that incorporate methods for accounting for the dependency preclude any descriptions of predictors. In this case, that would result in the loss of the means and variances for the position factors that detail key information about the basins of attraction. We therefore chose to temporarily ignore the dependency, recognizing that the standard errors for each coefficient may be biased toward Type I errors.

To better understand the extracted equation groups, we saved out the posterior probabilities for each data vector. This is the probability that each instance in time for a given agent belonged to one of the classes characterized by a particular equation set, where the set of posteriors for a given vector sums to one. It is the equivalent of factor scores if mixture groups are likened to a categorical latent variable. The value of the posterior probabilities is that they allow us to specifically link each agent to the various attractor dynamics at each point in time. Through the combination of the description of each attractor dynamic and the posterior probabilities linking the agents to the topologies, we are able to traverse between the observed vectors from the agents and the underlying topology.
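As a rough illustration of how posterior probabilities link a data vector to equation sets, the Python sketch below applies Bayes' rule to a mixture in which each class has a normally distributed position and a locally linear velocity equation with normal residuals. This is a simplified stand-in for what the mixture software computes, not its actual implementation; the class parameters, dictionary keys, and function names are hypothetical.

```python
import math


def normal_pdf(value, mean, variance):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((value - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


def posterior_probabilities(x, x_dot, classes):
    """Posterior class membership for one data vector (x, x_dot).

    Each class holds a mixing weight, a position mean and variance, and the
    locally linear equation x_dot = b0 + b1 * x with a residual variance.
    """
    joint = []
    for c in classes:
        position_density = normal_pdf(x, c["x_mean"], c["x_var"])
        velocity_density = normal_pdf(x_dot, c["b0"] + c["b1"] * x, c["resid_var"])
        joint.append(c["weight"] * position_density * velocity_density)
    total = sum(joint)
    return [j / total for j in joint]


# Two hypothetical attractors sharing a set point of 273 but differing in strength.
classes = [
    {"weight": 0.5, "x_mean": 273.0, "x_var": 100.0, "b0": 136.5, "b1": -0.5, "resid_var": 1.0},
    {"weight": 0.5, "x_mean": 273.0, "x_var": 100.0, "b0": 27.3, "b1": -0.1, "resid_var": 1.0},
]

# A vector 10 degrees below the set point that returns quickly (x_dot = 5).
posteriors = posterior_probabilities(263.0, 5.0, classes)
```

Because a velocity of 5 at a heading of 263 matches the steeper equation far better than the shallower one, nearly all of the posterior mass falls on the stronger attractor, even though both classes share the same set point.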

# One Dimensional Systems

What follows is an illustration of the analytic strategy for the flocking example using the headings from all agents. Fit indices for the 300 flocking agents over 1,000 iterations indicated sixteen unique attractors (the solution at which the BIC reached its lowest value). **Table 1** contains the estimated parameters for each of the sixteen equations. All sixteen patterns are attractors, as indicated by the negative slopes. They vary in their stability, as indicated by the range of slopes. The heading toward which each pattern is attracted is identified by converting the intercepts and slopes into the set point (−b<sub>0</sub>/b<sub>1</sub>). In essence, the flock example is characterized by a total of sixteen unique attractors.

We can link the attractors back to the individual agents through the posterior probabilities. For purposes of relating to the initial assessment of the many unique patterns dying off, we chose to illustrate the average posterior probabilities (the average likelihood a given agent is depicted by a given attractor) as a function of time. **Figure 2** shows the average posterior probabilities for each attractor dynamic. The legend shows the heading attracted to (set point) and level of attraction (slope) as a function of time. Consistent with **Figure 1** (and expectations), initially there were many attractors, but somewhere around iteration 300, two specific attractors started to dominate (dotted lines in **Figure 2**).

Notice that they share the same heading of 273 degrees, but with slightly different degrees of attraction. Recall the three rules that constitute the changes in heading over time: alignment, separation, and cohesion. Alignment and cohesion drive the agents toward a single heading, but separation instead evokes divergence when agents become too close (and specifically overrides the other two rules). What distinguishes the patterns is not the heading they are drawn toward, but the divergences themselves: separation produces a weaker attractor. Note that agents can switch between the two attractors over time, moving to the slightly weaker attractor when they need to avoid collisions.

We gain additional information from the quantitative attractor dynamic description as illustrated in **Figure 2** when compared to **Figure 1**. Each data vector is now depicted not only in terms of its values, but also in terms of the likely attractor toward which it is drawn (through the posterior probabilities). Further, the description is now in terms of the underlying system forces that depict the type of pattern (all attractors since all the slopes were negative), the location to which the patterns are relative (the set points), and their stability under perturbations (the deviation of the slopes from zero). However, thinking topologically becomes even more beneficial as we move toward systems with more dimensions.

# Two Dimensional Systems

A two-dimensional system can be modeled through two simultaneous equations.

$$
\dot{x}_t = b_0 + b_1 x_t + b_2 y_t + e_{xt} \tag{2}
$$

$$
\dot{y}_t = b_3 + b_4 x_t + b_5 y_t + e_{yt} \tag{3}
$$

TABLE 1 | Unstandardized coefficients from the sixteen attractor solution for the Flocking model of headings.


Patterns ordered by slope deviation from zero (to match Figure 2). Italicized patterns marked with an \* match dotted patterns in Figure 2.

These equations represent two variables measured simultaneously (x and y) at time t, x-dot and y-dot are their estimated derivatives at time t, b<sub>0</sub> and b<sub>3</sub> are intercepts, b<sub>1</sub> and b<sub>5</sub> are each variable predicting its own derivative, b<sub>2</sub> and b<sub>4</sub> are crossover or coupling relationships, and e<sub>xt</sub> and e<sub>yt</sub> are errors in equation. Notice that Equation (2) is identical to Equation (1) with the addition of the other changing variable also predicting velocity in x (or x predicting velocity in y). By having both variables changing simultaneously, we generate a two dimensional depiction. The emergent dynamic (attractor, repeller, etc.) is a function of all the b coefficients in the equations (Gottman et al., 2002). Common interpretation is that the own effects (i.e., x predicting change in x and y predicting change in y) depict the stability properties of the dynamic pattern (attractor, repeller, or saddle) such that negative coefficients are indicative of attractive behavior and positive coefficients are indicative of repulsive behavior in the respective dimensions. The crossover relationships (also known as coupling effects) are commonly interpreted to represent the push-pull of variables that constitute cycles and swirling qualities graphically. The set point is a function of both equations. And as noted earlier, two-dimensional systems can include saddles and cycles, which are topological features that are not possible in one-dimensional systems.
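The statement that the set point is a function of both equations can be worked out directly. As a sketch (the starred symbols, the trace τ, and the determinant Δ are notation introduced here, not the source's), setting both derivatives in Equations (2) and (3) to zero and ignoring the error terms gives a pair of linear equations in the set point:

$$
\begin{pmatrix} b_1 & b_2 \\ b_4 & b_5 \end{pmatrix}
\begin{pmatrix} x^* \\ y^* \end{pmatrix}
= -\begin{pmatrix} b_0 \\ b_3 \end{pmatrix}
\quad\Rightarrow\quad
x^* = \frac{b_2 b_3 - b_0 b_5}{\Delta}, \qquad
y^* = \frac{b_0 b_4 - b_1 b_3}{\Delta}, \qquad
\Delta = b_1 b_5 - b_2 b_4
$$

The same coefficient matrix determines the type of dynamic through its eigenvalues, which for the two-dimensional case are

$$
\lambda_{1,2} = \frac{\tau \pm \sqrt{\tau^2 - 4\Delta}}{2}, \qquad \tau = b_1 + b_5
$$

When Δ is zero the two equations are collinear and no unique set point exists, which is one way the conventional reading of the coefficients can break down.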

While many cases can be interpreted as described in the previous paragraph, some cases do not conform to the conventional interpretations (and we include some examples of this below). A common violation relates to the notion of collinearity. If all variation in both x and y perfectly map onto one another, then x and y are essentially a single dimension. Under this circumstance the coefficients can be misrepresentative of the dynamic pattern. In our spatial movement circumstance, agents will sometimes capitalize on diagonal movement as a primary, singular dimension. Assessment of the eigenvalues and eigenvectors of the coefficients (treated as a Jacobian matrix of partial derivatives for estimating local Lyapunov exponents; Abarbanel et al., 1992) is a method for verifying and determining whether to follow the classic interpretation or whether the interpretation should be modified.

### The Ants Model

Consider the Ants model (Wilensky, 1997) in NetLogo v5.2.1 (Wilensky, 1999). This agent-based model was designed to simulate ant colony foraging behavior. The simulation consists of 125 ants each with the same instructions, starting at a nest in the center of a two-dimensional space. Ants are released one at a time from the nest, moving at a constant velocity. Three food sources are placed within the two-dimensional space each with a finite quantity of food supply. The ants search the environment for food (following a random direction algorithm) and upon locating and collecting food, return it to the nest. The primary mechanism for the emergent foraging behavior is that ants release digital pheromones while carrying food and that other ants are attracted to this pheromone. This is much like how stigmergy, a form of environmental modification by individual social animals that affords collective coordination, is proposed to work in live ant populations (Theraulaz and Bonabeau, 1999). The nest also releases a pheromone signal so that the ants can find the nest. The simulation allows for the manipulation of the evaporation and diffusion rates of the pheromones, which we left at default settings. **Figure 3** shows the standard placement of food sources in the environment in relation to the nest at the center.

From visual inspection, several emergent colony behaviors can be observed. Ants will search the environment until a critical threshold of ants find a given food source. At this point the ants will form a trail between the food source and the nest. There are sometimes congestion-like behaviors that occur in the middle of the trail or near the nest as more ants converge toward the strongest pheromone locales. Once the food source is used up, the ants once again spread out into a search pattern until a new food source is found. In this case, we will depict the attractor dynamics of the ant movement in two dimensions as a way to characterize the different ant behavioral patterns.

FIGURE 3 | Screenshot of nest and food placement of the Ants model from NetLogo.

We extracted the horizontal (x) and vertical (y) coordinate position of every ant from the beginning of the simulation until the last food pile was fully exhausted, totaling 1,080 iterations. **Figure 4** is a kernel density plot of the ant positions, collapsed across all ants and all iterations. This shows the regions where ants spent most of their time and can be thought of as the probability density function of the data (under the assumption of two dimensions), a graphical illustration of the integral of the dynamics. The density plot is read in the same fashion as a topographical map, where the contour lines mark regions of greater density. Note that the greatest density is at the nest (0,0). This was likely a function of all the ants starting at the nest, combined with the dispersion algorithm releasing only a single ant from the nest per iteration. It is also a function of all the ants returning to the nest to deliver food. Each branch of the density plot corresponds to one of the food sources, consistent with a trail between the nest and the food source. The densest part for each of the branches was, however, closer to the nest than the food source.

**Figure 5** contains trails of three exemplar ants as vector plots in time to help illustrate the link between individual agents and the model estimated from all agents. **Figure 5A** shows the trail of an ant that helped collect food from all of the food piles. However, it also shows searching behavior in some of the areas of the world where food did not reside. **Figure 5B** illustrates an ant that only helped collect food from two piles and also participated in searching behavior in empty quadrants of the world. **Figure 5C** shows an ant that participated minimally in food collection instead spending more time searching. As a whole, these illustrate that the emergent behavior is not from any one ant. Instead, it is through their interactions with one another (through pheromones) and the environment (food resources relative to the nest) that their behavior becomes emergently coordinated.

Our mixture model identified a total of 7 different patterns in the example ant model (minimized BIC at 7 groups). **Table 2** shows all the coefficients for the seven different patterns, labeled by their colors from **Figure 6**. The last two columns are the eigenvalues wherein we built matrices of the own and coupling effects in the same order as Equations (2) and (3) (First row: own predicting x, coup predicting x and second row: coup predicting y, own predicting y). The eigenvalue procedure allows us to account for when the coefficients do not directly represent the type of attractor dynamic due to the primary axes for the dynamic depictions being different from the variables used in the equations. When the eigenvalues are both real numbers and negative, the system depicts an attractor. When the eigenvalues are both real and positive, the system depicts a repeller. When one is positive and one is negative, the system depicts a saddle. Imaginary numbers instead depict cyclic behavior with complex numbers being a combination of cyclic and attractive/repulsive at the same time (Abraham and Shaw, 1983).
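The eigenvalue rules in this paragraph can be sketched in a few lines of Python. The snippet below is an illustration only, not the authors' code: it builds the 2 × 2 matrix in the order stated above (first row: own predicting x, coupling predicting x; second row: coupling predicting y, own predicting y), computes the eigenvalues in closed form, and labels the pattern. The coefficient values are copied from **Table 2**; the function names and the tolerance `tol` are assumptions.

```python
import cmath


def eigenvalues_2x2(own_x, coup_x, coup_y, own_y):
    """Eigenvalues of [[own_x, coup_x], [coup_y, own_y]]."""
    trace = own_x + own_y
    det = own_x * own_y - coup_x * coup_y
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0


def classify(eigs, tol=1e-9):
    """Label a pattern from the signs of the real parts and any imaginary part."""
    first, second = (e.real for e in eigs)
    cyclic = any(abs(e.imag) > tol for e in eigs)
    if first < -tol and second < -tol:
        kind = "attractor"
    elif first > tol and second > tol:
        kind = "repeller"
    elif (first > tol and second < -tol) or (first < -tol and second > tol):
        kind = "saddle"
    else:
        kind = "cycle" if cyclic else "undetermined"
        return kind
    return ("cyclic " + kind) if cyclic else kind


# (own x, coupling in x equation, coupling in y equation, own y) from Table 2.
table2 = {
    "blue": (-0.009, 0.006, -0.009, -0.013),
    "green": (-0.003, 0.002, -0.004, 0.003),
    "red": (0.000, 0.001, -0.006, 0.019),
}

for name, coefs in table2.items():
    eigs = eigenvalues_2x2(*coefs)
    print(name, eigs, classify(eigs))
```

For these three rows the labels agree with the descriptions given for the blue (cyclic), green (saddle), and red (repeller) patterns. Values that round to zero in the table, such as the real parts for the brown pattern, fall inside the tolerance and are reported as a pure cycle or as undetermined rather than forced into a category.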

**Figure 6** is a topographical representation of the seven attractor dynamics patterns emergent in the ant behavior. **Figure 6** was generated by using the estimated equations from the mixture model in conjunction with the adaptive Runge-Kutta algorithm from the deSolve package (Soetaert et al., 2010) in R (R Core Team, 2016) to estimate example trajectories iterated over time. In each case, values were chosen using the position means and variances extrapolating in all possible combinations of one standard deviation in X and Y and iterating the trajectories forward in time. Details on each pattern follow.
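The trajectory procedure can be approximated without R. The sketch below swaps in a fixed-step classical fourth-order Runge-Kutta integrator in plain Python instead of the adaptive routine from deSolve, so it is an analog of the procedure and not a reproduction of it. The blue pattern coefficients are read from **Table 2**; the pattern mean and standard deviation used for the starting grid are hypothetical, and treating "all possible combinations" as a 3 × 3 grid of the mean plus or minus one standard deviation is an assumption.

```python
from itertools import product

# Blue pattern from Table 2, arranged as in Equations (2) and (3).
BLUE = {"b0": -0.361, "b1": -0.009, "b2": 0.006,
        "b3": 0.096, "b4": -0.009, "b5": -0.013}


def derivative(state, coef):
    """Predicted change in x and y at a given position."""
    x, y = state
    dx = coef["b0"] + coef["b1"] * x + coef["b2"] * y
    dy = coef["b3"] + coef["b4"] * x + coef["b5"] * y
    return dx, dy


def rk4_step(state, coef, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(k, scale):
        return (state[0] + scale * k[0], state[1] + scale * k[1])

    k1 = derivative(state, coef)
    k2 = derivative(shifted(k1, h / 2), coef)
    k3 = derivative(shifted(k2, h / 2), coef)
    k4 = derivative(shifted(k3, h), coef)
    return tuple(
        state[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
        for i in range(2)
    )


def trajectory(start, coef, h=1.0, steps=500):
    """Iterate the estimated equations forward from a starting position."""
    path = [start]
    for _ in range(steps):
        path.append(rk4_step(path[-1], coef, h))
    return path


def starting_points(mean, sd):
    """Mean position shifted by -1, 0, or +1 SD in each dimension."""
    return [(mean[0] + i * sd[0], mean[1] + j * sd[1])
            for i, j in product((-1, 0, 1), repeat=2)]


# Hypothetical pattern mean and SD; the source does not report them.
starts = starting_points(mean=(-15.0, 15.0), sd=(8.0, 8.0))
for start in starts:
    end = trajectory(start, BLUE)[-1]
    print(start, "->", (round(end[0], 2), round(end[1], 2)))
```

Plotting each `path` as a sequence of arrows would give a vector field of the kind shown in **Figure 6**. A fixed step is adequate here because the coefficients are small relative to the step size; stiffer systems would call for an adaptive method.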

The blue, brown, and green patterns correspond to the food piles while the red pattern corresponds to the ant nest. The yellow pattern corresponds to searching an area where no food existed. The light blue captured the pattern of the ants converging in the middle of the trail as the pheromones were most intense there and the purple captured the dispersal after the food pile in the upper left had been fully collected (it was the first pile found in the simulation).

Notice how each pattern is captured through a different attractor dynamic. For example, the red nest pattern shows a repeller in which ants leave the location. If we capture each ant trail of food collection through the other patterns, then what primarily remains is the initial leaving from the nest. The blue and brown patterns, both corresponding to food piles, show cyclic properties (they have imaginary components to their eigenvalues). This is capturing the pattern of getting the food from the pile, bringing it to the nest and returning. The pattern corresponding to the lower left food pile, however, was a saddle: attractive in one dimension and repulsive in the other. By having the set point far from the dynamic pattern, the saddle generated curved trails that could then be completed by feeding into other, already established, patterns.

Now, we link the agents to these patterns and to key system descriptions, in this case food depletion. **Figure 7** shows the decline in the food piles as a function of time. Notably, the ants found the pile in the upper left first, followed by the lower left and then finally the middle right. We ran seven multilevel models, each treating the posterior probabilities of one pattern as the outcome, as a function of the proportion of food remaining in each pile (a three predictor MLM). The fixed and random effects along with intraclass correlations (ICC) are in **Table 3**. All random effects were significantly non-zero, suggesting that individual ants varied in how their likely pattern depended on the remaining food piles. The fixed effects indicate how the likelihood of being in a pattern tracked the food piles: a positive sign meant that declines in a food pile corresponded to declines in the likelihood of the pattern, and a negative sign meant that declines in a food pile corresponded to increases in the likelihood of the pattern. Given the order of the food pile depletions, the pattern of effects can also roughly determine when the pattern was more prevalent.

The red pattern at the nest was likely when all the food sources were untouched and declined in likelihood as all the food piles declined, consistent with the ants initially leaving the nest to search. The blue, light blue, and purple patterns, all associated with the upper left quadrant, were all less likely when the last food pile was untouched, but only the purple (the theoretical dispersion after the food pile was depleted) was contingent upon the corresponding upper left food pile. The negative sign indicated that declines in the first pile increased the likelihood of the purple dispersion pattern, consistent with leaving the trail to find another food source once the food in the first pile was depleted. The green (corresponding to the lower left food pile) and brown (corresponding to the middle right food pile) patterns were predicted by all three food piles with negative coefficients, suggesting that as any food depleted, these became more likely, consistent with these food piles being found later. Finally, the yellow pattern was only uniquely predicted by the middle right food pile depletion such that as the food pile declined, so did the likelihood of being in the search pattern. This is consistent with more ants converging on the last food pile as more of them found it. Once it depletes, however, fewer ants would be in this search pattern.

### Baboons Navigation Data

So far, we have relied on simulations to illustrate how one can depict higher order emergent coordination for agent interactions using attractor dynamics. Our next two examples are derived from observed data. **Figure 8** represents a solution from global positioning system (GPS) data collected from a troop of baboons at the De Hoop Nature Reserve in South Africa. **Table 4** contains the coefficients and eigenvalues, again using colors to indicate correspondence. To collect this data, researchers recorded the positions of 14 adult baboons by holding a GPS device over or very close to each animal at different points over a 74 day period (data was made available by Bonnell et al., 2016; and further details of the original study can be found at Bonnell et al., 2017). Consistent with the ants data, this example data is in an x/y coordinate space, but now in longitude and latitude. To facilitate estimation due to variability occurring in small decimal places, longitude and latitude were mean-centered and multiplied by 1,000.

TABLE 2 | Unstandardized coefficients (and standard errors) for the seven group solution along with eigenvalues for the Ants model.

| Pattern | Equation | Own | Coupling | Intercept | Eigenvalues |
| --- | --- | --- | --- | --- | --- |
| Blue | X | −0.009 (0.002) | 0.006 (0.002) | −0.361 (0.064) | −0.011 + 0.007i |
| | Y | −0.013 (0.001) | −0.009 (0.002) | 0.096 (0.064) | −0.011 − 0.007i |
| Light Blue | X | −0.020 (0.003) | 0.018 (0.004) | −0.467 (0.075) | −0.040 |
| | Y | −0.023 (0.004) | 0.020 (0.004) | 0.522 (0.083) | −0.002 |
| Purple | X | −0.014 (0.003) | 0.014 (0.004) | −0.146 (0.035) | −0.028 |
| | Y | −0.001 (0.005) | 0.027 (0.004) | 0.107 (0.049) | 0.013 |
| Yellow | X | 0.005 (0.001) | 0.018 (0.003) | −0.645 (0.071) | −0.011 |
| | Y | −0.017 (0.003) | −0.005 (0.001) | 0.605 (0.085) | −0.000 |
| Green | X | −0.003 (0.001) | 0.002 (0.001) | −0.049 (0.014) | −0.001 |
| | Y | 0.003 (0.001) | −0.004 (0.001) | 0.002 (0.014) | 0.001 |
| Brown | X | 0.000 (0.001) | 0.001 (0.000) | 0.022 (0.019) | 0 + 0.002i |
| | Y | 0.000 (0.000) | −0.003 (0.001) | 0.046 (0.017) | 0 − 0.002i |
| Red | X | 0.000 (0.001) | 0.001 (0.002) | 0.011 (0.004) | 0.019 |
| | Y | 0.019 (0.002) | −0.006 (0.001) | 0.036 (0.004) | 0.000 |

**Figure 8** illustrates that several of the patterns show cyclic behaviors. In fact, all the eigenvalues were negative with 7 of the 10 showing imaginary eigenvalues consistent with cyclic behaviors. Further, all patterns had at least one negative real eigenvalue, suggesting that they all were attractive and indicating a pattern of convergence for baboons. **Figure 8** clearly shows that the patterns were not equally attractive, however, in that vector length differed dramatically when example trajectories were estimated. This can also be seen in the size of the eigenvalues, where some were quite close to zero in their real number portion(s) while others were much more negative, approaching and surpassing negative one. Thus, some of these patterns were more stable clusters for the baboons while others were looser associations around the shared longitude/latitude set point.

In their original work, Bonnell et al. (2017) evaluated whether the movement patterns of a focal individual baboon were influenced by the location of the troop as a collective or by the locations of specific influential members of the troop. Ultimately, their results showed evidence for both of these patterns. In some cases, the focal baboon's movement was highly influenced by the average movement location of the entire troop. In other cases, the focal baboon's movement was quite sensitive to the movements of the alpha female (F1) and the alpha male (M1). To link back to individual baboons, our results suggest a consistent pattern as illustrated in **Figure 9**, wherein we show the average posterior probabilities for each baboon, illustrating which pattern would arguably influence a given baboon the majority of the time (again, colors correspond). Few distinctions existed between the female and dominant male baboons, all showing a preference for the light green (cyclic attractor) and yellow (attractor) patterns. The dominant male (M1) showed slightly more preference for the magenta pattern (also an attractor). Thus, there is evidence of following the primary male baboon, but also one of a female majority. And yet in both cases these most common patterns represent the least attractive patterns (eigenvalues closest to zero) in that there is lots of wandering in comparison to the other patterns inferred from the GPS data.

FIGURE 6 | Topographical illustration of the seven equation solution for the Ant simulation.

TABLE 3 | Unstandardized coefficients (and standard errors) and intraclass correlations from multilevel models predicting the posterior probabilities of being in each of the seven groups for the Ants models.

\*Denotes p < 0.05.

# Beyond Two Dimensions

As we move beyond two dimensions, it is difficult to make easy to read and meaningful maps of the data. However, our approach is not limited to two dimensions. By relying on the eigenvalues presented earlier, one can derive the higher order patterns to illustrate what is occurring without a means to draw them. Further, it also allows us to point out that any variables can be captured as attractor dynamics; they do not inherently need to be spatial, as illustrated by our next example.

TABLE 4 | Unstandardized coefficients (standard errors) and eigenvalues for the 10 pattern solution from the Baboon GPS data.

Each new dimension corresponds to an additional equation. In the six-dimensional case that follows, we model six simultaneous equations where change in each variable is treated as the outcome from each equation. Further each variable at a given point in time is allowed to freely predict the changes in each equation. The matrix used to generate the eigenvalues is based on the coefficients where, as before, the main diagonal are the own effects and the off diagonals are the coupling relationships. Each matrix row corresponds to a different equation.
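In matrix form, the description above can be sketched as follows. The symbols are notation introduced here for illustration: **z** stacks the six affect variables, **b**<sub>0</sub> holds the six intercepts, and **B** is the 6 × 6 coefficient matrix whose row i contains the coefficients predicting change in variable i.

$$
\dot{\mathbf{z}}_t = \mathbf{b}_0 + \mathbf{B}\,\mathbf{z}_t + \mathbf{e}_t,
\qquad
\mathbf{z}^* = -\mathbf{B}^{-1}\mathbf{b}_0,
\qquad
\det(\mathbf{B} - \lambda \mathbf{I}) = 0
$$

The diagonal entries of **B** are the own effects and the off-diagonal entries are the coupling relationships. The set point **z**\* exists only when **B** is invertible, and the six roots λ of the characteristic equation are the eigenvalues read in the same way as in the two-dimensional case: negative real parts indicate attraction, and complex conjugate pairs indicate a cyclic component.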

#### Affect in Families Data

To show a non-spatial example with more than 2 simultaneous change equations, we modeled positive and negative affect from the PANAS (Watson et al., 1988) taken from mothers, fathers, and adolescents from 252 families where the adolescent has type 1 diabetes. The data are taken from the Adolescents with Diabetes and Parents Together study where each family member completed a daily diary for 14 days (further study details can be found at Berg et al., 2009).

We extracted two stable patterns (three patterns would not properly converge and fit indices supported the two pattern solution). **Table 5** provides the estimated coefficients. Notably, the eigenvalues were quite different between the two patterns. The first pattern generates all negative eigenvalues indicating that it forms one large six dimensional attractor (−0.709, −0.522, −0.455, −0.428, −0.285, −0.257). The second pattern, on the other hand had complex numbers for the first two eigenvalues suggesting cyclic behavior as a primary component (−0.613+0.027i, −0.613 −0.027i, −0.420, −0.343, −0.180, −0.157).

Though we cannot draw a map to represent this higher order pattern, one way to represent the changes in the system is through a network diagram. **Figures 10A,B** show only significant (alpha = 0.05, two-tailed) pathways between affect variables. Each arrow begins at the variable whose value serves as the predictor and ends at the variable whose change is predicted. Blue arrows represent negative relationships and brown ones are positive. Note that between **Figures 10A,B** the connections between individuals break down substantially, with the cyclic pattern corresponding to the less connected network. The most noteworthy difference is the changing connections of father's affect to the mother and adolescent. These coefficients merely indicate prediction, and thus any interpretation of causality would overstate the relationship. That said, fathers were clearly showing less connection in the second pattern.

To link back to individual families, we built a multilevel model predicting the posterior probabilities for the first pattern as a function of diabetes risk for the adolescent on a given day. We use the variable risk as an easy to interpret indicator as to how well the adolescent was managing their diabetes on a given day. Risk is a rescaled version of daily blood glucose variability and level such that zero indicates perfect maintenance at doctor recommended levels and 100 indicates either going too high or too low repeatedly (both of which can be quite dangerous; see Kovatchev et al., 2006). Since posterior probabilities for a given data vector add to one, high probabilities of being in the first pattern inherently implies a low probability of being in the second. **Table 6** contains the coefficients. At zero risk on a given day, families were equally likely to be in each pattern (the intercept is the posterior probability when risk was zero). As risk increased, however, families were more likely to fall into the second pattern. That is, on good days we see the more connected attractor pattern and on bad days the father appears less connected and the family affect adopts a cyclic attraction pattern instead.

# DISCUSSION

Kelso (2009) posited that SCD "unites the spontaneous, self-organizing nature of coordination and the obviously directed, agent-like properties characteristic of animate nature into a single framework" (p. 1540). This logic matches with self-organization from agent-based models and, more generally, with cases where many agents engage in social coordination. Connecting attractor dynamics modeling with cases where there are a range of agents and a range of outcomes allows for a generalized approach to quantifying the emergent patterns.

Through various examples, we illustrated that the attractor dynamics can be captured using a combination of difference/differential equation modeling and mixture modeling. Further, we showed that these attractor patterns and their occurrence could be linked with different outcomes. For the flocking model, we found sixteen attractor patterns of the agents' heading that converged on fewer attractors over time. For the ants model, we found seven dynamic patterns to depict their motion in a two-dimensional x/y space that roughly corresponded to qualitative depictions of rules the ants follow. For the baboon navigation data, we found ten patterns in two-dimensional longitudinal and latitudinal space in which the probability of exhibiting a particular attractor was contingent upon influential baboons in the troop (e.g., an alpha male). For families where an adolescent has type 1 diabetes, we found two patterns in a six dimensional affect space that corresponded to higher and lower levels of risk from the disease. By using the data from all the agents, the underlying topology is inclusive of all the agents. In the ants model, for example, not all ants illustrated being influenced by every pattern. Instead, ants can exist in a single pattern their entire time or move between them. Thus, the underlying map implied by the set of dynamic patterns generates an inclusive generalization both within and between agents that capitalizes on the most probable systems states over the duration of the observation period.

TABLE 5 | Unstandardized coefficients (and standard errors) from the two pattern solution for the Affect Daily Diary.

In the form of matrices used to estimate eigenvalues. Table was rounded to second decimal for space. Rows are changes (∆) in Mother (M), Father (F), and Adolescent (A).

In each circumstance, the technique depicts the topological feature in terms of the implied patterns and the stability of those patterns. Whereas the flocking model only contained attractors that varied in their set points (attractive headings) and their stabilities, the ants model illustrated all the common possible attractor dynamic patterns including attractors, repellers, saddles, and cycles.

The complexity of the underlying pattern is directly related to the number of dimensions. With a single dimension, attractor dynamics may only convey attractors and/or repellers. With two dimensions, cycles and saddles can be inferred. Beyond two dimensions, chaotic (strange) attractors are possible, though all currently known chaotic attractors require non-linear equation forms and the equations herein were restricted to linearity within each equation group. Thus, this is a limitation of the technique provided.

In each case, we then linked the quantification back to the individual agents. Through mixture modeling we did this by outputting the posterior probabilities. These probabilities are the probability that a given data vector is under the influence of a given dynamic pattern; the probabilities for a given vector sum to one across all the possible patterns. Therefore, these probabilities maintain the data dependency we inherently ignored in the estimation for the dynamic patterns themselves. We therefore always either examined the probabilities at a collapsed agent level (e.g., averages) or through multilevel modeling wherein the dependencies could be properly taken into account. In each case, it could be linked to possible variables of interest used to depict the system. For the headings, this was illustrated with time in that attractors should collapse as time goes on. For the ants model, this was illustrated through food supply. For the baboons, this was illustrated through the location of the alpha male and the females. For affect in families where the adolescent had type 1 diabetes, it was illustrated with the diabetes risk exhibited that day. In all, this allows one to link the higher order patterns back to meaningful outcomes that characterize when agents behave in certain ways or exhibit theoretically important states.

TABLE 6 | Unstandardized coefficients (and standard errors) from multilevel model predicting the posterior probability of the first pattern as a function of Diabetes risk.

\*Denotes p < 0.05.

In the spatial examples, we utilized variables that depicted the spatial movement. As an initial foray into understanding attractor dynamics, thinking spatially helps make the concepts more intuitive. But, ultimately, these concepts can be applied in many contexts where relationships are not inherently spatial. Being able to think about the spatial analogs helps ground what is being observed, but does not inherently limit the domains in which attractor dynamics can be examined.

Further, individual equation parameters do not always align with the system depiction graphically or through the eigenvalue procedure. In the ants example, this had to do with the reliance on diagonal movement of the ants. By depicting the system through an equation of x and an equation of y, we mask diagonal movement—it is really a straightforward combination of the two dimensions rather than showing some independence. More generally, the coefficients are under an assumption that the dimensions chosen are the primary dimensions for depicting the changes occurring in the system. The eigenvalue procedure bypasses this assumption by instead capitalizing on dimensions that maximize the strength of the attractor dynamics.

Once we move beyond two dimensions, the eigenvalue procedure becomes even more valuable. There is no easy way to graphically "see" the implied dynamic, but the sign and distinctions between real and imaginary portions elucidate the attractor pattern. In practice, anytime we model a system with two or more equations we should adopt the eigenvalue procedure as a means to understand the higher order pattern in addition to any interpretations applied to the individual coefficients themselves. For example, it is common to interpret coupling coefficients as the push/pull of one variable upon another. However, this fails to capture what pattern the push-pull creates, as their interpretation is under an assumption that we somehow picked ideal dimensions to represent them. Locally, the coefficients maintain their meaning, but we cannot extrapolate the more global pattern of which they are a part.

In regards to equation identification, the technique is not without its limitations. The choice of slicing up the data into a series of locally linear equations is an imperfect method for capturing non-linear dynamic models. Specifically, non-linear dynamic models can have both multistability in which more than one pattern is stable simultaneously and cases where variables differentiate when one pattern is or is not accessible. By slicing up the data into a series of locally linear equations through mixture modeling, these two circumstances are difficult to distinguish. One can begin to distinguish these circumstances by attempting to predict the posterior probabilities. However, ultimately multistability is distinguished by states being probable despite nothing differentiating them (or when the dimensions being examined are all that differentiate them). That is, multistability would occur under a lack of being able to predict differences of when agents would be in one or the other. Thus, this approach provides a limited potential for knowing when multistability exists as opposed to having some variable differentiate them. We may never examine the "right" variable or are instead in the situation of arguing a null finding to support the multistable case.

In contrast, it is possible through a cusp catastrophe model in conjunction with multilevel modeling, for example, to allow for differentiating variables (also known as control parameters) without their identification (Butner et al., 2014b), though knowing which scenario you are observing requires examination of many more qualities than discussed herein (Gilmore, 1981). Further, manifolds (the surfaces implied by topological equations) are smooth, while the mixture modeling approach is more patchwork. We do not know the reach of a given attractor dynamic—we chose to represent each dynamic through one standard deviation in each direction from the means when we utilized the Runge-Kutta algorithm to graph plausible trajectories. Notably the means and standard deviations are specific to each dynamic pattern (allowing some to be large and others to be smaller). However, the boundaries of one pattern to another are truly unknown, requiring some inference.

Notably, SCD has tended to rely on cyclical descriptions to model the rhythmic coordination of social agents. While the modeling approach illustrated herein allows for cycles, it does not assume their existence. The direct equation link is that SCD generally functions with second order equations where the second derivatives (acceleration or change in velocity) are treated as the outcomes. Within our structural equation model, it would be analogous to building a quadratic growth model on Toeplitz data where the quadratic growth latent variable would be the second derivative predicted by the other two latent variables (Butner and Story, 2010). Moving to a second order model automatically implies two dimensions and thus generates cycles. However, it is not without a cost. Specifically, second order modeling in this form assumes that the set point of the cycles must equal zero. Overcoming this assumption is currently something under consideration for modeling dynamic patterns and once resolved will unite these approaches more generally.
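The link between a second order equation and a two-dimensional system can be written out as a sketch. The form below is one common linear second order model and is an assumption made for illustration; the source does not give the equation, and the coefficients c<sub>1</sub> and c<sub>2</sub> and the velocity v are notation introduced here.

$$
\ddot{x}_t = c_1 x_t + c_2 \dot{x}_t + e_t
\quad\Longleftrightarrow\quad
\frac{d}{dt}\begin{pmatrix} x_t \\ v_t \end{pmatrix}
= \begin{pmatrix} 0 & 1 \\ c_1 & c_2 \end{pmatrix}
\begin{pmatrix} x_t \\ v_t \end{pmatrix}
+ \begin{pmatrix} 0 \\ e_t \end{pmatrix},
\qquad v_t = \dot{x}_t
$$

$$
\lambda_{1,2} = \frac{c_2 \pm \sqrt{c_2^2 + 4 c_1}}{2}
$$

Cycles arise when c<sub>2</sub><sup>2</sup> + 4c<sub>1</sub> is negative. Because this form has no intercept, the only point at which both x-dot and v-dot are zero is x = 0 and v = 0, which is the zero set point restriction described above.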

# CONCLUSION

Understanding how large-scale, multi-agent social systems coordinate is challenging and complex. In part, the challenge is due to the fact that there are so many agents, system components, and potential system states that can become coordinated, all of which may change over time (Van Orden et al., 2003). These many components interact, generating higher-order system behavior that is emergent and dynamic. However, "Dynamics demystifies...emergence" and it can also provide "basic laws for a quantitative description of phenomena that are observed" (Kelso, 2009; p. 1540). As such, we have expanded on work in SCD by demonstrating the utility of modeling the attractor dynamics of several systems to characterize their higher-order behavioral patterns, and we have shown how these patterns varied over time and could be linked to meaningful aspects of the systems.

Within domains such as agent-based modeling, qualitative depictions of higher-order patterns are often known but not quantitatively modeled. In SCD, phenomena can be nonrhythmic and yet dynamically coordinated. They can exhibit stability and multistability. Thus, using attractor dynamic descriptions along with statistical innovations, such as mixture modeling, provides a reasonable solution to understanding large-scale, multi-agent social coordination. Characterizing the higher-order properties of the system in this way forms a foundation for examining the emergent patterns through time in either a confirmatory or exploratory manner. This same technique, as we have shown, can be utilized with simulated as well as observational data.

It is our aim to recognize that the systems we study are inherently open (even though simulations are often closed). Because we examine only part of the system (the variables we measure), unobserved aspects of the system function as perturbations to the system. Thus, a system depicting families is open because we are only examining some of the variables involved. To understand how agents exhibit emergent self-organization and coordination, we have advanced a general quantification that can be applied to a range of social systems, from two individuals who form a couple up to a crowd's behavior. We hope that these widely applicable techniques will be adopted to advance scientific understanding of SCD.

# ETHICS STATEMENT

The Adolescents with Diabetes and Parents Together project was conducted in accordance with the recommendations of the University of Utah Institutional Review Board. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the University of Utah Institutional Review Board.

# AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

# ACKNOWLEDGMENTS

This study was partially supported by grant R01 DK063044-01A1 from the National Institute of Diabetes and Digestive and Kidney Diseases awarded to Deborah Wiebe (PI) and Cynthia Berg (co-PI). We would also like to thank Tyler Bonnell for making the baboon data publicly available and for answering our inquiries regarding the data set.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2017.00380/full#supplementary-material



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Butner, Wiltshire and Munion. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Sensorimotor Coarticulation in the Execution and Recognition of Intentional Actions

#### Francesco Donnarumma<sup>1</sup> , Haris Dindo<sup>2</sup> and Giovanni Pezzulo<sup>1</sup> \*

*<sup>1</sup> Institute of Cognitive Sciences and Technologies, National Research Council, Rome, Italy, <sup>2</sup> Computer Science Engineering, University of Palermo, Palermo, Italy*

Humans excel at recognizing (or inferring) another's distal intentions, and recent experiments suggest that this may be possible using only subtle kinematic cues elicited during early phases of movement. Still, the cognitive and computational mechanisms underlying the recognition of intentional (sequential) actions are incompletely known and it is unclear whether kinematic cues alone are sufficient for this task, or if it instead requires additional mechanisms (e.g., prior information) that may be more difficult to fully characterize in empirical studies. Here we present a computationally-guided analysis of the execution and recognition of intentional actions that is rooted in theories of motor control and the coarticulation of sequential actions. In our simulations, when a performer agent coarticulates two successive actions in an action sequence (e.g., "reach-to-grasp" a bottle and "grasp-to-pour"), he automatically produces kinematic cues that an observer agent can reliably use to recognize the performer's intention early on, during the execution of the first part of the sequence. This analysis lends computational-level support for the idea that kinematic cues may be sufficiently informative for early intention recognition. Furthermore, it suggests that the social benefits of coarticulation may be a byproduct of a fundamental imperative to optimize sequential actions. Finally, we discuss possible ways a performer agent may combine automatic (coarticulation) and strategic (signaling) ways to facilitate, or hinder, an observer's action recognition processes.

#### Edited by:

*Joanna Raczaszek-Leonardi, University of Warsaw, Poland*

#### Reviewed by:

*Cristina Becchio, University of Turin, Italy; Carol A. Fowler, Retired, USA*

#### \*Correspondence:

*Giovanni Pezzulo giovanni.pezzulo@istc.cnr.it*

#### Specialty section:

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology*

Received: *29 October 2016* Accepted: *07 February 2017* Published: *23 February 2017*

#### Citation:

*Donnarumma F, Dindo H and Pezzulo G (2017) Sensorimotor Coarticulation in the Execution and Recognition of Intentional Actions. Front. Psychol. 8:237. doi: 10.3389/fpsyg.2017.00237*

Keywords: coarticulation, joint action, action recognition, planning, distal actions, sequential action

# 1. INTRODUCTION

Imagine a football player who is approaching the opponent team's area with the ball. One can define the player's current goal as approaching the area, while his distal intention is passing the ball or shooting. For both his teammates and his opponents, inferring the player's distal intention (not only his current goal) offers an advance opportunity to help or hinder him, highlighting the importance of goal and intention recognition in realistic social interactions, cooperative or competitive. From a computational perspective, another's proximal goals and distal intentions can be considered hidden (i.e., non-observable) cognitive variables that can be inferred based on observables (e.g., the player's behavior) and prior knowledge (e.g., tactics used by the soccer team) (Wolpert et al., 2003). While generally difficult in real-world social settings, goal and intention recognition may be less formidable than commonly believed, because proximal kinematics turn out to be very informative.

A series of studies have shown that humans are surprisingly good at inferring another person's proximal goals or distal intentions, even with apparently little data (Sartori et al., 2009). One recent study reveals that participants who observed grasping movements were able to report accurately whether the to-be-grasped object was small or big as early as 80 ms after movement onset, suggesting that action kinematics can be very informative at early perceptual stages (Ansuini et al., 2016). A similar case may be made for the recognition of distal intentions. For example, when an agent decides between "grasping a bottle to pour water" and "grasping a bottle to move it," evidence shows that the agent's decision is already discriminable from the first part of the motor action, i.e., the grasping of the bottle (Sartori et al., 2011). In fact, the way in which the bottle is grasped turns out to be slightly different (e.g., at the level of action kinematics) in the two cases. More generally, many studies show a "tendency to grasp objects differently depending on what one plans to do with the objects" (Rosenbaum et al., 2012), which means that hand preshape can be used as a cue to infer the distal intention. This situation has equivalents in other domains, such as linguistics, in which it is widely known that the pronunciation of segments depends on other segments that are close to them, e.g., the next segment (coarticulation, see e.g., Fowler, 1980; Fowler and Saltzman, 1993; Mahr et al., 2015). These subtle changes in the action kinematics provide information about the performer's goals (Sartori et al., 2009; Neal and Kilner, 2010; Manera et al., 2011; Becchio et al., 2012; Naish et al., 2013; Quesque et al., 2013; Ansuini et al., 2015; Lewkowicz et al., 2015; Cavallo et al., 2016). At least in some cases, even subtle cues are detectable and can help observers to infer the performer's distal intentions early on, thus resulting in communicative and not only pragmatic effects.

The informativeness of early kinematic cues may even increase in explicitly cooperative social settings. For example, one study found that, during the same motor action of placing an object, the deceleration phase was slower when a "giving" action (proximal goal) was directed to another individual than when it was performed without this social constraint (Becchio et al., 2008). A series of other studies have shown that, when engaged in social interactions, co-actors usually signal their intentions and carve their action kinematics in ways that make their action goals easier to discriminate when there is asymmetric information and the performer agent is more knowledgeable than the observer about the task at hand (Vesper et al., 2010; Pezzulo, 2011; Pezzulo and Dindo, 2013; Pezzulo et al., 2013a; Sacheli et al., 2013; Candidi et al., 2015).

These and other studies have assessed the usefulness of (early) kinematic cues for understanding an actor's proximal goals but also his distal intentions. One possible explanation for this phenomenon is that, in the context of grasping actions, an object can be handled and manipulated differently depending on a performer's intention—hence the agent's intention can be inferred from the way the agent performs the motor action. This explanation, however, lacks a quantitative (or computational) characterization so far and it is unclear whether one can derive the benefits of distal intention recognition from normative principles, e.g., the minimization of action costs. Furthermore, it is unclear if the explanation is sufficient to explain the data; for example, if appealing to early kinematic cues can fully explain the rapidity of intention recognition found in human studies, or if it is instead necessary to appeal to additional mechanisms (e.g., sophisticated prior information or evolutionary adaptations for intention recognition that are fundamentally different from those that permit recognizing proximal action goals).

In this paper, we offer a computationally-guided explanation of distal intention recognition that is rooted in normative theories of computational motor control and (embodied) sequential action (Sandamirskaya and Schöner, 2010; Rosenbaum et al., 2012; Pezzulo et al., 2014, 2017; Lepora and Pezzulo, 2015; Pezzulo and Cisek, 2016). From a control-theoretic perspective, proximal actions have to simultaneously fulfill the concurrent demands of proximal and distal goals (or first-order and higher-order planning). In other words, any goal-directed action is shaped according to its proximal and distal goals: first-order planning (associated with proximal goals) determines the object-handling (grasp) trajectory according to immediate task demands (e.g., tuning to the orientation or the grip size for an object to be grasped); higher-order planning (associated with distal goals) alters one's object manipulation behavior not only on the basis of immediate task demands but also on the basis of the next tasks to be performed. This would imply that in certain conditions one can impose a cost on the proximal action or execute it suboptimally in order to fulfill the requirements of a distal action, e.g., a waiter can grasp a glass with a thumb-down posture if he has to subsequently turn it upright (Rosenbaum et al., 1990). The necessity of simultaneously optimizing proximal and distal components of an action sequence (e.g., "reaching and grasping a bottle to pour water" vs. "to move the bottle") implies the coarticulation of consecutive motor acts, which would thus provide a normative rationale for the differences in the former part of the sequence ("reaching and grasping the bottle") depending on the latter part or distal intention.<sup>1</sup>

Below we present a computational analysis of coarticulation during object grasping showing that (i) an agent who coarticulates proximal and distal actions produces different kinematic patterns in the first part of a sequential action ("reaching and grasping the bottle") depending on his distal goal ("pouring" or "moving the bottle"); (ii) in turn, coarticulation gives rise to kinematic features that are sufficient for observers to correctly discriminate the agent's distal intention early in time, at least in some cases. Our analysis provides computational-level support for the idea that accurate intention recognition may be due to early kinematic cues elicited during proximal actions, without necessarily requiring additional mechanisms. In turn, the elicitation of informative cues may be a byproduct of the optimization of sequential actions and does not necessarily need to have a social goal (e.g., facilitation of action recognition, as in signaling; Vesper et al., 2010; Pezzulo, 2011; Sacheli et al., 2013), although of course automatic (coarticulation) and strategic (signaling) modulations of one's own action kinematics can be merged.

<sup>1</sup>For the sake of simplicity, here we equate coarticulation and assimilation; see also Jerde et al. (2003). However, there is a conceptual difference between the two: coarticulation is the underlying process (i.e., the temporal overlap between sequential actions) while assimilation is the superficial result (in terms of increasing the similarity of the last part of the first movement to the first part of the last movement).

# 2. COMPUTATIONAL APPROACH

In computational motor control, it is widely assumed that action representations stem from (probabilistic) internal models (Wolpert et al., 2003; Jeannerod, 2006; Shadmehr and Krakauer, 2008; Friston et al., 2010, 2017; Pezzulo et al., 2015, in press; Donnarumma et al., 2016; Maisto et al., 2016; Stoianov et al., 2016). These models can be hierarchical, with higher hierarchical levels encoding more abstract and distal aspects and lower hierarchical levels encoding more proximal aspects that are related to action performance. At lower levels, actions such as grasping or pouring can be associated with probability distributions over hand kinematics (e.g., controls of angles of fingers), which of course change over time as the action unfolds.

Within this general probabilistic framework, we model the performer agent using a computational method that combines basic actions (or motor primitives) such as grasping and pouring to realize a sequential action (e.g., grasp a bottle to move it or pour), with or without coarticulating them. Furthermore, we model the observer agent using a computational method that infers the performer agent's current action by "simulating" the execution of (the same) motor primitives for grasping, moving and pouring. Below we briefly introduce these two computational models, which we subsequently specify more formally.

## 2.1. Rationale of the Computational Approach

### 2.1.1. Performer Agent

According to our coarticulation hypothesis, we describe the planning of sequential actions as the coarticulation (or assimilation) of two successive motor primitives, e.g., a motor primitive for reaching-and-grasping and one for grasping-and-pouring. Intuitively, assimilation implies that if the two sequential actions are modeled by two different probability distributions of hand kinematics (Dindo et al., 2011; Pezzulo et al., 2013a), these two distributions are made more similar by sampling from their probabilistic superposition (aka, coarticulated distribution) rather than the two original distributions, over time. **Figure 1** offers a schematic illustration of this concept in a simplified 2D domain, where the proximal action (from time zero to time 1,000) corresponds to moving a mouse to the center-right, and the distal action (from time 1,000 to time 2,000) corresponds to moving the mouse to the top-right or bottom-right. The colors correspond to the mean and variance of the probability distributions of hypothetical center-right, top-right and bottom-right mouse movements. The figure shows how the same proximal action (move to center-right) can be either independent of (top panel) or assimilated/coarticulated with (bottom panel) the subsequent action of reaching the top-right or bottom-right. In the latter case, the effects of assimilation on the mouse movements are apparent from time 600, well before the (theoretical) beginning of the distal action.

### 2.1.2. Observer Agent

According to motor theories of cognition, the computational mechanisms (and internal models) used for action planning and execution are also reused for action understanding, using motor simulation (Jeannerod, 2006). In keeping with this idea, we model the action observation process as a (probabilistic) inference problem, in which an observer agent considers multiple possible hypotheses that correspond to the actions that may have generated the observed sensory stimuli (i.e., whether the performer agent is grasping for pouring vs. grasping for moving) and has to select one of them. To do so, the observer agent simulates executing multiple actions in parallel (from his own motor repertoire), compares the predictions under these different hypotheses with the observed movements, and assigns high probability to the action/hypothesis that generates the smaller prediction error. This process is iterated over time using a probabilistic scheme (see below), so that as the performer agent's actions unfold in time, evidence accumulates for one of the alternative hypotheses. Note that using this framework implicitly requires the assumption that performer and observer agents share the same set of motor primitives, although the probabilistic parameters might differ according to individual actor's knowledge and expertise. Our simulations will show that this motor simulation process converges more readily to the correct explanation when the performer agent uses coarticulation—and in this latter case, an observer agent can correctly recognize the distal intention of a demonstrating agent while he is still executing the proximal action.

## 2.2. Formal Aspects of the Computational Approach

### 2.2.1. Performer Agent and the Coarticulation Distribution

Coarticulation is the process of altering one's own behavior to facilitate the next action. In this framework, a proximal action is coarticulated (or assimilated) with the next action in a sequence if the differences between the (probability distributions denoting the) two actions are minimized, while at the same time it maintains its correct pragmatics (e.g., a reaching action has to effectively reach the bottle despite being coarticulated with a successive grasping action).

To exemplify this concept, let's consider two actions (e.g., reaching a bottle and executing a power grasp), each implemented as a motor primitive ($m_1$ or $m_2$) that, for every moment in time, can be associated with a probability distribution (for example, a Gaussian distribution over its corresponding kinematic parameters such as hand and finger configurations). **Figure 2** shows the distributions associated with the two motor primitives, $p(\mathbf{x}_t|m_1)$ for model $m_1$ (e.g., reaching the bottle, in blue) and $p(\mathbf{x}_t|m_2)$ for model $m_2$ (e.g., power grasp, in red), at time $t$. Based on these two original distributions, it is possible to compute the novel coarticulated distribution $p^{coa}(\mathbf{x}_t|m_1)$ (e.g., reaching the bottle while preparing to grasp it with a power grasp, in orange), which corresponds to the fact that at time $t$, the motor primitive $m_1$ is coarticulated with $m_2$. Obviously, this example only describes what happens in a single temporal instant, while actions unfold in time. To model temporal dynamics of motor primitives, it is possible to extend the same formalism using continuous distributions, see below.

*[Figure caption fragment: ... denote the probability of occupying a given position in space, during time, from red (highest probability) to blue (lowest probability).]*

It is important to remark that any sample drawn from the coarticulation distribution (in orange) at time $t$ should simultaneously satisfy two constraints: it should be representative of the original distribution of the first motor primitive, $p(\mathbf{x}_t|m_1)$, while at the same time it should have a high probability of belonging to the second motor primitive $m_2$ (or, in more general cases, even to a set of future motor primitives $m_j$). Accordingly, to obtain an approximation of the coarticulation distribution, we adopt a rejection sampling technique. Let $\hat{\mathbf{x}}_t$ be a sample from the original distribution $p(\mathbf{x}_t|m_i)$ of a motor primitive $m_i$. Given $K$ random values, $u_k \in [0, 1]$, sampled from the uniform distribution over $[0, p(\mathbf{x}_t|m_k)/p_k^{\max}]$, we decide to accept the sample $\hat{\mathbf{x}}_t$ if the following holds:

$$u\_i < w\_i \cdot p(\hat{\mathbf{x}}\_t|m\_i) \quad \text{and} \quad u\_j < w\_j \cdot p(\hat{\mathbf{x}}\_t|m\_j), \quad \forall j \ne i \tag{1}$$

where $\mathbf{w} = [w_1, w_2, \ldots, w_K]$ is a vector of weights that modulate the contribution of the individual motor primitives in the coarticulation distribution. Intuitively, this implies that a sample is accepted if and only if it is a "good exemplar" of both (say) the grasping and the pouring distributions, so that the novel coarticulation distribution combines aspects of grasping and pouring.
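The acceptance rule in Equation (1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: each primitive is reduced to a one-dimensional Gaussian at a single time step, each density is divided by its own peak value so that it can be compared with $u_k \in [0, 1]$ (our reading of the normalization by $p_k^{\max}$), and the names `gaussian_pdf` and `sample_coarticulated` are hypothetical.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def sample_coarticulated(means, stds, weights, i=0, n_proposals=20000, seed=0):
    """Rejection-sample primitive i so that kept samples also fit the others.

    Assumption: every primitive k is a 1-D Gaussian N(means[k], stds[k]^2)
    and densities are rescaled by their peak so they lie in [0, 1].
    """
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    weights = np.asarray(weights, dtype=float)
    rng = np.random.default_rng(seed)

    # Proposals x_hat are drawn from the original distribution of primitive i.
    proposals = rng.normal(means[i], stds[i], size=n_proposals)

    # Peak-normalized density of every proposal under every primitive: (K, n).
    peak = gaussian_pdf(means, means, stds)
    relative = gaussian_pdf(proposals[None, :], means[:, None], stds[:, None])
    relative = relative / peak[:, None]

    # One uniform value u_k per primitive and proposal; keep a proposal only
    # if u_k < w_k * p(x_hat | m_k) / p_k_max holds for all k (Equation 1).
    u = rng.uniform(0.0, 1.0, size=relative.shape)
    accepted = np.all(u < weights[:, None] * relative, axis=0)
    return proposals[accepted]


# Toy example: primitive 0 centered at 0.0, primitive 1 centered at 2.0.
samples = sample_coarticulated(means=[0.0, 2.0], stds=[1.0, 1.0], weights=[1.0, 1.0])
print(f"accepted {samples.size} proposals; mean = {samples.mean():.2f}")
```

In this toy setting the accepted samples fall between the two primitive means. Because the proposals are themselves drawn from the first primitive, they lean toward it, which matches the requirement that the coarticulated action remain a good exemplar of the proximal primitive.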

In the case of continuous distributions $p(\mathbf{x}_t|m_j)$, the coarticulation distribution becomes:

$$p^{coa}(\mathbf{x}\_t|m\_i;\mathbf{w}) \propto w\_i \cdot p(\mathbf{x}\_t|m\_i) \prod\_{j \neq i} (w\_j \cdot p(\mathbf{x}\_t|m\_j)) \tag{2}$$

The resulting coarticulation distribution $p^{coa}(\mathbf{x}_t|m_i)$ is constructed in such a way that its kinematic parameters are the most probable for the motor primitive $m_i$ but also the most similar to those of the primitive(s) to be executed next ($m_j$). As illustrated in **Figure 1**, the motor primitives for (say) grasping and pouring then mesh coherently over time (bottom panel: coarticulation), rather than being simply executed one after the other (top panel: no coarticulation). Another way to appreciate the key features of the coarticulation distribution is contrasting it with its "converse": the signaling distribution, see **Figure 2**. While the coarticulation distribution is constructed in such a way to emphasize the similarities between two motor primitives, the signaling distribution is constructed in such a way to emphasize their differences. Hence the former (coarticulation) distribution is more appropriate to model assimilation effects (e.g., between two consecutive motor primitives as in our examples) and the latter (signaling) distribution is more appropriate to model dissimilation effects such as those arising during non-verbal, sensorimotor communication (Vesper et al., 2010; Pezzulo, 2011; Pezzulo and Dindo, 2013; Pezzulo et al., 2013a,b; Sacheli et al., 2013; Candidi et al., 2015). See the Appendix for a more formal treatment of the signaling distribution.
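As a worked illustration of Equation (2), consider the simplest case of two one-dimensional Gaussian primitives, $p(x_t|m_1) = \mathcal{N}(x_t; \mu_1, \sigma_1^2)$ and $p(x_t|m_2) = \mathcal{N}(x_t; \mu_2, \sigma_2^2)$, with unit weights. The Gaussian form and the symbols $\mu$ and $\sigma$ are assumptions made for this example only. The product of two Gaussian densities is again proportional to a Gaussian:

$$p^{coa}(x_t|m_1) \propto \mathcal{N}(x_t; \mu_1, \sigma_1^2) \cdot \mathcal{N}(x_t; \mu_2, \sigma_2^2) \propto \mathcal{N}(x_t; \mu_{coa}, \sigma_{coa}^2)$$

$$\sigma_{coa}^2 = \left(\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}\right)^{-1}, \qquad \mu_{coa} = \sigma_{coa}^2 \left(\frac{\mu_1}{\sigma_1^2} + \frac{\mu_2}{\sigma_2^2}\right)$$

For instance, with $\mu_1 = 0$, $\mu_2 = 2$ and $\sigma_1 = \sigma_2 = 1$, the coarticulated mean is $\mu_{coa} = 1$ and the variance is halved, so the coarticulated kinematics sit between the two primitives and are more tightly concentrated than either.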

### 2.2.2. Observer Agent and Probabilistic Motor Simulation

Our implementation of action understanding via motor simulation is based on a Dynamic Bayesian Network (DBN) shown in **Figure 3**. DBNs are Bayesian networks representing temporal probability models in which directed arrows depict assumptions of conditional (in)dependence between variables (circles) (Murphy, 2002). Shaded nodes represent observed variables while the others are hidden and need to be estimated through the process of probabilistic inference. In our representation, the process of action understanding is influenced by the following factors expressed as stochastic variables in the model (see Dindo et al., 2011 for a more detailed account of the model):


During action observation, the model has to infer which action the performer agent is performing (e.g., whether he or she is currently grasping, pouring, lifting, etc., where each action, proximal or distal, is denoted by an index $i_t$). The goal-directed action is considered to be hidden (i.e., not directly observable), but it can be inferred on the basis of noisy sensory observations (denoted as $\mathbf{z}_t$), e.g., the performer's hand movements. The logic is that of standard (inverse) Bayesian inference, which considers multiple potential actions as candidate explanations that compete to explain the sensory data (Wolpert et al., 2003; Demiris and Khadhouri, 2005; Dindo et al., 2011; Friston et al., 2011; Pezzulo, 2013; Donnarumma et al., in press). Each action $i_t$ is associated with a paired inverse-forward model (see Equation 4 below). Re-enacting these actions "in simulation" generates a motor control $u_t$ (given the hidden state $\mathbf{x}_{t-1}$, aka inverse model), and a prediction of the next hidden state $\mathbf{x}_t$ (given the motor control $u_t$ and the previous state $\mathbf{x}_{t-1}$, aka forward model). Comparing the predicted and sensed movements under various competing hypotheses (e.g., grasping, pouring) makes it possible to assess which one generates less prediction error and hence better explains the data. A priori contextual information $c_t$ can bias the inferential process and the initial choice of the internal models to test (in case they are too numerous).

The following equations describe the observation model (Equation 3), which specifies the way (noisy) sensory stimuli are used to estimate the state of the demonstrator (e.g., hand position); the transition model (Equation 4), which specifies how the state of the demonstrator evolves as a function of his or her goals and motor commands; and the a priori distribution over the set of hidden variables (Equation 5), which represents the perceiver's prior belief and is a necessary ingredient of Bayesian systems.

$$p(\mathcal{Z}\_t|\mathcal{X}\_t) = p(\mathbf{z}\_t|\mathbf{x}\_t) \tag{3}$$

$$p(\mathcal{X}\_t|\mathcal{X}\_{t-1}) = p(\mathbf{x}\_t|\mathbf{x}\_{t-1}, \boldsymbol{u}\_t, \mathbf{i}) \cdot p(\boldsymbol{u}\_t|\mathbf{x}\_{t-1}, \mathbf{i}) \tag{4}$$

$$p(\mathcal{X}\_0) = p(\mathbf{x}\_0) \cdot p(c\_0) \cdot p(i|c\_0) \tag{5}$$

The inference exploits the usual (prediction) error-correction mechanisms of Bayesian systems. The model starts with prior hypotheses about the demonstrator's actions and intentions, and these are iteratively revised as new sensory evidence is sampled. The evidence provided by the perceptual process and the observed states ($\mathbf{z}_t$) is responsible for "correcting" the posterior distribution when integrating the observation model $p(\mathbf{z}_t|\mathbf{x}_t)$. In other words, those parts of the hidden state that are in accordance with the observations will exhibit peaks in the posterior distribution. Since those states have been produced by a goal-directed motor primitive, the marginalization of the final posterior distribution produces the required discrete distribution over motor primitives, $p(i_t|\mathbf{z}_{1:t})$.

Thus, the motor primitive with the highest probability (above a fixed threshold) is selected as the "winning" primitive; such an inference process can be iterated over time by using the full posterior distribution as the prior for the next step, until convergence. Ultimately, the aim of the whole process is to estimate the probability of each model given the observations so far (i.e., likelihood). The most plausible model is the one that maximizes the posterior probability of the model. As usual in a Bayesian setting, the whole process is influenced by the choice of the prior distributions for the available motor primitives: the more likely a particular motor primitive is a priori, the more reliable and faster its recognition. In particular, within this framework action recognition is influenced by an auxiliary (contextual) variable, which can intuitively reflect an agent's contextual knowledge (e.g., that pouring is highly unlikely if the bottle is almost empty) and which biases the motor primitives that are actually simulated by the agent. While prior probabilities and contextual information are important in real-life scenarios, we do not use them in our simulations.
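The iterative update described above can be illustrated with a deliberately simplified Python sketch. It is not the DBN of Figure 3: the hidden state and motor controls are collapsed into one precomputed predicted trajectory per primitive, the observation noise is assumed to be Gaussian with a fixed standard deviation, and the prior is uniform. The function name `recognize` and its parameters are hypothetical.

```python
import numpy as np


def recognize(observations, predictions, noise_std=0.1, threshold=0.9):
    """Recursively update p(i | z_1:t) over K candidate primitives.

    observations: sequence of T scalar observations z_t.
    predictions:  K sequences of length T, one predicted trajectory each
                  (assumption: a stand-in for the forward-model output).
    Returns the posterior history (T, K) and the first (winner, t) pair
    whose posterior reaches the threshold, or None if none does.
    """
    observations = np.asarray(observations, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    n_primitives = predictions.shape[0]

    # Uniform prior over primitives, kept in log space for stability.
    log_post = np.full(n_primitives, -np.log(n_primitives))
    history, decision = [], None

    for t, z in enumerate(observations):
        # Gaussian log-likelihood of the prediction error of each primitive.
        error = z - predictions[:, t]
        log_post = log_post - 0.5 * (error / noise_std) ** 2
        # Normalize; this posterior serves as the prior at the next step.
        log_post -= np.logaddexp.reduce(log_post)
        posterior = np.exp(log_post)
        history.append(posterior)
        if decision is None and posterior.max() >= threshold:
            decision = (int(posterior.argmax()), t)

    return np.array(history), decision


# Toy example: two hypotheses that predict the same movement until step 3.
pour = [0.0, 0.0, 0.0, 0.5, 1.0]
move = [0.0, 0.0, 0.0, -0.5, -1.0]
history, decision = recognize(pour, [pour, move], noise_std=0.25)
print(decision)
```

In this toy case the posterior stays at chance while the two predicted trajectories coincide and commits as soon as they diverge, which is why kinematic differences that appear earlier in the movement allow earlier recognition.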

# 3. EXPERIMENTAL SETUP AND RESULTS

We performed a series of computational simulations, in which one (performer) agent executes one of two sequential actions (e.g., "reaching and grasping a bottle to pour water" vs. "reaching and grasping a bottle to move it") in two conditions: with and without the coarticulation method explained in Section 2.2.1. At the same time, the other (observer) agent has to disambiguate these two alternatives as soon and as accurately as possible, using the probabilistic motor simulation methods introduced in Section 2.2.2. These simulations permit us to study the benefits of coarticulation, and to test the "sufficiency" hypothesis introduced earlier: namely, that kinematic features at early stages of a coarticulated action permit an observer to recognize the action. In our scenario, this means that a sequential action (e.g., "reaching and grasping a bottle to pour water") can be discriminated already during the first (reaching) phase. Conversely, when the same action is executed without coarticulation, it can only be recognized during the second phase, after the agent has grasped the bottle.

In our simulations, for the sake of simplicity we focused on two two-step sequential actions: reach-and-pour vs. reach-and-move. In practice, we used three motor primitives: the first primitive (reach-to-grasp) corresponds to the first step in both sequences, while the other two primitives (grasp-to-pour and grasp-to-move) correspond to the two final actions to complete the first and second sequential actions, respectively. At each moment in time, from 0 ms (beginning of sequential action) to 1,500 ms (end of sequential action), each motor primitive corresponds to a probability distribution over controls of finger, thumb and wrist of a (human) hand.

The motor primitives were derived based on controls and parameters extracted from human data collected from six adult male participants. Each participant executed every primitive action 50 times, and data on angles of finger, thumb and wrist were collected using a dataglove (HumanGlove - Humanware S.r.l., Pontedera, Pisa, Italy) endowed with 16 sensors. The first (reach-to-grasp) motor primitive was acquired while participants reached an object the size of a bottle with a concave constriction (see also Sartori et al., 2011), with no knowledge of the next action to perform with it. We selected the other two primitives (grasp-to-pour and grasp-to-move) as instances of power grasp and precision grip actions, respectively, in which the end-position of the fingers was analogous to the positions reported by Sartori et al. (2011) while humans grasped a bottle to pour or move it, respectively.

The internal dynamical models (motor primitives) used in the simulations were obtained by regressing the aforementioned data (50 trials for 6 participants for each primitive), to obtain probability distributions over angles of finger, thumb and wrist, over time. For each motor primitive, we used an Echo State Gaussian Process (ESGP) (Chatzis and Demiris, 2011), a method for the Bayesian modeling of sequential data that produces a measure of confidence (or uncertainty) on the generated predictions (the model predictive density), which can be directly used within our computational approach.

In the simulations reported below, a non-coarticulated action corresponds to the first primitive (reach-to-grasp) being used for the first 1,000 ms, while one of the two remaining primitives (grasp-to-pour or grasp-to-move, depending on the task) is used for the successive 500 ms. A coarticulated action corresponds to the first primitive (reach-to-grasp) being coarticulated with one of the two remaining primitives (grasp-to-pour or grasp-to-move, depending on the task) during the interval between 500 and 1,000 ms, using the coarticulation method explained in Section 2.2.1. In other words, we derive the coarticulated actions by "meshing" two primitives, not by using separate ESGPs. Note that in the simulations, we coarticulated the index finger and thumb controls (not the wrist controls), coherent with their importance in grasping and pouring actions (Sartori et al., 2011).

A first result of our simulations is that during the execution of the first (reach-to-grasp) motor primitive in the sequence, Maximum Grip Aperture and Time of Maximum Grip Aperture differ significantly depending on whether the primitive is coarticulated with a grasp-to-pour primitive, with a grasp-to-move primitive, or not coarticulated at all (see **Figure 4**). This result is not remarkable per se, but can be considered a "safety check" that different intentions elicit different action kinematics, with a pattern that is qualitatively coherent with the results of Sartori et al. (2011) in a similar scenario. What was more important for us was to study whether (and how) this difference in action kinematics translates into an advantage for the observer agent at early stages of the performer agent's movement.
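For readers who wish to compute the two kinematic measures mentioned above, the following is a minimal sketch. It assumes that thumb-tip and index-tip positions are available as 3D coordinates at each time sample (for example, obtained from glove joint angles through a hand model, a step not shown here); the function name `grip_aperture_peak` and its arguments are hypothetical.

```python
import numpy as np

def grip_aperture_peak(times_ms, thumb_xyz, index_xyz):
    """Return (maximum grip aperture, time of maximum grip aperture).

    times_ms:  sequence of T time stamps in milliseconds.
    thumb_xyz: array-like of shape (T, 3), assumed thumb-tip positions.
    index_xyz: array-like of shape (T, 3), assumed index-tip positions.
    Grip aperture is taken here as the Euclidean thumb-index distance.
    """
    times_ms = np.asarray(times_ms, dtype=float)
    thumb = np.asarray(thumb_xyz, dtype=float)
    index = np.asarray(index_xyz, dtype=float)
    aperture = np.linalg.norm(thumb - index, axis=1)  # distance per sample
    i = int(np.argmax(aperture))                      # first sample of the peak
    return float(aperture[i]), float(times_ms[i])
```

Applying the function separately to trajectories generated with and without coarticulation would give the pairs of values that a comparison such as the one in Figure 4 is based on.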

To answer this question, we simulated the behavior of an observer agent that has to recognize the actions performed by the performer agent, using the probabilistic motor simulation mechanism described in Section 2.2.2. As shown in **Figure 5**, the observer agent had a clear advantage in recognizing the performed action when it was coarticulated. More specifically, the figure shows that without coarticulation the performer agent's distal intention (pouring or moving the bottle) can be recognized only after he reaches the bottle (i.e., after time 1,000), while with coarticulation it can be recognized much earlier, during the execution of the first motor primitive (i.e., well before time 1,000). This latter result illustrates that coarticulation affords intention recognition in ways that are qualitatively different from the mere execution of a (non-informed) action.

# 4. DISCUSSION

Humans excel at recognizing distal intentions on the basis of (apparently) little information, but the cognitive and computational mechanisms underlying this ability are incompletely known. We have proposed that normative principles regulating the coarticulation of sequential actions can explain how it is possible to infer a performer's distal intention by looking at the kinematics of his proximal actions.

To test this idea, we implemented a series of simulations in which the performer agent executes sequential actions (reach-and-pour or reach-and-move) as sequences of two primitives (reach-to-grasp and grasp-to-pour, or reach-to-grasp and grasp-to-move) with or without coarticulation. Our results show that two successive actions can be coarticulated (or assimilated) in such a way that the kinematics of the proximal action are adequate for (and informative of) the next action(s) in the sequence. Indeed, our results show, first, that coarticulated actions have characteristic kinematic features compared to non-coarticulated actions, and second, that these features may be sufficient for an observer agent to correctly recognize the performer agent's distal intention early on. This result holds despite the fact that we used simplified motor primitives and only coarticulated index finger and thumb controls. In principle, an observer agent having access to richer visual stimuli and more sophisticated primitives (with more controls and degrees of freedom) may enjoy additional benefits; it is however possible that coarticulation only operates on a restricted set of degrees of freedom, e.g., those that are necessary to solve the task, as in the uncontrolled manifold hypothesis (Scholz and Schöner, 1999). At the same time, it is possible that in real-life conditions, some information encoded in movement kinematics that would be potentially useful to infer a performer's intention may nevertheless remain invisible to observers, for example, when parametric variations are too small to be detected (Naish et al., 2013; Cavallo et al., 2016). Our computational study shows that coarticulation promotes the appropriate preconditions for advance intention understanding, but the additional factors that may favor (or prevent) an advantage for observer agents remain to be fully assessed.

Our emphasis on the sufficiency of kinematic cues to solve intention recognition tasks does not imply that interactive agents do not use other sources of information, such as (prior information on) the context in which the action takes place. For example, it has been argued that the same action (approaching a person with a knife) can be motivated by two different intentions (e.g., Dr. Jekyll who wants to cure or Mr. Hyde who wants to kill) and these can be disambiguated based on the place where the action occurs (operating room or dark street) (Kilner et al., 2007), but see Jacob and Jeannerod (2005); Kilner et al. (2007); Becchio et al. (2008) for alternative proposals. This kind of prior information can be readily incorporated in the action recognition scheme described above, through the contextual (C) node of the DBN. By considering the probabilistic relations between contexts and actions, it would be possible to bias the action recognition process and distinguish the intentions motivating two actions, even when they are kinematically identical, a situation that, as we have discussed, may be more the exception than the rule. Furthermore, it would be possible to extend the model discussed here so that it also directs saccadic eye movements to the most informative parts of the demonstrator's actions, in keeping with the idea that action recognition uses an active sensing scheme (Donnarumma et al., in press). Modeling eye movements would help us understand the conditions under which subtle kinematic cues that are embedded in goal-directed actions are detected by observer agents.

Following a motor cognition approach, our model implements action recognition as a (Bayesian) inferential process that uses the logic of "analysis-by-synthesis" or action simulation (Jeannerod, 2006). This is in keeping with evidence (reviewed in Grafton, 2009) that observer agents simulate the actions they observe in their brains. Alternative hypotheses point, for example, to a more ecological or enactive view of action understanding, which appeals to "direct perception" rather than (Bayesian) inference (Gibson, 1966). While this alternative perspective would differ from our implementation, the logic of our argument here may be the same: that is, coarticulation generates information that an observer agent can use to form an advance understanding of the performer's goals (via Bayesian inference or direct perception).

FIGURE 4 | Maximum Grip Aperture and Time of Maximum Grip Aperture when (1) the reach-to-grasp primitive is coarticulated with grasp-to-pour, (2) the reach-to-grasp primitive is coarticulated with grasp-to-move and (3) the reach-to-grasp primitive is not coarticulated (baseline condition), as if there was no successive action.

It is notable that we have illustrated the model by discussing coarticulation in the domain of reaching and grasping actions, where essentially coarticulation implies the preshaping of hands before executing a grasping action (Jeannerod, 2006). However, the phenomenon of coarticulation is evident in all sequential actions, and the model presented here is (in principle) general enough to address analogous phenomena in other domains, including speech, sign language (Jerde et al., 2003) and the planning of smooth action sequences (Rosenbaum et al., 2006). It remains to be assessed by future studies whether the computational scheme presented here is empirically adequate to explain sequential action in these and other domains, or if it needs to be extended to include more sophisticated internal generative models (e.g., of hierarchical dynamics rather than only sequences of motor primitives Kiebel et al., 2008, 2009; Donnarumma et al., 2015a,b)—as well as the relative merits of alternative frameworks such as those stemming from a dynamical systems perspective (Kelso, 1995; Marsh et al., 2006, 2009).

To sum up, according to this (normative) proposal, the main goal of coarticulation is to optimize sequential actions, and the facilitatory effects for social cognition are byproducts of this process. In other words, according to this proposal, there is no need for any action recognition or mindreading adaptation in the observer, because the action recognition process is greatly facilitated by the performer, albeit often unwittingly (but see the Appendix). This process is effective because during the execution of sequential actions, there is a sort of backward influence from the latter action (and its constraints) to the former action. Thus, the former action already includes subtle but reliable kinematic cues, which can be used to infer the performer's distal goal, and we humans excel at picking up these cues.

# AUTHOR CONTRIBUTIONS

FD, HD, and GP conceived the study, collected and analyzed data, and wrote the manuscript.

# FUNDING

The present research is funded by the Human Frontier Science Program (HFSP), award number RGY0088/2014, and by the EU's FP7 under grant agreement no. FP7-ICT-270108 (Goal-Leaders).

# ACKNOWLEDGMENTS

The GEFORCE Titan used for this research was donated by the NVIDIA Corporation.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Donnarumma, Dindo and Pezzulo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# APPENDIX

# A.1. Relations between Coarticulation (or Assimilation) and Signaling (or Dissimilating) during Social Interactions

While we have emphasized the automaticity of coarticulation, it can also be used strategically in social contexts; for example, to lower (or raise) the co-actor's uncertainty about our plans (that is, to help him or her understand an actor's own distal intentions, or to feint); or even (in principle) to smoothly combine one's own actions with those of co-actors (Gonzalez et al., 2011). To illustrate how it is possible to use coarticulation strategically, we denote with p(**x**|m<sub>i</sub>) the sequence of the states associated to the motor primitive m<sub>i</sub> computed using the coarticulation distribution p(**x**<sub>t</sub>|m<sub>i</sub>). If a performer agent wants to facilitate the perceiver's action recognition process, (s)he can compute the weights **w**<sub>i</sub>(t) so that they minimize the following equation:

$$\mathbf{w}\_{i}(t) = \operatorname\*{argmin}\_{\mathbf{w}(t)} \left[ \operatorname{KL} \left[ p\_{i}^{\text{coa}} (\mathbf{w}(t)), p\_{i} \right] + \lambda \mathcal{S} \left( \theta - p\_{i}^{\text{simulated}} \right) \right] \tag{A1}$$

where:


The KL term considers the cost of coarticulation, where cost can be associated to biomechanical factors, effort, and other forms of costs (e.g., cognitive costs associated to planning and executing non-familiar or non-habitual movements). The λ term permits modulation of the amount of coarticulation (λ = 0 means no coarticulation). By minimizing the above quantity, the performer agent essentially disambiguates the coarticulated action from possible alternatives, thus permitting an observer agent to infer his distal intention at early stages.

This latter example shows how it is possible to use coarticulation to signal one's own intentions (e.g., make them "readable"), or conversely to feint another intention, analogous to other sensorimotor communication dynamics during social interactions (Vesper et al., 2010; Pezzulo, 2011; Pezzulo and Dindo, 2013; Pezzulo et al., 2013a; Sacheli et al., 2013; Candidi et al., 2015). In our formulation, coarticulation and signaling are not just similar but stem from a consistent computational approach. Indeed, the distribution defined in Equation (2) is the dual of the signaling distribution defined in Pezzulo et al. (2013a), which can be used to dissimilate the action currently being performed from alternative actions, with the aim of facilitating the perceiver agent's recognition of the proximal goal.

Defining a function:

$$\begin{aligned} p^{comm}(\mathbf{x}\_{t}|m\_{i};\mathbf{w}) &\propto \; \mathbf{w}\_{i} \cdot p(\mathbf{x}\_{t}|m\_{i}) \cdot \\ &\prod\_{k \in Dissim} (1 - \; \mathbf{w}\_{k} \cdot p(\mathbf{x}\_{t}|m\_{k}) / p\_{k}^{max}) \cdot \\ &\prod\_{j \in Assim} (\mathbf{w}\_{j} \cdot p(\mathbf{x}\_{t}|m\_{j})) \end{aligned} \tag{A2}$$

where p<sub>k</sub><sup>max</sup> is the maximum value of the distribution p(**x**<sub>t</sub>|m<sub>k</sub>), Dissim is the set of motor models to be dissimilated and Assim is the set of motor models to be assimilated (coarticulated).
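A numerical reading of Equation (A2) can be sketched as follows, under strong simplifying assumptions: the state is one-dimensional, each motor model is a Gaussian density evaluated on a uniform grid, the maximum of each density is approximated by its maximum on that grid, and the weights are scalars in [0, 1] so that every dissimilation factor stays non-negative. The names `gaussian_pdf` and `communicative_density` are hypothetical and introduced only for this illustration.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Gaussian density evaluated on the grid x (illustrative motor model)."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def communicative_density(x, densities, weights, i, dissim=(), assim=()):
    """Evaluate the right-hand side of Eq. (A2) on a uniform grid and normalise.

    x:         uniform 1-D grid of states.
    densities: dict mapping a primitive label k to p(x_t | m_k) on the grid.
    weights:   dict mapping a primitive label k to a scalar weight in [0, 1].
    i:         label of the primitive actually being executed.
    dissim:    labels of primitives to dissimilate from.
    assim:     labels of primitives to assimilate (coarticulate) with.
    """
    out = weights[i] * densities[i]
    for k in dissim:
        # Factor is close to 0 where the alternative primitive k is most likely.
        out = out * (1.0 - weights[k] * densities[k] / densities[k].max())
    for j in assim:
        # Factor is large where primitive j is also likely.
        out = out * (weights[j] * densities[j])
    dx = x[1] - x[0]
    return out / (out.sum() * dx)  # numerical normalisation on the grid
```

With two unit-variance Gaussians centered at 0 (executed primitive) and 1 (alternative), placing the alternative in `dissim` pushes the mode of the result away from the alternative, whereas placing it in `assim` pulls the mode toward the midpoint between the two.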

In short, one can use the same equation to flexibly combine or interleave assimilation and dissimilation of actions, see **Figure 2**. As we have shown here, one can assimilate two consecutive actions, and this would correspond to coarticulation. However, one can also assimilate two simultaneous actions, and this would correspond to a feint, in that it would render the observer's action recognition process more difficult. Finally, dissimilating one's current action from the alternatives would amount to signaling (and would be helpful for the observer agent), while dissimilating two consecutive actions in an action sequence would amount to feinting one's own distal intention. This equation can thus be used to derive formal descriptions of various strategies to help or hinder during social interactions, which can be helpful for the (trial-by-trial, model-based) analysis of human data (Candidi et al., 2015).

# The Unresponsive Partner: Roles of Social Status, Auditory Feedback, and Animacy in Coordination of Joint Music Performance

#### Alexander P. Demos<sup>1</sup> \*, Daniel J. Carter<sup>1</sup> , Marcelo M. Wanderley<sup>2</sup> and Caroline Palmer<sup>1</sup> \*

<sup>1</sup> Department of Psychology, McGill University, Montreal, QC, Canada, <sup>2</sup> Department of Music Research, CIRMMT, McGill University, Montreal, QC, Canada

#### Edited by:

Michael J. Richardson, University of Cincinnati, USA

# Reviewed by:

Takako Fujioka, Stanford University, USA Valentina Fantasia, Max Planck Institute for Human Development (MPG), Germany

#### \*Correspondence:

Caroline Palmer Caroline.palmer@mcgill.ca Alexander P. Demos ademos@uic.edu

#### Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

Received: 04 August 2016 Accepted: 20 January 2017 Published: 14 February 2017

#### Citation:

Demos AP, Carter DJ, Wanderley MM and Palmer C (2017) The Unresponsive Partner: Roles of Social Status, Auditory Feedback, and Animacy in Coordination of Joint Music Performance. Front. Psychol. 8:149. doi: 10.3389/fpsyg.2017.00149

We examined temporal synchronization in joint music performance to determine how social status, auditory feedback, and animacy influence interpersonal coordination. A partner's coordination can be bidirectional (partners adapt to the actions of one another) or unidirectional (one partner adapts). According to the dynamical systems framework, bidirectional coordination should be the optimal (preferred) state during live performance. To test this, 24 skilled pianists each performed with a confederate while their coordination was measured by the asynchrony in their tone onsets. To promote social balance, half of the participants were told the confederate was a fellow participant – an equal social status. To promote social imbalance, the other half was told the confederate was an experimenter – an unequal social status. In all conditions, the confederate's arm and finger movements were occluded from the participant's view to allow manipulation of animacy of the confederate's performances (live or recorded). Unbeknownst to the participants, half of the confederate's performances were replaced with pre-recordings, forcing the participant into unidirectional coordination during performance. The other half of the confederate's performances were live, which permitted bidirectional coordination between performers. In a final manipulation, both performers heard the auditory feedback from one or both of the performers' parts removed at unpredictable times to disrupt their performance. Consistently larger asynchronies were observed in performances of unidirectional (recorded) than bidirectional (live) performances across all conditions. Participants who were told the confederate was an experimenter reported their synchrony as more successful than when the partner was introduced as a fellow participant. Finally, asynchronies increased as auditory feedback was removed; removal of the confederate's part hurt coordination more than removal of the participant's part in live performances. Consistent with the assumption that bidirectional coupling yields optimal coordination, an unresponsive partner requires the other member to do all the adapting for the pair to stay together.

Keywords: joint action, temporal coordination, social status, dynamical systems, auditory feedback

# INTRODUCTION


When musicians perform together, they must coordinate and adapt their actions in different social contexts. A musical ensemble, for example, can have a hierarchy with a principal director (such as a conductor of an orchestra) and sub-directors (such as the first violinist), or it may have a more equal or egalitarian relationship among members, as seen in some string quartets (Gilboa and Tal-Shmotkin, 2012). Regardless of the social context, the musicians must stay in tight temporal coordination to have a successful performance. To achieve this coordination, musicians rely on the auditory feedback from their own actions and the sound of their partners' actions to adapt to and anticipate each other (Goebl and Palmer, 2009; van der Steen and Keller, 2013). The success of synchronization between performing musicians may also depend on the directionality of influence, referred to as coupling in dynamical systems theory; for example, one performer may influence the other (unidirectional coupling), or both may influence each other (bidirectional coupling). In order to contrast the types of coupling, we test the synchronization between pairs of pianists while we manipulate the social relationships between the partners, the access to their auditory feedback, and the direction of influence between the partners.

A non-linear dynamical systems perspective can explain the synchronization between two people in terms of coupling, or an energy transfer, that facilitates the adjustment of their actions to maintain a stable phase relationship (Haken et al., 1985; Kelso, 1997; Pikovsky et al., 2001; Strogatz, 2003; Marsh, 2010, 2013; Latash, 2014). An energy transfer between two people typically occurs through perceptual information, such as when people use auditory feedback about their partners' actions to adjust their own actions (Néda et al., 2000; Riley et al., 2011; Demos et al., 2012). Coupling between people can be unidirectional or bidirectional. In unidirectional coupling, one system adapts to changes in the phase or period of the second system, such as a pianist adapting to a recording. Bidirectional coupling occurs when both systems respond and adapt to one another (Pikovsky et al., 2001), such as two pianists adapting to each other. Current dynamical mathematical models suggest that bidirectional coupling yields an optimal form of coordination as each person can share in the adapting (Strogatz, 2000), whereas unidirectional coupling would require all of the adaptation to be done by one member of the pair to maintain synchrony. Temporal coordination in joint music performance may be unidirectional (as when a performer plays with a non-responsive recording), which would be expected to generate less synchrony, or bidirectional (as when a performer plays with a responsive live partner), which would be expected to generate more synchrony. We compare live and recorded performances in a manipulation of duet performance, in which the participants do not know whether the confederate's performance is animate (live) or not (recorded).
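The contrast between unidirectional and bidirectional coupling can be illustrated with a minimal sketch that assumes two Kuramoto-type phase oscillators with slightly different preferred rates. This is an illustrative choice of model, not one fitted to the data of the present study; the function name `steady_phase_lag`, the coupling strengths, and the integration settings are all hypothetical.

```python
import math

def steady_phase_lag(omega_a, omega_b, k_ab, k_ba, dt=0.001, steps=20000):
    """Euler-integrate two coupled phase oscillators and return the final
    absolute phase difference in radians.

    omega_a, omega_b: preferred angular rates of performers A and B (rad/s).
    k_ab: how strongly A adapts to B; k_ba: how strongly B adapts to A.
    Setting k_ba = 0 makes B an unresponsive partner (unidirectional coupling).
    """
    theta_a, theta_b = 0.0, 0.0
    for _ in range(steps):
        d_a = omega_a + k_ab * math.sin(theta_b - theta_a)
        d_b = omega_b + k_ba * math.sin(theta_a - theta_b)
        theta_a += dt * d_a
        theta_b += dt * d_b
    diff = theta_a - theta_b
    # Wrap the difference into (-pi, pi] before taking its magnitude.
    return abs(math.atan2(math.sin(diff), math.cos(diff)))
```

In this model the phase difference settles where its sine equals the rate mismatch divided by the summed coupling strengths, so for the same individual coupling strength a pair in which both members adapt ends up with a smaller steady lag than a pair in which only one member adapts. This holds only while the rate mismatch is smaller than the summed coupling; otherwise the oscillators do not lock.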

Marvel et al. (2009) describe the shift of social relationships in a group of people as arising from inequalities in energy transfer among the members. Originating from applications of balanced relations in graph theory (Cartwright and Harary, 1956), Marvel et al. (2009) interpret the connections in a social network as an energy minimization process. This theory defines an energy landscape with certain relationships within the social network as more stable than others, with the intrinsic goal to avoid unbalanced (unstable) relationships. One can apply this concept to a musical relationship with groups as small as two; for example, musical duets composed of equally or unequally experienced or informed members. In our design, we manipulate how much knowledge the participant believes the confederate has about the task. We create either a balanced relationship in which the confederate is an equal partner in the task, or an unbalanced relationship in which the confederate is an experimenter in the task. The latter instruction is designed to suggest the confederate has more knowledge, power, experience, or information about the task, and thus the social relationship is imbalanced. Although a social imbalance may affect the way performers perceive each other, we do not expect it to drastically affect the degree of temporal coordination, as social imbalance during music performance is relatively common; for example, when one ensemble member is in charge of directing the group, all ensemble performers must stay coordinated in time or else the music will not sound correct.

Auditory feedback during joint music performance can also create imbalance among musicians. Studies of auditory feedback effects have generally used one of two manipulations: those that manipulate the feedback from live performance, and those that manipulate the effects of recorded performance feedback. Studies that manipulate feedback from live performance suggest that the removal of self-feedback causes less disruption to temporal coordination than the removal of the partner's feedback (Goebl and Palmer, 2009; Loehr and Palmer, 2011; Zamm et al., 2015). Those studies also show that the more auditory feedback that is removed, the larger the asynchrony becomes between duet performers. Studies that manipulate feedback from performers playing with audio recordings suggest that performers can synchronize better with recordings of their own performances than with recordings of other performers (Keller et al., 2007). As well, studies suggest that there are individual differences related to a performer's ability to synchronize with a recorded partner (Novembre et al., 2013). To our knowledge, no study has yet compared directly the effects of auditory feedback on adaptation to synchronization between live and recorded performance, one goal of the current study.

We examined the temporal synchronization between duet pianists while the social relationship of the pair was manipulated. Each participant pianist was introduced to their partner as either an experimenter (imbalanced hierarchical relationship) or as a fellow participant (balanced equal relationship), in a manipulation of social status. We expected that participants would attribute expertise and prior knowledge to the confederate as an experimenter, and would therefore be more motivated to perform well in the experimenter condition. We also manipulated the animacy of the performances with which the participants performed: half of the performances were live, and half were recordings of the same confederate pianist. Because the confederate's hands and arms were not visible to the participant seated across the room at a separate piano and the confederate performed the music in each duet performance, the participants did not know whether they were hearing live or recorded performances. We expected that the recorded performances, which did not permit temporal adaptation in both directions, would yield unidirectional coupling from participant to confederate (recording), while the live performances would yield the possibility of bidirectional coupling between the two pianists and thus more synchrony between performers. Finally, the auditory feedback from each pianist's performances was presented or was removed (four levels) from the headphones of each pianist across conditions (both pianists heard the same feedback within conditions). We expected that asynchronies would worsen as feedback was systematically removed across the four conditions, with greater worsening when it was removed from the confederate's part than the participant's part.

# MATERIALS AND METHODS

# Participants

The participants were N = 24 adult pianists (M age = 25.79 years, SD = 10.24) with a minimum of 8 years of formal piano instruction (M = 13.1, SD = 3.5). Twenty-one of the 24 participants were right-handed, 17 were female and none had known hearing difficulties. Participants were recruited from the Montreal music community. A pre-screening test required participants to play a musical melody (described below) twice without error, and all 24 participants passed. A 21-year-old right-handed male confederate with 8 years of formal piano instruction and no known hearing difficulties performed with each participant in the duet conditions. He was instructed to limit his head and body movements across all performances.

The two social groups (those told they were performing with partners or with experimenters) were compared in terms of their age, amount of musical training, gender, familiarity with the musical piece, and whether they had formally prepared the piece for performance prior to the experiment. Comparisons are shown in **Table 1**. There were no significant differences between the groups.

# Materials and Apparatus

The pianists (the participant and the confederate) sat facing each other at two keyboards with weighted keys (Roland RD-700s), and received feedback from themselves and from their partner through Bose QuietComfort 20 Noise Canceling headphones. Piano (GM2\_002, no reverberation) and metronome (GM2\_232) sounds were generated by a Roland Mobile Studio Canvas Sound Module (SD-50). The FTAP program (Finney, 2001) was used to generate feedback manipulations, play the metronome and keyboard sounds, and record output from the keyboards in MIDI format on a Linux (Fedora) computer (Dell T3600).

# Musical Stimulus

The musical excerpt used for both the pre-test and experimental trials was the opening 4 bars (re-notated into 8 bars of eighth notes; see **Figure 1**) from J. S. Bach's Prelude in C Minor, BWV 847. Each performance consisted of playing the excerpt three times at a tempo provided by a metronome set to one quarter-note Interonset Interval (IOI) = 225 ms. The stimulus was chosen for its rhythmically isochronous nature, as well as the equivalent difficulty between the hands. Participants were sent the sheet music prior to testing, and were asked to practice the stimulus prior to a pre-test.

# Design

The study employed a mixed design with one between-subject factor of the confederate's Social Status (introduced as experimenter or participant) and two within-subject factors of Auditory Feedback (four levels) and Animacy of the confederate's performances (live or pre-recorded). The four within-subject Auditory Feedback manipulations included hearing full sound ("Both-present" condition), participant sound only ("Confederate-removed" condition), confederate sound only ("Participant-removed" condition), or hearing no sound ("Both-removed" condition).

## Social Status

The participants were randomly assigned to one of two social status conditions: Half of the participants (12) were assigned to a condition in which they were told that the confederate was an experimenter in the study, and half were told the confederate was another participant, with the goal of inducing a change in the perceived social hierarchy of the participant-confederate relationship.

## Auditory Feedback

There were four conditions of auditory feedback removal. In each condition, both the confederate and the participant heard the same auditory feedback. In the Both-present feedback condition, both performers heard feedback from both parts. In the Participant-removed condition, sound was presented from the confederate's part only, again to both performers. In the Confederate-removed condition, sound was presented from the participant's part only, to both performers. In the Both-removed condition, no feedback was presented to either performer. The last three conditions are referred to as auditory perturbations, during which performers were instructed to continue performing. Each perturbation lasted for 9–12 notes. At the end of a perturbation, full auditory feedback resumed for the next 10–24 notes, after which another perturbation window could begin. This recovery period provided time for participants to return to baseline synchrony. The starting points of the perturbations were balanced across strong and weak beats and across durations within each condition.
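The timing constraints on the perturbations can be made concrete with a minimal sketch that draws one possible schedule. It assumes that window lengths and recovery lengths are drawn uniformly from the stated ranges and that a trial opens with a stretch of full feedback; it does not implement the balancing across strong and weak beats described above. The function name `perturbation_windows` and the example trial length are hypothetical.

```python
import random

def perturbation_windows(n_notes, n_perturbations, seed=0):
    """Return a list of (start, end) note indices (end exclusive) during which
    auditory feedback is removed.

    Each window lasts 9-12 notes and is followed by 10-24 notes of full
    feedback before the next window may begin.
    """
    rng = random.Random(seed)
    windows = []
    position = rng.randint(10, 24)       # assumed opening stretch of full feedback
    for _ in range(n_perturbations):
        length = rng.randint(9, 12)      # perturbation duration in notes
        if position + length > n_notes:  # stop if the window would overrun the trial
            break
        windows.append((position, position + length))
        position += length + rng.randint(10, 24)  # recovery period in notes
    return windows
```

For instance, `perturbation_windows(300, 6)` yields six non-overlapping windows for a hypothetical 300-note trial; which feedback manipulation is applied in each window would be assigned separately.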

## Animacy

There were two conditions of the confederate's performance Animacy: a live performance (an 'animate' partner) or a pre-recorded performance (an 'inanimate' partner). The confederate recorded a total of 20 recordings (both parts performed together) over the course of 4 days, and 8 (four upper part and four lower part) were selected based on their similarity to one another along the dimensions of tempo (IOI M = 230.85, SD = 8.60) and variability (CV M = 0.38, Range = 0.31–0.46). The confederate continued to perform on the keyboard during all trials, and the screen between the pianists prevented the participant from seeing the confederate's hands, arms, and torso, thus removing knowledge of which trials were live or pre-recorded.

#### TABLE 1 | Participant characteristics by Social Status group.

#### Blocking

The participant was randomly assigned to perform either the upper voice (using the right hand) or lower voice (using the left hand) for eight trials of the first block of the experiment, and the confederate was assigned to perform the alternative part. In the second block, the participant and confederate switched parts (and hands) for the last eight trials. Within each block, four trials contained manipulations with full auditory feedback, and four trials contained six instances each of the three auditory feedback manipulations in randomized order. Of the six auditory perturbations, two were removals of the participant's sound only, two were of the confederate's sound only, and two were removals of both performers' sound. The three different perturbation conditions were presented in a counterbalanced order both across and within blocks to control for practice effects. Half of the trials were live performances of the confederate and half were prerecorded performances, presented in a counterbalanced order across the experiment, across each block of performances that differed by assignment of participant to part, and across each subblock of four performances with and without auditory feedback. All pre-recorded performances were also counterbalanced across the entire experiment to ensure that each participant performed with all eight pre-recordings. Thus, half of the performances were played with auditory feedback removal and half were played without; half of the performances were with a live confederate, and half with a recorded confederate; and half of the participants performed with a confederate known as an experimenter and half known as a participant.

# Procedure

Participants were given a pre-test to confirm they could play the piece three times through without error. After passing the pre-test, the confederate entered the room and participants performed the stimulus once with the confederate. The confederate was not known to any of the participants. The participant played on a keyboard facing the confederate, with the hands, arms, and torso of the confederate occluded from view by a screen, in order to prevent visual cues of the confederate's movements and to reduce the possibility of knowing whether the performance was live or a recording. The confederate's head and shoulders were still visible to the participants.

The participant and confederate then continued the 16 experimental trials, in which each performance consisted of an initial metronome cue of four ticks presented at a quarter note IOI of 550 ms. Participants were instructed to stop playing at the sound of a cymbal, which occurred between 1 and 5 notes after the end of the third repetition of the musical stimulus. Any trials on which the participant played too fast or too slow relative to the metronome cue, or performed the beginning of the trial with pitch errors (keypresses that generated pitches that differed from the information indicated in the musical score) were stopped at the start of the trial, and restarted up to three times.

After the completion of all duet performances, participants completed a post-test questionnaire on social aspects. In addition to the behavioral aspects of the design, results from the post-test questionnaire were examined to determine whether the social interaction of playing with a partner influenced the asynchrony of the pair. Six measures of the relationship with the confederate were tested, each on a 7-point Likert scale: how likeable the confederate was, how stressful, how smooth, and how pleasant the participant found interacting with the confederate to be, and how connected they felt to the confederate. There was also a measure of how successful participants thought their synchronization was, also measured on a 7-point Likert scale.

# Data Analyses

Both the participants' and confederate's performances were examined first for pitch errors. Any perturbation window within which a pitch error occurred by either performer was excluded from analysis; this resulted in the exclusion of 15.2% of trials. Pitch errors occurred less often in the primary (upper-frequency) voice (5.0%) than in the secondary (lower-frequency) voice (10.2%), consistent with previous studies of errors in piano performance (Palmer and van de Sande, 1993, 1995). Due to the differences in error rates, analyses were conducted collapsed across voices (the assignment of voice was a within-subjects variable). The dependent variables of IOI and absolute asynchrony (confederate [live or recorded] – participant), based on tone onsets, were then computed. Asynchronies greater than 3 standard deviations (1.4% of all asynchronies) were excluded from analysis. Signed asynchronies were evaluated for potential Social status effects on leadership. Finally, mean absolute asynchronies and IOIs were computed within each perturbation window and analyses were conducted on the mean values across trials by the factors of Animacy, Feedback, and Social Status.
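As a rough illustration of the preprocessing described above, the following Python sketch computes signed asynchronies from paired tone onsets, drops values beyond 3 standard deviations, and averages the absolute values. It is not the authors' R code: the function names and onset values are hypothetical, and treating the 3 SD criterion as distance from the mean of the signed asynchronies is an assumption, since the text does not state the reference distribution.

```python
from statistics import mean, stdev


def inter_onset_intervals(onsets):
    """Differences between successive tone onsets (ms)."""
    return [later - earlier for earlier, later in zip(onsets, onsets[1:])]


def signed_asynchronies(participant_onsets, confederate_onsets):
    """Participant onset minus confederate onset (ms) per notated simultaneity."""
    return [p - c for p, c in zip(participant_onsets, confederate_onsets)]


def trim_outliers(values, n_sd=3.0):
    """Keep values within n_sd standard deviations of the mean (assumed criterion)."""
    centre, spread = mean(values), stdev(values)
    return [v for v in values if abs(v - centre) <= n_sd * spread]


def mean_absolute_asynchrony(participant_onsets, confederate_onsets, n_sd=3.0):
    """Mean of |asynchrony| after outlier trimming, e.g. within one perturbation window."""
    signed = signed_asynchronies(participant_onsets, confederate_onsets)
    kept = trim_outliers(signed, n_sd)
    return mean([abs(a) for a in kept])


# Hypothetical onsets (ms) for four simultaneous notes in one window.
participant = [0, 232, 460, 695]
confederate = [5, 230, 470, 690]
window_asynchrony = mean_absolute_asynchrony(participant, confederate)
participant_iois = inter_onset_intervals(participant)
```

With so few notes no value can exceed the 3 SD bound, so trimming only matters when it is applied over a larger pool of asynchronies, as in the full data set.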

Analyses were conducted in R (3.3.1) with the afex package (Singmann et al., 2016) used to calculate the ANOVAs. The lsmeans package (Lenth, 2016) was used for follow-up testing using corrected degrees of freedom for statistical violations (Kenward–Roger method).

# RESULTS

# Confederate's Live and Pre-recorded Performances

First, the confederate's entire pre-recorded and live performances were compared on dimensions of tempo (measured by mean interonset interval, IOI) and variability (measured by standard deviation of IOIs, SD), to confirm that participants heard performances of equivalent temporal variability in the two Animacy conditions. The confederate's live performances varied across participants; since there were 24 participants and four live trials each, this resulted in 92 live confederate trials compared with four pre-recorded performances. A bootstrap method was applied to the live confederate trials for comparison with the pre-recorded trials. One thousand subsamples of four trials were sampled with replacement from the set of 92 live confederate trials. The mean IOI was recalculated for each subsample, to provide an overall bootstrap estimate for comparison with the confederate's pre-recorded performance IOIs. This procedure was undertaken for live performances when the confederate was introduced as experimenter and as partner to the participant. The bootstrap means and standard deviations are displayed with the observed pre-recorded counterparts in **Table 2**, which suggested no observable differences between the means or standard deviations for the two sets of performances.
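The bootstrap procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the seed, and the input values are hypothetical, and summarizing the resampled means by their mean and standard deviation is one plausible reading of "overall bootstrap estimate".

```python
import random
from statistics import mean, stdev


def bootstrap_mean_ioi(trial_mean_iois, n_resamples=1000, subsample_size=4, seed=0):
    """Resample trials with replacement and summarize the subsample means.

    trial_mean_iois: one mean IOI (ms) per live confederate trial.
    Returns the mean and standard deviation of the resampled means.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    resampled_means = [
        mean(rng.choices(trial_mean_iois, k=subsample_size))  # with replacement
        for _ in range(n_resamples)
    ]
    return mean(resampled_means), stdev(resampled_means)


# Hypothetical mean IOIs (ms) for 92 live trials; not data from the study.
live_trial_iois = [220.0 + 0.25 * i for i in range(92)]
boot_mean, boot_sd = bootstrap_mean_ioi(live_trial_iois)
```

Drawing subsamples of four matches the number of pre-recorded performances, so the spread of the resampled means is comparable to what four recordings could show.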

# Effects of Perturbations on Interonset Intervals

Next, we compared the confederate's mean IOIs within the perturbation windows. The confederate's mean IOI values for each perturbation window are shown by condition in **Figure 2**.


TABLE 2 | Timing characteristics of confederate's live and recorded performances by social status group (after outliers due to participants' pitch errors removed).

An analysis of variance on those values by Social Status, Feedback Condition, and Animacy indicated no significant main effects or interactions. As **Figure 2** suggests, the Confederate's tempo remained stable across conditions.

**Figure 2** also shows the participants' mean IOIs within the perturbation windows by condition. The same analysis of variance on those values indicated a significant effect of Feedback condition, F(3,66) = 18.43, MSE = 32.84, η<sup>2</sup><sub>G</sub> = 0.20, p < 0.001, and the interaction of Feedback with Animacy approached significance, F(3,66) = 2.53, MSE = 15.48, η<sup>2</sup><sub>G</sub> = 0.02, p = 0.06. As shown in **Figure 2**, participants' performances slowed most when auditory feedback from both parts was removed; post hoc comparisons indicated the Both-removed condition was slower than all other conditions (Tukey's HSD = 6.53, p < 0.001). The removal of sound slowed participants' performance slightly less when the confederate was introduced as an experimenter, but the difference did not reach significance.

# Asynchronies across Entire Performance

The absolute asynchronies between participant and confederate were first evaluated across the entire performance of the Full Sound condition, to confirm the representativeness of the patterns of behavior measured in the perturbation windows. **Figure 3** shows the mean absolute asynchrony (participant and confederate's tone onsets, in ms) for all simultaneities as notated in the musical score, by Social Status and Animacy. The mean asynchronies indicated significant effects of Animacy, F(1,22) = 19.87, MSE = 63.93, η<sup>2</sup><sub>G</sub> = 0.26, p < 0.001, and a significant interaction of Social Status with Animacy, F(1,22) = 4.62, MSE = 63.93, η<sup>2</sup><sub>G</sub> = 0.08, p = 0.04. As shown in **Figure 3**, asynchronies were larger for pre-recorded than for live performances, as expected; this contrast was larger when the confederate was introduced as a partner [live – recording: t(22) = −4.67, p < 0.001] than when he was introduced as an experimenter, t(22) = −1.63, p = 0.12. The main effect of Social Status approached significance, F(1,22) = 3.31, MSE = 98.06, η<sup>2</sup><sub>G</sub> = 0.08, p = 0.08; asynchronies tended to be larger when the confederate was introduced as a partner than as an experimenter.

To test the possibility that the participants' response to the social status of the confederate was to use a strategy of following (lagging) the confederate when introduced as experimenter versus participant, we also measured the signed asynchronies across the entire live performances, defined as participant's tone onsets minus confederate's tone onsets. The mean signed asynchronies in the Both-present condition were equivalent when the confederate was introduced as experimenter (M = 4.96 ms), and as partner [M = 5.41 ms; t(22) = 0.16, p = 0.88], indicating that the participants did not alter any strategy to lag or lead the confederate in response to how the confederate was introduced across the live performances.

# Effects of Perturbations on Asynchronies

The absolute asynchronies during the perturbation windows were tested next for the effects of Social Status, Feedback condition, and Animacy. **Figure 4** shows the mean values. Main effects of Feedback condition, F(1,22) = 60.14, MSE = 36.62, η<sup>2</sup><sub>G</sub> = 0.46, p < 0.001, and of Animacy, F(1,22) = 57.68, MSE = 47.07, η<sup>2</sup><sub>G</sub> = 0.26, p < 0.001, indicated that asynchronies were larger when performances were pre-recorded than when they were live, as expected. In addition, asynchronies increased as feedback was removed, with larger asynchronies in the Both-removed condition than in the Both-present condition (Tukey contrasts), t(66) = 10.85, p < 0.001, the Participant-removed condition, t(66) = 12.22, p < 0.001, and the Confederate-removed condition, t(66) = 8.40, p < 0.001. The Confederate-removed condition generated significantly larger asynchronies than the Both-present condition, t(66) = 4.05, p < 0.001, and the Participant-removed condition, t(66) = 5.41, p < 0.001, and significantly smaller asynchronies than the Both-removed condition, t(66) = −6.80, p < 0.001. The main effect of Social Status approached significance, F(1,22) = 3.10, MSE = 87.07, η<sup>2</sup><sub>G</sub> = 0.03, p = 0.09, with slightly larger asynchronies when the confederate was introduced as a partner (M = 26.34 ms) than as an experimenter (M = 23.97 ms).

There was also a significant interaction of Feedback condition with Animacy on the asynchronies, F(3,66) = 8.63, MSE = 36.09, η<sup>2</sup><sub>G</sub> = 0.11, p < 0.001. As shown in **Figure 4**, removal of the participant's feedback decreased the asynchronies in the recorded performances such that they did not differ from the asynchronies in the live performances. Live performances generated uniformly smaller asynchronies than pre-recorded performances for Both-present (Tukey contrast), t(86.7) = −4.01, p < 0.001, Confederate-removed condition, t(86.7) = −8.11, p < 0.001, and for Both-removed condition, t(86.7) = −3.18, p < 0.01.

The increased asynchronies in the no-sound condition coincided with the participant's slower tempo (as shown in **Figure 2**), suggesting that this was the most difficult condition. To confirm that the asynchrony effects in the Both-removed condition were not simply due to tempo effects, the analyses were recomputed for windowed asynchrony values divided by the previous IOI (IOI was based on participant in the first analysis, and on mean of participant and confederate in a second analysis). The ANOVAs reported above were repeated on the adjusted asynchronies; the main effects and interactions were unchanged from those reported, suggesting that the difficulty due to feedback removal affected both coordination and tempo.
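The tempo adjustment described above can be illustrated with a short Python sketch that divides each absolute asynchrony by the preceding IOI, using either the participant's IOI or the mean of the two performers' IOIs. The function name, the `reference` parameter, and the onset values are hypothetical, and applying the division note by note (rather than to window means) is an assumption; the source does not give this level of detail.

```python
def ioi_normalized_asynchronies(participant_onsets, confederate_onsets,
                                reference="participant"):
    """Absolute asynchrony at each onset divided by the preceding IOI.

    reference="participant": use the participant's preceding IOI.
    reference="mean": use the mean of both performers' preceding IOIs.
    """
    if reference not in ("participant", "mean"):
        raise ValueError("reference must be 'participant' or 'mean'")
    normalized = []
    for i in range(1, len(participant_onsets)):
        participant_ioi = participant_onsets[i] - participant_onsets[i - 1]
        if reference == "participant":
            previous_ioi = participant_ioi
        else:
            confederate_ioi = confederate_onsets[i] - confederate_onsets[i - 1]
            previous_ioi = (participant_ioi + confederate_ioi) / 2
        asynchrony = abs(participant_onsets[i] - confederate_onsets[i])
        normalized.append(asynchrony / previous_ioi)
    return normalized


# Hypothetical onsets (ms); the first onset has no preceding IOI and is skipped.
participant = [0, 200, 400]
confederate = [0, 210, 380]
by_participant = ioi_normalized_asynchronies(participant, confederate)
by_pair_mean = ioi_normalized_asynchronies(participant, confederate, reference="mean")
```

Expressing asynchrony as a proportion of the local IOI means that a slower tempo alone cannot inflate the measure, which is the point of the control analysis.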

# Effects of Social Status on Perceived Interaction

Participants' responses to questions about the social interaction were compared for the two Social Status groups who were introduced to the confederate as experimenter and as partner; each question was answered on a scale of 1–7. **Table 3** shows the mean values for responses by each group. As shown in **Table 3**, participants who were introduced to the confederate as an experimenter judged their interaction to be significantly smoother and more pleasant overall than those who were introduced to him as a partner. Interestingly, this difference is in the same direction as the asynchrony values, which were slightly larger (3 ms) for the partner-introduced than the experimenter-introduced performances (although the difference did not reach significance).

FIGURE 3 | Mean absolute asynchronies (in ms) in entire baseline performances ("sound present" auditory feedback condition) for live and recorded performances by social status of confederate (experimenter or partner).

In addition, participants were asked whether they successfully synchronized with their partner, using a 7-point scale (1 = Not at all, 7 = Very much so). Participants who were introduced to the confederate as an experimenter judged their synchronization as more successful (mean score = 5.92) than those who were introduced as partner (M = 3.75, Mann–Whitney U = 123.5, p = 0.003). Thus, both perceived social interaction and perceived synchronization success were influenced by the manipulation of the partner's social status.

#### TABLE 3 | Mean responses to social interaction questionnaire by confederate's social status.

# DISCUSSION

This study identified three major factors that influence the balance in temporal coordination among performing musicians. We measured duet performances of pianists each of whom performed with both live and recorded performances by the same confederate pianist. To our knowledge, this was the first study to compare animate (live) and inanimate (recorded) and social imbalance conditions in the same experiment, allowing a comparison of bidirectional and unidirectional coupling effects by the same performer. Consistently larger asynchronies were observed in performances of recorded than live performances across all conditions, consistent with the hypothesis that performers used bidirectional coupling during live performances and unidirectional coupling when playing with recorded performances (Riley et al., 2011). This finding held when the timing characteristics (tempo mean and variability) of the confederate's performances were equivalent across live/recorded performances, and across the removal of auditory feedback from participant and confederate parts.

The study also investigated the role of the partner's social status on temporal coordination. The knowledge that the participants believed the confederate had about the task created a balanced (equal) partner relationship of participant and confederate for half of the participants, and an unbalanced (hierarchical) relationship with the "experimenter" for the other half. Slightly larger asynchronies, which reflect more instability, were observed for participants who performed with "partners" than with "experimenters." This effect was significant only when participants played with recordings (**Figure 3**). The weak effect is perhaps not surprising for experienced musicians, as they rely on an ability to perform in imbalanced relationships (conductor–orchestra) as well as with musicians of unequal experience.

Larger effects of social status were observed in the participants' judgments of perceived synchrony. Ratings given by participants in the "experimenter" confederate group were significantly higher than the "partner" group for the question of how successful they perceived their synchronization to be. Although the social imbalance manipulation did not create large instabilities in the observed piano keystroke asynchronies, it did create differences in participants' perceived success in synchrony. One possibility is that the label "experimenter" heightened performers' awareness or attunement to the temporal instability. The notion of temporal attunement has been applied to music explicitly to capture listeners' anticipatory behavior for when rhythmic events will occur (Drake et al., 2000). Thus, performers may have been more temporally attuned to the confederate when the social manipulation made the confederate's role more important. Another possibility for the disparity between observed and perceived synchrony was a desire to please the experimenter; participants did give higher ratings for the smoothness of their interaction with the confederate, and how pleasant they found it (**Table 3**), when the confederate was introduced as experimenter. They did not, however, rate the confederate more likeable when introduced as experimenter than partner. Thus, the manipulation of social balance between partners seemed to change their perception of their social interaction more than their degree of temporal coordination.

Removal of auditory feedback from both pianists' headphones also created an imbalance between the duet pianists. As expected, asynchronies were largest when feedback from both parts was removed. In addition, feedback removal from the confederate's part caused larger asynchronies than feedback removal from the participant's part in the live performances, consistent with previous findings (Goebl and Palmer, 2009). Feedback removal of the participant's or confederate's parts did not change synchronization with recordings, presumably because the inanimate recordings permit only unidirectional coupling.

In sum, temporal coordination in joint music performance provides an excellent testing ground for dynamical systems principles of coupling that facilitate the maintenance of a stable phase relationship. The current study has demonstrated how auditory feedback provides information to guide that coupling, and how the animacy of the performance (live or recorded) alters the type of coupling (bidirectional or unidirectional). The findings are also consistent with the dynamical model's assumption that bidirectional coupling between partners, available in live performance, yields an optimal form of coordination, compared with unidirectional coupling, such as what arises when a performer plays with a recording. The effects of social status on temporal coordination and perceived synchrony are consistent with previous findings that temporal synchrony and perceived affiliation are correlated in tapping tasks (Hove and Risen, 2009). The unresponsive partner, a performer who does not react ("why aren't you listening to me?", cried the soloist to the accompanist), requires the other member to do all the adapting for the pair to stay together.

# ETHICS STATEMENT

The McGill University Research Ethics Board reviewed and approved the study. Both oral and written (signed) consent was obtained from all participants prior to their participation in the experiment.

# AUTHOR CONTRIBUTIONS

AD: contributed to the conception, design, data acquisition, analysis, interpretation, drafting, and revising. DC: contributed to the conception, design, data acquisition, analysis of data, drafting. MW: contributed to the conception, design, revising of the work. CP: contributed to the conception, design, interpretation, drafting, revising.

# REFERENCES


# ACKNOWLEDGMENTS

This research was funded in part by an NSERC-Create postdoctoral fellowship to AD, by Canada Research Chair and NSERC grant 298173 to CP, and by NSERC grant RGPIN-2014-05672 to MW. We gratefully acknowledge the assistance of Frances Spidle and Vivek Kant.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Demos, Carter, Wanderley and Palmer. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Reference as an Interactive Achievement: Sequential and Longitudinal Analyses of Labeling Interactions in Shared Book Reading and Free Play

Vivien Heller <sup>1</sup> \* and Katharina J. Rohlfing<sup>2</sup>

*<sup>1</sup> Department of German Studies, School of Humanities and Cultural Studies, University of Wuppertal, Wuppertal, Germany, <sup>2</sup> Faculty of Arts and Humanities, Paderborn University, Paderborn, Germany*

The present study examines how young children and their caregivers establish reference by jointly developing stable patterns of bodily, perceptual, and interactive coordination. Our longitudinal investigation focuses on two mother–child dyads engaged in picture-book reading and play. The dyads were videotaped at home once every 6 weeks while the children aged from 9 to 24 months. Inspired by conversation analysis and multimodal analysis, our developmental approach builds on the insight that the situated and embodied production of reference is fundamentally an interactive achievement. To examine the acquisition of reference, we developed a descriptive instrument that takes account of not only the dyad's joint accomplishment but also each participant's contributions to it. The instrument is based on the sequential reconstruction of the jobs that both participants have to accomplish jointly in order to achieve reference: establishing visual perception as a relevant resource, constituting a domain of scrutiny, locating a target, and construing the (meaning of the) referent. Methodologically, these jobs serve as a *tertium comparationis* for the longitudinal comparison of both the adult's as well as the child's contributions to establishing reference. We used this instrument to examine (1) what bodily and verbal resources the participants employed, and (2) how their contributions to accomplishing the jobs changed over time. Findings showed that the acquisition of reference was closely related to the child's increasing ability to recognize, fulfill, and set up conditional relevancies. We conclude that the adult's dynamic and contextualized use of conditional relevancies, recipient design, and observability is a crucial driving force in the acquisition of reference.

Keywords: reference, sequential organization, conditional relevance, observability, coordination, interaction, language acquisition, joint attention

#### Edited by:

*Rachel W. Kallen, University of Cincinnati, USA*

#### Reviewed by:

*Valentina Fantasia, Max Planck Institute for Human Development (MPG), Germany Nicole Rossmanith, University of Portsmouth, UK*

> \*Correspondence: *Vivien Heller vheller@uni-wuppertal.de*

#### Specialty section:

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology*

Received: *30 September 2016* Accepted: *19 January 2017* Published: *14 February 2017*

#### Citation:

*Heller V and Rohlfing KJ (2017) Reference as an Interactive Achievement: Sequential and Longitudinal Analyses of Labeling Interactions in Shared Book Reading and Free Play. Front. Psychol. 8:139. doi: 10.3389/fpsyg.2017.00139*

# INTRODUCTION

Determining how young children come to understand that words refer to something has been a continuous topic in language acquisition research. For Bruner (1976, p. 69), the acquisition of reference entails the problem of "how one individual manages to get another to share, attend to, zero in upon a topic that is occupying him." Arriving at a shared understanding of a referent is a substantial challenge when reference is conceived merely as words being mapped onto their referents, because in the real world, there are simply too many options when it comes to selecting one of the numerous potential referents (Trueswell et al., 2016). Considering the fact that speakers often produce "proxy" or "dummy" noun phrases (e.g., "what's-his-name") for the referent, Clark and Wilkes-Gibbs (1986) asked how it is possible for participants to be sufficiently sure of having achieved a mutual understanding of the referent—a problem that Clark and Marshall (1981) referred to as the "mutual knowledge paradox." This paradox also exists when reference is established non-verbally by, for example, pointing to an object within the coparticipants' joint perceptual space. Pointing is usually understood as a "communicative body movement that projects a vector from a body part" and "indicates a certain direction, location, or object" (Kita, 2003, p. 1). At first sight, the meaning of pointing seems to be self-evident in that it requires only the recipient to "trace, by symbolic extrapolation, a path from the gesture to the thing" (Fillmore, 1997, p. 6). Yet the mutual knowledge paradox remains, because pointing gestures only roughly indicate a certain area that may be populated by various persons, objects, and so forth. Even if the recipient manages to locate the pointed-to target and thus to resolve this perceptual ambiguity, she or he still needs to sort out another problem: Does the pointing refer to the object as such, or to one of its features; or does it simply predicate that the object is located in a particular area (see Kita, 2003, p. 3)? The meaning of the pointed-to target—the actual referent—still remains ambiguous. And yet, in everyday interaction, reference is usually achieved without problems.

In this article, we assume that participants themselves have developed procedural and linguistic solutions for dealing with perceptual and semantic ambiguities. Acquiring reference would then mean acquiring these procedural and linguistic solutions. Following a pragmatic perspective (Rohlfing et al., 2016), we assume that for a situation to become "shared," interactants have to arrive at a joint understanding of the purpose of their activity. As a result, children need to learn "as much about the rules of dialogue" as they learn about the "lexical labels" (Bruner, 1976, p. 74).

A number of answers have been proposed in response to the question when and how children engage in establishing joint reference. In the following, we shall give a rough overview of relevant streams of research, and show how existing studies have mapped out the necessary cognitive and communicative resources as well as the necessary external resources for the acquisition of reference.

# Cognitive and Communicative Resources for Establishing Reference

Children have been found to engage in joint attention (JA) from 9 months onward. JA is achieved when both partners manage to engage with the same referent. However, it was results reported by Baldwin (1991, 1993) that first motivated a closer investigation of the child's sociocognitive abilities. She demonstrated that infants "are not just passive in the joint reference enterprise" (Baldwin, 1993, p. 398). They have a range of communicative means at their disposal with which not only to display their interest in objects, persons, and so forth but also to direct their coparticipant's attention (e.g., Liszkowski et al., 2004; Liszkowski, 2005; Begus and Southgate, 2012). They use these resources for both imperative and declarative purposes (Bates et al., 1976; Franco and Butterworth, 1996; Liszkowski et al., 2004, 2007). Moreover, they understand that their actions have a bearing on their partner, and they use this knowledge to elicit a label or further talk (Begus and Southgate, 2012; Begus et al., 2014). Pointing is among the first communicative means for directing the coparticipant's attention to objects and events (Bruner, 1983; Franco and Butterworth, 1991; Marcos, 1991; Butterworth and Itakura, 2000; Behne et al., 2012). At around 14 months of age, children accompany their pointing with the local deictic "da!" or "there" (Clark, 1978; Clark and Sengul, 1978; Murphy, 1978). Clark (1978) has proposed four stages in the development from deictic gestures to deictic words:

At the first stage, children use gestures like pointing to pick out an object for their "listeners." At the second, they add to their gesture their first deictic word, often in the form eh (from adult there) or da (from adult that). Later still, at a third stage, they combine a deictic word with other words to form longer utterances like That shoe... Finally, at a fourth stage, they learn how to use deictic words in utterances without any accompanying gesture (p. 96).

Whereas the stages capture a progression in the child's use of deictic means, they do not reflect the need for deixis to also be embedded in the ongoing interaction. Yet to be successful, the child has to make sure that the partner is ready to perceive the pointing ("visual checking," see Franco and Butterworth, 1996). In other words, pointing must be prepared interactively. Likewise, pointing grants relevance to a certain reaction by the recipient. Filipi (2013, p. 145) has shown that children first learn to establish joint attention and are then held "accountable for 'doing' something with that attention when it is provided." Hence, it seems that the "recognition of a situation as communication" (Gliga and Csibra, 2009, p. 352) and the child's sensitivity to the organization and the purpose of the task are important for acquiring reference. Studies applying sequential analyses to young children's interactions stress the public nature or "observability" of each participant's actions as a crucial resource (Wootton, 1997; Kidwell and Zimmerman, 2006, 2007). What is lacking, however, are studies on early interactions showing how this "observability" is achieved and adapted to children's communicative and cognitive abilities.

# External Resources for the Acquisition of Reference

Input-oriented approaches have examined how adults facilitate JA; how they modify their talk in episodes of JA; and how adult feedback affects developments in referential communication (see Ateş-Şen and Küntay, 2015, for an overview). Mothers have been found to point and refer to objects verbally more often in episodes of JA (e.g., Bruner, 1981; Tomasello and Farrar, 1986; Marcos, 1991). Furthermore, parameters for "referential transparency" (Trueswell et al., 2016, p. 11; Schmidt, 1996) have been identified that help children to attend to novel objects visually and thus to resolve ambiguities when linking objects with words (Pruden et al., 2006; Horst and Samuelson, 2008; Axelsson et al., 2012; Liszkowski, 2014; Trueswell et al., 2016; Yu and Smith, 2016). Adult coparticipants often present objects and actions in salient ways. They bring objects into the child's visual focus, shake them, and thus exploit the child's sensibility to human movement (e.g., Rader and Zukow-Goldring, 2010; Pitsch et al., 2014; Yu and Smith, 2016). In interactions with older children, mothers rely on verbal behavior to initiate and maintain their child's attention (Estigarribia and Clark, 2007). Although it could be shown that the caregiver's "input" in episodes of JA correlated positively with the child's use of pointing (Murphy, 1978; Marcos, 1991) and vocabulary (Tomasello and Farrar, 1986), these studies do not fully explain how participants actually arrive at a shared situation and a mutual understanding of the referent—a demand that goes clearly beyond joint attention to a particular target and requires the solving of semantic tasks.

Another strand of research investigating external resources looks beyond the phenomenon of JA. These studies take a broader view on the interactive contexts in which reference is established, and examine how interaction forms a source in the child's cognitive development (Vygotsky, 1998). A number of studies taking this approach have examined how the sequential structure of routines such as games or joint book readings is established (Ninio and Bruner, 1978; Snow and Goldfield, 1983; Filipi, 2009, 2013; Fantasia et al., 2014; Rossmanith et al., 2014; Heller and Rohlfing, 2015; Rohlfing et al., 2015, 2016). Based on a longitudinal study of one mother–child dyad, Ninio and Bruner (1978, p. 8) demonstrated that picture-book reading takes the form of a "standard action format" that consists of recurring dialogue cycles, each comprising an orderly sequence of moves. From a conversation analytic perspective, the structure is underpinned by "conditional relevancies" (Schegloff and Sacks, 1973); that is, normative expectations regarding what type of "relevant next" should follow a move of a certain type. In interactions with young children, adults have been found to "plan ahead" for conditional relevancies, thus guiding the child and creating "an interactional context that is most likely to occasion a desired response" (Mehus, 2011, p. 133). Such stable organization helps children to identify and predict recurring semantic-pragmatic elements in a sequence (Ratner and Bruner, 1978; Snow and Goldfield, 1983). Drawing on microanalyses, Rossmanith and colleagues have examined how caregivers structure book reading routines by shaping parts of activities into bigger or smaller dynamic "action arcs" with a beginning, build up, climax, and resolution (Rossmanith et al., 2014, p. 8). These render the structure of the routine visible for the child. 
By providing a recurring pattern, they facilitate the coordination of not only visible behaviors but also cognitive and perceptual operations (Rohlfing et al., 2016).

Focusing on adult–adult interactions, multimodal and sequential approaches have examined which "practical problems" participants have to solve when establishing reference. They have shown that joint reference is a sequentially organized process that requires participants' coordination of body posture, gaze, movements and verbal resources (Hanks, 2000; Hindmarsh and Heath, 2000; Goodwin, 2003b; Stukenbrock, 2009; Mondada, 2012; Sidnell and Enfield, 2016). The present study examines how children become involved in this interactive and sequentially organized process and how stable patterns of bodily, perceptual, and interactive coordination emerge over time. In the following section, we present an analytical instrument with which to describe this process. The instrument is based on the sequential reconstruction of the interactive jobs (see next section) that are constitutive for establishing reference. Using these jobs as a tertium comparationis, we examine how each job is achieved interactively at different data points and relate changes in the devices available to children and their shares in performing the jobs to changes in the adult's interactive demands and support. In the last section, we develop an explanatory account of what drives the acquisition of reference. We argue that fundamental features of interaction—sequential organization, recipient design, and observability—inform the supportive practices that adults employ to achieve joint reference in interactions with young children.

# A DESCRIPTIVE INSTRUMENT FOR ANALYZING REFERENCE AND ITS ACQUISITION AS INTERACTIVE ACHIEVEMENTS

# Interactive Jobs of Establishing Reference

When establishing reference, participants have to solve at least two problems: First, they have to deal with the perceptual problem of locating a target. Second, they have to solve the semantic problem of identifying or rather construing the referent. Hence, it appears that establishing reference involves recurrent practical problems that require the ongoing and dynamic coordination of the participants' bodily and visual conduct. This is why participants rely on procedural solutions or "practical methods" (Garfinkel, 1967) that enable them to treat and perform "establishing reference" as an "unproblematic" activity in their everyday lives. Building on a framework based on sequential analyses of establishing reference in different settings such as dinner talk, guided tours, self-defense classes, physician–patient consultations (Stukenbrock, 2009, 2015), and picture-book reading (Heller and Rohlfing, 2015), we assume that the procedural solution to establishing reference entails four sequentially ordered jobs.

# Job 1: Establishing Visual Perception as a Relevant Resource

To make a pointing gesture perceptible, the pointing person has to establish her or his body as a perceptually relevant resource (Hindmarsh and Heath, 2000; Goodwin, 2003b; Stukenbrock, 2009; Mondada, 2012). Therefore, bodily displays must be coordinated with the recipient's visual attention. Hindmarsh and Heath (2000) have shown that speakers employ verbal resources such as deictic terms ("here!") to highlight the very moment at which visual orientation becomes relevant—a resource that is also employed in interactions with children (Estigarribia and Clark, 2007, p. 804). The recipient, on the other hand, is required to direct her or his visual attention toward the speaker and to understand that the partner's arm or index finger is not relevant in itself but should be interpreted as an instrument referring to something else and thus serving as an intermediary locus of attention (Stukenbrock, 2009; Rader and Zukow-Goldring, 2010).

# Job 2: Constituting a Domain of Scrutiny

Next, the recipient needs to understand what space the speaker is orienting toward. It is important to emphasize that the speaker's display of attention (her or his orientation toward a certain space by posture, pointing, or local deictics) does not yet indicate a particular object in space. Rather than transparently locating the target itself, it "specifies... a domain of scrutiny, a region where the addressee should begin to search for something that might count as target" (Goodwin, 2003a, p. 73). The co-participant is thus required to reorient her or his visual attention; that is, to shift it from the body of the speaker to a "search space" (Stukenbrock, 2009, p. 304). At the same time, the speaker needs to monitor whether the co-participant construes the search space in the same way as her- or himself. Hence, this job is accomplished when both participants have established a particular space as a shared focus of attention.

# Job 3: Locating the Target

This job requires the recipient to determine the particular target of the pointing gesture. Unlike Butterworth, we do not assume that the act of locating coincides with the identification of the referent. Butterworth (2003) suggests that certain ecological mechanisms enable a "'meeting of minds' in the selfsame object" (p. 22). Likewise, other studies have assumed that locating a target already implies understanding its meaning (e.g., Pruden et al., 2006; Axelsson et al., 2012; Trueswell et al., 2016). Admittedly, locating the target and construing the referent are often achieved at one go. Yet, misunderstandings and repairs do occur in the process of establishing reference (see below), suggesting that locating a target and construing the referent are in fact different achievements (Stukenbrock, 2009, 2015). Whereas locating a target requires a perceptual effort (which may lead to shared perception), construing the referent is a semantic process (occasioning shared understanding). Our own analyses of the ways in which not yet competent members are involved in establishing reference (Heller and Rohlfing, 2015) provide further evidence for the need to distinguish between the two.

# Job 4: Construing the Referent

Once the target is located, the recipient needs to disambiguate its meaning. Therefore, she or he needs to tie acts of pointing or verbal deictics and labels "to the construals of entities and events provided by other meaning-making resources as participants work to carry out courses of collaborative action with each other" (Goodwin, 2003b, p. 218). Hence, to identify the referent, the coparticipant draws on contextual resources; that is, her or his understanding of the joint activity (e.g., book reading, building a tower) in which the reference is embedded (Hindmarsh and Heath, 2000; Liszkowski, 2014). She or he then develops hypotheses about the meaning of the pointed-to target (Stukenbrock, 2009, p. 307). This semantic work is conducted visibly and verbally: Adult recipients often display their understanding that can then be confirmed, specified, or repaired by the speaker (Stukenbrock, 2015, p. 316).

To summarize, we conceptualize reference as an interactive and sequentially organized process that requires participants to observably and methodically orient themselves toward four jobs. Whereas previous developmental research has focused mainly on Jobs 1 and 3 (Estigarribia and Clark, 2007), sequential analyses provide evidence that establishing reference also requires participants to constitute a domain of scrutiny and to construe the referent. The four sequentially ordered jobs thus serve as a procedural solution to practical problems of perceptual and semantic ambiguity. Note that the scope of our descriptive instrument covers basic forms of reference; that is, activities in which participants refer to something in their immediate surroundings. It does not apply to references to past, future, or fictitious events.

# Descriptive Levels of the Instrument

Starting from the perspective that reference is fundamentally an interactive achievement, a developmental approach to reference has to tackle the question of how individual abilities can be described without ignoring the fact that reference is a collaboratively organized process. Our solution to this problem is to view the interactive process itself as a part of the analysis. Therefore, we build on an analytical approach developed by Hausendorf and Quasthoff (2005) designed originally to examine the acquisition of narrative competence. Adopting this instrument for the acquisition of reference, we distinguish two levels of description: the level of jobs and the level of the devices needed to get the jobs done.

Jobs represent the organizational tasks (Sacks, 1995; Quasthoff et al., 2017) the participants orient toward in the joint achievement of reference. Because these jobs follow a sequential logic, this level of description captures the sequential organization of reference. Furthermore, the present analysis will demonstrate that each of the four jobs is organized as a two-part exchange or adjacency pair in which a move of type A establishes a "conditional relevance" for a move of type B (Schegloff and Sacks, 1973). Hence, the second move is functionally dependent on (or made normatively expectable by) the first. Each job has been achieved when the second pair part of the expected type has been produced. Reference, then, is successfully established when each of the four jobs has been fulfilled regardless of how and by whom. The jobs thus serve as a tertium comparationis for the longitudinal comparison of both the adult's and the child's contributions to establishing reference.

Devices is the term given to the bodily, prosodic, and verbal means or resources with which the jobs are accomplished. They describe each participant's contributions to the jobs. Moreover, different devices can be deployed to accomplish the jobs.

By distinguishing between interactive jobs and devices, the instrument takes into account both the dyad's joint accomplishment and each participant's contributions to establishing reference. It thus provides the basis for a longitudinal comparison of the adult's and the child's contributions without losing sight of the fact that reference is coconstructed. This allows us to examine (1) what bodily-visual and verbal resources participants employ to accomplish the jobs and (2) how their shares in the jobs change over time.
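The two descriptive levels can be made concrete as a small coding scheme. The following is a minimal sketch only, not the authors' annotation tooling: the names `Job`, `Move`, `achieved_jobs`, `reference_established`, and `shares` are hypothetical, and the example episode is invented for illustration rather than taken from the corpus. It assumes that each coded move records the job it belongs to, the participant, its position in the adjacency pair, and the device used.

```python
from dataclasses import dataclass
from enum import Enum


class Job(Enum):
    """The four sequentially ordered jobs of establishing reference."""
    VISUAL_PERCEPTION = 1   # Job 1: establishing visual perception as a relevant resource
    DOMAIN_OF_SCRUTINY = 2  # Job 2: constituting a domain of scrutiny
    LOCATE_TARGET = 3       # Job 3: locating the target
    CONSTRUE_REFERENT = 4   # Job 4: construing the referent


@dataclass
class Move:
    """One coded contribution (hypothetical annotation record)."""
    job: Job
    participant: str  # e.g. "M" (mother) or "C" (child)
    pair_part: int    # 1 = first pair part, 2 = second pair part
    device: str       # free-text label, e.g. "what question"


def achieved_jobs(moves):
    """A job counts as achieved once a second pair part follows its first pair part."""
    opened, achieved = set(), set()
    for move in moves:
        if move.pair_part == 1:
            opened.add(move.job)
        elif move.pair_part == 2 and move.job in opened:
            achieved.add(move.job)
    return achieved


def reference_established(moves):
    """Reference is established when all four jobs are fulfilled, by whomever."""
    return achieved_jobs(moves) == set(Job)


def shares(moves):
    """Count each participant's moves per job (the participant's share in the job)."""
    counts = {}
    for move in moves:
        key = (move.job, move.participant)
        counts[key] = counts.get(key, 0) + 1
    return counts


# Invented episode: the adult opens every job, the child responds to Jobs 1-3 only.
episode = [
    Move(Job.VISUAL_PERCEPTION, "M", 1, "what question"),
    Move(Job.VISUAL_PERCEPTION, "C", 2, "gaze shift"),
    Move(Job.DOMAIN_OF_SCRUTINY, "M", 1, "local adverb"),
    Move(Job.DOMAIN_OF_SCRUTINY, "C", 2, "touches book"),
    Move(Job.LOCATE_TARGET, "M", 1, "where question"),
    Move(Job.LOCATE_TARGET, "C", 2, "pointing"),
    Move(Job.CONSTRUE_REFERENT, "M", 1, "label"),
]
print(reference_established(episode))  # Job 4 has no second pair part yet
```

Keeping the device as a free-text label reflects the point that different devices can accomplish the same job; only the job level is fixed across data points.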

# MATERIALS AND METHODS

# Participants

The longitudinal analysis is based on video recordings of face-to-face interactions between caregivers and two typically developing children as they aged from 9 to 24 months. These dyads were selected from a larger corpus (e.g., Rohlfing et al., 2015) and include children of both genders. Based on our corpus, they represent "typical" courses of language acquisition. Participants were recruited in the German city of Bielefeld and its surroundings. The mothers' educational background was comparable; both had university degrees.

# Data Collection and Transcription

Each family was visited at home once every 6 weeks (12 data points). Two different activities were videotaped: free play (lasting 20–25 min) and picture-book reading (lasting 5–10 min). For the latter activity, the dyads were given a colorful folder: Each page presented photographs showing, for example, a spoon on a mug or a child on a swing. Altogether, the corpus comprises 10.5 h of video recordings. For each point of data collection, three to eight episodes were transcribed in Elan (EUDICO Linguistic Annotator; Lausberg and Sloetjes, 2009). The 93 transcripts cover 42 min of interaction. The transcription follows the notation conventions of Gesprächsanalytisches Transkriptionssystem 2 (GAT 2, Couper-Kuhlen and Barth-Weingarten, 2011). It depicts participants' verbal, non-verbal (e.g., pointings, depictive gestures, gaze), and paraverbal actions (e.g., accentuation, pitch movement, loudness) in their sequential order (see Appendix). All transcripts were checked by two research assistants. Parents provided written informed consent for the study as well as specific consent for the publication of images in the transcripts. The names used in the transcripts are pseudonyms. The first number in the transcript title refers to the dyad (01 and 07); "BR" and "FP" refer to "book reading" and "free play."
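The figures reported above can be checked with a few lines of arithmetic. This is a rough sketch under stated assumptions: the first visit is assumed to take place at exactly 9 months, visits are assumed to be spaced exactly 6 weeks apart, and a month is taken as one twelfth of a 365.25-day year. The function name `visit_ages` is hypothetical.

```python
WEEKS_PER_MONTH = 365.25 / 7 / 12  # roughly 4.35 weeks per month (assumption)


def visit_ages(start_months=9.0, interval_weeks=6, n_visits=12):
    """Approximate age in months at each home visit."""
    step = interval_weeks / WEEKS_PER_MONTH
    return [start_months + i * step for i in range(n_visits)]


ages = visit_ages()
print([round(age, 1) for age in ages])

# Coverage of the transcribed material, using the figures reported in the text.
corpus_minutes = 10.5 * 60      # 10.5 h of video recordings
transcribed_minutes = 42        # covered by the 93 transcripts
n_transcripts = 93

mean_seconds = transcribed_minutes * 60 / n_transcripts
fraction = transcribed_minutes / corpus_minutes
print(f"mean transcript length: {mean_seconds:.0f} s")
print(f"share of corpus transcribed: {fraction:.1%}")
```

Under these assumptions, twelve visits starting at 9 months end just after the child's second birthday, which matches the reported range of 9 to 24 months. The transcripts average about 27 s each and cover about 7% of the recorded material.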

# Analytical Procedure

The analysis entailed two steps: Drawing on conversation analysis (Sacks, 1995) and multimodal analysis (Streeck et al., 2011), we first examined how each job was achieved by the dyad in different interaction episodes (section Age-Related Sequential Analyses). This sequential analysis focused on the devices adults and children employed to get the jobs done. Examples are presented for four age spans (9–14, 15–17, 18–22, and 23–24 months). The age spans were not determined a priori, but are based on our analyses. They reflect changes in the adults' interactive demands and/or the children's contributions to establishing reference. In the second step, we related changes in the children's devices and shares in the jobs to changes in the adult's interactive demands and support (sections Longitudinal Comparison: Children's Devices and Shares in the Jobs and Longitudinal Comparison: Adults' Devices and Shares in the Jobs).
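The second step, relating shares in the jobs to age spans, amounts to grouping coded excerpts by age span and tallying who initiated each job. The sketch below illustrates this under assumptions: the helper names `age_span` and `tally_initiators` are hypothetical, ages are treated as whole months as given in the transcript titles, and the five records are read off excerpts discussed in this article rather than drawn from a complete coding of the corpus.

```python
from collections import Counter

# Age spans reported in the analysis (months, inclusive bounds).
AGE_SPANS = [(9, 14), (15, 17), (18, 22), (23, 24)]


def age_span(age_months):
    """Return the label of the age span that contains a whole-month age."""
    for low, high in AGE_SPANS:
        if low <= age_months <= high:
            return f"{low}-{high}"
    raise ValueError(f"age {age_months} lies outside the observed range")


def tally_initiators(records):
    """Count (age span, job, initiator) combinations across coded excerpts."""
    return Counter((age_span(age), job, who) for age, job, who in records)


# (age in months, job number, initiator)
records = [
    (9, 1, "adult"),   # Excerpt (1): mother's what question
    (11, 3, "adult"),  # Excerpt (16): mother guides Lea's hand
    (17, 1, "child"),  # Excerpt (13): Ole moves into his mother's visual focus
    (17, 2, "child"),  # Excerpt (13): Ole points behind him
    (19, 3, "child"),  # Excerpt (7): Ole locates a target after the page turn
]

for key, count in sorted(tally_initiators(records).items()):
    print(key, count)
```

Because the age spans were derived from the analysis rather than fixed in advance, the boundaries in `AGE_SPANS` are an outcome of the study and would need to be revised for other data.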

# ANALYSES AND FINDINGS

# Age-Related Sequential Analyses

#### Establishing Visual Perception as a Relevant Resource (Job 1)

### **9–14 months**

How visual perception is established as a relevant resource depends decisively on the participants' bodily arrangements. For book reading with young children, mothers typically arrange a nested configuration (Ochs et al., 2005) and position the child on their lap facing outwards (**Figure 1**). Thus, the child shares a visual field with the mother and does not need to redirect her or his gaze from the mother's body to the pointed-to domain of scrutiny (Job 2). When the mother points to the book, both her finger and the domain of scrutiny can be perceived simultaneously (see Yu and Smith, 2013). During play, participants sit face to face or side by side (**Figure 2**). This arrangement requires the pointing person to first draw the coparticipant's visual attention to her or his own body.

In the first sequence, Lea (9 months) is in a nested position.

```
(1) 07-BR-spoon (9 months)
001 L [((turns page, looks at rings)) ]
002 M [AH:::: was ham wir denn DA:::;]
       AH::: what do we have the::re;
003 L ((looks at picture))
```
FIGURE 1 | Nested arrangement.

At the beginning of episode (1), the participants' visual attention is not coordinated. While Lea is turning the page and looking at the rings of the file, the adult is looking at the picture. At this moment, the adult produces a what question that is prefaced with a lengthened interjection (line 2: "AH::::"). The question has a standard format:



In both examples, the interjection serves as an audible display of the speaker's excitement about having discovered something new. The pronoun "we" indicates that the speaker addresses the question to both herself and the coparticipant, thus making joint attention relevant. The local adverb "da"/"there" is lengthened and accented (see Estigarribia and Clark, 2007). Even if the child cannot understand the lexical meaning of the words, the prosody is designed to arouse her or his attention (see Pitsch et al., 2014, for a similar finding). Thus, in this sequential position, the what question does not ask for a label but establishes a sequential implication for the child to direct her or his gaze toward the mother's body (in this case: her hand). The what question and the bodily response thus form an adjacency pair; that is, a two-part exchange in which the second move is functionally dependent on the first. Forming the first pair part, the what question sets up a conditional relevance for visual coordination as a second pair part. In our data, the children frequently treat the what question as sequentially implicative by redirecting their gaze toward the mother's hand in front of the picture.

In play situations, mothers place the object in front of the child and thus reduce the need for the child to shift her or his gaze between mother and object.

#### (4) 01-FP-bag (10 months)


When opening the bag, the mother publicly displays her own attention through a sharp intake of breath (line 2; see Rossmanith et al., 2014). This is followed by the summons "LOOK." (see also Murphy, 1978; Estigarribia and Clark, 2007; Pitsch et al., 2014; Rossmanith et al., 2014). Just like the what question, the summons forms a first pair part that establishes a conditional relevance for visual coordination.

#### **15–17 months**

From 15 to 17 months onward, a variation in the division of labor can be observed. Every now and then, it is the child who initiates the job of establishing visual perception as a relevant resource, thus reversing the sequential obligations. In extracts (5) and (6), Lea attracts her mother's attention by displaying her own excitement.


To establish visual perception as a relevant resource, Lea employs devices used previously by the adult: breathing in (Excerpt 5) and, a few weeks later, interjections (Excerpt 6). Here, the child also points to the book (line 3), thus already initiating the next job.

#### **18–22 months**

In this age span, another change could be observed in the book-reading situation. Now, the first job was sometimes skipped. Visual perception was made relevant only at the very opening of the book-reading routine. As soon as the routine got under way, neither child nor adult employed interjections, questions, or summons to display their own and elicit the coparticipant's visual attention. A decrease in verbal attention getters was also observed by Estigarribia and Clark (2007), albeit with respect to interactions with older children. In the following extract, Ole locates a target (by vocalizing and pointing) immediately after his mother has turned the page.

(7) 01-BR-dino (19 months)



The skipping of the first job indicates that participants have arrived at a mutual understanding of the job and the overall activity—they mutually rely on each other's attention.

### **23–24 months**

From 23 months onward, it can be observed that children employ questions that the adult used months before. This is surprising, given that only a couple of weeks earlier the coparticipants were found to mutually rely on each other's attention. The questions or prompts, however, are a device that enables the child not only to attract but also to direct the adult's attention in a more specific way (Clark, 1978) by, for example, asking for a label. The fact that the mother resists this obligation (as in Excerpt 8) reflects her heightened expectation with regard to Lea's ability to label the referent herself.


**Table 1** summarizes the devices adults and children employ to establish visual perception as a relevant resource. The list is not meant to be exhaustive. In different spatial configurations, participants might well draw on additional resources.

#### Constituting a Domain of Scrutiny (Job 2)

The most striking developments in constituting a domain of scrutiny can be observed between 9 and 14 months of age. In this period, the child comes to understand the book and the toy storage bag as domains of scrutiny. Again, this job is organized as an adjacency pair.

#### **9–14 months**

To establish joint reference, adult and child need to constitute a domain of scrutiny in which the target can be located. This entails two demands: First, the child must come to understand that (and for what purpose) something should be searched for, a cognitive demand as formulated by Rohlfing et al. (2016). Second, the child must come to understand where (in which area) the search should be made. When the child is not in a nested configuration and does not "automatically" share the same visual focus with the mother, the adult frequently brings the domain of scrutiny into the child's immediate visual field.



003 O: ((touches book))

In Excerpt (9), the mother holds the book above the child's head. Overlapping with this, she uses the local adverb "HERE;" as a device to instruct the child where and when to look. This summons forms a first pair part of another adjacency pair and establishes a conditional relevance; this time, for orienting toward the domain of scrutiny. Ole produces the expected second pair part by touching the book.

Another prototypical device is where questions. Like the summons, they set up a conditional relevance for orienting toward a search space. Yet in contrast to a summons, they entail two demands: first, the understanding that something should be searched for; and second, what this something is (Murphy, 1978). When constituting a domain of scrutiny, the adult makes only the first aspect relevant.

#### (10) 07-BR-spoon (9 months)


Rather than conveying to the child what she is expected to search for, the mother's where question is designed to help Lea understand that she is expected to search for something. The accented "WHE::RE" (line 4) is designed to evoke a searching stance on the side of the child. Like the "HERE," the where question projects the relevance of orienting toward the domain of scrutiny.

Overlapping with her question, the mother therefore marks the domain of scrutiny by moving the book and lifting it closer to Lea (line 4). This action indicates that the mother does not yet expect Lea to understand that the book itself, located right in front of Lea, constitutes the search space. Nonetheless, Lea does not produce a relevant next action. After being asked the question a second time (line 5), Lea bends forward and touches the book with her face (line 6). By repeating the question for a third time (line 7), the mother, however, does not ratify this reaction as an adequate response.

The analysis reveals that constituting a domain of scrutiny depends crucially on a mutual understanding of the current context of interaction. In this case, this job is not achieved because it requires the child to understand the purpose for which the book is being used. Although the domain of scrutiny is already in the child's visual focus, it is not recognized as such. This shows that constituting a domain of scrutiny is not only a matter of visual orientation but likewise a matter of understanding the purpose of searching: "Beyond the visual conduct, participants draw upon the activities in which reference emerges and forms a part, in order to produce, and make sense of, reference" (Hindmarsh and Heath, 2000, p. 1857).

In the context of book reading, understanding the purpose also involves knowing how to deal with pictures. During the first episodes of book reading, the child explores the book as an object:



019 L [((strokes with rIF over picture ]

Responding to the mother's where question (line 5), Lea grasps the rings of the file (line 6). The adult allows time for exploring the materiality of the pictures and thus for experiencing the physical impossibility of taking something "out of the book." When locating objects herself, the mother traces their form (line 9), thus pointing to the depicted object and, at the same time, highlighting its depictive nature as such ("completely SMOOTH"; see Rohlfing et al., 2015, for similar strategies). Understanding depiction as such is a prerequisite for understanding what can be done with books and how they constitute a domain of scrutiny (see Ganea and Canfield, 2015, for a recent summary).

### **15–17 months**

From 15 months onward, children usually display an understanding of the book-reading routine. As soon as they know how the book is used, adults do not need to establish the book as a domain of scrutiny. Therefore, this job is skipped in this particular routine:


Lea turns the page and keeps her eyes on the book. The mother initiates the next cycle of establishing reference by displaying her excitement (Job 1). Then she immediately labels the referent (Job 4). Lea's hand is held in the air; it is not clear whether it depicts the fishing rod or is just held "ready."

In the play setting, the job retains its importance. At 15–17 months, children start to use pointing to refer to distant entities that the co-participant is currently not oriented toward. In the following example, Ole establishes visual attention as a resource (Job 1) by standing up, moving into his mother's visual focus, and initiating eye contact. Then he points behind him (where a visitor is waiting behind the corner), thereby constituting a domain of scrutiny (Job 2).

#### (13) 01-BR-thinking (17 months)

[...] visual focus))

021 O: |!DA!- |
       !THERE!-
       |((looks at M, points to person standing behind the wall))|

022 M: sanDAlen;
       sandals

023 O: |!DA!- |
       !THERE!-
       |((looks at M, points to place behind him))|

024 M: wollts nochma GUCken geh:n.
       wanna go looking again.

025 O: ((thinking face))

Note that when pointing behind him, Ole's visual focus and the focus of his pointing diverge. Thus he orients toward two spaces at the same time: While maintaining eye contact with his mother, his pointing constitutes a domain of scrutiny in the opposite corner of the room. The mother formulates an assumption about the referent (line 22: "sandals"). By repeating the pointing and the local deictic (line 23), Ole indicates that his mother's assumption did not match what he wanted to convey and he prompts another attempt. The mother indeed produces another formulation (line 24) that he then accepts. Two issues are worth mentioning here: First, the example shows that Ole is able to create two diverging focuses of visual attention at the same time and thus to direct the coparticipant's gaze toward a distant space. Hence, he is able to initiate the first two jobs. The location of the target and the construal of the referent are left to the adult. Second, the episode provides an excellent example for our claim that "constituting a domain of scrutiny," "locating a target," and "construing the referent" are, in fact, different jobs. The mother's wrong assumption clearly shows that orienting toward a search space does not automatically imply the location and construal of the referent.

**Table 2** summarizes the devices adults and children employ to constitute a domain of scrutiny.

#### Locating the Target (Job 3)

#### **9–14 months**

This job requires the recipient to determine a certain target in the domain of scrutiny. Again, this involves a perceptual effort. In interactions with very young children, adult coparticipants enhance the perceptibility of the act of locating. In Excerpt (14), the mother makes her own search both visible and audible.

(14) 01-BR-dog (10 months)

003 O: ((touches book with rH))

#### TABLE 2 | Adults' and children's devices for constituting a domain of scrutiny.


As soon as both participants share a visual focus on the domain of scrutiny (line 3), the mother sustains the child's attention by breathing in. She then overtly displays the search with her eyes by moving her index finger across the page until an object is found. Temporally aligned with the movement of the finger, she produces a lengthened sound (line 5 and again in line 7) that ends exactly at the moment when the object is located. In this way, the mother makes the relevant action (locating an object) observable. Her finger is being used to guide Ole's visual focus. By following the movement of the index finger, Ole can locate the object at exactly that moment when the end of the search is marked vocally ("bs:::t"). Immediately afterwards, the target is also labeled [lines 6 and 8, see section Construing the Referent (Job 4)].

a CAT;

In this example, a perceptual action is carried out publicly and observably (Kidwell and Zimmerman, 2006). This helps the child to coordinate her or his attention (Rader and Zukow-Goldring, 2010; Pitsch et al., 2014), enabling her or him not only to locate the target but also to perceive the coparticipant's perception. Given that not only mutual perception of an object but also reciprocal "perception of being perceived" (Hausendorf, 1995, p. 186) is a sine qua non for establishing reference (and interaction in general), this way of making perceptual acts observable for the coparticipant is particularly suited to acquaint the child with the reciprocal perception of being perceived.

Another device that adults employ is where questions. In the previous section (Job 2), we showed that where questions are used initially to evoke a searching stance in the child. As the interaction moves forward, the second implication of the

#### (15) 07-BR-spoon (9 months)


010 L [((lh touches picture, fingers splayed))]


In line 5, the conditional relevance is reestablished. Now Lea touches the book with her face (line 6). Reestablishing the conditional relevance again (line 7), the mother does not ratify Lea's action (touching the book with the face) as an adequate reply. Only now, when a response is observably absent, does the mother answer the question herself, thus taking over Lea's responsibility. In her turn, the mother temporally aligns the point with the local adverb "THE:RE," which is not only accented but also produced with a breathy voice. Because "da/THE:RE" is emphasized repeatedly in this way, we refer to this device as the emphatic da/there. The emphatic da/there marks the fulfillment of the conditional relevance (i.e., the achievement of the goal of the search), and thus resolves the tension built up by the question (see Rossmanith et al., 2014).

In other words, crucial devices for locating the target—pointing and the verbal deictic—are again performed visibly and audibly and thus made available for the child. In concert with her mother, Lea brings her right hand to the book. Stopping the movement (line 9), she first observes the mother's pointing and then splays out her fingers before tapping the target. This movement is treated by the mother as a meaningful action. Using smile voice (Couper-Kuhlen and Barth-Weingarten, 2011), she both formulates and ratifies Lea's action (line 11). This way, she conventionalizes Lea's movement that now becomes a communicative means (Lock, 1980; Marcos, 1991).

Another device adults employ is manual guiding:

```
(16) 07-BR-mug (11 months)
019 (2.5)
020 M |DA:: ist der becher; |
      THE::RE is the mug;
      |((guides Lea's hand, [taps on
      picture)) |]
021 L [((looks at picture)) ]
022 M |DA: ist der becher; |
      THE:RE is the mug;
      |((taps on picture)) |
```
Before locating the target verbally, the mother has taken Lea's right hand. Note that the mother's index finger is positioned on Lea's metacarpus and pushes the other fingers downwards. Overlapping with her verbal utterance, she then brings Lea's index finger closer to the book (line 20). The touch of the book induces Lea's visual attention (see Zukow-Goldring, 1996): She shifts her gaze to the book (line 21). As soon as Lea looks at the book, the pointing is repeated. Again, the emphatic da/there and the touch of the book are temporally aligned (lines 20 and 22). Hence, what is made available here is not only the movement and the local adverb but also the sequential position in which the action is expected.

### **15–17 months**

From 15 months onward, the children point without help. More importantly, they use this device in two different sequential positions, either as a response to the adult's where question or as an initiative to start off the job of locating. Pointing is now clearly established as a communicative device (Marcos et al., 2003). In extract (16), Lea responds to her mother's initiation.

Note that the prosodic design of the where question has been altered. The adult no longer places the focus accent on the interrogative pronoun but stresses the referent instead. This reflects heightened expectations regarding the child's understanding of the activity: The adult presupposes that the child has taken a searching stance and can now also focus on the object of the search.

The child responds to where questions by pointing and producing the verbal deictic "da"/"there." The local adverb is temporally aligned with the point and produced with an extra strong accent (line 12). Hence, it closely resembles the mother's emphatic da/there. Because the referent is already mentioned in the adult's where question, locating the target and identifying the referent are achieved at once. Now that the child consistently produces the second pair part, the mother expands the sequence. She not only reformulates the child's utterance as a syntactically complete sentence (line 13) but also produces an evaluation (line 14: "exactly;"), thereby transforming the adjacency pair into a three-part structure. This structure, known as IRE (Mehan, 1979: initiation, reply, evaluation), is typically observed in formal and informal learning contexts. The book-reading activity is thus turned into an instructional routine (Tarplee, 2010), casting the caregiver in the role of the instructor and the child in the role of the instructee.

This contextualization of the activity goes hand in hand with two other innovations: As soon as establishing reference is achieved smoothly, the adult heightens the demand by asking a series of questions (see also Murphy, 1978). Furthermore, the adult other-initiates self-corrections (Schegloff et al., 1977) when the child's response is inaccurate. Excerpt (18) illustrates this finding.



**007 M und wo ist die KLAMmer?**

**and where is the PIN?**

008 L ((points to other part of the table))

009 <<nodding> WUW;>

**010 M die !WÄ!scheklammer;**

**the !PIN!;**

**011 zeig mir mal die WÄscheklammer.**

**show me the clothesPIN.**

012 L ((points to pin))

013 M <<creaky> AH::> die wäscheklammer ist am TISCH

the clothespin is on the TAble-

After Lea has answered the first question (line 5), the adult immediately produces a second question (line 7) that asks for another detail. Withholding an evaluative receipt (Filipi, 2013) and repeating the request once more (line 10), the adult other-initiates a correction. Note that the request is also explicated (line 11 "show me"), thereby making it easier for Lea to understand that the activity has been halted, and that a revision of the previous utterance is expected. Lea indeed interprets this as a request to self-correct her response: She corrects her answer by pointing to another detail of the picture (line 12), and this is confirmed by the mother (line 13).

Between 15 and 17 months, the children in our study also began to start the job of locating:

(19) 01-BR-stirring (16 months)

001 O: ((turns page))

**002 |((points to spoon))| |mh::; |**

After turning the page, Ole immediately initiates the job of locating a target by vocalizing and pointing. At the same time, he produces a vocal gesture (line 2: "mh::;") with which he labels the target [we return to this gesture in the next section (Job 4)]. Hence, Ole has accomplished two jobs at once: He has located and identified the referent.

### **18–22 months**

From 18 months onward, children no longer display any difficulties in locating targets. In the book-reading routine, no further innovations could be observed with regard to the third job. New developments could be observed, however, when adults replaced their where questions with what questions, thereby requiring the child to label the referent herself or himself (see next section).

**Table 3** summarizes the devices adults and children employ to locate the target.



# Construing the Referent (Job 4)

### **9–14 months**

Although the younger children in our study do not yet possess the conventional communicative means to construe a referent, they are nonetheless being involved in this job. This is achieved by the adult's choice of a particular question format: Because the referent is already given in the where question, the act of locating coincides with construing the referent. When the child observably cannot deal with this demand, the mother either assists by manual guiding or takes over the job, thus demonstrating how to deal with the interactive demand (see previous section).

### **15–17 months**

From 15 months onward, the children in our study contributed to the job of construing the referent in a substantive way.

```
(19) 01-BR-spoon (17 months)
```



Ole initiates the job of locating a target and simultaneously depicts the movement and sound of drinking (line 2). Thus, he deploys a depictive practice that Streeck (2008, p. 295) terms acting: "the gestural action of the hand shows the practical action of a hand" and evokes an action. In this case, it is not the hand, but the mouth that represents itself in the action of drinking. With this depiction, Ole construes the referent. Now, the mother increases the interactive demand: She no longer uses where questions but asks what questions (line 4) that require the child to take on the main work of construing the referent (Murphy, 1978). Ole produces the verbal label "ÖFfel;" (line 7), which is aligned with a circling movement. The spoon is thus "indirectly represented by a schematic act that 'goes with'" it, a practice that Streeck (2008, p. 293) terms handling: "A motor schema or prehensile posture is coupled with an affordance of the referent." Ole has "invented" this gesture (Behne et al., 2014) in previous episodes. When an object (e.g., the spoon) has been labeled, his mother often extended the sequence by asking "and what does one do with it?" Ole responded with a stirring movement that was taken up by his mother. In this context, however, he does not employ the movement to refer to the activity but to the object itself. He thus reuses semiotic resources with a new method of representation (see Heller and Rohlfing, 2015).

### **18–22 months**

In this period, the adults continued to ask what questions. The interactive demands for the child increased in two respects:


When the child produces an unintelligible label (line 12), the adult systematically reestablishes and explicates the conditional relevance (lines 13–14). Halting the progression of the activity, the child is required to attend to the articulation of the word (line 15). In other cases, the adults reformulate the child's utterance, thus modeling the articulation of the word (line 22). Furthermore, the series of questions asking for familiar objects is extended (here: lines 17 and 20). The labels are then combined into one "thick description" (line 23).

### **23–24 months**

In the following months, the sequential pattern remained the same. Being ascribed the main responsibility for construing and labeling the referent, the child relied increasingly on verbal resources alone (see Murphy, 1978; Ninio and Bruner, 1978):


**Table 4** summarizes the devices adults and children employ to construe the referent.

# Longitudinal Comparison: Children's Devices and Shares in the Jobs

In the following section, we track changes in the children's devices and shares in the jobs across time. The longitudinal comparison reveals changes in two areas: On the level of jobs, the children came to understand the mechanism of conditional relevancies. On the level of devices, the children first made use of non-verbal resources that were then combined with and partially replaced by verbal resources.

### Developments on the Level of Jobs

As demonstrated above, establishing reference was achieved within four jobs that were each organized as an adjacency pair. Initially, each job was initiated by the adult who produced the first pair part. The children then increasingly displayed their understanding of the sequential implication by producing the second pair part. The age at which children started to orient toward conditional relevancies differed depending on the job: Whereas the conditional relevancies of establishing visual perception as a relevant resource (Job 1) were already responded to at 9 months of age (Excerpt 1), the implications of constituting a domain of scrutiny and locating a target first needed to be demonstrated by the adult. Only at the age of 15 months did the children produce conditionally relevant and conventional actions such as pointing to the target (Excerpts 13 and 17). Shortly afterwards, they also occasionally set up conditional relevancies for locating a target themselves (Excerpt 19). Whereas they started to initiate Jobs 1–3 by 15 months, we could observe initiations of construing the referent only at the age of 18 months (Excerpts 19 and 21).

In sum, on the level of jobs, the child's participation developed from being responsive to conditional relevancies to proactively setting up conditional relevancies. Furthermore, the children seemed to work their way forward through the sequential order: Both children mastered the initial jobs first before they occasionally began to initiate Job 4 and to oversee the whole sequential organization.

### Developments on the Level of Devices

For the devices, the longitudinal comparison suggests that the children adopted means that had been used previously by the adult co-participant. At 15 months, the children initiated Job 1 by producing sharp intakes of breath and interjections (Excerpts 5 and 6); at the age of 24 months, they also employed what questions (Excerpt 8). All these devices had been used consistently by the adult. Likewise, the children acquired devices for locating a target that the adult co-participant used throughout the episodes: Pointing and pointing aligned with the emphatic da/there became a part of the children's repertoires around the age of 15 months (Excerpt 17). With respect to the fourth job, the children were first expected to identify a referent by pointing. When the mothers increased the demand by asking what questions instead of where questions, the children started to use depictive gestures (Excerpt 19). Remarkably, the use of the gestures was not based on imitation; instead, their "invention" (see Behne et al., 2014) had been "provoked" by the adults' questions about depicted objects such as "What does one do with a spoon?" (Heller and Rohlfing, 2015). Depictive gestures were replaced increasingly by verbal means (aligned with pointing) at the age of 18 months (Excerpt 20). This is in line with findings reported by Capirci et al. (1996), Goldin-Meadow and Butcher (2003), and Mai-Rong et al. (2015).

Hence, on the level of devices, development proceeds from using somatic and non-verbal resources to using verbal and symbolic resources. However, somatic and non-verbal resources remain important across development and continue to facilitate the smooth execution of the jobs. The use of somatic and non-verbal resources allows children to actively participate in establishing reference long before they are able to speak. From 15 months onward, the sequential machinery of establishing reference runs smoothly. An important finding is, then, that at this age, children have acquired essential competencies for establishing reference even if they do not have the verbal resources at their command.

The longitudinal comparison shows that at an early age, the children's shares in the jobs do not conform to what is usually expected from competent participants in establishing reference (see Mehus, 2011, for a similar finding). Nevertheless, all jobs are accomplished. When the child does not respond to conditional relevancies in the expected way, the adult takes over the child's tasks and does "extra work" (see Hausendorf and Quasthoff, 2005). We shall pursue this aspect in the next section.

# Longitudinal Comparison: Adults' Devices and Shares in the Jobs

In the following section, we track changes in the adults' devices and shares in the jobs. On the level of jobs, the adult provided support for the child to understand the mechanisms of conditional relevancies. On the level of devices, the adult increasingly replaced somatic resources by symbolic ones and also required the child to employ verbal means.

### Changes on the Level of Jobs

Setting up conditional relevancies is the basis for initiating the jobs. In interactions with young children, this was done consistently by the adult (Excerpts 1–4). Furthermore, the adult made sure that the conditional relevancies remained in force when they were not responded to adequately: by reestablishing and explicating the conditional relevancies, by assisting the child in producing a second pair part, or by taking over the task.


These supportive practices ensured the maintenance of the sequential order. Their use underwent considerable changes over the course of the child's second year of life:

Reestablishing conditional relevancies was observed throughout the child's second year. At the beginning, adults reestablished conditional relevancies when a response was absent (Excerpt 15). In this way, they ensured that the sequential implication remained in force (see Filipi, 2013: "pursuing a response"; Hausendorf and Quasthoff, 2005). Later, conditional relevancies were also reestablished when the response was inadequate; for example, when the child located the wrong target or produced an unintelligible label (Excerpt 20). This prompted the child to correct the response (see Tarplee, 2010). Explications of sequential implications (e.g., "show me" or "say, what is that"; see Hausendorf and Quasthoff, 2005) could be observed only from 17 months onward (Excerpts 18 and 20) when the child displayed sufficient understanding of verbal utterances. Before this, the caretakers tended to rely on making sequential implications perceptible (see below).

Assisting the child in producing a second pair part is contingent on establishing a conditional relevance. This practice was used mainly to get perceptual tasks done. Between 9 and 14 months, the adult assisted the child in locating a target by guiding her or his visual focus and by manual guiding (Excerpts 14 and 16). As soon as the children were able to locate a target themselves, assistance was omitted. These observations extend previous findings reported by Zukow-Goldring (1996) showing that the child's attention is "educated." Our analyses show that this "education" also includes the sequential position in which the action is expected.

Taking over a task, that is, producing the second pair part in place of the child, was realized only when a response remained absent even after reestablishing a conditional relevance (Excerpt 15). This is consistent with findings reported by Hausendorf and Quasthoff (2005). As soon as the child displayed her or his ability to produce the expected second pair part, the adult refrained from taking over the child's task. Taking over thus served two functions: First, it guaranteed that the job was accomplished at all and that the activity could continue; second, it made the expected action observable for the child and provided a model for what to do when and how.

With the four practices of (1) setting up conditional relevancies, (2) reestablishing and explicating conditional relevancies, (3) assisting the child in producing a second pair part, and (4) taking over a task, the adults ensured that the jobs were being accomplished no matter how much the child was able to contribute. Thus, they were oriented toward the successful achievement of reference. At the same time, the highly differentiated employment of the four practices was oriented toward gradually reducing the adult's "extra work" (Hausendorf and Quasthoff, 2005) and arriving at equal contributions to establishing reference.

As soon as the children mastered certain jobs, they were also given the opportunity to set up conditional relevancies themselves. This observation is consistent with what Bruner describes as "handover" (Bruner, 1983, p. 60). In addition, our analyses revealed that the focus of the conditional relevancies shifted from perceptual to semantic ones. In interactions with young children, adults focused on those jobs that mainly entailed perceptual demands. The use of where questions in the first half of the second year made it easier for the dyad to achieve joint reference. Because the referent was already given with the adult's where question, the jobs of locating the target and construing the referent merged together and could both be achieved by pointing. Around 17 months, where questions were replaced consistently by what questions. This shifted the focus to the semantic task of construing and labeling the referent (Excerpts 19 and 20; see Miller and Weissenborn, 1979, for a similar finding). This also made it possible to differentiate familiar referents from unfamiliar ones (see Bruner, 1976) and thus to direct the child's attention to "new objects."

### Changes on the Level of Devices

On the level of devices, it could be observed that sequential implications were first made understandable by sensorily perceptible means (see Zukow-Goldring, 1996) and increasingly by symbolic (linguistic) means. This could be seen in the design of the what questions. At 9–12 months, mothers made their own excitement perceptible by prefacing what questions with a sharp intake of breath or an interjection (Excerpt 3). At 20 months, these prefaces were usually omitted (Excerpt 20). Likewise, the design of the where questions changed over time. At 9 months, the mothers conveyed the expectation of searching as such by stressing and lengthening the interrogative and additionally lifting the book (Excerpt 14). Eight months later, the expectation became more specific when the target of the search was emphasized (Excerpt 16). It could be observed that the shift from perceptual to semantic tasks went hand in hand with the expectation that the child should increasingly use verbal resources (see Ninio and Bruner, 1978; Bruner, 1983). Whereas conventional non-verbal means such as pointing or gestural depiction continued to be important resources for establishing reference, the adult also asked the children to use verbal means.

In sum, the longitudinal analysis reveals that the availability of devices on the side of the children and their growing shares in the jobs correspond to changes in how adults maintain the sequential order of establishing reference by making use of the supportive practices described above. So far, we have shown that these practices ensured the accomplishment of reference between unequally competent partners in the here and now of each particular episode, and we have shown how this was done. In the next section, we ask what interactive mechanisms these practices are based on and how they drive the acquisition of reference.

# DISCUSSION: WHAT ARE THE DRIVING FORCES IN THE ACQUISITION OF REFERENCE?

On the basis of video-recorded labeling interactions of shared book reading and free play involving children from the age of 9–24 months and their mothers, we sequentially analyzed how the participants dealt with perceptual and semantic ambiguities and eventually established stable patterns of bodily, perceptual, and interactive coordination. In the subsequent longitudinal analysis, we tracked changes in the children's and adults' behavior and examined how caregivers managed to involve children in establishing reference.

Starting from the assumption that reference is fundamentally an interactive achievement, we proposed a descriptive instrument that rests upon empirically reconstructed jobs: (1) establishing visual perception as a relevant resource, (2) constituting a domain of scrutiny, (3) locating a target, and (4) construing the referent. Differentiating between jobs and devices allowed us to relate differences in the children's participation in establishing reference to the adults' practices of sustaining the sequential organization.

Concerning the devices, our results (summarized in **Tables 1**–**4**) indicate that children adopted means that had been used previously by the adult. Importantly, Vygotsky (1998) points out that children can pick up only those means that are within their zone of proximal development. Our analyses demonstrate how caregivers fine-tuned their communicative expectations by making sequential implications understandable first by sensorily perceptible and only later by symbolic means. This progression was mirrored in the child's behavior proceeding from using somatic and nonverbal to using verbal and symbolic resources. Importantly, however, it was the use of somatic resources that allowed the child to participate actively in establishing reference. These resources continued to facilitate the smooth execution of the jobs.

With regard to the level of jobs, our analyses extend previous findings in which only two tasks (i.e., getting and maintaining attention) were assumed to be involved in establishing reference (Estigarribia and Clark, 2007). Our sequential analyses of dyadic book reading and free play showed that, in fact, establishing reference involves four tasks. Analyses of misunderstandings further demonstrated that the jobs "locating a target" and "construing a referent" are indeed two different jobs that entail perceptual demands for the former and semantic demands for the latter. Furthermore, we showed that each of the four constitutive jobs of establishing reference is organized as an adjacency pair. Thus, each job requires contributions from both participants, with one participant setting up a conditional relevance and the other partner producing the expected second pair part. Joint reference is established successfully when each of the four consecutive relevancies is fulfilled. The four jobs constitute the pragmatic frame (Rohlfing et al., 2016) of establishing reference in which the sequential order of actions and the devices for realizing them become accessible in their pragmatic functions.

It could be observed that the adults employed supportive practices such as setting up, reestablishing, and explicating conditional relevancies; assisting the child; or taking over the child's task in order to maintain the sequential order. In the remainder of this article, we shall argue that these practices work as a driving force in the acquisition of reference, because they make use of basic features of interaction: conditional relevancies, recipient design, and observability. Our analyses show that these features are specifically contextualized in interactions between unequally competent partners (Wootton, 1997; Hausendorf and Quasthoff, 2005).

From an acquisitional perspective, the conditional relevancies (Schegloff and Sacks, 1973) that initiate each of the four constitutive jobs can be understood as interactive demands (Hausendorf and Quasthoff, 2005, p. 270). In constraining the child's actions, the adult's interactive demands serve as a scaffold (Bruner, 1978, p. 19) or yardstick for the child to act in expected and coordinated ways. The more competent partner supports this process by differentiating between acceptable and unacceptable responses (Bruner, 1983; Mehus, 2011). In this way, the child increasingly comes to draw on conventionalized resources (Lock, 1980).

Our longitudinal comparison revealed that the adults' interactive demands change considerably over time. They adapt or "design" their actions specifically "for" their recipients who display different degrees of competence. From a conversation analytic perspective, recipient design represents a constituent feature of interaction in general (Sacks, 1995). From a developmental perspective, fine-tuning (Bruner, 1983; Snow, 1995) can be understood as a form of recipient design. Changes in question designs provide ample evidence for the adult's fine-tuning (Bruner, 1983; Snow, 1995) to the child's developing competence. Likewise, the shift from where questions to what questions exemplifies how adults first reduce and then raise interactive demands. Our findings thus lend further support to the acquisitional effectiveness of the caregiver's dynamic adaptation to the child's abilities (Marcos, 1991; Snow, 1995; Zukow-Goldring, 1996; Wootton, 1997; Vygotsky, 1998; Hausendorf and Quasthoff, 2005; Forrester, 2013; Trueswell et al., 2016). In line with Vygotsky's zone of proximal development (Vygotsky, 1998), our findings suggest that the adults' support in fact enables children to come to grips with the sequential organization of establishing reference and to eventually initiate jobs by themselves.

Adults also make particular use of the observability of communicative actions (Goffman, 1967; Sacks, 1980). With this term we refer to the "systematic ways in which objects and people come to be available to others for inspection via their public character" (Kidwell and Zimmerman, 2007, p. 593; see also Kidwell and Zimmerman, 2006). Our analysis of interactions with not yet fully competent participants demonstrates that observability is enhanced with respect to three domains: the sequential structure, interactive expectations, and devices. First, adults increase the observability of their own devices by embodying their excitement or performing their location of a target both visibly and audibly. Our finding that those devices that were made particularly salient were later used by the child supports the claim that the enhanced observability of devices facilitates their acquisition by the child. Second, in their reactions to the child's responses, adults display whether and to what extent that response meets or fails to meet certain expectations (either confirming it, other-initiating corrections, or reformulating it). This observability of expectations helps the child to meet sequential demands and to gradually employ conventional resources. Finally, the observability of the sequential organization is increased through the book-reading routine itself: Its repetitive structure with several cycles of establishing reference helps the child to recognize the overall sequential scheme (Ninio and Bruner, 1978; Snow and Goldfield, 1983; Rohlfing et al., 2015) or "action arc" (Rossmanith et al., 2014, p. 8) of book reading in which each turning of the page marks the beginning of a new referential cycle.

Enhancing the observability of devices, expectations, and the sequential order can be conceived as a way of increasing the perception of the task structure—an idea that is also reflected in research on "referential transparency" (Zukow-Goldring, 1996; Rader and Zukow-Goldring, 2010; Trueswell et al., 2016; Yu and Smith, 2016). This research has mainly stressed the role of transparency for identifying the referent. Widening the lens on the whole process of establishing reference, our analyses reveal that the importance of transparency or observability also extends to devices for establishing reference and to the sequential organization as a whole.

In sum, we characterize the process of establishing reference as a sequential order that is sustained by supportive adults. We conclude that the adults' supportive practices exploit basic features of interaction (conditional relevancies, recipient design, observability) that are specifically contextualized in interactions with less competent partners. Social interaction itself thus proves to be an important source of the child's communicative and cognitive development (Vygotsky, 1998; Hausendorf and Quasthoff, 2005). Further research should examine whether these supportive practices are realized intuitively by all caregivers. To fully answer this question, we need to investigate cases in which caregivers and children display difficulties in establishing joint reference. If caregivers barely establish and maintain the sequential organization described above, it could well be that the children in their care show delays in the acquisition of reference.

# ETHICS STATEMENT

This study was carried out in accordance with the recommendations of Ethik-Kommission der Ärztekammer Westfalen-Lippe and the Medizinische Fakultät der Westfälischen Wilhelms-Universität Münster with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by Ethik-Kommission der Ärztekammer Westfalen-Lippe and the Medizinische Fakultät der Westfälischen Wilhelms-Universität Münster.

# AUTHOR CONTRIBUTIONS

VH developed the descriptive instrument; KR collected the data; VH and KR analyzed the data and wrote the paper.

# FUNDING

This work was funded by the Deutsche Forschungsgemeinschaft as part of the CRC 673 "Alignment in Communication" at the Cluster of Excellence Cognitive Interaction Technology "CITEC" (EXC277).

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2017.00139/full#supplementary-material

# REFERENCES


Fillmore, C. J. (1997). Lectures on Deixis. Stanford, CA: CSLI Publications.

Forrester, M. A. (2013). Mutual adaptation in parent-child interaction: learning how to produce questions and answers. Interact. Stud. 14, 190–211. doi: 10.1075/is.14.2.03for


Garfinkel, H. (1967). Studies in Ethnomethodology. Cambridge: Polity Press.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Heller and Rohlfing. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Effects of Agent-Environment Symmetry on the Coordination Dynamics of Triadic Jumping

Akifumi Kijima<sup>1</sup>\*, Hiroyuki Shima<sup>2</sup>, Motoki Okumura<sup>3</sup>, Yuji Yamamoto<sup>4</sup> and Michael J. Richardson<sup>5</sup>

*<sup>1</sup> Department of Education, University of Yamanashi, Kofu, Japan, <sup>2</sup> Department of Environmental Sciences, University of Yamanashi, Kofu, Japan, <sup>3</sup> Department of Art and Sports Educational Science, Tokyo Gakugei University, Koganei, Japan, <sup>4</sup> Research Center of Health, Physical Fitness and Sports, Nagoya University, Nagoya, Japan, <sup>5</sup> Department of Psychology, Center for Cognition, Action and Perception, University of Cincinnati, Cincinnati, OH, USA*

We investigated whether the patterns of coordination that emerged during a three-participant (triadic) jumping task were defined by the symmetries of the (multi) agent-environment task space. Triads were instructed to jump around different geometrical arrangements of hoops. The symmetry of the hoop geometry was manipulated to create two symmetrical and two asymmetrical participant-hoop configurations. Video and motion tracking recordings were employed to determine the frequency of coordination misses (collisions or failed jumps) and, during 20 successful jump sequences, the jump direction chosen (clockwise vs. counterclockwise) and the patterning of between-participant temporal movement lags within and across jump events. The results revealed that the (a)symmetry of the joint action workspace significantly influenced the (a)symmetry of the jump direction dynamics and, more importantly, the (a)symmetry of the between-participant coordination lags. The symmetrical participant-hoop configurations resulted in smaller overall movement lags and a more spontaneous, interchangeable leader/follower relationship between participants, whereas the asymmetrical participant-hoop configurations resulted in slightly larger overall movement lags and a more explicit, persistent asymmetry in the leader/follower relationship of participants. The degree to which the patterns of behavioral coordination that emerged were consistent with the theory of symmetry groups and spontaneous and explicit symmetry-breaking is discussed.

Keywords: joint action, symmetry, symmetry-breaking, leader and follower roles, social motor coordination

#### Edited by:

*Sebastian Loth, Bielefeld University, Germany*

#### Reviewed by:

*Thomas Stoffregen, University of Minnesota, USA Merryn Dale Constable, University of Toronto, Canada*

> \*Correspondence: *Akifumi Kijima akijima@yamanashi.ac.jp*

#### Specialty section:

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology*

Received: *29 September 2016* Accepted: *03 January 2017* Published: *02 February 2017*

#### Citation:

*Kijima A, Shima H, Okumura M, Yamamoto Y and Richardson MJ (2017) Effects of Agent-Environment Symmetry on the Coordination Dynamics of Triadic Jumping. Front. Psychol. 8:3. doi: 10.3389/fpsyg.2017.00003*

# 1. INTRODUCTION

Suppose you oscillate the index finger of each hand back and forth at the same time. The abductors and adductors of the two fingers contract simultaneously. This pattern of synchronous coordination is commonly termed in-phase coordination and reflects a symmetric pattern of behavioral action, in that the phase or spatiotemporal position of the two movements is exactly (or nearly exactly) the same over time (they are 0° out of phase). In contrast, if you oscillate one index finger leftward and the other index finger rightward at the same time, adduction and abduction occur in an asymmetric manner. This latter pattern of behavioral synchrony is commonly termed anti-phase coordination, because the phases of the finger movements are exactly (or nearly exactly) opposite over time (they are 180° out of phase). Now, suppose you start oscillating your two index fingers in an anti-phase or asymmetric manner at a relatively slow movement frequency (say one oscillation a second, or 1 Hz) and then gradually increase your movement frequency over time so that your fingers move faster and faster. What you find is that at very fast frequencies of movement your fingers will spontaneously transition from anti-phase coordination to the symmetric, in-phase pattern of coordination. In fact, this transition will likely occur no matter how hard you try to maintain an asymmetric or anti-phase pattern of coordination; the transition is indifferent to your will. Finally, try to produce a pattern of coordination between your two index fingers that is neither in-phase nor anti-phase. Like trying to maintain anti-phase coordination at fast movement frequencies, you will find that this is also nearly impossible to do, with your fingers being spontaneously pulled back into an in-phase or anti-phase pattern of movement (and more often in-phase than anti-phase). Interestingly, if you try the same experiment with your hands, your arms, your legs, or any two limbs for that matter, you will find the same result; namely, that (two-limb) rhythmic inter-limb coordination is constrained (without practice) to in-phase and anti-phase patterns of coordination, with in-phase coordination more stable than anti-phase coordination.

This highly robust rhythmic coordination phenomenon was empirically demonstrated by Scott Kelso in the mid 1980s (Kelso, 1984, 1995) and has been effectively modeled (Haken et al., 1985) in a manner consistent with the dynamics of coupled oscillators. Of more relevance here is that the symmetry of these two coordination patterns is defined by the symmetry of the underlying dynamics of the component limbs (oscillators) and the inter-limb coupling (Golubitsky et al., 1998, 1999). In more formal terms, in-phase coordination reflects a symmetric mode of coordination because it preserves the symmetry of the system. That is, the observed pattern of coordination is invariant to the spatial permutation or interchange of oscillator (movement) 1 and oscillator (movement) 2. In contrast, the anti-phase mode of coordination reflects a state of less or broken symmetry, in that the pattern of coordination is no longer invariant to a purely spatial permutation or interchange of the two oscillators (movements). It is important to appreciate, however, that anti-phase coordination is still very much entailed by the symmetry of the system of two identical (or near identical) oscillatory movements and does not correspond to a state of no symmetry. Rather, anti-phase coordination is symmetric with respect to the spatiotemporal transformation that permutes the two oscillators/movements and shifts the phase by half a period (see e.g., Collins and Stewart, 1994; Kelso, 1995; Richardson et al., 2015 for more details about the spatial and temporal symmetries of coupled oscillators).

The importance of understanding rhythmic coordination in terms of symmetry is that the theoretical principles of symmetry and symmetry-breaking provide a lawful, yet highly generalizable understanding of behavioral coordination that is indifferent to the particulars of the system, movement, or coordination task being considered. For instance, Golubitsky and Stewart (2003) have demonstrated how the different rhythmic gait patterns observed in human, animal, and insect locomotion are a lawful consequence of the finite set or group of symmetries that define the couplings between the cells of the central pattern generators assumed to underlie gait control.

For example, the gait patterns of quadrupeds are defined by the symmetry group that includes invariance in the permutation between two contralateral cells and also includes invariance in the permutation of four ipsilateral cells (see Golubitsky et al., 1998, 1999; Buono and Golubitsky, 2001). Such symmetry predictions also provide a generalized understanding of human arm-leg (4-limb) coordination (Jeka et al., 1993). Harrison and Richardson (2009) have even demonstrated how the gait patterns of two individuals walking one behind the other are spontaneously confined to patterns predicted by the symmetry group approach of Golubitsky and colleagues (see Richardson et al., 2016 for more details).

The significance of the latter interpersonal example is that it demonstrates how symmetry principles not only define intrapersonal and biomechanically coupled patterns of movement coordination, but also appear to underlie socially or informationally (visually, auditorily) coupled patterns of movement coordination. Perhaps the most famous example of this with regard to rhythmic coordination stems from the work of R.C. Schmidt and colleagues, which has demonstrated how the rhythmic limb or body movements of visually coupled participants are constrained to the exact same in-phase and anti-phase patterns of coordination defined above (e.g., Schmidt et al., 1990; Amazeen et al., 1995; Schmidt and O'Brien, 1997; Richardson et al., 2005, 2007b). As with intrapersonal rhythmic coordination (Kelso, 1984, 1995), the stability of in-phase (symmetric) coordination during interpersonal or visually mediated interaction is greater than that observed for anti-phase (asymmetric) coordination, as evidenced by the greater variability of anti-phase coordination compared to in-phase coordination and by the fact that visually coupled individuals spontaneously transition from anti-phase to in-phase coordination at faster movement frequencies (Schmidt et al., 1990; Schmidt and Turvey, 1994).

Note the relationship between the order of the symmetry that defines the coordination pattern and the stability of that coordination pattern, not to mention how symmetric systems will tend to exhibit more symmetric patterns of behavior if possible (Kugler and Shaw, 1990; Kelso, 1995; Turvey, 2007). The transition from more to less symmetric states is also possible, however, if a more symmetric state of behavior becomes unstable beyond some critical control parameter value (i.e., a spontaneous symmetry break occurs) or if some form of asymmetry is introduced into the system (i.e., explicit symmetry breaking occurs). Richardson et al. (2015) have recently argued that the principle of symmetry and the theory of spontaneous and explicit symmetry-breaking provide a highly generalizable way of understanding and predicting the organization and stable patterns of human and social behavior. Motivated by Curie's principle ("the symmetry of the effects are written in the symmetry of the causes" Curie, 1894) and the theory that symmetry breaks operate to create higher order structures of behavioral organization, they argue that the modes or patterns of behavior exhibited by individuals during joint- or social-activity are often a result of spontaneous or explicit symmetry breaking events or task properties (also see Lagarde, 2013; Richardson et al., 2016). As evidence of this, they highlight recent research demonstrating how experimentally assigned leader/follower roles naturally induce compensatory behavioral action on the part of the leader in order to help stabilize a follower's action (Vesper and Richardson, 2014). Conversely, individuals often spontaneously induce such symmetry breaks during on-going joint-action in order to establish more stable patterns of behavior. For instance, Richardson et al. (2015) observed that pairs of individuals instructed to rhythmically move back and forth between orthogonally opposed targets unexpectedly adopted an asymmetric pattern of elliptical movement in order to minimize the chance of a collision and at the same time maximize coordination stability. Moreover, the spontaneous appearance of the asymmetric movement pattern established a complementary leader/follower relationship that persisted for the remainder of the experimental task.

Recent research examining the stable patterns of real-world multi-agent behavior has revealed findings compatible with the symmetry approach. For instance, during many two-person sports tasks, inter-player coordination patterns intermittently transition between two (Kijima et al., 2012; Okumura et al., 2012) or more stable states or modes of behavior (Yamamoto et al., 2013), with most of these modes reflecting an asymmetrical pattern of behavioral order (e.g., anti-phase) that is dependent on environmental or task constraints (e.g., interpersonal distance). The role of a player (e.g., step forward or away, offense or defense) alters accordingly (Kijima et al., 2012). Such role asymmetries are characteristic of sports like soccer (Yamamoto and Yokoyama, 2011) and basketball (Fujii et al., 2016) and can depend on the skill of the players. For example, Yokoyama and Yamamoto (2011) asked four participants, including collegiate soccer players, to engage in a simplified three-on-one soccer game (monkey in the middle game) and found that the symmetry of the coordination patterns adopted by players was a skill-dependent realization of the behavioral coordination modes predicted by symmetric Hopf bifurcation theory (Golubitsky and Stewart, 1985). In simple terms, the coordination patterns of triads with a higher skill level had higher order spatiotemporal symmetry.

# 2. EXPERIMENT

The aim of the current study was to further explore the degree to which the behavioral organization or patterning of social movement coordination is a consequence of the symmetry (or asymmetry) of the physical and informational constraints that define a given agent-environment task context. In other words, the current study was aimed at testing whether the (a)symmetry of a task's action space defines what (possible) patterns of social movement coordination should be observed. To achieve this aim, a three-person (triad) coordinated jumping task was developed, in which participant triads were required to jump around different geometrical arrangements of hoops without colliding or bumping into each other. Four different geometric hoop arrangements were employed, 3-, 4-, 5-, and 6-hoop arrangements (see **Figure 1**), such that the symmetry of the participant-hoop configuration was greater in the 3- and 6-hoop conditions (referred to as the symmetric conditions) compared to the 4- and 5-hoop conditions (referred to as the asymmetric conditions).

The study centered on two related predictions. The first concerned jumping direction. Essentially, on each jump event the participants in a triad all needed to jump in the same clockwise or counterclockwise direction in order to avoid colliding. Note that clockwise and counterclockwise jumping were equally afforded in all hoop conditions. In other words, clockwise and counterclockwise jumping were symmetrically stable. Therefore, it was necessary for the participants to collectively break this symmetry on a given jump such that everyone jumped in the same direction. Given that participants were instructed not to talk or non-verbally indicate their intended jumping direction, it was expected that successful jumping sequences would result when this symmetry was spontaneously broken on the first trial and then explicitly broken (induced) on subsequent jumping trials. That is, participants were expected to explicitly break the symmetry of jump direction by jumping in the same direction over the course of repeated jumping events. However, given the symmetric possibility of clockwise or counterclockwise jumping, both direction preferences were expected to be observed across triads (i.e., the global symmetry of clockwise or counterclockwise jumping was expected to be preserved across triads).

The second prediction concerned the temporal patterning of the participants' jumps, with different patterns expected for the different hoop conditions. In simple terms, the number of open hoops was the same (symmetric) for each participant in the 3-hoop (triangle) and 6-hoop (hexagon) conditions, but different (asymmetric) in the 4-hoop (square) and 5-hoop (pentagon) conditions. More formally, the symmetry of the different participant-hoop configurations can be defined by the group (set) of symmetry transformations (rotations and reflections) that leave the geometry of the participant-hoop arrangement invariant (i.e., equivalent or unchanged). Of particular importance was that the corresponding symmetry group for each participant-hoop configuration was equal to the highest order common factor (the highest order isotropy subgroup) of the symmetry groups that define the hoop and triad arrangements independently. With regard to the symmetry of the hoop arrangement, the triangular 3-hoop arrangement, for instance, was invariant to rotations of 120°, 240°, and 360° and to reflections about the three mid-point axes that bisected each hoop. These symmetry transformations correspond to the rotational symmetry group Z<sub>3</sub> [Z(0, 360), Z(120), Z(240)] and the reflection symmetry group R<sub>3</sub>, respectively, and when taken together, reflect how the symmetry of the 3-hoop triangle is defined by the dihedral group D<sub>3</sub>. The symmetry groups of the other geometric hoop layouts can be similarly defined, such that the 4-hoop square condition had D<sub>4</sub> symmetry, the 5-hoop pentagon condition had D<sub>5</sub> symmetry, and the 6-hoop hexagon condition had D<sub>6</sub> symmetry, as represented in **Table 1**.
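As an illustration of the dihedral symmetry just described, the following minimal sketch places n hoop centers on a regular polygon and checks which rotations and reflections map the layout onto itself. The unit radius, the orientation of the polygon, and all function names are assumptions made for this example only; they are not taken from the study.

```python
import math

def hoop_centers(n, radius=1.0):
    """Centers of n hoops on a regular n-gon (radius is an arbitrary assumption)."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

def rotate(p, angle):
    """Rotate point p about the origin by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def reflect(p, axis_angle):
    """Reflect point p about the line through the origin at axis_angle (radians)."""
    c, s = math.cos(2 * axis_angle), math.sin(2 * axis_angle)
    return (c * p[0] + s * p[1], s * p[0] - c * p[1])

def same_layout(a, b, tol=1e-9):
    """True if every point of a coincides with some point of b (and sizes match)."""
    return len(a) == len(b) and all(any(math.dist(p, q) < tol for q in b) for p in a)

def hoop_symmetries(n):
    """Rotation angles and reflection-axis angles (degrees) that leave the layout unchanged."""
    pts = hoop_centers(n)
    rotations = [360.0 * k / n for k in range(n)
                 if same_layout([rotate(p, 2 * math.pi * k / n) for p in pts], pts)]
    reflections = [180.0 * k / n for k in range(n)
                   if same_layout([reflect(p, math.pi * k / n) for p in pts], pts)]
    return rotations, reflections

for n in (3, 4, 5, 6):
    rot, ref = hoop_symmetries(n)
    print(f"{n} hoops: {len(rot)} rotations + {len(ref)} reflections")
```

Each layout should pass n rotations and n reflections, which is the order-2n structure of the dihedral group D<sub>n</sub>; the rotation by 0° plays the role of the 360° rotation mentioned in the text.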

TABLE 1 | Isotropy subgroups of the actor-hoop configurations.

With regard to the geometric symmetry of the triad, a specific set of starting hoop locations was employed (see **Figure 1**) such that, assuming that each participant was more or less equivalent in action (jumping) capability, task understanding, motivation, etc., the three participants in each triad could be assigned (interchanged) to any of the defined starting hoop locations. Hence the symmetry of the participants (actors) with regard to assigned hoop location corresponded to the symmetry group S<sub>3</sub>, meaning that there are 3! (= 6) equivalent ways the actors could be permuted with regard to assigned hoop location (i.e., [1-2-3], [1-3-2], [2-1-3], [2-3-1], [3-1-2], and [3-2-1]). Accordingly, the symmetry of the relational configuration of a triad with regard to hoop alignment corresponded to the highest order isotropy subgroup of a hoop condition's D<sub>n</sub> symmetry group and the permutation group S<sub>n</sub>. As detailed in **Table 1**, this corresponds to D<sub>3</sub> for the 3-hoop (triangle) and 6-hoop (hexagon) conditions and D<sub>1</sub> (or Z<sub>2</sub>) for the 4-hoop (square) and 5-hoop (pentagon) conditions. Note that D<sub>1</sub> has only one rotational symmetry Z(0, 360) and one reflection symmetry R<sub>1</sub>, due to the asymmetrical, or non-integer, factorization of the corresponding D<sub>n</sub> to S<sub>3</sub> symmetry. I (the identity), an isotropy subgroup of all hoop conditions, denotes the transformation that has only one rotational symmetry Z(0, 360) and does not allow any permutation. (For a relevant introductory overview of Group Theory and a detailed explanation of the nature of the dihedral group D<sub>n</sub> and its relation to S<sub>n</sub> and Z<sub>n</sub>, see Richardson et al., 2015, p. 238.)
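One way to make the isotropy-subgroup idea concrete is to treat each symmetry of the hoop ring as a permutation of hoop indices and keep only those that map the set of occupied hoops onto itself (actors being treated as interchangeable). The sketch below does this under an assumed set of starting hoops, since **Figure 1** is not reproduced here; the `START` positions and the function names are hypothetical.

```python
def dihedral_permutations(n):
    """The 2n symmetries of a ring of n hoops, as permutations of hoop indices 0..n-1."""
    rotations = [tuple((i + k) % n for i in range(n)) for k in range(n)]
    reflections = [tuple((k - i) % n for i in range(n)) for k in range(n)]
    return rotations, reflections

def isotropy_subgroup(n, occupied):
    """Symmetries of the n-hoop ring that map the occupied hoops onto themselves."""
    occupied = frozenset(occupied)
    rotations, reflections = dihedral_permutations(n)

    def keeps(perm):
        return frozenset(perm[i] for i in occupied) == occupied

    return [p for p in rotations if keeps(p)], [p for p in reflections if keeps(p)]

# ASSUMED starting hoops (indices around the ring), chosen for illustration only.
START = {3: (0, 1, 2), 4: (0, 1, 2), 5: (0, 2, 3), 6: (0, 2, 4)}

def subgroup_orders(n):
    """Number of surviving rotations and reflections for the assumed layout."""
    rot, ref = isotropy_subgroup(n, START[n])
    return len(rot), len(ref)

for n in sorted(START):
    n_rot, n_ref = subgroup_orders(n)
    print(f"{n} hoops: {n_rot} rotation(s), {n_ref} reflection(s)")
```

With these assumed positions, the triangle and hexagon layouts retain three rotations and three reflections (the structure of D<sub>3</sub>), while the square and pentagon layouts retain only the identity and a single reflection (the structure of D<sub>1</sub>), in line with the subgroups named in the text.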

It is important to appreciate the novel hypothesis being tested here; namely, that the symmetry of the temporal coordination observed between triads would be consistent with the symmetry of the isotropy subgroup that defined the participant-hoop configurations. The general prediction was that participant-hoop configurations defined by higher order isotropy subgroups (i.e., 3- and 6-hoop conditions) would result in more symmetric patterns of temporal coordination compared to the participant-hoop configurations defined by lower order isotropy subgroups (i.e., 4- and 5-hoop conditions). Accordingly, we expected that the symmetry of the temporal lead/lag relationship (i.e., leader/follower role) between actors would be a functional reflection of the isotropy subgroup that defined the participant-hoop configuration. More specifically, we expected that the D<sub>3</sub> hoop-triad symmetry of the 3- and 6-hoop conditions would result in a symmetric interchange of participants with regard to who led and followed (lagged) over the course of jumping trials and sequences. This is because the higher order D<sub>3</sub> isotropy subgroup of the participant-hoop configurations for the 3- and 6-hoop conditions corresponded to a more symmetric action space for participants in these conditions. That is, each participant's spatial jumping degrees of freedom (DoF) were equivalent (symmetric) in the 3- and 6-hoop conditions. This action space symmetry was broken, however, in the square and pentagon conditions, and is formally realized by the lower order D<sub>1</sub> isotropy subgroup of the corresponding participant-hoop configuration. Indeed, for the square and pentagon conditions the spatial jumping DoF of participants are asymmetric (see **Figure 1A**). For the square condition two participants have one (common) open space adjacent to them, whereas the third participant does not. For the pentagon condition, one participant has two adjacent open spaces, whereas the other two participants only have one. Accordingly, we expected a corresponding asymmetry in the role of participants with regard to who led and followed (lagged) over the course of jumping trials and sequences, with one actor tending to consistently lead and/or lag behind the other two (i.e., consistent with a D<sub>1</sub> or Z<sub>2</sub> pattern).
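The asymmetry in spatial jumping DoF can be illustrated by counting, for each participant, how many adjacent hoops are unoccupied. The sketch below is only an illustration: the starting hoop indices are assumptions chosen to be consistent with the verbal description above, and the names are hypothetical.

```python
# ASSUMED starting hoops (indices around the ring); chosen to match the verbal description.
START = {3: (0, 1, 2), 4: (0, 1, 2), 5: (0, 2, 3), 6: (0, 2, 4)}

def adjacent_open_hoops(n, occupied):
    """For each occupied hoop, count the neighboring hoops that are unoccupied."""
    occ = set(occupied)
    counts = {}
    for h in occupied:
        neighbors = {(h - 1) % n, (h + 1) % n}
        counts[h] = sum(1 for nb in neighbors if nb not in occ)
    return counts

for n in sorted(START):
    counts = adjacent_open_hoops(n, START[n])
    print(f"{n} hoops: open neighbors per participant = {sorted(counts.values())}")
```

Under these assumptions the counts are equal across participants in the triangle and hexagon layouts and unequal in the square and pentagon layouts, which is the asymmetry the prediction relies on.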

# 2.1. Materials and Methods

### 2.1.1. Participants

Twenty-seven undergraduate students from Tokyo Gakugei University and the University of Yamanashi were recruited as participants in the study. Fifteen participants were male and 12 were female, with a mean (SD) age of 20.00 (±0.961) years. Participants were randomly assigned to one of nine triads. Handedness, expressed as a laterality quotient (H), was determined for each participant using the 10-item Edinburgh inventory of handedness (Oldfield, 1971). H values range from −100, which corresponds to extreme left-handedness, to +100, which corresponds to extreme right-handedness. H for one female member was −21.739, indicating weak left-handedness (≤ 1 in decile score). The mean (SD) H score for the remaining 26 participants was 60.773 (±21.510), with a range of 8.33 (very weak: 1 in decile) to 100.00 (completely right-handed: 10 in decile).
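For readers unfamiliar with the laterality quotient, the Edinburgh inventory scores it as H = 100 × (R − L) / (R + L), where R and L are the numbers of right-hand and left-hand preference marks. The sketch below applies this formula; the tallies in the example are hypothetical values chosen only because they reproduce a quotient of the reported magnitude, not data from the study.

```python
def laterality_quotient(right_marks, left_marks):
    """Edinburgh laterality quotient: 100 * (R - L) / (R + L)."""
    total = right_marks + left_marks
    if total == 0:
        raise ValueError("at least one preference mark is required")
    return 100.0 * (right_marks - left_marks) / total

# Hypothetical tallies (not reported in the study) used purely as a worked example.
print(round(laterality_quotient(9, 14), 3))   # weakly left-handed
print(round(laterality_quotient(10, 0), 3))   # completely right-handed
```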

### 2.1.2. Jumping Task & Task Space Geometry

Jumping task: Participant triads were instructed to jump in a clockwise or counterclockwise direction around geometric arrangements of three, four, five or six 0.6 m diameter rubber hoops placed on the center of a 2.28 × 2.28 m polyurethane mat (see **Figure 1B**). The hoops were aligned such that both sides of a hoop touched adjacent hoops and the distance between hoops was equal, resulting in the four geometric hoop arrangements: a 3-hoop triangle, a 4-hoop square, a 5-hoop pentagon, and a 6-hoop hexagon (see **Figure 1A**). Each member of the triad was assigned to one of the three colored hoops (i.e., yellow, blue, or red in **Figure 1**). This hoop corresponded to a participant's starting hoop location.

Each member of a triad was instructed to jump with both legs into an adjacent hoop (either to the left or right) at the sound of a specific metronome cue. Participants were instructed to jump together as a group and to avoid colliding into each other. The metronome tone was presented at 1.0 s intervals, with every third metronome beat presented at a higher tone to indicate the time to jump (i.e., the jumping movement cycle ≈ 3.0 s). Participants were informed that they should continue to jump every 3 s (i.e., every higher metronome tone) until they succeeded in performing a sequence of 20 successfully coordinated jumps. If any participant in a triad collided with another participant (i.e., performed an unsuccessfully coordinated jump), the participants were instructed to stop and move back to their assigned starting hoop location and begin the sequence again.

Four triads began with the triangle condition and the number of hoops was increased one by one when the triads completed a sequence of 20 successful trials in the given geometric condition. The remaining five triads began with the hexagon condition and the number of hoops was decreased. No instructions were provided as to which direction participants should jump. Rather, participants were informed that jumping direction could be freely selected at the time of each jump, with the understanding that each member of a triad had to jump in the same direction in order to avoid collision. Participants were not informed about what lead/lag relationships should or could be employed, nor were participants designated a priori as leader/follower (absolutely no information about possible leader/follower roles was provided to participants). Participants were also given explicit instructions not to verbally or non-verbally communicate with each other during the experiment. Accordingly, each member of a triad had to predict the other two members' jumping direction while preparing to execute their own jumping movement. In this preparation phase, downward movement of the center of mass and forward/upward arm swing would be required to recoil enough to jump the distance between the hoops (max inter-hoop jumping distance was 0.6 m).

As detailed above, the participant-hoop configurations employed in the current study were defined by the isotropy subgroups listed in **Table 1**. As further clarification, note that the participant (actor) symmetry, S<sub>3</sub>, was isomorphic with the symmetry group D<sub>3</sub>, which can be seen by the fact that the three actors always form a triangle within the task space (S<sub>3</sub> and D<sub>3</sub> are equivalent symmetry groups). With regard to hoop alignment for the two symmetrical conditions, the symmetries of the triangle and hexagon conditions are captured by the dihedral groups D<sub>3</sub> and D<sub>6</sub>, respectively. Accordingly, for both the triangle and hexagon conditions the isotropy subgroups of the hoop-participant configuration are D<sub>3</sub>, Z<sub>3</sub>, D<sub>1</sub> (or Z<sub>2</sub>), and I (identity), with the highest order subgroup being D<sub>3</sub>. For the asymmetric conditions, the symmetries of the square and pentagon conditions are captured by the dihedral groups D<sub>4</sub> and D<sub>5</sub>, respectively. Thus, for both of these conditions the isotropy subgroups of the hoop-participant configuration are only D<sub>1</sub> (or Z<sub>2</sub>) and I (identity), with the highest order subgroup being D<sub>1</sub> (D<sub>1</sub> and Z<sub>2</sub> are isomorphic groups and reflect the fact that the system is invariant to only two transformations: no change and the reflection/permutation of only two of three elements).

### 2.1.3. Procedure

After arriving at the testing location, participants gave written informed consent in accordance with the Declaration of Helsinki. Following informed consent, each participant completed the 10-item Edinburgh inventory of handedness (Oldfield, 1971). Participants then received instructions about the jumping task and how the jumping task should be performed (again, note that no information about how the task should be completed successfully was provided, either in terms of direction, lead/lag relationships, or leader/follower roles). Following these instructions, a cap with 4 motion-tracking markers attached to it was placed on each participant's head to record the jumping movements (see below for more details on the motion tracking systems and markers employed). After the marker cap was secured to each participant's head and participants indicated that they understood the task instructions, participants were then randomly assigned to one of the different colored hoops and were informed that this colored hoop would always be their starting hoop (location) during the experiment. Triads then practiced the task several times. At the beginning of each jumping sequence, each member of a triad was instructed to stand in their assigned hoop while the metronome tone was presented. Participants started jumping at the time of an experimenter's verbal cue, with the aim of completing 20 successful jumps in a row. As mentioned above, if a collision occurred at any time during a jumping sequence, participants were instructed to return to their respective starting locations and begin the jumping sequence again. The experiment ended when a triad completed 20 successful trials for all four hoop conditions. Triads performed the task alone and did not view other triads performing the task before their own performance. Members of each triad were acquainted with each other, as they were all students from the same course (physical education) at the same university (Tokyo Gakugei University or University of Yamanashi). These procedures adhered to the guidelines of the research ethics committee of the Faculty of Education, University of Yamanashi.

#### 2.1.4. Dependent Measure and Analysis

Six infrared cameras (OQUS300, Qualisys, Sweden) were used to record the three-dimensional position of each participant's head at a sampling frequency of 100 Hz. Each participant wore a 4-marker head cap: three markers arranged in a triangle to detect the participant's head direction and one marker located in the center of the triangle to detect the central position of the head. Prior to analysis, the recorded motion data were filtered using a fourth-order Butterworth filter with a cutoff frequency of 6 Hz. Task performance was also recorded using a digital video camera (Sony DCR650; 60 Hz) to determine (1) the frequency of unsuccessful jumps (coordination collisions or misses) prior to a triad completing a sequence of 20 successful jumps for a given condition and (2) the direction of jump rotation, measured in terms of the frequency of counterclockwise rotation.
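The filtering step can be sketched as follows. This is a minimal pure-Python low-pass Butterworth filter built from two cascaded second-order sections, using the stated 6 Hz cutoff and 100 Hz sampling rate. It is a single forward (causal) pass; the study does not state whether a zero-phase (forward-backward) filter was used, nor which software performed the filtering, so the implementation details and function names here are assumptions.

```python
import math

def biquad_lowpass(fc, fs, q):
    """Coefficients of one second-order low-pass section (bilinear transform, pre-warped)."""
    w0 = 2 * math.pi * fc / fs
    alpha = math.sin(w0) / (2 * q)
    cosw = math.cos(w0)
    a0 = 1 + alpha
    b = [(1 - cosw) / 2 / a0, (1 - cosw) / a0, (1 - cosw) / 2 / a0]
    a = [1.0, -2 * cosw / a0, (1 - alpha) / a0]
    return b, a

def apply_section(x, b, a):
    """Run one second-order section over the samples in x (zero initial state)."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def butterworth4_lowpass(x, fc=6.0, fs=100.0):
    """Fourth-order Butterworth low-pass filter as a cascade of two sections."""
    # Q factors of the two complex pole pairs of a 4th-order Butterworth response.
    qs = [1 / (2 * math.cos(math.pi / 8)), 1 / (2 * math.cos(3 * math.pi / 8))]
    y = list(x)
    for q in qs:
        b, a = biquad_lowpass(fc, fs, q)
        y = apply_section(y, b, a)
    return y

# Synthetic example: a slow 0.33 Hz "jump" rhythm plus 30 Hz measurement noise.
fs = 100.0
t = [i / fs for i in range(600)]
raw = [math.sin(2 * math.pi * 0.33 * ti) + 0.2 * math.sin(2 * math.pi * 30 * ti) for ti in t]
smooth = butterworth4_lowpass(raw, fc=6.0, fs=fs)
```

Because the filter starts from a zero state, the first few output samples contain a start-up transient; in practice one would pad the signal or discard the initial samples.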

Measures of the participants' temporal behavioral patterns, and in particular of their leader/follower status, were most relevant for our research. The measure of leader/follower status was especially important because this status was not assigned a priori in the task but was expected to emerge as a function of the geometrical participant-hoop configuration. Therefore, to determine the temporal coordination of triads, the motion data of each participant's head height (head position on the Z-axis) was first divided into jump cycles by isolating the peaks (top-most head positions) over time. From these participant head-height peaks, the lead/lag jumping time between participants at each jump was calculated with respect to the first (lead) jumper (see **Figure 2**). That is, at every jump event the temporal lag of the two follower jumpers was determined with respect to the lead jumper. These standardized lag times were then averaged to provide an overall estimate of the temporal coordination lag. A coupling pattern index (CI) was also calculated from these lag times and was equal to

$$CI = \frac{L_{12}}{L_{13}}\tag{1}$$

where L<sub>12</sub> denotes the lag between the leader and the second jumper, and L<sub>13</sub> denotes the last (third) jumper's lag with respect to the leader. As can be seen from an inspection of **Figure 2**, a CI ratio equal or close to 0.0 indicates a coupling pattern in which two participants essentially lead one follower, whereas a CI ratio equal or close to 1.0 indicates a coupling pattern in which one participant leads two followers. Each triad's performance and rotation data from the 20 successful jump sequences were averaged separately for each of the four geometrical hoop conditions and were compared using one-way repeated measures ANOVAs. Temporal lag and CI were not averaged over the 20 successful trial sequences and, thus, were analyzed using two-way repeated measures ANOVAs (four geometrical conditions × 20 successful trials). Post-hoc analysis was conducted using the Benjamini-Hochberg procedure to control the false discovery rate.
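The lag and CI computation for a single jump event can be sketched as below. The function names, the use of a single maximum per jump window, and the example peak times are assumptions for illustration; the study's own peak-isolation procedure is not specified in further detail.

```python
def peak_time(height, fs=100.0, t0=0.0):
    """Time (s) of the highest head position within one jump window starting at t0."""
    idx = max(range(len(height)), key=lambda i: height[i])
    return t0 + idx / fs

def lags_and_ci(peak_times):
    """Lags of the second and third jumper relative to the first, and CI = L12 / L13."""
    order = sorted(range(3), key=lambda i: peak_times[i])
    t1, t2, t3 = (peak_times[i] for i in order)
    l12, l13 = t2 - t1, t3 - t1
    # CI is undefined when all three participants peak at the same instant.
    ci = l12 / l13 if l13 > 0 else float("nan")
    return {"leader": order[0], "L12": l12, "L13": l13, "CI": ci}

# Hypothetical peak times (s) for participants 0, 1, 2 in two jump events.
two_lead_one = lags_and_ci([1.00, 1.20, 1.02])   # CI near 0: two lead, one follows
one_leads_two = lags_and_ci([1.18, 1.00, 1.20])  # CI near 1: one leads, two follow
print(two_lead_one, one_leads_two)
```

The returned `leader` index makes it possible to tally how often each participant led across the 20 jumps, which is the kind of leader/follower (a)symmetry the hypotheses concern.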

# 2.2. Results

#### 2.2.1. Jumping Direction

The mean and standard deviation of the frequency of counterclockwise jump rotation is displayed in **Figure 3**, with a one-way repeated measures ANOVA indicating no significant difference between geometrical conditions, F(3, 24) = 0.28, p = 0.84, η<sup>2</sup> = 0.19. Somewhat unexpectedly, during successful 20-jump sequences triads tended to jump in a counterclockwise direction in all conditions: 71.67 (±23.59)% of trials in the triangle condition; 73.89 (±29.02)% of trials in the square condition; 69.44 (±31.86)% of trials in the pentagon condition; and 66.11 (±29.13)% of trials in the hexagon condition. To determine if this overall mean effect was representative of the triads as a whole, a binomial test was employed to test the significance of the counterclockwise rotation frequency for each individual triad. The results indicated that the preference for the counterclockwise direction was significant in 5 triads (p < 0.001), with two triads exhibiting a counterclockwise preference slightly above chance level, and only one triad exhibiting a greater preference for the clockwise jump direction, as shown in **Table 2** (Triad C: 48 clockwise jumps out of 80 successful jumps). In addition, three triads (Triads E, F, and G in **Table 2**) nearly always jumped in the counterclockwise direction in all conditions.
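A per-triad binomial test of this kind can be sketched with the standard library alone. The sketch below is an illustration under stated assumptions, not the procedure used in the study: the function name `binomial_two_sided_p` is hypothetical, the chance level is taken as 0.5, and a two-sided exact test is assumed because the text does not say whether the test was one- or two-sided. The only data used are the Triad C counts quoted above.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k (hypothetical helper).
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


# Triad C: 48 clockwise jumps out of 80 successful jumps.
p_value = binomial_two_sided_p(48, 80)
print(f"Triad C, two-sided p = {p_value:.3f}")
```

With a chance level of 0.5 the distribution is symmetric, so 48 clockwise jumps and 32 counterclockwise jumps out of 80 give the same p-value.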

**Figure 4** displays the mean and standard deviation of the frequency of misses or unsuccessful jumps (participant collisions) that occurred prior to achieving a successful 20-jump sequence. The one-way repeated measures ANOVA revealed no significant difference between the four geometrical hoop conditions, F(3, 24) = 0.52, p = 0.67, η<sup>2</sup> = 0.26.

#### 2.2.2. (A)symmetry in Temporal Coordination

The overall mean coordination lag of the two followers relative to the leader is displayed in **Figure 5A**. The statistical analysis revealed a significant main effect of geometrical hoop condition, F(3, 24) = 4.497, p = 0.012, η<sup>2</sup> = 0.75, with post-hoc analysis indicating that the lag in the square condition was significantly longer than the lag observed for the triangle (p = 0.019) and hexagon (p = 0.019) conditions. This same analysis was also performed after excluding one triad, with the corresponding mean data superimposed in **Figure 5A** using open black circles. This triad (Triad H in **Table 2**) was excluded from this and subsequent analyses because its overall task performance was much poorer than that of the other triads and, moreover, because the patterning of its jumping behavior was qualitatively different from that of the other triads (see below for details on the performance of this triad). The analysis of these data also resulted in a main effect of geometrical hoop condition, F(3, 21) = 6.377, p = 0.003, η<sup>2</sup> = 0.955, but this time with the coordination lag for both the square and pentagon conditions being significantly longer than that observed for the triangle and hexagon conditions (square-triangle: p = 0.024; square-hexagon: p = 0.024; pentagon-triangle: p = 0.052; pentagon-hexagon: p = 0.041).

**Figures 5B,C** show the mean lag of the eight triads retained for analysis (i.e., with Triad H excluded, as for the open-circle means in **Figure 5A**), calculated separately for each of the 20 successful jumps in the four hoop conditions. Consistent with the results for the overall mean lag presented above, a two-way repeated measures ANOVA of geometrical condition (triangle, square, pentagon, and hexagon) by 20 jump events revealed a significant main effect of geometrical hoop condition, F(3, 21) = 6.390, p = 0.003, η<sup>2</sup> = 0.96, with the lags for the square and pentagon conditions being consistently longer than those for the triangle and hexagon conditions (post-hoc Benjamini-Hochberg analysis, p < 0.001). There was no main effect of jump event, F(19, 133) = 1.130, p = 0.330, η<sup>2</sup> = 0.139, nor an interaction between geometric condition and jump event, F(57, 399) = 1.217, p = 0.15, η<sup>2</sup> = 0.148.
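For readers unfamiliar with the Benjamini-Hochberg correction used in the post-hoc comparisons, the adjustment can be sketched as below. This is a generic textbook version of the step-up procedure, not the software used in the study; the function name `benjamini_hochberg` and the input p-values are hypothetical and are not taken from the reported results.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

A comparison is then declared significant when its adjusted p-value falls below the chosen false discovery rate (e.g., 0.05). Note that tied adjusted values are a normal by-product of the monotonicity step.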

**Figure 6** displays the jump lags observed for the exceptional triad identified above and in **Figure 5A**. For this triad, the participant who was assigned to the blue hoop starting location always led the other two participants irrespective of rotation direction or geometric constraint. Although defining a single leader across conditions is a possible strategy for achieving coordinated jumping, no other triad exhibited a consistently stable pattern of participant leading, and there is no reason why such single-leader dominance should occur in this manner for the two asymmetric conditions unless some a-priori "decision" is made as to who will lead a given jumping sequence or set of jumping sequences. Indeed, the participant in the blue starting location was less likely to lead in all other triads in the square and pentagon conditions (see **Figure 8**). This triad jumped eight and four additional trials due to coordination misses in the square and pentagon conditions, respectively, whereas the mean ± SD of misses averaged over the eight other triads was 2.250 ± 1.982 for the square condition and 2.000 ± 1.604 for the pentagon condition. Thus, the triad that included a member who would jump into a space occupied by others showed poorer performance, and the manner of leading that this member always adopted was idiosyncratic relative to the other triads.

*Direction was denoted by a letter "c (counterclockwise)" or "w (clockwise)."*

**Figures 7B,C** display the mean CI of the eight triads calculated separately for the 20 successful jump events. The two-way geometrical condition (triangle, square, pentagon, and hexagon) by 20 jump events ANOVA revealed a significant main effect of jump event [F(19, 133) = 1.791, p = 0.030; η<sup>2</sup> = 0.506] (**Figure 7A** displays the mean and standard error of the 20 jump events). However, post-hoc analysis indicated no significant difference between jump events. There was no main effect of geometrical condition [F(3, 21) = 1.210, p = 0.330; η<sup>2</sup> = 0.416] nor an interaction [F(57, 399) = 1.0831, p = 0.326; η<sup>2</sup> = 0.393], which, in contrast to expectations, initially suggested an equal preference for 1-leader/2-follower and 2-leader/1-follower CI relationships across conditions (although see below and **Figure 8** for more details).

Finally, for each triad we identified the participant who jumped faster than the other two participants in each of the four geometric conditions (i.e., the leader vs. the followers) to determine the frequency with which a particular participant position was the leader as a function of jump direction (i.e., clockwise or counterclockwise). These "leader/fastest jumper" frequencies are displayed in **Figure 8**. One-way ANOVAs were employed to compare these frequency counts for each rotation direction in each geometrical condition. Consistent with the isotropy subgroup expectations detailed above and listed in **Table 1**, the results revealed no significant effect of a member's location in the two symmetrical conditions (triangle counterclockwise: F(2, 14) = 0.675, p = 0.525, η<sup>2</sup> = 0.311; triangle clockwise: F(2, 8) = 0.917, p = 0.438, η<sup>2</sup> = 0.479; hexagon counterclockwise: F(2, 14) = 0.040, p = 0.961, η<sup>2</sup> = 0.076; hexagon clockwise: F(2, 10) = 0.420, p = 0.668, η<sup>2</sup> = 0.290), but a significant effect in almost all of the asymmetrical conditions (square counterclockwise: F(2, 14) = 42.737, p < 0.001, η<sup>2</sup> = 2.471; square clockwise: F(2, 8) = 10.682, p = 0.006, η<sup>2</sup> = 1.634; pentagon counterclockwise: F(2, 14) = 6.601, p = 0.001, η<sup>2</sup> = 0.972), the exception being clockwise rotation in the pentagon condition [F(2, 8) = 1.121, p = 0.372, η<sup>2</sup> = 0.530]. Thus, in the asymmetrical square and pentagon conditions, the participant next to an open jump location in the direction of the previously jumped rotation jumped faster than the other jumpers in 60% (pentagon) to 80% (square) of the successfully coordinated trials. This asymmetric preference was not observed in the triangle and hexagon conditions, in which a more symmetric (no-preference) leader/follower relationship was exhibited across jump events. The normative (symmetric) probability of being the first jumper (leader) can be taken as 33% (1/3), highlighted with a broken line in **Figure 8**.

# 3. DISCUSSION

The current study investigated whether the patterns of behavioral coordination during a cooperative, three-person (triadic) jumping task were defined by the (a)symmetries of a participant-hoop configuration. Of particular concern was the degree to which the symmetries of the actors' jumping direction dynamics and the temporal lead/lag relationship (i.e., leader/follower role) between actors were a functional reflection of the symmetry group(s) that defined the participant-hoop configuration. Here we discuss the degree to which the current findings support these symmetry (and group theory) based expectations, first with regard to the triads' jump direction decisions and then with regard to the degree to which the observed (a)symmetries in the temporal lead/lag and leader/follower relationship of co-actors were consistent with the hoop-triad isotropy subgroups defined in **Table 1**.

# 3.1. Asymmetry in Jump Direction Decision

As noted above, it is important to appreciate that the symmetry between counterclockwise/clockwise jumping needed to be collectively broken on each and every trial in order for the triads to complete a successful jump. Indeed, the chance of a triad achieving a single jump successfully was only equal to 2/2<sup>3</sup> = 2/8 = 1/4 or 25% if each participant in a triad were to randomly choose a jumping direction. Accordingly, participants in a triad were required to make a "collective" decision about which direction to jump, to the left (clockwise jump) or to the right (counterclockwise jump), on any given jumping trial. This was true for all hoop conditions, in that for all conditions a collision (failed trial) would result if any one participant decided to jump in a direction different from their co-participants. Thus, all of the geometric hoop conditions entailed the same two action possibilities on any given jump, with the potentiality of clockwise and counterclockwise jumping being equivalent. Consistent with our expectations, participants exhibited an asymmetric preference in jumping direction within and across jumping sequences. That is, participants "broke" the symmetry between clockwise and counterclockwise jumping. The equivalence of these two action possibilities in all of the geometric hoop conditions was also reflected by the fact that no difference in jump-to-jump decision dynamics was observed for the different geometric hoop conditions. However, in contrast to expectations, triads did not show an equal or symmetric preference for each jump direction across triads and/or jumping sequences. Rather, triads exhibited a strong preference for the counterclockwise direction (i.e., 70% of successful coordinated jumps were counterclockwise) in all four geometrical hoop conditions.
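The 25% chance level quoted above can be checked by enumerating every combination of independent direction choices. The sketch below is purely illustrative; the variable names are hypothetical, and the 20-jump figure at the end additionally assumes that successive jumps are independent, which the text does not claim.

```python
from itertools import product

directions = ["clockwise", "counterclockwise"]

# Every way three participants can independently pick a direction.
choices = list(product(directions, repeat=3))

# A jump succeeds only when all three pick the same direction.
successes = [c for c in choices if len(set(c)) == 1]

p_success = len(successes) / len(choices)
print(len(successes), len(choices), p_success)  # 2 8 0.25

# Under the extra (hypothetical) assumption of independent jumps,
# the chance of 20 consecutive successes by random choice alone.
p_sequence = p_success ** 20
print(p_sequence)
```

The vanishingly small value of `p_sequence` illustrates why completing a 20-jump sequence requires a collective rather than an individual decision about direction.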

Interestingly, most of the participants reported retrospectively that subtle changes in the knee extension and flexion movements of their co-actors often indicated which direction they should jump on any given trial. Thus, the decision about which direction to jump could be understood as having occurred spontaneously (i.e., as a spontaneous symmetry break) on any given jump, as a result of the subtle, yet coordinated fluctuations in the movement dynamics of the triad. Intuitively, one would expect that observing each other's knee movements would not only support the efficacy of the participants' decision about which direction to jump, but would also support the synchronization of the triad's preparatory movements and, thus, the ability of the participants to collectively jump in time with the metronome signal. However, although movement fluctuations and the visual coupling between the oscillatory jumping movements of the actors may account for the jump direction chosen on each individual trial, as well as the extremely short lag between members' jumping actions (**Figure 5**), they still do not account for the overall preference for counterclockwise jumping over clockwise jumping.

Perhaps the most likely reason for this counterclockwise bias was a physiological factor, such as hand or foot dominance, with this a-priori biomechanical asymmetry operating as an explicit symmetry-breaking factor on the behavioral organization of the triad. With regard to overall task success, however, a persistent break in the symmetry of counterclockwise vs. clockwise jumping was the most effective strategy for completing a successful 20-jump sequence. That is, always jumping to the left, rather than to the right (or vice versa), best supported the continuous jumping behavior of the participants by reducing the decision to only one possibility (thereby reducing the actors' cognitive load and/or the need for a strong perceptual attunement to the movement dynamics of others). Accordingly, it is possible that even if hand or foot dominance was not the reason for the counterclockwise bias, other non-obvious task asymmetries, such as the direction of the auditory metronome tone or the experimenter's position, may have operated to break the symmetry of the action space. Note, however, that these or other a-priori breaks in symmetry would only need to influence performance on the first jump within a sequence, with this past jumping action then operating as a symmetry-breaking factor on future jumps within the same sequence (or even across sequences), further increasing the preference for the counterclockwise direction relative to the clockwise direction.

# 3.2. Relational Symmetry of Behavioral Coordination and the Actor-Environment Task Space

The results revealed that the difference in participant-hoop configuration for the symmetric (3- and 6-hoop) and asymmetric (4- and 5-hoop) conditions influenced the patterning of the temporal coordination between participants in two ways. As illustrated in **Figure 5**, the first effect was that the overall average lag between jumpers was significantly longer in the asymmetrical conditions than in the symmetrical conditions. However, it is worth noting that the overall average temporal lag between participants was very small for both types of conditions, with the lag approximately equal to zero for the symmetrical conditions and approximately equal to 0.1 s for the asymmetrical conditions. Indeed, the latter lag is still very short compared to standard estimates of human whole-body reaction times (0.358 ± 0.600 s for Japanese males aged 20 years and 0.410 ± 0.280 s for females aged 20 years; Japan Industrial Safety and Health Association). As discussed above, these short latencies suggest that in both the symmetrical and asymmetrical conditions each member of the triad was able to successfully predict when and in what direction the other members of the triad were intending to jump (again, likely due to the detection of knee flexion-extension kinematics).

The second and much more important finding related to the symmetries that defined the frequency with which each actor led (or followed/lagged) the jumping action during successful 20-jump sequences. As illustrated in **Figure 8**, the frequency of participant role (i.e., leader/first jumper vs. follower/lagged jumper) was invariant for the symmetrical hoop conditions, with each actor equally likely to jump first. In contrast, for the asymmetrical conditions there was a strong asymmetry in the frequency of actor role, with a greater magnitude of invariance in terms of the role adopted by a given actor across jumping events; that is, one actor adopted the role of leader or follower more often than the other two actors. This latter asymmetry in actor role is particularly clear in the square condition, but is also discernible in the pentagon condition.

Of course, the significance of this latter finding is that it is consistent with the hypothesis that the symmetry (asymmetry) of triad behavior would correspond to the symmetry (asymmetry) of the highest order isotropy subgroup of the participant-hoop configuration. For the triangle and hexagon conditions, the highest order, D<sub>3</sub>, isotropy subgroup of the participant-hoop configuration was reflected by the fact that each actor was equally likely to emerge as the leader on any given jump event. Moreover, this suggests that the emergence of the lead jumper on any given jump was the result of a spontaneous symmetry break (i.e., could have resulted spontaneously from small temporal fluctuations in actor movement onset/offset times). As already noted, this is consistent with the findings displayed in **Figure 8**, with each actor equally likely to jump first on any given jump trial during the triangle and hexagon conditions. In contrast, the highest order isotropy subgroup for the two asymmetrical (square and pentagon) conditions was D<sub>1</sub> (or Z<sub>2</sub>). This reflected the (explicitly broken) asymmetry in the jumping DoF available to each actor in these two conditions. The corresponding symmetry or group theoretic prediction was that only two actors should be permutable or interchangeable within the task context. That is, one actor should consistently behave differently from the other two. Consistent with this prediction, a more predictable pattern of "leaders" and "followers" was observed for the square and pentagon conditions compared to the triangle and hexagon conditions (i.e., less leader-follower interchange; see **Figure 8**). More specifically, the participant or participants who had open locations next to them in the direction jumped previously tended to jump first (leading) compared to the other actor or actors. For instance, in the square condition the "red" actor in **Figure 8** led more often during clockwise jumps, whereas the "yellow" actor led more often during counterclockwise jumps. Similarly, for the pentagon condition the "blue" and "yellow" actors were more likely to lead/jump first compared to the "red" actor. Note that the latter, pentagon 2-to-1 symmetry and the former, square 1-to-2 symmetry are both entailed by D<sub>1</sub> and Z<sub>2</sub>.

Finally, it is important to note that the participant-hoop isotropy subgroups defined in **Table 1** (in the method section) represent the set of possible behavioral modes that could have occurred, such that lower order coordination patterns were still stable and could have emerged (recall that anti-phase coordination is still a stable pattern of rhythmic inter-limb coordination even though in-phase coordination is the more symmetric pattern; Kelso, 1984, 1995). The implication for the jumping task investigated here is that during the triangle and hexagon conditions triads could have adopted the same pattern of behavior they exhibited in the square and pentagon conditions (i.e., D<sub>1</sub> or Z<sub>2</sub>), as well as a cyclic leader/follower pattern (Z<sub>3</sub>) or even a fixed pattern of behavioral roles (i.e., an I or Identity pattern). Of course, the latter identity pattern was the only other option available to triads in the square and pentagon conditions and would correspond to each actor adopting a fixed, asymmetric role (i.e., leader, second, third jumper) across jump events (to some extent this may have defined clockwise pentagon jumps; see **Figure 8**). As noted in the introduction, however, self-organized dynamical systems typically (or more often) converge on the most symmetric pattern of behavior possible within a given task context (e.g., in-phase in laboratory joint action tasks: Schmidt et al., 1990; Richardson et al., 2007b; anti-phase in one-on-one competing action: Kijima et al., 2012; Okumura et al., 2012), with such states being more stable in the absence of further symmetry-breaking factors. Indeed, for the present task, the emergence of lower order isotropy subgroup symmetries would have required an explicit or induced symmetry break on the part of the participants (Richardson et al., 2016). For instance, the participants would have needed to explicitly communicate or agree on an order or permutation pattern of actor role. The lower-order behavioral modes of coordination could have also been explicitly induced by employing visual information to form a shared task representation of each actor's intentional state (Sebanz et al., 2003, 2005, 2006b). This may well have been what resulted in the qualitatively different behavior of the excluded triad shown in **Figure 6**. An interesting question for future research is whether such cognitive or representational forms of explicit symmetry breaking might be directly specified and (cognitively) understood via the perception of shared task affordances (Sebanz et al., 2006a; Richardson et al., 2007a; Marsh et al., 2009).

# 4. CONCLUSION

In conclusion, the results of the current study reveal that the geometrical (a)symmetry of an actor-environment task space determines the (a)symmetry of the behavioral coordination that can emerge. The current study also demonstrates how the formal language of symmetry, namely group theory, can be employed to understand and define the patterns of behavioral coordination that are possible and likely to occur within a given (multi-)agent-environment task context. The extended implication is that the principles of symmetry and symmetry-breaking can provide a fundamental and highly generalizable theory for understanding and predicting the stable patterns of multi-agent coordination and social activity, one that places a theoretical account of psychological perceptual-motor behavior, as well as cognitive decision making, within the formal principles that shape and constrain all biological and natural systems (Richardson et al., 2016).

# AUTHOR CONTRIBUTIONS

AK, HS, MO, and YY contributed to the conception and design of the work. AK, MO, and YY contributed to the data acquisition. AK, HS, MO, YY, and MR contributed to the data analysis and interpretation of data. AK, HS, YY, and MR drafted the manuscript, and HS, MO, YY, and MR revised it critically for important intellectual content. All authors approved the final version to be published. AK and MR are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

# FUNDING

This work was supported by JSPS KAKENHI Grant Numbers JP23500711 and JP25390147. The research was also supported, in part, by an award from the National Institutes of Health, R01GM105045.

# ACKNOWLEDGMENTS

We thank Keiko Yokoyama, Keisuke Fujii and Rachel W. Kallen for helpful comments and discussions that improved the manuscript.


subsystems in a small human group. Sci. Rep. 6:23911. doi: 10.1038/srep23911


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Kijima, Shima, Okumura, Yamamoto and Richardson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Interpersonal Coordination and Individual Organization Combined with Shared Phenomenological Experience in Rowing Performance: Two Case Studies

Ludovic Seifert <sup>1</sup> \*, Julien Lardy <sup>2</sup> , Jérôme Bourbousson<sup>2</sup> , David Adé<sup>1</sup> , Antoine Nordez <sup>2</sup> , Régis Thouvarecq<sup>1</sup> and Jacques Saury <sup>2</sup>

<sup>1</sup> Centre d'Etudes des Transformations des Activités Physiques et Sportives (CETAPS) - EA 3832, University of Rouen Normandy, Mont Saint Aignan, France, <sup>2</sup> Laboratory "Movement, Interactions, Performance" (EA 4334), Faculty of Sport Sciences, University of Nantes, Nantes, France

#### Edited by:

Richard C. Schmidt, College of the Holy Cross, USA

#### Reviewed by:

Ludovic Marin, University Montpellier, France

Robert R. Caron, Assumption College, USA

\*Correspondence: Ludovic Seifert ludovic.seifert@univ-rouen.fr

#### Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

Received: 23 July 2016 Accepted: 12 January 2017 Published: 30 January 2017

#### Citation:

Seifert L, Lardy J, Bourbousson J, Adé D, Nordez A, Thouvarecq R and Saury J (2017) Interpersonal Coordination and Individual Organization Combined with Shared Phenomenological Experience in Rowing Performance: Two Case Studies. Front. Psychol. 8:75. doi: 10.3389/fpsyg.2017.00075

The principal aim of this study was to examine the impact of variability in interpersonal coordination and individual organization on rowing performance. The second aim was to analyze crew phenomenology in order to understand how rowers experience their joint actions when coping with constraints emerging from the race. We conducted a descriptive and exploratory study of two coxless pair crews during a 3000-m rowing race against the clock. As the investigation was performed in an ecological context, we postulated that our understanding of the behavioral dynamics of interpersonal coordination and individual organization and the variability in performance would be enriched through the analysis of crew phenomenology. The behavioral dynamics of individual organization were assessed at kinematic and kinetic levels, and interpersonal coordination was examined by computing the relative phase between oar angles and oar forces and the difference in the oar force impulse of the two rowers. The inter-cycle variability of the behavioral dynamics of one international and one national crew was evaluated by computing the root mean square and the Cauchy index. Inter-cycle variability was considered significantly high when the behavioral and performance data for each cycle were outside of the confidence interval. Crew phenomenology was characterized on the basis of self-confrontation interviews and the rowers' concerns were then analyzed according to course-of-action methodology to identify the shared experiences. Our findings showed that greater behavioral variability could be either "perturbing" or "functional" depending on its impact on performance (boat velocity); the rowers experienced it as sometimes meaningful and sometimes meaningless; and their experiences were similar or diverging. By combining phenomenological and behavioral data, we explain how constraints not manipulated by an experimenter but emerging from the ecological context of a race can be associated with functional adaptations or perturbations of the interpersonal coordination.

Keywords: ecological dynamics, perturbation, variability, phenomenology, experience

# INTRODUCTION

Interpersonal coordination means that the movements of at least two individuals are coupled. As observed in team sports, individuals can engage in cooperative (within team) and/or competitive (between teams) relationships, which influence the dynamics of the interpersonal coordination to reach the task goal (Vilar et al., 2012; Passos et al., 2016). Rowing crews offer an interesting context for studying cooperative relationships because the rowers need to coordinate their actions throughout the race and constantly adjust to each other (Hill, 2002; Baudouin and Hawkins, 2004; Sève et al., 2013; de Poel et al., 2016). This was shown, for example, by analyzing the within-crew coordination of force patterns, particularly by computing the area under the force–time curve differences and the force–time shape differences (i.e., to estimate the movement pattern) (Hill, 2002). In particular, Hill (2002) suggested that the kinesthetic perception of force–time shape differences is easier than perceiving area under the force–time curve differences when rowers regulate their coordination.

The two rowers of a coxless pair crew have a cooperative relationship, but it is of a certain type: leader–follower. When the boat has more than one rower, the rower closest to the stern of the boat is referred to as the "stroke," whereas the rower at the opposite end of the boat is referred to as the "bow." The "stroke" rower is the leader, because he/she is supposed to set the stroke frequency for the rest of the crew to follow (Nolte, 2011). Therefore, although rowing is a cooperative endeavor, it is expected that the stroke rower will drive or lead the crew, while the bow rower is driven or follows the stroke's lead. Although the status of leader and follower is given in advance in the crew, it could be expected that any behavioral fluctuations of one rower (for personal reasons such as fatigue or for environmental reasons such as wind, waves, other boats or changes in the river pathway) or of the boat will disturb the stability of the system organization (both at the interpersonal coordination and boat velocity levels). In this case, it cannot always be assumed that the stroke rower alone will restore the stability of the interpersonal coordination and maintain high boat velocity. Among the parameters used to describe rowing performance and explain high boat velocity, the stroke frequency and the variations in boat velocity are important (Hill and Fahrig, 2009; Rauter et al., 2012). As propulsion alternates with oar recovery, the variations in boat velocity cannot be avoided, which led Hill and Fahrig (2009) to suggest that variations in boat velocity can cost as much as an additional 5 s in a 2000-m race compared with a boat hypothetically moving with constant velocity. Therefore, these authors noted that "a slight reduction of velocity fluctuations may be achieved by a moderate reduction of stroke frequency compensated by an increased force output for each stroke" (p. 593), which seems reachable only by elite rowers (Hill and Fahrig, 2009). 
As this biomechanical aspect is one of the most technical challenges in rowing performance, a large part of the literature has focused on these intra-cycle velocity variations; the inter-cycle velocity variations have therefore received less attention. However, several authors emphasized that inter-cycle velocity variations must be minimized in rowing (Martin and Bernfield, 1980; Baudouin and Hawkins, 2002, 2004; Soper and Hume, 2004; Nolte, 2011). Newton's first law (the law of inertia) states that an object will continue in a state of rest or of uniform motion (i.e., constant velocity) unless acted upon by external forces that are not in equilibrium (for reviews, see Hay, 1993; Bartlett, 2007). In cyclical locomotor activities such as rowing and swimming, fluid dynamic forces act in a direction opposite to the object's motion and are called drag forces. Drag forces resist motion and therefore limit speed, and thus sports performance in rowing. To maintain a boat in motion at a constant speed, propulsive forces that equal the total drag force, but act in the opposite direction, have to be exerted. Thus, the power of the propulsive forces equals the product of the drag force and the speed. From there, the aim of rowers is to maintain a constant speed by minimizing both intra- and inter-cycle velocity variations, in order to avoid excessive energy expenditure. In rowing, the minimization of inter-cycle velocity variations can be achieved in three distinct ways: (i) monitoring stroke rate (Soper and Hume, 2004 advised 30 cycles·min<sup>−1</sup> for 2000 m; Hofmijster et al., 2007 advised a stroke rate considerably lower than 36 cycles·min<sup>−1</sup>), (ii) optimizing the ratio between stroke length and stroke rate, and (iii) increasing the synchronization between the rowers. For this latter point, Baudouin and Hawkins (2002) mentioned that "coordination and synchrony between rowers in a multiple rower shell affects overall system velocity" (p. 401); to improve this factor, they advised examining how the force-time profiles of the rowers match, which helps to generate a balanced cumulative blade force. This coordinative aspect of rowing performance was recently investigated through the analysis of how rowers experienced their activity (Lund et al., 2012; Millar et al., 2013; R'Kiouak et al., 2016): the authors emphasized that the rowers not only attempted to coordinate their limbs (i.e., intrapersonal coordination) and themselves (i.e., interpersonal coordination), but also sought to coordinate with other environmental information such as the variations in the boat velocity (i.e., extrapersonal coordination) (Millar et al., 2013).

Managing interpersonal coordination therefore seemed more complex than just the bow rower adjusting to the stroke rower. A case study of a coxless pair crew, which combined the analysis of the phenomenological data (e.g., concerns) from stroke and bow rowers as they performed and the biomechanical characteristics of their movements, demonstrated that the rowers needed to continually adjust their interpersonal coordination (Sève et al., 2013). In particular, the biomechanical parameters studied in relation to the interpersonal coordination helped elucidate the stroke rower's perception of "being pushed." The authors showed that the stroke rower had a bigger stroke amplitude, which involved moving more quickly during the recovery phase in order to catch up to the bow rower's movement and be synchronized for the catch (Sève et al., 2013). A second phenomenon concerning the recovery angular velocity was also evoked to explain the stroke rower's perception of "being pushed." The stroke rower exhibited a lower angular velocity during the first part of the recovery, which led him/her to generate higher velocity during the second part of the recovery (Sève et al., 2013). These authors analyzed the crew phenomenology through their pre-reflective self-consciousness embedded in the lived experience, i.e. the immediate meaning that emerges from the individual's action at each instant and in which the following action is anchored (Merleau-Ponty, 1945; Varela et al., 1991). The crew phenomenology analysis was done on the basis of the lived experience, which concerned the perceptions, concerns and actions of the rowers, collected by retrospective phenomenological interviews (according to the course-of-action methodology; Theureau, 2003; Araujo and Bourbousson, 2016). The combination of phenomenological and mechanical data shed light on how the participants subjectively experienced some of the features of interpersonal coordination. 
As exemplified recently, it also pointed to the value of investigating how rowers in a cooperative context manage to remain aware of what may perturb performance and the interpersonal coordination they are engaged in, especially when the situation is not controlled in a lab but unfolds in a race against the clock (Seifert et al., 2016a). Taken together, behavioral and phenomenological data have highlighted how individuals behave, interact and experience their activity within their environment (including other individuals), thereby enriching our understanding of interpersonal coordination. This type of phenomenological investigation has shed light on interpersonal coordination as being dynamically regulated (De Jaegher and Di Paolo, 2007; Froese and Di Paolo, 2011; Froese, 2012), especially in observational studies in ecological performance contexts with no constraints controlled by the experimenters.

Although a leader–follower relationship could be expected between the stroke and bow rowers, the previous studies exemplified how the interpersonal coordination was influenced by interacting constraints like weather, wind, waves, change in the river pathway, fatigue, race strategy, and partner activity (for an extensive rationale for the constraint-led approach, see Newell, 1986). Therefore, in our rowing study in a cooperative performance context, we explored crew phenomenology to determine how interacting constraints were meaningful to the rowers; that is, whether these constraints were perturbing or contributed to shaping the interpersonal coordination dynamics. In particular, we assumed that examining both the phenomenology and the dynamics of a coupled oscillator system in a coxless pair crew would provide insight into the inter-cycle variability of interpersonal coordination in an ecological context of performance.

Previous studies have already shown that movement and coordination pattern variability may have a functional and adaptive role (Newell et al., 2005; Davids et al., 2006; Seifert et al., 2014, 2016b), highlighting the property of "degeneracy" (Edelman and Gally, 2001) or "functional equivalence" (Kelso, 2012) in neurobiological systems. Edelman and Gally (2001) defined degeneracy as the capacity of system components that differ in structure to achieve the same function or performance output. From this perspective, the functional characteristics of variability reflect the adaptability to reach a task-goal and maintain a high level of performance. Adaptive behaviors, in which system degeneracy is exploited, occur when the perceptual-motor system is stable when needed and flexible when relevant (Warren, 2006; Seifert et al., 2016b). Thus, although neurobiological systems naturally tend to remain relatively stable within a specific context for reasons of energy efficiency and economy (Sparrow and Newell, 1998), stability and flexibility are not opposites. In particular, flexibility is not a loss of stability but, conversely, is a sign of perceptual and motor adaptability to interacting constraints, which facilitates changes (structural or not) in coordination patterns while maintaining functional performance (Seifert et al., 2016b). A crucial question in rowing is to understand which part of the rowers' coordination is changed when a coxless pair crew adapts to interacting constraints. On the one hand, stability of the rowers' coordination could mean that the coordination pattern is reproducible and consistent over time and resists perturbations (e.g., wind and waves in rowing). On the other hand, flexible behavior means that the coordination pattern is not stereotyped and rigid, but adapts to a modification in the set of constraints (e.g., when rowers approach a turn in the river or when rowers are exhausted). This illustrates how the perceptual and motor system might exploit the property of degeneracy.

What makes this study original is that most studies in rowing highlight the necessity of minimizing inter-cycle velocity variations, but fail to examine the relationships between inter-cycle velocity variations and the movement coordination variability of the rowers. Interestingly, this approach has been proposed in swimming, another cyclic aquatic activity. Cycle-to-cycle analysis (during three sets of 300 m swum at 70, 80, and 90% of the personal best time for the 400 m) showed that well-trained swimmers exhibited higher swimming velocity, lower inter-cycle velocity variations and higher adaptability of inter-arm coordination than recreational swimmers (Dadashi et al., 2016). These authors concluded that "movement pattern variability showed that skilled swimmers could faster adapt to a new task-environmental constraint, suggesting that cycle velocity variation can be used as a prevalent metric to distinguish the technical capacity of swimmers" (p. 8) (Dadashi et al., 2016). This exemplifies the conclusion of Seifert et al. (2014, 2016b) that the property of degeneracy in perceptual and motor systems supports functional movement coordination variability. Based on the similarities between swimming and rowing (i.e., cyclical skills taking place in an aquatic environment with alternation of underwater propulsion and aerial recovery), previous studies on swimmers suggest that variability in motor coordination can be considered functional when the velocity of locomotion (i) is high and (ii) is associated with low inter-cycle variations. In such cases, this variability reflects the degeneracy of the perceptual and motor systems in adapting to the set of constraints.

The first aim of this study was to examine the variability of the interpersonal coordination and of the individual organization in relation to rowing performance, to better understand the leader–follower relationships. Indeed, the analysis of rower movement variability and of interpersonal coordination variability might inform on how rowers exploit the degeneracy of perceptual and motor systems. To reach this aim, we conducted a descriptive and exploratory study of two coxless pair crews performing a 3000-m race against the clock without manipulating any constraints. The second aim was to analyze the crew phenomenology in order to understand how the rowers experienced their own action and their joint action when they had to cope with naturally occurring race constraints. As our investigation was conducted in an ecological context of performance, we postulated that combining the data on crew phenomenology with our analysis of the behavioral dynamics of interpersonal coordination and individual organization would enrich our understanding of the role of variability (for more details, see Seifert et al., 2016a) and of the property of degeneracy. Depending on how the performance evolved (decrease vs. maintenance of high average boat velocity), we hypothesized that the race constraints would lead to perturbations or functional adaptations in the interpersonal coordination and/or individual organization, which would be experienced by the two rowers (a) simultaneously or not simultaneously, (b) as meaningful or meaningless, and (c) as similar or diverging concerns.

# MATERIALS AND METHODS

# Participants and Protocol

This study presents two case studies; therefore, it is difficult to generalize the results and to run any statistical analysis. Two coxless pair crews participated in this study: an international men's pair (lightweight) and a national women's pair (junior). The characteristics of the stroke rower of the international crew were: age 26 years, height 187 cm and weight 67 kg; he had 12 years of rowing experience and trained 20 h/week. He was the national champion twice (2009–2010), won the World Cup in 2008, and ranked fourth at the 2008 Olympic Games. The characteristics of the bow rower were: age 30 years, height 183 cm and weight 70 kg; he had 16 years of rowing experience and trained 20 h/week. He was the national champion in 2009 and ranked second at the national championships in 2008; he ranked fourth in the World Cup in 2008 and fourth at the World Championships in 2009. This pair was chosen for the study primarily because both rowers had extensive experience and expertise in rowing and had been rowing together at the top level for 4 years. Conversely, the women of the national junior crew had never rowed together in competition and had only trained together three times. They also had less experience and expertise in rowing than the international men's pair, suggesting that they might exhibit less skill in adapting to each other. Moreover, the stroke rower was a bit more experienced than the bow rower and the coach expected an asymmetric and unbalanced relationship between them. The characteristics of the stroke rower of the national women's crew were: age 18 years, height 178 cm and weight 82 kg; she had 4 years of rowing experience and trained 15 h/week. She was ranked second at the national junior championships in 2008, fourth at the World Championships in 2008 and fifth in 2009. The characteristics of the bow rower of the national women's crew were: age 17 years, height 188 cm and weight 80 kg; she had 3 years of rowing experience and trained 15 h/week.
She was ranked fifth at the World Junior Championships in 2009.

The study was designed and conducted in close collaboration with their coaches. The coxless pair is a boat for two rowers, a stroke rower and a bow rower, each having a single oar. The rowing activity was studied during a 3000-m race against the clock. The men's pair had a run of 350 oar strokes in 10′ 51′′96 while the women's pair had a run of 373 oar strokes in 13′ 10′′10. Both runs were performed in the same pathway on different dates. Since this experiment was performed in ecological conditions (on-water), weather conditions were not identical between crews. According to the coach's verbal report, the wind was noticeably stronger for the men's pair than for the women's pair.

This study was carried out in accordance with the recommendations set out in the guidelines of the International Committee of Medical Journal Editors. The ethics committee of Nantes University approved the protocol. The protocol was explained to all participants, who then gave written informed consent in accordance with the Declaration of Helsinki; in particular, the parents of the junior pair gave their consent.

# Mechanical Measurements

Data were collected during the race using the Powerline system (Peach Innovations, Cambridge, UK, http://www.peachinnovations.com). This system has a data acquisition and storage center connected to (a) two sensors to measure the forces applied at the pin of each oarlock (in the direction of the longitudinal axis of the boat), (b) two sensors to measure each oar angle in the horizontal plane (angle between the oar and the axis perpendicular to the longitudinal axis of the boat), and (c) an accelerometer and a speed sensor (impeller fixed to the hull of the boat) placed at the center of the boat (for further details, see R'Kiouak et al., 2016). The accuracy of the force and angle sensors is respectively 2% of full scale (1500 N) and 0.5°, and data were sampled at 50 Hz (Coker et al., 2009). The drive phase begins with a minimum oar angle (catch) and ends with a maximum angle (finish), and conversely for the recovery phase (Hill, 2002; Sève et al., 2013).

# Phenomenological Data Collection

The rowers' behaviors and verbal communications (both rowers were equipped with microphones) were recorded during the entire race with two video cameras. The race was filmed from a boat that followed the coxless pairs. To capture the rowers' phenomenology through their pre-reflective self-consciousness embedded in the unfolding activity (i.e., lived experience) (Merleau-Ponty, 1945; Varela et al., 1991), our study included a methodology for retrospective phenomenological interviews (according to the course-of-action methodology; Theureau, 2003; Araujo and Bourbousson, 2016). Essentially, we conducted self-confrontation interviews immediately after the race to collect the phenomenological data that reflected their pre-reflective self-consciousness (as extensively developed in the cognitive ergonomics field; Theureau, 2003; Mollo and Falzon, 2004). This pre-reflective self-consciousness characterizes the immediate experience for individuals; that is, the meaning that emerges from their action at each instant "t" for a given period and in which the following action is anchored (Merleau-Ponty, 1945; Varela et al., 1991; Theureau, 2003). The pre-reflective self-consciousness is the meaningful part of an individual's activity and situation: the individuals can show it (i.e., the activity can be mimed by the individual and the elements taken into account in the situation can be pointed out), tell it (i.e., the elements of the situation and activity that are pertinent from the individual's point of view can be described) and comment on it (i.e., certain elements of the activity and situation can be connected with other elements) at each instant under certain methodological conditions of confrontation (i.e., relationship of trust between rower and researcher; focusing the rower on the immediate activity with specific questioning) with the behavioral traces of their activity (Theureau, 2003).
Thus, the "meaningfulness" of the situation reflects the individual's capacity to construct meanings during the course of his/her activity in relation to the subjective appropriation of the events encountered. Individuals interact only with the environmental elements that are sources of perturbation to the dynamics of their own activity. Therefore, the meaningfulness of the situation characterizes his/her "own world" (i.e., "Umwelt"; von Uexküll, 1992) in which the individual operates to drive the course of his activity (according to the enactive approach developed by Varela et al., 1991). In our study, video recordings collected the behavioral traces of activity during the race. The interviews were based on these video recordings and consisted of confronting each rower with his/her activity. The participants viewed these videotapes while respecting the race chronology. Immediately after each race they were invited to reconstruct and share their own lived experience, which concerned their perceptions (e.g., informational variables such as visual, kinesthetic, haptic, acoustic variables), concerns (e.g., purposes and concerns) and actions (e.g., communications between rowers, actions with the oar). In this way, the researcher was able to more fully focus on the dynamics of the individual's concerns in the situation and the dynamics of what was meaningful for the individual at each instant. Before each interview, the researcher/interviewer reminded the participant of the nature of the interview and the expectation that the participant needed to "re-live" and describe his/her own experience during the race, without any prior analysis, rationalization or justification (Theureau, 2003). This method is designed to reach the level of activity that is meaningful for the individual at his/her pre-reflective level of consciousness. 
Thus, the goal of the self-confrontation interview is to encourage the participants to verbally report what they did, felt, thought, and perceived during the race, as naturally as possible, from their own perspective (Theureau, 2003). A number of recent empirical studies in the field of sports expertise have demonstrated the fruitfulness of this methodology for studying the activity–situation coupling during interpersonal coordination tasks (Bourbousson et al., 2011, 2012; Poizat et al., 2012, 2013). Researchers who had already conducted self-confrontation interviews of this type in previous research conducted all the interviews.

# Interpersonal Coordination Analysis

Raw data (oar angles, forces applied to the oarlocks, acceleration and velocity) were filtered with a low pass Butterworth filter with a 5-Hz cutoff frequency. Continuous angular velocities were then computed as the first derivative of the angular position using the central difference formula. In line with de Brouwer et al. (2013) and McGarry et al. (1999), interpersonal coordination was assessed using the continuous relative phase (φrel, in degrees) between two oscillators (i.e., oar angles of the stroke and bow rowers). In accordance with Hamill et al. (2000), the data on angular displacements (θnorm) and angular velocities (ωnorm) were normalized in the interval [−1, +1] cycle to cycle. Then phase angles (φstroke and φbow, in degrees) were calculated and corrected according to their quadrant (Hamill et al., 2000):

$$\phi = \arctan\left(\omega_{\text{norm}} / \theta_{\text{norm}}\right) \tag{1}$$

Last, the continuous relative phase for a complete cycle was calculated as the difference between the two phase angles (Hamill et al., 2000):

$$
\phi_{\text{rel}} = \phi_{\text{stroke}} - \phi_{\text{bow}} \tag{2}
$$
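As an illustration of Equations 1 and 2, the following is a minimal sketch for one cycle of oar-angle data, not the authors' actual processing code. The function names, the use of `atan2` for the quadrant correction, the one-sided differences at the two end samples, and the wrapping of the result to [−180°, 180°) are assumptions made for this example; the 0.02-s time step simply corresponds to the 50-Hz sampling rate reported above.

```python
import math

def normalize(values):
    """Rescale the samples of one cycle to the interval [-1, +1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # flat signal: nothing to rescale
        return [0.0] * len(values)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]

def central_difference(angles, dt):
    """Angular velocity: central difference inside, one-sided at both ends
    (the end-point treatment is an assumption of this sketch)."""
    n = len(angles)
    vel = [0.0] * n
    for i in range(1, n - 1):
        vel[i] = (angles[i + 1] - angles[i - 1]) / (2.0 * dt)
    vel[0] = (angles[1] - angles[0]) / dt
    vel[-1] = (angles[-1] - angles[-2]) / dt
    return vel

def phase_angle(angles, dt):
    """Equation 1, with atan2 performing the quadrant correction (degrees)."""
    theta = normalize(angles)
    omega = normalize(central_difference(angles, dt))
    return [math.degrees(math.atan2(w, t)) for t, w in zip(theta, omega)]

def relative_phase(stroke_angles, bow_angles, dt=0.02):
    """Equation 2: phi_stroke - phi_bow, wrapped to [-180, 180) degrees."""
    phi_stroke = phase_angle(stroke_angles, dt)
    phi_bow = phase_angle(bow_angles, dt)
    return [((s - b + 180.0) % 360.0) - 180.0
            for s, b in zip(phi_stroke, phi_bow)]
```

With two identical signals the sketch returns 0° throughout (in-phase coupling), whereas a sinusoid lagging by a quarter cycle yields a relative phase close to −90°.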

Following the method of Hill (2002), the kinetic analysis of interpersonal coordination addressed the area differences (used to estimate the applied power) and the form differences (used to estimate the movement pattern). First, the area differences corresponded to the force impulse differences between the rowers. The force impulse was computed for each cycle of each rower as the area under the force–time curve, and the force impulse differences between the two rowers were then computed cycle to cycle. Second, the form differences corresponded to the force–time shape differences, which we studied through the continuous relative phase. This continuous relative phase was calculated from the force–time curves of the two rowers, using Equations 1 and 2 as detailed for the kinematic analysis.
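The force impulse computation can be sketched as follows. This is an illustrative example only: the trapezoidal rule, the function names, and the sign convention (stroke minus bow) are assumptions, since the text specifies only that the impulse is the area under the force–time curve of each cycle.

```python
def force_impulse(force, dt=0.02):
    """Area under one cycle's force-time curve (N.s), trapezoidal rule.

    `force` holds the oarlock force samples (N) of a single cycle and
    `dt` is the sampling interval in seconds (0.02 s at 50 Hz).
    """
    return sum((a + b) / 2.0 * dt for a, b in zip(force, force[1:]))

def impulse_differences(stroke_cycles, bow_cycles, dt=0.02):
    """Cycle-to-cycle impulse difference, stroke minus bow (assumed sign)."""
    return [force_impulse(s, dt) - force_impulse(b, dt)
            for s, b in zip(stroke_cycles, bow_cycles)]
```

For instance, a triangular force profile rising from 0 to 100 N and back over three samples at 50 Hz gives an impulse of 2 N.s.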

# Individual Organization Analysis

The oar angle–time and force at oarlock–time series of the two rowers of the same crew were compared by Student t-tests in order to detect which rower was responsible for the interpersonal coordination variability. Statistics were performed with Statistica 8.0 with a level of significance fixed at p < 0.05.

# Inter-Cycle Variability in Interpersonal Coordination and Individual Organization

Each cycle was delimited by two successive catch points, identified as local minima of the oar angle. Then, force and angle data were resampled to 101 points per cycle, in order to make comparisons between cycles (with cycles of similar duration). The inter-cycle variability was assessed with the root mean square (RMS) and the Cauchy index (C<sub>i</sub>) (Chen et al., 2005; Rein, 2012). RMS measures the similarity between each cycle and the mean cycle of the time series, while C<sub>i</sub> measures the similarity between two successive cycles of the time series. The calculation of RMS is based on the squared Euclidean distance between two time series at each point, which is averaged before the square root is taken:

$$\mathrm{RMS}_i = \sqrt{\frac{\sum_{n=1}^{N} \left(X_{i,n} - \overline{X}_n\right)^2}{N}} \tag{3}$$

where N is the number of samples per cycle (i.e., 101 in the present case), X<sub>i</sub> is the cycle and X̄ is the average cycle. This means that the 101 data points of X<sub>i</sub> were compared with the 101 data points of X̄. Thus, a small RMS value indicates a coordination pattern similar to the average pattern. C<sub>i</sub> is based on the Euclidean distance that separates two successive cycles during a trial:

$$C_i = \frac{1}{K (N-1)} \sum_{n=1}^{N} \sqrt{\sum_{k=1}^{K} \left(\chi_{kn(i+1)} - \chi_{kn(i)}\right)^2} \tag{4}$$

where i corresponds to a cycle, K the number of variables (i.e., the value of continuous relative phase or force difference in the present case), and N the number of samples per variable during one cycle (i.e., 101 in the present case) (Chen et al., 2005; Rein, 2012). Thus, a small value of C<sub>i</sub> informs about similar successive patterns of coordination without defining the nature of the pattern. RMS and C<sub>i</sub> were computed for the continuous relative phase between the oar angles of the bow and stroke rowers, the continuous relative phase between the oarlock forces of the bow and stroke rowers, and the force impulse differences between the rowers. For both RMS and C<sub>i</sub>, when the cycle was within the 95% confidence interval (i.e., average cycle ± 1.96 SD), it was considered as not perturbed.
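The resampling step and the two variability indices can be sketched as below. This is a hedged illustration rather than the authors' implementation: linear interpolation for the resampling, the restriction of the Cauchy index to a single variable (K = 1), the use of the sample standard deviation, and the application of the ±1.96 SD criterion to the series of index values are all assumptions of the example, as are the function names.

```python
import math

def resample(cycle, n_points=101):
    """Linearly interpolate one cycle onto n_points equally spaced samples."""
    last = len(cycle) - 1
    out = []
    for j in range(n_points):
        pos = j * last / (n_points - 1)   # fractional index in the raw cycle
        lo = int(math.floor(pos))
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(cycle[lo] * (1.0 - frac) + cycle[hi] * frac)
    return out

def mean_cycle(cycles):
    """Point-by-point average of equally long (resampled) cycles."""
    return [sum(c[n] for c in cycles) / len(cycles)
            for n in range(len(cycles[0]))]

def rms(cycle, reference):
    """Equation 3: root mean square distance between a cycle and the mean cycle."""
    return math.sqrt(sum((x - r) ** 2 for x, r in zip(cycle, reference))
                     / len(cycle))

def cauchy_index(cycle, next_cycle):
    """Equation 4 for a single variable (K = 1): the inner square root
    reduces to the absolute difference between successive cycles."""
    n = len(cycle)
    return sum(abs(b - a) for a, b in zip(cycle, next_cycle)) / (n - 1)

def outside_95(values):
    """Flag index values lying outside mean +/- 1.96 SD (sample SD assumed)."""
    m = sum(values) / len(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))
    return [abs(v - m) > 1.96 * sd for v in values]
```

In this sketch, RMS would be computed for every resampled cycle against `mean_cycle`, the Cauchy index for every pair of successive cycles, and `outside_95` would then mark the cycles treated as perturbed.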

# Inter-Cycle Velocity Variations of the Boat

The acceleration signal was integrated to provide the instantaneous boat velocity variations, from which the average velocity of each cycle was obtained. Because integrating the acceleration signal can introduce drift, the average velocity obtained from the accelerometer was aligned with the average velocity calculated from the speedometer. Once the average velocity was computed for each cycle, the average boat velocity, its standard deviation and its confidence interval were calculated, in order to determine the cycles outside of the 95% confidence interval.
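A minimal sketch of this velocity processing is given below. The text does not specify how the alignment with the speedometer was performed, so the constant-offset correction used here is an assumption, as are the trapezoidal integration, the function names, and the use of catch indices to delimit cycles.

```python
def integrate_acceleration(accel, dt=0.02, v0=0.0):
    """Cumulative trapezoidal integration of acceleration (m/s^2) to velocity (m/s)."""
    velocity = [v0]
    for a0, a1 in zip(accel, accel[1:]):
        velocity.append(velocity[-1] + 0.5 * (a0 + a1) * dt)
    return velocity

def align_to_speedometer(velocity, speedometer_mean):
    """Shift the integrated velocity so that its mean matches the mean
    speedometer velocity (constant offset: an assumption of this sketch)."""
    offset = speedometer_mean - sum(velocity) / len(velocity)
    return [v + offset for v in velocity]

def cycle_means(velocity, catch_indices):
    """Average velocity of each cycle, a cycle running from one catch
    index (inclusive) to the next (exclusive)."""
    return [sum(velocity[a:b]) / (b - a)
            for a, b in zip(catch_indices, catch_indices[1:])]
```

The per-cycle averages returned by `cycle_means` are the values whose mean and standard deviation would then define the 95% confidence interval.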

# Combination of Behavioral Data and Performance

The kinematic and kinetic parameters of behavior were then combined with the performance indicators in order to gain insight into the functional and adaptive aspects of the interpersonal coordination variability throughout the race. As emphasized in the introduction, rowers can adapt to a set of race constraints by varying their motor behaviors (structurally) without compromising function (i.e., to maintain stable boat velocity that remains within the 95% confidence interval), providing evidence for neurobiological system degeneracy (Edelman and Gally, 2001; Seifert et al., 2014, 2016b). Therefore, the property of degeneracy in perceptual and motor systems supports the functional variability of interpersonal coordination when it was associated with performance stability; that is, high average velocity (for an extensive discussion about this functional and adaptive aspect of coordination variability in relation to its impact on performance stability, see Davids et al., 2003, 2006; Seifert et al., 2014). The variability of behavior and performance was considered significantly high when the cycle was outside the 95% confidence interval. From there, three scenarios were distinguished to determine whether the behavioral variability was functional and adaptive (i.e., without significant change in boat velocity) or associated with perturbation (i.e., with significant change in boat velocity) in the coupling between rowers: (a) functional adaptation: at least one behavioral parameter (kinematic or kinetic) was perturbed but the boat velocity was not perturbed, (b) behavioral perturbation: at least one behavioral parameter (kinematic or kinetic) and the boat velocity were perturbed, and (c) velocity perturbation: no perturbation of the behavioral parameters but the boat velocity was perturbed.
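The three scenarios can be written as a simple decision rule. The sketch below is illustrative; the function name, the boolean inputs, and the "not perturbed" label for cycles in which nothing falls outside the confidence interval are assumptions added for completeness.

```python
def classify_cycle(behavior_flags, velocity_perturbed):
    """Assign one cycle to a scenario.

    behavior_flags: iterable of booleans, one per behavioral parameter
        (kinematic or kinetic), True when that parameter is outside
        the 95% confidence interval for this cycle.
    velocity_perturbed: True when the boat velocity of this cycle is
        outside the 95% confidence interval.
    """
    behavior_perturbed = any(behavior_flags)
    if behavior_perturbed and not velocity_perturbed:
        return "functional adaptation"       # scenario (a)
    if behavior_perturbed and velocity_perturbed:
        return "behavioral perturbation"     # scenario (b)
    if velocity_perturbed:
        return "velocity perturbation"       # scenario (c)
    return "not perturbed"                   # label assumed for this sketch
```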

# Analysis of the Phenomenological Data and Their Combination with Behavioral Data

The verbalization data from the self-confrontation interviews were processed according to the procedure defined in the course-of-action methodology (Theureau, 2003), which follows a comprehensive and idiosyncratic approach and is grounded in the enactive approach (Varela et al., 1991; Stewart et al., 2010; Araujo and Bourbousson, 2016). We therefore followed six steps:

The first step consisted of generating a summary table containing the data recorded during the race (i.e., a brief description of each rower's behavior) and the self-confrontation interview (i.e., verbatim transcriptions of the prompted verbalizations).

The second step consisted of identifying the elementary units of meaning (EUMs), which are the smallest units of activity that are meaningful for an individual. This process was accomplished by analyzing the audio-video recordings together with the verbalization transcripts.

The third step consisted of reconstructing each rower's personal course of action, leading to the identification of the concerns within each EUM that were meaningful to each rower. The course of action is the reduction of the phenomena of human activity to the level of "acceptable symbolic description" (Varela, 1989, p. 184) and is a valid and useful explanation of the activity. This takes into account the individual's construction of meaning for his/her activity as it unfolds and the "extrinsic" characteristics that the individual considers meaningful (Theureau, 2003). Therefore, the reconstructions of the rowers' courses of action consisted of identifying and documenting the components of the EUMs. Three inseparable components were identified and documented in this study: the unit of course of action, the representamen and the concerns. The unit of course of action is the fraction of pre-reflective activity that can be shown, told, and commented on by the individual. The unit of course of action may be a symbolic construct, physical action, interpretation, or emotion. The representamen corresponds to the elements that are taken into account by the individual at a given moment. The representamen may be perceptive or mnemonic. The concerns refer to the inherent interest of the rower's current activity based on what is meaningful to him/her. In our study, we focused particularly on the "meaningfulness" of the concerns; that is, what the rowers really took into account in the environment in order to act. Therefore, concerns were "meaningless" when the rowers could not put their concerns into words or when the researcher could not infer the concerns from the recordings of their behaviors and verbal communications.

The fourth step consisted of identifying the typical concerns of the rowers. Typicality refers to at least four aspects that researchers use to identify occurrence-types (Durand, 2014): (a) they concentrate the most attributes of the activity being observed in the sample of individuals and situations under study, (b) they are most frequently observed in the sample, (c) they show a propensity to occur preferentially when conditions having a "family resemblance" to those being observed are produced, and (d) the individuals express a sentiment of typicality about them in their interactions with the researchers.

The fifth step consisted of characterizing the shared experience of the two rowers. To do so, we analyzed each rower's personal course of action and compared them in order to understand whether the typical concerns of the two rowers were: (a) simultaneous or not simultaneous, (b) meaningful or meaningless, and (c) similar or diverging. These three criteria were used to characterize the rowers' shared experiences in four collective phenomenological categories (for a similar study, see R'Kiouak et al., 2016). The first collective phenomenological category was labeled Simultaneously and Similarly Experienced as Meaningless (SSE-L) when the rowers did not pay attention to the joint action at the pre-reflective level of their activity. The second category was labeled Simultaneously and Similarly Experienced as Meaningful (SSE-F) when the rowers reported a salient, meaningful experience of the joint action to cope with the race constraints. The third category was labeled Simultaneously Diverging Experiences (SDE) when the joint action was associated with diverging concerns (i.e., not similarly experienced). The fourth category was labeled Not Simultaneously Experienced as Meaningful (NSEM) when one rower reported a meaningful experience of the joint action whereas the other rower did not pay attention to it. **Table 1** shows examples of the concerns of the stroke and bow rowers of the international crew, analyzed to determine their simultaneity, meaning and convergence and categorized into one of the four collective phenomenological categories.

Last, the sixth step consisted of combining the phenomenological data with the behavioral and performance data to determine whether the functional adaptations and behavioral perturbations were associated with (a) simultaneous or not simultaneous, (b) meaningful or meaningless, and (c) similar or diverging concerns of the two rowers.
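The four collective phenomenological categories defined in the fifth step can be summarized as a decision rule. The sketch below reflects one reading of those definitions and is not part of the original analysis, which relied on the researchers' qualitative interpretation; the function name and the three boolean inputs are hypothetical.

```python
def collective_category(stroke_meaningful, bow_meaningful, similar_concerns):
    """Map the coded concerns of the two rowers to a collective category.

    stroke_meaningful / bow_meaningful: True when the rower reported a
        meaningful experience of the joint action.
    similar_concerns: True when the two rowers' concerns converged
        (only used when both experiences were meaningful).
    """
    if not stroke_meaningful and not bow_meaningful:
        return "SSE-L"   # neither rower paid attention to the joint action
    if stroke_meaningful != bow_meaningful:
        return "NSEM"    # meaningful for one rower only
    return "SSE-F" if similar_concerns else "SDE"
```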

Several measures were taken to enhance the validity of this analysis (Lincoln and Guba, 1985). First, the self-confrontation interviews were conducted in an atmosphere of trust between rowers and researchers. Trust was built via the establishment of an explicit contract between the researcher and the participant that took into account the respective interests of each one. Second, two investigators independently carried out the data analysis (i.e., reconstructing the courses of action and identifying the typical concerns, then how these concerns were shared by the rowers) and discussed any initial disagreement until a consensus was reached. These two researchers had already coded protocols of this type in previous studies and were accustomed to course-of-action methodology. This method is justified by the particular characteristics of data analysis in this methodology. Indeed, reconstructing a course of action is not strictly a coding procedure: it requires a plausible interpretation of the ongoing construction of meaning during the individual's activity. This is ensured by the parallel data analysis by different researchers, who mutually discuss their interpretations. Third, a saturation criterion was adopted for the categorization of typical concerns. This criterion was considered to be met when no new categories of typical concerns emerged from the processing of further data.

TABLE 1 | Examples of concerns of the international crew stroke and bow rowers, analyzed to determine the simultaneity, meaning and divergence of these concerns between rowers and assigned to one of the four collective phenomenological categories.

The last column indicates whether the stroke and bow rowers experienced this higher variability in joint action and/or performance as (a) Simultaneously and Similarly Experienced as Meaningless (SSE-L), (b) Simultaneously and Similarly Experienced as Meaningful (SSE-F), (c) Simultaneous and Diverging Experiences (SDE), or (d) Not Simultaneously Experienced as Meaningful (NSEM), on the basis of the phenomenological data.

# RESULTS

The oar angle–time curves (**Figure 1**) and the force at oarlock–time curves (**Figure 2**) of the bow and stroke rowers showed in-phase coupling between rowers. However, when the interpersonal coordination was computed for the continuous relative phase from the oar angles, continuous relative phase from the oarlock forces and force impulse difference, variability was noted between cycles.

# Inter-Cycle Variability in Interpersonal Coordination

The inter-cycle variability was examined through its magnitude (RMS and C<sup>i</sup> values) and frequency (number of cycles outside of the confidence interval, based on RMS and C<sup>i</sup> data). Concerning the kinematic data, the international crew exhibited a mean RMS φrel = 3.21 ± 1.42, with 11 cycles outside of the confidence interval and a mean C<sup>i</sup> φrel = 3.42 ± 1.97, with 8 cycles outside of the confidence interval for 340 cycles performed during the race (**Figure 3**). The national crew showed a mean RMS φrel = 7.53 ± 2.99, with 18 cycles outside of the confidence interval and a mean C<sup>i</sup> φrel = 8.13 ± 3.50, with 17 cycles outside of the confidence interval for 363 cycles performed during the race (**Figure 4**).
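The frequency measure used here (the number of cycles outside of the confidence interval) can be illustrated with a minimal sketch. It assumes that the interval is the mean ± 1.96 SD of the per-cycle values, in line with the 95% interval (1.96 SD) mentioned in the figure captions; the function name, the use of the sample SD and the example values are hypothetical and are not taken from the study.

```python
import numpy as np

def cycles_outside_band(values, z=1.96):
    """Return the indices of cycles lying outside mean +/- z * SD, plus the band."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    sd = values.std(ddof=1)  # sample SD: an assumption, the source does not specify
    lower, upper = mean - z * sd, mean + z * sd
    outside = np.flatnonzero((values < lower) | (values > upper))
    return outside, (lower, upper)

# Hypothetical per-cycle values (e.g., RMS of the relative phase), not study data:
# twenty ordinary cycles followed by one strongly deviating cycle.
per_cycle = [1.0] * 20 + [10.0]
outside, band = cycles_outside_band(per_cycle)
print(outside.tolist())  # [20]
```

Applied to each per-cycle series (RMS, C<sup>i</sup>, force impulse difference), the length of `outside` would give a count comparable to the "cycles outside of the confidence interval" reported above, under the stated assumptions.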

Concerning the kinetic analysis, the international crew showed a mean force impulse difference between rowers of 3.65 ± 2.19 N.s with 17 cycles outside of the confidence interval, while the national crew exhibited a mean force impulse difference of 4.93 ± 3.38 N.s with 18 cycles outside of the confidence interval (**Figure 5**).

The calculation of RMS and C<sup>i</sup> for the φrel on the kinetic data showed a mean RMS φrel = 7.6 ± 3.5, with 11 cycles outside of the confidence interval for the international crew, and a mean C<sup>i</sup> φrel = 7.2 ± 3.9, with 8 cycles outside of the confidence interval for 340 cycles performed during the race (**Figure 6**). For the national crew, the mean RMS φrel = 13.5 ± 5.3, with 14 cycles outside of the confidence interval, and the mean C<sup>i</sup> φrel = 13.9 ± 5.7, with 14 cycles outside of the confidence interval for 363 cycles performed during the race (**Figure 7**).

# Inter-Cycle Variability in Individual Organization

**Figure 8** shows individual C<sup>i</sup> based on the oar angles. Whatever the crew, statistical analysis showed higher C<sup>i</sup> values for the bow rower (t = 2.39, p = 0.017 for the international crew and t = 10.84, p < 0.001 for the national crew) than for the stroke rower. The stroke rower from the international crew exhibited 13 cycles outside of the confidence interval, whereas the bow rower exhibited 17 cycles outside of it, with the stroke and bow rowers both outside of the confidence interval for 9 of these cycles. In the national crew, the stroke rower exhibited 22 cycles outside of the confidence interval, whereas the bow rower exhibited 21 cycles outside of it, with both rowers together outside of the confidence interval for 8 of these cycles.

**Figure 9** shows individual C<sup>i</sup> based on the oarlock force production. No significant C<sup>i</sup> differences were noted between the two rowers of the international crew (t = −1.07, p = 0.287) although significant differences occurred for the national crew (t = −2.20, p = 0.028). The stroke rower from the international crew exhibited 4 cycles outside of the confidence interval, while the bow rower exhibited 16 cycles outside of it, with the two rowers together outside of the confidence interval for 3 of these cycles. In the national crew, the stroke rower exhibited 21 cycles outside of the confidence interval and the bow rower exhibited 20 cycles outside of it, with the two rowers together outside of the confidence interval for 8 of these cycles.

# Inter-Cycle Velocity Variations in the Boat

The average boat velocity was 4.47 ± 0.27 m·s<sup>−1</sup> for the international crew and 3.80 ± 0.19 m·s<sup>−1</sup> for the national crew. Twenty-five cycles (distributed over 4 sequences) for the international crew and 21 cycles (distributed over 5 sequences) for the national crew were outside of the confidence interval (**Figure 10**).
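How flagged cycles might be grouped into "sequences" can be sketched as follows. The sketch assumes that a sequence is a run of consecutive cycle numbers, which the text does not state explicitly; the function name and the cycle numbers are hypothetical.

```python
def group_into_sequences(cycle_indices):
    """Group cycle numbers into runs of consecutive cycles."""
    sequences = []
    for idx in sorted(set(cycle_indices)):
        if sequences and idx == sequences[-1][-1] + 1:
            sequences[-1].append(idx)   # extends the current run
        else:
            sequences.append([idx])     # starts a new run
    return sequences

# Hypothetical cycle numbers flagged as outside of the confidence interval.
flagged = [12, 13, 14, 80, 81, 200]
print(group_into_sequences(flagged))  # [[12, 13, 14], [80, 81], [200]]
```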

# Combination of Behavioral Data and Performance Outcome

When performance (boat velocity) and the behavioral parameters (i.e., kinematic and kinetic) were combined, 16 cycles were identified as outside of the confidence interval for the international crew and could be categorized as follows: 12 cycles (75% out of a total of 16 cycles) corresponded to functional adaptation, whereas 4 cycles (25%) corresponded to behavioral perturbation (**Table 2**).

Concerning the national crew, 26 cycles were identified as outside of the confidence interval and could be categorized as follows: 21 cycles (80.8% out of a total of 26 cycles) corresponded to functional adaptation, whereas 3 cycles (11.5%) corresponded to behavioral perturbation and 2 cycles (7.7%) related to velocity perturbation (**Table 3**).

# Combination of Behavioral and Phenomenological Data

Our first finding indicated that the behavioral and velocity perturbations were always experienced as meaningful by the rowers, particularly as Simultaneous and Diverging Experiences (SDE): 25% of the time (4 cycles out of a total of 16) by the international crew (**Table 2**) and 19.2% (5 cycles out of a total of 26) by the national crew (**Table 3**).

Our second finding pointed out that the functional adaptations were experienced in different ways: (a) Simultaneously and Similarly Experienced as Meaningless (SSE-L): 31.3% for the international crew vs. 11.5% for the national crew; (b) Simultaneously and Similarly Experienced as Meaningful (SSE-F): 25% for the international crew vs. 26.9% for the national crew; (c) Simultaneous and Diverging Experiences (SDE): 6.3% for the international crew vs. 11.5% for the national crew; and (d) Not Simultaneously Experienced as Meaningful (NSEM): 12.5% for the international crew vs. 30.8% for the national crew. These findings highlight that for the most part the two rowers of the international crew simultaneously and similarly experienced functional adaptations. Conversely, the two rowers of the national crew alternated between simultaneous and not simultaneous meaningful experiences of their functional adaptations.

# DISCUSSION

The main finding of our study was the close association between the stability in behavior and boat performance. In particular, boat velocity variability was associated with the variability in the interpersonal coordination and individual organization at kinematic and kinetic levels, which is in accordance with the literature (Soper and Hume, 2004; Hill and Fahrig, 2009; Nolte, 2011). However, it must be recalled that our study was only based on two cases; therefore, it is difficult to generalize the results and to run any statistical analysis.

From there, our aim was to focus on the cycles (for interpersonal coordination, individual organization and boat velocity measurements) outside of the confidence interval to investigate how rowers exploit degeneracy of the perceptual and motor systems when they coped with race constraints. The degeneracy property supported "functional" adaptations, because the behavior varied structurally while the boat's velocity remained stable. Conversely, behavioral variability was considered "perturbing" when it led the boat's velocity outside the confidence interval. This can clearly be seen in the international men's pair at 400 and 540 s of the race, when drops in boat velocity (**Figure 10**) were associated with high variability in interpersonal coordination (**Figures 2**, **5**, **6**) and lived as simultaneous and diverging experiences (**Table 1**); this observation led us to characterize these events as "behavioral perturbation." Thus, the race constraints were associated with destabilized interpersonal coordination, called "behavioral perturbations" when the boat velocity decreased or "functional adaptations" when the boat velocity was maintained. This summary of our main findings suggests three aspects for in-depth discussion: (a) the functional vs. perturbing role of variability in interpersonal coordination; (b) the constraints that influence the interpersonal coordination dynamics in rowing, notably with respect to the roles given to the stroke (leader) and bow (follower) rowers; and (c) how the variability in interpersonal coordination was experienced and shared, particularly regarding whether the functional adaptations and behavioral and velocity perturbations were similarly experienced by the two rowers.


# Functional vs. Perturbing Variability in Interpersonal Coordination

The international crew exhibited 25 cycles outside of the confidence interval for the boat velocity and 8–10 cycles outside of the confidence interval for the behavioral parameters (i.e., RMS and C<sup>i</sup> of the kinematic and kinetic parameters). The national crew showed 21 cycles outside of the confidence interval for the boat velocity and 14–18 cycles outside of it for the behavioral parameters. When the boat velocity and the behavioral parameters were considered together, **Tables 2, 3** highlight that 16 cycles were outside of the confidence interval (accounting for 4.7% of the race time) for the international crew and 26 cycles were outside of it (accounting for 7.2% of the race time) for the national crew. Beyond considering the boat velocity and the behavioral parameters together, the crucial issue was to determine whether the variability in interpersonal coordination could be functional for achieving the task-goal. Indeed, interpersonal coordination variability should not necessarily be construed as noise, detrimental to performance (Newell and Corcos, 1993; Newell et al., 2005, 2006). Nor should it always be viewed as error or deviation from an expert or theoretical model, constantly in need of correction in practitioners (Davids et al., 2006). Interpersonal coordination variability could instead be considered to exemplify the flexibility of rowers to respond to changes in dynamic performance constraints (Davids et al., 2003; Seifert and Davids, 2012; Seifert et al., 2016b). Thus, in line with our hypothesis that rowers might exploit the degeneracy property of perceptual and motor systems to cope with the race constraints (Seifert et al., 2014, 2016b), we have suggested that interpersonal coordination variability was functional when it was associated with performance stability.
From there, we identified three scenarios depending on whether the behavioral variability was functional (i.e., without significant change in boat velocity) or perturbing (i.e., with significant change in boat velocity): functional adaptation (12 cycles for the international crew and 21 cycles for the national crew), behavioral perturbation (4 cycles for the international crew and 3 cycles for the national crew), and velocity perturbation (i.e., when only the boat velocity was affected without any behavioral modification, which concerned 2 cycles of the national crew). For 78% of the time, high behavioral variability was functional because it reflected adaptations to dynamical constraints in order to achieve the task-goal (e.g., the phenomenological data indicated that the rowers' behavioral adaptations were oriented toward acting on the boat direction or its velocity; see the last section for further discussion). However, 22% of the time, high behavioral variability was associated with a perturbation of the boat velocity. According to the magnitude and frequency of the inter-cycle variability of the stroke and bow rowers' respective motor organization, the high behavioral variability came from one rower (3% of the time; mainly the bow rower) or the two rowers simultaneously (14% of the time), or was not associated with the rowers' behavior (5% of the time), confirming that interpersonal coordination in rowing is an important feature of performance (Hill, 2002; de Brouwer et al., 2013; Cuijpers et al., 2015). Our study showed that high variability in interpersonal coordination could occur at both kinematic and kinetic levels; however, the behavioral variability observed in the national crew may have been due to a lack of synchronization in force generation and a significantly greater difference in force impulse between the rowers (**Figures 4**, **5**). The next section discusses how these functional adaptations or perturbations in interpersonal coordination can be explained by a set of interacting constraints, notably the role given to the stroke (leader) and bow (follower) rowers in the crew.
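The three scenarios can be expressed as a simple decision rule, sketched below. The rule follows the definitions given in this section (behavior outside of the confidence interval with or without a significant change in boat velocity); the function name and the boolean inputs are hypothetical, and pooling the two crews to recover the overall share is an assumption about how the 78% figure was obtained.

```python
from collections import Counter

def classify_cycle(behavior_outside: bool, velocity_outside: bool) -> str:
    """Label one cycle from two flags: behavior and boat velocity outside the CI."""
    if behavior_outside and not velocity_outside:
        return "functional adaptation"
    if behavior_outside and velocity_outside:
        return "behavioral perturbation"
    if velocity_outside:
        return "velocity perturbation"
    return "within confidence interval"

# Cycle counts as reported in the text for each crew.
reported = {
    "international": {"functional adaptation": 12, "behavioral perturbation": 4,
                      "velocity perturbation": 0},
    "national": {"functional adaptation": 21, "behavioral perturbation": 3,
                 "velocity perturbation": 2},
}

# Pooling both crews (an assumption) to compare with the overall shares.
pooled = Counter()
for counts in reported.values():
    pooled.update(counts)

total = sum(pooled.values())
functional_share = 100 * pooled["functional adaptation"] / total
print(total, round(functional_share, 1))  # 42 78.6
```

Under this pooling, 33 of 42 cycles are functional adaptations, which is consistent with the roughly 78% reported above.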

# Constraints Influencing the Coordination Pattern Dynamics in Rowing

Our phenomenological data suggested that when rowers did not focus on themselves or their partners, they focused on various task and environmental constraints (e.g., waves, wind, other boats, changes in the river pathway, buoys indicating a certain distance from the end) that could be associated with a destabilization of their interpersonal coordination. As often observed in a range of cyclic movement tasks performed individually (in bimanual coordination, see Kelso, 1984; in postural regulation, see Bardy et al., 2002; in swimming, see Potdevin et al., 2006) or collectively (in the wrist-pendulum paradigm, see Schmidt et al., 1998; in postural regulation, see Varlet et al., 2011; in rowing, see Cuijpers et al., 2015), stroke frequency is a key task constraint that can act as a control parameter. In particular, Cuijpers et al. (2015) showed that when stroke frequency was increased, the synchronization between limbs and between individual actions was also increased. According to our phenomenological data, the rowers often focused on stroke frequency, boat velocity and boat direction, which might have constrained the coordination between the rowers, leading to functional adaptation or perturbation (**Tables 2, 3**).

Interestingly, these constraints interacted with another constraint theoretically given in advance: the role of each rower. As explained in the introduction, although it was expected that the stroke rower would lead the crew, while the bow rower followed the other's lead (Nolte, 2011), our results (**Figures 8**, **9**) showed that the bow rower exhibited higher variability in his/her kinetic and kinematic parameters more often than the stroke rower. These results indicated that the stroke rower had to compensate or communicate with the bow rower to balance the interpersonal coordination (which was also reported by Lund et al., 2012). In fact, the phenomenological data of the national crew (cycle 13, **Table 3**) showed that the bow rower looked for information in his/her environment and even for instructions from the stroke rower, and sometimes asked the stroke rower to do a better job of driving the crew. The kinematic and kinetic gap between the stroke and bow rowers occurred very often for the national crew (**Figure 9**), which sometimes could not be self-regulated by the stroke rower. For instance, the stroke rower of the national crew turned back to communicate with the bow rower when she perceived dysfunction in the interpersonal coordination (cycle 22, **Table 3**). These types of behavior were observed by Sève et al. (2013) and confirmed that being coordinated with one's partner is a feature of expertise in cooperative contexts of performance (Hill, 2002; Baudouin and Hawkins, 2004). As observed in our study, several recent studies have shown that interpersonal coordination can be optimized by using miming and signaling strategies to communicate concerns to a partner (Sacheli et al., 2013; Candidi et al., 2015). The meaning of "rowing together" (Lund et al., 2012) through verbal and nonverbal communication confirms the importance given to both behavioral and phenomenological investigation (De Jaegher and Di Paolo, 2007). 
Indeed, because individuals participate in the "generation of meaning through their bodies and action often engaging in transformational and not merely informational interactions" (p. 39) (Di Paolo et al., 2011), the next section considers how the variability in interpersonal coordination (functional adaptation vs. perturbation) was experienced and shared (De Jaegher and Di Paolo, 2007).

# How the Variability in Interpersonal Coordination Was Experienced and Shared by the Rowers

The combination of phenomenological and behavioral data in our study helped determine whether the functional adaptations or behavioral and velocity perturbations (identified from kinetic and kinematic data) were experienced by the two rowers (a) simultaneously or not simultaneously, (b) as meaningful or meaningless, and (c) as similar or diverging concerns.

Gray zone represents the 95% confidence interval (1.96 SD).

Our first finding was that the behavioral and velocity perturbations were always experienced as meaningful by the rowers, particularly as Simultaneous and Diverging Experiences (SDE). This finding indicates that the rowers were able to spontaneously focus on information about boat direction and velocity, stroke frequency, other boats, buoys in the river, edges and turns in the river, all of which at times engaged their behavior differently and were associated with interpersonal coordination destabilization. The divergence in the two rowers' concerns also suggested that the predetermined roles of the stroke rower (i.e., given as leader) and bow rower (i.e., given as follower) were not always respected in the crew (as expected by the coach who paired the junior women rowers of the national crew). Thus, it can be hypothesized that such divergent concerns explain the destabilization in the interpersonal coordination and the boat velocity perturbations. However, it must be kept in mind that the destabilization in interpersonal coordination was associated with changes in boat velocity only a few times, and when boat velocity was perturbed, the perturbation never lasted for more than three consecutive cycles (according to **Figures 8**, **9**).

Our second finding was that the functional adaptations in the international crew were mainly experienced simultaneously and similarly, sometimes as meaningless and sometimes as meaningful. This emphasizes that at the international level, the rowers were able to exhibit adaptive variability in their behavior (i.e., individual kinetic or kinematic data outside of the confidence interval) and experience it as meaningless (as already underlined by R'Kiouak et al., 2016). In addition, when the rowers experienced a destabilization in their behavior and/or interpersonal coordination as meaningful, they seemed to do so mainly simultaneously and similarly. According to De Jaegher and Di Paolo (2007), this highlights how international rowers can coordinate their experience through interactions and not just physical manifestations. Indeed, as noted by Lund et al. (2012), many times the international rowers both performed and felt the "joint rhythm," suggesting that they were able to feel their partner's actions through the boat velocity variations in order to minimize them (Millar et al., 2013).

Conversely, the two rowers of the national crew alternated between simultaneous and not simultaneous meaningful experiences of their functional adaptations. This finding suggests that a lack of shared experiences would explain why the national crew exhibited more cycles for which the kinetic and/or kinematic data were outside of the confidence interval. Once again, this can be explained by the asymmetric relationship expected by the coach due to the greater experience of the stroke rower in the national crew. As noted by Millar et al. (2013), rowers can alternatingly focus on themselves, their partners and boat behavior, suggesting that sharing simultaneous and similar experiences and behaviors is a highly complex coordination process.

In conclusion, the investigation of how rowers coordinate their behavior and experience helped explain how high variability in interpersonal coordination can be either functional or perturbing; either meaningful or meaningless; and either similar or diverging. The degeneracy property of perceptual and motor systems can help to understand how structural variability of the behavior could be either "functional" (when associated with functional stability, i.e., stability of the boat velocity) or "perturbing" (when associated with a significant change of the boat velocity). However, although boat velocity variations between cycles appeared to be the main indicator of rowing performance, using only this parameter to assess the performance outcome might be a limitation of this study. An additional measure of boat heading might help to explain the adjustments made to the velocity. Phenomenological data helped to mitigate that limitation by gathering information about the purpose of the coordination changes as perceived by the rowers. Indeed, by combining phenomenological and behavioral data, these two case studies showed how constraints (not manipulated by an experimenter but emerging from the ecological context of a race) can be associated with functional adaptations or behavioral perturbations of interpersonal coordination. As already advanced by Millar et al. (2013), our findings suggest that high expertise implies a better feel for one's partner through the boat, which might reflect a greater appropriation of boat behavior. Nevertheless, this interpretation must be further explored with bigger samples of crews.

# AUTHOR CONTRIBUTIONS

The contributions of the authors are as follows: Conception or design of the work: JB, JL, AN, JS. Acquisition of the data: JB, JL, AN, JS. Analysis of the data: JB, DA, LS, JL, JS. Interpretation of data for the work: JB, DA, LS, JL, JS, RT. Drafting the work or revising it critically for important intellectual content: JB, DA, LS, JL, JS, RT. Final approval of the version to be published: JB, DA, LS, JL, JS, RT. Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

# ACKNOWLEDGMENTS

This study was supported by a grant from the Region Pays de la Loire (project ANOPACy) and a grant from the Normandy Region (ID: CPER-GRR "Logistic, Mobility, Numeric," Project XTerM). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

# REFERENCES


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Seifert, Lardy, Bourbousson, Adé, Nordez, Thouvarecq and Saury. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Foregrounding Sociomaterial Practice in Our Understanding of Affordances: The Skilled Intentionality Framework

#### Ludger van Dijk<sup>1</sup> \* and Erik Rietveld<sup>1,2</sup>

<sup>1</sup> Amsterdam Medical Center, Department of Psychiatry and Amsterdam Brain and Cognition, University of Amsterdam, Amsterdam, Netherlands, <sup>2</sup> Department of Philosophy/Institute for Logic, Language and Computation, University of Amsterdam, Amsterdam, Netherlands

Social coordination and affordance perception always take place in concrete situations in real life. Nonetheless, the different fields of ecological psychology studying these phenomena do not seem to make this situated nature an object of study. To integrate both fields and extend the reach of the ecological approach, we introduce the Skilled Intentionality Framework that situates both social coordination and affordance perception within the human form of life and its rich landscape of affordances. We argue that in the human form of life the social and the material are intertwined and best understood as sociomateriality. Taking the form of life as our starting point foregrounds sociomateriality in each perspective we take on engaging with affordances. Using ethnographical examples we show how sociomateriality shows up from three different perspectives we take on affordances in a real-life situation. One perspective shows us a landscape of affordances that the sociomaterial environment offers. Zooming in on this landscape to the perspective of a local observer, we can focus on an individual coordinating with affordances offered by things and other people situated in this landscape. Finally, viewed from within this unfolding activity, we arrive at the person's lived perspective: a field of relevant affordances solicits activity. The Skilled Intentionality Framework offers a way of integrating social coordination and affordance theory by drawing attention to these complementary perspectives. We end by showing a real-life example from the practice of architecture that suggests how this situated view that foregrounds sociomateriality can extend the scope of ecological psychology to forms of so-called "higher" cognition.

Keywords: affordances, social coordination, ecological psychology, enaction, materiality, sociomateriality, Skilled Intentionality Framework, "higher" cognition

". . . I distinguish between the movement of the waters on the river-bed and the shift of the bed itself; though there is not a sharp division of the one from the other." – Wittgenstein (1969, §97)

#### Edited by:

Michael J. Richardson, University of Cincinnati, USA

#### Reviewed by:

Cor Baerveldt, University of Alberta, Canada; Harry Heft, Denison University, USA

\*Correspondence: Ludger van Dijk ludger.vandijk@amc.uva.nl

#### Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

Received: 28 September 2016; Accepted: 05 December 2016; Published: 09 January 2017

#### Citation:

van Dijk L and Rietveld E (2017) Foregrounding Sociomaterial Practice in Our Understanding of Affordances: The Skilled Intentionality Framework. Front. Psychol. 7:1969. doi: 10.3389/fpsyg.2016.01969

# INTRODUCTION

In order to understand human social coordination in a Gibsonian framework it is important to understand in what sense affordances – possibilities for action provided to us by the environment (Gibson, 1979) – are always already situated in the sociomaterial practices that make up our human form of life; i.e., in what sense it is an affordance-in-sociomaterial-practice. Our approach is to combine such theoretical work on affordances with concrete real-life situations of social interaction as described in ethnography. As such this paper is targeted to anyone with an interest in affordances and skilled action, including skilled social action, as encountered, for example, in ecological psychology, philosophy and various domains within the social sciences. Affordances, as "what [the environment] offers the animal, what it provides or furnishes, either for good or ill" (Gibson, 1979, p. 127), allow us to foreground sociomateriality because they do not occur in isolation. Rather, affordances are aspects of the ecological niche of a kind of animal. They thus always figure in a "setting of environmental features" (Gibson, 1979, p. 129), or of multiple affordances (p. 128).

Affordances are thus always situated. Looking around one will notice things and people offering multiple possibilities for action. Sitting in the train for example, I can notice the possibility to drink from a bottle of water, talk to a fellow traveler or return to writing this paragraph. All these affordances belong to a wider socio-cultural context (Hodges and Baron, 1992; Costall, 1995; Ingold, 2000; Heft, 2001; Rietveld, 2008a). For example, I am seated in a "silence area," so talking is not really an option and the bottle of water is not mine, but belongs to my neighbor, so I better not drink from it. Doing not much else than writing in this silence area, I am thus showing a responsiveness to a whole socially significant situation – to the kind of place in which I am situated, to a "behavior setting" (Heft, 2007, 2011). Moreover by typing away in appropriate silence, I contribute to maintaining this behavior setting as a "silence area" indeed. This way of responding and contributing to the maintenance of a behavior setting is a form of social coordination, albeit different from what is typically studied in ecological psychology, because it acknowledges social aspects even in situations where there is no direct interaction with other individuals (but see Barker, 1968; Heft, 2001).

More orthodox forms of social coordination are also found in our example and these are situated as well. In the behavior setting that is the train's silence area, I am actually continuously adjusting my feet and leg placement to accommodate the person opposite me. I pretend not to be annoyed when someone's phone rings, and exchange a brief glance of understanding with someone else equally annoyed. Adapting continuously to the people around me, I let the situation constrain my behavior, and by doing so (still typing in appropriate silence) I again do my part to maintain it. In fact, for any person in this silence area, being responsive to the behavior setting includes both being responsive to the opportunities for action, the affordances, offered by the material environment and to the opportunities for social engagement offered by other people. As we will show, there is no clear separation of the two because in acting skillfully one is attuned to the situation as a whole.

Despite the fact that in such real-life situations affordances and social coordination are situationally intertwined, the contextually situated nature of both is easily overlooked. An affordance for e.g., stair climbing (Warren, 1984) would be responded to very differently if one hopes to get up the old squeaking stairs without waking anyone, and the way one approaches and stops for a red light in a car (Fajen, 2007) changes if one is driving with one's elderly mother-in-law or with a newborn baby in the backseat (see Hodges, 2007). Similarly, judging whether something can be carried together (Richardson et al., 2007) might change as soon as such judgment is required in a different context – say for carrying a coffin at a funeral. The dynamics of the coordinating people that actually carry a coffin would no doubt change as well. In other words, to understand how we respond to affordances offered both by material aspects of the environment and by other people, it is crucial that we understand the practical situation in which such behavior occurs.

Within ecological psychology there is currently a divide between the field of research that focuses on engagement with the affordances offered by the material environment and the field that focuses on social coordination. Moreover, as paradigmatic cases and dedicated methods are developing largely independently in both areas of ecological psychology, they risk focusing on a limited set of phenomena and growing apart further. The above examples, however, indicate that neither work that focuses on affordances in isolation nor on social coordination on its own, can account for the full breadth of skilled involvement of humans in the context of their ecological niche.

In fact, both fields are already focusing mostly on cases of direct "online" behavior (responding either to a current affordance or to another person), but neither foregrounds the larger situational context in which these dynamics unfold. Because of that, ecological psychology has yet to move into the domain labeled "offline" or "higher" cognition, such as dealing with non-existent things (i.e., "representation hungry" problems – see Clark and Toribio, 1994; see also Rietveld and Kiverstein, 2014; Rietveld and Brouwers, 2016; Van Dijk and Withagen, 2016). To get the most out of ecological theory and extend ecological psychology beyond its current scope, we believe we need a framework that integrates both fields in a fundamental way. Doing so, we will claim, requires an understanding of individuals as coordinating with and situated in multiple nested scales of sociomaterial dynamics. We need to understand the human eco-niche as being sociomaterial through and through. Having such an integrative account, we believe, will not only bring the two parts of ecological theory closer together, but will also allow ecological theory as a whole to broaden its scope to include the wide variety of human practices.

To provide such an integrative account and broaden the scope of ecological psychology we will introduce the Skilled Intentionality Framework (SIF). Skilled Intentionality is coordinating with multiple affordances simultaneously. Our main point will be to show that the SIF integrates social coordination and affordances in a fundamental way because it incorporates the notion of sociomateriality. We will do so by first explaining the concept of sociomateriality as found in the field of ethnography in Section "Sociomaterial Entanglement." After this preliminary, we will introduce three complementary perspectives on the human practices and the affordances they imply. In Section "Practices and the Landscape of Affordances" (i) a zoomed out perspective on our human practices as a whole will be provided in which the materiality and standing practices can be identified that constrain an individual's activities. Complementing this perspective the section goes on to discuss (ii) a zoomed in perspective of a local observer looking at people acting in their environment. This second perspective highlights how individuals coordinating together continuously restructure sociomateriality. In Section "Skilled Intentionality" (iii) a third perspective is added: the lived perspective of a skilled individual being responsive to its surroundings.

Each of these perspectives on the human form of life brings an aspect of human involvement into view. On its own, however, any one perspective also loses sight of other aspects of situated coordination, which is the reason why we believe they are best treated as complementary perspectives. The metaphor of zooming in and out enables us to see how the SIF combines these perspectives to arrive at a rich framework for understanding a wide range of human involvement in ecological terms. Using an ethnographical example of the sociomaterial engagement of architects in practice, we will end Section "Skilled Intentionality" by illustrating how integrating social coordination with affordance-theory in a fundamental way, can open up ecological theory to dealing with "hard cases" of the "whole realm of social significance" as Gibson (1979, p. 128, our italics) called it. In the case of architecture this includes real life situations of dealing with non-existent things, such as a vision, drawing or model for a future building.

# SOCIOMATERIAL ENTANGLEMENT

The neighbor's bottle of water encountered while riding the train offered the opportunity to drink from it, even though it would have been highly inappropriate to do so. It offered this affordance rather than, say, throwing it out of the window. The affordance of drinking from a bottle has what Costall (1997) called a "canonical" character. Crucially, this canonical character comes from the bottle figuring in a large "constellation" of practices, as Costall (2012, p. 91) calls it, shared among many individuals. In this way canonical affordances:

"characterize the normative character of the meaning of things. A chair, for example, is for sitting on, even though it may be used in many other ways, e.g., as firewood or for standing upon. The meaning of a chair is defined by its name, sustained and revealed within certain practices, and realized in its very construction." (Costall, 1997, p. 97)

What Costall's notion of canonical affordances stresses is the fact that such affordances are situated not just in the "current" behavior setting, but also in a more encompassing, shared and historically developed constellation – such affordances exist as they persist in shared and social practices (see also Ingold, 2000, pp. 167–168). They exist as many individuals act on them in more or less appropriate ways, in the totality of practices that, together with other affordances, sustain them. For example, citing Dreyfus (1988), Costall points out that a hammer will only be perceived as such against the background of dealing with nails, walls and, say, pictures that afford hanging; i.e., against, what Heidegger (1927/1962, p. 97) called, a "totality of equipment." As such, canonical affordances are part of what we might call a wider "standing practice"; they are relatively persistent material aspects of the practices in our shared sociocultural environment, depending on an entire community of people, yet on no individual in particular.

# Sociomateriality in Practice

In the view that we are developing in this paper, relevant aspects of the environment and of the organism can only be understood in a concrete situation within a constellation of practices. Acknowledging these practices allows us to place socio-cultural aspects of our coordination center stage. However, material aspects of the environment equally partake in the constellation of practices. To see this, the taken for granted conception of materiality as "pre-formed substances" (Orlikowski, 2007, p. 1438) needs to be reconsidered.

Consider for instance research in the social sciences concerned with workplace practices. This research emphasizes how social practices are "inherently bound-up with materiality" such as places, material artifacts, bodies, and, infrastructures (Orlikowski, 2007, p. 1436). Zooming in on concrete situations by means of ethnography, the material and the social turn out to be intertwined in ways that lead researchers undertaking ethnographic studies to speak of "sociomaterial practices" (Mol, 2002; Suchman, 2007). Ethnographer Annemarie Mol illustrates this intertwinement well in discussing her ethnographic work in medical practice:

"[T]he practice of diagnosing and treating diseases inevitably requires cooperation. . . . In the consulting room something is done. . . .[T]wo people are required. A doctor and a patient. . . . The doctor must ask questions and the patient be willing and able to attend to answer them. And in addition to these two people there are other elements that play a more or less important role. The desk, the chairs, the general practitioner, the letter: they all participate in the events . . . As does [the patient's] dog, without whom she might not have even tried to walk more than the fifty meters after which her left leg starts to hurt." (Mol, 2002, pp. 22–23).

This example shows that the particular details (from the desk and the chairs to the patient's dog) of the situations in which we act matter a lot, which is why the constellation of practices shows up as sociomaterial in nature. It is only within the context of this situation that one of the people is primarily a "patient" (rather than say a grandmother or a love interest) and that the piece of paper shows up as an important "letter" from the general practitioner rather than as offering the opportunity to fold it into an airplane. More generally, ethnographic research like this shows that:

"all practices are always and everywhere sociomaterial, and . . . this sociomateriality is constitutive, shaping the contours and possibilities of everyday organizing." (Orlikowski, 2007, p. 1444)

The "possibilities of everyday organizing," the affordances encountered in daily life in the human ecological niche, form within our practices and these affordances are therefore always and everywhere sociomaterial. For example, we saw already that hammers are not just heavy things – their canonical character emphasized that they afford hammering relative to a sociocultural practice. Moreover, now we can see how this in turn affects its materiality: hammers are not just heavy, hammers are heavy enough, they have the appropriate weight to drive in a nail – a hammer's materiality is sociomateriality (see also Heidegger, 1927/1962, p. 98). In the context of our practices, the aspect of the sociomaterial environment also known as a "hammer" affords driving in nails. To be skillful in dealing with a hammer then, is to know your way about a particular aspect of the human form of life (see Wittgenstein, 1953, §123; see also Rietveld and Kiverstein, 2014; Van Dijk and Withagen, 2014).<sup>1</sup>

# Constitutive Entanglement

According to the social scientists quoted above, possibilities for everyday action are sociomaterial in a constitutive way, that is:

"A position of constitutive entanglement does not privilege either humans or technology (in one-way interactions), nor does it link them through a form of mutual reciprocation (in two-way interactions). Instead, the social and the material are considered to be inextricably related — there is no social that is not also material, and no material that is not also social." (Orlikowski, 2007, p. 1437)

This is of fundamental importance and requires some unpacking. The notion of a constitutive entanglement is characterized by two features. First, its various aspects are interdependent and second, it emphasizes that none of the aspects has priority (is "privileged") over another (see also Ter Hark, 1990; Schatzki, 1996; Mol, 2002, p. 133; Orlikowski, 2007; Ingold, 2011; Van Dijk, 2016; Van Dijk and Withagen, 2016). The ethnographic examples above that stress how the social implies materiality and materiality implies the social offer a case in point. But let us look at both features more closely by considering the relations between practices, affordances, activities and sociomateriality.

The relation between a practice and the affordances implied by it can be understood constitutionally. It is an example of a constitutive relation because (i) the practice and the affordances that take shape within it are interdependent: any affordance will imply a practice for realizing it and any practice will imply a landscape of available affordances.<sup>2</sup> Furthermore (ii) practices and affordances do not admit of a prioritization. As we saw above, the affordance to use a hammer is available within the context of our hammering practices and conversely, the hammering practices are maintained by responsiveness to the possibilities for driving nails into walls.

The relation between parts and whole, which we encounter here in the form of activities and practices respectively, can also be understood in constitutive terms. Consider the activities and the individuals partaking in a practice – for example the medical practice we saw in the ethnographic example above. As the situation in the doctor's office unfolds, some person is primarily a "patient" and papers become important "letters." Reciprocally, through these very activities the behavior setting at the doctor's office is maintained: when someone walks in unexpectedly, for example, he will immediately adjust his behavior to fit in with the reserved atmosphere. The medical practices are thus maintained by the activities of the individuals within it. What the constitutive reading stresses is that the whole (the practices) gives form to its parts (the activities unfolding within it) and these activities equally give form to the practices as a whole. In general, in a constitutive entanglement the parts are continuous characteristics of a process – this process then is the continuously forming whole<sup>3</sup> (see Barker, 1968; Shotter, 1983; Ingold, 2000, 2011; Van Dijk, 2016).

What this constitutive entanglement highlights, and ethnographic research helps to make tangible, is that we can take multiple complementary perspectives on the constellation of practices and that we can foreground the sociomateriality in each. First, the fact that in this view practices and affordances are two sides of the same coin, i.e., of the same sociomaterial entanglement of people, activities, places and things, allows us to switch between foregrounding the one or the other. Second, the idea that (individual or joint) activities are constitutionally related to (communal) practices allows us to conceptualize their differences as one of degree rather than kind. As researchers, we can thus think of ourselves as zooming in and out in both space and time on a form of life, to bring different aspects of it into prominence: practices or activities, affordances or their sociomateriality.

To unpack this view further and to bring its implications to bear on the relation between social coordination and affordances in ecological psychology, we will now introduce the Skilled Intentionality Framework that has sociomateriality at its heart. Through the SIF we will show how we can take (i) the zoomed out perspective on (relatively) persistent sociomateriality of a whole form of life (see The Zoomed Out Perspective: Practices and Affordances), (ii) a zoomed in perspective of a local observer looking at sociomateriality in flux (see Zooming in on the Landscape of Affordances) and (iii) the perspective from within an unfolding action as an individual responds to the multiple affordances available to him or her (see The Unfolding Action from the Actor's Lived Perspective). Bringing these perspectives together, we will show in Section "Skilled Intentionality Unfolding in Architectural Practice" how they integrate social coordination and affordances such that ecological psychology will be open to account for real life situations of dealing with non-existent things, such as modeling a future building.

<sup>1</sup>There is an important normative aspect to this kind of skilled engagement in a form of life or socio-cultural practice. Such "situated normativity" can be seen as distinguishing between better and worse (e.g., adequate and inadequate, appropriate and inappropriate, or correct and incorrect) in the context of a concrete situation (Rietveld, 2008a). We have analyzed this kind of situated normativity in earlier work, both for unreflective skilled action (Rietveld, 2008a, 2010) and for more reflective forms of skilled action, such as seeking the right word (Klaassen et al., 2010), evaluating the quality of an architectural design (Rietveld and Brouwers, 2016) or making a correct explicit judgment about something (Rietveld and Kiverstein, 2014; Kiverstein and Rietveld, 2015).

<sup>2</sup>The interdependence is also highlighted in the way Gibson (1979) conceives of the relation between an ecological niche, i.e., a "set of affordances" (Gibson, 1979, p. 128) that implies a way of life, and a kind of animal: "The natural environment offers many ways of life, and different animals have different ways of life. The niche implies a kind of animal, and the animal implies a kind of niche. Note the complementarity of the two." (Gibson, 1979, p. 128).

<sup>3</sup>Note that in this view, as a process the whole is itself never complete. It is fundamentally open to change.

# PRACTICES AND THE LANDSCAPE OF AFFORDANCES

Wittgenstein's concept of the 'form of life' fits in nicely with the constitutive entanglement and the constellation of practices as we identified them in ethnography and Costall's work. The form of life of a kind of animal, as Rietveld and Kiverstein (2014, p. 328) point out, "consists of patterns in its behavior, i.e., relatively stable and regular ways of doing things." Such relatively stable ways of doing things show themselves for example in the regularities that characterize expert practices like architecture, surgery and academia as well as in our everyday activities, such as our appropriate use of chairs or doors and the way we talk about them (Wittgenstein, 1969, §7).

The constitutive character of the relation between activities and the standing practices, i.e., the form of life, implies that activities are sensible aspects only relative to the form of life. An example of this would be our human form of life in which we use, e.g., chairs for sitting and doors to enter or close off a room. Chairs do not play a role in the forms of life of, say, lions or earthworms, but they are relevant in our form of life. Indeed, were we to show genuine surprise or disbelief each time we encountered a chair, we would act inappropriately in a strong sense: we would fail to make sense because we fail to share with others a way of acting, of responding to, everyday things – that is, we fail to share agreement in our form of life, which is an agreement in what people typically do (Wittgenstein, 1953, §123; see Ter Hark, 1990, p. 70; Schatzki, 2002). The meaning and relevance of our activities are constrained by the form of life in which they figure.

To Wittgenstein, through our concrete activities of talking and doing (Moyal-Sharrock, 2004) both in everyday life and in expertise, a river-bed of practices continuously shows itself. These practices constrain the activities – talking and doing – that unfold within them (Rietveld, 2008a), just as the movement of the water is constrained by the river-bed. Reciprocally, the movement also allows for shifts in the river-bed itself:

". . . I distinguish between the movement of the waters on the river-bed and the shift of the bed itself; though there is not a sharp division of the one from the other" (Wittgenstein, 1969, §97)

The Skilled Intentionality Framework (Rietveld, 2012; Bruineberg and Rietveld, 2014, Figure 1; Rietveld and Kiverstein, 2014; Kiverstein and Rietveld, 2015; Rietveld et al., 2016) aligns with Wittgenstein and identifies the constellation of sociomaterial practices we encountered above as our human form of life. The form of life consists, in other words, of our actively maintained standing practices – our regular ways of doing things:

"What is common to human beings is not just the biology we share but also our being embedded in sociocultural practices: our sharing steady ways of living with others, our relatively stable ways of going on." (Rietveld and Kiverstein, 2014, p. 329)

By taking up the Wittgensteinian concept of the form of life, the SIF opens up ecological psychology to sociomaterial aspects of the world. It includes the constraining influence of material properties but also our shared practical understanding of the affordances offered by buildings, chairs, silence areas and other people (Rietveld and Kiverstein, 2014, p. 329 ff.; see Kiverstein and Rietveld, 2015, p. 15).

# The Zoomed Out Perspective: Practices and Affordances

Notice that the form of life, as relatively stable and regular patterns of behavior, is perfectly concrete. We can see this clearly when we as behavioral scientists or philosophers "zoom out" from an individual's activity in a sociomaterial situation to a perspective that allows us to discern patterns in the behavior of a community of people at a larger spatiotemporal grain. Think, for example, of the regularities one would notice when watching a time-lapse recording taken from above of New York's Paley Park or Amsterdam's Vondelpark (see Whyte, 1980; Rietveld and Kiverstein, 2014). Regular ways of doing things appear across, e.g., seasons or times of day, and depend on, for example, the sociomaterial aspects of the environment such as benches, ponds and paths. For example, we will see people walk on paths and to a lesser extent on grass, but not on ponds, except when the water of the ponds is frozen.

We will call this view that aims to overlook such regularities of the form of life from a time-lapse camera a "zoomed out" perspective. Zooming in we see individuals caught up with people and things in multiple ongoing activities, but zooming out we notice their regularities: persistent practices – that is to say, the stable patterns of behaving that characterize the form of life. By calling attention to the form of life, the SIF aims to make the regularities that our activities exhibit tangible, and show how these regularities are sociomaterial and therefore aspects of the environment that are available for coordinating with.

Now in order to have a notion of affordances that acknowledges these large scale regularities and that is therefore open to sociomaterial practice, affordances are defined within the SIF as related to the form of life: an affordance is a relation between an aspect of the sociomaterial environment in flux and an ability available in a form of life (Rietveld and Kiverstein, 2014, p. 335; see also Chemero, 2009; Rietveld, 2016; Rietveld and Brouwers, 2016). From this definition it is clear that this conception of affordances aims to emphasize the entanglement of the ever changing sociomaterial environment and the abilities that continuously form within this environment. Note that the definition does not imply a prior separation of its relata.

Defining affordances relative to a form of life turns the materiality of affordances into sociomateriality in the human case. It allows us to make sense of a chair not just as a place to sit but, as we shall see, as a chair as it figures in its many ways in our human practices, inviting sitting, but also naming, pointing to or marveling at in a museum (Withagen et al., 2012; Rietveld and Kiverstein, 2014). Similarly, doors are not only hinging vertical surfaces, but are doors that can solicit opening or keeping closed. And, as we shall see, it allows us to understand how stones can afford throwing in one situation and afford being a paper weight in the next. All these affordances are situated, but concretely available aspects of the sociomaterial environment to coordinate to. We can see that they function in these manifold ways if we zoom out on our practices in space and time to notice how chairs, doors, benches, paths and ponds are entangled within and across concrete situations.

The Skilled Intentionality Framework's sociomaterial notion of affordances is more inclusive than the traditional, purely "material" one. Nonetheless, these affordances still pose enormous (material and social) constraints on the possibilities available to an individual, to the extent that their materiality can appear to be "subject to no alteration or only to an imperceptible one" (Wittgenstein, 1969, §99). Thus although an office chair not only allows sitting, but now also affords for example calling it "an office chair," the material reality of this aspect of the sociomaterial environment still does not allow us to fly to Baghdad on it (cf. Cutting, 1982). So the SIF's notion of affordances is constrained by material reality. As we will explain below and in the next section, however, the situated nature of encountering a chair as sociomaterial allows us to also make sense of the fact that a skilled individual will typically be constrained even further – he or she will for example not be inclined to point and call out "an office chair!" save when surprised to find one, as for example in a museum (compare this to a young child). In short, the SIF considers our human actions to be constrained by (and responsive to) not just the environment's materiality, but its (broader and irreducible) sociomateriality.

# Zooming in on the Landscape of Affordances

Having defined affordances relationally and in terms of the form of life, an important re-orientation realized by the Skilled Intentionality Framework is that it allows us to switch between the standing practices and the affordances that they imply. Given that in our human form of life there are many aspects of the sociomaterial environment and many abilities available, these standing practices can thus be seen as unfolding in a relatively persisting rich "landscape of affordances" (Rietveld and Kiverstein, 2014).

By calling attention to the landscape of affordances within a form of life, the SIF allows us not only to understand the practices from the perspective of the affordances that they imply (in the park, for example, including the action possibilities offered by benches, paths and ponds), but it also allows us to zoom in on concrete and situated activities that constitute the various grains of the form of life (see Rietveld and Kiverstein, 2014). Specifically, we can zoom in to the spatiotemporal grain where individuals live (e.g., sitting on benches but not swimming on benches) – that is, the zoomed in perspective of the local observer. This is the perspective where, e.g., the dynamics of a behavior setting, a place or another concrete situation unfold as observed from the perspective of a behavioral scientist or as modeled by coordination dynamics.

Notice, however, that this zoomed in perspective, while highlighting the unfolding dynamics, obscures large-scale regularities. Just as one cannot observe someone's habits if one just observes the person for a few seconds, one cannot observe a practice by watching it briefly. From the zoomed in perspective we are standing too close to see the regularity of the engagement with affordances as it occurs in an entire sociomaterial practice (say the practice of architecture). Yet we do see another aspect of this landscape: we see how the details of the sociomaterial environment are changing and affordances are forming in the sociomaterial entanglement of people coordinating with others and materials in real-time. To make this concrete, let us turn to an example of sociomaterial coordination in action.

### Sociomaterial Constitution in Practice

As an example of the sociomateriality of the landscape of affordances in flux, consider a situation in which one is having a coffee with a friend at a coffee bar. Coffee bars have become part of our human form of life; it is a behavior setting where the "recurrent features" of the coffee bar "both become[. . .] a resource for, and [are] organized by, customers speaking together" in coffee bars (Laurier, 2008, p. 168). Zooming in to the scale of the skilled individual entering into such a place for a drink, the way the room is furnished, the walls, the tables, the bar, the chairs, the people, turn out to entangle into a rich landscape of affordances in flux that enable and constrain the activities of an individual entering into the behavior setting: the welcoming smile of the waiter offers the affordance of ordering coffee, the friend affords having a conversation, the coffee cup affords grasping, the spoon stirring, the coffee drinking, the biscuit eating and the people to the right afford glancing at. Moreover, somewhere on the horizon of this situation, the 4PM train the person is planning to take back home will afford catching.

Looking at the sociomateriality in flux, we can see how social coordination and materiality are intermingling as affordances show up. As different affordances are coordinated with and responded to in appropriate ways, they change the sociomaterial environment – and thus the landscape of affordances shaping the unfolding situation. For example, during the conversation, the affordances of the words spoken by the friend and the affording coffee are coordinated with and get intertwined: "[T]he very fact of drinking . . . eases the conversation along. . . . Alongside this . . . the movements and objects that accompany drinking become resources in talking together" (Laurier, 2008, p. 178).

Consider how, at a later moment in the ethnographic transcript, a detail like placing a cup (in this case a glass) helps to shape the affordance to leave the coffee bar:

"After this quick sip F makes a charming and classic gesture of having finished with her drink even though the glass is not empty when she puts it back on the table: she pushes it away from her. The glass ending up slightly beyond the can of coke, a visible adjustment to the previous repeated return point of the glass to the table. By her pushing it away, she is establishing it, at this point in the unfolding action, as potentially the last sip from the glass.... F displays in this gesture, that she has noticed that B has finished her coffee and is now making available to B that they are potentially both finished with their drinks." (Laurier, 2008, p. 175)

By gesturing ("social") with the glass ("material") and simultaneously changing the layout of the table ("material") in subtle ways, a small part of the sociomaterial environment as a whole is reconfigured.<sup>4</sup> Doing so, the local landscape of affordances changes for both people situated in it and the affordance to leave can become one shared relevant possibility among others.

Notice that from this perspective on a complex yet everyday situation in flux, as from the perspective on the form of life from afar, again everything is social and everything is material to some degree. Situated here in the landscape, the coffee spoon, the cup, the chairs, intermingle to become "resources in talking together" (Laurier, 2008, p. 178). Their materiality constrains the situation and helps to form a temporary "social synergy" (Marsh, 2015) that engages and constrains the behavior of both persons: "The unit they have formed will be resistant to forces that temporarily perturb the action" (Marsh, 2015, p. 23). Even the affordance to stand up and leave, which is coordinated to in the coffee bar situation by the two skilled talkers, is sociomaterial: the flow of the conversation, the gesture and the change in table configuration all enabled it. One would not manifest much skill in conversing if one were to stand up and leave the conversation halfway through an unfinished sentence. The relevance of the possibility to stand up in this particular situation here and now, is neither just related to an embodied ability, nor is it just material or social – it is related to the constitutive entanglement of ability and sociomateriality.

### Persistence through Change

This sociomateriality of affordances can be further highlighted when we imagine it is getting late and the 4PM train that one of the friends in the coffee bar needs to catch will leave very soon – she is in a hurry. When in a hurry, this concern of the individual will extend her situation, which means that it includes coordinating to a larger part of the landscape in which the individual is situated. Including the distant departure of the 4PM train within the situation will moreover re-configure many other sociomaterial aspects. For example, the frequency of the sips of coffee by the rushed person will increase and the topic discussed in conversation may be constrained. A moment of harmonious silence can now be the kind of opening in the conversation that moves the person to a slap on the thigh and the remark that it is time to leave (see Laurier, 2008). Although much in the behavior setting remains the same, in the newly unfolding sociomaterial context many affordances also change – even the temperature at which the coffee will afford drinking will be higher. In short, parts of the sociomaterial environment and the resulting behavior patterns are continuously re-arranged and reconfigured and other affordances enter the situation and dissipate as the departure time of the train approaches or the coffee is finished.<sup>5</sup>

From the perspective of a local observer that we adopted when zooming in on persisting practice in our form of life, we can thus see individuals in the process of coordinating skillfully. Any particular action within this process will be constrained by the available possibilities for acting in the form of life that we zoomed in on, and that the individuals that we see grew up in. As a particular action is unfolding, the particular sociomateriality of the local landscape of affordances will constrain the available actions further still – my pen with red ink will not afford drawing a blue line (see Rietveld and Kiverstein, 2014, p. 344) even though drawing blue lines is certainly a possibility available in our form of life.

Nonetheless, there is an important amount of uncertainty for the local observer, because the observed action can always continue in several directions – it has a kind of indeterminacy (Shotter, 1983; Schatzki, 2012, pp. 19–20) in the sense that what is done is yet to be determined – and can only be determined by the observer after the observed activity has been performed. In other words, viewed in terms of affordances: from the perspective of a local observer someone else's particular unfolding action at a particular location in the landscape implies a multitude of possibilities which decrease in number as his or her action unfolds further until only one is realized (even though the situation will continue to afford many more actions).

To give an example of this increasing determinacy of action, consider that increasing the frequency of the sips of coffee allows not just for getting the 4PM train, but also for entirely different affordances such as making it to the bakery before it closes and for catching the 4PM movie. These possibilities are all available in the landscape of affordances in which the individual is situated. By continuously being responsive to (and constrained by) the relevant affordances available in the landscape however, the person turns left toward the station rather than right toward the bakery for example, and the possibility to go to the bakery moves further out of the individual's situation (and other possibilities move in). All the while moreover the possibility to catch the 4PM train not only persists but also gains determinacy: the coordinated activity that started with increasing the frequency of sips at the coffee bar, ends in allowing little more than catching the 4PM train by jumping through the aperture of the closing doors of the train at the platform. At that point, the coordinated activity has realized the affordance to catch the 4PM train through coordinating sociomaterial aspects in a particular way, as was seen by the local observer, while of course many new possibilities for action have already entered the situation in which the individual is located.

Zooming in on the nested actions within the catching of the train, we see the same increasing determinacy of the coordinated activity even more clearly. Again the space of possibilities available while acting will be constrained by the form of life, including the nesting affordance of catching the 4PM train and the behavior setting that the individual is a part of. The local observer might see for example that the person in the coffee bar is slightly moving forward toward the table. Limited by the narrow scope of the perspective of the local observer, it is uncertain what will happen next. Moving toward the table brings many affordances "within reach": the possibility to stand up, to knock on the table, to indicate to a friend that it is time to leave, or to grasp the coffee cup. The action possibilities available in the local landscape are many. Again however, a person who lets herself be moved by the demands of the whole situation (including the 4PM train) responds in a way that the affordances attainable will, from the perspective of a local observer, decrease in number as her action unfolds until she in fact reaches and grasps the cup of coffee that she goes on to finish quickly. The indeterminacy of each act is continuously reduced during its unfolding until the relevant affordances are enacted in a certain unfolding sequence which reconfigures the particular sociomaterial entanglement.

<sup>4</sup>Note that a closer examination of the gesturing, glass and table layout would reveal that all three are sociomaterial themselves rather than either social or material.

<sup>5</sup>What this approach does is take seriously the flux of the sociomaterial environment (see Ingold, 2000, 2011 for the importance of this). This is crucial for dealing with many real-life situations. Think for instance of situations of crossing a busy street on foot. The affordances for crossing the street open up and dissipate (discontinuously) all the time.

What this zoomed in perspective shows is that the sociomateriality of the landscape of affordances that appears relatively persistent from a zoomed out perspective is in flux from the perspective of a local observer. When an individual acts he or she entangles sociomateriality and contributes to the regular ways of doing things available in the form of life. Which regularity of all the available regularities in the form of life the unfolding activity strengthens, however, is determined by what the individual does; in the unfolding sequence of the individual's concrete activities. As the 4PM train is caught for example, it adds to our standing practice of catching trains, but not to that of going to the movie or to the bakery. In doing so, the skillful responsiveness of an individual's situated activities contributed a tiny bit to keeping the affordance of taking trains available in our form of life – the individual enacted the affordance of catching trains (cf. Shotter, 1983). The landscape of affordances that is seen as persisting from a zoomed out perspective, turns out to be maintained, from the zoomed in observer's perspective, by the multitude of ways in which ongoing coordination is entangled in sociomaterial situations in flux.

# SKILLED INTENTIONALITY

In order to integrate social coordination and affordances into a common framework, the foregoing discussion of the SIF showed how social coordination and materiality are situated and entangled in the affordances available in the form of life, which we can see from a zoomed out perspective (the first perspective discussed). We moreover showed what such continuous intermingling looks like in terms of the ongoing coordination we find from a zoomed in (local observer's) perspective on real-life situations (second perspective). However, we also aim to show how responding to affordances is always unfolding in concrete situations and how accounting for this in an integrative framework could extend the scope of ecological psychology. To show this we need to provide a third perspective on the form of life: we need the actor's lived perspective which foregrounds the responsiveness to multiple relevant affordances of an individual that developed his or her skills within the form of life we are considering.

# Acting within a Field of Relevant Affordances

As mentioned above, Skilled Intentionality is defined as coordinating with multiple affordances simultaneously in a concrete situation (Rietveld et al., 2016). Individuals are enmeshed in a constellation of practices; in a form of life. In the SIF, acting individuals can be thought of as continuously forming aspects of the sociomaterial environment and thus as part of the landscape of affordances.

Skilled individuals are already entangled within the landscape of affordances (i.e., their partaking is implied by the "abilities" part of the definition of affordances as relations between aspects of the sociomaterial environment in flux and abilities available in a form of life). They can have access to a part of the landscape in so far as they have the skills to act on it (Noë, 2012).<sup>6</sup> A skilled individual engages with, and continuously develops within a part of the landscape he or she cares about, which is lived as the "field of relevant affordances."

The field of relevant affordances consists of the affordances that are currently significant to a skilled individual as he or she is engaging with a concrete situation. As mentioned above, it refers to the lived perspective, opened by the individual's abilities and concerns, on a part of the landscape of affordances in flux. Experientially the field of affordances is made up by the relevant affordances that "stand out" among the rest of the landscape of affordances (De Haan et al., 2013; Bruineberg and Rietveld, 2014; Kiverstein, 2016). These attractive affordances are described as soliciting, or inviting, behavior (Dreyfus and Kelly, 2007; Withagen et al., 2012; Rietveld and Kiverstein, 2014). The soliciting character of these relevant affordances is the experiential equivalent of a bodily "action readiness" on the part of the skilled individual (Frijda, 1986, 2007). This preparation to act on relevant affordances is possible because of the abilities the individual has acquired thanks to a history of interactions in sociomaterial practices (Rietveld, 2008a).

These relevant affordance-related states of action readiness are crucial for understanding the interdependence of the skilled individual and his or her evolving situation as can be observed by a local observer or scientist (Bruineberg and Rietveld, 2014). Briefly, according to the SIF, a skilled individual has developed her abilities within the dynamics of the landscape of affordances of a form of life. The individual's intrinsic dynamics can be understood as multiple bodily states of action readiness that are attuned to the relevant affordances in the situation. States of action readiness are reciprocally coupled to the landscape of affordances, in the sense that these states of action readiness self-organize and shape the selective openness to the landscape of affordances for the individual to accommodate the skilled individual's concerns, i.e., to allow him or her to maintain or obtain sufficient grip on the situation. In this way, some affordances in the landscape show up as more and some as less relevant to the individual's unfolding activities. The intrinsic dynamics of the individual's states of action readiness thus allows for a selective openness to be responsive to the relevant affordances the individual encounters as he or she acts (for more on this see De Haan et al., 2013; Bruineberg and Rietveld, 2014; Kiverstein and Rietveld, 2015; Bruineberg et al., 2016).

<sup>6</sup>Note that certain types of power and exclusion in society can make it the case that someone with the right skills still does not succeed in getting access to an affordance. A discussion of this political dimension of affordances in the context of the SIF will have to wait for another occasion.

# The Unfolding Action from the Actor's Lived Perspective

The field of affordances brings us a final and crucial viewpoint on the form of life: it complements the first zoomed out, overlooking, perspective and the second zoomed in perspective of the local observer of someone's activities with a re-orientation on the latter perspective, now from within the actions as they unfold. That is, thirdly, it adds to our perspectives on the form of life the means to understand the actor's lived perspective – his or her first person experience. Viewed from within, the evolving landscape of affordances appears both as soliciting and as persisting. That is, while we lose track of some of the flux of patterned human activity over time, we gain a sense of skilled intentionality and a renewed view on the persistence of affordances.

Recall how the relative persistence of regularities in standing practices made way for continuous change of the sociomaterial environment as we zoomed in on the landscape of affordances. Now that we re-orient our perspective toward the actor's lived perspective, these two phenomena – the persistence of the form of life that an individual grows up in, and the continuous change in the sociomaterial environment that acting implies – can be reconnected. To see how this would work we need to consider the continuity in the history of the skilled individual who acts.

From the lived perspective we experience the landscape of affordances in flux from within, on the basis of the continuity of our own history of skills as we have been growing up within our form of life (that we can see from a zoomed out perspective). In other words, our individual familiarity with our form of life is based on our history of skilled engagement<sup>7</sup> (see also Heft, 1996; Rietveld, 2008a; Myin, 2016; Van Dijk and Withagen, 2016). From the lived perspective, the landscape of affordances in flux (that we could identify from a zoomed in observer's perspective) shows up in terms of a multitude of possibilities for acting that are relevant to someone's life and current concerns and solicit him or her to act on them. When acting to catch the 4PM train, the individual's particular history within the form of life enables the person to be selectively responsive to those relevant affordances that move him/her toward the train station. In spite of the flux of the situation in which the skilled individual is engaged the person selectively responds to the affordances relevant to him/her.

With the actor's lived perspective on the landscape of affordances in flux we thus connect with both the zoomed out perspective on the form of life and with the zoomed in perspective of the local observer. Notice that, as we have seen above, by selectively responding to the soliciting affordances in the individual's field of relevant affordances, the skilled individual is in the process of contributing to the maintenance of these affordances as available in the form of life as a whole. Thus, the increasing determinacy of the act that we saw from the zoomed in perspective on the landscape of affordances can return in the lived perspective. Here, however, it has a different character: we propose that a skilled individual can experience the increasing determinacy of action from within the unfolding act as "directedness" toward the relevant affordances available in the form of life that she is in the process of enacting. This unfolding enactment can be experienced pre-reflectively as having an "intentional" character (cf. Shotter, 1983; Heft, 1989). Unlike the local observer, the acting individual herself will relatively seldom be uncertain about or surprised by the things that she does during the day, because action switches are often already announced by the pre-reflectively experienced attraction/allure of some of the relevant affordances in the field. Considering the skillful responsiveness to multiple nesting and nested affordances simultaneously, i.e., the responsiveness to a whole field of relevant affordances, an individual then manifests skilled intentionality in the context of his or her form of life (see Rietveld, 2008b; Bruineberg and Rietveld, 2014).

# Skilled Intentionality Unfolding in Architectural Practice

Finally, to show the merit of having these three complementary perspectives at our disposal within the Skilled Intentionality Framework, we turn to a case of skilled individuals who have learned to respond skillfully to the sociomaterial practices they are part of: we turn to architects working on a future building (a mobile sculpture). In doing so we aim to show how the sociomateriality of affordances can open up ecological theory to enable it to deal with what traditionally are considered "hard cases" such as dealing with non-existent things. Against the background of this example, in the discussion that follows we return to our starting point and consider the relation between social coordination and engagement with affordances by discussing how the SIF incorporates each and how it invites us to take a more situated approach in each case.

The nice thing about conceptualizing affordances as belonging to a form of life – i.e., in a fundamentally sociomaterial way – is that it can allow ecological psychology to move beyond the concrete-abstract distinction that is omnipresent in cognitive science. Consider a case where architects are designing a large mobile sculpture (taken from Rietveld and Brouwers, 2016). This sculpture, which is the size of a small house, is heavy and constructively requires such a large rear wheel that it might compromise the esthetics of the work of art. The architects start to determine how to proceed, using several affordances offered by the sociomaterial environment and creating some new ones:

"[Junior project leader AM] clicks on her computer, moving and changing lines, perspectives, colors, and scales; she makes adjustments and new sketches to then again revise these by adjusting lines and so on. . . . [S]he prints the five designs [and] walks over to [architect] RR, puts the printed drawings in front of him on the table when, while keeping their eyes focused on the prints, they pull up their chairs and stoop over the five designs. . . .

RR picks up a pen, ticks off the second design, and then strikes it through: 'This isn't good.' He then checks the fifth design: 'Can the wheel rotate/turn around here?' And what about the side-view/profile, what does it look like here?' AM responds in a somewhat doubtful way, after which RR also strikes out this design. 'Look, the wheel does not nicely connect here, in the other alternatives you have created more space at this point."' (Rietveld and Brouwers, 2016, p. 9)

<sup>7</sup>Below we will see that abilities are acquired in concrete situations in sociomaterial practice. This process of enskilment is typically scaffolded by more experienced practitioners in a process that can be characterized as education of attention (see Rietveld, 2008a; Rietveld and Kiverstein, 2014).

Notice the many affordances that solicit and are acted on in a coordinated fashion: lines of the computer solicit changing, chairs afford pulling up and sitting next to each other, the pen solicits picking up and writing, a question affords answering, and the printed drawings solicit several comparisons. Coordinating with these nested affordances, just as in the 4PM train-example, entangles sociomaterial aspects as it enacts the nesting affordance of developing a good design.

Recall how the activity of catching the 4PM train was increasingly determined, and the affordance of taking trains enacted, in simultaneously coordinating to the standing practices in which trains run on time, the possibility of getting to the train station, of drinking coffee, and of paying the bill. Similarly here, in the process of realizing a satisfying design for the mobile sculpture, the design is increasingly determined by acting in accordance with the practices where the final design will have its place (e.g., as a mobile sculpture for public use and as part of an art collection) and simultaneously coordinating to the affordances offered by the printed drawings, the movable 3D lines, the pen and a collaborator (Rietveld and Brouwers, 2016).

The right design therefore does not need to be "determined" in advance. There is no fully specified picture or description of the end result on the basis of which the design is realized. On the contrary, the design is realized in practice because it is getting increasingly determined or developed in acting within the landscape of affordances. The process of designing the sculpture can even have the determining, directed, character of nesting affordance for the architects, because they are in the process of enacting a satisfying design. For example, having the five different printed drawings affords a more precise conversation, within which they are evaluated one by one, compared and discarded in the process until finally one, it turns out, is selected for further development, which improves the architects' grip on the final design and resolves a feeling of dissatisfaction or discontent (see Rietveld and Brouwers, 2016).

This kind of skilled intentionality is founded on a history of engaging in the relevant practices in which many details of the sociomaterial environment have been encountered (Rietveld, 2008a; Rietveld and Kiverstein, 2014; Myin, 2016). Having the ability to act in accordance with both the point and the details of the sociomaterial practice is having skill – in this case an architect's skill. In the process of coordinating with an evolving field of relevant affordances offered by the sociomaterial environment, the architects tend toward grip on their design. Thus, although during this episode the mobile sculpture was still non-existent in a sense, it is perfectly concrete as the coordination with sociomaterial aspects of the environment is realizing the affordance of designing a mobile sculpture.

# CONCLUDING REMARKS

By making the human form of life the starting point for ecological psychology, and thus foregrounding sociomateriality in each situation, this paper showed how the Skilled Intentionality Framework integrates affordances with social coordination. In the SIF affordances are defined as relations between aspects of the sociomaterial environment in flux and abilities available in a form of life. By showing how the sociomaterial entanglement re-appears when taking three different perspectives – (i) the zoomed out perspective on the whole form of life and the persistent landscape of affordances it implies, (ii) the zoomed in perspective of a behavioral scientist or dynamicist observing an actor located at a particular place in the landscape of affordances, and (iii) the lived perspective of a person engaged in action – we showed what a situated, integrated take on engagement with affordances looks like. Thus we showed how different aspects of the notion of affordances and of coordination fit in: while theories and methods of (social) coordination tend to focus on the zoomed in observer's perspective on the (inter)actions within an evolving landscape of affordances, those studying affordance perception are mostly focusing on affordances as we encounter them from our lived perspective – as agent-scaled perceived resources for action (e.g., Warren, 1984; Oudejans et al., 1996).

However, as we have stressed throughout this paper, these different viewpoints offer complementary (and not necessarily exhaustive) perspectives on the sociomaterial entanglement of the form of life as a whole (see Klaassen et al., 2010). The perspectives suggest that both fields of ecological psychology could consider broadening their scope in two principled ways. First, they could broaden the range of phenomena within their own perspective. As we have detailed, one is never coordinating with other people in isolation. The sociomateriality of the landscape of affordances in flux urges the study of social coordination to include coordination with materiality, i.e., as sociomaterial synergies. Moreover, zooming out emphasizes a focus not just on the scale of immediate interpersonal (e.g., dyadic) interaction, but also on nesting scales of coordinating (Wijnants et al., 2012; Schmidt et al., 2014) with more distant dealings and places (Heft, 1996; Bruineberg and Rietveld, 2014; Van Dijk and Withagen, 2016) and perhaps even entire practices (Rietveld and Brouwers, 2016) and language games (Rietveld and Kiverstein, 2014; Van Dijk, 2016). Furthermore, from a lived perspective, one never encounters an affordance in isolation. Studies on affordances should thus take note of the sociomaterial context by studying affordance perception in the context of a field of relevant affordances embedded in a behavior setting (Heft, 2007, 2012) and/or a sociomaterial practice (Rietveld and Brouwers, 2016; Rietveld et al., 2016). By taking a more situated approach, both fields can thus contribute to the same overall goal of extending the reach of ecological psychology toward dealing with cases of so-called "higher" cognition.

Second, as each of the three perspectives discussed foregrounds different aspects of the form of life, but backgrounds or neglects others, we believe each field should aim to keep an eye on at least one other perspective on the form of life to be able to claim to see the whole picture. Ethnography highlights the importance of this as it aims to link a zoomed in perspective on concrete situations, or interviews based on individuals' lived perspectives, to the regularities at the level of the sociomaterial practice as a whole; i.e., to what we have called a zoomed out view on the form of life. Ecological psychology could thus benefit from including ethnographical methods and social sciences that thematize the patterned practice of the form of life (e.g., Roepstorff, 2008). That way we can get a clear view of the richness of the landscape of available affordances offered by our evolving sociomaterial environment (e.g., Malafouris, 2014) as it persists and changes within cultures, communities, and behavior settings.

The view that we have presented in its multitude of perspectives on the whole does not need to rely on (ontological) priorities, in the sense that it does not need to presuppose a hierarchical and pre-structured world (see Van Dijk and Withagen, 2014; Van Dijk, 2016; see also Hodges and Baron, 1992; Ingold, 2011). For example, the notion of "higher" cognition that we discussed is indicative of a supposed hierarchy, but its use can be avoided once we realize that the phenomenon the notion aims to single out (e.g., architects designing a novel sculpture) amounts to adequately coordinating with multiple affordances simultaneously across increasing scales of sociomateriality. Conceptually this required the notion of a constitutive entanglement. One of the merits of our view on the constitutive entanglement is that it ties our concepts in with a process that constitutes the whole while forming its parts. In this way it opens up our theory to the scrutiny of dynamical methods, in which it is common to distinguish between macro-level patterns of activity and micro-level patterns of activity. For example, we can formalize the dynamics of the agent-environment system as a whole, or focus on a part and use the tools and concepts of dynamical systems theory to increase our understanding of the dynamics of multiple simultaneous affordance-related states of action readiness (see Bruineberg and Rietveld, 2014; Bruineberg et al., 2016).

Finally, in a fragment quoted by Costall (1997), Gibson tried to clarify how affordances are objective and subjective through their relational persistence: "Affordances are both objective and persisting and, at the same time, subjective, because they relate to the species or individual for whom something is afforded" (Gibson, 1982, p. 237). Our view makes sense of this idea by showing how the distinction between the individual and the "species" is not the most relevant one. By rather talking about a form of life by focusing more on "how an animal lives than [on] where it lives" (Gibson, 1979, p. 128), i.e., on its way of life, we showed that affordances are both persisting environmental resources which can solicit an individual and persisting relations in the ecological niche. Yet they are continuously forming in the multitude of our activities that make up our form of life.

# AUTHOR CONTRIBUTIONS

LvD and ER: conception of the work, drafting the work.

# FUNDING

We gratefully acknowledge the support obtained from the European Research Council in the form of an ERC Starting Grant (679190) awarded to ER.

# ACKNOWLEDGMENTS

We are indebted to Julian Kiverstein, Martin Stokhof, Jelle Bruineberg, and Rob Withagen for our discussions and for their insightful comments on earlier versions of this paper. We thank Azille Coetzee for proofreading our text. We are moreover grateful to Harry Heft and Cor Baerveldt for their helpful suggestions.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 van Dijk and Rietveld. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Dynamics of Social Interaction: Kinematic Analysis of a Joint Action

Quentin Moreau<sup>1,2,3</sup>\*, Lucie Galvan<sup>2</sup>, Tatjana A. Nazir<sup>2</sup> and Yves Paulignan<sup>2</sup>

<sup>1</sup> "Sapienza" Università di Roma, Dipartimento di Psicologia, Roma, Italy, <sup>2</sup> Laboratoire sur le Langage, le Cerveau et la Cognition, Institut des Sciences Cognitives, Centre National de la Recherche Scientifique – Université Claude-Bernard Lyon 1, Lyon, France, <sup>3</sup> Istituto di Ricovero e Cura a Carattere Scientifico, Fondazione Santa Lucia, Laboratorio di Neuroscienze Sociali, Roma, Italy

Non-verbal social interaction between humans requires accurate understanding of the others' actions. The cognitivist approach suggests that successful interaction depends on the creation of a shared representation of the task, where the pairing of perceptive and motor systems of partners allows inclusion of the other's goal into the overarching representation. Activity of the Mirror Neurons System (MNS) is thought to be a crucial mechanism linking two individuals during a joint action through action observation. The construction of a shared representation of an interaction (i.e., joint action) depends upon sensorimotor cognitive processes that modulate the ability to adapt in time and space. We attempted to detect individuals' behavioral/kinematic change resulting in a global amelioration of performance for both subjects when a common representation of the action is built using a repetitive joint action. We asked pairs of subjects to carry out a simple task where one puts a base in the middle of a table and the other places a parallelepiped fitting into the base, the crucial manipulation being that participants switched roles during the experiment. We aimed to show that a full comprehension of a joint action is not an automatic process. We found that, before switching the interactional role, the participant initially placing the base orientated it in a way that led to an uncomfortable action for participants placing the parallelepiped. However, after switching roles, the action's kinematics by the participant who places the base changed in order to facilitate the action of the other. More precisely, our data shows significant modulation of the base angle in order to ease the completion of the joint action, highlighting the fact that a shared knowledge of the complete action facilitates the generation of a common representation. 
This evidence suggests the ability to establish an efficient shared representation of a joint action benefits from physically taking our partner's perspective because simply observing the actions of others may not be enough.

Keywords: joint action, kinematics and dynamics, social interactions, reach-to-grasp, motor system

#### Edited by:

Michael J. Richardson, University of Cincinnati, USA

#### Reviewed by:

Lynden K. Miles, University of Aberdeen, UK; Jessica Lindblom, University of Skövde, Sweden

\*Correspondence: Quentin Moreau quentin\_moreau6@orange.fr

#### Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

Received: 31 May 2016; Accepted: 12 December 2016; Published: 27 December 2016

#### Citation:

Moreau Q, Galvan L, Nazir TA and Paulignan Y (2016) Dynamics of Social Interaction: Kinematic Analysis of a Joint Action. Front. Psychol. 7:2016. doi: 10.3389/fpsyg.2016.02016

# INTRODUCTION

Humans are constantly communicating with their fellows. Of all the great apes, it was only humans who developed a complex verbal language allowing us to communicate our wishes, our intentions and our feelings. Leibniz (1765) described language as the mirror of understanding, a powerful instrument used by an individual to express their own internal processes and to describe external objects for others and consequently socially interact with them. However, social communication is not limited to just verbal communication. Through their behavior, humans change their environment, satisfy internal needs, and achieve personal goals (Lemon, 2008). From our early days as infants, a large number of our actions are performed in social contexts, allowing us to develop social skills and the ability to coordinate with interacting agents. One way to study these behaviors is to focus on joint actions (Sebanz et al., 2006). Classical orchestras, collective sports, and ballets are just a few examples of how people can coordinate their movements to achieve a common goal. But such actions are not only accomplished by musicians or those competing in team sports. Simple joint actions are ubiquitous in our everyday life, such as lifting a heavy table with another person or shaking hands with a colleague. Social interactions require dynamic and efficient encoding of others' gestures and a spatiotemporal synchronization of the individuals involved (Sebanz et al., 2006). On the question of the mechanisms of interaction, differing views have been proposed over the years. For example, Coordination Dynamics has explored the social influence of one person on another, highlighting a spontaneous and immediate coordination of their actions while engaging in interpersonal sensorimotor interactions (see Coey et al., 2012 for a review). This theory places social cognition in an embodied-embedded constraint, where social behavior is defined as self-organized. The brain interacts dynamically with the environment and other natural systems (Coey et al., 2012). On the other hand, the most traditional approach to social cognition is set in a cognitivist framework. Evidence indicating that action production and action perception rely on common mechanisms led to the Theory of Event Coding (Hommel et al., 2001). If perception and action rely on common codes, it makes the integration of one's own and a co-actor's action for joint actions relatively straightforward (bottom-up processes). From a more top-down perspective, co-representation of conspecifics during joint action is thought to be a key feature in understanding others' goals and actions (Sebanz et al., 2006).

One cerebral network suggested to play a role in matching observed and executed actions is the Mirror Neuron System (MNS). Since mirror neurons (MN) were discovered more than 20 years ago, a neuroscientific approach has brought a new understanding of social interaction. Initially discovered in non-human primates (di Pellegrino et al., 1992) and later reported in humans (Buccino et al., 2001; Rizzolatti and Craighero, 2004; Mukamel et al., 2010), MN are visuo-motor neurons located in the premotor cortex and the inferior parietal lobule that fire automatically both during the execution of an action and during the observation of the same action performed by another person. The MNS has been proposed as the fundamental neural mechanism underlying the understanding of others' actions and successful non-verbal social interactions (Gallese et al., 2004). The discovery of MN has given rise to a large number of interpretations of their potential roles in human cognition: understanding of action (Rizzolatti et al., 2001), imitation (Gallese, 2003), empathy (Gallese, 2001, 2003), mind-reading (Gallese and Goldman, 1998), and emergence of language (Rizzolatti and Arbib, 1998).

In recent years, great progress has been made by investigating neural processes during interpersonal motor interactions (Newman-Norlund et al., 2007, 2008; Kokal and Keysers, 2010; Konvalinka et al., 2014; Ménoret et al., 2014; Sacheli et al., 2015b). These studies brought to light the recruitment of fronto-parietal networks during interactive contexts, where MN are thought to play a role in internal simulation, action prediction and understanding. However, interpersonal coordination requires both perceiving and understanding our partner's movements while controlling our own. To explain neural processes in bidirectional interactions, Hari and Kujala (2009) proposed the "interactive loop" model. This model holds that a coupling of the perceptive and motor systems of the individuals is necessarily involved during joint actions in order to form common internal representations. Through their actions and intentions, a person influences and changes not only their environment but also the movements that the interactional partner needs to perform in order to interact smoothly. These modifications modulate the perception of the environment that the other person has had up to this point. That person reciprocates by influencing the environment in return, thereby changing the external representations in their partner's brain. The progressive construction of these "action-perception" loops is constantly changing and seems to be essential for encoding social actions and building up a common representation of the action for all the protagonists involved (Hari and Kujala, 2009). The recruitment of fronto-parietal networks might therefore allow the construction of an "interactive loop" to build a common representation of an action in both protagonists, allowing successful interactions.

In line with this model, it is our opinion that by building experimental paradigms that allow bidirectional and either synchronic or diachronic adaptation (Tognoli and Kelso, 2015) and that involve at least two interacting subjects (Tognoli et al., 2007; Noy et al., 2011; Konvalinka et al., 2014; Sacheli et al., 2015a; Moreau and Candidi, 2016), it should be possible to highlight a common adaptation between participants at a behavioral level. In our study, we focused on behavior changes during a bidirectional diachronic joint action, involving one participant putting a base in the middle of a table and another participant placing a parallelepiped into the base's slot. Crucially, the action of placing the parallelepiped is more or less facilitated depending on how the base is oriented on the table (for an equivalent solo action, see Allami et al., 2008). Our paradigm therefore allows space for adaptation and gives us objective measures to define the success of the interaction and the installation of the common representation. The purpose of the study, set in a cognitivist approach, is to show that full comprehension of a joint action is not automatic and that the installation of the common representation is progressive.

We attempted to reveal a behavioral change (placing the base in a more optimal position) when both individuals had physically experienced the other partner's motor task difficulty. Results indicate that the adaptation to the new task was not automatic and required a common experience from both participants. We wish to point out a limit of the maximalist interpretations of the MNS, according to which this system serves the understanding of others' actions, and to provide empirical data supporting the minimalist approach to the representational content of MN (Pacherie and Dokic, 2006). Although we agree that the MNS is one of the basic constituents of action understanding, this system appeared insufficient to fully understand actions and apprehend the movement difficulties encountered by the other.

# MATERIALS AND METHODS


# Subjects

Twenty healthy right-handed subjects (6 males, 14 females; mean age: 22.9, range: 18–50 years) participated in the Main Experiment and 10 healthy right-handed subjects participated in the Control Experiment (5 males, 5 females; mean age: 21.2, range: 16–28 years). None of them had any history of neurological disorder. All participants gave written consent before the experiment. Participants were familiarized with the methods prior to the experiment, and an explanation of the purpose of the study was given at the end of the experiment. The study was approved by the Ethical Committee CPP Sud-Est II.

# Procedure

### Main Experiment

The experiment involved a pair of participants: subject 1 (S1) and subject 2 (S2). In the following, participants will be referred to as S1 or S2 depending on their original role in the experiment (Condition 1). Subjects were seated face to face on opposite sides of a 50 cm × 70 cm table. Both subjects were instructed to use only their thumbs and index fingers to grip and displace the experimental objects, which were placed on predefined spots: the cylindrical base located to the right of S1 and the parallelepiped to the right of S2. The slot of the base and the parallelepiped were both oriented parallel to the sagittal axis (**Figure 1**). The subjects were asked to carry out a simple task: S1 was to move the base to the center of the table and S2 was to place the parallelepiped in the slot on the base. Note that the action performed by S2 is more or less facilitated depending on how S1 places the base on the table: difficult when the slot in the base remains parallel to the sagittal axis; easy when it is slightly tilted to the left. The experiment was divided into three conditions with instructions given prior to each condition. In Condition 1, the joint action was repeated 20 times with no instruction other than to complete the task after a vocal "Go" signal from the experimenter, which was the same for every trial. In Condition 2, the subjects' roles were interchanged, so that S1 was in charge of moving the parallelepiped and S2 the base, for 3 trials. Because this second condition was designed to give participants the opportunity to experience the other's action, data from Condition 2 were not analyzed. In Condition 3, subjects reverted to their initial roles (so they were carrying out the same task as in Condition 1) for another 20 trials. In this final condition, participants were further instructed to perform the task as fast as possible.
During the whole experiment, participants were not allowed to communicate their feelings either verbally or by explicit non-verbal signals. To ensure that the experiment ran accurately and smoothly, an experimenter stayed next to the table throughout, both to control inter-subject communication and to return objects to their correct initial location when required. At no time were participants encouraged to change their motor behavior, either explicitly or implicitly; until the end of the experiment, they believed that the goal was a simple kinematic study.

# Control Experiment

In the Control Experiment, the task was explained as in the Main Experiment but subjects only completed Condition 3 (where the task was to perform the joint action as fast as possible). This control served to ascertain that verbal instruction to perform the task as fast as possible was not sufficient to provoke potential behavioral changes. A critical point of the Control experiment was that participants were only assigned to one role; they did not have the opportunity to experience the other participant's task.

# Movements Recording

The movements of the right arm and hand of both subjects were recorded by means of an Optotrak Certus camera (manufactured by Northern Digital Inc, in Waterloo, Ontario, Canada). The camera was fixed 2 m away from the table. Five markers were placed on each subject: marker 1 was on the distal extremity of the thumb, marker 2 on the distal extremity of the index, marker 3 on the metacarpophalangeal joint of the index, marker 4 on the radial styloid and marker 5 was 3 cm over the radial styloid. The spatial position of active markers was sampled at 250 Hz with a spatial precision of 0.1 mm.

Raw data were pre-processed using a second-order Butterworth dual-pass filter (cut-off frequency, 10 Hz). Kinematic parameters were assessed for each individual movement using Optodisp software (Optodisp copyright INSERM-CNRS-UCBL, Thévenet et al., 2001). The movement duration was analyzed and the velocity peak of each of the two sub-phases of the movement (Reach-to-Grasp and Displace) was recorded. Sub-movement onset and offset were determined by a sequence of at least eleven increasing or decreasing points in the wrist marker velocity profiles. The velocity peak was determined as the maximal value in the wrist marker velocity profiles. The workspace was defined by the X, Y, Z axes defined by the table surface. The angle between the line joining finger markers 1 and 2 of S2 and the sagittal axis was also analyzed. This angle corresponded to the opposition axis (Paulignan et al., 1997), an index of movement difficulty (Frak et al., 2001). Note that the opposition axis is directly linked to the behavior of the subject who places the slot (S1), because the opposition axis changes with the orientation of the slot.
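The pre-processing steps described above can be illustrated with a minimal sketch. It is not the Optodisp implementation, whose internals are not described here. It assumes a standard bilinear-transform design for the second-order Butterworth low-pass filter, applies it forward and backward without correcting the cut-off for the two passes, and uses hypothetical function names (`dual_pass_filter`, `tangential_speed`, `first_monotonic_run`). The 250 Hz sampling rate, the 10 Hz cut-off and the eleven-point criterion are taken from the text.

```python
import math


def butterworth2_lowpass(cutoff_hz, fs_hz):
    """Coefficients (b, a) of a second-order Butterworth low-pass filter
    designed with the bilinear transform (assumed design, see lead-in)."""
    k = math.tan(math.pi * cutoff_hz / fs_hz)
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    b = (b0, 2.0 * b0, b0)
    a = (2.0 * (k * k - 1.0) * norm,
         (1.0 - math.sqrt(2.0) * k + k * k) * norm)
    return b, a


def _single_pass(x, b, a):
    # Prime the filter state with the first sample so that a constant
    # input produces a constant output (no start-up transient).
    x1 = x2 = y1 = y2 = x[0]
    out = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        out.append(yn)
    return out


def dual_pass_filter(x, cutoff_hz=10.0, fs_hz=250.0):
    """Filter forward, then backward, which cancels the phase lag."""
    b, a = butterworth2_lowpass(cutoff_hz, fs_hz)
    forward = _single_pass(x, b, a)
    backward = _single_pass(forward[::-1], b, a)
    return backward[::-1]


def tangential_speed(positions, fs_hz=250.0):
    """Speed (position units per second) between successive 3-D samples."""
    return [math.dist(p, q) * fs_hz for p, q in zip(positions, positions[1:])]


def first_monotonic_run(speed, run_length=11, increasing=True):
    """Index at which the first run of `run_length` strictly increasing
    (or decreasing) samples starts, or None if there is no such run."""
    count = 1
    for i in range(1, len(speed)):
        if increasing:
            step_ok = speed[i] > speed[i - 1]
        else:
            step_ok = speed[i] < speed[i - 1]
        count = count + 1 if step_ok else 1
        if count >= run_length:
            return i - run_length + 1
    return None
```

In this sketch, each coordinate of the wrist marker would be filtered separately with `dual_pass_filter`, the velocity peak would be `max(tangential_speed(...))` over a sub-movement, and onsets or offsets would be candidates returned by `first_monotonic_run` with `increasing=True` or `increasing=False`.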

# Statistical Analyses

Kinematic parameters were determined for each individual trial and averaged for each participant and condition. Statistical analyses were performed with the General Linear Model (GLM) in Statsoft Statistica 8, and the Greenhouse–Geisser correction for non-sphericity was applied when appropriate (Keselman and Rogan, 1980). Post hoc comparisons were performed using the Newman–Keuls correction for multiple comparisons (the significance threshold was fixed at p < 0.05).
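The averaging step that precedes the ANOVAs can be sketched as follows. The data layout, a list of `(participant, condition, value)` tuples, and the function name are hypothetical choices made for this illustration only; the actual analysis was run in Statistica.

```python
from collections import defaultdict
from statistics import mean


def average_by_participant_and_condition(trials):
    """Collapse trial-level values into one mean per (participant, condition).

    `trials` is an iterable of (participant, condition, value) tuples,
    a hypothetical layout chosen for this sketch.
    """
    cells = defaultdict(list)
    for participant, condition, value in trials:
        cells[(participant, condition)].append(value)
    return {cell: mean(values) for cell, values in cells.items()}
```

Each resulting cell mean would then be one observation entering the repeated-measures analysis.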

Execution times and velocity peaks were each analyzed with a separate 2 × 2 × 2 (Condition × Movement × Subject) ANOVA, and final angles with a one-way ANOVA.

# RESULTS

# Kinematics Results

### Execution Times

The 2 × 2 × 2 (Condition × Movement × Subject) ANOVA showed a significant main effect of Condition [F(1,8) = 70.824, p < 0.001], indicating a shorter execution time during Condition 3 compared to Condition 1. There was a significant main effect of Movement [F(1,8) = 111.06, p < 0.001], showing that the Reach-to-Grasp movement was performed faster than Displace, and a significant main effect of Subject [F(1,8) = 59.335, p < 0.001], showing that S1's actions were executed faster than S2's. Post hoc tests indicated that S1 execution times were shorter in Condition 3 compared to Condition 1 for both Reach-to-Grasp (p < 0.001) and Displace (p < 0.001) (see **Figure 2A**) and that S2 execution times were shorter in Condition 3 compared to Condition 1 for both Reach-to-Grasp (p < 0.001) and Displace (p < 0.001) (see **Figure 2B**).

### Velocity Peaks

The 2 × 2 × 2 (Condition × Movement × Subject) ANOVA showed a significant main effect of Condition [F(1,8) = 49.339, p < 0.001], indicating a greater execution speed during Condition 3 compared to Condition 1. A significant main effect of Movement [F(1,8) = 114.47, p < 0.001] showed that the Reach-to-Grasp movement was performed at a lower peak speed than Displace. Post hoc tests indicated that S1 velocity peaks were greater in Condition 3 compared to Condition 1 for both Reach-to-Grasp (p = 0.006) and Displace (p < 0.001) (see **Figure 2C**) and that S2 velocity peaks were greater in Condition 3 compared to Condition 1 for both Reach-to-Grasp (p = 0.002) and Displace (p < 0.001) (see **Figure 2D**).

# Final Angle

In order to compare the final angles of Condition 1, Condition 3, and Control, we performed a one-way ANOVA that showed a main effect [F(2,6) = 7.7206, p = 0.02]. Post hoc tests revealed that the final angle in Condition 3 was significantly smaller compared to Condition 1 (p = 0.03) and compared to Control (p = 0.02) (see **Figure 3**).
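The final angle compared here is the angle between the thumb-index opposition axis of S2 and the sagittal axis. A minimal sketch of how such an angle could be computed from the two fingertip markers is given below. The paper does not state its axis or sign conventions, so the sketch assumes that `y` runs along the sagittal axis and `x` along the left-right axis of the table, and it returns an unsigned angle between 0 and 90 degrees. The function name is hypothetical.

```python
import math


def opposition_axis_angle(thumb_xy, index_xy):
    """Unsigned angle in degrees (0 to 90) between the thumb-index line and
    the sagittal axis, from horizontal-plane coordinates (x, y).

    Assumption for this sketch: y is the sagittal axis, x the left-right axis.
    """
    dx = index_xy[0] - thumb_xy[0]
    dy = index_xy[1] - thumb_xy[1]
    if dx == 0 and dy == 0:
        raise ValueError("markers coincide; the axis is undefined")
    return math.degrees(math.atan2(abs(dx), abs(dy)))
```

Under these assumptions, a grip aligned with the sagittal axis gives 0 degrees and a grip perpendicular to it gives 90 degrees.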

# DISCUSSION

In this paper, we developed a diachronic joint action task that induced behavioral adjustments by one of the participants in order to make the task of their partner easier and consequently improve the common realization of the task for both participants. The results revealed better performance (execution time and velocity) in both sub-movements of the task, Reach-to-Grasp and Displace, for both participants when a common representation of the task was achieved.

Our measurements also highlight a significant change in the final angle of the cylindrical base between Condition 1 (before the subjects switched roles) and Condition 3 (after the subjects switched roles). According to Allami et al. (2008), this change in orientation away from the sagittal axis facilitates the partner's Displace sub-movement: the articular tension in the arm and wrist joints is less extreme in Condition 3 than in Condition 1. This behavioral measurement indicates that the participant placing the base changed their previous behavior after experiencing the partner's contribution to the joint action.

The results from our statistical analysis reveal an increase in the performance of the subjects for both sub-movements during Condition 3. In Condition 3, both subjects were asked to perform the action as fast as possible. One could therefore argue that the instruction (to perform the task as fast as possible), rather than the change in orientation, is the main contributor to the increase in performance. To address this possibility, we calculated the effect size (Cohen's d), following Fitts's (1954) law, where the level of difficulty of a movement is proportional to the amplitude of kinematic values (Ferri et al., 2011). We thus compared Cohen's d values calculated for velocity peaks for each sub-movement in Condition 1 and Condition 3 for S2 (the participant placing the parallelepiped). Cohen's d for the Reach-to-Grasp sub-movement between Condition 1 and Condition 3 is 0.76, corresponding to a medium effect, while Cohen's d for the Displace sub-movement between Condition 1 and Condition 3 is 1.38, corresponding to a large effect (Cohen, 1992). The discrepancy between the Cohen's d values of the two movement phases highlights the facilitation of S2's Displace sub-movement when S1 (the participant placing the base) changed the orientation of the base. On the one hand, the constraint imposed on movement speed in Condition 3 improves the kinematic parameters in both sub-movements; on the other hand, we observe a greater, orientation-specific effect only on the Displace sub-movement. This result is consistent with previous data from Allami et al. (2008).
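The effect-size comparison above can be illustrated with a short sketch. The text does not specify which variant of Cohen's d was used, so the pooled-standard-deviation form for two samples is an assumption here; a paired variant might be preferred for within-subject data. The function names are hypothetical, and the cut-offs of 0.2, 0.5 and 0.8 are the conventional benchmarks from Cohen (1992).

```python
import math
from statistics import mean, variance


def cohens_d(sample_a, sample_b):
    """Cohen's d using a pooled standard deviation (assumed variant)."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = (((n_a - 1) * variance(sample_a)
                   + (n_b - 1) * variance(sample_b))
                  / (n_a + n_b - 2))
    return (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled_var)


def effect_size_label(d):
    """Label |d| with the conventional benchmarks 0.2, 0.5 and 0.8."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"
```

With these benchmarks, the reported values of 0.76 and 1.38 are labeled "medium" and "large", matching the interpretation given in the text.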

Our aim, based on Hari and Kujala's (2009) model, was to demonstrate the establishment of a common representation during a joint action. Throughout the first condition, the two subjects performed the task freely. Even though Condition 1 was uncomfortable for the participant placing the parallelepiped (S2), the participant placing the base (S1) did not change the base's orientation to facilitate S2's contribution. Nonetheless, after both subjects had experienced each other's role in the action (in Condition 2), a significant change in the base's orientation was measured, revealing a change in the behavior of S1 during Condition 3. This change also improved the task performance in the Displace sub-movement of S2. Changes in opposition axis orientation are close to those obtained in previous studies describing easy movements (Frak et al., 2001; Allami et al., 2008).

As our Control experiment showed, the orientation change is not due to the requested high speed during Condition 3; indeed, no change in angle was measured when the roles of both subjects were not interchanged (Control condition). This change in the instructions served only to maintain participants' focus and avoid boredom.

From the neural perspective, parietal and premotor regions are thought to form the action observation network, also known as the MNS (Rizzolatti and Craighero, 2004). Although action mirroring has become a popular way to explain joint action effects (Frischen et al., 2009), if we adhere to the original description of this neural substrate, both action and observation should activate the same neurons. However, in our experiment, observation of S2's action did not seem to be enough for S1 to fully understand their difficulty. In fact, S1 kept putting the base in a difficult position for S2 throughout Condition 1. However, once S1 had experienced both parts of the joint action, S1 adapted and changed their initial behavior to facilitate S2's movement. We can agree here that MN were certainly activated to understand the action globally, but not its subtleties, such as extreme joint angles. The critical behavioral change only appeared in Condition 3, when both subjects shared a common knowledge of the two individual actions composing the joint action. In any case, our experiment stirs up a debate about the role of the MNS in joint actions. While we do not deny the possible role of MN in joint action, or the fact that they are involved in action comprehension, it seems that observation of the partner's movements (and thus activity in the MNS) was not enough to fully understand the action of others (Kokal and Keysers, 2010) during our protocol. In fact, we highlighted a delayed installation of a common understanding between the two subjects participating in a joint action. Our interpretation of the behavioral data is in line with Pacherie and Dokic's (2006) minimalist theory.

Over the last few decades, scientific research has improved our understanding of how perception and action are linked. It still remains unclear whether the processing of relevant information during joint actions emerges from physical and informational constraints (Rigoli et al., 2015), or whether it is supported by high-level, social-specific representational mechanisms (Sebanz et al., 2006) or by lower-level perceptual mechanisms (Dolk et al., 2011). In other words, is the mental representation of a partner necessary, and if it is, to what extent does one interacting individual need to mentally represent the actions of others to interact with them?

One of the most studied joint tasks is derived from the Simon task (Simon, 1969), in which participants have to respond to the color of stimuli with both their hands while ignoring their spatial location (e.g., red with left hand, blue with right hand). The so-called Simon effect (SE) describes the fact that participants are faster to answer when the stimulus and the response are spatially congruent (e.g., using the left hand when the stimulus is presented on the left of the screen). The SE disappears when participants are asked to answer to only one stimulus (a go/no-go task), but the joint Simon effect arises when the task is performed by a pair of participants (where each participant has a go/no-go task). The joint Simon task has been used to highlight co-representations during joint tasks, suggesting the existence of a specific neural mechanism facilitating social interactions with conspecifics (Tsai and Brass, 2007; Welsh, 2009). An alternative interpretation was proposed by Dolk et al. (2013), who suggested that "social" effects from the joint Simon task can be explained by the Theory of Event Coding (Hommel et al., 2001; Hommel, 2009) and further claimed that generic sensorimotor regions were the substrate of joint actions without the need for a co-representation. Another framework challenging the notion of co-representation during joint action is the Coordination Dynamics approach, where social effects are due to motor coordination rather than mental simulation of a complementary action (Doneva and Cole, 2014). The findings of our experiment seem to contest the latter theories (both Theory of Event Coding and Coordination Dynamics); rather, they suggest that the common representation of the action during social interaction is achieved through mental representation. Our results appear to fit better with a top-down explanation: our behavioral change did not appear automatically through action observation or motor coordination between subjects, but only when knowledge of task-specific features was shared between the two participants. Although our conclusions are based only on kinematics, the change in the behaviors of participants seems to fit the theory of co-representation. Here we argue that participants lacked knowledge of the other's specific action, resulting in a misrepresentation during Condition 1. It is our belief that co-representation was achieved only thanks to the shared experience of the task during Condition 2, resulting in an adaptation of the interacting subjects' behaviors in Condition 3. In our task, the common understanding required both action perception and self-experienced execution of the subtasks, and was reached through co-representation. Further experiments focusing on neural activity will be required in order to highlight the installation of the co-representation, for example through the reported modulations of the alpha rhythm during joint actions (Naeem et al., 2012; Novembre et al., 2016).

# AUTHOR CONTRIBUTIONS

All authors provided substantial contributions to the conception or design of the work. QM and LG were in charge of the acquisition of data; QM, LG, and YP were in charge of the analysis; and all authors were involved in the interpretation of data for the work. QM and YP drafted the work, TN revised it critically, and all authors gave final approval of the version to be published.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Moreau, Galvan, Nazir and Paulignan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Verbal Synchrony and Action Dynamics in Large Groups

#### Jorina von Zimmermann\* and Daniel C. Richardson

Department of Experimental Psychology, University College London, London, England

While synchronized movement has been shown to increase liking and feelings of togetherness between people, we investigated whether collective speaking in time would change the way that larger groups played a video game together. Anthropologists have speculated that the function of interpersonal coordination in dance, chants, and singing is not just to produce warm, affiliative feelings, but also to improve group action. The group that chants and dances together hunts well together. Direct evidence for this is sparse, as research so far has mainly studied pairs, the effects of coordinated physical movement, and measured cooperation and affiliative decisions. In our experiment, large groups of people were given response handsets to play a computer game together, in which only joint coordinative efforts lead to success. Before playing, the synchrony of their verbal behavior was manipulated. After the game, we measured group members' affiliation toward their group, their performance on a memory task, and the way in which they played the group action task. We found that verbal synchrony in large groups produced affiliation, enhanced memory performance, and increased group members' coordinative efforts. Our evidence suggests that the effects of synchrony are stable across modalities, can be generalized to larger groups and have consequences for action coordination.

Edited by:

Michael J. Richardson, University of Cincinnati, USA

#### Reviewed by:

Lynden K. Miles, University of Aberdeen, UK

Nicola Yuill, University of Sussex, UK

> \*Correspondence: Jorina von Zimmermann jorina@eyethink.org

#### Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

Received: 30 September 2016 Accepted: 15 December 2016 Published: 26 December 2016

#### Citation:

von Zimmermann J and Richardson DC (2016) Verbal Synchrony and Action Dynamics in Large Groups. Front. Psychol. 7:2034. doi: 10.3389/fpsyg.2016.02034

Keywords: synchrony, behavioral coordination, affiliation, joint action, cooperation

# INTRODUCTION

Thirty strong legs are rhythmically thrilling the ground. Chests, thighs, and arms become drums and strong voices are forcefully chanting together. Eyes are rolled and tongues are poked out. Before every match, the New Zealand rugby team performs the haka, a traditional Maori war dance composed of rigorous, synchronized movements and fierce, rhythmical chants. Amongst other things, the haka was performed before a battle to demonstrate strength and power and to intimidate the opponent. However, anthropologists and historians have argued for a long time that 'keeping together in time' (McNeill, 1995) induces emotional bonding among human groups with significant consequences for interaction and cooperation (von Zimmermann and Richardson, 2015). The haka might scare the enemy on the battlefield or rugby pitch, but it might also strengthen intragroup bonds and have a significant impact on the group's performance.

Rhythmic and coordinated actions such as marching, dancing, singing, or playing music together have been part of human rituals across all cultures in the world (McNeill, 1995; Codrons et al., 2014), but synchrony is not only a human phenomenon. It can be found everywhere in the natural world as well. For example, cardiac cells fire in synchrony and fireflies flash in unison (Strogatz, 2003; Cabeza et al., 2010). Metronomes automatically synchronize if they are put on a freely moving base (Pantaleone, 2002), and neurons synchronize their activity to allow for coherent percepts and actions (Singer, 1993). Human beings coordinate their postural sway during conversation (Shockley et al., 2009), and their movements during a pendulum swinging task, or while rocking in a chair when visually coupled (Richardson et al., 2005; Richardson M. J. et al., 2007). There seems to be a compelling drive for systems to self-organize in synchrony (Strogatz, 2003), and it has been suggested that human beings possess a fundamental drive to coordinate their actions with the actions of others, as this forms the basis for social connectedness (Marsh et al., 2009).

Social scientists have started to collect empirical evidence for the effects of synchronized human activity and a growing body of research supports the idea that coordinated action can function as 'social glue' that binds people together and enhances their willingness to cooperate (Valdesolo et al., 2010). For example, observing synchronous movement increases perceived rapport and interpersonal connectedness between people (Miles et al., 2009; Lakens and Stel, 2011); exposure to synchronous stimulation enhances the degree of self-other merging (Paladino et al., 2010); and active engagement in synchronized physical and verbal activities boosts actual liking and cooperation (Hove and Risen, 2009; Wiltermuth and Heath, 2009; Reddish et al., 2013; Launay et al., 2014), as well as pro-social behavior toward an interaction partner (Valdesolo and DeSteno, 2011).

To date, an impressive breadth and variety of studies investigating behavioral coordination has been published. However, there are several fundamental questions about the phenomena, which are currently unanswered. For instance, do the effects of coordination scale up from pairs of people to small and then large groups? With a few exceptions (e.g., Wiltermuth and Heath, 2009; Reddish et al., 2013; Codrons et al., 2014; Tarr et al., 2015), behavioral coordination has mostly been studied in pairs, which makes it difficult to generalize from two people to large groups of people. These studies have also mostly studied the effects of coordinated movement. So one might wonder, does it matter which aspect of behavior is coordinated – speech, posture or gesture – in order to produce particular psychological effects? Finally, are the benefits of coordination restricted to social judgments – attitudes and opinions about other people – or does it also affect cognition and joint action, such as the ability of people to perform a dynamic task together?

First, we will briefly review the current answers we have to these questions with a focus on synchrony as a particular form of behavioral coordination. Then, we present an experiment combining verbal synchrony and group action that attempts to answer some of the unresolved issues. Finally, we will discuss how the results of our study fit in with existing research and which future research directions could be taken to clarify the subject further.

# Synchrony in Groups

Most experimental demonstrations of coordinated behavior focus on pairs of participants, or more commonly, a participant and a confederate who has been instructed to mimic body motions (e.g., Chartrand and Bargh, 1999). Most of the findings reported in the synchrony literature also stem from either experienced or observed dyad interaction. While the findings reported significantly advance our understanding about the circumstances under which synchrony emerges and the effects it has, generalizations from pairs to groups can be problematic. It is therefore crucial to also study synchrony experienced in a group context, as coordinated behavior has played an important role throughout history and cultures (Haidt et al., 2008), and has lost none of its significance. Even today, soldiers are still drilled to march in synchrony during their education and parades all over the world, synchrony is frequent in dance and sports, and collective chants take place during rituals, demonstrations, and religious ceremonies to name but a few examples.

One of the reasons for the lack of group studies on synchrony, and on behavioral coordination more generally, is almost certainly that it is difficult to get more than one or two participants into the lab at the same time, or to coordinate multiple confederates simultaneously. A second possible reason is that group data are often very noisy and challenging to make sense of. In spite of these difficulties, a few studies have been published that have looked at the effects of synchrony experienced in bigger groups. These studies report that synchrony increases aggressive behavior toward an outgroup and obedience to a leader (Wiltermuth, 2012a,b), while at the same time it increases ingroup affiliation (Tarr et al., 2015) and cooperation (Reddish et al., 2013). Similarly, in a recent study we found that the amount of distributed coordination naturally emerging over time in a choreographic task, which facilitated synchrony without instructing it, predicted how much group members liked each other and the group as a whole, and how much they conformed to each other's opinions (von Zimmermann et al., under review). While these studies suggest that synchrony at the group level has effects similar to those at the pair level, the evidence is still sparse.

# Synchrony across Different Behaviors

Does it matter which kind of behavior is synchronized, or simply that the same action happens at the same time between two or more people? The literature is not clear on this point, as the many skeins of behavioral coordination that have been discovered are isolated in different disciplines, different tasks and types of interaction, different measures and means of analysis. Social psychologists may study mimicry between gestures, ecological psychologists the rhythmic entrainment of body sway, and psycholinguists the repetition of grammatical forms. These differences are important to the scientists, but are they important to the psychological outcome of behavioral coordination?

We wanted to investigate whether the effects of movement coordination reported in the literature would also result from verbal coordination alone. Speakers have been found to possess a remarkable ability to speak in synchrony with one another, without any practice or detailed instructions (Cummins, 2011). Perhaps unsurprisingly then, chanting, or joint speech, can be observed in every human culture, and as a means of storing and passing on information, it predates the written word (Cummins, 2013). It has been speculated that when a group moves and chants together, this will help to increase group affiliation and improve the group's coordination (McNeill, 1995). However, it is not clear if verbal behavior alone will produce positive effects that will spread over to forms of movement coordination.

# Effects of Synchrony on Cognition and Action

While some studies have investigated how social and cognitive influences, such as the socially undesirable actions of others (Miles et al., 2010a), a cooperative versus a competitive context (Schmidt and Richardson, 2008), or a pro-social mindset in comparison to a pro-self focus (Lumsden et al., 2012), affect the emergence and stability of synchrony, the majority of empirical measures of behavioral coordination are concerned with the positive feelings that an individual will have toward the person or group with whom they are coordinating (von Zimmermann and Richardson, 2015). Sometimes these effects are measured by ratings and judgments the individual makes about the joint performance or likeability of an interaction partner or group, or the degree of similarity and closeness they feel toward them. At other times the effects are measured by decisions the individual makes about sharing resources or opting to cooperate with the group even if that entails a personal sacrifice.

In addition to social outcomes, it is possible that behavioral coordination leads directly to changes in cognition and action. Discussions about the evolution of behavioral coordination often focus less on the advantages of liking and positive feelings in a group, and more on the adaptive value of being able to act as a coherent group, planning and executing a hunt, for example. Performance benefits from behavioral coordination are rarely studied, however, with one exception that we are aware of. Valdesolo et al. (2010) found that synchronous rocking in a chair increased the perceptual sensitivity of participants, which helped them perform better on a subsequent joint action task, in which they had to coordinate their movements with those of an interaction partner. Their findings suggest that there is indeed a synchrony-action as well as a synchrony-cognition link and that sharing the specific skill of synchronization might influence the execution of other joint tasks by enhancing cooperative and collaborative skills. Yet, the empirical evidence for the idea that synchronizing behavior at one time improves future action coordination is still sparse and calls for more extensive scientific investigations.

Furthermore, even though there is some evidence that hand movements performed in synchrony enhanced participants' memories for an interaction partner's utterances and facial appearance (Macrae et al., 2008), the benefits of synchronized activity on memory are not yet well established. More specifically, the possible benefits of collective speech on memory seem to have been overlooked entirely (von Zimmermann and Richardson, 2015). This is interesting since collective speech is employed in educational settings in which remembering the spoken word is important, such as in schools or churches. On top of that, one could speculate that national anthems, songs sung at sport events, or slogans shouted during demonstrations are remembered not only because people are exposed to them frequently, or because they are memorable, but also because they are almost exclusively associated and performed with the collective.

# Verbal Coordination, Groups, and Action

In our experiment, groups of 20–30 participants either read a list of words out loud together or individually. Participants reading single words in unison is quite different to the coordinated, spontaneous joint speech that one finds during demonstrations or at a football game. However, it is a first approximation, and allowed a close comparison with participants in the asynchronous speech condition. Those people read the same words out loud, but started at different places in the list, and so spoke out of time with each other.

After reading for around 2 min, participants played a group video game in which they used audience response handsets to jointly control a tightrope walker and keep him upright (Richardson et al., 2011). Following the game (**Figure 1**), participants were asked to recall as many words as possible from the list, and rate their feelings toward their group. Our hypotheses were that those in the synchronized reading condition would perform better as a group in the action task, they would remember more words from the list, and have increased feelings of group affiliation.

# MATERIALS AND METHODS

# Participants

In exchange for course credit, 215 participants from UCL participated in this study (M age = 18.85, SD age = 0.90, Number of Males = 35). They were run in eight groups of between 23 and 34 people as part of a lab demonstration course. The participants were informed that this was research on the 'effects of memory retrieval' and were unaware of the true research hypothesis until after the experiment was complete.

# Ethics Statement

Ethical approval was obtained from the UCL Research Ethics Committee. All participants consented to taking part in this experiment and were fully debriefed upon completion of the study.

# Apparatus and Stimuli

Each participant was given a Turning Technologies audience response handset. Button presses were sent to a USB receiver plugged into a MacBook. These responses were sent to the tightrope game, developed by Delosis. The MacBook was connected to a projector, which displayed the game on a large screen that everyone could see.

In the game, participants saw a man holding a pole, balancing on a rope (**Figure 1**). Each time one of the participants pressed either 1 or 3 on their handset, it sent a very small nudge to the tightrope walker, sending him to the left or right. The size of individual nudges depended on the number of people playing, such that the strength of all nudges added together would be the same across games with different numbers of people. The game was made harder by tomatoes that were fired from the sides of the screen, destabilizing the tightrope walker. They appeared at random and their frequency varied to change the difficulty of the game. The movements of the tightrope walker and the appearance of the tomatoes were governed by a physics engine that accounted for the size and position and momentum of the objects.
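The scaling of nudges with group size described above can be illustrated with a short sketch. This is not the game's actual code: `TOTAL_STRENGTH`, `nudge_size`, and `net_nudge` are hypothetical names, the constant is arbitrary, and the sign convention (left negative, right positive) is an assumption made only for illustration.

```python
# Hypothetical sketch; the real game's constants and physics are not given in the text.
TOTAL_STRENGTH = 1.0  # assumed combined strength if every player pressed the same way


def nudge_size(n_players: int) -> float:
    """Size of one button press, chosen so that n_players presses sum to TOTAL_STRENGTH."""
    if n_players <= 0:
        raise ValueError("n_players must be positive")
    return TOTAL_STRENGTH / n_players


def net_nudge(presses: list[int], n_players: int) -> float:
    """Net push on the walker from a batch of presses (1 = left, 3 = right).

    Left is treated as negative and right as positive (an assumed convention).
    Any other button value is ignored.
    """
    step = nudge_size(n_players)
    return sum(-step if p == 1 else step for p in presses if p in (1, 3))
```

Under this assumption, a group of 23 and a group of 34 pressing unanimously would push the walker equally hard, which is the property the text describes.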

A game ended when the tightrope walker fell off the rope, or participants successfully kept him upright for 30 s. **Figure 2** shows the tightrope walker's angle and the net response from the audience across 20 s of one of the games in our experiment.

# Procedure

Participants were randomly allocated to groups, and each group was assigned to the synchronous or asynchronous speech condition. Participants were given a list of 54 words, split into three columns. They were told to read them out loud, completing two cycles of the entire list. In the synchronous condition, participants were instructed to start at the top of the page with the first word and read the words at the same time as each other. In the asynchronous condition, participants were first given a number between 1 and 3. They were told to start reading at the top of the first, second, or third column, respectively. Since participants were numbered consecutively where they sat, participants seated next to each other always started in different places.
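A minimal sketch of the asynchronous starting rule follows. It assumes the 54 words were split evenly into three columns of 18 (the text does not state the column lengths) and that numbers were handed out in the repeating order 1, 2, 3 along each row; `start_index` and `seat_position` are hypothetical names.

```python
WORDS_PER_COLUMN = 18  # assumption: 54 words split evenly across three columns


def start_index(seat_position: int) -> int:
    """0-based index of the first word read by the participant at a given seat.

    seat_position counts seats along a row starting from 0; the participant's
    assigned number (1, 2, or 3) cycles with the seat.
    """
    assigned_number = seat_position % 3 + 1
    return (assigned_number - 1) * WORDS_PER_COLUMN
```

With consecutive numbering, any two adjacent seats receive different numbers and therefore different starting columns.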

Once participants had read through the list twice (which typically took around 100 s) they were introduced to the tightrope game. They were allowed a practice session with no tomatoes being fired as we explained how they could control the tightrope walker. Then they played five games with monotonically increasing rates of tomatoes being fired at them. If the tightrope walker fell off before 30 s, the game was restarted, until participants were able to complete a total of 30 s.

After playing the game, participants filled in a worksheet. In 60 s they wrote down as many of the words as they could remember from the list that they had read out previously. Then they responded on a 7-point Likert scale from 'strongly disagree' to 'strongly agree' to the following statements, designed to assess participants' positive feelings toward their group:


# RESULTS

# Memory and Affiliation

Participants in the synchronous condition scored better on the memory test and felt more affiliation toward their groups, as shown in the two distributions plotted in **Figure 3**. For a memory score, we counted the number of words that participants correctly recalled minus the number that they incorrectly recalled. For every participant, the averages of the four affiliation items were calculated. Affiliation ratings for the synchronous groups (M = 25.22, SE = 0.39) were higher than for the asynchronous groups (M = 22.20, SE = 0.51), and memory scores for the synchronous groups (M = 6.96, SE = 0.93) were also higher than for the asynchronous groups (M = 4.15, SE = 0.70). Conventional t-tests found significant differences between conditions for the memory scores [t(212) = −2.20, p = 0.029] and affiliation ratings [t(212) = −5.88, p < 0.0001]. The BayesFactor package (Morey and Rouder, 2015) in R was used to estimate the odds of differences between the conditions, plotted on the right of **Figure 3**. For both memory score and rated group affiliation, an estimated difference of zero between the conditions lay outside the 95% credibility interval (Kruschke, 2010), giving strong evidence in favor of an effect of condition. Participants in the synchronous speech condition remembered more words than participants in the asynchronous speech condition and they also expressed higher levels of liking for their group.
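The memory score defined above (correct recalls minus incorrect recalls) can be sketched as follows. The function name `memory_score` is hypothetical, and two scoring details are assumptions not stated in the text: matching ignores case and surrounding whitespace, and a word written down more than once is counted only once.

```python
def memory_score(recalled: list[str], word_list: set[str]) -> int:
    """Correctly recalled words minus incorrectly recalled words.

    word_list is expected to contain lowercase words. Duplicate entries in
    `recalled` are collapsed (an assumption about the scoring procedure).
    """
    unique = {w.strip().lower() for w in recalled if w.strip()}
    correct = len(unique & word_list)
    incorrect = len(unique - word_list)
    return correct - incorrect
```

For example, a participant who writes down two words from the list and one word that was never on it would score 1.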


# Tightrope Game Performance

We analyzed performance on the tightrope game at three levels, as shown in **Figure 4**. At the broadest level, groups in the two chanting conditions succeeded at the game to a roughly equivalent degree, measured by how close to upright they kept the tightrope walker. At the lowest difficulty level, all groups managed the task without having to restart, whereas at the highest level there were 1.3 restarts on average. However, the number of restarts showed no significant effect of difficulty level or condition, and no interaction (all Fs < 1). Yet, looking in more detail at how they played the game, participants in the synchronous chanting condition tended to make a response more readily when the tightrope walker was closer to the vertical, and at each moment in time, their responses tended to be more homogeneous within the group.

For each game, we calculated the average distance of the tightrope walker from the vertical in degrees. We ran an ANOVA with difficulty level and chanting conditions as factors, but there was no main effect of condition [F(1,6) = 1.1], only a marginally significant effect of difficulty level [F(1,59) = 3.51, p = 0.08], and no significant interaction [F(1,6) = 0.83]. To analyze individual participants' behavior, we calculated the average distance of the tightrope walker from the vertical at each moment the participant made a response. Participants in the synchronous condition made responses when he was approximately 5° closer to vertical. Bayesian analysis showed that the 95% credibility interval for this difference was above zero, which was also reflected by a significant t-test on the condition means [t(192) = 6.43, p < 0.0001].
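The individual-level measure, the walker's average distance from vertical at the moments a participant responded, might be computed along these lines. This is an illustrative sketch rather than the analysis code used in the study; the function name, the data layout (parallel lists of sample times and angles), and the choice to use the most recent angle sample at or before each press are all assumptions.

```python
from bisect import bisect_right


def mean_abs_angle_at_responses(
    sample_times: list[float],
    angles_deg: list[float],
    response_times: list[float],
) -> float:
    """Mean absolute angle (degrees from vertical) at a participant's responses.

    sample_times must be sorted ascending and aligned with angles_deg.
    Each response is matched to the last angle sample at or before it;
    responses that precede the first sample are skipped.
    """
    values = []
    for t in response_times:
        i = bisect_right(sample_times, t) - 1
        if i >= 0:
            values.append(abs(angles_deg[i]))
    return sum(values) / len(values) if values else float("nan")
```

Averaging this value within each condition would give the condition means that the reported t-test compares.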

Finally, we analyzed individual responses, calculating the proportion of identical responses that occurred 250 ms before and after each one. For each chanting condition, we plotted this measure of group similarity against the distance of the tightrope walker from the vertical. As can be seen in the final plot of **Figure 4**, when he was close to vertical, group similarity in responses was low, as participants were nudging him to both the left and right to keep him balanced. As he veered away from the upright, groups' responses became increasingly similar, as it was more apparent which direction he needed to be nudged in order to right him. However, the two chanting conditions differed in this regard. As shown by the non-overlapping confidence intervals, from around 10° onward, responses in the synchronous group were more similar to each other moment by moment. A Bayesian analysis confirmed that between 10 and 70°, the 95% credibility interval for this difference between conditions was above zero.
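One way to read the group similarity measure is sketched below: for each button press, take all other presses in the group within 250 ms before or after it, and compute the proportion that used the same button. The function name and data layout are hypothetical, and the handling of presses with no neighbours in the window (returned as `None`) is an assumption, since the text does not say how such cases were treated.

```python
from typing import Optional


def response_similarity(
    responses: list[tuple[float, int]], window_s: float = 0.25
) -> list[Optional[float]]:
    """Proportion of identical neighbouring responses for each response.

    responses: (time in seconds, button) pairs pooled across the group.
    For each response, neighbours are all other responses within window_s
    before or after it. Returns None where a response has no neighbours.
    """
    result: list[Optional[float]] = []
    for i, (t, button) in enumerate(responses):
        neighbours = [
            b for j, (t2, b) in enumerate(responses)
            if j != i and abs(t2 - t) <= window_s
        ]
        if neighbours:
            same = sum(1 for b in neighbours if b == button)
            result.append(same / len(neighbours))
        else:
            result.append(None)
    return result
```

Pairing each value with the walker's angle at that moment and binning by angle would give a curve of the kind described for the final plot of Figure 4. The pairwise loop is quadratic in the number of responses, which is adequate for a sketch but could be replaced by a sliding window for long recordings.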

In summary, there is evidence that reading out the list of words together had an effect on participants' behavior in a task of group coordination. When the results were analyzed at the level of games and groups there was only a marginally significant effect of chanting conditions. However, when individuals' responses were analyzed, we found that those in the synchronous condition more readily made responses as the tightrope walker deviated from the vertical, and once he passed 10° from the vertical, responses amongst the synchronous group were more similar to each other.

# DISCUSSION

With our experiment we wanted to expand on the existing synchrony and behavioral coordination literature in three ways. First, we wanted to see if the affiliative effects generally reported in pair studies scale up to larger groups. While some studies have reported that synchronized movement in small groups increased liking amongst group members (e.g., Reddish et al., 2013; Tarr et al., 2015), we also found that members of large groups reported feeling closer to each other after they had chanted together in synchrony. The finding that behavioral synchrony can lead to interpersonal liking and rapport therefore seems to hold true also for much larger groups than previously reported on. This might not come as a surprise since human beings have engaged in synchronous movement and collective speech as part of rituals for centuries with important social consequences: Participation in collective rituals promotes social cohesion and thereby strengthens individuals' attachments to each other and the group, making effective group action possible (Whitehouse and Lanman, 2014). Correspondingly, research has shown that rituals not only significantly increase ingroup affiliation in comparison to non-ritualistic group activities (Wen et al., 2016), but those rituals that include synchronous behavior lead to increased liking and cooperation within a group (Fischer et al., 2013).

Second, we wanted to investigate if verbal synchrony alone is sufficient to induce the affiliative effects of behavioral coordination generally observed. Our groups were only instructed in relation to their verbal coordination, but no statements were made with reference to movement. This means that in theory, through the coordination of their verbal articulations, group members might have also spontaneously coordinated their postural movements (Shockley et al., 2009), and possibly even started sharing physiological dynamics such as heart rate (Fusaroli et al., 2016). While we cannot completely exclude this as a potential alternative explanation of, or at least mediating influence on, our findings, we do not believe that any kind of physical coordination that might have occurred would have been strong enough to explain our results. In contrast to other experiments, which reported spontaneous coordination of movement or physiological functions, our participants were seated in rows next to and behind each other and did not have direct eye contact with one another. Except for chanting together they also did not interact with each other in any other way before moving on to playing the tightrope game. We are thus confident that verbal synchrony – as the prevalent form of coordination in the experiment – was the main mechanism that led to significant changes in our participants. Accordingly, individuals' ratings of their perceived affiliation with the group and their group's performance increased in the synchronous condition. Joint speech, like joint movement, allows interaction partners to construe a shared representation of the world, in which intentions become aligned and common ground is established (Cummins, 2014). Like protestors chanting the same slogan together, demonstrating an extreme form of alignment with respect to the world (Cummins, 2014), the participants in our synchronous speech condition probably experienced higher levels of alignment than those participants who were reading the words out asynchronously. Through coupling their actions during the joint speech task, participants established a common goal with affiliative, cognitive and coordinative consequences.

Third, we were curious to see if synchronous behavior would also affect action and cognition in addition to the social effects often observed. In other words, we wanted to find empirical evidence for the hypothesis that there is a synchrony-action link, that group members, who have previously synchronized with one another, will be better coordinated in a subsequent task. Our evidence supports this idea. Groups overall seem to do better on a coordination task after their members have engaged in synchronous behavior, at least at the harder levels of task difficulty. Why might this be the case? To successfully coordinate behavior and synchronize, people need to anticipate each other's behaviors (Sebanz et al., 2006; Konvalinka et al., 2010). In this respect, it has been argued that perceiving another's movements, for example, activates one's own action system for that same movement, which increases the likelihood for a matched action to occur (Brass et al., 2001). This suggests a tight neural link between perception and action, which could extend to the development of shared representations of a joint action task and of self and other (Hurley, 2008; Kirschner and Tomasello, 2009). While an increase in self-other overlap is said to foster social bonds (Galinsky et al., 2005), one could speculate that participants in our synchrony condition were able to develop a shared representation of the chanting task and each other, which then influenced not only their feelings for each other, but also improved their coordinative skills in the tightrope game.

In this study, however, not only did we find a synchrony-action link, but also a synchrony-cognition link: Participants who had chanted words collectively, rather than reading them out loud by themselves, remembered more of these words at the end of the experiment. With the present data, of course, we cannot judge whether the reason for the memory improvement in the synchronous condition was because the asynchronous chanting was a distraction to participants, and this caused them to encode fewer words in the first place, or because of motivational benefits from higher perceived affiliation with the synchronous group, or because of a general performance boost that mirrored the improved performance in the balancing task. In spite of this limitation, our results seem to be in line with the findings from two other studies, which looked at the relationship between synchrony and memory, albeit in relation to social information. Synchronous movement was reported to enhance people's attention for each other during a social exchange, enhancing memory for another's verbalizations as well as their facial appearance (Macrae et al., 2008). In a second study, participants listened to words over headphones while performing arm curls with a confederate, either in-phase or in the less stable anti-phase coordination. Anti-phase coordination produced a memory advantage for self-related over other-related information, whereas this effect was eliminated when participants had moved in-phase with the confederate (Miles et al., 2010b). The findings from our study suggest that synchronous actions might not only influence memory in relation to social information, but more generally as well. This, however, needs to be tested more rigorously in the future.

A diverse set of researchers have come to the realization that perception, action and cognition cannot be fully understood by investigating single individuals (e.g., Sebanz et al., 2006; Barsalou et al., 2007; Robbins and Aydede, 2009). Studies of situated cognition show that cognition 'in the wild' is intimately linked not only to representations of the external world, but also to the cognitive processes of others. For example, Hutchins (1995) observed the ways that navy navigators would distribute cognitive processes between themselves by using external tools and representations, such as maps and notations. In the past few years, experimental methods have also started to reveal the cognitive mechanisms involved in the joint activity of two people engaged in parallel tasks (Sebanz et al., 2006), talking to each other (Richardson D. C. et al., 2007), or just silently looking at pictures, changing their gaze patterns because of the knowledge that someone is looking at the same thing (Richardson et al., 2012). Knoblich and Jordan (2003) gave a detailed analysis of the way that two people coordinate their actions: To be successful, participants had to anticipate both the movements of the objects in the game and the actions of their partner. It is possible that chanting together in our experiment helped participants to anticipate each other's actions and thereby facilitated coordination in the tightrope walker game. However, it becomes clear that no explanation at this point goes beyond speculation. It will therefore be an interesting task in the future to study how perception, cognition and action are linked in social situations, which involve more than two people, and what the exact mechanisms are, which could explain a synchrony-action link.

Behavioral coordination is often portrayed as something that binds people together, evoking positive and pro-social feelings toward interaction partners. However, there is more to coordinated joint action than hugs. For example, while synchrony, like mimicry (Chartrand and Bargh, 1999), often increases rapport and cooperation, sometimes it has quite different results. In two studies, Wiltermuth (2012a,b) showed that synchrony can lead to aggressive behavior and destructive obedience. People who had just bonded with one another through synchronous action were more likely to comply with each other's requests, even if this entailed engaging in aggressive behavior toward others, such as administering a noise blast to another group of participants, or killing sow bugs at a leader's request (Wiltermuth, 2012a,b). These studies support the idea that physical synchrony does not exclusively lead to pro-social, but also to anti-social and destructive behavior. There seems to be a dark side to the phenomenon, and verbal synchrony seems to have comparable effects. Spectators at a football game who had engaged in collective chanting during the game reported higher levels of aggression than those who had not chanted (Bensimon and Bodner, 2011).

# CONCLUSION

Anthropologists and historians have long argued that acting together in time influences group cohesion and group action. In our experiment, large groups of people, who had engaged in collective speech, acted better together in a subsequent task, displayed improved cognitive functions, and liked each other more. Although we were able to explore the scope of behavioral coordination in our experiment, there is one significant question about the directionality of the effects we found, which we cannot answer with our findings. Does synchrony increase group affiliation and thereby improve cognition and action, or does synchrony increase group performance and this improvement increases the attraction to the group? No matter what the answer to this question is, the New Zealand rugby team should keep performing the haka prior to important games, as it might be an important part of their success strategy.

# AUTHOR CONTRIBUTIONS

JvZ developed the initial experimental design. Both authors performed the testing and data collection together. DR performed the data analysis. Both authors contributed equally to the writing of this manuscript and approved the final version for submission.

# ACKNOWLEDGMENTS

We would like to thank the PALS lab demonstrators and C. Aichelburg at University College London, who helped us with the organisation of this experiment.

# REFERENCES



Wiltermuth, S. S., and Heath, C. (2009). Synchrony and Cooperation. Psychol. Sci. 20, 1–5. doi: 10.1111/j.1467-9280.2008.02253.x

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 von Zimmermann and Richardson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# How Moving Together Brings Us Together: When Coordinated Rhythmic Movement Affects Cooperation

#### Liam Cross<sup>1</sup>, Andrew D. Wilson<sup>2</sup> and Sabrina Golonka<sup>2</sup>\*

*<sup>1</sup> Department of Psychology, Lancaster University, Lancaster, UK, <sup>2</sup> Psychology, School of Social Sciences, Leeds Beckett University, Leeds, UK*

Although it is well established that rhythmically coordinating with a social partner can increase cooperation, it is as yet unclear when and why intentional coordination has such effects. We distinguish three dimensions along which explanations might vary. First, pro-social effects might require in-phase synchrony or simply coordination. Second, the effects of rhythmic movements on cooperation might be direct or mediated by an intervening variable. Third, the pro-social effects might occur in proportion to the quality of the coordination, or occur once some threshold amount of coordination has occurred. We report an experiment and two follow-ups which sought to identify which classes of models are required to account for the positive effects of coordinated rhythmic movement on cooperation. Across the studies, we found evidence (1) that coordination, and not just synchrony, can have pro-social consequences (so long as the social nature of the task is perceived), (2) that the effects of intentional coordination are direct, not mediated, and (3) that the degree of the coordination did not predict the degree of cooperation. The fact of inter-personal coordination (moving together in time and in a social context) is all that's required for pro-social effects. We suggest that future research should use the kind of carefully controllable experimental task used here to continue to develop explanations for when and why coordination affects pro-social behaviors.

Keywords: coordinated rhythmic movement, interpersonal entrainment, interpersonal synchrony, interpersonal coordination, rhythmic entrainment, joint action, social cognition, cooperation

# INTRODUCTION

It is well-established that moving in time with other people can increase cooperation between coactors (Anshel and Kipper, 1988; Wiltermuth and Heath, 2009; Kirschner and Tomasello, 2010; Reddish et al., 2013, 2014; but see Kirschner and Ilari, 2014), though it is still unclear what it is about these Coordinated Rhythmic Movement (CRM) tasks that makes people more cooperative. Previous work has identified a number of interesting effects and it is now time to begin trying to explain why these effects occur. At present, this work is complicated by the sheer variety of paradigms employed to generate and measure these effects. The purpose of this paper is to try to lay the groundwork for developing an explanation of the pro-social effects of coordination. We do this by tackling a number of basic questions about the effect using a single, well-understood, CRM paradigm.

#### Edited by:

*Michael J. Richardson, University of Cincinnati, USA*

#### Reviewed by:

*Lynden K. Miles, University of Aberdeen, UK Kerry Marsh, University of Connecticut, USA*

\*Correspondence: *Sabrina Golonka s.golonka@leedsbeckett.ac.uk*

#### Specialty section:

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology*

Received: *26 September 2016* Accepted: *05 December 2016* Published: *22 December 2016*

#### Citation:

*Cross L, Wilson AD and Golonka S (2016) How Moving Together Brings Us Together: When Coordinated Rhythmic Movement Affects Cooperation. Front. Psychol. 7:1983. doi: 10.3389/fpsyg.2016.01983*

In this paper we consider some classes of model that could characterize how coordination impacts cooperation. These models vary along three dimensions: (1) whether increased cooperation depends on in-phase synchrony (S+) or coordination, more generally (S−), (2) whether the relationship between social coordination and cooperation is direct (D+) or mediated (D−), and (3) whether cooperation varies in proportion to coordination at the individual level (P+), or whether there is a threshold effect (P−). The first dimension relates to whether synchronous (in-phase) movements are necessary to impact cooperation, or whether other coordinations (e.g., anti-phase) can also affect cooperation. The second dimension concerns whether there is a direct path between social coordination and cooperation, or whether this relationship is mediated by other factors, such as group cohesion (e.g., Wiltermuth and Heath, 2009). The third dimension concerns whether there is a linear relationship between coordination and cooperation at the level of individual participants, or whether pro-social benefits obtain (and then remain more or less constant) after a certain threshold in coordination is reached. These models are, themselves, descriptive rather than explanatory. However, this work moves us further down the road toward explanation by explicitly identifying the features that any future explanatory model must possess. We first discuss the dimensions of interest in more detail below with reference to the existing evidence from the literature in favor of particular classes of models. We then summarize our choice of movement task and explain how it enables us to test the dimensions of interest, thereby helping us home in on essential features that an explanatory model of the pro-social effects of intentional coordination must possess.

# IN-PHASE SYNCHRONY VS. COORDINATION (S+ VS. S−)

Movements are coordinated when two rhythmically moving limbs (oscillators) move so as to maintain some relative phase with respect to one another. Movements are synchronous when those limbs move in-phase (i.e., at 0° relative phase). During in-phase movements, the two oscillators move in the same direction at the same time. During anti-phase (180° relative phase) movements, each oscillator moves in the opposite direction to its partner at the same time. Throughout this work the term synchrony is used to refer to in-phase movements only (in line with the general literature on coordination, e.g., Kelso, 1995), although elsewhere anti-phase has sometimes been treated as an example of synchrony (e.g., Miles et al., 2010). Our definition of synchrony was chosen in order to allow us to easily discriminate between strict in-phase synchronization and other forms of coordination (i.e., anti-phase). Technically, successfully moving so as to maintain any relative phase (from 0 to 360°) is an instance of a coordinated rhythmic movement (although there are well known limits to the coordinations humans can produce without extensive training; Kelso, 1995). The question is whether the pro-social effects of coordination reported in the literature are actually restricted to cases where the coordination is in-phase (synchronous movements).

If coordination, generally, and not just in-phase synchrony, has positive consequences on cooperation, then the effects should be obtained following coordination at any relative phase. We currently lack evidence to support this idea because the majority of tasks used to test the pro-social effects of coordination rely exclusively on in-phase coordination (and, to our knowledge, our experiment is the first work to address the effects of anti-phase coordination on cooperation, specifically). Those that have employed anti-phase conditions have found mixed evidence concerning whether anything besides in-phase synchrony impacts social variables (e.g., Miles et al., 2010; Cirelli et al., 2014; Sullivan et al., 2014). To begin disambiguating the effects of in-phase synchrony from the effects of coordination more generally, Experiment 1 explicitly compares the effects of in-phase and anti-phase coordination on post-task cooperation.

# DIRECT VS. INDIRECT EFFECT (D+ VS. D−)

The effect of coordination on pro-social variables is indirect if coordination must impact an intervening variable (e.g., group cohesion) or coincide with a causally relevant variable (e.g., social context) in order to affect cooperation. If this is the case, then coordination only has positive consequences for pro-social variables by virtue of its effect on something like group cohesion or by providing the opportunity to engage in a certain type of social context. In contrast, the effect of coordination on pro-social variables could be direct. If the relationship is direct, then coordination would not need to impact an intervening variable or coincide with another causally relevant variable to influence cooperation.

The literature, to date, is conflicted concerning directness. We first consider evidence for a mediating variable between coordination and cooperation. Research has focused exclusively on two potential mediators: group cohesion and self-other overlap. Group cohesion is the feeling of being on the same team and being emotionally connected with other group members. Wiltermuth and Heath (2009) and Wiltermuth (2012) found that levels of post-task group cohesion were related to the social effects of coordination, though others (e.g., Reddish et al., 2013; Lumsden et al., 2014; Dong et al., 2015) found no such relationship. The discrepancy in results may be at least partially explained by differences in how group cohesion was conceptualized and measured. Reddish et al. (2013) grouped emotional connection, trust and self/other overlap (the extent of self-rated overlap between oneself and others) into a single construct, which was termed group cohesion, after factor analysis suggested they all tap a similar construct. Wiltermuth (2012), on the other hand, measured group integrators only (i.e., perceived closeness, connectedness and similarity to the group) and labeled the construct emotional connection (see also Wiltermuth and Heath, 2009; Lumsden et al., 2014; Dong et al., 2015).

Others have investigated self-other-overlap as a potential mediator of the relationship between coordination and cooperation; again, evidence for the mediated model is inconclusive. Lumsden et al. (2014) and Reddish et al. (2013) found evidence in favor of a mediating relationship, while Reddish et al. (2014) found no evidence for such a relationship. As before, it is difficult to draw conclusions from the literature given the plurality of methods and measures.

Another way the effect of coordination on pro-social variables could be thought of as direct or indirect depends on whether a coordination task, in and of itself (i.e., absent a particular social context), is sufficient to impact cooperation. If it is direct in this way, then coordinating movements with, say, a metronome or a computer display rather than a co-actor would be sufficient to lead to social consequences. If it is indirect in this way, then coordination must be accompanied by some kind of social context to impact pro-sociality; i.e., the effect would not be due to coordination "per se" (coordination itself and/or coordination by itself). There is considerable evidence that some kind of social context is an important element in obtaining positive social effects following coordination tasks (Hove and Risen, 2009; Kirschner and Tomasello, 2009; Wu et al., 2013; Launay et al., 2014); however, questions remain about how much social context is necessary and whether this relationship is one of mediation or moderation.

In sum, the evidence from previous research is inconclusive about whether coordination must impact an intervening variable in order to have positive consequences on cooperation. Evidence is stronger for the idea that coordination must coincide with a social context in order to affect cooperation. The studies reported below provide the strongest evidence to date for D+ vs. D− models by testing a variety of potential mediators (i.e., group cohesion, self-other overlap, trust, self-rated success at coordination, self-rated task difficulty, task difficulty, and mood) within subjects at both pre- and post-coordination. In line with the substantial existing evidence that social context is important, all of the studies below involve pairs of participants completing an intentional coordination task together; however Followup 1 manipulates whether the information participants use to coordinate is social or non-social.

# INDIVIDUAL VS. GROUP LEVEL EFFECTS (P+ VS. P−)

Whether the effect of intentional coordination on cooperation is direct or indirect, there are two main types of relationship we might observe between these variables. The first possibility is that individual measures of coordination success predict individual levels of cooperation. That is, changes in cooperation occur in proportion to changes in coordination success. The second possibility is that there is a threshold relationship between coordination and cooperation. In this case, coordination would positively influence cooperation as long as some minimum threshold of coordination success was achieved.

Previous research paints a mixed picture in terms of what to expect on this dimension. The only work focusing on cooperation to take actual measures of coordination found that coordination did not predict cooperation (Kirschner and Ilari, 2014), but this result is limited by the fact that they found no effect of coordination on cooperation anyway. Looking beyond cooperation to other social variables does little to clarify the picture. On the one hand, there is evidence that tightness in movement coupling predicts likability between co-actors (Hove and Risen, 2009). On the other hand, coordination success is not a good predictor of post-task trust (Launay et al., 2013). The studies reported below compare P+ vs. P− models by testing whether individual level success at coordination predicts subsequent individual level cooperation behavior.

# Our Coordination Task

Researchers have used a variety of tasks to investigate the effect of intentional coordination on pro-sociality (e.g., waving cups and singing: Wiltermuth and Heath, 2009; flexing and extending arms: Miles et al., 2010). It is difficult to lay the groundwork for an explanatory model using results from such a variety of complex tasks. It would be preferable to identify a coordination task that is simple enough to study but complex enough to allow all the manipulations required to investigate when and how coordination affects social behavior. We believe we have found such a task, which is described below; first, though, we explain in more detail the basic structure of coordinated rhythmic movement (CRM) tasks in general.

CRM tasks are essentially perception-action tasks, and have typically been studied as such in the experimental literature (e.g., Kelso, 1995; Bingham, 2001, 2004). They involve the continuous control and matching of rhythmic movements via perceptual information about the coordination between those movements. The rhythm of a CRM is defined by the relative phase between the oscillating movements. Movements are coordinated when a particular relative phase is maintained within some error band. As discussed earlier, in-phase coordination occurs when the movements are in the same direction at the same time, while anti-phase coordination occurs when the movements are in the opposite direction at the same time. The remaining range of coordinated movements is generally described as "out-of-phase." The basic phenomena of a CRM task are that movements are stable at in- and anti-phase, while movements at any other phase are difficult to maintain and highly variable. In-phase movements are more stable than anti-phase movements and, if the frequency of anti-phase movements is increased to around 3–4 Hz they transition to in-phase. These effects persist when the coordination is enacted between two people (Schmidt et al., 1990) and between a person and a point light display (e.g., Wilson et al., 2005a,b). This indicates that the ability to maintain rhythmic coordination depends on a perceptual coupling of information specifying relative phase between oscillators.

Bingham et al. (Bingham, 2001, 2004; Snapp-Childs et al., 2011) have developed a model of CRM (the Bingham model) using a task where participants move joysticks from side to side at some relative phase to coordinate the motions of two dots on a computer screen. The screen shows a point light display representing the limbs' motions (see also Wilson et al., 2005a,b). This task contains all the critical elements of a CRM task: voluntary control of limbs, coordination of limbs with a co-actor and perceptual control of the coordination. The Bingham model explains the above phenomena by explicitly modeling the perception-action components involved in the task. Several papers have empirically validated the main predictions of the model (Wilson and Bingham, 2008; Wilson et al., 2010; Snapp-Childs et al., 2011).

The studies below are based on the task used by Bingham and colleagues to develop an explanatory model of CRM. This CRM task is particularly well-suited to the job of discriminating S+ and S− models, as it's possible to run the task with any target relative phase, and of discriminating P+ and P− models, as it allows us to compute precise and sensitive measures of coordination that can be used to determine how much actual coordination predicts post-task measures, if at all. We can then combine data from this task and other measures to discriminate D+ and D− models as well.

This choice of task is also ideal for constructing an appropriate control task, which has proven a major challenge in the literature. A good control task must be comparable to the CRM task, involving co-actors making comparable movements (though ones that are not rhythmically coordinated with their co-actors). However, control tasks in the six papers looking at how CRM affects cooperation varied considerably in how closely they matched the experimental task (see **Table 1**). Some previous work has even used anti-phase movements as a control condition. However, as noted above, moving anti-phase (or even out-of-phase) with someone is still a type of CRM. People can and do entrain at anti-phase, and similar social effects might also be fostered by anti-phase interpersonal entrainment (see Cirelli et al., 2014). Tasks involving completely disparate activities such as doing a jigsaw (Reddish et al., 2014) or watching a documentary (Anshel and Kipper, 1988) may also not be appropriate controls, as they are too different from the experimental tasks at hand. For example, tapping one's foot in time to a metronome with two other people is not very similar to doing a jigsaw with two other people (Reddish et al., 2014), as these tasks vary in multiple ways (i.e., one includes music and one does not; one includes coordinating your movements with the other person in a certain way while the other does not employ movement coordination at all). This makes it difficult, if not impossible, to interpret differences between conditions as the result of CRM.

Our CRM task is amenable to a straightforward, well-matched control task whereby participants are instructed to move their joysticks at different frequencies while performing different movements. This control condition is minimally different from coordinated conditions (both involve rhythmically moving a joystick at a specified frequency), while breaking the coordination between partners.

# The Current Studies

The goal of the studies that follow is to begin homing in on the class of model that best captures the relationship between intentional coordination and cooperation. This work will place specific, empirically-driven constraints on future work concerning the mechanism by which coordination influences cooperation. Experiment 1 was designed to discriminate between S+ and S− models (in-phase synchrony or coordination), between D+ and D− models (direct or mediated), and between P+ and P− models (individual or group level effect). Based on the results of this experiment we conducted two follow-ups. The first further explores the S+/S− distinction by investigating the consequences of coordinating via social and non-social information. The second probes the necessary features of a coordination task by testing two control tasks.

# EXPERIMENT 1

Experiment 1 tested whether in-phase synchrony is necessary to the effect of coordination on cooperation or whether the effect obtains with other coordinations as well (S+ or S−). Since our task allows a kinematic record of each participant's movements, we also tested whether cooperation varies in proportion to coordination, allowing us to discriminate between P+ and P− models. Finally, we measured several potential mediators suggested from previous research, which provides some evidence for D+ vs. D− models.

# METHODS

# Participants

Sixty-six undergraduate students at Leeds Beckett University volunteered to participate (19 males and 47 females; M<sub>age</sub> = 19.17 years, SD<sub>age</sub> = 2.77). All participants were naive to the aims of the study. The experiment was approved by the Leeds Beckett University Psychology Ethics Review Board.


# Design

The study employed an experimental design with one between-subjects factor: Movement Phase. This had three levels: in-phase (0°), anti-phase (180°), or no coordination (control).

# Tasks and Measures

# Movement

In both experimental conditions, pairs of participants sitting side by side moved one joystick each (Logitech Pro joysticks with force feedback disabled) horizontally at 0.75 Hz, using a point light display (PLD) to monitor their own and their partner's movements. The PLD consisted of two white feedback dots displayed on a black background by a single laptop screen positioned approximately 1 m in front of them. The dots were 40 × 40 pixels, and separated by a visual angle of 0.14°, one above the other, positioned in the center of the screen (Wilson et al., 2005a,b, 2010; Snapp-Childs et al., 2011). In the in-phase condition, participants moved so as to maintain 0° relative motion between their own and their partner's dots. In the anti-phase condition, participants moved so as to maintain 180° relative motion between their own and their partner's dots.

For the control task, participants made uncoordinated movements at different frequencies. One participant always moved their joystick at 0.6 Hz and the other always moved at 0.9 Hz (0.75 ± 0.15 Hz). Participants alternated moving their joysticks vertically and in clockwise circles, so that partners never performed the same movement during a trial. Participants switched movements every trial (e.g., person 1 moved vertically on one trial, in circles on the next etc.; person 2 in the pair did the opposite).

Participants in all conditions first saw two 15 s demonstrations of dots moving at the desired phase and frequency. In the experimental conditions both dots moved at 0.75 Hz (at either 0° or 180° relative to each other). In the control condition one dot moved at 0.6 Hz and the other at 0.9 Hz. After each demonstration participants had 30 s of practice time to acquaint themselves with the required movements. Following this brief initial practice, participants completed six 60 s trials. Each trial was preceded by a 4 s version of the demonstration pacing them to the required phase and frequency of movements. This experiment was run on a MacBook Pro with a custom Matlab toolbox programmed by the second author and incorporating the Psychtoolbox (Brainard, 1997; Pelli, 1997; Kleiner et al., 2007).

# Social Mediators

#### **Self/other overlap**

Self/other overlap was measured using the Inclusion of the Other in Self (IOS) scale (Aron et al., 1992). Participants were asked to indicate how much overlap they felt between themselves and the other participant by choosing from one of seven different diagrams. The diagrams consist of increasingly overlapping circles, one representing the self and one representing the other (see Data Sheet 1).

#### **Cohesion scale**

Five questions were used to measure mood, trust and cohesion (see Data Sheet 2). Question 1 measured participants' mood. Question 5 measured how much participants trusted each other. Questions 2–4 measured participants' cohesion to each other (closeness, connectedness and similarity). These were the same questions as have previously been used to measure cohesion in Wiltermuth and Heath (2009). Participants recorded their responses to each of these questions by marking a 185 mm continuum. This response scale was used to make it more likely to detect any changes after the movement manipulation and has been successfully used in a similar context by Lumsden et al. (2014).

# Dependent Variables

#### **Economic game**

This included both a Public Goods Game (PGG) and an investment game (see Data Sheet 3). The PGG was identical to that used by Wiltermuth and Heath (2009) except token values were changed from dollar amounts to points. Participants were given a response booklet containing instructions and response sheets for each of five rounds of play. The aim of the game was to collect as many points as possible. In order to encourage competition between participants, the person who collected the most points won £40 of vouchers. For each of the five rounds participants had ten tokens to allocate between two accounts, a private account and a public account. Each token in the public account was worth three points to each of the players, while each token in the private account was worth five points only to the player who allocated that token. In each round participants privately recorded how many tokens they wished to allocate to each of the two accounts.
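The payoff rule described above can be illustrated with a minimal sketch. The function name `pgg_round_points` and its parameters are hypothetical and are not taken from the study materials; the sketch assumes only what the text states, namely ten tokens per round, three points to every player for each token in the public account, and five points to the allocating player for each token kept in the private account.

```python
def pgg_round_points(public_tokens, tokens_per_round=10,
                     public_value=3, private_value=5):
    """Points earned by each player in one round of the public goods game.

    public_tokens: one entry per player, the number of tokens that player
    placed in the public account (the rest go to their private account).
    """
    for given in public_tokens:
        if not 0 <= given <= tokens_per_round:
            raise ValueError("allocation must be between 0 and tokens_per_round")

    # Every token in the public account pays out to every player.
    public_pool = sum(public_tokens)

    points = []
    for given in public_tokens:
        kept = tokens_per_round - given
        # Private tokens pay only their owner; the public pool pays everyone.
        points.append(private_value * kept + public_value * public_pool)
    return points


# Full mutual cooperation, full mutual defection, and one-sided cooperation.
print(pgg_round_points([10, 10]))  # [60, 60]
print(pgg_round_points([0, 0]))    # [50, 50]
print(pgg_round_points([0, 10]))   # [80, 30]
```

Under these values a token moved to the public account costs its owner two points (three gained, five forgone) while giving the partner three, which is what makes the public account donation a measure of cooperation.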

#### **Investment game**

After Round 5 of the PGG, participants played an investment game (adapted from Berg et al., 1995) to measure trust and reciprocity. Participants had the chance to transfer/invest the points (none, a quarter, half, or all) that they had earned in the public goods game. Any points that were invested were automatically doubled but it was up to the other player how many of these points to return to them (none, only the original amount invested, the original investment plus half of the earned bonus, or all of the original investment and the earned bonus). Each participant acted as both investor and banker simultaneously by confidentially marking their choices on a separate sheet without any discussion.
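As an illustration of the investment game's arithmetic, the following sketch computes the outcome of one investor-banker exchange. The names `INVEST_FRACTIONS`, `RETURN_MULTIPLIERS` and `investment_outcome` are hypothetical, and the sketch assumes that the "earned bonus" equals the amount added by doubling (i.e., the invested amount itself); the response sheets in Data Sheet 3 remain the authoritative description.

```python
# Fraction of the investor's points that is transferred (the trust measure).
INVEST_FRACTIONS = {"none": 0.0, "quarter": 0.25, "half": 0.5, "all": 1.0}

# Amount returned, as a multiple of the invested amount (the reciprocity measure).
# Assumption: doubling an investment of x yields 2x, so the bonus is x.
RETURN_MULTIPLIERS = {
    "none": 0.0,
    "original": 1.0,
    "original_plus_half_bonus": 1.5,
    "original_plus_bonus": 2.0,
}


def investment_outcome(points, invest_choice, return_choice):
    """Return (investor_total, banker_gain) for a single exchange."""
    invested = points * INVEST_FRACTIONS[invest_choice]
    doubled = 2.0 * invested                      # invested points are doubled
    returned = invested * RETURN_MULTIPLIERS[return_choice]
    investor_total = points - invested + returned
    banker_gain = doubled - returned              # whatever the banker keeps
    return investor_total, banker_gain


print(investment_outcome(100, "half", "original_plus_half_bonus"))  # (125.0, 25.0)
print(investment_outcome(100, "all", "none"))                       # (0.0, 200.0)
```

Because each participant acted as both investor and banker, a full session would apply this calculation twice per pair, once in each direction.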

# Procedure

This study was conducted in pairs. Sessions lasted approximately 25 min. Participants completed the IOS and the cohesion scale (pre-test measures of potential mediators, and mood item) followed by the movement task. Participants then rated their perceived success at the coordination task as well as task difficulty and enjoyment using four-point Likert scales. Next, participants completed a second copy of the IOS and cohesion scale (posttest measures of potential mediators, and mood item). Finally, participants took part in the Economic (public goods and investment) Game.

# RESULTS

We checked whether mood, task difficulty, task enjoyment, and perceived success differed between in-phase, anti-phase, and control tasks. The distribution of scores on each of these variables was found to be non-normal using Shapiro-Wilk tests (the test of normality used throughout; p's < 0.05). Kruskal-Wallis tests confirmed that scores on these variables did not differ between movement tasks (all p's > 0.05). It was therefore concluded that mood, task enjoyment, perceived task difficulty, and perceived success did not contribute to the effects described below.

# Coordination

All movement trials except for the first two practice rounds were analyzed. A low-pass Butterworth filter with a cut-off frequency of 10 Hz filtered each dot's position time series. A 60 Hz time series of the relative phase between the two dots was computed as the difference between the arctangent of each dot's velocity over position at each sample.
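The relative phase computation can be sketched as follows. This is not the authors' Matlab toolbox: the function names are hypothetical, the low-pass filtering step is omitted, velocity is estimated by central differences, and velocity is divided by the angular frequency so that position and velocity share the same scale before the arctangent is taken (an assumption not stated in the text).

```python
import math


def phase_angles(position, sample_rate_hz, freq_hz):
    """Phase angle (radians) of one oscillator at each interior sample."""
    # Centre the signal so phase is measured about the mean position.
    mean_pos = sum(position) / len(position)
    centred = [p - mean_pos for p in position]
    omega = 2.0 * math.pi * freq_hz

    phases = []
    for i in range(1, len(centred) - 1):
        # Central-difference estimate of velocity at sample i.
        velocity = (centred[i + 1] - centred[i - 1]) * sample_rate_hz / 2.0
        # Arctangent of (frequency-normalised) velocity over position.
        phases.append(math.atan2(velocity / omega, centred[i]))
    return phases


def relative_phase_deg(pos_a, pos_b, sample_rate_hz=60.0, freq_hz=0.75):
    """Relative phase between two dots, in degrees, wrapped to [-180, 180)."""
    phases_a = phase_angles(pos_a, sample_rate_hz, freq_hz)
    phases_b = phase_angles(pos_b, sample_rate_hz, freq_hz)
    wrapped = []
    for a, b in zip(phases_a, phases_b):
        diff = math.degrees(a - b)
        wrapped.append((diff + 180.0) % 360.0 - 180.0)
    return wrapped


# Two simulated dots at 0.75 Hz sampled at 60 Hz, moving in opposite directions.
t = [i / 60.0 for i in range(240)]
dot_a = [math.sin(2.0 * math.pi * 0.75 * s) for s in t]
dot_b = [-x for x in dot_a]
print(abs(relative_phase_deg(dot_a, dot_b)[0]))  # close to 180 (anti-phase)
```

Wrapping the difference keeps values near the ±180° boundary from appearing as spurious 360° jumps in the time series.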

Mean vector length (MVL) is the circular equivalent of the standard deviation (Batschelet, 1981; see Wilson et al., 2005a,b for more detail). It is the normalized length of the resultant vector obtained by summing the relative phase vectors from each time step and measures coordination stability. MVL ranges from 0 (indicating minimum stability, a uniform circular distribution) to 1 (indicating maximum stability, no variability).
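A minimal sketch of the MVL calculation, assuming the relative phase time series is supplied in degrees; the function name `mean_vector_length` is hypothetical. Each sample is treated as a unit vector on the circle, the vectors are summed, and the length of the resultant is divided by the number of samples.

```python
import cmath
import math


def mean_vector_length(relative_phase_deg):
    """Normalised length of the resultant of unit vectors at each relative phase."""
    # Represent each relative phase sample as a unit vector in the complex plane.
    unit_vectors = [cmath.exp(1j * math.radians(p)) for p in relative_phase_deg]
    resultant = sum(unit_vectors)
    return abs(resultant) / len(unit_vectors)


# Perfectly stable coordination at any single phase gives an MVL of 1.
print(round(mean_vector_length([180.0] * 10), 6))               # 1.0
# Phases spread evenly around the circle give an MVL of 0.
print(round(mean_vector_length([0.0, 90.0, 180.0, 270.0]), 6))  # 0.0
```

The first example shows why MVL by itself cannot distinguish stable anti-phase from stable in-phase movement: it reflects how tightly the relative phase clusters, not where the cluster lies.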

The distributions of MVL scores for those who moved in-phase, those who moved anti-phase, and those who did not coordinate all differed significantly from normality (p's < 0.05). An independent samples Kruskal-Wallis test identified a significant effect of phase on coordination scores [H(2) = 47.29, p < 0.001]. Bonferroni post-hoc tests with adjusted p-values (for 3 pairwise comparisons) showed more stable coordination for those moving in- and anti-phase than in the control condition (see **Figure 1** for mean MVL scores), p's < 0.001. Coordination at anti-phase did not significantly differ from coordination at in-phase (p > 0.05)<sup>1</sup>.

# Cooperation

Next we examined whether participants in the in- and anti-phase conditions were more cooperative post movement task than those in the control condition. A univariate ANOVA found a significant effect of phase on the mean public account donation [F(2, 63) = 3.62, p < 0.05, η<sup>2</sup> = 0.10]. Bonferroni post-hoc tests indicated that the only significant difference lay between those who moved in-phase and the control (p < 0.05); no other comparison was significant (p's > 0.05). Post-coordination cooperation was greater for participants in the in-phase group compared to the control group (**Figure 2**).

Next we conducted a simple linear regression with each pair's MVL scores and each pair's average public goods donation to determine if the degree of coordination success predicts the degree of cooperation. A pair's coordination score did not significantly predict their average cooperation score [F(1, 31) = 3.19, p > 0.05, r<sup>2</sup> = 0.093].

# Trust and Reciprocity

Trust was measured using the first part of the investment game (choosing what to invest with the other player: investing nothing, a quarter, half, or all). The distributions of those who moved in-phase, anti-phase, and those who did not coordinate all deviated significantly from normality (p's < 0.05). A Kruskal-Wallis test showed no significant difference in trust between those who moved at in-phase, those who moved at anti-phase, and those who did not coordinate [H(2) = 4.48, p > 0.05].

As a further check that coordination had no effect on trust, we compared self-reported measures of trust across the coordination conditions. Change scores for the self-reported trust measure were first calculated by subtracting each person's "before" score from their "after" score. The distributions for those who moved in-phase, anti-phase, and those who did not coordinate all deviated significantly from normality (p's < 0.05). Consistent with the measure of trust based on the investment game, a Kruskal-Wallis test showed no significant difference in the change in self-reported trust between those who moved at in-phase, those who moved at anti-phase, and those who did not coordinate [H(2) = 3.87, p > 0.05].

<sup>1</sup>Anti-phase is typically less stable than in-phase; this is one of the hallmarks of coordinated rhythmic movement. The lack of a difference here is a common issue with the MVL measure because it does not account for what relative phase people are actually performing. Anti-phase coordination can show an elevated MVL if people end up switching to in-phase coordination, and do that well (Wilson et al., 2005a,b; Snapp-Childs et al., 2011). We address this in detail in the Discussion section.

Reciprocity was measured using the option chosen in the second part of the investment game (choosing to return nothing, return only the original investment, return the original investment plus half of the bonus, or return the original investment plus all of the bonus). Reciprocity scores for those who moved in-phase, anti-phase, and those who did not coordinate all deviated significantly from normality (p's < 0.05). A Kruskal-Wallis test showed no significant difference in reciprocity between those who moved at in-phase, those who moved at anti-phase, and those who did not coordinate [H(2) = 4.11, p > 0.05].

# Potential Mediators (Group Cohesion and Self/Other Overlap)

Change in group cohesion was measured as the sum of the pre-to-post differences on the three cohesion questions (how similar/close/connected participants felt to each other). A univariate ANOVA with phase (in-phase, anti-phase, no coordination) as the factor showed no significant effect of phase on group cohesion [F(2, 63) = 1, p > 0.05].

Change in self-other overlap was measured as the difference in self-other overlap before and after engaging in the coordination task (post-coordination minus pre-coordination). The distributions of overlap change scores for those who moved at in-phase, those who moved at anti-phase, and those who did not coordinate all deviated significantly from normality (p's < 0.05). An independent samples Kruskal-Wallis test showed no significant effect of phase on changes in overlap between the three conditions [H(2) = 0.262, p > 0.05].

Analysis previously reported also confirmed that self-report measures of trust, mood, task difficulty, task enjoyment and perceived success did not differ between movement conditions.

# DISCUSSION

# Overview

The results showed that participants who moved in-phase with one another were more cooperative than those who moved in an uncoordinated manner. None of the measured candidate mediators were related to cooperation, and cooperation was not predicted by the level of coordination between partners. The results of Experiment 1 lend support to S+, D+, and P− models of how intentional coordination affects cooperation.

# Coordination Success (P+ vs. P− Models)

MVL scores suggested participants coordinated equally well at both in- and anti-phase. Coordination in both of these experimental conditions was better than in the control condition. MVL scores did not significantly predict cooperation, which suggests that the social effects seen post-entrainment do not vary linearly at an individual level with coordination. This is consistent with Kirschner and Ilari (2014) and Launay et al. (2013) and favors P− models.

MVL is a measure of coordination (i.e., the extent to which people are doing something together) but it is not a measure of success at performing the target coordination. For example, people trying to move in anti-phase might fail to do so and spend their time moving in-phase. MVL might still be high because the partners were coordinating, even though they had failed at the target task (see Snapp-Childs et al., 2011 and Wilson et al., 2005a for detailed analyses of this problem). A better measure of coordination for this purpose is the proportion-time-on-target. This is the proportion of time people spent coordinating at the required phase (within an error bandwidth, typically set to 20°). Proportion-time-on-target, therefore, indicates how successful participants are at coordinating at the required relative phase (Wilson et al., 2010; Snapp-Childs et al., 2011, 2015). This measure was not used in our primary analysis because our control task has no target relative phase (meaning it is not possible to compute proportion-time-on-target for the control condition). However, the proportion-time-on-target can be calculated for the experimental conditions.
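The proportion-time-on-target measure can be sketched as below. The function name `proportion_time_on_target` is hypothetical, and the sketch assumes that the 20° bandwidth is applied as ±20° around the target, which the text does not specify; see Wilson et al. (2010) and Snapp-Childs et al. (2011) for the definition actually used.

```python
def proportion_time_on_target(relative_phase_deg, target_deg, bandwidth_deg=20.0):
    """Fraction of samples whose relative phase lies within the band around target."""
    on_target = 0
    for phase in relative_phase_deg:
        # Signed circular error, wrapped to [-180, 180) so that, for example,
        # -175 degrees counts as 5 degrees away from a 180 degree target.
        error = (phase - target_deg + 180.0) % 360.0 - 180.0
        if abs(error) <= bandwidth_deg:
            on_target += 1
    return on_target / len(relative_phase_deg)


# A pair asked to move anti-phase that drifts to in-phase for part of the trial.
samples = [175.0, -175.0, 0.0, 190.0]
print(proportion_time_on_target(samples, target_deg=180.0))  # 0.75
print(proportion_time_on_target(samples, target_deg=0.0))    # 0.25
```

Unlike MVL, this measure depends on the target phase, which is why it cannot be computed for a control condition that has no target.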

Further analyses of the proportion-time-on-target scores revealed that those who were instructed to move in-phase were more successful than those who were instructed to move anti-phase (see **Figure 3** for mean proportion-time-on-target scores). Scores for those who moved at anti-phase were not normally distributed (p < 0.05). Because of this, an independent samples Mann-Whitney U-test was performed, which showed that there was a significant effect of phase on coordination (U = 140, p < 0.05), with those moving at in-phase performing significantly better than those moving anti-phase. However, coordination measured with proportion-time-on-target still did not significantly predict cooperation. A simple linear regression was run with each pair's proportion-time-on-target scores and each pair's average public goods donation, to determine if coordination success predicts cooperation. A pair's coordination score did not significantly predict a pair's average cooperation score [F(1, 42) = 0.54, p > 0.05, adjusted r<sup>2</sup> = −0.011]. With the improved measure, we could identify the expected difference in performance between in- and anti-phase but the degree of coordination still did not predict the degree of cooperation. The data therefore still come down in favor of P− models; once some threshold amount of coordination has occurred, cooperation is positively affected.

# Potential Mediators (D+ vs. D− Models)

Against predictions, changes in trust, group cohesion and self/other overlap did not differ between conditions, suggesting that these factors do not mediate CRM's effect on cooperation (supporting D+ models). The finding that increases in group cohesion do not mediate these effects supports the work of Dong et al. (2015), Lumsden et al. (2014), and Reddish et al. (2014). However, it did not support the work of Reddish et al. (2013), Wiltermuth and Heath (2009), and Wiltermuth (2012), which found that cohesion partially mediates the relationship between CRM and its social consequences. The finding that self/other overlap does not mediate these effects contradicts studies reported by Lumsden et al. (2014) and Reddish et al. (2013).

One reason for the inconsistencies in findings could be that the present study is the first to take "before and after" coordination measures of possible mediators. It may be the case that CRM does not actually foster changes in the given variables and that previous studies simply found group differences across these variables as opposed to actual increases in mediators as a result of CRM. Alternatively, it could be that the measures used here are not sensitive enough to be used as before and after measures. Completion of the pre-test measures may have restricted participants' answers to post-test measures, leaving participants unable or unwilling to give more natural responses that might otherwise have revealed increases in potential mediators. For the cohesion measure we saw a mean change score of 2.27 with a standard deviation of 5.63. For the overlap measure we saw a mean change score of 0.45 with a standard deviation of 1.3. Given the considerable variation in individual change scores, we do not believe this interpretation alone can explain our findings.

# Synchrony vs. Coordination (S+ vs. S− Models)

This experiment did not provide conclusive evidence that cooperation was improved by coordination more generally. Significantly greater cooperation was only seen after in-phase coordination compared to control. Anti-phase coordination did not promote greater cooperation than control; however, cooperation levels following anti-phase coordination did not significantly differ from cooperation levels following in-phase coordination either. While this might initially lend some support to the S+ class of models (synchrony, rather than coordination, being required), other findings lead us to question whether in-phase synchrony is crucial. Anti-phase coordination is a stable form of coordination (Kelso, 1995) that has been shown to affect other pro-social variables (see Cirelli et al., 2014).

The findings of Kokal et al. (2011) might shed light on the conditions necessary for different coordinations to affect prosociality. They provide evidence that only when a coordination is relatively easy to perform can we attend to the social nature of the task, which is crucial to the pro-social consequences that follow. Anti-phase coordination is known to be harder and more demanding than in-phase coordination (Kelso, 1995), as was supported by the proportion-time-on-target results in this experiment (see **Figure 3**).

One potential limitation of our task was the use of simple PLDs to transmit movement information. These displays are informative about the dynamics of a person's action (Johansson, 1950; Bingham, 1987) and the success of coordinated movements in particular (Wilson and Bingham, 2008; Wilson et al., 2010). However, with participants' attention focused on the PLDs instead of on their partner, the social context of the coordination task might have been attenuated.

The fact that relevant social information may be harder to detect during anti-phase coordination might explain why anti-phase coordination did not significantly differ from control. A follow up explores this possibility by having participants coordinate at both relative phases using direct visual information of each other's movements. This setup makes the social nature of the task more salient. If post-task cooperation is higher following anti-phase coordination given this change, it would add further support for D− models, where an additional causally relevant factor (e.g., social context) is necessary for coordination to affect cooperation.

# FOLLOW UP 1

In this follow up, we used a modified version of the CRM task in which co-actors coordinated by looking at each other in a full-length mirror instead of using PLDs. Only the two experimental conditions (in- and anti-phase) were run in order to test whether increased social information would allow cooperation following the anti-phase condition to reach the level seen after in-phase coordination in Experiment 1. It was hypothesized that coordinating via a mirror would allow anti-phase CRM to affect cooperation similarly to in-phase CRM.

# METHODS

# Participants

Forty-four psychology students at Leeds Beckett University volunteered to participate (8 males and 36 females, Mage = 19.86 years, SDage = 1.79). All participants were naive to the aims of the study. This study was approved by the Leeds Beckett University Psychology Ethics Review Board.

# Design, Measures, and Procedure

The design was identical to the in-phase and anti-phase conditions from Experiment 1 except that participants watched each other using a 6 ft mirror placed horizontally 1 m in front of them, below the laptop screen, so that each could view both of their upper bodies. These data were compared to the corresponding conditions from Experiment 1 to see whether enriched visual social information influenced cooperation. This follow up employed an experimental design with one between-subjects factor, Movement Phase, with two levels: in- and anti-phase. This enabled us to analyse the coordination data using the superior proportion-time-on-target measure. The remaining measures and procedure were identical to Experiment 1.

# RESULTS

We first examined mood, task difficulty, task enjoyment and perceived success measures for these two new conditions to see whether these varied across conditions, using a series of Kruskal-Wallis tests (all data distributions non-normal, p's < 0.05). None of these variables differed between the in-phase and anti-phase groups (all p's > 0.05). It was therefore concluded that mood, task enjoyment, perceived task difficulty or perceived success did not contribute to the effects described below.

# Coordination

We investigated differences in coordination scores across conditions using proportion-time-on-target as a measure of coordination. The distributions of those who coordinated using the PLD and mirror at both in- and anti-phase all differed significantly from normality (p's < 0.05), and Levene's test indicated unequal variances (F = 15.95, p < 0.001). Transforming the data did not allow it to meet the normality or homogeneity assumptions. Since no non-parametric alternative to a 2-way ANOVA could be performed and Field (2013) advises that homogeneity violations are irrelevant if sample sizes amongst conditions are roughly equal (sample sizes per condition here are identical, n = 22), a univariate ANOVA was still used. There was only a significant effect of Movement Phase [F(1, 87) = 14.78, p < 0.001], with those who moved in-phase showing greater coordination (M = 0.591, SD = 0.016) than those who moved anti-phase (M = 0.507, SD = 0.016). The effect of Coordination Information [F(1, 87) = 2.45, p > 0.05] and the interaction [F(1, 87) = 0.73, p > 0.05] were not significant (see **Figure 4** for mean proportion-time-on-target scores). It was therefore concluded that only Movement Phase had a significant effect on coordination, with those coordinating in-phase performing more accurately than those coordinating anti-phase. The type of available Coordination Information had no effect on coordination scores.

# Cooperation

We then explored how rhythmically coordinating at different relative phases via differing Coordination Information affected cooperation using a 2-way ANOVA. There was no main effect of either Coordination Information or Movement Phase (p's > 0.05). However, there was a significant interaction between the phase people moved at and the information they used to coordinate their movements [F(1, 84) = 4.18, p < 0.05, η<sup>2</sup> = 0.04]. People who coordinated anti-phase via a mirror cooperated more than people who coordinated anti-phase via PLDs. There was no effect of Coordination Information on cooperation when people coordinated in-phase (see **Figure 5** for the mean public account donations for each condition).

Next we conducted a simple linear regression with each pair's proportion-time-on-target score and each pair's average public goods donation to determine whether coordination success predicts cooperation. A pair's coordination score did not significantly predict its average cooperation score [F(1, 86) = 0.16, p > 0.05, r<sup>2</sup> = 0.01].
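As an illustration of the kind of regression described above, the following Python sketch fits an ordinary least-squares line and reports r<sup>2</sup>. It is a minimal sketch under stated assumptions: the function name `simple_linear_regression` and the example values are hypothetical and are not the study's data or analysis code.

```python
def simple_linear_regression(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # With a single predictor, r squared is the squared Pearson correlation.
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical pair-level values (NOT the study's data):
# coordination score per pair and mean public-account donation per pair.
coordination = [0.45, 0.52, 0.58, 0.61, 0.66]
donation = [5.0, 7.5, 4.0, 6.5, 5.5]

slope, intercept, r_squared = simple_linear_regression(coordination, donation)
print(slope, intercept, r_squared)
```

An r<sup>2</sup> near zero, as reported in the text, means that knowing a pair's coordination score says almost nothing about how much that pair donated.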

# Potential Mediators (Group Cohesion and Self/Other Overlap)

Separate 2-way ANOVAs were conducted for each of the potential mediators, as reported in Experiment 1. No significant main effects of either Movement Phase or Coordination Information and no significant interactions were found in any of these analyses (all p's > 0.05).

# DISCUSSION

Participants coordinating at anti-phase were more cooperative if they coordinated via direct visual information of their partner's movements rather than via PLDs. In fact, those coordinating at anti-phase using the mirror showed cooperation levels comparable to participants in the in-phase condition. There was no such increase for those coordinating in-phase using direct visual information. This supports the claim of Kokal et al. (2011) that the social nature of the task is an important element in why CRM has pro-social consequences (supporting a D− model), which can be obscured in more demanding tasks. This suggests that both in- and anti-phase movements are capable of affecting cooperation under the right circumstances, favoring an S− model.

Coordination scores (proportion-time-spent-on-target) again did not significantly predict cooperation scores (supporting a P− model). There is still no evidence that coordination success is driving CRM's effect on cooperation, replicating the result from Experiment 1 and supporting work by Kirschner and Ilari (2014) and Launay et al. (2013).

Greater cooperation can therefore follow either in- or anti-phase CRM compared with uncoordinated movements. However, analyses of coordination scores have shown that actual coordination does not seem to be driving this effect. The degree of coordination does not successfully predict the degree of cooperation. So what is it about the CRM task that is driving differences in cooperation? What are the critical differences between the coordinated and uncoordinated versions of this task?

# FOLLOW UP 2

In the CRM task people make the same (horizontal) movements at a shared frequency (0.75 Hz), while in the control task people make different movements (circular and vertical) at different frequencies (0.6 or 0.9 Hz). This means there are two potential differences between the CRM task and the control: type of movement and frequency of movement. Having participants perform different movements is essential to break coordination in the control task, since research shows people will end up falling into one of the two stable phases of coordination when performing the same kinds of movement unless they are trained to achieve out-of-phase coordination (Kelso, 1995).

When engaging in CRM in everyday life (e.g., when dancing), people often coordinate different movements to the same overall rhythm. What is more, Lakens (2010) has shown that people judge coordinated rhythmically moving co-actors as more entitative (seeing each other more as a unified group than as disparate individuals) regardless of whether they are coordinating exactly the same movements or not. Therefore, in order to investigate whether coordinating different movements to the same rhythm could also affect cooperation, a further follow-up condition was run in which participants coordinated different movements but to the same frequency. This is compared with the original control and the original in-phase CRM conditions from Experiment 1. It was hypothesized that coordinating different movements to the same overall frequency would foster greater cooperation than performing uncoordinated movements.

# METHODS

# Participants

Twenty-two undergraduate students at Leeds Beckett University volunteered to participate (4 males and 18 females, Mage = 18.73 years, SDage = 4.32). All participants were naive to the aims of the study. This study was approved by the Leeds Beckett University Psychology Ethics Review Board.

# Design, Measures, and Procedure

**Movement task**

Participants made different movements but at the same frequency (0.75 Hz). One participant moved the joystick vertically and the other in clockwise circles. Participants switched movements each trial. Otherwise the structure of the movement task was identical to the Control in Experiment 1. This condition (Coordinated) was then compared with the original in-phase (In-phase) and control (Control) conditions from Experiment 1. With no defined target relative phase, we analyzed coordination using MVL. The remaining measures and procedure were identical to those reported in Experiment 1.
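For readers unfamiliar with MVL (mean vector length), the sketch below shows the usual circular-statistics definition: each relative-phase sample is treated as a unit vector and MVL is the length of the average vector. This is a minimal sketch under stated assumptions. The processing steps used in the study are not given in this section, and the function name `mean_vector_length` and the idealized phase series are hypothetical.

```python
import cmath
import math


def mean_vector_length(relative_phases_rad):
    """Length of the mean unit vector of a series of relative phases (radians).

    1.0 means the relative phase never changes (perfect phase locking at
    any value); values near 0.0 mean the relative phase wanders over the
    whole circle.
    """
    n = len(relative_phases_rad)
    resultant = sum(cmath.exp(1j * phi) for phi in relative_phases_rad)
    return abs(resultant) / n


# Idealized illustration (not study data), sampled at 100 Hz for 10 s.
sample_rate_hz = 100.0
n_samples = 1000

# Two movers at a shared 0.75 Hz with a constant offset: relative phase is fixed.
locked = [0.3 for _ in range(n_samples)]

# Movers at 0.6 Hz and 0.9 Hz: relative phase drifts at the 0.3 Hz difference.
drifting = [2.0 * math.pi * (0.9 - 0.6) * (k / sample_rate_hz)
            for k in range(n_samples)]

mvl_locked = mean_vector_length(locked)
mvl_drifting = mean_vector_length(drifting)
print(mvl_locked, mvl_drifting)
```

Because MVL does not require a target relative phase, it can score conditions such as the one described here, where the two movements differ and no particular phase relation is prescribed.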

# RESULTS

We first examined mood, task difficulty, task enjoyment and perceived success measures to see whether these varied across conditions using a series of Kruskal-Wallis tests (all data distributions non-normal, p's < 0.05). There was no significant effect of any of the above variables (all p's > 0.05). It was therefore concluded that mood, task enjoyment, perceived task difficulty or perceived success did not contribute to the effects described below.

# Coordination

We then investigated whether coordination scores differed across conditions using an independent samples Kruskal-Wallis test (recall coordination data previously failed normality tests). There was a significant effect of Movement Type on coordination scores [H(2) = 57.83, p < 0.001]. Pair-wise comparisons with adjusted p-values showed that those who moved In-phase coordinated significantly more than those in the Coordinated condition (U = 3.8, p < 0.001) and those in the Control (U = 7.60, p < 0.001). Those in the Coordinated condition coordinated significantly more than those in the Control (U = 3.8, p < 0.001). See **Figure 6** for the mean MVL scores.

# Cooperation

Next we examined the cooperation scores of those in the Coordinated compared with the original In-phase and Control conditions from Experiment 1. A univariate ANOVA was performed to see whether cooperation (mean public account donation) differed across the three movement conditions (In-phase, Coordinated and Control). There was a significant effect of Movement Type [F(2, 63) = 5.69, p < 0.01, η<sup>2</sup> = 0.15]. Bonferroni post-hoc tests indicated that those who moved In-phase (M = 6.19, SD = 2.24) showed more post-coordination cooperation than those in the Control (M = 4.2, SD = 2.81, p < 0.05). Those in the Coordinated condition (M = 6.72, SD = 2.74) also showed more cooperation than those in the Control (p < 0.01). There was no difference in cooperation between those in the Coordinated condition and those who moved In-phase (p > 0.05). See **Figure 7** for the mean public account donations for each condition.

# Potential Mediators (Group Cohesion and Self/Other Overlap)

A univariate ANOVA and Kruskal-Wallis test (recall previous normality scores) again confirmed that there were no significant differences in any of the candidate mediators between conditions (all p's > 0.05).

# DISCUSSION

The results of this follow up show that similar levels of cooperation are seen after coordinating different movements to a common frequency as are seen after in-phase coordination, despite levels of actual coordination being significantly lower. MVL scores show that coordinating different movements to a common frequency produced significantly less tight coordination than coordinating at in-phase but significantly tighter coordination than in the original control. This was not the pattern observed in cooperation, however. The Coordinated and In-phase conditions produced comparable levels of cooperation, and both showed higher cooperation than the Control condition.

These results suggest that people do not need to perform the same type of movements for coordination to have cooperative social consequences and emphasize again that tightness of coordination is not directly linked to the magnitude of cooperation (P− model). The important factor appears to be that they coordinate to a common rhythm. Verbal reports from participants in this new condition also indicated that participants felt they were coordinating their actions. Multiple participants in this condition reported that they were trying to coordinate one full cycle of their movements to a full cycle of the other's movements (i.e., trying to complete one full up-down-up cycle in the time it took the other to complete a full circle).

This, along with the other findings reported in this paper, suggests that it is not moving at some particular phase, or a given tightness in coupling which fosters cooperation. Rather, the crucial factor appears to be just intentionally moving in time with somebody in a clearly social context, regardless of whether the same movements are performed or whether there is a specific phase locking.

# GENERAL DISCUSSION

The experiment and follow ups detailed here showed that those who perform a simple CRM task are more cooperative post-task than those who perform a control task. We also showed that similar effects obtain following anti-phase coordination and after coordinating different movements to the same overall rhythm. We found no evidence that the degree of coordination predicts the degree of cooperation, and no evidence that increases in group cohesion or blurring of self/other overlap were mediating CRM's effects on cooperation. The effects on cooperation seem to mostly stem from simply moving in time in a social context.

# Revisiting Model Classes

# Synchrony (S+) vs. Coordination (S−)

The results of Experiment 1 initially supported S+ models, with no significant effect of anti-phase movement on cooperation. However, the point-light displays we used only provided information about the coordinated rhythmic movement, and may detract from the social context. Increasing the salience of the social context by using mirrors led to anti-phase movements affecting cooperation to the same extent as in-phase movements. In addition, different movements at the same frequency led to greater cooperation than different movements at a different frequency. The former are still coordinated in that they are matched in time (and participants reported working to coordinate this timing). Overall, these results suggest it is temporal coordination, and not just synchrony, which can lead to pro-social consequences and so future models should be of the S− class.

# Direct (D+) vs. Indirect (D−)

Across all three studies, we found no effects of any candidate mediating variable on cooperation. It is worth noting at this point that we only looked at interactions between pairs of coordinating co-actors, and different dynamics may be at play when groups of three or more engage in CRM. This may be especially relevant for the group cohesion findings, as group cohesion may not be an appropriate construct for two-person groups. Petersen et al. (2004) suggest group cohesion is an inter-individual attitude derived from depersonalized liking on the basis of group prototypicality. In other words, group cohesion may not be an appropriate concept for a pair of individuals. Similarly, Hogg and Turner (1985) propose that group cohesion is unlikely to be explained in terms of very personal constructs of self and other, but in terms of more general social similarities with larger numbers of people. It may be the case that group cohesion is an important factor in groups of three or more, but is not an appropriate mediator between CRM and cooperation in two-person groups such as those studied here.

Alternatively, it may be the case that we failed to see changes in potential mediators due to a testing effect confound. It is possible that including pre- as well as post-test measures of mediators may have restricted participants' post-test responses. We do not, however, believe that this is a likely explanation, since in other work (Cross et al., Submitted) increases in group cohesion amongst larger groups have been found using these test-retest measures.

Still, results reported here showed greater cooperation amongst pairs who had performed coordinated movement than those who had performed uncoordinated movement, which was not mediated by any of the variables suggested by the literature.

We did observe an effect of social context, whereby having visual access to one's partner during the coordination task was necessary to obtain an effect of anti-phase coordination on cooperation. This pattern of results supports a D− model and is consistent with previous work showing that coordination does not have positive social consequences if the coordination task does not have a social component.

# Predicting Individual (P+) or Group Level (P−) Effects

Again, we found no evidence that the quality of coordination between participants predicted the amount of cooperation they exhibited. In addition, there was no increase in coordination stability in anti-phase movements when co-actors coordinated via direct movement information, but cooperation did increase. Once people perceive that they are temporally coordinating in a social context, greater cooperation follows. This supports P− class models for future work.

# Limitations

The findings presented in this paper apply only to cases of intentional coordination. They may not necessarily generalize to instances of unintentional coordination. This remains an interesting point for future work to explore. A further limitation is that the results of Experiment 1 were analyzed in conjunction with both of the follow ups. These results are effectively exploratory and require independent replication.


# SUMMARY

The current studies demonstrated that people who engage in a simple CRM task are more cooperative post-task than people who engage in a control task. By relying on a well-defined and well-understood CRM task (see Golonka and Wilson, 2012 for a review), we were able to systematically manipulate a variety of task-critical parameters. This level of control means that we were able to begin identifying properties that eventual explanatory models of CRM's effect on cooperation must possess. In summary, our results indicate that this effect (1) follows from coordination generally, not just in-phase synchrony, (2) is indirect, in that coordination must occur in a social context, but direct in that the effect does not depend on coordination causing changes in mediating variables, and (3) is not proportional to individual-level coordination performance.

# ETHICS STATEMENT

Participants were provided with an information sheet explaining the nature of the research prior to making an appointment to participate in the research. When potential participants arrived at their appointment, they were provided with a detailed consent form detailing their rights as participants. They were free to ask the researcher any questions. If they were happy to continue, they signed the consent form and the experimental session began. The research was approved by the Leeds Beckett University Psychology Ethics Committee.

# AUTHOR CONTRIBUTIONS

LC conducted the experiments and analyses reported in this paper as part of his Ph.D. under the supervision of SG (director of studies) and AW (second supervisor). All authors therefore contributed to the design and analysis of the studies and we all contributed equally to the writing.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2016.01983/full#supplementary-material

Bingham, G. P. (1987). Dynamical systems and event perception. Perception 2:4.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Cross, Wilson and Golonka. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Anisotropy and Antagonism in the Coupling of Two Oscillators: Concepts and Applications for Between-Person Coordination

#### Harjo J. de Poel\*

*Center for Human Movement Sciences, University Medical Center Groningen, University of Groningen, Groningen, Netherlands*

Coupled oscillators provide a pertinent model approach to study between-person movement dynamics. While ample literature in this respect has considered the influence of external/environmental constraints and/or effects of a difference between the two agents' individual component dynamics (e.g., mismatch in natural frequency), recent studies also started to more directly consider the interaction *per-se*. The current perspective paper sets forth that while movement coordination dynamics has mainly been studied alongside a model in which the coupling is considered isotropic (i.e., symmetrical; both oscillators coupled to same degree) or strictly unidirectional (e.g., for moving to a given external rhythm), between-agent coupling involves a natural anisotropy: components influence each other bidirectionally to different degrees. Furthermore, recent research from different areas has considered so-called antagonistic or "competitive" coupling, which refers to the idea that one component is positively coupled to the other (attractive interaction), while the coupling in the other direction is negative (repulsive interaction). Although the latter would be rather tricky to address in within-person coordination, it does have strong applications and implications for between-person dynamics, for instance in the study of competitive interactions in sports situations (e.g., attacker-defender) and conflicting social (movement) interactions. The paper concludes by offering a conceptual framework and perspectives for future studies on the dynamic anisotropic nature of the interaction in between-person contexts.

#### Edited by:

*Richard C. Schmidt, College of the Holy Cross, USA*

#### Reviewed by:

*Maurice Lamb, University of Cincinnati, USA Robin Nicolas Salesse, Montpellier University, France*

> \*Correspondence: *Harjo J. de Poel h.j.de.poel@umcg.nl*

#### Specialty section:

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology*

Received: *12 August 2016* Accepted: *28 November 2016* Published: *21 December 2016*

#### Citation:

*de Poel HJ (2016) Anisotropy and Antagonism in the Coupling of Two Oscillators: Concepts and Applications for Between-Person Coordination. Front. Psychol. 7:1947. doi: 10.3389/fpsyg.2016.01947*

Keywords: joint action, interpersonal dynamics, synchronization, social interaction, rhythmic coordination

# INTRODUCTION

Between-person coordination generally entails some form of functional cooperative synergy (Riley et al., 2011). Such collaborative coordination involves natural asymmetries, for example due to differences between the individual components. Indeed, ample literature examined coordinative performance as dependent on, for instance, a mismatch in natural frequency (i.e., "detuning;" e.g., Richardson et al., 2007) or movement amplitudes (e.g., de Poel et al., 2009; Fine, 2015). Broken symmetry also exists regarding the interaction itself (Treffner and Turvey, 1995; de Poel et al., 2007), yet this received considerably less attention in the coordination dynamics literature (see also Lagarde, 2013). The present perspective article therefore aims to highlight the study of such interactional asymmetries. Specifically, anisotropic (i.e., components influence each other bidirectionally to different degrees) and antagonistic coupling (i.e., one component attracts while the other repels) are deliberated in the context of dyadic between-person coordination. The paper concludes with a conceptual framework that may offer entry points for scientific engagements in this regard.

# TWO COUPLED OSCILLATORS

The study of between-person coordination dynamics draws heavily on a model of coupled oscillators (Haken et al., 1985) known as the HKB model (for a historic overview, see Schmidt and Fitzpatrick, 2016). While this model was originally developed for rhythmic bimanual coordination (i.e., within-person coordination), many studies have since confirmed that between-person coordination abides by similar coordinative phenomena and principles (for reviews, see Schmidt and Richardson, 2008; Schmidt et al., 2011). Importantly, the component oscillators and coupling functions of the system are formulated such that they analytically yield a potential function that describes the attractor landscape of the collective behavior in terms of the phase difference (ϕ), capturing attractors at in-phase (ϕ = 0°) and antiphase (ϕ = 180°) behavior and their differential stability (Haken et al., 1985).
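For orientation, the potential referred to above is commonly written in the following form (Haken et al., 1985). The parameters a and b here are the standard symbols of that potential, assumed positive, and should not be confused with the a<sub>i</sub> and b<sub>i</sub> that appear in the interaction functions of this article. The expression is added as a reference point only and is not developed further in the text.

$$V(\phi) = -a\cos\phi - b\cos 2\phi, \qquad \dot{\phi} = -\frac{dV}{d\phi} = -a\sin\phi - 2b\sin 2\phi$$

The differential stability of the two attractors follows from the curvature of the potential at each fixed point:

$$\frac{d^2V}{d\phi^2} = a\cos\phi + 4b\cos 2\phi, \qquad \left.\frac{d^2V}{d\phi^2}\right|_{\phi=0} = a + 4b, \qquad \left.\frac{d^2V}{d\phi^2}\right|_{\phi=\pi} = 4b - a$$

In-phase (ϕ = 0°) is therefore a minimum for all positive a and b, whereas antiphase (ϕ = 180°) is a minimum only when b/a > 1/4, and even then its well is shallower than the in-phase well.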

The general idea behind the coupled oscillator model is as follows. Two limit cycle (cf. self-sustaining) oscillators (reflected by subscript i = 1 or 2), each depicted by a second order differential equation are coupled following the general expression,

$$\begin{aligned} \ddot{x}_1 + f(x_1, \dot{x}_1) &= I_{12} \\ \ddot{x}_2 + f(x_2, \dot{x}_2) &= I_{21} \end{aligned} \tag{1}$$

in which x<sub>i</sub>, ẋ<sub>i</sub>, and ẍ<sub>i</sub> reflect the position, velocity, and acceleration of the individual oscillators, respectively (because the present paper focuses on coupling, we assume identical oscillators), and I<sub>12</sub> and I<sub>21</sub> depict interaction functions that reflect the coupling between the two oscillators. Note that the couplings in I<sub>12</sub> and I<sub>21</sub> are a function of the difference between oscillators 1 and 2 in terms of their state variables (i.e., x<sub>i</sub> and/or ẋ<sub>i</sub>), such as

$$\begin{aligned} I_{12} &= \eta_1(\dot{x}_1 - \dot{x}_2) \\ I_{21} &= \eta_2(\dot{x}_2 - \dot{x}_1) \end{aligned} \tag{2}$$

(cf. Astakhov et al., 2016), or, as modeled by Haken et al. (1985), a velocity- and position-dependent interaction of the form (see also Daffertshofer et al., 1999)

$$\begin{aligned} I_{12} &= \eta_1 \left(a_1 + b_1(x_1 - x_2)^2\right) \left(\dot{x}_1 - \dot{x}_2\right) \\ I_{21} &= \eta_2 \left(a_2 + b_2(x_2 - x_1)^2\right) \left(\dot{x}_2 - \dot{x}_1\right) \end{aligned} \tag{3}$$

For the purposes of this perspective article, we solely focus on general notions that can be derived and do not further consider the exact mathematical formulations. The first general notion from Equations (2) and (3) is that the coupling coefficient η<sub>i</sub> scales the degree (or strength) of the coupling (for related modeling strategies in this context, see Varlet et al., 2012; Withagen et al., submitted). Obviously, when η<sub>i</sub> = 0 there is no coupling whatsoever and the oscillators behave completely independently. Higher values of η<sub>i</sub> imply stronger overall coupling and thus enhanced attractor stability at the relative phase level (Haken et al., 1985). When I<sub>12</sub> and I<sub>21</sub> are entirely identical the coupling is perfectly symmetric (as assumed by Haken et al., 1985, who aimed at deriving a minimal model), meaning that both components influence one another to the same degree, as schematically illustrated in **Figure 1A**. However, while most previous studies on movement coordination adhered to this assumption (deliberately or not), the next section will highlight that the coupling is anisotropic by nature and that such interactive asymmetry is essential for understanding between-person coordination (see also Lagarde, 2013).

# ANISOTROPIC COUPLING

Interaction between the components can be stronger in one direction than in the other, which implies an asymmetry in the strength of the coupling, hence anisotropic coupling (Peper et al., 2004; de Poel et al., 2007). From Equations (2) and (3) in the preceding paragraph we can already see that perfect isotropic coupling is an exceptional case: any other combination of coefficient values would yield I<sup>12</sup> ≠ I<sup>21</sup> and hence capture anisotropic coupling. This is schematically illustrated in **Figure 1B**. For bimanual coordination, anisotropic coupling has been related to hand dominance, which for instance yields a coordination pattern in which the dominant hand is slightly though systematically ahead of the non-dominant in terms of its movement phase (Treffner and Turvey, 1995; de Poel et al., 2007).

The anisotropy can obviously take different degrees. For instance, handedness-related anisotropy is less pronounced in left-handers than in right-handers (de Poel et al., 2007). Still, both limbs mutually influence each other: the interaction is clearly bidirectional, albeit with a certain degree of unevenness in dependency (**Figure 1B**). Increasing the coupling anisotropy toward the extreme form yields strict unidirectional coupling, in which one component is influenced by the other, with no coupling whatsoever in the reverse direction (e.g., when η<sup>1</sup> ≠ 0 while η<sup>2</sup> = 0, or vice versa). This situation essentially comes down to a forced oscillator ("master-slave;" see **Figure 1C**).

# "Leader-Follower" Dynamics

As in bimanual coordination (cf. de Poel et al., 2007), in dyadic coordination perfectly symmetric interaction is the exception rather than the rule. To illustrate, a natural task such as crew rowing involves various sources that support interaction, among which is a mechanical/haptic link via the boat that acts more symmetrically, while the visual coupling is clearly asymmetric as the bow rower can see the movements of the stroke rower but not vice versa. The latter also draws in an explicit role division: the stroke rower sets the pace for the other rower(s) to adhere to (de Poel et al., 2016). Recently, researchers have started to examine such interactional directionalities in between-person settings, mainly in the context of leader-follower relations (e.g., Konvalinka et al., 2010; Vesper and Richardson, 2014), such as in the context of a "mirror game" (Noy et al., 2011; Słowiński et al., 2016), of which some studies specifically pertained to (or referred to) a dynamic model of anisotropic/asymmetric coupling (Varlet et al., 2012; Meerhoff and de Poel, 2014; Fine, 2015; Richardson et al., 2015). Importantly, between-person coordination typically entails bidirectional "leader-follower" interaction rather than strict unidirectional "master-slave" dependency (e.g., Meerhoff and de Poel, 2014).

Regarding anisotropic interaction, some studies examined dyads in which agents differed in terms of their social competences or interactive skills (e.g., Schmidt et al., 1994; Varlet et al., 2012). Another way is to experimentally impose leader-follower conditions explicitly through instructions (e.g., Ducourant et al., 2005; Noy et al., 2011), or implicitly through reducing/precluding access to information in one direction (e.g., Meerhoff and de Poel, 2014; Reynolds and Osler, 2014). At the level of relative phase dynamics, anisotropic coupling predicts a specific lead-lag in the phase relation: the component that experiences the strongest coupling influence of the other is lagging (Treffner and Turvey, 1995; de Poel et al., 2007). In line, in between-persons experiments the "sighted" agent typically lags the "blind" (Meerhoff and de Poel, 2014; Reynolds and Osler, 2014).

When leader-follower situations are not explicitly dictated, isotropic coupling might be expected. It appears nothing is less true: implicit heuristic strategies seem to emerge that facilitate a "spontaneous" division in interactional roles of the dyad-members (Vesper et al., 2011; Richardson et al., 2015). In line, Meerhoff and de Poel (2014) found that even in the symmetric condition of their experiment, between-person coupling exhibited clear anisotropy for 70% of the examined pairs, indicating that there was almost always a "dominant interactor" within each pair. Such "intrinsic" leader-follower configuration may relate to the social dominance of one of the dyad-members (Schmidt et al., 1994). Furthermore, findings from interpersonal sway showed that in a situation where both dyad-members could see each other (i.e., symmetric visual coupling), cross-correlations of the sway patterns always involved a lag toward either side, whereas correlation was absent at lag zero (Reynolds and Osler, 2014). This also illustrates how in data analysis such asymmetries may be obscured due to averaging procedures.
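The lead-lag reasoning above can be illustrated with a lagged cross-correlation. The sketch below is a generic estimator written for this illustration; it is not the analysis pipeline of the cited studies, and the synthetic signals, the helper names, and the five-sample delay are assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def peak_lag(x, y, max_lag):
    """Return (lag, r) maximizing corr(x[t], y[t + lag]).

    A positive lag means that y trails x, i.e., x leads.
    """
    best = (0, -2.0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[: len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[: len(y) + lag]
        r = pearson(a, b)
        if r > best[1]:
            best = (lag, r)
    return best


# Synthetic example: the "follower" reproduces the "leader" 5 samples later.
leader = [
    math.sin(2 * math.pi * 0.02 * t) + 0.5 * math.sin(2 * math.pi * 0.057 * t)
    for t in range(400)
]
follower = [0.0] * 5 + leader[:-5]
lag, r = peak_lag(leader, follower, max_lag=20)
```

Because the sign of the peak lag identifies the leader, averaging lags across pairs (or trials) in which different members lead can pull the mean toward zero and hide the asymmetry present in each individual pair.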

Experiments on between-person coordination have mainly adopted tasks involving a visual and/or auditory interface (for overviews, see Section Anisotropic Coupling of Repp and Su, 2013 and Schmidt and Richardson, 2008). Such perceptual coupling relies on an agent's sensitivity to, or ability to detect, interaction-relevant information (Meerhoff and de Poel, 2014). Also, devoting less attention (Richardson et al., 2007) or simply closing the eyes (Oullier et al., 2008) would drastically diminish entrainment. In other words, anisotropic coupling may mainly reside in one oscillatory component being more susceptible to the interactional sources ("follower") than the other ("leader"; de Poel et al., 2007), while an agent can also (whether or not intentionally) modulate the coupling influence inflicted on him/her (Withagen et al., submitted).

Together, these findings stress that between-person interaction is rarely symmetric and that typically one agent "leads the dance." This notion is particularly interesting given that anisotropically coupled oscillator dynamics may imply more stable coordinative attractors compared to the isotropic situation (provided overall coupling remains at the same level, see Section Considerations and Perspectives and Treffner and Turvey, 1995; de Poel et al., 2007). In line with this, similar coupling asymmetries have been demonstrated to benefit performance of complementary joint action such as a collision-avoidance task (Richardson et al., 2015). Together, this may provide incentives for why "leader-follower" collaboration may be beneficial over perfectly balanced interpersonal interaction.

# ANTAGONISTIC COUPLING

The preceding pertains to collaborative situations in which "leader" and "follower" cooperate toward a common task and/or to spontaneous interpersonal entrainment, in which two agents attract (to a certain, likely imbalanced degree) into one another's behavior. Most studies on movement coordination dynamics considered one (or both) of these scenarios of mutual attraction. Coupling influence can however also be repulsive or inhibitory (Kawahara, 1980; Kelso et al., 2009; Hong and Strogatz, 2011; Astakhov et al., 2016; Avitabile et al., 2016). Such repulsive interaction could for instance be modeled through setting the coupling coefficient η<sup>i</sup> < 0 (Astakhov et al., 2016; note that Kelso et al., 2009, used a similar though slightly different modeling strategy). Hence, a high degree of repulsive coupling would reflect that the component is highly susceptible to coupling influence while inflicting repelling effect. Here, we specifically consider antagonistic coupling, which holds that one component attracts (positive coupling) while the other repels (negative coupling). It is principally a special case of anisotropic coupling with the inclusion of repulsive interaction, as schematically illustrated in **Figure 1D**.

In the context of between-person coordination antagonistic coupling is particularly relevant, as it may refer to conflictive social interactions (e.g., Liebovitch et al., 2008) or competitive opposition such as in sport (e.g., McGarry et al., 2002; Palut and Zanone, 2005). Note that the latter involves competitive attacker-defender rather than cooperative leader-follower interaction. As a simplified explication, in a truly competitive situation a defender aims to follow the attacker's movements (hence attraction to the attacker) while an attacker wants to behave diametrically opposed to what the defender does (hence repulsion from the defender). In other words, one agent looks to maintain the interactional balance while the other aims to break it. Interestingly, in a study of Kelso et al. (2009) movements of an avatar hand were coupled in real time to human hand movements through the HKB equation (i.e., according to Equation 3), which made it possible to examine "exotic" coupling parameter settings such as "reversed" coupling: the human was instructed to move in-phase with the avatar, while the avatar was programmed so as to achieve antiphase coordination, reflecting "conflict of intention." Moreover, numerical simulations of HKB-coupled oscillators (viz. Equations 1 and 3) with repulsive coupling revealed that coordination was repelled from in-phase and antiphase, and instead converged toward 90° and/or −90° phase relations.

Also, Avitabile et al. (2016) recently demonstrated numerically that the HKB-model can indeed yield relative phase dynamics beyond in- and antiphase bistability, depending on the parameter regime adopted for the oscillator and coupling equations. In particular, they demonstrated that specific coefficient settings including a repulsive coupling can yield stable solutions shifting away from 0° and 180° toward 90° and −90° relative phase. Although they examined the model parameter settings in symmetric/isotropic fashion, these results may likely generalize toward antagonistic coupling, especially when broadening parameter ranges even further. This is an interesting route to explore in future studies. Furthermore, relevant for the present paper and the corresponding Frontiers Research Topic, Avitabile et al. (2016) also specifically discussed their modeling results vis-à-vis the potential interpretations regarding between-person dynamics.

Recently, we explored whether signs of antagonistic coupling could be observed in competitive dyadic interaction in sports (de Poel et al., 2014; see also McGarry and de Poel, 2016). We analyzed long baseline rallies taken from footage of official tennis matches at the highest competitive level (Association of Tennis Professionals tournaments). Relative phase was calculated from the lateral positions of both players on the tennis court (Palut and Zanone, 2005). Analysis of these data revealed a high occurrence of in-phase and, moreover, an even higher occurrence close to −90° and 90° relative phase. In hindsight, similar distributions appeared to be reported previously for squash data (McGarry, 2006) but were not interpreted vis-à-vis antagonistic coupling at the time. Further inspection of the tennis data showed that rallies consisted of periods in which the opponents appeared to balance their interaction (i.e., relative phase around 0°) and periods of clear competitive movement interaction (relative phase close to 90° and −90°). Notably, over the course of a rally the phase relation seemed to switch between these stages, likely reflecting that the odds change back and forth within rallies: sometimes one player dominated the rally ("attacker-defender": 90°) whereas at other instances the other player dominated ("defender-attacker": −90°), alternated with short periods of balance in which none of the opponents attempted to perturb the rally ("defender-defender": 0°). A detailed report of these data will be provided elsewhere in a forthcoming paper.
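As an illustration of how a relative phase can be obtained from two lateral position series, the sketch below derives a phase angle from position and velocity in the phase plane. This is one common approach assumed here for illustration; the exact procedure of the tennis analysis is not specified in the text and may differ. The signals, sampling interval, and movement frequency are synthetic.

```python
import math


def phase_angles(x, dt, omega):
    """Phase (rad) from position and central-difference velocity.

    Assumes a roughly sinusoidal, zero-centered signal with angular
    frequency omega (an assumption of this sketch).
    """
    phases = []
    for k in range(1, len(x) - 1):
        v = (x[k + 1] - x[k - 1]) / (2 * dt)
        phases.append(math.atan2(-v / omega, x[k]))
    return phases


def relative_phase_deg(x1, x2, dt, omega):
    """Relative phase (phase of x1 minus phase of x2) in degrees, in [-180, 180)."""
    out = []
    for p1, p2 in zip(phase_angles(x1, dt, omega), phase_angles(x2, dt, omega)):
        d = math.degrees(p1 - p2)
        out.append((d + 180.0) % 360.0 - 180.0)
    return out


# Synthetic lateral positions: player 1 is a quarter cycle ahead of player 2.
dt = 0.01                      # s, sampling interval (assumed)
omega = 2 * math.pi * 0.5      # rad/s, 0.5 Hz lateral oscillation (assumed)
t = [k * dt for k in range(401)]
player1 = [math.cos(omega * tk) for tk in t]
player2 = [math.cos(omega * tk - math.pi / 2) for tk in t]
rp = relative_phase_deg(player1, player2, dt, omega)
```

Under this sign convention a relative phase near 90° means that player 1 is ahead by a quarter cycle, and a value near −90° means the reverse.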

# CONSIDERATIONS AND PERSPECTIVES

The preceding provides incentives for capturing and examining anisotropic coupling in the context of between-person coordination. Especially the idea of antagonistic coupling may offer novel insights for future analyses in this respect (cf. Kelso et al., 2009). To bolster such endeavors the paper concludes with a general schematic overview that captures the issues raised.

**Figure 2** graphically illustrates the coupling strength between the components in its proposed forms. The horizontal axis represents the degree of interaction inflicted on component 1 (I<sup>12</sup>) and on the vertical axis the coupling strength in the other direction is depicted (I<sup>21</sup>). Before we commence it is important to note that the interaction strength I and, thus, anisotropy therein is not solely defined by coupling coefficients, as it is a function of the individual oscillators (in terms of state variables x<sup>i</sup> and/or ẋ<sup>i</sup>, see Equations 2–3). Indeed, in the HKB-model the coupling strength is strongly dependent on the movement amplitudes of the individual oscillators (Peper and Beek, 1999). Accordingly, in experiments with humans, amplitude disparity has been demonstrated to imply coupling anisotropy to a rather high degree (Peper et al., 2008). Hence, a difference between the individual component characteristics can involve an implicit coupling anisotropy, though the reverse is not necessarily true.

Returning to **Figure 2**, for any scenario the between-component coupling can be conceived as a point within the I<sup>12</sup>–I<sup>21</sup> coordinate frame. Larger Euclidean distance from the origin indicates stronger interaction. The origin of the coordinate system (indicated by the star) evidently reflects the situation where there is no coupling (I<sup>12</sup> = I<sup>21</sup> = 0) and the diagonal in the second quadrant represents perfectly symmetric, isotropic coupling (I<sup>12</sup> = I<sup>21</sup> > 0) as discussed in Section Two Coupled Oscillators. The horizontal and vertical axes edging the second quadrant relate to unidirectional coupling (I<sup>12</sup> = 0 while I<sup>21</sup> > 0, or vice versa, see Section Anisotropic Coupling). The majority of previous studies on between-component movement coordination centered their assumptions and/or inferences regarding coupling on this diagonal and/or these axes. The rest of the second quadrant reflects anisotropic coupling (Section "Leader-Follower" Dynamics). Note that while the anisotropy can be fairly large (i.e., further separated from the diagonal) the overall coupling can be stronger or weaker (i.e., further from/closer to the origin). This graphically illustrates that although stronger anisotropy may yield stronger coordinative attractor stability (Treffner and Turvey, 1995; de Poel et al., 2007) the latter primarily depends on overall interaction strength (cf. final paragraph of Section "Leader-Follower" Dynamics). Lastly, the other three quadrants depict repulsive coupling situations where at least one of the coupling influences I is negative. Specifically, antagonistic coupling (Section Antagonistic Coupling) is delineated in the first quadrant (I<sup>12</sup> > 0 while I<sup>21</sup> < 0; here the coupling influence acting on oscillator 2 is repulsive, hence agent 2 could be labeled as 'attacker') and fourth quadrant (I<sup>12</sup> < 0 while I<sup>21</sup> > 0; here agent 1 is the 'attacker'). The third quadrant considers a situation in which interaction is repulsive in both directions, which was beyond the scope of the present paper and remains to be further investigated.
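The taxonomy described for the I<sup>12</sup>–I<sup>21</sup> plane can be summarized as a small classification routine. This is a minimal sketch: the function name, the tolerance, and the label "unidirectional repulsive" (for an axis point with a negative value, a case the text does not name) are assumptions made for illustration.

```python
import math


def classify_coupling(i12, i21, tol=1e-9):
    """Return (label, strength) for a point in the I12-I21 plane.

    i12 is the coupling influence acting on component 1 and i21 the influence
    acting on component 2; strength is the Euclidean distance from the origin.
    """
    def sign(value):
        return 0 if abs(value) <= tol else (1 if value > 0 else -1)

    s12, s21 = sign(i12), sign(i21)
    strength = math.hypot(i12, i21)

    if s12 == 0 and s21 == 0:
        label = "no coupling"
    elif s12 < 0 and s21 < 0:
        label = "mutually repulsive"
    elif s12 * s21 < 0:
        # The agent whose incoming coupling is repulsive acts as the attacker.
        attacker = 1 if s12 < 0 else 2
        label = f"antagonistic (agent {attacker} as attacker)"
    elif s12 == 0 or s21 == 0:
        # Hypothetical label for the unnamed negative-axis case.
        label = "unidirectional" if s12 + s21 > 0 else "unidirectional repulsive"
    elif abs(i12 - i21) <= tol:
        label = "isotropic"
    else:
        label = "anisotropic"
    return label, strength
```

For example, `classify_coupling(2.0, 1.0)` is labeled anisotropic, whereas `classify_coupling(1.0, -1.0)` is labeled antagonistic with agent 2 as attacker; the returned strength allows anisotropy and overall interaction strength to be considered separately.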

In sum, the present study offered a brief overview for the perspective that (1) between-person coupling is typically anisotropic, and (2) can also take repulsive/antagonistic shapes. The presented conceptual framework may provide incentives for further study of coupled oscillator models (e.g., in terms of analytical and/or numerical examination of anisotropic and antagonistic coupling settings) and related empirical examinations. For instance, the antagonistic experimental design of Kelso et al. (2009) may be translated to an agent-agent (rather than agent-avatar) situation, where one participant gets the instruction to move in-phase with his/her partner, while the other gets an antiphase instruction. Notably in this context, compared to within-person coupling, between-person coupling arguably allows for (empirical examination of) a larger variety of coupling settings (see also Avitabile et al., 2016), like antagonistic coupling.

# AUTHOR CONTRIBUTIONS

The author confirms being the sole contributor of this work and approved it for publication.



**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 de Poel. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Playing 'Pong' Together: Emergent Coordination in a Doubles Interception Task

Niek H. Benerink<sup>1,2</sup>, Frank T. J. M. Zaal<sup>3</sup>, Remy Casanova<sup>1</sup>, Nathalie Bonnardel<sup>2</sup> and Reinoud J. Bootsma<sup>1</sup>\*

<sup>1</sup> Institut des Sciences du Mouvement, Aix-Marseille Université, CNRS, Marseille, France, <sup>2</sup> PsyCLE, Aix-Marseille Université, Aix-en-Provence, France, <sup>3</sup> Center for Human Movement Sciences, University Medical Center Groningen, University of Groningen, Groningen, Netherlands

In this contribution we set out to study how a team of two players coordinated their actions so as to intercept an approaching ball. Adopting a doubles-pong task, six teams of two participants each intercepted balls moving downward across a screen toward an interception axis by laterally displacing participant-controlled on-screen paddles. With collisions between paddles resulting in unsuccessful interception, on each trial participants had to decide amongst them who would intercept the ball and who would not. In the absence of possibilities for overt communication, such team decisions were informed exclusively by the visual information provided on the screen. Results demonstrated that collisions were rare and that 91.3 ± 3.4% of all balls were intercepted. While all teams demonstrated a global division of interception space, boundaries between interception domains were fuzzy and could moreover be shifted away from the center of the screen. Balls arriving between the participants' initial paddle positions often gave rise to both participants initiating an interception movement, requiring one of the participants to abandon the interception attempt at some point so as to allow the other participant to intercept the ball. A simulation of on-the-fly decision making of who intercepted the ball based on a measure capturing the triangular relations between the two paddles and the ball allowed the qualitative aspects of the pattern of observed results to be reproduced, including the timing of abandoning. Overall, the results thus suggest that decisions regarding who intercepts the ball emerge from between-participant interactions.

Keywords: joint-action, coordination, decision-making, collaboration, interpersonal coordination, perception-action, team, interception

#### Edited by:

Michael J. Richardson, University of Cincinnati, USA

#### Reviewed by:

Leonard James Smart, Miami University, USA; Bruno Travassos, University of Beira Interior, Portugal

> \*Correspondence: Reinoud J. Bootsma reinoud.bootsma@univ-amu.fr

#### Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

Received: 12 August 2016 Accepted: 22 November 2016 Published: 06 December 2016

#### Citation:

Benerink NH, Zaal FTJM, Casanova R, Bonnardel N and Bootsma RJ (2016) Playing 'Pong' Together: Emergent Coordination in a Doubles Interception Task. Front. Psychol. 7:1910. doi: 10.3389/fpsyg.2016.01910

# INTRODUCTION

Actions in our daily life often involve others. Whether we are shaking someone's hand, moving a table together or walking on a crowded pavement, we have to coordinate our actions with those of other individuals. Such social coordination, whether it is intentional or spontaneous, often requires decisions about the behavior that we should perform or, in some cases, we should not perform. For instance, safe driving dictates that when two drivers simultaneously approach an intersection one should cross first and the other should wait. Likewise, two individuals loading a dishwasher should take their turns when placing the dishes. Besides interacting with one another, these situations typically demand a decision of who performs an action and who does not. It is such joint decision making among individuals in goal-directed joint activities that we address in the present study. To do so, we started from a pertinent example in a sports context: serve reception in beach volleyball. When facing a serve, only one of the two players of a team should perform the actual serve reception. The non-receiving player should not interfere during the interceptive action of the teammate, while, at the same time, preparing a follow-up action. How do such individuals coordinate their actions and decide who will intercept the ball? In this contribution, we captured the essential characteristics of the beach volleyball situation in a task in which two participants play "doubles pong." The participants' task was to ensure that on each trial one of them intercepted the approaching target. Like in the situation of serve reception in beach volleyball, the players have to decide together who will be the one performing the interceptive action (and who will not). We are interested in the way the decision of 'who intercepts the balls where' is shaped and how such joint decision making may best be captured.

Rather than focusing on the neural processes that are involved in decision making within each individual (e.g., Cannon-Bowers et al., 1993; DeSoto et al., 2001; Bogacz et al., 2006; Cisek, 2007; Resulaj et al., 2009; Lepora and Pezzulo, 2015), here we consider the system of the two individuals and their environment (cf. Araújo et al., 2006; Marsh et al., 2006; Richardson et al., 2010; Theiner et al., 2010; Riley et al., 2011; Schmidt et al., 2011; Coey et al., 2012). We study how the coordinated behavior of this system gives rise to a distribution over the individuals of interception activities. Instead of understanding decisions as mental operations that precede action, we see the act of deciding as the emergent behavior of the system of the individuals and environment resulting in (un)successful task performance (cf. Turvey and Shaw, 1995; Araújo et al., 2006; Travassos et al., 2012; Barsingerhorn et al., 2013). Understanding decision making among individuals as emergent is in line with a dynamic-systems approach initially developed to account for intrapersonal coordination of rhythmic movements (e.g., Haken et al., 1985; Kugler and Turvey, 1987). From a dynamic-systems perspective on human movement the goal is to identify general laws and patterns that govern the causal unfolding of a system's behavior rather than looking for neurophysiological areas that generate behavior (Kelso, 1995). Importantly, the stability principles underlying the emergence of coordination in a system of coupled oscillators have been demonstrated to operate whether the coupling is neural (Kelso et al., 1981), mechanical (Bardy et al., 1999, 2002) or informational (Schmidt et al., 1990; Schmidt and O'Brien, 1997; Richardson et al., 2007).
That is to say, the same phenomena related with stability of patterns are found when a single person coordinates two body parts and when two persons contribute one body part each to the coordination (Schmidt and O'Brien, 1997; Richardson et al., 2007; see Schmidt and Richardson, 2008, for a review). This similitude principle indicates that the dynamic-systems approach can account for interactions at different behavioral levels, independent of the nature of the connections between the system's components (i.e., neural, mechanical or informational). Whereas most of the studies addressing the dynamics of joint actions concerned non-functional or stereotyped oscillatory limb or whole-body movements (such as swinging legs or rocking chairs together, Schmidt et al., 1990; Richardson et al., 2007), a few studies have shown that the interactive behavior of two individuals can also account for the observed coordinated patterns in more goal-directed tasks (Mottet et al., 2001; Richardson et al., 2015; Romero et al., 2015).

The shared goal of the players in a beach-volleyball situation is that the approaching serve will be intercepted by one of the two. In order to understand the dynamics of joint decision-making in such a cooperative goal-directed interception task, in the doubles-pong task adopted here we explored how a team's task performance might emerge from the interactions between participants. For the present purposes, potential interactions in this video-game-like task were restricted to be uniquely information-based: without any other form of communication being available, participants only shared vision of the task space (i.e., screen) in which the target and individual participant-controlled interception paddles moved. With each of the two paddles being moreover confined to one-dimensional movement along a common interception axis, the task design ensured that successful interception could only be achieved by a single participant: contact between the two paddles immediately eliminated all future possibilities for interception. Because the task of the team of players involves the interception of the ball by one of them, and this lateral interception closely resembles tasks that have been studied extensively before (e.g., Peper et al., 1994; Michaels et al., 2006; Ledouit et al., 2013; Bootsma et al., 2016), we expect that the current study might serve as a stepping stone for identifying informational variables that may underlie team behavior.

# MATERIALS AND METHODS

# Participants

A group of 12 right-handed students from the University of Aix-Marseille, eight men and four women with an average age of 19.6 ± 1.0 years (M ± SD), took part in the experiment. They all provided written consent before participating voluntarily in our study. The study was approved by the local institutional review board of the Institute of Movement Sciences (Comité Ethique de l'Institut des Sciences du Mouvement d'Aix-Marseille Université) and conducted according to University regulations and the Declaration of Helsinki.

# Task

The experiment consisted of three consecutive sessions in which participants were to manually intercept virtual balls moving downward across a screen. A ball could be intercepted by moving an on-screen paddle laterally over an invisible horizontal interception axis at the bottom of the screen. During the first experimental session participants intercepted balls individually (**Figure 1A**). This session served to familiarize participants with the experimental set-up. In addition, by counting the number of intercepted balls, we obtained a measure of how well individual participants performed the interception task. The second experimental session was, again, an individual-participant session. This time, however, participants were assisted by a static "partner" incorporated by a large stationary paddle located at the opposite side of the interception axis (**Figure 1B**). Balls arriving at the stationary paddle were returned upward and counted as a successful interception. Participants had to avoid touching the static paddle; on-screen contact immediately led both paddles to disintegrate and interception was no longer possible. In the third experimental session participants performed the interception task in teams (**Figure 1C**). We composed teams of two participants with similar interception scores on the first two sessions. As in Sessions 1 and 2, participants were able to move all along the interception axis and, comparable with Session 2, they had to avoid touching one another; both paddles would disintegrate if they did. No communication in any form was allowed.

FIGURE 1 | … experimental sessions. Screen dimensions and other metrics are in cm. Note that the figures are not scaled to actual size. Balls appeared at the top of the screen (Y = 64) and moved downward toward the interception axis (Y = 0) at one of two constant vertical velocities. Gray triangles indicate the range of potential ball arrival positions. (A) During the first session (S1) participants intercepted balls individually. The situation depicted here represents the initial conditions for LP. (B) In the second session (S2) participants were assisted by a stationary partner, incorporated by a static paddle covering the final 24 cm of the range of potential ball arrival positions on the opposite side of the interception axis. The situation depicted represents the initial conditions for LP. (C) During the third session (S3) participants intercepted balls in dyads where LP started on the left side of the screen and RP started on the right side of the screen.

# Experimental Set-Up

The experiment took place in a darkened room without windows. **Figure 2** presents the experimental setting for the session in which two participants performed the task together. Participants were seated at one of the two possible seats on one end of a table. They were facing a large television screen (Samsung 55″ LED ED55C, with a 1920 × 1080 pixels resolution) that was positioned 2 m away at the other end of the table. When seated, the participants faced the screen at eye height. Six participants were always seated at the right side of the table during each of the three sessions and are referred to as Right-side Participants (RPs); the other six participants always sat left and are referred to as Left-side Participants (LPs).

Using their right hand, participants displaced the on-screen paddle by laterally displacing a handheld knob on top of an in-house-constructed linear positioning device placed on the table in front of them. The knob was firmly attached to a small (3 by 6 cm) aluminum cart that could slide along two (75-cm long) parallel iron bars. The cart's position was sampled at a frequency of 100 Hz using a linear magnetic potentiometer (MP1-L-0750-203-5%-ST, Spectra Symbol, West Valley City, UT, USA) connected to the computer (HP ZBook 15) controlling the experiment. The digitally sampled electrical output of the potentiometer was converted by in-house developed ICE® (ISM, Aix-Marseille Université, France) software into a paddle position using a constant gain, such that the two extreme knob positions corresponded to (virtual) screen positions slightly beyond the physical screen. This allowed participants to cover the full (121 cm) length of the interception axis on the screen without ever reaching the extremities of the 75-cm long positioning device. Unless specified otherwise, positions and distances reported from here on correspond to distances on the screen, with the origin corresponding to the horizontal center of the interception axis. The screen thus extended horizontally (X-axis) from −60.5 cm to +60.5 cm and vertically (Y-axis) from −2 cm to +66 cm.

# Procedure

Participants had to intercept virtual (2-cm diameter circles) white balls depicted against a black background, moving downward across the screen at various angles and speeds, by making them bounce back upward after contact with their white (3-cm wide and 0.8-cm high) paddle.

For a trial to start, participants moved the paddle to a designated start position (see **Figure 1**) positioned at ±21 cm from the center of the screen in Session 1 and at ±30.25 cm from the center of the screen in Sessions 2 and 3. Start positions were marked by a 3-cm wide translucent red rectangle that would turn green when the center of the paddle was located at a horizontal distance of less than 0.3 cm from the center of the rectangle. After the participant(s) had remained in place for 2 s, the rectangle disappeared and after another 2 s the appearance of a ball at the top of the screen marked the beginning of the trial. Balls immediately moved downward with vertical velocities of 0.40 or 0.64 m/s corresponding to movement durations until reaching the interception axis of 1.6 and 1.0 s, respectively.

Ball trajectories were constructed with the use of five standard ball departure positions (Y = 64 cm) and five standard arrival positions (Y = 0 cm), both at X = −42, −21, 0, +21 and +42 cm. Combining the five departure positions with the five arrival positions gave rise to a total of 25 standard rectilinear trajectories. To avoid participants becoming familiarized with the arrival positions of the ball, a random distance between −10.5 cm and +10.5 cm was added to both the standard departure and arrival positions of a trajectory. This way, balls could appear and arrive everywhere between X = −52.5 cm and X = +52.5 cm while trajectory angles were kept the same. In a single block, all 25 trajectories appeared with two different vertical ball velocities, for a total of 50 fully randomized trials per block. All participants performed four blocks per session, adding up to a total of 200 trials per participant in a 1-h session.
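The construction of a block of trials described above can be sketched as follows. This is a minimal illustration, not the authors' software: the function name `make_block`, the dictionary keys, and the choice to draw a fresh lateral offset for every trial are assumptions. The one point taken directly from the text is that the same offset is added to the departure and arrival positions, which leaves the trajectory angle unchanged.

```python
import random

STANDARD_X = [-42.0, -21.0, 0.0, 21.0, 42.0]  # cm, standard departure and arrival positions
VERTICAL_SPEEDS = [0.40, 0.64]                 # m/s
JITTER = 10.5                                  # cm, maximum lateral offset


def make_block(rng):
    """Return one block of 50 trials (25 trajectories x 2 speeds) in random order."""
    trials = []
    for x_start in STANDARD_X:
        for x_end in STANDARD_X:
            for speed in VERTICAL_SPEEDS:
                # Assumption: a new offset is drawn for every trial. The same
                # offset shifts both endpoints, so the angle is preserved.
                offset = rng.uniform(-JITTER, JITTER)
                trials.append({
                    "x_departure": x_start + offset,
                    "x_arrival": x_end + offset,
                    "vertical_speed": speed,
                })
    rng.shuffle(trials)
    return trials


block = make_block(random.Random(1))
```

Four such blocks would give the 200 trials per session mentioned in the text.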

Successful interception required that the paddle touched the ball when it crossed the interception axis. After a successful interception, the paddle turned green and the ball moved back up. In an unsuccessful trial the ball continued moving downward and the paddle turned red. Three seconds after ball arrival at the interception axis, the paddle returned to its original white color and the translucent red rectangle would appear again to indicate the start of a new trial.

All sessions started off with ten practice trials. During these practice trials participants were asked not only to intercept a number of balls but also to purposely miss a ball so they would have experienced all the possible actions and their outcomes. In Session 2 participants were also asked to touch the stationary paddle during a trial, so as to experience what would happen if they did during the experiment. For the proper experimental sessions participants were instructed to intercept as many balls as possible, without any further information being provided. To motivate the participants, the experiment was organized as a competition in which all participants competed anonymously.

In Session 3 participants were seated next to each other (see **Figure 2**). They were separated by a black cloth, hanging from the ceiling, that effectively prevented each participant from seeing (any part of) the other. Moreover, they wore headphones (3M Peltor Optime2) and earplugs (DEXTER Lm30215-10) so that they could not hear each other either. No communication in any form was allowed (both before and during the experiment). The participants were explicitly instructed that the number of interceptions per individual did not matter and that their performance as a team was the only thing that counted.

Kinematic data of the participants' paddles and the ball was sampled at a frequency of 100 Hz and stored on an external disk. Along with the kinematic data, we registered trial characteristics including whether a participant intercepted the ball or not and, in Sessions 2 and 3, the time of a collision, if any. Before further analysis, the kinematic data was filtered with a second-order Butterworth filter with a cut-off frequency of 5 Hz run through twice in order to negate the phase shift (Ledouit et al., 2013).
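A dual-pass (zero-phase) second-order Butterworth low-pass filter of the kind described above can be sketched in plain Python as follows. This is an illustrative sketch only: the function names are hypothetical, the coefficients come from the standard bilinear-transform design, and the handling of the signal edges (starting each pass from a steady state at the first sample) is an assumption. The text does not state whether the cut-off frequency was adjusted to compensate for the second pass, so no such correction is applied here.

```python
import math


def butter2_lowpass(cutoff_hz, fs_hz):
    """Coefficients (b, a) of a second-order Butterworth low-pass filter."""
    k = math.tan(math.pi * cutoff_hz / fs_hz)
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b = [k * k * norm, 2.0 * k * k * norm, k * k * norm]
    a = [1.0, 2.0 * (k * k - 1.0) * norm, (1.0 - math.sqrt(2.0) * k + k * k) * norm]
    return b, a


def _one_pass(x, b, a):
    """Apply the recursive filter once, from the first to the last sample."""
    y = []
    # Assumption: start from a steady state at the first sample to limit
    # the start-up transient.
    x1 = x2 = y1 = y2 = x[0]
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y


def zero_phase_filter(x, cutoff_hz=5.0, fs_hz=100.0):
    """Filter forward, then backward, so that the two phase shifts cancel."""
    b, a = butter2_lowpass(cutoff_hz, fs_hz)
    forward = _one_pass(x, b, a)
    backward = _one_pass(forward[::-1], b, a)
    return backward[::-1]
```

Running the filter in both directions squares the magnitude response, so the attenuation at the nominal cut-off is twice as large (in dB) as for a single pass.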

# Dependent Measures

Interception scores were calculated per block as the percentage of balls intercepted from the total number of 50 balls presented. The score used to assemble the teams was the mean value of interception scores obtained during the first and second individual sessions.

Movement initiation time was defined as the first moment a participant crossed a velocity threshold of 3.0 cm/s provided that the participant's movement amplitude reached at least 1 cm. Based on this criterion we determined for each individual trial whether, and if so when, a participant initiated a movement. Velocity-time series were obtained using a three-point central difference method. Peak velocity was determined as the maximum velocity reached during a movement.
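The initiation criterion can be illustrated with the following sketch. The function names are hypothetical, and two details are assumptions not specified in the text: the amplitude criterion is read as the largest excursion from the starting position over the whole trial, and one-sided differences are used for the first and last samples, where a central difference is not defined.

```python
def central_difference(x, dt):
    """Three-point central difference; one-sided differences at the two ends."""
    n = len(x)
    v = [0.0] * n
    for i in range(1, n - 1):
        v[i] = (x[i + 1] - x[i - 1]) / (2.0 * dt)
    v[0] = (x[1] - x[0]) / dt
    v[-1] = (x[-1] - x[-2]) / dt
    return v


def initiation_time(x_cm, fs_hz=100.0, v_thresh=3.0, min_amplitude=1.0):
    """Return the initiation time in seconds, or None if no movement qualifies."""
    dt = 1.0 / fs_hz
    v = central_difference(x_cm, dt)
    # Assumption: amplitude is the largest excursion from the start position.
    amplitude = max(abs(xi - x_cm[0]) for xi in x_cm)
    if amplitude < min_amplitude:
        return None
    for i, vi in enumerate(v):
        if abs(vi) > v_thresh:  # threshold of 3.0 cm/s, in either direction
            return i * dt
    return None


def peak_velocity(x_cm, fs_hz=100.0):
    """Signed velocity (cm/s) with the largest absolute value."""
    v = central_difference(x_cm, 1.0 / fs_hz)
    return max(v, key=abs)
```

Returning `None` for trials that fail the amplitude criterion keeps "no initiation" distinct from an initiation at time zero.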

Defining angles βLP and βRP according to the definition provided in **Figure 3A** allowed us to derive time-series of the rates of change of these angles (i.e., angular velocities) for the LP and the RP. As demonstrated in **Figures 3B–D**, the manner in which a participant's paddle movement affects the pattern of change of the angle β (i.e., the state of the angular velocity, AV) is lawfully related to the future outcome of the ongoing action (also see, for instance, Fajen and Warren, 2004). As we will detail later, this prospective character of the (visual) information provided by the LP's and RP's angular velocities may be used to develop an account of emergent decision making.

# RESULTS AND DISCUSSION

# Performance on the Task

We begin by examining performance on the interception task, operationalized by the percentage of balls intercepted, in each of the three experimental sessions (see **Table 1**).

During the first session individual participants had to cover the full 105-cm range of potential ball arrival positions with their paddle initially positioned at an eccentricity of 21 cm (to the left for the LPs and to the right for the RPs) with respect to the center of the screen. With an average interception performance of 85.7 ± 5.4% for the total of 200 trials completed by each participant, performance was overall quite good. A repeated-measures one-way ANOVA on the evolution of performance over the four blocks of 50 trials revealed a significant effect of Block [F(3,33) = 18.51, p < 0.001, η<sup>2</sup> = 0.63], reflecting an initial increase from Block 1 (78.2 ± 8.9%) to Block 2 (88.8 ± 4.2%), followed by a leveling off of performance during Blocks 3 (87.2 ± 6.3%) and 4 (88.7 ± 5.6%). Post hoc Newman–Keuls analyses confirmed that performance in Block 1 was significantly different from performance in Blocks 2, 3, and 4 (p's < 0.001), while no significant differences were observed among the latter.

TABLE 1 | Interception scores of the 12 individual participants in Sessions 1 (S1) and 2 (S2) and the six teams in Session 3 (S3), together with the number of collisions observed in Sessions 2 and 3.

During the second session the individual participant's paddle was initially positioned at an eccentricity of 30.25 cm (to the left for the LPs and to the right for the RPs) with respect to the center of the screen. Participants were assisted by a static partner (32-cm wide stationary paddle) covering the final 24-cm range of potential ball arrival positions on the opposite side of the full 105-cm range. They therefore needed to cover an 81-cm range of potential ball arrival positions while avoiding contact with the static partner's paddle. Collisions with the stationary paddle occurred only sporadically (on average on 0.5 ± 0.6% of the trials, see **Table 1**), with only three participants colliding once during the first block. Interception scores were stable over blocks (89.3 ± 4.5, 91.7 ± 5.2, 93.0 ± 5.9, and 92.0 ± 4.5%, for blocks 1, 2, 3, and 4, respectively); a repeated-measures ANOVA did not reveal significant differences in performance over the four blocks [F(3,33) = 1.61, p = 0.205, η<sup>2</sup> = 0.13]. These results indicate that participants performed well from the beginning of the session.

In order to examine potential differences between LPs and RPs in Sessions 1 and 2, we conducted a mixed two-way ANOVA on interception scores with Side (LP and RP) as a between-participant factor and Session (1 and 2) as a within-participant factor. This analysis did not reveal significant differences between LP and RP [F(1,10) = 1.22, p = 0.296, η<sup>2</sup><sub>p</sub> = 0.11]. Inspection of individual means (cf. **Table 1**) confirmed that performance was comparable for left and right participants in both sessions.

Having thus characterized the performance of individual participants in Sessions 1 and 2, we now turn to the third session in which the 12 participants were combined into six teams, each consisting of an LP and an RP. Paddles were initially positioned 30.25 cm to the left (LP) and to the right (RP) with respect to the center of the screen. Together, the two participants needed to cover the full 105-cm range of potential ball arrival positions while avoiding contact between their paddles. As in Session 2, collisions were rare (6 out of the total of 1200 trials, see **Table 1**), with only two teams colliding once within the first block. Interception scores were quite high from the start and stable over blocks (90.7 ± 4.3, 92.3 ± 3.9, 92.3 ± 5.1, and 89.7 ± 7.0%, for blocks 1, 2, 3, and 4, respectively); a repeated-measures one-way ANOVA did not reveal significant differences in performance over the four blocks [F(3,15) = 0.50, p = 0.688, η<sup>2</sup> = 0.09]. Interestingly, team performance could not be predicted on the basis of its members' scores observed in Session 2. Indeed, in two cases team performance in Session 3 was better than the best team member's score in Session 2 (teams 3 and 5, see **Table 1**). In two other cases the opposite pattern was observed (teams 1 and 4, see **Table 1**).

**Figure 4** provides a graphical summary of the interception results as a function of the ball's arrival position on the interception axis for all 200 trials of each team. Interceptions accomplished by the LP (dark blue circles) and by the RP (light blue circles) were plotted on two separate axes, so as to allow visual discrimination of who intercepted the balls where. These intercepted trials were complemented by the trials in which both participants failed to intercept the ball (red circles, referred to as errors) and by the trials in which the LP and RP paddles made contact with one another (purple dots, referred to as collisions). The (rare) collisions occurred for balls arriving at locations near the center of the screen. Errors, on the other hand, were generally distributed over the full range of ball arrival positions. Indeed, errors for ball arrival positions located within the interval between both participants' initial positions (n = 53) occurred as often as errors for ball arrival positions outside this interval (n = 52), indicating that the majority of errors seemed to result from individual mistakes. Together with the high interception scores (on average 91.3 ± 3.4%) and the low number of collisions (on average 0.5 ± 0.3%), these results demonstrate that participants succeeded remarkably well in coordinating their interceptive movements with one another.

Visual inspection of **Figure 4** revealed that all six teams exhibited a quite well-defined distribution of who intercepted the ball where, with the LP intercepting the vast majority of balls arriving on the left half of the interception axis and the RP intercepting the vast majority of balls arriving on the right half. Interestingly, however, the interception performance of all teams also included an area where both participants could intercept balls. In order to quantify the separation of interception domains, for each team we computed a logistic regression equation with ball arrival position as the explanatory variable. Using a logit link function (Nelder and Wedderburn, 1972), logistic probability curves were derived for the balls intercepted by the LP (P = 1) and by the RP (P = 0) for all teams independently. The boundary between both interception domains was defined as the Median Effective Level (MEL), that is, the position on the interception axis where the probability of the LP intercepting the ball is equal to the probability of the RP intercepting the ball (i.e., P = 0.5). As can be seen from **Table 2** (observed interception performance), teams 1–5 revealed MEL values close to zero with a maximum absolute deviation of 1.08 cm, indicating that in these teams the boundary between both interception domains lay close to the exact (and yet unmarked) middle of the interception axis. Team 6, on the other hand, was characterized by a MEL value of −4.66 cm, indicating that the boundary between both interception domains was shifted almost 5 cm to the left. Of potential interest here is the fact that team 6 was the team with the largest difference in individual performances, as observed in Sessions 1 and 2 (see **Table 1**). The boundary shifted toward the participant with the lowest interception score, resulting in a 19.5% difference in the ranges of both participants' interception domains. Note, however, that even in the presence of a shift in the location of the boundary team 6 still demonstrated a rather well-defined separation of interception domains.

The degree of separation between both interception domains is reflected in the steepness of the slopes of the logistic curve and the amount of overlap may be calculated as the distance between the 5 and 95% points of the logistic curve (Cox and Snell, 1989). On average, overlap thus defined amounted to a non-negligible 14.6 ± 3.6 cm. Interestingly, the amount of overlap between interception domains was not related to a team's performance [r = 0.13, t(4) = 0.263, p > 0.8]. While team 6 (characterized by the leftward boundary shift discussed above) demonstrated an above-average overlap (16.9 cm, see **Table 2**) as well as the lowest team performance (85.5% of all balls intercepted, see **Table 1**), team 3 not only revealed the largest overlap (19.3 cm) but also the highest team performance (95.5% of all balls intercepted).

# Movement Kinematics

We first examined initiation times for all interception movements in all three sessions. As can be seen from **Table 3**, whereas average initiation times were similar for Sessions 1 (428 ± 38 ms) and 2 (437 ± 44 ms), they appeared longer for Session 3 (534 ± 51 ms).

TABLE 3 | Mean initiation times of individual participants in Sessions 1 (S1), 2 (S2), and 3 (S3).


Range-corrected initiation times only concern movements initiated for balls arriving between the initial position (−30.25 cm for the LP and +30.25 cm for the RP) and the center of the screen (0 cm).

However, this observation was difficult to interpret because the different sets of initiation times refer to different ranges of movement in the three sessions. For Sessions 2 and 3 we therefore calculated the initiation times for the subset of all interception movements that were directed to ball arrival positions between the initial paddle position and the middle of the screen (i.e., between −30.25 cm and 0 cm for the LPs and between 0 cm and +30.25 cm for the RPs). As can be seen from the last two columns of **Table 3**, even for these range-corrected interception movements a difference in initiation time occurred [paired t-test: t(11) = 3.56, p < 0.01] with movements being initiated later in the presence of a dynamic partner (Session 3: 518 ± 50 ms) than in the presence of a static partner (Session 2: 459 ± 61 ms).

In Session 3, interception on a given trial could ultimately only be accomplished by a single participant, but this did not necessarily imply that the other participant did not move at all. For every single trial and independent of the result, we therefore determined for both LP and RP whether they initiated a movement. **Figure 5** summarizes the resulting frequency distribution of observed movement initiations for the LPs and RPs as a function of the arrival position of the ball, with the full 105-cm range of potential ball arrival positions divided into 20 (5.25-cm wide) bins. Each trial was classified into one of four categories: initiation LP only (dark blue), initiation RP only (light blue), initiation both LP and RP (green), and no initiation, that is, neither LP nor RP (red) initiated a movement. Of all 1200 trials, 436 (i.e., 36.3%) revealed LP initiation only, almost exclusively associated with balls arriving on the left side of the interception axis. Similarly, 421 (i.e., 35.1%) of all trials revealed RP initiation only, almost exclusively associated with balls arriving on the right side of the interception axis. Of the 279 (i.e., 23.3% of all trials) revealing both LP and RP initiations, 246 (i.e., 88.2%) resulted in successful interception, implying that one of the participants must have abandoned the launched interception attempt at some point so as to allow the other participant to intercept the ball. The prevalence of such double initiations appeared to follow a bell-shaped distribution over the interception axis, with its peak located in the vicinity of the center of the interception axis (i.e., the center of the screen). In 5.3% of the trials neither of the two participants initiated any movement. In 63 of these 64 trials without movement initiation, balls arrived at or close to one of the participants' initial positions (i.e., ±30.25 cm). Note that in 59 (i.e., 93.7%) of those 63 trials the ball was in fact intercepted, making contact with one of the motionless (3-cm wide) paddles.

In order to obtain a grasp on when one of the participants abandoned the launched interception attempt, we examined the relation between the distance to be covered and the peak velocity reached during the movement on each trial. **Figure 6** presents this relation for each successful (i.e., intercepted) trial in which at least one participant initiated a movement, for each team and each of the two vertical ball speeds separately. Successful interceptions by the LPs (dark blue dots) and the RPs (light blue dots) were characterized by proportional scaling relations between the distance covered (i.e., the distance between initial paddle position and ball arrival position) and the peak velocity reached during the movement (see Ledouit et al., 2013, for similar results). For each individual player we therefore performed a linear regression analysis of peak velocity onto distance covered for the balls intercepted by that participant. Results of these regression analyses are reported in **Table 4** and shown graphically in **Figure 6**.

FIGURE 6 | Peak velocity of movement as a function of ball arrival position for both members of each team for each vertical ball speed separately. Dark blue dots indicate LP-interception trials and light blue dots indicate RP-interception trials. The solid black lines represent the associated regression lines of peak velocity onto ball arrival position and the dashed gray lines represent the ±2 SD boundaries. Green symbols indicate trials in which interception was abandoned, with dots indicating that the peak velocity reached during that trial fell within the above-defined boundaries (late abandoning) and crosses indicating that the peak velocity reached during that trial fell outside the above-defined boundaries (early abandoning). The horizontal gray dashed lines in each panel, at peak velocity = 0 m/s, indicate the borders between negative (i.e., movements to the left) and positive (i.e., movements to the right) values of peak velocity. All green dots and crosses with positive peak velocity (i.e., all green points above the zero line) represent abandoned interception attempts of the LP. Likewise, all dots and crosses with negative peak velocity (i.e., all green points below the zero line) represent abandoned interception attempts of the RP. (A) High vertical ball speeds (0.64 m/s, 1-s trial duration) and (B) low vertical ball speeds (0.4 m/s, 1.6 s trial duration).

TABLE 4 | Results of regression analyses of the relations between peak velocity and distance covered during movements resulting in interception, performed for each participant separately for each of the two vertical ball speeds.

n, number of trials; a, slope (s<sup>−1</sup>); r, correlation coefficient; p, probability.

While the slope of the relation varied both as a function of participant characteristics and as a function of vertical ball speed, individual correlation coefficients were sufficiently high to allow the definition, for each participant at each vertical ball speed, of a "standard" relation (operationally defined by a range of ±2 SDs around the mean, dashed parallel lines in the panels of **Figure 6**) between ball arrival position and peak velocity reached during an interception movement. Using this "standard" relation observed for successful interceptions, we could identify whether the 246 abandoned interception attempts (i.e., successful trials in which the participant that did not intercept the ball had nevertheless initiated a movement) occurred early or late during the trial. Late abandoning was characterized by the participant still reaching the standard peak velocity (green dots in **Figure 6**), whereas early abandoning was characterized by the participant reaching a lower-than-standard peak velocity (green crosses in **Figure 6**). Of the 246 successfully intercepted trials demonstrating both LP and RP initiation, 179 (i.e., 72.8%) were characterized by early abandoning, while 67 (i.e., 27.2%) were characterized by late abandoning.
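The early/late classification can be sketched as a regression band test. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the SD of the "standard" relation is taken here to be the residual standard deviation around the fitted line, which the text does not specify. Following the caption of **Figure 6**, a peak velocity inside the ±2 SD band counts as late abandoning and one outside the band as early abandoning.

```python
import math


def fit_line(x, y):
    """Least-squares line y = intercept + slope * x, plus the residual SD."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Assumption: SD of the residuals with n - 2 degrees of freedom.
    sd = math.sqrt(sum(r * r for r in resid) / (n - 2))
    return slope, intercept, sd


def classify_abandoning(position, peak_velocity, slope, intercept, sd, k=2.0):
    """Return 'late' if peak velocity lies within +/- k * SD of the line, else 'early'."""
    expected = intercept + slope * position
    return "late" if abs(peak_velocity - expected) <= k * sd else "early"
```

In use, `fit_line` would be applied to one participant's intercepted trials at one vertical ball speed, and `classify_abandoning` to that participant's abandoned attempts at the same speed.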

# Team Interactions

Several of the results discussed in the previous sections suggest that team performance, as observed in Session 3, cannot be satisfactorily understood as resulting from a form of organization with pairs of independent players, each covering their own half of the interception space. First, while for five of the teams the boundary between interception domains lay close to the center of the screen (with differences in the sizes of individual participant interception domains being limited to 2.3 ± 1.4%), in team 6 this boundary was shifted by almost 5 cm, leading to a difference in domain sizes of 19.5%. Second, for all six teams the boundary between interception domains was fuzzy rather than sharp, with participants regularly entering their teammate's domain to intercept balls there without such "intrusions" leading to collisions. The observed degree of overlap between interception domains was indeed quite substantial (14.6 ± 3.6 cm), amounting to 13.9 ± 3.4% of the full range of potential ball arrival positions. Third, balls arriving near the center of the screen (four center bins of **Figure 5**, with ball arrival positions ranging from −10.5 to +10.5 cm) more often evoked movement initiations of both participants than only initiations of the participant in whose interception domain the ball would in fact arrive. Yet, both collisions and errors were rare, as 88.2% of the trials on which both participants initiated a movement resulted in successful interception by one of the participants. Finally, while in 72.8% of the 246 double-initiation trials one of the participants abandoned the launched interception attempt early on, in the remaining 27.2% of the trials the interception attempt was abandoned after the participant had reached a peak velocity associated with an ongoing interception attempt. Together, these results suggest that participants took into account the ongoing actions of their partners.

Without going as far as suggesting that this is the information used by the participants (see Fajen and Warren, 2007; Bootsma et al., 2016, for further details), for the present purposes the state of the angle formed, for each participant, by the line connecting this participant's paddle with the other participant's paddle and the line connecting this participant's paddle with the ball (see **Figure 3**) may well allow capturing the unfolding team interactions. Indeed, by physical law, a constant angle (i.e., a zero AV) indicates that the player's current movement speed will lead the paddle to reach the interception point when the ball arrives there. Put differently, zero AV means that an interception will occur if both ball and paddle speed remain constant over the remainder of the trial. Given that in the present study ball speed was always constant over the course of a trial, from the foregoing it follows that a positive AV (i.e., an opening of the angle) implies that maintaining current paddle speed will lead to an early arrival at the interception location and, likewise, that a negative AV (i.e., a closing of the angle) implies that maintaining current paddle speed will lead to a late arrival at the interception location.

When neither of the two participants has begun to move their paddle (i.e., from the beginning of a trial up to the moment of first movement initiation), for both participants AV will be negative for balls arriving at a location between the two paddles. For balls arriving at locations to the left of the LP, AV will be positive for the stationary LP and negative for the stationary RP. Mutatis mutandis, AV will be positive for the stationary RP and negative for the stationary LP for balls arriving at locations to the right of the RP.
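The sign conventions described above can be checked with a small geometric sketch. It assumes, based on the verbal description in the text (the exact definition is given in **Figure 3A**), that β is the unsigned angle at a participant's paddle between the direction of the partner's paddle and the direction of the ball, with both paddles on the interception axis (Y = 0). Function names are hypothetical.

```python
import math


def bearing_angle(paddle_x, partner_x, ball_x, ball_y):
    """Angle (rad) at the paddle between the partner's paddle and the ball."""
    to_partner = partner_x - paddle_x   # vector (to_partner, 0) along the axis
    dx = ball_x - paddle_x              # vector (dx, ball_y) toward the ball
    dot = to_partner * dx
    cross = to_partner * ball_y
    return math.atan2(abs(cross), dot)


def angular_velocity(paddle_x, partner_x, ball_x, ball_y, dt):
    """Central-difference rate of change of the angle (rad/s) for sampled series."""
    beta = [bearing_angle(p, q, bx, by)
            for p, q, bx, by in zip(paddle_x, partner_x, ball_x, ball_y)]
    return [(beta[i + 1] - beta[i - 1]) / (2.0 * dt)
            for i in range(1, len(beta) - 1)]
```

With both paddles stationary at ±30.25 cm, a ball dropping toward X = 0 gives a closing angle (negative AV) for both participants, whereas a ball dropping toward X = −42 cm gives an opening angle (positive AV) for the LP and a closing angle for the RP. A paddle moving at the constant speed that brings it to the arrival position exactly when the ball gets there keeps the angle constant (zero AV).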

Each trial in which one or both participants initiated a movement is represented in **Figure 7** as a point in space defined by the states of the AV-LP (abscissa) and the AV-RP (ordinate) at the moment of first movement initiation. Dark blue dots designate the 436 trials in which only the LP initiated a movement, light blue dots designate the 421 trials in which only the RP initiated a movement, and green dots designate the 279 trials in which both players initiated a movement. As was already visible in **Figure 5**, balls arriving to the left of the LP almost invariably evoked only movement from the LP. In **Figure 7**, these trials correspond to the (predominantly dark blue) dots in the lower right quadrant where AV-LP is positive and AV-RP is negative. Likewise, balls arriving to the right of the RP almost invariably evoked only movement from the RP. In **Figure 7**, these trials correspond to the (predominantly light blue) dots in the upper-left quadrant where AV-LP is negative and AV-RP is positive. As was also already visible in **Figure 5**, trials evoking initiation by both the LP and RP generally arrived between the initial positions of both paddles, close to the center of the screen. In **Figure 7** these trials correspond to the green dots predominantly located in the lower-left quadrant where both AV-LP and AV-RP are negative.

The (AV-LP, AV-RP) state space allows us to scrutinize the evolution over time of the behavior of both participants with respect to the ball. The trials of interest for such scrutiny are of course the trials in which both participants initiated an interception movement (green dots in **Figure 7**). For these reasons, the subset of 246 successfully intercepted trials in which both participants initiated a movement is once again presented in **Figure 8**, but this time coded for the player who in the end intercepted the ball (LP interception: dark blue, RP interception: light blue). When participants start moving they actively change their relation to the ball, which is functionally captured by a change in their AV. The motion through the (AV-LP, AV-RP) state space thus captures the dynamic triangular relation between both players and the ball. As in **Figure 7**, **Figure 8A** depicts the situation at the time of first movement initiation. **Figures 8B–D** depict the situation, respectively, 100, 200, and 300 ms later.

Inspection of **Figure 8** brings out that trials that eventually gave rise to LP-interception were characterized by a change in AV-LP from negative to positive (resulting from the LP's sustained movement toward the future interception location), with dots moving from the lower-left quadrant either to the lower-right quadrant or, for a smaller proportion of trials, to the upper-right quadrant. A similar picture emerged for the trials that eventually gave rise to RP-interception. These trials were characterized by a change in AV-RP from negative to positive (resulting from the RP's sustained movement toward the future interception location), with dots moving from the lower-left quadrant either to the upper-left quadrant or, for a smaller proportion of trials, to the upper-right quadrant. **Figure 8** thus reveals the gradual separation in the two groups of trials based on who intercepted the ball in the end. This observation suggests that the decision of who intercepts the ball in fact emerges over the course of a trial, as a function of the expediency with which both participants engaged in their interception attempts. In fact, it appeared that the first participant to reach positive AV tended to be the one that ended up intercepting the ball. Recall (cf. **Figure 3**) that negative AV implies that with the current movement speed the participant will be (too) late, whereas positive AV implies that with the current movement speed the participant will in fact arrive at the interception location before the ball gets there. Even though all participants generally slowed down prior to interception (probably so as to minimize chances of colliding with the other participant), the occurrence of a positive AV for one participant may signal to the other that the interception attempt should be abandoned.

In order to test this idea, we examined the evolution over time of AV-LP and AV-RP for all 1095 trials on which the ball was intercepted. Starting from the situation at the onset of a trial, we classified the trial as LP-interception or RP-interception, as a function of the first participant to reach positive AV. Note that this rule led to correct (although immediate) classification of balls arriving to the left of the LP as LP-interception and of balls arriving to the right of the RP as RP-interception. The results of this on-the-fly decision formulation are presented in **Figure 9** for all six teams separately.
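The classification rule itself is simple enough to state as code. The sketch below assumes that the two AV series are sampled on a common time base starting at trial onset; the function names are hypothetical, and returning `None` when both participants reach positive AV on the same sample is an assumption, since the text does not say how such ties were handled.

```python
def first_positive_index(av):
    """Index of the first sample with AV > 0, or None if AV never becomes positive."""
    for i, value in enumerate(av):
        if value > 0.0:
            return i
    return None


def predict_interceptor(av_lp, av_rp):
    """Return 'LP', 'RP', or None using the 'first to reach positive AV' rule."""
    i_lp = first_positive_index(av_lp)
    i_rp = first_positive_index(av_rp)
    if i_lp is None and i_rp is None:
        return None
    if i_rp is None or (i_lp is not None and i_lp < i_rp):
        return "LP"
    if i_lp is None or i_rp < i_lp:
        return "RP"
    return None  # tie on the same sample (assumption: left unclassified)
```

For a ball arriving to the left of the LP, AV-LP is positive from the first sample, so the rule returns "LP" immediately, in line with the note above on immediate classification.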

As can be seen from **Figure 9**, attribution of interception to the LP (dark blue circles) or the RP (light blue circles) was correct in the overwhelming majority of cases. Overall, attribution errors occurred on only 2.0% of the trials, corresponding to a total number of errors of 2, 2, 6, 7, 3, and 2, for teams 1–6, respectively. The on-the-fly decision criterion of interception by the "first participant to reach positive AV" not only made it possible to predict which participant would intercept the ball with more than satisfactory precision, but also reproduced the qualitative aspects of the distribution of interception domains observed in each team. Deriving logistic probability curves for the predicted performance (see **Table 2**, predicted interception performance) revealed that the locations of boundaries between interception domains were well predicted [r = 0.92, t(4) = 4.54, p = 0.010], lying close to the center of the screen (2.17 cm maximal absolute deviation) for teams 1–5 while being shifted 4.24 cm to the left for team 6. Similarly, even though somewhat overestimated, the amount of overlap between interception domains was fairly well predicted [r = 0.80, t(4) = 2.63, p = 0.059].

and light blue dots indicate RP-interception trials. The thin vertical and horizontal lines in each panel mark zero AV for the LP and RP, respectively. Movement of dots across these lines mark transitions from negative to positive AV. The thin diagonal line in each panel marks AV-LP = AV-RP. (A) At the moment the first participant initiated a movement, (B) 100 ms later, (C) 200 ms later, and (D) 300 ms later.

Finally, because the moment at which the first participant reached positive AV could be detected, we examined whether this criterion also correctly predicted when the non-intercepting participant abandoned the launched interception attempt in the trials in which both participants initiated an interception movement. In 209 (i.e., 85.0%) of the 246 double-initiation trials, the abandoning participant indeed reached peak velocity after the intercepting player had reached positive AV. Thus, the non-intercepting participant was already decelerating (that is, had already abandoned) before the intercepting player reached positive AV in only 15.0% of the cases. This first analysis suggests that our on-the-fly decision criterion also captures the timing of the decision rather well. We can take the analysis one step further by also considering the information with respect to the moment of abandoning contained in the magnitude of the peak velocity reached by the non-intercepting participant, as described in Section "Movement Kinematics." If the peak velocity reached during an abandoned interception attempt corresponded to the "standard" peak velocity of a successful interception movement, the interception attempt was considered as still underway at the moment the non-intercepting participant reached this peak velocity. Abandoning was then classified as late. If, on the other hand, the peak velocity reached during an abandoned interception attempt was smaller than the standard peak velocity, the interception attempt was considered as already abandoned when the non-intercepting participant reached this lower-than-standard peak velocity. Abandoning was then classified as early. **Table 5** presents the foregoing results in the form of a contingency table.

As can be seen from **Table 5**, of the 209 double-initiation trials in which the non-intercepting participant reached peak velocity after the intercepting participant had reached positive AV, 150 (i.e., 71.8%) had been characterized as early abandoning and 59 (i.e., 28.2%) as late abandoning. This distribution closely mirrors the observed overall 72.8% (179 out of 246) early abandoning and 27.2% (67 out of 246) late abandoning. Of the 37 trials in which the non-intercepting participant had reached peak velocity before the intercepting participant reached positive AV, the great majority (29 or 78.4%) had been characterized as early abandoning. We suggest that in many of these trials the non-intercepting participant produced only a small movement, characterized by a low peak velocity (i.e., the green points close to the zero velocity axis in **Figure 6**). Overall we conclude that the on-the-fly criterion that the ball will be intercepted by the "first participant to reach positive AV" allows the observed team interactions to be rather accurately captured.
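
The percentages reported for the double-initiation trials can be checked with a few lines of arithmetic. The sketch below is only an illustration: the cell counts are those given in the text, except for the "before/late" cell, which is not stated explicitly and is assumed here to be 37 − 29 = 8; the names `counts`, `pct`, `row_total` and `col_total` are hypothetical.

```python
# Cell counts for the 246 double-initiation trials, as reported in the text.
# Keys: (peak velocity before/after partner's positive AV, early/late abandoning).
counts = {
    ("after", "early"): 150,
    ("after", "late"): 59,
    ("before", "early"): 29,
    ("before", "late"): 37 - 29,  # assumption: inferred by subtraction, not stated
}


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


def row_total(timing):
    """Number of trials with the given timing ('before' or 'after')."""
    return sum(n for (t, _), n in counts.items() if t == timing)


def col_total(kind):
    """Number of trials with the given abandoning class ('early' or 'late')."""
    return sum(n for (_, k), n in counts.items() if k == kind)


total = sum(counts.values())

print("after positive AV:", pct(row_total("after"), total), "%")
print("early | after:", pct(counts[("after", "early")], row_total("after")), "%")
print("early | before:", pct(counts[("before", "early")], row_total("before")), "%")
print("early overall:", pct(col_total("early"), total), "%")
```

Under that one assumption, the row and column totals reproduce the figures quoted above (209 and 37 trials; 179 early and 67 late).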

# GENERAL DISCUSSION

In the present contribution we set out to study how a team of two players coordinated their actions so as to intercept a series of approaching balls. Contrary to most work performed in the field of between-participant collaboration (e.g., Mottet et al., 2001; Isenhower et al., 2010; Romero et al., 2015), our doubles-pong task (implicitly) required the team members to decide between them, on every single trial, who would perform the interceptive action and who would not: continuing interception attempts realized by both players led to collisions between their paddles that subsequently disintegrated, thereby no longer allowing the ball to be intercepted. In order to be able to study how such joint decisions were made on the basis of shared visual information only, we effectively prevented participants from communicating directly with each other: unable to see or hear the other participant, they only shared the visual information available on the screen in front of them, depicting the moving ball and the positions of each of the two participant-controlled paddles along the interception axis.

Before partaking in the team interception session, participants had been familiarized with the apparatus and task. In a first session they had practiced intercepting all balls on their own and in a second session they had practiced intercepting balls while assisted by a static partner, embodied by a large stationary paddle covering the last part of the opposite side of the interception axis. These first two sessions not only served to allow the participants to become acquainted with the setup but also allowed us to characterize interception performance of all 12 individual participants. After having ascertained that performance in the first two sessions was comparable for the left-positioned participants (LPs) and right-positioned participants (RPs), six teams, each consisting of an LP and an RP, were formed for the final session.

TABLE 5 | Contingency table for double-initiation (both LP and RP) trials, combining the number of times the non-intercepting player reached peak velocity before or after the intercepting participant reached positive angular velocity with the number of times the non-intercepting player abandoned the interception attempt early or late, as determined by the magnitude of the peak velocity reached.

Notwithstanding the lack of possibilities for overt communication, performance during this team interception session was remarkably good, with between 85.5 and 95.5% of the balls being intercepted by the different teams. Collisions were extremely rare, with one team never colliding, four teams colliding once and one team colliding twice on a total of 200 trials per team. Focusing on who intercepted balls where revealed that all teams instantiated a division of the total interception space, with the LP intercepting the great majority of balls arriving on the left half of the interception axis and the RP intercepting the great majority of balls arriving on the right half. However, as already mentioned, a simple geometry-based division-of-space hypothesis did not satisfactorily account for the pattern of results observed. The decision of who intercepts a ball where appeared to be founded in between-participant interactions rather than in situational geometry.

A first indication hereof was the finding that, while for five of the six teams the boundary between LP and RP interception domains was located close to the (unmarked) center of the screen, for the remaining team this boundary was shifted almost 5 cm to the left (cf. **Figure 4**). As the latter team was characterized by a large difference in individual performance scores in Sessions 1 and 2 and the LP was the participant with the lowest interception performance scores, it is tempting to suggest that the boundary shift resulted from the better (worse) player taking charge of a larger (smaller) part of the interception space. However, more systematic explorations of between-participant performance levels are required to test the hypothesis that a team's division of interception space may indeed depend on the performance levels of the individual members. By the same token, the question whether approximately equally skilled team members would also divide the interception space in halves when their initial paddle positions were not symmetrically centered around the middle of the space also needs to be addressed in future work.

A second indication of the inadequacy of a geometry-based division-of-space hypothesis was the finding that, even though all six teams of the present study revealed a division of interception space, such divisions were never absolute. Boundaries were indeed fuzzy rather than clear-cut and the interception domains of individual participants were characterized by a significant degree of overlap (cf. **Figure 4**). Under a division-of-space hypothesis excursions into the other participant's interception space should be considered as mistakes likely to result in collisions, with the likelihood of collisions expected to increase with the magnitude of the intrusion. Yet excursions into the partner's interception domain leading to successful interception were clearly far more frequent than collisions. Collisions moreover generally occurred for balls arriving very close to the boundary between interception domains. Interestingly, overlap between interception domains was not only spatial but also temporal: initiation of interceptive movements by both participants occurred in almost a quarter of all trials (cf. **Figure 5**). While this may be understood as resulting from uncertainty with respect to the future ball arrival position, it does require that at some point in time one of the participants abandons the launched interception attempt so as to allow the other participant to successfully intercept the ball. At least in these trials the decision to (continue to attempt to) intercept the ball on a given trial or not is thus clearly taken on the fly rather than before movement onset.

How might between-participant interactions provide an account for the patterns of results observed? In the present contribution we suggested that the dynamic triangular relations between the movements of both participants and the approaching ball may be captured by the relation between the rates of change of angles βLP and βRP (cf. **Figure 3A**). Importantly, both AVs are influenced by the motion of the ball. Moreover, AV-LP is influenced by the way in which the LP moves the left paddle and AV-RP is influenced by the way the RP moves the right paddle. Unlike movement speed, which necessarily varies as a function of the distance to be covered, AV provides a functional (because future outcome-related) characterization of the relation between the ball and the participant's paddle (see **Figures 3B–D**). As such it allows evaluation of the expediency of both participants' ongoing interception attempts. Expediency here refers to the current functionality of the engagement of a participant in an interception attempt, with an expedient movement being a movement that rapidly leads to positive AV. Because positive AV implies a paddle speed that is higher than required to ensure interception, such a relation indicates that the participant is on track to perform an interception (and may end up beyond the interception point if the ongoing movement is not decelerated). Picking up such expediency of the partner's movement would allow the other participant to abandon his/her own ongoing interception attempt in time to keep the paddles from colliding.

Simulating the outcome of the on-the-fly decision process on each intercepted trial by attributing the future interception to the first participant to attain positive AV allowed the qualitative aspects of the observed results to emerge for all six teams. Indeed the predictions grounded in this action-based criterion (cf. **Figure 9**) revealed that the overlap as well as the location of the boundary between interception domains, including the boundary shift observed for team 6, could be understood as emerging from the participants' behaviors during a trial. It is worth noting that predicted overlap tended to be larger than observed overlap, emphasizing the capacity of an information-based coupling to explain such a phenomenon. Moreover, the simulation provided first evidence that not only the outcome but also the timing of the team's decision who will intercept the ball could be understood as emerging from the interaction.
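
The "first participant to attain positive AV" criterion can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' simulation code: it assumes that the two AV time series (AV-LP and AV-RP) have already been computed and are sampled at the same instants, and the function names and the handling of ties and of trials without any positive AV are hypothetical choices made here.

```python
from typing import Optional, Sequence


def first_positive_index(av: Sequence[float]) -> Optional[int]:
    """Index of the first sample at which AV is positive, or None if never."""
    for i, value in enumerate(av):
        if value > 0.0:
            return i
    return None


def predicted_interceptor(av_lp: Sequence[float],
                          av_rp: Sequence[float]) -> Optional[str]:
    """Attribute the interception to the first participant reaching positive AV.

    Returns "LP", "RP", "tie" (both reach positive AV at the same sample;
    an assumption, the text does not discuss this case) or None (neither does).
    """
    i_lp = first_positive_index(av_lp)
    i_rp = first_positive_index(av_rp)
    if i_lp is None and i_rp is None:
        return None
    if i_rp is None or (i_lp is not None and i_lp < i_rp):
        return "LP"
    if i_lp is None or i_rp < i_lp:
        return "RP"
    return "tie"


# Hypothetical AV samples for one trial (arbitrary units, made up for illustration).
av_lp = [-0.30, -0.12, 0.04, 0.20]
av_rp = [-0.25, -0.20, -0.10, 0.05]
print(predicted_interceptor(av_lp, av_rp))
```

Applying such a rule trial by trial and tabulating the predicted interceptor against ball arrival position would give predicted interception domains of the kind compared with the observed ones above.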

In this study we took an embodied approach to joint decision making (Richardson et al., 2008; Marsh et al., 2009; Richardson et al., 2010; Coey et al., 2012). Examining the interactive team behavior over time provides a way to study how the decision emerges, rather than focusing on the outcome of a decision-making process (cf. Turvey and Shaw, 1995; Lepora and Pezzulo, 2015). With the observation that in almost a quarter of all trials both participants initiated an interceptive movement (after which one of the two was required to abandon this attempt), the results of the present study provide behavior-based empirical evidence for the argument that actions may already be underway before decisions are completed, stressing the need to consider choice of action and control of action as highly integrated rather than serially arranged processes (e.g., Newell and Simon, 1972; for neural accounts also proposing parallel rather than serial decision processes, see, for instance, Cisek, 2007; Lepora and Pezzulo, 2015). The results also revealed that team decisions do not necessarily call upon shared knowledge or mental models (minimally exemplified in our doubles-pong task without overt communication by a silent agreement to divide interception space), as suggested by proponents of the social-cognitive perspective (e.g., Cannon-Bowers et al., 1993; Eccles and Tenenbaum, 2004; Cannon-Bowers and Bowers, 2006; Ward and Eccles, 2006; Sebanz and Knoblich, 2009). Our results rather suggest that team decisions are information-driven: the interactions between the participants with respect to the ball provide information (tentatively captured in the AV-LP, AV-RP space) that can be used to decide to continue or to abandon a launched interception attempt.

Taking our observations into account, how then should we perceive a team of two individuals intercepting balls together? Intercepting a moving target is in itself a non-social activity and is therefore often studied as such (e.g., Bootsma and Van Wieringen, 1990; Peper et al., 1994; Chardenon et al., 2005; Michaels et al., 2006; Fajen and Warren, 2007; Ledouit et al., 2013; Bootsma et al., 2016). However, whereas the ball typically will be intercepted by one individual, in many (sports) situations more individuals are present, potentially intercepting the ball as well. The task under study here was inspired by and modeled after the situation of (beach) volleyball players ready to intercept an oncoming serve. In situations such as these, it is the common goal (i.e., intercepting as many balls as possible) and accompanying constraints (i.e., not colliding with one another) that bind both individuals to act as a 'social unit' (i.e., a team; Marsh et al., 2006). Nevertheless, we do not know (yet) how such a social unit comes about from two 'I's' cooperating as a 'we' on the same task. Marsh et al. (2006) proposed that multiple individuals acting together might be considered a so-called social synergy, in which several individuals are temporally and functionally constrained by informational linkages to act as one unit. Evidence for such a synergistic approach to joint action has been found in studies on rhythmical interpersonal coordination (see Schmidt and Richardson, 2008 for a review) and during a continuous interpersonal postural task (Ramenzoni et al., 2011) showing behavioral control at the collective level. Our study, however, does not concern continuous rhythmical movements made by an ensemble of individuals, nor do both individuals perform the same task, as only one of the two individuals will intercept the ball in the end. Our results, though, do suggest that both players act as a team when deciding to go for the ball or not.

# CONCLUSION

This study offered a paradigm in which two players act as a team to realize the interception of an approaching ball without any other means of interaction than the visual information of the joint action display on the shared task space. We suggest that the decision of who of the two players realizes ball contact emerges from these interactions of both players (paddles) and the ball. The coordinated action often involves the initiation of movement by both members of a team, leading to abandoning of movement by one of the players. Of course, many questions remain. Details of the interactions, effects of the means of interacting, and the identification of the information that the players use await future experiments. Furthermore, we suggest that the task that we developed captures the essentials of real-world tasks such as the interception of a serve in beach volleyball, but also in many other situations of daily life in which individuals have to coordinate to attain a common goal. Although further testing is needed to back up these suggestions, we feel that the paradigm that we introduced holds great promise for understanding on-the-fly decision making among individuals.

# AUTHOR CONTRIBUTIONS

NHB, FZ, RC, NB, and RB conceived and designed the experiments and critically contributed to the intellectual content of the work. RC conceived the experimental set-up. NHB ran the experiments. NHB, FZ, RC, and RB analyzed the data. NHB, FZ, and RB wrote the first drafts. All authors approved the final version of the manuscript.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Benerink, Zaal, Casanova, Bonnardel and Bootsma. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Autism in Action: Reduced Bodily Connectedness during Social Interactions?

#### C. (Lieke) E. Peper<sup>1</sup> \*, Sija J. van der Wal<sup>2</sup> and Sander Begeer<sup>2,3</sup>

<sup>1</sup> Department of Human Movement Sciences, MOVE Research Institute Amsterdam, Vrije Universiteit Amsterdam, Amsterdam, Netherlands, <sup>2</sup> Section Clinical Developmental Psychology, Vrije Universiteit Amsterdam, Amsterdam, Netherlands, <sup>3</sup> EMGO Institute for Health and Care Research, Vrije Universiteit Amsterdam, Amsterdam, Netherlands

Autism is a lifelong disorder, defined by deficits in social interactions and flexibility. To date, diagnostic markers for autism primarily include limitations in social behavior and cognition. However, such tests have often been shown to be inadequate for individuals with autism who are either more cognitively able or intellectually disabled. The assessment of the social limitations of autism would benefit from new tests that capture the dynamics of social initiative and reciprocity in interaction processes, and that are not dependent on intellectual or verbal skills. New entry points for the development of such assessments may be found in 'bodily connectedness', the attunement of bodily movement between two individuals. In typical development, bodily connectedness is related to psychological connectedness, including social skills and relation quality. Limitations in bodily connectedness could be a central mechanism underlying the social impairment in autism. While bodily connectedness can be minutely assessed with advanced techniques, our understanding of these skills in autism is limited. This Perspective provides examples of how the potential relation between bodily connectedness and specific characteristics of autism can be examined using methods from the coordination dynamics approach. Uncovering this relation is particularly important for developing sensitive tools to assess the tendency to initiate social interactions and the dynamics of mutual adjustments during social interactions, as current assessments are not suited to grasp ongoing dynamics and reciprocity in behavior. The outcomes of such research may yield valuable openings for the development of diagnostic markers for autism that can be applied across the lifespan.

Keywords: autism, entrainment, interpersonal coordination, dynamics, reciprocity

#### Edited by:

Richard C. Schmidt, College of the Holy Cross, USA

#### Reviewed by:

Nicola Yuill, University of Sussex, UK

Kerry Marsh, University of Connecticut, USA

#### \*Correspondence:

C. (Lieke) E. Peper, l.peper@vu.nl

#### Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

Received: 04 August 2016 Accepted: 09 November 2016 Published: 23 November 2016

#### Citation:

Peper CE, van der Wal SJ and Begeer S (2016) Autism in Action: Reduced Bodily Connectedness during Social Interactions? Front. Psychol. 7:1862. doi: 10.3389/fpsyg.2016.01862

# INTRODUCTION

Autism Spectrum Disorder (hereafter: autism) is a lifelong impairing disorder, or group of disorders (prevalence >1%), defined by deficits in social communication and interaction, and restrictive, repetitive interests (Lai et al., 2014). While autism can be diagnosed in preschoolers, recent findings indicate that the mean age of diagnosis is much higher, especially for individuals with autism and a normal IQ (around 50%) (Begeer et al., 2013; Lai and Baron-Cohen, 2015). Presumably these individuals can compensate for their autism until the complexity of social interactions at older ages brings their autism to light. Consequently, a remaining challenge is finding objective diagnostic markers that can help detect autism across the lifespan.

Although autism also entails non-social deficits, behavioral diagnostic markers for autism beyond the preschool years primarily focus on deficits in social interactive behavior. These behavioral markers often rely on standardized clinical observations (e.g., Autism Diagnostic Observation Schedule; Lord et al., 2000) or parent reports (Autism Diagnostic Interview Revised, ADI-R; Lord et al., 1994; Lai et al., 2014). A less obvious component of social competence lies within bodily movement during social interactions. In typically developing (TD) individuals, bodily movements become more connected in social settings, resulting in both imitative (Chartrand and Bargh, 1999) and synchronized movements (Bernieri, 1988). This 'bodily connectedness,' i.e., attunement of bodily movement between individuals, is related to psychological connectedness, as expressed by social skills and relational quality (Hove and Risen, 2009; Lumsden et al., 2012; Cook, 2016).

Here, we argue that the concept of bodily connectedness provides a unique opportunity to assess subtle features of social interactions in autism. We focus on social initiative and social reciprocity. Currently the assessment of these key diagnostic criteria for autism is hampered by the focus on static stimuli and unidirectional settings, which do not capture the ongoing mutual adaptations in the unfolding interaction (Lai et al., 2014). Even the ADOS (Lord et al., 2000), which does rely on the observation of live dynamic interactions, uses subjective and relatively coarse interpretations and quantifies behavior in a dichotomous way (scoring behavioral features, such as reciprocity, as either present or absent). Subtler and less subjective measurements with higher index quality would be an important addition to the diagnostic arsenal. The dynamic (i.e., time-varying) nature of interaction processes may be captured using empirical methods aimed at uncovering variations and differentiations in bodily connectedness of individuals with and without autism (Schmidt et al., 2012). The proposed focus on bodily movement fits with previous suggestions that perceptuo-motor impairments may critically affect socio-cognitive functioning in autism (Bhat et al., 2011; von Hofsten and Rosander, 2012; De Jaegher, 2013).

# SOCIAL LIMITATIONS IN AUTISM

The term autism stems from the Greek 'autos,' meaning 'self.' An extreme orientation toward the self in autism is reflected in poor initiative and reciprocity during social interactions. These features have been confirmed in a large number of studies (Duffy and Healy, 2011) and are central domains of diagnostic assessments (Lord et al., 2000). Social limitations in autism have been linked to disrupted early imitation and dysfunctions in the so-called mirror neuron system (i.e., brain regions that are active when an individual performs a specific action, but also when he/she observes another person performing that action; Klin et al., 2003). However, the evidence for a defective mirror neuron system is mixed (Hamilton, 2013) and the nature of limitations in imitation remains poorly understood (Vivanti et al., 2014). A general explanation for the impairments in social initiative and reciprocity states that individuals with autism find social interactions less rewarding, because they fail to appreciate their emotional significance. Indeed, abnormal brain functioning in autism suggests impaired sensitivity to social affiliation and reward at a neural level (Dawson et al., 2002).

Poor social initiative and reciprocity are most apparent during spontaneous interactions between individuals. Detailed assessment of these impairments requires measuring the dynamics of ongoing interactions, to capture potential asymmetries in the mutual contributions of interacting individuals. To date, tests for social limitations of individuals with autism focus primarily on isolated elements within this dynamical process. For instance, various instruments are available for testing children's conceptual understanding of perspective taking (Theory of Mind) or emotions (Yirmiya et al., 1998; Begeer et al., 2008). These tests typically focus on unidirectional interactions ("Do I understand what you think?"), and fail to address the dynamics of ongoing interactions.

An additional problem is that conceptual tests target cognitive skills. Normally intelligent individuals with autism (around 50%; Wingate et al., 2014) rely on cognitive abilities to compensate for social limitations. This enables them to disguise these limitations, particularly during conceptual (Scheeren et al., 2013) or standard situations (Begeer et al., 2010), while remaining limited in real-life interactions (Klin et al., 2003). Tests for social behavior are often insensitive to more able individuals with autism, at school age or up (Happe, 1995). For intellectually disabled individuals with autism, it is equally important to develop IQ-independent assessments of their social limitations, as social and intellectual limitations are difficult to disentangle (Tureck and Matson, 2012).

The assessment of the social limitations of autism would benefit from tests that (i) capture the dynamics of social initiative and reciprocity in interaction processes, and (ii) are not susceptible to cognitive compensation, or dependent on intellectual or verbal skills. There is a particular scarcity of measures that assess elementary social limitations during direct social interactions, taking into account who initiates the interaction, how interactants are influenced by social triggers, and to what extent they contribute to the interaction in a balanced manner.

# BODILY CONNECTEDNESS AS MARKER FOR SOCIAL ABILITIES

In TD individuals, matching and synchronization of bodily movements are associated with (psychological) characteristics of the interactants and the quality of their relationship, such as self-esteem (Lumsden et al., 2014), pro-social attitudes (Lumsden et al., 2012), physical attractiveness (Zhao et al., 2015), rapport (Hove and Risen, 2009; Raffard et al., 2015), and perceived social difference (Miles et al., 2011). Moreover, moving in synchrony fosters cooperative abilities (Wiltermuth and Heath, 2009; Valdesolo et al., 2010). As bodily connectedness appears to be stronger between individuals whose movement patterns resemble each other (Słowiński et al., 2016), Cook (2016) argued that such connectedness between individuals with and without autism may be reduced due to a mismatch between their movement patterns, given the atypical patterns observed for autism (Bhat et al., 2011; Gowen and Hamilton, 2013). Hence, bodily connectedness may provide insight into underlying processes of social limitations in autism, potentially inspiring new assessment procedures.

Whereas examinations of interactional synchrony using temporal coding for specific actions or rating scales provide rather coarse-grain indices of synchrony (Schmidt et al., 2012), subtler aspects of how persons attune their movements to each other can be assessed using methods from the coordination dynamics approach (Schmidt et al., 2011, 2012). This approach highlights how on-going, dynamic interaction processes play a defining role in interpersonal coordination (Schmidt et al., 1990, 2011; Issartel et al., 2007; Peper et al., 2013). When two persons perceive each other's rhythmic movements, the resulting interactions yield attraction toward an interpersonal movement synergy (referred to as 'entrainment'), both in the presence and absence of instructions regarding coordination of the movements. Stronger entrainment reflects stronger mutual interactions. This focus on interpersonal interactions conveys new potential for assessing specific limitations in autism. Indeed, first applications to dyads involving a person with autism (Marsh et al., 2013; Fitzpatrick et al., 2013, 2016) indicated reduced entrainment, suggesting weakened bodily connectedness.

As outlined below, extending the examination of interpersonal coordination dynamics beyond the level of basic entrainment experiments may provide new tools for assessing social initiative and reciprocity. This requires strategically chosen conditions and methods to delineate the degree to which individuals contribute to the entrainment with the other person. If research along these lines is indeed successful, a next step would be to derive assessment tools suitable for clinical settings.

# INTERPERSONAL COORDINATION DYNAMICS AS WINDOW INTO SOCIAL LIMITATIONS IN AUTISM

# Quantifying Entrainment between Two Persons

Signs of bodily connectedness have been reported for TD individuals when they are engaged in a mutual task, even when the bodily movements are immaterial to the joint task performance [e.g., when solving a cognitive puzzle through verbal interaction (Shockley et al., 2003)]. As the limitations in social reciprocity are a defining criterion for an autism diagnosis, such spontaneous attunement of task-irrelevant movements is expected to be reduced in individuals with autism.

The paradigm developed by Shockley et al. (2003) provides an excellent option for examining this prediction. This paradigm involves two persons standing, each looking at a picture, without seeing the picture the partner is looking at. Through verbal communication they have to discover 10 differences between the two pictures. In control measurements the participants do not interact with one another. Shockley et al. (2003) demonstrated that engagement in this joint task resulted in subtle entrainment features in the postural sway patterns of the two TD partners. The degree of this entrainment may be expected to be smaller in autism–TD dyads than in TD–TD dyads. Given the complexity of the obtained postural sway patterns, detailed analysis of their entrainment requires refined analysis methods, such as Cross Recurrence Quantification Analysis (Shockley et al., 2003).
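
The core idea behind cross-recurrence analysis can be conveyed with a deliberately simplified sketch. The code below computes only a cross-recurrence rate on raw one-dimensional series, i.e., the fraction of sample pairs from two series that lie within a chosen radius of each other. It is not the procedure of Shockley et al. (2003): full Cross Recurrence Quantification Analysis additionally involves time-delay embedding and further measures, and the function name, radius and data here are hypothetical.

```python
def cross_recurrence_rate(x, y, radius):
    """Fraction of all pairs (i, j) for which |x[i] - y[j]| <= radius.

    Simplification: no time-delay embedding is applied, so each sample is
    treated as a one-dimensional state.
    """
    if not x or not y:
        raise ValueError("both series must be non-empty")
    hits = sum(1 for xi in x for yj in y if abs(xi - yj) <= radius)
    return hits / (len(x) * len(y))


# Hypothetical sway excursions (arbitrary units, made up for illustration).
sway_a = [0.0, 0.4, 0.9, 0.5, 0.1, -0.3]
sway_b = [0.1, 0.5, 0.8, 0.4, 0.0, -0.2]   # visits similar states to sway_a
sway_c = [3.0, 3.2, 2.9, 3.1, 3.3, 3.0]    # occupies a different region

print(cross_recurrence_rate(sway_a, sway_b, radius=0.15))
print(cross_recurrence_rate(sway_a, sway_c, radius=0.15))
```

A higher rate indicates that the two series visit similar states more often; in this toy example the second pair shares no states within the radius.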

A more common way to determine movement entrainment is to examine the extent to which the movements of two persons are adapted toward each other during rhythmic movements, as those allow for examination of the degree of synchronization over a large number of movement cycles. In TD–TD dyads, entrainment has thus been determined during instructed mutual coordination (e.g., intentional synchronization; Amazeen et al., 1995; Richardson et al., 2007) but also in the absence of such instructions (Schmidt and O'Brien, 1997; Richardson et al., 2007; Oullier et al., 2008). By examining the phase relation (typically referred to as 'relative phase') between the two movement patterns, the occurrence and strength of entrainment can be determined. When no stable coordination pattern is observed (indicating weak interpersonal coupling), temporary attraction to synchronized patterns can be determined based on the distribution of relative phases over a trial or by means of recurrence or coherence measures (Ridderikhoff et al., 2006; Richardson et al., 2007, 2008). For stable coordination patterns, the variability of relative phasing between the moving individuals reflects the strength of connectedness (or: coupling), with lower variability reflecting stronger connectedness (Varlet et al., 2012).
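
How relative phase and its variability index coupling strength can be illustrated with a small sketch. It assumes that a phase angle (in radians) has already been estimated for each person at each sample, and it uses the circular variance (one minus the mean resultant length) as the variability measure. That measure is a choice made here for illustration; the studies cited may use other indices, such as the standard deviation of relative phase. All names and data are hypothetical.

```python
import math


def relative_phase(phase_a, phase_b):
    """Sample-wise phase difference, wrapped to the interval [-pi, pi]."""
    return [math.atan2(math.sin(a - b), math.cos(a - b))
            for a, b in zip(phase_a, phase_b)]


def mean_resultant_length(angles):
    """Length of the mean unit vector: 1 = constant angle, 0 = fully dispersed."""
    n = len(angles)
    c = sum(math.cos(a) for a in angles) / n
    s = sum(math.sin(a) for a in angles) / n
    return math.hypot(c, s)


def circular_variance(angles):
    """Variability of relative phase; lower values suggest stronger coupling."""
    return 1.0 - mean_resultant_length(angles)


# Hypothetical data: 10 s sampled at 100 Hz.
t = [i * 0.01 for i in range(1000)]
person_a = [2 * math.pi * 1.0 * ti for ti in t]         # 1.0 Hz movement
locked_b = [2 * math.pi * 1.0 * ti - 0.3 for ti in t]   # same tempo, fixed lag
drifting_b = [2 * math.pi * 1.1 * ti for ti in t]       # 1.1 Hz, no locking

locked = relative_phase(person_a, locked_b)
drift = relative_phase(person_a, drifting_b)
print(circular_variance(locked), circular_variance(drift))
```

In the locked case the relative phase stays at the fixed lag and the variance is close to zero, whereas in the drifting case the relative phase wanders through a full cycle and the variance approaches one.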

Although autism has scarcely been examined along such lines, autism–TD dyads have been found to show less entrainment than TD–TD dyads (Fitzpatrick et al., 2013, 2016; Marsh et al., 2013), suggesting that autism is indeed associated with reduced bodily connectedness. However, although relative phase measures provide information about the degree of synchronization within a dyad, they do not inform us directly about potential differences in how the two individuals contribute to the entrainment process. Hence, additional manipulations and analyses are required to address social initiative and reciprocity asymmetries in more detail.

# Social Initiative

Individuals with autism typically show reduced social initiative. When prompted, some individuals with autism respond adequately (Shabani et al., 2002), but their spontaneous social initiative remains limited. A prerequisite for testing reduced initiative in social situations is the absence of prompts, instructions, or other cues to trigger behavior (Backer van Ommeren et al., 2015). Tests that rely on spontaneous skills are more sensitive to autism than tests that provide an opportunity to use cognitive skills (Senju et al., 2009). A focus on involuntary bodily connectedness provides a clear advantage here, as it is difficult to compensate for a lack of uninstructed, subtle attunement of bodily movement.

Given their diminished social initiative, we may expect that bodily connectedness in individuals with autism depends on instructions regarding the interactions with another person.

Whereas TD individuals tend to synchronize their movements to those of a partner spontaneously, even without being instructed to do so (Schmidt and O'Brien, 1997; Richardson et al., 2007; Oullier et al., 2008), this spontaneous tendency seems to be reduced in individuals with autism (Fitzpatrick et al., 2013, 2016; Marsh et al., 2013). Conversely, the instruction to (intentionally) synchronize movements provides an explicit trigger for movement interaction, and may be expected to yield higher degrees of synchronization in individuals with autism, who are known to thrive on explicit instructions (Schwarzkopf et al., 2014).

Empirically, these predictions can be tested in persons (in dyads) who move one of their limbs rhythmically but, initially, at slightly different tempi and/or phasing. Once they see each other's movements (Richardson et al., 2007; Varlet et al., 2012) interpersonal interactions are expected to induce entrainment. To address the degree of social initiative, participants can be instructed to either continue moving at the initial tempo and/or phasing (unintentional condition: no social initiative required) or to synchronize the movements with the partner (intentional condition: social initiative required). Less entrainment is expected for autism–TD dyads than for TD–TD dyads in the unintentional condition (Fitzpatrick et al., 2013, 2016; Marsh et al., 2013), but not necessarily in the intentional condition, given the explicit instruction to produce synchronization (but see also Fitzpatrick et al., 2016). Moreover, by analyzing the adaptations in the individual movement patterns during the first instances of entrainment, the degree to which participants with autism demonstrate social initiative can be further examined.

# Reciprocity Of Mutual Adjustments

Autism is not only characterized by a reduced tendency to initiate social interactions, but also by reduced reciprocity during social interactions, which affects the dynamics of the ongoing adjustments between the interactants. Measuring such reciprocity requires a technique to disentangle the dynamic contributions of each participant to the reciprocal interaction. Indeed, recent tests for reciprocal behavior in autism (Backer van Ommeren et al., 2015) demonstrated that such a dynamic approach yields an IQ-independent assessment. Individuals with autism show clear limitations in reciprocating during an interaction process with another person, although initial evidence suggests improvement when appropriate support is provided (Holt and Yuill, 2014). However, targeting the mutual adjustments during the interaction requires more detailed analyses of behavior, taking into account the ongoing contribution of each interactant in real time.

If bodily connectedness is a marker for autism, asymmetries are expected in the movement interactions between individuals. When two individuals synchronize their movements, the degree to which they contribute to the joint coordination pattern may differ. Whereas a person with autism may be expected to adapt his/her movements less to those of the partner, it is possible that this tendency is (partly) compensated by enhanced adaptations by the partner, thereby potentially obscuring the reduced bodily connectedness in the person with autism. Conversely, it is also possible that the partner shows less bodily connectedness when coordinating with an individual with autism. It is therefore important to establish the extent to which each person adapts his/her movements to those of the partner (Oullier et al., 2008). This can be done, for example, in the entrainment experiment described in the previous section by determining how much the phase and/or frequency of each person's movements, due to the mutual interactions, deviates from the initial values. The same can be done for a more challenging coordination task like the 'mirror game,' in which dyads are instructed to make creative yet synchronized rhythmic movements. For this paradigm, Słowiński et al. (2016) recently developed a technique to determine the degree of movement adaptation, based on observed deviations of the 'individual motor signatures.'

A potentially stronger test involves the application of brief, unexpected perturbations that disrupt the interpersonal coordination pattern through a temporary arrest of one of the limbs (Peper et al., 2013). To re-establish the original coordination pattern, at least one of the persons has to adapt the phasing of his/her movements. In TD–TD dyads that intentionally synchronize their movements, both persons contribute approximately equally to this return process, yielding an adaptation ratio of about 0.5 (Peper et al., 2013). Participants with autism may show reduced adaptations of their movement phasing to the perturbed movements of the partner, resulting in a lower adaptation ratio and longer adaptation time before the original pattern is re-established.

So far, this technique has only been applied to situations in which TD–TD dyads were instructed to synchronize their movements. However, given the reduced social initiative in autism, it seems worthwhile to examine asymmetries in reciprocity during spontaneous entrainment (no instruction with respect to interpersonal coordination) as well. Since perturbation tests require a more advanced set-up than an entrainment test, it is useful to compare the results for both paradigms to determine whether the entrainment paradigm would suffice in this regard.

# CONCLUSION

To date, most research on the defining deficits of autism in social interactions has focused on social communicative behavior or cognition. Although the role of underlying bodily movement has largely been neglected, perceptuo-motor impairment (Spencer et al., 2000; Gepner and Mestre, 2002) may be expected to affect socio-cognitive functioning (Leary and Hill, 1996; De Jaegher, 2013; Cook, 2016). By focusing on covert movement coordination characteristics, the influences of acquired social or cognitive skills can be circumvented, uncovering the ways in which autism may be associated with impaired bodily connectedness (Marsh et al., 2013). The coordination dynamics approach offers experimental paradigms for scrutinizing specific aspects of bodily connectedness, which may help to assess defining characteristics of autism, such as poor social initiative and reciprocity. To enhance the sensitivity of the proposed empirical methods, additional modulations of the social setting may be applied, such as implicit social priming (Raffard et al., 2015).

If these assessments are successful, follow-up research may address their potential application in diagnostic procedures, for instance, by developing affordable set-ups (e.g., registration with Microsoft Kinect; Clark et al., 2013), determining whether human partners can be replaced by virtual partners/robots with (Dumas et al., 2014; Słowiński et al., 2016) or without (Meerhoff et al., 2014; Zhao et al., 2015) interactional simulation software, and defining simplified protocols suitable for clinical use. Thus, a focus on bodily connectedness may contribute to the development of assessment tools that are sensitive to the ongoing dynamics of social initiative and reciprocity in interpersonal interactions, while bypassing cognitive compensation strategies. In addition, it would provide additional fuel for theoretical considerations regarding the underlying causes of autism and their potential relation to motoric and/or perceptual problems as highlighted by the embodied cognition account (von Hofsten and Rosander, 2012; De Jaegher, 2013).

# AUTHOR CONTRIBUTIONS

CP and SB: conception and writing of article; SW: critical reading and writing of article.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Peper, van der Wal and Begeer. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Multidimensional Recurrence Quantification Analysis (MdRQA) for the Analysis of Multidimensional Time-Series: A Software Implementation in MATLAB and Its Application to Group-Level Data in Joint Action

#### Sebastian Wallot <sup>1</sup>\*, Andreas Roepstorff <sup>2</sup> and Dan Mønster <sup>2,3</sup>

*<sup>1</sup> Max Planck Institute for Empirical Aesthetics, Frankfurt, Germany, <sup>2</sup> Interacting Minds Centre, School of Culture and Society, Aarhus University, Aarhus, Denmark, <sup>3</sup> Department of Economics and Business Economics, Aarhus University, Aarhus, Denmark*

#### Edited by:

*Rick Dale, University of California, Merced, USA*

#### Reviewed by:

*Chen Yu, Indiana University Bloomington, USA Fred Hasselman, Radboud University Nijmegen, Netherlands*

\*Correspondence:

*Sebastian Wallot sebastian.wallot@aesthetics.mpg.de*

#### Specialty section:

*This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology*

Received: *05 August 2016* Accepted: *07 November 2016* Published: *22 November 2016*

#### Citation:

*Wallot S, Roepstorff A and Mønster D (2016) Multidimensional Recurrence Quantification Analysis (MdRQA) for the Analysis of Multidimensional Time-Series: A Software Implementation in MATLAB and Its Application to Group-Level Data in Joint Action. Front. Psychol. 7:1835. doi: 10.3389/fpsyg.2016.01835*

We introduce Multidimensional Recurrence Quantification Analysis (MdRQA) as a tool to analyze multidimensional time-series data. We show how MdRQA can be used to capture the dynamics of high-dimensional signals, and how MdRQA can be used to assess coupling between two or more variables. In particular, we describe applications of the method in research on joint and collective action, as it provides a coherent analysis framework to systematically investigate dynamics at different group levels—from individual dynamics, to dyadic dynamics, up to global group-level of arbitrary size. The Appendix in Supplementary Material contains a software implementation in MATLAB to calculate MdRQA measures.

Keywords: Multidimensional Recurrence Quantification Analysis, MdRQA, multidimensional time-series, correlation, dynamics, joint action, MATLAB

# INTRODUCTION

The interest in joint action research in the past 15 years has come with an increased interest in the temporal dimension of action (Marsh et al., 2009; Knoblich et al., 2011), which offers additional information about linguistic, motor, physiological, or neuro-physiological underpinnings of that behavior (e.g., Shockley et al., 2003; Richardson and Dale, 2005; Richardson D. C. et al., 2007; Richardson M. J. et al., 2007; Dumas et al., 2010; Konvalinka et al., 2011; Louwerse et al., 2012; Fusaroli and Tylén, 2016).

Integrating information about the temporal dimension that characterizes the interaction of multiple actors always means applying some kind of correlational analysis, with the terms "coupling" or "synchrony" used to loosely refer to more specific patterns of correlation that can be quantified. Many techniques are available to quantify patterns of correlation, such as cross-correlational methods (e.g., Konvalinka et al., 2010), methods to detect phase-coupling (Richardson M. J. et al., 2007), or methods to detect nonlinear patterns of coupling, such as techniques based on recurrence (e.g., Shockley et al., 2003) or cross mapping (Sugihara et al., 2012). However, all of these methods primarily aim at data sets with two dependent variables (i.e., measurements taken from two participants performing a joint action task). The availability of methods that are readily applicable to the analysis of dyadic data may be one of several reasons why most joint action studies to date have been performed on the level of the dyad.

Investigation of group-level behavior has been done as well, but effectively by resorting to bivariate analyses, splitting the group behavior into all possible pairings and investigating the behavior as the average of all of its pairs. Apart from the fact that it would be desirable to quantify group-level behavior more properly (Fusaroli et al., 2014), as it might not always be the same as the average behavior of the constituting dyads, there are also practical implications on how to deal with pairwise decompositions statistically: If we have a group of three people (P1, P2, P3) that interact, and we quantify the group behavior as the average of pairwise interactions, we have to somehow deal with an insufficient number of independent degrees of freedom: Say the behaviors of P1 and P2 are positively correlated, and the behaviors of P2 and P3 are positively correlated, then the behaviors of P1 and P3 are also likely positively correlated and do not add independent information. So far, workarounds have been to either ignore this overdetermination in pairwise group analyses (Müller and Lindenberger, 2011), or try to work with a number of random sub-samples of pairwise data points that reflect the number of actual independent degrees of freedom (e.g., Wallot et al., 2016).
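The dependence among pairwise correlations in a triad can be made concrete. For any valid (positive semi-definite) 3 × 3 correlation matrix, the correlation between P1 and P3 is confined to an interval determined by the other two correlations. The following sketch computes that interval; the function name and the example values are hypothetical and serve illustration only.

```python
import math

def r13_bounds(r12, r23):
    """Interval of r13 values compatible with given r12 and r23.

    Follows from requiring the 3x3 correlation matrix to be positive
    semi-definite: r12*r23 +/- sqrt((1 - r12^2) * (1 - r23^2)).
    """
    slack = math.sqrt((1 - r12 ** 2) * (1 - r23 ** 2))
    return r12 * r23 - slack, r12 * r23 + slack

strong_low, strong_high = r13_bounds(0.9, 0.9)  # strongly coupled pairs
weak_low, weak_high = r13_bounds(0.3, 0.3)      # weakly coupled pairs
```

With two strong correlations of 0.9, the third correlation cannot fall below about 0.62, so it carries little independent information; with two weak correlations of 0.3, it remains almost unconstrained (roughly −0.82 to 1).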

The goal of the present paper is to introduce a multidimensional correlation technique, Multidimensional Recurrence Quantification Analysis (MdRQA), as a method to analyze group-level behavior of groups bigger than a dyad. In the following sections, we will describe MdRQA, explain its relation to standard Recurrence Quantification Analysis of individual time-series (RQA) and Cross-Recurrence Quantification Analysis of pairs of time-series (CRQA)—both of which have already been used to analyze dynamics of dyadic behavior (Shockley et al., 2003; Richardson and Dale, 2005; Richardson D. C. et al., 2007; Richardson M. J. et al., 2007; Konvalinka et al., 2011; Louwerse et al., 2012; Lang et al., 2016; Mønster et al., 2016a; Fusaroli and Tylén, 2016). We will also compare MdRQA to Joint Recurrence Quantification Analysis (JRQA)—another recurrence method that can be used to jointly analyze two or more time series. Then, we will show the utility of MdRQA, applying it to data from a joint action study featuring groups of three participants working on a joint production task. We show a correlation between group level dynamics of a physiological marker of arousal and independent outcome measures of the joint task. In accordance with previous analysis of the experiment using different techniques, this could not be seen at the level of aggregate individuals (Håkonsson et al., 2015) or dyads (Mønster et al., 2016a). Finally, we will end the article by discussing the interpretation of MdRQA results for group-level analysis, and summarize the advantages, disadvantages, and potential future developments of this technique. The Appendix in Supplementary Material of this paper contains MATLAB code to run the MdRQA analysis.

# MULTIDIMENSIONAL RECURRENCE QUANTIFICATION ANALYSIS (MdRQA)

MdRQA is a recurrence-based analysis technique to gauge the coordination pattern of multiple variables over time. The key concept of MdRQA, as the name suggests, is recurrence, meaning how the variables of interest repeat their values over time. MdRQA quantifies patterns of repetitions, which—depending on the interpretation of the analysis—are related to the dynamic characteristics of a multivariate system (see section "Comparison to RQA") or characterize the coordination of a group of variables over time (see sections "Comparison to CRQA," "Comparison to JRQA," and "Example: Origami production task").

MdRQA is a multivariate extension of simple RQA, which is an analysis technique that was developed to characterize the behavior of time-series that are the result of multiple interdependent variables, potentially exhibiting nonlinear behavior over time (Webber and Zbilut, 1994; Marwan et al., 2002). The basis of the RQA approach is phase-space reconstruction through time-delayed embedding. A phase-space is a space in which all possible states of a system under study can be charted. If full determination of the state of the system requires D independent variables, then the phase space has D dimensions. The method of time-delayed embedding allows the reconstruction of phase-space profiles from a single, one-dimensional observable, following the logic of Takens' theorem (Takens, 1981). Takens showed that if a system of interest is composed of multiple interdependent variables that drive its dynamics (i.e., its dynamics are multidimensional), and one has access only to a single observable x from the system (i.e., measuring one of its dimensions), then the multidimensional dynamics of that system can be reconstructed from the single measured dimension by plotting the observable x against itself a certain number of times at a certain time delay (see **Figure 1**). The starting point for the method is the measured values of the variable x:

$$\mathbf{x} = (x_1, x_2, x_3, \dots, x_n) \tag{1}$$

where **x** is a vector with values x<sub>1</sub> to x<sub>n</sub> representing the time-series of the variable x sampled at regular times t<sub>1</sub>, t<sub>1</sub> + Δt, t<sub>1</sub> + 2Δt, …, t<sub>1</sub> + (n − 1)Δt. If we know (or can estimate) the true dimension D of the dynamical system from which we have sampled x, then we can construct D-dimensional vectors of the form:

$$\mathbf{V}_1 = (x_1, x_{1+\tau}, x_{1+2\tau}, \dots, x_{1+(D-1)\tau}) \tag{2}$$

Note that the elements of **V**<sub>1</sub> are all elements from **x**, starting with x<sub>1</sub> sampled at time t<sub>1</sub> and then using values at later times, such as x<sub>1+τ</sub> sampled at t<sub>1</sub> + τΔt. Since the later times are all delayed relative to t<sub>1</sub> by an integer multiple of τΔt, the constant τ is called the time-lag. We can construct a similar vector **V**<sub>2</sub> by starting with x<sub>2</sub> sampled at t<sub>2</sub> = t<sub>1</sub> + Δt, and in fact we can construct n − (D − 1)τ such vectors, which can be arranged in a matrix:

$$\mathbf{V} = \begin{pmatrix} \mathbf{V}_1 \\ \mathbf{V}_2 \\ \vdots \\ \mathbf{V}_{n-(D-1)\tau} \end{pmatrix} = \begin{pmatrix} x_1 & x_{1+\tau} & \dots & x_{1+(D-1)\tau} \\ x_2 & x_{2+\tau} & \dots & x_{2+(D-1)\tau} \\ \vdots & \vdots & & \vdots \\ x_{n-(D-1)\tau} & x_{n-(D-2)\tau} & \dots & x_n \end{pmatrix} \tag{3}$$

Note that the rows are the D-dimensional phase-space vectors that we set out to construct above, while the columns are time-delayed versions of the first n − (D − 1)τ elements of the vector **x**, delayed by 0τ, 1τ, 2τ, etc. The row index is a measure of time and each column index corresponds to a dimension in phase-space. Thus, the row vectors **V**<sub>i</sub> constitute points in the phase-space portrait of the multidimensional dynamics of the system from which the observable x was taken. The column vectors **Ṽ**<sub>j</sub>, j = 1, 2, …, D, are time-series vectors corresponding to the reconstructed dimensions of the phase space, and in particular **Ṽ**<sub>1</sub> is the measured variable x from which the other dimensions are constructed. RQA is a method to statistically describe these multidimensional dynamics through the concept of recurrence in phase-space. RQA statistics are based on the recurrence plot (RP), which was invented as a means to graphically display the dynamics of a multidimensional phase-space (Eckmann et al., 1987). In essence, the RP describes repetitions of the values of **V** in its phase-space. A point RP<sub>ij</sub> in the RP is considered recurrent if the distance ‖**V**<sub>i</sub>(**x**) − **V**<sub>j</sub>(**x**)‖ between the point **V**<sub>i</sub>(**x**) (at time t<sub>i</sub>) and the point **V**<sub>j</sub>(**x**) (at time t<sub>j</sub>) does not exceed the threshold T. This can be written as

$$\text{RP}_{ij} = \Theta(T - \left\| \mathbf{V}_i(\mathbf{x}) - \mathbf{V}_j(\mathbf{x}) \right\|), \tag{4}$$

where Θ(x) is the Heaviside step function, which has the value 0 for x < 0 and 1 for x ≥ 0. Throughout the remainder of the manuscript, values of the threshold parameter T are relative to a Euclidean distance norm of the respective phase-spaces.

As an example, imagine that we want to measure the position of a person on a merry-go-round. Assuming that the person does not move up and down, we only need two variables, x and y, to determine the position of the person at a given time. These two variables make up the phase-space of the system<sup>1</sup>. If we only measure one of these variables, say x, we can reconstruct the full phase space from this variable alone using the method described above. **Figure 1** illustrates the process, where the measured values of x have been simulated by using a sine wave with added noise.

Because repetitions are rarely exact, either due to intrinsic fluctuations of the system's dynamics or measurement noise, a threshold parameter T is applied: two points in phase-space are counted as recurrent if they lie within a distance T of each other (see **Figure 2**).
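The embedding and thresholding steps of Equations (3) and (4) can be sketched in a few lines. The following is a minimal illustration in Python using only the standard library, not the MATLAB implementation from the Appendix; the function names, the noise-free sine wave, and the parameter values (D = 2, τ = 5, T = 0.1) are assumptions chosen for illustration.

```python
import math

def embed(x, dim, lag):
    """Time-delayed embedding: row i is (x_i, x_{i+lag}, ..., x_{i+(dim-1)*lag})."""
    n_vectors = len(x) - (dim - 1) * lag
    return [[x[i + d * lag] for d in range(dim)] for i in range(n_vectors)]

def recurrence_matrix(points, threshold):
    """RP_ij = 1 if the Euclidean distance between points i and j is <= threshold."""
    return [[1 if math.dist(p, q) <= threshold else 0 for q in points]
            for p in points]

# A sine wave with a period of 20 samples stands in for the measured variable x.
x = [math.sin(2 * math.pi * k / 20) for k in range(60)]

# A lag of a quarter period turns the second coordinate into a cosine,
# so the embedded points trace out a circle.
vectors = embed(x, dim=2, lag=5)
rp = recurrence_matrix(vectors, threshold=0.1)
```

Here `vectors` contains 60 − (2 − 1) · 5 = 55 points. Points one full period apart (for example rows 0 and 20) are recurrent, whereas points half a period apart (rows 0 and 10) are not.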

MdRQA extends RQA by allowing additional measured variables from the system under study to be used as dimensions in phase-space. Hence, instead of quantifying the dynamics of a D-dimensional system from a single observable x by using the D-dimensional vectors **V**<sub>i</sub>(**x**), MdRQA allows us to quantify the dynamics by using a number N of observables y<sub>1</sub>, y<sub>2</sub>, …, y<sub>N</sub> to construct the phase-space:

$$\mathbf{W} = \begin{pmatrix} \mathbf{W}_1 \\ \mathbf{W}_2 \\ \vdots \\ \mathbf{W}_n \end{pmatrix} = \begin{pmatrix} y_{1,1} & y_{2,1} & \dots & y_{N,1} \\ y_{1,2} & y_{2,2} & \dots & y_{N,2} \\ \vdots & \vdots & & \vdots \\ y_{1,n} & y_{2,n} & \dots & y_{N,n} \end{pmatrix} \tag{5}$$

where **W**<sub>i</sub> is the N-dimensional vector consisting of the N observables measured from the system sampled at time t<sub>i</sub>. The elements of the matrix **W** are thus given by W<sub>ij</sub> = y<sub>j,i</sub>, where y<sub>j,i</sub> is the value of y<sub>j</sub> at time t<sub>i</sub>.
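The construction in Equation (5) amounts to stacking the N measured series column-wise and thresholding the distances between the resulting rows. The following minimal Python sketch illustrates this without additional time-delayed embedding; the function names, the two example series, and the threshold are assumptions for illustration and do not reproduce the MATLAB code in the Appendix.

```python
import math

def multidim_points(series):
    """Stack N equally long series into rows W_i = (y_{1,i}, ..., y_{N,i})."""
    n = len(series[0])
    if any(len(s) != n for s in series):
        raise ValueError("all series must have the same length")
    return [[s[i] for s in series] for i in range(n)]

def recurrence_matrix(points, threshold):
    """RP_ij = 1 if the Euclidean distance between points i and j is <= threshold."""
    return [[1 if math.dist(p, q) <= threshold else 0 for q in points]
            for p in points]

# Two hypothetical observables of the same system (period: 20 samples).
y1 = [math.sin(2 * math.pi * k / 20) for k in range(40)]
y2 = [math.cos(2 * math.pi * k / 20) for k in range(40)]

w = multidim_points([y1, y2])
md_rp = recurrence_matrix(w, threshold=0.1)
```

Because every sample contributes one row, the multidimensional recurrence plot has as many rows and columns as there are samples (here 40), in contrast to the n − (D − 1)τ rows obtained from an embedded single series.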

MdRQA shares commonalities with Self-Similarity Matrices (SSM): Both methods rely on the computation of a distance matrix, where distances between sequences of positions of a multidimensional array are charted. However, while SSMs operate on the Euclidean distances in this matrix (e.g., Junejo et al., 2008), MdRQA proceeds by operating on the thresholded distance matrix (see RP illustration in **Figure 2**) in order to quantify the matrix in terms of the standard recurrence measures (Webber and Zbilut, 1994; Marwan et al., 2002).

<sup>1</sup>For a full description we would also need the velocity as part of the phase-space, but we will ignore this here.

Earlier attempts to use RQA on multidimensional signals were made by computing the Euclidean distance of multiple signals and analyzing the resulting distance vector, for example by Thomasson et al. (2002) (cited in Webber and Zbilut, 2005), who quantified scaling characteristics in EEG-activity as a global brain-dynamics analysis. Applying RQA directly on multidimensional signals has been done in prior studies on the analysis of joint action by Mitkidis et al. (2015) and Wallot et al. (2016) to quantify the joint dynamics of hand movement in a joint car-model building task, taking each of the four hand acceleration time-series of the collaborating builders as variables.

# COMPARISON TO RQA

The relation between RQA and MdRQA has already been described above. Nevertheless, we want to illustrate how RQA can be used to infer the multidimensional dynamics of a system from a single observable, and compare this to how MdRQA allows the quantification of those dynamics by taking into account multiple observables. As an example, we choose the Lorenz system (Lorenz, 1963), a dynamic system of three coupled differential equations:

$$\begin{aligned} \frac{dx}{dt} &= \sigma(y - x) \\ \frac{dy}{dt} &= x(\rho - z) - y \\ \frac{dz}{dt} &= xy - \beta z \end{aligned} \tag{6}$$

where the parameters σ, ρ, β are constants with positive values. In the following we have chosen the fixed values σ = 10, ρ = 28, and β = 8/3. We solve the equations numerically in the interval 0 ≤ t ≤ 20, giving us solutions for x(t), y(t), and z(t), shown in **Figures 3A–C**. The maximum time (t = 20) is a somewhat arbitrary choice, made simply to give enough data points for recurrence analysis. We resample the data from the numerical integration to ensure that all three time series x(t), y(t), z(t) are sampled uniformly with the same time values, using a sampling interval Δt = 0.0162. In order to get comparable phase spaces, we further normalize the sampled time series for x, y, and z by using z-scores. If we plot the (z-scored) points x(t), y(t), z(t) for all values of t, we get the well-known Lorenz attractor, shown in **Figure 3G**. This plot shows the dynamics of the system in phase space, where the time, t, is no longer plotted along one of the axes; instead, the data points are plotted at their positions in the 3D space in temporal order.
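The data-generation steps described above can be sketched as follows. This is an illustrative Python version using only the standard library; the fixed-step fourth-order Runge-Kutta integrator, the number of sub-steps, and the initial condition (1, 1, 1) are assumptions, since the integration scheme and starting values are not specified here. Sampling directly at multiples of Δt replaces the separate resampling step.

```python
import math

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0

def lorenz(state):
    """Right-hand side of the Lorenz equations (Equation 6)."""
    x, y, z = state
    return (SIGMA * (y - x), x * (RHO - z) - y, x * y - BETA * z)

def rk4_step(f, state, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + h / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def simulate(n_samples=1234, dt=0.0162, substeps=10, start=(1.0, 1.0, 1.0)):
    """Return n_samples states spaced dt apart (start value is an assumption)."""
    state, samples = start, []
    for _ in range(n_samples):
        samples.append(state)
        for _ in range(substeps):
            state = rk4_step(lorenz, state, dt / substeps)
    return samples

def zscore(values):
    """Normalize to zero mean and unit (sample) standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd for v in values]

trajectory = simulate()
x, y, z = (zscore(list(column)) for column in zip(*trajectory))
```

Because the Lorenz system is chaotic, trajectories obtained with a different integrator or initial condition will differ in detail from those in the figures, while tracing out the same attractor.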

Using time-delayed embedding, we can take each of the individual dimensions, x, y, and z, to reconstruct the three-dimensional dynamics of the system. The attractors, reconstructed with embedding dimension D = 3 and time delay τ = 4, are shown in **Figures 3D–F**. For the reconstructed attractors the points plotted are the row vectors **V**1, **V**2, and **V**3, that are created from the time-delayed values of x, y, and z, respectively.

The points that make up these reconstructed attractors using the time-delayed embedding can be used to produce recurrence plots as shown in **Figures 3H–J**, by applying RQA. Note that the axes on the RPs refer to vector index, rather than time, and correspond to the full time series shown in **Figures 3A–C** (there are 1234 samples, and 1234 · Δt ≈ 20). Analogously, the information in all three dimensions can be used to produce the RP shown in **Figure 3K**, by applying MdRQA.

[Figure 3, caption fragment: … (H), *y* (I), and *z* (J); as well as based on the original attractor (K) with a threshold *T* = 0.08.]

The figure illustrates that the time delayed embedding method relying on Takens' theorem does indeed produce reconstructed attractors (**Figures 3D–F**) that are isomorphic to the true attractor (**Figure 3G**), but it is also clear that the fidelity is not the same for all dimensions, e.g., the reconstruction based on z(t) does not properly reproduce the double-lobed structure of the original attractor. The RPs in **Figures 3H–J** that are based on a single variable x, y, or z clearly resemble each other, and also resemble the RP based on all three variables (**Figure 3K**). Many of the diagonal line structures are reproduced in all of the RPs, but with "noise" in the form of broken diagonal lines and points that are not part of diagonal lines seen in the RPs based on a single variable (**Figures 3H–J**) when compared to the RP based on all three variables (**Figure 3K**).

As mentioned above, the RP is not just a means to visually display the dynamics, but also allows them to be quantified. Webber and Zbilut (1994) defined the first four recurrence measures: recurrence rate (RR), determinism (DET), average diagonal line length (ADL), and longest diagonal line length (LDL). These four measures quantify different aspects of the dynamics, and their definitions are given in **Table 1**. Recurrence rate and determinism are commonly reported both as a fraction and in percent (% recurrence and % determinism).

Further measures have been developed and are currently in development (e.g., Marwan et al., 2002). However, for the purpose of describing MdRQA as a method we will only focus on those four. Values of the four recurrence measures for the recurrence plots shown in **Figures 3H–K** are shown in **Table 2**.
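To make the four measures concrete, the sketch below computes them from a binary recurrence matrix using commonly used definitions: RR as the proportion of recurrent points, DET as the proportion of recurrent points lying on diagonal lines, ADL as the mean length of those lines, and LDL as the length of the longest one. The minimum line length of 2 and the exclusion of the main diagonal are conventions assumed here for illustration; they may differ from the definitions in Table 1 and from the MATLAB implementation.

```python
def diagonal_line_lengths(rp):
    """Lengths of all diagonal runs of recurrent points, main diagonal excluded."""
    n = len(rp)
    lengths = []
    for offset in range(-(n - 1), n):
        if offset == 0:
            continue  # assumed convention: skip the trivial main diagonal
        run = 0
        for i in range(max(0, -offset), min(n, n - offset)):
            if rp[i][i + offset]:
                run += 1
            else:
                if run:
                    lengths.append(run)
                run = 0
        if run:
            lengths.append(run)
    return lengths

def rqa_measures(rp, min_line=2):
    """Return RR, DET, ADL, and LDL for a square binary recurrence matrix."""
    n = len(rp)
    recurrent = sum(sum(row) for row in rp) - sum(rp[i][i] for i in range(n))
    lines = [length for length in diagonal_line_lengths(rp) if length >= min_line]
    return {
        "RR": recurrent / (n * n - n),
        "DET": sum(lines) / recurrent if recurrent else 0.0,
        "ADL": sum(lines) / len(lines) if lines else 0.0,
        "LDL": max(lines) if lines else 0,
    }

# Toy recurrence matrix: two diagonal lines of length 3 plus two isolated points.
toy_rp = [
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [1, 0, 1, 1],
]
measures = rqa_measures(toy_rp)
```

For the toy matrix, 8 of the 12 off-diagonal cells are recurrent (RR = 2/3), 6 of those 8 lie on diagonal lines (DET = 0.75), and both lines have length 3 (ADL = 3, LDL = 3).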

The measures in **Table 2** are consistent with the qualitative interpretation of the recurrence plots, presented above, and we also get some information that is difficult to read off a plot, e.g., that the recurrence rate is almost exactly the same in all of the RPs (with the RP based on y being slightly denser). The main difference between the MdRQA measures and the RQA measures is that the diagonal line structures are consistently longer in MdRQA than in RQA. This is because, in this case, MdRQA captures the true dynamics of the system, since we have all the dimensions included, whereas RQA is based on an approximation using only one of these. Moreover, this allows for comparisons of how well the individual dimensions from which the phase-spaces were reconstructed approach the original: For example, comparing the RQA values in **Table 2**, it seems that the dimension x of the Lorenz system (**Figure 3A**) provides a better reconstruction than y and particularly z (**Figures 3B,C**, respectively).

TABLE 1 | Description of the four RQA measures RR, DET, ADL, and LDL.

TABLE 2 | Values of the RQA measures RR, DET, ADL, and LDL for the recurrence plots shown in Figures 3H–K (with embedding dimension D = 3, time delay τ = 4, and threshold T = 0.01 for RQA and T = 0.008 for MdRQA).

# COMPARISON TO CRQA

Cross-Recurrence Quantification Analysis (CRQA) was probably the first multivariate extension of RQA, allowing for the analysis of two variables and their cross-recurrences (Marwan and Kurths, 2002). Besides explicitly incorporating more than one variable for analysis, CRQA also enables capturing the relation between the two variables, as CRQA-measures are not derived from the distances within a single phase-space profile, but are based on the distances between two profiles in phase-space. This is made explicit by comparing the formula for the recurrence plot (RP) with the formula for the cross recurrence plot (CRP). The recurrence plot is a plot of all non-zero elements of the recurrence matrix RP<sub>ij</sub> (see Equation 4), just as the cross-recurrence plot is a plot of all non-zero elements of the cross-recurrence matrix CRP<sub>ij</sub>:

$$\text{CRP}_{ij} = \Theta(T - \left\| \mathbf{V}_i(\mathbf{x}) - \mathbf{V}_j(\mathbf{y}) \right\|) \tag{7}$$

Here, as in Equation (4), T is the threshold parameter that determines how close two points must be to each other to count as a recurrence. The formula for the RP (Equation 4) contains the distance, ‖**V**<sub>i</sub>(**x**) − **V**<sub>j</sub>(**x**)‖, between two points, **V**<sub>i</sub> and **V**<sub>j</sub>, in the reconstructed phase-space based on the points in the time series **x**, whereas the formula for the CRP contains the distance between a point **V**<sub>i</sub>(**x**) in the phase space reconstructed with points from **x** and a point **V**<sub>j</sub>(**y**) reconstructed with points from **y**.
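The difference between Equations (4) and (7) is small in code: the cross-recurrence matrix compares every embedded point of one series with every embedded point of the other. The following minimal Python sketch uses hypothetical function names and data, where y is simply x advanced by five samples.

```python
import math

def embed(x, dim, lag):
    """Time-delayed embedding: row i is (x_i, x_{i+lag}, ..., x_{i+(dim-1)*lag})."""
    n_vectors = len(x) - (dim - 1) * lag
    return [[x[i + d * lag] for d in range(dim)] for i in range(n_vectors)]

def cross_recurrence_matrix(points_x, points_y, threshold):
    """CRP_ij = 1 if point i of x and point j of y are within the threshold."""
    return [[1 if math.dist(p, q) <= threshold else 0 for q in points_y]
            for p in points_x]

# Hypothetical data: a sine wave (period 20 samples) and a copy that runs
# five samples ahead, so that y_k = x_{k+5}.
base = [math.sin(2 * math.pi * k / 20) for k in range(65)]
x, y = base[:60], base[5:]

crp = cross_recurrence_matrix(embed(x, 2, 5), embed(y, 2, 5), threshold=0.1)
```

Because y leads x by five samples, the recurrent points fall on the diagonal i = j + 5 rather than on the main diagonal, which is how a cross-recurrence plot reveals a lag between two coupled signals.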

As a model system to compare MdRQA with CRQA we choose a system of two coupled van der Pol oscillators, whose dynamics are governed by the coupled, second-order, differential equations:

$$\begin{aligned} \frac{d^2x}{dt^2} &= \mu(1-x^2)\frac{dx}{dt} - x + \epsilon_1(y-x) \\ \frac{d^2y}{dt^2} &= \mu(1-y^2)\frac{dy}{dt} - y + \epsilon_2(x-y) \end{aligned} \tag{8}$$

We fix µ = 100 and choose an asymmetric coupling between the variables, so that ε<sub>2</sub> = 5ε<sub>1</sub>, leaving ε<sub>1</sub> as the only free parameter in the system. A Cross-Recurrence Plot (CRP) and Multidimensional Recurrence Plot (MdRP) for the coupled van der Pol oscillators are shown in **Figure 4** for two different values of the coupling.
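For readers who wish to reproduce a system of this kind, the following sketch writes Equation (8) as a first-order system, reading the two oscillator positions as x and y and assuming the coupling terms have the diffusive form ε<sub>1</sub>(y − x) and ε<sub>2</sub>(x − y). The function names, the initial condition, and the value of ε<sub>1</sub> are illustrative assumptions, not the settings used to produce **Figure 4**.

```python
import numpy as np

def coupled_vdp_rhs(state, mu, eps1, eps2):
    """Right-hand side of the coupled van der Pol system as a first-order system.

    state = (x, vx, y, vy), where vx = dx/dt and vy = dy/dt.
    """
    x, vx, y, vy = state
    ax = mu * (1.0 - x**2) * vx - x + eps1 * (y - x)
    ay = mu * (1.0 - y**2) * vy - y + eps2 * (x - y)
    return np.array([vx, ax, vy, ay])

def rk4_step(state, dt, mu, eps1, eps2):
    """One classical fourth-order Runge-Kutta step (illustrative only)."""
    k1 = coupled_vdp_rhs(state, mu, eps1, eps2)
    k2 = coupled_vdp_rhs(state + 0.5 * dt * k1, mu, eps1, eps2)
    k3 = coupled_vdp_rhs(state + 0.5 * dt * k2, mu, eps1, eps2)
    k4 = coupled_vdp_rhs(state + dt * k3, mu, eps1, eps2)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

mu, eps1 = 100.0, 0.1                     # eps1 is an arbitrary illustrative value
eps2 = 5.0 * eps1                         # asymmetric coupling as stated in the text
state = np.array([1.0, 0.0, -1.0, 0.0])   # hypothetical initial condition
state = rk4_step(state, 1e-4, mu, eps1, eps2)
```

With µ as large as 100 the system is stiff, so a fixed-step explicit scheme like the one above needs a very small step size; an implicit or adaptive solver would likely be the more practical choice for long simulations.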

Comparing the time series at low coupling (**Figure 4A**) with the time series at high coupling (**Figure 4D**), it is evident that the two oscillators synchronize and become phase-locked for the high value of the coupling, whereas this happens on a longer time scale for low coupling. Here we are interested in whether CRQA and MdRQA capture this difference. There is a clear difference between the RPs produced by CRQA and MdRQA, both at low (**Figures 4B,C**) and high (**Figures 4E,F**) coupling. However, the RPs for CRQA at both low (**Figure 4B**) and high (**Figure 4E**) coupling look qualitatively similar, as do the RPs for MdRQA (**Figures 4C,F**). The RPs for MdRQA are indicative of a system that is initially non-periodic, but switches to periodic behavior. The RPs based on CRQA are somewhat insensitive to this, because the CRQA method is based on recurrence between two different phase-space trajectories (one built from x and one built from y), and these are both individually periodic, which masks the initial non-periodicity of the combined system.

To investigate the difference between CRQA and MdRQA in this example, we show in **Figure 5** how the recurrence measures obtained from the (cross-)recurrence plots vary as a function of coupling strength ε<sub>1</sub>. This figure demonstrates, quantitatively, that both methods are sensitive to changes in coupling. However, the MdRQA-based measures exhibit stronger and more convergent correlations with coupling strength, which is evident from the correlation coefficients in **Table 3**: The MdRQA measures have generally high correlations with ε<sub>1</sub>, compared to the lower (in one case even negative) correlations between ε<sub>1</sub> and the CRQA measures.

This example of two coupled van der Pol oscillators illustrates the utility of MdRQA in detecting the coupling between two systems. It is important to note that this does not generally imply a greater sensitivity of MdRQA relative to CRQA, as we have not systematically tested different systems and their coupling properties.

# COMPARISON TO JRQA

TABLE 3 | Pairwise Pearson correlation coefficients between the RQA measures shown in Figure 5 and the coupling constant ε<sub>1</sub>.

Another extension of the basic recurrence plot is the Joint Recurrence Plot (JRP), which also allows investigations of the relation between multiple variables (see Marwan et al., 2007, for an introduction to JRPs and comparisons between JRPs and CRPs). While CRPs capture the commonalities between two signals as the distance between their phase-space profiles (see section above), JRPs capture the commonalities between two signals as coinciding instances of recurrence between the individual RPs of those signals. So first, proper RPs are constructed for each signal, and then their JRP can simply be computed by joining the plots together, so that common instances of recurrence are kept, but instances of recurrence that differ between the two plots are discarded. In the formula for the JRP, this is achieved as a product of two Heaviside functions, which is 1 if both are 1 (recurrence in both variables) and 0 otherwise.

$$\text{JRP}\_{ij} = \Theta(T\_{\mathbf{x}} - \left\| \mathbf{V}\_i(\mathbf{x}) - \mathbf{V}\_j(\mathbf{x}) \right\|) \cdot \Theta(T\_{\mathbf{y}} - \left\| \mathbf{V}\_i(\mathbf{y}) - \mathbf{V}\_j(\mathbf{y}) \right\|) \tag{9}$$

Here, we allow for different thresholds T<sub>x</sub> and T<sub>y</sub> in the two phase spaces.

This plot can then be quantified just as a regular recurrence plot, yielding a Joint Recurrence Quantification Analysis (JRQA). Moreover, Marwan et al. (2007) also proposed a multivariate extension for JRQA, where the JRP is computed not just by joining two, but arbitrarily many individual RPs, based on a number (d) of observed variables y<sub>1</sub>, y<sub>2</sub>, ..., y<sub>d</sub>:

$$\text{JRP}\_{ij} = \prod\_{k=1}^{d} \Theta(T\_k - \left\| \mathbf{V}\_i(\mathbf{y}\_k) - \mathbf{V}\_j(\mathbf{y}\_k) \right\|) \tag{10}$$
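Equations (9) and (10) amount to an element-wise product of binary matrices, which the following sketch makes concrete. It is an illustration under stated assumptions: the function names are hypothetical, each entry of `signals` is taken to be an already reconstructed phase-space trajectory (or a non-embedded one-dimensional series), and the Euclidean norm is used.

```python
import numpy as np

def recurrence_matrix(points, threshold):
    """RP_ij = 1 where the distance between rows i and j of `points` is <= threshold."""
    points = np.asarray(points, dtype=float)
    if points.ndim == 1:
        points = points[:, None]          # treat a 1-D series as one column
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    return (dists <= threshold).astype(int)

def joint_recurrence_matrix(signals, thresholds):
    """Element-wise product of the individual RPs, one threshold per signal."""
    jrp = None
    for sig, thr in zip(signals, thresholds):
        rp = recurrence_matrix(sig, thr)
        jrp = rp if jrp is None else jrp * rp
    return jrp

# Toy example: x recurs at lags of two samples, y only between neighbors.
x = [0.0, 1.0, 0.0, 1.0]
y = [0.0, 0.0, 1.0, 1.0]
jrp = joint_recurrence_matrix([x, y], thresholds=[0.5, 0.5])
```

In this toy case no off-diagonal recurrence is shared between the two signals, so only the main diagonal survives in the JRP. More generally, a point can only be removed by the product, never added, so the recurrence rate of a JRP cannot exceed that of any of its constituent RPs.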

Hence, similar to MdRPs, JRPs also offer a way to quantify the simultaneous dynamics of more than two variables. The difference is that MdRPs are based on a single phase-space that incorporates all component signals, whereas JRPs are based on the RPs of the individual component signals, which are then joined together. In other words, MdRQA quantifies the commonalities based on the recurrence profile of a multi-component-signal phase-space, while multivariate JRPs quantify the commonalities based on the recurrence profiles of multiple individual component signals. Using the Lorenz system, we can illustrate the similarities and differences of how multivariate JRPs and MdRPs handle multivariate time series.

**Table 4** summarizes the quantitative differences between the multivariate JRP and the MdRP of the Lorenz system: In general, the values are of comparable magnitude, except for RR, which is a factor of 6 smaller for the JRP than for the MdRP. This is because the structure in the JRP is contingent on recurrence in all three constituent RPs simultaneously. Since joint recurrence will not be perfect across the plots, many of the recurrent instances in the constituent plots disappear in the JRP because recurrence is absent in at least one of the other RPs.

TABLE 4 | Values of the RQA measures RR, DET, ADL, and LDL for the multivariate JRP shown in Figure 6A and the MdRP shown in Figure 6/Figure 3K.

# EXAMPLE: ORIGAMI PRODUCTION TASK

As we have shown in the examples above, MdRQA can be used to quantify the dynamics of a multidimensional system at different levels of description by combining information from multiple variables, and it can be used to infer the shared dynamics of multiple time-series, similarly to CRQA or JRQA. In the following, we will apply MdRQA to empirical data to demonstrate how it can be used to systematically analyze group dynamics at different levels of aggregation: individuals, dyads, and at a global group level. In order to do so, we present a re-analysis of a sub-set of data from a study on teamwork investigating the role of team emotions for cooperation (Håkonsson et al., 2015; Mønster et al., 2016a).

In this study, teams of three participants were asked to build origami boats together over five consecutive sessions. The participants were told that the team that built the most boats would win an extra cash prize. Participants were fitted with heart rate, skin conductance, and facial electromyography monitors to investigate the role of dynamics of emotions during teamwork. Participants were then shown how to build the boats and subsequently built as many boats as they could during three 4-min sessions. After session three, participants were shown an alternative building technique and could choose to either adopt the new technique in sessions four and/or five, or stick with the original folding technique (see Mønster et al., 2016a, for further details on the study).

While the study by Håkonsson et al. (2015) looked at static effects of emotional measures, aggregating individual team members' physiological reactions to an average score, the study by Mønster et al. (2016a) re-examined the data using CRQA to look at shared emotional dynamics between pairs of team members. The individual physiological responses averaged at the group level showed only a marginal effect of emotions on outcomes in this team task (Håkonsson et al., 2015). However, shared emotional dynamics at the level of dyads, as measured by skin conductance and electromyography of the zygomaticus major ("smiling muscle"), were influenced by task conditions (Mønster et al., 2016a). Moreover, these dynamics were predictive of subjective self-reports of the team members, as well as of the decision whether to adopt a new work routine or not.

Comparing the results of these two studies demonstrates that the dynamics of physiological markers of arousal and emotions may contain information about interpersonal decisions and subjective states, and, importantly, that aggregate shared dyadic dynamics provides different information than aggregate individual scores. However, as discussed above, dyadic analysis only paints a partial picture of the global dynamics in groups bigger than two as it is effectively an aggregate of sub-groups at an intermediate level. In the following we demonstrate that MdRQA can be used to systematically investigate different levels of dynamics, starting from the individual to dyadic (triadic, etc.) relationships within a group, up to the highest level of global group-level-dynamics.

To illustrate this, we explore one of the observables from the origami study, the skin-conductance measure. Recall that participants were put together in groups of three with the goal of producing as many origami boats as possible during each session. However, neither the individual measures of the group processes (Håkonsson et al., 2015) nor the dyadic shared dynamics investigated using CRQA (Mønster et al., 2016a) showed any predictive relationship to the performance outcome in terms of the number of boats successfully built. Of course, it could simply be the case that the observables used in this study (skin conductance, heart rate, electromyography of facial muscles) were not related to this aspect of group performance. However, it could also be the case that the group dynamics were not quantified at the level at which emotion-related team dynamics were relevant for team performance.

We used MdRQA to differentiate between these explanations. To that end, we subjected the individual skin-conductance records of team members to MdRQA1 and averaged the resulting measures across the team to capture the effect of the average individual skin-conductance dynamics. We denote the number of measured observables taken as dimensions in MdRQA by an index number: Hence, MdRQA1 means that MdRQA was performed on a single, one-dimensional observable (equaling simple RQA), MdRQA2 means that MdRQA was performed on two one-dimensional observables, and MdRQAN means that MdRQA was performed on N one-dimensional observables. However, N does not necessarily equal the number of phase-space dimensions D, as time-delayed embedding is performed (see Section "A note on parameter estimation using MdRQA").

This allowed us to explore higher-level group-dynamics as well as the individual dynamics (i.e., MdRQA1). For the dyadic level, we subjected paired skin-conductance records within each team to MdRQA2 and averaged across the three resulting pairings per team to capture the effect of dyadic dynamics within the team. To capture the global effect of group level dynamics we subjected the three skin-conductance records simultaneously to MdRQA3.
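The three levels of aggregation differ only in how many team members' signals enter the analysis together, so they can be expressed as one loop over subsets. The sketch below illustrates this bookkeeping; `level_average` is a hypothetical helper, and `measure` is a placeholder for whatever function returns a single MdRQA outcome (for example RR or DET) from a multivariate signal.

```python
from itertools import combinations
import numpy as np

def level_average(signals, k, measure):
    """Average `measure` over all k-member subsets of a team's signals.

    signals : list of equal-length 1-D arrays, one per team member.
    measure : callable taking an (n_samples, k) array and returning a float;
              it stands in for one MdRQA outcome variable.
    """
    values = [
        measure(np.column_stack([signals[i] for i in subset]))
        for subset in combinations(range(len(signals)), k)
    ]
    return float(np.mean(values))

# Placeholder data and a placeholder measure, for illustration only.
rng = np.random.default_rng(0)
team = [rng.normal(size=100) for _ in range(3)]
toy_measure = lambda arr: float(np.std(arr))
individual = level_average(team, 1, toy_measure)   # three single signals
dyadic = level_average(team, 2, toy_measure)       # three pairings
group = level_average(team, 3, toy_measure)        # one triad
```

For a team of three, k = 1 and k = 2 each average over three analyses, while k = 3 yields a single analysis of all signals at once.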

We used the following embedding parameters to perform the analysis: Delay τ = 6, embedding dimension D = 6 (i.e., a 3-dimensional signal embedded once, 3 · 2 = 6), threshold T = 0.12, using a Euclidean norm. Note that normalization of the phase-space is important to compare different signals or samples with regard to their dynamics (see Shockley et al., 2003), and various norms can be used to achieve this (Webber and Zbilut, 2005). However, the most important thing about selecting a norm parameter is to keep it constant across all data sets.
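To make the role of these parameters concrete, the following sketch builds a multidimensional recurrence plot from three signals that are embedded once (D = 3 · 2 = 6). It is not the software used for the reported analysis. The function name `mdrp` is hypothetical, and two normalization choices are assumptions made for illustration: each signal is z-scored, and distances are rescaled by the maximum distance before the threshold is applied.

```python
import numpy as np

def mdrp(signals, delay, n_embed, threshold):
    """Multidimensional recurrence plot for an (n_samples, n_signals) array.

    n_embed is the number of copies of the multivariate signal in the
    phase-space: n_embed = 2 with three signals gives D = 3 * 2 = 6.
    """
    signals = np.asarray(signals, dtype=float)
    # Assumed normalization 1: z-score each component (requires non-constant signals).
    z = (signals - signals.mean(axis=0)) / signals.std(axis=0)
    n = z.shape[0] - (n_embed - 1) * delay
    if n <= 0:
        raise ValueError("signals too short for this delay/n_embed")
    # Stack the signals side by side with their time-delayed copies.
    space = np.hstack([z[k * delay : k * delay + n] for k in range(n_embed)])
    dists = np.linalg.norm(space[:, None, :] - space[None, :, :], axis=2)
    # Assumed normalization 2: express distances relative to the largest one.
    dists = dists / dists.max()
    return (dists <= threshold).astype(int)

# Random placeholder data standing in for three skin-conductance records.
rng = np.random.default_rng(0)
records = rng.normal(size=(300, 3))
rp = mdrp(records, delay=6, n_embed=2, threshold=0.12)
```

Whatever normalization is chosen, the point made above applies: it should be held constant across all data sets that are to be compared.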

Just as in the study by Mønster et al. (2016a), we computed the recurrence measures RR, DET, ADL, and LDL to capture the individual and shared skin-conductance dynamics (**Table 1** describes these measures). We used the four resulting MdRQA measures for average individual-level team dynamics (RQA/MdRQA1), average dyadic-level team dynamics (MdRQA2), and group-level dynamics (MdRQA3) as predictors in a simple regression analysis to predict the number of boats a team built, successfully and unsuccessfully, for each session individually. **Figure 7** presents the results of the regression analysis in terms of variance explained (R<sup>2</sup>) by each of the three group levels. In accordance with Håkonsson et al. (2015) and Mønster et al. (2016a), neither the individual-level nor the dyadic-level dynamics predicted the number of boats built well (R<sup>2</sup> hovers around 0.1). In contrast, the analysis at the global group level showed a much stronger relation to the performance outcome, particularly in the later trials (R<sup>2</sup> for MdRQA3 increases to above 0.2 in **Figure 7A**). A strikingly similar picture is seen for the unsuccessful building attempts (**Figure 7B**). This suggests the existence of genuine group-level physiological processes in team interaction that span the simultaneous interaction of all three group members and correlate with a key aspect of group performance, but are located neither within the individual group members nor in their dyadic interaction.
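The four measures can be computed from any binary recurrence matrix by collecting the lengths of its diagonal line structures. The sketch below uses common textbook definitions and several assumed conventions that may differ from those of a given RQA package: the main diagonal is excluded, only the upper triangle is scanned, RR is returned as a proportion rather than a percentage, and a diagonal line must contain at least `min_line` = 2 points. The function name `rqa_measures` is hypothetical.

```python
import numpy as np

def rqa_measures(rp, min_line=2):
    """RR, DET, ADL, and LDL from a square binary recurrence matrix (n >= 2)."""
    rp = np.asarray(rp)
    n = rp.shape[0]
    lengths = []                          # lengths of all diagonal runs of ones
    for k in range(1, n):                 # k-th diagonal above the main diagonal
        run = 0
        for value in np.diagonal(rp, offset=k):
            if value:
                run += 1
            else:
                if run:
                    lengths.append(run)
                run = 0
        if run:
            lengths.append(run)
    lengths = np.array(lengths, dtype=int)
    n_recurrent = int(lengths.sum())      # recurrent points in the upper triangle
    lines = lengths[lengths >= min_line]  # runs long enough to count as lines
    rr = n_recurrent / (n * (n - 1) / 2)
    det = float(lines.sum() / n_recurrent) if n_recurrent else 0.0
    adl = float(lines.mean()) if lines.size else 0.0
    ldl = int(lines.max()) if lines.size else 0
    return {"RR": rr, "DET": det, "ADL": adl, "LDL": ldl}

# A perfectly period-2 pattern: points recur whenever their indices differ by an even number.
idx = np.arange(6)
rp = ((idx[:, None] - idx[None, :]) % 2 == 0).astype(int)
print(rqa_measures(rp))
```

For this small periodic example, every recurrent point lies on a diagonal line, so DET equals 1, which is the signature of fully deterministic, periodic dynamics.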

The current example illustrates how MdRQA can specifically be used in research of social interaction to systematically investigate (shared) dynamics at different group-levels. We identify a correlation between a global level physiological proxy for group arousal dynamics and an independent outcome measure of the team performance that could neither be seen at the level of individuals (Håkonsson et al., 2015) nor of dyads (Mønster et al., 2016a). This demonstrates the potential of MdRQA to explore different levels of aggregation within one analytical framework. Our finding could be interpreted as evidence for the presence of an interpersonal synergy (Riley et al., 2011) at the group-level, that is, interaction of all three team members is crucial for successful task performance, and this performance (or at least the emotional-arousal aspect of it) is not attributable solely to the individual group members, but emerges in their interaction.

It is likely that this type of dynamics depends on the specifics of the group interaction. In the present experiment, all group members were simultaneously present in the same room, working on the origami figures. However, there could be other group settings where only certain participants can interact with each other, or where they can only interact in certain ways that constrain their behavior (Wallot et al., 2016). We hypothesize that in this case dyadic interaction would be more relevant for group performance, and hence we would see the strongest correlation with MdRQA2. In the same vein, we hypothesize that for performance in automated assembly lines, where "social interaction" is fully, or primarily, determined by electronic control systems that are the pace-maker of the interaction, the dynamics would be most informative at the individual level. We suggest that MdRQA provides a coherent analysis framework to test such hypotheses.

FIGURE 6 | Multivariate JRP obtained by joining the individual RPs from Figures 3H–J (A). MdRP from Figure 3K (B). The plots convey a similar qualitative picture of the dynamics of the Lorenz system, with the main difference that the JRP has fewer points and fewer diagonal structures than the MdRP.

# A NOTE ON PARAMETER ESTIMATION USING MdRQA

Of course, a system with two (or more) measured variables could boast yet-higher dimensional dynamics than the two (or more) measured variables at hand. Then, it would be necessary to infer the appropriate dimensionality and reconstruct the phase-space by the method of time-delayed embedding (Takens, 1981). Here, one can start by assessing the delay and embedding parameters from the individual component signals that are eventually fed to MdRQA. For example, before running MdRQA on three signals (MdRQA3), one can test each signal's embedding parameters, and if dimensionality of the individual signals, as determined by a false-nearest-neighbor algorithm (Kennel et al., 1992) is, say, six, then the time-series consisting of three component signals could be embedded once to yield this dimensionality (i.e., 3 · 2 = 6). However, as these methods are just estimators for embedding parameters, one could also try to infer the delay and embedding parameters directly from the multidimensional signals (Clark et al., 2014).

Whether or not (or how) to embed cannot be answered conclusively by such estimation procedures, however. Embedding might not always be necessary. As March et al. (2005) showed, an unembedded recurrence plot (the "parent plot," p. 194) can, under given circumstances, contain all the information that embedded versions of this plot provide, and Iwanski and Bradley (1998) showed that recurrence variables for a variety of deterministic systems are invariant or at least highly similar over a range of embedding parameters, including the non-embedded versions. However, in our own practical experience analyzing behavioral and physiological data, the choice of an "adequate" embedding of the data does sometimes make a substantial difference for the results, and effects of embedding on the results should at least be investigated.

Another issue is the question of comparing MdRPs of different dimensionality. If one is interested in comparing the magnitude of the different RQA variables across a range of pairings of the component signals, using the analysis strategy we have described above [i.e., comparing, for example, DET for the individual signal (MdRQA1) vs. pairs of signals (MdRQA2) vs. the group level (MdRQA3)], then one has to correct for the "baseline" effect of dimensionality on distances in phase-space and, subsequently, on all of the RQA outcome variables. **Figure 8** illustrates this: **Figures 8A,B** show how the average distance in phase-space increases with the square root of the number of dimensions added (each new dimension was a z-scored vector of random numbers drawn from a uniform distribution on [0, 1]). This increase is similar to the increase in average phase-space distance when a single random variable is embedded in increasingly higher dimensions (see **Figures 8C,D**).

In particular, for random variables with equal variance, the average phase-space distance increases with dimensionality as L<sub>D</sub><sup>2</sup> = 2D, giving the scaling relation:

$$L\_D = \sqrt{L\_{D+n}^2 - 2n} \tag{11}$$

where L<sub>D</sub> is the average distance in phase-space given some dimensionality D of that space, and L<sub>D+n</sub> is the average distance in a phase-space with n additional dimensions.
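The scaling relation can be checked numerically with a few lines of code. The sketch below makes one interpretive assumption that should be stated plainly: L<sub>D</sub> is read as the root-mean-square distance between points, for which the relation L<sub>D</sub><sup>2</sup> = 2D holds in expectation for independent z-scored coordinates; the plain arithmetic mean of the distances is somewhat smaller. The function name and the sample sizes are illustrative.

```python
import numpy as np

def mean_squared_distance(n_dims, n_points=500, seed=0):
    """Mean squared Euclidean distance between points whose coordinates are
    independent, z-scored uniform random variables."""
    rng = np.random.default_rng(seed)
    data = rng.uniform(0.0, 1.0, size=(n_points, n_dims))
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    diffs = z[:, None, :] - z[None, :, :]
    sq = (diffs ** 2).sum(axis=2)
    # Average over distinct pairs only (the zero self-distances are excluded).
    return sq.sum() / (n_points * (n_points - 1))

for d in (1, 2, 3, 6):
    print(d, mean_squared_distance(d))        # close to 2 * d in each case

# Equation (11): recover the 3-dimensional distance from the 6-dimensional one.
l3 = np.sqrt(mean_squared_distance(3))
l3_from_6 = np.sqrt(mean_squared_distance(6) - 2 * 3)
```

Each z-scored coordinate contributes, on average, 2 to the squared distance between two points, which is why adding n dimensions adds 2n and why subtracting 2n undoes it.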

This can be taken as a baseline-correction factor to adjust the phase-space when one wants to compare RQA measures of, for example, a one-dimensional, non-embedded signal (RQA/MdRQA1) to three one-dimensional signals that are embedded together (i.e., MdRQA3). Alternatively, one could keep percent recurrence constant across RQAs obtained from phase-spaces with different dimensionality, and analyze other RQA measures, such as DET, ADL, or LDL. If, however, the one-dimensional signal is embedded in three dimensions using time-delayed surrogates, then such corrections are not necessary to compare RQA measures. This needs to be kept in mind if one wants to compare phase-spaces of different dimensionality using RQA/MdRQA, no matter whether the different dimensions are time-delayed surrogates or actual different observables.
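The alternative strategy mentioned above, holding the recurrence rate constant, can be sketched by choosing the threshold as a quantile of the pairwise distances. This is one possible way to do it, not a procedure prescribed here; the function name `threshold_for_target_rr` is hypothetical, and the Euclidean norm is assumed.

```python
import numpy as np

def threshold_for_target_rr(points, target_rr):
    """Threshold T such that roughly `target_rr` of all distinct pairs recur.

    points    : (n_samples, n_dims) phase-space trajectory, or a 1-D series.
    target_rr : desired recurrence rate as a proportion, e.g. 0.05 for 5%.
    """
    points = np.asarray(points, dtype=float)
    if points.ndim == 1:
        points = points[:, None]
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    upper = dists[np.triu_indices(len(points), k=1)]   # distinct pairs only
    return float(np.quantile(upper, target_rr))

# Placeholder data: the same record analyzed alone and together with two others.
rng = np.random.default_rng(0)
triad = rng.normal(size=(300, 3))
t_single = threshold_for_target_rr(triad[:, 0], 0.05)
t_triad = threshold_for_target_rr(triad, 0.05)
```

Because distances grow with dimensionality, the threshold for the three-dimensional space comes out larger than for the single signal, while RR is the same in both plots by construction. The remaining measures (DET, ADL, LDL) can then be compared across the two.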

# INTERPRETATION OF MdRQA, LIMITATIONS, AND POTENTIAL FUTURE DEVELOPMENTS

As already mentioned in the last section, which illustrated the application of MdRQA to skin-conductance measures during teamwork, as well as in the sections relating MdRQA to RQA, CRQA, and JRQA, there are two different but related interpretations of MdRQA measures. On the one hand, we can interpret the outcome variables as capturing the dynamics of a (single) multidimensional system, as in the case of the Lorenz attractor, or as capturing a synergistic relationship between different systems, as in the case of our skin-conductance example. Such interpretations might be more theoretically interesting, but could also put further demands on the data collected or the explanations sought (i.e., is there a well-defined attractor manifold describing the dynamics of the variables? Can the coupling relationships between the variables be described in greater detail?). On the other hand, one can also simply view MdRQA as a tool to capture the simultaneous correlation of multiple variables over time (a form of dynamic multivariate correlation technique) that solves the problem of assessing multivariate correlation strength. In the former case, one would ideally investigate whether additional embedding is necessary (see the considerations in the section "A note on parameter estimation using MdRQA"). In the latter case, one might consider simply using MdRQA on the non-embedded, one-dimensional component signals.

FIGURE 8 | Scaling of average phase-space distance with phase-space dimensionality (each dimension is a z-scored random variable taken from a uniform distribution [0, 1]). (A) Shows the increase of average distance as a function of separately added dimensions, and (B) shows that the increase in average distance follows the square root of the dimensionality of the phase-space. (C) Shows the increase of average distance as a function of the number of embeddings via time-delayed surrogates of a single random variable, and (D) shows that the increase in average distance follows the square root of phase-space dimensionality as well. Distances in both cases scale similarly, with *L*<sub>*D*</sub> = (*L*<sup>2</sup><sub>*D*+*n*</sub> − 2*n*)<sup>1/2</sup>.

Besides its main advantage, the ability to capture the dynamics of multiple signals at once, MdRQA also has disadvantages relative to other nonlinear coupling analyses, such as CRQA: At least with the method in its present form, it is not possible to calculate time-lagged coupling between signals to investigate leader-follower relationships among the component variables as with CRQA (Coco and Dale, 2014). It is also not possible to test the specific influence that one component signal has on another over time as with convergent cross-mapping (Mønster et al., 2016b). Solutions to this problem could be comparisons of different MdRPs with and without the specific signal of interest, such as in Joint Recurrence Analysis (Romano et al., 2004), or investigating the effects of time-shifting individual signals systematically and comparing the resulting MdRPs (as has been suggested by Marwan et al. (2007) for JRPs with two variables). Future developments in this direction would be desirable for a more accurate and detailed analysis of group-level performances beyond the dyad, and recurrence-based techniques seem very well suited to tackle such challenges.

# AUTHOR CONTRIBUTIONS

SW invented the method. SW and DM created the software. SW, DM, and AR were involved in collecting/creating the example data and wrote the article.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2016.01835/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Wallot, Roepstorff and Mønster. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# How do Co-agents Actively Regulate their Collective Behavior States?

Jérôme Bourbousson\* and Marina Fortes-Bourbousson

"Movement, Interactions, Performance" Laboratory (EA4334), University of Nantes, Nantes, France

Keywords: interpersonal co-ordination, local couplings, co-regulation, social systems, joint action, team coordination, complex adaptive system

# INTRODUCTION

The collective capability to produce patterned collective behaviors is an important field of research in work psychology (e.g., shared cognition approach, Fiore and Salas, 2004; interactive team cognition approach, Cooke et al., 2013), neuroscience (e.g., social neuromarkers, Tognoli et al., in press; neurological mirroring, Waldman et al., 2015), sociology (Miller, 2013), and human movement science (e.g., joint movement, Schmidt and Richardson, 2008; team behavior, Araujo and Bourbousson, 2016). Within this stream of research, one neglected topic has been to conceptualize how interactors regulate online their dynamic involvement in collective activity, that is, the individual skillful activity of adjusting online to the needs of the collective behavior. Grounded in the hypothesis that collective behavior emerges from a self-organized complex system, the present opinion discusses the nature of the active regulation of the interactions performed by the co-agents. A deeper grasp of this regulation process is needed to understand how and why interpersonal co-ordination forms, stabilizes, and/or is destroyed, leading to the emergence of higher-order phenomena at the team scale that are not fully predictable from the individual activities that compose the social system under study.

#### Edited by:

Richard C. Schmidt, College of the Holy Cross, USA

#### Reviewed by:

Julien Laroche, Akoustic Arts, France; Jamie Gorman, Georgia Institute of Technology, USA

\*Correspondence: Jérôme Bourbousson Jerome.bourbousson@univ-nantes.fr

#### Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

Received: 16 July 2016 Accepted: 20 October 2016 Published: 04 November 2016

#### Citation:

Bourbousson J and Fortes-Bourbousson M (2016) How do Co-agents Actively Regulate their Collective Behavior States? Front. Psychol. 7:1732. doi: 10.3389/fpsyg.2016.01732

Collective behavior is deemed here to constitute the property of a social system composed of living entities. In research that has considered collective behavior as emerging from a self-organized complex system, an important focus has been on the between-agents' interactions, supported by an informational flow that binds agents (e.g., Schmidt et al., 1990). In this stream of research, information is defined as an ambient energy that disturbs the agent, depending on his current activity (Varela et al., 1991). From the (interpersonal) informational flow, individual activities can be entrained, mutually affected by others' movements, so that the emerging collective behavior cannot be conceived out of either the nature or the content (i.e., being non-representational) of such a flow (e.g., Kelso, 1994; Lagarde and Kelso, 2006; Richardson et al., 2007). However, while between-agents informational flow has been considered the main binding mechanism that makes collective behavior emerge, we aim to point out that the way individuals manage their interaction in real time has mainly been theoretically presupposed rather than empirically investigated. We will use empirical and logical evidence to highlight shortcomings in current theorizations of the way individual movements merge into a collective unit. In our opinion, current research should restrict the importance of the co-regulation and the local couplings hypotheses. Both hypotheses appear unsatisfactory to us, and might be refined through further consideration of the effects of the social system's size as a main topic.

# HYPOTHESIS 1: A COLLECTIVE BEHAVIOR EMERGES FROM INDIVIDUAL ACTIVITIES BEING LOCALLY COUPLED

According to the unifying principle of non-linear dynamical systems (see Jirsa and Kelso, 2013, for further detail on the co-ordination dynamics approach), the collective behavior of a complex system emerges as the result of self-organization among the interacting individual parts that comprise the system, such as humans in a social system (see Schmidt and Richardson, 2008, for details on interpersonal co-ordination research). Thus, considering social systems as the place where collective behaviors emerge leads to the assumption that global collective patterns observable at the social system level of organization come from indivisible interpersonal dynamical couplings at a lower level of organization, also called local couplings. In this light, the rhythm of the collective behavior is supposed to change intermittently between periods of stable and unstable behaviors, depending on the capability of interacting parts to maintain or change their local coupling with respect to the evolving environment in which the social system is embedded (Glassman, 1973).

In such a conceptualization, interplays between the high (i.e., global) and low (i.e., local) levels of organization have been of particular interest (Rio and Warren, 2016). The emergence principle accounts for the process by which local couplings give rise to a higher-order identifiable pattern, the so-called collective behavior. The global pattern that emerges thus cannot be reduced to the sum of its individual components, and cannot be predicted from the properties of these components alone. Conversely, the downward causation principle accounts for the process by which the global patterned behavior constrains the way in which individual agents behave and interact at the local level of organization, without these agents necessarily being aware of such a descending causality. In line with the principle of parsimony of scientific explanations, and largely inspired by swarm intelligence theorizations, this local couplings hypothesis has been very successful in explaining how complex social-system behaviors can emerge from simple local rules of interaction.

# HYPOTHESIS 2: EMERGENT COLLECTIVE BEHAVIORS ARE SUPPORTED BY A PROCESS OF "CO-REGULATION" AT THE LEVEL OF THE LOCAL COUPLINGS

At a local level of organization, what allows a social system to exhibit the signatures of complex systems and thus lets a dynamical collective behavior emerge? One important contribution that synthesized theoretical answers to this question came from the enactivist theory of interpersonal couplings (e.g., De Jaegher and Di Paolo, 2007). As the starting point, a collective behavior is captured through the identification of non-accidental patterns of individual behaviors, as observed at the global scale. These patterns can be captured by various tools, such as those well-developed for spatiotemporal pattern identification (Gudmundsson and Horton, 2016). However, an identifiable patterned behavioral co-ordination is not enough to consider that a collective behavior has emerged from the interaction of its constituent individual parts; it is also required that the given interactors actively regulate the interpersonal co-ordination dynamics at the level of their local couplings. In other words, an informational flow must have occurred between them, and this flow must be dynamically managed. In a more fundamental way, De Jaegher and Di Paolo (2007) stated that complex phenomena of emergence are facilitated when both interactors simultaneously regulate their ongoing interpersonal co-ordination (i.e., a bi-directional flow of interplay), so that the collective behavior achieved escapes the individual perspective of any of the interactors involved. In this specific case, the collective behavior can express all the marks of complexity and meta-stability needed to consider the social system as exhibiting self-sustained dynamical behaviors. The need for such mutuality in the interaction falls under the theme of a co-regulation requirement, also discussed as a mutual awareness requirement in other research traditions (Fiore and Salas, 2004).

Some studies have revealed the crucial function of this co-regulation requirement in interpersonal interactions, especially those that used the perceptual crossing paradigm (Auvray et al., 2009). This device puts two actors in situations where they have to move an avatar in a virtual environment populated by different entities (avatars of humans and various lures), visually empty but providing tactile stimulation at each encounter through the mouse used by the participants. Interestingly, what helps participants to succeed in finding each other, and subsequently to experience social connectedness, is the co-regulation process that they both perceive simultaneously at certain moments (Froese et al., 2014a,b), regardless of the extent to which each actor was satisfied by the unfolding interaction, since they were not informed of their current effectiveness in the task. In agreement with the co-regulation requirement for interpersonal co-ordination emergence, most of the studies testing this regulation process have been experimental and have focused on the co-ordination within dyads, providing repeated evidence of the interpersonal benefits related to co-regulation processes (Schmidt and Richardson, 2008).

# PERPLEXING EMPIRICAL EVIDENCE 1: SOCIAL SYSTEMS DO NOT NEED CO-REGULATION TO PERFORM

While the hypothesis of a co-regulation requirement has been pervasive in interpersonal co-ordination research, some empirical studies have found it hard to observe in naturalistic empirical data, especially in goal-directed collective behaviors. For instance, Bourbousson and colleagues investigated how agents heeded their co-agents in a study of basketball teams performing in their natural social competitive context (Bourbousson et al., 2015). The authors examined mutual adjustments at the level of the activity that was meaningful for the interactors, and compared novice and expert teams. Teams were considered dynamic social networks, with team members as nodes and members' awareness of other members during ongoing performance as relations. Networks, and changes to them across games, were analyzed at different levels of organization, using social network analysis to identify patterns of co-regulation within the teams. Notably, the results showed that the reciprocity index, accounting for the instantaneous co-regulation occurring within all the considered dyads within the teams, was significantly lower than expected by chance in expert team co-ordination, but that this was not the case in novice team co-ordination. Moreover, the observed low co-regulation was very stable over time, so that the proposed intra-team patterns of regulation had all the marks of expertise. Other studies have reported similar observations in various fields of team co-ordination, as in civilian command, control, and communication settings (Wellens and Ergener, 1988), socio-technical collaborative systems (Salmon et al., 2008), or various settings of cognitive engineering research (Cooke et al., 2009). Together, these studies suggested an enhanced capability of expert social systems to achieve and maintain an optimal level of awareness during the unfolding activity, with this level of awareness being lower than in novice social systems.
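To make the notion of a reciprocity index concrete, the following minimal sketch treats a team as a directed awareness network and computes the share of awareness links that are returned. It is an illustration only: the exact index and chance model used by Bourbousson et al. (2015) are not specified here, so the definition (reciprocated links divided by all links), the chance baseline (links placed uniformly at random), and the example networks are all assumptions.

```python
def reciprocity(edges):
    """Share of directed awareness links (a -> b) that are returned (b -> a).

    Assumed definition: reciprocated links / all links, ignoring self-links.
    """
    edge_set = {(a, b) for a, b in edges if a != b}
    if not edge_set:
        return 0.0
    mutual = sum(1 for a, b in edge_set if (b, a) in edge_set)
    return mutual / len(edge_set)


def chance_reciprocity(n_links, n_members):
    """Expected reciprocity if n_links were placed uniformly at random
    among the n_members * (n_members - 1) possible directed links."""
    possible = n_members * (n_members - 1)
    if n_links < 1 or possible < 2:
        return 0.0
    return (n_links - 1) / (possible - 1)


# Hypothetical 5-player networks: (a, b) means "a pays attention to b".
novice_links = [(1, 2), (2, 1), (1, 3), (3, 1), (4, 5), (5, 4), (2, 3)]
expert_links = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1), (1, 3), (2, 4)]

for label, links in (("novice", novice_links), ("expert", expert_links)):
    observed = reciprocity(links)
    expected = chance_reciprocity(len(set(links)), n_members=5)
    print(f"{label}: observed={observed:.2f}, chance={expected:.2f}")
```

In this made-up example both teams use the same number of awareness links, but the hypothetical expert team returns none of them, which places its reciprocity below the chance baseline, the qualitative pattern described above.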

In this light, it appeared reasonable to the authors to consider that interactors' activities of regulation directed toward co-agents become parsimonious through practice and expertise enhancement, possibly enabled by a gradual establishment of implicit co-ordination processes (Bourbousson et al., 2015). Implicit co-ordination processes mean that interactors co-ordinate by drawing on accurate expectations of future intra-team events. These expectations are developed and shared by interactors through extensive shared practice prior to their current activity (Eccles, 2010; Gorman, 2014). It appears that whatever the nature of the process involved, expert interactors probably do not need to pay as much attention to their co-agents during ongoing task performance, as a result of their shared experiences. The co-regulation hypothesis is thus quite unsatisfactory, at least as a strong interaction requirement in goal-directed social systems that are composed of many inter-related dyads, and in which the shared experience of interactors allows them to adopt a parsimonious but effective structure of regulation of the intra-team co-ordination.

# PERPLEXING EMPIRICAL EVIDENCE 2: HUMAN AGENTS CAN GRASP THE GLOBAL PICTURE THEY HELP TO MAKE EMERGE

As introduced above, a main inspiration for understanding collective behavior has come from swarm intelligence, as observed in social insects (Theraulaz, 2014). Collective behaviors of social insects are powerful forms of collective intelligence because local couplings have been shown to be sufficient to give rise to highly patterned and adaptive collective behaviors. Most of the time, agents do not even need to be strictly coupled together, as long as each of them maintains its coupling to the shared environment. Most complex-systems-inspired frameworks of interpersonal co-ordination have thus subsequently considered that local couplings were enough to conceptualize collective behaviors, that these local couplings constituted a parsimonious way of structuring informational flows within the social system, and that such a process was a perfect example of the emergence phenomenon. However, unlike the research on social insects, that on interpersonal co-ordination has neglected to consider that human co-agents are capable of grasping the global picture they help to make emerge, especially in cases in which collective behavior is goal-directed and actively regulated by co-agents. In this way, the collective behavior in which individuals are involved may directly support their adaptive activity and thus be considered as a non-negligible informational constraint that supports humans' goal-directed behavior. This capability has been called holoptism, that is, the ability of any interacting co-agent to perceive the dynamics of the whole interactive system (Noubel, 2004; Bauwens, 2005).

For instance, sport coaches are well aware of such a capability for holoptism in humans: When players are called to perceive the rhythm of the game, free spaces, or team fluidity of movements, the given agents couple to high-order spatiotemporal information that probably helps them to better couple locally<sup>1</sup>, but this information does not reside per se at the local coupling level itself (see Bourbousson et al., 2014 for an empirical study). Outside the sports domain, similar observations have also been discussed in the field of designing collaborative digital tools. For instance, Bauwens (2005) suggested looking with caution at swarm intelligence systems, and proposed that the peer-to-peer process might be re-considered in light of the quality of holoptism that is offered to user experience through digital collaborative practice.

While the local couplings hypothesis is very useful in theories of swarming behavior, our opinion is that current theories of interpersonal co-ordination in humans run the risk of not being cautious enough when introducing the local couplings hypothesis as a starting point of the research (e.g., Silva et al., 2014). One can note that most experimental study designs have invited participants to adjust to a single co-agent, but this individual dyad level of investigation does not clearly distinguish local and global scales of the collective behavior (e.g., Schmidt and Richardson, 2008): When participants are asked to co-ordinate their arms in a dyad, by locally coupling with the movement of the co-agent, they also directly regulate the global co-ordination dynamics to which both are contributing, so that local and global perceptual capabilities coincide in the task goal. Thus, our opinion is that one approach to further investigate what holoptism may bring to interpersonal co-ordination theories might be to increase the number of co-agents involved in the collective behavior under study, to better allow for the distinction between the levels of organization that shape the social system's dynamics. For instance, such an increase in the number of participants involved in the study design would make it possible to discuss the human capability of switching attention from local couplings to the global interpersonal pattern.

<sup>1</sup>The question remains open whether holoptism only applies to goal-directed collective behavior, or may also be involved in spontaneous motor entrainment (i.e., the emergence of unintentional interpersonal co-ordination patterns).

# BREAKING THE DEADLOCK: CONSIDERING THAT THE NUMBER OF CO-AGENTS MATTERS IN THEORIZING SOCIAL SYSTEMS FUNCTIONING

Where does the problem probably lie? First, we have to remember that very few studies have investigated how people are actively involved in regulating their interpersonal co-ordination states in real time. When this active regulation was discussed in the research, it was often considered a theoretical assumption related to the nature of the informational flow binding actors, rather than being empirically investigated and described. From this starting point, we have challenged two theoretical hypotheses, the co-regulation and the local couplings hypotheses, respectively. Our opinion is that both have been overlooked, probably due to a common property of the existing study designs: The number of co-agents involved in the experimental paradigms was quite small (i.e., two interacting agents; Alderisio et al., 2016). Studying dyads may have limited our fundamental understanding of how collective behaviors emerge from interacting individual activities. Empirical and theoretical benefits should thus come from studying operating social systems larger than a dyad, especially by revising the co-regulation and the local couplings hypotheses.

What changes when the number of co-agents involved in the study design is considered as a variable? In the literature, few studies show how the number of agents involved in a given collective behavior really matters and can change the processes needed to make a collective behavior effective and adaptive. For instance, the effect of the number of co-agents has been studied abundantly in the science of social insects, where it is known as the effect of colony size on the adjustment processes. To illustrate, Perna et al. (2012) investigated termite colonies and identified two main adjustment processes that may explain the emergence of collective behaviors. The first process is a purely local mechanism that accounts for an arrangement of agents' behaviors based on only local information. The second process is a local estimation of global properties, and accounts for agents being sensitive to the efficiency of the current collective behavior (i.e., through rudimentary sensory sensitivity) and improving on it based on information about some global parameters of the existing social system (Perna et al., 2012). Interestingly, the given insects were shown to be probably capable of switching from the first to the second process when the social system exceeded a threshold in terms of colony size, the first process being less resilient to environmental changes or unpredictable events.

Obviously, the number of co-agents has not been discussed enough in human behavior science, but a few examples may be found in numerical science, especially in human crowd modeling, that explain how human collective systems can exhibit adjustment mechanisms that change with, and are very dependent on, the number of co-agents (Mehran et al., 2009). Some examples can also be found in the study of financial market fluctuations, where interactions between agents are considered a variable (e.g., Lux and Marchesi, 1999), but these interactions are not expressed as a linear function of the number of investors but rather as subject to a threshold effect that makes social contagion more or less pronounced (e.g., Orléan, 1990). The specificity of human collective behaviors often lies in interpersonal co-ordination being itself the goal to achieve, implying that co-agents interact to actively create/maintain/disrupt global interpersonal states of behavior, and, in some instances, these states are probably managed through the capability for holoptism. Empirical studies that investigate effects of the social system's size on the collective behavior of humans who are actively regulating their online states of co-ordination will contribute to an open avenue of research on the topic of interpersonal co-ordination dynamics. Unanswered questions would thus need to be addressed, such as how many members involved in the social system might require or prevent occurrences of holoptism or one-sided co-ordination processes.

# PERSPECTIVES

How can informational flows be patterned in goal-directed social systems larger than dyads? For instance, in the case of co-agents reciprocally co-regulating their activities in a 5-member social system, each interactor must regulate four co-ordination links at once (10 links across the whole system), which makes the attentional requirement of the task very hard to manage, and even harder in a 10-member social system, in which each interactor must regulate nine links and 45 co-ordination links have to be simultaneously co-regulated overall, and so on. To counter-balance the co-regulation hypothesis, it is probable that co-regulation can occur only between certain co-agents, and that the overall social system functions through few co-ordination links (i.e., low density within the network of informational flows). It is also likely that the coupling linkages do not necessarily need to be reciprocal between the co-agents, so that one-sided co-ordination should provide benefits to the global efficiency and parsimony of the system. It is even more likely that co-agents can face the difficulty of regulating each local coupling by grasping the overall picture at some point in their activity (i.e., global matching capabilities), thus counter-balancing the local couplings hypothesis. Related questions should then be addressed: does structural congruence between members, as achieved through recurrent interactions in team training (Maturana and Varela, 1987), help them to pay less (reciprocal) attention to the regulation of their couplings? Do some properties of interpersonal networks allow for a lessened need of agents' co-regulation, due to a somewhat 'less effort for more effects' phenomenon, such as might be hypothesized in wide networks? To what extent does the capability for holoptism help to better explain the emergence of non-goal-directed (i.e., unintentional) patterns of collective behavior?
These proposals need to be challenged through empirical data analysis in future research, which should allow better theorization of how co-agents couple through skillful dynamic individual adjustments.
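The growth in co-ordination links with team size can be checked with a short calculation. In a fully co-regulated system of n members, each member handles n - 1 links and the system as a whole contains n(n - 1)/2 distinct pairwise links. The sketch below is only an illustration; the function names and the example of a sparsely linked team are hypothetical.

```python
def links_per_agent(n_members):
    """Links each member must regulate if coupled to every other member."""
    return n_members - 1


def total_links(n_members):
    """Distinct pairwise links in a fully connected system: n(n - 1) / 2."""
    return n_members * (n_members - 1) // 2


def density(active_links, n_members):
    """Share of the possible pairwise links that are actually in use."""
    return active_links / total_links(n_members)


for n in (2, 5, 10):
    print(n, links_per_agent(n), total_links(n))
# prints:
# 2 1 1
# 5 4 10
# 10 9 45

# Hypothetical sparse team: 10 members co-ordinating through only 12 links.
print(round(density(12, 10), 2))  # prints 0.27
```

Because the total grows quadratically while each member's attention does not, a low-density network of informational flows is one plausible way for a large system to stay manageable.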

# AUTHOR CONTRIBUTIONS

JB originated the questioning. JB, MF co-wrote the manuscript.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Bourbousson and Fortes-Bourbousson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Competitiveness and the Process of Co-adaptation in Team Sport Performance

#### Pedro Passos<sup>1</sup> \*, Duarte Araújo<sup>1</sup> and Keith Davids<sup>2</sup>

<sup>1</sup> CIPER, Faculdade de Motricidade Humana, Universidade de Lisboa, Lisboa, Portugal, <sup>2</sup> Centre for Sports Engineering Research, Sheffield Hallam University, Sheffield, UK

An evolutionary psycho-biological perspective on competitiveness dynamics is presented, focusing on continuous behavioral co-adaptations to constraints that arise in performance environments. We suggest that an athlete's behavioral dynamics are constrained by circumstances of competing for the availability of resources, which once obtained offer possibilities for performance success. This defines the influence of the athlete-environment relationship on competitiveness. Constraining factors in performance include proximity to target areas in team sports and the number of other competitors in a location. By pushing the athlete beyond existing limits, competitiveness enhances opportunities for co-adaptation, innovation and creativity, which can lead individuals toward different performance solutions to achieve the same performance goal. Underpinned by an ecological dynamics framework we examine whether competitiveness is a crucial feature to succeed in team sports. Our focus is on intra-team competitiveness, concerning the capacity of individuals within a team to become perceptually attuned to affordances in a given performance context which can increase their likelihood of success. This conceptualization implies a re-consideration of the concept of competitiveness, not as an inherited trait or entity to be acquired, but rather theorizing it as a functional performer-environment relationship that needs to be explored, developed, enhanced and maintained in team games training programs.

#### Edited by:

Michael J. Richardson, University of Cincinnati, USA

#### Reviewed by:

Ruud J. R. Den Hartigh, University of Groningen, Netherlands Yuji Yamamoto, Nagoya University, Japan

> \*Correspondence: Pedro Passos ppassos@fmh.ulisboa.pt

#### Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

Received: 20 July 2016 Accepted: 26 September 2016 Published: 10 October 2016

#### Citation:

Passos P, Araújo D and Davids K (2016) Competitiveness and the Process of Co-adaptation in Team Sport Performance. Front. Psychol. 7:1562. doi: 10.3389/fpsyg.2016.01562

Keywords: competitive behavior, team sports, interpersonal coordination, affordances, constraints

# INTRODUCTION

In the current research literature there are three different approaches to understanding competitiveness: (i) a psychological perspective where competitiveness is conceptualized as an innate drive and viewed as a personality trait (Kayhan, 2003); (ii) another psychological view where competitiveness is understood as a dynamical mental state which drives a performer toward excellence, sustained by social comparisons to be better than others (Jones, 2015); and (iii) an evolutionary biological perspective where competitiveness is seen at the behavioral level as the ability to use resources in competition with others (Baldauf et al., 2014).

From the theoretical perspective of ecological dynamics, competitiveness can be conceptualized as a constraint on sports performance which influences emergence of a performer's competitive behaviors. At an ecological level competitiveness is a constraint, resulting from the confluence of environment, task and personal constraints, which can be managed during training, for instance, with added rules (e.g., receive the ball while running), spatial-temporal constraints (e.g., short interpersonal distances), or manipulated pressure (e.g., technical and tactical similarity among opponents). But a key issue to enhance competitiveness is that these task constraints need to be manipulated to 'push' players beyond current performance levels, otherwise increasing competitiveness has little functionality in the representative practice contexts.

Competitiveness in a performance context is a constraint that creates affordances [i.e., possibilities for action, (Gibson, 1979)]. Consequently, sport practice programs provide an opportunity to simulate important performance sub-phases where such affordances can be perceived. Here we propose an interaction between the psychological and biological perspectives, in the form of an evolutionary psycho-biological framework to explore the idea that competitiveness can be characterized at the individual-environment level in behavioral dynamics. Continuous co-adaptations of individuals to constraints arise from situational factors which bound each individual's competitive behaviors.

This theoretical rationale sharply contrasts with considering competitiveness as a psychological entity to be gained or as an inherited trait. Rather it can be viewed at the level of the integrated performer-environment system, as a functional relationship that needs to be explored, enhanced and maintained in sport practice programs.

# How Intrateam Competition Enhances 'Fitness' for a Performance Environment

The relevance of situational factors, such as performance standards or the number of competitors involved in a collective system, can influence an athlete's competitive behaviors in sport (Garcia et al., 2013).

In discussing competitiveness there is a need to focus on the interaction between players in the same group, competing for selection by a coach, for example. As noted earlier, competitiveness (from a biological perspective) can be defined as the ability to use resources in competition with others (Baldauf et al., 2014). This definition supports the need to create, within the same team, an 'interteam' environment (e.g., small-sided and conditioned games, designing task constraints representative of specific sub-phases of competitive performance environments, e.g., 2v1; 3v2) which can increase intrateam competitiveness. By creating these competitive environments within squads of athletes, two categories of resources are uncovered, for which individuals have to compete: (i) intrateam resources which lead to competition between teammates (e.g., the development of technical and tactical skills to struggle for selection at a development academy or for a position in the senior squad); and (ii), interteam resources which lead to enhanced competitive behaviors against opponents.

Thus, each athlete's abilities to seek resources to function competitively will lead to the acquisition of psycho-physical, social and emotional resources over a long time scale (e.g., enabling athletes to become more functional in performance), enhancing their capacity to compete and gain selection, key roles, and status within a squad. An intrateam focus on competitiveness is needed in coaching, not driven by external comparisons for their own sake but to understand and re-define an individual's 'fitness' to compete in team sports. The term 'fitness' is not used in the conventional way in sports training to signify a level of physical conditioning. Rather, in this paper it has a connotation from the evolutionary sciences, which examine the functionality of a relationship between an organism and its environment (Kauffman, 1995). A fitness landscape captures a range of behaviors that define how an organism can utilize affordances to enhance its functionality (e.g., successfully achieving goals and subgoals) in competing with other members of its species (intra-species competitiveness) and with other species (inter-species competitiveness). Enhancing the 'fitness' of athletes to achieve resources and performance goals enables them to compete for resources that allow them to perform more successfully (i.e., overcome opponents, support teammates, win in competition, earn sponsorships, achieve better professional contracts) through exploiting similar processes of co-adaptation (Davids et al., 2008).

# The Process of Co-adaptation

In nature, different biological systems have developed tools which enhance their competitiveness within their own species through the process of continuous co-adaptation to arising constraints. This concept is also influential in understanding how the process of competitiveness between and within athletes in sport can be functional for enhancing development, learning, and performance. Although evolution, learning, development, and performance have different timescales, their dynamical processes are predicated on the same principles. The key point in ecological dynamics is that the same principles underlie system dynamics, regardless of timescales of emergence (Newell et al., 2001).

In ecological dynamics, the term 'fitness' at an evolutionary scale of analysis can be helpful in describing how functionally adapted an individual member of a species is to the affordances in an econiche. Species change due to continuous interactions with other species and with their environment, and the dynamical process of continuous co-adaptation drives the co-evolution of functional behaviors (Kauffman, 1995). At the heart of these continuous interactions between species and environmental constraints is the competition between biological organisms for resources noted earlier. In this way co-adaptation is the engine of evolutionary change. However, it is possible to characterize the term 'interaction' in two ways: (i) if there is no incentive to change, two competing species might keep their distance from each other and each population would evolve toward a steady state (no competitiveness); or (ii), in contrast, affordances provide opportunities for specific behaviors to emerge, for instance to compete for resources which enhance functionality (competitiveness). Competing for resources in one population might open the possibility for new affordances, due to the emergence of new skills leading to adaptive behaviors (Kauffman, 1995). These enhanced capacities within individual members of a species provide an 'optimal grip' on the specific 'form of life' that surrounds an individual athlete in sport, including the social and cultural 'climate' during practice and training (Rietveld and Kiverstein, 2014; Davids et al., 2016).

The process of co-adaptation drives an organism's relations with its environment in different directions, some of which may enhance its fitness in a performance environment, whereas others may lead to performance decrements and 'extinction' in the form of lack of competitiveness.

The utilization of affordances is a major feature of each individual's capacity to co-adapt to task and environmental constraints through competition which coaches can facilitate.

As mentioned earlier, the term affordance refers to action possibilities, and to perceive an affordance is to perceive how one could act with respect to a performance environment in sport. However, affordances are neither external properties of an environment, nor are they mentalistic properties of the mind. Rather, affordances are relational properties of an individual-environment system and capture the action-specific relations that exist between the action capabilities of an individual performer and the action-relevant properties of the substances, surfaces, objects, others and events of a performance environment. In other words, affordances capture the "fit" between an individual and environment (Gibson, 1979).

In order to utilize affordances, individuals allocate different resources to enhance their competitive capacity: some may invest in physical resources (e.g., velocity, strength, flexibility), others may invest in perceptual abilities [e.g., increase the speed of gaze (scanning) patterns to anticipate threats from opponents]. Some individual organisms adopt riskier behaviors (e.g., being more creative and playing with flair) than others, who prefer to perform conservatively, avoiding risky decisions. These different behaviors will shape the overall competitiveness of a group. Thus, competitiveness enhances innovation and creativity, which provide individuals with different performance solutions for achieving the same goal (Kuperberg, 2003).

# Co-adaptation and Ecological Dynamics in Sport

Previous research has suggested that continuous attacker-defender interpersonal interactions in team sports can be considered as emerging from a dyadic (1 vs. 1) sub-system, evolving by alternating between periods of stability and variability (Passos et al., 2009, 2013). In these team game dyadic systems, defenders compete with attackers to maintain system stability (remaining between the attacker and the goal/try line/basket), as attackers seek to de-stabilize it (Passos and Davids, 2015; Shafizadeh et al., 2016). As a consequence, the 'fitness' of performers in adapting to the changing competitive system can become more demanding. There is a tightening of space-time constraints which shortens the time for actions (Araújo et al., 2013) due, for instance, to a decrease in values of interpersonal distance between players. As the competitive sport system evolves there is a concomitant need for athletes to engage in exploratory behaviors to seek and establish functional movement solutions to satisfy the changing constraints of competitive performance (Davids et al., 2012).

In team sports the capacity to co-adapt behaviors in seeking affordances to utilize during competitive performance is predicated on two sorts of interpersonal coordination processes: intrateam coordination and interteam coordination. Intrateam coordination is supported by cooperation among players of the same team, where the patterns formed (e.g., geometric shapes formed by players' relative position) are characterized as preferred system states (Warren, 2006), offering specific affordances for those involved. During competitive performance in team games, the decreasing of interpersonal distance between competing players can disturb intrateam coordination patterns, continually demanding co-adaptive behaviors between performers to support different behavioral solutions to overcome opposition strategies. This aspect of co-adaptation between performers emphasizes the need for cooperation within a collective system in order to remain competitive. The emergence of different behavioral solutions can signify that previously preferred system states may no longer be functional. That is, affordances for utilizing an intrateam pattern of coordination may no longer be available. As a consequence, the players need to reorganize into functional system states, as other affordances become available. In co-adapting to opponents, performers may need to transit from one intrateam coordination pattern to another, since 'new' patterns of co-adaptive cooperation open up 'new' affordances, from a landscape of affordances (Rietveld and Kiverstein, 2014) which continuously evolves according to competitive dynamics.

Additionally, performers need to adapt to competitive constraints by exploiting interteam coordination processes, i.e., attacker-defender interpersonal coordination tendencies. Theoretically, interteam coordination tendencies remain relatively stable when both sides play within the rules and the 'spirit' of the game. Further, there are some rare instances when teams are happy to share a tied game and do not need to compete as they would normally for the same resources, for instance, to penetrate defensive space on field or to fight for ball possession. Therefore, competing sport teams can be conceptualized as components of a dynamical system which can display competing and cooperative tendencies (Davids et al., 1994; McGarry et al., 2002).

However, from the range of component variables that might characterize a dynamical system there is a subset of variables known as 'essential variables'<sup>1</sup> (Ashby, 1960; Kauffman, 1993). For instance, in sport such systems may involve two competing players and the variables might include physiological states, emotional states, but also technical and tactical skills. In an attacker-defender system which remains in a steady state, the values of these 'essential variables' must be kept within specific bounded ranges. When for some reason system constraints lead to a change in the values of one or more essential variables, pushing them beyond the boundaries, system stability might be disturbed. Then the system might be poised to 'jump' to another preferred state, where the essential variables are maintained within other boundaries (or not). We argue that these 'jumps' are changes in the performer-environment system that occur after perceiving and realizing a new affordance, here conceived as an attractor. Ashby (1960) suggested that the 'fittest' (most functional) attractors in the landscape of affordances were preferred system states (Ashby, 1960; Kauffman, 1993). Relating this idea from theoretical biology back to the example of attacker-defender dyads in rugby union, if the values of system essential variables (e.g., each player's running line velocity) remain within specific boundary limits (i.e., both contributing to a stabilization in the difference in running line velocity values) the system will remain in a current state of stability, which obviously favors the defender. However, an increase in value of the attacker's velocity, and a stabilization or decrease in the value of a defender's running line velocity, will drive an attacker-defender system to transit to another preferred system state (another attractor in the landscape), providing an advantage for the attacker (Passos et al., 2008). This is a core idea in the paper: Changes in system essential variables (due to the dynamical constraints of a competitive performance environment) will 'push' the entire system to another preferred state that exists in the competitive performance landscape. Jumps/transitions between preferred system states only occur due to changes in the values of system essential variables, which in turn are influenced by key constraints of a competitive performance environment. It is important to note that changes in values of essential variables can be due to the use, when competing, of 'new' individual resources (e.g., an increase in the acceleration profile or strength gains or adoption of an innovative 'new' dribbling technique in team sports), which may only emerge as a consequence of the co-adaptations to task and environmental constraints. This is how pedagogical practice and sport science support can greatly enhance the competitive behavior of individual athletes, by designing affordance landscapes in training that enhance competitiveness and ensure that performers can seek and exploit resources beyond individual limits.

<sup>1</sup>The term 'essential variables' can be equated to 'control parameter,' previously used in the literature (see Passos et al., 2008 as an example). The terms relate to each other because, when a critical value is reached, the system jumps to a new performance state.

# Competitiveness and the Implications for Skill Acquisition

Continuous co-adaptations, from developmental athlete to expert performer status, continually emphasize the need for individuals to train to adapt to the dynamic constraints of a competitive performance environment. Co-adaptations demanded by teammates and by coaching and sport science staff provide a platform of competitiveness for harnessing the competitive behavior of an individual to improve his/her own performance standards and enhance their competitive 'fitness' in the performance environment.

Such a conceptualization suggests that skill acquisition needs to be considered as skill adaptation, continuously constrained by key features of a performer-environment system (e.g., opponent skill levels; player perceptual systems; player technical skills; tactical performance behaviors; Araujo and Davids, 2011). An implication of harnessing competitiveness in practice is that the mutual and reciprocal interaction of the player-environment system enhances the attunement of performers to available information which can be used to functionally regulate their actions, during skill acquisition (Davids et al., 2012). During interactions with surrounding performers each individual learns to perceive new affordances in a competitive environment according to their evolving skill. In other words, skill acquisition leads to changes in properties of a specific competitive environment to which each individual's perceptual systems become attuned (Araújo et al., 2013; Passos and Davids, 2015).

During the course of action ongoing perceptual regulation sustains an individual performer's adaptive behaviors to satisfy specific task constraints, for instance the time needed to reduce the distance to an opponent (Davids et al., 2012). It needs to be noted that a performer's adaptive behaviors tend to create fluctuations in interpersonal coordination tendencies. Such fluctuations do not exist a priori, since they emerge molded by specific task constraints (Davids et al., 2012), such as the values of interpersonal distances to an opponent (Passos et al., 2008; Shafizadeh et al., 2016); or the interpersonal angle between ball carrier, the location of the goal and the closest defender (Vilar et al., 2013, 2014). Fluctuations provide information for affordances to which performers need to become attuned during practice and performance. These fluctuations only occur within critical regions where performance behaviors are no longer independent from other adjacent individuals (i.e., teammates; opponents), and each individual has to compete for available resources in order to succeed.

The level of competitive behavior varies considerably across individuals in space and time (Baldauf et al., 2014). Some players display more competitive behaviors in key performance areas, for example closer to their own goal area, whereas other individuals become more competitive closer to the opposition's goal area. Some players become highly competitive at selected time points, for example in different periods of a match, whereas others are highly competitive as soon as a match begins. In training these individual differences need to be explored and enhanced through designing an affordance landscape for individuals at different expertise levels. In competitive environments performers need to be attuned to affordances that support preferred behavioral states which satisfy constraints in dynamic contexts where unpredictability is ubiquitous.

# CONCLUSION

The acquisition of new skills requires exploratory behaviors on the part of each athlete who has to assemble unique functional movement solutions to satisfy particular task constraints. The perceptual-motor landscape of each individual changes as a consequence of new experiences and the acquisition of new skills. This aspect of skill acquisition means that players develop skills to enable them to compete for available resources (e.g., space-time gaps to perform key actions successfully; preventing opponents from dictating play or to de-stabilize a dyadic system formed with an adjacent opponent). Understanding practice designs for exploiting co-adaptive moves will help athletes and sports teams, as complex adaptive systems, to harness competitiveness in an intrinsic way so that each player drives the adaptations needed to continually re-define their 'fitness' for an 'optimal grip' on a form of life in sport performance (Davids et al., 2016).

# AUTHOR CONTRIBUTIONS


PP contributed the skeleton and first draft of the chapter, more specifically the issues related to co-adaptation in collective behaviors such as team sports. DA contributed to the theoretical issues regarding the ecological dynamics approach as a framework for competitive practice designs. KD contributed to the theoretical issues related to the constraints-led approach.

# REFERENCES


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Passos, Araújo and Davids. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Coordination and Collective Performance: Cooperative Goals Boost Interpersonal Synchrony and Task Outcomes

#### Jamie S. Allsop, Tomas Vaitkus, Dannette Marie and Lynden K. Miles\*

School of Psychology, University of Aberdeen, Aberdeen, UK

Whether it be a rugby team or a rescue crew, ensuring peak group performance is a primary goal during collective activities. In reality, however, groups often suffer from productivity losses that can lead to less than optimal outputs. Where researchers have focused on this problem, inefficiencies in the way team members coordinate their efforts have been identified as one potent source of productivity decrements. Here, we set out to explore whether performance on a simple object movement task is shaped by the spontaneous emergence of interpersonally coordinated behavior. Forty-six pairs of participants were instructed to either compete or cooperate in order to empty a container of approximately 100 small plastic balls as quickly and accurately as possible. Each trial was recorded to video and a frame-differencing approach was employed to estimate between-person coordination. The results revealed that cooperative pairs coordinated to a greater extent than their competitive counterparts. Furthermore, coordination, as well as movement regularity, was positively related to accuracy, an effect that was most prominent when the task was structured such that opportunities to coordinate were restricted. These findings are discussed with regard to contemporary theories of coordination and collective performance.

#### Edited by:

Michael J. Richardson, University of Cincinnati, USA

#### Reviewed by:

Daniel Richardson, University College London, UK Jeffrey Wagman, Illinois State University, USA

> \*Correspondence: Lynden K. Miles lynden.miles@abdn.ac.uk

#### Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

Received: 06 June 2016 Accepted: 12 September 2016 Published: 27 September 2016

#### Citation:

Allsop JS, Vaitkus T, Marie D and Miles LK (2016) Coordination and Collective Performance: Cooperative Goals Boost Interpersonal Synchrony and Task Outcomes. Front. Psychol. 7:1462. doi: 10.3389/fpsyg.2016.01462

Keywords: interpersonal synchrony, cooperation, competition, productivity, teamwork, coordination, groups

# INTRODUCTION

Many of life's most valued outcomes are only attainable by combining efforts with others. No amount of exertion, or expertise, will ever allow the lone rugby player to defeat an opposition team of 15. Similarly, achievements in a modern operating theater, flight deck, boardroom, or restaurant kitchen are enabled to the extent that individual agents act in concert with colleagues. Teamwork, however, is not all moonlight and roses. Not only can group performance exceed the capacity of individual members, but teams can also underperform by failing to optimally realize their collective potential. While researchers have identified several phenomena that characterize specific aspects of group productivity (e.g., social loafing, social facilitation, Köhler effect), the issue, in essence, is one of coordination. Combining efforts leads to the emergence of dependencies (i.e., links) between team members. The efficiency of these links, that is, the extent to which each member's actions are functionally coordinated, in large part determines the effectiveness of the group.

Grounded in an extensive literature concerning collective performance (see Kozlowski and Ilgen, 2006 for an overview), contemporary theorists have argued that teamwork can be conceptualized as a complex dynamical system (e.g., McGrath et al., 1999; Marks et al., 2001; Gorman et al., 2010; Waller et al., 2016). Specifically, rather than characterize group productivity as the simple aggregate of each member's individual level attributes (e.g., a priori skill, motivation, capacity), the dynamical stance proposes that collective performance is an emergent property, arising from the interaction of the system's components over time (Kelso, 1995; Schmidt and Richardson, 2008). Viewed in this way, patterns of productivity are not determined by top-down linear cause-and-effect relationships, but instead emerge via the intermittent and non-linear interactions between individual team members. The effectiveness of, for instance, a team consisting of a rally driver and navigator is not a linear combination of their respective skill levels: excellent navigation combined with poor driving is unlikely to yield performance equivalent to similarly excellent driving paired with poor navigation. In other words, team performance can be considered to emerge from the quality of the functionally specific interactions between team members, that is, the degree to which task-relevant dependencies are coordinated.

What then, does it mean to be coordinated in this sense? Conceptually speaking the teamwork literature considers coordination to encapsulate the range of activities (e.g., goal sharing, task assignment, resource allocation) required to effectively manage the timing and execution of interdependent efforts within a group (Steiner, 1972; Marks et al., 2001; Kozlowski and Bell, 2003; Espinosa et al., 2004). Although broad and clearly context-specific, two key commonalities have been identified that constitute coordinated efforts (Kozlowski and Ilgen, 2006). Coordination involves: (i) the integration of distinct actions; (ii) in a manner that is temporally aligned with other contributions. Typically, coordination in applied team settings is thought to come about via learning, experience, and expertise, and is managed via both explicit (e.g., instruction) and implicit (e.g., tacit understanding) mechanisms (Espinosa et al., 2004). However, core aspects of this approach are grounded in social-cognitive models, demanding voluminous information processing and top-down control (Araújo and Bourbousson, 2016). Construed in this way, coordination-driven teamwork then becomes an arguably impossible (Turvey, 1990, 2007; Turvey and Fonseca, 2009) achievement of individual minds, rather than an emergent property of the interactions between team members. In contrast, the science of coordination dynamics (e.g., Kelso, 1995) posits that coordination is self-organizing, emerging spontaneously precisely because of the interactions between individual components of a system (e.g., team members). Adopting this approach may therefore provide a more theoretically tractable framework for understanding how coordination impacts collective productivity.

Inspired by centuries-old observations of spontaneous alignment in mechanical devices (e.g., pendulum clocks; Huygens, 1673/1986), the lawful principles of coordination dynamics indicate that components of systems which are both coupled (i.e., linked) and share specific qualities (e.g., movement frequency), will tend to spontaneously synchronize (i.e., coordinate in time<sup>1</sup>) toward one of two attractor states (i.e., in-phase or anti-phase; Kelso, 1995). Indeed, these specific patterns have been documented in many biological systems, ranging from fields of fireflies (e.g., Buck and Buck, 1976) to people in social contexts (e.g., Schmidt and O'Brien, 1997; Richardson et al., 2007b). Importantly, interpersonal coordination brings with it a host of socially relevant outcomes that function to establish a common ground and enhance entitativity (Semin, 2007; Schmidt and Richardson, 2008; Marsh, 2013). For instance, even short periods of synchronous action promote affiliation (Hove and Risen, 2009) and cooperation (Wiltermuth and Heath, 2009) between interaction partners, while negative social contexts have been shown to thwart the emergence of synchrony (Miles et al., 2010; Paxton and Dale, 2013a).
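The pull toward in-phase or anti-phase coordination can be illustrated with a toy simulation. The sketch below is not taken from this article: it assumes the Haken-Kelso-Bunz relative-phase equation, dφ/dt = −a·sin(φ) − 2b·sin(2φ), a standard formalization in the coordination dynamics literature, and the values of `a`, `b`, the step size, and the starting phases are arbitrary illustrative choices.

```python
import math


def settle_relative_phase(phi0, a=1.0, b=1.0, dt=0.01, steps=5000):
    """Euler-integrate d(phi)/dt = -a*sin(phi) - 2*b*sin(2*phi).

    phi is the relative phase (in radians) between two coupled oscillators.
    All parameter values are illustrative assumptions, not fitted estimates.
    Returns the relative phase reached after `steps` integration steps.
    """
    phi = phi0
    for _ in range(steps):
        phi += dt * (-a * math.sin(phi) - 2.0 * b * math.sin(2.0 * phi))
    return phi


# Two hypothetical starting points: one closer to in-phase, one closer to anti-phase.
from_near_in_phase = settle_relative_phase(0.5)
from_near_anti_phase = settle_relative_phase(2.8)
print(from_near_in_phase, from_near_anti_phase)
```

With these assumed parameters the first run should settle close to 0 rad (in-phase) and the second close to π rad (anti-phase), which is the sense in which the two patterns act as attractors.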

Demonstrations of spontaneous interpersonal coordination are plentiful (see Marsh, 2013 for an overview). Not only do people unintentionally align their gross motor behavior (e.g., footsteps; Zivotofsky and Hausdorff, 2007) but also their gaze (Richardson and Dale, 2005), speech patterns (Fusaroli et al., 2012), postural movements (Shockley et al., 2003), and heart rate (Mitkidis et al., 2015), to name but a few examples. Acknowledging the enormous computational burden demanded by representational explanations of joint action,<sup>2</sup> researchers have recently highlighted how insight into the dynamics of interpersonal activity may provide more parsimonious accounts of collective behavior (Schmidt and Richardson, 2008; Coey et al., 2012; Dale et al., 2013). To illustrate, Richardson et al. (2015) investigated the behavioral dynamics of a goal-directed joint targeting task. Pairs of participants repetitively moved virtual objects to target locations with the instruction to avoid collisions. Importantly, the set-up was such that if both participants followed the optimal movement trajectory (i.e., a straight line) they would collide and fail to complete the task. The data revealed that, without communication, participants rapidly and spontaneously adopted an asymmetric pattern of movement with one maintaining the direct trajectory, while the movements of the other showed a more elliptical shape. Dynamical modeling supported this observation whereby a between-participant asymmetry in repeller (i.e., collision avoidance) strength reflected the behavioral data. Here then, participants were seen to spontaneously adapt their movements relative to one another in a manner functionally consistent with task-relevant dependencies (i.e., move objects and avoid colliding). Crucially, in line with a dynamical systems approach, the adoption of asymmetrical but complementary roles (i.e., one straight and one elliptical trajectory) emerged naturally from the interactions between participants and task constraints, rather than from any top-down, a priori plan or set of instructions.

<sup>1</sup>There is a variety of terminology used in the literature to describe interpersonal coordination (e.g., alignment, convergence, mirroring, mimicry, synchrony). Here, following Paxton and Dale (2013a; also see Lumsden et al., 2014), we equate 'synchrony' to behaviors that are matched in time and space (e.g., phase locked) and use 'coordination' as a more general term to capture the range of non-spurious relationships between the behaviors of interacting individuals.

<sup>2</sup>Critiques of information-processing and/or representational accounts typically focus on two general issues: (i) a gross excess of information to process/control (i.e., the degrees-of-freedom problem; Bernstein, 1967), and (ii) the absence of a plausible processor or executive controller (i.e., the homunculus problem; Ryle, 1949/2009). Although a detailed treatment of these arguments is beyond the scope of the present article, we point interested readers toward several excellent overviews (e.g., Reed, 1996; Richardson et al., 2008; Chemero, 2011).

A rapidly growing body of work attests to the notion that patterns of movement that can characterize self-organized interpersonal coordination are also implicated in effective joint performance. For instance, Abney et al. (2015) reported that performance on a joint tower-building task was improved when partners' body movements were loosely coupled. Although assigned to distinct task-specific roles and being freely available to communicate, pairs who displayed moderate levels of motor coordination also constructed better towers. Similarly, Fusaroli et al. (2016) showed that uninstructed behavioral coordination positively predicted competence in a group LEGO® building task, while Won et al. (2014) reported that dyads tasked with idea generation were more creative to the extent that they synchronized their movement. More concrete joint action tasks also reveal a functional role for spontaneous motor coordination. People readily make very fine-grained adjustments to their behavior and spontaneously take on distinct task-relevant roles in order to achieve coordination goals (e.g., coordinating landing times when jumping; Vesper et al., 2013). In seminal demonstrations, when given the exercise of moving planks of differing lengths without verbal communication, pairs of participants adopt different behavioral modes (i.e., one-handed, two-handed, or two-person lifting) depending on both plank length and partner ability (Richardson et al., 2007a; Isenhower et al., 2010). Together, what this work indicates is that beyond the notion that people can (and do) coordinate their actions with others, functional task-specific patterns of coordination emerge from goal-oriented interactions, a key characteristic of a self-organizing social system.

The current research sought to further explore the notion that collective performance can be understood in the terms of a self-organized dynamical system. By focusing on an ecologically relevant outcome of group work, productivity, we aimed to identify whether performance in this sense is influenced by the spontaneous emergence of interpersonally coordinated behavior. Participants, both individually and as a pair, were asked to move small plastic balls from one location to another as quickly and accurately as possible. We manipulated two factors intended to shape the nature of the task-relevant dependencies (i.e., links) between individuals. First, as a within-participants factor, we adjusted the aperture of the target location (i.e., where the balls were deposited) so that either only one ball (i.e., small aperture condition) or two balls (i.e., large aperture condition) could be deposited at a time. In effect, this varied the affordances (i.e., opportunities for action; Gibson, 1979) available to participants and, in turn, the possibilities for coordination. Specifically, the potential for in-phase coordination (i.e., 0° relative phase, both participants pick-up and deposit balls simultaneously) was eliminated in the small aperture condition. Second, we varied the social context in a between-participants manner by manipulating the instructional set (either to compete or cooperate) given to each pair. This factor was intended to influence performance-related dependencies between participants to the extent that cooperative goals promote interdependent modes of action, while competitive goals lead to more independent behavior (Deutsch, 1949; Beersma et al., 2003).

By manipulating task-relevant dependencies, we created a context in which both productivity and coordination were expected to vary in systematic ways. For each trial we quantified productivity in terms of both the number of balls successfully transferred (i.e., hits) and the number dropped (i.e., misses). We expected the small (cf. large) aperture condition to limit productivity, resulting in fewer hits and more misses. Similarly, consistent with Beersma et al. (2003), we expected the cooperation/competition instructions to result in a form of a speed-accuracy trade-off, leading to 'co-operators' being more accurate (i.e., fewer misses) and 'competitors' more productive (i.e., more hits). We also tracked each participant's actions using a video-based frame-differencing approach (Schmidt et al., 2012; Paxton and Dale, 2013b; Romero et al., 2016) and used the resulting time-series to estimate movement variability and between-participant coordination. Here we expected to see evidence of interpersonal coordination and an accompanying reduction in movement variability (i.e., increased stability), but this to be tempered by aperture size (i.e., small aperture to reduce levels of coordination as the in-phase mode is not possible) and instructions to compete (i.e., resulting from the reduction in interdependency). With these predictions in mind, we also set out to begin to address a more overarching question — what is the relationship between movement coordination and group productivity?

# MATERIALS AND METHODS

# Participants and Design

In total, 102 undergraduate participants took part in pairs in return for course credit. However, prior to analysis, five pairs were removed from the dataset on the basis that participants reported knowing each other.<sup>3</sup> The final sample consisted of 92 participants (72 female, age range 17–35 years, mean age = 20.7 years). The study had a three-factor mixed model design whereby task context (solo vs. group) and aperture size (small vs. large) were manipulated within participants, while instruction set (cooperation vs. competition) was manipulated between participants (i.e., 23 pairs per condition). The study was reviewed and approved by the School of Psychology, University of Aberdeen ethics committee.

# Materials and Procedure

Pairs of participants arrived at the laboratory individually and were briefly introduced to each other before being separated into adjacent rooms. At this point, one participant completed questionnaires to provide basic demographic information (see Supplemental Materials) while the other was introduced to the object movement task. The task (see **Figure 1**) required participants to move small plastic balls (6 cm diameter), one at a time, from a large container (75 cm × 35 cm), fixed to the top of a table, to a tube located approximately 110 cm away. The receptacle tube was fitted with a lid with an aperture of either 7.5 cm (small aperture condition) or 15.5 cm (large aperture condition). The order of tube size was counterbalanced across pairs. Participants were required to use their dominant hand only while keeping their other hand behind their back, and to move each ball using a single arm movement without throwing them (i.e., to drop or place them into the tube). Importantly, participants were instructed to move the balls as quickly and accurately as possible.

<sup>3</sup>Each participant indicated how well they knew the other by marking a vertical line on a 150 mm analog scale anchored by 'Not at all' and 'Extremely well'. Pairs with an average familiarity rating >10 (i.e., 10 mm from 'Not at all') were excluded from the analysis.

Initially, participants completed 4 trials individually. Two trials were completed for each aperture size, one from each side of the table, and data were averaged across these trials. Once the first participant had completed this stage, they swapped rooms and filled out the demographic items while the other participant performed the object movement task. Each trial lasted for 65 s and was preceded by a 3 s countdown. Participants were given the option of a short break at the end of each trial if they were fatigued in any way. Immediately after both participants had completed the individual trials, they were invited back to the main laboratory to perform the task again, but this time together as a dyad. Participants were randomly assigned to either the cooperative or competitive conditions and at this point were given their instructions. Specifically, those in the cooperative condition were told to: "move the balls as quickly and accurately as possible, as a pair. That is, you need to cooperate with each another in order to achieve the goal." In contrast, those in the competitive condition were instructed to: "move the balls as quickly and accurately as possible, as an individual. That is, you need to compete against each other in order to achieve the goal." All participants were also instructed to not verbally communicate with each other. Again, each pair completed 4 trials, two for each aperture size, one from each side of the table. All trials were recorded to video (1920 px × 1080 px, 25 fps) using a digital video camera (Sony HD-SR12). Care was taken to ensure the camera was aligned with the center of the table/receptacle tube in order to be able to isolate each participant's movements (see Romero et al., 2016). After completing all trials participants were thanked for their time, debriefed, and dismissed.

# Data Reduction and Analysis

Prior to analysis, the first 5 s of each trial was truncated in order to remove the countdown period and to eliminate any initial transient movements. A frame-differencing approach was then employed using a custom-written MATLAB script to convert the remaining 60 s of each trial into movement time-series. Specifically, each frame was halved vertically (in order to separate each participant's movements) and compared to the corresponding half of the previous frame in terms of pixel change (see **Figure 2**). This provided two time-series (one per participant) of movement data for each trial (one time-series for individual trials).
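The frame-differencing step can be sketched in a few lines. The authors' MATLAB script is not reproduced in the article, so the following is only a minimal Python illustration under stated assumptions: frames are grayscale images stored as lists of pixel rows, "halved vertically" is read as a left/right split at the middle column, and a pixel counts as changed when its intensity differs by more than a hypothetical `threshold`.

```python
def frame_difference_series(frames, threshold=10):
    """Return (left, right) movement series from consecutive frame pairs.

    frames: list of grayscale frames, each a list of rows of pixel intensities.
    threshold: assumed minimum intensity change for a pixel to count as moved.
    Each series holds, per frame pair, the number of changed pixels in that half.
    """
    left_series, right_series = [], []
    for previous, current in zip(frames, frames[1:]):
        middle = len(current[0]) // 2  # column separating the two participants
        left_changed = right_changed = 0
        for prev_row, curr_row in zip(previous, current):
            for col, (old, new) in enumerate(zip(prev_row, curr_row)):
                if abs(new - old) > threshold:
                    if col < middle:
                        left_changed += 1
                    else:
                        right_changed += 1
        left_series.append(left_changed)
        right_series.append(right_changed)
    return left_series, right_series


# Tiny made-up 2 x 4 frames: movement on the left first, then on the right.
frames = [
    [[0, 0, 0, 0], [0, 0, 0, 0]],
    [[255, 0, 0, 0], [0, 0, 0, 0]],
    [[255, 0, 0, 255], [0, 0, 255, 0]],
]
left, right = frame_difference_series(frames)
print(left, right)
```

For a solo trial only one of the two series would be of interest, matching the single time-series per individual trial described above.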

FIGURE 2 | Illustration of the frame-differencing technique used to quantify movement. A full 60 s time-series of movement (i.e., pixel change) from a solo trial is shown in the top panel and a 'zoomed' 2 s (35 s – 37 s) period in the middle panel. The lower panels depict every 6th frame (≈ 1/4 s) from this 2 s period. The letter on each frame denotes the corresponding data point on the 'zoomed' time-series. As can be seen, the oscillatory pattern of the time-series data corresponds to the participant's actions. 'Valleys' (i.e., low amount of movement/pixel change) match either picking up a ball from the container (e.g., frame A) or depositing it in the tube (e.g., frame D), while 'peaks' (i.e., high amount of movement/pixel change) match periods of movement between container and tube (e.g., frame B).

Global movement coordination was quantified using cross-spectral coherence (Porges et al., 1980; Gottman, 1981; Warner, 1988). Each time-series was submitted to a cross-spectral analysis and expressed as component frequencies before the correlation between the two time-series (in the frequency domain) was calculated as a weighted average across the component frequency range. This measure provided an estimate of the extent to which participants' actions were temporally aligned (with 0 representing no movement coordination and 1 representing complete movement coordination) and has been commonly employed as an index of interpersonal coordination (e.g., Sadler et al., 2009; Lumsden et al., 2012; Schmidt et al., 2014). For each time-series we also calculated the coefficient of variation (CV) as an index of movement variability. For this measure we initially calculated the mean and standard deviation of the period (i.e., distance between 'peaks' on each time-series) for each participant on each trial individually. Analysis revealed that the mean period differed as a function of aperture size and task context; hence, rather than the raw standard deviation, we used the coefficient of variation (CV = σ/μ) as a standardized index of the temporal regularity of participant movements (i.e., higher CV values = less regular movements). Finally, we also recorded the number of hits and misses per participant per trial for the same specific 60 s period from which the movement time-series were constructed.
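Both indices can be approximated with a short standard-library sketch. The article does not specify segment length, windowing, the weighting scheme, or the peak-detection rule, so everything below is an assumption for illustration: a Welch-style estimate with 50% overlapping Hann-windowed segments, weights equal to the pooled power of the two series at each frequency, peaks defined as local maxima above the series mean, and σ taken as the population standard deviation.

```python
import cmath
import math
import statistics


def _spectrum(segment, window):
    """Windowed, mean-removed DFT at the positive non-zero frequencies."""
    n = len(segment)
    mean = sum(segment) / n
    tapered = [(v - mean) * w for v, w in zip(segment, window)]
    return [
        sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(tapered))
        for k in range(1, n // 2 + 1)
    ]


def weighted_coherence(x, y, seg_len=64):
    """Coherence averaged over frequency (0 = none, 1 = complete).

    Assumed scheme: spectra are averaged over 50% overlapping segments and
    each frequency is weighted by the pooled power of both series.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / seg_len) for t in range(seg_len)]
    n_freq = seg_len // 2
    pxx, pyy, pxy = [0.0] * n_freq, [0.0] * n_freq, [0j] * n_freq
    for start in range(0, min(len(x), len(y)) - seg_len + 1, seg_len // 2):
        fx = _spectrum(x[start:start + seg_len], window)
        fy = _spectrum(y[start:start + seg_len], window)
        for k in range(n_freq):
            pxx[k] += abs(fx[k]) ** 2
            pyy[k] += abs(fy[k]) ** 2
            pxy[k] += fx[k] * fy[k].conjugate()
    weights = [px + py for px, py in zip(pxx, pyy)]
    if sum(weights) == 0:
        return 0.0
    coherence = [
        abs(c) ** 2 / (px * py) if px * py > 0 else 0.0
        for c, px, py in zip(pxy, pxx, pyy)
    ]
    return sum(c * w for c, w in zip(coherence, weights)) / sum(weights)


def period_cv(series):
    """CV = sigma / mu of the intervals between successive peaks."""
    mean = sum(series) / len(series)
    peaks = [
        i for i in range(1, len(series) - 1)
        if series[i] > mean and series[i] > series[i - 1] and series[i] >= series[i + 1]
    ]
    periods = [later - earlier for earlier, later in zip(peaks, peaks[1:])]
    return statistics.pstdev(periods) / statistics.mean(periods)
```

Two caveats apply to this sketch: coherence estimated from a single segment is trivially 1, so the series must span several segments, and `period_cv` raises an error when fewer than two peaks are found.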

To provide an estimate of baseline performance we constructed pseudo-pairs by combining data from relevant individual trials (see **Figure 3**). For example, for a given trial (e.g., small aperture), data from the first participant's individual trial from the left side of the table were combined with that from the second participant's individual trial from the right side of the table. This provided baseline data specific to each pair in terms of expected performance (i.e., should their group-level productivity be a simple linear combination of their individual efforts), as well as an estimate of incidental (i.e., chance) levels of coordination. Therefore, across all measures the unit of analysis was at the level of the dyad.
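The pseudo-pair baseline can be made concrete with a small sketch. It assumes that the "simple linear combination" of solo productivity is a sum of the two participants' counts; the record layout, field names, and numbers are hypothetical and serve only to show the pairing rule (first participant's left-side trial with second participant's right-side trial).

```python
from dataclasses import dataclass


@dataclass
class SoloTrial:
    participant: str  # hypothetical identifier, e.g. "P1"
    side: str         # side of the table: "left" or "right"
    aperture: str     # "small" or "large"
    hits: int
    misses: int


def pseudo_pair_counts(trials, first, second, aperture):
    """Combine first's left-side solo trial with second's right-side solo trial.

    Assumption: expected pair productivity is the sum of the solo counts.
    """
    def pick(participant, side):
        for trial in trials:
            if (trial.participant, trial.side, trial.aperture) == (participant, side, aperture):
                return trial
        raise ValueError(f"no {aperture} trial for {participant} on the {side} side")

    left_trial = pick(first, "left")
    right_trial = pick(second, "right")
    return {
        "hits": left_trial.hits + right_trial.hits,
        "misses": left_trial.misses + right_trial.misses,
    }


# Made-up solo data for one hypothetical pair in the small aperture condition.
solo_trials = [
    SoloTrial("P1", "left", "small", hits=30, misses=2),
    SoloTrial("P1", "right", "small", hits=28, misses=3),
    SoloTrial("P2", "left", "small", hits=25, misses=1),
    SoloTrial("P2", "right", "small", hits=27, misses=4),
]
baseline = pseudo_pair_counts(solo_trials, "P1", "P2", "small")
print(baseline)
```

Chance-level coordination would be estimated analogously, by treating the two solo movement time-series as if they had been recorded together and computing coherence between them.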

# RESULTS

Initially, the primary dependent variables were analyzed separately using 2 (pair type: pseudo vs. actual) × 2 (aperture size: small vs. large) × 2 (instructions: cooperation vs. competition) mixed model analysis of variance (ANOVA) with repeated measures on the first two factors. Significant effects are reported below.

# Productivity: Hits

With respect to the number of balls successfully deposited, the analysis revealed main effects of both pair type, F(1,44) = 34.03, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.44 (i.e., pseudo < actual), and aperture size, F(1,44) = 178.12, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.80 (i.e., small < large), which were qualified by an interaction between these factors, F(1,44) = 6.87, p = 0.012, η<sub>p</sub><sup>2</sup> = 0.14, as shown in **Figure 4**. Post hoc pairwise comparisons (Bonferroni corrected) confirmed that actual pairs were more productive than would be expected by combining their solo efforts (i.e., pseudo-pairs) for both the small (p < 0.001) and large (p < 0.001) apertures.
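The effect sizes reported for hits can be cross-checked against the F ratios using the standard identity for partial eta squared, (F × df<sub>effect</sub>) / (F × df<sub>effect</sub> + df<sub>error</sub>). The short Python sketch below applies it to the three F values given above; the function name is hypothetical and the check is offered only as a reading aid.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# F values for hits as reported in the text, each with df = (1, 44).
reported = {"pair type": 34.03, "aperture size": 178.12, "interaction": 6.87}
effect_sizes = {name: partial_eta_squared(f, 1, 44) for name, f in reported.items()}

for name, value in effect_sizes.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, the values agree with the reported 0.44, 0.80, and 0.14.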

# Productivity: Misses


When considering the number of balls dropped or missed, the analysis revealed a main effect of pair type, F(1,44) = 34.57, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.44 (i.e., pseudo < actual), and a marginally significant effect of condition, F(1,44) = 3.82, p = 0.057, η<sub>p</sub><sup>2</sup> = 0.08 (i.e., cooperation < competition), which were qualified by an interaction between these factors, F(1,44) = 8.49, p = 0.006, η<sub>p</sub><sup>2</sup> = 0.16, as shown in **Figure 5**. Post hoc pairwise comparisons (Bonferroni corrected) indicated that while there was no difference as a function of condition when solo efforts (i.e., pseudo-pairs) were combined (p = 0.74), actual pairs in the competitive condition made significantly more errors (i.e., more misses) than those in the cooperative condition (p = 0.03).

# Movement Coordination

Analysis of coordination (i.e., cross-spectral coherence) revealed that all main effects and 2-way interactions reached significance (all Fs > 5.8) and were ultimately qualified by a 3-way interaction between pair type, aperture size, and condition, F(1,44) = 5.80, p = 0.02, η<sub>p</sub><sup>2</sup> = 0.12, as shown in **Figure 6**. To simplify interpretation we then conducted separate 2 (aperture size: small vs. large) × 2 (instructions: cooperation vs. competition) mixed model ANOVAs for the pseudo-pairs (**Figure 6A**) and actual pairs (**Figure 6B**). As expected, for the pseudo-pairs there were no significant effects (all Fs < 1), indicating that incidental (i.e., chance) levels of coordination were equivalent across conditions and aperture size. In contrast, for actual pairs there were main effects of both aperture size, F(1,44) = 13.51, p = 0.001, η<sub>p</sub><sup>2</sup> = 0.24 (i.e., small > large), and instructions, F(1,44) = 22.38, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.34 (i.e., cooperation > competition), which were qualified by an interaction between these factors, F(1,44) = 6.88, p = 0.012, η<sub>p</sub><sup>2</sup> = 0.14. Post hoc pairwise comparisons (Bonferroni corrected) indicated that for participants who had been instructed to cooperate, levels of coordination were higher when depositing balls into the small aperture tube compared to the large one (p = 0.006), while there was no such difference for those in the competitive condition (p = 0.34).

# Movement Variability

Comparison of the CV indicated main effects of aperture size, F(1,44) = 54.73, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.55 (i.e., small < large), and pair type, F(1,44) = 81.41, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.65 (i.e., pseudo < actual). Movements were more regular when depositing into the small tube, and when performing the task alone.

# Coordination, Movement Variability, and Task Performance

Finally, we examined the simple linear relationship between the level of coordination that emerged between each pair, movement variability, and productivity levels (i.e., hits and misses separately) for each aperture size. As displayed in **Table 1**, when considering the small aperture there was a clear negative relationship between coordination and accuracy (i.e., misses), r(46) = −0.41, p = 0.004, whereby pairs whose actions were more coordinated were also more accurate (i.e., fewer misses). Similarly, pairs who showed less variability in their movements also made fewer errors when depositing into the small aperture, r(46) = 0.42, p = 0.004. We then entered both the coordination and variability measures (from the small aperture condition) as predictors into a multiple regression analysis with accuracy (i.e., misses) as the outcome variable of interest. The overall model was significant, F(2,45) = 10.13, p < 0.001, and accounted for approximately 30% of the variance (adjusted R<sup>2</sup> = 0.289). Importantly, both variables were seen to be independent significant predictors of accuracy: coordination, β = −0.38, t(43) = −3.03, p = 0.004; variability, β = 0.39, t(43) = 3.07, p = 0.004.

TABLE 1 | Correlations between coordination (i.e., cross-spectral coherence), movement variability (i.e., coefficient of variation [CV]), hits, and misses for the small and large apertures separately. A matrix of scatterplots depicting these relationships is provided in the Supplemental Materials. <sup>a</sup>p = 0.004

On the other hand, when depositing the balls into the large aperture there were no significant relationships between any of the factors. However, inspection of **Table 1** suggests these effects were consistent in terms of direction but reduced in magnitude compared with those found for the small aperture.

# DISCUSSION

The present results provided support for the predicted effects, but also revealed some unanticipated outcomes. Importantly, here we demonstrated that in the context of a simple object movement task, the presence of a co-actor led to facilitated productivity (i.e., more hits) and a decrease in accuracy (i.e., more misses) beyond the extent that would be expected by simply combining solo efforts. Characteristic of classic 'social facilitation' effects (e.g., Triplett, 1898; Zajonc, 1965), the product of working collectively exceeded the sum of the individual inputs. Similarly, across all conditions, coordination was greater than would be expected had each individual not been impacted by the presence of the other (i.e., chance). Together these findings point to the notion that performance at the dyadic level emerged from the real-time interactions between the participants and the environment, rather than simply being the linear product of each individual's a priori attributes and static task constraints. This view lends further support to the notion that group productivity can be conceptualized as an emergent phenomenon (McGrath et al., 1999; Marks et al., 2001; Gorman et al., 2010; Waller et al., 2016).

When it came to the relationship between task performance and movement, the results provide further insight into the functional aspect of this connection. Here, it could be expected that more is simply better: that stable coordinative states best realize between-participant dependencies and in turn facilitate greater productivity. The data, however, suggest a different, potentially more nuanced situation (cf. Abney et al., 2015). Both of the movement-relevant measures we considered (i.e., variability and coordination) were seen to exert influence on task performance, but primarily in terms of shaping accuracy rather than gross productivity. Pairs that showed higher levels of coordination or more regular movements also tended to be more accurate (i.e., fewer misses). The effects on hits were similar in directional terms but did not reach significance. Of note, there was no relationship between the measures of coordination and movement variability, suggestive of these factors having distinct roles in shaping task performance. Thus, it appears that in the context of the current task, the emergence of interpersonal coordination, along with regular movement patterns, was associated with more accurate performance.

Where these effects were most robust, both coordination levels and movement regularity independently predicted task accuracy when participants were depositing balls into the small aperture. Moreover, this condition was seen to result in the lowest level of productivity but the highest level of coordination. Although speculative, we suggest that the restriction of the small aperture led participants to fall into an anti-phase mode of coordination (i.e., when one participant is picking up a ball the other is depositing).<sup>4</sup> Acknowledging that this mode of coordination is stable at relatively lower movement frequencies (Haken et al., 1985; Kelso, 1995; Schmidt and Richardson, 2008), it follows that this slowing, in combination with the physical spacing of participants' actions (i.e., in an anti-phase mode, collisions at the pick-up and depositing regions are effectively eliminated) could result in the heightened accuracy observed. Relatedly, if there were fewer collisions between participants, this may also explain the effects of movement variability in that these instances will necessarily perturb regular rhythmic movements as participants recover and adjust their actions accordingly. Quite why participants appeared to avoid the globally stable in-phase mode of coordination when available (i.e., large aperture) is, however, unclear.<sup>5</sup> It is therefore imperative that future work employ more precise methods (e.g., high fidelity motion-tracking) to better capture the dynamical characteristics of instances of coordination as reported here.

Two additional findings merit consideration. First, while there is a solid evidential basis to suggest that engaging in synchronous acts can promote subsequent cooperative behavior (e.g., Wiltermuth and Heath, 2009; Valdesolo et al., 2010; Kokal et al., 2011; Launay et al., 2013; Reddish et al., 2013; Cirelli et al., 2016), work addressing the converse relationship – cooperation engendered synchrony – is scarce. Although it has previously been established that individuals with prosocial motives show higher levels of spontaneous interpersonal synchrony (Lumsden et al., 2012), to our knowledge the current study provides the first empirical demonstration that an explicit instruction to cooperate (cf. compete) also leads to a greater tendency to coordinate behavior. Our findings point toward a bidirectional relationship between coordination and cooperation. We believe this adds weight to the claim that this association may operate as a feedback loop — establishment of coordination has been argued to provide immediate real-time reinforcement for cooperative intentions, which in turn support further coordination (Reddish et al., 2013). In a related sense, those instructed to compete in the current study not only showed reduced levels of coordination, they also made more errors. As well as contributing support for the speed-accuracy tradeoff documented by Beersma et al. (2003), this effect may again reflect a reinforcement of behavior over time. If competitive motives initially thwart the emergence of coordination, this may function to simply maintain the state of affairs which, in the context of the current task, was seen to result in decreased accuracy. Future work focused on developing a more fine-grained understanding of the real-time evolution of the relationship between coordination and productivity will help further evaluate this proposal.

Consideration of limitations of the current study also warrants some attention. First, we acknowledge that by always testing the solo performance condition first, we are unable to eliminate the potential influence of practice or carry-over effects. However, in the present task it was important to initially establish a baseline individual performance level free of any 'social contamination' (e.g., from observing a partner's performance), an approach that is also employed in related literature (e.g., Vesper et al., 2013). Moreover, given all participants performed their solo trials under identical conditions (i.e., prior to the cooperation-competition instruction set) it would be reasonable to expect any practice effects to be consistent across conditions. The motor task itself is also very straightforward, suggesting practice might be of limited benefit. Nonetheless, it is of course important for future work to empirically investigate this matter by counterbalancing the order of individual and group trials. Second, although the present pattern of results is consistent with a self-organized dynamical account of interpersonal coordination, we cannot effectively rule out more strategic socially-relevant behavior. For instance, participants may take more care when in the cooperative (cf. competitive) condition so as to limit the impact of their errors on their partner, intentionally take on a complementary role to their partner (cf. Vesper et al., 2013; Richardson et al., 2015), or even simply show 'good manners' by, for instance, pausing to allow their partner to proceed. Examining participant strategies, by conducting qualitative interviews post performance, for example, may help provide additional insight here and improve the generality of the current findings.

Along with furthering the empirical understanding of the functional relationships between coordination and collective activity, here we also outlined a novel object-movement task that we feel is well-suited for the experimental investigation of group dynamics. The current task is simple and inexpensive to run, allows for both laboratory and field settings, and has ample scope for 'scaling-up' to multi-agent activities. The task provides a procedure to establish meaningful baseline estimates of group behavior (i.e., pseudo-groups) and enables precise quantification of such behavior, while allowing participants to behave in a relatively naturalistic fashion. To this end, the present results provide some proof-of-concept that the task is a suitable vehicle for studying the effects of both social and physical parameters. Further validation of the task in combination with the introduction of more detailed behavioral recording (i.e., high fidelity motion tracking) are, therefore, important next steps.

More broadly, the current study also contributes to an increasingly complex picture regarding the general relationship between interpersonal synchrony and collaborative activity. Although consistent with prominent claims that synchrony is a pervasive feature of social life (Schmidt and Richardson, 2008; Marsh, 2013), more detailed functional arguments are likely to demand greater context-specificity. That is, understanding precisely how and when interpersonal coordination functions to enhance goal-directed joint activity is likely to require a more systematic specification of how between-person task dependencies are best managed in order to optimize performance. Although clearly challenging, this approach may offer valuable insight into how we might structure group activity in order to best realize the potentials of teamwork.

<sup>4</sup>Anecdotal observations recorded by the experimenters during the task are consistent with this conjecture. In addition, we conducted two follow-up procedures that also support this interpretation (see Supplemental Materials).

<sup>5</sup>One speculative suggestion is that although physically sufficient, the size of the large aperture (i.e., ≈ 2.6 times the ball diameter) was still too restrictive to allow participants to comfortably deposit two balls simultaneously without touching hands, etc.

# AUTHOR CONTRIBUTIONS


All authors developed the study concept and design. JA and TV collected the data. JA, TV, and LM conducted the data analysis and interpretation of results. JA and LM drafted the manuscript and TV and DM provided critical review and revisions. All authors approved the final version of the manuscript for submission.

# REFERENCES


# ACKNOWLEDGMENTS

The authors gratefully acknowledge the assistance of Cathy Macpherson and Hope Fawcett-Lipscombe with data collection and coding as well as Mike Richardson for generously sharing Matlab code and providing invaluable guidance.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2016.01462



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Allsop, Vaitkus, Marie and Miles. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Performance of Language-Coordinated Collective Systems: A Study of Wine Recognition and Description

Julian Zubek<sup>1</sup>, Michał Denkiewicz<sup>2</sup>, Agnieszka Dębska<sup>3,4</sup>, Alicja Radkowska<sup>3</sup>, Joanna Komorowska-Mach<sup>5</sup>, Piotr Litwin<sup>3</sup>, Magdalena Stępień<sup>3</sup>, Adrianna Kucińska<sup>3</sup>, Ewa Sitarska<sup>3</sup>, Krystyna Komorowska<sup>3</sup>, Riccardo Fusaroli<sup>6,7</sup>, Kristian Tylén<sup>6</sup> and Joanna Rączaszek-Leonardi<sup>2</sup>\*

<sup>1</sup> Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland, <sup>2</sup> Institute of Psychology, Polish Academy of Sciences, Warsaw, Poland, <sup>3</sup> Faculty of Psychology, University of Warsaw, Warsaw, Poland, <sup>4</sup> Laboratory of Psychophysiology, Nencki Institute of Experimental Biology, Polish Academy of Science, Warsaw, Poland, <sup>5</sup> Institute of Philosophy, Faculty of Philosophy and Sociology, University of Warsaw, Warsaw, Poland, <sup>6</sup> Center for Semiotics, Aarhus University, Aarhus, Denmark, <sup>7</sup> Interacting Minds Centre, Aarhus University, Aarhus, Denmark

#### Edited by:

Gesualdo M. Zucco, University of Padua, Italy

#### Reviewed by:

Jonas K. Olofsson, Stockholm University, Sweden Remo Job, University of Trento, Italy

\*Correspondence: Joanna Rączaszek-Leonardi jraczaszek@psych.pan.pl

Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

Received: 18 February 2016 Accepted: 18 August 2016 Published: 27 September 2016

#### Citation:

Zubek J, Denkiewicz M, Dębska A, Radkowska A, Komorowska-Mach J, Litwin P, Stępień M, Kucińska A, Sitarska E, Komorowska K, Fusaroli R, Tylén K and Rączaszek-Leonardi J (2016) Performance of Language-Coordinated Collective Systems: A Study of Wine Recognition and Description. Front. Psychol. 7:1321. doi: 10.3389/fpsyg.2016.01321

Most of our perceptions of and engagements with the world are shaped by our immersion in social interactions, cultural traditions, tools and linguistic categories. In this study we experimentally investigate the impact of two types of language-based coordination on the recognition and description of complex sensory stimuli, namely red wines. Participants were asked to taste, remember and successively recognize samples of wines within a larger set in a two-by-two experimental design: (1) either individually or in pairs, and (2) with or without the support of a sommelier card, a cultural linguistic tool designed for wine description. Both effectiveness of recognition and the kinds of errors in the four conditions were analyzed. While our experimental manipulations did not impact recognition accuracy, bias-variance decomposition of error revealed non-trivial differences in how participants solved the task. Pairs generally displayed reduced bias and increased variance compared to individuals; however, the variance dropped significantly when they used the sommelier card. The variance-reducing effect of the sommelier card was observed only in pairs; individuals did not seem to benefit from the cultural linguistic tool. Analysis of descriptions generated with the aid of sommelier cards shows that pairs were more coherent and discriminative than individuals. The findings are discussed in terms of global properties and dynamics of collective systems when constrained by different types of cultural practices.

Keywords: language coordinated interaction, systemic complexity, bias-variance analysis, collective performance, wine tasting and recognition

# 1. INTRODUCTION

Even though we are often not aware of this, our decisions and actions in the world are rarely a solitary enterprise. When going for a job interview, your reaching to take out appropriate clothes seems to be your decision here and now, yet it is constrained by various kinds of cultural contexts. Your choice, an important one, as you are deciding on how much of your own personality you wish to reveal to your future employer, depends on what is acceptable in your culture, on which dress codes have been taught to you by your family (explicitly or by practice), and on the current fashion and how your peers dress on such occasions. You may check the dress code of the company for which you are getting interviewed and check your choices with family members and friends by asking them in person or sending your picture via electronic media.

Doing things together is thus our species' natural mode of being, a fact generally underappreciated in cognitive psychology. Our actions, choices, and decisions practically always have a collective dimension. This togetherness comes in many forms: the presence of others (physical or virtual), engagement of culturally developed artifacts (Hutchins, 1995b; Clark and Chalmers, 1998), or knowledge, how things "ought to be done" or are "usually done," i.e., the social norms and practices which we acquire from our social surroundings and upbringing (Sidnell and Enfield, 2012; Enfield, 2013; Sinha, 2014).

Before we can start addressing the collective nature of human cognition and behavior we have to be careful in how we define it. From the dynamic perspective we engage, interactions are not just simple combinations of behaviors of two or more individuals. Rather, by entering a social interaction, individuals become parts of a larger systemic organization. New qualities emerge that can only be captured at the collective level. In turn, the emergent level comes to shape individual action and cognition (Schmidt et al., 1990; Hutchins, 1995a; Di Paolo et al., 2008; Schmidt and Richardson, 2008; Riley et al., 2011; Fusaroli et al., 2014). Such systemic level organization of human collectivity arises at multiple timescales: it is effective when engaging each other face-to-face, but crucially depends on being shaped in development (Rączaszek-Leonardi et al., 2013), cultural evolution (Smith et al., 2003; MacWhinney, 2005; Enfield, 2013) and even biological evolution (Lewontin, 2001; Smaldino, 2014). This approach calls for new methods to describe properties of emergent collectivity and link them to the performance and properties of participating individuals. In investigations of movement coordination, concepts such as coupling or functional synergy have been applied to address aspects of complexity, stability and functional coherence of collective systems (Turvey, 1990; Schmidt and Richardson, 2008; Riley et al., 2011). Considering the collective dimension of systems has brought a focus on the system's level performance. Variables pertaining to the systems as a whole, such as temporal characteristics of their behavior and/or their stability or variability of performance are increasingly often used as indices revealing internal dynamics of such systems (Van Orden et al., 2003). Using such means, one can assess the functional reduction of degrees of freedom that results for a given system from a particular interaction in a particular situation.

Such views on collectivity bring about new perspectives on natural language as it becomes a constitutive element of human interaction. First, language is not considered an individual skill, a categorization tool or a simple vehicle of content. Rather, it is a means of coordination, enabling and shaping interactions (Halliday, 1977; Schegloff et al., 1996; Rączaszek-Leonardi and Kelso, 2008; Tylén et al., 2010; Rączaszek-Leonardi and Cowley, 2012), which, congruently with the systemic view above, can be operationalized as functional control over the systems' degrees of freedom. Second, the crucial role of language for interaction has to be considered on several timescales (Rączaszek-Leonardi, 2003; Smith et al., 2003; MacWhinney, 2005). These timescales range from on-line processes, when interlocutors dynamically construct linguistic controls appropriate for a current task (Fusaroli et al., 2012; Mills, 2014) to the slower cultural processes of selection and stabilization of linguistic structures and practices useful to control interactions in relevant activities (Rączaszek-Leonardi, 2009). This view carries explanatory potential not only for aspects of emergence of grammar in general but also for the emergence of domain-specific professional argots and even codified linguistic artifacts containing terms and structures selected to enable and facilitate co-action within specific fields of human activity.

This approach to collectivity and the role of language has only quite recently been employed to explain cognitive and linguistic coordination. It charts a field for the study of language in real interactions, over many timescales, utilizing advanced methods for studying complex dynamical systems. Some of the paths in this field are already being empirically explored in a promising way. Recent studies have shown how symbolic constraints can emerge in the course of online interactions (Galantucci, 2005; Fay et al., 2010; Mills, 2014), as well as how they guide the systems' collective task performance (Fowler et al., 2008; Dale et al., 2011; Fusaroli et al., 2012). The synergetic model has proven promising in accounting for the features of on-line communication that best predict performance on simple decision tasks (Fusaroli and Tylén, 2016). However, most studies so far have utilized only simple, one-dimensional tasks, which might have reduced the possible influence of linguistic coordination. Furthermore, questions remain open as to the potential impact of other timescales of language functioning (such as written cultural artifacts). Thus in our study, using the systemic approach sketched above, we aimed to investigate the task-relevant constraining role of language coming from different time-scales in a cooperation involving multidimensional stimuli.

# 2. THE STUDY

The present study was designed to assess the impact of two forms of linguistic involvement on the properties of collective systems formed to solve a recognition task. We will address the following questions: First, does spontaneous linguistic interaction affect behavior of a system in a complex perceptual identification and recognition task? Second, is its behavior further influenced by the use of a linguistic artifact, established on a cultural evolution timescale to facilitate communication and performance on the specific task? The third question concerns the relation between these linguistic influences (spontaneous talk vs. artifact use) and their possible interaction. The hypotheses are formulated regarding both the performance of the collective systems and the kinds of errors that the systems make, which indirectly testify to the internal dynamics of the systems, which render specific discriminatory and recognition capabilities. For this purpose we rely on the bias-variance decomposition framework (Geman et al., 1992; Domingos, 2000). The participants' behaviors will be analyzed in terms of the systemic and multi-scale view introduced above. This means that the object of study will be the systems constituted through the use of various types of linguistic coordinators.

Bias-variance decomposition is a tool that makes it possible to distinguish between bias (systematic error) and variance (random, uncontrolled error). This kind of analysis becomes increasingly important if we consider systems making decisions in open, dynamic environments (Gigerenzer and Brighton, 2009). In our case bias and variance can be treated as indices of the internal dynamics of the system. When we consider systems that learn from interaction with the environment, with each system having slightly different experiences (data sample), variance is connected with the sensitivity of the system to individual samples: a system with high variance will produce very complex rules of judgment, tailored to the specific data it has been exposed to; a system with low variance will produce simpler rules ignoring the specific details of individual samples. High variance implies many internal degrees of freedom, which enable the system to fixate on the specific details of the data, but leads to a loss in the ability to generalize. High bias, on the contrary, relates to a low number of internal degrees of freedom, when the system is unable to cope with the problem's complexity, systematically skewing the system's performance in one direction. To gain intuition about these dependencies, we can think about people with various introspective abilities engaging in common social tasks, for example a person making a decision to take the floor during a large gathering, for instance, a scientific conference. People with low introspective abilities will act according to simple rules: for example, whenever their general confidence level is high they will start talking, failing to notice more subtle contexts, which make their action ill-timed (for instance, another person trying to say something). Their actions will be schematic and they will make mistakes in certain situations (low variance, high bias).
On the other hand, people with high introspective abilities and a complex model of the situation will be very sensitive to fluctuations of their own mood and subtleties of the circumstances. In many cases they will overcomplicate things by analyzing unimportant details, for instance, they will try to predict the mood of all the people in the audience and if their comment really fits the discussion. Their behavior will be flexible but unpredictable, and sometimes the overwhelming number of details will prevent them from taking any action at all (high variance, low bias). This illustrates a notion of bias-variance tradeoff because for a specific problem complex systems with low bias tend to have higher variance and vice-versa. The same phenomena which govern the behavior of an individual occur also on the collective level, which is the case in our study, where error decomposition is applied to provide insights into how different forms of linguistic constraints influence the description and recognition of complex perceptual stimuli.
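To make the decomposition concrete, the following minimal Python sketch computes a decomposition for 0-1 loss on a single recognition item, in the spirit of Domingos (2000). It is an illustration under stated assumptions, not the procedure used in the study: the function name, the wine labels, and the responses are hypothetical, and the "main prediction" is taken to be the most frequent response across systems.

```python
from collections import Counter

def zero_one_decomposition(responses, correct):
    """Decompose 0-1 loss for one item answered by several systems.

    responses: labels chosen by different systems (hypothetical data)
    correct:   the true label for that item
    """
    n = len(responses)
    # Main prediction: the most frequent response across systems.
    main = Counter(responses).most_common(1)[0][0]
    # Bias: 1 if the systems are systematically wrong on this item, else 0.
    bias = int(main != correct)
    # Variance: how often a system departs from the main prediction.
    variance = sum(r != main for r in responses) / n
    # Mean error, for comparison with the two components.
    error = sum(r != correct for r in responses) / n
    return {"main": main, "bias": bias, "variance": variance, "error": error}

# Hypothetical responses from several systems asked to recognise wine "A".
unbiased = zero_one_decomposition(["A", "A", "B", "A", "C"], correct="A")
biased = zero_one_decomposition(["B", "B", "B", "A"], correct="A")
print(unbiased)
print(biased)
```

In the first example the main prediction is correct, so all of the error comes from variance. In the second the main prediction is wrong (bias of 1), and the single deviating response happens to be the correct one, so variance lowers the mean error rather than raising it.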

# 3. DESIGN AND HYPOTHESES

In order to assess how two forms of linguistic collective engagement, i.e., spontaneous conversation and the use of a domain-specific cultural artifact, constrain cognition in a complex recognition situation, we needed a task which would: (i) involve complex, multidimensional stimuli; (ii) be difficult enough to yield sufficient performance variability, (iii) not be widely established in everyday language, but, on the other hand, (iv) have a professional, domain-specific, culturally created argot, codified in a linguistic artifact.

Therefore, we chose a wine tasting and recognition task. While being sufficiently difficult and complex, wine recognition is an intuitive task for most participants, and naturally performed as both a solitary and social activity (Lehrer, 2009). The culture surrounding wine consumption is rich and diverse, and a professional language has been developed for wine description, codified in so-called sommelier cards. This professional language is not widely known, nor does it correspond clearly with the lay, everyday language used in the novice's "wine talk" (Solomon, 1990). Additionally, multiple existing studies on wine perception, description, and recognition provide a useful background that can guide the selection of the participants and materials (Solomon, 1990; Hughson and Boakes, 2002; Lehrer, 2009; Zucco et al., 2011; Royet et al., 2013).

In order to operationalize our main research questions in a wine recognition task, we employed a two-by-two factorial design: individual vs. pairs (where the requirement of joint decision elicited spontaneous linguistic interaction); and the presence vs. absence of a cultural artifact for wine description (a sommelier card). Thus the conditions were as follows: (1) individuals without a sommelier card; (2) individuals with a sommelier card; (3) pairs without a sommelier card; and (4) pairs with a sommelier card.


Performance was measured in terms of recognition accuracy (score, error decomposition) and the quality of wine descriptions. Recognition accuracy was measured across all conditions and errors were analyzed in terms of systems' bias and variance. In the sommelier card conditions we were also able to comparatively assess the properties of wine descriptions, as the sommelier card provided a limited set of dimensions to be quantified. We were especially interested in how the descriptions produced by pairs vs. individuals differed in their coherence (i.e., similarity across participants within the same condition) and in their ability to separate the wine samples (i.e., how little overlap there was between the different descriptions of wines).

We predicted that both kinds of collective engagements (interacting in real time with a partner or with the cultural scaffold of a sommelier card) would lead to increased accuracy in wine recognition. For systems relying on spontaneous interaction, we expected such benefits to arise from jointly created linguistic controls (shared vocabulary attuned to the task) that would guide collective attention to relevant dimensions of the taste experience (Fusaroli et al., 2012; Tylén et al., 2013). We also expected benefits from using the sommelier card. The sommelier card is a tool, which embodies years of professional experience, offering precise dimensions along which the stimuli can be organized. Thus, both pairs and individuals with a sommelier card should outperform their counterparts without it, as they can rely on a history of culturally selected dimensions to guide their descriptions and recognition processes. Whether the benefits of the two types of collectivity would be additive or interact was an open question.

Crucially, the bias-variance framework presented above allows for making predictions about the kind of errors characteristic for each system. Since, as explained above, the role of language is to functionally bind the degrees of freedom of a system, we can expect that adding linguistic constraints can lead to a decrease in variability of a system's performance. In particular, we expected that adding functional constraints in the form of a sommelier card would decrease the systems' degrees of freedom along culturally selected dimensions, therefore producing lower variance. Questions pertaining to individual vs. collective use of sommelier cards remain open for now: on the one hand, using spontaneous language should also constrain a system's degrees of freedom; on the other, the presence of another person may impinge on the complexity of a system in a way that could obscure this influence.

Finally, we also expected differences in the quality of descriptions prepared by individuals and pairs using the sommelier cards. Previous studies have shown that descriptions created by novices show little similarity and systematicity (e.g., Solomon, 1990). Bringing collective resources to the task should result in increased coherence (similarity of objects within one class) and discriminativeness (dissimilarity of objects belonging to different classes) of descriptions created by pairs compared to those created individually.

# 4. MATERIALS AND METHODS

# 4.1. Experimental Task

The task was to smell and taste three target wines in order to recognize them, after a break, among six wine samples. A pilot study was employed to identify an optimal number of wine samples, which would provide enough performance variability with a minimal amount of alcohol to be imbibed. As a wine sample contained 30 ml of wine, each experimental session (1–1.5 h) involved a maximum amount of 270 ml of wine available for consumption. The invited participants were informed that the study involved alcohol consumption, which might influence their driving ability. Participants could measure their blood alcohol level with a breathalyzer. Of the 120 participants, 102 measured their alcohol level, and 90 of the readings were 0. The highest reading observed was 0.19 per mille, which is below the limit for drivers in Poland.

# 4.2. Participants: Recruitment and Demographics

One hundred and twenty-three participants (85 females; one participant did not declare a gender) took part in the experiment. Participants' age ranged from 18 to 40 (M = 23.01, SD = 3.80). The majority of participants were university students or had higher education. Potential participants were contacted mainly through social media. They filled in a questionnaire checking the following inclusion and exclusion criteria: legal age, contraindication to the consumption of alcohol, smell or taste disorders, professional knowledge about wines, frequency of red wine consumption, and fluency in Polish (the questionnaire is provided in Supplementary Material S1.1.1). Informed by studies on the influence of age on olfaction (Doty, 1989; Hummel et al., 2007), we decided to recruit only participants younger than 50 years. Those who met the criteria were invited to participate.

All participants were wine tasting novices, that is, they had only cursory knowledge related to wine culture and possible ways of describing wines. The reasons for this choice were threefold: first, to avoid possible influence of earlier knowledge, which might be present in wine experts (Zucco et al., 2011); second, to avoid a possible verbal overshadowing effect which, according to some studies (Schooler and Engstler-Schooler, 1990; Melcher and Schooler, 1996; Parr et al., 2002) might occur especially when perceptual skills exceed verbal ones, which has been found especially among intermediately skilled participants, see Melcher and Schooler (1996) and Ryan and Schooler (1998). Third, we wanted to be sure that the nature and quality of the vocabulary would indeed be different in the spontaneous conversation and sommelier card conditions of our study, which in the case of experts could not be assured.

Due to these concerns, data from one participant in the individual condition and from one pair were excluded from the analyses because they informed the experimenter of, or demonstrated, extensive knowledge about wines. The final number of cases analyzed for each condition was as follows: Individual/no card: 20; Individual/card: 20; Pair/no card: 21; Pair/card: 19.

# 4.3. Wine Selection

The wines, both target and filler, in the final wine set were dry, red and had rather similar character. The selection was based on decisions of two professional sommeliers and the results of a pilot study. The aim was to maximize resolution in performance and avoid ceiling effects. This resulted in the following wine list:


# 4.4. Sommelier Card

A sommelier card is a cultural artifact that contains linguistic categories, which are used by professionals to judge the quality and the character of wine. Several such tools are presently used across the world, most notably the Associazione Italiana Sommelier card (Italian), Wine and Spirit Education Trust card (English), Deductive Tasting Format (American), or Feuille d'analyse sensorielle (ASNCAP) (French). In this study we chose a slightly simplified Polish version of the Associazione Italiana Sommelier card, which for several years has been used among the Polish sommeliers. This means that the key dimensions used in the card had the Polish terms agreed upon by the Polish sommeliers and used in professional writing. With the help of two professional sommeliers, we removed items that might be misleading for a non-expert because of meanings diverging from everyday language, and included additional explanations for some terms (such as "persistence" or "tannins").

The resulting sommelier card consisted of 21 items (scales and questions) pertaining to taste (9 items), smell (10 items) and general characteristics of wine (2 items). A comment section was included, where the participants could make their own descriptions if they felt it would help them make correct recognitions. An English translation of the sommelier card can be found in Supplementary Material S1.2. Both individuals and pairs were given the same card. Pairs shared a sommelier card and gave their joint answer for each item.

# 4.5. Experimental Procedure

The experiment was conducted following the ethical guidelines for psychological research and was approved by the local ethics committee of the Institute of Psychology, Polish Academy of Sciences. Upon arrival, the participants signed informed consent forms and were assigned to one of the experimental conditions. The experimenter explained the task: to taste and smell wines, in order to recognize and identify them later in a larger set of other wines. Each participant was then provided with three samples of the target wines, each labeled with a number: 1, 2, or 3. The labels were consistent between participants and they were informed about this. **Figure 1** depicts the experimental setup during the learning phase.

In the sommelier card condition, three sommelier cards were provided, one for each target wine. Participants were instructed to use the sommelier cards for wine descriptions only—numbering or otherwise marking them was not permitted.

To prevent subjects from using additional visual cues, such as wine color or consistency, the samples were presented in black, opaque plastic glasses. Sessions involving pairs were recorded using a video camera and voice recorder. There were no time limitations on the learning phase; participants simply signaled the experimenter when ready. The participants would then solve a series of unrelated spatio-visual tasks for approximately 40 min. Subsequently, participants were given six wines (the three targets plus three distractors), labeled with capital Latin letters A–F. Participants had to place the correct numbers on three out of the six presented wines. In the pair condition the participants were to provide a joint answer. No time limit was imposed in any of the conditions.

After completing the experiment, the participants filled in a short survey querying their age, gender, whether they were smokers, the perceived difficulty of the task, and the perceived tastiness of each wine. In the pair condition, the questionnaire contained additional items assessing the relatedness of pair members and the evaluation of the level of cooperation during the session.

The quantity of the wine left was measured. Finally, participants could measure their blood alcohol level with a breathalyzer.

# 5. RESULTS

Raw data from the experiment is included as Supplementary Table S2. Analyses were conducted on three levels: (1) recognition accuracy in four experimental conditions, (2) condition specific patterns of bias and variance, and (3) analyses of the discriminativeness and coherence of the sommelier cards filled by individuals and pairs. Additional analyses assessed the character of the information integration resulting in pairs' wine descriptions. Finally, we provide preliminary data on quantitative aspects of verbal interactions that may have influenced performance.

# 5.1. Recognition Accuracy

Task performance was measured as the number of wines accurately labeled by participants ("identification score"). First, we assessed whether participants performed above chance in the four conditions. Since the wines are chosen simultaneously, not sequentially, calculating the baseline random performance is not trivial—for the description and mathematical formulas see Supplementary Material S1.1.2.
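To illustrate why the chance baseline is not trivial, the following minimal sketch enumerates every way of placing three distinct labels on six glasses and tallies the resulting scores. It assumes that a guessing participant picks each placement with equal probability; this uniformity assumption and the function name `chance_score_distribution` are ours for illustration, and the derivation in the Supplementary Material may rest on different assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def chance_score_distribution(n_wines=6, n_targets=3):
    """Exact score distribution under uniformly random label placement.

    Glasses 0..n_targets-1 are taken to be the true targets for labels
    0..n_targets-1; the remaining glasses are distractors.
    """
    counts = Counter()
    # Every ordered choice of n_targets distinct glasses is one possible answer.
    for placement in permutations(range(n_wines), n_targets):
        score = sum(1 for label, glass in enumerate(placement) if glass == label)
        counts[score] += 1
    total = sum(counts.values())
    return {s: Fraction(counts[s], total) for s in range(n_targets + 1)}


dist = chance_score_distribution()
for score, prob in dist.items():
    print(f"{score}/3 correct: {prob} (about {float(prob):.4f})")
```

Because the labels are placed without replacement, the three identifications are not independent, which is why a simple binomial model would not give the right baseline.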

**Table 1** presents probabilities of obtaining a given score by chance. Observed distributions of scores in specific conditions are given by **Table 2**. We applied a goodness-of-fit test with simulated p-values (5000 samples) to assess whether the scores differ from the random baseline. As can be seen, identification scores in all conditions are very unlikely to be obtained by chance.

TABLE 1 | Probabilities of obtaining particular score value by chance under random performance.

TABLE 2 | Frequencies of wine identification scores, tabulated by condition (N = 80).

The first four columns represent counts of particular scores, e.g., "2/3" means two wines out of three were identified correctly.

In order to compare performance in the four experimental conditions, we performed a modified rank-based Brown-Forsythe test for variance inequality, which is a median-based equivalent of Levene's test that is more robust in the case of non-normal distributions. The obtained p-value (0.0216) indicates that variances differ significantly among groups. Because of unequal variances and discrete, non-normally distributed score values, we used the Kruskal-Wallis test instead of a standard ANOVA to assess central tendencies. The analysis yielded no significant results (p-value = 0.2328).

It is important to note that, even though the average scores in the four conditions do not differ, there is a significant difference in the overall distribution of scores (**Figure 2**). For the conditions with sommelier cards, especially for pairs with a sommelier card, the distribution of scores gravitates toward the middle. These differences between conditions were found to be significant by Fisher's exact test comparing numbers of medium scores (1 or 2 correct recognitions) and numbers of extreme scores (0 or 3) (p = 0.0002).

# 5.2. Bias-Variance Analysis

In the previous section we evidenced important differences in score distributions. To gain insight into the nature of errors, we used the bias-variance decomposition (see Introduction) analyzing placements of individual wines instead of aggregated scores. This allows for the analysis of distinct patterns of error structure in more detail. The procedure treats each system as a classifier in a supervised classification task (i.e., a task in which correct labels are given in the learning phase and in the recognition phase the classifier is expected to reconstruct the correct labeling). Error of the classifier can be attributed to three sources: bias, variance and noise.

We treat systems from each of the experimental conditions as a classifier population and the identification of each wine sample as a single instance of a learning problem. Error decomposition is performed for each population separately, which allows a meaningful comparison between conditions. In this context, bias is a systematic tendency of the systems within a specific condition to confuse two specific wines (answers are systematically skewed), while variance is the diversity of their answers (answers are more random). To define these concepts in a quantitative way, we apply the bias-variance decomposition schema proposed by Domingos (2000). The decomposition has the following form:

$$E(\mathbf{x}) = c_1(\mathbf{x})N(\mathbf{x}) + B(\mathbf{x}) + c_2(\mathbf{x})V(\mathbf{x})$$

where E(x) is the expected error that the classifier makes on sample x, N(x) is noise, B(x) is bias, V(x) is variance and c1(x) and c2(x) are special coefficients dependent on the sample x.

Specific components of error are estimated by averaging over all systems and all samples within each condition. We calculate bias, variance and error for each of the three wines recognized by the participants and average the results. Let us denote y<sup>∗</sup> as the correct class for a given sample, y as the class predicted by an individual system and y<sub>m</sub> as the class most often predicted among all the systems. In this context we assume noise N = 0, that is, wine labels represent the true state of nature. The overall error E is calculated as the fraction of samples identified incorrectly, which is an estimate of P(y ≠ y<sup>∗</sup>). Bias B is the error of the main prediction, i.e., a classification based on the majority vote of all systems in the specific condition: P(y<sub>m</sub> ≠ y<sup>∗</sup>). Variance V is the fraction of answers different from the dominant answer: P(y ≠ y<sub>m</sub>). Coefficient c<sub>2</sub> is a function of sample x. For each sample for which the main prediction is correct (hence bias = 0), c<sub>2</sub> is equal to 1. Otherwise, c<sub>2</sub> = −P(y = y<sup>∗</sup> ∧ y<sub>m</sub> ≠ y<sup>∗</sup>), i.e., it is proportional to the probability of choosing the correct answer due to variance. This means that for a biased classifier variance may actually reduce the error, because it creates an opportunity to predict a label different from the main prediction.
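The estimation procedure described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the analysis code used in the study: the answer data are hypothetical, the function names are ours, and for a biased sample the coefficient c<sub>2</sub> is computed as the negative share of correct answers among the answers that deviate from the main prediction, a normalization we assume here so that E = B + c<sub>2</sub>V holds exactly for each sample when noise is zero.

```python
from collections import Counter


def decompose(predictions, truth):
    """Zero-one-loss decomposition for one wine, assuming zero noise.

    predictions: the glass chosen for this wine by every system in a condition.
    truth: the glass that actually contained the wine.
    Returns (error, bias, variance, c2).
    """
    n = len(predictions)
    # Main prediction y_m: the majority vote (ties go to the first label seen).
    main = Counter(predictions).most_common(1)[0][0]
    error = sum(p != truth for p in predictions) / n
    bias = 1.0 if main != truth else 0.0
    variance = sum(p != main for p in predictions) / n
    if bias == 0.0:
        c2 = 1.0
    else:
        # Assumed normalization: share of correct answers among deviating answers.
        off_main = [p for p in predictions if p != main]
        c2 = -sum(p == truth for p in off_main) / len(off_main) if off_main else 0.0
    return error, bias, variance, c2


def condition_summary(answers, truths):
    """Average error, bias and variance over all wines in one condition."""
    rows = [decompose(answers[w], truths[w]) for w in truths]
    return tuple(sum(r[i] for r in rows) / len(rows) for i in range(3))


# Hypothetical answers of six systems for two target wines (not study data).
truths = {1: "A", 2: "B"}
answers = {
    1: ["A", "A", "B", "A", "C", "A"],  # majority is correct: unbiased
    2: ["D", "D", "B", "D", "B", "E"],  # majority is wrong: biased
}
for wine in truths:
    e, b, v, c2 = decompose(answers[wine], truths[wine])
    print(f"wine {wine}: E={e:.3f} B={b:.0f} V={v:.3f} c2={c2:.3f}")
print("condition means (E, B, V):", condition_summary(answers, truths))
```

In the second hypothetical wine the majority answer is wrong, so the deviating answers lower the error below 1, which is the sense in which variance can help a biased classifier.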

It should be noted that bias and variance estimation is approximate in these experimental data because of (a) the small number of samples (3) identified by each system, and (b) the fact that the tasks of identifying wines 1, 2, and 3 were not really independent. However, even this imperfect estimation allows for a meaningful comparison of different experimental conditions.

Results of the bias-variance decomposition for each condition are presented in **Table 3**. In comparison with individuals without a sommelier card, pairs without a sommelier card have a smaller bias and slightly larger variance, which results in error on roughly the same level. Pairs with sommelier cards, on the other hand, have a larger bias than pairs without a card but much smaller variance. The reduced variance of answers was also visible as reduced variance of scores in previous analyses (see **Figure 2**). Individuals with sommelier cards have a slightly smaller error than individuals without cards, but the difference (due to a slight decrease of variance) is so small that we can say that individuals were mostly unaffected by the use of sommelier cards.

To calculate the statistical significance of the differences we performed a permutation test: we repeatedly (2000 times) randomly split the data into two groups and counted how many times more extreme values of bias and variance were produced. The results presented in **Table 4** suggest that the presence of a sommelier card in the pair condition significantly alters the bias and variance composition, as it is systematically different from all other conditions. In individuals, the sommelier card does not seem to have any influence. These findings motivate a more in-depth analysis of the sommelier card-assisted descriptions produced by pairs as compared to those produced by individuals.
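The logic of such a group-split permutation test can be sketched as follows. The sketch is generic and makes several assumptions: the statistic compared here is the population variance of identification scores (a stand-in for the bias and variance estimates used in the study), the two score lists are hypothetical, and the function name `split_permutation_p` is ours.

```python
import random
from statistics import pvariance


def split_permutation_p(group_a, group_b, statistic, n_splits=2000, seed=0):
    """Share of random splits whose between-group gap is at least as large
    as the gap observed between the two real groups."""
    rng = random.Random(seed)
    observed = abs(statistic(group_a) - statistic(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_splits):
        rng.shuffle(pooled)
        # Re-split the pooled data into two groups of the original sizes.
        a, b = pooled[:len(group_a)], pooled[len(group_a):]
        if abs(statistic(a) - statistic(b)) >= observed:
            hits += 1
    return hits / n_splits


# Hypothetical identification scores (0-3) for two conditions, not study data.
extreme = [0, 3] * 10   # scores pile up at the extremes
middling = [1, 2] * 10  # scores gravitate toward the middle
p_value = split_permutation_p(extreme, middling, pvariance)
print("p-value for the difference in score variance:", p_value)
```

Passing the statistic as a function keeps the resampling loop unchanged when a different quantity, such as an estimated bias, is compared.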

# 5.3. Analysis of Descriptions through Sommelier Cards

The analyses above show that sommelier cards affected the performance of pairs to a greater degree than the performance of individuals. Therefore, the question arises whether we can see this difference also on the collective level through the quality of sets of descriptions produced via sommelier cards by individuals and pairs.

To answer this question we compared coherence and discriminativeness of descriptions prepared by pairs with those prepared by individuals. We had 19 pairs and 20 individuals each filling three sommelier cards, resulting in 57 cards filled in by pairs and 60 cards filled in by individuals. The 21 items from the sommelier card were encoded as a 21-dimensional vector. Since the number of options in each item varied—from two to five—we performed rank normalization: for each item its values were replaced by their ranks in the set of all sommelier cards. This procedure guaranteed that all of the items contributed to the analysis equally, regardless of the number of levels.
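A minimal sketch of the rank normalization step is given below. The handling of ties is not specified in the text, so the sketch assumes that tied answers share their mean rank; the toy cards have two items instead of 21, and the helper names are ours.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share their mean rank (an assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_normalize(cards):
    """Replace each item's values by their ranks across all cards.

    cards: list of equally long vectors, one per filled sommelier card.
    """
    columns = list(zip(*cards))
    ranked_columns = [average_ranks(list(col)) for col in columns]
    return [list(row) for row in zip(*ranked_columns)]


# Three hypothetical cards with two items each (not study data).
toy_cards = [[1, 5], [2, 3], [1, 4]]
print(rank_normalize(toy_cards))
```

After this step every item takes values on a rank scale determined by the number of cards rather than by the number of answer options it offered.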

As a measure of coherence we used a silhouette score (Rousseeuw, 1987). It is based on the idea that, in an informative set of descriptions, descriptions of the same wine should be similar to one another, while descriptions of different wines should be as distinct as possible. For each sample, the silhouette score relates its mean distance from the points belonging to its own class to its mean distance from the points of the closest foreign class. Formally: s = (b − a)/max(a, b), where a is the mean intra-class distance and b is the minimal mean inter-class distance. Silhouette scores look at each sample individually, and the mean silhouette score may be seen as a measure of coherence of the set of descriptions.
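The silhouette formula can be made concrete with a short sketch. It assumes Euclidean distance between card vectors, which the text does not specify, and at least two wine classes; the points and labels are hypothetical and the function name is ours.

```python
from math import dist
from statistics import mean


def silhouette_scores(points, labels):
    """Per-sample silhouette s = (b - a) / max(a, b), Euclidean distance assumed."""
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        distances = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if j != i:
                distances.setdefault(other, []).append(dist(p, q))
        if own not in distances:
            scores.append(0.0)  # a class with a single member gets a score of 0
            continue
        a = mean(distances[own])  # mean intra-class distance
        # Mean distance to the closest foreign class.
        b = min(mean(d) for cls, d in distances.items() if cls != own)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores


# Hypothetical two-dimensional descriptions of two wines (not study data).
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
coherent = mean(silhouette_scores(points, ["w1", "w1", "w2", "w2"]))
scrambled = mean(silhouette_scores(points, ["w1", "w2", "w1", "w2"]))
print("mean silhouette, coherent labels:", round(coherent, 3))
print("mean silhouette, scrambled labels:", round(scrambled, 3))
```

Values near 1 indicate that descriptions of the same wine cluster tightly, whereas values at or below 0 indicate that the wine label carries little information about where a description lies.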

In order to measure the descriptions' discriminativeness, i.e., their usefulness for discriminating different wines, we employed multinomial logistic regression. The independent variables were the 21 sommelier card items, the dependent variable was "wine label," and the model's accuracy in a reclassification scenario was used as a score: the higher the accuracy, the more discriminative the description set. Note that the regression model was not used for inference, but rather as a measure of linear separability.

TABLE 4 | Results of permutation test comparing bias-variance decomposition between different conditions.

FIGURE 3 | Dispersion of filled sommelier cards after rank normalization and transformation with PCA (Principal Component Analysis). Gray lines denote logistic regression decision boundary, the model accuracy is reported below.

**Figure 3** shows dispersion of wine descriptions after rank normalization and dimensionality reduction through PCA. We applied a simple multinomial logistic regression model to look for regularities in the data. The independent variables were top two principal components, the dependent variable was either wine label or experimental condition. We observe a clear difference between descriptions of individuals and pairs (accuracy 0.43 vs. accuracy 0.6), which means that those prepared by pairs are more discriminative.

To obtain more meaningful results we compared differences between the two groups (individuals and pairs) in the original 21-dimensional space. We tested two hypotheses: (1) that the scores in each group are different from those obtained by chance and (2) that the scores of the two groups (pairs and individuals) differ on those measures. Since such a design is beyond the assumptions of standard statistical tests for linear models, the significance of the obtained results was verified using permutation tests with 2000 permutations, conducted independently for each measure.

First, we compared the obtained results with the random baseline for individuals and pairs separately. Class labels of the descriptions were permuted randomly and the number of times when the permuted set outperformed the original one was counted. Results are presented in **Tables 5-I,II**. Pairs performed significantly better than random, while individual descriptions are on the baseline level. This means that, according to our criteria, on the population level the information content of individual descriptions is close to none.

The next step was to compare pairs and individuals directly. In each split of the permutation test we divided all the systems into two groups randomly. Then we calculated the value of each measure for each group. We counted the number of times when the obtained values were more diverse than those found between pairs and individuals. P-values returned by the described test are reported in **Table 5-III**. Significant differences were obtained both for silhouette scores and logistic regression reclassification accuracy. This suggests that descriptions made by pairs were both more coherent and more distinctive, allowing for a better classification than descriptions made by individuals.

TABLE 5 | Comparison of coherence and discriminativeness of descriptions prepared by individuals and by pairs.

P-values for various dispersion metrics obtained through group split permutation test. Data after rank normalization.

An additional analysis was performed in order to gain more insight on how the information integration process occurred in pairs. One of the simplest possible mechanisms for the participants would be filling out the sommelier cards individually and then averaging the answers to obtain a pair decision. To test whether the participants could have employed such a procedure, we constructed artificial data points by randomly pairing and averaging points corresponding to the sommelier cards filled by individuals. We performed a permutation test comparing such synthetic sommelier cards with cards prepared by the real pairs. We randomly paired individual experiment participants to construct 10 × 3 synthetic sommelier cards and compared them with 10 × 3 sommelier cards produced by 10 randomly selected individuals/pairs. We compared the scores for both groups and counted the number of cases when the score obtained for synthetic pairs was larger than the score obtained for real pairs. The experiment was repeated 2000 times. P-values returned by the described tests are reported in **Table 5-IV**. Data from the real pairs were significantly more coherent than the synthetic data (p = 0.02), while the difference in discriminativeness was on a tendency level (p = 0.07). These results demonstrate that the mechanism of information aggregation employed by the participants who collectively filled in sommelier cards is more complex than simple averaging. Through the conversation they were able to effectively combine information from their senses and their understanding of sommelier card categories to improve the quality of their descriptions.
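The construction of synthetic pairs by averaging can be sketched as follows. The data layout (one list of three card vectors per individual), the toy card values and the function name `synthetic_pair_cards` are assumptions made for illustration; only the idea of randomly pairing individuals and averaging their cards item by item comes from the description above.

```python
import random


def synthetic_pair_cards(individual_cards, n_pairs, seed=0):
    """Average the cards of randomly paired individuals, wine by wine.

    individual_cards: one entry per individual, each a list of card vectors
    (one vector per target wine). Returns n_pairs synthetic entries of the
    same shape.
    """
    rng = random.Random(seed)
    # Draw 2 * n_pairs distinct individuals and pair them off in order.
    chosen = rng.sample(range(len(individual_cards)), 2 * n_pairs)
    synthetic = []
    for k in range(n_pairs):
        first = individual_cards[chosen[2 * k]]
        second = individual_cards[chosen[2 * k + 1]]
        synthetic.append([
            [(x + y) / 2 for x, y in zip(card_a, card_b)]
            for card_a, card_b in zip(first, second)
        ])
    return synthetic


# Hypothetical data: 20 individuals, 3 wines, 4 items per card (not study data).
data_rng = random.Random(1)
individuals = [
    [[data_rng.randint(1, 5) for _ in range(4)] for _ in range(3)]
    for _ in range(20)
]
synthetic = synthetic_pair_cards(individuals, n_pairs=10)
print(len(synthetic), "synthetic pairs,", len(synthetic[0]), "cards each")
```

The resulting synthetic cards can then be scored with the same coherence and discriminativeness measures as the cards of real pairs.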

# 5.4. Analysis of Verbal Interactions

In addition to performance data and analyses of the sommelier cards, we transcribed the video recordings, which allowed for quantitative characterization of pairs' verbal exchanges. We manually annotated the transcripts by assigning predesignated categories to words and phrases according to their communicative function. This, in turn, allows us to investigate more semantic and pragmatic aspects of pairs' collaborative exchanges. The most important category used in our present analyses was the "descriptor" category, which contained all vocabulary items used to describe properties of specific wines (taste, smell, etc.). The details of the transcription procedure and the full list of categories can be found in the Supplementary Material S1.1.3. Below we present preliminary observations from the analyses of the transcripts, to investigate which properties of the linguistic interactions are systematically related to pairs' performance.

In order to determine whether performance could be explained by the volume of verbal exchange, we examined the number of phrases and words used in the conversations (referred to as phrase count and word count), as well as the duration of the conversation. For each of these measures two Kendall's rank correlation (tau) tests were performed, separately for the card and no-card condition. No significant relationship was found for any of these measures.

As a measure of conciseness of conversation we used the mean of the logarithm of the length of utterance (short: MLU). We calculated lengths of uninterrupted utterances by a single speaker. Conciseness of conversation can be seen as an indicator of more efficient communication and language use, as less talk is required to convey the information in a single turn (Wilkes-Gibbs and Clark, 1992; Clark, 1996). Hence, we can expect that MLU would correlate negatively with performance. Kendall's rank correlation test revealed a significant negative correlation between MLU and identification score in the sommelier card condition (r<sub>τ</sub> = −0.38, p = 0.039). For the no-card condition the correlation was not significant (r<sub>τ</sub> = −0.29, p = 0.098). The difference in MLU between card and no-card conditions was not significant according to a t-test (t = 1.88, p = 0.07). These analyses suggest that it was not the quantity of talk that influenced performance, but rather the qualitative aspect of the exchange.
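The MLU measure can be illustrated with a short sketch. The text does not state the unit of utterance length or the base of the logarithm, so the sketch assumes length in whitespace-separated words and the natural logarithm; the utterances are invented and the function name is ours.

```python
from math import log
from statistics import mean


def mean_log_utterance_length(utterances):
    """Mean of log(utterance length), with length counted in words (assumed)."""
    lengths = [len(u.split()) for u in utterances]
    # Empty turns have no defined logarithm and are skipped.
    return mean(log(n) for n in lengths if n > 0)


# Hypothetical turns from one conversation (not taken from the transcripts).
turns = ["this one is quite sour", "yes", "more tannins than the first", "agreed"]
print("MLU:", round(mean_log_utterance_length(turns), 3))
```

Taking the logarithm before averaging reduces the influence of a few very long turns on the measure.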

Next, we analyzed the vocabulary used by the participants to describe wine properties. From earlier work (Fusaroli et al., 2012; Fusaroli and Tylén, 2016) we expected to see a certain "homing in on" important dimensions for the particular task in more successful pairs when compared to those that were less successful. We therefore analyzed the elements of transcripts that were classified into the "descriptor" category. We expected the consistent use of wine-related vocabulary to be correlated with recognition performance and that more consistency would be displayed by pairs with sommelier cards. To measure the vocabulary consistency we introduced three measures:


Differences in those measures between experimental conditions are given in **Table 6**, and correlations with performance in **Table 7**. We can see that in the card condition vocabulary is more concise and consistent (significant effects for TTR and CVS), and that consistency correlates positively with performance only in the no-card condition (significant effects for CVP and CVS).

The results of these analyses suggest that consistency in description is important for the wine recognition task. Pairs with sommelier cards used more consistent vocabulary, and displayed characteristics similar to those of the most successful pairs without cards. The correlations in the card condition were probably not visible because consistency was already very high (in a sense, "forced" by the card).

# 6. DISCUSSION

In this paper we aimed to use a systemic approach to investigate the task-relevant constraining role of language coming from different time-scales. The timescales involved in our experiment included the biological level, connected with innate perceptual capabilities of individuals, the cultural level, including established categories in wine language, individual experience with similar types of stimuli (wine), and finally the time scale of real-time events consisting of the learning phase and the recognition phase. We controlled biological, cultural and individual experience scales to some degree through our recruitment procedure. Effects in the recognition phase were observed as the systems' performance, analyzed both as mean error and through bias-variance error decomposition. Additionally, we obtained insights into the learning phase by analyzing descriptions prepared using the sommelier cards. Types of collectivity on the here-and-now scale included an individual condition (no additional information) and a pair condition featuring spontaneous communication between participants (information integration in a pair). We also introduced a cultural-level constraint on collectivity through the use of an external artifact (sommelier card). We investigated different aspects of memory and decision making by experimentally manipulating these two central factors (collectivity and timescales).

Our analyses revealed that the levels of the first factor, representing different types of collectivity, have little impact on averaged performance scores in the recognition phase. However, they still have a significant impact on the systems' properties, as evidenced by condition-specific patterns of error revealed through bias-variance decomposition.




It turned out that the pair condition did not reduce the overall error. The variance of pairs without a sommelier card was not smaller than the variance of individuals (i.e., spontaneous conversation did not seem to constrain the system). If the participants had tried to solve the problem independently and then reported the average of their answers, the variance should have decreased [according to the Bienaymé formula, the variance of the mean of uncorrelated variables with the same variance is that variance divided by the number of variables (Hoey and Goetschalckx, 2010)]. This suggests that participants chose a different strategy and made an attempt to adapt to and complement each other. The lack of decrease in variance indicates that the language in spontaneous communication did not provide significant constraints. It is possible that the participants were able to influence each other but, having no experience in wine talk and wine language, had difficulties communicating the precise meaning. Benefits from communication have been argued to occur only when pairs are able to use locally aligned terms that are relevant for a given task (for example, Fusaroli et al., 2012 showed how pairs through verbal interaction calibrated their individual levels of confidence to inform joint decisions) and when the created conceptualizations are consistent during the whole performance. Perhaps in the case of pairs without the sommelier cards dealing with very complex stimuli, the time of the session was too limited for common dimensions to emerge and what we see is the "scouting" phase for useful terms. Indeed, our analyses of recording transcripts revealed that pairs without sommelier cards were using less consistent vocabulary than pairs with cards. Among pairs without cards, those which managed to establish some consistency and sharing of the vocabulary were more successful.
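For reference, the Bienaymé argument invoked above can be written out explicitly. The following is a sketch that assumes, for illustration only, that each participant's answer can be treated as a numeric random variable X<sub>i</sub> with a common variance σ<sup>2</sup> and that the answers are uncorrelated; the categorical labels used in the wine task fit this picture only by analogy.

$$\operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{\sigma^{2}}{n}$$

For a pair, n = 2, so simple averaging of independent answers would be expected to halve the variance relative to individuals.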

Pairs with sommelier cards were characterized by the lowest variance and a bias only slightly larger than that of pairs without cards. Thus, the lower variance was likely due to useful constraints provided by the sommelier card's linguistic categories, which reduced the number of degrees of freedom of the system. By organizing their communication around these categories, pairs were able to share their insights more reliably and precisely. Such results in collective decision making have been shown in earlier research for less complex tasks (e.g., Bahrami et al., 2010; Denkiewicz et al., 2013), and theoretical models have been developed to test which method of information integration has been used, e.g., weighted confidence sharing (Sorkin et al., 2001; Bahrami et al., 2010). It is possible that the present task requires more complex information integration models. This matter requires further investigation.

In the individual condition, the sommelier card provided slight constraints, reducing the variance and the overall error; however, this effect was very small and did not reach significance. Thus, the external language categories had a large impact on pairs, constraining their communication, but did not influence individuals, who did not have to share their experiences to jointly produce a description.

Further analyses revealed that the constraining effects of collectivity were already detectable in the learning phase. Analyses of the coherence and discriminativeness of individuals' descriptions revealed them to be indistinguishable from a random baseline, which means that individuals were not able to use this cultural tool effectively. Sommelier card-assisted descriptions produced by pairs were both more coherent and more discriminative than the descriptions produced by individuals. Importantly, informative sommelier cards cannot be produced simply by averaging scores from non-interacting individuals, thus bypassing true social interaction. This finding indicates that collective benefit effects are contingent on genuine dialogical interaction dynamics (Bahrami et al., 2010; Denkiewicz et al., 2013).

In the introduction, we left it open whether the two target factors 'collectivity' and 'engagement of a cultural artifact' would affect the behavior of the participants in a purely additive or an interactive manner. Our data suggest that the influence is interactive rather than additive. Individuals did not seem to benefit from the sommelier cards, and the card-based descriptions they prepared were not as discriminative as the descriptions prepared by pairs. This observation suggests that rather than an external memory aid, we should consider the sommelier card as a linguistic tool beneficially constraining communication. The number of degrees of freedom cannot be reduced solely by means of using professional verbal categories; the meanings of those categories have to be negotiated and clarified in interaction. While analyses suggest that the benefit of pairs with sommelier cards is contingent upon interaction, it remains open whether such effects are due to the co-creation of a shared description vocabulary. This will be the subject of further analysis of already gathered transcript data.

Interestingly, the clear and significant collective benefit for the quality of descriptions resulted in only a slight increase in performance, which did not reach significance. The framework we used in this paper also provides a useful tool for gaining insights into the intraindividual process of information integration: taking into account different modalities within an individual can also be interpreted in terms of a "collective" system. Inclusion of multiple modalities links smoothly with research on the so-called verbal overshadowing effect. The verbal overshadowing effect, although replicated by many (Schooler and Engstler-Schooler, 1990; Schooler et al., 1996, 1997; Dodson et al., 1997; Finger and Pezdek, 1999), is considered controversial by others (Yu and Geiselman, 1993; Meissner and Brigham, 2001; Memon and Rose, 2002; Memon et al., 2003). It has been shown that intermediate-level individuals (non-novices and non-experts) who formulate detailed verbal descriptions of complex nonverbal stimuli experience detrimental effects on recognition in comparison with those who do not formulate such descriptions (Melcher and Schooler, 1996; Ryan and Schooler, 1998). By inviting naive participants, we tried to minimize the possible influence of this effect; however, the combination of better-quality descriptions for the pairs (good verbal coordination) with the lack of an increase in performance points to this factor as one of the possible causes.

Future work should investigate the overshadowing effect using the bias-variance decomposition framework. This should give a clear answer as to whether intermediate-level individuals formulating verbal descriptions suffer from increased bias or increased variance. In this case, increased bias would mean that verbal categories indeed overshadow (constrain too strongly) the ability to sensorily distinguish samples. Increased variance, on the other hand, would mean that sensory and verbal categories add up, resulting in too many degrees of freedom. This opens possibilities for future research.

In this work we performed some basic analyses of the communication transcripts in terms of the amount of verbal exchange and vocabulary consistency. In the future we are also planning to look deeper into the communication transcripts of particular pairs to gain insights into the specific factors that constitute successful communication. For example, we plan to check how the dynamics of the conversation unfold and whether better performance is linked to linguistic alignment on key terms over time.

# 7. CONCLUSIONS

Our study showed an impact of linguistic interaction in a complex recognition task, although the evidence for a performance benefit stemming from such interaction is not conclusive. Analyzed on the systemic level, this impact can be understood in terms of the kinds of error that various types of collective systems are prone to, which in turn are indicative of the number of degrees of freedom of a system performing the task. In this particular scenario, unconstrained communication between members of a pair did not constrain the system, while adding a sommelier card to the pairs' task beneficially reduced the system's degrees of freedom. Thus, we have demonstrated how constraints from different types of linguistic interaction (spontaneous vs. utilizing a cultural artifact created on a slower timescale) influence the system differently.

It is important to note that the effects obtained pertain to the systemic level of the linguistically mediated interactions that were created in our study. Analysis on this level, with the use of methods such as bias-variance decomposition, can be informative and helpful as a source of hypotheses about the individual-level cognitive processes that are present in such tasks.

Linking these levels of explanation (individual and collective) is crucial for understanding how language functions as a social coordination tool.

# AUTHOR CONTRIBUTIONS

JZ performed the experiments, analyzed the data, wrote the paper, reviewed drafts of the paper. MD performed the experiments, analyzed the data, wrote the paper. AD, AR, JK, PL, MS, AK, ES, KK performed the experiments, wrote the paper. RF, KT conceived and designed the experiments, reviewed drafts of the paper. JR conceived and designed the experiments, performed the experiments, wrote the paper, reviewed drafts of the paper.

# FUNDING

This work has been funded by European Science Foundation EuroCORES EuroUnderstanding grant DRUST 888/N-EuroUnder/2011/0 to the last author. JZ received funding from Operational Programme Human Capital "Information technologies: Research and their interdisciplinary applications" agreement no UDA-POKL.04.01.01-00-051/10-00 under the European Social Fund and Polish National Science Centre grant number 2015/16/T/ST6/00493. RF and KT were funded by Interacting Minds Centre, Aarhus University. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2016.01321


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Zubek, Denkiewicz, Dębska, Radkowska, Komorowska-Mach, Litwin, Stępień, Kucińska, Sitarska, Komorowska, Fusaroli, Tylén and Rączaszek-Leonardi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Impairments of Social Motor Synchrony Evident in Autism Spectrum Disorder

Paula Fitzpatrick<sup>1</sup> \*, Jean A. Frazier<sup>2</sup> , David M. Cochran<sup>2</sup> , Teresa Mitchell<sup>2</sup> , Caitlin Coleman<sup>1</sup> and R. C. Schmidt<sup>3</sup>

<sup>1</sup> Department of Psychology, Assumption College, Worcester, MA, USA, <sup>2</sup> Department of Psychiatry, University of Massachusetts Medical School, Worcester, MA, USA, <sup>3</sup> Department of Psychology, College of the Holy Cross, Worcester, MA, USA

Social interactions typically involve movements of the body that become synchronized over time, and both intentional and spontaneous interactional synchrony have been found to be an essential part of successful human interaction. However, our understanding of the importance of temporal dimensions of social motor synchrony in social dysfunction is limited. Here, we used a pendulum coordination paradigm to assess dynamic, process-oriented measures of social motor synchrony in adolescents with and without autism spectrum disorder (ASD). Our data indicate that adolescents with ASD demonstrate less synchronization in both spontaneous and intentional interpersonal coordination. Coupled oscillator modeling suggests that ASD participants assembled a synchronization dynamic with a weaker coupling strength, which corresponds to a lower sensitivity and decreased attention to the movements of the other person, but did not demonstrate evidence of a delay in information transmission. The implication of these findings for isolating an ASD-specific social synchronization deficit that could serve as an objective, bio-behavioral marker is discussed.

Edited by:

Hanne De Jaegher, University of the Basque Country, Spain

#### Reviewed by:

Bert Timmermans, University of Aberdeen, UK

Mohamed Chetouani, Pierre-and-Marie-Curie University, France

#### \*Correspondence:

Paula Fitzpatrick pfitzpat@assumption.edu

#### Specialty section:

This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology

Received: 15 March 2016 Accepted: 18 August 2016 Published: 31 August 2016

#### Citation:

Fitzpatrick P, Frazier JA, Cochran DM, Mitchell T, Coleman C and Schmidt RC (2016) Impairments of Social Motor Synchrony Evident in Autism Spectrum Disorder. Front. Psychol. 7:1323. doi: 10.3389/fpsyg.2016.01323

Keywords: social synchrony, autism spectrum disorders, social dysfunction, social dynamic behavior, coupled oscillators

# INTRODUCTION

Individuals with autism spectrum disorders (ASD) exhibit numerous impairments in social interaction that typically persist throughout adolescence and adulthood (Ballaban-Gil et al., 1996; Howlin et al., 2004; Billstedt et al., 2005; Eaves and Ho, 2008; American Psychiatric Association [APA], 2013). These deficits impact mental and physical development, learning, and behavioral functioning across settings and are the main reason that even high functioning individuals have difficulty contributing to the workforce in adulthood (Arnett, 2000; Howlin et al., 2004). Past research has found that weaknesses in social competence of individuals with ASD comprise deficits in a number of areas including social cognitive (Baron-Cohen, 1995) and social perceptual processes (Klin et al., 2002). These deficits, however, are difficult to treat and there are few evidence-based interventions available to target them. One noteworthy characteristic of social interactions that has not been the focus of much research is the coordination and timing of bodies that occur in jointly created actions. For example, when two people carry on a conversation, they take turns speaking and synchronize their hand gestures (Wilson and Wilson, 2005; Louwerse et al., 2012), or they match each other's stride length and step in synchrony when walking together (van Ulzen et al., 2008). The temporal nature of such social motor synchronization remains an overlooked dimension of social communication in ASD research.

This is unfortunate because there is a large body of research that suggests that how we move our body or express ourselves via our body "language" has a substantial impact not only on how others perceive us, but also on our own mental states and physiological well-being. For example, synchronizing one's body with another person has been found to be vital for maintaining critical aspects of successful social interaction including interpersonal responsiveness, social rapport and other-directedness (Bernieri et al., 1994; Lakin and Chartrand, 2003), positive self-other relations (Miles et al., 2009; Seger and Smith, 2009), cooperation (Wiltermuth and Heath, 2009; Valdesolo et al., 2010; Reddish et al., 2013, 2014), and verbal communication and comprehension (Semin, 2007; Shockley et al., 2009).

Consequently, this interpersonal synchrony can be thought to reflect psychological connectedness and research with adults has found that interpersonal synchrony breaks down in social pathology. For example, breakdowns in synchronization are associated with marital dissatisfaction (Julien et al., 2000), as well as psychological disorders such as schizophrenia (Ramseyer and Tschacher, 2011; Varlet et al., 2012) and borderline personality disorder (Gratier and Apter-Danon, 2009). The psychological importance of social synchronization is also underscored by research that found that manipulating an individual's body into different poses has the ability to change their perception, emotions, and even impact physiological changes within an individual (Strack et al., 1988; Carney et al., 2010). That is, the way we move our body influences our own mental and physiological states, the social judgments others make of us and can consequently foster or inhibit the social connection we have with others.

In its broadest sense, interpersonal synchronization can be defined as "a range of social communication activities and constructs including joint attention, imitation, turn-taking, nonverbal social communicative exchanges, affect sharing and engagement" (Charman, 2011). Such social communication requires synchronization in both time and content (Kinsbourne and Helt, 2011; Delaherche et al., 2012). Given that impairments in social interaction and communication are core features of ASD (American Psychiatric Association [APA], 2013), the role of various aspects of social synchronization, broadly defined, has been the focus of much research.

For example, imitation has been widely studied because imitation is thought to be a precursor to more complex social cognition such as joint attention and understanding agency (Meltzoff, 1990, 2009). A number of researchers have found that imitation is disrupted in ASD and have proposed that an atypically functioning mirror neuron system may be the underlying mechanism (Rogers and Pennington, 1991; Charman et al., 1997; Williams et al., 2001; Rogers et al., 2003; Gallese, 2006; Oberman and Ramachandran, 2007; Colombi et al., 2009; Rizzolatti and Fabbri-Destro, 2010). Other research, however, suggests that some children with ASD do not have deficits in imitative movements and that the mirror neuron system of the social brain may not be damaged (Hamilton et al., 2007; Gowen et al., 2008; Fan et al., 2010). Similarly, Kinsbourne and Helt (2011) argue that individuals with ASD are capable of imitation but just produce it less frequently, especially in more naturalistic situations. They suggest this is due not to imitation problems and a damaged mirror neuron system but rather to a lack of social attention. Moreover, Gernsbacher et al. (2008) and Dowd et al. (2010) have suggested alternate processes, namely, motor control problems, as potentially important for understanding social interaction.

Much social synchronization research has been concerned with coding of whether general activity or certain behavioral contents are synchronized (i.e., similar gestures are occurring together or at a lag). Condon was one of the early researchers to code for the timing of general activity and he proposed that synchronized bodily coordination was disturbed in social pathologies generally and in particular in children with ASD (Condon, 1982). Similarly, Trevarthen and Daniel (2005) coded for synchronous emotional arousal, initiation, and changing of attention and reported a lack of reciprocity in the parent–child exchanges of an infant with ASD, and Feldman's (2007) research, based on coding of mutual gaze, shared attention, and arousal, has found that synchronization is predictive of social outcomes such as attachment and empathy. Furthermore, Oberman et al. (2009) found that children with ASD differed in latency to produce facial mimicry, but not in the amount they mimicked, suggesting problems in interpersonal synchrony may be due to disruptions in timing. A breakdown of the temporal synchronization of specific kinds of speech behaviors has also been reported in adolescents with ASD. For example, Feldstein et al. (1982) found that the ability of adolescents with autism to synchronize the timing of their speech to that of a conversational partner was poor, and de Marchena and Eigsti (2010) discovered that adolescents with ASD do not synchronize gestures with speech.

The research reviewed above relies largely on a behavioral coding of specific gestures (content) to evaluate social synchronization. Such behavioral coding methods rely on identifying discrete segments of behavior and analyzing the sequencing or timing between them but are time consuming to perform and rely on highly skilled coders. Moreover, such behavioral coding is not particularly well-suited for understanding the full temporal patterning of social synchronization in that it is discrete and not fine-grained enough to capture the complex, time-dependent dynamic organization of interpersonal synchrony. Consequently, a methodology that investigates the "process" of the social activity generally (rather than specific behaviors) in order to ascertain the time-unfolding nature of social interaction may provide measures with more resolution that might deepen our understanding of social synchronization in general and its deficits in ASD specifically.

A coordination dynamics approach to behavior (Kelso, 1995) provides a framework for the development of such a research methodology. This approach involves recording continuous time-varying process measures of behavior as they unfold and then analyzing the dynamical structure of behavior using time-series analysis techniques. These techniques allow for a more discerning measurement of behavioral coordination by evaluating the synchronization (patterning and strength) of system components as they change over time (Haken et al., 1985; Strogatz, 2003). The temporal resolution of this approach allows for the capture of subtle dimensions of coordination that are typically missed by gross outcome measures. The ability to index subtle changes in the patterning and stability of coordination will allow us to determine whether such differences are related to the variations in social competence that are observed in adolescents with ASD.

This coordination dynamics approach has been used to model social coordination (Schmidt and Richardson, 2008; Schmidt et al., 2011). For example, both intentional coordination (directed by an explicit social goal of the people interacting) as well as spontaneous coordination (outside of the awareness of the two people interacting) of the movements of two people interacting have been modeled using a coupled oscillator dynamic for both simple laboratory tasks (Schmidt et al., 1998, 2011; Richardson et al., 2005) as well as more naturalistic interactions (Schmidt et al., 2012, 2014). In the dynamical modeling of this interpersonal synchronization, individual limbs of the two people are treated as embodying oscillators that are linked via perceptual coupling (Richardson et al., 2005; Schmidt et al., 2011, 2012; Schmidt, 1988, unpublished).

A task that has been used to study both intentional and spontaneous interpersonal coordination is a methodology in which two people coordinate handheld pendulums swung from the wrist joint in the sagittal plane (using radial-ulnar abduction–adduction). This methodology has demonstrated that the strength of interpersonal synchronization is dependent upon many different physical as well as psychological variables (see Schmidt and Richardson, 2008 for a review) and can be understood in terms of a dynamical model of synchronization.

A synchronization dynamic model (Varlet et al., 2012) used to understand pendulum coordination utilizes a non-linear coupling of two limit-cycle oscillators:

$$\begin{aligned} \ddot{x}_1 + \delta \dot{x}_1 + \lambda \dot{x}_1^3 + \gamma x_1^2 \dot{x}_1 + \omega^2 x_1 &= K_1 \left[ \dot{x}_1 - \dot{x}_{2\tau_1} \right] \left[ a + b \left( x_1 - x_{2\tau_1} \right)^2 \right] \\ \ddot{x}_2 + \delta \dot{x}_2 + \lambda \dot{x}_2^3 + \gamma x_2^2 \dot{x}_2 + \omega^2 x_2 &= K_2 \left[ \dot{x}_2 - \dot{x}_{1\tau_2} \right] \left[ a + b \left( x_2 - x_{1\tau_2} \right)^2 \right] \end{aligned} \tag{1}$$

where $x_1$ and $x_2$ represent the positions of the two oscillators and the dot notation represents the derivative with respect to time. The left side of the equations represents the limit cycle dynamics of each oscillator, determined by a linear stiffness parameter ($\omega^2$) and damping parameters ($\delta$, $\lambda$, $\gamma$), and the right side represents the coupling function, determined by the strength parameters $a$ and $b$. This model predicts that even if the two pendulums have different (inherent) eigenfrequencies (which can be induced by manipulating the length or mass of the pendulum), two people asked to coordinate the movements of the two pendulums are able to do so and achieve a common tempo. However, the person swinging the pendulum that prefers to move more slowly (i.e., the one with the lower eigenfrequency) lags slightly behind the person swinging the pendulum that prefers to move faster. The magnitude of this lagging and leading (phase shift) is determined by the interplay of the difference between the eigenfrequencies of the pendulums (the degree of frequency detuning) and two model parameters: one that indexes the coupling strength of the two oscillators ($K_1$ and $K_2$, corresponding to the coupling strengths of oscillators 1 and 2) and another that indexes the rate (delay/advance) of information transmission ($x_{2\tau_1}$ and $\dot{x}_{2\tau_1}$ correspond to the position and the velocity of oscillator 2 at a previous time point $t-\tau_1$, and $x_{1\tau_2}$ and $\dot{x}_{1\tau_2}$ correspond to the position and the velocity of oscillator 1 at a previous time point $t-\tau_2$).
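The qualitative prediction that weaker coupling produces a larger phase lag under frequency detuning can be explored numerically. The following is a minimal sketch rather than the authors' fitting procedure. All parameter values are illustrative and not taken from Varlet et al. (2012); each oscillator is given its own eigenfrequency (`omega1`, `omega2`), whereas Equation (1) writes a single ω; integration uses a simple semi-implicit Euler scheme; and the phase is read from the phase-plane angle instead of a Hilbert transform. The function names `simulate` and `mean_lead_deg` are hypothetical.

```python
import math


def simulate(k1, k2, omega1, omega2, tau1=0.0, tau2=0.0,
             delta=-1.0, lam=0.02, gamma=1.0, a=-1.0, b=1.0,
             dt=0.001, duration=40.0):
    """Integrate the two coupled limit-cycle oscillators (illustrative values)."""
    n_steps = int(duration / dt)
    d1, d2 = int(round(tau1 / dt)), int(round(tau2 / dt))  # delays in samples
    x1, v1, x2, v2 = [1.0], [0.0], [1.0], [0.0]            # start in-phase
    for i in range(n_steps):
        j1, j2 = max(i - d1, 0), max(i - d2, 0)            # delayed sample indices
        # Coupling terms use the partner's (possibly delayed) state.
        c1 = k1 * (v1[i] - v2[j1]) * (a + b * (x1[i] - x2[j1]) ** 2)
        c2 = k2 * (v2[i] - v1[j2]) * (a + b * (x2[i] - x1[j2]) ** 2)
        acc1 = (c1 - delta * v1[i] - lam * v1[i] ** 3
                - gamma * x1[i] ** 2 * v1[i] - omega1 ** 2 * x1[i])
        acc2 = (c2 - delta * v2[i] - lam * v2[i] ** 3
                - gamma * x2[i] ** 2 * v2[i] - omega2 ** 2 * x2[i])
        # Semi-implicit Euler: update velocity first, then position.
        new_v1, new_v2 = v1[i] + acc1 * dt, v2[i] + acc2 * dt
        v1.append(new_v1)
        v2.append(new_v2)
        x1.append(x1[i] + new_v1 * dt)
        x2.append(x2[i] + new_v2 * dt)
    return x1, v1, x2, v2


def mean_lead_deg(x1, v1, x2, v2, omega1, omega2, start):
    """Circular mean of (phase 1 - phase 2) in degrees; positive means 1 leads."""
    s = c = 0.0
    for i in range(start, len(x1)):
        theta1 = math.atan2(-v1[i] / omega1, x1[i])
        theta2 = math.atan2(-v2[i] / omega2, x2[i])
        s += math.sin(theta1 - theta2)
        c += math.cos(theta1 - theta2)
    return math.degrees(math.atan2(s, c))


OMEGA_FAST, OMEGA_SLOW = 6.0, 5.7   # rad/s, hypothetical eigenfrequencies
SKIP = 20000                        # discard the first 20 s as transient

strong = simulate(2.0, 2.0, OMEGA_FAST, OMEGA_SLOW)
weak = simulate(1.0, 1.0, OMEGA_FAST, OMEGA_SLOW)
lead_strong = mean_lead_deg(*strong, OMEGA_FAST, OMEGA_SLOW, SKIP)
lead_weak = mean_lead_deg(*weak, OMEGA_FAST, OMEGA_SLOW, SKIP)
print(f"lead of faster oscillator: K=2 -> {lead_strong:.1f} deg, K=1 -> {lead_weak:.1f} deg")
```

Under these assumptions the faster oscillator should lead, and halving the coupling strength should enlarge that lead. Passing non-zero `tau1` or `tau2` offers one way to explore how a transmission delay shifts the lag further.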

Using such a dynamical model to understand how synchrony breaks down in social deficits has the distinct advantage of allowing one to infer which dynamical components of the model are underlying the impairment. Varlet et al. (2012) adopted this strategy and found that individuals with schizophrenia exhibited both a lower coupling strength and an information transmission delay when performing intentional interpersonal coordination. However, they did not find any disruptions in spontaneous interpersonal coordination. These findings suggest that individuals with schizophrenia may not be attending to others or may be delayed in their responses during social interactions when they are interacting with them under an explicit social goal to coordinate. Del-Monte et al. (2013) have extended this work and found that first-degree relatives of patients with schizophrenia demonstrate the same overall pattern of synchronization impairments as individuals with schizophrenia. Namely, the first-degree relative pairs also demonstrated larger phase lag and greater variability but only for the intentional rhythmic coordination of pendulums. The results of these two studies suggest that social motor synchronization may be part of schizophrenia's core deficits and may provide a bio-behavioral marker for the disorder.

Relatedly, to demonstrate the feasibility of using dynamical measures of social synchronization to investigate the social deficit in those with ASD, Fitzpatrick et al. (2013a,b) designed a battery of movement tasks to investigate the dynamics of social synchrony in children (6–10 years old) with ASD. They also utilized traditional cognitive measures of social competence (joint attention, theory of mind, intentionality, and cooperation) and several social motor measures including imitation, synchronization and an interpersonal hand-clapping game. Findings yielded significant relationships between social cognitive and social synchrony measures and a principal components analysis revealed three different factors (social attention, social knowledge, and social action) as important for characterizing embodied social competence. These findings suggested that social competence is a complex construct and identified social synchrony as a potentially important pathway for understanding the social problems of children with ASD.

Taken together, the research discussed above suggests that social synchronization is a potentially important pathway for understanding the social problems characteristic of people with social deficits. The current study extends the previous work by employing a pendulum coordination task to examine the content and timing of social motor synchronization of adolescents with ASD. The aim of this study is to determine whether adolescents with ASD exhibit an interpersonal synchrony deficit and whether this can be used to differentiate adolescents with and without ASD. In particular, we are evaluating (a) whether disruptions are evident in both intentional and spontaneous coordination; and (b) which components of the coupled oscillator dynamic are impaired (e.g., coupling strength only, information transmission only, neither or both). An impairment in coupling strength would reveal difficulty in attending to social cues, an impairment in information transmission would suggest problems with detecting and processing the information in time for an appropriate response, and disruptions in both would indicate problems with both attending to social cues as well as processing the information. The use of a social motor synchronization task allows for a precise, objective, and dynamical measure of synchronization and a more nuanced exploration of the temporal nature of synchronization. In addition, the direct dynamical modeling available using the pendulum paradigm will allow us to explore whether a social synchronization deficit is general or specific to a disorder (i.e., different for schizophrenia and ASD).

# MATERIALS AND METHODS

# Participants

A total of 18 adolescents paired with one of their parents participated in this study. There were nine adolescents with a diagnosis of ASD (eight males, one female, average age 13.67 ± SD years, range 12–17) and nine control adolescents (seven males, two females, average age 14.44 ± SD years, range 12–16). There was one adolescent with ASD who was left-handed; all other participants in both groups were right-handed.

The participants with ASD had previously been diagnosed by a licensed clinical psychologist or psychiatrist based on Diagnostic and Statistical Manual of Mental Disorders, 4th Edition, Text Revision (DSM-IV-TR) criteria (American Psychiatric Association [APA], 2000) and diagnosis was confirmed using the Autism Diagnostic Observation Schedule, 2nd Edition (ADOS-2) (Lord et al., 2012). The ADOS-2 is a semi-structured, standardized assessment of communication, social interaction, and play for individuals referred because of a question of a possible diagnosis of autism. Control participants also completed the ADOS-2. Five participants were administered Module 3 whereas 13 were administered Module 4 based on their developmental and language level. The mean ADOS scores for the two groups were significantly different from each other and confirmed group membership (**Table 1**). The groups were matched for chronological age and the Wechsler Abbreviated Scale of Intelligence (WASI) IQ score (Wechsler, 2011) for both groups was in the normal range of 85–115, although the WASI IQ score of the ASD group was slightly lower than the control group (**Table 1**).

All parents of participants gave informed, written consent for their children to take part in the study, and adolescents also provided assent to participate. The project was approved by the University of Massachusetts Medical School (Docket # H00001602) and Assumption College Institutional Review Boards (IRB # 2012-17, March 18, 2013).

Participants were recruited from local communities through print advertising, a recruitment brochure, email, social media, and community events. Recruitment material was posted on various community and University of Massachusetts Medical School websites.

# Apparatus

Participants sat on chairs 1 m apart, facing the same direction (**Figure 1**). Each chair had a forearm support attached to the inside of the chair parallel to the ground. This ensured that the handheld pendulums would be swung about the wrist in the sagittal plane and participants would have an unobstructed view of their partner's pendulum. Adolescents swung the pendulums with their dominant hand and parents swung the pendulums with the non-dominant hand.

The time-series motions of the pendulums were recorded at 100 Hz using a magnetic motion tracking system (Polhemus Liberty, Polhemus Corporation, Colchester, VT, USA) and 6-D Research System software (Skill Technologies, Inc., Phoenix, AZ, USA). A sensor was attached to the end of each pendulum to record the angular displacement of the pendulum. The time series of participants were low-pass filtered using a 10 Hz Butterworth filter.
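The effect of a 10 Hz low-pass filter on data sampled at 100 Hz can be sketched as follows. The text does not report the filter order or whether filtering was zero-phase, so the second-order, single-pass design below is an assumption, as are the function names `butter2_lowpass` and `rms` and the test signals. The filter is written out by hand via the bilinear transform so that the example needs only the standard library.

```python
import math


def butter2_lowpass(signal, cutoff_hz, fs_hz):
    """Single-pass second-order Butterworth low-pass (bilinear transform)."""
    k = math.tan(math.pi * cutoff_hz / fs_hz)
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    b1, b2 = 2.0 * b0, b0
    a1 = 2.0 * (k * k - 1.0) * norm
    a2 = (1.0 - math.sqrt(2.0) * k + k * k) * norm
    out = []
    x1 = x2 = y1 = y2 = 0.0          # previous inputs and outputs
    for x0 in signal:
        y0 = b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


def rms(values):
    """Root mean square of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


FS = 100.0        # sampling rate in Hz, from the text
CUTOFF = 10.0     # cutoff in Hz, from the text
t = [i / FS for i in range(500)]                          # 5 s of samples
slow = [math.sin(2 * math.pi * 1.0 * ti) for ti in t]     # pendulum-like 1 Hz swing
fast = [math.sin(2 * math.pi * 30.0 * ti) for ti in t]    # hypothetical 30 Hz noise

# Compare amplitudes after the first second to skip the start-up transient.
gain_slow = rms(butter2_lowpass(slow, CUTOFF, FS)[100:]) / rms(slow[100:])
gain_fast = rms(butter2_lowpass(fast, CUTOFF, FS)[100:]) / rms(fast[100:])
print(f"gain at 1 Hz: {gain_slow:.3f}, gain at 30 Hz: {gain_fast:.3f}")
```

Movement at pendulum-swinging frequencies passes almost unchanged, while components well above the cutoff are strongly attenuated. A single pass delays the signal slightly; because the same filter is applied to both partners, this delay should not bias their relative timing.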

# Pendulum Preferred Frequency of Oscillation Manipulation

TABLE 1 | Participant characteristics and clinical phenotyping.

Two handheld pendulums, each composed of a wooden dowel that was 54 cm in length and had a 100 g weight attached at either the bottom or the middle of the pendulum, were used. The placement of the weight manipulated the inertial loading of the pendulum, and hence, the preferred frequency of oscillation. Pendulums weighted at the middle have a lower inertial load and a higher preferred frequency of oscillation whereas pendulums weighted at the bottom have a larger inertial load and a lower preferred frequency of oscillation (Schmidt and Turvey, 1994; Schmidt and O'Brien, 1997). The pairing of the two pendulums resulted in three pendulum combination conditions for participant pairs that reflect differential inertial loadings of the pendulums and differential preferred frequencies of oscillation: 0 (no inertial difference between pendulum conditions, both adolescent and parent have pendulum weighted at bottom, no preferred frequency of oscillation difference); 1 [parent had pendulum with higher inertial loading (mass at bottom) and adolescent had pendulum with lower inertial loading (mass at middle), adolescent had higher preferred frequency of oscillation and should lead the coordination]; and −1 [adolescent had pendulum with higher inertial loading (mass at bottom) and parent had pendulum with lower inertial loading (mass at middle), parent had higher preferred frequency of oscillation and should lead the coordination].
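The relation between weight placement and preferred frequency can be approximated with the small-angle formula for a compound pendulum. The sketch below is a back-of-the-envelope illustration only. It assumes that the pendulum pivots at its top end, ignores the mass of the hand, and uses a hypothetical dowel mass of 50 g, which the text does not report; `eigenfrequency_hz` is a hypothetical helper name.

```python
import math

G = 9.81              # gravitational acceleration, m/s^2
DOWEL_LENGTH = 0.54   # m, from the text
WEIGHT_MASS = 0.100   # kg, from the text
DOWEL_MASS = 0.050    # kg, hypothetical: the dowel's mass is not reported


def eigenfrequency_hz(weight_distance):
    """Small-angle natural frequency of a rod pivoting at its top end
    with a point mass attached `weight_distance` metres below the pivot."""
    # Gravitational restoring torque per radian of deflection.
    torque = G * (DOWEL_MASS * DOWEL_LENGTH / 2 + WEIGHT_MASS * weight_distance)
    # Moment of inertia about the pivot: rod about its end plus the point mass.
    inertia = DOWEL_MASS * DOWEL_LENGTH ** 2 / 3 + WEIGHT_MASS * weight_distance ** 2
    return math.sqrt(torque / inertia) / (2 * math.pi)


f_bottom = eigenfrequency_hz(DOWEL_LENGTH)       # weight at the bottom
f_middle = eigenfrequency_hz(DOWEL_LENGTH / 2)   # weight at the middle
print(f"bottom-weighted: {f_bottom:.2f} Hz, middle-weighted: {f_middle:.2f} Hz")
```

Under these assumptions the bottom-weighted pendulum comes out at roughly 0.7 Hz and the middle-weighted pendulum at roughly 0.9 Hz, matching the direction of the manipulation described above. The actual preferred frequencies would also depend on the hand and wrist.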

# Social Synchronization Tasks

To measure social synchronization, adolescent–parent pairs swung three combinations of pendulums as described above and the movement time series of the adolescent's and parent's pendulums were recorded using the Polhemus movement capture system. Two different synchronization tasks were performed, spontaneous synchrony and intentional synchrony.

### Spontaneous Social Synchronization

To evaluate spontaneous synchrony, 90 s trials were completed in which each participant swung his/her pendulum at a comfortable tempo and maintained that tempo. During the control trial segments (the first and last 30 s) participants were looking away from their partner's pendulum and during the spontaneous coordination experimental segment (middle 30 s) the participants were looking at each other's pendulums (**Figure 1A**). Trials, including the not looking (control) segments and looking (spontaneous coordination) segments, were completed for each of the three pendulum combination conditions. Two replications per pendulum condition were completed for a total of six spontaneous synchrony trials.
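The trial layout above can be sketched as a simple slicing step. The function and key names are hypothetical; the sampling rate (100 Hz) and segment length (30 s) are taken from the text.

```python
SAMPLE_RATE_HZ = 100   # recording rate stated in the text
SEGMENT_S = 30         # each trial segment lasts 30 s

def split_segments(samples):
    """Split a 90 s trial into not-looking, looking, not-looking segments."""
    n = SAMPLE_RATE_HZ * SEGMENT_S  # 3000 samples per segment
    return {
        "not_looking_1": samples[0:n],
        "looking": samples[n:2 * n],
        "not_looking_2": samples[2 * n:3 * n],
    }

trial = list(range(9000))  # placeholder data: 90 s at 100 Hz
segments = split_segments(trial)
```

Each segment can then be analyzed separately, which is how the looking and not-looking conditions enter the analysis as a within-subject factor.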

### Intentional Social Synchronization

To evaluate intentional synchrony, participant pairs were instructed to coordinate their pendulum swinging with their partner in either an in-phase pattern, so that their pendulums were in the same portion of their cycles at the same time (**Figure 1B**), or an anti-phase pattern, so that their pendulums were in opposite portions of their cycles at the same time (**Figure 1C**). Trials were 60 s, with two replications for each pendulum combination condition, for both in-phase and anti-phase coordination, resulting in a total of 12 intentional coordination trials (6 in-phase, 6 anti-phase).

# Social Synchronization Measures

Relative phasing of the adolescent's and parent's pendulum movements was used to evaluate the degree and pattern of rhythmic synchronization. Relative phase is an angle that measures where one rhythm is in its cycle (i.e., its phase) with respect to where another rhythm is in its cycle. If two rhythms are in identical parts of their cycles, they have a relative phase of 0° and are in-phase. If two rhythms are in opposite parts of their cycles, they have a relative phase of 180° and are in anti-phase. A continuous relative phase time series was computed from the two angular positions of pendulums using the Hilbert transform (Pikovsky et al., 2003).
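A minimal sketch of how a continuous relative phase series can be obtained from two displacement signals is given below. It uses an FFT-based analytic signal in NumPy; the function names, the centering step, and the subtraction order (first signal minus second) are assumptions for illustration, not the processing pipeline used in the study.

```python
import numpy as np

def analytic_phase(x):
    """Instantaneous phase (rad) of a real signal via an FFT-based Hilbert transform."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # center so the phase is measured about zero
    n = x.size
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)                 # keep DC, double positive frequencies, drop negative ones
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(spectrum * weights)  # real part = x, imaginary part = Hilbert transform
    return np.angle(analytic)

def relative_phase_deg(x_a, x_b):
    """Continuous relative phase in degrees (phase of a minus phase of b), wrapped to one cycle."""
    dphi = analytic_phase(x_a) - analytic_phase(x_b)
    return np.degrees(np.angle(np.exp(1j * dphi)))

# Synthetic example: two 1 Hz sinusoids sampled at 100 Hz, the first leading by 30 degrees.
t = np.arange(1000) / 100.0
a = np.sin(2 * np.pi * 1.0 * t + np.radians(30))
b = np.sin(2 * np.pi * 1.0 * t)
phi = relative_phase_deg(a, b)
```

For real recordings, edge effects at the start and end of each trial are a known limitation of this approach, so trimming a short window at both ends is a common precaution.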

### Spontaneous Social Synchronization Task

The degree of synchronization was evaluated by a measure of relative phase variability. We computed the circular variance (Batschelet, 1981) of the relative phasing between the two participants' movements from the continuous relative phase time series. This measure yields an index of synchronization between 0 and 1, with 1 reflecting perfect synchronization and 0 reflecting an absence of synchronization. The circular variance represents the proportion of relative phase relationships visited by the two time series. A circular variance of 0 means that the two time series never visited the same relative phase relationship more than once. Higher values of circular variance indicate that the two time series repeatedly visited a set of relative phase relationships throughout the trial.
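The 0-to-1 index described above behaves like the length of the mean resultant vector of the relative phase angles. Some texts reserve the name circular variance for one minus this length, so the exact formula used in the study is an assumption here, and the function name is hypothetical.

```python
import cmath
import math

def synchronization_index(relative_phase_deg):
    """Length of the mean resultant vector of the relative phase angles (0 to 1)."""
    vectors = [cmath.exp(1j * math.radians(p)) for p in relative_phase_deg]
    return abs(sum(vectors) / len(vectors))

locked = [10.0] * 360                       # relative phase never changes
drifting = [float(d) for d in range(360)]   # relative phase visits every angle equally

print(synchronization_index(locked))    # close to 1
print(synchronization_index(drifting))  # close to 0
```

A constant relative phase yields an index near 1 regardless of which angle is held, so the index measures stability of the phase relation rather than its value.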

### Intentional Social Synchronization Task

To evaluate the synchronization that occurred in both intentional in-phase and anti-phase synchronization tasks, two dependent measures were calculated from the relative phase time series. First, circular variance was calculated to measure the overall degree of synchronization. As mentioned above, a circular variance of 0 indicates no synchronization and a circular variance of 1 indicates perfect synchronization.

Second, the mean circular relative phase angle (Batschelet, 1981) was calculated from the continuous relative phase time series to determine the phase shift (lag–lead relationship) associated with each dyad's coordinated rhythmic movements. Positive relative phase angles (phase shifts from the intended phase of 0° or 180°) indicated that the child led the coordination and negative relative phase angles (phase shifts) indicated that the child followed the movements of the parent.
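The phase shift measure can be sketched as a circular mean taken relative to the intended phase. The function names and the sample values are hypothetical; the sign convention (positive meaning the adolescent leads) follows the text and assumes relative phase is computed as adolescent minus parent.

```python
import cmath
import math

def mean_relative_phase_deg(relative_phase_deg):
    """Circular mean of relative phase angles, in degrees."""
    total = sum(cmath.exp(1j * math.radians(p)) for p in relative_phase_deg)
    return math.degrees(cmath.phase(total))

def phase_shift_deg(relative_phase_deg, intended_deg):
    """Signed deviation of the circular mean from the intended phase (0 or 180)."""
    shifted = [p - intended_deg for p in relative_phase_deg]
    return mean_relative_phase_deg(shifted)

# Hypothetical anti-phase trial: samples scattered around 195 degrees.
samples = [190.0, 195.0, 200.0, -165.0]   # -165 is the same angle as 195
shift = phase_shift_deg(samples, intended_deg=180.0)
print(shift)
```

Averaging on the unit circle rather than averaging the raw angles avoids the wrap-around problem illustrated by the last sample, which an arithmetic mean would treat as far from the others.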

# Design and Procedure

Participants completed two separate experimental sessions, approximately 1 week apart. In the first experimental session at the University of Massachusetts Medical School, clinical phenotyping was completed, including the ADOS-2 and the WASI Matrix Reasoning and Vocabulary subtests; this session lasted approximately 3 hours. Additional clinical phenotyping measures were administered as part of a larger study during this session but are not reported here.

In the second visit, the social synchronization tasks were completed. All participant pairs completed the spontaneous synchrony trials at the start of the experimental session to prevent experimental task demands from influencing performance. The order of presentation of the in-phase and anti-phase intentional synchrony trials was counterbalanced across participants—half of the participant pairs completed in-phase trials followed by anti-phase trials and half completed anti-phase trials followed by in-phase trials. Two additional experimental tasks were also completed as part of a larger study but they are not being reported here.

To summarize the design of the experiment, diagnosis group (ASD, neurotypical control) was a between-subject variable. Group differences in clinical phenotyping were evaluated with independent samples t-tests. In the spontaneous social synchrony task, diagnosis group was a between-subject variable, and pendulum combination condition [−1 (adolescent with higher loading, parent should lead), 0 (no differential loading), 1 (parent with higher loading, child should lead)], and looking condition (1st 30 s, not looking; 2nd 30 s looking; 3rd 30 s not looking) were within-subject variables. The circular variance of relative phase for the spontaneous synchrony task was analyzed with a 2 (diagnosis group) × 3 (pendulum combination) × 3 (looking) analysis of variance (ANOVA). For the intentional synchrony task, a 2 (diagnosis group) × 3 (pendulum combination) × 2 (phase mode, in-phase or anti-phase) ANOVA was used to analyze the dependent measure, circular variance of relative phase. Circular variance values were standardized using a Fisher's z-transformation before the statistical analyses were performed. Bonferroni post hoc tests were used as necessary to determine the nature of the effects. To determine whether IQ affected the results, all the ANOVAs reported below were run with IQ as a covariate. IQ was not a significant factor in any of the analyses, nor did IQ occur as a variable in any significant interactions. Therefore, results are reported below without IQ as a covariate.
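The Fisher z-transformation mentioned above is the inverse hyperbolic tangent, z = atanh(r). The sketch below applies it to the 0-to-1 synchronization index; the function name is an assumption, and the example values are the in-phase and anti-phase means reported in the Results, used only for illustration.

```python
import math

def fisher_z(r: float) -> float:
    """Fisher z-transformation: z = atanh(r) = 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)

# Illustrative index values; the transform stretches values near 1.
for r in (0.73, 0.88):
    print(r, round(fisher_z(r), 3))
```

The transform is undefined at exactly 1, so a trial with perfect synchronization would need special handling; how the study treated such cases, if any occurred, is not stated.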

# RESULTS


# Was There an ASD Synchrony Deficit for Spontaneous Coordination?

For the spontaneous coordination task, an ANOVA on the circular variance of relative phase resulted in significant main effects of pendulum combination [F(2,32) = 9.94, p < 0.001, η<sup>2</sup> = 0.38], looking condition [F(2,32) = 23.15, p < 0.001, η<sup>2</sup> = 0.59], and diagnosis group [F(1,16) = 5.77, p = 0.03, η<sup>2</sup> = 0.27]. These results indicate that both groups had higher spontaneous entrainment when the pendulums were the same rather than different, that both groups demonstrated spontaneous entrainment during the looking condition (as evidenced by higher circular variance), and that ASD pairs had less spontaneous entrainment than the control pairs across all trial segments. The latter two main effects were qualified by a significant looking segment by diagnosis interaction [F(2,32) = 3.25, p = 0.05, η<sup>2</sup> = 0.17], indicating that there was a significant group difference only for the looking trial segment (p = 0.03, η<sup>2</sup> = 0.25) and not for either of the non-looking segments (both p > 0.05, η<sup>2</sup> = 0.01 and 0.10; **Figure 2**). The interaction between looking condition and pendulum combination was also significant [F(4,64) = 3.18, p = 0.02, η<sup>2</sup> = 0.17], suggesting that the degree of synchronization observed depended upon the pendulum combination more for the looking than the non-looking conditions. Neither the interaction between pendulum combination and diagnosis nor the three-way interaction was significant.

# Was There an ASD Synchrony Deficit for Intentional Coordination?

#### Circular Variance

For the intentional synchrony trials, an ANOVA on circular variance of relative phase verified several dynamical model predictions that have been observed before in a number of studies (see Schmidt and Richardson, 2008 for a review). A main effect of phase mode [F(1,16) = 157.60, p < 0.001, η <sup>2</sup> = 0.91], revealed that in-phase coordination demonstrated more stable entrainment (0.88) than anti-phase (0.73). A main effect of pendulum combination [F(2,32) = 10.63, p = 0.001, η <sup>2</sup> = 0.40] indicated that pendulum combinations with similar pendulums had more stable entrainment than combinations with different pendulums. The phase mode by pendulum combination interaction was not significant [F(2,32) = 1.86, p = 0.17, η <sup>2</sup> = 0.10], indicating the influence of pendulum combination was the same for both in-phase and anti-phase coordination.

Importantly, across all conditions, a main effect of diagnosis group revealed that ASD pairs had less stable entrainment than control pairs [0.71 and 0.90, respectively, F(1,16) = 24.55, p < 0.001, η<sup>2</sup> = 0.61]. The interaction between phase mode and diagnosis was not significant, indicating that the ASD group had lower circular variance than the control group for both in-phase and anti-phase coordination (**Figure 3**). In addition, the interaction between pendulum combination and diagnosis was not significant, nor was the three-way interaction, suggesting that the influence of pendulum combination was similar for both groups (**Figure 3**), with the ASD group demonstrating an overall lower level of synchronization.

#### Phase Shift

The ANOVA on the phase shift (mean relative phase angle) revealed the model-based predicted main effect of pendulum combination [F(2,32) = 20.21, p < 0.001, η<sup>2</sup> = 0.56], indicating greater lagging for the person with the larger pendulum. The positive sign of the phase shift values indicates that across both groups the adolescent always led the parent, and there was a trend toward the adolescent with ASD leading by more [22.82° vs. 8.42°; F(1,16) = 3.5, p = 0.08, η<sup>2</sup> = 0.18]. The three-way interaction between pendulum combination, phase mode, and diagnosis was significant [F(2,32) = 4.96, p < 0.01, η<sup>2</sup> = 0.24]. Follow-up analyses revealed no group differences for in-phase coordination but an interaction between pendulum combination and group [F(2,32) = 5.70, p < 0.01, η<sup>2</sup> = 0.26] for anti-phase coordination, suggesting a steeper linear increase in the phase shift with pendulum combination for the ASD group (**Figure 4**).

To verify this conclusion, a regression analysis was conducted with subject pair mean phase shift values as the dependent variable and actual eigenfrequency differences (frequency detuning value as determined from the unintentional non-looking segments) as the independent variable. As seen in **Figure 5**, there was a significant correlation for the ASD pairs, r<sup>2</sup> = 0.41 (p < 0.001), and both the slope and intercept were significantly different from 0 (93.84, p < 0.001 and 11.97, p = 0.02, respectively). For the control pairs, there was a significant correlation as well, r<sup>2</sup> = 0.32 (p = 0.008), and both the slope and intercept were significantly different from 0 (39.75, p = 0.008 and 7.47, p = 0.008, respectively). The significant slopes in these analyses indicate the model predicted change in phase shift with the eigenfrequency differences of the pendulum pairs whereas the significant intercepts indicate that for both groups the child led the parent in the coordination. A Wald chi-square test showed that the intercepts were not significantly different between groups (p = 0.4) but the slopes were significantly different (p < 0.001). This slope difference verifies a steeper linear increase in the phase shift with pendulum combination for the ASD group, indicative of weaker coupling. The lack of difference between the intercepts indicates that overall the ASD group did not lead the parent more: there was not an overall phase advance by the ASD group, which would translate into an overall tendency to anticipate.
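The link between a steeper slope and weaker coupling can be illustrated with a generic sine-coupled phase equation, d(phi)/dt = detuning - coupling * sin(phi). This is an assumed stand-in: the study's own model equation is not reproduced in this section and may differ (for example, by including an anti-phase term or transmission delays). The function name and all parameter values are hypothetical.

```python
import math

def steady_state_shift_deg(detuning, coupling):
    """Fixed point of d(phi)/dt = detuning - coupling * sin(phi), in degrees."""
    if abs(detuning) > coupling:
        raise ValueError("no phase-locked solution: detuning exceeds coupling")
    return math.degrees(math.asin(detuning / coupling))

detunings = [-0.2, -0.1, 0.0, 0.1, 0.2]   # illustrative frequency differences, rad/s
for k in (1.0, 0.5):                       # stronger vs weaker coupling (arbitrary units)
    shifts = [steady_state_shift_deg(d, k) for d in detunings]
    print(k, [round(s, 1) for s in shifts])
```

In this simplified model the phase shift grows roughly as detuning divided by coupling for small detuning, so halving the coupling strength about doubles the slope of phase shift against detuning.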

# DISCUSSION

The findings reported here indicate that adolescents with ASD demonstrated a disruption of both spontaneous synchronization and intentional synchronization. Analysis of circular variance of relative phase confirmed spontaneous social entrainment occurred in both groups, corroborating past research on the ubiquity of spontaneous entrainment. However, the ASD group had weaker spontaneous synchronization during the important second trial segment when participants were viewing each other's pendulum. For intentional social coordination, the circular variance of relative phase confirmed a number of dynamical model predictions. Namely, for both groups, anti-phase synchronization was weaker than in-phase synchronization and coordinating different pendulums was less stable than coordinating similar pendulums. Importantly though, these analyses also indicate that intentional social synchronization was weaker for the ASD pairs. Thus, our findings on the degree of synchronization using circular variance indicate that ASD participants synchronized less well under conditions in which synchronization occurs spontaneously in the presence of perceptual information of the social partner and in situations when there is an explicit social goal to coordinate with another person (e.g., intentional synchronization).

Evaluation of the pattern of synchronization using the phase shift for the intentional task replicated past findings of frequency detuning in which there was greater lagging for the person with the larger pendulum. Whereas this was true for both groups, the rate of change of this lagging across pendulum pairs was not equal for the two groups: For anti-phase coordination, the ASD pairs showed a steeper lagging slope (**Figure 5**), which indicates, as suggested by the circular variance, that the ASD pairs had weaker synchronization.

In terms of the dynamical model in Eq. 1, these analyses suggest that the ASD pairs assembled a synchronization dynamic in both spontaneous and intentional social coordination situations that has weaker coupling strengths, K, than the synchronization dynamic assembled by the control pairs. Such a weakness in dynamical entrainment corresponds to a lower sensitivity and attention to the movements of the other person. Kinsbourne and Helt (2011) have suggested that interpersonal synchrony problems in ASD may be due to a lack of social attention and these findings are consistent with such a claim. That is, given the social nature of the task, the adolescent with ASD was unable to sustain his/her attention to the movement of the partner's pendulum throughout the trial and hence the synchronization of his/her movements with the partner was lower. Similarly, Bebko et al. (2006) suggest those with ASD may have disruptions in perceiving the temporal aspects of social interactions. This interpretation is reinforced by Koehne et al.'s (2016) findings that adults with ASD did not have synchronization problems when they were asked to synchronize their movements with a dot on a screen. Those participants were told that the movements of the dot were either controlled by a human or a computer, but no social information was present during the task. Participants were not required to use social attention or perception and their synchronization ability was not impaired. Additional research is needed to carefully evaluate the role of animacy on synchronization ability by systematically varying the level of task sociality.

At the same time, whereas there was a slope difference in the regression analysis of frequency detuning (**Figure 5**), there was no intercept difference between the two groups. This lack of a difference suggests that the ASD group did not lead the parent more than the control group and also indicates that there was no group difference in the rate (delay/advance) of information transmission terms in Eq. 1 (Varlet et al., 2012). These findings would indicate that the synchronization problems of adolescents with ASD were due to problems with attention or perception but not with the timing of information transmission. One could also argue that the synchronization difficulties evident in ASD may be the result of motor control problems, which are also common in ASD (Ghaziuddin and Butler, 1998; Pan et al., 2009; Fournier et al., 2010). A number of researchers suggest that motor problems may contribute to the social difficulties of those with ASD (Gernsbacher et al., 2008; Dowd et al., 2010; Bhat et al., 2011). Future research needs to disentangle the roles that motor control and social attention and perception play in synchronization.

The finding that the majority of children led the parent in synchronization, in both groups, is somewhat surprising. One might have expected adolescents with ASD to be less likely to lead in the coordination. In fact, Warlaumont et al. (2010) found that infants between 16 and 48 months with ASD were more likely to lag in parent–child vocal communicative exchanges while infants without ASD were more likely to lead. Similarly, Varlet et al. (2012) found that individuals with schizophrenia were less likely to lead their partner when performing a social motor synchronization task. The finding that this is not the case in ASD could be suggestive of a lack of attention to the social partner and a lack of reciprocity—the adolescent with ASD is moving the pendulum and the parent is adjusting his/her movements to match the adolescent. This is consistent with the original conception of ASD by Kanner (1943) and Asperger (1944/1991) as a tendency to focus attention inward on their own bodily states even when engaged in tasks that require interaction with the environment.

This specific pattern of disruptions in synchronization ability may be unique to ASD. Whereas participants with schizophrenia have been found to have a social synchrony deficit during intentional synchronization but not spontaneous synchronization (Varlet et al., 2012), participants with ASD demonstrate less stable entrainment for both intentional and spontaneous social synchrony. It appears that in individuals with schizophrenia synchronization is disrupted only when there is an explicit social goal, while in ASD the reduction in coupling strength is evident both when there is an explicit social goal to coordinate and when there is no explicit social goal to coordinate. Furthermore, during intentional coordination participants with schizophrenia not only had a weaker coupling strength but also demonstrated a delay in information transmission (Varlet et al., 2012). In contrast, the participants with ASD did not have a deficit in the rate of information transfer. These findings suggest that the social synchronization deficit evident in ASD is different from that in schizophrenia and may be different from that in other disorders characterized by problems with social interactions. Consequently, social synchronization may prove to be a biobehavioral marker of the social deficits in ASD.

In addition, the dissociation of deficits in intentional and spontaneous social synchronization suggests that these kinds of entrainment may function independently and have distinct underlying mechanisms. One might argue that these differences could be due to the fact that the participants with schizophrenia were adults and the participants with ASD were adolescents. This seems unlikely, however, because the data from the adolescent controls replicated the dynamical model predictions that have been extensively demonstrated with adult participants. Another important difference between schizophrenia and ASD is that schizophrenia typically has an onset in early adulthood while the onset of ASD is much earlier and could account for the disruptions in spontaneous synchronization evident in ASD but not schizophrenia. Caution, therefore, is warranted in drawing firm conclusions until future research has explored these differences with larger sample sizes, conducted studies to directly compare diagnostic groups, and compared adult and adolescent populations to isolate any developmental differences.

Our results demonstrating that social synchronization successfully differentiates adolescents with and without ASD are consistent with other work using dynamical measures of synchronization that has found similar differences in social synchronization abilities in children with ASD (ages 6–10 years old; Fitzpatrick et al., 2013a,b). These findings are also consistent with behavioral-coding work indicating disruptions in synchronization of parent–child interactions (Condon, 1982; Trevarthen and Daniel, 2005; Feldman, 2007), timing of facial mimicry (Oberman et al., 2009), synchronization of speech with a partner (Feldstein et al., 1982), and synchronization of speech and gesture (de Marchena and Eigsti, 2010).

The confirmation of social synchronization differences in an older population using a task that allowed direct dynamical modeling, combined with the finding that the synchronization deficit in ASD is different from the deficit seen in schizophrenia, raises the important possibility that social synchronization could be a bio-behavioral marker for ASD. This research also points to the importance of using objective, dynamical, process-oriented measures of social synchronization to be able to fully evaluate the temporal nature of social synchronization. Our research focused on synchronization in the context of a social motor task. Future research is planned to use this dynamical methodology to explore social synchronization in more naturalistic tasks such as the coordination of whole body movements and speech during conversation tasks. Cross recurrence analysis provides another potentially fruitful dynamical methodology for analyzing the temporal and directional characteristics of interpersonal exchanges (e.g., Richardson et al., 2008; Coco and Dale, 2014). Cross recurrence analysis has demonstrated, for example, that mother–infant gaze patterns become more tightly coupled developmentally (Nomikou et al., 2016), infants with ASD are more likely to lag parent–child vocal exchanges while infants without ASD are more likely to lead (Warlaumont et al., 2010), and children with ASD demonstrated less stable and more deterministic social motor coordination (Romero et al., 2016). Questions remain, however, about whether synchronization differences are due to underlying mechanisms that are social, motor, or due to attention or perceptual processing disruptions. Future research is planned to disentangle the role of motor, attention/perception, and social contributions to social synchronization.

One potential limitation of this research is that the participants were performing the task with their parent. While this was chosen to reduce the anxiety that would be inherent in doing the task with a stranger, it may have contributed to the finding that, in both groups, the adolescents always led the parent in the coordination. It is possible that there could be something distinct about the interactions between parent and adolescent that would not generalize to interactions of other social pairs. In addition, due to the heritability of ASD (e.g., Zhao et al., 2007; Hallmayer et al., 2011), the parents of the ASD participants could have symptoms on the ASD spectrum that could also contribute to the synchronization displayed by those pairs. Del-Monte et al. (2013) found that this was the case with first-degree relatives of individuals with schizophrenia: they also demonstrated the same overall pattern of synchronization impairments as individuals with schizophrenia. Alternatively, it is possible that parents of children with ASD over-compensate and adjust their behavior more to match their child. If that were the case, it would suggest the synchronization ability of the adolescents might be overestimated here. Green et al. (2010), for example, found that the proportion of synchronous parental communications increased after parents completed a training program that focused on increasing parental response to communication and action routines. In future research, we plan to explore this issue by having participants complete the tasks with a stranger.

Another potential limitation lies in the fact that observed synchronization is the result of "live" reciprocal interactions between people. This means that factors of the interaction are by their very nature uncontrolled. This could be circumvented in future research by using video-based presentation of the partner, as this would allow not only for the standardization of the movements of the partner used to elicit social behavior, but also for the manipulation of the reciprocity of coupling between the partner and adolescent and the simultaneous recording of the movements of both. This sort of precision in presentation and manipulation of social movements and simultaneous measurement of the user interaction as it unfolds will help clarify the unique contribution of each partner to initiating and maintaining the social synchronization.

Furthermore, the relatively small number of participant pairs used suggests that replication would be prudent before large-scale conclusions can be drawn about the specific pattern of results being a bio-behavioral marker unique to ASD. That being said, it should be noted that our significant effects have large effect sizes and the sample size used here is similar to those in past studies that have investigated social synchronization using the pendulum paradigm with populations without social deficits (Schmidt and O'Brien, 1997; Richardson et al., 2005). Nonetheless, studies including large samples of both ASD and schizophrenia dyads should be performed to definitively conclude that the social synchronization deficits are different for these two groups.

An inherent challenge in delineating the precise nature of ASD-specific social deficits lies in the fact that the population of individuals with ASD is phenotypically and behaviorally heterogeneous. The participants in our sample were relatively high-functioning. In future work we plan to investigate the heterogeneity in adolescents with ASD by recruiting a more diverse participant population and measuring behavior across multiple domains (motor, social, cognitive, emotional, neural) and conducting a discriminant analysis to estimate the contribution each of these components makes to the synchronization difficulties both on the group and individual level. This will allow us to better understand the heterogeneity in ASD and how it relates to synchronization ability.

To help identify the mechanisms underlying intentional and spontaneous synchronization, additional research is planned using electroencephalogram (EEG) and functional magnetic resonance imaging (fMRI) to map brain activity during social synchronization to determine how neurophysiological activity in individuals with ASD is different from that of controls. Researchers are beginning to investigate the underlying neural activity involved in social synchronization (Kelso et al., 2013) but little is known about how it develops or how it may differ between healthy and ASD populations. Research has found that EEG activity in the alpha-mu band between the centroparietal regions in the right hemisphere (Dumas et al., 2010; Naeem et al., 2012) is different during intentional and spontaneous coordination. In particular, those investigators found comparatively more mu suppression in central–parietal brain regions, with intentional synchronization showing more mu suppression than spontaneous. Mu activation is associated with understanding and coordinating motor acts and these patterns of deactivation of mu activity suggest they may be a neural correlate of social synchronization. Exploring whether mu activation is different in ASD during intentional and spontaneous social synchronization could provide us with important insights for understanding the mechanisms responsible for the social problems characteristic of the disorder.

Coordinating one's movements with another person typically helps to facilitate social connection. The current findings suggest that adolescents with ASD have disruptions in social synchronization and this may in turn interfere with the formation and maintenance of social bonds. The role of abnormal movement patterns during social interactions, and how they may contribute to or maintain social deficits, raises important questions for understanding the social problems characteristic of ASD as well as other developmental and psychiatric disorders. The findings here suggest there may be a social synchronization deficit that is ASD-specific and could likely serve as an objective, bio-behavioral marker.

# AUTHOR CONTRIBUTIONS

PF: study design, data collection, data analysis, data interpretation, writing; JF: study design, data collection, data analysis, data interpretation, writing; TM: data interpretation, writing; DC: administering clinical assessments, data collection; CC: data recruitment, data collection, data entry; RS: study design, data collection, data analysis, data interpretation, writing.

# FUNDING

This research was supported by the University of Massachusetts Medical School Department of Psychiatry and Assumption College Collaborative Pilot Research Program (CPRP) grant "Evaluating Social Synchrony in Autism Spectrum Disorders," awarded to PF and JF, as well as National Institutes of Health Grant R01GM105045 awarded to RS.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Fitzpatrick, Frazier, Cochran, Mitchell, Coleman and Schmidt. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.