# HIGH PERFORMANCE COGNITION: INFORMATION-PROCESSING IN COMPLEX SKILLS, EXPERT PERFORMANCE, AND FLOW

EDITED BY : Benjamin Cowley, Frederic Dehais, Stephen Fairclough, Alexander John Karran, Otto Lappi and Jussi Palomäki PUBLISHED IN : Frontiers in Psychology and Frontiers in Human Neuroscience

#### Frontiers eBook Copyright Statement

The copyright in the text of individual articles in this eBook is the property of their respective authors or their respective institutions or funders. The copyright in graphics and images within each article may be subject to copyright of other parties. In both cases this is subject to a license granted to Frontiers. The compilation of articles constituting this eBook is the property of Frontiers.

Each article within this eBook, and the eBook itself, are published under the most recent version of the Creative Commons CC-BY licence. The version current at the date of publication of this eBook is CC-BY 4.0. If the CC-BY licence is updated, the licence granted by Frontiers is automatically updated to the new version.

When exercising any right under the CC-BY licence, Frontiers must be attributed as the original publisher of the article or eBook, as applicable.

Authors have the responsibility of ensuring that any graphics or other materials which are the property of others may be included in the CC-BY licence, but this should be checked before relying on the CC-BY licence to reproduce those materials. Any copyright notices relating to those materials must be complied with.

Copyright and source acknowledgement notices may not be removed and must be displayed in any copy, derivative work or partial copy which includes the elements in question.

All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For further information please read Frontiers' Conditions for Website Use and Copyright Statement, and the applicable CC-BY licence.

ISSN 1664-8714 ISBN 978-2-88966-200-5 DOI 10.3389/978-2-88966-200-5


# HIGH PERFORMANCE COGNITION: INFORMATION-PROCESSING IN COMPLEX SKILLS, EXPERT PERFORMANCE, AND FLOW

Topic Editors: Benjamin Cowley, University of Helsinki, Finland Frederic Dehais, Institut Supérieur de l'Aéronautique et de l'Espace (ISAE-SUPAERO), France Stephen Fairclough, Liverpool John Moores University, United Kingdom Alexander John Karran, Université de Montréal, Canada Otto Lappi, University of Helsinki, Finland Jussi Palomäki, University of Helsinki, Finland

Citation: Cowley, B., Dehais, F., Fairclough, S., Karran, A. J., Lappi, O., Palomäki, J., eds. (2020). High Performance Cognition: Information-Processing in Complex Skills, Expert Performance, and Flow. Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88966-200-5

# Table of Contents


Michael S. Chin and Stefanos N. Kales


Qiuhua Yu, Bolton K. H. Chau, Bess Y. H. Lam, Alex W. K. Wong, Jiaxin Peng and Chetwyn C. H. Chan

*The Effect of Meditation on Comprehension of Statements About One-Self and Others: A Pilot ERP and Behavioral Study*

Alexander Savostyanov, Sergey Tamozhnikov, Andrey Bocharov, Alexander Saprygin, Yuriy Matushkin, Sergey Lashin, Galina Kolpakova, Klimenty Sudobin and Gennady Knyazev


Sami Abuhamdeh

*Investigating Flow State and Cardiac Pre-ejection Period During Electronic Gaming Machine Use*

W. Spencer Murch, Mario A. Ferrari, Brooke M. McDonald and Luke Clark


*Well Done! Effects of Positive Feedback on Perceived Self-Efficacy, Flow and Performance in a Mental Arithmetic Task*

Corinna Peifer, Pia Schönfeld, Gina Wolters, Fabienne Aust and Jürgen Margraf


David Z. Hambrick, Brooke N. Macnamara and Frederick L. Oswald

# Editorial: High Performance Cognition: Information-Processing in Complex Skills, Expert Performance, and Flow

Benjamin Ultan Cowley<sup>1,2</sup>\*, Frederic Dehais<sup>3</sup>, Stephen Fairclough<sup>4</sup>, Alexander John Karran<sup>5</sup>, Jussi Palomäki<sup>2</sup> and Otto Lappi<sup>2</sup>

<sup>1</sup> Faculty of Educational Sciences, University of Helsinki, Helsinki, Finland, <sup>2</sup> Department of Digital Humanities, Faculty of Arts, University of Helsinki, Helsinki, Finland, <sup>3</sup> Institut Supérieur de l'Aéronautique et de l'Espace (ISAE-SUPAERO), Toulouse, France, <sup>4</sup> Liverpool John Moores University, Liverpool, United Kingdom, <sup>5</sup> Hautes études Commerciales Montréal, Université de Montréal, Montreal, QC, Canada

Keywords: high performance cognition, cognitive neuroscience, expert performance, psychophysiology, flow, deliberate practice, cognitive fitness

**Editorial on the Research Topic**

#### **High Performance Cognition: Information-Processing in Complex Skills, Expert Performance, and Flow**

#### Edited and reviewed by:

Cristina M. P. Capparelli Gerling, Federal University of Rio Grande do Sul, Brazil

> \*Correspondence: Benjamin Ultan Cowley ben.cowley@helsinki.fi

#### Specialty section:

This article was submitted to Performance Science, a section of the journal Frontiers in Psychology

Received: 03 July 2020 Accepted: 26 August 2020 Published: 06 October 2020

#### Citation:

Cowley BU, Dehais F, Fairclough S, Karran AJ, Palomäki J and Lappi O (2020) Editorial: High Performance Cognition: Information-Processing in Complex Skills, Expert Performance, and Flow. Front. Psychol. 11:579950. doi: 10.3389/fpsyg.2020.579950

In this Research Topic, High Performance Cognition (HPC) emerges as a generic term for the study of human performance and skill acquisition, from novices on an unfamiliar task (as in most psychological experiments) to experts displaying superior levels of domain-specific skill. We received 11 pieces of empirical work across several experimental tasks and three articles describing theoretical/conceptual work on flow, cognitive training, and the Deliberate Practice framework for expert performance. The papers received in this Research Topic encompassed a range of methodologies, including reports of phenomenology, measurement of behavior and psychological traits, and measures based on psychophysiology and neurophysiology.

What were the commonalities underlying the various contributions received for the topic? Most empirical papers report that the cognitive process under investigation showed interdependencies with other processes: from the longitudinal relationship between performance anticipation and Flow shown in Cowley et al., to the proactive control of executive function shown by "open"-skill athletes (i.e., trained in dynamic, externally-paced sports) in Yu et al. This theme is also reflected in the theoretical contributions, which call for revision of the longstanding constructs of deliberate practice and Flow, to either account for interdependency in a more holistic framework (Hambrick et al.), or control for it (Abuhamdeh), respectively. Clearly, even making such revisions is challenging; more challenging still is the task of building a unified framework on the revised constructs, coherent with existing evidence. Yet such a framework will refine the generic term HPC to a definition of cognitive performance, and thereby provide grounds for empirical predictions and a direction for future work for many years.

Thus, while we cannot yet answer the question: "do cognitive processes that are specific to contexts of high performance also generalize across task domains?," this Topic clarifies somewhat how we might start to address the question. A summary of the contributions follows to provide an overview of the specific articles received as part of this Research Topic.

#### EMPIRICAL STUDIES

#### Flow Focus

Chin and Kales investigate whether autonomic arousal (heart rate variability, HRV) influences self-reported flow and performance in the Stroop task. To increase between-subjects HRV, participants first either performed nasal respiration, arm-muscle contraction, or both combined, or read emotionally neutral articles. The results showed an inverted-U pattern: performance and flow were maximal at moderate arousal levels. Optimal performance was also associated with predominantly sympathetic autonomic activity.

Cowley et al. ask: How does subjective feeling of high performance relate to development of (personally) optimal performance? They report a longitudinal (40 trials over 8 sessions) study, recording Flow and behavior as participants learned to perform a visuomotor steering task. Participants did not experience more Flow over sessions; instead, their trialwise subjective anticipation of performance (estimated from a power-law model over trials) was strongly related to Flow.

Murch et al. report on three experiments which aim to identify objective markers of flow while gambling. The authors collected self-reported gambling flow and physiological data as measured by the cardiac pre-ejection period (PEP). They found no evidence of changes in PEP when participants interacted with the electronic gambling machines, and found interactions between subjective and objective measures only during the first block of each experiment.

In a study examining the effect of positive feedback on perceived self-efficacy, flow and performance, Peifer et al. hypothesize that self-efficacy acts as a mediator, whereby positive feedback on one task imparts a positive effect upon performance and flow in subsequent tasks. They report evidence to support the hypothesis and propose multiple methods of intervention to further increase these positive effects.

Sinnett et al. ask: can increased flow experience improve perception? The authors implemented a within-subjects design involving groups of athletes and musicians facing a temporal order judgement task paradigm. This task encompasses a measure of temporal processing and a measure of spatial attention. Their results disclosed a positive relationship between the self-reported value of experienced flow and the efficiency of spatiotemporal information processing.

#### Cognitive-Performance Focus

Here we order the studies by participant expertise, starting with naïve-subject designs.

Kee et al. investigate the effects of non-striving states, induced by a repetitive water-pouring task, on a word-length comparison task. The authors reported that the experimental group tended to exhibit lower performance and was significantly faster to perform the word comparison task than the control group. Though the authors did not use self-report questionnaires, they suggest their experimental design could help to improve mindfulness practice.

Prolonged task execution brings mental fatigue, characterized by "indolence, reduced motivation, and impaired performance." Liu et al. investigate the motivating effect of monetary rewards (given during this low-vigilance state) on performance in a flanker task, and concomitant electrophysiological measures linked to selective visual attention, primarily P300. Monetary reward did improve performance during low vigilance, but neural measures recovered selectively, indicating that some constraints on higher cognitive performance are non-volitional.

Gong et al. investigate how EEG neurofeedback training (NFT) affects shooting performance and neuroplasticity among police students. They found that pre-post shooting performance increased for participants using sensorimotor-rhythm NFT, but slightly decreased for participants using alpha-rhythm NFT. Participants learned to alter their EEG patterns in both NFT groups. Pre-post measures also showed that neuroplasticity was affected by NFT. Thus, NFT may facilitate training for activities requiring sensorimotor accuracy.

Muñoz et al. present work toward a psychophysiological model of firearm training for real police officers using a combination of head-mounted virtual reality and non-intrusive physiological sensors. The authors demonstrate how changes in frontal theta activation and heart rate variability were affected by changes in task difficulty levels, which they articulate as metrics of "concentration" and "calmness."

The study by Savostyanov et al. utilizes ERPs to examine the effect of long-term meditation on comprehension of statements about oneself and others. Three groups (control, short- and long-term meditators) completed a reading task containing self or non-self-related evocative statements and grammatical errors. They report that meditation increases negative affect processing, and present an ERP analysis that potentially shows an increase in voluntary control over emotional states for meditators.

Work presented by Yu et al. investigates how motor-skills experience modulates proactive and reactive control of executive function, using a cued task-switching protocol with "open"- and "closed"-skilled athletes and non-athlete controls. Their findings highlight that open-skilled participants showed significantly less positive-going parietal cue-locked P3 and better predictive task-switching performance. They conclude that proactive control may be enhanced in open-skilled participants when compared to the other groups.

#### THEORETICAL PAPERS

Aidman proposes a framework for Cognitive Fitness: trainable cognitive abilities (as distinct from domain-specific knowledge, skills and attitudes) that can be improved and maintained by specific cognitive fitness training, analogous to physical-fitness training for strength, endurance, and flexibility. Aidman proposes his framework would be valuable for training performance in sport, arts, emergency services, and indeed any domain where individuals must deliver high performance on demand under stressful conditions.

Hambrick et al. and Abuhamdeh present critical opinion pieces on two highly influential theoretical constructs in the study of HPC: Deliberate Practice (DP, Ericsson et al., 1993), and Flow (Csikszentmihalyi, 1975). The Hambrick et al. paper continues their ongoing interrogation of the internal consistency of existing DP research, building on the authors' earlier meta-analysis (Macnamara et al., 2014). The Abuhamdeh paper looks critically at how the original definition of flow has since been operationally defined.

Both contributions share an underlying message: after decades of research, both lines of investigation remain preparadigmatic, in the sense of Kuhn (1962). Each is identified with one researcher's seminal account, but subsequently has been operationalized in an unsystematic way. This inconsistency, alongside lack of agreement on theoretical attributes, threatens the ability of both theoretical perspectives to develop into mature paradigms. Both papers challenge researchers to unify methodology and theory, and develop "progressive research programs" (Lakatos, 1978).

#### SUMMARY

The studies presented in this Research Topic illustrate both current diversity, and a bright future, for research in performance cognition. Researchers investigating flow states posit new experiments to model the conditions and relationships involved and call for a consistent operationalization of the flow state. Neurofeedback researchers propose to move into real-time adaptation of training protocols; those investigating meditation propose novel interventions to further increase the benefits of the practice. Extension and improvement of human cognition through augmented and directed training comprises a common vision for the future. To achieve it, we must unify this diverse field around a common and parsimonious framework, such that we can move toward measuring and understanding high performance cognition as it develops through learning and training across various domains.

### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Cowley, Dehais, Fairclough, Karran, Palomäki and Lappi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Flow Experiences During Visuomotor Skill Acquisition Reflect Deviation From a Power-Law Learning Curve, but Not Overall Level of Skill

Benjamin Ultan Cowley<sup>1,2</sup>\*†, Jussi Palomäki<sup>1,2</sup>†, Tuisku Tammi<sup>1,3</sup>†, Roosa Frantsi<sup>1,3</sup>, Ville-Pekka Inkilä<sup>1</sup>, Noora Lehtonen<sup>1</sup>, Pasi Pölönen<sup>1</sup>, Juha Vepsäläinen<sup>1</sup> and Otto Lappi<sup>1,2,3</sup>

<sup>1</sup> Cognitive Science, Department of Digital Humanities, Faculty of Arts, University of Helsinki, Helsinki, Finland, <sup>2</sup> Helsinki Centre for Digital Humanities, Helsinki, Finland, <sup>3</sup> Traffic Research Unit, TRUlab, University of Helsinki, Helsinki, Finland

#### Edited by:

Rafael Ramirez, Universidad Pompeu Fabra, Spain

#### Reviewed by:

Serena Oliveri, University of Milan, Italy Paula Thomson, California State University, Northridge, United States

#### \*Correspondence:

Benjamin Ultan Cowley ben.cowley@helsinki.fi

†These authors have contributed equally to this work

#### Specialty section:

This article was submitted to Performance Science, a section of the journal Frontiers in Psychology

Received: 13 February 2019 Accepted: 29 April 2019 Published: 15 May 2019

#### Citation:

Cowley BU, Palomäki J, Tammi T, Frantsi R, Inkilä V-P, Lehtonen N, Pölönen P, Vepsäläinen J and Lappi O (2019) Flow Experiences During Visuomotor Skill Acquisition Reflect Deviation From a Power-Law Learning Curve, but Not Overall Level of Skill. Front. Psychol. 10:1126. doi: 10.3389/fpsyg.2019.01126

Flow is a state of "optimal experience" that arises when skill and task demands match. Flow has been well studied in psychology using a range of self-report and experimental methods, with most research typically focusing on how Flow is elicited by a particular task. Here, we focus on how the experience of Flow changes during task skill development. We present a longitudinal experimental study of learning, wherein participants (N = 9) play a novel steering-game task designed to elicit Flow by matching skill and demand, and providing clear goals and feedback. Experimental design involves extensive in-depth measurement of behavior, physiology, and Flow self-reports over 2 weeks of 40 game trials in eight sessions. Here we report behavioral results, which are both strikingly similar and strong within each participant. We find that the game induces a near-constant state of elevated Flow. We further find that the variation in Flow across all trials is less affected by overall performance improvement than by deviation of performance from the expected value predicted by a power law model of learning.

Keywords: Flow, skill acquisition, power law of practice, visuomotor performance, steering, high performance cognition

# 1. INTRODUCTION

In many fields of human endeavor, such as music, art and sports, the skilful performance of a demanding task can elicit a state of "optimal experience" called Flow (Csikszentmihalyi, 1975). The Flow state is thought to be dependent on several pre-conditions for the eliciting task, and characterized by several phenomenological features (Nakamura and Csikszentmihalyi, 2002; Engeser and Schiepe-Tiska, 2012; Keller and Landhäußer, 2012). Conditions for Flow to occur define certain characteristics of the task and one's skill in the task: C1: challenge should match skill in a demanding task; C2: the task setting should present clear and personally significant goals; and C3: the setting should provide unambiguous feedback on goal achievement. When these conditions are met and the individual enters a mode of high performance, they may experience a set of phenomenological features characterizing "the Flow experience": F1: total focus in the present moment, and concentration on what one is doing; F2: merging of action and awareness ("being one with the task"); F3: loss of reflective self-consciousness, a sense of effortlessness; F4: a sense of personal control and confidence in one's skill; F5: positive affect, the activity is experienced as highly enjoyable; F6: a distortion of temporal experience (time may seem to go slower or faster than normal). These consistently co-occurring features of the experience imply that Flow is autotelic, i.e., Flow-producing activities are intrinsically rewarding: people want to do them for their own sake, regardless of external reward.

The antecedent conditions (C1-3) and phenomenological features (F1-6) of Flow have been investigated for several decades, mainly using analysis of self-report data from people engaged in natural everyday or expert performance (Csikszentmihalyi and Bennett, 1971; Moneta, 2012). Despite this, debate continues around the precise definition of the preconditions, especially C1 (challenge–skill balance).

At least three different models have described how Flow depends on challenge–skill ratio. Each model makes different assumptions for how this dependence is affected when skill and challenge change during learning. The original Flow model (Csikszentmihalyi, 1975) assumed that levels of challenge and skill can vary independently from low to high, and that Flow can happen when skill and challenge match at any level. This is the seminal Flow "channel" model.

The now-classic octant/quadrant models (Massimini et al., 1988) instead suggested that Flow only happens when skill and challenge exceed a certain threshold level. If the challenge or skill of task performance are too low, even if matched, then the task will not elicit Flow. At a certain point, the level of matched skills and challenges will become enough to elicit Flow; we will refer to this point as the reference level. The octant/quadrant models have been criticized because it is unclear how to determine the reference level. Does challenge need to exceed the challenge of most typical everyday tasks? Does the task need to be challenging for the individual, or relative to typical skills of a reference population? Or, does the reference level come from some universal benchmark of physical effort or information-processing complexity?<sup>1</sup> Challenge (and skill) could also be task-specific, with its reference level continually recalibrating to the reference level of past performance in the specific task. Each of these issues raises the question: what happens during task learning? Does the reference level need to be recalibrated to track the levels of skill and challenge as they increase?

In their critical examination of these models, Keller and Landhäußer (2012) argue that the concept of reference level is problematic, as it might not be empirically determinable for researchers, or even psychologically available to performing individuals (see also Moneta, 2012). Keller and Landhäußer (2012, p. 56) proposed the Flow intensity model, which has the dimensions "perceived challenge–skill balance" and "subjective value of the task" to define conditions that elicit Flow. This model has no reference levels. On the other hand, the intensity model takes no account of the direction of a challenge–skill imbalance, thus losing information which may be important to understand the Flow experience.

To sum up, the state of the art is unclear on three main points. First, it is unclear whether reference levels of challenge and/or skill govern the emergence of Flow. If they do, it is unclear how the levels are defined. If defined relative to a given task or task episode, we must ask: should the level change with skill acquisition? If defined relative to other tasks, we must ask: should the task demand be compared to the skill sets of the individual, or a reference population, or an absolute standard? Second, it is unclear if the direction of challenge–skill ratio is important, i.e., does it make a difference that skill exceeds challenge vs. challenge exceeds skill. Third and finally, these models capture a static snapshot of Flow. Thus, Flow research must still deal with the effects of learning on C1 (challenge–skill balance).

The matter is of importance because understanding how the Flow conditions behave across different levels of skill is relevant to any field interested in the development of performance (e.g., development of coaching practices or concentration techniques in sport; Jackson and Marsh, 1996). Also, an understanding of the mechanisms that mediate Flow across a learning process could help to enhance enjoyment or performance through better design, e.g., of recreational tasks such as games (Chen, 2007). However, these aims call for studies of Flow elicited across different stages of learning, with a more controlled and quantitative approach. Such an approach can build on recent studies of Flow from fields of experimental psychology (Keller and Blomann, 2008; Harris et al., 2017) and psychophysiology (Peifer, 2012; Peifer et al., 2014; Harmat et al., 2015; Wolf et al., 2015; Labonté-LeMoyne et al., 2016).

Here we report an experimental skill-acquisition study on the connections between performance and the self-reported phenomenology of Flow. We introduce a novel, demanding visuomotor task. With a longitudinal design, we are afforded more power to examine the connections of Flow and performance, within- and between-subjects. We also model the learning shown by participants, finding good fit of the data to a power-law curve, which has been shown to closely approximate a very wide range of skill acquisition datasets (Newell and Rosenbloom, 1982; Logan, 1988; Palmeri, 1999).

Results show that our task successfully elicits an elevated level of Flow across all self-reports. Yet when we examine Flow with fine granularity, we see that variation in Flow responses relates less to overall performance improvement, than it does to deviation of performance from the expected value predicted by a power law model of learning, with higher performance associated with higher Flow and lower performance with lower Flow.
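The idea of "deviation from a power-law learning curve" can be illustrated with a minimal sketch. This is not the authors' analysis code: the parameterization T(n) = a · n^(-b), the log-log least-squares fit, and the function names `fit_power_law` and `deviations` are all assumptions made for illustration, and the example durations are synthetic.

```python
import math

def fit_power_law(durations):
    """Fit T(n) = a * n**(-b) by ordinary least squares in log-log space.

    `durations` are trial durations ordered by trial number 1..N.
    Returns the (hypothetical) parameters a and b.
    """
    xs = [math.log(n) for n in range(1, len(durations) + 1)]
    ys = [math.log(t) for t in durations]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    a = math.exp(mean_y - slope * mean_x)
    return a, -slope

def deviations(durations, a, b):
    """Observed minus predicted log-duration for each trial.

    Negative values mean the trial was faster (better) than the curve predicts.
    """
    return [math.log(t) - math.log(a * n ** (-b))
            for n, t in enumerate(durations, start=1)]

# Synthetic, noise-free durations generated from T(n) = 200 * n**(-0.2)
example = [200.0 * n ** (-0.2) for n in range(1, 41)]
a, b = fit_power_law(example)
residuals = deviations(example, a, b)
```

In a sketch like this, the per-trial residuals (rather than the raw durations) would be the quantity related to trial-wise Flow scores.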

### 1.1. Protocol and Research Questions

Participants learned to play a custom-made high-speed steering game (**Figure 1**)<sup>2</sup>. The game was specifically designed to elicit Flow through balancing task demand with the skill level of the participant, and providing clear immediate feedback. The aim in the game was to steer a blue cube through a course with randomly placed red obstacles at the highest possible speed. The cube started each game at a fixed forward velocity, which increased at a constant rate. The lateral position of the cube was controlled by the steering wheel. Collision with obstacles reduced speed by a fixed amount, also indicated by a flashing of the screen (see section 2 for units).

<sup>1</sup>As an example of a universal benchmark, the maximum propagation speed of neural signals along an axon sets a limit on response times.

<sup>2</sup>For game video see https://doi.org/10.6084/m9.figshare.7269395.v1

This design, inspired by psychophysical staircase methods (Cornsweet, 1962), ensured constant match between skill and demand at the participant's level of performance. The performance was measured by duration of the trial (shorter duration = faster average speed = better), displayed as a time score at the end of each trial.
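The staircase-like dynamic described above can be sketched as a toy simulation. This is an illustrative assumption, not the CogCarSim implementation: the function `simulate_speed`, its parameter values, and the deterministic rule that a collision occurs whenever speed exceeds a fixed `skill_speed` are all hypothetical simplifications.

```python
def simulate_speed(skill_speed, start_speed=1.0, accel=0.1, penalty=0.5, steps=200):
    """Toy staircase model of the game's speed dynamics.

    Speed rises by `accel` each step. Whenever speed exceeds `skill_speed`
    (the hypothetical fastest speed the player can handle), a collision
    occurs and speed drops by the fixed `penalty`.
    """
    speed = start_speed
    trace = []
    for _ in range(steps):
        speed += accel                      # constant-rate acceleration
        if speed > skill_speed:             # player can no longer avoid obstacles
            speed = max(speed - penalty, 0.0)
        trace.append(speed)
    return trace

# Two hypothetical players with different skill levels
low = simulate_speed(skill_speed=2.0)
high = simulate_speed(skill_speed=4.0)
```

Under these assumptions, speed settles into a band just below each player's own limit, which is the sense in which the design keeps demand matched to skill: a more skilled player simply ends up playing a faster game.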

Participants played the game for forty trials across eight sessions, over a period of 2–3 weeks, which was sufficient to achieve good proficiency in this task with no ceiling effect. The 10-item Flow Short Scale (Engeser and Rheinberg, 2008) was filled in after each trial to probe self-reported Flow in the task. Physiological data were recorded (skin conductance, heart rate, and eye tracking), during the task and a 5 min baseline, in sessions one and five-to-eight. These data introduce considerable further research questions and so fall outside the scope of this report.

This design allowed us to explore the following Research Questions:


# 2. METHODS

#### 2.1. Participants

A convenience sample (N = 9, 6 males, 3 females) was recruited via student mailing lists at the University of Helsinki. The participants were between 22 and 38 years of age (mean 27, SD 3) with normal or corrected-to-normal visual acuity and no history of neurological or psychiatric disease.

Eight of the participants had a driving license; two participants reported <10,000 km lifetime kilometrage, three participants 10,000–30,000 km, two 30,000–100,000 km, and one participant >100,000 km. Two had no or very little previous gaming experience, two participants played 1–3 h a month, and five participants stated they play over 1 h a week. **Table 1** shows the details of each participant.

All participants were naive about the specific hypotheses and purpose of the study, other than that at the time of recruitment they were informed that the experiment was about game experience and learning. Participants were given 11 cultural vouchers (1 voucher is worth 5 euro) in compensation for their time. They were told that they would get 9 vouchers for participating in all sessions and 2 extra vouchers if they improved their performance in the game. The criteria for sufficient improvement were not stated explicitly, and in fact all participants were given the two extra vouchers.

Participants were briefed and provided written informed consent before entering the study, and were aware of their legal rights. The study followed guidelines of the Declaration of Helsinki and was approved by the University of Helsinki Ethical review board in humanities and social and behavioral sciences (statement 31/2017; study title MulSimCoLab).

#### 2.2. Design

The experiment was divided into eight sessions, on eight different days over a period of 2–3 weeks scheduled at each participant's convenience. In each session, the participant played five trials of the driving game, each trial lasting 2–4 min depending on their performance, for approximately 15 min of driving time per session. The judgement of how much total playtime (here, 2 h) would be sufficient to develop good task proficiency was based on extensive informal piloting, including prior observations with other convenience samples. **Figure 2** illustrates the protocol.

After each trial, the participant was shown the trial duration and the number of collisions, after which they filled in a self-report questionnaire (FSS). In sessions 1 and 5–8 (lasting approx. an hour), eye-tracking and physiological signals (electrodermal activity and heart rate) were recorded in a 5 min baseline recording before playing, and during gameplay. In sessions 2–4 (lasting 20–30 min), no physiological measurements were taken.

#### 2.3. Materials

#### 2.3.1. Game

The experimental task was a custom-made high-speed steering game, CogCarSim, designed specifically for the study of Flow and coded in Python<sup>3</sup>.

The participant steered a cube "avatar" moving forward along a straight track bounded by edges that could not be crossed. The cube's side length was 2 units, and the track was 25 units wide. The horizontal field of view angle of the virtual camera was 60° and vertical 32°. The camera was positioned behind the cube at 4 units height, pointing forward along the track.

<sup>3</sup>The game code as used herein is permanently available under open source license at https://doi.org/10.6084/m9.figshare.7269467

TABLE 1 | Participant background information.

Stationary obstacles (red cones, red or yellow spheres with a height/diameter of 2 units) on the track had to be avoided. For each trial, a total of 2,000 obstacles were placed randomly on the track, with placement constrained to always allow a path through. Track length varied between 24196.4 and 24199.7 units (mean 24197.8, SD 0.8). The speed of the cube was initially set to 1.6 units per step (96 units per second); it increased at a constant rate (0.0012 units/step at every step) and slowed down if obstacles were hit (0.102 units/step at each collision). When a collision caused a speed drop, the screen flashed to indicate a collision; there followed an immunity period of 100 steps during which additional collisions did not cause further speed drops. Participants could only affect speed indirectly, by avoiding collisions. Participants were instructed to avoid as many obstacles as they could in order to complete the trial as fast as possible.
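The speed rule described above can be illustrated with a minimal single-step sketch. This is not the CogCarSim source code: the names `SpeedState` and `step_speed` are hypothetical, the order in which the collision penalty and the constant gain are applied within a step is an assumption, and any lower bound on speed that the game may enforce is not modeled. Only the four numerical constants are taken from the text.

```python
from dataclasses import dataclass

START_SPEED = 1.6       # units per step at trial start (from the text)
SPEED_GAIN = 0.0012     # units/step added at every step (from the text)
COLLISION_DROP = 0.102  # units/step removed per penalized collision (from the text)
IMMUNITY_STEPS = 100    # steps after a penalized collision without further drops


@dataclass
class SpeedState:
    """Hypothetical container for the quantities the speed rule depends on."""
    speed: float = START_SPEED
    immunity_left: int = 0  # remaining steps during which collisions cost nothing


def step_speed(state: SpeedState, collided: bool) -> SpeedState:
    """Advance the speed state by one step.

    `collided` flags contact with an obstacle on this step. Assumption: the
    penalty is applied first, then the constant gain.
    """
    speed, immunity = state.speed, state.immunity_left
    if collided and immunity == 0:
        speed -= COLLISION_DROP      # penalized collision
        immunity = IMMUNITY_STEPS    # start the immunity period
    elif immunity > 0:
        immunity -= 1                # collisions are ignored while immune
    return SpeedState(speed + SPEED_GAIN, immunity)
```

Because the participant cannot brake or accelerate, the only input to this rule is the collision flag, which is what makes the game speed track the participant's current skill.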

The game had maximally simple one degree-of-freedom linear and holonomic dynamics: the horizontal position of the cube was directly proportional to steering wheel angle. Extensive self-piloting was done to adjust the graphics (e.g., virtual eye height), the starting and increment speeds, the rate of change of speed during collisions, and the steering wheel sensitivity (steering ratio and damping).

The participants started each trial by pressing a button on the steering wheel when they felt ready. At the end of each trial, the elapsed time and number of collisions were displayed, along with a high score of the participant's ten best trials so far.

Data collected by CogCarSim included the positions, shape, and color of obstacles on the track; trial-level aggregated performance data (trial duration, number of collisions, average velocity); and within-trial time series data (steering wheel and cube position, speed, registered collisions).

#### 2.3.2. Equipment

The game was run on a Corsair Anne Bonny with Intel i7 7700k processor and an Nvidia GTX 1080 graphics card, running Windows 10.

The participant was seated in a Playseat Evolution Alcantara playseat (Playseats B.V., The Netherlands) aligned with the midpoint of the 55″ display screen (LG 55UF85). The screen resolution was 1,920 × 1,080 pixels, the frame rate was 60 frames per second, and the refresh rate 60 Hz. The viewing distance was adjusted for each participant (so that they could place their hands on the steering wheel comfortably) and was approximately between 90 and 120 cm from the eye to the screen. The game was controlled with a Logitech G920 Driving Force steering wheel (Logitech, Fremont, CA). Steering wheel settings in Logitech Gaming Software 8.96.88 were: sensitivity 100%, centering spring strength 4%, and wheel operating range 900°.

Eye-tracking and physiological signals were collected and stored on an Asus UX303L laptop with Debian GNU/Linux 9 OS. Electrodermal activity (EDA) and blood volume pulse (BVP) were recorded at 128 Hz sampling rate using NeXus-10 (Mind Media B.V., Roermond-Herten, The Netherlands). For EDA, silver-silver chloride (Ag-AgCl) electrodes with 0.5% saline paste were attached to the medial side of the left foot with adhesive skin tape and gauze. The BVP (heart rate) sensor was attached to the left index toe of the participant. Eye tracking was measured with a Pupil Labs Binocular 120 Hz headset with a custom-built headband<sup>4</sup>.

#### 2.3.3. Flow Short Scale

To measure self-reported Flow, participants were asked to fill in the Flow Short Scale (FSS) after each trial (Rheinberg et al., 2003; Engeser and Rheinberg, 2008). FSS has 10 core items which load the subfactors fluency of performance (6 items) and absorption by activity (4 items); plus 3 items for perceived importance. The response format of FSS is a 7-point Likert scale ranging from Not at all to Very much. Higher scores on the scales indicate higher experienced Flow and perceived importance. Example items include "My thoughts/activities run fluidly and smoothly" (fluency of performance), "I do not notice time passing" (absorption by activity), and "I must not make any mistakes here" (perceived importance). See **Supplementary Information** for full English text and Finnish translation.

Cronbach's alpha for a 10-item scale including the fluency of performance and absorption by activity items was 0.92; Cronbach's alpha was 0.87 for the 13-item FSS scale including perceived importance (Rheinberg et al., 2003). The FSS authors (Rheinberg et al., 2003) suggest using the 10-item scale (excluding the perceived importance subfactor) as a measure of experienced Flow. For our data also, Cronbach's alpha was higher for the core 10-item scale than for the 13-item scale. Thus, the Flow scale used in our analyses was formed by averaging the items in the fluency of performance and absorption by activity subfactors. The perceived importance subfactor was used separately in some analyses (see section 3).
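For readers unfamiliar with the two computations mentioned above, the following sketch shows the standard formula for Cronbach's alpha and a Flow score formed by averaging a subset of items. It is an illustration only, not the analysis code used in this study (which was written in R): the function names are hypothetical, and the item indices passed to `flow_score` are placeholders because the position of the core items within the questionnaire is not given here. The sketch also treats every questionnaire as an independent row, ignoring the repeated-measures structure of the data.

```python
from statistics import mean, variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of questionnaires.

    `responses` holds one row per completed questionnaire; each row is a
    list of k item scores in a fixed item order.
    """
    k = len(responses[0])
    # Sample variance of each item across questionnaires.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def flow_score(row, core_items):
    """Average the answers at the (hypothetical) indices in `core_items`."""
    return mean(row[i] for i in core_items)
```

Alpha approaches 1 when the items vary together across questionnaires, which is the sense in which the 10-item core scale was more internally consistent than the 13-item scale.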

In addition to the 13 main items asked after every trial, participants were asked at the end of every session to report 3 more items measuring the fit of skills and demands of the task (Rheinberg et al., 2003). These items also had 7-point scales, e.g., "For me personally, the current demands are... (too low / just right / too high)."

There was no Finnish translation of the scale available, so it was translated into Finnish by the authors. Two of the authors (native speakers of Finnish, no formal qualifications for English-Finnish translation) first made translations independently; these translations were compared and revised, then reviewed by other Finnish-native authors, and revised.

#### 2.4. Procedure

After recruiting, participants selected eight suitable dates within a 3-week period. All sessions took place between 8 a.m. and 7 p.m. at Traffic Research Unit, Department of Digital Humanities, University of Helsinki. In the first session, participants were informed about the procedure of the study and asked to fill in a background information questionnaire, including information on health, driving experience and gaming experience, and an informed consent form.

The sessions were managed by two research assistants at a time, who observed the measurement out of participants' line of sight behind a partition wall, and took notes about possible confounding factors and problems within the session. At the beginning of each session, participants filled in a session-wise questionnaire on the use of contact lenses, restedness, and medication, caffeine, and nicotine intake.

In sessions with physiological measurements (1 and 5 to 8), participants were fitted with physiological sensors and an eye-tracking headset, and seated in the driving seat in quiet, low-light conditions for baseline measurement. They were asked to sit still for 5 min, looking at a dark blue screen, while baseline was recorded. After baseline recording, participants played five game trials, filling in the FSS after each trial. Eye-tracking and physiological signals were recorded during trials. In sessions 2–4, participants played five trials straight after filling in the session-wise questionnaire, without a baseline period. The FSS was filled in after each trial. At the end of Session 8, the participants were debriefed and given the reward of culture vouchers.

#### 2.5. Statistical Methods

All statistical data processing reported herein was implemented with the R platform for statistical computing (R Core Team, 2014). Where possible, exact corrected p-values are reported; inequalities are reported where exact values were not available. All p-values were corrected for multiple comparisons using the Bonferroni-Holm method. For all simple correlations we calculated Pearson's correlation coefficient, because all data in these tests were shown to be normally distributed by Shapiro-Wilk tests and associated Q-Q plots.
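The Bonferroni-Holm correction mentioned above can be sketched as follows. This is a generic step-down implementation for illustration, not the authors' code (the reported analyses were run in R); the function name `holm_adjust` is hypothetical, and which tests were grouped into one family for correction is not specified here.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original order.

    The smallest p-value is multiplied by m, the next by m - 1, and so on;
    adjusted values are capped at 1 and forced to be non-decreasing along
    the sorted order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted
```

Capping at 1 explains why several corrected p-values in the Results are reported as exactly 1.0.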

For RQ1, participant-wise linear regression models were fitted using the lm function in R, which also supplies R² values. The same approach was used to fit the "grand model" to group-wise data (i.e., pooled participants).

For RQ2, we obtained the independent variable as follows. For each participant and for each trial (40 trials in total), we subtracted predicted trial duration (y-value of power-law performance line) from observed trial duration, thus obtaining power-law model residuals, in units of log(sec). We refer to these within-participant trial-duration residuals as deviation scores, because they represent how much each observed trial duration deviates from the duration predicted by the model. Note, residuals are in the space of log-transformed trial durations in seconds and are therefore equivalent to ratio of performance in seconds. So for similar deviation scores from two trials, the later deviation represents a larger (or equal) effect in seconds.
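The derivation of deviation scores described above can be sketched for a single participant as follows. This is an illustrative re-implementation under stated assumptions, not the authors' R code: the function names are hypothetical, natural logarithms are assumed (the base is not stated), and trials are assumed to be numbered 1 to 40 in the order played.

```python
import math


def fit_power_law(durations):
    """Least-squares line through (log trial number, log duration).

    `durations` lists one participant's trial durations in seconds, in the
    order played. Returns (intercept, slope) in log-log space.
    """
    xs = [math.log(n) for n in range(1, len(durations) + 1)]
    ys = [math.log(d) for d in durations]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def deviation_scores(durations):
    """Observed minus predicted log-duration for every trial."""
    intercept, slope = fit_power_law(durations)
    return [
        math.log(d) - (intercept + slope * math.log(n))
        for n, d in enumerate(durations, start=1)
    ]
```

A negative deviation score marks a trial that was completed faster than the participant's learning curve predicts, and a positive score marks a slower one.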

Specifically, we first fit a linear mixed model with nonstandardized Flow scores as the dependent variable, deviation scores as the predictor, and participant (numerical participant identifier ranging from 1 to 9) as a random factor with both random intercept and slope. This approach was chosen to handle the non-independence of data points within participants (see Bates et al., 2015).

Note that there is no consensus on the best way to obtain p-values or estimates of effect sizes from linear mixed models. We have treated the t statistic as a z statistic using a standard normal distribution as a reference, and followed the method by Nakagawa and Schielzeth (2013) to obtain pseudo-R² values. Another way to statistically evaluate the significance of these results is via the binomial distribution: The (two-tailed) probability of 9 negative slopes (should the probability of a negative slope per participant be 0.5, that is, fully random) is p = 0.007.

<sup>4</sup>For technical details, see https://zenodo.org/record/1246953#.XJT-Ki10eqA

# 3. RESULTS

All participants completed the task (40 trials in total). Average trial duration was 186 s (SD 18.2 s, min 162.2 s, max 300.1 s). Average number of collisions was 17.8 (SD 4.9, min 5, max 40). Average trial velocity ranged between 1.37 and 2.54 units per step (mean 2.23, SD 0.19). Maximum instantaneous speed was 3.6 and minimum 1.06. The **Supplementary Information** provides comprehensive data on performance-related features, such as trial duration, along with correlations between them; it also includes further details on participant self-report and background, plus validation of our main result for RQ2.

#### 3.1. RQ1: How Does Performance Change Over Time?

What is the form of the learning curve: does performance improve consistently, e.g., as a power law of practice (Newell and Rosenbloom, 1982)? A power-law curve transformed to log-log space will be linear. Thus, to investigate whether participant behavior follows a power law, we fitted a linear model in log-log space (log-transformed dependent and independent variable) of trial durations as a function of cumulative number of trials, for each participant separately. **Figure 3A** shows this log-log performance data for each participant in each trial. Blue dashed lines indicate the power-law LC. Distance of points from the line (residuals) indicates the deviation of each trial from predicted learning: points above the line indicate longer duration (worse performance) than predicted by the LC, and vice versa.
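To make the linearization explicit, write the duration of the $n$-th trial as $T_n$. The symbols $a$ and $b$ are introduced here for illustration only and do not appear in the original analysis; they correspond to the back-transformed intercept and the slope of the fitted line, respectively.

$$
T_n = a\,n^{b} \quad\Longleftrightarrow\quad \log T_n = \log a + b \log n
$$

Under this form, a negative slope $b$ means that trial durations shorten with practice, and each additional trial yields a smaller absolute gain than the one before.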

All participant-specific log-log models had negative slopes, which indicates that with experience each participant learned to play better (obtained faster trial times). The variation in intercepts reflects disparity in participants' initial skill levels, and the variation in slopes the different learning rates. The individual intercepts and slopes of the models are presented in **Table 2**. A grand model was also fitted for all participants, and cumulative number of trials explained 39.6% of variance in trial durations. As the performance generally improves with cumulative trials, in agreement with a power-law of learning model, the explained variance can be ascribed to learning.

To confirm that a power law model gives a good approximation of learning, we compared its model-fit criterion against the fit of an exponential curve model (see **Supplementary Information** for details). While both models had good fit, the power law model was slightly better.

RQ1 can thus be answered: the task was learned and the LC fit well to a power law model. Given these positive answers, we may assume that the model provides a useful statistical estimate of performance expectation, i.e., how well the participants expect to perform can be estimated from the model.

#### 3.2. RQ2: How Is Flow Related to Performance?

Participant-wise mean Flow and LC slope were positively but not significantly correlated (Pearson's correlation coefficient r = 0.6, p = 0.6, N = 9). Since we have established that performance improves over sessions, we also used session number as a simple proxy of performance improvement. **Figure 4** shows the group-wise distribution of Flow scores plotted against sessions: clearly, there is no effect of session on group-wise median Flow (Pearson's r = -0.12, p = 1.0, N = 8). We found no significant effect of session type (sessions 2–4 vs. sessions 1, 5–8) on Flow scores [F(1, 8) = 3.18, p = 0.11], by repeated measures ANOVA.

Next, we calculated the group-wise correlation of median duration and median Flow, separately for each session. The relationship between duration and Flow was intermittently significant before correction for multiple comparisons, but not after, and with no particular trend (range of Pearson's r = [-0.05 … -0.74], p = [0.2 … 1.0], N = 9 for all). These results suggest that higher Flow was sometimes associated with lower trial durations (i.e., better performance), but not strongly and not systematically. If we group sessions by condition (introduction = 1, practice = 2–4, main test = 5–8), we can visualize the evolution of performance against Flow more clearly than by plotting each session individually, see **Figure 5**.

The relationships between global (over all responses) Flow and performance appear weak, but we also wish to examine local Flow for each trial separately. The points in each subplot of **Figure 3A** are colored according to Flow self-reports made after each trial, in a standardized range (original scores transformed to z-scores). The highest Flow scores are yellow, the lowest are navy-blue. Interestingly, this figure reveals at a glance that the points lie above and below the log-transformed power-law line in good agreement with the level of experienced Flow: worse performing trials (data points above the line) tend to be more blue (Flow scores below the participant-wise mean), and better performing trials tend to be more yellow (scores above the mean). In other words, it seems that whenever participants were performing better than predicted by the power-law line, they were experiencing more Flow, and vice versa.

We evaluated whether this effect was robust and statistically significant. For each participant, we correlated their deviation scores (signed residuals from the power-law model) with their Flow scores, using a linear mixed model (see section 2). This model was statistically significant (deviation score β = -8, t = -4.36, p = 0.002) and the relationship is shown in **Figure 3B** (Flow scores are standardized). The conditional pseudo-R² value for this model was 0.47, corresponding to a correlation of 0.68, so that the model explains ∼47% of Flow score variability.

Thus, high Flow scores are associated with better than predicted results (trial durations below the predicted performance line), and vice versa. The strength of this association per participant follows from the strength of the correlation, and overall the model has a large effect size.

As can be seen in **Figure 3B**, the trend was clearly negative for 7 out of 9 participants, while for two participants, 3 and 7, the trend was similar but the relationship was weaker. Notably, these two participants also reported lower scores on perceived importance: mean scores for these participants were 2.03 and 2.22, whereas the overall mean was 3.77 (see **Table 2**). However, the group-wise interaction between perceived importance scores and deviation scores was not statistically significant.

TABLE 2 | Individual learning rate parameters (cols 2–4), Flow (cols 5–6) and perceived importance (P.I., cols 7–8) scores. Last row shows group-mean values of each column.

RQ2 can thus be answered: Flow was not consistently and robustly related to improvement in task performance with the skill acquisition occurring over 2 h of practice. It was, however, consistently related to whether performance was better (or worse) than predicted given the participant LC. Moreover, this effect might be moderated by self-reported perceived importance of the steering task (more data would be required to clarify).

# 4. DISCUSSION

We present a longitudinal experiment of Flow in a game-like high-speed steering task where task performance is easily parameterized and its relation to Flow analyzed. To induce Flow, the game was designed to hold the balance between skill and challenge constant: the difficulty of the game continually adapted to the skill level of the participant.

The results show the game was clearly Flow-inducing: mean Flow across sessions was reported as 5.1 (out of 7) on the FSS. This relatively high and stable mean Flow "baseline" induced by the game could be construed as reflecting the meeting of skill and challenge C1–3 by design.

We further found that Flow was not associated with gaining experience and skill in the game: our participants did not reliably report more Flow even as they learned, session by session, to complete the trials faster. This supports the theoretical position that Flow is elicited by the balance of skill and challenge, but shows that Flow is less sensitive to the absolute level (within a task) of skill or challenge. This fits with the models which are more lenient regarding skill/challenge level: the original Flow Channel model and the latest Flow intensity model. The Quadrant/Octant models, which require "above-average" skills and challenges for Flow, are only supported under the assumption that this above-average reference level is task-specific and dynamically adjusted in step with the learning curve. Otherwise, if the reference level is fixed, then based on the Octant model experience of Flow would be predicted to increase along with the increase of skill level (and demand) during our longitudinal measurements of participant learning. In other words, when skills and challenges increased from a fixed reference, participants should have felt further "north-east" of the model midpoint where Flow bottoms out, and thus be more likely to report Flow and assign it greater intensity on a reporting scale. This was not observed. Therefore, our results do not fit the predictions of the Quadrant/Octant models under the assumption of a fixed-demand reference level. See also Keller and Landhäußer (2012) for critical discussion of these models. In the absence of an independent motivation, the hypothesis of a dynamically adjusted reference level must be considered somewhat ad hoc. So, we suggest that, in order to incorporate the present findings, the octant model should be developed to provide a valid set of assumptions to support clear conclusions about the reference level.

#### 4.1. Mechanisms of Learning and Flow

We also showed that higher trial-wise Flow (trials with higher self-reported Flow) was associated with trial durations shorter than expected by the power law LC model, and vice-versa. Thus, Flow for each trial was higher-than-average or lower-than-average in agreement with task performance that was better or worse than expected (at the current level of skill). This stands in contrast to how mean Flow remained stably elevated across sessions. In other words, learning to play the game did not itself increase Flow; rather, the game induced a fairly high level of mean Flow, and trial-wise variability of Flow was correlated with better or worse than statistically-expected performance.

#### 4.1.1. Alternative Explanations of Trial-Wise Results

It is a novel observation that trial-wise Flow relates to fluctuations of performance around the level expected from the learning curve. It is interesting because higher-than-expected performance in an individual trial can plausibly indicate either higher skill (e.g., better concentration), or lower challenge (easier-to-negotiate random placement of obstacles). Both are deviations from the average skill-challenge balance, yet may be associated with higher Flow. Either way, our result undermines the assumption of the Flow Intensity model of Keller and Landhäußer (2012) that the direction of the skill-challenge ratio can be ignored: our results show that Flow is elevated when skills exceed challenges.

There are naturally several alternative explanations for this result. One possibility is that performance on some trials is enhanced by increased Flow during the trial, i.e., participants perform better when they "get into Flow." Another possibility is that the randomly generated geometric layout of each trial might be more or less easy to negotiate; the "easier" trials would afford faster performance. Such random fluctuations of task difficulty could shift the skill-challenge ratio closer to one that the participant finds Flow-inducing. Alternatively, participants may be more likely to report higher Flow (after the fact) on a more successful (hence, more rewarding) trial, because they see their score before they complete the FSS. Of these three alternatives, we find the first one the most convincing, because (A) comparing first to second, we believe that whatever biological and psychological mechanisms might be underpinning Flow will tend to create greater variation than the randomly generated track layout; and (B) comparing first to third, the game design undermines the third alternative (see Limitations below). Ultimately however, the present paradigm cannot conclusively support any one alternative.

#### 4.1.2. Cognitive Mechanisms of Flow

How can our approach and the present results be helpful to understand the mechanisms generating the Flow experience?

Methodologically, this study follows a time series approach rarely used in Flow research, by looking at changes in Flow over time, with relatively frequent and non-independent repeated measurements. This contrasts with much prior work which treats Flow as a relatively stable property, and allows us to look at on-task learning.

Learning implies skill increase, which (by game design) implies challenge increase, which (by design of Quadrant/Octant models) together imply Flow increase. As discussed, for such "state Flow" models the assumption of the fixed reference frame leads to the prediction of increased Flow with skill acquisition, which is not supported by our data. It is thus not straightforward to reason about the evolution of Flow, or learning and Flow, based on these models, before resolving the crucial issue of the reference level for above-average task demand and skill.

For the Flow intensity model this particular problem does not arise (Keller and Landhäußer, 2012). However, the relation of learning and Flow is not entirely straightforward here, either. Increased skills should eventually increase task demands (because skill-learning increases access to the task's deeper levels of challenge), and thus perceived fit of skills and task demands. We did not assess perceived value directly, but it is plausible that time investment in and enjoyment of the game (as indicated by high mean Flow) would also increase the subjective value of the activity. If this is the case, again higher Flow should be elicited in step with increasing performance. This was not observed.

Overall, the lack of mechanistic hypotheses about the processes underlying the proposed dependencies in the Flow models makes it difficult to make definite predictions in novel tasks, especially ones with changing task demands and skill, such as here. Prior work has provided some (neuro)cognitive, information-processing views on Flow (Marr, 2001; Cowley et al., 2008; Šimleša et al., 2018). Such work could provide an approach to make cognitive hypotheses about Flow, but these models have not been empirically tested, so it is unclear which (if any) to follow.

The aim of future work should then be to find out what cognitive processes are specific to Flow-inducing task performance (in different stages of learning), but also general to multiple performance domains. By so doing, we can in future attempt to clarify empirical observations by reference to a distinct cognitive theory of how Flow is generated.

#### 4.1.3. Flow and Task Complexity

A possibly useful novel way to view Flow and learning is via task complexity. Csikszentmihalyi (1999) proposed that Flow should be possible in any task, complex (e.g., car driving) or simple (e.g., dish washing). But Keller and Landhäußer (2012) also proposed that Flow depends on perceived task value as well as challenge–skill balance. One way to resolve these ideas is to consider that the individual can introduce complexity (or value) to their activity if they appear to have exhausted that task's potential to challenge them. Nakamura and Csikszentmihalyi (2002) suggest such an exploratory mechanism to explain how individuals maintain Flow in complex tasks: "As people master challenges in an activity...to continue experiencing Flow, they must identify and engage progressively more complex challenges." The corollary for simple tasks is that individuals create complexity, e.g., with self-defined goals (Rauterberg, 1995). For example, a similar state to Flow, called the Zone, has been reported for machine-gambling addicts whose pastime is in fact skill-free, but who nevertheless believe that they are skilled (Schull, 2014).

In summary, complex tasks have deep structure to be learned, requiring non-trivial skill acquisition for any duration of learning and thus a shallow LC (learning is slow). Importantly, the skill level does not quickly peak, as it does with simpler tasks like washing the dishes, where Flow might be obtained but cannot strongly interact with learning (without self-created complexity). Learning comes into play when we consider that the same task can appear at first simple and later complex, as in our experimental game.

#### 4.2. Limitations and Future Work

Our study had a small convenience sample because (a) the recording paradigm was extensive (around 8 h of contact time), and (b) it was to some degree an exploratory study; both imply the need to constrain datasets to tractable sizes. It is worth noting that while we recruited only 9 participants, we collected a significant amount of behavioral and physiological data for each participant over a course of 2 weeks and 8 recording sessions. This amounts to a quite rich dataset allowing us to dig deep into the underpinnings of skilled high speed steering and Flow. Moreover, given that our experimental paradigm has not been used previously, we could not a priori easily estimate the statistical power required to discover significant effects. However, our results ended up being both strikingly similar and strong within each participant, suggesting that collecting a larger sample would likely not have changed the pattern of the results or provided more in-depth insights. Regardless of the justifications, the sample size and recruitment method are minor limitations to be remedied in future work. For example, gender is known to affect performance in visuomotor tasks (Feng et al., 2007). However, although there is a gender imbalance in our sample, the factor gender is confounded by background variables including gaming and driving experience, masking the true effect of gender. In order to properly study gender differences we would need to tailor our recruitment to that purpose. On the other hand, the result of Feng et al. (2007) suggests that gender differences might anyway balance out over the course of learning such a task as ours. Finally, because it lacks experimental manipulations the study cannot make strong causal claims, which could be improved by recording separate conditions of the game task, e.g., with varied difficulty levels.

As stated above, there are different plausible explanations of the main result linking Flow and performance. If FSS reports were indeed influenced by seeing the score beforehand, this should be considered a design limitation. However, the game gives such clear and direct feedback on performance (i.e., after collisions), that it is likely that self-assessment of performance would be similar with or without seeing the score. The score can thus be considered just a reinforcement of the perception of fit between skills and demands, which is anyway a required part of Flow reporting (Keller and Landhäußer, 2012). There is a possibility that Flow states could be affected by wearing eye-tracking and physiological equipment in some of the sessions. However, there was no difference in the level of subjective Flow reports in these sessions compared to other sessions.

Trial-by-trial analysis is limited by the Flow self-report which only has one data point per trial. It would be more powerful to analyse inside each trial. In our data, self-reported Flow is a point model of an entire trial, for which the participant knows their score. Self-reported Flow is thus an after-the-fact report, and could be criticized for not capturing the in-the-moment experience of Flow, which might fluctuate greatly during a trial. Analysing the fluctuation of Flow-experience requires us to model individual actions and/or their outcomes, and sample Flow during performance. This is a difficult challenge, because paying attention to one's phenomenal state might easily disrupt the very processes sustaining that state, especially for Flow which is unreflective by definition. Future work should aim to model the conditions of Flow (C1–3) in real-time, while simultaneously recording participant physiology, to uncover in greater detail the relationships involved. The existing dataset will be used for this purpose, in a pending report on the biosignals recorded with high temporal resolution, primarily electrodermal activity.

Future work should also look into individual differences in learning (or cognitive) styles. For example, ample evidence suggests people can be roughly placed on a continuum of verbal as opposed to visual learners. These learning styles, in turn, have been positively linked to cognitive abilities on either verbal or visual tasks (e.g., Choi and Sardar, 2011; Knoll et al., 2017). Since the task used in our study was highly visual (there were no verbal cues nor audio), learning in it might be moderated by players' cognitive learning styles. For example, perhaps those with a propensity for visual learning find it easier to get into flow and thus perform better. Therefore, an interesting avenue for further research using our driving task is first dividing people into verbalizers and visualizers, and then seeing if visualizers in particular find the game flow-inducing.

In terms of possible applications, game-induced Flow has been studied in the context of technology-enhanced learning (TEL) games (e.g., Cowley et al., 2014), but the style of activity in such games tends to be rather more complex than the driving task reported here. Use of such simple tasks for TEL games has been reported (Cowley et al., 2011), but it remains unclear what higher-learning benefit is derived from the Flow induced by the TEL game (Cowley and Bateman, 2017). In summary, there remains a large conceptual gap between what is known about task-learning and Flow, and how to make use of Flow-inducing tasks for higher learning applications.

### 4.3. Conclusion

We report results that self-reported Flow in a novel, challenging, and engaging high-speed steering task relates to trial-by-trial task performance relative to the learning curve: "better than expected" trials have higher Flow scores, and "worse than expected" trials have lower scores. The average level of self-reported Flow was high, as the game was specifically designed to meet the main preconditions of Flow, including balance of current skill and challenge. Perhaps surprisingly, Flow did not seem to change with global skill development or improvement in task performance.

These results show that: (1) If a reference level is important (as the octant model requires), it is so on a trial-wise scale. In other words, the reference level is task-specific and is continually adjusted during skill acquisition, following in step with the individual's own learning curve; (2) Contrary to the intensity model (Keller and Landhäußer, 2012) the direction of challenge-skill deviation cannot be ignored. Our study highlights a need for models of Flow to be developed in a way that better captures Flow dynamics, over the range of skill acquisition from novice to expert, than the state-like models of the phenomenal psychology tradition (Moneta, 2012).

Understanding how phenomenological experiences, such as Flow, relate to task performance is an important topic for understanding human motivation and performance. This study contributes to this goal and, we hope, will inspire more inquiry into the dynamics of the Flow experience in different stages of learning a skill.

### 4.4. Context

The three senior authors conceived and guided the work. BC (the lead author) has studied computer gameplay from the perspective of performance, psychophysiology and emotional experiences. OL (the last author) has studied visuomotor skill in the domain of steering, and with JV he designed the "Flow-inducing" high-speed steering game. JP (the second author) has studied emotions and decision-making in games of skill and chance, and written on the similarities and differences between the Zone and Flow phenomena. Because these authors share a keen interest in understanding the development of expertise and the phenomenon of Flow, it was decided to join forces and put together a team of researchers to develop the present experiment on the basis of the steering game designed earlier. The other authors (all more junior graduate students) were recruited to work as part of their studies. This experiment is part of a larger effort to initiate a line of research into the neurocognitive processes underlying Flow and expert performance, combining experimental, psychophysiological, and computational methods.

#### DATA AVAILABILITY

The R code and data used to produce all analyses and figures are permanently available online at https://doi.org/10.6084/m9.figshare.7268387.

#### REFERENCES


#### ETHICS STATEMENT

Participants were briefed and provided written informed consent before entering the study, and were aware of their legal rights. The study followed guidelines of the Declaration of Helsinki and was approved by the University of Helsinki Ethical review board in humanities and social and behavioral sciences (statement 31/2017; study title MulSimCoLab).

#### AUTHOR CONTRIBUTIONS

OL, JP, and BC conceived the study. OL and JV designed the gameplay. JV developed the game software. BC and TT designed and implemented the data collection. NL, TT, PP, RF, V-PI, and JP translated the FSS. All authors participated in decisions on the experiment specifications. RF, V-PI, NL, PP, and TT conducted the experiment. BC, V-PI, TT, RF, PP, NL, JP, and OL analyzed and interpreted the results. BC, JP, and OL drafted the paper. All authors participated in writing and reviewing, and approved the manuscript.

#### ACKNOWLEDGMENTS

Authors wish to thank Kalle Toikka for conceptual contributions, data gathering, and team-work.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2019.01126/full#supplementary-material




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Cowley, Palomäki, Tammi, Frantsi, Inkilä, Lehtonen, Pölönen, Vepsäläinen and Lappi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Is There an Optimal Autonomic State for Enhanced Flow and Executive Task Performance?

Michael S. Chin<sup>1,2</sup>\* and Stefanos N. Kales<sup>3,4</sup>

<sup>1</sup> Division of General Internal Medicine and Public Health, Vanderbilt University School of Medicine, Nashville, TN, United States, <sup>2</sup> Vanderbilt Occupational Health, Vanderbilt University Medical Center, Nashville, TN, United States, <sup>3</sup> Environmental and Occupational Medicine and Epidemiology Program, Harvard T.H. Chan School of Public Health, Boston, MA, United States, <sup>4</sup> Department of Occupational Medicine, Cambridge Health Alliance, Harvard Medical School, Boston, MA, United States

Introduction: Flow describes a state of optimal experience that can promote a positive adaptation to increasing stress. The aim of the current study is to identify the ideal autonomic state for peak cognitive performance by correlating sympathovagal balance during cognitive stress with (1) perceived flow immersion and (2) executive task performance.

#### Edited by:

Frederic Dehais, National Higher School of Aeronautics and Space, France

#### Reviewed by:

Gianluca Di Flumeri, Sapienza University of Rome, Italy

Jean-François Gagnon, Thales Research & Technology, Canada

> \*Correspondence: Michael S. Chin mchin@mail.harvard.edu

#### Specialty section:

This article was submitted to Performance Science, a section of the journal Frontiers in Psychology

Received: 24 April 2019 Accepted: 09 July 2019 Published: 14 August 2019

#### Citation:

Chin MS and Kales SN (2019) Is There an Optimal Autonomic State for Enhanced Flow and Executive Task Performance? Front. Psychol. 10:1716. doi: 10.3389/fpsyg.2019.01716

Materials and Methods: Autonomic states were varied in healthy male participants (n = 48) using combinations of patterned breathing and skeletal muscle contraction that are known to induce differing levels of autonomic response. After autonomic variation, a Stroop test was performed on participants to induce a mild stress response, and autonomic arousal was assessed using heart rate variability. Subjective experience of flow was measured by standardized self-report, and executive task performance was measured by reaction time on the Stroop test.

Results: There were significant associations between autonomic state and flow engagement with an inverted U-shaped function for parasympathetic stimulation, sympathetic response, and overall sympathovagal balance. There were also significant associations between autonomic states and reaction times. Combining sympathetic and parasympathetic responses to evaluate overall sympathovagal balance, there was a significant U-shaped relationship with reaction time.

Discussion: Our results support the flow theory of human performance in which the ideal autonomic state lies at the peak of an inverted-U function, and extremes at either end lead to suboptimal flow experience. Similarly, cognitive task performance was maximized at the bottom of the U-function. Our findings suggest that optimal performance may be associated with predominant, but not total, sympathetic response.

Keywords: flow, heart rate variability, cognitive performance, parasympathetic and sympathetic reactivity, sympathovagal balance

#### INTRODUCTION


Recent public health focus on occupational burnout and stress resiliency has prompted further investigation into the role of the autonomic system in cognitive performance. An increasing number of recent studies have suggested that burnout is related to autonomic dysfunction during excessive stress (Lennartsson et al., 2016; Kanthak et al., 2017; May et al., 2018; Zhang et al., 2018; Traunmuller et al., 2019). However, despite these concerning observations, the effects of autonomic state on cognitive performance have not been fully defined, and some degree of increased stress may actually be desired for task performance. Increased autonomic arousal correlating with improved task performance is supported by several studies (Luft et al., 2009; Mathewson et al., 2010; Murray and Russoniello, 2012).

As a possible explanation resolving these seemingly conflicting findings, flow theory, as proposed by psychologist Csikszentmihalyi (1975), describes a state of optimal experience characterized by complete task immersion, effortless intention, intrinsic reward, and increased perception of control. This experience can promote a positive adaptation to increasing stress by a matching growth between challenges and skills through immediate task feedback.

The ideal physiological conditions to facilitate the flow experience have not been established. According to Csikszentmihalyi's (1975) theory, the flow state lies on the continuum between boredom and anxiety. Csikszentmihalyi (1975) proposed that there were physiological changes associated with the flow experience, but recognized that this association between physiology and psychology was not readily established. Previous studies have identified a possible association between increased sympathetic activity and the flow experience, but none have adequately demonstrated an optimal level of autonomic arousal for both task performance and subjective flow experience (Keller et al., 2011; Gaggioli et al., 2013; Peifer et al., 2014; Bian et al., 2016).

Many popular mind–body disciplines such as Yoga, Tai Chi, and Qigong, have been shown to activate the parasympathetic nervous system to varying degrees (Goyal et al., 2014; Sullivan et al., 2018; Walther et al., 2018). As part of a related project, the authors previously studied healthy male subjects' responses to preconditioning using various rhythmic breathing and skeletal muscle contraction methods to vary their baseline autonomic states (Chin and Kales, 2019). Our previous study assessed paced respiration and dynamic tension through rhythmic skeletal muscle contraction as two core components common to Yoga, Tai Chi, and Qigong to better understand their interaction in activating the parasympathetic nervous system. The activation of the body's parasympathetic nervous system has been demonstrated to occur through respiratory entrainment effects (Jerath et al., 2006; Nijjar et al., 2014) as well as voluntary rhythmic muscle contraction (Lehrer et al., 2009; Vaschillo et al., 2011).

The aim of the current study is to utilize the autonomic variability resulting from these different patterns of preconditioning, when faced with a cognitive stressor, to identify an autonomic state for maximized cognitive performance. This study examines the relationship between sympathovagal balance during cognitive stress with (1) perceived flow immersion, and (2) task performance. Sympathovagal balance was assessed by measurement of heart rate variability (HRV), which can be considered an indicator of autonomic activity (McCraty and Shaffer, 2015). A recent meta-analysis of 37 studies by Kim et al. (2018) concluded that HRV is impacted by stress and can be used as an objective assessment of psychological stress.

#### MATERIALS AND METHODS

#### Participants

Forty-eight healthy male participants, ages 18–55 years, were recruited from Harvard T.H. Chan School of Public Health- and Harvard Medical School-affiliated student programs, fellowships, and training residencies. Only males were enrolled, to minimize differences in HRV due to hormonal variation (Sato and Miyake, 2004; Thayer et al., 2012; Koenig and Thayer, 2016). Any individuals with a history of restrictive or obstructive lung disease or hypertension, or taking any medications that could affect heart rate, were excluded. Caffeine consumption was not specifically restricted since withdrawal effects on HRV may exist for habitual caffeine users (Zimmermann-Viehoff et al., 2016).

#### Procedure

The study protocol was reviewed and approved by the IRB of the Harvard T.H. Chan School of Public Health. Testing occurred over a single 30-min session between the daytime hours of 09:00 and 16:00. A Polar H7 heart rate monitor (Polar Electro Oy, Kempele, Finland) was used as a validated research device to measure R–R intervals with accuracy comparable to electrocardiograms (Barbosa et al., 2016; Giles et al., 2016). A logging application on iPad (Apple Inc., Cupertino, CA, United States) recorded the R–R interval signals from the chest strap which were further analyzed.

Participants sat upright quietly for 5 min while reading the instructions for the study, and then, heart rate, cuff blood pressure, and respiration rate were measured. To generate varying baseline autonomic states among participants, subjects were randomized to one of four preconditioning groups: (1) nasal respiration at 0.1 Hz (inhale nose 5 s, exhale mouth 5 s) for 5 min, (2) contracting arm muscles by grasping a tennis ball at 0.1 Hz for 5 min (alternating contractions in left and right arms every 5 s), (3) performing contraction and nasal respiration in synchrony at 0.1 Hz for 5 min, and (4) reading consecutively four articles rated as emotionally neutral for 5 min. The contraction tasks have been demonstrated to vary cardiac reactivity through differing levels of resonance (Lehrer et al., 2000, 2009; Vaschillo et al., 2011). A graphical timer application on iPad was used to visually cue the breathing/contraction. The articles were Scientific American excerpts that were previously validated as emotionally neutral (van den Broek et al., 2001).

To assess executive task function, a computerized version of the Stroop test<sup>1</sup> was run for 5 min. The Stroop test has been demonstrated to produce a mild sympathetic response through dissonant executive task function (Salahuddin et al., 2007; Visnovcova et al., 2014). Participants were asked to indicate the color of the word (and not its meaning) by keystroke, as quickly as possible, while minimizing their errors. For congruent trials, the displayed word and the color described by the word were the same. For incongruent trials, the displayed word and color presented were not the same. The reaction time for each word pair was recorded by the computer program; the premise of the Stroop test is that incongruent pairs have longer reaction times than congruent pairs (Dyer, 1973). As a marker for performance, a reaction time gap was calculated for each participant from the difference between congruent and incongruent pair reaction times. A shorter reaction time gap was considered to indicate higher performance.

<sup>1</sup>http://cognitivefun.net
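As a minimal sketch of the reaction time gap described above, the following Python snippet subtracts the mean congruent reaction time from the mean incongruent reaction time. The use of means, the `(is_congruent, reaction_time_ms)` trial layout, and the function name `reaction_time_gap` are assumptions made for illustration; the text does not specify how trials were aggregated or how error trials were handled, and the example values are invented.

```python
from statistics import mean


def reaction_time_gap(trials):
    """Return mean incongruent RT minus mean congruent RT, in ms.

    `trials` is a list of (is_congruent, reaction_time_ms) pairs.
    This data layout is hypothetical, not the study's file format.
    """
    congruent = [rt for is_congruent, rt in trials if is_congruent]
    incongruent = [rt for is_congruent, rt in trials if not is_congruent]
    if not congruent or not incongruent:
        raise ValueError("need at least one trial of each type")
    return mean(incongruent) - mean(congruent)


# Invented example trials (not study data).
example_trials = [
    (True, 600.0),
    (True, 640.0),
    (False, 760.0),
    (False, 800.0),
]
gap_ms = reaction_time_gap(example_trials)  # smaller gap = higher performance
```

Under this reading, a participant whose incongruent trials are only slightly slower than the congruent ones receives a small gap and is scored as performing better.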

At the end of this task period, a 5-min questionnaire, the Short Flow State Scale-2 (SFSS-2), was administered to assess the degree of flow engagement (Jackson et al., 2008). Flow engagement is the degree of perceived task immersion, and the SFSS-2 has been validated for evaluation of performance engagement. Respiration rates were monitored during all phases of testing to ensure they were within the 9–24 cycles/min range required to correspond accurately to vagal tone (Laborde et al., 2017).

#### Heart Rate Variability Analyses

Recorded R–R intervals were analyzed offline using Kubios HRV Premium software (Kubios Oy, Kuopio, Finland) using 2-min intervals based on recommendations from published standards (Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology, 1996). Frequency domain measures analyze the power distribution of HRV as a function of high frequency (HF) and low frequency (LF). HF and LF components are reflective of parasympathetic and sympathetic activation, respectively. LF/HF can be regarded as the overall sympathovagal balance and degree of autonomic arousal (Pagani et al., 1986; Shaffer and Ginsberg, 2017).

Heart rate variability recordings during the Stroop test were processed by Kubios HRV Premium. Automated artifact correction was performed for all recordings prior to analysis. Sampling periods of 120 s were used to derive LF, HF, and LF/HF using the fast Fourier transform spectrum method (**Figure 1**). LF and HF bands were defined in the standard way as 0.04–0.15 and 0.15–0.4 Hz, respectively, and the power for each band was analyzed in normalized units, that is, LF or HF divided by total power (Malliani et al., 1994).
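To make the band definitions above concrete, the following is a simplified sketch of how LF power, HF power, and LF/HF could be derived from a 120 s window. It is not the Kubios HRV pipeline: it assumes the R–R series has already been artifact-corrected and resampled to an even rate, uses a plain discrete Fourier transform periodogram without detrending or windowing, and the function name `band_powers`, the 4 Hz resampling rate, and the synthetic input are all illustrative assumptions.

```python
import cmath
import math

LF_BAND = (0.04, 0.15)  # Hz, as defined in the text
HF_BAND = (0.15, 0.40)  # Hz, as defined in the text


def band_powers(rr_ms, fs_hz):
    """Sum periodogram power in the LF and HF bands.

    rr_ms: R-R series (ms) already resampled to an even rate of fs_hz.
    The resampling step and this function name are assumptions.
    """
    n = len(rr_ms)
    mean_rr = sum(rr_ms) / n
    centered = [value - mean_rr for value in rr_ms]  # remove the DC component
    lf = hf = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * fs_hz / n
        if not (LF_BAND[0] <= freq <= HF_BAND[1]):
            continue
        # Direct DFT coefficient for bin k (an FFT would give the same value).
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if freq < LF_BAND[1]:
            lf += power
        else:
            hf += power
    ratio = lf / hf if hf > 0 else float("inf")
    return {"lf": lf, "hf": hf, "lf_hf": ratio}


# Synthetic 120 s window at 4 Hz: a 50 ms oscillation at 0.1 Hz (LF band)
# plus a 10 ms oscillation at 0.25 Hz (HF band). Not study data.
FS_HZ = 4.0
rr_series = [
    800.0
    + 50.0 * math.sin(2 * math.pi * 0.10 * i / FS_HZ)
    + 10.0 * math.sin(2 * math.pi * 0.25 * i / FS_HZ)
    for i in range(int(120 * FS_HZ))
]
result = band_powers(rr_series, FS_HZ)
```

Because power scales with the square of amplitude, the 5:1 amplitude ratio in this synthetic signal yields an LF/HF ratio close to 25. Real recordings would additionally need interpolation, detrending, and spectral smoothing.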

#### Statistical Analyses

All statistical analyses were performed using Prism 8 (GraphPad Software, San Diego, CA, United States). To analyze the overall effects of varying HRV on reaction time, HRV measures (LF, HF, LF/HF) for all subjects during the Stroop test were regressed on measured reaction time using the least squares regression method, with no weighting. Both linear and quadratic models were generated for each comparison. For all regressions, homoscedasticity was assumed. To assess the effects of varying HRV on flow engagement, HRV measures were similarly regressed on SFSS-2 scores, testing both linear and quadratic models. R<sup>2</sup> values for each relationship were evaluated at the p < 0.05 significance level. Outliers were identified for each analysis using robust regression and outlier removal, which is a validated method of outlier detection included in Prism 8 (Motulsky and Brown, 2006). The outlier false discovery rate was set at <1%. Any outliers identified were excluded from the analysis.
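The comparison of linear and quadratic least squares models described above can be sketched as follows. This is an illustrative pure-Python implementation, not Prism 8's procedure: it omits the outlier-removal step and the significance tests, the helper names (`fit_polynomial`, `r_squared`, `vertex`) are hypothetical, and the data are synthetic values constructed to lie exactly on an inverted U.

```python
def fit_polynomial(xs, ys, degree):
    """Unweighted least squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ...] for c0 + c1*x + c2*x**2 + ...
    """
    m = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            factor = a[row][col] / a[col][col]
            for k in range(col, m):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def r_squared(xs, ys, coeffs):
    """Coefficient of determination for a fitted polynomial."""
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum(
        (y - sum(c * x ** i for i, c in enumerate(coeffs))) ** 2
        for x, y in zip(xs, ys)
    )
    return 1.0 - ss_res / ss_tot


def vertex(quadratic_coeffs):
    """Return (x, y) at the turning point of c0 + c1*x + c2*x**2."""
    c0, c1, c2 = quadratic_coeffs
    x_v = -c1 / (2.0 * c2)
    return x_v, c0 + c1 * x_v + c2 * x_v ** 2


# Synthetic inverted-U data (not study data): peak of 40 at x = 7.
xs = [float(i) for i in range(1, 14)]
ys = [40.0 - 0.5 * (x - 7.0) ** 2 for x in xs]

linear = fit_polynomial(xs, ys, 1)
quadratic = fit_polynomial(xs, ys, 2)
r2_linear = r_squared(xs, ys, linear)
r2_quadratic = r_squared(xs, ys, quadratic)
x_peak, y_peak = vertex(quadratic)
```

With data symmetric about the peak, the linear model explains essentially none of the variance while the quadratic model fits exactly. The interpolated vertex plays the same role as the turning points reported for the LF/HF regressions.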

#### RESULTS

All enrolled participants (n = 48) completed the study sessions without any adverse events. Participants' average age was 29.9 years (SD ± 5.96). As a result of randomization, 12 subjects were assigned to each of the four preconditioning groups.

#### HRV Measurements Between Preconditioning Groups

A previous study compared LF, HF, and LF/HF responses between the preconditioning groups measured during application of the Stroop test (Chin and Kales, 2019). As a summary of the findings of this analysis, the alternating contraction group had 71.7% higher activation of parasympathetic signal over respiration alone (p < 0.001). Alternating contractions synchronized with breathing demonstrated 150% higher parasympathetic activation than control (p < 0.0001). Between contraction alone and synchronized contraction groups, the synchronized group demonstrated 45.9% higher parasympathetic response during the cognitive stressor (p < 0.001).

#### HRV and Maximal Flow Engagement

Heart rate variability was regressed onto SFSS-2 scores to analyze autonomic association with flow engagement. Representative examples of HRV time-varying frequency domain measures and flow scores are shown in **Figure 1**. In all cases, quadratic functions were statistically significant, whereas linear functions were not. HF (**Figure 2A**) indicated an inverted U-shaped relationship between parasympathetic stimulation and SFSS-2 scores (DF = 44, R<sup>2</sup> = 0.110, p < 0.0001); one outlier was identified and excluded from this analysis. Similarly, when analyzing the sympathetic response (**Figure 2B**), a reciprocal relationship was found, with an inverted U-shaped relationship between increasing sympathetic response (LF) and SFSS-2 scores (DF = 45, R<sup>2</sup> = 0.070, p < 0.001). When combining sympathetic and parasympathetic responses to measure overall sympathovagal balance, LF/HF (**Figure 2C**) demonstrated a significant inverted-U relationship when regressed on SFSS-2 (DF = 45, R<sup>2</sup> = 0.187, p < 0.0001). The interpolated value of the vertex was LF/HF 6.822 with a maximal SFSS-2 score of 38.85.

#### HRV and Cognitive Performance

Reaction time gaps were used as a marker for cognitive performance. When HRV was regressed onto reaction time gaps, there were significant second-order relationships for all measures; linear regression analysis demonstrated a significant relationship only for LF. HF (**Figure 3A**) indicated a positive curvilinear relationship between parasympathetic stimulation and reaction time (DF = 45, R<sup>2</sup> = 0.053, p < 0.05). When analyzing the sympathetic response (**Figure 3B**), a reciprocal relationship was found with increasing LF resulting in decreasing reaction time gaps (DF = 45, R<sup>2</sup> = 0.117, p < 0.05). After combining sympathetic and parasympathetic responses to evaluate overall sympathovagal balance, LF/HF (**Figure 3C**) demonstrated a significant U-shaped relationship when regressed on reaction time (DF = 45, R<sup>2</sup> = 0.046, p < 0.0001). The interpolated value of the vertex was LF/HF 11.61 with a minimal reaction time gap of 143.5 ms.

FIGURE 1 | Top portion of the figure is a schematic of each study session. Underneath are representative examples of HRV frequency domain measures during the session for three different subjects. Increased power in the green frequency band represents parasympathetic HF response. Increased power in the red frequency band represents sympathetic LF response. During the Stroop test, the top subject exhibited high flow with a balanced LF response. The bottom two subjects, at extremes of either high or low sympathetic response, reported low flow.

FIGURE 2 | (A) Inverted U-shaped relationship between parasympathetic stimulation and SFSS-2 scores (DF = 44, R<sup>2</sup> = 0.110, p < 0.0001); one outlier was identified and excluded from this analysis. Dotted line represents the linear model (DF = 46, R<sup>2</sup> = 0.026, p > 0.05). Each point represents a single participant's HF measured during the cognitive stressor. (B) A reciprocal relationship was found with an inverted U-shaped relationship between increasing sympathetic response (LF) and SFSS-2 scores (DF = 45, R<sup>2</sup> = 0.070, p < 0.001). Dotted line represents the linear model (DF = 46, R<sup>2</sup> = 0.00087, p > 0.05). Each point represents a single participant's LF measured during the cognitive stressor. (C) When combining sympathetic and parasympathetic responses to measure overall sympathovagal balance, LF/HF demonstrated a significant inverted-U relationship when regressed on SFSS-2 (DF = 45, R<sup>2</sup> = 0.187, p < 0.0001). The interpolated value of the vertex was LF/HF 6.822 with a maximal SFSS-2 score of 38.85. Dotted line represents the linear model (DF = 46, R<sup>2</sup> = 0.031, p > 0.05). Each point represents a single participant's LF/HF measured during the cognitive stressor.

FIGURE 3 | (A) When HRV was regressed onto reaction time gaps, HF indicated a positive curvilinear relationship between parasympathetic stimulation and reaction time (DF = 45, R<sup>2</sup> = 0.053, p < 0.05). Dotted line represents the linear model (DF = 46, R<sup>2</sup> = 0.046, p > 0.05). Each point represents a single participant's HF measured during the cognitive stressor. (B) When analyzing the sympathetic response, a reciprocal relationship was found with increasing LF resulting in decreasing reaction time gaps (DF = 45, R<sup>2</sup> = 0.117, p < 0.05). Dotted line represents the linear model (DF = 46, R<sup>2</sup> = 0.10, p < 0.05). Each point represents a single participant's LF measured during the cognitive stressor. (C) After combining sympathetic and parasympathetic responses to measure overall sympathovagal balance, LF/HF demonstrated a significant U-shaped relationship when regressed on reaction time (DF = 45, R<sup>2</sup> = 0.046, p < 0.0001). The interpolated value of the vertex was LF/HF 11.61 with a minimal reaction time gap of 143.5 ms. Dotted line represents the linear model (DF = 46, R<sup>2</sup> = 0.0083, p > 0.05). Each point represents a single participant's LF/HF measured during the cognitive stressor.

#### DISCUSSION

After autonomic markers were regressed on flow scores, we demonstrated a possible inverted U-relationship between overall sympathovagal balance and self-reported experience of flow. With the exception of the LF relationship to reaction time, none of the linear models were statistically significant. Our finding supports Csikszentmihalyi's (1975) theory of human performance in which the optimal autonomic state lies at the peak of an inverted-U, in which extremes at either end lead to suboptimal experience.

Studies on leisure activities, such as playing piano (de Manzano et al., 2010) and video games (Kozhevnikov et al., 2018), have effectively demonstrated the left side of the inverted-U curve, in which increasing sympathetic activation leads to increased flow experience. Paradoxically, other studies on job stress illustrate that increasing sympathetic arousal can also lead to declining performance and burnout. A recent systematic review identified 13 studies which confirmed a negative association between parasympathetic response and job stress or burnout (de Looff et al., 2018). When taken together, these seemingly opposing responses to arousal may actually just represent different sides of the inverted-U relationship demonstrated in our results.

A significant limitation in previous studies remained that the full spectrum of sympathetic arousal had not been represented due to insufficiently varied autonomic states. Peifer et al. (2014) addressed this shortcoming in study design by varying the arousal stimulus and inducing differing levels of social stress prior to task testing. Interestingly, using this design to create further autonomic variability, the authors did successfully demonstrate a quadratic inverted-U relationship for sympathetic LF HRV. However, they demonstrated only a linear association between increasing HF parasympathetic response and flow experience; it should be noted that the study power was limited by the small sample of only 22 participants.

Similar to the study design of Peifer et al. (2014), we varied our baseline autonomic states. However, instead of priming subjects using varying social stressors, our methodology used differing combinations of patterned breathing and skeletal muscle contraction that are known to induce varying levels of autonomic response. With this preconditioning, we were able to prime enough baseline autonomic variation to demonstrate the inverted-U relationship between increasing sympathetic arousal and flow experience. Improving on Peifer et al. (2014), we successfully demonstrated an inverted-U relationship between increasing parasympathetic response and flow. Furthermore, when analyzing combined sympathetic and parasympathetic balance as LF/HF, we found that the inverted-U curve remained significant, with the interpolated maximum flow occurring when LF/HF balance was 6.822. Stated another way, the optimal flow experience occurred when the autonomic state comprised 87% sympathetic and 13% parasympathetic response.
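The text does not spell out how an LF/HF ratio was converted into a percentage split. One reading that is consistent with the reported figures is to treat the sympathetic share as LF / (LF + HF), which for a ratio r equals r / (1 + r). The short sketch below applies that assumed conversion to the two vertex values reported in the Results; the function name `sympathetic_share` is hypothetical.

```python
def sympathetic_share(lf_hf_ratio):
    """Fraction LF / (LF + HF) implied by a given LF/HF ratio.

    Assumed conversion: with r = LF/HF, the share is r / (1 + r).
    """
    if lf_hf_ratio < 0:
        raise ValueError("LF/HF ratio must be non-negative")
    return lf_hf_ratio / (1.0 + lf_hf_ratio)


# Vertex values taken from the Results section.
flow_vertex_share = sympathetic_share(6.822)  # maximal SFSS-2 score
reaction_vertex_share = sympathetic_share(11.61)  # minimal reaction time gap

flow_percent = round(flow_vertex_share * 100)  # about 87
reaction_percent = round(reaction_vertex_share * 100)  # about 92
```

Under this assumption, the flow vertex corresponds to roughly 87% sympathetic and 13% parasympathetic response, and the reaction time vertex to roughly 92% sympathetic response.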

Our results also support the potential existence of an optimal autonomic state for high cognitive performance. Reaction time during the Stroop test is often referenced as a marker of executive function (Egner and Hirsch, 2005). When interpreting the effect of HRV on reaction times, the overall sympathovagal balance suggested the minimum reaction time at a sympathetically dominated balance. With the estimated minimum reaction time at LF/HF 11.61, this could be interpreted as the fastest reaction times occurring when the autonomic state is at predominantly (92%) sympathetic response. It should also be noted that while our results were generally statistically significant, autonomic state only explained approximately 10–20% of the overall variability observed in flow scores. While not able to fully account for changes in flow, our results suggest that autonomic state at least partially influences the flow experience. Other unaccounted factors, such as mindset or task familiarity, may play important roles in further determining the flow relationship.

The finding of increased arousal in autonomic states correlating with improved task performance is supported by several studies. In a study on executive function using a similar Stroop color test, increased sympathetic tone was associated with faster response times in color naming (Mathewson et al., 2010). Other studies have demonstrated that invoking a higher stress response can also increase cognitive performance. In a study on perceived threat of electric shock, subjects with increased anxiety demonstrated faster reaction time in executive tasks under the threat of unpredictable shocks to the hand (Cantelon et al., 2018). In studies on physical exertion in athletes, higher stress responses from exertion were associated with faster reaction times (Luft et al., 2009; Murray and Russoniello, 2012).

#### Limitations

There are two main limitations to the external validity of our pilot study. First, since estrogen levels can affect HRV during the stress response, we chose to limit our enrollment to male participants in order to minimize hormonal variability. Therefore, our findings may have limited generalizability to females. Future studies should validate our findings in a female population. Second, our findings are based on a relatively small pilot study. Replication of these findings should be attempted in a future study with a larger sample size.

Use of LF/HF as a measure of sympathovagal balance has been highly debated (Eckberg, 1997; Billman, 2013), but despite this challenge, recent studies on stress response still use LF/HF as a marker for sympathovagal balance (Lennartsson et al., 2016; Cao et al., 2019). von Rosenberg et al. (2017) have proposed a new two-dimensional method of LF/HF analysis to categorize mental and physical stresses. This new methodology might be promising for a future analysis in a larger confirmatory study.

#### CONCLUSION

When reviewing both executive task function and flow experience, both indices appeared to be maximized at approximately 90% sympathetic state. To our knowledge, this is the first study to suggest U-shaped relationships existing simultaneously for both flow experience and executive task function, suggesting that optimal performance may be associated with predominant, but not total, sympathetic response. Our findings are based on a small pilot study, so the results should be approached with caution. However, they do provide a preliminary foundation to understand the practical applications of autonomic modulation to potentially enhance performance during high-stress situations.

#### DATA AVAILABILITY

The datasets generated for this study are available on request to the corresponding author.

#### ETHICS STATEMENT

This study was carried out in accordance with the recommendations of IRB of the Harvard T.H. Chan School of Public Health with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the IRB of the Harvard T.H. Chan School of Public Health.

#### AUTHOR CONTRIBUTIONS

MC originated the study design, conducted the experiments, and performed all study analyses. SK contributed to the study design. Both authors drafted, revised, and approved the manuscript.

#### FUNDING

This work was partly supported by a training grant from the Harvard Education and Research Center for Occupational Safety and Health (T42 OH008416 to MC) and funded by an award from the Harvard Chan-NIEHS Center for Environmental Health (NIEHS Grant P30 ES000002).

#### ACKNOWLEDGMENTS

The authors would like to thank Ms. Ann Backus for her assistance with study participant recruitment. MC would like to recognize the mentorship of Mr. Vladimir Vasiliev, whose teachings on Systema breathwork and performance under stress served as the original inspiration for this study.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Chin and Kales. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Effects of a Brief Strange Loop Task on Immediate Word Length Comparison: A Mindfulness Study on Non-striving

*Ying Hwa Kee\*, Khin Maung Aye, Raisyad Ferozd and Chunxiao Li*

*National Institute of Education, Nanyang Technological University, Singapore, Singapore*

Non-striving is an important aspect of mindfulness practice, but it has not been sufficiently researched. This study examines whether a strange loop-based task – Infinite Water Scooping Task – performed for 10 min, has an effect on non-striving behavior and performance in a subsequent word length comparison task. Results showed that performance (number of correct trials) did not differ significantly between the two groups, though the experimental group tended to perform worse. However, participants in the experimental group took a significantly shorter time to respond to the word length comparison task than those in the control group. It is inferred that shorter time taken reflects response without investing much effort to count with care, i.e., non-striving. The present study demonstrates that the brief strange loop task implemented in this study elicited non-striving behavior compared to the effects of the control task, and this adds to the understanding of non-striving in the context of mindfulness. The Infinite Water Scooping Task may be useful for illustrating and teaching non-striving within mindfulness practice.

#### *Edited by:*

*Jussi Palomäki, University of Helsinki, Finland*

#### *Reviewed by:*

*Staci Anne Vicary, Australian College of Applied Psychology, Australia Fernando Rosas, Imperial College London, United Kingdom*

#### *\*Correspondence:*

*Ying Hwa Kee yinghwa.kee@nie.edu.sg*

#### *Specialty section:*

*This article was submitted to Performance Science, a section of the journal Frontiers in Psychology*

*Received: 09 July 2019 Accepted: 27 September 2019 Published: 11 October 2019*

#### *Citation:*

*Kee YH, Aye KM, Ferozd R and Li C (2019) Effects of a Brief Strange Loop Task on Immediate Word Length Comparison: A Mindfulness Study on Non-striving. Front. Psychol. 10:2314. doi: 10.3389/fpsyg.2019.02314*

Keywords: mindfulness, paradox and ambiguity, reaction time, motivation, non-striving, goal setting

Mindfulness can be generally described as repetitive and sustained attentional efforts toward ongoing moments in an accepting and non-judgmental fashion (Kabat-Zinn, 1990). The consensus from previous operationalization of mindfulness points to attentional regulation and non-judgmental acceptance as two key aspects of mindfulness (Bishop et al., 2004; Lindsay and Creswell, 2017), consistent with the general definition above. The notion of acceptance and being non-judgmental underpins the uniqueness of mindfulness practice relative to other coping strategies, which are often focused on problem identification and intended eradication of problems, such as those found in traditional cognitive behavioral therapy approaches (Hofmann and Asmundson, 2008). In contrast, the practice of acceptance and non-judgmental awareness within the mindfulness approach downplays problem eradication, as mindfulness practice typically involves an observation mode, mere labeling of emotions, and distancing from mental content (Hinton et al., 2013). Suffice it to say, the notion of "letting go" captures the sense of non-judgmental acceptance sufficiently well, as seen in previous literature (e.g., Frewen et al., 2008; Bergeron et al., 2016; Ruskin et al., 2017; Blackie and Kocovski, 2018). Thus, it can be assumed that implicit within the mindfulness approach is a sense of "letting go" or non-striving orientation which serves to obliterate the usual sense of resistance and judgmental stance toward undesired (or even desirable) circumstances and emotions.

Non-striving is an associated quality of mindfulness that is relatively less researched compared to other better known correlates such as compassion (Shonin et al., 2015), attention (Mak et al., 2018), and self-esteem (Randal et al., 2015). In English, the definition of striving is "to devote serious effort or energy" (Striving, n.d.). Accordingly, non-striving would mean the absence of devotion of serious effort or energy toward a task. In terms of mindfulness practice, an attitude of non-striving is about non-doing while undertaking the practice, trying less, and simply experiencing the moment (Kabat-Zinn, 1990). Kabat-Zinn (1990) elucidates that adopting a non-striving mind-set means that one is not desiring to change anything related to the moment through mindfulness practice, but instead being simply aware of the ongoing circumstances. This notion of non-striving discussed in secular mindfulness literature possibly has its roots in works of Eastern and Buddhist philosophy. Specifically, within Chinese philosophy, the term *wu-wei* or effortless action has the connotation of non-striving as it refers to the harmonized state of mind while one performs actions spontaneously, with freedom from "the need for extended deliberation of inner struggle" (Slingerland, 2003, p. 7). This lack of inner struggle can be construed as non-striving. While discussion on non-striving remains scarce in the secular academic space today, mindfulness scholars such as Baer (2006) and Shapiro et al. (2018), echoing Kabat-Zinn's view, have also noted that mindfulness meditation should be practiced with no specific goal in mind, operationalizing the practical notion of non-striving for mindfulness practice to some extent.
To further contribute to the understanding of non-striving in the context of mindfulness practice, we conducted an experiment to investigate the effects of a brief strange loop task, purported to elicit a sense of futility and paradox, on subsequent non-striving behavior and performance. We posit that this is an important endeavor that can help further operationalize the notion of non-striving for secular mindfulness practice.

The notion of non-striving in mindfulness practice is a paradox. Goal-oriented instructions for mindfulness practice, such as paying attention to the target object, are typically prescribed; and repeated attempts to adhere to the instruction are undertaken by the practitioner (Baer, 2006). Yet mindfulness practice necessitates non-judgmental awareness, which could also, in theory, include openness and acceptance of whether there was success in adhering to the goal-oriented instruction. To illustrate, one may begin a mindfulness practice session with the goal of sitting for 30 minutes, with attention purposefully fixated on breathing and adopting a strategy to be aware of passing thoughts without reacting to them. Clearly, this takes some discipline and effort. Paradoxically, to adhere to this instruction well, it is also important to adopt a non-striving attitude toward the task. That is, wanting and trying too hard to achieve a certain "mindful state" can be counter-productive (Shapiro et al., 2018). Taking non-striving to another level, even the desire for relaxation, reduction of pain, or alteration of thoughts and emotions should be downplayed even if those are the reasons for initiating mindfulness practice (Baer, 2006). In sum, Shapiro et al. (2018, p. 1697) describe non-striving in mindfulness practice as "an alert, relaxed attention (which requires effort to develop) without pursuing any specific goal," and acknowledge that this attitude is often elusive. For example, the instruction not to strive hard in mindfulness practice may lead one to give up the practice too easily (Hopkins and Proeve, 2013). Suffice it to say, non-striving is a nontrivial issue in the study of mindfulness that deserves more research, as its coverage is currently limited in the extant literature.

Despite the centrality of non-striving in mindfulness, published academic work discussing non-striving in the context of mindfulness is surprisingly rare. Shapiro et al. (2018) provided by far the most extensive discussion of non-striving in a narrative review which described paradoxes of mindfulness. In essence, Shapiro et al. (2018, p. 1697) highlighted that practitioners often find it difficult to "… simultaneously allow what arise to arise – not strive to cultivate a particular state of mind – while trying to focus the mind on a particular object of attention…" They further explained that a way out of this paradoxical dilemma is to adopt the middle path, by striving at times and letting go in other instances. A more accurate interpretation would be to strive or to non-strive for different aspects of practice. For example, one can make the necessary effort to apply the mindfulness technique while letting go of expectations. The difficulty of actualizing non-striving was documented in a study conducted by Solhaug et al. (2016), who found that participants who underwent a mindfulness intervention program expressed less ease in appreciating the non-striving aspects of mindfulness compared to the attentional aspects. The former is somewhat counterintuitive while the latter is more concrete and less abstract. In essence, such empirical work on non-striving remains scarce. Further research is needed to better understand the operationalization of non-striving within mindfulness practices, particularly for possible future incorporation in clinical (Roemer and Orsillo, 2003) and other areas of intervention. In particular, mindfulness practice tasks that can concretely introduce or elicit the experience of non-striving are needed.

Given the lack of research on non-striving and practice tasks that can introduce this concept well, the present study investigated the effects of a mindfulness-like practice task aimed at priming non-striving, futility, and process focus, on consequent non-striving behavior and performance. To this end, we created a brief intervention task – "Infinite Water Scooping Task," which we speculate could be useful as a mindfulness practice for introducing non-striving in the future. This task involves using a small scoop to continuously transfer water over a string dividing a filled water container. Essentially, the task is perceptually futile in nature, as water would flow freely beneath the string after pouring, making the effort made in scooping and pouring seem purposeless. We speculate that performing this task would temporarily induce a non-striving attitude even when practiced briefly for 10 min, compared to performing the control task of scooping and pouring from one container to another. A person performing the control task would see the water level reducing in one container and increasing in another, whereas one performing the Infinite Water Scooping Task would not see incremental changes in water level over time. While both tasks involve essentially the same scooping and pouring action, it is intended that the Infinite Water Scooping Task would elicit a mental state that departs from the usual outcome-oriented mental state that would be primed through the control task. The experience of performing the control task is deemed to be consistent with the *modus operandi* of most humans, as goal-driven cognition is a norm in modern societies (e.g., Kenrick et al., 2010). On the other hand, the brief repetition of the Infinite Water Scooping Task attempts to elicit a non-striving orientation without explicitly instructing and emphasizing non-striving, as would be communicated during typical mindfulness practice.

The creation of the Infinite Water Scooping Task was inspired by the idea of the strange loop proposed by Hofstadter (2007). Briefly, a strange loop is characterized by self-referentiality and paradox. One example of a strange loop offered by Hofstadter (2007) is M. C. Escher's lithograph Drawing Hands, which depicts the right hand drawing the left hand, which in turn draws the right hand, forming an infinite loop. Another example is the case of Penrose stairs or impossible stairs created by Lionel Penrose and Roger Penrose, which is a continuously looping staircase that creates the impossible perception of climbing higher (Penrose and Penrose, 1958). In Hofstadter's (2007, pp. 101–102) words, a strange loop is "... an abstract loop in which, in the series of stages that constitute the cycling-around, there is a shift from one level of abstraction (or structure) to another, which feels like an upwards movement in a hierarchy, and yet somehow the successive 'upward' shifts turn out to give rise to a closed cycle. That is, despite one's sense of departing ever further from one's origin, one winds up, to one's shock, exactly where one had started out. In short, a strange loop is a paradoxical level-crossing feedback loop."

While the above examples of strange loops are optical illusions, the Infinite Water Scooping Task is aimed at eliciting a sense of paradox through a physical experience. In this task, when water is scooped and lifted across the string, there is an upward shift from one level of abstraction to another (where the abstraction refers to the state of the actions, such as where the scooped water is). The shift advances further when the water is poured out. Paradoxically, with this further advancement, water from the scoop merges with the original water source in the container, and the original state is revisited. In theory, this task can continue infinitely, like the continuously looping staircase in Penrose stairs. Hofstadter (2007) noted in the earlier description that some kind of "shock" is experienced in realizing the paradox of returning to the original state despite advancing. We liken this "shock" to the realization of the futility or the paradox of "physically exerting effort to scoop and pour water and yet effecting no change." By continuously performing the Infinite Water Scooping Task for a few minutes, we expect participants to experience a temporary departure from the usual orientation of associating effort with outcome, which is typically ingrained in modern cultures.

We expect the experience of dissociating effort from outcome through the Infinite Water Scooping Task to have some effects on subsequent non-striving behavior, as performing this task continuously for a brief period could elicit psychological effects similar to mindfulness practice. The similarity lies in its repetitive nature, akin to that of, say, continuously paying attention to one's breathing in mindfulness breathing exercises or repetitively watching how one walks in walking meditations. Typically, in mindfulness practice, one simply observes the continuous process unfold cycle after cycle without expectation of any advancement. In other words, one watches the in- and out-breaths, appreciating that these are the only two states in the cycle and that there is no need to strive to advance in breathing stages. In the Infinite Water Scooping Task, it is perceptually clear that there is no necessity of striving as there will be no visible change in water level. As the instruction to focus on the process is given concurrently, we speculate that performing the task for 10 min is akin to mindful movement practice. Thus, upon completing this task, a mental state resembling a mindfulness state could be activated. Here, we are primarily interested in examining whether there will be indications of weaker willingness to strive in a secondary task as a result of performing the Infinite Water Scooping Task.

In summary, we argued that non-striving is an important aspect of mindfulness practice, but it has not been sufficiently researched. To fill this gap in research, we conducted a randomized experiment to examine whether an essentially futile task designed based on the ideas of strange loop – Infinite Water Scooping Task – performed for 10 min has an effect on subsequent non-striving behavior and performance. Given the likelihood that the psychological experience of performing the Infinite Water Scooping Task is different from transferring water from one container to another (control condition), the main hypothesis tested is that the experimental group participants would spend less time in performing the subsequent word length comparison task due to less effort invested in counting carefully (i.e., showing a lesser degree of striving), relative to the control group.

### MATERIALS AND METHODS

#### Participants

Sixty participants, comprising 38 males (*M* age = 24.61, SD = 2.62) and 22 females (*M* age = 24.36, SD = 4.85), took part in the study. Participants were recruited from the university community *via* social media, posters, and word of mouth. The study was approved by the institutional review board of the university where the study was conducted. All participants provided informed consent. A SGD 10 (approx. USD 7.5) shopping voucher was given to each participant for his or her involvement in this study. They were randomly assigned to either the experimental or control condition in equal distribution. The methods were carried out in accordance with the guidelines stated by the university's institutional review board.
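The random assignment "in equal distribution" described above can be illustrated with a minimal sketch. The source does not report how the randomization was actually carried out, so the shuffle-and-split approach, the function name `assign_conditions`, the participant IDs, and the seed below are all assumptions made purely for illustration.

```python
import random


def assign_conditions(participant_ids, seed=None):
    """Randomly split participants into two equal-sized groups.

    Hypothetical helper: shuffles the IDs and gives the first half to the
    experimental condition and the second half to the control condition.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    if len(ids) % 2 != 0:
        raise ValueError("An even number of participants is needed for equal groups.")
    rng.shuffle(ids)
    half = len(ids) // 2
    assignment = {pid: "experimental" for pid in ids[:half]}
    assignment.update({pid: "control" for pid in ids[half:]})
    return assignment


# Sixty illustrative participant IDs (1..60); the seed is arbitrary.
assignment = assign_conditions(range(1, 61), seed=2019)
group_sizes = {
    condition: sum(1 for c in assignment.values() if c == condition)
    for condition in ("experimental", "control")
}
print(group_sizes)
```

Shuffling once and splitting guarantees equal group sizes, which a per-participant coin flip would not.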

### Brief Intervention Tasks

Common to the manipulation tasks of the experimental and control conditions was the task of scooping and pouring water at one's self-selected pace for 10 min continuously using a small plastic scoop (15 cc in capacity with an 18-cm handle) in a standing position. The difference between conditions was in the way the water container was set up on the table (i.e., using one or two containers). The video depicting the intervention tasks is provided in the **Supplementary Material** section. Before participants started the water scooping task, a brief condition-specific instruction was delivered to them *via* an Android tablet app in text and in audio script. When the time was up, the app played a chime to indicate the end of the intervention task.

#### Experimental Condition (Infinite Water Scooping Task)

In the experimental condition, a string was tied across the top of a container with the dimension of 42 (length) × 34.5 (width) × 17 (height) in cm, dividing the left and right of the container equally. The container was half-filled with tap water. Participants were tasked to scoop water from one side of the container, lifting the scoop over the string before pouring it back into the container. The task was meant to be perceptually futile and purposeless, because no drastic change to the water level would be observed as water flows across freely under the string within the same container. The instruction given to them *via* the app was to pour water using the scoop across the string at their own pace and to focus on the process during the task.

#### Control Condition

In the control condition, two containers, each with dimensions of 27 (length) × 20 (width) × 16.5 (height) cm were placed side by side. One of them was half-filled with tap water. Participants were tasked to scoop water from one container, and then pour it into the other container. The instruction given to them by the app was to pour water using the scoop across two containers at their own pace and that the task is practiced as a means of developing wrist control and strength. The control task was an appropriate match for the experimental task in terms of motor execution, effort, and timing. The effort applied was not futile and changes in water level can be observed. In contrast with the experimental task, the control task can be deemed as a common task.

#### Secondary Task: Word Length Comparison Task

To detect the extent to which participants displayed non-striving behavior after undergoing the respective manipulations, we used a secondary task adapted from the task outlined by Touré-Tillery and Fishbach (2012) in their Experiment 5 and implemented it using OpenSesame (Mathôt et al., 2012) on a laptop computer. In each trial of this task, two English words were shown separately on the left and right sides of the screen. The task required participants to choose the word with fewer letters by responding accordingly on the keyboard. There was no emphasis on timing and accuracy. The task was challenging as each trial had a pair of words that differed in length by only one letter. For example, "outstretches" vs. "repolarized," "fishburger" vs. "assignation," "birthweight" vs. "utterances," etc. Words used were between 9 and 12 letters long, and were generated randomly using a computer script beforehand. Trials comprising longer words would present more difficulty relative to trials with shorter words, by virtue of the effort needed to count them.
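The trial structure described above (two words of 9 to 12 letters differing in length by exactly one letter, shown left and right) can be sketched as follows. This is not the authors' script, which is not given in the source; the function name `make_trials`, the trial dictionary keys, and the two 9-letter words in the pool ("awareness", "attention") are assumptions added for illustration. The remaining words are the examples quoted in the text.

```python
import random

# Example words from the text, plus two 9-letter words added here
# purely for illustration (they are not from the original study).
WORD_POOL = [
    "awareness", "attention",                      # 9 letters (assumed)
    "fishburger", "utterances",                    # 10 letters
    "repolarized", "assignation", "birthweight",   # 11 letters
    "outstretches",                                # 12 letters
]


def make_trials(words, n_trials, seed=None):
    """Build trials pairing words whose lengths differ by exactly one letter."""
    rng = random.Random(seed)
    by_length = {}
    for word in words:
        if 9 <= len(word) <= 12:
            by_length.setdefault(len(word), []).append(word)
    # Usable (shorter, longer) length combinations: 9/10, 10/11, 11/12.
    length_pairs = [(s, s + 1) for s in (9, 10, 11)
                    if s in by_length and s + 1 in by_length]
    if not length_pairs:
        raise ValueError("No word pairs differing by one letter are available.")
    trials = []
    for _ in range(n_trials):
        short_len, long_len = rng.choice(length_pairs)
        short_word = rng.choice(by_length[short_len])
        long_word = rng.choice(by_length[long_len])
        # Randomize which side shows the shorter (correct) word.
        if rng.random() < 0.5:
            left, right, correct = short_word, long_word, "left"
        else:
            left, right, correct = long_word, short_word, "right"
        trials.append({"left": left, "right": right, "correct": correct,
                       "lengths": (short_len, long_len)})
    return trials


trials = make_trials(WORD_POOL, n_trials=6, seed=1)
for trial in trials:
    print(trial["left"], "vs.", trial["right"], "-> correct side:", trial["correct"])
```

Storing the length pair with each trial makes it straightforward to group trials by difficulty later.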

The entire task comprised two practice trials and seven blocks of six trials each (42 actual trials). We derived two types of measures from this task for each participant. First, the sum of correctly performed trials was calculated to serve as a measure of task performance. Second, the mean time (in ms) spent on each trial was calculated for all trials performed by each participant. We further derived the individualized mean time spent on trials that featured (1) 9 and 10 letters, (2) 10 and 11 letters, and (3) 11 and 12 letters, such that between-group comparisons could be made based on performances for trials of similar difficulty. This allowed us to verify whether the observed results were consistent regardless of task difficulty. The time taken to respond to the task was used as a proxy for striving as it reflects one's effort to count the letters accurately before making a decision. Conversely, less time spent reflects non-striving as decisions were likely made without careful counting. As we did not emphasize speed or accuracy requirements, participants who strove to perform the task correctly might be more likely to count the letters rather than estimate to get the correct answer. They were also told to move at their own pace and to feel free to take short breaks between blocks of trials if needed. In a nutshell, the salient instruction given to the participants was to pick the shorter word without time pressure. With that, we expected that the general speed-accuracy trade-off phenomenon predicted by Fitts' law would naturally occur (Fitts, 1954), i.e., those who responded faster would tend to sacrifice accuracy, and vice versa. The experimental group, hypothesized to be non-striving, should respond faster but would have fewer correct trials, indicative of their lack of effort in getting the task right, if they simply adhered to the instruction of picking the shorter word without trying to be fast in responding.
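As a rough illustration of how the per-participant measures described above could be derived from trial logs, consider the sketch below. The log format (keys `len_left`, `len_right`, `response`, `rt_ms`), the function name `summarize_participant`, and the three example trials are hypothetical; the actual OpenSesame output format is not described in the source, and the numbers are not study data.

```python
from statistics import mean


def summarize_participant(trials):
    """Derive the number of correct trials and mean response times.

    Each trial is a dict with hypothetical keys:
      'len_left', 'len_right' : letter counts of the two words
      'response'              : 'left' or 'right'
      'rt_ms'                 : time spent on the trial in milliseconds
    """
    n_correct = 0
    rts_by_category = {}
    for t in trials:
        # The correct response is the side showing the shorter word.
        shorter_side = "left" if t["len_left"] < t["len_right"] else "right"
        n_correct += int(t["response"] == shorter_side)
        # Difficulty category, e.g. (9, 10), (10, 11), or (11, 12).
        category = (min(t["len_left"], t["len_right"]),
                    max(t["len_left"], t["len_right"]))
        rts_by_category.setdefault(category, []).append(t["rt_ms"])
    return {
        "n_correct": n_correct,
        "mean_rt_ms": mean(t["rt_ms"] for t in trials),
        "mean_rt_by_category": {c: mean(v) for c, v in rts_by_category.items()},
    }


# Three made-up trials for one participant.
example_trials = [
    {"len_left": 9, "len_right": 10, "response": "left", "rt_ms": 3000},
    {"len_left": 11, "len_right": 10, "response": "left", "rt_ms": 4000},
    {"len_left": 12, "len_right": 11, "response": "right", "rt_ms": 5000},
]
summary = summarize_participant(example_trials)
print(summary)
```

The same function could be applied to correct trials only by filtering the trial list before calling it.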

#### Procedure

Participants were individually tested in a quiet room and each session lasted for approximately 30 min. When participants arrived for the study, they were randomly assigned to one of the two conditions. After providing informed consent and given the shopping voucher, they proceeded to receive the task instructions and performed the water scooping task for 10 min according to the condition assigned, following which, they proceeded to complete the word length comparison task. Finally, participants were debriefed and thanked for their participation.

### Data Analysis

The initial step involved identifying and removing the outliers among the participants by detecting peculiar performances, such as those resulting from misinterpretation of task instructions. Next, we tabulated the descriptive statistics for sample-wise mean count of correctly identified trials and mean completion time spent. Data normality was examined using Shapiro-Wilk tests. Inferential statistical tests were performed using the robust Yuen's *t* test for trimmed means, with bootstrapping set at 2,000, and trim level for the mean set as 0.20, as recommended by Field and Wilcox (2017). Tests for group differences were conducted for the following dependent variables: number of correct trials (proxy for task performance), and mean time taken for completing each trial based on all 42 trials as well as that of trials with 9 and 10 letters, 10 and 11 letters, and 11 and 12 letters (proxies for non-striving to test the main hypothesis). The reason for analyzing the trials of similar word length separately is that shorter sets of words could take less striving/effort to perform while longer sets of words could require more effort, by virtue of total length presented. The additional analyses serve to determine whether group differences observed are present regardless of the word length used. Additionally, we repeated the analyses for mean completion time of trials that were correctly performed to countercheck whether the result was consistent with the earlier analyses based on all trials (correct and incorrect trials). The *yuenbt* function from R package WRS2 (Mair and Wilcox, 2015) was used for the robust statistical analysis. In this package, the AKP effect size, a robust version of Cohen's *d* proposed by Algina et al. (2005), was used. The same rules of thumb as for Cohen's *d* can be used to interpret the effect sizes; that is, 0.2, 0.5, and 0.8 correspond to small, medium, and large effects, respectively.
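To make the robust analysis more concrete, the sketch below implements the textbook form of Yuen's test on 20% trimmed means, an AKP-style effect size, and a symmetric bootstrap-*t* *p* value in plain Python. It is an illustration under stated assumptions, not a reimplementation of the WRS2 *yuenbt* function used in the study, and its output should not be expected to match that package exactly. The function names, the seed, and the two made-up samples are hypothetical, and the 0.642 rescaling constant is valid only for 20% trimming.

```python
import math
import random


def trimmed_parts(values, tr=0.20):
    """Return n, h (size after trimming), trimmed mean, Winsorized variance."""
    x = sorted(values)
    n = len(x)
    g = math.floor(tr * n)            # observations trimmed from each tail
    h = n - 2 * g
    trimmed_mean = sum(x[g:n - g]) / h
    # Winsorize: pull each tail in to the nearest retained value.
    w = [min(max(v, x[g]), x[n - g - 1]) for v in x]
    w_mean = sum(w) / n
    win_var = sum((v - w_mean) ** 2 for v in w) / (n - 1)
    return n, h, trimmed_mean, win_var


def yuen(a, b, tr=0.20):
    """Yuen's statistic, its degrees of freedom, and an AKP-style effect size."""
    n1, h1, tm1, wv1 = trimmed_parts(a, tr)
    n2, h2, tm2, wv2 = trimmed_parts(b, tr)
    d1 = (n1 - 1) * wv1 / (h1 * (h1 - 1))
    d2 = (n2 - 1) * wv2 / (h2 * (h2 - 1))
    diff = tm1 - tm2
    se = math.sqrt(d1 + d2)
    df = (d1 + d2) ** 2 / (d1 ** 2 / (h1 - 1) + d2 ** 2 / (h2 - 1))
    pooled_sd = math.sqrt(((n1 - 1) * wv1 + (n2 - 1) * wv2) / (n1 + n2 - 2))
    akp = 0.642 * diff / pooled_sd    # 0.642 applies to 20% trimming only
    return {"diff": diff, "t": diff / se, "df": df, "akp": akp}


def yuen_bootstrap_p(a, b, tr=0.20, nboot=2000, seed=None):
    """Symmetric bootstrap-t p value for the difference in trimmed means."""
    rng = random.Random(seed)
    t_obs = abs(yuen(a, b, tr)["t"])
    # Center each group at its trimmed mean so the null hypothesis holds.
    tm_a = trimmed_parts(a, tr)[2]
    tm_b = trimmed_parts(b, tr)[2]
    centered_a = [v - tm_a for v in a]
    centered_b = [v - tm_b for v in b]
    exceed = used = 0
    for _ in range(nboot):
        resample_a = rng.choices(centered_a, k=len(centered_a))
        resample_b = rng.choices(centered_b, k=len(centered_b))
        try:
            t_star = abs(yuen(resample_a, resample_b, tr)["t"])
        except ZeroDivisionError:
            continue                  # degenerate resample with no spread
        used += 1
        exceed += int(t_star >= t_obs)
    return exceed / used


# Made-up illustrative samples (not study data).
group_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
group_b = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
result = yuen(group_a, group_b)
p_boot = yuen_bootstrap_p(group_a, group_b, nboot=2000, seed=1)
print(result, p_boot)
```

Trimming 20% from each tail limits the influence of extreme response times, which is why this family of tests suits skewed timing data.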

#### RESULTS

#### Descriptive Statistics

Among the 60 participants, data from three participants were removed from further analysis as they achieved fewer than two correct trials in the word length comparison task, indicating that they failed to adhere to the given task instructions. This resulted in 27 participants in the control group and 30 participants in the experimental group. Of these remaining 57 participants, the mean count of correctly identified trials was 39.02 out of 42 (92.90%), and ranged from 34 to 42 trials. The mean trial completion time was 3,876 ms, and ranged from 786 to 10,770 ms. Results of the Shapiro–Wilk test of normality for counts of correct response (*p* = 0.004) and time spent on each correct answer (*p* < 0.001) suggest that the distribution of the data is significantly different from a normal distribution, i.e., normality cannot be assumed. The decision to rely on a robust approach for inferential statistical analysis based on Yuen's modified *t* test for independent trimmed means with bootstrapping (Field and Wilcox, 2017) was thus made.

#### Inferential Statistics

Before testing the main hypothesis, we compared the number of correct trials between the two groups to ascertain if there was a difference in task performance (based on the number of correct trials). The robust Yuen's *t* test for trimmed means (bootstrapping set at 2,000, and trim level for the mean as 0.20) detected no significant difference between trimmed means in number of correct trials for the control group and the experimental group, *M*diff = 1.27 (−0.03, 2.56), *Y*t = 2.02, *p* = 0.055. However, the *p* value is close to the conventional threshold for significance. **Figure 1** suggests that the number of correct trials in the control group tended to be higher than that of the experimental group. The effect size was found to be 0.57, which was slightly larger than a medium effect size.

The main hypothesis that degree of non-striving differed between the two conditions was tested based on mean time spent on trials as a proxy measure for non-striving. When all 42 trials were considered, the robust Yuen's *t* test for trimmed means (bootstrapping set at 2,000, and trim level for the mean as 0.20) showed that there was a significant group difference, *M*diff = 2358.35 (573.48, 4143.21), *Y*t = 2.65, *p* = 0.016. **Figure 2** shows that participants in the experimental group spent significantly less time working on each trial compared to those in the control group. The effect size was found to be 0.78, which was close to a large effect size. The combination of results above shows that the control group took a longer time and yielded more correct trials, compared to the experimental group, which took a shorter time but registered fewer correct trials. This suggests that the expected speed-accuracy trade-off occurred with the mere instruction to pick the shorter word without time pressure.

Further tests were undertaken to ascertain that the observed differences in time spent were also present for trials of similar difficulty (i.e., trials with words that had the same total number of letters). When trials with words of 9 and 10 letters were considered, the Yuen's *t* test for trimmed means (bootstrapping set at 2,000, and trim level for the mean as 0.20) showed that there was a significant group difference, *M*diff = 1871.91 (171.56, 3572.26), *Y*t = 2.39, *p* = 0.031. An effect size of 0.70 was observed. For trials with words of 10 and 11 letters, the result again revealed a significant difference between groups, *M*diff = 2378.98 (414.04, 4343.91), *Y*t = 2.53, *p* = 0.017. The effect size was found to be 0.74. Lastly, for trials with words of 11 and 12 letters, the result was similar in that a significant difference between groups was observed, *M*diff = 2693.13 (712.90, 4673.36), *Y*t = 2.76, *p* = 0.010. The effect size value was large at 0.80. In all cases, the experimental group spent less time than the control group, as depicted in **Figure 3**.

Lastly, as the analyses above were based on response timings of all trials, regardless of whether they were correctly or incorrectly performed, we repeated these analyses based on response timings of trials that were correctly performed to remove the possible noise in the data due to incorrect trials. Similar findings were observed. This additional observation suggests that the experimental effects are present even when we removed trials that were performed incorrectly due to the participants giving up or responding carelessly on certain trials. The effects of incorrect trials were negligible.

#### DISCUSSION

As there is a dearth of empirical research on the non-striving aspect of mindfulness, we tested the effects of a presumably futile strange loop task that has the potential of being developed as a mindfulness task, on subsequent non-striving behavior and performance. Although there was no significant group difference observed in terms of performance (based on total number of correct trials achieved), the overall results (close to significant *p* value and medium effect size) suggest that the control group showed tendencies of better performance than the experimental group. It is likely that those in the control group applied more effort and adopted the counting strategy to achieve accuracy while the experimental group tended not to. Findings from the test of the main hypothesis suggest that participants who underwent the Infinite Water Scooping Task took a significantly shorter time to perform the task compared to the control group. The effect was observed when all trials were considered, and when trials of similar difficulty level were considered. Collectively, our hypothesis that performing the futile strange loop task would prime subsequent non-striving behavior is supported.

The current outcome of eliciting non-striving behavior is noteworthy as non-striving is an important aspect of mindfulness practice that is rarely examined in research. Here, we observed that 10 min of repetitively scooping and pouring water over a string tied across a container, accompanied only by a brief initial reminder to focus on the process of the actions (without mention of it being a mindfulness practice), subsequently led to non-striving when compared to the control task. One speculation is that the implicit realization of the task's futility or paradoxical nature, arising from repetitive performance of the strange loop task (Hofstadter, 2007), affects one's tolerance of ambiguity. Having to repeatedly scoop and pour water for 10 min without perceiving any sign of advancement may have gradually activated a sense of acceptance toward the ambiguities of the task. The inner dialogue could be something like: "this task is paradoxical in nature and does not make sense, but I continue with it nevertheless." A sense of acceptance may have resulted from the repeated actions required by the task, akin to acceptance of the unfolding of in- and out-breaths in mindful breathing practices. Following the completion of this task, this tolerance for ambiguity or paradox, in turn, may have lowered one's efforts to judge critically (or count carefully) during the subsequent task. Previously, Carson and Langer (2006) noted that actively thinking about paradoxes could increase one's ability to tolerate ambiguity, which they viewed as a hallmark of mindfulness. In a way, performing the Infinite Water Scooping Task for a few minutes possibly succeeded in exposing the participants to a sense of paradox, akin to the sense of perceiving paradoxes when contemplating Zen koans (Christopher, 2003; Maex, 2011), which has been linked to psychological awakening.

In the present context, by psychological awakening, we mean leading participants out of their fixed ways of thinking or habits of striving, at least temporarily. Since the control group performed an objective-oriented task (i.e., clearly perceiving changes in water levels as a result of their action, and being told that the task is a means of developing wrist control and strength), performing the task could have heightened the sense of purpose-driven thinking and certainty that is implicitly habitual in most people (Covington, 2000; Monteiro et al., 2018; Ronkainen et al., 2018). Upon completing the control task, which is perceptually more normal and non-ambiguous, the tendency to count the letters purposefully to achieve better accuracy in the subsequent word length comparison task may have been more readily evoked, as shown by the longer trial durations. Based on the shorter trial durations observed, it appears that those in the experimental group tended not to count before making a response. If they had been striving to achieve task accuracy, they would have spent more time on the task by counting the letters carefully. As alluded to earlier, the shorter time spent can be viewed as a sign of their willingness to tolerate ambiguity and errors in their task performance. They were striving less toward achieving accuracy as required by the task. In a way, those in the experimental group may have been less judgmental about themselves in terms of whether they performed the task accurately, while those in the control group seemed more objective-oriented about their task performance. Taken together, the Infinite Water Scooping Task seems to be instrumental in leading participants out of their habits of striving, at least temporarily, when compared to the control task.

There are some practical implications arising from the current findings worthy of mention. First, given the current results, the Infinite Water Scooping Task could potentially be used to introduce the notion of non-striving within mindfulness practice in various contexts, such as when teaching mindfulness in clinical interventions (Roemer and Orsillo, 2003), in education (McKeering and Hwang, 2019), and in sport as part of mental skills training (Ortega and Wang, 2018). For instance, the Infinite Water Scooping Task can be used as an introductory task to teach what is meant by notions of repetition, focus on process, and purposelessness within mindfulness practice, relative to a task that is done with a purpose. Beginning mindfulness practitioners could especially benefit by experiencing the nuts and bolts of mindfulness practice through the physical nature of the task (Kee, in press), alleviating some of the difficulty raised by Solhaug et al. (2016). Second, the possibility of eliciting one's tolerance for ambiguity and non-striving through the Infinite Water Scooping Task may be useful when it comes to helping one prepare for performance or creative situations when a creative, relaxed, and open mind-set is needed. For example, in sports, some athletes described their best performance during a flow experience (a peak psychological state in which one is fully immersed in an activity experience) as one of intense focus without making effort in keeping focused (Jackson and Roberts, 1992). The relaxed and non-striving experience accompanied by the Infinite Water Scooping Task or other similar tasks may help athletes navigate through the nuances accompanying such peak psychological states when they use it as part of their mental skills repertoires. For example, an archer can use this task to get him/herself mentally prepared to focus on the process and not the outcome for his/her shoot. 
Likewise, this task could add to the list of other known strategies for enhancing creativity such as one described and tested by Chirico et al. (2018), in that this task may help one to suspend judgment momentarily to allow creativity to flow. Lastly, beyond these two specific implications, there is potential for the mindful repetition of the Infinite Water Scooping task to be used as a mindfulness practice task for those who find difficulty in performing seated meditation, given that it is an overt task.

Although the findings and implications seem promising, there are some limitations that must be noted. First, as no manipulation check was conducted, we cannot be certain that the aforementioned non-striving or paradoxical effects were indeed evoked. One main reason for not conducting a manipulation check immediately after participants completed the brief intervention task is the concern that such a check (e.g., using questions) may result in unexpected priming of psychological effects that affect the performance of the secondary task. While we maintain that the paradox explanation is plausible for the experimental task, given that the design principle of the Infinite Water Scooping Task took reference from the concept of the strange loop (Hofstadter, 2007), we acknowledge that a better approach would be to conduct a manipulation check after the secondary task. Secondly, although we argued that the control task is aligned with participants' *modus operandi* of being goal driven and that performing the control task could be treated as a typical experience, the control condition can potentially prime a goal-focused orientation, and thus the possibility of participants striving as a result cannot be ruled out. It would be worth considering including a neutral control task that primes neither striving nor non-striving in future studies. Thirdly, although the main difference between the experimental and control conditions is essentially pouring across a string within the same container compared to pouring across containers, the brief instruction to focus on the process was given to the experimental group but not to the control group. This raises the question of whether the observed effects came from the instruction to focus on the process. Although the likelihood is slim, as the instruction was only given briefly at the start, future studies could consider keeping the instruction to focus on the process consistent for both groups to rule out this possibility. 
Fourthly, there could be an alternative interpretation in that those who underwent the Infinite Water Scooping Task actually strove harder than the control group, since they could have completed the word length comparison task faster by putting in more effort to respond sooner without compromising overall performance. It is a limitation that there was no measure of perceived effort. Nevertheless, since there was no explicit instruction to complete the task fast, we expected participants to direct their effort toward responding correctly as required by the task. The observed speed-accuracy trade-off supports this view. We maintain that the control group indeed made more effort to count carefully compared to the experimental group, judging by the seemingly better task performance in the control group. In the future, potential mediators such as perceived effort, drowsiness, sleepiness, and apathy can also be examined.

In summarizing the lessons learned in terms of designing and conducting this research on non-striving, we conclude that non-striving *per se* is very difficult to operationalize, which perhaps explains the lack of such work in the literature. The difficulty stems from the fact that, by definition, non-striving is the antithesis of striving. To understand non-striving without making reference to striving is not possible in some sense. Experimentally, it is difficult to create a neutral control task that is completely devoid of any form of striving against which to compare non-striving. Any form of active control task would necessarily involve some level of striving, simply because there would be something to be done. That inevitably primes striving. On the other hand, a passive task of sitting still is also not suitable as a control condition, because the non-striving condition, such as the Infinite Water Scooping Task, would then be construed as requiring more striving than the control task. To this end, we propose a possible approach to make further inroads into understanding non-striving by focusing on the nature of the secondary task rather than on the manipulations, that is, by examining whether non-striving can manifest even in a seemingly effortless task. In the context of the present experimental setup, we can get closer to knowing whether we effected non-striving by testing whether performance on a word length comparison task involving very easy trials (say, obviously long versus short words) differs between those undergoing the control and experimental treatments. The assumption is that easy tasks take minimal striving to perform. If the Infinite Water Scooping Task resulted in even less striving for easy tasks, that could be a clearer indication of non-striving. Future studies could explore this approach to advance the operationalization of non-striving.

As mindfulness research has advanced over the years, many constructs related to mindfulness have been examined to better understand mindfulness (Kee et al., 2019). Since works on non-striving have been especially scarce, the current study presents an initial foray into the empirical examination of non-striving in the context of mindfulness practice. The main takeaway message is that the Infinite Water Scooping Task seems instrumental in eliciting non-striving (relative to the control task), which may be useful for teaching mindfulness practitioners about the notion of non-striving. Clearly, more work examining the nature and practical value of non-striving is warranted. Beyond investigating non-striving in the context of mindfulness, non-striving as a research topic is also relevant to the line of research questioning the value of objectives and goal setting, such as works championed by Ordóñez et al. (2009) and Swann et al. (in press). There is a possibility that the line of research questioning the value of goals and the line of research on non-striving through mindfulness may converge, hopefully illuminating further lessons beyond the duality of striving and non-striving. The conversations arising from the testing of the Infinite Water Scooping Task may also contribute to discussions on issues related to altered states of consciousness such as *wu-wei*, flow, mystical experiences, and awakening, since there has been discussion of "letting go," non-striving, and such states in the neuroscience literature (Austin, 2006, p. 275).

#### DATA AVAILABILITY STATEMENT

The datasets generated for this study are available on request to the corresponding author.

#### ETHICS STATEMENT

The studies involving human participants were reviewed and approved by Nanyang Technological University – Institutional Review Board. The patients/participants provided their written informed consent to participate in this study.

#### AUTHOR CONTRIBUTIONS

YK, KA, and RF conceptualized and designed the study. KA and RF collected the data. YK and KA analyzed the data. YK, KA, and CL prepared the manuscript. All authors approved the final version of the manuscript for submission.

#### FUNDING

This research was funded by the Nanyang Technological University's NIE AcRF Grant (RI 9/17 KYH).

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2019.02314/full#supplementary-material

#### REFERENCES

Hofstadter, D. R. (2007). *I am a strange loop*. New York, NY: Basic Books.


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Kee, Aye, Ferozd and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Neural Processes of Proactive and Reactive Controls Modulated by Motor-Skill Experiences

Qiuhua Yu<sup>1,2</sup>, Bolton K. H. Chau<sup>2</sup>, Bess Y. H. Lam<sup>2</sup>, Alex W. K. Wong<sup>3,4</sup>, Jiaxin Peng<sup>2,5</sup> and Chetwyn C. H. Chan<sup>2,6</sup>\*

<sup>1</sup> Department of Rehabilitation Medicine, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China, <sup>2</sup> Applied Cognitive Neuroscience Laboratory, Department of Rehabilitation Sciences, The Hong Kong Polytechnic University, Hong Kong, China, <sup>3</sup> Program in Occupational Therapy, Washington University School of Medicine, St. Louis, MO, United States, <sup>4</sup> Department of Neurology, Washington University School of Medicine, St. Louis, MO, United States, <sup>5</sup> Department of Education, Shaoguan University, Shaoguan, China, <sup>6</sup> University Research Facility in Behavioral and Systems Neuroscience, The Hong Kong Polytechnic University, Hong Kong, China

This study investigated how experience of open and closed motor skills modulates proactive and reactive control processes in task switching. Fifty-four participants who were open-skilled athletes (n = 18), closed-skilled athletes (n = 18), or non-athletic adults (n = 18) completed a cued task-switching paradigm. This task tapped into proactive or reactive controls of executive functions under different validity conditions. Electroencephalograms of the participants were captured during the task. In the 100% validity condition, the open-skilled participants showed a significantly lower switch cost of response time than the closed-skilled and control participants. Results showed that the open-skilled participants had less positive-going parietal cue-locked P3 in the switch than repeat trials. Participants in the control group showed more positive-going cue-locked P3 in the switch than repeat trials, whereas the closed-skilled participants had no significant differences between the two types of trials. In the 50% validity condition, the open- and closed-skilled participants had a lower switch cost of response time than the control participants. Participants in the open- and closed-skilled groups showed less positive-going parietal stimulus-locked P3 in the switch than repeat trials, which was not the case for those in the control group. Our findings confirm the dissociation between proactive and reactive controls in relation to their modulations by the different motor-skill experiences. Both proactive and reactive controls of executive functions could be strengthened by exposing individuals to anticipatory or non-anticipatory enriched environments, suggesting that the proactive and reactive controls involved in motor-skill development may be transferable to domain-general executive functions.

Keywords: proactive control, reactive control, task switching, ERP, motor skills

#### Edited by:

Stephen Fairclough, Liverpool John Moores University, United Kingdom

#### Reviewed by:

Chun-Hao Wang, National Cheng Kung University, Taiwan Valentina Bianco, Santa Lucia Foundation (IRCCS), Italy

> \*Correspondence: Chetwyn C. H. Chan Chetwyn.Chan@polyu.edu.hk

#### Specialty section:

This article was submitted to Motor Neuroscience, a section of the journal Frontiers in Human Neuroscience

Received: 05 September 2019 Accepted: 29 October 2019 Published: 14 November 2019

#### Citation:

Yu Q, Chau BKH, Lam BYH, Wong AWK, Peng J and Chan CCH (2019) Neural Processes of Proactive and Reactive Controls Modulated by Motor-Skill Experiences. Front. Hum. Neurosci. 13:404. doi: 10.3389/fnhum.2019.00404

# INTRODUCTION

A high level of motor skills has been associated with improved executive functions. Sports are physical activities that require high levels of motor skills. Individuals who engaged in fencing (Chan et al., 2011), baseball (Kida et al., 2005), and soccer were reported to have better executive functions than the controls (Verburgh et al., 2014). Researchers have suggested that the enhancement in executive functions could be related to the neural plasticity brought about by the long-term aerobic fitness and cognitive training involved in these sports (Voss et al., 2010; Chan et al., 2011). Common features of the physical activities described above are playing against opponents and in changing external environments, which demand open motor skills. Open motor skills involve the generation of physical responses to a dynamic and externally paced environment (Wang et al., 2013a; Yu et al., 2017). In contrast, closed motor skills require participants to generate responses that are relatively consistent, stationary, and self-paced (Wang et al., 2013a; Yu et al., 2017). Typical physical activities involving closed motor skills are swimming and track and field. In view of the differences between open and closed motor skills, it is intuitive that participants of "closed" physical activities would have gained a lower level of executive functions than those of "open" physical activities. For instance, open-skilled participants were found to have higher levels of inhibitory control than their closed-skilled counterparts (Yamashiro et al., 2015). Nevertheless, in an earlier behavioral study that employed a dual cognitive control model for executive functions (Chevalier et al., 2015; Tarantino et al., 2016), we reported that both the open- and closed-skilled participants showed significantly higher levels of reactive control of executive functions than the controls (Yu et al., 2017). The differences between the two experimental groups lay in proactive versus reactive control. In this paper, we aimed to employ electroencephalography to understand the neural mechanisms underlying how different types of motor skills are associated with proactive or reactive controls among participants engaging in physical activities.

Proactive control is an early selection process, which can optimally bias attention, perception, and action systems in a goal-driven manner (Braver, 2012). Information selected early in the process is deployed before the response-demanding event occurs. In contrast, reactive control resolves interference imperatively after the response-demanding event appears (Braver, 2012). Physical activities dominated by open motor skills involve predicting the outcomes of the actions of opponents and teammates in order to produce rapid and accurate responses (Jin et al., 2011; Abreu et al., 2012; Yu et al., 2016). These cognitive processes are comparable to those described in proactive control. In contrast, physical activities dominated by closed motor skills involve less prediction of the actions of others before responding. In addition, open motor skills require the participants to rapidly inhibit inappropriate actions, and to switch from an intended movement to an appropriate one in an unpredictable environment (Taddei et al., 2012; Yu et al., 2017). The imperative inhibition and switching processes are comparable to those described in reactive control. Tsai and Wang (2015) explored the effects of open- (e.g., badminton and table tennis) or closed-skilled (e.g., jogging and swimming) training on the reactive control of task switching in elderly subjects. Their results showed that the open-skilled group had larger P3 amplitude in the switch condition than the closed-skilled and control groups, but had P3 amplitude in the repeat condition comparable to that of the other two groups. All of the studies reviewed employed a unidimensional perspective of executive control. Their results indicate that open-skilled groups had a higher level of executive functions in reactive control, associated with enhanced P3, than closed-skilled groups. 
No study has been conducted to explore and explain the possible gains in different time processes of executive functions differentially in the open- and closed-skilled groups. The dual cognitive control model of executive function offers a theoretical basis for addressing the potential differentiation in the cognitive gains due to the engagement in these physical activities.

Neural processes related to proactive and reactive controls can be examined by using a cued task-switching paradigm with electroencephalogram (EEG). The cued task-switching paradigm involves participants predicting a switch of task rule based on the information embedded in the cue and subsequently giving a response according to the new rule. The task cue can be fully predictive (100% validity) or fully non-predictive (50% validity), which makes it possible to elicit event-related potentials (ERPs) reflecting the proactive or reactive control processes, respectively. The common ERPs reported to be associated with proactive control in the task-switching paradigm are the parietally distributed P3 elicited by the cue (called cue-locked P3) (Gajewski and Falkenstein, 2011; Tarantino et al., 2016) and the frontocentrally distributed contingent negative variation (CNV) (Tarantino et al., 2016). In contrast, the frontally distributed N2 (Hsieh and Wu, 2011) and parietally distributed P3 (called stimulus-locked P3) (Scisco et al., 2008; Tarantino et al., 2016) elicited by the stimulus were reported to be associated with reactive control in task switching. The cue-locked P3 can be identified within the 300–600 ms time-window after the appearance of a predictive cue (West et al., 2012; Tarantino et al., 2016). This component was suggested to reflect task reconfiguration, that is, anticipatory updating of task goals and/or action rules in working memory (Nicholson et al., 2006; Gajewski and Falkenstein, 2011). The CNV is a slow wave elicited prior to the onset of the target stimulus (Funderud et al., 2013). It reflects anticipatory attention and motor preparation for the upcoming target stimulus (Funderud et al., 2013; Grane et al., 2016). Thus, the amplitude of CNV could be modulated by the response-related parameters embedded in the task cue (Scheibe et al., 2009; Linssen et al., 2013), e.g., the cue validity. 
Previous studies also showed that Bereitschaftspotential (BP), comparable to CNV, was more negative-going in athletes than non-athletes (Bianco et al., 2017a,b). This finding suggested that the athletes would have better motor preparation than the non-athletes. However, Wang et al. (2013b) did not reveal significant differences in motor preparation between the open- and closed-skilled players.

The N2 component has been employed as a marker reflecting reactive control, specifically the suppression of conflicting responses (Rushworth et al., 2002; Hsieh and Wu, 2011). A more negative-going frontal N2 was shown to be associated with switching to a new response set. The stimulus-locked P3 can be identified after the appearance of a response-demanding stimulus (Swainson et al., 2006; Tarantino et al., 2016). In the predictive cue condition, the stimulus-locked P3 accounts for stimulus-response set implementation (Jamadar et al., 2010; Gajewski and Falkenstein, 2011; Tarantino et al., 2016) or task-specific evaluation of a target stimulus (Swainson et al., 2006). The amplitude difference of stimulus-locked P3 between switch and repeat trials was negatively related to the switch cost of the response time when the cue was predictive (Li et al., 2012), suggesting that stimulus-locked P3 in reactive control was related to the performance in task switching. In the non-predictive cue condition, stimulus-locked P3 is associated with updating of the task goal or task rules (Hillman et al., 2006; Scisco et al., 2008). It was more positive-going in the switch than repeat trials, because more attentional resources are required for subsequent memory updating in reactive control (Hillman et al., 2006; Scisco et al., 2008). However, Kamijo and Takeda (2010) reported that participants with intensive experience of physical training, regardless of the type of sport, showed a less positive stimulus-locked P3 in the switch than in the repeat trials in an alternating-runs switching paradigm. The likely reason is that these studies differed in task difficulty, to which the P3 component is sensitive (Kamijo and Takeda, 2010).

In this study, a cued task-switching task was employed to elicit the proactive and reactive control processes as modulated by participants' experience of open- or closed-skilled physical activities. As the participants had received intensive training in two types of motor skills, we hypothesized that in the trials with predictive cues (100% validity), the open-skilled participants would show less positive differences in the between-trial (switch versus repeat) cue-locked P3 compared with the closed-skilled participants due to the greater use of anticipation in open-skilled training. It was also hypothesized that in the trials with non-predictive cues (50% validity), the open-skilled participants would show less positive differences in the between-trial stimulus-locked P3 than the closed-skilled and control participants due to the imperative switching required in an unpredictable environment. A more negative-going CNV in the open- and closed-skilled participants than in the controls was anticipated due to better preparation in the former two groups. Comparable N2 amplitudes were anticipated in the open- and closed-skilled participants. The neural processes in the 75% validity condition were expected to fall between those of the 100 and 50% validity conditions.

# MATERIALS AND METHODS

#### Participants

Fifty-four university students were recruited via convenience sampling. Among them, 18 (8 females and 10 males) were members of the university badminton team (open-skilled group), 18 (7 females and 11 males) were members of the university track and field team (closed-skilled group), and 18 (9 females and 9 males) declared they had not engaged in any professional or amateur sport (control group). Badminton and track and field athletes were selected as the open- and closed-skilled participants with reference to those recruited in Wang and Tu (2017) and Wang et al. (2017), so that the results obtained could be compared more meaningfully with those reported in these studies. Participants in each group had matched age and education levels (**Table 1**). The participants of the open- and closed-skilled groups had five or more years of professional motor skill practice. Each athlete had won prizes in open competitions and had no regular training in other sports. The levels of skill competence (in terms of winning international/local sport awards) were comparable between the two groups (n = 2/16 for open-skilled versus n = 3/15 for closed-skilled). All of the participants were right-handed with normal or corrected-to-normal visual acuity, and had no history of neurological or cardiovascular disorders. Participants were not on regular medication. Each participant's cardiorespiratory fitness was assessed by the Queen's College step test, as described in Yu et al. (2017). Ethical approval of this study was granted by the Departmental Research Committee of The Hong Kong Polytechnic University. Written informed consent was obtained from the participants before commencing the experiment for data collection.

# Experimental Task

This study used a cued task-switching paradigm to manipulate the proactive and reactive controls. Details of the task design were described in Yu et al. (2017). The time course of one typical trial is summarized in **Figure 1**. A trial began with presentation of a task cue (4 cm × 4 cm) at the center of the screen for 1500 ms. Then the task cue was replaced by a target stimulus (4 cm × 4 cm). Upon presentation of the target stimulus, the participant was asked to give a correct two-key sequential response as soon as possible within 3000 ms. The trial ended once the response from the participant was registered, and a blank interval of 1000 ms followed before the onset of the next trial.
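The trial timeline described above can be summarized in a short sketch. The constant and function names are hypothetical, and the sketch assumes that the target disappears as soon as a response is registered; the source does not state what happened when no response was given within the 3000 ms window.

```python
CUE_MS = 1500              # task cue shown at the center of the screen
RESPONSE_WINDOW_MS = 3000  # maximum time allowed for the two-key response
BLANK_MS = 1000            # blank interval before the next trial

def trial_duration_ms(response_time_ms):
    """Total time from cue onset to the onset of the next trial."""
    if not 0 <= response_time_ms <= RESPONSE_WINDOW_MS:
        raise ValueError("response time must fall inside the response window")
    # Assumption: the target is removed when the response is registered.
    return CUE_MS + response_time_ms + BLANK_MS
```

Under these assumptions, a trial answered in 800 ms lasts 3300 ms in total, and the longest possible trial lasts 5500 ms.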

In this paradigm, proactive and reactive controls were manipulated by means of cue validities. Three cue validities were used: 100, 75, and 50%. A 100% valid cue appeared as a solid square (or diamond ), a 75% valid cue appeared as a hollow square (or diamond ), and a 50% valid cue appeared (as a solid star ). The participants were asked to prepare the response selection rules based on the cue (except for a 50% valid cue). A 100% valid cue meant the rule sets conveyed in the cue would be the same as those that appeared in the target stimulus. In this case, the participant could prepare to repeat the same task rule as the previous trial or switch to a new task rule based on the rule set conveyed in the cue. Thus, a 100% valid cue would elicit more proactive but fewer reactive control processes. A 50% valid cue meant that no information on the task sets in the subsequent response was provided. In such an ambivalent situation, the participant could not give a specific preparation for the response. The participant could repeat the same task rule or switch to a new task rule based on the target stimulus. It was expected that a 50% valid cue would elicit reactive rather than proactive control processes. A 75% valid cue was expected to elicit mental processes that were a combination of those in the 50 and 100% validity conditions. A digit (1 or 2) that appeared inside the shape (a square or diamond) formed the target stimulus (**Figure 1**). Two sets of response selection rules were used. Each rule involved two sets of two-key sequential responses delivered by the target stimulus with the same shape (square or diamond). For instance, one response selection rule was that a square with digit "1" was for the participant to press the "z" and then "n" keys on the keyboard, and a square with digit "2" was "x" and then "m"; the other response selection rule was that a diamond with digit "1" was "x" and then "n", and a diamond with digit "2" was "z" and then "m." The mappings

#### TABLE 1 | Demographic characteristics of the open-skilled, closed-skilled and control participants.


<sup>∗</sup>Denotes the regular motor skill practices under the coach guidance.

between cue stimulus (square or diamond) and the two response selection rules were counterbalanced across participants. Only the 75% validity condition had congruent and incongruent trials. In congruent trials, the same shape (square or diamond) appeared in the cue and the target. In incongruent trials, the shapes displayed in the cue and the target differed. Only valid congruent trials in the 75% validity condition were included in the data analyses. Each trial was defined as "repeat" or "switch" depending on the response selection rule. In a repeat trial, the response selection rule was the same as that in the previous trial, whereas in a switch trial it differed from that in the previous trial.
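
The trial coding described above can be illustrated with a minimal sketch. The dictionary holds the example key mappings given in the text; the function names and the choice to leave the first trial unlabeled (it has no predecessor) are assumptions made for illustration and are not taken from the original task software.

```python
# Example stimulus-response mappings quoted in the text (one counterbalancing version).
RESPONSE_RULES = {
    ("square", 1): ("z", "n"),
    ("square", 2): ("x", "m"),
    ("diamond", 1): ("x", "n"),
    ("diamond", 2): ("z", "m"),
}


def expected_keys(shape, digit):
    """Return the two-key sequence required by a target stimulus."""
    return RESPONSE_RULES[(shape, digit)]


def label_trials(target_shapes):
    """Label each trial 'repeat' or 'switch' relative to the previous trial.

    The response selection rule is identified by the target shape.
    The first trial has no predecessor and is labeled None (an assumption).
    """
    if not target_shapes:
        return []
    labels = [None]
    for previous, current in zip(target_shapes, target_shapes[1:]):
        labels.append("repeat" if current == previous else "switch")
    return labels


def is_congruent(cue_shape, target_shape):
    """A 75% validity trial is congruent when cue and target shapes match."""
    return cue_shape == target_shape
```

For example, the shape sequence square, square, diamond yields one repeat trial followed by one switch trial.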

The ratio of the three types of cue validities was 1:1:1. The ratio of repeat to switch trials was the same in each block. Trials were organized in counterbalanced orders and grouped into nine blocks of 140 trials each. Each block took around 9 min to complete and was followed by a 4-to-5 min break. NeuroScan Stim2 software (NeuroScan, Inc., Sterling, VA, United States) was used to construct the trials. The time for completing the experimental task was approximately 1.5 h.

#### Data Collection Procedures

#### Preparation

Each participant was asked to complete the demographic information sheet, which covered years of professional motor skill practice, hours of professional motor skill practice per week, sports categories, and other expertise outside of sports (**Table 1**). Before engaging in the experimental task, each participant sat on a comfortable chair in front of a table inside a dimly lit, electrically isolated, sound-proof chamber. A 15-inch computer monitor for showing trials was placed on the table at a distance of 65–75 cm. Each participant was required to first complete 100 practice trials. Standardized instructions and feedback were given to the participant throughout the training block. This was followed by a test block of 50 task trials, in which the participant had to reach 90% accuracy before entering the experiment. If the participant achieved an accuracy rate of less than 90%, he or she repeated the training block. The participant was also reminded to minimize eye blinks and to keep his or her eyes at the center of the monitor throughout the task.

#### Acquisition of ERP Data


Participants' EEG signals were captured by a 64-channel Quik-cap equipped with 90 mm Ag/AgCl sintered electrodes, SynAmps2 Digital DC EEG amplifier, and Curry 7 software (NeuroScan, Inc., Sterling, VA, United States). Vertical and horizontal electrooculograms (EOGs) were captured with two pairs of electrodes placed on the supra- and infra-orbital areas of the left eye and the left and right orbital rims of both eyes, respectively. A ground electrode was positioned on the forehead in front of the Cz electrode. All the channels were referenced to the electrodes on the left and right mastoids. The EEG and EOG signals were sampled at a rate of 1000 Hz/channel. All EEG/EOG electrode impedances were set to below 5 kΩ. EEG signals were recorded from the beginning of each block of experimental tasks. The timing of all stimuli was recorded by Curry 7 software.

Offline signal preprocessing also employed Curry 7 software. EEG signals were digitally filtered with a band pass from 0.01 to 30 Hz. The covariance analysis algorithm was used when eye movement was detected. Then the EEG signals were segmented into the cue- and stimulus-locked epochs. Cue-locked epochs were defined as −200 ms before the cue to 1,500 ms after the cue, and stimulus-locked epochs were defined as −200 ms before the target to 1,000 ms after the target. Baseline corrections were referenced to the pre-stimulus interval. Epochs with an amplitude exceeding ±80 µV and trials with incorrect responses were excluded from the subsequent averaging procedure. The cue- and stimulus-locked waveforms of each electrode were averaged separately for three cue validities (100, 75 versus 50%) and two task conditions (repeat versus switch). The number of epochs extracted for data analysis for each cue validity in each of the repeat or switch trials was around 140 for each group.
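
The segmentation, baseline correction, and amplitude-based rejection steps can be sketched for a single channel as follows. This is an illustrative NumPy sketch, not the Curry 7 procedure: the function name, the single-channel input, and the decision to apply the ±80 µV criterion after baseline correction are assumptions. Filtering, ocular correction, and the exclusion of incorrect-response trials are not covered.

```python
import numpy as np


def epoch_and_clean(signal_uv, event_samples, sfreq=1000.0,
                    tmin=-0.2, tmax=1.0, reject_uv=80.0):
    """Cut epochs around events, baseline-correct, and drop noisy epochs.

    signal_uv: 1-D array holding one channel in microvolts.
    event_samples: sample indices of cue or target onsets.
    tmin must be negative so that a pre-stimulus baseline exists.
    """
    signal_uv = np.asarray(signal_uv, dtype=float)
    start = int(round(tmin * sfreq))   # negative sample offset, e.g. -200
    stop = int(round(tmax * sfreq))    # e.g. 1000 (target) or 1500 (cue)
    n_base = -start                    # number of pre-stimulus samples
    kept = []
    for onset in event_samples:
        lo, hi = onset + start, onset + stop
        if lo < 0 or hi > signal_uv.size:
            continue                   # epoch would run outside the recording
        epoch = signal_uv[lo:hi]
        epoch = epoch - epoch[:n_base].mean()   # pre-stimulus baseline
        if np.max(np.abs(epoch)) > reject_uv:
            continue                   # amplitude criterion exceeded
        kept.append(epoch)
    return np.array(kept) if kept else np.empty((0, stop - start))
```

Stimulus-locked epochs use the defaults; cue-locked epochs would use `tmax=1.5`.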

#### Data Analysis

As the behavioral data of this study came from the same data set as a previous study conducted by the same research team, the detailed methods of analyzing the behavioral results of the participants can be found in Yu et al. (2017) and will not be repeated here. Analyses of the ERP data included the cue-locked P3, CNV, N2 and the stimulus-locked P3 elicited when participants engaged in the behavioral task. Independent component analysis (ICA) was conducted to confirm the time windows set for extracting signals related to the cue-locked P3 (350–550 ms post-cue), CNV (1200–1500 ms post-cue), N2 (200–300 ms post-target), and stimulus-locked P3 (300–600 ms post-target). A short time window was set for the cue-locked P3 to reduce possible interference with the CNV, as ICA results showed a slight overlap in the time windows of these two components. In the analysis, the electrodes at the midline sites (Fz, Cz, and Pz) were included with reference to the results of previous studies showing that switch effects were maximal at the midline electrode sites (Gajewski and Falkenstein, 2011; Hsieh and Wu, 2011; Li et al., 2012). A four-way repeated measures ANCOVA for validity (100, 75 versus 50%) × trial (repeat versus switch) × site (Fz, Cz versus Pz) × group (open-skilled, closed-skilled versus control) was conducted to test the mean amplitudes of the cue-locked P3, CNV, N2, and stimulus-locked P3. The years of participants' professional motor skill practices was the only covariate entered because of the significant between-group differences in this variable (**Table 1**). Another reason was that the years of professional motor skill practices rather than the MBI and VO2max was revealed to significantly predict the participants' performances on the behavioral tasks in Yu et al. (2017). Post hoc pairwise comparisons with the Bonferroni adjustment were applied when significant main or interaction effects were observed.
This study only included the amplitudes rather than latencies in the ERP data analyses because a previous study reported no significant differences between groups and between trials on the P3 latency (Scisco et al., 2008). As the switch cost of response time was related to amplitudes of cue- or stimulus-locked P3 components (Jamadar et al., 2010; Li et al., 2012), the present study examined the relations among cue- and stimulus-locked P3 and behavioral performance (i.e., switch cost of response time) for different motor skills by using a hierarchical, stepwise regression analysis for each of the validity conditions. For each regression equation, two regressors were the mean amplitude differences between switch and repeat trials of cue- (cP3S-cP3R) and stimulus-locked P3 (sP3S-sP3R); two other regressors were the identities of the open- and closed-skilled groups (with the control group as the reference). The two-way interaction terms for the neural processes and group identities were cP3S-cP3R × open-skilled; cP3S-cP3R × closed-skilled; sP3S-sP3R × open-skilled; and sP3S-sP3R × closed-skilled. A variance inflation factor (VIF) ≥ 10 and a Pearson's correlation ≥ 0.85 were considered indicators of strong multicollinearity between any two independent variables in a hierarchical regression model. The results showed that no variable displayed strong multicollinearity, suggesting that none of the variables were strongly related to each other. All analyses were performed with IBM SPSS statistics version 20.0 (IBM, Chicago, IL, United States).
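
The predictor structure and the VIF check described above can be sketched as follows. This is an illustrative NumPy sketch and not the SPSS procedure used in the study: the function names and the group labels "open", "closed", and "control" are hypothetical, and the stepwise selection itself is not implemented. Only the construction of the two predictor blocks and the multicollinearity diagnostic are shown.

```python
import numpy as np


def build_design(cp3_diff, sp3_diff, group):
    """Return (step1, step2) predictor matrices without an intercept column.

    step1: cP3 difference, sP3 difference, and two group dummies
           (control group as the reference).
    step2: step1 plus the four amplitude-by-group interaction terms.
    """
    cp3 = np.asarray(cp3_diff, dtype=float)
    sp3 = np.asarray(sp3_diff, dtype=float)
    group = np.asarray(group)
    open_d = (group == "open").astype(float)
    closed_d = (group == "closed").astype(float)
    step1 = np.column_stack([cp3, sp3, open_d, closed_d])
    interactions = np.column_stack([cp3 * open_d, cp3 * closed_d,
                                    sp3 * open_d, sp3 * closed_d])
    return step1, np.column_stack([step1, interactions])


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X plus an intercept."""
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - (resid @ resid) / ss_tot


def vif(X):
    """Variance inflation factor of each column: 1 / (1 - R^2_j)."""
    factors = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        factors.append(1.0 / (1.0 - r_squared(others, X[:, j])))
    return np.array(factors)
```

Under the criterion stated in the text, any entry of `vif(step1)` at or above 10 would flag strong multicollinearity.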

#### RESULTS

The main behavioral variable was the switch cost of response time, which was defined as the difference in the reaction times between the switch and repeat trials. A two-way repeated measures ANOVA of validity (100, 75 versus 50%) × group (open-skilled, closed-skilled versus control) on the switch cost of response times indicated that the validity × group effect was marginally significant (p = 0.053). Participants in the open-skilled group showed significantly smaller switch costs than the closed-skilled (p = 0.023) and control (p < 0.001) groups in the 100% validity condition (**Figure 2**). Participants in both the open- (p < 0.001) and closed-skilled (p = 0.033) groups showed

significantly smaller switch costs than the control group in the 50% validity condition (**Figure 2**). No significant differences in switch costs were revealed between the open- and closed-skilled groups in this condition (p = 0.473) (**Figure 2**). Further details can be found in Yu et al. (2017).
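
The switch cost defined above can be computed per participant and validity condition as in the following minimal sketch. It assumes that mean response times are used and that only trials already labeled "repeat" or "switch" are passed in; the function name is hypothetical.

```python
from statistics import mean


def switch_cost(rts_ms, trial_types):
    """Mean response time on switch trials minus mean on repeat trials (ms)."""
    switch = [rt for rt, kind in zip(rts_ms, trial_types) if kind == "switch"]
    repeat = [rt for rt, kind in zip(rts_ms, trial_types) if kind == "repeat"]
    return mean(switch) - mean(repeat)
```

A positive value means that responses were slower on switch trials than on repeat trials, so a smaller switch cost indicates more efficient task switching.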

#### Cue-Locked P3 (350–550 ms)

**Figure 3** presents topographic maps (3A) and waveforms (3B) of the cue-locked P3 for the open-skilled, closed-skilled, and control groups. The covariate of years of professional motor skill practices [F(1,50) = 0.54, p = 0.466, $\eta_p^2$ = 0.011] was not significant. The validity × trial × site × group effect [F(6.264,156.602) = 2.66, p = 0.016, $\eta_p^2$ = 0.096] was found significant. The site main effect was also significant [F(1.488,74.389) = 9.61, p = 0.001, $\eta_p^2$ = 0.161]. However, the validity [F(2,100) = 0.13, p = 0.880, $\eta_p^2$ = 0.003], trial [F(1,50) = 0.002, p = 0.968, $\eta_p^2$ < 0.001], and group effects [F(2,50) = 0.42, p = 0.657, $\eta_p^2$ = 0.017] on the amplitudes of cue-locked P3 were not significant.
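
The effect sizes reported alongside each F statistic are partial eta squared values, which can be recovered from the F value and its degrees of freedom. The short sketch below applies the standard identity and checks it against two of the values reported above; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom.

    Uses the identity eta_p^2 = F * df_effect / (F * df_effect + df_error).
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Covariate effect, F(1,50) = 0.54
print(round(partial_eta_squared(0.54, 1, 50), 3))           # 0.011
# Site main effect, F(1.488,74.389) = 9.61
print(round(partial_eta_squared(9.61, 1.488, 74.389), 3))   # 0.161
```

The same identity holds when the degrees of freedom are fractional, as they are after a sphericity correction.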

Post hoc analyses on trial × site × group effect were conducted separately at each level of validity. The trial × site × group effect was only significant in the 100% validity condition [F(3.351,85.442) = 5.46, p = 0.001, $\eta_p^2$ = 0.177] but not in the 75% [F(2.752,70.179) = 1.11, p = 0.348, $\eta_p^2$ = 0.042] and 50% validity conditions [F(3.249,82.856) = 0.13, p = 0.952, $\eta_p^2$ = 0.005]. Hence, the trial × group effect for the 100% validity condition was further examined at each of the electrode sites. The trial × group effect was significant at Fz [F(2,51) = 6.82, p = 0.002, $\eta_p^2$ = 0.211] and Pz [F(2,51) = 8.03, p = 0.001, $\eta_p^2$ = 0.239] but not at Cz [F(2,51) = 1.32, p = 0.28, $\eta_p^2$ = 0.049]. At Fz, the open-skilled group showed marginally less positive-going cue-locked P3 in the switch than repeat trials (p = 0.056) in the 100% validity condition, whereas the closed-skilled group showed an opposite trend whereby the cue-locked P3 was significantly more positive-going in the switch than repeat trials (p = 0.003). The control group did not show significant between-trial-type differences in the amplitudes of cue-locked P3 at Fz (p = 0.822) (**Figure 3C**). At Pz, the open-skilled group showed a less positive-going cue-locked P3 in the switch than repeat trials (p = 0.011). The closed-skilled group, however, did not show significant between-trial-type differences in the amplitudes of cue-locked P3 at Pz (p = 0.523). The control group showed significantly more positive-going cue-locked P3 at Pz in the switch than repeat trials (p = 0.004).

#### CNV (1200–1500 ms)

For the CNV, the covariate of years of professional motor skill practices was found significant [F(1,50) = 4.21, p = 0.045, $\eta_p^2$ = 0.078]. The validity × trial × site × group effect [F(6.118,152.946) = 0.55, p = 0.772, $\eta_p^2$ = 0.022] on the amplitudes of CNV was not significant (**Figure 4**). The validity effect [F(2,100) = 7.20, p = 0.001, $\eta_p^2$ = 0.126] was significant. However, trial [F(1,50) = 0.06, p = 0.811, $\eta_p^2$ = 0.001], site [F(2,100) = 1.54, p = 0.221, $\eta_p^2$ = 0.030], and group effects [F(2,50) = 1.06, p = 0.354, $\eta_p^2$ = 0.041] were not significant. Post hoc analysis showed that CNV in the 100% validity condition was more negative-going than those in 75% (p < 0.001) and 50% (p < 0.001) validity conditions.

#### N2 (200–300 ms)


The covariate of years of professional motor skill practices was found not statistically significant [F(1,50) = 0.29, p = 0.596, $\eta_p^2$ = 0.006]. The validity [F(1.656,82.825) = 1.85, p = 0.170, $\eta_p^2$ = 0.036], trial [F(1,50) = 1.63, p = 0.208, $\eta_p^2$ = 0.031], group [F(2,50) = 0.42, p = 0.658, $\eta_p^2$ = 0.017], and the validity × trial × site × group effects [F(5.866,146.655) = 1.23, p = 0.295, $\eta_p^2$ = 0.047] on the amplitudes of N2 were also not significant. However, the site [F(1.275,63.730) = 5.86, p = 0.012, $\eta_p^2$ = 0.105], and validity × trial × site × covariate (years of professional motor skill practices) effect were significant [F(2.933,146.655) = 2.80, p = 0.043, $\eta_p^2$ = 0.053]. Post hoc analysis showed that N2 in the 100% validity condition was more negative-going in the switch than repeat trials at Fz (p = 0.034), Cz (p < 0.01), and Pz (p < 0.01), but this effect was not found in the 75 and 50% validity conditions. In both the 75 and 50% validity conditions, N2 was more negative-going at Fz (ps < 0.05) than Cz and Pz. **Figure 5A** presents the topographic maps of the N2 component.

#### Stimulus-Locked P3 (300–600 ms)

The topographic maps (5A) and waveforms (5B) of the stimulus-locked P3 for the open-skilled, closed-skilled, and control groups are presented in **Figure 5**. The covariate of years of professional motor skill practices was significant [F(1,50) = 4.46, p = 0.040, $\eta_p^2$ = 0.082]. The validity × trial × site × group effect [F(5.671,141.771) = 2.25, p = 0.045, $\eta_p^2$ = 0.082] on the amplitudes of stimulus-locked P3 was found significant. The site [F(1.474,73.719) = 4.14, p = 0.031, $\eta_p^2$ = 0.076] and group effects [F(2,50) = 5.27, p = 0.008, $\eta_p^2$ = 0.174] were significant. The trial effect was marginally significant [F(1,50) = 3.93, p = 0.053, $\eta_p^2$ = 0.073]. However, the validity [F(2,100) = 0.33, p = 0.719, $\eta_p^2$ = 0.007] effect was not significant.

After adjusting for years of professional motor skill practices, post hoc analysis showed a significant trial × site × group effect only in the 50% validity condition [F(3.267,81.670) = 3.48, p = 0.017, $\eta_p^2$ = 0.122] (**Figure 5C**), but not in the 100% [F(2.824,70.610) = 1.81, p = 0.322, $\eta_p^2$ = 0.045] and 75% validity conditions [F(2.897,72.437) = 1.37, p = 0.260, $\eta_p^2$ = 0.052]. Interestingly, the results were different from those found in cue-locked P3, in which the same three-way interaction effect was significant in the 100% validity condition. The trial × group effect in the 50% validity condition was further tested separately for Fz, Cz, and Pz. The trial × group effect was found significant at the Cz and Pz [F(2,51) = 3.52, p = 0.037, $\eta_p^2$ = 0.121; F(2,51) = 4.14, p = 0.021, $\eta_p^2$ = 0.140, respectively], but not at the Fz [F(2,51) = 2.02, p = 0.143, $\eta_p^2$ = 0.073]. At Pz, both the open-skilled (p = 0.008) and closed-skilled (p = 0.002) groups showed significantly less positive-going stimulus-locked P3 in the switch than repeat trials (**Figure 5C**). The control group, however, showed no significant between-trial-type difference in the stimulus-locked P3 amplitudes at Pz (p = 0.630). At Cz, the open-skilled (p < 0.001) and closed-skilled (p = 0.038) groups had significantly less positive-going stimulus-locked P3 in the switch than repeat trials; whereas the control group did not show any significant between-trial-type differences in the amplitudes of stimulus-locked P3 (p = 0.838).

#### Hierarchical Stepwise Regression

In the 100% validity condition, the regression model was significant, $R^2$ = 0.226, F(4,49) = 3.579, p = 0.012, with the only significant regressor in the model being the open-skilled group as a group identity (β = −0.573, p = 0.001) (**Table 2**). Other regressors (e.g., cP3S-cP3R and sP3S-sP3R, closed-skilled group) were not significant (|β| < 0.285, ps > 0.057). The changes in the variance explained by the open-skilled group regressor were also significant [$\Delta R^2$ = 0.182, F(4,45) = 3.455, p = 0.015]. The effect of cP3S-cP3R × open-skilled was significant (β = 0.475, p = 0.007), whereas cP3S-cP3R (β = −0.431, p = 0.072) and cP3S-cP3R × closed-skilled (β = 0.247, p = 0.205) did not show significant impacts (**Figure 6A**). These results suggested that the cP3S-cP3R, which was associated with proactive control for task switching, showed significant correlation with the switch cost of response times among the open-skilled participants but not among the closed-skilled and control participants.
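
As a consistency check on the values reported above, the F statistics can be related to the $R^2$ values through the standard formulas for a regression with $k$ predictors and for a change in $R^2$ when $k$ further predictors are added. The sketch below assumes that the reported change in $R^2$ (0.182) is the increment over the first-step model, so that the full-model $R^2$ is 0.226 + 0.182 = 0.408; this reading is an assumption and is not stated explicitly in the text.

$$
F_{\text{model}} = \frac{R^2/k}{(1-R^2)/df_{\text{error}}} = \frac{0.226/4}{(1-0.226)/49} \approx 3.58
$$

$$
F_{\text{change}} = \frac{\Delta R^2/k}{(1-R^2_{\text{full}})/df_{\text{error}}} = \frac{0.182/4}{(1-0.408)/45} \approx 3.46
$$

Both values agree with the reported F(4,49) = 3.579 and F(4,45) = 3.455 to within the rounding of the $R^2$ values.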

In the 50% validity condition, the regression model was significant, $R^2$ = 0.211, F(4,49) = 2.782, p = 0.037, and both the open- and closed-skilled groups as group identities were identified as significant regressors (β = −0.474, p = 0.003; β = −0.397, p = 0.012, respectively) (**Figure 6B**). The other two regressors, cP3S-cP3R and sP3S-sP3R, were not significant (β = −0.008, p = 0.949; β = −0.062, p = 0.643, respectively). The changes in the variance explained by the open- and closed-skilled group regressors were also significant [$\Delta R^2$ = 0.157, F(4,45) = 3.057, p = 0.043]. sP3S-sP3R (β = −0.550, p = 0.020), open-skilled (β = −0.363, p = 0.031), closed-skilled (β = −0.347, p = 0.023), sP3S-sP3R × open-skilled (β = 0.464, p = 0.022), and sP3S-sP3R × closed-skilled (β = 0.519, p = 0.012) were significant predictors, suggesting that the correlations between sP3S-sP3R and the switch cost of response time in the open- and closed-skilled groups were significantly different from those of the control group. A follow-up analysis suggested that such correlation was significantly negative in the control participants (r = −0.534, p = 0.022), but not significant in the open-skilled (r = 0.262, p = 0.294) and closed-skilled (r = 0.274, p = 0.270) participants (**Figure 6B**).

In the 75% validity condition, the regression model was also significant, $R^2$ = 0.255, F(4,49) = 4.203, p = 0.005, with cP3S-cP3R (β = 0.320, p = 0.023) and sP3S-sP3R (β = −0.378, p = 0.004) being significant regressors in the model. No other significant regressors were found [$R^2$ = 0.277, F(8,45) = 2.151, p = 0.050] (**Figure 6C**). Both cP3S-cP3R (r = 0.317, p = 0.020) and sP3S-sP3R (r = −0.333, p = 0.014) showed significant correlations with the participants' switch cost of response times, regardless of the subgroups.

#### DISCUSSION

The current study investigated how motor skill experiences modulated proactive and reactive controls of executive function in healthy adults. The new findings are that the open-skilled participants, when compared with the other two groups, showed significantly less positive-going parietal cue-locked P3 in switch than repeat trials, coupled with better task-switching performance in the predictive condition. These findings suggest that enhanced proactive control was unique to the open-skilled participants. It appears that they might have been able to deploy fewer attentional resources in proactively updating the new action rule than the closed-skilled participants. These findings are corroborated by the results of the regression analysis, which show that better proactive control for task switching was associated with the between-trial difference in the cue-locked P3 amplitudes for the open- but not closed-skilled participants. In contrast, in the non-predictive condition both the open- and closed-skilled participants showed significantly less positive-going parietal stimulus-locked P3 in the switch than repeat trials, which could not be found in the control participants. These results indicate that prior experiences in motor skill training, regardless of the types of motor skills developed, would result in the deployment of fewer attentional resources for reactively updating the new action rule under non-anticipatory circumstances. Our findings further confirm the dissociation of proactive and reactive controls in relation to their modulations by different motor-skill experiences. In particular, both proactive and reactive controls of executive functions could be enhanced by intensively exposing individuals to anticipatory and non-anticipatory enriched environments.

The results of the 75% validity condition will not be discussed because no significant findings were revealed in the comparisons of the ERP data.

#### Proactive Control and Open-Skill Experience

The experimental task used in the present study required the participants to switch between two sets of action rules. Our ERP results show that the cue-locked P3 was less positive in switch than repeat trials among the open-skilled participants, which was not the case in the closed-skilled and control groups. These findings indicate that open-skilled participants deployed fewer attentional resources when proactively updating the new action rule than their closed-skilled counterparts. In addition, behavioral results on the same groups of participants reported in Yu et al. (2017) showed that open-skilled participants exhibited smaller switch cost of response times in the predictive condition (100% validity) than closed-skilled participants. The ERP findings of the cue-locked P3 and the published behavioral

TABLE 2 | Results of hierarchical stepwise regression of amplitudes of the cue- and stimulus-locked P3 for predicting the switch cost of response times in the 100, 50, and 75% cue validity conditions.


<sup>∗</sup> p < 0.050; ∗∗p < 0.010; RT denotes response time; cP3S-cP3<sup>R</sup> denotes difference of mean amplitudes between switch and repeat trial-types of cue-locked P3; sP3S-sP3<sup>R</sup> denotes difference of mean amplitudes between switch and repeat trial-types of stimulus-locked P3; 100% denotes 100% valid cue; 75% denotes 75% valid cue; 50% denotes 50% valid cue.

data suggest that the open-skilled participants were more efficient than the closed-skilled participants in updating the new action rule in proactive control. Our results are consistent with those reported in previous studies, which found that participants engaged in open-skilled physical activities have higher efficiency in the updating process related to motor preparation. Open-skilled participants were proposed to have better motor-reprogramming processes in terms of smaller timing errors than closed-skilled participants (Nakamoto and Mori, 2012). Jin et al. (2011) also found that professional badminton players made more accurate judgments of the placement of badminton strokes and showed larger P3a amplitude in proactive anticipation than non-professional players, suggesting good anticipation ability. McRobert et al. (2011) also reported that open-skilled participants had higher online updating ability, which made them more adaptive to dynamic environments during anticipation. Bertollo et al. (2016) further explained that the higher efficiency in switching between automatic and controlled processes among sport experts than amateurs is a reason why the former adapt proficiently to the changing environment in open sports. When compared with amateur athletes, the high efficiency found among the open-skilled athletes was revealed to result in the employment of fewer cognitive resources in controlled processes (Babiloni et al., 2010).

In the present study, the open-skilled participants were badminton athletes, who are experienced in updating information on an opponent's changing kinematics and in overcoming interference from previous deceptive movement patterns (Müller and Abernethy, 2012). Their success in winning a game depends mostly on how well they update environmental changes and anticipate their opponent's actions (Bianco et al., 2017b). With such a background, the open-skilled participants tended to deploy fewer attentional resources when updating the new action rules in proactive control for the switch trials, which was not the case in the closed-skilled and control participants. The significant positive correlations between the amplitudes of cue-locked P3 and the switch cost of response time in our results were revealed only in the participants of the open-skilled group, which further substantiates the uniqueness of proactive control to the open-skilled participants.

No differences were revealed in the between-trial and between-group comparisons for the CNV. The non-significant between-group difference, which is supported by the findings of another study (Bianco et al., 2017b), suggests that the level of the top–down attentional control preparing for task switching was comparable across the three groups of participants. The absence of a difference between switch and repeat trials indicates that motor preparation levels were not affected by switch and repeat conditions, which is consistent with the findings in Gajewski and Falkenstein (2015). These results further support the notion that the CNV does not appear to be an important neural marker related to the executive functions in the proactive control process. The years of professional motor skill practices was a significant covariate for the CNV component, which suggests that participants with professional motor skill practices, regardless of the type of motor skills, had more negative-going CNV than those without professional motor skill practices. These findings are in line with those reported by Bianco et al. (2017a,b). The tonic activity in motor preparation was related to speed control in the pre-supplementary motor area (Bianco et al., 2017a). In the present study, the speed requirement in both open (badminton) and closed (most were runners) skills was high, which may have contributed to the lack of group differentiation in CNV amplitude.

#### Reactive Control and Motor Skill Experiences

The open- and closed-skilled participants showed less positive-going parietal stimulus-locked P3 in the switch than repeat trials in the non-predictive condition. Less positivity in the switch than repeat trials could not be found in the control group, who reported having no habit of practicing any type of physical activity. The results indicate that the participants in both the open- and closed-skilled groups deployed fewer attentional resources than those in the control group when updating the action rule in reactive control for switching. No significant differences were shown between the two motor-skilled groups, however. The comparable stimulus-locked P3 for the open- and closed-skilled groups was not in line with our hypothesis, but was partly consistent with the findings of Wang et al. (2017). Wang and colleagues found that open- and closed-skilled participants had comparable frontal N2 component and theta power for reactive control in a flanker task. The open-skilled participants, however, showed greater theta phase coherence (0–500 ms, 4 Hz; 300–400 ms, 5 Hz) for incongruent trials compared to congruent trials, but this effect could not be found in the closed-skilled group. These findings suggested that the dissociation between open- and closed-skilled participants appears to lie in the stability of the neural process rather than the level of allocated cognitive resources. Compared with closed motor skills, open motor skills require participants to respond within a limited time (Jin et al., 2011). Thus, the superior reactive control of open-skilled participants might be shown in a paradigm with a short interval between the response and the next target stimulus, such as the 500 ms in Tsai and Wang (2015), but not with a long interval between the response and the next target stimulus, such as the 2500 ms in the present study.

The behavioral results reported in Yu et al. (2017) indicated that open- and closed-skilled participants had significantly smaller switch cost of response times than controls. The ERP findings reported in this study reveal that both the open- and closed-skilled groups had higher efficiency in updating the new action rule in reactive control, reflected by less positive-going stimulus-locked P3. These findings are supported by those reported in previous studies (Kamijo and Takeda, 2010; Zhang et al., 2015). Kamijo and Takeda (2010) reported that participants with regular physical training showed smaller switch cost and less positive stimulus-locked P3 in the switch than repeat trials than sedentary controls, suggesting better reactive control in task switching. One plausible reason for the enhanced reactive control among the participants could be the inevitable gains in cardiorespiratory fitness due to the intensive training received by both the open- and closed-skilled groups (Scisco et al., 2008; Tsai and Wang, 2015). Zhang et al. (2015) indicated that experienced fencers deployed less cognitive effort in the reactive inhibition process (as reflected by the less positive-going P3 in the Nogo condition) than their novice counterparts, which is consistent with the results revealed in this study. Nevertheless, Yamashiro et al. (2015) revealed more positive somatosensory Nogo-P3 in the open-skilled participants than in the closed-skilled participants for reactive control in the Go/Nogo task. The inconsistent findings reported by Yamashiro et al. (2015) were likely due to a lack of control for the years of professional motor skill practices among the participants, which could confound the results in the reactive control condition. Another reason may be that the superior performance of the baseball group resulted from baseball-specific training, which could not be generalized to the closed-skilled participants.

Significant correlations were revealed between the amplitudes of stimulus-locked P3 and the switch cost of response time only in the control group. The results are unexpected, as significant correlations were anticipated among the open- and closed-skilled participants. A plausible explanation for the non-significant findings is heterogeneity in the strategies employed by the open- or closed-skilled participants. To further test this proposition, a median-split method (Tamura et al., 2010) was applied to subdivide the open- and closed-skilled groups into higher and lower ability subgroups based on the between-trial difference in the stimulus-locked P3 amplitudes (sP3S-sP3R). Significant correlations were found in the higher but not the lower open-skilled ability subgroups, whereas significant correlations, but opposite in direction, were found in both the higher and lower closed-skilled ability subgroups. The small sample size for each of the subgroups (n = 9) only allowed us to suspect within-group heterogeneity as a possible confounding factor in the non-significant relationships between the ERP and behavioral results. Future studies should explore the possible deployment of different strategies by participants in the same open- or closed-skilled group and the differences in the neural processes associated with proactive or reactive controls.
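
The median-split follow-up can be sketched as below. This is an illustrative NumPy sketch rather than the original analysis: the function names are hypothetical, and assigning values equal to the median to the lower subgroup is an assumption. With 18 participants in a group and distinct values, the split yields two subgroups of nine, matching the subgroup size given above.

```python
import numpy as np


def median_split(values):
    """Boolean mask that is True for values above the group median."""
    values = np.asarray(values, dtype=float)
    return values > np.median(values)


def subgroup_correlations(erp_diff, switch_cost):
    """Pearson r between ERP difference and switch cost in each subgroup.

    Returns (r_higher, r_lower), where the subgroups are formed by a
    median split on the ERP difference (sP3S-sP3R in the text).
    """
    erp_diff = np.asarray(erp_diff, dtype=float)
    switch_cost = np.asarray(switch_cost, dtype=float)
    higher = median_split(erp_diff)
    r_higher = np.corrcoef(erp_diff[higher], switch_cost[higher])[0, 1]
    r_lower = np.corrcoef(erp_diff[~higher], switch_cost[~higher])[0, 1]
    return r_higher, r_lower
```

Correlations of opposite sign in the two subgroups can cancel in the whole group, which is one way within-group heterogeneity could mask a relationship.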

Our finding of a more negative-going frontal N2 in the switch than repeat trials (in the 100% validity condition) at the Fz, Cz, and Pz electrodes is consistent with that reported by Hsieh and Wu (2011). The results suggested the possible involvement of response-set switching and suppression of conflict response processes unique to the behavioral task used in this study (Nakamoto and Mori, 2008; Wang et al., 2017), as no significant group difference was revealed in the N2 among the participant groups.

#### Limitations

First, the proactive and reactive control processes were prescribed by the switching task employed in this study. The results may not be directly generalized to other executive functions, such as inhibition or self-regulation. Second, the sample sizes of the open- and closed-skilled participants in this study were relatively small, which could have lowered the power of the analyses. Readers should be cautious when interpreting the results. Third, it is unclear whether badminton and track and field best represent open- and closed-skilled physical activities, respectively. Any generalization of the findings should be restricted to participants with the same type of physical activity and level of competence. Future studies may consider recruiting participants with other types of physical activity and levels of competence. Fourth, the participants in the control group were those who had not been engaged in professional or amateur sports. The levels of physical activity of these participants were not controlled. If the majority of the participants in the control group had been leading a sedentary lifestyle, the differences between the open/closed-skilled and control groups could have been confounded by differences in other parameters, such as the level of physical fitness, rather than the type of motor skill. Future studies should recruit, as the control group, individuals who have comparable levels of physical fitness and/or physical training, but not at the professional level. Fifth, the proposition of potential heterogeneity in the strategies taken by the open- and closed-skilled groups was based on small sample sizes and was made without triangulation. Future studies should employ a more stringent research design and a larger sample size to address this issue.

### CONCLUSION

This study explored how experiences in open and closed motor skills modulate individuals' proactive and reactive control processes. Compared with closed-skilled experiences, intensive open-skilled experiences were related to better proactive control for task switching, characterized by a lower switch cost and a significant difference between switch and repeat trials in cue-locked P3 amplitudes. The enhanced proactive control is likely the result of the high demand for anticipating environmental changes in open-skilled physical activities. Intensive open- and closed-skilled experiences were related to better reactive control for task switching than the experiences of control participants, which most likely resulted from higher cardiorespiratory fitness. Proactive and reactive control, as part of the process of motor-skill development, seem to be transferable to domain-general executive functions.

### DATA AVAILABILITY STATEMENT

The datasets generated for this study are available on request to the corresponding author.

### ETHICS STATEMENT

The studies involving human participants were reviewed and approved by The Research Grant Committee of the Hong Kong Polytechnic University. The patients/participants provided their written informed consent to participate in this study.

### AUTHOR CONTRIBUTIONS

AW and BL contributed to the data interpretation and article writing. BC contributed to the conceptualization, data interpretation, and article writing. CC contributed to the conceptualization, study design, data interpretation, and article writing. JP contributed to the data collection, data analysis, and data interpretation. QY contributed to conceptualization, study design, data collection, data analysis, data interpretation, and article writing.

### FUNDING

This work was supported by the General Research Fund of the Research Grant Council of Hong Kong (grant number 151044).

### ACKNOWLEDGMENTS

The authors thank the University Research Facility in Behavioral and Systems Neuroscience for its support in providing the equipment and research personnel.



**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Yu, Chau, Lam, Wong, Peng and Chan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.


# The Effect of Meditation on Comprehension of Statements About One-Self and Others: A Pilot ERP and Behavioral Study

Alexander Savostyanov<sup>1,2,3</sup>\*, Sergey Tamozhnikov<sup>2</sup>, Andrey Bocharov<sup>2,3</sup>, Alexander Saprygin<sup>2</sup>, Yuriy Matushkin<sup>1</sup>, Sergey Lashin<sup>1,3</sup>, Galina Kolpakova<sup>3</sup>, Klimenty Sudobin<sup>3</sup> and Gennady Knyazev<sup>2</sup>

<sup>1</sup>Laboratory of Psychological Genetics, Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences (SB RAS), Novosibirsk, Russia, <sup>2</sup>Laboratory Differential Psychophysiology, State-Research Institute of Physiology and Basic Medicine, Novosibirsk, Russia, <sup>3</sup>Laboratory of Biological Markers of Human Social Behavior, Humanitarian Institution at the Novosibirsk State University, Novosibirsk, Russia

#### Edited by:

Stephen Fairclough, Liverpool John Moores University, United Kingdom

#### Reviewed by:

Bahar Güntekin, Istanbul Medipol University, Turkey

Christopher George Burns, University of Warwick, United Kingdom

#### \*Correspondence:

Alexander Savostyanov alexander.savostyanov@gmail.com

#### Specialty section:

This article was submitted to Cognitive Neuroscience, a section of the journal Frontiers in Human Neuroscience

Received: 04 July 2019; Accepted: 28 November 2019; Published: 09 January 2020

#### Citation:

Savostyanov A, Tamozhnikov S, Bocharov A, Saprygin A, Matushkin Y, Lashin S, Kolpakova G, Sudobin K and Knyazev G (2020) The Effect of Meditation on Comprehension of Statements About One-Self and Others: A Pilot ERP and Behavioral Study. Front. Hum. Neurosci. 13:437. doi: 10.3389/fnhum.2019.00437

The main goal of this study was to examine the effect of long-term meditation practice on behavioral indicators and ERP peak characteristics during an error-recognition task, in which participants were presented with emotionally negative (evoking anxiety or aggression) written sentences describing self-related or non-self-related emotional states and personality traits. In total, 200 sentences written in Russian with varying emotional coloring were presented during the task, with half of the sentences containing a grammatical error that the participants were asked to identify. The EEG was recorded in age-matched control individuals (n = 17) and two groups of Samatha meditators with relatively short- (3–5 years' experience, n = 18) and long-term (10–30 years' experience, n = 18) practice experience. Task performance time (TPT) and accuracy of error detection (AED) were chosen as behavioral values. The amplitude, latency, and cortical distribution of the P300 and P600 ERP peaks were used as measures of speech-related brain activity. All statistical effects of meditation were estimated controlling for age and sex. No behavioral differences between the two groups of meditators were found. General TPT was shorter for both groups of meditators compared to the control group. Non-meditators reacted significantly slower to sentences about aggression than to sentences about anxiety or non-emotional sentences, whereas no significant difference was found in the meditator groups. Non-meditators had better AED for the sentences about one-self than for the sentences about other people, whereas the meditators did not show any significant difference.
The amplitude of the P300 peak in frontal and left temporal scalp regions was higher for long-term meditators in comparison with both the intermediate and control groups. The latency of P300 and P600 in left frontal and temporal regions correlated positively with TPT, whereas the amplitude of P300 in these regions correlated negatively with TPT. We demonstrate that long-term meditation practice increases the ability of an individual to process negative emotional stimuli. The difference in behavioral reactions to self-related and non-self-related negative information, which is typical for non-meditators, disappeared under the influence of meditation. The ERP results could be interpreted as an index of increased voluntary control over emotional state during meditation practice.

Keywords: Samatha meditation, assessment of self and other, recognition of written speech, EEG and ERP, task performance time and accuracy of error detection

### INTRODUCTION

Assessment of self and others (non-self) is one of the most important elements in the regulation of social behavior. One approach, attribution theory, was suggested by the social psychologist Heider (1958) to evaluate how people perceive their own behavior, as well as the behavior of others. According to this theory, an act of attribution is a prediction of ongoing behaviors, such as emotions and motives, for the self or any other person. The main research question in this area is whether the sense of self (our "I") has a special status in human consciousness or is functionally equivalent to the semantic processing of any other class of stimuli. Is our consciousness capable of paying more attention to information in which we are somehow involved? Symons and Johnson (1997) showed that information related to the self in any way is remembered better than impersonal information. This phenomenon became known as the "self-reference effect". Neuroscientific studies of the self-concept have mainly relied on experimental paradigms in which participants evaluated sentences that described some traits of the self. These studies revealed consistent activation in the medial prefrontal cortex (mPFC) during self-evaluation (see meta-analyses: Denny et al., 2012; Murray et al., 2012). Fossati et al. (2003) studied the influence of the emotional valence of stimuli in addition to the localization of self-related information processing. Activation in the right dorsomedial prefrontal cortex was found for self-referential stimuli, independent of the valence of the words. Differences in the processing of positive and negative adjectives were observed in this study in areas outside the medial prefrontal cortex. Other studies (Moran et al., 2006; Qin and Northoff, 2011) showed that, besides the mPFC, areas such as the cingulate cortex, precuneus, and temporal-parietal cortex are also involved in the processing of information about the self.

One of the experimental approaches to studying evaluation of self and non-self is a comparison of groups of people that vary significantly in behavioral values related to social perception. Such groups can include people of different age (Pfeifer et al., 2009), gender (Cross and Madson, 1997), or ethno-cultural traits (Markus and Kitayama, 1991; Cross et al., 2011) that affect self-evaluation. In addition, healthy subjects can be compared to psychiatric patients with impaired self-perception processes (Sass and Parnas, 2003; Nelson et al., 2012; Mishara et al., 2014). In this study, we aimed to examine people practicing Buddhist Samatha meditation through neuroscientific approaches.

Meditation is a religious practice of great interest to neuroscience, as it offers a model for studying changes in brain function that arise from the voluntary and conscious efforts of the subject. The effects of meditation on attention have previously been studied using the attention network task (ANT) and other experimental paradigms (Chiesa et al., 2011). Some studies report improvement in conflict monitoring (Wenk-Sormaz, 2005; Chan and Woollacott, 2007; Slagter et al., 2007; Tang et al., 2007; Moore and Malinowski, 2009). For example, several days (20 min each day) of integrative body-mind training led to improved conflict monitoring (Tang et al., 2007). Additionally, longitudinal studies of 3-month awareness meditation have shown a decrease in the attentional blink after training (Slagter et al., 2007; van Leeuwen et al., 2009). In cross-sectional studies, experienced meditators demonstrated the best results in conflict monitoring (van den Hurk et al., 2010).

According to Aftanas and Golosheykin (2005), Sahaja Yoga meditators showed smaller emotional arousal while watching aversive video clips, which was reflected in changes of spectral power in their EEG records. In addition, meditators showed larger power values in the theta-1 (4–6 Hz), theta-2 (6–8 Hz), and alpha-1 (8–10 Hz) frequency bands compared to control subjects in eyes-closed conditions without external stimulation. These results were interpreted as evidence that meditators are better able to moderate the intensity of emotional arousal.

Changes in the amplitude of the N1, P2, and P3 peaks in meditators compared to non-meditators have been revealed in many ERP studies of different kinds of meditation (Cahn and Polich, 2009; Atchley et al., 2016; Biedermann et al., 2016). P300 amplitude increased in meditators immediately after meditation practice compared to the pre-meditation condition (Telles et al., 2019). A study of Vipassana meditation practice found that the dynamics of P300, in conjunction with theta and alpha band spectral power indices, reflected differences between people with different durations of meditation experience (Kakumanu et al., 2019). The amplitude of frontal P300 reflects the degree of concentration of voluntary attention. Therefore, most researchers interpret the influence of meditation on P300 amplitude as an index of the meditators' increased ability to voluntarily concentrate attention on themselves and/or external events.

Several functional and structural MRI studies of awareness meditation have focused on neuroplasticity in areas responsible for attention control. The effects of awareness practice on attention are mostly related to the anterior cingulate (van Veen and Carter, 2002; Cahn and Polich, 2006; Hölzel et al., 2007; Tang et al., 2010, 2012a,b, 2013; MacCoon et al., 2014; Tang and Posner, 2014). The anterior cingulate is responsible for executive attention and control through detecting conflicts caused by incompatible information processing flows (van Veen and Carter, 2002; Posner et al., 2007; Tang and Tang, 2014). The anterior cingulate cortex and the frontal cortex form part of a network which, owing to effective long-distance connections with other areas of the brain, facilitates cognitive processing (Sridharan et al., 2008; Tang et al., 2012b). Compared to the control group, experienced practitioners have shown increased activation of areas of the anterior cingulate during meditation (Hölzel et al., 2007) or during conscious expectation of a pain stimulus (Gard et al., 2012). In a controlled randomized longitudinal study, high activation of the ventral and/or rostral anterior cingulate cortex has also been identified during the rest period following 5 days of integrative body-mind training (MacCoon et al., 2014). Activation of the anterior cingulate cortex may increase during the initial stages of practice and decrease with prolonged training (Brefczynski-Lewis et al., 2007).

Other brain areas related to attention, in which functional changes due to awareness meditation were witnessed, include the dorsolateral prefrontal cortex (dlPFC), where an increased response during executive processing was observed (Allen et al., 2012), and the posterior parietal cortex (PPC), which had shown higher activation after the MBSR course in subjects with social anxiety (Goldin and Gross, 2010).

Thus, the areas of the brain whose activity changes under the influence of meditation and the areas involved in self-assessment and the evaluation of other people overlap to a significant degree. According to the participants' self-reports, Samatha is a religious practice whose end goal is the "dissolution" of the self in the Universe. Adherents of this form of meditation consider self-awareness an illusion that must be discarded. We proposed that long practice of Samatha can change the processes of evaluation of one-self and/or of other people, and that this change can be observed through analysis of brain electrical activity. As the method of our study, we chose event-related potentials (ERP) occurring during the recognition of written sentences that include emotionally negative descriptions of one's own conditions or of non-self traits and characteristics. According to our hypothesis, in such a paradigm non-meditators should show differences in responses to messages about self and non-self, whereas the consequence of meditation should be the disappearance of such differences.

### MATERIALS AND METHODS

#### Participants

The study was conducted at the Baikal Meditational Center<sup>1</sup>. The experimental sample included 53 healthy right-handed participants aged 25 to 66 years (32 men; mean age = 41.0, SD = 8.3). The participants were divided into three groups: (1) non-meditators: 17 persons (10 men, mean age = 40.5, SD = 8.5) who declined to take part in meditation practice; (2) intermediate group: 18 persons (11 men, mean age = 40.3, SD = 8.0) with 3–5 years of meditation experience; and (3) long-term meditators: 18 persons (11 men, mean age = 42.7, SD = 9.3) who had long-term (more than 10 years) experience of meditation practice. All participants in the long-term group have the status of great masters of meditation, recognized in the Buddhist community. After completion of our study, they took part in "a retreat," i.e., semi-annual continuous meditation. The groups of participants were balanced in age and sex. No participant had a history of neurological, psychiatric, or major somatic disorders. According to self-report, they did not use narcotic drugs or other psychoactive substances. All participants were native Russian speakers or natural bilinguals (Russian and one of the Siberian languages) and had normal or corrected-to-normal visual acuity. All participants (including non-meditators) are members of the lamaistic Buddhist community of Russia.

After the experiment, all participants completed a Russian version of Goldberg's IPIP Big-Five Factor Markers (validated by Knyazev et al., 2010) for estimation of their psychological traits. In addition, a set of psychological questionnaires was used to estimate differences between participants in emotional intelligence scores, trait anxiety level, vulnerability to depression and anxiety disorder, etc.

All subject protection guidelines and regulations were followed in accordance with the Declaration of Helsinki. The study goals were explained to all participants and they signed the informed consent. The study and the consent form were approved by the Institute of Physiology and Basic Medicine ethics committee and by the spiritual leader of the lamaistic Buddhism community of Russia.

#### Experimental Procedure

Two hundred sentences in Russian were selected for the experiment (see **Table 1**). Half of the sentences (100) contained a grammatical error. In preliminary testing on another group, the grammatically wrong sentences were selected so that they were easily recognized (more than 80% accuracy) by all participants. The text stimuli were presented in black and white (Arial, 36 pt) on a 24.4 cm × 18.3 cm monitor located 60 cm in front of the participant. A warning signal (cross) appeared at the center of the screen for 0.5 s before the task onset. The participants were instructed to indicate whether a presented sentence contained an error by pressing one of two buttons with their dominant hand. Participants had three practice trials before task execution. The time for decision making was not limited, but participants were instructed to perform the task as quickly as possible and with maximal accuracy.

The experiment also contained a hidden condition: the sentences were grouped by emotional coloring, of which participants were not informed. There were five groups of sentences: (1) sentences describing aggression of a participant; (2) sentences describing aggression of other people; (3) sentences describing anxiety of a participant; (4) sentences describing anxiety of other people; and (5) neutral sentences about inanimate objects. The sentences from different categories were presented in random order. The sentences about anxiety were selected from the Russian version of Spielberger's State-Trait Anxiety Inventory (Spielberger et al., 1970; translated to Russian and validated by Hanin, 1976). The sentences about aggression were taken from the Buss-Perry aggression questionnaire (Buss and Perry, 1992; translated to Russian and validated by Yenikolopov and Tsibulsky, 2007). All sentences about one-self were taken from validated Russian versions of psychological tests. Translations of these tests into Russian were performed by professional translators. All questionnaires were repeatedly validated in different samples in Russia and in other countries where Russian is widely spoken. The sentences about non-self anxiety or aggression were created by replacing the pronoun "I" or "Me" with a randomly chosen pronoun: "He," "She," "They," "Him," "Her," or "Them." The sentences about objects, anxiety, and aggression were balanced in number of words and grammatical structure. The sentences about self and non-self differed only in pronouns and the connected verbs. The order of sentences was randomized across participants.

<sup>1</sup>http://www.geshe.ru/

#### EEG Recording

EEGs were recorded using 130 channels (128 EEG, VEOG, ECG) via Ag/AgCl electrodes. The EEG electrodes were placed on 128 head sites according to the extended International 5–10% system and referenced to Cz with ground at FzA. The Quik-Cap128 NSL was used for electrode fixation. The electrode resistance was maintained below 5 kΩ. The signals were amplified using an NVX 136 amplifier (MCS, Russia) with a 0.1–100 Hz analog bandpass and continuously digitized at 1,000 Hz.

#### Behavioral Data Processing

The task performance time (TPT, in milliseconds) and accuracy of error detection (AED, the percentage of correctly recognized errors in wrong sentences and the percentage of sentences without an error correctly marked as "right") were used as the two behavioral values.
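As an illustration of the two behavioral values defined above, the following minimal sketch computes mean TPT and the two AED percentages from a list of trials. It is not the authors' processing code: the function names, the trial dictionary keys (`rt_ms`, `has_error`, `said_error`), and the example values are hypothetical.

```python
from statistics import mean


def task_performance_time(trials):
    """Mean task performance time (TPT) in milliseconds."""
    return mean(t["rt_ms"] for t in trials)


def accuracy_of_error_detection(trials):
    """AED as two percentages: one for sentences with an error ("wrong")
    and one for sentences without an error ("right").

    Assumes at least one trial of each kind is present.
    """
    hits = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for t in trials:
        totals[t["has_error"]] += 1
        if t["said_error"] == t["has_error"]:
            hits[t["has_error"]] += 1
    return {
        "wrong_sentences": 100.0 * hits[True] / totals[True],
        "right_sentences": 100.0 * hits[False] / totals[False],
    }


# Hypothetical trials, for illustration only.
example_trials = [
    {"rt_ms": 3000, "has_error": True, "said_error": True},
    {"rt_ms": 3400, "has_error": True, "said_error": False},
    {"rt_ms": 4000, "has_error": False, "said_error": False},
    {"rt_ms": 4200, "has_error": False, "said_error": False},
]
tpt = task_performance_time(example_trials)
aed = accuracy_of_error_detection(example_trials)
```

In practice these values would be computed per participant and per sentence category before entering the statistical analysis.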

Initially, these values were entered into a repeated measures ANOVA with the Greenhouse-Geisser correction to test the main effects of the factors "group" (non-meditators vs. meditators), "correctness" (right or wrong sentences), "sentence category" (five levels for the different categories of sentences), "me\_other" (sentences about the participant or about other people; results for neutral sentences were excluded from this analysis), "aggression\_anxiety" (sentences about aggression vs. about anxiety; results for neutral sentences were excluded from this analysis), age, sex, and the interactions between these factors. However, no significant differences in the behavioral values were revealed between the participants of the intermediate and long-term experience groups. Both groups of meditators showed the same differences from the non-meditators. For this reason, behavioral values were compared between the combined group of meditators and the control group of non-meditators. In addition, no inter-group differences in TPT and AED were found for neutral sentences. Therefore, behavioral values for neutral sentences were excluded from the statistical analysis. Finally, for the statistical comparison of TPT and AED values, a repeated measures ANOVA with the Greenhouse-Geisser correction was used with the factors "group" (non-meditators vs. meditators of both groups), "correctness," "me\_other," "aggression\_anxiety," age, sex, and the interactions between these factors. All statistical effects of meditation were estimated controlling for age and sex.

#### EEG Pre-processing and ERP Analysis

Recordings were processed in the EEGLAB toolbox (Delorme and Makeig, 2004). Trials containing artifacts were rejected from the analysis. For each subject, 180–195 trials with sentence onset were used. The time intervals from −1.5 s to +3.0 s relative to the fixation cross onset were analyzed. The interval from −1.5 to −0.75 s relative to the fixation cross onset was used for baseline correction.
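The epoching and baseline-correction step can be sketched in plain Python for a single channel. This is an illustrative sketch only, not the EEGLAB procedure itself: the function name `epoch_and_baseline` and its parameters are hypothetical, with defaults taken from the sampling rate and time intervals reported in the text.

```python
def epoch_and_baseline(signal, onset_idx, srate=1000,
                       tmin=-1.5, tmax=3.0, base=(-1.5, -0.75)):
    """Cut one epoch around `onset_idx` and subtract the baseline mean.

    signal    : list of samples from one channel
    onset_idx : sample index of the fixation cross onset
    srate     : sampling rate in Hz
    tmin/tmax : epoch limits in seconds relative to onset
    base      : baseline interval in seconds relative to onset
    """
    start = onset_idx + round(tmin * srate)
    stop = onset_idx + round(tmax * srate)
    if start < 0 or stop > len(signal):
        raise ValueError("epoch extends beyond the recording")
    epoch = signal[start:stop]
    # Convert the baseline interval to sample indices within the epoch.
    b0 = round((base[0] - tmin) * srate)
    b1 = round((base[1] - tmin) * srate)
    baseline = sum(epoch[b0:b1]) / (b1 - b0)
    return [x - baseline for x in epoch]


# Toy signal at 4 Hz: 1.5 s at 2.0 followed by 3.0 s at 5.0.
toy = [2.0] * 6 + [5.0] * 12
corrected = epoch_and_baseline(toy, onset_idx=6, srate=4)
```

After correction, the baseline interval has a mean of zero, so post-onset amplitudes are expressed relative to the pre-stimulus level.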

EEGs were first band-pass filtered at 1–40 Hz using elliptic filters. Following the suggestion by Delorme and Makeig (2004), re-referencing (average reference) and baseline adjustment procedures were performed during data pre-processing. Independent component analysis (ICA) was used for correction of eye-movement and eye-blink artifacts. First, the component weights were computed individually for each record. Then, components corresponding to ocular artifacts were identified by visual inspection of the component sets together with the VEOG and ECG records. Artifactual components were removed during the pre-processing of the EEGs.
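The average re-reference mentioned above amounts to subtracting, at each time point, the mean across all channels. The sketch below illustrates only this arithmetic; the function name `average_reference` and the toy data are hypothetical, and details such as the handling of the original reference channel are left to the toolbox and are not modeled here.

```python
def average_reference(data):
    """Re-reference `data` to the average of all channels.

    data : list of channels, each a list of samples of equal length
    """
    n_channels = len(data)
    n_samples = len(data[0])
    # Mean across channels at each time point.
    means = [sum(ch[i] for ch in data) / n_channels for i in range(n_samples)]
    return [[ch[i] - means[i] for i in range(n_samples)] for ch in data]


# Two hypothetical channels with two samples each.
rereferenced = average_reference([[1.0, 2.0], [3.0, 6.0]])
```

After this step the channels sum to zero at every sample, which is the defining property of an average reference.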

To assess changes in signal amplitude associated with sentence onset, event-related potentials (ERPs) were calculated in the ERPLAB toolbox<sup>2</sup>. After removing artifacts, we computed ERP values using the ERPLAB toolbox separately for every EEG channel, subject, and experimental condition, and a filter with a 12 Hz cutoff was applied to them. After that, maximal peak amplitudes, mean peak amplitudes, and peak latencies in the 150–500 ms (which corresponded to the location of the P300 peak in the visual peak analysis) and 600–1,000 ms (i.e., the P600 peak) time ranges were averaged across nine scalp regions of interest (ROI: left frontal, medial frontal, right frontal, left temporal, central, right temporal, and left, medial, and right parietal-occipital scalp regions) for each subject. These values were entered into a repeated measures ANOVA with the Greenhouse-Geisser correction to test the main effects of the factors "region of interest," "group," "correctness," "sentence category," age, sex, and the interactions between these factors.

<sup>2</sup>https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3995046/
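The peak measures described above (maximal amplitude, mean amplitude, and peak latency within a time window, then averaging over the channels of a region of interest) can be sketched as follows. This is a minimal illustration, not the ERPLAB measurement routine: the function names, the channel labels, and the waveform values are hypothetical.

```python
def peak_measures(erp, times_ms, window):
    """Return (max amplitude, latency of the maximum in ms, mean amplitude)
    for the samples of `erp` whose time stamps fall inside `window`."""
    lo, hi = window
    idx = [i for i, t in enumerate(times_ms) if lo <= t <= hi]
    if not idx:
        raise ValueError("no samples inside the window")
    peak_i = max(idx, key=lambda i: erp[i])
    mean_amp = sum(erp[i] for i in idx) / len(idx)
    return erp[peak_i], times_ms[peak_i], mean_amp


def roi_average(value_by_channel, roi_channels):
    """Average a per-channel measure over the channels of one ROI."""
    return sum(value_by_channel[ch] for ch in roi_channels) / len(roi_channels)


# Hypothetical waveform sampled every 100 ms, for illustration only.
times = list(range(0, 1100, 100))
wave = [0, 1, 3, 5, 2, 1, 0, 2, 4, 1, 0]
p300 = peak_measures(wave, times, (150, 500))
p600 = peak_measures(wave, times, (600, 1000))

# Hypothetical per-channel P300 maxima for a two-channel ROI.
left_frontal = roi_average({"F3": 4.0, "F7": 6.0}, ["F3", "F7"])
```

Taking the maximum is appropriate here because both P300 and P600 are positive-going peaks; a negative component would require the minimum instead.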

### RESULTS

#### Behavioral Results

No statistically significant inter-group differences in Big-Five factor markers were found on any scale of Goldberg's inventory. The main effects of the factors "age" and "sex" and their interactions were not statistically significant for either TPT (for age p = 0.95, for sex p = 0.51) or AED (for age p = 0.33, for sex p = 0.12).

As already mentioned, no significant differences in the behavioral values were revealed between the participants of the intermediate and long-term experience groups. The main effect of the "correctness" factor, controlling for age and sex, was highly significant both for TPT, F(1,51) = 23.78; p < 0.0001, and for AED, F(1,51) = 15.65; p < 0.0001. The correct sentences were recognized more slowly (correct sentences: mean TPT 4.0 ± 0.2 s; wrong sentences: 3.4 ± 0.2 s), but with better accuracy (correct sentences: mean accuracy 96.5 ± 1.3%; wrong sentences: 90.6 ± 0.9%), than the sentences with an error. However, the interactions between the factors "correctness," "group," "age," and "sex" were not significant for either value (p > 0.5).

For TPT, the main effect of the "group" factor was marginally significant, F(1,51) = 2.95; p = 0.052. Meditators solved the task faster (mean time 3.3 ± 0.2 s) than non-meditators (4.0 ± 0.2 s). In addition, for TPT, the main effect of the "me/other" factor was highly significant, F(1,51) = 22.69; p < 0.0001. Sentences about one-self were recognized faster (mean time 3.5 ± 0.2 s) than sentences about others (3.9 ± 0.2 s). The interaction between the "me/other" and "group" factors for TPT was not statistically significant (p > 0.3). Additionally, a highly significant main effect of the "anxiety/aggression" factor, F(1,51) = 51.24; p < 0.0001, and an interaction between the "anxiety/aggression" and "group" factors, F(1,51) = 5.86; p = 0.019, were revealed for TPT. Across all groups of participants, recognition time for sentences about anxiety (mean 3.4 ± 0.2 s) was shorter than for sentences about aggression (4.0 ± 0.2 s). However, meditators had smaller differences in recognition time between sentences about anxiety and aggression (anxiety: 3.4 ± 0.2 s; aggression: 3.6 ± 0.2 s) than non-meditators (3.7 ± 0.3 s and 4.4 ± 0.3 s, respectively; **Figure 1**).

The main effect of the "group" factor for AED was not significant (p > 0.4). For AED, a statistically significant main effect of the "me/other" factor, F(1,51) = 7.46; p = 0.009, and an interaction between the "me/other" and "group" factors, F(1,51) = 9.11; p = 0.004, were revealed. Averaged across all groups, AED values were higher for sentences about one-self (94.3 ± 1.0%) than for sentences about others (92.8 ± 0.8%). However, this difference was present only in the non-meditator group (one-self: 94.5 ± 1.7%; others: 91.3 ± 1.3%), while meditators showed no such difference (one-self: 94.1 ± 1.1%; others: 94.3 ± 0.8%; **Figure 2**). The main effect of the "anxiety/aggression" factor (p > 0.9) and the interaction between "anxiety/aggression" and "group" (p > 0.8) were not significant. Interactions between "me/other," "anxiety/aggression," "age," and "sex" were also not significant.

FIGURE 1 | Differences in time of task performance for the sentences about anxiety (light gray) and aggression (dark gray) for both groups of meditators (left side) and non-meditators (right side).

#### ERP Results

Time-amplitude plots of ERP in different cortical areas among the different groups of participants are presented in **Figure 3**; cortical topography is presented in **Figure 4**. P300 and P600 peaks can be identified by visual inspection of these plots. The P300 peak is clearly noticeable in the left temporal and all frontal cortical regions, with an amplitude maximum about 300 ms after sentence onset. A negative peak is detected in occipital-parietal cortical regions, with an amplitude maximum about 350 ms after sentence onset. The P300 has the highest amplitude in the group of long-term meditators in comparison with the other groups. The P600 peak is noticeable only in left temporal regions, i.e., in Broca's and Wernicke's areas. The amplitude of P600 is maximal in short-term meditators, but this difference was not significant. The topographic distribution of the P300 and P600 peaks over cortical regions did not differ substantially among the groups of participants. Overall, the general time-amplitude parameters and cortical distribution of the P300 and P600 peaks were consistent with standard patterns for language recognition tasks.

FIGURE 2 | Differences in accuracy of error detection for the sentences about one-self (light gray) and others (dark gray) for both groups of meditators (left side) and non-meditators (right side).

In all subject groups, the amplitudes of the P300 and P600 peaks were significantly higher for sentences about aggression than for sentences about anxiety and inanimate objects. The amplitudes of these peaks were also higher for incorrect sentences; however, no significant interactions between "group" and "category" or "correctness" were revealed. In addition, there were no differences between ERP values for sentences about self and non-self. Therefore, we averaged ERP values across all sentence categories.

For the majority of the measured ERP values (i.e., latency of P300 and P600, averaged and maximal amplitude of P600, and averaged amplitude of P300), no inter-group differences were revealed. Significant inter-group differences were revealed for the maximal amplitude of the P300 peak. The main effect of age was significant, F(1,46) = 6.16; p = 0.017, while the main effect of sex was not significant (p > 0.7). The statistical significance of the effects of meditation on P300 amplitude was higher when controlling for age and sex than without such control. The main effect of the "group" factor for P300 values was not significant (p > 0.2). However, the interaction between the factors "region" and "group," compared across the three groups of participants, was statistically significant, F(16,368) = 1.83; p = 0.026. Using one-way ANOVA, we compared all pairwise combinations of the three groups (i.e., control vs. intermediate; control vs. long-term meditators; intermediate vs. long-term). The comparison was performed separately for each cortical region, controlling for age and sex. One-way ANOVA revealed significant differences between the long-term meditators and the other participants in left frontal, medial frontal, right frontal, and left temporal cortical regions. In all of these regions, long-term meditators showed a higher P300 peak amplitude in comparison with non-meditators and people with short-term experience of meditation (see **Table 2**). These effects were significant both for the comparison between control participants and long-term meditators and for the comparison between short-term and long-term meditators. Significant differences between the control group and short-term meditators were revealed in right frontal and right temporal cortical regions. The amplitude of P300 in these regions was higher in control participants than in short-term meditators.

#### Correlations Between Behavioral and ERP Values

Significantly different values are marked by a bold font.

Two-tailed Pearson's correlation coefficients were calculated between the values of TPT and AED, averaged across all categories of sentences, and the values of maximal amplitude and latency of the P300 and P600 peaks, separately for each cortical ROI. The maximal amplitude of P300 correlated significantly and negatively with TPT in the left frontal (r = −0.34; p = 0.014) and left temporal (r = −0.31; p = 0.028) cortical regions. The latency of P300 correlated significantly and positively with TPT in the left frontal (r = 0.34; p = 0.014) and medial frontal (r = 0.31; p = 0.029) cortical regions. The latency of P600 showed a highly significant positive correlation with TPT in the left frontal (r = 0.53; p < 0.0001) and left temporal (r = 0.56; p < 0.0001) regions, and a negative correlation with the accuracy of error recognition in the left temporal area (r = −0.30; p = 0.035). Separate statistical analyses did not reveal any differences in the correlations between behavioral and ERP values among the groups of participants or the categories of sentences.
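
As an illustration of the correlation analysis described above, the following minimal sketch computes Pearson's r and the t statistic on which its two-tailed test is based. The latency and TPT values and the helper names `pearson_r` and `t_from_r` are hypothetical; they are not taken from the study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal length with at least 3 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic (df = n - 2) used to test r against zero."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical values: P300 latency (ms) and TPT (s) for five participants.
latency_ms = [310.0, 320.0, 330.0, 340.0, 350.0]
tpt_s = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(latency_ms, tpt_s)
t = t_from_r(r, len(latency_ms))
print(f"r = {r:.2f}, t({len(latency_ms) - 2}) = {t:.2f}")
```

The two-tailed p value is obtained by comparing |t| with Student's t distribution with n − 2 degrees of freedom. That step is omitted here because it needs a statistical library or a numerical approximation of the distribution.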

#### Results Summary

The short-term and long-term groups of meditators did not differ statistically in their behavioral values; however, both differed significantly from the control group. Non-meditators showed longer TPT for sentences about aggression than for sentences about anxiety, while in both groups of meditators this difference was reduced owing to faster recognition of sentences containing aggression. Additionally, non-meditators performed better on sentences about self than on sentences about non-self, while meditators showed no such difference owing to better performance on sentences about non-self. In ERP values, long-term meditators showed higher P300 peak amplitudes in frontal and left temporal cortical regions than the other groups, while differences between non-meditators and short-term meditators were found in the right frontal and right temporal regions. For all groups and all sentence categories, the peak latencies of both P300 and P600 correlated positively with TPT. P300 peak amplitude in the left frontal and left temporal regions correlated negatively with TPT.

# DISCUSSION

The aim of our study was to investigate how meditation changes the perception of information connected with a negative assessment of oneself or other people. A task reflecting implicit perception of the emotional coloring of written speech was used as the experimental method. In our experiment, the voluntary attention of a participant was concentrated on detecting a grammatical error in the presented sentence, and the task was simple for the participants (mean AED was more than 90% for all participants). However, we compared the speed and accuracy of the answers for categories of sentences that differed in their emotional coloring (neutral, anxious and aggressive) and in whether the sentence related to the participants themselves or to other people. In this study, we thus analyzed differences in behavior that participants did not control voluntarily and did not concentrate their attention on. It is possible that the recorded differences in these measurements between meditators and non-meditators reflect long-term changes in behavior, which may also appear in other test tasks and in ordinary life.

We have found two significant effects of meditation on the behavioral values and one effect of meditation on the ERP values. The behavioral differences were revealed in both groups of meditators (i.e., both for people with relatively shorter and longer experience of such practice), whereas the ERP differences distinguished the group with long-term experience both from non-meditators and meditators with shorter experience. A slight difference in the P300 amplitude in the right frontal and right temporal areas was also found between the control participants and meditators with shorter experience.

Significant differences in the time and accuracy of task performance between anxiety- and aggression-colored sentences, as well as between sentences about self and non-self, were observed in non-meditators as compared with meditators (the values for neutral sentences did not significantly differ from the values for sentences about self-related anxiety). Non-meditators recognized the sentences about anxiety more quickly (by approximately 0.7 s) than the sentences about aggression. The quality of recognition of sentences about anxiety and aggression did not differ in non-meditators. Non-meditators also recognized the sentences about themselves more quickly (by 0.4 s) and more accurately (by about 3%) than the sentences about others. Importantly, the sentences about self and about others differed only in a pronoun and the associated verb, and contained identical grammatical errors. The order of presentation of these sentences was randomized between participants. Thus, the differences in behavioral values for these sentences can be attributed neither to their grammatical properties nor to the presentation order; they reflect personal features of the perception of negative information about self and others.

It is also important to note that all statistically significant effects of meditation were calculated controlling for participants' age and sex. Thus, the behavioral and ERP differences between groups cannot be explained as an incidental result of age and sex differences between the groups of participants. We found a general correlation between TPT and AED, which also correlated with P300 amplitudes for all groups, as noted in the results on correlations between different indicators. However, although long-experienced meditators showed a general decrease in TPT compared with the other groups, the main effect of the ''group'' factor for AED was not significant. Therefore, the effect of meditation on AED cannot be explained by the decrease in reaction time. In addition, the most important difference between meditators and non-meditators was not in overall accuracy or task response time, but in the difference in AED between self-related and non-self-related sentences. This effect did not depend on the task response time. Moreover, the statistical significance of this effect increased when controlling for age and sex. Overall, the most significant behavioral effect of meditation depended neither on the general response time, nor on the general quality of task performance, nor on the sex or age of the participants.

It has been shown that the recognition of grammatical errors in sentences depends on their emotional coloring. Sentences with emotionally negative signals usually induce a slower error recognition response, with worse accuracy scores when compared to emotionally positive or neutral sentences (Ayusheeva et al., 2018). In our results, it can be observed that in non-meditators the greatest delay and decline in quality of task performance happens in sentences containing aggression towards others. It could be interpreted as an indicator that the description of aggressive behavior of other people causes the most negative emotions, which is shown in behavioral data of a linguistic task performance.

The effect of meditation consists in a general acceleration of linguistic task performance for all categories of emotional (but not neutral) sentences and in a reduction of the differences in TPT between anxiety- and aggression-related sentences. This is consistent with the assumption that meditation improves voluntary control over one's own emotions (Aftanas and Golosheykin, 2005; Marchand, 2012; Kasala et al., 2014). Our result could be interpreted as an indication that meditators perceive sentences about aggression less negatively, which improves their control over the execution of a grammatical task.

Other behavioral effects of meditation are connected with the equalization of AED for sentences about self and non-self. This effect arises from an improvement (in comparison with non-meditators) in accuracy for sentences about non-self, with accuracy for sentences about self maintained. The differences in TPT between sentences about oneself and others remain the same in meditators as in non-meditators. These results could hypothetically be interpreted on the basis of data about cross-cultural differences in the neuronal processes underlying the assessment of self and non-self. It is known that in people from individualistic (so-called ''western'') cultures, thinking about self is associated with significantly stronger activation of the medial prefrontal cortex than thinking about non-self, including relatives (Kelley et al., 2002). In contrast, in people from collectivistic (''oriental'') cultures, thinking about self causes the same level of activation of the prefrontal cortex as thinking about relatives (Zhu et al., 2007; Wang et al., 2012). According to the self-reports of our participants, the purpose of Buddhist meditational practices is subjectively understood as ''a dissolution of consciousness in the Universe.'' As meditators aspire to reach this goal, the differences between themselves and others are perceived as an illusion that needs to be eliminated. It is possible to speculate that the lack of differences in the accuracy of recognition of sentences about self and others in meditators reflects the effect of meditation on their consciousness. On the other hand, all of our participants (including non-meditators) consider the Buddhist community of Russia and Kazakhstan to belong to oriental, rather than western, culture. Moreover, the differences in TPT for sentences about self and non-self were still present in the meditator groups. Another possible explanation for the effect of meditation is a general decrease in negative emotions, which improves attention to those sentences that cause the most negative reaction in non-meditators. Such an assumption agrees with the results obtained when comparing reactions to anxiety- and aggression-related sentences.

Amplitude of the frontal P300 peak is one of the most frequently noted neurophysiological markers of meditation (Cahn and Polich, 2009; Atchley et al., 2016; Telles et al., 2019). A study of Vipassana meditation revealed that the dynamics of P300, together with theta and alpha spectral power, differ in people with various durations of meditation practice (Kakumanu et al., 2019). Our ERP results showed that the brain activity of meditators with long-term experience differed both from that of non-meditators and from that of the intermediate group of meditators. In long-term meditators, the amplitude of the P300 peak in all frontal regions and in the left temporal cortical region was increased in comparison with the other groups. At the same time, in all groups of participants, a higher amplitude of P300 in the left frontal and left temporal regions correlated with faster task solution for all categories of sentences. Frontal amplitude of the P300 peak is a well-known correlate of voluntarily controlled attention (Heinze et al., 1999). Our findings could be interpreted as reflecting a general improvement in the ability to concentrate attention as a result of long-term meditation. It is known that the right fronto-temporal areas are activated during the processing of unconscious negative emotion (Satto and Aoki, 2006). Therefore, the decrease in the P300 amplitude in these areas in meditators with shorter experience can hypothetically be interpreted as an indicator of a general decrease in sensitivity to negative emotions.

The differences revealed in behavior may have several interpretations. They may be a direct effect of meditation practice on self-assessment behavior. Another possible reason for the behavioral effects is differences in the personality traits of the participants. In our study, intergroup differences in the Big-Five factor markers were not identified. However, differences in some psychological properties not covered by Goldberg's inventory could have existed between non-meditators and meditators before the latter began to practice meditation. These properties may influence the recognition of self-related sentences.

It should be noted that the differences in brain activity between the long-term meditators and the other groups have no direct relation to the differences in the behavioral values. First, the behavioral differences were also found in the intermediate meditators, whereas the ERP differences were detected only in long-term meditators. Second, the ERP differences among the experimental groups did not concern the differences in reactions to sentences about anxiety and aggression or about oneself and others; they were found for the reactions to all sentences, including the neutral ones. Accordingly, in the present study we did not find indicators of brain activity connected with the changes in the perception of oneself and others observed in the behavioral assessment. Rather, our ERP findings reflect a long-term effect of meditation that is manifested in the perception of any speech-related information, including unemotional information about inanimate objects. A more detailed comparison of the various behavioral and neurophysiological effects of meditation is a topic for our future research.

### LIMITATIONS

1. Although the groups of participants were balanced by sex and age to the best of our ability, there was significant age variability within all groups. The age range was 20 to 66 years, which reflects the age heterogeneity of the meditator groups. In addition, the samples were relatively small. Our hypotheses about the effects of meditation on behavior and brain activity should be tested in larger groups.


## CONCLUSION

Meditative practice changes the perception of the emotional coloring of written speech. Meditators show increased behavioral control over the recognition of emotionally negative sentences about aggression, which is reflected in faster performance of grammatical tasks. Meditation also changes the perception of information about oneself and others: the differences in the recognition of sentences about oneself and others that are characteristic of non-meditators were not observed in meditators. The effect of long-term meditation on brain activity is an increase in the amplitude of the P300 peak of the ERP in frontal and left temporal cortical areas, which correlates with a reduction in TPT. This study also shows that the ERP effects of meditation are not specific to the emotional category of a sentence and are not directly connected with the processes of assessing oneself or others.

# DATA AVAILABILITY STATEMENT

All datasets generated for this study are included in the article.

# ETHICS STATEMENT

The study was approved by the Institute of Physiology and Basic Medicine ethics committee. All applicable subject protection guidelines and regulations were followed in the conduct of the research in accordance with the Declaration of Helsinki. The study aim was explained to all participants and they signed the informed consent.

## AUTHOR CONTRIBUTIONS

ASav designed the study; organized scientific expedition to Baikal, participated in EEG data collection, undertook the statistical analysis, and wrote the first draft of the manuscript. ST and KS participated in EEG and behavioral data collection and analysis, and gave feedback to participants. YM is responsible for communication with the participants and leaders of the lamaistic Buddhism community of Russia, organization of scientific expedition, and control of ethical aspects of the study. ASap, SL, GKo and GKn participated in EEG and behavioral data analysis. All authors have read and approved the final manuscript.

### FUNDING

The part of this work related to data collection and processing was supported by the Russian Foundation of Basic Research [RFBR; Российский Фонд Фундаментальных Исследований (РФФИ)] under Grant No. 18-29-13027. Elaboration of the experimental model was supported by the Russian Science Foundation (RSF) under Grant No. 17-18-01019. The publication of this article was supported by the project ''Investigation, analysis and complex independent expertise of projects of the National technological initiatives, including the accompanying of projects of 'road map' 'NeuroNet','' which is executed within the framework of the state assignment No. 28.12487.2018/12.1 of the Ministry of Science and Higher Education of the Russian Federation.

among Yakuts and Russians during the recognition of emotionally colored verbal stimuli,'' in Proceedings of the 11th International Conference Bioinformatics of Genome Regulation and Structure \Systems Biology, BGRS\SB 2018 124–126 (Novosibirsk, Russia: IEEE). doi: 10.1109/CSGB.2018.8544755


**Conflict of Interest**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Savostyanov, Tamozhnikov, Bocharov, Saprygin, Matushkin, Lashin, Kolpakova, Sudobin and Knyazev. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Cognitive Fitness Framework: Towards Assessing, Training and Augmenting Individual-Difference Factors Underpinning High-Performance Cognition

#### Eugene Aidman<sup>1,2,3</sup>\*

<sup>1</sup>Land Division, Defence Science & Technology Group, Edinburgh, SA, Australia, <sup>2</sup>School of Psychology, The University of Sydney, Sydney, NSW, Australia, <sup>3</sup>School of Biomedical Sciences & Pharmacy, University of Newcastle, Newcastle, NSW, Australia

#### Edited by:

Benjamin Cowley, University of Helsinki, Finland

#### Reviewed by:

Simonetta D'Amico, University of L'Aquila, Italy; Annette Kluge, Ruhr University Bochum, Germany; Leonard Zaichkowsky, Boston University, United States

\*Correspondence: Eugene Aidman eugene.aidman@sydney.edu.au

#### Specialty section:

This article was submitted to Cognitive Neuroscience, a section of the journal Frontiers in Human Neuroscience

Received: 12 September 2019; Accepted: 19 December 2019; Published: 14 January 2020

#### Citation:

Aidman E (2020) Cognitive Fitness Framework: Towards Assessing, Training and Augmenting Individual-Difference Factors Underpinning High-Performance Cognition. Front. Hum. Neurosci. 13:466. doi: 10.3389/fnhum.2019.00466

The aim of this article is to introduce the concept of Cognitive Fitness (CF), identify its key ingredients underpinning both real-time task performance and career longevity in high-risk occupations, and canvass a holistic framework for their assessment, training, and augmentation. CF, as a capacity to deploy neurocognitive resources, knowledge and skills to meet the demands of operational task performance, is likely to be multifaceted and differentially malleable. A taxonomy of CF constructs derived from the Cognitive Readiness (CR) and Mental Fitness (MF) literature maps onto phases of the operational cycle, from foundational to advanced, mission-ready and recovery. Foundational cognitive attributes, such as attention, executive control and co-action, are hypothesized to be trainable at the initial Cognitive Gym phase. More advanced training targets at the CR phase include stress and arousal regulation, adaptability, teamwork, situation awareness (including detection, sense-making and prediction) and decision making (de-biasing and confidence calibration). The mission-ready training phase is focused on tolerances (to sleep loss, monotony, pain, frustration, uncertainty) and resistance (to distraction, deception or manipulation). The Operational Augmentation phase relies on support tools such as decision aids and fatigue countermeasures, while the Recovery phase employs reflexive (e.g., mindfulness) and restorative (e.g., nutrition and sleep hygiene) practices. The periodization of cognitive training in this cycle is hypothesized to optimize both real-time cognitive performance and the resilience that enables life-long thriving. One of the most promising avenues for validating this hypothesis is to develop an expert consensus on the key CF ingredients and their relative importance in high-performance settings.

Keywords: cognitive fitness, task performance, operational readiness cycle, RDoC domains cognitive functioning, measurement, trainability

# INTRODUCTION

Performance psychology is a rapidly expanding field that is of growing significance to a wide range of occupations—from competitive sport and performing arts to first responder and military professions. These user groups share a common focus on striving for superior performance in challenging tasks under stressful conditions, and on effective recovery to enable repeat performance across the lifespan. The physical and psychological factors contributing to task performance are tightly interconnected and go beyond mere ''wellness'' (i.e., the absence of pathology). They include, apart from knowledge and skills, a range of ''capacity'' factors, such as strength, endurance, and flexibility, that is best summarized by the concept of ''fitness.'' This article introduces the concept of Cognitive Fitness (CF) as a capacity to deploy neurocognitive resources underpinning the execution of goal-directed action and proposes a hypothetical set of its ingredients. Similar to physical fitness (PF), CF enables the application of knowledge, skills and attitudes (KSA) in generating task performance. The ingredients of PF are well established, with robust measurement protocols for muscular strength, aerobic/anaerobic endurance and range of motion/joint flexibility (Jeffreys and Moody, 2016), as well as validated training interventions such as strength and conditioning, cardiovascular fitness, or high-intensity interval training.

The meaning of psychological fitness is less clear, with this widely used term referring to diverse characteristics ranging from ''character strengths and assets'' (Cornum et al., 2011; Vie et al., 2016) to ''resources that provide protection against the development of mental disorders'' (Wesemann et al., 2018). The concept of mental fitness (MF) has emerged in the mental health and positive psychology literature (McCarthy, 1964; Seligman, 2008) to promote a positive and proactive notion of mental health. The MF literature is focused on identifying protective factors, such as cognitive flexibility, implicated both in the prevention of mental illness and in the promotion of flourishing (Keyes, 2007; Robinson et al., 2015). MF is also critical in the world of work, especially in high-stakes occupations where cognitive lapses can undermine the performance of complex socio-technical systems, while individual's and teams' superior capacity to sense, think, decide and act is widely seen as conferring critical performance advantages (Baker and Phillips, 2000; Bowers and Cannon-Bowers, 2014; Fletcher and Wind, 2014; Herzog and Deuster, 2014; Ahn and Cox, 2016; Bogga, 2017).

The growing literature on cognitive readiness (CR; Foster, 1996; Morrison and Fletcher, 2002; Grier, 2011; O'Neil et al., 2014) has developed extensive conceptual models of factors contributing to sustained professional performance in complex, dynamic, and unpredictable environments (Bolstad et al., 2006; Grier, 2012). In particular, CR has been construed as a comprehensive set of predictors—both distal and proximal—of cognitive performance by the military personnel in complex missions facing agile, near-peer opposition (Kluge and Burkolter, 2013; Sotos, 2019). The key components of CR include: (1) trainable skills, knowledge and attitudes (KSAs); (2) dynamic functional states; and (3) stable, trait-like characteristics ranging from cognitive ability to working memory and learning styles (Grier, 2012; Mason and McQuade, 2013). CF (Aidman, 2017) corresponds to the cognitive element of this latter CR component. Despite the lack of causal cognitive mechanisms explained by the CR construct (Crameri et al., 2019) it has been instrumental in stimulating the development of measurement frameworks for assessing individuals' and teams' fluctuating capacity for operational task performance (Fatkin and Patton, 2008; Grier et al., 2012) and evaluating training interventions to improve it (Kluge and Burkolter, 2013; Peña and Brody, 2016). The explanatory power of the CR construct can be enhanced by establishing its connections to causal factors, such as arousal regulation, discomfort tolerance and inhibitory control, underpinning an individual's performance in cognitively demanding tasks. These fundamental, biologically traceable dimensions of cognitive functioning have been well established in clinical neuroscience (Cuthbert and Insel, 2013; Appelbaum, 2017). 
Their incremental predictive validity for the assessment of psychopathology (e.g., Yücel et al., 2019) and mental health in nonclinical populations (Carcone and Ruocco, 2017) indicates that these broad domains of cognitive functioning can potentially underpin CR as well. The concept of CF (Aidman, 2017) was introduced to examine these causal connections between CR and cognitive factors underpinning mental health, to bridge the gap between the CR and MF literature, and to develop a more tractable and systemic approach to the assessment and training of high-performance cognition. CF is focused on the ''capacity'' component of CR—as distinct from its ''knowledge, skill and expertise'' (KSE) components. The aim of the current article is to develop a working definition of CF, by drawing on the ''why,'' the ''what'' and the ''how-to'' questions from the cognitive training and readiness literatures, and connecting them to the broad, biologically traceable domains of cognitive functioning developed in clinical neuroscience. A hypothesized set of constituent elements of CF is then articulated in a Cognitive Fitness Framework (CF2), followed by a discussion of its potential applications in the areas of assessment, training periodization and operational augmentation. Finally, a research agenda is suggested to improve CF2 and the measurement tools supporting it.

### COGNITIVE TRAINING: FROM "HOW TO" TO "WHAT" AND "WHY"

MF has been defined as a set of malleable attributes that can develop through regular practice, analogous to physical training (Robinson et al., 2015). How trainable the elements of CF are is an empirical question. The growing literature on cognitive training has accumulated promising evidence of its effectiveness in areas such as visuospatial training in developmental populations (Boccia et al., 2017) and memory training for older adults (Gross et al., 2012; although see Redick, 2019, for a broader analysis of the latter).

More specifically, the occupational performance literature has accumulated promising evidence of the cognitive attributes that are considered both trainable and capable of producing reliable performance gains for the end-user—be it an athlete (Morris and Summers, 2004; Fadde and Zaichkowsky, 2018), police officer (Page et al., 2016), first-responder (Joyce et al., 2019), or a warfighter (Adler et al., 2015; Cooper and Fry, 2018; Blacker et al., 2019).

Despite the inconsistent evidence for the effectiveness of cognitive training (Walton et al., 2018; Redick, 2019) and substantial gaps in conceptual integration of cognitive attributes relevant to high-stakes performance applications, core psychological skills such as goal setting, imagery, attention and stress/arousal regulation have been shown to improve with systematic, deliberate practice (Zaichkowsky and Peterson, 2018) characterized by immediate performance feedback and gradual improvement through repetition (Ericsson, 2008). However, the literature is not clear on the parameters of such practice, what it should target and when—or even on how complete this skillset is. Where do you start? What should you train first, second, last? In what combinations, doses, frequency, with what recovery times? At what phase of your training cycle? If you have to choose, which ones are more important? What are the skill fade rates and refresher training requirements? This systemic picture seems missing here, with the resulting symptoms of slow progress, such as persistently low rates of transfer of cognitive training to untrained tasks (Redick, 2019). At the same time, the field is flooded with a plethora of technological inventions for cognitive training and augmentation that are more focused on demonstrating the diverse new technologies than on what cognitive faculties are worth training for (with the attendant questions about the limits of their trainability), and what the best methods of training them are in a holistic and pragmatic approach that accounts for how these faculties develop and fade, improve and decline through existing real-life processes of maturation, aging, training and education, medical treatment, et cetera.

Addressing the questions of trainability would open up a realistic, evidence-based consideration of those cognitive attributes that are either unlikely to improve through training (and thus should be selected for) or fluid/cyclical in nature (and thus need to be monitored and augmented). Unpacking the neurocognitive mechanisms underlying task performance is critical to assessing attribute trainability. This mechanistic analysis requires the increasingly relevant evidence from neurosciences about factors impacting cognitive functioning—from genetics to social interactions.

# DOMAINS OF COGNITIVE FUNCTIONING

With a ground-swell in the mental health literature suggesting that mental illness is not a category, CF2 suggests that neither is high-performance. Both are natural consequences of the varying levels of psychological functioning (including cognitive, affective and motivational) ranging from deficit to norm, and further to high or gifted performance. Broad expert consensus exists on key domains of cognitive functioning that underpin mental health (Morris and Cuthbert, 2012; Yücel et al., 2019). The deficiency of categorical diagnostic systems is well known—DSM-5 and ICD are being challenged by the Research Domain Criteria (RDoC; Cuthbert and Insel, 2013) framework with its broad dimensions of cognitive functioning (Appelbaum, 2017; Clark et al., 2017).

RDoC aims to define ''major domains for the study of mental illness and validate them using optimal genetic, neuroscientific, physiological, behavioral, and self-report measures'' (Morris and Cuthbert, 2012). The long-term goals of RDoC were to validate tasks for use in clinical trials, identify new targets for treatment development, and provide a pathway by which research findings can be translated into changes in clinical decision making. RDoC identified broad higher-level domains of functioning that comprise multiple sub-dimensional constructs, reflecting state-of-the-art knowledge about major systems of cognition, motivation, and social behavior. In its present form, the RDoC Matrix contains five broad Domains of cognitive functioning that are differentiated into 23 main Constructs (shown in brackets here), which are further divided into Subconstructs (Clark et al., 2017):


The full list of RDoC constructs is regularly updated at https://www.nimh.nih.gov/research/research-funded-by-nimh/rdoc/constructs/rdoc-matrix.shtml. The constructs are being continuously revised and refined, with the overall goal of improving measurement validity and treatment efficacy. Expert consensus frameworks have become a best-practice standard; they are known to stimulate research discoveries and accelerate translational pathways by estimating the relevance of primary RDoC constructs (and their sub-dimensions) to specific application domains such as substance and behavioral addictions (Yücel et al., 2019).

RDoC has informed the development of reliable and valid measures across a range of units of analysis for each construct, from genes and cells to neurocircuits, whole-body physiology (e.g., heart rate or event-related potentials), behavior and subjective experience (e.g., Passell et al., 2019). These measures have enabled and inspired studies to determine the full range of variation along these measurement constructs, from deficit to norm, in both clinical and nonclinical populations (Carcone and Ruocco, 2017). Extending this range to the well-adjusted functioning and high-performance domains is an important next step, given that nonclinical populations have been under-represented in current RDoC-driven research. This would require developing an expert consensus on the relative importance of primary RDoC constructs and their sub-dimensions to various high-performance applications. Consequently, CF2 aims to build on RDoC foundational evidence in order to define major domains for the study of CF and develop guidelines for assessing them using an optimal mix of biomarker, physiological, behavioral, and self-report measures (Aidman, 2017).

# COGNITIVE FITNESS FRAMEWORK

The integration of evidence about the mechanisms of cognitive deficit and psychopathology (the RDoC literature) with what is relevant to high performance by well-adjusted individuals who are motivated to excel remains incomplete. While the exact composition of RDoC domains relevant to work performance is awaiting full articulation through consensus studies similar to Robinson et al. (2015) and Yücel et al. (2019), their preliminary scoping can be informed by the widely recognized "psychological skillset" established in performance psychology (Zaichkowsky and Peterson, 2018). In exercise sciences, the ingredients of PF are well-established and include strength/power, endurance, agility and flexibility (Jeffreys and Moody, 2016). The corresponding features of cognitive performance include focus intensity (Sherlin et al., 2013) for strength, attention span and mental effort tolerance (Aidman et al., 2002, 2016; Aidman, 2005) for endurance, task shifting (Genet and Siemer, 2011), cognitive flexibility and creativity (Palmiero et al., 2019) for flexibility, and adaptability (Chandra and Leong, 2016; Zhang et al., 2019) and self-regulation (Schunk and Greene, 2017) for agility. Research evidence accumulated in sport psychology and other high-performance contexts points to the same core domains of cognitive functioning, while their relative importance may depend on the specifics of task and mission profiles under consideration.

In particular, the MF resource index (Robinson et al., 2015) populates the same three categories (strength, endurance, and flexibility) with a set of positive psychology constructs such as self-efficacy (for strength), acceptance (for flexibility) and resilience (for endurance). These allocations are metaphorical: they "employ metaphor" (Robinson et al., 2015, p. 56) to create constructs that are similar to the well-understood components of PF. As a result, their neuro-psychological bases remain unclear, and the question of their measurement is left wide open. The extensive set of CR constructs (Cosenzo et al., 2007; Grier, 2011, 2012; O'Neil et al., 2014) is focused on higher-order abilities, such as decision making, problem-solving and metacognition, but it also contains some underlying cognitive capacity constructs of agility, speed of processing and memory capacity (for review, see Crameri et al., 2019).

Based on the literature summarized above, CF can be defined as a "multi-faceted and differentially malleable capacity to deploy neurocognitive resources, knowledge, and skills to meet the demands of operational task performance, and to sustain this performance throughout a career- and life-long application."

The key aspects of this working definition are:

1. CF entails multiple capacity factors that are different from knowledge and skills.


Several factors impacting task performance and career longevity in high-stakes occupations can be considered for inclusion in the CF set. **Table 1** summarizes these operationally relevant constructs and notionally allocates them to the phases of the operational readiness cycle: from the foundational (cognitive gym) to advanced (readiness), mission-ready (operational tolerances), and recovery phase.

In particular, cognitive primaries such as attention, executive control, and co-action, which underpin most cognitive skills, may potentially be trainable through the gold-standard "isolate–overload–recover" regimes at the initial Cognitive Gym phase (Temby et al., 2015). More advanced training targets at the CR phase include stress management (arousal regulation and "getting in The Zone on cue"), adaptability, teamwork, situation awareness (including detection, sense-making and prediction) and decision making (de-biasing and confidence calibration). The mission-ready training phase is focused on tolerances (from pain and sleep loss to monotony, frustration and uncertainty) and resistances (to distraction, deception or manipulation). These capacities get further enhanced through operational support (with tools such as decision aids and fatigue countermeasures). The recovery phase completes the cycle, with its role widely recognized by expert consensus (Kellmann et al., 2018; Reardon et al., 2019), employing both reflexive (e.g., mindfulness) and restorative practices (e.g., healthy eating, hydration and sleep hygiene) and relying on social support. The full cycle reinforces the cognitive fundamentals that are known for their contribution to both real-time cognitive performance under challenging conditions (Blank et al., 2014; Crameri et al., 2019) and the resilience that enables career longevity and life-long thriving (Seligman, 2008; Cornum et al., 2011).

# PERIODIZATION OF THE COGNITIVE FITNESS CYCLE

Some of the most important questions about CF center on the sequential periodization of cognitive training. **Figure 1** shows a hypothesized sequence in which these elements can be addressed in the training cycle across the CF2. The following questions seem worth addressing in future research:



Notes: <sup>a</sup>The list is not exhaustive and is subject to validation through expert consensus. These training objectives are not directly linked to operational task performance. Similar to strength and conditioning in physical training, the products of CF training feed into the subsequent cognitive skills training, and only through this skill training—into operational performance. <sup>b</sup>Given the nature of the Operational Augmentation phase, performance augmentation tools are listed here instead of training objectives. <sup>c</sup>The recovery phase is focused on the development of habits and practices that promote cognitive fitness and, as such, they are applicable to all other phases of the Cognitive Fitness Cycle.

	- (a) CF training can be combined with existing training modalities that address physical fitness, technical skill acquisition or tactical skill application (Ward et al., 2017).
	- (b) CF training can be combined with mental/psychological skills training, the traditional province of sport psychology, where self-management skills get added or overlaid to the already acquired fitness, technical and tactical skills (Birrer and Morgan, 2010).
	- (c) CF training can be a stand-alone modality, a largely neglected area where cognitive capacity training becomes part of the foundational holistic fitness.

It is safe to assume that the selection of insertion points will be guided by the task, context and resources available. In addition, their relative effectiveness can be compared directly (Röthlin et al., 2016).

# APPLICATIONS

The CF2 is a step towards a consensus on the selection of cognitive attributes to train, the limits of their trainability, and the methods of assessing them in the 21st-century workforce. It helps to map out various lines of research effort and see where individual projects fit. For example, current research on foundational cognitive training is progressing under the construct of Cognitive Gym (Temby et al., 2015; Jarvis et al., 2019). On the other hand, the emerging work on team decision making has been driven by research evidence on the so-called c-factor (collective intelligence, or team "smarts"). This c-factor has surprisingly little connection to the individual team members' IQ, and instead has been linked to their interactional competence and team diversity (Woolley et al., 2010; Blanchard et al., 2018).

One of the core predictors of fitness and performance is executive functioning (EF)—a primary cognitive capacity underpinning self-discipline, attentional focus and impulse control. Its known predictive links include BMI and cardiovascular health (Schlam et al., 2013), learning outcomes in academic and occupational training settings, injury incidence and overtraining, job performance and resilience, post-traumatic stress and other mental health vulnerabilities (Moffitt et al., 2011). Even a modest gain in EF capacity (either through selection or training) is known to drive population-wide gains in health, learning achievement in education and training (Moffitt et al., 2013), and productivity/work safety (including injuries). Twin studies indicate heavy genetic influences on EF (for review, see Friedman and Miyake, 2017), while longitudinal research shows substantial intra-individual variability (Moffitt et al., 2011). Direct estimation of EF heritability (vs. trainability) would go a long way towards informing wide-ranging investment decisions about selection and training programs. Adding this cognitive mediator to the original predictor set would make both performance and health prediction models more comprehensive and holistic.

# CONCLUSIONS AND FUTURE DIRECTIONS

The construct of CF developed here is a means to address the gap between the CR, RDoC and MF literatures by offering a unifying framework to integrate the multitude of biologically traceable factors underpinning individuals' performance in cognitively demanding tasks, to assess their trainability and inform the development of methods to improve them through training and augmentation.

The CF2 is a working hypothesis mapping out the research agenda to identify and measure key attributes of CF, underpinning both real-time cognitive performance under challenging conditions and the resilience that enables career longevity and life-long thriving. CF2 also offers a hypothesized sequence for cognitive training in a CF training cycle. As a hypothesis, CF2 requires testing and validation. One of the most promising validation avenues is through the development of an expert consensus on the key CF ingredients and their relative importance in high-performance settings. The Delphi method utilized in expert consensus studies of cognitive functioning in mental health (see Yücel et al., 2019) appears a good fit for validating the CF2 hypothesis. Once the relative importance of CF constructs is confirmed through expert consensus, training protocol evaluation studies can inform the selection of training methods that are best suited for each CF construct, including the formulation of training objectives to complement the training targets for each training phase in CF2.

The next challenge is extending the range of measurement of the assessment tools measuring CF constructs to cover both cognitive deficit and gifted performance and to employ best-practice measurement protocols to improve the reliability, validity and utility of these assessment tools. These improvements in the measurement of CF constructs are critical to stimulating the design and development of the environments and protocols to improve CF, and to developing fieldable technologies to protect and enhance cognitive performance.


# DATA AVAILABILITY STATEMENT

No datasets were generated or analyzed for this study.

#### AUTHOR CONTRIBUTIONS

EA developed the conceptual framework for the article, drafted, revised, and approved the manuscript.

#### FUNDING

This work was partly supported by the Australian Army-funded Human Performance Research Network.

#### ACKNOWLEDGMENTS

The author would like to thank Leonard Zaichkowsky for his inspiring mentorship, and Nicholas Beagley, Amy Adler, Bevan Macdonald, David Crone, and Philip Temby for their feedback on the earlier versions of this article.


**Conflict of Interest**: The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Aidman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Investigating the "Flow" Experience: Key Conceptual and Operational Issues

#### Sami Abuhamdeh\*

Department of Psychology, Istanbul Şehir University, Istanbul, Turkey

The "flow" experience (Csikszentmihalyi, 1975) has been the focus of a large body of empirical work spanning more than four decades. Nevertheless, advancement in understanding – beyond what Csikszentmihalyi uncovered during his initial breakthrough in 1975 – has been modest. In this conceptual analysis, it is argued that progress within the field has been impeded by a lack of consistency in how flow is operationalized, and that this inconsistency in part reflects an underlying confusion regarding what flow is. Flow operationalizations from papers published within the past 5 years are reviewed. Across the 42 reviewed studies, flow was operationalized in 24 distinct ways. Three specific points of inconsistency are then highlighted: (1) inconsistencies in operationalizing flow as a continuous versus discrete construct, (2) inconsistencies in operationalizing flow as inherently enjoyable (i.e., "autotelic") or not, and (3) inconsistencies in operationalizing flow as dependent on versus distinct from the task characteristics proposed to elicit it (i.e., the conditions/antecedents). After tracing the origins of these discrepancies, the author argues that, in the interest of conceptual intelligibility, flow should be conceptualized and operationalized exclusively as a discrete, highly enjoyable, "optimal" state of consciousness, and that this state should be clearly distinguished from the conditions proposed to elicit it. He suggests that more mundane instances of goal-directed engagement are better conceived and operationalized as variations in task involvement rather than variations in flow. Additional ways to achieve greater conceptual and operational consistency within the field are suggested.

Keywords: flow, enjoyment, task involvement, intrinsic motivation, critical review

#### Edited by:

Benjamin Cowley, University of Helsinki, Finland

#### Reviewed by:

Guillaume Chanel, Université de Genève, Switzerland

Fernando Rosas, Imperial College London, United Kingdom

#### \*Correspondence:

Sami Abuhamdeh, samiabuhamdeh@sehir.edu.tr

#### Specialty section:

This article was submitted to Performance Science, a section of the journal Frontiers in Psychology

Received: 28 September 2019; Accepted: 21 January 2020; Published: 13 February 2020

#### Citation:

Abuhamdeh S (2020) Investigating the "Flow" Experience: Key Conceptual and Operational Issues. Front. Psychol. 11:158. doi: 10.3389/fpsyg.2020.00158

# INVESTIGATING THE "FLOW" EXPERIENCE: KEY CONCEPTUAL AND OPERATIONAL ISSUES

Csikszentmihalyi (1975) introduced the concept of "flow" 42 years ago in his groundbreaking book Beyond Boredom and Anxiety. The concept of flow was not entirely new – the experience itself held much in common with Maslow's (1964) conception of "peak experience," as well as accounts of ecstatic experiences by Laski (1961). However, Csikszentmihalyi's approach was appreciably more systematic and empirically driven than previous approaches. Within a few years, flow was the focus of hundreds of empirical studies from a diversity of fields including educational psychology, recreation and leisure sciences, game design, and many others.

Over the years, many predictors and consequences of "flow"<sup>1</sup> have been identified (e.g., Jackson and Roberts, 1992; Csikszentmihalyi et al., 1993; Jackson et al., 2001; Demerouti, 2006; Schüler, 2007; Stavrou et al., 2007; Engeser and Rheinberg, 2008; Fullagar and Kelloway, 2009; Nielsen and Cleal, 2010; Bakker et al., 2011; Rodríguez-Sánchez et al., 2011; Seger and Potts, 2012; Coffey et al., 2016). But what have we learned about flow itself – about the state of optimal experience – since Csikszentmihalyi introduced the concept in 1975? Here, the view is sobering. The conceptualization introduced in 1975 remains essentially unchanged. Furthermore, fundamental questions persist. [For example, although flow is conceptualized as a multifaceted construct (**Figure 1**), very little is known regarding its latent structure – the causal relations among its proposed components, the relative contribution of each component to the overall flow experience, etc.]. Indeed, and perhaps most alarming, after almost 42 years of research, there appears to be significant disagreement among researchers regarding what flow actually is and how to measure it. This last point can best be appreciated by first reviewing the many different ways in which flow has been operationalized in the literature.

# A REVIEW OF FLOW OPERATIONALIZATIONS IN THE PSYCHOLOGICAL LITERATURE

Within any field of science, the consensual operationalization of central constructs is a sine qua non for progress. When this is lacking, results across studies cannot be compared, and the potential for progress in the field is severely undermined. To examine the degree of consistency with which flow has been operationalized within the psychological literature, a review was conducted, limited to publications from the past 5 years<sup>2</sup>. A PsycINFO search yielded the 42 publications listed in **Table 1** (see the **Appendix** for the specific inclusion criteria used to select these publications). As shown in the first column, across the 42 reviewed studies, flow was operationalized in 24 distinct ways. Furthermore, the differences between these operationalizations were often considerable, so that the meaning of "flow" often changed dramatically from one study to the next.

The fourth, fifth, and sixth columns of **Table 1** indicate three key ways in which the operationalizations differed. Column 4 indicates whether flow was operationalized as a continuous versus discrete construct in each study. Column 5 indicates whether flow was operationalized as enjoyable (i.e., "autotelic") or not. Column 6 indicates whether flow was operationalized using one or more of its proposed antecedents (i.e., clear goals, immediate feedback, and a balance of challenge and skill).

In the remainder of this conceptual analysis, I elaborate the nature of the three issues highlighted in **Table 1** and attempt to trace their origins. Based on my reading of Csikszentmihalyi's conceptualization of flow, I suggest that most operationalizations of flow currently found in the literature miss the mark. I argue that flow should be conceptualized and operationalized exclusively as a state of optimal experience – that is, as a discrete, highly rewarding state of consciousness – and that the potential for progress in our understanding of flow largely depends on it.

# THE THREE ISSUES

### Issue 1: Is Flow a Discrete or Continuous Construct?

Many psychological constructs, such as happiness, anxiety, and self-efficacy, represent continuous (i.e., spectrum and dimensional) constructs. At any given moment, your happiness may be very low, very high, or anything in between. Other psychological constructs, such as euphoria, fury, and the "suicidal mode" (Rudd, 2000), represent discrete (i.e., categorical and taxonic) constructs. Although it may be possible to locate them on a continuum, they are not applicable to its full range. Occasionally it is not entirely clear whether a construct is continuous or discrete. When this happens in the realm of science, fierce debate usually ensues in an attempt to resolve the conflict. An example of this can be found in the field of abnormal psychology, where the designation of psychological disorders as continuous versus discrete has been hotly contested.

<sup>1</sup>Here I put "flow" in quotes because, as will be shown, most studies of flow haven't operationalized flow as conceptualized by Csikszentmihalyi – as a (discrete) state of optimal experience.

<sup>2</sup>Thanks to Şahika Dilgüşa Durmuş, Khaled Mahmoud Elazab, and Selenay Keleş for their help with this review.

#### TABLE 1 | Flow operationalizations in the psychological literature from the past 5 years.

<sup>1</sup>Operationalizations which included one or more items measuring "autotelic experience" (i.e., intrinsically motivating), but did not include items measuring "enjoyment" specifically, were nevertheless classified as having an enjoyment component, given that intrinsic motivation implies enjoyment. <sup>2</sup>The meaning of "related scales": We did not distinguish between long versus short versions of scales, nor did we distinguish between older versus newer versions of scales, nor did we distinguish between original versus translated versions of scales. They were all considered to be versions of the same scale and are not differentiated in the table.

Looking at Column 4 of **Table 1**, we can see that in a majority of the studies flow was operationalized as a continuous construct, applicable to the full range of participants' experience in varying degrees. For example, the Flow State Scale-2 (Jackson and Eklund, 2002), composed of items intended to tap the six experiential characteristics of flow as well as the three conditions (**Figure 1**), asks participants to indicate the extent to which the items characterize their experience in a just-completed activity on a 5-point Likert scale, ranging from 1 ("strongly disagree") to 5 ("strongly agree"). Responses to the items are usually averaged to compute a single "flow" score for each and every observation.
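The continuous scoring approach described above can be sketched in a few lines. This is a minimal illustration, not the published scoring key of any scale: the function name `continuous_flow_score` and the example responses are hypothetical, and the only assumption carried over from the text is that item responses on a 5-point scale are averaged into one score per observation.

```python
from statistics import mean


def continuous_flow_score(responses, scale_min=1, scale_max=5):
    """Average Likert item responses into a single score for one observation.

    Hypothetical helper for illustration only; real scales may reverse-key
    or weight items, which is not modeled here.
    """
    if not responses:
        raise ValueError("at least one item response is required")
    for r in responses:
        if not scale_min <= r <= scale_max:
            raise ValueError(f"response {r} is outside {scale_min}-{scale_max}")
    return mean(responses)


# Hypothetical item responses for two observations.
observations = {
    "participant_a": [5, 4, 5, 4],
    "participant_b": [2, 1, 2, 3],
}
scores = {pid: continuous_flow_score(items) for pid, items in observations.items()}
```

Under this procedure every observation receives a "flow" score, including those in which the respondent was far from an optimal state.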

A few studies, in contrast, operationalized flow as a discrete construct. For example, two studies which used the experience sampling method (Csikszentmihalyi et al., 1977) used a "quadrant" approach popularized earlier by Csikszentmihalyi and his colleagues (e.g., Csikszentmihalyi and Csikszentmihalyi, 1988; Massimini and Carli, 1988; **Figure 2**). Using this approach, flow is operationalized as any observation in which perceived challenge and perceived skill are both "high" (i.e., above the person's average).
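The quadrant rule just described can be expressed as a short classification routine. The sketch below is illustrative only: the function name `classify_flow` and the ratings are hypothetical, and it assumes that "high" means strictly above the person's own mean across their observations, as stated in the text.

```python
from statistics import mean


def classify_flow(observations):
    """Flag each (challenge, skill) observation of one person as flow or not.

    An observation counts as flow only when both ratings exceed that
    person's own mean across all of their observations.
    """
    mean_challenge = mean(c for c, _ in observations)
    mean_skill = mean(s for _, s in observations)
    return [c > mean_challenge and s > mean_skill for c, s in observations]


# Hypothetical experience-sampling ratings (challenge, skill) for one participant.
ratings = [(2, 3), (6, 7), (7, 2), (3, 6), (8, 8)]
in_flow = classify_flow(ratings)
```

Unlike an averaged scale score, this rule yields a binary outcome, so only a subset of a person's observations is labeled as flow.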

So is flow a continuous construct which exists in greater or lesser degrees across the full range of human experience (like happiness, for example)? Or is it a discrete state that is sometimes experienced, but usually not? In the preface to Beyond Boredom and Anxiety, Csikszentmihalyi described flow as such:

"On the rare occasions that it happens, we feel a sense of exhilaration, a deep sense of enjoyment that is long cherished and that becomes a landmark in memory for what life should be like. This is what we mean by "optimal experience." (p. ii)

#### Also from the preface:

"From their accounts of what it felt like to do what they were doing, I developed a theory of optimal experience based on the concept of flow – the state in which people are so involved in an activity that nothing seems to matter; the experience itself is so enjoyable they will do it even at great cost, for the sheer sake of doing it." (p. iv)

As is evident from the passages above (and many others), Csikszentmihalyi conceptualized flow as an "optimal" state of consciousness, one that occurs relatively rarely in life. You can be in flow, or not in flow. Csikszentmihalyi referred to states other than flow as "non-flow" states in his work (e.g., Csikszentmihalyi, 1975; Csikszentmihalyi and LeFevre, 1989).

Csikszentmihalyi and Csikszentmihalyi (1988) created the Flow Questionnaire as a first attempt to operationalize flow (see Moneta, 2012). Participants are presented with first-hand accounts of what it feels like to be in flow, and then are asked a series of questions including "Have you ever felt similar experiences?" and "If yes, what activities were you engaged in when you had such experiences?" Thus, the Flow Questionnaire operationalizes flow as a discrete construct. Csikszentmihalyi and his colleagues have also used the "quadrant model" (**Figure 2**) to classify states of consciousness as either flow or non-flow states (i.e., anxiety, apathy, boredom/relaxation) (e.g., Csikszentmihalyi and LeFevre, 1989; Shernoff et al., 2003). This measurement method, too, operationalizes flow as a discrete construct.

Given that Csikszentmihalyi and his colleagues have conceptualized and operationalized flow as a discrete construct, it may be surprising to learn that a significant majority of the studies conducted within the past 5 years operationalized flow as a continuous construct (**Table 1**). How did this come to be? To address this question, it is necessary to appreciate the difficulty of capturing flow. Flow is described as occurring rarely in regular life (Csikszentmihalyi, 1975, 1990). The rarity with which flow is experienced presents a serious problem for the flow researcher, as statistical power is strongly dependent on having a large sample size. The difficulty of capturing flow is compounded in the psychological laboratory, where participants engage in what is typically an unfamiliar task in an inherently evaluative context. Both of these attributes – the unfamiliarity of the task and the evaluative nature of the context – are likely to work against the (already slim) likelihood of flow being experienced by a study participant, given that (1) flow appears more likely to be experienced by individuals who have developed considerable skill in the activity at hand (Jackson and Csikszentmihalyi, 1999; Rheinberg, 2008; Marin and Bhattacharya, 2013; Cohen and Bodner, 2019) and (2) performance anxiety is not conducive to flow (Csikszentmihalyi, 1975; Fullagar et al., 2013).
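The dependence of statistical power on sample size noted above can be illustrated with a rough calculation. The sketch below is not drawn from the reviewed studies: it assumes a simple two-group comparison of means, a normal approximation, and a hypothetical standardized effect size of d = 0.5, and the function name `approx_power` is introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means.

    Uses the normal approximation and ignores the negligible probability
    of rejecting in the wrong direction. Illustrative only.
    """
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * sqrt(n_per_group / 2)
    return z.cdf(noncentrality - critical)


# Approximate power for an assumed effect of d = 0.5 at several group sizes.
for n in (5, 20, 64):
    print(n, round(approx_power(0.5, n), 2))
```

With only a handful of flow observations per group, power under these assumptions falls well below conventional targets, which is the practical difficulty the "problem of low N" refers to.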

One strategy to deal with this "problem of low N" is to reformulate flow from a discrete state of consciousness to one experienced in varying degrees across the full spectrum of conscious experience. Using this approach, any state of consciousness can be classified along a flow continuum, with one end being very low flow and the other end being very high flow (e.g., Jackson and Marsh, 1996; Rheinberg et al., 2003). By doing this, all observations collected in a given study may be included in statistical analyses and contribute toward calculated effects. But reformulating flow in this manner alters the concept in a fundamental way. Flow is by definition an optimal experience, and so designating all other experiences as variations in flow (low flow, moderate flow, etc.) diminishes the intelligibility of the construct. "Low flow" is a contradiction in terms, just as "mild rage" and "moderate ecstasy" are, given that level of intensity is built into the construct.

Besides the conceptual confusion that results from operationalizing flow as a construct applicable to the full range of conscious experience, there is a second reason to avoid operationalizing flow in this manner. When the concept of flow is extended to apply to the full range of experience, it has questionable discriminant validity over pre-existing constructs in surrounding fields. Within the field of intrinsic motivation, dozens of studies have examined a state-level construct called task involvement (e.g., Harackiewicz et al., 1987; Elliot and Harackiewicz, 1994, 1996; Tauer and Harackiewicz, 2004; Abuhamdeh and Csikszentmihalyi, 2012a), which represents the degree to which an individual concentrates on and becomes absorbed in an activity. Research on task involvement predates the first operationalizations of flow as a continuous construct, and appears to have been influenced by Csikszentmihalyi's work on optimal experience (Harackiewicz and Sansone, 1991). If flow is reformulated as a continuous construct, how do we know associated findings are not redundant with what has already been found with respect to task involvement? What is presented as a new contribution to the psychological literature may in fact be old news.

In reality it seems unlikely that there is a sharp boundary between flow and non-flow experiential states. Such thresholds appear to be exceedingly rare when it comes to states of consciousness, even extraordinary ones such as flow. Nevertheless, because flow is conceptualized as an "optimal" experience, it should be operationalized as such. Or else it shouldn't be called "flow."

### Issue 2: Is Flow Inherently Enjoyable?

In the preface to Beyond Boredom and Anxiety (1975), Csikszentmihalyi described the purpose of his research:

"The goal was to focus on people who were having peak experiences, who were intrinsically motivated, and who were involved in play as well as real life activities, in order to find out whether I could detect similarities in their experiences, their motivation, and the situations that produce enjoyment." (p. xiii)

From this passage, and many others, it is clear that Csikszentmihalyi conceptualized flow as an enjoyable experience. Indeed, it was the enjoyable nature of flow, and the positive implications this enjoyment had for motivation, that positioned it as a vehicle for skill development and personal growth (i.e., greater "complexity") (Csikszentmihalyi and Rathunde, 1998). Csikszentmihalyi hasn't veered from this initial conception. In more recent work by Csikszentmihalyi and his colleagues, the enjoyable, "autotelic" (i.e., intrinsically rewarding) nature of flow has been consistently emphasized (e.g., Nakamura and Csikszentmihalyi, 2009; Nakamura et al., 2019).

Despite Csikszentmihalyi's conceptualization of flow as a form of enjoyment, it is quite common for flow researchers to exclude enjoyment (or "autotelic experience") from their operationalizations of flow, as shown in **Table 1**. Of the 42 reviewed studies, 17 of them did not include enjoyment (or autotelic experience or intrinsic motivation) in their operationalizations. How did this come to be? Why is flow being operationalized by some flow researchers without an enjoyment component? In reviewing the history of this issue I identified several likely sources (Abuhamdeh, in press).

#### Source #1: Martin Seligman

Beginning with his bestselling book Authentic Happiness (2002), Seligman has asserted that "it is the absence of emotion, of any kind of consciousness, that is at the heart of flow." (p. 111). Seligman's reasoning for this is expressed in many places, including his modestly titled follow-up book Flourish: A Visionary New Understanding of Happiness and Wellbeing (2011), in which he wrote: "I believe that the concentrated attention that flow requires uses up all the cognitive and emotional resources that make up thought and feeling." (p. 11).

Judging by how often he has been cited, flow researchers have taken Seligman's views on flow very seriously. But his assertion that flow is devoid of emotion is in direct conflict with Csikszentmihalyi's conceptualization of flow as a form of enjoyment (given that enjoyment is an emotion). Furthermore, the notion that the intensive allocation of cognitive resources to a task prevents emotions from being experienced is at odds with contemporary emotion theory and research. Perhaps the most complete account of how emotions are elicited is provided by appraisal theories of emotion (Arnold, 1960; Lazarus, 1966; Scherer, 1984; Smith and Ellsworth, 1985; Frijda, 1986; Oatley and Johnson-Laird, 1987). Among appraisal theorists, there is consensus that appraisals do not always require conscious intervention (Ellsworth and Scherer, 2003; Moors, 2010). In fact it is generally presumed that appraisal processes usually occur automatically (Smith and Kirby, 2001; Moors, 2010). Appraisals must be fast and efficient given that changes in the environment can occur very quickly (Lazarus, 2001). Thus, like other automatic processes, they need not consume significant attentional resources.

Appraisal theorists also agree that with increasing practice there is greater automatization of appraisal processes (Moors et al., 2013). This has particular relevance for flow because flow appears to be more commonly experienced by individuals who are quite skilled in the activity they are engaged in (and thus have logged many hours of practice) (Csikszentmihalyi, 1975; Dietrich, 2004; Marin and Bhattacharya, 2013; Cohen and Bodner, 2019). Therefore, it seems especially likely that any appraisal processes that may occur during flow are mostly or fully automatic.

#### Source #2: A Failure to Differentiate Between Experiencing Emotions and One's Awareness and Labeling of These Emotions

One defining feature of flow is an absence of self-awareness. Flow researchers have sometimes assumed that this absence of self-awareness during flow prevents the experience of emotion during flow. For example, from a recent paper (Kyriazos et al., 2018): "Flow-ers seem to be almost beyond experiencing emotions, probably due to the absence of self-awareness. . ." But self-awareness is not a precondition for the experience of emotions, only the recognition and labeling of them. This is why nonhuman mammals who lack a sense of self are nevertheless capable of experiencing emotions (Panksepp, 2005). Similarly, among humans, those younger than 7 months (and who therefore have not yet developed a sense of self) are nevertheless able to experience a wide range of emotions (Izard et al., 1995). The only emotions not in the repertoire of these children appear to be the so-called "self-conscious emotions" (e.g., pride, shame, and guilt), which young children first appear capable of experiencing between the ages of 2.5 and 3 years (Lewis, 2008). Indeed, even children who lack a cerebral cortex are capable of experiencing emotions (Merker, 2007).

#### Source #3: Csikszentmihalyi's Confusing Usage of the Word "Pleasure" in His Work

In his book Flow (1990), Csikszentmihalyi wrote, "None of these [flow] experiences may be particularly pleasurable at the time they are taking place, but afterward we think back on them and say, 'That really was fun' and wish they would happen again." This statement may seem to imply that the experience of flow itself may not be particularly enjoyable. However, to properly interpret this passage it is necessary to understand Csikszentmihalyi's unusual usage of the word "pleasure" in his work, and the sharp distinction he draws between pleasure and enjoyment. Csikszentmihalyi (1990) considers pleasurable experiences to be those that satisfy biological needs, such as eating and sleeping (p. 45). According to Csikszentmihalyi, the experience of pleasure is derived from "restorative homeostatic experiences." Thus an artist who stayed up all night feverishly working on a painting, forgoing both food and rest, did not have a "pleasurable" experience according to Csikszentmihalyi's usage, because the behavior did not satisfy any biological needs (in fact it was in conflict with them). But this should not be misinterpreted as implying that the artist did not enjoy him/herself.

## Issue 3: Should Flow Be Partly or Fully Operationalized Using Its Proposed Antecedents?

Csikszentmihalyi and his colleagues make a clear distinction between the conditions of flow and the experience of flow itself (**Figure 1**). Yet if we refer once again to **Table 1**, we see that a large number of studies ignored this distinction by operationalizing flow using both the experiential elements of the flow state and one or more of the conditions of flow. For example, in the Flow State Scale (Jackson and Marsh, 1996), some items measure the experiential elements of flow (e.g., "I had total concentration") whereas others measure the proposed conditions (e.g., "my goals were clearly defined"). The items are then usually averaged by researchers to yield a single "flow" score.

Given the strong distinction Csikszentmihalyi and his colleagues make between the conditions proposed to elicit flow and the state of flow itself, why is this distinction routinely ignored in empirical work? One explanation may be found in Csikszentmihalyi's earlier work. Though for the past several years Csikszentmihalyi and his colleagues have drawn a sharp distinction, this was not always the case. In Beyond Boredom and Anxiety (1975), for example, Csikszentmihalyi himself grouped the conditions of flow with the experiential elements by including all of them under the heading "Elements of the flow experience" (p. 38). And this continued for several years. In Flow (1990), he included both the conditions of flow and the experiential elements under the general heading "The elements of enjoyment." (p. x). It wasn't until approximately 20 years ago that Csikszentmihalyi and his colleagues began consistently differentiating the conditions from the experience.

Additionally, it should be noted that Csikszentmihalyi and his colleagues themselves sometimes operationalized flow based solely on the ratio of challenges and skills (e.g., Massimini and Carli, 1988; Csikszentmihalyi and LeFevre, 1989; Stein et al., 1995; Shernoff et al., 2003; Asakawa, 2004). Indeed, before the current popularity of flow scales, this was the most common way to operationalize flow. This likely served to further reinforce the idea that flow and the conditions that elicit it are one and the same.

So how to proceed? It has been argued that the primary objective of any scientific endeavor is to provide causal explanations (e.g., Popper, 1957; Shadish et al., 2002). Thus the conceptual distinction Csikszentmihalyi and his colleagues make between the conditions of flow and the state itself is an important one. Indeed, much of what distinguished Csikszentmihalyi's initial work on flow from previous work on peak experiences was that he attempted to not only describe the experience, but to explain it by identifying the conditions which elicited it. This is why Csikszentmihalyi's work on flow is sometimes referred to as a "model" or "theory." Without distinguishing cause from effect, however, it is neither.

That the distinction should be consistently made is supported by empirical findings, too. "Flow" (as measured by the Flow Short Scale, Rheinberg et al., 2003) is not always optimized by a balance of challenges and skills, which suggests that inferring flow based on this condition is not a safe bet (Engeser and Rheinberg, 2008). Indeed, the relationship between challenge and enjoyment appears to be very unstable across both activity and person (Abuhamdeh and Csikszentmihalyi, 2009, 2012b). This variation helps account for why the variance in subjective experience explained by challenge-skill ratios across all daily activities tends to be low (Ellis et al., 1994).

As can be seen in **Table 1**, most of the commonly used flow scales conflate the conditions and the experience. One notable exception among them, however, is the 10-item Core Flow Scale (Martin and Jackson, 2008), used in one of the 42 studies. The aim of the scale, as described by the authors, is "to assess the central subjective (phenomenological) experience of flow." Because this scale does not conflate the conditions of flow with the experience of flow, it may be the best option among the current fleet of validated scales. However, when using this scale, or any other which purports to measure the components of flow, it is advisable to allow the weighting of the components to vary freely rather than following the usual custom of assuming they are equal and taking their average, since the relative contribution of each component to the overall experience of flow in specific contexts is unknown (see Jackson and Marsh, 1996).
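To make the contrast between the two scoring customs concrete, the following minimal Python sketch compares an equal-weight average with a composite whose component weights are allowed to differ. The component names, scores, and weights are purely hypothetical illustrations and are not items from any published scale; in practice the weights would have to be estimated from data rather than assumed.

```python
from statistics import mean


def equal_weight_score(components):
    """Customary approach: every component contributes equally."""
    return mean(components.values())


def weighted_score(components, weights):
    """Weighted composite; weights are normalized so they sum to 1."""
    total = sum(weights[name] for name in components)
    return sum(components[name] * weights[name] for name in components) / total


# Hypothetical component scores on a 1-5 scale (illustration only).
responses = {"absorption": 5.0, "enjoyment": 4.0, "time_distortion": 2.0}
# Hypothetical weights; real weights would be estimated, not assumed.
weights = {"absorption": 0.5, "enjoyment": 0.4, "time_distortion": 0.1}

print("equal weights:", equal_weight_score(responses))
print("free weights: ", weighted_score(responses, weights))
```

With these made-up numbers the two composites differ noticeably, which is the point of the recommendation: when components contribute unequally, a simple average can misstate the overall level of the experience.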

# TWO REMAINING QUESTIONS

The preceding discussion raises two specific questions which deserve to be addressed here.

# Question 1: If Flow Is to Be Operationalized as a Discrete Construct, Where Should the Boundary Between "Flow" and "Non-flow" Be Set?

This is clearly a difficult question to answer satisfactorily.<sup>3</sup> A sharp boundary or threshold is unlikely to exist. Individuals who describe their optimal experiences do not commonly report a sudden transition point between flow and non-flow. This therefore presents a dilemma for the flow researcher, as any delineation of a cutoff would necessarily involve a degree of arbitrariness. Nevertheless, to remain true to flow's conceptualization as a discrete state, a boundary must be set.

Previous attempts to distinguish flow from non-flow have varied considerably in approach. The most common approach has been to classify experience based on challenge–skill ratios (such as the quadrant model shown in **Figure 2**). However, this approach infers flow based solely on a single proposed condition (the balance of challenge and skill), which, as previously discussed, is not warranted. Furthermore, dividing experience in such a manner often results in 25% or more of all daily experiences being designated as "flow" experiences (e.g., Csikszentmihalyi and LeFevre, 1989; Hektner and Asakawa, 2000).
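The classification logic behind this approach can be sketched in a few lines of Python. The sketch assumes a common four-quadrant variant in which each report's challenge and skill ratings are compared with that person's own averages; the quadrant labels, the function name, and the example ratings are illustrative assumptions rather than details taken from **Figure 2**.

```python
from statistics import mean


def classify_quadrant(challenge, skill, mean_challenge, mean_skill):
    """Label one report using only its challenge and skill ratings."""
    high_challenge = challenge > mean_challenge
    high_skill = skill > mean_skill
    if high_challenge and high_skill:
        return "flow"
    if high_challenge:
        return "anxiety"
    if high_skill:
        return "boredom"
    return "apathy"


# Hypothetical (challenge, skill) ratings from one person's reports.
reports = [(7, 8), (8, 3), (2, 7), (1, 2), (6, 6), (3, 3), (9, 9), (2, 8)]
mean_c = mean(c for c, _ in reports)
mean_s = mean(s for _, s in reports)

labels = [classify_quadrant(c, s, mean_c, mean_s) for c, s in reports]
flow_share = labels.count("flow") / len(labels)
print(labels)
print("share labeled flow:", flow_share)
```

Because the rule consults nothing but the two ratings, a report is labeled "flow" whether or not the person felt absorbed or enjoyed the activity, and a sizable share of ordinary reports ends up in the "flow" quadrant by construction.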

Rather than the researchers deciding which experiences qualify as flow experiences, an alternative strategy has been to have the participants decide for themselves. Indeed, this is how Csikszentmihalyi initially began measuring flow experiences (see Moneta, 2012). In the Flow Questionnaire (Csikszentmihalyi and Csikszentmihalyi, 1988) respondents are first provided with a description of a flow experience, and then are asked to indicate whether they have ever experienced flow. If so, various followup questions about these experiences are then asked. Similar measures which tap single flow experiences have since been created (e.g., Novak et al., 2003). These measures appear to come closest to operationalizing flow as it is conceptualized – as a discrete, optimal state of consciousness. Unfortunately, they are not commonly used. Out of the 42 studies listed in **Table 1**, only one used such a measure.

Kawabata and Evans (2016), noting the inability of most commonly used flow scales to differentiate flow experiences from non-flow experiences (e.g., the Flow State Scale, Jackson and Marsh, 1996; the Flow Short Scale, Rheinberg et al., 2003), proposed a remedy. They first administered one of the more popular flow scales to participants (the Flow State Scale-2; Jackson and Eklund, 2002) immediately following physical activity of some sort (e.g., physical education class and training session). They then used latent class analysis to divide participants into four groups based on the participants' flow scores. Kawabata and Evans noted that the participants in the two groups with the highest item-averages both had average scores greater than 3 (the midpoint of the 5-point scale), and on this basis they proposed that the participants in the two groups experienced flow. This constituted 54% of the sample. Though the sensibility of the criterion used in this case to delineate a cutoff appears dubious and resulted in a suspiciously high number of participants who were deemed to have experienced flow, the study represents the first serious attempt to rectify what is a major limitation of most flow scales.

Although no sharp boundary between "flow" and "non-flow" is likely to exist, this does not mean that a cutoff cannot be based on sensible criteria. This may seem contradictory, but such cut-offs are routinely designated for practical reasons in other fields, with success (for example in the medical sciences for high blood pressure, obesity, etc., as well as in clinical psychology for the assessment of psychological disorders). Taxonomic analytic techniques (Meehl, 1995; De Boeck et al., 2005; Ruscio et al., 2006) appear especially well-suited for identifying potential cutoff points. As one possibility, previous factor analyses based on data derived from flow scales indicate that two of the proposed components of flow – a lack of self-consciousness and a merging of action and awareness – load poorly on a higher-order "flow" factor (see Swann et al., 2018), even though these two features were commonly mentioned features of flow in Csikszentmihalyi's early interviews. One possible explanation for this is that these two features only become experientially salient at very high levels of involvement, which may have been underrepresented in the factor-analytic studies. If this is the case, the implied inflection point would offer a sound basis for a cut-off. More generally, taxonomic analytic techniques should help clarify whether flow represents a difference in quality of experience versus simply a difference in degree.

# Question 2: What About "Sub-Optimal" Experiences? Does the Flow Model Have No Relevance for Them?

In this conceptual analysis I've argued that flow should be operationalized as Csikszentmihalyi conceptualized it: as an exceptional, "optimal" experience. But what about less intense, "non-flow" states of goal-directed engagement? Does the flow model have no relevance when it comes to these much more common states? Clearly it does. There is evidence that all three of the proposed antecedents of flow (clear goals, immediate feedback, and optimal challenges), in at least some situations, promote enjoyment (Harter, 1978; Reser and Scherl, 1988; Abuhamdeh and Csikszentmihalyi, 2012b; Pratt et al., 2016). But the fact that the conditions of flow have relevance for these states should not prompt researchers to automatically label these states as flow, as doing so obfuscates the meaning of flow.

<sup>3</sup>The difficulty this presents is one reason why, in my own empirical work on goal-directed engagement, despite my longstanding interest in flow, I've resisted operationalizing flow altogether, instead opting to measure experience in a more piecemeal fashion using lower-level constructs that can be meaningfully applied to the full range of conscious experience (e.g., Abuhamdeh and Csikszentmihalyi, 2012a; Abuhamdeh et al., 2015).

It is interesting to note that Csikszentmihalyi himself recognized the relevance of the flow model for less intense states than flow. He introduced the concept of "micro-flow" to help account for such experiences (Csikszentmihalyi, 1975). However, the introduction of another discrete construct (with all the accompanying operational dilemmas) to account for less intense states at this point seems unnecessary. Two preexisting constructs in the motivation literature, mostly ignored by flow researchers, appear very capable of capturing such states. Crucially, both of them are continuous constructs that can be applied meaningfully to the full range of conscious experience.

#### Construct #1: Task Involvement

Flow has been described as being composed of cognitive, emotional, and motivational components (e.g., Delle Fave and Massimini, 2005). In terms of its cognitive aspect, the defining feature of flow is intense attentional focus on the task at hand (Nakamura and Csikszentmihalyi, 2002). It is this deep attentional involvement that appears to underlie several of the other characteristics of flow including the merging of action and awareness and the absence of self-consciousness (Dietrich, 2004; Csikszentmihalyi et al., 2005; Kawabata and Mallett, 2011).

Task involvement, as previously described, represents the degree to which an individual concentrates on and becomes absorbed in an activity (Elliot and Harackiewicz, 1994). Operationalizations usually include items that measure both absorption and concentration. The task involvement construct nicely captures the central cognitive feature of flow. In contrast to flow, however, task involvement is a purely cognitive phenomenon representing the degree of attentional involvement in an activity; it is not inherently enjoyable and motivating in concept, though it often predicts both (Abuhamdeh and Csikszentmihalyi, 2012a).

#### Construct #2: Intrinsic Motivation

Because of the enjoyable nature of flow, it is "autotelic," meaning it motivates the person who experiences it to continue doing what he/she is doing. The terms "autotelic" and "intrinsically motivated" are synonymous. Intrinsic motivation, as conceptualized and operationalized within the motivation literature, captures both the emotional and (therefore) motivational properties of flow, yet, in contrast, is applicable to the full range of conscious experience.

The standard way to measure intrinsic motivation is by asking participants how enjoyable and interesting the activity they are (or were) engaged in is. The measurement of both enjoyment and interest is important, because interest appears to be a positive emotion distinct from enjoyment (Tomkins, 1962; Izard, 1977; Panksepp, 1998; Silvia, 2008). This view is backed by empirical findings which indicate that interest and enjoyment, in at least some contexts, have different antecedents, as well as different trajectories in response to performance feedback (Reeve, 1989; Egloff et al., 2003).

In sum, the conditions of flow have implications for a much wider array of states than just flow. The constructs task involvement and intrinsic motivation appear particularly well-suited for capturing these states. The incorporation of these constructs into empirical investigations of goal-directed engagement has the added benefit of allowing the associated research findings to be more easily assimilated into the surrounding motivation literature.

# SUMMARY AND CONCLUSION

Almost 50 years ago, Csikszentmihalyi (1975) began a program of research with the aim of understanding the common experiential characteristics of so-called "optimal experiences," as well as the conditions which promote these experiences. To this end, he asked hundreds of rock climbers, chess players, artists, etc. to describe what their best moments felt like. Based on this research, Csikszentmihalyi developed the concept of "flow."

Since that time, hundreds of empirical studies have been conducted in an attempt to further understand flow. Yet if we survey the ways in which flow has been operationalized in these studies, we are forced to reckon with an unsettling fact: a consensual operationalization of flow has yet to be established. Across studies, operationalizations vary considerably, so that the meaning of flow from one study to the next often changes drastically.

In this conceptual analysis, I've highlighted three key inconsistencies found in flow operationalizations: (1) inconsistencies in operationalizing flow as a discrete versus continuous construct, (2) inconsistencies in operationalizing flow as inherently enjoyable (i.e., autotelic) or not, and (3) inconsistencies in operationalizing flow as dependent on versus distinct from the task characteristics proposed to elicit it (i.e., the conditions/antecedents). I've argued that these inconsistencies are born out of conceptual misunderstandings, as well as the methodological difficulties inherent in operationalizing optimal experience.

The lack of a standard operationalization of flow does not bode well for the field. It is only by adopting a standard operationalization that questions about the nature of flow (e.g., is the distortion of time a consistent component of optimal experience?) as well as flow's relation to other constructs (e.g., what is the relationship between flow and performance?) can be addressed. It is only by the consistent application of a standard operationalization that a period of "normal science" (Kuhn, 1962) may ensue.<sup>4</sup>

Given that a standard operationalization of flow is needed, whose conceptualization of flow should it be based on? A tacit assumption made throughout this paper is that Csikszentmihalyi's conceptualization of flow is the only valid conceptualization. The reasoning for this is as follows: Unlike most psychological constructs, which are generic in their nature (e.g., euphoria, misery, anxiety, etc.), we put "flow" in quotes (or italicize it, or write it with a capital F) because it is a proper noun, a term coined by a specific psychologist to represent his particular conceptualization of optimal experience. In other words, the term flow comes with Csikszentmihalyi's conceptualization "pre-installed." His conceptualization is therefore the default conceptualization, and this is true regardless of its merits.<sup>5</sup>

<sup>4</sup> Swann et al. (2018) recently assessed the current state of flow research in sport and exercise psychology, using Kuhn's (1962) model of scientific development as a guide. Their provocative thesis was that flow research, following a long period of "normal science," is now approaching a "crisis point." However in Kuhn's (1962) scheme, "normal science" represents the practice of working within a firmly established research paradigm, characterized by, among other things, uniform conceptualizations and standard operationalizations. As shown in the current paper, flow research cannot be characterized as such. At least from a methodological standpoint, the current state of the field seems to have more in common with the preceding stage in Kuhn's (1962) scheme – what he referred to as the "pre-paradigm" stage. Indeed, in his famous book, Kuhn (1962) himself seemed to imply that all of the social sciences are pre-paradigmatic (p. 161).

Of course, once this conceptualization is operationalized in a valid and consistent manner, and systematically tested and evaluated, it may turn out that Csikszentmihalyi's conceptualization of optimal experience should be modified or updated in one or more ways. In this case, a revised conceptualization would be warranted. This would be a positive development, a sign of progress.

<sup>5</sup> By the same token, if I formulated a conceptualization of ecstatic love which I called Glow, and other researchers, inspired by my work on Glow, wished to investigate it, they would need to operationalize Glow as I conceptualized it (as a state of ecstatic love) in order to make any claims about Glow based on their subsequent findings.

#### AUTHOR CONTRIBUTIONS

The author confirms being the sole contributor of this work and has approved it for publication.

#### REFERENCES

at within-individual level. J. Happ. Stud. 18, 861–880. doi: 10.1007/s10902-016-9755-8

Izard, C. E. (1977). Human Emotions. Berlin: Springer.




**Conflict of Interest:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Abuhamdeh. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# APPENDIX

# How the Publications in Table 1 Were Selected

An "advanced search" in PsycINFO specified the following parameters:


This yielded 111 publications. Forty-one of these publications did not include a flow operationalization, and were therefore not included in the review. Of the remaining 70 publications, those which included one or more of the following features were also not included in the review:


This process yielded the 42 publications shown in **Table 1**. Although not exhaustive (given the inclusion criteria above), the listing is intended to be adequately representative of the operationalizations found in the psychological literature.

# Investigating Flow State and Cardiac Pre-ejection Period During Electronic Gaming Machine Use

W. Spencer Murch<sup>1</sup>\*, Mario A. Ferrari<sup>1</sup>, Brooke M. McDonald<sup>1,2</sup> and Luke Clark<sup>1</sup>

<sup>1</sup> Centre for Gambling Research at UBC, Department of Psychology, The University of British Columbia, Vancouver, BC, Canada, <sup>2</sup> Faculty of Medicine, The University of British Columbia, Vancouver, BC, Canada

Flow activities (e.g. sports and gaming) have been associated with positive affect and prolonged engagement. In the gambling field, modern electronic gaming machines (EGMs, including modern slot machines) have drawn concern as a potentially flow-inducing activity that may be associated with gambling-related harms. Current research has heavily relied on self-reported flow, and further insights may be afforded by physiological methods. We present data from three separate experiments in which self-reported gambling flow and cardiac pre-ejection period (PEP; a measure of sympathetic nervous system arousal) were examined. Male undergraduate participants gambled on a genuine EGM in a laboratory setting for a period of at least 15 min, and completed the Flow subscale of the game experience questionnaire (GEQ). Aggregated data were analyzed using multilevel regression. Although EGM gambling was not associated with significant changes in PEP across participants, we found that self-reported flow states were associated with significant decreases in PEP during the first five minutes of EGM use. Thus, participants who experienced flow showed a greater sympathetic nervous system response to the onset of gambling. Though these effects were consistent in experiments 1 and 2, in experiment 3 the effect was inverted during the same time window. We conclude that flow during EGM gambling appears to be associated with early changes in sympathetic nervous system activity, but stress that more research is needed to characterize boundary conditions and moderating factors.

Keywords: flow, immersion, heart rate, gambling, slot machine, electronic gaming machine, impedance cardiography, pre-ejection period

#### INTRODUCTION

"Although it is possible to flow while engaged in any activity, some situations appear to be designed almost exclusively so as to provide the experience of flow."

–Mihaly Csikszentmihalyi

Flow – the trance-like experience of extreme focus on a task or activity – is often described in the context of leisure activities such as rock climbing, chess, or art (Jackson and Marsh, 1996; Csikszentmihalyi, 2014; Stavrou et al., 2015). The experience is typically associated with increases in positive affect (Asakawa, 2004; Rogatko, 2009; Csikszentmihalyi, 2014; Murch et al., 2017).

#### Edited by:

Jussi Palomäki, University of Helsinki, Finland

#### Reviewed by:

Mike John Dixon, University of Waterloo, Canada Paula Thomson, California State University, Northridge, United States

> \*Correspondence: W. Spencer Murch spencer@psych.ubc.ca

#### Specialty section:

This article was submitted to Performance Science, a section of the journal Frontiers in Psychology

Received: 04 December 2019 Accepted: 07 February 2020 Published: 26 February 2020

#### Citation:

Murch WS, Ferrari MA, McDonald BM and Clark L (2020) Investigating Flow State and Cardiac Pre-ejection Period During Electronic Gaming Machine Use. Front. Psychol. 11:300. doi: 10.3389/fpsyg.2020.00300

According to Flow Theory, some activities are particularly adept at eliciting and sustaining the flow experience, and gambling has been proposed as one such "flow activity" (Csikszentmihalyi, 2014, p. 140, 146). A negative implication is that businesses may offer products specifically designed to encourage flow, capitalizing on prolonged or more frequent participation by individuals who are seeking to escape from stress or low mood (Schüll, 2012; Dixon et al., 2018).

Researchers first became interested in the flow-related aspects of gambling in the 1980s. Jacobs (1986) proposed that gambling activities can provide a pleasurable, trance-like sensation that reduces gamblers' self-awareness. He suggested that this state of absorption in gambling was akin to the clinical symptom of dissociation, although in modern accounts it appears more characteristic of non-pathological or "normative" dissociation (Butler, 2006; Thomson and Jaque, 2012). Jacobs posited that this state of absorption could contribute to gambling addictions, and that these experiences could be addictive in and of themselves (Jacobs, 1986, 1988). One study aimed to directly compare Csikszentmihalyi's and Jacobs' constructs in samples of student athletes and problem gamblers (Wanner et al., 2006). Results showed that the problem gambler group endorsed every item on both the Flow Trait Scale and Jacobs' "Dissociation Questionnaire" (Jacobs, 1986; Jackson and Marsh, 1996). In an earlier analysis of data in the current study, we reported high internal consistency between items on the Dissociation Questionnaire (which includes feelings of being "in a trance" and losing track of time; Jacobs, 1986) and the Flow subscale of the GEQ ("I felt completely absorbed," "I forgot everything around me"; Poels and De Kort, 2007; IJsselsteijn et al., 2013; Murch and Clark, 2019), again suggesting considerable overlap between Flow Theory and Jacobs' absorption construct (but see Murch et al., 2019).

The susceptibility of regular gamblers to experiencing gambling flow is reliably associated with symptoms of disordered gambling; the Dissociation Questionnaire has been repeatedly correlated with measures of problem gambling (Kofoed et al., 1997; Diskin and Hodgins, 1999; Wanner et al., 2006; Noseworthy and Finlay, 2009; Hopley and Nicki, 2010; Cartmill et al., 2015; Murch et al., 2017, 2019; Dixon et al., 2018). In two experiments, gamblers were asked to monitor an area off-screen at the same time as they gambled on an electronic gaming machine (EGM; including modern slot machines), providing a response when target shapes appeared off-screen (Diskin and Hodgins, 1999; Murch et al., 2017). In both studies, levels of problematic gambling were associated with reduced detection of peripheral targets while gambling. This effect is consistent with the "attentional narrowing" mechanism proposed in Flow Theory (Csikszentmihalyi, 2014, p. 139). A more granular investigation of gambling flow found that specific flow experiences may have protective (losing track of time, autotelic experiences) or aggravating (senses of concentration and control) effects on gambling harms (Trivedi and Teichert, 2017; see also Palomäki and Laakasuo, 2016).

Some forms of gambling may be especially good at eliciting flow. EGMs are disproportionately associated with problem gambling (Breen and Zimmerman, 2002; MacLaren, 2015; Binde et al., 2017; Gainsbury et al., 2019). Recent accounts of EGM gambling have argued that these devices may be designed to maximize "time on device," conceivably via flow experiences (Schüll, 2012, p. 74). In an Australian survey of gamblers who endorsed feeling "in a trance" while gambling, 79% of respondents had been using an EGM at the time (Office for Problem Gambling, 2006). Several scholars have proposed that absorption in EGM gambling may be an effective (though ultimately maladaptive) coping strategy for those seeking to avoid symptoms of depression, anxiety, or stress (Schüll, 2012; Dixon et al., 2018, 2019).

Current research relies heavily on self-report measures of flow, which can be susceptible to disruption (e.g. by introducing a secondary task; Murch et al., 2017). Psychophysiological methods may provide alternative markers for investigating the gambling flow phenomenon more covertly. Past examinations of EGM use have suggested a role for both the sympathetic (Anderson and Brown, 1984; Griffiths, 1993; Coventry and Constable, 1999; Coventry and Hudson, 2001) and parasympathetic nervous systems (Murch et al., 2017; Murch and Clark, 2019). However, little evidence exists for a link between gambling flow and physiological measures. In two experiments, we found no significant relationships between EGM flow and respiratory sinus arrhythmia, a cardiac marker of parasympathetic nervous system tone (Murch et al., 2017; Murch and Clark, 2019).

The present study evaluated the relationship between gambling flow and sympathetic nervous system arousal, indexed by cardiac pre-ejection period (PEP). PEP is an impedance cardiography-derived metric, which approximates the interval between onset of the electrical signal that stimulates left ventricular contraction (QRS complex) and opening of the aortic valve (commencement of blood efflux from the left ventricle into the aorta). In human studies and animal models, PEP has demonstrated excellent validity as an inverse measure of sympathetic arousal (Cacioppo et al., 2007, pp. 461, 619). In human studies, PEP has been observed to decrease (indicating sympathetic arousal) in response to anger, disgust, and fear emotional induction, and increase in response to happiness, sadness, and amusement (Kreibig, 2010). PEP is also sensitive to reward anticipation and delivery: PEP decreased when participants anticipated social reward (Brinkmann and Franzen, 2017), and was linearly related to reward size in delayed-match-to-sample tasks (Richter and Gendolla, 2009; Brinkmann and Franzen, 2013).

We report data from three laboratory experiments, in which self-reported flow and PEP data were collected for an EGM gambling session that lasted at least 15 min. We first hypothesized that PEP would decrease (relative to baseline) in response to EGM gambling, indicating sympathetic nervous system arousal associated with the gambling activity. We divided the gambling sessions into 5-min blocks to test the time-course of this response, as the effects of gambling on PEP may not be uniform across a gambling session. Our second and primary hypothesis proposed that EGM-related changes in PEP would interact significantly with participants' flow ratings.


## MATERIALS AND METHODS


#### Participants

The studies included in these analyses were approved by UBC's Behavioural Research Ethics Board. Participants were recruited to three experiments conducted between 2015 and 2018 (N<sub>1</sub> = 121, age M = 21.25, SD = 2.91; N<sub>2</sub> = 80, age M = 20.55, SD = 2.37; N<sub>3</sub> = 106, age M = 20.80, SD = 2.39, **Figure 1**). Primary analyses for Studies 1 and 2 are already published (Ferrari et al., 2018; Murch and Clark, 2019), without the measures of PEP. Study 3 has not been submitted for peer-reviewed publication (Murch, 2016). Study 1 was primarily interested in examining testosterone change in relation to EGM gambling. Study 2 looked at levels of flow and heart rate variability during EGM gambling with differing bet strategies tested within-subjects. Study 3 examined gambling immersion using a social manipulation, in which participants who provided psychophysiological data were, in some cases, tested alongside other participants seated at adjacent EGMs. Participants in Studies 2 and 3 gambled while an experimenter seated behind them monitored the physiological recording. Participants in Study 1 gambled without anyone else in the room. All participants were male undergraduate students, at least 19 years of age, who responded to an online advertisement posted by the psychology department. Most participants were compensated with partial course credit, though some participants in Study 1 were paid \$15 CAD instead. Participants were included only if they were not high-risk problem gamblers (i.e. problem gambling severity index score <8, see below), had no allergies to gels or adhesives, and no current prescriptions for psychotropic or cardiac medications.

#### Questionnaires

Participants completed the Problem Gambling Severity Index (PGSI), which probes past-year problem gambling symptoms (Ferris and Wynne, 2001). Responses to the nine items were rated on a 4-point Likert scale ranging from "Never" (0), to "Almost always" (3), and a total score was obtained.

After gambling on the EGMs, participants completed the Flow subscale of the GEQ ("I felt completely absorbed," "I forgot everything around me"; Poels and De Kort, 2007; IJsselsteijn et al., 2013). In Study 2, participants completed this questionnaire after each of four 5-min gambling blocks. Responses were given on a 5-point Likert scale ranging from "Not at all" (0), to "Extremely" (4). Scores for the two items were averaged, and scores were standardized within each study. Past analysis of these items in Study 2 indicated relatively high reliability estimates (Cronbach's α = 0.80; Murch and Clark, 2019).
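The flow scoring described above (average the two items, then standardize within each study) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names and the example ratings are hypothetical, and the use of the sample standard deviation for standardization is an assumption.

```python
from statistics import mean, stdev

def flow_score(item_absorbed, item_forgot):
    """Average the two GEQ Flow items (each rated 0-4)."""
    for rating in (item_absorbed, item_forgot):
        if not 0 <= rating <= 4:
            raise ValueError("GEQ Flow items are rated from 0 to 4")
    return (item_absorbed + item_forgot) / 2

def standardize_within_study(scores):
    """Convert raw scores from one study to z-scores (sample SD assumed)."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]

# Hypothetical ratings from four participants in a single study.
raw = [flow_score(a, b) for a, b in [(0, 1), (2, 2), (3, 4), (1, 1)]]
z = standardize_within_study(raw)
```

Standardizing within each study means a flow score of 0 corresponds to that study's own average, which matters when interpreting interaction terms across studies with different mean flow levels.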

#### Procedure

After providing written consent, participants completed the PGSI. Individuals scoring greater than seven on this measure (indicating high-risk problem gambling) were excluded from the gambling task, and instead proceeded straight to debriefing. The lab was then cleared of any additional participants and participants providing physiological data were asked to remove their shirt for electrode placement. For the impedance signal, we applied eight Ag/AgCl electrodes (Vermed, Buffalo, NY, United States); four were applied laterally on the neck and four were applied laterally on the chest below the armpit (e.g. Gramzow et al., 2008). For an electrocardiogram, we then applied three electrodes to the upper left pectoral, upper right pectoral, and lower left abdomen. The data were relayed wirelessly to the RSPEC-R and NICO-R modules of a Biopac MP150 system (BIOPAC Systems, Inc., Goleta, CA, United States) recording at 1,000 Hz. Participants then put their shirts on and provided a 5-min baseline recording in a seated position. In Study 1, participants' baseline recording was obtained at the same time as they provided a saliva sample via passive drool into a small vial. Participants in Studies 2 and 3 were instructed to close their eyes during the baseline recording, but did not provide a saliva sample.

In each study, participants gambled on a genuine EGM for at least 15 min. Each EGM was a modern, multi-line device (see Dixon et al., 2014), set on a one-cent denomination (i.e. if betting on a single line, each spin would cost \$0.01). In Study 1, participants gambled continuously on the EGM "Dragon's Fire" (Scientific Games Co., Las Vegas, NV). Study 2 consisted of four 5-min gambling blocks on the EGM "Buffalo Spirit" (Scientific Games Co., Las Vegas, NV). Study 2 participants completed the GEQ Flow questionnaire after each block. This introduced a break lasting approximately 1 min between blocks. Participants who provided PEP data in Study 3 gambled on "Double Diamond" or "Triple Diamond" (IGT, Las Vegas, NV, United States). Participants in all studies were provided \$40–60 CAD (equivalent to 4,000–6,000 in-game credits) to use on the machine. Each study constrained participants' betting strategies in some way. In Studies 1 and 3, a multi-line bet strategy was set, at \$0.40 and \$0.20, respectively, to ensure frequent reinforcement (Livingstone and Woolley, 2008; Murch et al., 2017). In Study 2, bet strategies were systematically manipulated, from one credit bet on one payline (i.e. a \$0.01 bet, the minimum), to five credits bet on each of 20 paylines (i.e. \$1.00 per spin). Each study involved a cash bonus incentive: participants in Study 1 were paid a \$10 bonus if they finished the session in profit (i.e. over 4,000 credits), whereas participants in Studies 2 and 3 received a variable bonus from \$2 to \$12 based on their remaining credits.

#### Processing and Analyses

Pre-ejection period is defined as the latency between Q-wave onset in an electrocardiogram, which reflects the onset of the electrical signal prompting left ventricular contraction, and the upward inflection point (B) in the derived impedance signal, dZ/dt (Cacioppo et al., 2007). In order to address our time-course hypotheses and retain comparability to baseline recordings, PEP data were partitioned into 5-min blocks. As 15 min was the shortest session length, we extracted the first three blocks (0–5, 5–10, and 10–15 min) from each study. Physiological data were visually inspected for artifacts. Blocks were excluded in cases where either the participant had run out of credit and stopped gambling, or serious artifacts precluded an accurate extraction of PEP. PEP extraction was completed using the PEP algorithm in AcqKnowledge 4.4 (BIOPAC Systems, Inc., Goleta, CA, United States). Complete or partial PEP data were available for 218 participants across the three studies (**Figure 1**). Baseline PEP scores were subtracted from PEP scores for each block. This array of difference scores represents the change in PEP from baseline to each block, the dependent variable "ΔPEP."
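The block partitioning and baseline subtraction described above can be sketched as follows. This is an illustrative outline only: the actual extraction was done with commercial software, and the function names, the per-beat input format, and the example values here are all hypothetical.

```python
from statistics import mean

def pep_ms(q_onset_ms, b_point_ms):
    """PEP for one beat: latency from ECG Q-wave onset to the dZ/dt B point."""
    return b_point_ms - q_onset_ms

def block_means(beats, block_len_s=300, n_blocks=3):
    """Average per-beat PEP within consecutive 5-min blocks.

    `beats` is a list of (time_s, pep) pairs; a block with no usable
    beats is returned as None (e.g., excluded for artifacts).
    """
    out = []
    for k in range(n_blocks):
        lo, hi = k * block_len_s, (k + 1) * block_len_s
        vals = [p for t, p in beats if lo <= t < hi]
        out.append(mean(vals) if vals else None)
    return out

def delta_pep(baseline_pep, task_block_means):
    """Subtract baseline PEP from each block mean; negative = shorter PEP."""
    return [None if m is None else m - baseline_pep for m in task_block_means]

# Hypothetical beats: (seconds into the session, PEP in ms).
beats = [(10, pep_ms(0, 104)), (200, pep_ms(0, 102)),
         (400, pep_ms(0, 108)), (700, pep_ms(0, 100))]
deltas = delta_pep(106.0, block_means(beats))
```

Because PEP is an inverse index of sympathetic arousal, a negative change score in this sketch corresponds to increased sympathetic activity relative to baseline.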

We performed three linear multilevel regression models with maximum likelihood estimation to predict ΔPEP given the block in which it was recorded, and the self-reported flow state score associated with that block. Participants in Study 2 gave flow ratings for each of the three blocks separately, while participants in Studies 1 and 3 gave a single flow rating for all blocks after the session was completed. ΔPEP blocks were nested within participants and studies, and we examined indices of model fit (AIC and BIC) to determine that these factors should be modeled as random effects (Field, 2012). Block and study number were dummy-coded. Since ΔPEP was calculated by subtracting baseline PEP levels, a value that reflects no task-related change is necessarily equal to zero, and as such the model intercept was suppressed. Models 1 and 2 directly address our hypotheses. Model 3 was included in order to explore the simple main effects of block and study on the relationship between flow and PEP. This allowed us to investigate whether any effects observed in Model 2 appeared heterogeneously across different experimental contexts.

Model 1: ΔPEP predicted by block number.

Model 2: ΔPEP predicted by block number and block-by-flow interaction terms.

Model 3: ΔPEP predicted by block number and block-by-flow-by-study interaction terms.
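To make the fixed-effects structure of Models 1 and 2 concrete, the sketch below builds a no-intercept design row with one dummy per block and, optionally, one block-by-flow product per block. It is a simplified illustration under stated assumptions: random effects for participant and study are omitted entirely, the function names are hypothetical, and the coefficient vector simply copies the Model 2 estimates reported in the Results.

```python
def design_row(block, flow_z, with_flow=True):
    """Build one no-intercept design row for a change-from-baseline PEP score.

    block  : 1, 2, or 3 (the 5-min block)
    flow_z : standardized flow score paired with that block
    Columns: [B1, B2, B3] for Model 1, plus [B1*flow, B2*flow, B3*flow]
    for Model 2.
    """
    if block not in (1, 2, 3):
        raise ValueError("block must be 1, 2, or 3")
    dummies = [1.0 if block == k else 0.0 for k in (1, 2, 3)]
    if not with_flow:
        return dummies
    return dummies + [d * flow_z for d in dummies]

def predict(row, coefs):
    """Fixed-effects prediction: dot product of a design row and coefficients."""
    return sum(x * b for x, b in zip(row, coefs))

# Model 2 fixed effects copied from the Results (blocks, then block-by-flow).
model2 = [0.70, -0.22, 0.72, -1.89, -0.87, -0.50]
at_mean = predict(design_row(1, 0.0), model2)    # Block 1, average flow
high_flow = predict(design_row(1, 1.0), model2)  # Block 1, flow at +1 SD
```

With the intercept suppressed, each block coefficient is read directly as the expected change from baseline in that block at average flow, and each interaction coefficient as the additional change per standard deviation of flow.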

Analyses were performed in JASP, and R version 3.5.2, using the "nlme" package (Fox and Weisberg, 2011; R Core Team, 2018; JASP Team, 2019; Pinheiro et al., 2019). To assess the underlying assumptions of linearity and homoscedasticity, we calculated variance inflation factors and visually inspected the distributions of fitted and residual values at the levels of the factors and random effects. We were satisfied that the models did not violate the underlying assumptions of the analyses. These data and analyses have been publicly archived<sup>1</sup>.

## RESULTS

The overall mean PEP during baseline blocks was 106.00 ms (SD = 21.79 ms). Task-related PEP levels were comparable during Block 1 (mean = 105.98 ms, SD = 22.17 ms), Block 2 (mean = 105.32 ms, SD = 19.87 ms), and Block 3 (mean = 106.47 ms, SD = 20.36 ms). The mean GEQ Flow score was 1.62 (SD = 1.12) in Study 1, 1.14 (SD = 0.72) in Study 2, and 1.21 (SD = 1.02) in Study 3, indicating mild-to-moderate levels of flow in the three experiments. A one-way ANOVA indicated that the average GEQ Flow scores differed significantly between the three studies [F(2, 197.62) = 6.93, p = 0.001; Welch correction employed due to unequal variances], with higher scores in Study 1 than Study 2 (p<sub>Bonferroni</sub> = 0.004), and Study 3 (p<sub>Bonferroni</sub> = 0.007), but not between Studies 2 and 3 (p<sub>Bonferroni</sub> > 0.99).

### Regression Model Results

Model 1: Overall, there was no significant change in PEP relative to baseline levels [Block 1: B = 0.51, t(408) = 0.51, p = 0.61; Block 2: B = −0.37, t(408) = −0.36, p = 0.72; Block 3: B = 0.61, t(408) = 0.60, p = 0.55].

<sup>1</sup>https://dataverse.scholarsportal.info/dataset.xhtml?persistentId=doi:10.5683/SP2/JFR1B3

Model 2: ΔPEP again did not differ significantly from baseline for Block 1 [B = 0.70, t(405) = 0.70, p = 0.48], Block 2 [B = −0.22, t(405) = −0.22, p = 0.83], or Block 3 [B = 0.72, t(405) = 0.71, p = 0.48]. The block-by-flow interaction term was significant for Block 1 [B = −1.89, t(405) = −2.13, p = 0.03], but not for Block 2 [B = −0.87, t(405) = −0.96, p = 0.34], or Block 3 [B = −0.50, t(405) = −0.56, p = 0.58]. The model fit was not significantly improved over Model 1 [χ<sup>2</sup>(3) = 5.24, p = 0.16].

Model 3: ΔPEP did not differ significantly from zero for Block 1 (p = 0.89, **Table 1**), Block 2 (p = 0.56), or Block 3 (p = 0.69). In Study 1, ΔPEP interacted significantly with flow during Block 1 (p = 0.01, **Figure 2**), but not during Block 2 (p = 0.11), or Block 3 (p = 0.33). In Study 2, ΔPEP interacted significantly with flow during Block 1 (p = 0.02), but not during Block 2 (p = 0.40), or Block 3 (p = 0.25). Lastly, in Study 3, ΔPEP interacted significantly with flow during Block 1 (p = 0.02), but not during Block 2 (p = 0.10), or Block 3 (p = 0.10). Notably, the direction of the Block 1 effect in Study 3 differed from those observed in Studies 1 and 2. The model fit was significantly improved over Model 2 [χ<sup>2</sup>(6) = 16.03, p = 0.01].
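As a rough check on the model comparisons, the upper-tail probability of the chi-square statistics reported for Models 2 and 3 can be computed with the closed-form expressions that exist for integer degrees of freedom. The helper name `chi2_sf` is hypothetical and this is only a sketch, not the analysis code; it should give values close to the reported p = 0.16 and p = 0.01.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (integer df >= 1).

    Uses the finite-sum closed forms for integer degrees of freedom,
    so no third-party library is needed.
    """
    if x < 0 or df < 1:
        raise ValueError("x must be >= 0 and df a positive integer")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)  # first correction term
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total

p_model2_vs_1 = chi2_sf(5.24, 3)   # Model 2 compared with Model 1
p_model3_vs_2 = chi2_sf(16.03, 6)  # Model 3 compared with Model 2
```

The degrees of freedom match the number of added fixed-effect terms: three block-by-flow terms in Model 2, and six further study-specific terms in Model 3.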

## DISCUSSION

We tested cardiac PEP as a potential sympathetic nervous system marker of flow while undergraduate students gambled on authentic EGMs situated in a laboratory environment. We examined whether PEP changes were associated with EGM use, the stability of these levels over time, and their associations with self-reported flow, using multilevel regression models that accounted for the nested data structure. We did not observe significant change in PEP from the pre-task baseline to gambling. When we examined the interaction between task block and flow on PEP during gambling, we found that self-reported flow was associated with decreases in PEP (indicating increased sympathetic nervous system activity) during Block 1 (the first 5 min of gambling). When we explored this interaction within the three studies, we found opposing relationships between block and flow on ΔPEP. Studies 1 and 2 showed results consistent with Model 2: higher self-reported flow states during gambling were associated with greater decreases in PEP during Block 1 (but not Blocks 2 or 3). In Study 3, flow was associated with increased PEP (i.e. reduced sympathetic activity) and again, this effect was only statistically significant during Block 1. Taking these results together, it appears that early physiological responses to EGM use were related to increases in participants' subsequent flow ratings. We have thus found tentative support for an association between subjective flow and fluctuations in sympathetic nervous system activity. Crucially, however, the direction of this effect may depend on particular aspects of the task procedure.

TABLE 1 | Predicted ΔPEP from baseline, Model 3.

Flow scores have been standardized. Block and Study factors represent dummy codes. Values in column B are unstandardized coefficients. Values in column SE (B) represent the standard error for the coefficient in that row.

in PEP during Block 1 compared to their baseline level. A participant who gave a flow rating of 0.19 (–1 SD) in Study 3 is expected to have a 3.87 ms decrease in PEP during Block 1. Bars represent the 95% confidence interval. <sup>∗</sup>p < 0.05.

It is worth speculating on why the observed interactions with flow were limited to the first 5 min of gambling. As our flow ratings were taken at the end of the session in Studies 1 and 3, this firstly indicates that participants' early experiences of the EGM are particularly important in accounting for variability in later flow ratings (a kind of primacy effect). These results further indicate that the initial physiological response to EGM use is an important factor in determining whether the session produces flow overall. Perhaps early experiences that produce physiological change increase the likelihood that gamblers will experience flow. In future research, it would be fruitful to take multiple flow measurements within a prolonged EGM gambling session, to characterize the subjective time-course, although such designs are challenging due to the potential for distractors to impair flow.

One possible explanation for the opposing results across the three studies is the social manipulation present in Study 3. Participants in that experiment were made aware that they may be gambling alongside other participants, and this may have impacted either their physiological response or experience of flow while gambling. Further, the gambling sessions in Study 1 (which saw the largest effect at Block 1) were conducted without an experimenter present in the room (in order to minimize any observer effects on risk-taking; Rockloff and Dyer, 2007; Rockloff et al., 2011). Thus participants' physiological responses to the gambling task may have been moderated by these social factors, either of researchers or other participants. Alternatively, our effects could be related to participants in each study employing different betting strategies. This necessarily affected the rate of reinforcement in these studies and may have also had an impact on self-reported flow state (Murch and Clark, 2019). Consistent with past findings, we found the highest levels of flow in Study 1, which employed a 40-line bet strategy to achieve high rates of reinforcement (Livingstone and Woolley, 2008; Templeton et al., 2015). Study 3 employed a smaller, 20-line strategy, and Study 2 compared several bet strategies that varied the number of lines bet, either 1, 5, or 20. Thus, if there is a real relationship between PEP and flow during EGM use, it may depend on additional factors that we could not systematically control in this aggregated analysis.

When not accounting for flow, we observed no significant change in PEP while gambling. Previous work has typically inferred sympathetic arousal from increases in mean heart rate during gambling, including on EGMs (Anderson and Brown, 1984; Griffiths, 1993; Coventry and Norman, 1997; Coventry and Constable, 1999; Coventry and Hudson, 2001). However, the physiology of heart rate change is complex, and affected by both branches of the autonomic nervous system (Cacioppo et al., 2007). Decreases in vagal tone while gambling (Murch et al., 2017; Murch and Clark, 2019) could potentially increase heart rate while sympathetic arousal remains constant, accounting for past results. A separate possibility is that heart rate effects did reflect sympathetic arousal in past experiments, but our laboratory environment or PEP measure may have lacked the sensitivity needed to detect a sympathetic response here.

Our findings are preliminary and intended to stimulate further enquiry; they have several important limitations. First, the three study protocols differed in numerous ways, and it is possible that methodological differences drove the disparate pattern of results. Second, the laboratory environment may have attenuated physiological reactivity. EGM gambling is regarded as an appetitive psychological challenge that involves intense audiovisual stimuli, motor actions and monetary outcomes, but responses to EGM use may differ based on whether the device is situated in a gambling venue, or in a laboratory environment (cf. Anderson and Brown, 1984). Third, participants were convenience-sampled from an undergraduate population and were not regular EGM users. This potentially diminished both physiological responses to the EGM task, and the level of flow that was reported. Fourth, participants were men, because practical application of our PEP methods precluded the recruitment of women. Fifth, the GEQ Flow scale is unidimensional, focusing on absorption states, and other measures may provide insight into different aspects of the flow state (e.g. Jackson and Eklund, 2002). Finally, the block-by-flow-by-study analytic approach was exploratory, and the available data could not clarify why opposing effects were observed between the studies. Our preliminary conclusion is that cardiac sympathetic nervous system responses early in an EGM gambling session may affect subsequent ratings of flow for that session. However, follow-up studies should be undertaken in an attempt to replicate and clarify this effect.

# DATA AVAILABILITY STATEMENT

All datasets generated for this study are included in the article/supplementary material.

# ETHICS STATEMENT

The studies involving human participants were reviewed and approved by UBC Behavioural Research Ethics Board. The participants provided their written informed consent to participate in this study.

# AUTHOR CONTRIBUTIONS

WM collected and processed the data, aggregated the data across experiments, performed the analyses, and drafted the manuscript. MF and BM collected and processed the data, and contributed to writing. LC supervised the experiments, provided the testing facilities and research funding, and contributed to writing.

# FUNDING

The experiments in this manuscript were funded by the core funding of the Centre for Gambling Research at UBC. The Centre for Gambling Research was funded by the Province of British Columbia and the British Columbia Lottery Corporation, a Canadian Crown Corporation. WM and MF held graduate fellowships from the Natural Sciences and Engineering Research Council of Canada (NSERC). LC held an NSERC Discovery Grant (RGPIN-2017-04069).

# ACKNOWLEDGMENTS

The authors would like to thank Amit Chandna, Christopher de Groot, and Cameron Drury for their assistance in data collection.

# REFERENCES




**Conflict of Interest:** The Centre for Gambling Research at UBC receives funding from the Province of British Columbia and the British Columbia Lottery Corporation (BCLC), a Canadian Crown Corporation. The slot machines used in the study were provided by the BCLC. The British Columbia Government and BCLC had no further involvement in the research design, methodology, conduct, analysis or write-up of the study, and impose no constraints on publishing. LC is the Director of the Centre for Gambling Research at UBC. LC has received speaker travel reimbursements/honoraria from the National Association of Gambling Studies (Australia) and the National Center for Responsible Gaming (United States), and academic consulting fees from Gambling Research Exchange Ontario (Canada) and the National Center for Responsible Gaming (United States). He has not received any further direct or indirect payments from the gambling industry or groups substantially funded by gambling. He has received royalties from Cambridge Cognition Ltd., relating to neurocognitive testing.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Murch, Ferrari, McDonald and Clark. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Flow States and Associated Changes in Spatial and Temporal Processing

Scott Sinnett<sup>1</sup>\*, Joshua Jäger<sup>2†</sup>, Sarah Morgana Singer<sup>2†</sup> and Roberta Antonini Philippe<sup>3</sup>

<sup>1</sup> Department of Psychology, University of Hawaii at Manoa, Honolulu, HI, United States, <sup>2</sup> Institute of Psychology, University of Bern, Bern, Switzerland, <sup>3</sup> Laboratoire PHASE, Institut des Sciences du Sport, Faculté des Sciences Sociales et Politiques, Université de Lausanne, Lausanne, Switzerland

Improved perception during high performance is a commonly reported phenomenon. However, it is difficult to determine whether these reported changes experienced during flow states reflect veridical changes in perceptual processing, or are instead related to some form of memory or response bias. Flow is a state in which an individual experiences high focus and involvement in a specific task, and typically experiences a lack of distractibility, a disordered sense of time, great enjoyment, and increased levels of performance. The present pre-registered study investigated 27 athletes and musicians using a temporal order judgement (TOJ) task before and after a sports or music performance over three sessions. Participants' flow experiences were surveyed in order to measure how modulations of flow over successive performances potentially modulate spatiotemporal perception and processing. Hierarchical linear modeling showed a positive moderation of subjectively experienced flow and performance on post-measures of a TOJ task. Specifically, the higher the subjective flow experience of the sport or music performance was rated, the better the participant performed in the post-performance TOJ task compared to the pre-performance TOJ task. The findings of the present study provide a more comprehensive explanation of human perception during flow at high level performances and suggest important insights regarding the possibility of modulated temporal processing and spatial attention.

Keywords: flow, temporal processing, spatial attention, hierarchical linear modeling, perception, sport, music, high performance

#### *Edited by:*

Jussi Palomäki, University of Helsinki, Finland

#### *Reviewed by:*

Bruno Gingras, University of Vienna, Austria; Paula Thomson, California State University, Northridge, United States

#### *\*Correspondence:*

Scott Sinnett ssinnett@hawaii.edu

†These authors have contributed equally to this work

#### *Specialty section:*

This article was submitted to Performance Science, a section of the journal Frontiers in Psychology

*Received:* 01 December 2019 *Accepted:* 18 February 2020 *Published:* 12 March 2020

#### *Citation:*

Sinnett S, Jäger J, Singer SM and Antonini Philippe R (2020) Flow States and Associated Changes in Spatial and Temporal Processing. Front. Psychol. 11:381. doi: 10.3389/fpsyg.2020.00381

# 1. INTRODUCTION

Several anecdotal claims regarding improved perception during flow states have been reported in various populations, yet it remains an open question as to whether there is a veridical change in perception, or if instead these reported improvements in perception are related to post-performance memory biases. As an example of this supposed improvement in performance, George Scott, a professional baseball player, stated in an interview: "When you're hitting the ball [well], it comes at you looking like a grapefruit. When you're not, it looks like a blackeyed pea" (Witt and Proffitt, 2005, p. 937). In an attempt to disentangle the question of whether perception is indeed modulated during optimum performance levels, Witt and Proffitt (2005) correlated performance when playing softball (i.e., batting averages) and the perceived size of a softball. According to Witt and Proffitt (2005), successful players perceived the ball to be bigger than less successful players, with this finding leading the authors to further claim that enhanced performance levels are indeed capable of modulating perception. Similar results have been reported with darts players, with throwing ability in darts influencing the perceived size of a target (Wesp et al., 2004; Cañal-Bruland et al., 2010), such that participants with better accuracy chose bigger circles corresponding to the size of the target than participants with lower accuracy. In addition, people who perform better in archery report seeing larger targets when compared to their weaker counterparts (Lee et al., 2012), and high-performing golfers (compared to weaker golfers) perceive the size of the cup to be larger (Witt et al., 2008). Moreover, distances are perceived as longer by people who are overweight (Sugovic et al., 2016), by people carrying a heavy backpack (Proffitt et al., 2003), or when throwing a heavy object toward a certain destination (Witt et al., 2004). See Witt (2019) for a review of action-specific effects on modulated spatial perception.

These types of performance-dependent modulations of perception extend to the temporal domain. For instance, Gray (2013) found that high-performing baseball players not only perceived the ball to be larger, but also rated its speed as slower. Additionally, tennis players who played better than other players perceived the ball to move more slowly (Witt and Sugovic, 2010). This evidence dovetails with the phenomenological experience of time slowing during threatening events (Arstila, 2012) as well as how the prospect of reward can affect the subjective perception of time (Failing and Theeuwes, 2016). Different approaches might explain how or why our subjective perception of time changes when performing well or when placed in fear-provoking circumstances. One possible explanation could derive from memory biases. For example, it is possible that players who perform poorly perceive the size of the target during the game as exactly the same as better players do, but recall the targets as smaller, perhaps as a means to justify their poor performance (Cooper et al., 2012). A memory bias could also lead to the perception that time seems to slow down, because richer than usual memories could later be improperly connected in such a way that they span a longer period than the experience on which they were actually based (Arstila, 2012). Furthermore, neurophysiological correlates of perceptual alterations have been identified (van der Kruijs et al., 2014) and might play a crucial role in such phenomenological modulations. In their investigation of dissociative reactions during traumatic events, Ursano et al. (2007) discuss the contribution of the cerebellum to perceptual alterations of time and space.

When conceptualizing the state of the literature that has explored the phenomenological experience of modulated perception (see for example Witt and Proffitt, 2005), it is important to note that intra-individual differences in perception that may result from different levels of performance are mostly not taken into account; instead, most investigations have simply focused on how better players compare to weaker players (e.g., players with better batting averages vs. worse batting averages). It is entirely possible that a better player might perceive the ball or a target to be larger than a weaker player does. This essentially equates to a measure of good vs. bad players, and is therefore uninformative with respect to the question of whether or not performing at a high level modulates perception. Furthermore, to the best of our knowledge most researchers have also failed to consider the subjective evaluation that an athlete might have regarding their own personal performance, which could be different from the objective evaluation of, for instance, averaging their hitting rate. With specific respect to Witt and Proffitt (2005), batting average was based on self-report and calculated on a relatively small number of attempts (1–2 games only, without the exact number of at-bat attempts reported). A longitudinal approach to investigating such phenomena is therefore needed.

In addition to investigating potential perceptual modulations that arise due to high performance, phenomenological experiences associated with flow experiences have also been robustly explored. The experience of flow (Jackson and Csikszentmihalyi, 1999; Engeser and Rheinberg, 2008) refers to high performance in a task (e.g., athletics, music, etc.) that often involves increased levels of focus until complete immersion occurs, attention that is not distracted by anything irrelevant, feelings of optimal challenges, and deep enjoyment (Csikszentmihalyi, 1975, 2000). Recent research suggests that flow can be characterized by nine different dimensions (Csikszentmihalyi, 2000, 2002): challenge-skill balance (demanding situations in which the individual is engaged but not overwhelmed to meet the challenge), clear goals that derive from the activity, unambiguous feedback that helps individuals to constantly adapt in order to achieve their goals, concentration on the task at hand (one's focus rests on the activity and is not distracted by irrelevant stimuli), action-awareness merging (total immersion in the activity), loss of self-consciousness (the individual's self-awareness and concerns regarding external evaluations decrease), increased sense of control (knowledge about the ability to keep things under control, if necessary), transformation of time (disordered perception of time), and autotelic experience (the activity is experienced as intrinsically rewarding). The first three (i.e., challenge-skill balance, clear goals, and unambiguous feedback) are required conditions for flow to occur, while the remaining items refer to the phenomenological characteristics frequently associated with flow. These dimensions have been studied in many different populations mainly using self-report approaches (Moneta, 2012; Swann et al., 2012; Chirico et al., 2015; Stamatelopoulou et al., 2018; Habe et al., 2019).

Of direct concern to the research conducted here, the notion that the perception of time can be modulated when in flow states has been frequently reported, although these reports are almost exclusively anecdotal. For instance, an elite track and field athlete claimed that "When I went to throw it [the javelin], it was like things were in slow motion, and I could feel the position I was in, and I held my position for a long time" (Jackson, 1995, p.82). This statement, and others regarding the altered perception of time, refers to the speed at which the passage of time is experienced (Thönes and Stocker, 2019).

A challenge for the claim that the perception of time slows down during flow states lies in the difficulty of disentangling the subjectively perceived experience of time from objective perception. It is unlikely that flow states would lead to (or arise from) a speed-up in neuronal communication, and the question is further muddled by the fact that attempts to measure perception during flow states would almost surely take the individual out of that state. As such, questions related to time perception during flow states are limited to a posteriori surveys; the underlying processes of any modulation in the perception of the passage of time therefore remain unknown (Wearden, 2015; Tanaka and Yotsumoto, 2017).

The aim of the present study is to investigate whether potential changes in temporal and spatial processing are modulated by increases in flow experiences. Possible modulations of temporal processing and spatial attention can be measured with a temporal order judgement (TOJ) paradigm, a task that has been widely used as a tool to measure temporal and spatial processing. By means of the TOJ task, two different values can be calculated: The just noticeable difference (JND) and the point of subjective simultaneity (PSS) (West et al., 2008; Lim and Sinnett, 2011). The former is a measure of temporal processing and refers to the smallest amount of time needed to accurately separate two stimuli 75% of the time, and thus be able to correctly identify the order of presentation. The latter is a measure of spatial attention and reflects the extent to which attention is distracted by a spatial cue, either peripheral (exogenous) or central (endogenous), such that the uncued side must be presented before the cued side in order for both stimuli to be perceived as having been presented simultaneously.

The cues in the TOJ task create a prior entry effect (Shore et al., 2001): Attended stimuli are perceived before unattended stimuli, showing that temporal processing is influenced by attention (Shore et al., 2001). By presenting such cues in the TOJ task prior to the onset of the first stimulus, attention should be, at least in theory, directed toward the cued side, resulting in the cued side being detected first, even when both items had been presented simultaneously. That is, if the left and right stimuli appear simultaneously, for example, the stimulus at the cued side will be perceived as having occurred first and the PSS would indicate a shift of attention toward the cued side (Shore et al., 2001). The shift might be greater for exogenous cues than for endogenous cues, due to increased volitional control over orienting effects for central cues (Shore et al., 2001). Notably, evidence has been observed that faster stimulus perception associated with the prior cue reliably results from the allocation of spatial attention and not from any potential response bias (Ulrich, 1987; Stelmach and Herdman, 1991; Shore et al., 2001; West et al., 2009).

The TOJ paradigm has been used in several situations as a viable approach for measuring perception. For example, Lim and Sinnett (2011) showed lower JND scores for musicians, suggesting better temporal discrimination in musicians than in controls. Similarly, modulations in visual attention were observed after extensive action video game play, with West et al. (2008) showing greater sensitivity to exogenous sensory stimuli and the potential that video game play modulates spatial attention. While neither of these studies considered whether temporal perception might be modulated when these groups of participants were in flow states, these modulations in information processing are nonetheless attributable to long-term experience and, on a neurological level, arguably to increased neuroplasticity. For instance, Granek et al. (2010) and Gong et al. (2016) used fMRI to show greater activity in the prefrontal cortex of video-game experts during complex non-gaming tasks, and increased functional integration between two critical neural networks for visual attention, namely the salience network and the central executive network, when compared to novices. These different brain patterns can potentially be explained by the demands on visual attention that are associated with video games. Using behavioral and electrophysiological measures, Qiu et al. (2018) investigated the effects of short- and long-term action video gaming on measures of visual attention. After a short session of playing an action video game, experts and novices showed performance improvements in visual attention, with experts outperforming novices before the session. Importantly, modulated electrophysiological measures in novices were found. These findings provide evidence for a correlation between plasticity of visual attention and action video gaming, even after a brief session of gaming.

In this study we explored whether such modulation of temporal processing and spatial attention can also be observed in the short term, depending on flow experience. Given the continuum of performance levels within any performer, flow experiences provide an ideal place to investigate how phenomenological experience might modulate perception. While this is clearly the case considering anecdotal evidence, it is unknown whether there are objective enhancements in perception when participants experience a higher feeling of flow compared with when they experience a lower state of flow. By measuring flow levels and temporal processing across multiple sport and music performances in practice or rehearsal sessions, we are able to address this question to an extent that, to the best of our knowledge, has not been achieved previously. Specifically, if individuals in a flow state do experience a slowdown in the perception of time, this should be correlated with improved temporal processing of stimuli (i.e., a smaller JND) when in a flow state compared to when they are not performing at that optimal level. Additionally, we extended this question to determine whether individuals in a flow state are less distracted by exogenous or endogenous cues, potentially suggesting enhanced spatial attention when performing in a higher flow state. The present study will help provide a more comprehensive explanation of modulated temporal processing and spatial attention during flow states, taking intra-individual differences into consideration.

## 2. MATERIALS AND METHODS

#### 2.1. Participants

Eleven athletes (mean age = 23.6, SD = 3.53, 4 female and 7 male) and 16 musicians (mean age = 20.8, SD = 3, 10 female and 6 male) from various sports and musical disciplines were recruited. One additional subject (athlete) was used for piloting and excluded from the analyses due to irregularities in the testing procedure and refinement of the experiment (e.g., increasing the number of repetitions of the TOJ task). The athletes had 14 (SD = 5.87) years of experience and practiced 18 (SD = 6.83) hours per week on average. The musicians had 8.8 (SD = 2.94) years of experience and practiced 8.5 (SD = 6.33) hours per week on average. Due to previous findings suggesting that skill level is correlated with the experience of flow (Catley and Duda, 1997; Engeser and Rheinberg, 2008), expertise was operationally defined as regular practice over several years in a particular discipline. Additionally, all athletes and musicians currently compete or perform at exceptional levels (e.g., NCAA Division II tennis players; performing musicians, etc.). The years of experience between athletes and musicians were significantly different, t(13) = 2.72, p = 0.017, as well as the weekly practice hours, t(20) = 3.65, p = 0.002. Altogether, 27 trained musicians and athletes (mean age = 21.9, SD = 3.47, 14 female and 13 male) participated. To cover a more general picture about flow across expertise types, a diverse sample of athletes (2 runners, 9 tennis players) and musician types (1 piano, 2 trumpets, 4 flutes, 2 clarinets, 2 saxophones, 1 bassoon, 1 oboe, 1 trombone, 1 tuba, 1 percussion) participated. The study was approved by the University of Hawaii at Manoa's committee on human subjects (CHS). All participants provided written informed consent before beginning the study. In order to compensate for their time, participants were offered the opportunity to participate in a mental preparation seminar held by one of the authors (RAP). 
Due to drop outs, altogether five experimental runs of two participants are missing.

#### 2.2. Task

The experimental task consisted of a temporal order judgment (TOJ) task, adapted from Lim and Sinnett (2011), designed to measure temporal processing and spatial attention. Two versions of the TOJ task with different conditions were presented in separate blocks, one with exogenous cues and the other with endogenous cues (**Figure 1**). The TOJ tasks were presented successively in separate sessions and counterbalanced blocks. Approximately half of the participants started with the endogenous condition. Each trial started with a fixation cross in the middle of the screen flanked by two placeholder squares. The distance from the outer end of each placeholder square to the fixation cross was 5.4 cm. After 1,000 ms, either an exogenous or an endogenous cue was displayed for 45 ms. Exogenous cues were created by thickening placeholder squares to 4 pixels, whereas endogenous cues consisted of a central arrow (1.2 cm). Following the appearance of the cue with a delay of 45 ms, the first target (horizontal or vertical line) was displayed in either the left or right placeholder square. The other target appeared in the other placeholder square after a specified stimulus onset asynchrony (SOA). Target orientation and appearance side were presented with the same probability of occurrence. The targets (1.2 cm) appeared within the placeholder squares (1.6 × 1.6 cm). Participants then made a forced choice on the keyboard to indicate on which side they perceived the target to appear first. To determine the SOAs for each trial, a 1-up-3-down adaptive staircase approach (Cornsweet, 1962) was used. Each block started with an SOA of 267 ms. Depending on whether three correct responses or one incorrect response were given, the SOA would respectively decrease or increase by 16.7 ms on the subsequent trial. Each block was finished after 14 reversals. Altogether, the TOJ task lasted ∼7–10 min. The experiment was programmed and run using the software PsychoPy 3.0 (Peirce et al., 2019).
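The staircase rule described above can be illustrated with a minimal Python sketch. It is not the authors' PsychoPy implementation: the function name `run_staircase`, the `respond` callback, the lower bound `min_soa`, and the definition of a reversal as any change in step direction are assumptions made for illustration only.

```python
def run_staircase(respond, start_soa=267.0, step=16.7, n_down=3,
                  max_reversals=14, min_soa=16.7, max_trials=1000):
    """Sketch of a 1-up-3-down staircase over SOA values (in ms).

    respond(soa) is a hypothetical callback returning True for a correct
    order judgment. min_soa and max_trials are assumptions, not values
    taken from the article.
    """
    soa = start_soa
    correct_streak = 0
    last_direction = 0          # -1 = last step went down, +1 = up, 0 = none yet
    reversals = []              # SOA values at which the direction changed
    history = []                # (soa, correct) for every trial

    while len(reversals) < max_reversals and len(history) < max_trials:
        correct = respond(soa)
        history.append((soa, correct))

        direction = 0
        if correct:
            correct_streak += 1
            if correct_streak == n_down:   # three correct in a row: harder
                direction = -1
                correct_streak = 0
        else:                              # one error: easier
            direction = +1
            correct_streak = 0

        if direction != 0:
            if last_direction != 0 and direction != last_direction:
                reversals.append(soa)
            last_direction = direction
            soa = max(min_soa, soa + direction * step)

    return history, reversals


# Usage with a deterministic toy observer who is correct above 105 ms.
history, reversals = run_staircase(lambda soa: soa >= 105.0)
```

With a real observer the responses are probabilistic, so the SOA oscillates around the level at which three consecutive correct responses are about as likely as a single error.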

#### 2.3. Questionnaire

To measure flow states, the Activity Flow State Scale (AFSS) (Payne et al., 2011a) was used. The AFSS captures the 9 dimensions of flow [Merging actions and awareness (MAA); Clear goals (CG); Concentration on task at hand (CO); Unambiguous feedback (UF); Challenge skill balance (CS); Transformation of time (TT); Sense of control (CN); Loss of self-consciousness (SC); Autotelic experience (AE)] according to Csikszentmihalyi (2000, 2002) with 26 statements. It has a high reliability with Cronbach's alpha coefficients for the 9 subscales ranging from 0.71 to 0.90 (Payne et al., 2011a). It has been shown to measure flow in different populations and was specifically constructed to measure flow across a wide range of activities (Payne et al., 2011b; Osin et al., 2016). The items are rated on a 5-point Likert scale, ranging from 1 (strongly disagree) to 5 (strongly agree). A global flow score (the mean of all items) was computed for each participant.

#### 2.4. Design

The experiment was divided into three sessions conducted ∼2–5 days apart (**Figure 2**). In each session, all participants completed a pre-TOJ task, including both endogenous and exogenous conditions, a sports or music performance (practice or rehearsal sessions), a post-TOJ task, again including endogenous and exogenous conditions, a control question and the questionnaire (AFSS) at the end of each session.

#### 2.5. Procedure

Participants were recruited according to their level of expertise. All sessions took place next to the participants' practice environment to ensure high ecological validity, and were managed by at least two researchers at a time. In the first session, participants were asked to provide informed consent, and then filled out a questionnaire collecting demographic information, including information about years of experience in their specific discipline and weekly practice amounts. The participants were then seated ∼60 cm from the monitor. The investigation was set up so that a maximum of six people could participate at the same time. Each session started with a written introduction to the upcoming task and a reminder to complete the task as quickly and accurately as possible. As soon as the participants were familiar with the task and felt ready, they began the pre-TOJ task by pressing the space bar on the keyboard. The task was divided into 2 blocks with a short break in between. Each block included either the exogenous or the endogenous condition, each lasting ∼3–5 min. After the TOJ task, the participants completed their sports or music performance (practice or rehearsal sessions), which lasted between 30 min and 2 h. The duration of the practice or rehearsal sessions depended on the usual practice schedule of the participants. Immediately after the practice or rehearsal session, the participants completed the post-TOJ task (same procedure as the pre-TOJ task) followed by the self-report questionnaire (AFSS) about their flow state during the sports or music session, altogether lasting around 10 min.
Prior to beginning the AFSS, a control question rated from 1 (strongly disagree) to 5 (strongly agree) was inserted in order to assess whether participants' state, emotions, and body feelings were about the same during the practice or rehearsal session and the post-TOJ task; it was intended to gauge whether the participants still experienced flow while doing the post-TOJ task. The task and the questionnaire were presented on 6 different 13.3" laptop computers, each with a refresh rate of 60 Hz.

#### 2.6. Analysis

To carry out the analyses reported in this study, we used Microsoft Excel 2019 (Microsoft Corporation, 2019) and the statistical software R version 3.5.3 (R Core Team, 2019). The logistic regressions were performed using the R package "quickpsy" (Linares and López i Moliner, 2016). To fit the hierarchical linear models (HLMs), the R package "lme4" was used (Bates et al., 2015). To fit the Bayesian HLMs, the R package "brms" was used (Bürkner, 2016). The brms package implements Bayesian HLMs in R using the probabilistic programming language "Stan" (Carpenter et al., 2017) under the hood, with an lme4-like syntax. Sorensen and Vasishth (2015) provide a detailed and accessible introduction to Bayesian HLMs applied to cognitive science using Stan. The "loo" package (Vehtari et al., 2017) was used to compare the Bayesian models.

To estimate the JNDs and PSSs for each participant, we used a procedure similar to that of Lim and Sinnett (2011). First, data from each participant were separated into endogenous cued trials, endogenous left or right cued trials, exogenous cued trials, and exogenous left or right cued trials. A logistic regression model was then fitted to each cue type for each participant's run, resulting in 36 models per participant, namely 12 JND scores (3 endo pre, 3 exo pre, 3 endo post and 3 exo post) and 24 PSS scores (3 endo pre right, 3 endo pre left, 3 exo pre right, 3 exo pre left, 3 endo post right, 3 endo post left, 3 exo post right and 3 exo post left). Two measures were then calculated for each participant's run. First, the JNDs were calculated independently for the endogenously and exogenously cued trials by taking the SOAs corresponding to 0.75 and 0.25 proportions, and then halving the distance between these SOAs (**Figure 3**). This halved distance is the specific JND for each condition (endogenous/exogenous) within participants for testing (pre/post) and session (1, 2, 3). Second, the PSSs were taken from the models of the right and left cued endogenous and exogenous trials for the SOAs corresponding to 0.50 proportion for right first responses (**Figure 4**). This is the point of maximal uncertainty, representing chance performance, and statistically, the PSS. To compare the PSS scores between the pre- and post-tasks, the distances between the left and right PSS were calculated separately for endogenous and exogenous pre and post measurements.
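As a worked illustration of how the two measures follow from a fitted psychometric function, the sketch below assumes a two-parameter logistic curve, P(right first) = 1 / (1 + exp(−(b0 + b1·SOA))), with positive SOAs meaning that the right stimulus led. The parameterization, the sign convention, and the function names are assumptions for illustration; the article's fits were obtained with the R package quickpsy, whose internal parameterization may differ.

```python
import math


def soa_at_proportion(p, b0, b1):
    """SOA (ms) at which the fitted logistic curve reaches proportion p."""
    return (math.log(p / (1.0 - p)) - b0) / b1


def jnd(b0, b1):
    """Half the distance between the SOAs at the 0.75 and 0.25 proportions."""
    upper = soa_at_proportion(0.75, b0, b1)
    lower = soa_at_proportion(0.25, b0, b1)
    return (upper - lower) / 2.0


def pss(b0, b1):
    """SOA at the 0.50 proportion (point of subjective simultaneity)."""
    return soa_at_proportion(0.50, b0, b1)


# Hypothetical coefficients for one left-cued and one right-cued fit.
pss_left = pss(0.5, 0.05)
pss_right = pss(-0.5, 0.05)
pss_distance = abs(pss_left - pss_right)   # left-right PSS distance
```

Under this parameterization the JND reduces to ln(3)/b1, so it depends only on the slope of the curve, while the PSS equals −b0/b1 and reflects the horizontal shift induced by the cue.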

All estimated JNDs outside a range of 2.5 absolute deviations around the median (Leys et al., 2013) were excluded, due to response error rates and non-convergence of the algorithm fit, resulting in 39 data points being excluded (∼12.4%). With regard to the PSS, the distances between the left and right cued PSSs of the pre task were compared to the distances between the left and right cued PSSs of the post task, separated by session. All responses outside a range of 2.5 absolute deviations around the median (Leys et al., 2013) of the PSS pre-post difference were excluded, due to response error rates and non-convergence of the algorithm fit, resulting in the exclusion of 37 data points (∼11.8%).
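The exclusion rule can be sketched as follows. This is an illustration rather than the authors' script: the function name `mad_filter` is hypothetical, and the scaling constant 1.4826 (which makes the median absolute deviation comparable to a standard deviation under normality) is assumed here as the conventional choice; the article does not state which constant was used.

```python
from statistics import median


def mad_filter(values, threshold=2.5, scale=1.4826):
    """Split values into (kept, excluded) using a median-based criterion.

    A value is excluded when its absolute deviation from the median
    exceeds threshold * MAD, where MAD is the scaled median absolute
    deviation. The scale constant is an assumption (see lead-in).
    """
    med = median(values)
    mad = scale * median([abs(v - med) for v in values])
    if mad == 0:                      # no spread: nothing can be flagged
        return list(values), []
    kept = [v for v in values if abs(v - med) / mad <= threshold]
    excluded = [v for v in values if abs(v - med) / mad > threshold]
    return kept, excluded


# Hypothetical JND estimates (ms); the last one reflects a failed fit.
kept, excluded = mad_filter([10, 11, 12, 11, 10, 100])
```

Because both the center and the spread are medians, a single extreme estimate barely affects the cut-off, which is the main reason for preferring this rule over a mean ± SD criterion.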

In order to test the pre-registered hypothesis of whether flow has an influence on the JND, hierarchical linear models (HLMs) were calculated. These models estimate the influence of flow on the change (moderation) of the JND from pre- to post-testing. HLMs account for data hierarchies as observed in repeated measurement designs. In our study, repeated points of measurement are nested within sessions and within participants, and this data dependency can be recognized by HLMs.

measurements were taken for the endogenous right cued items (red), endogenous left cued items (dark red), exogenous right cued items (blue), and exogenous left cued items (dark blue) for the SOAs corresponding to 0.50 proportion (see gray segments for endogenous cued items and black segments for exogenous cued items). Notable is the larger gap between right cued items and left cued items for exogenous trials compared to endogenous (reflecting larger distraction by peripheral cues).

In order to test whether the mean values of the nested groups differ and whether the differences justify a three-level model structure, a fully unconditional model was created. To test if there is a general difference in the JND from the pre- to post-testing, a predictor was added to the model to estimate this relationship. Additionally, we included condition (endogenous vs. exogenous) as a predictor in the model to test the general contrast between these two conditions. Lastly, the interaction of the difference from pre- to post-testing and flow as a predictor was included in the model. A significant fixed interaction effect between flow and the pre- to post-testing TOJ would indicate a moderation of the change in the JND by flow from the pre- to the post-testing, and therefore a relationship between flow and the JND.
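One way to write down the model described above is the following three-level random-intercept specification. It is a sketch under stated assumptions, not the authors' exact formula: the symbols, the inclusion of a main effect of flow, and the normality of the random terms are assumptions made for illustration. Here $t$ indexes measurements, $s$ sessions, and $p$ participants.

$$
\begin{aligned}
\mathrm{JND}_{tsp} ={}& \beta_0 + \beta_1\,\mathrm{Post}_{tsp} + \beta_2\,\mathrm{Cond}_{tsp} + \beta_3\,\mathrm{Flow}_{sp} \\
&+ \beta_4\,(\mathrm{Post}_{tsp} \times \mathrm{Flow}_{sp}) + u_p + v_{sp} + \varepsilon_{tsp}, \\
u_p \sim{}& \mathcal{N}(0, \sigma_u^2), \quad v_{sp} \sim \mathcal{N}(0, \sigma_v^2), \quad \varepsilon_{tsp} \sim \mathcal{N}(0, \sigma_\varepsilon^2).
\end{aligned}
$$

In this notation, $\beta_1$ is the general pre- to post-testing difference, $\beta_2$ the contrast between conditions, and $\beta_4$ the moderation of the pre- to post-testing change by flow. For the fully unconditional model (all $\beta$ except $\beta_0$ set to zero), the share of variance attributable to participants is $\sigma_u^2 / (\sigma_u^2 + \sigma_v^2 + \sigma_\varepsilon^2)$ and the share attributable to sessions is $\sigma_v^2 / (\sigma_u^2 + \sigma_v^2 + \sigma_\varepsilon^2)$.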

Stegmueller (2013) has shown that frequentist approaches to modeling HLMs can be unreliable with relatively small sample sizes, whereas Bayesian probabilistic models appear to be more robust. In particular, the Bayesian probabilistic approach shows considerably better properties with regard to the estimation of confidence intervals. Therefore, due to our relatively small sample size, an additional probabilistic model, from a Bayesian perspective, was estimated for the main analysis of the dependent variable JND to support the results. An interaction term of pre- to post-testing and flow which does not include zero in its 95% credible interval would indicate an effect of flow on the JND.

#### 2.6.1. Exploratory Analyses

In order to investigate whether the PSSs are influenced by the amount of experienced flow, HLMs for the dependent variable PSS were estimated in the same way as for the dependent variable JND. In order to test whether the JNDs and PSSs of the musicians differ from those of the athletes, a predictor that estimates this distinction was calculated. To test whether the relationship between the JND or PSS and flow is dependent on the response to the control question, an additional interaction term that estimates this relationship was built into the models.

TABLE 1 | Number of items, mean value, standard deviation, and Cronbach's α for the nine subscales of the AFSS.

n = 78; MAA, merging actions and awareness; CG, clear goals; CO, concentration on task at hand; UF, unambiguous feedback; CS, challenge skill balance; TT, transformation of time; CN, sense of control; SC, loss of self-consciousness; AE, autotelic experience.

## 3. RESULTS

Across all three sessions, the mean of the global flow score as reported on the AFSS was 3.43 (out of 5), ranging between 1.58 and 4.85, suggesting that our participants did experience a broad range of flow levels. The Cronbach's alpha coefficient for the AFSS was α = 0.95. The mean values, standard deviations and Cronbach's alpha for the nine subscales can be found in **Table 1**. The mean answer on the 5-point Likert scale to the control question [i.e., "My state (body feelings, emotions and thoughts) during the task on the computer was similar to the state I had during my music/sports performance."] was 3.33 (SD = 1.09), suggesting that participants tended to affirm the statement.
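For readers unfamiliar with the reliability coefficient reported above, the following sketch computes Cronbach's alpha from a respondents-by-items score matrix using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the sum score). The function name and the toy ratings are hypothetical; the article does not describe how its coefficient was computed.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row of item ratings per respondent)."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the per-respondent sum score.
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical ratings on two items from three respondents.
alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
```

Alpha approaches 1 when items covary strongly, so that the variance of the sum score is much larger than the summed item variances.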

#### 3.1. Analysis of the JND

A model that estimates the variance explained by the nested groups showed that the subject effect (level 3) was responsible for 15% of the explained variance, and 38% of the JND variance was explained by session effects (level 2). This suggests the necessity of using a three-level model structure (pre-/post-testing nested in sessions within subjects).

When including pre-/post-testing of the JND as a fixed coefficient in the model, representing the estimated general difference between the measured JNDs before the practice or rehearsal session and after the practice or rehearsal session, the model's fit was significantly improved compared with the fully unconditional model, χ²(1) = 4.82, p = 0.03. When including the factor pre-/post-testing as a random coefficient in the model (M1.1), there was no significant improvement in the model fit, χ²(4) = 6.72, p = 0.15.
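The model comparisons reported in this section are likelihood-ratio tests, in which the test statistic is referred to a chi-square distribution whose degrees of freedom equal the number of added parameters. As a hedged illustration, the sketch below evaluates the upper-tail probability with closed-form expressions that hold for 1 to 4 degrees of freedom; the function name `chi2_sf` is hypothetical, and in practice these values would come from the statistical software.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable, df in 1..4."""
    if x < 0:
        raise ValueError("x must be non-negative")
    half = x / 2.0
    if df == 1:
        return math.erfc(math.sqrt(half))
    if df == 2:
        return math.exp(-half)
    if df == 3:
        return (math.erfc(math.sqrt(half))
                + math.sqrt(2.0 * x / math.pi) * math.exp(-half))
    if df == 4:
        return math.exp(-half) * (1.0 + half)
    raise ValueError("closed forms are only provided for df = 1..4")


# p-values for the two comparisons reported in the paragraph above.
p_fixed = chi2_sf(4.82, 1)    # adding pre-/post-testing as a fixed coefficient
p_random = chi2_sf(6.72, 4)   # adding it as a random coefficient
```

For example, with 2 degrees of freedom the tail probability is simply exp(−x/2), so a statistic of 3.64 corresponds to exp(−1.82) ≈ 0.16.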

When the distinction between the exogenous and endogenous conditions was included as an additional fixed coefficient in our model to test for possible differences between these two conditions, the model's fit improved significantly, χ²(1) = 19.66, p < 0.001. There was no significant improvement in the model fit, χ²(3) = 7.73, p = 0.052, when including this predictor as a random coefficient.

TABLE 2 | Estimated coefficients of the JND-model.

CI, 95% Confidence Interval; Flow, Mean-centered Values.

To test whether flow is a moderator for the change of the JND between the pre- and the post-testing, an interaction term of the variable pre/post and flow was added as a fixed coefficient to the model (the coefficients of the model can be seen in **Table 2**). The interaction between the pre-/post-testing and flow amounts to −5.8 ms, SE = 3.02, 95% CI [−11.71 to 0.12], t(206) = −1.92, p = 0.055. The model fit did not significantly improve when we added the interaction as a fixed coefficient, χ²(2) = 3.64, p = 0.16.

The estimated interaction of pre/post, flow and the condition (exogenous/endogenous) was not significant (p = 0.27), suggesting that there is no difference between the interaction of the pre-/post-testing and flow for the endogenous condition when compared with the exogenous task.

With regard to the JND, the interaction between pre/post and flow does not depend on the answer to the control question (p = 0.85). Thus, the connection between the values of the pre-/post-testing and flow is not stronger if the control question is answered higher.

#### 3.1.1. Probabilistic Modeling of the JND

The estimated interaction coefficient of flow and the difference from pre- to post-testing within the Bayesian model is −5.75 ms, SE = 3.04, 95% Credible Interval [−11.82 to −0.02]. The 95% credible interval can be interpreted to mean that there is a 0.95 probability that the value of the interaction coefficient lies between −11.82 and −0.02 ms. This effect is visualized in **Figure 5**. Low values of flow indicate a negative difference between the pre- and the post-testing and a high value of flow shows a positive difference between the pre- and the post-testing. It should be noted that the probabilistic estimate differs from the frequentist estimate in terms of whether zero is included in the credible or the confidence interval, respectively. The results of the estimates therefore do not seem to agree entirely on the extent to which the effect can be regarded as significant. Furthermore, a model which includes the interaction term of pre- to post-testing and flow differs only slightly from a model which does not include this predictor in terms of its predictive accuracy (elpd\_diff = −0.2), and the difference is not larger than the standard error of these estimations (se\_diff = 0.3). This indicates that a model which includes the interaction term of pre- to post-testing and flow does not make a substantial contribution to the prediction of new data.

#### 3.2. Analysis of the PSS

A model that estimates the explained variance by the nested groups shows that 13.1% of the variance of the PSS was explained by participants. This indicates the necessity of a two-level structure (pre-/post-testing nested within subjects). A three-level structure (nesting the values of the PSS in sessions within participants) leads to an over-fitting of the estimated model, and therefore was not considered.

To test whether flow is a moderator for the change of the PSS between pre- and post-testing, an interaction term of the variable pre/post and flow was added to the model. As can be seen in **Table 3**, the estimation shows that a one unit change in the value of "flow" moderates the change from the pre- to the post-testing by −21.3 ms, SE = 7.28, 95% CI [−35.58 to −7.05], t(227) = −2.93, p < 0.01. Essentially, this means that for every unit increase/decrease in experienced flow the PSS improved/diminished by 21 ms, respectively, indicating improved control of spatial attention with increased flow. This model also significantly accounted for the explained variance compared to a model without this interaction, χ²(3) = 9.82, p = 0.02. Adding the effect of the interaction as a random effect leads to an over-fitting of the data.

Probabilistic modeling of the data supports these results and estimates a coefficient of −20.9 ms, SE = 7.34, 95% CI [−35.06 to −6.41], for the interaction of flow and the change from pre- to post-testing. **Figure 6** shows the relationship from pre to post PSS changes depending on flow. Lower flow values are associated with an increase in PSS values and high flow values are associated with a decrease in PSS values.

TABLE 3 | Estimated coefficients of the PSS-model.

CI, 95% Confidence Interval; Flow, Mean-centered Values.

Including the interaction of pre/post, flow, and condition (exogenous/endogenous) to test if the moderation depends on the condition, there is neither an improvement of the explained variance [χ²(3) = 1.32, p = 0.72] nor a significant three-way interaction [−13.93, SE = 14.50, 95% CI [−42.35 to 14.50], t(226) = −0.96, p = 0.34]. This indicates that the moderation of the PSS by flow remains the same under both endogenous and exogenous conditions.

With regard to the PSS, the interaction between pre/post and flow does not depend on the answer to the control question (p = 0.98), suggesting that the connection between the values of the pre-/post-testing and flow is not stronger if the control question is answered higher.

#### 3.3. Effects of the Domain on the JND and the PSS

When directly comparing the performance between musicians and athletes (Domain), there are no significant differences in the JND scores (p = 0.53). However, the rate of change from the pre- to the post-testing differs significantly by 11.3 ms, SE = 4.82, 95% CI [1.89–20.79], t(202) = 2.35, p = 0.020. Athletes thus show an 11.3 ms larger change from the pre- to the post-measurement compared to musicians. Regarding the PSS, no difference between musicians and athletes was observed, either in general (p = 0.83) or in their rate of change from pre- to post-testing (p = 0.57).

## 4. DISCUSSION

Phenomenological differences in perception while performing a demanding task have been reported on several occasions. To the best of our knowledge, this is the first study to address the question of whether increased flow experience can improve perception, with the aim of investigating the relationship between modulated temporal and spatial visual processing at different time points and flow states in a within-subjects design. Of critical importance, the participants in this experiment participated over several experimental sessions, thereby enabling the measurement of a range of self-perceived flow states from which we can assess whether increased self-reported flow correlated with improved spatial and temporal processing, as measured by the TOJ task. This controls for the possibility that previous research addressing similar questions simply measured perceptual performance between more skilled and lesser skilled participants: Wesp et al. (2004) demonstrated that accomplished darts players perceived the target to be larger than novices with lower abilities, Witt and Proffitt (2005) suggested that softball players who were successful at hitting recalled the ball to be bigger than players with less success, Lee et al. (2012) claimed that people who perform better in archery see the target as larger than weaker archers, and Witt et al. (2008) showed that golfers with high performance perceived the size of the cup to be bigger compared to lower performing golfers. Furthermore, possible short-term modulations in temporal perception have been reported: the speed of the ball in baseball (Gray, 2013) or in tennis (Witt and Sugovic, 2010) was perceived to be slower among better players. Nevertheless, these studies failed to take into account the intra-individual differences of the participants; it is therefore difficult to claim that flow states (i.e., when a participant is performing optimally) do lead to improved perception.
In the present study, intra-individual differences in flow state were repeatedly measured, with our findings suggesting a flow state dependent correlation instead of a person dependent correlation.

The results of the present pre-registered study indicate a relationship between the value of experienced flow and spatiotemporal information processing. The improved performance was manifested in improved temporal perception (i.e., reduced JND scores) and significantly improved spatial attentional control (i.e., reduced PSS scores) when flow states were highest. In a recent longitudinal experiment by Cowley et al. (2019), where flow was induced with the help of a video game-like high-speed steering task, possible trial-wise fluctuations of performance due to flow were found, suggesting that performance was enhanced when participants experienced increased flow. These results dovetail with our findings, and might likewise indicate a short-term modulation of information processing and performance by flow.

# 4.1. Temporal Processing and Flow

The results indicate a positive correlation between the value of experienced flow and temporal processing. Specifically, when participants reported higher flow states in the practice or rehearsal session, they also performed better in the post-performance TOJ task compared with the pre-performance TOJ task, by approximately 5.8 ms per one-unit change of flow. However, the results do not give a clear picture of the significance of the effect of flow on the JND. Although the probabilistic models demonstrated that a coefficient of zero is not in their credible interval, the effect does not seem to contribute significantly in terms of predicting new data. In addition, the frequentist model, although clearly pointing in the direction of the alternative hypothesis, did not show a significant effect, and the interaction term did not improve the fit of the model. Because the data were collected in the field, it can be assumed that a high level of noise in the data contributed to wide confidence intervals (and slightly smaller credible intervals), which makes it difficult to provide a reliable statement about the effect of flow on the JND and its strength.

On average, across all participants and independent of reported flow levels, the JND generally improved (decreased) by approximately 5 ms from pre- to post-testing. Although no significant overall difference in the JND was found between athletes and musicians, there was a difference between the two domains in the general rate of change from pre- to post-testing. Specifically, athletes' JND scores deteriorated by 11 ms compared to musicians. This means that the musicians' JND scores improved while the athletes' performance weakened. The difference in the rate of change between the two domains may be explained by the fact that athletes experience increased fatigue (compared to musicians) due to their physical performance, while musicians, who are not fatigued in the same way, might benefit more from learning effects and increased concentration. However, these assumptions are purely speculative and the underpinning reason for these differences should be explored in further research. These differences were not observed in relation to the PSS.

# 4.2. Spatial Attention and Flow

With regard to spatial attention, it appears that increased flow experiences resulted in significant improvements in the PSS. More precisely, PSS improved by approximately 21 ms for each one-unit increase in reported flow, suggesting that spatial attention was indeed modulated during flow experiences. It is important to note that this improvement could in fact be underestimated given that performance-specific demands on spatial attention are arguably lower for musicians than for athletes (although musicians are required to divide attention between their own performance and musical notes, and also the conductor and fellow musicians). Future research could focus specifically on how flow experiences might affect attentional/perceptual mechanisms that are more tightly coupled with differing music/sport-specific demands. Similar to Lim and Sinnett (2011), larger overall PSS scores (64 ms difference) were observed for exogenous than for endogenous trials, which demonstrates that exogenous stimuli have a greater impact on attention. With direct respect to the perception of central objects (i.e., endogenous trials), it is likely that attentional focus is increased during flow experiences. This effect is reflected by participants' PSS scores, which indicate that a high level of flow reduces the distractibility from the central cues, resulting in a lower PSS.

#### 4.3. Theoretical Implications

Attention plays a critical role in how time is experienced. Consider how a pleasant event (e.g., an excellent film or book) makes the subjective time seem to pass more quickly compared to something less pleasant (e.g., a boring lecture or perhaps this article for some). That is, the perception of time can be modulated depending on one's focus during the task (Phillips, 2012; Wearden, 2015). Arguably, the basis for time perception is rooted in the ability to process temporal order, with temporal resolution allowing for the successful recognition of stimulus order or simultaneity (Thönes and Stocker, 2019). Yet, the question remains as to how exactly the phenomenological perception of the passage of time during flow states and the processing of temporal information are related.

Tse et al. (2004) might offer an alternative theoretical framework for the subjectively perceived slowing of time. These authors found that during unexpected events attention is arguably highly engaged, potentially leading to an increase in the amount of information processed during that time and to a subjectively experienced expansion of time. The authors used an oddball paradigm, in which participants were required to respond to a low-probability target that occurred within a series of high-probability stimuli. Participants were required to judge the duration of the presented low-probability target and decide whether it lasted longer or shorter than the standard items. The authors found that unexpected stimuli that were in fact presented for a shorter amount of time than the standard stimuli were judged to have been presented for the same length of time, suggesting that the engagement of attention for unexpected stimuli potentially leads to an increase in the amount of processed information, and in subjectively perceived time. Essentially, more information is extracted from an unexpected signal. Arstila (2012) further posits why such faster rates of information processing lead to the experience of time as moving more slowly than usual. Specifically, Arstila suggests that one component of the perception of time is determined by the speed of things in the external world. This re-afferent system plays a crucial role in determining the time that we perceive. The experience of acting faster than usual also implies that external objects might be slower than usual. Moreover, we might realize that we are able to shift our attention from one stimulus to another more quickly than usual and therefore, our re-afferent system provides us with information that our internal processes are faster than normal. Applying this logic to our study would suggest that participants who experienced a high flow state (in which attention plays a crucial role) processed more temporal information during the post TOJ task than in the pre task, and therefore, their JND scores improved (i.e., reduced).

One possible alternative explanation for our results could simply be that participants were subject to learning effects because of the repeated measures design. This learning effect could manifest itself within a session, between the pre- and the post-testing, or across subsequent sessions. The change between the sessions is embedded in the hierarchical structure of the model, which accounts for the variance of this random effect. While a general modulation of JND does occur over sessions, these same differences were not observed for PSS. Moreover, it should be noted that the main interest of this study was not in the general effect of a single performance on the JND, which would be reflected in the change from pre- to post-testing, but in whether this change is moderated by flow, which the data suggest is the case.
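The moderation question described above can be written schematically as a hierarchical model with an interaction term. The following is only an illustrative sketch: the indices, the random-effect terms $u_i$ and $v_{ij}$, and the coding of the variables are assumptions made here for exposition, not the specification fitted in the study.

$$
\mathrm{JND}_{ijt} = \beta_0 + \beta_1\,\mathrm{post}_t + \beta_2\,\mathrm{flow}_{ij} + \beta_3\,(\mathrm{post}_t \times \mathrm{flow}_{ij}) + u_i + v_{ij} + \varepsilon_{ijt}
$$

Here $i$ indexes participants, $j$ sessions, and $t$ the pre- or post-performance measurement ($\mathrm{post}_t = 0$ or $1$). In this notation $\beta_1$ captures the general pre-to-post change (which includes any within-session learning effect), the session-level term $v_{ij}$ absorbs variation across sessions, and $\beta_3$ is the moderation by flow, corresponding to the reported change of roughly 5.8 ms in the pre-to-post JND difference per one-unit change of flow.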

# 5. LIMITATIONS AND FUTURE RESEARCH

At this point it is impossible to make any statements regarding the direction of causality, that is, whether flow leads to improved perception, or if instead improved perception leads to flow. Thus, the question remains unanswered as to whether participants who experienced flow during the practice or rehearsal session therefore show enhanced information processing, or if participants who show enhanced information processing therefore experience higher flow states. While this concern should be partially alleviated given that our findings do imply that perception improves as flow increases within our participants, this is a question that should be elucidated by further research. Randomized controlled trials and experiments in which flow can be actively manipulated could provide insights into the causal relationships between flow states and spatiotemporal processing. Shehata et al. (2018) present a possibility to actively manipulate flow states and to induce them with the help of a "Music Rhythm Game," which arguably modulates flow via the difficulty of the game (from "boredom" to "overload"). The difficulty here is, of course, whether flow states induced in simple but controllable environments are similar to those experienced in self-chosen multifaceted sports and music environments. Another limitation that could be tackled with such a task is that the flow-inducing sports or music performance took place between the TOJ measurements and not at the same time.

An additional aspect of the current study that could be expanded in future research has to do with the inclusion of both athletes and musicians who play different sports and instruments, respectively. While this did result in differences between groups in terms of years of experience and training regimens, it should be noted that when including group as a factor no major differences were observed in the findings. We included different activities (i.e., athletes and musicians) in an attempt to more broadly explore how flow experiences from varying performance types might affect perception. Future research could consider exploring specific activities (e.g., only tennis players or only pianists) in order to see if perception is differently modulated based on the flow that is experienced from different activities.

Despite the inclusion of a control question to determine whether the state was approximately the same during the practice/rehearsal session and the post task, and although the participants tended to agree with the control question, it is difficult to conclude that the flow-state experience was truly transferred to the post TOJ task. It was expected that there would be a stronger correlation between flow and the change in performance from pre- to post-measurement when the control question received higher ratings. However, this was not the case. This could possibly be explained by the fact that the control question might not adequately assess whether the flow state of the participant extended to the TOJ task. In addition, although we attempted to keep the amount of time between the music/sport session and testing as short as possible, memory biases could have influenced our results in that we presented the control question and the flow questionnaire at the end of the experimental trial and the items were answered retrospectively. In future investigations it would certainly be interesting to develop a task where flow and perception can be measured simultaneously, so as to completely avoid any potential memory confound. This could potentially be accomplished by using physiological measures, such as skin conductance, pulse rate (Tozman et al., 2015), brain activity or oculomotor indicators (Shehata et al., 2018). Knierim et al. (2018) provide an overview of peripheral nervous system indicators of flow. Furthermore, aspects that, in addition to the flow state, could also lead to an improvement in spatiotemporal processing, such as increased alertness after enhanced performance, should be controlled in further studies.

Because the testing took place in the field, the data are subject to a high level of noise, and the calculated models therefore have difficulty estimating accurate confidence/credible intervals. This limitation is notable as it potentially impedes an accurate determination of the effect size. For instance, and with particular regard to the pre-registered hypothesis about the moderation of the JND, no statistically significant and unambiguous answer could be found.

Lastly, even if the results of the PSS modeling appear promising for understanding whether perception is modulated during flow states, they were obtained in the form of an exploratory analysis and should be confirmed by a pre-registered confirmatory analysis.

#### 6. CONCLUSION

To the best of our knowledge the present study is the first to provide evidence that subjectively experienced improvements in perception during flow states are related to improved temporal and spatial visual processing. In this study, self-reported flow states and perceptual processing, as measured by a TOJ task, were obtained over several time points. The correlation between flow and the JND scores indicates improved temporal processing during flow, while the correlation between flow and the PSS scores indicates significantly enhanced spatial attention. Combined, these findings provide evidence to suggest that anecdotal accounts of improved perception during flow states might actually reflect objective reality.

## DATA AVAILABILITY STATEMENT

This study was pre-registered; the pre-registration documented the general procedure, the hypothesis for the confirmatory analysis about the relationship between flow and the JND, and the exploratory analysis about the relationship between flow and the PSS. The pre-registration, "Transparent Changes" document, all generated data sets, experimental code, analyses and additional analyses are publicly available via the Open Science Framework: https://osf.io/xfrmu/.

# ETHICS STATEMENT

The studies involving human participants were reviewed and approved by the Committee on Human Subjects, University of Hawaii at Manoa. The patients/participants provided their written informed consent to participate in this study.

# AUTHOR CONTRIBUTIONS

SS and RA conceived the study. All authors participated in experimental design and decisions on the experiment specifications. RA, JJ, and SMS collected the data. JJ and SMS programmed the software for the task, recruited the participants, organized the open practice process, conducted the experiment, selected and applied the statistical analyses, interpreted the results, and drafted the paper. All authors participated in reviewing and revising the manuscript and approved the final version.

#### ACKNOWLEDGMENTS

The authors wish to thank Dr. Boris Mayer, Stefan Thoma, and Lukas Schumacher for conceptual contributions, specifically in data analysis. We also thank the Hawaii Pacific University Men's and Women's tennis teams, the University of Hawaii at Manoa's Music Department, and all of the athletes and musicians who participated in this study.

#### REFERENCES

Bates, D., Mächler, M., Bolker, B., and Walker, S. (2015). Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67, 1–48. doi: 10.18637/jss.v067.i01

Bürkner, P.-C. (2016). brms: an R package for Bayesian generalized linear mixed models using Stan. J. Stat. Softw.

Arstila, V. (2012). Time slows down during accidents. Front. Psychol. 3:196. doi: 10.3389/fpsyg.2012.00196


Microsoft Corporation (2019). Microsoft Excel.

Moneta, G. B. (2012). "On the measurement and conceptualization of flow," in Advances in Flow Research, ed S. Engeser (New York, NY: Springer), 23–50.


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Sinnett, Jäger, Singer and Antonini Philippe. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Efficacy, Trainability, and Neuroplasticity of SMR vs. Alpha Rhythm Shooting Performance Neurofeedback Training

Anmin Gong<sup>1</sup> , Wenya Nan<sup>2</sup> , Erwei Yin<sup>3</sup> , Changhao Jiang<sup>4</sup> \* and Yunfa Fu<sup>5</sup> \*

<sup>1</sup>School of Information Engineering, Engineering University of Armed Police Force, Xi'an, China, <sup>2</sup>Department of Psychology, College of Education, Shanghai Normal University, Shanghai, China, <sup>3</sup>Tianjin Artificial Intelligence Innovation Center (TAIIC), National Institute of Defense Technology Innovation, Academy of Military Sciences China, Beijing, China, <sup>4</sup>Key Laboratory of Sports Performance Evaluation and Technical Analysis, Capital University of Physical Education and Sports, Beijing, China, <sup>5</sup>School of Automation and Information Engineering, Kunming University of Science and Technology, Kunming, China

#### Edited by:

Frederic Dehais, Institut Supérieur de l'Aéronautique et de l'Espace (ISAE-SUPAERO), France

#### Reviewed by:

Robert J. Gougelet, University of California, San Diego, United States

Nadine Matton, École nationale de l'aviation civile, France

\*Correspondence:

Changhao Jiang, jiangchanghao@cupes.edu.cn

Yunfa Fu, fyf@ynu.edu.cn

#### Specialty section:

This article was submitted to Cognitive Neuroscience, a section of the journal Frontiers in Human Neuroscience

Received: 10 June 2019; Accepted: 02 March 2020; Published: 20 March 2020

#### Citation:

Gong A, Nan W, Yin E, Jiang C and Fu Y (2020) Efficacy, Trainability, and Neuroplasticity of SMR vs. Alpha Rhythm Shooting Performance Neurofeedback Training. Front. Hum. Neurosci. 14:94. doi: 10.3389/fnhum.2020.00094

Previous literature on shooting performance neurofeedback training (SP-NFT) has usually focused on changes in behavioral indicators, whereas research on the physiological features of SP-NFT is lacking. To explore the effects of SP-NFT on trainability and neuroplasticity, we conducted a study in which 45 healthy participants were randomly divided into three groups: NFT based on the sensory-motor rhythm at C3, Cz and C4 (SMR group), NFT based on the alpha rhythm at T3 and T4 (Alpha group), and no NFT (control group). The training comprised six sessions over 3 weeks. Before and after the SP-NFT, we evaluated changes in shooting performance and resting electroencephalography (EEG) frequency power, participants' subjective task appraisal, neurofeedback trainability scores, and EEG features. Statistical analysis showed that shooting performance improved significantly in the SMR group, decreased in the Alpha group, and did not change in the control group. Meanwhile, the resting EEG power features of the two NFT groups changed specifically after training. The training process data showed that the training difficulty was significantly lower in the SMR group than in the Alpha group. Participants in both NFT groups could improve their neurofeedback trainability scores and change the feedback features by means of their mental strategies. These results may provide evidence of trainability and neuroplasticity for SP-NFT, suggesting that SP-NFT is effective in brain regulation and thus provides a potential method to improve shooting performance.

#### Keywords: neurofeedback, shooting performance, motor sensory rhythm, resting EEG, trainability

# HIGHLIGHTS


# INTRODUCTION

Shooting is a simple motor behavior that can be easily affected by mental states such as attention and emotion. The relationship between shooting performance and central nervous system features has attracted the interest of many scholars, and this relationship has been widely investigated in shooting athletes through electroencephalography (EEG; Hatfield et al., 1984; Del Percio et al., 2009; Bertollo et al., 2016). For instance, it has been reported that during the preparation process in shooting, shooting experts show continuously increased alpha rhythm (8–12 Hz) at the T3 electrode (Hatfield et al., 1984). Furthermore, during the preparation process, a steady increase of theta rhythm (4–7 Hz) power at the frontal midline is shown in shooting experts but not in novice shooters (Doppelmayr et al., 2008), and the event-related desynchronization (ERD) and event-related coherence (ERCoh) of alpha rhythm in shooting experts are significantly lower than those of novice shooters (Del Percio et al., 2009, 2011). In addition, a decrease in alpha power of the occipital region is reported during successful trials compared to failed shooting trials (Loze et al., 2001). These results suggest that shooting performance is closely related to the shooter's brain activity during shooting preparation.

It remains unknown whether shooting performance can be enhanced by interventions that influence brain activity. Therefore, in movement neuroscience, brain regulation for enhancing shooting performance using neurofeedback training (NFT) has become a new research focus. NFT converts EEG signals into sound or animation, which is easily understood by the participants, to help people understand their own physical status. Through repeated training, participants can selectively enhance or suppress neurophysiological signals of a specified frequency band and effectively regulate their brain function.

Many studies have used NFT to improve the sports performance of athletes (Raymond et al., 2005; Faridnia et al., 2012; Strizhkova et al., 2012; Cheng et al., 2015a; Mikicin et al., 2015; Ring et al., 2015). For instance, Raymond et al. (2005) improved the dance performance of a college dance sports team by increasing the alpha/theta ratio at the Pz electrode. Strizhkova et al. (2012) improved gymnasts' performance in complex coordinated activities by increasing alpha at the F1, F2, P3 and P4 electrodes. Faridnia et al. (2012) reduced the sport competition anxiety of swimmers by increasing SMR and low beta and decreasing theta and high beta at the C3 and C4 electrodes. Ring et al. (2015) improved golfers' putting performance by reducing theta and high alpha power at the Fz electrode. Mikicin et al. (2015) reduced student-athletes' attention-reaction by increasing beta1 and SMR and decreasing theta and beta2 at the C3 and C4 electrodes. Cheng et al. (2015a) improved golfers' putting performance by increasing SMR at the Cz electrode.

In the literature, only two studies have attempted to improve shooting performance by NFT (SP-NFT). One SP-NFT study utilized a mixed NFT protocol combining SMR (sensorimotor rhythm, a 12–15 Hz EEG rhythm usually collected from C3, Cz, and C4), beta, and alpha rhythms to improve shooting performance (Rostami et al., 2012). This approach was taken because many previous studies have found that an increase in SMR is often accompanied by an increase in attention. NFT based on SMR has been widely used in the treatment of attention deficit hyperactivity disorder (ADHD), and SMR activity has also been found to be closely related to optimal execution of skilled motor actions such as golf putting and dart throwing (Vernon et al., 2004; Cheng et al., 2015a,b). Therefore, the researchers expected to increase the participants' attention by increasing SMR, so as to improve their shooting performance.

On the other hand, it has been demonstrated that there is an asymmetry between the left and right hemispheres and an increase of alpha rhythm in the left temporal region during rifle and archery shooting preparation (Hatfield et al., 1984; Salazar et al., 1990). Based on this phenomenon, the other SP-NFT study utilized NFT to enhance EEG low-frequency activity over the left temporal region (T3) and successfully improved archers' shooting performance (Landers et al., 1991). The authors explained that this NFT could improve shooting performance by simulating the brain activity seen during actual rifle and archery shooting preparation.

Although both studies used NFT to improve shooting performance, it is unknown which of the two methods is more effective. In addition, according to the current standards of neurofeedback experimental research, there are still some problems in these two studies. Mirifar et al. (2017) suggested that neurofeedback combining visual and auditory feedback may be more effective than visual or auditory feedback alone. However, the two kinds of SP-NFT mentioned above used only visual feedback. Additionally, in the study by Landers et al. (1991), participants were tested for feedback effects after only one feedback training session. Some scholars have suggested that successful neurofeedback regulation may require a minimum of three to four sessions, and that initial improvements can only be seen within the first five to ten sessions (Konareva, 2005; Hammond, 2011). More importantly, both studies only analyzed the efficacy of NFT from a kinematic perspective and did not provide a detailed analysis of the effect of NFT on trainability and neuroplasticity (Enriquez-Geppert et al., 2014). That is, there was no indication of whether the EEG features used for feedback can be actively modulated by the participants during the training process (i.e., trainability), or whether this training can produce lasting changes to the EEG rhythms of the brain (i.e., neuroplasticity).

In previous NFT studies, NFT trainability was first examined before confirming that the NFT can affect cognition/behavior. For instance, Cho et al. (2007) confirmed the NFT trainability of alpha activity in the midline parietal region and found that NFT could enhance participants' ability to maintain alpha activity. Zoefel et al. (2011) first studied the trainability of upper alpha NFT in the parietal and occipital regions and then found an effect of this NFT on cognitive improvement. Enriquez-Geppert et al. (2014) demonstrated the trainability of frontal midline theta NFT and revealed the role of training in cognitive ability improvement. These studies suggest that proving NFT trainability is the basis for studying the effects of NFT on cognition/behavior.

Studies regarding neuroplasticity are mainly based on the correlations between neurological characteristics and cognitive/behavioral indicators. Many studies have found a significant correlation between neurological characteristics of resting states and behavioral/cognitive indicators (Babiloni et al., 2010; Zhou et al., 2012; Wan et al., 2014; Zhang et al., 2015). Therefore, scholars believe that if NFT changes not only the cognitive/behavioral indicators but also the relevant neurological characteristics, this is important evidence that NFT influences cognition/behavior by altering those neurological characteristics. Many scholars have studied the neuroplasticity generated by NFT from different perspectives. Besides producing neuroplastic changes in resting EEG, NFT can lead to changes in white and gray matter in the brain and in resting brain network features (Cho et al., 2007; Zoefel et al., 2011; Ghaziri et al., 2013; Kluetsch et al., 2014).

Here we sought to improve SP-NFT by using both visual and auditory feedback and increasing the number of sessions, while also examining trainability and neuroplasticity. To examine trainability, we recorded the extent to which each participant could actively modulate EEG features during the training process. To examine neuroplasticity, we measured resting-state EEG before and after NFT. Current studies have found significant correlations between resting EEG features, such as individual alpha frequency (IAF) and frequency band power, and the performance of participants completing cognitive tasks, motor imagery tasks, and even shooting tasks (Koch et al., 2008; Zhou et al., 2012; Zhang et al., 2015; Gong et al., 2017). Other studies also found significant differences in resting EEG features among participants of different sport levels (Gong et al., 2017). Therefore, we believe that resting EEG is a good neurological indicator reflecting the brain's "baseline" state and can be used to analyze the brain neuroplasticity induced by training (Khanna et al., 2015; Zhang et al., 2015).

Previous studies on SP-NFT have not specifically focused on the changes in resting EEG caused by NFT. Therefore, they could not fully demonstrate whether NFT really changed brain activity or whether the observed effect was simply a placebo effect of neurofeedback (Schönenberg et al., 2017; Xiang et al., 2018). These research limitations are among the main reasons why NFT technology has been controversial and has not gained much popularity in the kinematics field (Gruzelier, 2014; Mirifar et al., 2017).

To better explore the effects of SP-NFT on trainability and neuroplasticity, we hypothesized the following: (1) SP-NFT has an effect on trainability, such that participants can master the SP-NFT and improve their neurofeedback trainability scores; and (2) SP-NFT has an effect on neuroplasticity, such that resting-state EEG activity is significantly changed.

#### MATERIALS AND METHODS

#### Participants

Forty-five healthy male college students (age: 19.5 ± 2 years) from the Armed Police Engineering University voluntarily took part in this research. All participants had completed a pistol course with qualifying grades and had mastered fixed-target pistol shooting skills. Before training, all participants were divided into three groups according to their age, height, weight, and somatotype. The SMR group (N = 15) aimed to enhance SMR (12–15 Hz) power at the C3, Cz, and C4 channels, similar to the study of Rostami et al. (2012). The Alpha group (N = 15) aimed to enhance the alpha rhythm (8–12 Hz) power at the T3 channel and decrease that at the T4 channel, similar to the study of Landers et al. (1991). The control group (N = 15) did not receive NFT and only underwent the shooting tests. The study protocol was in accordance with the Declaration of Helsinki and was approved by the university ethics committee. Before the experiment, the experimental processes and purpose were explained, and written informed consent was obtained from all participants. Participants were free to withdraw from the experiment at any time.

#### Experimental Design

**Figure 1** shows the experimental design and flow of our study. First, all participants underwent a shooting performance pre-test and a resting EEG pre-test within 3 days before NFT. Next, during the NFT period, the SMR group and Alpha group underwent six NFT sessions within 3 weeks at their convenience. Finally, within 3 days after all NFT sessions, they completed a shooting performance post-test and a resting EEG post-test. It is worth noting that the control group only underwent the shooting performance pre-test and post-test, and that no participant performed any additional shooting tasks during the 3 weeks.

# Shooting Performance Pre-test and Post-test

All participants underwent two pistol shooting tests to evaluate the effect of NFT on shooting performance. The shooting tests were organized by the College of Basic Military Education. Participants used a type 92 pistol, aiming at a target 25 m away. The dimensions of the target were 52 × 52 cm. The target comprised concentric scoring rings: the 10 ring had a diameter of 10 cm, and from the edge of the 10 ring the 9, 8, 7, and 6 rings followed, each extending a further 5 cm. The shooting score was determined by the position of the hit on the target, and a miss was recorded as 0. For example, if the shooter hit the center of the target, a score of 10 was recorded. Every shooter was asked to shoot from a standing position in single firing mode. Prior to each shot, the participants were informed of their previous shooting score. All participants performed 25 shots at their own pace.
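The scoring rule can be illustrated with a short sketch. It assumes one particular reading of the target geometry (a 10 ring of 10 cm diameter, with each lower ring adding a 5 cm wide band outward, so that the 6 ring has a 50 cm diameter and fits the 52 × 52 cm target). The function name `shot_score`, the coordinate convention, and the treatment of hits exactly on a ring boundary are hypothetical choices made for illustration, not part of the study's procedure.

```python
import math

# Assumed geometry: 10 ring = 5 cm radius, each lower ring is a 5 cm wide band.
RING_WIDTH_CM = 5.0
HIGHEST_SCORE = 10
LOWEST_SCORE = 6  # anything outside the 6 ring counts as a miss (score 0)


def shot_score(x_cm: float, y_cm: float) -> int:
    """Score a hit at (x_cm, y_cm), measured from the target center.

    Assumption: a hit exactly on a ring boundary is given the lower score.
    """
    radius = math.hypot(x_cm, y_cm)
    bands_out = int(radius // RING_WIDTH_CM)  # 0 while inside the 10 ring
    score = HIGHEST_SCORE - bands_out
    return score if score >= LOWEST_SCORE else 0


def mean_score(hits):
    """Average score over a series of (x_cm, y_cm) hits, e.g. 25 shots."""
    return sum(shot_score(x, y) for x, y in hits) / len(hits)
```

Under this reading, a hit 7 cm from the center scores 9 and a hit 26 cm from the center is recorded as a miss.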

#### Acquisition and Preprocessing

The resting EEG acquisition device was a Beijing SymTom 32-D EEG amplifier. The EEG signals were recorded from 32 electrodes according to the international 10-20 system, including Fp1, Fp2, F3, F4, C3, C4, P3, P4, O1, O2, F7, F8, T3, T4, T5, T6, Fz, Cz, Pz, FC3, FC4, CP3, CP4, FT7, FT8, TP7, TP8, FCz, CPz, Oz, PO3, and PO4. The ground electrode was placed over the forehead and the reference was the left and right mastoids. The impedance of each electrode was kept below 5 kΩ and the sampling frequency was 1,000 Hz. The participants were asked to sit on soft and comfortable seats, to remain relaxed without falling asleep, and not to try to recall anything. EEG data were collected with eyes closed for 5 min and with eyes open for 5 min. To control the alertness level, the participants' behavior and the quality of the EEG signals were monitored online in real time. If the EEG showed abnormal changes due to coughing, manual activity, sleepiness, etc., the participants were verbally reminded to cooperate.

The EEG signals at all recording channels were analyzed offline in MATLAB (2014a) with the EEGLAB toolbox (Delorme and Makeig, 2004). The signals were first segmented into 2 s epochs, and epochs corrupted by artifacts were rejected by visual inspection (Delorme and Makeig, 2004). Independent component analysis (ICA; fast ICA algorithm) was used to remove blinking artifacts during the eyes-open resting state (Jung et al., 2001). Blinking artifacts were identified by calculating the correlation coefficient between each component and the EEG signals of the Fp1 and Fp2 channels. Any component with a correlation coefficient greater than 0.8 was considered a blinking artifact and was removed (the 0.8 threshold was chosen empirically). Then a sixth-order Butterworth band-pass filter of 0.5–40 Hz was applied to the signal. Finally, the clean EEG data were used for subsequent frequency band power analyses. The frequency band power of the EEG signal was calculated using the Welch method. The time window spanned 5,000 sampling points (5 s) and the overlap rate was 50%.
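The filtering and band-power steps described above can be sketched in Python with SciPy. This is an illustration under stated assumptions, not the authors' MATLAB code: the function names, the synthetic test signal, the fixed 8–12 Hz and 13–30 Hz bands, and the use of zero-phase (forward-backward) filtering are all assumptions. In SciPy's convention, `butter` with `N=6` and a band-pass design yields a filter of order 12; the text does not say which convention its "sixth-order" refers to.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, welch

FS = 1000  # sampling rate in Hz, as stated in the text


def bandpass(signal, low=0.5, high=40.0, order=6, fs=FS):
    """Zero-phase Butterworth band-pass, designed as second-order sections."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)


def band_power(signal, f_lo, f_hi, fs=FS, window_samples=5000):
    """Mean Welch PSD within [f_lo, f_hi] Hz, using 50% overlapping windows."""
    freqs, psd = welch(signal, fs=fs, nperseg=window_samples,
                       noverlap=window_samples // 2)
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return float(psd[mask].mean())


# Synthetic 60 s signal: a 10 Hz rhythm plus white noise (illustrative only).
rng = np.random.default_rng(0)
t = np.arange(0, 60, 1 / FS)
raw = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal(t.size)

clean = bandpass(raw)
alpha_power = band_power(clean, 8, 12)    # contains the 10 Hz rhythm
beta_power = band_power(clean, 13, 30)    # contains only noise
```

With a 5,000-sample window at 1,000 Hz, the spectral resolution is 0.2 Hz, which is fine enough to locate an alpha peak to a fraction of a hertz.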

To control for differences in individual frequency bands between participants, the frequency bands of interest were defined relative to the IAF (Klimesch, 1999). The EEG power spectrum was calculated in the occipital region during the resting eyes-closed state, and the peak frequency was located within the 7–13 Hz range. This peak frequency was recorded as the IAF of the participant, and the other frequency bands were defined relative to the IAF as follows: theta as IAF −6 Hz to IAF −3 Hz, alpha as IAF −2 Hz to IAF +2 Hz, and beta as IAF +3 Hz to IAF +20 Hz.
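As a minimal sketch of the IAF-based band definition, the following Python snippet finds the spectral peak in the 7–13 Hz range and derives the three bands from it. The function names and the toy spectrum are hypothetical; only the search range and the band offsets come from the text.

```python
def individual_alpha_frequency(freqs, power, lo=7.0, hi=13.0):
    """Return the frequency of the largest spectral value within [lo, hi] Hz."""
    candidates = [(p, f) for f, p in zip(freqs, power) if lo <= f <= hi]
    if not candidates:
        raise ValueError("spectrum has no bins in the search range")
    return max(candidates)[1]


def iaf_bands(iaf):
    """Band edges in Hz defined relative to the IAF, as listed in the text."""
    return {
        "theta": (iaf - 6, iaf - 3),
        "alpha": (iaf - 2, iaf + 2),
        "beta": (iaf + 3, iaf + 20),
    }


# Toy spectrum with 0.5 Hz resolution and a peak at 10.5 Hz (made-up values).
freqs = [i * 0.5 for i in range(81)]                    # 0 to 40 Hz
power = [1.0 / (1.0 + (f - 10.5) ** 2) for f in freqs]

iaf = individual_alpha_frequency(freqs, power)
bands = iaf_bands(iaf)
```

For a participant with an IAF of 10.5 Hz, this gives theta as 4.5–7.5 Hz, alpha as 8.5–12.5 Hz, and beta as 13.5–30.5 Hz.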

# Implementation of Shooting Performance NFT

According to previous studies, an effective neurofeedback experiment should preferably include at least five training sessions, and the interval between two training sessions should be at least 1 day (Mirifar et al., 2017). Therefore, our study design included six sessions of SP-NFT in 3 weeks, in which each participant would undergo two sessions a week with an interval of at least 1 day in-between (**Figure 1**).

To ensure double-blinded NFT, experimenters were only required to enter the unique identity number of the participants into the NFT system (Ros et al., 2019). The system automatically identified the feedback type of the participants and ran the corresponding NFT mode. Therefore, the experimenter was not aware of the feedback mode in which the participant was training.

The EEG signals during SP-NFT were recorded from Cz, C3, C4, T3, and T4 for all participants in both NFT groups (**Figure 2**). The EEG signals were transmitted to the computer by the USB interface, and the SP-NFT program written in MATLAB 2014 was used to process and calculate the EEG features for feedback to participants in real-time. The feedback feature of the SMR group was the average SMR power of Cz, C3, and C4, while the feedback feature of the Alpha group was the alpha power of the T3 electrode minus alpha power of T4 electrode.

Each training session lasted about 25 min and included 30 trials. Each trial included a 15-s relaxation break and 30 s of SP-NFT. During the relaxation break, the participants sat in silence and did not deliberately recall anything. At the end of the relaxation stage, the median value of the feedback feature in this period was shown as a red line on the feedback interface and served as the baseline of the SP-NFT trial. In the training stage, the feedback feature of the participants was calculated in real-time and presented to the participants in the form of an image of a shooting target at different clarity levels and a dynamic blue curve, both indicating the magnitude of the feedback feature. The participants were requested to focus on the shooting target image and to try to improve the feedback feature by means of their own mental strategy. Participants were advised not to be overly nervous during the NFT period, to remain relaxed but focused, and to try to imagine the movements of the shooting preparation stage or to focus on the target image on the computer screen. When the feedback feature increased, the curve rose and the image became clearer; when the feedback feature decreased, the curve fell and the image became less clear. To implement the important auditory component of our NFT paradigm, the system included a digital sound equalizer to control the playback of feedback music. The parameters of the equalizer were controlled according to the real-time feedback EEG feature: at the beginning of the training stage, the music volume was at its lowest, and when the magnitude of the feedback feature was higher than the baseline, the feedback music gradually increased in volume and clarity.

To evaluate the success of SP-NFT, the system automatically calculated the neurofeedback trainability score after each trial. The trainability score was defined as the ratio of the time during which the feedback feature was higher than the baseline to the total time of the training period. If the feedback feature was always higher than the baseline during the feedback period, the neurofeedback trainability score was recorded as 100 points. Conversely, if the feedback feature was always lower than the baseline, the neurofeedback trainability score was recorded as 0. At the end of each feedback trial, the feedback score was displayed to the participant.
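The scoring rule can be illustrated with a short Python sketch. It assumes that the feedback feature is sampled at equal time intervals, so that the fraction of samples above baseline equals the fraction of time above baseline, and that "higher than" means strictly greater; the function name and the sample values are hypothetical.

```python
from statistics import median


def trainability_score(relax_values, train_values):
    """Percentage of training samples whose feedback feature exceeds the
    baseline, where the baseline is the median of the relaxation period."""
    if not relax_values or not train_values:
        raise ValueError("both periods need at least one sample")
    baseline = median(relax_values)
    above = sum(1 for v in train_values if v > baseline)
    return 100.0 * above / len(train_values)


# Made-up feature values for one trial.
relax = [1.0, 1.2, 0.9, 1.1, 1.0]                    # median baseline is 1.0
train = [1.3, 0.8, 1.5, 1.1, 0.95, 1.4, 1.0, 1.2]    # 5 of 8 exceed 1.0
score = trainability_score(relax, train)             # 62.5
```

Using the median of the relaxation period as the baseline, as the text describes, makes the threshold robust to brief bursts or dropouts during the break.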

### Subjective Task Appraisal of Participants

To compare the subjective outcomes of SP-NFT between the SMR group and the Alpha group, a subjective task appraisal scale was used to evaluate the two kinds of training. After each SP-NFT session, participants reported the degree of fatigue, commitment, and difficulty in relation to the training. The scale was a five-level Likert scale, on which one indicated the lowest degree and five the highest. For example, for fatigue, one means no fatigue at all and five means very tired.

# Statistical Analysis

We tested the probability distribution of the samples using the Kolmogorov–Smirnov test. For shooting performance, resting EEG IAF, and resting EEG frequency band power, the samples did not all follow a Gaussian distribution. Thus, we used the Wilcoxon signed-rank test for these samples to examine the difference between pre-test and post-test in each group. For the subjective task appraisal index, the NFT trainability score, and the feedback feature, repeated-measures ANOVA was applied because all these samples followed a Gaussian distribution.

#### Shooting Performance Index

For each participant, the shooting performance index was the mean value of the 25 shot scores. To examine the differences in shooting performance between pre-test and post-test, the Wilcoxon signed-rank test was used to test the median difference of the shooting performance index between pre-test and post-test in the SMR group, Alpha group, and control group, respectively. Then, to compare the effects of SMR and alpha training on shooting performance, we also carried out the Wilcoxon rank-sum test on the change in shooting score (post-test minus pre-test) between the SMR group and the Alpha group.
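The two non-parametric tests described above can be sketched with SciPy as follows. The scores are made up for illustration and are not the study's data; the group labels `a` and `b` and the group size of 10 are likewise hypothetical. `scipy.stats.wilcoxon` performs the paired signed-rank test and `scipy.stats.ranksums` the two-sample rank-sum test.

```python
import numpy as np
from scipy.stats import ranksums, wilcoxon

# Made-up mean shooting scores per participant (not the study's data).
pre_a = np.array([7.1, 6.8, 7.4, 6.5, 7.0, 7.2, 6.9, 7.3, 6.7, 7.5])
change_a = np.array([0.5, 0.7, 0.3, 0.9, 0.6, 0.4, 0.8, 0.2, 1.0, 0.55])
post_a = pre_a + change_a

pre_b = np.array([7.0, 7.2, 6.6, 7.1, 6.9, 7.4, 6.8, 7.3, 7.0, 6.7])
change_b = np.array([-0.1, 0.05, -0.2, 0.1, -0.15, 0.0, -0.05, 0.15, -0.25, -0.3])
post_b = pre_b + change_b

# Paired comparison within one group: pre-test vs. post-test.
within_a = wilcoxon(pre_a, post_a)

# Independent comparison between groups on the change scores.
between = ranksums(change_a, change_b)

print(f"within group a: p = {within_a.pvalue:.4f}")
print(f"between groups: z = {between.statistic:.2f}, p = {between.pvalue:.4f}")
```

Testing the change scores between groups, rather than the post-test scores alone, accounts for any difference in baseline level between the groups.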

#### Subjective Task Appraisal Index

For the subjective task appraisal index of the two SP-NFT groups, a statistical analysis was applied to test the difference between the two groups. To test whether the participants' subjective task appraisal changed over the six training sessions, a repeated-measures ANOVA with the factors group (SMR vs. Alpha) and session (1–6) was performed.

#### Neurofeedback Trainability Scores and Feedback Feature

To examine the neurofeedback trainability of the two SP-NFT groups, two repeated-measures ANOVAs were performed. The first ANOVA, with the factors group (SMR vs. Alpha) and session (1–6), compared the neurofeedback trainability scores, which were calculated in real-time for each trial and averaged within each session. This score was calculated as the ratio of the time that the feedback feature remained above baseline in each NFT trial. The second ANOVA, with the within-subject factors state (Train vs. Relax) and session (1–6), was conducted on the values of the individualized feedback feature averaged across all trials within each session.

#### Resting EEG Rhythm Power Indexes

Finally, we analyzed the changes in resting EEG features before and after training from three perspectives: (1) the resting EEG IAF (calculated from the eyes-closed resting EEG in the occipital region); (2) changes in the EEG power spectrum and the specific frequency band power in the channels of interest (COI) during training; and (3) the average whole-brain EEG frequency band power topographic map across all participants, used to compare visually the whole-brain EEG before and after training. For the resting EEG IAF and the band power of the COI, the difference between pre-test and post-test was tested using the Wilcoxon signed-rank test. The visual comparison of the whole-brain maps only reflects the change in the average power spectrum before and after the feedback training; it does not involve statistical testing. All statistical tests were performed in MATLAB 2014.

#### RESULTS

### Shooting Performance Before and After the SP-NFT

**Figure 3** compares the shooting scores of the three groups in the pre-test and post-test. The left is the SMR group, the middle is the Alpha group, and the right is the Control group. The blue, red, and black boxes indicate post-test shooting performance and the gray boxes indicate pre-test shooting performance. The statistical results show that the median shooting score after training is significantly higher than that before training in the SMR group (z = −3.55, p < 0.01). The median shooting score after training is marginally significantly lower than that before training in the Alpha group (z = 1.80, p = 0.09). The median shooting score shows no significant change in the control group (z = 0.85, p = 0.39). In terms of the difference in shooting score change between the two SP-NFT groups, the SMR group is significantly higher than the Alpha group (z = −3.06, p < 0.01).

#### Subjective Task Appraisal of Participants

**Table 1** shows the mean and SD of the three subjective task appraisal indexes for the six feedback sessions. As can be seen from the table, the fatigue degree of the two groups was approximately 2, while the degrees of commitment and difficulty were approximately 3 in both groups. The repeated-measures ANOVA shows that for the fatigue index, neither the group effect, F(1,14) = 0.44, p > 0.05, nor the session effect, F(5,70) = 0.47, p > 0.05, is significant. For the commitment index, the group effect is significant, F(1,14) = 10.54, p < 0.01, such that the Alpha group is significantly higher than the SMR group; the session effect is not significant, F(5,70) = 0.68, p > 0.05. For the difficulty index, the group effect is significant, F(1,14) = 17.02, p < 0.01, with higher difficulty in the Alpha group compared to the SMR group; the session effect is not significant, F(5,70) = 0.18, p > 0.05. The session × group interactions are not significant for any of the three indexes.

Note to **Table 1**: from top to bottom, the rows give the fatigue, commitment, and difficulty results for the SMR group and the Alpha group, respectively; from left to right, the columns give session 1 to session 6.

## Dynamic Changes in Parameters of SP-NFT

#### The Dynamics of Trainability Scores

**Figure 4** shows the mean and 1.96× standard errors of the neurofeedback trainability scores for each session. The horizontal axis is the feedback training session, the vertical axis is the neurofeedback trainability score, the blue line indicates the SMR group, and the red line indicates the Alpha group. The neurofeedback trainability scores of both NFT groups increased as the number of training sessions increased. The repeated-measures ANOVA showed that the group effect was significant (F(1,14) = 5.43, p < 0.01, post hoc: SMR group > Alpha group) and that the session effect was also significant (F(5,70) = 3.13, p < 0.01, post hoc: session 6 > session 1). This finding indicates that participants could master the neurofeedback well after six sessions of SP-NFT, and that the neurofeedback trainability scores improved significantly.

#### The Dynamic of Feedback Features

**Figure 5** shows the dynamics of the feedback features across training sessions. The left side is the SMR group and the right side is the Alpha group. The gray line is the feedback feature in the resting state; the blue and red lines indicate the feedback feature of the SMR and Alpha groups, respectively. For the SMR group, the repeated-measures ANOVA showed that the state effect was significant (F(1,14) = 78.25, p < 0.001, post hoc: Train > Relax) while the session effect was not significant (F(5,70) = 0.68, p > 0.05). For the Alpha group, the repeated-measures ANOVA showed that the state effect was significant (F(1,14) = 7.32, p < 0.01, post hoc: Train > Relax) whereas the session effect was not significant (F(5,70) = 0.28, p > 0.05). Comparing the two groups, the feedback features of both NFT groups were significantly higher in the training state than in the relaxation state, while the F value of the Alpha group was lower than that of the SMR group. This may be one of the reasons why participants in the Alpha group considered the training more difficult than those in the SMR group did.

## The Effect of SP-NFT on the Resting EEG

#### Comparison of Pre-test and Post-test IAF

The median (SD) IAF calculated from the resting eyes-closed EEG was as follows. In the SMR group, the IAF was 10.64 ± 0.69 Hz at pre-test and 10.45 ± 0.56 Hz at post-test, with no significant difference between pre-test and post-test (z = 0.99, p = 0.32). In the Alpha group, the IAF was 10.30 ± 0.83 Hz at pre-test and 10.23 ± 0.92 Hz at post-test, again with no significant difference between pre-test and post-test (z = 1.42, p = 0.16).

#### Comparison of Pre-test and Post-test Resting EEG Power Spectrum on COI

**Figure 6** shows the power spectrum of the resting eyes-closed EEG on the feedback channels before and after training for the SMR group and the Alpha group. The left side corresponds to the SMR group and the right side to the Alpha group. The asterisk (\*) indicates a significant change in frequency band power between before and after training (p < 0.05). **Figure 7** shows the power spectrum of the resting eyes-open EEG, with the same conventions as in **Figure 6**. The power spectrum of both groups changed after training, and the change was in the same direction as that targeted by SP-NFT. The training for the SMR group consisted of increasing the SMR power at Cz, C3, and C4; after training, the resting eyes-closed alpha frequency power of the Cz channel (z = −2.96, p < 0.05) and the resting eyes-open beta frequency power of the C3 and C4 channels (C3: z = −4.39, p < 0.05; C4: z = −3.22, p < 0.05) increased significantly. The training of the Alpha group consisted of increasing the alpha power at the T3 electrode and decreasing the alpha power at the T4 electrode; after training, the resting eyes-closed alpha frequency power at the T3 electrode increased significantly (z = −3.01, p < 0.05), whereas at the T4 electrode the resting eyes-closed and eyes-open alpha frequency power (eyes-closed: z = 2.86, p < 0.05; eyes-open: z = 2.88, p < 0.05) and the resting eyes-open beta frequency power (z = 2.95, p < 0.05) decreased significantly.

#### Comparison of the Whole Brain Resting EEG Frequency Band Power

**Figure 8** is a whole-brain topographic map of the difference in resting eyes-closed EEG frequency power before and after training in the NFT groups. The upper row is the SMR group and the lower row is the Alpha group. From left to right are the theta, alpha, and beta frequency bands. Red indicates that the EEG power of the post-test is higher than that of the pre-test, and blue indicates that the EEG power of the post-test is lower than that of the pre-test. **Figure 9** is the corresponding whole-brain topographic map for the resting eyes-open EEG. In the SMR group, the power of all three frequency bands increased in the prefrontal, frontal, and central channels after feedback. For the Alpha group, the frequency band power of the left hemisphere increased after feedback, while that of the right hemisphere was slightly weakened. The resting EEG changes in both NFT groups were consistent with the direction of enhancement targeted by SP-NFT.

# DISCUSSION

In this article, we explored and compared the efficacy, trainability, and neuroplasticity of SMR vs. alpha rhythm SP-NFT. We improved traditional neurofeedback paradigms by including an auditory feedback component, in addition to a visual feedback component. Furthermore, we increased the number of NFT sessions and carried out six NFT sessions for each participant, rather than the typical one or two sessions. In addition, we have also taken into account trainability and neuroplasticity, which have not been fully explored by previous researchers.

# The Effect Analysis of SP-NFT on Shooting Performance

We evaluated the shooting performance of the three groups after SP-NFT and found that the groups achieved different results. The median shooting score reflects the overall shooting level. There was a significant improvement in the shooting performance of the SMR group, whereas there was a decrease in the performance of the Alpha group. The significant improvement in the shooting performance of the participants who underwent SMR training is consistent with the results of Rostami et al. (2012). In previous findings, an increase in SMR was often accompanied by an increase in attention (Vernon et al., 2004; Cheng et al., 2015b). NFT based on SMR has been widely used in the treatment of ADHD, and SMR activity has been found to be closely related to the optimization of skilled motor performance, such as golf putting and dart throwing (Vernon et al., 2004; Cheng et al., 2015a,b). The present results also extend the potential facilitation effects of SMR training to athletes and healthy people in need.

Nonetheless, participants who underwent alpha training did not improve and even displayed a decline in shooting performance. The feedback feature of the Alpha group was the alpha power difference between T3 and T4, and the participants were given positive feedback when this feature increased. That is, the participants received positive feedback whether the alpha power at T3 increased or the alpha power at T4 decreased. Collura (2013) pointed out that in neurofeedback experiments, training aimed at reducing activity may act as a kind of "squeeze" enhancement training. Enhancing or reducing EEG activity in a brain region may lead to increased activity in that region (Plotkin and Rice, 1981). Thus, although researchers try to suppress the activity of a certain brain area, the training may instead enhance the activity of that area. According to this view, the Alpha group was trained to enhance the alpha rhythm of the left temporal region and decrease the alpha rhythm of the right temporal region; however, the training may in fact have strengthened the activity of both temporal regions, which would be consistent with the finding that the participants did not acquire shooting skill through training and that their shooting performance did not improve. Therefore, we speculate that better training results might be obtained if the feedback feature were changed to the alpha power of a single temporal region.

On the other hand, in terms of EEG features, Landers et al. (1991) used the Slow Cortical Potential (SCP) signal, whereas we used the alpha rhythm signal in the present study. Although the experimental principle is the same, it may lead to different experimental results. In terms of participant selection, the participants of Landers et al. (1991) were pre-professional athletes, while our participants were military students with amateur shooting levels. These differences may also lead to the inconsistency between the results of our research and previous studies.

## The Effect of SP-NFT on Trainability

We defined trainability as the ability of participants to control their own NFT features in training. The experimental results in the "Dynamic Changes in Parameters of SP-NFT" section show that the trainability in both groups increased gradually throughout the six SP-NFT sessions. Neurofeedback trainability scores for the sixth session were significantly higher than those for the first session. The feedback features increased gradually over the first four sessions and remained stable between the fifth and sixth sessions. These results are consistent with previous studies of alpha training and frontal midline theta training (Cho et al., 2007; Zoefel et al., 2011; Enriquez-Geppert et al., 2014). In addition, we asked the participants who had high neurofeedback trainability scores about the strategy they used during NFT. Most of them reported "focusing on one point in the target image" or "concentrating on the motor imagination of the shooting preparation stage." These results indicate that both feedback training modes are effective and that the participants can actively modulate their EEG rhythm power features. In particular, the subjective task appraisal showed that the degree of difficulty for the Alpha group was significantly higher than for the SMR group, which indicates that SMR feedback might be easier and more convenient for participants.

# Effects of SP-NFT on Neuroplasticity on Resting EEG

The third focus of our research was to study the effects of SP-NFT on neuroplasticity, that is, whether SP-NFT can change the brain neural activity and whether the brain activity of the participant has changed throughout a period of training (Ghaziri et al., 2013; Megumi et al., 2015; Faller et al., 2019). In this study, we tested this by examining the resting EEG of the participants before and after NFT.

The results of our study showed that after training, the resting eyes-closed and eyes-open EEGs changed significantly in both NFT groups. In the power spectrum of the COI and the power topographic map of the whole brain, the position at which the resting-state EEG changed was the region where the feedback electrode was placed. This suggests that NFT can cause specific changes in the channels and frequency bands involved in feedback, and it provides evidence that NFT induces neuroplasticity at the EEG level. However, the brain topographic map also shows that, in addition to the trained target channels and frequency bands, the adjacent channels and frequency bands underwent similar trend changes, which constitute a non-specific change. Collura (2013) suggests that this could be an influence of the "entrainment" effect: in addition to the frequency bands and channels involved in feedback, the EEG power of other nearby frequency bands and cortical areas also showed a certain degree of change. Other neurofeedback studies have also reported similar phenomena (Cheng et al., 2015a).

The resting-state EEG of the participants changed significantly after SP-NFT, which is not only strong evidence that neurofeedback promotes neuroplasticity, but it also may provide a reason why SP-NFT can improve behavior indices (shooting performance). This shows that in future training processes of skill learning, we may be able to incorporate appropriate NFT throughout the entire training process and improve the participants' attention and mental abilities, thereby improving the participants' physical ability and skill level.

### Research Limitations and Improvements

Through comparative experiments, we studied the effects of two kinds of SP-NFT on training ability and brain regulation. However, based on the experimental conditions, the study had several limitations and could have benefited from some improvements:


# CONCLUSION

To compare the efficacy, trainability, and neuroplasticity effects of SMR and alpha SP-NFT, 45 participants were recruited into the experiment, of whom 30 trained with SP-NFT over six sessions in 3 weeks. Through the analysis of the experimental results, the following main conclusions were obtained:


Overall, the results in this article provide some evidence of the effects of SP-NFT on trainability and neuroplasticity and further contribute to the application of SP-NFT to improve shooting performance.

# DATA AVAILABILITY STATEMENT

The datasets generated for this study are available on request to the corresponding author.

# ETHICS STATEMENT

The studies involving human participants were reviewed and approved by Engineering University of the Chinese People's Armed Police Force ethics committee. The patients/participants provided their written informed consent to participate in this study.

# AUTHOR CONTRIBUTIONS

AG collected experimental data and wrote the original manuscript. WN analyzed experiment results. CJ and YF designed experiments and revised manuscript. EY provided important advice and help on key content of the manuscript.

# FUNDING

This work was supported by the National Natural Science Foundation (NNSF) of China under Grant Nos. 81771926, 61763022, 81470084, 61463024, and 31771244, State General Administration for Sports Scientific Research (2015B040) and Beijing Research Institute of Sports Science (2017).

## REFERENCES


Collura, T. F. (2013). Technical Foundations of Neurofeedback. London: Routledge.


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Gong, Nan, Yin, Jiang and Fu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# A Psychophysiological Model of Firearms Training in Police Officers: A Virtual Reality Experiment for Biocybernetic Adaptation

#### John E. Muñoz<sup>1</sup> , Luis Quintero<sup>2</sup> \*, Chad L. Stephens<sup>3</sup> and Alan T. Pope<sup>3,4</sup>

<sup>1</sup> Department of Systems Design Engineering, University of Waterloo, Waterloo, ON, Canada, <sup>2</sup> Department of Computer and Systems Sciences, Stockholm University, Stockholm, Sweden, <sup>3</sup> Langley Research Center, National Aeronautics and Space Administration, Hampton, VA, United States, <sup>4</sup> Learning Engagement Technologies, Poquoson, VA, United States

Crucial elements for police firearms training include mastering very specific psychophysiological responses associated with controlled breathing while shooting. Under high-stress situations, the shooter is affected by responses of the sympathetic nervous system that can impact respiration. This research focuses on how frontal oscillatory brainwaves and cardiovascular responses of trained police officers (N = 10) are affected during a virtual reality (VR) firearms training routine. We present data from an experimental study wherein shooters were interacting in a VR-based training simulator designed to elicit psychophysiological changes under easy, moderate and frustrating difficulties. Outcome measures in this experiment include electroencephalographic and heart rate variability (HRV) parameters, as well as performance metrics from the VR simulator. Results revealed that specific frontal areas of the brain elicited different responses during resting states when compared with active shooting in the VR simulator. Moreover, sympathetic signatures were found in the HRV parameters (both time and frequency) reflecting similar differences. Based on the experimental findings, we propose a psychophysiological model to aid the design of a biocybernetic adaptation layer that creates real-time modulations in simulation difficulty based on targeted physiological responses.

Keywords: biocybernetic adaptation, virtual reality, psychophysiological model, electroencephalography, heart rate variability, simulation, firearms training

# INTRODUCTION

Burdea and Coiffet (2003) identified "traditional" areas of application of virtual reality (VR) as medicine, education, arts, entertainment, and the military as well as "emerging" areas of manufacturing, robotics and data visualization, training being a particular focus area in many of these applications. For use in training, VR environments provide advantages over physical training environments. VR training in medical surgery contexts has shown performance improvements beyond traditional training approaches (Lehmann et al., 2005). The use of immersive training systems taps into gross and fine motor skill acquisition, maintenance, and expert-level performance (Faria et al., 2018). Firearms training is considered an especially appropriate setting for VR technology deployment. Due to the safety concerns associated with live fire weapons training, the United States Department of Defense places a high value on the potential use of VR environments for firearms training for service members who are impaired by polytrauma (Oliver et al., 2019).

#### Edited by:

Stephen Fairclough, Liverpool John Moores University, United Kingdom

#### Reviewed by:

Domen Novak, University of Wyoming, United States

Konstantinos Papazoglou, Yale University School of Medicine, United States

> \*Correspondence: Luis Quintero luis-eduardo@dsv.su.se

#### Specialty section:

This article was submitted to Performance Science, a section of the journal Frontiers in Psychology

Received: 29 November 2019 Accepted: 20 March 2020 Published: 16 April 2020

#### Citation:

Muñoz JE, Quintero L, Stephens CL and Pope AT (2020) A Psychophysiological Model of Firearms Training in Police Officers: A Virtual Reality Experiment for Biocybernetic Adaptation. Front. Psychol. 11:683. doi: 10.3389/fpsyg.2020.00683

Under high-stress situations, the shooter is affected by responses of the sympathetic nervous system that can impact respiration. Relaxing the body and keeping a natural breathing pattern have been identified as major components of firearms training (Johnson, 2007). A training guide well known to the professional police officer trainer community asserts that particular sympathetic responses are desired during military fighting situations [e.g., 100–115 BPMs for heart rate (HR) levels] (Mason, 1998). According to marksmanship training guidelines (Johnson, 2007), an important factor that needs to be trained in overall marksmanship scenarios is shooting during the natural respiratory pause. Since a lack of oxygen might impair cognitive skills and visual acuity, training in autogenic breathing (autonomic self-regulation training), which calms the body and maintains a natural breathing pattern, is considered a main component of firearms training (Sajnog, 2013). To keep both body and mind collected, proper oxygenation could be provided by controlling the respiration pace and peripheral responses such as HR and HR variability (HRV).

To support investigation of physiological self-regulation in police officers while training in a target-shooting scenario, a fully immersive, VR-based training simulator called Biocyber Physical Simulator (BioPhyS) for firearms training has been created (Muñoz et al., 2016b). BioPhyS is an example of a system that is capable of employing a form of physiological computing known as biocybernetic adaptation wherein real-time data from the brain and body is used by a control loop to adapt the user interface (Ewing et al., 2016). The design of the BioPhyS system relies on investigations of physiological responses associated with the psychophysiological responses of firearm trainees during a targeted state associated with concentration and calmness.

The research reported here focuses on how frontal oscillatory brainwaves and cardiovascular responses of experienced police officers are affected during a head-mounted display-VR (HMD-VR) training routine. This article uses insights from a previously reported research study (see Muñoz et al., 2019, for more details) to investigate how the brain and heart of police officers react to simulated firearms training. Particularly, this research is focused on:


• Illustrating a streamlined pipeline for the integration of physiological adaptation into VR simulators using novel tools such as biocybernetic software technologies and biofeedback design elements.

# RELATED WORK

Lele (2013) points to the quality of immersion as the basis for a VR system's usefulness. Psychophysiological measurement has increasingly been incorporated into research using HMD-VR setups (Pugnetti et al., 2001), particularly in studies examining presence and immersion. Cortical, as well as cardiovascular measures, have relevance for characterizing the psychophysiological response of individuals experiencing VR environments.

In one study, service members with polytrauma demonstrated improved accuracy and precision following VR-based firearm training (Oliver et al., 2019), owing to its capability to measure metrics that are not captured in a traditional qualification course and to let instructors focus on other aspects that enhance shooter precision, such as posture, sight alignment, or the elimination of bad habits. The United States Navy has developed the tactically reconfigurable artificial combat enhanced reality (TRACER) system to train sailors for combat. TRACER is extolled as a dynamic, engaging and less predictable virtual training environment (Iatsyshyn et al., 2019). In an analysis of the development status and characteristics of VR technology in China (Zhang et al., 2019), it is asserted that VR can be beneficial in improving the psychological wellbeing of military personnel and can help soldiers to adapt to various war environments. In this regard, exposure therapy based on VR has been extensively shown to be an efficacious treatment for active-duty army soldiers with posttraumatic stress disorder (Rizzo and Shilling, 2017).

EEG has been employed in investigations of various cognitive variables in 3D virtual learning environments. Frontal alpha EEG was recorded in a study of its role in attentional control; the study also demonstrated the importance of the immersion and engagement afforded by a 3D virtual learning environment (Berger and Davelaar, 2018). In a study of the effect of competition, while shooters were immersed in a virtual environment representing a shooting range, changes in alpha oscillatory activity were found during aiming that were associated with better performance (Pereira et al., 2018). In a previous study by the authors (Muñoz et al., 2016b), frontal alpha activity (as well as delta) was found to be a brainwave pattern that allows differentiation between baseline and active shooting states in a VR simulator. Increases in alpha EEG activity have also been associated with subjects who performed well in spatial navigation tasks in a VR environment (Pugnetti et al., 1996).

Virtual reality experiences can also affect the cardiovascular responses of users. A significant difference in HR values was found during 5-min long interactions with a VR experience that required users to perform simple manual tasks involving the arrangement of virtual elements with multiple shapes (Malińska et al., 2015). A recent study (Marín-Morales et al., 2019) employed analyses of EEG and electrocardiographic (ECG) signals to explore differences in the emotional reactions (arousal and valence dimensions) between an exploration in a real museum and its VR representation, and demonstrated high accuracy of a machine learning model in classifying the nature of a stimulus as real or virtual.

Machine learning approaches have also been used in psychophysiological studies. Specifically, supervised classifiers [e.g., linear discriminant analysis (LDA) and support vector machines (SVM)] have been used to categorize emotional states and create a model for audio-visual or game difficulty adaptation (Novak et al., 2012). Other studies have utilized a multilayer perceptron to classify anxiety, boredom, and flow states in order to compare the effects of mental-state adaptation and performance-based adaptation in a desktop-based shooting game (Alves et al., 2018). A custom-made version of the conventional Tetris was adapted to create real-time adjustments of the game difficulty in three levels by using an SVM classifier that processed signals from skin response, blood volume and EEG (Chanel et al., 2011). Furthermore, learning-based classifiers have also been used to detect high and low anxiety in drivers from ECG and accelerometer data (Dobbins and Fairclough, 2018), or to create a virtual driving platform to maximize engagement in people with autism spectrum disorder (Bian et al., 2019). Applications for airplane pilots used classifiers to identify features from EEG and skin response signals that can model the users in scenarios of attention-related human performance limiting states (Harrivel et al., 2016) or to find relationships of cardiovascular features with psychophysiological stress while performing piloting maneuvers (Hanakova et al., 2017). To the best of our knowledge, this is the first project that aims at characterizing psychophysiological responses of police officers on duty for designing biocybernetic loops in VR firearms training.

# VIRTUAL REALITY FOR FIREARMS TRAINING

In this section, we describe the design and development of a fully immersive, HMD-VR based simulator for firearms shooting training and briefly introduce a software tool, the Biocybernetic Loop (BL) Engine, used to integrate physiological intelligence into the VR simulator.

### Biocyber Physical Simulator System for Firearms Training

The BioPhyS contains an outdoor military target-shooting range with representative props such as cable reels, wooden tables, barricades, weapons, and water towers (see **Figure 1**, left). Twenty targets were laid out at different distances from the shooting point; each target moves along a horizontal track. Three weapons were available for training: the pistols M1911 and SIG Sauer P250, and the Reichsrevolver M1879 (see **Figure 1**, right), each with a different impact force on the targets.

A setup panel called Wizard of Oz (WoZ) was designed to allow trainers to modify the conditions of the simulation in real time; it also provided a controlled environment to study the participants' responses and behaviors when exposed to specific stressors. The set of variables was defined to create physiological modulation, and the variables were grouped according to their effect on the scenario, as shown in **Table 1**.

BioPhyS can be used with wireless HTC Vive controllers and an air pistol gun adapted with the HTC Vive Tracker to handle and shoot the weapon. The touchpad of the controllers offers teleportation to navigate around the shooting range, the lateral buttons are used to grab the virtual gun, and the trigger is used for shooting.

# Physiological Intelligence and the Biocybernetic Loop Engine

Conventional methods to integrate physiological sensing technologies into games and VR applications entail the development of specific software clients that stream data collected from the sensors, and of capturers able to read the data packages directly in game engines such as Unity3D (Muñoz et al., 2016a). Additionally, scripts for signal processing and data analysis are required to create truly intelligent algorithms able to finally integrate biocybernetic adaptation. To streamline the process, the BL Engine is a software tool that acts as a middle layer facilitating the integration of physiological intelligence into games and VR applications developed in Unity3D (Muñoz et al., 2017b). It simplifies the integration of biocybernetic adaptation in VR projects by providing (i) communication with multiple physiological sensors, (ii) a drag-and-drop console to create adaptive rules, and (iii) specialized scripts to communicate with applications and modify variables in real time. This article outlines our initial stage of designing the biocybernetic adaptation layer after carrying out a physiological characterization study in police officers while interacting with the BioPhyS with controlled difficulty levels.

# PSYCHOPHYSIOLOGICAL CHARACTERIZATION STUDY WITH POLICE OFFICERS

The BioPhyS was used to conduct a controlled study with police officers using a repeated measures design, to understand the cardiovascular and neurophysiological responses under different simulation difficulties.

# System Setup

A room-scale tracking system of the VR headset HTC Vive Pro was used to provide a fully immersive experience. The users were able to walk in an area of up to 12 square meters (maximum 5 m between both tracking lighthouses) and interact with the virtual environment by using one wireless controller. A VR One MSI backpack (VR-ready computer) was used to run BioPhyS, and an additional screen was used to configure the scenarios and to mirror the VR simulation (see **Figure 2**).

#### Participants

Ten police officers from the police division of Hampton city (VA, United States) were recruited for the study. Participants were mostly males (nine males and one female) with ages from 21 to 43 years old. Participation in the study was advertised by the police department as voluntary, and the inclusion criterion was having past experience with real target shooting. The experiment was described as a playtest and the details were given to the participants before starting; every police officer also signed informed consent. **Table 2** summarizes the characteristics of the police officers who participated in the study.

guns. Right: view of the grabbable weapons that can be shot during the simulation.

#### Physiological Metrics

Police officers wore both wearable EEG and HR monitors during the training session using the HMD-VR system. The signals were synchronously recorded on a secondary computer.

#### Electroencephalography

TABLE 1 | List of simulation variables carefully defined to create the physiological modulation in the BioPhyS.

Brainwave activity was recorded using the wearable headband Muse BCI. The sensor includes four EEG electrodes in the TP9, Fp1, Fp2, and TP10 channel positions following the 10–20 standard. The device generates samples at a frequency of 500 Hz and contains proprietary algorithms that compute relevant parameters to quantify brain activity patterns such as the oscillatory rhythms (Sanei and Chambers, 2013) delta (δ, 1–4 Hz), theta (θ, 4–8 Hz), alpha (α, 8–12 Hz), beta (β, 12–30 Hz), and gamma (γ, 30–100 Hz). The proprietary software Muse Lab<sup>1</sup> was used to compute the power spectral density (PSD) of the EEG raw data for each channel for a frequency range from 0 to 110 Hz, using a Hamming window with a window-length of 256 samples and 90% overlap. The EEG metrics that were more relevant for exploration were:


#### Cardiovascular

The chest strap sensor Polar H10 was used to record the cardiac responses of police officers. It includes built-in algorithms that were used to calculate ECG parameters needed for the HRV analysis. The main variable computed is the R-to-R interval (RRI), in units of 1/1024 seconds.<sup>2</sup> HR and RRIs are broadcast by the sensor and saved locally using a custom-made client running on Windows and based on a Bluetooth Low Energy Windows API. The PhysioLab toolbox (Muñoz et al., 2017a) was used to compute both time and frequency domain HRV parameters. Extracted features include the standard deviation of the RRIs (SDNN) and the root mean square of successive differences (RMSSD) values from the time domain. Similarly, frequency domain parameters included high frequency (HF, 0.15–0.40 Hz), low frequency (LF, 0.04–0.15 Hz), and very low frequency (VLF, 0.0033–0.04 Hz), which were extracted from the PSD of the RRI signal. The PSD is computed by using a Welch estimator with a Hanning window, and spectrum components are averaged by an area-under-the-curve approach. The Polar chest strap has shown acceptable performance in calculating HR and RRI measurements under different scenarios including non-resting situations (Plews et al., 2017).

<sup>1</sup>https://sites.google.com/a/interaxon.ca/muse-developer-site/muselab

<sup>2</sup>https://support.polar.com/en/support/Selection\_Info\_Analysis\_for\_R\_R\_data

LEO, Law Enforcement Officer; SWAT, special weapons and tactics.
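As a rough illustration of the HRV features listed above, the following Python sketch computes SDNN, RMSSD, and Welch-based band powers from a series of RR intervals. It is not the PhysioLab implementation: the function names, the 4 Hz resampling of the RR series by linear interpolation, and the segment length are assumptions made for this example only.

```python
import numpy as np
from scipy.signal import welch

# Frequency bands (Hz) as given in the text.
BANDS = {"VLF": (0.0033, 0.04), "LF": (0.04, 0.15), "HF": (0.15, 0.40)}


def rri_ticks_to_ms(rri_ticks):
    """Convert RR intervals from 1/1024 s units to milliseconds."""
    return np.asarray(rri_ticks, dtype=float) * 1000.0 / 1024.0


def time_domain_hrv(rri_ms):
    """Return SDNN and RMSSD (both in ms) for a series of RR intervals."""
    rri_ms = np.asarray(rri_ms, dtype=float)
    sdnn = np.std(rri_ms, ddof=1)                   # sample standard deviation
    rmssd = np.sqrt(np.mean(np.diff(rri_ms) ** 2))  # successive differences
    return {"SDNN": sdnn, "RMSSD": rmssd}


def frequency_domain_hrv(rri_ms, fs=4.0):
    """Band powers (ms^2) from a Welch PSD of the evenly resampled RR series.

    Assumption: the irregular RR series is resampled at `fs` Hz by linear
    interpolation before the PSD is estimated.
    """
    rri_ms = np.asarray(rri_ms, dtype=float)
    beat_times = np.cumsum(rri_ms) / 1000.0         # beat times in seconds
    grid = np.arange(beat_times[0], beat_times[-1], 1.0 / fs)
    tachogram = np.interp(grid, beat_times, rri_ms)
    nperseg = min(len(tachogram), 1024)
    freqs, psd = welch(tachogram, fs=fs, window="hann", nperseg=nperseg)
    powers = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)
        f, p = freqs[mask], psd[mask]
        # Area under the PSD curve via the trapezoidal rule.
        powers[name] = float(np.sum((p[1:] + p[:-1]) / 2.0 * np.diff(f)))
    return powers
```

Note that a 3-min recording contains less than one full cycle of the slowest VLF components, so VLF estimates from such short windows should be read with caution.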

#### Other Measurements

#### Simulation Performance

The BioPhyS computed the participants' performance during firearm shooting training. It recorded specific simulation variables such as the total amount of shot bullets, number of destroyed targets, and headshots. The shooting performance metric was defined as the ratio between the number of destroyed targets and the shot bullets.
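The shooting performance metric can be written as a one-line ratio. The short Python sketch below is illustrative only: the function name is hypothetical, and returning 0.0 when no shots were fired is an assumption, since the text does not say how that case is handled.

```python
def shooting_performance(destroyed_targets: int, shots_fired: int) -> float:
    """Ratio of destroyed targets to bullets shot."""
    if destroyed_targets < 0 or shots_fired < 0:
        raise ValueError("counts must be non-negative")
    if shots_fired == 0:
        return 0.0  # assumption: no shots fired counts as zero performance
    return destroyed_targets / shots_fired
```

For example, 16 destroyed targets from 25 shots gives a performance of 0.64.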

#### Simulation Sickness

The level of motion sickness produced during interaction with the VR system was evaluated using the Simulator Sickness Questionnaire (SSQ; Kennedy et al., 1993). It assesses total severity of simulation sickness supported by three main symptom clusters called oculomotor (eyestrain, difficulty focusing, blurred vision, and headache), disorientation (dizziness and vertigo), and nausea (nausea, stomach awareness, increased salivation, and burping).

#### Post-session Subjective Interview

Police officers were briefly interviewed at the end of the session to gather their reactions in terms of (i) overall user experience, (ii) pros and (iii) cons of training with the BioPhyS, and (iv) envisioned improvements.

#### Experimental Protocol

#### Training Scenarios

The training scenarios were designed jointly with a military veteran from the research team. The training protocol included three difficulty modes: easy, medium, and hard, each lasting 3 min. The easy configuration laid out 10 static targets randomly distributed in the first 10 horizontal tracks of the training scenario. The medium setup used 10 targets moving at a speed of 0.5 m/s. The hard scenario used 20 targets moving at 1 m/s and was specifically designed to provoke frustration. Both the target size and hardness were kept constant across the difficulty levels. The same weapon, the Reichsrevolver, was used throughout the experiment, with a shooting power equal to the hardness of the target; thus, targets were instantly destroyed with one shot. Although unrealistic, the revolver was preferred considering its slow response, which encourages more careful aiming and shooting instead of less mentally and physically prepared training. A baseline condition was used to record physiological signals from the police officers during a passive stand-up situation, wearing the VR headset and physiological sensors, and holding the virtual weapon without shooting or interacting with the virtual environment.
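The three difficulty modes can be summarized as a small configuration structure. The Python sketch below simply restates the values given in the text; the class and field names are hypothetical and do not come from the BioPhyS code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One difficulty mode of the training protocol (names are illustrative)."""
    name: str
    n_targets: int
    target_speed_mps: float  # 0.0 means static targets
    duration_s: int = 180    # each mode lasted 3 min


SCENARIOS = (
    Scenario("easy", n_targets=10, target_speed_mps=0.0),
    Scenario("medium", n_targets=10, target_speed_mps=0.5),
    Scenario("hard", n_targets=20, target_speed_mps=1.0),
)


def total_shooting_time_s(scenarios=SCENARIOS) -> int:
    """Total active shooting time across all difficulty modes, in seconds."""
    return sum(s.duration_s for s in scenarios)
```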

#### Procedure

The study protocol was reviewed for ethical treatment of human subjects and approved by the Office of the Chief of Police of the City of Hampton, VA, serving as the research ethics committee. All police officer subjects volunteered to participate and gave written informed consent in accordance with the Declaration of Helsinki. The experiment lasted around 40 min per participant, including questionnaires, the connection of sensors and interaction. The informed consent was signed at the beginning of the experiment together with a short demographics form. Then, the sensors were attached to the police officer's forehead and chest, the VR headset was fitted, and participants were instructed to shoot some targets before starting to verify that the protocol was understood properly. Baseline physiological measurements were taken for 3 min, during which users were asked to avoid facial movements such as speaking, as well as visual navigation involving head movement. During the active shooting moments, police officers were also instructed to avoid squinting while aiming in order to minimize signal artifacts in the EEG sensor caused by facial movements. The easy, medium, and hard scenarios were manually set up by the researcher, and the participants interacted with them with a 2-min resting period between sessions. Finally, users filled out the SSQ and completed the short post-session interview.

#### Data Analysis

Collected physiological data were processed offline using MATLAB (v2013b). Individual cardiovascular and EEG parameters were computed and averaged for statistical analysis. EEG bandpowers and index analysis initially explored frontal Fp1 and Fp2 electrodes separately; however, frontal lobe activity was ultimately computed by averaging the contribution of both electrodes. Data normality was checked using Kolmogorov-Smirnov tests. Data with normal distribution were statistically analyzed using parametric tests, whereas non-parametric tests were used for non-normal distributions to determine the influence of the simulation difficulty as the main effect. Post hoc analysis using Bonferroni correction was used to follow up the findings.
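The non-parametric branch of this analysis can be sketched in Python with SciPy. This is an illustration rather than the authors' MATLAB code. It assumes the omnibus test is a Friedman test (suggested by the reported chi-square statistics with 3 degrees of freedom across four conditions, but not named in the text), and the function names and the `pairs` parameter are hypothetical. The reported 0.0125 significance level corresponds to four Bonferroni-corrected comparisons; because the exact set of comparisons is not listed, the pairs are left as an argument.

```python
from itertools import combinations

import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon


def frontal_average(fp1, fp2):
    """Average the Fp1 and Fp2 contributions into one frontal value."""
    return (np.asarray(fp1, dtype=float) + np.asarray(fp2, dtype=float)) / 2.0


def difficulty_effect(data, alpha=0.05, pairs=None):
    """Omnibus test across conditions, then Bonferroni-corrected Wilcoxon tests.

    data: dict mapping condition name -> one value per participant, with
          participants in the same order in every condition.
    pairs: condition pairs to compare post hoc (default: all pairs).
    """
    names = list(data)
    stat, p = friedmanchisquare(*(data[n] for n in names))
    result = {"friedman_chi2": float(stat), "friedman_p": float(p), "posthoc": {}}
    if p < alpha:
        pairs = list(pairs) if pairs else list(combinations(names, 2))
        corrected_alpha = alpha / len(pairs)  # Bonferroni adjustment
        result["corrected_alpha"] = corrected_alpha
        for a, b in pairs:
            p_pair = float(wilcoxon(data[a], data[b]).pvalue)
            result["posthoc"][(a, b)] = {
                "p": p_pair,
                "significant": p_pair < corrected_alpha,
            }
    return result
```

Passing four pairs, for instance baseline against each difficulty plus easy against hard, yields the 0.0125 level; that particular choice of pairs is only an example.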

### RESULTS

# Physiological Responses

# Electroencephalography

Electroencephalography data recorded from the frontal lobe using the Fp1 and Fp2 electrodes of the Muse BCI system revealed significant brain activity patterns associated with the different difficulty levels used in the BioPhyS. Two specific EEG patterns showed significant results. Firstly, the frontal theta levels (θ), χ²(3) = 10.21, p < 0.05, showed significant changes across simulation difficulty levels. Wilcoxon tests with Bonferroni adjustments were used to follow up the initial findings, so all effects are reported at a 0.0125 level of significance. Results revealed that frontal theta (θ) activity differed significantly from easy to hard difficulty levels (Z = −1.58, p = 0.009) for the police officers (see **Figure 3**). Secondly, although the theta/beta ratio was significantly influenced by the simulation difficulty factor, χ²(3) = 10.69, p < 0.05, no significant differences were found with the Wilcoxon tests.

#### Cardiovascular Responses

Cardiovascular responses were quantified via HR and HRV parameters. HRs of police officers were significantly affected by the simulation difficulty, χ²(3) = 18.84, p < 0.05. Post hoc tests (Wilcoxon with Bonferroni) showed significant differences between the baseline measurements of HR and the easy (Z = −2.80, p = 0.005), medium (Z = −2.80, p = 0.005), and hard (Z = −2.80, p = 0.005) difficulty levels (see **Figure 4**).

The HRV analysis showed similar results for both time and frequency domains. RMSSD, χ²(3) = 18.48, p < 0.05, and VLF, χ²(3) = 18.84, p < 0.05, values were significantly affected by the simulation difficulty. In particular, the differences between baseline and easy (Z = −2.80, p = 0.005), medium (Z = −2.80, p = 0.005), and hard (Z = −2.80, p = 0.005) were significant in the post hoc analysis.

#### Simulation Performance

Simulation (or shooting) performance was computed as the ratio between the number of destroyed targets and the shot bullets. Simulation difficulty significantly influenced the shooting performance of police officers, χ²(3) = 18.84, p < 0.05, revealing 64, 46, and 24% performance for the easy, medium, and hard difficulties, respectively.

FIGURE 3 | Theta brainwave patterns analyzed for the baseline, easy, medium, and hard difficulties in the BioPhyS. Asterisk (<sup>∗</sup>) denotes significant results following Wilcoxon tests.

#### Simulation Sickness

Simulation sickness was measured immediately following the experience by asking police officers about their physical and cognitive status using the SSQ. Data from one of the users were discarded due to recording errors. Test subjects reported eight or fewer SSQ symptoms, which is categorized as minimal symptoms, indicating that the VR experience did not impact users' operation of the system (Kennedy et al., 2001; Saredakis et al., 2020). Further analysis of the SSQ was conducted, and cut-off scores were defined based on the 75th percentile of the calibration sample, to represent the majority of users in the test sample, in order to detect whether users were adversely affected by exposure to the virtual reality scenario (Kennedy et al., 1993). The reference score is 15 for total severity; the thresholds for the subscales are 9.5 for nausea, 15.2 for oculomotor, and 0 for disorientation. Results revealed that three police officers had sickness scores above the threshold in all four aspects, and a total of six participants had high scores in the subscales of nausea and disorientation. Eye strain and difficulty focusing were reported by half of the police officers at the end of the training, although they never reached severe intensities.
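The comparison against the cut-off scores can be sketched as follows in Python. The subscale and total weights are those of the published SSQ scoring procedure (Kennedy et al., 1993), and the cut-offs are the values quoted above. The function names are hypothetical, and the raw subscale sums are assumed to have been obtained beforehand from the item ratings.

```python
# Weights from the published SSQ scoring procedure (Kennedy et al., 1993).
SUBSCALE_WEIGHTS = {"nausea": 9.54, "oculomotor": 7.58, "disorientation": 13.92}
TOTAL_WEIGHT = 3.74

# Cut-off scores quoted in the text.
CUTOFFS = {"total": 15.0, "nausea": 9.5, "oculomotor": 15.2, "disorientation": 0.0}


def ssq_scores(raw_nausea, raw_oculomotor, raw_disorientation):
    """Weighted SSQ scores from raw subscale sums (item ratings 0 to 3)."""
    raw = {
        "nausea": raw_nausea,
        "oculomotor": raw_oculomotor,
        "disorientation": raw_disorientation,
    }
    scores = {name: raw[name] * SUBSCALE_WEIGHTS[name] for name in raw}
    scores["total"] = sum(raw.values()) * TOTAL_WEIGHT
    return scores


def above_cutoff(scores):
    """Flag each score that exceeds its cut-off."""
    return {name: scores[name] > CUTOFFS[name] for name in CUTOFFS}
```

With a disorientation cut-off of 0, any reported disorientation symptom places a participant above that threshold.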

Additionally, the mean and standard deviation were computed for the three distinct symptom clusters and the total score for the SSQ, and the relative severity was calculated by comparison with the nine calibration simulators utilized in the questionnaire. Results, depicted in boxplot format in **Figure 5**, indicate SSQ total severity (M = 15.4, SD = 11.1), nausea (M = 8.5, SD = 7.4), oculomotor (M = 16.8, SD = 12.4) and disorientation (M = 13.9, SD = 13.9) that fall between the first and second simulators with higher sickness. Therefore, further examination of the data and further development of the simulator are necessary to reduce the effects on users of the VR simulation used in this study.

## Subjective Interview

Overall user experience was described positively by all police officers participating in the study. Two participants emphasized that the simulation made them focus on breathing and concentrate on remaining calm while shooting. Five participants described the experience as "good" and "cool" while highlighting their enthusiasm for trying VR for the first time. One subject mentioned that "graphics were really good" while others mentioned that it was "stressful as things became more difficult." As advantages (pros) of the simulation system, three participants liked the simplicity of the setup, mentioning that the sensors were minimally invasive and comfortable. Three participants highlighted the realism of the simulation and the accuracy of the motion tracking. The possibility to personalize the training scenarios was mentioned by one of the users as the most remarkable advantage of the system. As disadvantages (cons) of the simulation system, five police officers mentioned being concerned about the sights and the aiming issues resulting from not being allowed to squint their eyes. Additionally, three officers reported some blurriness in the HMD. Three participants also mentioned mapping problems between the controllers and the virtual weapon since "the controller not as much like a weapon as would like." Finally, police officers proposed several features that can be implemented to improve the system, from which we highlight the following: (i) replace the controllers with air pistols (or more realistic devices), (ii) allow users to see the sights of the weapon to improve shooting accuracy, and (iii) integrate scenario-driven interactions.

Finally, the four police officers trained as firearms instructors endorsed the potential of the system for firearm training, the simplicity of the setup including the HMD-VR headset and physiological sensors and the training scenario personalization.

# THEORETICAL MODEL TO INTEGRATE BIOCYBERNETIC ADAPTATION

# Psychophysiological Model

A psychophysiological model based on empirical findings under various levels of challenge is an integral part of the design of a biocybernetic adaptation layer that creates real-time modulations in simulation difficulty based on targeted physiological responses. The resulting psychophysiological model derived for the trained police officer population in this study is used to inform the biocybernetic adaptation approach that can be integrated into the simulation system by using the BL Engine. With the BioPhyS system, this model is developed by characterizing the difference between resting states when compared with active shooting states in a VR simulator in both brain responses and sympathetic signatures. This characterization helps specify the values of physiological variables to be targeted in the biocybernetic adaptation system as well as serve as a control measure for assessing the effects of adding a biocybernetic adaptation layer to a VR training system. Considering the results from the characterization study, we created a psychophysiological model (see **Figure 6**) that reveals opportunities to integrate biocybernetic adaptation in the BioPhyS using the police officers' data.

# Preliminary Computational Model

The simulation difficulty in the BioPhyS covered easy, medium and hard scenarios. The design of those scenarios was intended to evoke responses from police officers during specific human states such as minimal engagement (simple challenge considering their skills), engagement (balanced challenge/skill), and frustration (tough challenge considering their skills). The captured neurophysiological and cardiovascular responses from police officers across the simulation difficulty levels reflect specific physiological signatures that must be used to define the intended automatic adaptation. Firstly, two specific EEG metrics showed statistical significance across simulation difficulties: frontal theta and the theta/beta ratio. However, for real-time adaptation, frontal theta is preferred since it can be captured directly from the sensor without any further computational effort (e.g., dividing one bandpower by another). Secondly, cardiovascular responses also revealed significant differences in HR, and in RMSSD and VLF for HRV. HR is also preferred considering computational efficiency for real-time adaptation. Additionally, one of the firearms instructors of the research team pointed out literature suggesting specific sympathetic responses that trainees should elicit during training. Literature research revealed the 100–115 BPM range as the targeted HR zone defined by firearm trainers to stress the heart enough to facilitate timely cognitive responses without hampering fine motor skill performance (Mason, 1998). Therefore, we used frontal theta as the concentration (mental focus) metric and HR as the calmness (cardiovascular regulation) metric to create an adaptive rule in the BL Engine. The rule is shown schematically in **Figure 7**. The implementation allows the rules to be easily modified by dragging and dropping specific blocks that connect logical instructions.

The rule involves capturing physiological data from data receiver blocks and averaging it using the array buffer and math array blocks. Averaged physiological data is then compared against the threshold values that are inferred from the psychophysiological model to finally create the changes in the simulation based on the detected values. For instance, this rule creates modulations toward increasing simulation difficulty (e.g., increasing target speed and reducing target size) if police officers are not reaching the intended targeted values for getting them engaged (e.g., −0.1 dB frontal alpha and 100 BPM for HR). Additionally, data from the BL Engine can be sent to the BioPhyS to provide a biofeedback display (panel at the right side of **Figure 8**) for the police officers and investigate the effects of implicit and explicit feedback on training performance (Kuikkaniemi et al., 2010).
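The buffer, average, and compare logic of such a rule can be sketched in plain Python. This is not the BL Engine, which is configured through drag-and-drop blocks in Unity3D; the class and function names, the buffer size, the step sizes, the way the EEG and HR conditions are combined, and the "decrease" branch above 115 BPM are all assumptions added for illustration. Only the example thresholds (−0.1 dB and 100 BPM) and the 100–115 BPM zone come from the text.

```python
from collections import deque
from dataclasses import dataclass
from statistics import fmean


@dataclass(frozen=True)
class Difficulty:
    target_speed: float = 0.5  # m/s
    target_scale: float = 1.0  # relative target size


class AdaptiveRule:
    """Average buffered samples and compare them against thresholds."""

    def __init__(self, eeg_threshold=-0.1, hr_low=100.0, hr_high=115.0,
                 buffer_size=10):
        self.eeg_threshold = eeg_threshold  # dB, example value from the text
        self.hr_low = hr_low                # BPM, lower edge of the target zone
        self.hr_high = hr_high              # BPM, upper edge (assumed use)
        self.eeg_buffer = deque(maxlen=buffer_size)
        self.hr_buffer = deque(maxlen=buffer_size)

    def push(self, eeg_db, hr_bpm):
        """Store one new sample from each data receiver."""
        self.eeg_buffer.append(eeg_db)
        self.hr_buffer.append(hr_bpm)

    def decide(self):
        """Return 'increase', 'decrease', or 'hold'."""
        if len(self.hr_buffer) < self.hr_buffer.maxlen:
            return "hold"  # wait until the buffer is full
        mean_eeg = fmean(self.eeg_buffer)
        mean_hr = fmean(self.hr_buffer)
        if mean_hr > self.hr_high:
            return "decrease"  # assumption: back off above the target zone
        if mean_eeg < self.eeg_threshold or mean_hr < self.hr_low:
            return "increase"  # targeted values not reached
        return "hold"


def apply_decision(decision, difficulty, speed_step=0.1, scale_step=0.1):
    """Map a decision onto target speed and target size."""
    if decision == "increase":
        return Difficulty(difficulty.target_speed + speed_step,
                          max(0.1, difficulty.target_scale - scale_step))
    if decision == "decrease":
        return Difficulty(max(0.0, difficulty.target_speed - speed_step),
                          difficulty.target_scale + scale_step)
    return difficulty
```

In a running loop, `push` would be called for every incoming sample and `decide` at a fixed interval, so that difficulty changes follow the averaged signal rather than single readings.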

Although controlled experiments to validate the real-time model have not been carried out yet, the proposed model and computational solution serve as an initial asset for the future integration of biocybernetic adaptation.

# DISCUSSION

We present a repeated-measures experiment of firearms training using an HMD-VR setup and neurophysiological and cardiovascular measurements to model the psychophysiological responses of police officers (n = 10) to different simulation difficulty levels. To induce specific graded stressors in the simulation, the system was designed to allow manual modulation of specific parameters in the VR simulator (e.g., target speed). Three different scenarios (easy, medium, and hard) were used to investigate the police officers' physiological and subjective responses while actively shooting in the simulation, and results were compared with measurements from resting states. Results show how police officers' brainwaves and cardiovascular responses were significantly affected by modulations in the simulation difficulty. Particularly, frontal theta values were significantly different between easy and hard difficulty levels, whereas HR and HRV data (RMSSD and VLF) at rest were significantly different from those in the easy, medium, and hard difficulties.

### Police Officers and Cognitive Readiness

Crucial elements for police firearms training include mastering very specific psychophysiological responses associated with controlled breathing while shooting. Novel immersive VR systems such as the BioPhyS allow the recreation of realistic scenarios where trainees can be exposed to different stressors while physiological responses are recorded. We showed how frontal theta values were significantly higher during the simulation difficulties that were not created to frustrate the police officers. Decrements in theta (θ) values have been commonly associated with non-attentional states such as drowsiness, severe sleepiness (Liu et al., 2009), or lower levels of arousal (Sanei and Chambers, 2013). Thus, controlled modulations toward increasing frontal theta (θ) levels can help police officers to train cognitive readiness during shooting scenarios. Moreover, values of the theta/beta ratio collected during the experiment showed that this metric could be used to differentiate between difficulty levels in simulations for firearm training.

On the other hand, police shooters are affected not only by responses of the central nervous system, but also by those of the peripheral system. A consistent and mastered skill of controlling cardiovascular responses able to produce desired patterns of respiration has been identified as a major component of firearms training (Johnson, 2007). HR levels significantly increased from the resting or baseline recording where police officers were not engaged with shooting activities. A trend in **Figure 4** shows how HR levels were incrementally impacted by the simulation difficulty, demonstrating how, via systematic modulations in the simulation parameters, cardiovascular responses of shooters can be affected. Similar responses have been found in police officers after training practices under pressure (Oudejans, 2008). Although only a few parameters were used to modulate the simulation difficulties, further research using more metaphorical simulation variables (e.g., daylight and rain intensity) could reveal the ideal configuration to elicit the desired cardiovascular responses (Marchiori et al., 2018). Finally, we believe the overall user experience was positively perceived by police officers due to (i) the wearability of the physiological sensors, which avoided extra discomfort produced by more invasive setups (e.g., multichannel ECG signals or EEG caps), (ii) the realism of the simulation provided by the VR equipment used as well as the accuracy of the motion tracking achieved through the room-scale play area using two lighthouses, and (iii) the usefulness of the system perceived by the instructors and other police officers who were enthusiastic about using this technology in real training.

# Characterization, Psychophysiological Model, and the Physiological Intelligence Layer

The characterization study with 10 police officers serves as a starting point for the design of physiological computing system that would be able to create the biocybernetic adaptation in the BioPhyS. From the model (see **Figure 6**) two trends can be identified, while HR levels increased proportionally with the simulation difficulty, the frontal theta values decreased. Theta waves have been found as a sensitive neurophysiological marker to describe participant's discomfort (Heo and Yoon, 2020) and motion sickness (Park et al., 2008) in gaming/VR related studies. Moreover, cardiovascular responses such as HR levels have been also studied in firearms training showing heightened states of arousal after active shooting (Heim et al., 2006). Thus, differences between the active shooting and baseline revealed that just by engaging people in the virtual shooting activity without considering the difficulty level, specific psychophysiological patterns can be identified. This is important since it allows differentiating between the physiological signature of being wearing the headset and sensors as well as being standing holding the controllers and the cognitive and cardiovascular cost of shooting. In other words, this provides a certain level of context awareness, so the adaptive system would be able to create more personalize modulations (Novak, 2014). Finally, the adaptive rule created using the BL Engine tool (see **Figure 7**) shows how to move from the theoretical model to a real software implementation by: (i) combining both cardiovascular and neurophysiological features into a clearly defined and transparent adaptive rule, (ii) allowing the modifications of the rule and values in real-time, so speeding up threshold's adjustment for realtime adaptation, and (iii) fully integrating the VR simulator with the BL Engine, so a bidirectional communication will take place enabling research on biofeedback visualization strategies (Kuikkaniemi et al., 2010). 
Relatedly, Kuikkaniemi et al. (2010) hypothesized that the increase in immersion due to adaptive biofeedback that they demonstrated in the context of first-person shooter games could also be achieved in other contexts such as 3D virtual environments (Malińska et al., 2015). There is also evidence that neurofeedback, which BioPhyS employs, may boost the effects of cognitive training (Dessy et al., 2018). Biocybernetic adaptation integrated into exercise-based videogames (exergames) has been used effectively to encourage older users to exert themselves in targeted cardiac zones (Muñoz et al., 2018). A similar concept can be explored here, where the difference between the trainee's HR response and the target or setpoint HR can be used to drive attributes of the simulation task, e.g., target speed or hindering rain intensity, which, in turn, would be expected to drive the trainee's HR.
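The setpoint idea described above can be illustrated with a minimal proportional rule. This is only a sketch under stated assumptions, not the adaptive rule implemented in the BL Engine: the class name `SetpointAdapter`, the gain, the bounds, and the normalized "target speed" attribute are all hypothetical. It also assumes that raising the task attribute tends to raise HR, in line with the trend reported for simulation difficulty.

```python
from dataclasses import dataclass


@dataclass
class SetpointAdapter:
    """Proportional rule that nudges one task attribute to reduce the HR error.

    All names and default values are hypothetical and chosen for illustration.
    """
    setpoint_bpm: float   # target heart rate (beats per minute)
    gain: float = 0.01    # change in the attribute per bpm of error
    lower: float = 0.0    # lower bound of the attribute (e.g., normalized target speed)
    upper: float = 1.0    # upper bound of the attribute

    def update(self, attribute: float, measured_bpm: float) -> float:
        # Positive error: HR is below the setpoint, so the challenge is raised.
        error = self.setpoint_bpm - measured_bpm
        proposed = attribute + self.gain * error
        # Clamp so the simulation never leaves its allowed range.
        return min(self.upper, max(self.lower, proposed))


adapter = SetpointAdapter(setpoint_bpm=100.0)
speed = 0.5
for hr in (90.0, 95.0, 110.0):  # made-up HR samples
    speed = adapter.update(speed, hr)
```

A purely proportional rule is the simplest choice; a real system would probably also need smoothing of the HR signal and a limit on how fast the attribute may change.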

# Future Work With Virtual Reality and Biocybernetic Adaptation

In the BioPhyS system programmed for self-regulation training, attributes of the VR simulation task, e.g., target speed or rain intensity, are adjusted in ways designed to encourage certain psychophysiological response changes and to discourage others. Physiological self-regulation training with the BioPhyS leverages two memory process principles to promote the transfer of skill learning. Stimuli in the VR environment are designed to match stimuli in the real-world shooting situation, thereby promoting the transfer of firearm skill learning via a memory process known as encoding specificity (Pugnetti et al., 2001; Jaiswal et al., 2010). Physiological states that are determined to be effective in the real-world shooting situation are rewarded in the training setting by biocybernetic adaptation, thereby promoting transfer of learning via a memory process known as state-dependent learning (Pugnetti et al., 1996). To investigate how to integrate biocybernetic adaptation strategies into novel immersive systems for firearm training, the BioPhyS can leverage physiological computing technologies to gain intelligent capabilities. Specifically, in this project, the BL Engine offers strategies suitable for enhancing shooting performance by controlling the elements in the virtual environment that affect the simulation difficulty and the officer's concentration and relaxation levels (a concentrated, so-called "calm, cool, and collected" state). To improve their scores in the shooting scenarios, police officers would have to use self-regulation strategies (e.g., respiration and attentional control) that keep them calm and focused, hence reducing the probability of making mistakes in real-life situations (Blacker et al., 2018). Additionally, the physiological challenges can be modified by trainers (in real time, if needed), allowing very dynamic and personalized training.

Including physiological sensing technologies in novel immersive and interactive technologies such as augmented reality or VR has been proposed as the new generation of gaming interfaces (Pope et al., 2014). This integration is far from simple and straightforward, since including the human in the loop carries several widely discussed complexities (Novak, 2014). Nevertheless, current advances in sensing technologies, game engine software, and hardware accessibility have allowed the development of very sophisticated biocybernetically adaptive systems empowered with the immersivity and realism of VR (Kosunen, 2018; Muñoz et al., 2019). In order to get a more solid and integrated VR-biocybernetic confluence, advances in three aspects are required:

• Improvements in the hardware integration: to aid the design and development process of biocybernetically empowered VR applications, the integration of physiological sensing devices and VR headsets should be minimally intrusive and less cumbersome. Although in our experience we used minimally intrusive sensors, the use of VR headsets that include sensors to record body signals (e.g., Neurable,<sup>3</sup> Looxidlabs,<sup>4</sup> Interaxon; Aimone et al., 2018) will aid the widespread adoption of such technological symbiosis.


# STUDY LIMITATIONS

Our study has several limitations, listed as follows:

• Sample size: the size of the sample, n = 10, was small. Therefore, caution should be exercised in generalizing the results of this VR characterization study. The results should be confirmed in future studies with a larger sample size. Additionally, participants were diverse in terms of training skills and demographics.

<sup>3</sup>http://www.neurable.com/

<sup>4</sup>https://looxidlabs.com/


Further research is needed to provide a more conclusive understanding of how biocybernetic adaptation can impact police officers' performance in firearms training using VR simulations.

### CONCLUSION

This study characterized the dynamical neural and cardiac responses of police officers in a VR-based target shooting scenario and provides the foundation for a biocybernetic system with an intelligent adaptation layer. The simulation difficulty factor was shown to influence shooting performance, frontal theta brainwave activity and HR, providing a specification of values of the physiological variables to be targeted in a biocybernetic adaptation system as well as a control measure for assessing the effects of adding a biocybernetic adaptation layer to a VR training system. Assessment of the results from the Simulator Sickness Questionnaire showed that the experience was generally well tolerated. Further development and refinement of the VR simulation capability (including ergonomic improvements to the headset) are suggested to mitigate unintended effects of simulation stimuli on users of such systems. An explanation was given for employing the neural and cardiac findings to integrate physiological intelligence into the VR simulator using a specialized software tool, the BL Engine. The next steps in this research must employ the provided model to quantify the effects of using biocybernetic adaptation in VR-based firearms training for police officers.

# DATA AVAILABILITY STATEMENT

The datasets generated for this study are not publicly available as participants did not provide their informed consent for the public sharing of their data. The datasets are available on request to john.munoz.hci@uwaterloo.ca.

# ETHICS STATEMENT

Ethical review and approval was completed for the study on human participants in accordance with the local legislation and institutional requirements. The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this manuscript.

# AUTHOR CONTRIBUTIONS

JM and AP designed and defined the BioPhyS approach. JM conducted the experiment and collected the data. LQ developed the VR simulation and carried out the integration with the BL Engine. CS contributed and reviewed the manuscript. All authors revised and approved the current version of the manuscript.

### FUNDING

The authors declare that this study received funding from J&F Alliance Group, which financially supported the development process with access to VR hardware. This funder was not involved in the study design, analysis, interpretation of data, the writing of this article or the decision to submit it for publication. The study also received funding from Stockholm University to cover open access publishing fees.

### ACKNOWLEDGMENTS

We would like to thank: (i) the J&F Alliance Group, who financially supported the internship and VR development process, (ii) the National Institute of Aerospace (NIA), which supported the convergence of the stakeholders for this project, (iii) the PRISM team from NASA Langley for its very supportive feedback throughout the project development, (iv) Zeltech employees at Hampton facilities, and (v) Stockholm University for providing open access publication fees. Special thanks to personnel from the Hampton, Virginia Police Department, who participated actively in our studies, and to Jeremy Sklute and Mike Priddy from J&F Alliance, who helped improve the system's realism. This collaboration with NASA was established through the Langley Research Center Commercialization Office with the J&F Alliance Group, including Dr. Munoz and Dr. Quintero, who licensed several NASA patents related to simulation and game psychophysiological modulation. The use case that the J&F Alliance Group determined to be high potential was military/police training. The support and cooperation of the Langley Research Center Commercialization Office (Jesse Midgett) and System Wide Safety Project (John Koelling) are gratefully appreciated.



**Conflict of Interest:** AP was employed by the company Learning Engagement Technologies.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Muñoz, Quintero, Stephens and Pope. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Well Done! Effects of Positive Feedback on Perceived Self-Efficacy, Flow and Performance in a Mental Arithmetic Task

Corinna Peifer<sup>1</sup> \*, Pia Schönfeld<sup>2</sup> , Gina Wolters<sup>1</sup> , Fabienne Aust<sup>1</sup> and Jürgen Margraf<sup>2</sup>

<sup>1</sup> Faculty of Psychology, Applied Psychology in Work, Health, and Development, Ruhr University Bochum, Bochum, Germany, <sup>2</sup> Faculty of Psychology, Mental Health Research and Treatment Center, Ruhr University Bochum, Bochum, Germany

#### Edited by:

Stephen Fairclough, Liverpool John Moores University, United Kingdom

#### Reviewed by:

Osman Titrek, Sakarya University, Turkey; Nicola Baumann, University of Trier, Germany

> \*Correspondence: Corinna Peifer corinna.peifer@rub.de

#### Specialty section:

This article was submitted to Performance Science, a section of the journal Frontiers in Psychology

Received: 25 July 2019 Accepted: 22 April 2020 Published: 10 June 2020

#### Citation:

Peifer C, Schönfeld P, Wolters G, Aust F and Margraf J (2020) Well Done! Effects of Positive Feedback on Perceived Self-Efficacy, Flow and Performance in a Mental Arithmetic Task. Front. Psychol. 11:1008. doi: 10.3389/fpsyg.2020.01008

Self-efficacy is a well-known psychological resource, being positively associated with increased performance. Furthermore, results from field studies suggest a positive impact of self-efficacy on flow experience, which has not yet been tested experimentally. In this study, we manipulated self-efficacy by means of positive feedback and investigated whether self-efficacy serves as a mediator in the relationship between positive feedback and flow and in the relationship between positive feedback and performance. Our sample consisted of 102 participants (63 female, 39 male). The experimental group received positive feedback after completing 5 min of mental arithmetic tasks on a computer, whereas the control group received no feedback. A second session of a mental arithmetic task was then completed for 5 min. Mediation analyses confirmed that specific self-efficacy mediated a positive effect of positive feedback on flow as well as on both performance measures (quality and quantity) in a subsequent task. However, direct effects of feedback on flow and on performance were not significant, which suggests the presence of other mechanisms that remain to be investigated.

#### Keywords: feedback, self-efficacy, flow, performance, mental arithmetic

# INTRODUCTION

"Well done!" – Positive feedback has been found not only to enhance performance (e.g., Kluger and DeNisi, 1996; Hattie and Timperley, 2007), but also to be an efficient intervention to manipulate perceived self-efficacy (e.g., Brown et al., 2012). Self-efficacy refers to the judgment of one's own abilities to successfully cope with future demands (Bandura, 1983). It can either refer to a general judgment, called general self-efficacy, or it can refer to more specific domains, such as mathematical skills, then called specific self-efficacy. Self-efficacy is well-known as a psychological resource protecting mental health and buffering the negative effects of stress (Bandura, 1977, 1986; Schönfeld et al., 2016). In line with Bandura's social cognitive theory (SCT), there is a broad basis for higher self-efficacy being associated with lower symptoms of depression and anxiety as well as with higher optimism and emotional well-being (Benight et al., 1999; Rottmann et al., 2010; Singh and Bussey, 2011; Wang et al., 2014).

While self-efficacy has been found to enhance performance (e.g., Bandura and Locke, 2003), contradictory findings also exist, suggesting that effects may differ with respect to different performance outcomes such as performance quality and quantity (Vancouver et al., 2014). The first part of our study addresses this still open research question by examining positive feedback as an intervention to manipulate self-efficacy and by testing how self-efficacy affects performance quality and quantity. Further, we contribute to research on the mechanisms via which the effects of positive feedback are transmitted to performance by using self-efficacy as a mediator in the feedback-performance relationship.

The second part of this study examines flow – the experience of being fully absorbed in a task (Csikszentmihalyi, 1975) – in relation to positive feedback and self-efficacy. Feedback has been described as a central antecedent of flow (e.g., Landhäußer and Keller, 2012), but effects of positive feedback on flow have not yet been investigated experimentally. Another antecedent of flow as identified in field studies (e.g., Zubair and Kamal, 2015a,b) is self-efficacy. This study aims to replicate the positive relationship between self-efficacy and flow in an experimental setting, in which self-efficacy is manipulated using positive feedback. Finally, and adding to the existing research, we aim to test whether self-efficacy transmits effects of positive feedback on flow experience.

#### Positive Feedback and Performance

Feedback can be defined as the "provision of information regarding some aspect(s) of one's task performance" (Kluger and DeNisi, 1996, p. 255). Meta-analyses show impressive effects of feedback on increased performance, with average effect sizes of d = 0.40 (Kluger and DeNisi, 1996) and d = 0.79 (Hattie and Timperley, 2007). Research has identified moderators of the feedback-performance relationship, with findings suggesting that positive feedback is more efficient than negative feedback. For example, Arbel et al. (2014) found that positive feedback improved learning performance more than negative feedback. Furthermore, it has been found that feedback after good trials enhanced learning in comparison to feedback after poor trials (Chiviacowsky and Wulf, 2007). In line with these findings, the meta-analysis of Kluger and DeNisi (1996) found that feedback following correct results, that is, positive feedback, was more effective than feedback following incorrect results. Furthermore, feedback was more effective when it was provided by a computer (d = 0.41) vs. not (d = 0.23; Kluger and DeNisi, 1996).

One particular type of feedback is normative feedback, which refers to information on one's performance compared to referenced others, allowing comparative inferences (Hartwell and Campion, 2016). Accordingly, positive normative feedback is the information that one's performance was better than that of referenced others, such as feedback indicating above-average performance. Studies suggest that such positive normative feedback – even if it is false feedback – leads to increased performance compared to negative normative feedback (i.e., indicating below-average performance; Bandura and Jourden, 1991; Wulf et al., 2010). Based on these findings, we expect to replicate earlier studies by finding positive effects of positive normative feedback on performance.

# Positive Feedback as an Intervention to Increase Self-Efficacy

Previous research has shown that false normative positive feedback not only affects performance, but also self-efficacy, and such feedback has been successfully applied to manipulate self-efficacy (e.g., Reynolds, 2006; Beattie et al., 2016; Brown et al., 2016; Dimotakis et al., 2017). This has also been used in experiments with mental arithmetic tasks (Weinberg et al., 1979; Wright and Gregorich, 1989; Eden and Zuk, 1995; Brown et al., 2012). The approach is in line with Bandura's SCT (Bandura, 1977). Bandura (1977) pointed out that there are different kinds of information that lead to expectations about personal efficacy: performance accomplishment, vicarious experience, verbal persuasion and physiological states. Accordingly, external persuasion through positive feedback to induce an experience of success should be an effective strategy to manipulate self-efficacy (Achterkamp et al., 2015). We therefore expect to replicate positive effects of positive normative feedback on self-efficacy.

# Self-Efficacy and Performance

Successful mastery experiences contribute to the development of efficacy beliefs and increase the investment of effort and the level of performance (Bandura, 1997). Perceived self-efficacy is a key dynamic and malleable factor affecting behavior (Gist and Mitchell, 1992; Hardy, 2014), and some evidence indicates that higher self-efficacy leads to better performance in cognitive and sports tasks (e.g., Beattie et al., 2014; Niemiec and Lachowicz-Tabaczek, 2015). At the same time, divergences in social cognitive and control theories lead to different assumptions about the effects of self-efficacy (see Bandura and Locke, 2003; Bandura, 2012; Schönfeld et al., 2017). For example, Powers' (1973, 1991) perceptual control theory assumes that the discrepancy between one's personal goal and one's perceived progress in handling a situation successfully regulates the performed action (e.g., Vancouver et al., 2002). In case of a low discrepancy, the person will invest fewer resources in achieving the goal, and successful performance is assumed to be easy. In situations in which perceived progress is ambiguous, perceived capabilities (i.e., self-efficacy) can be used as an indicator of progress. As a consequence, high perceived skills will lead to a decreased perceived discrepancy between goal and progress. Thus, according to perceptual control theory, high self-efficacy would undermine performance and motivation. Initial empirical findings support these assumptions (e.g., Vancouver et al., 2001, 2002, 2008, 2014; Vancouver and Kendall, 2006; Vancouver, 2012; Beattie et al., 2014). A study by Vancouver et al. (2014), for example, found that self-efficacy was negatively related to performance quality, while it was positively related to performance quantity. They assumed that individuals with high self-efficacy allocate less effort per task, which leads to faster progress (performance quantity), but lower quality of results in the form of more mistakes. Yet, more research is needed to disentangle effects of self-efficacy on different performance measures.

In accordance with the findings of Vancouver et al. (2014), our study differentiates between performance quantity and performance quality in that we assume that self-efficacy has positive effects on performance quantity, but negative effects on performance quality.

A general limitation of research on self-efficacy and performance is that it is largely based on observational rather than experimental designs, so no conclusions can be drawn about the direction of effects. While most studies assume positive effects of self-efficacy on performance, Bandura (1977) in fact already identified performance accomplishment as an antecedent of self-efficacy. Accordingly, experimental research on the relationship between self-efficacy and performance is necessary to disentangle potential bidirectional effects.

Consequently, by differentiating between performance quantity and quality, and by applying an experimental design in which we manipulate self-efficacy by means of positive feedback, we aim to contribute to a better understanding of the relationship between self-efficacy and performance.

## Self-Efficacy as a Mechanism That Transmits Effects of Positive Feedback on Performance

As mentioned above, the induction of positive feedback including a favorable comparison to others has been found to be a suitable method to enhance the level of self-efficacy, which in turn, affects performance (see Zinken et al., 2008). Based on a comparative appraisal, the individual is persuaded that he or she has performed successfully, which is in line with SCT. Using feedback-manipulation as a strategy to increase a person's appraisal of his or her capabilities, beneficial effects have also been demonstrated in the context of emotional learning processes (Zlomuzica et al., 2015). However, to the best of our knowledge, the mediation hypothesis of self-efficacy has not yet been tested. Integrating the described relationships, we propose that self-efficacy acts as a mediator, transmitting positive effects of positive feedback on performance. Taking the differential expectations for the relationship of self-efficacy with performance quantity and performance quality into account, we expect that self-efficacy acts as a mediator, transmitting positive effects of positive feedback on performance quantity (Hypothesis 1a), but transmitting negative effects of positive feedback on performance quality (Hypothesis 1b).

### Effects of Feedback on Flow-Experience

Flow is the positive experience of being fully absorbed in an optimally challenging task. While in flow, individuals are completely concentrated on the task at hand, which is experienced as rewarding in itself. Individuals perceive clear goals and feedback and a high level of control over the demands, thereby experiencing a merging of action and awareness, and a loss of self-consciousness, along with a distorted sense of time (Csikszentmihalyi, 1975). Flow can be experienced in different activities and tasks, among them cognitive tasks such as solving math calculations – even under laboratory conditions (e.g., Harmat et al., 2015; Ulrich et al., 2016).

Feedback has been described as one of the core antecedents fostering flow-experience (Bakker, 2005; Demerouti, 2006; Landhäußer and Keller, 2012; Nakamura and Csikszentmihalyi, 2014). This conceptualization is in line with findings based on the job characteristics model (Hackman et al., 1975), showing that feedback along with four other core job characteristics is positively related to flow experience (Bakker, 2005; Demerouti, 2006; Maeran and Cangiano, 2013). Studies that have specifically examined the relationship between feedback and flow have confirmed a positive link between the two (Rau and Riedel, 2004; Maeran and Cangiano, 2013). However, these studies have focused on feedback in general, and the effects of specifically positive feedback on flow have not yet been studied using quantitative research. Qualitative research has provided first indications that positive feedback – but not negative feedback – is an antecedent of flow (Jackson, 1995; Swann et al., 2015). In line with this, positive normative feedback has been found to have positive effects on positive affect during a challenging task (Hutchinson et al., 2008) – which is often linked to flow experience. Furthermore, positive normative feedback has been suggested to have energizing and reinforcing effects (Kühn et al., 2008) – both typical characteristics of the experience of flow. Bringing together theoretical and empirical evidence, we expect to find positive effects of positive normative feedback on flow experience.

### Self-Efficacy and Flow-Experience

A central component of flow is the perceived balance between the demands of the task and the individual's skills (e.g., Csikszentmihalyi, 1975; Landhäußer and Keller, 2012). The level of self-efficacy is an individual's evaluation of his/her skills and therefore has a substantial impact on how the balance between skills and task demands is perceived. High levels of self-efficacy should thus positively impact flow. Empirical studies support this assumption: For example, Zubair and Kamal (2015a,b) investigated the relationship between the dimensions of psychological capital (Luthans et al., 2004) and flow experience and found that all dimensions, including self-efficacy, were positively related to flow. In a two-wave longitudinal study design, Salanova et al. (2006) found that work-specific self-efficacy beliefs facilitated the experience of work-related flow. In another longitudinal study, Rodríguez-Sánchez et al. (2011) found that teachers' work-related self-efficacy positively affected their flow experience. Furthermore, collective efficacy beliefs are associated with higher flow. In a longitudinal study with small groups, it was found that collective efficacy beliefs can lead to higher collective flow, which in turn leads to higher collective self-efficacy in the future, forming a reciprocal relationship (Salanova et al., 2014). Furthermore, Pineau et al. (2014) found that self-efficacy as well as team-efficacy are significantly related to dispositional flow. All in all, abundant research supports the hypothesis that self-efficacy beliefs are positively associated with flow experience. Following the existing literature, we postulate that self-efficacy facilitates flow experience.

# Self-Efficacy as a Mechanism That Transmits Effects of Positive Feedback on Flow

While results from field studies, including long-term studies, suggest a reciprocal and positive relationship between self-efficacy and flow, this relationship has not yet been tested experimentally. As outlined above, positive feedback is an established intervention to positively affect self-efficacy. Thus, we use positive feedback to manipulate self-efficacy with the aim to test the effects of self-efficacy on flow experimentally. Furthermore, and as described above, theoretical considerations and qualitative research also suggest direct effects of positive feedback on flow. We suggest that these effects of positive feedback on flow can be explained at least partially by increased self-efficacy. Accordingly, we propose that self-efficacy acts as a mediator, transmitting positive effects of positive feedback on flow experience (Hypothesis 2).

## MATERIALS AND METHODS

## Participants and Design

The sample was recruited at Ruhr University Bochum (Germany) through postings in social media networks, such as student groups on Facebook, or via announcements on notice boards. The total sample consisted of 134 subjects (82 females, 52 males). Due to missing values, data from 23 subjects were excluded from the analyses: Eighteen participants did not complete the Flow-Short-Scale, one participant did not complete the self-efficacy scale and four participants did not perform the mental arithmetic task. The data were z-transformed and, due to outliers on the study variables (flow, specific self-efficacy, performance quantity, and performance quality), another nine subjects were excluded.<sup>1</sup> The final analysis included data from 102 participants, of which 63 were female and 39 were male, with a mean age of 22.51 (SDage = 3.13). Participants were mainly undergraduate students (76.5%). Another 20.6% were students with a bachelor's degree and 2.9% held a secondary school degree. Participants rated their ability in mental arithmetic on a 100-point scale on average at M = 54.22 (SD = 17.32). Furthermore, they rated the difficulty of the experimental task on an 8-point Likert Scale from 0 = "not difficult at all" to 7 = "very difficult" to be at an average level, with the second task being slightly more difficult than the first (Mtask1 = 3.87, SDtask1 = 1.60; Mtask2 = 4.43, SDtask2 = 1.63).

Participants were randomly assigned to one of two conditions: the positive feedback condition (n = 53, 33 female, Mage = 22.43, SDage = 3.17) or the non-feedback condition (n = 49, 30 female, Mage = 22.59, SDage = 3.12). All participants provided written informed consent and received course credit for participation. The study was approved by the local Ethics Committee of the Faculty of Psychology at Ruhr University Bochum, Germany.

#### Task

The experimental task was a computer-based mental arithmetic task, which lasted 5 min per block, with the participant sitting alone in the laboratory in front of a computer screen. A computer program written in VB.NET (Microsoft Visual Studio [Software], 2015) was used to generate the task on the screen. Participants typed the calculated numbers into the computer and pressed "enter" after each calculation. All the previously calculated numbers could be seen on the screen while working on the task. After 5 min, the task stopped automatically. In the first block of the mental arithmetic task, participants were asked to subtract the number 12, starting at 2000, consecutively with maximal accuracy and rapidness for 5 min. In the second block of the mental arithmetic task, participants were asked to subtract the number 17, starting at 2043, consecutively with maximal accuracy and rapidness for 5 min. The mental arithmetic task was constructed based on the Trier Social Stress Test (Kirschbaum et al., 1993), which uses a similar mental arithmetic task as part of the protocol. Importantly, and in contrast to the Trier Social Stress Test, there was no social stress component in our mental arithmetic task.

#### Feedback Manipulation

After the first block of the mental arithmetic task, the feedback group saw a note on their computer screen stating that their performance had been evaluated in terms of accuracy and rapidness. According to this analysis, he or she had performed better than the average of the previous participants, and that compared to the average participant, he or she was better able to follow new instructions and to manage mathematical problems spontaneously. The control group also saw a note on their screen, simply stating that time was up.

#### Procedure

The experiment was conducted in a laboratory room of Ruhr University Bochum. The participant was seated in front of a computer and asked to read and sign the informed consent form. Self-report measures and subsequent instructions were presented on the computer screen (see **Figure 1**). After a baseline measure of self-efficacy, the participant was asked to complete the first block of the mental arithmetic task (2000–12). Participants in the feedback group received positive normative feedback after the task was accomplished, while participants in the no-feedback group received no feedback. Right after this feedback, specific self-efficacy was assessed. Participants then completed the second block of mental arithmetic tasks (2043–17) for 5 min and finally were asked to answer questionnaires on flow and specific self-efficacy with respect to that task. Participants were then debriefed regarding the purpose of the study and the feedback manipulation.

#### Measures

#### Specific Self-Efficacy

Based on a guide for constructing self-efficacy scales developed by Bandura (2006), participants rated their ability to complete mental arithmetic tasks on a 100-point scale from 0 = "cannot do at all" to 100 = "can do very well" to measure their level of specific self-efficacy (Mt1 = 52.84, SDt1 = 18.21) for mental arithmetic tasks. This item was assessed at baseline level (t0), after the first task following the feedback manipulation (t1), and after the second task (t2; compare **Figure 1**). The three measurement points t0, t1, and t2 were used for the manipulation check, that is, to test if the feedback manipulation increased specific self-efficacy over time. Specific self-efficacy at t1 was used to test the hypothesized mediation effects.

<sup>1</sup>We performed all analyses with and without outliers and results did not change.

#### Performance

Performance quantity (MQN = 28.33, SDQN = 9.28) was assessed using the number of calculated results in the given time (5 min). Performance quality (MQL = 0.90, SDQL = 0.10) was assessed using the ratio between the number of correctly calculated results and the total number of calculated results.
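The two performance indicators defined above can be sketched in a few lines. This is a minimal illustration that assumes each submitted result has already been marked as correct or incorrect. How correctness was coded in the study is not described here, and the function name `performance_scores` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def performance_scores(correct_flags: Sequence[bool]) -> Tuple[int, float]:
    """Return (quantity, quality) for one participant.

    quantity: number of results calculated in the given time.
    quality:  correct results divided by all calculated results
              (NaN if the participant produced no result at all).
    """
    quantity = len(correct_flags)
    if quantity == 0:
        return 0, math.nan
    quality = sum(correct_flags) / quantity
    return quantity, quality


# Hypothetical participant: 10 results entered, 9 of them correct.
quantity, quality = performance_scores([True] * 9 + [False])
print(quantity, quality)
```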

#### Flow

Flow (Mt2 = 4.50, SDt2 = 1.18) was measured with the Flow-Short-Scale (Rheinberg et al., 2003), which comprises ten items measuring absorption ("I did not notice time passing") and fluency ("My thoughts/activities ran fluidly and smoothly") as experienced during the task on a 7-point Likert Scale. The reliability of the scale was very good with a Cronbach's Alpha of 0.92. The scale was administered after the second task (at t2; compare **Figure 1**).

#### Data Analysis

Data were analyzed with the IBM SPSS statistics package. To check the effectiveness of our manipulation, we tested whether participants' specific self-efficacy increased over time in the feedback group using a repeated measures ANOVA. As the Mauchly test of sphericity was significant, we used the Greenhouse-Geisser correction procedure. The mediation analyses were conducted with the SPSS macro PROCESS (Hayes, 2013). For the mediation analyses, all variables were z-standardized. To estimate whether self-efficacy served as the mediator, the indirect effect ab was estimated (Preacher and Hayes, 2008). We report 95% bootstrap confidence intervals for the indirect effect (nbootstrap = 5000).
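For readers unfamiliar with bootstrapped indirect effects, the logic can be sketched in plain Python. This is not the PROCESS macro and not the analysis code of the study. It is a minimal sketch that assumes ordinary least squares for both paths and a simple percentile interval, which may differ from the interval type PROCESS reports. The names `indirect_effect` and `bootstrap_ci` are hypothetical. To obtain standardized coefficients, the inputs would have to be z-standardized beforehand.

```python
import random
from statistics import fmean
from typing import List, Sequence, Tuple


def _centered(values: Sequence[float]) -> List[float]:
    mean = fmean(values)
    return [v - mean for v in values]


def indirect_effect(x: Sequence[float], m: Sequence[float], y: Sequence[float]) -> float:
    """Return a*b, with a from M ~ X and b from Y ~ X + M (least squares)."""
    xc, mc, yc = _centered(x), _centered(m), _centered(y)
    sxx = sum(u * u for u in xc)
    smm = sum(u * u for u in mc)
    sxm = sum(u * v for u, v in zip(xc, mc))
    sxy = sum(u * v for u, v in zip(xc, yc))
    smy = sum(u * v for u, v in zip(mc, yc))
    a = sxm / sxx                                          # path X -> M
    b = (smy * sxx - sxy * sxm) / (sxx * smm - sxm ** 2)   # path M -> Y, controlling for X
    return a * b


def bootstrap_ci(x, m, y, n_boot: int = 5000, level: float = 0.95,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for the indirect effect a*b."""
    rng = random.Random(seed)
    n = len(x)
    estimates: List[float] = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        try:
            estimates.append(indirect_effect([x[i] for i in idx],
                                             [m[i] for i in idx],
                                             [y[i] for i in idx]))
        except ZeroDivisionError:
            continue  # degenerate resample (e.g., only one group drawn): draw again
    estimates.sort()
    lower = estimates[round((1 - level) / 2 * n_boot)]
    upper = estimates[round((1 + level) / 2 * n_boot) - 1]
    return lower, upper
```

Under this approach, an indirect effect is usually regarded as significant when the interval excludes zero, which is how the intervals in the Results section are read.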

# RESULTS

**Table 1** shows means, standard deviations and correlations of all study variables.

#### Manipulation Check

With regard to the experimental manipulation of specific self-efficacy, a significant main effect of time [F(1.82, 181.59) = 4.97, p = 0.010, η<sub>p</sub><sup>2</sup> = 0.047] and an interaction effect [F(1.82, 181.59) = 6.23, p = 0.003, η<sub>p</sub><sup>2</sup> = 0.059] were found for specific self-efficacy for mental arithmetic tasks. As can be seen in **Figure 2**, in contrast to the feedback group, the control group showed a decrease in specific self-efficacy for mental arithmetic tasks after Task 1.
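The reported effect sizes can be cross-checked against the F statistics, because partial eta squared follows from F and its degrees of freedom: η<sub>p</sub><sup>2</sup> = (F × df<sub>effect</sub>) / (F × df<sub>effect</sub> + df<sub>error</sub>). The short sketch below applies this identity to the Greenhouse-Geisser corrected values given above. The function name is hypothetical.

```python
def partial_eta_squared(f_value: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of time and time x group interaction, as reported above.
print(round(partial_eta_squared(4.97, 1.82, 181.59), 3))
print(round(partial_eta_squared(6.23, 1.82, 181.59), 3))
```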

**Figure 2.** Specific self-efficacy for mental arithmetic tasks at measurement points t0, t1, and t2 for the control group (no feedback) and the experimental group (positive feedback).



#### Testing of Hypotheses


To test Hypothesis 1a, we z-standardized all study variables and performed a mediation procedure with feedback as the independent variable, specific self-efficacy as the mediator, and performance quantity as the dependent variable. The a- (β = 0.43, SE = 0.19, p = 0.029) and b-path (β = 0.40, SE = 0.09, p < 0.001) were significant. The indirect effect was β = 0.17 (SE = 0.10, 0.01 < CI < 0.39) and significant (compare **Figure 3A**). The total effect was β = −0.07 (SE = 0.20, t = −0.38, p = 0.708) and unexpectedly not significant. The direct effect was also not significant (β = −0.25, SE = 0.19, t = −1.30, p = 0.195) when controlling for the indirect effect. However, as the indirect effect was significant, specific self-efficacy appeared to transmit a positive effect of positive feedback on performance quantity as hypothesized.

Regarding Hypothesis 1b, the a- (β = 0.43, SE = 0.19, p = 0.029) and b-path (β = 0.31, SE = 0.10, p = 0.002) of the model with specific self-efficacy as the mediator, group as the independent variable, and performance quality as the dependent variable were both significant. The indirect effect was β = 0.13 (SE = 0.08, 0.00 < CI < 0.32) and significant (compare **Figure 3B**). The total effect was β = −0.00 (SE = 0.20, t = −0.02, p = 0.988) and not significant. The direct effect was not significant (β = −0.14, SE = 0.20, t = −0.70, p = 0.483) when controlling for the indirect effect. As the b-path in this model was significantly positive, these results run counter to our expectation to find negative effects of increased self-efficacy on performance quality. We further found that self-efficacy appeared to transmit a positive effect of positive feedback on performance quality – while we had expected that a negative effect would be transmitted.

To conclude – in line with Hypothesis 1a, but in conflict with Hypothesis 1b – specific self-efficacy mediated a positive effect of positive feedback on both performance quantity and quality. Contrary to our expectations, however, the total effect of positive feedback on both performance measures was not significant.

Regarding Hypothesis 2 the a- (β = 0.43, SE = 0.19, p = 0.029) and b-path (β = 0.57, SE = 0.09, p < 0.001) of the z-standardized mediation model with specific self-efficacy as the mediator, feedback as the independent variable, and flow as the dependent variable were significant. In support of Hypothesis 2, the indirect effect was β = 0.25 (SE = 0.11, 0.02 < CI < 0.47) and significant. The total effect was β = 0.12 (SE = 0.20, t = 0.61, p = 0.544) and not significant. The direct effect was not significant (β = −0.13, SE = 0.17, t = −0.74, p = 0.461) when controlling for the indirect effect (compare **Figure 4**). The significance of the indirect effect confirms Hypothesis 2, that specific self-efficacy would transmit a positive effect of positive feedback on flow. However, and contrary to our expectations, the total effect of positive feedback on flow was not significant.
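The three mediation models share the same arithmetic: the indirect effect is the product of the a- and b-path, and in a least-squares mediation model the total effect equals the direct effect plus the indirect effect. The following sketch recomputes both quantities from the rounded coefficients reported above, so small deviations from the reported values are expected from rounding alone. The dictionary and function names are illustrative only.

```python
# Rounded standardized coefficients as reported in the text above.
REPORTED = {
    "performance quantity (H1a)": {"a": 0.43, "b": 0.40, "direct": -0.25,
                                   "indirect": 0.17, "total": -0.07},
    "performance quality (H1b)": {"a": 0.43, "b": 0.31, "direct": -0.14,
                                  "indirect": 0.13, "total": -0.00},
    "flow (H2)": {"a": 0.43, "b": 0.57, "direct": -0.13,
                  "indirect": 0.25, "total": 0.12},
}


def decompose(a: float, b: float, direct: float):
    """Return (indirect, total) with indirect = a*b and total = direct + a*b."""
    indirect = a * b
    return indirect, direct + indirect


for outcome, r in REPORTED.items():
    indirect, total = decompose(r["a"], r["b"], r["direct"])
    print(f"{outcome}: a*b = {indirect:.3f} (reported {r['indirect']:.2f}), "
          f"c' + a*b = {total:.3f} (reported {r['total']:.2f})")
```

In all three models the negative direct effect roughly offsets the positive indirect effect, which is why the total effects stay close to zero.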

### DISCUSSION

In this study we aimed to test the postulated effect of self-efficacy on flow-experience and performance experimentally. To manipulate self-efficacy, we used a well-established paradigm, that is, false normative positive feedback about performance on a mental arithmetic task. Using a bootstrap procedure to conduct mediation analyses, we found that positive feedback enhances specific self-efficacy, which, in turn, enhances performance (quality and quantity) and flow experience in a subsequent task. In the following we discuss our results in more detail:

First of all, given our successful manipulation check, we could replicate earlier studies showing that false positive normative feedback is an efficient intervention to promote self-efficacy (Weinberg et al., 1979; Wright and Gregorich, 1989; Eden and Zuk, 1995; Brown et al., 2012). It can therefore be recommended as an experimental manipulation of self-efficacy in future studies.

Furthermore, Hypothesis 1a was confirmed: The results suggested that positive feedback has an indirect positive effect on performance quantity via self-efficacy. Considering the positive b-path in our model, that is, the positive effect of self-efficacy on performance quantity, our results replicate earlier studies that also found that performance increases with increased self-efficacy (e.g., Bandura and Locke, 2003; Vancouver et al., 2014).

As previous research on the relationship between self-efficacy and performance is largely based on observational data, a strength of our study is the experimental approach to manipulate self-efficacy using positive feedback.

By differentiating between performance quantity and quality, our study further contributes to the debate regarding whether – and for which performance parameters – self-efficacy has positive or maybe even negative effects on performance: some researchers argue that high self-efficacy might undermine motivation, as a person might believe that effort is not necessary to successfully cope with low demands compared to high abilities, which leads to an increase in performance quantity (as also found in our study), but to a decrease in performance quality (see, e.g., Vancouver et al., 2001, 2002, 2014; Vancouver, 2012). That is why we assumed in Hypothesis 1b that we would find a negative indirect effect of positive feedback via self-efficacy on performance quality. However, contrary to Hypothesis 1b, we found this indirect effect to be positive, with a positive effect of self-efficacy on performance quality. A possible explanation for the contradictory findings in the literature could be that the undermining effect of self-efficacy on performance quality only occurs if self-efficacy is very high. In our case, we told participants in the feedback condition that their performance was "above average", which is a relatively moderate manipulation. Thus, while we successfully increased self-efficacy levels, they only increased to a moderate but not to a very high level. This explanation would be in line with the proposed inverted u-shaped relationship between self-efficacy and performance (cf. Schönfeld et al., 2017): there could be an increase in performance until self-efficacy is moderately high and a decrease in performance if self-efficacy further increases.

Hypothesis 2 was also confirmed, with our results suggesting that positive feedback increases flow-experience via increased self-efficacy. The finding that experimentally induced self-efficacy increased flow-experience supports earlier cross-sectional field studies that have found positive associations between the two (Zubair and Kamal, 2015a,b) and longitudinal studies (Salanova et al., 2006; Rodríguez-Sánchez et al., 2011) with first evidence for a causal effect of self-efficacy on flow experience. By using an experimental manipulation to increase self-efficacy, we add further evidence to the existing literature that self-efficacy can causally increase flow-experience.

However, our study did not test the opposite causal direction, that experiencing flow would increase self-efficacy. Flow provides an enjoyable feeling of control over the activity at hand, while applying one's skills. Thus, it is likely that flow also enhances self-efficacy, and that the effects are bidirectional. This reciprocity suggests that an upward spiral of self-efficacy and flow can occur – as supported by earlier results from field studies (Salanova et al., 2006, 2014). Future experimental studies should look at both causal directions, replicating and complementing previous results.

Our results further showed an indirect effect of positive feedback on flow. This is in line with earlier theory and research: In his nine components of flow-experience, Csikszentmihalyi (1990) had already named "clear goals and feedback" as one of the components. Later operationalizations of flow distinguished between antecedents, characteristics and consequences of flow and considered feedback as an antecedent (Nakamura and Csikszentmihalyi, 2002; Landhäußer and Keller, 2012). Cross-sectional field studies on feedback and flow supported this assumption (Rau and Riedel, 2004; Maeran and Cangiano, 2013). To the best of our knowledge, ours is the first study to show an indirect effect of positive feedback on flow in an experimental design, thereby providing insights into a mechanism that can transmit the effects of positive feedback to flow: self-efficacy. Accordingly, providing positive feedback enhances self-efficacy, which presumably enhances the feeling of competency and control in the respective task – two characteristics of flow experience.

However, it needs to be stated that while the indirect effect of positive feedback via self-efficacy on flow was significant, the total effect was not significant, and neither was the direct effect of positive feedback on flow when self-efficacy was included as a mediator. According to Hayes (2009, 2013), finding an indirect effect confirms mediation, while a missing total effect does not contradict mediation: "A failure to test for indirect effects in the absence of a total effect can lead to you miss some potentially interesting, important, or useful mechanisms by which X exerts some kind of effect on Y (Hayes, 2009; p. 415)." In cases in which the total effect is not significant, it is likely that different mechanisms play a role in the relationship between independent and dependent variable. This means that in addition to a positive effect of positive feedback on flow via enhanced self-efficacy, it is likely that there are counteracting mediators in this process, transmitting negative effects of positive feedback on flow. Such possible mediators should be investigated in future studies.

Similarly, the indirect effects of positive feedback via self-efficacy on performance quantity and quality were significant, while the total and direct effects were not significant. This again underlines the possibility of counteracting mechanisms between positive feedback and performance. While positive feedback via self-efficacy positively impacts performance, positive feedback might lead to the assumption that less effort is necessary, which could have a counteracting negative impact on performance (e.g., Vancouver et al., 2014). Such counteracting mechanisms of action should be examined in future studies, for example with the use of physiological indicators of mental effort, such as high frequency heart rate variability. Not only short-term but also long-term perspectives are relevant: an increase in self-efficacy through regular positive feedback might have long-term consequences on flow, and – through extensive application – on skill acquisition and future performance.

Another possible explanation for the lack of a total or direct effect of positive feedback on flow is that the feedback manipulation was potentially not salient enough: We told participants that they were better than average – which is what most people would assume anyway: the "better-than-average" effect is a robust finding in research on social comparisons (compare Alicke and Govorun, 2005). Accordingly, our positive feedback was potentially a confirmation of an existing assumption (i.e., neutral feedback) rather than a positive deviation. Future studies should use stronger positive feedback manipulations in order to investigate its effects on flow and performance.

Yet another explanation for the missing findings could be the kind of feedback that we used: Prior studies that addressed the relationship between feedback and flow mostly referred to either task-inherent feedback or supervisor feedback (mostly according to the Job Characteristics Model, Hackman et al., 1975). In their feedback measure, these studies did not differentiate between a normative vs. an individual reference norm (e.g., Bakker, 2005; Demerouti, 2006). Regarding performance there is research differentiating between normative and individual feedback: Brunstein and Hoyer (2002), for example, investigated the effects of positive vs. negative feedback with normative vs. individual reference norm on performance. They did not find any main effects of positive normative feedback on performance (i.e., d2-test for concentration). Descriptively, positive normative feedback was even associated with lower performance (higher reaction times). In addition, Brunstein and Hoyer (2002) found significant interactions with the achievement motive: A higher implicit (but not explicit) achievement motive was associated with higher performance after feedback. Interestingly, it was not positive normative feedback but negative individual feedback that spurred achievement-motivated individuals' performance. These results show that future studies should explicitly differentiate between normative and individual feedback to get deeper insights into the mechanisms that facilitate or hinder flow experience and performance. As moderating variables, implicit and explicit achievement motives should be controlled, as differential effects might occur. Furthermore, feedback with an individual reference norm may be more appropriate to foster performance and flow (Brunstein and Hoyer, 2002; Brunstein and Maier, 2005).

While Brunstein and Hoyer found that performance was enhanced by negative feedback at least for individuals scoring high on the implicit achievement motive, we do not know how negative feedback impacts flow. In theoretical models (Nakamura and Csikszentmihalyi, 2002; Landhäußer and Keller, 2012) as well as in previous (field) studies (Rau and Riedel, 2004; Maeran and Cangiano, 2013) on the relationship between feedback and flow, positive and negative feedback were not distinguished. However, there is evidence from qualitative research that positive feedback is especially beneficial for flow (Jackson, 1995; Swann et al., 2015). Thus, it will be interesting for future studies to look at the differential effects of positive vs. negative feedback, differentiating between task-inherent, normative and individual feedback, and controlling for an individual's achievement motive.

A potential limitation of our results refers to the fact that we did not use change scores of performance in the mental arithmetic tasks to control for baseline scores. However, based on the design of our study, the interpretation of change scores is problematic: The two tasks differed very much in difficulty: while the first task was very easy (continuously subtracting 12 from 2000), the second task was more difficult (continuously subtracting 17 from 2043). Our results clearly support this, as both performance indicators decreased significantly from t1 to t2. Furthermore, the low difficulty in the first task likely led to a ceiling effect of performance in both groups, reducing the systematic variance in the data. These circumstances make the change scores very hard to interpret. At the same time, as we had randomly assigned our participants to the experimental conditions, we believe that mental arithmetic skills are equally distributed between groups. Accordingly, we refrained from using change scores in our data analysis. Detailed results using change scores in the analyses can be found in the **Supplementary Material**.

One more possible limitation of our study refers to the measurement of flow: By using the flow short scale (Engeser and Rheinberg, 2008), we applied a widely used componential approach to assess flow (compare Moneta, 2012). This scale measures flow as a continuous phenomenon: the more its components are pronounced, the higher flow values. The components used in the flow short scale reflect those proposed by Csikszentmihalyi (1975, see flow definition above). However, an ongoing discussion in flow research is whether flow is a continuous phenomenon or if it is rather a yes-or-no phenomenon (compare Engeser, 2012; Peifer and Engeser, in press), with an individual either being in flow or not. A cut-off value for flow when measured with the componential approach has not yet been identified and is a challenge for future research. With what we know today, the current study is limited insofar as it cannot differentiate between flow or not-flow, but rather measures more or less pronounced flow components.

#### Practical Implications

Based on our results, we can recommend positive feedback as an intervention to enhance self-efficacy. This recommendation can be applied to different contexts such as work or schools. In the work context, positive feedback could be given by supervisors in annual performance reviews and in regular meetings. A precondition to providing clear feedback is goal setting, as expectations are made transparent to the employee. Goals should be realistic and achievable in due time in order to provide the opportunity for positive feedback on a regular basis. In general, positive feedback can also come from sources other than just the supervisor – for example from customers or colleagues. Positive feedback has further been found to be positively related to job satisfaction (Yukl et al., 2002) and well-being (Stocker et al., 2014). Thus, an organizational climate of mutual appreciation is a good basis for employees' self-efficacy, satisfaction and well-being. As shown in the current study, self-efficacy is also positively related to flow and performance and acts as a mediator transmitting positive effects of positive feedback to flow and performance.

While we manipulated self-efficacy in this study using positive feedback, there are other interventions that have been successfully used to enhance self-efficacy, for example interventions to increase psychological capital (Luthans et al., 2006, 2008, 2010). These interventions could therefore also be used to increase self-efficacy, and, in turn, performance and flow-experience.

In this study, we found short-term and immediate effects of self-efficacy on flow-experience. Future studies should have a look at spillover effects and at long-term effects of self-efficacy interventions: Enhancing flow-experience has been found to affect future flow and performance in similar tasks (Christandl et al., 2018). Furthermore, flow has been found to lead to long-term increased performance via increased motivation to practice (Engeser et al., 2005; Schüler, 2007; Schüler and Brunner, 2009). Thus, interventions to foster the pleasant experience of flow are a valuable endeavor for institutions (e.g., organizations, schools) as well as for individuals (e.g., employees, students).

# CONCLUSION


"Well done?" – our study investigated the relationships between positive normative feedback, self-efficacy, performance, and flow experience. Our results provide experimental evidence that positive feedback enhances self-efficacy. Further, we found an indirect effect of feedback via self-efficacy on performance quantity and quality, as well as on flow experience. However, mutually opposing counter-mechanisms were potentially also active as we did not find a total effect of positive normative feedback on performance and flow, calling for further research on this issue.

### DATA AVAILABILITY STATEMENT

The datasets generated for this study are available on request to the corresponding author.

### ETHICS STATEMENT

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

#### INFORMED CONSENT

Informed consent was obtained from all individual participants included in the study.

# AUTHOR CONTRIBUTIONS

CP and PS conceived of the presented idea. PS carried out the experiment. CP, PS, and FA developed the theory. PS and GW wrote the methods. GW and FA performed the computations and wrote the results part. CP wrote the discussion. JM supervised the concept and findings of this work. All authors discussed the results and contributed to the final manuscript.

#### ACKNOWLEDGMENTS

We acknowledge support by the DFG Open Access Publication Funds of the Ruhr University Bochum.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2020.01008/full#supplementary-material






**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Peifer, Schönfeld, Wolters, Aust and Margraf. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Dissociable Effects of Reward on P300 and EEG Spectra Under Conditions of High vs. Low Vigilance During a Selective Visual Attention Task

Jia Liu<sup>1,2</sup>, Chi Zhang<sup>1</sup>, Yongjie Zhu<sup>1,2</sup>, Yunmeng Liu<sup>1</sup>, Hongjin Sun<sup>3</sup>, Tapani Ristaniemi<sup>2</sup>, Fengyu Cong<sup>1,2,4,5</sup>\* and Tiina Parviainen<sup>6</sup>\*

#### Edited by:

Benjamin Cowley, University of Helsinki, Finland

#### Reviewed by:

Kirk R. Daffner, Brigham and Women's Hospital and Harvard Medical School, United States Edmund Wascher, Leibniz Research Centre for Working Environment and Human Factors (IfADo), Germany

#### \*Correspondence:

Fengyu Cong cong@dlut.edu.cn Tiina Parviainen tiina.m.parviainen@jyu.fi

#### Specialty section:

This article was submitted to Cognitive Neuroscience, a section of the journal Frontiers in Human Neuroscience

Received: 15 September 2019; Accepted: 07 May 2020; Published: 24 June 2020

#### Citation:

Liu J, Zhang C, Zhu Y, Liu Y, Sun H, Ristaniemi T, Cong F and Parviainen T (2020) Dissociable Effects of Reward on P300 and EEG Spectra Under Conditions of High vs. Low Vigilance During a Selective Visual Attention Task. Front. Hum. Neurosci. 14:207. doi: 10.3389/fnhum.2020.00207

<sup>1</sup>School of Biomedical Engineering, Dalian University of Technology, Dalian, China, <sup>2</sup>Faculty of Information Technology, University of Jyväskylä, Jyväskylä, Finland, <sup>3</sup>Department of Psychology, Neuroscience and Behaviour, McMaster University, Hamilton, ON, Canada, <sup>4</sup>School of Artificial Intelligence, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, China, <sup>5</sup>Key Laboratory of Integrated Circuit and Biomedical Electronic System, Dalian University of Technology, Dalian, China, <sup>6</sup>Centre for Interdisciplinary Brain Research, Department of Psychology, Faculty of Education and Psychology, University of Jyväskylä, Jyväskylä, Finland

The influence of motivation on selective visual attention in states of high vs. low vigilance is poorly understood. To explore the possible differences in the influence of motivation on behavioral performance and neural activity in high and low vigilance levels, we conducted a prolonged 2 h 20 min flanker task and provided monetary rewards during the 20- to 40- and 100- to 120-min intervals of task performance. Both the behavioral and electrophysiological measures were modulated by prolonged task engagement. Moreover, the effect of reward was different in high vs. low vigilance states. The monetary reward increased accuracy and decreased the reaction time (RT) and number of omitted responses in the low but not in the high vigilance state. The fatigue-related decrease in P300 amplitude recovered to its level in the high vigilance state by manipulating motivation, whereas the fatigue-related increase in P300 latency was not modulated by reward. Additionally, the fatigue-related increase in event-related spectral power at 1–4 Hz was sensitive to vigilance decrement and reward. However, the spectral power at 4–8 Hz was only affected by the decrease in vigilance. These electrophysiological measures were not influenced by motivation in the state of high vigilance. Our results suggest that neural processing capacity, but not the timing of processing, is sensitive to motivation. These findings also imply that the fatigue-related impairments in behavioral performance and neural activity underlying selective visual attention only partly recover after manipulating motivation. Furthermore, our results provide evidence for the dissociable neural mechanisms underlying the fatigue-related decrease vs. reward-related increase in attentional resources.

Keywords: vigilance, mental fatigue, motivation, selective visual attention, event-related potential, event-related spectral perturbation



#### INTRODUCTION

Although we are subjected to constant visual information in daily life, our visual capacity to process this information is limited. To perform in an efficient and goal-directed manner, we need to continuously distinguish the relevant information from the visual environment and allocate our limited attentional capacity to the selected target objects, a phenomenon referred to as selective visual attention (Moore and Zirnsak, 2017). As outlined by Robert and Duncan (1995), selective visual attention is characterized by two basic phenomena: the ability to filter out task-irrelevant stimuli and the limited capacity for task-relevant information processing, both of which lead to reduced accuracy when the number of targets increases. We experience selective visual attention in many daily activities. For example, shoppers find their target items among colorful but irrelevant sale offers; car drivers filter out irrelevant surroundings and detect the relevant road markings and traffic lights. However, prolonged engagement in selective attention tasks inevitably leads to increased errors, deactivated performance goals, diminished motivation to continue performing the task (Boksem et al., 2005), and an increase in mental fatigue (Kok, 2001; Lal and Craig, 2001; Gergelyfi et al., 2015; Benoit et al., 2019).

Mental fatigue is caused by prolonged cognitive task performance (Gergelyfi et al., 2015). It is considered a concept related to but distinct from arousal, which often refers to a physiological state and is closely linked with the transition between wakefulness and sleep (Shen et al., 2006). Mental fatigue is a cumulative process, accompanied by a feeling of indolence, reduced motivation, and impaired performance (Lal and Craig, 2001). What is more, mental fatigue exhibits more cognitive elements than arousal. Based on different causal factors, two types of mental fatigue can be identified: sleep- and task-related (May and Baldwin, 2009). The former results from accumulated sleep debt, whereas the latter results from prolonged task engagement (May and Baldwin, 2009). In the present study, we aimed to examine task-related mental fatigue, specifically as it relates to attentional resources. The decline in attention-requiring task performance over a prolonged duration is referred to as vigilance decrement (Mackworth et al., 1964), which is likely identical or very closely related to mental fatigue (Oken et al., 2006). For this reason, both terms have been used interchangeably in previous studies (Taya et al., 2018; Reteig et al., 2019).

Vigilance decrement has been reported as a major factor in a large proportion of road crashes due to the reduction of attentional resources. Although the risks of vigilance decrement have received much attention, the underlying neurophysiological mechanisms have not yet been established (Lorist et al., 2005; Tops et al., 2006; Benoit et al., 2019). In earlier research, three core concepts around vigilance decrement or mental fatigue have emerged, namely, active fatigue, passive fatigue, and motivational control. Active fatigue results from the excessive workload needed to carry out a task over a prolonged duration, which depletes cognitive resources (Helton and Warm, 2008). Passive fatigue results from the lower workload needed to engage in prolonged but relatively easy tasks (May and Baldwin, 2009). Motivational control plays an important role in vigilance decrement, as it reflects the level of willingness to perform a task. Motivational control is linked with the process of subconsciously balancing costs and benefits to expend or conserve energy (Kurzban et al., 2013a). For instance, Kurzban and colleagues suggested that people experienced performance reductions over time when the costs outweighed the benefits (Kurzban et al., 2013b). Recent studies recognize that these three core concepts are not mutually exclusive, and that there are still limitations in how the core concepts account for changes induced by fatigue (Boksem and Tops, 2008; Seli et al., 2015; Thomson et al., 2015). Therefore, hybrid models synthesizing different concepts have emerged to address these limitations. For example, Boksem and Tops (2008) proposed a framework of mental fatigue that integrated motivational control and energetical costs, suggesting that people would no longer maintain their performance when the energetical resources were depleted, although the costs outweighed the benefits. All in all, it is still unclear why task performance deteriorates with time-on-task.

The influence of motivation on prolonged task performance has been studied by subsequently providing monetary rewards. Effects on response selection (Möckel et al., 2015), action monitoring (Boksem et al., 2006), and sustained attention (Reteig et al., 2019) have been shown previously. Although numerous studies have demonstrated that monetary rewards can improve performance when provided after long-term tasks (Lorist et al., 2005; Boksem et al., 2006; Hopstaken et al., 2015), the neural mechanisms on which this improvement builds are not established. Moreover, the effect of reward on performance in different (i.e., high vs. low) vigilance states has rarely been examined.

To explore the effects of motivation on behavioral performance and brain electrophysiology in high and low vigilance states, we conducted a 140-min selective visual attention task and provided monetary rewards for successful task performance in the early stage (during the 20- to 40-min interval) and in the late stage (during the 100- to 120-min interval; **Figure 1**). By utilizing brain electrophysiological measures derived from high-temporal-resolution electroencephalograms (EEGs), we focused on time domain [event-related potential (ERP) P300 amplitude and latency] and time-frequency domain [event-related spectral perturbations (ERSPs)] variables as electrophysiological markers of visually induced neural activations. We further quantified the degree of recovery of behavioral and electrophysiological measures in the low vigilance state after motivation manipulation.

The stimulus-locked ERP component P300 has received much attention as a potential indicator of mental workload in a selective visual attention task (Faber et al., 2012). The amplitude of P300 has been shown to be a useful measure of processing capacity that correlates positively with the accuracy of the memory search task (Kok, 2001). Furthermore, the latency of P300 was suggested to be an indicator of mental chronometry, as demonstrated by its positive correlation with reaction time (RT; Verleger, 1997). Reports about the effect of time-on-task on the P300 component are diverse. Faber et al. (2012) did not find a significant decrease in the P3b amplitude during prolonged engagement in a selective visual attention task. Boksem et al. (2006) also showed that the P300 amplitude did not change with time-on-task, but the P300 latency increased with vigilance decrement. Although the P300 amplitude and latency have been widely used in studies on vigilance (Kato et al., 2009; Käthner et al., 2014; Hopstaken et al., 2015), most results are limited to conventional ERP analysis.

It is also valuable to explore how the oscillatory dynamics reflect changes in attentional allocation and information processing during a selective visual attention task. Frontal theta oscillations have been shown to be related to the allocation of attention to task-relevant visual and auditory stimuli (Keller et al., 2017). Oscillations in the delta band have been implicated in attention and salience detection and are associated with vigilance levels and motivation (Knyazev, 2012). It has also been suggested that EEG delta oscillations are an indicator of attention to internal processing during the performance of mental tasks (Harmony et al., 1996). Compared with traditional time- and phase-locked ERP analysis, the changes in spectral power provided by two-dimensional time-frequency analysis could provide a better account of the neural mechanisms involved in selective visual attention. In the current study, besides the evoked P300 component, we will analyze the ERSPs.

We hypothesize that vigilance decrement induced by prolonged engagement in a selective visual attention task impairs behavioral performance and neural activity and is evident in P300 latency and amplitude. We further hypothesize that monetary rewards improve the behavioral performance and neural activity in the low vigilance state. We apply a variant of the Eriksen Flanker Task conducted over 2 h 20 min (seven blocks) and assume that the subjects are in a lower vigilance state at the end of the task (blocks 5 and 6) than at the beginning (blocks 1 and 2). To compare the effects of motivation on performance in states of high vs. low vigilance, we introduce rewards in block 2 (during 20–40 min after task onset) and block 6 (during 100–120 min after task onset). The behavioral performance, evoked ERPs, and ERSPs were compared between high and low vigilance states with and without rewards.

# MATERIALS AND METHODS

#### Subjects

Twenty healthy participants (eight males), ranging from 18 to 28 [mean = 21.9, standard deviation (SD) = 2.4] years of age, were recruited from the university population. Participants reported that they had no history of smoking, sleep problems, or use of prescription medication. None worked the night shift. Furthermore, they all had normal or corrected-to-normal visual acuity, and they were right-handed according to their own report. The participants were compensated for their participation. The study was conducted in accordance with the Declaration of Helsinki and was approved by the ethics committee of Liaoning Normal University. Informed consent was obtained from each subject prior to the study.

#### Measures

#### Task and Stimuli

A version of the Eriksen Flanker Task (Eriksen and Eriksen, 1974) was adopted. The stimuli were five-letter strings consisting of a central target letter (M or N) and four flanker letters (N or M). The letters M and N are more similar to each other than the letters H and S of the original version, which increases the complexity of the task (Gulbinaite et al., 2014). In congruent trials (MMMMM or NNNNN), the target letter (the middle letter in the five-letter string) was identical to the flankers, whereas in incongruent trials (MMNMM or NNMNN), the target letter differed from the flankers. The participants were instructed to press the left button with the left index finger if the target was M and the right button with the right index finger if the target was N, as quickly as possible while maintaining a high level of accuracy.
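The stimulus construction and response mapping described above can be sketched in a few lines. This is only an illustration: the function names are hypothetical and are not taken from the software used in the study.

```python
# Response mapping from the task description: M -> left button, N -> right button.
RESPONSE_FOR_TARGET = {"M": "left", "N": "right"}


def make_stimulus(target: str, congruent: bool) -> str:
    """Build a five-letter flanker string with the target in the middle."""
    if target not in RESPONSE_FOR_TARGET:
        raise ValueError("target must be 'M' or 'N'")
    other = "N" if target == "M" else "M"
    flanker = target if congruent else other
    return flanker * 2 + target + flanker * 2


def correct_response(stimulus: str) -> str:
    """Return the button that is correct for the central (third) letter."""
    return RESPONSE_FOR_TARGET[stimulus[2]]


def is_congruent(stimulus: str) -> bool:
    """A trial is congruent when all five letters are identical."""
    return len(set(stimulus)) == 1


print(make_stimulus("M", congruent=True))   # MMMMM
print(make_stimulus("M", congruent=False))  # NNMNN
print(correct_response("MMNMM"))            # right
```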

All stimuli were presented in white against a black background on a computer screen. At the beginning of each trial, there was a fixation cross in the center of the screen (0.32° × 0.32°). Each letter of the string had a height and width of 0.24° visual angle. The letters were 0.05° apart to increase the error rates (Boksem et al., 2008). After 1,000 ms, the fixation cross was replaced by the five-letter string. The stimuli disappeared after 200 ms and were followed by a response interval, which lasted until the response button was pressed or until 600 ms had elapsed. An additional 200-ms interval was provided for the subjects to realize a possible erroneous response. Finally, feedback indicating task performance was presented for 1,000–1,500 ms, depending on the response time. The feedback showed the outcome of the response ("Correct," "Error," or "Miss") at a width of 0.5 cm. Each trial lasted 3 s in total. The trial structures are depicted in **Figure 1**. Congruent (60%) and incongruent (40%) trials were presented in random order (Tops et al., 2006).
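The stated phase durations are consistent with a fixed 3-s trial if the feedback fills whatever time remains after the response. The sketch below works through that arithmetic. The "feedback fills the remainder" rule is an assumption inferred from the reported 1,000–1,500-ms range, not something the text states explicitly.

```python
# Durations taken from the task description (all in ms).
FIXATION_MS = 1000
STIMULUS_MS = 200
MAX_RESPONSE_MS = 600
ERROR_AWARENESS_MS = 200
TRIAL_MS = 3000


def feedback_duration_ms(response_interval_ms):
    """Feedback time left in a 3-s trial.

    Assumption (not stated in the text): the feedback fills whatever remains
    of the fixed 3,000-ms trial after the earlier phases.
    """
    if not 0 <= response_interval_ms <= MAX_RESPONSE_MS:
        raise ValueError("response interval must lie between 0 and 600 ms")
    elapsed = FIXATION_MS + STIMULUS_MS + response_interval_ms + ERROR_AWARENESS_MS
    return TRIAL_MS - elapsed


print(feedback_duration_ms(600))  # 1000 (miss, or slowest accepted response)
print(feedback_duration_ms(100))  # 1500 (fastest response kept in the RT analysis)
```

Under this assumption, response intervals of 100–600 ms map exactly onto the reported feedback range of 1,500–1,000 ms.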

#### Reward

Although individuals differ in their sensitivity to reward, monetary reward has been corroborated as an effective means of motivation manipulation (Paschke et al., 2015). Participants were told that in one or some blocks, for each correct response, they would receive bonus money, and they would not lose money for errors or misses. Participants could earn up to 100 RMB (approximately €12.8) in addition to a basic payment of 50 RMB (approximately €6.4). The amount of money was set in proportion to students' monthly expenses. To maintain the effectiveness of the reward, it was stressed that they would receive the bonus only if the average accuracy in the reward blocks was more than 90%; otherwise, they would lose it. For the feedback in the reward blocks, the word "Correct" coupled with "+ RMB" was 1 cm in width, and the "Error" or "Miss" feedback was similar to that in the nonreward blocks (**Figure 1**).
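The payment rule can be written as a small function. This is a sketch under stated assumptions: the per-response bonus rate is not given in the text, so the accumulated bonus is passed in as an input, and the function name is hypothetical.

```python
BASE_PAY_RMB = 50.0        # basic payment stated in the text
MAX_BONUS_RMB = 100.0      # maximum bonus stated in the text
ACCURACY_CRITERION = 0.90  # bonus is paid only above this mean accuracy


def total_payment_rmb(reward_block_accuracies, earned_bonus_rmb):
    """Total payment given the accuracies of the reward blocks.

    `earned_bonus_rmb` is the bonus accumulated from correct responses; the
    rate per correct response is not reported, so it is supplied by the caller.
    """
    mean_accuracy = sum(reward_block_accuracies) / len(reward_block_accuracies)
    if mean_accuracy > ACCURACY_CRITERION:
        bonus = min(earned_bonus_rmb, MAX_BONUS_RMB)
    else:
        bonus = 0.0  # the bonus is lost when the criterion is not met
    return BASE_PAY_RMB + bonus


print(total_payment_rmb([0.95, 0.93], earned_bonus_rmb=90.0))  # 140.0
print(total_payment_rmb([0.92, 0.80], earned_bonus_rmb=90.0))  # 50.0
```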

## Procedure

The participants were informed that they should abstain from alcohol, tea, and coffee for 24 h before the experiment. After arriving at the laboratory, they were given the written task instructions. They were asked to leave their watches and mobile phones outside the laboratory so that they had no indication of time during the measurement. The participants were then seated in front of a 19-inch PC monitor (1,280 × 1,024 pixels) at a distance of 0.9 m in a dimly lit, sound-attenuated, and electrically shielded room. Participants practiced the task before the formal experiment day to achieve an accuracy of 90% (those with an accuracy of <90% were not included in this study). Moreover, the reward was introduced in the practice experiment to build the association between task performance and monetary reward already prior to the experiment to avoid different time of reward exposure in high vs. low vigilance states in the formal experiment. On the experiment day, prior to the start of the formal experiment, participants performed the task for 10 min (200 trials) to adapt to the task. In the formal experiment, they were instructed to respond to the target letter presented in seven blocks of 20 min, for a total of 2 h 20 min (2,800 trials). Among the seven blocks, the monetary reward was introduced in blocks 2 and 6. The procedures can be found in **Figure 1**. The task blocks 1–4 were performed to induce vigilance decrement. To avoid any anticipatory effect of experiment ending, the additional no-reward block 7 was performed after the rewarded block 6. There was no rest during the experiment or any subjective questionnaires to maintain task performance and avoid the effects of short breaks alleviating fatigue. Prior studies have shown that even short breaks can increase task performance, making it difficult to evaluate whether the performance recovery results from motivation or the short break (Helton and Russell, 2015; Lim and Kwok, 2016). 
To maintain task performance, subjects were asked to focus their attention on the target letter presented in the center of the screen. The subjects were informed of the beginning and end of the reward blocks by instructions displayed on the screen. At the end of the task, the average accuracy of reward blocks was calculated to determine whether they would receive the bonus money or not.
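As a consistency check on the procedure, the block schedule can be reconstructed from the stated block length, trial length, and reward blocks. The data structure below is purely illustrative.

```python
BLOCK_MIN = 20          # minutes per block
N_BLOCKS = 7            # number of blocks
TRIAL_S = 3             # seconds per trial
REWARD_BLOCKS = {2, 6}  # blocks with monetary reward


def block_schedule():
    """Return one record per block with its time window and trial count."""
    trials_per_block = BLOCK_MIN * 60 // TRIAL_S
    schedule = []
    for block in range(1, N_BLOCKS + 1):
        schedule.append({
            "block": block,
            "start_min": (block - 1) * BLOCK_MIN,
            "end_min": block * BLOCK_MIN,
            "rewarded": block in REWARD_BLOCKS,
            "n_trials": trials_per_block,
        })
    return schedule


schedule = block_schedule()
print(sum(b["n_trials"] for b in schedule))  # 2800
print(schedule[-1]["end_min"])               # 140
```

The totals agree with the 2,800 trials and 2 h 20 min reported above, and the rewarded blocks fall in the 20–40-min and 100–120-min windows.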

### EEG Recording and Processing

The EEGs were recorded using 64 Ag/AgCl electrodes attached to an electro cap according to the International 10-20 System. An ANT Neuro EEG amplifier was used to record EEG signals sampled at a digitization rate of 500 Hz. Horizontal and vertical electrooculograms were recorded from the outer canthi of the eyes and above and below the left eye. The electrode impedance was kept below 10 kΩ, and the EEG was online referenced to the CPz channel.

In the offline analysis, EEG data were notch filtered at 50 Hz. Next, a digital high-pass filter of 0.5 Hz and a low-pass filter of 30 Hz were applied. After removing the direct current (DC) component, the EEG signals were denoised using the wavelet threshold method (Zhang et al., 2018), wherein the wavelet coefficient threshold was set to abs (mean ± 3 × SD). If the absolute value of the wavelet coefficients exceeded the threshold, the coefficients were reset to one-quarter of the average value. The data were re-referenced to the average of the mastoid references (M1, M2). The ERP epochs from 200 ms before to 800 ms after stimulus onset were extracted. Finally, by using the Icasso software (Himberg and Hyvärinen, 2003), independent artifact components (e.g., blinks, movements, etc.) were removed through visual inspection.
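The coefficient-clipping rule in the wavelet denoising step is described only briefly, so the sketch below shows one possible reading of it. The assumptions are that the threshold is |mean| + 3 × SD of the coefficients and that "one-quarter of the average value" means one-quarter of the coefficient mean. The wavelet decomposition itself (Zhang et al., 2018) is not reproduced here, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def clip_wavelet_coefficients(coeffs, n_sd=3.0, shrink=0.25):
    """Replace outlying coefficients by a fraction of the mean coefficient.

    Assumed reading of the rule in the text:
      threshold = |mean| + n_sd * SD  (the larger of |mean - 3 SD| and |mean + 3 SD|)
      coefficients with |c| > threshold are reset to shrink * mean
    """
    m = mean(coeffs)
    sd = pstdev(coeffs)
    threshold = abs(m) + n_sd * sd
    return [shrink * m if abs(c) > threshold else c for c in coeffs]


# Hypothetical coefficients: twenty ordinary values and one artifact-sized value.
example = [1.0] * 20 + [100.0]
cleaned = clip_wavelet_coefficients(example)
print(cleaned[0], round(cleaned[-1], 3))
```

In a full pipeline this function would be applied to the coefficients at each decomposition level before the signal is reconstructed.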

### Data Analysis

To study the effects of the reward state (i.e., no-reward vs. reward) on the behavioral and electrophysiological measures in the states of high vs. low vigilance, four blocks (blocks 1, 2, 5, and 6) were selected. The subjects were provided with monetary rewards in blocks 2 and 6. In both high (blocks 1 and 2) and low (blocks 5 and 6) vigilance states, the reward blocks were introduced after the no-reward blocks. In summary, the analysis was based on 2 × 2 comparisons, representing the no-reward high vigilance (NRHV) condition in block 1, reward high vigilance (RHV) condition in block 2, no-reward low vigilance (NRLV) condition in block 5, and reward low vigilance (RLV) condition in block 6.
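The 2 × 2 layout can be made concrete by computing the descriptive contrasts from the four cell means. The sketch only describes the pattern of means; whether a contrast is reliable is decided by the ANOVA, not by this arithmetic. The function name is illustrative.

```python
def two_by_two_contrasts(nrhv, rhv, nrlv, rlv):
    """Descriptive contrasts for the vigilance (high/low) x reward (no/yes) design."""
    reward_effect_high = rhv - nrhv  # block 2 minus block 1
    reward_effect_low = rlv - nrlv   # block 6 minus block 5
    return {
        "reward_effect_high": reward_effect_high,
        "reward_effect_low": reward_effect_low,
        "main_reward": (reward_effect_high + reward_effect_low) / 2,
        "main_vigilance": ((nrlv + rlv) - (nrhv + rhv)) / 2,
        "interaction": reward_effect_low - reward_effect_high,
    }


# Mean accuracies reported in the Results (NRHV, RHV, NRLV, RLV).
contrasts = two_by_two_contrasts(0.93, 0.94, 0.88, 0.94)
print({name: round(value, 3) for name, value in contrasts.items()})
```

With the reported accuracy means, the reward effect is 0.06 in the low vigilance state and 0.01 in the high vigilance state, giving an interaction contrast of 0.05.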

#### Behavioral Performance

For each participant, the accuracy, mean RT, and number of omitted responses were calculated. Only responses occurring between 100 and 600 ms were included in the RT analysis. A response time equal to zero was regarded as an omitted response. The accuracy was calculated as the percentage of correct responses in each block. We addressed the main effects and interactions of the vigilance state and the reward state on task performance. In addition, the effect of congruency (congruent vs. incongruent) was also tested for accuracy, RT, and omitted responses.
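The three behavioral measures can be computed per block as sketched below. Two details are assumptions, since the text does not specify them: accuracy is taken over all trials of the block, and the mean RT includes every response inside the 100–600-ms window regardless of correctness.

```python
def summarize_block(trials):
    """Summarize one block.

    `trials` is a list of (rt_ms, correct) pairs, where rt_ms == 0 marks an
    omitted response, as described in the text.
    """
    n_trials = len(trials)
    omitted = sum(1 for rt, _ in trials if rt == 0)
    accuracy = sum(1 for _, correct in trials if correct) / n_trials
    valid_rts = [rt for rt, _ in trials if 100 <= rt <= 600]
    mean_rt = sum(valid_rts) / len(valid_rts) if valid_rts else None
    return {"accuracy": accuracy, "mean_rt": mean_rt, "omitted": omitted}


# Hypothetical trials: one omission, one response that is too slow, one too fast.
example = [(300, True), (0, False), (650, True), (80, False), (400, True)]
print(summarize_block(example))
```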

#### Event-Related Potentials

ERPs were analyzed with MATLAB 2015b. First, the individual correct trials whose amplitude was out of range (max >75 µV, baseline max >30 µV) were rejected, and then the baseline 200 ms before stimulus onset was subtracted from the waveforms. Next, trials were averaged across blocks for each subject. The mean (with SD in parentheses) number of trials across all subjects for NRHV, RHV, NRLV, and RLV were 236 (82), 232 (65), 234 (64), and 238 (64), respectively. The P300 amplitude and latency were quantified for further analysis. Based on some earlier studies (Polich and Kok, 1995; Kuba et al., 2012; van Dinteren et al., 2014) and topographic activations in our study, eight electrodes (FC1, FC2, FCz, C1, C2, Cz, CP1, and CP2) were chosen for the P300 analysis. A time window of 440–660 ms for the P300 component was selected. The P300 latency values were calculated as the time of maximum amplitude within the time window of the P300 component (Luck, 2005).
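The ERP steps above (trial rejection, baseline subtraction, averaging, and peak-latency extraction) can be sketched for a single channel. This is not the authors' MATLAB code. It assumes that the rejection limits apply to absolute amplitudes, and it returns the peak value alongside the latency purely for illustration, since the text does not say how the P300 amplitude was quantified. `demo_epoch` is a hypothetical synthetic trial.

```python
import math

SAMPLING_HZ = 500        # digitization rate given in the text
EPOCH_START_MS = -200.0  # epochs run from -200 to 800 ms
EPOCH_END_MS = 800.0


def epoch_times_ms():
    """Sample times of one epoch in ms (2-ms steps at 500 Hz)."""
    step = 1000.0 / SAMPLING_HZ
    n = int((EPOCH_END_MS - EPOCH_START_MS) / step)
    return [EPOCH_START_MS + i * step for i in range(n)]


def keep_trial(epoch, times, max_uv=75.0, baseline_max_uv=30.0):
    """Keep a trial only if both amplitude limits are respected."""
    overall = max(abs(v) for v in epoch)
    baseline = max(abs(v) for v, t in zip(epoch, times) if t < 0)
    return overall <= max_uv and baseline <= baseline_max_uv


def baseline_correct(epoch, times):
    """Subtract the mean of the 200-ms pre-stimulus interval."""
    pre = [v for v, t in zip(epoch, times) if t < 0]
    offset = sum(pre) / len(pre)
    return [v - offset for v in epoch]


def average_erp(epochs, times):
    """Reject out-of-range trials, baseline-correct the rest, and average."""
    kept = [baseline_correct(e, times) for e in epochs if keep_trial(e, times)]
    if not kept:
        raise ValueError("no trials survived rejection")
    return [sum(samples) / len(kept) for samples in zip(*kept)]


def p300_peak(erp, times, window=(440.0, 660.0)):
    """Return (peak amplitude, latency in ms) of the maximum inside the window."""
    in_window = [(v, t) for v, t in zip(erp, times) if window[0] <= t <= window[1]]
    return max(in_window)


def demo_epoch(times, peak_ms=500.0, offset_uv=2.0, artifact_uv=0.0):
    """Hypothetical trial: a 5-uV Gaussian bump plus a constant offset."""
    return [5.0 * math.exp(-((t - peak_ms) / 60.0) ** 2) + offset_uv + artifact_uv
            for t in times]


times = epoch_times_ms()
trials = [demo_epoch(times), demo_epoch(times, artifact_uv=100.0)]  # second is rejected
amplitude, latency = p300_peak(average_erp(trials, times), times)
print(round(amplitude, 2), latency)
```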

#### EEG Spectra

The EEG spectral power was assessed by calculating the ERSP using the continuous wavelet transform (CWT; Zhang et al., 2018). The complex Morlet wavelet was adopted for the CWT analysis, by which the time-dependent signals were evaluated at each sampling instant with a central frequency band of 1.5 Hz covering frequencies from 1 to 30 Hz, with a frequency step of 0.5 Hz. Additionally, we normalized the power spectra with the subtraction change from −1,000- to 0-ms baseline. For quantifying the oscillatory dynamics, we focused on separate time windows in the analysis of two frequency bands (**Figure 5**). According to the maximum power of the different frequency bands, statistical analysis was performed within the time window of 440–660 ms for the delta band (1–4 Hz) and within the time window of 300–600 ms for the theta band (4–8 Hz). In order to account for the effect of phase-locked (evoked response) activity in the induced oscillations, we also analyzed the induced activations by subtracting the averaged evoked response from each epoch prior to the wavelet analysis. The results of this analysis are provided in the **Supplementary Materials**.
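A minimal, dependency-free sketch of the time-frequency step follows: convolution with a complex Morlet wavelet at one frequency, then subtraction of the mean baseline power. It is a didactic illustration rather than the authors' implementation. The wavelet width (`n_cycles`), the 3-SD truncation, the amplitude normalization, and the 100-Hz sampling rate of the demo signal are all assumptions.

```python
import cmath
import math


def morlet_power(signal, fs, freq, n_cycles=5.0):
    """Power time course at `freq` (Hz) via a complex Morlet wavelet."""
    sigma_t = n_cycles / (2 * math.pi * freq)  # temporal SD of the Gaussian
    half = int(round(3 * sigma_t * fs))        # truncate at +/- 3 SD
    offsets = range(-half, half + 1)
    wavelet = [
        cmath.exp(2j * math.pi * freq * k / fs)
        * math.exp(-((k / fs) ** 2) / (2 * sigma_t ** 2))
        for k in offsets
    ]
    norm = sum(abs(w) for w in wavelet)
    n = len(signal)
    power = []
    for i in range(n):
        acc = 0j
        for k, w in zip(offsets, wavelet):
            j = i - k  # convolution index; samples outside the signal count as zero
            if 0 <= j < n:
                acc += signal[j] * w
        power.append(abs(acc / norm) ** 2)
    return power


def baseline_subtract(power, times_ms, baseline=(-1000.0, 0.0)):
    """Subtract the mean power of the pre-stimulus baseline window."""
    base = [p for p, t in zip(power, times_ms) if baseline[0] <= t < baseline[1]]
    mean_base = sum(base) / len(base)
    return [p - mean_base for p in power]


def demo(fs=100.0):
    """Hypothetical signal: a 6-Hz oscillation that starts 200 ms after onset."""
    times_ms = [-1000.0 + 1000.0 * i / fs for i in range(int(2 * fs))]
    signal = [math.sin(2 * math.pi * 6.0 * t / 1000.0) if t >= 200.0 else 0.0
              for t in times_ms]
    ersp = {f: baseline_subtract(morlet_power(signal, fs, f), times_ms)
            for f in (6.0, 20.0)}
    return times_ms, ersp


times_ms, ersp = demo()
i600 = times_ms.index(600.0)
print(ersp[6.0][i600] > ersp[20.0][i600])  # True: the burst shows up at 6 Hz only
```

Repeating `morlet_power` over 1–30 Hz in 0.5-Hz steps would give the full time-frequency map; averaging its rows over 1–4 Hz or 4–8 Hz would give band power time courses of the kind analyzed here.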

#### Statistical Analysis

Data were analyzed using IBM SPSS software (version 22.0; SPSS Inc., Chicago). The significance level p < 0.05 was used, and all results were reported under the 2-tailed condition. One-way repeated-measures analysis of variance (ANOVA) with the blocks 1, 3, 4, and 5 was used to test the hypothesis that behavioral performance deteriorates with time-on-task. Blocks 2 and 6 with an additional influence of motivation and block 7 with an effect of approaching the end of the task were excluded to capture the changes purely due to time-on-task. Moreover, behavioral, time domain, and time-frequency domain data were subjected to 2 × 2 [vigilance states (high and low) × reward states (no-reward and reward)] repeated-measures ANOVA. In case of significant interaction and/or main effects, a follow-up ANOVA was applied to separately test the effect of the vigilance state in no-reward and reward conditions (NRLV vs. NRHV indicates the effects of vigilance decrement) and the effect of reward in low and high vigilance states (RHV vs. NRHV and RLV vs. NRLV indicate the effects of motivation in the high and low vigilance states, respectively). Greenhouse–Geisser-corrected values are reported, and the effect size was determined using adjusted partial η<sup>2</sup> (η<sup>2</sup><sub>ap</sub>; Mordkoff, 2019).
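For readers who want to check the effect sizes, the sketch below computes a one-degree-of-freedom repeated-measures contrast (equivalent to a squared paired t statistic) and the adjusted partial η<sup>2</sup>. The adjustment formula is our understanding of Mordkoff (2019), namely (F − 1) × df_effect / (F × df_effect + df_error), and should be treated as an assumption. It reproduces, for example, the value 0.24 reported in the Results for F(1,19) = 7.38. Function names are illustrative.

```python
import math
from statistics import mean, stdev


def paired_contrast_f(cond_a, cond_b):
    """F(1, n - 1) for a two-level within-subject contrast (F equals t squared)."""
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t * t, 1, n - 1


def partial_eta_squared(f, df_effect, df_error):
    """Classical partial eta squared recovered from an F ratio."""
    return f * df_effect / (f * df_effect + df_error)


def adjusted_partial_eta_squared(f, df_effect, df_error):
    """Bias-adjusted partial eta squared (assumed form of the Mordkoff adjustment).

    Unlike the classical measure, it becomes negative when F < 1.
    """
    return (f - 1) * df_effect / (f * df_effect + df_error)


print(round(adjusted_partial_eta_squared(7.38, 1, 19), 2))   # 0.24
print(round(adjusted_partial_eta_squared(0.21, 1, 19), 2))   # -0.04
```

The negative values reported for some nonsignificant contrasts follow directly from this form: whenever F is below 1, the numerator is negative.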

The effect of congruency was initially tested with 2 × 2 × 2 ANOVA (congruency, vigilance state, and reward state). However, as no interaction was found for congruency, the effects of the reward and vigilance states were tested with congruent and incongruent trials integrated together. The correlations between performance (accuracy, RT, and omitted response) and ERPs (P300 amplitude and latency) were calculated using the Pearson Correlation Coefficient to study the association between the behavioral and electrophysiological measures in different vigilance and reward states.

# RESULTS

#### Behavioral Performance

**Figure 2** illustrates the changes in behavioral performance (accuracy, RT, and number of omitted responses) with time-on-task. Based on the one-way repeated-measures ANOVA, we found that the accuracy significantly decreased (F(1.37,25.93) = 4.44, p = 0.02, η<sup>2</sup><sub>ap</sub> = 0.15) with time-on-task. Meanwhile, the RT (F(2.37,44.99) = 3.97, p = 0.03, η<sup>2</sup><sub>ap</sub> = 0.13) and the number of omitted responses (F(2.55,48.45) = 4.12, p = 0.02, η<sup>2</sup><sub>ap</sub> = 0.14) significantly increased with prolonged task performance.

#### Accuracy

In the 2 (vigilance states) × 2 (reward states) ANOVA, there was a significant main effect of the reward state (F(1,19) = 6.02, p = 0.03, η<sup>2</sup><sub>ap</sub> = 0.21) and a significant vigilance state × reward state interaction (F(1,19) = 7.38, p = 0.01, η<sup>2</sup><sub>ap</sub> = 0.24). When the vigilance states were contrasted separately for reward and no-reward conditions, the accuracy was lower in the low vigilance state than in the high vigilance state in the no-reward condition (NRLV: mean = 0.88, SD = 0.12, NRHV: mean = 0.93, SD = 0.06, F(1,19) = 5.24, p = 0.03, η<sup>2</sup><sub>ap</sub> = 0.17). There was no difference between the rewarded low and high vigilance states (RLV: mean = 0.94, SD = 0.05, RHV: mean = 0.94, SD = 0.05, F(1,19) = 0.00, p = 1.00, η<sup>2</sup><sub>ap</sub> = 0.00). The monetary reward played a role only in the low vigilance state. The accuracy was higher in the rewarded than in the unrewarded low vigilance condition (F(1,19) = 7.37, p = 0.01, η<sup>2</sup><sub>ap</sub> = 0.24), although there was no significant difference between the rewarded and the unrewarded high vigilance conditions (F(1,19) = 0.47, p = 0.50, η<sup>2</sup><sub>ap</sub> = −0.03).

[Figure caption fragment] … blocks) = (NRHV + NRLV)/2, and reward (reward blocks) = (RHV + RLV)/2. Analysis of variance (ANOVA) results were marked by <sup>∗</sup>p < 0.05 and <sup>∗∗</sup>p < 0.01.

#### Reaction Time

There was a significant main effect of the reward state on the RT (F(1,19) = 10.95, p < 0.01, η<sup>2</sup><sub>ap</sub> = 0.33). The follow-up ANOVA indicated that the RT increased with vigilance decrement in the no-reward condition (NRLV: mean = 319.22, SD = 49.15, NRHV: mean = 311.02, SD = 46.42, F(1,19) = 5.52, p = 0.03, η<sup>2</sup><sub>ap</sub> = 0.18). There was no significant difference between the low and high vigilance states in the reward condition (RLV: mean = 308.58, SD = 45.25, RHV: mean = 309.83, SD = 47.25, F(1,19) = 0.21, p = 0.65, η<sup>2</sup><sub>ap</sub> = −0.04). When rewards were provided in the states of low and high vigilance, the RT decreased in the low vigilance state (F(1,19) = 8.38, p = 0.01, η<sup>2</sup><sub>ap</sub> = 0.27) but did not improve in the high vigilance state (F(1,19) = 1.75, p = 0.20, η<sup>2</sup><sub>ap</sub> = 0.04).

#### Omitted Responses

There was a significant main effect of the reward state on the number of omitted responses (F(1,19) = 9.22, p = 0.01, η<sup>2</sup><sub>ap</sub> = 0.29). The follow-up ANOVA revealed that the omitted responses increased with the decrease in vigilance in the no-reward condition (F(1,19) = 5.39, p = 0.03, η<sup>2</sup><sub>ap</sub> = 0.18). No difference was found between low and high vigilance states in the reward condition (F(1,19) = 0.07, p = 0.80, η<sup>2</sup><sub>ap</sub> = −0.05). The number of omitted responses decreased in the state of low vigilance (F(1,19) = 10.94, p = 0.01, η<sup>2</sup><sub>ap</sub> = 0.33) after motivation manipulation, although it did not change in the state of high vigilance (F(1,19) = 1.97, p = 0.18, η<sup>2</sup><sub>ap</sub> = 0.05).

#### Congruency

Regarding congruency (congruent vs. incongruent), we found significant main effects of congruency on accuracy (F(1,19) = 18.07, p < 0.01, η<sup>2</sup><sub>ap</sub> = 0.46), RT (F(1,19) = 32.75, p < 0.01, η<sup>2</sup><sub>ap</sub> = 0.61), and omitted responses (F(1,19) = 9.65, p = 0.01, η<sup>2</sup><sub>ap</sub> = 0.30). The congruent condition showed a higher accuracy (congruent: mean = 0.94, SD = 0.01, incongruent: mean = 0.91, SD = 0.02), faster RTs (congruent: mean = 306.93, SD = 10.08, incongruent: mean = 315.28, SD = 9.84), and fewer omitted responses (congruent: mean = 2.98, SD = 0.69, incongruent: mean = 5.45, SD = 1.31) than the incongruent condition. However, no significant interaction between congruency and vigilance state or between congruency and reward state was observed in behavioral performance.

#### Event-Related Potentials

#### P300 Components

#### **Amplitude**

The left part of **Figure 3A** shows the averaged ERP amplitude waveforms with the time window of interest (P300 response at 440–660 ms after stimulus onset) depicted by a gray rectangle. The middle part of **Figure 3A** shows the corresponding topographies in the four experimental conditions, whereas **Figure 3B** illustrates the differences in P300 amplitude between the four conditions (left) and between the two main factors of vigilance state and reward state (right).

The repeated-measures ANOVA showed a significant main effect of the reward state on the P300 amplitude (F(1,19) = 7.08, p = 0.02, η<sup>2</sup><sub>ap</sub> = 0.23) and a significant interaction between the vigilance state and reward state (F(1,19) = 6.78, p = 0.02, η<sup>2</sup><sub>ap</sub> = 0.22). The follow-up ANOVA revealed that the P300 amplitude decreased with vigilance decrement (NRLV: mean = 4.17, SD = 2.68, NRHV: mean = 4.81, SD = 2.74, F(1,19) = 9.99, p = 0.01, η<sup>2</sup><sub>ap</sub> = 0.31) in the no-reward condition, although no significant difference was found between the low and high vigilance states in the reward condition (RLV: mean = 4.96, SD = 3.01, RHV: mean = 4.89, SD = 2.98, F(1,19) = 0.07, p = 0.80, η<sup>2</sup><sub>ap</sub> = −0.05). When the effect of reward was tested in states of low and high vigilance separately, the reward-related increase was present only in the low vigilance state (F(1,19) = 15.88, p < 0.01, η<sup>2</sup><sub>ap</sub> = 0.43) and not in the high vigilance state (F(1,19) = 0.12, p = 0.74, η<sup>2</sup><sub>ap</sub> = −0.05).

Regarding congruency (congruent vs. incongruent), a significant main effect of congruency was found for the P300 amplitude (F(1,19) = 22.19, p < 0.01, η<sup>2</sup><sub>ap</sub> = 0.51). The amplitude was higher in the congruent condition (mean = 5.07, SD = 1.15) than in the incongruent condition (mean = 4.35, SD = 1.14). No congruency × vigilance state or congruency × reward state interaction was detected.

#### **Latency**

**Figure 4A** illustrates the ERP waveforms in high and low vigilance states (reward and nonreward blocks coalesced), and **Figure 4B** shows the differences in P300 latency in the four experimental conditions (left) and the two main factors of vigilance state and reward state (right).

There was a significant main effect of the vigilance state on the P300 latency (F(1,19) = 52.20, p < 0.01, η<sup>2</sup><sub>ap</sub> = 0.72) and an interaction between the vigilance state and the reward state (F(1,19) = 6.55, p = 0.02, η<sup>2</sup><sub>ap</sub> = 0.22). Separate ANOVAs revealed clear effects of the vigilance state regardless of rewards. The P300 latency increased in the low vigilance state compared with the high vigilance state in both the no-reward (NRLV: mean = 557.45, SD = 84.71, NRHV: mean = 506.85, SD = 81.31, F(1,19) = 45.52, p < 0.01, η<sup>2</sup><sub>ap</sub> = 0.69) and reward conditions (RLV: mean = 554.65, SD = 83.94, RHV: mean = 519.55, SD = 80.44, F(1,19) = 37.97, p < 0.01, η<sup>2</sup><sub>ap</sub> = 0.65). When the effect of reward was tested in the states of low and high vigilance, there was no difference between reward and no-reward conditions in the low vigilance state (F(1,19) = 0.37, p = 0.55, η<sup>2</sup><sub>ap</sub> = −0.03), but we did find a decrease in the high vigilance state (F(1,19) = 6.81, p = 0.02, η<sup>2</sup><sub>ap</sub> = 0.23) after manipulating motivation.

Regarding congruency (congruent vs. incongruent), a significant main effect was found on the P300 latency (F(1,19) = 8.91, p = 0.01, η<sup>2</sup><sub>ap</sub> = 0.28). The latency was shorter in the congruent (mean = 465.59, SD = 8.43) than in the incongruent condition (mean = 471.48, SD = 9.41). For P300 latency, no congruency × vigilance state or congruency × reward state interaction was detected.

#### Correlations Between ERPs and Behavioral Performance

To investigate the associations between task performance and ERPs affected by motivation and vigilance states, the correlations between the behavioral measures (accuracy, RT, and number of omitted responses) and ERPs (the amplitude and latency of P300) were calculated (**Table 1**). Significant negative correlations between the accuracy and P300 latency and significant positive correlations between the accuracy and P300 amplitude were detected. Additionally, the number of omitted responses and the RT were negatively correlated with the P300 amplitude and positively correlated with the P300 latency. Scatter diagrams showing the relationships between the behavioral measures and P300 amplitude and latency can be found in **Supplementary Figure S1**.
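The correlation analysis combines two standard steps: a Pearson coefficient for each pair of measures and, as stated in the note to Table 1, a Benjamini and Hochberg correction across the resulting p-values. The sketch below shows both in plain Python. The p-values in the usage example are hypothetical, and converting r to a p-value (which requires the t distribution) is left out.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the test survives FDR control at q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= q * k / m.
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= q * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[index] = True
    return rejected


# Hypothetical p-values for four correlations.
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5]))  # [True, False, False, False]
```

Note that the step-up rule rejects every hypothesis ranked at or below the cutoff, even if an individual p-value in between exceeds its own threshold.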

#### ERSP Analysis

**Figure 5** illustrates the time-frequency representations (averaged over electrodes F1, F2, Fz, FC1, FC2, FCz, C1, C2, Cz, CP1, and CP2) in the four experimental conditions. A clear modulation of frequencies of approximately 1–4 Hz is visible in the time window of 440–660 ms. Separable modulations of approximately 4–8 Hz (in the time window of 300–500 ms) appear visually earlier than those at 1–4 Hz in all four conditions. The corresponding frequency bands and time windows are indicated by the dotted-line boxes. We also calculated the induced time-frequency representations after removing the phase-locked evoked responses from the total power (**Supplementary Figure S2**). **Figure 6A** illustrates the topographic distribution (right) and power waveforms (left) averaged across the electrodes listed above for the delta band (averaged over 1–4 Hz). **Figure 6B** shows the topographic distribution (right) and power waveforms (left) of the theta band (averaged over 4–8 Hz), with activations in the frontal electrodes (F1, F2, Fz, FC1, FC2, and FCz).

TABLE 1 | Correlations between behavioral performance and P300 measures in the four conditions.

Note: AMP and LAN represent the amplitude and latency of P300. ACC and OMIT represent the accuracy and number of omitted responses. Correlation coefficients (2-tailed) are marked by <sup>∗</sup>p < 0.05 and <sup>∗∗</sup>p < 0.01. The correlations were corrected with the Benjamini and Hochberg procedure for controlling the false discovery rate (FDR; Benjamini and Hochberg, 1995).

In the 2 (vigilance states) × 2 (reward states) ANOVAs, for the delta band power, we found a significant interaction between the vigilance state and reward state (F(1,19) = 7.28, p = 0.01, η<sup>2</sup><sub>ap</sub> = 0.24). When the effect of the vigilance state was tested in the no-reward and reward conditions, the delta band power decreased with vigilance decrement in the no-reward condition (NRLV: mean = 703.43, SD = 162.89, NRHV: mean = 768.16, SD = 153.49, F(1,19) = 8.72, p = 0.01, η<sup>2</sup><sub>ap</sub> = 0.28), but no significant difference was detected between low and high vigilance states in the reward condition (RLV: mean = 750.86, SD = 206.30, RHV: mean = 757.08, SD = 200.46, F(1,19) = 0.05, p = 0.82, η<sup>2</sup><sub>ap</sub> = −0.05). When the effect of reward was separately tested in the states of low and high vigilance, the effect of reward on delta band power was detected only in the low vigilance state (F(1,19) = 4.57, p = 0.04, η<sup>2</sup><sub>ap</sub> = 0.19) and not in the high vigilance state (F(1,19) = 0.23, p = 0.64, η<sup>2</sup><sub>ap</sub> = 0.01).

For the theta band, the vigilance state had a significant main effect (F(1,19) = 18.56, p < 0.01, η<sup>2</sup><sub>ap</sub> = 0.47). The follow-up ANOVA revealed that theta power was weaker in the low vigilance state than in the high vigilance state in both the no-reward condition (NRLV: mean = 432.43, SD = 115.29, NRHV: mean = 512.47, SD = 169.54, F(1,19) = 11.38, p < 0.01, η<sup>2</sup><sub>ap</sub> = 0.34) and the reward condition (RLV: mean = 455.70, SD = 118.69, RHV: mean = 498.61, SD = 137.47, F(1,19) = 6.36, p = 0.02, η<sup>2</sup><sub>ap</sub> = 0.21). The separate ANOVAs revealed that reward did not play a role in the low vigilance state (F(1,19) = 1.73, p = 0.20, η<sup>2</sup><sub>ap</sub> = 0.03) or in the high vigilance state (F(1,19) = 0.21, p = 0.65, η<sup>2</sup><sub>ap</sub> = −0.04).

# DISCUSSION

We examined the alterations in behavioral performance and brain electrophysiology produced by the vigilance level and reward during a prolonged selective visual attention task. Behavioral measures (accuracy, RT, and number of omitted responses), evoked responses (P300 amplitude and latency), and spectral power (delta and theta bands) were analyzed. A clear deterioration in behavioral performance was demonstrated over time (**Figure 2**). The monetary reward improved accuracy, RT, and the number of omitted responses only in the low vigilance state. The P300 amplitude was smaller in the low than in the high vigilance state; however, in the low vigilance state, reward increased the P300 amplitude to its level in the high vigilance state. The P300 latency was sensitive to vigilance decrement but insensitive to rewards, with longer latency in low than in high vigilance states. Changes in spectral power at 4–8 Hz purely reflected the vigilance level, being stronger in the high vigilance state than in the low vigilance state. Similarly, the spectral responses at 1–4 Hz also decreased with vigilance decrement. However, the reward selectively increased the spectral power at 1–4 Hz in the low vigilance state to the strength levels in the high vigilance state.

Although the time of emergence of fatigue during the prolonged performance of cognitive tasks has not been defined, earlier studies suggest cumulative effects in performance and neurophysiology with time-on-task (Boksem et al., 2005, 2006; Lorist et al., 2005; Faber et al., 2012; Möckel et al., 2015; Reteig et al., 2019). In line with these findings, our study found a significant decrease in accuracy and an increase in RT and the number of omitted responses with time-on-task (**Figure 2**). The decline of behavioral performance in prolonged attention tasks is in line with our assumption that time-on-task is associated with the decrement of vigilance levels. These results provided justification for testing the interactions between vigilance and reward states, where blocks 1 and 2 (first 40 min) were regarded as representing the high vigilance state, whereas blocks 5 (80–100 min) and 6 (100–120 min) were regarded as the low vigilance state. This selection was also supported by earlier findings of the effects of mental fatigue after 60- to 90-min tasks (Kastner and Ungerleider, 2000; Lorist et al., 2000; Marcora et al., 2009).

The limited processing capacity biased toward goal-directed selection is the core of selective visual attention (Robert and Duncan, 1995; Polich, 2009). The P300 component is considered an important indicator of attentional capacity in visual tasks (Polich, 2009). In line with earlier results showing a fatigue-related decrease in P300 amplitude during brain-computer interface performance (Käthner et al., 2014), our results demonstrated that the P300 amplitude decreased with vigilance decrement during a selective visual attention task, presumably reflecting a less efficient engagement or limited capacity of attentional resources. The insufficient allocation of attentional resources in the low vigilance state was also reflected by the P300 latency, which is thought to provide a specific index for the timing of information processing and stimulus evaluation (Polich and Kok, 1995; Verleger, 1997; Käthner et al., 2014). In our study, the P300 latency was significantly prolonged in the state of low vigilance, in line with earlier studies (Kutas et al., 1977; Boksem et al., 2006; Kato et al., 2009). The result indicates that prolonged task performance is accompanied by a longer evaluation time for processing information. Therefore, in agreement with existing studies, our results demonstrated a decrease in P300 amplitude and an increase in latency along with vigilance decrement.

The close link between the modulations in behavioral performance and the changes in brain electrophysiology is demonstrated by strong correlations between the behavioral and P300 measures (**Table 1**). It is noteworthy that, although the RT correlates with both P300 amplitude and latency, the association between the decrease of P300 amplitude and the increase in RT in particular is clear (**Supplementary Figure S1**).

After reward manipulation, the P300 amplitude increased only in the low vigilance state, reaching the same level as in the high vigilance state. Our results are consistent with earlier studies demonstrating a monetary-reward based improvement of neural measures in mental fatigue (Boksem et al., 2006; Hopstaken et al., 2015). We further provide quantitative evidence for the recovery of attentional resources from low vigilance to high vigilance states. Our results verify that, when rewards are provided, the capacity of attentional resources in the low vigilance state can reach the level of capacity in the high vigilance state, at least within the limited time duration of 2 h 20 min for a visual attention task.

Interestingly, there was no significant reward-induced improvement in P300 latency either in high or low vigilance states despite the reward-related improvement in RT in the low vigilance state. These results are in line with a previous study (Boksem et al., 2006), suggesting that the P300 latency is an unstable electrophysiological marker of motivation compared with the P300 amplitude. These diverging results concerning the RT and P300 latency could be interpreted to indicate improvement in motor response generation (as reflected by the RT) but not in the preceding stage of information processing (as reflected by the P300 latency). However, these results might also reflect the complex composition (subcomponents) of the P300 responses. The P300 component has been shown to comprise two subcomponents—P3a and P3b—with different functional correlations (Demiralp et al., 2001). The P3a with frontal topography has been suggested to contribute to attention engagement in top-down task-relevant processing, whereas the P3b with centroparietal topography has been linked to the level of cognitive workload and memory encoding (Polich, 2009). They are activated in different time windows, and P3a usually emerges earlier than P3b (Polich, 2009). Although we fail to disentangle the two subcomponents in this study, it is possible that, in addition to changes in the amplitude, also changes in the emphasis of these neural subprocesses are associated with vigilance decrement. This complicates the interpretation of latency measures.

The topography of the P300 component in the present study is more anterior than that in some earlier studies (Demiralp et al., 2001; Käthner et al., 2014). This likely reflects the task requirements of the present study. To successfully perform a selective visual attention task, humans are able to filter out task-irrelevant stimuli and engage their limited capacity in task-relevant processing (Robert and Duncan, 1995). Our task is likely to harness—although it was not designed to differentiate—these two subprocesses. The modified Eriksen Flanker Task applied in our study required responses for every trial and was specifically adapted to make the target letter distinction visually hard, emphasizing the need for the active inhibition of the flankers. Some interpretations, especially in studies showing anteriorly located P300 generators, emphasize the role of the inhibitory control underlying the modulation of P300 responses, for example, as a result of aging (Kuba et al., 2012; van Dinteren et al., 2014). Importantly for the present findings, this interpretation is also sensible in the context of vigilance decrement, which is often accompanied by a reduction in the capacity for top-down inhibitory control (Guo et al., 2018).

P300 seems to provide a reliable measure of cognitive performance, but the analysis of phase-locked ERP offers only limited windows to explore the underlying neural processes in more detail. Previous studies have indicated that both delta (Keller et al., 2017) and frontal theta oscillations (Knyazev, 2012) are involved in visual attention tasks, although the influence of time-on-task was not studied in these studies. Focusing on the spectral patterns and the power modulations at different frequency bands may provide additional sensitivity to separate vigilance- and reward-related processes in the brain. The oscillatory activity at low-frequency bands [i.e., the delta (1–4 Hz) and theta (4–8 Hz) bands] has been shown to increase during the transition to the low vigilance state in spontaneous conditions (Lal and Craig, 2001). Although the modulations of the oscillatory activity triggered by cognitive tasks are different from those from spontaneous activity, the temporal variations in the power of these frequency bands triggered by a visual attention task may tap on the same underlying processes as reported based on more spontaneous conditions.

In the present study, the changes in the spectral power at 1–4 Hz and 4–8 Hz showed different topographies, with the delta band distributed in the centroparietal electrodes and the theta band distributed more focally in the frontal electrodes. In addition, the temporal characteristics of the changes in power in these two frequency bands differed, with an earlier emergence of the modulation at 1–4 Hz (300–600 ms) than at 4–8 Hz (440–660 ms). Therefore, it is likely that the two separable changes in spectral power reflect two distinct cognitive functions involved in selective visual attention. The theta band has been shown to be an indicator of attention allocation to task-relevant stimuli (Keller et al., 2017), whereas the delta band has been linked with internal processing in an attention task (Harmony et al., 1996). Prolonged engagement in a selective visual attention task led to reduced spectral power in both delta and theta bands. These results are in line with the analysis of the evoked P300 responses and suggest that vigilance decrement impairs both attention allocation and information processing. In the rewarded low vigilance condition, the spectral power at approximately 1–4 Hz increased to the same level as that recorded in the high vigilance state. However, the spectral power at approximately 4–8 Hz did not increase in the rewarded low vigilance state compared to the rewarded high vigilance state. Consequently, the power in these two frequency bands was differently modulated by motivation. It would be tempting to associate the current findings with the distinct roles suggested for the theta and delta bands, suggesting that intrinsically driven regulation of information processing can be influenced by reward (as reflected by delta-band changes, Knyazev, 2012), but the top-down attentional control (reflected by theta-band changes, Cavanagh and Frank, 2014) is insensitive to reward.
Interestingly, accumulating evidence exists linking theta oscillations with the attentional sampling of the environment, especially during higher task demands (Bastiaansen and Hagoort, 2003; Landau et al., 2015; Spyropoulos et al., 2018; Karamacoska et al., 2019). Interpreted in the context of the current results suggesting the insensitivity of the theta band to reward manipulation, theta power might subserve rather low-level attentional sampling, which is not directly linked with the reward system, at least in the context of nonprimary, extrinsic rewards such as money. On the contrary, delta band oscillations may reflect a separate compensatory mechanism (Knyazev, 2012), which supports the recovery of functions after manipulating motivation.

However, these interpretations must be treated with caution, especially regarding the role of fatigue-related oscillatory dynamics. The current experimental paradigm is not optimal for studying ongoing oscillations, and the changes in rhythmic activation are strongly linked with the visual trigger. It is important to distinguish between ongoing oscillations and stimulus-related changes in spectral power. This point is highlighted by the decrease in spectral power with decreasing vigilance detected in the present study, whereas ongoing oscillations at low-frequency bands generally show a fatigue-related increase (Lal and Craig, 2001). When the influence of phase-locked evoked activation (**Supplementary Figure S2**) was removed from the spectral responses, the theta-band modulations strongly decreased. Rather than reflecting neural computation in the theta band, the time-frequency results might be at least partly driven by the phase-locked evoked responses. Our analysis can be seen as advancing the interpretability of evoked responses, and different experimental paradigms are needed to focus purely on the fatigue-related changes in oscillatory dynamics.
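The distinction between total power, power after removal of the phase-locked evoked response, and power of the averaged evoked response (the three computations compared in Supplementary Figure S2) can be illustrated with a minimal Python sketch. This is not the analysis pipeline used in the study: the Morlet wavelet implementation, the function names `morlet_power`, `total_induced_evoked`, and `simulate_epochs`, the 5-cycle wavelet width, and the synthetic data are all assumptions introduced for illustration.

```python
import numpy as np


def morlet_power(signal, sfreq, freq, n_cycles=5.0):
    """Power over time at one frequency, via convolution with a complex Morlet wavelet."""
    sigma_t = n_cycles / (2.0 * np.pi * freq)            # temporal width of the Gaussian envelope
    t = np.arange(-3.0 * sigma_t, 3.0 * sigma_t, 1.0 / sfreq)
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2.0 * sigma_t**2))
    wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))     # normalize to unit energy
    return np.abs(np.convolve(signal, wavelet, mode="same")) ** 2


def total_induced_evoked(epochs, sfreq, freq):
    """epochs has shape (n_trials, n_samples); returns three power time courses."""
    erp = epochs.mean(axis=0)                            # averaged evoked response
    # Single-trial power, then averaged (cf. Figure S2A)
    total = np.mean([morlet_power(trial, sfreq, freq) for trial in epochs], axis=0)
    # Averaged ERP subtracted from each trial before the transform (cf. Figure S2B)
    induced = np.mean([morlet_power(trial - erp, sfreq, freq) for trial in epochs], axis=0)
    # Power of the averaged ERP itself (cf. Figure S2C)
    evoked = morlet_power(erp, sfreq, freq)
    return total, induced, evoked


def simulate_epochs(n_trials=40, sfreq=250.0, duration=2.0, freq=6.0, seed=0):
    """Hypothetical data: the same phase-locked theta burst in every trial, plus noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(0.0, duration, 1.0 / sfreq)
    envelope = np.exp(-((t - 1.0) ** 2) / (2.0 * 0.15**2))
    burst = 2.0 * np.sin(2.0 * np.pi * freq * t) * envelope
    return burst + rng.normal(0.0, 1.0, size=(n_trials, t.size))


epochs = simulate_epochs()
total, induced, evoked = total_induced_evoked(epochs, sfreq=250.0, freq=6.0)
print(f"peak total power:   {total.max():.2f}")
print(f"peak induced power: {induced.max():.2f}")
print(f"peak evoked power:  {evoked.max():.2f}")
```

Because the simulated burst is perfectly phase-locked, most of the theta-band power disappears once the averaged ERP is subtracted, which mirrors the pattern described for the theta-band modulations. With real data, the outcome depends on how much of the response is phase-locked.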

Based on our results, vigilance decrement changes the neural processes underlying selective visual attention, as demonstrated by the changes in the spectral power at 1–4 Hz and at 4–8 Hz, as well as evoked P300 response. Motivation plays a different role in the high and low vigilance states, with improvement of performance only in the low vigilance state. This appears inconsistent with the active fatigue framework (which states that the vigilance decrement is the result of the depletion of cognitive resources, and motivation cannot improve the performance impaired by vigilance decrement, Helton and Warm, 2008) because the impairment in the state of low vigilance is improved after motivation manipulation. On the other hand, our results seem to agree with the motivation control framework—that vigilance decrement is a subconscious balancing between the costs and benefits to expend or conserve energy (Kurzban et al., 2013a). When the cost of efforts to carry out a task outweighs the benefits, humans are unwilling to do so, leading to vigilance decrement. However, not all of the neural measures are improved after providing reward. The P300 latency in the low vigilance state was not modulated by reward. Furthermore, the spectral power in the delta but not in the theta band was modulated by motivation manipulation, which means that motivation partially alleviates neural activity in the low vigilance state. In general, our results imply that motivation is not enough to completely restore the impairment induced by vigilance decrement and provide support for the mental fatigue framework, which integrates the evaluation of expected rewards and energetic costs (Boksem and Tops, 2008).

Further studies are needed to establish a more comprehensive picture of the underlying neural processes affected by motivation and vigilance states. We analyzed only the changes in high vs. low vigilance states; focusing on the ongoing changes while performing the task could significantly advance the understanding of the dynamic emergence of mental fatigue. Our study did not consider the effects of monetary values during a long period of attention task engagement. It is also not possible to completely disentangle the dimensions of vigilance and motivation, as it is likely that a decrease in vigilance is also accompanied by decreased motivation to perform a task. Furthermore, providing rewards is not the only method to motivate individuals. Future studies should elaborate on the particular differences in sensitivity to reward (positive) and punishment (negative).

# CONCLUSION

Both the behavioral and electrophysiological measures were modulated by vigilance decrement. The neurocognitive processes were only partially recovered by manipulating rewards. In particular, increasing motivation using rewards differentially influenced brain activations in the high vs. low vigilance states, with more evident improvement in the low than in the high vigilance state. The fatigue-related increase in latency of P300 responses did not recover with rewards, whereas the P300 amplitude increased to the same level as in the high vigilance state. The spectral power of the delta band was specifically increased by motivation, whereas the decrease of the theta band was not recovered by reward. These findings provide evidence for the dissociable effects of motivation in the states of low and high vigilance and might validate the mental fatigue framework integrating the evaluation of expected rewards and energetic costs.

# DATA AVAILABILITY STATEMENT

The datasets generated and analyzed during the present study are available from the corresponding author on reasonable request.

# ETHICS STATEMENT

The studies involving human participants were reviewed and approved by Liaoning Normal University. The participants provided their written informed consent to participate in this study.

# AUTHOR CONTRIBUTIONS

JL, HS, and TP designed the experiment. JL, CZ, and YZ analyzed the data. JL and YL collected the data. JL and TP conducted the statistical analyses. JL, FC, TP, and TR wrote the manuscript. FC, TR, and TP provided funding and guidance throughout the work.

#### FUNDING

This work was supported by the National Science Foundation of China (No. 91748105, No. 61703069), the Fundamental Research Funds for the Central Universities in Dalian University of Technology in China (DUT2019), the Academy of Finland grant (No. 295076), and the scholarships from China Scholarship Council (No. 201600090044; No. 201600090042). Open access funding was provided by the University of Jyväskylä (JYU).

#### ACKNOWLEDGMENTS

We thank Dr. Tengfei Liang for the help with E-prime programming and the editor and reviewers for their constructive comments to improve the manuscript.

#### REFERENCES


### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnhum.2020.00207/full#supplementary-material.

FIGURE S1 | Scatter diagrams showing the relationships between the behavioral measures (accuracy, reaction time, and number of omitted responses) and the P300 amplitude and latency in the four conditions. NRHV is block 1 in the no-reward high vigilance state, RHV is block 2 in the reward high vigilance state, NRLV is block 5 in the no-reward low vigilance state, and RLV is block 6 in the reward low vigilance state.

FIGURE S2 | Comparison of different ways of calculating time-frequency representation changes for the current data. (A) Calculation of the power with the continuous wavelet transform (CWT) in each trial, followed by averaging across trials (presented in the present study). (B) Calculation of the power with the CWT from epochs from which the contribution of the averaged evoked response is removed (the averaged ERP is subtracted from each trial). (C) Calculation of the power with the CWT from the averaged evoked response (spectral power of the averaged ERP).


**Conflict of Interest**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Liu, Zhang, Zhu, Liu, Sun, Ristaniemi, Cong and Parviainen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Is the Deliberate Practice View Defensible? A Review of Evidence and Discussion of Issues

David Z. Hambrick<sup>1</sup> \*, Brooke N. Macnamara<sup>2</sup> and Frederick L. Oswald<sup>3</sup>

<sup>1</sup> Department of Psychology, Michigan State University, East Lansing, MI, United States, <sup>2</sup> Department of Psychological Sciences, Case Western Reserve University, Cleveland, OH, United States, <sup>3</sup> Department of Psychological Sciences, Rice University, Houston, TX, United States

#### Edited by:

Benjamin Cowley, University of Helsinki, Finland

#### Reviewed by:

Jonathan Wai, University of Arkansas, United States; Matt Sibbald, McMaster University, Canada; Roger Lister Kneebone, Imperial College London, United Kingdom

#### \*Correspondence:

David Z. Hambrick hambric3@msu.edu; hambric3@gmail.com

#### Specialty section:

This article was submitted to Performance Science, a section of the journal Frontiers in Psychology

Received: 07 December 2019; Accepted: 04 May 2020; Published: 18 August 2020

#### Citation:

Hambrick DZ, Macnamara BN and Oswald FL (2020) Is the Deliberate Practice View Defensible? A Review of Evidence and Discussion of Issues. Front. Psychol. 11:1134. doi: 10.3389/fpsyg.2020.01134

The question of what explains individual differences in expertise within complex domains such as music, games, sports, science, and medicine is currently a major topic of interest in a diverse range of fields, including psychology, education, and sports science, to name just a few. Ericsson and colleagues' deliberate practice view is a highly influential perspective in the literature on expertise and expert performance, but is it viable as a testable scientific theory? Here, reviewing more than 25 years of Ericsson and colleagues' writings, we document critical inconsistencies in the definition of deliberate practice, along with apparent shifts in the standard for evidence concerning deliberate practice. We also consider the impact of these issues on progress in the field of expertise, focusing on the empirical testability and falsifiability of the deliberate practice view. We then discuss a multifactorial perspective on expertise, and how open science practices can accelerate progress in research guided by this perspective.

Keywords: deliberate practice, expertise, talent, skill, individual differences

# IS THE DELIBERATE PRACTICE VIEW DEFENSIBLE? A REVIEW OF EVIDENCE AND DISCUSSION OF ISSUES

Not infrequently, a single theoretical perspective becomes extremely influential in an area of scientific inquiry, shaping the trajectory of research in the field for years or even decades. More than 25 years ago, K. Anders Ericsson and colleagues proposed what has arguably become the most influential theoretical perspective in the scientific literature on expertise and expert performance. In a pivotal Psychological Review article, Ericsson et al. (1993) theorized that expert performance reflects a long period of deliberate practice, which they stated "includes activities that have been specially designed to improve the current level of performance" (p. 368). In studies of violinists (Study 1) and pianists (Study 2), Ericsson et al. (1993) operationally defined deliberate practice as "practice alone" with the goal of improving performance. The most accomplished musicians reported having accumulated an average of around 10,000 h of practice alone by early adulthood, which was thousands of hours more than the averages for this measure of practice for less accomplished groups.

Applying their framework to several domains of expertise, Ericsson et al. (1993) concluded that "individual differences in ultimate performance can largely be accounted for by differential amounts of past and current levels of practice" (p. 392). They further explained:

[H]igh levels of deliberate practice are necessary to attain expert level performance. Our theoretical framework can also provide a sufficient account of the major facts about the nature and scarcity of exceptional performance. Our account does not depend on scarcity of innate ability (talent) (Ericsson et al., 1993, p. 392).

Reiterating this perspective, Ericsson et al. (2007) stated that "[i]t is possible to account for the development of elite performance among healthy children without recourse to unique talent (genetic endowment)—excepting the innate determinants of body size" (p. 4). And writing in the Harvard Business Review, Ericsson et al. (2007) explained, "Our research shows that even the most gifted performers need a minimum of 10 years (or 10,000 h) of intense training before they win international competitions" (p. 119).

Ericsson and colleagues' perspective, which we refer to as the deliberate practice view, has had a monumental impact on expertise research. As of this publication, the Ericsson et al. (1993) article has been cited nearly 11,000 times in a wide range of literatures, and there have been nearly 200 theses and dissertations on deliberate practice in universities around the world. As portrayed in popular press books such as Malcolm Gladwell's (2008) bestseller Outliers: The Story of Success and Daniel Coyle's (2009) The Talent Code, Ericsson and colleagues' research has also had a profound influence on the public's thinking about the origins of expertise. Taking his inspiration from Ericsson et al.'s (1993) findings, Gladwell wrote that "10,000 h is the magic number of true expertise" (p. 11). In his own popular press book, Peak: Secrets from the New Science of Expertise, Ericsson wrote, "Deliberate practice can open the door to a world of possibilities that you may have been convinced were out of reach. Open that door" (Ericsson and Pool, 2016, p. 179).

We credit and commend Ericsson and colleagues for their highly influential work. However, here we will discuss what we believe are serious concerns with whether the deliberate practice view is viable as a scientific theory—that is, whether it is empirically testable and falsifiable. [For a similar type of review, see Gottfredson's (2003) critique of Sternberg's practical intelligence theory; Sternberg et al. (1995)]. Before doing so, however, we note two uncontroversial claims about expertise, by which we simply mean a person's measurable (i.e., quantifiable) level of performance in a domain. First, as Ericsson and colleagues have emphasized (e.g., Ericsson, 2006), expertise is acquired gradually. In other words, people are not literally born as experts, innately endowed with the type of specialized knowledge that underpins high-level skill in activities like hitting a golf ball, solving math equations, playing an instrument, or choosing a move in a chess game. Domain-specific knowledge and skill can only be acquired gradually over time through some form of training.

The second uncontroversial claim is that training can lead to large, even massive, improvements in people's level of expertise (i.e., domain-relevant performance). This point was amply illustrated by some of Ericsson and colleagues' earliest research. For example, in a classic study, Ericsson et al. (1980) showed that after more than 200 h of training, a college student improved his performance in a random digit memorization task from a typical 7 digits to 79 digits (the world record is currently an astounding 547 digits)<sup>1</sup>. In short, notwithstanding the issues raised in this article, Ericsson and colleagues' deliberate practice view has important value in society, serving as a useful reminder to the layperson that training of some form is necessary to achieve a high level of performance in a domain.

The controversial question in research on expertise is not whether some form of training is necessary to explain intraindividual (i.e., within-person) increases in expertise (it must be), or whether these increases can be massive (they can be). Rather, the controversial question is the extent to which interindividual (i.e., between-person) differences in accumulated amount of training explain interindividual differences in expertise (for a discussion of the distinctions between interindividual and intraindividual variability, see Molenaar et al., 2003). In statistical terms, what is the direction (and the magnitude) of the correlation between expertise and accumulated amount of training? Somewhat counterintuitively, as **Figure 1** illustrates, the necessity of training to explain intraindividual increases in expertise has no direct implication for the answer to this question. That is, taking as a given that the relationship between training and expertise is positive within individuals, the correlation between training and expertise between individuals could be positive (top panel), indicating higher levels of performance for individuals who have engaged in more training; negative (middle panel), indicating lower levels of performance for individuals who have engaged in more training; or zero (bottom panel).
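The logic of this argument can be made concrete with a small simulation. The following Python sketch is purely illustrative and is not drawn from any study cited here: the function name `between_person_correlation`, the parameter `between_slope`, the range of training hours, and the noise level are hypothetical choices. Every simulated person gains skill with each hour of training (a positive within-person relationship), yet the correlation across people can be positive, negative, or near zero, depending on how baseline levels relate to the amount of training people accumulate.

```python
import numpy as np


def between_person_correlation(between_slope, n_people=200, seed=0):
    """Correlation across people between accumulated training and final skill.

    Within each person, skill rises by `within_gain` per hour of training.
    Across people, baselines are set so that final skill changes by
    `between_slope` per hour of accumulated training (hypothetical setup).
    """
    rng = np.random.default_rng(seed)
    hours = rng.uniform(1_000, 10_000, n_people)   # accumulated training per person
    within_gain = 0.01                             # skill gained per hour, identical for everyone
    # Baseline skill: people who train more may start higher or lower than others
    baseline = (between_slope - within_gain) * hours + rng.normal(0.0, 5.0, n_people)
    final_skill = baseline + within_gain * hours   # each person improves with training
    return np.corrcoef(hours, final_skill)[0, 1]


for label, slope in [("positive", 0.01), ("negative", -0.01), ("zero", 0.0)]:
    r = between_person_correlation(slope)
    print(f"{label:>8} between-person case: r = {r:+.2f}")
```

In the "negative" case, for example, people with lower baselines train more but still end up below those with higher baselines, even though training improved everyone's performance.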

What, then, is the correlation between deliberate practice and expertise across individuals? The first step in attempting to answer this question is to operationalize deliberate practice, that is, to develop measures of deliberate practice based on the definition of the construct. Unfortunately, as we document in this article, there remains a great deal of confusion about the definition of deliberate practice, despite more than 25 years of research on the topic. As Ericsson and colleagues themselves recently noted: "It has been common for scientists to be confused about the definition of DP (deliberate practice)" (Dearani et al., 2017, p. 1333). Here, we discuss possible sources of this continued confusion, the confusion around the measurement of deliberate practice that has resulted (e.g., inclusion criteria for meta-analyses), and the impact of this confusion on the science of expertise. We conclude this article with thoughts on how to advance the scientific study of expertise and expert performance. For a companion presentation to this article, visit https://osf.io/buqsk/.

### WHAT IS DELIBERATE PRACTICE?

It is undoubtedly the case that different types of domain-relevant activities vary in their importance for developing expertise. For example, training under a qualified golf instructor is almost certainly more beneficial for improving golf skill than mindlessly hitting practice balls at a driving range. In their article on deliberate practice, Ericsson et al. (1993) distinguished among three forms of domain-specific experience. They described work as engagement in activities for external rewards (e.g., music performances, sports competitions), play as participating in activities for pleasure (e.g., playing a sport with friends for recreation), and deliberate practice as a "highly structured activity, the explicit goal of which is to improve performance" (Ericsson et al., 1993, p. 368).

<sup>1</sup>http://www.world-memory-statistics.co.uk/disciplines.php

However, Ericsson and colleagues have been inconsistent on critical elements of the definition of deliberate practice, and consequently it has been unclear what activities do and do not qualify as deliberate practice. For example, Ericsson et al. (1993) stated that "the teacher designs practice activities that the individual can engage in between meetings with the teacher" (p. 368). A few years later, however, Ericsson (1998) stated that "Ericsson et al. (1993) proposed the term deliberate practice to refer to those training activities that were designed solely for the purpose of improving individuals' performance by a teacher or the performers themselves" (p. 84, emphasis added). This latter statement indicated that deliberate practice, as Ericsson et al. (1993) originally defined the term, encompasses a broader range of activities than just teacher-designed practice. Yet, as shown in **Figure 2**, in subsequent articles, Ericsson and colleagues were inconsistent on this critical point, sometimes indicating that deliberate practice must be designed by a teacher (e.g., Ericsson, 2015), but other times stating that it can be designed by teachers or the "performers themselves" (e.g., Keith and Ericsson, 2007). If deliberate practice must be designed by a teacher, then presumably it cannot also be designed by performers themselves.

As another example of definitional confusion, it is unclear from Ericsson and colleagues' writings whether deliberate practice must be a solitary activity, or whether it can also be a group/team activity. Citing research on team sports (Helsen et al., 1998), Ericsson (2006) observed that "the amount of time spent in team-related deliberate practice activities correlates reliably with skill level in team sports" (p. 695). It seems clear from this observation that there can be team deliberate practice. Recently, however, Ericsson and Harwell (2019) indicated that this is not the case, commenting that "it is important to point out that organized team training may be quite effective in improving performance, but it does not meet all the criteria for deliberate practice" (p. 6). This is another apparent shift in the definition of and criteria for deliberate practice, creating still more confusion about what the "correct" definition is. In short, it is unclear what activities do and do not qualify as deliberate practice.

This confusion surrounding the definition of deliberate practice is not a minor matter: it directly impacts how deliberate practice is measured in empirical studies and what evidence (i.e., effect sizes) should be included in a meta-analysis. These decisions, in turn, directly impact the evaluation of the deliberate practice view: whether evidence is concluded to support the view or not. Certainly, definitions of theoretical constructs can and do evolve over time as science progresses, but the shifts in the definition of deliberate practice reflected in **Figure 2** do not appear to reflect this sort of progression. There are meaningful changes even over short spans of time (e.g., compare Keith and Ericsson's, 2007, description of deliberate practice with Ericsson's, 2006). Confusion around the definition of deliberate practice persists in the expertise literature, even as researchers are attempting to investigate the deliberate practice view.

# Challenges to the Deliberate Practice View

Notwithstanding this confusion over the definition of deliberate practice, there have been numerous attempts to test the deliberate practice view. One of the first noteworthy tests came from a study of chess expertise by Gobet and Campitelli (2007), who administered a questionnaire to 90 members of a Buenos Aires chess club to assess lifetime engagement in deliberate practice and tournament chess rating. The self-reported amount of deliberate practice (hours of studying alone plus hours of group practice) correlated positively and moderately with chess rating (r = 0.42). This is a sizeable correlation by psychological standards, a "medium" effect size in Cohen's (1992) widely used classification scheme (i.e., r = 0.10, small; r = 0.30, medium; r = 0.50, large). However, this finding also challenges the deliberate practice view because it means that deliberate practice left a large amount of the total variance in chess ratings unexplained. To be exact, a correlation of r = 0.42 between two variables indicates that one variable explains 18% of the variance in the other variable (i.e., r<sup>2</sup> = 0.42<sup>2</sup> × 100 ≈ 18%). In the present case, it must be assumed that some of this unexplained variance reflects random measurement error, because neither a measure of deliberate practice nor a measure of performance can be assumed to be perfectly reliable (we discuss this issue further below). However, the correlation was not large enough to suggest that, even after taking this psychometric artifact into account, participants at similar levels of chess skill would have reported similar amounts of deliberate practice. Instead, it suggests that the chess players varied substantially in the amount of deliberate practice they required to reach a given level of skill.
Indeed, according to the data, they did: As Gobet and Campitelli (2007) described in their article, the self-reported estimate of number of hours of deliberate practice required to reach "master" status in their sample ranged from 3,016 to 23,608 h—a difference of nearly a factor of 8. The implication is that although deliberate practice clearly contributes to individual differences in chess expertise, other factors must contribute as well, as we discuss further in the final section of this article.
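The arithmetic behind these figures can be checked with a few lines of Python. This is a minimal sketch: the proportion of variance explained is taken as the squared correlation, and the helper names `variance_explained_pct` and `cohen_label` are introduced here for illustration only.

```python
def variance_explained_pct(r):
    """Percentage of variance in one variable accounted for by the other (100 * r squared)."""
    return 100.0 * r ** 2


def cohen_label(r):
    """Cohen's (1992) benchmarks as quoted in the text: 0.10 small, 0.30 medium, 0.50 large."""
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "below small"


r = 0.42                                    # practice-rating correlation reported in the text
explained = variance_explained_pct(r)       # about 17.6, reported as 18%
unexplained = 100.0 - explained             # about 82% of rating variance left unexplained
hours_ratio = 23_608 / 3_016                # range of hours to reach "master": about 7.8-fold

print(f"r = {r} is a {cohen_label(r)} effect")
print(f"variance explained: {explained:.1f}%, unexplained: {unexplained:.1f}%")
print(f"max/min hours to master status: {hours_ratio:.2f}")
```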

In our own first effort to test the deliberate practice view (Hambrick et al., 2014b), we reanalyzed results from expertise studies in the domains of chess and music. Our specific goal was to test Ericsson et al.'s (1993) aforementioned claim that "individual differences in ultimate performance can largely be accounted for by differential amounts of past and current levels
of practice" (p. 392, emphasis added). We identified six studies of chess and eight studies of music that reported a correlation between a measure of deliberate practice and performance. The average correlation was r = 0.49 for chess and r = 0.43 for music before applying the standard psychometric correction for measurement error variance (unreliability) of the constituent measures. For deliberate practice, we assumed a reliability coefficient of 0.80 based on information we could find about the reliability of this variable, as well as on Tuffiash et al.'s (2007) statement that "self-report practice estimates repeatedly from experts in sports and music have reported test-retest reliabilities at or above 0.80" (p. 129) and Ericsson's (2013) statement that "[t]he collected reliability of cumulated life-time practice at different test occasions in large samples has typically been found to range between 0.7 and 0.8" (p. 534). For music performance, we used reliability estimates from the studies, or if not reported, from studies that collected similar performance measures. The average amount of reliable variance in expertise explained by deliberate practice was 34% for chess and 29.9% for music. This is a substantial amount of variance, but it is not enough to support the claim that deliberate practice largely accounts for individual differences in expertise. This claim implies that deliberate practice should at least explain most of the variance in expertise, and evidence suggests it does not.

Subsequently, we set out to test the importance of deliberate practice as a predictor of individual differences in expertise, by way of a formal and comprehensive meta-analysis (Macnamara et al., 2014; see also Macnamara et al.'s, 2018, corrigendum for the article), ultimately reviewing over 11,000 articles. Ericsson et al. (1993) explained that deliberate practice "includes activities that have been specially designed to improve the current level of performance" (p. 368). Accordingly, we defined deliberate practice as structured activities designed to improve performance in a domain; and given Ericsson and colleagues' inconsistency on whether a teacher is required to design deliberate practice, we decided to include both teacher- and performer-designed activities. Identifying 88 studies, we found that deliberate practice explained 14% of the variance in performance overall, and 24% for games, 23% for music, 20% for sports, 5% for education, and 1% for professions. We also determined that deliberate practice left more of the variance in performance unexplained than it explained, across a range of possible values for measurement reliability. We concluded that the "amount of deliberate practice—although unquestionably important as a predictor of individual differences in performance from both a statistical and a practical perspective—is not as important as Ericsson and his colleagues have argued" (Macnamara et al., 2014, p. 1615).

In a later meta-analysis that focused on sports (Macnamara et al., 2016b), we found that the contribution of deliberate practice to sports performance varied by skill level: Among elite athletes (e.g., national-level and above), deliberate practice explained only 1% of the performance variance. Although it must be assumed that range restriction (another psychometric issue) would limit the deliberate practice-performance correlation when considering only elite performers, it is critical to note that it was Ericsson et al. (1993) themselves who stated that deliberate practice is still an important predictor of performance differences at the elite level. In their own words, "Individual differences, even among elite performers, are closely related to assessed amounts of deliberate practice" (Ericsson et al., 1993, p. 363, emphasis added). This finding from the Macnamara et al. (2016b) meta-analysis on sports is inconsistent with this claim. Furthermore, we found the relationship between deliberate practice and performance to be very similar whether the practice activities were solitary or in a group (see Macnamara et al., 2016b).

In a commentary on our meta-analysis (Macnamara et al., 2014), Ericsson (2014a) rejected 87 of the 88 studies that we included in our meta-analysis, claiming that we included studies that "violated [their] criteria for deliberate practice" (p. 2). However, in doing so, Ericsson (2014a) rejected numerous studies that he himself had previously used to explicitly argue for the importance of deliberate practice (see quotations from Ericsson's writings in **Table 1**). Thus, by any reasonable account, the standard for evidence concerning deliberate practice had shifted dramatically. Most perplexingly, in applying this new standard for evidence, Ericsson rejected several of his own studies of deliberate practice (e.g., Duffy et al., 2004; Tuffiash et al., 2007; Duckworth et al., 2011), seeming to undermine the case he had for decades been attempting to make for the importance of deliberate practice. Ericsson did not acknowledge that he had once used these studies he was now rejecting to argue for the importance of deliberate practice. This evaluation of evidence challenging the deliberate practice view seems indefensible.

Ericsson's (2014a) rejection of his own study of darts (Duffy et al., 2004) and his rejection of Charness and colleagues' studies of chess (Charness et al., 2005) were especially noteworthy (see the **Appendix** for Ericsson's varying characterizations of the Charness studies). Ericsson's (2014a) stated reason for rejecting these studies was that they provided "no record of a teacher/coach supervising all or most of practice" (see Ericsson, 2014b, **Table 2**). However, in a chapter published in the very same year, Ericsson (2014c) used both these studies to argue for the importance of deliberate practice, stating:

[I]n a study of district, national, and professional dart players Duffy, Baluch, and Ericsson (2004) found that solitary deliberate practice was closely related to performance, whereas the amount of social dart activities did not predict performance (Ericsson, 2014c, p. 191).

In chess, Charness and his colleagues (Charness, Krampe, and Mayr, 1996; Charness, Tuffiash, Krampe, Reingold, and Vasyukova, 2005) have found that the amount of solitary chess study was the best predictor of performance at chess tournaments, and when this type of deliberate practice was statistically controlled, there was no reliable benefit from playing chess games (Ericsson, 2014c, p. 191)<sup>2</sup>.

<sup>2</sup>There were two studies in this project. Charness et al. (1996) is a chapter and provides an initial report of data from Study 1 (N = 136 of an eventual N = 239); Charness et al. (2005) is a journal article and reports the full results (Study 1 N = 239; Study 2 N = 169).

TABLE 1 | Examples of studies that Ericsson rejected for violating his criteria for deliberate practice but previously used to argue for the importance of deliberate practice.

In each quotation, the boldface emphasis on "deliberate practice" is added. <sup>1</sup>Rejected because article "do[es] not record assigned individualized practice tasks with immediate feedback and goals for practice" (see Ericsson, 2014b; Table 3). <sup>2</sup>Rejected because article "do[es] not record a teacher or coach supervising and guiding all or most of the practice" (see Ericsson, 2014b; Table 2). Table from Hambrick et al. (2018b); used with permission from John Wiley and Sons.

Another inconsistency was Ericsson's (2014a) rejection of his own study of spelling bee contestants (Duckworth et al., 2011) for
violating this same teacher/coach criterion. Just 2 years earlier, in criticizing a journalist for his description of the study, Ericsson (2012) emphatically stated that the study had collected data on deliberate practice:

In that study we (as I was also one of the co-authors) collected data on 'deliberate practice.' We found that 'Grittier spellers engaged in deliberate practice more so than their less gritty counterparts, and hours of deliberate practice fully mediated the prospective association between grit and spelling performance' (p. 6).

In a commentary on a subsequent meta-analysis of deliberate practice in sports performance, Ericsson (2016) again insisted that our broad definition of deliberate practice was incorrect (for a reply, see Macnamara et al., 2016a). Yet he did not resolve or acknowledge the material inconsistencies in his past descriptions of deliberate practice, especially those concerning the important question of who designs deliberate practice activities (see **Figure 2**). Furthermore, Ericsson again criticized our inclusion of studies that he had previously used to argue for the importance of deliberate practice (e.g., Helsen et al., 1998; Hodges and Starkes, 1996). It is difficult, if not impossible, for scientists to test a theory if the definition of and standard for evidence are changed repeatedly, with no acknowledgment of and no explanation for the changes (Ferguson and Heene, 2013).

#### New Types of Practice

TABLE 2 | Ericsson's evaluations of the 14 studies and associated effect sizes included in Ericsson and Harwell's (2019) meta-analysis, from prior to 2014 to the present.

DP, deliberate practice; PP, purposeful practice. Each study/effect size in red was rejected by Ericsson (2014a) for violating one of his criteria at that time for deliberate practice (see Supplementary Tables in Ericsson, 2014b): <sup>1</sup> "Restriction of range in attained performance and accumulated deliberate practice" (Ericsson, 2014c; Table 5). <sup>2</sup> "Articles do not record assigned individualized practice tasks with immediate feedback and goals for practice" (Ericsson, 2014c; Table 3). <sup>3</sup> "Articles do not record a teacher or coach supervising and guiding all or most of the practice" (Ericsson, 2014b; Table 2). \*See Moxley et al. (2019), for reinterpretation of studies as focusing on purposeful practice rather than deliberate practice. \*\*Ericsson and colleagues (see Ericsson and Towne, 2013) cite another report of this chess study by de Bruin and colleagues to argue for the importance of deliberate practice (De Bruin et al., 2008), but that article is based on the same data as reported in the De Bruin et al. (2007) article. \*\*\*As cited in Deakin and Cobley (2003).

Around this same time, in their aforementioned popular press book Peak: Secrets from the New Science of Expertise, Ericsson and Pool (2016) proposed a distinction between deliberate practice
and two new forms of practice, perhaps in an attempt to address discrepancies and confusion surrounding the definition of deliberate practice that were being documented in the scientific literature (e.g., Hambrick et al., 2014a). They introduced and defined naïve practice as "essentially just doing something repeatedly, and expecting that the repetition alone will improve one's performance" (p. 14), and purposeful practice as an activity that has well-defined, specific goals and involves feedback, but which is self-directed rather than teacher-directed. Ericsson and Pool (2016) explained that "deliberate practice requires a teacher who can provide activities designed to help a student improve his or her performance....With this definition we are drawing a clear distinction between purposeful practice—in which a person tries very hard to push himself or herself to improve and practice that is both purposeful and informed" (p. 98, emphasis added). They further explained that "some approaches to training are more effective than others" (p. 85) and that deliberate practice is "the most effective method of all. . ..the gold standard, the ideal to which anyone learning a skill should aspire" (Ericsson and Pool, 2016, p. 85).

Using this new framework, Ericsson and colleagues reinterpreted studies they had once used to argue for the importance of "deliberate practice" as studies of the less effective "purposeful practice," but without explicitly acknowledging and justifying the reinterpretation (see Macnamara and Hambrick, 2020). As an example, Ericsson (2005) described Charness et al.'s (2005) chess study (entitled The Role of Deliberate Practice in Chess Expertise) as providing "the most compelling and detailed evidence for how designed training (deliberate practice) is the crucial factor in developing expert chess performance" (p. 237). Nevertheless, in their recent article, Moxley et al. (2019) explained that "Charness et al. (2005) found evidence for an independent effect of engagement in purposeful practice for chess skill" (p. 1163, emphasis added). As another example, Duckworth, Ericsson, and colleagues' spelling bee study (Duckworth et al., 2011) focused on deliberate practice: The article reporting the study was titled Deliberate Practice Spells Success: Why Grittier Competitors Triumph at the National Spelling Bee and the major conclusion of the study was that "[d]eliberate practice mediated the prediction of final performance by the personality trait of grit" (p. 174). Yet the recent Moxley et al. (2019) article stated that "[a]fter the questionnaire, we asked participants to fill out several additional personality measures that Duckworth et al. (2011) had found to be related to purposeful practice in preparation for competitions in spelling" (p. 1158, emphasis added).

In another instance, Ericsson and colleagues went from arguing that activities exist that meet the criteria for deliberate practice in the boardgame SCRABBLE, to arguing that it is not possible to engage in deliberate practice in SCRABBLE. Specifically, referring to Tuffiash et al.'s (2007) SCRABBLE study, Ericsson et al. (2009) stated that "[s]everal researchers have reported a consistent association between the amount and quality of solitary activities meeting the criteria of deliberate practice and
performance in different domains of expertise, such as. . .Scrabble (Tuffiash et al., 2007)" (p. 9). However, Moxley et al. (2019) wrote that because SCRABBLE lacks professional coaches "SCRABBLE players cannot engage in deliberate practice, but only purposeful practice and other types of practice" (p. 1150). Under this new framework, activities that once qualified as deliberate practice are now classified as less effective purposeful practice. Of course, it is appropriate for a theorist to reinterpret past evidence as a theory is refined and revised over time. But it is a serious problem, as in this case, when the reinterpretations of evidence are not explicitly acknowledged, explained, and justified. In the absence of such transparency, the reader will be led to think that the empirical support for the original theory was stronger than it is or ever was.

#### Recent Developments

Two reanalyses of our first meta-analysis (Macnamara et al., 2014) have been published in the past two years. The first was by Miller et al. (2018), who argued that some of the studies that we included in our meta-analysis did not capture deliberate practice. Reanalyzing our data, Miller et al. (2018) explained that they had raters code each study "using only the methods section" (p. 6) and explained that "a study was coded as DP (deliberate practice) if and only if it explicitly indicated it estimated the effects of deliberate practice" (p. 6). Their meta-analysis revealed an average correlation of r = 0.40 for "deliberate practice" (compared to our value of r = 0.38), and an average correlation of r = 0.21 for what they deemed "non-deliberate practice." However, as we pointed out in a reply (Hambrick and Macnamara, 2019a; see also Hambrick and Macnamara, in press), numerous studies Miller et al. (2018) coded as deliberate practice did not meet their own inclusion criteria. For example, Miller et al. (2018) coded Bilalić et al.'s (2007) chess study as a deliberate practice study. Yet Bilalić et al. (2007) made no mention of "deliberate practice" anywhere in their methods, and elsewhere in their article Bilalić et al. (see pp. 467–468) explicitly stated that they did not interpret their practice measures as deliberate practice. This study and numerous others fulfilled the broad definition of deliberate practice we used in our own meta-analysis, but they did not meet Miller et al.'s (2018) narrower definition. There are two possibilities here: Miller et al. (2018) made errors in coding their studies, or they used coding criteria different from those stated in their article.

In a recent reply, while not addressing the coding issue, Miller et al. (2019) highlighted that the average deliberate practice-performance correlations in their reanalysis and in our meta-analysis were very similar, which is an accurate observation (r = 0.40 vs. r = 0.38). Then they noted, "Still, something about our analyses [Miller et al.'s 2018, reanalysis] was displeasing to [Hambrick and Macnamara, 2019a]" (Miller et al., 2019, p. 289). We were actually clear about what the problem was: Miller et al.'s (2018) meta-analysis included studies that clearly did not meet their own stated inclusion criterion, rendering their results uninterpretable. Miller et al. (2019) added, "The central question of both studies was the role played by deliberate practice in the acquisition of expertize [sic]. One might think there would be a 'meeting of the minds' when the estimates from their analysis and ours returned such similar results" (p. 289). Miller et al. (2019) seem to suggest here that because their reanalysis yielded an average deliberate practice-performance correlation very similar to the corresponding correlation in our meta-analysis, we should have found their results acceptable. However, findings from scientific research should be evaluated based on whether they are accurate and interpretable, not on whether they agree with findings from one's own research. On that note, we reiterate that until Miller and colleagues can clarify their methods, the results of their reanalysis will only add to the confusion surrounding deliberate practice. As it stands, the results of their reanalysis remain uninterpretable.

The second reanalysis of our dataset was by Ericsson and Harwell (2019). Again, without resolving past inconsistencies in descriptions of deliberate practice in Ericsson and colleagues' writings, Ericsson and Harwell (2019) criticized our use of a general definition of deliberate practice, stating:

There is no disagreement that the goal of improving performance is one characteristic of deliberate practice, and Ericsson et al. (1993) even wrote that "deliberate practice is a highly structured activity, the explicit goal of which is to improve performance" (p. 368). This sentence was, however, not a definition of deliberate practice any more than the true statement that "a dog is an animal" would imply the inference that "all animals are dogs." (p. 5).

According to this statement, Ericsson et al. (1993) never proposed the general definition of deliberate practice that we used for our meta-analysis. Yet, Lehmann and Ericsson (1997) stated that "Ericsson et al. (1993) have defined deliberate practice as a structured activity designed to improve performance" (p. 47, emphasis added). So, to be clear, Lehmann and Ericsson (1997) stated that Ericsson et al. (1993) defined deliberate practice as a structured activity designed to improve performance, but then Ericsson and Harwell (2019) stated that Ericsson et al. (1993) did not define deliberate practice as such. Contradictory statements such as these lead to confusion about what the definition of deliberate practice is, and even whether there is a fixed definition at all.

<sup>3</sup>Ericsson and Harwell (2019) state they coded eight effect sizes as deliberate practice and six effect sizes as purposeful practice. They list the deliberate practice effect sizes as being from: Baker et al. (2003), Charness et al. (2005), both effect sizes; Ericsson et al. (1993), both effect sizes; Schultetus and Charness (1997), Gobet and Campitelli (2007), and Ruthsatz et al. (2008). And they list the purposeful practice effect sizes as being from: Duffy et al. (2004), Tuffiash et al. (2007), Harris (2008), Duckworth et al. (2011), and Maynard et al. (2014). We assume that De Bruin et al. (2007) was the sixth purposeful practice study and that its omission from the latter list is a clerical error by Ericsson and Harwell (2019).

Compounding the issues already discussed, Ericsson and colleagues' standard for evidence concerning deliberate practice has apparently shifted yet again. For their main analysis, Ericsson and Harwell (2019) retained 14 of the 88 studies included in our meta-analysis (Macnamara et al., 2014), coding from these studies eight effect sizes as deliberate practice and six effect sizes as purposeful practice<sup>3</sup>. However, as shown in **Table 2**, seven of the eight effect sizes they coded as deliberate practice were from studies previously rejected by Ericsson (2014a) in his commentary on Macnamara et al.'s meta-analysis for not meeting the criteria for deliberate practice (recall that he rejected 87 of the 88 studies in that commentary). The major reason for this
discrepancy is that Ericsson and Harwell (2019) used a more lenient teacher/coach criterion for deliberate practice in their own meta-analysis than Ericsson (2014a) used when evaluating our meta-analysis. More specifically, whereas Ericsson (2014a) required that a study "record a teacher or coach supervising and guiding all or most of the practice" (see Ericsson, 2014b, **Table 2**), Ericsson and Harwell (2019) required only that a study describe individualized sessions with a coach or teacher, with no specification of the lower limit on the amount of supervision and guidance. As they explained:

For coding purposeful versus deliberate practice, we looked for explicit mentions within the original study methods of individualized sessions with coaches or teachers as being included as part of the estimate of solitary practice. If a study did describe individualized instruction sessions as being part of the practice estimate it was considered to be deliberate practice (Ericsson and Harwell, 2019, Supplementary Material, p. 5).

Ericsson and Harwell (2019) do not explain why they made this shift from the stricter criterion that Ericsson (2014a) used when evaluating our meta-analysis to a more lenient teacher/coach criterion for themselves when they reanalyzed our meta-analytic data. Interestingly, they do not even cite Ericsson's (2014a) commentary, although Ericsson (2018) very recently used it as a basis for rejecting the conclusions of the Macnamara et al. (2014) meta-analysis (see also Ericsson and Pool, 2016).

Additional examples of apparent shifts in the standard for evidence concerning deliberate practice can be found in Ericsson and Harwell (2019). For example, in his earlier commentary on Macnamara et al. (2014), Ericsson (2014a) rejected Ruthsatz et al.'s (2008) studies of musicians as a valid study of deliberate practice because there was a restriction of range in the measures (see Ericsson, 2014b, Table 5). However, Ericsson and Harwell (2019) apparently no longer saw this as a problem, because they coded one of these previously rejected studies (Study 2) as deliberate practice and included it in their own meta-analysis.

Setting aside these issues, what would the results of Ericsson and Harwell's (2019) meta-analysis indicate if they were taken at face value? After their correction for measurement error variance (unreliability), the average correlation for the 14 studies that measured either purposeful practice or deliberate practice with performance increased from r = 0.54 to r = 0.78, indicating that purposeful/deliberate practice<sup>4</sup> explained 61% of the reliable variance in the performance. Although psychometric corrections come with larger standard errors and confidence intervals (Oswald et al., 2015), this point estimate of 61% might be interpreted to support Ericsson et al.'s (1993) central claim that individual differences in domain-relevant performance can largely be accounted for by accumulated amount of practice—at least a weak version of this claim (see Hambrick et al., 2018a).

However, it is critical to examine Ericsson and Harwell's (2019) assumptions about the reliability of the variables, because those values determined the degree of correction to the correlation (i.e., the lower the assumed reliability, the greater the correction). As noted earlier, Ericsson and colleagues stated that "self-report practice estimates repeatedly from experts in sports and music have reported test-retest reliabilities at or above 0.80" (Tuffiash et al., 2007, p. 129) and that "[t]he collected reliability of cumulated life-time practice at different test occasions in large samples has typically been found to range between 0.7 and 0.8" (Ericsson, 2013, p. 534). However, for reasons that are unclear, Ericsson and Harwell (2019) used a lower reliability estimate of 0.60 for purposeful/deliberate practice, apparently drawing on different information about the reliability of practice. This raises the important question of whether, by using a lower reliability value, they overcorrected the correlation between purposeful/deliberate practice and performance, and thus overestimated the variance shared between the two variables.
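
To make the size of the correction concrete, the standard correction for unreliability (the observed correlation divided by the square root of the product of the two reliabilities) can be worked through for the overall observed correlation of r = 0.54. The performance reliability of 0.80 used below is an assumption: it is the value that reproduces the reported corrected correlation of 0.78 when paired with a practice reliability of 0.60, but it is not stated in the passage above.

$$
r_c = \frac{r_{obs}}{\sqrt{r_{xx}\, r_{yy}}}, \qquad
\frac{0.54}{\sqrt{0.60 \times 0.80}} \approx 0.78 \;\; (r_c^2 \approx 0.61), \qquad
\frac{0.54}{\sqrt{0.80 \times 0.80}} = 0.675 \;\; (r_c^2 \approx 0.46)
$$

Under these assumptions, raising the practice reliability from 0.60 to 0.80 lowers the estimate of reliable variance explained from about 61% to about 46%.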

As is the case for most measured variables in psychological research, the reality is that the reliability of practice and performance variables must be estimated and can never be known with certainty; the accuracy of any reliability estimate will vary with the number of participants, the number of items/questions on the instrument, and other factors. Furthermore, the reliability of practice and performance variables may vary depending on factors such as the domain, the skill level and age of the participants, and so on. Thus, an approach we have adopted in our own research is to correct correlations between these variables, based on whatever point estimates of reliability are available, but also to report a sensitivity analysis in which the correlation is corrected under both lower and higher levels of reliability (e.g., Macnamara et al., 2014, Table S1).

**Table 3** presents such a sensitivity analysis for the meta-analytic correlations that Ericsson and Harwell (2019) report between purposeful/deliberate practice and performance (r = 0.54, k = 14), between purposeful practice and performance (r = 0.51, k = 6), and between deliberate practice and performance (r = 0.56, k = 8). As can be seen, the strength of the correlations varies depending on the reliability estimates, with larger corrections for lower reliability estimates. For example, if reliability is assumed to be rxx = 0.80 for deliberate practice (Ericsson, 2013) and ryy = 0.80 for performance, then the corrected correlation of performance with deliberate practice is rc = 0.70, indicating that deliberate practice explains 49% of the reliable between-person variance in performance (rather than 61%). Thus, conclusions will vary considerably depending on what values are used as the reliability estimates in the psychometric correction.
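
A sensitivity analysis of this kind can be sketched in a few lines. The example below is a minimal illustration rather than a reproduction of Table 3: the three observed correlations are those reported by Ericsson and Harwell (2019), whereas the grid of reliability values and the function names are assumptions chosen for illustration.

```python
from math import sqrt


def disattenuate(r_obs: float, r_xx: float, r_yy: float) -> float:
    """Correct an observed correlation for unreliability: r_obs / sqrt(r_xx * r_yy)."""
    return r_obs / sqrt(r_xx * r_yy)


# Meta-analytic correlations reported by Ericsson and Harwell (2019)
observed = {
    "purposeful/deliberate": 0.54,
    "purposeful": 0.51,
    "deliberate": 0.56,
}

# Illustrative reliability levels (an assumption; the grid in Table 3 may differ)
reliabilities = [0.60, 0.70, 0.80, 0.90]


def sensitivity_table(r_obs: float) -> dict:
    """Map each (r_xx, r_yy) pair to (corrected r, % of reliable variance explained)."""
    table = {}
    for r_xx in reliabilities:
        for r_yy in reliabilities:
            r_c = disattenuate(r_obs, r_xx, r_yy)
            table[(r_xx, r_yy)] = (r_c, r_c ** 2 * 100)
    return table


for label, r_obs in observed.items():
    print(f"\n{label} practice (observed r = {r_obs})")
    for (r_xx, r_yy), (r_c, pct) in sensitivity_table(r_obs).items():
        print(f"  rxx = {r_xx:.2f}, ryy = {r_yy:.2f}: rc = {r_c:.2f} ({pct:.0f}%)")
```

For deliberate practice with rxx = 0.80 and ryy = 0.80, the sketch returns a corrected correlation of 0.70 and 49% of variance explained, matching the example given in the text.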

<sup>4</sup>Following Ericsson and Harwell's (2019) terminology, when referring to all 14 effect sizes (six coded as purposeful practice and eight coded as deliberate practice) in the meta-analysis, we use the label "purposeful/deliberate practice."

TABLE 3 | Sensitivity analyses for Ericsson and Harwell (2019) meta-analytic correlations.

The table provides corrected correlation coefficients (rcs) based on Ericsson and Harwell's (2019) reported meta-analytic correlations of rpurposeful/deliberate = 0.54, rpurposeful = 0.51, and rdeliberate = 0.56. Values in parentheses are variance estimates (i.e., $r_c^2 \times 100$). rxx, reliability estimate for practice; ryy, reliability estimate for performance; DP, deliberate practice. The bolded values are the corrected correlations using Ericsson and Harwell's (2019) reliability values. The formula for computing a correlation corrected for unreliability (rc) is the observed correlation (robs) divided by the square root of the product of the variables' reliabilities (rxx and ryy): $r_c = r_{obs} / \sqrt{r_{xx} \times r_{yy}}$ (see Schmidt and Hunter, 1999).

Across the board, the correlations between practice and domain-relevant performance (expertise) in **Table 3** are meaningfully large, from theoretical, statistical, and practical perspectives. At the same time, the correlations vary considerably in magnitude and may lead to different conclusions about the importance of the practice variables. We illustrate this point with reference to the deliberate practice correlations (the bottom set of results in **Table 3**). If deliberate practice explained, for example, 87% of the variance in performance (assuming relatively low reliabilities of rxx = 0.60 and ryy = 0.60), then a strong version of Ericsson et al.'s (1993) claim that individual differences in
performance can largely be accounted for by deliberate practice would be supported. However, if deliberate practice explained just over half (56%) of the variance (assuming higher reliabilities of rxx = 0.80 and ryy = 0.70), then a weaker version of the claim would be supported. If deliberate practice explained an even smaller amount of the variance—for example, 39% (assuming even higher reliabilities of rxx = 0.90 and ryy = 0.90)—then an implication would be that factors other than deliberate practice explain more of the variance in performance than deliberate practice does. These different variance estimates might then lead to different priorities in expertise research. For example, the first scenario (87% of variance explained) might prompt an exclusive focus on training history, whereas the third scenario (39% of variance explained) might prompt a broader focus on multiple determinants of performance differences (e.g., training history, basic abilities). Elsewhere, we have argued that based on extant evidence, research should indeed investigate the role of a wide range of factors in explaining individual differences in expertise (Ullén et al., 2016; Hambrick et al., 2018b).

We further note that, if they were taken at face value, Ericsson and Harwell's (2019) findings would fail to support a central prediction of the latest version of the deliberate practice view. As mentioned earlier, Ericsson and Pool (2016) differentiated deliberate practice from naïve practice and purposeful practice, describing deliberate practice as "the most effective method of all. . .the gold standard" (p. 85). A straightforward prediction, following from these claims, is that the positive correlation between deliberate practice and domain-relevant performance should be significantly greater than the positive correlation between purposeful practice and domain-relevant performance (i.e., rdeliberate > rpurposeful). Ericsson and Harwell's (2019) findings do not support this prediction: the average correlation was r = 0.56 for deliberate practice and r = 0.51 for purposeful practice, a non-significant difference (p = 0.64). Ericsson and Harwell (2019) report this finding, but they note only that "practice was positively associated with performance whether it was conducted under the guidance of a coach or teacher" (p. 11). From the standpoint of the distinction between the two types of practice (Ericsson and Pool, 2016), the equally important conclusion would seem to be that there is no evidence from the meta-analysis that deliberate practice has higher validity than purposeful practice in predicting individual differences in domain-relevant performance, as is predicted by Ericsson and colleagues' new framework.

# HOW IMPORTANT IS DELIBERATE PRACTICE?

Again, how important is deliberate practice as a predictor of individual differences in expertise? It is somewhat difficult to say given the ambiguity over the definition of deliberate practice, but we can at least summarize evidence from meta-analyses by different groups of researchers that have attempted to answer this question. Along with the aforementioned reanalysis of chess and music studies (Hambrick et al., 2014b), there have been five formal meta-analyses of the deliberate practice-performance relationship. **Table 4** summarizes the overall result from these meta-analyses (i.e., the overall correlation between deliberate practice and performance), and presents a sensitivity analysis showing variance estimates under different reliability assumptions, from unacceptable to excellent. As shown, across meta-analyses, deliberate practice explains a sizeable amount of the between-person variance in performance. However, we conclude that it is unlikely to be as important as Ericsson and colleagues have hypothesized it is. In nearly all cases, deliberate practice leaves a large amount of reliable variance unexplained, and in most cases, the unexplained variance exceeds the explained variance. Our conclusion, as in the past, is that deliberate practice is an important predictor of individual differences in expertise, but that it is unlikely to be as important as Ericsson and colleagues have proposed.

# IS THE DELIBERATE PRACTICE VIEW DEFENSIBLE?

In view of the issues discussed in the preceding pages, it seems reasonable to ask whether the deliberate practice view is scientifically defensible—that is, whether deliberate practice can be conceptualized and tested empirically in a consistent manner by the research community. In his book The Logic of Scientific Discovery, the philosopher of science Karl Popper (1959) argued, "In so far as a scientific statement speaks about reality, it must be falsifiable; and in so far as it is not falsifiable, it does not speak about reality" (p. 314). By definition, a theory is unfalsifiable when it cannot be rejected under any circumstances, because it can accommodate any finding. This happens when multiple, contradictory definitions of theoretical concepts are proposed by a theorist, and when the theoretical and operational criteria are kept in a fluid state. Under these conditions, evidence can be rejected or accepted depending on whether it supports the theory.

When a theory becomes unfalsifiable, it ceases to be a scientific theory, at least in the Popperian sense. The theory is always "right" and cannot be evaluated against competing theories. Ferguson and Heene (2013) described a theory that has entered this state as being "undead," like a zombie that is technically dead but remains animate. The undead theory "continues in use, having resisted attempts at falsification, ignored disconfirmatory data, negated failed replications through the dubious use of meta-analysis or having simply maintained itself in a fluid state with shifting implicit assumptions such that falsification is not possible" (Ferguson and Heene, 2013, p. 559).

Is the deliberate practice view falsifiable? We will leave it to readers of this article to draw their own conclusions. At the very least, it seems difficult to deny that there are serious problems with the deliberate practice view, as Ericsson and colleagues have presented it over the past 25 years. As we have documented here, Ericsson and colleagues have described deliberate practice in contradictory ways, creating major confusion about the definition of deliberate practice. Furthermore, Ericsson and colleagues' standard for evidence—the specific criteria that need to be satisfied to use a study to argue for the importance of deliberate practice and even the criteria themselves—has appeared to shift multiple times.

To be sure, it is not only normal, but expected, for a scientist to revise a theory as evidence relevant to that theory accumulates through research. Theory revision is a fundamental part of what the philosopher of science Imre Lakatos (1978) called a "progressive" program of research. In this iterative process, revisions are explicitly acknowledged and clearly explained and justified so that they can be understood and critically evaluated by other scientists. When theory revisions are not made in a transparent manner, then in Lakatos' terminology a theory can be endlessly adjusted and readjusted to keep it "alive." The program of research then shifts from "progressive" to "degenerative" (see also Musgrave and Pigden, 2016).

On a related note, the concept of deliberate practice is arguably underspecified in ways that leave open the opportunity for numerous post hoc explanations of results. For example, Ericsson and colleagues have stated that deliberate practice requires that the performer have "full concentration" (e.g., Ericsson and Harwell, 2019). However, this is a psychological state that may be impossible to achieve. Would, for example, a person's awareness of the environment (e.g., the temperature) or a fleeting thought (e.g., about an event earlier in the day) mean that they were not fully concentrating on the training task? If yes, then it seems unlikely that a person could ever fulfill this criterion of deliberate practice. Furthermore, we are not aware of a method for objectively determining whether a person has full concentration on something. Research approaches from cognitive psychology (e.g., primary-secondary task paradigms) permit no more than relative statements about the degree to which a person is attending to one task (or stimulus) versus some other task(s). Except for subjective self-reports, we are also unaware of attempts by Ericsson and colleagues themselves to measure concentration level during practice. If a criterion for a theoretical construct (e.g., achieving "full concentration") either cannot be achieved or cannot be empirically verified, then imposing that criterion makes the theory unfalsifiable. This sort of flexibility in the deliberate practice view, along with the definitional confusion we have discussed, presents an additional problem for its scientific viability.

Why is it important for researchers to comment publicly when a theory in their research area appears to be degenerating? It is important because a degenerating theory—especially if it is influential—impedes actual progress in an area of research. The value of research to test the theory becomes questionable, because evidence is accepted as valid only if it supports the theory and rejected if it fails to support the theory. In turn, practical recommendations and applications based on the theory will lack a scientific foundation, because even conflicting recommendations and applications can be supported. If conclusions from this research do not have a solid empirical foundation, then recommendations based on the theory may be wasteful, counterproductive, or even harmful.



k, number of effect sizes. r, meta-analytic correlation between deliberate practice and performance. % variance, percentage of variance in performance explained by deliberate practice (i.e., r<sup>2</sup>). For the corrected correlations, values in parentheses adjacent to correlations are variance estimates (r<sub>c</sub><sup>2</sup> × 100). NR, not reported.

To be certain, we do not think that the concept of deliberate practice should be abandoned. Deliberate practice is a vital area of research in psychological science and other fields. However, given the issues we have discussed at length in this article, we do believe that it would be highly beneficial to expertise researchers (and scientists from other research areas interested in expertise) for proponents of the deliberate practice view to fully address and resolve the apparent inconsistencies and shifts in the definition of and standard of evidence for deliberate practice that we have documented here—and which raise serious concerns about the viability of the deliberate practice view as a scientific theory. This would allow other researchers to empirically evaluate the importance of deliberate practice as a predictor of individual differences in expertise, both in individual studies and meta-analyses, and to compare its predictive validity to that of other factors (e.g., general aptitudes, basic capacities) and the conditions of practice (e.g., spacing of practice sessions, type of feedback).

To this point, in the following sections, we summarize key evidence for the role of a diverse range of factors in explaining individual differences in expertise, and then discuss in broad terms what we believe are fruitful directions for future research to develop comprehensive models of expertise.

# TOWARD A MULTIFACTORIAL MODEL OF EXPERTISE

What might explain individual differences in expertise, beyond any contribution of deliberate practice? We direct readers to recent theoretical/review articles in which we discuss this issue at length (Hambrick et al., 2016; Ullén et al., 2016; Macnamara and Hambrick, 2020). Here, we briefly summarize evidence concerning three major classes of factors.

#### Developmental Factors

The question of when specialized training should commence in a person's life is the subject of a longstanding debate in the field of expertise. The early specialization view argues that the earlier the training can begin, the better. The logic of this view is straightforward: Because it is both physically and psychologically taxing, a person can engage in only a few hours of deliberate practice a day (around 4 h on average; Ericsson et al., 1993) without burnout and/or injury. Therefore, the individual who begins training at a relatively late age (e.g., age 12) can never catch up to the individual who begins training earlier (e.g., age 6). However, in a meta-analysis of sports studies with samples representing a wide range of skill, we found no evidence for an earlier average starting age for high-skill athletes relative to lower-skill athletes. Furthermore, research suggests that the highest (elite) levels of sports performance are associated with a later starting age, combined with participation in a diverse range of sports in adolescence. For example, Güllich (2017) compared 83 international medalists (Olympic/World championship) to 83 non-medalists matched on sport, age, and gender. Up to age 18, the medalists had, on average, accumulated significantly fewer hours of organized training/practice in their main sport (by 948 h) than the non-medalists. Moreover, the average starting age was later by approximately a year-and-a-half for the medalists compared to the non-medalists. One possible explanation for this finding is that a starting age that is too early increases the risk for injury and/or burnout. Another possible explanation is that starting later allows for more early diverse experiences, increasing the likelihood that the individual will find a sport that is a good match to his or her profile of performance-relevant traits (Güllich, 2017).

#### Experiential Factors

A central tenet of the deliberate practice view is that deliberate practice is more predictive of individual differences in expertise than other forms of experience, such as work and play. As Boot and Ericsson (2013) explained, "Ericsson and colleagues. . .make a critical distinction between domain-related activities of work, play, and deliberate practice, and claim that the amount of accumulated time engaged in deliberate practice activities is the primary predictor of exceptional performance" (p. 146). The available evidence does not appear to support this claim. As already mentioned, if Ericsson and Harwell's (2019) findings are taken at face value, they reveal that deliberate practice, although claimed to be the "gold standard" for improving performance (Ericsson and Pool, 2016), is not a significantly stronger predictor of individual differences in expertise than mere purposeful practice (i.e., r<sub>deliberate</sub> = 0.56 vs. r<sub>purposeful</sub> = 0.51, difference in rs non-significant). Furthermore, measures of deliberate practice have not always been found to be stronger predictors of individual differences in expertise than measures seeming to meet the definition of "work" and "play." For example, in a study of insurance salespeople, Sonnentag and Kleine (2000) found that correlations between sales performance and measures of deliberate practice (r<sub>current</sub> = 0.21 and r<sub>accumulated</sub> = 0.13) were not stronger than the correlation between sales performance and a measure fitting the description of work for this domain, namely the number of cases handled (r = 0.37; see Hambrick et al., 2016, for other examples).

Evidence further suggests that diverse forms of experience are important as well, especially in the early stages of training. For example, in Güllich's (2017) study comparing the 83 international medalists and 83 non-medalists, he not only found that the medalists had accumulated significantly less main-sport practice than their less-accomplished counterparts during childhood/adolescence, but also that the medalists had accumulated significantly more experience with other sports during this period (see also Güllich et al., 2017).

#### Ability Factors

Research has firmly established that cognitive ability explains a statistically and practically significant amount of the variability in people's acquisition of complex skills (Hambrick et al., 2019; also see Ackerman, 1987, for a review of early studies). That is, people higher in cognitive ability learn complex skills more readily and rapidly than people lower in cognitive ability. For example, in a study of music training, participants with little or no experience playing music completed tests of cognitive ability, music aptitude, and growth mindset, and then they were given instruction in playing a simple piece of music on the piano (Burgoyne et al., 2019). Higher-ability participants showed a greater rate of learning than lower-ability participants, with a general intelligence factor explaining approximately 30% of the individual differences in learning rate.

Ericsson (2014d) has theorized that general cognitive ability is important initially in acquiring complex skills, but its predictive power diminishes as domain-specific skills and knowledge are acquired, stating:

For individuals who have acquired cognitive structures that support a high level of performance the expert performance framework predicts that these acquired cognitive structures will directly mediate superior performance and thus diminishing correlations between general cognitive ability and domain-specific performance (p. 84).

For complex tasks of interest to expertise researchers, evidence for this claim, which we termed the circumvention-of-limits hypothesis (Hambrick and Meinz, 2011), is weak and inconsistent. In a recent review (Hambrick et al., 2019), we searched through approximately 1,300 articles and identified 15 studies in the domains of games, music, sports, science, medicine/surgery, and aviation relevant to this hypothesis. Of the 15 studies, only three yielded any evidence supportive of the circumvention-of-limits hypothesis. Moreover, methodological limitations (e.g., small Ns, measures with unknown or unreported reliability) precluded any strong conclusions from those few studies. Providing what might be considered the strongest evidence for the hypothesis, one of these three studies that seem to support the circumvention-of-limits hypothesis was a meta-analysis of chess studies (Burgoyne et al., 2016; see also Burgoyne et al., 2018, corrigendum). As determined by a moderator test, fluid intelligence correlated significantly more strongly with chess rating in lower-skill chess players (avg. r = 0.32) than in higher-skill chess players (avg. r = 0.14). However, it is important to note that skill level was highly confounded with age (i.e., lower-ability samples were youth, whereas higher-ability samples were adults), limiting the strength of the evidence in support of the circumvention-of-limits hypothesis.

We also note that results that have sometimes been used to argue that the influence of general cognitive ability on expertise diminishes with increasing skill do not warrant this conclusion. For example, Ericsson (2014d) pointed to results by Ruthsatz et al. (2008) as support for this hypothesis. Ruthsatz et al. (2008) found that a measure of general cognitive ability (Raven's Progressive Matrices score) correlated positively and significantly with musical accomplishment in high school band members (r = 0.25, p < 0.05), but not in university music majors (r = 0.24) or conservatory students (r = 0.12). However, the critical question is not whether the lower-skill group correlation is statistically significant while the higher-skill group correlations are not. Rather, it is whether the former correlation and the latter correlations are significantly different from each other, as determined by the appropriate statistical test. As it happens, in the Ruthsatz et al. (2008) study, the correlations are not significantly different from each other (all z tests for differences in correlations are statistically non-significant). Thus, the results of Ruthsatz et al.'s (2008) study fail to support the hypothesis that ability-performance correlations diminish with increasing skill.
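The "appropriate statistical test" for comparing correlations from independent groups is commonly the Fisher r-to-z test. The sketch below is a minimal illustration, not a reanalysis: the two correlations are those quoted above for high school band members and conservatory students, but the sample sizes are hypothetical placeholders because the group sizes are not given here, and `compare_independent_correlations` is an illustrative name.

```python
import math


def compare_independent_correlations(r1: float, n1: int, r2: float, n2: int):
    """Fisher r-to-z test for the difference between two independent correlations.

    Returns a tuple (z, two_sided_p).
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each sample needs more than 3 observations")
    z1, z2 = math.atanh(r1), math.atanh(r2)      # Fisher transformation
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # standard error of z1 - z2
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))         # two-sided normal p-value
    return z, p


# Correlations from the text; the sample sizes are HYPOTHETICAL placeholders.
z, p = compare_independent_correlations(r1=0.25, n1=100, r2=0.12, n2=100)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With group sizes of this order, a correlation that is individually significant and one that is not can still fail to differ significantly from each other, which is the point the paragraph above makes.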

We also reviewed evidence relevant to the circumvention-of-limits hypothesis from the job performance literature, and here the evidence is more consistent and interpretable. General cognitive ability is regarded as the single best predictor of job training performance, and of subsequent job performance (Schmidt and Hunter, 2004; Schmidt, 2014). Higher ability people tend to learn job skills more rapidly and to a higher level than lower ability people, and in turn have greater success on the job. Furthermore, although the validity of general cognitive ability for job performance may drop somewhat as a function of job experience initially, it appears to remain a statistically significant predictor even at high levels of job experience. For example, in a study of 10,088 military personnel across 31 jobs, the correlation between a measure of general cognitive ability (the AFQT score) and hands-on job performance decreased as a function of job experience from 1 to 2 years of job experience (r = 0.34 to r = 0.21; z test for difference = 3.60, p < 0.001), but then stabilized and remained statistically and practically significant (see **Figure 3**). In their own review of the job performance literature, Reeve and Bonaccio (2011) concluded that "although validities might degrade somewhat over long intervals, we found no evidence to suggest that they degrade appreciably, thereby retaining practically useful levels of validity over very long intervals" (p. 269).

#### Genetic and Environmental Influences

Research in the field of behavioral genetics has demonstrated that both genetic and environmental variance across individuals contribute to the total variance in a wide range of behavioral outcomes (Turkheimer, 2000), including ability factors that have been found to correlate with measures of expertise. The extent of the genetic contribution is captured by the heritability statistic (h<sup>2</sup>), an estimate of the proportion (0 to 1) of the total variance in a trait that can be attributed to genetic (nonenvironmental) variance within a sample of individuals (Plomin et al., 2008). Because some of these factors correlate with expertise, it stands to reason that both genetic and environmental variance may also contribute to the total variance in expertise. Furthermore, basic abilities and characteristics that may predict individual differences in expertise have also been observed to be substantially heritable, including drawing ability (Arden et al., 2014), music aptitude (Ullén et al., 2014; see Mosing et al., 2018, for a review), and maximal oxygen consumption in athletic performance (VO2max; Schutte et al., 2016).
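As a rough illustration of what a heritability estimate summarizes, the sketch below uses Falconer's approximation, which doubles the difference between identical (MZ) and fraternal (DZ) twin correlations. This is a simplification chosen for illustration; the studies cited here may rely on more elaborate model-fitting approaches. The twin correlations are hypothetical values, not data from any cited study.

```python
def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Approximate h^2 as twice the MZ-DZ difference in twin correlations."""
    h2 = 2 * (r_mz - r_dz)
    # Clamp to the 0-1 range, since h^2 is a proportion of variance.
    return max(0.0, min(1.0, h2))


# HYPOTHETICAL twin correlations, chosen only for illustration.
h2 = falconer_heritability(r_mz=0.75, r_dz=0.50)
print(f"h^2 = {h2:.2f}")
```

Because the estimate depends on correlations observed in a particular sample, it describes variance in that sample rather than a fixed property of the trait.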

At the same time, no psychological trait is 100% heritable (Turkheimer, 2000), and even the most heritable psychological trait will have a sizeable environmental component. For example, heritability estimates for measures of general cognitive ability are typically in the 50 to 70% range in samples drawn from developed countries (e.g., Tucker-Drob and Bates, 2016), with the remaining variance (as much as 50%) explained by shared and/or non-shared environmental factors. This means that correlations between a measure of some trait (e.g., general cognitive ability) and a measure of expertise could be driven by the genetic variance or the environmental variance in the trait measure, or by both types of variance. In other words, the finding that a measure of a heritable trait correlates with expertise is merely consistent with the possibility that genetic variance is a component of individual differences in expertise.

Figure (caption partially recovered): [...] (B) shows this correlation when eight levels of job experience are created. Figure from Hambrick et al. (2019); used with permission of Oxford University Press.

It is also critical to note that genes and environments cannot generally be assumed to be uncorrelated across people. Rather, across people, genetically influenced factors may contribute to variance in the environments which people seek out and are exposed to. This is the idea of gene-environment correlation, or rGE (Plomin et al., 1977). For example, just as children who are tall might be more interested in playing basketball and more likely to be selected to play on basketball teams than children who are shorter, those with a high level of music aptitude may be more likely to take up, be selected for, and persist in music than those with a lower level of this aptitude. Consistent with this sort of speculation, there is now evidence to indicate that the propensity to practice in a domain is substantially heritable. In a large twin study, Mosing, Ullén, and colleagues found an average heritability of around 50% for accumulated amount of music practice (Mosing et al., 2014; see also Hambrick and Tucker-Drob, 2015). A possible explanation for this finding is that music aptitude, as well as more general ability and non-ability factors, differentially predispose people to engaging in music practice.

Genetic factors and environmental factors may not only correlate with one another; they may interact in influencing behavioral outcomes—what is known as gene-environment interaction, or G × E. G × E occurs when a genetically influenced factor moderates (increases or decreases) the effect of an environmental factor on an outcome. As one example of G × E, analyzing data from the National Merit Twin Study, Hambrick and Tucker-Drob (2015) found that heritability of a music accomplishment variable was 0.43 for individuals who reported engaging in music practice, versus 0.01 for those who did not. (This result was not due to range restriction, as there was still variability in music accomplishment among participants who reported not practicing). This finding suggests that music practice may activate genetic factors that vary across people.

Four additional points concerning the potential contribution of genetic factors to individual differences in expertise are important to note here. First, even if a measure of expertise is found to be heritable, this in no way implies that training is unnecessary to develop a high level of expertise, or that training is beneficial to only some people. Training is necessary and essential for developing a high level of expertise in a domain, and except when a condition rules out some type of training (e.g., a visual training regimen for a person who is blind), anyone would be expected to benefit from proper training. Second, heritability does not imply immutability. For example, in adults, height is highly heritable and relatively fixed, whereas weight is similarly heritable but can be modified through an environmental intervention, namely dieting. Third, environmental interventions that change individual differences will also change heritability. For example, if an environmental intervention were introduced that allowed nearly everyone to reach about the same level of skill in some task, heritability would be expected to decrease. In a similar manner, heritability can differ across populations (e.g., in a developing country vs. a non-developing country; Tucker-Drob and Bates, 2016). Fourth, and finally, it is safe to assume that to the degree that expertise is heritable, this would reflect variation in a great many genetic variants (i.e., single nucleotide polymorphisms, or SNPs), meaning there is no "expertise gene" in any domain. As Chabris and colleagues noted, "A typical human behavioral trait is associated with very many genetic variants, each of which accounts for a very small percentage of the behavioral variability" (Chabris et al., 2015, p. 305). Research is uncovering genetic variants that may contribute to individual differences in expertise, but it is highly unrealistic to expect that any one of these factors will account for a large amount of the variance in expertise.

#### Putting It All Together


Taken together, evidence suggests that individual differences in expertise arise from influences of multiple factors. This includes training and other forms of domain-relevant experience, as well as developmental factors (e.g., age of starting training), ability factors (e.g., aptitudes), non-ability factors (e.g., personality traits), and background factors (e.g., opportunity to engage in training). In recent articles, we have proposed the multifactorial gene-environment interaction model (MGIM) to describe how these factors relate to one another and to serve as a guide for future research on expertise (Ullén et al., 2016). As illustrated in **Figure 4**, the MGIM assumes that expertise arises from influences of both domain-general and domain-specific factors, which are assumed to be influenced by both genetic and environmental factors. The model further assumes that task/situational factors may moderate the influence of these factors on expertise (i.e., domain-relevant performance).

# THE PATH AHEAD IN EXPERTISE RESEARCH

Ericsson and colleagues' deliberate practice view has had a monumental impact on the field of expertise, and is important in the history of psychology, more generally. However, in our assessment, it is not clear that the deliberate practice view is defensible as a scientific theory. As described here in some detail, Ericsson and colleagues have defined deliberate practice in inconsistent ways (see **Figure 2**) and the standard for evidence concerning deliberate practice has appeared to shift multiple times (see **Tables 1**, **2**). These issues present problems for the empirical testability (i.e., falsifiability) of the deliberate practice view.

Embracing tenets of the open science movement, we believe that the path ahead is for expertise researchers to work together to develop testable theories that take into account a much wider range of potentially relevant causal constructs than have often been considered in previous research, and to use rigorous empirical methods to evaluate these theories. The open science movement promotes normative values aimed at increasing accuracy, openness, and fairness in scientific research and scholarship (see a special issue of the Journal of Expertise devoted to open science in expertise research; McAbee and Macnamara, 2020).

Nearly 80 years ago, the sociologist of science Robert Merton (1942) described four scientific "imperatives" that capture the values of the open science movement. First, universalism: scientific validity is independent of the status of the people conducting the research; evidence should be evaluated based on its own merits rather than the status or prominence of the person reporting the evidence. Second, communism: a theory does not belong to the theorist, it belongs to the field. The theorist has no greater right to the theory once it is made public than any other scientist. Third, disinterestedness: scientists should perform research to increase understanding of some phenomenon rather than to advance self-interests, whatever they may be. Finally, organized skepticism: a field should scrutinize claims based on empirical evidence.

In the wake of the replication crisis in the social sciences, many measures have been proposed to increase the reproducibility of research findings in psychological science and to accelerate progress in research (see Munafò et al., 2017). Preregistration of study designs, primary outcomes, and data analysis plans can help safeguard against post hoc interpretation of data. Improved methodological training can help researchers avoid pitfalls in designing studies (e.g., omitting critical control conditions) and in data analysis (e.g., misinterpreting p-values). Collaboration can facilitate collection of large samples and help to ensure that multiple theoretical perspectives are considered in study design. The time is also ripe for a preregistered "adversarial collaboration," a study in which researchers with differing views agree on an empirical test that is designed to be fair to both views and that can resolve a theoretical dispute (see Kahneman and Klein, 2009; for a recent example, see Doherty et al., 2019). No less so than in any area of psychological research, we believe that open science will accelerate progress toward greater understanding of the nature and origins of expertise and expert performance.

#### Recommendations for Expertise Research

We make four general recommendations for conducting expertise research, which are based on best research practices in differential psychology (see Ackerman and Hambrick, 2020). First, after selecting a domain for the research, the researchers should seek to assess a wide range of potentially relevant causal factors (Simonton, 1999). Whatever the domain, it will be critical to measure key environmental factors, including various types of training and factors related to the opportunity to engage in these activities. However, drawing on the vast literature in differential psychology, it will be equally critical to include measures of basic abilities, capacities, dispositions, and other psychological traits that may affect performance directly, or indirectly through training. Only then can the relative and joint contributions of these factors to individual differences in expertise be evaluated.

The second recommendation is that multiple measures be used to index each of the hypothesized constructs. It is axiomatic in the psychological methods literature that virtually no observed measure (or indicator) is "construct pure." That is, a score collected by an instrument (test, questionnaire, etc.) designed to measure a given hypothetical construct may reflect that construct to some degree, but it will certainly reflect other, construct-irrelevant factors, such as participants' familiarity with a particular method of assessment (e.g., test format) and psychological states that may affect their responding (e.g., sleep deprivation and motivation). There is no perfect way to deal with this problem, but when multiple measures of a construct are obtained, it becomes possible to use data-analytic techniques (viz., structural equation modeling) that are explicitly designed to deal with this issue by allowing researchers to model latent variables that are closer to theoretical constructs of interest than observed variables are.
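One way to see why multiple measures help is the Spearman-Brown formula, which gives the reliability of a composite of k parallel measures. This is not structural equation modeling itself, only a simpler classical-test-theory sketch of the same intuition. The function name and the single-measure reliability of 0.50 are assumptions made for illustration.

```python
def composite_reliability(single_reliability: float, k: int) -> float:
    """Spearman-Brown reliability of the sum of k parallel measures."""
    if k < 1:
        raise ValueError("k must be at least 1")
    r = single_reliability
    return k * r / (1 + (k - 1) * r)


# HYPOTHETICAL reliability of 0.50 for each individual measure.
for k in (1, 2, 3, 5):
    rel = composite_reliability(0.50, k)
    print(f"{k} measure(s): composite reliability = {rel:.2f}")
```

The composite becomes more reliable as indicators are added, which is one reason a construct indexed by several measures is less contaminated by the idiosyncrasies of any single instrument. Latent-variable models go further by separating shared construct variance from measure-specific variance.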

The third recommendation is that the sample of participants from the targeted domain should ideally represent a wide range of performance rather than extreme groups. As we have noted elsewhere (Hambrick et al., 2019), categories such as "novice" and "expert" are not naturally occurring—they are groups of performers created based on ultimately arbitrary cuts on performance scores. Accordingly, scientific research on expertise should endeavor to explain the full range of performance differences within different domains rather than differences between artificial groups of performers, and also continuities and discontinuities across this range (see Bliese and Lang, 2016).

The final recommendation is for expertise researchers to begin large-scale longitudinal studies. Longitudinal studies are expensive and, by their very nature, time-consuming. At the same time, they are common in psychology. For example, there are longitudinal studies in the area of cognitive aging that have been running for many decades, such as the Seattle Longitudinal Study (Schaie, 2005). There is also precedent for longitudinal studies of expertise, including Schneider and colleagues' important longitudinal study of tennis skill, which included two future World No. 1 players (Schneider et al., 1993). Although expensive, labor-intensive, and time-consuming, multi-site longitudinal studies of expertise will provide for much stronger conclusions concerning the underpinnings of expertise.

We also offer one more specific recommendation for future research on expertise. The goal of Friedlander and Fine's (2016) grounded expertise components approach (GECA) is to identify predictors of individual differences in expertise in a theoretically neutral manner, so as to minimize bias in findings concerning the relative importance of one class of factor versus another (e.g., training vs. basic abilities). The approach begins with exploration: administering a survey to a large sample of performers in a given domain, with questions about their level of engagement in training activities, as well as education, interests, hobbies, careers, accomplishments, and other characteristics that may be informative about potential predictors of expertise (i.e., performance differences). The survey data are then analyzed to identify a candidate set of potential predictors of expertise. Finally, a study is conducted to estimate the relative contributions of these factors to the prediction of individual differences in expertise. Programmatic research can then proceed on this basis.

A major goal of the GECA is to identify activities that consistently correlate to a practically and statistically significant degree with measures of expertise. These activities may differ across domains. More specifically, for some domains, the activities may meet the ultimate criteria for deliberate practice, whereas for other domains, they may include unstructured activities that do not fit the definition of deliberate practice, purposeful practice, or even naïve practice. Furthermore, within a domain, there may be multiple "routes" to developing expertise. That is, one performer may achieve a given level of expertise by engaging in one set of activities, whereas another performer may achieve the same level of expertise by engaging in a different set of activities. As an illustration, Berliner (1994) found that jazz musicians emphasized the importance of intentionally unstructured "jam" sessions for developing improvisational skill, and noted that "[s]trongly motivated students commonly learned musical instruments without formal instruction by synthesizing bits of knowledge from commercial method books, other young performers, and their own experimentation" (Location 738).

In short, the search for activities using Friedlander and Fine's (2016) GECA holds great potential to shed light on the role of different types of training in explaining expertise across a wide range of domains, including not only domains traditionally studied in expertise research (e.g., chess, sports, and classical music) but also those that have received relatively little attention in research.

#### CONCLUSION

For decades, the field of expertise has focused on environmental factors as the major determinants of individual differences in expertise, whereas genetically influenced factors are assumed to play a relatively unimportant role, if any role at all (Ericsson et al., 1993; Ericsson, 2007a). Environmental factors certainly are important to consider in investigating the origins of individual differences in expertise, but a comprehensive scientific theory of expertise must take into account genetically influenced factors as well, including basic abilities and capacities ("talent"). At a more general level, we argue that it is time, past time, for the nature vs. nurture debate to be over in the field of expertise, as it has been in most areas of psychological research for decades (Turkheimer, 2000). Embracing the idea that expertise can be best understood as a product of gene-environment interplay (nature and nurture) will, as Plomin (2018) recently observed, move the field ahead and integrate it with the life sciences. At a practical level, findings from this research will provide a scientific foundation for principles and procedures designed and implemented to accelerate people's acquisition of complex skills across a wide range of domains and elevate the performance of individuals, organizations, and societies.

#### REFLECTION

As we were editing the page proofs for this article, we received the sad news of Anders Ericsson's passing. It is difficult to imagine the field of expertise without Anders—he was a pioneer. But his ideas will live on and continue to inspire scholarship and debate that will lead to greater understanding of the subject about which he was so passionate. We hope that Anders found our work as stimulating as we found his. He forced us to think critically about our most basic assumptions concerning expertise, and to try to put our best case for our perspective forward. We are in his debt, and extend our sincere condolences to his family, friends, students, and colleagues.

#### AUTHOR CONTRIBUTIONS

DH was the primary author and conceived the article. BM and FO provided extensive comments and edits on multiple drafts. All authors contributed to the article and approved the submitted version.



**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Hambrick, Macnamara and Oswald. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# APPENDIX
