# COGNITIVE AND BRAIN PLASTICITY INDUCED BY PHYSICAL EXERCISE, COGNITIVE TRAINING, VIDEO GAMES AND COMBINED INTERVENTIONS

EDITED BY: Soledad Ballesteros, Claudia Voelcker-Rehage and Louis Bherer

PUBLISHED IN: Frontiers in Human Neuroscience

#### Frontiers Copyright Statement

© Copyright 2007-2018 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.

The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.

Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.

Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.

As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.

All copyright, and all rights therein, are protected by national and international copyright laws.

The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.

ISSN 1664-8714, ISBN 978-2-88945-507-2, DOI 10.3389/978-2-88945-507-2

#### About Frontiers

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

#### Frontiers Journal Series

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

#### Dedication to quality

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

#### What are Frontiers Research Topics?

Frontiers Research Topics are very popular trademarks of the Frontiers Journal Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

# COGNITIVE AND BRAIN PLASTICITY INDUCED BY PHYSICAL EXERCISE, COGNITIVE TRAINING, VIDEO GAMES AND COMBINED INTERVENTIONS

Topic Editors:

Soledad Ballesteros, Universidad Nacional de Educación a Distancia, Spain

Claudia Voelcker-Rehage, Chemnitz University of Technology, Germany

Louis Bherer, Université de Montréal, Canada

Image: GiroScience/Shutterstock.com

The premise that neuroplasticity can enhance cognitive functioning in healthy as well as cognitively impaired individuals across the lifespan, and the potential of harnessing these processes to prevent cognitive decline, attract substantial scientific and public interest. Indeed, the systematic evidence base for cognitive training, video games, physical exercise and other forms of brain stimulation, such as those that entrain brain activity, is growing rapidly. This Research Topic (RT) focused on recent research on cognitive and brain plasticity induced by physical activity; by different types of cognitive training, including computerized interventions, learning therapy and video games; by combined intervention approaches; and by other approaches that target brain activity, such as electroencephalography and neurofeedback. It contains 49 contributions to the topic: Original Research articles (37), Clinical Trials (2), Reviews (5), Mini Reviews (2), Hypothesis and Theory (1), and Corrections (2).

Citation: Ballesteros, S., Voelcker-Rehage, C., Bherer, L., eds. (2018). Cognitive and Brain Plasticity Induced by Physical Exercise, Cognitive Training, Video Games and Combined Interventions. Lausanne: Frontiers Media. doi: 10.3389/978-2-88945-507-2

# Table of Contents

*08 Editorial: Cognitive and Brain Plasticity Induced by Physical Exercise, Cognitive Training, Video Games, and Combined Interventions* Soledad Ballesteros, Claudia Voelcker-Rehage and Louis Bherer

#### PART 1

#### COGNITIVE AND BRAIN TRAINING IN YOUNG AND OLDER ADULTS


Marlen Schmicker, Patrick Müller, Melanie Schwefel and Notger G. Müller

*48 Have Standard Tests of Cognitive Function Been Misappropriated in the Study of Cognitive Enhancement?* Iseult A. Cremen and Richard G. Carson

#### SECTION 1

#### COGNITIVE TRAINING

*57 Reading Aloud and Solving Simple Arithmetic Calculation Intervention (Learning Therapy) Improves Inhibition, Verbal Episodic Memory, Focus Attention and Processing Speed in Healthy Elderly People: Evidence From a Randomized Controlled Trial*

Rui Nouchi, Yasuyuki Taki, Hikaru Takeuchi, Takayuki Nozawa, Atsushi Sekiguchi and Ryuta Kawashima


Patrick D. Gajewski, Gabriele Freude and Michael Falkenstein

*125 Spatial Frequency Training Modulates Neural Face Processing: Learning Transfers from Low- to High-Level Visual Features* Judith C. Peters, Carlijn van den Boomen and Chantal Kemner

*134 Improving Dorsal Stream Function in Dyslexics by Training Figure/Ground Motion Discrimination Improves Attention, Reading Fluency, and Working Memory*

Teri Lawton

*150 Sex Differences in Gray Matter Volume of the Right Anterior Hippocampus Explain Sex Differences in Three-Dimensional Mental Rotation* Wei Wei, Chuansheng Chen, Qi Dong and Xinlin Zhou

#### SECTION 2

#### TRAINING WITH VIDEO GAMES


Pilar Toril, José M. Reales, Julia Mayas and Soledad Ballesteros

*213 Music Games: Potential Application and Considerations for Rhythmic Training*

Valentin Bégel, Ines Di Loreto, Antoine Seilles and Simone Dalla Bella

#### SECTION 3

#### COGNITIVE TRAINING IN PATIENTS

*220 Computer-Based Cognitive Training for Executive Functions After Stroke: A Systematic Review* Renate M. van de Ven, Jaap M. J. Murre, Dick J. Veltman and Ben A. Schmand

*247 Cognitive Training for Post-Acute Traumatic Brain Injury: A Systematic Review and Meta-Analysis*

Harry Hallock, Daniel Collins, Amit Lampit, Kiran Deol, Jennifer Fleming and Michael Valenzuela

#### SECTION 4

#### MEDITATION AND MINDFULNESS


Yusuf O. Cakmak, Gazanfer Ekinci, Armin Heinecke and Safiye Çavdar

#### PART 2

#### NEUROFEEDBACK AND MORE


Justin Hudak, Friederike Blume, Thomas Dresler, Florian B. Haeussinger, Tobias J. Renner, Andreas J. Fallgatter, Caterina Gawrilow and Ann-Christine Ehlis

*317 Computer Enabled Neuroplasticity Treatment: A Clinical Trial of a Novel Design for Neurofeedback Therapy in Adult ADHD*

Benjamin Cowley, Édua Holmström, Kristiina Juurmaa, Levas Kovarskis and Christina M. Krause

*330 Beware: Recruitment of Muscle Activity by the EEG-Neurofeedback Trainings of High Frequencies*

Katarzyna Paluch, Katarzyna Jurewicz, Jacek Rogala, Rafał Krauz, Marta Szczypińska, Mirosław Mikicin, Andrzej Wróbel and Ewa Kublik

*341 The Posterior Parietal Cortex Subserves Precise Motor Timing in Professional Drummers*

Bettina Pollok, Katharina Stephan, Ariane Keitel, Vanessa Krause and Nora K. Schaal

#### PART 3

#### FITNESS/EXERCISE EFFECTS ON COGNITION AND BRAIN

*352 No Evidence That Short-Term Cognitive or Physical Training Programs or Lifestyles are Related to Changes in White Matter Integrity in Older Adults at Risk of Dementia*

Patrick Fissler, Hans-Peter Müller, Olivia C. Küster, Daria Laptinskaya, Franka Thurm, Alexander Woll, Thomas Elbert, Jan Kassubek, Christine A. F. von Arnim and Iris-Tatjana Kolassa


Maike M. Kleemeyer, Thad A. Polk, Sabine Schaefer, Nils C. Bodammer, Lars Brechtel and Ulman Lindenberger

*398 The Impact of Aerobic Exercise on Fronto-Parietal Network Connectivity and its Relation to Mobility: An Exploratory Analysis of a 6-Month Randomized Controlled Trial*

Chun L. Hsu, John R. Best, Shirley Wang, Michelle W. Voss, Robin G. Y. Hsiung, Michelle Munkacsy, Winnie Cheung, Todd C. Handy and Teresa Liu-Ambrose


Milos Dordevic, Anita Hökelmann, Patrick Müller, Kathrin Rehfeld and Notger G. Müller


Chelsea M. Stillman, Jamie Cohen, Morgan E. Lehman and Kirk I. Erickson

*514 Morphological and Functional Differences Between Athletes and Novices in Cortical Neuronal Networks*

Xiao-Ying Tan, Yan-Ling Pi, Jue Wang, Xue-Pei Li, Lan-Lan Zhang, Wen Dai, Hua Zhu, Zhen Ni, Jian Zhang and Yin Wu

*524 The Effects of Modified Constraint-Induced Movement Therapy in Acute Subcortical Cerebral Infarction*

Changshen Yu, Wanjun Wang, Yue Zhang, Yizhao Wang, Weijia Hou, Shoufeng Liu, Chunlin Gao, Chen Wang, Lidong Mo and Jialing Wu

*533 Cognitive Resources Necessary for Motor Control in Older Adults are Reduced by Walking and Coordination Training*

Ben Godde and Claudia Voelcker-Rehage

#### SECTION 1

#### ACUTE EXERCISE EFFECTS

*541 Acute Exercise Improves Motor Memory Consolidation in Preadolescent Children*

Jesper Lundbye-Jensen, Kasper Skriver, Jens B. Nielsen and Marc Roig

*551 Movement-Related Cortical Potential Amplitude Reduction After Cycling Exercise Relates to the Extent of Neuromuscular Fatigue* Jérôme Nicolas Spring, Nicolas Place, Fabio Borrani, Bengt Kayser and Jérôme Barral

#### PART 4

#### MULTI-DOMAIN INTERVENTIONS


Kristina Küper, Patrick D. Gajewski, Claudia Frieg and Michael Falkenstein


Normand Teasdale, Martin Simoneau, Lisa Hudon, Mathieu Germain Robitaille, Thierry Moszkowicz, Denis Laurendeau, Louis Bherer, Simon Duchesne and Carol Hudon

*631 Cognitive Flexibility Training: A Large-Scale Multimodal Adaptive Active-Control Intervention Study in Healthy Older Adults* Jessika I. V. Buitenweg, Renate M. van de Ven, Sam Prinssen, Jaap M. J. Murre and K. Richard Ridderinkhof

# Editorial: Cognitive and Brain Plasticity Induced by Physical Exercise, Cognitive Training, Video Games, and Combined Interventions

Soledad Ballesteros<sup>1,2</sup>\*, Claudia Voelcker-Rehage<sup>3</sup> and Louis Bherer<sup>4,5,6</sup>

<sup>1</sup> Studies on Aging and Neurodegenerative Diseases Research Group, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain, <sup>2</sup> Department of Basic Psychology II, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain, <sup>3</sup> Institute of Human Movement Science and Health, Technische Universität Chemnitz, Chemnitz, Germany, <sup>4</sup> Department of Medicine, Université de Montréal, Montreal, QC, Canada, <sup>5</sup> Montreal Heart Institute, Montreal, QC, Canada, <sup>6</sup> Institut Universitaire de Gériatrie de Montréal, Montreal, QC, Canada

Keywords: cognitive training, lifespan, multi-domain intervention, neuroplasticity, physical training, randomized controlled trial, video games, working memory

**Editorial on the Research Topic**

#### **Cognitive and Brain Plasticity Induced by Physical Exercise, Cognitive Training, Video Games, and Combined Interventions**

This Research Topic (RT) focused on recent research on cognitive and brain plasticity induced by physical activity; by different types of cognitive training, including computerized interventions, learning therapy, and video games; by combined intervention approaches; and by other approaches that target brain activity, such as electroencephalography and neurofeedback. It contains 49 contributions to the topic: original research articles (37), clinical trials (2), reviews (5), mini-reviews (2), hypothesis and theory (1), and corrections (2).

The premise that neuroplasticity can enhance cognitive functioning in healthy as well as cognitively impaired individuals across the lifespan, and the potential of harnessing these processes to prevent cognitive decline, attract substantial scientific and public interest. Indeed, the systematic evidence base for cognitive training, video games, physical exercise, and other forms of brain stimulation, such as those that entrain brain activity, is growing rapidly. This paves the way for research aimed at better understanding the underlying mechanisms and at translating findings into clinical practice (Raz and Lindenberger, 2013). Studies in this field could improve our knowledge of cognitive and brain plasticity and be of great help in designing effective cognitive-enhancement interventions (see Karbach and Schubert, 2013). It is well known that brain plasticity and its role in brain adaptation across the lifespan are influenced by other changes resulting from environmental factors, personality variables, and genetic and epigenetic factors (see Ballesteros et al., 2015). To date, most cognitive training studies have focused on measuring gains immediately after training, typically demonstrating effects on the trained tasks or on closely related transfer measures (i.e., near transfer). Yet the potency of cognitive training depends on evidence of: (1) far transfer from training to untrained functions; (2) the durability of training effects, including which booster regimens are needed to maintain cognitive benefits in young and older adults; and (3) the extent to which cognitive training can affect clinically meaningful outcomes.

Researchers are increasingly using cognitive training platforms and video games to investigate their impact on cognition and brain plasticity. Video game play is a very popular leisure activity. An interesting preliminary question is why consumers choose to download smartphone applications (apps) offering brain training. Torous et al. found considerable interest in brain-training apps among younger people in the US. Results from an online survey of more than 3,000 participants suggest a high level of interest in this sort of program. However, the data point to the importance of expectations, as both naïve and app-exposed participants had a positive perception of brain training. These results suggest that people expect to improve by using brain-training apps. Importantly, brain training should be grounded in scientific evidence of efficacy and generalizable benefits, not in expectations of improvement (see Torous et al. correction).

#### Edited by:

Shuhei Yamaguchi, Shimane University, Japan

Reviewed by: Hasan Ayaz, Drexel University, United States

> \*Correspondence: Soledad Ballesteros mballesteros@psi.uned.es

Received: 22 March 2018; Accepted: 11 April 2018; Published: 07 May 2018

#### Citation:

Ballesteros S, Voelcker-Rehage C and Bherer L (2018) Editorial: Cognitive and Brain Plasticity Induced by Physical Exercise, Cognitive Training, Video Games, and Combined Interventions. Front. Hum. Neurosci. 12:169. doi: 10.3389/fnhum.2018.00169

Training with video games has been shown to moderately enhance perceptual and cognitive functions in young and older individuals (for meta-analyses, see Lampit et al., 2014; Toril et al., 2014). Video games are inexpensive, gratifying, and fun. Regular and occasional video game players have reported significantly higher levels of well-being, but gaming could potentially lead to addiction, a sedentary lifestyle, and social isolation. Maximizing the benefits of video games will require studies addressing questions such as: (1) what benefits should be expected from specific types of games; (2) how to account for individual differences; and (3) how to address expectancy bias and placebo effects in study designs.

Physical activity has been repeatedly shown to improve cognitive functioning in all age groups, particularly in older adults. This Frontiers RT includes articles that investigate (1) the dose-response relationships of different types of exercise, as well as the long-term effects of exercise on various cognitive domains; (2) functional and structural brain data, as well as behavioral data; (3) the association between acute and chronic exercise effects; and (4) the potential of combining physical and cognitive exercise to enhance cognitive performance and brain health.

In sum, the major aim of this RT is to provide the interested reader with an objective picture of the state of the art regarding the influence of different types of interventions on cognition and brain state across the lifespan. Special emphasis is placed on randomized controlled trials and longitudinal intervention studies assessing possible transfer effects on cognitive and brain health in older adults.

# COGNITIVE TRAINING IN YOUNG AND OLDER ADULTS WITH TRADITIONAL METHODS, COMPUTER-BASED ACTIVITIES, AND VIDEO GAMES

Several articles included in this Frontiers RT are longitudinal studies that used different types of cognitive training programs to improve various aspects of cognition, including working memory, executive control, and other cognitive functions, in young and older adults. Another series of articles dealt with neurophysiological methods (event-related potentials, ERPs) and with neurofeedback (NFB) as a form of brain stimulation.

In their hypothesis and theory paper, Moreau et al. critically discuss pervasive statistical flaws in intervention designs for training-induced cognitive enhancement, namely: (i) lack of power; (ii) sampling error; (iii) splitting of continuous variables; (iv) erroneous interpretations of correlated gain scores; (v) single transfer assessments; (vi) multiple comparisons; and (vii) publication bias. Similarly, Cremen and Carson asked, "Have standard tests of cognitive function been misappropriated in the study of cognitive enhancement?" They argue that the latent constructs to which tests of cognitive function relate are not usually subjected to a sufficient level of analytic scrutiny. In addition, they consider the neurophysiological correlates of these constructs.

Lawton compared three intervention conditions in second-grade schoolchildren with dyslexia. Two conditions targeted the temporal dynamics of either the auditory or the visual pathways, and the third was a reading control group. The results pointed to a lack of synchronization between the activity of the magnocellular and parvocellular visual pathways, rather than phonological deficits, as the main cause of dyslexia. Lawton proposed that visual movement direction-discrimination training could be used as a tool for the successful treatment of this deficit.

Nouchi et al. conducted an RCT to investigate whether "Learning Therapy" (LT) improved a wide range of cognitive functions in older adults. They found benefits of LT on inhibition, a component of executive function. Borella et al. explored whether, in older adults, age, education, vocabulary, and baseline performance in a working memory (WM) task predict the short- and long-term gains and transfer effects of verbal WM training. The authors selected four studies from their research group that adopted the same verbal WM training procedure. Using linear mixed models, they found evidence for the efficacy of training and some maintenance effects. In some cases, the individual characteristics considered helped to explain the effects of training. Maraver et al. provided computerized training to groups of young adults to investigate the effects of training WM or inhibitory control (IC), compared with two control groups, one passive and one active, which performed non-executive control tasks. The trained groups improved in the trained task. More important was the pattern of near transfer effects as a function of the type of training. Only the IC group showed far transfer to reasoning (Raven test). Interestingly, these findings were obtained with just six training sessions. Schmicker et al. asked whether training young adults in either attentional filtering or memory storage would influence decision-making, as assessed with the Iowa Gambling Task. All participants improved their performance in the trained task, but decision-making was more influenced by training to filter out irrelevant information than by training to store items in WM. These results suggest that selective attention is more important than WM storage for enhancing efficiency in decision-making.

Video games are perhaps the most popular computerized intervention approach for training different aspects of cognition. So far, the results of intervention studies are mixed, with some studies reporting improvements in several aspects of cognition after training (e.g., Basak et al., 2008; Anguera et al., 2013; Ballesteros et al., 2014) and others finding no positive effects (e.g., Ackerman et al., 2010; Owen et al., 2010; Boot et al., 2013). Palaus et al. reviewed the neural correlates of video game use. The final selection included 100 articles that provided functional data and 22 that measured structural brain changes. The authors established some links between neural and cognitive aspects, including attention, cognitive control, visuospatial abilities, cognitive workload, and reward processing.

Toril et al. conducted an intervention study with experimental and control groups to investigate whether cognitively healthy older adults trained with non-action video games improve in visuospatial working memory and episodic memory, and whether these possible enhancements would persist 3 months after the end of training. The group trained with games showed post-training improvements in visuospatial working memory, short-term memory, and episodic memory. Some results were maintained over a 3-month follow-up period. The authors concluded that older adults still retain some degree of plasticity and that video games seem to be an effective tool for improving some memory functions in aging. In contrast, in the RCT conducted by Buitenweg et al., healthy older adults were assigned to a frequent-switching or an infrequent-switching experimental condition or to an active control group, and all three groups showed significant time effects on multiple transfer tasks, probably as a result of expectancy and motivation. The authors concluded that the therapeutic value of using available training games to train the aging brain is modest. Possibly, different methods, such as stimulating social interaction or training in groups with individually adjusted, varied exercises, would produce better results.

Another type of training that is becoming more popular involves the use of music or rhythm within a cognitive or video game setup. For example, rhythmic training has been successfully used to improve motor performance (e.g., gait) as well as cognitive and language skills. Bégel et al. reviewed the games readily available on the market to determine whether they could be used for cognitive training in populations with motor or neurodevelopmental disorders (e.g., Parkinson's disease, ADHD). They concluded that none of the existing games provides sufficient temporal precision in stimulus presentation and/or data acquisition, and that the available music games are not satisfactory for implementing a rhythmic training protocol. The authors also provide guidelines for future serious music games targeting rhythmic training. The rationale for rhythmic training is supported by some studies. One example is the study of Pollok et al., which used transcranial direct current stimulation to investigate the potentially superior synchronization of professional drummers compared with non-musician controls. Their data support the hypothesis that the posterior parietal cortex is involved in auditory-motor synchronization and extend previous findings by showing that its functional significance varies with musical expertise.

Heinzel et al. investigated the effects of working memory training, and of training-related alterations in neural activity, on dual-tasking in older adults. The training group participated in 12 sessions of adaptive n-back training. At pre- and post-measurement, all participants performed a multimodal dual task to assess transfer effects. While no transfer to single-task performance was found, dual-task costs decreased at post-measurement in the training group but not in the control group. Changes in neural activity in the left dorsolateral prefrontal cortex (DLPFC) during the one-back condition predicted post-training auditory dual-task costs, whereas changes in the right DLPFC during the three-back condition predicted visual dual-task costs. These results might indicate an improvement in central executive processing that could facilitate both working memory and dual-task coordination.

Teasdale et al. investigated whether individuals with mild cognitive impairment (MCI) can benefit from a training program and improve their overall driving performance in a driving simulator. To this end, older drivers with MCI participated in five training sessions in a simulator. They showed a gradual and significant decrease in the number of errors, indicating learning and safer driving, and thus the possibility of maintaining driving skills and safe driving in individuals with MCI. Another clinical population with which cognitive training is often used is patients who have had a stroke, which often results in impairments in working memory, attention, and executive function. Van de Ven et al. conducted a systematic review of the evidence for computer-based cognitive training of executive dysfunction after stroke. They reported that cognitive training could lead to improvement in tasks similar to the training (near transfer) and in tasks dissimilar to the training (far transfer). Studies that evaluated neural effects found changes in both functional and structural connectivity. The authors concluded that, because most studies reporting positive findings, including neural changes, had methodological limitations, future research should address these limitations.

# COMBINED MULTI-DOMAIN INTERVENTIONS

Some studies examined the effects of multi-domain interventions, combining either different cognitive domains or cognitive training with fitness training regimes. For instance, Fraser et al. examined the dual-task benefits of combined physical and cognitive training in a sample of sedentary older adults, but failed to demonstrate a greater benefit of combined physical and cognitive training than of physical or cognitive training each paired with its respective control activity (stretching or computer training, respectively). Küper et al. used event-related potentials (ERPs) to examine the effects of multi-domain cognitive training on performance in an untrained cue-based task-switching paradigm featuring Stroop color words. Older adults were assigned to a 4-month multi-domain cognitive training group, a passive no-contact control group, or an active (social) control group. Only the cognitive training group showed an increase in response accuracy at posttest, irrespective of task and trial type. Cognitive training was also associated with an overall increase in N2 amplitude and a decrease in P2 latency on single trials, suggesting enhanced response selection and improved access to relevant stimulus-response mappings.

Recently, movement-based video games (exergames) have been introduced as a means of improving cognitive function in older adults. During exergaming, participants are required to perform physical activities while simultaneously immersed in a cognitively challenging environment. Ordnung et al. investigated the effects of 6 weeks of exergame training on cognitive, motor, and sensory functions in healthy older participants. However, gains in the trained exergames did not translate into specific performance improvements.

# NEUROFEEDBACK (NFB) AND ELECTROPHYSIOLOGICAL STUDIES

Neurofeedback (NFB) is becoming a popular method aimed at improving cognitive and behavioral performance, and it is used as a treatment intervention. The goal of this type of intervention is to induce changes in the power of certain electroencephalography (EEG) bands in order to produce beneficial changes in cognition and motor activity. Rogala et al. reviewed the evidence supporting the validity of several NFB protocols. The article highlights that the methodology used in most of the reviewed experiments did not enable proper targeting of the brain regions that control the desired cognitive changes, and it makes a series of recommendations for improving NFB training efficacy. Paluch et al. conducted an EEG-NFB experiment with young adults and concluded that the activity recorded by the EEG electrodes might be overwhelmed by electromyographic (muscle) signals, which are easier to control. The authors advise the NFB community to develop, validate, and implement efficient automatic artifact detection algorithms. Cowley et al. presented the results of an RCT that used NFB therapy for adults with Attention Deficit/Hyperactivity Disorder (ADHD/ADD). Preliminary results suggest that NFB improved self-reported ADHD symptoms but did not show transfer of learning to a computerized attention test (T.O.V.A.). Hudak et al. reported an effect of NFB in reducing impulsive behavior via the strengthening of frontal lobe functioning. Their randomized, controlled functional near-infrared spectroscopy (fNIRS) NFB intervention study tested its efficacy in a subgroup of adults with ADHD and high impulsivity. The reduction in commission errors on a no-go task and the increase in prefrontal oxygenated hemoglobin concentration in the experimental group only, together with other findings, suggest the potential of NFB to reduce impulsive behaviors by improving frontal lobe functioning. In addition, the authors argued that using virtual reality with NFB might improve the ecological validity of the training situation, which could positively affect the transfer of newly acquired skills to real life.

In an event-related potential (ERP) study conducted with young adults, Peters et al. investigated whether spatial frequency training modulates neural face perception. Interestingly, the authors showed that training effects obtained with a task using low-level stimuli transfer to performance on a higher-level object-processing task. Moreover, training the use of specific spatial frequency information was found to affect the neural processing of facial information. These findings may have practical applications for improving face recognition in people with atypical spatial frequency processing, such as people with cataracts or Autism Spectrum Disorder.

Gajewski et al. conducted an RCT to investigate whether cognitive training can reduce deficits in performance and EEG activity in workers with repetitive and unchallenging jobs. The results showed that the 3-month training protocol improved accuracy and affected the electrophysiological correlates of the retrieval of stimulus-response sets (P2), response selection (N2), and error detection (Ne). Importantly, most of the training-induced changes at the behavioral and EEG levels were maintained at the 3-month follow-up assessment. Cognitive training thus appears to be useful for improving executive functions in workers with unchallenging jobs.

Wei et al. investigated whether structural differences in the hippocampus could explain sex differences in a 3D mental rotation task. Males had a larger anterior hippocampus, and the gray matter volume of the right anterior hippocampus was significantly correlated with performance on the 3D mental rotation task and could explain the sex difference in mental rotation. These results suggest that the structural difference between males' and females' right anterior hippocampus is a neurobiological substrate of the sex difference in 3D mental rotation.

Hallock et al. presented the results of a meta-analysis of 14 published studies that investigated the effects of cognitive training on cognitive and functional measures in patients who have suffered a traumatic brain injury (TBI). They found a small but significant effect of cognitive training on overall cognitive outcomes, with no evidence of publication bias, and a moderate effect size for overall functional outcomes, with possible publication bias. In addition, among specific cognitive domains, the authors found significant effects only for executive function and verbal memory. The findings indicate that training is moderately effective in improving cognition and function in TBI patients.

# FITNESS EFFECTS ON COGNITION AND BRAIN ANATOMY

It is well accepted that an active cognitive and physical lifestyle can reduce the risk of cognitive decline and dementia with aging (Valenzuela and Sachdev, 2006; Ngandu et al., 2015). Physical training promotes cognitive and functional brain plasticity in older adults (Nagamatsu et al., 2012; Kattenstroth et al., 2013). An important question is whether cognitive and physical training can increase white matter integrity (WMI), which is deteriorated in cognitively impaired patients. Fissler et al. did not find evidence that short-term cognitive or physical training was related to changes in WMI (hippocampal and prefrontal white matter tracts) in older adults at risk of dementia, despite activity-related cognitive changes. However, the authors found positive associations between the two targeted training outcomes and WMI. This result leaves open the possibility that longer-term activities affect WMI. Fletcher et al. tested the hypothesis that, if cardiorespiratory fitness counteracts the negative effects of aging, the regions that show the greatest age-related volumetric loss should show the largest beneficial effects of physical exercise. Structural magnetic resonance imaging (MRI) data from 54 healthy elders showed that lower fitness and older age are associated with atrophy in different brain areas, but the profiles of age and fitness effects did not fully overlap. While some brain areas, such as the precentral gyrus, the superior temporal sulcus, and parts of the medial temporal lobe, were affected by both fitness and aging, other areas (regions of the frontal, parietal, and temporal cortex) were affected only by aging, and still others, in the basal ganglia, were affected only by fitness. These results support the idea that aging and fitness have differential effects on the brain and suggest that fitness cannot reverse all the negative effects of aging.

Wengaard et al. investigated the association between physical fitness, measured as maximal oxygen uptake (VO2max), muscle mass, and weekly training, and cognitive function in the executive domains of selective attention and inhibitory control in healthy male high-school students. Only maximal oxygen uptake was positively associated with cognitive function. Kleemeyer et al. addressed the theme of neural specificity, understood as the degree to which neural representations of different types of stimuli can be distinguished. Neural specificity declines with aging, and its reduction is associated with lower cognitive performance (Park et al., 2010). The authors concluded that physical activity might protect against age-related declines in neural specificity. The study tested the hypothesis that exercise-induced improvements in fitness would be related to greater neural specificity in a group of 52 older adults randomly assigned to a high-intensity or a low-intensity training group. The hypothesis was confirmed by the results of an fMRI experiment in which participants were presented with pictures of faces and buildings: participants whose physical fitness improved more also showed greater changes in neural specificity. Hsu et al. conducted a 6-month RCT to investigate the impact of aerobic exercise (AE) training on frontoparietal network connectivity in older adults. The results suggested that AE training improves mobility in older adults with mild subcortical ischemic cognitive impairment (see Hsu et al. correction).

Rehfeld et al. compared the effects of an 18-month dancing intervention and traditional health fitness training on the volumes of hippocampal subfields and on balance abilities. Both the dance group and the fitness group showed hippocampal volume increases, mainly in the left hippocampus. The dancers showed additional increases in the left dentate gyrus and the right subiculum. Moreover, only the dancers achieved a significant increase in the balance composite score. Hence, dancing is a promising candidate for counteracting the age-related decline in physical and mental abilities. Kandola et al. reviewed how aerobic exercise is associated with cognitive enhancements and stimulates a cascade of neuroplastic mechanisms that support improvements in hippocampal functioning. To this end, they summarized the animal and human literature. Using schizophrenia and major depressive disorder as examples, they argued for the utility of aerobic interventions in the clinical domain and discussed how they could be implemented.

Panda et al. investigated how meditation alters the default mode network (DMN), using simultaneous EEG and functional MRI to compare the spatial extent and temporal dynamics of the DMN during rest and meditation. They found alterations in the duration of the DMN microstate in meditators, highlighting the role of meditation practice in producing durable changes in the temporal dynamics of the DMN. Su et al. studied the effect of mindfulness training on pain perception. They compared participants' brain-behavior responses before and after a 6-week mindfulness-based stress reduction (MBSR) course, using pain questionnaires and resting-state fMRI. They observed that the pain-afflicted group experienced significantly less pain after the mindfulness training than before, in conjunction with increased brain connectivity. These results suggest that mindfulness training can modulate the brain network dynamics underlying the subjective experience of pain. Cakmak et al. investigated potential structural cortical plasticity in Sufi Whirling Dervishes, who practice a physically active form of meditation. Results showed significantly thinner cortex in the Sufi Whirling Dervishes than in the control group, both in the DMN and in brain areas involved in motion perception and discrimination.

Dordevic et al. conducted a feasibility study to assess the effect of 1 month of slackline training on different components of balance ability and its transfer to non-visual-dependent spatial orientation abilities. The training group performed significantly better in the eyes-closed conditions of the clinical balance test and in the vestibular condition of the orientation test, probably owing to a positive influence of slackline training on vestibular function. Beck et al. investigated whether and how academic achievement in children can benefit from specific types of motor activities (e.g., fine and gross) integrated into learning activities, here math lessons. They conducted a 6-week, within-school, cluster-randomized intervention study. The study demonstrates that motor-enriched learning activities can improve mathematical performance, particularly in normal math performers (and less so in low math performers).

Godde and Voelcker-Rehage conducted an intervention study to investigate whether a walking intervention and a motor control intervention reduce the cognitive brain resources that people recruit while performing motor tasks. Brain activation was assessed pre- and post-intervention while participants imagined walking forward and backward. The results showed a positive association between initial motor status and the decrease in dorsolateral prefrontal cortex activation from pre- to post-assessment in both trained groups, suggesting that training might be beneficial in situations where people have to perform motor and cognitive tasks at the same time.

Condello et al. conducted a cross-sectional study to investigate whether physical activity (PA) habits positively impact the performance of the orienting and executive control networks in community-dwelling older adults and diabetics, who are at risk of cognitive dysfunction. Results suggest that high PA levels exert beneficial but differentiated effects on processing speed and attentional network performance in aging individuals, partially counteracting the detrimental effects of advancing age and diabetic status. Nadeau et al. assessed the impact of 3 months of aerobic exercise training on a stationary bicycle on a set of gait parameters and executive functions in sedentary Parkinson's disease (PD) patients and healthy controls. Aerobic capacity, as well as motor learning and cognitive inhibition performance, increased significantly in both groups after the training regimen, but only PD patients improved their walking speed and cadence. In PD patients, training-related improvements in aerobic capacity correlated positively with improvements in walking speed. Gait improvements seem to be specific to the type of motor activity practiced during exercise (i.e., pedaling), whereas improvements in cognitive inhibition were rather unspecific to the type of training (i.e., related to improved cardiovascular capacity).

Tan et al. investigated cortical structural and functional differences between athletes and novices by comparing gray matter volumes and resting-state functional connectivity in 21 basketball players and 21 novices using MRI. They found larger gray matter volumes in basketball players than in novices in several regions (anterior insula, inferior frontal gyrus, inferior parietal lobule, and right anterior cingulate cortex). They also reported higher functional connectivity in the DMN, salience network, and executive control network in basketball players compared with novices. Yu et al. investigated the effects of modified constraint-induced movement therapy (CIMT) in acute subcortical cerebral infarction. The results showed positive effects immediately after treatment, but no long-term effects were found.

In a very interesting review, Stillman et al. used a macroscopic lens to identify potential brain and behavioral/socioemotional mediators of the association between physical activity and cognitive function. They first summarized what is known regarding cellular and molecular mechanisms, and then discussed evidence for brain systems and behavioral/socioemotional pathways by which physical activity could affect cognition. The review proposes a number of potential moderators and mediators of the relationship between physical activity and cognitive performance. It offers a theoretical context that could be useful for organizing current scientific knowledge regarding physical activity and brain structure and function. It could also lead to a more complete characterization of the processes by which physical activity influences neurocognitive function, as well as a greater variety of targets for modifying neurocognitive function in clinical contexts.

Finally, some studies examined the effects of acute exercise. For instance, Spring et al. showed that cardiovascular exercise could reduce movement-related cortical potentials, assessed with EEG, during a knee extension task; this reduction was related to muscle alterations and resulted in an inability to produce a maximal voluntary contraction.

Lundbye-Jensen et al. investigated whether acute exercise following motor skill practice in a school setting can improve long-term retention of motor memory in preadolescent children. They showed that acute intense intermittent exercise performed immediately after motor skill acquisition facilitates long-term motor memory in preadolescent children, presumably by promoting memory consolidation.

To summarize, the review and research articles that compose this Frontiers RT provide comprehensive information on the importance of different types of interventions for enhancing cognitive functions across the lifespan. We hope that this RT will prompt critical thinking within the scientific community on the possibilities of improving cognition and well-being, and provide clues for conducting further intervention studies in the coming years. We also hope that the information included in this RT will encourage the scientific community to develop new research projects aimed at overcoming some of the shortcomings in the field of neuroplasticity, promoting cognitive enhancement and improving the quality of life of young and older adults.

# AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

## ACKNOWLEDGMENTS

Grants from Ministerio de Economía y Competitividad (PSI2013-41409-R; PSI2016-80337-R) supported SB.

## REFERENCES

cognitive decline: attitudes, compliance and effectiveness. Front. Psychol. 4:31. doi: 10.3389/fpsyg.2013.00031


decline in at-risk elderly people (FINGER). A randomized controlled trial. Lancet 385, 2255–2263. doi: 10.1016/S0140-6736(15)60461-5


Valenzuela, M. J., and Sachdev, P. (2006). Brain reserve and cognitive decline: a non-parametric systematic review. Psychol. Med. 36, 1065–1073. doi: 10.1017/S0033291706007747

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Ballesteros, Voelcker-Rehage and Bherer. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Seven Pervasive Statistical Flaws in Cognitive Training Interventions

#### David Moreau\*, Ian J. Kirk and Karen E. Waldie

Centre for Brain Research and School of Psychology, University of Auckland, Auckland, New Zealand

The prospect of enhancing cognition is undoubtedly among the most exciting research questions currently bridging psychology, neuroscience, and evidence-based medicine. Yet, convincing claims in this line of work stem from designs that are prone to several shortcomings, thus threatening the credibility of training-induced cognitive enhancement. Here, we present seven pervasive statistical flaws in intervention designs: (i) lack of power; (ii) sampling error; (iii) continuous variable splits; (iv) erroneous interpretations of correlated gain scores; (v) single transfer assessments; (vi) multiple comparisons; and (vii) publication bias. Each flaw is illustrated with a Monte Carlo simulation to present its underlying mechanisms, gauge its magnitude, and discuss potential remedies. Although not restricted to training studies, these flaws are typically exacerbated in such designs, due to ubiquitous practices in data collection or data analysis. The article reviews these practices, so as to avoid common pitfalls when designing or analyzing an intervention. More generally, it is also intended as a reference for anyone interested in evaluating claims of cognitive enhancement.

Keywords: brain enhancement, evidence-based interventions, working memory training, intelligence, methods, data analysis, statistics, experimental design

#### Edited by:

Claudia Voelcker-Rehage, Technische Universität Chemnitz, Germany

#### Reviewed by:

Tilo Strobach, Medical School Hamburg, Germany

Florian Schmiedek, German Institute for International Educational Research, Germany

#### \*Correspondence:

David Moreau d.moreau@auckland.ac.nz

Received: 10 February 2016; Accepted: 28 March 2016; Published: 14 April 2016

#### Citation:

Moreau D, Kirk IJ and Waldie KE (2016) Seven Pervasive Statistical Flaws in Cognitive Training Interventions. Front. Hum. Neurosci. 10:153. doi: 10.3389/fnhum.2016.00153

# INTRODUCTION

Can cognition be enhanced via training? Designing effective interventions to enhance cognition has proven to be one of the most promising and difficult challenges of modern cognitive science. Promising, because the potential is enormous, with applications ranging from developmental disorders to cognitive aging, dementia, and traumatic brain injury rehabilitation. Yet difficult, because establishing sound evidence for an intervention is particularly challenging in psychology: the gold standard of double-blind randomized controlled experiments is not always feasible, due to logistical constraints or to common difficulties in disguising the underlying hypothesis of an experiment. These limitations have important consequences for the strength of evidence in favor of an intervention. Several of them have been extensively discussed in recent years, resulting in stronger, more valid designs.

For example, the importance of using active control groups has been underlined in many instances (e.g., Boot et al., 2013), helping conscientious researchers move away from no-contact controls, which were standard in the not-so-distant past. Equally important is the emphasis on objective measures of cognitive abilities rather than self-report assessments, and on the necessity to use multiple measurements of single abilities to provide better estimates of cognitive constructs and minimize measurement error (Shipstead et al., 2012). Other limitations pertinent to training designs have been illustrated elsewhere with simulations (e.g., fallacious assumptions, Moreau and Conway, 2014; biased samples, Moreau, 2014b), in an attempt to illustrate visually some of the discrepancies observed in the literature. Indeed, much knowledge can be gained by applying simulated data to complex research problems (Rubinstein and Kroese, 2011), either because these problems are difficult to visualize or because the representation of their outcomes is ambiguous. Intervention studies are no exception: they often include multiple extraneous variables, and thus benefit greatly from informed estimates of the respective influence of each predictor variable in a given model. As it stands, the approach typically favored is to assume that good experimental practices (e.g., random assignment, representative samples) control for such problems. In practice, however, numerous designs and subsequent analyses do not adequately allow such inferences, due to single or multiple flaws. We explore here some of the most prevalent of these flaws.

Our objective is three-fold. First, we aim to bring attention to core methodological and statistical issues when designing or analyzing training experiments. Using clear illustrations of how pervasive these problems are, we hope to help design better, more potent interventions. Second, we stress the importance of simulations to improve the understanding of research designs and data analysis methods, and the influence they have on results at all stages of a multifactorial project. Finally, we also intend to stimulate broader discussions by reaching wider audiences, and help individuals or organizations assess the effectiveness of an intervention to make informed decisions in light of all the evidence available, not just the most popular or the most publicized information. We strive, throughout the article, to make every idea as accessible as possible and to favor clear visualizations over mathematical jargon.

A note on the structure of the article. For each flaw we discuss, we include three steps: (1) a brief introduction to the problem and a description of its relation to intervention designs; (2) a Monte Carlo simulation and its visual illustration<sup>1</sup>; and (3) advice on how to circumvent the problem or minimize its impact. Importantly, the article is not intended to be an in-depth analysis of each flaw discussed; rather, our aim is to provide visual representations of each problem and the tools necessary to assess the consequences of common statistical procedures. However, because the problems we discuss hereafter are complex and deserve further attention, we refer the interested reader to additional literature throughout the article.

A slightly more technical question pertains to the use of Monte Carlo simulations. Broadly speaking, Monte Carlo methods refer to the use of computational algorithms to simulate repeated random sampling in order to obtain numerical estimates of a process. The idea that we can refine knowledge by simulating stochastic processes repeatedly rather than via more traditional procedures (e.g., direct integration) might be counterintuitive, yet this method is well suited to the specific examples we present here, for a few reasons. Repeated stochastic simulations allow us to create mathematical models of ecological processes: the repetition represents research groups throughout the world randomly sampling from the population and conducting experiments. Such simulations are also particularly useful in complex problems where a number of variables are unknown or difficult to assess, as they can provide an account of the values a statistic can take when constrained by initial parameters, or a range of parameters. Finally, Monte Carlo simulations can be clearly represented visually. This facilitates the graphical translation of a mathematical simulation, thus allowing a discussion of each flaw with little statistical or mathematical background.
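As a minimal illustration of this idea (a Python sketch using only the standard library, not the authors' R code), the snippet below estimates one quantity in two ways: the probability that a standard normal variable exceeds 1.96, once by repeated random sampling and once from the closed-form normal cumulative distribution function. The function name, seed, and number of draws are arbitrary choices made for this example.

```python
import random
from statistics import NormalDist


def monte_carlo_tail(threshold=1.96, n_draws=200_000, seed=42):
    """Estimate P(Z > threshold) for a standard normal Z by repeated random sampling."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    hits = sum(1 for _ in range(n_draws) if rng.gauss(0.0, 1.0) > threshold)
    return hits / n_draws


estimate = monte_carlo_tail()
exact = 1 - NormalDist().cdf(1.96)  # analytic answer from the normal CDF
print(f"Monte Carlo estimate: {estimate:.4f}   closed form: {exact:.4f}")
```

With more draws, the simulated estimate converges on the closed-form value. The same logic carries over to quantities that have no convenient closed form, which is where simulation is most useful.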

# LACK OF POWER

We begin our exploration with a problem that is pervasive in almost all experimental designs, and particularly in training interventions: low statistical power. In a frequentist framework, two types of errors can arise at the decision stage of a statistical analysis: Type I (false positive, probability α) and Type II (false negative, probability β). The former occurs when the null hypothesis (H<sub>0</sub>) is true but rejected, whereas the latter occurs when the alternative hypothesis (H<sub>A</sub>) is true but H<sub>0</sub> is retained. In the context of an intervention, a Type II error means that the experimental treatment was effective but statistical inference led to the erroneous conclusion that it was not. Accordingly, the power of a statistical test is the probability of rejecting H<sub>0</sub> given that it is false, that is, 1−β: the more power, the lower the probability of Type II errors. Importantly, higher statistical power translates to a better chance of detecting an effect if it exists, but also to a better chance that an effect is genuine if it is significant (Button et al., 2013). Obviously, it is preferable to minimize β, which is equivalent to maximizing power.

Because α is set arbitrarily by the experimenter, power could be increased by directly increasing α. This simple solution, however, has an important pitfall: since α represents the probability of Type I errors, any increase will produce more false positives (rejections of H<sub>0</sub> when it should be retained) in the long run. Therefore, in practice experimenters need to take into account the tradeoff between Type I and Type II errors when setting α. Typically, α < β, because missing an existing effect (β) is thought to be less prejudicial than falsely rejecting H<sub>0</sub> (α); however, specific circumstances where the emphasis is on discovering new effects (e.g., exploratory approaches) sometimes justify α increases (for example, see Schubert and Strobach, 2012).

Discussions regarding experimental power are not new. Issues related to power have long been discussed in the behavioral sciences, yet they have drawn heightened attention recently (e.g., Button et al., 2013; Wagenmakers et al., 2015), for good reasons: when power is low, relevant effects might go undetected, and significant results often turn out to be false positives<sup>2</sup>. Besides α and β levels, power is also influenced by sample size and effect size (**Figure 1A**). The latter depends on the question of interest and the design, with various strategies intended to maximize the effect one wishes to observe (e.g., well-controlled conditions). Noise is often exacerbated in training interventions, because such designs potentially increase sources of non-sampling errors, for example via poor retention rates, failure to randomly assign participants, use of nonstandardized tasks, use of single measures of abilities, or failure to blind participants and experimenters. Furthermore, the influence of multiple variables is typically difficult to estimate (e.g., extraneous factors), and although random assignment is usually thought to control for this limitation, it has been demonstrated repeatedly that this assumption is highly dependent on sample size, with typical designs rarely being satisfactory in this regard (Cohen, 1992b). As a result, the preferred solution to increase power is typically to adjust sample sizes (**Figure 1B**).

<sup>1</sup> Step (2) was implemented in R (R Core Team, 2014) because of its growing popularity among researchers and data scientists (Tippmann, 2015), and because R is free and open-source, thus allowing anyone, anywhere, to reproduce and build upon our analyses. We used R version 3.1.2 (R Core Team, 2014) and the following packages: ggplot2 (Wickham, 2009), gridExtra (Auguie, 2012), MASS (Venables and Ripley, 2002), MBESS (Kelley and Lai, 2012), plyr (Wickham, 2011), psych (Revelle, 2015), pwr (Champely, 2015), and stats (R Core Team, 2014).

Power analyses are especially relevant in the context of interventions because sample size is usually limited by the design and its inherent costs—training protocols require participants to come back to the laboratory multiple times for testing and in some cases for the training regimen itself. Yet despite the importance of precisely determining power before an experiment, power analyses include several degrees of freedom that can radically change outcomes and thus recommended sample sizes (Cohen, 1992b). As informative as it may be, gauging the influence of each factor is difficult using power analyses in the traditional sense, that is, varying factors one at a time. This problem can be circumvented by Monte Carlo methods, where one can visualize the influence of each factor in isolation and in conjunction with one another.
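The sketch below shows this Monte Carlo approach to power in Python (standard library only; it is not the authors' R implementation). To stay self-contained it assumes normally distributed gain scores with a known SD of 1 and a two-sided z-test, so power can also be computed exactly as a check; the t-test used in practice has slightly lower power in small samples because the SD must be estimated. The function names and the grid of sample and effect sizes are illustrative choices.

```python
import random
import statistics
from statistics import NormalDist


def simulated_power(n_per_group, d, alpha=0.05, n_sims=2000, seed=0):
    """Share of simulated experiments in which a two-sided z-test rejects H0.

    Gain scores are drawn from N(0, 1) in the control group and N(d, 1) in the
    experimental group, so d is the true standardized effect."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (2 / n_per_group) ** 0.5  # SD of the difference between two group means
    rejections = 0
    for _ in range(n_sims):
        control = statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n_per_group))
        treated = statistics.fmean(rng.gauss(d, 1.0) for _ in range(n_per_group))
        if abs((treated - control) / se) > z_crit:
            rejections += 1
    return rejections / n_sims


def exact_power(n_per_group, d, alpha=0.05):
    """Closed-form power of the same z-test, for checking the simulation."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5  # expected value of the z statistic
    return (1 - NormalDist().cdf(z_crit - shift)) + NormalDist().cdf(-z_crit - shift)


# Vary sample size and effect size jointly rather than one factor at a time.
for n in (20, 64):
    for d in (0.2, 0.5, 0.8):
        print(f"n = {n:2d} per group, d = {d}: simulated power = "
              f"{simulated_power(n, d):.2f} (exact {exact_power(n, d):.2f})")
```

In this setup none of the three effect sizes reaches a power of .80 with 20 participants per group, whereas d = 0.5 reaches roughly .80 with 64 per group.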

<sup>2</sup>For a given α and effect size, low power results in low Positive Predictive Value (PPV), that is, a low probability that a significant effect observed in a sample reflects a true effect in the population. The PPV is closely related to the False Discovery Rate (FDR) mentioned in the section on multiple comparisons of this article, such that PPV + FDR = 1.
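The relation in this footnote can be made concrete with the formula used by Button et al. (2013), PPV = (1 − β)R / ((1 − β)R + α), where R is the pre-study odds that a tested effect is real. The Python sketch below is illustrative only: the value R = 0.25 is an assumed example, not an estimate for cognitive training research.

```python
def positive_predictive_value(power, alpha, prior_odds):
    """PPV = power * R / (power * R + alpha), with R the pre-study odds of a true effect."""
    # Per null hypothesis tested, R real effects are tested; power * R of them are detected,
    # while alpha false positives are expected.
    true_positives = power * prior_odds
    return true_positives / (true_positives + alpha)


# Hold alpha and the (assumed) prior odds fixed and lower power.
for power in (0.8, 0.5, 0.2):
    ppv = positive_predictive_value(power, alpha=0.05, prior_odds=0.25)
    print(f"power = {power:.1f}: PPV = {ppv:.2f}, FDR = {1 - ppv:.2f}")
```

In this example, lowering power from .80 to .20 cuts the PPV from .80 to .50, so half of the significant findings would be false positives.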

Suppose, for example, that we wish to evaluate the effectiveness of an intervention by comparing gain scores in experimental and control groups. Using a two-sample t-test with two groups of 20 subjects, and assuming α = .05 and 1−β = .80, an effect size needs to be about d = 0.9 or greater to be detected (**Figure 1C**). Any weaker effect would frequently go undetected. This concern is particularly important when considering how conservative our example is: a power of .80 is fairly rare in typical training experiments, and an effect of d = 0.9 is large by conventional standards (Cohen, 1988). Even an increase of half a standard deviation (d = 0.5), typically defined as "medium" in the behavioral sciences, is particularly consequential in training interventions, given potential applications and the inherent noise of such studies.

We should emphasize that we are not implying that every significant finding with low power should be discarded; however, caution is warranted when underpowered studies coincide with unlikely hypotheses, as this combination can lead to high rates of Type I errors (Krzywinski and Altman, 2013; Nuzzo, 2014). Given the typical lack of power in the behavioral sciences (Cohen, 1992a; Button et al., 2013), the current emphasis on replication (Pashler and Wagenmakers, 2012; Baker, 2015; Open Science Collaboration, 2015) is an encouraging step, as it should allow extracting more signal from noisy, underpowered experiments in the long run. Statistical power directly informs the reader about two elements: if an effect is there, what is the probability to detect it, and if an effect was detected, what is the probability that it was genuine? These are critical questions in the evaluation of scientific evidence, and especially in the field of cognitive training, setting the stage for the central role of power in all the problems discussed henceforth.

# SAMPLING ERROR

A pernicious consequence of low statistical power is sampling error. Because a sample is an approximation of the population, a point estimate or statistic calculated for a specific sample may differ from the underlying parameter in the population (**Figures 2A,B**). For this reason, most statistical procedures take into account sampling error, and experimenters try to minimize its impact, for example by controlling confounding factors, using valid and reliable measures, and testing powerful manipulations. Despite these precautions, sampling error can obscure experimental findings in an appreciable number of occurrences (Schmidt, 1992). We provide below an example of its detrimental effect.
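The following Python sketch (standard library only) makes sampling error visible: each simulated "lab" draws a sample from the same normal population and reports its mean. The population values (mean 100, SD 15, an IQ-like scale) and the sample sizes are hypothetical choices for illustration. The spread of the sample means should match the analytic standard error, σ/√n.

```python
import random
import statistics


def sample_means(n, n_labs=5000, mu=100.0, sigma=15.0, seed=7):
    """Each simulated lab draws n scores from N(mu, sigma) and reports the sample mean."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(n_labs)]


for n in (20, 80):
    means = sample_means(n)
    print(f"n = {n}: SD of sample means = {statistics.stdev(means):.2f} "
          f"(sigma / sqrt(n) = {15.0 / n ** 0.5:.2f}), "
          f"range {min(means):.1f} to {max(means):.1f}")
```

Even though every lab samples from a population whose mean is exactly 100, individual labs with n = 20 can report means several points away from it; quadrupling the sample size halves that spread.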

Let us consider a typical scenario in intervention designs. Assume we randomly select a sample of 40 individuals from an underlying population and assign each participant either to the experimental or the control group. We now have 20 participants in each group, which we assume are representative of the whole population. This assumption, however, is rarely met in typical designs (e.g., Campbell and Stanley, 1966). In small samples, sampling error can have important consequences, especially when individual characteristics are not homogeneously represented in the population. Such differences can be based upon neural plasticity, learning potential, motivational traits, or any other individual characteristic. When sampling from a heterogeneous population, groups might not be matched despite random assignment (e.g., Moreau, 2014b).
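To see how often random assignment leaves groups unmatched, the Python sketch below assumes a hypothetical population in which 20% of people are "high responders" (for instance, people with high learning potential), draws 40 participants, and randomly assigns 20 to each group. It counts how often the groups differ by four or more high responders and checks the simulation against the exact probability obtained from two independent binomial counts. The 20% rate and the four-person threshold are arbitrary illustrative values.

```python
import math
import random


def simulated_imbalance(n_per_group=20, p_high=0.2, min_gap=4, n_sims=10_000, seed=1):
    """Share of simulated experiments whose groups differ by >= min_gap high responders."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Draw 2n participants; True marks a (hypothetical) high responder.
        sample = [rng.random() < p_high for _ in range(2 * n_per_group)]
        rng.shuffle(sample)  # simple random assignment: first half experimental, rest control
        experimental = sum(sample[:n_per_group])
        control = sum(sample[n_per_group:])
        if abs(experimental - control) >= min_gap:
            hits += 1
    return hits / n_sims


def exact_imbalance(n_per_group=20, p_high=0.2, min_gap=4):
    """Same probability computed exactly: each group's count is Binomial(n_per_group, p_high)."""
    pmf = [math.comb(n_per_group, k) * p_high ** k * (1 - p_high) ** (n_per_group - k)
           for k in range(n_per_group + 1)]
    return sum(pmf[x] * pmf[y]
               for x in range(n_per_group + 1)
               for y in range(n_per_group + 1)
               if abs(x - y) >= min_gap)


print(f"simulated: {simulated_imbalance():.3f}   exact: {exact_imbalance():.3f}")
# With 100 per group, the same relative gap (20 people, i.e. 20% of a group) is far rarer.
print(f"exact, 100 per group, gap >= 20: {exact_imbalance(100, 0.2, 20):.4f}")
```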

In addition, failure to take into account extraneous variables is not the only problem with sampling. Another common weakness relates to differences in pretest scores. As set by α, random sampling will generate significantly different baseline scores on a given task 5% of the time in the long run, despite drawing from the same underlying population (see **Figure 2C**). This is not trivial, especially considering that less sizeable discrepancies can significantly influence the outcome of an intervention, as training or testing effects might exacerbate a difference undetected initially.
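The 5% figure is easy to verify by simulation. In the Python sketch below (standard library only), both groups' pretest scores come from the same N(0, 1) population, so every significant baseline difference is a false positive. For simplicity it assumes the population SD is known and uses a z-test. It also records, among the non-significant comparisons, how often the groups still differ by more than 0.3 SD, an arbitrary threshold chosen to illustrate "less sizeable" discrepancies.

```python
import random
import statistics
from statistics import NormalDist


def baseline_differences(n_per_group=20, n_sims=5000, alpha=0.05, gap=0.3, seed=3):
    """Return (share of significant baseline differences,
    share of non-significant ones whose mean gap still exceeds `gap` SD)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (2 / n_per_group) ** 0.5  # known-SD standard error of the mean difference
    significant = 0
    sizable_but_ns = 0
    for _ in range(n_sims):
        diff = (statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n_per_group))
                - statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n_per_group)))
        if abs(diff / se) > z_crit:
            significant += 1
        elif abs(diff) > gap:
            sizable_but_ns += 1
    return significant / n_sims, sizable_but_ns / (n_sims - significant)


false_pos, hidden_gaps = baseline_differences()
print(f"significant baseline differences: {false_pos:.3f}")
print(f"non-significant but > 0.3 SD apart: {hidden_gaps:.3f}")
```

With 20 participants per group, roughly 30% of the comparisons that pass a baseline check still start with a gap larger than 0.3 SD.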

There are different ways to circumvent this problem, and one that has recently received particular attention in training interventions is to increase power. As we mentioned in the previous section, this can be accomplished by using larger samples, by studying larger effects, or both (**Figure 2D**). But these adjustments are not always feasible. To restrict the influence of sampling error, another potential remedy is to factor pretest performance on the dependent variable into group allocation, via restricted randomization. The idea is to ensure that random assignment has been effective at distributing predefined characteristics (e.g., scores, demographics, physiological correlates) evenly across the experimental conditions. If groups are imbalanced, a simple remedy is to repeat the random assignment procedure until the conditions are satisfied. This is sometimes impractical, however, especially with multiple variables to balance. Alternatively, one can constrain random assignment a priori based on pretest scores, via stratified sampling (e.g., Aoyama, 1962). Non-random methods of group assignment are sometimes used in training studies (Spence et al., 2009; Loosli et al., 2012; Redick et al., 2013). One such method, blocking, consists of dividing participants into homogeneous groups based on their pretest scores on a given variable (Addelman, 1969). In a second step, random assignment is performed with equal draws from each of these groups, so as to preserve the initial heterogeneity within each experimental group. Other, more advanced approaches can be used (for a review, see Green et al., 2014), yet the rationale remains the same, that is, to reduce the influence of initial discrepancies on the outcome of an intervention.
We should point out that these procedures bring problems of their own (Ericson, 2012). With small samples, no method of assignment is perfect, and one needs to decide on the most suitable approach based on the specific design and hypotheses. In the interest of transparency, it is therefore important to report how group assignment was performed, particularly when it departed from typical (i.e., simple) randomization.
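Blocking on pretest scores can be sketched in a few lines. This is one simple variant under our own assumptions, not the specific procedure used in the studies cited above. Blocks are formed from consecutive ranks, each block contains as many participants as there are conditions, and members of a block are shuffled and dealt one per condition. The function names and the simulated pretest scores are hypothetical.

```python
import random


def blocked_assignment(pretest, n_groups=2, seed=0):
    """Assign participants to n_groups conditions by blocking on pretest scores.

    Participants are ranked by pretest score and cut into consecutive blocks of
    n_groups similar scorers; within each block, members are shuffled and dealt
    one per condition. Returns a list of index lists, one per condition.
    """
    rng = random.Random(seed)
    ranked = sorted(range(len(pretest)), key=lambda i: pretest[i])
    groups = [[] for _ in range(n_groups)]
    for start in range(0, len(ranked), n_groups):
        block = ranked[start:start + n_groups]  # the last block may be incomplete
        rng.shuffle(block)
        for condition, participant in enumerate(block):
            groups[condition].append(participant)
    return groups


def group_means(pretest, groups):
    """Mean pretest score of each condition."""
    return [sum(pretest[i] for i in g) / len(g) for g in groups]


rng = random.Random(42)
pretest = [rng.gauss(100, 15) for _ in range(40)]  # hypothetical pretest scores
experimental, control = blocked_assignment(pretest)
print(group_means(pretest, [experimental, control]))  # pretest means nearly equal
```

Because each pair of adjacent scorers is split between conditions, the group means can differ only by the average within-pair gap. This is exactly the property blocking is meant to secure. As noted above, the chosen procedure should be reported.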

## CONTINUOUS VARIABLE SPLITS

Lack of power and its related issue, sampling error, are two limitations of experimental designs that often require substantial investment to remedy. Conversely, splitting a continuous variable is a deliberate decision at the analysis stage. Although popular in intervention studies, it is rarely, if ever, justified.

FIGURE 2 | Sampling error. (A) Cognitive training experiments often assume a theoretical ideal where the sample (orange curve) is perfectly representative of the true underlying distribution (green curve). (B) However, the sample may not be representative of the population of interest due to sampling error, a situation that can lead to dubious claims regarding the effectiveness of a treatment. (C) Assuming α = .05 in a frequentist framework, two groups drawn from the same underlying distribution will differ significantly 5% of the time, but more subtle departures from the population are also likely to influence training outcomes meaningfully. The graph shows the cumulative sum of two-sample t-test p-values divided by the number of tests performed, based on a Monte Carlo simulation (N = 10,000) of 40 individual IQ scores (normally distributed, M = 100, SD = 15) randomly divided into two groups (experimental, control). The red line shows P = .5. (D) Even when both groups are drawn from a single normal distribution (H<sub>0</sub> is true), small sample sizes will spuriously produce substantial differences (absolute median effect size, blue line), as illustrated here with another Monte Carlo simulation (N = 10,000). As group samples get larger, effect size estimates get closer to 0. Red lines represent the 25th and 75th percentiles, and the orange line is a loess smooth of the median.

Typically, a continuous variable reflecting performance change throughout training is split into a categorical variable, often a dichotomous one. Because the idea is to distinguish individuals who respond to the training regimen from those who do not benefit as much, this approach is often called "responder analysis." Most commonly, the dichotomization is achieved via a median split, that is, by finding the median score on a continuous variable (e.g., training performance) and splitting subjects into those below and those above this score (e.g., low responders vs. high responders).

Median splits are almost always prejudicial (Cohen, 1983), and their use often reflects a lack of understanding of the consequences involved (MacCallum et al., 2002). A full account of the problems associated with this practice is beyond the scope of this article, but the main harms are loss of power and of information, reduction of effect sizes, and inconsistencies in the comparison of results across studies (Allison et al., 1993). Turning a continuous variable into a dichotomy also implies that the original continuum was irrelevant, and that the true nature of the variable is dichotomous. This is seldom the case.
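The loss of information can be made concrete. For a normally distributed predictor, a standard result is that dichotomizing at the median shrinks its correlation with an outcome to about √(2/π) ≈ .80 of the original value. The sketch below checks this numerically. The population correlation (ρ = .5), the sample size, and the function names are hypothetical choices.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def median_split_attenuation(rho=0.5, n=20000, seed=3):
    """Correlation with an outcome before and after median-splitting the predictor."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rho * xi + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for xi in x]
    cut = statistics.median(x)
    x_split = [1.0 if xi > cut else 0.0 for xi in x]  # "low" vs. "high" responders
    return pearson(x, y), pearson(x_split, y)


r_full, r_split = median_split_attenuation()
print(round(r_full, 2), round(r_split, 2), round(r_split / r_full, 2))
```

A 20% drop in the correlation translates into a substantial loss of power, roughly comparable to discarding a third of the sample.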

In intervention designs, a detrimental consequence of turning continuous variables into categorical ones and separating low and high performers post hoc is the risk of regression toward the mean (Galton, 1886). Regression toward the mean is one of the best-known byproducts of multiple measurements, yet it is possibly one of the least understood (Nesselroade et al., 1980). As with all the notions discussed in this article, regression toward the mean is not exclusive to training experiments; however, in these designs, potential confounds with testing effects make its magnitude more difficult to estimate.

In short, regression toward the mean is the tendency for an extreme observation, that is, one far from the mean, to be closer to the mean on a second measurement. When a population is normally distributed, extreme scores are less likely than average scores, which makes two extreme scores in a row unlikely. Regression toward the mean is the consequence of imperfect correlations between scores from one session to the next. Singling out an extreme score on a specific measure therefore increases the likelihood that it will regress toward the mean on another measurement.
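The size of the effect can be worked out under simplifying assumptions. These are ours, not the authors': pretest (X) and posttest (Y) scores are bivariate normal with the same mean μ and the same standard deviation, the test-retest correlation is ρ, and there is no training or testing effect. The expected posttest score and gain for a given pretest score x then follow directly:

```latex
\begin{aligned}
\mathbb{E}[Y \mid X = x] &= \mu + \rho\,(x - \mu) \\
\mathbb{E}[Y - X \mid X = x] &= (\rho - 1)(x - \mu) = (1 - \rho)(\mu - x)
\end{aligned}
```

Whenever ρ < 1, the expected gain is positive for any below-average pretest score and negative for any above-average one, and it grows with the distance from the mean. With ρ = .7, for instance, a participant scoring 30 points below the mean at pretest is expected to gain 9 points without any intervention.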

This phenomenon might be puzzling because it seems to violate the assumption of independent events. Indeed, regression toward the mean can be mistaken for a deterministic linear change from one measurement to the next. In fact, it simply reflects a property of bivariate distributions in which the correlation between two variables X and Y is less than 1 in absolute value: the value of Y expected for a given value x of X is closer to the mean of Y than x is to the mean of X, provided both are expressed in standard deviation units (Nesselroade et al., 1980). This is easier to visualize graphically. The more a score deviates from the mean on one measurement (**Figure 3A**), the more it will regress toward the mean on a second measurement, independently of any training effect (i.e., assuming no improvement from pretest to posttest). This effect is exacerbated after splitting a continuous variable (**Figure 3B**), as absolute gains are influenced by the deviation of pretest scores from the mean, irrespective of genuine improvement (**Figure 3C**).
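The scenario of Figure 3B can be approximated with a short simulation. This is a minimal sketch under stated assumptions, not the authors' code. The population parameters (N = 10,000, M = 100, SD = 15) follow the Figure 3 caption, but the test-retest correlation of ρ = .7 is our hypothetical choice because the caption does not state one. Each observed score is a stable ability plus independent noise, and the posttest adds no training effect at all.

```python
import math
import random
import statistics


def split_gains(n=10000, rho=0.7, seed=7):
    """Mean pretest-to-posttest gain for the bottom pretest quartile and for
    everyone else, with NO training effect and a test-retest correlation rho."""
    rng = random.Random(seed)
    # Split the observed SD of 15 into stable ability and measurement noise,
    # so that corr(pretest, posttest) equals rho.
    ability_sd, noise_sd = 15 * math.sqrt(rho), 15 * math.sqrt(1 - rho)
    pre, post = [], []
    for _ in range(n):
        ability = rng.gauss(100, ability_sd)
        pre.append(ability + rng.gauss(0, noise_sd))   # pretest = ability + noise
        post.append(ability + rng.gauss(0, noise_sd))  # posttest: fresh noise only
    q1 = statistics.quantiles(pre, n=4)[0]  # first-quartile cut-off
    low = [b - a for a, b in zip(pre, post) if a <= q1]
    rest = [b - a for a, b in zip(pre, post) if a > q1]
    return statistics.fmean(low), statistics.fmean(rest)


low_gain, rest_gain = split_gains()
print(round(low_gain, 1), round(rest_gain, 1))  # low group "improves", rest declines
```

Under these assumptions, the bottom-quartile group gains several points on average and the remaining participants lose a little. This is the spurious pattern that a low-performers-vs-others comparison would misread as a training benefit.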

This is particularly problematic in training interventions because numerous studies are designed to measure the effectiveness of a treatment after an initial selection based on baseline scores. For example, many studies intend to assess the impact of a cognitive intervention in schools after enrolling the lowest-scoring participants on a pretest measure (e.g., Graham et al., 2007; Helland et al., 2011; Stevens et al., 2013). Median- or mean-split designs should always make the reader wary, as they do not adequately control for regression toward the mean and other confounds (e.g., sampling bias). If the groups to be compared are not equal at baseline, any interpretation of improvement is precarious. In addition, such comparisons are often obscured by the sole presentation of gain scores, rather than both pretest and posttest scores. Significant gains in one group vs. the other might be due to a true effect of the intervention, but can also arise from unequal baseline scores. The remedy is simple: unless theoretically motivated a priori, splitting a continuous variable should be avoided, and unequal performance at baseline should be reported and taken into account when assessing the evidence for an intervention.

Despite the questionable relevance of this practice, numerous studies in the cognitive training literature have used median splits on training performance scores (Jaeggi et al., 2011; Rudebeck et al., 2012; Kundu et al., 2013; Redick et al., 2013; Thompson et al., 2013; Novick et al., 2014), following the rationale that transfer effects are moderated by individual differences in gains on the training task (Tidwell et al., 2014). Accordingly, individual differences in response to training and in cognitive malleability lead researchers to expect a correlation between training gains and gains on the transfer tasks, a finding that has been commonly reported in the literature (Chein and Morrison, 2010; Jaeggi et al., 2011; Schweizer et al., 2013; Zinke et al., 2014). We explore this idea further in the next section.

FIGURE 3 | Continuous variable splits and regression toward the mean. Many interventions isolate a low-performing group at pretest and compare the effect of training on this group with a group that includes the remaining participants. This approach is fundamentally flawed, as it capitalizes on regression toward the mean rather than on true training effects. Here, we present a Monte Carlo simulation of 10,000 individual scores drawn from a normally distributed population (M = 100, SD = 15), before and after a cognitive training intervention (for simplicity, we assume no test-retest effect). (A) As can be expected in such a situation, there is a positive relationship between the absolute distance of pretest scores from the mean and the absolute gains from pretest to posttest: the farther a score deviates from the mean (in either direction), the more likely it is to show large changes between the two measurement points. (B) This is particularly problematic when one splits a continuous variable (e.g., pretest score) into a categorical variable. In this case, and without assuming any real effect of the intervention, a low-performing group (all scores at or below the first quartile) will typically show impressive changes between the two measurement points, compared with a control group (remaining scores). (C) This difference is a direct consequence of absolute gains from pretest to posttest being a function of the deviation of pretest scores from the mean, following a curvilinear (U-shaped) relationship (orange line).

## INTERPRETATION OF CORRELATIONS IN GAINS

The goal in most training interventions is to show that training leads to transfer, that is, gains in tasks that were not part of the training. Decades of research have shown that training on a task results in enhanced performance on this particular task, paving the way for entire programs of research focusing on deliberate practice (e.g., Ericsson et al., 1993). In the field of cognitive training, however, the newsworthy research question is whether or not training is followed by enhanced performance on a different task (i.e., transfer). Following this rationale, researchers often look for positive correlations between gains in the training task and in the transfer task, and interpret such effects as evidence supporting the effectiveness of an intervention (Jaeggi et al., 2011; Rudebeck et al., 2012; Kundu et al., 2013; Redick et al., 2013; Thompson et al., 2013; Novick et al., 2014; Zinke et al., 2014).

FIGURE 4 | Correlated gain scores. Correlated gain scores between a training variable and a transfer variable can occur regardless of transfer. (A) Here, a Monte Carlo simulation (N = 1000) shows normally-distributed individual scores (M = 100, SD = 15) on a training task and a transfer task at two consecutive testing sessions (pretest in orange, posttest in blue). The only constraint on the model is the correlation between the two tasks at pretest (here, r = .92), but not at posttest. (B) In this situation, gain scores—defined as the difference between posttest and pretest scores—will be correlated (here, r = .10), regardless of transfer. This pattern can arise irrespective of training effectiveness, due to the initial correlation between training and transfer scores.

Although apparently sound, this line of reasoning is flawed. Correlated gain scores are neither an indication nor a necessity for transfer—transfer can be obtained without any correlation in gain scores, and correlated gain scores do not guarantee transfer (Zelinski et al., 2014).

For simplicity, suppose we design a training intervention in which we set out to measure only two dependent variables: the ability directly trained (e.g., working memory capacity, WMC) and the ability we wish to demonstrate transfer to (e.g., intelligence, g). Suppose further that three conditions are met:

- (a) performance on WMC and g is correlated at pretest, as is often the case due to the positive manifold (Spearman, 1904);
- (b) this correlation is no longer significant at posttest;
- (c) scores at pretest do not correlate well with scores at posttest.

Conditions (b) and (c) are both plausible given that one ability is being artificially inflated through training (Moreau and Conway, 2014). Under these conditions, gains in the trained ability and in the transfer ability will be correlated. This correlation will be a consequence of pretest correlations, and cannot be regarded as evidence for transfer. Perhaps more strikingly, performance gains in initially correlated tasks are expected to be correlated even without transfer (**Figures 4A,B**). Correlations are unaffected by a linear transformation of the variables they relate to; they are therefore not influenced by variable means. As a result, correlated gain scores are a phenomenon entirely independent of transfer. A positive correlation is the consequence of a greater covariance of gain scores within sessions than between sessions, but it provides no insight into the behavior of the means we wish to measure. Scores could increase, decrease, or remain unchanged, and this information would not be reflected in the correlation of gain scores (Tidwell et al., 2014). Conversely, transfer can happen without correlated gains, although this situation is perhaps less common in training studies, as it often implies that the training task and the transfer task were not initially correlated.
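A toy version of this argument can be derived and simulated. Assume, hypothetically, that both tasks have the same variance σ² and correlate ρ at pretest. Assume also that posttest scores are independent of each other and of pretest scores, with unchanged means. Then Cov(ΔX, ΔY) = Cov(X₁, Y₁) = ρσ² and Var(ΔX) = Var(ΔY) = 2σ², so the gains correlate at ρ/2 although nobody improved. The sketch below is our illustration of that model, not the simulation behind Figure 4. The choice ρ = .6 and the function names are hypothetical.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_gains_without_transfer(n=20000, rho_pre=0.6, seed=11):
    """Gain correlation and mean gains when tasks correlate only at pretest."""
    rng = random.Random(seed)

    def correlated_pair(rho):
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        return 100 + 15 * z1, 100 + 15 * z2

    train_gain, transfer_gain = [], []
    for _ in range(n):
        train_pre, transfer_pre = correlated_pair(rho_pre)  # correlated at pretest
        train_post, transfer_post = correlated_pair(0.0)    # uncorrelated at posttest
        train_gain.append(train_post - train_pre)
        transfer_gain.append(transfer_post - transfer_pre)
    return (pearson(train_gain, transfer_gain),
            statistics.fmean(train_gain), statistics.fmean(transfer_gain))


r_gains, mean_train_gain, mean_transfer_gain = correlated_gains_without_transfer()
print(round(r_gains, 2), round(mean_train_gain, 1), round(mean_transfer_gain, 1))
```

The gain correlation comes out near .3, while both mean gains stay near 0. The correlation, in other words, carries no information about whether anyone improved.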

To make things worse, analyses of correlated gains are often combined with median splits to look for different patterns in a group of responders (i.e., individuals who improved on the training task) and in a group of non-responders (i.e., individuals who did not improve on the training task). The underlying rationale is that if training is effective, only those who improved on the training task should show transfer. This approach, however, combines the flaw presented here with those discussed in the previous section, thereby increasing the chances of reaching erroneous conclusions. Limitations of this approach have been examined before and illustrated via simulations (Tidwell et al., 2014) and structural equation modeling (SEM; Zelinski et al., 2014). To summarize, these articles point out that correlated gain scores do not answer the question they are typically purported to answer, that is, whether improvement was moderated by training conditions.

The remedy for this intuitive but erroneous interpretation of correlated gains lies in alternative statistical techniques. Transfer can be established when the experimental group shows larger gains than controls, demonstrated by a significant interaction in a repeated-measures ANOVA (with treatment group as the between-subject factor and session as the within-subject factor) or its Bayesian analog. Because this analysis does not correct for group differences at pretest, one should always report post hoc comparisons to follow up on significant interactions. One should also provide summary statistics including both pretest and posttest scores, not just gain scores, as is often the case. Due to this limitation, another common approach is to use an ANCOVA, with posttest scores as the dependent variable and pretest scores as a covariate. Although often used interchangeably, the two types of analysis actually answer slightly different research questions. When one wishes to assess the difference in gains between treatment groups, the former approach is most appropriate<sup>3</sup>. Unlike correlated gain scores, this method answers the question at hand: does the experimental treatment produce larger cognitive gains than the control?
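The difference between the two analyses can be seen in a small sketch. In a 2 (group) × 2 (session) mixed design, the group × session interaction F equals the square of an independent-samples t-test on gain scores. An ANCOVA with a common slope estimates the posttest difference adjusted by the pooled within-group regression slope. The simulated data are hypothetical (2000 participants per group, a true effect of 5 points, posttest regressing on pretest with slope .7), and the function names are ours. This is an illustration, not a replacement for a full ANOVA or ANCOVA routine.

```python
import random
import statistics


def mean(values):
    return sum(values) / len(values)


def gain_score_t(pre_e, post_e, pre_c, post_c):
    """Difference in mean gains and its pooled-variance t statistic. In a
    2 (group) x 2 (session) mixed ANOVA, the interaction F equals this t squared."""
    ge = [b - a for a, b in zip(pre_e, post_e)]
    gc = [b - a for a, b in zip(pre_c, post_c)]
    ne, nc = len(ge), len(gc)
    pooled = ((ne - 1) * statistics.variance(ge)
              + (nc - 1) * statistics.variance(gc)) / (ne + nc - 2)
    diff = mean(ge) - mean(gc)
    return diff, diff / (pooled * (1 / ne + 1 / nc)) ** 0.5


def ancova_group_effect(pre_e, post_e, pre_c, post_c):
    """Group coefficient of posttest ~ group + pretest (common slope): the
    posttest difference adjusted by the pooled within-group slope."""
    def centered_sums(pre, post):
        mx, my = mean(pre), mean(post)
        sxy = sum((x - mx) * (y - my) for x, y in zip(pre, post))
        sxx = sum((x - mx) ** 2 for x in pre)
        return sxy, sxx

    sxy_e, sxx_e = centered_sums(pre_e, post_e)
    sxy_c, sxx_c = centered_sums(pre_c, post_c)
    slope = (sxy_e + sxy_c) / (sxx_e + sxx_c)
    adjusted = (mean(post_e) - mean(post_c)) - slope * (mean(pre_e) - mean(pre_c))
    return adjusted, slope


def simulate(n=2000, effect=5.0, seed=5):
    """Hypothetical pretest/posttest data for an experimental and a control arm."""
    rng = random.Random(seed)

    def arm(bonus):
        pre = [rng.gauss(100, 15) for _ in range(n)]
        post = [100 + 0.7 * (x - 100) + rng.gauss(0, 10) + bonus for x in pre]
        return pre, post

    pre_e, post_e = arm(effect)
    pre_c, post_c = arm(0.0)
    return pre_e, post_e, pre_c, post_c


data = simulate()
gain_diff, t_gain = gain_score_t(*data)
ancova_diff, slope = ancova_group_effect(*data)
print(round(gain_diff, 2), round(t_gain, 1), round(ancova_diff, 2), round(slope, 2))
```

Both estimates start from the raw posttest difference and subtract a correction for the pretest difference. The gain-score contrast implicitly uses a slope of 1 for that correction, whereas the ANCOVA uses the estimated within-group slope. This is why the two can diverge when groups differ at baseline.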

A different, perhaps more general problem concerns the validity of improvements typically observed in training studies. How should we interpret gains on a specific task or on a cognitive construct? Most experimental tasks used by psychologists to assess cognitive abilities were designed and intended for comparison between individuals or groups, rather than as a means to quantify individual or group improvements. This point may seem trivial, but it hardly is: the underlying mechanisms tapped by training might be task-specific, rather than domain-general. In other words, one might improve via specific strategies that help perform well on a task or set of tasks, without any guarantee of meaningful transfer. In some cases, even diminishment can be viewed as a form of enhancement (Earp et al., 2014). It can therefore be difficult to interpret improvement following a training intervention, as it may reflect different underlying patterns. Hayes et al. (2015, p. 1) emphasize this point in a discussion of training-induced gains in fluid intelligence: "The interpretation of these results is questionable because score gains can be dominated by factors that play marginal roles in the scores themselves, and because intelligence gain is not the only possible explanation for the observed control-adjusted far transfer across tasks." Indeed, a possibility that often cannot be ruled out is that improvement is driven by strategy refinement rather than general gains. Moreover, it has also been pointed out that gains on a test of intelligence designed to measure between-subject differences do not necessarily imply intelligence gains evaluated within subjects (te Nijenhuis et al., 2007).

Reaching a precise understanding of the nature and meaning of cognitive improvement is a difficult endeavor, but in a field with far-reaching implications for society such as cognitive training, it is worth reflecting upon what training is thought and intended to achieve. Although informed by prior research (e.g., Ellis, 1965; Stankov and Chen, 1988a,b), practicing specific cognitive tasks to elicit transfer is a novel paradigm in its current form, and numerous questions remain regarding the definition and measurement of cognitive enhancement (e.g., te Nijenhuis et al., 2007; Moreau, 2014a). Until theoretical models are refined to account for novel evidence, we cannot assume that long-standing knowledge based on more than a century of research in psychometrics necessarily applies to training designs deliberately intended to promote general cognitive improvement.

## SINGLE TRANSFER ASSESSMENTS

Beyond matters of analysis and interpretation, the choice of specific tasks used to demonstrate transfer is also critical. Any measurement, no matter how accurate, contains error. This is perhaps truer in the behavioral sciences than anywhere else: human beings differ from one another on multiple factors that contribute to task performance in any ability. One of the keys to reducing error is to increase the number of measurements. This idea might not be intuitive at first. If measurements are imperfect, why would multiplying them, and therefore the error associated with them, give a better estimate of the ability one wants to probe? Multiple measurements are superior to single measurements because inferring scores from combined sources allows extracting some, if not most, of the error. Random errors are not shared across measurements and partly cancel out when scores are combined, whereas the ability they all reflect does not.

This notion is ubiquitous. Teachers rarely give final grades based on one assessment, but rather average intermediate grades to get better, fairer estimates. Politicians do not rely on single polls to decide on a course of action in a campaign; they combine several of them to increase precision. Whenever precision matters most, we increase the number of measurements even further before combining them. In tennis, men play best-of-three-set matches in most competitions, but best-of-five-set matches in the most prestigious tournaments, the Grand Slams. The idea is to minimize the noise, or random sources of error, and maximize the signal, or the influence of a true ability (tennis skills, in this example).

<sup>3</sup>More advanced statistical techniques (e.g., latent change score models) can help to refine claims of transfer in situations where multiple outcome variables are present (e.g., McArdle and Prindle, 2008; McArdle, 2009; Noack et al., 2014).

with M = 100 and SD = 15; random error is normally distributed with M = 0 and SD = 7.5). (A–C) Scatterplots depicting the relationship between the construct and each of its imperfect measurements (r = .89 in all cases). (D) A better estimate of the construct is given by a unit-weighted composite score (r = .96), or by (E) a regression-weighted composite score (r = .96) based on factor loadings of each measurement in a factor analysis. (F) In this example, the difference between unit-weighted and regression-weighted scores is negligible because the correlations of each measurement with the true ability are roughly equal. Thus, observations above the red line (regression-weighted score is the best estimate) and below the red line (unit-weighted score is the best estimate) approximately average to 0.

This is not the unreasoned caprice of picky scientists: by increasing the number of measurements, we do get better estimates of latent constructs. Nobody puts it more eloquently than Randy Engle, quoted in Smarter, a recent bestseller by Hurley (2014): "Much of the things that psychology talks about, you can't observe. [. . .] They're constructs. We have to come up with various ways of measuring them, or defining them, but we can't specifically observe them. Let's say I'm interested in love. How can I observe love? I can't. I see a boy and a girl rolling around in the grass outside. Is that love? Is it lust? Is it rape? I can't tell. But I define love by various specific behaviors. Nobody thinks any one of those in isolation is love, so we have to use a number of them together. Love is not eye contact over dinner. It's not holding hands. Those are just manifestations of love. And intelligence is the same."

Because constructs are not directly observable (i.e., they are latent), we rely on combinations of multiple measurements to provide accurate estimates of cognitive abilities. Measurements can be combined into composite scores, that is, scores that minimize measurement error to better reflect the underlying construct of interest. Because they typically improve both the reliability and the validity of measurements (Carmines and Zeller, 1979), composite scores are key in cognitive training designs (e.g., Shipstead et al., 2012). Relying on multiple converging assessments also ensures an adequate scope of measurement, so that constructs reflect an underlying ability rather than task-specific components (Noack et al., 2014). This precaution in turn allows stronger and more accurate claims of transfer after an intervention. Again, an example is helpful. Suppose we simulate an experiment in which we set out to measure intelligence (g) in a sample of participants. We define a construct g and three imperfect measures of g, each reflecting the true ability plus normally distributed random noise, and obtain single measures that correlate with g at r = .89 (**Figures 5A–C**). We assume three different assessments of g rather than three consecutive testing sessions of the same assessment, so that we do not need to take testing effects into account.
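The reported correlations can be recovered analytically. The parameters come from the Figure 5 caption: g has SD σ<sub>g</sub> = 15 and each error has SD σ<sub>e</sub> = 7.5. We add the assumption that each measure is X<sub>i</sub> = g + e<sub>i</sub> with errors independent of g and of each other. The correlation between the mean of k measures and g is then:

```latex
\begin{gathered}
\bar{X}_k = g + \frac{1}{k}\sum_{i=1}^{k} e_i,
\qquad
\operatorname{corr}(\bar{X}_k, g) = \frac{\sigma_g}{\sqrt{\sigma_g^2 + \sigma_e^2 / k}} \\[6pt]
k = 1:\quad \frac{15}{\sqrt{225 + 56.25}} \approx .89,
\qquad
k = 3:\quad \frac{15}{\sqrt{225 + 18.75}} \approx .96
\end{gathered}
```

These values match the single-measure (.89) and unit-weighted composite (.96) correlations reported for Figure 5. Averaging divides the error variance by k while leaving the true-score variance intact.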

Different solutions exist to minimize measurement error, besides ensuring that experimental conditions are adequate to guarantee valid measurements. One possibility is to use the median score. Although not ideal, this is an improvement over single testing. Another solution is to average all scores and create a unit-weighted composite score (i.e., the mean, **Figure 5D**). This is often a better estimate than the median, unless one or several of the measurements were unusually prone to error. When individual scores are strongly correlated (i.e., collinear), a unit-weighted composite score is often close to the best possible estimate. When individual scores are uncorrelated or only weakly correlated, a regression-weighted composite score is usually a better estimate, as it allows minimizing error (**Figure 5E**). Weights for the latter are factor loadings extracted from a factor analysis that includes each measurement, thus minimizing non-systematic error. The power of composite scores is more evident graphically. **Figures 5D,E** show how composite scores are better estimates of a construct than any single measure (including the median; compare with **Figures 5A–C**). Different methods to generate composite scores can themselves be subsequently compared (see **Figure 5F**). To confidently claim transfer after an intervention, one therefore needs to demonstrate that gains are not exclusive to single tasks, but rather reflect general improvement on latent constructs.
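A small simulation compares a single measure, the median of three measures, and their unit-weighted mean. It uses the parameters from the Figure 5 caption (g with SD 15, error SD 7.5). This is a sketch under our own assumptions: errors are independent, and the sample size and function names are hypothetical. The regression-weighted composite is not implemented, because it would require a factor analysis. With equal error variances, as here, factor weights would be equal and the regression-weighted composite would coincide with the unit-weighted one, consistent with Figure 5F.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def composite_demo(n=5000, k=3, g_sd=15.0, error_sd=7.5, seed=2):
    """Correlation with the true ability g of one measure, the median of k
    measures, and the unit-weighted mean of k measures."""
    rng = random.Random(seed)
    g = [rng.gauss(100, g_sd) for _ in range(n)]
    # k imperfect measures per person: true ability plus independent noise
    measures = [[gi + rng.gauss(0, error_sd) for _ in range(k)] for gi in g]
    single = [m[0] for m in measures]
    median = [statistics.median(m) for m in measures]
    unit_weighted = [statistics.fmean(m) for m in measures]
    return pearson(single, g), pearson(median, g), pearson(unit_weighted, g)


r_single, r_median, r_mean = composite_demo()
print(round(r_single, 2), round(r_median, 2), round(r_mean, 2))
```

The median improves on a single measure, and the mean improves on the median. This ordering is the one described above for well-behaved, equally reliable measurements.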

Directly in line with this idea, more advanced statistical techniques such as latent curve models (LCM) and latent change score models (LCSM), typically implemented in a SEM framework, can allow finer assessment of training outcomes (for example, see Ghisletta and McArdle, 2012, for practical implementation). Because of its explicit focus on change across different time points, LCSM is particularly well suited to the analysis of longitudinal data (e.g., Lövdén et al., 2005) and of training studies (e.g., McArdle, 2009), where the emphasis is on cognitive improvement. Other possibilities exist, such as multilevel (Rovine and Molenaar, 2000), random effects (Laird and Ware, 1982) or mixed models (Dean and Nielsen, 2007), all with a common goal: minimizing noise in repeated-measures data, so as to separate out measurement error from predictors or structural components, thus yielding more precise estimates of change.

## MULTIPLE COMPARISONS

If including too few dependent variables is problematic, too many can also be prejudicial. At the core of this apparent conundrum lies the multiple comparisons problem, another subtle but pernicious limitation in experimental designs. Following up on one of our previous examples, suppose we are comparing a novel cognitive remediation program targeting learning disorders with traditional feedback learning. Before and after the intervention, participants in the two groups can be compared on measures of reading fluency, reading comprehension, WMC, arithmetic fluency, arithmetic comprehension, processing speed, and a wide array of other cognitive constructs. They can be compared across motivational factors, or in terms of attrition rate. And questionnaires might provide data on extraversion, happiness, quality of life, and so on. For each dependent variable, one could test for differences between the group receiving the traditional intervention and the group enrolled in the new program, with the rationale that differences between groups reflect an inequality of the treatments.

With the multiplication of pairwise comparisons, however, experimenters run the risk of finding differences by chance alone, rather than because of the intervention itself.<sup>4</sup> As we mentioned earlier, mistaking a random fluctuation for a true effect is a false positive, or Type I error. But what exactly is the probability of wrongly concluding that an effect is genuine when it is just random noise? It is easier to solve this problem graphically (**Figure 6A**). When comparing two groups on 10 transfer tasks, the probability of making at least one wrong judgment because of random fluctuation is about 40%. With 15 tasks, the probability rises to 54%, and with 20 tasks, it reaches 64% (all assuming independent comparisons and an α = .05 threshold to declare a finding significant).
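These percentages follow from the familywise error rate for m independent tests, 1 − (1 − α)<sup>m</sup>. Independence is a simplification: correlated outcome measures yield somewhat lower rates. The helper name below is hypothetical. The loop also reproduces the α = .01 figures given in the next paragraph.

```python
def familywise_error(alpha, m):
    """P(at least one false positive) across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


for alpha in (0.05, 0.01):
    print(alpha, [round(familywise_error(alpha, m), 2) for m in (10, 15, 20)])
# 0.05 [0.4, 0.54, 0.64]
# 0.01 [0.1, 0.14, 0.18]
```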

This problem is well known, and procedures have been developed to account for it. One evident answer is to reduce Type I errors by using a more stringent threshold. With α = .01, the probability of at least one spurious significant difference in our previous scenario drops to 10% (10 tasks), 14% (15 tasks), and 18% (20 tasks). Lowering the significance threshold is exactly what the Bonferroni correction does (**Figure 6B**). Specifically, it divides the significance level required to declare a difference significant by the number of comparisons being performed. Therefore, for the example above, α = .005 with 10 transfer tasks, α ≈ .0033 with 15 tasks, and α = .0025 with 20 tasks. The problem with this approach is that it is often too conservative: it corrects more strictly than necessary. Considering the lack of power inherent to numerous interventions, true effects will often be missed when the Bonferroni procedure is applied. The procedure lowers false discoveries, but by the same token lowers true discoveries as well. This is especially problematic when comparisons are highly dependent (Vul et al., 2009; Fiedler, 2011). For example, in typical fMRI experiments involving comparisons across thousands of voxels, Bonferroni corrections would almost systematically prevent any correlation from reaching significance. By controlling α across all voxels at once, the method caps at .05 the probability of obtaining even a single false positive over the entire set of comparisons, a level too stringent for discoveries. Although the multiple comparisons problem has been extensively discussed, we should point out that not everyone agrees on its pernicious effects (Gelman et al., 2012).
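The cost of the Bonferroni correction in power can be illustrated with a normal approximation. The scenario is hypothetical: a single true effect of d = 0.5, 20 participants per group, and a two-sided test evaluated first at α = .05 and then at the corrected α = .005 for 10 tasks. The approximation slightly overstates power relative to an exact t-test at this sample size, and the helper names are ours. `statistics.NormalDist` is part of the Python standard library (3.8+).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def bonferroni_alpha(alpha, m):
    """Per-comparison threshold that keeps the familywise error rate <= alpha."""
    return alpha / m


def approx_power(d, n_per_group, alpha):
    """Normal-approximation power of a two-sided two-sample test of effect size d."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5
    return (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)


per_test = bonferroni_alpha(0.05, 10)
print(per_test, 1 - (1 - per_test) ** 10)  # familywise error now just under .05
print(round(approx_power(0.5, 20, 0.05), 2), round(approx_power(0.5, 20, per_test), 2))
```

In this scenario, power drops from roughly .35 to roughly .11. The correction protects against false positives at the price of missing most true effects in an already underpowered design.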

Assuming there is a problem, a potential solution is replication. Obviously, this is not always feasible, can turn out to be expensive, and is not entirely foolproof. Other techniques have been developed to address this challenge, with good results. For example, the recent rise of Monte Carlo methods and of non-parametric resampling methods such as the bootstrap and the jackknife offers interesting alternatives. In interventions that include brain imaging data, these techniques can be used to calculate cluster-size thresholds, a procedure that relies on the assumption that contiguous signal changes are more likely to reflect true neural activity (Forman et al., 1995), thus allowing more meaningful control over discovery rates.
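As one concrete instance of a resampling-based correction, the sketch below implements a single-step max-statistic permutation test. This is our substitution for illustration: it is not cluster-size thresholding, and it is not a procedure described by the authors. Group labels are shuffled many times, and each observed difference is compared against the largest difference across all outcomes in every permutation. This controls the familywise error rate while respecting correlations among outcomes. The data are hypothetical: 10 standardized outcomes, 30 participants per group, and one large true effect on outcome 0. The function names are also hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def abs_mean_diffs(group_a, group_b):
    """|mean(a) - mean(b)| for each outcome; each participant is a list of scores."""
    n_outcomes = len(group_a[0])
    return [abs(mean([p[j] for p in group_a]) - mean([p[j] for p in group_b]))
            for j in range(n_outcomes)]


def maxstat_adjusted_pvalues(experimental, control, n_perm=1000, seed=0):
    """Single-step max-statistic permutation test (familywise-error adjusted).

    Outcomes are assumed to be on a common scale (e.g., z-scores), because every
    observed difference is compared with the same permutation maximum.
    """
    rng = random.Random(seed)
    observed = abs_mean_diffs(experimental, control)
    pooled = experimental + control  # new list; the inputs are not modified
    n_exp = len(experimental)
    exceed = [0] * len(observed)
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel participants at random
        perm_max = max(abs_mean_diffs(pooled[:n_exp], pooled[n_exp:]))
        for j, obs in enumerate(observed):
            if perm_max >= obs:
                exceed[j] += 1
    return [(count + 1) / (n_perm + 1) for count in exceed]


rng = random.Random(1)
experimental = [[rng.gauss(2.5 if j == 0 else 0.0, 1) for j in range(10)]
                for _ in range(30)]
control = [[rng.gauss(0.0, 1) for _ in range(10)] for _ in range(30)]
adjusted = maxstat_adjusted_pvalues(experimental, control)
print([round(p, 3) for p in adjusted])
```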

In line with this idea, one approach that has gained popularity over the years is based on the false discovery rate (FDR). FDR correction is intended to control false discoveries by adjusting α only in the tests that result in a discovery (true or false), thus allowing a reduction of Type I errors while leaving more power to detect truly significant differences. The resulting q-values are corrected for multiple comparisons, but are less stringent than traditional corrections on p-values because they only take into account positive effects. To illustrate this idea, suppose 10% of all cognitive interventions are effective. That is, of all the designs tested by researchers with the intent to improve some aspect of cognition, one in 10 is a successful attempt. This is a deliberately low estimate, consistent with the conflicting evidence surrounding cognitive training (e.g., Melby-Lervåg and Hulme, 2013). Note that we rarely know beforehand the ratio of effective interventions, but let us assume here that we do. Imagine now that we wish to know which interventions will turn out to show a positive effect, and which will not, and that α = .05 and power is .80 (both considered standard in psychology). Out of 10,000 interventions, how often will we wrongly conclude that an intervention is effective? To determine this probability, we first need to determine how many interventions overall will yield a positive result (i.e., the experimental group will be significantly different from the control group at posttest). In our hypothetical scenario, 1000 interventions are effective, and with a power of .80 we would correctly detect 800 of them (true positives). However, because our power is only .80, we will miss 200 interventions (false negatives). In addition, out of the 9000 interventions that we know are ineffective, 5% (α) will yield false positives. In our example, these amount to 450. The true negatives would be the remaining 8550 (**Figure 6C**).

<sup>4</sup>Multiple comparisons introduce additional problems in training designs, such as practice effects from one task to another within a given construct (i.e., hierarchical learning, Bavelier et al., 2012), or cognitive depletion effects (Green et al., 2014).

such that .50 ≤ 1−β ≤ 1 (the orange line shows FDR when 1−β = .80).

The FDR is the number of false positives divided by the number of all positive results, that is, 450/1250 = 36% in this example: more than a third of the positive studies will not reflect a true underlying effect. Conversely, the positive predictive value (PPV), the probability that a significant effect is genuine, is 800/1250 = 64%, roughly two thirds. This is worth pausing on: more than a third of our positive results, reaching significance with standard frequentist methods, would be misleading. Furthermore, the FDR increases if either power or the percentage of effective training interventions in the population of studies decreases (**Figure 6D**). Because FDR correction only concerns positive results, the procedure is less conservative than the Bonferroni correction. Many alternatives exist (e.g., Dunnett's test, Fisher's LSD, Newman-Keuls test, Scheffé's method, Tukey's HSD); ultimately, the preferred method depends on the problem at hand. Is the emphasis on finding new effects, or on the reliability of any discovered effect? Scientific rationale is rarely dichotomized, but thinking about a research question in these terms can help to decide on adequate statistical procedures. In the context of this discussion, one of the best remedies remains to design an intervention with a clear hypothesis about the variables of interest, rather than multiplying outcome measures and increasing the rate of false positives. Ideally, experiments should explicitly state whether they are exploratory or confirmatory (Kimmelman et al., 2014), and should always disclose all tasks used in pretest and posttest sessions (Simmons et al., 2011). These measures are part of a broader ongoing effort to reduce false positives in psychological research through greater transparency and systematic disclosure of all manipulations, measurements and analyses in experiments, so as to control for researcher degrees of freedom (Simmons et al., 2011).
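The arithmetic of the worked example can be reproduced in a few lines. As a complement, the sketch also implements the Benjamini-Hochberg step-up procedure, one standard way of controlling the FDR across a set of tests; the text does not specify which FDR procedure it has in mind, so this choice is an assumption, and the ten p-values are hypothetical.

```python
def screening_outcomes(n_interventions, prevalence, power, alpha):
    """Expected counts of true/false positives and negatives (no sampling noise)."""
    effective = n_interventions * prevalence
    ineffective = n_interventions - effective
    true_pos = power * effective
    false_pos = alpha * ineffective
    return {
        "TP": true_pos,
        "FN": effective - true_pos,
        "FP": false_pos,
        "TN": ineffective - false_pos,
    }


def fdr_and_ppv(counts):
    """FDR = FP / (FP + TP); PPV = TP / (FP + TP) = 1 - FDR."""
    positives = counts["TP"] + counts["FP"]
    return counts["FP"] / positives, counts["TP"] / positives


def benjamini_hochberg(p_values, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            cutoff = rank  # keep the largest rank that passes
    return sorted(order[:cutoff])


def bonferroni(p_values, alpha=0.05):
    """Indices rejected when each p-value is compared with alpha / m."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


counts = screening_outcomes(10_000, prevalence=0.10, power=0.80, alpha=0.05)
fdr, ppv = fdr_and_ppv(counts)
print(counts, f"FDR = {fdr:.0%}, PPV = {ppv:.0%}")

# Lower power raises the FDR (the trend described for Figure 6D)
for power in (0.5, 0.8, 1.0):
    print(power, round(fdr_and_ppv(screening_outcomes(10_000, 0.10, power, 0.05))[0], 3))

# Hypothetical p-values from ten transfer tasks
p_vals = [0.001, 0.008, 0.012, 0.03, 0.04, 0.2, 0.5, 0.7, 0.8, 0.9]
print("Bonferroni:", bonferroni(p_vals), "BH:", benjamini_hochberg(p_vals))
```

On these hypothetical p-values, Bonferroni retains only the smallest one, whereas Benjamini-Hochberg retains the three smallest, which illustrates why FDR-based procedures are described as less conservative.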

# PUBLICATION BIAS

Our final stop in this statistical journey is publication bias, which arises when research findings are more likely to be published depending on the direction of the effects reported or on their statistical significance. At the core of the problem lies the overuse of frequentist methods, and particularly Null Hypothesis Significance Testing (NHST), in medicine and the behavioral sciences, with an emphasis on the likelihood of the collected data (or more extreme data) if H<sub>0</sub> is true, in probabilistic notation P(d | H<sub>0</sub>), rather than the probability of interest, P(H<sub>0</sub> | d), that is, the probability that H<sub>0</sub> is true given the data collected. In intervention studies, one typically wishes to know the probability that an intervention is effective given the evidence, rather than the less informative likelihood of the evidence if the intervention were ineffective (for an in-depth analysis, see Kirk, 1996). The incongruity between these two approaches has motivated changes in the way findings are reported in leading journals (e.g., Cumming, 2014), punctuated recently by a complete ban of NHST in Basic and Applied Social Psychology (Trafimow and Marks, 2015), and is central to the growing popularity of Bayesian inference in the behavioral sciences (e.g., Andrews and Baguley, 2013).

Because of the underlying logic of NHST, only rejecting H<sub>0</sub> is truly informative; retaining H<sub>0</sub> does not provide evidence that it is true<sup>5</sup>. Perhaps H<sub>0</sub> is indeed true, but it is equally plausible that the evidence is too weak to reject it (i.e., lack of power). In practice, this means that null findings (findings that do not allow us to confidently reject H<sub>0</sub>) are difficult to interpret, because they can be due either to the absence of an effect or to weak experimental designs. Publication of null findings is therefore rare, a phenomenon that contributes to biasing the landscape of scientific evidence: only positive findings get published, leading to the false belief that interventions are effective, whereas a more comprehensive assessment might lead to more nuanced conclusions (e.g., Dickersin, 1990).
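How ambiguous a null finding is depends on power. The sketch below uses a normal approximation to the power of a two-sided, two-sample test (an assumption; exact t-based power is slightly lower in small samples). With a true moderate effect (d = 0.5) and 20 participants per group, the test misses the effect roughly two times out of three, so a non-significant result says little about whether the effect exists.

```python
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided test comparing two equal groups.

    d is the true standardized effect (Cohen's d); the test statistic is treated
    as N(d * sqrt(n/2), 1), which slightly overstates power in small samples.
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


for n in (20, 64, 100):
    power = approx_power(0.5, n)
    print(f"n = {n} per group: power {power:.2f}, chance of a null result {1 - power:.2f}")
```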

Single studies are never definitive; rather, researchers rely on meta-analyses that pool all available studies meeting a set of criteria and addressing a specific question, to get a better estimate of the accumulated evidence. If only studies corroborating the evidence for a particular treatment get published, the resulting literature becomes biased. This is particularly problematic in the field of cognitive training, due to the relative novelty of this line of investigation, which increases the volatility of one's beliefs, and because of its potential to inform practices and policies (Bossaer et al., 2013; Anguera and Gazzaley, 2015; Porsdam Mann and Sahakian, 2015). Consider, for example, that different research groups across the world have come to sharply opposing conclusions about the effectiveness of cognitive training, based on slight differences in meta-analysis inclusion criteria and models (Melby-Lervåg and Hulme, 2013; Karbach and Verhaeghen, 2014; Lampit et al., 2014; Au et al., 2015). The point here is that even a few missing studies in meta-analyses can have important consequences, especially when the accumulated evidence is relatively scarce, as is the case in the young field of cognitive training. Again, let us illustrate this idea with an example. Suppose we simulate a pool of study results, each with a given sample size and an observed effect size for the difference between experimental and control gains after a cognitive intervention. The model randomly draws pairs of numbers from a vector of sample sizes (ranging from N = 5 to N = 100) and a vector of effect sizes (ranging from d = 0 to d = 2). This stochastically generates all kinds of associations, for example large sample sizes with small effect sizes and vice versa (**Figure 7A**). In science, however, the landscape of published findings is typically different: studies get published when they pass a test of statistical significance, that is, when their p-value falls below a chosen threshold.
To demonstrate that a difference is significant in this framework, one needs large sample sizes, large effects, or a fairly sizable combination of both. When represented graphically, this produces a funnel plot typically used in meta-analyses (**Figure 7B**); departures from this symmetrical representation often indicate some bias in the literature.
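The simulation just described can be sketched as follows, under assumptions the text does not fix: N is taken as participants per group, d is the observed effect, and significance uses a normal approximation to the two-sample test (slightly liberal for the smallest samples). The "published" subset keeps only studies with p < .05; plotting each study's d against its standard error gives the funnel plot, and the filter removes small studies with small effects.

```python
import random
from statistics import NormalDist, mean


def simulate_pool(n_studies=2000, seed=7):
    """Random (n, d) pairs: n per group in 5..100, observed d in [0, 2]."""
    rng = random.Random(seed)
    return [(rng.randint(5, 100), rng.uniform(0.0, 2.0)) for _ in range(n_studies)]


def two_sided_p(n, d):
    """Normal approximation to the two-sample test of an observed d (equal groups of n)."""
    z = d * (n / 2) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def standard_error_d(n, d):
    """Large-sample standard error of Cohen's d for two groups of size n (funnel-plot precision)."""
    return (2 / n + d ** 2 / (4 * n)) ** 0.5


pool = simulate_pool()
published = [(n, d) for n, d in pool if two_sided_p(n, d) < 0.05]

print(f"{len(published)} of {len(pool)} studies reach p < .05")
print(f"mean d, all studies: {mean(d for _, d in pool):.2f}; "
      f"published only: {mean(d for _, d in published):.2f}")
# Small studies survive the filter only when their observed effects are large
small = [d for n, d in published if n <= 10]
print(f"smallest published d with n <= 10: {min(small):.2f}")
```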

Two directions seem particularly promising to circumvent publication bias. First, researchers often try to estimate the size of publication bias when summarizing the evidence for a particular intervention. This process can be facilitated by examining a representation of all the published studies, with a measure of precision plotted as a function of the intervention effect. In the absence of publication bias, studies with larger samples, and therefore better precision, are expected to fall around the average effect size observed, whereas studies with smaller samples, lacking precision, will be more dispersed. This results in a funnel shape within which most observations fall. Deviations from this shape can raise concerns regarding the objectivity of the published evidence, although it should be noted that other explanations might be equally valid (Lau et al., 2006). These methods can be improved upon, and recent articles have addressed some of the typical concerns about relying solely on funnel plots to estimate publication bias. Interesting alternatives have emerged, such as p-curves (see **Figures 7C,D**; Simonsohn et al., 2014a,b) or more direct measures of the plausibility of a set of findings (Francis, 2012). These methods are not infallible (for example, see Bishop and Thompson, 2016), but they represent steps in the right direction.

<sup>5</sup>David Bakan distinguished between sharp and loose null hypotheses, the former referring to the difference between population means being strictly zero, whereas the latter assumes this difference to be around the null. Much of the disagreement with NHST arises from the problem presented by sharp null hypotheses, which, given sufficient sample sizes, are never true (Bakan, 1966).
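The intuition behind p-curves can be sketched with simulated studies: when there is no true effect, significant p-values are roughly uniform between 0 and .05, so about half fall below .025; when a true effect exists, they pile up near zero, producing a right-skewed curve. This is a simplified illustration rather than the published p-curve tests; the sample size, effect size, and normal approximation are assumptions.

```python
import random
from statistics import NormalDist


def significant_p_values(true_d, n_per_group=30, n_studies=20_000, seed=3):
    """Two-sided p-values below .05 from simulated studies (normal approximation)."""
    rng = random.Random(seed)
    norm = NormalDist()
    shift = true_d * (n_per_group / 2) ** 0.5  # expected test statistic
    kept = []
    for _ in range(n_studies):
        z = rng.gauss(shift, 1.0)
        p = 2 * (1 - norm.cdf(abs(z)))
        if p < 0.05:
            kept.append(p)
    return kept


def share_below(p_values, cutoff=0.025):
    """Proportion of significant p-values falling below cutoff."""
    return sum(p < cutoff for p in p_values) / len(p_values)


null_curve = share_below(significant_p_values(true_d=0.0))
effect_curve = share_below(significant_p_values(true_d=0.5))
print(f"p < .025 among significant results: no effect {null_curve:.2f}, d = 0.5 {effect_curve:.2f}")
```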

Second, ongoing initiatives are intended to facilitate the publication of all findings, irrespective of the outcome, on online repositories. Digital storage has become cheap, allowing platforms to archive data for limited cost. Such repositories already exist in other fields (e.g., arXiv), but have not been developed fully in medicine and in the behavioral sciences. Additional incentives to pre-register studies are another step in that direction—for example, allowing researchers to get preliminary publication approval based on study design and intended analyses, rather than on the direction of the findings. Publishing all results would eradicate publication bias (van Assen et al., 2014), and therefore initiatives such as pre-registration should be the favored approach in the future (Goldacre, 2015).

# CONCLUSION

Based on Monte Carlo simulations, we have demonstrated that several statistical flaws undermine typical findings in cognitive training interventions. This critique echoes others, which have pointed out the limitations of current research practices (e.g., Ioannidis, 2005), although arguably the flaws we discussed in this article are often a consequence of limited resources—including methodological and statistical guidelines—rather than the result of errors or practices deliberately intended to mislead. These flaws are pervasive, but we believe that clear visual representations can help raise awareness of their pernicious effects among researchers and interested readers of scientific findings. As we mentioned, statistical flaws are not the only kinds of problems in cognitive training interventions. However, the relative opacity of statistics favors situations where one applies methods and techniques popular in a field of study, irrespective of pernicious effects. We hope that our present contribution provides a valuable resource to make training interventions more accessible.

Importantly, not all interventions suffer from these flaws. A number of training experiments are excellent, with strong designs and adequate data analyses. Arguably, these studies have emerged in response to prior methodological concerns and through facilitated communication across scientific fields, such as between evidence-based medicine and psychology, stressing further the importance of discussing good research practices. One example that illustrates the benefits of this dialog is the use of active control groups, which is becoming the norm rather than the exception in the field of cognitive training. When feasible, other important components are being integrated within research procedures, such as random allocation to conditions, standardized data collection and double-blind designs. Following current trends in the cognitive training literature, interventions should be evaluated according to their methodological and statistical strengths—more value, or weight, should be given to flawless studies or interventions with fewer methodological problems, whereas less importance should be conferred to studies that suffer several of the flaws we mentioned (Moher et al., 1998).

Related to this idea, most simulations in this article stress the limits of frequentist inference in its NHST implementation. This idea is not new (e.g., Bakan, 1966), yet discussions of alternatives are recurring, and many fields of study are moving away from relying solely on the rejection of null hypotheses that often make little practical sense (Herzog and Ostwald, 2013; but see also Leek and Peng, 2015). In our view, arguing for or against the effectiveness of cognitive training is ill-conceived in an NHST framework, because the overwhelming evidence gathered throughout the last century is in favor of a null effect. Thus, even well-controlled experiments that fail to reject H<sub>0</sub> cannot be considered convincing evidence against the effectiveness of cognitive training, despite the prevalence of this line of reasoning in this literature.

As a result, we believe cognitive interventions are particularly suited to alternatives such as Neyman-Pearson Hypothesis Testing (NPHT) and Bayesian inference. These approaches are not free of caveats, yet they provide interesting alternatives to the prevalent framework. Because NPHT allows non-significant results to be interpreted as evidence for the null hypothesis (Neyman and Pearson, 1933), the underlying rationale of NPHT favors scientific advances, especially in the context of accumulating evidence against the effectiveness of an intervention. Bayesian inference (Bakan, 1953; Savage, 1954; Jeffreys, 1961; Edwards et al., 1963) also seems particularly appropriate for evaluating training findings, given the relatively limited evidence for novel training paradigms and the variety of extraneous factors involved. Initially, limited data are outweighed by prior beliefs, but more substantial evidence eventually overwhelms the prior and leads to changes in belief (i.e., updated posteriors). Generally, understanding human cognition follows this principle, with each observation refining the ongoing model. In his time, Piaget, speaking of children as "little scientists", was hinting at this particular point: we construct, update and refine our model of the world at all times, taking into account the available data and confronting them with prior experience. A full discussion of Bayesian inference applications in the behavioral sciences is outside the scope of this article, but many excellent contributions have been published in recent years, either related to the general advantages of adopting Bayesian statistics (Andrews and Baguley, 2013) or introducing Bayesian equivalents to common frequentist procedures (Wagenmakers, 2007; Rouder et al., 2009; Morey and Rouder, 2011; Wetzels and Wagenmakers, 2012; Wetzels et al., 2012).
It follows that evidence should not be dichotomized: some interventions work for some individuals, and what needs to be identified is which particular interventions yield the largest or most reliable effects, which individuals benefit from them and why, rather than the elusive question of absolute effectiveness (Moreau and Waldie, 2016).
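The prior-to-posterior updating described above can be illustrated with the simplest conjugate case: a normal prior on an intervention's effect size combined with a normally distributed estimate of known standard error. The normal-normal model, the skeptical prior (mean 0, SD 0.2), and the study standard errors are all assumptions for illustration. A single small study barely moves the prior, a large study dominates it, and repeated small studies gradually shift belief.

```python
def update_normal(prior_mean, prior_sd, estimate, standard_error):
    """Conjugate normal update for an effect size.

    Assumes a normal prior N(prior_mean, prior_sd^2) and a normal likelihood for
    the observed estimate with known standard error; returns (posterior mean, SD).
    """
    prior_precision = 1 / prior_sd ** 2
    data_precision = 1 / standard_error ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean + data_precision * estimate) / post_precision
    return post_mean, post_precision ** -0.5


# Skeptical prior centred on no effect; the same observed d = 0.5 from a small
# and from a large (hypothetical) study.
small = update_normal(0.0, 0.2, estimate=0.5, standard_error=0.4)
large = update_normal(0.0, 0.2, estimate=0.5, standard_error=0.05)
print(f"small study: posterior mean {small[0]:.2f}; large study: {large[0]:.2f}")

# Sequential updating: each posterior becomes the prior for the next study.
post_mean, post_sd = 0.0, 0.2
for _ in range(5):
    post_mean, post_sd = update_normal(post_mean, post_sd, estimate=0.5, standard_error=0.4)
print(f"after five small studies: mean {post_mean:.2f}, sd {post_sd:.2f}")
```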

In closing, we remain optimistic about current directions in evidence-based cognitive interventions—experimental standards have been improved (Shipstead et al., 2012; Boot et al., 2013), in direct response to blooming claims reporting post-intervention cognitive enhancement (Karbach and Verhaeghen, 2014; Au et al., 2015) and their criticisms (Shipstead et al., 2012; Melby-Lervåg and Hulme, 2013). Such inherent skepticism is healthy, yet hurdles should not discourage efforts to discover effective treatments. The benefits of effective interventions to society are enormous, and further research is to be supported and encouraged. In line with this idea, the novelty of cognitive training calls for exploratory designs to discover effective interventions. The present article represents a modest attempt to document and clarify experimental pitfalls so as to encourage significant advances, at a time of intense debates sparking around replication in the behavioral sciences (Pashler and Wagenmakers, 2012; Simons, 2014; Baker, 2015; Open Science Collaboration, 2015; Simonsohn, 2015; Gilbert et al., 2016). By presenting common pitfalls and by reflecting on ways to evaluate typical designs in cognitive training, we hope to provide an accessible reference for researchers conducting experiments in this field, but also a useful resource for neophytes interested in understanding the content and ramifications of cognitive intervention studies. If scientists want training interventions to impact decisions outside research and academia, empirical findings need to be presented in a clear and unbiased manner, especially when the question of interest is complex and the evidence equivocal.

# AUTHOR CONTRIBUTIONS

DM designed and programmed the simulations, ran the analyses, and wrote the paper. IJK and KEW provided valuable suggestions. All authors approved the final version of the manuscript.

# FUNDING

Part of this work was supported by philanthropic donations from the Campus Link Foundation, the Kelliher Trust and Perpetual Guardian (as trustee of the Lady Alport Barker Trust) to DM and KEW.

# ACKNOWLEDGMENTS

We are deeply grateful to Michael C. Corballis for providing invaluable suggestions and comments on an earlier version of this manuscript. DM and KEW are supported by philanthropic donations from the Campus Link Foundation, the Kelliher Trust and Perpetual Guardian (as trustee of the Lady Alport Barker Trust).

# REFERENCES


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Moreau, Kirk and Waldie. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Barriers, Benefits, and Beliefs of Brain Training Smartphone Apps: An Internet Survey of Younger US Consumers

John Torous<sup>1</sup>\*†, Patrick Staples<sup>2</sup>†, Elizabeth Fenstermacher<sup>1</sup>, Jason Dean<sup>1</sup> and Matcheri Keshavan<sup>1</sup>

<sup>1</sup> Department of Psychiatry, Beth Israel Deaconess Medical Center, Boston, MA, USA, <sup>2</sup> Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA

Background: While clinical evidence for the efficacy of brain training remains in question, numerous smartphone applications (apps) already offer brain training directly to consumers. Little is known about why consumers choose to download these apps, how they use them, and what benefits they perceive. Given the high rates of smartphone ownership among those with internet access and in younger demographics, we chose to approach this question first with a general population survey that would primarily capture this demographic.

#### Edited by:

Soledad Ballesteros, Universidad Nacional de Educación a Distancia (UNED), Spain

#### Reviewed by:

José Manuel Reales, Universidad Nacional de Educación a Distancia (UNED), Spain

Chariklia Tziraki-Segal, MELABEV and Hebrew University, Jerusalem

#### \*Correspondence:

John Torous jtorous@bidmc.harvard.edu

†Co-first authors.

Received: 11 January 2016 Accepted: 08 April 2016 Published: 20 April 2016

#### Citation:

Torous J, Staples P, Fenstermacher E, Dean J and Keshavan M (2016) Barriers, Benefits, and Beliefs of Brain Training Smartphone Apps: An Internet Survey of Younger US Consumers. Front. Hum. Neurosci. 10:180. doi: 10.3389/fnhum.2016.00180

Method: We conducted an internet-based survey of the US population via mTurk regarding the use, experience, and perceptions of brain training apps. There were no exclusion criteria for participation, although internet access was required. Respondents were paid 20 cents for completing the survey. The survey was offered for a 2-week period in September 2015.

Results: 3125 individuals completed the survey and over half of these were under age 30. Responses did not significantly vary by gender. The brain training app most frequently used was Lumosity. Belief that a brain-training app could help with thinking was strongly correlated with belief it could also help with attention, memory, and even mood. Beliefs of those who had never used brain-training apps were similar to those who had used them. Respondents felt that data security and lack of endorsement from a clinician were the two least important barriers to use.

Discussion: Results suggest a high level of interest in brain training apps among the US public, especially those in younger demographics. The stability of positive perception of these apps among app-naïve and app-exposed participants suggests an important role of user expectations in influencing use and experience of these apps. The low concern about data security and lack of clinician endorsement suggest apps are not being utilized in clinical settings. However, the public's interest in the effectiveness of apps suggests a common theme with the scientific community's concerns about direct to consumer brain training programs.

Keywords: brain, apps, smartphones, memory, technology assessment

# INTRODUCTION

Over the last decade, consumer markets have seen a veritable explosion in products marketed for "brain training". "Brain training" entails the use of specific exercises, often games, which reputedly improve cognitive performance. While many companies advertise a neuroscientific basis for the efficacy of their brain training products, there is little peer-reviewed research to substantiate these claims. Nevertheless, brain training exercises have maintained broad appeal. Current estimates suggest that brain training has become a billion-dollar annual industry, with over 70 million active users of Lumosity, one of the most popular brain training programs (Sukel, 2015). A direct evaluation of these programs' efficacy is beyond the scope of this article, and we refer the reader to other work that has addressed this topic (Owen et al., 2010; Bavelier et al., 2012; Rabipour and Raz, 2012; A Consensus on the Brain Training Industry from the Scientific Community, 2014; Lampit et al., 2014; Toril et al., 2014; Ballesteros et al., 2015). This article seeks to assess consumer motivations, perceived benefits, and attitudes towards brain training exercise programs, with a particular focus on smartphone applications (apps). Smartphone apps offer an accessible, affordable, and convenient method for millions of consumers to access and engage in brain training. Understanding why consumers choose to download and use brain training apps is an important question that can help clinicians discuss and understand the role of these digital tools. As non-invasive, easily accessible, and affordable cognitive interventions, brain training apps are appealing consumer devices. From younger workers hoping to become more efficient (Borness et al., 2013), to older adults concerned about dementia (Corbett et al., 2015; Salthouse, 2015) and other psychiatric disorders (Keshavan et al., 2014), brain training holds tremendous promise.

There is little research or consensus regarding what drives consumers to use brain-training apps. One recent study investigated user expectations and noted that people who use these apps tend to have high expectations, even before using them (Rabipour and Davidson, 2015). In early 2016, the US Federal Trade Commission ordered one brain training app, Lumosity, to pay a two million dollar settlement regarding "deceptive advertising", stating the company "preyed on consumers" (Lumosity to Pay \$2 Million to Settle FTC Deceptive Advertising Charges for Its "Brain Training" Program [Internet]., 2016). However, little is known about who uses brain training apps, which apps they are using, what they expect in terms of cognitive improvement, and what they perceive as barriers to use.

Although brain training apps are marketed across all ages, we chose to focus on a more tech-savvy demographic, which is highly correlated with younger age and has also been shown to be the largest group per capita of smartphone owners. National survey data suggest that 85% of US adults between ages 18–29 and 70% between ages 30–50 own a smartphone (Smith, 2015). Compared to those over the age of 50, this younger demographic is more than twice as likely to use their phones to find health-related information online (Fox and Duggan, 2012). Survey data of outpatient psychiatric patients also found high rates of smartphone ownership and interest in apps among a similar demographic (Torous et al., 2014), though there are no survey data on smartphone ownership and app interest among those seeking brain training. We conducted the following survey study to better characterize consumer opinions and attitudes towards brain training apps, choosing an online survey format that would capture the largest tech-savvy group of consumers, as this demographic is most likely to use these programs. This tech-savvy group of consumers is largely made up of younger people, and the majority of our respondents were under the age of 50.

## MATERIALS AND METHODS

In order to reach a large population, we conducted a survey study of the general United States population, using an online survey platform, Amazon's Mechanical Turk (mTurk). This platform has been validated for more complex behavioral research (Crump et al., 2013), though here we used it as a simple anonymous survey platform. The survey was conducted in September 2015 and was offered to 3125 subjects registered to take surveys on mTurk, with compensation of 20 cents per survey. The survey, shown below in **Figure 1**, received hospital IRB approval. Of note, we included a simple math question, "9 + 4 = ?", in order to ensure that subjects were actively engaged with the material and not simply "clicking" through the survey.

As our survey contained questions about which brain training apps subjects had used, we sought to identify the most popular brain training apps from both the Apple iTunes store and the Android Google Play store. Because each app store ranks apps by different criteria and provides different information on number of downloads, reviews, and users, we combined the top ten apps from each marketplace (in June 2015) into a single top ten list, based on our judgment and consensus. We also sought to identify brain training apps that have been clinically studied and identified six apps from an article that reviewed the evidence for brain training apps (Brooks, 2014). Lumosity was the only app that overlapped, as it has been clinically studied and was in the top ten apps on the commercial marketplaces. Some questions (7, 8, 9, 10, 11) intentionally asked subjects for their perception of features of brain training regardless of whether they had used these apps, so that we could explore how perception varied with use. In creating the survey, we composed a list of possible concerns of the participants (see Question 16), which we interpreted as possible barriers. While our survey was open to anyone and age was not an exclusion factor, the mTurk platform is skewed towards a more tech-savvy and therefore younger demographic.

# RESULTS

Over a 2-week period in September 2015, 3125 subjects completed the survey. 48.4% were female (age mean 33.9, SD 12.2), 51.3% were male (age mean 30.9, SD 9.2), and 0.3% of respondents did not answer this question. Of the 3125 subjects, 1558 (nearly 50%) were under age 30, 978 (just over 31%) were between ages 31–45, 276 (almost 9%) between ages 46–60, and 54 (nearly 2%) older than age 60. **Figure 2** shows a breakdown of smartphone ownership, having any apps, having health apps, and having brain training apps by age bracket and gender. 93.7% of subjects reported having apps, 69.2% having used health apps, and 55.7% having used brain training apps. 66.9% reported that brain training apps helped with thinking, 69.3% with attention, 53.3% with mood, and 65% with memory, and 14.9% reported that they felt there may be dangers with app use. 98.2% answered the math question correctly.

FIGURE 1 | A copy of the survey questions reformatted to be displayed in a single figure.

Of the 16 apps that subjects were asked about, Lumosity was the most used, with 70% of those who had used brain training apps having tried it. **Figure 3** below displays apps used by subjects in a polar plot showing the proportion of brain training apps used, stratified by reported gender.

In order to better understand correlations between individual survey responses, we calculated the correlation coefficients, which are displayed below in **Figure 4**.

Looking at **Figure 4**, we notice several trends. First, the largest correlations (0.77–0.82) exist between beliefs about the positive effects of brain training apps on cognitive abilities (memory, thinking, and attention). A weaker, moderate correlation also exists between these beliefs and the reported benefit of brain training apps on mood (0.34–0.40). However, it is instructive to observe that the correlation between using brain training apps and reporting the positive effects above, while statistically significant, is very slight (0.09–0.14). This suggests two points: first, that those who use brain training apps are barely more likely than non-users to report the main benefits intended by the apps; and second, that those who report one benefit tend to report the others as well, without distinguishing strongly between specific benefits.
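The text does not state which correlation coefficient was used. As a hedged sketch, assuming yes/no answers are coded 1/0 and a Pearson correlation is computed, the coefficient for two binary items equals the phi coefficient. The six respondents below are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; with 0/1-coded answers this equals the phi coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical answers from six respondents (1 = "yes", 0 = "no")
helps_memory = [1, 1, 1, 0, 0, 1]
helps_attention = [1, 1, 0, 0, 0, 1]
print(round(pearson(helps_memory, helps_attention), 2))
```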

To summarize the difference between those who have never used a brain training app and those who have used one or more, we created a score by adding one point for each "yes" response to Questions 7–10 (apps help with attention, memory, thinking, or mood) and one point for a "no" response to Question 11 (there are dangers to app use). The score therefore ranges from 0 to 5. **Figure 5** presents these perceptions of brain training apps as a function of the number of brain training apps a subject reported using. Subjects were also asked to report barriers to brain training app use; the results are shown in **Figure 6**.
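The 0–5 perception score is simple enough to state as code. The sketch below assumes a hypothetical encoding in which each respondent's answers are stored as a mapping from question number to "yes" or "no", with unanswered questions scoring zero; `mean_score_by_apps_used` is an illustrative helper for the kind of grouping shown in Figure 5, not the authors' analysis code.

```python
from collections import defaultdict
from statistics import mean

BENEFIT_QUESTIONS = (7, 8, 9, 10)  # apps help with attention, memory, thinking, mood
DANGER_QUESTION = 11  # there are dangers to app use


def perception_score(answers):
    """0-5 score: one point per "yes" to Q7-Q10, one point for "no" to Q11.

    answers maps question number -> "yes" / "no"; unanswered questions score 0.
    """
    score = sum(answers.get(q) == "yes" for q in BENEFIT_QUESTIONS)
    score += answers.get(DANGER_QUESTION) == "no"
    return score


def mean_score_by_apps_used(respondents):
    """respondents: iterable of (number_of_apps_used, answers) pairs."""
    groups = defaultdict(list)
    for n_apps, answers in respondents:
        groups[n_apps].append(perception_score(answers))
    return {n: mean(scores) for n, scores in sorted(groups.items())}
```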

**Figure 6** shows that our sample is not generally concerned about the security of the data gathered by a brain training app (6% females, 4% males), nor about whether the brain training app was recommended by a healthcare provider (3% females, 2% males). In contrast, subjects were most likely to report concerns about the cost of apps (30% females, 25% males), as well as about their effectiveness (25% females, 26% males). To understand the association of app ownership with perceptions of efficacy, we compared the distribution of the number of brain training apps used among those who reported concerns about app efficacy with that among those who did not. A t-test for a difference in the means of these distributions yielded a p-value of 0.12, failing to reject the null hypothesis that those with and without concerns about app efficacy use the same mean number of brain training apps. A similar t-test for an association of app ownership with concerns about the cost of apps again failed to find a significant difference (p = 0.35).
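The text does not say which t-test variant was used. As a hedged sketch, the comparison could be run as Welch's unequal-variance t-test on the number of apps used in each group. The p-value here uses a normal approximation to the t distribution, which is reasonable with groups of hundreds of respondents but is an assumption, and the app counts below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t_test(a, b):
    """Welch's two-sample t statistic, its degrees of freedom, and a two-sided p-value.

    The p-value uses a normal approximation to the t distribution, which is
    adequate when both groups contain hundreds of respondents.
    """
    var_mean_a = variance(a) / len(a)  # squared standard error of mean(a)
    var_mean_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(var_mean_a + var_mean_b)
    df = (var_mean_a + var_mean_b) ** 2 / (
        var_mean_a ** 2 / (len(a) - 1) + var_mean_b ** 2 / (len(b) - 1)
    )
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p


# Hypothetical numbers of brain training apps used, by concern about efficacy
with_concern = [0, 1, 1, 2, 0, 3] * 50
without_concern = [0, 1, 2, 2, 1, 3] * 50
t, df, p = welch_t_test(with_concern, without_concern)
print(f"t = {t:.2f}, df = {df:.0f}, p = {p:.3f}")
```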

# DISCUSSION

To date, this is the largest Internet survey of user perceptions of brain training apps, and it provides a window into the use of, barriers to, and consumer attitudes towards these programs. The mean age of the 3125 respondents was 32.4 years, which is consistent with the largest demographic of smartphone owners, app users, and internet survey participants. Our data reveal strong positive perceptions of cognitive training apps, with roughly equal percentages of respondents reporting a belief that these apps improve thinking (66.9%), memory (70.3%), and attention (69.3%). A majority (53.3%) of respondents also believed that brain training apps have a positive effect on mood.

While nearly 50% of survey respondents were under age 30, our results still provide an interesting window on who is using brain training apps. Rates of smartphone ownership, health app use, and brain training app use were very similar, although slightly lower, in the 31–45 age demographic compared with those under 30, suggesting that these apps appeal to people beyond those under age 30. Given that only 9% of respondents were in the 45–60 demographic and 2% in the over-60 demographic, it is harder to interpret the results for these groups. This 2% figure is interesting, as it is in line with a recent survey study suggesting that only 1.2% of US adults over age 65 have ever used a handheld device to track their health (Shahrokni et al., 2015). Given national trends showing that smartphone ownership in younger generations is reaching saturation, and our finding that ownership is lower in older demographics, it is possible that the next wave of growth in brain training apps may come from older adults as they further adopt smartphones. Although our sample size is small for adults over age 60, it is interesting to note that among male subjects in this group, brain training apps were reported as downloaded more often than health apps. While our study was not designed to answer why this may be so, it suggests that brain training apps may be of strong interest in this population.

Subjects' belief that these apps are beneficial for thinking is strongly correlated with the belief that they improve attention, memory and mood. However, this correlation was not as strong in users who reported prior use of a brain training app. This finding is in line with perceptions of barriers, in that using a greater number of apps did not improve consumer attitudes. Those who had never used a brain training app had a mean positive response score of 3.5. There was a small positive correlation between the number of apps used and positive responses, with the score increasing only minimally for those who had used more apps and reaching 3.84 for those who had used up to five apps. Regardless of the number of apps used, use of these programs was not strongly correlated with a change in the positive response score. These results are thus in agreement with a recent study that found that consumers have high expectations for brain training apps before using them (Rabipour and Davidson, 2015). In addition, our data suggest that consumer expectations and perceptions do not differ appreciably after these apps have been used. It is possible that the positive attitude towards these programs stems primarily from preexisting expectations rather than from positive experiences with the apps themselves. Conversely, strong preexisting positive expectations may be the primary driving force behind app ownership and usage. Our results also suggest that consumer opinions on barriers to app use, such as cost and perceived efficacy, are not significantly associated with app ownership or with the number of apps used. These seemingly paradoxical results may reflect inconsistency in consumers' recognition of their own preferences, and they highlight the complexity of understanding why consumers choose to use or not use these apps.

Whereas clinicians are concerned with privacy, clinical recommendations, and the evidence backing new technologies such as apps (Huckvale et al., 2015), these features were of least concern to consumers, who cited cost as the primary barrier to use. Of note, consumers also listed uncertainty regarding the efficacy of these programs as a strong barrier to use. Overall, our results suggest that consumers prefer an app that is inexpensive, time-efficient, and supported by evidence of its efficacy. Although our survey did not specify what was meant by health data not being secure, further research could explore the overall lack of concern regarding data security. For instance, consumers may be unaware of how their health data may be used when they use these apps.

While our survey did not assess whether user perceptions of efficacy correlate with actual benefit, this question has recently become a topic of debate (Brooks, 2014; Rabipour and Davidson, 2015). A recent consensus statement by numerous neuroscientists further underscores the lack of rigorous scientific evidence and concern about misleading marketing (A Consensus on the Brain Training Industry from the Scientific Community, 2014). While many studies have shown positive results (Green and Bavelier, 2008; Anguera et al., 2013; Ballesteros et al., 2014; Hardy et al., 2015), there is concern that these improvements could reflect improved skill at using the app rather than an actual improvement in cognition (Owen et al., 2010; Burch, 2014). Our results, which suggest that positive consumer attitudes are related more to preexisting beliefs than to positive user experiences, could support the notion of a digital placebo effect. Our results also indicate a high demand for a more rigorous, scientific approach to these applications. Continued consumer demand for these applications, despite the current paucity of evidence, could present an opportunity for academic researchers, consumers, and industry to collaborate in exploring new approaches to brain training.

Like all survey research, our study has several limitations. Our results are self-reported, and many questions, especially those about barriers, were subjective. We focused on correlations, and while we can speculate on potential links between them, our survey was not designed to address causation. In addition, our data are inevitably skewed towards a younger and more digitally connected population, as our research was conducted online and targeted US residents. This limitation was expected, given our focus on tech-savvy internet users and the use of the mTurk platform. Our survey was not designed to assess whether respondents were seeking brain training apps for a specific reason, e.g., ADHD symptoms. Although our respondents reported high rates of smartphone ownership (96% for those below age 30), such numbers are close to the US national average, which in 2015 was reported at 85% for this same demographic. While our survey may overrepresent younger, connected individuals, it underrepresents older adults (age >65), who made up only 54 of our 3125 subjects (1.7%). Perceptions of brain training in an older demographic are an important future research direction, as many older adults may use brain training apps to address declining cognition. Another potential limitation of survey data in general is poor attention to the online survey task, but the fact that over 98% of respondents answered the distracter math question correctly suggests that they were attentive to the other survey questions as well. Of note, we included responses from the slightly less than 2% of respondents who answered the math question incorrectly.

FIGURE 6 | A polar plot showing the proportion of barriers to use reported, stratified by reported gender. Each gender is normalized to show the same total volume, and the magnitude is normalized to show the global maximum of reported proportion at the maximum radius.

The future of brain training smartphone apps is at a crossroads. One path leads to further development of brain training apps, driven largely by marketing and expectations rather than scientific evidence. The other route rests on further app development with a focus on efficacy research and generalizable benefits. Based on our survey data on consumer perspectives and the current body of scientific literature, it appears that brain training app users would prefer the latter path. With the right scientific efforts, consumer education and empowerment, and partnerships with industry, this goal will hopefully be attainable in the near future.

#### AUTHOR CONTRIBUTIONS

JT and MK conceived the research idea. JT, EF, and JD wrote the protocol and IRB. PS analyzed the data and produced all figures. JT, EF, JD, and MK conducted background literature review. All authors helped in the writing and drafting of this manuscript. All authors edited the manuscript.

#### FUNDING

PS is supported by NIH Grant 2T32AI007358-26 (PI Pagano).

#### REFERENCES

they? Telemed. J. E Health 21, 550–556. doi: 10.1089/tmj.2014.0103


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer JMR and handling Editor SB declared their shared affiliation, and the handling Editor states that the process nevertheless met the standards of a fair and objective review.

Copyright © 2016 Torous, Staples, Fenstermacher, Dean and Keshavan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Corrigendum: Barriers, Benefits, and Beliefs of Brain Training Smartphone Apps: An Internet Survey of Younger US Consumers

John Torous <sup>1</sup> \*, Patrick Staples <sup>2</sup> , Elizabeth Fenstermacher <sup>1</sup> , Jason Dean<sup>1</sup> and Matcheri Keshavan<sup>1</sup>

<sup>1</sup> Department of Psychiatry, Beth Israel Deaconess Medical Center, Boston, MA, USA, <sup>2</sup> Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA

Keywords: brain, apps, smartphones, memory, technology assessment

#### **A corrigendum on**

#### **Barriers, Benefits, and Beliefs of Brain Training Smartphone Apps: An Internet Survey of Younger US Consumers**

by Torous, J., Staples, P., Fenstermacher, E., Dean, J., and Keshavan, M. (2016). Front. Hum. Neurosci. 10:180. doi: 10.3389/fnhum.2016.00180

#### Edited and reviewed by:

Soledad Ballesteros, Universidad Nacional de Educación a Distancia, Spain

> \*Correspondence: John Torous jtorous@bidmc.harvard.edu

Received: 02 May 2016 Accepted: 17 May 2016 Published: 02 June 2016

#### Citation:

Torous J, Staples P, Fenstermacher E, Dean J and Keshavan M (2016) Corrigendum: Barriers, Benefits, and Beliefs of Brain Training Smartphone Apps: An Internet Survey of Younger US Consumers. Front. Hum. Neurosci. 10:253. doi: 10.3389/fnhum.2016.00253

**Reason for Corrigendum**: Addition of a conflict of interest statement by Dr. Matcheri Keshavan.

After publication, Dr. Matcheri Keshavan noted that the paper should include the following statement: "MK has a contract to purchase Lumosity services for one of his studies, and has provided consultant services to Forum Pharmaceuticals."

## AUTHOR CONTRIBUTIONS

JT and MK conceived the research idea. JT, EF, and JD wrote the protocol and IRB. PS analyzed the data and produced all figures. JT, EF, JD, and MK conducted background literature review. All authors helped in the writing and drafting of this manuscript. All authors edited the manuscript.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

MK has a contract to purchase Lumosity services for one of his studies, and has provided consultant services to Forum Pharmaceuticals.

Copyright © 2016 Torous, Staples, Fenstermacher, Dean and Keshavan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Attentional Filter Training but Not Memory Training Improves Decision-Making

Marlen Schmicker <sup>1,2</sup>\*, Patrick Müller <sup>1,2</sup>, Melanie Schwefel <sup>1,2</sup> and Notger G. Müller <sup>1,2,3</sup>

<sup>1</sup>Neuroprotection Laboratory, German Center for Neurodegenerative Diseases (DZNE), Magdeburg, Germany, <sup>2</sup>Otto von Guericke University, Magdeburg, Germany, <sup>3</sup>Center for Behavioral Brain Sciences, Magdeburg, Germany

Decision-making has high practical relevance for daily performance. Its relation to other cognitive abilities such as executive control and memory is not fully understood. Here we asked whether training of either attentional filtering or memory storage would influence decision-making as indexed by repeated assessments of the Iowa Gambling Task (IGT). The IGT was developed to assess and simulate real-life decision-making (Bechara et al., 2005). In this task, participants gain or lose money by developing advantageous or disadvantageous decision strategies. On five consecutive days, we trained 29 healthy young adults (20–30 years) in either working memory (WM) storage or attentional filtering and measured their IGT scores in each training session. During memory training (MT), subjects performed a computerized delayed match-to-sample task in which two displays of bars were presented in succession. During filter training (FT), participants had to indicate whether two simultaneously presented displays of bars matched or not. Whereas in MT the relevant target stimuli stood alone, in FT the targets were embedded among irrelevant distractors (bars in a different color). All subjects within each group improved their performance in the trained cognitive task. For the IGT, we observed an increase over time in the amount of money gained in the FT group only. Decision-making seems to be influenced more by training to filter out irrelevant distractors than by training to store items in WM. Selective attention could be responsible for the previously noted relationship between IGT performance and WM and may therefore be more important for enhancing efficiency in decision-making.

#### Edited by:

Soledad Ballesteros, Universidad Nacional de Educación a Distancia (UNED), Spain

#### Reviewed by:

Tracy L. Luks, University of California, San Francisco, USA

Ben Godde, Jacobs University Bremen, Germany

> \*Correspondence: Marlen Schmicker marlen.schmicker@dzne.de

Received: 07 November 2016 Accepted: 09 March 2017 Published: 23 March 2017

#### Citation:

Schmicker M, Müller P, Schwefel M and Müller NG (2017) Attentional Filter Training but Not Memory Training Improves Decision-Making. Front. Hum. Neurosci. 11:138. doi: 10.3389/fnhum.2017.00138

Keywords: decision-making, distractor inhibition, filter training, Iowa Gambling Task, working memory

# INTRODUCTION

In everyday life we have to make decisions all the time. Successful decision-making requires the ability to make choices that are unpleasant in the moment but advantageous in the long run. The Iowa Gambling Task (IGT) can be considered a proxy for this real-life function: as in real life, performing advantageously on this task depends on dealing with uncertainty in a context of punishment and reward (Brevers et al., 2013). The IGT was originally designed for patients with lesions of the ventromedial prefrontal cortex (Bechara et al., 1994, 1997). Subjects pick cards from four different decks to maximize their monetary gain (Bechara et al., 1994). Two of these decks offer the opportunity to obtain large gains but are also associated with greater losses (disadvantageous: A and B), whereas the other two yield small wins but also smaller losses (advantageous: C and D). The decks further differ in their payoff schedule: decks A and C involve frequent losses, decks B and D infrequent losses. Healthy subjects develop a tendency towards advantageous decisions and improve their performance after picking ∼100 cards (Overman and Pierce, 2013). Card selection depends on both cognitive processes and emotional states (Bechara and Damasio, 2005). The somatic marker hypothesis assumes that emotion-based signals arising from the body influence decision processes and guide human behavior (Damasio et al., 1991). Other studies have suggested a stronger association with cognition and have reported correlations with executive functions (working memory (WM), inhibition, set-shifting) and intelligence (Webb et al., 2014), but others do not support these findings (Toplak et al., 2010).
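The deck structure can be made concrete with a small Python sketch that computes the expected net outcome per card and the probability of a loss for each deck. The amounts are an assumption: they follow the per-10-card schedule commonly reproduced from the original task (A and B: 100 per card with losses totaling 1250 per 10 cards; C and D: 50 per card with losses totaling 250 per 10 cards), not necessarily the Euro-scaled amounts of the Inquisit version used in this study.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Deck:
    name: str
    gain_per_card: float      # reward delivered on every card
    losses_per_10: int        # how many of every 10 cards carry a penalty
    loss_total_per_10: float  # summed penalties over those 10 cards

    def expected_net_per_card(self) -> float:
        return self.gain_per_card - self.loss_total_per_10 / 10

    def loss_probability(self) -> float:
        return self.losses_per_10 / 10

# Illustrative values modeled on the commonly cited original schedule
DECKS = [
    Deck("A", 100, 5, 1250),  # large gains, frequent losses
    Deck("B", 100, 1, 1250),  # large gains, infrequent but large losses
    Deck("C", 50, 5, 250),    # small gains, frequent small losses
    Deck("D", 50, 1, 250),    # small gains, infrequent losses
]

for deck in DECKS:
    ev = deck.expected_net_per_card()
    label = "advantageous" if ev > 0 else "disadvantageous"
    print(f"{deck.name}: {ev:+.1f} per card, P(loss) = {deck.loss_probability():.1f} -> {label}")
```

The sketch shows why loss frequency and long-run value are dissociated: B looks safer than A on most draws, yet both lose money on average, while C and D both pay off.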

The extent to which subjects are able to develop advantageous decision strategies in this sequential learning task seems to be related to WM. In a dual-task study, participants were not able to develop implicit learning strategies under concurrent high WM load, and their IGT performance was lower than under no or low WM load (Cui et al., 2015). In the past few years, researchers have investigated individual differences in WM performance and their relation to decision-making. Subjects with high WM capacity develop a more advantageous strategy than those with low WM capacity (Bagneux et al., 2013). However, the results of studies on decision-making and its relation to WM are very inconsistent. To resolve this discrepancy, one could ask whether cognitive load and the ability to inhibit irrelevant information influence IGT performance. A recent study investigated the effects of cognitive load, comparing divided and full attention, on deck selection in the IGT (Hawthorne and Pierce, 2015). The authors found that a disadvantageous strategy in the divided attention group, together with limited cognitive resources, was responsible for poor decisions. However, the role of attention in decision-making has yet to be properly researched, especially regarding individual differences in distractor inhibition abilities. Gansler et al. (2011) used a structural equation model to predict IGT performance. The authors showed that successful IGT performance draws on different cognitive functions, and that the prediction from attention was twice as strong as the prediction from other executive functions.

The inconsistent results regarding the relationship between WM and decision-making led us to ask whether a third cognitive component could moderate the relation between WM and the IGT. Based on the correlation between WM updating tasks and decision-making (Achtziger et al., 2014), we assumed that an attentional selection component inherent in these updating tasks may be the reason for the close relation. Further empirical support for this assumption is the finding that subjects with high WM capacity are better able to filter out irrelevant items in a visual WM paradigm (Vogel et al., 2005). The role of selective attention in WM has also been emphasized by other authors (Conway and Engle, 1996; Cowan, 1999) and is supported by the finding of overlapping neuronal correlates (Awh and Jonides, 2001; Awh et al., 2006). These observations led us to ask whether selective attention accounts for the inconsistent results on the relation between WM and decision-making. To investigate the relation between selective attention and decision-making further, we employed a cognitive training paradigm aimed at inducing transfer effects from selective attention to decision-making (Moreau et al., 2016). We asked whether this selective attention training would influence IGT performance more than WM storage training. So far, only emotion regulation strategies have been shown to facilitate decision-making (Martin and Delgado, 2011). Here, we designed a task that, as in memory updating, required selective attention. We developed two variants of a difference detection paradigm that stressed either memory storage or selective attention demands. One group performed the selective attention (filter) training; the other performed the memory storage training. In a prior study (Schmicker et al., 2016), we compared the transfer effects of both training regimes on WM performance and observed stronger transfer effects for the selective attention training program. We now asked how the two trainings influenced IGT performance. We hypothesized that training the core function of selective attention would enhance the tendency to make advantageous decisions, as indexed by higher IGT gains, more than memory storage training would.

# MATERIALS AND METHODS

## Sample

Twenty-nine young, healthy subjects (mean age 24.31 ± 2.9 years, 15 female) took part in the study. All participants were right-handed and had normal or corrected-to-normal vision. The study protocol was approved by the ethics committee of the University of Magdeburg (Germany). All subjects gave their written informed consent to participate in the study in accordance with the Declaration of Helsinki and were paid 100 € for complete participation. In addition, they were paid an amount calculated as the mean of their gains from the five IGT sessions (mean: 20.94 €; range: 14.20 €–29.90 €). The participants were divided into two training groups: 15 subjects (8 female) received the filter training (FT), and 14 subjects (7 female) received the memory training (MT). The groups were matched for age and for performance in attention and WM.

## Design

During the training (training days 1–5; Monday to Friday), half of the subjects received 60 min of memory storage training (MT), while the other half trained selective attention by having to filter out irrelevant stimuli for 60 min (FT). In the middle of each training session, a break was used to present the IGT (**Figure 1**).

## Experimental Tasks

#### Training

In MT (**Figure 2A**), participants had to compare two arrays presented consecutively at the center of the screen, separated by a delay of 900–1400 ms. The arrays consisted of 4–6 horizontal and vertical bars (1.43° × 0.29°) of one color (either red or green). They were shown after a black square-shaped cue had been presented. The task was to decide whether the orientation of one of the presented bars had changed. Hence, MT did not require filtering out irrelevant distractors.

In FT, participants had to compare two simultaneously presented arrays of colored bars (**Figure 2B**). In these trials, a red or green cue instructed them to compare only the red or only the green bars of the double array while ignoring bars of the other color. The arrays consisted of 4–6 relevant bars and the same number of irrelevant distractors. All bar arrays were presented within a 4° × 9.3° rectangle against a gray background and were placed 1.79° to the right and left of the central fixation cross. Participants were instructed to decide whether the simultaneously presented arrays matched in terms of the orientation of the bars in the relevant color. This condition did not demand memory storage.

MT and FT subjects pressed one of two buttons to respond. In half of the trials, the orientation of one target changed; in the other half, no orientation change occurred. Seventy-two trials of each type were presented in each of the three conditions. The different trials were randomized within four blocks (runs) and presented in counterbalanced order across participants.

A total of 528 training trials were created for the training sessions. Every daily session consisted of 200 training trials in randomized order. The training sessions became more difficult over the week as the number of bars presented within one array was successively increased (for more details, see Schmicker et al., 2016).
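A daily session of this kind can be sketched in Python as a balanced, shuffled trial list. The day-to-set-size mapping below is hypothetical (the study only states that set size grew over the week within the 4–6 bar range), and the sketch does not model the pool of 528 prepared trials. Splitting the list after 100 trials mirrors the IGT break described in the next section.

```python
import random

# Hypothetical mapping from training day to number of relevant bars per array;
# the study only states that set size increased over the week (4-6 bars).
SET_SIZE_BY_DAY = {1: 4, 2: 4, 3: 5, 4: 5, 5: 6}

def build_session(day, n_trials=200, seed=None):
    """Return a shuffled list of trials for one training day.

    Half of the trials contain an orientation change and half do not.
    """
    if n_trials % 2:
        raise ValueError("n_trials must be even so change/no-change trials balance")
    set_size = SET_SIZE_BY_DAY[day]
    trials = [
        {"day": day, "set_size": set_size, "change": change}
        for change in (True, False)
        for _ in range(n_trials // 2)
    ]
    random.Random(seed).shuffle(trials)  # seeded so a session can be regenerated exactly
    return trials

session = build_session(day=5, seed=42)
first_half, second_half = session[:100], session[100:]  # IGT break falls between the halves
print(len(first_half), len(second_half), sum(t["change"] for t in session))
```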

#### Iowa Gambling Task

Each training day included a short break after 100 training trials in FT or MT, during which all subjects performed the IGT. Apart from our interest in the impact of the different training protocols on decision-making, the IGT was also intended to keep our subjects motivated by providing them with an additional reward. Repeated administration of the IGT entails the possibility of practice effects within the IGT, so that improvements in IGT scores cannot be attributed solely to transfer effects induced by FT or MT. Note, however, that such practice effects cannot account for group differences, as the IGT assessment did not differ across groups. Group differences, which were the main focus of this study, must therefore be related to the different cognitive training protocols. We used the computerized version of the original IGT, which has the same reward structure and instructions as the IGT established by Bechara et al. (1994), presented with the software Inquisit 4.0.8.0<sup>1</sup>. Before the task, every subject was given a fictitious amount of money (20 Euro). Four card decks were presented on the screen, and participants were instructed to select a card by clicking with the mouse cursor on one of the decks so as to increase their monetary gain. Additional information appeared at the bottom of the screen: the reward was reported in green letters, the penalty in red letters, and the total score in a black font. The money gained was paid out at the end of the study.

## Data Analysis

We analyzed correct answers and reaction times as outcomes for the two training conditions in the bar paradigm. For the IGT, we recorded each subject's total gain per session. First, we examined the correlations of the IGT score (training day 1) with correct answers in MT and FT. Second, we investigated the training effects for both training groups by analyzing correct answers and reaction times over the five training days using one-way repeated measures ANOVAs and t-tests for each group. Third, we compared the change in the IGT score of both training groups, together and separately for each group, in one-way repeated measures ANOVAs (with five levels), with group as the between-subject factor. In addition, we performed paired t-tests for each group comparing training day 1 and training day 5, based on a hypothesis defined a priori. Finally, we correlated training success in MT and FT with IGT changes (day 5 minus day 1) for all subjects together and for both groups separately.
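The within-group analysis over the five training days can be illustrated with a minimal one-way repeated-measures ANOVA in Python. This is the univariate form with sphericity assumed and covers only the within-subject factor; the paper does not state whether a univariate or multivariate approach, or any sphericity correction, was used, so treat this as an illustration of the technique rather than a reproduction of the authors' analysis. The gains are invented.

```python
def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA, univariate form with sphericity assumed.

    data: one row per subject, each row holding that subject's score on each
    of the k repeated sessions (for example, IGT gains on five training days).
    Returns (F, df_effect, df_error, partial eta squared).
    """
    n = len(data)
    k = len(data[0]) if data else 0
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need >= 2 subjects, >= 2 sessions and equal-length rows")
    grand = sum(map(sum, data)) / (n * k)
    session_means = [sum(row[j] for row in data) / n for j in range(k)]
    subject_means = [sum(row) / k for row in data]
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_sessions = n * sum((m - grand) ** 2 for m in session_means)
    # Removing between-subject variability is what makes this a repeated-measures test
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_error = ss_total - ss_sessions - ss_subjects
    if ss_error <= 1e-12:
        raise ValueError("no residual variance; F is undefined")
    df_effect, df_error = k - 1, (k - 1) * (n - 1)
    f = (ss_sessions / df_effect) / (ss_error / df_error)
    return f, df_effect, df_error, ss_sessions / (ss_sessions + ss_error)

# Toy IGT gains (rows = subjects, columns = days 1-5), invented for illustration
gains = [
    [20.0, 21.5, 22.0, 24.0, 25.5],
    [18.0, 18.5, 20.0, 21.0, 23.0],
    [22.0, 21.0, 23.5, 24.5, 26.0],
]
f, df1, df2, eta = rm_anova_oneway(gains)
print(f"F({df1},{df2}) = {f:.2f}, partial eta squared = {eta:.3f}")
```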

# RESULTS

## Cognitive Training Effects

To address whether FT and MT affected performance within the trained task, we analyzed performance changes over the five training days. FT trials were more difficult than MT trials (FT: 61.97% ± 2.19%, MT: 78.59% ± 1.71%; t(1,28) = 10.15, p < 0.001). Performance in the trained conditions increased for both groups (MT, FT) during the five sessions (**Figure 3**). For both groups, we found a main effect of training session on correct answers (MT: F(4,9) = 5.43, p = 0.017; η<sup>2</sup><sub>p</sub> = 0.707; FT: F(4,8) = 4.65, p = 0.031; η<sup>2</sup><sub>p</sub> = 0.699) and on reaction times (MT: F(4,9) = 5.40, p = 0.017; η<sup>2</sup><sub>p</sub> = 0.706; FT: F(4,8) = 4.63, p = 0.031; η<sup>2</sup><sub>p</sub> = 0.698). One-sample t-tests comparing day 1 and day 5 performance showed significant t-values (p < 0.05) for both groups. Correct responses increased (by 4.5% for MT and 7.8% for FT), while reaction times decreased (by 61 ms for MT and 134 ms for FT) during training.
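As a consistency check, partial eta squared can be recovered from each reported F statistic and its degrees of freedom, since η²p = F·df_effect / (F·df_effect + df_error). The Python sketch below applies this standard identity to the values quoted in the paragraph above; it recomputes the reported numbers rather than re-deriving them from raw data.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)

# (F, df_effect, df_error, reported partial eta squared) from the text above
reported = {
    "MT correct answers": (5.43, 4, 9, 0.707),
    "FT correct answers": (4.65, 4, 8, 0.699),
    "MT reaction times": (5.40, 4, 9, 0.706),
    "FT reaction times": (4.63, 4, 8, 0.698),
}
for label, (f, df1, df2, eta_reported) in reported.items():
    print(f"{label}: {partial_eta_squared(f, df1, df2):.3f} (reported {eta_reported})")
```

Each recomputed value agrees with the reported one to three decimals.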

FIGURE 3 | Performance increase during working memory (WM; MT) and filtering (FT) training. Mean values and standard errors of correct responses (A) and reaction times (B) are shown.

<sup>1</sup>http://www.millisecond.com

## IGT Changes during Cognitive Training

The main focus of this article is how the two training protocols affected decision-making performance. IGT scores were measured repeatedly during the five-session training period in both groups (see **Figure 4** and **Table 1**). In an ANOVA with five levels (days) and group as the between-subject factor, no main effect of group and no interaction with group was observed. Because of our a priori hypotheses, we also calculated separate ANOVAs for each group. For the factor "days", no main effect was observed for the MT group, but there was a main effect for the FT group (MT: F(4,9) = 0.20, p = 0.661; η<sup>2</sup><sub>p</sub> = 0.015; FT: F(4,8) = 6.90, p = 0.020; η<sup>2</sup><sub>p</sub> = 0.330). Subjects trained to filter out distractors won 7.00 € (SD: 11.21 €) more in the last gambling session (training day 5) than in the first. MT subjects increased their winnings by 2.75 € (SD: 14.93 €). A one-sample t-test showed a significant mean difference for the FT group (t(1,14) = −2.42, p < 0.030) but no significant increase for the MT group (t(1,13) = −0.69, p = 0.50). Additionally, Cohen's d was calculated: we found a large effect size for the FT group and a small effect size for the MT group (see **Table 1**).
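Because a one-sample t-test on day 5 minus day 1 differences is equivalent to a paired t-test, the reported t values can be recovered from the mean gains and standard deviations given above, assuming those standard deviations describe the difference scores (the text does not say so explicitly). A Python sketch:

```python
from math import sqrt

def one_sample_t_from_summary(mean_diff, sd_diff, n):
    """t statistic and df for H0: mean difference = 0, from summary statistics."""
    return mean_diff / (sd_diff / sqrt(n)), n - 1

# Mean day 5 minus day 1 gain (EUR), its SD, and group size, as reported above
for group, mean_diff, sd_diff, n in [("FT", 7.00, 11.21, 15), ("MT", 2.75, 14.93, 14)]:
    t, df = one_sample_t_from_summary(mean_diff, sd_diff, n)
    print(f"{group}: t({df}) = {t:.2f}")
```

This gives t(14) ≈ 2.42 for FT and t(13) ≈ 0.69 for MT. The reported values are negative, presumably because the difference was computed as day 1 minus day 5.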

Independent-samples t-tests revealed a significant difference between the FT and MT groups on day 4 (t(1,27) = −2.14, p < 0.041), with the FT group gaining 5.24 € more than the MT group. On training day 5, the two-sample t-test did not reach significance, possibly because of greater variance in the values, but Cohen's d indicated a moderate effect on both days. All means, standard deviations and effect sizes are presented in **Table 1**.
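**Table 1** reports Cohen's d computed with the pooled standard deviation. For two independent groups, that standard formula can be sketched in Python from summary statistics; the example values are hypothetical placeholders, not the study's data.

```python
from math import sqrt

def cohens_d_pooled(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / sqrt(pooled_var)

# Hypothetical summary statistics (group 1 vs. group 2), not taken from Table 1
d = cohens_d_pooled(mean1=12.0, sd1=4.0, n1=15, mean2=10.0, sd2=4.0, n2=14)
print(f"d = {d:.2f}")
```

By Cohen's conventions, values of about 0.2, 0.5, and 0.8 are read as small, medium, and large effects.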

We did not find any correlations of the training success in FT and MT with IGT changes.

# DISCUSSION

The current study investigated the relationship between WM storage, selective attention and decision-making by testing whether training either memory storage or attentional filtering influences performance in the IGT. During training, performance increased in both trained tasks. Regarding IGT performance, the two training groups did not differ during the first three training days. Starting with training day 4, moderate effect sizes indicate that subjects who received FT made more advantageous decisions in the IGT than those in the MT group. In addition, FT participants significantly increased their winnings between day 1 and day 5, with a large effect size, whereas MT participants did not.

These findings suggest that the trained tasks differ in their impact on decision-making in the IGT. During FT, subjects learn to ignore irrelevant items during the encoding process (Schmicker et al., 2016). How does this capability affect learning in decision-making? Given that additional cognitive load increases disadvantageous deck selection in the IGT (Hawthorne and Pierce, 2015), the improved ability to filter out irrelevant information after FT may have freed attentional resources for goal-driven decisions in the IGT. Alternatively or in addition, FT may have enhanced top-down control not only of distracting information but also of emotions, and effective emotional control has been shown to favor advantageous decisions in the IGT (Martin and Delgado, 2011). The somatic marker hypothesis (Bechara and Damasio, 2005) claims that prior emotional processes induced by gains or losses influence implicit and explicit knowledge for advantageous decisions. Assuming an attentional control mechanism for these somatic markers, learning to filter out irrelevant information could extend to the suppression of strong emotions (Brevers et al., 2013) produced by salient bottom-up stimuli (high losses, high gains). In turn, attention could be guided towards less (emotionally) salient stimuli, allowing for an advantageous strategy during decision-making. Considering that successful attentional filtering is associated with activity in the posterior middle frontal gyrus (McNab and Klingberg, 2008) and that training of attentional filtering enhances activity in this area (Schmicker et al., 2016), one can assume that emotion-related neural signals from the amygdala or thalamus (John et al., 2016) are better controlled by this frontal region after FT. Learning to inhibit emotions might therefore favor more rational decisions. However, direct neural evidence for this assumed role of attentional filtering in decision-making has yet to be provided, and future research should address whether learning to inhibit distracting information extends to the successful suppression of emotionally salient stimuli, resulting in advantageous decision-making. Alternatively, the frontal activity observed in the earlier studies cited above, and ascribed there to attentional control, might be related to motor imagery during the preparation of finger movements for responding, especially as the activity increases emerged in rather posterior, premotor areas (Hanakawa et al., 2008).

Statistical results for t-values (t) and significance (p) for group mean differences between groups (FT-MT) and within groups (day 5–day 1) are reported. Means and standard deviations are stated in €-cent. Significant differences are marked with a star. Cohen's d was calculated with the standard formula using the pooled standard deviations. Strong effect sizes are highlighted in bold type. Fourteen subjects received the memory training (MT), and 15 participants were trained with the filter training (FT).

Why did the MT group fail to show an increase in IGT performance? First, our previous training study (Schmicker et al., 2016) indicates that MT subjects do not effectively differentiate between relevant and irrelevant information and, therefore, store unnecessary content. Relating this inefficient strategy to decision-making, subjects may have stored information or emotional markers that were irrelevant for a gainful gambling behavior. In other words, simply improving information storage does not make decisions better.

Second, mood can modulate decision processes and, therefore, IGT performance (Bagneux et al., 2013). Fatigue or boredom induced by the less demanding MT tasks may have made participants more susceptible to emotionally attributed stimuli and more inclined to avoid them. Harm avoidance is a personality trait that has been shown to cause an inability to filter out irrelevant distractors (especially emotion-attributed stimuli) and could lead to disadvantageous behavior (Most et al., 2005). Alternatively, the more effortful FT tasks might simply have increased the arousal levels with which subjects addressed the IGT. In contrast, the easier MT task, with its behavioral ceiling effects, may not have been challenging enough to induce transfer effects. Future research should therefore make use of a more challenging memory storage training, for example one that includes constant updating processes. Finally, the lack of a passive control group and the small sample size limit the possibility of drawing conclusions regarding possible unspecific effects of the active interventions on IGT behavior. Therefore, based on the present results, we cannot specify further which processes underlie the FT-induced effects on the IGT.

In sum, our results indicate that training of attentional control leads to better decisions in the IGT. Whether this finding relates to decision-making in real life has yet to be shown. Also, the link between the ability to inhibit emotional reactions and the ability to inhibit distracting information during decision-making needs further elaboration. It would be interesting to investigate the neural correlates of FT in decision-making and whether the effects transfer to IGT performance in independent measurements and to decision-making in everyday life.

# AUTHOR CONTRIBUTIONS

MaS designed the experiment, collected and analyzed the data and drafted the article. MeS supported the development of the paradigm and the data collection. PM and NGM contributed to the discussion of content-related issues and to the critical revision of the article.

## ACKNOWLEDGMENTS

This work was funded by Deutsche Forschungsgemeinschaft (DFG) grant Mu1364/4.

# REFERENCES


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Schmicker, Müller, Schwefel and Müller. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Have Standard Tests of Cognitive Function Been Misappropriated in the Study of Cognitive Enhancement?

Iseult A. Cremen<sup>1</sup>\* and Richard G. Carson<sup>1,2</sup>

<sup>1</sup>Trinity College Institute of Neuroscience and School of Psychology, Trinity College Dublin, Dublin, Ireland, <sup>2</sup>School of Psychology, Queen's University Belfast, Belfast, United Kingdom

In the past decade, there has emerged a vast research literature dealing with attempts to harness brain plasticity in older adults, with a view to improving cognitive function. Since cognitive training (CT) has shown restricted utility in this regard, attention has increasingly turned to interventions that use adjunct procedures such as motor training or physical activity (PA). As evidence builds that these have some efficacy, it becomes necessary to ensure that the outcome measures being used to infer causal influence upon cognitive function are subjected to appropriate critical appraisal. It has been highlighted previously that the choice of specific tasks used to demonstrate transfer to the cognitive domain is of critical importance. In the context of most intervention studies, standardized tests and batteries of cognitive function are de rigueur. The argument presented here is that the latent constructs to which these tests relate are not usually subject to a sufficient level of analytic scrutiny. We present the historical origins of some exemplar tests, and give particular consideration to the limits on explanatory scope that are implied by their composition and the nature of their deployment. In addition to surveying the validity of these tests when used to appraise intervention-related changes in cognitive function, we also consider their neurophysiological correlates. In particular, we argue that the broadly distributed brain activity associated with the performance of many tests of cognitive function, extending to the classical motor networks, permits the impact of interventions based on motor training or PA to be better understood.

#### Edited by:

Claudia Voelcker-Rehage, Technische Universität Chemnitz, Germany

#### Reviewed by:

Giuliana Lucci, IRCCS Santa Lucia of Rome, Italy

Elizabeth A. L. Stine-Morrow, University of Illinois at Urbana-Champaign, United States

\*Correspondence:

Iseult A. Cremen cremeni@tcd.ie

Received: 31 October 2016

Accepted: 10 May 2017

Published: 24 May 2017

#### Citation:

Cremen IA and Carson RG (2017) Have Standard Tests of Cognitive Function Been Misappropriated in the Study of Cognitive Enhancement? Front. Hum. Neurosci. 11:276. doi: 10.3389/fnhum.2017.00276

Keywords: aging, physical activity, coordination training, motor fitness, brain imaging

# INTRODUCTION

A compelling body of evidence indicates that the aging brain's structure and function can be altered by factors amenable to intervention in later life, such as physical activity (PA) and social enhancement (for review see Ballesteros et al., 2015). As a result, the research literature now documents a multitude of attempts to harness the brain's capability for adaptive reorganization and change, i.e., ''neuroplasticity''. The majority of these endeavors aim to improve ''cognition''. It is readily apparent that this term encompasses a wide range of putative capabilities. Expressed in the language of cognitive science, these may include executive function, memory, attention and processing speed. In the context of many intervention studies, standardized tests and batteries are employed to operationalize these elements and examine the degree to which they are amenable to directed change. The argument presented herein is that the latent constructs to which such tests relate are not usually subject to a sufficient level of analytic scrutiny. Relatedly, empirical evidence that a specific intervention has an impact upon a particular measure of cognitive function does not necessarily demonstrate a benefit for how an older adult functions in daily life (e.g., Simons et al., 2016). Ecological validity is frequently defined as the extent to which results obtained in controlled experimental settings apply to real-world naturalistic settings (Tupper and Cicerone, 1990). For interventions to be deemed truly effective, therefore, the benefits should generalize to functions germane to everyday life, such as competence and autonomy, and not simply to the specific tasks upon which one is trained or tested (Lövdén et al., 2010).
With respect to many of the tests that are employed to evaluate interventions to improve cognition, ecological validity is bounded by the limits on explanatory scope that are implied by their composition (as distinct from their customary interpretation). In addition to this limitation, the widespread availability of neuroimaging has made it increasingly apparent that the neurophysiological correlates of test performance are frequently at odds with those assumed by the tests' adherents. As well as highlighting the challenges posed by these considerations, we examine how they permit the seemingly positive impact of some forms of PA upon tests of cognitive function to be better understood.

# SEEKING TO IMPROVE COGNITIVE FUNCTION IN OLDER ADULTS

Of the approaches that have been applied in an effort to improve cognitive function in older adults, the most common are cognitive training (CT) and PA. The former typically encompasses games or exercises designed to challenge specific cognitive skills. In contrast, PA interventions use exercise and movement programs to improve physical capability, with the expectation that there will also be a positive impact in the cognitive domain (Bamidis et al., 2014). In relation to CT, although meta-analytic reviews have shown small improvements on measures of intermediate transfer of training gains to untrained tasks, there is little evidence of transfer to ''real world'' cognitive skills (Lampit et al., 2014; Melby-Lervåg et al., 2016; Simons et al., 2016). As Druin Burch noted: ''Doing something repeatedly can make you better at it, which is not the same as saying it makes you better'' (Burch, 2014, p. 2).

PA interventions, in the forms of aerobic exercise and resistance training, appear to yield somewhat more consistent positive effects upon cognitive function in older adults (Colcombe and Kramer, 2003; Smith et al., 2010). Classes of PA that place greater explicit emphasis upon the generation of coordinated movement (and allude to a concept of ''motor fitness'') are now receiving particular attention (e.g., Voelcker-Rehage et al., 2011; Forte et al., 2013; Berryman et al., 2014; Moreau et al., 2015; Johann et al., 2016). In some cases the rationale for such approaches includes an emphasis on the ''cognitive'' demands of coordinated goal-directed movement, such as anticipatory planning and mapping sensation to action (Voelcker-Rehage et al., 2010). As the boundaries between variants of ''cognitive'' and ''physical'' training become blurred, it is an opportune moment to consider critically the nature of outcome measures used to infer causal influence upon cognitive function.

Necessarily, the choice of task(s) used to demonstrate transfer of training-related adaptations to the cognitive domain is of critical importance, as it determines the weight of the inferences that can be drawn. In view of this dependency, it has been recommended that multiple measures be used to minimize measurement error and provide reliable and accurate estimates of the target construct (e.g., Shipstead et al., 2012; Moreau et al., 2016). In many cases, standardized tests and batteries (e.g., CAMDEX, Cogstate, the NIH Toolbox Cognition Battery) are de rigueur. Such selections are designed to ensure that the measurement instruments have been validated, are widely used and accepted, and permit comparisons across multiple studies. In the majority of cases, however, these tests were devised to achieve an aim radically distinct from that of measuring enhancements in the cognitive functioning of older adults. We offer a perspective that includes a historical dimension, delineates limits on inference, and is informed by contemporary developments in neuroimaging. When the ecological and construct validity of prototypical tests is examined, the need for caution in interpreting what their use reveals becomes evident (Heinrichs, 1990; Franzen and Arnett, 1997; Chaytor and Schmitter-Edgecombe, 2003).

# SOME TESTS OF COGNITIVE FUNCTION

We do not seek to be comprehensive with regard to the tests of cognitive function that are employed in contemporary cognitive neurorehabilitation. Instead, we discuss a small number of exemplar tests, not in an attempt to target their specific uses and/or misappropriation, but rather to highlight the limits on their explanatory scope. In addition, we draw attention to the fact that the broadly distributed brain activity associated with the performance of these tests precludes reification in terms of any discrete cognitive processes (Uttal, 2013). Indeed, the most pervasive feature of the brain activation associated with these tests is engagement of the classical motor networks. We focus on common tests used to assess three ''core executive functions'' (Diamond, 2013, p. 135): inhibition/inhibitory control, working memory (WM) and cognitive flexibility/set-shifting (Miyake et al., 2000; Diamond, 2013).

## Response Inhibition Tasks

Response inhibition tasks are commonly used to assess the capacity to suppress prepotent actions and carry out a goal-directed response (Diamond, 2013). Response inhibition is said to be a key factor in successful cognitive and motor control (Chambers et al., 2009). The Eriksen Flanker task, perhaps the most common variant, was devised by Eriksen and Eriksen (1974). It is a speeded response time task that explores the effect of ''flanker'' distractor stimuli on target identification reaction time (RT). RT typically increases when the target stimulus is surrounded by ''incongruent'' distractor stimuli, that is, letters or shapes from the target set that require a different response. The Flanker task is included in the NIH Toolbox Cognition Battery (Gershon et al., 2013) and the Attentional Network Test (ANT; Fan et al., 2002), and a variant forms part of the CANTAB battery (Attention Switching Task). It is emblematic of a class of response inhibition tests, including the Simon task, that have been used to assess a supposed ability to suppress responses that are inappropriate in a particular context.

The Flanker task appears frequently within the cognitive enhancement literature, in particular in studies exploring the associations between cognitive function and PA (e.g., Colcombe et al., 2004; Davranche et al., 2009). This prominence in the PA literature may in itself hint at the neural processes and adaptations to which the task may be sensitive. In Colcombe et al. (2004), a 6-month aerobic exercise intervention was shown to enhance cognition in older adults, as evidenced by improvement on the Eriksen Flanker task. It is no surprise, then, that this test has since been used in many other studies examining aerobic PA (e.g., McMorris et al., 2009; Weng et al., 2015), resistance training (e.g., Liu-Ambrose et al., 2012) and yoga (Gothe et al., 2013), and more recently in those focusing on motor fitness (e.g., Voelcker-Rehage et al., 2010; Schoene et al., 2015). It has been included as a measure of executive function and described variously as a test of selective attention, response inhibition and information processing. In light of the fundamental characteristics of the test, however, and in view of the nature of the interventions that give rise to a change in the level of performance (i.e., interventions that emphasize the selection of voluntary movements), the conclusion might be drawn that it will have a high degree of sensitivity to the functional state of elements within the classical motor networks.

The prototypical design is one in which responses in the presence of congruent or incongruent flankers are each compared with responses to neutral flankers. In many studies, a direct comparison is made between responses in the presence of congruent flankers and responses in the presence of incongruent flankers. In both designs, the interpretation of outcomes relies upon a ''subtraction logic'', whereby it is assumed that the same motor output is required in each instance, and that any difference between conditions (expressed via any given dependent measure) derives from other sources. With respect to the flanker task, the resulting contrast measure is hypothesized to be a ''pure'' measure of response inhibition, divorced from motor influence.
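The subtraction logic just described can be sketched as a small set of RT contrasts. This is a minimal, hypothetical illustration: it assumes trials have already been restricted to correct responses and coded with the condition labels shown, and the function name is ours. Every contrast it returns presumes that the motor component cancels across conditions, which is precisely the assumption questioned in the text that follows.

```python
from statistics import mean


def flanker_effects(trials):
    """Compute subtraction-logic contrasts from (condition, rt_ms) pairs.

    condition is one of "congruent", "incongruent" or "neutral" (hypothetical labels).
    Assumes only correct trials are passed in.
    """
    by_condition = {}
    for condition, rt_ms in trials:
        by_condition.setdefault(condition, []).append(rt_ms)
    m = {condition: mean(rts) for condition, rts in by_condition.items()}

    # Direct-comparison design: incongruent vs. congruent flankers.
    effects = {"incongruent_minus_congruent": m["incongruent"] - m["congruent"]}
    # Prototypical design: each flanker type compared with neutral flankers.
    if "neutral" in m:
        effects["interference"] = m["incongruent"] - m["neutral"]
        effects["facilitation"] = m["neutral"] - m["congruent"]
    return effects
```

With hypothetical mean RTs of 510 ms (congruent), 550 ms (neutral) and 590 ms (incongruent), the function returns an 80 ms incongruent-minus-congruent effect, split into 40 ms of interference and 40 ms of facilitation.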

In a series of brain imaging studies, all of which employed the subtraction logic, it has been shown that during performance of the Flanker task, various elements of the cortical motor network including the pre-supplementary motor areas (pre-SMA) and SMAs (Bunge et al., 2002; Taylor et al., 2007) and Brodmann area 6 (BA6) more broadly defined (Zurawska Vel Grajewska et al., 2011; Caruana et al., 2014) exhibit differential activity in the congruent and incongruent conditions. The characteristics of neural activity registered in primary motor areas also differ reliably in the context of responses made in the presence of congruent and incongruent flankers (Grent-'t-Jong et al., 2013; see also Praamstra et al., 1998, 1999; Verleger et al., 2009). On the basis of such evidence it has been proposed that the executive control nominally sampled by these tests represents an evolutionary extension of the frontal cortex-basal ganglia loops that guide resolution of (motor) response conflict, such that the role of the supporting neural mechanisms extends to a range of processes including the reorienting of attention and the updating of WM (Neubert et al., 2013).

In line with the more general argument advanced in this piece, it should also be noted that, while at a phenomenological level the same motor response (e.g., depression of a response key) may appear to be generated in each condition, the state of the ''motor circuitry'' varies systematically across conditions. This can be revealed in a number of ways. As the latency between the onset of electromyographic (EMG) activity and the start of the response movement is longer when incongruent flankers are present than when congruent flankers are present (Coles et al., 1985; Eriksen et al., 1985; Smid et al., 1990), and shorter in the congruent condition than in the neutral condition (Smid et al., 1990), differences in muscle activation dynamics are implied. It can also be shown that, when incongruent flankers are present, the time course of changes in the excitability of corticospinal projections to motoneurons innervating the muscles that act as agonists in generating the manual response is distinct from that associated with responses made in the presence of congruent flankers (and in control conditions; Michelet et al., 2010; see also Klein et al., 2014; Duque et al., 2016).

In the absence of additional measurements, the possibility that intervention related changes in the state of motor networks contribute to changes in the magnitude of flanker effects cannot be excluded. Thus, when drawing inferences on the basis of the flanker task, and indeed response inhibition tests more generally, it is necessary to recognize that motor function is central to their interpretation.

## Working Memory Tasks

WM is frequently described in such terms as the ability to hold and manipulate information in one's mind (Baddeley and Hitch, 1994; Smith and Jonides, 1999). The n-back task was introduced by Kirchner (1958) to measure differences in performance between younger and older adults on a ''paced'' task. A light flashed on and off in sequence at one of 12 locations, and participants were required to press buttons indicating where the light had gone out n positions before. As n increased, older adults (60–84) performed more poorly on this task than younger adults (18–24). This was attributed to a slowing down of ''central organizing processes of the brain'' (p. 357). A commonly overlooked element of this original experiment was that when the time allowed for the response was increased (from 1.5 s to 4.5 s), the performance of older adults improved substantially, leading to the conclusion that ''the time factor plays an important part in the results'' (p. 356).
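The scoring rule implied by Kirchner's paced procedure, together with the match/non-match rule typical of modern variants, can be sketched as follows. These helpers are hypothetical illustrations with names of our choosing. The 0-back case, which usually uses a fixed target stimulus rather than a lagged comparison, is deliberately excluded.

```python
def kirchner_expected_responses(locations, n):
    """Kirchner-style task: from trial n onward, the correct response is the location lit n trials earlier."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return [locations[i - n] for i in range(n, len(locations))]


def nback_targets(sequence, n):
    """Match/non-match variant: trial i is a target when sequence[i] equals sequence[i - n].

    The first n trials cannot be targets. 0-back usually uses a fixed target stimulus
    instead, so it is not handled here.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    return [i >= n and sequence[i] == sequence[i - n] for i in range(len(sequence))]
```

For the sequence A, B, A, C, A, C at n = 2, `nback_targets` flags trials 3, 5 and 6 (counting from 1) as targets. The same sequence serves both rules. What changes between them is the response demanded on each trial, and therefore the motor requirements.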

In modern variants, stimuli may be auditory or visual, or, in the case of the more recently developed dual n-back, auditory and visual stimuli may be presented simultaneously (Jaeggi et al., 2003). The latter version has become popular as a WM training task (Jaeggi et al., 2008). In spite of weak convergence with other measures of WM (e.g., poor correlation with operation span (OSPAN), Kane et al., 2007, and with backward digit span, Miller et al., 2009), the task has become paradigmatic in both clinical and experimental settings. It is included in popular neuropsychological test batteries (e.g., Cogstate; the Penn Computerized Neurocognitive Battery, Gur et al., 2010), and has been used as a measure of WM in a number of physical intervention studies that have sought to enhance cognitive function (e.g., Kramer et al., 2002; Hansen et al., 2004; Gothe et al., 2013). It is, however, rarely the case that steps are taken to parse the component elements of task performance separately.

A strong case can be made that, when the n-back task is used in the context of intervention studies, subtraction logic should be applied. This follows from the recognition that the n-back is a dual task with two dissociable subcomponents. These comprise a WM updating subtask, involving the ''encoding, manipulation, search and selection of information in WM'', and a matching subtask, requiring the comparison of a currently presented stimulus with the (previous) one already selected (Watter et al., 2001, p. 999). In most implementations of the paradigm, participants are afforded sufficient time to complete the selection of the n-back stimulus prior to the presentation of a new stimulus, and thus the demands of the matching subtask are in principle the same across different n-back variants (i.e., 0-back, 1-back, 2-back, etc.). Generally, this characteristic simplifies the interpretation of the data derived from the n-back paradigm (i.e., in relation to the impact of variations in ''memory load''). In the case of changes in performance arising from an intervention, however, it is not possible to exclude the possibility that a decrease in RT (or increase in accuracy) obtained for a single variant (e.g., 2-back) is attributable to a change in execution of the matching subtask. To take account of this caveat, it is necessary to express the level of performance achieved in variants with presumed higher memory load (e.g., 2-back) relative to a reference condition that also includes the matching subtask (e.g., 0-back). We are aware of very few intervention studies with a focus on motor training or PA in which this step has been taken. Indeed, in many of the studies that have reported a positive impact upon n-back performance, either a single n was included (e.g., Hansen et al., 2004; Stroth et al., 2010; Hogan et al., 2013), or, in cases in which data for more than one n were available (e.g., Erickson et al., 2013), normalization procedures were not applied.
It is notable that, with only one exception of which we are aware (Weng et al., 2015), in those studies in which performance measures for more than one level of n were included in the analysis design, differential effects (e.g., 0-back vs. 2-back) of a motor training or PA intervention have not been reported (e.g., Kramer et al., 2002; Gothe et al., 2013). In the absence of suitable contrasts or normalization procedures, it is not evident that intervention-related improvements in the performance of an n-back task variant can be attributed to changes in the efficiency of WM processes. It also remains to be determined whether different n-back variants are characterized by distinct motor signatures, in the manner of those that distinguish the various conditions of the flanker task.
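The normalization argued for above can be sketched by expressing a higher-load variant relative to a 0-back reference that shares the matching subtask. This is a hypothetical illustration: the function names, the dictionary format and the RT values are ours, and a simple difference is used where a ratio would be an equally plausible choice.

```python
def load_cost(rt_high_ms, rt_reference_ms):
    """Memory-load cost: mean RT in a higher-load variant minus mean RT in the 0-back reference.

    Both variants contain the matching subtask, so its contribution is assumed to cancel.
    """
    return rt_high_ms - rt_reference_ms


def change_in_load_cost(pre, post, high="2-back", reference="0-back"):
    """Post-minus-pre change in load cost.

    pre, post: dicts mapping an n-back variant label to a mean RT in ms (hypothetical format).
    A negative value means the load cost shrank after the intervention.
    """
    return load_cost(post[high], post[reference]) - load_cost(pre[high], pre[reference])
```

With hypothetical means of 600 ms (0-back) and 800 ms (2-back) at baseline, and 550 ms and 750 ms after an intervention, the raw 2-back RT improves by 50 ms, yet the change in load cost is zero. The gain is therefore equally compatible with faster execution of the matching subtask, which is exactly the ambiguity that a single-variant analysis cannot resolve.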

Although the speeded response selection characteristics of the matching subtask that is intrinsic to the n-back make plain that significant demands are imposed on the motor system, the ramifications of this are also amenable to scrutiny in terms of neurophysiology. In two activation likelihood estimation (ALE) meta-analyses published simultaneously (Glahn et al., 2005; Owen et al., 2005), 12 atlas-delineated brain areas were highlighted in association with performance of the n-back task. In the subset of only five areas for which there was a corresponding response across the two analyses, BA6 was prominent. Although it is one of the largest regions in the Brodmann scheme, and a diversity of functions would thus be anticipated, area 6 includes premotor cortex and SMA and so necessarily has a fundamental role in regulating motor output. With respect to WM tests, however, engagement of the cortical motor network is not a feature unique to the n-back protocol. In a comparison of seven further meta-analyses (in addition to the two that used the n-back procedure), Uttal (2013) noted that 47 Brodmann areas were reported as being activated during WM tasks (i.e., across the nine meta-analyses). Of these 47 brain regions, only BA6 was designated as being activated in every case. Indeed, amid extremely large variations in regional brain activation, the detection of signal in cortical motor areas during WM tasks is one of the most robust findings (e.g., Niendam et al., 2012). It has furthermore been determined that the threshold at which motor responses to transcranial magnetic stimulation (TMS) can be obtained, which is a measure of the excitability of corticospinal projections from primary motor cortex (M1), is negatively correlated (across individuals) with performance in n-back tasks (Schicktanz et al., 2013; Bridgman et al., 2016).
As with response inhibition tests, therefore, intervention-related improvements in the performance of the n-back test in particular, and of WM tasks in general, may be mediated, at least in part, by adaptive changes within parts of the cortical motor network.

## Cognitive Flexibility Tests

The third ''core'' element of executive function is considered to be cognitive flexibility/set shifting. The Trail Making Test (TMT) is used frequently in this context (Butler et al., 1991; Sellers and Nadler, 1993; Rabin et al., 2005). It is variously described as measuring cognitive flexibility, processing speed, sequencing (Arbuthnott and Frank, 2000; Bowie and Harvey, 2006; Ashendorf et al., 2013), visual search, scanning and executive functions (Tombaugh, 2004). The TMT, devised in 1938 and known first as ''Distributed Attention'' and then as Partington's ''Pathway Test'' (Partington and Leiter, 1949), was originally intended as a test of speed, eye-hand coordination, alertness and distributed attention. In the 1940s, its inclusion in both the Army Individual Test Battery and the Halstead-Reitan Neuropsychological Battery (HRNB; Reitan and Wolfson, 1985) ensured its continued use and propagation in a wide range of research settings. It is now included in many national longitudinal studies of aging (e.g., the Harvard Aging Brain Study, The Irish Longitudinal Study of Aging (TILDA) and The Aging, Demographics and Memory Study (ADAMS)).

In the first part of the test (TMT-A), lines are drawn sequentially to connect 25 encircled numbers distributed on a sheet of paper. In the second part (TMT-B), the requirements are similar, except that the individual being tested must alternate between letters and numbers, in increasing numerical and alphabetical order (e.g., 1, A, 2, B, 3, C, etc.). Performance is expressed as the time taken to complete each part of the test separately. On the basis of associations obtained between individual scores for elements of the TMT and other psychometric tests, it has been surmised that the TMT-A draws mainly upon visuo-perceptual abilities, and that the TMT-B is primarily an expression of WM and task-switching ability (Sánchez-Cubillo et al., 2009, p. 448). As it has been assumed that the demands in relation to ''motor speed and visual scanning'' are equivalent for both parts, further scores are also often derived (Arbuthnott and Frank, 2000, p. 519). Most frequently, these are the TMT-B − TMT-A difference score and the TMT-B/TMT-A ratio. It is believed that, by reflecting the additional requirements of the TMT-B, these scores measure ''executive control'' (p. 519). Because, in the prototypical variants of the task, the distance between the targets is greater in the TMT-B than in the TMT-A (Gaudino et al., 1995; Bowie and Harvey, 2006), a ratio score provides the more appropriate form of normalization. Indeed, it has been argued that it is essential to employ the ratio score when the goal is to evaluate executive function (Oosterman et al., 2010).
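The two derived TMT scores can be computed as below. This is a hypothetical sketch (the function and key names are ours). It is paired with an assumed scenario in which a purely motor speed-up shortens both parts by the same proportion. Under that assumption the difference score shrinks while the ratio is unchanged, which illustrates why the ratio is the more conservative index of executive change.

```python
def tmt_scores(tmt_a_s, tmt_b_s):
    """Derived Trail Making Test scores from TMT-A and TMT-B completion times (seconds)."""
    if tmt_a_s <= 0 or tmt_b_s <= 0:
        raise ValueError("completion times must be positive")
    return {
        "difference": tmt_b_s - tmt_a_s,  # TMT-B minus TMT-A, in seconds
        "ratio": tmt_b_s / tmt_a_s,       # TMT-B / TMT-A, unitless
    }
```

With hypothetical times of 30 s (TMT-A) and 75 s (TMT-B), the difference is 45 s and the ratio 2.5. If an intervention uniformly speeds both parts by 20% (24 s and 60 s), the difference falls to 36 s and could be read as an executive gain, while the ratio stays at 2.5.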

It is perhaps surprising, therefore, that many of the studies that have reported a positive impact of motor training or PA have treated the TMT-A and TMT-B separately (or singly), or reported a single additive measure (Emery et al., 1998; Scherder et al., 2005; Baker et al., 2010; Nguyen and Kruse, 2012; Vaughan et al., 2014; Eggenberger et al., 2015; Tazkari, 2016; de Natale et al., 2017; Gregory et al., 2017; Jonasson et al., 2017). Notwithstanding any evidence that completion times for either part of the TMT may be correlated (across individuals) with other measures of cognitive function (e.g., Sánchez-Cubillo et al., 2009), an intervention-induced change in the performance of TMT-A or TMT-B cannot simply be attributed to an effect on a specific faculty such as cognitive flexibility or set shifting. Contingent variations in the influence of other factors that mediate the successful completion of these tasks must also be contemplated.

On a prima facie basis, the intrinsic nature of the TMT suggests that individual levels of performance will be particularly sensitive to the integrity of motor function. Indeed, this much was implied in its original formulation. The neurophysiological evidence is consistent with this supposition. When variants of the TMT specifically adapted for neuroimaging are employed, for example using a ''virtual stylus'' (Zakzanis et al., 2005) or a button box (Jacobson et al., 2011) to collect responses within the confines of an fMRI scanner, the contrast between the TMT-A and TMT-B reveals differences in BOLD response in the dorsal part of M1 (Jacobson et al., 2011; see also Zakzanis et al., 2005; Kodabashi et al., 2014). In a clinical population, it has been reported that the TMT-B − TMT-A difference score is correlated with upper arm central motor conduction delay (Ravaglia et al., 2002). Using the TMT-B, Allen et al. (2011) registered task-related activation in the left precentral gyrus (M1), bilateral premotor cortex and the medial pre-SMA (see also Moll et al., 2002; Horacek et al., 2006). When measured using functional near-infrared spectroscopy, bilateral activity has been detected in premotor regions during the TMT-B (Müller et al., 2014). In a further study using the same imaging procedures, activity was also detected in M1, with older participants exhibiting greater task-related changes in O2Hb signal strength in this brain region, and in the right premotor cortex, during both variants (Hagen et al., 2014). Employing EEG-derived measures, Wölwer et al. (2012) reported associations between TMT-B performance and the current density of M1-designated sources.
In light of the evidence of cortical motor network mediation, and particularly in view of the limitations associated with measures derived from a single TMT variant, the most parsimonious explanation is that intervention-associated enhancements in performance of the TMT-A or TMT-B (i.e., considered separately or additively) arise from adaptations in motor function.

In a small number of published studies focusing on motor training or PA, a TMT-B/TMT-A ratio score has been employed (Klusmann et al., 2010; Yin et al., 2014; cf. Schoene et al., 2015). In none of these cases was an effect of a motor training or PA intervention demonstrated. Null findings have also been reported in all of the studies known to us in which a TMT-B − TMT-A difference score has been used as a dependent measure (Nagamatsu et al., 2012; Forte et al., 2013; Liu-Ambrose et al., 2016; Barban et al., 2017; de Natale et al., 2017; Jonasson et al., 2017).

Given the evidence presently available, therefore, there is no basis upon which to suppose that motor training or PA has a reliable impact upon the facets of executive function to which the TMT is thought to be specifically sensitive. While reductions in completion times have been reported for individual task variants (TMT-A or TMT-B), it is not possible to exclude the possibility that these are attributable to changes in aspects of motor function arising from the particular mode of intervention.

# CONCLUSIONS

In the cognitive sciences, when faced with the practical challenges of operationalizing a theoretical construct, the pragmatic turn is to develop an experimental paradigm that captures its key features. Subsequently, however, idiosyncratic features of the methodology may become reified as the phenomenon of interest (Nosek et al., 2012), which may be quite distinct from the construct the paradigm was developed to capture, or is indeed capable of capturing. The point is that an improvement in test performance arising from a therapeutic intervention does not entail that the change can be interpreted in terms of the particular facet of cognition that the practitioner assigns to the test. Thus, while enhanced performance of the Eriksen Flanker task might be ascribed to an improvement in selective attention, in the absence of convergent evidence the effect of the intervention can with equal legitimacy be attributed to adaptations related to motor function. Indeed, for many interventions based on PA, the latter is the more parsimonious account.

As a research tradition has evolved to explore notionally different aspects of cognition (''WM'', ''numeracy'', ''executive function'' and so on), measured using different paradigms, there has also emerged a tendency to treat the mediating brain processes as similarly dissociable in terms of constituent functions. Thus, in an era in which brain imaging has become the tour de force of cognitive neuroscience, and with access to tools that assign activity to specific brain regions, it remains customary to discuss variations in task-dependent activation in terms of the functional localization of various aspects of cognition (Ross, 2010). Necessarily, however, the roles in cognition assumed by spatially circumscribed regions of the brain are highly diverse (Anderson, 2014). This is certainly true of many elements of the cortical motor network. Beyond the specific examples given above, there is overwhelming evidence that their engagement is an obligatory feature of cognition in general, and of the performance of tests of cognitive function in particular. In circumstances such as therapeutic interventions based on PA, in which the purpose is more effective and/or efficient motor output, functional adaptations within motor networks are anticipated. In light of the tests that are employed, there is every reason to believe that many improvements in cognition ascribed to these interventions can also be accounted for in these terms.

We should strive to ensure that cognitive enhancement remains, at its core, an effort to improve the quality of life of older adults, targeting functional independence and activities of daily living. ''Far'' transfer from physical training to test-derived measures of cognition offers promise; however, the transfer may not be as ''far'' as is assumed, or as ''far'' as is required.

#### AUTHOR CONTRIBUTIONS

The authors together wrote the article, jointly contributed the intellectual content and gave approval to the final version of the article to be published.

#### FUNDING

This research was supported in part by the Irish Research Council (grant no. GOIPG/2014/258). Richard Carson thanks Atlantic Philanthropies for their generous support, through their funding of the Neuro-Enhancement for Independent Lives (NEIL) programme at Trinity College Institute of Neuroscience.



**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Cremen and Carson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Reading Aloud and Solving Simple Arithmetic Calculation Intervention (Learning Therapy) Improves Inhibition, Verbal Episodic Memory, Focus Attention and Processing Speed in Healthy Elderly People: Evidence from a Randomized Controlled Trial

Rui Nouchi<sup>1,2,3</sup>\*, Yasuyuki Taki<sup>4,5,6</sup>, Hikaru Takeuchi<sup>6</sup>, Takayuki Nozawa<sup>3,7</sup>, Atsushi Sekiguchi<sup>5,8</sup> and Ryuta Kawashima<sup>3,6,7,9</sup>

<sup>1</sup> Creative Interdisciplinary Research Division, Frontier Research Institute for Interdisciplinary Science (FRIS), Tohoku University, Sendai, Japan, <sup>2</sup> Human and Social Response Research Division, International Research Institute of Disaster Science, Tohoku University, Sendai, Japan, <sup>3</sup> Smart Ageing International Research Centre, Institute of Development, Aging and Cancer, Tohoku University, Sendai, Japan, <sup>4</sup> Department of Nuclear Medicine and Radiology, Institute of Development, Aging and Cancer, Tohoku University, Sendai, Japan, <sup>5</sup> Division of Medical Neuroimage Analysis, Department of Community Medical Supports, Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan, <sup>6</sup> Division of Developmental Cognitive Neuroscience, Institute of Development, Aging and Cancer, Tohoku University, Sendai, Japan, <sup>7</sup> Department of Ubiquitous Sensing, PreClinical Research Center, Institute of Development, Aging and Cancer, Tohoku University, Sendai, Japan, <sup>8</sup> Department of Adult Mental Health, National Institute of Mental Health, National Center of Neurology and Psychiatry, Tokyo, Japan, <sup>9</sup> Department of Functional Brain Imaging, Institute of Development, Aging and Cancer, Tohoku University, Sendai, Japan

Background: Previous reports have shown that simple cognitive training using reading aloud and solving simple arithmetic calculations, so-called "learning therapy", can improve executive functions and processing speed in older adults. Nevertheless, it remains unclear whether learning therapy improves a wide range of cognitive functions. We investigated the beneficial effects of learning therapy on various cognitive functions in healthy older adults.

Methods: We used a single-blinded intervention with two groups (learning therapy group: LT; waiting list control group: WL). Sixty-four older adults were randomly assigned to LT or WL. In LT, participants performed training tasks of reading Japanese aloud and solving simple calculations for 6 months. WL did not participate in the intervention. We measured several cognitive functions before and after the 6-month intervention period.

Results: Compared to WL, LT improved inhibition performance in executive functions (Stroop: LT (Mean = 3.88) vs. WL (Mean = 1.22), adjusted p = 0.013; reverse Stroop: LT (Mean = 3.22) vs. WL (Mean = 1.59), adjusted p = 0.015), verbal episodic memory (Logical Memory (LM): LT (Mean = 4.59) vs. WL (Mean = 2.47), adjusted p = 0.015), focus attention (D-CAT: LT (Mean = 2.09) vs. WL (Mean = −0.59), adjusted p = 0.010) and processing speed (digit symbol coding: LT (Mean = 5.00) vs. WL (Mean = 1.13), adjusted p = 0.015; Symbol Search (SS): LT (Mean = 3.47) vs. WL (Mean = 1.81), adjusted p = 0.014).

Discussion: This randomized controlled trial (RCT) showed the benefits of LT for inhibition in executive functions, verbal episodic memory, focus attention and processing speed in healthy elderly people. We discuss our results in terms of the overlapping hypothesis.

#### Edited by:

Soledad Ballesteros, Universidad Nacional de Educación a Distancia (UNED), Spain

#### Reviewed by:

Hidenao Fukuyama, Kyoto University, Japan

Eduard Kraft, University of Munich, Germany

\*Correspondence: Rui Nouchi rui.nouchi.a4@tohoku.ac.jp

Received: 05 February 2016 Accepted: 26 April 2016 Published: 17 May 2016

#### Citation:

Nouchi R, Taki Y, Takeuchi H, Nozawa T, Sekiguchi A and Kawashima R (2016) Reading Aloud and Solving Simple Arithmetic Calculation Intervention (Learning Therapy) Improves Inhibition, Verbal Episodic Memory, Focus Attention and Processing Speed in Healthy Elderly People: Evidence from a Randomized Controlled Trial. Front. Hum. Neurosci. 10:217. doi: 10.3389/fnhum.2016.00217

Keywords: learning therapy, cognitive training, reading aloud, simple calculation, transfer effect

# BACKGROUND

Cognitive function changes across the lifespan (Hedden and Gabrieli, 2004). Previous reports have shown that older adults exhibit declines in memory performance (Salthouse, 2003), attentional processes (Yakhno et al., 2007), processing speed (Salthouse, 1996) and executive functions (Royall et al., 2004). Defining the term executive function is important. The term ''executive function'' is used as an umbrella for various complex cognitive processes and sub-processes (Elliott, 2003). Executive function includes the high-level cognitive processes that facilitate new ways of behaving and optimize one's approach to unfamiliar circumstances (Gilbert and Burgess, 2008). The three main components of executive functions are ''updating (constant monitoring and rapid addition/deletion of working-memory contents), shifting (switching flexibly between tasks or mental sets) and inhibition (deliberate overriding of dominant or prepotent responses)'' (Miyake et al., 2000; Miyake and Friedman, 2012). Herein, we use executive function as a collective term that includes updating, shifting and inhibition.

Previous studies have demonstrated improvements in cognitive functions in older adults (Verhaeghen et al., 1992; Ball et al., 2002; Clare and Woods, 2004; Edwards et al., 2005; Carretti et al., 2007; Green and Bavelier, 2008; Lustig et al., 2009; Jean et al., 2010; Nouchi et al., 2012a; Ballesteros et al., 2014). Cognitive training is an intervention program that provides structured practice on tasks related to aspects of cognitive function. There are many types of cognitive training, such as working memory training (Richmond et al., 2011), processing speed training (Edwards et al., 2005, 2009), memory strategy training (Verhaeghen et al., 1992; Mahncke et al., 2006a; Carretti et al., 2007) and brain training games (Nouchi et al., 2012b; Anguera et al., 2013; Lampit et al., 2014; Toril et al., 2014). Developing cognitive training programs and validating their effectiveness have been at the forefront of scientific efforts (Nouchi and Kawashima, 2014).

Recently, cognitive training using reading aloud and solving simple arithmetic calculations, so-called learning therapy (LT), has been developed (Kawashima et al., 2005; Uchida and Kawashima, 2008; Nouchi et al., 2012b). We developed this new cognitive training for the following reasons. First, a previous study suggested that cognitive training should be designed in accordance with neuroscientific evidence (Papp et al., 2009). Accordingly, the training tasks of learning therapy were created based on recent neuroimaging findings. Learning therapy uses reading text aloud and solving simple calculations. Neuroimaging studies show that reading sentences aloud (Miura et al., 2003, 2005) and simple arithmetic operations (Kawashima et al., 2004; Arsalidou and Taylor, 2011) activate the frontal, temporal and parietal cortices. Second, psychological stress related to cognitive training reduced the improvements in cognitive functions after training (McAvinue et al., 2013). We therefore selected extremely simple and easy tasks for the intervention period. Additionally, the training tasks are familiar activities for older people because reading and calculation are part of everyday life. Moreover, the training tasks require only paper and pencil. Consequently, elderly people can comprehend and perform the training tasks. Third, previous studies have suggested that training tasks with multiple components facilitate the improvement of cognitive functions (Stuss et al., 2007; Green and Bavelier, 2008). Reading aloud and solving arithmetic problems are accomplished using a combination of cognitive processes including memory, attention and executive functions (inhibition, shifting and updating; Nouchi et al., 2012b). Therefore, we prepared reading aloud and mathematical calculation tasks as the training tasks of learning therapy. Learning therapy is expected to have greater effects than previous cognitive training because it alleviates the difficulties mentioned above.

Earlier reports described that learning therapy improves executive functions, episodic memory and processing speed (Kawashima et al., 2005; Uchida and Kawashima, 2008; Yoshida et al., 2014). However, no study has investigated the positive effects of learning therapy on a wider range of cognitive functions (e.g., memory and attention) in elderly people. Previous studies measured only executive functions, processing speed and, in one study, episodic memory. It is necessary to test the effectiveness of learning therapy on diverse cognitive functions, because various cognitive functions support our actions and behaviors in everyday life.

As mentioned above, previous studies of learning therapy showed improvements only in executive functions, episodic memory and processing speed (Kawashima et al., 2005; Uchida and Kawashima, 2008; Yoshida et al., 2014). It is still unclear whether learning therapy can improve performance in other cognitive domains, such as short-term memory, working memory, reading ability and attention, or in episodic memory as measured by other tests. Thus, this study was designed to investigate whether learning therapy can improve diverse cognitive functions in elderly people. This study differs from previous studies in two essential respects. First, the current study aimed to validate the previous findings using different cognitive tests in the same domains, such as executive functions and processing speed. For example, previous studies used (1) the Frontal Assessment Battery (FAB) as the executive function measure (Kawashima et al., 2005; Uchida and Kawashima, 2008); (2) digit symbol coding as the processing speed measure (Kawashima et al., 2005; Uchida and Kawashima, 2008); and (3) a word list as the episodic memory measure (Yoshida et al., 2014). In contrast, we used the Stroop and verbal fluency tests as executive function measures, Symbol Search (SS) as a processing speed measure, and Logical Memory (LM) and face-name association memory as episodic memory measures. Second, we measured other cognitive domains, such as focus attention, short-term memory and working memory, that were not measured in previous studies. Thus, we could investigate for the first time the beneficial effects of LT on a wide range of cognitive functions in a single study.

To elucidate the beneficial effects of learning therapy on a wide range of cognitive functions, we conducted a single-blinded randomized controlled trial (RCT) of learning therapy. Testers who administered the psychological tests were blinded to the study hypothesis and to group membership (learning therapy group or not). To test the benefits of learning therapy, we assessed a broad range of cognitive functions: inhibition and shifting of executive functions, verbal and facial episodic memory, short-term memory, working memory, reading ability, focus attention and processing speed.

Based on previous studies using cognitive training related to reading and mathematical calculations, we hypothesized that learning therapy would improve inhibition in the executive function domain, verbal episodic memory in the episodic memory domain, focus attention and processing speed in healthy elderly people. Our reasons were as follows. (1) Regarding inhibition, a previous study (Uchida and Kawashima, 2008) found that performance on the FAB (Dubois et al., 2000), which measures executive functions, improved after learning therapy. In particular, the FAB subscore for sensitivity to interference increased after learning therapy. The sensitivity-to-interference task requires behavioral self-regulation when verbal commands conflict with sensory information, so it may be closely related to inhibition ability in executive functions. This task is similar to the Stroop task. In young adults, cognitive training using calculation (Takeuchi et al., 2011) and video game training that included reading aloud and mental calculations (Nouchi et al., 2013) improved Stroop task performance. Consequently, we hypothesized that learning therapy would improve inhibition as measured by Stroop task performance. (2) For episodic memory, a previous study using working memory training, which included calculation training, demonstrated improvements in episodic memory (McAvinue et al., 2013). An earlier long-term intervention study using reading aloud and calculation revealed improvements in verbal episodic memory as measured by word lists (Yoshida et al., 2014). Consequently, we hypothesized that learning therapy would improve verbal episodic memory as measured by the LM test. (3) For focus attention, solving mathematical calculations and reading sentences require attending specifically to the presented information (focus attention). We therefore assumed that focus attention would be improved by the simple calculation and reading aloud tasks. (4) For processing speed, our previous study described improvements in processing speed measured by digit symbol coding (Uchida and Kawashima, 2008). Additionally, video game training using reading aloud and calculations improved processing speed measured by SS (Nouchi et al., 2013). Based on these findings, we hypothesized that learning therapy would improve processing speed as measured by SS and digit symbol coding. In summary, we hypothesized that learning therapy would improve inhibition, verbal episodic memory, focus attention and processing speed compared with the control group.

# METHODS

## Randomized Controlled Trial Design and Setting of This Trial

The following information is the same as in our protocol article (Nouchi et al., 2012b). This study was an RCT. It was ''conducted in Sendai city, Miyagi prefecture, Japan. Written informed consent to participate in the study was obtained from each participant based on the Declaration of Helsinki before enrolment. Informed consent were approved by the Ethics Committee of the Tohoku University Graduate School of Medicine (ref. 2011-153). This study was registered in the University Hospital Medical Information Network (UMIN) Clinical Trial Registry (UMIN000006998)'' (Nouchi et al., 2012b).

To assess the impact of learning therapy on widely diverse cognitive functions in healthy elderly people, we used a single-blinded intervention with two parallel groups: a learning therapy group and a waiting list (WL) control group. Testers were blind to the study's hypothesis and the group membership of participants. The Consolidated Standards of Reporting Trials (CONSORT) statement (http://www.consortstatement.org/home/) was used as a framework for developing the study methodology (see Supplementary Material 1). The trial design is presented in **Figure 1**.

## Participants

Seventy-two participants were recruited from Sendai city through advertisements in the local town paper (Kahoku weekly; **Figure 1**). The recruitment procedure was the same as in our previous study (Nouchi et al., 2014). The sample size calculation is described in Supplementary Material 2. Interested participants were screened using a semi-structured telephone interview, during which we asked for their demographic information (e.g., age and gender) and asked questions related to our inclusion and exclusion criteria (e.g., past medical histories of disease known to affect the central nervous system and use of medications known to interfere with cognitive functions). ''After the telephone interview, four people were excluded because they reported taking medications known to interfere with cognitive function (including benzodiazepines, antidepressants and other central nervous agents)'' (Nouchi et al., 2012b). Two people were excluded because of severe hypertension (systolic blood pressure over 180 mmHg, diastolic blood pressure over 110 mmHg). Two participants declined to participate before random assignment. All included participants (n = 64) were invited to visit Tohoku University for a more detailed screening assessment, the Japanese Reading Test (JART) and the Mini-Mental State Examination (MMSE; Folstein et al., 1975), and to provide written informed consent. No participant was excluded based on JART scores. Randomization was conducted after written informed consent was obtained. Before the intervention period, we measured reading ability using the Vocabulary subscale of the Wechsler Adult Intelligence Scale (WAIS; Wechsler, 1997) and arithmetic ability using the WAIS Arithmetic subscale (Wechsler, 1997). **Table 1** presents the baseline characteristics. There were no significant differences between the groups on any measure (two-sample t-test, p > 0.10). MMSE and IQ scores were within the normal range.

TABLE 1 | Characteristics of participants in the learning therapy group and the waiting list control group.

No significant difference was found between the learning therapy and the waiting list control groups (two-sample t-test, p > 0.10). JART, Japanese Reading Test; SD, standard deviation; MMSE, Mini-Mental State Examination; WAIS, Wechsler Adult Intelligence Scale III.

Details of the sample size calculation are described in the protocol of this study (Nouchi et al., 2012b).

## Learning Therapy Group (Cognitive Intervention Group)

The cognitive intervention method was the same as that used in our previous study of learning therapy for healthy older adults (Uchida and Kawashima, 2008). Training comprises two simple tasks (solving arithmetic and Japanese language problems) with systematized basic problems in arithmetic and reading (Uchida and Kawashima, 2008). Examples of the training tasks are shown in **Figures 2**, **3**. The lowest level of difficulty in simple calculation was simple addition (e.g., 1 + 3). The highest level was division of a three-digit number (e.g., 156 ÷ 3). The lowest level of difficulty in reading aloud was reading simple sentences of 17 characters, such as Japanese haiku poems. The highest level was reading fairy tales aloud (about 100–120 characters per page). The difficulty of the Japanese language problems was based on the number of Japanese characters per page, because sentences with more characters take longer to read aloud.

Before the intervention, we measured the percentage of correct answers and the time taken to solve the diagnostic tests, which consisted of 70 simple arithmetic and 16 simple reading tests. Based on the diagnostic tests, we set the appropriate difficulty level and workload of the training tasks so that each participant could solve the problems with ease and without mental stress within 15 min. The difficulty level and workload of the training tasks did not change during the intervention period.

The cognitive intervention was scheduled to last 23 weeks. Participants in the cognitive intervention group were asked to attend a classroom at Tohoku University once a week, where they completed five sheets of each task prepared for each participant for that day. The daily learning time for the two tasks was approximately 15 min. Participants were also required to complete five sheets of each task as homework on 4–6 days a week.

## Waiting List Control Group (No Cognitive Intervention Group)

The following text is the same as in our previous protocol article (Nouchi et al., 2012b). ''The waiting list control group received no intervention. Those participants were informed by letter that they were scheduled to receive an invitation to participate after a waiting period of 6 months. No placebo was used for the social contact group'' (Nouchi et al., 2012b). Results of previous intervention studies (Clark et al., 1997; Mahncke et al., 2006) showed that ''a placebo group was unnecessary for study of this type because no difference existed in cognitive or functional improvement between the placebo and no-social-contact groups (control group)'' (Nouchi et al., 2012b). After the 6-month waiting period, 26 elderly people in the waiting list group participated in learning therapy. The other six did not receive learning therapy: three dropped out during the waiting period and three did not wish to receive learning therapy after the waiting period.

We used a waiting list control group for several reasons. To investigate the beneficial effects of cognitive training, a control condition is needed to account for practice effects and age-related cognitive decline. Several studies have used no-contact or waiting list groups (Levine et al., 2007; Basak et al., 2008; Shatil et al., 2010; McDougall and House, 2012). Others have used a variety of activities as active control conditions (Buiza et al., 2008; Peretz et al., 2011; Nouchi et al., 2012a), such as selecting letters in the newspaper (Herrera et al., 2012), learning trivia using web sites (Richmond et al., 2011a) and video games (Nouchi et al., 2012a). From a methodological perspective, an alternative intervention program serving as an active control group is preferable to a waiting list control group in cognitive training studies. However, an active control group raises an ethical issue: it should strike ''a reasonable balance of the level of benefit to the time investment required of participants in the treatment group'' (Street and Luoma, 2002). An active control group with meaningless and worthless tasks is not suitable, and in the cognitive training research field there is no standard, appropriate task for an active control group. The choice of an active control condition must therefore weigh costs, research questions and ethical considerations. Additionally, results of previous intervention studies (Clark et al., 1997; Mahncke et al., 2006) reported that a placebo group was unnecessary for studies of this type because no difference existed in cognitive or functional improvement between the placebo and no-social-contact groups (control group). Moreover, using a waiting list control group has ''the advantage of letting everyone in the study receive the new intervention (sooner or later)'' (Nouchi et al., 2014). Therefore, we decided to use a waiting list control group.

## Cognitive Function Measures

The following text is the same as in our previous article (Nouchi et al., 2012b). To test the positive effects of learning therapy on performance in cognitive functions, we used several cognitive measures (**Table 2**). ''Measures of the cognitive functions were divisible into seven categories (executive functions, episodic memory, short-term memory, working memory, reading ability, attention and processing speed)'' (Nouchi et al., 2012b). Executive functions were measured by the Stroop Test (ST; Hakoda and Sasaki, 1990; Watanabe et al., 2011) and the Verbal Fluency Task (VFT; Ito et al., 2004). The ST comprised ST and reverse ST (rST) conditions. For ST, participants were required to select a word corresponding to the ink color of the color–word combination. For rST, participants were asked to choose a color item corresponding to the semantic meaning of the color–word combination. For VFT, we used the Japanese versions of the letter fluency task (LFT) and the category fluency task (CFT). Participants were asked to generate as many words as possible beginning with a specific letter (LFT) or belonging to a specific category (CFT). Episodic memory was measured using LM (Wechsler, 1987) and First and Second Names (FSN; Wilson et al., 1985). For LM, participants were asked to memorize a short story. For FSN, participants were asked to memorize first and second names paired with faces. Verbal short-term memory was measured by Digit Span Forward (DS-F; Wechsler, 1997). For DS-F, participants were asked to memorize a sequence of digits and repeat it. Working memory was measured using Digit Span Backward (DS-B; Wechsler, 1997). For DS-B, participants were required to memorize a sequence of digits and repeat it in reverse order. Reading ability was measured by the Japanese Reading Test (JART; Matsuoka et al., 2006), a test of reading 25 Japanese Kanji. The Digit Cancellation Task (D-CAT) was used to measure focus attention (Hatta et al., 2000). For D-CAT, participants were asked to check the target numbers in 12 rows of 50 digits. Digit Symbol Coding (Cd; Wechsler, 1997) and SS (Wechsler, 1997) were used as processing speed tests. For Cd, participants were required to write the symbol corresponding to each number. For SS, participants were required to search for the target symbols among five symbols. Details of all tasks are described in Supplementary Material 2. We assessed these cognitive function measures before and after the intervention period (6 months).

TABLE 2 | Summary of cognitive function measures.


## Analysis

This study was designed to evaluate the beneficial effects of learning therapy in elderly people. The following analysis method was also presented in our protocol article (Nouchi et al., 2012b). We calculated the change score (post-training score minus pre-training score) for all cognitive function measures and conducted an ANCOVA on the change scores for each cognitive test. The change scores were the dependent variable, and group (learning therapy, waiting list control) was the independent variable. Pre-training scores on the dependent variable, sex and age were entered as covariates, to exclude the possibility that pre-existing differences between groups affected the result for each measure and to adjust for background characteristics. Significance was inferred for p < 0.05. We used Storey's false discovery rate (FDR) correction method to adjust the p values (Storey, 2002). We report eta squared (η²) as an index of effect size (Cohen, 1988). Missing data were imputed using the expectation-maximization method, as implemented in the Statistical Package for the Social Sciences (SPSS) Missing Value Analysis, which imputes missing values iteratively by maximum likelihood estimation from the observed data (Dempster et al., 1977; Nouchi et al., 2012b). All randomized participants were included in the analyses according to their allocation, irrespective of how many sessions they completed (intention-to-treat principle). All analyses were conducted using SPSS (ver. 18 or higher; SPSS Inc.).
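The ANCOVA on change scores described above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' SPSS procedure: it fits the change score on an intercept, the pre-training score, sex and age (reduced model), then adds a 0/1 group indicator (full model), and treats the drop in residual sum of squares as the group effect. The function names (`ancova_on_change`, `_sse`, `_solve`) are hypothetical, sex and group are assumed to be coded 0/1, and η² is computed here as SS_group / SS_total of the change scores, which is one common definition; software output may instead report partial η². With 64 participants and five estimated parameters, the error degrees of freedom are 64 − 5 = 59, matching the F(1,59) values reported in the Results.

```python
def _solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def _sse(columns, y):
    """Residual sum of squares after an ordinary least-squares fit of y on the columns."""
    k = len(columns)
    xtx = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(u * v for u, v in zip(columns[i], y)) for i in range(k)]
    beta = _solve(xtx, xty)
    fitted = [sum(beta[j] * columns[j][t] for j in range(k)) for t in range(len(y))]
    return sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))


def ancova_on_change(pre, post, group, sex, age):
    """F test for the group term in the model: change ~ group + pre + sex + age.

    group and sex are assumed to be coded 0/1.
    Returns (F, (df_effect, df_error), eta_squared).
    """
    change = [after - before for before, after in zip(pre, post)]  # post minus pre
    n = len(change)
    covariates = [[1.0] * n, list(pre), list(sex), list(age)]  # intercept + covariates
    sse_reduced = _sse(covariates, change)                # model without group
    sse_full = _sse(covariates + [list(group)], change)   # model with group added
    df_error = n - (len(covariates) + 1)                  # n minus 5 parameters
    ss_group = sse_reduced - sse_full                     # unique group contribution
    f_value = ss_group / (sse_full / df_error)
    mean_change = sum(change) / n
    ss_total = sum((c - mean_change) ** 2 for c in change)
    eta_squared = ss_group / ss_total                     # one common definition of eta squared
    return f_value, (1, df_error), eta_squared
```

In such a sketch the function would be called once per cognitive measure. Converting F to a p value requires the F distribution (for example `scipy.stats.f.sf(f_value, 1, df_error)` if SciPy is available), and the resulting p values would then be FDR-adjusted.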

## RESULTS

Sixty-four elderly people participated in this RCT (learning therapy and waiting list control). Thirty of the 32 members of the learning therapy group and 29 of the 32 members of the waiting list control group completed the study. Following the intention-to-treat principle, we imputed missing values for the two participants in the learning therapy group and the three participants in the waiting list control group who did not complete it (see ''Analysis'' Section). The pre-training and post-training scores for cognitive functions are presented in **Table 3**.

To test the positive effect of learning therapy on cognitive functions, we conducted an ANCOVA on the change scores for each cognitive test (**Table 4**). We found significant group differences in four cognitive domains (executive functions, episodic memory, attention and processing speed). For executive functions, the learning therapy group improved more than the waiting list control group on the ST score (learning therapy (Mean = 3.88) vs. waiting list control (Mean = 1.22), F(1,59) = 7.35, η² = 0.11, adjusted p = 0.013) and the rST score (learning therapy (Mean = 3.22) vs. waiting list control (Mean = 1.59), F(1,59) = 8.72, η² = 0.09, adjusted p = 0.015). For episodic memory, the learning therapy group improved more on the LM score (learning therapy (Mean = 4.59) vs. waiting list control (Mean = 2.47), F(1,59) = 9.72, η² = 0.12, adjusted p = 0.015). For focused attention, the learning therapy group improved more on the D-CAT score (learning therapy (Mean = 2.09) vs. waiting list control (Mean = -0.59), F(1,59) = 12.23, η² = 0.14, adjusted p = 0.010). For processing speed, the learning therapy group improved more on the Cd score (learning therapy (Mean = 5.00) vs. waiting list control (Mean = 1.13), F(1,59) = 9.85, η² = 0.13, adjusted p = 0.015) and the SS score (learning therapy (Mean = 3.47) vs. waiting list control (Mean = 1.81), F(1,59) = 7.81, η² = 0.10, adjusted p = 0.014). These results demonstrate that learning therapy led to improvements in inhibition performance in executive functions, the verbal episodic memory measure, the focused attention measure and all processing speed measures.
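The adjusted p values reported above come from Storey's FDR procedure (Storey, 2002), as stated in the Analysis section. A minimal NumPy sketch of the q-value computation follows. It assumes the simple fixed-λ estimator of π0, the proportion of true null hypotheses, with λ = 0.5; Storey's method also admits other choices of λ and smoothed estimators, and the settings used in this study are not reported here. Because π0 ≤ 1, the q-values never exceed the Benjamini-Hochberg adjusted p values; passing `pi0=1.0` reproduces those, which is a conservative choice when only a small number of tests is available and the π0 estimate is noisy. The function names are illustrative.

```python
import numpy as np


def storey_pi0(pvals, lam=0.5):
    """Estimate the proportion of true null hypotheses, pi0, at a fixed lambda."""
    p = np.asarray(pvals, dtype=float)
    pi0 = np.mean(p > lam) / (1.0 - lam)
    return min(pi0, 1.0)


def storey_qvalues(pvals, lam=0.5, pi0=None):
    """Storey-style q-values; pi0=1.0 reproduces Benjamini-Hochberg adjusted p values."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    if pi0 is None:
        pi0 = storey_pi0(p, lam)
    order = np.argsort(p)                 # ascending p values
    ranks = np.arange(1, m + 1)
    raw = pi0 * m * p[order] / ranks
    # q_(i) = min over j >= i of raw_(j): cumulative minimum from the largest p down
    q_sorted = np.minimum.accumulate(raw[::-1])[::-1]
    q_sorted = np.minimum(q_sorted, 1.0)
    q = np.empty(m)
    q[order] = q_sorted                   # back to the original test order
    return q
```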

# DISCUSSION

We investigated the positive effects of learning therapy on diverse cognitive functions in healthy elderly people. Compared with the waiting list control group, the learning therapy group showed improved inhibition performance in executive functions measured by ST and rST, verbal episodic memory measured by LM, focused attention measured by D-CAT and processing speed measured by SS and Cd. These results supported our hypothesis. The improvement in Cd performance after learning therapy is consistent with previous evidence (Kawashima et al., 2005; Uchida and Kawashima, 2008). Moreover, this study extends previous findings by demonstrating improvements in focused attention and by replicating the improvements in inhibition and processing speed with different measures (ST, rST and SS) after learning therapy.

The overlapping hypothesis can explain the present results (Nouchi et al., 2012a, 2013, 2014). According to this hypothesis, a given type of training improves performance on an untrained task when the training tasks (e.g., reading aloud and solving simple arithmetic problems) and the untrained task (e.g., a measure of cognitive function) involve overlapping cognitive processes. In this study, participants undertook learning therapy, which included a reading aloud task and a simple arithmetic calculation task. As described in the Background, reading aloud and simple arithmetic calculation involve many cognitive components and processes. To perform these tasks, inhibition related to executive functions, verbal episodic memory, focused attention and processing speed are expected to be recruited. For instance, attention is necessary to modulate one's own voice while reading sentences aloud, and inhibition is important for selecting and applying mathematical rules during calculation. Long-term memory (semantic and verbal episodic memory) is expected to be necessary to read and comprehend a sentence during reading aloud (LaBerge and Samuels, 1974; Myers et al., 2000). When we read a sentence, we construct a mental representation known as the situation model (Kintsch, 1994; Zwaan and Radvansky, 1998). To facilitate understanding and integration of incoming information from the sentence, we use related information from verbal long-term memory (Cook et al., 1998). Both semantic and verbal episodic memory are important for creating and elaborating the situation model (Van Dijk and Kintsch, 1983; Kintsch et al., 1999). Some previous studies showed that semantic memory plays the most important role in reading and comprehending sentences (Garrod and Terras, 2000). In particular, a patient with episodic memory dysfunction had no problem with reading (Tulving et al., 1988). Thus, most theories assume that reading and comprehending a sentence rely more on semantic than on episodic memory. However, in healthy people, verbal episodic memory may influence reading as early as general world knowledge does (Myers et al., 2000). One previous study reported that verbal episodic memory for prior text facilitated reading (Haviland and Clark, 1974). Verbal episodic memory for prior text is reactivated automatically when we encounter related information while reading (Myers and O'Brien, 1998). Consequently, verbal episodic memory, as well as semantic memory, is necessary to read and comprehend texts. Focused attention is expected to be necessary to perform each task with concentration. Processing speed plays an important role in solving mathematical problems and reading as quickly as possible during training.

According to the overlapping hypothesis, the improvements in inhibition, verbal episodic memory, focused attention and processing speed after learning therapy can be explained as follows. First, performing the reading aloud task and the simple arithmetic calculation task would require the mental processes described above. Second, these training tasks and the cognitive tests can share similar cognitive processes. Third, these training tasks are expected to facilitate those shared processes. Therefore, inhibition performance in executive functions, verbal episodic memory, focused attention and processing speed improved after training.

TABLE 3 | Cognitive function scores before and after training period in both groups.

Group comparison (two-sample t-tests) of the pre-training scores revealed no significant difference in any measure of cognitive function between the learning therapy group and the waiting list control group (p > 0.10). Pre, pre-training; post, post-training; SD, standard deviation. Executive functions were measured using the Stroop Test (ST) and Verbal fluency task (VFT). Attention was measured using the Digit Cancellation Task (D-CAT). Episodic Memory was measured using Logical Memory (LM) and First and Second Names (FSN). Short-term memory was measured using Digit Span Forward (DS-F). Working memory was measured using Digit Span Backward (DS-B). Processing speed was measured using Digit Symbol Coding (Cd) and Symbol Search (SS). Reading ability was measured using the Japanese Reading Test (JART).

Considering the common elements of the improved measures, learning therapy may have generally improved the processing speed involved in task performance. The improved measures (ST, rST, D-CAT, SS and Cd) all required participants to perform the task as quickly as possible within a limited time (please see ''Methods'' Section). The two training tasks likewise required participants to complete the training materials as quickly as possible. Based on the overlapping hypothesis, the cognitive training tasks (reading aloud and simple calculation) and the psychological tests of cognitive function can share processing speed components. Therefore, performance on the cognitive measures that require speed improved after learning therapy.

Change scores were calculated by subtracting the pre-training score from the post-training score. We conducted analysis of covariance (ANCOVA) on the change scores for each cognitive test. In the ANCOVA, the pre-training score on each cognitive test, sex and age were the covariates. Significance was inferred for p < 0.05. The p values were adjusted using the FDR method. Eta squared (η²) is reported as an index of effect size. As a descriptive index of the strength of association between an experimental factor (main effect or interaction effect) and a dependent variable, η² is defined as the proportion of total variation attributable to the factor, and it ranges in value from 0 to 1. η² ≥ 0.01 is regarded as a small effect, η² ≥ 0.06 as a medium effect, and η² ≥ 0.14 as a large effect. LT, learning therapy; WL, waiting list control group; SD, standard deviation. Executive functions were measured using the Stroop Test (ST) and Verbal fluency task (VFT). Episodic Memory was measured using Logical Memory (LM) and First and Second Names (FSN). Short-term memory was measured using Digit Span Forward (DS-F). Working memory was measured using Digit Span Backward (DS-B). Attention was measured using the Digit Cancellation Task (D-CAT). Processing speed was measured using Digit Symbol Coding (Cd) and Symbol Search (SS). Reading ability was measured using the Japanese Reading Test (JART).

It is important to consider the dissociations in the improvements of executive function and episodic memory measures. The benefits of learning therapy did not generalize to all tasks within the same domain. In this study, the learning therapy group improved inhibition performance in executive functions (ST and rST) but not shifting performance (LFT and CFT). Similarly, learning therapy improved a verbal episodic memory measure (LM) but not a facial episodic memory measure (FSN). According to the overlapping hypothesis, the training tasks (reading aloud and solving simple arithmetic problems) and measures such as LFT, CFT and FSN might not share the same cognitive processes. For instance, previous studies showed that LFT and CFT require both the generation of words within a subcategory (clustering) and the ability to shift to a new subcategory when the current one is exhausted (switching; Troyer et al., 1998). FSN requires face recognition and the association of first and second names with each other and with faces (Wilson et al., 1985). These processes (clustering, switching, face recognition and association) are not among those engaged during reading aloud and arithmetic calculation. Consequently, performance on LFT, CFT and FSN did not improve after learning therapy.
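To make the clustering and switching components concrete, the sketch below scores a single fluency trial. It is a simplified illustration in the spirit of Troyer et al. (1998), not their full scoring rules: it assumes each produced word has already been assigned to exactly one subcategory, treats only consecutive words from the same subcategory as a cluster, and averages cluster size over all clusters, including single-word clusters of size 0. The function name and example categories are hypothetical.

```python
from itertools import groupby


def cluster_switch_scores(subcategories):
    """Simplified clustering/switching scores for one fluency trial.

    `subcategories` lists the pre-assigned subcategory of each produced
    word, in production order. Consecutive words from the same subcategory
    form a cluster; cluster size is counted from the second word, so a
    single word is a cluster of size 0.
    """
    runs = [len(list(words)) for _, words in groupby(subcategories)]
    if not runs:
        return 0, 0.0
    switches = len(runs) - 1            # transitions between adjacent clusters
    sizes = [run - 1 for run in runs]   # size counted from the second word
    return switches, sum(sizes) / len(sizes)
```

For example, an animal-category trial tagged `["farm", "farm", "farm", "pet", "pet", "sea"]` contains three clusters, two switches and a mean cluster size of 1.0.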

The dissociation in episodic memory can also be explained as follows. Through reading aloud training, participants may have become more familiar with story structure and would therefore be able to use this knowledge to encode stories better after training. This account predicts an improvement in LM but not in FSN.

Executive functions are commonly divided into three main components (inhibition, shifting and updating). In these terms, learning therapy improved only inhibition, not shifting. The improvement in inhibition after reading aloud and simple calculation training is consistent with our previous results (Uchida and Kawashima, 2008; Nouchi et al., 2013). These dissociations suggest that multiple measures should be used for each cognitive domain, such as executive functions and episodic memory. Moreover, future studies should focus on a single cognitive domain (e.g., working memory) and use varied measures of it (e.g., operation span, reading span, n-back and digit (spatial) span). Such designs may reveal more about the impact of cognitive intervention on cognitive functions in elderly people.

In addition, it is important to discuss the null findings for verbal ability, short-term memory and working memory. For verbal ability measured by JART, participants must read Kanji compound words. The difficulty level of the Kanji words in JART is higher than the junior high school level. However, the Kanji in the reading aloud task would be easier than those in JART because our reading aloud texts are written at the first- to fourth-grade elementary school level (please see ''Methods'' Section). Thus, the participants did not learn how to read difficult Kanji during the intervention, and learning therapy did not improve JART performance. We also found no significant improvements in short-term and working memory measured by DS-F and DS-B. The absence of these improvements may stem from the lack of adaptive elements in the training tasks. Some previous studies demonstrated that intensive adaptive training can facilitate improvements in cognitive functions (Takeuchi et al., 2011; McAvinue et al., 2013). One previous study of mental calculation training in young adults reported that intensive adaptive mental calculation training improved short-term memory performance, whereas non-intensive, non-adaptive mental calculation training did not (Takeuchi et al., 2011). In the present study, we set an appropriate difficulty level for both training tasks based on diagnostic tests, but the difficulty level did not change during the intervention period. This may be why learning therapy showed no improvement in short-term and working memory performance. Further studies are needed to ascertain whether learning therapy with intensive adaptive training would improve short-term and working memory performance.
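The contrast between fixed and adaptive difficulty can be illustrated with a simple staircase. The rule below (move up one level after three consecutive correct trials, move down one level after any error) is a hypothetical example chosen for clarity; it is not the procedure of Takeuchi et al. (2011), nor that of the learning therapy materials, whose difficulty stayed at the level set by the diagnostic tests. Function names and parameter defaults are illustrative.

```python
def adaptive_levels(responses, start=1, min_level=1, max_level=10, up_after=3):
    """Difficulty level presented on each trial under a simple staircase.

    Hypothetical rule: move up one level after `up_after` consecutive
    correct trials, move down one level after any error.
    """
    level, streak, history = start, 0, []
    for correct in responses:
        history.append(level)           # level shown on this trial
        if correct:
            streak += 1
            if streak == up_after:
                level = min(level + 1, max_level)
                streak = 0
        else:
            level = max(level - 1, min_level)
            streak = 0
    return history


def fixed_levels(responses, level=1):
    """Non-adaptive schedule: the same difficulty on every trial."""
    return [level] * len(responses)
```

With responses `[True, True, True, True, False, True]`, the adaptive schedule presents levels `[1, 1, 1, 2, 2, 1]`, whereas the fixed schedule stays at level 1 throughout.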

This study has several strengths compared with earlier studies assessing the effects of cognitive training in elderly people. As described in our protocol article (Nouchi et al., 2012b), we used easy-to-learn paper-and-pencil training tasks: reading sentences aloud and simple calculation. Most training tasks in previous studies were complex computer-based tasks (Berry et al., 2010; Li et al., 2010). Computers make it easy to record data precisely and to control tasks. Nevertheless, elderly people often have difficulty using computers (Wagner et al., 2010). Recent studies have identified several barriers to computer use, such as lack of interest, lack of knowledge and fear of using computers (Saunders, 2004; Peacock and Künemund, 2007). Difficulty using computers is a main source of frustration and other negative emotions, which may reduce motivation to continue training. In contrast, our training tasks are easy for elderly people to perform and were therefore expected to encourage their willingness to continue.

This study has a limitation: we did not use another form of cognitive training as an active control. We used a waiting list control group because previous studies revealed no differences in cognitive or functional improvement between active control groups and non-training control groups (Clark et al., 1997; Mahncke et al., 2006). However, from a study design perspective, an active control group would be more suitable for demonstrating the effectiveness of learning therapy. Future intervention studies should compare learning therapy with other cognitive training, such as working memory training, or with other types of training programs, such as exercise training.

# CONCLUSION

In summary, this study investigated the beneficial effects of learning therapy on diverse cognitive functions in healthy elderly people. This is the first report to reveal beneficial effects of learning therapy on inhibition performance of executive functions, verbal episodic memory, focused attention and processing speed in healthy elderly people. On the other hand, learning therapy did not improve shifting performance of executive functions, facial episodic memory, verbal short-term memory, verbal working memory or reading ability. These positive and negative findings are important for people who want to use learning therapy. For improving inhibition performance, verbal episodic memory, focused attention and processing speed, this intervention program would work well. For improving other cognitive domains (e.g., working memory), other cognitive training programs such as working memory training would be more suitable. Despite some limitations, this study provides important evidence for the effectiveness of learning therapy. Given that cognitive functions are important for activities of daily living (Cahn-Weiner et al., 2000, 2002; Lee et al., 2005), our results suggest that cognitive functions in the elderly can be enhanced through daily activities such as reading aloud and solving simple arithmetic problems.

# AUTHOR CONTRIBUTIONS

RN designed the study, developed the study protocol and calculated the sample size. RN conducted the study. RN wrote the manuscript with YT, HT, TN, AS and RK. RK also provided advice related to the study protocol. All authors read and approved the final manuscript.

# ACKNOWLEDGMENTS

This study was an industry–academic collaboration of Tohoku University: Smart Aging Square (http://www2.idac.tohoku.ac.jp/dep/sairc/square.html). This study was supported by the Kumon Institute of Education. It was also supported by JSPS KAKENHI Grant Number 15H05366 (Grant-in-Aid for Young Scientists (A)) and a Research Grant of the Frontier Research Institute for Interdisciplinary Science (FRIS), Tohoku University. The funding sources of the trial had no involvement in the study design, the collection, analysis or interpretation of data, or the writing of the article. We thank A. Kasagi and H. Nouchi for recruiting the participants, the testers for performing psychological tests, the supporters for conducting learning therapy, the participants and all of our other colleagues in IDAC, Tohoku University for their support.

#### REFERENCES


#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fnhum.2016.00217/abstract


evidence from a randomized controlled trial. Age (Dordr.) 36, 787–799. doi: 10.1007/s11357-013-9588-x


**Conflict of Interest Statement**: Learning therapy was developed by RK and the Kumon Institute of Education. However, RK derives no income from Kumon Institute of Education and Society for Learning Therapy. RK has no other competing interests. All other authors have declared no competing interests.

Copyright © 2016 Nouchi, Taki, Takeuchi, Nozawa, Sekiguchi and Kawashima. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Training on Working Memory and Inhibitory Control in Young Adults

Maria J. Maraver<sup>1</sup> \*, M. Teresa Bajo<sup>1</sup> and Carlos J. Gomez-Ariza<sup>2</sup>

<sup>1</sup> Department of Experimental Psychology – Research Center for Mind, Brain and Behavior, University of Granada, Granada, Spain, <sup>2</sup> Department of Psychology, University of Jaen, Jaen, Spain

Different types of interventions have aimed to improve Executive Functions (EFs) because of their essential role in human cognition and behavior regulation. Although EFs are thought to be diverse, most training studies have targeted cognitive processes related to working memory (WM), and fewer have focused on training other control mechanisms, such as inhibitory control (IC). In the present study, we aimed to investigate the differential impact of training WM and IC as compared with control conditions performing non-executive control activities. Young adults were divided into two training (WM/IC) and two (active/passive) control conditions. Over six sessions, the training groups engaged in three different computer-based adaptive activities (WM or IC), whereas the active control group completed a program with low control-demanding activities that mainly involved processing speed. In addition, motivation and engagement were monitored throughout training. The WM-training activities required maintenance, updating and memory search processes, while those of the IC group engaged response inhibition and interference control. All participants were pre- and post-tested on criterion tasks (n-back and Stroop) and on near transfer measures of WM (Operation Span) and IC (Stop-Signal). Non-trained far transfer outcome measures included an abstract reasoning test (Raven's Advanced Progressive Matrices) and a well-validated experimental task (AX-CPT) that provides indices of cognitive flexibility considering proactive/reactive control. Training results revealed that strongly motivated participants showed greater training improvements. Regarding transfer effects, results showed specific patterns of near transfer effects depending on the type of training. Interestingly, only the IC training group showed far transfer to reasoning. 
Finally, all trained participants showed a shift toward a more proactive mode of cognitive control, highlighting a general effect of training on cognitive flexibility. The present results reveal specific and general modulations of executive control mechanisms after brief training interventions targeting either WM or IC.

Keywords: executive control, cognitive training, working memory, inhibitory control, plasticity, transfer

# INTRODUCTION

Executive Functions (EFs) refer to a variety of cognitive and brain mechanisms thought to be in charge of regulating the dynamics of human cognition and behavior in changing environments (Burgess, 1997; Smith and Jonides, 1999; Miyake et al., 2000; Jurado and Rosselli, 2007). In an influential empirical study, Miyake et al. (2000; see also Miyake and Friedman, 2012) used latent variable analyses to show that, despite their unity (indicated by shared features), three different EFs emerged from performance in a variety of tasks: (i) Switching, which involves shifting flexibly between tasks or mental sets; (ii) Inhibitory Control (IC), which refers to overriding dominant or prepotent responses; and (iii) Updating of information maintained in Working Memory (WM). WM is usually defined as a cognitive system for temporarily storing and managing the information necessary for undertaking complex cognitive tasks, and it is thought to play a key role in guiding goal-oriented behavior and novel problem solving (Braver et al., 2008; Unsworth, 2010; Wiley and Jarosz, 2012). Because WM is thought to have limited capacity, the ability to update and disengage from information in this system is considered a core component of cognitive control and self-regulation (Miyake and Shah, 1999; Braver and Cohen, 2001).

#### Edited by:

Soledad Ballesteros, National University of Distance Education, Spain

#### Reviewed by:

Marco Calabria, Pompeu Fabra University, Spain

Erika Borella, University of Padua, Italy

> \*Correspondence: Maria J. Maraver mjmaraver@ugr.es

Received: 10 September 2016 Accepted: 04 November 2016 Published: 18 November 2016

#### Citation:

Maraver MJ, Bajo MT and Gomez-Ariza CJ (2016) Training on Working Memory and Inhibitory Control in Young Adults. Front. Hum. Neurosci. 10:588. doi: 10.3389/fnhum.2016.00588

Although there is some disagreement over the exact nature of EFs and their precise neural substrates (Miyake et al., 2000; Braver and Cohen, 2001; Kane and Engle, 2002; Banich, 2009), substantial evidence indicates that EFs play an essential role in learning and academic achievement (Bull and Scerif, 2001; St Clair-Thompson and Gathercole, 2006), knowledge acquisition (Blair and Razza, 2007; Danielsson et al., 2010), metacognition (Fernandez-Duque et al., 2000) as well as emotional and self-regulation (Barkley, 2001; Hofmann et al., 2012). The central role that EFs seem to play in efficient cognition and successful behavior regulation has led researchers to develop interventions aimed at improving executive functioning, even in the short term. Brain plasticity underlies the proposal that cognitive functioning can be enhanced by means of training. The basic idea is that during cognitive training, participants repeatedly activate neural regions involved in the training tasks (Olesen et al., 2004; Buschkuehl et al., 2012; Hussey and Novick, 2012; Hsu et al., 2014), thereby enhancing the cognitive function supported by those regions. As a consequence, training effects would generalize to untrained tasks that involve the trained domain and the underlying brain areas (near transfer) (Lee et al., 2007; Thorell et al., 2009; Borella et al., 2014; Beauchamp et al., 2016). Furthermore, training effects could extend beyond the trained domain to measures considerably different from the training task, as long as these measures were associated with the trained process and relied on comparable neural circuits (far transfer) (Jaeggi et al., 2008; Borella et al., 2010; Loosli et al., 2012; Dahlin, 2013). 
Similarly, at the behavioral level, transfer effects could be expected in related cognitive functions, leading to enhanced performance in a variety of untrained tasks that share cognitive mechanisms with the trained processes (Morrison and Chein, 2011). Although many studies have found near transfer effects after training WM, IC or attention, far transfer effects remain limited and inconclusive (Thorell et al., 2009; Spierer et al., 2013; Sprenger et al., 2013; Enge et al., 2014; Schwaighofer et al., 2015; Melby-Lervåg et al., 2016).

Training studies differ in the type of EFs targeted by the training tasks. WM has traditionally been the target for many cognitive training programs due to its well-known central role in cognition and its relationship with high-level abilities (Klingberg et al., 2005; Morrison and Chein, 2011; Jaeggi et al., 2013). Several studies have demonstrated positive effects of WM training in different age groups (Borella et al., 2010; Söderqvist et al., 2012; Jaeggi et al., 2013) with transfer to trained and untrained domains such as mathematical performance (Dahlin, 2013; Bergman-Nutley and Klingberg, 2014), reading abilities (Chein and Morrison, 2010; Dahlin, 2010; Loosli et al., 2012; Karbach et al., 2015), or reasoning and fluid intelligence (Klingberg et al., 2005; Borella et al., 2010; Jaušovec and Jaušovec, 2012; Au et al., 2014; but see Chooi and Thompson, 2012; Harrison et al., 2013; Redick et al., 2013 for failures to find far transfer effects after WM training; and Melby-Lervåg and Hulme, 2013; Bogg and Lasecki, 2015; Schwaighofer et al., 2015; Dougherty et al., 2016 for reviews).

Some other studies have focused on training IC processes (Spierer et al., 2013). Although several of these studies have failed to find behavioral transfer effects after training IC (Thorell et al., 2009; Berkman et al., 2014; Enge et al., 2014), others have found positive near and far transfer effects after task-switching training across the lifespan of healthy individuals (Karbach and Kray, 2009), training-related benefits in fluid intelligence scores in children after executive control training (Rueda et al., 2005, 2012; Liu et al., 2015), and near transfer effects in typically developing children (Dowsett and Livesey, 2000) and in children with executive control deficits (Kray et al., 2012). In addition to the behavioral effects, brain activity studies have reported different activation patterns in the brain network associated with IC: namely, increased activation in the right inferior frontal gyrus after training response inhibition in young adults (Berkman et al., 2014), or a more adult-like pattern of EEG markers (dorsolateral prefrontal negativity linked to the anterior cingulate gyrus) in 6-year-old children after a 5-day training with tasks involving conflict resolution (Rueda et al., 2005).

While, with some exceptions, studies focusing on either WM or executive control training show transfer effects (see Karbach and Verhaeghen, 2014 for a meta-analysis with studies that trained WM, switching and IC in older adults), to date very few studies have directly compared the effects of WM and IC training across tasks (see Thorell et al., 2009 for a comparison between WM and IC training in preschoolers). Thus, the main aim of the present study was to directly compare near and far transfer effects of two different training programs targeting either WM or IC processes. The direct comparison of these two types of programs is theoretically interesting since, according to some proposals, WM and IC seem to represent two separate EFs and may therefore have separate effects (Miyake et al., 2000). In addition, we also aimed to carefully control some factors that have been subject to criticism in previous training studies.

As mentioned, despite studies showing positive results of training in young adults, its effectiveness remains controversial and far transfer effects are not always obtained (Schwaighofer et al., 2015; Melby-Lervåg et al., 2016). Results from different training studies need to be interpreted carefully, with special attention to methodological differences that could account for the diversity of findings. For example, training procedures targeting specific cognitive abilities (such as WM or IC) allow more restricted attributions of training-derived transfer effects (Borella et al., 2010; Rueda et al., 2012; Jaeggi et al., 2013) than complex procedures that include multiple cognitive domains (memory, attention, IC, reasoning, etc.), which seem to be less specific and often yield more limited transfer effects (Schmiedek et al., 2010; Baniqued et al., 2014, 2015; Dovis et al., 2015; Hardy et al., 2015). Moreover, although many studies have used single training tasks (Rueda et al., 2005; Loosli et al., 2012; Carretti et al., 2013; Jaeggi et al., 2013), the generalization of training might be enhanced by using different tasks that recruit the targeted process. Switching between multiple tasks targeting the same process during training might promote cognitive flexibility by adapting general processes and strategies, thus preventing the use of very specific task strategies that would more likely be implemented when training is based on a single task (Schmidt and Bjork, 1992; Bherer et al., 2005; Dahlin et al., 2008; Schwaighofer et al., 2015). Finally, the presence and type of control groups are essential for establishing the effectiveness of training (Mohr et al., 2009; Dougherty et al., 2016; Melby-Lervåg et al., 2016). Passive control (PC) groups may allow researchers to track simple practice effects, while active control (AC) groups may help to establish the specificity of different cognitive training procedures by maintaining similar levels of motivation and reducing the possibility of placebo effects (Boot et al., 2013; Dougherty et al., 2016).

Hence, in the present study we explored potential transfer effects of two executive control training programs in young adults. We used a procedure that attempted to maximize generalization (by using multiple activities for each trained process) and to control for practice and motivation (by including active and passive control groups and by recording motivational variables during training). Specifically, we compared a group of participants trained in working memory (WMT) with a group trained in inhibitory control (ICT). Both groups were trained with three different activities during six sessions spread across 2 weeks. Importantly, the training procedures were adaptive and progressively increased executive control demands. We included passive (PC) and active (AC) control groups in the study. The AC group followed the same training protocol as their experimental counterparts, but engaged in activities that relied on perceptual abilities and progressively increased speed demands, without increments in executive load (Peng et al., 2012; Takeuchi and Kawashima, 2012; Lawlor-Savage and Goghari, 2016). The batteries of training activities used here were designed from the Cognitive Training Program of the University of Granada (PEC-UGR<sup>1</sup>), which includes a number of tasks that can be adapted and combined. We designed these batteries considering both the neural basis of the cognitive processes underlying the activities and the logic of the experimental procedures traditionally used to evaluate executive control.
As for the ICT group, the training included versions of (i) the Stroop task, in which participants had to select coins/numbers contained in congruently or incongruently sized bags; (ii) the Conflict resolution task, in which a sample of animals was presented and participants were asked to search for a target match among distractors showing animals with congruent/incongruent shapes/colors; and (iii) the Go/No-Go and Stop-Signal tasks, in which participants had to respond to matching shapes (a robot and a screw) and stop their response when faced with a rusted-looking shape. Regarding WMT, participants performed versions of (i) the n-back task, in which they had to monitor sequences of open/closed windows in a six-window display and press a key whenever the open window was the same as the window n trials back; (ii) WM updating, which consisted of the serial presentation of objects of different sizes that were placed in boxes, with participants asked to recall the 2 to 6 largest/smallest elements from the series; and (iii) Dual Span tasks, in which participants were asked to recall the shape and color of an increasing number of animals and then to select the animal that matched one of the studied animals from a set of distractors. Participants were evaluated before and after training with two criterion tasks (n-back and Stroop), with near transfer WM (Operation Span) and IC (Stop-Signal) tasks, and with a far transfer non-verbal reasoning task (Raven's Advanced Progressive Matrices). In addition, we included a far transfer task (AX-CPT; the AX version of the Continuous Performance Test) to explore whether WM and IC training might change the adjustment of distinct executive control strategies (proactive versus reactive), which have been proposed to support cognitive flexibility<sup>2</sup> (Braver et al., 2009; Burgess et al., 2011; Braver, 2012).

In the present study, we also aimed to explore the role of individual differences in training and transfer performance. This is a recent and still largely unexplored issue that may be important for predicting the benefits of training (Könen and Karbach, 2015). Previous studies have reported that baseline reasoning predicts training achievement (Bürki et al., 2014). Furthermore, individuals' improvement during training has been shown to be a relevant predictor of transfer effects in young adults (Jaeggi et al., 2011) as well as in children and older adults (Zinke et al., 2013; Wang et al., 2014). Thus, to explore the potential role of individual differences in training success, we analyzed predictors of training improvement and transfer gains (Könen and Karbach, 2015).

Based on the key assumption that generalization to non-trained tasks could occur whenever there is cognitive and neural overlap between the trained processes and those engaged in the outcome measures (Woodworth and Thorndike, 1901; Persson and Reuter-Lorenz, 2008; Hussey and Novick, 2012), we expected the two experimental groups to exhibit differential, process-specific enhancements in post-training performance (for related findings, see Chein and Morrison, 2010; Foy and Mann, 2014). Thus, we expected that, after training, the WMT group would outperform the ICT group on the n-back and Operation Span tasks, which involve WM maintenance demands. Conversely, given the ICT group's greater reliance on conflict resolution during training, we predicted better performance after IC training than after WM training on the Stroop and Stop-Signal tasks.

<sup>1</sup>The Cognitive Training Program (PEC-UGR) was developed in collaboration between Professors M. Teresa Bajo and M. Rosario Rueda of the School of Psychology of the University of Granada.

<sup>2</sup>Proactive control refers to an "early selection" control mode that anticipates and prevents interference before it occurs, whereas reactive control implies a "late correction" strategy that detects and resolves interference once it is present (Braver et al., 2009; Braver, 2012; Morales et al., 2013). The AX-CPT is a sensitive and reliable experimental task widely used to explore individual differences in the use of proactive and reactive control strategies (Braver, 2012; Chiew and Braver, 2014). Hence, we used it to assess whether participants' control mode changes with training. While young adults tend to rely on a proactive control strategy when performing the AX-CPT (Braver et al., 2009; Morales et al., 2013), training could modulate the way they approach the task, which systematically requires goal maintenance, interference detection and conflict resolution.

Regarding the active control group, which was exposed to progressively increasing response speed demands, we expected faster response times after training. Speed training, even with low executive demands, may change performance mainly because participants' responses could become faster after training (Peng et al., 2012; Takeuchi and Kawashima, 2012; Lawlor-Savage and Goghari, 2016).

As for the AX-CPT, which provided an index of the control strategy deployed by the participants, we hypothesized that the two executive control training programs would make participants rely more on proactive control relative to the control conditions. This hypothesis is based on the assumption that the high executive control demands of both training programs would encourage participants to focus on contextual cues and, hence, to rely more on cue processing (rather than probe processing) in the AX-CPT. If so, both types of training would strengthen the typical proactive strategy deployed by young healthy adults. However, we also expected WM training, which specifically focuses on monitoring and maintenance, to have a stronger impact on proactivity than IC training.

Finally, on the basis of both the close relationship of matrix problem solving with WM (Colom et al., 2004; Friedman et al., 2006; Harrison et al., 2015) and with executive control (Dempster and Corkill, 1999; Engle and Kane, 2004; Jarosz and Wiley, 2012; Shipstead et al., 2015) and the results of some previous training studies (Jaeggi et al., 2008; Karbach and Kray, 2009; Rueda et al., 2012; Au et al., 2014), we expected better post-training performance on the reasoning test in the two experimental groups than in the active and passive control groups.

## MATERIALS AND METHODS

#### Participants

Participants were recruited via printed advertisements at the University of Granada and had to meet the following conditions: (i) be aged between 18 and 30 years; (ii) have no major medical or psychological condition; and (iii) be committed to undertaking at least four experimental sessions in the lab, which could be extended to 10. One hundred and twelve undergraduate students were selected to take part in the present study (M<sub>age</sub> = 20.51 years; SD<sub>age</sub> = 1.74; range = 18–25; 83 females). After pre-testing, they were randomly assigned to one of the four groups of the study: ICT (N = 32; M<sub>age</sub> = 20.41 years; SD<sub>age</sub> = 1.88; 23 females), WMT (N = 32; M<sub>age</sub> = 20.31 years; SD<sub>age</sub> = 1.57; 23 females), active control (AC; N = 24; M<sub>age</sub> = 20.75 years; SD<sub>age</sub> = 1.32; 18 females), or passive control (PC; N = 24; M<sub>age</sub> = 20.67 years; SD<sub>age</sub> = 2.16; 19 females). There were no significant group differences either in age (p = 0.76; η<sub>p</sub><sup>2</sup> = 0.01) or in gender distribution (p = 0.92; η<sub>p</sub><sup>2</sup> = 0.00). At the end of the study, participants were financially compensated for their involvement. None of the participants withdrew from the study, although they were informed that they could do so if they wished. This study was approved by and carried out in accordance with the recommendations of the Research Ethics Committees of the University of Granada. All participants were provided with information about the study and gave written informed consent in accordance with the Declaration of Helsinki (World Medical Association, 2013).

#### Procedure

The cognitive training schedule consisted of two testing sessions (pre- and post-training) and six training sessions distributed over 2 weeks, with three training sessions per week. The whole study therefore lasted approximately 4 weeks. In the testing sessions, all participants were evaluated on: (i) criterion tasks (n-back and Stroop); (ii) WM (Operation Span) and IC (Stop-Signal) as near transfer measures; and (iii) the adjustment of proactive/reactive cognitive control (AX-CPT) and abstract reasoning (Raven's Advanced Progressive Matrices) as far transfer measures. We created two random task orders for evaluation, which were counterbalanced across participants. The training and active control groups engaged in three different activities during each session (20 min per activity). The order of the activities in each training session was also counterbalanced across participants. The total training time for each activity was therefore 120 min (6 sessions × 20 min). The passive control group only performed the evaluation sessions and continued with their regular college activities during the 2 weeks between pre- and post-training.

Participants worked in individual cabins, although an experimenter continuously supervised the procedure and was available to attend to any request. After every second training session, participants completed a motivation questionnaire (Alonso-Tapia and de la Red Fadrique, 2007; Colom et al., 2013) in which they were asked to rate their: (i) involvement in the program; (ii) perceived difficulty of the activity levels; (iii) perceived challenge of improving across levels; and (iv) expectations for their achievement. They rated each of the four statements on a scale ranging from 0 (very low) to 10 (very high). In the last training session, they were asked for a general evaluation of the training program and their satisfaction with the experimental procedure.

#### Executive Control Training

We used the online training program from the University of Granada (PEC-UGR), which includes different game-like activities organized in levels of increasing difficulty. Training difficulty was adaptive in order to keep the activities a constant challenge (Klingberg et al., 2005; Brehmer et al., 2011; Karbach et al., 2015). In addition, participants received feedback on whether their performance was correct (Katz et al., 2014). Activity levels were built up over runs of trials. Whenever participants succeeded in three runs, they moved forward to the next level, and whenever they failed two runs, they went back to the previous level. The three activities for each training group are described below.
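The level rule just described can be sketched as a simple up/down staircase. This is a minimal illustration, not the PEC-UGR implementation: the function name `track_levels`, the level bounds, and the assumption that successes and failures are counted cumulatively within a level (rather than consecutively) and reset on every level change are all hypothetical.

```python
from typing import Iterable, List


def track_levels(run_outcomes: Iterable[bool], start_level: int = 1,
                 max_level: int = 20, successes_to_advance: int = 3,
                 failures_to_drop: int = 2) -> List[int]:
    """Return the level at which each run was played.

    Assumption: successes and failures are counted within the current level
    (not necessarily consecutively), and both counters reset on a level change.
    """
    level = start_level
    successes = failures = 0
    history: List[int] = []
    for passed in run_outcomes:
        history.append(level)
        if passed:
            successes += 1
        else:
            failures += 1
        if successes >= successes_to_advance:
            # Three successful runs: move up one level (capped at max_level).
            level = min(level + 1, max_level)
            successes = failures = 0
        elif failures >= failures_to_drop:
            # Two failed runs: drop back one level (never below level 1).
            level = max(level - 1, 1)
            successes = failures = 0
    return history
```

For example, `track_levels([True, True, True, False, False, True])` returns `[1, 1, 1, 2, 2, 1]`: three passed runs lift the participant to level 2, and two failures there send them back to level 1. An asymmetric rule of this kind keeps performance near the edge of each participant's ability, which is the stated aim of adaptive training.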

#### Inhibitory Control Training

fnhum-10-00588 November 16, 2016 Time: 14:6 # 5

#### **Stroop-like**

This activity was modeled on Steinhauser and Hübner's (2009) complex Stroop task, which involves both conflict resolution and switching. The task was implemented in a scenario where bags of different sizes containing amounts of money had to be put into a treasure chest. Participants had to select the bag with the largest (when the coins were gold) or the smallest (when the coins were silver) number of items, with the number of bags increasing over the levels. The size of the bags could be either congruent or incongruent with the amount inside. For example, on a congruent trial the stimuli could be a big bag containing seven golden coins (correct choice) and a small bag containing five golden coins. On an incongruent trial, the stimuli could be a small bag containing six golden coins (correct choice) and a big bag containing three golden coins. Difficulty increased by changing the ratio of congruent/incongruent trials (0; 0.25; 0.50; 0.75): the larger the proportion of congruent trials, the harder the choice on incongruent trials. At higher levels, switching was manipulated by changing the color of the items from gold to silver between trials within the same round. Response deadlines and inter-stimulus intervals were also progressively reduced with each level. The dependent variable was a relative index of conflict resolution [(RT in incongruent trials – RT in congruent trials)/RT in congruent trials].
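The relative conflict index above can be computed as in the following sketch. It assumes, as the Results section states for this task, that only RTs from correct responses (hits) enter the means; the function name and the RT values are illustrative. Dividing by the congruent RT expresses the conflict cost as a proportion of each participant's baseline speed, so a uniformly faster participant does not obtain a smaller index merely by being faster.

```python
from statistics import mean
from typing import Sequence


def relative_conflict_index(rt_incongruent: Sequence[float],
                            rt_congruent: Sequence[float]) -> float:
    """(mean incongruent RT - mean congruent RT) / mean congruent RT.

    RTs are assumed to come from correct responses (hits) only.
    """
    m_incongruent = mean(rt_incongruent)
    m_congruent = mean(rt_congruent)
    return (m_incongruent - m_congruent) / m_congruent


# Illustrative (made-up) RTs in ms: means 1000 vs. 800 give an index of 0.25.
print(relative_conflict_index([900, 1100], [700, 900]))
```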

#### **Conflict resolution task**

The scenario of this activity was an ocean, where a sample of sea animals was displayed in the upper part of the screen and a group of animal buttons was shown in the lower part. The number of buttons was always the sample size plus one (n + 1) and was progressively increased over the levels from 2 to 6. Participants had to select, as fast as possible, the button animal that had the same shape and color as one of the animals in the sample (match trials). If an animal's shape matched but its color did not, they had to click on the button showing a different animal (no-match trials). For example, a match trial could be one in which the sample stimuli included "blue turtle – yellow starfish – brown crab" and the button choices included "pink turtle – yellow starfish (correct choice) – red crab – gray dolphin." In contrast, a no-match trial could display the sample "blue turtle – yellow starfish – brown crab" with the button choices "pink turtle – green starfish – red crab – gray dolphin (correct choice)." The percentage of match trials was manipulated (0.25; 0.5; 0.75): the higher this ratio, the stronger the tendency to respond. Difficulty was also manipulated through the color similarity between the sample and the button choices. When colors were limited (a different color between the target and the possible options), the choice became harder because the color of the distractors had to be inhibited. Response deadlines and inter-stimulus intervals were also reduced over the levels. The distribution of parameters across levels followed the procedures used in Rueda et al. (2005, 2012). As in the previous activity, the dependent variable was the relative conflict index.
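The response rule of this activity can be made concrete with a short sketch that returns the correct button for a trial, using the two example trials from the text. Representing animals as `(shape, color)` tuples and the function name `correct_button` are assumptions for illustration; the rule that a full match takes precedence over the "different" animal follows the examples, where the gray dolphin is present on both trials but is correct only on the no-match trial.

```python
from typing import List, Tuple

Animal = Tuple[str, str]  # (shape, color)


def correct_button(sample: List[Animal], buttons: List[Animal]) -> Animal:
    """Return the button that should be chosen on a trial.

    Match trial: a button identical in shape and color to a sample animal.
    No-match trial: the button whose shape does not appear in the sample.
    """
    sample_set = set(sample)
    matches = [b for b in buttons if b in sample_set]
    if matches:
        return matches[0]
    sample_shapes = {shape for shape, _ in sample}
    different = [b for b in buttons if b[0] not in sample_shapes]
    if len(different) != 1:
        raise ValueError("expected exactly one 'different' button on a no-match trial")
    return different[0]


sample = [("turtle", "blue"), ("starfish", "yellow"), ("crab", "brown")]
match_buttons = [("turtle", "pink"), ("starfish", "yellow"), ("crab", "red"), ("dolphin", "gray")]
no_match_buttons = [("turtle", "pink"), ("starfish", "green"), ("crab", "red"), ("dolphin", "gray")]
print(correct_button(sample, match_buttons))     # ('starfish', 'yellow')
print(correct_button(sample, no_match_buttons))  # ('dolphin', 'gray')
```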

#### **Go/No Go-like**

This was a matching-to-sample activity based on the shape of the items: a robot was the target and a screw was the sample. Participants had to respond when the shapes of the robot and the screw matched (Go trials: e.g., a square robot with a square screw on top) and inhibit their response when the shapes did not match (No-Go shape trials: e.g., a circular robot with a square screw on top). At higher levels, there was an extra difficulty because the response also had to be inhibited when the screw was rusted (No-Go color trials: e.g., a square robot with a rusted square screw on top), even if its shape matched that of the robot. The proportion of Go trials (0.10; 0.20; 0.50; 0.80; 0.90) was manipulated together with the ratio of additional No-Go color trials (from 0 to 0.30). The higher the proportion of Go trials, the stronger the tendency to respond, and thus the greater the IC required to succeed. The parameters were manipulated following similar procedures for Go/No-Go trial proportions (Benikos et al., 2013b) and reaction-time deadlines (Benikos et al., 2013a). As in the previous activities, the maximum time to respond was reduced as levels increased (Benikos et al., 2013a). In this case, false alarms and omission errors were the dependent variables.
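The two dependent variables follow from crossing trial type (Go vs. No-Go) with the participant's behavior (responded vs. withheld). The sketch below, with hypothetical function names and a tuple-based trial format, shows how omissions and false alarms could be tallied under the Go rule described above (matching shapes and a non-rusted screw).

```python
from collections import Counter
from typing import Iterable, Tuple

# (robot_shape, screw_shape, screw_rusted, participant_responded)
Trial = Tuple[str, str, bool, bool]


def is_go_trial(robot_shape: str, screw_shape: str, rusted: bool) -> bool:
    """Go only if the shapes match and the screw is not rusted."""
    return robot_shape == screw_shape and not rusted


def classify(go: bool, responded: bool) -> str:
    """Map trial type x behavior onto the four signal-detection outcomes."""
    if go:
        return "hit" if responded else "omission"
    return "false_alarm" if responded else "correct_rejection"


def score_block(trials: Iterable[Trial]) -> Counter:
    counts: Counter = Counter()
    for robot, screw, rusted, responded in trials:
        counts[classify(is_go_trial(robot, screw, rusted), responded)] += 1
    return counts


block = [
    ("square", "square", False, True),   # Go, responded: hit
    ("circle", "square", False, True),   # No-Go shape, responded: false alarm
    ("square", "square", True, False),   # No-Go color (rusted), withheld: correct rejection
    ("square", "square", False, False),  # Go, withheld: omission
]
print(score_block(block))
```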

#### Working Memory Training

#### **N-back**

Participants had to monitor, maintain and continuously update items throughout a sequence of elements. They were presented with a six-window house and had to detect coincidences between positions (opening/closing of the windows), sounds, or the combination of both modalities. They responded by pressing a button whenever the position of the opening window, its sound, or both matched the one presented n positions back in the sequence. Increments in n (from 1 to 8) were implemented after participants had completed the n-back level with single (position or sound) and dual (position plus sound) modalities. As dependent variables, we considered the n-back level achieved and the sum of errors in each session.
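The target rule of the n-back activity can be sketched as follows for a single stream (e.g., window positions 0 to 5). Treating a dual-modality level as two independent streams, and counting "errors" as omissions plus commissions, are assumptions on our part; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def nback_targets(sequence: Sequence[int], n: int) -> List[bool]:
    """True at position i when item i equals item i - n."""
    return [i >= n and sequence[i] == sequence[i - n] for i in range(len(sequence))]


def nback_errors(sequence: Sequence[int], n: int,
                 responses: Sequence[bool]) -> Tuple[int, int]:
    """Return (omissions, commissions) for one stream of the task."""
    if len(responses) != len(sequence):
        raise ValueError("one response flag per item is required")
    targets = nback_targets(sequence, n)
    omissions = sum(t and not r for t, r in zip(targets, responses))
    commissions = sum(r and not t for t, r in zip(targets, responses))
    return omissions, commissions


# Window positions 0-5 of the six-window house; 2-back.
positions = [2, 5, 2, 3, 2, 3]
presses = [False, False, True, True, False, True]
print(nback_targets(positions, 2))          # [False, False, True, False, True, True]
print(nback_errors(positions, 2, presses))  # (1, 1)
```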

#### **WM search**

This was a matching-to-sample activity based on the shape and color of items displayed sequentially: animals on one screen as the sample, and a group of animal buttons after a retention interval. Participants were presented with a matrix to be maintained in memory, composed of animals with different shapes and colors displayed in an open field (e.g., memory matrix, "brown bear – red eagle – purple snake"). After a retention interval of 5000 ms, participants performed a memory test in which they had to select, as fast as possible, the animal button that had the same shape and color as one of the previously retained animals (e.g., button choices, "orange bear – red eagle (correct choice) – yellow snake – blank button"). If none of the animals on the buttons had the same shape and color, they had to select the blank button (e.g., button choices, "orange bear – green eagle – yellow snake – blank button (correct choice)"). The number of to-be-maintained items increased from 1 to 8 over the levels. The number of elements recalled (set size) was the dependent variable.

#### **WM updating**

This task was adapted from the word updating task of Palladino and Cornoldi (2001). Participants were presented with a group of numbered boxes. Items of different categories (food, objects, animals, or clothing) were displayed sequentially. On each trial, only items from one category (e.g., animals) were relevant and placed into the boxes. Participants were asked to recall the largest (or smallest) element(s) by selecting the box or boxes into which they had been placed (e.g., rule: recall the smallest animal; items presented: apple – cat – trousers – bee (correct choice) – chair – elephant). This activity involved both maintenance and updating in WM. Memory load was manipulated by increasing the number of elements to recall (from 1 to 7), the number of distractors belonging to the target category (from 1 to 7), and the number of distractors from other categories (from 2 to 20). The program randomly switched the rule between big and small, keeping an equal proportion of each rule within a level. The dependent variable was the number of items successfully recalled.
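The scoring logic of this activity can be sketched using the example trial above. The item sizes are invented for illustration, and we assume that boxes are numbered in the order in which relevant items were placed into them; neither detail is specified in the text, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

# (name, category, size); the size values below are invented for illustration.
Item = Tuple[str, str, float]


def boxes_to_select(items: Sequence[Item], category: str, k: int,
                    smallest: bool = True) -> List[int]:
    """Return the (1-based) box numbers holding the k smallest/largest relevant items.

    Only items of the relevant category are placed into boxes, in order of
    presentation; items of other categories are distractors and are ignored.
    """
    boxed_sizes = [size for _, cat, size in items if cat == category]
    order = sorted(range(len(boxed_sizes)), key=lambda i: boxed_sizes[i],
                   reverse=not smallest)
    return sorted(i + 1 for i in order[:k])


trial = [("apple", "food", 8), ("cat", "animal", 40), ("trousers", "clothing", 100),
         ("bee", "animal", 1.5), ("chair", "object", 90), ("elephant", "animal", 300)]
print(boxes_to_select(trial, "animal", k=1, smallest=True))   # [2]  (the bee)
print(boxes_to_select(trial, "animal", k=2, smallest=False))  # [1, 3]  (cat, elephant)
```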

#### Active Control

#### **Speed comparison**

In this matching-to-sample activity, participants were presented with a group of sea animals in the upper half of the screen and another group of animals in the lower half, and they were asked to find, as fast as possible, which animal in the lower part was also present in the upper part. On all trials, the target was present in the sample, whose size increased from 2 to 6. Response times were reduced within each sample size: whenever one element was added to the sample, the response deadline was reset to a longer value and then progressively reduced. Response time was the dependent variable.

#### **Speed visual search**

For this speed of processing task, participants were presented with a plate of soup containing 10 elements (digits and letters) and had to find which of four possible options was contained in the soup. The number of elements to be found and the number of options remained constant, so that the difficulty of the levels was determined only by the required response speed. Average reaction time per session was the dependent variable.

#### **Speed categorization**

This activity required participants to categorize groups of figures within a progressively shorter time over the levels. Participants saw three groups of figures and two boxes into which to classify them according to different rules (size, color, shape, or quantity). The rule to be applied was always displayed in the upper left corner of the screen, so that the activity trained response speed throughout the levels. As in the two previous activities, reaction time was the dependent variable.

#### Transfer Tasks

#### Stroop

The scenario of this task was similar to the one used in training: participants were presented with bags of different sizes and had to select the one with the largest (or smallest) amount inside. The bags were either congruent or incongruent in size. The switching component was manipulated by changing the color of the coins, and consequently the response rule, from largest (gold) to smallest (silver). The number of bags presented (from 2 to 7), the proportion of incongruent trials (0, 0.25 or 0.50), the proportion of switching trials (0, 0.25, 0.50, 0.75), the inter-stimulus interval (from 1500 to 600 ms) and the maximum time to respond (from 3000 to 3600 ms, increasing with the number of bags) were manipulated across blocks of trials (levels). Inter-stimulus intervals were designed considering the average ITI used by Steinhauser and Hübner (2009); the minimum and maximum intervals spanned a wider range than the one parametrized for training, so that enough room was left to observe a possible benefit in response time. The dependent variable (conflict score) was calculated as a relative index, (Incongruent trials RT – Congruent trials RT)/Congruent trials RT, for hits. Stimuli were presented randomly in both the pre- and post-training testing sessions.

#### N-back

In this WM task, participants had to retain the spatial positions of a sequence of elements over nine blocks of increasing memory load. They had to respond whenever an element matched the position of the element presented n (from 1 to 8) positions back. The length of the sequence in a block increased in parallel with the memory load, from 6 to 20. The maximum time to respond and the inter-stimulus interval were, respectively, 2000 and 1000 ms for the first four blocks, and 1500 and 800 ms for the last four. N-back level and errors (omissions and commissions) were the dependent variables. The order of the stimuli in the sequence was randomized in pre- and post-testing.

#### Stop-Signal

We used this response inhibition task with the standard parameters of the STOP-IT software (Verbruggen et al., 2008). Participants had to respond with the keyboard as fast as possible to two different stimuli (a circle or a square) presented in the center of the screen. On 25% of the trials, participants heard an auditory stop signal (750 Hz, 75 ms), presented shortly after visual stimulus onset, that required them to withhold their response to the current stimulus. The task comprised 32 practice trials and three blocks of 64 experimental trials each. The trials were displayed on a black screen and consisted of a 250 ms fixation point (white +), the stimulus (a white square or circle) presented for 1250 ms, and a fixed inter-stimulus interval of 2000 ms. Stimuli were randomized in pre- and post-testing, and in all cases the stop signal was presented with a variable stop-signal delay (SSD). The SSD was initially set to 250 ms and then continuously adjusted: it increased by 50 ms after a successful inhibition and decreased by 50 ms after a failed one, so that the software tracked a stopping probability of about 50%. We considered the Stop-Signal Reaction Time (SSRT) as a measure of motor inhibition efficiency (Verbruggen and Logan, 2008; Verbruggen et al., 2008; Morales et al., 2013).
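The SSD tracking rule and one way of turning it into an SSRT estimate can be sketched as below. The text does not state which SSRT estimator was used; the integration method shown here is one widely used option and is included only as an illustration. The function names, the floor of 0 ms for the SSD, and the rounding of the nth-RT index are assumptions.

```python
import math
from statistics import mean
from typing import List, Sequence


def next_ssd(ssd: int, stopped: bool, step: int = 50, floor: int = 0) -> int:
    """Staircase: harder (longer SSD) after a successful stop, easier after a failure."""
    return ssd + step if stopped else max(ssd - step, floor)


def ssd_track(stop_outcomes: Sequence[bool], start: int = 250) -> List[int]:
    """SSD used on each stop-signal trial, given whether each stop succeeded."""
    ssds, ssd = [], start
    for stopped in stop_outcomes:
        ssds.append(ssd)
        ssd = next_ssd(ssd, stopped)
    return ssds


def ssrt_integration(go_rts: Sequence[float], ssds: Sequence[float],
                     p_respond_given_signal: float) -> float:
    """Integration method: nth fastest go RT minus mean SSD,
    with n = p(respond | signal) * number of go RTs (rounded up)."""
    rts = sorted(go_rts)
    n = max(math.ceil(p_respond_given_signal * len(rts)), 1)
    return rts[n - 1] - mean(ssds)


print(ssd_track([True, False, False, True]))                    # [250, 300, 250, 200]
print(ssrt_integration([400, 450, 500, 550], [250, 250], 0.5))  # 200
```

Because a successful stop lengthens the SSD (making the next stop harder) and a failure shortens it, the staircase converges on the delay at which stopping succeeds about half the time.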

#### Operation Span (O-Span)

We used the Spanish adaptation of the procedure developed by Turner and Engle (1989; see also Tokowicz et al., 2004; Redick et al., 2012). It was a dual memory span task that required participants to verify mathematical operations while trying to remember sets of words of increasing size. Each trial consisted of a simple solved equation [e.g., (14/2) + 2 = 9] presented for 3750 ms, which participants had to verify as correct or incorrect by pressing one of two keys. Afterward, a word was presented for 1250 ms to be maintained in memory. Operation–word pairs were presented in increasing set sizes from 2 to 6. After each set, participants had to recall and type the words. Although the order of recall was not important, they were told to avoid writing the last word presented first in order to prevent recency effects. The task comprised 18 trials (three trials per set size), and the testing procedure was repeated until the end. Two parallel versions of the task were created by randomizing the order of the stimuli and the equation–word pairing, and these versions were counterbalanced across sessions (pre- and post-testing) and participants. Special care was taken to avoid a similar pairing set size distribution between the two versions.

#### AX-CPT

We used the same version of the task as Morales et al. (2013) to explore the adjustment of proactive/reactive cognitive control. On each trial, participants were presented with five letters for 300 ms each (cue – three distractors – probe) in the center of a black screen, with a fixed inter-stimulus interval of 1000 ms. Cue and probe stimuli were presented in red font, whereas distractors were presented in white. Participants were instructed to respond "yes" whenever they saw an A in the first position (cue) followed by an X in the fifth position (probe). They were asked to respond "no" to any other cue–probe combination and to the distractors (items in positions 2 to 4). The task consisted of a 10-trial practice phase and an experimental block of 100 trials, which were presented in random order in both pre- and post-testing. Target trials (AX) were the most frequent (70%), and each of the remaining trial types (AY: an A cue followed by a non-X probe; BX: a non-A cue followed by an X probe; BY: neither the A cue nor the X probe) occurred on 10% of the trials. Proactive and reactive control adjustment can be assessed by considering the proportion of errors on AY and BX trials (Braver et al., 2009; Morales et al., 2013; Chiew and Braver, 2014).
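The trial composition and the AY/BX error logic can be sketched as follows. `make_block`, `error_rates` and `proactive_index` are hypothetical names, and the normalized contrast `(AY - BX) / (AY + BX)` is one kind of index used in the AX-CPT literature, shown only as an illustration; the text itself specifies only that the proportions of AY and BX errors were considered. Positive values indicate relatively more AY errors, the pattern expected when responses are prepared from the cue (proactive control).

```python
import random
from collections import Counter
from typing import Dict, List, Sequence

# Trial-type proportions stated for the 100-trial experimental block.
PROPORTIONS = {"AX": 0.70, "AY": 0.10, "BX": 0.10, "BY": 0.10}


def make_block(n_trials: int = 100, seed: int = 0) -> List[str]:
    """Build a randomly ordered block with the stated trial-type proportions."""
    trials = [t for t, p in PROPORTIONS.items() for _ in range(round(p * n_trials))]
    random.Random(seed).shuffle(trials)
    return trials


def error_rates(trial_types: Sequence[str], correct: Sequence[bool]) -> Dict[str, float]:
    """Proportion of errors for each trial type."""
    totals = Counter(trial_types)
    errors = Counter(t for t, ok in zip(trial_types, correct) if not ok)
    return {t: errors[t] / totals[t] for t in totals}


def proactive_index(ay_error: float, bx_error: float) -> float:
    """(AY - BX) / (AY + BX); positive values suggest a proactive bias.

    Returning 0.0 when there are no AY or BX errors is an arbitrary convention here.
    """
    total = ay_error + bx_error
    return 0.0 if total == 0 else (ay_error - bx_error) / total


block = make_block()
print(Counter(block))
```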

#### Raven's Advanced Progressive Matrices (RAPM)

We used the computerized version of Set II of this test as a standardized measure of fluid intelligence (Raven, 1990). Participants had to solve visual analogy problems of increasing difficulty. A 3 × 3 matrix of patterns was presented, and they had to select the missing pattern from eight response alternatives. Following Jaeggi et al. (2013), we counterbalanced two parallel versions of the test, each with 18 matrices, across the pre- and post-testing sessions. Participants had to complete the task as quickly and accurately as possible, within a 20-min time limit. The dependent variables were the proportion of correctly solved matrices and the reaction times for correct responses.

## RESULTS

#### Training Effects

To determine the significance of training improvement in each activity, we compared performance in the first training session with that in the final (sixth) training session. Thus, for all tasks, repeated-measures analyses of variance (ANOVAs) were conducted on the task-specific dependent variables (conflict score, errors, reaction times, or memory load), with training session (first vs. sixth) as the within-subject independent variable.
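With only two levels of the within-subject factor, this repeated-measures ANOVA is equivalent to a paired t-test, with F(1, n − 1) = t². The sketch below shows that equivalence and the standard conversion from an F ratio to partial eta squared, η<sub>p</sub><sup>2</sup> = (F × df<sub>effect</sub>)/(F × df<sub>effect</sub> + df<sub>error</sub>). The function names are hypothetical, and the sketch is not the analysis software used in the study.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_f(first: Sequence[float], last: Sequence[float]) -> Tuple[float, int]:
    """F and error df of a two-level repeated-measures ANOVA (F = paired t squared)."""
    diffs = [b - a for a, b in zip(first, last)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t * t, n - 1


def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# The Stroop-like result reported for the ICT group, F(1,21) = 5.94, gives about 0.22.
print(round(partial_eta_squared(5.94, 1, 21), 2))  # 0.22
```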

#### ICT Group

For the Stroop-like task, the reaction times of 20 participants (10 from ICT and 10 from WMT) were not recorded due to a software coding error; consequently, they could not be included in the analyses. The ANOVA yielded a reliable reduction in the relative conflict effect [(incongruent – congruent)/congruent hits RT] from the first to the last training session [M<sub>s1</sub> = 0.52, SD<sub>s1</sub> = 0.24; M<sub>s6</sub> = 0.33, SD<sub>s6</sub> = 0.19; F(1,21) = 5.94; p = 0.02; η<sub>p</sub><sup>2</sup> = 0.22]. The conflict effect was also reduced from the first to the last training session in the Conflict resolution task, although the difference did not reach statistical significance [M<sub>s1</sub> = 0.48, SD<sub>s1</sub> = 0.34; M<sub>s6</sub> = 0.39, SD<sub>s6</sub> = 0.25; F(1,31) = 1.71; p = 0.20; η<sub>p</sub><sup>2</sup> = 0.05]. For the Go/No-Go task, we analyzed both omission errors and false alarms. Participants reduced their average omission errors [M<sub>s1</sub> = 3.50, SD<sub>s1</sub> = 2.68; M<sub>s6</sub> = 1.43, SD<sub>s6</sub> = 2.01; F(1,31) = 13.11; p < 0.01; η<sub>p</sub><sup>2</sup> = 0.30], whereas the reduction in false alarms was not reliable [M<sub>s1</sub> = 3.90, SD<sub>s1</sub> = 3.50; M<sub>s6</sub> = 3.25, SD<sub>s6</sub> = 2.70; F < 1; p = 0.40; η<sub>p</sub><sup>2</sup> = 0.02].

#### WMT Group

For all WM training tasks (n-back, WM search and WM updating), we compared the memory set size recalled in the first and last training sessions. The increase in set size recalled was statistically significant for all three trained activities: n-back [M<sub>s1</sub> = 1.13, SD<sub>s1</sub> = 0.17; M<sub>s6</sub> = 2.60, SD<sub>s6</sub> = 0.57; F(1,31) = 190.92; p < 0.01; η<sub>p</sub><sup>2</sup> = 0.75]; WM Search [M<sub>s1</sub> = 2.21, SD<sub>s1</sub> = 0.17; M<sub>s6</sub> = 4.92, SD<sub>s6</sub> = 1.07; F(1,31) = 198.32; p < 0.01; η<sub>p</sub><sup>2</sup> = 0.76]; and WM Updating [M<sub>s1</sub> = 1.00, SD<sub>s1</sub> = 0.06; M<sub>s6</sub> = 3.20, SD<sub>s6</sub> = 0.63; F(1,31) = 390.95; p < 0.01; η<sub>p</sub><sup>2</sup> = 0.86].

#### AC Group

Note that for this group the level of executive demands was held constant throughout the training sessions. Although participants advanced through the levels, and thus had the impression that they were training, the only change from one level to the next was a progressive reduction of presentation time and response deadline. Hence, we compared the speed of participants' responses (ms) from the first to the last session for the three activities. This comparison yielded statistically significant differences for Speed Comparison [M<sub>s1</sub> = 5075.87, SD<sub>s1</sub> = 437.75; M<sub>s6</sub> = 3539.04, SD<sub>s6</sub> = 656.75; F(1,23) = 89.62; p < 0.01; η<sub>p</sub><sup>2</sup> = 0.66]; Speed Visual Search [M<sub>s1</sub> = 22555.57, SD<sub>s1</sub> = 864.64; M<sub>s6</sub> = 3585.23, SD<sub>s6</sub> = 534.92; F(1,23) = 8096.82; p < 0.01; η<sub>p</sub><sup>2</sup> = 0.99]; and Speed Categorization [M<sub>s1</sub> = 24987.26, SD<sub>s1</sub> = 62.39; M<sub>s6</sub> = 13731.60, SD<sub>s6</sub> = 194.22; F(1,23) = 374.22; p < 0.01; η<sub>p</sub><sup>2</sup> = 0.89].

#### Training Slopes

The training program PEC-UGR enabled us to create many training levels by using all possible combinations of task parameters (e.g., proportion of congruent/incongruent trials, target-distractor similarity, memory load, response times). However, because the tasks differed in the number of to-be-manipulated parameters, the number of training levels varied across activities. Consequently, in order to pool the trained activities and to compare how far participants from the different groups progressed in the training, we standardized each participant's level of achievement by dividing the average level reached in a given activity by the number of levels available in that activity. Thus, **Figure 1** represents the relative level achieved in each activity and each training session for the three trained groups.
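The standardization step can be sketched as follows. This is a hypothetical helper, not the PEC-UGR code: it divides the average level reached in one activity and session by the number of levels available in that activity. The slope values reported in the next paragraph (around 8 to 12) suggest the relative level may have been expressed as a percentage, so the sketch scales by 100 by default; that scaling is an assumption.

```python
def relative_level(levels_reached, n_levels_available, as_percent=True):
    """Average level reached in one activity and session, divided by the
    number of levels available in that activity.

    Scaling to a percentage (as_percent=True) is an assumption of this sketch.
    """
    average_level = sum(levels_reached) / len(levels_reached)
    proportion = average_level / n_levels_available
    return proportion * 100 if as_percent else proportion


# Hypothetical activities with different numbers of available levels
stroop_like = relative_level([4, 6, 8], n_levels_available=24)  # 25.0
n_back = relative_level([3, 4, 5], n_levels_available=12)       # about 33.3
print(stroop_like, round(n_back, 1))
```

Dividing by the number of available levels puts activities with many and few levels on a common 0 to 100 scale, which is what allows them to be averaged within a group.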

To quantify participants' training improvement over the six training sessions, we calculated the slope of a linear regression model using the standardized average level in each training session and activity per participant (Katz et al., 2014; Wang et al., 2014). To compare the training achievements of the different groups (**Figure 1**), the slopes of the three training tasks in each group were averaged. A one-way ANOVA showed a main effect of group [F(2,85) = 16.26; p < 0.01; η<sup>2</sup><sub>p</sub> = 0.27], as the average slope for the AC group (M = 12.23; SD = 1.34) was significantly steeper than those for the ICT (M = 8.68; SD = 2.92; p < 0.01) and WMT (M = 8.05; SD = 1.78; p < 0.01) groups. This is consistent with the fact that the active control activities were significantly easier than the executive control ones, facilitating advancement through the activity levels. The slopes of the two experimental training groups did not differ from each other (p = 0.76).
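A minimal sketch of the training-slope computation, assuming an ordinary least-squares fit of relative level on session number (sessions 1 to 6) for each activity, followed by averaging the per-activity slopes for each participant. The function names and the example levels are hypothetical.

```python
def training_slope(session_levels):
    """Ordinary least-squares slope of relative level on session number (1..n)."""
    n = len(session_levels)
    sessions = range(1, n + 1)
    x_mean = (n + 1) / 2
    y_mean = sum(session_levels) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(sessions, session_levels))
    sxx = sum((x - x_mean) ** 2 for x in sessions)
    return sxy / sxx


def participant_slope(levels_by_activity):
    """Average of the per-activity slopes for one participant."""
    slopes = [training_slope(levels) for levels in levels_by_activity]
    return sum(slopes) / len(slopes)


# Hypothetical relative levels (%) over six sessions for three activities
activities = [
    [10, 20, 30, 40, 50, 60],  # slope 10
    [5, 13, 21, 29, 37, 45],   # slope 8
    [12, 18, 24, 30, 36, 42],  # slope 6
]
print(participant_slope(activities))  # 8.0
```

The slope is expressed in relative-level units gained per session, so a steeper slope means faster progress through the available levels.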

#### Correlations at Pre-test

To check for relationships between the cognitive functions tested at baseline, Pearson correlations were run on the pre-test scores for all the participating groups as a whole. These analyses showed that WM-related measures were correlated: those participants with a higher combined score in the O-Span task showed fewer intrusions in the O-Span (r = −0.34; p < 0.01) and fewer errors in the n-back task (r = −0.19; p = 0.03). Additionally, participants with a larger BSI showed fewer intrusions in the O-Span task (r = −0.20; p = 0.03).

Finally, RAPM scores significantly correlated with errors in the n-back (r = −0.21; p = 0.10) and with the combined score from the O-Span task (r = 0.31; p < 0.01).

#### Transfer Results

**Table 1** summarizes the descriptive data of the outcome measures, including statistical comparisons for the session effects (pre vs. post) in each of the groups. We also calculated standardized gains by subtracting the pre-test scores from the post-test scores (the reverse for reaction times and errors) and dividing by the standard deviation of the entire sample (Colom et al., 2013; Jaeggi et al., 2013; Redick et al., 2013; Borella et al., 2014). One-way ANOVAs were performed for each variable to compare standardized gains between the groups. Participants who were excluded at pre-test due to missing data were also excluded from the post-training analyses.
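The standardized gain can be sketched as below. The sketch divides each participant's pre–post difference by the SD of the pre-test scores across the whole sample, which reconciles the text ("standard deviation of the entire sample") with the table note (SDpre); that reading, and the use of the sample SD (n − 1 denominator), are assumptions. The sign is flipped for RTs and errors so that positive values always mean improvement.

```python
from statistics import stdev


def standardized_gains(pre, post, higher_is_better=True):
    """Per-participant gain divided by the SD of pre-test scores across the sample.

    Hits: (post - pre) / SD_pre. RTs and errors (higher_is_better=False):
    (pre - post) / SD_pre, so positive values always indicate improvement.
    """
    sd_pre = stdev(pre)  # sample SD (n - 1 denominator); an assumption
    sign = 1 if higher_is_better else -1
    return [sign * (b - a) / sd_pre for a, b in zip(pre, post)]


# Hypothetical scores for three participants
hits = standardized_gains([10, 12, 14], [12, 12, 17])
errors = standardized_gains([5, 3, 1], [3, 3, 0], higher_is_better=False)
print(hits, errors)
```

Expressing gains in SD units puts hits, errors and RTs on a comparable scale before the one-way ANOVAs compare groups.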

#### Stroop

We obtained a relative conflict score from the difference in reaction times between incongruent and congruent trials, relative to the congruent reaction time. There were no pre-test differences between the groups [F(3,88) = 1.86; p = 0.14; η<sup>2</sup><sub>p</sub> = 0.05]. The ANOVA on the standardized gains failed to show a reliable effect of group, F(3,88) = 1.76; p = 0.16; η<sup>2</sup><sub>p</sub> = 0.05. As can be observed in **Table 1**, however, only the ICT group significantly reduced its conflict scores after completing the training.

#### TABLE 1 | Descriptive statistics for outcome measures: Mean and standard deviations for the outcome measures in the pre- and post-testing.

Significance p-values and effect-size (Cohen's d) estimates are reported for the repeated-measures ANOVAs including session (pre-test vs. post-test) as a within-subject variable and group as a between-subject factor in each of the four groups. Standardized gains were computed as (Mpost – Mpre)/(SDpre) for hits and as (Mpre – Mpost)/(SDpre) for errors and reaction times.

#### N-back

The n-back level and the number of errors were considered in this task. There were no differences in baseline n-back level before training [F(3,108) = 1.39; p = 0.24; η<sup>2</sup><sub>p</sub> = 0.03]. However, a one-way ANOVA revealed differences in the number of errors at pre-test [F(3,108) = 10.02; p < 0.01; η<sup>2</sup><sub>p</sub> = 0.21], whereby the PC group committed significantly fewer errors than the other three groups (all ps < 0.01). **Table 1** shows that only the WMT group showed a reliable increase in the number of items that could be maintained/updated in WM and a reduction in the number of errors after the training. The ANOVA on the standardized gain scores revealed a statistically significant effect of group for n-back level [F(3,108) = 4.06; p < 0.01; η<sup>2</sup><sub>p</sub> = 0.10]. Post hoc comparisons for n-back level indicated that the only reliable difference was between the PC and WMT groups (p < 0.01), whereas the remaining pairwise comparisons were not significant (ICT-WMT: p = 0.11; ICT-AC: p = 1.00; ICT-PC: p = 1.00; WMT-AC: p = 0.42; PC-AC: p = 0.89). In the case of errors, because the groups differed at pre-test, we tested for group differences in the standardized gains with n-back errors at pre-test entered as a covariate. The analysis of covariance (ANCOVA) revealed a reliable effect of the covariate [F(1,107) = 59.06; p < 0.01; η<sup>2</sup><sub>p</sub> = 0.35] but also a significant effect of group [F(3,107) = 5.28; p < 0.01; η<sup>2</sup><sub>p</sub> = 0.13]. Further analyses showed statistically significant differences between the AC and WMT groups (p = 0.01) and between the two control groups (p < 0.01). None of the other comparisons showed reliable differences (ICT-WMT: p = 0.39; ICT-AC: p = 0.89; ICT-PC: p = 0.14; WMT-PC: p = 1.00).

#### Stop-Signal

We used the ANALYZE-IT software provided by Verbruggen et al. (2008) to assess the impact of training on inhibition. The stop-signal reaction time (SSRT) is an index of pure response inhibition; the program estimates it by subtracting the mean stop-signal delay (SSD) from the untrimmed mean RT on Go trials. Following the criteria of Verbruggen et al., we removed five participants (one ICT, one WMT, one AC, and two PC participants) from the analysis because their overall probability of responding on stop trials was significantly below or above 50% in both pre- and post-test. The groups did not differ in SSRT at pre-test [F(3,104) = 1.85; p = 0.14; η<sup>2</sup><sub>p</sub> = 0.05]. As for response inhibition, although the corresponding ANOVA did not reveal a reliable effect of group [F(3,104) < 1; η<sup>2</sup><sub>p</sub> = 0.01], the only reliable pre–post reduction in SSRT was in the ICT group (**Table 1**). Note that there were no training-related effects on either hits or RTs in Go trials<sup>3</sup>. Thus, training effects were evident only in the SSRT as an index of response inhibition, and not in the other variables that assess basic task performance. This is important because it shows that transfer is specific to the trained executive control process.
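The mean method described above can be sketched as follows; this is not the ANALYZE-IT code. The exclusion rule is a hypothetical operationalization of "significantly below or above 50%": an exact two-sided binomial test of the number of stop trials with a response against p = 0.5 at α = 0.05, applied to a single session. The test choice, α, and all names are assumptions.

```python
from math import comb
from statistics import mean


def ssrt_mean_method(go_rts, stop_signal_delays):
    """Stop-signal reaction time: mean (untrimmed) Go RT minus mean stop-signal delay."""
    return mean(go_rts) - mean(stop_signal_delays)


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test of k responses in n stop trials against p."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Sum the probabilities of all outcomes no more likely than the observed one
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


def exclude_participant(responded_stop_trials, n_stop_trials, alpha=0.05):
    """Flag p(respond | stop signal) departing significantly from 0.5 (hypothetical rule)."""
    return binomial_two_sided_p(responded_stop_trials, n_stop_trials) < alpha


# Hypothetical data for one participant in one session
ssrt = ssrt_mean_method([500, 520, 540], [250, 300, 350])  # 520 - 300 = 220 ms
print(ssrt, exclude_participant(10, 20), exclude_participant(2, 20))
```

The mean method relies on the staircase keeping p(respond | stop signal) near 0.5, which is why participants who deviate strongly from 50% are excluded before SSRT is interpreted.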

#### Operation Span

For the O-Span task, we multiplied the number of words recalled (storage capacity) by the average accuracy on the equations (ongoing processing), yielding a combined index of dual processing. For this calculation, we used a partial credit load scoring approach (PCL, Conway et al., 2005), which takes into account the average proportion of correctly recalled words across all set sizes, regardless of whether each set was perfectly recalled.
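A minimal sketch of the combined O-Span index as described in this paragraph: the storage score is the mean proportion of words recalled per set (so partially recalled sets still earn credit), and it is multiplied by equation accuracy. Representing each set as a (words correctly recalled, set size) pair is an assumption, as are the function names.

```python
def partial_credit_storage(sets):
    """Mean proportion of words recalled per set, with credit for partially recalled sets.

    `sets` is a list of (words_correctly_recalled, set_size) tuples.
    """
    return sum(correct / size for correct, size in sets) / len(sets)


def ospan_combined(sets, equation_accuracy):
    """Combined dual-processing index: storage score x equation accuracy (0-1)."""
    return partial_credit_storage(sets) * equation_accuracy


# Hypothetical trials: a perfectly recalled set of 2, half of a set of 4, etc.
trials = [(2, 2), (2, 4), (3, 3), (3, 6)]
print(partial_credit_storage(trials))         # 0.75
print(round(ospan_combined(trials, 0.9), 3))  # 0.675
```

Multiplying the two components means a participant scores highly only by both maintaining the words and processing the equations accurately.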

A one-way ANOVA on the combined scores (words recalled and equation accuracy) showed that there were no differences between the groups at pre-test [F(3,108) = 0.42; p = 0.73; η<sup>2</sup><sub>p</sub> = 0.01]. There was a reliable pre–post enhancement in all three training groups, with the largest effect size in the ICT group (**Table 1**). However, the ANOVA comparing the groups' standardized gains failed to show reliable differences, F(3,108) = 1.76; p = 0.15; η<sup>2</sup><sub>p</sub> = 0.04.

Finally, intrusions were also considered as a measure of updating in WM (fewer intrusions indicating more successful updating). In this case, only the WMT group significantly reduced the number of intrusions in the dual task (**Table 1**). The one-way ANOVA confirmed a reliable effect of group, F(3,108) = 2.73; p = 0.04; η<sup>2</sup><sub>p</sub> = 0.07. The post hoc comparisons showed that the only reliable difference was between the WMT and PC groups (p = 0.03).

#### AX-CPT

To assess the tendency toward proactive/reactive control, we calculated the Behavioral Shift Index (BSI)<sup>4</sup> introduced by Braver et al. (2009) and Chiew and Braver (2014). Larger BSIs indicate a greater tendency toward proactive control, whereas smaller BSIs indicate a tendency toward reactive control. Invalid trials, which included trials with no response and trials with response times below 100 or above 1000 ms, accounted for 6.1% of all trials. Eight participants were removed from the analysis because they had more than 10% invalid trials in pre- and post-test. The four groups were comparable in BSI at pre-test [F(3,100) = 0.21; p = 0.88; η<sup>2</sup><sub>p</sub> = 0.01].

The one-way ANOVA on BSI standardized gains failed to show a group effect [F(3,100) = 1.60; p = 0.19; η<sup>2</sup><sub>p</sub> = 0.04]. Nonetheless, the pre–post analyses showed a reliable effect only in the ICT and WMT groups, both of which exhibited larger BSIs after the training (**Table 1**, ps ≤ 0.01).

#### Raven's Advanced Progressive Matrices

There were no differences between the groups at pre-test in either hit rates [F(3,108) = 0.44; p = 0.72; η<sup>2</sup><sub>p</sub> = 0.01] or reaction times [F(3,108) = 1.47; p = 0.22; η<sup>2</sup><sub>p</sub> = 0.03]. As shown in **Table 1**, however, the ICT group was the only group that exhibited a pre–post increase in hit rates. The one-way ANOVA confirmed an effect of group [F(3,108) = 2.63; p = 0.05; η<sup>2</sup><sub>p</sub> = 0.06], which was mainly accounted for by the difference between the ICT and PC groups (post hoc comparison, p = 0.04). No effects were found on hit reaction times.

<sup>3</sup>No reliable pre–post differences were observed on hits (ps > 0.33) or RTs (ps > 0.56) in Go trials for the ICT, WMT or AC groups. The PC group performed worse at post-test, with a lower hit rate [Mpre = 95.71, SDpre = 4.76; Mpost = 91.77, SDpost = 4.56; F(1,21) = 8.06; p = 0.01; η<sup>2</sup><sub>p</sub> = 0.27] and slower RTs [Mpre = 753.21, SDpre = 156.80; Mpost = 832.54, SDpost = 168.07; F(1,21) = 11.75; p < 0.01; η<sup>2</sup><sub>p</sub> = 0.35].

<sup>4</sup>This index is based on the formula (AY − BX)/(AY + BX), computed for errors and for reaction times. Error rates in conditions with zero errors were corrected as (errors + 0.5)/(number of trials + 1).
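The BSI in footnote 4 can be sketched as below, computed separately for error rates and for mean RTs. The zero-error correction follows the footnote; how the two BSIs were subsequently combined, and whether RTs came from correct trials only, are not specified here, so the sketch leaves them separate. The function names and example counts are hypothetical.

```python
def error_rate(errors, n_trials):
    """Error rate, corrected as (errors + 0.5) / (n_trials + 1) when there are no errors."""
    if errors == 0:
        return (errors + 0.5) / (n_trials + 1)
    return errors / n_trials


def bsi(ay, bx):
    """Behavioral Shift Index: (AY - BX) / (AY + BX); larger values = more proactive."""
    return (ay - bx) / (ay + bx)


# Hypothetical participant: 20 AY and 20 BX trials
bsi_errors = bsi(error_rate(3, 20), error_rate(0, 20))  # (0.15 - 1/42) / (0.15 + 1/42)
bsi_rts = bsi(520.0, 460.0)                             # mean RTs (ms): 60 / 980
print(round(bsi_errors, 3), round(bsi_rts, 3))
```

The correction keeps a zero-error BX condition from forcing the error BSI to exactly 1, which would otherwise happen whenever AY errors are present and BX errors are absent.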

# Predictors for Training Improvement and Transfer

We were also interested in exploring which of the cognitive abilities tested at baseline predicted the magnitude of training improvement. We ran linear regression analyses with the average training slope as the outcome and all the pre-test measures as predictors. For the experimental training groups only, RAPM scores significantly predicted global training improvement (R<sup>2</sup> = 0.12; p = 0.01; β = 0.86). We also examined whether pre-test performance on the reasoning test predicted transfer gains after training. However, there was no reliable relationship between RAPM scores before training and any of the gain scores on the transfer tasks. Reasoning scores at pre-test predicted only reasoning scores at post-test (R<sup>2</sup> = 0.24; p < 0.01; β = 0.45).

Going a step further, we also examined whether the magnitude of training improvement predicted transfer gains. We ran linear regression analyses in each training group, with the standardized gains in the different transfer tasks as the criterion and the average training slope as the predictor variable. In the ICT group, greater training improvement predicted larger gains in the relative conflict score of the Stroop task (R<sup>2</sup> = 0.29; p < 0.01; β = 0.19) and larger gains in the RAPM (R<sup>2</sup> = 0.12; p = 0.04; β = 0.09). No reliable regressions emerged for the WMT and AC groups (all ps > 0.15).

Across the whole trained sample, the level participants achieved in the training activities predicted performance only in the criterion tasks, namely conflict in the Stroop task (R<sup>2</sup> = 0.12; p < 0.01; β = 0.13) and errors in the n-back task (R<sup>2</sup> = 0.05; p = 0.03; β = −0.10).

# Motivation Results

To account for motivational factors during training, every two sessions we asked participants about their: (i) involvement in the program; (ii) perceived difficulty of the activity levels; (iii) perceived challenge of improving over the levels; and (iv) expectations for their achievement (Alonso-Tapia and de la Red Fadrique, 2007; Colom et al., 2013). We averaged each variable across the three measurement points and explored its distribution across groups. One-way ANOVAs failed to show group differences in any of the four motivational variables: involvement [(AC: M = 9.09; SD = 0.85; ICT: M = 9.03; SD = 0.94; WMT: M = 8.98; SD = 0.92); F < 1; p = 0.90, η<sup>2</sup><sub>p</sub> = 0.00]; perceived difficulty [(AC: M = 6.31; SD = 1.46; ICT: M = 6.11; SD = 1.76; WMT: M = 6.25; SD = 1.49); F < 1; p = 0.80, η<sup>2</sup><sub>p</sub> = 0.00]; perceived challenge to improve [(AC: M = 7.15; SD = 1.46; ICT: M = 7.31; SD = 1.47; WMT: M = 7.28; SD = 1.52); F < 1; p = 0.91, η<sup>2</sup><sub>p</sub> = 0.00]; and expectations to improve [(AC: M = 7.28; SD = 1.58; ICT: M = 7.76; SD = 1.20; WMT: M = 7.94; SD = 1.07); F(2,85) = 1.91; p = 0.15; η<sup>2</sup><sub>p</sub> = 0.04]. We then calculated partial correlations, controlling for group, between the four motivation variables and the global training slope. We found only a modest correlation between the training slope and perceived challenge (r = 0.25; p = 0.04), such that participants who perceived the training as more challenging tended to improve the most.
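A partial correlation controlling for a categorical factor such as group can be sketched by removing each group's mean from both variables (equivalent to regressing each variable on group indicators and keeping the residuals) and then correlating the residuals. This is a generic illustration of the technique, not the authors' analysis script; it returns r only, and the p-value would need degrees of freedom adjusted for the number of groups. The data are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def center_within_groups(values, groups):
    """Residuals after regressing on group indicators, i.e. deviations from group means."""
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    group_means = {g: mean(vs) for g, vs in by_group.items()}
    return [value - group_means[group] for value, group in zip(values, groups)]


def partial_r_controlling_group(x, y, groups):
    """Pearson correlation of x and y after removing group-mean differences."""
    rx = center_within_groups(x, groups)
    ry = center_within_groups(y, groups)
    sxy = sum(a * b for a, b in zip(rx, ry))
    return sxy / sqrt(sum(a * a for a in rx) * sum(b * b for b in ry))


# Hypothetical data: within each group, challenge and slope rise together,
# even though the group means run in opposite directions
challenge = [1, 2, 3, 11, 12, 13]
slope = [10, 11, 12, 0, 1, 2]
group = ["AC", "AC", "AC", "ICT", "ICT", "ICT"]
print(partial_r_controlling_group(challenge, slope, group))
```

The example shows why controlling for group matters: the raw correlation across all six cases is negative, whereas the within-group relationship is perfectly positive.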

To explore whether participants' motivation modulated training improvement, we averaged the four motivation variables into a global motivation score (AC: M = 6.81; SD = 0.69; ICT: M = 6.99; SD = 0.75; WMT: M = 6.99; SD = 0.54). A one-way ANOVA showed no differences between the groups in general motivation, F < 1; p = 0.54; η<sup>2</sup><sub>p</sub> = 0.01. However, to examine more precisely whether motivation level was related to training achievement, we split all participants trained in executive control (ICT and WMT) at the median of the global score (Md = 6.95) to differentiate between highly and less motivated participants. A one-way ANOVA with motivation (high vs. low) as the factor and the global training slope of ICT and WMT participants as the dependent variable showed a significant effect of motivation, F(1,62) = 5.55; p = 0.02; η<sup>2</sup><sub>p</sub> = 0.08, with highly motivated participants exhibiting a steeper training slope (M = 9.39; SD = 2.65) than less motivated participants (M = 7.82; SD = 2.66).
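The median split and the two-level one-way ANOVA can be sketched as follows. Assigning scores exactly equal to the median to the "low" group is an assumption (the text does not say how ties were handled), and the data are hypothetical. The function returns F, its degrees of freedom, and partial eta squared, which in a one-way design equals SS_between / (SS_between + SS_within).

```python
from statistics import mean, median


def median_split(scores):
    """Label each score 'high' (above the median) or 'low' (at or below it)."""
    md = median(scores)
    return ["high" if s > md else "low" for s in scores], md


def one_way_anova(groups):
    """Return F, (df_between, df_within) and partial eta squared for a one-way design."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, (df_between, df_within), ss_between / (ss_between + ss_within)


# Hypothetical global motivation scores and training slopes for six participants
motivation = [6.5, 7.2, 6.9, 7.4, 6.1, 7.0]
slopes = [7, 10, 8, 11, 6, 9]
labels, md = median_split(motivation)
high = [s for s, lab in zip(slopes, labels) if lab == "high"]
low = [s for s, lab in zip(slopes, labels) if lab == "low"]
f, df, eta_p2 = one_way_anova([high, low])
print(round(md, 2), f, df, round(eta_p2, 3))
```

With only two levels, this F test is equivalent to an independent-samples t test (F = t²).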

To explore whether motivation predicted transfer, multiple linear regression models were run with the standardized gains in the transfer tasks as criterion variables and the four motivational variables measured during training as predictors, pooling the two experimental training groups. Motivation predicted transfer to the O-Span task in the two experimental training groups (R<sup>2</sup> = 0.15; p = 0.03): participants who felt more involved (β = 0.28; p = 0.03) and those who perceived the training as less difficult (β = −0.31; p = 0.02) had larger gains after training.

Lastly, we compared the transfer gains of the highly motivated participants in the experimental groups (ICT: n = 17; WMT: n = 15) with those of the highly motivated participants in the active control group (n = 12). Most likely due to the small sample sizes, only a marginal effect was found on the standardized gains in one of the two criterion tasks. Specifically, in the n-back task the WMT participants had larger gains (M = 0.92; SD = 0.99) than the ICT (M = 0.13; SD = 1.07) and AC participants (M = 0.19; SD = 0.96) [F(2,31) = 2.80; p = 0.07; η<sup>2</sup><sub>p</sub> = 0.12].

# DISCUSSION

The main goal of the present study was to directly compare the effectiveness of two specific process-based EFs training programs (WM and IC) in young adults. These two programs were based on the assumption of the highly influential "Unity and Diversity" model of EFs proposed by Miyake et al. (2000). The main feature of this model is that the EFs system could be partitioned into overlapping (unity) and yet distinct (diversity) components (inhibition, shifting, and WM updating). A logical conclusion drawn from the assumption of diversity is that EFs training could specifically be targeted to one of these functions with transfer effects showing some degree of specificity and commonality. The results of the present experiment generally support this assumption.

Thus, regarding the improvement in the criterion tasks – structurally similar to the trained ones – our results support the specificity of EFs training on the basis of the specific benefits observed at post-test. Only the WMT group showed pre–post enhancement in the n-back task (n-back level and errors), and only the ICT group exhibited reduced conflict scores in the Stroop task after training. Even though some previous studies have shown benefits in the Stroop task following WM training (Borella et al., 2010; Chein and Morrison, 2010; Schweizer et al., 2011), we failed to observe a reliable effect of WMT on conflict resolution. Hence, the results for the criterion tasks point to straightforward training-specific effects.

In relation to near transfer effects, we also observed specific training benefits for the WMT group in the non-trained WM task (O-Span). In particular, in the O-Span task only the WMT group showed a benefit in suppressing memory intrusions, consistent with previous studies showing a relationship between high WMC and more efficient intrusion suppression in span tasks (Rosen and Engle, 1998; Turley-Ames and Whitfield, 2003; Borella et al., 2008). Similarly, the ICT group was the only group that showed a specific benefit in response inhibition (SSRT), indicating that adaptive training in conflict resolution tasks improves performance in other tasks also thought to require conflict resolution mechanisms (for related results, see Logan and Burkell, 1986; Manuel et al., 2013; Berkman et al., 2014; Enge et al., 2014; Dovis et al., 2015). Together with the results for the criterion tasks, the near transfer results also support the idea that training on either WM or IC leads to specific performance benefits in tasks related to the training (Simons et al., 2016).

However, despite this specificity, the two EFs-trained groups also showed some common features regarding near transfer effects. Thus, both the WMT and ICT groups improved dual performance (Equations accuracy × Words recalled) in the O-Span task [related findings of improved complex span scores have been reported after simple span, complex span and visual search training (Harrison et al., 2013), rehearsal strategy training (Turley-Ames and Whitfield, 2003) or task-switching training (Karbach and Kray, 2009)], suggesting that dual tasking may require both WM capacity and IC mechanisms (Towse et al., 2000; Smith et al., 2001; Unsworth, 2010; Chein et al., 2011). Hence, IC seems to be required not only in the training activities practiced by our ICT group but also in the WM updating tasks, which required suppression of irrelevant information and were extensively practiced by the WMT group. This may reflect the relationship between WM and IC at the behavioral level, and may indicate the degree to which trained and transfer processes overlap in their underlying neuro-cognitive networks. Kane and Engle (2002) proposed that the dorsolateral prefrontal cortex could play a role in WM capacity in contexts involving potential interference (and requiring attentional control). Conway et al. (2003) and Gray et al. (2003) agree that, in WM span tasks, prefrontal regions are activated when an executive control mechanism is recruited to reduce interference during the maintenance and manipulation of information.

It is, however, puzzling that we also observed a training effect for the active control group in the O-Span task, which did not differ from that obtained by the WMT group. Note that, although the AC group did not increase the cognitive load over the training levels, its activities became more difficult by demanding increasingly fast processing. Thus, as also predicted, it is possible that the positive effect for this control group stemmed from the overarching time-limited nature of the tasks. Increasing speed of processing could have led to more efficient processing and maintenance in WM, resulting in better performance in the O-Span task. Unsworth et al. (2009) reported a negative correlation between processing time and WM maintenance, suggesting that participants who processed quickly recalled more items than those who worked slowly. Similarly, faster processing has been proposed to reduce both the likelihood of items being forgotten and the time spent on rehearsal or refreshing processes (Towse et al., 2000; Hudjetz and Oberauer, 2007; Unsworth et al., 2009).

Regarding far transfer effects, we also found both common and distinct features across our trained groups. We included two tasks (AX-CPT and RAPM) that did not directly capture WM or IC: the AX-CPT was used to explore whether training might change the control strategy used by the participants, and Raven's matrices to explore whether WM and IC training transferred to a more general complex domain such as abstract reasoning. The AX-CPT is widely used to explore the dynamic adjustment of cognitive control strategies and has been shown to be very sensitive to individual differences in cognitive control (Braver et al., 2009; Burgess et al., 2011; Braver, 2012). Proactive control requires goal maintenance and involves attending to contextual cues in order to resolve interference effectively while keeping the monitored cues in mind (Rush et al., 2006). In this version of the task, a proactive control strategy was encouraged because the context was highly predictive (the A cue preceded the X target in 70% of the trials); hence, a control mode involving sustained maintenance of task-relevant information would lead to a high success rate, although it would also lead to errors in trials where the cue was A but the probe was not X (AY trials; 10% of the trials). Thus, enhanced proactive control is expected to increase AY errors and reduce BX errors, pushing the BSI toward larger values, since the cue in BX trials does not signal a "yes" response. Because it is usually the most efficient strategy, young adults typically exhibit behavioral performance and brain activity (sustained lateral PFC activation) consistent with a predominantly proactive control strategy (Braver, 2012; Morales et al., 2013).

Interestingly, our results for the BSI (an index of shifts toward proactive control) in the AX-CPT suggested a greater reliance on proactive control among WM- and IC-trained participants than among the active and passive control groups. Previous studies have already reported the malleability of the cognitive control mechanisms engaged in the AX-CPT as a result of experience-based conditions such as bilingualism (Morales et al., 2013, 2015) or different kinds of training interventions: task-strategy training made older adults (Paxton et al., 2006) and people with schizophrenia (Braver et al., 2009; Edwards et al., 2009) more prone to engage in proactive control, bringing their performance closer to that of young adults than before training. Previous studies have also reported proactive shifts in cortical regions such as the lateral PFC after strategy training (Braver et al., 2009) and IC training (Berkman et al., 2014), suggesting that the lateral PFC might serve to anticipate upcoming control demands across a range of executive control domains. Our results replicate and extend these findings by showing behavioral shifts toward proactive processing in both the ICT and WMT groups (numerically larger in the WMT group), again suggesting some common executive resources for inhibitory and WM processes.

In contrast, the results of the non-verbal reasoning task (RAPM) showed some degree of specificity: we observed a benefit for the ICT group but not for the WMT group. Whether cognitive training can improve fluid intelligence is a recurrent and controversial question, with a considerable number of studies reporting data against (Shipstead et al., 2012; Melby-Lervåg and Hulme, 2013) and in favor of it (Morrison and Chein, 2011; Au et al., 2014). The results of our ICT group join others showing better reasoning performance after training. Karbach and Kray (2009) reported improved performance in a composite measure of reasoning after four sessions of task-switching training in children, young and older adults compared to an active control group. Similarly, Rueda et al. (2005) found benefits in a measure of reasoning after 5 days of executive-control training in pre-school children, as did Rueda et al. (2012) after 10 days, when compared to control groups (but see Thorell et al., 2009 and Enge et al., 2014 for failures to show such positive effects).

However, our WMT group did not show benefits in abstract reasoning. Although fluid intelligence and WM share common variance (Colom et al., 2004; Oberauer et al., 2005; Friedman et al., 2006; Harrison et al., 2015) and EFs have been related to reasoning operations (Dempster and Corkill, 1999; Engle and Kane, 2004; Jarosz and Wiley, 2012; Shipstead et al., 2015), it is possible that our participants did not reach the level of difficulty needed to show far transfer. In support of this interpretation, the regression analyses showed that training improvement predicted transfer to abstract reasoning only in the ICT group, which suggests that the training levels achieved by the WMT group were not demanding enough to promote transfer. Previous studies reporting positive training effects have normally used single but highly demanding tasks, such as the dual n-back task (Jaeggi et al., 2008, 2010, 2013), and/or their participants attained high levels of performance over training, such as n-back levels above 3 (Jaeggi et al., 2008, 2013). Note that, on average, our WMT participants reached an n-back level below 3 and performed a single n-back task. Hence, and considering that we trained more than one task, it is possible that the level of difficulty was below that needed to obtain far transfer effects with WM training.

Together, the observed transfer effects indicate that the ICT group showed the most consistent pattern of enhanced performance across tasks. While there may be more than one reason behind this finding, we favor the idea that the benefit for the ICT group is not related to differences in cognitive demands or motivational aspects between the two training programs. As previously noted, the training levels achieved by the WMT group may not have been demanding enough to lead to stronger overall transfer.

An additional point addressed in the present work was individual differences in training and transfer effects, an issue that remains to be explored in depth (Könen and Karbach, 2015). In line with previously reported studies (Bürki et al., 2014), we found that abstract reasoning was a meaningful predictor of training improvement, indicating that people with higher reasoning scores benefited more from training. Furthermore, training improvement was a relevant predictor of transfer to the criterion tasks for the two experimental training groups and, in the ICT group in particular, a predictor of transfer to reasoning and of conflict reduction. This pattern of results highlights the importance of considering individual differences before training, because they might influence how well participants perform during training and how much they benefit from it (Könen and Karbach, 2015).

In addition to the more important theoretical issues related to brain plasticity and transfer, a secondary aim of our study was methodological in nature. Previous training studies have been criticized for unsuitable control conditions, for not considering motivational factors, or for the use of single training tasks (Jaeggi et al., 2013; Redick et al., 2013; Melby-Lervåg et al., 2016). In our study, we took these factors into account by using different tasks to train the target processes (to increase the probability of generalization), by including two different control conditions (active and passive), and by measuring motivational variables associated with training. Thus, the active control group engaged in tasks essentially requiring processing speed (for related approaches see Goldin et al., 2014; Lawlor-Savage and Goghari, 2016), in order to keep participants' motivation and engagement similar to those of the experimental groups. Importantly, all trained participants (including the AC group) showed a meaningful improvement in the specifically trained process (IC, WM and processing speed). In fact, the AC group showed steeper training slopes than the two experimental groups (**Figure 1**). Note again that the control activities were mainly perceptual, and successive levels did not impose a greater executive load, only faster responses with a constant, low cognitive effort.

Also of relevance, the motivation questionnaire revealed similar levels of involvement, perceived difficulty, perceived challenge and expectations during training for the three trained groups, indicating that transfer differences among the groups were not due to differences in motivation or perceived difficulty. Interestingly, while motivation cannot easily account for the differences between the training and control groups, it did predict training improvement in the experimental groups: highly motivated participants – those who were more involved and perceived the training as less difficult – showed larger improvements across the training sessions than less motivated participants. Thus, and consistent with previous studies, this result highlights the importance of considering individual motivation during training, since it is related to greater improvements that could result in greater transfer effects (Mather and Carstensen, 2005; Jaeggi et al., 2013; Katz et al., 2014). Apart from motivation, future studies could include additional self-report assessments of individual differences in beliefs about the fixed or malleable nature of cognition (Jaeggi et al., 2013) and of expectancy and perceived improvement (Boot et al., 2011).

It must be noted that the present research is not without limitations. First, despite the specific benefits found in the within-group comparisons, the lack of group effects in some of the standardized gain measures calls for caution in interpreting the results. Null effects in the gain comparisons may reflect a lack of statistical power but also inflated variability among groups. In a recent meta-analysis, Melby-Lervåg et al. (2016) established that training studies with large effect sizes normally included small samples (fewer than 20 participants per training condition) and untreated (passive) control groups, which produces biases toward significant – but low-powered – results (Enge et al., 2014; Melby-Lervåg et al., 2016). In the present study, all samples included more than 20 participants per condition, and we included an active as well as a passive control group. Second, the present training schedule covered 2 weeks, which is at the lower end of the range of current training studies (from 2 to 14 weeks; Morrison and Chein, 2011). Hence, the magnitude of transfer effects when training is extended over a longer period remains to be explored. Similarly, our study cannot speak to possible long-term effects, since we did not conduct a follow-up. Future studies should address this issue, because the value of training interventions essentially relies on the durability of training-induced gains. Finally, we recognize that transfer effects in studies with young healthy samples may be limited because participants may already be functioning optimally at pre-test, leaving little room for meaningful improvement with training. Hence, studies with children and older adults could be more sensitive to training-related changes than studies with young people (Kelly et al., 2014; Spencer-Smith and Klingberg, 2015; Weicker et al., 2016).

In closing, and despite these limitations, our results suggest that executive-control training may modulate cognitive abilities in young people. The malleability of EFs challenges the long-standing assumption that cognitive abilities remain fixed over time. Training cognition is not a new concept (Jolles and Crone, 2012; Boot and Kramer, 2014; Schubert et al., 2014), but the idea that training and experience can generalize to tasks and domains beyond those trained is still controversial. In this sense, our results, although modest and evident at the task level rather than at the construct level, are promising and support substantial plasticity of cognitive control mechanisms through training. Interestingly, the results also suggest that there is some specificity in the consequences of the trained processes, so that transfer occurs only when the specific trained process is tapped by the transfer task and domain. This opens the possibility that training in applied settings could target the process needed in a specific domain, or the process impaired by deficient brain functioning. It also points to the ambitious goal of exploring the potential benefits of executive-control training for everyday activities (Simons et al., 2016). Before this, further research would need to address the potential effects of executive-control training on brain structure and dynamics. Analyzing structural and functional brain profiles may provide further insight into why specific interventions are more successful for certain individuals, and help characterize the overlap between training tasks and tests that show training-related transfer.

# AUTHOR CONTRIBUTIONS

This work is part of the doctoral dissertation of the first author (MM). The authors developed the concept of the study together. MM contributed to the design of the training tasks, data collection and analyses, and manuscript writing. MB, in collaboration with M. R. Rueda, designed the online program PEC-UGR. MB and CG-A supervised the study, and wrote, reviewed and approved the final version of the manuscript.

# FUNDING

The current research was supported by the doctoral research grant BES-2013-066842 to MM, by grants from the Spanish Ministerio de Economía y Competitividad to MB (PSI2012–33625 and PCIN-2015-132) and CG-A (PSI2011-25797 and PSI2015-65502-C2-2-P), and by the Andalusian Government to MB (P12-CTS-2369-Fondos Feder).

# ACKNOWLEDGMENTS

We would like to thank M. J. Ruiz for his help in data collection, and R. Garcia-Ortega and D. Lopez-Padial for their technical support with the online training program PEC-UGR.

#### REFERENCES

effects, limitations, and great expectations? PLoS ONE 10:e0142169. doi: 10.1371/journal.pone.0142169


perceived effort on ERP components. Int. J. Psychophysiol. 87, 266–272. doi: 10.1016/j.ijpsycho.2012.08.005


of Learning and Motivation, ed. B. Ross (New York, NY: Elsevier), 145–199.


controlled trials of psychological interventions. Psychother. Psychosom. 78, 275–284. doi: 10.1159/000228248


special focus on brain injured patients. Neuropsychology 30, 190–212. doi: 10.1037/neu0000227


age, baseline performance, and training gains. Dev. Psychol. 50, 304–315. doi: 10.1037/a0032982

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Maraver, Bajo and Gomez-Ariza. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Working Memory Training for Healthy Older Adults: The Role of Individual Characteristics in Explaining Short- and Long-Term Gains

Erika Borella<sup>1</sup> \*, Elena Carbone<sup>1</sup> , Massimiliano Pastore<sup>2</sup> , Rossana De Beni <sup>1</sup> and Barbara Carretti <sup>1</sup> \*

<sup>1</sup> Department of General Psychology, University of Padova, Padova, Italy, <sup>2</sup> Department of Developmental and Social Psychology, University of Padova, Padova, Italy

Objective: The aim of the present study was to explore whether individual characteristics such as age, education, vocabulary, and baseline performance in a working memory (WM) task—similar to the one used in the training (criterion task)—predict the short- and long-term specific gains and transfer effects of a verbal WM training for older adults.

#### Edited by:

Soledad Ballesteros, Universidad Nacional de Educación a Distancia, Spain

#### Reviewed by:

José Manuel Reales, Universidad Nacional de Educación a Distancia, Spain

Tiina Salminen, Synaptikon GmbH (NeuroNation), Germany

#### \*Correspondence:

Erika Borella, erika.borella@unipd.it

Barbara Carretti, barbara.carretti@unipd.it

Received: 21 October 2016 Accepted: 20 February 2017 Published: 22 March 2017

#### Citation:

Borella E, Carbone E, Pastore M, De Beni R and Carretti B (2017) Working Memory Training for Healthy Older Adults: The Role of Individual Characteristics in Explaining Short- and Long-Term Gains. Front. Hum. Neurosci. 11:99. doi: 10.3389/fnhum.2017.00099

Method: Four studies that adopted the Borella et al. (2010) verbal WM training procedure were found eligible for our analysis as they included: healthy older adults who attended either the training sessions (WM training group) or alternative activities (active control group); the same measures for assessing specific gains (on the criterion WM task) and transfer effects (nearest on a visuo-spatial WM task, near on short-term memory tasks, and far on a measure of fluid intelligence, a measure of processing speed and two inhibitory measures); and a follow-up session.

Results: Linear mixed models confirmed the overall efficacy of the training, in the short-term at least, and some maintenance effects. In the trained group, the individual characteristics considered were found to contribute (albeit only modestly in some cases) to explaining the effects of the training.

Conclusions: Overall, our findings suggest the importance of taking individual characteristics and individual differences into account when examining WM training gains in older adults.

Keywords: working memory training, older adults, age, working memory baseline performance, general cognitive ability, training benefits, individual differences, individual characteristics

# INTRODUCTION

Working memory (WM), i.e., the ability to retain and manipulate information for use in complex cognitive tasks, is one of the core mechanisms involved in higher-order cognitive abilities (e.g., fluid intelligence, problem-solving, and reading comprehension; de Ribaupierre, 2001; Borella et al., 2011). Though limited in capacity, WM is a crucial mechanism in cognition. It is also one of the cognitive processes that undergo a clear and linear decline with aging (e.g., Borella et al., 2008; Mammarella et al., 2013). WM is consequently one of the general processes targeted by the new generation of process-based cognitive training. The assumption that WM is trainable is based on evidence of the plasticity of our cognitive system across the whole life span (e.g., Hertzog et al., 2008). Further, according to some WM models, such as the continuity model (see Cornoldi and Vecchi, 2003; Cornoldi, 2010), WM is characterized by different processes that depend on the type of content processed (verbal vs. spatial) and on the degree of executive control involved. Therefore, improving WM could theoretically also enhance its related processes. The Cornoldi and Vecchi WM model distinguishes between a "basic structure" (a sort of personal biological equipment) and a "used ability" determined by the way in which individuals use their WM. On this basis, the benefits of training may presumably concern not only the basic structure of WM, but also its usage.

The aim of WM training in aging is thus to improve older adults' information processing system (e.g., Zinke et al., 2012; Bürki et al., 2014), in order to sustain their cognitive functioning and support active aging. WM training has been shown to improve performance not only in the trained tasks (or in tasks similar to the one used in the training), but also in untrained tasks (transfer effects). Training is thought to change the way in which individuals process information, enabling them to make more flexible use of their own resources.

The recent meta-analysis by Karbach and Verhaeghen (2014), focusing on aging, showed that WM training for older people could promote significant gains both in the trained tasks and in other similar tasks (near transfer effects). There also seemed to be some improvements in untrained tasks that shared some cognitive processes with the task used in the training (far transfer effects), though these were usually small in terms of effect size (see Karbach and Verhaeghen, 2014). Reports on the efficacy of WM training in aging have been mixed, however (see **Table 1**), making it necessary to identify which factors give rise to training benefits. Among the numerous factors to consider, individual characteristics such as age, general cognitive ability, and baseline cognitive resources, which are believed to predict the benefits of memory training (e.g., Verhaeghen and Marcoen, 1996), may also modulate WM training outcomes (Bürki et al., 2014). Surprisingly, their role has not yet been the focus of WM training studies.

Age is one of the crucial factors that may explain whether and to what extent individuals may gain more or less in terms of both specific training gains (in a given trained task) and transfer effects (e.g., von Bastian and Oberauer, 2014). Some WM training studies examined the role of age in explaining the benefits of training by comparing young and older adults, or considering older adults in different age brackets (see also Borella et al., 2014 for a review). Some of the studies that included both young and older adults analyzed how performance changed over the course of the training sessions (Dahlin et al., 2008; Li et al., 2008; Richmond et al., 2011; Brehmer et al., 2012; von Bastian et al., 2013; Bürki et al., 2014). Brehmer et al. (2012), for instance, considered weekly WM performance scores, pooling participants' daily performance in 7 WM training tasks into a single t-standardized WM performance score. They found that young adults gained more than older adults from week 1 to 2, but then the two age groups showed comparable improvements from the second week to the end of training, from week 2 to 4. Bürki et al. (2014) found that age-related differences in the performance of young and older adults persisted over 10 training sessions (with greater improvements in the former). Li et al. (2008) found significant improvements for both young and older adults in two trained spatial n-back tasks (though the best performance reached by the older adults was still not as good as that of the younger adults). Other studies reported mixed results, however: age-related differences in favor of young adults were found in some of the trained tasks, while improvements were comparable between the two age groups in others.
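Brehmer et al. (2012) are described above as pooling daily performance in seven training tasks into a single t-standardized score. The sketch below shows one common way to build such a composite, assuming that each task is converted to T-scores (mean 50, SD 10) using the mean and SD of all pooled observations for that task, and that the composite is the unweighted average across tasks. The reference sample and weighting actually used by Brehmer et al. are not specified here; the function names and data are hypothetical.

```python
from statistics import mean, pstdev


def t_scores(values):
    """Convert one task's raw scores to T-scores (mean 50, SD 10).

    Standardizes against the mean and population SD of all values passed in,
    which is an assumption about the reference sample.
    """
    m = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a task with zero variance")
    return [50 + 10 * (v - m) / sd for v in values]


def composite_t(scores_by_task):
    """Average per-task T-scores into one composite score per observation.

    scores_by_task maps a task name to a list of raw scores; position i must
    refer to the same observation (e.g., one participant in one week) in
    every task.
    """
    per_task = [t_scores(raw) for raw in scores_by_task.values()]
    n_obs = len(per_task[0])
    if any(len(task) != n_obs for task in per_task):
        raise ValueError("all tasks need the same number of observations")
    return [mean(task[i] for task in per_task) for i in range(n_obs)]


# Hypothetical data: two tasks on very different raw scales, three observations.
weekly = {
    "task_a": [1, 2, 3],
    "task_b": [10, 20, 30],
}
print([round(t, 2) for t in composite_t(weekly)])  # [37.75, 50.0, 62.25]
```

Because every task is rescaled to the same metric before averaging, a task with a large raw range (here `task_b`) does not dominate the composite.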

As concerns specific training gains (i.e., in the criterion tasks), **Table 1** shows mixed results: five studies obtained comparable benefits in young and older adults in tasks closely similar to those used in the training (Li et al., 2008; Richmond et al., 2011; von Bastian et al., 2013; Bürki et al., 2014; Zając-Lamparska and Trempała, 2016); three studies showed greater improvements in young than in older adults (Dahlin et al., 2008; Heinzel et al., 2014; Salminen et al., 2015); one study obtained mixed results, with age-related differences for some criterion tasks but not for others (Brehmer et al., 2012); two studies showed that older adults reached the young adults' baseline performance level on the WM criterion tasks immediately after the training, i.e., at the post-test assessment (Li et al., 2008; Salminen et al., 2015); and one study found that older adults exceeded the young participants' baseline performance in the criterion task.

Similarly, for near as well as far transfer effects (when found), studies reported either no differences between the two age groups, larger effects in young than in older adults, or mixed results (see **Table 1**). Among the studies that examined long-term effects, Brehmer et al. (2012) and Dahlin et al. (2008) found comparable maintenance of specific training gains in young and older adults. Brehmer et al. (2012) also found that both near and far transfer effects were maintained in both age groups. In partial contrast, Li et al. (2008) found larger long-term specific training gains for young adults than for older ones, while long-term near transfer effects were comparable between the two age groups (see **Table 1**).

Among the studies focusing only on older adults (see **Table 1**), those that found significant specific training gains and transfer effects, along with their maintenance, involved young-old participants (from 60 to 74 years old). Studies that included old-old participants (from 75 to 87 years old), and those considering a broad age range (i.e., from 60 to 82), reported mixed findings in terms of specific and transfer training gains in the short term (see **Table 1**). As for maintenance effects, some found limited transfer benefits (Borella et al., 2010, 2013, 2017; Zinke et al., 2013), and others found none (Buschkuehl et al., 2008). In one of these studies, older age also emerged as a negative predictor of training gains and of at least some transfer effects (Zinke et al., 2013); notably, the effect sizes for transfer effects in this case were medium to large for tasks assessing near effects, but only small for far transfer effects (see **Table 1**).

Taken together, the above studies seem to support a negative role of age in determining the benefits of WM training.


Another variable that may influence the efficacy of cognitive training is general cognitive ability, operationalized in some studies as crystallized intelligence, i.e., performance in a vocabulary test (Zinke et al., 2013). Vocabulary can be considered an index of general cognitive ability (e.g., Baltes, 1987), and thus a possible moderator of WM training benefits. However, the only WM training study that considered this variable found that it did not contribute to explaining WM training gains and transfer effects (Zinke et al., 2013).

Individual differences in cognitive resources, such as WM baseline performance, are another factor that may predict training outcomes (see Jaeggi et al., 2014 for evidence in young adults), but only two WM training studies focusing on older adults have considered this variable (aged 77 to 96: Zinke et al., 2012; aged 65 to 80 and over: Zinke et al., 2013). One study found a negative correlation between specific gains and participants' baseline WM performance, i.e., those with weaker baseline WM performance gained more in the trained tasks than those whose WM performance was better (Zinke et al., 2012). The other confirmed this association: the lower the baseline WM performance, the larger the specific gains in the trained tasks (Zinke et al., 2013). In another study, by Bürki et al. (2014), the pre-test score on a reasoning measure was considered instead, and the results indicated that the training effects were predicted not by this reasoning score, but by age group.

Overall, the pattern of results concerning the role of individual characteristics and individual differences in training-related performance gains and transfer effects is rather mixed. It is also worth noting that, despite the importance of analyzing individual factors when assessing the benefits of WM training, only three studies have so far addressed this issue directly in relation to aging (Zinke et al., 2012, 2013; Bürki et al., 2014).

This was the starting point for the present study, which aimed to examine the role of individual differences (or individual characteristics) by jointly considering different factors, in order to identify those capable of influencing short- and longer-term training-induced plasticity, measured in terms of both specific training gains and transfer effects. The factors considered as potential moderators of training efficacy (e.g., von Bastian and Oberauer, 2014) were demographic characteristics (i.e., age), baseline WM performance (Zinke et al., 2012), and general cognitive ability (i.e., crystallized intelligence measured with a vocabulary test; Zinke et al., 2013). The role of education was also examined, because education is considered an index of cognitive efficiency that can preserve cognitive functioning and is also used as a proxy of cognitive reserve (e.g., Stern, 2002; Staff et al., 2004), although no studies have examined whether it interacts with the trainability of WM.

We investigated the role of these variables by analyzing data from studies that adopted the same WM training procedure, developed by Borella et al. (2010). This is one of the few procedures to have been used across different studies, generating consistent and promising results in terms of short- and long-term benefits (Borella et al., 2010, 2017), including in tasks related to everyday abilities (Carretti et al., 2013b; Cantarella et al., 2017), in both normal and pathological aging (in healthy young-old and old-old adults: Borella et al., 2013, 2014; in amnestic Mild Cognitive Impairment: Carretti et al., 2013a). The effectiveness of the training has been attributed to the fact that participants practice with a complex WM span task, combining an adaptive procedure with a systematic variation of task demands, so that the task remains novel and challenging, keeping participants interested and motivated during the proposed activities. According to the authors, the training also engages numerous processes, including encoding, maintaining and inhibiting information, simultaneously managing two tasks, and sustaining and shifting attention. Together, these aspects are believed to promote learning and, in particular, to favor transfer effects (see Borella et al., 2010). To date, seven studies have adopted this procedure (see **Table 2** for a summary), and four were selected for the present analysis because: (i) the same verbal procedure was adopted; (ii) the same measures were used to assess training gains and transfer effects; (iii) a follow-up session was included; and (iv) a sample of healthy older adults was considered (see **Tables 2, 3**).

Specific training gains and transfer effects were categorized along a conceptually based continuum from nearest to far transfer tasks (e.g., Noack et al., 2009). The complex WM task (the Categorization Working Memory Span task, CWMS) was used to assess specific training gains because it is similar to the task administered during the training sessions. Another complex WM task measuring the same narrow ability (WM) and also involving active processes (see Cornoldi and Vecchi, 2003), but with a different type of material (visuospatial: the Dot Matrix task), was administered to assess what we describe here as nearest transfer effects. Measures of the same broad ability (memory), but with demands different from those of the complex WM tasks (the Forward and Backward Digit Span tests; see the meta-analysis by Bopp and Verhaeghen, 2005), were used to assess near transfer effects. Finally, tasks assessing fluid intelligence (the Cattell test), processing speed (the Pattern Comparison task), and inhibitory mechanisms (the Stroop Color task and intrusion errors in the CWMS), i.e., mechanisms that differ from WM but are known to correlate with WM and to help explain the age-related decline in WM (e.g., de Ribaupierre and Lecerf, 2006), were used to measure far transfer effects.

Linear mixed effects (LME) models were used to examine the role of individual characteristics (demographic variables) and individual differences in predicting improvements in the measures used to assess the effects of the training (in terms of training gains and transfer effects). These models afford a more robust analytical approach for addressing problems associated with hierarchical and correlated data than the traditional analyses generally conducted in training studies (e.g., ANOVA, t-test). In particular, LME models allow for a more flexible approach in dealing with individual changes over time when repeated measures are considered (e.g., Gueorguieva and Krystal, 2004; Wainwright et al., 2007; Baayen et al., 2008).
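LME models are fitted to data in long format, with one row per participant per assessment, so that the dependence among repeated measures of the same person can be modeled explicitly. The following sketch, in plain Python with hypothetical field names (`id`, `study`, `group`, and session labels `pre`, `post`, `follow_up`), shows only that reshaping step; it does not fit a model. One often-cited practical advantage is visible here: a participant with a missing session still contributes the sessions they completed, whereas a repeated-measures ANOVA typically drops that participant entirely.

```python
SESSIONS = ("pre", "post", "follow_up")  # hypothetical session labels


def to_long(participants, sessions=SESSIONS):
    """Reshape one-row-per-participant records into one row per measurement.

    Each output row keeps the participant id and the study it came from, so a
    mixed model can treat repeated scores as clustered within participants
    (and participants within studies) instead of as independent observations.
    """
    rows = []
    for p in participants:
        for session in sessions:
            score = p.get(session)
            if score is None:
                # A missing session drops only this row, not the participant.
                continue
            rows.append({
                "id": p["id"],
                "study": p["study"],
                "group": p["group"],
                "session": session,
                "score": score,
            })
    return rows


# Hypothetical records: participant 2 has no follow-up score.
wide = [
    {"id": 1, "study": "S1", "group": "trained", "pre": 8, "post": 12, "follow_up": 11},
    {"id": 2, "study": "S2", "group": "control", "pre": 9, "post": 9},
]
long_rows = to_long(wide)
print(len(long_rows))  # 5
```

In R's lme4, for instance, a random-intercept model on such rows could be written as `score ~ group * session + (1 | id)`; this formula is only an illustration and not necessarily the model specified by the authors.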

In general, we expected to confirm the beneficial effect of the WM training in terms of short- and long-term gains in the criterion task, and at least short-term transfer effects for all the measures considered. The advantage of analyzing all four studies sharing the same procedure together lies in estimating the strength of the effects (i.e., effect sizes) on a larger sample.
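The effect-size index is not specified in this passage; the sketch below assumes Cohen's d with a pooled standard deviation, one common choice for comparing trained and control groups, and uses hypothetical gain scores. Because the pooled SD in the denominator is estimated from both groups, a larger pooled sample yields a more precise estimate of d, which is the benefit of combining studies noted above.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference (a - b) using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance uses the n - 1 (sample) denominator.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical pre-to-post gains on the criterion task.
trained_gains = [2, 4, 6]
control_gains = [1, 2, 3]
print(round(cohens_d(trained_gains, control_gains), 3))  # 1.265
```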


∧Only the group that attended the Verbal WM training without the use of the imagery strategy was considered here.

Concerning the main objective of the study, we used LME models to analyze participants' individual characteristics and differences vis-à-vis the short- and long-term effects of their training. Analyzing these potential predictors enables us to test two proposed theoretical explanations for individual differences in training-related performance gains, i.e., a compensation or a magnification effect of process-based training on cognition in older adults (see, for example, Titz and Karbach, 2014; see also Lövdén et al., 2012). If there is a magnification effect, then individuals who already perform well will benefit the most from the WM training. In other words, high-performing participants may have more efficient cognitive resources and therefore be in a better position to learn and implement new abilities. The WM training should therefore magnify age-related (in older adults) and individual differences, and baseline cognitive performance should be positively associated with training-related gains and transfer effects. If there is a compensation effect, on the other hand, then high-performing individuals will benefit less from the training because they are already functioning at their optimal level and thus have less room for improvement. In this case, age-related and individual differences should be reduced after the training, and baseline cognitive performance should be negatively associated with training-induced gains.
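The two accounts differ in the sign they predict for the association between baseline performance and gains. The sketch below makes that contrast explicit with hypothetical scores, using a simple Pearson correlation between baseline and raw gain (post minus baseline) rather than the LME models used in the study; the function names are hypothetical. It is descriptive only: because the same baseline score enters both variables, measurement error alone tends to push this correlation toward negative values (regression to the mean), so a negative sign is not by itself evidence of compensation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def baseline_gain_pattern(baseline, post):
    """Return r(baseline, gain) and the account its sign is consistent with."""
    gains = [p - b for b, p in zip(baseline, post)]
    r = pearson_r(baseline, gains)
    if r > 0:
        label = "magnification"  # better baseline, larger gain
    elif r < 0:
        label = "compensation"  # weaker baseline, larger gain
    else:
        label = "neither"
    return r, label


# Hypothetical CWMS scores for four trained participants.
baseline = [10, 12, 14, 16]
post = [16, 16, 17, 17]
r, label = baseline_gain_pattern(baseline, post)
print(round(r, 3), label)  # -0.992 compensation
```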

The magnification and compensation accounts thus lead to opposite predictions. Among the older adults, the younger participants with a good cognitive status, as represented by a measure of crystallized intelligence (vocabulary), a good WM (revealed by their baseline CWMS performance), and a good education might profit more from an adaptive training on a complex aspect of cognition (WM), because a relatively high level of functioning is required to actively engage in and benefit from the activities proposed in the training (Bissig and Lustig, 2007; Lustig et al., 2009), which would, in turn, magnify their abilities. On the other hand, older participants with a worse cognitive status might benefit more from the WM training (Zinke et al., 2012, 2013), because it could counteract the suboptimal use of resources typical of aging by prompting a more flexible use of these resources, more reliant on controlled than on automatic processes, thereby re-activating the older participants' potential and having a compensatory effect.

It is also possible, as the results obtained by Zinke et al. (2013) suggest, that the factors thought to predict training-related gains depend on the measures considered, because these factors may act independently. In fact, the individual characteristics examined may explain training gains differently because the transfer tasks vary in several respects: not only in their relationship with WM, but also in the processes involved, such as the type of control (passive, as in the short-term memory tasks, vs. active, as in the reasoning task), or the degree of involvement of fluid abilities (stronger in reasoning and processing speed than in short-term memory tasks) and/or of abilities related to knowledge.

## METHOD

**Table 2** lists the characteristics and main results of the seven studies that used the WM training procedure developed by Borella et al. (2010). As mentioned in the Introduction, three studies (in the last rows of **Table 2**) were not considered because: one involved a sample of older adults with mild cognitive impairment (Carretti et al., 2013a); one used a visuo-spatial version of the training program (Borella et al., 2014); and one did not include a follow-up assessment (Cantarella et al., 2017).

The four studies considered eligible for the present analysis had in common: (i) the same verbal procedure; (ii) the same measures for assessing training gains and transfer effects (see **Table 3**); and (iii) a follow-up assessment.

#### Participants

All four studies considered here included a sample of healthy older adults (all native Italian speakers) recruited at the University of the Third Age, at social clubs in north-eastern Italy, or by word of mouth; all took part as volunteers.

They were told, either individually (Borella et al., 2010, 2017) or at a plenary session (Borella et al., 2013; Carretti et al., 2013b), that they would be involved in one of two different programs each consisting of five individual sessions, plus a final one at a later date (follow-up). They were also told that the activities proposed in one program would concern their cognitive functioning (i.e., practicing with memory tasks), while in the other one they would be asked to reflect on aspects of memory (e.g., autobiographical recall) and complete some questionnaires.

Depending on the study, participants had to meet the following inclusion criteria: (i) good physical and mental health, assessed by means of a questionnaire and a semi-structured interview, respectively (as in Borella et al., 2010; Carretti et al., 2013b); (ii) none of the exclusion criteria proposed by Crook et al. (1986) (as in Borella et al., 2010, 2013); (iii) a Mini-Mental State Examination score (Folstein et al., 1975) higher than 27 (as in Borella et al., 2013); (iv) a maximum score on the Italian Checklist for Multidimensional Assessments (SVAMA; Gallina et al., 2006), i.e., no signs of incipient dementia (as in Borella et al., 2017). In all four studies, participants were randomly assigned to either the trained group or the active control group.

Overall, 148 participants were involved in the four studies considered, with 73 forming the trained groups and 75 the active control groups. The pooled trained and control groups were comparable in terms of age (age range: 61–87; trained group: M = 71.63, SD = 5.53; control group: M = 71.61, SD = 5.67), F(1, 146) < 1; years of formal education (from 8 to 24 years; trained group: M = 9.42, SD = 4.54; control group: M = 9.97, SD = 4.72), F(1, 146) < 1; and vocabulary score on the Wechsler Adult Intelligence Scale–Revised (WAIS–R; Wechsler, 1981; max 70; trained group: M = 49.21, SD = 10.89; control group: M = 47.04, SD = 11.87), F(1, 146) = 1.33, p = 0.25.
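The group comparisons above are one-way ANOVAs with two groups, so they can be checked from the reported means, SDs, and group sizes. The sketch below recomputes F from summary statistics (the function name is hypothetical); with the rounded vocabulary values it yields F(1, 146) ≈ 1.34, consistent with the reported 1.33 once rounding of the inputs is taken into account. The p-value is not computed, since that would require the F distribution.

```python
def f_two_groups(m1, sd1, n1, m2, sd2, n2):
    """One-way ANOVA F for two independent groups from summary statistics.

    Returns (F, df_between, df_within). With two groups, F equals the square
    of the pooled-variance t statistic.
    """
    grand_mean = (n1 * m1 + n2 * m2) / (n1 + n2)
    ss_between = n1 * (m1 - grand_mean) ** 2 + n2 * (m2 - grand_mean) ** 2
    ss_within = (n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2
    df_between, df_within = 1, n1 + n2 - 2
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Vocabulary scores reported above: trained (n = 73) vs. control (n = 75).
f, df1, df2 = f_two_groups(49.21, 10.89, 73, 47.04, 11.87, 75)
print(f"F({df1}, {df2}) = {f:.2f}")  # F(1, 146) = 1.34
```

Applied to the vocabulary comparison for the Cattell subset (M = 45.11, SD = 8.89, n = 55 vs. M = 43.16, SD = 10.25, n = 57), the same function gives F(1, 110) ≈ 1.15, matching the reported value.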

Because not all the outcome measures used to assess transfer effects were administered in all four studies, the pooled trained and control groups were also compared on demographic characteristics and vocabulary score within each subset of studies sharing a measure. These pooled trained and control groups did not differ statistically in terms of age, years of formal education, or vocabulary score<sup>1</sup>.

<sup>1</sup>As for the Forward and the Backward Digit Span tasks and the Pattern Comparison task, these measures were used in three

#### Materials

#### Criterion Task

#### **Categorization Working Memory Span (CWMS) task (De Beni et al., 2008)**

The task consisted of 10 sets of word lists, each including 20 lists of words (divided into groups containing from 2 to 6 lists). Participants listened to a set of word lists audio-recorded at a rate of 1 word per second and they had to tap with their hand on the table whenever an animal noun was heard (processing phase). The interval between word lists was 2 s. At the end of a set, participants recalled the last word on each list (maintenance phase)—i.e., they needed to remember from 2 to 6 words altogether, depending on the difficulty of the set.

The total number of words recalled was used as the measure of WM performance (maximum 20).
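The scoring rule above (one point per list-final word recalled, summed across sets) can be sketched as follows. The data layout, function names, and example words are hypothetical; the sketch assumes recall order is not scored, since the description does not say otherwise, and counts each word once. It also tallies recalled words that were not list-final, as a rough stand-in for the CWMS intrusion errors mentioned in the Introduction, whose exact definition is not given in this section.

```python
def score_cwms_set(word_lists, recalled):
    """Score one CWMS set.

    word_lists: the lists heard in this set (each a list of words, in order).
    recalled:   the words the participant reported at the end of the set.

    Returns (correct, other): `correct` counts distinct list-final words
    recalled; `other` counts distinct recalled words that were not list-final
    (a hypothetical proxy for intrusions).
    """
    targets = {words[-1] for words in word_lists}
    reported = set(recalled)
    correct = len(reported & targets)
    other = len(reported - targets)
    return correct, other


def score_cwms(sets):
    """Total list-final words recalled across all sets (the WM score)."""
    return sum(score_cwms_set(lists, recalled)[0] for lists, recalled in sets)


# Hypothetical two-list set: the targets are "table" and "moon".
example_set = (
    [["dog", "sun", "table"], ["chair", "cat", "moon"]],
    ["moon", "cat"],
)
print(score_cwms_set(*example_set))  # (1, 1)
print(score_cwms([example_set]))  # 1
```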

#### Nearest Transfer Effects

#### Visuo-Spatial WM Task

#### **Dot Matrix task (adapted from Miyake et al., 2001)**

In this task participants had to check a matrix equation consisting of an addition or a subtraction presented as lines drawn on a 3 × 3 matrix, and to memorize sequences of dots presented on a 5 × 5 grid. They were given a maximum of 4.5 s to check each equation and say "True" or "False." Immediately after they gave their answer, they were shown a 5 × 5 grid containing a dot in one of the squares for 3 s. After seeing sets of two to six pairs of equations and grids, they had to indicate the positions of the dots on a blank 5 × 5 grid. There was one practice trial with two equations, each with one dot. The number of dot locations to recall increased from two to six. A total of 28 equations and 28 matrices were presented. The total number of dot positions correctly recalled was considered as the dependent variable (maximum score 14).

#### Near Transfer Effects

#### Short-Term Memory Tasks

#### **Forward and Backward Digit Span tasks (De Beni et al., 2008)**

Participants had to repeat series of digits in the same (forward) or reverse (backward) order. Each level (from 3 to 9 digits for the forward task, from 2 to 8 digits for the backward task) contained two series of digits. After two consecutive recall errors, the task was discontinued. One point was awarded for each correctly recalled series. The final score corresponded to the total number of series recalled correctly (maximum score of 14 for both tasks).
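The scoring and discontinuation rules for the digit span tasks can be sketched as a short loop. The function name and input format are hypothetical, and the sketch reads "two consecutive recall errors" literally, as two failed series in a row even when they fall at different levels; if the rule instead meant failing both series of the same level, the condition would need to track level boundaries.

```python
def digit_span_score(outcomes, max_series=14):
    """Score a digit span test from per-series outcomes in administration order.

    outcomes: booleans, True if the series was recalled correctly.
    One point per correct series; testing stops after two consecutive errors,
    read literally here as two failed series in a row, even across levels.
    """
    score = 0
    consecutive_errors = 0
    for correct in outcomes[:max_series]:
        if correct:
            score += 1
            consecutive_errors = 0
        else:
            consecutive_errors += 1
            if consecutive_errors == 2:
                break  # discontinue rule
    return score


# Hypothetical forward-span protocol: two series per level from 3 digits up.
protocol = [True, True, True, False, True, False, False, True]
print(digit_span_score(protocol))  # 4
```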

#### Far Transfer Effects

#### Fluid Intelligence

#### **Culture Fair test (Cattell test; Cattell and Cattell, 1963)**

This task consisted of 4 subtests (to be completed in 2.5–4 min., depending on the subtest) in which participants were asked to: (1) choose from among six different options which ones correctly completed a series of figures; (2) identify figures or shapes that did not belong in a sequence; (3) choose items that correctly completed matrices of abstract figures; (4) assess relationships between sets of items. The dependent variable was the number of correct answers across the four subtests (maximum score 50).

#### Processing Speed

#### **Pattern Comparison task (adapted from Salthouse and Babcock, 1991)**

In this task, participants had to decide whether arrangements of line segments were identical or not. The items to be compared were set out on two pages, each containing 30 items. Participants responded by writing S (for Sì, "yes", for identical items) or N (for No, for different items) on the line between the two items in each pair. The experimenter used a stopwatch to record the time taken to complete each page. Three practice trials were run before the task started. The dependent variable was the total time taken to complete the task.

[…] (Borella et al., 2010, 2013, 2017) of the four studies considered. The pooled trained (n = 56) and control (n = 56) groups from these three studies did not differ in terms of age (trained group: M = 72.43, SD = 5.81; control group: M = 72.04, SD = 6.16), F(1, 110) < 1, years of formal education (trained group: M = 9.482, SD = 4.72; control group: M = 10.30, SD = 4.65), F(1, 110) < 1, or vocabulary score on the Wechsler Adult Intelligence Scale–Revised (WAIS–R; Wechsler, 1981; trained group: M = 50.23, SD = 11.27; control group: M = 49.07, SD = 11.45), F(1, 110) < 1.

As for the Cattell test, this was used in three (Borella et al., 2010, 2013; Carretti et al., 2013b) of the four studies considered. The pooled trained (n = 55) and control (n = 57) groups from these three studies did not differ in terms of age (trained group: M = 72.35, SD = 5.86; control group: M = 72.72, SD = 5.47), F(1, 110) < 1, years of formal education (trained group: M = 7.80, SD = 3.73; control group: M = 8.49, SD = 4.07), F(1, 110) < 1, or vocabulary score on the Wechsler Adult Intelligence Scale–Revised (WAIS–R; Wechsler, 1981; trained group: M = 45.11, SD = 8.89; control group: M = 43.16, SD = 10.25), F(1, 110) = 1.15, p = 0.28.

The Dot Matrix task and the Stroop Color task were used in two (Borella et al., 2010, 2013) of the four studies considered. The pooled trained (n = 38) and control (n = 38) groups from these two studies did not differ in terms of age (trained group: M = 73.84, SD = 6.12; control group: M = 73.89, SD = 5.85), F(1, 74) < 1, years of formal education (trained group: M = 7.66, SD = 3.79; control group: M = 8.24, SD = 3.62), F(1, 74) < 1, or vocabulary score on the Wechsler Adult Intelligence Scale–Revised (WAIS–R; Wechsler, 1981; trained group: M = 44.79, SD = 8.93; control group: M = 44.21, SD = 9.65), F(1, 74) < 1.

#### Inhibition

#### **Stroop Color task (adapted from Trenerry et al., 1989)**

In this task participants were shown six cards. The first two contained names of colors printed in an incongruent ink color (Incongruent condition); the third and fourth contained names of colors printed in a congruent ink color (Congruent condition); and the last two contained color patches (Control condition). Participants had to name the ink color of each stimulus as quickly and accurately as possible. The experimenter recorded response latencies for all conditions with a stopwatch, timing the interval between naming the first and the last stimulus (as is typically done in studies using the paper version; e.g., West and Alain, 2000; Van der Elst et al., 2006; Troyer et al., 2006), and noted the respondents' accuracy by hand on a prepared form. To control for individual differences at baseline (e.g., Borella et al., 2009), the dependent variable was an interference index computed as the relative difference between the time taken to complete the task in the incongruent and control conditions, that is [(incongruent condition − control condition)/control condition]. A higher score thus implied a greater difficulty in controlling the prepotent response in the incongruent condition.
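As a worked illustration, the sketch below computes the interference index from completion times. It is a minimal sketch under stated assumptions. The function name is hypothetical. Summing the two card times within each condition is also an assumption, because the text defines the index only in terms of the time taken per condition.

```python
from typing import Sequence


def stroop_interference(incongruent_times: Sequence[float],
                        control_times: Sequence[float]) -> float:
    """Relative interference: (incongruent - control) / control.

    Each argument holds completion times in seconds for the cards of one
    condition. Card times are summed per condition (an assumption).
    """
    incongruent = sum(incongruent_times)
    control = sum(control_times)
    if control <= 0:
        raise ValueError("control-condition time must be positive")
    return (incongruent - control) / control


# 60 s in total for the incongruent cards vs. 40 s for the color patches:
print(stroop_interference([31.0, 29.0], [21.0, 19.0]))  # 0.5
```

Because the index is relative to the control condition, a participant who is slow overall is not penalized for general slowness, which is the baseline adjustment the text describes.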

#### **Intrusion errors in the CWMS (CWMS intrusion errors; De Beni et al., 2008)**

The total number of intrusion errors made in the CWMS (i.e., recalled words that were not actually the last word of a string) was also considered as a measure of inhibition, representing a participant's ability to inhibit no-longer-relevant information (Borella et al., 2007).

For each task, two parallel versions were devised and administered in a counterbalanced order across the testing sessions.

# Procedure

Participants attended six individual sessions: the first and fifth were the pre-test and post-test sessions, and the sixth was the follow-up (held 6–8 months later). During the other three sessions, the trained participants attended the training program, while the active controls were involved in alternative activities. For both groups, all activities were completed within a 2-week time frame, with a fixed 2-day break between sessions. The duration of the sessions and the amount of interaction with the experimenter were much the same for the two groups.

The three WM training sessions (sessions 2, 3, and 4) lasted about 30–40 min. Participants were presented with audio-recorded lists of words organized in the same way as for the CWMS task, and were asked to recall target words and to tap their hand on the table when they heard an animal noun. The maintenance demand of the CWMS task was manipulated by increasing the number of words that successful participants were asked to recall, and by presenting the lowest memory load to participants who were unsuccessful (session 2). The demands of the task also varied and, depending on the session, could involve having to recall (i) the last or first word in each list, or (ii) words preceded by a beep sound. The processing demand (tapping on the table when an animal noun occurred) was also manipulated by varying the frequency of animal words in the lists (session 3). Participants in the active control group were asked to complete questionnaires on memory (session 1: Autobiographical Memory questionnaire; session 2: Memory Sensitivity questionnaire; De Beni et al., 2008) and on psychological well-being (De Beni et al., 2008; see Borella et al., 2010 for more details of the training program and of the active control group's activities).

The procedure was completed in accordance with the Declaration of Helsinki (2008).

# RESULTS

## Linear Mixed Models

Linear mixed effects (LME) models were used to analyze the data because, unlike more classical and frequently used methods, they enable estimates to be adjusted for repeated sampling (when more than one observation arises from the same individual) and for sampling imbalance (when some individuals are sampled more than others), and they allow for variation among individuals within the data (McElreath, 2016). Adopting a Bayesian approach to parameter estimation enabled us to exploit all the advantages of LME modeling, to focus directly on the probability of an effect given the observed data (posterior probability), and to quantify the evidence for our results.

#### Analytical Plan

For each of the eight measures of interest (i.e., the total number of words recalled in the CWMS, the total number of dots recalled in the Dot Matrix task, the total number of series correctly recalled in the Forward and in the Backward Digit Span tasks, the total number of correctly answered items in the Cattell test, the total time taken to complete the Pattern Comparison task, the interference index in the Stroop Color task, and the total number of intrusion errors in the CWMS), we tested several mixed effects models including all combinations of the predictors, i.e., group (trained vs. control), age, education, vocabulary, and baseline performance in the verbal WM task (the CWMS), with subjects as random effects. More precisely, we started from the null model (i.e., the model with no predictors, considering only the longitudinal effect of session: pre-test, post-test, follow-up) and subsequently introduced all the predictors and their interactions with session.
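The sketch below illustrates "all combinations of predictors" by enumerating formula strings in lme4/brms-style syntax. It starts with the session-only null model. For each non-empty subset of predictors, it then adds a model in which that subset interacts with session. The column names are hypothetical, and the random intercept per subject is an assumed way of entering subjects as random effects. This simplified scheme yields 32 candidates for five predictors. It does not attempt to reproduce the model counts reported below, which depend on specification details not given here.

```python
from itertools import combinations
from typing import List, Sequence

# Hypothetical column names for the predictors listed in the text.
PREDICTORS = ["group", "age", "educ", "vocab", "cwms_baseline"]


def candidate_formulas(outcome: str, predictors: Sequence[str]) -> List[str]:
    """Null model plus one model per predictor subset, each crossed with session.

    Formulas use lme4/brms-style syntax with a random intercept per subject.
    """
    random_part = "(1 | subject)"
    formulas = [f"{outcome} ~ session + {random_part}"]  # null model: session only
    for k in range(1, len(predictors) + 1):
        for subset in combinations(predictors, k):
            fixed = " * ".join(("session",) + subset)  # full interaction with session
            formulas.append(f"{outcome} ~ {fixed} + {random_part}")
    return formulas


models = candidate_formulas("dot_matrix", PREDICTORS)
print(len(models))  # 32
print(models[-1])
# dot_matrix ~ session * group * age * educ * vocab * cwms_baseline + (1 | subject)
```

For the CWMS outcome, the baseline predictor would be dropped, as footnote 2 explains (e.g., `candidate_formulas("cwms", PREDICTORS[:-1])`, which gives 16 formulas under this scheme).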

For data analysis, we proceeded as follows:


For the regression parameters (β) we used normal priors (M = 0, SD = 10), and for the standard deviation parameters we used half-Student t priors (df = 3, M = 0, SD = 10). Convergence was assessed by examining the potential scale reduction factor (PSRF; Gelman and Rubin, 1992).
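The PSRF compares the between-chain and within-chain variance of the posterior draws. Values close to 1 suggest that the chains have mixed. The following is a minimal NumPy sketch of the original Gelman and Rubin (1992) statistic for a single parameter. It uses simulated chains rather than the study's posteriors. Modern samplers typically report refined variants (e.g., split-chain R-hat), so their values may differ slightly from this one.

```python
import numpy as np


def psrf(chains: np.ndarray) -> float:
    """Gelman-Rubin potential scale reduction factor for one parameter.

    `chains` has shape (m, n): m chains with n post-warm-up draws each.
    """
    chains = np.asarray(chains, dtype=float)
    _, n = chains.shape
    chain_means = chains.mean(axis=1)
    # Between-chain variance B and mean within-chain variance W.
    b = n * chain_means.var(ddof=1)
    w = chains.var(axis=1, ddof=1).mean()
    # Pooled estimate of the posterior variance.
    var_hat = (n - 1) / n * w + b / n
    return float(np.sqrt(var_hat / w))


rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 2000))                        # four chains, same target
stuck = mixed + np.array([0.0, 0.0, 3.0, 3.0])[:, None]   # two chains sit elsewhere
print(psrf(mixed))        # close to 1
print(psrf(stuck) > 1.1)  # True: chains have not converged
```

A common working rule treats values above about 1.1 as a sign of non-convergence. The text does not state which threshold was used here.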



#### Data Inspection

**Table 4** contains the descriptive statistics for each of the measures of interest by group (trained and control), and by assessment session (pre-test, post-test, and follow-up).

#### Model Fitting and Parameter Estimation

In all, 1,026 models, 58 for the CWMS, and 121 for each of the other measures of interest, were fitted. The fit indices of the 5 best models for each measure of interest are given in **Table 5**.

#### Model Comparison and Best Model Analysis

For each outcome measure we compared the fit indices of the 5 best models. Then, focusing on the best one (the one with the lowest WAIC), we proceeded with a graphical inspection of the Group (trained vs. control) X Session (pre-test vs. post-test vs. follow-up) interaction to assess the effectiveness of the training (see **Figure 1**).
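The model probabilities reported below (the Akaike weights in Table 5) can be derived from WAIC differences. The sketch below assumes the standard Akaike-weight formula, w_i = exp(−ΔWAIC_i/2) / Σ_j exp(−ΔWAIC_j/2), and applies it to hypothetical WAIC values.

```python
import math
from typing import List, Sequence


def akaike_weights(waic: Sequence[float]) -> List[float]:
    """Convert information-criterion values (lower = better) into weights summing to 1."""
    best = min(waic)
    relative = [math.exp(-0.5 * (value - best)) for value in waic]  # exp(-dWAIC/2)
    total = sum(relative)
    return [r / total for r in relative]


weights = akaike_weights([100.0, 102.0, 110.0])  # hypothetical WAIC values
print([round(w, 3) for w in weights])  # [0.727, 0.268, 0.005]
```

Because the weights are normalized, the ratio of two weights equals exp(ΔWAIC/2). It can be read as how many times better supported one model is than another. For instance, the Dot Matrix probabilities of 84.7% and 5.1% reported below give 0.847 / 0.051 ≈ 16.6, which is the "about 17 times" in the text.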

To gain a better understanding of the extent of the training benefits for the trained group, Cohen's d was computed on the differences between the two groups (trained and control) at pre-test, post-test, and follow-up (see **Table 6**). In addition, to ascertain the magnitude of the immediate (pre- vs. post-test) and long-term (pre-test vs. follow-up) gains obtained by the trained group, a net Cohen's d was computed using the following formula: {(Post-test or follow-up for the trained group − Pre-test for the trained group) − (Post-test or follow-up for the controls − Pre-test for the controls)}/(Pooled SD of the difference; see Weisz and Hawley, 2001). This enabled us to adjust the gains shown by the trained group for the gains obtained by the active control group (see **Table 6**).
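The two effect-size formulas can be sketched in Python as follows. Two points are assumptions: that the "pooled SD of the difference" is the pooled standard deviation of the participants' change scores in the two groups, and that earlier and later scores are paired by participant. The function names and the toy data are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence


def pooled_sd(a: Sequence[float], b: Sequence[float]) -> float:
    """Pooled standard deviation of two independent samples."""
    na, nb = len(a), len(b)
    return math.sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))


def cohens_d(trained: Sequence[float], control: Sequence[float]) -> float:
    """Between-group d at a single session (pre-test, post-test, or follow-up)."""
    return (mean(trained) - mean(control)) / pooled_sd(trained, control)


def net_effect_size(trained_pre: Sequence[float], trained_later: Sequence[float],
                    control_pre: Sequence[float], control_later: Sequence[float]) -> float:
    """{(T_later - T_pre) - (C_later - C_pre)} / pooled SD of the change scores.

    `*_later` is post-test (short-term gain) or follow-up (long-term gain).
    """
    trained_change = [later - pre for pre, later in zip(trained_pre, trained_later)]
    control_change = [later - pre for pre, later in zip(control_pre, control_later)]
    gain_difference = mean(trained_change) - mean(control_change)
    return gain_difference / pooled_sd(trained_change, control_change)


# Toy data: three trained and three control participants.
d_net = net_effect_size([10, 12, 14], [13, 15, 18], [10, 12, 14], [10, 13, 14])
print(round(d_net, 2))  # 5.2
```

Subtracting the control group's change means that gains from retesting alone, which both groups share, do not inflate the trained group's effect size.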

Then, for the trained group, we conducted a graphical inspection (separately for each session) of the values fitted for the significant effects of the best model (see **Figure 2**), supported by the evidence ratio for the hypotheses involving the β coefficients considered (see **Table 7**).

The evidence ratio represents the evidence for the target hypothesis (e.g., β > 0) relative to the opposite hypothesis (e.g., β < 0). If the evidence ratio equals 1, the two hypotheses are equally plausible. An evidence ratio larger than 1 indicates that the target hypothesis is more plausible than the opposite one, and an evidence ratio smaller than 1 means that the opposite hypothesis is more plausible than the target one. In the present study, the evidence ratio was used to assess the differences between the pre- and post-test slopes, and between the post-test and follow-up slopes.
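The evidence ratio can be estimated directly from posterior draws. It is the proportion of draws supporting the target hypothesis divided by the proportion supporting the opposite one. The sketch below assumes an array of posterior draws for one coefficient or one slope difference. The function name and the simulated draws are hypothetical.

```python
import numpy as np


def evidence_ratio(draws, threshold: float = 0.0) -> float:
    """Posterior evidence for beta > threshold against beta < threshold.

    Estimated as P(beta > t | data) / P(beta < t | data), each probability
    taken as a proportion of the posterior draws.
    """
    draws = np.asarray(draws, dtype=float)
    p_above = np.mean(draws > threshold)
    p_below = np.mean(draws < threshold)
    if p_below == 0:
        return float("inf")  # every draw supports the target hypothesis
    return float(p_above / p_below)


# Hypothetical: 3,995 of 4,000 draws above zero and 5 below.
draws = np.concatenate([np.full(3995, 0.4), np.full(5, -0.1)])
print(evidence_ratio(draws))  # approximately 799
```

Very large ratios arise when nearly all draws fall on one side of the threshold. If no draw falls below the threshold, the ratio is unbounded, and the sketch reports it as infinity.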

# Criterion Task

#### **CWMS**

For the CWMS we considered a total of 58 models.<sup>2</sup> The best model was the one with the Session X Group X Vocabulary interaction, with a probability of 59.1% (see **Table 5**).

**Figure 1A** shows the Group (trained vs. control) X Session (pre-test vs. post-test vs. follow-up) interaction from the best model: the trained group performed better at post-test than at pre-test, and maintained its better performance from post-test to follow-up. The effect sizes for group differences were large at both post-test and follow-up (see **Table 6**). No differences across sessions were found for the active control group. The trained group outperformed the control group at post-test and follow-up (see **Figure 1A**). The net effect size index for the trained group, adjusted for the control group's performance, was large for both immediate gains (pre- vs. post-test) and long-term gains (see **Table 6**).

#### TABLE 4 | Descriptive statistics for the outcome measures by group (trained vs. controls) and by assessment session (pre-test, post-test, follow-up).

<sup>2</sup>For the CWMS, baseline performance in the CWMS was not entered as a predictor because of multicollinearity problems.

#### TABLE 5 | Fit indices of the five best models for each variable of interest.

CWMS, Categorization Working Memory Span Task; ss, session; vocab, vocabulary; educ, education; CWMS baseline, baseline performance level in the Categorization Working Memory Span Task; CWMS intrusions, intrusion errors in the CWMS; WAIC, Widely Applicable Information Criterion (lower values indicate better fit; Watanabe, 2010); δWAIC, difference in WAIC between the best model and each of the others; weight, Akaike weight, i.e., an estimate of the probability that the model will make the best prediction on new data, conditional on the set of models considered (Burnham et al., 2011; McElreath, 2016).

FIGURE 1 | Group (trained vs. control) X Session (pre-test vs. post-test vs. follow-up) interaction from the best model for each measure of interest. Categorization Working Memory Span task (A), Dot Matrix task (B), Forward (C) and Backward (D) digit span tasks, Cattell test (E), Pattern Comparison task (F), Stroop Color task (G) and intrusion errors in the Categorization Working Memory Span task (H). CWMS, Categorization Working Memory Span Task; CWMS intrusions, intrusion errors in the Categorization Working Memory Span Task; RTs, Response Times. Segments represent the 95% credibility intervals.

#### TABLE 6 | Effect sizes.

Cohen's d of the difference between the trained and control groups at pre-test, post-test, and follow-up, for each measure of interest (first three columns); and net effect sizes computed immediately after the training (pre-test vs. post-test; short-term gains) and 6–8 months later (pre-test vs. follow-up; long-term gains) for all measures of interest (last two columns). CWMS, Categorization Working Memory Span Task; CWMS intrusions, intrusion errors in the CWMS; RTs, Response Times. \*For short-term gains, the net effect size was computed with the formula {(Post-test for the trained group − Pre-test for the trained group) − (Post-test for the controls − Pre-test for the controls)}/(Pooled SD of the difference). \*\*For long-term gains, the net effect size was computed with the formula {(Follow-up for the trained group − Pre-test for the trained group) − (Follow-up for the controls − Pre-test for the controls)}/(Pooled SD of the difference).

**Figure 2A** shows the fitted values from the best model for the trained group alone, as a function of vocabulary score at the three assessment sessions, with the corresponding estimated linear trend. The regression slope decreased from session 0 (pre-test) to session 1 (post-test), with an evidence ratio of 89.91 (see **Table 7**), and became flat at follow-up, as shown by the evidence ratio (see **Table 7**). These results indicate that participants with low vocabulary scores were the ones who improved their performance in the criterion WM task from pre-test to post-test, and they maintained this gain at follow-up.

#### Nearest Transfer Effects

#### Visuo-Spatial Working Memory

#### **Dot Matrix task**

For the Dot Matrix task we considered a total of 121 models. The best model was given by the Session X Age X Group X Vocabulary interaction, with a probability of 84.7%. This model was about 17 times better supported than the next one, which achieved a probability of around 5.1% (see **Table 5**).

As for the Group (trained vs. control) X Session (pre-test vs. post-test vs. follow-up) interaction, the two groups did not differ at pre-test. The trained group performed better at post-test than at pre-test, but this gain was not maintained at follow-up, when performance was not as good as at post-test (see **Figure 1B**). Effect sizes for group differences were large at post-test and became small at follow-up (see **Table 6**). No differences were found for the control group. The trained group only outperformed the control group at post-test (see **Figure 1B**). The net effect size for the trained group, adjusted for the control group's performance, was large for the immediate gains (pre- vs. post-test), but became small for the long-term gains (see **Table 6**).

**Figure 2B** shows the values fitted from the best model for the trained group alone, as a function of age and of vocabulary score at the three assessment sessions, with the corresponding estimated linear trend. For age, the regression slope suggests a change from session 0 (pre-test) to session 1 (post-test), with an evidence ratio of 32.06 (see **Table 7**), and a slight deterioration from post-test to follow-up (evidence ratio of 7.05; see **Table 7**). It was the younger participants whose performance improved from pre-test to post-test, and then dropped back at follow-up to much the same level as their pre-test performance.

As for vocabulary, the regression slope rose from session 0 (pre-test) to session 1 (post-test), with an evidence ratio of 5.02 (see **Table 7**), and clearly dropped again, becoming flat at follow-up (see the evidence ratio in **Table 7**). It was the participants with high vocabulary scores whose performance improved in terms of the number of dots correctly recalled from pre-test to post-test, but not at follow-up, when their performance clearly deteriorated.

#### Near Transfer Effects

#### Short-Term Memory

#### **Forward Digit Span task**

A total of 121 models, one of which did not converge, were considered for the Forward Digit Span task. The best model was represented by the Session X Education X Group X Vocabulary interaction, with a probability of 27.3%. This model was not much better supported than the next two, whose probabilities were around 23.5 and 17.7%, respectively (see **Table 5**).

As for the Group (trained vs. control) X Session (pre-test vs. post-test vs. follow-up) interaction, the two groups did not differ at pre-test. The trained group performed better at post-test than at pre-test, but this gain was not maintained at follow-up, when performance was worse than at post-test (see **Figure 1C**). The effect sizes for group differences were large at post-test and became small at follow-up (see **Table 6**). No differences were seen for the control group. The trained group only outperformed the control group at post-test (see **Figure 1C**). The net effect size for the trained group, adjusted for the control group's performance, was large for immediate gains (pre- vs. post-test), but became small for long-term gains (see **Table 6**).

#### TABLE 7 | Evidence ratio of the differences between pre- and post-test slopes, and between post-test and follow-up slopes, for each variable of interest, by fit index.

CWMS, Categorization Working Memory Span Task; RTs, Response Times; CWMS intrusions, intrusion errors in the CWMS.

**Figure 2C** shows the values fitted from the best model for the trained group alone as a function of education, vocabulary, and pre-test performance in the WM criterion task, at the three assessment sessions, with the corresponding estimated linear trend. A minimal change from pre-test to post-test emerged for all the variables: the evidence ratio between session 0 (pre-test) and session 1 (post-test) was 7.39 for education, 199 for vocabulary, and 0.63 for pre-test performance in the WM criterion task (see **Table 7**). At follow-up, there was a change from post-test (see the evidence ratio), with performance dropping back to pre-test levels. In particular, it was the participants who had a limited education, low vocabulary scores, and a poor pre-test performance in the WM criterion task who experienced a slight improvement in their performance, but only at post-test.

#### **Backward digit span task**

For the Backward Digit Span task we considered a total of 121 models, one of which did not converge. The best model emerged for the Session X Age X Education X Group interaction, with a probability of 48.3%. This model was about two times better supported than the next one, whose probability was around 29% (see **Table 5**).

For the Group (trained vs. control) X Session (pre-test vs. post-test vs. follow-up) interaction, the two groups did not differ at pre-test. The trained group performed better at post-test than at pre-test, and then its performance deteriorated from post-test to follow-up (see **Figure 1D**). The effect sizes for group differences were large at post-test and became small at follow-up (see **Table 6**). No differences were identified in the control group. The trained group only outperformed the control group at post-test (see **Figure 1D**). The net effect size for the trained group, adjusted for the control group's performance, was large for immediate gains (pre- vs. post-test), but became small for long-term gains (see **Table 6**).

**Figure 2D** shows the values fitted from the best model for the trained group alone as a function of education, and of age at the three assessment sessions, with the corresponding estimated linear trend. For education, there was no change in the slope. For age, the younger the participants, the greater the improvement in performance from pre-test to post-test, with an evidence ratio of 77.43 (see **Table 7**). From post-test to follow-up, there was a decline in the slope, with an evidence ratio of 665.67 (see **Table 7**), meaning that performance returned to the levels seen at pre-test.

#### Far Transfer Effects

#### Fluid Intelligence

#### **Cattell test**

For the Cattell test we considered a total of 121 models. The best model was obtained with the Session X Age X CWMS baseline X Group interaction, with a probability of 43.1%. This model was about three times better supported than the next, whose probability was around 16.6% (see **Table 5**).

For the Group (trained vs. control) X Session (pre-test vs. post-test vs. follow-up) interaction, the two groups did not differ at pre-test. The trained group performed better at post-test than at pre-test, and then its performance declined from post-test to follow-up (see **Figure 1E**). The effect sizes for group differences were medium at post-test and became small at follow-up (see **Table 6**). No differences came to light for the control group. The trained group outperformed the control group at both post-test and follow-up (see **Figure 1E**). The net effect size for the trained group, adjusted for the control group's performance, was medium for both immediate gains (pre- vs. post-test) and long-term gains (see **Table 6**).

**Figure 2E** shows the values fitted from the best model for the trained group alone as a function of age, and of pre-test performance in the WM criterion task at the three assessment sessions, with the corresponding estimated linear trend. For age, it was the younger participants whose performance was better at post-test than at pre-test, with an evidence ratio of 443.44 (see **Table 7**), and they also maintained their better level of performance at follow-up, with an evidence ratio of 41.11 (see **Table 7**).

There were no differences in the slopes for pre-test performance in the WM criterion task, as confirmed by the evidence ratio. It is worth noting that variability in the Cattell test was slightly higher at post-test for participants with low pre-test scores in the WM criterion task.

#### Processing Speed

#### **Pattern Comparison task**

We considered a total of 121 models for the Pattern Comparison task. The best model was the one with the Session X CWMS baseline X Group X Vocabulary interaction, reaching a probability of 38.5%. This model was about two times better supported than the next, which reached a probability of around 20.5% (see **Table 5**).

For the Group (trained vs. control) X Session (pre-test vs. post-test vs. follow-up) interaction, the two groups did not differ at pre-test. The trained group performed better (taking less time to complete the task) at post-test than at pre-test, and maintained this improvement from post-test to follow-up (see **Figure 1F**). The effect sizes for group differences were large at post-test and became medium at follow-up (see **Table 6**). No differences were found for the control group. The trained group outperformed the control group at post-test and, to a certain extent, at follow-up too (see **Figure 1F**). The net effect size for the trained group, adjusted for the control group's performance, was medium for both immediate gains (pre- vs. post-test) and long-term gains (see **Table 6**).

**Figure 2F** shows the values fitted from the best model for the trained group alone, as a function of vocabulary and of pre-test performance in the WM criterion task at the three assessment sessions, with the corresponding estimated linear trend. There was a very weak effect for vocabulary (evidence ratio of 13.13; see **Table 7**): it was the participants with a higher pre-test vocabulary score who improved in the processing speed measure from pre- to post-test. No differences were found in the slope when pre-test performance in the WM criterion task was considered, and high individual variability also emerged (see **Figure 2F**).

#### Inhibition

#### **Stroop color task**

We considered a total of 121 models for the Stroop Color index on response times (RTs). The best model coincided with the Session X Age X Group interaction, with a rather low probability of 22%, though this model was about four times better supported than the next one, which reached a probability of around 6.3% (see **Table 5**). As shown in **Figure 1G**, no Group (trained vs. control) X Session (pre-test vs. post-test vs. follow-up) interaction was found. This is in line with the null effect sizes for the differences between the groups, both immediately after the training and at follow-up (see **Table 6**), and with the effect sizes computed for the immediate and long-term training gains obtained by the trained group (see **Table 6**).

The best model was only examined for the trained group. **Figure 2G** shows the values fitted from the best model as a function of age at the three assessment sessions, with the corresponding estimated linear trend. The results suggest that younger participants were more sensitive to interference, and that their interference decreased from pre-test to post-test, with an evidence ratio of 665.67 (see **Table 7**), but then rose again to the pre-test level at follow-up.

#### **CWMS intrusions**

We considered a total of 121 models for the CWMS intrusion errors. The best model was the one with the Session X Age X CWMS baseline X Group interaction, with a probability of 93%. This model was much better supported than the others: its probability was 18 times higher than that of the next model (see **Table 5**).

The differences were not very large for the Group (trained vs. control) X Session (pre-test vs. post-test vs. follow-up) interaction. Performance seemed to deteriorate from pre-test to post-test in the trained group, but not in the control group, while the number of errors increased at follow-up (see **Figure 1H**). The effect sizes for group differences were small at post-test and became medium at follow-up (see **Table 6**), but the two groups did not differ. The net effect size for the trained group, adjusted for the control group's performance, was small for immediate gains (pre- vs. post-test), but became medium for long-term gains (see **Table 6**).

**Figure 2H** shows the values fitted from the best model for the trained group alone as a function of age and of pre-test performance in the WM criterion task, at the three assessment sessions, with the corresponding estimated linear trend. The regression slope for age decreased from post-test to follow-up, but not from pre-test to post-test, as supported by the evidence ratio. This indicates that it was the older individuals who made more mistakes at pre-test, and that only at follow-up were they likely to make fewer intrusion errors. The regression slopes for baseline performance in the WM criterion task decreased from session 0 (pre-test) to session 1 (post-test), with an evidence ratio of 799 (see **Table 7**), and then became flat, as supported by the evidence ratio (see **Table 7**). The participants with poor WM performance made more intrusion errors at pre-test and fewer at post-test, and this improvement was maintained at follow-up.

# DISCUSSION AND CONCLUSIONS

Our aim in the present study was to delineate how certain individual characteristics contribute to explaining WM training gains and transfer effects. Despite the importance of individual characteristics in cognition, only three studies in the aging literature have considered this issue. Here, age, formal education, general cognitive ability (operationalized as the vocabulary score), and WM baseline performance level were considered as predictors of the short- and long-term specific training gains and transfer effects of a verbal WM training program in a sample of healthy older adults.

To elucidate this issue, we used linear mixed effects models to analyze data from four previous studies on healthy older adults that adopted the verbal WM training program developed by Borella et al. (2010). Part of the interest of such an analysis lies in the fact that, for the first time to our knowledge, all the studies examined were based on the same procedure and the same assessment measures, and all included a follow-up session. The Borella et al. (2010) training program seems to be the only WM procedure to have been applied repeatedly in older adults with consistent results across studies. A further advantage of the studies selected for the present analysis is that they included an active control group and presented parallel versions of the tasks (as recommended, but rarely done, in the literature; Zinke et al., 2012). The effects identified therefore cannot be attributed to item-specific practice.

Overall, our findings confirmed the efficacy of the verbal WM training procedure proposed by Borella et al. (2010). The trained group showed specific gains: it performed better in the criterion task than the active control group immediately after the training, and maintained this benefit at follow-up. Positive effects of the WM training were also generally apparent in terms of transfer effects, at least in the short term (at post-test), since the trained group outperformed the active controls in all the near transfer measures considered. As for the far transfer measures, the trained group again outperformed the active controls in all tasks except the Stroop Color index on RTs and CWMS intrusion errors.

This pattern of results was confirmed by the generally large post-test effect sizes (over 0.80) for the differences between the trained and active control groups. The exceptions were the reasoning task (the Cattell test; medium effect sizes) and intrusion errors in the CWMS (small effect sizes). At follow-up, the differences remained large for the criterion task. They became medium for the processing speed task (the Pattern Comparison task) and for intrusion errors in the CWMS, and small in the other tasks (Forward and Backward Digit Span tasks, Dot Matrix task, and Cattell test). There were no changes in the effect size of the Stroop Color index on RTs, which confirms that the training had no effect on this measure, as there were no differences between the trained and control groups. The net effect sizes also confirmed the training benefits. These measure the changes in the trained group across sessions (pre-test, post-test, and follow-up; see **Table 6**) after adjusting for any change in the control group. It is consequently reasonable to say that the training produced some maintenance effects on the trained group's performance.

These overall findings are consistent with previously published results obtained with the same WM training program. Although the present training regimen was quite short (only three training sessions), both near and far transfer gains were found, confirming that this WM training procedure is effective. As also suggested by the meta-analysis conducted by Karbach and Verhaeghen (2014), the length of a training program does not seem to be a crucial factor in determining its efficacy: in fact, most of the WM training procedures for older adults failed to document any benefits even though they were much longer than the one considered here (see Borella et al., 2017; see also **Table 1**). The adaptive regimen adopted may well have favored training gains by: (i) ensuring that the tasks were always challenging, cognitively demanding and novel, thereby encouraging participants to stay engaged with the task; and (ii) producing a change in participants' allocation of attentional resources, because the training engages several processes (including encoding, retaining information, inhibiting no longer relevant information, managing two tasks simultaneously, shifting attention, and attentional control) needed to handle the different demands of the tasks efficiently. On the other hand, the lack of any short-term transfer effects for the two inhibitory measures may mean that inhibitory mechanisms are less amenable to training (see Zavagnin and Riboldi, 2012). Some caution is required in interpreting the findings obtained with the Stroop Color task, because they were based on RTs, which are not a very reliable indicator (e.g., de Ribaupierre et al., 2003; see Ludwig et al., 2010; Borella et al., 2017), and the sample was reduced for this particular measure (see **Table 3**). It is also possible, as discussed below, that it would take longer to produce a detectable change in the inhibitory measures, or at least in some of them.
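The role of the adaptive regimen can be made concrete with a minimal sketch of an adaptive difficulty rule. This section does not describe the program's actual adaptation rule, so everything here is an assumption: a simple 1-up/1-down rule in which the difficulty "level" (which could be read, for instance, as the number of items to recall) rises after a correct trial and falls after an error, clamped to hypothetical bounds of 2 and 6.

```python
def next_level(level, correct, min_level=2, max_level=6):
    """Hypothetical 1-up/1-down rule: harder after a success, easier after a
    failure, clamped to the range [min_level, max_level]."""
    if correct:
        return min(level + 1, max_level)
    return max(level - 1, min_level)


def run_session(outcomes, start_level=2, min_level=2, max_level=6):
    """Return the sequence of difficulty levels visited for a list of
    trial outcomes (True = correct, False = error)."""
    levels = [start_level]
    for correct in outcomes:
        levels.append(next_level(levels[-1], correct, min_level, max_level))
    return levels


# Illustrative outcomes for one hypothetical participant.
print(run_session([True, True, False, True, True, True, True]))
```

A rule of this kind keeps task demands near the edge of each participant's current capacity, which is one way the tasks can remain "always challenging" regardless of baseline ability.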

Concerning the long-term effects of the training, there was evidence of the maintenance of the specific training gain (in the criterion WM task), in line with all the WM training studies in the aging literature. In the transfer tasks, the training gains were only maintained for the Cattell test and the Pattern Comparison task, as in other studies using the same training procedure: the advantage of the trained group over the controls was in the range of a medium effect size (or near-medium for the Pattern Comparison task). Such selective maintenance of the training gains may be attributable to the well-documented strong relationship between WM and (i) processing speed (measured with the Pattern Comparison task) and (ii) reasoning ability (for a comprehensive discussion, see Borella et al., 2010).

Improving WM performance makes cognitive operations more efficient, thus fostering the ability to move flexibly among basic information processes. The other tasks may instead call on more task-specific processes and abilities, leading to only transient (immediate) transfer effects (for a discussion, see Borella et al., 2010). One of the inhibitory measures examined, intrusion errors in the CWMS, seems particularly intriguing in that it only showed a clear improvement (fewer intrusion errors in the criterion task) at follow-up. This may mean that it takes longer to see a benefit of the training for some measures (inhibitory mechanisms in the present case). This phenomenon (i.e., clear transfer effects only at follow-up) has also been found in other training studies in aging (e.g., Borella et al., 2017) and has been called the "sleeper" effect. Although its nature needs to be further investigated (see Jaeggi et al., 2014), it may indicate that certain abilities take longer to show a significant improvement in performance. Future studies should examine this issue.

This result for intrusion errors, but not for the Stroop Color task (leaving aside the problems associated with RT measures and the reduced sample size), may even indicate that WM training is more beneficial for some inhibitory functions than for others. In fact, intrusion errors in the CWMS and the Stroop Color task tap two different inhibitory functions. CWMS intrusion errors (an internal measure of the WM task and therefore closely related to it; see Robert et al., 2009) represent the resistance to proactive interference function of inhibition, which helps attention to be focused on relevant items and simultaneously presented irrelevant items to be ignored; the Stroop Color task measures the resistance to prepotent response function of inhibition, which blocks dominant, prepotent cognitive responses automatically activated by the stimulus presented (e.g., Borella et al., in press). Resistance to proactive interference is also considered the only inhibitory function related to the control of information coming from memory content (Friedman and Miyake, 2004).

In light of the present findings, the following questions arise: (1) Do any individual characteristics play a part in these findings? (2) Are the effects of training supported by magnification or compensation effects, or both? Of course, several aspects, including methodological issues but also participants' individual characteristics, could explain the training gains and support the results. Here, we specifically analyzed the role of certain demographic variables (age and educational level), cognitive abilities in WM (i.e., pre-test performance in the criterion task), and a vocabulary test score as an indicator of crystallized intelligence.

Our findings showed that the role of the individual characteristics considered depended on the type of measure examined, and the effect of these variables was very modest for some tasks. The most interesting aspect seems to be that the factors considered supported either a compensation or a magnification effect immediately after the training, depending on the measure analyzed. In particular, irrespective of whether near or far transfer was concerned, the more the tasks demanded active information processing (i.e., the Dot Matrix, Backward Digit Span and Pattern Comparison tasks, the Cattell test, and the Stroop Color task), the more the factors examined seemed to support a magnification effect (of variable robustness). In other words, participants who had a higher initial performance on the crystallized measure used here and/or were younger were more likely to improve after the training. For more passive tasks, on the other hand (i.e., the Forward Digit Span task, which is a short-term memory task), our results supported a compensation effect: participants with lower baseline vocabulary scores, an older age, and a weaker WM performance benefited more from the training. A particular pattern emerged for the criterion task (i.e., the task similar to the one used in the training) and for a closely related measure (CWMS intrusion errors): although the criterion task is complex, participants with a lower performance in the crystallized intelligence task (the vocabulary test) gained more from the training than those with higher vocabulary scores. That vocabulary plays such a role suggests that knowledge can counteract age-related decline (e.g., Baltes, 1987). Further, the role of vocabulary in explaining specific training gains may also suggest that participants exhibiting transfer effects were those who acquired new knowledge, rather than those who gained greater processing efficiency. A finer-grained analysis at the individual level may help clarify this issue.
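The distinction between magnification and compensation can be illustrated by regressing each participant's pre-to-post gain on an individual characteristic. The following is a simplified sketch of the logic only, not the models used in the study: it uses one predictor at a time, the trained group only, and invented data. A positive slope for a characteristic where higher values mean a stronger profile (e.g., vocabulary) indicates magnification, and a negative slope indicates compensation. For age, the sign is flipped because older age means a weaker profile. Two caveats apply. When the predictor is the pre-test score of the same measure, regression to the mean biases the slope toward "compensation". A fuller analysis would also include the control group and test the group by predictor interaction.

```python
from statistics import mean


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def gain_slope(predictor, pre, post):
    """Slope of the pre-to-post gain on an individual characteristic."""
    gains = [after - before for before, after in zip(pre, post)]
    return ols_slope(predictor, gains)


def interpret(slope, higher_is_better=True):
    """Map the slope's sign onto the two accounts.

    For characteristics where a higher value means a weaker profile (e.g., age),
    pass higher_is_better=False so the sign is flipped before interpreting.
    """
    signed = slope if higher_is_better else -slope
    if signed > 0:
        return "magnification"   # stronger profile -> larger gain
    if signed < 0:
        return "compensation"    # weaker profile -> larger gain
    return "neither"


# Hypothetical trained-group data (not the study's data).
vocabulary = [20, 25, 30, 35]
age = [75, 70, 65, 60]
pre, post = [10, 10, 10, 10], [11, 12, 13, 14]
print(interpret(gain_slope(vocabulary, pre, post)))
print(interpret(gain_slope(age, pre, post), higher_is_better=False))
```

Because the sign of the slope, rather than its size, decides the label, a small and noisy slope can flip between the two accounts. This is consistent with the caution expressed in the text about the modest effects of some predictors.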

As for intrusion errors, there was evidence of a compensation effect related to age and baseline WM performance, with older participants and those with a lower baseline WM performance improving the most. These results may mean that the exercises used in the training enabled individuals with a lower crystallized ability to adapt to the demands of the tasks, engage better-controlled processes, and make more efficient use of their resources. It is hardly surprising that such a pattern of results should emerge with a training task that involves an adaptive procedure, which may have enabled progress to be made during the training, leading to better performance and fewer intrusion errors at post-test, and to the maintenance of this improvement at follow-up. Similarly, participants with a low baseline WM performance also became better able to manage no longer relevant information (CWMS intrusion errors).

One way to interpret these results could be by referring to the supply-demand mismatch conceptualized by Lövdén et al. (2010). According to these authors, changes in cognitive performance are induced by a mismatch between available resources and task demands: to cope with this mismatch, individuals engage in activities that promote flexibility, and consequently also plasticity. This hypothesis enables us to predict how individual differences might affect the benefits gained from a training regimen, depending on a task's complexity. The compensation effect seen for the criterion task (which closely resembled the task used in the training) may be due to the fact that using an adaptive procedure while practicing with the training task favored the "right amount" of supply-demand mismatch (i.e., demands exceeding the available capacity), enabling individuals with weaker general cognitive abilities to re-activate their potential, and thus to benefit from the training in terms of a better performance in the criterion task. It might have been easier to support this interpretation if our training procedure had been designed to enable us to test how performance changed from one training session to the next. Such an analysis would also shed light on what happens to individuals with the opposite profile (good general cognitive abilities, i.e., high vocabulary scores in the present study), who would only experience the mismatch if the WM tasks used in the training were more difficult; instead of benefiting in terms of performance in the trained task, they would gain in terms of plasticity in the transfer tasks.

According to the mismatch concept, the magnification effect found for the more demanding transfer tasks may indicate that, in participants with a higher profile in terms of age (i.e., younger individuals) or crystallized intelligence (i.e., those with higher vocabulary scores), the training induced a supply-demand mismatch that provided an impetus for change, thus engendering flexible behavior. In participants with a lower profile, on the other hand, the high demands of the tasks used in the training might prevent any supply-demand mismatch, because these individuals might abandon any attempt or be unable to apply resources and processes suited to the task.

The mixed results emerging from the present analysis regarding the role of individual characteristics in explaining compensation or magnification effects are consistent with a report by Zinke et al. (2013): the authors found that participants with weaker transfer effects were older (magnification), and that those with smaller training gains had stronger transfer effects (compensation). The role of age and WM performance varied, however, depending on the transfer tasks considered. The role of the predictors was examined too, but the effect size was small for some of the transfer tasks, which limits the value of the results obtained.

It has to be said that the results found in the present study were modest too, so some caution is warranted in interpreting them. Further, this pattern of findings emerged at post-test and, except for the criterion task and the intrusion errors, the role of the predictors was not maintained at follow-up for the other measures. Such a result could be interpreted in two different ways: one stems from the idea that, because of the training, the individual characteristics are no longer significant, since something beyond them (such as the way in which participants process information) has been modified during the training; the other simply attributes the result to the fact that the effects of the training were not maintained. It therefore seems important to analyze the influence of other individual characteristics on the effects of WM training. To our knowledge, this has been done here for the first time in both the short and the long term; the three other studies that addressed this issue did not consider the role of the predictors in the long term.

Our findings also suggest that compensatory and magnification effects are not mutually exclusive in explaining training gains; they may both contribute to characterizing and explaining the outcome of training. It would therefore be important for future WM training studies in general (and in aging in particular) to examine the role of individual factors. We are aware that large samples are needed for such analyses, but it is only by trying to overcome such practical problems that research can advance and enable us to ascertain the real usefulness of interventions to promote active aging.

A number of limitations of the present study have to be acknowledged. First, single measures were used to represent the constructs of interest, whereas using multiple indicators of the same process (e.g., Shipstead et al., 2010) in training studies would enable us to draw stronger conclusions. Second, we were unable to consider training gains per se, i.e., the role of individual differences in the improvements induced by training across sessions (the rate of learning) in predicting the outcome of the training, because the particular procedure used did not allow these aspects to be analyzed. Examining what progress a given individual makes, and how it relates to the effects of the training, could shed further light on the value of the present training program (Zinke et al., 2013; Bürki et al., 2014). A third limitation is that we analyzed the role of only a limited number of factors potentially influencing training gains. In future studies, we plan to conduct a more complete assessment of general cognitive ability, using tasks not used to test transfer effects (to avoid multicollinearity problems). Since we acknowledge the exploratory nature of our results, we have discussed overall trends for the transfer tasks, rather than the influence of each of the specific predictors on each task. There are also, of course, other factors that may have influenced the training gains and that have yet to be considered, such as metacognitive (motivational) variables, mood, and psychological well-being (e.g., von Bastian and Oberauer, 2014; Könen and Karbach, 2015). It might also be of interest to analyze the influence of genetics (such as dopamine availability) on training gains (von Bastian and Oberauer, 2014).
Future studies should therefore strive to include a broad array of factors, with larger and more homogeneous samples (i.e., with groups of equal size) than the one used here, in an effort to delineate all the conditions capable of shaping the effects of WM training in older adults.

To conclude, the present study provides further evidence that older adults gain in cognitive flexibility and plasticity from verbal WM training. It also highlights the importance of analyzing the factors influencing WM training gains in aging. Moreover, although we showed that older people's WM can be improved thanks to a plasticity that persists with aging, we found that the role of individual characteristics depended on the transfer measure examined. It is therefore important to ascertain not only "who" gains from the training, but also "who gains in which tasks," in order to design the most effective WM training for an individual's cognitive profile. This study could thus be considered one of the first promising steps toward clarifying the impact of individual characteristics on the short- and long-term efficacy of WM training.

# AUTHOR CONTRIBUTIONS

EB designed the study, assisted in carrying out the analyses, and wrote the paper. EC wrote the paper and assisted in carrying out the analyses. MP carried out the statistical analyses. RD assisted in writing the paper. BC designed the study, assisted in carrying out the analyses, and wrote the paper.

# FUNDING

The study was supported by the grant CPDA141092/14 awarded by the University of Padova to BC.

#### REFERENCES

maintenance. Front. Hum. Neurosci. 6:63. doi: 10.3389/fnhum.2012.00063

training in old-old age. Gerontology 58, 79–87. doi: 10.1159/000324240

Zinke, K., Zeintl, M., Rose, N. S., Putzmann, J., Pydde, A., and Kliegel, M. (2013). Working memory training and transfer in old age: effects of age, baseline performance and training gains. Dev. Psychol. 50, 304–315. doi: 10.1037/a0032982

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer JMR and handling Editor declared their shared affiliation, and the handling Editor states that the process nevertheless met the standards of a fair and objective review.

Copyright © 2017 Borella, Carbone, Pastore, De Beni and Carretti. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Cognitive Training Sustainably Improves Executive Functioning in Middle-Aged Industry Workers Assessed by Task Switching: A Randomized Controlled ERP Study

Patrick D. Gajewski<sup>1</sup> \*, Gabriele Freude<sup>2</sup> and Michael Falkenstein<sup>1,3</sup>

<sup>1</sup> Ageing Research Group, Leibniz Research Centre for Working Environment and Human Factors (IfADo), Technical University of Dortmund, Dortmund, Germany, <sup>2</sup> Federal Institute for Occupational Safety and Health, Berlin, Germany, <sup>3</sup> Institute for Working, Learning and Aging, Bochum, Germany

Recently, we reported results of a cross-sectional study investigating executive functions in relation to age and type of work. That study showed deficits in performance and electrophysiological activity in middle-aged workers with long-term repetitive and unchallenging work. Based on these findings, we conducted a longitudinal study that aimed at ameliorating these cognitive deficits by means of a trainer-guided cognitive training (CT) in 57 additional middle-aged workers with repetitive work from the same factory. This study was designed as a randomized controlled trial with pre- (t1), post- (t2), and 3-month follow-up (t3) measures. The waiting control group was trained between t2 and t3. The training lasted 3 months (20 sessions) and was evaluated with the same task switching paradigm used in the previous cross-sectional study. CT improved accuracy at the behavioral level and affected the electrophysiological correlates of retrieval of stimulus-response sets (P2), response selection (N2), and error detection (Ne), thus unveiling the neuronal background of the behavioral effects. The same training effects were observed in the waiting control group after CT at t3. Moreover, at t3, most of the behavioral and electrophysiological training-induced changes were found to be stable. Hence, CT appears to be an important intervention for compensating cognitive deficits in executive functions in middle-aged employees with cognitively unchallenging work.

Keywords: aging, work, cognitive training, task switching, ERPs, N2, Ne, ERN

#### Edited by:

Soledad Ballesteros, National University of Distance Education, Spain

#### Reviewed by:

Philip D. Zelazo, University of Minnesota, USA; Marco Calabria, Pompeu Fabra University, Spain

> \*Correspondence: Patrick D. Gajewski gajewski@ifado.de

Received: 14 November 2016 Accepted: 08 February 2017 Published: 22 February 2017

#### Citation:

Gajewski PD, Freude G and Falkenstein M (2017) Cognitive Training Sustainably Improves Executive Functioning in Middle-Aged Industry Workers Assessed by Task Switching: A Randomized Controlled ERP Study. Front. Hum. Neurosci. 11:81. doi: 10.3389/fnhum.2017.00081

# INTRODUCTION

The continuous aging of society leads to an increasing average age of the workforce and an increase in the pensionable age, at least in the majority of western countries (Ilmarinen, 2006). In addition, work demands have changed in recent years: while physical work and physical demands have steadily diminished, mental work and psychological work demands have increased (see Hagger et al., 2010, for an overview). There has also been an increasing trend toward highly specialized work, which in turn implies a larger amount of repetitive work, done under increasing time pressure and worry about job loss due to globalization and cost pressure (Sparks et al., 2001; De Cuyper et al., 2008). This raises the question of whether, and to what extent, such modern working environments have a negative (or positive) impact on cognitive skills, particularly in older employees. Even more important is the issue of potential interventions that may ameliorate cognitive changes due to age and work and hence increase the work ability of older employees.

It is well-known that extended cognitive engagement has a positive effect on cognition (Hultsch et al., 1999; Schooler et al., 1999; Wilson et al., 1999; Schooler and Mulatu, 2001; Singh-Manoux et al., 2003; Stern, 2009). Cognitively demanding work is an important cognitive stimulation supporting cognitive functions (Alvarado et al., 2002; Bosma et al., 2002; Wild-Wall et al., 2009; Marquié et al., 2010) which also lowers the risk of dementia in older age (Stern et al., 1994; Andel et al., 2005; Lee et al., 2010; Swaab and Bao, 2011).

In a longitudinal study, Schooler et al. (1999) investigated the effects of complex and cognitively demanding work on cognitive performance over 20 years. The authors showed that demanding jobs improved cognitive performance, particularly in older employees. Potter et al. (2008) observed a similar beneficial effect of demanding work in a sample of about 1000 older workers over a period of 7 years, whereas tedious physical job demands tended to impair cognition in the workers regardless of age, intelligence, and education. Furthermore, in a recent longitudinal study, Marquié et al. (2010) analyzed performance in cognitive tasks as a function of cognitive work demands in about 3000 workers. They showed that the more cognitively stimulating the work, the higher the scores in tests of episodic memory, attention and speed of processing. More importantly, cognitively stimulating work even led to an increase in cognitive performance in the tested employees over 10 years, despite their increasing age.

In the previous cross-sectional study (Gajewski et al., 2010b), we assessed cognitive functions of young and middle-aged blue-collar workers who had been working for a long time under either repetitive or flexible working conditions. The two groups of middle-aged employees were matched for age and education. In particular, we aimed at investigating the status of cognitive control functions involved in a memory-based task-switching task, namely task preparation, working memory, and error processing. These functions were investigated by analyzing specific ERP components reflecting the cortical activity underlying different aspects of processing during task switching. We observed no age- and work-related impairments when the relevant task was indicated by a cue, which reduced the working memory load. Age differences were apparent when memory demands were higher, i.e., when the task sequence had to be maintained across a large number of trials (memory-based task switching). Importantly, the older workers with long-term repetitive work showed the strongest performance deficits. These behavioral impairments were accompanied by specific ERP findings, namely attenuated preparation, reduced working memory capacity, and diminished error detection. In contrast, the older workers with long-term flexible work showed small or no performance and ERP detriments when compared to young workers, but differed considerably from their same-aged colleagues with repetitive work. These results are in line with the literature reporting a negative impact of long-term unchallenging work on cognition.
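How performance in a task switching paradigm is typically scored can be sketched in a few lines of Python. This is an illustrative assumption rather than the study's scoring procedure: each trial is stored as a (task, RT in ms, correct) tuple in presentation order; a trial counts as a "switch" when its task differs from the previous trial's task; RT costs use correct trials only, while error costs use all classified trials. The task letters and RTs are invented. Common refinements, such as excluding trials that follow an error, are omitted for brevity. In the memory-based condition, participants must keep track of the sequence themselves, but the scoring logic stays the same.

```python
from statistics import mean


def switch_costs(trials):
    """Switch costs from an ordered list of (task, rt_ms, correct) trials.

    A trial counts as a switch when its task differs from the previous trial's
    task; the first trial has no predecessor and is skipped. RT costs use
    correct trials only; error costs use all classified trials.
    """
    rts = {"switch": [], "repeat": []}
    errors = {"switch": [], "repeat": []}
    for prev, cur in zip(trials, trials[1:]):
        kind = "switch" if cur[0] != prev[0] else "repeat"
        _task, rt, correct = cur
        errors[kind].append(0 if correct else 1)
        if correct:
            rts[kind].append(rt)
    return {
        "rt_cost_ms": mean(rts["switch"]) - mean(rts["repeat"]),
        "error_cost": mean(errors["switch"]) - mean(errors["repeat"]),
    }


# Hypothetical AABB-style run; task letters and RTs are made up.
trials = [("A", 500, True), ("A", 480, True), ("B", 620, False),
          ("B", 500, True), ("A", 640, True), ("A", 470, True)]
print(switch_costs(trials))
```

Scoring accuracy separately from speed matters here because, as hypothesized later in the text, training was expected to improve accuracy but not speed.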

Nonetheless, since our previous study was cross-sectional, the cognitive impairment in the workers with long-term repetitive work could also be due to early selection mechanisms, i.e., workers with relatively low cognitive skills may prefer such repetitive jobs. While this is unlikely, given that the deficit was only seen in the demanding memory-based switching condition, it cannot be excluded. A longitudinal study would therefore be necessary to control for possible confounding variables and to analyze the time course of cognitive functions across decades in workers with different working conditions. More useful, however, would be a longitudinal study examining methods to improve cognitive functions (Stern, 2009; Gajewski and Falkenstein, 2015b). More specifically, it should be analyzed whether the cognitive deficits and their electrophysiological correlates found in our previous study could be ameliorated by cognitive training (CT).

In the literature, the most frequently reported type of formal CT is training of a specific function that focuses on one domain only, for example memory (Jaeggi et al., 2008), attention (Green and Bavelier, 2008), visual search (Becic et al., 2008), dual-task performance (Bherer et al., 2005), or task switching (Karbach and Kray, 2009). In most studies, the training effects were restricted to the trained function and did not transfer to other functions (e.g., Willis and Schaie, 1986; Willis et al., 2006), although some reports also showed transfer effects to non-trained activities (e.g., Gopher et al., 1994; Ball et al., 2007; Caserta et al., 2007; Cassavaugh and Kramer, 2009; Edwards et al., 2009; Karbach and Kray, 2009). Moreover, effects on the trained function may partly reflect the implicit involvement of functions that were not explicitly trained.

Given this training literature, it appears beneficial to conduct training that involves several fluid functions relevant for work and daily life (Kramer and Morrow, 2008). In most CT studies, test-like tasks or games targeting different cognitive functions were trained on a PC (computerized cognitive training; CCT) for an extended time. Recent meta-analyses suggest that CCT leads to improvements in various cognitive functions and also transfers to untrained cognitive tasks or even everyday situations (far transfer) in healthy older adults (Karbach and Verhaeghen, 2014; Kelly et al., 2014; Lampit et al., 2014; Ballesteros et al., 2015; Gajewski and Falkenstein, 2015b).

## ERPs in the Task Switching Paradigm

Studies evaluating training-related effects using electrophysiological measures are scarce, despite the advantage of this method for investigating distinct processing steps, from stimulus analysis through memory activation and decision making to response-related processes. In the present study, we focus primarily on cognitive processes associated with particular ERP components: retrieval of stimulus-response (S-R) sets (P2), response selection (N2), allocation of cognitive resources to the task (P3b), and response-related processes such as response monitoring (Nc) and error detection (Ne). The P2 is a frontocentral positive wave with a latency of about 170–200 ms that has been associated with the evaluation of task-relevant stimuli and the retrieval of S-R mappings (Kieffaber and Hetrick, 2005; Gajewski et al., 2008; Adrover-Roig and Barceló, 2010). Larger P2 amplitudes may reflect enhanced target processing under difficult conditions, such as task switch trials, where larger P2 amplitudes were related to slower responses (Finke et al., 2011). Moreover, P2 latency was positively correlated with reaction times (RTs), supporting the interpretation of the P2 as an index of S-R retrieval (Kieffaber and Hetrick, 2005).
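To make the notions of component amplitude and latency concrete, the following minimal sketch averages single-trial epochs into an ERP. It then takes the most positive or most negative sample inside a latency window. It rests on several assumptions that are not drawn from the study's methods: peak picking rather than mean-amplitude scoring, toy epochs sampled every 100 ms, and windows taken directly from the approximate P2 (170–200 ms) and N2 (280–320 ms) latencies given in the text. Real pipelines use much denser sampling, baseline correction, filtering, and artifact rejection, none of which is shown.

```python
def average_epochs(epochs):
    """Average equally long single-trial epochs sample by sample (the ERP)."""
    return [sum(samples) / len(epochs) for samples in zip(*epochs)]


def peak_in_window(times_ms, amplitudes_uv, start_ms, end_ms, polarity):
    """Latency and amplitude of the most positive (polarity=+1) or most
    negative (polarity=-1) sample within [start_ms, end_ms]."""
    window = [(t, a) for t, a in zip(times_ms, amplitudes_uv)
              if start_ms <= t <= end_ms]
    if not window:
        raise ValueError("no samples inside the requested window")
    return max(window, key=lambda point: polarity * point[1])


# Toy data: 5 samples per epoch, 100 ms apart, amplitudes in microvolts.
times = [0, 100, 200, 300, 400]
epochs = [[0, 1, 4, -2, 0],
          [0, 3, 6, -4, 0]]
erp = average_epochs(epochs)
print(peak_in_window(times, erp, 170, 200, polarity=+1))   # P2-like positivity
print(peak_in_window(times, erp, 280, 320, polarity=-1))   # N2-like negativity
```

Averaging first and then picking the peak mirrors the usual order of operations: single trials are too noisy for reliable peak detection. The peak latency obtained this way is the quantity that can then be correlated with RTs.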

The P2 is followed by a frontocentral negativity, the N2, with a latency of about 280–320 ms. The N2 has mainly been associated with mismatch (Folstein and Van Petten, 2008, for review), cognitive control of responses (Hämmerer et al., 2014), conflict processing (Van Veen and Carter, 2002; Yeung and Cohen, 2006), decision making (Ritter et al., 1979), and response selection (Gajewski et al., 2008). In task switch trials, interference between conflicting task sets (i.e., S-R sets) has often been associated with a larger N2 amplitude and a longer latency than in non-switch trials (Gajewski et al., 2010a, 2011; Karayanidis et al., 2011). This process seems to be sensitive to training, as the N2 increased substantially after 4 months of CT in seniors (Gajewski and Falkenstein, 2012).

The P3b is a large positive wave with a parietal maximum, peaking at about 300 to 600 ms after stimulus onset, that is associated with context updating (Donchin and Coles, 1988), allocation of cognitive resources (Kok, 2001), working memory (Polich, 2007), and the outcome of a decision process (Nieuwenhuis et al., 2005; Verleger et al., 2006). The P3b has frequently been reported in task switching studies, with larger amplitudes in non-switch than in task switch trials (e.g., Barceló et al., 2000; Lorist et al., 2000; Rushworth et al., 2002; Jost et al., 2008; Gajewski and Falkenstein, 2011). Reported training effects on the P3b are inconsistent. However, larger P3b amplitudes have generally been associated with better performance (Polich and Lardon, 1997; Hillman et al., 2006).

During response execution, an error may occur. Such response errors are usually followed by a negative wave with a frontocentral maximum, the error negativity (Ne; Falkenstein et al., 1991) or error-related negativity (ERN; Gehring et al., 1993), which is thought to reflect the detection of errors or incorrect response tendencies. The Ne is seen as the result of a comparison between the expected and the actual outcome, which leads to behavioral readjustment and is a precondition for successful learning (Falkenstein et al., 2000; Holroyd and Coles, 2002). It has also been shown that the Ne is smaller in older than in younger participants, suggesting a weakening of action monitoring processes in older adults (Band and Kok, 2000; Falkenstein et al., 2001). After correct responses, a smaller negativity called Nc or CRN (Yordanova et al., 2004) is seen, which is thought to reflect response monitoring (Allain et al., 2004).
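The Ne and Nc described above are obtained from response-locked epochs (time 0 = response) averaged separately for error and correct trials. The following minimal sketch shows that split. It assumes, for illustration only, a 0–150 ms post-response search window, toy epochs sampled every 50 ms, and an Ne minus Nc difference as a simple summary; the study's actual windows and scoring are not specified in this section.

```python
def split_average(epochs, correct_flags):
    """Response-locked averages computed separately for errors and correct responses."""
    def avg(group):
        if not group:
            raise ValueError("need at least one epoch per condition")
        return [sum(samples) / len(group) for samples in zip(*group)]

    errors = [e for e, ok in zip(epochs, correct_flags) if not ok]
    corrects = [e for e, ok in zip(epochs, correct_flags) if ok]
    return avg(errors), avg(corrects)


def most_negative(times_ms, amplitudes_uv, start_ms, end_ms):
    """Latency and amplitude of the most negative sample in [start_ms, end_ms]."""
    window = [(t, a) for t, a in zip(times_ms, amplitudes_uv)
              if start_ms <= t <= end_ms]
    return min(window, key=lambda point: point[1])


# Toy response-locked epochs (time 0 = response), amplitudes in microvolts.
times = [-100, 0, 50, 100, 150]
epochs = [[0, -1, -8, -4, 0],   # error
          [0, -1, -6, -2, 0],   # error
          [0, 0, -2, -1, 0],    # correct
          [0, 0, -2, -1, 0]]    # correct
flags = [False, False, True, True]
err_avg, cor_avg = split_average(epochs, flags)
ne = most_negative(times, err_avg, 0, 150)
nc = most_negative(times, cor_avg, 0, 150)
print(ne, nc, ne[1] - nc[1])
```

Because errors are comparatively rare in accuracy-oriented tasks, the error average usually rests on far fewer trials than the correct-response average. In practice, a minimum number of error trials is therefore required before an Ne is scored.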

# The Present Study

The aim of the longitudinal part of the project, which followed the cross-sectional study reported by Gajewski et al. (2010b), was to ameliorate the cognitive decline found in middle-aged assembly line workers, as outlined above. In the present study, we applied CT, which the literature reports as a key method for improving fluid cognition (Strobach and Karbach, 2016). We used a formal paper- and computer-based CT that had been found to be effective in improving cognitive functions in healthy older subjects, as measured with behavior as well as electrical brain activity, in our previous training study (Gajewski and Falkenstein, 2012; Wild-Wall et al., 2012). The training was not focused on a particular cognitive function but aimed at affecting a broad spectrum of cognitive abilities with different training tasks. The CT contained several paper-and-pencil tasks (Mental Activation Training, MAT; Lehrl et al., 1994) and a number of computer-based tasks from commercial packages. The tasks were selected to target different cognitive functions such as attention, working memory, inhibition, and dual-task performance. The training format was two sessions per week (1.5 h each) for 3 months, in groups guided by a skilled teacher. Fifty-seven middle-aged workers with highly repetitive work finished this study. The participants were randomly assigned to either the training group or a Waiting Control Group. The latter received a combined cognitive, relaxation, and stress management training after the Cognitive Training Group had finished the training, i.e., after both groups had been post-measured. The combined training was offered to keep the motivation of the waiting group high, to reduce dropouts, and also to assess the effects of the stress management training (which will be reported elsewhere). Moreover, this design allows additional validation of CT effects in a second sample. Apart from behavioral data, we aimed at analyzing a number of ERPs that may be affected by CT.

The following hypotheses were formulated. Existing evidence indicates that CT is a useful tool for enhancing cognitive functions (Strobach and Karbach, 2016). Thus, CT should be particularly helpful in restoring cognitive deficits in individuals with long-term repetitive and unchallenging work. To measure the effects of CT, we used the same difficult memory-based task switching paradigm as in the previous study that revealed cognitive impairments in assembly line workers (Gajewski et al., 2010b). This task should be suitable for detecting even marginal training-induced changes in executive functioning in healthy middle-aged workers. Based on the findings of our previous training study with healthy seniors (Gajewski and Falkenstein, 2012), we expected an improvement in accuracy in this task, but no changes in speed. Moreover, performance effects should be accompanied by specific enhancements in ERP activity. Specifically, we expected an enlargement of N2 and P3b amplitudes after the training, indicating more efficient response selection and working memory processes, as well as an increased Ne/ERN, suggesting improved error detection, which would corroborate our previous results. With respect to the P2, we also expected larger amplitudes after CT, as enhanced performance was associated with larger P2 amplitudes (Wild-Wall et al., 2012). We did not have any specific expectations regarding the long-term effect of training assessed by the follow-up measure. Finally, we did not expect any beneficial effects of a combined relaxation and CT over a pure CT on executive functions, as relaxation alone was not associated with cognitive effects (Gajewski and Falkenstein, 2012).

#### MATERIALS AND METHODS


#### Participants

Of the 60 participants, 57 healthy male volunteers aged 40 to 57 years (M = 46.5, SD = 4.5) finished the study and completed all measures. None of them had participated in the previous cross-sectional study (Gajewski et al., 2010b). Eleven participants were left-handed or ambidextrous. All had normal or corrected-to-normal vision. Five participants had completed primary school (4th grade), 34 secondary general school (10th grade), 13 intermediate secondary school (10th grade), and 5 grammar school (gymnasium, 12th grade). Twenty-eight participants worked in double shifts (early and late shift) and 20 worked night shifts only. Early-shift and night-shift employees were trained together in the afternoon after the early shift had ended. Late-shift employees were trained before the late shift began. All participants were paid for their participation. This study was carried out in accordance with the recommendations of the local ethics committee of the Leibniz association. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the local ethics committee of the Leibniz association.

#### Study Design

The schedule of trainings and measurements is presented in **Figure 1**. The study was designed as a randomized controlled trial with pre-, post-, and follow-up measures. After the pre-measure (t1), participants were randomly assigned either to the cognitive training group (COG) or to a Waiting Control Group. After the 20 sessions of CT had been completed, both groups were tested again (t2). Thereafter, the Control Group received stress management and relaxation training (REL) in the first eight sessions and, in the remaining 12 sessions, a qualitatively identical (though shorter) cognitive training (REL + COG). After the combined training had ended, both groups were measured again (t3), providing a follow-up measure for the COG group and an assessment of training-related changes in the combined training group (REL + COG). Note that, although the main focus was on the effects in the COG group relative to the control group between t1 and t2 and on the long-term effect of training assessed at t3, we also report the effects of the combined intervention to investigate whether its training effects were similar to those in the COG group. Before the training, after the training, and 3 months after the end of the training, several paper-and-pencil and computer-based psychometric tests were administered. Here, we report only the results of the PC-based task switching paradigm.

#### Cognitive Training

The CT lasted 3 months with two sessions per week and 1.5 h per session (20 sessions in total). The training aimed at enhancing general cognitive abilities and was not designed to improve a particular cognitive function. Thus, the training consisted of a paper-and-pencil program and a number of commercial and non-commercial PC-based exercises. Several of the training packages listed below had already been used and described in a previous study (cf. appendix in Gajewski and Falkenstein, 2012).

During the first month, the so-called MAT (Lehrl et al., 1994) was applied. MAT is a paper-based CT designed to improve basic cognitive functions such as short-term memory and speed of information processing. In the subsequent weeks, the participants were trained with selected tasks taken from commercial and non-commercial internet-based software packages<sup>1,2,3,4,5</sup>. The difficulty level of the training tasks was continuously adapted to the participants' individual abilities, either by the program or by the trainer.

Each session comprised different exercises aimed at training crucial cognitive functions such as perceptual speed, attention, and memory. For example, the training package FreshMinder2 includes, among others, games that require rapid responses to specific stimuli such as colored balloons, rapid selection of digits in ascending order, memorizing and delayed recall of faces, repetition of sound sequences, matching of letters, memorizing shopping lists, counting bricks in 3-D figures, and memorizing and recalling schematic paths.

The package Ahano peds consists of units with different difficulty levels. This freely available program includes, among others, an eye–hand coordination task, a money counting task, detection of word repetitions in a text, a block tapping task, and memorizing abstract figures. The package Mentaga consists of exercises enhancing vigilance, perceptual speed, spatial cognition, and eye–hand coordination. Finally, the package Mental-Aktiv offers a number of memory tasks using digits, letters, colors, and figures, as well as exercises to train speed of processing. The specific tasks from these packages were chosen on the basis of their validity for the targeted functions and the enjoyment they provided during exercise.

Additional sessions took place at the end of the program for those participants who had missed regular sessions.

#### Task Switch Task

The same task was used as in a number of previous reports (Gajewski et al., 2010b; Gajewski and Falkenstein, 2012, 2015a; Schapkin et al., 2014). Briefly, the stimuli in the task switching paradigm consisted of the digits 1–9, excluding the number 5. The digits were presented in white on a black computer screen 3 mm above a white fixation point (10 mm diameter). Each digit was presented in either a small (7 mm × 10 mm) or a large (12 mm × 18 mm) size. A cue (16 mm × 32 mm) indicating the relevant task was presented 3 mm below the fixation point. The cue "NUM" (German "Numerisch," numeric) indicated the numerical task (greater or less than 5), "GER" (German "Geradzahligkeit," parity) the parity task (odd vs. even), and "SCH" (German "Schrift," font) the font-size task (small vs. large). The cue was presented in the single tasks only. In the memory-based mixed block (see below), three letters X were presented instead of the informative cue. Participants responded by pressing one of two buttons mounted on a response box with their index fingers. The stimulus–response mapping of the three tasks overlapped; that is, 'smaller than five,' 'even,' and 'small size' responses were assigned to the left key and 'larger than five,' 'odd,' and 'large size' responses to the right key. This assignment was counterbalanced across participants.

<sup>1</sup>www.brain-trainer.de

<sup>2</sup>www.happyneuron.de

<sup>3</sup>www.ahano.de

<sup>4</sup>www.freshminder.de

<sup>5</sup>www.mentaga.de
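Because the three tasks share the same two keys, one digit can require different responses depending on the active rule. The lookup below is a minimal sketch of that overlapping mapping, not the experiment software; the function name `correct_key`, the string labels, and the `swap` flag (which models the counterbalancing as a simple left/right reversal) are illustrative assumptions.

```python
LEFT, RIGHT = "left", "right"


def correct_key(task: str, digit: int, large_font: bool, swap: bool = False) -> str:
    """Return the correct key for one trial under the overlapping mapping.

    task: "NUM" (smaller/larger than 5), "GER" (even/odd) or "SCH" (small/large font).
    swap: hypothetical flag for the counterbalanced version (sides reversed).
    """
    if digit == 5 or not 1 <= digit <= 9:
        raise ValueError("stimuli are the digits 1-9 excluding 5")
    if task == "NUM":
        left = digit < 5          # 'smaller than five' -> left key
    elif task == "GER":
        left = digit % 2 == 0     # 'even' -> left key
    elif task == "SCH":
        left = not large_font     # 'small size' -> left key
    else:
        raise ValueError(f"unknown task cue: {task!r}")
    if swap:
        left = not left           # counterbalanced assignment (assumed form)
    return LEFT if left else RIGHT


# A small "3" calls for different keys depending on the active task.
for task in ("NUM", "GER", "SCH"):
    print(task, correct_key(task, 3, large_font=False))
```

For a small "3", the numerical and font-size rules both call for the left key while the parity rule calls for the right key, so correct performance depends on keeping track of which task is currently relevant.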

#### Procedure

A schematic example of a trial is shown in **Figure 2**.

The same procedure was used as in the previous studies that employed this task (Gajewski et al., 2010b; Gajewski and Falkenstein, 2012, 2015a; Schapkin et al., 2014). Briefly, a trial started with the presentation of the fixation point for 300 ms. In the single-task blocks, a cue was presented for 1300 ms and remained visible when the digit appeared. A response had to be given within 2500 ms after target onset. Five hundred milliseconds after the response, feedback was displayed for 500 ms: a plus sign after a correct response and a minus sign after an incorrect response. The next cue was then shown. The response–cue interval (RCI) was set to 1000 ms and comprised the 500-ms response–feedback delay and the 500-ms feedback.
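The post-target part of a trial can be expressed as simple arithmetic on the response time. The sketch below covers only the events after target onset (the 300-ms fixation and the cue precede it) and shows how the 1000-ms RCI splits into the feedback delay and the feedback itself. The names are hypothetical, and treating a missing or late response as an error timed from the 2500-ms deadline is an assumption, since the text does not say how misses were handled.

```python
from dataclasses import dataclass
from typing import Optional

RESPONSE_DEADLINE_MS = 2500   # response window after target onset
FEEDBACK_DELAY_MS = 500       # response -> feedback onset
FEEDBACK_DURATION_MS = 500    # "+" or "-" on screen
RCI_MS = FEEDBACK_DELAY_MS + FEEDBACK_DURATION_MS  # 1000-ms response-cue interval


@dataclass
class PostTargetEvents:
    feedback: str          # "+" after a correct response, "-" otherwise
    feedback_onset: int    # ms after target onset
    next_cue_onset: int    # ms after target onset


def post_target_events(rt_ms: Optional[int], correct: bool) -> PostTargetEvents:
    """Timeline after target onset for one trial.

    Assumption (not stated in the text): a missing or late response is scored
    as an error and timed from the 2500-ms deadline.
    """
    if rt_ms is None or rt_ms > RESPONSE_DEADLINE_MS:
        rt_ms, correct = RESPONSE_DEADLINE_MS, False
    return PostTargetEvents(
        feedback="+" if correct else "-",
        feedback_onset=rt_ms + FEEDBACK_DELAY_MS,
        next_cue_onset=rt_ms + RCI_MS,
    )


print(post_target_events(800, correct=True))
```

With a correct response at 800 ms, the plus sign is visible from 1300 to 1800 ms after target onset, and the next cue follows at 1800 ms.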

At the beginning of the session, participants performed three single-task blocks (NUM, GER, and SCH), each consisting of 34 trials. The single-task blocks served to familiarize participants with the different task rules before the mixed block was run and to assess baseline performance in the absence of a concurrent task.

In the memory-based mixed block, the three tasks were presented in mixed order, and participants were instructed to switch tasks after every three trials in the order "NUM–NUM–NUM–GER–GER–GER–SCH–SCH–SCH," etc., while a dummy cue ("XXX") was presented instead of an informative cue; i.e., participants had to keep the trial sequence in mind. When three consecutive errors were made or no response was given within the 2500-ms interval, cues were presented for the next three trials to help participants get back on track. The mixed block consisted of 126 trials.

Task switches occurred on 33.3% of trials in the memory-based block. The participants received written instructions explaining the task, which encouraged quick and accurate responses.
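The fixed task order and the resulting switch frequency can be reproduced with a few lines. This is a minimal sketch under the assumption that the sequence simply cycles NUM, GER, SCH in runs of three; the function names are hypothetical, and the cue-reinstatement rule after errors is not modelled.

```python
TASK_ORDER = ("NUM", "GER", "SCH")
RUN_LENGTH = 3          # switch after every three trials
N_MIXED_TRIALS = 126


def mixed_block_tasks(n_trials: int = N_MIXED_TRIALS) -> list[str]:
    """Task required on each trial of the memory-based mixed block."""
    return [TASK_ORDER[(i // RUN_LENGTH) % len(TASK_ORDER)] for i in range(n_trials)]


def transition_types(tasks: list[str]) -> list[str]:
    """Label trials 2..n as 'switch' or 'non-switch' relative to the preceding trial."""
    return ["switch" if cur != prev else "non-switch" for prev, cur in zip(tasks, tasks[1:])]


tasks = mixed_block_tasks()
kinds = transition_types(tasks)
print(tasks[:9])
print(kinds.count("switch"), "of", len(kinds), "transitions are switches")
```

Counting transitions from the second trial onward gives 41 switches out of 125 (about 32.8%); the 33.3% quoted above corresponds to one switch in every run of three trials.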

#### EEG Recording

The same recording parameters were used as in our previous studies (Gajewski et al., 2010b; Gajewski and Falkenstein, 2012, 2015a). Briefly, EEG was recorded from 32 scalp electrodes mounted on an elastic cap according to the extended 10–20 system. The montage included 8 midline sites, 12 sites over each hemisphere, and two mastoid electrodes (M1 and M2). The EEG was re-referenced offline to linked mastoids. The horizontal and vertical EOG was recorded bipolarly from electrodes at both eyes. Eye movement artifacts were corrected using the correction algorithm of Gratton et al. (1983). Electrode impedance was kept below 10 kΩ. The amplifier bandpass was 0.01–140 Hz. EEG and EOG were sampled continuously at 2048 Hz. Offline, the EEG was downsampled to 1000 Hz and segmented into stimulus-locked epochs using Vision Analyzer software (Brain Products, Munich). Epochs in which the amplitude exceeded ±150 µV were rejected. The ERPs were digitally filtered offline with a 17-Hz low-pass and a 0.05-Hz high-pass filter.
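Three of the offline steps above reduce to short array operations. The sketch below, written with NumPy, is not the Vision Analyzer implementation: it assumes that all channels, including M1 and M2, were recorded against one common online reference, it uses hypothetical function names, and it omits the Gratton ocular correction and the 0.05–17 Hz filtering. The resampling step only derives the integer factors; those could be passed to a polyphase resampler such as `scipy.signal.resample_poly(x, up, down)`.

```python
from fractions import Fraction

import numpy as np


def rereference_linked_mastoids(eeg: np.ndarray, m1: np.ndarray, m2: np.ndarray) -> np.ndarray:
    """Subtract the mean of the two mastoid signals from every channel.

    eeg: channels x samples; m1, m2: samples. Assumes one shared online reference.
    """
    return eeg - (m1 + m2) / 2.0


def resampling_factors(fs_in: int = 2048, fs_out: int = 1000) -> tuple[int, int]:
    """Smallest integer up/down factors for rational resampling (2048 Hz -> 1000 Hz)."""
    ratio = Fraction(fs_out, fs_in)
    return ratio.numerator, ratio.denominator


def epochs_to_keep(epochs_uv: np.ndarray, limit_uv: float = 150.0) -> np.ndarray:
    """Boolean mask over epochs (epochs x channels x samples, in µV).

    An epoch is rejected if any sample on any channel exceeds +/- limit_uv.
    """
    return np.abs(epochs_uv).max(axis=(1, 2)) <= limit_uv


print(resampling_factors())  # up/down factors for the 2048 -> 1000 Hz step
```

The ratio 1000/2048 reduces to 125/256, so the data would be upsampled by 125 and downsampled by 256.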

#### Data Analysis

The first trial of each test block, trials with responses faster than 100 ms or slower than 2500 ms, and error trials were excluded from the RT analysis. RTs and error rates (ERs) were subjected to an ANOVA with the within-subject factors Task-Set Transition (non-switch vs. switch) and Session (pre-test (t1) vs. post-test (t2)) and the between-subject factor Group (training vs. control).
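The three exclusion criteria translate directly into a filter applied before averaging. This is a minimal sketch with a hypothetical `Trial` record; keeping RTs of exactly 100 or 2500 ms is an assumption about boundary handling, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    index_in_block: int   # 0 = first trial of a test block
    rt_ms: float
    correct: bool
    switch: bool          # task-set transition of this trial


def usable_for_rt(trial: Trial, min_ms: float = 100.0, max_ms: float = 2500.0) -> bool:
    """Apply the three RT exclusion criteria: first trial, out-of-range RT, error."""
    return trial.index_in_block > 0 and trial.correct and min_ms <= trial.rt_ms <= max_ms


def mean_rt(trials: list[Trial], switch: bool) -> float:
    """Mean RT over usable trials of one transition type (raises if none remain)."""
    rts = [t.rt_ms for t in trials if t.switch == switch and usable_for_rt(t)]
    return sum(rts) / len(rts)


block = [
    Trial(0, 700, True, False),    # first trial of the block: excluded
    Trial(1, 90, True, False),     # anticipation (< 100 ms): excluded
    Trial(2, 800, True, False),
    Trial(3, 900, False, True),    # error: excluded
    Trial(4, 1000, True, True),
    Trial(5, 2600, True, True),    # too slow (> 2500 ms): excluded
]
print(mean_rt(block, switch=False), mean_rt(block, switch=True))
```

In this toy block only one non-switch trial (800 ms) and one switch trial (1000 ms) survive the exclusions.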

As the design was not fully balanced (CT between t1 and t2 and combined training between t2 and t3; see **Figure 1**), the follow-up effect in the COG group was assessed by planned comparisons of t2 vs. t3 and t1 vs. t3 within this group. These ANOVAs included the two within-subject factors Task-Set Transition and Session but not the between-subject factor Group. A significant effect of Session between t1 and t3 together with no difference between t2 and t3 would suggest a training-induced change that persists until the follow-up measure.

The training effect in the combined training group (REL + COG) was assessed by planned comparisons of t2 vs. t3 and t1 vs. t2. As in the COG group, ANOVAs with the two within-subject factors Task-Set Transition and Session were conducted within this group. A genuine training effect would be indicated by a change between t2 and t3 in the absence of a difference between t1 and t2 (which would reflect an effect of repeated measures).
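The two decision rules can be written out explicitly. In this sketch each planned comparison is reduced to a significant/non-significant flag for the Session effect; the function names and returned labels are illustrative, and the example inputs are the Session effects on error rates reported in the Results.

```python
def cog_followup(sig_t1_vs_t3: bool, sig_t2_vs_t3: bool) -> str:
    """COG group: a persisting change needs t1 != t3 and no change from t2 to t3."""
    if sig_t1_vs_t3 and not sig_t2_vs_t3:
        return "training-induced change persisting to follow-up"
    return "no persisting change demonstrated"


def combined_training(sig_t2_vs_t3: bool, sig_t1_vs_t2: bool) -> str:
    """REL + COG group: t2 -> t3 spans the training, t1 -> t2 only repeated testing."""
    if sig_t1_vs_t2:
        return "not separable from repeated-measures effects"
    if sig_t2_vs_t3:
        return "training effect"
    return "no training effect demonstrated"


# Session effects on error rates, as reported in the Results:
print(cog_followup(sig_t1_vs_t3=True, sig_t2_vs_t3=False))
print(combined_training(sig_t2_vs_t3=True, sig_t1_vs_t2=False))
```

Applied to the RTs of the REL + COG group, where the t1 vs. t2 comparison was already significant, the second rule returns the repeated-measures label, in line with the interpretation given for speed.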

Behavioral and ERP data were analyzed from the memory-based mixed block. The epoch length was 1100 ms for the target-locked ERP and 600 ms for the response-locked ERP. The ERP analysis was restricted to the post-target and post-response ERPs at the midline electrodes Cz and Pz, as the targeted components P2, N2, P3b, and Nc/Ne are usually maximal at the midline. Moreover, the a priori selection of these electrodes considerably reduced the complexity of the reported effects. In contrast to our previous studies, we analyzed the P2 at Cz rather than at Fz, as the P2 amplitude was clearly maximal at Cz. The P2 was measured as the most positive peak between 150 and 300 ms. N2, P3b, and Ne were analyzed at the same electrodes as in the previous reports (Gajewski et al., 2010b; Gajewski and Falkenstein, 2012, 2015a). The N2 was measured as the most negative peak at Cz in the time range 150–400 ms after target onset. The P3b was measured at Pz in the time range 300–600 ms after target onset. P2, N2, and P3b were measured in correct trials relative to a 100-ms baseline prior to target onset. Nc and Ne were measured as the most negative peak in the time range 0–150 ms after the response, relative to a 100-ms pre-response baseline. Nc and Ne were collapsed across non-switch and switch trials to obtain at least six error trials per subject (Steele et al., 2016). The ERPs were analyzed in the same way as the behavioral data.
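The peak measures described above follow one pattern: baseline-correct the averaged waveform, then take the extreme sample of the required polarity within the component's window. The NumPy sketch below illustrates this under stated assumptions: `peak` and `COMPONENTS` are hypothetical names, a 1000-Hz time axis in ms relative to the event is assumed, and peak picking for the P3b is assumed because the text gives only its time window.

```python
import numpy as np

# Electrode, search window (ms) and polarity per component, as described above.
# Peak picking for the P3b is an assumption: the text gives only its window.
COMPONENTS = {
    "P2": ("Cz", (150, 300), +1),
    "N2": ("Cz", (150, 400), -1),
    "P3b": ("Pz", (300, 600), +1),
    "Nc/Ne": ("Cz", (0, 150), -1),   # response-locked
}


def peak(erp_uv, times_ms, window_ms, polarity, baseline_ms=(-100, 0)):
    """Baseline-correct an averaged waveform and return (latency_ms, amplitude_uv)
    of the most positive (polarity=+1) or most negative (polarity=-1) sample
    inside window_ms. times_ms are relative to the target or response event."""
    erp = np.asarray(erp_uv, dtype=float)
    t = np.asarray(times_ms, dtype=float)
    in_base = (t >= baseline_ms[0]) & (t < baseline_ms[1])
    corrected = erp - erp[in_base].mean()
    in_win = np.flatnonzero((t >= window_ms[0]) & (t <= window_ms[1]))
    best = in_win[np.argmax(polarity * corrected[in_win])]
    return float(t[best]), float(corrected[best])


# Example on a synthetic waveform sampled at 1000 Hz (1 sample per ms).
times = np.arange(-100, 1000)
wave = np.zeros(times.size)
wave[times == 200] = 5.0          # positive deflection in the P2 window
_, window, sign = COMPONENTS["P2"]
print(peak(wave, times, window, sign))
```

On this synthetic waveform the P2 search returns a latency of 200 ms and an amplitude of 5 µV; the same function serves the response-locked Nc/Ne when `times_ms` is aligned to the response.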

#### RESULTS

#### Behavioral Data

Mean RTs and ERs for the non-switch and switch trials for the training and the Control Group are presented in **Figure 3**.

For the analysis of response times, error trials (8.5 and 6.0%) and outliers (0.9 and 0.4%) for the training and Control Group, respectively, were discarded.

#### Reaction Times

For the group comparison between t1 and t2, mean RTs were shorter in non-switch than in switch trials (750 vs. 868 ms), reflecting substantial switch costs [F(1,56) = 66.8, p < 0.0001, η <sup>2</sup> = 0.544]. The factor Session was significant owing to faster RTs at the post- than at the pre-measure [776 vs. 846 ms; F(1,56) = 8.8, p < 0.005, η <sup>2</sup> = 0.137]. Neither factor was modulated by Group.
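The reported η<sup>2</sup> values appear to be partial eta squared. Assuming so, each can be recovered from the F statistic and its degrees of freedom as η<sup>2</sup><sub>p</sub> = F·df<sub>effect</sub> / (F·df<sub>effect</sub> + df<sub>error</sub>), which offers a quick consistency check on the reported statistics. The function name is illustrative.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Switch-cost effect on RT, F(1,56) = 66.8:
print(round(partial_eta_squared(66.8, 1, 56), 3))
```

For the switch-cost effect this gives 66.8 / (66.8 + 56) ≈ 0.544, and for the follow-up effect of Task-Set Transition, F(1,25) = 39.1 gives ≈ 0.610, both matching the reported values.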

The follow-up analysis in the COG group comparing Sessions t2 and t3 again confirmed an effect of Task-Set Transition [F(1,25) = 39.1, p < 0.0001, η <sup>2</sup> = 0.610], but neither the effect of Session nor the interaction between Task-Set Transition and Session was significant (both F's < 1). The comparison between t3 and t1 showed, apart from the effect of Task-Set Transition [F(1,25) = 24.2, p < 0.0001, η <sup>2</sup> = 0.493], only non-significant trends for the effect of Session [F(1,25) = 2.7, p = 0.114, η <sup>2</sup> = 0.097] and the interaction of both factors [F(1,25) = 2.5, p = 0.123, η <sup>2</sup> = 0.092], suggesting no stable training-related change in RTs.

To assess a possible training effect in the REL + COG group, t2 was compared with t3. The ANOVA yielded an effect of Task-Set Transition [F(1,30) = 40.8, p < 0.0001, η <sup>2</sup> = 0.557], which was not modulated by Session (F < 1). However, there was a trend for Session, suggesting faster RTs at t3 than at t2 in this group [701 vs. 740 ms, F(1,30) = 3.4, p = 0.076, η <sup>2</sup> = 0.101]. For the comparison of t2 vs. t1, the effect of Task-Set Transition was again significant [F(1,30) = 62.3, p < 0.0001, η <sup>2</sup> = 0.675], as was the effect of Session [F(1,30) = 11.0, p < 0.005, η <sup>2</sup> = 0.269]. The interaction did not reach significance [F(1,30) = 1.4, p = 0.245, η <sup>2</sup> = 0.045]. Overall, no training-related changes in speed were observed; RT reductions were due to repeated measures.

#### Error Rates

The main effect of Task-Set Transition was significant [F(1,56) = 76.5, p < 0.0001, η <sup>2</sup> = 0.577], indicating switch costs in accuracy (10.5 vs. 18.1% for non-switch and switch trials, respectively). The ER was lower at t2 than at t1, resulting in an effect of Session [12.4 vs. 16.2%; F(1,56) = 5.9, p < 0.05, η <sup>2</sup> = 0.096]. Importantly, Session was modulated by Group [F(1,56) = 12.2, p < 0.001, η <sup>2</sup> = 0.179] owing to a reduction of ERs from t1 to t2 in the COG group (from 20.7 to 11.5%) and a slight increase in the Control Group (from 11.7 to 13.4%).

The follow-up analysis in the COG group revealed an effect of Task-Set Transition [F(1,25) = 49.8, p < 0.0001, η <sup>2</sup> = 0.666] and no effect of Session or interaction of both factors (both F's < 1). Comparison between t3 and t1 yielded an effect of Task-Set Transition [F(1,25) = 36.7, p < 0.0001, η <sup>2</sup> = 0.595] and an effect of Session [F(1,25) = 4.7, p < 0.05, η <sup>2</sup> = 0.159], whereas no interaction between Task-Set Transition and Session was found (F < 1).

After the training (t2 vs. t3), the REL + COG group showed, apart from the expected effect of Task-Set Transition [F(1,30) = 36.5, p < 0.0001, η <sup>2</sup> = 0.549], a main effect of Session [F(1,30) = 10.7, p < 0.005, η <sup>2</sup> = 0.264], whereas the interaction Session × Task-Set Transition did not reach significance [F(1,30) = 1.4, p = 0.236, η <sup>2</sup> = 0.049]. The comparison of t1 vs. t2 revealed neither an effect of Session nor an interaction Session × Task-Set Transition, but only the effect of Task-Set Transition [F(1,30) = 33.2, p < 0.0001, η <sup>2</sup> = 0.518].

To analyze which participants benefitted from the CT, we split the COG group into low and high performers (i.e., ERs in the task switch condition at t1 above vs. below the median). The analysis revealed an interaction Session × Performance [F(1,24) = 10.1, p < 0.005, η <sup>2</sup> = 0.297] and a second-order interaction Session × Task-Set Transition × Performance [F(1,24) = 13.8, p < 0.001, η <sup>2</sup> = 0.365], suggesting that the low performers improved their accuracy between t1 and t2, particularly in task switch trials (ERs of 41.2 vs. 20.1%), whereas the high performers did not improve (10.0 vs. 10.4%). The benefit was less pronounced in non-switch trials (23.6 vs. 12.9% in the low performers and 8 vs. 3.3% in the high performers). Moreover, age was differentially associated with training gains. Younger trainees (40–47 years) benefitted more strongly from the training (14% accuracy improvement) than their older colleagues (47–57 years, 2% improvement), resulting in an interaction Age Group × Session [F(1,24) = 5.5, p < 0.05, η <sup>2</sup> = 0.187].
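The performance grouping is a median split on the t1 error rate in switch trials, where a higher ER means lower performance. The sketch below uses hypothetical participant IDs and error rates; assigning participants exactly at the median to the high performers is an assumption, as the text does not state a tie rule.

```python
from statistics import median


def median_split(switch_er_t1: dict[str, float]) -> dict[str, str]:
    """Label each participant 'low' or 'high' performer from switch-trial ERs at t1.

    Higher error rate = lower performance. Participants exactly at the median are
    labelled 'high' here; this tie rule is an assumption.
    """
    cut = median(switch_er_t1.values())
    return {pid: ("low" if er > cut else "high") for pid, er in switch_er_t1.items()}


# Hypothetical error rates (%) for four trainees:
print(median_split({"p01": 5.0, "p02": 10.0, "p03": 20.0, "p04": 40.0}))
```

With these toy values the median is 15%, so the two trainees with ERs above it form the low-performer group.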

In sum, ERs were reduced in the COG group after training, and this reduction persisted at the follow-up measure. In the Waiting Control Group, no changes were found between t1 and t2, but again a clear reduction of ERs was observed after the combined training at t3. No training-specific changes in local switch costs in accuracy were observed. Initially low performers benefitted more from the training than high performers, particularly in task switch trials. Younger trainees improved their accuracy after training more than older ones.

#### ERP Data

Grand averages for target-locked ERP-waveforms at Cz and Pz for the two groups are shown in **Figures 4** and **5**.

To focus on training-related changes in executive functions, only effects or interactions involving Task-Set Transition, Session, and Group are reported below.


#### P2

The ANOVA analyzing the effect of the COG training relative to the Waiting Control Group revealed an interaction Session × Group for P2 latency [F(1,56) = 4.2, p < 0.05, η <sup>2</sup> = 0.069]. This interaction was due to a reduced P2 latency at t2 relative to t1 in the COG group (158 vs. 167 ms), whereas no such effect was observed in the Control Group (161 vs. 158 ms). There was also a trend for the interaction Session × Group × Task-Set Transition [F(1,56) = 3.2, p = 0.078, η <sup>2</sup> = 0.055], suggesting that the latency reduction occurred mainly in task switch trials (14 ms) and hardly in non-switch trials (5 ms). P2 amplitude showed an effect of Session [F(1,56) = 10.7, p < 0.005, η <sup>2</sup> = 0.161], indicating a general reduction of the P2 from t1 to t2 (5.3 vs. 4.7 µV), which was additionally modulated by Group [F(1,56) = 9.6, p < 0.005, η <sup>2</sup> = 0.147]. This interaction was based on the amplitude reduction between t1 and t2 in the COG group (5.4 vs. 4.2 µV), while no change was observed in the Waiting Control Group (5.1 vs. 5.1 µV). The assessment of the follow-up effect on P2 latency and amplitude (t2 vs. t3) yielded neither an effect of Session nor an interaction Session × Task-Set Transition (all F's < 1). However, the comparison between t1 and t3 revealed a substantial effect of Session on P2 amplitude (5.4 vs. 3.9 µV; F(1,23) = 19.0, p < 0.0001, η <sup>2</sup> = 0.354) but not on P2 latency (F < 1).

The comparison between Sessions t2 and t3 for the Waiting Control Group (REL + COG) showed, for P2 latency, a main effect of Task-Set Transition owing to a later P2 in non-switch than in switch trials [168 vs. 159 ms; F(1,30) = 6.0, p < 0.05, η <sup>2</sup> = 0.167] and an interaction Task-Set Transition × Session [F(1,30) = 4.7, p < 0.05, η <sup>2</sup> = 0.127] owing to a later P2 at t3 than at t2 in non-switch trials (178 vs. 159 ms) and a shorter latency at t3 than at t2 in task switch trials (155 vs. 162 ms). The P2 amplitude showed a large reduction after the combined REL + COG training (5.2 vs. 3.6 µV), resulting in a main effect of Session [F(1,30) = 54.5, p < 0.0001, η <sup>2</sup> = 0.645]. No effect of Task-Set Transition or interaction Task-Set Transition × Session was found (both F's < 1). Finally, to assess possible effects of repeated measures, t1 and t2 were compared. The ANOVA showed no significant effects or interactions for either P2 latency or amplitude.

In sum, the P2 amplitude was substantially reduced and the latency partly shortened both after the pure cognitive training (COG) and the combined relaxation and cognitive training (REL + COG). The P2 amplitude reduction remained stable during the follow-up measure.

#### N2

The ANOVA analyzing effects of the CT on N2 latency showed an effect of Task-Set Transition [F(1,56) = 5.9, p < 0.05, η <sup>2</sup> = 0.096], reflecting a prolonged N2 latency in switch vs. non-switch trials (283 vs. 272 ms). There was a strong trend for Session [F(1,56) = 3.8, p = 0.054, η <sup>2</sup> = 0.065] owing to a shorter latency at t2 than at t1 (172 vs. 184 ms). The interaction Session × Group did not reach significance [F(1,56) = 2.3, p = 0.13, η <sup>2</sup> = 0.040].

With respect to N2 amplitude, a main effect of Session indicated a larger (more negative) N2 at t2 than at t1 [−1.0 vs. −0.5 µV; F(1,56) = 6.2, p < 0.05, η <sup>2</sup> = 0.100]. Most importantly, Session was modulated by Group [F(1,56) = 5.7, p < 0.05, η <sup>2</sup> = 0.092], suggesting an N2 increase after COG training at t2 relative to t1 (−1.9 vs. −1.0 µV), while no difference was observed in the Waiting Control Group (−0.1 vs. −0.1 µV). The follow-up measure in the COG group showed no significant effects or interactions for N2 latency, whereas N2 amplitude was significantly reduced at t3 relative to t2 (−0.6 vs. −1.9 µV), resulting in an effect of Session [F(1,23) = 12.1, p < 0.005, η <sup>2</sup> = 0.345]. The comparison between t1 and t3 did not yield any significant effects or interactions for N2 amplitude.

Assessment of the training effects in the REL + COG group showed an effect of Session on N2 latency [F(1,56) = 6.5, p < 0.05, η <sup>2</sup> = 0.183], indicating a shorter N2 latency at t3 than at t2 (248 vs. 267 ms). No further effect or interaction reached significance. The comparison between t1 and t2, reflecting effects of repeated measures, showed only an effect of Task-Set Transition owing to a larger N2 in task switch than in non-switch trials [−0.3 vs. 0.1 µV; F(1,56) = 5.8, p < 0.05, η <sup>2</sup> = 0.157]. No effect of Session or interaction Session × Task-Set Transition was found.

In sum, N2 amplitude was increased after CT whereas no difference was found in the Waiting Control Group. However, the effect on N2 amplitude diminished in the follow-up measure. No significant training effect on N2 latency in the COG group was found. However, the N2 latency was shorter in the REL + COG group after training.

#### P3b

The ANOVA did not show any effects or interactions for P3b latency. Analysis of P3b amplitude revealed an effect of Task-Set Transition, indicating a smaller P3b in switch than in non-switch trials [5.1 vs. 5.9 µV; F(1,56) = 12.8, p < 0.001, η <sup>2</sup> = 0.187], and a trend for Session, showing a slight increase in P3b amplitude from t1 to t2 [6.2 vs. 6.8 µV; F(1,56) = 3.5, p = 0.06, η <sup>2</sup> = 0.059]. There was no interaction with Group (F's < 1). However, the main effect of Group showed a larger P3b in the control than in the training group [6.5 vs. 4.5 µV; F(1,56) = 6.5, p < 0.05, η <sup>2</sup> = 0.105]. The follow-up measure showed no substantial changes of the P3b between t2 and t3.

The ANOVA analyzing the effects of the combined training (REL + COG) yielded an effect of Session, indicating a P3b reduction from t2 to t3 [6.9 vs. 5.8 µV; F(1,30) = 7.5, p < 0.01, η <sup>2</sup> = 0.201]. No effects on P3b latency were found. Finally, the comparison of t1 vs. t2 in this group showed, apart from the standard effect of Task-Set Transition [6.9 vs. 6.0 µV; F(1,31) = 8.8, p < 0.01, η <sup>2</sup> = 0.222], a trend for Session, suggesting a P3b increase from t1 to t2 [6.2 vs. 6.8 µV; F(1,31) = 3.7, p = 0.06, η <sup>2</sup> = 0.108].

Taken together, the P3b amplitude and latency did not vary consistently as a function of training. Its amplitude increased at t2 and decreased at t3. The Waiting Control Group showed generally larger P3b amplitude than the COG group.

#### Nc and Ne

The correct response negativity (Nc) and error negativity (Ne) at Cz (**Figure 6**) were pooled across switch and non-switch trials. In each group, 24 participants made a sufficient number of erroneous responses. There were no effects or interactions for Nc or Ne latency.

The ANOVA including the factors Response Correctness (Nc, Ne), Session, and Group showed an effect of Correctness [F(1,46) = 26.8, p < 0.0001, η <sup>2</sup> = 0.368] with larger amplitudes in error than in correct trials (−3.1 vs. −1.0 µV) and an interaction Correctness × Session [F(1,46) = 6.6, p < 0.05, η <sup>2</sup> = 0.125] owing to an increase in the Ne (−2.6 vs. −3.6 µV) but not in the Nc (−1.3 vs. −0.7 µV) between t1 and t2. Most importantly, the interaction of all three factors was also significant [F(1,46) = 4.6, p < 0.05, η <sup>2</sup> = 0.091], indicating a larger increase in the Ne after the CT, whereas no difference was found in the Control Group. A comparison between t1 and t2 for the Ne in the COG group confirmed this result [−1.8 vs. −3.6 µV; F(1,23) = 4.3, p < 0.05, η <sup>2</sup> = 0.158], whereas no difference between t1 and t2 was found in the Waiting Control Group (−3.4 vs. −3.6 µV; F < 1). The Ne amplitude in the COG group remained at the same level at the follow-up measure (−3.7 µV; F < 1). An additional comparison between t1 and t3 revealed a trend [−1.8 vs. −3.6 µV; F(1,20) = 3.5, p = 0.07, η <sup>2</sup> = 0.150].

For the combined REL + COG group, the ANOVA again showed an effect of Correctness owing to larger amplitudes in error than in correct trials [−5.6 vs. −1.2 µV; F(1,18) = 19.5, p < 0.0001, η <sup>2</sup> = 0.520], an effect of Session, suggesting an amplitude increase between t2 and t3 [−2.6 vs. −4.2 µV; F(1,18) = 4.8, p < 0.05, η <sup>2</sup> = 0.210], and an interaction Correctness × Session [F(1,18) = 7.1, p < 0.05, η <sup>2</sup> = 0.284], showing a strong increase in the Ne between t2 and t3 [−3.8 vs. −7.3 µV; F(1,18) = 6.3, p < 0.05, η <sup>2</sup> = 0.259], while the Nc showed similar amplitudes at t2 and t3 (−1.4 vs. −1.0 µV). The comparison between t1 and t3 corroborated the training-related increase in the Ne [F(1,18) = 15.4, p < 0.001, η <sup>2</sup> = 0.448], which cannot be attributed to repeated measures, as indicated by the non-significant comparison between t1 and t2 (F < 1).

In sum, the Ne after erroneous responses was, as expected, larger than the Nc after correct responses. The Ne (but not the Nc) was larger after the CT and remained stable during the follow-up measure while no changes were found in the Waiting Control Group. However, after the combined training (REL + COG) this group also showed a strong enhancement of the Ne.

#### DISCUSSION

The aim of the present study was to enhance executive control processes in middle-aged industrial employees with unchallenging, repetitive work by means of a 3-month, trainer-guided CT or a combined relaxation/stress management and cognitive intervention. Pre- and post-measures in the Cognitive Training Group and the Waiting Control Group allowed us to assess the effects of the CT. A follow-up measure was included to assess the residual effects of training 3 months after the training had ended. After the post-measure, the Waiting Control Group received a combined relaxation and CT (REL + COG), also lasting 3 months. The effects of the combined training were evaluated after the training had ended.

Training effects were assessed using a complex memory-based task switching paradigm that has been shown to be sensitive to a number of cognitive and physical interventions (Gajewski and Falkenstein, 2012, 2015a). In accordance with the previous study with seniors (Gajewski and Falkenstein, 2012), CT substantially improved accuracy in this task, whereas RTs were not affected. The new finding of the present study is that this improvement remained stable 3 months after the training ended. Additional support for this finding comes from the Waiting Control Group, which showed a similar improvement in accuracy after the REL + COG training. However, one point may complicate the interpretation of the training effects in the present study. At first glance, the group difference in ERs at t1 may call the training effect into question, as the COG group, owing to its poorer initial performance (despite randomization), had more room to reduce ERs than the REL + COG group. However, the statistical analysis showed significant improvements in accuracy in both the COG group and the REL + COG group. In other words, the REL + COG group did not reduce ERs between t1 and t2 but reduced them significantly after training, indicating no effect of repeated measures and a genuine training effect even though initial accuracy was high.

Electrophysiological data provided further insights into the specific processing steps and cognitive subcomponents influenced by training. The target-locked ERPs yielded a number of interesting findings concerning the effects of CT. First, the P2 amplitude was consistently reduced and its latency shortened both after CT alone and after combined training. This attenuation remained stable during the follow-up measure and was not caused by repeated measures. The reduced amplitude and latency after training may reflect less effort during task-set retrieval, leading to faster activation of task-relevant stimulus–response sets (cf. Finke et al., 2011; Schapkin et al., 2014).

The next component influenced by training in the present study was the frontocentral N2. Our previous study with seniors had already shown the sensitivity of the N2 to CT (Gajewski and Falkenstein, 2012). Apart from the commonly observed larger N2 in task switch than in non-switch trials, the N2 amplitude was increased both after the pure CT and after the combined cognitive and relaxation training. The enhanced N2 suggests more intense response selection and may underlie the improved accuracy. No training effects on N2 latency were found. A surprising finding of this study was the attenuation of the amplitude enhancement after a 3-month training-free period. This may reflect either an adaptation of the neuronal structures underlying response selection or a gradual decay of the training-induced effect. Both possibilities should be considered in future studies.

The P3b showed a lower amplitude in switch vs. non-switch trials, as consistently reported previously (Barceló et al., 2000; Rushworth et al., 2002; Jost et al., 2008; Gajewski and Falkenstein, 2011, 2012, 2015a; Karayanidis et al., 2011). However, the P3b did not vary as a function of training. Its amplitude increased at the post-measure and decreased at the follow-up measure, but this effect was probably not related to the training but rather to the repeated application of the same task and corresponding adaptation effects. As the P3b is a conglomerate of diverse neuronal mechanisms (e.g., Polich, 2007), a simpler paradigm such as the oddball paradigm may be more appropriate for analyzing P3b changes in the course of CT.

Finally, the most consistent training-related results were provided by the analysis of the error negativity (Ne). As expected, the Ne was more pronounced than the corresponding negativity on correct responses (Nc). Most importantly, the Ne (but not the Nc) was substantially increased after CT in both groups (COG, COG + REL) and remained stable at the follow-up measurement for the COG group, while no changes were found in the Waiting Control Group. These results corroborate findings from our previous study with seniors (Gajewski and Falkenstein, 2012). The improvement in accuracy was previously interpreted as a consequence of an enhanced response selection process, reflected by the N2. This may lead to a higher awareness of the correct response, which in turn produces a strong signal when the expectation of responding correctly is violated.

The present study extends previous results obtained with middle-aged employees of the same large car factory, which showed cognitive impairments in assembly line workers compared with flexibly working employees responsible for the service, maintenance and repair of machines (Gajewski et al., 2010b). The assembly line employees showed higher ERs and longer RTs in the same task as in the present study, and reduced P3b and Ne relative to their flexibly working colleagues. In the present study, the initial cognitive status of the participants was comparable to that of the participants in the previous study, as both were drawn from the same population of middle-aged assembly line workers. The training effects obtained here indicate that 3 months of cognitive training, alone or combined with relaxation training, can reduce cognitive decline and raise cognitive performance to the level of flexibly working employees.

Unfortunately, these findings do not allow conclusions regarding the employability of older workers, as no far-transfer tasks related to the work were used. However, it is plausible that the observed gains in cognitive functions may improve the ability to learn new content and work processes, enhance self-confidence and help to strengthen individual potential at work. Therefore, future studies in occupational environments should also evaluate far transfer by measuring work efficiency, individual performance at work, risk of workplace injuries and work-related illness.

In summary, CT appears to be a highly promising tool for improving mental fitness and employability in older workers.

#### Limitations

The present study has a number of limitations that must be acknowledged. First, considerable effort was needed to recruit the sample, as many industrial workers in this car factory were unwilling to participate in a relatively time-consuming study before or after their work. We therefore assume that the study population was not fully representative of the factory population. Second, the age of the participants was rather homogeneous (mid-forties). It would be interesting to analyze training effects in older employees (60 years and above), as older workers may show larger cognitive deficits, which could be remediated to improve or maintain their quality of life and employability until retirement. Third, the experimental design was not fully balanced, and the training effects in the COG and REL + COG Groups are not directly comparable. Fourth, the effects of CT were not specific to switching ability, as no differential effects in nonswitch and switch trials were found. A reduction of local switch costs would indicate improved switching ability, whereas performance enhancement in nonswitch trials would indicate reduced mixing costs. Indeed, mixing costs are a more sensitive parameter with respect to physical training or CT (Gajewski and Falkenstein, 2012, 2015a). In other words, it is quite possible that other functions were improved by the training, as a wide range of cognitive abilities was trained. Our assumption is that CT does not affect switching ability per se but rather enhances the updating of relevant task information and working memory capacity, which improves the ability to maintain a long task sequence, as in the current paradigm, and to activate the relevant task-set at the right time. This is supported by the observation that errors in this paradigm are mainly due to lapses in the representation of the task sequence and the activation of the wrong task-set.
Finally, and related to the previous point, the CT was qualitatively heterogeneous and trained a large number of cognitive functions. It is therefore not possible to determine which specific training components affected our data and which were ineffective. Because this study was conducted in an applied context, controlling a large number of confounding factors was not possible. Future studies with more controlled training regimes should be conducted to establish the most efficient training conditions with respect to duration, intensity and content.
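The distinction between local switch costs and mixing costs drawn in the fourth limitation can be made concrete with a small calculation. The sketch below assumes the conventional definitions (local switch costs: mean RT in switch trials minus mean RT in nonswitch trials of mixed blocks; mixing costs: mean RT in nonswitch trials of mixed blocks minus mean RT in single-task blocks). The function name, trial labels and RT values are hypothetical, and the paradigm used in the study may organize its blocks differently.

```python
from statistics import mean


def switch_and_mixing_costs(trials):
    """Return (local_switch_cost, mixing_cost) in ms.

    trials: iterable of (block, trial_type, rt_ms) tuples, where block is
    "single" or "mixed" and trial_type is "switch" or "nonswitch".
    Single-task trials are labelled "nonswitch" by convention.
    """
    trials = list(trials)  # allow generators, which can only be read once
    single = [rt for block, _, rt in trials if block == "single"]
    mixed_switch = [rt for block, kind, rt in trials
                    if block == "mixed" and kind == "switch"]
    mixed_nonswitch = [rt for block, kind, rt in trials
                       if block == "mixed" and kind == "nonswitch"]
    # Local switch cost: switch vs. nonswitch trials within mixed blocks.
    local_switch_cost = mean(mixed_switch) - mean(mixed_nonswitch)
    # Mixing cost: nonswitch trials in mixed blocks vs. single-task blocks.
    mixing_cost = mean(mixed_nonswitch) - mean(single)
    return local_switch_cost, mixing_cost
```

Under these definitions, a training gain confined to nonswitch trials would lower the mixing cost while leaving the local switch cost unchanged or even larger, which is why the two measures can dissociate.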

# CONCLUSION

The present study was designed to compensate for cognitive deficits in middle-aged workers with unchallenging work. Training-related gains in cognitive performance were observed as improved accuracy, suggesting enhanced maintenance of a complex task sequence in working memory. This performance benefit was accompanied by several electrophysiological effects: an amplitude decrease and latency reduction of a frontocentral P2 related to the retrieval of task-sets, and a latency decrease and amplitude increase of the frontocentral N2 associated with the selection of the correct response. The P2 effect persisted 3 months after the end of training, whereas the N2 amplitude effect disappeared.

Furthermore, an increase in the error negativity (Ne), associated with error detection and monitoring, was found, which also persisted after a 3-month training-free period. In contrast, no performance or ERP effects were observed in the Control Group. However, after the Waiting Control Group received the combined cognitive and relaxation training, the same performance and ERP changes were observed as in the Cognitive Training Group, suggesting that these effects are unequivocally due to training and not to repeated measurement. These findings suggest that formal CT may indeed ameliorate cognitive decline, which is also reflected in a sustained improvement of brain activity.

# AUTHOR CONTRIBUTIONS


PG: Conception and design, data analysis, interpretation of the data, writing, and final approval. GF and MF: Conception and design, interpretation of the data, writing, and final approval.

## FUNDING

The research reported in this article was conducted as a PFIFF2 project (program for improving cognitive abilities in older employees) within the framework of INQA – New Quality of Work, which was funded by the Federal Ministry of Labour and Social Affairs (BMAS). The publication of this article was supported by the Open Access Fund of the Leibniz Association and by the Open Access Fund of the Technical University of Dortmund.

#### ACKNOWLEDGMENTS

We thank Dr. Thomas Koiky and Dieter Welwei for assistance in organizing the study, Claudia Wipking, Ines Mombrei, Christiane Westedt and Rita Willemssen for conducting the testing, Rita Pfeiffer for conducting the training and Ludger Blanke for developing the software and technical support.





**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Gajewski, Freude and Falkenstein. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Spatial Frequency Training Modulates Neural Face Processing: Learning Transfers from Low- to High-Level Visual Features

Judith C. Peters<sup>1,2</sup>\*, Carlijn van den Boomen<sup>3,4</sup> and Chantal Kemner<sup>3,4</sup>

<sup>1</sup>Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, Netherlands, <sup>2</sup>Department of Neuroimaging and Neuromodeling, Netherlands Institute for Neuroscience, Institute of the Royal Netherlands Academy of Arts and Sciences (KNAW), Amsterdam, Netherlands, <sup>3</sup>Department of Experimental Psychology, Helmholtz Institute, Utrecht, Netherlands, <sup>4</sup>Department of Developmental Psychology, Utrecht University, Utrecht, Netherlands

Perception of visual stimuli improves with training, but improvements are specific to the trained stimuli, rendering the development of generic training programs challenging. It remains unknown to what extent training of low-level visual features transfers to high-level visual perception, and whether this is accompanied by neuroplastic changes. The current event-related potential (ERP) study showed that training-induced increased sensitivity to a low-level feature, namely low spatial frequency (LSF), alters neural processing of this feature in high-level visual stimuli. Specifically, neural activity related to face processing (N170) was decreased for low (trained) but not high (untrained) SF content in faces following LSF training. These novel results suggest (1) that SF discrimination learning transfers from simple stimuli to complex objects, and (2) that training the use of specific SF information affects neural processing of facial information. These findings may open up a new avenue to improve face recognition skills in individuals with atypical SF processing, such as in cataract or Autism Spectrum Disorder (ASD).

#### Edited by:

Soledad Ballesteros, National University of Distance Education, Spain

#### Reviewed by:

Elvire Vaucher, Université de Montréal, Canada

Joseph Allen Harris, Otto-von-Guericke University Magdeburg, Germany

#### \*Correspondence:

Judith C. Peters j.peters@maastrichtuniversity.nl

Received: 11 October 2016 Accepted: 03 January 2017 Published: 18 January 2017

#### Citation:

Peters JC, van den Boomen C and Kemner C (2017) Spatial Frequency Training Modulates Neural Face Processing: Learning Transfers from Low- to High-Level Visual Features. Front. Hum. Neurosci. 11:1. doi: 10.3389/fnhum.2017.00001

Keywords: ERP, face processing, spatial frequency, learning, neuroplasticity, ASD

# INTRODUCTION

Perception of visual stimuli improves with training, but the improvement is generally highly specific to the trained stimulus set or feature. For example, learning to distinguish individuals in one set of face identities does not transfer to other face identities (e.g., Hancock et al., 2000) or across emotional expressions (Calder et al., 2000). This generally also holds for low-level features: training improves performance on a wide range of perceptual tasks (see Fine and Jacobs, 2002; Watanabe and Sasaki, 2015 for review), including discrimination of orientation (e.g., Schoups et al., 2001), texture (Karni and Sagi, 1991), coherent motion (Watanabe et al., 2001) and spatial frequency (SF; Fiorentini and Berardi, 1980), but does not transfer to other stimulus dimensions (Yu et al., 2004), stimuli (Fahle, 2004) or visual field locations (e.g., Karni and Sagi, 1991; Shiu and Pashler, 1992).

However, it remains unknown to what extent training-induced improvements in sensitivity to low-level visual features, such as spatial frequency (SF; the number of black-to-white transitions per degree of visual angle in an image), transfer to complex stimuli. Here, we study whether improved sensitivity to low SF (LSF) content, achieved by learning to discriminate black-white stripes (gratings), affects neural LSF processing in faces. LSF information in faces contains the pivotal global information necessary for proficient holistic face processing (Goffaux et al., 2005; Peters et al., 2013). In adult face perception, information carried by different SF bands is combined following a coarse-to-fine sequence (Goffaux et al., 2011; see Ruiz-Soler and Beltran, 2006 for review). LSF conveys highly important coarse information (e.g., emotional expressions) that is extracted first, before more fine-grained high SF (HSF) information is examined for further facial cues (related to, for example, facial age; see the LSF- and HSF-filtered faces in the right panel of **Figure 1B**).

If training LSF sensitivity indeed optimizes the use of LSF information during face processing, such an approach may lead to the development of new training programs to improve (emotional) face recognition abilities. Such training could aid individuals with Autism Spectrum Disorder (ASD), who have a detrimental bias towards processing information conveyed by HSF over LSF ranges, resulting in impaired recognition of faces and emotional expressions (Deruelle et al., 2004, 2008; Vlamings et al., 2010). Although training programs are available to improve face processing skills in children with ASD (Silver and Oakes, 2001; Tanaka et al., 2003), learning a particular set of faces does not transfer easily to other face identities, expressions or general contexts. Moreover, face learning is a slow process (Faja et al., 2008) compared with learning of low-level visual features such as SF (e.g., Fiorentini and Berardi, 1980; Huang et al., 2008). Finally, low-level feature learning effects are long-lasting and involve a neural reorganization in the visual system (Karni and Sagi, 1993; Schoups et al., 2001), which is essential for training programs aiming at long-term improvements in face perception skills.

The present study investigates whether such a learning transfer is feasible. To this end, we examined four unresolved questions related to SF training:


between gratings are specific to the trained SF range (Fiorentini and Berardi, 1980; Huang et al., 2008). In the same vein, we expect that training-induced LSF sensitivity will only affect processing of face images containing the trained SF band. Thus, we expect training effects for LSF faces but not for HSF faces in the present study.


visual areas (Ghuman et al., 2014), regardless of whether the face is presented in the left or right hemifield. Therefore, the N170 might be influenced by neuroplastic changes in lower visual areas in both hemispheres; moreover, both left- and right-hemispheric N170 activity might be affected by LSF training. The right-hemispheric lateralization of face perception (Ojemann et al., 1992) might, however, make the training effects more pronounced for N170 responses in the right hemisphere.

In sum, we expect that LSF training will improve LSF sensitivity, leading to skilled processing of LSF (but not HSF) content in faces. At the neural level, this should be reflected in a reduced N170 after training for LSF faces presented in the trained hemifield. Such a training-induced reduction is not expected for HSF faces or for LSF faces presented in the untrained hemifield. Overall, our findings confirm our expectations, suggesting the following answers to the questions raised above: (1) Improved LSF sensitivity acquired by learning to discriminate SF variations in simple stimuli (gratings) does transfer to complex objects such as faces. (2) This increased LSF sensitivity exclusively modifies processing of LSF and not HSF information in faces. (3) At the neural level, such training-induced modifications of LSF processing in faces are mirrored in reduced post-training N170 responses. (4) The observed N170 effect is specific to the trained retinotopic location (i.e., the N170 reduction only occurs for LSF faces presented in the trained hemifield).

# MATERIALS AND METHODS

## Participants

Twenty healthy adults (10 males; age 18–30) with normal or corrected-to-normal visual acuity participated in two ERP measurements and three (n = 13) or four (n = 7) psychophysical training sessions, for financial compensation or as part of their Psychology curriculum. One participant did not complete the last session, and the corresponding electroencephalogram (EEG) data were therefore excluded from further analyses. This study was carried out in accordance with the recommendations of the local ethics committee of the Faculty of Psychology and Neuroscience, Maastricht University. All subjects gave written informed consent in accordance with the Declaration of Helsinki.

## Experimental Procedure

**Figure 1A** illustrates the timeline of the experiment. On day 1, a baseline ERP measurement ("pre-training EEG") was performed, in which the subject performed an emotion categorization task and an oddball detection task on HSF and LSF faces presented in the left or right hemifield (see **Figure 1B** and below). Subsequently, subjects participated in 25-min sessions on days 2, 4 and 7, in which they trained LSF discrimination on grating stimuli presented in the left visual field (**Figure 1B**). Eight participants received the second training session on day 5 instead of day 4. The SF difference between the target and reference gratings was adapted to the participant's performance (staircase tracking 84% accuracy), resulting in improved LSF sensitivity as subjects learned to discriminate very small differences in LSF content. Finally, a second EEG measurement ("post-training EEG") was carried out on day 8, in which subjects performed the same tasks as in the pre-training EEG session. Task order in the EEG sessions was counterbalanced across participants.

All subjects were tested individually. They were comfortably seated in a dimly lit room shielded by a Faraday cage and monitored by cameras. Subjects were reminded throughout the sessions to maintain fixation on the middle of the screen (and to limit unnecessary movements and eyeblinks during EEG recordings). Stimuli were presented on a 21-inch CRT screen (1280 × 1024 resolution, 32-bit color; refresh rate 75 Hz) using the Presentation software package (v. 12.1; Neurobehavioral Systems, San Francisco, CA, USA). Subjects viewed the stimuli at a distance of 106 cm, and viewing position was stabilized using a chin rest.
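Stimulus sizes in this article are given in degrees of visual angle at a 106 cm viewing distance, and durations must be realized in whole frames of the 75 Hz display. The sketch below converts visual angle to on-screen size and durations to frames. The 40 cm visible screen width is a hypothetical value (the text gives only the nominal 21-inch size), so the pixel figures are illustrative only; the function names are likewise illustrative.

```python
import math

VIEWING_DISTANCE_CM = 106.0  # from the text
SCREEN_WIDTH_PX = 1280       # from the text
SCREEN_WIDTH_CM = 40.0       # hypothetical visible width of the 21-inch CRT
REFRESH_HZ = 75.0            # from the text


def visual_angle_to_cm(deg, distance_cm=VIEWING_DISTANCE_CM):
    """Physical extent (cm) subtending `deg` degrees, centred on the line of sight."""
    return 2.0 * distance_cm * math.tan(math.radians(deg) / 2.0)


def visual_angle_to_px(deg, distance_cm=VIEWING_DISTANCE_CM,
                       width_cm=SCREEN_WIDTH_CM, width_px=SCREEN_WIDTH_PX):
    """On-screen extent in pixels, assuming square pixels across the visible width."""
    return visual_angle_to_cm(deg, distance_cm) * width_px / width_cm


def duration_to_frames(ms, refresh_hz=REFRESH_HZ):
    """Nearest whole number of video frames for a requested duration."""
    return round(ms * refresh_hz / 1000.0)


grating_px = visual_angle_to_px(4.6)  # size of a 4.6 deg grating under these assumptions
grating_frames = duration_to_frames(67)  # 67 ms at 75 Hz is about 5 frames of 13.3 ms
```

For stimuli centred at 4° eccentricity, the centred formula slightly underestimates the on-screen extent; computing the positions of both stimulus edges relative to fixation is more exact.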

## Stimuli and Tasks

#### LSF Discrimination Training

Participants trained SF discrimination in a four-alternative forced-choice task, in which they had to indicate which of four sequentially presented gratings had a different SF (Fine and Jacobs, 2000; see left panel of **Figure 1B**). Each trial began with a 500 ms presentation of a fixation cross at the middle of a blank screen. Then, a sequence of four black-white, square-wave gratings (4.6° × 4.6° visual angle; 100% contrast) was presented at 4° eccentricity on the left horizontal meridian. Each grating had a random phase and was immediately followed by a randomly scrambled phase noise mask of the same size. Both grating and mask were presented for 67 ms, followed by a 200 ms interval in which only the fixation cross was present.

Importantly, three reference gratings had an SF of 2 cycles/degree of visual angle (cpa; reference SF), whereas the fourth grating (target grating) had a higher or lower SF (target SF), with higher or lower SF being randomly selected for each trial. The SF difference varied across trials, controlled by an adaptive staircase procedure targeting a discrimination accuracy of 84% (staircase step size 0.05% cpa; Wetherill and Levitt, 1965). The SF difference was 30% for the first trial of the first session. The starting levels for subsequent sessions were based on the just noticeable difference (JND) achieved in the previous session (except for the first seven subjects, who were tracked at 79, 84, 87 and 89% correct performance in sessions 1–4, respectively; this was corrected in the analyses by scaling the obtained differences according to the tracked performance level). The presentation order of target and reference gratings was randomly selected for each trial. After presentation of the fourth grating and mask, a fixation cross was shown until participants responded (for a maximum of 1.5 s). Subjects were instructed to respond as quickly and accurately as possible by pressing the keyboard keys "1", "2", "3" or "4" to indicate the target as the 1st, 2nd, 3rd or 4th grating, respectively. Participants received feedback on their response by a brief (200 ms) coloring of the fixation cross (green for correct, red for incorrect or miss).

To improve task performance, we included 7% catch trials, in which two target and two reference gratings were presented. In this case, participants were required to press the spacebar. Catch trials were excluded from analyses and did not influence the staircase. At the beginning of and throughout each session, subjects were instructed to maintain fixation on the fixation cross. Training sessions lasted 25 min (400 trials), excluding self-paced breaks every 2 min.
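The adaptive procedure can be illustrated with a transformed up-down staircase. The text does not state which staircase rule was used; this sketch assumes a 1-up/N-down rule, which converges on the proportion correct p at which p^N = 0.5. With N = 3, 4, 5 and 6 this gives about 79.4, 84.1, 87.1 and 89.1%, matching the tracked levels listed above, although that match does not confirm the authors' implementation. The fixed additive step, the starting level and the class and function names are hypothetical.

```python
def convergence_accuracy(n_down):
    """Proportion correct at which a 1-up/N-down staircase converges (p**N = 0.5)."""
    return 0.5 ** (1.0 / n_down)


class TransformedStaircase:
    """Hypothetical 1-up/N-down staircase on the target-reference SF difference (%).

    After `n_down` consecutive correct responses the difference shrinks by
    `step` (harder); after any error it grows by `step` (easier). The level
    at each change of direction is recorded as a reversal.
    """

    def __init__(self, start, step, n_down=4, minimum=0.0):
        self.level = start
        self.step = step
        self.n_down = n_down
        self.minimum = minimum
        self.reversals = []  # levels at which the direction of change flipped
        self._correct_run = 0
        self._last_direction = 0  # -1 = harder, +1 = easier, 0 = no change yet

    def update(self, correct):
        """Register one (non-catch) trial and return the level for the next trial."""
        direction = 0
        if correct:
            self._correct_run += 1
            if self._correct_run == self.n_down:
                direction = -1
                self._correct_run = 0
        else:
            direction = +1
            self._correct_run = 0
        if direction != 0:
            if self._last_direction != 0 and direction != self._last_direction:
                self.reversals.append(self.level)
            self._last_direction = direction
            self.level = max(self.minimum, self.level + direction * self.step)
        return self.level
```

Catch trials would simply not be passed to `update()`, which mirrors the statement that they did not influence the staircase.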

#### EEG: Oddball Detection Task

We investigated the influence of SF training on neural processing of facial expressions using an oddball task. This orthogonal task (i.e., unrelated to face perception) ensured continuous attention to the stimuli, yet enabled us to study face processing without any imposed task constraint that could bias facial perception. Sixty grayscale front-view photographs of Caucasian faces (50% male) with neutral (n = 30) or fearful (n = 30) expressions, and four houses (oddball targets), served as stimuli. Face stimuli were selected from the NimStim Face Set (Tottenham et al., 2009) and subsequently trimmed to remove neck and hairline. Furthermore, all stimuli (5.4° × 3.8°) were equated in mean luminance and root mean square contrast and were presented on a homogeneous gray background of the same luminance. The SF content of each stimulus was unfiltered (broad-pass SF, or BSF) or filtered with a high-pass (HSF; 6 cpa) or low-pass (LSF; 2 cpa) cut-off (see Peters et al., 2013 for details). Faces were presented at the same position as the gratings in the training task (trained hemifield) or at the mirror location in the opposite hemifield (untrained hemifield). Finally, 50 neutral faces (LSF or HSF filtered) with inverted orientation (180° rotation) were presented in each hemifield to test the effect of training on the perception of inverted faces. All stimuli (50 trials per condition) were presented in random order for 200 ms (interstimulus interval = 700–1100 ms) on the horizontal meridian at 4° eccentricity left or right of center. During the task, subjects were instructed to maintain fixation on the central cross and to press the spacebar as soon as a house appeared on the screen. Oddball trials (n = 32) were dispersed across the task (separated by 11–19 stimuli).
The inverted and fearful faces were presented (together with the neutral upright faces) in two separate, consecutive runs for three subjects, whereas all conditions were randomly presented in one run for all other subjects. The total task lasted about 20 min.
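The SF filtering and luminance/contrast equation described above can be sketched with NumPy's FFT. The filter shape is an assumption, because the text defers details to Peters et al. (2013): this sketch uses a Gaussian transfer function whose low-pass gain falls to 0.5 at the stated cut-off (2 cpa), with the high-pass filter (6 cpa) taken as one minus the corresponding low-pass. RMS contrast is assumed to be the standard deviation of pixel values, and treating 5.4° as the image height is also an assumption. Function names are illustrative.

```python
import numpy as np


def gaussian_sf_filter(img, deg_height, deg_width, cutoff_cpd, kind="low"):
    """Low- or high-pass filter an image with a cut-off in cycles per degree.

    Gain is exp(-f**2 / (2 * sigma**2)), with sigma chosen so the low-pass
    gain is exactly 0.5 at `cutoff_cpd`; the high-pass is 1 minus that gain.
    """
    img = np.asarray(img, dtype=float)
    n_rows, n_cols = img.shape
    # Sample spacing in degrees, so fftfreq returns cycles per degree.
    fy = np.fft.fftfreq(n_rows, d=deg_height / n_rows)
    fx = np.fft.fftfreq(n_cols, d=deg_width / n_cols)
    f = np.hypot(fy[:, None], fx[None, :])
    sigma = cutoff_cpd / np.sqrt(2.0 * np.log(2.0))
    gain = np.exp(-(f ** 2) / (2.0 * sigma ** 2))
    if kind == "high":
        gain = 1.0 - gain
    elif kind != "low":
        raise ValueError("kind must be 'low' or 'high'")
    return np.fft.ifft2(np.fft.fft2(img) * gain).real


def equate_luminance_contrast(img, mean_lum, rms_contrast):
    """Rescale an image to a target mean and RMS contrast (pixel SD)."""
    img = np.asarray(img, dtype=float)
    centred = img - img.mean()
    sd = centred.std()
    if sd == 0:
        return np.full_like(img, mean_lum)
    return mean_lum + centred * (rms_contrast / sd)


# Example: low-pass a hypothetical face image covering 5.4 x 3.8 deg.
face = np.random.default_rng(0).random((270, 190))
lsf_face = equate_luminance_contrast(
    gaussian_sf_filter(face, 5.4, 3.8, cutoff_cpd=2.0, kind="low"), 0.5, 0.1)
```

Equating luminance and contrast after filtering matters because high-pass filtering removes the mean (the zero-frequency term) and both filters change the overall contrast.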

#### EEG: Emotion Categorization Task

The emotion categorization task employed the same stimuli as the oddball detection task, excluding inverted faces and houses. In this task, however, a scrambled version of one (randomly selected) face stimulus was presented immediately after each face as a mask, in order to keep stimulus processing time identical between conditions. For mask creation, the phase of the face images was scrambled in the Fourier domain via random permutation, which preserves orientation content (Dakin et al., 2002). The face and mask were presented for 150 ms each, after which a fixation cross was shown for 800 ms. Subjects were instructed to indicate as quickly and accurately as possible whether the face had a fearful or neutral expression by pressing the "F" or "J" key, respectively; half of the subjects used the reversed key assignment. Subsequently, participants received feedback on their response by a brief (200 ms) coloring of the fixation cross (green for correct, red for incorrect, and the word "faster" for a missing response), followed by a 300 ms fixation cross. Each face was presented twice in each condition (60 trials per condition), resulting in 720 trials per session in total (360 trials per hemifield). Stimulus onset markers were not recorded in the post-training session of one subject; the resulting missing values in the ANOVA were therefore replaced with the condition mean. The total task lasted about 20 min, including five short, self-paced breaks.
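A phase-scrambled mask keeps the amplitude spectrum of a face, and hence its SF and orientation content, while destroying its spatial structure. The text describes random permutation of the phase values; the sketch below instead uses a common variant that adds the phase spectrum of a white-noise image, because that keeps the inverse transform real-valued. It illustrates the principle rather than reproducing the authors' procedure, and keeping the zero-frequency phase fixed (so mean luminance is unchanged) is an extra assumption.

```python
import numpy as np


def phase_scramble(img, rng=None):
    """Return an image with the same amplitude spectrum but randomized phase."""
    rng = np.random.default_rng() if rng is None else rng
    img = np.asarray(img, dtype=float)
    spectrum = np.fft.fft2(img)
    # The phase spectrum of real-valued noise is antisymmetric, so adding it
    # preserves the conjugate symmetry that makes the inverse FFT real.
    noise_phase = np.angle(np.fft.fft2(rng.standard_normal(img.shape)))
    noise_phase[0, 0] = 0.0  # keep the DC term, i.e. the mean luminance
    scrambled = np.abs(spectrum) * np.exp(1j * (np.angle(spectrum) + noise_phase))
    return np.fft.ifft2(scrambled).real


# Example: build a mask from a hypothetical face image.
face = np.random.default_rng(0).random((64, 48))
mask = phase_scramble(face, np.random.default_rng(42))
```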

## EEG Recording

The EEG was recorded (sampling rate 500 Hz; band-pass filter 0.01–200 Hz) from 35 Ag/AgCl scalp electrodes (extended International 10/20 system; Easycap, Brain Products) with reference electrodes placed at the mastoids. Signals were recorded against the left mastoid and re-referenced off-line to the average of all electrodes. Horizontal and vertical electrooculograms (EOG) were recorded with bipolar electrodes placed at the external canthi and above and below the left eye. Electrode impedance was kept below 5 kΩ for all electrodes.
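Off-line re-referencing to the average of all electrodes amounts to subtracting the instantaneous mean across channels from every channel. In the minimal NumPy sketch below, the online reference (left mastoid) is assumed to be appended as an explicit channel of zeros before averaging; this is a common choice, but the text does not specify it. The channels × samples layout and the function name are also assumptions.

```python
import numpy as np


def to_average_reference(data_uv, include_online_ref=True):
    """Re-reference EEG (channels x samples, in µV) to the common average.

    If include_online_ref is True, a zero-valued row representing the online
    reference electrode (recorded against itself) is added before averaging
    and returned as the last channel.
    """
    data_uv = np.asarray(data_uv, dtype=float)
    if include_online_ref:
        data_uv = np.vstack([data_uv, np.zeros((1, data_uv.shape[1]))])
    return data_uv - data_uv.mean(axis=0, keepdims=True)
```

Only scalp EEG channels should enter the average; the bipolar EOG channels would be excluded before calling such a function.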

## Data Analysis

#### Behavioral Data

Individual performance thresholds for SF discrimination were estimated for each training day. JNDs were computed as the geometric mean of the last 14 reversal points in the staircase (Wetherill and Levitt, 1965). The first-session JND of one subject with insufficient reversal points was replaced by the group mean JND. Improvement in LSF sensitivity was assessed by contrasting the normalized JNDs of the first and third sessions of all subjects with a paired t-test.
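The threshold estimate and the session comparison can be sketched with the Python standard library. The geometric mean of the last 14 reversals follows the text; the paired t statistic uses the textbook formula (mean difference divided by its standard error, with n − 1 degrees of freedom). The normalization applied to the JNDs before the test is not described, so it is omitted here, and the function names are illustrative.

```python
import math
from statistics import mean, stdev


def jnd_from_reversals(reversals, n_last=14):
    """Geometric mean of the last `n_last` staircase reversal levels."""
    if len(reversals) < n_last:
        raise ValueError(f"need at least {n_last} reversals, got {len(reversals)}")
    last = reversals[-n_last:]
    return math.exp(mean([math.log(level) for level in last]))


def paired_t(first, second):
    """Paired t statistic and degrees of freedom for two matched samples."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two matched samples with at least two pairs")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1
```

The geometric mean is a natural choice for thresholds expressed as relative (percentage) SF differences, because it averages ratios rather than absolute steps.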

Reaction times in the emotion categorization task were filtered (i.e., responses faster than 350 ms after stimulus onset and outliers more than 3 standard deviations above or below the condition mean were excluded) before entering the data into a repeated-measures ANOVA with SF (LSF, HSF), emotion (fear, neutral), hemifield presentation (trained, untrained stimulus position) and time (pre-training, post-training) as within-subject factors. Finally, to assess task performance, we computed d-primes indexing changes in the sensitivity of fearful facial expression detection. D-primes (d′) were subjected to a repeated-measures ANOVA with SF (LSF, HSF), hemifield presentation (trained, untrained stimulus position) and time (pre-training, post-training) as within-subject factors. Post hoc paired t-tests were Bonferroni corrected.
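The two preprocessing steps in this paragraph can be sketched as follows. RT trimming applies the 350 ms lower bound and then excludes values more than 3 SD from the condition mean; whether the SD was computed before or after removing fast responses is not stated, and this sketch assumes after. For d′, "fear" responses to fearful faces are treated as hits and "fear" responses to neutral faces as false alarms. The log-linear correction for rates of 0 or 1 is an added assumption, not something the text specifies.

```python
from statistics import NormalDist, mean, stdev


def trim_rts(rts_ms, min_rt=350.0, n_sd=3.0):
    """Drop anticipations (< min_rt) and then outliers beyond n_sd SDs of the mean."""
    kept = [rt for rt in rts_ms if rt >= min_rt]
    if len(kept) < 2:
        return kept
    m, sd = mean(kept), stdev(kept)
    return [rt for rt in kept if abs(rt - m) <= n_sd * sd]


def d_prime(hits, n_fear, false_alarms, n_neutral):
    """Sensitivity for detecting fear: z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to counts, 1 to totals) keeps rates
    away from 0 and 1 so that the z-transform stays finite.
    """
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (n_fear + 1)
    fa_rate = (false_alarms + 0.5) / (n_neutral + 1)
    return z(hit_rate) - z(fa_rate)
```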

#### EEG Analyses

EEG data were epoched (−200 to 900 ms relative to stimulus onset), band-pass filtered (0.01–30 Hz, plus a 50 Hz notch filter) and baseline corrected (200 ms pre-stimulus interval) using Vision Analyser (Brain Products GmbH, Munich, Germany). Artifacts from horizontal eye movements and blinks were reduced with the algorithm of Gratton et al. (1983). Trials with artifacts (i.e., samples exceeding ±75 µV, a voltage change of more than 50 µV per ms, or a difference of more than 200 µV within 200 ms) were excluded from subsequent analyses.
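The three rejection criteria can be expressed as a single check per epoch. This NumPy sketch assumes the epoch is already filtered and baseline corrected, is stored as channels × samples in µV, and is rejected if any channel violates a criterion; it is not Vision Analyser's implementation, and the function name is illustrative. At 500 Hz one sample lasts 2 ms, so the 50 µV/ms gradient limit corresponds to a jump of 100 µV between adjacent samples, and the 200 ms range window spans 100 samples.

```python
import numpy as np


def is_artifact(epoch_uv, fs_hz=500.0, abs_limit_uv=75.0,
                max_gradient_uv_per_ms=50.0, max_range_uv=200.0,
                range_window_ms=200.0):
    """Return True if any channel of a (channels x samples) epoch violates a criterion."""
    epoch_uv = np.atleast_2d(np.asarray(epoch_uv, dtype=float))
    # 1) Amplitude: any sample beyond +/- abs_limit_uv.
    if np.any(np.abs(epoch_uv) > abs_limit_uv):
        return True
    # 2) Gradient: voltage step between adjacent samples, expressed per ms.
    sample_ms = 1000.0 / fs_hz
    gradient = np.abs(np.diff(epoch_uv, axis=1)) / sample_ms
    if np.any(gradient > max_gradient_uv_per_ms):
        return True
    # 3) Range: max-min difference within any sliding window of range_window_ms.
    window = int(round(range_window_ms / sample_ms))
    if epoch_uv.shape[1] >= window:
        windows = np.lib.stride_tricks.sliding_window_view(epoch_uv, window, axis=1)
        ranges = windows.max(axis=-1) - windows.min(axis=-1)
        if np.any(ranges > max_range_uv):
            return True
    return False
```

As listed, any epoch that stays within ±75 µV has a range of at most 150 µV, so in this sketch the range check never fires on its own; it is kept only to mirror the listed criteria.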

For each subject-specific averaged EEG epoch of a condition, N170 peak latency and amplitude were extracted at the maximal negative amplitude between 140 and 230 ms after stimulus onset for electrodes PO7 (left hemisphere) and PO8 (right hemisphere). In addition, mean N170 amplitudes were extracted to analyze the mean amplitude (178–182 ms) in the emotion categorization task and the upward N170 slope (140–170 ms) in the oddball detection task. We opted to analyze mean amplitudes rather than subject- and condition-specific peak amplitudes to avoid averaging distortions caused by trial-to-trial latency jitter in the emotion categorization task (e.g., Luck, 2014). The analysis of the upward slope was not planned a priori but was based on potential differences in the grand averages. Amplitudes and latencies were submitted to separate repeated-measures ANOVAs with SF (LSF, HSF), hemifield presentation (trained, untrained stimulus position) and time (pre-training, post-training) as within-subject factors. Note that we averaged across emotion (fear, neutral) for peak analyses to reduce the number of factors, since a first set of analyses did not show any interactions between emotion, SF and time. All ANOVA results were Greenhouse–Geisser corrected (uncorrected degrees of freedom are reported), and analyses were performed in SPSS 24 (SPSS Inc., Chicago, IL, USA). Main effects and interactions that are not reported did not reach significance.
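Peak and mean-amplitude measures can be sketched on an averaged waveform sampled at 500 Hz from −200 ms. The peak is the most negative sample between 140 and 230 ms; the mean amplitude averages all samples in a window such as 178–182 ms or 140–170 ms. The synthetic Gaussian "N170" below is purely illustrative, and the inclusive window boundaries and function names are assumptions.

```python
import numpy as np


def window_mask(times_ms, start_ms, end_ms):
    """Boolean mask selecting samples within [start_ms, end_ms]."""
    return (times_ms >= start_ms) & (times_ms <= end_ms)


def negative_peak(erp_uv, times_ms, start_ms=140.0, end_ms=230.0):
    """Latency (ms) and amplitude (µV) of the most negative sample in a window."""
    mask = window_mask(times_ms, start_ms, end_ms)
    idx = int(np.argmin(erp_uv[mask]))
    return float(times_ms[mask][idx]), float(erp_uv[mask][idx])


def mean_amplitude(erp_uv, times_ms, start_ms, end_ms):
    """Mean voltage (µV) across all samples in a window."""
    mask = window_mask(times_ms, start_ms, end_ms)
    return float(erp_uv[mask].mean())


# Synthetic averaged waveform: 500 Hz sampling (2 ms steps), epoch from -200 ms.
times = np.arange(-200.0, 900.0, 2.0)
erp = -5.0 * np.exp(-((times - 170.0) ** 2) / (2 * 15.0 ** 2))
latency, amplitude = negative_peak(erp, times)
window_mean = mean_amplitude(erp, times, 178.0, 182.0)
slope_mean = mean_amplitude(erp, times, 140.0, 170.0)
```

Mean amplitudes are less affected by latency jitter than peak measures, which is the rationale the text gives for using them in the emotion categorization task.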

# RESULTS

#### LSF Discrimination Training

Participants improved LSF sensitivity across training sessions, as indicated by a lower JND in the third than in the first training session (t(18) = 4.08; p = 0.0007). **Figure 2** shows this gradual decrease in the required SF difference across concatenated sessions.

On average, participants could detect SF differences between the target and reference gratings that were less than half as large in the third session (mean JND SF difference = 25.4%) as in the first session (mean difference = 67.8%). The fourth session (n = 7) did not seem to result in further learning, as suggested by the absence of a further decrease in JND from the third to the fourth session (t(6) = 0.88; p > 0.4).

Likewise, reaction times decreased from 674 ms (SE = 32) in the first to 576 ms (SE = 40) in the third session (t(18) = 2.31; p = 0.03). Furthermore, RTs in the third and fourth sessions did not differ (t(6) = 0.95; p > 0.3), suggesting that learning had plateaued by the fourth session.

#### EEG: Oddball Detection Task

Task performance on the oddball task was excellent, with a mean accuracy of 96.8% (SE = 0.9). Accuracy did not differ between pre- and post-training (t(18) = 0.43; p = 0.8). One subject had only 80% accuracy in the post-training session and was therefore excluded from subsequent analyses for this task.

N170 peak latencies were shorter pre- than post-training (F(1,17) = 12.5, p = 0.003) at electrode PO7, whereas no main effects or interactions were present for peak latencies at PO8 (**Figure 3A**).

Analyses of N170 peak activity revealed an interaction between time and SF, which was significant at electrode PO7 (F(1,17) = 5.3, p = 0.03) and showed a tendency at PO8 (F(1,17) = 3.4, p = 0.08). Post hoc comparisons for PO7 did not survive Bonferroni correction. However, planned comparisons revealed the expected time × SF interaction per hemifield at electrode PO8: the N170 tended to be larger for LSF than for HSF faces in the trained (left) hemifield before (t(17) = 2.1, p = 0.05) but not after (p > 0.5) training. Such an effect was not present for stimuli presented in the untrained (right) hemifield (p's > 0.1). Notably, this differential learning effect between LSF and HSF faces presented in the trained hemifield was already present in the upward slope of the N170 at PO8: mean activity was higher for LSF than for HSF faces in the trained hemifield before (t(17) = 2.3, p = 0.03) but not after (p > 0.3) training. Such an effect was not present for stimuli presented in the untrained hemifield (p's > 0.2).

#### EEG: Emotion Categorization Task

Participants performed the task quickly (632.9 ms; SE = 19.7 ms) and accurately (mean accuracy 74.7%; SE = 2.6%; mean d′ = 1.6; SE = 0.14). Reaction times revealed an interaction between emotional expression and SF (F(1,18) = 13.0, p = 0.02). The only post hoc comparison that survived Bonferroni correction revealed that LSF neutral faces were recognized on average 16 ms faster than HSF neutral faces (t(18) = 3.0; p = 0.08).

No main effects or interactions were observed for d′: although the training-induced increase in sensitivity was twice as high for LSF as for HSF faces (post- minus pre-training d′ difference: 0.19 for LSF, 0.09 for HSF), variance was too high to obtain significant differences.
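The text does not state how d′ was computed for the categorization task. A common approach, assumed here, treats one response category as "signal" and computes d′ = z(hit rate) − z(false-alarm rate) with the inverse standard normal CDF. A minimal Python sketch (standard library, Python 3.8+); the rates are hypothetical, not the study's data.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Signal-detection sensitivity d' = z(hit rate) - z(false-alarm rate).

    Rates of exactly 0 or 1 map to infinite z-scores, so they need a
    correction (for example a 1/(2N) adjustment) before calling this.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical rates for one SF condition (not the study's data).
pre = d_prime(0.75, 0.25)
post = d_prime(0.80, 0.25)
print(round(pre, 2))  # → 1.35
training_gain = post - pre  # same kind of post- minus pre-training difference as above
```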

As illustrated by **Figure 3B**, N170 peak latency at PO8 was shorter for LSF faces than for HSF faces (F(1,18) = 4.6, p = 0.046). Furthermore, stimuli presented in the trained hemifield were processed faster than stimuli in the untrained hemifield (F(1,18) = 64.3, p < 0.001). However, SF, time and hemifield showed no interactions. In contrast, peak latencies at PO7 were shorter for the left (trained) than for the right hemifield (F(1,18) = 84.4, p < 0.001), but this effect did not interact with time, and no other effects were observed.
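Every ANOVA effect reported in this section has one numerator degree of freedom. For F(1, df) the statistic equals the square of a t statistic with the same error df, so its p-value equals the two-tailed t p-value. The stdlib-only Python sketch below uses that identity and integrates the t density numerically with Simpson's rule (a choice made here for self-containment, not the authors' analysis software), so a reported F can be checked against its p-value.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p_from_t(t, df, steps=2000):
    """Two-tailed p-value: 1 - 2 * (integral of the t density from 0 to |t|).

    The integral uses composite Simpson's rule; steps must be even.
    """
    a, b = 0.0, abs(t)
    h = (b - a) / steps
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(a + i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


def p_from_f1(f, df_error):
    """p-value for F(1, df_error): with one numerator df, F equals t squared."""
    return two_tailed_p_from_t(math.sqrt(f), df_error)


# N170 peak latency at PO8, LSF vs HSF: F(1,18) = 4.6, reported p = 0.046.
print(round(p_from_f1(4.6, 18), 3))  # → 0.046
```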

For electrode PO8, mean activity in the N170 window was affected by hemifield (F(1,18) = 25.0, p = 0.0001) and SF (F(1,18) = 17.1, p = 0.001), by interactions between hemifield and SF (F(1,18) = 5.6, p = 0.03) and between time, hemifield and SF (F(1,18) = 7.3, p = 0.015), and showed a trend toward an interaction between time and SF (F(1,18) = 4.1, p = 0.06). To interpret the two- and three-way interactions, we performed additional ANOVAs per hemifield with SF and time as factors. ERPs elicited by stimuli presented in the untrained hemifield were affected by the SF content of the face image (F(1,18) = 17.7, p = 0.01), but not by time. In contrast, ERPs elicited by stimuli presented in the trained hemifield showed an interaction between SF and time (F(1,18) = 10.8, p = 0.04): compared to pre-training, mean activity in the post-training session was reduced for LSF faces (t(18) = 2.2; p = 0.045) but not for HSF faces (t(18) = 0.5; p > 0.5). In sum, LSF training reduced neural activity in the N170 window, but only for stimuli with LSF content presented in the trained hemifield. This selective influence of training is reflected in **Figure 3C**, which shows mean differential activity as a function of hemifield and SF content.

For mean N170 activity at electrode PO7, we observed an interaction between hemifield and SF (F(1,18) = 9.7, p = 0.06). Post hoc tests revealed that activity in the untrained hemifield was smaller for LSF than HSF faces (t(18) = 3.7; p = 0.002), whereas activity did not differ in the trained hemifield (t(18) = 0.7; p > 0.1).

In sum, the N170 in the right hemisphere was reduced in the post-training compared to the pre-training session. Notably, this decreased processing was only observed for faces containing LSF information. Moreover, this learning effect was specific not only to the trained SF but also to the trained location: the learning-related decrease in LSF processing was only present when faces were presented at the same location as the gratings in the LSF discrimination training sessions. In contrast, no differences between pre- and post-training ERPs were observed for LSF faces presented in the untrained hemifield.

# DISCUSSION

For fast and proficient face processing, facial cues conveyed by information in the LSF range are essential (Goffaux et al., 2005, 2011). Improving LSF processing might therefore increase face processing abilities. Our results showed that training-induced improvement in LSF discrimination of low-level stimuli indeed transfers to LSF processing in faces, which is accompanied by enduring changes at the neural level.

Participants learned to discriminate increasingly small SF variations in LSF gratings in a discrimination task. After only three training sessions (25 min., 400 trials), the JND between target and reference SF dropped from ∼68% to ∼25%, indicating a fast and strong increase in LSF sensitivity. Interestingly, this improvement in LSF perception was reflected neurally by a decrease in N170 amplitude. This reduction was observed exclusively for LSF faces in the trained hemifield in the post-training emotion categorization task. This is in line with psychophysical observations that SF learning is restricted to the trained SF range and retinotopic location (Fiorentini and Berardi, 1981). Similar learning-specific effects were also present in the oddball task, although less pronounced. This discrepancy could result from several differences between the categorization and oddball detection tasks. Fast and accurate categorization of emotional expressions required more intensive processing than the passive perception in the oddball task, which could underlie the more pronounced expression of LSF training effects. That is, whereas LSF content is important for proficient, configural processing in general, it is known to play an even more pivotal role in assessing emotional expressions (Vlamings et al., 2009). Moreover, the categorization task placed a high demand on attentional resources, since emotional expressions had to be correctly identified within 150 ms. This higher attentional demand might have contributed to lateralization toward the right hemisphere in the categorization task (Heilman and Van Den Abell, 1980), compared to the more distributed effects in the easy oddball task. Increased attentional processing might also have boosted neural (LSF) face processing in the categorization task.
The higher amplitude of the N170 in the categorization compared to the detection task (**Figure 3**), despite being elicited by identical face stimuli, corroborates the idea that attention differences might play a role in the observed task differences.

To our knowledge, only a few studies have investigated perceptual learning with EEG, and none of them studied the transfer of learning effects to other stimulus categories. The majority of studies reported a decreased occipital N1 across sessions of training in line discrimination (Song et al., 2002; Qu et al., 2010). More complex results were observed in a visual texture segmentation task, in which learning-related visual evoked potentials decreased for stimulus configurations where global and local orientations conflicted, but not for conflict-free configurations (Casco et al., 2004). Our finding of reduced early neurophysiological activity agrees with previous findings using electrophysiology (Yang and Maunsell, 2004) and fMRI (Zhang et al., 2010) suggesting that learning narrows the tuning curves of feature-selective visual neurons. A steeper slope of the tuning curve changes the neuron's discrimination threshold, resulting in a sparser response (i.e., a reduced number of responding neurons) at the neural population level, culminating in reduced ERPs. Such a tuning mechanism is a likely candidate to explain the reduced LSF processing in face images resulting from the improved LSF sensitivity induced by LSF discrimination training.
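The tuning-narrowing account can be illustrated with a toy population model. The sketch below assumes Gaussian tuning curves on a log2 spatial-frequency axis, evenly spaced preferred frequencies, unit peak responses and a fixed response threshold; all of these are hypothetical choices, not parameters from the study. It shows that halving the tuning width roughly halves both the number of neurons responding above threshold and the summed population response, the direction of change linked above to reduced ERPs. It is not a model of the N170 itself.

```python
import math


def population_response(stimulus, preferred, width, threshold=0.5):
    """Toy population of neurons with Gaussian tuning and unit peak response.

    stimulus and preferred values are positions on a log2 spatial-frequency
    axis (octaves); width is the tuning standard deviation in octaves.
    Returns (number of neurons responding above threshold, summed response).
    """
    responses = [math.exp(-((stimulus - p) ** 2) / (2 * width ** 2))
                 for p in preferred]
    n_active = sum(r > threshold for r in responses)
    return n_active, sum(responses)


# Hypothetical population: 121 preferred SFs spaced 0.05 octave apart,
# spanning 3 octaves either side of the stimulus.
preferred = [-3 + 0.05 * i for i in range(121)]
broad = population_response(0.0, preferred, width=1.0)   # broader tuning
narrow = population_response(0.0, preferred, width=0.5)  # narrower tuning
print(broad[0], narrow[0])  # → 47 23
```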

Peak latencies and amplitudes were affected differently. In line with previous results (Goffaux et al., 2005; Flevaris et al., 2008; Peters et al., 2013), we observed earlier N170 peaks for LSF than for HSF faces. This effect cannot be driven by other low-level stimulus differences, as contrast and luminance were equalized between LSF and HSF faces. Rather, this effect indicates that facial information in LSF ranges is processed faster than information in HSF ranges. Neuroimaging studies in adults suggest that LSF content in faces is not only processed faster but also processed via different neural pathways than HSF content (Vuilleumier et al., 2003; Rotshtein et al., 2007). That is, LSF information travels via the middle occipital gyrus to an area in the fusiform gyrus specialized in face processing (the so-called fusiform face area; Kanwisher et al., 1997), where it converges with HSF information coming from inferior occipital and temporal areas (Rotshtein et al., 2007). This differential processing might be a continuation of the distinct magno- and parvocellular pathways running from the retina to early visual areas, which are specialized for processing coarse and fine details, respectively (e.g., De Valois et al., 1982; Hess, 2004). Peak latency was not influenced by LSF training, corroborating previous findings suggesting that learning-related adaptations are reflected in reduced visual activity rather than faster processing in visual cortex (Song et al., 2002; Casco et al., 2004).

In sum, our results show, to our knowledge for the first time, that training effects based on an orthogonal task using low-level stimuli transfer to a higher-level object processing task. That is, the training employed a fundamentally different stimulus type (gratings) and task (LSF discrimination) than the experiment in which we observed the transfer effects (face images in an oddball detection and emotion categorization task). The present study only investigated face perception, but LSF training effects might transfer to other stimuli as well. Although adequate LSF processing is particularly important for holistic face perception, it might aid configural object processing in general. Similar to our face perception expertise, acquired expertise on other object classes appears to be guided by holistic processing (e.g., Richler et al., 2011) and proficient use of LSF information (Viggiano et al., 2006). Further research could investigate whether the current LSF learning paradigm transfers to other object classes for which configural processing is important.

Interestingly, SF learning affected neural face processing after training was finished, suggesting that training effects caused a long-lasting neural reorganization. These findings can have important implications for treatment of atypical vision. Various conditions, such as ASD (Deruelle et al., 2004, 2008; Vlamings et al., 2010), pervasive developmental disorder (Boeschoten et al., 2007) and cataract (Ellemberg et al., 1999) are associated with deteriorated HSF and/or LSF processing. Our results suggest that the neural LSF and HSF processing pathways in such individuals can be optimized by SF discrimination training, resulting in improved processing of the SF ranges that convey the most important information (e.g., LSF content in faces). SF (and orientation) decomposition is a fundamental step in vision, affecting all further visual processing stages. Improving such a cardinal aspect of vision could constitute a highly generic training approach that might complement existing specific face training programs in promoting face processing skills in atypical development.

#### AUTHOR CONTRIBUTIONS

JP and CK conceived and designed the experiments. JP and CB: data acquisition and analyses. JP, CB and CK wrote the manuscript.

#### ACKNOWLEDGMENTS

We thank Joel Reithler for comments on the manuscript. This research was supported by an NWO-VICI (453-07-004) grant to CK.


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Peters, van den Boomen and Kemner. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Improving Dorsal Stream Function in Dyslexics by Training Figure/Ground Motion Discrimination Improves Attention, Reading Fluency, and Working Memory

#### Teri Lawton<sup>1,2</sup>\*

<sup>1</sup> Perception Dynamics Institute, Del Mar, CA, USA, <sup>2</sup> Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA

There is an ongoing debate about whether the cause of dyslexia is based on linguistic, auditory, or visual timing deficits. To investigate this issue, three interventions were compared in 58 dyslexics in second grade (7 years old on average): two targeting the temporal dynamics (timing) of either the auditory or visual pathways, and a third reading intervention (control group) targeting linguistic word building. Visual pathway training in dyslexics to improve direction-discrimination of moving test patterns relative to a stationary background (figure/ground discrimination) significantly improved attention, reading fluency (both speed and comprehension), phonological processing, and both auditory and visual working memory relative to controls, whereas auditory training to improve phonological processing did not improve these academic skills significantly more than the control intervention did. This study supports the hypothesis that faulty timing in synchronizing the activity of magnocellular with parvocellular visual pathways is a fundamental cause of dyslexia, and argues against the assumption that reading deficiencies in dyslexia are caused by phonological deficits. This study demonstrates that visual movement direction-discrimination can be used not only to detect dyslexia early but also to treat it successfully, so that reading problems do not prevent children from learning readily.

#### Edited by:

Soledad Ballesteros, National University of Distance Education, Spain

#### Reviewed by:

Elvire Vaucher, Université de Montréal, Canada

Trichur Raman Vidyasagar, University of Melbourne, Australia

#### \*Correspondence:

Teri Lawton tlawton@pathtoreading.com

Received: 14 March 2016 Accepted: 25 July 2016 Published: 08 August 2016

#### Citation:

Lawton T (2016) Improving Dorsal Stream Function in Dyslexics by Training Figure/Ground Motion Discrimination Improves Attention, Reading Fluency, and Working Memory. Front. Hum. Neurosci. 10:397. doi: 10.3389/fnhum.2016.00397

Keywords: attention networks, reading remediation, cortical plasticity, perceptual learning, improving dorsal stream function, neural timing, dyslexia development, figure/ground motion discrimination

# INTRODUCTION

Dyslexia is a multifaceted reading disability, characterized by severe reading and spelling problems (Vellutino et al., 2004), that encompasses both pronunciation-based and visual processing-based reading issues (Stein, 2001). Reading difficulties, including those of people with dyslexia and attention deficits, are prevalent in the United States (US), where 65% of fourth graders and 62% of 12th graders are not proficient in reading (National Center for Educational Statistics, 2013). Previous studies have shown that reading difficulties in many children may indeed be prevented through early intervention (Schatschneider et al., 2004). Identifying the cognitive skills that predict subsequent reading ability can help identify children at risk for reading problems and, with appropriate training, reduce the severity of their symptoms (Kevan and Pammer, 2009). Since motion detection deficits in pre-reading children predict who will develop reading problems (Boets et al., 2011), it is likely that a task to improve motion discrimination, and thereby timing, in either the auditory or visual domain will remediate reading problems; whether such training does so is a key question addressed by this study.

Slow reading speeds are a hallmark of dyslexia (Lyon et al., 2003; Nicholson and Fawcett, 2007). Children with dyslexia are reported to have some combination of spatial (Lovegrove et al., 1980; Cornelissen et al., 1995; Stein and Walsh, 1997; Lawton, 2000, 2007, 2008, 2011; Talcott et al., 2000; Hansen et al., 2001; Stein, 2001) and/or temporal (Stanley and Hall, 1973; Bradley and Bryant, 1983; Tallal et al., 1993; Temple et al., 2003) sequencing deficits. These spatial and temporal sequencing deficits cause the letters in the words and the words on the page to appear distorted, displaced, or crowded together (Atkinson, 1991), often resulting in eyestrain and headaches (Wilkins, 1995). These spatial and temporal sequencing deficits, found when images are rapidly presented or moving, have been hypothesized to result from neural timing deficits associated with sluggish magnocellular neurons (Livingstone et al., 1991; Stein and Walsh, 1997; Vidyasagar, 1999, 2001, 2012; Lawton, 2000, 2007, 2008, 2011; Stein, 2001; Vidyasagar and Pammer, 2010; Boets et al., 2011), causing deficits in integration of information between magnocellular ("where") and parvocellular ("what") neurons. A normally functioning magnocellular pathway is sensitive to low-contrast achromatic patterns (Kaplan and Shapley, 1986; Sclar et al., 1990). All dyslexics exhibit high contrast thresholds for discriminating the direction of moving patterns against stationary background patterns (Lawton, 2000, 2007, 2011; Ridder et al., 2001).

#### Visual Timing (Magnocellular) Deficits in Dyslexics

Receiving predominantly magnocellular input (Livingstone and Hubel, 1988; Maunsell et al., 1990; Merigan and Maunsell, 1993), the dorsal stream, specialized for processing the movement and location of objects in space (Ungerleider and Mishkin, 1982; Livingstone and Hubel, 1988; Felleman and Van Essen, 1991), projects from the primary visual cortex (V1) through the middle temporal area (MT) to the posterior parietal cortex (PPC), a selective spatial attention area (Posner et al., 1984). In contrast, the ventral stream receives both magnocellular and parvocellular inputs and projects from V1 through area V4 to the inferotemporal (IT) cortex, an area specialized in extracting details of an object's shape and color (Ungerleider and Mishkin, 1982; Livingstone and Hubel, 1988; Felleman and Van Essen, 1991).

The magnocellular responses of dyslexics were found to be 20–40 ms slower than those of typically developing observers (Livingstone et al., 1991), a delay 2–4 times as long as the normal magnocellular lead time of 10 ms (Dreher et al., 1976). Some investigators hypothesize that in dyslexics a lack of synchronization in timing between magnocellular and parvocellular activations may prevent effective sequential processing, pattern analysis, and figure/ground discrimination, and hence impede the development of efficient reading and attention skills (Stein and Walsh, 1997; Vidyasagar, 1999, 2001, 2012; Lawton, 2000, 2007, 2008, 2011, 2015; Stein, 2001). It is further possible that the dyslexic reader's deficit in attentional focus (Vidyasagar, 1999, 2001; Facoetti et al., 2000, 2010; Solan et al., 2001) is another consequence of sluggish magnocellular neurons, which prevent the linked parvocellular neurons from isolating and sequentially processing the relevant information needed for reading (Vidyasagar, 1999, 2001; Vidyasagar and Pammer, 2010), rather than a consequence of information overload, as proposed previously (Stuart et al., 2001).

Visual timing deficits resulting from sluggish magnocellular (motion-sensitive) neurons in the dorsal stream are likely to contribute substantially to the reading deficits of dyslexics (Stein and Walsh, 1997; Vidyasagar, 1999, 2001; Lawton, 2000, 2007; Stein, 2001; Gori et al., 2014). Convergent evidence suggests that many dyslexic readers demonstrate impairments in tasks that require dorsal stream involvement. Dyslexics have been found to have deficits in motion perception at: (1) the retinal level (Tyler, 1974) when measured using the frequency doubling illusion (Pammer and Wheatley, 2001; Buchholz and McKone, 2004; Kevan and Pammer, 2009; Gori et al., 2014); (2) V1 measured using visual evoked potentials (VEPs; Livingstone et al., 1991; Shelley-Tremblay et al., 2011); (3) V1 and MT using both fMRI brain imaging (Eden et al., 1996; Demb et al., 1998) and psychophysical tasks of movement discrimination relative to a stationary background (Lawton, 2000, 2007, 2011); (4) MT using motion coherence for direction discrimination (Cornelissen et al., 1995; Talcott et al., 2000; Hansen et al., 2001; Boets et al., 2011); (5) the lateral intraparietal cortex (LIP) and frontal eye fields (FEF), cortical areas activated by saccades, based on saccade and antisaccade training tasks (Fischer, 2012); and (6) parietal structures, prefrontal language systems, cerebellum, and basal ganglia (Nicholson and Fawcett, 2007). These results are consistent with a relationship between dorsal stream processing and reading ability, such that poor dorsal processing relates to slower timing and poor reading skills. This study demonstrates that when a figure/ground motion discrimination paradigm is used, poor reading skills are not only associated with poor visual dorsal stream functioning but can also be remediated rapidly by training designed to improve dorsal stream function.

The degree to which dorsal stream deficits play a causal role in reading failure has yet to be established (Boden and Giaschi, 2007; Kevan and Pammer, 2009). Previous results indicate a relationship between dorsal stream sensitivity and reading skill, both in pre-kindergarten children before reading is learned (Kevan and Pammer, 2009) and after the emergence of reading in children (Boets et al., 2011) and adults. Intervention studies targeting dorsal stream function are needed to establish a direct causal link from dorsal stream functioning to reading skill (Kevan and Pammer, 2009). Since visual movement-discrimination training, designed to improve dorsal stream function, increased the reading speeds of dyslexic children up to 10-fold (Lawton, 2011), it is possible that training dorsal stream function is essential for developing not only reading fluency but also the attention networks. Therefore, this study will measure not only reading fluency but also, for the first time, attention and both visual and auditory working memory, using standardized tests to demonstrate the range of cortical areas affected by training aimed at improving function in the V1-MT dorsal stream areas.

The novel question addressed by this study is whether improving neural timing in the dorsal stream (by improving magnocellular function) improves reading fluency more when training is in the auditory domain to improve auditory timing (language-based) or in the visual domain using a visual motion direction-discrimination task (improving visual timing), compared with a traditional reading intervention using linguistic word building that does not specifically target neural timing. The intervention we used to improve auditory timing lengthens the individual phonemes so that phonological processing improves, with phoneme length decreasing as the training progresses. Motion direction-discrimination training, on the other hand, measures the contrast needed for figure/ground discrimination of sinewave gratings moving left or right relative to a stationary background. These backgrounds increase task complexity by increasing the number of background spatial frequencies and the background contrast, thereby activating more parvocellular neurons, while the left-right movement increases in speed as the training progresses. The motion direction-discrimination training patterns, vertical sinewave gratings (**Figure 1**), are designed to differentially activate motion-sensitive (magnocellular) neurons in the V1-MT network (Allman et al., 1985; Felleman and Van Essen, 1991; De Valois et al., 2000) relative to pattern-sensitive (parvocellular) neurons, making them an effective training stimulus for improving magno-parvo integration deficits at both early and higher levels of motion processing. Unlike the motion direction-discrimination training paradigm used in this study, direction-discrimination using motion coherence of random dots differentially activates motion-sensitive neurons only in MT and at higher processing levels (Zohary et al., 1994; Braddick et al., 2001).
Deficits in detecting motion coherence are rarely found in all individuals in a dyslexic sample (e.g., Talcott et al., 2013). Moreover, motion-coherence direction-discrimination training (Solan et al., 2004) has not been shown to improve reading speed as effectively as direction-discrimination of dim vertical bars moving relative to a stationary textured background (Lawton, 2000, 2011).
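The training stimulus described above, a vertical sinewave grating drifting left or right, can be sketched with the standard luminance profile L(x, t) = L0[1 + c·sin(2πf(x − d·v·t))], where c is Michelson contrast, f spatial frequency, v speed and d = +1 (rightward) or −1 (leftward). This is a minimal sketch under that assumed form: the spatial frequency, speed and contrast values are hypothetical, the 125 cd/m² mean luminance is the value reported for the study's displays, and the stationary background patterns are not modeled.

```python
import math

MEAN_LUMINANCE = 125.0  # cd/m^2, the mean luminance reported for the displays


def grating_luminance(x_deg, t_s, contrast, sf_cpd, speed_dps, direction):
    """Luminance of a vertical sinewave grating drifting left or right.

    x_deg: horizontal position (degrees of visual angle)
    t_s: time (seconds)
    contrast: Michelson contrast, 0..1
    sf_cpd: spatial frequency (cycles/degree)
    speed_dps: drift speed (degrees/second)
    direction: +1 for rightward, -1 for leftward drift
    """
    phase = 2 * math.pi * sf_cpd * (x_deg - direction * speed_dps * t_s)
    return MEAN_LUMINANCE * (1 + contrast * math.sin(phase))


def michelson_contrast(values):
    """(Lmax - Lmin) / (Lmax + Lmin) over sampled luminances."""
    hi, lo = max(values), min(values)
    return (hi - lo) / (hi + lo)


# One frame of a hypothetical 0.5 cycle/deg, 2% contrast grating (0 to 4 deg).
xs = [i / 100 for i in range(400)]
frame = [grating_luminance(x, 0.0, 0.02, 0.5, 1.0, +1) for x in xs]
print(round(michelson_contrast(frame), 4))  # → 0.02
```

Because the contrast term scales a fixed mean luminance, lowering c toward threshold leaves the average screen brightness unchanged, which is what a contrast-threshold procedure requires.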

This study explored the hypothesis that if sluggish magnocellular neurons underlie dyslexia, then training to improve the sensitivity and timing of magnocellular relative to parvocellular processing should improve reading fluency and attention. This study investigated whether improving dorsal stream function is more effective in remediating reading, attention, and memory problems when the intervention training improves timing in the auditory or visual domain, compared to linguistic methods for improving phonological processing. Since both the auditory timing and linguistic interventions required responses chosen from a larger number of possible responses, it is likely they require more frequent use of selective attention than the visual direction-discrimination task, providing good comparison interventions to determine whether visual training in the dorsal stream is the most effective type of training to improve attention.

# MATERIALS AND METHODS

Only students diagnosed as dyslexic by the Decoding Encoding Screener for Dyslexia (DESD), based on single-word decoding (word identification) and encoding (spelling and writing phonetically), participated in this study. The DESD, standardized by Western Psychological Services in Los Angeles, CA, USA, was clinically validated using the Woodcock-Johnson standardized reading tests (Guerin et al., 1993), as well as the Gray Oral Reading Test (GORT) and the spelling subtest of the Wide Range Achievement Test (WRAT; Handford and Borsting, 2015). In this study, the severity of dyslexia (scored as 1: above normal, 2: normal, 3: borderline dyslexia, 4: mild dyslexia, 5: moderately severe dyslexia, and 6: markedly below normal) was determined by combining each student's dyseidetic score from 1 to 6 (spelling problems) and dysphonetic score from 1 to 6 (pronunciation problems) on the DESD; most students had borderline or mild dyslexia. Matched samples were created by ordering students by severity of dyslexia using this combined score and then randomly assigning students from this ordered list to one of three groups (the control group or one of the two treatment groups) in traditional schools, or to one of two groups (control or visual direction-discrimination training) in year-round schools.
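The matched-assignment procedure above can be sketched as blocked randomization: sort students by combined severity, then give each consecutive block of k students (k = number of groups) a random permutation of the group labels. This is a minimal sketch under stated assumptions: the combined score is taken here to be the sum of the two DESD subscores (the text says only "combining"), and the roster, IDs and seed are hypothetical.

```python
import random


def matched_assignment(students, groups, seed=None):
    """Assign students to groups in severity-matched blocks.

    students: list of (student_id, dyseidetic, dysphonetic) tuples
    groups: list of group labels, e.g. ["control", "visual", "auditory"]
    Students are ordered by combined severity (assumed here to be the sum
    of the two DESD subscores); each consecutive block of len(groups)
    students receives a random permutation of the group labels.
    """
    rng = random.Random(seed)
    ordered = sorted(students, key=lambda s: s[1] + s[2])
    assignment = {}
    k = len(groups)
    for start in range(0, len(ordered), k):
        block = ordered[start:start + k]
        labels = groups[:]
        rng.shuffle(labels)
        for student, label in zip(block, labels):
            assignment[student[0]] = label
    return assignment


# Hypothetical roster of six students (IDs and scores are made up).
roster = [("s1", 3, 4), ("s2", 4, 4), ("s3", 3, 3),
          ("s4", 5, 4), ("s5", 4, 3), ("s6", 3, 5)]
print(matched_assignment(roster, ["control", "visual", "auditory"], seed=1))
```

Because each block spans a narrow severity range and contains one student per group, group means on the severity score stay close by construction, while the within-block shuffle keeps the assignment random. Passing two labels reproduces the two-group design used in the year-round schools.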

Dyslexic second graders (7 years old) were assigned to one of three reading interventions designed to improve: (1) auditory timing; (2) visual timing; or (3) linguistic word building. Second graders were studied because they are in the middle of the developmental period for learning direction discrimination (Lawton, 2000, 2007, 2008; Wolf et al., 2000), maximizing the ease of learning this task. This study was conducted in six elementary schools in the San Diego Unified School District (SDUSD), where 50% of the students were reading below proficiency according to the California Standardized Test results available on each school's website. The students spoke English fluently. This study was carried out in accordance with the recommendations of both the University of California San Diego (UCSD) IRB and The Research and Reporting Department at SDUSD, with written informed consent from all subjects in accordance with the Declaration of Helsinki.

This study examined 58 children in second grade, 7.4 ± 0.4 years of age, 49% girls and 51% boys. The number of participants in Traditional Schools (TS) was seven control, six visual timing and six auditory timing students; in year-round schools it was 19 control and 20 visual timing students. Participants were randomly assigned from the ordered list of DESD scores into either two (year-round schedules) or three groups (TS schedules), and the groups were balanced on age and baseline reading rate, attention and working memory, as shown in **Table 1**. The ethnic distribution for students in: (1) TS was 48.6% Caucasian, 25.7% Hispanic, 8.6% African American, and 17.1% Asian; and (2) year-round schools was 33.3% Caucasian, 28.2% Hispanic, 10.3% African American, and 28.2% Asian. These ethnicities were distributed equally among groups.

Baseline values in Table 1 are mean ± SD for age and scores on tests graphed in Figure 2.

So that each Research Assistant (RA) could train one or two second graders, 40 UCSD undergraduate RAs were trained extensively at the beginning of the school year. The RAs administered the standardized tests and reading interventions. The standardized tests were administered in a masked manner: RAs did not know whether the student was in a control or treatment group, thereby reducing the possibility of experimenter effects. Moreover, parents: (1) were not aware of which group their child was in; and (2) thought that the auditory or linguistic interventions would work more effectively, since these were the traditional interventions advertised to improve reading. This observation, though anecdotal, suggests that any parent expectancy effects would have worked against the efficacy of the visual direction-discrimination intervention. Stickers were given to students at the end of each day's training for good behavior and for completing the intervention correctly, to reward them for paying attention to the task. Motivational strategies were used to keep the participants on task.

#### Experimental Design

The study was conducted for 20 weeks in four traditional-schedule and two year-round public elementary schools in SDUSD, in the morning right before guided reading in the classroom, so that each student had plenty of reading practice following the interventions. Twenty weeks of training was longer than in previous studies of direction-discrimination training to improve reading fluency (Lawton, 2000, 2007, 2008, 2011), but was the minimum needed for the auditory timing training to be effective (Ostarello and De Ley, 2009). Controls were students who stayed in the classroom doing linguistic word building while students in the treatment groups were pulled out for 30 min to do the visual or auditory timing interventions, either 3 days a week for visual timing training or 5 days a week for auditory timing training. The linguistic word building intervention was a reading intervention provided by SDUSD, aimed at improving phonics, decoding, vocabulary, reading comprehension, and reading fluency through word building exercises. Visual direction-discrimination was trained for a total of 20–30 h, depending on the time needed to complete 20 contrast thresholds, compared to 50 h of training on auditory phonemic processing. Auditory timing training was conducted only in traditional-schedule schools, at the request of those implementing it, because this training was found to regress in effectiveness when 4-week vacations occur during the intervention (Ostarello and De Ley, 2009), as they do in year-round schools.
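The reported training totals follow from the session schedule, assuming every scheduled 30-min session over the 20 weeks was completed. On this reading, 30 h is the upper bound for the visual group, whose training could end earlier once 20 contrast thresholds had been completed:

$$
\begin{aligned}
\text{visual timing:}\quad & 30~\text{min} \times 3~\text{sessions/week} \times 20~\text{weeks} = 1800~\text{min} = 30~\text{h}\\
\text{auditory timing:}\quad & 30~\text{min} \times 5~\text{sessions/week} \times 20~\text{weeks} = 3000~\text{min} = 50~\text{h}
\end{aligned}
$$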

At each school, the interventions were administered in a room devoted to this task on 13-inch MacBook Pro computers purchased for this study. The computers were calibrated at the beginning of the school year with a Spectra Pritchard 1980A photometer so that luminance and contrast increased linearly. The mean luminance was set to 125 cd/m<sup>2</sup> on all computers by reducing the screen brightness by 2–3 levels. The screen brightness, volume, and date were checked each day before beginning visual timing training. Students sat an arm's length from the screen, about 57 cm.
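A viewing distance of about 57 cm is a common convention in vision research because at that distance 1 cm on the screen subtends almost exactly 1° of visual angle (57.3 ≈ 180/π, the number of degrees per radian). The sketch below shows the conversion using θ = 2·atan(s / 2d); the stimulus sizes are illustrative.

```python
import math


def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by an object of size_cm at distance_cm."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


def size_for_angle_cm(angle_deg, distance_cm):
    """Inverse: on-screen size (cm) needed to subtend angle_deg at distance_cm."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


print(round(visual_angle_deg(1.0, 57.0), 3))  # → 1.005
```

Spatial frequencies specified in cycles/degree, as for the gratings used here, can therefore be converted to cycles/cm on screen with little correction at this distance.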

#### Standardized Tests

Standardized tests of reading fluency, phonological processing, attention, and working memory were administered one-on-one to every student in the study before and after the intervention training by trained UCSD undergraduate students. These tests were chosen as the "gold standard" for fast and accurate measurement of fluency-based reading, attention, and working memory skills. The standardized tests, which took about 1.5 h to administer, were:


was used to measure reading speeds. Reading speed, measured in words/minute using a double staircase procedure, was not limited by the child's rate of speaking, as is the case for the GORT-3 below. In addition, words/minute is a much higher-resolution scale than the 1–5 scale used to score reading rates on the GORT-3 below;


## Interventions

#### Auditory Timing Intervention: FastForWord

The auditory timing intervention, implemented using FastForWord (FFW), developed by Scientific Learning Corporation, is designed to improve phonological processing by lengthening phonemes until they are perceived accurately. It focuses on building the auditory processing and oral language skills important for reading by using acoustically modified, digitally generated speech: (1) frequency-modulated tonal sweeps; (2) speech syllables with parametric modifications of temporal features, in which formant transitions were lengthened and intensified or silent gaps were lengthened; (3) word exercises using human speech that was parametrically modified in the temporal domain to teach students word identification, word matching, or following directions; and (4) phrases and sentences with increasingly complex grammatical structures to develop higher-level language skills, including phonology, morphology, syntax, grammar, and short-term memory. The stimuli changed adaptively, increasingly approximating normal speech, until normal speech was presented at the final levels. This computer-based intervention is designed to build language and reading skills while strengthening the cognitive skills of memory, attention, processing, and sequencing. It comprised seven exercises, three of which were done during each half-hour training session. To keep each participant challenged but not frustrated, the exercises adapt so that the participant is successful around 80% of the time. Detailed reporting helps instructors track participant performance, with alerts that tell the instructor when a participant is ready to move to the next product or is struggling and needs extra help. Individual interventions were provided for subjects who had difficulty progressing through the levels of the exercises, to enable them to complete the exercises.
For some subjects, the screen occasionally had to be covered, since the graphics prevented them from concentrating on the subtle auditory discriminations required by FFW training. FFW was trained for 30 min, 5 days a week for 20 weeks.

#### Visual Timing Intervention: Motion Direction-Discrimination Training

The visual timing intervention, implemented using PATH to Reading and developed to remediate dorsal stream function, consisted of motion direction-discrimination training. This novel intervention (Lawton, 2000, 2007, 2008, 2011, 2015), developed by the author and available commercially<sup>1</sup>, is described here in enough detail to convey its basic components. The student sat in front of a computer monitor with a display similar to the ones in **Figure 1**. During each presentation, the bars of a sinusoidal grating in the "fish-shaped" window in the center of the screen moved left or right very briefly (450 ms). The student reported which way the center pattern moved by pressing the left or right arrow key. A brief tone was presented after incorrect responses. The program adaptively changed the contrast of the test pattern to keep the student at 79% correct. Difficulty was also increased by making the background pattern more similar to that in the fish, by increasing the pattern's complexity level, and by increasing the number of directions of movement from one to two.

The patterns used for this visual direction-discrimination task were designed to be optimal for activating magnocellular neurons (moving test pattern) relative to parvocellular neurons (stationary background; Lawton, 2000, 2007, 2011). In a given staircase run, the center spatial frequency (i.e., the test frequency) was 0.25, 0.5, 1, or 2 cyc/deg. The surround grating spatial frequency was either equal to the test frequency or 1–2 octaves higher or lower than the test frequency (e.g., see **Figure 1B**). In addition to these simple backgrounds, multifrequency backgrounds were used, in which the first background frequency equaled the spatial frequency of the single-frequency background and two additional background frequencies were added with a difference frequency equal to the test frequency. For example, the multifrequency backgrounds in **Figure 1C** are for a test frequency of 1 cyc/deg and a background of 1 cyc/deg + 2 cyc/deg + 3 cyc/deg.
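The stimulus design above can be made concrete by enumerating the test/background pairings. This is a minimal sketch, not the PATH to Reading code: it assumes that "1–2 octaves higher or lower" means a background of test × 2<sup>k</sup> for k in {−2, −1, 0, 1, 2}, and that the two added multifrequency components are first + test and first + 2 × test, which reproduces the 1 + 2 + 3 cyc/deg example for a 1 cyc/deg test. All names are hypothetical.

```python
from itertools import product

TEST_FREQS_CPD = (0.25, 0.5, 1.0, 2.0)  # test (center) spatial frequencies, cyc/deg
OCTAVE_OFFSETS = (-2, -1, 0, 1, 2)      # assumed background offsets, in octaves


def single_background(test_cpd: float, octaves: int) -> float:
    """Single-frequency background `octaves` above (+) or below (-) the test frequency."""
    return test_cpd * 2.0 ** octaves


def multifrequency_background(first_cpd: float, test_cpd: float) -> tuple:
    """Three components whose successive differences equal the test frequency (assumed)."""
    return (first_cpd, first_cpd + test_cpd, first_cpd + 2 * test_cpd)


def training_cycle() -> list:
    """All (test, background) pairings: 4 test x 5 background frequencies = 20."""
    return [(t, single_background(t, k)) for t, k in product(TEST_FREQS_CPD, OCTAVE_OFFSETS)]


cycle = training_cycle()
print(len(cycle), "conditions; e.g.", cycle[:3])
print("1 cyc/deg test, multifrequency background:", multifrequency_background(1.0, 1.0))
```

The 20 pairings correspond to the 20 threshold determinations that make up one full training cycle.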

At the start of a session, both the test and background gratings were set to 5% contrast to ensure that the contrast of the test pattern was in the middle of the magnocellular contrast range (Kaplan and Shapley, 1986). Each time the child correctly identified the direction in which the fish stripes moved, the contrast of the test grating was lowered, until the child answered incorrectly. Following the first incorrect response, a double-staircase procedure was used to estimate the direction-discrimination contrast threshold, from which contrast sensitivity was computed as 100 divided by the contrast threshold (in percent), i.e., the reciprocal of the threshold. This staircase procedure estimates the contrast needed for 79% correct responses, providing the most sensitive, repeatable measurements of contrast sensitivity (Higgins et al., 1984). A full training cycle of the direction-discrimination task required 20 threshold determinations (i.e., one for each of the four test spatial frequencies paired with each of the five background spatial frequencies).

<sup>1</sup>www.pathtoreading.com
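The threshold procedure can be sketched as an adaptive staircase. The text specifies only the 5% starting contrast, the initial descent until the first error, and the ~79% correct target; everything else here is an assumption. In particular, the sketch uses a 3-down/1-up rule (which converges where p<sup>3</sup> = 0.5, i.e., about 79.4% correct), a fixed multiplicative step, and a threshold taken as the mean contrast at the last reversals. A double staircase would interleave two such tracks. Class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ContrastStaircase:
    contrast: float = 5.0  # starting contrast (%), as in the text
    step: float = 0.8      # multiply by this to lower contrast (assumed value)
    n_down: int = 3        # consecutive correct answers before lowering (assumed rule)
    _run: int = 0
    _seen_error: bool = False
    _direction: int = 0    # -1 = last move lowered contrast, +1 = raised it
    reversals: list = field(default_factory=list)

    def _move(self, direction: int) -> None:
        # Record the contrast at each change of direction (a reversal).
        if self._direction and direction != self._direction:
            self.reversals.append(self.contrast)
        self._direction = direction
        self.contrast = self.contrast * self.step if direction < 0 else self.contrast / self.step

    def update(self, correct: bool) -> float:
        """Apply one trial's response; return the contrast (%) for the next trial."""
        if not self._seen_error:
            # Initial descent: lower contrast after every correct answer until the first error.
            if correct:
                self._move(-1)
            else:
                self._seen_error = True
                self._move(+1)
            return self.contrast
        if correct:
            self._run += 1
            if self._run == self.n_down:
                self._run = 0
                self._move(-1)
        else:
            self._run = 0
            self._move(+1)
        return self.contrast

    def threshold(self, last_n: int = 6) -> float:
        """Threshold estimate (%): mean contrast at the last `last_n` reversals."""
        return mean(self.reversals[-last_n:])


def target_percent_correct(n_down: int) -> float:
    """Convergence point of an n-down/1-up rule, where p ** n_down == 0.5."""
    return 100 * 0.5 ** (1 / n_down)


def contrast_sensitivity(threshold_percent: float) -> float:
    """Contrast sensitivity = 100 / threshold (%), the reciprocal of the threshold."""
    return 100 / threshold_percent


print(f"3-down/1-up targets {target_percent_correct(3):.1f}% correct")
```

With this definition, a 2.9% threshold corresponds to a sensitivity of about 34.5, and a 1.35% threshold to about 74.1.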

Increasing the complexity level increased: (1) the number of sine-wave components in the background from one (**Figure 1B**) to three (**Figure 1C**); (2) the background contrast from 5% to 20% (**Figure 1C**); and (3) the pattern's speed of movement, after every four complexity levels, from 6.7 Hz up to 13.3 Hz, as shown in **Table 2**, so that the student was challenged as the training progressed. The background contrast was increased to 20% to provide a background that increased parvocellular activity, since magnocellular neurons saturate at 10% contrast (Kaplan and Shapley, 1986). The 20% contrast background required students to analyze information from magnocellular activity relative to increased parvocellular activity, making the task more challenging. The order of presentation of the complexity levels was chosen to increase the difficulty of the task gradually (Lawton, 2011). Therefore, each time the complexity level increased, the contrast threshold was expected to be higher initially. Once all 16 complexity levels of the Motion program were completed, the student progressed to the next program, MotionMemory. Instead of discriminating the direction one pattern moved by pressing the left or right arrow key, as in the Motion program, MotionMemory requires signaling the directions in which two separate patterns moved, one after the other, by pressing one of four arrow keys. Each threshold in both the Motion and MotionMemory programs required 20–40 trials to complete. A score was given to make the training more game-like: the lower the contrast threshold, the higher the score. After learning how to do this task, children typically took about 15–20 min to complete one replication, consisting of 20 contrast thresholds. Motion direction-discrimination was trained for 15–30 min, 3 days a week for 20 weeks.
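A rough consistency check on these figures (simple arithmetic on the numbers above, not a reported result): one replication of 20 thresholds at 20–40 trials each comprises

$$20 \times (20\text{–}40) = 400\text{–}800\ \text{trials},$$

so completing it in 15–20 min (900–1200 s) implies roughly

$$\frac{900\ \text{s}}{800\ \text{trials}} \approx 1.1\ \text{s} \quad\text{to}\quad \frac{1200\ \text{s}}{400\ \text{trials}} = 3\ \text{s}$$

per trial, including the 450 ms presentation and the response.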

#### Linguistic Word Building Intervention (Control Intervention)

A linguistic word building intervention was implemented using Learning Upgrade to help struggling readers at the second and third grade level overcome reading difficulties. Each course contained 60 lessons sequenced to build reading skills. The lessons featured a song-video for instruction and a game for practice with remediation. After logging onto the website, the student was shown a 1–2 min song-video that taught the word building topic through lyrics with a catchy melody and synchronized animated visuals of letters, words, and pictures. A game followed the song-video, in which students answered a series of questions interactively. Immediate remediation through a spoken voice and animated visuals was given for each incorrect answer, followed by additional problems. When a student reached 100 points with more than 75% correct, they moved on to the next lesson. If not, they repeated the lesson with the same song but different questions in the game. When a student completed all 60 lessons, they earned a Bronze certificate, which could be printed. The student then used a visual map of lessons and scores to repeat any lessons below 90% to earn a Silver certificate, and then repeated any lessons below 95% to earn a Gold certificate. When a student had earned a Gold certificate, typically after about 20–30 h of time on task, the student was finished with the course and moved to a higher course.
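The progression rules of the control intervention amount to a small decision procedure. The sketch below encodes them as described in the text; the function names and the idea of tracking each lesson's best percent correct are hypothetical illustrations, not Learning Upgrade's implementation.

```python
N_LESSONS = 60  # lessons per course, as stated in the text


def lesson_passed(points: int, percent_correct: float) -> bool:
    """Advance to the next lesson on reaching 100 points with more than 75% correct."""
    return points >= 100 and percent_correct > 75


def certificate(best_scores: list):
    """Certificate earned, given each lesson's best percent correct.

    Returns None until all 60 lessons have been completed.
    """
    if len(best_scores) < N_LESSONS:
        return None
    lowest = min(best_scores)
    if lowest >= 95:
        return "Gold"    # no lesson left below 95%
    if lowest >= 90:
        return "Silver"  # no lesson left below 90%
    return "Bronze"      # all 60 lessons completed


print(certificate([80.0] * 60), certificate([92.0] * 60), certificate([96.0] * 60))
```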

#### Hypotheses

The primary hypothesis in this study is that timing interventions, either one to improve auditory timing or one to improve visual timing, would improve attention, reading, and working memory more than linguistic word building exercises. Attention is the primary outcome measure, with reading speed and comprehension, phonological processing, and working memory being secondary outcome measures that result from improved selective and sustained attention. The secondary hypotheses, based on physiological data demonstrating that 1 cyc/deg is the lowest spatial frequency channel (Blakemore and Campbell, 1969), predict that direction-discrimination sensitivity improves: (1) the most for the lowest spatial frequency channel, 1 cyc/deg, which moves twice as far in the same amount of time as the higher 2 cyc/deg test pattern; (2) the least for the 0.25 cyc/deg test pattern, which requires pooling across spatial frequency channels to complete the task; and (3) more when a wider background frame of reference is presented, consisting of multiple spatial frequencies that are harmonics (multiples) of the test frequency, as found in typically developing observers (Lawton, 1989).
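Prediction (1) follows from the relation between temporal frequency, spatial frequency, and speed for a drifting grating: speed (deg/s) = temporal frequency (Hz) / spatial frequency (cyc/deg). A minimal sketch, assuming that the 6.7–13.3 Hz values in Table 2 are the gratings' temporal frequencies and using the 450 ms presentation described in the Methods; function names are illustrative.

```python
def drift_speed_deg_per_s(temporal_hz: float, spatial_cpd: float) -> float:
    """Speed of a drifting grating: (cycles/s) / (cycles/deg) = deg/s."""
    return temporal_hz / spatial_cpd


def displacement_deg(temporal_hz: float, spatial_cpd: float, duration_s: float = 0.45) -> float:
    """Distance (deg) the grating travels during one presentation."""
    return drift_speed_deg_per_s(temporal_hz, spatial_cpd) * duration_s


for sf in (0.25, 0.5, 1.0, 2.0):
    print(f"{sf:>4} cyc/deg at 6.7 Hz: {drift_speed_deg_per_s(6.7, sf):5.2f} deg/s, "
          f"{displacement_deg(6.7, sf):5.2f} deg per 450 ms")
```

At any fixed temporal frequency, the 1 cyc/deg grating therefore travels twice as far as the 2 cyc/deg grating in the same presentation time.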

#### Statistical Analyses

Changes in test performance for the primary and secondary outcome measures (attention, reading speed and comprehension, working memory, and phonological processing) and all secondary hypotheses were modeled using ANCOVAs controlling for age, sex, ethnicity (Caucasian, Hispanic, Asian, African-American), and school enrolled. Data were either (1) pooled across the four schools having a traditional schedule or (2) pooled across all six schools, four having a traditional schedule and two a year-round schedule, with school and type of school (traditional vs. year-round) included as covariates in the planned ANCOVA. ANCOVA contrast tests were used to compare change in standardized scores in controls vs. treatment groups.
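As a sketch only (the exact coding of the covariates is not reported), the ANCOVA for each outcome can be written as a linear model in which the treatment-group indicators carry the contrasts of interest:

$$\Delta y_i = \beta_0 + \beta_{\mathrm{PATH}}\,T^{\mathrm{PATH}}_i + \beta_{\mathrm{FFW}}\,T^{\mathrm{FFW}}_i + \beta_{\mathrm{age}}\,\mathrm{age}_i + \beta_{\mathrm{sex}}\,\mathrm{sex}_i + \sum_{k}\gamma_k\,E_{ik} + \sum_{s}\delta_s\,S_{is} + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)$$

Here $\Delta y_i$ is student $i$'s post-minus-pre change in a standardized score, $T^{\mathrm{PATH}}_i$ and $T^{\mathrm{FFW}}_i$ are assumed 0/1 indicators for the two timing groups (word building controls as the reference), and $E_{ik}$ and $S_{is}$ are assumed dummy variables for ethnicity and school. The PATH-vs-control contrast then tests $H_0{:}\ \beta_{\mathrm{PATH}} = 0$ with $t = \hat\beta_{\mathrm{PATH}} / \mathrm{SE}(\hat\beta_{\mathrm{PATH}})$.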

A one-sample t-test was used to compare initial contrast sensitivity in the 58 dyslexics in this study to published levels in typically-developing second graders; paired t-tests were used to compare initial to final contrast sensitivity within each treatment group. The relationship between contrast sensitivity and the complexity level of the motion direction-discrimination task was assessed using the 24 visual timing group students who completed all 16 levels of the Motion direction-discrimination training. For data from test frequencies 0.25, 0.5, 1.0, and 2.0 cyc/deg, the relationship between complexity level and contrast sensitivity was assessed using a linear mixed effects model, with hypothesis testing based on the fixed effect estimate of the mean trajectory as complexity level increased, i.e., as the amount of training increased. Tests of whether students' contrast sensitivity at specific complexity levels deviated from the overall linear trend (i.e., was higher or lower than expected) were performed by adding indicator variables for the complexity levels in question to the linear mixed effects models. Paired t-tests were used to test for significant improvement in contrast sensitivity at 0.25, 0.5, 1.0, and 2.0 cyc/deg from baseline to end of study within students trained on PATH to Reading. All analyses were performed using the R statistical programming language. ANCOVA models were fit using the aov function (Chambers et al., 1992), and mixed effects models were fit using the lmer function (Pinheiro and Bates, 2000). All tests were two-sided, since students could either improve or decline in academic skills, with significance level α = 0.05.
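The analyses were run in R; the following is a hypothetical Python illustration, not the authors' code. It shows two things: (a) how the indicator variables for specific complexity levels can be coded from the level number, using the level assignments given in the Results (multifrequency 5% backgrounds at levels 2, 6, 10, 14; 20% backgrounds at levels 4, 8, 12, 16), and (b) a simpler two-stage stand-in for the mixed-model fixed effect. The two-stage version fits an ordinary least-squares slope of contrast sensitivity on complexity level for each student and then averages the slopes. It is not identical to the lmer estimate, which pools information across students through random effects.

```python
from statistics import mean


def level_indicators(level: int) -> dict:
    """0/1 indicators for the complexity levels singled out in the analyses."""
    if not 1 <= level <= 16:
        raise ValueError("Motion complexity levels run from 1 to 16")
    position = (level - 1) % 4 + 1  # position within each block of four levels
    return {
        "multifreq_5pct": int(position == 2),   # levels 2, 6, 10, 14
        "multifreq_20pct": int(position == 4),  # levels 4, 8, 12, 16
    }


def ols_slope(xs, ys) -> float:
    """Least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def mean_trajectory(per_student) -> float:
    """Two-stage stand-in: average of per-student slopes (sensitivity per level)."""
    return mean(ols_slope(levels, cs) for levels, cs in per_student)


levels = list(range(17))  # 0 = initial, 1-16 = complexity levels
students = [(levels, [10 + 2 * l for l in levels]), (levels, [5 + 3 * l for l in levels])]
print(mean_trajectory(students), level_indicators(6))
```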

#### RESULTS

This study, examining the efficacy of visual timing vs. auditory timing vs. linguistic word building training, found significant improvements in students' attention, reading fluency, and working memory, relative to linguistic word building, only following visual motion-discrimination training. If language-based deficits underlie dyslexia, then training to improve auditory timing should also have significantly improved these academic skills, since this training used clever, engaging auditory exercises for roughly twice as long in total (50 h vs. 20–30 h), 30 min 5 times/week, compared to the training to improve visual timing, done for 15–30 min 3 times/week.

## Effect of Interventions on Attention, Reading Fluency, and Working Memory

Students trained on motion direction-discrimination improved significantly more than controls (see **Table 3** and **Figure 2**) in Attention: **Figure 2A** (pooled data) [t(44) = 2.69, p = 0.009] and **Figure 2B** TS [t(21) = 3.18, p = 0.004]; Reading Speed: **Figure 2C** (pooled data) [t(44) = 3.01, p = 0.004] and **Figure 2D** TS [t(21) = 2.98, p = 0.007]; Reading Comprehension: **Figure 2E** (pooled data) [t(44) = 2.04, p = 0.046]; sequential Visual Working Memory: **Figure 2F** TS [t(21) = 2.34, p = 0.036]; nonsequential Auditory Working Memory: **Figure 2G** (pooled data) [t(44) = 2.14, p = 0.037] and **Figure 2H** TS [t(21) = 2.34, p = 0.027]; Delayed Recall: **Figure 2I** TS [t(21) = 2.39, p = 0.026]; and Phonological Processing (CTOPP Blending Words): **Figure 2J** (pooled data) [t(44) = 3.52, p = 0.0009]. In contrast, students trained to improve auditory timing, implemented using FFW, did not improve significantly more than controls on these tasks. The significant improvements in attention by students trained on visual direction-discrimination, a task that required less attention to complete than either the auditory or the linguistic intervention, show that visual training is more effective than auditory or linguistic training in improving the attention networks. Visual training that does not activate dorsal stream functioning at both low and high levels, e.g., motion coherence, however, is not effective in improving reading fluency (Solan et al., 2004).

Direction-discrimination training improved reading speed in the classroom by 50 to 125 words/minute more, on average, than linguistic interventions did. Note that even though the auditory and visual timing groups in TS each had only six subjects, the visual timing intervention improved attention, reading speed, visual and auditory working memory, and delayed recall more than the auditory timing intervention did, and significantly more than linguistic word building, implemented using Learning Upgrade. The same pattern of results was found the following year, although the effects were smaller, and only reading speed, phonological processing (the Blending Words subtest of the CTOPP), and auditory working memory improved significantly, since the PATH intervention had to be administered before school instead of immediately before guided reading. PATH to Reading is most effective when immediately followed by guided reading. Moreover, even though motion direction-discrimination is a visual task, it improved phonological processing significantly more than interventions using an auditory task (either auditory timing or linguistic word building). Only students trained on motion direction-discrimination improved significantly more than controls in the combined (sequential and nonsequential) auditory working memory standardized score [t(44) = 2.23, p = 0.03]. Future studies with larger sample sizes are needed to establish conclusively the advantage of improving visual timing over auditory timing.

#### Effect of Interventions on Visual Motion Processing

Students in this study had abnormal visual motion processing, as shown by the mean baseline contrast sensitivity function (CSF) for direction discrimination in **Figure 3A**. Initially, participants had elevated contrast thresholds for movement discrimination, averaging 2.9% ± 0.2, significantly higher [one-sample t(44) = 5.81, p < 0.0001] than the previously reported mean contrast threshold of 1.35% ± 0.1 for typically-developing second graders (Lawton, 2007). Direction-discrimination contrast sensitivity improved significantly only for students trained on the motion direction-discrimination intervention (**Figure 3A**), improving three-fold after motion direction-discrimination training both in TS [paired t(5) = 3.694, p = 0.014] and in pooled data from traditional and year-round schools [paired t(24) = 5.618, p < 0.0001]. The motion discrimination CSF increased significantly as a function of complexity level for each test frequency target, as shown in **Figure 3B** and **Table 4**. The temporal frequencies at which students could not discriminate the direction of movement before training, and at which they had the highest contrast sensitivities after training, were the 10 and 13 Hz motion (complexity levels 9–16).

TABLE 3 | Mean increase following the timing interventions (treatment effect) vs. the word building intervention (controls), with the df (degrees of freedom), t value, and p value for the significant improvements, which were found only following PATH training.

FIGURE 2 | Improvements over controls in Attention: (A, pooled data), <sup>∗</sup>p < 0.009, (B) Traditional Schools (TS), <sup>∗</sup>p < 0.004; Reading Speed: (C, pooled data), <sup>∗</sup>p < 0.004, (D) TS, <sup>∗</sup>p < 0.007; Reading Comprehension (Gray Oral Reading Test, GORT-3): (E, pooled data), <sup>∗</sup>p < 0.046; Visual Working Memory: (F) TS, <sup>∗</sup>p < 0.036; Auditory Working Memory: (G, pooled data), <sup>∗</sup>p < 0.037, (H) TS, <sup>∗</sup>p < 0.027; Delayed Recall: (I) TS, <sup>∗</sup>p < 0.026; and Phonological Processing (Blending Words): (J, pooled data), <sup>∗</sup>p < 0.0009, following each intervention [PATH to Reading (PATH): black; FastForWord (FFW): striped]. These barplots display the mean (SE) difference in improvement of standardized scores in each treatment group compared to the improvements observed in the control group. Positive bars indicate that subjects in the treatment group improved more than subjects in the control group; negative bars indicate that control subjects improved more than those in the treatment group.

FIGURE 3 | (A) Mean (SE) improvements in direction-discrimination contrast sensitivity for 1 cyc/deg test patterns at the first complexity level, plotting the contrast sensitivity function (CSF) measured at the beginning and end of intervention training, averaged over the five different background patterns, for students in the PATH group; <sup>∗</sup> significant at p < 0.0001. (B) Pooled data from traditional and year-round schools (26 subjects). Improvements in direction-discrimination contrast sensitivity at increasing levels of complexity, plotting initial (0) and maximum contrast sensitivity at each level of complexity (1–16) for each of the four test frequencies: 0.25, 0.5, 1, and 2 cyc/deg. The data in this graph represent the mean (SE) contrast sensitivity averaged across subjects trained on PATH in TS (6) and year-round schools (20) who completed all 16 levels of complexity in the PATH program.
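The one-sample and paired comparisons reported above share a single statistic: a paired t-test is a one-sample t-test on the within-student differences. The sketch below computes the t statistic and degrees of freedom with the Python standard library; it is a hypothetical illustration with made-up input values, not the study's R code. A p-value would come from the t distribution with n − 1 degrees of freedom (for example, `scipy.stats.t.sf`, which is not used here).

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values, mu0: float = 0.0):
    """t statistic and degrees of freedom for H0: population mean == mu0."""
    n = len(values)
    t = (mean(values) - mu0) / (stdev(values) / sqrt(n))
    return t, n - 1


def paired_t(before, after):
    """Paired t-test: one-sample t-test on the after-minus-before differences."""
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    return one_sample_t([a - b for b, a in zip(before, after)])


# Illustrative numbers only, not study data.
print(paired_t([10, 12, 14], [20, 25, 27]))        # contrast sensitivity, pre vs. post
print(one_sample_t([2.0, 3.0, 4.0], mu0=1.35))     # thresholds (%) vs. a published mean
```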

When the 5% contrast background changed from a single frequency to multiple spatial frequencies (i.e., at complexity levels 2, 6, 10, and 14, described in **Table 2**), contrast sensitivity was higher, on average, by 18.5 points relative to the general linear trend [t(21) = 2.62, p = 0.016]. Conversely, when the multifrequency background pattern was presented at 20% contrast for the lowest spatial frequency target of 0.25 cyc/deg (complexity levels 4, 8, 12, and 16 in **Table 2**), contrast sensitivity was lower, a marginally significant effect [t(21) = −1.99, p = 0.06]. This is expected because at 20% contrast the background activates parvocellular neurons more than magnocellular neurons (Kaplan and Shapley, 1986), making the task more difficult. The 0.25 cyc/deg test frequency requires pooling of contrast information over several spatial frequency channels since, as shown by Blakemore and Campbell (1969), there are no luminance-varying spatial frequency channels below 1 cyc/deg. Since the 1 cyc/deg test pattern showed the highest mean increase in contrast sensitivity, 11.2% per level of complexity, this indicates that 1 cyc/deg is the predominant test frequency for improving motion discrimination. The finding that the CSF increased with complexity level, that is, as (1) the speed of motion increased (**Table 2**) and (2) the background frame of reference widened (from single to multifrequency backgrounds) and increased in contrast (activating more parvocellular neurons at higher contrasts), suggests that direction-discrimination training improves the functioning of magnocellular neurons (left-right movement) relative to that of parvocellular neurons (stationary background).

TABLE 4 | Mean increase in contrast sensitivity as a function of test frequency and PATH complexity level for pooled data from traditional and year-round schools.

Not only did contrast sensitivity for direction discrimination increase significantly following motion training, but the time needed to discriminate the direction of motion also decreased significantly for students who did the direction-discrimination intervention. For example, for the 1 cyc/deg test frequency, the most sensitive test frequency target (see **Figure 3B**), the mean time to complete five threshold measurements decreased by an average of 6 s per complexity level [mean decrease by mixed effects model analysis, t(22) = 4.225, p = 0.004]; mean time was 5.00 min ± 0.22 at baseline and 2.38 min ± 0.17 at complexity level 16. That is, the mean time to complete motion direction-discrimination decreased as the complexity level increased. These results show that (1) sensitivity for direction discrimination increased and (2) the time required to complete motion direction-discrimination training decreased.

#### DISCUSSION

The key stimulus attribute needed to detect motion discrimination deficits in dyslexics is a stationary background: these deficits are assessed by measuring contrast sensitivity for the direction of motion relative to a stationary background (Georgeson and Scott-Samuel, 1999). Only when the direction of motion is discriminated against a stationary background do both dysphonetic and dyseidetic dyslexics exhibit an impaired ability to discriminate the direction of motion (Lawton, 2000, 2007, 2011; Ridder et al., 2001). When the direction of movement is not judged relative to a stationary background, some dyslexics do not exhibit motion deficits, as reviewed previously (Stein, 2001; Gori et al., 2014). Studies that have questioned whether magnocellular deficits in the dorsal stream cause the reading problems found in dyslexics (Amitay et al., 2002; Williams et al., 2003) examined dyslexics' sensitivity to flicker or to high-contrast random dot patterns presented with no background pattern or with a moving background (Sperling et al., 2006); none of these stimuli is optimal for activating direction-selective cells (Baker, 1990; De Valois et al., 2000). Patterned backgrounds, as opposed to featureless backgrounds, require figure/ground discrimination, suggesting that a core deficit in dyslexia may lie in figure/ground discrimination analyzed by the dorsal stream. This is consistent with (1) dyslexics' deficits arising primarily from deficits in the spatiotemporal parsing of the letter stream (Vidyasagar, 1999, 2001), which normally relies both on feedforward magnocellular (low-contrast movement) input and on feedback at the attended location from LIP to MT (Saalman et al., 2007) and from MT to V1 (Hupe et al., 1998); (2) an impairment in low gamma frequency oscillations that reduces feedback in visual cortical areas (Vidyasagar, 2013); and (3) deficits in excluding noisy backgrounds (Sperling et al., 2006).
Training with the stationary background frame of reference provided by multifrequency backgrounds (**Figure 1C**) improved dyslexics' ability to discriminate the direction of movement (Lawton, 2011), enabling the child to attend to a wider region of space. This study supports the hypothesis (Lawton, 1989) that stationary multifrequency backgrounds confer an advantage when discriminating the direction of motion, providing a wider, more structured frame of reference, most likely by taking advantage of MT's center-surround organization (Allman et al., 1985) to facilitate figure/ground discrimination. Moreover, only with stationary textured backgrounds has motion direction-discrimination training been found to improve reading fluency in all types of dyslexics (Lawton, 2000, 2007, 2011).

This study found that direction-discrimination training, a task that optimally activates the V1-MT network (De Valois et al., 2000), improved (1) movement direction sensitivity; (2) speed of processing for both motion direction discrimination and reading; (3) attention; (4) reading comprehension; (5) phonological processing; and (6) both auditory and visual working memory, including delayed recall, more than phonological training did, whether that training targeted auditory timing or word building strategies. These results indicate that direction-discrimination training improves the sensitivity and timing of sluggish magnocellular neurons (improving dorsal stream function) relative to parvocellular neurons early in the dorsal stream, as evidenced by improved motion discrimination sensitivity at higher background contrasts and temporal frequencies. After direction-discrimination training, the highest contrast sensitivities were found for patterns moving at 10–13 Hz, these temporal frequencies being key to improving attention in dyslexics. These results contradict Goswami's temporal sampling framework, which proposes that the key timing deficits in dyslexia are for movement <10 Hz (Goswami, 2011). This study found that improving visual motion direction-discrimination sensitivity and timing improved processing in the neural networks underlying attention, reading, and working memory in dyslexics. These improvements presumably arise from improving low levels of the dorsal stream, the V1-MT network, which in turn improves functioning at higher levels of the dorsal stream, including the PPC, the dorsolateral prefrontal cortex (DLPFC), and the attention networks.
This study provides additional evidence that visual motion processing is fundamental for paying attention, good reading performance, and remediating reading deficits, contrary to common practice based on the assumption that only auditory-based phonological processing can be used to remediate reading deficits (Tallal et al., 1993; Temple et al., 2003; Vellutino et al., 2004; Dehaene, 2009; Olulade et al., 2013).

Initially, the biological basis of dyslexia was assumed to lie in the brain regions responsible for the visual perception of text (Hinshelwood, 1917). At present, however, the dominant view is that the core deficit underlying reading disabilities is an auditory phonological processing deficit (Bradley and Bryant, 1983; Tallal et al., 1993; Temple et al., 2003; Vellutino et al., 2004; Dehaene, 2009; Olulade et al., 2013). A careful examination of the neuroimaging studies responsible for this paradigm shift reveals that visual word form areas and other visual processing areas were also implicated in many of these studies. For instance, Shaywitz et al. (1998) state: "Brain activation patterns differed significantly between the groups with dyslexic readers showing relative underactivation in posterior regions (Wernicke's area, the angular gyrus, and striate cortex) and relative overactivation in an anterior region (inferior frontal gyrus)." Hypoactivity of the striate (visual) cortex in persons with dyslexia is widely reported in the literature (Eden et al., 1996; Demb et al., 1998; Shaywitz et al., 1998; Shelley-Tremblay et al., 2011) and reliably co-occurs with abnormal patterns of cortical activity in areas more typically associated with auditory analyses. The visual contribution of dorsal stream processing to dyslexia has been dismissed by the American Academy of Pediatrics (2009) based on a version of the magnocellular deficit theory that has been shown to be biologically implausible (Scarborough, 2005). The proposal that phonological processing deficits are the sole and key abnormal factor in dyslexia is not borne out by studies showing that visual motion processing deficits are found in all types of dyslexics (Lawton, 2000, 2007, 2011; Ridder et al., 2001).
While phonological processing is a reliable and robust predictor of future reading, it cannot fully account for the variance in reading ability and the full range of deficits in dyslexic readers, instead only accounting for approximately 25% of future reading skills (Mann and Liberman, 1984; Wagner, 1997).

# Novel Method to Remediate Attention, Reading, and Working Memory in Dyslexics

Movement figure/ground discrimination, a novel method (Lawton, 2000, 2015), is fundamental for detecting and remediating attention, reading, and memory problems in all types of dyslexics. This study found that training to improve motion direction-discrimination, most likely by improving the timing and sensitivity of directionally selective magnocellular neurons relative to parvocellular neurons in the dorsal stream, is linked to improved attention skills, enabling readers to locate the beginning and end of each word and to process its letters sequentially without effort, thereby improving reading performance. Moreover, previous studies (Lawton, 2011) found that the more a student was trained on motion direction-discrimination, the more reading speed improved. Consequently, abnormal visual motion processing is implicated as a fundamental factor underlying the reduced functionality of the attention networks in dyslexics, causing slow reading speeds and poor comprehension. Furthermore, this abnormality can be remediated rapidly by visual training that improves a person's contrast sensitivity for discriminating the direction of dim vertical bars moving relative to a stationary textured background, indicating that visual timing deficits are a cause, not a result, of dyslexia.

The significant improvements in both phonological processing and auditory working-memory found in this study demonstrate that training to improve visual timing improves auditory skills. Consequently, training early in the visual dorsal stream improved higher levels of processing in the dorsal stream, in particular the PPC, where: (1) there is a supramodal representation of space with convergence of both auditory and visual inputs in the parietal cortex (Farah et al., 1989); and (2) selective endogenous attention activates this area which connects to frontal areas, like the DLPFC (Posner et al., 1984; Posner and Petersen, 1990; Supekar and Menon, 2012). By improving attention, students were able to hear the sequential ordering of sounds more accurately, improving phonological processing and auditory working memory. Students given training aimed at auditory magnocellular function, as embodied by the auditory timing intervention, improved in reading fluency, but the improvements were not significant when compared to the improvements made by controls, as also found in a review of FFW studies (Strong et al., 2011).

A major limitation of this study is the small sample size of the auditory timing group. Most of the students in our study attended year-round schools, whose schedules precluded implementing the auditory timing program. Power to detect treatment effects in the auditory timing group was therefore limited, and a larger study is required to determine unequivocally the relative effect of improving auditory timing on reading fluency and attention. Another limitation is the lack of an out-of-classroom control condition comparable in the amount of personal attention from the college students administering the reading interventions. Since half the classroom was pulled out to be trained on the timing interventions, the students who stayed in the classroom received much more attention from their classroom teacher. However, there was no evidence of a benefit from personal attention in the auditory timing group. This is despite the fact that this group interacted with the college students 5 days a week, compared with only 3 days a week for the visual timing intervention. Hence, it is unlikely that the significant effects observed in the motion direction-discrimination group were driven by pulling children from the classroom or by personal attention.

This study found that motion direction-discrimination training remediates reading deficits of both phonological (requiring accurate temporal sequencing) and visual (requiring accurate spatial sequencing) origin. Moreover, there is evidence that improvements in reading speed after motion direction-discrimination training are sustained over time (Lawton, 2011). In contrast, improvements in word reading following auditory interventions to improve phonological processing degrade over time. Two years later, children who had received the auditory intervention showed no difference in word reading compared with controls (Wise et al., 2000).

# Sluggish Magnocellular Processing Limits Reading Acquisition in Dyslexics

It has been proposed that the visual system exploits the dichotomy of a fast magnocellular channel and a slower parvocellular channel for the purpose of selective attention (Vidyasagar, 2001, 2012, 2013). The magnocellular neurons project predominantly to the dorsal stream. Their faster transmission time makes them ideal for providing the input for feedback to intermediate stages in the cortical dorsal and ventral streams, as well as to V1 (Vidyasagar, 1999, 2001, 2013). Feedback from MT has its strongest effects for stimuli of low salience (Hupe et al., 1998). Such stimuli include the low-contrast patterns that maximally activate magnocellular neurons (Kaplan and Shapley, 1986; Sclar et al., 1990), which are used to train visual movement discrimination in this study. MT receives parvocellular input from three sources:

1. parvocellular layers in the lateral geniculate nucleus (Nassi et al., 2006);
2. layer 6 V1 cells, which have both parvocellular and magnocellular input and project to layer 4Cβ in V1, which in turn projects to MT (Callaway, 1998); and
3. V4 (Maunsell et al., 1990).

This input enables parvocellular activity to provide a background frame of reference for discriminating the direction of movement in the dorsal stream. Parvocellular functioning among dyslexics has been found to be equivalent to that of normal controls, whereas magnocellular function is significantly impaired (Lovegrove et al., 1980; Hansen et al., 2001; Kevan and Pammer, 2009; Gori et al., 2014). This impairment is the primary cause of slow reading and attention deficits.

During reading, the PPC has been proposed to receive spatial information about a word through the rapid magnocellular pathway, namely its location and its overall shape and form. The PPC then uses this information to gate what flows into the temporal stream. The information is gated via attentional feedback to the striate cortex and to other regions in the occipito-temporal cortex (Martinez et al., 1999; Vidyasagar, 1999, 2001). This is most likely achieved by top-down feedback that uses synchronized neuronal oscillations at the lower end of the gamma frequency range (Vidyasagar, 2013). Parvocellular neurons in the ventral stream can then use this gated information as a starting point for deciphering the individual letters (Vidyasagar, 2001; see **Figure 4**). Each cycle of the gamma oscillation focuses an attentional spotlight on the primary visual cortical representation of just one or two letters. These letters are then sequentially recognized and concatenated into word strings. The timing, period, envelope, amplitude, and phase of the synchronized oscillations modulating the incoming signals in the striate cortex have a profound influence on the accuracy and speed of reading (Vidyasagar, 2013). The speed set by the gamma oscillation frequency is the essential rate-limiting step in dyslexia (Vidyasagar, 2013). Figure/ground movement discrimination training is likely to strengthen the coupling of either of two pairs:

1. theta and gamma activity, for test patterns moving at 6.7 and 8 Hz; or
2. alpha and gamma activity, for test patterns moving at 10 and 13.3 Hz.

Therefore, the visual direction-discrimination training paradigm used in this study likely improves magnocellular function and attention. It likely also improves magno-parvo integration, figure/ground discrimination, and low gamma frequency oscillations.

Our working hypothesis is that sluggish magnocellular neurons early in the dorsal cortical visual pathway (V1), found in dyslexics (Livingstone et al., 1991), disrupt processing at higher levels of the dorsal stream. They also disrupt the development of these processes; dyslexics show little or no activity in MT (Eden et al., 1996; Demb et al., 1998). After 6 weeks of motion direction-discrimination training 3 times/week, dorsal stream activity in dyslexic fourth graders improved, as shown by their visual evoked potentials (Shelley-Tremblay et al., 2011). This is consistent with a recent pilot study using magnetoencephalography (MEG) source imaging (Lawton and Huang, 2015). That study found improved function in both the dorsal stream (V1, V3, MT, MST areas) and the fronto-parietal attention networks.

Magnocellular output from the anterior portion of the dorsal stream, including the PPC, is input to the mid/posterior insula. The insula is a hub of the Central Executive Network (CEN), which includes the PPC and the DLPFC (Supekar and Menon, 2012). Magnocellular activity signals the beginning and end of a word, thereby gating the processing of parvocellular activity, as proposed by Vidyasagar (2001, 2013) and illustrated in **Figure 4**. In dyslexics, the sluggish magnocellular neurons have two consequences:

1. attention deficits, through an impairment in low gamma frequency oscillations that reduces feedback in visual cortical areas (Vidyasagar, 2013); and
2. disrupted processing in LIP and FEF, within a fixation, between fixation sequences, or both (Vidyasagar, 2001; Slaghuis and Ryan, 2006; Fischer, 2012).

This study found, for the first time, that direction-discrimination training improved not only reading fluency, but also attention and working memory. Direction-discrimination training therefore improved CEN functioning, as was also found using MEG source imaging (Lawton and Huang, 2015). This provides further evidence that abnormal visual motion processing is a fundamental cause of attention and subsequent reading problems in dyslexics.

… with their eyes since magnocellular neurons are sluggish, being delayed 20–40 ms, causing confusion and misrecognition.

By improving the attention network's functioning, motion direction-discrimination training provides a wider usable field of view, so that more objects are perceived in their correct location in a single glance. Motion direction-discrimination training is the key to reading acquisition at an efficient speed for dyslexics, most likely because it eases magno-parvo integration. Motion direction-discrimination training was followed by guided reading in the classroom. In this condition, attention, reading fluency, and working memory skills improved significantly more than after training on the linguistic word-building or auditory timing interventions. Remediating visual timing deficits in the dorsal stream reveals the causal role of visual motion discrimination and attention in reading acquisition.

This study supports the hypothesis that faulty timing is a fundamental cause of dyslexia. Specifically, the fault lies in synchronizing the activity of the magnocellular (left-right movement discrimination) and parvocellular (stationary background) visual pathways. The study argues against the assumption that reading deficiencies in dyslexia are caused by phonological or language deficits. It demonstrates that visual movement figure/ground discrimination can be used not only to detect dyslexia early, but also to treat it successfully, so that reading problems do not prevent children from learning readily.

#### AUTHOR CONTRIBUTIONS

TL designed the study, recruited and trained staff, ran daily operations, and wrote the article.

#### ACKNOWLEDGMENTS

I thank Drs. Jack Shelley-Tremblay, Sue Cotter, Norma Graham, Mike Posner, Eric Borsting, Steve Hillyard, Dan Felleman, Jaime Pineda, and Bryan Hansen for many thoughtful and insightful suggestions that helped improve this article substantially.

I thank Wendy Portnuff, the chief trainer at Scientific Learning, for training and supervising the RAs to administer FastForWord. I also thank her for sharing her vast knowledge of motivational strategies used to improve the compliance of all students in this study.

I thank Steve Edland, the UCSD biostatistician who designed and performed all the statistical analyses for this study using the R statistical package, and who helped write earlier versions of this article.

I thank Kelly Lawton for helping to automate the computer calibration, train staff using easy-to-follow written instructions, film the video training movies, and develop the motivational strategies.

I thank Jordan Conway, the Senior Research Assistant, who helped run daily operations and was in charge of the lead assistants at each school. Jordan Conway also helped to:

- recruit and train the 40 UCSD undergraduates who performed these experiments;
- develop the motivational strategies;
- film the video training movies;
- back up and calibrate all of the computers; and
- provide the motivational strategies at each school.

I thank the 40 UCSD cognitive science and psychology undergraduates who collected the data.

I thank Dr. Doug Stephey for his support, encouragement, and advice. In particular, he recommended the CAS Stroop and Number Detection subtests to measure attention and the TIPS to measure working memory.

This work was supported by the Institute of Education Sciences (IES), US Department of Education (IES Award R305A100389) to UCSD.

#### REFERENCES

Callaway, E. M. (1998). Local circuits in primary visual cortex of the macaque monkey. Annu. Rev. Neurosci. 21, 47–74. doi: 10.1146/annurev.neuro.21.1.47

Hinshelwood, J. (1917). Congenital Word-Blindness. London: H. K. Lewis and Co.


**Conflict of Interest Statement**: The author has a potential conflict of interest, since she is the developer of Path To Reading (PATH). She had no part in collecting or analyzing the data and therefore had no influence over the results.

Copyright © 2016 Lawton. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Sex Differences in Gray Matter Volume of the Right Anterior Hippocampus Explain Sex Differences in Three-Dimensional Mental Rotation

#### Wei Wei<sup>1,2,3</sup>, Chuansheng Chen<sup>4</sup>, Qi Dong<sup>3</sup> and Xinlin Zhou<sup>3</sup>\*

<sup>1</sup> Advanced Technology Innovation Center for Future Education, Beijing Normal University, Beijing, China, <sup>2</sup> Department of Psychology and Behavioral Sciences, Zhejiang University, Hangzhou, China, <sup>3</sup> State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, China, <sup>4</sup> Department of Psychology and Social Behavior, University of California, Irvine, CA, USA

Behavioral studies have reported that males perform better than females in 3-dimensional (3D) mental rotation. Given the important role of the hippocampus in spatial processing, the present study investigated whether structural differences in the hippocampus could explain the sex difference in 3D mental rotation. Results showed that, after controlling for brain size, males had a larger anterior hippocampus, whereas females had a larger posterior hippocampus. Gray matter volume (GMV) of the right anterior hippocampus was significantly correlated with 3D mental rotation score. After controlling for GMV of the right anterior hippocampus, the sex difference in 3D mental rotation was no longer significant. These results suggest that the structural difference between males' and females' right anterior hippocampus is a neurobiological substrate for the sex difference in 3D mental rotation.

#### Edited by:

Claudia Voelcker-Rehage, Chemnitz University of Technology, Germany

#### Reviewed by:

Rachael D. Seidler, University of Michigan, USA

Chris Lange-Küttner, London Metropolitan University, UK

#### \*Correspondence:

Xinlin Zhou zhou\_xinlin@bnu.edu.cn

Received: 21 January 2016 Accepted: 02 November 2016 Published: 15 November 2016

#### Citation:

Wei W, Chen C, Dong Q and Zhou X (2016) Sex Differences in Gray Matter Volume of the Right Anterior Hippocampus Explain Sex Differences in Three-Dimensional Mental Rotation. Front. Hum. Neurosci. 10:580. doi: 10.3389/fnhum.2016.00580

Keywords: sex difference, mental rotation, anterior hippocampus, brain structure

# INTRODUCTION

The hippocampus plays an important role in spatial processing. The "place cells" in the hippocampus are a core component of the brain's "GPS" that processes spatial information (O'Keefe, 1976; Fyhn et al., 2004; Sargolini et al., 2006). Both animal and human studies have shown that hippocampal volume is related to spatial ability. For example, homing pigeons have larger hippocampi than non-homing pigeons (Rehkämper et al., 1988). Birds that store their food have larger hippocampi than birds that do not (Krebs et al., 1989; Sherry et al., 1989). Among mammals, rats that store their food in a distributed manner have larger hippocampi than those that store their food in one central location (Jacobs and Spencer, 1994).

Human studies have also shown that the hippocampus plays an important role in spatial processing (for a recent summary, see Lee et al., 2012). First, patients who suffer from developmental topographical disorders (DTD) have damage to the hippocampus and show impaired performance on navigation tasks (Aguirre and D'Esposito, 1999; Iaria and Barton, 2010). In these patients, damage also extends to the retrosplenial cortex, fusiform gyrus, and lingual gyrus. Damage to the hippocampus also results in a deficit in spatial working memory (Abrahams et al., 1997; Bohbot et al., 1998; Holdstock et al., 2000). Second, neuroimaging studies have found that the hippocampus is activated by navigation tasks (Maguire et al., 1998; Mellet et al., 2000; Hartley et al., 2003; Iaria et al., 2003, 2008) and spatial memory tasks (Maguire, 1997; Burgess et al., 2002). Moreover, during spatial memory tasks, only spatial strategies (but not non-spatial strategies) elicited activation in the hippocampus (Iaria et al., 2003; Bohbot et al., 2004). Third, the size of the hippocampus may change dynamically with long-term or sustained demands on spatial processing. Experienced taxi drivers have larger posterior hippocampi than control subjects (Maguire et al., 2000). Spatial navigation training has also been found to increase the gray matter of the hippocampus (Lövdén et al., 2010, 2012; Kühn and Gallinat, 2014; Kühn et al., 2014).

Neuroimaging studies have further shown differentiated functions of the anterior and posterior hippocampus (Chua et al., 2007; Ghetti et al., 2010; DeMaster and Ghetti, 2013). Based on the results of 54 previous studies, Lepage et al. (1998) proposed the Hippocampal Encoding/Retrieval (HIPER) model. The model assigns the anterior hippocampus a role in acquiring or encoding new visuospatial information and the posterior hippocampus a role in retrieval. Subsequent studies have further supported this model (Strange and Dolan, 2001; Hartley et al., 2003; Kumaran and Maguire, 2005; Maguire et al., 2006a; Iaria et al., 2007). Most previous studies have focused on spatial memory and have thus implicated the posterior hippocampus in this function. For example, "place cells" were found only in the posterior hippocampus in monkeys (Colombo et al., 1998). Likewise, taxi drivers' constant retrieval of city maps can lead to increased gray matter volume (GMV) of the posterior hippocampus (Woollett and Maguire, 2011).

The hippocampus may play a crucial role in visuospatial processing, in particular through the anterior hippocampus's role in encoding spatial information. If so, could its structural variations explain one of the most well-documented sex differences, namely that in visuospatial processing? Decades of behavioral research have shown that sex differences in visuospatial ability are stable across different age groups (Voyer et al., 1995). The male advantage is particularly obvious in mental rotation (Linn and Petersen, 1985; Maeda and Yoon, 2013; Reilly and Neumann, 2013; Lütke and Lange-Küttner, 2015). Moreover, 3-dimensional (3D) mental rotation requires more spatial processing than 2-dimensional (2D) mental rotation. It also shows a larger and more consistent sex effect across age groups (Peters et al., 1995; Roberts and Bell, 2003; for a meta-analysis, see Linn and Petersen, 1985). Overall, sex differences are robust in large-scale studies (e.g., 109,612 men and 88,509 women in Maylor et al., 2007; 111,000 men and 90,000 women in Lippa et al., 2010) and in meta-analyses (Geiser et al., 2008; Maeda and Yoon, 2013; Reilly and Neumann, 2013). These sex differences are so robust that only special experimental designs (e.g., positive instructions to women, different delivery modes) or training can reduce, and sometimes eliminate, them (Goldstein et al., 1990; Peters et al., 1995; Roberts and Bell, 2003; Peters, 2005; Feng et al., 2007; Monahan et al., 2008; Moè, 2009; Glück and Fabrizii, 2010; Tzuriel and Egozi, 2010; Maeda and Yoon, 2013).

In support of a possible neural basis of sex differences in mental rotation, especially, 3D mental rotation, neuroimaging studies have found structural and functional differences between male and female hippocampi. Structurally, the total volume of the hippocampus was larger for females than for males (Giedd et al., 1997; Goldstein et al., 2001). Males showed a significant correlation between the size of the hippocampus and age (from 18 years to 42 years), which was not significant for females (Pruessner et al., 2001; Suzuki et al., 2005), suggesting that the growth of the hippocampus occurred earlier in females (Giedd et al., 1997). Functionally, when performing a navigation task, males have been found to depend more on the hippocampus, whereas females depend more on the parietal and frontal lobes (Grön et al., 2000). Thus far, however, no study has examined whether structural differences in the hippocampus would account for sex differences in 3D mental rotation.

In the current study, we examined the relationship between GMV of the hippocampus (especially its anterior part) and sex differences in 3D mental rotation. The anterior part is particularly relevant because mental rotation tasks mainly involve the encoding of spatial information, rather than the retrieval of previously stored spatial information. We hypothesized that males would have larger volumes in the anterior hippocampus than would females, and such a difference would explain a large part of sex differences in 3D mental rotation.

# MATERIALS AND METHODS

# Participants

Participants were 431 college students (192 males and 239 females, mean age = 19.9 years, ranging from 18 to 24 years) from Beijing Normal University. No participants had a history of neurological or psychiatric disorders or head injury. This study was approved by the Institutional Review Board of the Imaging Center for Brain Research in the Institute of Cognitive Neuroscience and Learning at Beijing Normal University.

# MRI Acquisition

Whole-brain structural MRI scans were performed on a Siemens 3T Trio scanner (Munich, Germany) using a T1-weighted three-dimensional gradient echo sequence (TR = 2350 ms; TE = 3.39 ms; flip angle = 7°; field of view = 100 mm; matrix = 256 × 256; voxel size = 1 mm × 1 mm × 1 mm).

# Behavioral Tasks

The three-dimensional mental rotation task was based on Shepard's mental rotation task (Shepard and Metzler, 1971; Peters et al., 1995). Participants were shown one 3D figure as the target and four figures as answer choices. They were asked to select the answer that matched the target after rotation. The task had 24 items, divided into two 3-min blocks. The total score was analyzed.

The Motor-Free Visual Perception Test, Third Edition (MVPT-3) was used to test general visual perception ability (Colarusso and Hammill, 1972). Five categories of visual perception were measured: spatial relationships, visual closure, visual discrimination, visual memory, and figure-ground. The total score was analyzed and used to control for general visual perception ability.

Raven's Advanced Progressive Matrices (RAPM; Raven and Court, 1998) was used to assess general intelligence. Participants were given 30 min to complete as many items as possible. For each item, participants were asked to select, from several alternatives, the missing segment that would complete a larger pattern. The number of correct answers was analyzed.

#### Data Analysis

#### Structural Whole Brain Analysis

Data were analyzed using voxel-based morphometry (VBM) implemented in Statistical Parametric Mapping (SPM5, Wellcome Department of Cognitive Neurology) and executed in MATLAB (R2012b, Mathworks, Sherborn, MA, USA). Following the procedures described by Ashburner and Friston (2000) and Good et al. (2001), data pre-processing consisted of four steps:

1. extraction of the brain;
2. spatial normalization into stereotactic space using the standard SPM gray matter template;
3. segmentation into gray matter, white matter, and CSF compartments; and
4. correction for volume changes induced by spatial normalization (modulation).

The spatially normalized images were written in voxels of 1 mm × 1 mm × 1 mm. They were then smoothed with a 12 mm full width at half maximum (FWHM) isotropic Gaussian kernel. The GMV images were analyzed, and a two-sample t-test was used to investigate sex differences in GMV of the hippocampus.
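SPM specifies smoothing by the kernel's FWHM rather than by its standard deviation. The sketch below shows the conversion σ = FWHM / (2√(2 ln 2)) and builds a normalized 1D kernel. It assumes an isotropic Gaussian and the 1 mm voxels stated above. Because an isotropic 3D Gaussian is separable, applying this kernel along x, y, and z in turn gives the 3D smoothing. The helper names `fwhm_to_sigma` and `gaussian_kernel_1d` are hypothetical, and this is not SPM's own kernel code.

```python
import math

def fwhm_to_sigma(fwhm_mm: float, voxel_mm: float = 1.0) -> float:
    """Convert a Gaussian FWHM in mm to a standard deviation in voxels."""
    # FWHM = 2 * sqrt(2 * ln 2) * sigma, i.e. about 2.3548 * sigma
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0))) / voxel_mm

def gaussian_kernel_1d(sigma_vox: float, truncate: float = 4.0) -> list:
    """Sampled, normalized 1D Gaussian; apply along x, y and z for isotropic 3D smoothing."""
    radius = int(math.ceil(truncate * sigma_vox))
    weights = [math.exp(-0.5 * (i / sigma_vox) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

# 12 mm FWHM on 1 mm isotropic voxels (values from the text)
sigma = fwhm_to_sigma(12.0, voxel_mm=1.0)
kernel = gaussian_kernel_1d(sigma)
print(f"sigma = {sigma:.3f} voxels, kernel length = {len(kernel)}")
```

For a 12 mm FWHM, σ is about 5.1 mm. At 6 mm (half the FWHM) from the centre, the kernel drops to exactly half its peak value.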

#### Structural ROI Analysis

The current study focused only on the hippocampal arch, which can be divided into three parts: the anterior, middle, and posterior hippocampus. This segmentation method has been commonly used to study the different parts of the hippocampal arch (Huang et al., 2013; Travis et al., 2014; for a review, see Malykhin and Coupland, 2015). The segmentation was executed in MATLAB (Mathworks, Sherborn, MA, USA). A sagittal view of the hippocampus from the Automated Anatomical Labeling (AAL) atlas was projected onto a plane. The maximum values in the right-left direction and the top-down direction were then calculated. The hippocampus was marked into four quarters and divided into anterior, middle, and posterior parts in a ratio of 1:2:1 (Duvernoy, 2005). These parts were used as masks for the structural ROI analysis. To control for sex differences in overall brain size, whole-brain GMV was used as a covariate. A separate analysis was conducted without controlling for whole-brain GMV (see Supplementary Material).
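The 1:2:1 split can be sketched on a binary 3D mask with NumPy. The sketch makes three assumptions: the split runs along the anterior-posterior (y) axis only, the array index increases from posterior to anterior (as y does in MNI space), and a toy box mask stands in for the AAL hippocampus. The authors' exact rule, which is based on extents in a projected sagittal view, may differ. The function name `split_hippocampus_121` is hypothetical.

```python
import numpy as np

def split_hippocampus_121(mask: np.ndarray, ap_axis: int = 1) -> dict:
    """Split a binary hippocampus mask into anterior, middle and posterior parts (1:2:1).

    Assumes the index along `ap_axis` increases from posterior to anterior,
    as the y coordinate does in MNI space.
    """
    mask = mask.astype(bool)
    other_axes = tuple(a for a in range(mask.ndim) if a != ap_axis)
    occupied = np.flatnonzero(mask.any(axis=other_axes))  # A-P slices containing the mask
    y_min, y_max = occupied[0], occupied[-1]
    length = y_max - y_min + 1
    posterior_edge = y_min + length / 4.0       # end of the most posterior quarter
    anterior_edge = y_min + 3.0 * length / 4.0  # start of the most anterior quarter

    # Coordinate grid along the A-P axis, shaped to broadcast against the mask
    shape = [1] * mask.ndim
    shape[ap_axis] = -1
    y = np.arange(mask.shape[ap_axis]).reshape(shape)

    posterior = mask & (y < posterior_edge)
    anterior = mask & (y >= anterior_edge)
    middle = mask & ~posterior & ~anterior
    return {"anterior": anterior, "middle": middle, "posterior": posterior}

# Hypothetical toy mask: a box occupying A-P slices 4..11 (8 slices)
mask = np.zeros((10, 20, 10), dtype=bool)
mask[3:7, 4:12, 3:7] = True
parts = split_hippocampus_121(mask)
for name, part in parts.items():
    print(name, int(part.any(axis=(0, 2)).sum()), "slices")
```

For the 8-slice toy mask, this assigns 2, 4, and 2 slices to the anterior, middle, and posterior parts. The three parts are disjoint and together cover the mask.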

Statistical analysis of the behavioral data was conducted in SPSS (SPSS Inc., Chicago, IL, USA).

# RESULTS

#### Behavioral Results

Males performed better than females on the 3D mental rotation task (F(1,429) = 5.95, p = 0.015, d = 0.234). This sex difference became slightly greater (F(1,427) = 7.65, p = 0.006, d = 0.254) after controlling for general visual perception and intelligence (RAPM). This occurred even though these two covariates did not show significant sex differences (visual perception, F(1,429) = 2.77, p = 0.097; RAPM, F(1,429) = 2.43, p = 0.120; see **Table 1**).
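For a comparison of two independent groups, F(1, df) = t². Cohen's d can then be approximated as d = t·√(1/n₁ + 1/n₂). The sketch below assumes the unadjusted d was derived this way from the 192 males and 239 females. It gives d ≈ 0.236, close to the reported 0.234; the small gap may reflect computing d directly from means and pooled standard deviations. The helper `cohens_d_from_f` is hypothetical, and the conversion does not apply directly to the covariate-adjusted F(1,427).

```python
import math

def cohens_d_from_f(f_value: float, n1: int, n2: int) -> float:
    """Approximate Cohen's d for two independent groups from F(1, n1 + n2 - 2) = t**2."""
    t = math.sqrt(f_value)
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)

# Unadjusted sex difference in 3D mental rotation (values from the text)
d = cohens_d_from_f(5.95, n1=192, n2=239)
print(round(d, 3))
```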

#### Structural ROI Analysis

The structural ROI analysis involved six brain areas (bilateral anterior, middle, and posterior hippocampus). We therefore used a Bonferroni correction to adjust the significance threshold: p = 0.05/6 ≈ 0.008. Controlling for whole-brain GMV, the GMV of the anterior hippocampus was significantly larger for males than for females (left hemisphere, F(1,428) = 14.92, p < 0.008; right hemisphere, F(1,428) = 13.44, p < 0.008). In contrast, the GMV of the posterior hippocampus was significantly larger for females than for males (left hemisphere, F(1,428) = 24.32, p < 0.008; right hemisphere, F(1,428) = 9.59, p < 0.008; see **Table 2** and Supplementary Table 1S). There was no significant sex difference in GMV of the middle hippocampus (left hemisphere, F(1,428) = 0.001, n.s.; right hemisphere, F(1,428) = 0.01, n.s.). After controlling for whole-brain GMV, there were also no significant sex differences in GMV of the whole left hippocampus (F(1,428) = 0.07, n.s.) or the whole right hippocampus (F(1,428) = 0.19, n.s.).
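The Bonferroni threshold and the six ROI tests can be checked with a few lines of standard-library Python, using the F values reported above. Since F(1, 428) = t², a two-tailed p value can be approximated from the normal distribution as erfc(√F/√2). With 428 error degrees of freedom the t distribution is very close to normal, but this is still an approximation, not the exact p values behind Table 2. The helper `approx_p_from_f1` is hypothetical.

```python
import math

ALPHA = 0.05
N_ROIS = 6
threshold = ALPHA / N_ROIS  # 0.00833..., reported as 0.008

def approx_p_from_f1(f_value: float) -> float:
    """Two-tailed p for F(1, df) via t = sqrt(F), using a normal approximation to t."""
    return math.erfc(math.sqrt(f_value) / math.sqrt(2.0))

# Reported F(1, 428) values for the six hippocampal ROIs (whole-brain GMV as covariate)
f_values = {
    "left anterior": 14.92, "right anterior": 13.44,
    "left middle": 0.001, "right middle": 0.01,
    "left posterior": 24.32, "right posterior": 9.59,
}
results = {}
for roi, f in f_values.items():
    p = approx_p_from_f1(f)
    results[roi] = p < threshold
    print(f"{roi:15s} p ~ {p:.4f}  significant: {results[roi]}")
```

The smallest significant effect is the right posterior hippocampus (F = 9.59, p ≈ 0.002), which still clears the corrected threshold. The middle-hippocampus tests are far from it.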

TABLE 1 | Mean scores (and standard deviations) and sex differences for the three behavioral tasks.

Note: <sup>∗</sup>p < 0.05.

TABLE 2 | Mean scores (and standard errors) and sex differences in gray matter volumes (mm<sup>3</sup>) of the left and right anterior, middle and posterior hippocampus (with the gray matter volume of the whole brain as a covariate).

Note: The critical p value after Bonferroni correction was 0.008 (0.05/6 ROIs). Significant p values are in bold.

**Table 3** shows the correlations between GMVs of the different parts of the hippocampus and the scores on 3D mental rotation and the other two cognitive tasks, after controlling for whole-brain GMV (see Supplementary Table 2S). GMVs of the left and right anterior hippocampus were significantly correlated with performance on the 3D mental rotation task (left, r = 0.14, p < 0.008; right, r = 0.19, p < 0.008; see **Figure 1** and Supplementary Figure 1S). After controlling for visual perception, the correlations between 3D mental rotation and the GMVs of the left and right anterior hippocampus remained significant (left, r = 0.14, p < 0.008; right, r = 0.19, p < 0.008). After controlling for intelligence, the correlation between 3D mental rotation and the GMV of the right anterior hippocampus remained significant (r = 0.17, p < 0.008). It also remained significant after controlling for both visual perception and intelligence (r = 0.18, p < 0.008). The correlations between mental rotation and the GMV of the left anterior hippocampus were no longer significant after Bonferroni correction (r = 0.12, controlling for either intelligence alone or both intelligence and visual perception).
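A partial correlation "after controlling for" a set of covariates can be computed by residualizing both variables on those covariates and correlating the residuals. The sketch below assumes ordinary least squares with an intercept, which is the standard definition. It uses hypothetical toy data in place of the real GMV and test scores; the authors ran their analyses in SPSS, and `partial_corr` is a hypothetical helper.

```python
import numpy as np

def partial_corr(x, y, covariates):
    """Pearson r between x and y after regressing both on the covariates (with an intercept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(x.size), np.asarray(covariates, dtype=float)])
    # Residualize x and y on the same covariates by ordinary least squares
    x_res = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    y_res = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(x_res, y_res)[0, 1])

# Hypothetical toy data standing in for the real measures (n = 431)
rng = np.random.default_rng(0)
n = 431
whole_brain = rng.normal(size=n)
perception = rng.normal(size=n)
right_ant_hippo = 0.6 * whole_brain + rng.normal(size=n)
rotation = 0.3 * right_ant_hippo + 0.2 * perception + rng.normal(size=n)

r = partial_corr(rotation, right_ant_hippo, np.column_stack([whole_brain, perception]))
print(f"partial r = {r:.3f}")
```

With a single covariate z, the result equals the textbook formula r_xy·z = (r_xy − r_xz·r_yz) / √((1 − r_xz²)(1 − r_yz²)). The residual form extends the same idea to several covariates at once.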

An ANCOVA was conducted to test whether the sex difference in 3D mental rotation could be explained by sex differences in hippocampal GMV. After controlling for the GMV of the right anterior hippocampus and of the whole brain, the sex difference in 3D mental rotation was no longer significant (F(1,427) = 3.04, p = 0.070). The change in the sex difference was statistically significant (t = 3.47, p < 0.01).
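The logic of this ANCOVA can be sketched as two ordinary least squares fits of the mental rotation score. The quantity of interest is the coefficient of a 0/1 sex indicator. The first fit estimates it with no covariates; the second adjusts for right anterior hippocampal GMV and whole-brain GMV. The `group_effect` helper and the toy data are hypothetical. In the toy data, sex influences rotation only through anterior hippocampal GMV, so the adjusted coefficient should shrink toward zero. Neither the authors' SPSS procedure nor their test of the change in the sex difference (t = 3.47) is reproduced here.

```python
import numpy as np

def group_effect(y, group, covariates=None):
    """OLS coefficient of a 0/1 group indicator on y, optionally adjusting for covariates."""
    columns = [np.ones(len(y)), np.asarray(group, dtype=float)]
    if covariates is not None:
        columns.append(np.asarray(covariates, dtype=float))
    design = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return float(beta[1])  # beta[0] is the intercept, beta[1] the group (sex) effect

# Hypothetical toy data: sex influences rotation only through anterior hippocampal GMV
rng = np.random.default_rng(1)
male = np.r_[np.ones(192), np.zeros(239)]
brain_gmv = rng.normal(size=male.size) + 0.5 * male
ant_hippo_gmv = 0.5 * brain_gmv + 0.4 * male + rng.normal(size=male.size)
rotation = 0.5 * ant_hippo_gmv + rng.normal(size=male.size)

raw = group_effect(rotation, male)
adjusted = group_effect(rotation, male, np.column_stack([ant_hippo_gmv, brain_gmv]))
print(f"sex effect: unadjusted {raw:.3f}, adjusted {adjusted:.3f}")
```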

#### Whole-Brain Analysis

To supplement the structural ROI analysis, we conducted a whole-brain analysis in SPM. Sex differences in GMV across the whole brain are shown in **Table 4**. Correlation analysis confirmed that GMV of the left and right anterior hippocampus was correlated with 3D mental rotation (p < 0.05, FWE-corrected for multiple comparisons; peaks, left: x = −34, y = −9, z = −19, t = 3.80; right: x = 35, y = −9, z = −17, t = 4.69; see **Figure 2**). In addition, GMV in the calcarine was significantly and positively associated with 3D mental rotation (p < 0.05, FWE-corrected for multiple comparisons, cluster size >50; **Table 5**).

TABLE 3 | Correlation coefficients between gray matter volumes of the left and right hippocampus and behavioral performance (with the gray matter volume of the whole brain as a covariate).

Note: The critical p value after Bonferroni correction was 0.008 (0.05/6 ROIs). Significant p values are in bold.

TABLE 4 | Loci showing significant sex differences in gray matter volume based on the whole brain analysis (p < 0.001, FWE-corrected for multiple comparisons, cluster size >200).

# DISCUSSION

In the present study, we found that, compared to females, males performed better in 3D mental rotation and had greater GMV in the anterior hippocampus. The GMV of the right anterior hippocampus was significantly correlated with 3D mental rotation performance, even after controlling for visual perception and general intelligence. The sex difference in 3D mental rotation disappeared after controlling for GMV of the right anterior hippocampus.

## Sex Differences in Spatial Abilities

Our behavioral results were consistent with many previous studies showing a stable male advantage in spatial ability, especially in 3D mental rotation. This male advantage may have had an evolutionary origin in males' role in hunting. Hunting made a good sense of direction and superior ability in spatial relations (e.g., throwing spears at game) critical (Geary, 1995; Ecuyer-Dab and Robert, 2004). In modern life, males may no longer need to hunt, but they still prefer activities involving spatial processing, such as video games and sports (Okagaki and Frensch, 1994; Ozel et al., 2004; Robert and Héroux, 2004; Cherney and London, 2006). These activities may have continued to help males gain an advantage in spatial ability. Males and females have also been found to use different strategies when performing a navigation task. Males use spatial strategies, whereas females use both verbal and spatial strategies (Merrill et al., 2016).

It is worth mentioning that, although Raven's Progressive Matrices test also involves spatial processing and mental rotation, it did not show sex differences in our study. This result is consistent with previous studies (e.g., Raven, 1938; Eysenck and Kamin, 1981; Wei et al., 2012; Lütke and Lange-Küttner, 2015). There are at least two possible reasons. First, Raven's Progressive Matrices aims to test general intelligence, so it measures both visuospatial ability and reasoning. Using structural equation modeling, Schweizer et al. (2007) showed that the reasoning component explained 46% of the variance of the total score, whereas mental rotation explained only 7%. Second, the mental rotation items in Raven's test involve only 2D rotation, which shows smaller sex differences than 3D rotation (Peters et al., 1995; Roberts and Bell, 2003; Lütke and Lange-Küttner, 2015).

TABLE 5 | Loci showing positive correlations between 3-dimensional (3D) mental rotation and gray matter volumes based on the whole brain analysis (p < 0.05, FWE-corrected for multiple comparisons, cluster size >50).

# Neural Basis of Sex Differences in Spatial Abilities

Sex difference in spatial processing has been associated with the volume and activation of the hippocampus. In nonhuman animals, the sex that is responsible for searching for food and nesting has a larger hippocampus (Clayton et al., 1997, and for a review see Lee et al., 1998). Our results showed that compared to females, males had a larger anterior hippocampus, which accounted for their advantage in 3D mental rotation. Consistently, previous research has linked a larger anterior hippocampus to better performance in encoding new spatial information (Maguire et al., 2006b). Our results on the role of the anterior hippocampus in mental rotation are consistent with the HIPER model (Lepage et al., 1998). According to this model, the anterior hippocampus is responsible for acquiring or encoding new visuospatial information, more specifically, coding information for head directions and angular features (Maguire et al., 1998), and registering new and abstract spatial environment (Save et al., 1992; Hartley et al., 2003; Sperling et al., 2003; Jackson and Schacter, 2004; Maguire et al., 2006b; Chua et al., 2007; Iaria et al., 2007; Doeller et al., 2008). Functional MRI studies also found that imagination tasks activated anterior hippocampus more than did memory tasks (Addis et al., 2007, 2009). The 3D mental rotation task seems to meet the requirements of this particular type of spatial information processing because it involves head directions and angular features and the 3D figures were likely to be new and abstract to the participants. Therefore, it seems likely that the anterior hippocampus subserves 3D mental rotation as tested in our project. Furthermore, it was the right anterior hippocampus that had a high correlation with 3D mental rotation after controlling for covariates. 
Consistent with such hemispheric specialization, previous fMRI studies have found that navigation tasks activate the right anterior hippocampus (Maguire et al., 1998; Mellet et al., 2000; Iaria et al., 2003, 2009; Doeller et al., 2008), and lesion studies have found that damage to the right anterior hippocampus impairs spatial memory, whereas damage to the left anterior hippocampus impairs verbal memory (Abrahams et al., 1997; Bohbot et al., 1998). In sum, structural differences between males and females in the right anterior hippocampus seem to explain sex differences in 3D mental rotation.

Interestingly, the sex difference in the posterior hippocampus was opposite to that in the anterior hippocampus. Consequently, the hippocampus as a whole did not show a sex difference in GMV. Some previous studies, however, have shown sex differences in the total GMV of the hippocampus (Giedd et al., 1997; Goldstein et al., 2001). One possible reason is age differences between studies. Goldstein et al. (2001) found that females in their 30s (mean age = 36.3 years) had a larger hippocampus than their male counterparts, and Pruessner et al. (2001) found that hippocampal volume in males began to decrease in their 30s, whereas our participants were all college students. Indeed, previous studies found sex differences in the developmental trajectories of hippocampal size (Giedd et al., 1997; Pruessner et al., 2001; Suzuki et al., 2005).

It should be mentioned that, although our study focused on the role of the hippocampus in 3D mental rotation, our whole-brain analysis also found that the right calcarine was associated with 3D mental rotation. The calcarine sulcus is an important part of the primary visual cortex (V1; Sereno et al., 1995), which is important for all types of visual processing, but its specific role in 3D mental rotation needs further research. In addition to these two regions, previous studies found that mental rotation also elicited activation in other brain regions, but with mixed results on sex differences. One study found that males activated the frontal gyrus, cingulate cortex, and occipital gyrus more than females did (Semrud-Clikeman et al., 2012), whereas another found that females activated the middle temporal gyrus, frontal gyrus, and primary motor cortex more than males did (Kucian et al., 2005). These cross-study differences in other brain regions may have been due to different levels of task difficulty (Roberts and Bell, 2003) or to the particular paradigms used in different studies. Previous studies also found that the parietal lobe, rather than the hippocampus, had the highest correlation with spatial ability (Corbetta et al., 2000; Koscik et al., 2009; Hänggi et al., 2010). There are three possible explanations for this discrepancy. First, the hippocampus is involved during the brief early stage of spatial tasks (Iaria et al., 2003; Etchamendy et al., 2012), so its activation may be overshadowed by later stages of processing. Event-related potential (ERP) studies also showed a late posterior negativity related to mental rotation tasks (Peronnet and Farah, 1989; Heil, 2002). Second, there are sex differences in the neural bases of spatial processing, which might have complicated previous findings: males depend more on the hippocampus for spatial processing, whereas females depend more on the parietal and frontal lobes (Grön et al., 2000).
Third, as mentioned earlier, task differences (using spatial or verbal strategies) affect the involvement of the hippocampus (Iaria et al., 2003; Bohbot et al., 2004).

Finally, although this study found significant sex differences in 3D mental rotation that could be accounted for by structural differences in the right anterior hippocampus, we should recognize that the brain is plastic and that the structure of the hippocampus can be changed through training (Lövdén et al., 2010; Kühn and Gallinat, 2014; Kühn et al., 2014), even in the aging brain (Kempermann et al., 1998; Maguire et al., 2000, 2006a; Lövdén et al., 2012). Training has been found to improve spatial ability (Sanchez, 2012; Sorby et al., 2013) and to reduce or eliminate its sex difference (Terlecki and Newcombe, 2005; Feng et al., 2007; Spence et al., 2009; Tzuriel and Egozi, 2010). In addition, some researchers have found that the GMV of the female posterior hippocampus changes with the menstrual cycle, increasing from the early to the late follicular phase (Lisofsky et al., 2015). Such an influence should be considered in future research on sex differences in the hippocampus, especially the posterior hippocampus.

#### REFERENCES


#### AUTHOR CONTRIBUTIONS

CC, QD and XZ designed the study. WW analyzed the data. WW, CC and XZ wrote the manuscript.

#### ACKNOWLEDGEMENTS

This research was supported by the National Key Basic Research Program of China (no. 2014CB846100), two grants from the Natural Science Foundation of China (nos. 31521063 and 31271187), a grant from Advanced Technology Innovation Center for Future Education, Beijing Normal University, and the 111 Project (B07008). We thank Chunhui Chen, Qinghua He, Jin Li, Mingxia Zhang, Bi Zhu, Xuemei Lei, and many others for helping collect the data.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fnhum.2016.00580/full#supplementary-material


Neuropsychopharmacol. Biol. Psychiatry 21, 1185–1201. doi: 10.1016/s0278-5846(97)00158-9


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Wei, Chen, Dong and Zhou. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Neural Basis of Video Gaming: A Systematic Review

#### Marc Palaus<sup>1</sup>\*, Elena M. Marron<sup>1</sup>, Raquel Viejo-Sobera<sup>1, 2</sup> and Diego Redolar-Ripoll<sup>1</sup>

<sup>1</sup> Cognitive NeuroLab, Faculty of Health Sciences, Universitat Oberta de Catalunya, Barcelona, Spain, <sup>2</sup> Laboratory for Neuropsychiatry and Neuromodulation, Massachusetts General Hospital, Boston, MA, USA

Background: Video gaming is an increasingly popular activity in contemporary society, especially among young people, and video games are increasing in popularity not only as a research tool but also as a field of study. Many studies have focused on the neural and behavioral effects of video games, and in recent decades a great deal of information on their neural correlates has been obtained through a myriad of methods.

Objectives: We aim to understand the relationship between the use of video games and their neural correlates, taking into account the whole variety of cognitive factors that they encompass.

Methods: A systematic review was conducted using standardized search operators that included the presence of video games and neuroimaging techniques or references to structural or functional brain changes. Separate categories were made for studies featuring Internet Gaming Disorder and studies focused on the violent content of video games.

#### Edited by:

Soledad Ballesteros, Universidad Nacional de Educación a Distancia, Spain

#### Reviewed by:

Tao Liu, Zhejiang University, China; Pierre Mégevand, Université de Genève, Switzerland; José Manuel Reales, Universidad Nacional de Educación a Distancia, Spain

> \*Correspondence: Marc Palaus mpalausg@uoc.edu

Received: 16 September 2016 Accepted: 26 April 2017 Published: 22 May 2017

#### Citation:

Palaus M, Marron EM, Viejo-Sobera R and Redolar-Ripoll D (2017) Neural Basis of Video Gaming: A Systematic Review.

> Front. Hum. Neurosci. 11:248. doi: 10.3389/fnhum.2017.00248

Results: A total of 116 articles were considered for the final selection. One hundred provided functional data and 22 measured structural brain changes. One-third of the studies covered video game addiction, and 14% focused on video game related violence.

Conclusions: Despite the inherent heterogeneity of the field of study, it has been possible to establish a series of links between the neural and cognitive aspects, particularly regarding attention, cognitive control, visuospatial skills, cognitive workload, and reward processing. However, many aspects could be improved. The lack of standardization in the different aspects of video game-related research, such as the participants' characteristics, the features of each video game genre, and the diverse study goals, could contribute to discrepancies in many related studies.

Keywords: addiction, cognitive improvement, functional changes, internet gaming disorder, neural correlates, neuroimaging, structural changes, video games

## INTRODUCTION

Nowadays, video gaming is a highly popular and prevalent entertainment option whose use is no longer limited to children and adolescents. Demographic data on video gaming show that the mean age of video game players (VGPs) (31 years old, as of 2014) has been rising in recent decades (Entertainment Software Association, 2014), and gaming is a common activity among young adults. Moreover, the increasing ubiquity of digital technologies, such as smartphones and tablet computers, has exposed most of the population to entertainment software in the form of casual video games (VGs) or gamified applications. Therefore, an important segment of society (over 30% through tablet computers and 70% through smartphones) has been exposed to these technologies and can now be considered, in some form, casual gamers (Casual Games Association, 2013).

It is not uncommon to hear both positive and negative health claims related to VGs in the mass media. Most of the time, these are unverified and sensationalist statements based on "expert" opinions but lacking supporting evidence. On the other hand, as VGs have become more complex (owing to improvements in computer hardware), they have come to appeal to older audiences as well as children and have gained prevalence as a mainstream entertainment option. Consequently, the number of people who spend hours daily playing these kinds of games is increasing.

There is interest in knowing the possible effects of long-term exposure to VGs and whether these effects are generally positive (in the form of cognitive, emotional, motivational, and social benefits) (e.g., Granic et al., 2014) or negative (exposure to graphic violence, contribution to obesity, addiction, cardiometabolic deficiencies, etc.) (e.g., Ivarsson et al., 2013; Turel et al., 2016). Moreover, VGs possess a series of intrinsic features that make them suitable for use in experimental procedures: they seem to motivate participants more effectively than tasks traditionally used in neuropsychology (e.g., Lohse et al., 2013) and, in the case of purpose-made VGs, they offer a higher degree of control over in-game variables.

For all these reasons, VGs have recently sparked more scientific interest. Since 2005, the number of publications that study or use some form of gaming has been increasing at a constant rate of 20% per year. While around 15 VG-related articles were published per year during the 1990s, over 350 were published in 2015 (see **Figure 1**).
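As illustrative arithmetic only (it does not reproduce the counts behind **Figure 1**), a constant 20% annual increase compounds quickly. The minimal Python sketch below, with hypothetical helper names `growth_factor` and `doubling_time`, shows that such a rate multiplies the yearly publication count by roughly 6.2 over a decade and doubles it about every 3.8 years.

```python
import math


def growth_factor(annual_rate: float, years: int) -> float:
    """Multiplicative growth after `years` of compounding at `annual_rate`."""
    return (1.0 + annual_rate) ** years


def doubling_time(annual_rate: float) -> float:
    """Years needed for a quantity growing at `annual_rate` to double."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


if __name__ == "__main__":
    rate = 0.20  # the 20% per year stated in the text
    print(f"10-year growth factor: {growth_factor(rate, 10):.2f}")  # prints 6.19
    print(f"doubling time: {doubling_time(rate):.1f} years")        # prints 3.8
```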

However, the concept of VG is extremely heterogeneous, and within the category we find a myriad of hardly comparable genres. The behavioral effects and the neural correlates derived from the use of VGs depend on the nature of the VG, on exposure to the game (hours of game play, age of onset, etc.) (Kühn and Gallinat, 2014), and, to a large extent, on the individual characteristics of each participant (Vo et al., 2011).

Furthermore, owing to the popularity of VG genres in which graphic violence is prevalent (shooters, survival horror, fantasy), many studies have chosen to focus on this variable. Therefore, a substantial body of scientific literature is devoted to the study of violent behavior and violence desensitization as a consequence of violence in VGs (e.g., Wang et al., 2009; Engelhardt et al., 2011). Lastly, particularly since the emergence of online VG play, there have been concerns about the addictive properties of VGs, akin to gambling and substance abuse, which have made addiction another recurrent topic in the literature (e.g., Young, 1998).

For the time being, this whole body of knowledge is a complex combination of techniques, goals, and results. On the one hand, there are articles that study the effects of VG exposure on the nervous system and on cognition (e.g., Green and Seitz, 2015); there seems to be solid evidence that exposure to certain kinds of VGs can influence behavioral aspects, and therefore we should be able to observe changes in the neural bases (Bavelier et al., 2012a). In fact, the cognitive and behavioral implications of VG exposure have already been examined in recent systematic reviews and meta-analyses that used neuropsychological tasks to measure the influence of these games in healthy individuals. This is highly relevant because these reviews evaluate the possible transfer effects of VG training to wider cognitive domains, providing a global perspective on how experimental and quasi-experimental designs differ in effect size depending on the cognitive function (Powers et al., 2013), and on the effects of cognitive training in aging, by means of computerized tasks (Lampit et al., 2014) and VGs (Toril et al., 2014; Wang et al., 2016). Knowledge about transfer effects is very important because it establishes a link between VGs and cognition, indirectly helping us understand their neural basis, which in this case acts as a bridge between them. From an applied perspective, this knowledge can be used to design more effective rehabilitation programs, especially those focusing on older populations, by keeping the most useful components and reducing those shown to be less beneficial.

On the other hand, VGs have been used as a research tool to study the nervous system. In this group of studies, exposure to VGs is commonly the independent variable, particularly in studies that use unmodified commercial VGs. However, it is not unusual to employ custom-designed VGs, such as the widely used Space Fortress, in which in-game variables can be fine-tuned to elicit certain mental processes in line with the research hypothesis (e.g., Smith et al., 1999; Anderson et al., 2011; Prakash et al., 2012; Anderson et al., 2015). In both cases, whether studying the effects of VG exposure on the nervous system or using VGs as a research tool, VGs are used to obtain information about the underlying neural processes relevant to the research question.

To date, there has been no systematic review on this topic. The aim of this article is to gather all the scientific information referring to neural correlates of VGs and to synthesize the most important findings. All articles mentioning functional and structural changes in the brain due to video gaming will be analyzed, and information about the most relevant brain regions for each kind of study will be extracted. Note that the main objective of many VG-related articles is not to study their neural correlates directly. Studies focusing on the addictive consequences or the effects of violence will be categorized separately.

Our final goal is to highlight the neural correlates of video gaming by making a comprehensive compilation and reviewing all relevant scientific publications that make reference to the underlying neural substrate related to VG play. This is the first effort in this direction that integrates data regarding VGs, neural correlates and cognitive functions that is not limited to action-VGs or cognitive training programs, the most frequently found research topics.

## METHODS

To structure the gathered information reliably in this systematic review, we followed the guidelines and recommendations of the PRISMA statement (Liberati et al., 2009).

### Eligibility Criteria

All articles that included neural correlates (both functional and structural) and either included VG play in the research protocol or studied the effects of exposure to VGs were included in the review. Both experimental and correlational studies were included. No restrictions regarding publication date were applied.

Healthy participants of any age and gender were considered, including both naive and experienced VG players. Participants who reported gaming addiction or met criteria for internet gaming disorder (IGD) were also included in the review, given the interest in observing neural correlates in these extreme cases. Other pathologies were excluded to avoid confounding variables.

Articles employing several methodologies were included. These can be organized into three main groups: studies in which naive participants were trained in the use of a VG and compared with a control group, studies comparing experienced players with non-gamers or low-experience players, and studies comparing the differential characteristics of two VGs or two VG genres.

The primary outcome measures were any kind of structural and functional data obtained using neuroimaging techniques, including computerized tomography (CT), structural magnetic resonance imaging (MRI), functional MRI (fMRI), positron emission tomography (PET), single-photon emission computed tomography (SPECT), magnetoencephalography (MEG), transcranial direct current stimulation (tDCS), electroencephalography (EEG), event-related potentials (ERP), event-related spectral perturbation (ERSP), steady-state visually evoked potential (SSVEP), Doppler, and near-infrared spectroscopy (NIRS), following or related to VG use.

### Information Sources

Academic articles were located using two electronic databases, MEDLINE and Web of Science, and by scanning the reference lists of other studies in the same field. Only the results from these two databases are reported, since other sources (Scopus, Google Scholar) did not provide any relevant new results. The search was not limited by year of publication, but only articles published in English, Spanish, or French were considered for inclusion. The first studies relevant to the topic date from 1992, and the most recent studies included in this review were published in February 2016.

### Search

A systematic search was performed using a series of keywords expected to appear in the title or abstract of any study containing neural correlates of VGs. These keywords were grouped into two main categories. The first category aimed to identify articles that used VGs as a technique or as a study goal; it included search terms related to "video games" proper (in different orthographic variants), types of players (casual, core, and hardcore gamers), and references to serious gaming. The second category comprised two groups of keywords intended to detect articles that studied the neural basis: (1) keywords related to anatomical features, such as structural or functional changes, gray or white matter (WM) volumes, cortical features, and connectivity; and (2) keywords naming the neuroimaging technique used to obtain those data, such as EEG, MRI, PET, or NIRS (see Appendix).
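One way to combine the keyword categories described above is to require a video game term AND at least one term from either neural-basis group. The Python sketch below assembles such a Boolean string. Both that AND/OR structure and the example terms are assumptions for illustration (the actual keyword list is given in the review's Appendix), and each database adds its own field-tag syntax on top of a string like this.

```python
from typing import Iterable, Sequence


def or_group(terms: Iterable[str]) -> str:
    """Join terms with OR, quoting multi-word phrases, and wrap in parentheses."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(vg_terms: Sequence[str],
                anatomy_terms: Sequence[str],
                technique_terms: Sequence[str]) -> str:
    # Assumed structure: a record must mention a video game term AND either
    # an anatomical/brain-change term OR a neuroimaging technique.
    neural = f"({or_group(anatomy_terms)} OR {or_group(technique_terms)})"
    return f"{or_group(vg_terms)} AND {neural}"


# Hypothetical example terms, not the review's actual keyword list.
VG_TERMS = ["video game", "videogame", "video-game", "hardcore gamer", "serious game"]
ANATOMY_TERMS = ["gray matter", "white matter", "cortical thickness", "connectivity"]
TECHNIQUE_TERMS = ["EEG", "MRI", "PET", "NIRS"]

if __name__ == "__main__":
    print(build_query(VG_TERMS, ANATOMY_TERMS, TECHNIQUE_TERMS))
```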

### Study Selection

Because the search terms returned a large number of results, strict exclusion criteria were applied to limit the final selection of studies. The same criteria were applied in a standardized way by two independent reviewers, and disagreements between reviewers were resolved by consensus. Owing to high variability in terminology and the diversity of keywords used in the search, a large number of false positives (65% of items found) appeared during the review process (see **Figure 2**).

The search using standardized terms produced a list of studies from each of the two databases. A large number of studies (62% of those that met the inclusion criteria) were duplicated across both databases, so the two lists were carefully compared in order to merge the references.
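Merging exports from two databases amounts to deduplicating records on a shared key. A minimal Python sketch follows, assuming each record carries a title, a year, and an optional DOI; the `Record` type, the key scheme, and the example entries are hypothetical, not the authors' procedure.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Record:
    title: str
    year: int
    doi: Optional[str] = None
    source: str = ""


def dedup_key(rec: Record) -> str:
    """Prefer the DOI (case-insensitive); otherwise use a normalized title plus year."""
    if rec.doi:
        return "doi:" + rec.doi.strip().lower()
    title = re.sub(r"[^a-z0-9]+", " ", rec.title.lower()).strip()
    return f"title:{title}|{rec.year}"


def merge_records(*exports: Iterable[Record]) -> List[Record]:
    """Merge database exports, keeping the first record seen for each key."""
    merged: Dict[str, Record] = {}
    for export in exports:
        for rec in export:
            merged.setdefault(dedup_key(rec), rec)
    return list(merged.values())


if __name__ == "__main__":
    # Hypothetical records; the DOIs are placeholders.
    medline = [Record("Action video games and attention", 2012, "10.1000/ABC", "MEDLINE")]
    wos = [Record("Action Video Games and Attention.", 2012, "10.1000/abc", "WoS"),
           Record("Another study", 2014, None, "WoS")]
    print(len(merge_records(medline, wos)))  # prints 2
```

Records for which only one database supplies a DOI would not match under this key, which is one reason a manual comparison of the merged list remains necessary.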

No unpublished relevant studies were considered. Studies relevant to the topic but not published in peer-reviewed journals, such as conference posters and abstracts, were considered.

### Data Collection Process

All relevant information was recorded in a spreadsheet according to the variables listed below. Variables related to violence and VG abuse were also categorized, since a significant portion of the studies focused on these behaviors. A small number of articles (n = 7) were found in sources other than the two databases, mainly through references in other articles.

For each study, the following data were extracted: (1) characteristics of the sample, including sample size, average age and range, inclusion and exclusion criteria, and gaming experience; (2) aim of the study, especially noting whether it focused on gaming abuse or exposure to violent content; (3) name and genre of the VG used during the study, if applicable; (4) study design; (5) main neuroimaging technique applied in the study, and whether the technique was applied while participants played; and (6) functional and structural neural correlates observed in the study. Studies were then classified into several groups according to whether they provided structural or functional data and whether they addressed violent or addictive aspects.
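The extracted variables and the subsequent, non-exclusive grouping can be pictured as one record per study. The Python sketch below is a hypothetical schema (field names are illustrative and cover only a subset of the listed variables), not the spreadsheet the authors used.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class StudyRecord:
    """Hypothetical per-study extraction record (a subset of the six variables)."""
    sample_size: int                       # (1) sample characteristics
    mean_age: float
    gaming_experience: str
    addresses_abuse: bool = False          # (2) aim: gaming abuse or addiction
    addresses_violence: bool = False       # (2) aim: violent content
    game: str = ""                         # (3) name and genre, if applicable
    genre: str = ""
    design: str = ""                       # (4) study design
    technique: str = ""                    # (5) main neuroimaging technique
    recorded_during_play: bool = False     # (5) applied while participants played?
    structural_correlates: List[str] = field(default_factory=list)  # (6)
    functional_correlates: List[str] = field(default_factory=list)  # (6)


def groups(rec: StudyRecord) -> Set[str]:
    """Return every (non-exclusive) group the study belongs to."""
    out: Set[str] = set()
    if rec.structural_correlates:
        out.add("structural")
    if rec.functional_correlates:
        out.add("functional")
    if rec.addresses_abuse:
        out.add("addiction")
    if rec.addresses_violence:
        out.add("violence")
    return out


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    example = StudyRecord(sample_size=30, mean_age=21.5, gaming_experience="experienced",
                          addresses_violence=True, technique="fMRI",
                          functional_correlates=["ACC"])
    print(sorted(groups(example)))  # prints ['functional', 'violence']
```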

Moreover, to help interpret the outcomes derived from the neural correlates, most of the studies established a connection between these correlates and their cognitive counterparts, either by directly measuring outcomes with cognitive tasks and questionnaires or by interpreting their results in light of the existing literature.

In the Discussion section of this review, we attempt to summarize the main findings by associating the neural changes with their cognitive and behavioral counterparts. While in many cases the original articles provided their own explanation for the phenomena, we also sought to integrate the general trends from a cognitive perspective. We therefore indicate which studies provide and interpret empirical cognitive and/or behavioral data (unmarked), which discuss cognitive and/or behavioral implications without assessing them (marked with \*), and which do not provide any cognitive or behavioral information (marked with \*\*).

## RESULTS

### Study Selection

The combined search of MEDLINE and Web of Science yielded a total of 306 unique citations. Of these, 205 studies were discarded because, after review of the abstract, they did not appear to meet the inclusion criteria. The main reasons for exclusion were: being a review article (n = 22), absence of neural correlates (n = 70), presence of pathology in the participants (n = 65), not being related to VGs or using simple computerized tasks that could not be considered VGs (n = 69), testing of new technologies in which the brain correlates were a mere by-product (n = 25), focus on motor functions (n = 15), pharmacological studies (n = 2), and, finally, articles in languages other than English, Spanish, or French (n = 18). Excluded articles often met more than one exclusion criterion. As mentioned in the eligibility criteria, exceptions were made for articles in which the pathology consisted of gaming overuse or addiction, and for articles that featured psychopathology but also included groups of healthy participants for whom neural data were provided.

Fifteen additional articles that met the inclusion criteria were found after examining the contents and following the references of the previously selected studies. As expected, articles written in English comprised the vast majority; of the remainder (8.9%), 10 (4.9%) were discarded from the review solely for language reasons. Ultimately, a total of 116 studies were identified for inclusion in the review (see the flow diagram in **Figure 2**).
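The selection counts reported in this section can be checked with simple arithmetic: 306 unique citations, minus 205 excluded, plus 15 found through references, gives the 116 included studies. The per-reason exclusion counts, by contrast, sum to more than 205 because a single article could meet several exclusion criteria, so they cannot be added to obtain the number of excluded articles. A short Python check, using only the numbers stated in the text (the constant and function names are illustrative):

```python
# Study-selection counts as reported in the text.
UNIQUE_CITATIONS = 306
EXCLUDED_AFTER_ABSTRACT = 205
ADDED_FROM_REFERENCES = 15

# Exclusion reasons; one article may be counted under several reasons.
EXCLUSION_REASONS = {
    "review article": 22,
    "no neural correlates": 70,
    "pathology in participants": 65,
    "not a video game": 69,
    "technology test": 25,
    "motor functions": 15,
    "pharmacological": 2,
    "language": 18,
}


def included_studies(unique: int, excluded: int, added: int) -> int:
    """Records surviving abstract screening plus those added from reference lists."""
    return unique - excluded + added


if __name__ == "__main__":
    reason_total = sum(EXCLUSION_REASONS.values())
    print(included_studies(UNIQUE_CITATIONS, EXCLUDED_AFTER_ABSTRACT,
                           ADDED_FROM_REFERENCES))           # prints 116
    print(reason_total, reason_total > EXCLUDED_AFTER_ABSTRACT)  # prints 286 True
```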

Most studies (n = 100; 86.2%) provided functional data, while only 22 (19.0%) studied structural changes in the brain. A few (n = 6; 5.2%) provided both structural and functional data. A significant number of the studies focused on excessive playing or VG addiction. This was the case for 39 (33.6%) of the reviewed articles, so we considered it appropriate to analyze them in their own category. Likewise, 16 studies (13.8%) focusing on the violent component of VGs were also placed in their own category. These two categories were not mutually exclusive, but only one study met both criteria (see **Table 1** for a breakdown by category).
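Because every included study reported functional data, structural data, or both, the category counts must satisfy inclusion-exclusion: 100 functional + 22 structural - 6 with both = 116. The Python sketch below checks this and recomputes each share of the 116 studies, rounded to one decimal place (names are illustrative).

```python
TOTAL_STUDIES = 116
COUNTS = {
    "functional": 100,
    "structural": 22,
    "both structural and functional": 6,
    "excessive gaming or addiction": 39,
    "violence": 16,
}


def share(n: int, total: int = TOTAL_STUDIES) -> float:
    """Percentage of `total`, rounded to one decimal place."""
    return round(100 * n / total, 1)


# Inclusion-exclusion: studies with functional OR structural data.
assert (COUNTS["functional"] + COUNTS["structural"]
        - COUNTS["both structural and functional"]) == TOTAL_STUDIES

if __name__ == "__main__":
    for label, n in COUNTS.items():
        print(f"{label}: n = {n} ({share(n)}%)")
```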

### Characteristics of Included Studies

Based on their methodology, studies in this review could be classified as experimental (n = 54; 46.6%), in which participants were randomly assigned to the experimental groups, or quasi-experimental (n = 62; 53.4%), in which the groups were usually constructed according to participants' characteristics. While studies involving excessive gaming almost always followed a quasi-experimental design comparing experienced gamers with low-experience VG players, articles studying normal gaming and the effects of violence exposure used both experimental and quasi-experimental designs. A fraction of the studies (n = 15; 13%), both experimental and quasi-experimental, compared the results to a baseline using a pretest-posttest design. This was the case for most studies involving a training period with VGs.

#### TABLE 1 | Article breakdown by category.


IGD, Internet Gaming Disorder.

The cumulative sample included in this review exceeds 3,880 participants. The exact number cannot be known, since participants could have been reused for further experiments and in some cases the sample size was not available. Most studies used adolescents or young adults as the primary experimental group, since this is the main demographic target of video gaming. In many cases, only male participants were accepted. Where VG experience was compared, the criteria varied greatly. For the low video gaming groups, VG usage ranged from <5 h/week to none at all. For the regular to excessive VG groups, usage typically started at 10 h/week. In some cases, where the level of addiction mattered, the score on an addiction scale was used instead.

In more than half of the studies (n = 67; 57.8%), participants actually played a VG as part of the experimental procedure. In the remaining studies, either neural correlates were measured in a resting-state condition or VG-related cues were presented to the participants during image acquisition.

Structural changes in the gray matter (GM) were measured as volumetric changes, whereas WM was assessed using tractography techniques. Functional changes were typically measured by comparing activation rates across brain regions. Nearly half (n = 55; 47.4%) of the assessed studies used fMRI as the neuroimaging technique of choice, while other functional techniques came a distant second. Functional connectivity was assessed in several studies employing resting-state measures. EEG in its multiple forms was also widely used (n = 32; 27.6%) to obtain functional data, either to measure activation differences across regions or in the form of event-related potentials (see **Table 2** for a breakdown by neuroimaging technique).

The high variability in study designs, participants, and objectives led us to describe the studies, their results, their applicability, and their limitations in a qualitative synthesis rather than a meta-analysis.

### Structural Data

Data on structural changes following VG use were available from 22 studies. Fourteen of these provided structural data for more than 800 participants with normal VG use, including both VGPs and non-VGPs (see **Table 3**). The remaining eight studies examined aspects of excessive or professional VG use (see **Table 4**).

Among the studies of healthy, non-addicted participants, eight used MRI to provide structural information on the GM, while six focused on the WM using diffusion tensor imaging (DTI).

Three studies compared lifetime VG experience prior to the study, while the rest used a training paradigm in which participants were exposed to a VG during the experimental sessions prior to the neuroimaging procedure and compared to a baseline. Seven studies provided WM integrity data using DTI, while the rest analyzed cortical thickness variations using regular structural MRI.

Studies examining volumetric differences found relevant changes mainly in prefrontal regions, particularly the dorsolateral prefrontal cortex (dlPFC) and surrounding areas; superior and posterior parietal regions; the anterior cingulate cortex (ACC); the cerebellum; the insula; and subcortical structures, such as the striatum and the hippocampus. In addition, structural connectivity studies observed changes in virtually all parts of the brain, including fibers connecting to the visual, temporal, and prefrontal cortices, the corpus callosum, the hippocampus, the thalamus, association fibers such as the external capsule, and fibers connecting the basal ganglia.

### Functional Data

One hundred articles provided functional data combined with VG use. Of these, around half (n = 51) did not include violence or addiction elements (see **Table 5**). A third (n = 34) aimed to understand the neural bases of IGD (see **Table 6**), often drawing parallels with other behavioral addictions and trying to find biomarkers for VG addiction. The rest (n = 16) were devoted to studying the effects of violence exposure in VGs (see **Table 7**). In total, these studies provided functional data for 3,229 experimental subjects, including control groups. Note that there is some overlap with the structural section, since a few (n = 6) studies provided both structural and functional data.
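The three functional subgroups (51 + 34 + 16) add up to 101 rather than 100. This is consistent with the single study that met both the addiction and the violence criteria being counted in two subgroups, assuming (the text does not state it explicitly) that this study is one of the functional studies. The Python sketch below makes that assumed correction explicit; the names are illustrative.

```python
# Functional-data subgroups as reported in the text.
FUNCTIONAL_TOTAL = 100
SUBGROUPS = {"no violence or addiction": 51, "IGD": 34, "violence": 16}

# Assumption: the one study meeting both the addiction and the violence
# criteria is a functional study and therefore appears in two subgroups.
DOUBLE_COUNTED = 1


def distinct_studies(subgroups: dict, double_counted: int) -> int:
    """Number of distinct studies once double-counted studies are removed."""
    return sum(subgroups.values()) - double_counted


if __name__ == "__main__":
    print(sum(SUBGROUPS.values()))                      # prints 101
    print(distinct_studies(SUBGROUPS, DOUBLE_COUNTED))  # prints 100
```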

The rich diversity of methodologies and research goals means that the study of functional brain correlates covers practically all regions of the brain. The most studied areas are in frontal and prefrontal regions and are concerned with high-order cognitive processes and motor/premotor functions. Activity changes in parietal regions, such as the posterior and superior parietal lobe, which are relevant for diverse functions such as sensory integration and visual and attentional processing, are also a common finding. The anterior and posterior cingulate cortices, together with other limbic areas such as the amygdala and the entorhinal cortex, display activity changes, possibly as a consequence of learning, emotion processing, and memory. Structures in the basal nuclei, particularly the striatum, also have a prominent role in studies related to VG addiction. Finally, we must not overlook a series of brain regions that appear less frequently, such as the occipital and temporal cortices, the cerebellum, the thalamus, and the hippocampus, where distinctive activity patterns have also been observed as a result of VG play.

#### TABLE 2 | Neuroimaging techniques used in the reviewed studies.


EEG, Electroencephalography; ERP, Event-related potentials; ERSP, Event-related spectral perturbation; fMRI, Functional magnetic resonance imaging; MRI, Magnetic resonance imaging; NIRS, Near-infrared spectroscopy; PET, Positron emission tomography; SPECT, Single-photon emission computed tomography; SSVEP, Steady-state visual evoked potential.

# DISCUSSION

Given the amount of data provided in the reviewed articles, we decided to categorize all the information according to the cognitive functions associated with the neurophysiological correlates, rather than according to the main research goal of each study. Thus, the discussion has been grouped into six main sections: attention, visuospatial skills, cognitive workload, cognitive control, skill acquisition, and reward processing. These cognitive processes are not fully independent; they overlap to some degree. This is particularly relevant for cognitive workload, which may be linked to virtually any cognitive function, and for attention, which is closely related to cognitive control, among other functions. Nevertheless, our analysis of the literature showed that virtually all the articles included in this review drew on one or more of these cognitive functions to explain their findings. The proposed categories therefore have sufficient presence in the literature to justify their use as separate domains for the study of cognition. While they should not be understood as independent aspects of cognition, the chosen categorization makes it easier to link the underlying neural correlates to the corresponding behavior.

Within each section, structural and functional correlates are discussed according to their contributions to cognitive functioning, including possible inconsistencies between studies and the presence of transfer effects. Owing to the close link between VG violence, the limbic and reward systems, and the possibly abnormal reward mechanisms in addicted players, studies previously classified under violence in VGs and VG addiction are discussed mainly in the reward processing section.




Articles marked with an asterisk (\*) discuss cognitive implications without directly assessing this dimension. Articles marked with a double asterisk (\*\*) neither provided empirical cognitive data nor discussed cognitive implications. The rest of the articles (non-marked) measured cognitive correlates with specific tasks.







… vmPFC, Ventromedial prefrontal cortex; VS, Ventral striatum. Articles marked with an asterisk (\*) discuss cognitive implications without directly assessing this dimension. Articles marked with a double asterisk (\*\*) neither provided empirical cognitive data nor discussed cognitive implications. The rest of the articles (non-marked) measured cognitive correlates with specific tasks.


TABLE 6 | Studies providing functional data dealing with VG experts or excessive gaming.




May 2017 | Volume 11 | Article 248


… SMA, Supplementary motor area; SN, Salience network; SPG, Superior parietal gyrus; STG, Superior temporal gyrus; TPJ, Temporo-parietal junction; vlPFC, Ventrolateral prefrontal cortex; VS, Ventral striatum. Articles marked with an asterisk (\*) discuss cognitive implications without directly assessing this dimension. Articles marked with a double asterisk (\*\*) neither provided empirical cognitive data nor discussed cognitive implications. The rest of the articles (non-marked) measured cognitive correlates with specific tasks.


TABLE 7 | Studies providing functional data focused on the violent contents of VG.



Frontiers in Human Neuroscience | www.frontiersin.org


… Functional connectivity; … Fusiform gyrus; fMRI, Functional Magnetic Resonance Imaging; FPN, Frontoparietal network; … Inferior frontal gyrus; … Middle temporal gyrus; … cortex; Pcu, Precuneus; … potentials; PFC, Prefrontal cortex; PoCG, Post-central gyrus; rACC, Rostral anterior cingulate cortex; SFG, Superior frontal gyrus; SPECT, Single-photon emission computed tomography; STG, Superior temporal gyrus; VG, Video game; VGP, Video game player. Articles marked with an asterisk (\*) discuss cognitive implications without directly assessing this dimension. Articles marked with a double asterisk (\*\*) neither provided empirical cognitive data nor discussed cognitive implications. The rest of the articles (non-marked) measured cognitive correlates with specific tasks.

## Attention

Attention is one of the main cognitive domains engaged by VGs and one of the most researched. The involvement of attentional networks during gameplay is closely related to other brain regions responsible for cognitive control, especially when more complex, goal-directed operations are required. Many brain regions are involved in attention, particularly nodes in the dorsal frontoparietal system, which mediate top-down attentional processes in goal-oriented behavior, but also nodes in the ventral network, which are responsible for bottom-up processing of the salient stimuli to which the player must attend (e.g., Vossel et al., 2014).

There is evidence that VGPs display enhanced performance in a range of top-down attentional control domains, such as selective attention, divided attention, and sustained attention (Bavelier et al., 2012b). The ACC consistently shows functional activity during VG play owing to its role as the main hub for top-down attentional processes (selective or focused attention) and goal-oriented behavior (e.g., Anderson et al., 2011<sup>∗</sup>; Bavelier et al., 2012b).

Compared to VGPs, non-VGPs showed greater recruitment of frontoparietal regions, a source of selective attention, as task demands increased, suggesting that habitual gamers allocate top-down resources more efficiently during attention-demanding tasks (Bavelier et al., 2012a). The same resource optimization effect can be observed in attentional control areas such as the right middle frontal gyrus (MFG), the right superior frontal gyrus (SFG), and the ventromedial prefrontal cortex (vmPFC) (Prakash et al., 2012<sup>∗</sup>). Connectivity changes in the ventral attentional stream, particularly in occipitotemporal WM, which is responsible for bottom-up reorienting toward novel stimuli, have also been observed after VG training and were linked to cognitive improvement (Strenziok et al., 2014<sup>∗</sup>). Integration between attentional and sensorimotor functions has been observed in expert VGPs in the form of increased GM volume and functional connectivity in anterior and posterior insular subregions. Long-term exposure to the attentional demands of VGs, coordinated with the fine motor skills involved in using the controller, may have produced plastic changes in these two subregions, which belong to attentional and sensorimotor networks, respectively (Gong et al., 2015<sup>∗</sup>).

Electrophysiological studies suggest that VG play is associated with an increase in frontal midline theta rhythm, a marker of focused attention attributed to ACC activity (Pellouchoud et al., 1999<sup>∗</sup>), and that this rhythm increases with VG practice in both action and puzzle VGs (Sheikholeslami et al., 2007∗∗; Smith et al., 1999). Likewise, the amplitudes of the P200 (Wu et al., 2012), an early perceptual component of visual processing, and of the P300 (Mishra et al., 2011; Wu et al., 2012), which is involved in early stages of decision-making, were linked to improvements in top-down spatial selective attention after training and with lifetime exposure to action VGs. Action and non-action VGPs seem to differ in how they deploy attention to central and peripheral targets in visual attention tasks, as measured by the N2pc component (West et al., 2015), which is also linked to selective attention.

Across VG genres, action VGs seem better at improving selective attention than slower-paced VGs such as role-playing games (RPGs) (Krishnan et al., 2013), puzzle games (Green and Bavelier, 2003), or strategy VGs (Tsai et al., 2013), which demand extensive planning and other forms of proactive cognitive control. This is probably because action VGs require intensive use of attentional systems combined with precise timing. While these improved attentional skills are typically observed in habitual VGPs, long-lasting improvements can also be achieved through a single VG training program (Anguera et al., 2013).

## Visuospatial Skills

Visuospatial skills encompass the processes that allow us to perceive, recognize, and manipulate visual stimuli, including visuomotor coordination and navigation. Because VGs are predominantly interactive visual tasks, they engage these skills heavily.

The areas implicated in visuospatial processing have traditionally been classified along a ventral visual stream (responsible for object recognition) and a dorsal visual stream (responsible for spatial location). Both originate in the visual cortex of the occipital lobe and project to the posterior parietal cortex (dorsal stream) and the inferior temporal cortex (ventral stream). More recent proposals have refined and broadened this traditional two-stream model (for further details see Kravitz et al., 2011). Among other nodes, the hippocampus stands out for its role in higher-order visual processing and memory (Kravitz et al., 2011; Lee A. C. H. et al., 2012).

Neural correlates of visuospatial skills include volume enlargement of the right hippocampus (HC), observed both in long-term gamers and experimentally after a VG training period (Kühn et al., 2013; Kühn and Gallinat, 2014<sup>∗</sup>). Increased hippocampal volumes were also found by Szabó et al. (2014∗∗), although the authors did not attribute that effect to the VG training. The entorhinal cortex, which is associated with navigational skills (Schmidt-Hieber and Häusser, 2013) and, together with the HC, is involved in spatial memory (Miller et al., 2015), was also correlated with lifetime experience of logic/puzzle and platform VGs (Kühn and Gallinat, 2014<sup>∗</sup>).

Decreased activation in occipitoparietal regions associated with the dorsal visuospatial stream (Goodale and Milner, 1992) has also been linked to improved visuomotor task performance, suggesting that VG training reduced cognitive costs, an effect that depended on the training strategy used in the VG (Lee H. et al., 2012). Shorter N100 latencies in the visual pathways are another feature of long-term VGPs, which may contribute to faster response times in visual tasks after years of practice (Latham et al., 2013).

Reduced WM integrity in interhemispheric parietal networks for spatially guided behavior could be another sign of decreased reliance on specific visuospatial networks as performance improved after VG training (Strenziok et al., 2013<sup>∗</sup>). However, other studies found that increased WM integrity in visual and motor pathways was associated with better visuomotor performance in long-term VGPs (Zhang et al., 2015<sup>∗</sup>). Despite these connectivity changes, functional brain differences between VGPs and non-VGPs do not always reflect performance in visuospatial skills, which were best predicted by non-visual areas (Kim Y. H. et al., 2015<sup>∗</sup>).

## Cognitive Workload

Brain activation patterns depend on the cognitive demands of the environment and on the associated level of workload (Vogan et al., 2016), which is directly related to the allocation of resources to working memory and its associated attentional processes (Barrouillet et al., 2007). When this variable is manipulated, the observed neural correlates likely reflect recruitment mechanisms that come into play as cognitive demands increase (Bavelier et al., 2012a). VGs have often been employed to obtain cerebral measures of cognitive workload because many of their features can be adjusted, particularly in purpose-made VGs such as the popular Space Fortress. Given the nature of these tasks, functional changes related to the manipulation of cognitive load are likely to appear along the attentional networks and in key nodes related to executive functions, mainly in prefrontal and parietal cortices.

Cognitive workload is not a unitary concept; some studies have identified different activation patterns by manipulating task difficulty (e.g., Anderson et al., 2011<sup>∗</sup>). Specifically, the number of stimuli appearing simultaneously on the screen and the complexity of each stimulus seem to elicit different responses from the brain. For instance, in an air traffic control simulator, increasing task difficulty by raising the number of planes a participant had to monitor increased theta band power (Brookings et al., 1996). Theta band power was also higher during gameplay than in a resting condition, and gradually increased over the course of play (Sheikholeslami et al., 2007∗∗). The theta band thus seems to track the level of cognitive demand across a wide range of cognitive abilities, such as attention, memory, and visuospatial processing, although this finding is not universal: decreased theta band power has also been observed as a feature of sustained attention. Theta activity therefore appears to be related both to task complexity and to levels of arousal and fatigue. Beta band power, on the other hand, seemed to be more closely associated with task complexity, especially in frontal and central areas, likely indicating a qualitative change in the participant's cognitive strategy or in the type of processing performed by the brain (Brookings et al., 1996).

ERP studies of cognitive workload show that, in expert VGPs, ERP amplitudes during VG play tend to correlate negatively with game difficulty. Most components (P200, N200) reach their maximum amplitude at frontoparietal locations, whereas the P300 is larger over parietal regions (Allison and Polich, 2008). This is consistent with previous literature linking attention and working memory demands to decrements in ERP peak amplitude (Watter et al., 2001).

Frontoparietal activity, linked to attentional processes, also shows recruitment effects as game difficulty increases, which also slows reaction times (Bavelier et al., 2012a). As mentioned above, habitual VGPs show less recruitment of frontoparietal networks than non-VGPs, which could be attributed to their VG experience and the optimization of their attentional resources (Bavelier et al., 2012a). Increased blood flow in prefrontal areas such as the dlPFC was also associated with increasing cognitive demands related to attention, verbal and spatial working memory, and decision making (Izzetoglu et al., 2004<sup>∗</sup>).

The intensity of the events displayed in the VG was also linked to certain electrophysiological correlates. High-intensity events, such as the death of the VG character, were associated with increased beta and gamma power compared with general gameplay (McMahan et al., 2015).

## Cognitive Control

During the course of a VG, the player encounters many situations that require choosing among several possible actions. For instance, the player might be required to interrupt an ongoing action and quickly implement an alternative strategy, or to manipulate a number of elements in a certain way in order to solve a puzzle and progress in the storyline. All these abilities fall under the "umbrella" of cognitive control, which includes reactive and proactive inhibition, task switching, and working memory (Obeso et al., 2013). These aspects of cognitive control are key to overcoming the obstacles found in the VG. In fact, they frequently operate in parallel (Nachev et al., 2008) to support goal-directed behavior. These processes have their neural substrate in the prefrontal cortex, supported by posterior parietal areas and the basal ganglia (Alvarez and Emory, 2006). Therefore, most changes in cognitive control observed after VG play will likely be detected in these regions.

Indeed, prefrontal regions are among the brain areas in which GM volume changes have been observed after cognitive training with a VG, which is remarkable considering that typical VG training periods span from a few weeks to a couple of months. These regions, such as the dlPFC, which is critical for cognitive control (Smith and Jonides, 1999), show volumetric changes that seem to correlate with VG performance and experience, likely as a result of the continuous executive demands of a VG, such as attentional control and working memory (Basak et al., 2011). These volumetric changes even correlate with transfer effects in cognitive control tasks (Hyun et al., 2013). Volumetric-behavioral correlations work both ways, since individuals with decreased orbitofrontal cortex (OFC) volumes associated with VG addiction show poorer performance in similar tasks (Yuan et al., 2013a).

During VG play, these prefrontal regions increase their activation in response to cognitive demands (game difficulty) and correlate positively with performance measures (Izzetoglu et al., 2004<sup>∗</sup>). Still, prefrontal activity is affected not only by the complexity of the task, but also by the nature of the task and by individual differences among participants (Biswal et al., 2010). Some research groups have found deactivation of dorsal prefrontal regions during gameplay. One possible explanation is interference from the attentional resources devoted to visual stimuli, since dlPFC activity remained stable while participants passively watched a VG but not while they actively played it (Matsuda and Hiraki, 2004<sup>∗</sup>). The same team also found that finger movements while handling the game controller did not seem to contribute to prefrontal deactivation. A further study noted that the observed prefrontal deactivation was not affected by age or performance level (Matsuda and Hiraki, 2006<sup>∗</sup>); however, other authors have challenged that finding, reporting that prefrontal activation during gaming was age-dependent: most adults tended to show increased prefrontal activity, whereas it was attenuated in some children. Prefrontal activation could therefore depend on age, game performance, and the level of interest in and attention paid to the VG (Nagamitsu et al., 2006∗∗).

Noninvasive brain stimulation has made it possible to establish a causal relationship between dlPFC activation and cognitive control. Stimulating the left dlPFC with transcranial direct current stimulation (tDCS) produces a measurable improvement in multitasking performance in a three-dimensional VG (Hsu et al., 2015).

Changes in functional activity after a training period in other executive-related nodes, such as the superior parietal lobe (SPL), have also been associated with working memory improvements (Nikolaidis et al., 2014).

In terms of connectivity, Martínez et al. (2013) found resting-state functional connectivity changes in widespread regions (frontal, parietal, and temporal areas) after a VG training program, which they attributed to the interaction between cognitive control and memory encoding and retrieval.

Despite the observed structural and functional changes in prefrontal areas, executive functions trained in a VG show poor transfer effects as measured with cognitive tasks (Colom et al., 2012; Kühn et al., 2013). Other studies, which found neural correlates related to executive functions, visuospatial navigation, and fine motor skills, failed to observe far transfer effects on neuropsychological tests even after a 50 h training period (Kühn et al., 2013). Studies of lifelong experts or professional gamers have detected structural GM changes that correlated with improved executive performance, involving posterior parietal (Tanaka et al., 2013) and prefrontal (Hyun et al., 2013) regions. Regarding structural connectivity, WM integrity changes in thalamic areas correlated with improved working memory, whereas the integrity of occipitotemporal fibers showed the opposite relationship (Strenziok et al., 2014). VG experience also seems to consolidate the connectivity between executive regions (the dlPFC and the posterior parietal cortex, PPC) and the salience network, which is composed of the anterior insula and the ACC and is responsible for bottom-up attentional processes (Gong et al., 2016).

The VG genre seems to determine which cognitive skills are trained. In older adults, training with a strategy VG seemed to improve verbal memory span (McGarry et al., 2013) but not problem solving or working memory, whereas training with a 2D action VG improved everyday problem solving and reasoning. Transfer effects were even broader for a brain-training/puzzle VG, which also improved working memory (Strenziok et al., 2014). In a younger sample, working memory improvements were detected after training with a 2D action VG (Space Fortress; Nikolaidis et al., 2014). Nevertheless, training periods in the scientific literature vary greatly, and it is difficult to ascertain whether a lack of transfer is simply due to a short training period.

Regarding electrophysiological methods, EEG studies have found correlations with frontal alpha oscillations that could reflect the engagement of cognitive control during VG training (Mathewson et al., 2012).

## Skill Acquisition

Several studies have attempted to determine which regions could act as predictors of skill acquisition. Since multiple cognitive functions are involved in this domain, volumetric and functional changes appear in a wide range of cortical regions. Most of the learning in VGs is non-declarative, including visuospatial processing, visuomotor integration, and motor planning and execution. Improvements in these skills generally lead to decreased activation in the involved cortical areas owing to the optimization of resources. Striatal and medial prefrontal areas are an exception: they display a distinctive pattern in which activity typically increases with skill acquisition (Gobel et al., 2011).

Striatal volumes were identified as predictors of skill acquisition, whereas structural measures of the hippocampal formation were not (Erickson et al., 2010). In particular, the anterior half of the dorsal striatum most accurately predicted skill acquisition in a complex VG (Vo et al., 2011). Other areas identified as predictors were the medial portion of Brodmann area 6, located in the frontal cortex and associated with motor control in cognitive operations and with response inhibition, and the cerebellum, likely associated with motor skill acquisition (Basak et al., 2011). The same authors also considered the post-central gyrus, a somatosensory area that could be involved in a feedback mechanism between prefrontal and motor regions; the volume of the right central portion of the ACC, which is responsible for conflict monitoring, also correlated with skill acquisition. Finally, dlPFC volumes, given the region's central role in executive functions, also correlated with VG performance over time (Basak et al., 2011).

On a functional level, Koepp et al. (1998∗∗) were the first team to identify a relationship between activity in the striatum, which is associated with learning and the reward system, and performance level in a VG. The study by Anderson et al. (2015) also supports the notion that the striatum, particularly the right dorsal striatum, composed of the caudate nucleus and the putamen, is a key area in skill acquisition. However, the same team was able to predict learning rates more accurately by comparing sequences of whole-brain activation patterns with an artificial intelligence model.

Learning gains seemed to be best predicted by individual differences in phasic activation in the regions with the highest tonic activation (Anderson et al., 2011<sup>∗</sup>). Differences related to learning rates were also observed in the activation of the default mode network, especially when participants employed different training strategies. Among electrophysiological measures, the best predictors were alpha rhythms (Smith et al., 1999), particularly over frontal regions, and alpha and delta ERSP, which are associated with cognitive control (task switching and inhibition) and attentional control networks (Mathewson et al., 2012). Frontal midline theta rhythms, linked with focused concentration and conscious control over attention, seemed to increase over the course of the VG training sessions (Smith et al., 1999).

## Reward Processing

### Addiction

VG addiction is understood as an impulse-control disorder with psychological consequences, not unlike other addictive disorders, especially non-substance addictions such as pathological gambling (Young, 1998). Internet Gaming Disorder (IGD) has recently been proposed for inclusion as a psychiatric diagnosis under the non-substance addiction category in the Diagnostic and Statistical Manual of Mental Disorders, 5th ed. (DSM-5) (American Psychiatric Association, 2013), with its diagnostic criteria adapted from those of pathological gambling. Efforts to reach a consensus regarding its assessment are still ongoing (Petry et al., 2014). In some cases, VG addiction is included as a subset of the broader definition of Internet addiction, although this categorization is not always consistent, since many VGs in which addiction is studied do not have an online component. Several instruments have been developed to assess gaming addiction, the most widely used in research and clinical practice being the Internet Addiction Test (IAT; Young, 1998) and the Chen Internet Addiction Scale (CIAS; Chen et al., 2004).

Within the VG literature, there is a great deal of interest in understanding the neurobiological basis of VG addiction and whether it can be related to other behavioral addictions through abnormal reward processing patterns. This seems to be the case, since many regions involved in the reward system have been found to be affected in people with VG addiction (e.g., Liu et al., 2010<sup>∗</sup>; Hou et al., 2012<sup>∗</sup>; Hahn et al., 2014). Among the complex set of structures involved in the reward system, the cortico-ventral basal ganglia circuit is the center of the network responsible for assessing the possible outcomes of a given behavior, especially in goal-oriented situations where complex choices must be made and the value and risk of secondary rewards must be weighed (Haber, 2011).

Differential structural and functional changes in addicted individuals can be found throughout the reward system. The main components of this circuit are the OFC, the ACC, the ventral striatum, ventral pallidum, and midbrain dopaminergic neurons (Haber, 2011), but many other regions seem to be involved in the wider context of addiction.

Exposing participants to gaming cues elicits a craving response, making it possible to study which regions respond more strongly in IGD patients than in controls. To explain the complexity of craving, the model proposed by Volkow et al. (2010) involves several regions that are reported consistently across studies. First, the precuneus, which showed higher activation in addicted individuals (Ko et al., 2013<sup>∗</sup>), is associated with attention, visual processing, and memory retrieval, and integrates these components by linking visual information (the gaming cues) to internal information. Regions commonly associated with memory and emotional functions are also involved: the HC, the parahippocampus, and the amygdala seem to provide emotional memories and contextual information for the cues (Ding et al., 2013<sup>∗</sup>), and subjects showed higher activation in these regions (O'Brien et al., 1998). Key regions of the reward system, such as the limbic system and the posterior cingulate, integrate motivational information and provide expectation and reward significance for gaming behaviors (O'Doherty, 2004). The OFC and the ACC underlie the desire to play and assign motivational value to the cue-inducing stimuli (Heinz et al., 2009), contributing to the activation and intensity of reward-seeking behavior (Kalivas and Volkow, 2005; Brody et al., 2007; Feng et al., 2013<sup>∗</sup>). Finally, prefrontal executive areas such as the dlPFC have also shown involvement during craving responses (Han et al., 2010a<sup>∗</sup>; Ko et al., 2013<sup>∗</sup>) and are linked to the formation of behavioral plans as a conscious anticipation of VG play. All these frontal regions [dlPFC, OFC, ACC, and the supplementary motor area (SMA)] tend to show reduced GM volumes in participants with IGD (Jin et al., 2016<sup>∗</sup>).

Striatal volumes, particularly in the ventral striatum, which plays a key role in reward prediction, were reduced in people with excessive internet gaming compared to healthy controls (Hou et al., 2012<sup>∗</sup>). Reductions were also observed in the insula, a region involved in conscious urges to abuse drugs (Naqvi and Bechara, 2009).

Overall, these features are characteristic of reward deficiencies that entail dysfunctions in the dopaminergic system, a shared neurobiological abnormality with other addictive disorders (Ko et al., 2009<sup>∗</sup> , 2013<sup>∗</sup> ; Cilia et al., 2010; Park et al., 2010; Kim et al., 2011).

Several regions seem to be related to the intensity of the addiction. In a resting-state paradigm, connectivity between the left SPL, including the posterior cingulate cortex (PCC), and the right precuneus, thalamus, caudate nucleus, nucleus accumbens (NAcc), SMA, and lingual gyrus (regions largely associated with the reward system) correlated positively with the CIAS score, whereas functional connectivity with the cerebellum and the superior parietal cortex (SPC) correlated negatively with that score (Ding et al., 2013<sup>∗</sup>). The distinctive activation and connectivity patterns of the PCC (Liu et al., 2010<sup>∗</sup>), an important node in the DMN and the reward system (Kim H. et al., 2015), could serve as a biomarker of addiction severity in both behavioral and substance dependence. As addiction severity increases and substance use shifts from voluntary to compulsive, control over behavior transitions from prefrontal to striatal regions, and from the ventral to the dorsal striatum (Everitt and Robbins, 2005). Matching evidence, in the form of weaker functional connectivity involving the dorsal-caudal putamen, has been found in IGD patients (Hong et al., 2015<sup>∗</sup>).

It is important to note that, even when controlling for the time spent playing VGs, professional and expert gamers display very different neural patterns compared to addicted VGPs. Gamers in the addiction category show increased impulsiveness and perseverative errors that are not present in professional gamers and, on a neural level, the two groups differ in GM volumes in the left cingulate gyrus (increased in pro-gamers) and the thalamus (decreased in pro-gamers), which together may be indicative of an unbalanced reward system (Sánchez-González et al., 2005; Han et al., 2012b).

### Exposure to Violent Content

Many articles use violent VGs in their designs to study the effects of violence exposure, emotional regulation, and long-term desensitization. Exposure to violent content has been associated with reduced dlPFC activity and with interference in executive tasks (inhibition, go/no-go task) (Hummer et al., 2010), findings that cannot be interpreted without considering the link with the limbic and reward systems. Repeated exposure to violent content is likely to trigger desensitization processes that affect regions linked to emotional and attentional processing, particularly a frontoparietal network encompassing the left OFC, right precuneus, and bilateral inferior parietal lobes (Strenziok et al., 2011). It is hypothesized that this desensitization may diminish emotional responses to violent situations, reducing empathy and lowering the threshold for non-adaptive behaviors linked to aggressiveness (Montag et al., 2012).

Limbic areas are involved in responses to violent interactions, as shown by the activation changes detected in the ACC and the amygdala in the presence of violent content (Mathiak and Weber, 2006<sup>∗</sup>; Weber et al., 2006<sup>∗</sup>). Lateral (especially left) prefrontal regions might be involved as well. These regions integrate emotion and cognition and thereby act as a defense mechanism against negative emotions by down-regulating limbic activity (Montag et al., 2012). Wang et al. (2009) also provided evidence of this regulatory mechanism. During an executive task after short-term exposure to a violent VG, they observed different functional correlations between the left dlPFC and the ACC, and between medial prefrontal regions and the amygdala.

The reward circuit also seems to be implicated in the presence of violent content. Activation decreases in the OFC and caudate appeared in the absence of an expected reward. However, violent events did not seem to be intrinsically rewarding (Mathiak et al., 2011<sup>∗</sup>). Zvyagintsev et al. (2016<sup>∗</sup>) found that resting-state functional connectivity was reduced within sensorimotor, reward, default mode and right frontotemporal networks after playing a violent VG, which could be linked to short-term effects on aggressiveness.

One study (Chou et al., 2013<sup>∗</sup>) observed gender differences in neural responses to violent content. Playing a violent VG reduced blood flow in the dorsal ACC in males but not in females, possibly reflecting the role of the ACC in regulating aggressive behavior in males.

The effect of certain personality traits, particularly empathy, has been assessed using violent VG exposure (Lianekhammy and Werner-Wilson, 2015<sup>∗</sup>). However, while empathy scores correlated with neural activity (frontal asymmetry in the EEG), they were not affected by the presence of violent content. Markey and Markey (2010) found that some personality profiles are more prone to be affected by exposure to violent VGs, especially those combining high neuroticism with low conscientiousness and low agreeableness.

The player's perspective may also determine the level of moral engagement. ERP N100 amplitudes were greater during first-person violent events. When the player adopted a distant perspective, general alpha power was greater instead, which indicates lower arousal (Petras et al., 2015).

Montag et al. (2012) observed that regular gamers have become habituated to violence exposure and, compared with non-gamers, show less lateral prefrontal activation, linked to limbic down-regulation. However, gamers have not lost the ability to distinguish real from virtual violence, as Regenbogen et al. (2010<sup>∗</sup>) found, although this also depended on each person's learning history.

Attenuated P300 amplitudes have been linked to violence desensitization after both short- and long-term exposure (Bartholow et al., 2006), yet these amplitudes did not increase after playing a pro-social VG (Liu Y. et al., 2015). Engelhardt et al. (2011) experimentally linked lower P300 amplitudes to violence desensitization and its effects on aggression. Bailey et al. (2010) also supported the link between violent VG exposure and desensitization to violent stimuli, associating it with early processing differences in attentional orienting.

#### Flow

Flow and boredom states during VG play have also been studied using neural correlates. The concept of flow, described by Csikszentmihalyi (1990), is understood as a state of mind in which one is completely focused on an intrinsically motivating task. Among other characteristics, flow implies a balance between task difficulty and the person's skills and an absence of ambiguity in the task's goals. It is also commonly accompanied by a loss of awareness of time. Because flow is a complex construct that cannot be measured directly, its components need to be operationalized. These components have been identified as sustained attention (focus), direct feedback, balance between skill and difficulty, clear goals and control over the activity (Klasen et al., 2012<sup>∗</sup>). Flow has also been theorized to be firmly linked to attentional and reward processes (Weber et al., 2009).

VGs provide an appropriate context for flow states, since feedback is offered continuously and the level of difficulty is programmed to rise progressively to match the player's improving skills (Hunicke, 2005; Byrne, 2006). VGs are therefore well suited to operationalizing the components of flow theory.

During gameplay in an action VG, Klasen et al. (2012<sup>∗</sup>) could not relate the feedback component to any meaningful neural activity, but the four remaining flow-contributing factors showed joint activation of somatosensory networks. Furthermore, motor regions were implicated in the difficulty, sustained attention and control components. The authors interpret this sensorimotor activity as a reflection of the simulated physical activity in the VG, which can contribute to the state of flow. Beyond this shared activity, the individual components elicited activity in several distinct regions.

- **Skill-difficulty balance:** this factor involved the reward system. Activation in the ventral striatum and other basal nuclei rewarded the player for successful in-game events. The factor also correlated with simultaneous activity in a motor network comprising the cerebellum and premotor areas.
- **Concentration and focus:** this factor was associated with changes in attentional networks and the visual system. In high-focus settings, players switched from spatial orientation to processing the numerous elements of the VG.
- **Goal-oriented behavior:** this factor showed decreased activity in the precuneus and regions of the ACC, while activity increased in the bilateral intraparietal sulcus and the right fusiform face area (associated with face processing). The authors explain this as a shift from navigating a known environment to seeking new game content (Klasen et al., 2012<sup>∗</sup>).

Affective states emerged when the VG settings were manipulated to elicit boredom, operationalized as the absence of goal-oriented behavior (one of the main aspects of flow). The lack of goal-directed behavior increased positive affect, and the neural correlates of this increase were characterized by lower activation in the amygdala and the insula (Mathiak et al., 2013). A different neural circuit was involved when negative affect increased. It was characterized by activation in the ventromedial prefrontal cortex and deactivation of the HC and the precuneus, and it seemed to counteract the state of boredom, possibly through planning future actions during inactive periods (Mathiak et al., 2013). Yoshida et al. (2014) also observed involvement of frontal regions in flow and boredom states. Activity in the bilateral ventrolateral prefrontal cortex (vlPFC) [comprising the inferior frontal gyrus (IFG) and lateral OFC] increased during flow and decreased during boredom. The OFC is linked to reward and emotion processing (Carrington and Bailey, 2009) and to monitoring punishment (Kringelbach and Rolls, 2004). However, this study induced boredom differently, using a low difficulty level in the VG instead of suppressing goal-directed behavior.

Brain-computer interfaces, which use electrophysiological methods to measure brain activity, have been able to differentiate states of flow and boredom induced by adjusting the difficulty level of a VG. The EEG frequencies that discriminated between these states were in the alpha, low-beta and mid-beta bands, measured at frontal (F7 and F8) and temporal (T5 and T6) locations (Berta et al., 2013).
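EEG-based discrimination of this kind typically rests on per-channel band-power features. The following is a minimal sketch of such feature extraction, not the method of Berta et al. (2013). Several parts are assumptions not given in the text: the band edges (8 to 12, 12 to 15 and 15 to 18 Hz, common neurofeedback conventions), the plain FFT periodogram, the 1 to 40 Hz normalization range and the function names. No classifier is shown.

```python
import numpy as np

# Assumed band edges in Hz (common neurofeedback conventions, NOT taken from Berta et al., 2013).
BANDS = {
    "alpha": (8.0, 12.0),
    "low_beta": (12.0, 15.0),
    "mid_beta": (15.0, 18.0),
}

# Electrode sites named in the text (10-20 system; T5/T6 are the older names for P7/P8).
CHANNELS = ("F7", "F8", "T5", "T6")


def band_powers(signal: np.ndarray, fs: float) -> dict[str, float]:
    """Relative power of each band in one EEG channel, via a plain FFT periodogram."""
    x = signal - signal.mean()                      # remove the DC offset
    power = np.abs(np.fft.rfft(x)) ** 2             # unnormalized power per frequency bin
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)     # bin centre frequencies in Hz
    total = power[(freqs >= 1.0) & (freqs < 40.0)].sum()  # assumed 1-40 Hz reference range
    if total == 0:
        raise ValueError("no power in the 1-40 Hz range")
    return {
        name: float(power[(freqs >= lo) & (freqs < hi)].sum() / total)
        for name, (lo, hi) in BANDS.items()
    }


def feature_vector(epoch: dict[str, np.ndarray], fs: float) -> np.ndarray:
    """Concatenate band powers across the four channels: 4 channels x 3 bands = 12 features."""
    features = []
    for ch in CHANNELS:
        bp = band_powers(epoch[ch], fs)
        features.extend(bp[band] for band in BANDS)
    return np.array(features)
```

As an example, a synthetic 10 Hz sine sampled at 256 Hz for 2 s places essentially all of its 1 to 40 Hz power in the alpha band. In a real pipeline, the 12 features per epoch would feed a classifier trained on labeled flow and boredom epochs.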

#### Gender Differences

Although some studies have discussed gender differences in cognitive processes related to VG playing, few studies address this topic with neural data. The most relevant study of gender differences (Feng et al., 2007<sup>∗</sup>) found that 10 h of training in an action VG (but not in a non-action VG) was enough to compensate for baseline gender differences in spatial attention and to reduce the gap in mental rotation skills. Whether the initial difference is innate or a product of women's lesser exposure to this kind of activity is a matter of debate (Dye and Bavelier, 2010). Men may improve less than women because of a ceiling effect due to their previous exposure to VGs. Women with less experience in these activities, by contrast, can reach the same performance level in visuospatial skills, and the same ceiling, after a short training period. In this respect, Dye and Bavelier comment on the possible effects of lifetime VG exposure. The gender gap in attentional and non-attentional skills is smaller or non-existent during childhood compared with adulthood, and the greater development of these skills in males is partially due to games targeting a male audience.

Other authors (Ko et al., 2005) have focused on psychosocial factors to explain gender differences in online VG addiction. Most online VGPs are men, and this imbalance is also observed in addiction cases. Ko et al. studied possible contributing factors and found that lower self-esteem and lower daily-life satisfaction were determinant in men but not in women. They attribute these differences to players' motives: men reported playing to pursue feelings of achievement and social bonding, whereas women did not. This pattern is not specific to VG addiction; it is shared with other addictions. VGs are likely used as a way to cope with these problems, which can lead to the development of addiction.

# LIMITATIONS

The study of neural correlates of VGs entails a number of inherent difficulties. The main limitation encountered during the development of this review was the dual role of VGs in the literature, where they serve either as a research tool or as an object of study. The lack of standardization in study objectives is another limitation that should be addressed. Despite the recent popularity of VG-related studies, many similar research lines yield results that are hardly comparable, making it difficult to draw general conclusions. We aimed to unify all sorts of studies in order to interpret and generalize the results.

First, we compared a large number of studies that not only used completely different techniques but also had very heterogeneous research goals. We grouped them together to extract all the available neuroimaging information. However, some information relevant to our purpose was likely not reported, because the studies' objectives differed greatly from our own. In fact, in certain cases, VGs were almost irrelevant to the aim of the study and were only used as a substitute for a cognitive task, so the reported results may not directly reflect VG neural correlates. Similarly, VGs were sometimes used as tools to provide violence exposure or to study the effects of behavioral addictions without being the central object of study.

Another issue was the lack of a proper classification of VG genres. While the most common division is between action and non-action VGs, it would be useful to establish which variables determine this classification. For instance, both first-person shooters and fighting games could be considered action VGs. Both demand quick responses and high attentional resources, but first-person shooters require much higher visuospatial skills, whereas fighting games do not. Consequently, efforts should be made to determine which aspects of each VG genre are related to each cognitive process and its associated neural correlates.

Apart from these aspects, comparisons between gamers and non-gamers are common in the VG literature. Nevertheless, there is no consensus on the inclusion requirements for each group, and no scientific criterion seems to have been used to establish a cut-off. Current dedication to VGs, measured in hours per week, seems to be the most common classification method. Non-gamer groups are sometimes defined so strictly as to exclude any gaming experience. On other occasions, several weekly VG hours are tolerated within the same category. This is problematic since, in some cases, cognitive changes have been found after just a few weeks of VG training. Moreover, in most cases the age at which active VG play began, a particularly relevant variable (Hartanto et al., 2016), is not taken into account. Another relevant variable that tends to be overlooked is lifetime VG experience, usually measured in hours. Furthermore, different VG genres clearly lead to different outcomes, yet genre is not included when describing a participant's VG experience. Therefore, VG experience should be measured taking into account all the variables mentioned above: onset age, lifetime VG experience (in hours), current VG dedication (hours per week) and VG genres.
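The four variables recommended for describing VG experience (onset age, lifetime hours, current weekly hours and genres) can be captured in a single participant record. The following is a minimal sketch of such a record. The class and field names, the per-genre breakdown of weekly hours and the validation rules are all hypothetical illustrations. The sketch deliberately encodes no gamer/non-gamer cut-off, since the text notes there is no consensus on one.

```python
from dataclasses import dataclass, field


@dataclass
class VGExperience:
    """Hypothetical record of one participant's video game history."""
    onset_age: float        # age (years) at which active VG play began
    lifetime_hours: float   # cumulative lifetime VG experience, in hours
    weekly_hours: float     # current VG dedication, in hours per week
    # Hypothetical breakdown of current weekly hours by genre label.
    genre_hours: dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Basic sanity checks; thresholds are illustrative, not a standard.
        if min(self.onset_age, self.lifetime_hours, self.weekly_hours) < 0:
            raise ValueError("ages and hours must be non-negative")
        if sum(self.genre_hours.values()) > self.weekly_hours + 1e-9:
            raise ValueError("per-genre hours exceed total weekly hours")

    def genre_share(self, genre: str) -> float:
        """Fraction of current weekly play spent on one genre (0.0 if no play)."""
        if self.weekly_hours == 0:
            return 0.0
        return self.genre_hours.get(genre, 0.0) / self.weekly_hours
```

For example, `VGExperience(onset_age=9, lifetime_hours=3000, weekly_hours=10, genre_hours={"action": 6, "puzzle": 2}).genre_share("action")` gives 0.6. Keeping genre as its own field, rather than folding it into a single gamer label, lets analyses control for genre separately from total exposure.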

Extracting all the relevant information for this review was difficult because of these limitations of the existing literature, but we have done our best to clarify the results and draw valuable conclusions.

Another limitation was the link between neural changes and cognitive functions. The neural correlates of VGs are the focus of this review, and we considered it essential to complement these data by discussing their cognitive implications. In most cases these implications were directly assessed by the individual studies, but in some cases they were extrapolated from previous literature. Furthermore, functional or structural changes, even when detected, do not always reflect cognitive changes. This may be due to a lack of sensitivity in the cognitive and behavioral tasks employed. Detecting both neural and cognitive changes requires specific research designs with sufficiently sensitive measurements of the three dimensions (functional, structural, and cognitive). Ideally, to determine when each change starts to appear as a result of VG exposure, an experimental design including a VG training period should be used. In such a design, neural and cognitive data would be assessed at a series of time points until all three types of change were detected. An exhaustive discussion of the cognitive implications of VGs is beyond our scope, since other works already deal with this particular issue (Powers et al., 2013; Lampit et al., 2014; Toril et al., 2014; Wang et al., 2016).

Efforts should be made to systematize VG-related research by establishing VG training protocols and determining the effects of lifetime VG exposure, so that results become more comparable and more generalizable.

#### CONCLUSIONS

The current work integrates the large body of data generated in recent years on a topic that has not stopped growing, making it easier to compare the results of multiple research groups. VG use affects a variety of brain functions and, ultimately, behavior and cognitive performance.

The attentional benefits resulting from the use of VG seem to be the most evidence-supported aspect, as many studies by Bavelier and Green have shown (Green and Bavelier, 2003, 2004, 2006, 2007, 2012; Dye et al., 2009; Hubert-Wallander et al., 2011; Bavelier et al., 2012b). Improvements in bottom-up and top-down attention, optimization of attentional resources, integration between attentional and sensorimotor areas, and improvements in selective and peripheral visual attention have been featured in a large number of studies.

Visuospatial skills are also an important topic of study in VG research, where optimization of cognitive costs in visuomotor task performance is commonly observed. Some regions show volumetric increases as a result of VG experience, particularly the HC and the entorhinal cortex, which are thought to be directly related to visuospatial and navigational skills. As with attention and overall skill acquisition, optimization of these abilities is usually detected in functional neuroimaging studies as decreased activation in the associated pathways (in this case, regions linked to the dorsal visual stream). Exposure to a task likely first increases activity in the associated regions. As performance improves with repeated exposure, however, fewer cortical resources are needed for the same task.

Likewise, although results are not always consistent, even short VG training paradigms showed improvements in functions related to cognitive control, particularly working memory, linked to changes in prefrontal areas such as the dlPFC and the OFC. How to achieve far transfer in these functions remains one of the most interesting questions regarding cognitive control. Although VGs are good candidates for cognitive training, the optimal training parameters for observing the first effects are still not well known. Longer training periods intuitively seem more likely to induce far transfer, but how long should they be? We also noted that VG genre can have differential effects on cognitive control. These effects therefore cannot be expected without first controlling for this variable, since different VG genres often have little in common with each other.

Cognitive workload studies have offered the possibility of observing neural recruitment phenomena to compensate for the difficulty and complexity of a cognitive task and a number of studies have pointed to the importance of frontoparietal activity for this purpose.

It has been also possible to link skill acquisition rates with certain cerebral structures. Several brain regions are key in this regard, mainly the dlPFC, striatum, SMA, premotor area, and cerebellum. Moreover, as suggested by Anderson et al. (2015), models of whole-brain activation patterns can also be used as an efficient tool for predicting skill acquisition.

The role of the reward system is always present when we talk about VGs, due to the way they are designed. Addiction has a heavy impact throughout the neural reward system, including components like the OFC, the ACC, the ventral striatum, ventral pallidum, and midbrain dopaminergic neurons, together with diverse regions that have support roles in addiction. The role of structures that link addiction to its emotional components, such as the amygdala and the HC should not be underestimated.

Limbic regions work together with the PCC to integrate the motivational information with the expectation of reward.

Exposure to violent content has implications for reward circuits and for emotional and executive processing. Reduced functional connectivity within sensorimotor, reward, default mode and right frontotemporal networks is observed after playing a violent VG. The limbic system, interacting with the lateral prefrontal cortex, has a role in down-regulating reactions to negative emotions, such as those found in violent contexts, which may lead to short-term violence desensitization.

Despite the difficulties in locating the main components of flow in the brain, several networks seem to be involved in this experience. General activation of somatosensory networks is observed in this state, whereas activation in motor regions is only linked to three components of flow: skill-difficulty balance, sustained attention and control over the activity. The reward system plays a key role in the experience of flow, with the ventral striatum and other basal ganglia directly linked to the skill-difficulty balance in a task. When players seek new content to avoid boredom, the bilateral intraparietal sulcus and the right fusiform face area seem to be the most implicated regions. During a flow-evoking task, the absence of boredom is reflected in activity in the IFG, the OFC, and the vmPFC. Flow is also linked to emotional responses, and both positive and negative affect during a VG have been associated with changes in the amygdala, insula, vmPFC and HC.

It is also worth commenting on the negative effects of VGs. While much has been written about the possible benefits of VG playing, articles highlighting negative outcomes in non-addicted or expert VGPs are much less common. To our knowledge, only four studies have identified neural correlates that predicted hindered performance in a range of cognitive domains. VG use has been linked with reduced recruitment of the ACC, which is associated with proactive cognitive control and possibly related to reduced attentional skills (Bailey et al., 2010). Likewise, exposure to violent content in VGs is associated with lower activity in the dlPFC, interfering with inhibitory control. The same team (Bailey and West, 2013) observed that VG play had beneficial effects on visuospatial cognition but negative effects on social information processing. Lastly, VG exposure has been linked to delayed microstructural development in extensive brain regions and lower verbal IQ (Takeuchi et al., 2016).

Finally, although this review focuses on the neural correlates of VGs rather than their cognitive or behavioral effects, we believe it is important to integrate all these aspects. Raw neuroimaging data often offer little information unless they are linked to the underlying cognitive processes. Such integration is increasingly common in the literature, but it is not always present, and future studies could address this.

#### AUTHOR CONTRIBUTIONS

All authors were equally involved in preparing this review article. The article's design, data acquisition, and analysis of its content were carried out by consensus among all the authors.

# FUNDING

This study has been supported by the doctoral school of the Open University of Catalonia, Spain, under the IN3-UOC Doctoral Theses Grants Programme 2013-2016 (http://in3.uoc.edu). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

#### ACKNOWLEDGMENTS

We would like to sincerely thank our colleague Cristina García Palma for her assistance during the whole process of extracting and processing information from the scientific databases and for her valuable contributions during the course of this work. We would also like to express our gratitude to Nicholas Lumsden, who assisted in the proof-reading and English-language correction of the manuscript.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fnhum.2017.00248/full#supplementary-material

#### REFERENCES

game: a model-based test of the decomposition hypothesis. J. Cogn. Neurosci. 23, 3983–3997. doi: 10.1162/jocn\_a\_00033

in the setting of negative consequences. J. Psychiatr. Res. 73, 1–8. doi: 10.1016/j.jpsychires.2015.11.011

Int. J. Hum. Comput. Interact. 17, 211–227. doi: 10.1207/s15327590ijhc1702\_6

adolescents: a multi-method study. Soc. Cogn. Affect. Neurosci. 6, 537–547. doi: 10.1093/scan/nsq079

changes in frontolimbic circuitry in adolescents. Brain Imaging Behav. 3, 38–50. doi: 10.1007/s11682-008-9058-8


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer JMRA and handling Editor declared their shared affiliation, and the handling Editor states that the process nevertheless met the standards of a fair and objective review.

Copyright © 2017 Palaus, Marron, Viejo-Sobera and Redolar-Ripoll. This is an openaccess article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Video Game Training Enhances Visuospatial Working Memory and Episodic Memory in Older Adults

Pilar Toril<sup>1,2</sup>\*, José M. Reales<sup>1,3</sup>, Julia Mayas<sup>1,2</sup> and Soledad Ballesteros<sup>1,2</sup>

<sup>1</sup> Studies of Aging and Neurodegenerative Diseases Research Group, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain, <sup>2</sup> Department of Basic Psychology II, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain, <sup>3</sup> Department of Methodology of the Behavioral Sciences, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain

In this longitudinal intervention study with experimental and control groups, we investigated the effects of video game training on the visuospatial working memory (WM) and episodic memory of healthy older adults. The trained group comprised 19 older adult volunteers, who received fifteen 1-h training sessions with a series of video games selected from a commercial package (Lumosity). The control group comprised 20 healthy older adults. The trainees' performance improved significantly in all the practiced video games. Most importantly, two computerized tasks designed to assess visuospatial WM, the Corsi blocks task and the Jigsaw puzzle task, showed significant enhancements after training in the trained group and no change in the control group. The episodic memory and short-term memory of the trainees also improved. Gains in some WM and episodic memory tasks were maintained over a 3-month follow-up period. These results suggest that the aging brain still retains some degree of plasticity, and that video game training might be an effective intervention tool to improve WM and other cognitive functions in older adults.

#### Edited by:

Rachael D. Seidler, University of Michigan, USA

#### Reviewed by:

Marian Berryhill, University of Nevada, Reno, USA

Corey Bohil, University of Central Florida, USA

#### \*Correspondence:

Pilar Toril pilartoril@psi.uned.es

Received: 25 January 2016 Accepted: 22 April 2016 Published: 06 May 2016

#### Citation:

Toril P, Reales JM, Mayas J and Ballesteros S (2016) Video Game Training Enhances Visuospatial Working Memory and Episodic Memory in Older Adults. Front. Hum. Neurosci. 10:206. doi: 10.3389/fnhum.2016.00206

Keywords: brain plasticity, cognitive aging, episodic memory, training, video games, visuospatial working memory

# INTRODUCTION

Age-related brain changes, occurring mainly in the prefrontal cortex, the medial temporal lobe system (including the hippocampus) and the cerebellum, are associated with cognitive decline (Raz et al., 2005) in several functions. These include processing speed (Salthouse, 1996), peripheral vision (Muiños and Ballesteros, 2014), dynamic visual acuity (Muiños and Ballesteros, 2015), working memory (WM), executive control functioning, and episodic memory (e.g., Baltes and Lindenberger, 1997; Hoyer and Verhaeghen, 2006; Nilsson, 2003; Park and Gutchess, 2002; Rönnlund et al., 2005). However, other cognitive functions, including implicit memory, verbal abilities and world knowledge, are mostly spared with age (e.g., Park et al., 2002; Mitchell and Bruss, 2003; Craik and Bialystok, 2006; Osorio et al., 2010; Ballesteros et al., 2013; Ballesteros and Mayas, 2015). Experience-related changes induced by modifications of the social environment, physical activity, and cognitive training affect brain structure and function (for a recent review, see Ballesteros et al., 2015a). Research on brain plasticity in older adults and its relationship to experiential changes is currently attracting substantial public interest (Raz and Lindenberger, 2013). Moreover, several studies have found brain plasticity not only in healthy older adults but also in patients with chronic traumatic brain injury (Sacco et al., 2011), schizophrenia (Fisher et al., 2010), and intellectual disability (Söderqvist et al., 2012). Recently, research on training-induced changes in brain and behavior across the lifespan, especially in older age, has attracted great interest in cognitive neuroscience. Studies in this field may improve our knowledge of brain plasticity and be of great help in designing effective interventions (see Karbach and Schubert, 2013).

Training intervention studies suggest that the older human brain maintains a certain level of neural plasticity (Bialystok and Craik, 2006; Li et al., 2006, 2008). Building on this idea, an active line of research studies ways of maintaining and/or improving cognitive skills (Green and Bavelier, 2003) and delaying cognitive and brain decline as much as possible (Hertzog et al., 2008; Park and Reuter-Lorenz, 2009; Park and Bischof, 2013; Reuter-Lorenz and Park, 2014). The observed increase in neural volume in response to cognitive training is an indicator of brain plasticity (see Boyke et al., 2009; Park and Bischof, 2013). Researchers assume that the older brain retains at least some degree of plasticity and can still modify its structural and functional patterns to meet new environmental demands. On this basis, they are intensively exploring different types of intervention for older adults.

One of the most popular computerized intervention approaches is training older adults with video games. Some intervention studies have reported improvements in the trained group, but not in the control group, in several cognitive functions, including processing speed (e.g., Clark et al., 1987; Dustman et al., 1992; Ballesteros et al., 2014), visuo-motor coordination (Drew and Waters, 1986), attention (e.g., Goldstein et al., 1997; Belchior, 2008; Mayas et al., 2014), memory (e.g., Craik et al., 2007; Smith et al., 2009; Hampstead et al., 2012), WM (e.g., Edwards et al., 2005; Erickson et al., 2007; Anguera et al., 2013), and global cognitive function (Torres, 2008). By contrast, other studies have failed to find any positive effects of training with video games on cognition (e.g., Ackerman et al., 2010; Owen et al., 2010; Boot et al., 2013a).

Video games are designed with two aims: enjoyment and sustained player engagement (Anguera and Gazzaley, 2015). They can be classified as simple (non-action games) and complex (action games). Complex video games are fast, intense, unpredictable, and require more perceptual and cognitive skills than non-action games (e.g., Green and Bavelier, 2003; Feng et al., 2007). Some researchers have used complex video games to train older adults (e.g., Basak et al., 2008; Stern et al., 2011), while others have used non-action games, which seem more appropriate for older adults (e.g., Torres, 2008; van Muijden et al., 2012; Ballesteros et al., 2014).

An important question in this context is whether training with video games transfers to other, untrained cognitive functions. This question is critical for practical applications but remains debated. A systematic review of computer-based interventions in cognitively healthy older adults found that video game training improved processing speed and global cognition but was less effective for improving executive functions (Kueider et al., 2012). More recently, we conducted a meta-analysis to examine the hypothesis that training older adults with video games enhances their cognitive functioning (Toril et al., 2014). The studies included in this meta-analysis were 20 experimental video game training interventions with pre- and post-training measures, published between 1986 and 2013. The mean effect size was moderate [0.37 (SE 0.05), 95% CI 0.26 to 0.48]. The results indicated that training older adults with video games produces moderate positive effects on several cognitive functions (e.g., reaction time (RT), attention, memory and global cognition) but does not improve executive functions. This meta-analytic study (see also Lampit et al., 2014) also found that these positive results were moderated by variables such as the age of the trainees and the frequency or length of the training program (the amount of time needed to induce cognitive improvement).
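The reported interval can be checked quickly by hand. The check below assumes a normal-approximation 95% confidence interval around the mean effect size (the text does not say how the interval was computed):

$$
\overline{ES} \pm z_{0.975}\,SE = 0.37 \pm 1.96 \times 0.05 = 0.37 \pm 0.098 \approx [0.27,\ 0.47]
$$

This is close to the reported 0.26 to 0.48. The reported half-width of about 0.11 would correspond to an unrounded SE of roughly $0.11 / 1.96 \approx 0.056$. The small discrepancy therefore presumably reflects how the SE was rounded or how the interval was computed (for example, a t-based rather than a normal interval), which the text does not specify.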

WM is a capacity-limited system that stores and processes information needed for ongoing cognition. This capacity-limited workspace is necessary to keep things in mind while performing complex tasks such as comprehension and reasoning (Baddeley and Hitch, 1974). This key component of cognition, central to many cognitive functions, including concentration, problem solving, and impulse control, declines significantly with age (e.g., Park et al., 2002; Bopp and Verhaeghen, 2005; Park and Reuter-Lorenz, 2009). Many recent reviews and longitudinal computerized cognitive training studies have investigated the effectiveness of computerized training approaches aimed at improving WM (e.g., Dahlin et al., 2008; Perrig et al., 2009; Klingberg, 2010; Shipstead et al., 2010, 2012; Takeuchi et al., 2010; Boot et al., 2011; Morrison and Chein, 2011). Unfortunately, the results of these studies are at best mixed, with some articles reporting the effectiveness of WM training (e.g., Borella et al., 2010; Klingberg, 2010; Morrison and Chein, 2011), while others concluded that it is ineffective (e.g., Shipstead et al., 2010; Redick et al., 2013; Ballesteros et al., 2014).

Several recent meta-analytic studies (Karbach and Verhaeghen, 2014; Lampit et al., 2014; Toril et al., 2014) have noted the great variability of the interventions in terms of, for example, the intensity and duration of the training regimes, whether they are carried out at home or in the presence of a trainer, and the age of the participants. Our meta-analytic study (Toril et al., 2014) showed that a small number of training sessions is more effective than a large number, possibly because older adults get tired and lose motivation after many sessions. We also found that the benefits of training increased with the age of participants, a result that may be explained by the lower baseline scores of the older participants. Lampit et al. (2014) also found that computerized cognitive training can improve the cognitive performance of healthy older adults, but that its effectiveness varies across domains; they suggested that training more than three times per week is ineffective. Karbach and Verhaeghen (2014) examined the effects of executive-function and WM training in older adults, suggesting that the inconsistencies in the results were due to differences in the type, intensity, and duration of the intervention, and to the methods used to compare different studies. Their results suggest that WM and executive-function training produces significant and large improvements in performance on the trained tasks and reliable small-to-medium-sized transfer effects in the process trained, at least in healthy older adults.

A recent randomized controlled trial (Ballesteros et al., 2014) investigated the effects of training older adults with non-action video games on a series of cognitive functions that decline with age and on subjective wellbeing. After training, the experimental group showed significant improvements in processing speed, attention, and immediate and delayed visual recognition memory, as well as a tendency to improve in some dimensions of the wellbeing scale (affection and assertiveness). However, visuospatial WM and executive control did not improve after training. Overall, these pre-/post-training results support the view that training older adults with non-action video games improves some cognitive abilities but not others. Moreover, an assessment conducted after a 3-month no-contact interval showed that the benefits in processing speed, attention and long-term memory had vanished, and that only the effects on wellbeing were maintained 3 months later (see Ballesteros et al., 2015b). Nor did the trained group show any transfer to executive control or spatial WM from pre-test to the 3-month follow-up. These results suggest that cognitive plasticity can be induced in healthy older adults, but that periodic booster sessions are needed to maintain the training benefits.

In view of the importance of WM for the daily life activities of older adults, we designed this longitudinal intervention study taking into account the results of our previous study (Ballesteros et al., 2014, 2015b) and the findings of several recently published meta-analyses (Karbach and Verhaeghen, 2014; Lampit et al., 2014; Toril et al., 2014). The program consisted of fifteen 1-h sessions. Because the number of video games included in the training schedule is an important variable (see Toril et al., 2014), we selected just six non-action video games, chosen mainly to train WM. Importantly, all participants in the experimental group were trained in group sessions at the municipal senior center in the presence of an experimenter (as recommended by Kelly et al., 2014 and Lampit et al., 2014).

The goal of the present study was to investigate whether cognitively healthy older adults could benefit from training with non-action video games. We addressed two main questions. First, would training older adults with non-action video games improve their visuospatial WM as well as short- and long-term memory? Secondly, would any improvements persist after a 3-month no-contact period? Based on the results of previous studies, we hypothesized that: (1) video game training would improve the visuospatial WM of older adults; (2) the effects of training would transfer to episodic memory; and (3) memory improvements would persist 3 months after finishing the training program.

# MATERIALS AND METHODS

#### Participants

Forty cognitively healthy older volunteers were recruited from a municipal senior center in the Madrid suburbs to participate in this training study. All had normal or corrected-to-normal vision and hearing and reported that they had no previous experience with video games. After signing a consent form, participants were randomly assigned either to the trained (experimental) group or to the control group. Participants in both groups regularly attended cultural activities at the senior center (e.g., painting classes, lectures, cultural visits), and the control group continued these routine activities during the study. The study was approved by the Ethics Committee of the Universidad Nacional de Educación a Distancia. The inclusion criteria were a score of 26 or above on the Mini-Mental State Examination (MMSE; Folstein et al., 1975) and a normal score on the Information subscale of the Wechsler Adult Intelligence Scale (WAIS-III; Wechsler, 1999). The two groups did not differ in age, years of education, or Information subscale and MMSE scores (all ps > 0.05). In the experimental group, one participant withdrew for medical reasons after screening. Thus, 19 participants in the training group and 20 in the control group completed the study. Demographic data for each group are displayed in **Table 1**.
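The text does not say how the random assignment was carried out. One common option is simple randomization: shuffle the participant list and split it in half. The sketch below assumes that procedure. The participant IDs (`P01` to `P40`), the seed, and the function name `randomize` are all hypothetical. A seeded local generator makes the allocation reproducible and auditable.

```python
import random


def randomize(participant_ids: list[str], seed: int) -> dict[str, list[str]]:
    """Shuffle participants and split them into two equal-sized groups (assumed procedure)."""
    rng = random.Random(seed)  # local generator so the allocation can be reproduced
    shuffled = participant_ids[:]  # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"experimental": shuffled[:half], "control": shuffled[half:]}


# Hypothetical IDs for the 40 recruited volunteers; the seed value is arbitrary.
ids = [f"P{i:02d}" for i in range(1, 41)]
groups = randomize(ids, seed=2014)
print(len(groups["experimental"]), len(groups["control"]))  # 20 20
```

Attrition after allocation, such as the one withdrawal described above, is not handled here. That case is why the final groups had 19 and 20 participants.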

#### Study Design

The study used a 2 (group: experimental, control) × 3 (session: pre-training, post-training, 3-month follow-up) mixed factorial design, with group as the between-subjects factor and session as the within-subjects factor. To investigate the effectiveness of the intervention in improving and/or maintaining both visuospatial WM and short- and long-term memory, participants performed a series of tests and experimental tasks designed to assess these types of memory: digit span forward and backward, Corsi blocks, the Jigsaw puzzle task, and immediate and delayed visual episodic memory tasks (Faces I and II, and Family Pictures I and II from the Wechsler Memory Scale, WMS-III). The Corsi blocks and Jigsaw puzzle tasks were programmed using E-Prime 2.0 (Psychology Software Tools Inc., Pittsburgh, PA, USA). Long-term memory was assessed with tests from the WMS-III (Wechsler, 1997), and the Digit Span tasks were taken from the WAIS-III (Wechsler, 1999).
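For a mixed design with N participants in a groups and a within-subjects factor with b levels, the uncorrected ANOVA degrees of freedom follow directly from the design. Group has (a − 1, N − a). The within factor has (b − 1, (b − 1)(N − a)). The interaction has ((a − 1)(b − 1), (b − 1)(N − a)). The sketch below assumes sphericity, so no Greenhouse-Geisser or similar correction is applied. The function name is hypothetical. With the 39 completers it yields the F(1,37) and F(2,74) values reported in the Results, and (3, 111) for a four-level within factor such as Corsi level.

```python
def mixed_anova_df(n_subjects: int, n_groups: int, n_levels: int) -> dict[str, tuple[int, int]]:
    """Uncorrected (effect, error) degrees of freedom for a group x within-factor mixed ANOVA."""
    between_error = n_subjects - n_groups  # subjects nested within groups
    within_error = (n_levels - 1) * between_error  # within factor x subjects-within-groups
    return {
        "group": (n_groups - 1, between_error),
        "within": (n_levels - 1, within_error),
        "group_x_within": ((n_groups - 1) * (n_levels - 1), within_error),
    }


print(mixed_anova_df(39, 2, 3))
# {'group': (1, 37), 'within': (2, 74), 'group_x_within': (2, 74)}
print(mixed_anova_df(39, 2, 4)["within"])  # (3, 111)
```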

#### Training Schedule and Overview of the Training Program

Participants assigned to the experimental group completed fifteen 1-h training sessions at the community senior center in the presence of the experimenter over a period of 7–8 weeks. In each training session, participants played six video games twice each. The games were selected from Lumosity<sup>1</sup>, a web-based cognitive training platform<sup>2</sup>: Speed Match, Memory Matrix, Rotation Matrix, Face Memory, Moneycomb and Lost in Migration. The session score for each participant on each game was calculated as the mean score of the first and second plays of that game, and the session RT as the mean RT of the two plays. The control group did not receive training but met the experimenter periodically (once a month) at the senior center to talk about their activities and other general topics related to aging. The video games used in this study are described below.

Note (Table 1): Means and standard deviations (SD) by group; MMSE, Mini-Mental State Examination.

#### Speed Match

In this game, a symbol is displayed on the computer screen, followed immediately by another. The trainee has to decide whether the two symbols are the same, indicating their choice by pressing one of two keys (same, different) as fast as possible.

#### Memory Matrix

A matrix varying in size is displayed in the center of the screen with a pattern of colored squares followed by a blank matrix. The player has to reproduce the pattern by clicking on the squares that were colored.

#### Rotation Matrix

This game is similar to the previous one, except that the matrix is rotated between the coding phase and the response phase. The player has to mentally rotate the encoded matrix, and click on the correct positions of the colored squares.
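The response rule shared by Memory Matrix and Rotation Matrix can be sketched as follows. A pattern is a set of highlighted cells, and a response is correct when the clicked cells equal the pattern after the rotation. Memory Matrix corresponds to zero turns. The text says only that the matrix "is rotated", so the assumptions here are a square grid, clockwise quarter turns, and all-or-none scoring. The game's actual geometry and scoring are not documented in the source. Function names are hypothetical.

```python
Cell = tuple[int, int]  # (row, col) on a square grid


def rotate_clockwise(cells: set[Cell], size: int, quarter_turns: int = 1) -> set[Cell]:
    """Rotate highlighted cells on a size x size grid by clockwise 90-degree steps."""
    rotated = set(cells)
    for _ in range(quarter_turns % 4):
        # One clockwise quarter turn moves (row, col) to (col, size - 1 - row).
        rotated = {(col, size - 1 - row) for row, col in rotated}
    return rotated


def is_correct(response: set[Cell], pattern: set[Cell], size: int, quarter_turns: int) -> bool:
    """A trial counts as correct only if the clicked cells equal the rotated pattern."""
    return response == rotate_clockwise(pattern, size, quarter_turns)


pattern = {(0, 0), (0, 1), (2, 2)}
print(sorted(rotate_clockwise(pattern, size=3)))  # [(0, 2), (1, 2), (2, 0)]
```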

#### Face Memory

Different faces appear on the screen continuously, one after another, and the player has to decide whether the face on the screen matches the one shown one (1-back), two (2-back), or three (3-back) faces before.
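The n-back rule in Face Memory marks an item as a target when it matches the item shown n positions earlier. The sketch below illustrates that rule only. Faces are represented by hypothetical string labels, and the function name is hypothetical; nothing here reflects how the game itself is implemented.

```python
def nback_targets(stream: list[str], n: int) -> list[bool]:
    """Mark each item that matches the item presented n positions earlier."""
    # The first n items cannot be targets because nothing precedes them by n steps.
    return [i >= n and stream[i] == stream[i - n] for i in range(len(stream))]


faces = ["A", "B", "A", "C", "C", "B"]  # hypothetical face labels
print(nback_targets(faces, 2))  # [False, False, True, False, False, False]
```

With `n=1` the same stream has a single target, the second "C". Raising n therefore changes which responses are correct, even though the stimuli stay the same.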

#### Moneycomb

In this game, a honeycomb appears in the center of the screen and a sequence of tokens of different values is presented briefly inside it. The task consists of clicking on the correct tiles of the honeycomb to reveal the tokens in the correct order (from lowest to highest value).

#### Lost in Migration

In this game, a static flock of birds appears in the center of the screen. The goal is to identify the direction in which the bird in the middle of the flock is flying (right, left, upward, downward) by pressing one of the four arrow keys on the keyboard as fast as possible.

Participants received points based on their performance on each video game. Some of the games also recorded response times. None of the participants in the study reported that they had any previous experience of playing video games.

#### Assessment Tasks and Procedures

Assessment tasks fell into one of the following three domains: visuospatial WM, short-term memory and episodic memory.

#### Visuospatial Working Memory Tasks

Visuospatial WM (Baddeley and Hitch, 1974) was assessed with the Corsi blocks and the Jigsaw-puzzle task.

#### Corsi Blocks Task

The original Corsi Blocks task (Milner, 1971) consisted of a set of nine identical blocks (3 cm × 3 cm × 3 cm) unevenly positioned on a wooden board (23 cm × 28 cm). The participant had to point to the blocks in their order of presentation. The length of the sequence increased until recall was no longer correct in terms of order or position (Berch et al., 1998). In this study, we used the same computerized version of the Corsi task as in our previous study (Ballesteros et al., 2014) with four levels of increasing difficulty (2, 3, 4 and 5 cube positions) and 10 trials per level. The stimuli were black squares on a 3 × 3 matrix that appeared one after the other, for 1 s each. The positions in each sequence were selected randomly, with the restriction that stimuli could not appear in the same position in two consecutive sequences. In each trial, the participant reproduced the previously presented sequence of cubes (the black squares in the 3 × 3 matrix) by writing down their order of presentation on a separate response sheet. To familiarize participants with the task, they performed a practice block of trials. The final score was the proportion of correct sequences obtained at each difficulty level.
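The computerized Corsi procedure described above can be sketched in a few lines: random positions on a 3 × 3 matrix, four sequence lengths with 10 trials each, and scoring as the proportion of correct sequences per level. The assumptions are as follows. Cells are numbered 0 to 8. Positions within a sequence are drawn without replacement, which is one reading of the restriction on repeated positions and may not match the study's exact constraint. A response counts as correct only when it reproduces both order and positions exactly. `make_trials` and `proportion_correct` are hypothetical helper names, not the E-Prime implementation used in the study.

```python
import random

GRID_CELLS = 9  # positions 0-8 of the 3 x 3 matrix


def make_trials(lengths=(2, 3, 4, 5), trials_per_level=10, seed=None):
    """Generate Corsi sequences; cells within a sequence are sampled without replacement."""
    rng = random.Random(seed)
    return {k: [rng.sample(range(GRID_CELLS), k) for _ in range(trials_per_level)] for k in lengths}


def proportion_correct(trials, responses):
    """Per level, the share of responses that reproduce the sequence exactly (order and position)."""
    return {
        k: sum(resp == seq for seq, resp in zip(trials[k], responses[k])) / len(trials[k])
        for k in trials
    }


trials = make_trials(seed=7)
perfect = {k: [list(seq) for seq in trials[k]] for k in trials}
print(proportion_correct(trials, perfect))  # {2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0}
```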

#### Jigsaw-Puzzle Task

The original pencil-and-paper Jigsaw-Puzzle task was developed to assess active visuospatial abilities (Richardson and Vecchi, 2002). We designed a computerized version of this task with puzzles consisting of 4, 6 or 9 pieces. Each piece was numbered, and the participant had to write down on a response sheet the numbers of the pieces in their correct spatial positions. The stimuli were 15 pictures of similar visual complexity (mean = 2.4, SD = 0.32) and familiarity (mean = 4.3, SD = 0.26) selected from the Snodgrass and Vanderwart (1980) picture set. Each picture was fragmented into 4, 6 and 9 pieces to produce 45 different puzzles. The pictures were enlarged to fit a 12 cm × 12 cm area and were cut into four 6 cm × 6 cm pieces, six 6 cm × 4 cm pieces, or nine 4 cm × 4 cm pieces using Adobe Photoshop CC (Adobe Systems Software, Ireland Ltd.). We generated three different counterbalanced orders. Different pictures were used at the pre-test, post-test and follow-up assessments. Participants were presented with 15 puzzles representing all possible combinations of picture and number of pieces. The response sheets contained grids of the same size as the original pictures with the appropriate number of squares (4, 6 or 9). Two puzzles were used as practice items, and their results were not included in the analysis. On each trial, a fragmented picture appeared on the computer screen and the participant wrote down on the response sheet the appropriate numbers to form a spatially correct picture. The puzzle was presented on the computer screen for 90 s, and participants were allowed to correct errors within that time. Performance was assessed as the proportion of correct puzzles per level (4, 6 and 9 pieces).

<sup>1</sup>http://www.lumosity.com

<sup>2</sup>The authors confirm that they did not have any contact with the Lumosity cognitive training platform at any time during the study, which was conducted independently.
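The Jigsaw-Puzzle score, the proportion of correct puzzles at each level of fragmentation, can be computed as in the sketch below. It assumes all-or-none scoring, so a puzzle counts as solved only if every grid cell holds the right piece number. The text's "proportion of correct puzzles" suggests this, but partial credit is not ruled out. Layouts are written as hypothetical lists of piece numbers, cell by cell and row by row. Practice puzzles are assumed to have been removed beforehand.

```python
from collections import defaultdict


def jigsaw_scores(trials):
    """trials: iterable of (n_pieces, correct_layout, response_layout) tuples.

    Returns the proportion of fully correct puzzles for each number of pieces."""
    solved = defaultdict(int)
    total = defaultdict(int)
    for n_pieces, correct, response in trials:
        total[n_pieces] += 1
        solved[n_pieces] += int(response == correct)  # all-or-none criterion (assumed)
    return {n: solved[n] / total[n] for n in sorted(total)}


# Hypothetical layouts for three test puzzles.
example = [
    (4, [2, 4, 1, 3], [2, 4, 1, 3]),
    (4, [3, 1, 4, 2], [3, 1, 2, 4]),
    (6, [5, 2, 6, 1, 4, 3], [5, 2, 6, 1, 4, 3]),
]
print(jigsaw_scores(example))  # {4: 0.5, 6: 1.0}
```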

#### Short-Term Memory

Short-term memory was assessed with the Digit Span Test of the WAIS-III (Wechsler, 1999).

#### Digit Span Test

This test has two parts: Digit span forward and Digit span backward. For each part, the test administrator says a series of numbers aloud at the rate of one per second. The participant then repeats the numbers in the same order (digit span forward) or in reverse order (digit span backward). Both tests begin with a series of two numbers. For digits forward, the test continues up to a maximum of eight numbers. For digits backward, the test continues up to a maximum of seven numbers. Participants are given two trials at each length and the test continues until the participant fails both trials at one length. In both the forward and the backward task, the score was the maximum number of correctly remembered digits.
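The administration and scoring rule described above can be sketched as follows: two trials per length starting at two digits, discontinuation after both trials at one length fail, and a score equal to the longest length recalled correctly. The sketch follows this document's description only. The published WAIS-III manual has its own scoring rules, which this example does not attempt to reproduce. The input format and function name are hypothetical.

```python
def digit_span_score(results_by_length: dict[int, tuple[bool, bool]], max_length: int) -> int:
    """Longest sequence length recalled correctly on at least one of its two trials.

    Testing starts at length 2 and stops when both trials at a length fail or when
    max_length (8 for digits forward, 7 for digits backward) has been administered."""
    span = 0
    for length in range(2, max_length + 1):
        trial_1, trial_2 = results_by_length[length]
        if not (trial_1 or trial_2):
            break  # discontinue rule: both trials at this length failed
        span = length
    return span


# Hypothetical forward-span protocol: both trials fail at six digits.
forward = {2: (True, True), 3: (True, False), 4: (True, True), 5: (False, True), 6: (False, False)}
print(digit_span_score(forward, max_length=8))  # 5
```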

#### Immediate and Delayed Episodic Memory Tests

The **Faces and Family Pictures** subtests of the WMS-III were used to assess visual episodic memory. Immediate recognition and immediate recall were assessed with Faces I and Family Pictures I, respectively. Delayed recognition and recall were assessed 25 min later with Faces II (delayed recognition) and Family Pictures II (delayed recall).

# RESULTS

#### Video Game Practice Effects

Although the main dependent variables of this intervention study were the scores on the memory tests, we also analyzed performance on the video games to evaluate whether the trained participants improved as a consequence of playing them. Video game performance improved significantly (accuracy and response times) across the 15 training sessions (see **Figures 1A,B**). **Figure 1A** shows the positive linear trend of the mean number of correct responses as a function of session. **Figure 1B** presents the mean response times for the video games that recorded them. Changes in the mean scores of each game from the beginning to the end of the training period were examined using regression analysis, with Training Session as the predictor variable and RT and Game Score as the criterion variables. Performance on all games improved after training. The R² coefficients were high, with the model accounting for more than 80% of the variance in each of the six games, and the ANOVAs for these regressions showed that all R² coefficients were statistically significant. Thus, Training Session was a reliable predictor of Score and RT in the six games. **Table 2** summarizes the results.
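The practice-effect analysis regresses each game's session score on Training Session. With a single predictor, R² is the share of variance in the scores explained by the linear trend. The sketch below shows ordinary least squares by hand, together with the session score defined in the Methods (the mean of the two plays). The per-play scores are hypothetical, and the function names are illustrative. The study's analysis software and exact model are not specified beyond what the text states.

```python
def session_score(first_play: float, second_play: float) -> float:
    """Session score for one game: the mean of its two plays in that session."""
    return (first_play + second_play) / 2


def linear_fit(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x; returns (intercept, slope, R^2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


# Hypothetical scores for one game over six sessions (two plays per session).
plays = [(410, 430), (455, 445), (470, 490), (505, 495), (520, 540), (550, 545)]
sessions = list(range(1, len(plays) + 1))
scores = [session_score(a, b) for a, b in plays]
intercept, slope, r2 = linear_fit(sessions, scores)
print(f"slope = {slope:.1f} points per session, R^2 = {r2:.2f}")
```

For RT, the same fit would be expected to yield a negative slope, because faster responses mean lower times.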

#### Effects of Video Game Training on Visuospatial Working Memory Tasks and Other Memory Tests

We investigated whether training with video games improved memory abilities that decline with age, especially visuospatial WM and episodic memory. We addressed two questions. The first was whether training older adults with video games would transfer to performance on a series of memory tasks (transfer effects). The second was whether these possible enhancements would remain after a 3-month no-contact period (maintenance). To answer these questions, we investigated whether group (trained vs. control) interacted with session (pre-test, post-test, follow-up) with regard to performance on the different memory tests. Statistical analyses were conducted using Bonferroni correction for main effects and interactions in all tasks.
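Under the Bonferroni correction, each raw p value is multiplied by the number of comparisons in the family, and the result is capped at 1. The cap is consistent with the many post hoc comparisons reported below as p = 1.00. The sketch assumes a family of three pairwise session comparisons (pre vs. post, post vs. follow-up, pre vs. follow-up), and the raw p values are hypothetical.

```python
def bonferroni(p_values: list[float]) -> list[float]:
    """Bonferroni-adjusted p values: multiply by the number of comparisons, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p values for the three pairwise session comparisons.
print(bonferroni([0.004, 0.40, 0.03]))
```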

#### Jigsaw-Puzzle Task

A Group (2) × Session (3) × Level of fragmentation (3) mixed ANOVA, with Group as the between-subjects factor and Session and Level of fragmentation as within-subjects factors, was performed on the proportion of correct puzzles completed at each level of fragmentation. The main effect of Group was statistically significant (F(1,37) = 12.10, MSe = 3.03, p = 0.001, ηp² = 0.25): the trained group performed better (mean = 0.52, SD = 0.17) than the control group (mean = 0.33, SD = 0.17). The main effect of Session was also significant (F(2,74) = 8.71, MSe = 0.32, p = 0.001, ηp² = 0.19). Participants performed better at post-test (mean = 0.41, SD = 0.12) than at pre-test (mean = 0.36, SD = 0.18), but there was no difference between post-test and 3-month follow-up (p = 0.29). Level of fragmentation was also significant (F(2,74) = 207.56, MSe = 12.70, p = 0.001, ηp² = 0.84), showing that performance deteriorated with higher levels of fragmentation (4 pieces, mean = 0.75, SD = 0.18; 6 pieces, mean = 0.43, SD = 0.24; 9 pieces, mean = 0.09, SD = 0.12). The two-way Session by Group interaction was statistically significant (F(2,74) = 13.30, MSe = 0.49, p = 0.001, ηp² = 0.26): the trained group outperformed the control group both at post-test (mean = 0.61, SD = 0.13 vs. mean = 0.32, SD = 0.13) and at the 3-month follow-up (mean = 0.55, SD = 0.13 vs. mean = 0.32, SD = 0.13; p = 0.001), while the performance of the control group did not differ between sessions. No other interaction was significant (all ps > 0.05); see **Figure 2** (bottom right).

TABLE 2 | Determination coefficients (R²), F and p values for the six video games.

Note. DV, dependent variable; R², coefficient of determination; R²corr, corrected coefficient of determination; F, F values of ANOVAs; p, p values.

#### Corsi Blocks Task

A Group (2) × Session (3) × Corsi level (2, 3, 4 and 5 blocks) mixed ANOVA, with Group as the between-subjects factor and Session and Corsi level as within-subjects factors, was conducted on the proportion of correct sequences per level. The main effect of Group was statistically significant (F(1,37) = 10.04, MSe = 3.55, p = 0.001, ηp² = 0.21): the trained group (mean = 0.62, SD = 0.17) outperformed the control group (mean = 0.45, SD = 0.17). The main effect of Session was also statistically significant (F(2,74) = 5.43, MSe = 0.24, p = 0.001, ηp² = 0.12), with better performance at post-test (mean = 0.58, SD = 0.12) than at pre-test (mean = 0.50, SD = 0.18; p = 0.01). The difference between post-test and 3-month follow-up (mean = 0.53, SD = 0.12) was marginally significant (p = 0.054), while performance at pre-test and 3-month follow-up did not differ (p = 0.57). The main effect of Corsi level was also significant (F(3,111) = 285.84, MSe = 12.84, p = 0.001, ηp² = 0.88), showing that performance deteriorated as the number of blocks increased (Corsi 2, mean = 0.89, SD = 0.06; Corsi 3, mean = 0.67, SD = 0.18; Corsi 4, mean = 0.46, SD = 0.24; Corsi 5, mean = 0.12, SD = 0.12). The two-way Session by Group interaction was also statistically significant (F(2,74) = 5.25, MSe = 0.23, p = 0.001, ηp² = 0.12). In the trained group, performance improved significantly from pre-test to post-test (p = 0.001; mean pre-test = 0.55, SD = 0.21; mean post-test = 0.70, SD = 0.17) and then declined significantly from post-test to the 3-month follow-up (p = 0.01; mean follow-up = 0.62, SD = 0.17). In the control group, pre- and post-test performance did not differ (p = 1.00; mean pre-test = 0.45, SD = 0.22; mean post-test = 0.46, SD = 0.17). The two-way Group by Level interaction was also significant (F(3,111) = 5.67, MSe = 0.24, p = 0.001, ηp² = 0.13), with significant differences between groups at all Corsi levels (ps < 0.05). For levels 2, 3, 4 and 5, respectively, the means were 0.94 (SD = 0.08), 0.78 (SD = 0.17), 0.60 (SD = 0.26) and 0.17 (SD = 0.13) in the trained group, and 0.84 (SD = 0.08), 0.54 (SD = 0.17), 0.32 (SD = 0.26) and 0.07 (SD = 0.13) in the control group. Although the trained group performed better than the control group in this task, performance deteriorated in both groups as the number of blocks increased; see **Figure 2** (bottom left).
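Partial eta squared (ηp²) can be recovered from an F ratio and its degrees of freedom. Because ηp² = SS_effect / (SS_effect + SS_error) and F = (SS_effect / df_effect) / (SS_error / df_error), it follows that ηp² = F·df_effect / (F·df_effect + df_error). The sketch below applies this identity as a consistency check on reported statistics. For the Corsi analyses, F = 10.04 with (1, 37) gives 0.21 and F = 5.67 with (3, 111) gives 0.13, matching the reported values. The function name is hypothetical.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Corsi blocks task, F values and degrees of freedom from the text.
print(round(partial_eta_squared(10.04, 1, 37), 2))  # 0.21 (Group)
print(round(partial_eta_squared(5.67, 3, 111), 2))  # 0.13 (Group x Level)
```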

#### Digit Forward Test

The Group (2) × Session (3) ANOVA conducted on the number of correct digits reported showed that Session was significant (F(2,74) = 3.97, MSe = 1.23, p = 0.02, ηp² = 0.09), with better performance at post-test (mean = 5.38, SD = 0.87) than at pre-test (mean = 5.05, SD = 0.74), but no significant difference between post-test and 3-month follow-up (p = 1.00). The trained group performed better at post-test than at pre-test (p = 0.02), with no difference between post-test and 3-month follow-up (p = 1.00). The main effect of Group was not significant (p = 0.37), as the performance of the trained group (mean = 5.35, SD = 0.69) was similar to that of the control group (mean = 5.15, SD = 0.71). No other effects or interactions were significant (all ps > 0.05); see **Figure 2** (top left).

#### Digit Backward Test

An ANOVA with Group (2) and Session (3) was conducted on the number of correct digits reported. The main effect of Group was significant (F(1,37) = 5.09, MSe = 8.78, p = 0.03, ηp² = 0.12), showing that the trained group performed better (mean = 4.29, SD = 0.73) than the control group (mean = 3.75, SD = 0.67). The main effect of Session was also significant (F(2,74) = 5.24, MSe = 1.96, p = 0.001, ηp² = 0.12), with better performance at pre-test than at post-test (mean pre-test = 4.28, SD = 0.81; mean post-test = 3.88, SD = 0.93). The control group performed worse at post-test than at pre-test. The two-way Group by Session interaction was also significant (F(2,74) = 4.69, MSe = 1.75, p = 0.01, ηp² = 0.11). Simple effects analysis showed no significant differences in the trained group between pre-test, post-test and 3-month follow-up (p = 1.00); these participants performed similarly in the three assessment sessions (mean = 4.31, SD = 0.82; mean = 4.31, SD = 0.80; mean = 4.26, SD = 0.87, for pre-test, post-test and 3-month follow-up, respectively). By contrast, the control group differed significantly between pre- and post-test (p = 0.001), with poorer performance at post-test (mean = 3.45, SD = 1.14) than at pre-test (mean = 4.25, SD = 1.14), but there was no difference between post-test and 3-month follow-up (p = 1.00); see **Figure 2** (top right).

FIGURE 2 | Top: Mean performance of trained and control groups at pre-test, post-test and follow-up assessments in Digits forward (left) and Digits backward (right). Bottom: Mean performance of trained and control groups at pre-test, post-test and follow-up assessments in the working memory (WM) tasks (left: Corsi blocks; right: Jigsaw puzzle tasks). Error bars represent ± SE. *p < 0.05.

#### Episodic Memory Test (Faces)

The results of the episodic memory tests are shown in **Figure 3** and **Table 3**.

#### Faces I

An ANOVA with Group (2) and Session (3) was performed on the recognition scores (Faces I). The main effect of Group was not significant, although there was a trend (p = 0.07). Session was significant (F(2,74) = 4.37, MSe = 50.11, p = 0.01, ηp² = 0.10), with better performance at the 3-month follow-up (mean = 34.32, SD = 5.42) than at pre-test (mean = 32.05, SD = 5.05). There were no significant differences (p = 0.55) between pre-test (mean = 32.05, SD = 5.05) and post-test (mean = 33.08, SD = 5.61), or between post-test (mean = 33.08, SD = 5.61) and 3-month follow-up (mean = 34.32, SD = 5.42). The two-way Group by Session interaction was significant (F(2,74) = 3.28, MSe = 37.67, p = 0.04, ηp² = 0.08). Analysis of this interaction showed significant differences between groups at post-test (p = 0.01), but only a trend at the 3-month follow-up (p = 0.07). The trained group performed better at post-test than at pre-test (p = 0.03), with similar performance at post-test and 3-month follow-up (p = 1.00). By contrast, the control group performed similarly in the three evaluation sessions (p > 0.05).

FIGURE 3 | Mean performance of trained and control groups at pre-test, post-test and follow-up in the episodic memory tasks. Top: Faces I (left) and Faces II (right). Bottom: Family Pictures I (left) and Family Pictures II (right). Error bars represent ± SE. *p < 0.05.

#### Faces II

The ANOVA conducted with Group (2) and Session (3) on the recognition scores (Faces II) showed that the main effect of Group was statistically significant (F(1,37) = 4.20, MSe = 262.15, p = 0.04, ηp² = 0.10): the trained group performed better (mean = 33.22, SD = 4.52) than the control group (mean = 30.23, SD = 4.55). Session was also significant (F(2,74) = 5.40, MSe = 62.57, p = 0.001, ηp² = 0.12). There were no differences between pre-test and post-test scores (p = 0.15), or between post-test and 3-month follow-up scores (p = 0.55), but performance was better (p = 0.001) at the 3-month follow-up (mean = 32.94, SD = 5.61) than at pre-test (mean = 30.41, SD = 4.80). The two-way Group by Session interaction was also statistically significant (F(2,74) = 11.29, MSe = 130.70, p = 0.001, ηp² = 0.23), indicating that the groups differed at post-test (p = 0.001), but not at the 3-month follow-up (p = 0.42). The trained group improved from pre-test to post-test (p = 0.001), but not between post-test and 3-month follow-up (p = 0.43). Moreover, in the trained group, performance at follow-up (mean = 33.68, SD = 5.69) was significantly better (p = 0.02) than at pre-test (mean = 30.57, SD = 4.82). In the control group, there were no significant differences between pre-test and post-test (p = 0.14), but participants in this group performed better at the 3-month follow-up than at post-test (p = 0.001). Differences between pre-test and 3-month follow-up were not significant (p = 0.25).

TABLE 3 | Pre-test, post-test and follow-up performance of the trained and control groups on the working memory and episodic memory tasks. M (Mean), SD (Standard deviation).

#### Episodic Memory Tests (Family Pictures)

#### Family Pictures I

An ANOVA with Group (2) and Session (3) performed on the recall scores (Family Pictures I) showed a significant effect of Group (F(1,37) = 11.18, MSe = 1203.86, p = 0.001, ηp² = 0.23), indicating that the trained group performed better (mean = 29.01, SD = 5.95) than the control group (mean = 22.60, SD = 5.94). Session was not significant (p = 0.06), but the Session × Group interaction was (F(2,74) = 8.47, MSe = 211.37, p = 0.001, ηp² = 0.18). Analysis of this interaction showed significant differences between groups at post-test (p = 0.001) and at the 3-month follow-up (p = 0.03). Performance of the trained group did not differ between sessions (p > 0.05), but the control group performed worse at post-test than at pre-test (p = 0.001), with no difference between post-test and 3-month follow-up (p = 0.88). The control group also performed worse at the 3-month follow-up than at pre-test (p = 0.01).

#### Family Pictures II

The Group (2) × Session (3) ANOVA conducted on the recall scores (Family Pictures II) showed a significant effect of Group (F(1,37) = 8.89, MSe = 977.08, p = 0.001, ηp² = 0.19): the trained group performed better (mean = 26.29, SD = 6.00) than the control group (mean = 20.51, SD = 6.03). The effect of Session was also statistically significant (F(2,74) = 4.09, MSe = 100.71, p = 0.02, ηp² = 0.10). There were no significant differences between post-test and 3-month follow-up (p = 1.00) or between pre-test and post-test (p = 0.27), but scores differed significantly between pre-test and 3-month follow-up (p = 0.01), with worse performance at follow-up (mean = 22.00, SD = 7.08) than at pre-test (mean = 25.16, SD = 4.90). The Group by Session interaction was significant (F(2,74) = 3.73, MSe = 91.87, p = 0.02, ηp² = 0.09), showing that the trained group performed better than the control group at both the post-test and 3-month follow-up assessments. The trained group did not differ between sessions (p > 0.05), whereas the control group performed worse at post-test than at pre-test (p = 0.01). The control group performed similarly at post-test and 3-month follow-up (p = 1.00), but worse (p = 0.001) at the 3-month follow-up (mean = 18.90, SD = 7.82) than at pre-test (mean = 23.90, SD = 5.54).

# DISCUSSION

The study yielded three main findings. First, the trainees improved their video game performance across sessions. Second, and most important, the trainees performed the Jigsaw puzzle, Corsi Blocks, Digit forward, and Faces I and II tasks better than the control group. Third, the trained group's improvement relative to baseline was maintained at the 3-month follow-up on the Jigsaw puzzle task (a visuospatial WM task) and on the Digits forward and Faces I and II tasks, but not on the Corsi Blocks task, the other visuospatial WM task. These results are encouraging considering the age-related declines that occur in these memory functions.

#### Non-Action Video Game Training Transferred to Working Memory

The improvements found in the present study are in line with previous findings reported in other video game training studies conducted with older adults (e.g., Basak et al., 2008; Anguera et al., 2013). Basak et al. (2008) found improvements on working memory tasks after training older adults for 23.5 h with a real-time strategy video game (an action video game). Anguera et al. (2013) trained older adults (60–75 years) for 4 weeks with an adaptive version of NeuroRacer. Participants reduced their multitasking costs at the post-training evaluation compared with an active control group and a no-contact control group. Moreover, the benefits of training extended to an untrained WM task, and the gains persisted for 6 months.

The results of this longitudinal study agree with those of other researchers who trained older adults using computerized training programs. For example, Buschkuehl et al. (2008) conducted an adaptive visual WM training study with oldest-old adults (mean age = 80 years). They found substantial gains in the trained task and improvements in visual WM immediately after training, which disappeared at the 1-year follow-up. Li et al. (2008) also investigated the effects of WM training on performance improvement, transfer, and short-term maintenance of practice gains. In their study, young and older adults practiced a spatial WM task for 45 days, about 15 min per day. In both age groups, these researchers found improvements on the practiced tasks and near transfer to spatial and numerical n-back tasks. Moreover, practice gains and near transfer effects were maintained at the 3-month follow-up, although performance after training was lower in older than in young adults. Dahlin et al. (2008) conducted a computer-based training study with young and older adults that focused on updating information in WM. Both trained groups improved significantly more than the control group on the letter memory criterion task. Interestingly, gains were maintained 18 months later in young adults but not in older adults. More recently, Zinke et al. (2014) trained WM in older adults in nine sessions over 3 weeks. They found near transfer effects in a Corsi blocks task at post-test compared with pre-test.

The results of the present study also agree with the findings of a recent meta-analysis (Karbach and Verhaeghen, 2014). That meta-analysis showed that executive-function and WM training led to significant improvements on the trained tasks, as well as large near transfer effects, in older adults. However, the present findings do not agree with the results of Maillot et al. (2012). They found that after a 24-h training program the trainees improved more than controls on measures of executive control and processing speed, but not on visuospatial measures. The present results also conflict with those of Ballesteros et al. (2014). That study found no improvement in visuospatial WM tasks (Corsi Blocks and Jigsaw puzzle tasks) or executive functions after twenty 1-h training sessions with 10 non-action video games selected from the Lumosity platform. That intervention was nonetheless effective for improving RT, attention, and episodic memory.

The trained group improved digit span forward performance after training, while the control group performed similarly across the three assessment sessions. However, on the digit span backward test (a more difficult task), the trained group did not improve after training, while the performance of the control group declined. It is important to note that the age-related decline in digit span backward performance is greater than that in digit span forward, in agreement with the meta-analysis of Babcock and Salthouse (1990). This supports the idea that, with advancing age, digit span forward performance tends to remain stable while digit span backward performance tends to decline (Lezak, 1995). This might explain the performance of our control group on the forward and backward digit span tests.

# Video Game Training Enhanced Some Episodic Memory Tests

The present study also found improvements in some episodic memory tests after training. This is similar to our previous study (Ballesteros et al., 2014), in which we found effects of training on episodic memory (Family Pictures I and II). In the present study, however, the improvements after training appeared in Faces (I and II), and they were maintained over a 3-month period without contact. We found improvements in Faces I and II (two recognition tests) after training but not in Family Pictures (two recall tests). This result might be explained by the fact that free recall requires greater resources than recognition, an effect that increases with age (Craik and McDowd, 1987). In Faces, the recognition test, the information is present while the memory task is performed. In Family Pictures, a recall test, very few cues are provided, and participants have to initiate a series of mental operations that require more effort. In the trained group, video game training may have helped participants maintain their recall of features, although their performance did not improve after training. The results obtained in Faces suggest that the task was easy for both groups: the trained group improved after training, and the controls were able to maintain their performance over time.

Our results are in line with those of Buschkuehl et al. (2008) who conducted a WM training study with 80-year-old adults who trained twice a week for 3 months. Participants showed improvements in the trained tasks (visual WM tasks) and to a lesser degree, in a visual episodic memory task (visual free recall) in which they had to look for differences between two almost identical pictures.

# Results at 3-Month Follow-Up

The usefulness of the intervention depends on both the occurrence of transfer effects and the durability of the training effects. Accordingly, it was important to ascertain whether the post-training benefits on some WM tasks and short- and long-term memory tests were maintained after a 3-month no-contact follow-up period. In this study, transfer effects were maintained in the Digit forward test, the Jigsaw puzzle task, and the Faces I and Faces II tests. However, the significant improvements in the Corsi blocks task were not maintained.

Our results are in line with those of Anguera et al. (2013) who trained older adults for 4 weeks with an adaptive version of Neuroracer and found benefits after training. Specifically, they reported reduced multitasking costs in the trained group compared to the control group. The benefits found after training extended to an untrained WM task, and gains persisted for 6 months. The 3-month maintenance found in the present study is in line with the results of Li et al. (2008) who found specific improvements in young and older adults in a WM task and maintenance of near transfer effects at 3-month follow-up.

# The Video Games Training Debate

The conflicting results obtained in cognitive and brain-training studies with computerized cognitive exercises and video games have been explored in several recent meta-analyses (Karbach and Verhaeghen, 2014; Lampit et al., 2014; Toril et al., 2014). Specifically, we found that short training interventions conducted with older adults produced better results than long regimes (Toril et al., 2014). One possible explanation is that training sessions are exciting at first, but older adults may become tired and bored during the last sessions. Karbach and Kray (2009) found significant transfer effects in older adults after just four training sessions, as did Kramer et al. (1995), who provided a limited number of training sessions. Lampit et al. (2014) concluded that unsupervised at-home training regimes were less effective than group-based sessions, and that training more than three times a week was also ineffective. Another important variable was the number of video games used during the training sessions. Although the effect was not significant, Toril et al. (2014) found a trend indicating that it is better to use a small set of video games than a large set.

It is important to stress that, on the basis of previous results (Ballesteros et al., 2014, 2015b; Lampit et al., 2014), we designed the present study as a group-based training program with the presence of the experimenter throughout. The presence/absence of the experimenter might affect the participants' interest in training (Borella et al., 2010). It is important to note that in our previous training study (Ballesteros et al., 2014) the experimenter was always present during each training session, but each session involved only 2–3 participants and not the whole group. The lack of improvement after training in visuospatial WM in our previous study (Ballesteros et al., 2014) and the positive results obtained in the present study might be due to the larger number of video games used in the earlier study (10) compared to just six specially selected to train visuospatial WM in the present study. The training regime of the present study was focused on enhancing WM, and this might explain the positive training effects obtained here.

Another important question regarding cognitive training concerns transfer. The evidence of transfer from video game training to untrained tasks is mixed, with both positive and negative results. Some researchers (Melby-Lervåg and Hulme, 2013) argue that WM training has positive effects on tasks close to the trained tasks (near transfer). However, in a more recent study the same authors did not find evidence that WM training was effective (Melby-Lervåg and Hulme, 2016). Owen et al. (2010) examined whether a set of training tasks would improve cognitive performance and concluded that there was no transfer to untrained tasks. In our study, we found positive near transfer effects, as well as smaller transfer effects on other untrained episodic memory tasks. Moreover, it is important to stress that there was no overlap between the training and the transfer tasks.

# CONCLUSIONS, LIMITATIONS OF THE PRESENT STUDY AND FUTURE DIRECTIONS

To summarize, the results of the present study suggest that training older adults with non-action video games can be an effective way to improve performance on tasks designed to assess visuospatial WM, as well as on episodic memory tests. Importantly, the effects were maintained over a 3-month no-contact follow-up period in the Jigsaw puzzle task, Digit forward (short-term memory), and Faces I and Faces II (episodic memory). Transfer effects were not maintained on Corsi blocks. These findings suggest that older brains retain plasticity, but that periodic booster sessions are needed to maintain the benefits.

The present study has several limitations. First, our sample was smaller than in other studies (e.g., Mozolic et al., 2011; Anguera et al., 2013). However, it is important to stress that we did not have any drop-outs. This is unusual in longitudinal training studies, which typically lose between 30 and 40% of participants at follow-up. It suggests that training is more effective when carried out in places that older adults attend regularly, with the whole group attending each session in the presence of the experimenter, than when done individually at home or in small groups. Second, we did not examine the effects of training older adults with video games on everyday life tasks. This is an important issue for future studies. Third, the control group in the present study was passive. However, most studies have also compared the trained group with a passive control group (e.g., Goldstein et al., 1997; Basak et al., 2008; Maillot et al., 2012; Ballesteros et al., 2014). Only a few training studies have included both an active and a passive control group (e.g., Torres, 2008; Stern et al., 2011; Anguera et al., 2013; Boot et al., 2013a). Moreover, in their meta-analysis, Toril et al. (2014) calculated the effect sizes of the published studies that included both an active and a passive control group (5 out of the 20 studies in the meta-analysis). The mean effect size (Cohen's d) was 0.36 for the active control group and 0.37 for the passive control group, and the difference was not statistically significant. Nevertheless, future studies should include both active and passive control groups.
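The comparison above is expressed in Cohen's d. The conventional between-groups form divides the difference in group means by the pooled standard deviation. The sketch below shows only that textbook formula. The values are hypothetical and are not taken from the present study or from Toril et al. (2014), whose meta-analysis may have used a different variant (for example, one based on pre-post change scores).

```python
import math


def cohens_d(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Between-groups Cohen's d using the pooled standard deviation."""
    if n_1 + n_2 <= 2:
        raise ValueError("need more than two observations in total")
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / math.sqrt(pooled_var)


# Hypothetical trained vs. control values, chosen only to illustrate the formula.
d = cohens_d(mean_1=12.0, sd_1=4.0, n_1=20, mean_2=10.0, sd_2=4.0, n_2=20)
print(round(d, 2))  # 0.5
```

A d of about 0.36 or 0.37, as reported in the meta-analysis, would correspond to a group difference of roughly a third of a pooled standard deviation.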

It would be interesting for future studies to include a questionnaire assessing expectations (anticipated cognitive gains from game play) and the effects of social contact with the experimenter and with the other older adults. Furthermore, video game designers need to work with researchers in aging to create attractive and useful games specifically designed for older adults. Video games have to be interesting to motivate older adults to play them.

Commercial brain-training programs currently generate millions of dollars in revenue for the brain-training industry. Video game training is a very active area of research, but important intervention-based factors still require further research. Large-scale longitudinal studies with long follow-up assessments of trained and control groups are necessary before researchers can answer many important questions about the effectiveness of video games in improving cognition (Boot et al., 2013b; Boot and Kramer, 2014; Green et al., 2014; Anguera and Gazzaley, 2015).

In conclusion, future research should investigate ways of designing video game training regimes that produce and maintain training benefits in older adults. Further research should take into account multi-domain interventions that can be carried out in social settings and that combine computerized cognitive training (e.g., video game training) with physical exercise (Ballesteros et al., 2015b). In sum, future studies would benefit from using well-supported neuroscience findings to design multi-domain, longitudinal intervention studies that investigate the possible benefits for older adults, and then validate the benefits of the intervention.

## AUTHOR CONTRIBUTIONS

PT, JM, SB and JMR: conceived and designed the experiments, contributed reagents/materials/analysis tools, wrote the article, and reviewed the manuscript. PT: performed the experiments and analyzed the data. JMR: reviewed the data. PT, SB, JM: reviewed the literature.

#### ACKNOWLEDGMENTS

This work was supported by a predoctoral fellowship to PT awarded to the Studies on Aging and Neurodegenerative Diseases Consolidated Research Group (Universidad Nacional de Educación a Distancia) and grants from the Spanish Government (PSI2013-41409R) and the Madrid Community (S2010/BMD-2349). We are very grateful to the Senior Center (Pozuelo de Alarcón, Madrid) and to the volunteers who participated in this study.

# REFERENCES

functions that decline with aging: a randomized controlled trial. Front. Aging Neurosci. 6:277. doi: 10.3389/fnagi.2014.00277




**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Toril, Reales, Mayas and Ballesteros. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Music Games: Potential Application and Considerations for Rhythmic Training

#### Valentin Bégel<sup>1,2</sup>\*, Ines Di Loreto<sup>3</sup>, Antoine Seilles<sup>2</sup> and Simone Dalla Bella<sup>1,4,5,6</sup>\*

<sup>1</sup>Euromov Laboratory, University of Montpellier, Montpellier, France, <sup>2</sup>NaturalPad, Montpellier, France, <sup>3</sup>Homme, Environnement et Technologies de l'Information, Université de Technologie de Troyes, Troyes, France, <sup>4</sup>Institut Universitaire de France, Paris, France, <sup>5</sup>International Laboratory for Brain, Music and Sound Research (BRAMS), Montreal, QC, Canada, <sup>6</sup>Department of Cognitive Psychology, Wyższa Szkoła Finansów i Zarządzania w Warszawie (WSFiZ), Warsaw, Poland

Rhythmic skills are natural and widespread in the general population: most people can track the beat of music and move along with it. These abilities are meaningful from a cognitive standpoint given their tight links with prominent motor and cognitive functions such as language and memory. When rhythmic skills are challenged by brain damage or neurodevelopmental disorders, remediation strategies based on rhythm can be considered. For example, rhythmic training can be used to improve motor performance (e.g., gait) as well as cognitive and language skills. Here, we review the games readily available on the market and assess whether they are well suited for rhythmic training. Games that train rhythm skills may serve as useful tools for retraining motor and cognitive functions in patients with motor or neurodevelopmental disorders (e.g., Parkinson's disease, dyslexia, or ADHD). Our criteria were the peripheral device used to capture and record the response, the type of response, and the output measure. None of the existing games provides sufficient temporal precision in stimulus presentation and/or data acquisition. In addition, the games do not selectively train rhythmic skills. Hence, the available music games, in their present form, are not satisfactory for training rhythmic skills. Yet some features, such as the device used, the interface, or the game scenario, provide good indications for devising efficient training protocols. We provide guidelines for devising serious music games that target rhythmic training in the future.

#### Edited by:

Louis Bherer, Université de Montréal, Canada

#### Reviewed by:

Katja Kornysheva, University College London, United Kingdom

Michael Thaut, Colorado State University, United States

#### \*Correspondence:

Valentin Bégel, valentin.begel@umontpellier.fr

Simone Dalla Bella, simone.dalla-bella@umontpellier.fr

> Received: 15 October 2016 Accepted: 08 May 2017 Published: 29 May 2017

#### Citation:

Bégel V, Di Loreto I, Seilles A and Dalla Bella S (2017) Music Games: Potential Application and Considerations for Rhythmic Training. Front. Hum. Neurosci. 11:273. doi: 10.3389/fnhum.2017.00273

Keywords: rhythm, serious game, rehabilitation, movement, training

# INTRODUCTION

#### Musical Rhythm as a Training Tool

Humans display a natural tendency to move, spontaneously or deliberately, to the beat of rhythmic auditory stimuli such as music (Repp, 2005; Repp and Su, 2013). This activity is widespread and typically participatory. It manifests, for example, in dance, in synchronized sport, and in group activities (e.g., waving together at a rock concert). Synchronization to a musical beat is supported by a complex neuronal network. This network includes perceptual regions (e.g., the superior temporal gyrus; Thaut, 2003; Chen et al., 2008a; Schwartze and Kotz, 2013), motor regions (e.g., the basal ganglia and the cerebellum; Grahn and Brett, 2007; Chen et al., 2008b; Grahn and Rowe, 2009), and sensorimotor integration areas (e.g., premotor cortex; Chen et al., 2006; Zatorre et al., 2007; Kornysheva and Schubotz, 2011). Disruption of these neuronal networks due to brain damage or neurodevelopmental disorders affects auditory-motor synchronization to a musical beat (Corriveau and Goswami, 2009; Bégel et al., 2017), as well as other functions such as speech. For example, the difficulties that individuals who stutter encounter in speech production extend to non-verbal sensorimotor skills (Watkins et al., 2008; Falk et al., 2015). In speech production tasks, individuals who stutter display reduced activity in brain regions that are also responsible for beat tracking and synchronization to a musical beat, such as the basal ganglia (Civier et al., 2013) and the cerebellum (Brown et al., 2005).

Notably, tracking the beat does not necessarily require an explicit motor response. Some perceptual tasks extract the beat from the auditory signal in order to make a judgment. Examples are detecting a deviation from the beat in an isochronous sequence (anisochrony detection; Ehrlé and Samson, 2005; Dalla Bella et al., 2017b) and detecting whether a metronome is aligned with the beat (Beat Alignment Task; Iversen and Patel, 2008). Interestingly, beat extraction in the absence of an explicit motor response recruits motor regions of the brain, such as the basal ganglia (Grahn and Brett, 2007; Grahn and Rowe, 2009) or premotor cortex (Chen et al., 2008a,b). It is worth noting that processing sequences with an underlying beat (beat-based timing) engages partly separate mechanisms from those used to process single durations (duration-based timing; Coull et al., 2011). The former relies on basal ganglia-cortical mechanisms, while the latter is more associated with cerebellar-cortical pathways (Grube et al., 2010; Teki et al., 2011). For the purposes of this short review, we focus in particular on the training of rhythmic skills engaging beat-based mechanisms.
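In an anisochrony detection trial, the listener hears a short tone sequence and judges whether it was regular. The sketch below generates the onset times of such a sequence, with one tone displaced from the beat grid. It illustrates the general idea only. The number of tones, the inter-onset interval (IOI), the deviant position and the shift are hypothetical values, not the parameters used by Ehrlé and Samson (2005) or Dalla Bella et al. (2017b).

```python
def anisochronous_sequence(n_tones=5, ioi_ms=600.0, deviant_index=3, shift_ms=-60.0):
    """Onset times (ms) of an isochronous tone sequence with one displaced tone.

    Every onset falls on the beat grid k * ioi_ms except the tone at
    deviant_index, which is shifted by shift_ms (negative = early).
    Setting shift_ms=0 yields a fully isochronous control sequence.
    All default values are hypothetical.
    """
    if not 0 <= deviant_index < n_tones:
        raise ValueError("deviant_index must refer to a tone in the sequence")
    onsets = [k * ioi_ms for k in range(n_tones)]
    onsets[deviant_index] += shift_ms
    return onsets


print(anisochronous_sequence())  # [0.0, 600.0, 1200.0, 1740.0, 2400.0]
```

Shrinking `shift_ms` across trials makes the deviation harder to detect. Varying it in this way is one common route to estimating a detection threshold.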

Merely listening to an auditory rhythm, for example one conveyed by music, activates movement-related areas of the brain. Training with rhythmic stimuli may therefore help (re)activate the motor system in both the damaged and the healthy brain. Several examples illustrate the beneficial effects of rhythm on motor behavior. Rhythmic auditory stimulation can be used to:

- retrain gait in Parkinson's disease (e.g., increasing speed and stride length; Thaut et al., 1996; Thaut and Abiru, 2010; Benoit et al., 2014; Dalla Bella et al., 2015, 2017a);
- improve arm function in stroke (Thaut et al., 1997, 2002, 2007);
- enhance physical performance in sport (e.g., reducing oxygen consumption in cycling; Hoffmann et al., 2012; Bardy et al., 2015).

The positive effects of rhythmic training are not confined to motor behavior but can extend to perception (Benoit et al., 2014; Dalla Bella et al., 2015). Stimulation using auditory rhythms also shows promise for training speech perception in children with Developmental Language Disorders (e.g., for syntax processing; Przybylski et al., 2013; Schön and Tillmann, 2015). In sum, previous studies point to beneficial effects of rhythmic training protocols on movement and cognition. In addition, rhythmic skills are linked to other cognitive abilities such as working memory and reading skills (Tierney and Kraus, 2013; Woodruff Carr et al., 2014). Rhythmic training may therefore foster improvements in more general cognitive abilities, which play a critical role in language learning and literacy (Schwartze and Kotz, 2013; Gordon et al., 2015; Kotz and Gunter, 2015). Ultimately, these beneficial effects are likely to have positive consequences for health and well-being. They may promote an active lifestyle, reduce motor and cognitive decline in patient populations, and reduce the need for healthcare services.

# Serious Games

A great deal of work over the last two decades has been devoted to devising and promoting games for training patients and for rehabilitation. Low-cost, widespread new technologies have encouraged this research by offering unprecedented opportunities to implement training protocols. An increasing number of technologies are designed to improve health and well-being. They range from smartphone applications for monitoring diet (Withings Wi-Fi Scales) to movement-based rehabilitation tools using motion capture (Zhou and Hu, 2008; Weiss et al., 2009; Chang et al., 2011). Among them, video games entertain people while pursuing serious goals in neurological diseases (for a review and classification of serious games in health, see Rego et al., 2010). Examples include rehabilitating impaired movement skills (e.g., Hammer and Planks, Di Loreto et al., 2013; Nintendo Wii games, Saposnik et al., 2010) and cognitive retraining (e.g., RehaCom, Fernández et al., 2012). In particular, movement-based rehabilitation games that exploit motion capture devices such as the Wii or the Kinect are a promising way to use technology in re-education (for a review, see Webster and Celik, 2014). This approach is referred to as "exergaming". Note that video games designed for entertainment may also be used in a serious manner. For example, physicians often use off-the-shelf video games, such as Nintendo Wii or Kinect games, for therapeutic purposes (Lange et al., 2009; Barry et al., 2014; Karahan et al., 2015). Exergaming has been shown to be effective in stroke (Webster and Celik, 2014), in Parkinson's disease (Harris et al., 2015), and in healthy elderly adults (Sun and Lee, 2013).

During the last 5 years, studies have focused on the cognitive and neuronal underpinnings of the benefits of health-targeted serious games (Connolly et al., 2012). Serious games bring physical and physiological benefits (e.g., via dedicated physiotherapeutic exercises), and the effects of this type of training also extend to cognition. Serious games can enhance cognitive functions such as language and memory, an effect likely to be accompanied by plastic changes in the brain. For example, brain changes associated with learning have been observed following video game training (Anguera et al., 2013). These promising results indicate that training protocols delivered via serious games may be particularly valuable for enhancing brain functions, as well as for therapy and rehabilitation.

In summary, serious games and rhythmic stimulation are promising tools for improving or retraining movement and cognition. We propose that rhythm-skill training implemented in a serious game could serve as the basis for rehabilitation protocols for different patient populations. This review provides an overview of existing rhythm games and assesses whether they are well suited for training purposes. In our survey, we evaluated the benefits and limitations of each game using criteria such as the precision of the recorded response and the modality of the stimuli provided.

# LIMITS AND ADVANTAGES OF THE EXISTING RHYTHM GAMES

To the best of our knowledge, only one music-based training program in a game setting has been successfully devised for arm rehabilitation in stroke patients (Friedman et al., 2011, 2014). However, this protocol does not train rhythmic skills per se. Rather, it is a music-based adaptation of a standard rehabilitation protocol (i.e., conventional tabletop exercise therapy; Dickstein et al., 1986). To examine whether existing games involving rhythm conveyed by auditory stimuli could be used as training tools, we selected games using the following inclusion criteria.

First, the game has to focus on rhythmic skills. The player must be instructed to synchronize movement (or voice) with stimuli (auditory or visual) that can be predicted from their temporal structure (i.e., the underlying beat). To our knowledge, no rhythm game currently on the market uses purely perceptual tasks, in which the player judges the rhythmic features of music. All the games presented below involve movement synchronized to auditory or visual cues. Second, the game device must record the temporal precision of the player's responses. The scores, levels, difficulty, and feedback given to the player must depend on her/his temporal precision in performing the movement.

Once selected, the games were categorized by (a) the peripheral device used to capture and record the response; (b) the type of response that is recorded; and (c) the output. The peripheral device matters for judging whether the game is readily usable for training (e.g., in a clinical context). Regarding the type of response, most studies in the cognitive psychology of rhythm use finger tapping, a simple and objective way to study rhythmic skills (Repp, 2005; Repp and Su, 2013). Other responses are possible, however (e.g., full-body motion). Finally, the output indicates whether the game provides feedback (an outcome measure or score) on the precision of the performance that reflects the player's rhythmic skills. These categories help evaluate the therapeutic potential of each game. For example, a game requiring finger tapping is likely to affect behavior differently from a game requiring full-body motion, such as dance.

Twenty-seven games that fulfilled the aforementioned criteria were considered for the analysis (see **Table 1**). They run on a variety of devices (Wii, PlayStation, PC, tablet/smartphone, Xbox, Game Boy). These games were classified into four categories, as described below.

# Games that Involve Full Body Movements Recorded via an External Interface (e.g., Kinect, Wii)

Here we refer mostly to dance games (e.g., Just Dance). These games have interesting applications in physiotherapy for patients with spinal cord injury, traumatic brain injury, and stroke (Lange et al., 2009). They focus more on physical exercise and activity than on rhythm per se. Indeed, they are rather poor at recording and scoring the player's rhythmic precision. Because they focus on discrete movements/actions instead of repeated (i.e., rhythmic) movements, they cannot deliver specific training of rhythmic skills. For example, in Just Dance the player reproduces movements illustrated by images displayed on the screen. The score depends on how precisely the movements match a model action sequence, and the movements must be executed within a given temporal window. Yet the task is not purely rhythmic, and synchronization to the musical beat is not recorded. Although these games do not measure rhythmic skills per se, they provide a motivating setting for dancing while the player's movements are monitored. Adding a rhythmic component to some of these games, such as dance games, may be a valuable way to turn them into a training program.

# Games that Involve Rhythmic Finger Tapping on a Tablet

An example of these games is Beat Sneak Bandit. Here, the player has to tap precisely to the beat to make the character progress, avoid enemies, and so forth. Serious games dedicated to learning use the same kind of feature; for example, Rhythm Cat is designed to teach musical rhythm notation. For training rhythmic skills, one major drawback of these games is the very poor timing precision of the software. The time window in which a response counts as good is very wide (i.e., up to several hundred milliseconds), and the temporal variability of the recording is high. In addition, the games provide no feedback on the player's rhythmic performance.

# Computer or Console Games that Involve Finger Tapping on Keys

These games can be played on a keyboard, with a joystick, or on special devices. One of the most famous is Guitar Hero. In this game, the player holds a guitar replica with five keys and has to press the keys in correspondence with images presented on a screen. The game records the rhythmic precision of the responses and uses it to compute a performance score: a response must fall within a specific temporal window to count as good. Many PC games use the same concept, with keyboard keys (e.g., arrows) instead of a guitar replica. As with tablet games, the main weakness of these games is their low temporal precision in recording rhythmic performance (around 100 ms in Guitar Hero). Nevertheless, they are a good starting point for developing serious-game applications aimed at training rhythmic skills.
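Window-based scoring discards information about rhythmic precision. The sketch below contrasts a hit count with a continuous asynchrony measure. It assumes a hypothetical tolerance of ±100 ms around each beat, loosely inspired by the figure quoted for Guitar Hero. It does not reproduce how any commercial game actually scores responses. Taps are paired with beats by index, which is a simplifying assumption.

```python
def score_taps(tap_times_ms, beat_times_ms, half_window_ms=100.0):
    """Compare window-based hit counting with mean absolute asynchrony.

    Taps and beats are paired by index (a simplifying assumption).
    A tap counts as a "hit" if its asynchrony (tap minus beat) lies
    within +/- half_window_ms; half_window_ms is a hypothetical value.
    """
    if len(tap_times_ms) != len(beat_times_ms) or not tap_times_ms:
        raise ValueError("need equally many taps and beats, at least one")
    asynchronies = [t - b for t, b in zip(tap_times_ms, beat_times_ms)]
    hits = sum(abs(a) <= half_window_ms for a in asynchronies)
    mean_abs_async = sum(abs(a) for a in asynchronies) / len(asynchronies)
    return hits, mean_abs_async


beats = [0, 500, 1000, 1500]
precise = [5, 502, 1003, 1496]   # asynchronies: +5, +2, +3, -4 ms
sloppy = [90, 420, 1080, 1410]   # asynchronies: +90, -80, +80, -90 ms

print(score_taps(precise, beats))  # (4, 3.5)
print(score_taps(sloppy, beats))   # (4, 85.0)
```

Both players receive the same window-based score of four hits, even though their mean absolute asynchronies differ by more than an order of magnitude. A wide tolerance window hides exactly the timing differences that rhythmic training would need to target.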

# Console Games Involving Singing

In these games, the player is asked to sing in synchrony with the music. This is not a rhythmic task per se, but the performance involves a rhythmic component. As in classical karaoke, lyrics are presented on the screen. Feedback (a score) is provided while the player sings, and a final global score is given at the end of the performance. This score takes into account both temporal precision (the response must fall within a given temporal window to count as good) and pitch precision. Here, the potential health benefit rests on the fact that singing is a good way to restore speech abilities (e.g., fluency) in aphasia following stroke (for example, see Norton et al., 2009).

#### TABLE 1 | List of the reviewed rhythm-based games.

The last row concerns online PC games (available at www.musicgames.co/games-by-category/rhythm-games/) having similar characteristics.

Even though some of the aforementioned games offer good ground for training rhythmic skills, their main drawback is the rather poor temporal precision with which they record movement relative to the beat. Thus, the output measures provided by these games are insufficient to isolate rhythmic features of the performance (e.g., variability of the motor performance, precision of synchronization with the beat). Moreover, none of these games manipulates the rhythmic complexity of the musical stimuli. Difficulty is manipulated only through the number of responses required during the game (e.g., the number of visual tags to which the player has to react), which is not a rhythmic feature. Using music with various degrees of beat saliency, for example, would make it possible to introduce rhythm-based difficulty levels in the game. This has the advantage that rhythms of increasing complexity could be presented progressively throughout the game, potentially leading to improved beat-tracking skills.
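One way to obtain such measures, common in sensorimotor synchronization research, is circular statistics: each tap is converted into a phase within the beat cycle, the length of the mean resultant vector indexes the consistency of synchronization (1 = perfectly consistent phase, values near 0 = no consistent phase relation), and its angle gives the mean asynchrony. The sketch below assumes an isochronous beat with a known first onset and inter-onset interval; the function name `sync_metrics` and the tap times are hypothetical.

```python
import math
from typing import Sequence, Tuple


def sync_metrics(tap_times_ms: Sequence[float], first_beat_ms: float,
                 ioi_ms: float) -> Tuple[float, float]:
    """Return (vector_length, mean_asynchrony_ms) for taps to an isochronous beat."""
    # Phase of each tap within the beat cycle, in radians [0, 2*pi).
    angles = [2 * math.pi * ((t - first_beat_ms) % ioi_ms) / ioi_ms
              for t in tap_times_ms]
    mean_cos = sum(math.cos(a) for a in angles) / len(angles)
    mean_sin = sum(math.sin(a) for a in angles) / len(angles)
    # Length of the mean resultant vector: consistency of synchronization (0..1).
    vector_length = math.hypot(mean_cos, mean_sin)
    # Mean angle mapped back to milliseconds; atan2 returns (-pi, pi], so
    # negative values mean the taps tend to precede the beat.
    mean_asynchrony_ms = math.atan2(mean_sin, mean_cos) / (2 * math.pi) * ioi_ms
    return vector_length, mean_asynchrony_ms


r, m = sync_metrics([480.0, 980.0, 1480.0, 1980.0], first_beat_ms=0.0, ioi_ms=500.0)
print(round(r, 3), round(m, 1))  # 1.0 -20.0
```

Taps that consistently anticipate the beat by 20 ms yield maximal consistency and a mean asynchrony of -20 ms, two separate rhythmic features that a hit/miss score collapses into one number.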

# CONCLUSION

We reviewed 27 rhythm-based games already on the market that could be used in a rhythmic training protocol. Unfortunately, based on our criteria, none of them is satisfactory for this purpose. First, in most of the games, the task consists of reacting to visual stimuli while music is presented; thus, rhythmic skills are not selectively trained. Second, the difficulty of the game is changed by varying the number of stimuli rather than the rhythmic characteristics of the music. Third, although the regularity of rhythmic patterns can influence performance in the game, the player's response is not targeted at the rhythmic aspects of the stimuli. For example, the player touches the screen at the right moment to catch objects, or makes full-body movements to imitate model actions in dance games. Note also that the reviewed games do not offer opportunities for controlled functional movement training: none of them, for example, provides guidance toward desired movement trajectories. This problem may be overcome in the future by providing relevant feedback when the player approaches optimal movement trajectories (e.g., via sonification; Effenberg et al., 2016).

The tasks implemented in these games are vaguely reminiscent of implicit timing tasks (Lee, 1976; Zelaznik et al., 2002; Coull and Nobre, 2008). Explicit and implicit timing have been treated as relatively independent processes in cognitive neuroscience (Zelaznik et al., 2002; Coull and Nobre, 2008; Coull et al., 2011). Explicit timing is associated with tasks requiring voluntary motor production (e.g., synchronized tapping tasks; Repp, 2005; Repp and Su, 2013) or overt estimation of stimulus duration (e.g., duration discrimination; Grondin, 1993). In contrast, implicit timing is tested with tasks unrelated to timing (e.g., avoiding a vehicle when crossing the road) in which temporal prediction nonetheless affects performance (judging the time before the vehicle reaches us; Lee, 1976; for more details, see Nobre et al., 2007; Coull and Nobre, 2008; Coull, 2009). In particular, temporal prediction fostered by a regular temporal pattern of sensory stimuli (e.g., a metronome) improves performance in non-temporal tasks (e.g., working memory, Cutanda et al., 2015; pitch judgment, Jones et al., 2002; language judgments, Przybylski et al., 2013).

Although the available music games are not explicitly targeted at rhythmic training, they may still train timing implicitly, in combination with other, more explicit processes (e.g., focusing on spatial and pitch accuracy). There is evidence that the implicit dimension of timing may be more robust than explicit timing, for example in beat deafness (Bégel et al., 2017). It is possible that participants with timing disorders (e.g., Parkinson's disease or developmental stuttering; Grahn and Brett, 2009; Falk et al., 2015) may still be able to capitalize on partly spared implicit timing functions to re-learn rhythmic skills via a training program. Note, however, that the rhythm-based training protocols shown so far to be beneficial typically used explicit timing tasks (e.g., walking with an auditory rhythm; Lim et al., 2005; Spaulding et al., 2013; Benoit et al., 2014). This suggests that tasks recruiting explicit timing mechanisms may be particularly good candidates for building a successful rhythmic training protocol. Only one of the reviewed games (Beat Sneak Bandit) had the goal of tapping to the beat of music, which is an explicit timing task. In sum, almost none of the reviewed games requires participants to perform explicitly rhythmic tasks, yet they are likely to engage implicit timing mechanisms. Whether training rhythm implicitly in the context of a music game can lead to positive effects comparable to those found with explicit rhythmic tasks deserves further inquiry.


In summary, although the games currently on the market are not optimal for rhythmic training, they at least provide interesting ideas that might pave the way for devising successful training programs. Games on portable devices (e.g., tablets or smartphones) that use tapping to the beat provide the simplest solution for implementing a rhythm training protocol. They are low-cost while offering a motivating and user-friendly environment for training rhythmic skills with a playful interface. Although this solution has some potential, it has two problems: the precision of movement recording relative to the beat and the ensuing measures of rhythmic precision both need significant improvement. To deal with these issues, methods used to analyze synchronization to the beat in the neuroscience of rhythm (Kirschner and Tomasello, 2009; Pecenka and Keller, 2009; Woodruff Carr et al., 2014; Dalla Bella et al., 2017b) should be applied to games designed for rhythm training. In addition, estimates of timekeeper and motor implementation variance (e.g., based on tapping performance) might make it possible to refine the feedback on performance (e.g., Schulze and Vorberg, 2002; for a review, see Wing, 2002). This will ensure that precise feedback on rhythmic performance can be provided and that the stimuli and game progression can be tailored to individual learning curves. Moreover, to ensure that the training program specifically targets rhythmic skills, stimuli (or responses) will have to be varied in terms of rhythmic difficulty. This can be achieved, for example, by selecting musical excerpts based on their rhythmic complexity. Using stimuli with increasing difficulty in beat tracking (e.g., with a less salient beat) throughout the game might progressively fine-tune the player's rhythmic skills. These guidelines should be taken into account in the future to devise efficient protocols for training rhythmic skills via serious music games.
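The timekeeper and motor implementation variances mentioned above are classically estimated with the two-level model of Wing and Kristofferson (reviewed in Wing, 2002). In this model each inter-tap interval is I_j = C_j + M_{j+1} - M_j, where C is an internal timekeeper interval and M a motor delay; if these are independent, Var(I) = Var(C) + 2 Var(M) and the lag-1 autocovariance of the intervals equals -Var(M). The sketch below applies these two identities to a series of intervals. It assumes self-paced (continuation) tapping and uses divide-by-n estimators; the synchronization case (e.g., Schulze and Vorberg, 2002) adds error-correction terms that this sketch does not model. The function name and the simulated data are hypothetical.

```python
from statistics import fmean
from typing import Sequence, Tuple


def wing_kristofferson(intervals_ms: Sequence[float]) -> Tuple[float, float]:
    """Estimate (timekeeper_variance, motor_variance) from inter-tap intervals.

    Uses Var(I) = clock_var + 2 * motor_var and lag-1 autocovariance = -motor_var.
    Negative estimates may signal that the data violate the model's assumptions.
    """
    n = len(intervals_ms)
    mean = fmean(intervals_ms)
    dev = [x - mean for x in intervals_ms]
    variance = sum(d * d for d in dev) / n
    lag1_autocov = sum(dev[i] * dev[i + 1] for i in range(n - 1)) / n
    motor_var = -lag1_autocov
    clock_var = variance - 2 * motor_var
    return clock_var, motor_var


import random

# Simulated tapping: timekeeper SD 20 ms (variance 400), motor SD 10 ms (variance 100).
rng = random.Random(1)
n = 20000
clock = [rng.gauss(500.0, 20.0) for _ in range(n)]
motor = [rng.gauss(0.0, 10.0) for _ in range(n + 1)]
intervals = [clock[j] + motor[j + 1] - motor[j] for j in range(n)]
print(wing_kristofferson(intervals))  # close to the simulated values (400, 100)
```

Separating the two sources of variability would let a game tell a player whether irregular tapping stems from the internal timekeeper or from motor execution, which a single variability score cannot do.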

## AUTHOR CONTRIBUTIONS

VB and SDB conceived the study; VB conducted the survey; VB, IDL and AS contributed to data analysis; all authors contributed to the writing of the manuscript.

# FUNDING

Junior Grant from the Institut Universitaire de France to SDB.

# ACKNOWLEDGMENTS

We wish to thank two reviewers for their insightful comments on a first draft of the manuscript.


with Parkinson's disease: a systematic review. Clin. Rehabil. 19, 695–713. doi: 10.1191/0269215505cr906oa


arm training with stroke patients. Neuropsychologia 40, 1073–1081. doi: 10.1016/s0028-3932(01)00141-5


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Bégel, Di Loreto, Seilles and Dalla Bella. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Computer-Based Cognitive Training for Executive Functions after Stroke: A Systematic Review

Renate M. van de Ven<sup>1</sup>\*, Jaap M. J. Murre<sup>1</sup>, Dick J. Veltman<sup>2</sup> and Ben A. Schmand<sup>1, 3</sup>

<sup>1</sup> Department of Psychology, Brain and Cognition, University of Amsterdam, Amsterdam, Netherlands, <sup>2</sup> Department of Psychiatry, VU University Medical Center, Amsterdam, Netherlands, <sup>3</sup> Department of Medical Psychology, Academic Medical Centre, University of Amsterdam, Netherlands

Background: Stroke commonly results in cognitive impairments in working memory, attention, and executive function, which may be restored with appropriate training programs. Our aim was to systematically review the evidence for computer-based cognitive training of executive dysfunctions.

Methods: Studies were included if they concerned adults who had suffered stroke or other types of acquired brain injury, if the intervention was computer training of executive functions, and if the outcome was related to executive functioning. We searched in MEDLINE, PsycINFO, Web of Science, and The Cochrane Library. Study quality was evaluated based on the CONSORT Statement. Treatment effect was evaluated based on differences compared to pre-treatment and/or to a control group.

#### Edited by:

Louis Bherer, PERFORM Centre at Concordia University, Canada

#### Reviewed by:

Claude Alain, Rotman Research Institute, Canada

Sean Commins, National University of Ireland Maynooth, Ireland

> \*Correspondence: Renate M. van de Ven r.m.vandeven@uva.nl

Received: 07 December 2015 Accepted: 27 March 2016 Published: 20 April 2016

#### Citation:

van de Ven RM, Murre JMJ, Veltman DJ and Schmand BA (2016) Computer-Based Cognitive Training for Executive Functions after Stroke: A Systematic Review. Front. Hum. Neurosci. 10:150. doi: 10.3389/fnhum.2016.00150

Results: Twenty studies were included. Two were randomized controlled trials that used an active control group. The other studies included multiple baselines, a passive control group, or were uncontrolled. Improvements were observed in tasks similar to the training (near transfer) and in tasks dissimilar to the training (far transfer). However, these effects were not larger in trained than in active control groups. Two studies evaluated neural effects and found changes in both functional and structural connectivity. Most studies suffered from methodological limitations (e.g., lack of an active control group and no adjustment for multiple testing) hampering differentiation of training effects from spontaneous recovery, retest effects, and placebo effects.

Conclusions: The positive findings of most studies, including neural changes, warrant continuation of research in this field, but only if its methodological limitations are addressed.

Keywords: working memory, attention, restitution, retraining, acquired brain injury, brain training, executive functions, computer-based

# INTRODUCTION

Stroke, caused by brain hemorrhage or infarction, commonly results in cognitive impairments such as aphasia, neglect, reduced processing speed, impaired attention, and executive dysfunction (e.g., Cumming et al., 2013). Even though cognition can improve during the first year after stroke (Desmond et al., 1996; Tham et al., 2002; del Ser et al., 2005), cognitive impairment frequently persists long after. More than 60% of stroke survivors still reported mild to severe cognitive impairment up to 10 years after stroke (Maaijwee et al., 2014; Middleton et al., 2014). Furthermore, cognitive impairments worsen in 11% of stroke survivors during the first year after stroke (Tham et al., 2002). Therefore, rehabilitation efforts to ameliorate these cognitive impairments are essential.

Guidelines for neurorehabilitation mainly focus on compensatory strategy training (Cicerone et al., 2011). These strategies do not aim to restore brain functions (i.e., restitution) but to compensate for the lost function by using remaining intact functions. This approach ignores the residual plasticity of the brain throughout adulthood, which may enable restitution of the impaired function (e.g., Kelly et al., 2006; Takeuchi and Izumi, 2015).

Robertson and Murre (1999) postulated that different types of intervention, notably restitution or compensation, are needed depending on the amount of remaining connectivity. Mildly damaged brain networks might reconnect through everyday activities, so that no special intervention is necessary. Severely affected brain networks may not be able to reconnect at all; in severe cases, therefore, compensatory interventions that make use of preserved networks are required. For moderately affected networks, restitution-based interventions may be needed to stimulate the relevant parts of the impaired network.

Restitution-focused treatments commonly consist of massed, frequent repetition or stimulation of the affected function (Hamzei et al., 2006). They have proven effective in the domains of language, motor function, and vision (e.g., Kurland et al., 2010; Thrane et al., 2014). For other cognitive domains, such as attention and executive function, restitution training may consist of, for example, training reaction speed. Compensation interventions, by contrast, may consist of, for example, time management training that teaches the patient to take more time for task execution. One type of restitution-based intervention uses computer tasks aimed at training damaged networks.

To date, it is not yet clear whether restitution-based computer training can improve attention, working memory, and executive functions. In healthy adults, training effects have been contradictory (e.g., Owen et al., 2010; Anguera et al., 2013; Corbett et al., 2015), but a recent meta-analysis concluded that cognition can be improved (Toril et al., 2014). A systematic review of 10 studies in stroke patients concluded that restitution- and compensation-based interventions improved executive functions (Poulin et al., 2012). Even though the review by Poulin et al. did not focus exclusively on restitution-based computerized training programs, it does provide grounds for further evaluating these programs.

This systematic review provides an overview of the evidence concerning the effects of computer-based restitution rehabilitation after stroke and other acquired brain damage to restore executive functioning. The term executive function includes a spectrum of cognitive functions, all revolving around control of one's behavior. This includes mental set shifting (i.e., changing from one set of task rules to another), information updating, and inhibition of prepotent but inappropriate responses (Miyake et al., 2000). For this review we considered working memory and divided (or selective) attention as part of the executive domain. Training programs that only focused on vigilance, tonic alertness, and sustained attention without any divided or selective attention tasks were not included.

# METHODS

#### Search Strategies

We performed this systematic review according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA, Moher et al., 2009) statement. We searched in MEDLINE, PsycINFO, Web of Science, and The Cochrane Library. The search terms entered were a combination of three search areas that defined (1) the population as adults who had suffered a stroke or acquired brain injury, (2) the intervention as executive function computer training, and (3) the outcome as executive functioning. The complete search strategy can be found in Supplementary Material 1.

## Inclusion and Exclusion Criteria

We considered articles in English, limited to humans, and published before 12 May 2015. Included participants were adults who had suffered stroke or other acquired brain injury. Computer training had to be the main intervention and had to aim at improving working memory, attention related to executive functioning, or executive functioning.

Studies of strategy education or virtual reality training were excluded. Study protocols and dissertations were not considered. The selection of studies was first based on screening of title and abstract, followed by reading of the full text of the remaining studies (see flowchart in **Figure 1**). When in doubt, selection was discussed until consensus was reached.

# Rating of Methodological Quality

The quality of the included studies was evaluated based on recommendations for reporting trials of the Consolidated Standards of Reporting Trials (CONSORT) statement (see **Table 1**). For each study, we also extracted the authors; year of publication; population; control group; training and its focus, duration, and setting; outcome measures and their significance level; the presence of adjustments for multiple testing; whether performance on training was related to outcome measures; use of ecologically valid measures; and potential conflicts of interest (see **Tables 2**–**4**). Treatment effect was evaluated based on statistically significant differences compared to pre-treatment and/or to a control group. Whenever adjustment for multiple testing was not performed and p-values were provided, we adjusted the reported p-values with the Bonferroni-Holm correction. Similarly, for studies that did adjust but provided sufficient information to calculate the unadjusted p-values, **Tables 2**–**4** show which tasks would be significant without the adjustment. Due to the heterogeneity of the outcome measures, it was not possible to perform a meta-analysis.
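The Bonferroni-Holm (step-down) adjustment can be sketched in a few lines: sort the m p-values in ascending order, multiply the k-th smallest (k = 1 to m) by m - k + 1, cap the result at 1, and carry forward the running maximum so that adjusted p-values never decrease with rank. The function name and the four p-values below are made up for illustration and do not come from any reviewed study.

```python
from typing import List, Sequence


def holm_adjust(p_values: Sequence[float]) -> List[float]:
    """Return Holm-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):  # rank 0 is the smallest p-value
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep adjusted p monotone
        adjusted[idx] = running_max
    return adjusted


# Made-up p-values for four outcome measures of one hypothetical study.
raw = [0.01, 0.04, 0.03, 0.005]
adj = holm_adjust(raw)
print([round(p, 3) for p in adj])  # [0.03, 0.06, 0.06, 0.02]
print([p < 0.05 for p in adj])     # [True, False, False, True]
```

In this example two of the four nominally significant results no longer reach p < 0.05 after adjustment, which is the kind of change summarized in **Tables 2**–**4**. Holm's procedure controls the family-wise error rate like plain Bonferroni but is uniformly less conservative.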

# RESULTS

We reviewed 1469 titles and abstracts; 63 studies were reviewed based on full text. Twenty studies satisfied the inclusion and exclusion criteria for this systematic review (see **Figure 1**). An overview of the extracted data is listed in **Tables 2**–**4**.

The included studies consisted of nine randomized controlled trials (RCT), six single-case studies, four uncontrolled trials of which two used multiple baselines (i.e., multiple measurement time-points before training onset), and one retrospective study. Two studies used an active control group (i.e., the control group received an alternative, but supposedly ineffective, training), and seven studies used a passive control group (i.e., the control group did not receive anything in addition to care as usual). The median sample size was 32 (range: 1–75). Two studies had a single-subject design (i.e., n = 1). Two studies used the same sample (Akerlund et al., 2013; Björkdahl et al., 2013). Five studies included post-acute patients, six included chronic patients, and nine included a combination of both.

Scores on the selected CONSORT statement criteria ranged from 7 to 11.5 out of a maximum of 16 (see **Table 1**). The setting of training (e.g., given at home or in the rehabilitation center, with or without supervision) was described in 11 studies. In all but two studies (Chen et al., 1997; De Luca et al., 2014), reports of training duration included the scheduled number of sessions per week. The median planned number of hours of training was 15.6 (range: 4.5–60). Only three studies reported the actual number of training hours completed by the participants (Gray et al., 1992; Westerberg et al., 2007; Lundqvist et al., 2010).

Blinding of assessors was done in three studies, but the participants were never blinded. Description of outcome measures commonly included the name of the task, but not which specific task parameter was used (e.g., raw scores or scaled scores, response times or number of errors). One study did not use statistical methods to evaluate its results (van Vleet et al., 2015). Potential harms of the training were evaluated in two studies. One study reported no harms (Westerberg et al., 2007); the other reported mental fatigue, headache, and eye irritation (Fernandez et al., 2012).

Four studies adjusted for multiple statistical testing (Chen et al., 1997; Sturm et al., 1997, 2003; Spikman et al., 2010), and one corrected part of the statistical tests (Lundqvist et al., 2010). None of the studies correlated improvements on outcome measures with progression of performance during the training. Four studies examined performance on the training tasks themselves, which improved in all of them (Westerberg et al., 2007; Lundqvist et al., 2010; Zickefoose et al., 2013; van Vleet et al., 2015). Two studies reported conflicts of interest (Ruff et al., 1994; Westerberg et al., 2007), six studies reported no conflicts of interest, and 12 studies did not report on this. The extracted studies evaluated working memory training, attention training, or both. We will now discuss the evidence for these training programs in more detail.

SDMT, Symbol Digit Modalities Test; 2-letter cancel., Two-letter cancellation task; subj. Att. Beh., Rating Scale of Attentional Behaviors; TAP, Test of Attentional Performance.

#### TABLE 3B | An overview of attention training studies with single baseline measurement.

Train. improv., training improvements; Ss, subjects; IG, intervention group; CG, control group; CAU, care as usual; N, sample size; y, years; m, months; w, weeks; d/w, days per week; h, hours; min, minutes; TBI, traumatic brain injury; dysfunc., dysfunction; att., attention; unkn., unknown; rehab., rehabilitation center; n.a., not applicable; stat., statistical; correct., correction; EF, executive functioning. Abbreviations of neuropsychological tests: MMSE, Mini-Mental State Examination; MoCA, Montreal Cognitive Assessment; FAB, Frontal Assessment Battery; HADS, Hospital Anxiety and Depression Scale; anx., anxiety; depr., depression; IADL, Instrumental Activities of Daily Living Scale; SS-QOL-2, Stroke Specific Quality of Life Scale; PGIS, Patients' Global Impressions Scale; CGIS, Clinical Global Impressions Scale; ACT, auditory consonant trigrams; TMT, Trail Making Test; verb., verbal; LNS, letter number sequencing; PCL, Post-Traumatic Stress Disorder Checklist; acc., accuracy; RT, reaction time; TAP, Test of Attentional Performance; TEA, Test of Everyday Attention.

Abbreviations: Assoc., associated; Subj., subjective; att., attention; mem., memory; depr., depression. Abbreviations of neuropsychological tests: Pc, personal computer; SAT, Selective Attention Test; Cont. perf., continuous performance; RAVLT, Rey Auditory Verbal Learning Test; WMS, Wechsler Memory Scale; reprod., reproduction.

#### TABLE 4B | An overview of combined working memory and attention training studies with single baseline measurement.

DEX, Dysexecutive; Tx Satisf. Scale, Treatment Satisfaction Questionnaire; Ex. Obser. Scale, Executive Observation Scale; Tx Goal Attain., Treatment Goal Attainment; WMS, Wechsler Memory Scale; PASAT, Paced Auditory Serial Attention Test; proc., processing; pers., perseverative; QOLIBRI, Quality of Life after Brain Injury; satisfac, satisfaction; Ex. Secret. Task, Executive Secretarial Task; cancel., cancelation; GHQ-28, General Health Questionnaire; BADS, Behavioural Assessment of the Dysexecutive Syndrome; ToL, Tower of London; corr., correlation; Role Resum. List, Role Resumption List.

April 2016 | Volume 10 | Article 150

# Working Memory Training

Working memory is the storage of information for a short period of time such that it can be manipulated (Baddeley, 1992). It is important for many other cognitive functions such as planning, problem solving, and learning. It is crucial for everyday functioning, which is one of the reasons that it is the focus of many training studies (Westerberg et al., 2007; Lundqvist et al., 2010; Akerlund et al., 2013; Björkdahl et al., 2013). The most common computerized working memory training currently used is Cogmed QM (from Cogmed Systems AB, Stockholm, Sweden; now published by Pearson Assessment and Information B.V.).

#### Cogmed Training

The Cogmed training consists of five 30–40 min sessions per week for 5 weeks, providing a total of about 15 h of training. It includes both auditory (verbal) and visual (visuospatial) working memory tasks, all of which require a motor response. Task difficulty is adapted to the trainee's performance, and positive feedback is given immediately. The computer-based program can be done either at the rehabilitation center (Lundqvist et al., 2010; Akerlund et al., 2013; Björkdahl et al., 2013) or at home (Westerberg et al., 2007). A coach monitors the trainee's progress and contacts the trainee once per week to provide individual feedback. A detailed description of each task used in the training can be found elsewhere (Westerberg et al., 2007).
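The "about 15 h" total follows directly from this schedule. As a worked check, using the 30 and 40 min bounds of the session length and their 35 min midpoint:

$$
\begin{aligned}
5\ \text{sessions/week} \times 5\ \text{weeks} &= 25\ \text{sessions}\\
25 \times 30\ \text{min} &= 750\ \text{min} = 12.5\ \text{h}\\
25 \times 40\ \text{min} &= 1000\ \text{min} \approx 16.7\ \text{h}\\
25 \times 35\ \text{min} &= 875\ \text{min} \approx 14.6\ \text{h} \approx 15\ \text{h}
\end{aligned}
$$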

#### Objective Improvements of Working Memory

The training resulted in improvements on most of the objective working memory tasks used (Westerberg et al., 2007; Lundqvist et al., 2010; Akerlund et al., 2013), and the effects remained stable for 3 months (Akerlund et al., 2013) or 5 months after training completion (Lundqvist et al., 2010; see **Table 2** for an overview). Most tasks used to evaluate the training were fairly similar to the training tasks and included verbal and visuospatial tasks, but some were dissimilar to the training. This distinction is important, because improvements only on tasks that are similar to the training (i.e., near transfer effects) are less likely to contribute to improvements in daily living than improvements that also generalize to tasks that are dissimilar to the training (i.e., far transfer effects). Far transfer was observed for complex working memory tasks that involved more than just remembering the stimuli (Lundqvist et al., 2010). These improvements in the intervention group (n = 21) were not observed in the passive control group (n = 11), but the two groups were not directly compared. The improved performance on one of these complex working memory tasks remained significant 5 months after training completion.

#### Objective Improvements in Untrained Cognitive Tasks

Objective improvements were not limited to working memory tasks. General cognitive performance, as measured by an elaborate screening, improved significantly after training, also in comparison to the control group (Akerlund et al., 2013).

Improvements in other cognitive domains were mixed. Attention, which is closely related to working memory, also benefited from working memory training (Westerberg et al., 2007). Conversely, performance on an untrained reasoning task did not improve significantly more than in the control group (Westerberg et al., 2007). The effect of working memory training on inhibition is inconclusive. Improvement on the Stroop color-word interference task was not significantly greater than in the control group (Westerberg et al., 2007). In another study, however, scores on the inhibition and switching condition of the slightly different Color Word Interference Test (CWIT) improved significantly after training and remained stable 20 weeks after training completion (Lundqvist et al., 2010). This task seems to involve more working memory than the Stroop task, as it requires not only inhibiting a preferred response but also switching between two task sets (i.e., naming the ink color vs. reading the word). This may explain why improved working memory benefited CWIT performance; the improvement may therefore not reflect improved inhibition per se.

#### Subjective Improvements

Working memory training also seems to improve subjective functioning in daily life. Improvements were seen in subjective ratings of working memory and in the effects of fatigue on daily living (Björkdahl et al., 2013), subjective cognitive functioning (Westerberg et al., 2007), and (satisfaction with) occupational performance (Lundqvist et al., 2010). It did not specifically improve subjective executive functioning (Akerlund et al., 2013). Effects of the training on health related quality of life were inconsistent as a significant improvement was only found for one of two questionnaires (Lundqvist et al., 2010).

However, all these studies used a control group that received either no training (Westerberg et al., 2007; Lundqvist et al., 2010) or care as usual (Akerlund et al., 2013; Björkdahl et al., 2013). Factors such as social contact or placebo effects may have accounted for the reported results. Nevertheless, Westerberg and colleagues reported that the effect of the training on the subjective measure of cognitive functioning was mostly in items related to attention and not in more general items. This suggests that it was a real training effect. Future studies should include an active control group that receives a mock training to control for placebo effects.

The question is, however, whether a placebo effect should be dismissed as irrelevant. The subjective experience of participants is important, as it may improve their mood and self-confidence. Indeed, mood seemed to improve after working memory training (Akerlund et al., 2013). Furthermore, as Lundqvist et al. (2010) suggested, following a structured training program may prepare individuals for returning to work, as both require adhering to appointments and schedules.

#### Limitations of Working Memory Training Studies

Apart from the lack of appropriate control groups, another limitation of most of these studies is that they did not adjust for multiple statistical testing (Westerberg et al., 2007; Akerlund et al., 2013; Björkdahl et al., 2013), or only for part of the statistical tests (Lundqvist et al., 2010). An overview of which tasks would survive adjustment for multiple testing can be found in **Table 2**.

If multiple testing and comparisons with appropriate control groups were taken into account, some effects would disappear. From the objective working memory measures, only digit span backwards appeared to be significantly improved immediately after training (Westerberg et al., 2007; Akerlund et al., 2013) and at 3 months follow-up (Akerlund et al., 2013). The objective improvements of attention would remain significant and thus seem promising (Westerberg et al., 2007). Of the subjective measures, only subjective cognitive improvement tended to remain significant (Westerberg et al., 2007). In the study by Björkdahl et al. (2013) none of the between-group comparisons of subjective measures remained significant after adjusting for multiple testing, suggesting that these effects were not robust.

In two out of three studies there was no effect of the training on the visuospatial working memory task after adjustment for multiple testing (Westerberg et al., 2007; Akerlund et al., 2013). The visuospatial tasks used in the training may not have been sufficiently challenging to elicit transfer effects.

Lundqvist et al. (2010) and Westerberg et al. (2007) reported improved performance on the training tasks. If the improvements in cognition are due to the training, one would expect a substantial correlation between gains on the training tasks and gains on the outcome measures. However, none of the studies related the improvements on the outcome measures to the improvement observed during training.

#### Conclusion of Working Memory Training Studies

In sum, there is preliminary evidence that Cogmed can improve performance on tasks that are similar to the training (near transfer) and on tasks that are dissimilar to the training (far transfer), both for objective working memory and for attention. It also seems to improve subjective cognitive functioning. The effect of the training, however, has been shown for verbal but not for visuospatial working memory. Moreover, all studies described so far suffered from methodological limitations, to which we will return in the discussion section.

# Attention Training

#### AixTent Training

Training programs aimed at improving attention are more diverse than those aimed at working memory (see **Table 3A** for an overview of attention studies with double baseline and **Table 3B** for studies with single baseline). One commonly used training is AixTent, which consists of separate training modules that can be combined. The modules focus on phasic alertness, vigilance, selective attention, or divided attention. Responses are given with two response keys, which can also be operated with one hand. All tasks are designed to be game-like, and task difficulty is automatically adapted to the participant's performance. Feedback is given during and at the end of each training session.

The phasic alertness training task requires participants to control the speed of a vehicle to avoid hitting obstacles. The vigilance training tasks include identifying damaged objects on a production line and detecting changes in airplane movements on a flight radar. The selective attention training tasks require participants to respond quickly when predefined objects appear on the screen and to ignore others. The divided attention training task requires participants to monitor three parameters (both visual and auditory) and to press a key whenever any of these parameters falls outside a certain range (Sturm et al., 1997).

#### Specific vs. Non-Specific Attention Training

AixTent was used in two studies that examined whether attention training should be aimed specifically at the impaired domain or whether general attention training could also improve a specific attention domain. Participants received training for one of at least two impaired attention domains. Thus, the targeted impaired domain received specific training, whereas the other impaired domain received non-specific training. After adjusting for multiple testing, the training improved only (Sturm et al., 2003) or mostly (Sturm et al., 1997) the target domain. This does not imply that the training resulted only in near transfer, as the tasks used for the training differed from the outcome measures. Moreover, the vigilance training improved selective attention, and the basic alertness training improved the more complex selective and divided attention (Sturm et al., 1997). Thus, some far transfer effects to other domains seemed to be present. The authors concluded that attention training should be specific. This may be particularly true when cognitive functions are organized hierarchically, in which case more basic functions should be trained first, followed by more complex cognitive functions.

#### Basic Attention Training

These results (Sturm et al., 1997, 2003) also suggest that improvements in basic cognitive functions may generalize to more complex cognitive functions, but not the other way around. This implication indeed seemed to hold (at least partially) in a single case study and in a small matched control study of basic alertness training (Hauke et al., 2011; van Vleet et al., 2015). In the single case study, the training effect was largest for alertness, that is, for the attention domain being trained (Hauke et al., 2011). During the multiple baseline assessments there was no improvement in alertness, suggesting that the effect was specific to the training period. Training this basic attention domain improved not only alertness, but also focused attention, vigilance, and divided attention (both visual and auditory). These improvements remained stable 6 months after training completion. The participant also reported that her attention had subjectively improved to a normal level. She reported lower levels of fatigue, although these had not reached a normal level.

All improvements were already observed within six or eight training sessions, after which performance remained stable, suggesting that a few sessions suffice to train attention. Alternatively, placebo effects may have been present, as only three training sessions already had a significant effect on alertness. Moreover, the significant improvements in the attention domains that were not trained were already observed during the baseline period. Thus, it is impossible to isolate the effect of the basic attention training on these more complex attention domains.

Basic attention training also resulted in improvements in non-trained executive functioning in a small matched control study (van Vleet et al., 2015). Three patients with mild TBI and executive functioning complaints received 4.5 h of alertness training. Clinically significant improvements (z-score difference > 1) were found at the individual level. All three patients improved clinically on two or three of the five executive functioning tasks and on an attention task. In contrast, one of the two control participants improved on only one of the five executive functioning tasks. These two small studies either did not provide p-values (Hauke et al., 2011) or did not perform statistical testing (van Vleet et al., 2015). Thus, the effects could not be evaluated after adjustment for multiple testing.
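
The individual-level criterion of a z-score difference greater than 1 can be illustrated with a short sketch. It assumes that raw scores are converted to z-scores using a normative mean and standard deviation, and that the sign is flipped for measures on which lower scores are better (e.g., reaction times); the exact procedure of van Vleet et al. (2015) is not described here, and the function names, normative values, and scores are hypothetical.

```python
def z_score(score, norm_mean, norm_sd, higher_is_better=True):
    """Express a raw score as a z-score; flip the sign when lower scores are better."""
    z = (score - norm_mean) / norm_sd
    return z if higher_is_better else -z


def clinically_improved(pre, post, norm_mean, norm_sd,
                        higher_is_better=True, criterion=1.0):
    """True when the pre-to-post change exceeds `criterion` standard deviations."""
    z_pre = z_score(pre, norm_mean, norm_sd, higher_is_better)
    z_post = z_score(post, norm_mean, norm_sd, higher_is_better)
    return (z_post - z_pre) > criterion


# Hypothetical accuracy score (higher is better): normative mean 50, SD 10.
print(clinically_improved(pre=35, post=47, norm_mean=50, norm_sd=10))
# Hypothetical reaction time in ms (lower is better): normative mean 500, SD 100.
print(clinically_improved(pre=700, post=550, norm_mean=500, norm_sd=100,
                          higher_is_better=False))
```

Such an individual criterion says nothing about whether the change exceeds what retest effects alone would produce, which is why a comparison with control participants remains necessary.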

#### Hierarchical Attention Training

The above findings suggest that training basic attention may result in improvement of more complex attention and executive functioning. The effect of a hierarchical approach to attention training was examined in four patients who suffered an acquired brain injury (Gauggel and Niemann, 1996). During the first week of the study alertness was trained, followed by vigilance training and selective attention training, and in the last week divided attention was trained.

Participants were studied 3–16 months post onset, and two of them already showed improvements during the baseline phase. It was, therefore, impossible to conclude whether these two participants' improvement on an attention task after training was due to the hierarchical training. The effect of training did not generalize to ratings of life satisfaction and depressive feelings, or to non-trained cognitive domains.

The inconclusive results of this small study are not in line with the previous studies. Since this study presented the training in a hierarchical manner, one would expect clear improvement in attention and maybe even in other cognitive domains. The training duration of 12.5 h may have been insufficient as multiple training tasks were used. No outcome measures related to executive functioning were included. Thus, it is impossible to determine whether a hierarchical approach results in improvements of executive functioning.

#### Training of Multiple Attention Domains

Several other studies also used tasks from multiple attention domains, but combined them within each training session in a non-hierarchical way; these studies showed mixed results. Tasks used to train attention can be either basic or made more interesting by adding graphics and integrating them into a game-like environment (as in AixTent). Zickefoose et al. (2013) compared both types of attention training within one study. Their sample consisted of four participants who had suffered a severe traumatic brain injury (TBI) at least 3 years earlier. Within an A-B-A-C-A design, participants first completed 20 half-hour sessions of either the basic Attention Process Training-3 or several game-like attention tasks from the Lumosity website. Next, they completed 20 sessions of the other training.

Participants improved on the training tasks; they especially enjoyed Lumosity and were motivated to continue the training. Improvements were observed in only a subset of the non-trained tasks. One of the attention tasks appeared to suffer from a ceiling effect. One participant improved significantly after both training programs, whereas the other three participants showed both improvements and declines in performance. Nevertheless, when there was an improvement, it occurred not only in basic attention but also in the more complex divided attention. Overall, the patterns of improvement suggest that generalization effects in this study, if present at all, were not very convincing. The authors suggested that the effects could be larger for less severely affected patients or for those receiving training early after injury. In addition, similar to Gauggel and Niemann (1996), the training occurred two times per week for 4 weeks, giving a total of 20 h, which may have been too short for generalization to occur.

In an RCT, Prokopenko et al. (2013) trained post-acute stroke patients with mild cognitive impairment or mild dementia. They based their training on several tasks used in neuropsychological assessments and kept the graphics of the training simple. Two weeks of training focused on improving attention and visual and spatial abilities apparently resulted in near transfer effects. After the training, participants in the intervention group (n = 24) scored significantly higher than the care-as-usual control group (n = 19) on tasks that closely resembled tasks used in the training.

Far transfer effects, however, were observed in only one of seven tasks (a screening of executive functioning). Instrumental activities of daily living, mood, and quality of life did not improve (Prokopenko et al., 2013). None of the significant near and far transfer effects would survive adjustment for multiple testing. The measures that did not improve were very general and may have been insensitive to training effects. Furthermore, although relatively long compared with other attention training programs, this training was still short: it involved only 15 h, spread over 2 weeks and over training tasks from multiple attention domains. The training tasks were also not very attractive, which may have influenced participants' motivation. Nevertheless, only the intervention group reported subjective improvement of symptoms after the 2-week period, based on a rating of training satisfaction.

One study did not find any training effects. Ten patients who were within 9 months of a severe head injury followed a speed of processing training that consisted of simple reaction time tasks, some of which involved response inhibition (Ponsford and Kinsella, 1988). At the group level, the training did not add to the effect of spontaneous recovery. In half of the participants, a training effect only appeared when the therapist gave feedback about performance on the training tasks. This suggests that giving participants insight into their performance, and thereby potentially increasing their motivation for the training, is important.

The training duration was 7.5 h in total, which is about half as long as the attention training programs discussed so far. In addition, since multiple tasks were used in the training, the training may not have been long enough to result in improvements. Another study that did show some effect of training with multiple tasks involved at least 15 h of training (Prokopenko et al., 2013). Moreover, the participants in Ponsford and Kinsella's study had suffered very severe head injuries, so their brain damage may have been too severe for restitution training to be effective.

A strong point of the study by Ponsford and Kinsella is that they used an appropriate method to control for effects of spontaneous recovery. They not only used a multiple baseline design, but also investigated whether the increase in performance was larger during the training period than during the baseline period. The lack of a training effect after correcting for spontaneous recovery underscores the necessity of adequate control groups or multiple baseline measurements.
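
The logic of this comparison can be sketched by estimating the rate of change per session within the baseline phase and within the training phase and taking their difference: only an improvement beyond the baseline trend would be attributable to the training. This is a simplified illustration assuming a linear trend fitted by ordinary least squares to hypothetical session scores; it is not the analysis actually reported by Ponsford and Kinsella (1988), and the function names are hypothetical.

```python
def ols_slope(sessions, scores):
    """Least-squares slope of scores over session number (change per session)."""
    n = len(sessions)
    mean_x, mean_y = sum(sessions) / n, sum(scores) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(sessions, scores))
    den = sum((x - mean_x) ** 2 for x in sessions)
    return num / den


def excess_gain_per_session(baseline, training):
    """Training-phase slope minus baseline-phase slope.

    Each phase is a (sessions, scores) pair. A value near zero means the
    training period shows no faster improvement than the untrained baseline,
    i.e., the gains are compatible with spontaneous recovery or retest effects.
    """
    return ols_slope(*training) - ols_slope(*baseline)


# Hypothetical scores for one participant, one assessment per session.
baseline = ([1, 2, 3, 4], [10, 11, 12, 13])
training = ([5, 6, 7, 8], [14, 15, 16, 17])
print(excess_gain_per_session(baseline, training))
```

In this hypothetical case the participant improves steadily throughout, but no faster during training than during baseline, so a simple pre-post comparison of the training phase would overstate the training effect.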

#### Conclusion of Attention Training Studies

Based on the results of these studies, it is still unclear what an attention training should consist of to be effective. Neither the Attention Process Training-3 nor the Lumosity training proved superior to the other (Zickefoose et al., 2013). Participants preferred the graphically stimulating Lumosity training to the basic training. This indicates the importance of adapting training environments to the preferences of the trainee. Graphics can make the training more interesting. However, our experience in clinical practice is that, for example, flashing graphics and sounds may distract certain patients. This potential trade-off should be investigated further.

Training is most effective in the attention domain that is specifically trained (Sturm et al., 1997, 2003; Hauke et al., 2011; Prokopenko et al., 2013; van Vleet et al., 2015). Attention may be seen as a hierarchy, in which training of basic attention can improve more complex attention. It is not yet clear whether training complex attention before basic attention can overload basic attention and consequently lead to deteriorated performance, as suggested by Sturm et al. (1997, 2003). It is also not clear whether a hierarchical training would be superior to a training that either focuses on one attention domain or combines several attention domains per session (Gauggel and Niemann, 1996; Prokopenko et al., 2013).

Several types of attention training transferred to at least some executive function tasks (Sturm et al., 1997; Hauke et al., 2011; Prokopenko et al., 2013; Zickefoose et al., 2013; van Vleet et al., 2015), but not to an abstraction task (Ponsford and Kinsella, 1988). Ecologically valid measures were rarely included (Sturm et al., 1997, 2003; Zickefoose et al., 2013) or were only very general (Gauggel and Niemann, 1996; van Vleet et al., 2015). Of these ecologically valid measures, objective attention (Ponsford and Kinsella, 1988), subjective IADL (Prokopenko et al., 2013), and life satisfaction (Gauggel and Niemann, 1996; Prokopenko et al., 2013) did not significantly improve. Only subjective attention improved (Hauke et al., 2011), whereas results for mood were inconclusive (Gauggel and Niemann, 1996; Prokopenko et al., 2013; van Vleet et al., 2015). Finally, it is important to provide feedback to the participant (Ponsford and Kinsella, 1988). Except for Sturm et al. (1997, 2003) and Prokopenko et al. (2013), studies did not correct for multiple testing and did not provide p-values. Thus, we were unable to take into account any distortions due to multiple statistical testing.

#### Limitations

The inter-individual differences in training outcomes may be due to factors such as lesion characteristics. None of the studies determined the extent of brain damage. One would expect that not everybody benefits equally from restitution-based training, assuming it depends on the residual functionality of the network being trained (Robertson and Murre, 1999). The study that included very severe head injury patients (Ponsford and Kinsella, 1988) did not reveal any transfer effects of the training, whereas the studies that included mild brain injury patients showed some transfer effects (e.g., Prokopenko et al., 2013; van Vleet et al., 2015). Future studies should, therefore, include imaging measures that can provide insight into the severity of damage to brain networks. Other limitations will be outlined in the discussion section.

# Combined Working Memory and Attention Training

Non-specific training may result in beneficial effects when the aim is not to train one specific domain. Most studies that combined several cognitive domains included attention and working memory games (see **Table 4A** for an overview of a combined training study with double baseline and **Table 4B** for studies with single baseline). A variety of programs were used. One program used by two studies was RehaCom.

#### RehaCom

The RehaCom training consists of several graphical games that adapt to the participant's performance and use a variety of stimuli, such as playing cards. The training focuses on several cognitive domains. First, selective attention is trained with tasks in which, for example, a particular image needs to be found among several distracter images. Second, working memory tasks include clicking on the playing cards that were shown before; at higher levels, the cards need to be reproduced in reverse order. Finally, executive function is trained via divided attention tasks, such as controlling the speed of a car while listening to the radio, or buying items from a shopping list while keeping the purchases within a certain budget.

#### Non-Specific Training

The two studies that evaluated RehaCom found generalization to nearly all tasks used. Training improved performance on seven working memory tasks (both auditory and visual) and an attention task (Fernandez et al., 2012; Lin et al., 2014). Although the authors did not adjust for multiple testing, the effect found by Lin et al. (2014) would remain significant after adjustment. No improvements were observed in the control group (n = 18), which received no training (Lin et al., 2014). However, the two groups were not directly compared, and Fernandez and colleagues did not include a control group. Thus, the results may be due to factors other than the training. Although these two studies used training programs of 50–60 h that included executive function tasks, there were no significant improvements on a task that is frequently used to measure executive function (i.e., the Trail Making Test, part B). Both studies used only one outcome measure to assess executive function, which may have been insufficient to capture the spectrum of executive functioning.

A RCT using a similar, non-specific, 8-week training did reveal significant improvements on two tasks measuring executive function (De Luca et al., 2014). Participants who completed this training (n = 15) improved on 13 of the 14 outcome measures. This included objective neuropsychological measures of executive functioning, attention, and memory. It also included subjective functional and behavioral scales for daily living (De Luca et al., 2014). These improvements, except for one scale measuring functional performance in everyday life, were significantly larger than in a control group (n = 19), which received care as usual. Even though the authors did not adjust for multiple testing, 12 outcome measures (including the executive function measure) would survive adjustment for multiple testing. This suggests that the training resulted in improvements that generalized to untrained tasks.

As the study sample consisted of post-acute patients who had suffered severe brain injury, these positive results contrast with the studies discussed earlier that failed to reveal (conclusive) transfer effects after severe brain injury (Ponsford and Kinsella, 1988; Zickefoose et al., 2013). De Luca and colleagues did not provide detailed information about the training or session duration. It is, therefore, impossible to evaluate which elements of the training produced these positive effects. For the subjective outcome measures, it should be kept in mind that the control group received less attention, whereas the intervention group received 24 extra sessions, which may have contributed to a larger placebo effect.

Spikman et al. (2010) evaluated a 20-h Cogpack training (n = 37) and compared it with a multifaceted strategy training (n = 38). They found improvements in objective and subjective executive functioning in both groups. A far transfer effect was also observed in short-term memory and in subjective quality of life. All but the subjective quality of life improvements remained significant 6 months after training completion. These results were adjusted for multiple testing, which suggests that the effects were likely to be true effects. Nevertheless, the Cogpack computer-training group never improved significantly more than the comparison group. Conversely, immediately after training, the strategy group improved more than the computer-training group on two executive function scales. These were, however, both rated by the therapist, who was not blind to treatment condition. Neither of the training programs improved performance on two tasks commonly used to measure inhibition or executive functioning. This was similar to what was found by Fernandez et al. (2012) and Lin et al. (2014). These two tasks may have been less vulnerable to retest effects than the other two executive function tasks that did show improvements after the training.

Both groups were equally satisfied with the training, reported less executive dysfunction 6 months after the training, and felt that they had started to participate again in social and vocational life. There was no evidence that the Cogpack computer training resulted in better outcomes than strategy training. However, since improvements were observed in both groups, a waiting-list control group would be necessary to confirm whether the effects were specific to the training. Nevertheless, even if the improvements were mere retest effects, they may have had a positive effect on the participants' mood and motivation to continue a rehabilitation program.

In sum, training that combines memory and attention tasks resulted in transfer to working memory and attention tasks that were not trained. The extent of these training effects on executive function remains unclear, as most studies included only one executive function task (Fernandez et al., 2012; Lin et al., 2014). The studies that did include multiple executive function tasks found improvements on most of these tasks (Spikman et al., 2010; De Luca et al., 2014), but the results of Spikman and colleagues were also found in their comparison group.

#### More Specific Training

Two studies used training programs that were primarily focused on one cognitive domain. The main focus of the training used in the RCT by Gray et al. (1992) was attention; we report this training in this section as it also included set shifting. The training consisted of approximately 14 sessions of 1–1.5 h, resulting in about 15 h of training (n = 17). The active control group (n = 14) could play computer games of their choice that did not involve time pressure, and they trained 12.7 h on average.

Twenty-one outcome measures were used, but only two significant group differences were found. Moreover, these effects disappeared when time since onset of brain injury and premorbid IQ were taken into account. Thus, the authors failed to find any far transfer effects immediately after training. However, 6 months after training completion, the experimental group did show a significant improvement compared to the control group on several tasks that were similar to the focus of the training. This effect remained after controlling for premorbid IQ and time since onset. The authors suggested that these improvements were already visible immediately after training but only reached significance at follow-up. They concluded that the training only had an effect on targeted functions but failed to generalize to cognitive functions that were not trained. This study stresses the importance of follow-up measurements.

Although the training included several executive functioning tasks, the experimental group did not improve significantly more on these tasks than the control group. Both groups showed large variability in baseline scores on the executive functioning task that was similar to the training. Perhaps the study lacked sufficient statistical power to reveal a significant improvement. Furthermore, as the control group could freely choose the computer tasks, it was unclear which tasks they performed and whether these tasks improved cognition.

Another study used a specific training consisting of either memory tasks or attention tasks (Ruff et al., 1994). These two training programs were compared in a multiple baseline design with 15 participants who had suffered severe head injury. However, both groups were pooled for the statistical analyses, so that, unfortunately, training-specific effects could not be identified. Proxies, who knew that their acquaintance was following the training, rated significant improvements in both attention and memory. Participants themselves rated significant improvements in memory, but not in attention. The training also improved objective short-term memory performance but failed to influence long-term memory. Depression scores did not consistently change after the training.

The authors did not include a control group, nor did they adjust for multiple testing. Only the effect on a processing speed task and the proxy ratings of memory would have remained significant after such adjustment. As the training tasks were not described, it is impossible to evaluate the results in light of the training. Moreover, the absence of executive function outcome measures makes it impossible to conclude whether the effects generalized to executive functions.

#### Hierarchical Training

In a retrospective study a hierarchical computer training was evaluated in closed head injury patients (Chen et al., 1997). The training started with basic cognitive functions and subsequently focused on more complex functions. Due to the retrospective nature of this study, training duration and interval between training and follow-up differed between participants. No differences were found between the care-as-usual group (n = 20) and the computer-training group (n = 20) in four composite scores of the cognitive domains on which the training focused. Nevertheless, when evaluating each task separately, the computer-training group gained significantly on 20 tasks compared with a mere 10 tasks in the care-as-usual group after adjusting for multiple testing (see **Table 4B** for measures that would be significant without adjustment). This included an executive function task, an attention task, and some memory tasks. Participants were not randomly assigned to groups, and the groups differed significantly in time since onset and length of treatment. Even though these two variables were added as covariates, still other factors may have influenced the treatment effects.

#### Conclusions and Limitations of Combined Working Memory and Attention Training

Training programs combining attention, working memory, and other executive function tasks did not show consistent objective executive functioning improvements. This may be due to the small number of tasks used in some studies to measure executive functioning, to the large variability of baseline scores on these tasks, and to the often small sample sizes and ensuing low statistical power of these studies.
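
To illustrate why small samples limit these conclusions, the sketch below approximates the power of a two-sided comparison of two equally sized groups, using a normal approximation to the two-sample t-test. The effect size (Cohen's d = 0.5), alpha, and group sizes are assumptions chosen for illustration; 15 participants per group roughly matches several of the studies reviewed here (e.g., n = 17 vs. 14 in Gray et al., 1992). The function names are hypothetical.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1


def approx_power(n_per_group, d, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison with equal group sizes.

    Normal approximation: power ~ Phi(d * sqrt(n / 2) - z_{1 - alpha / 2}).
    The contribution of the opposite tail is ignored (negligible for d > 0).
    """
    z_crit = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    return STANDARD_NORMAL.cdf(d * math.sqrt(n_per_group / 2) - z_crit)


def required_n_per_group(d, power=0.80, alpha=0.05):
    """Smallest group size reaching the requested power under the same approximation."""
    z_crit = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STANDARD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_crit + z_power) / d) ** 2)


# Hypothetical scenario: a medium effect (d = 0.5) with 15 participants per group.
print(round(approx_power(15, 0.5), 2))
print(required_n_per_group(0.5))
```

Under these assumptions, 15 participants per group yield a power of only about 0.28, whereas roughly 63 per group would be needed to reach 0.80. Smaller true effects, or correction for many outcome measures, would raise the required sample size further.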

Subjective executive function improvements were noted by the participants themselves and by their proxies and therapists (Spikman et al., 2010). Other subjective improvements were reported for attention and memory (Ruff et al., 1994), everyday life functioning (De Luca et al., 2014), and quality of life (Spikman et al., 2010). Effects on mood were inconclusive: whereas reductions in anxiety were found, psychological wellbeing did not improve (Gray et al., 1992), and depression levels were reduced in only one of the two studies in which they were measured (De Luca et al., 2014). Except for depression, these subjective ratings were never measured in more than one study. Thus, replication is clearly needed. Moreover, studies that included an active control group found improvements in both groups (Spikman et al., 2010) and did not find any group differences (Gray et al., 1992; Spikman et al., 2010). The other studies included either a passive control group or no control group at all, and thus their results could be due to placebo effects.

Both objective auditory and visual memory commonly improved (Chen et al., 1997; Fernandez et al., 2012; Lin et al., 2014), but this was the case for immediate recall and rarely for delayed recall (Ruff et al., 1994; Spikman et al., 2010; De Luca et al., 2014). Similarly, objective attention also improved (Gray et al., 1992; Ruff et al., 1994; Chen et al., 1997; Fernandez et al., 2012; De Luca et al., 2014; Lin et al., 2014). Some of these effects emerged only at long-term follow-up (Gray et al., 1992), and some effects were not significantly larger than in the control group (Chen et al., 1997; Spikman et al., 2010). Most training programs included a memory or attention component, and therefore improvements in these domains were expected.

Improvements in non-trained objective outcomes were also frequently reported. General cognition improved more than in the control group (De Luca et al., 2014). Furthermore, improvements were found in participation in everyday life (Spikman et al., 2010), processing speed (Ruff et al., 1994; Chen et al., 1997), IQ, and problem solving (Chen et al., 1997). Conversely, improvements in verbal reasoning were inconsistent (Gray et al., 1992; Chen et al., 1997). These within-group effects were either not compared with a control group (Ruff et al., 1994; Chen et al., 1997) or were also found in the control group (Spikman et al., 2010). Thus, even though these results seem promising, they need to be interpreted cautiously because of the lack of proper control groups, and they need to be replicated with improved methodological designs.

In contrast to attention specific training (Ponsford and Kinsella, 1988; Zickefoose et al., 2013), the training programs including multiple cognitive domains were effective after severe brain injury (Ruff et al., 1994; De Luca et al., 2014). Training also appeared to be effective for both post-acute patients (De Luca et al., 2014; Lin et al., 2014) and for those who were in the chronic phase (Fernandez et al., 2012). Finally, stroke patients (Lin et al., 2014) as well as patients with other etiologies (Ruff et al., 1994; Chen et al., 1997; Spikman et al., 2010; Fernandez et al., 2012; De Luca et al., 2014) seemed to benefit from the training.

## Neural Effects of Computer Training

Nordvik et al. (2014) emphasized that most computer-based training studies do not investigate the effects on a neural level. In their overview, they summarize evidence for both gray and white matter changes after training certain skills in the healthy population (Nordvik et al., 2014). Within the stroke population, imaging is rarely used as an outcome measure. However, recently two studies reported both functional and structural changes in the brain after restitution-based training. One of these studies included strategy education as part of their training (Nordvik et al., 2012). Even though this study, therefore, does not fulfill our inclusion criteria, we still report it here, because the main elements of the training were two types of computer training, and because such imaging studies are sparse.

In a single case study, both a general computer training (focusing on five cognitive domains) and the specific Cogmed working memory training were combined with a weekly session that included discussions about possible strategy use. Structural white matter connectivity measures changed during the training period and were stable when the participant was not training (Nordvik et al., 2012). Visual inspection of the data revealed that both training programs improved working memory. The connectivity measure correlated with working memory performance.

Functional connectivity also changed after the training used by Lin et al. (2014). As mentioned before, both working memory and attention improved after this training. This improvement was related to increased functional connectivity of several brain areas. The control group did not show any improvements in working memory, attention, or executive function. The regional functional connectivity of this group did, however, decrease significantly after the period without training, but these changes did not correlate with cognitive performance. Thus, although changes in functional connectivity were observed in both groups, they were related to cognitive improvement only in the intervention group.

It is important to note that brain changes can occur even when no behavioral changes are measurable. As both increased and decreased activity can be interpreted positively (i.e., as increased communication or as more parsimonious and efficient communication, respectively), one should preferably have a clear a priori hypothesis and include healthy age-matched controls. Non-invasive brain imaging is still relatively new in the field of brain training, but it may provide more insight into the effectiveness of such training.

# DISCUSSION

# Summary of Results

In this review we aimed to determine whether computer-based restitution training can improve executive functions. Two of the studies we reviewed were of high quality because they were RCTs with active control groups and a sufficiently large sample size (Gray et al., 1992; Spikman et al., 2010). The intervention training groups in these studies did not improve more than the active control groups.

All other studies suffered from important methodological limitations; their more positive results should therefore be interpreted with caution. Results from the RCTs that included passive control groups, and thus did not control for potential placebo effects, revealed that training resulted in near transfer effects (Westerberg et al., 2007; Lundqvist et al., 2010; Akerlund et al., 2013; Prokopenko et al., 2013). Far transfer effects were also found, but mostly in tasks that were related in some way to the trained cognitive function (Westerberg et al., 2007; Lundqvist et al., 2010; Akerlund et al., 2013; Prokopenko et al., 2013; De Luca et al., 2014; Lin et al., 2014). Transfer to subjective measures was observed in several studies (Westerberg et al., 2007; Lundqvist et al., 2010; Björkdahl et al., 2013; De Luca et al., 2014), although subjective improvements were not conclusively demonstrated. Spikman et al. (2010) found similar results within their intervention group (i.e., without comparison with the active control group).

Effects on executive function remain inconclusive. Four studies found no improvements (Ponsford and Kinsella, 1988; Fernandez et al., 2012; Akerlund et al., 2013; Lin et al., 2014), five found improvements on some of the measures (Gray et al., 1992; Chen et al., 1997; Westerberg et al., 2007; Spikman et al., 2010; Zickefoose et al., 2013), and seven found improvements on all of their executive function outcome measures (Sturm et al., 1997, 2003; Lundqvist et al., 2010; Hauke et al., 2011; Prokopenko et al., 2013; De Luca et al., 2014; van Vleet et al., 2015). These effects were usually based on only one or two tasks. One particular working memory and attention measure, the Paced Auditory Serial Addition Test (PASAT; Gronwall, 1977), showed training effects in all three studies that included it as an outcome measure (Gray et al., 1992; Westerberg et al., 2007; Lundqvist et al., 2010). These comprised studies of working memory training and studies of combined working memory and attention training. The PASAT seems to be sensitive to training effects and is suitable for inclusion in future studies. Three studies did not evaluate training effects on executive functioning (Ruff et al., 1994; Gauggel and Niemann, 1996; Björkdahl et al., 2013).

Six studies evaluated long-term outcome (Ponsford and Kinsella, 1988; Gray et al., 1992; Lundqvist et al., 2010; Spikman et al., 2010; Akerlund et al., 2013; Björkdahl et al., 2013). Transfer effects mostly remained stable several months after training. In the RCT by Gray and colleagues, which included an active control group, the only significant effects were observed at long-term follow-up. Only two studies evaluated the neural effects of training (Nordvik et al., 2012; Lin et al., 2014). They found that both structural and functional changes were related to training improvement.

#### What are the Effective Elements of Training, and Who Benefits?

It remains unclear which patients benefit from training and which training elements are essential. Positive results were observed in both severely and mildly affected patients, in both the post-acute and the chronic phase. One study did not find any effects in a very severely affected post-acute sample (Ponsford and Kinsella, 1988). Both specific and general training programs seemed to be effective. Nevertheless, improvements were largest in the domain of the training itself, and results suggest that the task on which transfer is desired should at least partially tap the function being trained. The two hierarchical training programs failed to be effective, perhaps due to their methodological limitations. Training can be either basic or provided in a game-like environment. Not surprisingly, participants showed a slight preference for the game-like training, which suggests that training should be adjusted to the personal preferences of the patient. Finally, it is important to provide feedback.

#### Limitations of the Reviewed Studies

#### Lack of Control Groups and Blinding

The lack of proper control groups is one of the most important limitations of the studies reviewed here. Including a proper control group is important because spontaneous recovery can occur, and retest effects are common, especially for executive functioning tasks. A meta-analysis of attention training (not necessarily by computer) revealed that effect sizes of studies without control groups were always larger than effect sizes of studies with control groups (Park and Ingles, 2001). Similarly, transfer effects were absent in the current review when compared to an active control group (Gray et al., 1992; Spikman et al., 2010). Without proper controls it is impossible to draw conclusions about the nature of any effects. A passive control group will only correct for retest effects and spontaneous recovery, but not for placebo effects. An active control group controls for both placebo effects and Hawthorne effects (i.e., effects of being involved in something new and receiving attention). Nonetheless, the training interventions of the two active control groups used by Gray and Spikman were both potentially effective themselves, suggesting that both the experimental and the control training resulted in transfer effects. On the basis of our review we recommend that both an active control group and a passive control group should be included.

Placebo effects, for that matter, are not necessarily an objectionable phenomenon. Even if just being involved in something new results in placebo effects, it may improve the patients' quality of life, and motivate them for other types of rehabilitation. Long-term evaluation, which is currently lacking in most studies, is necessary to determine whether short-term training or placebo effects indeed benefit the patient.

Some may consider the use of control groups as controversial from an ethical point of view, because a potentially beneficial training is withheld from patients. Alternatively, multiple baseline measures, especially if baseline duration varies between participants, could filter out some of the effects of spontaneous recovery and retesting (as done by Ponsford and Kinsella, 1988). Also, the methodology of single-subject designs has improved considerably over the last decade, and it deserves to be applied more often (Dugard et al., 2011).

Blinding of both assessor and participant is another important factor for reliable assessment of outcomes. Only three studies blinded the assessors, and none of the studies reported that the participants were blinded. Blinding of the participants is of course difficult, but can be achieved when mock training is included. This is challenging, because the line between an effective training and a convincing control training is very thin.

#### Incomplete Training Descriptions

Most studies did not report the mean training time. In studies that did report the actual training time, this often differed from the training time as previously planned by protocol (e.g., Gray et al., 1992). Training duration and frequency are important in order to conclude whether behavioral improvements may be ascribed to the training, and whether neural changes may be likely. The median planned training duration of the reviewed studies was 15.6 h. This seems rather brief to obtain stable behavioral changes. The number of repetitions achieved within this time frame may also be insufficient for neural changes to occur (Kimberley et al., 2010).

The setting of the training was hardly ever described. In healthy elderly, training effects were smaller when training was done at home than when it was done in a group setting on site (Lampit et al., 2014). Face-to-face instructions also resulted in longer training sessions (Cruz et al., 2014) and in larger improvements (Man et al., 2006) than instructions given online with training done at home. These factors could not be evaluated in the current review. The lack of description of the specific outcome parameters used, of relating the outcomes to training performance, of reports on conflicts of interest, and of evaluation of possible harmful effects of the training all complicate evaluation of training effects. Without a clear description of all training tasks, it is impossible to determine whether an effect is evidence for far or near transfer.

#### Statistical Considerations

Only four studies adjusted for multiple statistical testing. Currently, there is no consensus on whether this correction is necessary for pre-planned analyses (Rothman, 1990; Curran-Everett, 2000; Glickman et al., 2014). Confirmatory studies need to correct for multiple tests that concern the same research question; exploratory studies are not required to do so (Bender and Lange, 2001). In any case, it seems advisable to report unadjusted p-values and confidence intervals, and to interpret the results in light of the number of statistical tests performed, especially when many tests are done. Replication studies using the outcome measures that previously showed transfer effects are needed before firm conclusions can be drawn. The reviewed studies hardly ever used the same training program or outcome measures, and thus replication is still lacking.
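As an illustration of adjusting for multiple tests while still reporting the unadjusted values, the sketch below applies the Holm-Bonferroni step-down procedure, one common family-wise correction, to a set of p-values. The function name `holm_adjust` and the p-values are hypothetical; the text does not specify which correction the four adjusting studies used.

```python
def holm_adjust(p_values):
    """Holm-Bonferroni adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending raw p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (0-based rank k) is multiplied by m - k.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values must not decrease as raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical p-values from four outcome measures in one study.
raw = [0.010, 0.040, 0.030, 0.200]
for p, p_adj in zip(raw, holm_adjust(raw)):
    print(f"unadjusted p = {p:.3f}   Holm-adjusted p = {p_adj:.3f}")
```

Printing both columns side by side keeps the unadjusted values visible, in line with the recommendation above, while showing how conclusions would change under a family-wise correction.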

The sample sizes used in the studies were small. Only three studies had more than 20 participants per group, one of which did not include a control group. None of the studies reported an a-priori sample size calculation to determine the sample size needed to reveal clinically significant effects. It is likely that effect sizes in this research field are small or moderate at best (e.g., Corbett et al., 2015). Thus, the studies reviewed here may have been underpowered, in which case, however, one might ask whether such small effects are still clinically relevant. For better insight into the clinical relevance of training interventions, future studies should report effect sizes.
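To show why small samples are a concern, the sketch below estimates the number of participants per group needed to detect a standardized mean difference d in a two-sided comparison of two groups, using the usual normal-approximation formula given in the docstring. The function name `n_per_group`, the defaults (alpha = 0.05, power = 0.80), and the example effect sizes are assumptions for illustration, not values taken from the reviewed studies.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})**2 / d**2.
    It slightly underestimates the exact t-test requirement for small n.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


for d in (0.2, 0.3, 0.5):
    print(f"d = {d}: about {n_per_group(d)} participants per group")
```

Under these assumptions, about 63 participants per group are needed for d = 0.5 and about 175 for d = 0.3, well above the 20 per group that only three of the reviewed studies exceeded.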

#### Outcome Measures

Executive functioning was usually measured with only one task. As this is a very broad concept, a single task may not be enough to capture potential effects on executive functioning. The large variation of baseline performance on executive function tasks may mask potential individual improvements, which also remain undetected with small sample sizes.

Ecologically valid measures were rarely used. If used, they mostly consisted of subjective ratings and questionnaires. Ecologically valid measures are needed to evaluate real life benefits. Imaging was used in only two studies, and it was thus rarely possible to assess the training effects at the neural level. Results from imaging were promising, supporting the inclusion of imaging as an outcome measure in future studies.

#### Selection Bias

Another issue is possible selection bias. Most likely, patients only participated if they had at least some affinity with computers. Patients were recruited via rehabilitation centers, and sometimes from only one center. The latter may reduce the generalizability of the results. The exclusion rate was not often reported, but in the Akerlund study it was very high (>50%), which again reduces generalizability.

#### Limitations of This Review

There are several limitations to this review, most of which are inherent to the novelty of the field. First, due to heterogeneity of outcome measures it was not possible to perform a meta-analysis. Second, we could not assess the risk of bias. It is possible, and maybe even likely, that publication biases exist in this field of research. Selection bias in the studies was also not assessed. The acquired brain injury population is very heterogeneous, with many different outcome and impairment patterns. Studies used strict inclusion criteria, which reduces generalizability. Third, we excluded virtual reality studies. Virtual reality often involves the use of the whole body, which makes it difficult to distil whether an effect is due to cognitive retraining or to the physical exercise involved. A recent systematic review of virtual reality studies concluded that it can be effective in improving cognition (Larson et al., 2014). With virtual reality it is possible to safely recreate real-life situations. This may, therefore, be a good future approach for repeated practice of tasks requiring executive functions.

#### Strength of This Review

Computers are now widely available and there is a trend to do brain training in many patient populations. It is important to establish whether the effectiveness of restitution-based computer programs can be confirmed. Our review added to the results of the previously performed systematic review (Poulin et al., 2012) because we systematically evaluated 20 studies that provided restitution-based training. Results of our review can be used to improve the methodology of future studies.

## CONCLUSION

Most studies we reviewed suffered from methodological limitations. Samples were mostly small, appropriate control groups were often absent, and adjustment for multiple testing was rarely done. Consequently, it is difficult to draw firm conclusions about the effectiveness of training. With the current study designs, the effects reported may be due—at least in part—to spontaneous recovery, retest effects, or placebo effects.

Effects were most often reported on non-trained tasks that measured the function being trained. There were also reports of far transfer to non-trained tasks, but these tasks still mostly included some part of the function being trained. Training often increased subjective functioning, which is probably very important to motivate patients to continue rehabilitation and to work on improvements. Overall, the results of these studies warrant continued research to determine whether restorative training methods can improve cognitive functioning. Computer training can easily be done at home, which is a cost-effective way of improving motivation and subjective functioning, and hopefully also objective functioning, after acquired brain injury.

The most important methodological improvements for future studies are larger sample sizes and the inclusion of both a stimulating but non-effective active control group and a passive control condition. Training periods should be longer, and more stimulating training tasks, adjusted to the preferences and ability level of the trainee, should be used. Studies should also evaluate predictors of training outcome, such as time since injury and symptom severity. Multiple outcome measures per cognitive domain, free of ceiling effects and with satisfactory ecological validity, should be used. Long-term effects need to be evaluated, and results should be replicated. Results should be interpreted in light of training progression and after appropriate adjustment for multiple testing. Effect sizes should be reported to allow evaluation of the clinical significance of results.

In this field it is challenging to conduct well-designed and sufficiently powered studies because of limited budgets, the limited number of available patients, the heterogeneity of the population, and ethical considerations. With this in mind, the currently reviewed studies provide valuable insights and emphasize the need for carefully designed RCTs in the future.

# AUTHOR CONTRIBUTIONS

Conception and design of the work: RV, JM, BS. Data acquisition: RV. Data analysis: RV, BS. Interpretation of data: RV, JM, DV, BS. Drafting and revising the work: RV, JM, DV, BS. Final approval of the version to be published: RV, JM, DV, BS. Agreement to be accountable for all aspects of the work: RV, JM, DV, BS.

# ACKNOWLEDGMENTS

This project is part of the research program "Treatment of cognitive disorders based on functional brain imaging" funded by the Netherlands Initiative Brain and Cognition, a part of the Organization for Scientific Research (NWO) under grant number 056-14-013. We thank Janneke Staaks for the search strategy assistance.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fnhum.2016.00150


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 van de Ven, Murre, Veltman and Schmand. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Cognitive Training for Post-Acute Traumatic Brain Injury: A Systematic Review and Meta-Analysis

Harry Hallock<sup>1</sup>, Daniel Collins<sup>1</sup>, Amit Lampit<sup>1,2</sup>, Kiran Deol<sup>3</sup>, Jennifer Fleming<sup>4,5,6</sup> and Michael Valenzuela<sup>1</sup>\*

<sup>1</sup> Regenerative Neuroscience Group, Brain and Mind Centre, University of Sydney, Sydney, NSW, Australia, <sup>2</sup> School of Psychology, University of Sydney, Sydney, NSW, Australia, <sup>3</sup> Sydney Medical School, University of Sydney, Sydney, NSW, Australia, <sup>4</sup> School of Health and Rehabilitation Sciences, University of Queensland, Brisbane, QLD, Australia, <sup>5</sup> Occupational Therapy Department, Princess Alexandra Hospital, Brisbane, QLD, Australia, <sup>6</sup> Centre for Functioning and Health Research, Metro South Hospital and Health Service, Brisbane, QLD, Australia

Objective: To quantitatively aggregate effects of cognitive training (CT) on cognitive and functional outcome measures in patients with traumatic brain injury (TBI) more than 12 months post-injury.

Design: We systematically searched six databases for non-randomized and randomized controlled trials of CT in TBI patients at least 12 months post-injury reporting cognitive and/or functional outcomes.

Main Measures: Efficacy was measured as standardized mean difference (Hedges' g) of post-training change. We investigated heterogeneity across studies using subgroup analyses and meta-regressions.

#### Edited by:

Soledad Ballesteros, National University of Distance Education, Spain

#### Reviewed by:

Pierluigi Zoccolotti, Sapienza University of Rome, Italy José Manuel Reales, National University of Distance Education, Spain

#### \*Correspondence:

Michael Valenzuela michael.valenzuela@sydney.edu.au

> Received: 18 March 2016 Accepted: 11 October 2016 Published: 27 October 2016

#### Citation:

Hallock H, Collins D, Lampit A, Deol K, Fleming J and Valenzuela M (2016) Cognitive Training for Post-Acute Traumatic Brain Injury: A Systematic Review and Meta-Analysis. Front. Hum. Neurosci. 10:537. doi: 10.3389/fnhum.2016.00537

Results: Fourteen studies encompassing 575 patients were included. The effect of CT on overall cognition was small and statistically significant (g = 0.22, 95% CI 0.05 to 0.38; p = 0.01), with low heterogeneity (I<sup>2</sup> = 11.71%) and no evidence of publication bias. A moderate effect size was found for overall functional outcomes (g = 0.32, 95% CI 0.08 to 0.57; p = 0.01), with low heterogeneity (I<sup>2</sup> = 14.27%) and possible publication bias. Among the individual cognitive domains, statistically significant effects were found only for executive function (g = 0.20, 95% CI 0.02 to 0.39; p = 0.03) and verbal memory (g = 0.32, 95% CI 0.14 to 0.50; p < 0.01).

Conclusion: Despite limited studies in this field, this meta-analysis indicates that CT is modestly effective in improving cognitive and functional outcomes in patients with post-acute TBI and should therefore play a more significant role in TBI rehabilitation.

Keywords: traumatic brain injury, TBI, closed head injury, cognitive training, cognitive outcome, neuropsychological outcome, rehabilitation

# INTRODUCTION

Traumatic brain injury (TBI) causes ongoing disability for millions worldwide (Wilson et al., 2014), with cognitive impairment and psychosocial issues presenting major barriers to positive social outcomes such as community reintegration and employment (Rice-Oxley and Turner-Stokes, 1999). Cognitive impairment in TBI frequently affects the domains of attention, memory, executive functions, processing speed, language, and visuospatial skills (Dikmen et al., 2009). Reviews (Gordon et al., 2006; Cicerone et al., 2011; Lu et al., 2012) have suggested that cognitive rehabilitation for TBI, which encompasses several therapeutic strategies and interventions, can be beneficial for improving these cognitive domains and even community functioning. These interventions may include education, goal-setting, counseling, and internal and external compensation strategies targeting specific cognitive domains.

An ongoing issue in the wider field of cognitive rehabilitation is the lack of a consensus taxonomy of cognitive interventions, including cognitive training (CT); here we use a working definition consistent across key contributors to the literature (Clare et al., 2003; Buschert et al., 2010; Gates and Valenzuela, 2010). We define and assess the impact of CT, one specific form of cognitive rehabilitation that is considered cost-effective, scalable, and adaptive (Gates and Valenzuela, 2010). We and others have operationally defined CT by four main characteristics: (1) repeated practice, (2) on problem-orientated tasks, (3) using standardized stimuli, and (4) targeting specified cognitive domains (Gates and Valenzuela, 2010; Bahar-Fuchs et al., 2013). CT aims to restore impaired skills or harness compensatory mechanisms (Buschert et al., 2010) and can include drill-and-practice exercises or applied mnemonic strategies. It can be administered in paper-and-pen format, typically facilitated one-on-one by a therapist, or as computer-assisted CT that can be supervised in a group setting or delivered individually at home. It is therefore important to distinguish CT from the more holistic concept of cognitive rehabilitation, which may include aspects of CT targeted at cognitive deficits but also includes non-CT interventions aimed at improving psychological, emotional, motivational, and interpersonal functioning (Gordon et al., 2006).

Restorative treatments and compensatory strategies are recommended in general terms for the rehabilitation of TBI patients displaying cognitive deficits (INCOG guidelines; Bayley et al., 2014). Based on its efficacy in other clinical populations (Wykes et al., 2011; Lampit et al., 2014; Leung et al., 2015), CT may have therapeutic potential for TBI. Yet prior reviews of cognitive interventions (Cicerone et al., 2005, 2011; Rees et al., 2007) have not specifically addressed the efficacy of CT for TBI patients. These reviews attempted to synthesize findings across mixed samples with various kinds of acquired brain injury (ABI), combined different types of cognitive therapies, and permitted a diversity of study designs. A recent meta-analysis (Rohling et al., 2009) highlights the potential therapeutic benefits of CT for specific brain injury deficits, but, like these reviews, it pooled mixed etiologies. It also combined samples with varying time since injury, which, although inevitable, potentially introduces spontaneous recovery as a confounder. This confound could be attenuated by restricting analyses to samples either before or after 12 months post-injury.

Accordingly, using a meta-analytic approach, this study aims to systematically evaluate whether operationally defined CT is effective in improving cognitive and functional outcomes at least one year post-TBI, and to analyze potential moderators that may affect treatment outcomes. The study will analyze individual cognitive and functional domains, as well as overall cognition and overall functioning, obtained by pooling the respective individual domains. Investigation of individual domains allows identification of specific training effects, whilst pooling individual domains allows identification of more general effects that may not be apparent at the individual level for reasons such as small sample sizes or poor study design. Additionally, to investigate potential moderators of training, subgroup analyses will be conducted. Studies in this field are often small, underpowered, and varied in design; a meta-analysis can add clarity by amalgamating these small studies into an overall analysis with greater statistical power and more far-reaching conclusions. Thus, as the field of cognitive rehabilitation in TBI is still in its infancy and much more research is required, a meta-analysis could prove crucial in identifying the future direction of CT and the design factors that may prove most effective.

# MATERIALS AND METHODS

This systematic review and meta-analysis adheres to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Liberati et al., 2009; Supplementary Table S1), was prospectively registered with PROSPERO (CRD42014013274), and largely follows methods established in our previous meta-analyses (Lampit et al., 2014; Leung et al., 2015).

## Eligibility Criteria

We included both non-randomized and randomized controlled trials (RCTs) provided they investigated the effects of a CT intervention on cognitive and/or functional outcomes in individuals (in both intervention and control groups) with post-acute TBI (study mean time since injury ≥12 months). We therefore excluded studies that used healthy or acute TBI controls. Eligible outcomes were baseline and post-training performance on measures of cognition, Instrumental Activities of Daily Living (IADL), or dysexecutive functioning, defined as holistic disruptions to frontal lobe functions such as behavior, executive functions, and cognition. CT was defined as any intervention incorporating computer-assisted CT, pencil-and-paper-administered CT, or cognitive strategy training, practiced systematically for a minimum of 4 hours. Studies that used combined interventions (e.g., CT with standard physical rehabilitation) were eligible if CT comprised at least 50% of the total intervention duration.
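The eligibility rules above can be expressed as a single predicate. The sketch below is a hypothetical encoding: the `Study` record and its field names are assumptions made for illustration, not the review's actual data-extraction form, and it checks only the numeric and design criteria stated in the text.

```python
from dataclasses import dataclass


@dataclass
class Study:
    """Hypothetical summary record for one candidate study."""
    has_control_group: bool
    controls_post_acute_tbi: bool     # False if controls are healthy or acute TBI
    mean_months_since_injury: float   # study mean, all participants
    reports_outcomes: bool            # cognitive and/or functional pre/post data
    ct_hours: float                   # hours of CT practiced
    total_intervention_hours: float   # CT plus any other components


def is_eligible(s: Study) -> bool:
    """Apply the stated inclusion rules.

    Randomized and non-randomized controlled trials both qualify, so design
    is checked only through the presence of a suitable control group.
    """
    return (
        s.has_control_group
        and s.controls_post_acute_tbi
        and s.mean_months_since_injury >= 12
        and s.reports_outcomes
        and s.ct_hours >= 4
        and s.total_intervention_hours > 0
        and s.ct_hours / s.total_intervention_hours >= 0.5  # CT is at least half
    )


# Hypothetical combined intervention: 10 of 15 hours are CT.
print(is_eligible(Study(True, True, 18.0, True, 10.0, 15.0)))
```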

# Search Methodology

We searched CINAHL, the Cochrane Database of Systematic Reviews, EMBASE, MEDLINE, PsycBITE, and PsycINFO databases from inception to July 27, 2015 using a comprehensive search strategy (Supplementary Datasheet 1).

Relevant articles were downloaded to an EndNote library, duplicates were removed, and articles from other sources (e.g., references from systematic reviews) were added manually. HH and DC conducted initial screening for eligibility using title and abstract, and then independently examined full-text articles for inclusion.

# Study Appraisal and Risk of Bias Within Studies

A modified form of the Physiotherapy Evidence Database (PEDro) scale (Maher et al., 2003), designed for rating the quality of RCTs, was used by HH and DC to assess the methodological rigor of included studies. Because blinding of participants and therapists is impractical in CT trials, these two PEDro items were not assessed, giving a maximum overall score (i.e., highest study quality) of 9 (Lampit et al., 2014). Risk of bias arising from lack of assessor blinding or lack of adherence to intention-to-treat analysis was assessed using the Cochrane risk of bias tool (Higgins and Green, 2011). RCTs with high or unclear risk of bias in either of these categories were classified as having a high risk of bias.

# Data Extraction

Cognitive and functional outcome data were extracted as means and standard deviations for each group immediately pre- and post-intervention (assuming a correlation of 0.6 between time points) or as mean group change, and entered into Comprehensive Meta-Analysis version 2 (CMA; Biostat, Englewood, NJ, USA). Outcomes were coded into cognitive domains, and effect directions were assigned, using the Compendium of Neuropsychological Tests (Strauss et al., 2006) or by consensus.
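When only pre- and post-intervention means and SDs are available, the SD of the change score has to be imputed from an assumed pre-post correlation. The sketch below uses the standard formula with r = 0.6, the value stated above. The function name `change_sd` and the example SDs are hypothetical, and this is not a description of how CMA performs the computation internally.

```python
import math


def change_sd(sd_pre, sd_post, r=0.6):
    """SD of pre-to-post change given the two SDs and an assumed correlation r.

    Var(post - pre) = SD_pre^2 + SD_post^2 - 2 * r * SD_pre * SD_post
    """
    return math.sqrt(sd_pre**2 + sd_post**2 - 2 * r * sd_pre * sd_post)


# Hypothetical group with an SD of 10 points at both time points,
# compared across a few assumed correlations.
for r in (0.4, 0.6, 0.8):
    print(f"r = {r}: SD of change = {change_sd(10.0, 10.0, r):.2f}")
```

A higher assumed correlation gives a smaller change-score SD and therefore a larger standardized effect, which is why the assumed value of r matters when only pre/post summaries are reported.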

## Data Analysis

The dependent variable was the standardized mean difference (SMD; calculated as Hedges' g to correct for small sample sizes) of change from baseline to post-intervention between CT and control groups. Precision of the SMD was estimated using 95% confidence intervals (CI). Analyses were conducted on individual cognitive domains (executive functions, verbal memory, working memory, attention, processing speed, non-verbal memory, visuospatial, language) and functional domains (IADL and dysexecutive functions). Analyses were also conducted on Overall Cognition and Overall Function, obtained by pooling the respective individual cognitive or functional domains (Wykes et al., 2011; Lampit et al., 2014). Effect sizes were classified as small (g < 0.30), moderate (0.30 ≤ g < 0.60), or large (g ≥ 0.60).
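A minimal sketch of the between-group effect size on change scores: Cohen's d from the pooled SD, the small-sample correction J = 1 - 3/(4df - 1), an approximate normal-theory 95% CI, and the magnitude labels used in this review. The function names and example numbers are hypothetical, and the variance formula is the common large-sample approximation, assumed here rather than taken from the authors' CMA settings.

```python
import math
from statistics import NormalDist


def hedges_g(mean_ct, sd_ct, n_ct, mean_ctrl, sd_ctrl, n_ctrl, level=0.95):
    """Hedges' g for mean change in CT vs. control, with an approximate CI."""
    df = n_ct + n_ctrl - 2
    sd_pooled = math.sqrt(((n_ct - 1) * sd_ct**2 + (n_ctrl - 1) * sd_ctrl**2) / df)
    d = (mean_ct - mean_ctrl) / sd_pooled          # Cohen's d on change scores
    j = 1 - 3 / (4 * df - 1)                       # small-sample correction
    g = j * d
    var_d = (n_ct + n_ctrl) / (n_ct * n_ctrl) + d**2 / (2 * (n_ct + n_ctrl))
    se_g = j * math.sqrt(var_d)
    z = NormalDist().inv_cdf(0.5 + level / 2)      # about 1.96 for a 95% CI
    return g, (g - z * se_g, g + z * se_g)


def size_label(g):
    """Magnitude label using this review's thresholds, applied to |g|."""
    magnitude = abs(g)
    if magnitude >= 0.60:
        return "large"
    if magnitude >= 0.30:
        return "moderate"
    return "small"


# Hypothetical change scores: CT improves by 5 points, controls by 2
# (SD of change 6 in both groups, n = 20 per group).
g, (lower, upper) = hedges_g(5.0, 6.0, 20, 2.0, 6.0, 20)
print(f"g = {g:.2f}, 95% CI {lower:.2f} to {upper:.2f} ({size_label(g)})")
```

With only 20 participants per group, even a moderate g has a CI spanning zero in this example, which illustrates why pooling across studies is useful.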

To avoid selective analyses of outcomes, study-level SMDs from the same cognitive domain were combined into a single effect estimate, corrected for inter-correlation across outcomes using a correlation of 0.7 (Gleser and Olkin, 2009). Outcomes were pooled across studies using a random-effects model. Heterogeneity across studies was quantified using the I<sup>2</sup> statistic, which estimates the proportion of variance due to heterogeneity in true effects rather than random error (Higgins et al., 2003). I<sup>2</sup> values of 25, 50, and 75% indicate low, moderate, and large heterogeneity, respectively.
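The two pooling steps described above can be sketched as follows: averaging several outcomes from one study with an assumed inter-outcome correlation of 0.7, then combining studies under a random-effects model and computing I<sup>2</sup>. The random-effects step assumes the DerSimonian-Laird moment estimator of between-study variance, a common default; the function names and example values are hypothetical.

```python
import math


def composite_effect(effects, variances, r=0.7):
    """Mean of several outcomes from one study and the variance of that mean.

    Assumes every pair of outcomes is correlated at r (0.7 in the text).
    """
    m = len(effects)
    mean = sum(effects) / m
    # Covariance terms for all ordered pairs i != j.
    cov_sum = sum(
        r * math.sqrt(variances[i] * variances[j])
        for i in range(m)
        for j in range(m)
        if i != j
    )
    variance = (sum(variances) + cov_sum) / m**2
    return mean, variance


def random_effects(effects, variances):
    """DerSimonian-Laird pooling: returns (pooled, SE, tau^2, I^2 in percent)."""
    w = [1 / v for v in variances]                   # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_star = [1 / (v + tau2) for v in variances]     # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, se, tau2, i2


# Hypothetical study with two verbal memory outcomes, pooled with two other studies.
g_study, v_study = composite_effect([0.2, 0.4], [0.04, 0.04])
pooled, se, tau2, i2 = random_effects([g_study, 0.10, 0.35], [v_study, 0.05, 0.06])
print(f"pooled g = {pooled:.2f} (SE {se:.2f}), tau^2 = {tau2:.3f}, I^2 = {i2:.1f}%")
```

Treating correlated outcomes as independent would understate the composite's variance; the r term restores the covariance so that a study with many outcomes does not receive unduly high weight.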

To assess publication bias (small-study effects), funnel plots were visually inspected and formally tested using Egger's regression test of the intercept when at least 10 studies were available for analysis (Egger et al., 1997; Sterne et al., 2011). If significant asymmetry was detected (p < 0.1), we estimated the magnitude of the small-study effect using Duval and Tweedie's trim-and-fill method (Duval and Tweedie, 2000).
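Egger's test regresses each study's standardized effect (g/SE) on its precision (1/SE); an intercept far from zero indicates funnel-plot asymmetry. The sketch below computes the intercept, its standard error, and the t statistic using only the standard library. The function name is hypothetical, and the p-value would come from a t distribution with k - 2 degrees of freedom (for example via `scipy.stats.t.sf`, if SciPy is available).

```python
import math


def egger_test(effects, ses):
    """Egger's regression test for funnel-plot asymmetry.

    Regresses standardized effects (g / SE) on precision (1 / SE) by ordinary
    least squares and returns (intercept, SE of intercept, t statistic).
    Requires at least three studies whose points do not fall on a perfect line.
    """
    y = [g / s for g, s in zip(effects, ses)]       # standardized effects
    x = [1 / s for s in ses]                        # precision
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(e**2 for e in residuals) / (n - 2)  # residual variance
    se_intercept = math.sqrt(sigma2 * (1 / n + x_bar**2 / sxx))
    return intercept, se_intercept, intercept / se_intercept


# Hypothetical effects and SEs; the review ran the test only with k >= 10,
# five studies are used here purely for brevity.
intercept, se_int, t = egger_test(
    [0.10, 0.25, 0.30, 0.55, 0.60], [0.10, 0.15, 0.20, 0.30, 0.35]
)
print(f"intercept = {intercept:.2f} (SE {se_int:.2f}), t = {t:.2f} on 3 df")
```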

To detect design factors that may affect CT efficacy, we performed subgroup meta-analyses based on a mixed-effects model. Between-subgroup heterogeneity was tested using Cochran's Q statistic (significant at p < 0.05). Analyses were performed for overall cognitive and overall functional outcomes based on the following study characteristics: study design (randomized or non-randomized), intervention type (combined, strategy, or training), control type (active or passive), total hours of training (≤20 or >20 h), session length (≤60 or >60 min), and session frequency (<4 or ≥4 per week). Univariate meta-regressions were used to detect relationships between cognitive results and PEDro score, sample size, and year of publication. All analyses were conducted in CMA.
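In a mixed-effects subgroup analysis, each subgroup is first pooled (here assumed to be already done with a random-effects model), and the subgroup estimates are then compared with a fixed-effect Q test. The sketch below computes that between-subgroup Q and its degrees of freedom from hypothetical subgroup estimates. For a two-level moderator (df = 1), the chi-square upper-tail probability reduces to erfc(sqrt(Q/2)), which the helper uses; function names and values are assumptions.

```python
import math


def q_between(subgroups):
    """Between-subgroup heterogeneity Q.

    subgroups: list of (pooled_effect, standard_error) pairs, one per subgroup,
    each assumed to come from a random-effects analysis within that subgroup.
    Returns (Q, degrees of freedom).
    """
    w = [1 / se**2 for _, se in subgroups]          # inverse-variance weights
    overall = sum(wi * g for wi, (g, _) in zip(w, subgroups)) / sum(w)
    q = sum(wi * (g - overall) ** 2 for wi, (g, _) in zip(w, subgroups))
    return q, len(subgroups) - 1


def chi2_sf_df1(q):
    """Upper-tail chi-square probability for exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(q / 2))


# Hypothetical moderator: <=20 h vs >20 h of total training.
q, df = q_between([(0.10, 0.10), (0.40, 0.10)])
print(f"Q = {q:.2f}, df = {df}, p = {chi2_sf_df1(q):.3f}")
```

Moderators with three levels, such as intervention type, give df = 2, for which the helper above does not apply; a general chi-square survival function would be needed instead.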

# RESULTS

# Study Selection

After removal of duplicates, 3464 articles were screened for inclusion based on title and abstract. A total of 421 articles were suitable for full-text screening, including one manually added study (**Figure 1**). After full-text screening, 15 studies were eligible for review; however, one focused solely on children and adolescents (Thomas-Stonell et al., 1994) and was therefore excluded, leaving 14 studies for analysis. Age was not a screening criterion, but because TBI can manifest quite differently during adolescent brain development, it was deemed appropriate to exclude this study.

# Characteristics of Included Studies

Data from 575 participants and 169 outcomes were included. The mean number of participants per study was 41.07, with a mean participant age of 38.79 years. Brain injury severity ranged from mild to severe. Time since injury ranged from 1.01 to 14.4 years (mean = 5.3 years, SD = 3.78). Strategy-based interventions were used in four studies (O'Neil-Pirozzi et al., 2010; Vas et al., 2011; Dawson et al., 2013; Twamley et al., 2014), drill and practice training (including computer-assisted training) in six (Ryan and Ruff, 1988; Baribeau et al., 1989; Ruff et al., 1989; Rattok et al., 1992; Potvin et al., 2011; Nelson et al., 2013), and a combination of both in four studies (Goranson et al., 2003; Tiersky et al., 2005; Wai-Kwong Man et al., 2006; Cantor et al., 2014). An active control group was used in seven studies (**Table 1**). The average PEDro score was 5.64/9 (SD = 0.84). Seven studies were RCTs (Ryan and Ruff, 1988; Ruff et al., 1989; Tiersky et al., 2005; Vas et al., 2011; Nelson et al., 2013; Twamley et al., 2014) and seven were non-randomized controlled studies (Baribeau et al., 1989; Rattok et al., 1992; Goranson et al., 2003; Wai-Kwong Man et al., 2006; O'Neil-Pirozzi et al., 2010; Potvin et al., 2011; Dawson et al., 2013; Cantor et al., 2014). Four of the studies confirmed assessor blinding, whilst three reported intention-to-treat analysis.

# Efficacy on Overall Cognitive Outcomes

There was a small, statistically significant positive effect of CT on overall cognitive outcomes (k = 12, g = 0.22, 95% CI 0.05 to 0.38, p = 0.01; **Figure 2A**). Heterogeneity across studies was low (I² = 11.71%, 95% CI 0% to 51.39%). A funnel plot of results did not reveal asymmetry (Egger's intercept = −0.93, p = 0.51; **Figure 3**), suggesting no evidence of systematic bias toward the inclusion of positive (or negative) outcomes.
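For readers less familiar with the statistics above, the following Python sketch shows how a random-effects pooled g, its 95% CI and I² can be obtained from study-level effect sizes and standard errors. It assumes the DerSimonian-Laird estimator of the between-study variance (τ²); the function name is hypothetical, and the study values in the example are invented for illustration, not data from this review.

```python
import math


def pool_random_effects(g, se, z_crit=1.96):
    """DerSimonian-Laird random-effects pooling of study effect sizes."""
    w = [1.0 / s ** 2 for s in se]                  # inverse-variance weights
    sum_w = sum(w)
    g_fixed = sum(wi * gi for wi, gi in zip(w, g)) / sum_w
    q = sum(wi * (gi - g_fixed) ** 2 for wi, gi in zip(w, g))
    df = len(g) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    w_re = [1.0 / (s ** 2 + tau2) for s in se]      # random-effects weights
    g_re = sum(wi * gi for wi, gi in zip(w_re, g)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    p = math.erfc(abs(g_re / se_re) / math.sqrt(2))  # two-sided z-test
    return {"g": g_re, "ci": (g_re - z_crit * se_re, g_re + z_crit * se_re),
            "p": p, "tau2": tau2, "i2": i2, "q": q}


# Hypothetical study-level Hedges' g values and standard errors.
result = pool_random_effects([0.45, 0.10, 0.30, -0.05, 0.25],
                             [0.20, 0.25, 0.18, 0.30, 0.22])
print(result)
```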

# Efficacy on Specific Cognitive Domains

#### Executive Function

The effect size was small and statistically significant (k = 8, g = 0.20, 95% CI 0.02 to 0.39, p = 0.03; **Figure 2B**). Statistical heterogeneity across studies was zero (I² = 0%, 95% CI 0% to 0%), and the funnel plot did not show evidence of asymmetry (**Figure 3**).

#### Verbal Memory

The effect size was moderate and statistically significant (k = 10, g = 0.32, 95% CI 0.14 to 0.50, p < 0.01; **Figure 2C**). Statistical heterogeneity across studies was small (I² = 3.40%, 95% CI 0% to 63.84%), and the funnel plot did not show evidence of asymmetry (Egger's intercept = −1.69, p = 0.21; **Figure 3**).
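Egger's regression test, used in this section to assess funnel plot asymmetry, regresses each study's standardized effect (g/SE) on its precision (1/SE); an intercept far from zero suggests small-study effects. A minimal ordinary least squares sketch in Python follows. The function name is hypothetical, the example values are invented, and the p-value (from a t distribution with k − 2 degrees of freedom) is left to a statistics library.

```python
import math


def egger_regression(g, se):
    """Egger's test: OLS of g/SE on 1/SE; returns intercept, its SE and t."""
    y = [gi / si for gi, si in zip(g, se)]      # standardized effects
    x = [1.0 / si for si in se]                 # precisions
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x         # the asymmetry (bias) term
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r ** 2 for r in residuals) / (n - 2)   # residual variance
    se_intercept = math.sqrt(sigma2 * (1.0 / n + mean_x ** 2 / sxx))
    t = intercept / se_intercept if se_intercept > 0 else float("nan")
    return {"intercept": intercept, "slope": slope,
            "se_intercept": se_intercept, "t": t, "df": n - 2}


# Hypothetical studies: smaller studies (larger SE) report larger effects.
print(egger_regression([0.60, 0.45, 0.30, 0.22, 0.18],
                       [0.35, 0.28, 0.20, 0.15, 0.12]))
```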

#### Working Memory

The effect size was small and statistically non-significant (k = 7, g = 0.06, 95% CI −0.21 to 0.34, p = 0.94; **Figure 4A**). Statistical heterogeneity across studies was moderate (I² = 36.88%, 95% CI 0% to 73.44%), and the funnel plot did not show evidence of asymmetry (**Figure 3**).

#### Attention

The effect size was small and statistically non-significant (k = 6, g = 0.14, 95% CI −0.09 to 0.37, p = 0.22; **Figure 4B**). Statistical heterogeneity across studies was zero (I² = 0%, 95% CI 0% to 61.04%), and the funnel plot did not show evidence of asymmetry (**Figure 3**).

TABLE 1 | Characteristics of included studies.

#### Processing Speed

The effect size was small and statistically non-significant (k = 5, g = 0.22, 95% CI −0.01 to 0.46, p = 0.06; **Figure 4C**). Statistical heterogeneity across studies was zero (I² = 0%, 95% CI 0% to 62.91%), and the funnel plot did not show evidence of asymmetry (**Figure 3**).

#### Non-verbal Memory

The effect size was negative, small and statistically non-significant (k = 4, g = −0.08, 95% CI −0.40 to 0.24, p = 0.63; **Figure 5A**). Statistical heterogeneity across studies was zero (I² = 0%, 95% CI 0% to 0%), and the funnel plot did not show evidence of asymmetry.

#### Visuospatial

The effect size was small and statistically non-significant (k = 4, g = 0.01, 95% CI −0.29 to 0.31, p = 0.94; **Figure 5B**). Statistical heterogeneity across studies was zero (I² = 0%, 95% CI 0% to 62.81%), and the funnel plot did not show evidence of asymmetry.

#### Language

The effect size was small and statistically non-significant (k = 3, g = 0.08, 95% CI −0.27 to 0.43, p = 0.66; **Figure 5C**). Statistical heterogeneity across studies was small (I² = 24.57%, 95% CI 0% to 97.47%), and the funnel plot did not show evidence of asymmetry.

# Efficacy on Overall Functional Outcomes

A pooled analysis of the seven studies reporting functional outcomes revealed a moderate and statistically significant effect size (g = 0.32, 95% CI 0.08 to 0.57, p = 0.01; **Figure 6A**). Heterogeneity across studies was low (I² = 14.27%, 95% CI 0% to 75.39%). The funnel plot revealed asymmetry, suggesting more positive results in smaller studies. A trim and fill analysis yielded a smaller and statistically non-significant adjusted effect size (g = 0.23, 95% CI −0.05 to 0.51, p = 0.11; **Figure 3**).

# Efficacy on Specific Functional Domains

#### Instrumental Activities of Daily Living (IADL)

The effect size was moderate and statistically non-significant (k = 7, g = 0.36, 95% CI −0.04 to 0.75, p = 0.08; **Figure 6B**). Statistical heterogeneity across studies was moderate (I² = 62.07%, 95% CI 13.59% to 83.35%). The funnel plot showed evidence of asymmetry, but trim and fill analysis did not alter the effect size (**Figure 3**).

#### Dysexecutive Functions

The effect size was small and statistically non-significant (k = 2, g = 0.23, 95% CI −0.11 to 0.57, p = 0.19; **Figure 6C**). Statistical heterogeneity across studies was zero (I² = 0%).

# Moderators of CT Efficacy

Possible moderators of training effects on overall cognitive (**Figure 7A**) and functional (**Figure 7B**) outcomes were investigated using subgroup analyses. For overall cognition, we did not find significant between-subgroup differences for study design, intervention type, control type, total hours of training, session length or session frequency. However, there was a strong trend toward less training being more effective on overall cognition, with studies providing 20 h or less of training (g = 0.41, 95% CI 0.14 to 0.68, p < 0.01, I² = 21.40%) being more effective than those that provided more than 20 h (g = 0.06, 95% CI −0.15 to 0.28, p = 0.55; Q = 3.80, df = 1, p = 0.05). To investigate this trend further, we conducted a post hoc correlation between length of training and severity of injury across studies, which was non-significant (r = 0.26, p = 0.44, n = 11). There were no significant between-subgroup differences in overall functional outcomes for any of these moderators. As both IADL and working memory outcomes had moderate heterogeneity, subgroup analyses were conducted for these domains, but no significant differences were found for either. For other domains, heterogeneity was close to zero; thus, subgroup analyses were not warranted.
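The post hoc correlation can be checked from r and n alone: under the null hypothesis, t = r√(n − 2)/√(1 − r²) follows a t distribution with n − 2 degrees of freedom. The Python sketch below assumes a two-sided test and integrates the t density numerically with Simpson's rule so that it needs only the standard library; the function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * (integral of the t density from 0 to |t|)."""
    a = abs(t)
    h = a / steps                       # Simpson's rule needs an even step count
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def correlation_test(r, n):
    """t statistic and two-sided p-value for a Pearson correlation r from n pairs."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    return t, t_two_sided_p(t, df)


t_stat, p_value = correlation_test(0.26, 11)   # values reported above
print(f"t = {t_stat:.2f}, p = {p_value:.2f}")
```

With r = 0.26 and n = 11 this gives t ≈ 0.81 and p ≈ 0.44, in line with the reported value.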

Meta-regression showed no statistically significant relationships between overall cognitive effects and PEDro score (β = 0.06, p = 0.57), sample size (β = 0.004, p = 0.23), or year of publication (β = 0.01, p = 0.18).
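A univariate meta-regression of the kind reported above regresses study effect sizes on a study-level covariate (such as PEDro score) using inverse-variance weights. The Python sketch below is a simplified fixed-effect version, an assumption made for brevity; a random-effects meta-regression would add an estimate of τ² to each study's variance before weighting. The function name and all input values are hypothetical.

```python
import math


def meta_regression(g, se, covariate):
    """Inverse-variance weighted regression of effect sizes on one covariate."""
    w = [1.0 / s ** 2 for s in se]
    sum_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, covariate)) / sum_w
    y_bar = sum(wi * gi for wi, gi in zip(w, g)) / sum_w
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, covariate))
    sxy = sum(wi * (xi - x_bar) * (gi - y_bar) for wi, xi, gi in zip(w, covariate, g))
    beta = sxy / sxx                              # change in g per unit covariate
    se_beta = math.sqrt(1.0 / sxx)
    p = math.erfc(abs(beta / se_beta) / math.sqrt(2))   # two-sided z-test
    return {"beta": beta, "intercept": y_bar - beta * x_bar,
            "se_beta": se_beta, "p": p}


# Hypothetical studies: effect sizes, standard errors and PEDro scores.
print(meta_regression([0.35, 0.10, 0.25, 0.05, 0.40],
                      [0.20, 0.25, 0.18, 0.30, 0.22],
                      [6, 5, 6, 4, 7]))
```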

A matrix was constructed to investigate whether the content of training (the domain/s that were trained) moderated outcomes in specific cognitive domains, i.e., whether there was transfer. A summary of these cognitive outcomes is presented in **Figure 8**, categorized by study and cognitive domain trained. No statistical analysis was run on these data, but the matrix illustrates which cognitive domains were trained (gray cells) and the effect sizes at the study level or pooled at the domain level.

## DISCUSSION

Cognitive-based interventions are effective in several clinical populations (Wykes et al., 2011; Lampit et al., 2014; Leung et al., 2015), and here we expand the evidence base to include post-acute TBI. CT was effective on overall cognition, as well as on the cognitive domains of verbal memory and executive function, and improved a pooled functional outcome combining IADLs and dysexecutive signs and symptoms.

TBI is extremely heterogeneous in its etiology. Accordingly, patients present with a variety of cognitive deficits (Dikmen et al., 2009), with information processing speed and verbal memory most commonly affected (Skandsen et al., 2010). It is therefore promising that this study found not only general cognitive efficacy, but also specific efficacy for executive function and verbal memory. In contrast, a previous meta-analysis (Rohling et al., 2009) found that cognitive rehabilitation was not effective in TBI patients. However, that study combined several different types of cognitive interventions, and patients varied greatly in time since injury. As mentioned, cognitive rehabilitation encompasses a variety of therapeutic approaches, and here we aimed to focus on CT as operationally defined in the introduction. Moreover, the timing of intervention may well be critical. TBI often progresses through stages of unconsciousness and emerging consciousness; confusion with dense anterograde amnesia that can last from days to several weeks; and a long-term period of restoration of cognitive, neuropsychological and social functioning that can last for several years (Povlishock and Katz, 2005). Here we have clarified the literature to some extent and shown that one approach to cognitive rehabilitation, CT, is effective for certain cognitive domains in the post-acute phase.

Cognitive rehabilitation, which can include CT, is known to improve community functioning even several years after TBI (Gordon et al., 2006). Our analyses suggest that CT may itself be sufficient to retrain functional skills or facilitate compensatory mechanisms that translate into everyday outcomes. In the TBI literature, functionality is often measured with IADL scales and assessments of dysexecutive syndrome. Given the importance of both IADLs and dysexecutive syndrome to everyday life, it is noteworthy that CT produced a moderate effect size on these outcomes when combined. Furthermore, the low heterogeneity surrounding this estimate indicates little between-study variation beyond what would be expected from sampling error. Whilst combining these two outcomes may appear novel, previous studies have shown loose connections between the two (Pa et al., 2009; Marshall et al., 2011). Importantly, this result suggests that CT has the potential to achieve so-called "far-transfer" (Barnett and Ceci, 2002), positively influencing real-world issues faced by TBI patients.

FIGURE 7 | Subgroup analysis of moderators for (A) overall cognitive outcomes, and (B) overall functional outcomes.

FIGURE 8 | Matrix of training content against effect size of each cognitive outcome from the individual studies. Gray cells indicate the study trained in the domain. EF = executive functions; WM = working memory. ∗∗p < 0.05

Despite these combined results, neither IADL nor dysexecutive outcomes improved significantly when considered separately. This may be due to insufficient power: few studies examined these outcomes, sample sizes were small, and the separate analyses of the two domains had wider CIs. Positive effects on daily function were therefore restricted to the pooled analysis of combined dysexecutive and IADL outcomes. Although this approach has some precedent (Pa et al., 2009; Marshall et al., 2011) and was planned a priori, it raises the question of which outcome measures can reasonably be combined within a meta-analysis, a topic treated in detail by Borenstein et al. (2009). In their example, combining tests of Maths and English is justified: "If our goal is to assess the impact on performance in general, then the answer is Yes." (Borenstein et al., 2009, p. 357). Our goal was to assess the impact of CT on those areas that most affect day-to-day function in TBI rehabilitation, inclusive of both dysexecutive syndrome (Rao and Lyketsos, 2000) and impaired IADLs (Colantonio et al., 2004). Hence, there are promising indications that CT can help support daily function in chronic TBI patients, but clearly more research is required to parse these effects out.
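Borenstein et al. (2009) describe how two outcomes from the same study can be merged into one composite effect: the effect is the mean of the two, but its variance depends on the correlation r between the outcomes, V = (V₁ + V₂ + 2r√(V₁V₂))/4. The Python sketch below implements that formula. The correlation usually has to be assumed (here a hypothetical r = 0.5), and r = 1 gives the most conservative variance; the function name and inputs are illustrative, not data from the included studies.

```python
import math


def combine_two_outcomes(g1, v1, g2, v2, r):
    """Composite effect of two correlated outcomes from the same study.

    g1, g2: effect sizes; v1, v2: their variances; r: assumed correlation.
    """
    g = (g1 + g2) / 2.0
    v = 0.25 * (v1 + v2 + 2.0 * r * math.sqrt(v1 * v2))
    return g, v


# Hypothetical IADL and dysexecutive effects from one study.
g_combined, v_combined = combine_two_outcomes(0.40, 0.06, 0.20, 0.05, r=0.5)
print(f"g = {g_combined:.2f}, SE = {math.sqrt(v_combined):.3f}")
```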

Interestingly, CT had a significant effect on executive function but not on dysexecutive outcomes. This may appear paradoxical given that the two outcomes are intrinsically (and inversely) related (Ardila, 2013). However, this pattern of results can be explained by the nature of the data. Executive outcomes originate from neuropsychological tests that are generally objective, quantitative and continuous, and thus sensitive to change, whilst dysexecutive instruments are generally subjective, qualitative and ordinal. These instruments are therefore of lower resolution by nature and require much larger behavioral change before it can be detected. Further research is therefore required to determine whether CT can improve not just psychometric executive function but also minimize the presentation or severity of dysexecutive symptoms in post-acute TBI.

Of the potential moderators analyzed, a strong trend was found only for training hours. Studies in which participants trained for ≤20 h showed larger improvements in overall cognition than studies in which patients trained for longer. This is consistent with evidence of weaker effect sizes in studies that provided intense training schedules (Lampit et al., 2014) or long training durations (Toril et al., 2014) in healthy older adults. A possible explanation for this trend is heterogeneity in injury severity, whereby patients with more severe injuries may have received more training while also, by assumption, showing less improvement. We conducted a post hoc analysis to test this explanation but did not find a relationship between length of training and injury severity across studies. However, given that this post hoc analysis was conducted on such a small sample and thus lacks power, we cannot completely rule out that the trend in training time is linked with injury severity. Nonetheless, it is intriguing that across different clinical cohorts there may be converging evidence for the importance of avoiding overdosing, or over-training, participants. This concept is even more salient in the field of TBI, where rehabilitation is often guided by the principle that greater intensity or more repetitions are better. Our findings suggest that CT at a circumscribed dose, delivered at the right time in the post-acute stage, may be preferable.

Other possible moderators were found to be non-significant, consistent with the small number of studies and minimal explainable between-study variance, a concept we have previously discussed (Leung et al., 2015). More specifically, our data suggest that an important study design factor, whether or not randomization occurred, did not affect CT efficacy in post-acute TBI. We cannot rule out that this reflects a lack of power, although the similar effect sizes obtained with the two designs argue against this. This finding supports our decision to combine non-randomized studies and RCTs in a single analysis.

To further explore moderating or driving factors of cognitive outcomes, we investigated whether there was a link between training content and cognitive outcomes (**Figure 8**). For this population, cross-transfer, the idea that training in one domain can result in improvements in another, untrained domain, appears to be unlikely. This is evident in the columns for working memory, speed, language and executive functions: as the few gray cells indicate, there was minimal training in these domains, although there was training in many other domains. The absence of significant results in these columns, albeit in clearly underpowered analyses, is consistent with a lack of cross-transfer, a sentiment mirrored in previous research (Edwards et al., 2002). Importantly, we cannot conclude that certain domains, such as working memory, speed, language and executive functions, are ineffective or non-responsive in this population. Instead, this figure suggests a need for more trials that train or target multiple different cognitive domains.

Limitations include potential selection bias. Our narrow eligibility criteria and decision to include only studies published in English resulted in the exclusion of well-controlled CT studies, such as trials implemented before 12 months post-injury. We chose this temporal window for clinical reasons, namely to minimize the confounding effects of the spontaneous recovery of function that can occur during the acute and sub-acute stages (Sohlberg and Mateer, 2001). A caveat to this criterion was our reliance on study-level characteristics: some studies included participants from 3 months post-injury onward, resulting in large variations in time since injury, even though the reported study average was >12 months post-injury. Clarifying the specificity of our findings to this temporal window would require a patient-level meta-analysis. In addition, our decision to include only functional outcomes that could be categorized as IADL or dysexecutive functioning was a potential source of selection bias, but one we consider clinically principled, since functional outcomes from three studies (e.g., 'Life-3'; Dawson et al., 2013; Cantor et al., 2014; Twamley et al., 2014) were idiosyncratic and deemed incomparable.

A notable limitation of our analysis is the heterogeneity in injury severity, but this reflects the state of the field. Indeed, many of the included studies themselves comprised patients of varying TBI severity (from mild to severe). However, low statistical heterogeneity indicates that there was no other important source of between-study variance besides total hours of training. Average PEDro quality scores were relatively low, but this was mainly because two points are allocated to randomization procedures. We specifically tested this factor (randomized vs. non-randomized design) and found that it did not influence effect size estimates. Perhaps the largest limitation of our study is the relative infancy of the field. With only six of the included studies being RCTs, the field is somewhat nascent, and our results must be viewed with some skepticism. Nonetheless, there was enough power to show effectiveness of CT on overall cognitive and functional outcomes, although future research with more rigorous trial design and reporting is clearly required.

TBI is fundamentally heterogeneous and manifests in complex and unpredictable patterns, resulting in diverse physical, behavioral, cognitive and functional outcomes. Discerning therapeutic efficacy in this population is therefore challenging. Despite this potential for background 'noise' and the limited studies in a still-developing field, we found encouraging results with implications for clinical practice. Namely, significant cognitive gains were seen as a result of CT more than one year post-injury, when spontaneous neurological recovery is assumed to have stabilized. It is encouraging that there may be a link between training intensity and overall efficacy, but further studies with larger sample sizes and more heterogeneous populations are required to explore this relationship. Small samples and lack of power have left the effectiveness of CT in TBI inconclusive in individual studies; this is precisely the condition in which a meta-analysis can add value and clarity to a field (Borenstein et al., 2009). This meta-analysis thereby provides evidence that CT may be modestly effective in promoting cognitive and functional gains in everyday life. Accordingly, further investigation of different approaches to CT is required, along with health economic analyses of the costs and benefits of CT for post-acute TBI.

# AUTHOR CONTRIBUTIONS

HH, DC, KD and MV: Design and/or conceptualization of the study. HH, DC, JF, AL and MV: Analysis and/or interpretation of the data. HH, DC, KD, JF, AL and MV: Drafting and/or revising the manuscript.

# FUNDING

JF is supported by an NHMRC project grant (ID 1083064). AL is an ARC-NHMRC Dementia Research Development Fellow (ID 1108520). MV is an NHMRC Clinical Career Development Research Fellow (ID 1112813).

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fnhum.2016.00537/full#supplementary-material

#### REFERENCES

traumatic brain injury. Arch. Phys. Med. Rehabil. 86, 1565–1574. doi: 10.1016/j.apmr.2005.03.013


Wykes, T., Huddy, V., Cellard, C., McGurk, S. R., and Czobor, P. (2011). A meta-analysis of cognitive remediation for schizophrenia: methodology and effect sizes. Am. J. Psychiatry 168, 472–485. doi: 10.1176/appi.ajp.2010.10060855

**Conflict of Interest Statement:** MV and AL receive in-kind research support in the form of no-cost software from BrainTrain Inc. and Synaptikon GmbH for projects unrelated to this work. All the other authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer JR and handling Editor declared their shared affiliation, and the handling Editor states that the process nevertheless met the standards of a fair and objective review.

Copyright © 2016 Hallock, Collins, Lampit, Deol, Fleming and Valenzuela. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Temporal Dynamics of the Default Mode Network Characterize Meditation-Induced Alterations in Consciousness

Rajanikant Panda<sup>1,2†</sup>, Rose D. Bharath<sup>1,2\*†</sup>, Neeraj Upadhyay<sup>3</sup>, Sandhya Mangalore<sup>2</sup>, Srivas Chennu<sup>4,5</sup> and Shobini L. Rao<sup>1</sup>

<sup>1</sup> Cognitive Neuroscience Center, National Institute for Mental Health and Neurosciences, Bangalore, India, <sup>2</sup> Department of Neuroimaging and Interventional Radiology, National Institute for Mental Health and Neurosciences, Bangalore, India, <sup>3</sup> Department of Neurology and Psychiatry, Sapienza University of Rome, Rome, Italy, <sup>4</sup> School of Computing, University of Kent, Chatham Maritime, UK, <sup>5</sup> Department of Clinical Neurosciences, University of Cambridge, Cambridge, UK

Current research suggests that human consciousness is associated with complex, synchronous interactions between multiple cortical networks. In particular, the default mode network (DMN) of the resting brain is thought to be altered by changes in consciousness, including the meditative state. However, it remains unclear how meditation alters the fast and ever-changing dynamics of brain activity within this network. Here we addressed this question using simultaneous electroencephalography (EEG) and functional magnetic resonance imaging (fMRI) to compare the spatial extents and temporal dynamics of the DMN during rest and meditation. Using fMRI, we identified key reductions in the posterior cingulate hub of the DMN, along with increases in right frontal and left temporal areas, in experienced meditators during rest and during meditation, in comparison to healthy controls (HCs). We employed the simultaneously recorded EEG data to identify the topographical microstate corresponding to activation of the DMN. Analysis of the temporal dynamics of this microstate revealed that the average duration and frequency of occurrence of the DMN microstate were higher in meditators than in HCs. Both of these temporal parameters increased during meditation, reflecting the state effect of meditation. In particular, we found that the change in the duration of the DMN microstate when meditators entered the meditative state correlated negatively with their years of meditation experience. This reflected a trait effect of meditation, highlighting its role in producing durable changes in the temporal dynamics of the DMN. Taken together, these findings shed new light on the short- and long-term consequences of meditation practice for this key brain network.

Keywords: default mode network, microstate, DMN-microstate, simultaneous EEG-fMRI, meditation

#### Edited by:

Claudia Voelcker-Rehage, Chemnitz University of Technology, Germany

#### Reviewed by:

Tao Liu, Zhejiang University, China; Márk Molnár, Hungarian Academy of Sciences, Hungary

#### \*Correspondence:

Rose D. Bharath, cns.researchers@gmail.com; drrosedawn@yahoo.com

†These authors have contributed equally to this work.

> Received: 08 May 2016 Accepted: 11 July 2016 Published: 22 July 2016

#### Citation:

Panda R, Bharath RD, Upadhyay N, Mangalore S, Chennu S and Rao SL (2016) Temporal Dynamics of the Default Mode Network Characterize Meditation-Induced Alterations in Consciousness. Front. Hum. Neurosci. 10:372. doi: 10.3389/fnhum.2016.00372

#### INTRODUCTION

The grand challenge of characterizing the dynamic neural substrate underlying human consciousness has captured the interest of many researchers across disciplinary boundaries, covering altered states ranging from sleep, meditation, hypnosis and anesthesia to coma, disorders of consciousness, delirium tremens and psychoses. Normal consciousness is thought to require both wakefulness and awareness, and several neuroscientific studies conceptualize wakefulness as a continuum with different levels of awareness (Grill-Spektor et al., 2000; Bar et al., 2001; Kouider et al., 2010). Resting-state functional magnetic resonance imaging (rsfMRI) has been used to study the neural correlates of conscious awareness in normal and altered states of consciousness. In particular, this research enterprise has highlighted and progressively refined our understanding of the so-called default mode network (DMN) of the brain, consisting of the precuneus/posterior cingulate cortex (PCC), medial prefrontal cortex (mPFC), temporoparietal junction (TPJ) and hippocampal formation including the parahippocampal cortex, as a key neural correlate of consciousness (Buckner et al., 2008). Researchers have also studied changes in the DMN as a function of meditative and introspective cognitive states such as daydreaming, mind wandering and autobiographical memory retrieval (Baerentsen et al., 2010; Baron Short et al., 2010; Hasenkamp et al., 2012; Garrison et al., 2013). These investigations into the neurophenomenology of meditation have found PCC deactivation to be associated with "undistracted awareness" and "effortless doing", and PCC activation to be linked to "distracted awareness" and "controlling", in experienced meditators (Garrison et al., 2013). This evidence is in line with the contrasting approaches of two major types of meditation, which emphasize either focused attention or open monitoring. Focused attention meditation aims to reduce mind wandering by concentrating on objects such as the breath, sounds or mental imagery, whereas open monitoring meditation encourages mind wandering and makes one aware of this process (Xu et al., 2014). Extensive study of the DMN in health and disease has also found correlations between DMN connectivity and sleep (Fukunaga et al., 2006; Horovitz et al., 2008; Picchioni et al., 2008), anesthesia (Vincent et al., 2007) and disorders of consciousness (Fernández-Espejo et al., 2012; Guldenmund et al., 2012).

While rsfMRI has given us a fine-grained spatial understanding of the DMN in these states of consciousness, conventional analytical approaches are not well suited to measuring the temporal dynamics of its activity, especially during meditation-induced alterations in the state of consciousness. To better understand the temporal dynamics of the DMN, researchers have developed a range of techniques, including time-resolved resting-state fMRI analysis, in which DMN connectivity is assessed repeatedly using sliding temporal windows (Chang and Glover, 2010; Hutchison et al., 2013; Leonardi et al., 2013; Allen et al., 2014; Zalesky et al., 2014). Alternative techniques to obtain temporal detail with fMRI include the identification of temporal functional modes using temporal independent component analysis (ICA; Smith et al., 2012), modified seed-to-voxel connectivity to define coactivation patterns (Liu and Duyn, 2013) and, more recently, innovation-driven coactivation patterns (Karahanoglu and Van De Ville, 2015). Although these methods have attempted to capture the temporal dynamics of the DMN, the time scales of the observed fluctuations in its activity range from tens of seconds to a few minutes in fMRI-based studies (Chang and Glover, 2010; Handwerker et al., 2012; Hutchison et al., 2013). In contrast, modeling of electromagnetic brain dynamics using electroencephalography (EEG) and magnetoencephalography (MEG) suggests that these neuronal fluctuations, called microstates, last 100–200 ms (Brandeis and Lehmann, 1989; Pascual-Marqui et al., 1995; Michel et al., 2004; Lehmann et al., 2010; Baker et al., 2014). The concept of microstates was first proposed and demonstrated by Lehmann et al. (1987), who described brain states that remain stable for 80–120 ms before rapidly evolving into another quasi-stable microstate.
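The sliding-window approach mentioned above can be sketched in a few lines of Python: the Pearson correlation between two regional time series is recomputed within a window that advances by a fixed step. The function names, window length, step and synthetic signals below are hypothetical choices for illustration; real analyses add preprocessing steps that are not shown here.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def sliding_window_correlation(x, y, window, step):
    """Correlation between x and y in each window of `window` samples,
    advancing by `step` samples."""
    return [pearson(x[i:i + window], y[i:i + window])
            for i in range(0, len(x) - window + 1, step)]


# Hypothetical signals sampled once per TR: a slow shared rhythm, plus a
# component that appears in the second region only in the second half.
n_volumes = 200
shared = [math.sin(2 * math.pi * t / 40) for t in range(n_volumes)]
extra = [math.sin(2 * math.pi * t / 7) if t >= 100 else 0.0 for t in range(n_volumes)]
region_a = shared
region_b = [s + e for s, e in zip(shared, extra)]
print([round(r, 2) for r in sliding_window_correlation(region_a, region_b, 30, 10)])
```

The correlation stays at 1 in early windows and drops once the extra component appears, which is the kind of time-varying connectivity that a single whole-scan correlation would average away.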
The most common parameter used to quantify microstate dynamics is duration (or lifespan), which is the average length of time each microstate remains stable whenever it appears. Another useful parameter is frequency of occurrence, which is the average number of times per second that the microstate becomes dominant (Lehmann et al., 1987). Most microstate studies report four classic microstates that can explain more than 70% of the variation in the scalp topographies of EEG time series (Tei et al., 2009; Khanna et al., 2015) and that have been found to correlate with rsfMRI networks associated with phonological processing, visual processing, the salience network and attentional switching (Mantini et al., 2007; Britz et al., 2010). Other studies have reported a higher number of microstates (10–13) with different analysis methods (Musso et al., 2010; Yuan et al., 2012), some of which correlate with the DMN. Here, we refer to the "DMN microstate" as the EEG microstate that correlated maximally with the DMN identified with fMRI. Such microstate analyses have been applied to the study of meditation, where increased microstate duration has been reported in EEG-based studies of Chan meditators (Ch'an Mo'chao) or Vipassana meditators (Faber et al., 2005; Lo and Zhu, 2009; Tei et al., 2009). Microstate parameters have also been shown to be modulated by psychiatric disorders (Dierks et al., 1997; Lehmann et al., 2005; Kikuchi et al., 2011; Nishida et al., 2013) and even by personality type (Schlegel et al., 2012).
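The two parameters defined above can be computed from a sequence of microstate labels, one per EEG sample. The Python sketch below assumes that every sample has already been assigned a microstate label (for example by back-fitting template maps) and treats each run of identical labels as one occurrence; the labels, sampling rate and function name are hypothetical. In practice, segments truncated at the start or end of a recording are often excluded, which this sketch does not do.

```python
from itertools import groupby


def microstate_parameters(labels, fs):
    """Mean duration (ms), occurrence (per s) and coverage for each microstate.

    labels: sequence with one microstate label per sample; fs: sampling rate (Hz).
    """
    total_seconds = len(labels) / fs
    runs = {}
    for label, run in groupby(labels):          # consecutive identical labels
        runs.setdefault(label, []).append(sum(1 for _ in run))
    params = {}
    for label, lengths in runs.items():
        params[label] = {
            "mean_duration_ms": 1000.0 * sum(lengths) / len(lengths) / fs,
            "occurrence_per_s": len(lengths) / total_seconds,
            "coverage": sum(lengths) / len(labels),
        }
    return params


# Hypothetical 1-s label sequence at 10 Hz (real EEG is sampled far more densely).
print(microstate_parameters("AAABBAAAAC", fs=10))
```

Here microstate A appears twice (runs of 3 and 4 samples), so its mean duration is 350 ms and its occurrence is 2 per second.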

In this study, we analyzed the DMN microstate to understand the mechanisms of meditation-induced alterations in consciousness. By contrasting healthy controls (HCs) at rest with expert meditators at rest and during meditation, we explored both state and trait changes in DMN-microstate dynamics produced by meditation, hypothesizing that these would be reflected in differential alterations of its duration and frequency. The state changes experienced during meditation are usually described as a deep sense of calm peacefulness, a cessation or slowing of the mind's internal dialog, and conscious awareness merging completely with the object of meditation (Brown, 1977; Wallace, 1999). In addition, long-term expertise in meditation produces durable changes in neural dynamics, with improvements in mental and physical health presumably due to its trait effects (Chiesa and Serretti, 2010, 2011; Hofmann et al., 2010). Here, we describe changes in the spatial configuration of the DMN as a function of meditation, and show that state and trait influences on the temporal dynamics of the DMN microstate can indeed be dissociated.

## MATERIALS AND METHODS

#### Participants

This was a prospective study conducted at a tertiary neurological institute, the National Institute of Mental Health and Neurosciences (NIMHANS) in Bangalore, India. The study was performed after obtaining informed written consent from the participants, who were recruited as healthy participants in a multi-institutional study on cognitive networks. Ethical approval was obtained from the institutional ethics committee for studies involving humans, convened by NIMHANS (No.NIMHANS/69th IES/2010). The meditator cohort included 20 expert Raja Yoga meditators (male; age: 35 ± 7.9 years; years of education: 15.4 ± 1.6 years; right-handed) from the Brahma Kumaris Spiritual Organization, all with more than 10 years of Raja Yoga meditation practice. Raja Yoga meditation involves internally visualizing a glowing star as rays emerging between the eyebrows and can thus be considered a form of focused-attention meditation. Participants reported spending 1.68 ± 0.59 h per day in meditation over the last 10–22 years (15.2 ± 3.54 years). They also reported a cumulative lifetime experience of 11332.5 ± 6009.86 h of meditation practice (calculated by combining the number of hours per day with the years of meditation practice). Twenty HCs were also recruited for the study. The control participants were matched with the meditator cohort for age, gender, education and ethnicity (male; age: 29 ± 6.8 years; years of education: 16.1 ± 1.1 years; right-handed), and the two groups were comparable. None of the controls had experience of any type of regular meditative practice. Both meditators and controls were multilingual (languages known: 3.38 ± 0.58), with Kannada as their first language. None of the participants had any history of neurological or psychiatric illness or prior trauma, and none was on any chronic medication that could affect the experiment.

#### Experiment Design

The study design was explained to the subjects and instructions were given before the EEG-fMRI data collection in the MRI scanner. The participants (both controls and meditators) were instructed to lie awake with their eyes closed in a relaxed state within the MRI gantry, and to refrain from any cognitive, language or motor tasks during the acquisition. Ear plugs were given to reduce scanner-induced noise. Initially, a 4.24 min structural MRI was recorded, which allowed the participants to become familiar with the MRI environment before the rest and meditation sessions started. A single resting EEG-fMRI run (9.24 min) was obtained in HCs. In meditators, however, two consecutive resting EEG-fMRI runs (9.24 min each) were recorded: one during resting wakefulness, followed immediately by one during meditation. During a post hoc interview, all meditators reported that they could satisfactorily achieve the meditative state inside the scanner. This was subsequently confirmed by analysis of the EEG waveform: we visually inspected the EEG time series in the time and spectral domains to identify an increase in alpha oscillations during meditation, consistent with previous studies that have reported this increase (Takahashi et al., 2005; Lagopoulos et al., 2009).

# Data Acquisition

#### EEG Data Acquisitions

EEG data were recorded using a 32-channel MR-compatible EEG system (Brain Products GmbH, Gilching, Germany). The EEG cap (BrainCap MR, Brain Products) consisted of 31 scalp electrodes placed according to the international 10–20 system and one additional electrode dedicated to the ECG, which was placed on the subject's back. Data were recorded relative to an FCz reference, with a ground electrode (GND) positioned according to the 10–20 system. Data were sampled at 5000 Hz to enable removal of the MRI gradient artifact. The impedance between electrode and scalp was kept below 5 kΩ. EEG was recorded using the Brain Recorder software (Version 1.03, Brain Products). To prevent head movement, we placed sufficient padding between the subject's head and the head coil of the MRI scanner. Each resting EEG recording lasted as long as the corresponding rsfMRI recording (9.24 min) and was time-locked to it.

#### fMRI Data Acquisition

rsfMRI was acquired using a 3T scanner (Skyra, Siemens, Erlangen, Germany). One hundred and eighty-five gradient-echo echo-planar imaging (EPI) volumes were obtained with the following parameters: 36 slices acquired in an interleaved manner, 4 mm slice thickness, field of view (FOV) 192 × 192 mm, matrix 64 × 64, repetition time (TR) 3000 ms, echo time (TE) 35 ms, refocusing pulse 90°, matrix 256 × 256 × 114, voxel size 3 × 3 × 4 mm. We also acquired a three-dimensional magnetization-prepared rapid acquisition gradient echo (3D MPRAGE) sequence for anatomical information (voxel size 1 × 1 × 1 mm, matrix 192 × 192 × 256) for better registration and overlay of brain activity.
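The in-plane voxel size and the length of each run follow directly from the acquisition parameters above. The short Python sketch below makes that arithmetic explicit using only values stated in this section and in the preprocessing section (185 volumes acquired, 5 discarded); the constant and function names are illustrative and are not part of any scanner or FSL interface.

```python
# Illustrative constants copied from the EPI parameters described in the text.
FOV_MM = 192.0            # in-plane field of view (mm)
MATRIX = 64               # in-plane acquisition matrix
SLICE_THICKNESS_MM = 4.0  # slice thickness (mm)
TR_S = 3.0                # repetition time (s)
N_VOLUMES = 185           # EPI volumes acquired per run
N_DISCARDED = 5           # initial volumes dropped for signal equilibration


def in_plane_voxel_mm(fov_mm: float, matrix: int) -> float:
    """In-plane voxel edge length: field of view divided by matrix size."""
    return fov_mm / matrix


def run_duration_s(n_volumes: int, tr_s: float) -> float:
    """Acquisition time of one run: number of volumes times TR."""
    return n_volumes * tr_s


if __name__ == "__main__":
    voxel = in_plane_voxel_mm(FOV_MM, MATRIX)
    print(f"voxel size: {voxel:.1f} x {voxel:.1f} x {SLICE_THICKNESS_MM:.1f} mm")
    print(f"run length: {run_duration_s(N_VOLUMES, TR_S):.0f} s")
    print(f"volumes analysed: {N_VOLUMES - N_DISCARDED}")
```

With the listed values this gives 3.0 × 3.0 × 4.0 mm voxels (matching the stated voxel size), a run of 555 s, and 180 volumes entering the analysis.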

# Data Analysis

#### EEG Artifact Removal and Preprocessing

Raw EEG data were processed offline using BrainVision Analyzer software version 2 (Brain Products GmbH, Gilching, Germany). Gradient artifacts were corrected according to Allen et al. (2000), using a moving-average width of 20 MR volumes (TRs). The corrected EEG data were filtered with a high-pass filter at 0.03 Hz and a low-pass filter at 75 Hz, and then down-sampled to 250 Hz. Ballistocardiogram (BCG) artifacts were removed from the EEG with an average-subtraction method based on heartbeat events (R peaks; Allen et al., 1998, 2000; Goldman et al., 2000), implemented in BrainVision Analyzer 2. Once gradient and BCG artifacts were removed, the data were visually inspected for artifacts resulting from muscular sources or head movement, and any epoch in which any channel varied by more than 150 µV was rejected. Finally, the signal was band-pass filtered at 0.01–45 Hz. To confirm that the meditators were in the meditative state during the second session, EEG spectral analysis was carried out over the entire 9.24 min of the resting state and the entire 9.24 min of the meditative state. A fast Fourier transform (FFT) was used to calculate spectral power density (µV²).
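The 150 µV rejection rule and the FFT-based band-power comparison can be sketched with NumPy as below. This is not BrainVision Analyzer's implementation: we assume the 150 µV criterion refers to the peak-to-peak range of a channel within an epoch, and the theta/alpha/beta band edges are conventional choices that the text does not specify. Power is computed as |FFT|²/N, a scaling that is adequate for comparing the same band across conditions but is not a calibrated spectral density.

```python
import numpy as np

FS = 250.0  # Hz, sampling rate after down-sampling
# Assumed band edges (Hz); the text does not state the exact limits used.
BANDS = {"theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}


def reject_epochs(epochs: np.ndarray, limit_uv: float = 150.0) -> np.ndarray:
    """Keep epochs (n_epochs, n_channels, n_samples) in which no channel's
    peak-to-peak range exceeds limit_uv."""
    ptp = epochs.max(axis=-1) - epochs.min(axis=-1)  # (n_epochs, n_channels)
    keep = (ptp <= limit_uv).all(axis=1)
    return epochs[keep]


def band_power(signal: np.ndarray, fs: float, band: tuple) -> float:
    """Mean FFT power (uV^2, up to a constant scale) of a 1-D signal in [low, high) Hz."""
    spectrum = np.fft.rfft(signal - signal.mean())
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    power = np.abs(spectrum) ** 2 / signal.size
    low, high = band
    mask = (freqs >= low) & (freqs < high)
    return float(power[mask].mean())


if __name__ == "__main__":
    # Synthetic 10 Hz rhythm: larger amplitude stands in for the meditation run.
    t = np.arange(0, 10, 1 / FS)
    rest = 5 * np.sin(2 * np.pi * 10 * t)
    meditation = 20 * np.sin(2 * np.pi * 10 * t)
    print(band_power(meditation, FS, BANDS["alpha"]) > band_power(rest, FS, BANDS["alpha"]))
```

Comparing the same band between the rest and meditation runs of each meditator is what the spectral check in the text relies on; the synthetic signals here only illustrate the computation.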

#### EEG Feature Extraction

The EEG microstates were derived from the resting EEG data using sLORETA software. To compute microstates, the local maxima of the global field power (GFP; Lehmann and Skrandies, 1980) were identified. Since scalp topography remains stable around GFP peaks as a result of temporal smoothing in sLORETA, the topographies at these peaks are representative of the transient microstates (Pascual-Marqui et al., 1995; Koenig et al., 2002). The algorithm used to estimate microstates is a modified version of the classical k-means clustering method in which cluster orientations are estimated. An optimal number of 13 clusters was determined by means of a cross-validation criterion (Pascual-Marqui et al., 1995). Every time point of the EEG was then assigned to one of these 13 microstates, yielding a temporal sequence of microstates for each recording. We used this EEG microstate decomposition to guide the analysis of the rsfMRI data at the single-subject level.
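The GFP-peak and clustering steps can be illustrated with a compact NumPy sketch. It follows the general recipe of polarity-invariant ("modified") k-means described by Pascual-Marqui et al. (1995): topographies are average-referenced and normalized, assignment uses the absolute spatial correlation so that a map and its inverse fall in the same class, and each template is updated as the principal eigenvector of its members' scatter matrix. It is not the sLORETA software's code; the deterministic farthest-map initialization and the sample-wise back-fitting are our own simplifications, and the number of states is passed in rather than chosen by cross-validation.

```python
import numpy as np


def global_field_power(eeg: np.ndarray) -> np.ndarray:
    """GFP per sample: spatial standard deviation of average-referenced data.
    eeg has shape (n_channels, n_samples)."""
    avg_ref = eeg - eeg.mean(axis=0, keepdims=True)
    return avg_ref.std(axis=0)


def gfp_peaks(gfp: np.ndarray) -> np.ndarray:
    """Sample indices of strict local maxima of the GFP curve."""
    inner = gfp[1:-1]
    return np.where((inner > gfp[:-2]) & (inner > gfp[2:]))[0] + 1


def _normalize(maps: np.ndarray) -> np.ndarray:
    """Average-reference each topography (row) and scale it to unit norm."""
    x = maps - maps.mean(axis=1, keepdims=True)
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def modified_kmeans(maps: np.ndarray, n_states: int, n_iter: int = 100):
    """Polarity-invariant k-means on topographies of shape (n_maps, n_channels).
    Returns (templates, labels); templates are unit-norm rows."""
    x = _normalize(np.asarray(maps, dtype=float))
    # Deterministic farthest-map initialization (a simplification).
    chosen = [0]
    for _ in range(1, n_states):
        similarity = np.abs(x @ x[chosen].T).max(axis=1)
        chosen.append(int(similarity.argmin()))
    templates = x[chosen].copy()
    for _ in range(n_iter):
        # Absolute correlation ignores map polarity.
        labels = np.abs(x @ templates.T).argmax(axis=1)
        updated = templates.copy()
        for k in range(n_states):
            members = x[labels == k]
            if len(members):
                # Principal eigenvector of the scatter matrix = polarity-free centroid.
                _, vecs = np.linalg.eigh(members.T @ members)
                updated[k] = vecs[:, -1]
        converged = np.allclose(np.abs((updated * templates).sum(axis=1)), 1.0)
        templates = updated
        if converged:
            break
    labels = np.abs(x @ templates.T).argmax(axis=1)
    return templates, labels


def backfit(eeg: np.ndarray, templates: np.ndarray) -> np.ndarray:
    """Assign every sample of eeg (n_channels, n_samples) to the template with
    the largest absolute spatial correlation."""
    avg_ref = eeg - eeg.mean(axis=0, keepdims=True)
    return np.abs(templates @ avg_ref).argmax(axis=0)
```

In practice the clustering would be run on the topographies at `gfp_peaks(global_field_power(eeg))`, and `backfit` then labels the full recording.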

#### rsfMRI Preprocessing

The rsfMRI analysis was performed using FSL (FMRIB's Software Library<sup>1</sup> ), in particular its fMRI Expert Analysis Tool (FEAT) and Multivariate Exploratory Linear Optimized Decomposition into Independent Components (MELODIC) modules. The first five functional images (EPI volumes) were discarded from each subject's rsfMRI data to allow for signal equilibration, leaving 180 volumes for analysis. Motion correction was performed using MCFLIRT (Jenkinson et al., 2002), and non-brain tissue (scalp, CSF, etc.) was removed using the Brain Extraction Tool (BET; Smith, 2002). Average head motion did not differ significantly between the groups. Spatial smoothing was performed using a Gaussian kernel of 5 mm full width at half maximum (FWHM), followed by mean-based intensity normalization of all volumes by the same factor, and then high-pass temporal filtering with sigma 180 s. These temporal filtering parameters were based on the previously validated methodology of Beckmann and Smith (Smith et al., 2004; Beckmann and Smith, 2005; Beckmann et al., 2005). The EPI volumes were co-registered to each subject's high-resolution anatomical volume and to the MNI152 standard-space template, and the filtered data were re-sampled into standard space at the fMRI resolution (3 mm) using FNIRT, the FMRIB non-linear image registration tool (Klein et al., 2009). Two participants from each group were excluded due to uncorrectable ballistocardiographic and motion artifacts during data preprocessing.

#### Identification of DMN-Microstate

The spectral Z-stats maps of the EEG microstates were created by convolving all 13 microstates with the gamma hemodynamic response function, using the customized three-column format in FEAT Version 5.98, part of FSL (FMRIB's Software Library<sup>2</sup> ). The GLM was modeled as an event-related design, in which the EEG microstates were treated as explanatory variables (regressors) for the rsfMRI analysis. This procedure enabled us to combine EEG and fMRI time series despite their very different sampling rates: the onset time and duration of each EEG microstate were provided as inputs to the GLM and convolved with a gamma hemodynamic response function (Musso et al., 2010). The rsfMRI data, thus informed by the EEG microstates, were analyzed at the individual level to derive subject-wise Z-stats maps for each microstate. We then examined the spatial correlation (R) between each spectral Z-stats map and the DMN component estimated by ICA of blood-oxygen-level-dependent (BOLD) activity (see below). Using the FSL utility "fslcc", Z-stats maps that correlated with the BOLD ICA map of the DMN with a correlation of at least 0.3 were selected for further analysis (Kiviniemi et al., 2011). When more than one microstate Z-map correlated with the DMN, the one with the highest correlation was selected. This subject-wise spectral Z-stats map of the DMN microstate was included in further group analysis. Group-level analysis was done using the higher-level FEAT tool, where three group mean effects (controls at rest, meditators at rest and meditators during meditation) were extracted by specifying the contrasts (1 0 0; 0 1 0; 0 0 1), respectively.

<sup>1</sup>www.fmrib.ox.ac.uk/fsl
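Two bookkeeping steps in this paragraph, turning a microstate label sequence into per-class onset/duration regressors and keeping the microstate Z-map that best matches the DMN, can be sketched in plain Python. The three-column rows follow FEAT's custom EV layout (onset in seconds, duration in seconds, weight). The helper names are hypothetical, the Pearson correlation over voxel values is a stand-in for fslcc rather than a reimplementation of it, and the HRF convolution itself is left to FEAT.

```python
from itertools import groupby
from math import sqrt


def microstate_evs(labels, fs):
    """Split a per-sample microstate label sequence into FEAT-style three-column
    rows (onset_s, duration_s, weight), grouped by microstate class."""
    evs = {}
    sample = 0
    for state, run in groupby(labels):
        n = sum(1 for _ in run)
        evs.setdefault(state, []).append((sample / fs, n / fs, 1.0))
        sample += n
    return evs


def to_three_column_text(rows):
    """Format rows as tab-separated lines, one event per line."""
    return "".join(f"{on:.4f}\t{dur:.4f}\t{w:g}\n" for on, dur, w in rows)


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (e.g. voxel values)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def pick_dmn_microstate(zmaps, dmn_map, threshold=0.3):
    """Return (state, r) for the microstate Z-map that best matches the DMN map,
    or None if no map reaches the correlation threshold."""
    scores = {state: pearson(z, dmn_map) for state, z in zmaps.items()}
    passing = {s: r for s, r in scores.items() if r >= threshold}
    if not passing:
        return None
    best = max(passing, key=passing.get)
    return best, passing[best]


if __name__ == "__main__":
    evs = microstate_evs([0, 0, 1, 1, 1, 0], fs=250.0)
    # Prints one tab-separated row: 0.0080, 0.0120, 1
    print(to_three_column_text(evs[1]), end="")
```

Writing one such file per microstate class gives the 13 regressors that FEAT convolves with its HRF.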

#### Independent Components (ICs)

Functional connectivity was ascertained using ICA, which decomposes the rsfMRI data into statistically independent spatial maps (and associated time series). This data-driven approach extracts functional networks in a voxel-wise manner based on their spatial and temporal signal fluctuations. ICA of the rsfMRI data was carried out with probabilistic independent component analysis (PICA) using FSL's MELODIC. This single-subject ICA was carried out with FEAT MELODIC data exploration, in which the rsfMRI data were decomposed into independent components (ICs) during the preprocessing step. All extracted ICs were visually inspected. Non-motion-related noise sources such as eye-related artifacts, field inhomogeneity, high-frequency noise, slice dropout and gradient instability were removed on the basis of the existing literature (Smith et al., 2004; Beckmann and Smith, 2005; Beckmann et al., 2005). ICs that showed clearly identifiable patterns in both the spatial and frequency domains, and activations in frequency plots below 0.1 Hz, were removed to reduce respiration- and heart-rate-related artifacts. The selected noise components were removed using the command "fsl\_regfilt" in FSL MELODIC. Multivariate group PICA was then carried out using FSL MELODIC (Beckmann and Smith, 2005; Beckmann et al., 2005) on all datasets (18 controls and 18 meditators × 2 sessions), temporally concatenated, to derive maximally spatially independent ICs. The data set was decomposed into 15 sets of independent vectors, which describe signal variation across the temporal (time courses) and spatial (maps) domains, by optimizing for non-Gaussian spatial source distributions using the FastICA algorithm. The resulting maps were thresholded using an alternative hypothesis test at a significance level of p > 0.5 (i.e., equal weight is placed on false positives and false negatives) by fitting a Gaussian/Gamma mixture model to the histogram of intensity values (Smith et al., 2004; Beckmann and Smith, 2005). Finally, the components representing the DMN were visually identified from the set of group IC maps by comparing their constituent cortical sources with the regions of the DMN reported in the literature (Beckmann and Smith, 2005; Beckmann et al., 2005; Damoiseaux et al., 2006).

<sup>2</sup>http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/
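Removing the selected noise components amounts to a regression. The NumPy sketch below shows the usual "non-aggressive" form: all IC time courses are fitted to the data by least squares, and only the fitted contribution of the components labelled as noise is subtracted, so variance shared with signal components is kept. We believe this mirrors fsl\_regfilt's default behaviour, but the function is an illustrative stand-in rather than FSL's code, and it omits details such as demeaning and brain masking.

```python
import numpy as np


def remove_noise_components(data, mixing, noise_idx):
    """Subtract the fitted contribution of noise ICs from the data.

    data:      (n_timepoints, n_voxels) preprocessed rsfMRI matrix
    mixing:    (n_timepoints, n_components) IC time courses (all components)
    noise_idx: indices of the components labelled as noise
    """
    # Least-squares fit of every component time course to every voxel.
    betas, *_ = np.linalg.lstsq(mixing, data, rcond=None)  # (n_components, n_voxels)
    noise_idx = np.asarray(noise_idx)
    noise_part = mixing[:, noise_idx] @ betas[noise_idx, :]
    return data - noise_part
```

Fitting all components jointly before subtracting is the design choice that distinguishes this from regressing out the noise time courses alone.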

#### DMN-Microstate Statistical Analysis

The mean duration and number of occurrences per second of the DMN-microstate were calculated for each subject. The duration of the DMN-microstate was defined as a continuous time epoch (in milliseconds) during which all EEG GFP maps were assigned to the DMN-microstate class by the k-means clustering algorithm. The frequency of occurrence of the DMN-microstate was defined as the number of such epochs per second. Group differences in DMN-microstate duration and frequency of occurrence were calculated across all subjects. Independent-samples t-tests were used to compare meditators and controls at rest, and paired-samples t-tests to compare meditators at rest and during meditation. A p-value < 0.05 (95% confidence interval, CI) was considered significant. Furthermore, the number of years of meditation practice was correlated with the duration and occurrence of the DMN-microstate using Pearson correlation. Multiple comparisons were corrected using the false discovery rate (FDR).
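Once every sample carries a microstate label, duration and occurrence reduce to run-length counting. The plain-Python sketch below computes both for one class from a per-sample label sequence (an assumption: the text defines them on GFP-map assignments), and adds a Benjamini-Hochberg step-up routine as one common way to apply an FDR correction; the text does not name the specific FDR procedure used.

```python
from itertools import groupby


def microstate_metrics(labels, state, fs):
    """Mean duration (ms) of uninterrupted runs of `state` in a per-sample label
    sequence, and the number of such runs per second of recording."""
    runs = [sum(1 for _ in run) for s, run in groupby(labels) if s == state]
    total_s = len(labels) / fs
    mean_ms = 1000.0 * sum(runs) / len(runs) / fs if runs else 0.0
    return mean_ms, len(runs) / total_s


def fdr_bh(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure; returns reject flags in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k_max = rank  # largest rank whose p-value passes its threshold
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        reject[i] = rank <= k_max
    return reject
```

For example, the sequence `[1, 1, 0, 1, 1, 1, 1, 0]` at 250 Hz contains two runs of state 1 (2 and 4 samples), giving a mean duration of 12 ms and 62.5 occurrences per second.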

#### DMN Dual Regression Analysis

Group-wise comparison of rsfMRI ICs was performed using dual regression, which allows voxel-wise comparisons<sup>3</sup> . The set of spatial maps from the group-mean analysis (group ICs) was used to generate subject-specific spatial maps and associated time series (Beckmann et al., 2009; Filippini et al., 2009). DMN component maps from all subjects were concatenated into a single 4D file and voxel-wise differences were estimated. Statistical differences were assessed using randomized nonparametric permutation testing, incorporating the threshold-free cluster enhancement (TFCE) technique (Smith et al., 2009). Separate null distributions of t-values were derived for the contrasts reflecting inter-group effects by performing 5000 random permutations and testing the difference between groups, or against zero, for each iteration (Nichols and Holmes, 2002). To estimate group effects, HC > meditator, meditator > HC, meditator > meditation and meditation > meditator contrasts were used, and the resulting statistical maps were thresholded at p < 0.05 (family-wise error (FWE) corrected). Inter-group effects were then thresholded at p < 0.05 (false discovery rate (FDR) corrected; Filippini et al., 2009). Where voxel-wise values were higher in one group or condition, we use the term "increased connectivity", and "decreased connectivity" for the reverse.
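At its core, dual regression is two least-squares fits. The NumPy sketch below shows them for one subject: the group IC maps are regressed against each volume to obtain subject time courses (stage 1), and those time courses are regressed against each voxel's time series to obtain subject-specific maps (stage 2). FSL's dual_regression script additionally demeans the data and can variance-normalize the stage-1 time courses, and the group comparison is then done by permutation testing (randomise with TFCE); none of that is reproduced in this sketch.

```python
import numpy as np


def dual_regression(data, group_maps):
    """Minimal two-stage dual regression for one subject.

    data:       (n_timepoints, n_voxels) subject rsfMRI data
    group_maps: (n_components, n_voxels) group ICA spatial maps
    Returns (timecourses, subject_maps) with shapes
    (n_timepoints, n_components) and (n_components, n_voxels).
    """
    # Stage 1 (spatial regression): each volume data[t] ~ timecourses[t] @ group_maps.
    tc_t, *_ = np.linalg.lstsq(group_maps.T, data.T, rcond=None)
    timecourses = tc_t.T
    # Stage 2 (temporal regression): each voxel data[:, v] ~ timecourses @ maps[:, v].
    subject_maps, *_ = np.linalg.lstsq(timecourses, data, rcond=None)
    return timecourses, subject_maps
```

The subject-specific DMN rows of `subject_maps`, stacked across subjects, correspond to the 4D file that enters the permutation test.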

# RESULTS

Despite being in an MRI scanner with acoustic gradient noise and a fixed head and body position, all meditators reported that they were able to stay awake in a relaxed state and to enter a meditative state as instructed. Visual inspection of the EEG for sleep spindles and K-complexes confirmed that none of the subjects fell asleep during rsfMRI data acquisition. In all the results presented below, three conditions are compared: controls at rest, meditators at rest, and the same meditators during meditation.

# Group Level EEG Frequency Differences During Meditation

Spectral power in the EEG frequency bands differed significantly between the meditation and rest states in meditators. Meditators had increased power, particularly in the alpha, theta and beta frequencies, in the meditation state compared with the rest state, as depicted in **Figure 1**. This also confirmed that the meditators did not enter a meditative state during the resting session and did meditate, as instructed, during the meditation session.

# Spatial Analysis of DMN-Microstate and DMN During Meditation

**Figure 2** depicts the average DMN-microstate, the corresponding Z-stats map and rsfMRI-derived DMN ICA component in HCs, meditators at rest (meditator) and during meditation.

Dual regression analysis (**Figure 3**) of the rsfMRI DMN revealed that meditators already differed from HCs at rest (p < 0.05): they had decreased posterior cingulate connectivity (number of voxels: 223, T-max: 0.987, coordinates: −4, −40, 26) and increased connectivity in the right middle frontal gyrus (MFG; number of voxels: 263, T-max: 0.956, coordinates: 42, 8, 46) and the left middle temporal gyrus (MTG; number of voxels: 53, T-max: 0.963, coordinates: −52, −48, 6). This relatively lower cingulate connectivity was further significantly reduced, relative to rest, when the meditators entered the meditative state (number of voxels: 107, T-max: 0.998, coordinates: −2, −40, 26). Alongside this reduction, we also observed increased left MTG (number of voxels: 122, T-max: 0.992, coordinates: −58, −46, 8) and right MFG (number of voxels: 424, T-max: 0.998, coordinates: 40, 8, 46) connectivity during meditation.

<sup>3</sup>http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/DualRegression

FIGURE 2 | Default mode network (DMN) microstates during rest and meditation. (A) Group averaged electroencephalography (EEG) microstate in controls at rest (control), meditator at rest state (meditator) and meditator during meditation (meditation). (B) Group averaged Z-stats maps corresponding to the DMN microstates. (C) Corresponding DMN components derived from resting state functional magnetic resonance imaging (rsfMRI) using independent components analysis (ICA). The striking spatial similarity between the EEG-derived DMN Z-stats map (B) and rsfMRI-derived DMN (C) is evident. Meditators showed evidence of decreased connectivity in precuneus/posterior cingulate cortex (PCC) and medial prefrontal cortex (mPFC) at rest, which decreased further during meditation (C).

(p < 0.05 false discovery rate (FDR) corrected) decreased PCC connectivity in meditators compared to controls, which further decreased during meditation. (A) Meditators had higher connectivity in right middle frontal gyrus (MFG) and left middle temporal gyrus (MTG) than controls, which again increased during meditation (B).

#### Temporal Analysis of the DMN-Microstate

Meditators at rest had a longer duration and a higher frequency of occurrence of the DMN-microstate than controls (**Figure 4**). The mean duration of the DMN-microstate was 93.05 ± 15.18 ms in HCs and 118.88 ± 12.78 ms in meditators at rest (Unpaired t-test, t = −3.55; df = 34; p = 3.6E-06); the mean frequency of its occurrence was 3.15 ± 0.66/s in controls and 3.84 ± 0.48/s in meditators (Unpaired t-test, t = −3.56; df = 34; p = 0.001). During meditation, the duration and frequency of occurrence increased significantly further: the mean duration of the DMN-microstate was 134.27 ± 11.21 ms (Paired t-test, t = −5.06; df = 17; p = 9.5E-05) and the mean frequency of occurrence was 4.03 ± 0.46/s (Paired t-test, t = −5.06; df = 17; p = 1.3E-06; **Figure 4**).
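As a quick consistency check on the group comparison, Student's independent-samples t statistic can be recomputed from the summary statistics reported above. The sketch below assumes equal variances and n = 18 per group (20 recruited minus 2 excluded, matching df = 34). With the frequency-of-occurrence values it gives t ≈ −3.59, close to the reported −3.56; the small gap is consistent with rounding of the reported means and SDs.

```python
from math import sqrt


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic (pooled variance) and its degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


if __name__ == "__main__":
    # Frequency of occurrence (per s): controls vs. meditators at rest.
    t, df = pooled_t(3.15, 0.66, 18, 3.84, 0.48, 18)
    print(f"t = {t:.2f}, df = {df}")  # t = -3.59, df = 34
```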

# Correlation of DMN-Microstate Dynamics With Years of Experience in Meditation

The observed differences in the temporal properties of the DMN-microstate suggested a combination of state- and trait-based influences. To further explicate how long-term meditation experience engenders trait differences between meditators and controls, we correlated the duration and frequency of occurrence of the DMN-microstate in individual meditators with their meditation experience in years. The results are summarized in **Figure 5**. Years of meditation practice correlated very strongly and positively with DMN-microstate duration in meditators at rest (r = 0.96, p = 7.3E-11) and significantly positively in the meditative state (r = 0.51, p = 0.03; **Figure 5A**). Interestingly, across meditators, the net increase in DMN-microstate duration during meditation relative to rest correlated negatively with years of meditation practice (**Figure 5C**, r = −0.549, p = 0.023). This implied that more experienced meditators had a DMN-microstate duration akin to being in a meditative state even at rest, highlighting how the state effect of meditation on DMN dynamics may gradually transition into a trait effect that durably alters these dynamics over many years of practice. However, this effect could potentially be explained by the age of the meditators rather than their years of experience, as participants with more years of experience were older than those with less experience. To address this potential confound, we included each meditator's age as a covariate in the correlation analysis. Having accounted for the influence of age, we still found a significant negative correlation between the state-induced change in DMN-microstate duration and years of meditation experience (r = −0.549, p = 0.03). This confirmed that there was indeed a trait-level modulation of the DMN's dynamics produced by meditation experience that could not be explained by aging.
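Controlling for age in a correlation, as described above, is commonly done with a partial correlation: the covariate (here, age) is regressed out of both variables and the residuals are correlated. The plain-Python sketch below shows that computation; the function names are illustrative, and the text does not state which software or exact procedure was used. With one covariate, the significance test of a partial correlation uses n − 3 degrees of freedom.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def residuals(y, x):
    """Residuals of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def partial_corr(x, y, covariate):
    """Correlation of x and y after regressing the covariate out of both,
    e.g. x = years of practice, y = change in duration, covariate = age."""
    return pearson(residuals(x, covariate), residuals(y, covariate))
```

The residual-based result equals the textbook formula (r_xy − r_xz·r_yz) / √((1 − r_xz²)(1 − r_yz²)), which is a convenient way to check it.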

Further, we found that this pattern of effects was specific to the duration of the DMN-microstate. The effect of meditation experience on the frequency of occurrence of the DMN-microstate was not statistically significant, either at rest (r = 0.43, p = 0.06) or in the meditative state (r = 0.46, p = 0.053), though there was a trend towards more frequent DMN-microstate occurrence with increasing years of meditation practice (**Figure 5B**). Moreover, the meditation-induced increase in the frequency of occurrence did not correlate with years of practice (r = −0.004, p = 0.98). Because the frequency of occurrence of the DMN-microstate was not related to years of experience, it behaved differently from duration and probably reflects the state effect of meditation, as seen in **Figure 4B**.

## DISCUSSION

We have investigated dynamic alterations of the DMN using simultaneous EEG-fMRI, with the aim of characterizing the spatial and temporal changes caused by meditation-induced alterations of consciousness. Using spatial ICA of our fMRI data, we found that the posterior cingulate hub of the DMN was less strongly connected in meditators, and that this connectivity was reduced further during meditation. In contrast, the right middle frontal and left middle temporal gyri were more strongly connected in meditators and during meditation. By linking the fMRI with EEG, we were able to identify the DMN microstate in all participants by matching its spatial configuration to their respective DMN ICA components. The temporal properties of this DMN-microstate showed significantly longer duration and higher frequency of occurrence in meditators at rest and during meditation. Further, the relationship between these temporal properties and years of meditation experience enabled us to explore state and trait influences on DMN dynamics. This analysis suggested that in less experienced meditators, entering a meditative state was associated with significant increases in both the duration and the frequency of the DMN-microstate. As meditation experience increased, the DMN-microstate duration at rest became progressively longer, and entering a state of meditation brought about only an increase in its frequency of occurrence. Taken together, these findings elucidate salient spatiotemporal mechanisms of meditation, and particularly highlight its role in altering the temporal dynamics of brain activity over short and long time scales.

Within the large body of fMRI literature on resting state networks, the DMN has been a particular focus of attention (Raichle et al., 2001; Buckner et al., 2008) as a key brain network underlying awareness of the self within the environment. Our finding of decreased PCC connectivity and increased temporal and frontal connectivity within the DMN during meditation has been reported previously in seed-to-voxel connectivity studies (Farb et al., 2007; Brewer et al., 2011). In addition, the significance of temporo-frontal regions during meditation has been consistently highlighted in previous task-based fMRI studies (Brefczynski-Lewis et al., 2007; Lagopoulos et al., 2009). Indeed, our findings mirror those from recent fMRI studies of meditation (Baijal and Srinivasan, 2010; Cahn et al., 2010; Hasenkamp and Barsalou, 2012; Hasenkamp et al., 2012). In particular, Hasenkamp and Barsalou (2012) studied the effects of meditation experience on DMN functional connectivity and demonstrated increased connectivity in frontal areas during meditation in more experienced meditators, which they interpreted as evidence of greater attentional control gained with experience. In our cohort, these attentional control networks could similarly be more active in the expert meditators. This increased attentional control, together with decreased PCC connectivity, is hypothesized to reflect neuroplastic changes that enable meditators to gain better control over mind-wandering. Based on evidence of task-positive networks being recruited at rest, it has been argued that continuous and regular meditation practice transforms the resting state of experienced meditators (Brewer et al., 2011). In the current study, however, the right frontal and left temporal areas described are part of the DMN and do not represent task-positive networks. It remains possible that this neuroplasticity is mediated by task-positive networks, and future research could elucidate how meditation modulates the relationship between these different brain networks.

Within this research context, our findings provide new evidence delineating the temporal changes in DMN activity that accompany these spatial changes. Our finding of an increasing frequency of occurrence of the DMN-microstate during meditation implies a decreased occurrence of other microstates, and thus an overall decrease in the variability of the temporal dynamics of brain activity. According to large-scale dynamical models of consciousness (Deco et al., 2013), the resting brain can be divided into three states, namely the spontaneous state, multistable attractor states and the unstable spontaneous state, which differ in their coupling strengths. The spontaneous state has the weakest coupling, and the unstable spontaneous state, often associated with a task, has the strongest. Human consciousness is postulated to be positioned at the verge of instability, defined as lying between the multistable attractor state and the unstable spontaneous state. Within this framework, decreased variability within such a dynamical system is compatible with our finding of an increased frequency of the DMN-microstate during meditation. It is worth noting that the analysis here focused on the DMN-microstate, as it was the only microstate consistently present in all subjects. This is similar to findings reported by Musso et al. (2010), whose group analysis revealed only a fronto-occipital network that overlapped with visual networks and the DMN, even though all the subjects in their study had microstates that correlated with several other resting state networks. With continued advances in methodological tools for simultaneous EEG-fMRI analysis, we might be able to parse spatiotemporal brain dynamics with the detail and consistency required to map all the resting state networks.

Our finding that experienced meditators had a longer DMN-microstate duration that changed little in the meditative state is consistent with philosophical perspectives (Lutz et al., 2008), behavioral findings (Carter et al., 2005; Srinivasan and Baijal, 2007) and structural brain changes (Lazar et al., 2005) related to the trait effects of meditative practice. Because of the synergistic interaction between state and trait effects, it has been difficult to design studies that separate them (Cahn and Polich, 2006), and hence relatively few studies have looked exclusively at the state effect of meditation. In future research, the neurophenomenology of meditation, which uses moment-to-moment first-person reports of internal experience to guide neuroimaging analysis, might be able to separate state from trait effects of meditation (Jack and Shallice, 2001; Jack and Roepstorff, 2002; Lutz et al., 2002).

# CONCLUSION

Our investigation of the spatial configuration of the DMN in meditators and during meditation revealed a decrease in posterior cingulate connectivity and a complementary increase in middle frontal and middle temporal connectivity. Using temporal microstate analysis of simultaneous EEG-fMRI data, we found that the duration and frequency of occurrence of the DMN-microstate are useful metrics of meditation-induced altered consciousness, and that both increase systematically from controls, to meditators at rest, to meditators during meditation. These metrics played complementary roles: duration primarily reflected the trait effect of meditation, and occurrence its state effect. Our results shed new light on the neurobiology of meditation, and on its effect on the spatiotemporal properties of the DMN in particular.

# AUTHOR CONTRIBUTIONS

RP and RDB contributed equally to this article. RP, RDB and SLR contributed to the conception or design of the work. Data acquisition was done by RP and NU, and analysis was carried out by RP, RDB, NU and SM. RDB and SC interpreted the results of the study. The manuscript was drafted by RP, and RDB, SC and NU revised it critically for important intellectual content. All authors gave final approval and agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

# FUNDING

We acknowledge the support of the Department of Science and Technology, Govt. of India, India for providing the 3T MRI scanner exclusively for research in the field of neurosciences.

# ACKNOWLEDGMENTS

We thank Dr. S. R. Chandra (Department of Neurology, National Institute of Mental Health and Neurosciences (NIMHANS), India) for ascertaining the quality of the EEG. We thank BK. Ambika, BK. Sneha, BK. Sushilchandra and BK. Srikant from Spiritual Application Research Center (SpARC), Prajapita Brahamakumari Iswariya Viswa Vidyalaya for facilitating the participation of expert meditators. We are grateful to the staff, especially the radiographers, for their support. Finally, we thank all the participants for their time.

## REFERENCES

and interpretations. Neuroimage 80, 360–378. doi: 10.1016/j.neuroimage.2013.05.079


ongoing conscious states during a simple visual task. Proc. Natl. Acad. Sci. U S A 99, 1586–1591. doi: 10.1073/pnas.032658199


image analysis and implementation as FSL. Neuroimage 23, S208–S219. doi: 10.1016/j.neuroimage.2004.07.051


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Panda, Bharath, Upadhyay, Mangalore, Chennu and Rao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Pain Perception Can Be Modulated by Mindfulness Training: A Resting-State fMRI Study

I-Wen Su1,2,3, Fang-Wei Wu<sup>4</sup> , Keng-Chen Liang2,3,4, Kai-Yuan Cheng<sup>5</sup> , Sung-Tsang Hsieh2,3,6, Wei-Zen Sun2,3,7 and Tai-Li Chou2,4 \*

<sup>1</sup> Graduate Institute of Linguistics, National Taiwan University, Taipei, Taiwan, <sup>2</sup> Neurobiology and Cognitive Science Center, National Taiwan University, Taipei, Taiwan, <sup>3</sup> Graduate Institute of Brain and Mind Sciences, National Taiwan University, Taipei, Taiwan, <sup>4</sup> Department of Psychology, National Taiwan University, Taipei, Taiwan, <sup>5</sup> Institute of Philosophy of Mind and Cognition, National Yang-Ming University, Taipei, Taiwan, <sup>6</sup> Department of Neurology, National Taiwan University Hospital, Taipei, Taiwan, <sup>7</sup> Department of Anesthesiology, National Taiwan University Hospital, Taipei, Taiwan

The multi-dimensional nature of pain makes a holistic understanding of it difficult. The conceptual framework of pain is held to be cognitive-evaluative, in addition to sensory-discriminative and affective-motivational. To compare participants' brain-behavior responses related to pain modulation before and after a 6-week mindfulness-based stress reduction (MBSR) training course, three questionnaires (the Dallas Pain Questionnaire, Short Form McGill Pain Questionnaire-SFMPQ, and Kentucky Inventory of Mindfulness) as well as resting-state functional magnetic resonance imaging were administered to participants divided into a pain-afflicted group (N = 18) and a control group (N = 16). Our results showed that the pain-afflicted group experienced significantly less pain after the mindfulness treatment than before, as measured by the SFMPQ. In conjunction, an increased connection from the anterior insular cortex (AIC) to the dorsal anterior midcingulate cortex (daMCC) was observed in the pain-afflicted group after training, and a significant correlation was found between AIC-daMCC connectivity and SFMPQ scores. The results suggest that mindfulness training can modulate the brain network dynamics underlying the subjective experience of pain.

#### Edited by:

Louis Bherer, Université de Montréal, Canada

#### Reviewed by:

Jeremy Hogeveen, University of California, Davis, USA

Dhruman D. Goradia, Banner Alzheimer's Institute, USA

> \*Correspondence: Tai-Li Chou tlchou25@ntu.edu.tw

Received: 05 May 2016 Accepted: 27 October 2016 Published: 10 November 2016

#### Citation:

Su I-W, Wu F-W, Liang K-C, Cheng K-Y, Hsieh S-T, Sun W-Z and Chou T-L (2016) Pain Perception Can Be Modulated by Mindfulness Training: A Resting-State fMRI Study. Front. Hum. Neurosci. 10:570. doi: 10.3389/fnhum.2016.00570

# INTRODUCTION

Pain is the sensation of hurting, and it is a highly individual experience. Only the person experiencing it can be certain of its existence, and even that person may have great difficulty describing it accurately. Measuring pain objectively is virtually impossible; at best, how much pain a person is enduring can be gauged through a subjective pain report.

# Mindfulness-Based Stress Reduction (MBSR) and Pain Relief

A recent development in the study of pain indicates that mindfulness meditation brings pain relief rather than merely acting as a placebo (Zeidan et al., 2012). Because mindfulness skills allow participants to remain attentive to, and clearly aware of, what they are experiencing in the moment, mindfulness meditation may be an effective way to reduce pain and deserves further investigation.

Keywords: pain, resting-state, connectivity, mindfulness, perspective shift

Kabat-Zinn's world-renowned Mindfulness-Based Stress Reduction (MBSR) Clinic, founded in 1979, trains participants to observe their own thoughts and feelings from a detached perspective (McCracken et al., 2013), so that they can see themselves as if they were to "video-tape someone else" (Gallagher, 2000). In turn, they learn to view their pain, together with the emotions and judgments it elicits, as transient passing events rather than as reflections of the self or of reality (McCracken et al., 2013). The effect of MBSR on chronic pain patients has been well established in an outpatient behavioral medicine program (Kabat-Zinn, 1982). Inspired by such findings in Western populations, the present study explores functional magnetic resonance imaging (fMRI) changes alongside behavioral changes in pain-afflicted Chinese participants after a 6-week MBSR training intervention.

# MBSR: Attention, Empathy, and Perspective Shift

According to Kabat-Zinn (2003), mindfulness is the "awareness that emerges through paying attention on purpose, in the present moment, and non-judgmentally to the unfolding of experience moment to moment." The idea that awareness, as a vital dimension of consciousness, helps divert one's sensory involvement with pain is not new (Allport, 1988; Schmidt, 1994). Since "paying attention on purpose" to the present moment is both the basic skill and the ultimate outcome of MBSR training, this study focuses mainly on attention; awareness is discussed only insofar as it pertains to attention.

Mindfulness-based stress reduction training is in effect a means of cognitively "re-constructing" participants' perception of pain by developing their capacity to observe their own thoughts and feelings from a detached perspective (McCracken et al., 2013). The purpose of such a perspective shift is to place oneself in another's position, which is empathy, defined as the capacity to understand or feel what another person is experiencing from within that person's frame of reference (Bellet and Maloney, 1991). Understanding from another's frame of reference is thus a matter of perspective shifting: a speaker usually takes a stance from his or her own perspective, but to achieve true empathy, the speaker must take the other's perspective in order to understand what the other is experiencing.

# Brain Regions Related to Attention and Empathy

Mindfulness meditation is typically described as non-judgmental attention to experiences in the present moment, as reviewed in Tang et al. (2015). Their review highlights neural mechanisms as an important topic of study in this domain, because the complex mental state of mindfulness is supported by alterations in large-scale brain networks; understanding it therefore requires analyses of complex networks, such as those revealed by resting-state connectivity. Tang et al. (2015) also reviewed the current state of research on mindfulness meditation exhaustively and discussed the methodological challenges the field now faces. Taking into account several shortcomings of existing studies identified in that review, we propose three methodological improvements in the present study. First, a longitudinal design is used to minimize the pre-training differences that confound cross-sectional studies: contrasting post-training with pre-training measurements at two time points allows us to tease apart differences that existed among participants before training. Second, a control group that receives the same mindfulness training as the experimental group is added, which reduces confounds such as practice, memory, or fatigue effects inherent in a one-group longitudinal design. Third, resting-state connectivity is used to observe training-related changes in coordination among brain regions, and this functional connectivity is correlated with questionnaire scores to strengthen our arguments by establishing brain-behavior relationships.

Regarding previous studies on emotional awareness, we draw on McRae et al. (2008), who examined the role of arousal in the relationship between trait emotional awareness and dorsal anterior cingulate cortex (dACC) activity; the relationship between the dACC and emotional awareness is specific to highly arousing emotional stimuli, such as highly arousing pictures. Turning to the brain areas involved in pain processing, especially the "pain matrix" described by Tracey and Johns (2010), many findings (e.g., Apkarian et al., 2005) indicate that the dorsal anterior midcingulate cortex (daMCC) and the anterior insular cortex (AIC) are highly relevant to attention, with the daMCC involved in top-down attentional control (see Bush, 2011; Fan et al., 2014). According to Brooks et al. (2002), paying attention to a painful stimulus activates the AIC, an area that a later study (Schweinhardt et al., 2006) related to both acute and chronic pain. Craig (2002) further argued that the AIC contains interoceptive representations that underlie feelings from the body and emotional awareness, especially unpleasant, pain-related feelings (Medford and Critchley, 2010).

The AIC has been claimed to be necessary for empathic pain perception (Gu et al., 2012), and Singer et al. (2004) found that empathic feelings for close others experiencing painful stimuli were associated with bilateral AIC activation. These studies informed the design of the present research, which aims to identify brain regions responsible for attention control and to relate them to emotion regulation and self-awareness; together, these are the three core areas discussed in Tang et al. (2015).

## Our Working Hypotheses

The present study focuses on changes in signal intensity in areas anatomically related to the processing of nociceptive stimuli, as well as in areas responsible for attention-related and empathic processes. Pain is a complex, multidimensional, and subjective experience that cannot be fully accounted for by any single modality. It affects processes of a motor-integratory nature as well as those deemed sensory–discriminative, affective–motivational, and cognitive–evaluative (Kupers, 2006). Based on what we have learned from MBSR training, we hypothesize that one may learn to shift one's subjective experience through attention practice, and we seek to verify this effect with the participants' self-report questionnaires and their fMRI scans.

To observe changes related to mindfulness training, we use a two-group longitudinal design, taking pre- and post-training measurements (scans and questionnaires) in both a pain-afflicted group and a control group. A 2 (group: low pain, high pain) × 2 (time: pre, post) mixed ANOVA design is used to explore the training effect on the fMRI scans and on three self-report questionnaires: the Dallas Pain Questionnaire (DPQ), the Short Form McGill Pain Questionnaire (SFMPQ), and the Kentucky Inventory of Mindfulness (KIMS). Resting-state connectivity is also measured to explore changes in dynamic interactions among brain regions associated with mindfulness training, and changes in resting-state functional connectivity (rsFC) are correlated with changes in the questionnaire measures to elucidate the neural substrates of pain modulation.

Because the AIC plays multiple roles in attention, awareness, and empathy networks, we hypothesize that mindfulness training will influence pain perception by strengthening the AIC's connectivity with the dorsal ACC in meditators (Grant et al., 2011). The relationship between the AIC and the daMCC is crucial because both regions are involved in pain processing and in the attention salience network (Seeley et al., 2007). We therefore hypothesize that resting-state fMRI will show increased functional connectivity among brain regions associated with pain modulation, especially the AIC-daMCC connection, as a result of MBSR training.

## MATERIALS AND METHODS

Thirty-four adult participants were recruited in Taiwan for the fMRI experiment. Eighteen were assigned to the pain-afflicted group: they reported suffering from moderate or severe pain and scored greater than 1 on the present pain intensity (PPI) index of the standard McGill Pain Questionnaire (Melzack, 1975). The remaining sixteen, assigned to the control group, scored 1 or less on the PPI index, indicating mild or no pain. These 34 native speakers of Mandarin Chinese (mean age = 38.59 years, 25 females) were first given an informal interview to ensure that they (1) were right-handed, (2) had normal or corrected-to-normal vision, and (3) had no history of language deficits or learning disabilities. After the interview, informed consent was obtained. The study was approved by the Research Ethics Committee of National Taiwan University before the training and experiments were administered.

The established model chosen for our mindfulness practice was a 6-week MBSR intervention developed by Kabat-Zinn (1982). The training consisted of six weekly 2.5-h sessions and one 8-h non-speaking session in the 4th week. During the training, participants learned and practiced different kinds of mindfulness meditation, including the body scan, sitting meditation, hatha yoga, walking meditation, and other informal practices. The body scan was conducted under spoken directions that guided participants to move their attention progressively from their toes to their head while observing the physical sensations of each bodily region. Sitting meditation involved concentrating on one's own breath while remaining open to thoughts, emotions, and other feelings. Hatha yoga consisted of gentle exercises and stretching intended to improve attentive awareness of one's physical state, in the hope of finding a balance between mind and body. Walking meditation involved walking with close attention to changes in one's own gestures and movements. Lastly, the MBSR trainer demonstrated how to apply these methods to pain management and other aspects of daily life (Shapiro et al., 2007). Throughout the training, participants were encouraged to focus on their own breathing while remaining aware of other sensations (e.g., sounds and thoughts), accepting these feelings without reacting to them. All of these techniques were intended to help participants disengage from their personal thoughts and emotions.

In order to compare the differences between pre- and posttraining, both the questionnaires and the resting-state fMRI were employed as our major instruments. The three questionnaires were meant to measure the mindfulness skills of the participants, the effects of pain on each individual's life, and their thoughts about pain, both before and after the training; the resting-state fMRI was used to measure brain functional connectivity before and after the 6-week training.

To track changes in pain perception, two questionnaires, the DPQ and the SFMPQ, were administered. In addition, the KIMS was used to obtain participants' subjective evaluations of their acquired mindfulness skills. All three questionnaires were translated into Mandarin Chinese for the participants.

The DPQ is designed to assess how chronic pain affects different aspects of an individual's life, including daily activities, work and leisure activities, anxiety-depression, and social interest (Lawlis et al., 1989). Each of the DPQ's sixteen sections consists of a single short question asking to what degree pain has adversely affected a particular aspect of life, paired with a continuous rating scale divided into 5–8 equal sections on which respondents mark the degree, from 0 to 100%, that expresses their answer. As suggested by Lawlis et al. (1989), the subjective evaluation of the pain experience is an important factor in determining how motivated a person is to seek treatment. The self-reported DPQ outcome is therefore informative about chronic pain and serves as one of our pain assessment tools.

The SFMPQ, a short version of the McGill Pain Questionnaire (MPQ; Melzack, 1975), is commonly used to evaluate how chronic pain influences participants' sensory, affective, and present feelings. It consists of two subscales of adjectival pain descriptors: eleven sensory and four affective. The PPI index of the standard MPQ is also included in the SFMPQ, and the descriptors are rated on an intensity scale of none, mild, moderate, or severe to assess the participants' subjective pain experience (Melzack, 1987). Because of its accessibility and proven validity, the easy-to-follow SFMPQ was chosen as one of our pain assessment tools.
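The SFMPQ subscale scoring described above can be sketched as follows. This is a minimal illustration, assuming the conventional 0–3 coding of the none/mild/moderate/severe ratings and a 0–5 PPI scale; the function and class names are hypothetical, and the paper does not state exactly how its SFMPQ "composite score" was formed, so the summed total here is only one plausible choice.

```python
from dataclasses import dataclass
from typing import Sequence

# Assumed coding of the intensity ratings: none=0, mild=1, moderate=2, severe=3.
VALID_RATINGS = {0, 1, 2, 3}


@dataclass(frozen=True)
class SFMPQScores:
    sensory: int    # sum of the 11 sensory descriptors (0-33)
    affective: int  # sum of the 4 affective descriptors (0-12)
    total: int      # sensory + affective (0-45)
    ppi: int        # present pain intensity, assumed to range 0-5


def score_sfmpq(sensory: Sequence[int], affective: Sequence[int], ppi: int) -> SFMPQScores:
    """Hypothetical SFMPQ scorer: validates item ratings and sums each subscale."""
    if len(sensory) != 11 or len(affective) != 4:
        raise ValueError("expected 11 sensory and 4 affective ratings")
    if any(r not in VALID_RATINGS for r in (*sensory, *affective)):
        raise ValueError("descriptor ratings must be 0, 1, 2 or 3")
    if not 0 <= ppi <= 5:
        raise ValueError("PPI must lie between 0 and 5")
    s, a = sum(sensory), sum(affective)
    return SFMPQScores(sensory=s, affective=a, total=s + a, ppi=ppi)
```

For example, `score_sfmpq([1] * 11, [2] * 4, ppi=3)` yields a sensory score of 11, an affective score of 8 and a total of 19; a PPI above 1 would meet the pain-afflicted inclusion rule given in Materials and Methods.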

The KIMS is designed to assess whether people can exercise mindfulness skills in their daily lives with regard to four facets: observing, describing, acting with awareness, and accepting without judgment (Baer et al., 2004). The KIMS also captures participants' ability to put their feelings, emotions, perceptions, and thoughts into words (Bergomi et al., 2013). Following the "thinking for speaking" hypothesis articulated by Slobin (1987), according to which one's language use may shape one's cognition and may further affect one's feelings (of pain), we used the KIMS to assess the connection between cognition (one's comprehension and communicative skills) and sensation (one's feelings).
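Facet scores of this kind are typically sums of Likert ratings, with some items reverse-scored. The sketch below assumes a 1–5 rating scale and takes the facet-to-item mapping and the set of reverse-scored items as inputs; it does not reproduce the actual KIMS item key from Baer et al. (2004), and the mapping in the usage lines is a made-up toy example.

```python
from typing import AbstractSet, Dict, Iterable, Mapping


def score_facets(responses: Mapping[int, int],
                 facets: Mapping[str, Iterable[int]],
                 reverse_items: AbstractSet[int],
                 scale_min: int = 1,
                 scale_max: int = 5) -> Dict[str, int]:
    """Sum Likert ratings per facet, reverse-scoring the listed items.

    responses: item number -> rating; facets: facet name -> item numbers.
    """
    scores: Dict[str, int] = {}
    for facet, items in facets.items():
        total = 0
        for item in items:
            rating = responses[item]
            if not scale_min <= rating <= scale_max:
                raise ValueError(f"item {item}: rating {rating} out of range")
            # Reverse scoring maps min<->max, e.g. 1 -> 5 and 2 -> 4 on a 1-5 scale.
            total += (scale_min + scale_max - rating) if item in reverse_items else rating
        scores[facet] = total
    return scores


# Toy, hypothetical mapping (NOT the real KIMS item key):
toy_facets = {"observing": [1, 2], "describing": [3]}
print(score_facets({1: 4, 2: 1, 3: 5}, toy_facets, reverse_items={2}))
# {'observing': 9, 'describing': 5}
```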

A 2 (group: low pain, high pain) × 2 (time: pre, post) mixed ANOVA was conducted on the composite score of each questionnaire (DPQ, SFMPQ, and KIMS). These analyses allowed us to assess the behavioral changes related to the mindfulness training intervention.
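As a hedged illustration of the 2 × 2 mixed ANOVA, the NumPy sketch below partitions the sums of squares for one between-subject factor (group) and one two-level within-subject factor (time). It is not the software procedure the authors used, and the function name is hypothetical. With unequal group sizes (18 vs. 16), it uses marginal time means weighted by group size, so its time main effect can differ from the Type III value reported by packages such as SPSS.

```python
import numpy as np


def mixed_anova_2x2(group_a, group_b):
    """F statistics for a 2 (between: group) x 2 (within: time) mixed ANOVA.

    Each argument is an (n_subjects, 2) array whose columns are [pre, post].
    Every effect has df = (1, N - 2), where N is the total number of subjects.
    """
    groups = [np.asarray(group_a, dtype=float), np.asarray(group_b, dtype=float)]
    data = np.vstack(groups)
    n_subj, n_time = data.shape
    grand = data.mean()
    subj_means = data.mean(axis=1)
    time_means = data.mean(axis=0)  # marginal time means, weighted by group size

    # Between-subject partition: group effect vs. subjects within groups.
    ss_group = sum(n_time * len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_subjects = n_time * np.sum((subj_means - grand) ** 2)
    ss_err_between = ss_subjects - ss_group

    # Within-subject partition: time, group x time, and residual error.
    ss_within = np.sum((data - subj_means[:, None]) ** 2)
    ss_time = n_subj * np.sum((time_means - grand) ** 2)
    ss_interaction = sum(
        len(g) * np.sum((g.mean(axis=0) - g.mean() - time_means + grand) ** 2)
        for g in groups
    )
    ss_err_within = ss_within - ss_time - ss_interaction

    df_err = n_subj - len(groups)  # N - 2 for both error terms when time has 2 levels
    ms_err_between = ss_err_between / df_err
    ms_err_within = ss_err_within / df_err
    return {
        "F_group": ss_group / ms_err_between,
        "F_time": ss_time / ms_err_within,
        "F_interaction": ss_interaction / ms_err_within,
    }
```

With only two time points, the group × time F equals the squared pooled-variance t statistic comparing pre-minus-post difference scores between groups, and the group F equals the squared t comparing subject means, which gives a quick sanity check on any implementation.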

In both groups, resting-state fMRI was used to detect differences in functional connectivity among brain regions before and after MBSR training. Such a comparison may help reveal how training delivered through linguistic instruction helps regulate the specific networks involved in the cognition of pain. Before scanning, participants were instructed to lie in the scanner with their heads secured, and the head coil was then positioned over the participant's head. Scanning was conducted on a Bruker 3T S300 BIOSPEC/MEDSPEC MRI scanner with a quadrature head coil. During the resting-state scans, participants were instructed to close their eyes and think of nothing while remaining awake (Raichle et al., 2001). Each resting scan lasted 10 min. Data were collected with a gradient-echo echo-planar imaging (EPI) sequence [repetition time (TR) = 3 s, echo time (TE) = 30 ms, 35 slices oriented to the AC-PC line, flip angle = 90°, matrix size = 64 × 64, voxel size = 3.75 × 3.75 × 3.75 mm, slice gap = 0 mm, field of view (FOV) = 24 cm × 24 cm].
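The acquisition parameters above imply a few derived quantities that are easy to check. The sketch below, with hypothetical names, computes the number of volumes per resting run, the in-plane voxel size (FOV ÷ matrix, which agrees with the stated 3.75 mm), the slab coverage, and the Nyquist frequency, which at TR = 3 s lies above the 0.1 Hz upper band-pass cutoff used in preprocessing. It assumes no dummy volumes were discarded, which the text does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EPIProtocol:
    tr_s: float = 3.0        # repetition time (s)
    scan_min: float = 10.0   # resting scan duration (min)
    fov_mm: float = 240.0    # 24 cm field of view
    matrix: int = 64         # 64 x 64 acquisition matrix
    n_slices: int = 35
    slice_mm: float = 3.75
    gap_mm: float = 0.0

    def n_volumes(self) -> int:
        # Assumes every TR in the run yields a kept volume (no dummy scans removed).
        return round(self.scan_min * 60 / self.tr_s)

    def in_plane_mm(self) -> float:
        return self.fov_mm / self.matrix

    def coverage_mm(self) -> float:
        return self.n_slices * self.slice_mm + (self.n_slices - 1) * self.gap_mm

    def nyquist_hz(self) -> float:
        return 1.0 / (2.0 * self.tr_s)


p = EPIProtocol()
print(p.n_volumes(), p.in_plane_mm(), p.coverage_mm(), round(p.nyquist_hz(), 3))
# 200 3.75 131.25 0.167
```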

Image preprocessing was performed with the Resting-State fMRI Data Analysis Toolkit (REST) version 1.6 (Song et al., 2011) and SPM5 (Statistical Parametric Mapping). The images were corrected for differences in slice acquisition time relative to the middle volume and realigned to the first volume of the session using affine transformations. No participant moved more than 3 mm in any plane according to the averaged realignment parameters. Co-registered images were normalized to the ICBM EPI template, smoothed with a 10-mm full-width at half-maximum (FWHM) kernel (Chou et al., 2009), detrended, and band-pass filtered (0.01–0.1 Hz) to reduce non-neuronal contributions to BOLD fluctuations (Zang et al., 2007; Zou et al., 2008). Low-frequency artifacts were then removed with a high-pass filter (128-s cutoff period). The nuisance signals (the six head-motion parameters, the global mean signal, and the white-matter and cerebrospinal fluid signals) were regressed out of each voxel's time course.
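Two of these preprocessing steps, band-pass filtering and nuisance regression, can be sketched in NumPy as follows. This is an illustrative simplification, not the REST 1.6 or SPM5 code: it assumes an ideal (brick-wall) FFT filter and ordinary least squares, with time along the first axis, and the function names are hypothetical.

```python
import numpy as np


def bandpass_fft(ts: np.ndarray, tr: float, low: float = 0.01, high: float = 0.1) -> np.ndarray:
    """Ideal band-pass filter along axis 0 (time); ts is (time,) or (time, voxels)."""
    n = ts.shape[0]
    freqs = np.fft.rfftfreq(n, d=tr)          # frequency of each FFT bin in Hz
    spectrum = np.fft.rfft(ts, axis=0)
    keep = (freqs >= low) & (freqs <= high)
    spectrum[~keep] = 0                        # drop DC, slow drifts and high frequencies
    return np.fft.irfft(spectrum, n=n, axis=0)


def regress_out(ts: np.ndarray, confounds: np.ndarray) -> np.ndarray:
    """Remove an intercept plus confound regressors (time x k) by least squares."""
    design = np.column_stack([np.ones(ts.shape[0]), confounds])
    beta, *_ = np.linalg.lstsq(design, ts, rcond=None)
    return ts - design @ beta                  # residuals are orthogonal to the design


# Synthetic check: TR = 3 s and 200 volumes, as in the 10-min resting scan.
tr, n = 3.0, 200
t = np.arange(n) * tr
slow = np.sin(2 * np.pi * 0.05 * t)   # inside the 0.01-0.1 Hz band
fast = np.sin(2 * np.pi * 0.15 * t)   # outside the band, below the ~0.167 Hz Nyquist limit
filtered = bandpass_fft(slow + fast + 5.0, tr)
print(np.allclose(filtered, slow))    # True
```

In the synthetic check, both sinusoids complete a whole number of cycles over the run, so each falls exactly on one FFT bin and the filter recovers the in-band component exactly; real BOLD data will show some spectral leakage.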

The average BOLD time course within an AIC seed (radius = 8 mm) centered at [−34, 16, 6], based on a pain study (Tseng et al., 2010), was correlated with the time course of every voxel in the brain for each subject using Pearson's correlation coefficient. The threshold was set to p < 0.01 uncorrected at the voxel level. Before group comparisons, the correlation coefficients were converted to z-scores using the Fisher r-to-z transformation, and these z-score images were entered into the statistical analysis. For each participant, the mean seed time course (i.e., the AIC) was used as a regressor in a voxel-wise rsFC analysis with the functions of the REST 1.6 toolbox, yielding one whole-brain correlation map per participant at the first level. Each map was then r-to-z transformed to yield a normal distribution for parametric second-level group analysis. This level comprised a voxel-wise one-sample t-test at both pre-training and post-training, examining whether the correlation coefficient (z-value) indicated positive rsFC (Baur et al., 2013). Significantly correlated voxels were determined at p < 0.05, corrected for the false discovery rate (FDR) at the voxel level, with a cluster size of at least 10 voxels, using the daMCC (radius = 10 mm) as the region of interest based on relevant studies (as reviewed in Bush, 2011).
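The seed-based step can be sketched as follows: average the time courses inside a spherical seed, correlate that average with every voxel, and apply the Fisher r-to-z transform z = arctanh(r). This NumPy sketch works in voxel coordinates; converting the MNI seed center [−34, 16, 6] and the 8-mm radius to voxel indices requires the image affine and the normalized voxel size, which are not given here, so the center and radius in the usage lines are placeholders. The function names are hypothetical and this is not the REST toolbox implementation.

```python
import numpy as np


def sphere_mask(shape, center, radius):
    """Boolean mask of voxels within `radius` (in voxels) of `center` (voxel indices)."""
    grid = np.indices(shape)
    dist2 = sum((axis - c) ** 2 for axis, c in zip(grid, center))
    return dist2 <= radius ** 2


def seed_fc_z(data, seed_mask, eps=1e-7):
    """Fisher-z map of Pearson correlations between the mean seed signal and each voxel.

    data: (time, x, y, z) array; seed_mask: boolean (x, y, z) array.
    """
    n_t = data.shape[0]
    seed = data[:, seed_mask].mean(axis=1)          # mean seed time course
    voxels = data.reshape(n_t, -1)
    seed_std = (seed - seed.mean()) / seed.std()
    with np.errstate(invalid="ignore", divide="ignore"):
        vox_std = (voxels - voxels.mean(axis=0)) / voxels.std(axis=0)  # NaN for constant voxels
    r = seed_std @ vox_std / n_t                     # Pearson r using population SDs
    z = np.arctanh(np.clip(r, -1 + eps, 1 - eps))    # clip so r = +/-1 stays finite
    return z.reshape(data.shape[1:])


# Placeholder usage on random data (not real fMRI):
rng = np.random.default_rng(1)
data = rng.standard_normal((120, 6, 6, 6))
mask = sphere_mask((6, 6, 6), center=(2, 2, 2), radius=1)
z_map = seed_fc_z(data, mask)
```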

To compare changes in rsFC, we conducted a 2 (group: low pain, high pain) × 2 (time: pre, post) mixed ANOVA on the rsFC magnitudes, expressed as z-values. Finally, to explore the brain-behavior relationship, the rsFC magnitude was correlated with the composite score of each questionnaire (DPQ, SFMPQ, and KIMS) using parametric (Pearson) correlation (Baur et al., 2013). All p-values were two-tailed. Correlations were assessed with SPSS (version 20).
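The brain-behavior analysis is a two-tailed Pearson correlation across participants. The authors used SPSS; the SciPy sketch below is an equivalent illustration, assuming one rsFC z-value and one composite score per participant, with a hypothetical function name and synthetic data. It also returns the equivalent t statistic, t = r·√((n − 2)/(1 − r²)) with n − 2 degrees of freedom, to show what the significance test is doing.

```python
import numpy as np
from scipy import stats


def brain_behavior_correlation(connectivity_z, scores):
    """Two-tailed Pearson correlation between per-participant rsFC z-values and scores.

    Returns (r, p, t), where t = r * sqrt((n - 2) / (1 - r**2)) has n - 2 df.
    """
    x = np.asarray(connectivity_z, dtype=float)
    y = np.asarray(scores, dtype=float)
    r, p = stats.pearsonr(x, y)  # two-sided test by default
    n = x.size
    t = r * np.sqrt((n - 2) / (1 - r ** 2))
    return float(r), float(p), float(t)


# Synthetic data for illustration only (18 participants, negative association):
rng = np.random.default_rng(7)
fc = rng.normal(size=18)
sfmpq = -0.6 * fc + rng.normal(scale=0.8, size=18)
r, p, t = brain_behavior_correlation(fc, sfmpq)
```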

## RESULTS

Analysis of the questionnaire scores assessing pain perception and mindfulness skills showed that participants improved after the 6-week training. A 2 (group) × 2 (time) mixed ANOVA on the SFMPQ revealed a main effect of time (F = 8.91, p < 0.01), a main effect of group (F = 14.25, p < 0.01), and a time × group interaction (F = 5.28, p < 0.05). Post hoc comparisons showed that the change in the SFMPQ composite score was significant in the pain-afflicted group (t = 3.05, p < 0.01) but not in the control group (t = 0.82, p = 0.42). The 2 (group) × 2 (time) mixed ANOVA on the DPQ indicated a main effect of time (F = 6.45, p < 0.05) and a main effect of group (F = 6.23, p = 0.01). The 2 (group) × 2 (time) mixed ANOVA on the KIMS showed a main effect of time (F = 75.25, p < 0.01) and a group × time interaction (F = 5.12, p < 0.05). Means and standard deviations for the three questionnaires are given in **Table 1**, and the results of the mixed ANOVAs on the three questionnaires are given in **Table 2**.
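The post hoc comparisons are reported as t statistics without naming the test. Assuming they are paired (pre vs. post) t-tests within each group, the statistic can be computed as in the sketch below; the function name and the toy numbers are hypothetical, not study data. A two-tailed p-value would then come from the t distribution with n − 1 degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t), df)`.

```python
import numpy as np


def paired_t(pre, post):
    """Paired t statistic for pre-minus-post differences; returns (t, df) with df = n - 1."""
    d = np.asarray(pre, dtype=float) - np.asarray(post, dtype=float)
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))  # mean difference / standard error
    return float(t), n - 1


# Toy numbers for illustration only (not study data):
t_stat, df = paired_t([5, 6, 7, 8], [4, 4, 6, 6])
print(round(t_stat, 3), df)
```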

TABLE 1 | Mean and standard deviation (SD) for scores of the Dallas Pain Questionnaire (DPQ), the Short Form McGill Pain Questionnaire (SFMPQ), and the Kentucky Inventory of Mindfulness (KIMS) for pre- and post-training.

| Questionnaire | Scale | High pain group (n = 18), Pre, Mean (SD) | High pain group (n = 18), Post, Mean (SD) | Low pain group (n = 16), Pre, Mean (SD) | Low pain group (n = 16), Post, Mean (SD) |
|---|---|---|---|---|---|
| DPQ | Daily | 38.83 (4.63) | 29.04 (3.77) | 19.73 (4.32) | 14.4 (4.26) |
| | Working/Leisure | 39.72 (4.62) | 25.72 (3.62) | 17.07 (5.74) | 13.62 (5.71) |
| | Anxiety/Depression | 39.44 (2.97) | 37.96 (4.18) | 32.72 (5.41) | 30.1 (5.83) |
| | Social interests | 25.28 (4.19) | 19.84 (4.16) | 12.55 (3.72) | 11.39 (4.05) |
| SFMPQ | Sensation | 12.61 (1.4) | 7.78 (0.94) | 5.59 (1.7) | 3.65 (1.64) |
| | Affective | 4.72 (0.56) | 2.86 (0.54) | 3.06 (0.87) | 2.18 (0.68) |
| | PPI | 2.5 (0.17) | 1.52 (0.22) | 0.41 (0.12) | 0.53 (0.23) |
| KIMS | Observing | 34.53 (2.01) | 43.34 (1.43) | 32.38 (1.82) | 36.06 (2.44) |
| | Describing | 21.88 (1.54) | 27.93 (1.44) | 25.59 (1.43) | 29.12 (1.22) |
| | Acting with awareness | 24.87 (1.66) | 31.68 (2.05) | 27.24 (1.59) | 31.06 (1.44) |
| | Accepting without judgment | 17.78 (1.47) | 28.06 (1.11) | 22.65 (1.7) | 29.83 (1.37) |

TABLE 2 | 2 (group: low pain, high pain) × 2 (time: pre, post) mixed ANOVA on the Dallas Pain Questionnaire (DPQ), the Short Form McGill Pain Questionnaire (SFMPQ), and the Kentucky Inventory of Mindfulness (KIMS).


\*indicates a significant difference.

In the fMRI data analysis, a 2 (group) × 2 (time) mixed ANOVA on the rsFC magnitudes revealed a group × time interaction in the connectivity between the AIC seed and the daMCC (daMCC coordinates [−20, −6, 21], z = 3.07, p = 0.001 uncorrected; **Figure 1**). The correlated voxels were also significant at p < 0.05 corrected for family-wise error (FWE) within a sphere of 10-mm radius based on relevant studies (Bush, 2011). Post hoc comparisons showed a significant increase in rsFC magnitude from pre- to post-training in the pain-afflicted group (t = 2.68, p < 0.01) but not in the control group (t = 1.08, p = 0.30). Correlation analyses were also conducted between the AIC-daMCC connection strength and the composite score of each questionnaire (DPQ, SFMPQ, and KIMS). In the pain-afflicted group, AIC-daMCC connection strength was significantly negatively correlated with the SFMPQ composite score (r = −0.48, p < 0.05). There was no significant correlation with the DPQ (r = −0.05, p = 0.79) or the KIMS (r = −0.07, p = 0.68).

#### DISCUSSION

We used a two-group longitudinal design as a methodological improvement over previous studies and found that, according to the SFMPQ results, pain was reduced in the pain-afflicted group but not in the control group after MBSR training. Increased connectivity between the AIC and the daMCC was also found in the pain-afflicted group but not in the control group. Furthermore, the SFMPQ score was negatively correlated with AIC-daMCC connection strength in the pain-afflicted group. Our discussion centers on how one's perception of pain may change as a result of attention training, an MBSR-induced cognitive effect, and touches briefly on attention-triggered reappraisal.

# Cognition-Based Attention and Perception of Pain

Based on the questionnaire results, pain was significantly reduced after the 6-week training course. The accompanying reduction in emotional discomfort and somatic problems suggests that participants' quality of life improved, indicating a positive impact of such attention training.

Clinically, mindfulness-based cognitive training has been found to be effective in regulating participants' attention to stressor stimuli, i.e., their pain and the associated self-evaluation of pain, by successfully distracting them from the physical feeling of pain and its associations through trained decentering techniques (Garland et al., 2009). For example, to report a painful stimulus, participants need to be attentive enough to become alert, to orient themselves toward the source of pain, and lastly to detect how they may be affected by it. Decentering allows participants to become more aware of the experience "from some distance" (McCracken et al., 2013).

During the 6-week training period, attention appeared to play a pivotal role in shifting participants' perceptions away from pain, transforming their associative responses to it and eventually alleviating it. But what enables participants to report painful stimuli and their associated evaluation of them?

Our study suggests that the deliberate application of attention may change one's neural connections, which may in turn shift pain sensation, as indicated by the fMRI scans. By shifting their attention, participants become detached observers who do not focus on being confronted with pain but remain in a state of mindfulness; that is, they remain attentive. Their past experience of pain, often associated with negative emotions, comes to be viewed differently as a result of decentering. This may expand into a new experience in which one learns not to be affected by preoccupation with one's own pain. The natural outcome of such a perspective shift may be reduced pain-related stress, mitigated emotional discomfort, or even positive emotions via pain modulation. Thus, by comparing the questionnaires administered before and after the mindfulness training course, we observed emotional changes associated with pain reduction.

#### Mindfulness and Cognitive Reappraisal

Turning to the fMRI results, the increased connection strength observed between the AIC and the daMCC in the pain-afflicted group can be viewed as a result of MBSR training. The AIC plays a pivotal role in deactivating the default mode network (DMN), which consists of brain areas that are active during mind-wandering or resting states (Menon and Uddin, 2010). The DMN, most notably the ventral medial prefrontal cortex (vmPFC) and the posterior cingulate cortex (PCC), is deactivated during meditation, a state that requires sustained attention (Brewer et al., 2011). However, DMN deactivation patterns during meditation may vary from individual to individual (Buckner et al., 2008; Tagliazucchi et al., 2010).

We thus infer that MBSR practice may trigger AIC activation, which in turn may inhibit vmPFC activity, a signal reflecting emotional suffering (Apkarian et al., 2011). When nociceptive stimuli are received, the AIC and the daMCC are both activated, and this co-activation is considered responsible for autonomic and emotional processes (Sterzer and Kleinschmidt, 2010). Both the AIC and the daMCC may well be specific regions of the "salience network" (Sterzer and Kleinschmidt, 2010; Harsay et al., 2012), which "forms the fundamental neuroanatomical basis for all human emotions" (Decety, 2011). Because the increase in AIC-daMCC connectivity was significant in the pain-afflicted group but not in the control group, this increased connectivity may be specifically related to pain perception.

The daMCC is said to take part in cognitive control and emotional regulation (Modinos et al., 2010; Kanske et al., 2011), as well as in attentional control (as reviewed in Bush, 2011). Taken together with the pain matrix proposed by Decety (2011), the association between strengthened AIC-daMCC connectivity and participants' ability to modulate their perception of pain through a perspective shift suggests that this strengthened connectivity may be the mechanism underlying the change in pain perception.

Alternatively, the AIC and the daMCC are often jointly activated, which is consistent with the idea that they serve as complementary limbic sensory and motor regions (Craig, 2009). These two regions work together to form a link between self-recognition and self-control. In the present study, the observed increase in AIC-daMCC connectivity may be related to enhanced self-recognition and self-control resulting from MBSR training. Participants' perception of pain differed after the 6-week training program. We contend that this difference results from cognitive re-evaluation made possible by MBSR attention training, in keeping with the contention, well supported by Garland et al. (2009, 2011), that a positive shift in perspective is an effective means of coping with stressful events and a good predictor of increased mindfulness. In the model advanced by Garland et al. (2011), positive perspective shift and mindfulness are linked in an upward-spiraling relationship: the more mindful one becomes, the more likely one is to use a positive perspective shift as a strategy for coping with previously stressful or painful experiences.

## CONCLUSION

Our resting-state fMRI results suggest that people can modify their cognition of pain through attention training. The combined resting-state fMRI and questionnaire results indicate that a possible mechanism, perspective shift, may be at work in the cognitive regulation of pain perception during mindfulness training. Nevertheless, several limitations should be acknowledged. First, we did not investigate how the quantity and quality of MBSR practice influenced the training effect. Second, we did not scan participants while they performed a cognitive task, which could further reveal the function and meaning of the increased connection strength.

## FUTURE STUDIES

Although recent studies, especially those involving neuroimaging, have started to identify brain areas and networks that mediate the correlation between mindfulness and attention, the underlying neural mechanisms remain unclear. To better understand the neural basis of these changes in the brain, we would like to consider other mechanisms in future studies. For instance, is it through language that pain is noticed, constructed, and assessed, even though pain-describing language is hard to come by (Dyer, 2011)? The notion that labeling emotional states can help regulate negative emotions is hardly new; it is often seen in talk therapies in which individuals are instructed to put their feelings into words in order to manage or transform them. Lieberman (2011) further showed that putting feelings into words activates a brain region capable of inhibiting various kinds of immediate experience, including affective distress. Perspective-taking is clearly not foreign to language, and empathy (i.e., taking the perspective of someone else) is partially mediated by language (Izard et al., 1988).

# AUTHOR CONTRIBUTIONS

I-WS, K-CL, K-YC, S-TH, and T-LC designed this study. I-WS, F-WW, W-ZS, and T-LC collected and analyzed data. I-WS, F-WW, K-CL, and T-LC wrote the paper.

## ACKNOWLEDGMENTS

This work was supported by the Ministry of Science and Technology (formerly the National Science Council of Taiwan) (MOST 102-2420-H-002-011-MY3, MOST 104-2410-H-002-151, and NSC 101-2410-H-002-093-MY2). We thank the MOST-supported Imaging Center for Integrated Body, Mind and Culture Research, National Taiwan University, for technical and facility support. We also thank Mr. Roy Te-Chung Chen for conducting the MBSR training sessions.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Su, Wu, Liang, Cheng, Hsieh, Sun and Chou. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.


# A Possible Role of Prolonged Whirling Episodes on Structural Plasticity of the Cortical Networks and Altered Vertigo Perception: The Cortex of Sufi Whirling Dervishes

#### Yusuf O. Cakmak <sup>1</sup> \*, Gazanfer Ekinci <sup>2</sup> , Armin Heinecke<sup>3</sup> and Safiye Çavdar <sup>4</sup>

<sup>1</sup>Department of Anatomy, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand, <sup>2</sup>Radiology Department, School of Medicine, Marmara University, Istanbul, Turkey, <sup>3</sup>Brain Innovation BV, Biopartner Center, Maastricht, Netherlands, <sup>4</sup>Department of Anatomy, School of Medicine, Koc University, Istanbul, Turkey

Although a few minutes of spinning may induce vertigo in healthy humans, Sufi Whirling Dervishes (SWDs) can spin continuously for an hour without perceiving vertigo, possibly as a result of perceptual plasticity. This unique long-term stimulation of the vestibular system presents a potential human model for clarifying the cortical networks underlying resistance to vertigo. This study therefore aimed to investigate potential structural cortical plasticity in SWDs. Magnetic resonance imaging (MRI) scans of 10 SWDs and 10 controls were obtained using a 3T scanner, and cortical thickness was calculated across the whole cortex. Compared with the control group, SWDs had significantly thinner cortical areas in the hubs of the default mode network (DMN), as well as in motion perception and discrimination areas including the right dorsolateral prefrontal cortex (DLPFC), the right lingual gyrus, the left visual area 5 (V5)/middle temporal (MT) area and the left fusiform gyrus. In conclusion, this is the first report to suggest a potential relationship between motion/body perception-related cortical networks and the ability to whirl for prolonged periods without vertigo or dizziness.

#### Edited by:

Claudia Voelcker-Rehage, Chemnitz University of Technology, Germany

#### Reviewed by:

Hidenao Fukuyama, Kyoto University, Japan

Xin Di, New Jersey Institute of Technology, USA

#### \*Correspondence:

Yusuf O. Cakmak yusuf.cakmak@otago.ac.nz

Received: 26 August 2016 Accepted: 03 January 2017 Published: 23 January 2017

#### Citation:

Cakmak YO, Ekinci G, Heinecke A and Çavdar S (2017) A Possible Role of Prolonged Whirling Episodes on Structural Plasticity of the Cortical Networks and Altered Vertigo Perception: The Cortex of Sufi Whirling Dervishes. Front. Hum. Neurosci. 11:3. doi: 10.3389/fnhum.2017.00003

Keywords: cortex, whirling, meditation, vertigo, vestibular, plasticity

## INTRODUCTION

Most studies to date that have investigated brain networks for vertigo or the vestibular system have relied on isolated stimulation of the vestibular system, such as caloric or galvanic stimulation. However, these approaches lack the combined engagement of proprioception and vestibular organ stimulation that would mimic vertigo arising from motion perception. To date, no study has investigated the structural plasticity induced by prolonged stimulation of both systems together. The Mevleviye Semazens, also known as Sufi Whirling Dervishes (SWDs), practice a unique style of meditation termed the Sema Ceremony, which may provide a unique model for investigating the cortical networks of motion perception and balance together with the vestibular and proprioceptive sensory systems.

**Abbreviations:** DLPFC, Dorsolateral Prefrontal Cortex; DMN, Default Mode Network; fMRI, Functional Magnetic Resonance Imaging; LH, Left hemisphere; MT, Middle Temporal; PCC, posterior cingulate cortex; POI, Patch of Interest; RH, Right hemisphere; SWDs, Sufi Whirling Dervishes; TAL, Talairach; TMS, Transcranial Magnetic Stimulation; V5, Visual area 5.

The Mevleviye is an ascetic Sufi order founded in 1273 in Konya, Turkey (Smeets, 2006). In the Sema Ceremony, an SWD rotates anti-clockwise around the vertical axis of the body while also rotating around the other SWDs. Whirling is intended as a journey of the soul's awareness and a loosening of the material self. To prepare for the Sema Ceremony, so that they are able to rotate for up to 1 h without perceiving vertigo or dizziness, Sufis traditionally receive up to 1000 days of training within the Mevlevi houses. At the end of this period, they are considered trained SWDs. They then rejoin their families and return to their jobs, but gather for the Sema Ceremony several days a week (Smeets, 2006). This unique whirling-based meditation style allows SWDs to achieve the extraordinary physiological outcome of overcoming the vertigo and balance impairment that would otherwise be expected after prolonged whirling.

It has been argued that alternating self-localization is caused by abnormal integration of vestibular signals. Additionally, vestibular processing has been shown to be involved in space perception and locomotion (somatosensory processing), as well as in the cognitive aspects of own-body representation, own-body consciousness and bistable visual perception (Lopez et al., 2010, 2012).

The default mode of functioning was initially defined on the basis of particular brain areas whose activity decreases when subjects focus on goal-directed tasks compared with simply resting (Raichle and Snyder, 2007). In subsequent years, this definition was extended to the default mode network (DMN; Raichle and Snyder, 2007). The DMN has been observed to be related to self-awareness, consciousness, embodiment and also unhappiness (Killingsworth and Gilbert, 2010; Brewer et al., 2011). It may therefore be theorized that prolonged periods of whirling-based meditation by Sufi dervishes contribute to structural changes in the DMN and self-perception networks, as well as in motion perception-related networks. A recent analysis of DMN research that included anatomical connectivity and task-evoked neuroimaging revealed hubs of resting-state activity, including the posterior cingulate cortex (PCC) and precuneus (Gramann et al., 2006; Andrews-Hanna, 2012). Notably, in the study of Kang et al. (2013), the regions of decreased cortical thickness in meditators' brains were the precuneus and PCC of the DMN. Additionally, an fMRI study of various meditation types, not including SWDs (Brewer et al., 2011), demonstrated that experienced meditators showed decreased activity in the DMN, including its main hub, the precuneus, compared with controls (Laird et al., 2009).

Previous studies demonstrated that long-term meditation practice is associated with altered resting brain activity, suggesting that long-lasting activity changes persist in the brain (Lutz et al., 2004). Subsequent cross-sectional studies demonstrated that meditation- and experience-dependent differences are correlated with cortical thickness (Maguire et al., 2000; Mechelli et al., 2004; Lazar et al., 2005). Significant positive associations have also been found between a cognitive ability factor and cortical thickness in most multimodal association areas in a large sample of healthy children and adults (Karama et al., 2009). Moreover, a relationship between cortical thickness and functional activation in the early blind has also been demonstrated recently (Anurova et al., 2014). A recent cross-sectional study (Burge et al., 2016) comparing 10 patients with macular degeneration with 10 controls demonstrated that cortical thickness in human V1 is associated with central vision loss, underlining the functional relevance of cortical thickness. Three lines of evidence informed our hypothesis: meditation studies without motion but related to embodiment, which demonstrated functional depression of cortical hubs of body perception networks; experience-related cortical thickness differences in cross-sectional studies; and thinning of relevant cortical areas after loss of visual input. On this basis, we hypothesized that SWDs might also show decreased cortical thickness in the precuneus and PCC of the DMN, because depressed or altered perception of motion and embodiment inputs could induce cortical neuronal changes that result in thinning of the responsible cortices. Any additional structural plasticity in the cortical areas of SWDs may also highlight plasticity of the motion-related networks that could be responsible for the altered vertigo perception of SWDs.

Improvements in navigation software for non-invasive brain stimulation techniques such as Transcranial Magnetic Stimulation (TMS) have enabled structural and functional mapping of the brain and targeted stimulation of specific cortical areas. This study aimed to map the structural cortical plasticity induced by Sufi whirling meditation, as a unique human model of vestibular system stimulation and plasticity, in order to clarify a network that may alter vertigo perception in SWDs; it may also provide a potential cortical map for non-invasive brain stimulation modalities to alleviate vertigo.

## MATERIALS AND METHODS

### Participants

Far fewer SWD ceremonies are now performed in Mevlevi houses, as a result of secularization policies enforced by the government in the early 20th century. Although the Turkish government again allowed performances in the late 20th century, most of these have been confined to public tourist audiences and simplified to meet commercial requirements (Smeets, 2006). Consequently, it is difficult to recruit SWDs who perform the whirling ceremony using the traditional physical and spiritual method.

Ten right-handed traditional SWDs (8 male, 2 female adults) with more than 8 years (mean 10.5 years) of whirling meditation experience (two regular whirling sessions per week) and 10 meditation-naive right-handed controls (8 male, 2 female adults) were included in our study. The controls were case-matched for country of origin and location (Turkey, Istanbul), primary language (Turkish) and demographics such as sex, age, race, education and employment status. The SWD group had a mean age of 32 years (range: 26–44; 8 male, 2 female), and the control group had a mean age of 33 years (range: 26–44; 8 male, 2 female). Exclusion criteria for all subjects were abnormalities on magnetic resonance imaging (MRI), MRI-incompatible implants or implanted devices, and general medical disorders or any clinically relevant abnormalities. All subjects were free of medical, neurological and psychiatric disorders.

All procedures of this study were carried out in accordance with the principles and procedures outlined in the Declaration of Helsinki for medical research involving human subjects. The study was approved by the Ethics Committee (Institutional Review Board) for noninvasive human research at Koç University (head of the committee: Prof. Ihsan Solaroglu). Each participant provided written informed consent before entering the study and understood that s/he could discontinue participation at any time.

### Scanning Sequence

MRI scans of the participants were carried out on a 3T scanner with an 8-channel head coil (MAGNETOM Verio, Siemens Healthcare, Erlangen, Germany). A three-dimensional T1-weighted magnetization-prepared rapid acquisition gradient echo (3D T1-weighted MPRAGE) sequence was used to acquire whole-brain volume data for all participants. The 3D T1-weighted MP-RAGE protocol (42) used the following parameters: TR = 1670 ms, TE = 2.47 ms, TI = 900 ms, flip angle = 9°, 176 sagittal slices with 1.0 mm slice thickness, and a 256 × 256 scanning matrix with a 250 mm field of view, resulting in a voxel size of 1.0 mm × 1.0 mm × 1.0 mm. The total scan time for 3D T1-weighted imaging was 3.47 min.
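The nominal voxel size follows directly from the acquisition parameters above. As a quick arithmetic check, the sketch below assumes a square 250 mm field of view sampled by the 256 × 256 matrix (the helper name `voxel_geometry` is ours, not part of any scanner software): the in-plane resolution is 250/256 ≈ 0.98 mm, which is reported as 1.0 mm, and 176 sagittal slices of 1.0 mm cover 176 mm from left to right.

```python
def voxel_geometry(fov_mm, matrix, n_slices, slice_thickness_mm):
    """Return (in-plane voxel size in mm, slab coverage in mm).

    Assumes a square field of view sampled by a square matrix; this helper
    is illustrative only.
    """
    in_plane_mm = fov_mm / matrix
    coverage_mm = n_slices * slice_thickness_mm
    return in_plane_mm, coverage_mm


# Parameters reported for the 3D T1-weighted MPRAGE protocol.
in_plane_mm, coverage_mm = voxel_geometry(250.0, 256, 176, 1.0)
print(f"in-plane voxel: {in_plane_mm:.3f} mm, rounded: {round(in_plane_mm, 1)} mm")
print(f"sagittal slab coverage: {coverage_mm:.0f} mm")
```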

### Anatomical Data Processing

#### Data Import, Preprocessing and Normalization

Raw MRI data from each subject were provided in DICOM format, then imported and converted into BrainVoyager's internal "VMR" data format. Correction for inherent spatial intensity inhomogeneities was applied according to Vaughan et al. (2001). The data were then transformed into AC-PC position and Talairach (TAL) standard space.

#### Cortex Segmentation

Segmentation of the gray/white matter boundaries was achieved using the method of Kriegeskorte and Goebel (2001). This included automatic segmentation routines followed by a "bridge removal" algorithm, which ensured the creation of topologically correct mesh representations. For the two resulting segmented subvolumes, the borders were tessellated to produce surface reconstructions of the left hemisphere (LH) and right hemisphere (RH; Kriegeskorte and Goebel, 2001). All processing steps for segmentation and cortical thickness calculation were thoroughly checked and evaluated by an expert. This expert check is an important part of preparing the cortical thickness calculation in BrainVoyager, which is not directly comparable to the automated approaches of other analysis tools.
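The purpose of bridge removal can be illustrated with a simple topological check. A closed surface with the topology of a sphere, which is what an intact hemisphere reconstruction should be, has Euler characteristic χ = V − E + F = 2, and each residual handle (a "bridge") lowers χ by 2. The sketch below is a generic check on triangle meshes, not BrainVoyager's bridge-removal algorithm; the toy meshes are hypothetical.

```python
def euler_characteristic(faces):
    """Compute V - E + F for a closed triangle mesh given as vertex-index triples."""
    vertices = set()
    edges = set()
    for a, b, c in faces:
        vertices.update((a, b, c))
        for u, v in ((a, b), (b, c), (c, a)):
            edges.add((min(u, v), max(u, v)))  # undirected edge, stored once
    return len(vertices) - len(edges) + len(faces)


def torus_faces(n=3):
    """Triangulate an n x n grid with wrap-around edges: a surface with one handle."""
    def idx(i, j):
        return (i % n) * n + (j % n)

    faces = []
    for i in range(n):
        for j in range(n):
            a, b, c, d = idx(i, j), idx(i + 1, j), idx(i + 1, j + 1), idx(i, j + 1)
            faces += [(a, b, c), (a, c, d)]
    return faces


# A tetrahedron is the smallest closed triangle mesh with sphere topology.
tetrahedron = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
print(euler_characteristic(tetrahedron))    # 2: sphere-like surface
print(euler_characteristic(torus_faces()))  # 0: one handle ("bridge")
```

A mesh that reports χ = 2 is consistent with a single sphere-like sheet; any lower value signals handles that a topology-correction step would need to cut.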

#### High-Resolution Intersubject Cortex Alignment

A high-resolution, multiscale cortex alignment procedure was performed following the method of van Atteveldt et al. (2004). This procedure substantially increased the statistical power and spatial specificity of the group analyses. Before the group analysis of the subject-specific cortical thickness maps, all single-subject maps were aligned using transformation matrices generated by cortex-based alignment (Fischl et al., 1999). This cortex-based alignment approach (Fischl et al., 1999) was applied specifically to make cortical structures properly comparable across subjects. Given this, smoothing of the cortical thickness data was not necessary, and adding smoothing might even be more harmful than helpful. In this context, several reference articles using BrainVoyager to analyze cortical thickness data do not apply spatial smoothing (Davis et al., 2008; Geuze et al., 2008; Strenziok et al., 2011; Van Swam et al., 2012; Thorns et al., 2013).

#### Cortical Thickness Analysis

The normalized version of each VMR was prepared for the cortical thickness calculation as follows. First, the VMR data were interpolated to a higher resolution (0.5 mm × 0.5 mm × 0.5 mm) using sinc interpolation. In this new dataset, the ventricles and subcortical areas were filled with a standard intensity value. The cerebellum was removed using an automatic detection approach. Then, a sigma filtering step was applied to enhance the tissue contrast of the data.
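A sigma filter is an edge-preserving smoother: each sample is replaced by the mean of those neighbours whose intensity lies within k·σ of the centre value, so noise is averaged within a tissue class but intensities are not mixed across a tissue boundary. The text does not give BrainVoyager's filter settings, so the window radius, σ and k below are assumptions, and the 1D profile is a hypothetical intensity trace across a boundary; the real step operates on 3D volumes.

```python
def sigma_filter_1d(signal, radius=1, sigma=10.0, k=2.0):
    """Edge-preserving sigma filter on a 1D intensity profile.

    Each output value is the mean of the neighbours (within `radius`
    samples) whose intensity differs from the centre by at most k * sigma.
    The centre always qualifies, so the window is never empty.
    """
    n = len(signal)
    out = []
    for i, centre in enumerate(signal):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = [v for v in signal[lo:hi] if abs(v - centre) <= k * sigma]
        out.append(sum(window) / len(window))
    return out


# Hypothetical profile: noisy low-intensity tissue next to high-intensity tissue.
profile = [100.0, 102.0, 98.0, 200.0, 198.0, 202.0]
smoothed = sigma_filter_1d(profile)
print(smoothed)  # [101.0, 100.0, 100.0, 199.0, 200.0, 200.0]
```

A plain 3-sample mean would give (102 + 98 + 200) / 3 ≈ 133.3 at index 2 and blur the boundary; the sigma filter keeps the step sharp while still reducing within-tissue noise.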

Next, the boundary between gray and white matter was detected using a gradient-based adaptive approach, and the border between gray matter and cerebrospinal fluid (CSF) was detected on the basis of a dilation procedure. The result of these preparatory steps is a VMR representing only gray and white matter, each with its own grayscale intensity value. To improve the quality of the procedure, this dataset was compared with the original VMR file at 0.5 mm resolution and corrected for potential errors.

To calculate cortical thickness across the whole cortex, Laplace's equation (Jones et al., 2000) was applied. The volumetric cortical thickness values were then sampled onto standardized surface meshes of the separate hemispheres using trilinear interpolation.
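The Laplace approach of Jones et al. (2000) treats the gray matter as a conductor held at potential 0 on the white-matter boundary and 1 on the pial (CSF) boundary; thickness at a point is the length of the field line (streamline of the potential gradient) running from one boundary to the other. The sketch below is a 2D toy version on a synthetic ring of "gray matter" between radii 6 and 14 voxels, using Jacobi relaxation and fixed-step streamline tracing. The geometry, labels, iteration count and step size are assumptions, and BrainVoyager's 3D implementation is not reproduced. Along a radius, the traced length should come out close to the ring width of 8 voxels.

```python
import math

WM, GM, CSF = 0, 1, 2  # hypothetical tissue labels


def make_ring(size, r_inner, r_outer):
    """Label a square grid: WM disc, GM ring, CSF outside."""
    c = size // 2
    labels = [[GM] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            d = math.hypot(x - c, y - c)
            if d <= r_inner:
                labels[y][x] = WM
            elif d >= r_outer:
                labels[y][x] = CSF
    return labels


def solve_laplace(labels, iterations=1000):
    """Jacobi relaxation: phi = 0 in WM, 1 in CSF, harmonic in GM."""
    size = len(labels)
    phi = [[1.0 if lab == CSF else 0.0 for lab in row] for row in labels]
    for _ in range(iterations):
        new = [row[:] for row in phi]
        for y in range(1, size - 1):
            for x in range(1, size - 1):
                if labels[y][x] == GM:
                    new[y][x] = 0.25 * (phi[y - 1][x] + phi[y + 1][x]
                                        + phi[y][x - 1] + phi[y][x + 1])
        phi = new
    return phi


def bilinear(field, x, y):
    """Bilinearly interpolate a grid field at a non-integer point (x, y)."""
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    fx, fy = x - x0, y - y0
    return (field[y0][x0] * (1 - fx) * (1 - fy) + field[y0][x0 + 1] * fx * (1 - fy)
            + field[y0 + 1][x0] * (1 - fx) * fy + field[y0 + 1][x0 + 1] * fx * fy)


def streamline_length(phi, start, step=0.05, max_steps=10000):
    """Follow the normalised gradient of phi from the WM side until phi reaches ~1."""
    size = len(phi)
    gx = [[0.0] * size for _ in range(size)]
    gy = [[0.0] * size for _ in range(size)]
    for y in range(1, size - 1):
        for x in range(1, size - 1):
            gx[y][x] = 0.5 * (phi[y][x + 1] - phi[y][x - 1])  # central differences
            gy[y][x] = 0.5 * (phi[y + 1][x] - phi[y - 1][x])
    x, y = start
    length = 0.0
    for _ in range(max_steps):
        if bilinear(phi, x, y) >= 0.999:
            break
        dx, dy = bilinear(gx, x, y), bilinear(gy, x, y)
        norm = math.hypot(dx, dy)
        if norm == 0.0:
            break
        x, y = x + step * dx / norm, y + step * dy / norm
        length += step
    return length


labels = make_ring(33, r_inner=6, r_outer=14)
phi = solve_laplace(labels)
thickness = streamline_length(phi, start=(22.0, 16.0))  # WM boundary on the +x axis
print(f"traced thickness: {thickness:.2f} voxels")
```

Unlike a nearest-distance measure, the streamline is defined at every gray-matter point and never crosses another streamline, which is the main reason the Laplace formulation is used for curved cortex.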

To correct the final map of group differences in cortical thickness for multiple comparisons, the automated cluster-level thresholding approach (Forman et al., 1995) was applied. In this approach, every patch of interest (POI) exceeding a calculated size (in mm² on the surface mesh) is considered significant, so that the final map has, on average, a corrected false alarm level of 5%. A cluster-defining threshold (CDT) of p < 0.05 was used. After cluster-level thresholding, each contiguous region on the surface exceeding the calculated size threshold (58 mm² for the LH and 65 mm² for the RH) was turned into a POI. Because there was no specific assumption about the direction of the difference in cortical thickness between groups, a two-sided t-test was performed to analyze the group differences.
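The cluster-level step can be sketched as follows, assuming the vertex-wise p values and the minimum patch size are already known. In Forman et al. (1995), the size threshold is derived by Monte Carlo simulation of the null distribution of cluster sizes; that simulation is not reproduced here, and the 58 mm² value is simply the LH threshold reported above. The mesh strip, per-vertex areas and p values are hypothetical.

```python
from collections import deque


def surviving_clusters(p_values, vertex_area_mm2, neighbours, cdt=0.05, min_area_mm2=58.0):
    """Keep connected supra-threshold patches whose summed area reaches min_area_mm2.

    p_values[v]: vertex-wise p value; vertex_area_mm2[v]: surface area assigned
    to vertex v; neighbours[v]: vertices sharing a mesh edge with v.
    """
    supra = {v for v, p in enumerate(p_values) if p < cdt}
    seen = set()
    clusters = []
    for seed in sorted(supra):
        if seed in seen:
            continue
        # Breadth-first search over supra-threshold neighbours.
        seen.add(seed)
        queue = deque([seed])
        patch = set()
        while queue:
            v = queue.popleft()
            patch.add(v)
            for w in neighbours[v]:
                if w in supra and w not in seen:
                    seen.add(w)
                    queue.append(w)
        if sum(vertex_area_mm2[v] for v in patch) >= min_area_mm2:
            clusters.append(patch)
    return clusters


# Hypothetical 8-vertex strip of mesh, 20 mm^2 per vertex.
p = [0.01, 0.02, 0.03, 0.50, 0.04, 0.60, 0.01, 0.01]
area = [20.0] * 8
nbrs = [[j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)]
print(surviving_clusters(p, area, nbrs))  # [{0, 1, 2}]: 60 mm^2 >= 58 mm^2
```

The patches {4} (20 mm²) and {6, 7} (40 mm²) pass the CDT vertex by vertex but are discarded as too small, which is how isolated supra-threshold vertices are prevented from inflating the false alarm rate.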

For the surface-based analysis, a standardized mesh was applied to the cortical surface of every subject. The standardized mesh has exactly 40961 vertices per hemisphere (identical for every subject), which solves potential mapping issues between regions and between subjects. For the group comparison of the subject-specific cortical thickness surface maps, the cortically aligned thickness values were simply subtracted between the two cohorts (Sufi dervishes and controls). The analysis was performed separately for each hemisphere, based on the standard approach of segmentation and cortex-based alignment in BrainVoyager QX; separating the hemispheres also allows a more specific evaluation of hemisphere-specific effects. In addition to the t-test comparing the groups, the variability within and between groups was checked, because it is still important to inspect the details within globally "detected" regions of interest. This served as a "sanity check" of the data included in the analysis and as an exploration of the variability within and between groups.
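The vertex-wise comparison can be sketched as a two-sample t statistic computed at each vertex of the standardized mesh. This sketch assumes a pooled-variance (Student) t-test, which the text does not specify, and assumes each subject's thickness map is a list indexed by the same vertex ids after cortex-based alignment; the function names and toy values are hypothetical. With 10 subjects per group the test has 18 degrees of freedom, for which the two-sided 5% critical value is about 2.101, consistent with the RH hemisphere-average result (t = 2.07, p = 0.055) falling just short of significance.

```python
import math
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Two-sample Student t statistic with pooled variance; returns (t, df)."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(group_a) - mean(group_b)) / se, df


def vertexwise_t(sufi_maps, control_maps):
    """t statistic at every vertex; maps are per-subject lists of thickness (mm)."""
    n_vertices = len(sufi_maps[0])
    return [
        pooled_t([m[v] for m in sufi_maps], [m[v] for m in control_maps])[0]
        for v in range(n_vertices)
    ]


# Hypothetical 2-vertex maps for 3 subjects per group (the study had 10 per group).
sufi = [[2.0, 3.0], [2.2, 3.1], [2.4, 2.9]]
controls = [[2.4, 3.0], [2.6, 3.1], [2.8, 2.9]]
t_map = vertexwise_t(sufi, controls)
print([round(t, 3) for t in t_map])
```

The numerator of each t value is the group-mean difference, i.e., the subtraction map described above; dividing by the pooled standard error turns it into the statistic that the cluster-defining threshold is applied to.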

## RESULTS

**Figure 1** shows regional differences in cortical thickness between Sufi dervishes and the control group, displayed on the average surface mesh (after applying the cortex-based alignment procedure to obtain optimal fitting of cortical structures). The average difference in cortical thickness was 0.10 mm for the LH and 0.15 mm for the RH. When the average cortical thickness within each hemisphere was compared between groups, neither hemisphere showed a significant difference (LH: t = 1.56, p = 0.14; RH: t = 2.07, p = 0.055).

**Table 1** shows the cortical thickness differences between Sufis and Controls for the whole brain and **Table 2** shows the significant clusters with their t, p values, X,Y,Z coordinates and areas on both hemispheres.

Corresponding TAL coordinates were extracted for the detected POIs on the surface mesh. This is possible because the cortical surface meshes were created from the underlying TAL-space VMR and therefore share its coordinate reference.

The coordinates were then analyzed externally using the Talairach Daemon database (Lancaster et al., 1997), and the results are displayed in **Table 2**. **Figure 1** shows surface maps of the statistically significant differences in cortical thickness between groups (Sufi–Controls) in both hemispheres: four POIs for the RH and five POIs for the LH.



LH, Left hemisphere; RH, Right hemisphere.

## DISCUSSION

The present study demonstrates differences in cortical thickness between the brains of 10 SWDs and 10 controls, as evidence of structural plasticity potentially induced by the whirling meditation of SWDs.

As a reflection of structural plasticity, cortical thickness is considered one of the best measures for revealing the prolonged effects of meditation on the cerebral cortex. It is well documented that meditation practices, including Zen meditation, induce cortical plasticity in the form of thickened gray matter in specific cortical zones (Lazar et al., 2005; Pagnoni and Cekic, 2007; Hölzel et al., 2008, 2010; Luders et al., 2009; Vestergaard-Poulsen et al., 2009; Grant et al., 2010, 2013). Whereas Lazar et al. (2005) demonstrated increased cortical thickness in the insula and prefrontal regions, Kang et al. (2013) demonstrated increased cortical thickness only in the frontal and temporal regions, and decreased cortical thickness in the DMN, including its main hubs, the precuneus and PCC, with different types of meditation. These findings are supported by the fMRI study of Brewer et al. (2011), which demonstrated decreased activity in the hubs of the DMN across numerous types of meditation. The cortical thickness results of SWDs in this study are similar to the DMN plasticity reported by Brewer et al. (2011) and Kang et al. (2013). There were four thinner cortical areas in the RH and five in the LH, including the DMN hubs, the precuneus and PCC. No cortical areas were thicker than in the control group. The thinner cortical zones of SWDs were the precuneus and PCC in both hemispheres, the middle temporal (MT)/visual area 5 (V5) and fusiform gyrus in the LH, and the dorsolateral prefrontal cortex (DLPFC) and lingual gyrus in the RH. Because SWDs combine meditation practice with movement, the findings in the left MT/V5, left fusiform gyrus, right DLPFC and DMN hubs may all underlie the altered vertigo perception of SWDs during whirling motion.

FIGURE 1 | Thinner cortical areas in Sufi Dervishes (Sufi > Controls). All regions are based on cluster thresholding.


TABLE 2 | The significant clusters with their t, p values, X, Y, Z coordinates and areas on both hemispheres.

### Altered Perception of Motion and Vertigo

#### Left MT/V5

Our results demonstrated a thinner left MT/V5 in SWDs. Previous studies of V5/MT have shown that the left MT/V5 is more robustly involved in motion detection (Beckers and Hömberg, 1992; Stewart et al., 1999; Antal et al., 2004; Schwarzkopf et al., 2011; Tadin et al., 2011; Murd et al., 2012).

This left-hemisphere dominance may explain why V5/MT was thinner in the LH of SWDs. If the function of left V5/MT is depressed, the perception of movement during whirling meditation may be reduced, and consequently the physiological and motor responses to perceived whirling motion may be inhibited. In addition to the robust role of left V5/MT in motion perception, discrimination has also been found to be related to the left rather than the right V5/MT (Cornette et al., 1998).

#### Fusiform Gyrus, Right DLPFC and Change Detection

The fusiform gyrus is well known for its role in face detection, but it also discriminates between places, particularly in its medial and anterior portions. Conscious detection of visual changes, including face and place changes, relies mostly on regions of the ventral visual cortex, including the fusiform gyrus, but also on the right DLPFC (Beck et al., 2001). The left fusiform gyrus has also been found to be related to non-face visual changes (Rangarajan et al., 2014). Further, the left fusiform gyrus showed greater activity when participants attended to changes in face parts than to changes in the whole face, whereas the opposite pattern was found in the right fusiform face area (FFA; Rossion et al., 2000).

The role of the right DLPFC in visual change awareness has been demonstrated by fMRI and TMS studies (Beck et al., 2001; Turatto et al., 2004). The right DLPFC is activated by place changes and has been shown by fMRI and TMS studies to be inactive in cases of change blindness (Beck et al., 2001). Additionally, activation of the ventral occipitotemporal cortex, including the lingual gyrus, has been shown to be related to the processing of visual information about human faces (McCarthy et al., 1991).

Our analyses showed that in the cortex of SWDs, the fusiform gyrus was thinner on the left, whereas the lingual gyrus and DLPFC were thinner only on the right. SWDs may therefore also have an altered perception of place changes, which is especially important for maintaining body stability during whirling meditation, because whirling provides continuous place-change stimulation to the place-change detection areas of the fusiform gyrus, lingual gyrus and right DLPFC.

#### Precuneus, Egocentric Framework and Default Network Hubs

In three-dimensional space, humans navigate with the aid of spatial reference frames. Two different types of spatial reference framework (or frame of reference) are allocentric and egocentric. Whereas the allocentric framework depends on object-to-object positional references and is independent of self-position, the egocentric framework depends on self-to-object positional references (Vogeley and Fink, 2003; Gramann et al., 2006). Research on the cortical regions related to these two frameworks has revealed that, compared with allocentric conditions, egocentric conditions produce activations exclusively within the precuneus (Gramann et al., 2006). As the site of egocentric representation, the precuneus performs updating during self-motion, and it has been shown to be the only region involved in working memory for directional updating (Land, 2014). The precuneus is considered a machinery of self-perception that builds conscious self-perception by providing continuous data about external space, maintaining a synchronous relationship between the moving body and the objects in the environment (Land, 2014). In this study, SWDs showed a bilaterally thinner precuneus. A thinner precuneus may reflect a depressed egocentric framework that aids the extraordinary whirling ability of SWDs, in addition to the possibly depressed activity of the left MT/V5 and fusiform gyrus and the right DLPFC and lingual gyrus. Last but not least, electrical cortical stimulation of the precuneus has been shown to produce vestibular sensations, implicating the precuneus in vestibular information processing (Wiest et al., 2004).

### The Default Mode and Subsystems

A recent analysis of DMN neuroimaging studies suggests that the precuneus is the core node, or hub, of the DMN (Andrews-Hanna, 2012). Activation of both the DMN core hub precuneus and the prefrontal cortex has been found to be related to self-perception and mind wandering (Kjaer and Lou, 2000; Kjaer et al., 2002). Imaging studies have shown that these areas are deactivated in altered states of consciousness, including the vegetative state, hypnosis and sleep (Maquet et al., 1997, 1999; Laureys et al., 1999; Hobson et al., 2000; Maquet, 2000). It has also been observed that the precuneus is one of the first regions to be reactivated when consciousness is regained (Laureys et al., 2004, 2006).

In addition to the bilaterally thinner precuneus, SWDs had a thinner posterior cingulate gyrus. Relevant to these findings, both regions have been shown to become progressively deactivated during anesthetic sedation (Alkire et al., 1999; Fiset et al., 1999). Additionally, independent researchers have observed coactivation of the precuneus and PCC when processing self-related intentions (Vogeley and Fink, 2003; den Ouden et al., 2005).

The outcomes of the present study may open new avenues in vertigo therapy. If whirling-based movement inhibits self-perception and motion-related regions such as the left V5/MT, fusiform gyrus and right DLPFC, as well as the DMN hubs precuneus and PCC, this may reflect cortical plasticity of a network of resistance to vertigo perception. Inhibiting these areas with interventional non-invasive brain stimulation tools might help to maintain balance under conditions of extreme spatial and proprioceptive input, as well as in pathological conditions that trigger vertigo.

#### Mood

A possible mood-enhancing effect of the structurally altered cortical areas identified in the present study deserves attention.

In the theory of mind, DMN activity, as a state of self-awareness, is correlated with the neuronal representation of mind wandering (Kjaer and Lou, 2000; Kjaer et al., 2002). The DMN is active at all times except when suppressed by other networks or states, and its activity is correlated with lower levels of happiness (Killingsworth and Gilbert, 2010; Brewer et al., 2011). Gusnard and Raichle (2001) demonstrated that goal-directed cognitive processes can decrease the activity of the precuneus, the core hub of the brain's DMN or mind-wandering network. It may therefore be theorized that prolonged periods of goal-directed cognitive processing decrease precuneus activity, and thereby mind-wandering activity, in the brains of SWDs. fMRI results from the study of Brewer et al. (2011) on the DMN of experienced meditators showed that the main DMN nodes, including the medial prefrontal cortex and the PCC extending into the precuneus, were relatively deactivated. Further, the precuneus and PCC have been observed to be thinner, which was interpreted as a response to meditation (Kang et al., 2013). In line with these results, the structural cortical thickness analysis of the two cohorts in this study showed that experienced SWDs had a bilaterally thinner PCC and precuneus. It can therefore be theorized that a prolonged period of decreased activity in the PCC and precuneus may result in these thinner zones in SWDs. As the DMN hubs were thinner in SWDs, this is likely related to suppressed mind wandering, and as a consequence, this plasticity may improve the level of happiness in SWDs. These results justify further studies to clarify the potential effects of the unique meditation of SWDs on their mood and depression levels.

#### Behavior

In addition to the potential mood-enhancing effects of whirling meditation through decreased mind wandering or DMN activity, as achieved with other types of meditation, decreased activity in the DLPFC may contribute to the behavioral attribute of honesty. Just as the precuneus is a core hub of the DMN, the DLPFC is a core hub of the executive network (Beaty et al., 2015). It has been shown that creative idea production involves coupling of the PCC and precuneus of the DMN with the right DLPFC of the executive network (Beaty et al., 2015). It has also been shown that increased task or rule complexity is accompanied by increased activation in the right DLPFC and precuneus (Jia et al., 2015). Lying requires a creative idea production process as well as consideration of long-term benefits. Disruption of the right DLPFC has been shown to lead to more frequent selection of gains and losses that are better in the short term but worse in the long term (Essex et al., 2012). It has also been shown that when right DLPFC activity was disrupted using TMS, subjects were statistically less inclined to lie about the subject matter tested (Karton and Bachmann, 2011). Given the decreased thickness of the right DLPFC in SWDs, it may be theorized that this contributes to their honest behavior. It may also be speculated that the decreased thickness of the fusiform gyrus and right DLPFC contributes to reduced discrimination of places and faces, and that such an altered perception of the world and of people is also a result of SWDs' meditation. Further studies are needed to investigate whether suppression of cortical areas related to discriminative perception leads to less selfish, egocentric behavior and an increased level of happiness.

# Neuroprotection

DMN activity has also been found to be related to Alzheimer's disease (Bero et al., 2011). Bero et al. (2011) demonstrated that increased amyloid-β deposition overlaps with the DMN, including its core hubs, the precuneus and PCC regions. This overlap is attributed to the higher metabolic activity of the DMN. The decreased thickness of these regions in the SWD cortex shown in the present study may point to a possible protective effect of whirling meditation against Alzheimer's disease, by decreasing amyloid-β deposition in these regions as a consequence of their decreased activity in SWDs.

#### Limitations

The current study used a cross-sectional design and was performed on a small group of SWDs. Because of the cross-sectional character of the study, the results are correlational, and a causal relationship between cortical thinning and whirling experience cannot be established. It may also be argued that individuals who have such cortical properties are more likely to become Sufi whirling dervishes. On the other hand, it is worth noting that numerous factors relate the outcomes of the present study to whirling experience. This is a cross-sectional study of a very unique (and rare) group who had traditional training for whirling (approximately 1 year in most cases), and each of them reported that they fell down when they tried to whirl in the first months of the training sessions. This indicates that they did not have a pre-existing ability superior to the predisposition of the control group. After long-term training, they gained the ability to whirl without vertigo. The present study focused on experienced Sufis who had passed through the same traditional whirling training, which enabled them to whirl for an hour without falling. In this context, the detected structural differences are more likely to be specific to the motion perception and body perception networks. As the analyses were performed without visual input, the outcomes for these areas are free of bias. For the most part, the individual regions were not discussed separately but in the context of the relevant networks. The structural cortical plasticity demonstrated in SWDs was distributed over the body/motion perception areas; therefore, the discussion focused on the possible relationships between the structural plasticity of the body/motion perception areas and their potential role in altering vertigo perception. In sum, we interpreted the results on the basis of previous work, which was the most fitting approach for the data obtained in this cross-sectional study. Longitudinal studies are needed to clarify the role of these areas in vertigo perception.

The outcomes of this cross-sectional study in a rare and unique group (whirling Sufi dervishes) point to cortical zones that may play significant roles in altering vertigo/dizziness, and these areas warrant future study. In conclusion, this is the first report that demonstrates correlations between structural cortical plasticity and prolonged vestibular system stimulation in humans.

# AUTHOR CONTRIBUTIONS

YOC designed and performed the study, analyzed the data, wrote the manuscript and critically reviewed it. GE performed the study, collected data, contributed analytical tools and critically reviewed the manuscript. AH analyzed and interpreted data, contributed analytical tools, performed statistics and critically reviewed the manuscript. SÇ contributed analytical tools, collected data, wrote the manuscript and critically reviewed it.

# FUNDING

The study was funded by the start-up fund of YOC, School of Biomedical Sciences, University of Otago.

#### REFERENCES

stress disorder. Neuroimage 41, 675–681. doi: 10.1016/j.neuroimage.2008.03.007

part-based face processing in the human fusiform gyrus. J. Cogn. Neurosci. 12, 793–802. doi: 10.1162/089892900562606


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

AH is employed by Brain Innovation at Biopartners Center, Maastricht, Netherlands, as a member of the BrainVoyager software support team. He assisted in preparing the data with the new version of BrainVoyager. AH's employment by Brain Innovation does not alter the authors' adherence to all the Frontiers Journals policies on sharing data and materials.

Copyright © 2017 Cakmak, Ekinci, Heinecke and Çavdar. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Do's and Don'ts of Neurofeedback Training: A Review of the Controlled Studies Using Healthy Adults

Jacek Rogala\*, Katarzyna Jurewicz, Katarzyna Paluch, Ewa Kublik, Ryszard Cetnarski and Andrzej Wróbel

Laboratory of Visual System, Nencki Institute of Experimental Biology, Polish Academy of Sciences, Warsaw, Poland

The goal of EEG neurofeedback (EEG-NFB) training is to induce changes in the power of targeted EEG bands to produce beneficial changes in cognitive or motor function. The effectiveness of different EEG-NFB protocols can be measured using two dependent variables: (1) changes in EEG activity and (2) behavioral changes of a targeted function (for therapeutic applications, the desired changes should be long-lasting). To firmly establish a causal link between these variables and the selected protocol, similar changes should not be observed when appropriate control paradigms are used. The main objective of this review is to evaluate the evidence reported in the scientific literature that supports the validity of various EEG-NFB protocols. Our primary concern is to highlight the role that uncontrolled nonspecific factors can play in the results generated from EEG-NFB studies. Nonspecific factors are often ignored in EEG-NFB designs, or the relevant data are not presented, which means that conclusions should be interpreted cautiously. As an outcome of this review, we present a list of do's and don'ts that can be used to develop future EEG-NFB methodologies, based on the small set of experiments in which proper control groups excluded non-EEG-NFB-related effects. We found two features that correlated positively with the expected changes in power of the trained EEG band(s): (1) protocols that focused on training a smaller number of frequency bands and (2) a larger number of electrodes used for neurofeedback training. However, we did not find evidence supporting a positive relationship between power changes in the trained frequency band(s) and specific behavioral effects.

#### Edited by:

Soledad Ballesteros, Universidad Nacional de Educación a Distancia, Spain

#### Reviewed by:

Christian Beste, Technische Universität Dresden (TUD), Germany

Carlos J. Gomez-Ariza, Universidad de Jaén, Spain

#### \*Correspondence: Jacek Rogala

j.rogala@nencki.gov.pl

Received: 23 February 2016 Accepted: 02 June 2016 Published: 17 June 2016

#### Citation:

Rogala J, Jurewicz K, Paluch K, Kublik E, Cetnarski R and Wróbel A (2016) The Do's and Don'ts of Neurofeedback Training: A Review of the Controlled Studies Using Healthy Adults. Front. Hum. Neurosci. 10:301. doi: 10.3389/fnhum.2016.00301

Keywords: neurofeedback training, EEG, replicability, protocol efficacy, methodology

# INTRODUCTION

Electroencephalogram (EEG)-based neurofeedback (NFB) is a method in which brain activity is modulated via self-induced increases or decreases in the power of selected EEG frequency bands. The subject's control over his or her EEG activity is typically mediated by visual or auditory feedback. EEG-NFB is widely used as a therapy for certain mental, cognitive, and behavioral disorders (e.g., ADHD; for review see Arns et al., 2009) or as supportive training to improve cognitive performance (e.g., attention or memory; for review see Gruzelier, 2013a). Since the pioneering work of Sterman and Friar (1972), the number of publications devoted to EEG-NFB has steadily increased. Based on a search in Google Scholar for the keyword 'neurofeedback', 162 NFB-based studies were published during the first two decades (1972–1990). This number increased rapidly in the subsequent decades, reaching 1,260 in the 1990s and 6,100 between 2001 and 2010. Since 2011, there have been over 9,000 publications devoted to various aspects of EEG-NFB. However, despite multiple promising case reports, reliable experimental research is scarce, and the methodologies and results are inconsistent.

The validity of EEG-NFB protocols can be measured by unambiguous changes in EEG activity and by changes in the targeted cognitive function. Unfortunately, most of the work conducted in the EEG-NFB field has failed to satisfy the unambiguity criterion for both variables, and the field itself has shown a high tolerance for violations of scientific methodology. Several reviews have been published that focused mainly on the clinical aspects of EEG-NFB training and summarized the protocols and their effects. One of the first reviews, published by Vernon et al. (2004), concerned the application of EEG-NFB to the treatment of attention deficit hyperactivity disorder (ADHD). The authors discussed various experimental variables (factors), namely protocol, training duration, electrode location and signal modality, and their possible impact on treatment. The methodological findings of this review favored visual over auditory feedback, with the best results obtained when visual and auditory modalities were combined. In addition, the review suggested that at least 20 EEG-NFB sessions were necessary to achieve therapeutic effects, and that beta band/SMR protocols may play a role in successful ADHD treatment. However, studies examining EEG-NFB in the treatment of ADHD have raised important concerns about the validity of the treatment paradigm(s), mainly due to the lack of control groups or of evidence of training specificity related to EEG changes (Vollebregt et al., 2014; Zuberer et al., 2015).

The first quantitative review of EEG-NFB methodology was reported by Arns et al. (2009) and focused on controlled ADHD studies. This review supported the view that positive therapeutic effects of EEG-NFB training could be achieved for all symptoms of ADHD. However, in contrast to Vernon's conclusions (Vernon et al., 2004), Arns concluded that inattention and hyperactivity were most sensitive to nonspecific treatment factors (e.g., therapist-patient interactions) rather than to the EEG feedback itself. Similarly, according to Logemann et al. (2010), nonspecific factors may be responsible for the effects observed in healthy subjects. A more recent review by May et al. (2013) of studies on traumatic brain injury patients also concluded that positive therapeutic effects of EEG-NFB could be achieved; however, all the reviewed studies lacked proper sham control groups, i.e., groups that would undergo fake EEG-NFB training with pseudo-feedback based on an EEG signal recorded from another individual or another session, or on an artificial signal generated by a computer. The same methodological weakness characterized the therapy-oriented ADHD studies analyzed in another recent review by Arns' group (Arns et al., 2014). The majority of the reviewed studies included only a limited control group, such as a semi-active control group (aiming to control for nonspecific effects like time spent interacting with a computer) or an active control group (comparison with a treatment with known therapeutic effects). Importantly, in all studies using proper sham-control groups, the results of EEG-NFB training appeared to be negative. Controls for both clinical treatment and basic research on EEG-NFB training should include sham groups that account for factors such as spontaneous EEG changes, coach-subject interactions or the attentional effort that accompanies any EEG-NFB training. The effect of the independent variable (the specific EEG-NFB protocol) should be estimated as the measurement in the experimental group minus the same measurement in the sham-control group or, alternatively, in a group trained with a different protocol. Without such quantification, the experimental results do not constitute evidence for the efficacy of the independent variable, i.e., the feedback training.
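As a minimal sketch of this subtraction, the hypothetical Python function below assumes that the measurement of interest is each participant's pre-to-post change (e.g., in band power or a test score) and that each group is summarized by its mean change. Neither choice is prescribed by the text, and a real analysis would also require an inferential test rather than a bare difference of means.

```python
from statistics import mean


def protocol_effect(exp_pre, exp_post, sham_pre, sham_post):
    """Estimate a protocol effect as the mean pre-to-post change in the
    experimental group minus the mean change in the sham-control group.

    Hypothetical helper: each argument is a list of per-participant values,
    with the pre and post lists aligned by participant.
    """
    if len(exp_pre) != len(exp_post) or len(sham_pre) != len(sham_post):
        raise ValueError("pre and post lists must be aligned per participant")
    exp_change = mean(post - pre for pre, post in zip(exp_pre, exp_post))
    sham_change = mean(post - pre for pre, post in zip(sham_pre, sham_post))
    # Nonspecific changes that affect both groups equally (spontaneous EEG
    # drift, coach contact, attentional effort) cancel out in this difference.
    return exp_change - sham_change


# Illustrative numbers only: both groups improve, the experimental group more.
effect = protocol_effect([10, 12, 11], [13, 15, 14], [10, 11, 12], [11, 12, 13])
print(effect)
```

Subtracting the sham change, rather than reporting the experimental change alone, is what separates the protocol-specific effect from the nonspecific factors listed above.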

Unfortunately, sham control is ethically challenging in clinical settings, since the use of a placebo (which also includes sham groups) instead of clinical treatment may lead to a deterioration of symptoms (Helsinki Declaration 1964). Thus, well-controlled EEG-NFB studies can only be carried out on treatment-resistant or healthy subjects.

The recent review series by Gruzelier (2013a,b, 2014) stemmed from the growing number of studies devoted to EEG-NFB experiments performed on healthy participants. The author presented an overwhelmingly positive interpretation of the state of the art of EEG-NFB research; however, these reviews included multiple studies that lacked proper controls for nonspecific training effects (as mentioned above).

Here, we selectively review only those EEG-NFB studies that included an appropriate control group(s), in order to quantitatively evaluate the effects of this type of training. The fundamental assumptions of EEG-NFB are based on causal relationships (1) between the desired feedback signal and changes in brain activity, and (2) between induced EEG power changes and behavior. Therefore, we quantitatively assessed the efficacy of the protocols reported to induce changes in these two variables (EEG activity and behavior). We also examined any possible mutual relationship between these two measures. Our evaluation included quantitative estimates of the contribution of individual experimental factors to the final outcome. Each experiment that qualified for analysis was checked to determine whether EEG-NFB training induced a significant modulation of EEG and/or behavioral features, and whether the EEG modulation and behavioral changes occurred concomitantly.

# METHODS

# Inclusion Criteria

The research papers selected and reviewed in this study were identified by (i) a search of web resources (Google Scholar and PubMed) using the following keywords: biofeedback and EEG, neurofeedback and EEG, brain-computer interface and EEG, and EEG operant conditioning; and (ii) an examination of the reference lists of the retrieved articles. From the collected database we selected 86 articles that described experiments carried out using healthy adult participants.

# Two-Step Selection Process

First, we selected articles that included separate control groups, as well as those in which training was performed in two parallel groups, each with a different EEG-NFB protocol. In the latter case, we assumed that the two groups could serve as controls for one another (Bird et al., 1978; Allen et al., 2001; Keizer et al., 2010a). At this step we also included purely behavioral studies that lacked analysis of the EEG data. In these studies the authors used EEG only to generate feedback signals; the EEG recordings obtained within the training sessions and the pre- vs. post-training data were not analyzed, and the outcomes focused only on the final behavioral effect of training.

This criterion identified 40 eligible publications describing 43 EEG-NFB experiments (a few articles described more than one experimental paradigm) that were performed on healthy volunteers (**Table 1**) and included different types of control groups: sham EEG-NFB, alternative types of activity (relaxation techniques, yoga), or no activity. We carefully evaluated all these experimental paradigms, and in the second selection step we excluded studies in which the control did not unequivocally address major nonspecific factors. In an illustrative example, no activity was required from participants in the control group, not even attendance at the laboratory on a similar schedule. In this case, since the control group was not exposed to the nonspecific factors that accompanied EEG-NFB training, it is not known whether the observed behavioral effects resulted from the treatment or from factors such as regular tasks that require focused attention and/or trainer care. Similarly, the EEG spectrum could also depend on nonspecific behavioral training. Thus, for further analysis we only included experiments in which EEGs were recorded and analyzed in both the EEG-NFB and control groups.

During this second selection step, the following studies were excluded: (i) experiments that used EEG feedback but did not provide the results of the EEG analysis in the article; (ii) experiments that did not describe the EEG results for the control group, which precluded reliable analysis; (iii) experiments that did not engage the members of the control group in any type of activity ("non-intervention group") and/or did not describe the control group manipulation; and (iv) experiments that used as a control the non-responders (subjects not showing the expected changes due to neurofeedback training) identified after completion of the training. The identification of "non-responders" has been reported in published EEG-NFB experiments and clinical applications. It is possible that susceptibility to EEG-NFB training differs greatly between individuals, which alone demands further investigation (Weber et al., 2011; Wan et al., 2014; Nan et al., 2015). However, at present it is not appropriate to restrict the experimental group to participants with clear effects and exclude non-responders from the analyses. The set of papers selected for final analysis comprised 28 experiments described in 25 studies (listed in **Tables 1**, **4**). Supplementary Table 1 lists the eliminated studies and the reasons for exclusion.
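The two-step selection can be expressed as a filter over per-experiment records. The sketch below is illustrative only: the `Experiment` record, its field names and the helper functions are hypothetical summaries of what each article reports, and the actual screening required reading each paper rather than checking flags.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Experiment:
    """Hypothetical summary of one EEG-NFB experiment as reported in an article."""
    name: str
    has_control: bool                # separate control group or parallel protocol group
    eeg_results_reported: bool       # criterion (i)
    control_eeg_reported: bool       # criterion (ii)
    control_activity: Optional[str]  # None: no intervention or not described, criterion (iii)
    nonresponders_as_control: bool   # criterion (iv)


def passes_step1(exp: Experiment) -> bool:
    # First step: some form of control group is present.
    return exp.has_control


def passes_step2(exp: Experiment) -> bool:
    # Second step: exclusion criteria (i)-(iv) from the text.
    return (exp.eeg_results_reported
            and exp.control_eeg_reported
            and exp.control_activity is not None
            and not exp.nonresponders_as_control)


def select_experiments(experiments):
    return [e.name for e in experiments if passes_step1(e) and passes_step2(e)]


candidates = [
    Experiment("A", True, True, True, "sham feedback", False),
    Experiment("B", True, True, True, None, False),            # no-activity control
    Experiment("C", True, False, False, "relaxation", False),  # behavior-only analysis
    Experiment("D", False, True, False, None, False),          # no control group
]
print(select_experiments(candidates))  # ['A']
```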

# Key Experimental Factors

Many factors may potentially influence the success or failure of EEG-NFB procedures. Some factors can be controlled in experimental paradigms; others cannot. An example of the latter is the low training susceptibility of a subgroup of participants, frequently referred to as non-responders. To identify potential non-responders and exclude them at the initial stages of screening for NFB training, a new line of research has emerged that focuses on individual factors that might predict training success (Weber et al., 2011; Nan et al., 2015). Here we concentrate on neurofeedback methodology, in which important factors influencing training success may include feedback modality, training intensity, the choice of EEG band(s) used for the feedback signal, and the number and positions of the electrodes from which the feedback signal is recorded. Since the role of these factors has not yet been quantified, we attempted to evaluate their influence on NFB training in the selected studies. Several other factors may also be important for EEG-NFB training, such as the age of the trainees, their personal traits and beliefs regarding EEG-NFB training (e.g., Witte et al., 2013), and trainer behavior; however, these could not be analyzed due to either insufficient variability in the data (e.g., with respect to age, since in most cases the participants were university students) or an insufficient number of reports on a specific factor across the investigated experiments. In this study, we identified five factors that were investigated in a sufficient number of studies and had appropriate variability for statistical analyses to determine their influence on the training results.

Definitions used for all the analyzed factors were as follows:


Since the factors listed above were described differently in the individual studies, we applied common categorization scales for the analyses. The ranges of these scales are presented in **Table 2**. The EEG bands, which were also specified differently in the reviewed publications, were grouped by frequency range into six unified categories defined in **Table 3**. **Table 4** summarizes the parameter values for the database of all 28 experiments.

## Selection of Thematic Groups

To assess the effectiveness of various EEG-NFB protocols and to address the large diversity of applied protocols and investigated putative behavioral effects, we divided the database into groups based on (i) the EEG features used for training and (ii) the behavioral tasks administered to the subjects (**Figure 1**). We then analyzed the data to identify correlations between training factors (defined in Section Key Experimental Factors) and the observed EEG modulation and/or behavioral changes.

TABLE 1 | List of the analyzed studies (references in the second column) with their characteristics, including raw values of the experimental factors used for the analysis.

In a few articles the authors did not supply sufficient information regarding the experimental paradigm (denoted as 'no data' in the table). The "+" and "–" signs in the "Protocol" column denote the enhancement or suppression of particular frequency bands.

(i) We selected three groups based on the frequency bands theta (4–8 Hz), alpha (8–12 Hz) and beta1 (12–20 Hz), each with a minimum of n = 5 experiments. Both up- and down-regulation of a band (as defined in **Table 3**) or of a fraction of it were included in the same protocol group (e.g., the up-regulation of "low alpha" and the down-regulation of "high alpha" were both considered alpha protocols when assessing alpha protocol effectiveness). This approach enables estimation of whether training based on a particular frequency band induces the desired changes in the EEG or behavioral domains. For other single- and multi-band protocols, we did not find a sufficient number of experiments for statistical evaluation.

(ii) Within the behavior-oriented experiments, we also identified two groups based on the behavioral tests described: an Attention Group, comprising experiments aimed at changing performance measured by attention tests (e.g., TOVA, Test of Variables of Attention; SSRT, Stop Signal Reaction Time; CPT, Continuous Performance Test; or others), and a Memory Group, comprising studies aimed at any type of memory improvement (estimated by different types of recall and recognition tasks; **Figure 2**).

#### TABLE 3 | Definition of the EEG frequency bands used in this review.

#### TABLE 4 | Success/Failure scores for studies (references in the second column) that qualified for analysis.

Training results: 1, training success; 0, training failure. The "EEG" column lists the results for the modulation of EEG features, and the "Behavior" column lists the results in the behavioral domain: G, general effects of the training obtained in any of the investigated behaviors; A, attention; M, memory. Values in column G may also include effects not classified into the attention (A) and memory (M) groups.
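The protocol-grouping rule described in (i) can be sketched as follows, assuming each protocol is recorded as a list of (band label, direction) pairs. The label strings, the mapping of SMR to beta1 (consistent with SMR protocols being counted among beta protocols in this review) and the function names are hypothetical; only the minimum group size of five and the three band groups come from the text.

```python
from collections import defaultdict

MIN_GROUP_SIZE = 5                        # minimum experiments per group (from the text)
GROUPED_BANDS = {"theta", "alpha", "beta1"}
ALIASES = {"smr": "beta1"}                # hypothetical label normalization


def parent_band(label):
    """Map a band or band fraction ("low alpha", "high alpha") to its parent band."""
    base = label.lower().split()[-1]
    return ALIASES.get(base, base)


def protocol_groups(experiments):
    """Group protocols that target a single parent band, ignoring direction (+/-).

    `experiments` is a list of (name, [(band_label, direction), ...]).
    Multi-band protocols and ungrouped bands go to "other" and are dropped.
    """
    groups = defaultdict(list)
    for name, bands in experiments:
        parents = {parent_band(label) for label, _direction in bands}
        if len(parents) == 1 and parents <= GROUPED_BANDS:
            groups[parents.pop()].append(name)
        else:
            groups["other"].append(name)
    return {band: names for band, names in groups.items()
            if band != "other" and len(names) >= MIN_GROUP_SIZE}


demo = [(f"alpha{i}", [("low alpha", "+")]) for i in range(4)]
demo += [("alpha4", [("high alpha", "-")]),
         ("theta0", [("theta", "+")]),
         ("mixed", [("smr", "+"), ("theta", "-")])]
print(protocol_groups(demo))
```

In this toy input, up-regulated "low alpha" and down-regulated "high alpha" protocols land in the same alpha group, the lone theta protocol falls below the size threshold, and the SMR/theta ratio protocol is left out as multi-band.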

#### Definition of Successful Training

In the reviewed articles, the efficacy of EEG-NFB training in inducing EEG modulation was typically evaluated by comparing the amplitudes (absolute or relative) of the monitored EEG bands (for single-band protocols) or the amplitude ratios between the manipulated bands (for multi-band protocols, e.g., theta/alpha or SMR/theta ratios), recorded either before and after the whole training (interpreted as delayed effects of neurofeedback or, more generally, brain plasticity [50]) or within/between training sessions. Significant differences in these parameters between the experimental and control groups indicated successful training in the EEG domain. Changes in behavioral performance were evaluated using tests or measures typical for the given type of activity.
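To make the pre- versus post-training comparison of band measures concrete, the sketch below computes band power with a naive discrete Fourier transform and a band ratio such as theta/alpha. It is a simplified illustration, not the pipeline of any reviewed study: it assumes a single clean channel, uses power (squared amplitude) rather than amplitude, applies no windowing or epoch averaging, uses synthetic signals and an assumed 128 Hz sampling rate, and takes the band edges theta 4–8 Hz and alpha 8–12 Hz used in this review. The function names are hypothetical.

```python
import cmath
import math


def band_power(signal, fs, lo, hi):
    """Sum of squared DFT magnitudes for frequencies in [lo, hi) Hz.

    Naive O(n^2) DFT kept dependency-free for illustration; a real pipeline
    would use an FFT with windowing and averaging over epochs.
    """
    n = len(signal)
    total = 0.0
    for k in range(1, n // 2 + 1):          # skip the DC bin
        freq = k * fs / n
        if lo <= freq < hi:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(signal))
            total += abs(coeff) ** 2
    return total


def band_ratio(signal, fs, num_band, den_band):
    """Power ratio between two bands, e.g. theta/alpha."""
    return band_power(signal, fs, *num_band) / band_power(signal, fs, *den_band)


FS = 128                                  # sampling rate in Hz (assumed)
THETA, ALPHA = (4, 8), (8, 12)
times = [i / FS for i in range(FS)]       # one second of synthetic data
pre = [0.5 * math.sin(2 * math.pi * 6 * t) + math.sin(2 * math.pi * 10 * t) for t in times]
post = [math.sin(2 * math.pi * 6 * t) + 0.2 * math.sin(2 * math.pi * 10 * t) for t in times]
print(band_ratio(pre, FS, THETA, ALPHA), band_ratio(post, FS, THETA, ALPHA))
```

In this synthetic example the theta/alpha power ratio rises from about 0.25 before to about 25 after "training"; in a real study, the same measure in the control group would then be compared against it.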

We reviewed each article and individually classified the training results as a failure (0) or success (1) based on the statistical measures used by the authors. Training was considered a success:


If neither the direct (i, ii) nor the indirect (iii) comparisons for EEG or behavior were described in an article, we classified the outcome of the training as giving no results. We considered EEG-NFB training successful if either the EEG or the behavioral change reached significance. Similarly, a study was defined as a success when one of several results available for one category (e.g., scores from several attention tests or amplitude changes of different EEG bands) was unambiguously described as significantly changed as a result of the training.
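This scoring rule can be written as a small function. It is a hedged sketch: in the review, classification relied on the authors' own descriptions of significance, whereas here significance is approximated by a p-value threshold of 0.05 (an assumption), and the function names are hypothetical. `None` stands for an outcome classified as giving no results.

```python
from typing import Optional, Sequence

ALPHA = 0.05  # assumed significance threshold


def score_category(p_values: Optional[Sequence[float]]) -> Optional[int]:
    """1 if any comparison in the category is significant, 0 if none is,
    None if no comparison was reported ("no results")."""
    if not p_values:
        return None
    return 1 if any(p < ALPHA for p in p_values) else 0


def score_training(eeg_p_values, behavior_p_values) -> Optional[int]:
    """Training counts as a success if either the EEG or the behavioral change
    reached significance."""
    eeg = score_category(eeg_p_values)
    behavior = score_category(behavior_p_values)
    if eeg is None and behavior is None:
        return None
    return 1 if 1 in (eeg, behavior) else 0


print(score_category([0.20, 0.01]), score_training(None, [0.03]), score_training([0.4], [0.5]))
```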

# Methods of Statistical Analyses

We estimated the correlation of each training factor with the EEG and behavioral success scores for all experiments and for specific groups (defined in Section Selection of Thematic Groups). An analysis was performed only for groups comprising five or more experiments. For consistent comparison of all factors across all experimental groups, we used two nonparametric tests: χ² to test for independence between training outcome and each factor, and Kendall's τ (tau) rank correlation coefficient to assess the strength and direction of any significant association. Although Kendall's τ is similar to the Spearman rank correlation statistic, the advantage of Kendall's τ lies in its direct interpretation in terms of the probabilities of observing concordant and discordant pairs (Hauke and Kossowski, 2011).
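Both statistics are simple to compute for a binary outcome. The pure-Python sketch below (function names hypothetical) implements Pearson's χ² without continuity correction, with the one-degree-of-freedom p-value obtained as erfc(√(χ²/2)), and Kendall's τ-b, a tie-corrected variant that we assume here because most values are tied. The example data assume that the number-of-bands factor is coded as single- vs. multi-band; with the counts implied by the Results (23 single-band experiments, 16 with EEG success; 5 multi-band experiments, 1 with EEG success), this coding reproduces the reported χ² = 4.2303, p = 0.0397 and τ = −0.3887, although we have not verified that the authors used the same coding.

```python
import math
from itertools import combinations


def chi2_binary(x, y):
    """Pearson chi-square test of independence for two binary variables.

    No continuity correction; df = 1, so the p-value is erfc(sqrt(stat / 2)).
    """
    levels_x, levels_y = sorted(set(x)), sorted(set(y))
    if len(levels_x) != 2 or len(levels_y) != 2:
        raise ValueError("both variables must take exactly two values")
    n = len(x)
    counts = {(a, b): 0 for a in levels_x for b in levels_y}
    for a, b in zip(x, y):
        counts[(a, b)] += 1
    row = {a: sum(counts[(a, b)] for b in levels_y) for a in levels_x}
    col = {b: sum(counts[(a, b)] for a in levels_x) for b in levels_y}
    stat = 0.0
    for a in levels_x:
        for b in levels_y:
            expected = row[a] * col[b] / n
            stat += (counts[(a, b)] - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))


def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) / sqrt(untied_x * untied_y)."""
    concordant = discordant = untied_x = untied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx = (x1 > x2) - (x1 < x2)   # sign of the difference in x
        dy = (y1 > y2) - (y1 < y2)   # sign of the difference in y
        untied_x += dx != 0
        untied_y += dy != 0
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    return (concordant - discordant) / math.sqrt(untied_x * untied_y)


# 1 = single-band protocol, 2 = multi-band protocol; 1 = EEG success, 0 = failure.
n_bands = [1] * 23 + [2] * 5
eeg_success = [1] * 16 + [0] * 7 + [1] * 1 + [0] * 4
stat, p = chi2_binary(n_bands, eeg_success)
print(round(stat, 4), round(p, 4), round(kendall_tau_b(n_bands, eeg_success), 4))
```

The negative τ encodes the direction of the association (more bands, fewer EEG successes), which χ² alone does not convey.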

FIGURE 1 | … behavioral/cognitive effects. The groups are not mutually exclusive; both types of data were often analyzed in the same study, which resulted in multiple classifications. Some experiments also investigated more than one behavioral goal (**Table 4**). Protocols that could not be attributed to any of the specific subgroups (lowermost boxes "Other") contributed only to the general analysis.

The partial η² (Bakeman, 2005; Lakens, 2013) can be used to estimate effect size from published studies using the F statistic provided by the authors. It is suitable for results extracted from multiple experiments with a similar design testing one (or an equivalent) dependent variable. Unfortunately, the designs used in the reviewed papers did not meet these criteria. First, there was a great variety of EEG features (dependent variables) that underwent training. Second, within the same trained EEG feature, statistics were available for either within-group or between-group analyses. Often, the statistics were incompletely described, which made calculation of effect size impossible. The details of the statistical results provided in the reviewed articles are presented in Supplementary Table 2.
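For reference, partial η² can be recovered from a reported F statistic and its degrees of freedom with the standard conversion η²p = (F · df_effect) / (F · df_effect + df_error). A minimal sketch with a hypothetical function name and illustrative values, not values taken from the reviewed studies:

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic: F*df1 / (F*df1 + df2)."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and degrees of freedom positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


print(round(partial_eta_squared(4.0, 1, 20), 3))  # 0.167
```

The conversion needs both degrees of freedom, which is one reason incompletely reported statistics made effect sizes impossible to calculate.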

After grouping the available effect sizes according to the training protocol and the source of the effect (within group, between groups, interaction), the groups were not large enough for quantitative comparisons. We managed to construct only one group of 8 reports regarding alpha training from which we could extract F statistics for within-subject effects, which enabled evaluation of nonspecific changes in this band.

Since the calculation of effect sizes was not possible for all the other training factors, we used a coarser, binary approach. Instead of using F values, we classified experiments as a failure (0) or success (1; see Section Definition of Successful Training) and calculated a success ratio (SR), defined as the percentage of successful studies within the total number of studies in the given set.
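The success ratio is a plain percentage. In the sketch below (hypothetical function name), experiments whose outcome in the given domain was classified as giving no results are passed as `None` and left out of the denominator, which is our assumption; the example counts are the overall EEG (17 of 28) and behavioral (10 of 20) tallies reported in the Results.

```python
def success_ratio(scores):
    """Percentage of successful experiments (1) among scored ones (0 or 1).

    Entries equal to None ("no results") are excluded from the denominator.
    """
    scored = [s for s in scores if s is not None]
    if not scored:
        raise ValueError("no scored experiments in this set")
    return 100.0 * sum(scored) / len(scored)


eeg_scores = [1] * 17 + [0] * 11
behavior_scores = [1] * 10 + [0] * 10
print(round(success_ratio(eeg_scores), 1), round(success_ratio(behavior_scores), 1))  # 60.7 50.0
```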

# RESULTS

# General Characteristics of the Participants

The average age of the participants ranged from 20 to 65 years, with a median of 24 (most participants were students) and a mean of 30 ± 14. The difference between the mean and median age was caused by two experiments (experiments 6 and 41 in **Table 1**) in which the participants' ages ranged between 55 and 75 years.

The mean number of subjects who participated in a single experiment was 26 ± 10, with an average experimental group size of 13 ± 6 subjects. The diversity of the relatively small number of qualified experiments permitted the selection of two behavioral groups, Attention (n = 6) and Memory (n = 9), and three EEG protocol groups, Theta (n = 5), Alpha (n = 9) and Beta (n = 5), that fulfilled the minimum group size criterion. Seven of the nine experiments in the Alpha Group, four of the five in the Beta Group and all five Theta Group experiments also included behavioral tests.

# EEG-NFB Training Methods (EEG Protocols, Training Intensity, and Signal Modality)

Most of the experiments (23 of 28) used single EEG band protocols for EEG-NFB training, and most of these (17) intended to up-regulate the amplitudes of the trained band: theta (n = 4), alpha (n = 5), beta (including SMR, n = 5), gamma (n = 2), or slow cortical potentials (SCP, n = 1). The remaining five experiments (18%) used multi-band protocols that aimed to change the ratio of the amplitudes of the employed bands (**Table 1**).

The Memory Group (n = 9) was composed of experiments using only single-band protocols: five experiments used a theta protocol, three a beta protocol and one an alpha protocol. In the Attention Group (n = 6), the most frequent protocol was theta (n = 3), and the three remaining studies used beta, gamma and SMR/theta protocols.

The daily training sessions across all experiments typically consisted of multiple runs, each a few minutes long, separated by short pauses. On average, training comprised 7.7 ± 3.8 sessions at 3 ± 2.4 day intervals; thus, the training intensity was relatively low compared with EEG-NFB training arrangements in clinical practice [4]. The average Training Intensity Index was 5.5 ± 3.4 across all experiments. In 46% of the experiments the EEG-NFB modality was auditory, in 40% it was visual, and in 14% it was mixed (visual and auditory).

#### Location of Electrodes

In the two behaviorally oriented groups (Attention and Memory), we did not find any consistency in the placement of feedback electrodes. Across the six experiments aimed at training attention, five different electrode locations were used. In the Memory Group, the electrodes were placed at Fz, or included this location, in five experiments; in the remaining four experiments we did not note any consistency in electrode placement.

Among the EEG protocols, the most consistent electrode placement was observed in the Theta Group, where all single-electrode setups used the Fz location and one of the two multi-electrode setups used frontal locations including Fz. All experiments in which training was provided from Fz, alone or together with other locations, succeeded in evoking EEG modulation (seven experiments: four using a theta protocol, two using an alpha protocol and one using a gamma protocol). In the Beta Group, four experiments used mainly electrodes located over the central sulcus (in three experiments electrodes were positioned at Cz and in two at C3, according to the international 10–20 system) and one used the F3 position. In the Alpha Group, the most commonly used locations were frontal (Fz, F3, F7) and central (C3, C4); one experiment used an occipital (Oz) and one a parietal (Pz) location.

# Dependence of Successful Training on the Analyzed Factors

According to the descriptions provided by the authors, EEG-NFB training yielded significant results in the EEG domain in 17 of the 28 experiments, a success ratio (SR) of 60.7%, and in 10 of the 20 behaviorally oriented experiments, an SR of 50%. SR values are given in **Table 4**. We did not find a significant relationship between behavioral and EEG success in general (χ² = 0.91, p = 0.33), nor between training success and any of the considered training factors (χ² and τ statistics for all analyzed factors are given in **Table 5**), except for the number of bands, which correlated negatively with the resulting EEG changes (χ² = 4.2303, p = 0.0397; τ = −0.3887, p = 0.0472). We also observed a tentative relation between SR and the training intensity index (TI): in a sample of 6 studies with at least four training sessions completed on consecutive days (excluding weekends), the SR was 100%.

In subsequent steps of our review, we analyzed training effectiveness within smaller groups of experiments defined by their EEG or behavioral protocols.

## Efficacy of Employed EEG Protocols

Theta and alpha EEG bands appeared to be highly susceptible to EEG-NFB procedures, yielding SRs of 80 and 89%, respectively, whereas training targeted at the beta band was less effective, with an SR of 40%. The experiments using multi-band protocols were very diverse, and only one of five yielded changes in EEG spectra, which is in line with our general finding of a negative relation between the number of bands used for training and the induced EEG changes. In the behavioral domain, of the three investigated protocols only theta was highly effective at inducing behavioral changes, with an SR of 80%; the least effective was the alpha protocol, with an SR of 28% (**Figure 2**).

TABLE 5 | Dependence (χ²) and correlation (Kendall's τ) values for EEG training success and analyzed factors.

Correlation was considered significant when p < 0.05 for both χ² and Kendall's τ.

## Efficacy of the Training Aimed at Specific Behavioral Effects

The SR of the desired behavioral effects in the Attention and Memory Groups reached 50 and 67%, respectively (**Figure 2**). The effectiveness of inducing EEG changes was even higher, with SRs of 67% (attention) and 78% (memory). A comparison of the experiments that had positive results in the EEG domain and included behavioral tests before and after training did not reveal any frequency band that was specifically effective for increasing performance in attention or memory tests. In the Memory Group, of the six experiments that yielded positive training results in both domains, three used a theta, two an SMR and one an alpha protocol; one additional theta protocol reported no significant results in the behavioral domain. In the Attention Group, there were four eligible experiments, three of which had positive results in both the EEG and behavioral domains, but each of these three used a different protocol.

# Summary of the Efficacy of Employed EEG Protocols

Our analysis demonstrated: (i) the effectiveness of the theta and alpha protocols at inducing EEG modulations; (ii) a negative correlation between the number of trained bands and success in evoking EEG modulations; and (iii) a lack of specificity of the popular EEG-NFB protocols in obtaining the desired behavioral changes.

# DISCUSSION

The theoretical basis for EEG-NFB is that brain activity, modified according to the targeted changes of the EEG signal, can cause a plastic reorganization within the involved brain networks and lead to an expected improvement in a specific behavioral task (Anguera et al., 2013). The findings of this review can neither unambiguously support nor disprove this hypothesis, since no specific training protocol convincingly validated behavioral effects. This ambiguity may result from the limited number and high diversity of the available descriptions of well-controlled NFB studies.

The scientific community has produced a multitude of articles with enthusiastic case reports on EEG-NFB training. However, a rigorous scientific approach to EEG-NFB is rare, and experiments performed on healthy participants to study the effectiveness and/or mechanism of training are very limited; we identified only 86 relevant reports. From this list we selected 43 EEG-NFB experiments which described control groups. We next assessed the quality of the control paradigms and excluded all experiments that did not allow for unambiguous attribution of the results to the effects of EEG-NFB training.

The final, small number of accepted studies (n = 28) is alarming because it suggests that the field of EEG-NFB research is dominated by poorly controlled experiments. Analogous conclusions were drawn in other quantitative reviews of EEG-NFB research. In her review, Niv (2013) attempted to assess the efficacy of EEG-NFB applied to patients with neurological disorders (e.g., ADHD, autism, epilepsy) and identified only 22 well-controlled experiments. The review concluded that "only few controlled studies exist and more and better-organized research is necessary to confirm the efficacy and effectiveness of neurofeedback". The two meta-analyses of Arns et al. (2009) and Lofthouse et al. (2012), who investigated the efficacy of EEG-NFB treatment applied to children diagnosed with ADHD, included only 15 and 14 studies, respectively.

In our analysis of 28 well-controlled studies, we calculated the SR separately for EEG (60.7%) and behavioral (50%) effects. These numbers demonstrate that EEG-NFB training can influence both the amplitude of the chosen EEG signal and behavior. Because the effects were not consistent, it is essential to carefully evaluate all factors of experimental design in order to select those that promote successful training (see also the review by Vernon et al., 2009). The fact that most of the investigated factors did not correlate with training success is itself worth consideration. Intuitively important factors such as feedback modality did not affect training results: the SRs for auditory and visual paradigms were similar (53% and 63%, respectively). Training success did not depend on the intensity of the stimuli (χ² = 15.9, p = 0.25). The high SR (75%) obtained from the four experiments using mixed-modality stimuli should be treated with caution given the small number of studies; however, Vernon et al. (2004) suggested in their review that combined auditory and visual feedback yields optimal results.

The assessment of the SR in the selected experiments was possible for the theta, alpha and beta bands. Of these three, protocols training the theta or alpha band were the most effective in changing EEG spectra. Alpha training deserves special attention because of its popularity despite its low behavioral SR: of nine well-controlled EEG-NFB studies with alpha training, seven aimed to induce behavioral improvement, and only two were successful (Allen et al., 2001, reported improved mood and Reis et al., 2015, reported improved memory; cf. **Table 4**). The alpha band seems to be highly susceptible to various manipulations. This notion is supported by Williams (1977), who used two sham-feedback groups (with no real experimental group) and reported that an increase in alpha band amplitude was achieved only in subjects who had been told that they were involved in the real alpha-inducing experiment. The subjects who were informed that they were in the control group did not develop this change. Quandt et al. (2012) induced changes in the upper alpha and beta bands in subjects who observed actions performed by other individuals, further supporting the high susceptibility of the alpha band to nonspecific manipulations. Finally, Dempster and Vernon (2009) showed that the increase in alpha band amplitude often observed during EEG-NFB training might merely represent a return to a baseline level that had been disturbed by the training procedure. This rebound might be enhanced in EEG-NFB groups and impaired when subjects are engaged in, e.g., sham feedback. Thus, including baseline measurements in scientific reports might be valuable for drawing inferences about EEG-NFB effects on the EEG. In 4 of the 8 reviewed studies, alpha was found to change in both experimental and control conditions. To estimate non-specific alpha changes, we pooled increases and decreases of alpha over both sessions and epochs in all experimental conditions, resulting in an average effect size of 0.32 (SD = 0.15) (positions 6, 8, 9, 11, 31, 32, 34, 36 in the Supplementary Table). The high susceptibility of the alpha band to neurofeedback manipulation suggested by our review was, however, not accompanied by any specific behavioral effects.
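One way to read "pooling increases and decreases" is to average the magnitudes of the signed effect sizes, so that an alpha decrease counts as much non-specific change as an alpha increase. The sketch below makes that assumption explicit; the review does not state the effect-size metric or whether the reported SD is a sample or population SD (a sample SD is used here), and the input values are hypothetical, not the Supplementary Table entries.

```python
from statistics import mean, stdev


def pooled_nonspecific_change(effect_sizes):
    """Average magnitude of alpha changes, treating increases and decreases alike.

    effect_sizes: signed effect sizes (positive = alpha increase, negative = decrease).
    Returns (mean of absolute values, sample standard deviation of absolute values).
    Assumption: "pooling" means taking absolute values before averaging.
    """
    magnitudes = [abs(d) for d in effect_sizes]
    return mean(magnitudes), stdev(magnitudes)


# Hypothetical per-session/per-epoch effect sizes, NOT the review's data.
example = [0.2, -0.5, 0.3, -0.1, 0.4]
m, sd = pooled_nonspecific_change(example)
```

Averaging signed values instead would let increases and decreases cancel, which would hide exactly the non-specific variability this estimate is meant to capture.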

NFB practice rests on the assumption that specific protocols target particular behavioral goals (Egner and Gruzelier, 2004; Keizer et al., 2010b; Gruzelier, 2013a,b; Enriquez-Geppert et al., 2014). However, our analysis of the training protocols that aimed to improve attention or memory performance does not allow any of the trained frequency bands to be attributed to either of these two behavioral goals; no specific frequency band showed an advantage. A similar observation was reported by Arns et al. (2009) in the conclusion of their meta-analysis: while EEG-NFB therapy provided, on average, positive therapeutic effects for children with ADHD, there was no difference in effect size between EEG-NFB protocols (SCP [0.5–2 Hz], SMR [12–15 Hz]/theta and beta/theta).

This lack of specificity between the frequency band of an EEG-NFB protocol and its behavioral effect does not necessarily discount the basic concept of behavioral correlates of specific EEG frequencies (Anguera et al., 2013). When investigating the roots of inconsistent EEG-NFB effects, a more careful examination of the old issue of spatial resolution and source localization in EEG recording is warranted (Zuberer et al., 2015). The recorded EEG signal can be generated by an infinite number of different sets of current sources, even under the assumption of an infinite number of recording electrodes (Fender, 1987). In EEG-NFB, which is expected to modify the activity of local networks (Wood and Kober, 2014), the signal collected by a few EEG-NFB electrodes is also generated by brain regions other than those intended as the training target (Witte et al., 2013). A partial solution to this problem could be high-resolution EEG combined with EEG source localization methods to enable training focused on specific regions. Although the results of preliminary studies using multi-electrode EEG-NFB, such as LORETA-EEG-NFB, Z-score-EEG-NFB, or blind source separation, seem promising (Cannon et al., 2006, 2009; Koberda et al., 2012; Bauer and Pllana, 2014; White et al., 2014), more experiments with sham control groups are necessary to verify this approach as a valuable EEG-NFB tool. The source localization problem worsens in experiments where a single EEG-NFB electrode is used. Unfortunately, the single-electrode approach is the most common practice in EEG-NFB training and therapy. We argue that such a protocol would not be able to precisely affect the putative regions responsible for the targeted behavior.
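The non-uniqueness of EEG sources can be illustrated with a toy linear forward model in which scalp potentials are a lead-field matrix times source strengths. The lead field below (2 electrodes, 3 candidate sources) is entirely hypothetical, and this finite, under-determined case only hints at the stronger continuous-volume argument cited above; it shows that a "silent" source pattern in the null space of the lead field can be added without changing what the electrodes record.

```python
def forward(lead_field, sources):
    """Scalp potentials as a linear mix of source strengths: v = L @ s."""
    return [sum(l * s for l, s in zip(row, sources)) for row in lead_field]


# Hypothetical lead field: 2 electrodes (rows) x 3 candidate sources (columns).
L = [[1.0, 2.0, 1.0],
     [0.0, 1.0, 1.0]]

# A "silent" source pattern: it lies in the null space of L, so it produces no scalp signal.
silent = [-1.0, 1.0, -1.0]

s1 = [1.0, 0.0, 0.0]
s2 = [a + 2.0 * b for a, b in zip(s1, silent)]  # a different source configuration

print(forward(L, silent))  # [0.0, 0.0]
print(forward(L, s1))      # [1.0, 0.0]
print(forward(L, s2))      # [1.0, 0.0], identical recording from different sources
```

With a single feedback electrode the lead field has only one row, so its null space is even larger, which mirrors the point that single-electrode protocols constrain the trained sources least.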

It is important to underline that several thoroughly prepared experiments clearly show positive effects of EEG-NFB training (Allen et al., 2001; Hoedlmoser et al., 2008; Keizer et al., 2010a). However, in other, equally thorough studies, EEG-NFB was not found to be successful (Egner et al., 2002; Berner et al., 2006; Logemann et al., 2010). As noted above, this discrepancy might stem from the lack of optimal training paradigms. The present quantitative review suggests concrete modifications of current paradigms that might improve the results of EEG-NFB training. Below, we discuss two particular factors that emerge from the experiments collected in this review.

## The Don'ts: The Number of EEG Bands Used for Training

One of our findings is the negative correlation between the number of bands used to compose the feedback signal and the success of training, i.e., complicated protocols that promoted and/or inhibited several bands yielded worse results. This may be related to the interdependence of EEG frequencies, i.e., their tendency to follow the modulation of other, most often neighboring, frequencies. Band interdependence has been reported by Ros et al. (2013), who demonstrated that training directed at up-regulation of the alpha band was accompanied by changes in the flanking frequencies (theta and beta bands). The interdependence of EEG frequencies may also affect the expected behavioral changes, because the modulation of neighboring bands trained in the opposite direction may cancel out the effects of up-regulating the targeted band. Similarly, an observed behavioral change may result from up-regulation of bands neighboring the band that was intentionally trained. This situation might at least partially explain why so many authors observed positive behavioral results despite the lack of noticeable modifications of the trained EEG band.

However, the issue requires deeper investigation, as there is a contradictory example of a theta amplitude increase that was not accompanied by an increase in the amplitude of the flanking alpha frequency (Enriquez-Geppert et al., 2013).

## The Do's: Location of EEG-NFB Electrodes

The analysis of experiments from the Theta Group showed that neurofeedback mediated by electrodes positioned above well-identified sources of a given EEG frequency (e.g., frontal regions in the case of frontal-midline theta training) might bring the expected changes in the EEG spectrum (Keizer et al., 2010a; Boxtel van et al., 2012; Wang and Hsieh, 2013). In general, however, the targeted frequencies are generated by many structures of dispersed neural networks, and the use of one or two arbitrarily located EEG-NFB electrodes (and their reference) does not allow unambiguous estimation of the brain sources involved. One way to partially mitigate this issue might be the use of multiple electrodes positioned to capture activity of the brain regions associated with the targeted behavioral functions. Indeed, the rare experiments in which electrode location was justified on physiological grounds appeared to be successful (Enriquez-Geppert et al., 2013, 2014), which likely results from a substantial contribution of relevant sources to the feedback signals.

Suitably positioned multiple electrodes increase the chances of collecting feedback signals from many regions engaged in the target activity. Attention is a cognitive function with relatively well-known networks of involved brain structures and a popular goal for improvement in many EEG-NFB paradigms. The frontoparietal network (Buschman and Miller, 2007) is one of the brain systems that support attentional mechanisms, and it is not surprising that most protocols that successfully increased attention used feedback signals from several EEG-NFB electrodes at frontal and parietal locations (Allen et al., 2001; Becerra et al., 2012; Enriquez-Geppert et al., 2013), or from a single electrode over frontal areas (Vernon et al., 2004). In contrast, an experiment with a similar goal that used C3/C4 electrodes (Boxtel van et al., 2012) to record feedback signals did not achieve significant behavioral changes after EEG-NFB training. We posit that in many of the reviewed experiments the EEG-NFB electrodes were placed non-optimally for the expected result. It is therefore important that future EEG-NFB experiments identify optimal training electrode locations based on existing knowledge of the anatomical and functional substrates of various behavioral tasks. For a dispersed target network, it appears feasible to develop new training paradigms in which the feedback signal is weighted by a component analysis of many recording electrodes.
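A feedback signal "weighted by a component analysis of many recording electrodes" could take many forms; the sketch below uses principal component analysis (PCA), which is only one possible choice (ICA or other spatial filters are alternatives, and the text does not specify a method). It estimates electrode weights as the leading eigenvector of the channel covariance (via power iteration) and applies them as a spatial filter. The synthetic data, in which two electrodes share a common rhythm and a third carries only noise, are hypothetical.

```python
import math
import random


def covariance(channels):
    """Channel-by-channel sample covariance of multichannel data (one list per electrode)."""
    n = len(channels[0])
    centered = []
    for ch in channels:
        mu = sum(ch) / n
        centered.append([x - mu for x in ch])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in centered]
            for ci in centered]


def leading_component(cov, iters=200):
    """Unit-norm leading eigenvector of a symmetric covariance matrix (power iteration)."""
    w = [1.0] * len(cov)
    for _ in range(iters):
        w = [sum(c * x for c, x in zip(row, w)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    return w


def spatially_filtered(channels, weights):
    """Feedback signal: weighted sum across electrodes at each time sample."""
    return [sum(w * ch[t] for w, ch in zip(weights, channels))
            for t in range(len(channels[0]))]


# Synthetic example: electrodes 0 and 1 share a common rhythm; electrode 2 is noise only.
rng = random.Random(0)
rhythm = [math.sin(0.3 * t) for t in range(200)]
channels = [
    [r + rng.gauss(0, 0.1) for r in rhythm],
    [r + rng.gauss(0, 0.1) for r in rhythm],
    [rng.gauss(0, 0.1) for _ in rhythm],
]
weights = leading_component(covariance(channels))
feedback = spatially_filtered(channels, weights)
```

The resulting weights concentrate on the two electrodes that carry the shared activity and nearly ignore the noise-only electrode. In a real protocol the weights would presumably be estimated from calibration data and then held fixed during training, so that the feedback reflects a stable spatial pattern rather than refitting to each session.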

# CONCLUSIONS

We posit that the neurofeedback methodology used in the majority of the reviewed experiments did not enable proper targeting of the brain regions responsible for control over the desired behavioral changes. This may explain the lack of correlation between the changes induced in the trained EEG signal and the modification of the targeted behavior, as well as the lack of correlation between the remaining analyzed factors and training success. We, therefore, recommend the following factors for improved EEG-NFB training efficacy:

# AUTHOR CONTRIBUTIONS

JR: review concept, data analyses, main conclusions, article drafting. KJ: data analyses, verification of analytical methods, proofreading. KP: data analyses, proofreading. EK: verification of conclusions, data analyses, proofreading. RC: data analyses, verification of analytical methods, proofreading. AW: verification of the review concept and of the conclusions.

## ACKNOWLEDGMENTS

This work was supported by the Polish National Science Centre grant 2012/07/B/NZ7/04383.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fnhum.2016.00301

## REFERENCES

processing: an SMR neurofeedback training study. Clin. Neurophysiol. 126, 82–95. doi: 10.1016/j.clinph.2014.03.031


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Rogala, Jurewicz, Paluch, Kublik, Cetnarski and Wróbel. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Near-Infrared Spectroscopy-Based Frontal Lobe Neurofeedback Integrated in Virtual Reality Modulates Brain and Behavior in Highly Impulsive Adults

Justin Hudak<sup>1</sup>\*, Friederike Blume<sup>1</sup>, Thomas Dresler<sup>1,2</sup>, Florian B. Haeussinger<sup>2</sup>, Tobias J. Renner<sup>3</sup>, Andreas J. Fallgatter<sup>1,2,4</sup>, Caterina Gawrilow<sup>1,5,6</sup> and Ann-Christine Ehlis<sup>1,2</sup>

<sup>1</sup> LEAD Graduate School & Research Network, University of Tübingen, Tübingen, Germany, <sup>2</sup> Department of Psychiatry and Psychotherapy, University Hospital Tübingen, Tübingen, Germany, <sup>3</sup> Department of Child and Adolescence Psychiatry, University Hospital Tübingen, Tübingen, Germany, <sup>4</sup> Center for Integrative Neuroscience, University of Tübingen, Tübingen, Germany, <sup>5</sup> Department of Psychology, University of Tübingen, Tübingen, Germany, <sup>6</sup> Center for Individual Development and Adaptive Education of Children at Risk, Goethe University Frankfurt, Frankfurt, Germany

#### Edited by:

Soledad Ballesteros, Universidad Nacional de Educación a Distancia (UNED), Spain

#### Reviewed by:

Sabrina Brigadoi, University of Padua, Italy

Hasan Ayaz, Drexel University, United States

> \*Correspondence: Justin Hudak

justin.hudak@lead.uni-tuebingen.de

Received: 12 May 2017 Accepted: 08 August 2017 Published: 04 September 2017

#### Citation:

Hudak J, Blume F, Dresler T, Haeussinger FB, Renner TJ, Fallgatter AJ, Gawrilow C and Ehlis A-C (2017) Near-Infrared Spectroscopy-Based Frontal Lobe Neurofeedback Integrated in Virtual Reality Modulates Brain and Behavior in Highly Impulsive Adults. Front. Hum. Neurosci. 11:425. doi: 10.3389/fnhum.2017.00425

Based on neurofeedback (NF) training as a neurocognitive treatment in attention-deficit/hyperactivity disorder (ADHD), we designed a randomized, controlled functional near-infrared spectroscopy (fNIRS) NF intervention embedded in an immersive virtual reality classroom in which participants learned to control overhead lighting with their dorsolateral prefrontal brain activation. We tested the efficacy of the intervention on healthy adults displaying high impulsivity as a sub-clinical population sharing common features with ADHD. Twenty participants, 10 in an experimental and 10 in a shoulder muscle-based electromyography control group, underwent eight training sessions across 2 weeks. Training was bookended by a pre- and post-test including go/no-go, n-back, and stop-signal tasks (SST). Results indicated a significant reduction in commission errors on the no-go task with a simultaneous increase in prefrontal oxygenated hemoglobin concentration for the experimental group, but not for the control group. Furthermore, the ability of the subjects to gain control over the feedback parameter correlated strongly with the reduction in commission errors for the experimental, but not for the control group, indicating the potential importance of learning feedback control in moderating behavioral outcomes. In addition, participants of the fNIRS group showed a reduction in reaction time variability on the SST. Results indicate a clear effect of our NF intervention in reducing impulsive behavior, possibly via a strengthening of frontal lobe functioning. Virtual reality additions to conventional NF may be one way to improve the ecological validity and symptom-relevance of the training situation, hence positively affecting transfer of acquired skills to real life.

Keywords: NIRS, neurofeedback, virtual reality, impulsivity, ADHD

# INTRODUCTION


Impulsivity refers to the inability to inhibit behavioral responses to urges created by external stimuli as well as internal desires, often brought about by the current environment. It is a ubiquitous behavioral trait found in healthy individuals as well as those with developmental disorders such as attention-deficit/hyperactivity disorder (ADHD), substance-use disorders, binge eating disorders, and others (Whiteside and Lynam, 2001; Bari and Robbins, 2013). Individual impulsive episodes, such as drunk driving, can negatively impact the lives of the impulsive individual, as well as the lives of others. On neuropsychological tasks, impulsive behavior is associated with certain types of errors, typically on conditions requiring inhibitory control. For example, the more impulsive an individual is, the more commission errors [i.e., false alarms (FA)] they make on go/no-go tasks (Aichert et al., 2012; Weidacker et al., 2016). Impulsive subgroups such as binge eaters (Hege et al., 2014) and binge drinkers (Henges and Marczinski, 2012) also make more FA than healthy controls.

From a neuroscientific perspective, impulsivity is strongly linked with dysfunctional frontal lobe activity and frontal lobe excisions (Fallgatter and Herrmann, 2001; Bari and Robbins, 2013). Development of impulse control is the result of maturation of the cognitive control network (CCN; Casey et al., 2008; Steinberg, 2008; in Shulman et al., 2016) which consists of the lateral prefrontal cortex and its connectivity with other frontal, striatal, motoric, and parietal regions (for comprehensive reviews see Cubillo et al., 2012; Rubia et al., 2013). Highly impulsive subgroups require a stronger activation of the CCN than healthy controls to achieve comparable response inhibition (Horn et al., 2003; Ding et al., 2014). Additionally, evidence for negative correlations between trait impulsiveness and activation as well as connectivity in prefrontal brain structures has been provided (Farr et al., 2012). Furthermore, there is evidence that the bilateral dorsolateral prefrontal cortex (dlPFC) may be involved in inhibitory control as transcranial direct current stimulation (tDCS) of the left dlPFC led to improved inhibitory control on a go/no-go task in participants with ADHD (Soltaninejad et al., 2015).

Neurofeedback (NF), a therapeutic technique in which participants are tasked with regulating their own brain activity, is used as a way to effect long-term change in abnormal brain activity (Arns et al., 2013). In particular, electroencephalography (EEG)-based NF protocols have shown promise in reducing impulsive symptoms in ADHD (Gevensleben et al., 2012, 2014a). However, these protocols have had mixed effects, particularly as they are often based on brain-frequency imbalances that are highly heterogeneous within subjects (Holtmann et al., 2014). A recently emerging NF protocol for ADHD that uses functional near-infrared spectroscopy (fNIRS) to measure the blood oxygenation level dependent (BOLD) response within the dlPFC has several potential advantages over traditional EEG protocols (Marx et al., 2015).

Compared to EEG, fNIRS has improved spatial resolution and better correspondence of channel to underlying brain region, as well as reduced sensitivity to movement-based artifacts, making it ideal for NF training of circumscribed brain areas in motorically restless individuals (e.g., ADHD patients, children, etc.). Furthermore, evidence from BOLD-based NF paradigms suggests that they yield effects faster than their EEG-based counterparts. In a pilot study with children with ADHD, significant symptom improvements were found after only 12 sessions of fNIRS-based dlPFC training (Marx et al., 2015). Sherwood et al. (2016) found that, in healthy subjects, control of the BOLD response in the dlPFC can be achieved after just five sessions of real-time functional magnetic resonance imaging (fMRI) NF training. Current EEG protocols, on the other hand, require between 25 and 50 sessions to realize significant effects (for a review and meta-analysis see Begemann et al., 2016). However, despite the promise of BOLD-based protocols as a potential treatment for impulsivity, such protocols still need to be translated from laboratory to real-world settings.

Neurofeedback treatment is often criticized for its lack of ecological validity. Simply put, strategies of brain regulation learned in a lab setting may not translate well into the real world. Impulsive individuals struggle in the classroom, where academic achievement is negatively correlated with impulsivity severity (Spinella and Miley, 2003). Therefore, any effective strategies developed in NF therapy should ultimately be applied in the classroom (or a similar real-world setting), a concept known as transfer (e.g., Strehl, 2014). However, NF protocols cannot, at this point, be used in a real scholastic setting, as they require large and delicate equipment and students need to concentrate on the current lesson. An increasingly viable option, virtual reality (VR), has been used for the assessment of clinical ADHD symptoms in the classroom (Muhlberger et al., 2016) and with an EEG-based NF protocol designed to reduce inattentive and impulsive behavior in adolescents displaying behavioral problems (Cho et al., 2004). In the latter study, the VR group showed the greatest improvement on attention-related tasks following NF training relative to both a control group and a 2-D classroom group, but no difference in impulsivity. However, this study was controlled with a waiting-list group, thus not ruling out non-specific effects of NF training, such as continuous performance monitoring, reinforcement of compliance, and the idea that one is being treated by a sophisticated technology and professional (Gevensleben et al., 2012, 2014b). Furthermore, the NF was a separate module, not incorporated into the experience of the class itself.

Based on these findings, we developed a virtual classroom-based fNIRS NF protocol (for study design see Blume et al., 2017) in order to directly facilitate transfer of NF training effects to the classroom. Importantly, feedback is delivered in the form of gentle dimming or brightening of the overhead lighting, which does not distract the participant from the experience of being in a classroom. In the present study, we implement a 2-week accelerated protocol in highly impulsive young adults, consisting of eight training sessions (one per day) bookended by a pre- and a post-test to assess behavioral changes during a go/no-go, n-back, and stop-signal task (SST). Changes in frontal lobe function were also assessed during the go/no-go and n-back tasks using fNIRS. To control for the previously mentioned non-specific effects of NF training, we used bilateral musculi supraspinatus-based electromyography (EMG) biofeedback (BF) (see Marx et al., 2015; Mayer et al., 2015). This method has been successfully used in those studies as a control for NIRS-based NF. Sham-based NF control groups (e.g., targeting putatively unrelated brain areas) invite ethical concerns, as training random areas may have unforeseeable negative effects on the participant, who is often recruited on the premise that the training will be helpful to their condition (Holtmann et al., 2014). Furthermore, participants sometimes become aware that they are in a sham condition (particularly if the sham feedback contains data completely unrelated to the current training situation, e.g., training data of another participant), or even assume they are in one when they are not, leading to both drop-outs and reduced motivation, a critical aspect for any successful NF training (Birbaumer et al., 1991; Gevensleben et al., 2014a). As we did not explicitly inform participants that EMG BF was a control condition, they were less susceptible to this motivation loss.

We hypothesize that the fNIRS-based NF group will show an improvement in dlPFC activity during the cognitive tasks (go/no-go and n-back) relative to the EMG-based control group following the treatment program. We also expect the NF group to show a reduction in FA (go/no-go task) as well as reduced stop-signal reaction time (SSRT) on the SST from pre- to post-test measurement (as measures of response inhibition). As secondary outcomes, we expect reaction time (RT) and RT variability [standard deviation of the reaction time (SDRT)] to decrease for the NF group on all tasks, as the dlPFC plays a role in a multitude of executive functions. The expected neurocognitive improvements following frontal lobe focused fNIRS-based NF in a virtual training environment would confirm the general feasibility of a combination of NF with virtual training scenarios which could – in the long run – increase the ecological validity of NF interventions.
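The outcome measures named above can be made concrete with a small sketch. SDRT is computed as the sample standard deviation of reaction times, as the text defines it. For SSRT, the sketch assumes the common "mean method" (mean go RT minus mean stop-signal delay, valid under a tracking procedure that keeps stop success near 50%); this section does not say how SSRT was estimated, and the integration method is a frequently used alternative. All RT and delay values are hypothetical.

```python
from statistics import mean, stdev


def sdrt(rts_ms):
    """Reaction-time variability: sample standard deviation of RTs (SDRT)."""
    return stdev(rts_ms)


def ssrt_mean_method(go_rts_ms, stop_signal_delays_ms):
    """Stop-signal reaction time via the mean method: mean go RT - mean SSD.

    Assumes stop-signal delays were tracked so that stopping succeeds on
    roughly 50% of stop trials.
    """
    return mean(go_rts_ms) - mean(stop_signal_delays_ms)


# Hypothetical values (ms), for illustration only.
go_rts = [400, 420, 450, 430, 400]
ssds = [200, 250, 200, 250]

ssrt = ssrt_mean_method(go_rts, ssds)  # 420 - 225 = 195
variability = sdrt(go_rts)
```

A lower SSRT means faster inhibition of an already initiated response, and a lower SDRT means more consistent responding, which is why both are expected to decrease for the NF group.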

# MATERIALS AND METHODS

## Participants

We recruited 22 students from the University of Tübingen out of a larger group of potential participants who had completed the Barratt Impulsiveness Scale (BIS; Barratt, 1959) in an online format. Based on their high BIS scores (M<sub>BIS</sub> = 85.75, SD<sub>BIS</sub> = 9.36), these students were selected and invited to an in-person screening for ADHD [according to criteria from the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV-TR; Sass et al., 2003)] using two subtests from the Homburger ADHD Scale for Adults (HASE; Rösler et al., 2008): the German versions of the Wender Utah Rating Scale (WURS-K) and the ADHD-Self Assessment Scale (ADHS-SB). Participants meeting the criteria for an indication of ADHD in this context (WURS-K > 30 and ADHS-SB > 18) were excluded from the study (n = 1) and informed about the outpatient ADHD program at the Department of Psychiatry and Psychotherapy at the University Hospital Tübingen. The remaining participants (n = 21; nine female, M<sub>Age</sub> = 23.4, SD<sub>Age</sub> = 2.8) reported no history of serious or chronic illness, neurological, or psychiatric disorders.
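The exclusion rule combines two strict cutoffs, and a candidate is flagged only when both scores exceed them. The sketch below encodes exactly the thresholds stated in the text; the function and constant names are hypothetical, and the example scores are invented.

```python
WURS_K_CUTOFF = 30   # stated criterion: WURS-K > 30
ADHS_SB_CUTOFF = 18  # stated criterion: ADHS-SB > 18


def indicates_adhd(wurs_k: int, adhs_sb: int) -> bool:
    """True if both screening scores exceed their cutoffs (candidate is excluded)."""
    return wurs_k > WURS_K_CUTOFF and adhs_sb > ADHS_SB_CUTOFF


# Hypothetical candidates: (id, WURS-K, ADHS-SB)
candidates = [("A", 31, 19), ("B", 30, 25), ("C", 40, 18)]
included = [cid for cid, w, a in candidates if not indicates_adhd(w, a)]
```

Because the inequalities are strict and joined by "and", a score exactly at a cutoff, or a high score on only one scale, does not lead to exclusion.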

This study was approved by the Ethics Committee of the Medical Faculty of the University and the University Hospital of Tübingen, and all procedures were in accordance with the Helsinki Declaration of 1975, as revised in 2013. Participants provided written informed consent and were compensated with 100 Euros for completing the training, including pre- and post-measurements (10 sessions, 1 h each, over 2 weeks). One participant dropped out of the study due to feeling ill from the VR and was paid pro rata at 10 Euros per hour of participation.

## Study Design

The study followed a randomized, controlled experimental design. Participants were randomized (10 participants in each group) to either eight fNIRS-based NF (experimental) or eight EMG-based BF (control) sessions, which took place daily over 2 weeks (Tuesday to Friday in the first week, Monday to Thursday in the second week). We randomized without stratifying for any other variables. The groups did not differ significantly in gender (NF: 4 female, 6 male; BF: 5 female, 5 male; Fisher's exact test, p = 0.50) or in age (M<sub>BF</sub> = 22.9, SD = 2.88; M<sub>NF</sub> = 23.9, SD = 2.77; t(18) = 0.80, p = 0.44). The pre-test and post-test were identical and included a go/no-go task, an n-back task, and an SST. The pre-test took place on the Monday of the first week, and the post-test on the Friday of the second week. The order of the pre- and post-test measures was counterbalanced across subjects.
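The group-comparison statistics can be recomputed from the counts and summary values given above. The sketch assumes a one-sided (lower-tail) Fisher's exact test on the 2 × 2 gender table and a pooled-variance Student's t-test for age; the text does not state which variants were used. For these counts the lower-tail probability is exactly 0.50 (a two-sided version summing all tables no more probable than the observed one would give 1.0 here). The recomputed t of about 0.79 is close to the reported 0.80, and the small gap is plausibly due to rounding of the published means and SDs.

```python
from math import comb, sqrt


def fisher_lower_tail(a, b, c, d):
    """One-sided Fisher's exact p: P(X <= a) for the table [[a, b], [c, d]].

    X is hypergeometric: the count in cell a given fixed row and column totals.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)
    lo = max(0, row1 - (n - col1))
    return sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(lo, a + 1)) / denom


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Student's t for two independent groups with pooled variance; df = n1 + n2 - 2."""
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m2 - m1) / sqrt(sp2 * (1 / n1 + 1 / n2))


# Rows: NF, BF groups; columns: female, male (counts from the text).
p_gender = fisher_lower_tail(4, 6, 5, 5)

# Age: BF (M = 22.9, SD = 2.88, n = 10) vs. NF (M = 23.9, SD = 2.77, n = 10).
t_age = pooled_t(22.9, 2.88, 10, 23.9, 2.77, 10)
```

`math.comb` requires Python 3.8 or later.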

# Virtual Classroom Scenario

The participants were seated and wore the Oculus Rift (Oculus Rift, United States<sup>1</sup>) VR head-mounted display (HMD). The HMD rendered a virtual classroom developed by KatanaSim (KatanaSim, Germany<sup>2</sup>) with animated students and a teacher. Participants had a seated, first-person view facing the teacher (**Figure 1**), with a full 360° view from the desk and other students seated nearby. The task was to control the brightness of the classroom lighting. When an upward-pointing arrow was shown on the chalkboard, the participant had to "activate" to make the light brighter; when the arrow pointed downward, the participant had to "deactivate" to make the light darker. Briefly, activation requires higher output than baseline from the respective feedback source, while deactivation requires lower output than baseline (see below for details of the fNIRS and EMG activation/deactivation protocols). Importantly, participants in neither condition were told how to regulate the lighting; they were simply instructed to try to increase the lighting when the arrow pointed upward and to decrease it when the arrow pointed downward. In this way, only the positive or negative feedback they received from the scenario should have reinforced their learning of the feedback parameter. The probability that a trial was an activation trial (arrow up) was 50% in sessions 1–4 and 80% in sessions 5–8. More activation was encouraged in the second half of the training because greater upregulation of the prefrontal cortex is associated with stronger inhibitory control (Rubia et al., 2013; Soltaninejad et al., 2015). From the second half of each session until its end, participants were confronted with distractions within the scenario (e.g., students turning around or cell phones ringing).

<sup>1</sup>www.oculusvr.com

<sup>2</sup>www.katanasim.com

Before each trial, a baseline and a threshold for light fluctuation were calculated to determine the point at which the classroom light was balanced between fully bright and fully dark, and the range within which it could fluctuate. After a successful trial, i.e., when the signal stayed above the baseline (activation) or below it (deactivation) for 60, 70, or 80% of the trial, the participant was rewarded with one, two, or three smiley faces, respectively, on the chalkboard.
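The reward rule can be made concrete with a short sketch. It is not the authors' implementation: the helper names are hypothetical, and it assumes that a threshold counts as reached when the share of samples on the correct side of the baseline is greater than or equal to 60, 70, or 80%.

```python
# Hypothetical sketch of the smiley-face reward rule (not the study's code).
# Assumption: a threshold is reached when the share of samples on the correct
# side of baseline is >= that threshold.

def fraction_correct(signal, baseline, activate):
    """Share of samples above (activation) or below (deactivation) the baseline."""
    if not signal:
        raise ValueError("signal must contain at least one sample")
    if activate:
        hits = sum(1 for x in signal if x > baseline)
    else:
        hits = sum(1 for x in signal if x < baseline)
    return hits / len(signal)


def smiley_reward(signal, baseline, activate, thresholds=(0.6, 0.7, 0.8)):
    """Number of smiley faces (0-3): one per threshold reached (60/70/80%)."""
    frac = fraction_correct(signal, baseline, activate)
    return sum(1 for t in thresholds if frac >= t)


if __name__ == "__main__":
    # 3 of 4 samples above baseline on an activation trial (75%)
    print(smiley_reward([0.2, 0.3, 0.1, -0.4], baseline=0.0, activate=True))
```

With 75% of samples on the correct side, the 60% and 70% thresholds are met but not the 80% one, so two smiley faces are awarded.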

Each session consisted of three blocks: the first and last were 12 min long, while the second, the transfer block, was 8 min. In the transfer block, the light's brightness was fixed, so the only feedback came at the end of each trial. Trial number and length depended on the feedback source and are described in the following sections.

#### fNIRS

Functional near-infrared spectroscopy records changes in oxygenated (O2Hb) and deoxygenated (HHb) hemoglobin relative to a baseline. Local O2Hb is used to infer local brain activation via hemodynamic coupling, whereby increased cortical activation leads to increased O2Hb and decreased HHb (Haeussinger et al., 2014). The ETG-4000 continuous-wave Optical Topography System (Hitachi Medical Co., Japan) was used for the pre- and post-tests as well as the NF sessions.

Our optode montage featured two 3 × 3 optode arrays, each centered so that the innermost channel of its front row lay on F3 (left hemisphere) or F4 (right hemisphere) of the international 10–20 EEG system (Jasper, 1958). Source–detector distances were kept at 3 cm. The arrays were rotated 45° laterally in the transverse plane so that the innermost four channels in the two frontal rows lay over the left and right dlPFC (**Figure 2**). A third, 3 × 5 optode array was placed with its most superior lateral optodes on the left and right on P3 and P4, respectively. Covering the parieto-occipital cortex, this probe set was used exclusively for common average (CA) referencing, a signal correction method (see below).

#### fNIRS Feedback Signal and Trials

The feedback target was the average amplitude of O2Hb within the bilateral dlPFC (see Marx et al., 2015). The raw fNIRS signal was sampled at 10 Hz and preprocessed in MATLAB version 9.0 (The MathWorks Inc., United States). A moving average Kalman filter with a 5 s sliding window was then applied to the data. Finally, we applied the CA artifact removal method used in the previous NF designs on which this design was based (Marx et al., 2015; Mayer et al., 2015). This method was preferred because it removes probe-set-wide effects from individual channels (Heinzel et al., 2013). For the CA, the raw average of all 46 channels was subtracted from the raw average of the eight emitter–detector channels over the dlPFC to limit the influence of artifacts (e.g., superficial blood flow, head and jaw movements, and respiration) on the hemodynamic response in the feedback channels. All preprocessing was performed online.
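The CA correction amounts to subtracting a global mean from the ROI mean at every sample. The sketch below illustrates this under stated assumptions: it substitutes a plain causal moving average for the moving average Kalman filter named in the text, and `feedback_trace` and the other helper names are hypothetical.

```python
from statistics import fmean

FS_HZ = 10      # fNIRS sampling rate given in the text
WINDOW_S = 5    # sliding-window length given in the text


def moving_average(samples, window):
    """Causal moving average over the most recent `window` samples (a stand-in,
    not the Kalman filter used in the study)."""
    out = []
    for i in range(len(samples)):
        start = max(0, i - window + 1)
        out.append(fmean(samples[start:i + 1]))
    return out


def ca_corrected(frame, roi_channels):
    """Mean of the feedback (ROI) channels minus the mean of all channels at one sample."""
    return fmean(frame[c] for c in roi_channels) - fmean(frame)


def feedback_trace(frames, roi_channels, window=FS_HZ * WINDOW_S):
    """frames: list of samples, each a list of O2Hb values for every channel."""
    channels = [list(ch) for ch in zip(*frames)]          # per-channel time series
    smoothed = [moving_average(ch, window) for ch in channels]
    return [ca_corrected(list(f), roi_channels) for f in zip(*smoothed)]
```

Because the global mean is subtracted, any component shared by all channels (for example a systemic fluctuation added equally everywhere) cancels out of the feedback value, while ROI-specific changes remain.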

The fNIRS trials lasted 30 s, with a 5 s baseline period. An O2Hb concentration higher than baseline brightened the lights; a concentration lower than baseline dimmed them. Trials were divided into three blocks (**Figure 3**). The first and last blocks each contained 12 trials, each followed by a 20 s rest. The middle block contained eight trials and rests and served as the transfer block, in which no continuous feedback was provided, although participants still received feedback at the end of each trial. Intertrial intervals were not jittered.

# EMG Feedback Signal and Trials

Monopolar EMG, sampled at 1000 Hz, provided feedback from the bilateral musculus supraspinatus for the control group (see Mayer et al., 2015). The signal was referenced to the right mastoid and grounded at the left mastoid. The data stream was bandpass filtered between 80 and 300 Hz. The resulting signal was then normalized using a maximal output and a resting output, recorded while the participant tensed both muscles maximally for 10 s and sat completely at rest for 10 s, respectively. At each time point, the feedback was:

$$\text{Feedback Index} = R - L,$$

where $R$ and $L$ are the right and left normalized muscle outputs, respectively, given by:

$$R\,(L) = \frac{\text{EMG Signal}_{\text{Right (Left)}} - \text{Average Resting Baseline}_{\text{Right (Left)}}}{\text{Average Maximal Muscle Output}_{\text{Right (Left)}}}.$$

Therefore, tensing the right muscle more led to brightening, and tensing the left muscle more led to dimming. The baseline for each trial was the average of the last 2 s of the resting feedback signal.
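The feedback index and the per-trial baseline can be written out directly from the formula. This is a minimal sketch, assuming the EMG inputs are already rectified amplitude values and that the calibration means come from the 10 s maximal and resting recordings; the dictionary keys and function names are hypothetical.

```python
from statistics import fmean


def normalized_output(emg, rest_mean, max_mean):
    """(EMG signal - average resting baseline) / average maximal muscle output."""
    return (emg - rest_mean) / max_mean


def feedback_index(right_emg, left_emg, calib):
    """Feedback Index = R - L, with R and L the normalized right/left outputs."""
    r = normalized_output(right_emg, calib["rest_right"], calib["max_right"])
    l = normalized_output(left_emg, calib["rest_left"], calib["max_left"])
    return r - l


def trial_baseline(rest_trace, fs_hz, seconds=2.0):
    """Average of the last `seconds` of the resting feedback-index trace."""
    n = max(1, int(round(fs_hz * seconds)))
    return fmean(rest_trace[-n:])


def light_change(index, baseline):
    """Direction of the light change relative to the trial baseline."""
    if index > baseline:
        return "brighter"
    if index < baseline:
        return "darker"
    return "unchanged"
```

Tensing only the right muscle to half of its maximal output (relative to rest) yields an index of 0.5, which lies above a baseline near zero and therefore brightens the light.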

The EMG trials lasted 15 s, with a 2 s baseline period. A feedback index higher than baseline brightened the lights; a feedback index lower than baseline dimmed them. Trials were divided into three blocks (**Figure 3**). The first and last blocks each contained 24 trials, each followed by a 10 s rest. The middle block contained 16 trials and rests and served as the transfer block, in which no contingent feedback was provided.

#### Pre- and Post-measures: Go/no-go and n-Back Tasks

The go/no-go and n-back tasks were programmed in Presentation version 18.0 (Neurobehavioral Systems, United States) following previously published protocols (Mayer et al., 2015; see also Ehlis et al., 2008). We recorded fNIRS during both tasks. Briefly, the go/no-go task consisted of alternating go and no-go blocks (four repetitions each) separated by rest blocks, each block lasting 30 s. In the go condition, participants were asked to respond as fast as possible to each stimulus. In the no-go condition, participants were instructed to withhold their response on no-go trials (here, presentation of the letter "N"; 25% of trials). Dependent variables were RT, SDRT, FA, and omission errors.
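How the four go/no-go dependent variables follow from trial-level records can be sketched as below. The record layout `(is_nogo, responded, rt_ms)` is hypothetical, and computing RT and SDRT (sample standard deviation) over responded go trials only is an assumption; the text does not specify these details.

```python
from statistics import fmean, stdev


def score_go_nogo(trials):
    """trials: iterable of (is_nogo, responded, rt_ms); rt_ms is None when no response."""
    trials = list(trials)
    go_rts = [rt for nogo, resp, rt in trials if not nogo and resp]
    false_alarms = sum(1 for nogo, resp, _ in trials if nogo and resp)   # responses to no-go stimuli
    omissions = sum(1 for nogo, resp, _ in trials if not nogo and not resp)  # missed go stimuli
    return {
        "RT": fmean(go_rts) if go_rts else None,
        "SDRT": stdev(go_rts) if len(go_rts) > 1 else None,
        "FA": false_alarms,
        "omissions": omissions,
    }


if __name__ == "__main__":
    example = [(False, True, 400), (False, True, 500), (False, False, None),
               (True, True, 350), (True, False, None)]
    print(score_go_nogo(example))
```

A false alarm is a response on a no-go trial, and an omission is a missing response on a go trial; the two error types are counted separately.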

The n-back task consisted of three blocks each of 2-back (high working memory load), 1-back (low working memory load), and 0-back (control) (block length: 30 s, separated by 30 s rest periods). In the 2-back (1-back) task, participants were instructed to press the space bar as quickly as possible whenever the current letter matched the letter presented two letters (one letter) earlier. In the 0-back task, participants were instructed to respond whenever the letter "O" appeared on the screen. Dependent variables were RT, SDRT, and correct hits.
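The target rule for each load level can be expressed compactly. In this sketch (hypothetical helper names), a position is a target in the n-back condition when its letter equals the letter n positions earlier, and in the 0-back condition when it is the fixed letter "O" given in the text.

```python
def nback_targets(letters, n, zero_back_target="O"):
    """True where a response is required.

    n >= 1: current letter equals the letter n positions earlier.
    n == 0: current letter is the fixed target letter ("O" in the text).
    """
    if n == 0:
        return [c == zero_back_target for c in letters]
    return [i >= n and letters[i] == letters[i - n] for i in range(len(letters))]


def correct_hits(targets, responses):
    """Number of target positions on which the participant responded."""
    return sum(1 for t, r in zip(targets, responses) if t and r)


if __name__ == "__main__":
    print(nback_targets("ABAB", 2))   # 2-back: positions 3 and 4 are targets
```

The first n positions of a 1- or 2-back sequence can never be targets, because there is no letter n positions earlier to compare against.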

#### Stop-Signal Task

The SST followed the protocol described in Verbruggen et al. (2008). The task consisted of one practice block and three 3-min experimental blocks in which participants responded as quickly as possible to the direction of an arrow on the screen. In roughly 25% of trials, the arrow turned blue after a variable stop-signal delay (SSD), indicating that the participant should withhold the response. The SSD started at 250 ms and was increased by 50 ms after a successful stop and decreased by 50 ms after a failed stop. Dependent variables in the SST were the SSRT (a measure of behavioral inhibition), RT, and SDRT. The SST was added as a secondary behavioral measure; fNIRS was not recorded simultaneously during this task.
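The SSD staircase and a simple SSRT estimate can be sketched as follows. The tracking direction follows the usual staircase convention (a successful stop lengthens the SSD, making the next stop harder; a failed stop shortens it). The floor of 0 ms and the mean-method SSRT (mean go RT minus mean SSD) are assumptions for illustration; the text does not state which SSRT estimator was used, and the function names are hypothetical.

```python
from statistics import fmean


def update_ssd(ssd_ms, stopped, step_ms=50, min_ssd_ms=0):
    """One staircase step: longer SSD after a successful stop, shorter after a failure."""
    ssd_ms = ssd_ms + step_ms if stopped else ssd_ms - step_ms
    return max(min_ssd_ms, ssd_ms)   # assumed floor: SSD cannot become negative


def ssd_track(outcomes, start_ms=250, step_ms=50):
    """SSD sequence for a series of stop-trial outcomes (True = successfully stopped)."""
    ssd = start_ms
    track = [ssd]
    for stopped in outcomes:
        ssd = update_ssd(ssd, stopped, step_ms)
        track.append(ssd)
    return track


def ssrt_mean_method(go_rts_ms, ssds_ms):
    """Mean-method SSRT estimate: mean go RT minus mean SSD."""
    return fmean(go_rts_ms) - fmean(ssds_ms)


if __name__ == "__main__":
    print(ssd_track([True, True, False]))   # 250 -> 300 -> 350 -> 300
```

Because the staircase converges on the delay at which stopping succeeds on about half of the stop trials, the mean SSD can be subtracted from the mean go RT to approximate the latency of the stop process.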

#### Analysis

#### Functional Near-Infrared Spectroscopy Data

All analyses were performed in MATLAB. To analyze the fNIRS data, we used subroutines programmed in our research group and adapted for fNIRS from the Statistical Parametric Mapping toolbox for MATLAB (SPM8; Friston et al., 1994). Raw signals were bandpass filtered between 0.01 and 0.2 Hz to remove physiological artifacts such as heartbeat and respiration. Next, channels exceeding three times the within-subject standard deviation over the course of the measurement were interpolated (see Hagen et al., 2014) using a Gaussian weighting in which the O2Hb values of proximal channels were weighted more heavily than those of distal ones; fewer than 10% of all channels were interpolated. We then applied a wavelet-based transform (Molavi and Dumont, 2012) to detect and correct remaining motion artifacts, using the hmrMotionCorrectWavelet algorithm from the Homer2 fNIRS analysis package for MATLAB with the standard outlier threshold of 1.5 times the interquartile range (Huppert et al., 2009). A block-related average amplitude was then calculated for each channel over an interval of 0–60 s after block onset, with a 10 s baseline correction. Linear detrending was applied to remove slow drifts. Finally, average amplitudes over the duration of the task blocks (0–30 s) were calculated.
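The final averaging steps can be illustrated for a single channel. This sketch makes several assumptions not fixed by the text: linear detrending is applied to the continuous channel before epoching, the 10 s baseline is the interval immediately preceding block onset, and the earlier filtering, interpolation, and wavelet steps are omitted. Function names are hypothetical.

```python
from statistics import fmean

FS_HZ = 10  # sampling rate given in the text


def linear_detrend(y):
    """Subtract the least-squares straight line from a sequence."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = fmean(y)
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sum((x - x_mean) * (v - y_mean) for x, v in enumerate(y)) / sxx if sxx else 0.0
    return [v - (y_mean + slope * (x - x_mean)) for x, v in enumerate(y)]


def block_amplitude(signal, onsets, fs=FS_HZ, epoch_s=60, base_s=10, task_s=30, detrend=True):
    """Mean amplitude over the task window of the block-averaged, baseline-corrected epoch."""
    if detrend:
        signal = linear_detrend(signal)          # assumed: detrend the continuous channel
    epochs = []
    for onset in onsets:
        if onset < base_s * fs or onset + epoch_s * fs > len(signal):
            raise ValueError(f"onset {onset} leaves no room for baseline or epoch")
        baseline = fmean(signal[onset - base_s * fs:onset])   # assumed: 10 s before onset
        epochs.append([v - baseline for v in signal[onset:onset + epoch_s * fs]])
    block_average = [fmean(samples) for samples in zip(*epochs)]   # 0-60 s average
    return fmean(block_average[:task_s * fs])                      # 0-30 s task window
```

A purely linear drift is removed entirely by the detrending step and yields an amplitude near zero, whereas a step increase during the 30 s task window survives the baseline correction.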

# Region of Interest (ROI)

We mapped fNIRS channels to the underlying cortical areas using a virtual registration method (Rorden and Brett, 2000; Singh et al., 2005; Tsuzuki et al., 2007). The left and right dlPFC regions of interest (ROIs) consisted of the channels used for NF training. These channels lie mainly over Brodmann Areas 9, 45, and 46, covering the dlPFC and extending slightly into the inferior frontal gyrus (IFG; see **Figure 2**).

# Rate of Learning and Correlation with Primary Outcome Variables

We also analyzed how successfully participants gained control over the feedback parameter. The success rate was calculated as the average percentage of time the signal spent in the desired direction (above the baseline for activation trials, below it for deactivation trials) over the duration of the trial. Averages were calculated over all trials of the first week (four sessions) and of the second week (four sessions). The rate of learning was defined as the second-week average minus the first-week average. The rate of learning was then correlated with the primary outcome variables: the FA rate in the go/no-go task and the average O2Hb amplitude of the feedback channels during the no-go task. To compute these correlations, we derived change scores: a pre–post difference in FA errors for each subject as a metric of individual improvement, and a post–pre difference in the average O2Hb amplitude of the feedback channels to reflect the change in activation after training. Where one or more groups showed significant correlations, we tested whether the correlations differed between groups with a pseudo-permutation test (n = 10,000 permutations) that permuted group assignment while keeping each subject's correlation pair intact. The p-value was the number of permutations in which the permuted group difference in ρ exceeded the observed group difference in ρ, divided by the total number of permutations.
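The learning-rate metric and the group-difference permutation test can be sketched with the standard library. Assumptions: Spearman's ρ is computed as the Pearson correlation of average ranks, the group difference is taken as an absolute value (the text does not say whether the test is one- or two-sided), and a constant variable is given ρ = 0 so the permutation loop never fails. All names are hypothetical.

```python
import random
from statistics import fmean


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the average ranks (0.0 if either variable is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = fmean(rx), fmean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0


def learning_rate(week1_success, week2_success):
    """Mean success rate (% time on the correct side) in week 2 minus week 1."""
    return fmean(week2_success) - fmean(week1_success)


def rho_group_difference(pairs_a, pairs_b, n_perm=10_000, seed=0):
    """Permutation test on |rho_a - rho_b|; each (learning rate, FA change) pair stays intact."""
    def diff(a, b):
        return abs(spearman_rho(*zip(*a)) - spearman_rho(*zip(*b)))

    rng = random.Random(seed)
    observed = diff(pairs_a, pairs_b)
    pooled = list(pairs_a) + list(pairs_b)
    n_a = len(pairs_a)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                      # reassign group labels
        if diff(pooled[:n_a], pooled[n_a:]) > observed:
            exceed += 1
    return observed, exceed / n_perm
```

Shuffling whole (learning rate, FA change) tuples reassigns subjects to groups without breaking the within-subject pairing, which is what the correlation depends on.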

# Statistical Analysis

To evaluate the statistical significance of pre–post changes in O2Hb and HHb in the go/no-go and n-back tasks, we conducted 2 × 2 × 2 × 2(3) repeated measures analyses of variance (ANOVAs) with the between-subjects factor treatment group (NIRS vs. EMG) and the within-subject factors time (pre vs. post), ROI (left vs. right dlPFC), and condition (go/no-go (2): go and no-go; n-back (3): 2-, 1-, and 0-back). For behavioral data, repeated measures ANOVAs were performed with the same factors excluding ROI. When data violated the assumption of sphericity, Greenhouse-Geisser corrected values are reported. Significant main and interaction effects were followed up with two-tailed Student's t-tests (paired or independent samples, as appropriate). Where the assumption of normality was violated, we used two-tailed Wilcoxon signed-rank tests or Mann–Whitney U tests for paired and independent comparisons, respectively.

## ROI Specificity

To determine the specificity of the ROIs, we used pseudo-permutation tests in which the mean pre-to-post difference in average amplitudes across all participants for a given verum ROI (vROI) was compared with that of a pseudo-ROI (pROI) composed of an equal number of randomly chosen NIRS channels. N = 10,000 pROI permutations were calculated, and the p-value was the proportion of permutations in which the statistic from the pROI was equal to or greater than that from the vROI.
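The ROI specificity test can be sketched as a comparison against randomly drawn channel subsets. The sketch assumes one pre-to-post change value per channel (for example, averaged across participants) and a one-sided count of pseudo-ROIs that match or exceed the verum ROI; the function name is hypothetical.

```python
import random
from statistics import fmean


def roi_specificity_p(channel_change, roi_channels, n_perm=10_000, seed=0):
    """Share of random pseudo-ROIs whose mean pre-to-post change is >= that of the verum ROI.

    channel_change: one value per channel (46 in the montage described), e.g. the
    across-participant mean post-minus-pre amplitude difference.
    """
    rng = random.Random(seed)
    k = len(roi_channels)
    observed = fmean(channel_change[c] for c in roi_channels)
    all_channels = range(len(channel_change))
    at_least = 0
    for _ in range(n_perm):
        pseudo_roi = rng.sample(all_channels, k)   # equal-sized random channel set
        if fmean(channel_change[c] for c in pseudo_roi) >= observed:
            at_least += 1
    return at_least / n_perm
```

A small p-value means that few random channel sets of the same size show a change as large as the verum ROI, which is the sense in which the effect is spatially specific.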

# RESULTS

#### Behavioral Data

Only significant results related to the hypotheses are reported here. For a full summary of behavioral data, see **Table 1**.

#### Go/no-go

False alarm errors in the go/no-go task showed a trend, with a large effect size, for a measurement time × group interaction (F(1,18) = 4.08, p = 0.059, η² = 0.185). Post hoc Wilcoxon signed-rank tests revealed a reduction in FA errors from pre to post measurement in the experimental group (Mpre = 4.8, SDpre = 2.4; Mpost = 2.6, SDpost = 1.3; Z = −2.57, p = 0.01) but not in the control group (Mpre = 4.8, SDpre = 2.7; Mpost = 6.0, SDpost = 5.2; Z = −0.30, p = 0.77) (**Figure 4A**). No other interaction effects were observed.

Abbreviations: SD, standard deviation; RT, reaction time; SDRT, standard deviation of the reaction time; FA, false alarms; SSRT, stop-signal reaction time.
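The η² values reported in these Results agree, to rounding, with partial eta squared computed from the F ratio and its degrees of freedom, e.g., 4.08 · 1 / (4.08 · 1 + 18) ≈ 0.185. The helper below is a hypothetical illustration of that relation; for Greenhouse-Geisser corrected tests the corrected degrees of freedom can be used, since both are scaled by the same ε.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """eta_p^2 = F * df_effect / (F * df_effect + df_error)."""
    return f_value * df_effect / (f_value * df_effect + df_error)


if __name__ == "__main__":
    # F ratios and degrees of freedom as reported in the Results
    reported = [(4.08, 1, 18), (11.92, 1, 18), (5.63, 1, 18), (3.75, 1.39, 24.93)]
    for f, d1, d2 in reported:
        print(f, d1, d2, round(partial_eta_squared(f, d1, d2), 3))
```

This offers a quick consistency check when only F and its degrees of freedom are available.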

#### Rate of Learning

A one-sample Kolmogorov–Smirnov test rejected the null hypothesis that the success rates for the first and second halves in the experimental and control groups followed a normal distribution (D = 0.65, 0.65, 0.64, 0.65; N = 10 each; p < 0.05 each). Therefore, Wilcoxon signed-rank tests and Spearman correlations were calculated. In the experimental group, first- and second-half performance did not differ significantly, although a medium effect size indicated better second-half performance (Z = 1.48, p = 0.13, r = 0.33). In the control group, however, performance was significantly better in the second week (Z = 2.68, p = 0.013, r = 0.60), with a large effect size.
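The r values reported with the Wilcoxon tests are consistent with the common conversion r = |Z| / √N when N is taken as the total number of observations, here inferred to be 10 participants × 2 halves = 20. The function name is hypothetical, and the choice of N is an inference from the reported numbers rather than something the text states.

```python
import math


def wilcoxon_effect_size_r(z, n_observations):
    """Effect size r = |Z| / sqrt(N), with N the total number of observations."""
    return abs(z) / math.sqrt(n_observations)


if __name__ == "__main__":
    # 10 participants x 2 halves = 20 observations per group (inferred)
    print(round(wilcoxon_effect_size_r(1.48, 20), 2))   # experimental group
    print(round(wilcoxon_effect_size_r(2.68, 20), 2))   # control group
```

Using N = 10 (participants only) instead would give noticeably larger values than those reported, which is why N = 20 appears to be the convention applied here.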

In neither group did the rate of learning correlate significantly with post–pre changes in average O2Hb concentration in the feedback channels (|ρ| < 0.224, p > 0.05). In the experimental group, however, the rate of learning correlated strongly with the size of the pre–post reduction in FA (ρ = 0.75, p = 0.013; see **Figure 4B**), whereas in the control group it did not (ρ = −0.24, p = 0.508). The pseudo-permutation test indicated a significant group difference between these correlations (p = 0.015).

#### N-Back Task

No significant behavioral interaction effects were observed. Hit rates for each condition were nearly 100% in the pre-test, and no FA errors were made in this task, indicating a ceiling effect.

#### Stop-Signal Task

Reaction time variability (SDRT) showed a significant measurement time × group interaction (F(1,18) = 5.39, p = 0.03, η² = 0.231), with the experimental group showing significantly reduced RT variability after training (Mpre = 160.78 ms, SDpre = 64.88; Mpost = 124.13, SDpost = 60.60; t(9) = 2.48, p = 0.035). The control group showed no difference between measurements (Mpre = 125.55 ms, SDpre = 46.13; Mpost = 145.70, SDpost = 61.04; t(9) = 1.04, p = 0.328).

# fNIRS Data

#### Go/no-go O2Hb

We observed a main effect of task (F(1,18) = 11.92, p = 0.003, η² = 0.398; mean amplitudes: Mgo = 0.005, SD = 0.033 mmol/l·mm; Mno-go = −0.005, SD = 0.029 mmol/l·mm) and a time × task × ROI × group interaction (F(1,18) = 5.63, p = 0.029, η² = 0.238). This interaction was driven by a pre-to-post increase in O2Hb amplitude in the left dlPFC of the experimental group during the no-go task (Mpre = −0.029, SD = 0.035 mmol/l·mm; Mpost = 0.010, SD = 0.040 mmol/l·mm; t(9) = −3.63, p = 0.005; see **Figure 5**). In the control group, for the same condition, time, and ROI, there was no significant change (Mpre = 0.006, SD = 0.017 mmol/l·mm; Mpost = −0.006, SD = 0.031 mmol/l·mm; t(9) = 1.15, p = 0.281). All other post hoc comparisons failed to reach significance (|t(9)| < 1.837, p > 0.1). The permutation test indicated that this ROI was indeed the focus of the increase in brain activation (p = 0.003), showing high spatial specificity of the effect to the left dlPFC.

#### Go/no-go HHb

We observed no main effects, only a task × hemisphere interaction (F(1,18) = 5.79, p = 0.027, η² = 0.243). Post hoc testing indicated a trend toward a difference in HHb between the left and right hemispheres in the go condition (Mleft = −0.005, SD = 0.023; Mright = −0.013, SD = 0.026; t(9) = 2.07, p = 0.052).

#### N-Back O2Hb

We observed no significant main or interaction effects (all F(2,36) < 2.50; all p > 0.11).

#### N-Back HHb

We observed a trend for a main effect of task (F(1.39,24.93) = 3.75, p = 0.052, η² = 0.173; mean amplitudes: M2Back = −0.011, SD = 0.023; M1Back = −0.007, SD = 0.021; M0Back = 0.001, SD = 0.024), with more negative HHb amplitudes, and thus higher activation, under higher working memory load. We also observed a trend for a main effect of time (F(1,18) = 3.26, p = 0.088, η² = 0.153; MPre = −0.008, SD = 0.026; MPost = −0.003, SD = 0.021), indicating a marginal decrease in activation across all tasks from pre to post measurement. No other main effects or interaction effects were observed.

#### DISCUSSION

The present study was designed to test the efficacy of a novel neurofeedback intervention (fNIRS-based frontal lobe NF in a virtual classroom environment) with the ultimate aim of reducing ADHD symptoms in schoolchildren by increasing their ability to regulate prefrontal cortex activity (Blume et al., 2017). Here, we focused on the effects of this newly developed NF protocol in a sample of highly impulsive young adults, a subclinical risk population that exhibits many of the behavioral abnormalities also seen in patients with ADHD (e.g., Herrmann et al., 2009). In this proof-of-concept study, we were primarily interested in two questions: first, whether the fNIRS-based NF group would show increased cortical activation in the feedback channels during frontal lobe/impulsivity-related tasks (go/no-go and n-back) after focused training of these channels, and second, whether the fNIRS-based NF group would show a reduction in impulsive behavior (go/no-go, n-back, SST).

During a go/no-go task, only the experimental (fNIRS) group showed a significant increase in cortical O2Hb concentration in the left dlPFC relative to the pre-training baseline. During the same task, this group also showed a concurrent and significant reduction in FA errors. Importantly, this reduction in FA errors correlated significantly with the rate of learning in the experimental group but not in the control group. Additionally, we observed a reduction in RT variability on the SST for the experimental group. We observed no group differences in either cortical activation or behavior on the n-back task. The lack of a group difference on this task is likely due to the study's specific focus on recruiting highly impulsive students. There is no evidence to suggest that highly impulsive individuals have explicit deficits in working memory. In fact, a study examining correlations between trait impulsivity (as measured by BIS self-report) and performance on various neurocognitive tasks found no significant correlation between trait impulsivity and working memory performance, whereas trait impulsivity correlated most strongly with go/no-go errors (Keilp et al., 2005). Furthermore, task accuracy showed a ceiling effect at pre-test, indicating that the task was not difficult for these subjects. Therefore, despite the potential benefit to working memory that training the dlPFC might confer, in our case there may have been no deficit to correct. Lastly, HHb data showed no training-related changes in activation in either task. This is plausible given the NF training: since O2Hb was trained, O2Hb rather than HHb would be expected to show the strongest pre–post effects. In addition, O2Hb is more sensitive to changes than HHb (Strangman et al., 2002).

*Figure caption (fragment):* … in black. (B) ROI event-related averages. Circled regions from (A) indicate the left dlPFC ROI for which the event-related average of O2Hb ± standard error of the mean (SEM) is depicted for both pre- (blue) and post- (red) tests.

False alarm errors, or incorrect go responses to no-go stimuli, represent a failure of response inhibition (Aichert et al., 2012), an impulsive trait that individuals with ADHD share with highly impulsive individuals. A reduction in the experimental group but not in the control group suggests that the fNIRS intervention was effective in reducing impulsive behavior as intended. The strong correlation between the reduction in FA errors and the rate of learning within the experimental group, but not within the control group, further illustrates the importance of specificity in NF training. The goal of actually learning to control the feedback parameter is often overlooked in NF studies, where the rate of obtained control is rarely reported (Zuberer et al., 2015). Interestingly, the control group showed a significant improvement between the first and second week in regulating the feedback parameter, while the experimental group did not. This is likely due to the comparative ease of the EMG feedback: once the correct movement is learned, it can be replicated relatively easily on every trial. The fNIRS feedback is likely more complex, as there is no clearly right or wrong way to achieve the feedback parameter, and sustaining oxygenation of the dlPFC over time is strenuous. Given this complexity, the medium effect size observed in the fNIRS learning rate is encouraging and may simply mean that more sessions are needed to gain full control. Moreover, in the specific sample investigated and trained here (i.e., highly impulsive subjects), frontal lobe alterations have been shown to be a central neurophysiological correlate, so it is perhaps not surprising that improving control over this brain area seems to have been particularly difficult. However, this effort seems to pay dividends: the more control impulsive subjects gained over the activation of their dlPFC, the fewer FA errors they made, whereas successful learning of the EMG parameter had little effect. This result is consistent with an fNIRS study that sought to differentiate the roles of the medial and lateral prefrontal cortex during a go/no-go task, in which the bilateral middle frontal gyrus (i.e., the dlPFC) was responsible for error monitoring during the motor inhibition segment of the task (Rodrigo et al., 2014). Our results suggest that the combination of a correct feedback parameter (i.e., frontal lobe focused) and successful learning of that parameter, rather than either in isolation, is important to the overall success of the feedback.

The task-specific increase in prefrontal oxygenation coinciding with a reduction in FA errors suggests that, following the frontal fNIRS training, the highly impulsive participants were able to recruit more cognitive resources, particularly from the dlPFC, during this task, leading to improved performance. Whether this was intentional is a matter of debate, but the goal of NF interventions remains to train implicit regulation of brain activity through operant and classical conditioning (Strehl, 2014). It therefore seems that the participants were able to transfer skills learned, implicitly or explicitly, during training to a performance situation. Furthermore, this increase in cortical activation was both task-specific (no-go) and region-specific (left dlPFC). Although there was no increase in activation in the right dlPFC, the left-lateralized increase and the improvement in inhibitory control are in line with the tDCS study of Soltaninejad et al. (2015), who applied cathodal stimulation over the left dlPFC of adolescents with ADHD and observed a decrease in FA errors. While the consensus in the literature places the locus of inhibitory control in the right dlPFC and in inferior prefrontal, premotor, and striatal brain structures (Aron et al., 2004, 2014; Bari and Robbins, 2013; Obeso et al., 2013), the left dlPFC shares strong functional connectivity with these areas (Ridderinkhof et al., 2004; Aron et al., 2014). Moreover, the dlPFC does not seem to be directly responsible for inhibitory control but rather functions as a higher-order mechanism that organizes these structures when attentional control or increased working memory capacity is needed, in particular for oddball or complex no-go tasks (Criaud and Boulinguez, 2013). Because our go/no-go paradigm could be considered an oddball paradigm, with no-go stimuli occurring in only 25% of trials, the additional dlPFC resources may have been used for focusing attention rather than for inhibitory control per se. Indeed, the reduction in SDRT in the SST also indicates an increase in attentional resources, possibly also mediated by increased prefrontal activity, though fNIRS data were not available for this task. Increases in SDRT are generally considered to reflect lapses in attention (Alderson et al., 2007), although Kirkeby and Robinson (2005) found SDRT to be inversely correlated with trait impulsivity. Still, this does not rule out the possibility that our impulsive sample also suffered from inattentiveness.

Treatment effects on both impulsivity and possibly inattention are encouraging from a translational perspective regarding the potential use of our NF design in an ADHD population. We chose the dlPFC as an NF site because of its involvement in general top-down cognitive control, and the significant training effects on impulsivity and possibly inattention suggest that the protocol may be useful for an ADHD population. Several reasons lead us to hope for even greater effects in an ongoing study in our lab with schoolchildren with ADHD (Blume et al., 2017). First, the sample size of this study was small, so only large effects could be detected; with a larger sample, we would expect to detect effects on a wider range of cognitive and behavioral deficits. Second, the training was compact, with about half the number of training sessions we would recommend (and currently use) for a clinical ADHD project. As far as we know, this is the smallest number of training sessions shown to produce effects on brain activation and behavior in a study adequately controlled for specificity. Cho et al. (2004) also used a 2-week, eight-session EEG NF paradigm and found training effects on inattention and impulsivity, but they lacked an adequate control group (only a waiting group) and did not measure pre–post differences in brain activity. Lastly, and most importantly, children have a greater capacity for brain plasticity than adults (Kolb and Gibb, 2011). For children with ADHD, this capacity is even more pronounced within the dlPFC, a region that develops particularly late in this population (Rubia et al., 2013). Given the current study's results, we would expect even greater improvements in a child population.

The current study was limited by several factors, which we hope to address in a second study with children diagnosed with ADHD (Blume et al., 2017). The small sample size limited the data analysis. Our aim was to test the viability of an immersive VR NF paradigm, and it appears that full classroom immersion did not detract from the participants' ability to regulate their brain activity. The two groups differed in pre-test no-go activation, with the experimental group showing less activation than the control group. Small groups, even with proper randomization, have a much greater chance of differing at baseline simply due to sampling error (Marshall, 1996); the larger the groups, the smaller the chance of such baseline differences. As NF studies require large investments of time and money per participant, and the aim of our study was ultimately to test the efficacy of VR NF, we chose 10 participants per group as a balance between statistical power and feasibility. For technical reasons, we did not have triggers to compare the extent to which participants were able to regulate their brain activity across sessions; this will be improved in the next study. Although we used distractors in the current study, there was no way to compare trials in which a distractor occurred with trials in which none did. Furthermore, we lack a comparison of the immersive VR NF paradigm with a 2-D version. In an ongoing study with children with ADHD (Blume et al., 2017), we include a 2-D group that still uses the classroom lighting as the feedback source but in which the child sees the classroom on a normal computer monitor. In this way, we will be able to determine whether immersive NF is actually more effective for the transfer of learned regulation. Furthermore, the classroom itself is only one of many possible VR NF designs. Virtual reality scenarios coupled with NF are limited only by imagination and by their relevance to a given psychological disorder. Virtual reality NF for individuals with social phobia, for example, could be embedded in a potentially stressful social situation, such as a bar or dinner party, furthering the ecological validity of the treatment while also avoiding an exposure-driven therapeutic approach that cannot be as easily controlled.

Considering these limitations and the relative ease with which they could be improved upon going forward, it seems that VR NF is a very promising modality for the treatment of behavioral disorders with known pathophysiological alterations.

#### ETHICS STATEMENT

This study was approved by the Ethics Committee of the Medical Faculty of the University and the University Hospital of Tübingen and all procedures were in accordance with the Helsinki Declaration of 1975, as revised in 2013.

# AUTHOR CONTRIBUTIONS

All authors approved the final version of this manuscript. JH study design, data collection and analysis, and manuscript preparation; FB study design, data collection, and manuscript preparation; TD study design and manuscript preparation; FH data analysis; TR study design and manuscript preparation; AF study design and manuscript preparation; CG study design and manuscript preparation; and A-CE study design, data analysis, and manuscript preparation.

# FUNDING

This research was funded by the LEAD Graduate School & Research Network (GSC1028), a project of the Excellence Initiative of the German federal and state governments. JH and FB are doctoral candidates and TD is a Junior Research Group Leader at the LEAD Graduate School & Research Network. A-CE was partly supported by the Interdisciplinary Center for Clinical Research (IZKF) Tübingen (Junior Research Group, grant 2115-0-0). We acknowledge support by the Deutsche Forschungsgemeinschaft and the Open Access Publishing Fund of the University of Tübingen.

# ACKNOWLEDGMENTS

The authors would like to give additional thanks to those who contributed to the measurements, in particular Marc Werth, Silke M. Bieck, and Nana Ambs. Sandru Razvan was influential in editing and structuring the manuscript. Additionally, unwavering support was given by Betti Schopp and Ramona Taeglich in coordinating measurements and training. The authors would also like to thank the participants for their participation in a very involved study.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Hudak, Blume, Dresler, Haeussinger, Renner, Fallgatter, Gawrilow and Ehlis. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Computer Enabled Neuroplasticity Treatment: A Clinical Trial of a Novel Design for Neurofeedback Therapy in Adult ADHD

Benjamin Cowley<sup>1, 2</sup>\*, Édua Holmström<sup>3</sup>, Kristiina Juurmaa<sup>2</sup>, Levas Kovarskis<sup>3</sup> and Christina M. Krause<sup>2</sup>

<sup>1</sup> BrainWork Research Centre, Finnish Institute of Occupational Health, Helsinki, Finland, <sup>2</sup> Cognitive Brain Research Unit, Cognitive Science, Institute of Behavioural Sciences, University of Helsinki, Helsinki, Finland, <sup>3</sup> Formerly affiliated with Faculty of Behavioural Sciences, University of Helsinki, Helsinki, Finland

Background: We report a randomized controlled clinical trial of neurofeedback therapy intervention for ADHD/ADD in adults. We focus on internal mechanics of neurofeedback learning, to elucidate the primary role of cortical self-regulation in neurofeedback. We report initial results; more extensive analysis will follow.

Methods: The trial has two phases: intervention and follow-up. The intervention consisted of neurofeedback treatment, including intake and outtake measurements, with a waiting-list control group. Treatment involved ∼40 hour-long sessions, 2–5 times per week. Training involved either theta/beta or sensorimotor-rhythm regimes, adapted by adding a novel "inverse-training" condition to promote self-regulation. Follow-up (ongoing) will consist of self-report and executive function tests.

#### Edited by:

Soledad Ballesteros, Universidad Nacional de Educación a Distancia, Spain

#### Reviewed by:

Lutz Jäncke, University of Zurich, Switzerland

Juliana Yordanova, Bulgarian Academy of Sciences, Bulgaria

> \*Correspondence: Benjamin Cowley

benjamin.cowley@ttl.fi

Received: 13 January 2016 Accepted: 22 April 2016 Published: 09 May 2016

#### Citation:

Cowley B, Holmström É, Juurmaa K, Kovarskis L and Krause CM (2016) Computer Enabled Neuroplasticity Treatment: A Clinical Trial of a Novel Design for Neurofeedback Therapy in Adult ADHD. Front. Hum. Neurosci. 10:205. doi: 10.3389/fnhum.2016.00205

Setting: Intake and outtake measurements were conducted at the University of Helsinki. Treatment was administered at the partner clinic Mental Capital Care, Helsinki.

Randomization: We randomly allocated half the sample then adaptively allocated the remainder to minimize baseline differences in prognostic variables.
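The adaptive part of this allocation resembles what is usually called minimization. The exact algorithm is not stated here, so the following Python sketch is an assumption, not the trial's procedure. The first half of the sample is split at random. Each remaining participant is placed in whichever arm leaves the smaller summed absolute difference in group means across prognostic variables, which are assumed to be pre-standardized, and ties are broken at random. The names `allocate`, `imbalance` and the example variable `asrs_z` are hypothetical.

```python
import random
import statistics

ARMS = ("NFB", "WLC")  # treatment and waiting-list control


def imbalance(groups, variables):
    """Sum over prognostic variables of |mean(NFB) - mean(WLC)|.

    Variables are assumed to be pre-standardized (e.g. z-scores) so that no
    single scale dominates. Returns 0.0 while either arm is still empty.
    """
    if not all(groups[arm] for arm in ARMS):
        return 0.0
    total = 0.0
    for var in variables:
        m_nfb = statistics.fmean([p[var] for p in groups["NFB"]])
        m_wlc = statistics.fmean([p[var] for p in groups["WLC"]])
        total += abs(m_nfb - m_wlc)
    return total


def allocate(participants, variables, seed=0):
    """Randomize the first half, then minimize imbalance for the rest."""
    rng = random.Random(seed)
    groups = {arm: [] for arm in ARMS}
    half = len(participants) // 2

    # Phase 1: simple randomization of the first half (shuffle, then alternate).
    first = list(participants[:half])
    rng.shuffle(first)
    for i, p in enumerate(first):
        groups[ARMS[i % 2]].append(p)

    # Phase 2: tentatively add the next participant to each arm and keep
    # whichever assignment gives the lower imbalance.
    for p in participants[half:]:
        scores = {}
        for arm in ARMS:
            groups[arm].append(p)
            scores[arm] = imbalance(groups, variables)
            groups[arm].pop()
        best = min(scores.values())
        groups[rng.choice([a for a in ARMS if scores[a] == best])].append(p)
    return groups


# Hypothetical standardized baseline scores for ten participants.
people = [{"id": i, "asrs_z": z} for i, z in enumerate(
    [0.3, -1.1, 0.8, 0.0, 1.4, -0.6, 0.2, -0.4, 1.0, -0.9])]
groups = allocate(people, ["asrs_z"])
print({arm: [p["id"] for p in members] for arm, members in groups.items()})
```

A production scheme would usually also constrain group sizes and keep a random element in every assignment, as in Pocock–Simon minimization. Both are omitted here for brevity.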

Blinding: The waiting-list control design meant that the trial was not blinded.

Participants: Fifty-four adult Finnish participants (mean age 36 years; 29 females) were recruited after screening by psychiatric review. Forty-four had ADHD diagnoses, 10 had ADD.

Measurements: Symptoms were assessed by computerized attention test (T.O.V.A.) and self-report scales, at intake and outtake. Performance during neurofeedback trials was recorded.

Results: Participants were recruited and completed intake measurements during summer 2012, before assignment to treatment and control groups in September 2012. Outtake measurements ran from April to August 2013. After dropouts, 23 treatment and 21 waiting-list participants remained for analysis.

Initial analysis showed that, compared with the waiting-list control, neurofeedback promoted improvement of self-reported ADHD symptoms, but did not show transfer of learning to the T.O.V.A. Comprehensive analysis will be reported elsewhere.

Trial Registration: "Computer Enabled Neuroplasticity Treatment (CENT)," ISRCTN13915109.

Keywords: neurofeedback, attention deficit/hyperactivity disorder, attention deficit disorder, adult, randomized controlled trial, waiting list control, learning curves, learning transfer

#### 1. INTRODUCTION

Attention Deficit/Hyperactivity Disorder (ADHD) is a neurobiological condition which can strongly affect several areas of life: it is associated with lower socio-economic status, less satisfaction with employment and marriage, and common co-occurrence of conditions such as addiction and depression. Epidemiological research has estimated the prevalence of ADHD among adults at 4.4%. Subtypes of ADHD have been identified, including Inattentive (ADHD-I), Hyperactive (ADHD-H) and Combined (ADHD-C). However, the nature of the disorder and the most effective method of treatment are still not well understood.

This open-label clinical trial is a test of neurofeedback (NFB) as a treatment intervention for ADHD or ADD-diagnosed adults<sup>1</sup> . The intervention is based on the NFB training regimes "theta-beta" (TB) and "sensorimotor rhythm" (SMR). These regimes train self-regulation of power in specific bands of the EEG frequency spectrum, through the principle of operant conditioning. The trial also includes the novel addition of a third type of training, "inverse training," designed to help investigate the relationship between NFB learning performance (the requirement to self-regulate) and the specific effects of each training regime. This follows recent calls for increased focus on the internal mechanics of the NFB process by Gevensleben et al. (2014) and Zuberer et al. (2015), in order to attempt more than just another simple test of efficacy.

We test intervention efficacy in a between-subjects manner using a waiting-list control (WLC) group. The trial relies on three levels of measurement (neurophysiological, cognitive, and behavioral) and examines their relationship across multiple time-points. Transfer of learning is measured in two ways: by questionnaires tapping ADHD/ADD symptoms, which participants filled out at four time-points during the intervention, and by performance on a continuous performance test. More importantly, given our focus on mechanics, we also test specific effects of the NFB training regimes by analysing within-subjects learning curves (LCs) of the treatment group, which represent how participants learn to self-regulate their EEG-band activity.

At the time of writing this is an ongoing two-phase trial, where phase 1 included intake measurements, NFB intervention, and outtake (all complete), and phase 2 will include follow-up measurements and a WLC treatment opportunity (pending). In addition to the protocol structure, we report an initial group-comparison result from the first phase, showing mixed outcomes for efficacy. This result is included primarily to illustrate the importance of the planned within-subjects analysis for obtaining clear insights from the data gathered.

#### 1.1. Attention Deficit/Hyperactivity Disorder

Models seeking to explain the cognitive neuropsychological problems associated with ADHD include disturbance of attention, cortical arousal, and executive functions (for review see e.g., Sergeant et al., 2003; Seidman, 2006). However, a meta-analysis by Huang-Pollock and Nigg (2003) discarded the explanatory value of attention, at least in individuals diagnosed with the combined subtype of ADHD. Increasingly, ADHD is not seen as a disorder of attention at all but as a disorder in key aspects of self-regulation and executive functions (Nigg, 2005). One caveat is the growing consensus that the executive function "single deficit" model cannot sufficiently explain ADHD (Nigg, 2005; Pennington, 2005; Sonuga-Barke, 2005). Studies indicate that not all persons with ADHD have executive function deficits — at least as measured by laboratory tests.

Cortical arousal models in ADHD are closely related to the attentional concept of alerting as proposed by Posner and Petersen (1990), reflecting a right-lateralized vigilance network with noradrenergic involvement. These models emphasize deficiencies in the early stages of information processing as a result of under-arousal in cortical systems (Sergeant et al., 1999). EEG and ERP findings tend to support this model in that they reveal excess slow-wave activity in adults with ADHD (Bresnahan et al., 1999). Support also comes from consistent findings of deficits in the continuous performance test (CPT) d-prime parameter, which can be considered a consensus index of arousal (Losier et al., 1996). Epstein et al. (2003) found that d-prime demonstrated very robust relationships with the 18 DSM-IV ADHD symptoms.
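The d-prime parameter comes from signal detection theory: it is the difference between the z-transformed hit rate and the z-transformed false-alarm rate. In a CPT, a "hit" is a response to a target and a "false alarm" is a response to a non-target. The sketch below uses Python's standard `statistics.NormalDist`. The log-linear correction for rates of exactly 0 or 1 is one common convention and an assumption here; the T.O.V.A. software may compute its index differently.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Adds 0.5 to each count and 1 to each total (log-linear correction,
    an assumed convention) so that perfect scores do not give infinite z.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Made-up counts for one CPT run (not study data).
print(round(d_prime(hits=45, misses=5, false_alarms=10, correct_rejections=140), 2))
```

Higher values mean targets are discriminated more reliably from non-targets, which is why reduced d-prime is read as a marker of impaired vigilance.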

EEG studies suggest that ADHD might stem either from a maturational lag or from a developmental deviation (Barry et al., 2003). Maturational lag models hold that the EEG of an individual with ADHD would be considered normal in a younger person, and imply that individuals with ADHD grow out of their immature EEG activity with increasing age (Mann et al., 1992). In the developmental deviation model, by contrast, ADHD is conceptualized as resulting from an abnormality in the functioning of the central nervous system that is unlikely to change without targeted intervention.

**Abbreviations:** ADHD, Attention Deficit/Hyperactivity Disorder; ADD, Attention Deficit Disorder; BCIA, Biofeedback Certification International Alliance; CENT, Computer Enabled Neuroplasticity Treatment; CNV, Contingent Negative Variation; IBS, Institute of Behavioral Science; MCC, Mental Capital Care (clinic); NFB, neurofeedback; RCT, randomized controlled trial; SCP, Slow Cortical Potentials (neurofeedback training regime); SMR, SensoriMotor Rhythm (neurofeedback training regime); TB, Theta/Beta (neurofeedback training regime); T.O.V.A., Test Of Variables of Attention; UoH, University of Helsinki; WLC, waiting list control.

<sup>1</sup>The ICD-10 diagnosis classification system is in use in Finland where the study was conducted. Both ADD and ADHD are used as diagnoses, the former referring to the predominantly inattentive type, as opposed to predominantly hyperactive or combined types.

Longitudinal studies reviewed by Bresnahan et al. (1999), which followed participants into adulthood, revealed that although slow-wave activity decreases significantly with age in both the ADHD and the control group, absolute and relative theta activity remained elevated through adolescence into adulthood (Bresnahan and Barry, 2002). Interestingly, with increasing age, the level of beta activity produced by adults with ADHD was normalized in the frontocentral regions. The most consistent finding from EEG studies of ADHD in adults is increased absolute power in theta, clearly visible in frontocentral areas (Bresnahan et al., 1999; Lazzaro et al., 1999; Clarke et al., 2001). Such findings contradict the maturational lag model, as the difference in slow-wave activity does not disappear with increasing age.

By contrast, the hypo-arousal developmental deviation model originally proposed by Satterfield et al. (1974) has been supported by cerebral blood flow and positron emission tomography studies (Lou et al., 1989; Zametkin et al., 1990). This model proposes that ADHD results from cortical under-arousal, and the observed atypical slow-wave activity confirms the existence of altered brain activity among adults with ADHD. Hypo-arousal is thought to correlate with both beta and SMR (sensorimotor rhythm, also called low beta), because in normal functioning increased beta is associated with mental activity, and decreased SMR with physical activity.

Much of the existing research has identified maturational lag or hypo-arousal as the underlying cause of ADHD. Although these models have initiated extensive research, they have failed to clarify the aetiology of the disorder (Bresnahan et al., 1999). Thus, the literature suggests that, at least in adult ADHD, aetiological specificity is lacking, with the consequence that traditional treatments targeted at ADHD as a single disorder are unlikely to be reliable. In contrast, personalized medicine emphasizes heterogeneity within a given disorder, relying on biomarkers or endophenotypes to guide different treatments.

## 1.2. Neurofeedback

NFB, also called EEG biofeedback, is operant conditioning of specific temporal, spatial and frequency features extracted from scalp-recorded electrical potentials (Lubar and Shouse, 1976). Feedback is presented to the treated individual in the form of positive and negative reinforcers (in this study: visual reinforcers) whenever their ongoing EEG features meet or fail to meet a predefined criterion. The aim of NFB is to learn to gain control of those EEG features over time.

Literature supports the efficacy of NFB for children with ADHD (Arns et al., 2009, 2014b; Micoulaud-Franchi et al., 2014). Part of its value is that NFB can be personalized to suit the specific clinical presentation, provided that there is requisite theoretical and observational data to guide the personalization.

NFB has been described as a mechanism that can stimulate cortical arousal and/or regulate cortical oscillations, which in turn may influence cognitive activity such as attention (Vernon et al., 2003). The specific effect has been described as following one of two models, termed by Gevensleben et al. (2014) the "conditioning-and-repairing model" vs. the "skill-acquisition model." This implies that NFB may either repair a presumed cause of the disorder to normalize behavior, or instead serve as a tool to enhance cognitive performance (see Gevensleben et al., 2014 for a thorough discussion).

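It has been suggested that, besides the neurophysiological aspects of NFB, treatment outcome depends greatly on the subjective involvement of the patient. Calderon and Thompson (2004) have conceptualized biofeedback as a three-step process that consists of (1) becoming aware of the electrical activity of the brain, (2) learning to control it, and (3) transferring that control beyond the training situation.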


The first two steps of the model — becoming aware and learning to control the electrical activity of the brain — constitute NFB learning. The third step refers to transfer of the NFB learning, measured here by performance on a neurocognitive test as well as self-reported ADHD related symptoms.

This study employs two separate NFB training regimes: theta/beta and sensorimotor rhythm. Additionally, we included a novel "inverse mode" of training, which is a modification of each of these two regimes. Finally, transfer trials, during which the patient is given no visual feedback, were included toward the end of the trial.

The theta/beta (TB) training regime assumes a theta power that is elevated above normal, and therefore uses an inhibition target for theta power and a reinforcement target for beta power. EEG is often recorded at a frontal site. The rationale behind TB training has been described in at least two different ways: as the rectification of cortical hypo-arousal (Barry et al., 2003), and as the reinforcement of working memory (Vernon et al., 2003).

The sensorimotor-rhythm (SMR) training regime reinforces beta power, usually low or mid beta, often with an inhibition target for theta. The recording site is above the sensorimotor strip, often lateral, such that the beta oscillations correspond to the sensorimotor rhythm. The rationale for SMR training has been proposed as either facilitating attention (Vernon et al., 2003), or improving sleep through an increase in beta spindles, with concomitant effects on cognitive function (Arns et al., 2014a).

It is important to note that NFB learning is anchored in two scientific theories, but the occurrence of NFB learning as such tests only one of them. On the one hand, NFB learning relies on the cortical arousal model of ADHD, which emphasizes under-arousal in the cortical systems, with excess slow-wave activity affecting information processing (Sergeant et al., 1999; Barry et al., 2003). Based on this model, NFB training aims at increasing fast-wave activity (in this study: SMR and beta bands) and decreasing slow-wave activity (in this study: theta) (Barry et al., 2003). However, NFB learning as such does not test whether the underlying problem of ADHD is under-arousal; what it does test is the operant conditioning of EEG activity. NFB learning is conceptualized in terms of changes in the amount of time a patient manages to move his/her EEG features in the required direction during training sessions, as a result of learning to self-regulate cortical oscillations. Zuberer et al. (2015) argue that NFB outcomes should be tested by examination of the learning curves.

Thus, although aetiological and thus clinical specificity for ADHD is lacking, all NFB treatment regimes share a common goal of promoting self-regulation. However, some more modern NFB regimes take a more explicit approach to self-regulation than their older cousins. For example, Slow Cortical Potentials (SCP) training uses two opposed cortical regulation targets (Mayer et al., 2013), trained in random consecutive order. The two most common NFB training regimes, TB and SMR, do not include such an explicit set of counter-poised targets to induce self-regulation, relying instead on a single target of reinforcement/inhibition, which is trained repeatedly.

The target of SCP training is the Contingent Negative Variation (CNV) event-related potential, which Mayer et al. defined as a slow negative shift over central sites that develops following the presentation of a warning stimulus while the individual awaits an imperative stimulus that requires a response (Mayer et al., 2015). Thus, while SCP training directly addresses self-regulation, it does so only for a single correlate of attention. Other cortical correlates of attention processes are addressed by other NFB training regimes, e.g., cortical hypo-arousal in TB, or spontaneous motor activation in SMR. However, these training regimes have no specific component designed to promote self-regulation.

Therefore, to the standard TB and SMR training regimes we have introduced a mode of "inverse training" (denoted iTB and iSMR), in order to explore the effect of adding an SCP-like approach to these unidirectional training regimes. This takes the form of an extra target in each regime, where the reinforcer/inhibitor is the exact opposite of the norm (see Methods).
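To make "the exact opposite of the norm" concrete, the sketch below treats each regime as a set of signed targets on band-power change relative to the session baseline (+1 reinforces an increase, -1 inhibits) and implements inverse training by flipping every sign. The dictionary layout, the function name `rewarded`, and the rule that all targets must be met at once are illustrative assumptions, not the CENT platform's code.

```python
# Signed targets on band-power change relative to the per-session baseline:
# +1 = reinforce an increase, -1 = inhibit (reward a decrease).
TB_TARGETS = {"theta": -1, "beta": +1}
SMR_TARGETS = {"theta": -1, "smr": +1}


def rewarded(changes, targets, inverse=False):
    """True if every targeted band moved in its required direction.

    changes: dict mapping band name -> (current power - baseline power).
    inverse: flip every target, as in the iTB / iSMR training mode.
    """
    flip = -1 if inverse else 1
    return all(flip * sign * changes[band] > 0 for band, sign in targets.items())


# One made-up feedback epoch: theta down, beta and SMR up.
epoch = {"theta": -0.4, "beta": 0.3, "smr": 0.1}
print(rewarded(epoch, TB_TARGETS))                # normal TB
print(rewarded(epoch, TB_TARGETS, inverse=True))  # inverse TB (iTB)
```

Because the same epoch cannot satisfy both a target and its inverse, alternating normal and inverse blocks forces the trainee to steer the signal in both directions, which is the self-regulatory element the inverse mode is meant to add.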

TB and SMR training regimes are based on sub-second frequency-band features, so they are not directly comparable with SCP which feeds back the time domain DC component. However, from our point of view, there are at least three motivating reasons to test the "inverse training" self-regulatory modes of TB and SMR.

First, the neurological effect of TB and SMR remains unclear, due to the competing explanatory models. This study will not lay all questions to rest, but it does pursue a novel line of inquiry. Second, "inverse training" should aid the subjective experience of self-regulation, as it does in SCP; after all, from the clinical point of view, NFB training in any regime relies on the patient's own "mental strategy," reinforced by feedback-free transfer trials. Third, combining the above two issues, the participants have the opportunity to learn the experiential correlate of the inverse neural state, and thus to learn to activate or deactivate cortical resources at will. If we accept, for example, that TB trains the activation of cortical arousal, then the experiential correlate of inverse TB (iTB) might be more appropriate when the individual needs to enter a state of calm reflection. Similarly, SMR implies activation of the sensorimotor strip, which in turn implies quietude of bodily motor-neuron activity; however, inverse SMR (iSMR) might be more appropriate when particular task activity calls for so-called kinaesthetic intelligence. This conceptualization of the process would follow the "skill-acquisition" model of Gevensleben et al. (2014). That is, the patient would gain a tool to enhance cognitive performance, as opposed (or in addition) to repairing a presumed cause of disorder.

Although earlier work (Lubar and Shouse, 1976; Monastra et al., 2005), including a meta-analysis by Snyder and Hall (2006), has shown support for a single-trait model of ADHD (an elevated theta/beta ratio), others have argued that research results and clinical application should be interpreted with more regard for variability between individuals (Arns et al., 2008). Hammond (2010) examines this issue in detail, illustrating the heterogeneity in quantitative EEG (qEEG) patterns associated with symptoms and discussing the requirements and need for qEEG analysis guided by normative databases. Johnstone et al. (2005) reviewed such databases, along with qEEG profiles, which they term "intermediate" EEG endophenotypes: manifestations lying between genome and behavior. They called for qEEG endophenotype-guided NFB treatments to provide non-pharmacological interventions for the subgroup of non-responders to traditional treatments, or to complement traditional treatments in certain cases.

Especially in adults, who are subject to maturation effects across a broad age range, ADHD is a heterogeneous disorder with an uncertain treatment situation. For example, some individuals might have executive function deficits and might benefit from TB over the prefrontal cortex, while others might benefit more from the characteristic behavioral correlate of SMR, that is, immobility and reduced muscular tension (Chase and Harper, 1971; Howe and Sterman, 1972), which may facilitate the self-regulation of attention through mechanisms similar to mindfulness meditation (Zylowska et al., 2008). For this reason, in this study TB and SMR training regimes are assigned in a personalized fashion based on EEG spectral profiles (see Methods).

#### 1.2.1. Outline

In the rest of this paper, following the CONSORT guidelines, we first document the Methods and design of the trial, including participant criteria, intervention details, objectives, outcome measures, sample size calculation, randomization procedure and other allocation details, plus statistical analysis. Next, we provide existing results from the trial as it stands, primarily regarding how the treatment stage was run, along with preliminary analyses of group comparison outcomes. Finally we discuss issues arising from the trial design and implementation, as well as the implications of the preliminary analyses and future work.

# 2. METHODS/DESIGN

## 2.1. Participants

Inclusion criteria were scores on the Adult ADHD Self-Report Scale (ASRS) (Kessler et al., 2005) and the Brown Attention-Deficit Disorder Scales (BADDS) (Brown, 1996) indicating the presence of ADHD, as well as:


Exclusion criteria included extreme outlier scores in the scales of

• Generalized Anxiety Disorder (Spitzer et al., 2006),


Thresholds for exclusion were not fixed but at the discretion of the consulting psychiatrist. Use of medication for ADHD was not an exclusion criterion but participants were asked not to make changes in medication during the time of the training. Informed consent was obtained from each subject in accordance with the Declaration of Helsinki.

Self-report tests were distributed and submitted by mail. Psychiatric consultations were performed at the private practice of the psychiatrist; IQ testing was performed at the testing room of the University of Helsinki (UoH) Institute of Behavioral Sciences (IBS); both in central Helsinki.

#### 2.1.1. Ethical Approval

Written informed consent for participation was obtained from all participants before entering the study. The protocol followed the Declaration of Helsinki for the rights of the participants and the procedures of the study. Ethical approval of the present research protocol for all participants was obtained from the Ethical Committee of the Hospital District of Helsinki and Uusimaa, 28/03/2012, 621/1999, 24 §. Participants were not remunerated.

#### 2.1.2. Clinic of Treatment

The clinic for intervention sessions was required to be centrally located in Helsinki, to be staffed by a licensed psychiatrist in case of emergencies, and to have recognition by the Association for Finnish Work. Technicians were required to have a primary degree in a discipline related to human psychology; they were also required to take a 3-month training course provided at the UoH based on principles of the Biofeedback Certification International Alliance (BCIA).

#### 2.2. Intervention

The experimental treatment was a novel neurofeedback (NFB) intervention, based on the well-known operant conditioning NFB training regimes "theta-beta" (TB) and "sensorimotor rhythm" (SMR); with the novel addition of a self-regulatory component designed to address the heterogeneity and aetiological uncertainty of ADHD in the adult population. The comparator was a WLC group. The WLC design places a randomly assigned control group on hiatus, while the active treatment is applied to the randomly assigned treatment group. The WLC group should receive treatment after the follow-up assessment at 24 months post-treatment, without experimental oversight.

Participants who volunteered in response to advertisements were recruited at time T0. Contact with the psychiatrist and psychologist for screening followed at time T1.

Successfully screened participants were taken to the intake measurement at time T2, where they performed the T.O.V.A. test along with eyes-open and eyes-closed baselines while scalp EEG was recorded. The individual alpha peak frequency (IAPF) of each participant was estimated from band power analysis of the eyes-open and eyes-closed baseline conditions (Lansbergen et al., 2011). The boundaries of each EEG frequency band for each participant are defined relative to IAPF; e.g., theta spans IAPF×0.4 to IAPF×0.6.
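The IAPF-relative band definition can be illustrated with a minimal sketch. Here the IAPF is simply the frequency of maximal power within an assumed 7–13 Hz search window of a baseline recording, computed with a direct DFT at only the bins of interest to keep the example dependency-free. This peak-picking is a simplified stand-in, not necessarily the procedure of Lansbergen et al. (2011). The function names, the search window and the 500 Hz sampling rate of the synthetic trace are assumptions.

```python
import math


def band_peak_frequency(signal, fs, lo=7.0, hi=13.0):
    """Frequency (Hz) of maximal DFT power between `lo` and `hi`.

    Evaluates the DFT only at bins k * fs / n inside the assumed search
    window, after removing the signal mean.
    """
    n = len(signal)
    mean = sum(signal) / n
    x = [v - mean for v in signal]
    df = fs / n  # frequency resolution of an n-point DFT
    best_f, best_p = None, -1.0
    k = math.ceil(lo / df)
    while k * df <= hi:
        f = k * df
        re = sum(v * math.cos(2 * math.pi * f * i / fs) for i, v in enumerate(x))
        im = sum(v * math.sin(2 * math.pi * f * i / fs) for i, v in enumerate(x))
        power = re * re + im * im
        if power > best_p:
            best_f, best_p = f, power
        k += 1
    return best_f


def theta_band(iapf):
    """Theta boundaries relative to IAPF, as defined in the text."""
    return (0.4 * iapf, 0.6 * iapf)


# Synthetic 4 s "eyes-closed" trace at an assumed 500 Hz sampling rate:
# a 10 Hz alpha rhythm plus a weaker 5 Hz theta component.
fs = 500
trace = [math.sin(2 * math.pi * 10 * i / fs) + 0.3 * math.sin(2 * math.pi * 5 * i / fs)
         for i in range(4 * fs)]
iapf = band_peak_frequency(trace, fs)
print(iapf, theta_band(iapf))
```

In practice a Welch spectrum (e.g. from SciPy) and averaging over several baseline epochs would be the more robust choice.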

After randomization between treatment and WLC groups at time T3, we assigned participants in the NFB treatment group to either TB or SMR training based on their IAPF-adjusted theta/beta ratio. Those with theta/beta ratio >1 (n = 9) received reinforcement for simultaneous increase in beta and decrease in theta (over power estimated from per-session baseline) at electrode Fz. The rest (n = 16) got reinforcement for increase in SMR and decrease in theta at electrode C4. Band powers within the NFB training regimes are adjusted by IAPF.
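The regime assignment rule above can be written as a small function. This sketch assumes that IAPF-adjusted theta and beta band powers have already been computed for each participant; how the beta boundaries are derived from IAPF is not specified here. The name `assign_regime` and the example values are hypothetical.

```python
def assign_regime(theta_power, beta_power):
    """Map a participant's IAPF-adjusted band powers to (regime, electrode).

    Theta/beta ratio > 1  -> theta/beta (TB) training at Fz.
    Otherwise             -> sensorimotor-rhythm (SMR) training at C4.
    """
    if beta_power <= 0:
        raise ValueError("beta power must be positive")
    ratio = theta_power / beta_power
    return ("TB", "Fz") if ratio > 1 else ("SMR", "C4")


# Hypothetical band powers (arbitrary units) for three participants.
for theta, beta in [(12.0, 8.0), (6.0, 9.0), (5.0, 5.0)]:
    print(theta, beta, assign_regime(theta, beta))
```

Note that a ratio of exactly 1 falls to SMR, matching the strict "> 1" criterion in the text.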

Treatment was administered using the following hardware and software setup. The EEG amplifier was the Enobio ambulatory device (Neuroelectrics SL, Barcelona)<sup>2</sup>, with a streaming Bluetooth connection to standard Windows 8 desktop computers. The software was developed within the project, as described in Cowley et al. (forthcoming). Briefly, the system is based on the OpenViBE signal acquisition framework<sup>3</sup>, with a Qt frontend, and is available open source<sup>4</sup>.

NFB interventions were standardized by scheduling of the training sessions: session duration was fixed, and training blocks per session, sessions per week, timing of the break from training, and total duration of training were all constrained to equalize the intervention. At time T4, treatment group participants began their treatment by being briefed about all aspects of the NFB training regimes, e.g., length, frequency, purpose. Finally for the first phase, outcome measures were taken at time T5, when all participants in the treatment group had completed 40 sessions of NFB.

In the first phase the care providers were monitored by both the lead researcher and the responsible psychiatrist on separate occasions, with interviews to ascertain their self-assessment of performance. Both care providers and patients were given self-assessment questionnaires to describe their working relationships.

The second phase of the trial is ongoing at time of writing. Beginning with re-recruitment at T6, phase two consists of a follow-up measurement for all, and treatment option for WLC participants. Follow-up measurements include the ASRS and BADDS self-reports. Treatment for the WLC group will not be NFB, but will consist of a game-like computerized attention training intervention, without concurrent recording of EEG.

## 2.3. Objectives

Our RCT research questions (RQs) follow Calderon and Thompson (2004): we first examine NFB learning "within subjects," and second examine the transfer of NFB learning by comparing the treatment group to a WLC group. Within this paper, we report only first-stage analysis, namely the comparison between groups addressed by H2a and H3c below (see Results).

<sup>2</sup>http://www.neuroelectrics.com/.

<sup>3</sup>http://openvibe.inria.fr/.

<sup>4</sup>https://github.com/CBRUhelsinki/CENTplatform.

#### 2.3.1. Learning in NFB

The NFB learning metric reflects the proportion of time during training when EEG signals are in the target state; the "learning curve" is thus a signal evolving over blocks and sessions of training. The shapes or slopes of participants' LCs are rarely reported in the NFB literature: analysis tends to focus on transfer outcomes compared with a control group. However, clinical observations commonly indicate that learning occurs. In addition, NFB learning in this study was manipulated by adding the "inverse-training" mode. The LCs resulting from normal, inverse and transfer training blocks should each be slope-positive, because they each require a similar act of concentration, which the participants practice throughout training. Finally, the profile of the LCs over sessions that combine all training types should be slope-positive, because training with counter-poised targets increases the need to self-regulate. Thus, we propose the following hypotheses:
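The learning metric and the expected slope-positive curve can be sketched as follows. Each block is scored as the fraction of feedback epochs in the target state, and the learning curve is summarized by the ordinary least-squares slope of those scores against block number. Summarizing the curve with a straight-line slope is a simplifying assumption made here, not a statement of the planned analysis, and the function names and example data are hypothetical.

```python
def block_scores(blocks):
    """Fraction of feedback epochs in the target state, per training block.

    blocks: list of lists of booleans (True = epoch met the criterion).
    """
    return [sum(block) / len(block) for block in blocks]


def ols_slope(scores):
    """Least-squares slope of scores against block index 0, 1, 2, ...

    Requires at least two blocks; a positive value means the proportion
    of time in the target state rose over training.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two blocks")
    x_mean = (n - 1) / 2
    y_mean = sum(scores) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(scores))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


# Made-up epochs for four blocks of one session (not study data).
blocks = [
    [True, False, False, False],
    [True, True, False, False],
    [True, True, False, False],
    [True, True, True, False],
]
scores = block_scores(blocks)
print(scores, ols_slope(scores))
```

The same slope can be computed separately for normal, inverse and transfer blocks, or across sessions, to test each "slope-positive" prediction.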


#### 2.3.2. Transfer: Attention Test and Self-Reported Symptoms

Because meta-analyses have found NFB to be efficacious for reducing inattention (Arns et al., 2009), transfer of learning is expected to produce greater improvement over baseline in the treatment group than in a control group on the continuous performance test (CPT) Test Of Variables of Attention (T.O.V.A.), administered before and after training. Furthermore, participants who perform better on the baseline T.O.V.A. (lower scores) are expected to learn more quickly during NFB training.


Severity of subjective symptoms of ADHD/ADD should be reduced by the transfer of NFB learning to the ability to self-regulate. We also expect this effect to be dose-dependent, such that participants with better NFB performance (steeper positive LC slope) should show a higher rate of change in reported symptoms (steeper negative slope). Finally, the treatment group is expected to report fewer symptoms than the control group at the outcome measurement.
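The dose-dependence prediction can be made concrete with a short sketch: estimate a least-squares slope for each participant's learning scores and for their symptom scores over time, then correlate the two sets of slopes across participants. Under the hypothesis, the correlation should be negative. The data are invented for illustration, and measuring symptoms at the same time points as learning is our simplifying assumption, not the protocol's schedule.

```python
import math
from typing import Dict, List, Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y regressed on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for three participants at four time points.
times = [1, 2, 3, 4]
learning: Dict[str, List[float]] = {
    "p1": [0.2, 0.3, 0.4, 0.5],
    "p2": [0.2, 0.25, 0.3, 0.35],
    "p3": [0.3, 0.3, 0.3, 0.3],
}
symptoms: Dict[str, List[float]] = {
    "p1": [30, 27, 24, 21],
    "p2": [30, 28.5, 27, 25.5],
    "p3": [30, 30, 30, 30],
}

ids = sorted(learning)
lc_slopes = [ols_slope(times, learning[p]) for p in ids]
symptom_slopes = [ols_slope(times, symptoms[p]) for p in ids]
r = pearson_r(lc_slopes, symptom_slopes)  # dose-dependence predicts r < 0
print(round(r, 3))
```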


#### 2.4. Outcomes

Primary outcome measures include learning curve assessment, T.O.V.A., ASRS, and Digit span. These measures fall into two categories: (1) between-group comparison of NFB and WLC groups; and (2) within-subjects tests for treatment group. Primary measures are all comparative, truly experimental, and hypothesis-driven.

Secondary outcome measures include pre- and post-treatment vigilance measurement with an EEG protocol (Olbrich et al., 2012); also per-session self-report of circadian patterns, mood, excitement, effort and frustration; and the Pittsburgh Sleep Quality Index (PSQI) administered pre-, post- and at two intermediate points during treatment. These measures were taken to explore the additional research question of the relationship between sleep quality and NFB performance.

Additional methods taken to assess the quality of measurements include


#### 2.5. Sample Size

The power calculation of N = 60 was based on an estimated effect size for neurofeedback of 0.9, alpha of 0.05 and power of 0.95 (for an independent samples t-test; estimate based on studies including Egner and Gruzelier, 2004; Rossiter, 2004a,b; Arns et al., 2009). Recent work has shown an effect size of 0.73 for a purely adult population (Mayer et al., 2012), but we can still maintain N = 60 by assuming power = 0.85, since this is equivalent to a phase 1 trial, where the motivation to minimize Type I error is greater than the motivation to minimize Type II error. This also contributes to our motivation to use a WLC, along with the principle that NFB is closer in nature to a behavioral intervention than to a drug trial, so that double-blinding is unwarranted and unethical.
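For readers who want to explore this calculation, the sketch below uses the common normal-approximation formula for the per-group sample size of a two-sample comparison, n = 2((z(1 − α/2) + z(1 − β)) / d)², with z(1 − α) in place of z(1 − α/2) for a one-sided test. It is only an approximation (exact calculations based on the noncentral t distribution give slightly larger n), and because the text does not state whether a one- or two-sided test was assumed, the printed values should not be expected to reproduce N = 60 exactly.

```python
import math
from statistics import NormalDist


def n_per_group(d: float, alpha: float, power: float,
                two_sided: bool = True) -> int:
    """Approximate n per group for a two-sample test of standardized effect d."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)  # power = 1 - beta
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)


# The two scenarios mentioned in the text, under both sidedness assumptions.
for d, pw in [(0.9, 0.95), (0.73, 0.85)]:
    for sided in (True, False):
        label = "two-sided" if sided else "one-sided"
        print(d, pw, label, n_per_group(d, 0.05, pw, two_sided=sided))
```

Total N is twice the per-group value; dedicated tools (e.g., G*Power or statsmodels) would add the small-sample t correction.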

## 2.6. Randomization Procedure

We assign patients to test or control groups using a procedure that controls for selection bias and for known and unknown sources of external variance due to prognostic variables; the procedure combines randomization with adaptive allocation.

Simple randomization prevents biased selection and meets the assumption of uniform assignment probability made in standard inferential procedures. However, it does not guarantee equal group sizes and can lead to baseline imbalance in prognostic variables such as age, gender or disease severity. Equal group sizes can be obtained using blocking, while baseline imbalance can be reduced by blocking stratified over prognostic variables, although only a small number of variables can be handled this way. This limitation is overcome by adaptive allocation, exemplified by minimization methods: these are not strictly random but allow tight control over the balance of multiple prognostic variables (see e.g., Roberts and Torgerson, 1998 on methodological issues).

<sup>5</sup>http://www.measuringimpact.org/home.

In their review of recommendations on assignment methods, Scott et al. (2002, p. 671) found general support for minimization with a random element in smaller trials. Our approach follows this recommendation: X% random blocking followed by (100 − X)% minimization (0 < X < 100). The algorithm is:


Assignment was balanced over age, sex, education, IQ, diagnosis (ADHD vs. ADD), comorbidities from diagnosis, comorbidities from administered scales, and ASRS subtype score; between-group tests showed no statistically significant differences (see **Table 1** below). The same tests again showed no significant differences when rerun after any change in relative group composition (e.g., due to drop-outs). Thus, groups did not differ in terms of symptom severity, diagnosis, IQ or demographic features at assignment.

Given that the initial allocation of patients is random, further non-random selections will also be random at the population level. Even if the minimization assignments cannot be considered random, the overall assignment retains a proportional amount of randomness equivalent to X, similar to more complex biased-coin approaches. In this trial, X was set equal to 50.

TABLE 1 | Demographic and clinical characteristics for NFB and WLC groups.

The most important caveat of the approach is that variables used in minimization must be included as covariates during analysis, to avoid potentially misleading results.
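The combined scheme described above (X% random blocking followed by (100 − X)% minimization) can be sketched as follows. This is a generic illustration, not the trial's actual algorithm: it assumes permuted blocks with equal group sizes for the first X% of participants and a Pocock-Simon-style minimization over categorical prognostic factors for the remainder, scoring each candidate group by the summed absolute imbalance across the newcomer's factor levels and breaking ties at random. All function and parameter names are hypothetical.

```python
import random
from typing import Dict, List, Sequence

GROUPS = ("NFB", "WLC")


def allocate(participants: Sequence[Dict[str, str]], factors: Sequence[str],
             random_fraction: float = 0.5, block_size: int = 4,
             seed: int = 0) -> List[str]:
    """Hypothetical allocator: permuted blocks first, then minimization."""
    if block_size < 2 or block_size % 2:
        raise ValueError("block_size must be a positive even number")
    rng = random.Random(seed)
    n_random = round(random_fraction * len(participants))
    groups: List[str] = []

    # Phase 1 (X%): shuffled blocks, each holding equal numbers per group.
    while len(groups) < n_random:
        block = list(GROUPS) * (block_size // 2)
        rng.shuffle(block)
        groups.extend(block[: n_random - len(groups)])

    # Phase 2 (100 - X%): Pocock-Simon-style minimization.
    for i in range(n_random, len(participants)):
        newcomer = participants[i]
        imbalance = {}
        for candidate in GROUPS:
            total = 0
            for factor in factors:
                # Count earlier participants sharing the newcomer's level.
                counts = {g: 0 for g in GROUPS}
                for j in range(i):
                    if participants[j][factor] == newcomer[factor]:
                        counts[groups[j]] += 1
                counts[candidate] += 1  # suppose the newcomer joins here
                total += abs(counts["NFB"] - counts["WLC"])
            imbalance[candidate] = total
        if imbalance["NFB"] == imbalance["WLC"]:
            groups.append(rng.choice(GROUPS))  # random tie-break
        else:
            groups.append(min(imbalance, key=imbalance.get))
    return groups


# Example: 20 hypothetical participants described by two factors.
people = [{"sex": s, "diagnosis": d}
          for s in ("F", "M") for d in ("ADHD", "ADD") for _ in range(5)]
assignment = allocate(people, ["sex", "diagnosis"], random_fraction=0.5, seed=1)
print(assignment)
```

With `random_fraction=0.5` the sketch mirrors the X = 50 setting reported here; as noted above, any factor used for minimization should also enter the analysis as a covariate.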

Technicians were assigned to the treatment and control groups by on-demand recruitment, as the treatment phases of the two groups were separated in time.

#### 2.6.1. Allocation Concealment

Participants were randomly assigned using the computerized algorithm detailed above, in a single allocation performed directly after all intake measurements and before the start of NFB treatment.

#### 2.6.2. Implementation

The allocation algorithm was designed and implemented by the lead researcher; the direct contact with participants for enrolment and assignment was handled by a technician working for the MCC clinic.

#### 2.6.3. Blinding

Because the trial used a waiting list control group, assessment was not blinded. Thus, after assignment, all participants and researchers/technicians had access to the assignment information. As mentioned above, since NFB is closer in nature to a behavioral intervention than to a drug trial, double-blinding would be unwarranted and unethical.

#### 2.7. Statistical Methods

Independent variables for group comparison include NFB training regime (TB vs. SMR), participant age and gender, and assigned group (treatment vs. WLC group). Dependent variables (DVs) for group comparison include the T.O.V.A. and the ASRS self-report.

T.O.V.A. variables, based on response times (RT) and error rates, include RT variability (RTV) indicating consistency; mean RT; Omission errors (OM) indicating inattention; Commission errors (COM) indicating impulsivity; as well as the D-prime score. D-prime is described as a measure of "perceptual sensitivity" and has been suggested as an index of arousal (Losier et al., 1996). T.O.V.A. variables are standardized in the analysis.
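The T.O.V.A. scoring procedure is not described here, so the following shows only the textbook signal-detection definition of d′: the difference between the z-transformed hit rate and false-alarm rate. Mapping omission errors to missed targets and commission errors to false alarms is our illustrative assumption; real scoring software typically also applies a correction when a rate is exactly 0 or 1, which this sketch simply rejects.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Signal-detection sensitivity: d' = z(hit rate) - z(false-alarm rate)."""
    for rate in (hit_rate, false_alarm_rate):
        if not 0.0 < rate < 1.0:
            # z is infinite at 0 and 1; real scoring applies a correction here.
            raise ValueError("rates must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


def d_prime_from_counts(targets: int, omissions: int,
                        nontargets: int, commissions: int) -> float:
    """Hits = targets not omitted; false alarms = commission errors."""
    hit_rate = (targets - omissions) / targets
    false_alarm_rate = commissions / nontargets
    return d_prime(hit_rate, false_alarm_rate)


# Hypothetical session: 100 targets, 20 omissions; 100 non-targets, 20 commissions.
print(round(d_prime_from_counts(100, 20, 100, 20), 3))
```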

To evaluate the effect of NFB on T.O.V.A., we created five new variables by subtracting the baseline scores from the outcome measurement scores: RTV-change, mean RT-change, OM-change, COM-change, and D-prime-change. These T.O.V.A. change scores were then compared between the treatment group and the WLC using independent samples t-tests. Levene's test showed that the variances of the two groups were similar; consequently, the t-tests were run with equal variances assumed.
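The change-score step and the variance check can be sketched in plain Python. The `levene_w` helper below computes the classic mean-centred Levene statistic (a one-way ANOVA on absolute deviations from each group mean) and returns only W, not a p-value. SciPy's `scipy.stats.levene(..., center='mean')` computes the same statistic together with a p-value (its default, `center='median'`, is the Brown-Forsythe variant); which variant the authors' software used is not stated. Helper names and data are hypothetical.

```python
from typing import List, Sequence


def change_scores(baseline: Sequence[float], outcome: Sequence[float]) -> List[float]:
    """Outcome minus baseline, one value per participant."""
    if len(baseline) != len(outcome):
        raise ValueError("baseline and outcome must be paired")
    return [post - pre for pre, post in zip(baseline, outcome)]


def levene_w(*groups: Sequence[float]) -> float:
    """Levene's W (mean-centred): a one-way ANOVA F on |x - group mean|."""
    k = len(groups)
    dev = []
    for g in groups:
        m = sum(g) / len(g)
        dev.append([abs(x - m) for x in g])
    n_total = sum(len(d) for d in dev)
    group_means = [sum(d) / len(d) for d in dev]
    grand_mean = sum(sum(d) for d in dev) / n_total
    between = sum(len(d) * (m - grand_mean) ** 2 for d, m in zip(dev, group_means))
    within = sum((x - m) ** 2 for d, m in zip(dev, group_means) for x in d)
    if within == 0:
        raise ValueError("no within-group spread in the absolute deviations")
    return (n_total - k) / (k - 1) * between / within


# Toy example: change scores for one variable in each group, then the check.
nfb_change = change_scores([10, 12, 11], [8, 9, 10])
wlc_change = change_scores([11, 10, 12], [11, 12, 11])
print(nfb_change, wlc_change, round(levene_w(nfb_change, wlc_change), 3))
```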

MANOVA was then used to evaluate the effect of NFB training, compared with the WLC, on the five T.O.V.A. dependent variables at the outcome measurement.

The ASRS consists of 18 items tapping the frequency of recent DSM-IV criterion symptoms of adult ADHD, comprising an Inattention scale (IA, max 36 points) and a Hyperactivity-Impulsivity scale (HI, max 36 points). We calculated the differences between baseline and outcome scores to create IA-change and HI-change scores. These difference scores were then compared between the NFB and control groups using independent samples t-tests.

Levene's test showed that the variances of IA-change differed significantly between the two groups (F = 4.36, p < 0.05). As a result, the independent samples t-test for IA-change was run with equal variances not assumed. For HI-change, the variances did not differ significantly, so equal variances were assumed.
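The two t-test forms can be sketched as follows. With equal variances assumed, the standard error uses the pooled variance and df = n₁ + n₂ − 2. With equal variances not assumed (Welch's test), each group's variance enters separately and the Welch-Satterthwaite approximation yields a non-integer df, which is consistent with the IA-change result being reported with 36.03 degrees of freedom in the Results. The helper name and data are hypothetical; `scipy.stats.ttest_ind(a, b, equal_var=False)` provides the Welch test with a p-value.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def two_sample_t(a: Sequence[float], b: Sequence[float],
                 equal_var: bool) -> Tuple[float, float]:
    """Return (t, df) for Student's (pooled) or Welch's two-sample t-test."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    if equal_var:
        pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
        se = math.sqrt(pooled * (1 / na + 1 / nb))
        df = float(na + nb - 2)
    else:
        qa, qb = va / na, vb / nb
        se = math.sqrt(qa + qb)
        # Welch-Satterthwaite degrees of freedom (generally non-integer).
        df = (qa + qb) ** 2 / (qa ** 2 / (na - 1) + qb ** 2 / (nb - 1))
    return (mean(a) - mean(b)) / se, df


print(two_sample_t([1, 2, 3], [4, 5, 6], equal_var=True))
print(two_sample_t([1, 2, 3], [0, 2, 4], equal_var=False))
```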

In all our random coefficient models, the intercept and slope are estimated separately for each participant. That is, for each participant the coefficients of the following linear regression equation are estimated: Score = Intercept + B × Session.
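Estimating Score = Intercept + B × Session separately for each participant can be sketched with ordinary least squares. This two-stage, per-participant fit is a simplification chosen for illustration: a full random coefficient (mixed-effects) model estimates the individual intercepts and slopes jointly with their population distribution, which this sketch does not do. Names and data are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def fit_line(sessions: Sequence[float], scores: Sequence[float]) -> Tuple[float, float]:
    """Least-squares (intercept, slope) for Score = Intercept + B * Session."""
    n = len(sessions)
    if n != len(scores) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_s = sum(sessions) / n
    mean_y = sum(scores) / n
    sxx = sum((s - mean_s) ** 2 for s in sessions)
    if sxx == 0:
        raise ValueError("sessions must not all be equal")
    slope = sum((s - mean_s) * (y - mean_y) for s, y in zip(sessions, scores)) / sxx
    return mean_y - slope * mean_s, slope


def fit_all(data: Dict[str, Tuple[Sequence[float], Sequence[float]]]) -> Dict[str, Tuple[float, float]]:
    """Fit the line independently for every participant ID."""
    return {pid: fit_line(s, y) for pid, (s, y) in data.items()}


coefs = fit_all({"p01": ([1, 2, 3, 4], [3, 5, 7, 9]),
                 "p02": ([1, 2, 3, 4], [4, 4, 5, 5])})
print(coefs["p01"])  # (1.0, 2.0)
```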

All participants were treated with NFB at a single center. All technicians were equivalently trained and capable; therefore clustering of participants per technician was based simply on scheduling logistics.

# 3. RESULTS

Results primarily describe the specific details of the intervention implementation. Also, as stated, a preliminary between-subjects comparison was performed after phase one to assess intervention efficacy under a conventional analysis model. Thus, we report the two straightforward pre- to post-treatment outcome measures that are comparable between groups: T.O.V.A. and ASRS. The ambiguous results of this conventional approach, especially in an unblinded context, support the motivation to extend the analysis with LC modeling. We include these results in the protocol report because they constitute an informative part of the trial design going into the second phase.

#### 3.1. Participant Flow

Eighty-two adults were recruited through the cooperating clinics Mental Capital Care, Neuromental and YTHS, as well as by newspaper advertisement and postings to online forums of the Helsinki-based ADHD society of Finland. Of these, 19 dropped out of the trial before completing the screening process, for various reasons.

Sixty-three participants were screened by a psychiatrist and a psychologist prior to training, and nine were screened out of the trial for failing to meet one or more of the criteria. The remainder (n = 54) consisted of 29 females and 25 males, mean age 36 years (std.dev. 10 years), with 44 ADHD and 10 ADD diagnoses.

Participants were split equally between treatment (n = 27) and control groups (n = 27); however two switched from treatment group (final n = 25) to control group (final n = 29) for personal reasons. From these assignments, eight dropped out from the WLC group (including one participant whose pre-test measurement data was then deleted by request), and two from the treatment group. Thus, 23 treatment group, and 21 WLC group cases (total n = 44) were available for analysis. The trial progression is shown in detail in **Figure 1**.

Of those who completed NFB, five participants did not complete treatment on the pre-defined schedule, and three of them exceeded it by more than a week. Delays were due to personal reasons that caused a number of cancelations of scheduled sessions, a regularly observed phenomenon in this diagnostic group.

#### 3.1.1. Implementation of Intervention

In practice, NFB training consisted of ∼40 sessions (range: 38–41) over 2–4 months. There was a mid-training pause of nominally 2 weeks. Patients came to sessions 2–5 times a week. One session lasted ∼1 h, subdivided into self-report of mood, excitement, hours slept and hours awake; electrode attachment; baseline measurement; 5–7 units of 5 min NFB trials; and a debrief including self-report of effort and frustration. During each session, patients played different NFB "game" trials, during which they received immediate visual reinforcement for classifier-matching states in their EEG. The scores per game trial are baseline-adjusted and averaged per session to form characteristic LCs. The content and purpose of the training sessions followed a phased timeline:
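The per-session scoring described in this subsection (baseline-adjusted trial scores averaged within each session, one value per session forming the LC) can be sketched as below. The text does not say how the baseline adjustment is performed; subtracting the session's baseline score from each trial score is an assumption made here for illustration, and the function names and values are hypothetical.

```python
from typing import List, Sequence, Tuple


def session_score(trial_scores: Sequence[float], baseline: float) -> float:
    """Mean of baseline-adjusted trial scores (adjustment by subtraction assumed)."""
    if not trial_scores:
        raise ValueError("a session needs at least one trial")
    return sum(score - baseline for score in trial_scores) / len(trial_scores)


def learning_curve(sessions: Sequence[Tuple[Sequence[float], float]]) -> List[float]:
    """One LC point per session, in session order: (trial scores, baseline) pairs."""
    return [session_score(trials, baseline) for trials, baseline in sessions]


lc = learning_curve([([0.6, 0.8], 0.5), ([0.7, 0.9, 0.8], 0.5)])
print(lc)
```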


#### 3.1.2. Recruitment

As shown in **Figure 1**, participants were recruited in May/June 2012, with intake measurements during July/August 2012. Randomization took place in early September; treatment began September 17, 2012. Outtake measurements ran from April until August 2013.

Follow-up measurements are planned for the start of 2016.

# 3.2. Baseline Data

The participants who began the trial (n = 54, 29 females) had mean age 36 years (std.dev. 10 years), with 44 ADHD and 10 ADD diagnoses. The characteristics per group are shown below in **Table 1**.

All intervention sessions were performed at the Mental Capital Care (MCC) clinic premises in Helsinki, which met all requirements described above. Technicians to administer the NFB were recruited from IBS and MCC and trained at UoH over a 3-month period; four of the team also attended the Biofeedback Certification International Alliance (BCIA)-accredited introductory course at the Brainclinics education and treatment center, Netherlands.

#### 3.2.1. Numbers Analyzed

The analysis included 23 participants in the treatment group and 21 participants in the WLC group. Because dropouts were not analyzed, this was not an intention-to-treat analysis.

#### 3.3. Outcomes and Estimation

For H2a, we find no support after analysis of the five T.O.V.A. indexes; **Table 2** shows their mean differences between baseline and post-training. Changes in these variables were not significantly different for the NFB group than for the WLC. The MANOVA at the outcome measurement revealed that, on Wilks' Lambda, the difference in means between the NFB and WLC groups did not reach significance [F(5, 38) = 0.45, p > 0.05]. Thus, NFB training had no significant effect on the five T.O.V.A. indexes at the end of the intervention.

**Table 3** presents the mean differences of the two indexes between baseline and post-training and the results of the independent samples t-tests comparing IA-change and HI-change between the NFB group and the WLC. The NFB group showed a greater reduction of inattention symptoms than the WLC, t(36.03) = −2.14, p < 0.05. Similarly, while the NFB group showed a reduction of HI symptoms from baseline to post-training, the WLC showed an increase in HI symptoms, t(44) = −2.42, p < 0.05. Thus, we find statistically significant support for H3c: the treatment group reported greater improvements in ADHD/ADD symptoms than the WLC.

A more detailed analysis concerning the rest of the hypotheses will follow in a separate paper. As treating the hypotheses H1a-b, H2b, and H3a-b requires substantial additional methodological reporting, addressing them does not fit the scope of a clinical trial report.

#### 3.3.1. Adverse Effects

No adverse effects were observed in the treatment group. Further investigation of this question is planned for follow-up, using the state-oriented self-report items described above.

#### 4. DISCUSSION

#### 4.1. Interpretation

Regarding the relationship between NFB learning and performance in the continuous performance test, H2a proposed that the NFB group would achieve better T.O.V.A. performance, and improve more after training, than the WLC. This study did not find evidence for such transfer: patients participating in NFB training did not perform better than the WLC on the five T.O.V.A. indexes at the outcome measurement. This result can be interpreted in at least two ways. On the one hand, it can be a sign of the all-too-common problem of transfer of training (Green and Bavelier, 2012).

Lack of transfer is one of the most important of several key obstacles to the effectiveness of NFB training (Gazzaniga, 2009, p. 94). Because brain plasticity is highly task-specific, training in a specific task shows little or no improvement on related tasks. On the other hand, the results of this study could mean that NFB learning bears no relationship to performance on any of the T.O.V.A. indexes. This would contradict the findings of Losier et al. (1996), who considered the D-prime index of T.O.V.A. a consensus index of arousal, which is, in turn, assumed to be a manifestation of excessive slow-wave brain activity in ADHD patients (Barry et al., 2003).

TABLE 2 | Groups statistics for RTV-change, mean RT-change, OM-change, COM-change and D-prime-change scores from baseline to outcome.


TABLE 3 | Groups statistics for IA-change and HI-change scores.


H3c suggested that the NFB group would report greater improvements in ADHD symptoms than the WLC. The change in IA was significant: patients did perceive a reduction of inattention symptoms over the course of training, and this perceived reduction differed significantly from that of the control group. This supports the meta-analysis by Arns et al. (2009) concluding that NFB has large effect sizes on inattention. In the HI index, no negative linear trend was found. Interestingly, however, patients in the NFB group did perceive a significantly larger reduction of these symptoms over the course of training than the control group.

It might be that the training indeed resulted in some reduction of IA and HI symptoms. This interpretation is supported by theories claiming that ADHD is, in effect, a pathology of executive functions that cannot be tapped by neurocognitive tests but can instead be measured by self-report questionnaires (Rabbitt, 1997; Brown, 2009, pp. 81–116). Alternatively, the results can be interpreted as an example of the so-called Hawthorne effect (Green and Bavelier, 2012). Establishing the presence of experience-dependent learning effects is not always straightforward. It is well documented that individuals who take an active interest in their performance tend to improve more, or to evaluate their improvement more positively. The Hawthorne effect can lead to powerful subjective improvements that have little to do with the specific cognitive training regimen being studied, reflecting motivational factors instead.

If the training caused the decrease in symptoms, a higher rate of learning should be associated with a faster decrease in ADHD symptoms during training. Therefore, further analyses of this trial will examine the relationship between NFB LCs and the trend of self-reported ADHD symptoms.

Also, different levels of learning performance across the group should be reflected in different long-term effects, to be measured during the second phase in a within-subjects analysis. This contrasts with the effect of the interaction-derived placebo, which is relatively constant for all participants (per group) and presumably "fades away" quickly after the phase one intervention. Thus, second-phase measurements will be analyzed with respect to NFB LCs.

Though this is a relatively small study, we believe the analysis of learning curve questions is a novel and useful contribution. As noted by Arns (personal communication, 2012):

"In any Neurofeedback study it is very important to track and have an indication of 'learning.' If neurofeedback fails to demonstrate a clinical effect and there is no indication that learning actually took place, one can't draw any conclusions about neurofeedback. In an analogy, if one employs operant conditioning to [teach] a rat to press a lever, and the rat does not learn to press the lever, then it is incorrect to conclude that 'operant conditioning' does not work. This means that maybe the operant conditioning procedure was not implemented effectively. The same applies to neurofeedback, and this is further illustrated by the study from Roger DeBeuss. His study employed suboptimal parameters e.g., auto-thresholding, 'game' feedback and an 'unconventional' training regime (engagement index) making it likely harder to learn. On the group level they found no effects of neurofeedback in ADHD, however, when separating learners from non-learners based on session data they did find an effect."

Questions of efficacy, on the other hand, are still a matter of controversy in the literature. WLC controls are not accepted as sufficient evidence by some. Double-blind control through sham NFB was not chosen for this study because of the lack of general understanding of ADHD discussed above. This lack, combined with the extremely contingent nature of NFB, which depends heavily on non-specific aspects of treatment, implies that even if a double-blind RCT showed a large effect for NFB, the causal mechanisms would still not be clear. A true resolution of this issue is probably only possible by running the kind of large sham-NFB RCT called for by others; we instead choose to side-step this debate and focus on questions of internal comparisons within the method.

There is no blinding in this open-label WLC study, except at random assignment to groups. Other control paradigms may be preferred in pharmacological interventions; however, there are a number of arguments in support of WLC in the context of NFB. The WLC is a minimum viable control for non-specific effects of history, maturation, repeated testing, instrument drift, statistical regression, selection bias, and population inhomogeneity (Mohr et al., 2009). There is no control for non-specific or placebo effects; however, this is still a valid experimental control design. Among other things, WLC controls for expectation and attention (Hawthorne) effects, whereby the notion that at some future point treatment will be provided (and life will improve) is by itself able to produce improvement. Additionally, longitudinal (rather than parallel) designs control for maturation, regression to the mean, instrument drift and practice effects, as well as time threats to validity: the same effect occurring 2 years in a row in different samples rules out external non-seasonal temporal causes for the effect. Seasonal causes in a WLC design should be ruled out by staggered application (i.e., the second treatment starting in spring).

Technicians can be considered equally skilled and expert. All began as neurofeedback novices before the trial, and their qualification histories varied. Due to availability, only some began their training by attending the Biofeedback Certification International Alliance (BCIA)-accredited introductory course at the Brainclinics education and treatment center, Netherlands. However, all five then shared 3 months of training at UoH premises, including extensive peer review work, which helped to disseminate and pool knowledge across the group.

# 4.2. Generalizability

External validity asks the question of "generalizability": to what populations, settings, treatment variables and measurement variables can this effect be generalized? While the question of external validity is never completely answerable, it is of particular interest for intervention research (Campbell et al., 1966, pp. 171–246). Campbell et al. (1966) note a recurrent reluctance among researchers to accept Hume's truism that induction or generalization is never fully justified logically. While the problems of internal validity are solvable within the limits of the logic of probability statistics, the problems of external validity are not logically solvable in any neat, conclusive way (Campbell et al., 1966, p. 17). Generalization always involves extrapolation into a realm not represented in one's sample.

Here, the issue of sample bias is important. If an experimental study is conducted with volunteer patients from a given district, they might have characteristics that make the experimental treatment more effective than it would be in other populations. However, for ethical reasons, intervention studies cannot be conducted without the informed consent of the research subjects. Clearly, a "true" experimental design is in practice impossible with a fully representative sample of a given country, let alone of all ADHD patients in the world.

It must be emphasized that the results of an experiment "probe" but do not "prove" a theory. An adequate hypothesis is one that has repeatedly survived such probing, but it may always be displaced by a new probe. Many findings in experimental psychology gain generalizability not through the nature of the setting in which they occurred, but through their ability to establish a theory of basic mental processes that are implicated in many tasks.

## 4.3. Overall Evidence

After preliminary analysis, the trial did not find evidence for a transfer of learning that was the intended benefit of the intervention. Since the intervention's goal is symptomatic improvement outside the laboratory, the results underline how the transfer-problem limits the potential benefits. Patients did perceive a reduction of Inattention symptoms over the course of training, yet this change was not reflected in better performance in the continuous performance test (T.O.V.A.).

We propose that, in the absence of any other evidence, these self-report results should be attributed to placebo by default. The improvement in self-reported symptoms might not be specifically due to NFB, but to the employment of goal-oriented attention in general. At present, we cannot rule out the possibility that individuals would also report improvement if all other factors were held equal but NFB was swapped for some other exercise that requires concentration. In short, our preliminary results did not support the effectiveness of NFB in alleviating the symptoms of ADHD/ADD.

Nevertheless, keeping in mind the core aim of studying mechanisms and models of NFB, the CENT trial is on track to provide the necessary evidence. As Seidman (2006) suggests, an adequate neuropsychological model of ADHD should utilize measures from multiple domains to be able to encompass subtypes and multiple deficits. Combining neuropsychological, neurophysiological and behavioral measures, we aim toward an evaluation of structure-function relationships in NFB treatment for adult ADHD.

#### AUTHOR CONTRIBUTIONS

BC designed and implemented the study and wrote the draft text; EH conducted the statistical analyses and contributed to the draft; KJ and LK contributed to the study design and implementation, and the draft; CK contributed to the study design and the draft.

# ACKNOWLEDGMENTS

The authors wish to thank Svetlana Kirjanen, Mona Moisala, Marko Repo, Hanna Björkstrand, Tanja Hyttinen, Jari Torniainen, Teemu Itkonen and Markus Kivikangas for their part in running the trial, and Laura Hokkanen and Jari Lipsanen for assistance with the work of EH. The study was partly funded by the Finnish science agency TEKES, project #440078.

# REFERENCES

Liebowitz, M. R. (1992). Dissociative experiences scale. Am. J. Psychiatry 149, 719.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Cowley, Holmström, Juurmaa, Kovarskis and Krause. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Beware: Recruitment of Muscle Activity by the EEG-Neurofeedback Trainings of High Frequencies

Katarzyna Paluch<sup>1</sup>\*†, Katarzyna Jurewicz<sup>1</sup>\*†, Jacek Rogala<sup>1</sup>, Rafał Krauz<sup>2</sup>, Marta Szczypińska<sup>3</sup>, Mirosław Mikicin<sup>3</sup>, Andrzej Wróbel<sup>1</sup> and Ewa Kublik<sup>1</sup>

<sup>1</sup>Department of Neurophysiology, Nencki Institute of Experimental Biology of Polish Academy of Science, Warsaw, Poland, <sup>2</sup>Centre for Physical Education and Sport, Military University of Technology, Warsaw, Poland, <sup>3</sup>Department of Physical Education, University of Physical Education, Warsaw, Poland

EEG-neurofeedback (NFB) has become a very popular method aimed at improving cognitive and behavioral performance. However, the EMG frequency spectrum overlaps the higher EEG oscillations, and NFB training focused on these frequencies is hindered by EMG load in the information fed back to the subjects. In such a complex signal, it is highly probable that the most controllable component will form the basis for operant conditioning. This might cause different effects in different training protocols and therefore needs to be carefully assessed before designing training protocols and algorithms. In the current experiment, a group of healthy adults (n = 14) was trained by professional trainers to up-regulate their beta1 (15–22 Hz) band for eight sessions. The control group (n = 18) underwent the same training regime but without rewards for increasing beta. In half of the participants trained to up-regulate the beta1 band (n = 7), a systematic increase in tonic EMG activity was identified offline, implying that muscle activity became a foundation for reinforcement in the trainings. The remaining participants did not present any specific increase of the trained beta1 band amplitude. The training was perceived as effective by both trainers and trainees in all groups. These results indicate that proper control of muscle activity is a requirement for genuine EEG-NFB training, especially in protocols that do not aim at the participants' relaxation. The specificity of the information fed back to the participants should be of the highest interest to all therapists and researchers, as it might irreversibly alter the results of the training.

Keywords: artifacts, attention, beta rhythm, biofeedback, muscle control, placebo

#### Edited by:

Soledad Ballesteros, National University of Distance Education, Spain

#### Reviewed by:

Felix Darvas, University of Washington, USA; José Luis Ulloa, Ghent University, Belgium

#### \*Correspondence:

Katarzyna Paluch, k.paluch@nencki.gov.pl; Katarzyna Jurewicz, k.jurewicz@nencki.gov.pl

†These authors have contributed equally to this work.

Received: 14 September 2016; Accepted: 27 February 2017; Published: 20 March 2017

#### Citation:

Paluch K, Jurewicz K, Rogala J, Krauz R, Szczypińska M, Mikicin M, Wróbel A and Kublik E (2017) Beware: Recruitment of Muscle Activity by the EEG-Neurofeedback Trainings of High Frequencies. Front. Hum. Neurosci. 11:119. doi: 10.3389/fnhum.2017.00119

# INTRODUCTION

In the last two decades, EEG-based neurofeedback (EEG-NFB) has gained wide popularity in clinical and paramedical practice, even though the therapeutic use of this method ran precariously ahead of careful, systematic examination of its physiological mechanisms, confounding factors and possible side effects.

The method belongs to a broader category of biofeedback techniques aimed at altering various physiological parameters, such as heart rate (ECG-feedback), muscle tension (EMG-feedback) and others. These trainings are based on the assumption that one can learn to change one's brain activity in a chosen oscillatory frequency through continuous feedback about its amplitude. In contrast to research on brain-computer interfaces (BCI), which concentrates on finding easily detectable and modifiable signals that can be reliably used by machine control algorithms, EEG-NFB training is directed toward lasting changes of brain activity and behavioral improvement. The trainings aim to induce systematic increases or decreases of predefined EEG frequencies reflecting particular cognitive or behavioral functions.

EEG-NFB has been tested as a treatment for a wide range of neurological and psychiatric disorders, e.g., epilepsy (Sterman and Friar, 1972; Kotchoubey et al., 2001), attention deficit hyperactivity disorder (ADHD; Kaiser and Othmer, 2000; Fuchs et al., 2003; Kropotov et al., 2007), schizophrenia (Gruzelier et al., 1999; Surmeli et al., 2012) and even in traumatic brain injury (TBI) and stroke rehabilitation (for a review of clinical applications, see Yucha and Montgomery, 2008). In healthy subjects, EEG-NFB has been applied with the aim of behavioral and/or cognitive improvement (Arns et al., 2008; Reiner et al., 2014) and as supportive training of cognitive performance in the elderly (Becerra et al., 2012; Staufenbiel et al., 2014).

All traditionally distinguished EEG bands have been used as a feedback source, i.e., slow cortical potentials (<2 Hz; Birbaumer, 1999; Heinrich et al., 2004; Strehl et al., 2006; Leins et al., 2007), theta (4–7 Hz; Egner et al., 2002; Raymond et al., 2005; de Zambotti et al., 2012), alpha (8–12 Hz; Egner et al., 2002; Raymond et al., 2005; Zoefel et al., 2011; Gruzelier et al., 2014), lower and higher beta (12–30 Hz; Cannon et al., 2009; Egner and Gruzelier, 2001, 2004) and gamma (>30 Hz; Keizer et al., 2010a,b; Staufenbiel et al., 2014). This large diversity of protocols (i.e., sets of frequency bands whose amplitudes, or amplitude ratios, are up- or down-regulated) resulted from a belief that each frequency range is related to specific cognitive functions. Even though instances of such frequency-to-function mapping have been documented (Wróbel, 2000, 2014; Wang, 2010; Anguera et al., 2013), their complex interactions do not allow for simplifying generalizations and require further investigation. Among others, the beta band has been posited to be an attention carrier (Wróbel, 2000, 2014), with specific, local increases of amplitude during attentional tasks positively correlating with correct performance in animals and humans (Bekisz and Wróbel, 1993; Buschman and Miller, 2007; Wróbel et al., 2007; Kamiński et al., 2012; Gola et al., 2013). Up-regulation of this band has been applied in more complex EEG-NFB training protocols as a supplementary treatment in ADHD (e.g., Lévesque et al., 2006; Leins et al., 2007) and as skill enhancement for sportsmen and the elderly (e.g., Rostami et al., 2012; Staufenbiel et al., 2014).

Research focusing on the beta and gamma bands confronts the problem of EEG contamination by muscle activity. Electromyographic (EMG) activity recorded from the skin surface is composed of high frequencies, with most of its power concentrated between 20 Hz and 150 Hz (Criswell, 2011). Consequently, muscles located on the head (e.g., temporal, occipitofrontal and auricular muscles), or even more distally, can interfere with the EEG, sometimes contributing the majority of the power in the higher frequencies (Goncharova et al., 2003; Whitham et al., 2007) and affecting most of the electrodes on the scalp (Goncharova et al., 2003; Yilmaz et al., 2014). The relation between these signals is further complicated by the fact that facial EMG has been shown to be sensitive to numerous cognitive and affective processes, including cognitive load (Waterink and van Boxtel, 1994; Whitham et al., 2008). Muscle interference has been widely discussed in the context of EEG and EMG data analysis (for review see McMenamin et al., 2011; Muthukumaraswamy, 2013) but is not sufficiently recognized and controlled in EEG-NFB training (see ''Discussion'' Section and Enriquez-Geppert et al., 2017). EEG signals analyzed offline can be iteratively examined and cleaned of any obscuring components. In the EEG-NFB, however, the signal is analyzed online and transformed into stimuli that are immediately fed back to the trained person, so every undetected artifact modifies the feedback signal. The consequences may differ between training protocols and therefore need to be considered carefully when designing training protocols and algorithms.

Here we report a particular instance of this problem concerning NFB up-regulation of the beta band. Since a large body of research on clinical and healthy populations reports either no control of muscle activity or a control of doubtful effectiveness (e.g., Leins et al., 2007; Gevensleben et al., 2009a,b; Keizer et al., 2010a,b; Logemann et al., 2010; Meisel et al., 2014; see ''Discussion'' Section), we conducted an experiment to assess the possible impact of muscle activity on EEG-NFB results. We applied the beta up-regulation set-up commonly used in training aimed at improving attention (Egner and Gruzelier, 2001; Vernon et al., 2003; Egner et al., 2004; Logemann et al., 2010; Ghaziri et al., 2013). Healthy young participants were trained to voluntarily increase the amplitude of beta1 band oscillations (15–22 Hz) recorded from leads overlying the areas of the frontoparietal attention network. Examination of the raw signal revealed, in a subgroup of participants, substantial muscle employment that increased systematically over the course of the session, mimicking the expected increase in the beta1 band. Had these muscle-related effects gone unnoticed, we would have drawn a false-positive conclusion, namely that the beta1 band had been successfully up-regulated. We discuss the need for proper muscle control for reliable NFB training in the light of the current EEG-NFB literature.

# MATERIALS AND METHODS

#### Participants

Thirty-two healthy male university students (age 21.97 ± 1.88 years, mean ± standard deviation) were recruited for the experiment. The experiments were approved by the local ethics committee (Bioethical Committee at the Military Institute of Hygiene and Epidemiology). All subjects were informed about the study and gave written informed consent to participate in accordance with the Declaration of Helsinki.

#### The EEG-Neurofeedback Training

The subjects were randomly assigned to one of three training groups: beta plus (B+), aimed at increasing the amplitude of beta1 (15–22 Hz) oscillations (n = 14); beta minus (B−), dedicated to down-regulating beta1 oscillatory activity (n = 6); and sham (SH), receiving pseudo-feedback generated by a computer algorithm and unrelated to the participant's EEG signals (n = 12). The participants were unaware of their group assignment and were not informed about the existence of the sham group, to prevent loss of motivation.

The training sessions were performed using a customized version of the commercial EEG DigiTrack Biofeedback system (ELMIKO MEDICAL Sp. z o. o.). Each participant had a personal code, which was recognized by the program and started the group-dependent feedback protocol. At the user (trainer) level, the program displayed set-ups for only two groups: beta plus and beta minus. Sham protocols were run under the facade of these set-ups (half as B+ and the other half as B−). The trainings were conducted by hired professional NFB trainers, who were instructed not to motivate the trainees additionally, in order to reduce possible nonspecific effects. Over a period of 1–2 months, the subjects underwent eight training sessions (one or two per week). During a session, subjects were seated in a chair in front of a 17″ LCD computer screen (∼70 cm from the screen). Each session consisted of 10 blocks of 3 min each. After the EEG electrodes were mounted, the session started with a short (ca. 2 min) resting period that served to accustom the participants to the training situation and to screen a control sample of the EEG signal.

EEG was recorded from the F3, F4, P3 and P4 sites of the international 10–20 system, with linked ears as the reference and the ground electrode placed at Pz. The electrodes were thus positioned over nodes of the frontoparietal attention network (Gross et al., 2004; Donner et al., 2007; Siegel et al., 2008). The signal was sampled at 250 Hz and band-pass filtered between 0.16 Hz and 70 Hz, with a notch filter at 50 Hz. A fast Fourier transform (FFT) spectrogram was computed for each electrode, and the feedback parameter was obtained by averaging the FFT amplitudes over the beta1 range and across the electrodes. The 512-point FFT window (≈2.05 s at 250 Hz, giving a 0.49 Hz resolution) slid with 92% or 77% overlap; accordingly, the amplitude values presented to the trainer and used for feedback were updated with a 200 or 500 ms delay. Window overlap and feedback delay varied between subjects (because two software versions were used in the study) but were constant for each person and randomly distributed among the experimental groups. The results of participants trained with these two settings did not differ.
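The feedback computation described above can be sketched as follows. This is a minimal illustration, not the DigiTrack implementation: the Hanning taper, the amplitude scaling and the function names `window_timing` and `feedback_values` are assumptions. It also shows the window arithmetic: 512 points at 250 Hz span 2.048 s with a 250/512 ≈ 0.49 Hz bin spacing, and a 92% overlap corresponds to a hop of about 41 samples (≈164 ms).

```python
import numpy as np

FS = 250                 # sampling rate in Hz (as reported)
NFFT = 512               # FFT window length in points (as reported)
BETA1 = (15.0, 22.0)     # trained band in Hz

def window_timing(fs=FS, nfft=NFFT, overlap=0.92):
    """Window duration (s), frequency resolution (Hz) and hop between windows (s)."""
    duration = nfft / fs
    resolution = fs / nfft
    hop = round(nfft * (1 - overlap)) / fs
    return duration, resolution, hop

def feedback_values(eeg, fs=FS, nfft=NFFT, overlap=0.92, band=BETA1):
    """eeg: array (n_channels, n_samples).

    Returns one feedback value per sliding window: the FFT amplitude averaged
    over all bins inside `band` and over all channels.
    """
    step = max(1, round(nfft * (1 - overlap)))
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    taper = np.hanning(nfft)          # assumption: taper of the online system not reported
    scale = 2.0 / taper.sum()         # converts |FFT| to the amplitude of a sinusoid
    values = []
    for start in range(0, eeg.shape[1] - nfft + 1, step):
        segment = eeg[:, start:start + nfft] * taper
        amplitude = np.abs(np.fft.rfft(segment, axis=1)) * scale
        values.append(amplitude[:, in_band].mean())   # mean over band bins and channels
    return np.array(values)

# Usage with synthetic data: 10 s of an 18 Hz sinusoid on four channels.
t = np.arange(10 * FS) / FS
eeg = np.vstack([10 * np.sin(2 * np.pi * 18 * t)] * 4)
fb = feedback_values(eeg)
```

Averaging over channels before thresholding mirrors the text; a real system would also need the online band-pass and notch filtering, which are omitted here.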

The training display consisted of a shooting target presented in the background and four green dots moving inwards and outwards along the vertical and horizontal axes (**Figure 1**). In the B+ and B− groups, feedback about the beta1 amplitude was provided by the synchronized movement of the dots; in the sham group, a predefined algorithm controlled their movement. When the amplitude changed in the intended direction (towards the threshold value set manually by the trainers), all dots moved inwards. The subject's goal was to make the green dots meet in the center. Additional reinforcements were provided to boost the participants' motivation and make the training more engaging. When the beta1 amplitude reached 43% of the threshold, the display was complemented with black rings within the high-scoring area of the shooting target. When the beta1 amplitude reached 75% of the threshold value, a red ring in the center of the shooting target was presented, signaling achievement of the goal. The trainers adjusted the threshold (the required beta1 amplitude) manually during the session to provide a relatively constant rate of reward, thus encouraging the subjects to continuously improve their performance.

#### Processing of the EEG Data

Raw EEG data were exported from the DigiTrack environment to the European Data Format (EDF) and further analyzed using the EEGLAB software (Delorme and Makeig, 2004) and custom MATLAB scripts. The offline preprocessing resembled the online feedback computation, so as to obtain the same frequency bands as those produced by the EEG-NFB apparatus. The continuous data recorded during training sessions were mean-corrected and filtered to remove frequencies below 0.5 Hz and above 70 Hz, and a notch filter was applied at 50 Hz. The signal was split into 1 s epochs, which were then searched for artifacts with the EEGLAB function pop\_autorej. An epoch was rejected from all channels if any data point in it deviated by more than 5 standard deviations from the signal amplitude on any of the channels. The algorithm proceeded iteratively: if the number of epochs classified for exclusion exceeded 5% of the data, the procedure was repeated with a more liberal exclusion criterion (increased by 0.5 standard deviation). Furthermore, to remove muscle artifacts, for every subject we removed whole training blocks in which the higher frequencies (22–45 Hz, referred to in the DigiTrack software as the beta2 band) deviated by more than 3 standard deviations from that participant's individual mean. This procedure was repeated until no such blocks were found. In effect, 3.05% of the data were removed, including three full sessions (belonging to two subjects). The FFT analysis was computed separately for each 3 min training block with a sliding Hanning window 512 points long with 92% overlap.
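The two iterative cleaning rules can be sketched in NumPy. This is a simplified reimplementation written only to make the logic explicit; it is not EEGLAB's `pop_autorej`, whose internal criteria may differ, and the names `reject_epochs` and `reject_blocks` are hypothetical. Population standard deviations are assumed.

```python
import numpy as np

def reject_epochs(epochs, sd_crit=5.0, max_frac=0.05, step=0.5):
    """epochs: (n_epochs, n_channels, n_samples). Returns a boolean mask of rejected epochs.

    An epoch is flagged (for all channels) if any sample on any channel deviates
    from that channel's mean by more than sd_crit channel SDs. If more than
    max_frac of the epochs are flagged, the criterion is relaxed by `step` SD
    and the check is repeated.
    """
    n_epochs, n_channels, _ = epochs.shape
    continuous = epochs.transpose(1, 0, 2).reshape(n_channels, -1)
    mu = continuous.mean(axis=1)[None, :, None]
    sd = continuous.std(axis=1)[None, :, None]
    peak = (np.abs(epochs - mu) / sd).max(axis=(1, 2))   # worst deviation per epoch
    while True:
        rejected = peak > sd_crit
        if rejected.mean() <= max_frac:
            return rejected
        sd_crit += step

def reject_blocks(beta2, sd_crit=3.0):
    """beta2: block-wise beta2 amplitudes of one participant (NaN = already removed).

    Blocks deviating by more than sd_crit SD from the participant's mean are set
    to NaN; mean and SD are recomputed until no block exceeds the criterion.
    """
    values = np.asarray(beta2, dtype=float).copy()
    while True:
        mu, sd = np.nanmean(values), np.nanstd(values)
        bad = np.abs(values - mu) > sd_crit * sd
        if not bad.any():
            return values
        values[bad] = np.nan
```

Note that with n values and a population SD no single value can lie more than sqrt(n − 1) SD from the mean, so the 3 SD block rule only has room to act when it is applied across many blocks (for example all 80 blocks of a participant) rather than within a single 10-block session.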

Visual inspection of the raw signal and the FFT spectra (performed after all the cleaning steps described above) revealed that the signals of some participants were dominated by frequencies above 15 Hz (**Figure 2**). Since such oscillations constituted the majority of some participants' signals, cleaning algorithms relying on signal distribution parameters (e.g., mean and standard deviation) were unable to detect and reject them during offline automatic processing. These signals were characterized by long sweeps of sharp high-frequency oscillations characteristic of EMG (Criswell, 2011). In these cases, the typical logarithmic-like shape of the FFT curve was substantially distorted by an elevation spanning from the higher to the lower parts of the spectrum.
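Why distribution-based cleaning misses dominant contamination can be shown with a toy simulation. The values below are invented, hypothetical per-epoch high-frequency power values, not data from the study: when EMG affects only a few epochs it sits far out in the tail and a 5 SD rule catches it, but when it affects most epochs it inflates the mean and SD themselves and nothing is flagged.

```python
import numpy as np

def sd_flagged_fraction(values, sd_crit=5.0):
    """Fraction of values lying more than sd_crit standard deviations from the mean."""
    z = (values - values.mean()) / values.std()
    return float(np.mean(np.abs(z) > sd_crit))

rng = np.random.default_rng(0)
clean = rng.normal(1.0, 0.1, 1000)   # hypothetical 20-45 Hz power per epoch (arbitrary units)

rare = clean.copy()
rare[:5] += 5.0                      # EMG bursts in 0.5% of the epochs

dominant = clean.copy()
dominant[:700] += 5.0                # EMG present in 70% of the epochs

print(sd_flagged_fraction(rare), sd_flagged_fraction(dominant))
```

In the `dominant` case the contaminated epochs define the distribution, which is why the authors fell back on visual inspection of the spectral shape.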

We therefore asked three independent judges to visually inspect the raw signals and the FFT spectra from all blocks and sessions of each participant, and to classify a session as contaminated by muscle activity if the FFT spectrum was distorted above 15 Hz in the majority of its blocks (**Figure 2**; Criswell, 2011). The judges listed participants who exhibited this pattern in more than half of their sessions as muscle-employing.
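The two-level majority rule applied by each judge can be written out explicitly. The functions below are illustrative; the per-block flag (spectrum distorted above 15 Hz) came from visual judgment, not from code, and `consensus`, a majority vote across judges, is a hypothetical aggregation step that the text does not spell out.

```python
def session_contaminated(block_flags):
    """block_flags: one bool per block (True = spectrum distorted above 15 Hz).
    A session counts as contaminated when the majority of its blocks are distorted."""
    return sum(block_flags) > len(block_flags) / 2

def muscle_employing(sessions):
    """sessions: list of per-session block-flag lists for one participant.
    True when more than half of the sessions are contaminated."""
    n_contaminated = sum(session_contaminated(s) for s in sessions)
    return n_contaminated > len(sessions) / 2

def consensus(judge_labels):
    """Hypothetical aggregation across judges: keep a label assigned by a majority."""
    return sum(judge_labels) > len(judge_labels) / 2

# Usage: 5 of 8 sessions fully distorted -> labeled muscle-employing.
distorted = [True] * 10
clean = [False] * 10
print(muscle_employing([distorted] * 5 + [clean] * 3))
```

Both thresholds are strict ("more than half"), so a participant with exactly 4 of 8 contaminated sessions would not be labeled.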

To confirm that muscle-employing participants can be distinguished from the whole sample by an automated approach, we applied two additional classification methods: k-means clustering and logistic regression. For both methods, we used solely the beta2 amplitudes as the basis of classification.

When applying the k-means method, we requested two clusters to test whether our division into muscle-employing and other participants is the most prevalent pattern in the data. The analysis was performed on the beta2 amplitudes from all available recordings (10 blocks × 8 sessions for each subject; the 3.05% of blocks missing due to the previous data cleaning were substituted with the average of the given participant). The algorithm was set to minimize the absolute deviations within clusters, with cluster centers computed as the median along each dimension (Manhattan distance).
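The clustering step can be sketched as k-medians: assign each participant's 80-dimensional beta2 vector to the nearest center under city-block distance, then move each center to the component-wise median of its members. This roughly corresponds to MATLAB's `kmeans` with `'Distance','cityblock'`, but the authors' exact settings (seeding, replicates) are not reported; the deterministic farthest-point seeding below is an assumption chosen for reproducibility.

```python
import numpy as np

def impute_participant_mean(x):
    """x: (n_subjects, n_blocks), NaN marks removed blocks; fill with the subject's own mean."""
    x = np.asarray(x, dtype=float)
    return np.where(np.isnan(x), np.nanmean(x, axis=1, keepdims=True), x)

def cityblock(x, centers):
    """Manhattan distance of every row of x to every center: (n_subjects, k)."""
    return np.abs(x[:, None, :] - centers[None, :, :]).sum(axis=2)

def farthest_point_init(x, k):
    """Deterministic seeding (assumption): start from the lowest-amplitude subject,
    then repeatedly add the subject farthest from the already chosen centers."""
    chosen = [int(np.argmin(x.sum(axis=1)))]
    while len(chosen) < k:
        chosen.append(int(np.argmax(cityblock(x, x[chosen]).min(axis=1))))
    return x[chosen].copy()

def kmedians(x, k=2, n_iter=100):
    """k-means variant with city-block distance: centers are component-wise medians,
    which minimizes the summed absolute deviations within clusters."""
    centers = farthest_point_init(x, k)
    for _ in range(n_iter):
        labels = cityblock(x, centers).argmin(axis=1)
        new_centers = np.array([
            np.median(x[labels == j], axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

The median-based centers make the result less sensitive to single extreme blocks than the usual squared-Euclidean k-means.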

Logistic regression was applied as a supervised classification method. The binary classifier (muscle-employing/other) was trained on the classification made by the independent judges, with each participant's mean beta2 amplitude across all available recordings as the predictor. The fitted logistic curve quantified the relationship between beta2 amplitude and the probability of belonging to one of the two categories. The decision criterion was set at a probability of 0.5.
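A single-predictor logistic regression of this kind can be fitted by Newton-Raphson in a few lines. This is a generic sketch, not the authors' software, and the names `fit_logistic` and `classify` are hypothetical. One caveat: if the two groups are perfectly separated by the predictor, the maximum-likelihood slope diverges and the iteration does not converge; in practice a statistics package will then warn or report an inflated coefficient.

```python
import numpy as np

def fit_logistic(x, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood fit of P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x)))
    by Newton-Raphson. x: predictor per subject (e.g. mean beta2 amplitude);
    y: 1 = muscle-employing, 0 = other."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hessian, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def classify(x, beta, criterion=0.5):
    """Assign the positive class when the fitted probability reaches the criterion."""
    p = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * np.asarray(x, dtype=float))))
    return p >= criterion

# Usage with made-up, overlapping data (not the study's values).
x = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
y = np.array([0, 0, 0, 1, 0, 1, 0, 1, 1, 1])
beta = fit_logistic(x, y)
```

With a 0.5 criterion the decision boundary is simply the predictor value x = −b0 / b1.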

In addition to the trained beta1 band (15–22 Hz), we report on the flanking alpha (8–12 Hz) and beta2 (22–45 Hz) bands, as they can reveal the potential specificity of the training effects.

#### Statistical Analyses

The amplitude values in each frequency band were averaged across the four electrodes (F3, F4, P3, P4) to reproduce the training setup, which averaged the amplitudes from all channels online. A three-way ANOVA of group, session and electrode confirmed that there were no significant differences between individual electrodes in the amplitudes of the analyzed bands (no significant effects of electrode or of interactions including this factor, all p > 0.201).

The main goal of the analysis was to compare the effects of the training in participants who did and did not employ muscles in their performance. The most prominent characteristic of the muscle-employing subjects was a pronounced elevation of high-frequency amplitudes and their high variability across blocks. To preserve the relations between individual subjects present in the raw data, we performed a between-subject standardization (subtracting the mean and dividing by the standard deviation computed over all blocks/sessions of all participants). Performed separately for each band, this procedure enables a direct comparison of different frequency bands, as it shifts the values to a common range (z space). Additionally, to ensure that the observed effects, even if different in absolute size, were common to the group and not driven by single cases, we repeated the analysis with within-subject z-scores (using each subject's individual mean and standard deviation).
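The difference between the two standardizations is only in which values the mean and SD are computed over. A minimal sketch for one band, assuming amplitudes arranged as subjects × sessions × blocks with NaN for removed blocks (population SD here; MATLAB's `zscore` defaults to the n − 1 normalization, which only rescales the result):

```python
import numpy as np

def between_subject_z(a):
    """a: (n_subjects, n_sessions, n_blocks) amplitudes of one band (NaN = missing).
    Mean and SD are pooled over all subjects, sessions and blocks, so
    differences in overall level between subjects are preserved."""
    return (a - np.nanmean(a)) / np.nanstd(a)

def within_subject_z(a):
    """Each subject is standardized with his own mean and SD over sessions and
    blocks, which removes between-subject differences in level and spread."""
    mu = np.nanmean(a, axis=(1, 2), keepdims=True)
    sd = np.nanstd(a, axis=(1, 2), keepdims=True)
    return (a - mu) / sd

# Usage: subject 0 has a much higher overall amplitude (e.g. muscle contamination).
rng = np.random.default_rng(0)
a = rng.normal(1.0, 0.2, (4, 8, 10))
a[0] += 3.0
bz = between_subject_z(a)
wz = within_subject_z(a)
```

In `bz` subject 0 stays clearly above the others; in `wz` every subject has mean 0 and SD 1, so only the time course within each subject remains comparable.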

We verified the effects of the EEG-NFB over the course of a session (within-session effects) and across consecutive sessions (between-session effects). For the within-session effects, the values for each of the 10 blocks were obtained by averaging across all sessions. For the between-session effects, the values were obtained by averaging all blocks of each session. The missing session averages (only 3 of all 256 data points) were substituted with the mean of the values from the directly preceding and following sessions.
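For one subject and one band, the two profiles and the neighbor interpolation can be sketched as follows. The handling of a missing first or last session is an assumption, since the text only describes interpolation between a preceding and a following session.

```python
import numpy as np

def within_session_profile(a):
    """a: (n_sessions, n_blocks) for one subject and band.
    Value for each block, averaged across sessions (missing blocks ignored)."""
    return np.nanmean(a, axis=0)

def between_session_profile(a):
    """Value for each session, averaged across its blocks. A session with no data
    is replaced by the mean of the directly preceding and following sessions;
    a missing first or last session is left as NaN (assumption)."""
    valid = ~np.isnan(a)
    counts = valid.sum(axis=1)
    sums = np.where(valid, a, 0.0).sum(axis=1)
    s = np.full(a.shape[0], np.nan)
    s[counts > 0] = sums[counts > 0] / counts[counts > 0]
    for i in np.flatnonzero(counts == 0):
        if 0 < i < len(s) - 1 and counts[i - 1] > 0 and counts[i + 1] > 0:
            s[i] = 0.5 * (s[i - 1] + s[i + 1])
    return s

# Usage: 8 sessions x 10 blocks with session 4 (index 3) removed during cleaning.
a = np.arange(80, dtype=float).reshape(8, 10)
a[3] = np.nan
ws = within_session_profile(a)
bs = between_session_profile(a)
```

The session means are computed without `np.nanmean` so that an entirely missing session produces NaN silently instead of a runtime warning.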

Because only six participants were assigned to the B− protocol, for further analyses we combined this group with the sham group into a single control condition, referred to as the control group (CON). Before combining the groups, we confirmed with a three-way ANOVA of group, session and band that there were no significant differences between them (no significant effect of group or of interactions including this factor, all p > 0.208). Since seven of the participants identified as having steady EMG contamination belonged to the B+ protocol (50% of this group), we split this group into MB+ (muscle-employing participants from the B+ group, n = 7) and nMB+ (participants from the B+ group not employing muscles, n = 7). Their performance was compared with that of the control group (CON, n = 18). Three-way ANOVAs were computed for the within- and between-session effects with ''time'' (blocks 1–10 or sessions 1–8) and ''band'' (alpha, beta1, beta2) as within-subject factors and training ''group'' (MB+, nMB+, CON) as the between-subject factor. The Greenhouse-Geisser (G-G) correction was applied when the data violated the sphericity assumption. Results were considered significant at p < 0.05. Significant interactions were followed by post hoc pairwise comparisons. For clarity of presentation, of the multiple pairwise comparisons between consecutive time points we report only the comparison of the first and last blocks/sessions, and we confirm the gradual character of the change by fitting a linear trend.
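The linear-trend check can be illustrated with a contrast on equally spaced, zero-sum weights. This is a sketch for a single group under the assumption that the trend was tested as a polynomial contrast; the text does not give the exact procedure. For one group, testing the mean contrast score with a one-sample t test is equivalent to the linear component of a repeated-measures ANOVA (F = t²), and rescaling the weights, as orthonormal polynomial coefficients do, does not change t.

```python
import numpy as np

def linear_contrast_weights(n_points):
    """Centered, equally spaced weights that sum to zero, e.g. -4.5 ... 4.5 for 10 blocks."""
    return np.arange(n_points) - (n_points - 1) / 2.0

def linear_trend_t(profiles):
    """profiles: (n_subjects, n_timepoints) for one group.
    Returns the one-sample t statistic of the subjects' linear-contrast scores
    and its degrees of freedom."""
    scores = profiles @ linear_contrast_weights(profiles.shape[1])
    n = len(scores)
    t = scores.mean() / (scores.std(ddof=1) / np.sqrt(n))
    return t, n - 1

# Usage with synthetic data: 7 subjects whose amplitude rises over 10 blocks.
rng = np.random.default_rng(0)
rising = np.tile(np.linspace(0.6, 1.6, 10), (7, 1)) + rng.normal(0, 0.1, (7, 10))
t, df = linear_trend_t(rising)
```

The p value would be taken from a t distribution with n − 1 degrees of freedom (for example via `scipy.stats.t.sf`); the Greenhouse-Geisser correction is not needed for a single-degree-of-freedom contrast.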

#### Self-Reports

After completing the training, the participants were asked to assess: (1) the effectiveness of the NFB training; (2) the influence of the training on their functioning outside the sessions; (3) their ability to evoke the state from the trainings outside the sessions; (4) their progress in the ability to control the visual stimulus during the trainings; and (5) the strategies they implemented (if any).

# RESULTS

#### Electroencephalographic Data

Fourteen healthy subjects took part in eight sessions of the EEG-NFB training that aimed to up-regulate the beta1 band amplitude. Another 18 participants who underwent the same training regime but were not rewarded for increasing beta1 amplitude formed the control group to account for the unspecific training factors.

To assess the possible impact of muscle activity on the EEG-NFB results, we divided the participants into subgroups based on the presence of extended muscle contamination in their EEG signal. To ensure the validity and consistency of this division, three different classification methods were used. Based on screening of the raw signal and the shape of the FFT spectra, the judges marked nine subjects as muscle-employing. Their assessments agreed in 91% of cases (eight participants were identified by all three judges and one by two judges).

K-means clustering with two clusters divided the data into groups of eight and 24 participants. Eight of the nine subjects marked as muscle-employing during the visual inspection were classified as such by the k-means algorithm (see **Figure 3**), and all 23 participants judged not to employ muscles were in the second cluster. Logistic regression yielded the same result (**Figure 3**): an increase in beta2 amplitude significantly raised the probability of a subject being classified as muscle-employing (β = 25.92; t = 3.87; p < 0.001). Notably, while the visual inspection was based on the raw signals as well as on the shape of the entire FFT spectrum, the automatic classifications, relying exclusively on the amplitude of the beta2 band, led to very similar results, showing that changes in the high frequencies are the critical feature distinguishing these two groups. All approaches yielded the same outcome in 31 of 32 cases. In the single ambiguous case, we followed the automatic classification and included this subject in the control group.

We compared the training effects in the muscle-employing participants trained to up-regulate the beta band with those in the other groups. The analysis of the within-session effects (**Figure 4A**) revealed a significant three-way interaction of the block, band and group factors (F(36,522) = 4.23, p = 0.001, η² = 0.226). The statistics for the main and interaction effects are shown in **Table 1**. We observed a general increase of amplitudes in MB+ during the training session, with the magnitude varying between bands. The most pronounced increase was in beta1 (first block: m = 0.58, last block: m = 1.59, p < 0.001, linear trend at p = 0.029) and beta2 (first block: m = 0.77, last block: m = 1.81, p < 0.001, linear trend at p = 0.029). In both of these bands, the amplitudes during the entire training session were significantly higher in this group than in nMB+ or CON (all p < 0.001). A smaller yet significant increase was also present in the alpha band (first block: m = −0.23, last block: m = 0.15, p = 0.001, linear trend at p = 0.018). The mean alpha amplitude did not differ significantly between groups (all p > 0.745). MB+ was the only group that exhibited any amplitude changes during the training session: all comparisons between the first and last blocks for all bands in the remaining groups were non-significant (all p > 0.361).

In the analogous analysis performed for the between-session effects (**Figure 4B**), the three-way interaction of session, band and group was not significant (F(28,406) = 1.39, p = 0.215, η² = 0.087), indicating no systematic changes in EEG amplitudes across sessions comparable to those observed within sessions. However, beta1 and beta2 amplitudes were significantly higher across sessions (all p < 0.001) in MB+ (beta1 m = 1.05 ± 1.09; beta2 m = 1.23 ± 1.152) than in nMB+ (beta1 m = −0.48 ± 0.40; beta2 m = −0.35 ± 0.25) and CON (beta1 m = −0.53 ± 0.50; beta2 m = −0.40 ± 0.21), as shown by the significant interaction of the band and group factors (F(4,58) = 6.06, p = 0.004, η² = 0.295). There was no difference between groups in alpha band amplitude (all p > 0.764).

The same analyses performed on z-scores computed for each participant separately confirmed the effects previously observed in the beta1 and beta2 bands. The amplitude of these two bands increased within sessions in the MB+ group and did not change significantly in the other two groups, as revealed by the interaction effect (F(18,261) = 2.38, p = 0.032, η² = 0.141). After normalization to the subject-specific variability range, the increase of alpha amplitude visible in the previous approach turned out to be comparable to that observed in the higher frequencies. This is evidenced by the lack of significant differences between the bands in the MB+ group (band × group interaction, within session: F(4,58) = 0.27, p = 0.827, η² = 0.018; between session: F(4,58) = 0.31, p = 0.744, η² = 0.021).

The impact of muscle activity on the EEG spectrum increased with frequency, and the training-related increase in amplitude was correspondingly proportional to the amount of muscle contamination in the particular frequency band. After separating the muscle-employing participants from the group trained to increase the beta1 band, we were unable to observe any effects of the NFB training, as indicated by the absence of significant differences between the nMB+ and control group participants.

#### Self-Reports

The participants expressed their opinion about the EEG-NFB trainings on a Likert scale (1 = not effective, 5 = effective). The assessment of the effectiveness of the training varied between groups (F(2,29) = 5.65, p = 0.008, η² = 0.280). Subjects assigned to the control group expressed a more positive opinion about the effectiveness of the training (m = 4.17 ± 0.71) than those from the MB+ (m = 3.29 ± 0.76) and nMB+ (m = 3.00 ± 1.29) groups. Consistently, the majority of subjects from the CON group (13 of 18) declared that the trainings had a positive influence on their functioning outside the sessions, whereas the same was true for only two of seven subjects in the MB+ group and two subjects in the nMB+ group (χ²(2, 32) = 6.03, p = 0.049). About half of all participants declared that they were able to transfer the state maintained during the sessions to other situations (half of the nMB+, two thirds of the CON and only one person from the MB+ group; χ²(2, 31) = 5.55, p = 0.063).

**Figure 4.** … groups (MB+, n = 7; nMB+, n = 7; CON, n = 18; for statistical significance see "Results" Section). (A) Consecutive 10 blocks averaged across sessions. (B) Means of eight consecutive sessions. Error bars represent standard error of the mean.

There were no differences between the groups in the reported ability to control the visual stimulus (F(2,29) = 2.26, p = 0.122, η² = 0.135). All groups declared that their ability to control the visual stimulus increased during the trainings (CON: m = 4.33 ± 0.97, MB+: m = 4.14 ± 0.69, nMB+: m = 3.43 ± 1.13). During the trainings, the participants were given no specific instructions. In the post-training self-reports, we asked them whether they had implemented any strategies to control the visual stimulus. In all groups, the majority of subjects declared that they had applied some strategies during the NFB sessions (χ²(2, 32) = 0.169, p = 0.919). The participants indicated strategies such as: (1) looking at one point; (2) solving logical puzzles; (3) visualizing places; (4) relaxing; (5) focusing and calming down; and (6) singing songs in their minds. Four subjects from the MB+ group and one from nMB+ mentioned changing muscle tension in their reports. Thus, about half of the MB+ participants were aware of the possibility of using muscle tension to control the feedback, and about half were not; however, these numbers are too small for direct statistical comparisons. The self-aware subjects rated the effectiveness of the training and their ability to control the visual stimulus slightly lower than the MB+ participants who were unaware of their muscle employment (effectiveness m = 3.00 ± 0.816 vs. m = 3.67 ± 0.58; perceived control m = 3.75 ± 0.50 vs. m = 4.67 ± 0.58).
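The self-report statistics can be cross-checked from the published summary values. The sketch below recomputes a one-way between-subject ANOVA from group means, SDs and sizes (assuming the reported SDs are sample SDs) and a Pearson χ² test of independence from the counts given for functioning outside the sessions; for 2 degrees of freedom the χ² survival function has the closed form exp(−x/2). The function names are illustrative.

```python
import math

def oneway_from_summary(means, sds, ns):
    """One-way between-subject ANOVA from group means, sample SDs and sizes.
    Returns F, df_between, df_within and eta squared."""
    total_n, k = sum(ns), len(ns)
    grand = sum(m * n for m, n in zip(means, ns)) / total_n
    ss_between = sum(n * (m - grand) ** 2 for m, n in zip(means, ns))
    ss_within = sum((n - 1) * s ** 2 for s, n in zip(sds, ns))
    df_b, df_w = k - 1, total_n - k
    f = (ss_between / df_b) / (ss_within / df_w)
    return f, df_b, df_w, ss_between / (ss_between + ss_within)

def chi2_independence(table):
    """Pearson chi-square for a contingency table given as a list of rows."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    total = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2, (len(row_tot) - 1) * (len(col_tot) - 1)

def chi2_sf_df2(x):
    """Upper-tail probability of the chi-square distribution with 2 df."""
    return math.exp(-x / 2)

# Effectiveness ratings: CON, MB+, nMB+.
f, df_b, df_w, eta2 = oneway_from_summary([4.17, 3.29, 3.00], [0.71, 0.76, 1.29], [18, 7, 7])
# Positive influence outside the sessions (yes, no): CON, MB+, nMB+.
chi2, df = chi2_independence([[13, 5], [2, 5], [2, 5]])
p = chi2_sf_df2(chi2)
```

The recomputed values come out close to the reported F(2,29) = 5.65, η² = 0.280 and χ²(2, 32) = 6.03, p = 0.049; the small difference in F reflects rounding of the published means and SDs.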

# DISCUSSION

In the present experiment performed on healthy adults, we failed to observe a change of EEG activity in the trained beta1 (15–22 Hz) band, despite the positive reception of the training effects by both the trainers and the trainees. However, we observed extensive muscle employment, which increased during the training sessions. We argue that, in a subgroup of participants in the reported experiment, the EEG-NFB training was taken over by the EMG signal, which became the foundation for incentive-based learning. This conclusion is supported by reports showing that EMG is more susceptible to feedback modification than EEG (DeGood and Chisholm, 1977; Maurizio et al., 2013). Indeed, all but one of the participants who increased muscle activity belonged to the training group up-regulating the beta1 band. This group was rewarded for amplifying the amplitude of beta1, which lies at the lower edge of the EMG spectrum (Criswell, 2011). In contrast, such excessive artifacts were not present in participants trained to down-regulate beta1 activity and were present in only 1 of the 12 subjects trained with the sham protocol. In that participant, muscle activity did not change systematically over the course of the training.

We cannot exclude the possibility that the beta1 activity recorded from the participants identified as the muscle-employing group (MB+) also contained a contribution of neuronal origin that was overshadowed by muscular activity. However, in the analysis restricted to subjects who did not employ muscles during training, no modifications in the beta1 range were detected, suggesting that the EEG-NFB training was ineffective. The susceptibility of the beta1 band to NFB training has yet to be established with larger samples.

Subjects generally declared that their ability to control the visual stimulus increased over the course of the trainings. Surprisingly, the participants from the control group (who did not show any EEG changes during the trainings) were the most positive about the training effects. This counter-intuitive result may point to a placebo effect in NFB training, which seemed most pronounced when it was not disturbed by real control over the ongoing feedback (in contrast to the MB+ group, which was less pleased with the effectiveness of the NFB). A majority of participants reported using various strategies during the trainings. The use of muscle tension to control the visual feedback stimulus was reported post-training by four participants from MB+; the remaining subjects were either unaware of using this strategy or failed to mention it in their reports. At the beginning of the experiment, the participants were informed about the basic mechanisms of the EEG-NFB, e.g., its relation to ongoing brain activity. To ensure high quality of the recorded signals, they were asked to sit still, and the trainers intervened whenever they noticed excessive movement or other undesirable behavior. The fact that half of the subjects trained to up-regulate the beta1 band managed to increase their muscle activity (in half of the cases unwittingly) shows that such behavior cannot be efficiently controlled and eliminated by trainers alone.

Our experiment strongly supports the need for effective automatic online control of muscle activity during EEG-NFB training, in particular in protocols aiming to up-regulate higher frequency bands. Proper muscle control is required not only to acquire a high-quality EEG signal but, primarily, to accomplish genuine EEG-NFB training. Muscle control must be recognized as a standard part of the NFB procedure, to prevent participants from unintentionally using muscle tension to change the signals registered by the EEG electrodes during training. Various mathematical algorithms have been proposed and validated as effective in removing EMG components from the EEG signal (McMenamin et al., 2011; Fitzgibbon et al., 2016); however, most of them require high-density multichannel recordings and are therefore not applicable to typical NFB setups. It has also been proposed to model, fit and subtract individual EMG spikes from the EEG channels (Nottage et al., 2013); such an approach does not require multiple recording channels but needs powerful computers and sophisticated software.

The strategy suggested in NFB guidebooks (Demos, 2005) relies on the intervention of the trainer, who is expected to instruct the trainee to relax, adopt a proper, comfortable position and avoid muscle contraction. Trainers are expected to visually screen the recorded EEG and FFT spectra to detect sweeps of EMG activity and to instruct the trainees to correct their behavior. While this is the minimum that can be done without adequate hardware and software support, the present experiment shows that such a strategy may be inefficient, at least in training aimed at up-regulating high frequencies. Surprisingly, reports of clinical research, mostly concerning NFB applications for ADHD treatment in children, do not refer to the issue of muscle control (Lévesque et al., 2006; Leins et al., 2007; for review see Lofthouse et al., 2012). The issue is also widely neglected in research on healthy participants in which high-frequency (beta and gamma) training is used, e.g., for cognitive improvement (Keizer et al., 2010a,b; Logemann et al., 2010). Articles presenting various control procedures are notable exceptions (e.g., Bird et al., 1978; Berner et al., 2006; Hoedlmoser et al., 2008; Kober et al., 2013; Witte et al., 2013). Based on this research, three main approaches to controlling muscle activity can be distinguished.

Probably the most common approach, although only incidentally described as serving artifact control, is the use of multiband protocols aiming at simultaneous manipulation of two or more bands, or at maximization of their ratio (e.g., up-regulating SMR while simultaneously decreasing theta and beta2). Such protocols have rarely been explicitly described as serving eye-blink (theta) and muscle (beta2) control (Kober et al., 2013; Witte et al., 2013). Even though this approach can be effective in controlling muscle-related activity, it is threatened by physiological invalidity when the contrasted bands directly flank each other (Ros et al., 2013). It is highly plausible that (at least) the edges of neighboring bands are mutually interdependent and, if left unconstrained, tend to change their amplitudes in the same direction. The interdependence of flanking frequency bands (on the example of theta, alpha and beta) was shown to be moderate to strong, with correlations of within-subject changes during training of 0.5 < r < 0.7 (Ros et al., 2013). Thus, such protocols, although effective for the control of muscle artifacts, might lower the effectiveness of training.
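The reward logic of such a multiband protocol can be sketched as a conjunction of conditions. The rule, field names and threshold values below are hypothetical and serve only to show how an inhibit band (beta2) acts as a built-in muscle guard; real protocols may instead reward a band ratio.

```python
from dataclasses import dataclass

@dataclass
class MultibandRule:
    reward_min: float   # trained band (e.g. SMR) must exceed this amplitude
    theta_max: float    # theta, used as an eye-blink proxy, must stay below this
    beta2_max: float    # beta2, used as a muscle proxy, must stay below this

def multiband_reward(reward_amp, theta_amp, beta2_amp, rule):
    """Reward only when the trained band is up AND both inhibit bands are down."""
    return (reward_amp > rule.reward_min
            and theta_amp < rule.theta_max
            and beta2_amp < rule.beta2_max)

# Usage with made-up amplitudes (µV).
rule = MultibandRule(reward_min=5.0, theta_max=8.0, beta2_max=4.0)
print(multiband_reward(6.0, 7.0, 3.0, rule))   # all conditions met
print(multiband_reward(6.0, 7.0, 9.0, rule))   # muscle burst in beta2 blocks the reward
```

If the trained band and an inhibit band are direct neighbors and co-vary, as argued above, the two conditions pull against each other, which is the mechanism by which such rules may lower training effectiveness.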

In our study, classification based solely on beta2 amplitude showed that changes in this band are the feature distinguishing subjects who train based on EMG activity from the others. Thus, controlling this single parameter is sufficient to assure good quality of the feedback information. It can be implemented as an amplitude threshold on one of the high-frequency bands, as signals of muscle origin have amplitudes far larger than neuronal ones (a procedure employed, e.g., by Hoedlmoser et al., 2008). When the signal exceeds this threshold, the participant receives no reward and the training is interrupted. This method can easily be implemented in commercial software, but it does not completely exclude the risk of sub-threshold manipulation of muscle tension. Many of the articles that implement this solution report an amplitude threshold in the range of 100–120 µV (e.g., Gevensleben et al., 2009a,b; Meisel et al., 2014). Our data clearly show that it is perfectly feasible to gain control over the NFB apparatus with muscle activity in the range of 70 µV or lower (**Figure 2**). Since different apparatus and online signal processing settings influence the range of the recorded EEG values, the amplitude threshold should not be generic but calibrated separately.
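A beta2 amplitude gate with a calibrated rather than generic threshold might look like the sketch below. The calibration rule (median plus 3 scaled median absolute deviations of beta2 amplitude in a resting baseline recorded on the same apparatus) is an assumption chosen for robustness to occasional bursts; the text only states that the threshold should be calibrated.

```python
import numpy as np

def calibrate_threshold(baseline_beta2, k=3.0):
    """Hypothetical calibration: median + k * scaled MAD of baseline beta2 amplitudes.
    1.4826 makes the MAD comparable to an SD for normally distributed values."""
    b = np.asarray(baseline_beta2, dtype=float)
    med = np.median(b)
    mad = 1.4826 * np.median(np.abs(b - med))
    return med + k * mad

def gated_feedback(beta1, beta2, threshold):
    """Pass the beta1 feedback value through, or None (no reward, training paused)
    whenever beta2 in the same window exceeds the threshold."""
    return [b1 if b2 <= threshold else None for b1, b2 in zip(beta1, beta2)]

# Usage with made-up amplitudes (arbitrary units).
threshold = calibrate_threshold([1.0, 1.1, 0.9, 1.0, 1.05, 0.95])
feedback = gated_feedback([2.0, 3.0], [1.0, 5.0], threshold)
```

As noted above, a gate of this kind cannot stop a participant from exploiting muscle tension that stays just below the threshold.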

Finally, some authors used additional EMG recordings of facial muscles (Berner et al., 2006) or neck muscles (Bird et al., 1978), or a chest belt measuring chest wall movements (Berner et al., 2006). In these experiments, positive feedback was contingent on ongoing EMG activity and was provided only if this activity did not exceed an established threshold. This method unambiguously distinguishes between neuronal and muscular signals. It is therefore the most precise one, but also the most demanding to implement, because it requires additional hardware and software and prolongs the preparation time.
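When a separate EMG channel is available, the contingency described above can be sketched as follows. In this sketch the EMG level of each epoch is summarized as its root-mean-square (RMS) amplitude. The RMS summary, the threshold value and the function names are assumptions for illustration only. The text does not describe how the cited studies processed their EMG signals.

```python
from math import sqrt


def rms(samples):
    """Root-mean-square amplitude of one EMG epoch (same units as the input samples)."""
    return sqrt(sum(s * s for s in samples) / len(samples))


def emg_contingent_feedback(eeg_criterion_met, emg_epoch, emg_threshold):
    """Positive feedback only if the EEG criterion is met AND EMG stays at or below threshold."""
    return eeg_criterion_met and rms(emg_epoch) <= emg_threshold


# Illustrative call: low EMG activity, EEG criterion met -> feedback is given
print(emg_contingent_feedback(True, [1.0, -1.0, 1.0, -1.0], emg_threshold=2.0))  # True
```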

Muscle activity can influence different training protocols in different ways. It should therefore be addressed specifically with regard to the training band and the behavioral goal. In protocols aiming to up-regulate high-frequency bands, the EMG signal can result from head or upper-body muscle strain. This is especially plausible in training focused on attention, which accounts for many high-frequency protocols (e.g., Egner and Gruzelier, 2004; Cannon et al., 2006, 2009), because subjects are not instructed to relax during these sessions. The situation is different for NFB training aimed at relaxation, which is most often related to alpha protocols (e.g., Egner et al., 2002; van Boxtel et al., 2012). Interestingly, a few articles directly comparing EEG and EMG feedback employed the alpha band. They showed that down-regulating the amplitude of the muscle signal with EMG feedback can be as effective as, or even more effective than, direct alpha training in increasing alpha amplitude (DeGood and Chisholm, 1977; Moore et al., 2000). Both procedures induced relaxation, as supported by physiological indices such as heart and respiratory rates (DeGood and Chisholm, 1977). In this comparison, participants explicitly described the EMG training as easier in the post-training survey. Thus, EMG feedback can be considered a valid replacement for EEG-NFB protocols aiming at alpha up-regulation.

We conclude that the activity recorded from EEG electrodes might be overwhelmed by stronger and easier-to-control EMG signals, which in turn become the basis for feedback reinforcement. This might cause different effects in various training protocols and therefore needs to be carefully considered when designing training protocols and algorithms. Online data analysis and the quality of the information fed back to participants should be of the highest interest to all therapists and researchers. Any shortcomings at this stage irreversibly alter the training and cannot be fixed by later offline data processing. The NFB community should make an extensive effort to develop and validate efficient automatic artifact detection algorithms and to implement them in experimental and commercial setups.

#### AUTHOR CONTRIBUTIONS

KP and KJ: data analyses, main conclusions, article drafting (equal contribution). JR: interpretation of the results, proofreading. RK and MS: data collection. MM: experimental setup, data collection. AW and EK: verification of main concept and conclusions, proofreading.

#### ACKNOWLEDGMENTS

We thank ELMIKO MEDICAL Sp. z o. o. for technical support. The study was supported by the Polish National Science Centre grant 2012/07/B/NZ7/04383 and by the National Centre for Research and Development grant POIR 01.01.01-00-178/15.

#### REFERENCES

… of elderly subjects. Int. J. Psychophysiol. 89, 334–341. doi: 10.1016/j.ijpsycho.2013.05.007

… for neurofeedback training with a case illustration. Appl. Psychophysiol. Biofeedback 38, 109–119. doi: 10.1007/s10484-013-9213-x


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Paluch, Jurewicz, Rogala, Krauz, Szczypińska, Mikicin, Wróbel and Kublik. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Posterior Parietal Cortex Subserves Precise Motor Timing in Professional Drummers

Bettina Pollok<sup>1</sup>, Katharina Stephan<sup>2</sup>, Ariane Keitel<sup>1</sup>, Vanessa Krause<sup>1</sup>\* and Nora K. Schaal<sup>2</sup>

<sup>1</sup>Medical Faculty, Institute of Clinical Neuroscience and Medical Psychology, Heinrich-Heine University Duesseldorf, Duesseldorf, Germany, <sup>2</sup>Department of Experimental Psychology, Heinrich-Heine University Duesseldorf, Duesseldorf, Germany

The synchronization task is a well-established paradigm for investigating motor timing with respect to an external pacing signal. It requires subjects to synchronize their finger taps with a regular metronome. A specific significance of the posterior parietal cortex (PPC) for superior synchronization in professional drummers has been suggested. In non-musicians, modulating the excitability of the left PPC by means of transcranial direct current stimulation (tDCS) modulates synchronization performance of the right hand. To determine the significance of the left PPC for superior synchronization in drummers, we investigated the effects of cathodal and anodal tDCS on auditory-motor synchronization of the right hand in 20 professional drummers. A continuation task and a reaction time task served as control conditions. Moreover, the interaction between baseline performance and tDCS polarity was estimated in precise as compared to less precise synchronizers, defined by median split. Previously published data from 16 non-musicians were re-analyzed accordingly to highlight possible differences in tDCS effects between drummers and non-musicians. tDCS was applied for 10 min with an intensity of 0.25 mA over the left PPC. Behavioral measures were determined before and immediately after tDCS. In drummers, the overall analysis of synchronization performance revealed significantly larger tap-to-tone asynchronies following anodal tDCS, with the tap preceding the tone, replicating findings in non-musicians. No significant effects were found on the control tasks. The analysis of participants with large versus small baseline asynchronies revealed that tDCS interfered with synchronization performance only in drummers with small asynchronies. The re-analysis of the data from non-musicians indicated the reverse pattern.
The data support the hypothesis that the PPC is involved in auditory-motor synchronization and extend previous findings by showing that its functional significance varies with musical expertise.

Keywords: anticipation, brain plasticity, musicians, synchronization, transcranial direct current stimulation (tDCS)

Edited by: Louis Bherer, Université de Montréal, Canada

#### Reviewed by:

Takako Fujioka, Stanford University, USA; Filippo Brighina, University of Palermo, Italy; Simone Dalla Bella, University of Montpellier 1, France

\*Correspondence: Vanessa Krause vanessa.krause@uni-duesseldorf.de

> Received: 22 June 2016; Accepted: 28 March 2017; Published: 11 April 2017

#### Citation:

Pollok B, Stephan K, Keitel A, Krause V and Schaal NK (2017) The Posterior Parietal Cortex Subserves Precise Motor Timing in Professional Drummers. Front. Hum. Neurosci. 11:183. doi: 10.3389/fnhum.2017.00183

# INTRODUCTION

Timing abilities are essential for precise movement execution, in particular when movements have to be executed with respect to external events. The ability to predict such events increases movement accuracy and reduces attentional demands. A well-established paradigm to investigate motor timing with respect to an external signal is the so-called synchronization task. It requires subjects to synchronize their own finger taps with a regular metronome (reviewed in Repp and Su, 2013). Despite the simplicity of this task, non-musicians typically show the so-called negative asynchrony, in which the tap precedes the tone by several tens of milliseconds (for a review, see Repp and Su, 2013). Motor timing relies on a cerebello-thalamo-cortical network (Pollok et al., 2005; Pecenka et al., 2013; for reviews, see Coull and Nobre, 2008; Chen et al., 2009). Core timing functions have been related to the basal ganglia (Malapani et al., 1998) and the cerebellum. The cerebellum has been particularly linked to the stabilization of movements with respect to external events (Ivry et al., 2002; Spencer et al., 2005; for a review, see Molinari et al., 2003) and to the anticipation of sensory events (Tesche and Karhu, 2000). At the cortical level, precise motor timing engages parietal as well as primary motor and premotor areas (Karabanov et al., 2009; Pecenka et al., 2013; for a review, see Coull and Nobre, 2008). The dorsal premotor cortex (dPMC) has been found to be specifically relevant for precise movement timing with respect to auditory stimuli. This suggests that the dPMC integrates auditory information with motor actions (Chen et al., 2006, 2008, 2009; for a review, see Zatorre et al., 2007).

Drummers (Krause et al., 2010a,b) and percussionists (Manning and Schutz, 2016) show superior synchronization performance compared to non-musicians and even to professional pianists (Krause et al., 2010a,b). This behavioral advantage has been related to a stronger functional interaction between the thalamus and the posterior parietal cortex (PPC), suggesting a specific significance of the PPC for synchronization accuracy (Krause et al., 2010b). The term synchronization accuracy refers to the mean temporal distance between the onsets of the auditory pacing signal and the finger tap, together with its variability. The data by Krause et al. (2010b) provide evidence for a stronger involvement of the dPMC in professional musicians than in non-musicians. Notably, however, the dPMC cannot account for the superior synchronization of drummers, since no significant differences between drummers and pianists were found. The PPC has been related to sensorimotor integration, possibly acting as a sensorimotor interface (Andersen, 1997; Andersen and Buneo, 2003). It has also been related to anticipatory motor control, which implies that movements can be planned and executed not only with respect to actual but also to anticipated sensory events (Beudel et al., 2009; reviewed in Blakemore and Sirigu, 2003). Because the pacing signal is regular, the synchronization task allows the investigation of anticipatory motor control. In line with this hypothesis, faster reaction times to temporally predictable visual cues have been particularly related to increased activation of the inferior PPC (Coull et al., 2016).

Neuroimaging studies provide important insights into the brain areas involved in a given task. However, their results do not necessarily allow conclusions about the functional significance of these areas for task execution. To estimate the functional relevance of different brain areas within a network, non-invasive brain stimulation methods can be applied, such as transcranial direct current stimulation (tDCS) or transcranial magnetic stimulation (TMS). These methods allow the modulation of cortical excitability. Evidence suggests that tDCS changes the resting membrane potential in a polarity-specific manner. Anodal tDCS increases the likelihood of neural firing by depolarizing neurons, whereas cathodal tDCS hyperpolarizes cell bodies (Lang et al., 2005; reviewed in Stagg and Nitsche, 2011; Shin et al., 2015). Stimulation after-effects are assumed to rely on changes at inhibitory and excitatory synapses (reviewed in Stagg and Nitsche, 2011).

Previous studies showed that modulating PPC excitability by 1 Hz repetitive TMS (rTMS; Krause et al., 2012) or by tDCS (Krause et al., 2014) changes synchronization accuracy in non-musicians. Tap-to-tone asynchronies became larger following anodal tDCS (Krause et al., 2014) and smaller following inhibitory 1 Hz rTMS (Krause et al., 2012). Since no significant effects were found in the continuation and reaction time tasks in non-musicians, we hypothesized that the PPC is particularly involved in anticipatory motor control. The present study investigates whether this area is causally involved in superior synchronization in professional drummers. To this end, anodal and cathodal tDCS were applied to the left PPC. We then determined the effects on synchronization accuracy, continuation accuracy and reaction times of the right hand. Assuming a specific significance of the PPC for anticipatory motor control, we expected tDCS effects (i) on synchronization accuracy only and (ii) particularly in precise drummers.

## MATERIALS AND METHODS

#### Participants

Twenty professional drummers (19 male) aged between 19 and 63 years (34.3 ± 2.6 years; mean ± standard error of the mean, SEM) were included in the present study. Sample size was based on our previous study (Krause et al., 2014), which revealed relatively large effect sizes. The mean lateralization ratio according to the Edinburgh Handedness Inventory (Oldfield, 1971) was 96.3 ± 0.7, indicating that all participants were right-handed. They were either students at a music college or worked as professional musicians in an orchestra or as music teachers. Nineteen participants had received formal training on the instrument for 13.7 ± 1.2 years on average (range 5–24 years). One participant was self-taught and had no formal training. Mean age at the beginning of formal training was 9.9 ± 1.3 years (range 3–21 years). Mean duration of regular practice was 17.2 ± 2.6 years.

In addition, data from 16 healthy non-musicians (6 male; mean age 23.7 ± 1.0 years) from our previous study (Krause et al., 2014) were re-analyzed. The aim was to determine a possible interaction between baseline performance and musical expertise on tDCS effects. Right-handedness was indicated by a mean lateralization ratio of 85.0 ± 4.3. This group was labeled non-musicians because its members had never regularly practiced an instrument.

Subjects were not included in the study if they had any of the following: a personal or family history of epileptic seizures or other neurological or psychiatric disorders, a cardiac pacemaker, intracranial metal implants, or intake of medication acting on the central nervous system.

### Ethics

The study was carried out in accordance with the standards set by the latest revision of the Declaration of Helsinki. Experimental procedures were approved by the local ethics committee (Heinrich-Heine University Duesseldorf; study number 3347). Participants gave their written informed consent prior to participation.

## Experimental Paradigm

Participants were naïve with respect to the exact hypotheses of the study. None of them had received electrical brain stimulation before. Participants and the main investigator were blinded to the type of tDCS until the end of the experiment. To this end, a second investigator operated the DC stimulator, which was covered by a paperboard to hide the stimulation type. The order of tDCS conditions was counterbalanced across subjects. Timing abilities were measured by means of (i) a synchronization paradigm, which was always followed by (ii) a continuation task, and (iii) a simple reaction time task. The order of tasks (synchronization-continuation vs. reaction time task) was counterbalanced across participants and tDCS sessions. During synchronization, subjects were instructed to tap with their right index finger in synchrony with an auditory pacing signal. The signal was presented at a regular stimulus onset asynchrony (SOA) of 900 ms, and each tone lasted 100 ms. After 30 taps, the pacing signal stopped and subjects continued at the same rhythm for another 30 taps (continuation task). Reaction times were measured in response to the same auditory signal presented at varying SOAs of 1,000, 1,500 and 2,000 ms. In total, 60 reactions were recorded for each individual. Behavioral data were recorded before and immediately after tDCS with a photoelectric barrier mounted on a pad. Stimulus presentation and recording of behavioral data were controlled by E-Prime® 2.0 software (Psychology Software Tools, Sharpsburg, MD, USA). Prior to data acquisition, a short practice run familiarized the subjects with the tasks. No specific training was conducted. During each experimental session, subjects were comfortably seated in a reclining chair. They were instructed to relax and to keep their eyes open during the entire experiment.
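The trial structure can be summarized as a schedule of tone onsets. This Python sketch is an illustration only, since the session was actually run in E-Prime 2.0. The function names and the timing reference (milliseconds from run start) are our assumptions. We also assume that the 60 reaction-time SOAs are split evenly across the three values and shuffled; the text does not specify this.

```python
import random

SYNC_SOA_MS = 900                 # regular SOA of the pacing signal
N_SYNC_TAPS = 30                  # paced taps before the pacing signal stops
RT_SOAS_MS = (1000, 1500, 2000)   # varying SOAs in the reaction time task
N_RT_TRIALS = 60                  # reactions recorded per individual


def sync_tone_onsets():
    """Onsets (ms from run start) of the 30 pacing tones; continuation taps follow unpaced."""
    return [i * SYNC_SOA_MS for i in range(N_SYNC_TAPS)]


def rt_tone_onsets(seed=None):
    """Tone onsets for the reaction time task (assumed: balanced, shuffled SOAs)."""
    rng = random.Random(seed)
    soas = list(RT_SOAS_MS) * (N_RT_TRIALS // len(RT_SOAS_MS))
    rng.shuffle(soas)
    onsets, t = [], 0
    for soa in soas:
        t += soa
        onsets.append(t)
    return onsets
```

Randomizing the reaction-time SOAs keeps tone onsets temporally unpredictable in that task. The synchronization task, in contrast, is fully regular and therefore allows anticipation.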

## Transcranial Direct Current Stimulation

tDCS was applied using a battery-driven DC stimulator (NeuroConn GmbH, Ilmenau, Germany) with a pair of rubber electrodes (3 × 3 cm) placed in saline-soaked sponges. As in our previous study (Krause et al., 2014), the stimulation electrode was fixed above the left PPC and the reference electrode was placed over the contralateral orbit (Nitsche and Paulus, 2000; Moliadze et al., 2010). Self-adhesive bandages (Coban™, 3M Deutschland GmbH, Neuss, Germany) were used to fix the electrodes. Anodal and cathodal tDCS were each applied at rest for 10 min with an intensity of 0.25 mA. This resulted in a current density of 27.77 µA/cm² under the electrodes. In line with our previous study (Krause et al., 2014), this relatively weak intensity was chosen to adjust the current density to the electrode size. A previous study has shown that smaller stimulation electrodes combined with lower intensities result in higher tDCS focality (Nitsche et al., 2007). We therefore tried to reduce the probability of co-stimulating the primary motor cortex (M1) by using 3 × 3 cm stimulation electrodes along with a lower stimulation intensity. The current was ramped up and down over an additional 10 s at the beginning and end of stimulation, respectively. Impedance was kept below 10 kΩ (mean 8.2 ± 0.4 kΩ). The anodal and cathodal tDCS sessions were separated by at least 1 week to avoid carryover effects. Stimulation complied with established safety protocols (Nitsche et al., 2003; Iyer et al., 2005). To monitor the quality of blinding, participants indicated on a questionnaire at the end of each session whether they had received anodal or cathodal tDCS.
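The reported current density follows from dividing the stimulation intensity by the area of one 3 × 3 cm electrode:

$$
J = \frac{I}{A} = \frac{0.25\,\mathrm{mA}}{3\,\mathrm{cm} \times 3\,\mathrm{cm}} = \frac{0.25\,\mathrm{mA}}{9\,\mathrm{cm}^2} \approx 0.0278\,\mathrm{mA/cm}^2 \approx 27.8\,\mu\mathrm{A/cm}^2
$$

The text reports this value truncated to 27.77 µA/cm².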

The PPC was localized by means of a neuronavigation system (LOCALITE, Sankt Augustin, Germany) using a standard brain. The stimulation target was set to the Talairach coordinates (x, y, z) −25, −46, 62 corresponding to Brodmann area (BA) 7 (**Figure 1**).

To ensure that the stimulated area did not overlap with M1, M1 was localized by means of single-pulse TMS. We used a standard figure-of-eight coil with an outer winding diameter of 80 mm (MC-B 70) connected to a MagPro stimulator (MagVenture, Hückelhoven, Germany). The coil was placed tangentially to the scalp with the handle pointing backwards and laterally at about 45° away from the midline, inducing an initial posterior-anterior current flow in the brain. The magnetic stimulus had a biphasic waveform with a pulse width of about 300 µs. First, the optimal cortical representation of the first dorsal interosseous (FDI) muscle was determined by eliciting motor evoked potentials (MEPs; for an overview, see Kobayashi and Pascual-Leone, 2003). The coil was then moved in 0.5 cm steps anterior, posterior, medial and lateral to this area. The point evoking the largest FDI motor response was defined as the motor hot spot. The mean distance between the left M1 hot spot and the stimulated left PPC site was 4.6 ± 0.2 cm.

## Data Analysis

Synchronization performance and reaction times were determined as the temporal distance between tap and tone onsets. Continuation performance was determined as the mean inter-tap interval (ITI), i.e., the temporal distance between two subsequent tap onsets. In addition, inter-tap variability was calculated for the continuation task as the mean standard deviation of the ITI. Accordingly, tap-to-tone variability was determined for the synchronization task as the mean standard deviation of the temporal distance between tap and tone onsets. The first three taps of each run were excluded from the analysis. Data lying more than two standard deviations below or above individual and group means were identified as outliers and discarded. Fewer than 5% of individual data per condition were removed prior to the final analysis. The number of outliers did not differ significantly between stimulation conditions (p > 0.14). As a result of this procedure, synchronization and continuation data from one subject and reaction times from two other subjects were excluded. Analyses of variance (ANOVAs) with the factors stimulation (anodal vs. cathodal) and time (pre vs. post) were calculated separately for each task (synchronization, continuation, reaction). T-tests were applied for post hoc analysis.
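The measures defined above can be sketched in a few lines of Python. This illustrates the definitions under stated assumptions and is not the authors' analysis code. Tap and tone onsets are given in milliseconds, and taps are paired with tones by index, i.e., one tap per tone. Only the individual-level 2-SD outlier rule is shown; the group-level criterion mentioned in the text would be applied analogously across subjects. All function names are hypothetical.

```python
from statistics import mean, stdev


def asynchronies(tap_onsets, tone_onsets, skip=3):
    """Tap-to-tone asynchronies (ms); negative = tap precedes tone. First `skip` taps dropped."""
    return [tap - tone for tap, tone in zip(tap_onsets, tone_onsets)][skip:]


def inter_tap_intervals(tap_onsets, skip=3):
    """ITIs (ms) between consecutive taps, after dropping the first `skip` taps."""
    taps = tap_onsets[skip:]
    return [b - a for a, b in zip(taps, taps[1:])]


def drop_outliers(values, n_sd=2.0):
    """Discard values lying more than n_sd standard deviations from the individual mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) <= n_sd * s]


def summarize(values):
    """Mean and SD after outlier removal (e.g., mean asynchrony and its variability)."""
    kept = drop_outliers(values)
    return mean(kept), stdev(kept)


tones = [0, 900, 1800, 2700, 3600, 4500]
taps = [-30, 870, 1780, 2660, 3570, 4480]
print(asynchronies(taps, tones))    # [-40, -30, -20]
print(inter_tap_intervals(taps))    # [910, 910]
```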

To determine whether tDCS effects vary with baseline performance, the data were additionally split at the baseline median. For the synchronization task the split used the tap-to-tone asynchrony, and for the continuation task it used the ITI. The data were then analyzed separately for subjects with performance levels above (large asynchronies) and below (small asynchronies) the group median. A comparable procedure has recently been used to investigate age-dependent tDCS effects on spatial attention (Learmonth et al., 2015). The median was calculated separately for each baseline measurement. In addition, data from our previous study investigating the effects of tDCS over the left PPC on motor timing in non-musicians (Krause et al., 2014) were re-analyzed with respect to baseline performance in the same way. This analysis aimed to determine whether, and to what extent, tDCS effects vary with musical expertise and baseline performance.
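The median split can be sketched as follows. The text does not state whether "large" and "small" asynchronies refer to signed values or to their magnitude. This hypothetical helper therefore compares absolute values by default and exposes the choice as a parameter. Subjects lying exactly at the median are assigned to the "small" group; this is also our assumption.

```python
from statistics import median


def median_split(baseline_by_subject, use_magnitude=True):
    """Split subjects at the group median of a baseline measure.

    baseline_by_subject: dict mapping subject id -> baseline value (e.g., mean asynchrony, ms).
    use_magnitude: compare absolute values (assumption: 'large' means large magnitude).
    Returns (small_group, large_group) as sorted lists of subject ids.
    """
    key = abs if use_magnitude else (lambda v: v)
    cut = median(key(v) for v in baseline_by_subject.values())
    small = sorted(s for s, v in baseline_by_subject.items() if key(v) <= cut)
    large = sorted(s for s, v in baseline_by_subject.items() if key(v) > cut)
    return small, large


baseline = {"s1": -10, "s2": -40, "s3": 5, "s4": -25, "s5": 30}  # illustrative values (ms)
print(median_split(baseline))  # (['s1', 's3', 's4'], ['s2', 's5'])
```

With mostly negative asynchronies, the two conventions can assign subjects to opposite groups, so the choice matters when the procedure is replicated.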

# RESULTS

# Blinding

In the first measurement, 6 of 20 participants correctly identified the tDCS type, and in the second measurement 5 of 20 did. This is no better than the chance level of 10 of 20, suggesting that blinding was successful.

# Synchronization Task

The analysis of the tap-to-tone asynchrony revealed a significant stimulation × time interaction (F(1,18) = 6.71, p = 0.02; **Figure 2**). This interaction was due to larger tap-to-tone asynchronies following anodal stimulation as compared to baseline (t(18) = 2.31, p = 0.03), while no significant effect following cathodal stimulation was found (t(18) = −1.58, p = 0.13). Comparison of baseline performance between tDCS conditions revealed a trend towards significance (t(18) = 1.97, p = 0.06). Neither a significant main effect of stimulation (F(1,18) = 0.06, p = 0.45) nor time (F(1,18) = 0.49, p = 0.49) was evident.
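In a 2 × 2 within-subject design such as stimulation × time, the interaction F(1, n − 1) equals the squared one-sample t statistic of the difference of change scores, i.e., (anodal post − pre) minus (cathodal post − pre). The Python sketch below illustrates this equivalence and the paired post hoc t-tests. It computes test statistics only; p-values would require a t or F distribution, for example from SciPy. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values):
    """t statistic of the mean against zero, with n - 1 degrees of freedom."""
    n = len(values)
    return mean(values) / (stdev(values) / sqrt(n)), n - 1


def paired_t(pre, post):
    """Paired t-test statistic for post - pre (e.g., a post hoc pre vs. post comparison)."""
    return one_sample_t([b - a for a, b in zip(pre, post)])


def interaction_f(anodal_pre, anodal_post, cathodal_pre, cathodal_post):
    """Stimulation x time interaction F(1, n - 1) of a 2 x 2 repeated-measures design."""
    contrast = [(ap - a0) - (cp - c0) for a0, ap, c0, cp in
                zip(anodal_pre, anodal_post, cathodal_pre, cathodal_post)]
    t, df = one_sample_t(contrast)
    return t * t, (1, df)
```

Nineteen drummers entered the synchronization analysis (20 minus the one excluded subject), so the contrast has n − 1 = 18 degrees of freedom. This matches the F(1,18) reported above.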

#### Effect of Age and Amount of Musical Training

The age at the beginning of formal musical training varied considerably across subjects. We therefore compared subjects with early onset (formal training starting before the age of 8 years) and late onset of musical practice (after the age of 8 years), defined by median split. The analysis revealed neither a significant main effect of age (F(1,15) = 10.79; p = 0.23) nor a significant interaction with this factor (p > 0.34). Moreover, the size of the tap-to-tone asynchrony was not significantly correlated with (i) years of musical education or (ii) duration of daily practice on the instrument (p > 0.5).


#### Effects of Baseline Performance

For drummers with large baseline asynchronies (median split), the analysis revealed neither significant main effects (stimulation: F(1,9) = 0.09, p = 0.78; time: F(1,9) = 1.48, p = 0.26) nor a significant stimulation × time interaction (F(1,9) = 0.83, p = 0.39). In drummers with small baseline asynchronies, however, the interaction was significant (F(1,9) = 12.85, p = 0.01). Post hoc analyses revealed a significant shift from a mean positive to a mean negative asynchrony following anodal tDCS compared to baseline (t(8) = 3.94, p = 0.003). No significant effect was found following cathodal tDCS (t(8) = −1.09, p = 0.31). Mean asynchronies differed significantly between stimulation conditions at baseline (t(8) = 4.14, p = 0.003), but not after tDCS (t(8) = −1.24, p = 0.252). No significant main effect of stimulation was found (F(1,9) = 0.22, p = 0.65), while time showed a trend towards significance (F(1,9) = 4.82, p = 0.06). Data are summarized in **Figure 3**.

The observed baseline differences raise the question of whether the behavioral effects indeed reflect tDCS or arise simply from these baseline differences. To estimate the effect of baseline performance on post-tDCS synchronization, we additionally calculated regression analyses with post-tDCS synchronization as the dependent variable and baseline performance as the predictor. The analysis revealed significant effects in drummers with large baseline asynchronies (anodal tDCS: F(1,7) = 13.40, p = 0.001, R² = 0.66, β = 0.81; cathodal tDCS: F(1,7) = 6.13, p = 0.04, R² = 0.47, β = 0.68). In drummers with small baseline asynchronies, post-tDCS synchronization was not significantly associated with baseline performance in the anodal condition (F(1,8) = 1.09, p = 0.328, R² = 0.120, β = 0.346), while a trend emerged in the cathodal condition (F(1,7) = 5.03, p = 0.060, R² = 0.418, β = 0.647).
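With a single predictor, the standardized coefficient β equals Pearson's r, R² = β², and F(1, n − 2) = R²(n − 2)/(1 − R²). The following minimal sketch computes these quantities from paired baseline and post-tDCS values, assuming ordinary least squares; the function name is hypothetical.

```python
from math import sqrt
from statistics import mean


def simple_regression(baseline, post):
    """OLS of post-tDCS values on baseline values with one predictor.

    Returns slope, intercept, standardized beta (= Pearson r), R^2 and F(1, n - 2).
    """
    n = len(baseline)
    mx, my = mean(baseline), mean(post)
    sxx = sum((x - mx) ** 2 for x in baseline)
    syy = sum((y - my) ** 2 for y in post)
    sxy = sum((x - mx) * (y - my) for x, y in zip(baseline, post))
    slope = sxy / sxx
    beta = sxy / sqrt(sxx * syy)   # standardized slope equals Pearson r
    r2 = beta ** 2
    return {
        "slope": slope,
        "intercept": my - slope * mx,
        "beta": beta,
        "r2": r2,
        "F": r2 * (n - 2) / (1 - r2),
        "df": (1, n - 2),
    }
```

As a consistency check on the values above, R² = 0.66 with 7 residual degrees of freedom gives F ≈ 0.66 × 7 / 0.34 ≈ 13.6. This matches the reported F(1,7) = 13.40 up to rounding of R², and 0.81² ≈ 0.66 reproduces the reported R².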

Synchronization variability was measured as the standard deviation of the tap-to-tone asynchrony. Its analysis revealed no significant main effects or interactions, either for the entire group or in the sub-group analyses (p > 0.09).

# Continuation Task

Across the entire group, the analysis of ITI and inter-tap variability revealed no significant main effects or interaction (p > 0.13). In drummers with large baseline ITIs, a significant main effect of time was found (F(1,8) = 11.38, p = 0.01). ITIs were smaller post-tDCS (904.41 ± 4.09 ms) than pre-tDCS (916.74 ± 2.95 ms). A trend emerged for stimulation (F(1,8) = 3.57, p = 0.09), with smaller ITIs in the cathodal (903.19 ± 4.09 ms) than in the anodal condition (916.74 ± 5.23 ms). The stimulation × time interaction was not significant (F(1,8) = 0.27, p = 0.62). In drummers with small baseline ITIs, the analysis revealed a trend for time (F(1,8) = 4.90, p = 0.06) and no significant effect of stimulation (F(1,8) = 0.009, p = 0.92). The time × stimulation interaction was again not significant (F(1,8) = 0.30, p = 0.60). The trend for time reflects smaller ITIs before (882.91 ± 3.00 ms) than after tDCS (897.18 ± 5.63 ms).

#### Reaction Times

The analysis revealed neither significant main effects of factors stimulation (F(1,17) = 0.08, p = 0.78) and time (F(1,17) = 0.41, p = 0.53) nor a significant interaction (F(1,17) = 1.29, p = 0.27).

#### Synchronization in Non-Musicians

To test whether tDCS effects in non-musicians also vary with baseline performance, as shown in drummers, we re-analyzed the previously published data (Krause et al., 2014). In participants with small baseline asynchronies, no significant main effects of stimulation (F(1,6) = 0.03, p = 0.86) or time (F(1,7) = 0.28, p = 0.61) emerged. The stimulation × time interaction was not significant either (F(1,6) = 0.04, p = 0.85). In contrast, participants with large baseline asynchronies showed a significant stimulation × time interaction (F(1,6) = 14.21, p = 0.01). Tap-to-tone asynchronies were larger following anodal tDCS (t(6) = 2.80; p = 0.03), while following cathodal tDCS a trend towards smaller asynchronies was found (t(6) = −2.33; p = 0.06). Neither the main effect of stimulation (F(1,6) = 1.71, p = 0.24) nor that of time (F(1,6) = 0.92, p = 0.37) was significant. The comparison of baseline asynchronies between conditions revealed a trend towards significance (t(6) = 2.23, p = 0.07), whereas significant differences emerged following tDCS (t(6) = −3.01, p = 0.024). Results are summarized in **Figure 4**.

Again, we calculated regression analyses with post-tDCS synchronization as the dependent variable and baseline performance as the predictor. In participants with small baseline asynchronies, baseline performance significantly predicted post-tDCS synchronization in the cathodal condition (F(1,5) = 9.55, p = 0.03, R² = 0.66, β = 0.81) but not in the anodal condition (F(1,5) = 1.80, p = 0.24, R² = 0.26, β = 0.51). In participants with large baseline asynchronies, no significant effect was found (anodal: F(1,5) = 0.46, p = 0.53, R² = 0.08, β = 0.29; cathodal: t(5) = −1.64, p = 0.16, R² = 0.35, β = −0.59).

Finally, we analyzed whether the effects of tDCS on the size of the negative asynchrony are modulated by continuation performance. To this end, an analysis of covariance (ANCOVA) was calculated with pre-tDCS ITIs in the continuation task as the covariate. In drummers, this analysis yielded a non-significant effect of time in the anodal condition (F(1,16) = 0.02, p = 0.89). In non-musicians, the significant effect of time remained (F(1,11) = 5.11, p = 0.05). No modulation following cathodal tDCS was found in either group.

#### DISCUSSION

The analysis across the entire group suggests that, in professional drummers, anodal tDCS over the left PPC increases the tap-to-tone asynchrony of the right hand. No significant effect on reaction times was found. The data are in line with the hypothesis that the PPC is involved in precise auditory-motor synchronization, but not in motor control per se. The overall effects in drummers resembled those observed in non-musicians (Krause et al., 2014). The sub-group analysis with respect to baseline performance suggests that tDCS influenced synchronization only in drummers with small asynchronies. In non-musicians, by contrast, tDCS modulated the tap-to-tone asynchrony in participants with large asynchronies. The data support the functional significance of the left PPC for auditory-motor synchronization of the right hand (Krause et al., 2012, 2014). They extend these findings by showing that behavioral tDCS effects vary with baseline performance and musical expertise. In professional drummers, the PPC seems to be relevant for keeping the rhythm exactly. In non-musicians, this area might rather serve to keep participants from deviating from a given pace within a broader range.

## Motor Timing in Musicians

Structural as well as functional reorganization in the musician's brain is well established (e.g., Schlaug, 2001; Münte et al., 2002; Gaser and Schlaug, 2003; Herholz and Zatorre, 2012). Early musical practice drives gray matter plasticity in the ventral premotor cortex (vPMC; Bailey et al., 2014) and white matter volume in the cerebellum (Baer et al., 2015). These changes were correlated with accuracy in an auditory rhythmic synchronization task, in which subjects synchronize their finger taps with rhythms of varying metrical complexity. The vPMC has been particularly related to visuo-motor integration (for a review, see Chen et al., 2009). Interestingly, a brain imaging study indicated that activity in this area is not sensitive to the metrical structure of a rhythm (reviewed in Chen et al., 2009). This suggests that the vPMC is less involved in higher-order aspects of movement control. In contrast, the dPMC has been suggested to play a crucial role in auditory-motor integration in a synchronization task (Pollok et al., 2008; for a review, see Chen et al., 2009). Findings by Chen et al. (2009) furthermore showed that activity within auditory cortices and the dPMC varies with metrical salience, as measured by functional magnetic resonance imaging (fMRI).

A possible contribution of the dorsal PPC (BA 7) to motor timing in musicians has so far received less attention in the literature. A recently published study suggests its involvement in the processing of rhythmic deviations in musicians after short-term sensorimotor training (Lappe et al., 2016). This finding is in line with the hypothesis that the processing of temporal and spatial stimuli relies on auditory as well as parietal and prefrontal brain areas (Di Pietro et al., 2004; Koch et al., 2009). The present results extend current knowledge by providing evidence that (i) this area is indeed involved in anticipatory motor control and (ii) its functional involvement varies with the size of the baseline asynchrony as well as with musical expertise.

In contrast to the observed tDCS effects on synchronization performance, tDCS affected neither continuation performance nor simple reaction times in a polarity-specific manner.

Nevertheless, the sub-group analysis of drummers with large baseline ITIs revealed a main effect of time as well as a trend towards significance for the factor stimulation. The main effect of time was characterized by larger ITIs before than after tDCS. In drummers with small ITIs, by contrast, a trend for this factor was due to larger ITIs after tDCS than before stimulation. Since we did not find a polarity-specific effect, we cannot exclude the possibility that this result reflects training or unspecific tDCS effects. Interestingly, results from the ANCOVA suggest that in drummers, post-tDCS synchronization performance is modulated by pre-stimulation continuation efficiency. This finding suggests an interdependency between both tasks in this group and supports the hypothesis that in drummers the PPC might be crucial for precise motor timing independent of a pacing signal. Overall, this analysis suggests that motor timing may be controlled differently in drummers than in non-musicians.

## PPC and Motor Timing

Precise motor timing is associated with a cerebello-thalamocortical network (Karabanov et al., 2009; Pecenka et al., 2013; for a review, see Coull and Nobre, 2008; Chen et al., 2009). Besides primary and premotor areas, the PPC has been suggested, on the basis of magnetoencephalography (MEG), to be of particular importance for superior synchronization in professional drummers (Krause et al., 2010b). That study revealed a significantly stronger functional interaction between the thalamus and the PPC in professional drummers. It also showed a stronger functional interaction between the thalamus and the dPMC, but this was found in professional pianists as well; this interaction is therefore less likely to account for the superior synchronization observed in drummers.

A recent study of patients with stroke-related brain lesions suggests that lesions of the left but not the right PPC impair accuracy in the double-step task, which requires the modification of an ongoing arm movement (Mutha et al., 2014). Accordingly, inhibitory TMS over the left PPC disturbed this ability (Desmurget et al., 1999), and increased parietal activation was shown when an ongoing action had to be modified (Mars et al., 2007). These data suggest a critical role of the left PPC for action modification, in particular when movements are guided by actual and predicted sensory information. Furthermore, the present findings are in line with studies suggesting the involvement of parietal and premotor areas in motor timing (Coull et al., 2013; for a review, see Coull and Nobre, 2008). However, it should be stressed that those studies point to a specific relevance of the inferior PPC for precise timing, whereas the present data point to a contribution of its superior part. Given the stronger involvement of the inferior PPC in visuo-motor integration (for a review, see Chen et al., 2009), this apparent discrepancy may be explained by the different modalities of the sensory cues used: the present study applied auditory pacing signals, whereas Coull et al. (2013) used visual stimuli.

The present data contrast with those of Vicario et al. (2013), who showed overestimation of reproduced time intervals following cathodal tDCS over the right PPC and reduced variability following cathodal stimulation of the left PPC. It should be stressed that that study investigated the reproduction of temporal intervals in the supra-second range, which possibly requires visuo-spatial attention. Thus, the behavioral effects observed by Vicario et al. (2013) are likely mediated by attentional changes following right PPC tDCS. Combining those data with the present ones, one may conclude that the PPC contributes differentially to auditory-motor synchronization in the sub- and supra-second ranges. However, this interpretation remains speculative at present.

We argue in favor of a specific relevance of the PPC for auditory-motor synchronization, although tDCS may have affected somatosensory processing or auditory-somatosensory integration rather than motor control. Indeed, previous studies suggest that tDCS applied to the PPC modulates multisensory integration of one's own body (for a review, see Azañón and Haggard, 2009). Thus, it remains open whether the effect observed in the present study is indeed due to changes in auditory-motor synchronization or to a modulation of the necessary somatosensory input.

## A Possible Contribution of the Primary Motor Cortex

Previous studies suggest that TMS (Koch et al., 2007; Karabanov et al., 2013) as well as tDCS over the PPC (Rivera-Urbina et al., 2015) yield changes in M1 excitability, likely due to a modulation of functional connectivity between both areas (Rivera-Urbina et al., 2015). Since these effects occurred at relatively long intervals of 10 and 15 ms, they are likely due to a modulation of polysynaptic pathways, possibly involving the basal ganglia and/or the thalamus (Rivera-Urbina et al., 2015). That tDCS can affect cortical-subcortical connectivity has previously been demonstrated by combining neuroimaging with tDCS applied to M1 (Polania et al., 2012). Results from other studies provide further evidence for monosynaptic connections between M1 and PPC (Koch et al., 2007; Koch and Rothwell, 2009; Karabanov et al., 2013), which were found particularly in the inferior parietal sulcus, an area that has been related to visuomotor interaction (Cohen and Andersen, 2002; Grefkes and Fink, 2005). Regarding the dorsal PPC, the strength of functional M1-PPC connectivity was found to vary during learning, being stronger at the beginning and returning to baseline after training (Karabanov et al., 2012). Because the synchronization paradigm is simple and the subjects were not trained on the task, the effects observed here are less likely due to learning-induced changes in M1 excitability. In addition, previous results from our group do not support this hypothesis, since in non-musicians tDCS applied to M1 did not change the tap-to-tone asynchrony (Krause et al., 2014). Those data suggest that M1 is more strongly involved in motor implementation than in motor timing. The assumption that M1 and PPC contribute differentially to motor control has been supported by a study investigating the effects of tDCS over both areas on skilled motor function (Convento et al., 2014).

Finally, if the present results were indeed due to changes in M1 excitability, we would expect a general effect on movement execution independent of movement type. We cannot exclude the possibility that PPC tDCS affected M1 excitability, but the observed behavioral effects appear less likely to be due to such changes.

## Why Does Anodal tDCS Increase the Asynchrony?

The primary mechanism underlying tDCS effects is most likely an alteration of the resting membrane potential (for reviews, see Stagg and Nitsche, 2011; Shin et al., 2015). Based on stimulation effects on corticospinal excitability as determined by MEP changes, it has been suggested that anodal tDCS enhances motor cortical excitability, whereas cathodal stimulation has the reverse effect (for reviews, see Stagg and Nitsche, 2011; Shin et al., 2015). However, improved performance following cathodal tDCS has been found in attentional (Weiss and Lavidor, 2012) and complex perceptual tasks (Antal et al., 2004) as well as in planning abilities (Dockery et al., 2009).

In a previous study, we applied inhibitory rTMS over the left and right PPC, respectively, and found smaller tap-to-tone asynchronies compared to baseline following left PPC rTMS (Krause et al., 2012). Assuming that the PPC is relevant for the comparison between predicted and actual sensory feedback (Blakemore and Sirigu, 2003), we put forward the hypothesis that this potentially time-consuming mechanism is important for complex motor tasks but detrimental to the relatively easy synchronization task (Krause et al., 2012). By the same logic, facilitating PPC function with anodal tDCS would be expected to enlarge the asynchrony.

We would like to stress that in precise drummers the asynchrony changed from a positive mean before tDCS to a negative mean after stimulation, while its magnitude remained comparable. Thus, for this group the assumption of larger tap-to-tone asynchronies following anodal tDCS does not hold. Nevertheless, although the absolute asynchrony was not larger after anodal stimulation, the data suggest that tDCS interferes with synchronization accuracy, with an effect that varies depending on baseline performance.

The present results support previous findings by Krause et al. (2014) and provide further evidence for the involvement of the PPC in anticipatory motor control rather than in motor control in general.

## Limitations

A main limitation of the present study is the lack of a sham condition. Thus, tDCS effects can be estimated only by comparison with baseline performance. Unexpectedly, the tap-to-tone asynchrony at baseline differed between tDCS conditions. This raises the question of whether post-tDCS effects were driven by baseline differences. To clarify this issue, regression analyses were calculated. If post-tDCS synchronization performance were mainly driven by baseline differences, one would expect a significant regression in drummers with small and in non-musicians with large baseline asynchronies. However, the analyses did not show such regressions, which weakens the hypothesis that effects on synchronization accuracy were due to baseline differences. We would therefore argue that the modulation of synchronization performance reflects a "real" tDCS effect rather than an effect of pre-tDCS performance.
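The regression check described above can be sketched as an ordinary least-squares fit of post-tDCS asynchrony on baseline asynchrony within one sub-group. This is a minimal sketch under stated assumptions, not the authors' analysis code. The function name `simple_regression` and all input values are hypothetical. The sketch reports only the slope, intercept, Pearson r and the slope's t statistic (df = n - 2); the p-value is left to a statistics package.

```python
import math


def simple_regression(x, y):
    """OLS fit of y = intercept + slope * x.

    Returns (slope, intercept, r, t), where r is Pearson's correlation and
    t is the t statistic of the slope with n - 2 degrees of freedom.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired samples with n >= 3")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1.0:
        t = math.copysign(math.inf, r)  # perfect fit: t is unbounded
    else:
        t = r * math.sqrt((n - 2) / (1.0 - r ** 2))
    return slope, intercept, r, t


# Hypothetical baseline and post-tDCS asynchronies (ms) for one sub-group.
baseline = [-12.0, -30.0, -8.0, -22.0, -15.0, -27.0]
post_tdcs = [-18.0, -20.0, -16.0, -25.0, -11.0, -21.0]
print(simple_regression(baseline, post_tdcs))
```

A non-significant regression, in the sense used in the text, would be a t statistic below the critical value for the sub-group's degrees of freedom.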

Another limitation concerns the median split of baseline performance used for the sub-group analysis. This analysis (i) does not consider the continuous nature of the outcome measures (i.e., synchronization and continuation) and (ii) does not allow a comparison of the most precise with the most imprecise performers. Nevertheless, we would like to stress that the investigation of such extreme groups was not the aim of the present study. The data should be seen as a piece of evidence for the hypothesis that the functional significance of the PPC might vary depending on the interaction between baseline performance and musical expertise.
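A median split of the kind described above can be sketched as follows. The text does not say whether signed or absolute baseline asynchronies were split, or how ties at the median were handled. This sketch assumes absolute values and places ties in the "small" group. The function name and participant IDs are hypothetical.

```python
from statistics import median


def median_split(baseline):
    """Split participants at the median absolute baseline asynchrony (ms).

    `baseline` maps a participant ID to a mean baseline asynchrony.
    Assumptions: the split uses absolute values, and participants exactly
    at the median are assigned to the 'small' group.
    """
    cutoff = median(abs(v) for v in baseline.values())
    small = sorted(pid for pid, v in baseline.items() if abs(v) <= cutoff)
    large = sorted(pid for pid, v in baseline.items() if abs(v) > cutoff)
    return cutoff, small, large


# Hypothetical baseline asynchronies for four drummers.
cutoff, small, large = median_split({"d01": -12.0, "d02": 30.0, "d03": -5.0, "d04": 18.0})
print(cutoff, small, large)
```

This also illustrates the limitation raised in the text. Two participants at 14.9 ms and 15.1 ms would fall into different groups, even though their performance is nearly identical.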

In addition, we acknowledge that drummers and non-musicians were not matched with respect to age and gender. For that reason we did not directly compare both groups. However, despite smaller tap-to-tone asynchronies in drummers, the overall effect of tDCS on synchronization performance was comparable in both groups. Thus, we would argue that the results of the sub-group analysis are less likely due to such group differences.

Finally, we realize that although the tap-to-tone asynchrony is usually negative in non-musicians, it can also be positive, particularly in musicians. Thus, comparing the tap-to-tone asynchrony between conditions and groups might yield misleading findings (e.g., individuals who always tap close to the pacing signal will show a mean asynchrony comparable to that of individuals producing larger positive and negative asynchronies). Mean values therefore need to be interpreted cautiously, and a proper interpretation of synchronization data requires consideration of the variability across subjects.
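The averaging problem described above can be made concrete with a small sketch. It assumes that asynchrony is defined as tap onset minus tone onset, so negative values mean the tap came first. The sketch contrasts the mean signed asynchrony with the mean absolute asynchrony and the standard deviation. The function name, the 500-ms pacing interval and the tap times are hypothetical.

```python
from statistics import mean, stdev


def asynchrony_summary(tap_onsets_ms, tone_onsets_ms):
    """Summarise tap-to-tone asynchronies (tap onset minus tone onset, ms).

    Negative values mean the tap preceded the tone.
    """
    if len(tap_onsets_ms) != len(tone_onsets_ms):
        raise ValueError("each tap must be paired with exactly one tone")
    asyncs = [tap - tone for tap, tone in zip(tap_onsets_ms, tone_onsets_ms)]
    return {
        "mean_signed": mean(asyncs),
        "mean_absolute": mean(abs(a) for a in asyncs),
        "sd": stdev(asyncs),
    }


tones = [0, 500, 1000, 1500]  # hypothetical isochronous pacing tones
steady = asynchrony_summary([-2, 498, 998, 1498], tones)   # always 2 ms early
erratic = asynchrony_summary([30, 466, 1030, 1466], tones)  # +30 / -34 ms
print(steady)
print(erratic)
```

Both hypothetical tappers have the same mean signed asynchrony. Only the absolute mean and the SD reveal that the second tapper is much less precise.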

# CONCLUSION

The present results suggest that the functional relevance of the PPC for precise auditory-motor synchronization might differ depending on musical expertise. While in drummers the PPC might be relevant for keeping the pace exactly, in non-musicians the PPC might rather help prevent participants from deviating from a given pace within a broader range.

# AUTHOR CONTRIBUTIONS

BP: conception and design of the experiment, data collection and analyses, interpretation of the data, drafting the article; KS: data collection and analyses; AK: data collection, interpretation of the data, critical revision of the article; VK: conception and design of the experiment, data collection, interpretation of the data, critical revision of the article; NKS: conception and design of the experiment, interpretation of the data, critical revision of the article.

# FUNDING

BP is grateful for financial support by a grant from the Deutsche Forschungsgemeinschaft (DFG): PO806-3.

# ACKNOWLEDGMENTS

We would like to thank Dr. Markus Butz for his valuable comments on the manuscript.


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Pollok, Stephan, Keitel, Krause and Schaal. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

fnhum-11-00110 March 17, 2017 Time: 17:32 # 1

# No Evidence That Short-Term Cognitive or Physical Training Programs or Lifestyles Are Related to Changes in White Matter Integrity in Older Adults at Risk of Dementia

Patrick Fissler<sup>1,2</sup>\*, Hans-Peter Müller<sup>2</sup>, Olivia C. Küster<sup>1,2</sup>, Daria Laptinskaya<sup>1</sup>, Franka Thurm<sup>3</sup>, Alexander Woll<sup>4</sup>, Thomas Elbert<sup>5</sup>, Jan Kassubek<sup>2</sup>, Christine A. F. von Arnim<sup>2</sup> and Iris-Tatjana Kolassa<sup>1</sup>

<sup>1</sup> Clinical and Biological Psychology, Institute of Psychology and Education, Ulm University, Ulm, Germany, <sup>2</sup> Department of Neurology, University Hospital Ulm, Ulm, Germany, <sup>3</sup> Department of Psychology, Technische Universität Dresden, Dresden, Germany, <sup>4</sup> Institute of Sports and Sports Science, Karlsruhe Institute of Technology, Karlsruhe, Germany, <sup>5</sup> Department of Psychology, University of Konstanz, Konstanz, Germany

#### Edited by:

Soledad Ballesteros, National University of Distance Education, Spain

#### Reviewed by:

Betty M. Tijms, VU University Medical Center, Netherlands

Joaquin Alberto Anguera, University of California, San Francisco, USA

> \*Correspondence: Patrick Fissler patrick.fissler@uni-ulm.de

Received: 31 October 2016 Accepted: 22 February 2017 Published: 20 March 2017

#### Citation:

Fissler P, Müller H-P, Küster OC, Laptinskaya D, Thurm F, Woll A, Elbert T, Kassubek J, von Arnim CAF and Kolassa I-T (2017) No Evidence That Short-Term Cognitive or Physical Training Programs or Lifestyles Are Related to Changes in White Matter Integrity in Older Adults at Risk of Dementia. Front. Hum. Neurosci. 11:110. doi: 10.3389/fnhum.2017.00110

Cognitive and physical activities can benefit cognition. However, knowledge about the neurobiological mechanisms underlying these activity-induced cognitive benefits is still limited, especially with regard to the role of white matter integrity (WMI), which is affected in cognitive aging and Alzheimer's disease. To address this knowledge gap, we investigated the immediate and long-term effects of cognitive or physical training on WMI, as well as the association between cognitive and physical lifestyles and changes in WMI over a 6-month period. Additionally, we explored whether changes in WMI underlie activity-related cognitive changes, and estimated the potential of both trainings to improve WMI by correlating training outcomes with WMI. In an observational and interventional pretest, posttest, 3-month follow-up design, we assigned 47 community-dwelling older adults at risk of dementia to 50 sessions of auditory processing and working memory training (n = 13), 50 sessions of cardiovascular, strength, coordination, balance and flexibility exercises (n = 14), or a control group (n = 20). We measured lifestyles through self-reports, cognitive training skills through training performance, functional physical fitness through the Senior Fitness Test, and global cognition through a cognitive test battery. WMI was assessed via a composite score of diffusion tensor imaging-based fractional anisotropy (FA) of three regions of interest shown to be affected in aging and Alzheimer's disease: the genu of the corpus callosum, the fornix, and the hippocampal cingulum. Effects of the training interventions on FA outcomes, as well as associations between lifestyles and changes in FA outcomes, were not significant. Additional analyses showed associations between cognitive lifestyle and global cognitive changes at the posttest and the 3-month follow-up (β ≥ 0.40, p ≤ 0.02), and accounting for changes in WMI did not affect these relationships. The targeted training outcomes were related to FA scores at baseline (cognitive training skills and FA composite score, r<sub>s</sub> = 0.68, p = 0.05; functional physical fitness and fornix FA, r = 0.35, p = 0.03). Overall, we found no evidence of a link between short-term physical or cognitive activities and WMI changes, despite activity-related cognitive changes in older adults at risk of dementia. However, we found positive associations between the two targeted training outcomes and WMI, hinting at a potential of long-term activities to affect WMI.

Keywords: white matter integrity, cognitive training, physical training, cognitive lifestyle, physical lifestyle, older adults, memory complaints, dementia

#### INTRODUCTION

An active cognitive and physical lifestyle can reduce the risk of cognitive decline (Valenzuela and Sachdev, 2006; Sofi et al., 2011; Ngandu et al., 2015) and dementia (Valenzuela and Sachdev, 2006; Hamer and Chida, 2009; Barnes and Yaffe, 2011). Cognitive training programs and video games have shown cognitive benefits (Karbach and Verhaeghen, 2014; Lampit et al., 2014; Toril et al., 2014; Ballesteros et al., 2015), and first evidence indicates that cognitive training reduces the incidence of dementia over a 10-year period (Edwards et al., 2016). Similarly, physical activity has yielded promising results with regard to cognitive benefits (Smith et al., 2010; Nagamatsu et al., 2012; Kattenstroth et al., 2013).

Revealing the neurobiological mechanisms of the activity-induced prevention of cognitive decline and dementia could pave the way for an endogenous (Sale et al., 2014), personalized treatment approach (Cuthbert and Insel, 2013). By understanding the mechanisms of intervention effects, the identified neuropathological processes in a given patient can be targeted in an individualized fashion (Cuthbert and Insel, 2013). For example, cognitively impaired patients with deteriorated white matter integrity (WMI) may benefit more from an intervention that targets this microstructural impairment than would a patient with the same behavioral syndrome but normal WMI.

However, our knowledge of the neurobiological mechanisms underlying the beneficial cognitive effects of an active lifestyle and training interventions is still in its infancy. Although there is initial evidence of functional and structural brain changes through cognitive and physical activity (Valenzuela et al., 2008, 2011; Erickson et al., 2011; Buschkuehl et al., 2012; Voss et al., 2012; Bennett D.A. et al., 2014; von Bastian and Oberauer, 2014; Constantinidis and Klingberg, 2016), the role of WMI in activity-related cognitive changes is largely unclear.

Cognitive and physical activity may increase WMI through activity-related myelination (Fields, 2015) that could lead to cognitive benefits. However, current evidence is inconsistent. While some studies support this mechanism for cognitive (Lövdén et al., 2010; Takeuchi et al., 2010; Engvig et al., 2011; Sagi et al., 2012; Steele et al., 2013; Salminen et al., 2016) and physical activities (Chaddock-Heyman et al., 2014; Svatkova et al., 2015), others do not (Voss et al., 2012; Chapman et al., 2013; Strenziok et al., 2014; Lampit et al., 2015). For example, Lampit et al. (2015) did not find cognitive training-induced changes in WMI, despite positive effects on global cognition, and Voss et al. (2012) did not observe positive effects on WMI following an extensive exercise program of three weekly 40-min sessions over the period of 1 year in a sample of 70 participants.

Moreover, there are four knowledge gaps in our understanding of physical and cognitive activity-related WMI changes: first, training-induced WMI changes in tracts shown to be affected in cognitive aging and Alzheimer's disease (Head et al., 2004; Ringman et al., 2007; Madden et al., 2012; Wang et al., 2012; Kantarci, 2014; Salat, 2014); second, training-induced WMI changes in a population of older adults at risk of dementia; third, the maintenance of training-induced WMI changes; and fourth, lifestyle-related WMI changes.

To address the inconsistent findings and the knowledge gaps, this study had two primary aims: First, to assess the immediate and long-term effects of cognitive and physical training programs on the integrity of tracts shown to be affected in cognitive aging and Alzheimer's disease (the genu of the corpus callosum, the fornix, and the hippocampal cingulum) in older adults at risk of dementia, and second, to investigate the relationship between cognitive and physical lifestyles and changes in WMI over the 6-month study period.

As additional analyses, we assessed the association at baseline between the two targeted training outcomes (cognitive training skills, functional physical fitness) and WMI in order to reveal the potential of training programs to affect WMI. Finally, we investigated whether changes in WMI could account for activity-related cognitive changes, that is, whether changes in WMI underlie these cognitive changes.

For the cognitive training program, we used a computer-based training program targeting auditory processing and working memory that has been shown to have robust cognitive benefits (Smith et al., 2009; Zelinski et al., 2011, 2014; Bamidis et al., 2015; Shah et al., 2017). For the physical training program, we used a multimodal training regime based on a program previously shown to have cognitive benefits (Thurm et al., 2011). The use of a multimodal exercise program is consistent with findings of larger cognitive benefits from combined aerobic and strength training than from aerobic exercise only (Colcombe et al., 2006; Smith et al., 2010).

With regard to our primary objectives, we hypothesized that the cognitive and physical training groups, in contrast to a passive control group, would exhibit an increase in the fractional anisotropy (FA) composite score at posttest and at the 3-month follow-up. We expected that self-reported active cognitive and physical lifestyles at baseline would be positively associated with changes in the FA composite score at both follow-ups.

#### MATERIALS AND METHODS


#### Study Design

This 10-week interventional, two-center, controlled clinical trial (Ulm and Konstanz, Germany) was a three-arm, assessor-blinded study evaluating training- and lifestyle-related changes in WMI. This diffusion tensor imaging (DTI) study comprises a subsample of participants of the main study, whose results on cognitive outcomes have been reported previously (Küster et al., 2016). In that study, we found that the associations of an active lifestyle with cognitive changes over time were stronger than the effects of specifically designed cognitive or physical training interventions in the same period.

#### Participants

For inclusion in the study, participants had to be 55 years or older, have subjective memory complaints and either objective memory impairment [Munich Verbal Learning Test (Ilmberger, 1988): average of the learning and free long-delayed recall trials below −1 SD of the age norm] or clinically apparent memory impairment (e.g., increased difficulty locating objects, keeping appointments, remembering conversations or events), have corrected-to-normal vision and hearing, and be fluent in German. Exclusion criteria were a moderate or severe stage of dementia [Mini Mental State Examination (MMSE) < 20], changes in antidementive or antidepressive medication within 3 months prior to study initiation, a history of severe psychiatric or neurologic disorders, or physical impairment that would prevent participation in the physical training program. Participants without contraindications for magnetic resonance imaging (MRI) were offered the opportunity to participate in the MRI subsample.

Subjects were recruited via newspaper articles, flyers, informative meetings at community centers, and personal contacts in the memory clinics of the University Hospital Ulm and the Reichenau Psychiatry Center in Konstanz. The study was approved by the Ethics Committees of the University of Konstanz and Ulm University, Germany. Participants gave written informed consent at screening visits before enrollment in the study.

Of the 122 individuals we screened, 65 were enrolled in the intervention study (Küster et al., 2016); of these, 47 participated in the MRI subsample and were assigned to a 10-week cognitive training group (five sessions/week, n = 13), a physical training group (five sessions/week, n = 14), or a passive control group (n = 20, see **Figure 1**).

The analysis included 39 participants (83% of the 47 participants in the MRI subsample). Apart from the FA of the hippocampal cingulum, the three groups did not significantly differ in terms of demographics, FA outcomes, cognitive outcomes, lifestyles, or study-related data, even without adjusting for multiple comparisons (see **Table 1**).

#### Procedure

Outcome variables were assessed within 4 weeks before the 10-week intervention, within 4 weeks after the intervention, and another 3 months later to measure training- and lifestyle-related changes in WMI. Due to logistical constraints (e.g., limited available facilities, a highly selected study sample with more than 60% exclusions at screening, the required time commitment of participants, the limited time between pretest and the start of the intervention, and the time slots of the physical training program), it was not possible to include enough participants to allow both randomized allocation and a sufficient number of participants to start a new group-based physical training program. To avoid selection bias, the groups were matched in terms of age, education, gender, and MMSE. When a new physical training program started, all successfully screened participants were allocated to this group until the required number of participants was reached. During the other time periods, a minimization approach was used to allocate participants to the cognitive training and control groups in order to minimize group differences in age, gender, education, and MMSE. Neuropsychological outcome assessors were blind to the group allocation of participants, although in rare cases participants disclosed their group assignment during the neuropsychological assessment. Blinding of participants was not feasible due to the nature of the behavioral interventions.
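The minimization step between the cognitive training and control groups could work roughly as sketched below. The text does not specify the algorithm. This sketch uses a deterministic Pocock-Simon-style marginal-range rule and assumes that age, education and MMSE were binned into categories, using hypothetical cut-offs. Many published minimization schemes add a random element, which is omitted here. All names are illustrative.

```python
ARMS = ("cognitive_training", "control")


def to_levels(age, gender, education_years, mmse):
    """Bin matching variables into categorical levels.

    The cut-offs are hypothetical and serve only this illustration.
    """
    return {
        "age": "<70" if age < 70 else ">=70",
        "gender": gender,
        "education": "<12" if education_years < 12 else ">=12",
        "mmse": "<27" if mmse < 27 else ">=27",
    }


def imbalance_if_assigned(new, arm, allocated):
    """Total marginal imbalance across factors if `new` joins `arm`.

    For each factor, count already-allocated participants per arm who share
    the newcomer's level, add the newcomer to `arm`, and sum the ranges
    (max - min) of these counts.
    """
    total = 0
    for factor, level in new.items():
        counts = {a: 0 for a in ARMS}
        for person, person_arm in allocated:
            if person[factor] == level:
                counts[person_arm] += 1
        counts[arm] += 1
        total += max(counts.values()) - min(counts.values())
    return total


def allocate(new, allocated):
    """Assign the arm with the lowest imbalance; ties go to the first arm."""
    return min(ARMS, key=lambda arm: imbalance_if_assigned(new, arm, allocated))


allocated = []  # list of (levels, arm) in order of enrolment
for person in [(72, "f", 10, 28), (75, "f", 9, 29), (66, "m", 13, 26)]:
    levels = to_levels(*person)
    arm = allocate(levels, allocated)
    allocated.append((levels, arm))
    print(person, "->", arm)
```

The second participant shares every level with the first, so the rule sends them to the other arm. The third shares none, so the tie is broken in favour of the first arm.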

#### Outcomes

#### MRI Analysis

#### **Data recording**

MRI data were acquired on 1.5 Tesla scanners at the two study centers, Ulm University (center 1, Magnetom Symphony, Siemens Medical) and the University of Konstanz (center 2, Intera, Philips Medical Systems). The DTI protocol consisted of 2 × 30 gradient directions with b = 1000 s/mm<sup>2</sup> and two b = 0 images. At both centers, slice thickness was 2.5 mm and in-plane pixel size was 1.875 mm × 1.875 mm; 55 slices and 62 slices (each 128 pixels × 128 pixels) were recorded at center 1 and center 2, respectively. The echo time and repetition time were 28 ms and 3080 ms at center 1 and 70 ms and 8035 ms at center 2.

#### **Data processing**

DTI analysis was performed using the software package Tensor Imaging and Fiber Tracking (TIFT; Müller et al., 2007; Müller and Kassubek, 2013). For longitudinal data analysis, affine halfway linear registration (Menke et al., 2014) was employed: pretest and posttest images were halfway-transformed, whereas follow-up images were affine-transformed to the transformed pretest images. FA maps were calculated and smoothed with a Gaussian filter with a full width at half maximum (FWHM) of 2 voxels (Madhyastha et al., 2014). Individualized FA templates were calculated from the FA maps of all available measurements of each individual, and regions of interest (ROIs) were set on these individualized templates. Because ROIs were defined in each participant's own space, transformation to Montreal Neurological Institute (MNI) space was not necessary.
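Two quantities in this step can be written down explicitly. The first is the textbook definition of FA from the three eigenvalues of the diffusion tensor. The second is the conversion of the 2-voxel FWHM into the standard deviation of the Gaussian kernel, using sigma = FWHM / (2 * sqrt(2 ln 2)). The sketch below is not TIFT code, whose internals are not described here. The function names and eigenvalues are hypothetical.

```python
import math


def fractional_anisotropy(l1, l2, l3):
    """Fractional anisotropy from the three eigenvalues of a diffusion tensor."""
    num = (l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    if den == 0:
        return 0.0  # no diffusion signal: define FA as 0
    return math.sqrt(0.5 * num / den)


def fwhm_to_sigma(fwhm):
    """Standard deviation of a Gaussian kernel with the given FWHM."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


# Hypothetical eigenvalues (mm^2/s) of a strongly anisotropic white matter voxel.
print(round(fractional_anisotropy(1.7e-3, 0.3e-3, 0.3e-3), 3))
# Kernel width for a 2-voxel FWHM, in voxels.
print(round(fwhm_to_sigma(2.0), 3))
```

FA ranges from 0 for isotropic diffusion (equal eigenvalues) to 1 for diffusion along a single axis. A 2-voxel FWHM corresponds to a sigma of roughly 0.85 voxels.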


#### Regions of Interest

Regions of interest were defined to focus on white matter correlates of cognitive aging and Alzheimer's disease (Head et al., 2004; Ringman et al., 2007; Madden et al., 2012; Wang et al., 2012; Kantarci, 2014; Salat, 2014). To this end, the WMI of hippocampus-related limbic tracts and prefrontal cortex tracts was examined: the genu of the corpus callosum, the fornix, and the hippocampal cingulum (see **Figure 2**). The tracts in the genu of the corpus callosum connect the two prefrontal cortices (Hofer and Frahm, 2006), and their white matter integrity has been shown to correlate with executive function (Madden et al., 2009). The fornix and the hippocampal cingulum interconnect the hippocampus with distributed brain areas; their WMI correlates with episodic memory (Bennett I.J. et al., 2014; Bennett and Stark, 2015; Ezzati et al., 2015).

Within each of the three ROIs (the genu of the corpus callosum, the fornix, and the hippocampal cingulum), two non-overlapping subregions were set and their values averaged in order to increase reliability. In the genu of the corpus callosum, the two 515-voxel subregions were set in the center of the tract, one in the center of the genu on the midsagittal slice and one six voxels further in the right lateral direction. In the fornix, the two 33-voxel subregions were set in the center of the tract, one halfway between the anterior and posterior ends of the fornix on the midsagittal slice and one four voxels away in the anterior-ventral direction. In the hippocampal cingulum, the two 33-voxel subregions were set on the same coronal slice in the center of the tract in both hemispheres. The coronal slice was selected as the most anterior and dorsal area of the pyramidal tract. This slice, located anterior to the posterior commissure, generally cuts through the anterior pons and the midsection of the hippocampal cingulum.

The lower threshold for FA values was set to 0.2 to increase the probability that only white matter voxels would be included in the measurements (Kunimatsu et al., 2004). If fewer than 75% of all possible voxels in a subregion were above this threshold, the threshold was lowered accordingly. This was necessary in only one participant, for whom the threshold had to be lowered to 0.17 to include more than 75% of the fornix voxels.
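The threshold rule can be expressed as a small loop. The Python sketch below rests on several assumptions: the text does not state how far the threshold was lowered per step, so a hypothetical step of 0.01 is used; "above the threshold" is read as strictly greater; and the function name and voxel values are illustrative only.

```python
def adaptive_fa_threshold(fa_values, start=0.2, min_fraction=0.75, step=0.01, floor=0.0):
    """Return (threshold, mean FA of retained voxels) for one subregion.

    Starts at an FA threshold of 0.2 and lowers it in hypothetical steps of 0.01
    until at least 75% of the subregion's voxels lie above it.
    """
    n = len(fa_values)
    threshold = start
    while threshold >= floor:
        kept = [v for v in fa_values if v > threshold]
        if kept and len(kept) / n >= min_fraction:
            return threshold, sum(kept) / len(kept)
        # Rounding avoids floating-point drift when stepping down (0.2 -> 0.19 -> ...).
        threshold = round(threshold - step, 10)
    raise ValueError("no threshold above the floor retains enough voxels")


voxels = [0.35, 0.30, 0.25, 0.19, 0.18]  # hypothetical subregion FA values
threshold, mean_fa = adaptive_fa_threshold(voxels)
print(threshold, round(mean_fa, 4))  # 0.18 0.2725
```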

#### TABLE 1 | Baseline characteristics of study groups.



<sup>a</sup>p-values of group comparisons refer to one-way ANOVA for continuous variables and to χ² tests for categorical variables. gCC, genu of the corpus callosum; HC, hippocampal cingulum; MMSE, Mini Mental State Examination; n, number of participants.

#### Composite Score of WMI

A composite score of the three ROIs was constructed in order to increase statistical power by avoiding multiple comparison problems and by improving the reliability of the outcome. The composite score was calculated by averaging the FA values of the fornix, the hippocampal cingulum, and the genu of the corpus callosum.
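Together with the subregion averaging described above, the composite is two nested unweighted means. The following Python sketch assumes that each ROI's FA is the plain mean of its two subregion means and that the three ROIs are weighted equally. The dictionary keys and FA values are hypothetical.

```python
from statistics import mean


def roi_fa(subregion_means):
    """ROI-level FA: the average of its two non-overlapping subregions."""
    return mean(subregion_means)


def fa_composite(rois):
    """Unweighted mean of the ROI-level FA values (gCC, fornix, HC)."""
    return mean(roi_fa(subregions) for subregions in rois.values())


# Hypothetical subregion FA means for one participant at one session.
participant = {
    "gCC": (0.62, 0.58),
    "fornix": (0.40, 0.44),
    "HC": (0.30, 0.34),
}
print(round(fa_composite(participant), 4))  # 0.4467
```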

#### Cognitive Outcomes

Global cognition, episodic memory, and executive functions were assessed through an extensive cognitive test battery. Principal component analysis served to construct the three composite scores (see Küster et al., 2016). The two composite scores for episodic memory and executive function represent the weighted average of the z-standardized cognitive test scores with loadings a<sub>ij</sub> ≥ 0.4 on the respective component. The global cognition score represents the average of the two component scores.
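The scoring step after the principal component analysis can be sketched as follows. This is a minimal illustration under stated assumptions: the loadings are taken as given inputs (the PCA itself is omitted), the loadings are assumed to serve as the averaging weights, z-scores are computed relative to the sample mean and SD, and the test names, scores, and loadings are all hypothetical.

```python
from statistics import mean, stdev


def z_standardize(values):
    """z-scores relative to the sample mean and SD of the given scores."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def component_score(z_by_test, loadings, min_loading=0.4):
    """Loading-weighted average of z-scores for tests loading >= min_loading."""
    selected = {test: a for test, a in loadings.items() if a >= min_loading}
    total = sum(selected.values())
    n = len(next(iter(z_by_test.values())))
    return [sum(a * z_by_test[test][i] for test, a in selected.items()) / total for i in range(n)]


# Hypothetical raw scores (three participants) and hypothetical loadings.
raw = {"A": [10, 12, 14], "B": [18, 20, 22], "C": [6, 6, 9]}
z = {test: z_standardize(scores) for test, scores in raw.items()}
memory = component_score(z, {"A": 0.8, "B": 0.3, "C": 0.6})  # B loads < 0.4 and is dropped
executive = component_score(z, {"A": 0.1, "B": 0.7, "C": 0.5})
global_cognition = [(m + e) / 2 for m, e in zip(memory, executive)]
```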

The test battery consisted of the phonemic and semantic fluency tasks, the Trail Making Test (A and B) from the CERAD neuropsychological battery (Welsh et al., 1994), the forward and backward digit span, the digit symbol coding subtest from the Wechsler Adult Intelligence Scale-III (WAIS-III, Von Aster et al., 2006), the working-memory subtest from the Everyday Cognition Battery (Allaire and Marsiske, 1999), the free recall trial from the Alzheimer's Disease Assessment Scale – cognitive subscale (ADAS-cog, Ihl and Weyer, 1993), and the learning and free long-delayed recall trials from an adapted version of the California Verbal Learning Test (Munich Verbal Memory Test, Ilmberger, 1988).

#### Interventions

#### Cognitive Training


Participants were asked to complete a total of 50 h of computerized, home-based cognitive training within a period of 10 weeks, with five 1-h sessions per week. The training consisted of six different tasks targeting auditory processing and working memory (for details see Mahncke et al., 2006a,b; Küster et al., 2016). In each session, four different 15-min training tasks were completed. The order of the tasks varied in each session; moreover, the difficulty was adapted according to the participant's performance, and correct answers were positively reinforced. This training program was originally developed by Posit Science (San Francisco, CA, USA) and has been adapted and translated into German in a collaboration between Posit Science and the University of Konstanz. In the German version, a sound frequency discrimination task replaces the original auditory working memory task "listen and do" (see Mahncke et al., 2006b; Küster et al., 2016 for detailed training descriptions).

#### Physical Training

Participants were asked to attend a total of 20 sessions of a multimodal physical training program at the respective trial sites within a period of 10 weeks, with two 1-h sessions per week. The training was carried out in groups of 5–10 participants. In addition, participants were asked to perform a 20-min home-based physical training program three times per week (30 sessions in total). These training sessions were documented by participants and monitored by the trainers. The multimodal training program involved aerobic, strength, coordination, balance, and flexibility elements and was designed in the form of an imaginary journey. The trainers adapted the difficulty individually to match participants' needs. The structure of this training regime was based on a program that induced positive effects on cognition in a previous study on frail nursing-home residents (Thurm et al., 2011).

#### Passive Control Group

Wait-list control participants (controls) were asked to continue their daily life as usual and were given the opportunity to participate in one of the training programs after their follow-up assessment.

#### Assessment of Lifestyle

The cognitive and physical lifestyles of participants were assessed with the Community Healthy Activities Model Program for Seniors Physical Activity Questionnaire for Older Adults (CHAMPS, Stewart et al., 2001). This questionnaire describes 40 possible activities in the participants' daily life, categorized into physical activities (such as running, swimming, or bicycling) and cognitively challenging activities (such as playing cards or board games, performing voluntary work, or playing a musical instrument; see Küster et al., 2016). Participants were asked to report the activities in which they had engaged in the previous four weeks. In each domain, the number of reported activities was divided by the number of possible activities. These scores reflect the variety of the participants' cognitive and physical lifestyles, respectively.
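As a concrete reading of this variety score, the Python sketch below divides the number of reported activities in a domain by the number of activities listed for that domain. The item lists are hypothetical and abbreviated; they are not the actual CHAMPS items or their split into domains.

```python
def variety_score(reported, domain_items):
    """Proportion of a domain's listed activities that the participant reported."""
    domain = set(domain_items)
    return len(domain & set(reported)) / len(domain)


# Hypothetical, abbreviated item lists (the questionnaire describes 40 activities in total).
physical_items = ["running", "swimming", "bicycling", "walking"]
cognitive_items = ["cards or board games", "voluntary work", "musical instrument"]
reported = {"swimming", "walking", "musical instrument"}

physical_lifestyle = variety_score(reported, physical_items)    # 2 / 4 = 0.5
cognitive_lifestyle = variety_score(reported, cognitive_items)  # 1 / 3
```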

#### Cognitive Training Skills

Cognitive training skills were measured by averaging the standardized training performance in the most frequently used cognitive training tasks: "high or low," "tell us apart," "sound replay," and "match it." Changes in cognitive training skills were measured in terms of the difference between the third and the last training session (the first two training sessions were guided by trainers). Unfortunately, the cognitive training data from two individuals were not properly stored and could not be included in the analysis.
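A change score of this kind requires that both sessions be standardized on a common scale. Otherwise, standardizing each session separately would force every session mean to zero. The Python sketch below therefore assumes, as a hypothetical choice not stated in the text, that both sessions are standardized against the third-session mean and SD of each task. The data are invented.

```python
from statistics import mean, stdev

TASKS = ("high or low", "tell us apart", "sound replay", "match it")


def skill_changes(third, last):
    """Change in averaged standardized task performance, third -> last session.

    `third` and `last` are lists with one dict per participant (task -> raw score).
    Both sessions are standardized against the third-session mean and SD of each
    task (an assumed reference), so that a change in the average is preserved.
    """
    ref = {}
    for task in TASKS:
        values = [p[task] for p in third]
        ref[task] = (mean(values), stdev(values))

    def skill(scores):
        return mean((scores[t] - ref[t][0]) / ref[t][1] for t in TASKS)

    return [skill(after) - skill(before) for before, after in zip(third, last)]


# Hypothetical data: three participants, each gaining 2 raw points on every task.
third = [{t: s for t in TASKS} for s in (10, 12, 14)]
last = [{t: s + 2 for t in TASKS} for s in (10, 12, 14)]
print(skill_changes(third, last))  # [1.0, 1.0, 1.0]
```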

#### Functional Physical Fitness

Functional physical fitness was assessed with four tasks from the Senior Fitness Test (Rikli and Jones, 2001): "chair stand," "chair sit-and-reach," "2-min step," and "8-feet up-and-go," which measure strength, flexibility, endurance, and agility, respectively. Z-standardized scores were averaged to create the functional physical fitness composite score.
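The composite can be sketched as a mean of per-test z-scores. The text does not say how timed items were handled, so the Python sketch below assumes that "8-feet up-and-go" (a time, where lower is better) is sign-flipped so that higher z always means better fitness. This is an assumption, and the data are hypothetical.

```python
from statistics import mean, stdev

# Assumption: items where a lower raw value means better fitness are sign-flipped.
LOWER_IS_BETTER = {"8-feet up-and-go"}


def fitness_composite(data):
    """data: test name -> list of raw scores (one per participant)."""
    z = {}
    for test, values in data.items():
        m, s = mean(values), stdev(values)
        sign = -1.0 if test in LOWER_IS_BETTER else 1.0
        z[test] = [sign * (v - m) / s for v in values]
    n = len(next(iter(data.values())))
    # Average the standardized scores across tests for each participant.
    return [mean(z[test][i] for test in z) for i in range(n)]


data = {
    "chair stand": [10, 12, 14],          # repetitions (higher = better)
    "8-feet up-and-go": [7.0, 6.0, 5.0],  # seconds (lower = better)
}
composite = fitness_composite(data)
```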

#### Statistical Analyses

Statistical analyses were conducted using R version 3.2.1 for Windows (R Development Core Team, 2015). To assess baseline differences between the three groups, χ²-tests and one-way analyses of variance were used for categorical and continuous variables, respectively.

#### Training- and Lifestyle-Related FA and Cognitive Changes

The effects of the training interventions on WMI, as well as lifestyle-related changes in WMI, were assessed with linear mixed-effects models using maximum likelihood estimation (nlme package, Pinheiro et al., 2000). Group (with contrasts cognitive training vs. controls and physical training vs. controls), physical lifestyle, cognitive lifestyle, and time (with contrasts pre vs. post and pre vs. follow-up) were defined as fixed effects, and subject as a random intercept. Hypothesis-relevant effects were indicated by the Group × Time, Physical Lifestyle × Time, and Cognitive Lifestyle × Time interactions. Hedges' g was calculated as the difference in change scores between (1) the physical training group and the control group and (2) the cognitive training group and the control group, divided by the pooled baseline (pretest) standard deviation and corrected for bias in small samples (Lakens, 2013). Positive values indicate beneficial effects of the intervention. Standardized regression coefficients of cognitive and physical lifestyles predicting changes in outcomes were used as the effect size measure for lifestyle-related outcome changes.
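The effect size described here can be written out explicitly. The Python sketch below follows the verbal definition: the change in the training group minus the change in the control group, divided by the pooled pretest SD. For the small-sample correction, it assumes the common approximation J = 1 − 3 / (4(n₁ + n₂) − 9). It does not reproduce the mixed-effects models, which were fitted with nlme in R. The data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def hedges_g_change(pre_t, post_t, pre_c, post_c):
    """Bias-corrected standardized difference in mean change (training minus control)."""
    n_t, n_c = len(pre_t), len(pre_c)
    change_t = mean(b - a for a, b in zip(pre_t, post_t))
    change_c = mean(b - a for a, b in zip(pre_c, post_c))
    # Pooled standard deviation of the pretest (baseline) scores.
    sd_pooled = sqrt(((n_t - 1) * stdev(pre_t) ** 2 + (n_c - 1) * stdev(pre_c) ** 2)
                     / (n_t + n_c - 2))
    # Approximate small-sample correction factor (assumed form).
    j = 1 - 3 / (4 * (n_t + n_c) - 9)
    return j * (change_t - change_c) / sd_pooled


# Hypothetical data: the training group improves by 1 unit, controls do not change.
g = hedges_g_change([1, 2, 3], [2, 3, 4], [1, 2, 3], [1, 2, 3])
print(round(g, 3))  # 0.8
```

With only three participants per group, the correction factor is 0.8. It approaches 1 as the groups grow.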

#### The Potential of the Two Training Programs to Affect White Matter Integrity

To assess the potential of the cognitive and physical training programs to improve hippocampus-related and prefrontal WMI, we performed two analyses: (1) at pretest, we assessed the cross-sectional correlations of cognitive training skills and functional physical fitness with FA and cognitive outcomes, and (2) we investigated the improvement in cognitive training skills and functional physical fitness within the respective training groups. For the analyses of cognitive training skills, we used non-parametric procedures (Spearman's rank correlation and the Wilcoxon signed-rank test for paired differences) because of the small sample size (n = 9).
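Both non-parametric procedures are short enough to write out. The Python sketch below is illustrative only; in practice one would typically use R's `cor.test(method = "spearman")` and `wilcox.test(paired = TRUE)`. It computes Spearman's rho as the Pearson correlation of tie-averaged ranks. It also computes an exact two-sided Wilcoxon signed-rank p-value by enumerating all sign assignments, which is feasible for n = 9. All data are hypothetical.

```python
from itertools import product
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def wilcoxon_exact_p(before, after):
    """Two-sided exact p-value of the Wilcoxon signed-rank test.

    Zero differences are dropped; the null distribution is obtained by
    enumerating all 2**n sign assignments, which is cheap for n = 9 (512 cases).
    """
    diffs = [b - a for a, b in zip(before, after) if b != a]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = sum(ranks) / 2
    observed = abs(w_plus - expected)
    extreme = 0
    for signs in product((False, True), repeat=len(ranks)):
        w = sum(r for r, positive in zip(ranks, signs) if positive)
        if abs(w - expected) >= observed - 1e-12:
            extreme += 1
    return extreme / 2 ** len(ranks)


skills = [1, 2, 3, 4, 5]
fa = [0.40, 0.42, 0.41, 0.45, 0.47]
print(round(spearman(skills, fa), 3))  # 0.9

before = [10, 12, 9, 11, 14, 8, 13, 10, 12]
after = [b + k for k, b in enumerate(before, start=1)]
print(wilcoxon_exact_p(before, after))  # 0.00390625
```

With nine participants, the smallest attainable two-sided exact p-value is 2/512 ≈ 0.0039, reached only when all differences share the same sign.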

#### Reliability of FA Scores


Retest-reliability was assessed through correlations between pretest and posttest scores within the total study sample including all three groups.

## RESULTS

#### Effects of Cognitive and Physical Training on WMI and Cognition

We did not find a significant effect of the cognitive or physical training program on WMI compared with the control group, either at the posttest (all ps ≥ 0.18 before adjustment for multiple comparisons; Hedges' gs ≤ 0.25) or at the 3-month follow-up (all ps ≥ 0.16; Hedges' gs ≤ 0.31). Hedges' gs of the FA composite score were −0.09, 95% CI [−0.43, 0.22] at posttest and −0.14, 95% CI [−0.90, 0.57] at the 3-month follow-up for the cognitive training, and 0.03, 95% CI [−0.41, 0.47] at posttest and −0.18, 95% CI [−0.79, 0.40] at the 3-month follow-up for the physical training (see **Table 2**).

Likewise, neither training program had a significant effect on global cognition compared with the control group, either at the posttest (all ps ≥ 0.09; Hedges' gs ≤ −0.16) or at the 3-month follow-up (all ps ≥ 0.12; Hedges' gs ≤ −0.12; see **Table 2**).

#### Cognitive and Physical Lifestyle-Related Changes in WMI and Cognition

We did not find significant associations between self-reported cognitive and physical lifestyles at baseline and changes in WMI, either at the posttest (all ps ≥ 0.08 before adjustment for multiple comparisons; all βs ≤ 0.34) or at the 3-month follow-up (all ps ≥ 0.31 before adjustment for multiple comparisons; all βs ≤ 0.20). Effect sizes for the FA composite score were β = 0.20, 95% CI [−0.16, 0.56] at the posttest and β = −0.04, 95% CI [−0.54, 0.45] at the 3-month follow-up with respect to cognitive lifestyle, and β = −0.04, 95% CI [−0.40, 0.32] at the posttest and β = 0.15, 95% CI [−0.34, 0.64] at the 3-month follow-up with respect to physical lifestyle (see **Table 3**).

Despite the lack of significant lifestyle-related FA changes, we found an association between cognitive lifestyle and changes in both global cognition and episodic memory from the pretest to the posttest and to the 3-month follow-up (all ps ≤ 0.02, all βs ≥ 0.40; see **Figure 3** and **Table 3**).

#### Additional Analyses

#### The Potential of the Two Training Programs to Affect White Matter Integrity

Additional analyses showed that cognitive training skills at the start of the program were correlated with the FA composite score, r<sub>s</sub> = 0.68, p = 0.05. This indicates that the cognitive training program has the potential to affect WMI and suggests that engagement in cognitive training taps the neural connections of interest (see **Figure 4**). Associations between the individual ROIs and the cognitive training skills were similar, with medium to large effect sizes: fornix, r<sub>s</sub> = 0.50, p = 0.18; hippocampal cingulum, r<sub>s</sub> = 0.33, p = 0.39; genu of the corpus callosum, r<sub>s</sub> = 0.60, p = 0.01.

In the cognitive training group, we found a significant increase in cognitive training skills over the training period, with a very large effect size, g = 1.68, p = 0.008. Performance changes in all four training tasks revealed medium to very large effect sizes: "match it," g = 1.47, p = 0.02; "sound replay," g = 0.52, p = 0.20; "high or low," g = 0.89, p = 0.008; "tell us apart," g = 0.95, p = 0.10.

Functional physical fitness was marginally significantly associated with the FA composite score, r = 0.28, p = 0.08, and significantly related to the fornix FA, r = 0.35, p = 0.03 (see **Figure 5**) indicating that interventions that target physical fitness have the potential to affect WMI.

In the physical training group, we found a significant increase in functional physical fitness over the study period, p = 0.02. This increase was marginally significant at the posttest, β = 0.51, p = 0.07, and significant at the 3-month follow-up, β = 0.88, p = 0.007.

#### Associations between Changes in Targeted Training Outcomes and FA Changes

Changes in cognitive training skills were not associated with changes in the FA composite score, r<sub>s</sub> = −0.27, p = 0.49, or in global cognition, r<sub>s</sub> = 0.20, p = 0.61. Likewise, changes in functional physical fitness did not correlate with changes in the FA composite score at posttest, r = −0.19, p = 0.28, or at follow-up, r = 0.01, p = 0.96, or with changes in global cognition at posttest, r = −0.10, p = 0.58, or at follow-up, r = −0.14, p = 0.54.

#### Reliability of FA Measures

Retest-reliability between pretest and posttest was high for the composite FA score, r = 0.91. For the individual ROIs, it was r = 0.92 for the genu of the corpus callosum, r = 0.91 for the fornix, and r = 0.80 for the hippocampal cingulum.

## DISCUSSION

We found no evidence of an effect of short-term cognitive or physical training programs on WMI in regions that have previously been shown to be affected in cognitive aging and Alzheimer's disease (the genu of the corpus callosum, the fornix, and the hippocampal cingulum) in a sample of older adults at risk of dementia (Head et al., 2004; Ringman et al., 2007; Madden et al., 2012; Wang et al., 2012; Kantarci, 2014; Salat, 2014). The estimated effect sizes of the two training programs at the posttest were negligible (|Hedges' g| < 0.1), and neither 95% confidence interval included medium effects (both upper bounds were below Hedges' g = 0.5).

#### TABLE 2 | Effects of training interventions.

<sup>a</sup>p-value of the Group [Control vs. Cognitive/Physical Training] × Session [pre vs. post and pre vs. 3-month FU] interaction, before adjustment for multiple comparisons. <sup>b</sup>Hedges' g: change in cognitive/physical training minus change in control group divided by the pooled baseline standard deviation corrected for bias in small samples. Positive values indicate beneficial effects of the interventions. gCC, genu of the corpus callosum; FA, fractional anisotropy; FU, follow-up; FX, fornix; HC, hippocampal cingulum; MMSE, Mini Mental State Examination; n, number of participants.

The lack of training-induced changes in FA is consistent with several previous findings. For example, Strenziok et al. (2014) found no effect of the cognitive training program used in our study on FA scores compared with two other video games. Moreover, in one of the largest studies in the field, Voss et al. (2012) found no significant FA increases after a 1-year aerobic fitness training intervention compared with a stretching control intervention.

It is worth noting that physical training has been shown to increase FA in fiber tracts implicated in motor functioning, such as the corticospinal tract (Svatkova et al., 2015). These tracts were not of interest in this study, so potential effects in them could not be detected by our ROI analysis.

The lack of a cognitive training effect contrasts with three studies that found significant effects of different working memory training programs on regions of the anterior corpus callosum (Lövdén et al., 2010; Takeuchi et al., 2010; Salminen et al., 2016). These inconsistent results might be explained by differences in the working memory training and in the study populations. In contrast to the other studies, our working memory training did not include an updating component. In addition, our sample comprised older adults at risk of dementia, whereas Takeuchi et al. (2010) and Salminen et al. (2016) studied younger adults and Lövdén et al. (2010) studied healthy older adults.

The associations between cognitive training skills and the FA composite, as well as between functional physical fitness and fornix FA, hint at the potential of cognitive and physical activities to improve WMI in these tracts. Correlations between these two training outcomes and FA transfer outcomes allow us to estimate the maximal transfer gains given a specific increase in the training outcomes (Jaeggi et al., 2010; Baniqued et al., 2013; Rode et al., 2014). The higher the association, the higher the transfer potential. Therefore, long-term rather than short-term training programs, and lifestyles that induce larger effects on training outcomes, may significantly increase the integrity of the targeted white matter tracts.

Self-reported lifestyles at baseline were not associated with changes in WMI. In addition, positive associations between cognitive lifestyle and changes in global cognition and episodic memory were not altered after accounting for WMI. To our knowledge, no other study has assessed the relationship between lifestyles and changes in WMI. Therefore, this is initial evidence that brain mechanisms other than changes in WMI underlie lifestyle-related cognitive changes in older adults at risk of dementia.

#### TABLE 3 | Associations with cognitive and physical lifestyles.

<sup>a</sup>βs represent the standardized regression coefficients of cognitive and physical lifestyles predicting changes in outcomes. <sup>b</sup>p-value of the Cognitive Lifestyle × Session [pre vs. post and pre vs. 3-month FU] interaction. <sup>c</sup>p-value of the Physical Lifestyle × Session [pre vs. post and pre vs. 3-month FU] interaction. gCC, genu of the corpus callosum; FA, fractional anisotropy; FU, follow-up; FX, fornix; HC, hippocampal cingulum; MMSE, Mini Mental State Examination; n, number of participants.

#### Limitations

Our use of ROI analyses rather than whole-brain approaches means that any changes in other brain regions would not be detected. However, in this sample of older adults at risk of dementia, we were particularly interested in the white matter tracts that are affected in cognitive aging and Alzheimer's disease. Importantly, by using ROIs, we limited alpha-error inflation and the loss of power through multiple comparisons, an issue that is particularly important in analyses with limited sample sizes. Another limitation is the lack of randomization, which was not feasible due to logistical issues (see above). Instead, we used a minimization approach to prevent group differences in participants' characteristics from inducing bias. The limited sample size likely impeded the detection of very small effects. However, the sample size was sufficient to detect lifestyle-related cognitive changes and to reveal associations between WMI and both cognitive training skills and functional physical fitness. In addition, the upper bounds of the confidence intervals of the training effects immediately after the training period were below a Hedges' g of 0.5, suggesting that effects of medium size are unlikely. Finally, the lack of a lifestyle intervention prevented causal inference regarding associations between lifestyles and FA changes. However, before implementing cost-intensive experimental designs, it is a reasonable strategy to first employ observational designs.


FIGURE 4 | … white matter integrity. Association between cognitive training skills at the beginning of the training and FA composite score at baseline (r<sub>s</sub> = 0.68, p = 0.05). The cognitive training data from two individuals were not properly stored and could not be included in the analysis.

#### Future Perspectives

Future studies should use larger samples to increase the probability of detecting small effects, and longer training periods to enhance the potential to induce larger effects. In addition, little is known about the time course and maintenance of activity-induced white matter changes, suggesting that future studies should implement multiple assessments during and after the training period. Activity-related white matter changes may differ across populations; for example, younger participants without cognitive impairments may profit more than older adults at risk of dementia. Future meta-analyses should assess these potential moderators. Interventional studies have rarely reported correlations of training outcomes with potential neurobiological mechanisms and have neglected the relation between cognitive and neurobiological changes. Future interventional studies should include these analyses to allow a better understanding of the mediating role of WMI in cognitive benefits. Finally, to our knowledge, cognitive and physical lifestyle-related changes in WMI have not yet been reported. Large-scale studies investigating this association should be conducted as a first step to explore the role of active cognitive and physical lifestyles for WMI.

#### Conclusion

First, we found no evidence that short-term cognitive and physical training programs affect the integrity of hippocampus-related and prefrontal white matter tracts in older adults at risk of dementia. Second, we provide initial evidence that WMI changes do not underlie the positive association between a cognitive lifestyle and cognitive change. However, as the two training outcomes (cognitive training skills and functional physical fitness) were related to WMI, long-term engagement in cognitive and physical activities might have the potential to affect WMI.

## AUTHOR CONTRIBUTIONS

PF contributed to study conception and design, organized study procedures and acquired data, analyzed and interpreted data, and wrote the manuscript, including its first draft. H-PM designed the MRI protocol, supervised the DTI analysis, was involved in the interpretation of the imaging results, and critically revised the manuscript for intellectual content. OK and DL contributed to study conception and design, organized study procedures and acquired data, contributed to the data analysis and interpretation of results, and critically revised the manuscript for intellectual content. FT contributed to study conception and design, organized study procedures, acquired data, and critically revised the manuscript for intellectual content. AW designed the physical training program and revised the manuscript. TE contributed to study conception and design and critically revised the manuscript for intellectual content. JK designed the MRI protocol, supervised the DTI analysis, was involved in the interpretation of the imaging results, and critically revised the manuscript for intellectual content. CvA and I-TK conceptualized the study, obtained funding, supervised all phases of the study as principal investigators, and critically revised the manuscript for intellectual content. All authors read and approved the final manuscript.

## FUNDING

This research was supported by the Heidelberg Academy of Sciences and Humanities, Germany. The funders had no role in study design, data collection, analysis and interpretation of the data, writing the manuscript, or the decision to submit the manuscript for publication.

## ACKNOWLEDGMENTS


We thank Anita Stewart for the adaptation of the English CHAMPS questionnaire into German, Claire Bacher for English proofreading, and Rosine Gröschel, Nelli Hirschauer, Jens Kalchthaler, Anne Korzowski, Claudia Massau, Dörte Polivka, Florentine Jurisch, and Christina Schaldecker for support in subject recruitment, data acquisition, and training implementation.

## AVAILABILITY OF DATA AND MATERIALS

Data will be made available upon request. The participants did not approve the unrestricted publication of the data in their informed consents, as this option was not common at the time.

## ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the Ethics Committees of the University of Konstanz and Ulm University with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Ethics Committees of the University of Konstanz and Ulm University, Germany.

## REFERENCES

Association International Conference 2016, Toronto, ON. doi: 10.1016/j.jalz.2016.06.373




presymptomatic carriers of familial Alzheimer's disease mutations. Brain 130, 1767–1776. doi: 10.1093/brain/awm102




and on untrained outcomes. Front. Hum. Neurosci. 8:617. doi: 10.3389/fnhum.2014.00617

Zelinski, E. M., Spina, L. M., Yaffe, K., Ruff, R., Kennison, R. F., Mahncke, H. W., et al. (2011). Improvement in memory with plasticity based adaptive cognitive training: results of the 3 month follow up. J. Am. Geriatr. Soc. 59, 258–265. doi: 10.1111/j.1532-5415.2010.03277.x

**Conflict of Interest Statement:** TE and I-TK are members of the scientific advisory board of Posit Science. CvA received honoraria from serving on the scientific advisory board of Nutricia GmbH and Hong Kong University Research Council, travel funding and speaker honoraria from Nutricia GmbH, Novartis Pharma GmbH, Lilly Deutschland GmbH, Desitin Arzneimittel GmbH, and Dr. Willmar Schwabe GmbH & Co. KG, and research support from Roche Diagnostics GmbH, Biologische Heilmittel Heel GmbH, and ViaMed GmbH.

The other authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Fissler, Müller, Küster, Laptinskaya, Thurm, Woll, Elbert, Kassubek, von Arnim and Kolassa. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Aerobic Exercise as a Tool to Improve Hippocampal Plasticity and Function in Humans: Practical Implications for Mental Health Treatment

#### Aaron Kandola<sup>1,2</sup>\*, Joshua Hendrikse<sup>1</sup>, Paul J. Lucassen<sup>3</sup> and Murat Yücel<sup>1</sup>\*

<sup>1</sup> Brain and Mental Health Lab, School of Psychological Sciences and Monash Institute of Cognitive and Clinical Neurosciences, Monash University, Melbourne, VIC, Australia, <sup>2</sup> Amsterdam Brain and Cognition, University of Amsterdam, Amsterdam, Netherlands, <sup>3</sup> Centre for Neuroscience, Swammerdam Institute of Life Sciences, University of Amsterdam, Amsterdam, Netherlands

#### Edited by:

Claudia Voelcker-Rehage, Chemnitz University of Technology, Germany

#### Reviewed by:

Yvonne Nolan, University College Cork, Ireland

Dieter J. Meyerhoff, University of California, San Francisco, USA

\*Correspondence:

Aaron Kandola, aaron.kandola@gmail.com

Murat Yücel, murat.yucel@monash.edu

Received: 07 March 2016 Accepted: 11 July 2016 Published: 29 July 2016

#### Citation:

Kandola A, Hendrikse J, Lucassen PJ and Yücel M (2016) Aerobic Exercise as a Tool to Improve Hippocampal Plasticity and Function in Humans: Practical Implications for Mental Health Treatment. Front. Hum. Neurosci. 10:373. doi: 10.3389/fnhum.2016.00373

Aerobic exercise (AE) has been widely praised for its potential benefits to cognition and to overall brain and mental health. In particular, AE has a potent impact on promoting the function of the hippocampus and stimulating neuroplasticity. As the evidence base rapidly builds, and given that most of the supporting work can be readily translated from animal models to humans, the potential for AE to be applied as a therapeutic or adjunctive intervention for a range of human conditions appears ever more promising. Notably, many psychiatric and neurological disorders have been associated with hippocampal dysfunction, which may underlie the expression of certain symptoms common to these disorders, including (aspects of) cognitive dysfunction. Augmenting existing treatment approaches with AE-based interventions may promote hippocampal function and alleviate currently untreated cognitive deficits in various psychiatric disorders. Incorporating non-pharmacological interventions into clinical treatment may also have other benefits to patient well-being, such as limiting the risk of adverse side effects. This review draws on both animal and human literature to detail how AE is associated with cognitive enhancements and stimulates a cascade of neuroplastic mechanisms that support improvements in hippocampal functioning. Using schizophrenia and major depressive disorder as examples, we propose how an AE intervention could be implemented in the clinical domain to reduce cognitive deficits in these and related disorders.

Keywords: aerobic fitness, hippocampus, plasticity, schizophrenia, depression, dementia, memory, neurogenesis

# INTRODUCTION

The brain continuously balances two conflicting demands: it must retain enough structural integrity to maintain proper neurotransmission and function efficiently, while remaining malleable enough to restructure itself and adapt to changing environmental demands. This dynamic nature of the brain is underpinned by neuroplasticity, which refers to the brain's capacity to change and reorganize itself in response to internal and/or external influences. The consequences of brain plasticity can be twofold: plasticity can be adaptive, as during skill learning, when it helps the individual to survive, or maladaptive, when it is insufficient to meet a specific demand, which may then contribute toward disease conditions.

To some extent, brain disorders can be considered maladies of neuroplasticity (Krystal et al., 2009). As such, stimulating neuroplasticity is becoming a popular approach to counteracting pathological harm (Kays et al., 2012). Interestingly, a stimulus as peripheral as aerobic exercise (AE) has been shown to strongly induce neuroplasticity (Voss et al., 2013a) and promote cognitive performance (Smith et al., 2010). Given its general benefits to physical health, its low risk profile, and its relative ease of implementation, AE represents a promising therapeutic approach for a range of brain pathologies.

One brain region with a high degree of endogenous neuroplasticity is the hippocampus (Bavelier and Neville, 2002). The hippocampus is heavily involved in learning and memory processes and is particularly vulnerable to damage in various pathological conditions, such as major depressive disorder (MDD), Alzheimer's disease (AD), and schizophrenia (Adriano et al., 2012; Bartsch and Wulff, 2015; Schmaal et al., 2016). At the same time, the positive influence of AE on neuroplasticity is more pronounced within the hippocampus than in any other brain region (Gomez-Pinilla and Hillman, 2013). AE has been suggested as a promising approach for remediating hippocampal harm and cognitive deficits caused by neurodegenerative disorders such as AD (Intlekofer and Cotman, 2013), but AE may also be a promising approach for psychiatric disorders such as schizophrenia or MDD (Oertel-Knöchel et al., 2014).

This paper reviews the cognitive benefits associated with AE and focuses on aspects of cognition that are particularly dependent on hippocampal functioning, such as episodic memory formation. We further discuss the capacity of AE to stimulate macro- and micro-scale neuroplastic mechanisms relating to hippocampal functioning. Finally, we address the suitability of AE as a novel therapeutic intervention for psychiatric disorders. Both schizophrenia and MDD are associated with hippocampal deterioration and cognitive dysfunction, so we use these two conditions as examples to discuss the potential utility of AE-based interventions.

## EXERCISE AND COGNITION

Interest in how AE influences cognitive performance has grown rapidly over the past decade. AE generally refers to exercise that improves the efficiency of the aerobic energy-producing systems by increasing maximal oxygen uptake and cardiorespiratory endurance (Voss et al., 2013a). Large-scale epidemiological studies have consistently linked high levels of aerobic fitness with greater academic achievement and higher IQ scores (Sibley and Etnier, 2003; Tomporowski et al., 2008, 2014; Howie and Pate, 2012), as well as with better preservation of cognitive function in old age (Yaffe et al., 2001; Barnes et al., 2003; Middleton et al., 2008; Wendell et al., 2014) and a lower incidence of dementia (Hamer and Chida, 2009). This capacity to promote cognitive performance implies that AE may be clinically relevant for counteracting the cognitive decline associated with aging or dementia (Kramer et al., 2006), and it has prompted systematic investigation. Many randomized controlled trials (RCTs) have now been conducted, mostly in older adults, using AE interventions of moderate intensity (such as 30 min of Nordic walking) that generally span 3–12 months. Meta-analyses have found AE interventions to improve cognitive performance across a variety of domains, including attention, executive functioning, processing speed, motor functioning, and memory, in healthy young and middle-aged adults (Etnier et al., 1997; Smith et al., 2010; Chang et al., 2012; Roig et al., 2013; Verburgh et al., 2013) but mostly in older age groups (Etnier et al., 1997; Colcombe and Kramer, 2003; Angevaren et al., 2008; van Uffelen et al., 2008; Smith et al., 2010; Snowden et al., 2011; Chang et al., 2012), as well as in older individuals with mild cognitive impairments or dementia (Heyn et al., 2004; van Uffelen et al., 2008; Gates et al., 2013).

The available evidence strongly suggests that AE has a positive influence on cognition in individuals of all age groups, particularly in older adults. However, exactly how AE affects cognition is not yet clear. Some RCTs have reported effects of AE on differing cognitive domains, whereas others have found that AE had no significant impact on cognition at all (Etnier et al., 2006; van Uffelen et al., 2008; Snowden et al., 2011; Gates et al., 2013; Kelly et al., 2014; Young et al., 2015). Such inconsistencies may partly be explained by methodological variation between RCTs, which makes it difficult to compare their findings systematically in meta-analyses (Angevaren et al., 2008; Young et al., 2015).

At the neural level, findings have consistently demonstrated that AE has a strong, positive influence on the structure of the hippocampus that is not seen to the same extent in any other brain region (Gomez-Pinilla and Hillman, 2013). As will be discussed below, it has been extensively demonstrated in animal models that AE stimulates a cascade of neuroplastic mechanisms within the hippocampus that are often paralleled by functional improvements (Voss et al., 2013a; Opendak and Gould, 2015). A large body of animal studies has assessed the functional impact of AE using tasks designed to specifically assess hippocampus dependent processing, such as spatial (Vaynman et al., 2004) or contextual memory (Radak et al., 2006). In such hippocampus dependent paradigms, AE has reliably been shown to improve task performance in animal models (van Praag, 2009; Voss et al., 2013a), although this also depends on other aspects of AE, such as its duration and intensity (Naylor et al., 2005; Ploughman et al., 2007; O'Callaghan et al., 2009).

Until recently, human studies had primarily focussed on the impact of AE on performance in frontal-executive or attentional tasks rather than on specifically hippocampus dependent forms of cognition (Ruscheweyh et al., 2011). This may have contributed to the inconsistent findings in studies of AE and human cognition. Given the profound impact of AE on the hippocampus, a bottom-up focus on hippocampus dependent processing may be a useful approach for specifying the impact of AE on human cognition, and it will be adopted here.

### A Hippocampus-Centric Approach

The hippocampus plays an important role in both learning and memory (Jarrard, 1993) and affective processing (Phillips et al., 2003). The dichotomous functioning of the hippocampus is thought to be reflected in its structure, with affective processing being largely attributed to the ventral hippocampus and learning and memory processes mostly occurring through the dorsal hippocampus (Moser and Moser, 1998). Given that cognition is a central theme of this paper, the following sections will predominantly focus on the role of the hippocampus in learning and memory processing.

Several meta-analyses have noted the tendency for RCTs to report improvements in memory-based task performance following an AE intervention (Colcombe and Kramer, 2003; van Uffelen et al., 2008; Smith et al., 2010; Chang et al., 2012; Roig et al., 2013). However, some domains of memory are more reliant on hippocampal functioning than others (see **Box 1**), and it is therefore possible that the most consistent cognitive improvements will be found in these specific domains of memory.

Some human studies have focussed on the influence of AE on hippocampus dependent cognition and have shown in older adults that AE was associated with improved performance in both episodic (Richards et al., 2003; Stewart et al., 2003; Sabia et al., 2009; Flöel et al., 2010; Ruscheweyh et al., 2011) and spatial (Erickson and Kramer, 2009; Erickson et al., 2010, 2011) memory tasks, as well as in pattern separation tasks in young adults (Déry et al., 2013). Moreover, some studies have demonstrated in preadolescent children and young adults that AE is selectively associated with improved performance on contextual (Chaddock et al., 2010, 2011; Monti et al., 2012) and spatial (Stroth et al., 2009; Herting and Nagel, 2012) memory tasks, but not on less hippocampus dependent tasks such as attention, verbal memory, or item recognition tasks.

BOX 1 | Hippocampus dependent memory. The human hippocampus plays a vital role in the formation of declarative memories, most prominently episodic and spatiotemporal memories (Burgess et al., 2002). Episodic memory refers to the recollection of autobiographical events and is related to spatial memory, which concerns one's environment and spatial orientation. Spatial and episodic memory processes are inherently related, given their specific reliance on the hippocampus (Bird and Burgess, 2008) and the fact that episodic memories are encoded in a spatio-temporal context (Tulving, 1993), which makes spatial information important in episodic memory formation. In addition, the hippocampus, and particularly the dentate gyrus (DG), is crucial in selecting and separating similar events in space and time; hence, pattern separation is a main function attributed to the hippocampus (Yassa and Stark, 2011; Oomen et al., 2013, 2014). Because forming an episodic memory requires a conscious experience, episodic memory cannot at present be studied directly in animals, for which behavioral markers of conscious experience are lacking (Clayton et al., 2001). Contextual memory is a process strongly related to episodic memory that is also highly dependent on the hippocampus; it refers to an animal's capacity to associate salient landmark objects with their environmental context (Eichenbaum et al., 2005). As there is currently no objective proxy for studying episodic-like memory processing in animals (Templer and Hampton, 2013), hippocampal functioning will be considered here in terms of contextual and spatial memory task performance in the animal literature, and episodic and spatial memory task performance in the human literature.

Although based on a limited number of studies, these findings indicate that AE may have a positive influence on hippocampus dependent forms of cognition in healthy human participants, similar to what has been consistently shown in animal models. Owing to its highly neuroplastic nature (Bavelier and Neville, 2002), the hippocampus is particularly vulnerable to structural and functional deterioration in a wide range of neurological and psychiatric disorders (Bartsch and Wulff, 2015). The studies above suggest that AE could have a positive influence on hippocampal functioning, but many more systematic investigations in human participants will be needed to characterize this relationship at a broader cognitive level. Evidence is also accumulating that AE may have a prominent impact on hippocampal structure in humans as well as in animal models. The following sections first discuss the relationship between AE and gross structural changes in the human hippocampus. We then discuss animal and human work indicating that these structural changes may be driven by a cascade of micro-scale neuroplastic mechanisms within the hippocampus that are stimulated by AE.

## MACRO-SCALE CHANGES

Directly studying structural changes in the human brain in vivo is currently limited to neuroimaging techniques, such as magnetic resonance imaging (MRI), which detect macro-scale changes in, for example, gray-matter volume or white-matter integrity.

### Gray-Matter

A number of cross-sectional studies have applied both voxel-based morphometry (VBM) and region of interest (ROI) techniques to structural MRI data to estimate volume differences associated with AE. Higher levels of aerobic fitness have been consistently associated with larger hippocampal or temporal lobe volumes in healthy preadolescent children (Chaddock et al., 2010) and older adults (Colcombe et al., 2003; Bugg and Head, 2011; Head et al., 2012; Niemann et al., 2014). Several studies have also shown that the larger hippocampal volumes associated with aerobic fitness correlate with better performance on spatial memory tasks, such as a virtual Morris Water Maze task (Erickson and Kramer, 2009; Szabo et al., 2011; Herting and Nagel, 2012), and on contextual memory tasks (Chaddock et al., 2010), with correlations ranging from r = 0.12 to r = 0.36. Cross-sectional data have also indicated that AE may benefit non-healthy individuals, as higher levels of aerobic fitness have been correlated with larger hippocampal volumes in patients with obesity (Bugg et al., 2012), anorexia (Beadle et al., 2015), mild cognitive impairments (Gates et al., 2013; Makizako et al., 2014), MDD (Travis et al., 2015), Alzheimer's disease (Honea et al., 2009), and multiple sclerosis (Prakash et al., 2010; Motl et al., 2015).

Several RCTs, mostly implementing AE interventions of moderate intensity for 3–12 months, have extended these findings in healthy and non-healthy samples. RCTs in healthy samples have demonstrated that AE interventions were associated with increases in hippocampal volume in young and middle-aged (Thomas et al., 2016) and older adults (Colcombe et al., 2006; Erickson et al., 2011; Niemann et al., 2014; Kleemeyer et al., 2015; Maass et al., 2015; Sexton et al., 2015a). In some cases, the AE-induced growth in hippocampal volume was correlated with improved performance on a computerized spatial memory task (r = 0.28; Erickson et al., 2011) or on a complex figure test of spatial object recognition (r = 0.37; Maass et al., 2015). AE interventions have also been shown to increase hippocampal volume in patients with mild cognitive impairments (ten Brinke et al., 2015), schizophrenia (Pajonk et al., 2010), and multiple sclerosis (Leavitt et al., 2014). Some RCTs have found that AE did not affect hippocampal volume (Ruscheweyh et al., 2011; Scheewe et al., 2013; Krogh et al., 2014; Rosenbaum et al., 2015; Malchow et al., 2016); such discrepancies may stem from a lack of consistency in the AE protocols used (Prakash et al., 2015) or in the methods of calculating hippocampal volume (Niemann et al., 2014).
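To put the correlations reported above in perspective, squaring a Pearson correlation coefficient (r²) gives the proportion of variance that two measures share. Assuming the reported values are simple bivariate correlations between hippocampal volume and memory performance, the worked arithmetic is:

$$
\begin{aligned}
r = 0.12 &\;\Rightarrow\; r^2 = 0.0144 \;(\approx 1.4\%\ \text{of shared variance})\\
r = 0.28 &\;\Rightarrow\; r^2 = 0.0784 \;(\approx 7.8\%)\\
r = 0.36 &\;\Rightarrow\; r^2 = 0.1296 \;(\approx 13.0\%)\\
r = 0.37 &\;\Rightarrow\; r^2 = 0.1369 \;(\approx 13.7\%)
\end{aligned}
$$

On this reading, even the strongest of these associations leaves more than 85% of the variance in memory performance unaccounted for by hippocampal volume alone.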

### White-Matter

Most of the current literature has focussed on gray-matter changes, but some studies have investigated the impact of AE on white-matter integrity. A recent systematic review concluded that AE was associated with global and localized improvements in white-matter volume and microstructural integrity in older adults (Sexton et al., 2015b). One might expect AE to have a specific impact on hippocampal connectivity, but the review did not find support for this (Sexton et al., 2015b). However, some studies have found AE to be associated with greater white-matter volume in the temporal lobes that surround the hippocampus in older adults (Colcombe et al., 2003, 2006; Tseng et al., 2013; Burzynska et al., 2014). In addition, one RCT of a 12-month AE intervention demonstrated that greater changes in aerobic fitness were associated with greater improvements in temporal lobe white-matter integrity in healthy older adults (Voss et al., 2013b). White-matter alterations may also occur in non-healthy individuals: one recent RCT found that a 6-month AE intervention improved global white-matter integrity in patients with schizophrenia (Svatkova et al., 2015).

### Structural Benefits of AE

There is a growing body of evidence to suggest that AE is associated with increases in hippocampal gray-matter volume in both healthy and non-healthy individuals (Hötting and Röder, 2013; Hamilton and Rhodes, 2015) and with improvements in hippocampal microstructural integrity (Kleemeyer et al., 2015). In some cases, these AE-induced gray-matter changes have been directly correlated with improvements in hippocampal functioning (Prakash et al., 2015). AE also appears to benefit global white-matter integrity (Sexton et al., 2015b) and potentially hippocampal connectivity.

These studies support the idea that AE may benefit hippocampal integrity, but it is important to note that these human imaging techniques can directly assess only macro-scale changes, not the functional changes or biological mechanisms that may underlie them. Without information about the specific substrates underlying changes in tissue composition, it is difficult to determine the exact functional significance of a change in volume (Czéh and Lucassen, 2007; Thomas et al., 2012; Biedermann et al., 2016). For example, volume growth could be driven by a regional increase in dendritic length or density, but it could also be driven by changes that are likely to be less functionally relevant, such as an expansion of the interstitial space between cells or changes in relative water distribution. Alternatively, increased proliferation of stem cells or glia, or the birth of new neurons that are added to existing hippocampal circuitry, could also influence hippocampal volume over time. Furthermore, the additional energy demands of new neurons or synaptic changes may require greater metabolic support. This could induce gliogenesis or changes in vasculature, i.e., angiogenesis, that may further contribute to fluctuations in volume (Anderson, 2011). In many respects, such structural adaptations mirror, in the opposite direction, some of the changes seen after exposure to (chronic) stress, which is generally associated with hippocampal volume reductions and represents a risk factor for depression (Czéh and Lucassen, 2007). Whether the same substrates underlie both the atrophy and the growth of a brain region is not clear; in principle, different mechanisms may be responsible for such opposing effects.

Micro-scale changes at the molecular or cellular level can be studied directly in animal models using histological approaches. Therefore, the following section will focus predominantly on the animal literature to outline the micro-scale impact of AE on hippocampal neuroplasticity that may drive the macro-scale structural and functional improvements seen in human studies.

## MICRO-SCALE CHANGES

Aerobic exercise has been linked to changes in a range of independent and interdependent mechanisms of neuroplasticity within the hippocampus (for comprehensive reviews, see: van Praag, 2009; Gomez-Pinilla and Hillman, 2013; Voss et al., 2013a; Bolijn and Lucassen, 2015; Opendak and Gould, 2015). Key mechanisms at both the cellular and molecular level will be discussed below in terms of their contribution to AE-induced enhancements in hippocampal functioning.

### Neurogenesis

Adult neurogenesis is a form of structural hippocampal plasticity that refers to the process by which stem cells form new neurons within a few distinct sub-regions of the adult brain. These stem cells undergo successive stages of proliferation, migration and neuronal differentiation, eventually producing adult-generated, fully functional neurons that are well integrated into existing neural circuits (Kempermann et al., 2015). The subgranular zone of the hippocampal dentate gyrus (DG) is one of only two primary locations where neurogenesis is known to occur in the adult rodent and human brain (Spalding et al., 2013; Kempermann et al., 2015). AE increases the rate of hippocampal neurogenesis, i.e., both the rate of cell proliferation and the survival of newborn granule cells (Cotman et al., 2007; Kempermann et al., 2010; Lucassen et al., 2010; Vivar et al., 2013), and although neurogenesis is difficult to study in humans (Manganas et al., 2007; Ho et al., 2013; Jessberger and Gage, 2014), AE may also stimulate cell proliferation and DG volume growth in the human brain (Pereira et al., 2007; Erickson et al., 2011; Demirakca et al., 2014).

The proliferation of adult-born granule cells is thought to play an important role in hippocampal functioning (Deng et al., 2010; Oomen et al., 2014). Neurogenesis has been shown to modify the excitation of hippocampal neurons (Ikrar et al., 2013). Animal models have demonstrated that inhibiting or ablating neurogenesis impairs performance on spatial and contextual memory tasks, whereas performance on these tasks improves when neurogenesis is stimulated (Deng et al., 2010; Snyder and Cameron, 2012; Aimone et al., 2014; Vadodaria and Jessberger, 2014; Kent et al., 2015). Furthermore, manipulating the rate of proliferation in rodents has been shown to selectively affect performance on hippocampus dependent tasks, such as spatial and contextual memory or pattern separation tasks, but not on tasks that are less hippocampus dependent, such as delay conditioning (Gould et al., 1999; Shors et al., 2002; Snyder et al., 2005, 2011; Deng et al., 2009; Sahay et al., 2011; Lucassen et al., 2013).

While there is still debate about the exact role that adult-born granule cells play in hippocampal functioning, neurogenesis forms a substrate for experience-dependent change (Opendak and Gould, 2015; Lucassen and Oomen, 2016), which has been implicated in fear, anxiety and depression-like behavior (Sahay and Hen, 2007; Besnard and Sahay, 2015; Lucassen et al., 2015; Hu et al., 2016). This is thought to occur primarily through the role that neurogenesis plays in facilitating memory formation by supporting a computational mechanism known as pattern separation (Aimone et al., 2011; Sahay et al., 2011). Pattern separation is an essential mechanism that allows the DG to efficiently process and store sensory inputs, enabling the formation of episodic, contextual or spatial memories (Lazarov and Hollands, 2016), and its importance in AE-induced cognitive enhancement is illustrated by studies that manipulate neurogenesis (Kent et al., 2015; Lucassen and Oomen, 2016). For example, inhibiting cell proliferation in mice was sufficient to impair pattern separation (Deng et al., 2010) and to block AE-induced improvements in spatial (Clark et al., 2009) and contextual (Wojtowicz et al., 2008) memory performance, but only in conditions in which the task required fine spatial discriminations (i.e., where pattern separation was necessary; Creer et al., 2010). Therefore, by increasing the rate of cell proliferation, AE may augment hippocampal memory formation by minimizing interference between highly similar inputs (Déry et al., 2013).
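The notion of pattern separation, i.e. making the neural codes for similar inputs less similar so that they interfere less, can be illustrated with a classic toy model from computational neuroscience rather than with anything taken from the studies cited above. In the hedged Python sketch below, two input patterns that share 90% of their active units are projected through random weights onto a ten-times larger layer in which only the 5% most strongly driven units remain active (a "k-winners-take-all" rule, loosely analogous to the sparse firing of DG granule cells). The layer sizes, sparsity level, Gaussian weights and the `overlap` measure are illustrative assumptions, and the sketch does not model neurogenesis or adult-born granule cells.

```python
import random


def overlap(a, b):
    """Fraction of the active units in `a` that are also active in `b` (same-size sets)."""
    return len(a & b) / len(a)


def make_inputs(rng, n_inputs=200, n_active=40, n_shared=36):
    """Two similar binary input patterns, as sets of active-unit indices sharing n_shared units."""
    units = rng.sample(range(n_inputs), 2 * n_active - n_shared)
    shared, rest = units[:n_shared], units[n_shared:]
    pattern_a = set(shared + rest[: n_active - n_shared])
    pattern_b = set(shared + rest[n_active - n_shared:])
    return pattern_a, pattern_b


def sparse_code(active, weights, k):
    """Random projection followed by k-winners-take-all: keep the k most strongly driven outputs."""
    drive = [sum(row[j] for j in active) for row in weights]
    winners = sorted(range(len(drive)), key=drive.__getitem__, reverse=True)[:k]
    return set(winners)


def demo(n_inputs=200, n_outputs=2000, k=100, seed=1):
    """Return (input overlap, output overlap) for one pair of similar patterns."""
    rng = random.Random(seed)
    # Hypothetical random Gaussian connectivity onto a larger, sparsely active layer.
    weights = [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)] for _ in range(n_outputs)]
    pattern_a, pattern_b = make_inputs(rng, n_inputs=n_inputs)
    code_a = sparse_code(pattern_a, weights, k)
    code_b = sparse_code(pattern_b, weights, k)
    return overlap(pattern_a, pattern_b), overlap(code_a, code_b)


if __name__ == "__main__":
    in_ov, out_ov = demo()
    print(f"overlap between input patterns: {in_ov:.2f}")
    print(f"overlap between sparse codes:   {out_ov:.2f}")
```

Because only the strongest responses survive the sparsification step, the small differences between the two inputs largely decide which output units win, so the two output codes should share a clearly smaller fraction of active units than the inputs do. In models of this kind, making the code sparser (a smaller `k` relative to `n_outputs`) tends to strengthen this decorrelation.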

While pattern separation is a popular example of DG function, adult-born granule cells have also been associated with a number of other theoretical DG frameworks important to memory formation that are not discussed here, such as memory resolution or the encoding of temporal context (see Aimone et al., 2014 for a comprehensive review). Although neurogenesis is crucial to the spatiotemporal aspects of hippocampal functioning and is likely to be a key mediator of the hippocampal enhancement stimulated by AE (Cotman et al., 2007), more research is needed to elucidate the exact role that adult-born granule cells play in DG and general hippocampal functioning.

Stimulating neurogenesis and promoting DG function is one mechanism through which AE may promote hippocampal function, and some recent findings suggest that this mechanism may drive the macro-scale increases in hippocampal volume found in human imaging studies (Erickson et al., 2011; Fuss et al., 2014). However, AE also induces a variety of other neuroplastic mechanisms that work both independently of and in tandem with neurogenesis to improve hippocampal functioning.

### Synaptic Plasticity

Learning and memory rely on efficient communication between neural cells through their synapses, and AE is thought to enhance this efficiency by promoting synaptic plasticity in a number of ways (Vivar et al., 2013), such as by facilitating long-term potentiation (LTP). LTP is a model of synaptic plasticity that refers to the strengthening of synaptic connections between neurons (Bliss and Collingridge, 1993). Forming an episodic memory involves associating an event or feature with a particular location in space, and this occurs through LTP in the hippocampus (Bannerman et al., 2014). AE has been demonstrated to stimulate LTP in young rodents and to reverse the age-related decline of LTP within the DG of aged rodents, compared with sedentary controls (van Praag, 2009; Voss et al., 2013a). Interestingly, these changes appear to be specific to the DG and may be directly related to the stimulation of neurogenesis by AE (van Praag, 2008). Indeed, immature granule cells are particularly suited to participate in learning, as they have a lower threshold for LTP induction (Schmidt-Hieber et al., 2004) and show enhanced LTP compared with surrounding mature granule cells (Lee S.W. et al., 2012). This heightened synaptic plasticity has led to the idea that adult-born granule cells mediate the AE-induced enhancement of LTP in the DG (Voss et al., 2013a).

Aerobic exercise has also been associated with morphological changes in neural cells that may be important in facilitating hippocampal learning and memory (Lang et al., 2004). The dendrites of granule cells within the DG have been shown to increase in length, complexity, and spine density in response to AE (Eadie et al., 2005; Redila and Christie, 2006), and spine density also increases in pyramidal cells of the surrounding entorhinal cortex and CA1 regions (Stranahan et al., 2007). AE also interacts with the glutamatergic system by increasing the expression of N-methyl-D-aspartate (NMDA) receptors in the hippocampus (specifically, both NR2A and NR2B subtypes), which contributes to the synaptic plasticity of the region (Molteni et al., 2002; Farmer et al., 2004). These morphological changes are associated with higher rates of LTP induction and facilitate changes in dendritic strength (van Praag, 2008).

While certain morphological changes enhance synaptic plasticity within the CA1 and entorhinal cortex, AE seems to have a particularly potent impact on the granule cells of the DG. By generating a greater influx of adult-born granule cells that have an enhanced propensity for LTP and fine-grained morphological changes, AE creates an environment within the hippocampus that promotes LTP and facilitates improvements in hippocampus-dependent cognition (Boecker et al., 2012).

### Vasculature

Cerebral blood flow is important in providing oxygen and essential nutrients that facilitate brain functioning, and improved cerebral blood flow may be an important mediator of AE-induced changes in hippocampal functioning (Christie et al., 2008). It has been demonstrated in animal models that AE stimulates the sprouting of new capillaries (angiogenesis) and improves vasculature within the hippocampus (Trejo et al., 2001; van Praag et al., 2005). This is consistent with human studies that have demonstrated greater cerebral blood flow in the hippocampus (Pereira et al., 2007; Burdette et al., 2010; Maass et al., 2015), some of which have correlated this increase with improved performance on episodic memory tasks (Pereira et al., 2007; Maass et al., 2015). The benefits of improved vascularization may extend beyond a greater supply of oxygen and glucose, for instance by prompting the release of neurotrophic factors (discussed below; Maass et al., 2015) or by facilitating other neuroplastic mechanisms such as synaptic plasticity (Christie et al., 2008) or neurogenesis (Palmer et al., 2000; Pereira et al., 2007; Boecker et al., 2012; Bolijn and Lucassen, 2015; Biedermann et al., 2016). In fact, one recent human RCT found that the improvements in hippocampus dependent task performance and the growth in hippocampal volume that occurred following a 3-month AE intervention were predominantly attributable to greater cerebral blood flow in the hippocampus (Maass et al., 2015). The authors suggested that these changes were either a direct consequence of vascularization or an indirect consequence of changes in synaptic plasticity or neurogenesis stimulated by the improvements in cerebral blood flow.

Quantifying the impact of AE-induced changes in vasculature on hippocampal functioning is attracting increasing attention, but the relative contribution of these vascular changes remains contested. For example, some animal studies have suggested that angiogenesis may underlie improvements in spatial memory tasks independently of other neuroplastic mechanisms (van Praag et al., 2007; Kerr et al., 2010), whereas others have found improvements in spatial learning to be driven by neurogenesis with no influence at all from angiogenesis (van Praag et al., 2005). Changes in vasculature may indeed stimulate other neuroplastic mechanisms such as synaptogenesis or neurogenesis and promote hippocampal tissue growth (Kleemeyer et al., 2015), but further investigation is required to outline the direct relationship between vascularization and the enhancements in hippocampal functioning stimulated by AE (Davenport et al., 2012). Interestingly, AE is specifically associated with an increase in the density of small-diameter, rather than large-diameter, blood vessels in both humans (Bullitt et al., 2009) and animals (Bloor, 2005; Van der Borght et al., 2009). Increased microvascular density may protect against white-matter hyperintensities and may thereby reduce gray-matter atrophy and cognitive dysfunction (Voss et al., 2013a). Therefore, in addition to promoting hippocampal function, the influence of AE on vasculature may also preserve structural integrity in the hippocampus and other regions of the brain.

At the cellular level, both the animal and the human literature suggest that neurogenesis, synaptic plasticity and vascularization within the hippocampus are the three primary neuroplastic mechanisms stimulated by AE to promote hippocampal functioning. These cellular changes are in turn influenced by a number of molecular changes that occur in response to AE, and the relevance of these molecular factors to hippocampal functioning is considered below.

### Neurotrophic Factors

Neurotrophins are important to the development and maintenance of functioning neural cells in the brain (Barbacid, 1995) and are likely to play a crucial role in mediating the impact of AE on hippocampal neuroplasticity and functioning (Cotman et al., 2007; Voss et al., 2013a; Bolijn and Lucassen, 2015). Brain-derived neurotrophic factor (BDNF) is a centrally produced neurotrophin of particular interest owing to its high concentration within the hippocampus and its integral role in supporting neuronal survival and growth and synaptic plasticity (Cotman et al., 2007; Cowansage et al., 2010). Animal models have shown AE to be associated with a region-specific up-regulation of BDNF in the hippocampus (Neeper et al., 1995; Marlatt et al., 2012; Uysal et al., 2015). Similarly, in humans AE has been associated with an increase in serum BDNF (Coelho et al., 2013), with some indication that this up-regulation occurs specifically within the hippocampus (Erickson et al., 2011; Voss et al., 2013a).

Brain-derived neurotrophic factor may be the most important factor upregulated by AE, given its extensive involvement in both synaptic plasticity and neurogenesis (Cotman et al., 2007). The protein is thought to interact with energy metabolism to facilitate both pre- and post-synaptic mechanisms and the induction of LTP (Gomez-Pinilla and Hillman, 2013; Edelmann et al., 2014), as well as to promote the proliferation and survival of adult-born granule cells (Korol et al., 2013; Park and Poo, 2013). The importance of BDNF to hippocampal neuroplasticity is illustrated by animal studies showing that manipulating BDNF expression directly affects performance in spatial and contextual memory tasks (Linnarsson et al., 1997; Alonso et al., 2002; Heldt et al., 2007; Peters et al., 2010), and that downregulating BDNF inhibits AE-induced improvements in spatial memory performance (Vaynman et al., 2004).

Recently, a number of human studies have also demonstrated that the increases in serum BDNF levels associated with AE are directly correlated with improved performance in various memory domains, including spatial memory tasks (Piepmeier and Etnier, 2015). In line with the molecule's interaction with neurogenesis and its particularly high concentration within the DG subfield of the hippocampus (Farmer et al., 2004), BDNF may play an important role in pattern separation. Recent studies have shown that BDNF is required for pattern separation to occur during memory encoding and consolidation, and specifically that BDNF acts on adult-born granule cells within the DG to facilitate pattern separation (Bekinschtein et al., 2013, 2014).

Vascular endothelial growth factor (VEGF) and insulin-like growth factor 1 (IGF-1) are growth factors that are produced peripherally and are implicated in both angiogenesis and neurogenesis (Gomez-Pinilla and Hillman, 2013). Human studies have shown that AE is associated with peripheral increases in serum VEGF and IGF-1 concentrations (Schobersberger et al., 2000; Llorens-Martín et al., 2010), and both molecules are known to cross the blood-brain barrier and interact with hippocampal cells (Ding et al., 2006; Tang et al., 2010). VEGF is a hypoxia-inducible protein that stimulates angiogenesis (Krum et al., 2002). By facilitating vascularization in this way, particularly within the DG (Clark et al., 2009), the molecule also supports the AE-induced stimulation of neurogenesis (Gomez-Pinilla and Hillman, 2013). Similarly, IGF-1 is known to support AE-induced alterations in vascularization (Lopez-Lopez et al., 2004) and promotes the proliferation and survival of newborn neural cells (Gomez-Pinilla and Hillman, 2013). Indeed, inhibiting VEGF or IGF-1 has been shown to impair the promotional impact of AE on the rate of neurogenesis (Trejo et al., 2001; Fabel et al., 2003; Ding et al., 2006) and on spatial memory performance (Ding et al., 2006). Conversely, up-regulating either VEGF or IGF-1 can stimulate neurogenesis independently of AE (Aberg et al., 2000; Carro et al., 2000; Trejo et al., 2001; Cao et al., 2004).

While a number of other biochemical changes occur in response to AE (see Bolijn and Lucassen, 2015), BDNF, VEGF, and IGF-1 are considered key proteins that are upregulated by AE and induce hippocampal neuroplasticity. The important contribution of these growth factors to AE-induced neuroplasticity is underscored by animal models, which demonstrate their direct and significant impact on hippocampal functioning (Cotman et al., 2007; Llorens-Martín et al., 2010; Voss et al., 2013a; Piepmeier and Etnier, 2015). Consequently, human studies are beginning to go beyond assessing the impact of AE solely in terms of macro-scale structural changes, attempting to study the underlying biological processes indirectly.

# Inferring Micro-Scale Changes Using Human Imaging

An important approach has been to supplement measures of volumetric change with recordings of peripheral biomarkers as a proxy for changes in neurotrophin regulation within the brain. Results using this approach have thus far been mixed: some AE studies have found a positive correlation between changes in hippocampal volume or connectivity and serum concentrations of BDNF, VEGF, and IGF-1 (Erickson et al., 2011; Coelho et al., 2013; Voss et al., 2013b), while others have failed to replicate such findings (Maass et al., 2016).

Another promising approach has been to combine multimodal imaging techniques to estimate the micro-scale processes that underlie volumetric changes associated with AE. For example, one study demonstrated that AE stimulated hippocampal volume growth and then used a range of multimodal MRI techniques to suggest, indirectly, that this growth may be supported by changes in myelination rather than in vasculature (Thomas et al., 2016). Another study used diffusion tensor imaging (DTI) to show that AE-induced growth in hippocampal volume was correlated with improvements in the microstructural integrity of hippocampal gray matter, based on the assumption that lower mean diffusivity indicates increased gray-matter tissue density (Kleemeyer et al., 2015). Another multimodal approach that is growing in popularity is the use of MR spectroscopy (MRS) to measure micro-scale changes in local metabolite composition. N-acetylaspartate (NAA) is a metabolite indicative of neuronal integrity, and a number of AE studies have demonstrated that growth in hippocampal volume (Pajonk et al., 2010; Erickson et al., 2012; Gonzales et al., 2013; Wagner et al., 2015) and, in some cases, improvements in working memory performance (Erickson et al., 2011) are correlated with higher concentrations of NAA. One of these studies found that an AE intervention led to a 2% decrease in hippocampal volume, but the MRS data indicated no change in NAA (Wagner et al., 2015). Noting that approximately 50% of gray matter is composed of neuropil (Thomas et al., 2012), the authors suggested that the volumetric decline was unlikely to have resulted from a loss of neurons (which would have been indicated by a lower NAA value) and may instead be due to other factors, such as changes in glial cells (Wagner et al., 2015).
This analysis and others (Dennis et al., 2015) illustrate the utility of combining imaging modalities, such as structural MRI with MRS, to assess the functional relevance of volumetric alterations in a way that is not possible using one modality alone. Future research would benefit from adopting methods similar to those outlined above to aid translation between animal and human models and to provide a more comprehensive account of the impact of AE on the human brain.

Animal studies have demonstrated that AE stimulates a cascade of interdependent cellular and molecular mechanisms of neuroplasticity that mediate the associated enhancements in hippocampal functioning. In humans, AE is associated with increases in hippocampal volume and some indirect indicators of neuroplasticity (e.g., increased cerebral blood flow or serum BDNF concentration), which correlate with improved performance on hippocampus-dependent tasks, indicating enhanced hippocampal functioning. These findings suggest that AE can stimulate hippocampal neuroplasticity and promote the region's functioning in humans in a way similar to that demonstrated in animal models. Considerably more research will be necessary to substantiate this relationship, and methods such as multimodal imaging and the assessment of peripheral biomarkers represent promising ways for future RCTs to help bridge the gap between animal and human models. Based on the available evidence outlined above, AE does appear to have a highly beneficial impact on hippocampal integrity, and the following sections will discuss whether AE therefore represents a viable clinical intervention.

# CLINICAL APPLICATION

The capacity for AE to induce neuroplastic changes that both improve hippocampal integrity and promote hippocampus-dependent cognition may be of particular clinical importance for two reasons. Firstly, a number of psychiatric and neurological disorders seem to have a particularly potent influence on hippocampal structure (Bartsch and Wulff, 2015), and its deterioration may underlie certain aspects of their symptomatology. Secondly, psychiatric symptoms can be broadly divided into those that are affective and those that are cognitive in nature, yet a disproportionate amount of the current literature has focussed exclusively on ameliorating affective symptoms (Millan et al., 2012). Currently, no effective treatments have been developed to alleviate the cognitive deficits associated with psychiatric or neurological disorders (Wallace et al., 2011; Keefe et al., 2013, 2014; Solé et al., 2015). Promoting neuroplastic changes that enhance hippocampal functioning may be useful in remediating certain domains of cognitive dysfunction (e.g., learning and memory) that occur in disorders with a particularly detrimental impact on hippocampal integrity. The efficacy of AE is already being investigated as a therapeutic intervention to counteract the cognitive decline and deteriorating hippocampal integrity associated with aging or neurological disorders like dementia (Ahlskog et al., 2011). Comparatively little attention has been given to the potential application of AE in treating psychiatric disorders that have a similarly deleterious impact on the hippocampus and are associated with severe cognitive deficits. Using the examples of schizophrenia and MDD, the following sections will outline the need to develop effective interventions that remediate cognitive symptoms, which can have a debilitating impact on psychiatric patients.
The capacity for AE to aid in the alleviation of cognitive symptoms and improve the efficacy of current treatments will then be discussed in the context of both MDD and schizophrenia.

# Cognitive Dysfunction in Psychiatric Disorders

Behaviors observed in animal models of anxiety/MDD indicate that MDD can cause deficits in several areas of cognition, many of which are strongly related to hippocampal function (Darcet et al., 2016). A number of animal studies have shown models of anxiety/MDD to be associated with poorer performance on tasks of working memory (Mizoguchi et al., 2000; Henningsen et al., 2009) and attention (Baudin et al., 2012; Wilson et al., 2012; Wallace et al., 2014), as well as on tasks reliant on hippocampal functioning, such as episodic-like memory (Orsetti et al., 2007; Baudin et al., 2012; Naninck et al., 2015) and spatial memory (Markham et al., 2010; Darcet et al., 2014). Similarly, human patients with MDD have consistently demonstrated clinically relevant deficits in the domains of executive functioning, psychomotor speed, attention, and memory (McDermott and Ebmeier, 2009; Lee R.S. et al., 2012; McIntyre et al., 2013; Rock et al., 2013). Even after remission following successful antidepressant treatment, most patients continue to experience cognitive deficits, particularly within the domains of executive function, memory, and attention (Hammar and Ardal, 2009; Bora et al., 2013; Rock et al., 2013; Popovic et al., 2015; Solé et al., 2015). Individuals with MDD who continue to experience these cognitive symptoms are more likely to relapse and display worse psychosocial functioning outcomes (Papakostas et al., 2008; Bortolato et al., 2016), which can impair their capacity to socialize or to function well at work (Jaeger et al., 2006; McIntyre et al., 2013). Consequently, the importance of alleviating cognitive dysfunction to enhance the success of MDD treatment is increasingly being recognized (Solé et al., 2015; Bortolato et al., 2016).

The heterogeneity of schizophrenia makes the disorder extremely difficult to model in animals, and most approaches have focussed on replicating certain groups of schizophrenia-like behaviors in animals (Nestler and Hyman, 2010). For example, disrupting NMDA receptors with phencyclidine (PCP) is known to produce a range of symptoms associated with schizophrenia (Nestler and Hyman, 2010) and has subsequently been shown to cause a number of cognitive deficits, including in cognitive flexibility (Abdul-Monim et al., 2007), attention (Amitai et al., 2007) and episodic-like memory (Grayson et al., 2007; Nagai et al., 2009). Cognitive dysfunction in human patients with schizophrenia is increasingly considered one of the most debilitating aspects of the disorder (Nuechterlein et al., 2011), and the severity of cognitive dysfunction is a key determinant of functional outcome following treatment (Goldberg and Green, 2002). Schizophrenia is associated with deficits in the cognitive domains of executive functioning, processing speed, memory and attention (Mesholam-Gately et al., 2009; Keefe and Harvey, 2012). The persistence of these deficits has a significant impact on an individual's quality of life and ability to attain and maintain employment (Archer and Kostrzewa, 2015), making the amelioration of cognitive deficits a high-priority therapeutic goal in schizophrenia treatment (Malchow et al., 2013).

# The Underlying Neuroplasticity of Cognitive Dysfunction

Both MDD and schizophrenia have a detrimental impact on neuroplasticity, particularly within the hippocampus. Reduced hippocampal volume is one of the most consistently reported structural abnormalities in patients with MDD (Schmaal et al., 2016) and schizophrenia (Ellison-Wright and Bullmore, 2010; Adriano et al., 2012). In patients with schizophrenia, these hippocampal abnormalities have been correlated with deficits in memory (Gur et al., 2000) and in aspects of executive functioning involving inhibitory control (Bilder et al., 1995; Szeszko et al., 2002), demonstrating the relevance of hippocampal abnormalities to cognitive dysfunction. Both disorders seem to have a particularly deleterious impact on DG volume (Tamminga et al., 2010; Huang et al., 2013; Travis et al., 2015), which is reminiscent of the reduced level of neurogenesis found in animal models of both disorders (Eisch and Petrik, 2012; Lucassen et al., 2015; Allen et al., 2016). Schizophrenia and MDD are also associated with the inhibition of other important neuroplastic mechanisms within the hippocampus, such as synaptic plasticity (Law and Deakin, 2001; Kolomeets et al., 2007; Kobayashi, 2009; McEwen et al., 2012; Sanderson et al., 2012) and BDNF expression (Krishnan and Nestler, 2008; Green et al., 2011; Favalli et al., 2012). The inhibition of such key neuroplastic mechanisms in the hippocampus is likely to be an important factor contributing to the cognitive deficits associated with MDD (Perera et al., 2008; Kaymak et al., 2010; Nagahara and Tuszynski, 2011; Turner et al., 2012) and schizophrenia (Ranganath et al., 2008; Schobel et al., 2009; Heckers and Konradi, 2010; Zhang et al., 2012).

Targeting neuroplasticity has recently become an important approach in treating psychiatric symptoms (Kays et al., 2012) and may be useful in counteracting this hippocampal harm to remediate cognitive deficits. For example, deficits in episodic memory have been consistently found in patients with MDD (McDermott and Ebmeier, 2009) and schizophrenia (Barch and Ceaser, 2012), and given the role of the hippocampus in episodic memory formation, these deficits may be partly attributable to its dysfunction. Indeed, aspects of hippocampal dysfunction such as a lower basal rate of neurogenesis are thought to impair the capacity for pattern separation in patients with MDD (Déry et al., 2013; Shelton and Kirwan, 2013) or schizophrenia (Das et al., 2014), and the resulting negative impact on episodic memory formation may have broad implications for the symptomatology of both disorders (Dere et al., 2010; Tamminga et al., 2010).

To illustrate this point, deficits in pattern separation could impair one's capacity to discriminate correctly between stimuli and lead to a tendency toward overgeneralization (Kheirbek et al., 2012; Shelton and Kirwan, 2013). The predisposition of individuals with MDD to make negative self-inferences, coupled with a tendency to overgeneralize, could result in negative overgeneralizations in the formation of episodic memories, which may contribute to affective symptoms such as anhedonia (Eisch and Petrik, 2012; Shelton and Kirwan, 2013). In the case of schizophrenia, reduced synaptic connectivity between the DG and CA3 subfields (Kolomeets et al., 2007), in addition to a lower basal rate of neurogenesis, may contribute to psychosis to some extent. Patients with schizophrenia show a disproportionately low level of pattern separation relative to pattern completion (Tamminga et al., 2010). Pattern completion is a process complementary to pattern separation whereby, through the activation of associative networks in the CA3 region, partial information can act as a recall cue to retrieve the full representation of a previously stored memory (Nakashiba et al., 2012). It is possible that overactivation of this pattern completion mechanism could lead to inappropriate associations and representations, causing the encoding or retrieval of false episodic memories with psychotic content (Tamminga et al., 2010; Das et al., 2014).

These two theoretical examples illustrate how hippocampal dysfunction in either disorder may disrupt the correct formation of episodic memories and potentially exacerbate other psychiatric symptoms. Within this framework, stimulating neuroplastic mechanisms like neurogenesis might help to alleviate deficits in episodic memory and subsequently help to reduce the expression of other psychiatric symptoms. Furthermore, given the region's importance to learning and memory processing, it is possible that improving hippocampal functioning would also contribute to the alleviation of cognitive deficits other than those in episodic memory. For example, deficits in working memory represent an aspect of executive functioning impaired in MDD (Lee R.S. et al., 2012; Rock et al., 2013) and schizophrenia (Forbes et al., 2009). As hippocampal functioning may be an important factor supporting working memory (Fell and Axmacher, 2011; Chaieb et al., 2015), promoting hippocampal integrity may indirectly contribute to alleviating working memory deficits and improving executive functioning.

While it has not been the focus of this paper, it is also worth noting that altered hippocampal functioning can directly influence other, non-cognitive processes that are important in the pathology of certain psychiatric disorders like MDD. Through its dense connectivity with the prefrontal cortex and the amygdala, the hippocampus is also implicated in emotional regulation (O'Donnell and Grace, 1995; Seidenbecher et al., 2003; Lisman and Grace, 2005; Maren and Hobin, 2007) and plays an important role in regulating feedback inhibition of the hypothalamic-pituitary-adrenal axis (Jacobson and Sapolsky, 1991). Hippocampal dysfunction may therefore contribute to deficits in the regulation of emotional processing and stress responses that are often seen in patients with MDD (Sapolsky, 2000; Davidson et al., 2002).

These deficits in hippocampal neuroplasticity may be a crucial factor underlying hippocampal dysfunction in psychiatric disorders like MDD and schizophrenia. Directly targeting these deficits to enhance hippocampal functioning may help to improve deficient cognitive processes whose underlying etiology is particularly localized to the hippocampus. Furthermore, promoting hippocampal functioning may help to alleviate other cognitive or affective symptoms that are more broadly related to dysfunctional networks involving the hippocampus.

# Current Treatment Approaches

Psychiatric disorders like schizophrenia and MDD have been primarily treated using pharmacotherapy, such as antipsychotic or antidepressant medications, both of which have been shown to reduce psychiatric symptoms (Leucht et al., 2012; Undurraga and Baldessarini, 2012). However, several large-scale meta-analyses have recently suggested that the effectiveness of antipsychotic and antidepressant medications is only marginally different from that of placebo (Leucht et al., 2009; Rief et al., 2009), and neither has been able to successfully remediate cognitive dysfunction (Keefe et al., 2013, 2014; Solé et al., 2015). These approaches are generally not designed to target neuroplasticity, but rather other aspects of a psychiatric disorder, such as dysfunctional neurotransmitter systems. Neither antipsychotics nor antidepressants have been consistently shown to induce lasting neuroplastic changes in the brain (Rief et al., 2015). Given the growing importance placed on deficits in neuroplasticity in the underlying etiology of psychiatric disorders, it may be necessary to stimulate neuroplastic changes in order to induce lasting structural alterations and effectively alleviate cognitive dysfunction.

For example, findings regarding the impact of antipsychotic medications such as haloperidol (typical) and olanzapine (atypical) on macro-scale hippocampal structure have been equivocal, with studies finding chronic use to cause volumetric increases (Schmitt et al., 2004), decreases (Barr et al., 2013), or no change at all (Navari and Dazzan, 2009; Smieskova et al., 2009; Vernon et al., 2011). Findings on micro-scale changes associated with chronic antipsychotic use have also been mixed; for example, it remains unclear whether any antipsychotic has a consistent impact on hippocampal neurogenesis (Schoenfeld and Cameron, 2015). Even where antipsychotics like olanzapine were found to increase the number of adult-born cells in the DG, these cells were likely to be endothelial cells and oligodendrocytes rather than granule cells (Kodama et al., 2004), which may not promote DG function and mechanisms like pattern separation in the same way. The impact of both types of antipsychotics on synaptic plasticity is also ambiguous: some studies have shown drugs like olanzapine to promote dendritic growth (Park et al., 2013), while others have found that various typical and atypical antipsychotics are associated with a general reduction in dendritic complexity and impaired LTP (Frost et al., 2010; Price et al., 2014). Finally, BDNF is considered an important factor in the pathophysiology of schizophrenia. However, typical antipsychotics have largely been associated with reductions in BDNF expression, and the impact of atypical antipsychotics remains unclear, with many studies finding them to have no effect on BDNF expression in the hippocampus or other regions (Favalli et al., 2012).

Furthermore, no antipsychotic medication (Goldberg et al., 2007; Tybura et al., 2013) or other pharmacological intervention (Keefe et al., 2013) has been consistently shown to alleviate the cognitive deficits associated with schizophrenia, and where the medication is anticholinergic, cognitive deficits have been shown to worsen (Tandon, 2011). The most promising approach to restoring cognitive function in schizophrenia may in fact be cognitive remediation, which has produced significant improvements in a number of cognitive domains and functional outcomes (McGurk et al., 2007; Reddy et al., 2014); however, effect sizes remain only small to moderate (Wykes et al., 2011). Currently, cognitive deficits remain one of the most debilitating and least effectively treated aspects of schizophrenia (Gibbons and Dean, 2016).

In the treatment of MDD, widely prescribed antidepressant medications such as selective serotonin reuptake inhibitors (SSRIs) have been more convincingly shown to induce neuroplastic changes. SSRIs have been shown to stimulate the rate of hippocampal neurogenesis (Malberg et al., 2000; Boldrini et al., 2009), but again this may not reflect an increase in granule cells. It has been suggested that this effect is likely due to a 'dematuration' of mature granule cells in the DG rather than increased cell proliferation, and it is unclear what impact this would have on hippocampal functioning (Kobayashi et al., 2010). For example, mature granule cells are intrinsic to pattern completion (Nakashiba et al., 2012), so an SSRI-induced dematuration of these cells could lead to an underactivation of pattern completion and disrupt proper episodic memory formation. However, various antidepressant treatments have reliably been shown to up-regulate hippocampal BDNF (Duman and Monteggia, 2006; Musazzi et al., 2009), which appears to promote synaptic plasticity within the region (Bath et al., 2012). Despite this, antidepressant medications are encumbered by slow response rates, modest therapeutic efficacy and little or no impact on the cognitive deficits associated with MDD (Duman and Aghajanian, 2012; McIntyre et al., 2013; Rosenblat et al., 2015; Solé et al., 2015; Bortolato et al., 2016). Although increasing importance is being placed on alleviating cognitive dysfunction in MDD, research has continued to focus on treating affective symptoms (Baune and Renger, 2014). Consequently, there are currently no prescribed treatments available that effectively alleviate cognitive deficits in MDD (Keefe et al., 2014; Solé et al., 2015; Bortolato et al., 2016).

By directly targeting neuroplasticity in crucial areas like the hippocampus, it may be possible to induce lasting structural changes that promote the region's functioning and contribute to the alleviation of cognitive dysfunction in psychiatric disorders. With the global pharmaceutical industry estimated to reach a value of US\$1.3 trillion by 2018 (CMR International, 2015), efforts to bridge the gap between neuroplasticity and functional outcomes in treating psychiatric disorders will likely be dominated by pharmacological approaches. Indeed, a great deal of research is being conducted into developing an effective pharmacological intervention to treat cognitive deficits in psychiatric disorders (Wallace et al., 2011). However, none of the most promising pharmacological agents has achieved more than a moderate effect size in treating most cognitive deficits, at least within MDD (Keefe et al., 2014; Solé et al., 2015) and schizophrenia (Keefe et al., 2013). Therefore, it may be worth considering non-pharmacological approaches such as AE as an adjunct to pharmacotherapy that may improve the treatment of cognitive dysfunction.

# The Prospect of an AE Intervention

While its therapeutic potential has only been explored very recently, AE has been shown to counteract pathologically induced hippocampal harm and improve the region's functioning in a range of animal disease models, including fetal alcohol spectrum disorders, traumatic brain injury, stroke, and Parkinson's, Alzheimer's and Huntington's diseases (Patten et al., 2015). In humans, AE has been shown to stimulate hippocampal neuroplasticity and successfully counteract the deterioration of hippocampal function caused by aging or Alzheimer's disease (Intlekofer and Cotman, 2013). However, whether these findings can be extended to individuals with psychiatric disorders is still unclear. Most of the existing human literature investigating the use of AE in the treatment of MDD or schizophrenia has only measured its impact on specific psychopathology, such as depressive symptoms or positive and negative symptoms (Knöchel et al., 2012). In both disorders, AE has been shown to be effective in reducing symptoms: both positive and negative symptoms of schizophrenia (Vancampfort et al., 2012; Firth et al., 2015; Rimes et al., 2015) and affective symptoms in MDD (Cooney et al., 2012), in the latter case sometimes as effectively as antidepressants (Blumenthal et al., 1999, 2007; Brosse et al., 2002) or, in terms of neurogenesis, more effectively (Marlatt et al., 2010). The efficacy of AE in reducing a range of psychiatric symptoms suggests that AE interventions could have a number of benefits for the treatment of disorders like MDD or schizophrenia. Importantly, AE appears to be able to stimulate neuroplasticity and promote hippocampal functioning in brains with both healthy and pathologically deteriorated hippocampi. It is plausible that AE interventions could be used to counteract hippocampal harm caused by disorders that profoundly affect hippocampal functioning and, through this approach, aid in the alleviation of certain aspects of cognitive dysfunction.

In recent years, AE has started to attract attention as a therapeutic option in schizophrenia, but only a handful of studies have systematically investigated the capacity of an AE intervention to remediate the cognitive deficits associated with the disorder (Knöchel et al., 2012; Sommer and Kahn, 2015). Both cross-sectional (Kimhy et al., 2014) and interventional studies (Pajonk et al., 2010; Oertel-Knöchel et al., 2014; Kimhy et al., 2015; McEwen et al., 2015) have demonstrated that AE can promote cognitive functioning in patients with schizophrenia across a range of cognitive domains, including speed of processing, short-term and working memory, and visual learning. Although one review was unable to find an association between AE and a reduction of cognitive symptoms in patients with schizophrenia (Dauwan et al., 2016), these early results are largely promising.

On a cellular level, some studies have used animal models of schizophrenia to demonstrate that AE can promote neurogenesis (Wolf et al., 2011), hippocampal BDNF concentration and the expression of NMDA receptors in the hippocampus (Kim T.W. et al., 2014; Park et al., 2014). One study using an animal model of schizophrenia also demonstrated that AE improved performance on a spatial working memory task (Kim T.W. et al., 2014). These initial results suggest that AE could be beneficial for improving hippocampal structure and functioning in schizophrenia, but such findings must be interpreted with caution given the aforementioned difficulty of modeling complex disorders like schizophrenia in animals (Nestler and Hyman, 2010). Some recent studies in human patients with schizophrenia have shown AE interventions to be associated with increased serum BDNF concentrations (Kuo et al., 2013; Kim H. et al., 2014; Kimhy et al., 2015), and one study found that the increase in BDNF concentration accounted for a significant proportion of the improvement in cognitive performance observed following AE (Kimhy et al., 2015). Another study in patients with schizophrenia also showed that AE counteracted the deterioration of white-matter tract integrity associated with the disorder (Svatkova et al., 2015).

These findings are encouraging, and some human imaging studies have suggested that the positive impact of AE may be region-specific to the hippocampus. For example, in patients with schizophrenia, a 12-week AE intervention has led to increased hippocampal volume, increased hippocampal NAA concentration and improved performance on short-term memory and working memory tasks (Pajonk et al., 2010; Lin et al., 2015; McEwen et al., 2015). However, other studies have found AE interventions to have had no impact on hippocampal volume or function in patients with schizophrenia (Scheewe et al., 2013; Rosenbaum et al., 2015). It is possible that these discrepancies are caused by systematic differences between the studies. Although data were not available for all studies (Rosenbaum et al., 2015), attendance rates for AE sessions were generally higher in the studies that did find improvements in hippocampal volume and function (Pajonk et al., 2010; Lin et al., 2015) than in those that did not (Scheewe et al., 2013). Additionally, both studies that found no volume change used an automated algorithm to segment the hippocampus (Scheewe et al., 2013; Rosenbaum et al., 2015), which is known to be less accurate than the manual method (Morey et al., 2009) employed in one study that did find a volume increase (Pajonk et al., 2010), although this pattern did not hold in every case (Lin et al., 2015).

In patients with MDD, several interventional studies have demonstrated that AE can promote cognitive performance in executive functioning, attention, inhibitory control, speed of processing, working and spatial memory and visual learning (Kubesch et al., 2003; Vasques et al., 2011; Oertel-Knöchel et al., 2014; Greer et al., 2015). Although one study did not find AE to have any positive impact on cognition (Hoffman et al., 2008), the results so far are promising and AE may represent a viable option in remediating cognitive dysfunction in MDD (Solé et al., 2015).

In animal models of stress/MDD, micro-scale changes have been recorded demonstrating that AE can promote hippocampal neurogenesis, vascularization, BDNF expression (both brain-wide and within the hippocampus), IGF-1 expression, VEGF expression, and hippocampal synaptic plasticity (Adlard and Cotman, 2004; Zheng et al., 2006; Bjørnebekk et al., 2010; Nakajima et al., 2010; Sartori et al., 2011; Kiuchi et al., 2012; Yau et al., 2012; Lu et al., 2014). Animal models of stress/MDD have also shown that AE can reduce depressive-like behavior and deficits in spatial memory (Zheng et al., 2006; Nakajima et al., 2010; Yau et al., 2012). In human patients with MDD, AE has been associated with increased serum BDNF concentration (Gustafsson et al., 2009; Laske et al., 2010). Only one human study has sought to assess macro-scale structural changes, and it found no association between an AE intervention and hippocampal volume or neurotrophin circulation (Krogh et al., 2014). However, participant attendance at AE sessions was very low in this study (one session per week), less than half the attendance rate recorded by studies in other psychiatric populations that did demonstrate growth in hippocampal volume (2.6 sessions per week; Pajonk et al., 2010). Furthermore, participants in this study also showed no reduction in depressive symptoms. This suggests that engagement in AE may have been insufficient for changes in hippocampal structure and neurobiology to be detected.

In addition to promoting neuroplastic changes that could lead to cognitive enhancement, AE is also thought to interact with several neurotransmitter systems, including the monoamine system (Chaouloff, 1989). The action of the monoamine neurotransmitter serotonin (5-hydroxytryptamine, 5-HT) in the hippocampus is known to facilitate hippocampal learning and memory (Riedel et al., 1999; Buhot et al., 2000). 5-HT transmission in the hippocampus is disrupted in schizophrenia and MDD (Naughton et al., 2000; Middlemiss et al., 2002), and this 5-HT dysfunction is likely to contribute to deficits in learning and memory in these disorders (Meeter et al., 2006; Gray and Roth, 2007). The exact nature of this dysfunction is unclear, given the diverging effects that agonists and antagonists have on each 5-HT receptor subtype within the hippocampus and the complex interactions between 5-HT and other neurotransmitter systems (Meneses, 1999). Targeting specific hippocampal 5-HT receptors, particularly the 5-HT1a receptor subtype, is a pharmacological approach growing in popularity for promoting learning and memory in psychiatric disorders (Wallace et al., 2011) such as schizophrenia (Meltzer and Sumiyoshi, 2008) or MDD (Meeter et al., 2006). Some animal studies have indicated that AE may also be associated with elevated levels of 5-HT in the hippocampus (Gomez-Merino et al., 2001), as well as an increase in tryptophan hydroxylase, the rate-limiting enzyme in 5-HT synthesis, which is produced in the raphe nucleus that projects directly to the hippocampus (Chaouloff, 1989). It is unclear whether this interaction with hippocampal 5-HT would be beneficial, but some recent animal work has indicated that AE can specifically ameliorate dysfunction of the 5-HT1a receptor subtype (Maniam and Morris, 2010; Kim M.H. et al., 2014). The impact of AE on the monoamine system has largely been discussed in terms of its antidepressant properties (Christie et al., 2008; van Praag, 2009), but it is possible that this interaction also contributes to cognitive enhancement. However, more work is needed to elucidate the exact role of 5-HT and its receptor subtypes in hippocampal learning and memory, and whether this coincides with the actions of AE.

The inclusion of AE interventions in current treatment approaches could help counteract hippocampal harm and alleviate cognitive dysfunction caused by psychiatric disorders. The important role that AE could play in treating cognitive dysfunction is starting to gain traction in relation to schizophrenia (Malchow et al., 2013; Sommer and Kahn, 2015; Vakhrusheva et al., 2016) and MDD (Malchow et al., 2013; Oertel-Knöchel et al., 2014). Despite this, there is a distinct lack of systematic investigations (both animal and human) into the relationship between AE and neuroplasticity and hippocampal functioning, and into its impact on cognitive dysfunction in psychiatric disorders such as MDD (Malchow et al., 2013) or schizophrenia (Sommer and Kahn, 2015). Based on the available evidence, AE is best conceptualized as an adjunct to pharmacotherapy that could improve treatment efficacy by targeting cognitive dysfunctions that remain largely untreated. Pharmacotherapy may eventually provide a comprehensive treatment for cognitive dysfunction, but non-pharmacological approaches like AE offer the patient a number of distinct benefits (discussed below), which make these interventions worthy of further investigation.

# WHY ARE NON-PHARMACOLOGICAL INTERVENTIONS USEFUL?

The efficacy of both antidepressant and antipsychotic medications could be improved if more attention were paid to relevant lifestyle factors such as AE engagement (Rief et al., 2015). The mechanisms stimulated by AE appear to overlap to some degree with those of certain pharmacological treatments, such as the antidepressant fluoxetine (Huang et al., 2012). Such overlap could mean that combining a traditional pharmacological intervention with an AE intervention would have a synergistic effect on inducing neuroplasticity. For example, animal models have demonstrated that the combination of AE and antidepressant treatment up-regulated BDNF more strongly than either intervention alone (Russo-Neustadt et al., 2001; Baj et al., 2012); this is particularly important because BDNF regulation is thought to be crucial to the antidepressant mechanism (Duman and Aghajanian, 2012). Adding AE to a traditional antidepressant intervention has also been shown to reduce depressive symptoms in patients with MDD more than antidepressant treatment alone (Knubben et al., 2007; Schuch et al., 2011; Legrand and Neff, 2016). Interestingly, MDD patients with higher basal levels of BDNF due to prior treatment with SSRIs experienced a more rapid reduction in symptoms following an AE intervention than those with lower basal BDNF levels (Toups et al., 2011). This suggests that antidepressant medications may be useful for 'priming' patients with MDD to respond better to a subsequent AE intervention (Toups et al., 2011). Furthermore, in animals, AE was significantly more potent at increasing the survival of adult-born granule cells than the antidepressants fluoxetine (an SSRI) and duloxetine (an SNRI) (Marlatt et al., 2010). Adding AE to a traditional antidepressant intervention could therefore also have an additive effect on promoting neurogenesis.

Using AE as an adjunct to traditional antidepressant medication may have a synergistic effect on neuroplasticity, potentially resulting in a more effective approach to remediating psychiatric symptoms. Enhanced efficacy would be particularly useful in treating patients with MDD who do not respond to antidepressant treatment alone (Mura et al., 2014), who could account for as many as 10–30% of patients with MDD (Joffe et al., 1996). An interesting direction for future research would be to investigate whether including AE as an adjunct to a traditional pharmacological approach has a synergistic effect on neuroplasticity that results in greater treatment efficacy than either intervention can achieve alone.

In addition to the potential benefits that a combined approach may have for symptom alleviation, the inclusion of AE interventions may further benefit patients' well-being. For example, the development of a psychiatric disorder significantly increases the risk of psychiatric comorbidity (Fusar-Poli et al., 2014; Avenevoli et al., 2015). AE, conversely, is associated with a lower risk of developing various other conditions, ranging from age- and dementia-related cognitive decline to mood and anxiety disorders (Martinsen, 2008; Ahlskog et al., 2011; Mammen and Faulkner, 2013). For example, a recent systematic review concluded that AE was effective at reducing the risk of patients with schizophrenia developing a comorbid disorder (Firth et al., 2015). AE interventions could thus help reduce the risk of patients developing other comorbidities, but AE may be of even greater utility in helping to reduce the risk of pharmacologically induced side effects.

# Side Effects in Pharmacology

Pharmacological treatments are generally associated with a higher risk of adverse side effects than non-pharmacological interventions in patients with MDD or schizophrenia (Gartlehner et al., 2015, 2016). Both antipsychotic (De Hert et al., 2012) and antidepressant (Anderson et al., 2012) medications are associated with numerous adverse side effects, ranging from common, less severe symptoms like headaches or nausea to less common but more severe symptoms like cardiovascular or metabolic dysfunction. Independent of medication, psychiatric populations are already at a higher risk of cardiovascular, metabolic or respiratory dysfunction than the general population (Galletly et al., 2012; Vancampfort et al., 2013), and certain pharmacological treatments may heighten this risk further. In contrast, AE is known to elicit comparatively few adverse side effects and is associated with improvements in the social, physical and affective well-being of individuals with psychiatric disorders (Fiuza-Luces et al., 2013). Moreover, AE is known to help prevent cardiovascular, metabolic and respiratory dysfunction, which suggests that including an AE intervention may help stem the risk of psychiatric patients developing these severe conditions (Caemmerer et al., 2012; Vancampfort et al., 2014). Some evidence for this comes from recent animal studies demonstrating that AE ameliorates the metabolic, lipid peroxidation and extrapyramidal side effects of antipsychotic medication (Czéh et al., 2007; Teixeira et al., 2011; Baptista et al., 2013; Boyda et al., 2014).

Concerns are also mounting over the growing length of time for which patients receive pharmacotherapy, as the long-term impact of these medications is unclear. There is currently a distinct lack of longitudinal studies that systematically evaluate the impact of long-term treatment with widely used antidepressants like SSRIs on brain and behavior (Popovic et al., 2015). Some studies have suggested that long-term antidepressant treatment (generally lasting more than 6 months) can have a detrimental impact on executive function, memory, attention, and motivation in patients with MDD (Fava et al., 2006; Popovic et al., 2015; Bortolato et al., 2016). Worryingly, one recent animal study demonstrated that long-term administration of fluoxetine at clinically relevant doses was associated with impaired dendritic spine morphology, leading to deficits in hippocampal synaptic plasticity (Rubio et al., 2013). This is particularly concerning given the growing importance placed on hippocampal synaptic plasticity in the effective treatment of MDD (Duman and Aghajanian, 2012). Evidence is also accumulating to suggest that long-term antipsychotic treatment in humans may be linked to reductions in both global gray- and white-matter volume (Navari and Dazzan, 2009; Ho et al., 2011; Vernon et al., 2011; Fusar-Poli et al., 2013) as well as region-specific reductions in hippocampal volume (Panenka et al., 2007).

There are insufficient data to make any definitive statements about the long-term impact these pharmacological interventions could be having on the brain. However, further evidence may emerge showing that long-term treatment with common antidepressants and antipsychotics has a detrimental impact on neuroplasticity and cognitive performance. AE interventions could help counterbalance some of these harms, at least with regard to the hippocampus, through AE's direct effects on neuroplasticity and cognition. For example, recent animal studies have demonstrated that AE was able to reverse most of the damage to hippocampal volume and hippocampal synaptic connectivity caused by antipsychotic medication (Barr et al., 2013; Ramos-Miguel et al., 2015). In addition, the aforementioned potential for a combined pharmacotherapy/AE approach to have an additive effect on symptom alleviation may mean that patients reach remission faster. Thus, the inclusion of AE interventions may also reduce the total time a patient is exposed to pharmacotherapy, lowering the risk of any adverse effects associated with long-term use.

Developing further pharmacological interventions that effectively alleviate cognitive symptoms in medicated psychiatric populations will undoubtedly provide a more comprehensive treatment approach, but it could also elevate the risk of adverse side effects for the patient. AE has the potential to directly combat these adverse side effects while also contributing to the remediation of cognitive deficits and of various other psychiatric symptoms that may be related to hippocampal dysfunction. AE interventions are unlikely to represent a viable standalone treatment for cognitive dysfunction, but they may serve as an important, low-risk adjunct to pharmacotherapy that increases its effectiveness and reduces the harms associated with pharmacological interventions. However, this approach must be considered in light of some limitations of AE interventions.

# Limitations of AE Interventions

Psychiatric patients often view pharmacological interventions more negatively than do the doctors who prescribe them (Nosè et al., 2012), and patients may perceive a non-pharmacological adjunctive approach more favorably. Psychiatric patients have previously reported a favorable view of AE treatment (Stanton and Reaburn, 2014); however, this is not necessarily reflected in actual AE engagement (Vancampfort et al., 2016). Motivating psychiatric patients to engage in regular AE is likely to be the greatest obstacle to both researching and implementing AE interventions in treatment, as epitomized by the high dropout rates in AE intervention studies in psychiatric samples (Stubbs et al., 2016; Vancampfort et al., 2016). Among those who are physically able to exercise, lack of motivation is a significant factor preventing individuals with schizophrenia (Soundy et al., 2014) or MDD (Krämer et al., 2014) from adhering to AE interventions. Certain pharmacological treatments may contribute to this issue, as psychiatric patients have previously reported medication side effects as a primary factor that inhibits their capacity to engage in regular exercise (Glover et al., 2013). While many studies investigating AE interventions in psychiatric patients now incorporate motivational strategies (e.g., motivational interviewing or goal setting) to encourage AE engagement, motivational factors are rarely included as a primary outcome measure (Farholm and Sørensen, 2016). Efforts are increasingly being concentrated on promoting adherence to AE interventions in psychiatric patients (Knapen et al., 2015). For example, integrating action video games into AE interventions has been shown to be a promising way of enhancing adherence among patients with schizophrenia to interventions aimed at improving aerobic fitness (Kimhy et al., 2016b). However, more research must directly study the relationship between motivation and AE in psychiatric populations in order to measure and improve the effectiveness of motivational strategies.

Another issue in the conceptualization of AE treatments is the lack of consensus as to what type, intensity or length of exercise session has the strongest impact on the brain (Prakash et al., 2015). For example, some studies argue that high-intensity exercise is optimal for reducing symptoms in MDD (Singh et al., 2005), while others have suggested that a mild (Dunn et al., 2005) or moderate-intensity (Stanton and Reaburn, 2014) exercise intervention would be optimal. The optimal intensity may depend on the purpose of the intervention; for example, it has been suggested that improving cognitive performance may require high-intensity interval training, whereas preserving cognitive function in an aging brain may require a lower-intensity, more continuous protocol (Duzel et al., 2016). Exercise type may also be an important factor. Although most of the current literature has focussed on AE, other forms of exercise such as yoga (Lin et al., 2015) or weight training (Suo et al., 2016) may also benefit brain health and cognition. Given the growing interest in exercise as a therapeutic intervention, it is surprising that only a handful of studies have attempted to systematically establish the most effective way to apply it in psychiatric populations (Perraton et al., 2010; Stanton and Happell, 2014; Stanton and Reaburn, 2014; Kimhy et al., 2016a). Future research should concentrate on establishing the merits of different forms of exercise and on fully characterizing the dose-response relationship between the intensity and length of an AE intervention and its therapeutic outcome in each psychiatric population.

## CONCLUSION

Research demonstrating the potential for AE to promote hippocampal structure and function is growing at an impressive rate as more and more work is translated from animal to human models. Importantly, the beneficial impact that AE has on the brain may have a useful clinical application in treating disorders in which hippocampal damage is a significant factor underlying the symptomatology. There is currently a particular need to develop effective strategies that alleviate cognitive dysfunction, and targeting deficits in the neuroplasticity of areas crucial to cognition, such as the hippocampus, is a promising approach. AE interventions represent an effective method of promoting hippocampal neuroplasticity and function that carries few risks and offers several additional benefits to the patient, such as combating pharmacologically induced side effects. This paper has highlighted two promising examples of how AE interventions could improve the treatment of schizophrenia and MDD, but AE interventions could well have broader applications in mental health, such as in treating substance abuse (Zschucke et al., 2012).

However, several issues must be addressed for AE to be successfully implemented as an adjunct to pharmacotherapy. First, more RCTs are needed to systematically establish a causal relationship between AE and neural and cognitive outcomes. Future RCTs could benefit from a more hippocampus-focussed approach, particularly in the choice of cognitive tasks. Techniques like multimodal imaging and peripheral biomarker assays should also be incorporated to build a comprehensive account of the impact that AE has on the brain. A greater focus should be placed on investigating the impact of AE in psychiatric populations, in terms of both neuroplastic changes and therapeutic efficacy. An interesting avenue of research is to assess to what extent AE could interact with current pharmacological treatments to reduce side effects and have a synergistic impact on neuroplasticity and symptom reduction. Finally, future research should strive to establish standardized methodologies for investigating AE and to identify the most effective way to implement an AE intervention to maximize therapeutic outcome. Improving our understanding of the role that lifestyle factors such as exercise play in maintaining and promoting brain function could have major implications for the way in which we treat, or even prevent, psychiatric and neurological disorders.

# AUTHOR CONTRIBUTIONS

AK was responsible for the conception and writing of the review and received substantial, direct and intellectual contributions from JH, PL, and MY. All authors approved the paper for publication.

# FUNDING

MY is supported by the National Health and Medical Research Council of Australia Fellowship (#APP1021973). PL is supported by the Netherlands Organization for Scientific Research (NWO), ISAO, Alzheimer Nederland and Amsterdam Brain and Cognition (ABC).

# ACKNOWLEDGMENT

Thank you to Dr. Chao Suo for contributing toward the final revisions of the paper.

# REFERENCES



patients with major depression. Arch. Intern. Med. 159, 2349–2356. doi: 10.1001/archinte.159.19.2349





schizophrenia: feasibility, safety, and adherence. Psychiatr. Serv. 67, 240–243. doi: 10.1176/appi.ps.201400523


executive functions in depressed patients. J. Clin. Psychiatry 64, 1005–1012. doi: 10.4088/JCP.v64n0905


sleep disruption, exercise and inflammation: implications for depression and antidepressant action. Eur. Neuropsychopharmacol. 20, 1–17. doi: 10.1016/j.euroneuro.2009.08.003



antipsychotic drugs on hippocampal volume in schizophrenia. Schizophr. Res. 94, 288–292. doi: 10.1016/j.schres.2007.05.002


model of schizophrenia. Neuropharmacology 62, 1349–1358. doi: 10.1016/j.neuropharm.2011.08.005



Tulving, E. (1993). What is episodic memory? Curr. Dir. Psychol. Sci. 2, 67–70.


relationship to cell proliferation and neurogenesis. Hippocampus 19, 928–936. doi: 10.1002/hipo.20545


disorders. Pharmacol. Biochem. Behav. 99, 130–145. doi: 10.1016/j.pbb.2011.03.022


hippocampal cell proliferation and learning and memory in stressed rats. Neuroscience 222, 289–301. doi: 10.1016/j.neuroscience.2012.07.019


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Kandola, Hendrikse, Lucassen and Yücel. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Exercise-Induced Fitness Changes Correlate with Changes in Neural Specificity in Older Adults

Maike M. Kleemeyer<sup>1</sup>\*, Thad A. Polk<sup>2</sup>, Sabine Schaefer<sup>3</sup>, Nils C. Bodammer<sup>1</sup>, Lars Brechtel<sup>4</sup> and Ulman Lindenberger<sup>1,5</sup>

<sup>1</sup>Center for Lifespan Psychology, Max Planck Institute for Human Development, Berlin, Germany, <sup>2</sup>Computational and Cognitive Neuroscience Laboratory, Department of Psychology, University of Michigan, Ann Arbor, MI, USA, <sup>3</sup> Institute of Sport Science, Saarland University, Saarbruecken, Germany, <sup>4</sup>Berlin Academy of Sports Medicine, Berlin, Germany, <sup>5</sup>European University Institute, San Domenico di Fiesole (FI), Italy

Neural specificity refers to the degree to which neural representations of different stimuli can be distinguished. Evidence suggests that neural specificity, operationally defined as stimulus-related differences in functional magnetic resonance imaging (fMRI) activation patterns, declines with advancing adult age, and that individual differences in neural specificity are associated with individual differences in fluid intelligence. A growing body of literature also suggests that regular physical activity may help preserve cognitive abilities in old age. Based on this literature, we hypothesized that exercise-induced improvements in fitness would be associated with greater neural specificity among older adults. A total of 52 adults aged 59–74 years were randomly assigned to one of two aerobic-fitness training regimens, which differed in intensity. Participants in both groups trained three times a week on stationary bicycles. In the low-intensity (LI) group, the resistance was kept constant at a low level (10 Watts). In the high-intensity (HI) group, the resistance depended on participants' heart rate and therefore typically increased with increasing fitness. Before and after the 6-month training phase, participants took part in a functional MRI experiment in which they viewed pictures of faces and buildings. We used multivariate pattern analysis (MVPA) to estimate the distinctiveness of neural activation patterns in ventral visual cortex (VVC) evoked by face or building stimuli. Fitness was also assessed before and after training. In line with our hypothesis, training-induced changes in fitness were positively associated with changes in neural specificity. We conclude that physical activity may protect against age-related declines in neural specificity.

Keywords: aging, fitness, physical exercise, neural specificity, multivariate pattern analysis

# INTRODUCTION

Pictures of faces, houses, and many other stimulus categories elicit distinguishable patterns of neural response in ventral visual cortex (VVC). For example, using functional magnetic resonance imaging (fMRI), Haxby et al. (2001) found distinct neural activation patterns in response to eight stimulus categories within ventral temporal cortex. After being trained on a subset of activation patterns, machine learning classifiers can often decode the stimulus category associated with novel patterns (Haynes and Rees, 2006; Norman et al., 2006). And the more distinctive, or specific, the patterns are, the more accurately a classifier will be able to predict the stimulus category from neural activity, so classifier accuracy is a natural way to estimate neural specificity.

#### Edited by:

Soledad Ballesteros, National University of Distance Education, Spain

#### Reviewed by:

Jonas Kaplan, University of Southern California, USA; Pietro Pietrini, IMT School for Advanced Studies, Italy; Eduard Kraft, Ludwig Maximilian University of Munich, Germany

> \*Correspondence: Maike M. Kleemeyer

kleemeyer@mpib-berlin.mpg.de

Received: 21 October 2016 Accepted: 03 March 2017 Published: 16 March 2017

#### Citation:

Kleemeyer MM, Polk TA, Schaefer S, Bodammer NC, Brechtel L and Lindenberger U (2017) Exercise-Induced Fitness Changes Correlate with Changes in Neural Specificity in Older Adults. Front. Hum. Neurosci. 11:123. doi: 10.3389/fnhum.2017.00123
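The idea that classifier accuracy indexes neural specificity can be illustrated with a small simulation. The sketch below is a minimal, hypothetical Python example: it generates synthetic "face" and "house" voxel patterns whose underlying separation is set by an assumed `separation` parameter, then decodes held-out runs with a correlation-based nearest-centroid rule and leave-one-run-out cross-validation, loosely in the spirit of Haxby et al. (2001). The data, function names, classifier choice, and parameter values are all illustrative assumptions, not the analysis pipeline of any study cited here.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length voxel patterns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_pattern(patterns):
    """Voxel-wise mean of several patterns."""
    return [sum(values) / len(patterns) for values in zip(*patterns)]


def simulate_runs(rng, n_runs, n_voxels, separation, noise=1.0):
    """Synthetic data: one noisy face and one noisy house pattern per run.
    `separation` sets how far apart the two underlying category patterns are."""
    base = [rng.gauss(0, 1) for _ in range(n_voxels)]
    offset = [rng.gauss(0, 1) for _ in range(n_voxels)]
    face_true = [b + separation * o for b, o in zip(base, offset)]
    house_true = [b - separation * o for b, o in zip(base, offset)]

    def noisy(true):
        return [[t + rng.gauss(0, noise) for t in true] for _ in range(n_runs)]

    return noisy(face_true), noisy(house_true)


def leave_one_run_out_accuracy(faces, houses):
    """Assign each held-out pattern to the category whose centroid (mean over
    the remaining runs) it correlates with most; return the hit rate."""
    n_runs = len(faces)
    correct = 0
    for test in range(n_runs):
        train = [r for r in range(n_runs) if r != test]
        centroids = {
            "face": mean_pattern([faces[r] for r in train]),
            "house": mean_pattern([houses[r] for r in train]),
        }
        for label, pattern in (("face", faces[test]), ("house", houses[test])):
            scores = {name: pearson(pattern, c) for name, c in centroids.items()}
            if max(scores, key=scores.get) == label:
                correct += 1
    return correct / (2 * n_runs)


rng = random.Random(0)
for separation in (0.0, 0.5, 2.0):
    faces, houses = simulate_runs(rng, n_runs=10, n_voxels=200, separation=separation)
    print(separation, leave_one_run_out_accuracy(faces, houses))
```

Under these assumptions, accuracy hovers around chance (0.5) when the two underlying patterns are identical and approaches 1.0 as they become more distinct, which is the sense in which decoding accuracy tracks specificity.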

Evidence suggests that neural specificity declines with increasing age (Grady et al., 1994; Park et al., 2004) and reduced neural specificity is associated with lower cognitive performance in a variety of cognitive tasks in older adults (Park et al., 2010). But why does neural specificity decline with age? Age-related declines in neurotransmitter function and neuromodulation have been suggested as underlying mechanisms. For example, the number of dopamine (DA) neurons (Bäckman et al., 2006), as well as DA receptor levels (Inoue et al., 2001) show pronounced and widespread decrements with advancing adult age. Neurocomputational models predict that attenuated neuromodulation lowers a cell's responsivity and leads to less differentiated neural responses to different stimuli (i.e., less distinct neural representations), which in turn would explain age-related deficits across a wide range of cognitive domains (Li and Sikström, 2002).
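The neurocomputational argument can be sketched with a single model unit. In the toy Python example below, the unit's response is a logistic function of its net input, and a `gain` parameter scales the slope of that function as a stand-in for neuromodulatory strength. The specific function, the input values, and the names used are illustrative assumptions, not the implementation described by Li and Sikström (2002).

```python
import math


def logistic_response(net_input, gain):
    """Activation of a model unit: a logistic function whose slope is scaled by
    `gain`, used here as a stand-in for neuromodulatory strength."""
    return 1.0 / (1.0 + math.exp(-gain * net_input))


def response_separation(input_a, input_b, gain):
    """How differently the unit responds to two stimuli."""
    return abs(logistic_response(input_a, gain) - logistic_response(input_b, gain))


# Two hypothetical stimuli that drive the unit with net inputs of +0.5 and -0.5
for gain in (1.0, 0.5, 0.25):
    print(gain, round(response_separation(0.5, -0.5, gain), 3))
```

Lowering the gain flattens the logistic curve, so the same two inputs produce responses that lie closer together, that is, less differentiated representations.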

On the other hand, regular physical activity has repeatedly been shown to preserve cognitive abilities in old age (for reviews see Bherer et al., 2013; Voss et al., 2013). Evidence from animal research suggests that exercise upregulates DA (Sutoo and Akiyama, 2003; Poulton and Muir, 2005; Foley and Fleshner, 2008), possibly by stimulating DA synthesis through a calcium/calmodulin-dependent system. Consistent with this hypothesis, Ruscheweyh et al. (2011) found increased plasma concentrations of DA after 6 months of physical training in older humans. Moreover, Stroth et al. (2010) showed that young adults with a genotype associated with lower DA levels (val/val COMT gene homozygotes) exhibited greater cognitive improvements after a 4-month exercise intervention than carriers of other genotypes (met carriers). Since the relationship between DA and cognitive performance seems to follow an inverted U-shape, Stroth et al. (2010) hypothesized that exercise may optimize central DA availability, which would especially benefit individuals with suboptimal initial levels by moving them further up the curve. Similarly, a cross-sectional study with older adults showed that val/val homozygotes benefitted most from higher fitness in terms of Flanker task performance, suggesting that high fitness levels may compensate for being a COMT val/val homozygote (Voelcker-Rehage et al., 2015).
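The inverted-U argument can be made concrete with a toy quadratic. In the hypothetical Python sketch below, performance peaks at an assumed optimal level of DA availability, and every simulated individual receives the same fixed increase in DA. The functional form, the optimum, the boost size, and the baseline values are assumptions chosen for illustration, not estimates from Stroth et al. (2010) or Voelcker-Rehage et al. (2015).

```python
def performance(da_level, optimum=1.0):
    """Toy inverted-U: performance is maximal (1.0) at the optimal DA level."""
    return 1.0 - (da_level - optimum) ** 2


def benefit_of_boost(baseline, boost=0.3, optimum=1.0):
    """Performance change when DA availability rises by a fixed amount."""
    return performance(baseline + boost, optimum) - performance(baseline, optimum)


# Hypothetical baselines: below, near, and above the optimum
for baseline in (0.4, 0.9, 1.2):
    print(baseline, round(benefit_of_boost(baseline), 2))
```

Under these assumptions, the low-baseline case gains the most, a baseline close to the optimum gains little or even loses slightly, and a baseline already past the optimum is pushed further down the curve.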

Taken together, these two lines of research suggest that neural specificity may be a mechanism through which exercise helps preserve cognitive functions in older adulthood. If so, exercise-induced improvements in fitness should be associated with more positive changes in neural specificity.

We are aware of four previous intervention studies that examined exercise-related changes in BOLD activation (Colcombe et al., 2004; Voss et al., 2010; Voelcker-Rehage et al., 2011; Maffei et al., 2017). All four focused on changes in neural efficiency, that is, more efficient use of brain networks, which is typically reflected in a reduced BOLD signal and in greater functional connectivity with increasing cognitive load. The results of these studies suggest that aerobic fitness training, as opposed to anaerobic training or a passive control condition, improves neural efficiency during task performance, as reflected by maintained (Maffei et al., 2017) or even reduced (Colcombe et al., 2004; Voelcker-Rehage et al., 2011) BOLD activation in task-relevant areas, as well as by strengthened functional connections within the default mode and frontal executive networks (Voss et al., 2010). In contrast to these earlier studies, the present study investigates exercise-induced changes in neural specificity, that is, the distinctiveness of stimulus-evoked neural activation patterns, irrespective of BOLD signal strength. Note that it is possible to observe changes in neural efficiency without changes in neural specificity, for example when two different stimuli elicit smaller, more efficient activations after the intervention that are not accompanied by reductions in activation overlap. Likewise, one may observe changes in neural specificity without changes in neural efficiency, for example when two stimuli elicit more distinct activation patterns after the intervention that are not accompanied by reductions in BOLD signal strength.
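The distinction between efficiency and specificity can be illustrated numerically. The Python sketch below uses hypothetical five-voxel patterns and two simplifying assumptions: mean activation stands in for BOLD signal strength (efficiency), and one minus the Pearson correlation between the face and house patterns stands in for distinctiveness (specificity). Neither measure is claimed to be the one used in the studies cited.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length voxel patterns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def distinctiveness(a, b):
    """1 - Pearson r: unaffected by positive rescaling of either pattern."""
    return 1.0 - pearson(a, b)


def mean_activation(pattern):
    """Mean activation across voxels, a crude stand-in for BOLD signal strength."""
    return sum(pattern) / len(pattern)


# Hypothetical five-voxel patterns evoked by faces and houses
face = [2.0, 1.6, 1.2, 0.8, 0.4]
house = [1.6, 2.0, 0.4, 1.2, 0.8]

# Efficiency-like change: both patterns shrink by 40%, shapes unchanged
face_small = [0.6 * v for v in face]
house_small = [0.6 * v for v in house]

# Specificity-like change: same house values rearranged, same mean activation
house_distinct = [0.8, 2.0, 1.6, 1.2, 0.4]

print(round(distinctiveness(face, house), 2),
      round(distinctiveness(face_small, house_small), 2),
      round(distinctiveness(face, house_distinct), 2))
```

Scaling both patterns down lowers mean activation but leaves the correlation-based distinctiveness unchanged, whereas rearranging the house pattern at the same mean activation makes the two patterns more distinct.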

To test whether exercise-induced fitness improvements translate into changes in neural specificity, we used fMRI and multivariate pattern analysis (MVPA) within an exercise dose-response paradigm. Elderly participants were randomly assigned to training regimens with differing levels of intensity. Before and after the 6-month training phase, participants performed a graded maximal exercise test to assess their training-related fitness improvements, as well as a passive viewing task while functional brain images were acquired. As in earlier work (see Carp et al., 2011), the distinctiveness of neural activation patterns in response to different stimulus categories served as an index of neural specificity. To examine the hypothesized association between fitness and neural specificity, we correlated changes in fitness with changes in neural specificity.
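The planned analysis, correlating pre-to-post changes in fitness with pre-to-post changes in neural specificity, can be sketched as follows. This minimal Python example assumes simple difference scores (post minus pre) and a Pearson correlation; the five participants' values are made-up placeholders, not study data, and the function names are hypothetical.

```python
import math


def change_scores(pre, post):
    """Post-minus-pre difference for each participant."""
    return [b - a for a, b in zip(pre, post)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up placeholder values for five hypothetical participants (not study data)
fitness_pre = [-0.4, 0.1, -0.2, 0.3, 0.0]
fitness_post = [0.2, 0.3, -0.1, 0.6, 0.1]
specificity_pre = [0.70, 0.75, 0.68, 0.80, 0.72]   # e.g., decoding accuracy
specificity_post = [0.78, 0.77, 0.69, 0.86, 0.72]

r = pearson(change_scores(fitness_pre, fitness_post),
            change_scores(specificity_pre, specificity_post))
print(round(r, 2))
```

A positive r here would mean that participants whose fitness improved more also showed larger gains in specificity; inference on real data (sample size, significance testing) is outside the scope of this sketch.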

#### MATERIALS AND METHODS

#### Participants

The total sample consisted of 52 community-dwelling older adults aged 59–74 years (mean 65.95 ± 4.36, 20 males). All participants met the following inclusion criteria: (1) age between 59 and 75 years; (2) physical inactivity prior to study enrollment (MET < 40 based on the German version of the compendium of physical activities); (3) MMSE score ≥ 26; (4) free of neurological, psychiatric, and cardiovascular diseases; (5) right-handed; (6) no contraindication for heart-rate-controlled exercise training (e.g., no medication with beta blockers); (7) suitability for MR assessment (e.g., no magnetic implants, no claustrophobia). This study was carried out in accordance with the recommendations of the ethics committee of the German Psychological Society (DGP). All participants gave written informed consent in accordance with the Declaration of Helsinki and participated voluntarily. They were paid for study completion, and training adherence was reinforced through a bonus system. Details on recruitment can be found in previously published work based on the same study (Kleemeyer et al., 2016). Five participants were excluded from the MVPA analyses, four due to incomplete fMRI data and one due to improper slice positioning at pretest. One participant was excluded from the fitness analyses due to problems in VO2max detection. Thus, correlation analyses were based on data from 46 participants.

#### Design

Participants completed a 6-month fitness intervention with a comprehensive test battery before (pre) and after (post) the training. Please note that we confine ourselves to describing only those methods relevant for the scope of this article, that is, training, fitness assessment, and imaging procedures. Additional information can be found in Kleemeyer et al. (2016).

# Training

Participants were randomly assigned to a high-intensity (HI) or low-intensity (LI) training regimen subsequent to the pretest assessment. Groups were counterbalanced for age, sex, years of education, digit-symbol performance, and MMSE scores.

Participants in each of the two groups exercised in our lab on stationary bikes three times a week for 55 min per session, with a gradual increase during the first 3 weeks. Over the 6-month period, a total of 75 training sessions could be completed. For the HI group, training intensity was calibrated to result in a heart rate at 80% of the individual's ventilatory anaerobic threshold (Wasserman et al., 1990). In contrast, the LI group exercised at a constant resistance of 10 W. For the last 21 sessions, five 2-min intervals were integrated after 20 min of training in order to further increase variance in fitness gains. During these 2-min windows, the LI group only increased the cadence from 60–70 to 80–90 cycles/min, while the HI group also increased the intensity to a resistance corresponding to 110% of the individual's ventilatory anaerobic threshold. Training intensity was automatically controlled using the software custo cardio concept (custo med GmbH, Ottobrunn, Germany), with a staff member supervising compliance for each participant and each training session. Up to six participants exercised simultaneously, irrespective of intensity levels (i.e., HI and LI participants exercised together). Participants were informed of the existence of different training regimens only after the study had ended.

#### Cardiovascular Fitness Assessment

Participants performed a graded maximal exercise test on a cycle ergometer to assess their cardiovascular fitness. The test started at 10 W, increased to 25 W after 2 min, and then increased in 25 W increments every 2 min until exhaustion or signs of cardiac or respiratory distress. A sports physician continuously monitored the electrocardiogram, oxygen uptake, heart rate, and blood pressure. To obtain a more robust fitness measure, we computed an aggregate of the maximum oxygen consumption at exhaustion (VO2max) and the oxygen consumption at the ventilatory anaerobic threshold (VO2AT). To this end, data were z-transformed in a way that preserves mean differences between time points, namely by subtracting the mean computed across pretest and posttest data from both. The aggregate fitness measure VO2 served as the outcome of the fitness assessment.
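The common-mean standardization can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the text specifies only that the common mean of pretest and posttest data was subtracted, so dividing by an SD pooled over both time points and averaging the VO2max and VO2AT z-scores with equal weights are our assumptions, and the function names are hypothetical.

```python
import numpy as np


def common_z(pre, post):
    """Standardize pre and post scores with one mean and SD pooled over both.

    Because both time points share the same reference, the pre-to-post mean
    difference survives the transformation (separate z-scoring per time
    point would force both means to zero).
    """
    pre = np.asarray(pre, dtype=float)
    post = np.asarray(post, dtype=float)
    pooled = np.concatenate([pre, post])
    mu, sd = pooled.mean(), pooled.std(ddof=1)
    return (pre - mu) / sd, (post - mu) / sd


def aggregate_fitness(vo2max_pre, vo2max_post, vo2at_pre, vo2at_post):
    """Equal-weight average of the commonly standardized VO2max and VO2AT."""
    max_pre, max_post = common_z(vo2max_pre, vo2max_post)
    at_pre, at_post = common_z(vo2at_pre, vo2at_post)
    return (max_pre + at_pre) / 2.0, (max_post + at_post) / 2.0
```

With equal group sizes at both time points, the pre and post z-score means are symmetric around zero, so any fitness gain shows up as a positive post-minus-pre difference in the aggregate.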

#### MRI Data Acquisition and Preprocessing

During functional imaging, participants passively viewed face, house, or phase-scrambled images, following the procedures of Park et al. (2010). They completed two runs, each of which consisted of four blocks per stimulus category. During every block 15 images were shown for 2 s each, resulting in 30 s per block and 6 min per run. Stimuli were presented via E-prime (Psychology Software Tools, Pittsburgh, PA, USA) and displayed by a projection system.

Brain images were acquired on a Siemens TIM Trio 3T MRI scanner (Siemens, Erlangen, Germany). A conventional echo-planar MR sequence was used for functional acquisitions (TR = 2000 ms, TE = 30 ms, flip angle = 80°, FOV = 216 mm) encompassing 192 volumes per run and 36 slices per volume (slice thickness 3 mm). Slices were 72 × 72 matrices acquired parallel to the corpus callosum. A high-resolution T1-weighted MPRAGE (TR = 2500 ms, TE = 4.76 ms, TI = 1100 ms, flip angle = 7°, acquisition matrix = 256 × 256 × 176, 1 mm isotropic voxels) was also acquired to facilitate warping masks from MNI to individual subject space. Data were preprocessed using SPM12 (Wellcome Department of Cognitive Neurology, London, UK<sup>1</sup>). Functional images were realigned to the mean volume. The T1-weighted image was normalized to MNI space. The inverse normalization parameters were then applied to an AAL atlas-based mask of VVC (including bilateral occipital cortices, inferior temporal cortices, and fusiform gyri). As a last step, the T1-weighted image was co-registered to the mean functional image, and the same parameters were applied to the mask such that all images mapped into the subject's native space. No normalization, spatial smoothing, or other transformation was applied to the functional images.

To obtain activation maps, we set up a General Linear Model (GLM) to estimate the response to each category relative to the phase-scrambled control images. We defined a separate regressor for each experimental block, resulting in eight estimates of face-evoked activation and eight estimates of house-evoked activation. We also included six nuisance covariates per run in the GLM, modeling head translation and rotation.
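As an illustration of a block-wise GLM of this kind, the Python sketch below builds one regressor per block plus six motion covariates for a single run and fits it by ordinary least squares. Several details are assumptions rather than the authors' settings: the block order and onsets (`FACE_ONSETS`, `HOUSE_ONSETS` are hypothetical), the double-gamma HRF shape (gamma shapes 6 and 16, undershoot ratio 1/6, modeled loosely on SPM's canonical HRF), the explicit intercept, and the treatment of unmodeled phase-scrambled blocks as the implicit baseline. It is not SPM's implementation.

```python
import math

import numpy as np

TR = 2.0         # repetition time in s (from the acquisition parameters)
N_VOLS = 192     # volumes per run
BLOCK_VOLS = 15  # 15 images x 2 s = 30 s = 15 volumes at TR = 2 s


def canonical_hrf(tr=TR, length_s=32.0):
    """Double-gamma HRF sampled at the TR (assumed SPM-like shape)."""
    t = np.arange(0.0, length_s, tr)
    peak = t ** 5 * np.exp(-t) / math.factorial(5)          # gamma pdf, shape 6
    undershoot = t ** 15 * np.exp(-t) / math.factorial(15)  # gamma pdf, shape 16
    h = peak - undershoot / 6.0
    return h / h.sum()


def block_design(onsets, motion, n_vols=N_VOLS):
    """Design matrix: one convolved boxcar per block, 6 motion columns, intercept.

    onsets: onset volume of each face or house block in this run. Phase-scrambled
            blocks get no regressor, so they act as the implicit baseline.
    motion: (n_vols, 6) realignment parameters (3 translations, 3 rotations).
    """
    hrf = canonical_hrf()
    cols = []
    for onset in onsets:
        box = np.zeros(n_vols)
        box[onset:onset + BLOCK_VOLS] = 1.0
        cols.append(np.convolve(box, hrf)[:n_vols])
    return np.column_stack(cols + [motion, np.ones(n_vols)])


def fit_glm(X, Y):
    """Ordinary least squares; returns one row of betas per design column."""
    betas, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return betas


# Hypothetical block order F, H, S repeated four times (onsets in volumes).
FACE_ONSETS = [0, 45, 90, 135]
HOUSE_ONSETS = [15, 60, 105, 150]
```

With `block_design(FACE_ONSETS + HOUSE_ONSETS, motion)`, rows 0-3 of the fitted betas hold the face-block estimates and rows 4-7 the house-block estimates; across two runs this yields the eight face and eight house estimates described above.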

#### Multivariate Pattern Analysis

Since we were interested in the responses to face and house stimuli, we restricted the analysis to voxels within VVC. We applied MVPA using correlation analysis (see Haxby et al., 2001) on individual subject data, separately for pretest and posttest. More precisely, we used the 16 coefficient estimates for faces and houses vs. phase-scrambled images from the GLM and extracted the activation pattern for voxels within VVC. Next, we computed the Pearson correlation within categories (i.e., correlating the face-evoked activation patterns from odd blocks pairwise with the face-evoked activation patterns from even blocks, and likewise for house-evoked activation patterns) and the Pearson correlation between categories (i.e., correlating the face-evoked activation patterns from odd blocks pairwise with the house-evoked activation patterns from even blocks, and vice versa). To approximate a normal distribution, correlation coefficients were transformed into Fisher's z-values. Neural specificity was then defined as the difference between the mean within-category and between-category correlations.

<sup>1</sup>http://www.fil.ion.ucl.ac.uk/spm/
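The neural specificity index can be sketched in Python as below. This is a minimal sketch assuming that the eight face and eight house block patterns are stacked in temporal order and that "pairwise" means all odd-even block pairs (rather than only adjacent pairs); the function names are hypothetical.

```python
import numpy as np


def fisher_z_pairs(a, b):
    """Fisher z of the Pearson r between every row of a and every row of b."""
    n = len(a)
    r = np.corrcoef(a, b)[:n, n:]  # cross-correlation block of the stacked matrix
    return np.arctanh(r)


def neural_specificity(face, house):
    """Mean within-category minus mean between-category Fisher z.

    face, house: (n_blocks, n_voxels) block-wise beta patterns within the mask,
    in temporal order, so rows 0, 2, 4, ... are the odd-numbered blocks.
    """
    face_odd, face_even = face[0::2], face[1::2]
    house_odd, house_even = house[0::2], house[1::2]
    within = np.concatenate([fisher_z_pairs(face_odd, face_even).ravel(),
                             fisher_z_pairs(house_odd, house_even).ravel()])
    between = np.concatenate([fisher_z_pairs(face_odd, house_even).ravel(),
                              fisher_z_pairs(house_odd, face_even).ravel()])
    return within.mean() - between.mean()
```

Values near zero indicate that faces and houses evoke similar patterns in VVC; larger positive values indicate more distinct, category-specific patterns.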

#### Statistical Analyses

Statistical analyses were performed using SPSS (IBM Corp., IBM SPSS Statistics, V22, Armonk, NY, USA). To assess effects on fitness and neural specificity, we used repeated-measures analysis of variance (ANOVA) with time point as a within-subject factor and training group as a between-subject factor. To examine whether changes in fitness were associated with changes in neural specificity, we performed a two-tailed Pearson correlation analysis across participants from both groups, using difference scores (post-training minus pre-training) for both fitness and neural specificity. The alpha level for all analyses was set to 0.05.
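The change-change correlation can be sketched in Python with NumPy only. This is a minimal illustration, not the SPSS procedure: it returns r, R<sup>2</sup>, and the t statistic with n − 2 degrees of freedom; a two-tailed p-value would additionally require the t distribution (e.g., via `scipy.stats.pearsonr`). The function name is hypothetical.

```python
import numpy as np


def change_change_r(pre_x, post_x, pre_y, post_y):
    """Pearson r between post-minus-pre change scores of two measures."""
    dx = np.asarray(post_x, dtype=float) - np.asarray(pre_x, dtype=float)
    dy = np.asarray(post_y, dtype=float) - np.asarray(pre_y, dtype=float)
    r = np.corrcoef(dx, dy)[0, 1]
    t = r * np.sqrt((len(dx) - 2) / (1.0 - r ** 2))  # t statistic, n - 2 df
    return r, r ** 2, t
```

R<sup>2</sup> is simply r squared: for the reported r = 0.310, 0.310<sup>2</sup> ≈ 0.096, which matches the value given in the Results.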

#### RESULTS

There were no baseline differences between participants in the two training groups with respect to age, years of education, MMSE, BMI, fitness, hormone replacement therapy, and treated hypertension (see **Table 1** for means and standard deviations (SD) or proportion of participants, respectively). Also, training adherence did not differ reliably between the two groups (mean HI = 71.14, mean LI = 69.12, t(44) = −1.088, p = 0.282).

The intervention was associated with increasing fitness levels (F(1,49) = 5.637; p = 0.022; η<sup>2</sup><sub>p</sub> = 0.103). However, HI and LI groups did not differ in mean fitness changes (F(1,49) = 0.997; p = 0.323; η<sup>2</sup><sub>p</sub> = 0.020). Consequently, data were collapsed across treatment conditions when investigating exercise effects on neural specificity.
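The partial eta squared values can be cross-checked against the F statistics, because for an F test η<sup>2</sup><sub>p</sub> = F·df<sub>effect</sub> / (F·df<sub>effect</sub> + df<sub>error</sub>). The one-line Python helper below (name hypothetical) is a sketch of that identity, not part of the original analysis.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)
```

Applied to F(1,49) = 5.637 this returns about 0.103, and to F(1,49) = 0.997 about 0.020, matching the values reported above.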

As predicted, greater changes in fitness were associated with greater changes in neural specificity (r(46) = 0.310, p = 0.036, R<sup>2</sup> = 0.096), irrespective of the training regimen's intensity (see **Figure 1**). Similar to changes in fitness, neural specificity increased from pretest to posttest in some participants, and decreased in others, especially among those whose fitness did not improve. Overall there was no significant change in neural specificity with training (F(1,45) = 1.891; p = 0.176; η<sup>2</sup><sub>p</sub> = 0.040) and no reliable interaction of neural specificity changes with group (F(1,45) = 0.006; p = 0.936; η<sup>2</sup><sub>p</sub> = 0.000). **Figure 2** displays mean changes in fitness and neural specificity.

To differentiate training-induced changes in neural specificity from changes in neural efficiency, we also tested whether training affected mean BOLD activation in stimulus-relevant regions, that is, the fusiform face area (FFA) for faces and the parahippocampal place area (PPA) for houses. We extracted the % BOLD signal change from four spherical ROIs (left and right FFA, left and right PPA; 5 mm radius around center coordinates adapted from Ishai et al., 1999) for the appropriate condition (in FFA for faces, and in PPA for houses) using MarsBaR. We found a negative but non-significant relationship between change in fitness and change in BOLD activation in both the FFA (r(46) = −0.201, p = 0.18, R<sup>2</sup> = 0.040) and the PPA (r(46) = −0.181, p = 0.23, R<sup>2</sup> = 0.033). The average effect across both regions was also negative, but not significant (r(46) = −0.234, p = 0.12, R<sup>2</sup> = 0.055). For neither region alone, nor for the average, did we observe significant training-induced changes in % BOLD signal change (all p > 0.3), nor were there significant interactions with training group (all p > 0.2). All means and SD are provided in **Table 2**.
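A spherical ROI of this kind can be sketched in voxel space with NumPy. MarsBaR builds such ROIs itself; this stand-in is only illustrative. The center coordinates from Ishai et al. (1999) are not reproduced in the text, so the center index is a hypothetical input, and converting MNI millimeter coordinates to a voxel index is left out.

```python
import numpy as np


def sphere_mask(shape, center_vox, radius_mm, voxel_mm):
    """Boolean mask of voxels whose centers lie within radius_mm of center_vox.

    shape: image dimensions in voxels; center_vox: (i, j, k) voxel index of
    the ROI center; voxel_mm: voxel size along each axis in mm.
    """
    grid = np.indices(shape, dtype=float)
    dist2 = sum(((grid[d] - center_vox[d]) * voxel_mm[d]) ** 2 for d in range(3))
    return dist2 <= radius_mm ** 2


def roi_mean(volume, mask):
    """Average a 3-D statistic image (e.g., % signal change) over an ROI mask."""
    return float(volume[mask].mean())
```

The resolution matters for such small ROIs: a 5 mm sphere contains 515 voxels at 1 mm isotropic resolution, but only 19 voxels if it were defined at the 3 mm functional resolution used here (216 mm FOV / 72 matrix, 3 mm slices).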

#### DISCUSSION

In this study we investigated whether exercise-induced fitness improvements would be associated with enhanced neural specificity. We found a positive correlation between changes in fitness induced by a 6-month exercise intervention and changes in neural specificity, in the sense that participants whose physical fitness improved more also showed more positive changes in neural specificity. These data suggest that regular physical activity may reduce or even reverse aging-related declines in neural specificity that have been reported in earlier studies (Park et al., 2004, 2010; Payer et al., 2006).

**Table 1** note: Values are means (M) and standard deviations (SD), or number of participants (N) and percentage within group (%), respectively. Abbreviations: MMSE, Mini-Mental State examination; BMI, Body mass index.

In terms of mechanisms, we propose that regular exercise may counteract the age-related weakening of dopaminergic neuromodulation (Bäckman et al., 2006). Neurocomputational models predict that adequate DA availability helps to keep the sigmoidal gain function within ranges that optimize signal transmission (Li et al., 2001). Optimized signaling reduces the detrimental effects of neural noise, and leads to the generation of more distinct internal representations (Li and Sikström, 2002). We note that this hypothesis could be tested more directly using Positron Emission Tomography (PET) to investigate DA binding and neural specificity before and after an exercise intervention.
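To make the gain idea concrete, the Python toy below uses a logistic activation function with a multiplicative gain on the net input, loosely in the spirit of the neurocomputational models cited above. The functional form, the zero bias, and the input values are illustrative assumptions, not parameters of Li et al. (2001); the point is only that a higher gain pushes two nearby inputs further apart at the output.

```python
import numpy as np


def logistic_unit(net_input, gain, bias=0.0):
    """Logistic activation whose slope is scaled by a gain parameter."""
    return 1.0 / (1.0 + np.exp(-(gain * net_input) + bias))


def output_separation(input_a, input_b, gain):
    """Distance between the outputs produced by two nearby inputs."""
    return abs(logistic_unit(input_a, gain) - logistic_unit(input_b, gain))
```

For example, `output_separation(0.2, -0.2, 1.0)` is larger than `output_separation(0.2, -0.2, 0.5)`: in this toy, reduced gain (the proposed consequence of weaker DA modulation) compresses differences between inputs, which corresponds to less distinct internal representations.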

Age-related losses in dopaminergic neuromodulation have more often been reported in frontal regions than in VVC. However, there is evidence for age-related losses of DA in the human temporal, parietal, and occipital cortices as well (Kaasinen and Rinne, 2002), closely resembling the dopaminergic decline repeatedly observed in the striatum. Recently, Garrett et al. (2015) found effects of d-amphetamine (boosting DA levels) on BOLD signal variability in regions beyond typical DA projections (e.g., primary visual cortices). Moreover, in a PET study with 181 healthy adults between 64 and 68 years of age, Nyberg et al. (2016) found that both caudate and hippocampal D2 DA receptor availability were positively associated with episodic memory. These findings provide further evidence for the functional significance of DA across many regions of the human brain.

It is also possible that other neurotransmitters play a role, instead of, or in addition to, DA. For example, in a proton magnetic resonance spectroscopy study, Maddock et al. (2016) observed increased signals for both glutamate and GABA in the visual cortex following vigorous exercise, indicating increased glutamate and GABA levels. Likewise, animal work using in vivo calcium imaging showed an increased visual response in mouse visual cortex during running. The suggested mechanism involves disinhibition of glutamatergic pyramidal neurons through the interaction of two different types of GABAergic interneurons (Fu et al., 2014). However, these results were obtained during running, and there is only preliminary evidence to suggest that more exercise in the preceding week also relates to higher resting glutamate (but not GABA) levels (Maddock et al., 2016). Furthermore, administration of GABA and a GABA agonist has been found to increase the orientation selectivity of individual visual neurons (increased neural specificity), while administration of a GABA antagonist has been found to decrease it (reduced neural specificity; Leventhal et al., 2003). Future work is needed to gauge the relative importance of different neurotransmitter systems, and their potential interactions, in mediating the effects of exercise on cognition.

**Table 2** note: Abbreviations: MVPA, Multivariate pattern analysis; FFA, fusiform face area; PPA, parahippocampal place area. <sup>∗</sup> Indicates a significant main effect of time from the repeated measures ANOVA (p < 0.05). <sup>1</sup>F(1,49) = 4.805; p = 0.033; η<sup>2</sup><sub>p</sub> = 0.089. <sup>2</sup>F(1,50) = 4.902; p = 0.031; η<sup>2</sup><sub>p</sub> = 0.089. <sup>3</sup>F(1,49) = 5.637; p = 0.022; η<sup>2</sup><sub>p</sub> = 0.103.

Exercise has been shown to affect other parameters of cerebral functioning, and these parameters may have contributed to our findings. Regarding fMRI, exercise-induced fitness changes have been associated with improved neural efficiency, that is, reductions (Colcombe et al., 2004; Voelcker-Rehage et al., 2011) or stability (Maffei et al., 2017) of BOLD activation in task-relevant areas, as well as with strengthened functional connections in the default mode and frontal executive networks (Voss et al., 2010). Although changes in neural efficiency, as investigated in these earlier studies, and changes in neural specificity, as investigated in the present study, can occur independently of each other, it seems worth exploring whether and in what way they are related empirically. Specifically, exercise training may lead to neural representations that achieve greater distinctiveness with lower levels of neural activation.

Although our experimental task was not designed to look at neural efficiency, we were curious to see whether our training did affect mean BOLD activation in stimulus-relevant regions (FFA and PPA) and if so, in what way. Overall, we found negative, but non-significant, relationships between change in fitness and change in BOLD activation. These trends seem consistent with the hypothesis that improved fitness is associated with increased neural efficiency (i.e., less BOLD signal change). However, because we used a passive viewing task, there are no behavioral measures (e.g., reaction time, accuracy) from the scanning sessions that we could relate to the BOLD signal. We therefore acknowledge that the present study does not directly address neural efficiency, as this would require reduced BOLD activation in the absence of performance decrements.

At a more general level, exercise has been found to change brain perfusion (Pereira et al., 2007; Maass et al., 2015). Whereas Pereira et al. (2007) specifically looked at the hippocampus, Maass et al. (2015) also found fitness-related changes in perfusion to affect non-hippocampal cortical blood flow and blood volume. Changes in perfusion could potentially influence our results, as the fMRI BOLD signal measures a vascular response. Hence, we wondered whether the observed association might be due to changes in vascular response rather than changes in neural specificity. If so, one would expect to also observe an association between changes in mean activation level and changes in fitness. We therefore extracted mean activation values from VVC using MarsBaR, but found no reliable correlation with changes in fitness. Furthermore, vascular changes would not be expected to be limited to VVC, and so we tested whether associations between changes in fitness and changes in neural specificity were also present in brain regions that were not significantly activated by the task, such as medial orbitofrontal cortex, precentral gyrus, supramarginal gyrus, superior temporal gyrus, hippocampus, and parahippocampus. None of those regions, nor all of them combined, exhibited the change-change correlation observed in VVC. Each of these findings argues against a predominantly vascular interpretation of the observed effects. We note, however, that a calibrated fMRI approach would be needed to fully disentangle changes in neural activity from changes in vascular reactivity.

In contrast to Park et al. (2010), we did not observe associations between neural specificity and any of the cognitive tasks assessed in this study. One reason may be that Park et al. (2010) used a slightly different visual task in the fMRI scanner: Whereas our participants passively viewed single pictures, Park et al. (2010) presented two pictures side-by-side and asked their participants to make a same/different judgment, which may have introduced a demand characteristic that was absent in our task.

It is conceivable that exercise training, as well as familiarization with the MR environment, would alter participants' intention to lie quietly in the scanner. To minimize differences in head movement between pretest and posttest, all participants took part in a mock-scanner session before the pretest. In addition, we used the FMRIB Software Library (FSL; Jenkinson et al., 2012) to check for head motion artifacts using the aggregate measures DVARS and framewise displacement (as implemented in fsl\_motion\_outliers). We found no evidence for significant session differences in head motion.
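The two motion summaries can be sketched in Python as follows. This is a minimal sketch, not FSL's code: framewise displacement follows the Power et al. (2012) definition with rotations converted to arc length on an assumed 50 mm head radius, DVARS is computed on raw intensities without any normalization FSL may apply, and the column order of the motion parameters (translations in mm first, then rotations in radians, as in SPM's rp_*.txt files) is an assumption to check against your own software.

```python
import numpy as np


def framewise_displacement(params, radius_mm=50.0):
    """Power-style FD from realignment parameters.

    params: (n_vols, 6) array, three translations in mm followed by three
    rotations in radians. Rotations are converted to arc length on a sphere
    of radius_mm. The first volume gets FD = 0.
    """
    deltas = np.abs(np.diff(np.asarray(params, dtype=float), axis=0))
    deltas[:, 3:] *= radius_mm
    return np.concatenate([[0.0], deltas.sum(axis=1)])


def dvars(data):
    """RMS across voxels of the volume-to-volume signal change.

    data: (n_vols, n_voxels) array; the first volume gets DVARS = 0.
    """
    diffs = np.diff(np.asarray(data, dtype=float), axis=0)
    return np.concatenate([[0.0], np.sqrt((diffs ** 2).mean(axis=1))])
```

Comparing the per-session means (or counts of volumes above a threshold) of these two time series between pretest and posttest is one way to test for session differences in head motion.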

Also, sample size as well as sample selectivity may have influenced our results. In a study with similar sample selection criteria, Voelcker-Rehage et al. (2015) noted a comparatively high proportion of COMT met/met allele carriers. Since our participants were inactive but healthy and willing to change their lifestyle, it is rather likely that our sample was even more selective. Given the absence of genetic information in this study, this conjecture cannot be tested.

As discussed in Kleemeyer et al. (2016), the absence of group differences in fitness changes between the HI and LI groups may indicate that the level of challenge provided by LI training was already sufficient to boost fitness in sedentary older adults, thereby rendering the two training regimens similar to one another. The absence of group differences in fitness changes also provides a reasonable explanation for the absence of group differences in neural specificity changes. As for neural specificity, we did not observe a mean increase over time, but rather stability, as the numerical decrease was not statistically different from zero. We interpret this finding in terms of two opposing forces, one related to normal aging and the other related to the fitness intervention. On the one hand, neural specificity tends to decline with age. Participants did not escape normal aging while taking part in our study, and for this reason alone one would expect neural specificity measured later in time to be lower than neural specificity measured earlier. On the other hand, the training-induced fitness improvements apparently counteracted aging-related decrements in neural specificity, reducing or, in some cases, offsetting the effects of aging. We conclude that the effects of fitness improvements on brain functioning were not strong enough to produce a mean positive trend in neural specificity. Nevertheless, and in full agreement with our hypothesis, participants whose fitness improved more showed smaller declines, or even improvements, in neural specificity. From a design perspective, it would have been preferable to also include a no-contact control group in the study in order to document changes in fitness and neural specificity that would take place in the absence of any intervention.

To conclude, we found that exercise-related changes in fitness are positively associated with changes in neural specificity. Because greater neural specificity is related to better fluid processing ability, these results may help explain some of the beneficial effects of exercise on cognition. Future longitudinal and intervention work should include higher intensity levels, longer training durations, and no-contact control groups to replicate and extend these findings.

## AUTHOR CONTRIBUTIONS

MMK designed and conducted the study, analyzed the data, interpreted the results, and wrote the manuscript. TAP assisted with MVPA analysis, interpreted the results, and revised the manuscript. SS designed the study and revised the manuscript. NCB designed the neuroimaging protocol and revised the manuscript. LB performed the cardiovascular fitness assessment and revised the manuscript. UL designed the study, interpreted the results, and revised the manuscript.

## FUNDING

The work reported in this article was supported by the Max Planck Society and the German Research Foundation (DFG; Gottfried Wilhelm Leibniz Research Award 2010 to UL).

#### ACKNOWLEDGMENTS

We thank Timo Schmidt and Naftali Raz for helpful discussions.

#### REFERENCES

working memory in younger and older adults. Proc. Natl. Acad. Sci. U S A 112, 7593–7598. doi: 10.1073/pnas.1504090112


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Kleemeyer, Polk, Schaefer, Bodammer, Brechtel and Lindenberger. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

fnhum-11-00344 June 28, 2017 Time: 18:11 # 1

# The Impact of Aerobic Exercise on Fronto-Parietal Network Connectivity and Its Relation to Mobility: An Exploratory Analysis of a 6-Month Randomized Controlled Trial

Chun L. Hsu<sup>1,2,3,4</sup>, John R. Best<sup>1,2,3,4</sup>, Shirley Wang<sup>1,2,3,4</sup>, Michelle W. Voss<sup>5,6</sup>, Robin G. Y. Hsiung<sup>7</sup>, Michelle Munkacsy<sup>1,2,3,4</sup>, Winnie Cheung<sup>1,2,3,4</sup>, Todd C. Handy<sup>8</sup> and Teresa Liu-Ambrose<sup>1,2,3,4</sup>\*

<sup>1</sup> Aging, Mobility, and Cognitive Neuroscience Lab, University of British Columbia, Vancouver, BC, Canada, <sup>2</sup> Department of Physical Therapy, University of British Columbia, Vancouver, BC, Canada, <sup>3</sup> Djavad Mowafaghian Center for Brain Health, University of British Columbia, Vancouver, BC, Canada, <sup>4</sup> Center for Hip Health and Mobility, Vancouver, BC, Canada, <sup>5</sup> Health, Brain, and Cognition Lab, University of Iowa, Iowa City, IA, United States, <sup>6</sup> Department of Psychology, University of Iowa, Iowa City, IA, United States, <sup>7</sup> Department of Medicine, University of British Columbia, Vancouver, BC, Canada, <sup>8</sup> Department of Psychology, University of British Columbia, Vancouver, BC, Canada

#### Edited by:

Srikantan S. Nagarajan, University of California, San Francisco, United States

#### Reviewed by:

C. J. Boraxbekk, Umeå University, Sweden

Gary Abrams, University of California, San Francisco, United States

Karl Meisel, University of California, San Francisco, United States

> \*Correspondence: Teresa Liu-Ambrose teresa.ambrose@ubc.ca

Received: 14 November 2016 Accepted: 14 June 2017 Published: 30 June 2017

#### Citation:

Hsu CL, Best JR, Wang S, Voss MW, Hsiung RGY, Munkacsy M, Cheung W, Handy TC and Liu-Ambrose T (2017) The Impact of Aerobic Exercise on Fronto-Parietal Network Connectivity and Its Relation to Mobility: An Exploratory Analysis of a 6-Month Randomized Controlled Trial. Front. Hum. Neurosci. 11:344. doi: 10.3389/fnhum.2017.00344

Impaired mobility is a major concern for older adults and has significant consequences. While the widely accepted belief is that improved physical function underlies the effectiveness of targeted exercise training in improving mobility and reducing falls, recent evidence suggests that cognitive and neural benefits gained through exercise may also play an important role in promoting mobility. However, the neural mechanisms underlying this relationship are currently unclear. Thus, we hypothesized that 6 months of progressive aerobic exercise training would alter frontoparietal network (FPN) connectivity during a motor task among older adults with mild subcortical ischemic vascular cognitive impairment (SIVCI), and that exercise-induced changes in FPN connectivity would correlate with changes in mobility. We focused on the FPN because it is involved in top-down attentional control as well as motor planning and motor execution. Participants were randomized either to usual care (CON), which included monthly educational materials about VCI and a healthy diet, or to thrice-weekly aerobic training (AT), which consisted of outdoor walking with progressive intensity. Functional magnetic resonance imaging was acquired at baseline and trial completion, during which participants performed a bilateral finger-tapping task. At trial completion, compared with AT, CON showed significantly increased FPN connectivity strength during right finger tapping (p < 0.05). Across participants, reduced FPN connectivity was associated with greater cardiovascular capacity (p = 0.05). In the AT group, reduced FPN connectivity was significantly associated with improved mobility performance, as measured by the Timed-Up-and-Go test (r = 0.67, p = 0.02). These results suggest that progressive AT may improve mobility in older adults with SIVCI by maintaining intra-network connectivity of the FPN.

Keywords: aging, impaired mobility, vascular cognitive impairment, fronto-parietal network, functional connectivity, fMRI

#### INTRODUCTION


Impaired mobility is a major concern for older adults and is associated with increased risk for disability, institutionalization, and death (Rosano et al., 2008). The prevalence of impaired mobility is 14% at age 75 years and rises to half of the population over 84 years (Odenheimer et al., 1994). Falls are a significant consequence of impaired mobility.

Current evidence supports the recommendation of targeted exercise training to improve mobility, prevent major mobility disability, and reduce the risk of future falls (Campbell et al., 1999; Pahor et al., 2014). The widely accepted view is that improved physical function, such as improved balance and increased muscle strength, primarily underlies the effectiveness of exercise in improving mobility and reducing falls (Layne et al., 2017). However, in a meta-analysis of four randomized trials of exercise, falls were significantly reduced by 35%, while postural sway improved significantly by only 9% and there was no significant improvement in knee extension strength (Robertson et al., 2002). Moreover, in a proof-of-concept randomized controlled trial, we demonstrated that a home-based exercise program significantly reduced falls by 47% in older adults, in the absence of significant improvement in physical function (i.e., balance and muscle strength) (Liu-Ambrose et al., 2008). Rather, significant improvements in executive functions were observed in the exercise group as compared with the usual care (i.e., control) group.

These data suggest that exercise may reduce falls in older adults via several mechanisms, not just via improved physical function.

We previously proposed that cognitive and neural plasticity may be an important, yet under-appreciated, mechanism by which exercise promotes mobility and reduces falls (Liu-Ambrose et al., 2013). This hypothesis stems from growing evidence suggesting that: (1) cognitive impairment and impaired mobility are associated (Atkinson et al., 2007; Buracchio et al., 2010; Montero-Odasso et al., 2012); (2) reduced executive function is associated with impaired mobility and increased falls risk (Delbaere et al., 2012; Hsu et al., 2012); (3) aberrant neural network functional connectivity is associated with impaired mobility (Hsu et al., 2014); and (4) targeted exercise training, particularly aerobic-based training, promotes cognitive and cortical plasticity, including executive function and its neural correlates, in older adults (Erickson and Kramer, 2009; Voss et al., 2010b).

Despite the growing recognition that targeted exercise training may promote mobility outcomes in older adults via central mechanisms (Liu-Ambrose et al., 2013), few intervention studies of exercise to date have provided direct evidence for this theory (Bolandzadeh et al., 2015). A better understanding of the neural mechanisms underlying exercise-induced improvements in mobility may facilitate the development and refinement of preventative/intervention strategies, as well as identify the populations for whom these effects apply.

Older adults with subcortical ischemic vascular cognitive impairment (SIVCI), the most common form of vascular cognitive impairment (VCI) (Bowler, 2005), are at particular risk for both impaired mobility and dementia secondary to underlying white matter lesions (WMLs) or lacunar infarcts (Roman et al., 2002; Vermeer et al., 2007). VCI is the second most common cause of dementia after Alzheimer's disease (AD) (Desmond et al., 1999; Erkinjuntti et al., 1999; Pantoni et al., 1999; Rockwood et al., 2000). The clinical consequences of covert ischemic strokes are substantial (Roman et al., 2002; Vermeer et al., 2007). These WMLs and lacunar infarcts typically manifest in brain regions such as the caudate, pallidum, and thalamus, and in frontal and prefrontal white matter (Roman et al., 2002). As a result, they may disrupt the integrity of functional neural networks and negatively impact cognitive function, particularly executive functions, and mobility (Kuo and Lipsitz, 2004; O'Brien, 2006).

Most notable among the relevant neural networks is the frontoparietal network (FPN). The FPN is involved in top-down attentional control and allocation of available neural resources that contribute to executive processes, such as response anticipation and conflict processing (Fogassi and Luppino, 2005; Seeley et al., 2007; Sridharan et al., 2008), as well as motor planning and motor execution (Wise et al., 1997; Wu et al., 2009; Wymbs et al., 2012). Of particular relevance, previous studies have shown that key regions within the FPN were actively recruited during actual as well as imagined completion of the walking while talking (WWT) test (Holtzer et al., 2011; Blumen et al., 2014). Specifically, neural activity within these FPN regions was positively associated with both task difficulty and cognitive performance on the WWT test (Holtzer et al., 2011; Blumen et al., 2014). Although previous studies have linked aspects of mobility to FPN connectivity, its potential role in exercise-induced effects on mobility is unknown.

Thus, we propose FPN connectivity as one of the neural mechanisms by which exercise promotes mobility in older adults with mild SIVCI. Using functional magnetic resonance imaging (fMRI) data from a 6-month single-blind randomized controlled trial (clinicaltrials.gov Identifier: NCT01027858), we conducted a planned secondary analysis to assess the impact of moderate-intensity aerobic exercise training on functional connectivity of FPN among older adults with mild SIVCI. We hypothesized that aerobic exercise-induced increases in FPN connectivity would correlate with improved mobility. The primary results from the parent study have been published (Liu-Ambrose et al., 2016), which provided preliminary evidence that 6 months of thrice-weekly progressive aerobic training (AT) promotes cognitive performance in community-dwelling adults with mild SIVCI, relative to usual care plus education.

#### MATERIALS AND METHODS

#### Study Design

This is a secondary analysis of neuroimaging data acquired from a 6-month proof-of-concept RCT (NCT01027858) of aerobic exercise in older adults with mild SIVCI (Liu-Ambrose et al., 2010, 2016). Trained study assessors were blinded to group allocation of participants. Functional MRI (fMRI) data were acquired at baseline prior to randomization and at trial completion (i.e., 6 months) in a subset of eligible participants.

#### Participants


As the current study was a secondary analysis, we sought to recruit as many eligible and consenting individuals from the parent study as possible. Briefly, we recruited from the University of British Columbia Hospital Clinic for Alzheimer's Disease and Related Disorders, the Vancouver General Hospital Stroke Prevention Clinic, and specialized geriatric clinics in Metro Vancouver, BC. Recruitment occurred between December 2009 and April 2014, with randomization occurring on an ongoing basis. Study participants were clinically diagnosed with mild SIVCI, as determined by the presence of a cognitive syndrome and small vessel ischemic disease (Erkinjuntti et al., 2000). Small vessel ischemic disease was defined as evidence of relevant cerebrovascular disease on brain computed tomography or MRI, defined as the presence of both: (1) periventricular and deep WMLs; and (2) absence of cortical and/or cortico-subcortical non-lacunar territorial infarcts and watershed infarcts, hemorrhages indicating large vessel disease, signs of normal pressure hydrocephalus, or other specific causes of WMLs (i.e., multiple sclerosis, leukodystrophies, sarcoidosis, brain irradiation). In addition to the neuroimaging evidence, the presence or a history of neurological signs such as Babinski sign, sensory deficit, gait disorder, or extrapyramidal signs consistent with subcortical brain lesion(s) was required and confirmed by study physicians (G-YRH and PL). Cognitive syndrome was defined as a baseline Montreal Cognitive Assessment (MoCA) score less than 26/30. However, participants were free of frank dementia (i.e., a clinical diagnosis of dementia), as determined by a Mini-Mental State Examination (MMSE) score ≥ 20 and the absence of diagnosed dementia. Progressive cognitive decline was confirmed through medical records or caregiver/family member interviews.

The Consolidated Standards of Reporting Trials (CONSORT) flowchart shows the number and distribution of participants included in this secondary analysis (**Figure 1**). Of the 38 participants (54% of the parent sample) who completed baseline MRI scanning, 7 (18% of the sample) dropped out of the study and 10 (26% of the sample) failed to perform the motor finger tapping task correctly (e.g., tapped during rest blocks). Consequently, 21 participants who completed MRI at baseline and trial completion were included in this secondary analysis (30% of the parent sample). Ethical approval was provided by the University of British Columbia's Clinical Research Ethics Board (H07-01160). All participants provided written informed consent.
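As a quick consistency check on the participant flow, the reported counts and percentages can be recomputed. Variable names are illustrative, and percentages are rounded to whole numbers as in the text.

```python
PARENT_N = 70  # participants randomized in the parent trial


def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number, as reported in the text."""
    return round(100 * part / whole)


baseline_mri = 38   # completed baseline MRI
dropped_out = 7     # left the study before the 6-month scan
task_failures = 10  # did not perform the finger tapping task correctly
analyzed = baseline_mri - dropped_out - task_failures

print(analyzed)                          # 21
print(pct(baseline_mri, PARENT_N))       # 54
print(pct(dropped_out, baseline_mri))    # 18
print(pct(task_failures, baseline_mri))  # 26
print(pct(analyzed, PARENT_N))           # 30
```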

### Randomization

The randomization sequence was generated using the web application www.randomization.com with a 1:1 allocation ratio to aerobic training (AT) or usual care (CON). A research team member not involved with the study held this sequence at a remote location. After the completion of consent and baseline testing, the research coordinator contacted the team member holding the list to determine the next allocation.
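The text specifies only a 1:1 ratio to aerobic training (AT) or usual care (CON) and off-site concealment; it does not state which algorithm www.randomization.com used. The sketch below assumes permuted blocks of four, a common way to keep arms balanced during rolling recruitment, and models the off-site list holder as an object that reveals one allocation per request. `permuted_block_sequence` and `AllocationHolder` are hypothetical names.

```python
import random
from typing import Iterator, List, Optional, Sequence


def permuted_block_sequence(n_participants: int, block_size: int = 4,
                            arms: Sequence[str] = ("AT", "CON"),
                            seed: Optional[int] = None) -> List[str]:
    """Generate a 1:1 allocation list from randomly permuted blocks.

    Each block holds an equal number of each arm, so group sizes are
    balanced at the end of every complete block.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    sequence: List[str] = []
    while len(sequence) < n_participants:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n_participants]


class AllocationHolder:
    """Stands in for the off-site team member who keeps the concealed list."""

    def __init__(self, sequence: Sequence[str]) -> None:
        self._remaining: Iterator[str] = iter(sequence)

    def next_allocation(self) -> str:
        """Reveal one allocation, requested only after consent and baseline testing."""
        return next(self._remaining)


if __name__ == "__main__":
    holder = AllocationHolder(permuted_block_sequence(70, seed=2009))
    print([holder.next_allocation() for _ in range(4)])
```

Because the list is consumed strictly in order and only one entry is revealed at a time, the coordinator cannot foresee upcoming allocations, which is the purpose of holding the sequence remotely.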

### Aerobic Training and Compliance

The AT intervention consisted of supervised thrice-weekly 60-min walking classes for the 6-month intervention period. All AT classes were led by instructors certified to instruct seniors and were delivered in a group setting. Each 60-min class included a 10-min warm-up, 40 min of walking, and a 10-min cool-down. Both the warm-up and cool-down included passive and active stretches, as well as range-of-motion exercises.

Walking occurred outdoors and followed predetermined routes around local areas. The intensity of the AT program was monitored and progressed using three approaches: (1) heart rate monitoring, with an initial intensity of 40% of the age-specific heart rate reserve (HRR). HRR was calculated by subtracting resting heart rate from age-predicted maximum heart rate [using the formula 206.9 − 0.67 × age (Gellish et al., 2007)] and was recalculated each month. Participants progressed over the first 12 weeks to 60–70% of HRR, which was then sustained for the remainder of the intervention period; (2) subjective monitoring using Borg's Rating of Perceived Exertion (RPE) scale (Borg, 1982), with a target RPE of 14 to 15; and (3) the "talk" test (Persinger et al., 2004), starting at a walking pace that allowed comfortable conversation and progressing to a pace at which conversation was difficult. Individual training logs (i.e., target heart rate, heart rate achieved, and rating of perceived exertion) were maintained throughout the intervention period.
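The heart rate targets above follow from the Gellish et al. (2007) formula for maximum heart rate. The sketch assumes the common Karvonen convention, target = resting HR + fraction × HRR, which the text implies but does not write out; the age and resting heart rate in the usage example are illustrative, not trial data. Monthly recalculation amounts to calling `target_hr` again with that month's resting heart rate.

```python
def predicted_max_hr(age_years: float) -> float:
    """Age-predicted maximum heart rate (Gellish et al., 2007): 206.9 - 0.67 x age."""
    return 206.9 - 0.67 * age_years


def heart_rate_reserve(age_years: float, resting_hr: float) -> float:
    """HRR: predicted maximum heart rate minus resting heart rate (beats/min)."""
    return predicted_max_hr(age_years) - resting_hr


def target_hr(age_years: float, resting_hr: float, fraction_of_hrr: float) -> float:
    """Karvonen-style target: resting HR plus a fraction of HRR (assumed convention)."""
    if not 0.0 <= fraction_of_hrr <= 1.0:
        raise ValueError("fraction_of_hrr must lie between 0 and 1")
    return resting_hr + fraction_of_hrr * heart_rate_reserve(age_years, resting_hr)


if __name__ == "__main__":
    age, resting = 71, 70  # illustrative values, not trial data
    for label, frac in [("start (40%)", 0.40), ("lower target (60%)", 0.60),
                        ("upper target (70%)", 0.70)]:
        print(label, round(target_hr(age, resting, frac)))
    # start (40%) 106 / lower target (60%) 124 / upper target (70%) 133
```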

The AT group was also given a pedometer to serve as both an incentive and a monitoring tool. Participants recorded the number of steps taken each day outside the AT classes on standard logs provided by the research team.

### Usual Care

Participants in the CON group received usual care, in which they were provided with monthly educational materials about vascular cognitive impairment (VCI) and healthy diet. However, no specific information regarding physical activity was provided. In addition, research staff phoned the CON participants monthly to maintain contact and to acquire research data.

### Adverse Effects

All participants were instructed to report to our research coordinator any adverse effects related to the AT exercises, such as falls or musculoskeletal pain persisting longer than 48 h. Participants were also questioned about the presence of any adverse effects, such as musculoskeletal pain or discomfort, at each exercise session. All instructors also monitored participants for symptoms of angina and shortness of breath during the exercise classes. External experts from our safety monitoring committee reviewed all reported adverse events monthly.

### Descriptive Variables

At baseline, participants underwent a clinical assessment with study physicians (G-YRH and PL) to confirm current health status and study eligibility. Age in years and education level were assessed by self-report. Standing height was measured as stretch stature to the nearest 0.1 cm using a standard protocol. Weight was measured twice to the nearest 0.1 kg on a calibrated digital scale. Waist-to-hip ratio was calculated from waist circumference, measured just above the navel, and hip circumference, measured at the widest part of the hips (both in centimeters). The Functional Comorbidity Index (Groll et al., 2005) assessed the number of comorbid conditions related to physical functioning.

Global cognition was assessed using the MMSE (Cockrell and Folstein, 1988) and the MoCA. The MMSE and MoCA are 30-point tests that encompass several cognitive domains. The MoCA has good internal consistency and test–retest reliability and, with a cut-off score of <26/30, correctly identified 90% of a large sample of individuals with mild cognitive impairment from two different clinics (Nasreddine et al., 2005).

### Functional MRI Acquisition

All MRI was conducted at the University of British Columbia (UBC) MRI Research Centre located at the UBC Hospital on a 3.0 Tesla Intera Achieva MRI scanner (Philips Medical Systems, Markham, ON, Canada) using an 8-channel SENSE neurovascular coil. The fMRI acquisition consisted of two successive runs, each with 165 dynamic images of 36 slices (3 mm thick), with the following parameters: repetition time (TR) of 2000 ms, echo time (TE) of 30 ms, flip angle (FA) of 90 degrees, field of view (FoV) of 240 mm, acquisition matrix of 80 × 80, and voxel size of 3 mm × 3 mm × 3 mm. High-resolution T1-weighted anatomical images were acquired with the following parameters: 170 slices (1 mm thick), TR of 7.7 ms, TE of 3.6 ms, FA of 8 degrees, FoV of 256 mm, and acquisition matrix of 256 × 200.
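Several derived quantities follow directly from the reported fMRI parameters. The sketch assumes that the 165 volumes are per run and that slices were acquired without a gap; neither point is stated explicitly.

```python
# Reported acquisition parameters for the task fMRI runs.
TR_S = 2.0               # repetition time (s)
VOLUMES_PER_RUN = 165    # dynamic images per run (assumed to be per run)
N_RUNS = 2
FOV_MM = 240.0           # field of view
MATRIX = 80              # acquisition matrix (80 x 80)
N_SLICES = 36
SLICE_THICKNESS_MM = 3.0

run_duration_s = TR_S * VOLUMES_PER_RUN           # scan time per run
in_plane_voxel_mm = FOV_MM / MATRIX               # should match the reported 3 mm voxels
slab_coverage_mm = N_SLICES * SLICE_THICKNESS_MM  # assumes contiguous slices (no gap)
total_volumes = VOLUMES_PER_RUN * N_RUNS

print(f"{run_duration_s:.0f} s per run ({run_duration_s / 60:.1f} min)")  # 330 s per run (5.5 min)
print(f"in-plane voxel: {in_plane_voxel_mm:.1f} mm")                     # in-plane voxel: 3.0 mm
print(f"slab coverage: {slab_coverage_mm:.0f} mm")                        # slab coverage: 108 mm
print(f"volumes across both runs: {total_volumes}")                       # volumes across both runs: 330
```

The in-plane check (240 mm / 80 = 3 mm) confirms that the reported matrix and field of view are consistent with the stated isotropic 3 mm voxels.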

During each scanning session, participants performed a finger tapping motor task that has been administered and described previously (Hsu et al., 2014). Briefly, the task consisted of three conditions: right finger tapping, rest, and left finger tapping. In both tapping conditions, participants were instructed to tap in a fixed sequence, starting with the index finger and progressing toward the little (pinky) finger, continuously until a different condition was presented. For the rest condition, participants were asked to rest with their eyes open. The order of the motor task blocks was not disclosed to participants and was counterbalanced across the two runs as follows:

Run A: Rest, Left Tap, Rest, Right Tap, Rest, Right Tap, Rest, Left Tap, Rest

Run B: Rest, Right Tap, Rest, Left Tap, Rest, Left Tap, Rest, Right Tap, Rest

Total WML volume (in mm<sup>3</sup>) at baseline was quantified with structural MRI data acquired on the same MRI scanner (3T Achieva, Philips Medical Systems, Markham, ON, Canada) at the UBC MRI Research Centre. A T2-weighted scan and a proton-density-weighted (PD-weighted) scan were acquired for each participant. For the T2-weighted images, the repetition time (TR) was 5,431 ms and the echo time (TE) was 90 ms; for the PD-weighted images, the TR was 2,000 ms and the TE was 8 ms. T2- and PD-weighted scans had dimensions of 256 × 256 × 60 voxels and a voxel size of 0.937 mm × 0.937 mm × 3.000 mm. Briefly, WMLs were identified and digitally marked (i.e., by placing seed points) by a radiologist on the T2- and PD-weighted images. Marked WMLs were then automatically segmented by a customized Parzen windows classifier that estimated the intensity distribution of the lesions and included heuristics that optimized the accuracy of the estimated distributions (Parzen, 1962; McAusland et al., 2010; Bolandzadeh et al., 2015). WML segmentation was reviewed by a trained technician to ensure accuracy.
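The study used a customized Parzen windows classifier whose heuristics are not described here, so the sketch below shows only the generic idea: estimate lesion and normal-tissue intensity densities from marked seed voxels with a Gaussian kernel, label each voxel by the larger density, and convert the lesion voxel count to mm³ using the reported voxel size. The bandwidth `h`, the equal class priors, the two-channel (T2, PD) feature vector, and all function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Intensity = Tuple[float, float]  # (T2 intensity, PD intensity) of one voxel

VOXEL_VOLUME_MM3 = 0.937 * 0.937 * 3.000  # reported voxel size, about 2.634 mm^3


def parzen_density(x: Intensity, samples: Sequence[Intensity], h: float) -> float:
    """Gaussian Parzen-window density estimate at x from labelled sample intensities."""
    if not samples or h <= 0:
        raise ValueError("need at least one sample and a positive bandwidth")
    norm = 1.0 / (len(samples) * 2.0 * math.pi * h * h)  # 2-D isotropic Gaussian kernel
    total = 0.0
    for s in samples:
        d2 = (x[0] - s[0]) ** 2 + (x[1] - s[1]) ** 2
        total += math.exp(-d2 / (2.0 * h * h))
    return norm * total


def classify_voxels(voxels: Sequence[Intensity], lesion_seeds: Sequence[Intensity],
                    tissue_seeds: Sequence[Intensity], h: float) -> List[bool]:
    """Label a voxel as lesion when its lesion density exceeds its normal-tissue density.

    Equal class priors are assumed.
    """
    return [parzen_density(v, lesion_seeds, h) > parzen_density(v, tissue_seeds, h)
            for v in voxels]


def lesion_volume_mm3(mask: Sequence[bool]) -> float:
    """Total lesion volume: number of lesion voxels times the volume of one voxel."""
    return sum(mask) * VOXEL_VOLUME_MM3


if __name__ == "__main__":
    lesion = [(200.0, 180.0), (210.0, 190.0)]  # illustrative seed intensities
    tissue = [(100.0, 120.0), (90.0, 110.0)]
    mask = classify_voxels([(205.0, 185.0), (95.0, 115.0)], lesion, tissue, h=15.0)
    print(mask, round(lesion_volume_mm3(mask), 3))  # [True, False] 2.634
```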

### Mobility, Cardiovascular Capacity, and Physical Activity

Mobility was assessed with the Timed-Up-and-Go test (TUG) and the Short Physical Performance Battery (SPPB). The TUG required participants to rise from a standard chair, walk a distance of three meters, turn, walk back to the chair and sit down (Shumway-Cook et al., 2000). We recorded the time (s) to complete the TUG, based on the average of two separate trials.

For the SPPB, participants were assessed on performances of standing balance, walking, and sit-to-stand. Each component is rated out of four points, for a maximum of 12 points; a score < 9/12 predicts subsequent disability (Guralnik et al., 1995).
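Scoring for the two mobility measures described above can be sketched as follows. The function names are hypothetical and the input values in the usage example are illustrative.

```python
from statistics import fmean
from typing import Sequence


def tug_seconds(trial_times_s: Sequence[float]) -> float:
    """TUG score: mean completion time (s) across trials (two trials in this protocol)."""
    if not trial_times_s:
        raise ValueError("at least one trial is required")
    return fmean(trial_times_s)


def sppb_total(balance: int, gait: int, chair_stand: int) -> int:
    """Sum the three SPPB components, each scored 0-4, for a 0-12 total."""
    for score in (balance, gait, chair_stand):
        if not 0 <= score <= 4:
            raise ValueError("each SPPB component is scored from 0 to 4")
    return balance + gait + chair_stand


def sppb_flags_disability_risk(total: int) -> bool:
    """A total below 9/12 predicts subsequent disability (Guralnik et al., 1995)."""
    return total < 9


if __name__ == "__main__":
    print(tug_seconds([11.2, 10.8]))                      # illustrative trial times
    total = sppb_total(balance=3, gait=3, chair_stand=2)  # illustrative component scores
    print(total, sppb_flags_disability_risk(total))       # 8 True
```

Note that lower TUG times indicate better mobility, whereas higher SPPB totals do, so the two measures change in opposite directions when mobility improves.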

Participants' cardiovascular capacity was assessed using the 6-Minute Walk Test (Enright, 2003). The total distance walked (meters) within 6 min was recorded.

Monthly total physical activity level was determined by the Physical Activities Scale for the Elderly (PASE) self-report questionnaire (Washburn et al., 1999).

### Data Analysis

#### Functional MRI Preprocessing

Image preprocessing was carried out using tools from FSL (FMRIB's Software Library) [78], MATLAB (Matrix Laboratory), and toolboxes from SPM (Statistical Parametric Mapping). Non-brain structures (e.g., bone and scalp) were removed from the high-resolution T1 images using the Brain Extraction Tool (BET); rigid-body motion correction was completed using MCFLIRT (absolute and relative mean displacement were subsequently extracted and included in the statistical analysis as covariates); spatial smoothing was carried out using a Gaussian kernel with a full width at half maximum (FWHM) of 6.0 mm; and high-pass temporal filtering was applied with a cut-off of 120 s. In addition, low-pass temporal filtering was applied so that the fMRI signal was restricted to 0.008 < f < 0.080 Hz, a frequency band commonly used to examine functional connectivity; the low-pass filter also removed high-frequency signals that could act as confounds. Participants' low-resolution functional data were registered to their own high-resolution T1 anatomical images, which were subsequently registered to the standard Montreal Neurological Institute (MNI) 152 T1 space.
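The filter settings above can be related to one another with a few conversions. The sketch converts the 6 mm FWHM to a Gaussian sigma, expresses the 120 s high-pass cut-off as a frequency (about 0.0083 Hz, consistent with the 0.008 Hz lower edge), checks the 0.080 Hz low-pass edge against the Nyquist frequency for TR = 2 s, and computes a high-pass sigma in volumes using what we understand to be the FEAT convention (cut-off divided by 2 × TR). Treat that last value as an assumption rather than the exact setting used in this study.

```python
import math

TR_S = 2.0
FWHM_MM = 6.0
HIGHPASS_CUTOFF_S = 120.0
LOWPASS_HZ = 0.080


def fwhm_to_sigma(fwhm: float) -> float:
    """Convert a Gaussian FWHM to its standard deviation: sigma = FWHM / (2 * sqrt(2 ln 2))."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


sigma_mm = fwhm_to_sigma(FWHM_MM)                       # spatial kernel sigma in mm
highpass_hz = 1.0 / HIGHPASS_CUTOFF_S                   # 120 s period expressed as a frequency
nyquist_hz = 1.0 / (2.0 * TR_S)                         # highest frequency resolvable at TR = 2 s
highpass_sigma_vols = HIGHPASS_CUTOFF_S / (2.0 * TR_S)  # FEAT-style temporal sigma (assumed)

assert LOWPASS_HZ < nyquist_hz, "low-pass edge must lie below the Nyquist frequency"

print(f"smoothing sigma: {sigma_mm:.3f} mm")                  # smoothing sigma: 2.548 mm
print(f"high-pass edge: {highpass_hz:.4f} Hz")                # high-pass edge: 0.0083 Hz
print(f"Nyquist: {nyquist_hz:.2f} Hz")                        # Nyquist: 0.25 Hz
print(f"high-pass sigma: {highpass_sigma_vols:.0f} volumes")  # high-pass sigma: 30 volumes
```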

Noise from both physiological and non-physiological sources was removed by regressing out the cerebrospinal fluid (CSF) signal, white matter signal, and global brain signal. Global signal regression has been reported to be a valid and useful step in functional connectivity analyses (Fox et al., 2009) that may improve specificity (Murphy and Fox, 2016).
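Regressing out the CSF, white matter, and global signals amounts to taking ordinary least-squares residuals of each voxel's time series. The dependency-free sketch below does this for a single voxel by orthonormalizing an intercept plus the nuisance regressors with modified Gram-Schmidt. `regress_out` is a hypothetical helper; in practice this step is run voxel-wise with neuroimaging software.

```python
import math
from typing import List, Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def regress_out(signal: Sequence[float], nuisance: Sequence[Sequence[float]]) -> List[float]:
    """Residuals of `signal` after OLS regression on an intercept plus nuisance regressors.

    The design columns are orthonormalized with modified Gram-Schmidt; removing
    the projection onto each orthonormal column yields the OLS residuals.
    """
    n = len(signal)
    columns = [[1.0] * n] + [[float(v) for v in reg] for reg in nuisance]
    if any(len(col) != n for col in columns):
        raise ValueError("every regressor must have one value per volume")
    basis: List[List[float]] = []
    for col in columns:
        v = col[:]
        for q in basis:
            c = _dot(v, q)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(_dot(v, v))
        if norm > 1e-10:  # skip regressors that are collinear with earlier ones
            basis.append([vi / norm for vi in v])
    residual = [float(x) for x in signal]
    for q in basis:
        c = _dot(residual, q)
        residual = [ri - c * qi for ri, qi in zip(residual, q)]
    return residual


if __name__ == "__main__":
    csf = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # illustrative regressor time courses
    wm = [2.0, 1.0, 0.0, 1.0, 3.0, 2.0]
    gs = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
    voxel = [2.0 + 3.0 * c + 0.5 * w for c, w in zip(csf, wm)]  # pure nuisance signal
    cleaned = regress_out(voxel, [csf, wm, gs])
    print(max(abs(r) for r in cleaned) < 1e-9)  # True
```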

#### Functional Connectivity Analysis

Previous studies guided our choice of seeds in the whole brain analysis of the FPN (Voss et al., 2010b). The FPN included the inferior parietal sulcus (IPS), ventral visual cortex (VV), supramarginal gyrus (SMG), superior lateral occipital cortex (SLOC), frontal eye field (FEF), as well as overlapping areas in the temporal-parietal junction. The respective MNI space coordinates for each region of interest (ROI) are provided in **Table 1**.

From each ROI, preprocessed time-series data were extracted using 14 mm spherical regions of interest centered on their respective MNI coordinates in standard space. The volumes belonging to each condition (i.e., left, right, and rest) of the motor task were extracted and compiled together. To concatenate the time-series data, the stimulus onset time for each task condition was obtained from the task program, and each volume was sorted according to its respective condition. Once the data were categorized, the task-specific volumes (e.g., all the "left" volumes) were merged using a script provided with FSL. The first three volumes of each condition block were discarded to account for the delay of the hemodynamic response. Evidence in the literature has demonstrated that functional connectivity derived from temporally spliced/merged resting-state data from a blocked fMRI design is not significantly different from connectivity derived from continuous data (Fair et al., 2007). A recent motor task fMRI study also showed that functional connectivity quantified with a similar seed-based approach using concatenated data is comparable to that obtained from continuous data (Zhu et al., 2017).
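The splicing step can be sketched as grouping volumes by block label and dropping the first three volumes of each block. The block lengths here are hypothetical, since the text gives the block order but not the number of volumes per block; in practice they would be derived from the stimulus onset times (onset in seconds divided by TR). `concatenate_by_condition` is an illustrative name, not the FSL script used in the study.

```python
from typing import Dict, List, Sequence, Tuple

Block = Tuple[str, int]  # (condition label, number of volumes in that block)


def concatenate_by_condition(volumes: Sequence, blocks: Sequence[Block],
                             discard: int = 3) -> Dict[str, List]:
    """Split a run into condition blocks and merge the volumes of each condition.

    The first `discard` volumes of every block are dropped to allow for the
    delay of the hemodynamic response after each condition change.
    """
    if sum(n for _, n in blocks) != len(volumes):
        raise ValueError("block lengths must add up to the number of volumes")
    merged: Dict[str, List] = {}
    start = 0
    for condition, n_vols in blocks:
        kept = list(volumes[start + discard:start + n_vols])
        merged.setdefault(condition, []).extend(kept)
        start += n_vols
    return merged


if __name__ == "__main__":
    # Toy run of 15 volumes (indices stand in for images): Rest, Left, Rest blocks of 5.
    run = list(range(15))
    merged = concatenate_by_condition(run, [("Rest", 5), ("Left", 5), ("Rest", 5)])
    print(merged)  # {'Rest': [3, 4, 13, 14], 'Left': [8, 9]}
```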

Region of interest time-series data were subsequently correlated with every voxel within the brain to establish functional connectivity maps of their associated neural networks, in which the pairwise correlation between the time series extracted from the ROIs listed above was calculated. Individual-level within-subject results were generated via ordinary least-squares (OLS) regression using FSL's flameo (Beckmann et al., 2003) by combining the voxel-wise functional connectivity maps from each condition. Similarly, a mixed-level OLS analysis was conducted for group results. Statistical maps were thresholded at Z > 2.33 with a cluster-corrected significance threshold of p < 0.05.

RIPS, right inferior parietal sulcus; RVV, right ventral visual; LVV, left ventral visual; RSMG, right supramarginal gyrus; RSLOC, right superior lateral occipital cortex; LSLOC, left superior lateral occipital cortex; RFEF, right frontal eye field; LFEF, left frontal eye field.
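The seed-based step and a network-level summary can be sketched as below. Two choices are assumptions not stated in the text: correlations are Fisher r-to-z transformed, and "mean network connectivity strength" is taken as the average z over all ROI pairs. The final constant shows the one-tailed voxel-wise p-value implied by the Z > 2.33 threshold, before cluster correction. All function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import NormalDist, fmean
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long time series."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two series of equal length (at least 3 points)")
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant time series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r: float) -> float:
    """Fisher r-to-z transform, clipped so that |r| = 1 stays finite."""
    r = max(min(r, 0.999999), -0.999999)
    return math.atanh(r)


def seed_map(seed: Sequence[float], voxels: Sequence[Sequence[float]]) -> List[float]:
    """Correlate one seed time series with every voxel time series (Fisher z values)."""
    return [fisher_z(pearson_r(seed, v)) for v in voxels]


def mean_network_strength(roi_series: Sequence[Sequence[float]]) -> float:
    """Average Fisher z over all ROI pairs: one possible intra-network summary."""
    return fmean([fisher_z(pearson_r(a, b)) for a, b in combinations(roi_series, 2)])


# One-tailed p-value implied by the voxel-wise threshold Z > 2.33 (before cluster correction).
P_AT_THRESHOLD = 1.0 - NormalDist().cdf(2.33)

if __name__ == "__main__":
    seed = [1.0, 2.0, 3.0, 2.0, 1.0, 2.0]  # illustrative time series
    voxels = [[2.0, 4.0, 6.0, 4.0, 2.0, 4.0], [3.0, 2.0, 1.0, 2.0, 3.0, 2.0]]
    print([round(z, 2) for z in seed_map(seed, voxels)])
    print(round(P_AT_THRESHOLD, 4))  # 0.0099
```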

#### Statistical Analyses

Statistical analysis was conducted using IBM SPSS Statistics 23 for Windows (SPSS Inc., Chicago, IL, United States). Statistical significance was set at p ≤ 0.05 for all analyses. Change in network connectivity strength was computed in SPSS as 6-month FPN connectivity minus baseline FPN connectivity. Linear mixed models with random intercepts and time-varying outcome measures were constructed to test for between-group differences in change in network connectivity while adjusting for baseline total WML and age. A group × time interaction indicated group differences in change in FPN connectivity from baseline to post-intervention. Similar analyses were conducted to determine whether there were group differences in changes in TUG, SPPB, 6-Minute Walk Test (6MWT), and PASE scores. The primary analyses included the 21 participants with valid baseline and post-intervention fMRI data. Secondary analyses followed the intention-to-treat principle by including nine additional individuals who had valid baseline fMRI data but were lost to follow-up. Maximum likelihood estimation allowed these individuals to inform the treatment effects despite their missing follow-up data, and allowed us to determine whether loss to follow-up might bias treatment effects estimated from treatment completers only.
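The outcome definition (6-month minus baseline) and the quantity that the group × time interaction targets can be illustrated with an unadjusted difference in mean change. This is only a sketch: it omits the random intercepts, the WML and age adjustment, and the maximum likelihood handling of participants lost to follow-up that the mixed models provided. In the simplest case of complete data with no covariates, the interaction estimate from such a model reduces to this difference. The connectivity values below are illustrative, not trial data.

```python
from statistics import fmean
from typing import List, Sequence


def change_scores(baseline: Sequence[float], six_month: Sequence[float]) -> List[float]:
    """Per-participant change: 6-month value minus baseline value."""
    if len(baseline) != len(six_month):
        raise ValueError("each participant needs both a baseline and a 6-month value")
    return [post - pre for pre, post in zip(baseline, six_month)]


def difference_in_mean_change(at_pre: Sequence[float], at_post: Sequence[float],
                              con_pre: Sequence[float], con_post: Sequence[float]) -> float:
    """Unadjusted AT-minus-CON difference in mean change (completers only)."""
    return fmean(change_scores(at_pre, at_post)) - fmean(change_scores(con_pre, con_post))


if __name__ == "__main__":
    # Illustrative connectivity values only; these are not data from the trial.
    at_pre, at_post = [0.30, 0.25, 0.40], [0.28, 0.22, 0.41]
    con_pre, con_post = [0.20, 0.35], [0.40, 0.50]
    print(round(difference_in_mean_change(at_pre, at_post, con_pre, con_post), 3))  # -0.188
```

A negative value means connectivity rose less (or fell more) in the AT group than in the CON group.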

Bivariate correlation analyses were performed to determine whether any significant changes in intra-network FPN connectivity (during rest, left tap, and right tap) in the AT group (n = 12) correlated with change in mobility, as measured by TUG and SPPB, or change in 6MWT across the 6-month study duration.
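The text reports parametric correlations computed in SPSS. As an alternative that does not rely on normality with only 12 AT participants, the sketch below swaps in a permutation test for Pearson's r; it is not the analysis the authors ran. The change scores in the usage example are illustrative only, and the function names are hypothetical.

```python
import math
import random
from statistics import fmean
from typing import Optional, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between paired change scores."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x: Sequence[float], y: Sequence[float],
                        n_permutations: int = 10_000, seed: Optional[int] = 0) -> float:
    """Two-sided permutation p-value for Pearson r, shuffling y against x."""
    if len(x) != len(y):
        raise ValueError("x and y must pair the same participants")
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:  # tolerance for float ties
            as_extreme += 1
    # Adding one to numerator and denominator counts the observed labelling itself.
    return (as_extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Illustrative change scores for 12 participants; not trial data.
    delta_fpn = [0.10, -0.05, 0.02, -0.12, 0.08, -0.03, 0.00, -0.09, 0.04, -0.01, 0.06, -0.07]
    delta_tug = [0.9, -0.4, 0.1, -1.1, 0.5, -0.2, 0.3, -0.8, 0.2, 0.0, 0.7, -0.6]
    r = pearson_r(delta_fpn, delta_tug)
    print(round(r, 2), permutation_p_value(delta_fpn, delta_tug))
```

Because TUG is a timed test, a positive r between change in FPN connectivity and change in TUG time means that reduced connectivity accompanies faster (better) TUG performance.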

## RESULTS

### Participants and Treatment Fidelity

Among the 70 randomized individuals in the parent study, we observed a significant effect of AT on 6-Minute Walk Test performance (B = 30.34, p = 0.02), a well-established measure of cardiovascular fitness (Cataneo et al., 2010), indicating that AT had a positive effect on cardiovascular capacity (Liu-Ambrose et al., 2016). Twenty-one participants who completed fMRI scans at both baseline and 6 months were included in the primary analysis (**Figure 1**). Study demographics are reported in **Table 2**, pedometer information over the intervention period in **Table 3**, and mobility and cardiovascular capacity measures in **Table 4**; these measures did not differ between groups at baseline, nor did they differ significantly from those of the 70 participants enrolled in the parent study (Liu-Ambrose et al., 2016).

TABLE 2 | Participant characteristics at baseline (N = 21).

MMSE, Mini-Mental State Examination; MoCA, Montreal Cognitive Assessment; FCI, Functional Comorbidity Index. No between-group differences were statistically significant at p < 0.05.

TABLE 3 | AT group pedometer information over 6-month intervention period.

The mean age of all participants included in this secondary analysis was 71.1 years (SD = 8.7 years), which was not significantly different from the mean age of the parent cohort (74.3 years, SD = 8.3 years). Compared with the nine individuals with valid baseline data only, the study sample did not differ in baseline FPN connectivity (all p > 0.19) but had higher average baseline MoCA scores (23.2 versus 20.4; p = 0.02). Neither the mobility measures nor self-reported physical activity differed significantly between groups across the 6 months. However, we observed a trend-level group difference in the change in 6-Minute Walk Test performance (p = 0.08; **Table 4**), with the AT group showing greater improvement (48.6 m) than the CON group (−0.3 m).

### AT Compliance and Adverse Effects

The average compliance in the AT group was 76% for the walking classes and 65% for the nutrition education classes, whereas the average compliance in the CON group was 74%. Two study-related adverse events were reported in the AT group and one in the CON group. All three were non-syncopal falls. One of the falls in the AT group resulted in a broken tooth and required assessment in the Emergency Department; the remaining two did not result in injury.

### fMRI Results

Results from the seed-based functional connectivity analysis of the FPN (**Figure 2**) showed no significant between-group differences in mean network connectivity strength at baseline, regardless of task condition (**Table 5**). At trial completion, compared with AT, CON exhibited significantly greater intra-network coupling of the FPN during right finger tapping (p < 0.02) after adjusting for baseline WML and age. No AT effects were observed for FPN connectivity during left finger tapping (p = 0.26) or during rest (p = 0.50). We conducted a secondary, intention-to-treat analysis using all 30 participants with usable baseline data, regardless of loss to follow-up, and observed a similar, though weaker, between-group difference in FPN coupling during right finger tapping (p = 0.08). As in the primary analyses, CON showed an increase in intra-network coupling of the FPN (mean = 0.18, SE = 0.09), whereas AT showed no significant change over time (mean = −0.04, SE = 0.08).

TABLE 4 | Mobility and cardiovascular capacity measures (N = 21).

<sup>∗</sup>Adjusted for baseline WML and age; PASE, Physical Activities Scale for the Elderly.

<sup>∗</sup>Adjusted for baseline WML and age.

### Correlation Results

Bivariate correlation across the study sample showed that the change in FPN connectivity during right tapping was significantly associated with change in 6-Minute Walk Test performance (r = −0.43, p = 0.05; **Table 6**). Within the AT group (N = 12), the change in FPN connectivity during right tapping was significantly associated with change in TUG performance (r = 0.67, p = 0.02; **Table 6**). Specifically, reduced FPN connectivity from baseline to post-intervention correlated with improved TUG performance over the same period of time (**Figure 3**).

Δ = 6 months − baseline. <sup>∗</sup>p < 0.05.

## DISCUSSION

Contrary to our initial hypothesis, we found that a 6-month AT intervention significantly altered FPN connectivity during right finger tapping among older adults with mild SIVCI. The observed effect of aerobic exercise on the FPN during right tapping was significantly associated with improved mobility and cardiovascular capacity. While these results are preliminary, our data suggest that aerobic exercise may promote mobility among older adults with mild SIVCI by altering FPN connectivity.

Our findings are in contrast to previous findings showing that altered FPN connectivity is associated with aging (Andrews-Hanna et al., 2007) and with cognitive deficits (He et al., 2007; Geerligs et al., 2012; Poppe et al., 2015). Specifically, Poppe et al. (2015) demonstrated that, compared with healthy controls, patients with schizophrenia had significantly less connectivity in the FPN during goal-oriented task performance. Compared with controls, task performance was also significantly worse among patients. Similarly, He et al. (2007) found that, compared with age-matched healthy controls, individuals who suffered an acute stroke showed significantly less left–right posterior intraparietal sulcus connectivity while performing a spatial-orientation task. Lower connectivity between the left and right posterior intraparietal sulcus was significantly associated with poorer task accuracy and slower task reaction time.

Moreover, current evidence suggests that SIVCI is generally associated with reduced functional connectivity of neural networks. For example, among individuals with SIVCI, Yi et al. (2012) found lower functional connectivity in the medial prefrontal cortex and the middle temporal gyrus. Yu et al. (2015) demonstrated that, compared with healthy controls, individuals with SIVCI had lower network efficiency in the fronto-temporal and parietal regions. Nonetheless, although aberrant functional connectivity has been repeatedly observed among those with SIVCI (Yi et al., 2012; Ding et al., 2015; Yu et al., 2015; Zhou et al., 2016), the direction of the difference (i.e., increased or decreased connectivity) is not consistent across studies. For instance, Ding et al. (2015) found increased functional connectivity in the left middle temporal lobe, right inferior temporal lobe, and left superior frontal gyrus among patients with SIVCI compared with healthy controls.

However, our results do concur with and extend emerging evidence showing that less functional connectivity of large-scale networks may be advantageous (Chuang et al., 2014), especially within the context of mobility. In one cross-sectional study, Rosenberg-Katz et al. (2015) demonstrated that, compared with healthy older adults and individuals with Parkinson's disease who were non-fallers, those with Parkinson's disease who were fallers showed significantly greater connectivity between the posterior parietal lobule and the inferior parietal lobule. These data suggest that increased connectivity between parietal regions may be associated with more severe motor impairments. More generally, heightened neural activity (e.g., activation or connectivity) may reflect the inability of networks to actively suppress irrelevant neural events, causing regions to compete unnecessarily for available neural resources. In contrast, diminished connectivity may represent greater efficiency, as the networks can effectively allocate resources to areas of immediate importance. Indeed, emerging evidence suggests that lifestyle interventions can improve neural efficiency (Smith et al., 2013; Nishiguchi et al., 2015).

Our observation that AT affected FPN connectivity only during right-hand finger tapping concurs with literature suggesting that FPN connectivity is lateralized (Smith et al., 2009; Pool et al., 2014; Gao et al., 2015). Specifically, using independent component analysis, Smith et al. (2009) revealed that, among the neural networks identified, only the FPN exhibited distinct left- and right-lateralized components. Also, Jancke et al. (2000) found contralateral FPN activation during a right index finger tapping task (without a visual cue), including the left dorsal lateral premotor cortex and the left inferior parietal lobule. Moreover, Yuan et al. (2015) recently demonstrated that gait velocity among cognitively normal older adults was significantly associated with connectivity of the left FPN. Given that all our study participants were right-hand dominant, our results are therefore consistent with this literature. We also extend the current state of knowledge by using data from a randomized controlled trial to demonstrate the potential impact of aerobic exercise on FPN connectivity during right-hand finger tapping, and the significant association between FPN connectivity and both mobility and cardiovascular capacity. An alternative interpretation of these results is that aerobic exercise may help maintain mobility and cardiovascular capacity among older adults with SIVCI by reducing the cognitive load (i.e., less FPN connectivity) required to perform a less attention-demanding motor task (i.e., dominant-hand finger tapping). Our own previous work supports this latter concept (Hsu et al., 2017). In that separate sub-analysis of the same 6-month RCT, we found that after AT, older adults with confirmed SIVCI performed significantly better on the Eriksen flanker task than the no-exercise controls. The observed improvement in task performance was associated with an overall reduction in activation in the lateral occipital cortex and superior temporal gyrus.

It should be noted that we are aware of only one study in the field that has investigated the association between functional connectivity and cardiorespiratory fitness among older adults (Voss et al., 2010a). While our findings deviate from that evidence, in which greater connectivity was associated with higher fitness among healthy older adults (Voss et al., 2010a), several distinctions from the current study should be considered: (1) the fMRI task (visual vs. motor); (2) the network examined (default mode network vs. FPN); and (3) the study participants (healthy older adults vs. older adults with SIVCI). The combination of these differences could account for the disparity in the reported findings.

A few limitations should be taken into consideration. First, our study participants were likely healthier and had better physical functioning than the average older adult with mild SIVCI. This potential sample bias is somewhat unavoidable given the requirement that participants be able to engage in progressive AT safely. However, it also limits the generalizability of our findings to the population of older adults with mild SIVCI as a whole. Second, given the small sample size of the current study, the dataset may not have enough power to detect small differences between the two groups. Given that the study population is generally frail and older, drop-out from potentially strenuous fMRI sessions is to be expected. Future studies with larger sample sizes are necessary to validate the notion of functional network efficiency/inefficiency, providing sufficient power despite the expected drop-out. Third, it is possible that subsets of pairwise connectivity between ROIs within the FPN drove the effects we observed; however, this was not further investigated because of concerns about type II error with the current sample size. Moreover, there is much controversy regarding global signal regression and the potential introduction of artificial anti-correlations. This may be particularly influential when examining functional connectivity between networks deemed anti-correlated in nature (e.g., the default mode network and FPN). In assessing within-network connectivity, the effects of induced anti-correlation may be less pronounced. However, as stated by Murphy and Fox (2016), 'there is not a single "right" way to process resting state data that reveals the "true" nature of the brain.' They also summarized several advantages of global signal regression, including the removal of motion, cardiac, and respiratory signals.
In addition, despite evidence supporting its use (Fair et al., 2007; Zhu et al., 2017), we recognize that temporally splicing and concatenating data is not recommended and can potentially increase signal noise. Nevertheless, studies have demonstrated that connectivity derived from concatenated data does not differ significantly from that acquired from continuous data (Fair et al., 2007; Zhu et al., 2017). Our data are also limited by the fact that only connectivity during right-hand tapping showed a statistically significant group difference, whereas connectivity during left-hand tapping did not. Differences in the social interaction experienced by the two groups may present additional confounding factors; specifically, the active attention provided by trainers to the AT group may have influenced our findings. Lastly, the relationship between connectivity and SIVCI status is equivocal, with much of the evidence generated from cross-sectional studies. Thus, the inclusion of fMRI data from a healthy age-matched cohort might have facilitated interpretation of our results. Nevertheless, we highlight the key strengths of our study design, a randomized controlled trial: it (1) supports causal inference and (2) provides high internal validity. Thus, our study provides preliminary evidence that aerobic exercise may affect functional connectivity in older adults with SIVCI, and that this is associated with the maintenance of mobility.

## CONCLUSION

Our results demonstrate that neural network functional connectivity may contribute to the effects of aerobic exercise on mobility among older adults with SIVCI. We observed that 6 months of AT maintained motor task-based connectivity within the FPN of older adults with SIVCI, and the degree of decoupling within this network correlated with improvements in mobility. As such, our findings support emerging results from others that altered functional connectivity within certain neural networks might represent a beneficial change in older adults with mild SIVCI, especially vis-à-vis their mobility. More broadly, these results lend further support to the burgeoning notion that functional neural changes contribute to exercise-induced improvements in mobility among older adults. As an extension of these findings, future studies should explore potential interactions between mobility and cognitive outcomes in this population.

## DATA ACCESS AND RESPONSIBILITY

TL-A had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

## REFERENCES


## AUTHOR CONTRIBUTIONS

TL-A and RH were involved in the study concept, design, acquisition of data, preparation and critical review of the manuscript. CH, MM, WC, and TL-A were involved in data collection. CH, TL-A, and JB were involved in writing of the manuscript. CH, TL-A, and JB were involved in statistical analyses. CH, TL-A, and JB were involved in interpretation of data. CH and SW were involved in fMRI data analyses. MV and TH were involved in critical review of the manuscript. All authors had full access to all of the data (including statistical reports and tables) in the study and can take responsibility for the integrity of the data and the accuracy of the data analysis.

## FUNDING

This work was supported by the Canadian Stroke Network and the Heart and Stroke Foundation of Canada to TL-A and by the Jack Brown and Family Alzheimer Research Foundation Society to TL-A.

## ACKNOWLEDGMENT

CH is an Alzheimer Society Research Program Doctoral trainee. TL-A is a Canada Research Chair (Tier II) in Physical Activity, Mobility, and Cognitive Neuroscience.


improves executive functioning in older fallers: a randomized controlled trial. J. Am. Geriatr. Soc. 56, 1821–1830. doi: 10.1111/j.1532-5415.2008.01931.x



Parkinson's disease. Neurosci. Lett. 460, 6–10. doi: 10.1016/j.neulet.2009.05.046


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewers KM, GA and handling Editor declared their shared affiliation, and the handling Editor states that the process nevertheless met the standards of a fair and objective review.

Copyright © 2017 Hsu, Best, Wang, Voss, Hsiung, Munkacsy, Cheung, Handy and Liu-Ambrose. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.


# Corrigendum: The Impact of Aerobic Exercise on Fronto-Parietal Network Connectivity and Its Relation to Mobility: An Exploratory Analysis of a 6-Month Randomized Controlled Trial

Chun L. Hsu<sup>1,2,3,4</sup>, John R. Best<sup>1,2,3,4</sup>, Shirley Wang<sup>1,2,3,4</sup>, Michelle W. Voss<sup>5,6</sup>, Robin G. Y. Hsiung<sup>7</sup>, Michelle Munkacsy<sup>1,2,3,4</sup>, Winnie Cheung<sup>1,2,3,4</sup>, Todd C. Handy<sup>8</sup> and Teresa Liu-Ambrose<sup>1,2,3,4</sup>\*

*<sup>1</sup> Aging, Mobility, and Cognitive Neuroscience Lab, University of British Columbia, Vancouver, BC, Canada, <sup>2</sup> Department of Physical Therapy, University of British Columbia, Vancouver, BC, Canada, <sup>3</sup> Djavad Mowafaghian Center for Brain Health, University of British Columbia, Vancouver, BC, Canada, <sup>4</sup> Center for Hip Health and Mobility, Vancouver, BC, Canada, <sup>5</sup> Health, Brain and Cognition Lab, University of Iowa, Iowa City, IA, United States, <sup>6</sup> Department of Psychology, University of Iowa, Iowa City, IA, United States, <sup>7</sup> Department of Medicine, University of British Columbia, Vancouver, BC, Canada, <sup>8</sup> Department of Psychology, University of British Columbia, Vancouver, BC, Canada*

Keywords: aging, impaired mobility, vascular cognitive impairment, fronto-parietal network, functional connectivity, fMRI

#### **A corrigendum on**

#### Edited and reviewed by:

*Srikantan S. Nagarajan, University of California, San Francisco, United States*

> \*Correspondence: *Teresa Liu-Ambrose teresa.ambrose@ubc.ca*

Received: *02 August 2017* Accepted: *23 August 2017* Published: *05 September 2017*

#### Citation:

*Hsu CL, Best JR, Wang S, Voss MW, Hsiung RGY, Munkacsy M, Cheung W, Handy TC and Liu-Ambrose T (2017) Corrigendum: The Impact of Aerobic Exercise on Fronto-Parietal Network Connectivity and Its Relation to Mobility: An Exploratory Analysis of a 6-Month Randomized Controlled Trial. Front. Hum. Neurosci. 11:449. doi: 10.3389/fnhum.2017.00449*

#### **The Impact of Aerobic Exercise on Fronto-Parietal Network Connectivity and Its Relation to Mobility: An Exploratory Analysis of a 6-Month Randomized Controlled Trial**

by Hsu, C. L., Best, J. R., Wang, S., Voss, M. W., Hsiung, R. G. Y., Munkacsy, M., et al. (2017). Front. Hum. Neurosci. 11:344. doi: 10.3389/fnhum.2017.00344

In the original article, we used inconsistent wording that, while not incorrect, may cause confusion for readers. In the discussion section and the conclusion, we indicated that we found aerobic training may alter functional network connectivity. For greater clarity, we should have stated that aerobic training maintains network connectivity.

The following corrections, in italics, have been made to Discussion, Paragraph 1.

Contrary to our initial hypothesis, we found that a 6-month AT intervention did not significantly increase, but rather maintained FPN connectivity during right finger tapping among older adults with mild SIVCI. The observed effect of aerobic exercise on the FPN during right tapping was significantly associated with improved mobility and cardiovascular capacity. While these results are preliminary, our data suggest aerobic exercise may promote mobility among older adults with mild SIVCI by maintaining the integrity of FPN connectivity.

Also, a correction has been made to Conclusion, Paragraph 1.

Our results demonstrate that neural network functional connectivity may contribute to the effects of aerobic exercise on mobility among older adults with SIVCI. We observed that 6 months of AT maintained motor task-based connectivity within the FPN of older adults with SIVCI, and the degree of decoupling within this region correlated with improvements in mobility. As such, our current findings support emerging results from others that lower functional connectivity within certain neural networks might represent a beneficial change in older adults with mild SIVCI, especially vis-à-vis their mobility. More broadly, these results bring further support to the burgeoning notion that functional neural changes contribute to exercise-induced improvements to mobility among older adults. As an extension of these findings, future studies should explore potential interactions between mobility and cognitive outcomes among this population.

The authors apologize for this issue and state that this does not change the scientific conclusions of the article in any way.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Hsu, Best, Wang, Voss, Hsiung, Munkacsy, Cheung, Handy and Liu-Ambrose. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Comparing Aging and Fitness Effects on Brain Anatomy

Mark A. Fletcher<sup>1,2</sup>, Kathy A. Low<sup>1</sup>, Rachel Boyd<sup>1,3</sup>, Benjamin Zimmerman<sup>1,2</sup>, Brian A. Gordon<sup>4</sup>, Chin H. Tan<sup>1,3</sup>, Nils Schneider-Garces<sup>1,3</sup>, Bradley P. Sutton<sup>1,2,5</sup>, Gabriele Gratton<sup>1,2,3,5</sup> and Monica Fabiani<sup>1,2,3,5</sup>\*

<sup>1</sup> Beckman Institute, University of Illinois, Urbana, IL, USA, <sup>2</sup> Neuroscience Program, University of Illinois, Urbana, IL, USA, <sup>3</sup> Department of Psychology, University of Illinois, Urbana, IL, USA, <sup>4</sup> Department of Radiology, Washington University in St. Louis, Saint Louis, MO, USA, <sup>5</sup> Department of Bioengineering, University of Illinois at Urbana–Champaign, Urbana, IL, USA

Recent studies suggest that cardiorespiratory fitness (CRF) mitigates the brain atrophy typically associated with aging, via a variety of beneficial mechanisms. One could argue that if CRF is generally counteracting the negative effects of aging, the same regions that display the greatest age-related volumetric loss should also show the largest beneficial effects of fitness. To test this hypothesis, we examined structural MRI data from 54 healthy older adults (ages 55–87) to determine the overlap, across brain regions, of the profiles of age and fitness effects. Results showed that lower fitness and older age are associated with atrophy in several brain regions, replicating past studies. However, when the profiles of age and fitness effects were compared using a number of statistical approaches, the effects were not entirely overlapping. Interestingly, some of the regions that were most influenced by age were among those not influenced by fitness. Presumably, the age-related atrophy occurring in these regions is due to factors that are more impervious to the beneficial effects of fitness. Possible mechanisms supporting regional heterogeneity may include differential involvement in motor function, the presence of adult neurogenesis, and differential sensitivity to cerebrovascular, neurotrophic and metabolic factors.

#### Edited by:

Soledad Ballesteros, Universidad Nacional de Educación a Distancia, Spain

#### Reviewed by:

Bonnie J. Nagel, Oregon Health & Science University, USA

Keita Kamijo, Waseda University, Japan

\*Correspondence:

Monica Fabiani mfabiani@illinois.edu

Received: 11 March 2016 Accepted: 27 May 2016 Published: 28 June 2016

#### Citation:

Fletcher MA, Low KA, Boyd R, Zimmerman B, Gordon BA, Tan CH, Schneider-Garces N, Sutton BP, Gratton G and Fabiani M (2016) Comparing Aging and Fitness Effects on Brain Anatomy. Front. Hum. Neurosci. 10:286. doi: 10.3389/fnhum.2016.00286

Keywords: aging, cardiorespiratory fitness, exercise, brain anatomy, FreeSurfer

# INTRODUCTION

The brains of seemingly healthy individuals undergo various degrees of cortical and subcortical atrophy with aging (Raz et al., 2005, 2010; Gordon et al., 2008). The exact mechanisms underlying these changes are not completely known, although vascular (Brown and Thore, 2011; Davenport et al., 2012; Lähteenvuo and Rosenzweig, 2012; Fabiani et al., 2014), genetic (McGue and Johnson, 2008; Papenberg et al., 2015), oxidative stress (Muller et al., 2007; Dai et al., 2014; Thorin and Thorin-Trescases, 2014) and hormonal (Morrison and Baxter, 2012) factors may all contribute significantly. These anatomical changes are considered to have important functional consequences, and to influence some of the cognitive decline that accompanies aging (e.g., Salthouse, 2011; Fabiani, 2012; Nyberg et al., 2012).

In an attempt to delay or possibly even reverse some of these structural changes, investigators have focused on lifestyle factors as potential mitigators of the effects of aging.

**Abbreviations:** CRF, cardiorespiratory fitness; eCRF, estimated cardiorespiratory fitness; eTIV, estimated total intracranial volume; VO2max, maximal oxygen uptake.

One such factor that has received particular attention is CRF. Higher CRF is correlated with a number of systemic benefits, including lower blood pressure (Blair et al., 1996; Barlow et al., 2006), decreased stress (Jackson and Dishman, 2006), and better lipid profiles (Dunn et al., 1999; Park et al., 2015). Extensive research has also demonstrated that higher CRF is correlated with improved brain tissue preservation and higher performance in neuropsychological tests (e.g., Colcombe and Kramer, 2003; Gordon et al., 2008; Voss et al., 2011, 2013; Verstynen et al., 2012; Weinstein et al., 2012; McAuley et al., 2013). It has also been shown that CRF mediates age-related changes in some structural (e.g., frontal cortex volume, Weinstein et al., 2012; caudate nucleus, Verstynen et al., 2012) and functional measures (e.g., cerebral blood flow; Zimmerman et al., 2014), and that these changes are relevant for cognitive function (Verstynen et al., 2012; Weinstein et al., 2012). Beneficial effects of CRF on the preservation of regional brain volumes have been reported for a number of brain regions in cross-sectional comparisons of age and fitness groups (Gordon et al., 2008; Bugg and Head, 2011; Verstynen et al., 2012), as well as in response to fitness intervention (Colcombe et al., 2003, 2006; Erickson et al., 2011; Voss et al., 2011). Most of the research conducted thus far, however, has not examined the regional specificity of CRF effects on brain anatomy in a systematic manner, but has instead focused on specific brain areas (such as the hippocampus and prefrontal cortex) that are known to be affected by aging (e.g., Bugg and Head, 2011; Erickson et al., 2011; McAuley et al., 2011).

In this study, we propose a strategy to investigate the extent to which the effects of CRF on volumetric brain measures are generalized or regional, and their overlap with the effects of aging. In other words, are the profiles of the effects of CRF and of aging similar throughout the brain, or does CRF preferentially influence only some of the brain regions that are affected by aging and not others? To address these questions, we report a direct comparison of the profiles of the effects of aging and fitness on volumetric measures obtained throughout the brain (including both cortical and subcortical structures) in older adults between the ages of 55 and 87.

In vivo volumetric measurement of brain areas based on structural magnetic resonance imaging (sMRI) allows researchers to study the influence of demographic and lifestyle factors on different patterns of volumetric decline in aging. Further, measurements of brain structures throughout the whole brain are critical to provide the comprehensive data required for a profile analysis. Although manual tracing remains the gold standard for in vivo anatomical studies based on structural MR images (Raz et al., 2003; Kennedy et al., 2009), a number of semi-automated methods have become available in the last two decades. These methods, allowing for the parallel measurement of multiple structures, permit a more complete examination of brain anatomy in large groups of individuals without requiring extensive training. One of the first automatic methods used for investigating the effects of CRF on brain anatomy was voxel-based morphometry (VBM; Ashburner and Friston, 2000), which allows for probabilistic brain mapping across populations. VBM studies of the effects of aging on brain anatomy have demonstrated significant gray and white matter loss in a wide range of areas, including most of the frontal and parietal cortex, as well as most subcortical gray matter regions (e.g., Good et al., 2002; Gordon et al., 2008). VBM studies have also examined the effects of CRF. For example, in a cross-sectional study including younger (ages 20–28) and older adults (ages 65–81), Gordon et al. (2008) showed that higher CRF was associated with volumetric preservation in the inferior frontal, anterior parietal, and medial temporal regions, even when education, age and gender were controlled for.
One of the first fitness intervention studies to use VBM (Colcombe et al., 2006) demonstrated that a 6-month aerobic fitness intervention in low-fit older adults (ages 60–79) led to significant volumetric increases in the dorsal anterior cingulate, supplementary motor area, middle frontal gyrus, left superior temporal lobe, and anterior third of the corpus callosum (CC). Other VBM studies examining subcortical regions have shown that higher CRF is associated with larger hippocampal volumes, and that exercise interventions may even lead to an increase in hippocampal size (Erickson et al., 2011; McAuley et al., 2011).

Although VBM can provide a useful estimate of anatomical effects, it also has several limitations, including the fact that coregistration of different structures across individuals relies on group templates, which may introduce biases in the comparison process (Bookstein, 2001; Salmond et al., 2002; Davatzikos, 2004; Ridgway et al., 2008; Kennedy et al., 2009; Whitwell, 2009). FreeSurfer© provides an alternative approach: it offers a way of coregistering anatomical structures across individuals using a semi-automatic parcellation of the cortex based on surface and border features, and the possibility of separating cortical volume from cortical thickness (Dale and Sereno, 1993; Dale et al., 1999; Fischl et al., 2001). Using FreeSurfer, Bugg and Head (2011) examined the correlation between the amount of running, jogging, and walking (as indexed by a questionnaire covering the previous 10 years) and volume loss in a few regions of interest (ROIs) in older adults (ages 55–79). Although only some regions were examined, they showed that exercise mitigated age-related atrophy in the medial temporal lobe (MTL), but failed to show an interaction between age and exercise in other ROIs (including areas of the prefrontal cortex and parietal lobes). In the current study, we used FreeSurfer to derive volumetric brain measurements in a sample of middle-aged and older adults. We also used multiple statistical methodologies in an attempt to provide a more comprehensive description of the effects of fitness and age across brain regions and to evaluate their profile overlap. We believe that an improved understanding of the overlap between these variables will help design more effective intervention programs (see, for example, Bherer et al., 2013; Voss et al., 2013).

#### MATERIALS AND METHODS

#### Participants

Fifty-six healthy adults (29 females, age range = 55–87) from the Champaign-Urbana community were recruited through local advertisements, mass emailing, and flyers. The study was approved by the University of Illinois Institutional Review Board, and all participants signed documents of informed consent in accordance with the Declaration of Helsinki. All participants were right-handed and fluent English speakers. Information about the participants' demographics is reported in **Table 1**.

Phone interviews were conducted to screen individuals for inclusion/exclusion. Exclusion criteria included a history of drug abuse, major psychiatric or neurological disease, or other serious chronic medical conditions. Those who passed the phone screening were invited into the lab and were given two additional screening measures, the Beck's Depression Inventory (BDI; Beck and Steer, 1987) and the modified Mini–Mental Status Exam (mMMSE; Mayeux et al., 1981). Individuals scoring above 14 on the BDI or 51 or less on the mMMSE were excluded from the study. Two male participants were excluded from analysis due to incomplete fitness estimates or missing neuropsychological data. Our final sample consisted of 54 participants.

The anatomical MRIs of eight college students (age range = 19–22, mean age = 21.53, SD = 0.99, five females) were also used for head-size normalization but were excluded from all other analyses (see below for rationale). These younger participants reported themselves to be healthy, right-handed, and fluent in English, and they also underwent a screening similar to the one described above.

#### Fitness Estimation

Estimated cardiorespiratory fitness (eCRF) scores were obtained for each participant using the regression model proposed by Jurca et al. (2005). The eCRF measures have been validated with a large sample of older adults (Mailey et al., 2010; McAuley et al., 2011). They are highly correlated (≈0.7) with VO2max, which is obtained through a graded exercise protocol and is considered the gold standard for assessing CRF. Specifically, Mailey et al. (2010) found that within a population of older adults, eCRF was significantly correlated with metabolic equivalents (METs) (r = 0.66) and with estimates based on submaximal field testing (r = 0.67). Crucially, eCRF allowed us to include participants who might be at risk during standard VO2max testing.

The regression model used for eCRF estimation is based on weighted data including gender, age, body mass index (BMI), resting heart rate, and a physical activity score. Height and weight were measured to calculate the BMI. Resting heart rate was recorded on three separate days, after participants had been sitting for a minimum of 5 min, and these measurements were then averaged. The activity score was derived from the Physical Activity Scale for the Elderly (PASE, Washburn et al., 1993). The PASE contains specific questions to determine the type and frequency (minutes per session and number of sessions per week) of exercise activities a participant performs in an average week. Specifically, the total minutes per week of exercises requiring low exertion (i.e., activities that produce only a slight increase in heart rate) and those classified as aerobic exercise (e.g., jogging, swimming, cycling) were quantified. These totals were used to assign each participant to one of the five self-reported activity categories defined by Jurca et al. (2005), which were then combined with the other weighted data to obtain an eCRF value for each participant. The eCRF score is expressed in METs; one MET is defined as the amount of oxygen consumed while sitting at rest (see Jetté et al., 1990). To estimate VO2max from MET values, the eCRF score is multiplied by 3.5 (Masley, 2009). To facilitate cross-referencing with studies measuring VO2max, **Table 1** also reports the VO2max values estimated from eCRF. The means of the fitness estimates for each group, when compared to existing norms, indicate that the groups differ meaningfully, with the low-fitness group classified as "poor prognosis in coronary patients; highly deconditioned individuals" (Jurca et al., 2005; see also McArdle et al., 2006).
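The computational steps described above can be sketched in Python. This is a minimal illustration, not the authors' code: all function names are hypothetical, and the regression intercept and weights are deliberately left as inputs because the actual Jurca et al. (2005) coefficients are not given in this article. The only fixed quantities are those stated in the text: BMI as kilograms over meters squared, resting heart rate averaged over three days, and the 3.5 multiplier that converts METs to an estimated VO2max.

```python
from statistics import mean

# Conversion stated in the text: estimated VO2max (mL O2/kg/min) = METs x 3.5.
ML_O2_PER_KG_MIN_PER_MET = 3.5


def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def mean_resting_heart_rate(readings_bpm) -> float:
    """Average of resting heart-rate readings, one per separate day."""
    if len(readings_bpm) != 3:
        raise ValueError("expected three readings, one per testing day")
    return mean(readings_bpm)


def estimate_ecrf_mets(intercept: float, weights: dict, features: dict) -> float:
    """Linear non-exercise model: intercept + sum(weight * predictor value).

    HYPOTHETICAL: `intercept` and `weights` must be taken from
    Jurca et al. (2005); they are not reproduced here.
    """
    missing = set(weights) - set(features)
    if missing:
        raise KeyError(f"missing predictors: {sorted(missing)}")
    return intercept + sum(w * features[name] for name, w in weights.items())


def mets_to_vo2max(mets: float) -> float:
    """Convert an eCRF score in METs to an estimated VO2max in mL/kg/min."""
    return mets * ML_O2_PER_KG_MIN_PER_MET


# Example: converting the two group means reported for the stratified sample.
for group, mets in [("low-fit", 5.28), ("high-fit", 7.88)]:
    print(group, round(mets_to_vo2max(mets), 2))
```

Applied to the group means of 5.28 and 7.88 METs reported in the stratification analysis, the conversion gives 18.48 and 27.58 mL/kg/min.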


eCRF, Estimated Cardiorespiratory Fitness; MET, Metabolic Equivalent; Estimated VO2max, milliliters of oxygen per minute per kilogram of body weight; Kg, Kilograms; BMI, Body Mass Index – weight in Kg over height in meters squared (m<sup>2</sup> ); bpm, beats per minute; BDI, Beck's Depression Inventory; mMMS, Modified Mini–Mental Status Examination; Systolic and diastolic blood pressure: mmHg (millimeters of mercury). <sup>∗</sup>p < 0.05, ∗∗p < 0.01.

#### Sample Stratification and Analytic Strategies

The purpose of the present study is to independently assess the effects of age and fitness on brain anatomy. A critical step is to "orthogonalize" these two variables, which are typically highly correlated. To do so, participants were sorted into high- and low-fit groups using the following strategy. To control for the effects of age and gender on fitness, group membership was determined by first separating participants by age, split by the average age of the entire participant pool (mean = 69.56 years). Within each age group, participants were then divided by gender and split by eCRF scores using a median split for each age-by-gender group. In groups with an odd number of members, median participants were designated as high- or low-fit based on whether their eCRF scores were above or below the average eCRF score of their peer group. Next, participants were combined into high-fit and low-fit categories, collapsing across all other categorizations. Fitness designation was done in this way to account for the heavy weighting that gender and age receive in the CRF estimate, and to control for the gender bias known to exist within VO2max scores (Jurca et al., 2005). Thus, each individual was classified as high- or low-fit based on a comparison with similarly aged peers of the same gender. As a result of this grouping strategy (see **Table 1**), the low-fit and high-fit groups had an almost equivalent number of males and females (14F, 13M), statistically similar average age (70.30 versus 68.75, t = 0.63, p > 0.10), but significantly different average eCRF scores (5.28 versus 7.88, t = 4.53, p < 0.01), thus effectively stratifying the sample and eliminating the inherent correlation between age and fitness.
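The stratification rule can be expressed as a short function. This is a sketch under stated assumptions, not the authors' script: the record fields (`id`, `age`, `gender`, `ecrf`) are hypothetical, a participant whose age exactly equals the sample mean is placed in the younger group, ties in eCRF at the split are broken by sort order, and a median participant whose score equals the peer-group mean is labeled low-fit. The text does not specify these edge cases.

```python
from statistics import mean


def classify_fitness(participants):
    """Label each participant 'high' or 'low' fit relative to age-by-gender peers.

    participants: list of dicts with hypothetical keys 'id', 'age', 'gender', 'ecrf'.
    Returns a dict mapping id -> 'high' or 'low'.
    """
    mean_age = mean(p["age"] for p in participants)

    # Split by the sample mean age, then by gender within each age group.
    cells = {}
    for p in participants:
        age_group = "older" if p["age"] > mean_age else "younger"  # assumption: ties -> younger
        cells.setdefault((age_group, p["gender"]), []).append(p)

    # Median split on eCRF inside each age-by-gender cell.
    labels = {}
    for members in cells.values():
        ranked = sorted(members, key=lambda p: p["ecrf"])
        n = len(ranked)
        half = n // 2
        for p in ranked[:half]:
            labels[p["id"]] = "low"
        for p in ranked[n - half:]:
            labels[p["id"]] = "high"
        if n % 2 == 1:
            # Odd-sized cell: compare the median participant with the cell's mean eCRF.
            middle = ranked[half]
            cell_mean = mean(p["ecrf"] for p in members)
            labels[middle["id"]] = "high" if middle["ecrf"] > cell_mean else "low"

    # Labels from all cells are pooled into a single high-fit / low-fit grouping.
    return labels
```

Because each participant is compared only with peers of the same age group and gender, the pooled fitness label carries little information about age, which is what allows age and fitness to be treated as approximately independent predictors.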

# Collection and Processing of Structural MRI Data

High-resolution T1-weighted images were acquired with a 3T Siemens Trio full body scanner using a 3D MPRAGE protocol. MPRAGE pulse parameters were: flip angle = 9°, TR = 1900 ms, TE = 2.32 ms, and inversion time = 900 ms. Slices were acquired in the sagittal plane (192 slices, 0.9 mm slice thickness, voxel size 0.9 mm × 0.9 mm × 0.9 mm) with matrix dimensions of 192 × 256 × 256 (in-plane interpolated at acquisition to 192 × 512 × 512) and field of view of 172.8 mm × 230 mm × 230 mm. These parameters allowed for a clear delineation of gray–white matter boundaries upon visual inspection. Structural MRI images were processed with FreeSurfer© 5.0 (for technical details see Dale and Sereno, 1993; Dale et al., 1999; Fischl et al., 1999a,b, 2001, 2002, 2004; Fischl and Dale, 2000; Segonne et al., 2004; Desikan et al., 2006; Han et al., 2006; Jovicich et al., 2006). The FreeSurfer output was thoroughly inspected for errors through extensive visual screening performed by three trained individuals, and corrected according to the methods recommended on the FreeSurfer website<sup>1</sup>. Finally, estimates of cortical and subcortical volumes were obtained using an automated probabilistic labeling procedure based on the Desikan–Killiany anatomical atlas (Desikan et al., 2006).
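The acquisition geometry can be checked with simple arithmetic: the field of view along each axis equals the number of samples times the voxel size. The short check below uses hypothetical helper names and is not part of any FreeSurfer workflow. It shows that 192 × 0.9 mm gives the reported 172.8 mm and that 256 × 0.9 mm gives 230.4 mm, which the text reports rounded to 230 mm. It also shows the in-plane spacing implied by interpolating the same field of view onto a 512 × 512 grid.

```python
def field_of_view_mm(n_samples: int, voxel_mm: float) -> float:
    """Field of view along one axis: number of samples times sample spacing."""
    return n_samples * voxel_mm


def interpolated_spacing_mm(fov_mm: float, n_interpolated: int) -> float:
    """Sample spacing after interpolating the same field of view onto a finer grid."""
    return fov_mm / n_interpolated


voxel = 0.9  # mm, isotropic acquisition voxel from the text
matrix = (192, 256, 256)  # sagittal slices, then the two in-plane axes
fov = [field_of_view_mm(n, voxel) for n in matrix]
print([round(f, 1) for f in fov])
print(round(interpolated_spacing_mm(fov[1], 512), 2))
```

The interpolated in-plane spacing works out to 0.45 mm, half the acquired 0.9 mm voxel size.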

Subcortical and cortical regions were then normalized by intracranial volume (see equation below) to account for volumetric differences in head size (Jack et al., 1989; Buckner et al., 2004). The anatomical data from eight college-aged students were also included in the normalization process in order to frame the results obtained from our 55–87-year-old sample in a broader age-related perspective. In other words, we included the young adults as a reference point, so that the normalized values can be more easily compared to those obtained from samples that, unlike this one, do include younger adults. Mathematically, the normalization applies the same correction to all subjects and groups and therefore does not introduce any bias. Every volume for each participant was corrected for head size with the following formula proposed by Buckner et al. (2004):

$$V_{\mathrm{adj}} = V_{\mathrm{nat}} - b\left(\mathrm{eTIV}_{\mathrm{fs}} - \overline{\mathrm{eTIV}}_{\mathrm{fs}}\right)$$

where $V_{\mathrm{adj}}$ is the covariance-corrected (adjusted) volume, $V_{\mathrm{nat}}$ is the original volume calculated by FreeSurfer in native space, $b$ is the slope of the regression of $V_{\mathrm{nat}}$ on $\mathrm{eTIV}_{\mathrm{fs}}$ (the participant's eTIV produced by FreeSurfer), and $\overline{\mathrm{eTIV}}_{\mathrm{fs}}$ is the average eTIV of the group. Although several other normalization methods have been proposed, this particular one has been widely used throughout the field.
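The head-size correction can be sketched as follows. This minimal version, with a hypothetical function name, estimates $b$ by ordinary least squares for a single region and then applies the Buckner et al. (2004) formula. It assumes, consistent with the text, that the slope and the mean eTIV are computed over all scans entering the normalization, including the young reference group.

```python
from statistics import mean


def adjust_for_head_size(volumes, etivs):
    """Return V_adj = V_nat - b * (eTIV - mean eTIV) for one region.

    volumes: native-space volumes of a single region, one per participant.
    etivs:   FreeSurfer eTIV values for the same participants, in the same order.
    """
    if len(volumes) != len(etivs) or len(volumes) < 2:
        raise ValueError("need at least two paired volume/eTIV values")
    etiv_bar = mean(etivs)
    vol_bar = mean(volumes)
    # OLS slope of volume regressed on eTIV: cov(V, eTIV) / var(eTIV).
    sxy = sum((e - etiv_bar) * (v - vol_bar) for e, v in zip(etivs, volumes))
    sxx = sum((e - etiv_bar) ** 2 for e in etivs)
    if sxx == 0:
        raise ValueError("eTIV has no variance; slope is undefined")
    b = sxy / sxx
    return [v - b * (e - etiv_bar) for v, e in zip(volumes, etivs)]
```

Because only the eTIV-dependent component is removed, the adjustment leaves the sample mean of each region unchanged and removes its linear association with head size.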

#### Statistical Analyses

The processed and corrected data were analyzed using two approaches: (a) a standard analysis evaluating all the cortical and subcortical ROIs included in the Desikan–Killiany atlas (Desikan et al., 2006); and (b) groupings of different cortical and subcortical regions showing similar effects based on a principal component analysis/VARIMAX decomposition (see Alexander-Bloch et al., 2013), in order to reduce the problem of multiple comparisons present in (a). The choice of an empirical, data-driven (rather than a theory-driven) approach for grouping the different regions was motivated by the following logic: (a) at present, the factors that make some regions more affected than others by aging or fitness are not yet known, and this study is, in fact, one of the first addressing this issue; and (b) the factor-analytic approach employed here allows the data to reveal emergent groupings, which may be useful to drive future research aimed at understanding the mechanisms underlying regional variations in sensitivity to age and fitness. For all these analyses, left and right hemispheres were combined, as we had no specific hemispheric-based hypotheses.

To dissociate the effects of age and fitness, we used a data-driven stratification approach, in which participants were given an "age-score" equal to the deviation of their age from the overall mean age (yielding two groups of participants above and below the overall mean age, respectively), and a "fitness-score" with a value of +1 for "high-fit-for-their-age" individuals (defined as above) and −1 for "low-fit-for-their-age" individuals. This approach also allowed for the examination of age × fitness interactions, testing whether age effects were similar for high- and low-fit individuals (i.e., whether fitness moderates the effect of age), a pattern of results that is not typically examined by studies using a more traditional multiple regression approach.
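The coding scheme can be written out explicitly. In this sketch (function name hypothetical), the age score is age centered on the sample mean, the fitness score is +1 or −1, and the interaction term is their product. This matches the description of the interaction in the Table 2 caption (centered age multiplied by dichotomized fitness scores).

```python
from statistics import mean


def code_predictors(ages, is_high_fit):
    """Return (age_score, fitness_score, interaction) tuples, one per participant."""
    if len(ages) != len(is_high_fit):
        raise ValueError("ages and fitness labels must have the same length")
    mean_age = mean(ages)
    rows = []
    for age, high in zip(ages, is_high_fit):
        age_score = age - mean_age          # deviation from the sample mean age
        fitness_score = 1 if high else -1   # high-fit-for-their-age = +1, low = -1
        rows.append((age_score, fitness_score, age_score * fitness_score))
    return rows
```

Let b1 and b3 (hypothetical labels) be the regression weights for the age score and the interaction. With this ±1 coding, the age slope is b1 + b3 for high-fit participants and b1 − b3 for low-fit participants. The interaction weight b3 is therefore half the difference between the two groups' age slopes.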

<sup>1</sup>http://surfer.nmr.mgh.harvard.edu/fswiki/FsTutorial/TroubleshootingData

Pearson's correlation coefficients (r) were calculated to determine the associations of age, fitness and the age × fitness interaction with neuroanatomical measurements. In addition, a multiple regression analysis was performed to determine the unique variance explained by these three variables. For this analysis, given the directional nature of the hypotheses (i.e., younger and more fit individuals should have larger brain volumes than older and less fit individuals), we used one-tailed tests.
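A minimal multiple-regression sketch with NumPy is shown below. It returns the coefficients, their t statistics, and the residual degrees of freedom. It is an illustration, not the authors' analysis code, and the function name is hypothetical. For the one-tailed tests described above, the p-value for a coefficient predicted to be positive (fitness) is the upper-tail probability of its t statistic, for example `scipy.stats.t.sf(t, dof)`. For a coefficient predicted to be negative (age), it is the lower-tail probability.

```python
import numpy as np


def fit_ols(predictors, outcome):
    """Ordinary least squares with an intercept.

    predictors: (n_participants, n_predictors) array, e.g. age score,
                fitness score and their product.
    outcome:    (n_participants,) array, e.g. one region's adjusted volume.
    Returns (coefficients, t_statistics, residual_dof); index 0 is the intercept.
    """
    X = np.asarray(predictors, dtype=float)
    y = np.asarray(outcome, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    dof = len(y) - design.shape[1]
    sigma2 = residuals @ residuals / dof          # residual variance estimate
    cov_beta = sigma2 * np.linalg.inv(design.T @ design)
    t_stats = beta / np.sqrt(np.diag(cov_beta))
    return beta, t_stats, dof
```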

The main purpose of this study is to compare the profiles of anatomical effects associated with aging and fitness. For this analysis, we used the data from the ROI approach. For each of the 48 Desikan–Killiany areas, the effect sizes associated with age and fitness were calculated using Cohen's d. The similarity of the effect sizes of age and fitness across anatomical areas was estimated using Pearson's correlation coefficient. A consistency-based two-way mixed intra-class correlation coefficient was also calculated to estimate the consistency of the profiles of regional volumetric changes with age and fitness.
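The profile comparison can be sketched in plain Python. The sketch computes Cohen's d with a pooled standard deviation for each region, a Pearson correlation between the age and fitness effect-size profiles, and a consistency intra-class correlation from a two-way ANOVA decomposition. The text does not say whether the single-measure or average-measure ICC was reported. This sketch assumes the single-measure form, ICC(C,1), which uses the same formula for the two-way mixed and random models. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference (a - b) using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def pearson_r(x, y):
    """Pearson correlation between two equal-length profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def icc_consistency_single(*profiles):
    """ICC(C,1): two-way model, consistency definition, single measures.

    Each profile acts as one 'rater' (e.g. age effect sizes, fitness effect
    sizes), scored over the same regions in the same order.
    """
    k = len(profiles)
    n = len(profiles[0])
    rows = list(zip(*profiles))  # one row per region
    grand = mean(v for row in rows for v in row)
    ss_rows = k * sum((mean(row) - grand) ** 2 for row in rows)
    ss_cols = n * sum((mean(p) - grand) ** 2 for p in profiles)
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
```

Both indices ignore a constant offset between the two profiles. Unlike Pearson's r, the consistency ICC is also lowered when one profile is a rescaled version of the other, so the two measures carry partly different information.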

# RESULTS

# Age and Fitness Effects: Demographics

Age, fitness, and various physiological and neuropsychological measures are reported in **Table 1**. Several variables demonstrated negative correlations with age including BMI [r(52) = −0.48, p < 0.01] and diastolic pressure [r(52) = −0.39, p < 0.01]. Pulse pressure (the difference between systolic and diastolic pressure, indexing arterial stiffness) was positively correlated with age [r(52) = 0.36, p < 0.01].

# Regional Analyses of Age and Fitness Effects

To verify that the lack of correlation or consistency between the effects of age and fitness on brain volume is not merely due to a lack of power, it is important to show that regional volumetric measures are sensitive to each of these two independent variables. To this end, we performed a set of multiple regressions investigating the effects of age, fitness, and their interaction on each individual area. A problem with this analysis is that it involves a large number of comparisons, which are only partially independent of each other. Therefore, we also performed a second analysis in which we first grouped the different areas using a factor analysis (i.e., PCA followed by Varimax rotation).

#### Region by Region Analysis

The BERT template provided by FreeSurfer was colored (**Figure 1**) to provide a visual summary of cortical and subcortical regions of the Desikan–Killiany atlas that were most strongly (p < 0.01) associated with age and fitness. The correlations between age, fitness and the volumes of cortical ROIs are found in **Table 2**. Within the frontal lobe, only the pars opercularis, pars triangularis and precentral regions were found to be significantly negatively associated with age. However, the superior frontal and precentral regions were positively associated with fitness. In the temporal lobe, most of the regions were negatively associated with age. Furthermore, the superior temporal sulcus was positively associated with fitness, whereas the superior temporal and middle temporal areas showed a similar, but not significant, trend (p < 0.10). The only region showing a modest interaction effect between age and fitness was the fusiform gyrus, with larger volumes in high-fit younger adults, but smaller volumes in high-fit older adults. All the sub-regions of the parietal lobe were negatively associated with age, but none were positively associated with eCRF. The insula was not associated with either age or eCRF. The posterior cingulate was associated with both age and eCRF, whereas the caudal anterior cingulate was associated only with eCRF.

FIGURE 1 | Schematic representation of the results of the regression analysis for all of the ROIs. Colored areas indicate ROIs where age (blue) or eCRF (red) show strong partial correlations (p < 0.01) with the volume of the corresponding region. Although left and right volumes were not examined separately, for illustration purposes both hemispheres are included in the figure. Note that no ROI was significantly affected by both variables in this analysis.

TABLE 2 | Raw and partial correlations between anatomical ROIs and age, eCRF, and eCRF by age interaction (centered age multiplied by dichotomized fitness scores).

fnhum-10-00286 June 24, 2016 Time: 13:59 # 7

N = 54 <sup>+</sup>p < 0.05, <sup>∗</sup>p < 0.025, ∗∗p < 0.005 (one-tailed).

The effects of age and fitness on the subcortical ROIs are reported in **Table 2**. The volumes of the hippocampus, amygdala, putamen, and thalamus were associated with both age and fitness. The nucleus accumbens, brainstem and ventral diencephalon were associated with age but not eCRF. Lastly, age was positively associated with all ventricular measurements, with no corresponding association with fitness.

#### Anatomical Factor Analysis

A lack of overlap between age and CRF effects would be best demonstrated by a double-dissociation, in which some areas show effects of one independent variable and not the other, and some show the opposite effect. However, such a demonstration may be obscured by the large number of comparisons, due to the many areas that are being considered. To better identify which regions tended to covary in volume, and to reduce the number of comparisons in the age and fitness analyses, the volumes of the cortical and subcortical ROIs from the Desikan–Killiany atlas (Jack et al., 1989; Desikan et al., 2006) were submitted to a PCA. Scree plots suggested five components, or factors, which were subjected to Varimax rotation. A visualization of the regions loading on each factor (Factor Loading Score >0.5) is presented in **Figure 2**. Factor loadings of 0.3 or higher after rotation for each of the five components are reported in **Table 3**. The component scores were then submitted to a multiple regression analysis, whose results are presented in **Table 4**, using age, eCRF and the age × CRF interaction term as predictors.
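The factor-analytic step can be sketched with NumPy. The sketch standardizes the regional volumes, takes the leading eigenvectors of their correlation matrix, scales them into loadings, applies a Varimax rotation, and rotates the component scores that would enter the regression. This is a generic textbook implementation (the iterative SVD form of Varimax), not the software the authors used, and the function names are hypothetical. The number of components is passed in because the authors chose it from scree plots, which can be drawn from the returned eigenvalues.

```python
import numpy as np


def pca(data, n_components):
    """PCA on the correlation matrix of `data` (participants x regions).

    Returns loadings (regions x k), unit-variance component scores
    (participants x k), and all eigenvalues in descending order (for a scree plot).
    """
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)        # returned in ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    vals, vecs = eigvals[:n_components], eigvecs[:, :n_components]
    loadings = vecs * np.sqrt(vals)
    scores = z @ vecs / np.sqrt(vals)
    return loadings, scores, eigvals


def varimax(loadings, max_iter=500, tol=1e-8):
    """Orthogonal Varimax rotation; returns (rotated loadings, rotation matrix)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):  # no meaningful improvement
            break
        criterion = new_criterion
    return loadings @ rotation, rotation
```

Because the rotation is orthogonal, the rotated scores are simply `scores @ rotation`. These would be regressed on age, eCRF, and their interaction, and loadings of 0.3 or 0.5 and above could be tabulated or visualized as in the text.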

The first factor consists of areas that cover most of the medial and lateral cortex. It also contains subcortical regions including the thalamus, putamen and amygdala. Regression analyses revealed that variance in Factor 1 scores was associated with both age and eCRF. The second factor (not shown in **Figure 2**) included the CC and ventricles, which were strongly associated with age, but not with eCRF. The third factor was made up almost exclusively of the basal ganglia, with an additional contribution from the superior parietal cortex. This factor was positively associated with eCRF, but not with age. The fourth factor comprised inferior frontal and occipital regions. This factor was associated with age, but not with fitness. The fifth factor comprised regions around the dorsolateral prefrontal cortex, the anterior cingulate, and the inferior temporal lobe. Surprisingly, these regions were not associated with either age or fitness. Thus, we found one factor (Factor 1) that was affected by both aging and fitness, two factors (Factors 2 and 4) that were affected only by aging, one factor (Factor 3) that was affected only by fitness, and one factor (Factor 5) that was not affected by either variable.
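The factor-score regressions summarized above (age, eCRF, and their centered product as predictors, as in the Table 4 title) can be sketched with ordinary least squares. This is an illustrative NumPy version under stated assumptions, not the authors' statistical package: `age_fitness_regression` is a hypothetical helper, it omits any further covariates (gender is partialed out in some of the paper's analyses), and it converts each t statistic into the partial correlation implied by it, r = t / sqrt(t² + df).

```python
import numpy as np


def age_fitness_regression(score, age, ecrf):
    """OLS of one component score on centered age, centered eCRF and their product.

    Returns, for each predictor, (coefficient, standard error, t, partial r).
    """
    score, age, ecrf = (np.asarray(v, dtype=float) for v in (score, age, ecrf))
    age_c = age - age.mean()
    ecrf_c = ecrf - ecrf.mean()
    X = np.column_stack([np.ones_like(age_c), age_c, ecrf_c, age_c * ecrf_c])
    beta, *_ = np.linalg.lstsq(X, score, rcond=None)
    resid = score - X @ beta
    dof = len(score) - X.shape[1]  # residual degrees of freedom
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    t = beta / se
    partial_r = t / np.sqrt(t ** 2 + dof)  # partial correlation implied by each t
    names = ["intercept", "age", "eCRF", "age_x_eCRF"]
    return {n: (b, s, tt, pr) for n, b, s, tt, pr in zip(names, beta, se, t, partial_r)}
```

Centering both predictors before forming the product keeps the main-effect coefficients interpretable at the sample means and reduces collinearity between the main effects and the interaction term.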

#### Relationship between Age and Fitness Effects

The main purpose of this paper is to evaluate the overlap between the profiles of age and fitness effects across brain regions. The effect sizes (Cohen's d) of age (younger vs. older participants) and fitness (high- vs. low-fit participants) across anatomical areas are presented in **Figure 3** (A, cortical regions; B, subcortical regions). Positive values in this figure indicate age-related reductions in volume (black bars) or fitness-related increases in volume (gray bars), and vice versa for negative values. Significant effects [t(53) > 1.67, p < 0.05, one-tailed] are denoted by bars crossing the horizontal dashed line. Nine (out of 48) regions showed significantly greater volumes in high- than in low-fit participants. Twenty-one (out of 48) regions showed significantly greater volumes in younger than in older participants.
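The group effect sizes plotted in Figure 3 are standardized mean differences. A minimal sketch, assuming the common pooled-SD definition of Cohen's d (the paper does not state which variant was used), with the sign convention described above: age effect = `cohens_d(younger, older)` and fitness effect = `cohens_d(high_fit, low_fit)`, so positive values mean smaller volumes with age or larger volumes with fitness. The helper names are hypothetical.

```python
import numpy as np


def cohens_d(group_a, group_b):
    """Standardized mean difference (mean_a - mean_b) divided by the pooled SD."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


def d_to_t(d, na, nb):
    """Independent-samples t statistic implied by a pooled-SD Cohen's d."""
    return d * np.sqrt(na * nb / (na + nb))
```

For example, `cohens_d([2, 4, 6], [1, 3, 5])` is 0.5 (means differ by 1, pooled SD is 2), and `d_to_t` turns such a value back into the t statistic that is compared against a one-tailed critical value.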

FIGURE 2 | Visualization of the ROIs loading highly (>0.5) on each factor of the PCA (from data presented in Table 3). Factor 2 (not represented here) is related to ventricle and corpus callosum size (for which loadings were positive and negative, respectively). For factor 3, significant positive factor loadings are colored in red, and significant negative loadings are colored in purple. Superior and inferior views are presented with the anterior portion of the brain pointing upward. Although left and right volumes were not examined separately, both hemispheres are included in the figure for illustration purposes. Anatomical ROIs that had high factor loadings but could not be displayed are listed at the bottom of each factor column.

#### TABLE 3 | Factor loadings for anatomical ROIs entered in the factor analysis.

Factors are ranked in descending order. Highlighted cells contain values with an absolute value greater than 0.50. Bolded values are loading scores greater than 0.60. Values less than 0.30 are excluded for ease of presentation. F, Frontal; T, Temporal; C, Cingulate; V, Ventricle; P, Parietal; CC, Corpus Callosum; Thal, Thalamus; Lat, Lateral; Sup, Superior; Mid, Middle; Med, Medial.

#### TABLE 4 | Raw and partial correlations between anatomical factor scores and age, eCRF, and the eCRF by age interaction (centered age multiplied by centered fitness scores).

N = 54. <sup>+</sup>p < 0.05, <sup>∗</sup>p < 0.025, <sup>∗∗</sup>p < 0.005 (one-tailed).

The data presented in **Figure 3** suggest that age and fitness effects were especially different in regions that were most impacted by age (left side of **Figures 3A,B**). This was especially true for the CC and ventral frontal regions. To characterize the relationship (or lack thereof) between the two types of effects, a scatter plot displaying group-level age effects against group-level fitness effects across brain regions is presented in **Figure 4**. Overall, the correlation between these average t-scores was not significant, r(46) = 0.078, p = 0.597.
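The profile correlation reported above can be reproduced from two per-region effect vectors with a few lines of NumPy. This is a sketch, assuming one age effect and one fitness effect per region (48 regions, hence df = 46); `profile_correlation` is a hypothetical helper, and the p value, which requires the t distribution (e.g., from SciPy), is left out to keep the block dependency-free.

```python
import numpy as np


def profile_correlation(age_effects, fitness_effects):
    """Pearson r between two per-region effect profiles and its t statistic (df = n - 2)."""
    x = np.asarray(age_effects, dtype=float)
    y = np.asarray(fitness_effects, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    df = len(x) - 2
    t = r * np.sqrt(df / (1 - r ** 2))  # t statistic for H0: rho = 0
    return r, t, df
```

As a check on the reported values, r = 0.078 with df = 46 gives t ≈ 0.53, well below the two-tailed 5% critical value of roughly 2.0, which is consistent with the non-significant result.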

(B) Cohen's d values for normalized subcortical gray and white matter regions.

To quantify the relationship between the two profiles, we computed a consistency-based, two-way mixed intraclass correlation coefficient. The results of this analysis indicate that there is little consistency between the profile of age effects and the profile of fitness effects on brain volumes, r<sub>ic</sub> = 0.078, p = 0.298. Although this analysis failed to reach significance when all regions were examined, a median split on the effect sizes of age effects revealed interesting differences. For the regions most significantly impacted by age (blue circles in **Figure 4**), a marginal negative association was found between age and fitness effects [N = 24, r(22) = 0.380, p = 0.067]. For regions less impacted by age (orange circles in **Figure 4**), no association was found between age and fitness effects [N = 24, r(22) = 0.117, p = 0.586].
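The consistency ICC and the median-split correlations described above can be sketched as follows. Assumptions, labeled because the paper does not specify them: the single-measure form of the two-way mixed, consistency ICC (often written ICC(3,1) or ICC(C,1)) computed from two-way ANOVA mean squares, with regions as rows and the two effect profiles as columns; and a split that ranks regions by age effect and takes the lower and upper halves (24 each for 48 regions). Both function names are hypothetical.

```python
import numpy as np


def icc_consistency(data):
    """Two-way mixed, consistency, single-measure ICC for an n x k array.

    Rows are targets (brain regions); columns are the k measures being compared
    (here, age effects and fitness effects).
    """
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    grand = data.mean()
    ss_rows = k * np.sum((data.mean(axis=1) - grand) ** 2)
    ss_cols = n * np.sum((data.mean(axis=0) - grand) ** 2)
    ss_error = np.sum((data - grand) ** 2) - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # Column (measure) offsets are removed, so only relative agreement counts
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def median_split_correlations(age_effects, fitness_effects):
    """Correlate the two profiles within the upper and lower halves of the age-effect ranking."""
    x = np.asarray(age_effects, dtype=float)
    y = np.asarray(fitness_effects, dtype=float)
    order = np.argsort(x)
    half = len(x) // 2
    lower, upper = order[:half], order[half:]
    r_upper = np.corrcoef(x[upper], y[upper])[0, 1]
    r_lower = np.corrcoef(x[lower], y[lower])[0, 1]
    return r_upper, r_lower
```

Because it is consistency-based, the ICC ignores a constant offset between the two profiles (two columns that differ only by a fixed amount give an ICC of 1) and is negative when the profiles run in opposite directions.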

#### DISCUSSION

The main goal of this study was to determine the overlap between the profiles of the effects of fitness and age on the volumes of different brain areas in adults aged 55 and older. This may have theoretical and practical importance: If age and fitness were found to be related to similar (albeit opposite) volumetric profiles, it could be argued that fitness (or lack thereof) may be an important global mediator of the effects of aging, and a fitness-based intervention could be used to halt, delay or reverse the effects of aging on brain volumes. Although a large number of studies have investigated the effects of aging and those of fitness on the volumes of various brain regions (see Voss et al., 2013 for a detailed review), a determination of the degree of overlap between these two sets of effects is still lacking.

As found previously in several other studies, our results indicate that both age and fitness have effects on brain volumes. As in previous work (Raz, 2005; Raz and Rodrigue, 2006; Raz et al., 2007; Fjell et al., 2013), the effects of age were particularly evident for the ventricles, the CC, some subcortical gray structures such as the hippocampus and portions of the basal ganglia, as well as a number of cortical regions. Also confirming previous work (Gordon et al., 2008; Erickson et al., 2009, 2011; Chaddock et al., 2010; Bugg and Head, 2011; Verstynen et al., 2012), eCRF was associated with larger volumes in a small set of brain structures, including some cortical (precentral gyrus and superior temporal sulcus) and subcortical (amygdala and some portions of the basal ganglia) structures.

Importantly, however, three findings from our study suggest that the profiles of the effects of aging and fitness are not entirely overlapping. First, an intraclass correlation analysis failed to show a significant association between the two types of effects. Second, the multiple regression analysis of the anatomical PCA factor scores showed that age and fitness were by and large correlated with different components. Specifically, although factor 1 was associated with both variables, factors 2 and 4 were associated with age and not with fitness, and factor 3 (composed mainly of the basal ganglia) was associated with fitness but not age. The results of the PCA, together with the multiple regression analysis, suggest that age has broad effects on much of the cortex. These are, however, at times separable from the effects of fitness, which are more localized to regions such as the basal ganglia and MTL. Third, very few interactions (no more than would be expected by chance) were observed between age and fitness, suggesting that fitness did not moderate the age effects. In other words, under an additive-factors logic, age and fitness appear to contribute independent, additive effects to brain anatomy.

It is also interesting to note that the regions most impacted by age (blue circles on the right of **Figure 4**) are among those not significantly impacted by fitness. As mentioned previously, when only the 24 regions showing the largest age effects (top half) are considered, a marginally significant negative association between age and fitness effects emerges. Although it is not entirely clear why this happens, it is possible that, when the effects of age become too prominent, the effects of fitness can no longer counteract them. An alternative explanation is that, in some cases, area volume may not be the best measure of tissue preservation, and that other indices of anatomical reserve (such as cortical thinning or measures of myelination) may be more sensitive. This may account for the lack of correlation between CC volume and fitness in our sample, whereas an association between fitness and white matter integrity (as measured by fractional anisotropy) has been reported in previous studies (e.g., Johnson et al., 2012; Chaddock-Heyman et al., 2014). It should be noted, however, that neither of these studies directly compared the effects of age and fitness on white matter integrity. In addition, the sensitivity explanation does not account for the fact that we still find a strong relationship between age and white matter preservation in our study.

There is of course ample evidence of both physiological (Burdette et al., 2010; Fabiani et al., 2014; Zimmerman et al., 2014) and cognitive (Colcombe and Kramer, 2003; Kashihara et al., 2009; Erickson et al., 2011; Weinstein et al., 2012) benefits of physical fitness in older adults. Further, the brain structures that we (as well as others) find to be associated with CRF actually support important functional phenomena (Gordon et al., 2008; Erickson et al., 2009, 2010; Chaddock et al., 2010; Bugg and Head, 2011; Verstynen et al., 2012; Weinstein et al., 2012).

A number of mechanisms might help explain why some regions seem to be more "fitness sensitive" than others. First, many of these "fitness-sensitive" regions play unique roles in the planning, coordination, and execution of movements required during aerobic exercise. Indeed, many of the fitness-sensitive regions found in this study are related to motor functions, including the precentral gyrus (primary motor cortex), basal ganglia (regulation of movement and motor control), superior temporal sulcus (perception of biological motion), and anterior cingulate (coordination of motor behavior) (DeLong et al., 1984; Grossman and Blake, 2001; Wenderoth et al., 2005; Colcombe et al., 2006; Gordon et al., 2008; Chaddock et al., 2010; Verstynen et al., 2012). Thus, it is possible that for older adults who exercise, increased use of these regions might preserve them from age-related atrophy, contributing to regional variations.

A second possible contributor to this regional specificity is that neurogenesis is restricted to just a few regions within the brain. Animal studies have shown that CRF induces neurogenesis in rodents, although this phenomenon seems to be restricted to the hippocampus (e.g., Altman, 1969; Brown et al., 2003; Farmer et al., 2004; van Praag, 2008). In humans, adult neurogenesis has also been shown within the hippocampus and regions of the basal ganglia (Eriksson et al., 1998; Spalding et al., 2013; Ernst et al., 2014). In the current study and in a number of prior investigations, the effects of fitness on the hippocampus and the basal ganglia are particularly pronounced (e.g., Erickson et al., 2009; Chaddock et al., 2010; Ahlskog, 2011; Bugg and Head, 2011; Erickson et al., 2011; McAuley et al., 2011; Nagamatsu et al., 2016). In fact, our study showed that regions of the basal ganglia (particularly the putamen) demonstrated the strongest association with eCRF, after partialing out the effects of age and gender.

Several other mechanisms could further explain the variations in fitness sensitivity of certain regions, including differences in cerebral blood flow and arterial stiffness across the cortex (e.g., Fabiani et al., 2014; Zimmerman et al., 2014), variations in sensitivity to various neurotrophic factors (e.g., Voss et al., 2013), and differential oxygen requirements across regions (Kreisman et al., 2000). Indeed, it is likely that these mechanisms work synergistically to prevent atrophy in some regions while having minimal influence on other areas. Our study, however, indicates that fitness cannot completely offset the declines associated with normal aging, and illustrates a possible approach for future studies to examine the interactions between fitness and age on volumes across different brain structures.

Within this context, it is important to note that our study, by and large, replicates most of the major findings that have been reported both in the literature on anatomical brain aging as well as in the effects of fitness on brain volumes. This indicates that the results of our study are not outliers with respect to the extant literature. What is novel here is the profile approach used, and the consequent types of analyses that were carried out to examine it. A critical requirement for these analyses is the inclusion of a large number of areas distributed across the whole brain. Another feature of the current study is the dichotomization of the most critical independent variables (age, fitness, and gender), which permits their orthogonalization. This allows for a comprehensive and independent examination of the profiles of the effects of these variables (and of their interactions) across areas. Finally, another important feature of the analytic strategy employed in this study is the use of a factor-analysis approach to identify the brain structures that covary as a function of age and fitness. Although this is a data-driven (rather than theoretically based) approach, and therefore it is inherently exploratory, it does help reduce multiple comparisons. It could also help guide future research aimed at understanding the factors underlying the emergent grouping (as well as the stability of the grouping across different samples).

A possible limitation of the current work is the relatively small sample size (N = 54), which could diminish the power of the analysis. Since some of the findings depend on tests of the null hypothesis (such as the lack of a significant intra-class correlation), this lack of power may potentially be an important problem. However, it should be noted that the statistical power of a study depends not only on sample size but also on the extent to which experimental error can be controlled or minimized. In this study we used a very rigorous process of data quality control, as well as methods for orthogonalizing the variance due to age and CRF. The effectiveness of these procedures is demonstrated by our replication of the main significant effects for both age and fitness normally reported in the literature.

# CONCLUSION

The current study presents an analytic approach to investigating the degree of overlap between the effects of two critical variables, age and fitness, on brain anatomy. A crucial aspect of this work is the orthogonalization of the effects of these two variables. This allows the profiles of the effects associated with age and fitness to be studied separately, and demonstrates that they are only partially overlapping. While some areas (such as the precentral gyrus, the banks of the superior temporal sulcus, and some subdivisions of the MTL) are affected by both age and fitness, a number of areas (including extensive regions of the frontal, parietal, and temporal cortex, as well as the CC) are affected only by aging, and some structures (mostly in the basal ganglia) are uniquely affected by fitness.

These findings support the idea that aging and fitness (or lack thereof) have differential effects on the brain. Understanding that fitness cannot reverse all of the effects of aging is important for several reasons. First, it may lend support to the hypothesis that a comprehensive preventive approach to brain aging, focusing on other lifestyle factors (such as diet or cognitive training) in addition to fitness, may be more effective than fitness interventions alone, as these other factors may help protect those brain regions not responsive to fitness. Second, the limited overlap of these effects in many regions may help explain the differential effects of fitness on the various domains of cognition found in previous studies. Finally, these differences may also help guide neurobiological research examining, in an area-specific way, the mechanisms by which fitness could impact the brain.

# AUTHOR CONTRIBUTIONS

MF and GG designed the study, directed all aspects of the project, and helped write and edit the manuscript. MAF, with the help of RB, conducted all anatomical analyses, and MAF wrote the first version of this manuscript. KAL, BZ, HT, and NS-G carried out the data collection. AG and BPS provided consulting on FreeSurfer. All authors commented on the paper.

## ACKNOWLEDGMENT

This work was supported by NIA grant 1RC1AG035927 to FM.

# REFERENCES

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Fletcher, Low, Boyd, Zimmerman, Gordon, Tan, Schneider-Garces, Sutton, Gratton and Fabiani. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Cardiorespiratory Fitness Is Associated with Selective Attention in Healthy Male High-School Students

#### Eivind Wengaard<sup>1</sup> , Morten Kristoffersen<sup>1</sup> , Anette Harris <sup>2</sup> and Hilde Gundersen<sup>1</sup> \*

<sup>1</sup>Department of Sport and Physical Education, Western Norway University of Applied Sciences, Bergen, Norway, <sup>2</sup>Department of Psychosocial Science, Faculty of Psychology, Institute of Psychosocial Science, University of Bergen, Bergen, Norway

Background: Previous studies have shown associations between physical fitness and cognition in children and in younger and older adults. However, knowledge about these associations in high-school adolescents and young adults is sparse. Thus, the aim of this study was to evaluate the association between physical fitness, measured as maximal oxygen uptake (V̇O2max), muscle mass and weekly training, and cognitive function in the executive domains of selective attention and inhibitory control in healthy male high-school students.

Methods: Fifty-four males (17.9 ± 0.9 years, 72 ± 11 kg and 182 ± 7 cm) completed a V̇O2max test, a body composition test and a visual cognitive task based on the Posner cue paradigm, with three types of stimuli differing in attentional demand (i.e., stimuli presented following no cue, a valid cue or an invalid cue). The task consisted of 336 target stimuli, of which 56 (17%) appeared without a cue (no cue), 224 (67%) appeared in the same rectangle as the cue (valid cue) and 56 (17%) appeared in the rectangle opposite the cue (invalid cue). Mean reaction time (RT) and the corresponding errors were calculated for each stimulus type. Total task duration was 9 min and 20 s. In addition, relevant background information was obtained in a questionnaire.

#### Edited by:

Claudia Voelcker-Rehage, Technische Universität Chemnitz, Germany

#### Reviewed by:

Keita Kamijo, Waseda University, Japan Daniel Sanabria, University of Granada, Spain

> \*Correspondence: Hilde Gundersen
> hsg@hvl.no

Received: 08 August 2016 Accepted: 08 June 2017 Published: 28 June 2017

#### Citation:

Wengaard E, Kristoffersen M, Harris A and Gundersen H (2017) Cardiorespiratory Fitness Is Associated with Selective Attention in Healthy Male High-School Students. Front. Hum. Neurosci. 11:330. doi: 10.3389/fnhum.2017.00330

Results: Linear mixed model analyses showed that higher V̇O2max was associated with faster RT for stimuli following an invalid cue (Estimate = −2.69, SE = 1.03, p = 0.011) and for stimuli following a valid cue (Estimate = −2.08, SE = 1.03, p = 0.048). There was no association of muscle mass and stimuli (F = 1.01, p = 0.397) or of weekly training and stimuli (F = 0.99, p = 0.405).

Conclusion: The results suggest that cardiorespiratory fitness is associated with cognitive performance in the executive domain of selective attention in healthy male high-school students.

Keywords: aerobic fitness, VO2max, cognitive function, cue-target paradigm, attention

# INTRODUCTION

In recent years, researchers have become increasingly interested in investigating possible impacts of physical activity and physical fitness on cognitive functions, and several studies have suggested that higher physical fitness levels are associated with cognitive benefits (Guiney and Machado, 2013; Dupuy et al., 2015; Cox et al., 2016).

The vast majority of research has been conducted with older adults (above 50 years) and shows positive associations between physical activity and cognitive function (Colcombe and Kramer, 2003), as well as a reduced risk of age-related cognitive decline (Paillard et al., 2015). Recently, a growing number of studies have shown positive associations between physical fitness and cognitive function in developing children and young adolescents (Hillman et al., 2009; Chaddock et al., 2011; Fedewa and Ahn, 2011; Esteban-Cornejo et al., 2015). Positive associations have also been reported in young adults (Themanson and Hillman, 2006; Kamijo and Takeda, 2009), although findings are inconsistent (Scisco et al., 2008; Hayes et al., 2014). Knowledge about these associations in high-school adolescents and young adults is still sparse (Cox et al., 2016).

Given that executive functions develop throughout childhood and decline with age (Friedman et al., 2009), the effects of physical fitness may be elusive in high-school students and young adults, in whom cognition is developmentally peaking (Hayes et al., 2014). Specific and sensitive measurement methods are therefore required. In general, a large number of different measurement methods have been employed in research in this field, as well as different definitions of cognitive function (Biddle and Asare, 2011). Dupuy et al. (2015) have stressed that a major weakness in many studies is that fitness is often measured by subjective self-reports or through submaximal tests. An objective measure of cardiorespiratory fitness, V̇O2max, was therefore chosen in the present study, in addition to self-reported weekly physical activity and objectively measured muscle mass.

In general, previous research indicates that the association between physical fitness and cognitive function is stronger for tasks requiring a higher level of executive functioning (Colcombe and Kramer, 2003; Hillman et al., 2009; Pontifex et al., 2011; Guiney and Machado, 2013), particularly tasks related to different aspects of attentional ability such as inhibition and task switching (Themanson et al., 2008; Guiney and Machado, 2013; Buckley et al., 2014). A cognitive task with different attentional loads was therefore chosen in the present study to investigate the association between physical fitness and performance under different cognitive loads.

In terms of age, the high-school students in our study were close to what could be categorized as young adulthood; the study thus adds to previous work on participants in late adolescence or young adulthood, and in particular to work exploring associations between physical fitness and selective attention in this age group.

Thus, the aim of this study was to investigate whether physical fitness was associated with cognitive function in healthy male high-school students. To do this, participants completed a V̇O2max test, answered questions regarding weekly physical activity, underwent a body composition test and performed a PC-based visuospatial attention task to determine performance in the executive domains of selective attention and inhibitory control. The Posner cue-target paradigm (Posner, 1980) was chosen to study the association between cardiorespiratory fitness and attentional capacity for stimuli with different attentional loads. Responding to stimuli presented after an invalid cue is particularly demanding, compared with stimuli following valid cue and no cue presentations, as it involves an interaction between goal-directed actions and inhibition of the reflexive, bottom-up orienting of attention initiated by the distracting cue. To obtain a homogeneous population, only males were included. The main hypothesis was that higher physical fitness is associated with faster reaction time (RT) for all stimulus categories, owing to better selective attention and inhibitory control. Moreover, we expected to find the fastest RT for stimuli following a valid cue, followed by stimuli following an invalid cue and no cue.

# MATERIALS AND METHODS

#### Participants

Of the 85 young male students attending two different high schools in Norway, 66 (77.6%) were recruited, and 54 (63.5%) participants (17.9 ± 0.9 years, 72 ± 11 kg and 182 ± 7 cm) completed both the physical test and the cognitive task.

#### Procedures

Information regarding the study was sent via e-mail to the principals of two different high schools in Norway, who replied quickly, expressing their willingness to participate. An information meeting was then arranged at which students interested in participating signed informed consent forms. When recruiting participants, it was made clear that both trained/active and untrained/sedentary participants were needed.

Data were collected over 6 weeks between September and November 2015. Each participant was tested on two separate days, one for the physical tests (Day A) and one for the cognitive task (Day B; **Figure 1**). The cognitive task was performed between 10:15 and 15:22. On average, there were 12 ± 10 days between test days. All physical tests were conducted in the physiological test lab at Bergen University College, and the cognitive task was mainly completed at the students' respective schools. Five to ten minutes before starting the cognitive task, participants completed a web-based electronic questionnaire.

All participants were told to dress appropriately for the physical tests, wearing shorts or pants, a t-shirt and running shoes. Participants were asked whether they had felt ill during the week before the test, especially with symptoms of fever and/or impaired general condition. Participants were excluded if they had been diagnosed with heart disease, asthma or any other disease that could affect their cardiorespiratory system during a strenuous exercise test. Before testing, all participants were informed orally of the test procedures and were given the opportunity to ask questions. In addition, they were asked not to engage in any strenuous physical exercise the day before the physical test, as this could affect the results.

#### Body Composition

A direct segmental multi-frequency bioelectrical impedance analysis (DSM-BIA) for determining body composition (body weight (kg), muscle mass (%)) was performed using the In-Body720 (Biospace Co. Ltd, Seoul, Korea) body composition analyzer (Lim et al., 2009; Tompuri et al., 2015). Standard procedures were followed for all participants.

#### Cardiorespiratory Fitness Test

The test started with a 10-min warm-up on a motorized treadmill (Woodway PPS 55, USA) at a gradient of 1.7%, during which the velocity was gradually increased until the participant maintained a heart rate (HR) of 73%–82% of age-predicted maximum HR (HRmax). Following a short rest period (<3 min), an incremental test protocol was followed with a treadmill gradient of 5.3% and a starting velocity of 8 km·h<sup>−1</sup>. The test leader controlled the protocol manually, increasing the speed by 1 km·h<sup>−1</sup> each minute until volitional exhaustion. Throughout the test, participants were told how much time remained at each speed and until the next sample registration. When a participant was nearing possible exhaustion (respiratory exchange ratio >1), the test leader asked whether a speed increase would be tolerable; the participant could then, using hand signals, agree to a velocity increase of 1 km·h<sup>−1</sup> or 0.5 km·h<sup>−1</sup>, or choose to maintain the same speed.
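The warm-up target and the nominal speed schedule described above reduce to simple arithmetic. This sketch assumes that the warm-up used the same age-predicted HRmax formula given later for the V̇O2max criteria (220 minus age); that link, the helper names, and the default number of minutes are illustrative assumptions. The manual adjustments near exhaustion (0.5 km·h⁻¹ steps or holding speed) are deliberately not modeled.

```python
def age_predicted_hrmax(age_years):
    """Age-predicted maximal heart rate (220 minus age), in beats per minute."""
    return 220 - age_years


def warmup_hr_zone(age_years, low=0.73, high=0.82):
    """Target warm-up HR band as fractions of age-predicted HRmax."""
    hr_max = age_predicted_hrmax(age_years)
    return low * hr_max, high * hr_max


def nominal_speeds(start_kmh=8.0, step_kmh=1.0, minutes=6):
    """Treadmill speed (km/h) for each minute of the incremental stage, before manual adjustments."""
    return [start_kmh + step_kmh * minute for minute in range(minutes)]
```

For an 18-year-old, age-predicted HRmax is 202 beats/min, so the warm-up band is roughly 147 to 166 beats/min; `nominal_speeds(minutes=4)` gives 8, 9, 10 and 11 km/h.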

Oxygen uptake was measured using a computer connected to a metabolic system with a 4.2-L mixing chamber containing baffles (Oxycon Pro, Erich Jaeger GmbH, Hoechberg, Germany). A full system calibration was performed before the testing of each group of participants, and a volume calibration using a 3-L calibration syringe with an accuracy of ±0.5% (Hans Rudolph Inc., Shawnee, KS, USA) was regularly performed between and during the testing of each group. Gas calibration was performed using a container holding 300 L of a gas mixture (Riessner-Gase GmbH, Carefusion, Germany) composed of carbon dioxide (5.84 vol%), oxygen (15.00 vol%) and nitrogen (79.16 vol%). Oxygen uptake was recorded continuously and saved at 30-s intervals. HR was recorded every 5 s using the RS400 (Polar, Kempele, Finland).

The highest mean of two consecutive V̇O2 values (30-s intervals) recorded at any point during the test was taken as each participant's individual V̇O2max. V̇O2max was accepted when two of three criteria were satisfied (Dupuy et al., 2015): (1) attaining a V̇O2 plateau; (2) attaining an HR >90% of the age-predicted maximum (i.e., 220 minus age); and (3) attaining a respiratory exchange ratio >1.1.
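The V̇O2max definition and acceptance rule above can be expressed as two small functions. This is a sketch under stated assumptions: the 30-s samples are given in test order; "two highest consecutive values" is read as the highest mean over any pair of adjacent samples; and the plateau criterion, whose operational definition is not given in the text, is passed in as a ready-made boolean. Function names are hypothetical.

```python
def vo2max_from_samples(vo2_30s):
    """Highest mean of two consecutive 30-s VO2 samples (same units as the input)."""
    if len(vo2_30s) < 2:
        raise ValueError("need at least two 30-s samples")
    pair_means = [(a + b) / 2 for a, b in zip(vo2_30s, vo2_30s[1:])]
    return max(pair_means)


def vo2max_criteria_met(plateau, peak_hr, age_years, peak_rer):
    """True when at least two of the three acceptance criteria hold.

    plateau: boolean judged elsewhere (its definition is an assumption here).
    """
    hr_ok = peak_hr > 0.90 * (220 - age_years)  # >90% of age-predicted HRmax
    rer_ok = peak_rer > 1.1  # respiratory exchange ratio criterion
    return sum([bool(plateau), hr_ok, rer_ok]) >= 2
```

For samples of 50, 55, 57, 56 and 52 (any consistent unit), the adjacent-pair means are 52.5, 56, 56.5 and 54, so the value recorded would be 56.5.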

#### Cognitive Performance Task

A visual cognitive task based on the Posner cue paradigm was used to assess reaction time (RT), accuracy and inhibition (Posner, 1980; Posner and Cohen, 1984; Gundersen et al., 2007; Irgens-Hansen et al., 2015; **Figure 2**). The task was performed on a 13.3-inch laptop and was programmed with E-Prime 2.0, standard version (Psychology Software Tools, Inc.).

The participants were instructed to fixate on the crosshair in the middle of the screen (**Figure 2A**) and to respond as fast as they could by pressing "l" on the keyboard when the target stimulus (an asterisk) appeared in the right rectangle and "d" when it appeared in the left rectangle. On some trials, the frame of one of the rectangles became broader (a cue) before the target stimulus appeared. Participants were told to ignore the cue and to press a key only when the target stimulus appeared.

All participants were instructed orally and performed a short practice run before the actual task to make sure they had understood the procedure. The task lasted 9 min and 20 s and was carried out in a quiet location without distractions (usually a small classroom), with participants wearing hearing protectors to exclude distracting noise.

Three stimulus categories appeared in the task: "no cue", "valid cue" and "invalid cue". When the target stimulus appeared in one of the rectangles without a preceding cue, this was called a no cue presentation (**Figure 2B**). When the target stimulus appeared within the rectangle with the broader frame, it was called a valid cue presentation (**Figure 2C**). When the target stimulus appeared in the rectangle opposite the cue location, it was called an invalid cue presentation (**Figure 2D**).

The task consisted of 336 target stimuli, each presented for 500 ms, with an inter-stimulus interval (time between stimuli) of between 600 ms and 1400 ms. The inter-stimulus interval was selected from a set of fixed values (600, 700, 900, 1000 and 1200 ms). The order of inter-stimulus intervals, stimuli and cues was fixed for all participants, with a mean inter-stimulus interval of 825 ± 253 ms for no cue stimuli, 1108 ± 112 ms for valid cue stimuli and 1164 ± 236 ms for invalid cue stimuli. On 50% of cued trials the cue appeared 200 ms before the target stimulus, and on the other 50% it appeared 400 ms before. Fifty-six (17%) of the target stimuli appeared without a cue (no cue), 224 (67%) appeared in the same rectangle as the cue (valid cue), and 56 (17%) appeared in the rectangle opposite to the cue (invalid cue).
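The trial counts above can be checked with a few lines of arithmetic. The sketch below only recomputes figures implied by the reported counts; the split of cued trials by cue-target lead assumes that "50% of the cases" refers to cued trials, which the text does not state explicitly. It also makes one point explicit: although valid cues make up 67% of all trials, they make up 80% of the trials in which a cue appeared.

```python
# Trial counts per stimulus category as reported in the text.
TRIAL_COUNTS = {"no_cue": 56, "valid_cue": 224, "invalid_cue": 56}

total_trials = sum(TRIAL_COUNTS.values())
share_pct = {name: 100 * n / total_trials for name, n in TRIAL_COUNTS.items()}

# Among cued trials only: proportion in which the cue predicted the target side.
cued_trials = TRIAL_COUNTS["valid_cue"] + TRIAL_COUNTS["invalid_cue"]
cue_validity = TRIAL_COUNTS["valid_cue"] / cued_trials

# Assumption: the 200 ms and 400 ms cue-target leads each cover half of the cued trials.
trials_per_cue_lead = cued_trials // 2

print(total_trials, {k: round(v) for k, v in share_pct.items()},
      cue_validity, trials_per_cue_lead)
```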

RT (i.e., processing speed) and response accuracy were recorded and stored on the laptop for each trial. Mean RTs for stimuli following no cue, valid cue and invalid cue were calculated for each participant. Error rates for the different stimulus categories are presented as percentages. Responses registered before the target stimulus appeared, and responses within the first 149 ms after target presentation, were defined as erroneous (Amano et al., 2006). An erroneous response that was corrected by a second key press before the next stimulus was still counted as erroneous.
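The scoring rules above can be expressed as a short function that looks only at the first key press of each trial. This is a sketch under stated assumptions, not the E-Prime scoring used in the study: pressing the wrong key is also treated as an error, mean RT is computed over non-erroneous trials only, and trials with no response at all are not modeled. The tuple layout and function names are hypothetical.

```python
def is_erroneous(first_rt_ms, first_key_correct):
    """Classify a trial from its FIRST key press only.

    first_rt_ms: time of the first key press relative to target onset
                 (negative if the key was pressed before the target appeared).
    first_key_correct: whether the first press matched the target side
                       (assumption: wrong-side presses count as errors).
    """
    # Presses before the target or within the first 149 ms are errors;
    # a later corrective press does not change this classification.
    return first_rt_ms < 150 or not first_key_correct


def summarize_by_condition(trials):
    """trials: iterable of (condition, first_rt_ms, first_key_correct)."""
    grouped = {}
    for condition, rt, key_ok in trials:
        grouped.setdefault(condition, []).append((rt, key_ok))

    summary = {}
    for condition, rows in grouped.items():
        errors = [is_erroneous(rt, ok) for rt, ok in rows]
        # Assumption: mean RT is taken over non-erroneous trials only.
        correct_rts = [rt for (rt, _), err in zip(rows, errors) if not err]
        summary[condition] = {
            "mean_rt_ms": sum(correct_rts) / len(correct_rts) if correct_rts else None,
            "error_rate_pct": 100 * sum(errors) / len(rows),
        }
    return summary
```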

#### Questionnaire

The questionnaire included questions about weekly physical activity (1 = less than once a week, 2 = 1–2 times a week, 3 = 3–4 times a week, 4 = 5–6 times a week, 5 = 7–8 times a week, 6 = 9–10 times a week) and about variables previously associated with cognitive performance in visuospatial tests, such as ADHD diagnosis (yes/no; McDonald et al., 1999), dyslexia (yes/no; Franceschini et al., 2012), daily nicotine use (yes/no; Heishman et al., 2010) and daily video-game playing (minutes; Green and Bavelier, 2003). To account for potential covariate effects of alertness on the association between cardiorespiratory fitness and cognitive performance, participants were also asked to rate their present alertness on a 5-point Likert scale ranging from 1 (not alert at all) to 5 (highly alert).

#### Ethics

The study was conducted in accordance with the Helsinki declaration. All participants were informed about the study and signed an informed consent form before participating. Participants could withdraw from the study at any point, and results were treated anonymously. Norwegian Social Science Data Services (NSD) approved the study (reference number: 44551).

#### Data Analysis

Descriptive data are shown as range and mean ± standard deviation (SD). The data were checked for outliers, and preliminary analyses ensured no violation of the assumption of normality.

To investigate whether physical fitness affected RT differently for stimuli following no cue, valid cue and invalid cue, a linear mixed model was applied, with stimulus category (no cue, valid cue and invalid cue) as the repeated measure and subject as a random intercept. The model included the main effect of stimulus category on RT and the interactions of stimulus category with V̇O<sub>2max</sub>, muscle mass and weekly training, and was adjusted for daily video-game playing, time-point of cognitive testing and alertness. Post hoc tests with Bonferroni correction were performed to evaluate differences in RT between stimulus categories. A first-order autoregressive (AR1) covariance structure was used, with maximum likelihood estimation (up to 10,000 iterations), and the model showed an acceptable fit (AIC = 1454).
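The Bonferroni step of the post hoc analysis covers the three pairwise comparisons between stimulus categories. The sketch below shows only the standard Bonferroni adjustment (each raw p-value multiplied by the number of comparisons and capped at 1); it does not reproduce the mixed model itself or SPSS's internal computation, and the raw p-values in the usage line are hypothetical.

```python
from itertools import combinations

CATEGORIES = ("no_cue", "valid_cue", "invalid_cue")

# All pairwise post hoc comparisons between stimulus categories (3 pairs).
PAIRS = list(combinations(CATEGORIES, 2))


def bonferroni_adjust(raw_p):
    """Multiply each raw p-value by the number of comparisons, capped at 1."""
    m = len(raw_p)
    return {pair: min(1.0, p * m) for pair, p in raw_p.items()}


# Usage with hypothetical raw p-values, one per pair.
adjusted = bonferroni_adjust({PAIRS[0]: 0.01, PAIRS[1]: 0.02, PAIRS[2]: 0.5})
```

With three comparisons, a raw p-value must fall below about 0.0167 to remain significant at the 0.05 level after adjustment.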

Because no participants reported an ADHD diagnosis, only three reported dyslexia and only seven reported daily nicotine use, these variables were not included in the regression analyses. For all statistical analyses, p-values <0.05 were considered statistically significant. SPSS® version 23.0 (IBM Corporation, Armonk, NY, USA) for Windows® was used for all statistical analyses.

# RESULTS

A significant main effect of stimulus category was found (F = 6.01, p = 0.003). With invalid cue as the reference, RT was faster for stimuli following a valid cue (Estimate = −146.89, SE = 43.42, p = 0.001), but not for stimuli following no cue (Estimate = −65.62, SE = 32.96, p = 0.051). However, post hoc analyses with Bonferroni correction showed significant differences between all stimulus categories. RT to stimuli following invalid cues was slower than to stimuli following valid cues (mean difference = 59.94, SE = 2.81, p < 0.001) and faster than to stimuli following no cue (mean difference = −23.16, SE = 2.13, p < 0.001). RT to stimuli following valid cues was faster than to stimuli following no cue (mean difference = −83.10, SE = 2.8, p < 0.001). For mean values, see **Table 1**.
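The three post hoc mean differences are internally consistent. Assuming RT is in milliseconds and writing each difference as the first category minus the second (so that valid minus invalid is −59.94, because invalid-cue RT was 59.94 slower), the valid minus no cue difference equals the sum of the other two:

```latex
\begin{aligned}
\overline{RT}_{\text{valid}} - \overline{RT}_{\text{no cue}}
  &= \left(\overline{RT}_{\text{valid}} - \overline{RT}_{\text{invalid}}\right)
   + \left(\overline{RT}_{\text{invalid}} - \overline{RT}_{\text{no cue}}\right) \\
  &= (-59.94) + (-23.16) = -83.10\ \text{ms}
\end{aligned}
```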

TABLE 1 | An overview of scores from the physical, cognitive and questionnaire assessments of the participants (n = 54).


Data is presented as mean ± standard deviation (SD) and range (min-max).

Moreover, there was a significant interaction between V̇O<sub>2max</sub> and stimulus category (F = 3.67, p = 0.017). Higher V̇O<sub>2max</sub> was associated with faster RT for stimuli following invalid cues (Estimate = −2.69, SE = 1.03, p = 0.011) and for stimuli following valid cues (Estimate = −2.08, SE = 1.03, p = 0.048), but not for stimuli following no cue (Estimate = −1.16, SE = 1.03, p = 0.266). There was no interaction between muscle mass and stimulus category (F = 1.01, p = 0.397) or between weekly training and stimulus category (F = 0.99, p = 0.405). Time-point of cognitive testing was the only covariate that contributed significantly to the model (F = 14.14, p < 0.001), with significantly slower RT in the morning (Estimate = −12.32, SE = 3.28, p < 0.001). See **Figure 3** for an overview of correlations between the RTs of the different stimulus categories and V̇O<sub>2max</sub>, and between these RTs and the time-point of cognitive testing.

## DISCUSSION

Our results showed a significant interaction between V̇O<sub>2max</sub> and RT across stimulus categories. The interaction was significant for stimuli following valid and invalid cues, with higher V̇O<sub>2max</sub> associated with faster RT. There was also a significant association between time-point of cognitive testing and RT, with slower RT in the morning.

The findings of the present study show an association between higher V̇O<sub>2max</sub> and faster RT for stimuli following both valid and invalid cues. Both conditions initiate a reflexive bottom-up capture of attention by the appearing peripheral cue, but they differ in their demands: the more frequent valid cue condition (67%) does not require inhibition, whereas the invalid cue condition (17%) requires more volitional top-down control to inhibit the most probable response. Higher V̇O<sub>2max</sub> thus seems to be beneficial in cue-initiated conditions through faster information processing, suggesting a relationship between cardiorespiratory fitness and both selective attention and inhibitory control in conditions where facilitation occurs.

Our findings agree with a previous study by Themanson et al. (2008) showing that fitter individuals exhibited better top-down control of attention and better modulation of responses in tasks measuring selective attention. In addition, our results support previous studies showing that physical fitness yields favorable effects on executive functions (Gomez-Pinilla and Hillman, 2013; Erickson et al., 2014; Dupuy et al., 2015; Guiney et al., 2015; Szuhany et al., 2015; Luque-Casado et al., 2016). The absence of an interaction between V̇O<sub>2max</sub> and RT for stimuli following no cue in the present study may be due to a lower attentional demand, or to the lack of priming in that stimulus category.

Our results showed the fastest RT for stimuli following a valid cue, followed by invalid cue and no cue. Cueing was thus consistently associated with faster RT, possibly reflecting higher internal activation and increased responsiveness to the upcoming target stimulus caused by the priming effect (Mangun and Hillyard, 1991; Gabay and Henik, 2008; Hayward and Ristic, 2013). Lack of priming may explain the slowest RT for stimuli following no cue, which could also be related to the shorter inter-stimulus interval in that category. Time-point of cognitive testing was associated with RT, with the slowest RT in the morning, consistent with the well-known circadian variation in alertness and performance (Van Dongen and Dinges, 2010).

V̇O<sub>2max</sub> was the only significant physical fitness predictor of cognitive performance in the present study. This finding is in accordance with previous studies identifying cardiorespiratory fitness as one of the components of physical fitness most strongly associated with cognitive performance (Ruiz-Ariza et al., 2017).

It is worth noting that some recent studies have failed to support the fitness-executive function hypothesis (Verburgh et al., 2014; Ballester et al., 2015). Moreover, Belsky et al. (2015) found that children with higher IQ scores grew up to be adults who were less sedentary and less obese and, in turn, had better cardiorespiratory fitness. They suggest that socioeconomic status, household factors and neuroselection must be considered when investigating the fitness-executive function hypothesis. Thus, more studies including other explanatory factors are needed.

Participants in the present study had above-average cardiorespiratory fitness (V̇O<sub>2max</sub> of 54.2 ± 4.9 mL·kg<sup>−1</sup>·min<sup>−1</sup>) compared with the age-related population norm (46–50 mL·kg<sup>−1</sup>·min<sup>−1</sup>; Shvartz and Reibold, 1990). Thus, we cannot exclude a stronger association between V̇O<sub>2max</sub> and cognitive function in samples whose mean score is closer to the population average, given normally distributed V̇O<sub>2max</sub> scores.

#### Limitations and Strengths

Owing to the general limitations of a cross-sectional design, cause and effect cannot be established in this study. In addition, the above-average fitness of the participants limits generalizability, as do the exclusively male sample and the relatively small sample size. Moreover, the spatial visual attention task used in the present study may not be appropriate to identify whether the cueing effects are due to endogenous spatial orienting driven by probability manipulation, attentional capture (peripheral cues), stimulus-response compatibility, or a mixture of all of them. The strengths of the study are the use of direct measures of V̇O<sub>2max</sub>, a PC-based cognitive performance task with different attentional loads, and a sample homogeneous in age and gender.

# CONCLUSION

The findings of this study show that cardiorespiratory fitness is associated with cognitive performance in healthy male high-school students. Our results suggest that cardiorespiratory fitness may be associated with better modulation of bottom-up and top-down processes in tasks involving selective attention. More research is needed to increase the knowledge about factors affecting cognitive function.

#### Implications for Future Research

This study contributes to the limited number of studies investigating the influence of physical fitness on the components of selective attention and inhibition in male high-school students. More studies are needed to further assess the association of physical fitness and attentional tasks demanding inhibitory control, and how this relates to the ability to maintain attention.

#### AUTHOR CONTRIBUTIONS

All authors participated in planning the experiment. EW, MK and HG participated in data collection. All authors have participated in conception of this manuscript, and have assisted in revising the manuscript for important content. EW and HG wrote the first draft of the manuscript, and all authors have provided final approval of the enclosed manuscript.

#### FUNDING

This research did not receive a grant from funding agencies in the public, commercial, or not-for-profit sectors.

#### ACKNOWLEDGMENTS

Thanks to all the participants who volunteered to take part in this study; their participation is greatly appreciated. Thanks also to the principals and teachers at the schools for their cooperation, and to Esben Martinsen, Cecilie Hannevig and Henrik Hysing-Dahl for their assistance in parts of the physiological testing.

# REFERENCES


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Wengaard, Kristoffersen, Harris and Gundersen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Steps to Health in Cognitive Aging: Effects of Physical Activity on Spatial Attention and Executive Control in the Elderly

Giancarlo Condello<sup>1</sup> , Roberta Forte<sup>2</sup> , Simone Falbo<sup>2</sup> , John B. Shea<sup>3</sup> , Angela Di Baldassarre<sup>4</sup> , Laura Capranica<sup>1</sup> and Caterina Pesce<sup>2</sup> \*

<sup>1</sup> Sport Performance Laboratory, Department of Movement, Human and Health Sciences, Italian University of Sport and Movement "Foro Italico", Rome, Italy, <sup>2</sup> Exercise and Cognition Laboratory, Department of Movement, Human and Health Sciences, Italian University of Sport and Movement "Foro Italico", Rome, Italy, <sup>3</sup> Ergonomics Laboratory, School of Public Health, Indiana University Bloomington, Bloomington, IN, USA, <sup>4</sup> Department of Medicine and Aging Sciences, School of Medicine and Health Sciences, "G. d'Annunzio" University of Chieti-Pescara, Chieti, Italy

The purpose of this study was to investigate whether physical activity (PA) habits may positively impact performance of the orienting and executive control networks in community-dwelling aging individuals and diabetics, who are at risk of cognitive dysfunction. To this aim, we tested cross-sectionally whether age, ranging from late middle age to old adulthood, and PA level independently or interactively predict different facets of attentional performance. One hundred and thirty female and male individuals and 22 adults with type 2 diabetes aged 55–84 years were recruited, and their daily PA (steps) was objectively measured by means of armband monitors. Participants performed a multifunctional attentional go/no-go reaction time (RT) task in which spatial attention was cued by means of informative direct cues of different sizes, followed by compound stimuli with local and global target features. The performance efficiency of the orienting networks was estimated by computing RT differences between validly and invalidly cued trials, and that of the executive control networks by computing local switch costs, i.e., RT differences between switch and non-switch trials in mixed blocks of global and local target trials. In regression analyses performed on the data of non-diabetic older adults, overall RTs and orienting effects were jointly predicted by age and steps. Age predicted overall RTs in low-active individuals, but orienting effects and response errors in high-active individuals. Switch costs were predicted by age only, with larger costs at older age. In the analysis conducted with the 22 diabetics and 22 matched non-diabetic older adults, diabetic status and daily steps predicted longer and shorter RTs, respectively. Results suggest that high PA levels exert beneficial but differentiated effects on processing speed and attentional network performance in aging individuals that partially counteract the detrimental effects of advancing age and diabetic status.
In conclusion, adequate levels of overall PA may positively influence brain efficiency and attentional control and should therefore be promoted through actions that support lifelong PA participation and make the built environment more conducive to PA.

#### Edited by:

Claudia Voelcker-Rehage, Chemnitz University of Technology, Germany

#### Reviewed by:

Florentino Huertas Olmedo, Universidad Católica de Valencia San Vicente Mártir, Spain

Terry McMorris, University of Chichester, UK

> \*Correspondence: Caterina Pesce caterina.pesce@uniroma4.it

Received: 01 November 2016 Accepted: 20 February 2017 Published: 06 March 2017

#### Citation:

Condello G, Forte R, Falbo S, Shea JB, Di Baldassarre A, Capranica L and Pesce C (2017) Steps to Health in Cognitive Aging: Effects of Physical Activity on Spatial Attention and Executive Control in the Elderly. Front. Hum. Neurosci. 11:107. doi: 10.3389/fnhum.2017.00107

Keywords: attentional networks, reaction time, active lifestyle, late middle-aged, old, diabetes

# INTRODUCTION

fnhum-11-00107 March 2, 2017 Time: 16:34 # 2

The rectangularization of the life expectancy curve and the increasing proportion of the 'graying' population (Spirduso et al., 2005) urge societies toward a more comprehensive understanding of how to ensure the health and quality of life of aging people. There is increasing awareness that physical activity (PA) is one of the major lifestyle-related health determinants, with benefits that extend beyond physical health, even in advanced age (Netz et al., 2005; World Health Organization, 2007; Ballesteros et al., 2015). Over the past decades, interest in the different facets of PA-elicited health outcomes has risen and fallen for reasons ranging from epidemiological trends to methodological advances. The worldwide overweight and insulin resistance epidemic has led to advocacy for PA in aging, with the aim of ensuring health-appropriate levels of PA and caloric expenditure (Ryan, 2010). On the other hand, methodological advances in cognitive and especially neuroscientific research have made it possible to accumulate evidence on the beneficial impact of PA on several aspects of brain health and cognitive efficiency in the aging population with or without chronic diseases (Bherer et al., 2013; Prakash et al., 2015; Young et al., 2015; Gajewski and Falkenstein, 2016).

Age-related chronic diseases are frequently associated with cognitive impairment. Population aging appears to be the demographic change contributing most to the prevalence of diabetes, one of the four main types of non-communicable diseases worldwide (World Health Organization, 2015), projected to double from 2000 to 2030 (Wild et al., 2004) and shown to be a risk factor for cognitive decline and dysfunction as early as middle age (Kodl and Seaquist, 2008; Nooyens et al., 2010; Luchsinger, 2012; Umegaki, 2014). Thus, in recent years, the focus of PA interventions has been extended from purely physical to brain health effects. Designed, structured PA interventions (Espeland et al., 2016), as well as physically active habits such as simply walking (Yaffe et al., 2001; Abbott et al., 2004), seem beneficial in counteracting cognitive aging in older adults with and without diabetes, even though health-related covariates may limit effect sizes (Devore et al., 2009).

Echoing the title of a European framework to promote PA for health ("Steps to health," World Health Organization, 2007), we extend the notion of the health-enhancing effects of PA to an aspect of cognitive health in aging people – the efficiency of attentional control – that has so far received little consideration in research on the influence of PA on cognition. In particular, the present study investigates the relation of PA habits of aging individuals with and without diabetes, objectively assessed as daily steps, to the efficiency of the attentional systems responsible for the orienting of attention and executive control, which seem to undergo different trajectories of age-related deterioration from middle age to older adulthood (Zhou et al., 2011).

Recent advances in neuroscience suggest that each of these attentional systems relies on two interacting but anatomically distinct networks (Dosenbach et al., 2008; Petersen and Posner, 2012; Vossel et al., 2014). The orienting of attention is handled by a dorsal and a ventral network that act cooperatively to enable individuals to flexibly control attention in relation to top-down goals and bottom-up sensory stimulation (Corbetta and Shulman, 2002; Vossel et al., 2014). The dorsal network, including parietal regions such as the intraparietal sulcus and a small set of frontal locations, particularly the frontal eye fields, allows strategic control over attention according to the information delivered by environmental cues. The ventral network, including the ventral frontal cortex and the temporoparietal junction, comes into play when the focus of attention has been erroneously engaged by misleading cues and must therefore be disengaged and shifted in a task-relevant direction (Petersen and Posner, 2012).

Executive control likewise relies on the interplay of two distinct networks: the fronto-parietal and the cingulo-opercular components. The former seems responsible for the adaptability, the latter for the stability of top-down task control (Dosenbach et al., 2008). In particular, the fronto-parietal network, including lateral frontal and parietal regions and especially the dorsolateral prefrontal cortex, is thought to initiate executive control and handle its ongoing adjustment for conflict resolution on a trial-by-trial basis. The cingulo-opercular network, including the anterior cingulate cortex, the anterior insula and frontal regions such as the frontal operculum, appears to ensure stable 'set maintenance' across trials by monitoring the preparatory allocation of attention, especially in the presence of competing attentional sets (Luks et al., 2002; Petersen and Posner, 2012).

After the seminal meta-analysis by Colcombe and Kramer (2003) showing that executive functions of older adults benefit more from PA than lower-level functions, researchers have confirmed such larger or selective effects and devoted considerable effort to further differentiating PA effects on specific aspects of executive vs. non-executive function (Gajewski and Falkenstein, 2016). Executive function is responsible for crucial aspects of cognition such as planning goal-oriented actions, monitoring cognitive operations and behavioral adaptability (Diamond, 2013). This special focus on PA effects on executive function is therefore well justified, but, with specific regard to the PA-attention relationship, it has led to disproportionate interest in the attentional networks responsible for executive control and to relative neglect of the other attentional networks with which the executive control network is closely intertwined. The study of PA effects on attentional orienting is mainly limited to the effects of acute bouts of exercise in young active adults (Pesce et al., 2007b; Huertas et al., 2011; Sanabria et al., 2011; Luque-Casado et al., 2013; Chang et al., 2015; Llorens et al., 2015). The only two attentional orienting studies performed with older adults have tested the moderation of acute exercise effects by chronic PA participation (Pesce et al., 2007a, 2011).

The lack of aging studies that examine the separate and joint effects of chronic PA on the orienting and executive control networks is surprising, because these networks contribute in an intertwined manner to the ability to allocate attention in response to expectations and environmental stimuli (Petersen and Posner, 2012), which is relevant to the functioning and safety of older adults in everyday life (Bédard et al., 2006). Several situations, such as household tasks or road traffic, require the ability to decide in advance where and to what to pay attention (for example, a pedestrian traffic light) and to couple this with go/no-go actions, but also to re-orient attention rapidly to changes in the environment, such as a car unexpectedly approaching, or to select from the set of possible locomotor actions the one most adequate to the key features of the situation, while avoiding distraction from other irrelevant sources.

Thus, the primary aim of the present study was to investigate whether active PA habits have a similar or differentiated impact, if any, on the performance of the orienting and executive control networks in aging. In the large body of cognitive research that has investigated the interplay between age and PA (Young et al., 2015; Gajewski and Falkenstein, 2016), most studies have used age and PA level as categorical variables. In the rare cases when direct or indirect measures of PA were used as continuous predictors (e.g., Bixby et al., 2007), age was treated as a covariate, thus neglecting the potential interaction between PA and age. As regards age, studies have mostly compared younger and older age classes, whereas the interesting transitional phase of late middle age, characterized by a unique interplay between covert neural and overt behavioral changes (Berchicci et al., 2012), remains relatively under-investigated. Since specific aspects of the attentional orienting (Pesce et al., 2007a) and executive control networks (Hillman et al., 2006; Themanson et al., 2006) seem to benefit from PA in old age, we hypothesized that aging and PA level may have interactive effects on the different facets of attentional performance across a wide range of ages from late middle age to old adulthood.

The second aim, related to the first, was to investigate whether aging individuals with diabetes, who are at risk of poor cognition (Luchsinger, 2012), may profit from the expected attentional benefits of being physically active. Although PA and exercise training are considered a major therapeutic modality for type 2 diabetes, persons with this condition usually exhibit lower levels of PA and correspondingly lower cardiovascular fitness (Albright et al., 2000). This may prevent diabetics from successful cognitive aging, which is linked to physically active habits (Young et al., 2015) through different mechanisms, above all the enhancement and maintenance of cardiovascular fitness (Stillman et al., 2016). Recent evidence suggests that the mechanisms through which PA affects cognitive function may differ by diabetes status in aging persons, since beneficial cognitive outcomes of PA were found in older adults with diabetes, but not in similarly aged individuals without diabetes (Espeland et al., 2016). Thus, we expected to find more pronounced benefits in diabetics than in non-diabetics. Espeland et al.'s (2016) study addressed global cognition and memory. Of particular concern, however, is evidence showing that executive function is among the broad range of cognitive functions impaired by type 2 diabetes (Qiu et al., 2006; Okereke et al., 2008). Given the critical role that executive control plays in functional abilities relevant to the everyday life of aging people (Rucker et al., 2012; Forte et al., 2013, 2015), we deemed it relevant to examine whether a physically active lifestyle counteracts the deterioration of the ability to exert executive control over attention in this special population.

# MATERIALS AND METHODS

This study was carried out in accordance with the recommendations of the "Umberto I" hospital of the First Rome University, with written informed consent from all subjects in accordance with the Declaration of Helsinki. The protocol was approved by the Ethics Committee of the "Umberto I" hospital of the First Rome University.

#### Participants

One hundred and thirty participants (68 females and 62 males) were recruited according to the following eligibility criteria: (i) age between 55 and 84 years; (ii) no self-reported diagnosis of psychiatric or somatic illness; (iii) normal or corrected-to-normal vision. Participants were sampled stratified by age class (Spirduso et al., 2005): late middle-aged (55–64 years, n = 48), young-old (65–74 years, n = 44), and old adults (75–84 years, n = 38). Within each age class, they were further stratified by declared PA level to ensure a balanced presence of master athletes (runners and swimmers), physically active individuals and sedentary individuals, engaged in regular structured PA/training for ≥3 (n = 42), ≥2 (n = 46), or <2 (n = 42) sessions/week, respectively.

To address the secondary aim of this study, a sample of 22 late middle-aged, young-old, and old adults with type 2 diabetes (55–64 years, n = 5; 65–74 years, n = 12; 75–84 years, n = 5) was also recruited. In accordance with the literature (Hu et al., 1999), a case of diabetes was considered confirmed if at least one of the following criteria was reported: (1) one or more classic symptoms (excessive thirst, polyuria, weight loss, hunger) and fasting plasma glucose levels of at least 140 mg/dL (7.8 mmol/L) or random plasma glucose levels of at least 200 mg/dL (11.1 mmol/L); (2) at least two elevated plasma glucose concentrations on different occasions [fasting levels of at least 140 mg/dL (7.8 mmol/L), random plasma glucose levels of at least 200 mg/dL (11.1 mmol/L), and/or concentrations of at least 200 mg/dL after 2 h or more shown by oral glucose tolerance testing] in the absence of symptoms; (3) treatment with hypoglycemic medication (insulin or an oral hypoglycemic agent).
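The three confirmation criteria above form a simple decision rule, sketched below. This is an illustration only, not a clinical tool or the study's screening procedure: the reading labels (`"fasting"`, `"random"`, `"ogtt_2h"`), the `occasion` field and all function names are hypothetical, and criterion (2) is read literally as applying only when symptoms are absent. The mg/dL to mmol/L conversion divides by 18.016, which follows from the molar mass of glucose (about 180.16 g/mol) and reproduces the 7.8 and 11.1 mmol/L values given in the text.

```python
from dataclasses import dataclass

# Thresholds in mg/dL as listed in the criteria.
THRESHOLDS_MG_DL = {"fasting": 140, "random": 200, "ogtt_2h": 200}


def mg_dl_to_mmol_l(glucose_mg_dl):
    """Convert glucose from mg/dL to mmol/L (molar mass ~180.16 g/mol)."""
    return glucose_mg_dl / 18.016


@dataclass
class GlucoseReading:
    kind: str      # "fasting", "random" or "ogtt_2h" (hypothetical labels)
    mg_dl: float
    occasion: int  # hypothetical identifier of the measurement occasion

    def is_elevated(self):
        return self.mg_dl >= THRESHOLDS_MG_DL[self.kind]


def diabetes_confirmed(classic_symptoms, readings, hypoglycemic_medication):
    elevated = [r for r in readings if r.is_elevated()]
    # (1) classic symptoms plus an elevated fasting or random plasma glucose
    crit1 = classic_symptoms and any(r.kind in ("fasting", "random") for r in elevated)
    # (2) no symptoms, but elevated values on at least two different occasions
    crit2 = (not classic_symptoms) and len({r.occasion for r in elevated}) >= 2
    # (3) treatment with insulin or an oral hypoglycemic agent
    crit3 = hypoglycemic_medication
    return bool(crit1 or crit2 or crit3)
```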

# Health, Physical Activity, and Anthropometric Assessment

Participants answered the AAHPERD (American Alliance for Health, Physical Education, Recreation and Dance) exercise/medical history questionnaire (Osness et al., 1996) ascertaining their activity level, educational background, dietary habits, tobacco smoking and alcohol consumption, medication use and history of PA.

Daily PA was measured under free-living conditions using the SenseWear Pro3 armband (BodyMedia, Pittsburgh, PA, USA). The SenseWear Pro armband has already been validated in older adults (Mackey et al., 2011). The armband is a monitor that integrates the information gathered by a two-axis accelerometer and sensors (i.e., skin and near-body temperature, heat flux, and galvanic skin response) with the sex, age, height, weight, smoking status, and handedness of the user. It uses proprietary algorithms to provide quantitative information (e.g., number of daily steps, locomotor activity intensity, and energy expenditure; Di Blasio et al., 2016) about an individual's habitual PA involving any form of locomotion, such as activities at the workplace, sports, conditioning, and household chores. The descriptive characteristics of the participants were entered into the software program (SenseWear Professional 8; BodyMedia) before monitoring was initialized. The participants wore the armband on the right arm over the triceps muscle at the midpoint between the acromion and olecranon processes. In line with reliability criteria reported in the literature, participants wore the armband for seven entire consecutive days, 24 h a day except during water-based activities (Scheers et al., 2012), with a wear time of at least 540 min/day on weekdays and 480 min/day on weekend days (Di Blasio et al., 2016). The mean number of steps over the 7 days, taken from the default software output, was used for the statistical analysis.

Standing height and body mass were measured to the nearest 0.1 cm and 0.1 kg using a portable stadiometer (Seca 220, GmbH & Co., Hamburg, Germany) and a balance scale (Seca 761, GmbH & Co., Hamburg, Germany), respectively. Body mass index (BMI, kg·m<sup>−2</sup>) was computed. Background information on the participants, such as main health, lifestyle and anthropometric characteristics, is reported in **Table 1** separately for non-diabetic and diabetic individuals and for the three age classes (late middle-aged, young-old, old).

# Attentional Assessment

The attentional test, developed by Pesce et al. (2003) by means of the Experimental Run Time System (ERTS, BeriSoft Cooperation), has been applied in aging research to investigate the effects of acute bouts of exercise and those of chronic PA participation on performance of the attentional networks responsible for orienting (Pesce et al., 2007a, 2011) and executive control (Pesce and Audiffren, 2011). The testing took place either in the morning or in the afternoon, according to participants' availability, avoiding the time before 9 am, between 1 and 3 pm, and after 7 pm to minimize undesired reaction time (RT) variability due to circadian fluctuations in vigilance.

#### Apparatus and Stimuli

Participants were seated in a dimly lit room at a distance of 60 cm from a PC-driven video screen. Four visual displays were used: the instructions, presented on the screen only once at the beginning of the experimental session, and three types of stimuli, presented sequentially on each trial. These were a central fixation point, a spatial cue of variable size, and a compound stimulus. The fixation point was a tilted "T" of 0.4° × 0.4°, and the spatial cue was an empty box of 1° × 1° or 5° × 5°. The compound stimulus was a large letter (4.6° × 4.6°) made of 13–17 small letters (0.6° × 0.6°) spaced 0.4° apart in a 5 × 5 matrix. The large letter and its small elements represented the global and local levels of the compound stimulus, respectively. The large letter could be an A, E, F, or H; the small elements were one of the remaining letters. The fixation point, the large box, and the following compound stimulus were centered on the screen; the small box could randomly appear at any of the locations of the elements composing the compound stimulus.
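The stimulus sizes above are given in degrees of visual angle. The Python sketch below converts them into approximate physical sizes at the stated 60 cm viewing distance using s = 2d·tan(θ/2). It assumes a flat screen with the object centered on the line of sight, and the function names are illustrative. The final line checks our reading that 0.4° is the gap between adjacent local letters: five 0.6° letters plus four 0.4° gaps span 4.6°, the stated size of the global letter.

```python
import math

VIEWING_DISTANCE_CM = 60.0  # viewing distance stated in the text


def size_on_screen_cm(angle_deg: float, distance_cm: float = VIEWING_DISTANCE_CM) -> float:
    """Physical extent subtending `angle_deg` at `distance_cm`: s = 2 d tan(theta / 2)."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def visual_angle_deg(size_cm: float, distance_cm: float = VIEWING_DISTANCE_CM) -> float:
    """Inverse relation: theta = 2 atan(s / (2 d)), in degrees."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


if __name__ == "__main__":
    for label, deg in [("fixation T", 0.4), ("small cue", 1.0), ("large cue", 5.0),
                       ("global letter", 4.6), ("local letter", 0.6)]:
        print(f"{label}: {deg} deg -> {size_on_screen_cm(deg):.2f} cm")
    # Layout check: 5 local letters of 0.6 deg plus 4 gaps of 0.4 deg.
    print(round(5 * 0.6 + 4 * 0.4, 1))  # 4.6
```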

#### The Attentional Task

Each trial consisted of the sequence of events represented in **Figure 1**. In five sixths of the trials (go trials), the compound stimulus contained a target letter (e.g., "H," **Figures 1**, **2**) at either the global or the local level. Participants had to react as quickly as possible to the target letter by pressing an RT key with the right index finger while gazing at the fixation point. In the remaining trials (no-go trials), the compound stimulus did not contain the target letter and participants had to refrain from responding. Responses to no-go trials and responses with RTs shorter than 200 ms or longer than 2,500 ms (anticipations and delayed responses, respectively) were considered errors and were discarded. The response terminated the compound stimulus, and the next trial began after an inter-trial interval of 1,000 ms.
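The scoring rules just described can be sketched as a small Python classifier. The outcome labels and the function name are hypothetical. One case is not covered by the text, namely a go trial with no key press at all; the sketch treats it as a delayed response, which is an assumption. Boundary values (exactly 200 or 2,500 ms) count as valid because the text excludes only RTs shorter than 200 ms or longer than 2,500 ms.

```python
from typing import Optional

ANTICIPATION_MS = 200.0   # RTs shorter than this are anticipations
DELAY_MS = 2500.0         # RTs longer than this are delayed responses


def classify_trial(is_go: bool, rt_ms: Optional[float]) -> str:
    """Label one trial; rt_ms is None when no key press was recorded."""
    if not is_go:
        # Any key press on a no-go trial is a response error.
        return "correct_rejection" if rt_ms is None else "response_error"
    if rt_ms is None or rt_ms > DELAY_MS:
        # Assumption: a missing go response is treated like a delayed response.
        return "delayed"
    if rt_ms < ANTICIPATION_MS:
        return "anticipation"
    return "correct"


if __name__ == "__main__":
    for trial in [(True, 150), (True, 640), (True, 2700), (False, None), (False, 480)]:
        print(trial, classify_trial(*trial))
```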

In 80% of the go trials, the size of the cue and that of the upcoming target were matched: a large cue was followed by a global target and a small cue by a local target at the same location (validly cued trials, **Figure 2** left). In the remaining 20% of go trials, cue and target size were mismatched (invalidly cued trials, **Figure 2** right). Before the experiment, participants were instructed to focus their attention on the area of the visual field delimited by the spatial cue, without shifting their gaze, in order to react as quickly as possible to a predefined target letter that would probably match the cue size. Further instructions were intended to ensure that, in the case of cue-target mismatch, participants switched directly from the global to the local level (attentional zooming in) or from the local to the global level (zooming out), avoiding visual search strategies. It was explained that when a large cue was not followed by a global target letter, the target was the local letter at the center of the screen; when a small cue was not followed by a local target letter at the cued location, the target was the global letter (**Figure 2**, right).

There were two blocks of 76 trials, one with short (150 ms) time interval between the onset of the cue and the onset of the target stimulus that follows (cue-target Stimulus-Onset-Asynchrony, SOA) and one with long (500 ms) SOA, lasting 3–4 min depending on SOA and reaction speed of the participant. Each block included four warm-up trials, 60 go trials and 12 no-go trials. Short and long SOAs were blocked and not randomized within blocks to avoid a bias in target expectancy. If SOAs were randomized within blocks, the probability (and therefore expectancy) of the target would increase after the short SOA was passed without target occurrence.
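The block composition can be made concrete with a Python sketch that builds one block. The split of the 60 go trials into 48 validly and 12 invalidly cued trials follows from the 80/20% validity ratio, and the 30 global and 30 local targets follow from the 50% target-level frequency. Two details are assumptions not given in the text: the 12 no-go trials are split evenly between the two cue sizes, and the four warm-up trials are validly cued go trials. `Trial` and `build_block` are hypothetical names; this is not the ERTS implementation.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

CUE_FOR_LEVEL = {"global": "large", "local": "small"}  # valid cue for each target level


@dataclass
class Trial:
    kind: str                    # "warmup", "go", or "nogo"
    cue: str                     # "large" or "small"
    target_level: Optional[str]  # "global", "local", or None (no target on no-go trials)
    soa_ms: int

    @property
    def valid(self) -> bool:
        return self.target_level is not None and CUE_FOR_LEVEL[self.target_level] == self.cue


def build_block(soa_ms: int, seed: Optional[int] = None) -> List[Trial]:
    """4 warm-up + 60 go + 12 no-go trials, all with one fixed SOA (blocked, not mixed)."""
    rng = random.Random(seed)
    main: List[Trial] = []
    for level, valid_cue in CUE_FOR_LEVEL.items():
        invalid_cue = "small" if valid_cue == "large" else "large"
        main += [Trial("go", valid_cue, level, soa_ms) for _ in range(24)]   # 80% valid
        main += [Trial("go", invalid_cue, level, soa_ms) for _ in range(6)]  # 20% invalid
    # Assumption: no-go trials are split evenly across the two cue sizes.
    main += [Trial("nogo", cue, None, soa_ms) for cue in ("large", "small") for _ in range(6)]
    rng.shuffle(main)  # random order within the block
    # Assumption: warm-up trials are validly cued go trials (not specified in the text).
    warmup = []
    for _ in range(4):
        level = rng.choice(["global", "local"])
        warmup.append(Trial("warmup", CUE_FOR_LEVEL[level], level, soa_ms))
    return warmup + main


if __name__ == "__main__":
    block = build_block(soa_ms=150, seed=1)
    go = [t for t in block if t.kind == "go"]
    print(len(block), len(go), sum(t.valid for t in go))  # 76 60 48
```

Because the SOA is a parameter of the whole block, short and long SOAs can never be mixed within a block, which mirrors the expectancy argument given above.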

Testing was preceded by one block of practice trials to ensure that set acquisition reached a learning asymptote in both younger and older individuals. The minimum amount of practice (40 trials) could be automatically prolonged until a criterion frequency of 80% correct responses was reached. The order of the two blocks of trials with short and long SOA within each task, as well as the choice of two of the four possible target letters, was counterbalanced across participants. Also, to reduce potential threats to internal validity arising from the use of four different letters, all possible combinations with the remaining non-target letters at the global and local levels of the compound stimuli were balanced and randomized within blocks. Cue sizes and target levels were also balanced and randomized within blocks. In particular, the equal (50%) frequency of global and local targets made it possible to balance the priming effects between consecutive global- or local-target trials (Robertson, 1996), which were estimated as a measure of switch costs (see "Switch Costs").

TABLE 1 | Background characteristics of the participants: gender, anthropometric data, steps, education, number of medications and diseases, retirement, smoking, and alcohol habits.

# PRELIMINARY COMPUTATIONS AND ANALYSES

Trials with response errors, including anticipations (RTs shorter than 200 ms) and delayed responses (RTs longer than 2,500 ms), were discarded, and median RTs were computed for correct trials separately for each type of trial. Median rather than mean RTs were used because of the disproportionate influence of outliers on mean RTs and because median values are appropriate for positively skewed distributions, as RT distributions usually are, as long as RT differences, rather than absolute RTs, are of interest (Pesce et al., 2003). Thus, the RT differences of interest were computed on the median RTs of correct trials to isolate the performance of the (i) orienting and (ii) executive control networks from that of the processing systems that, by handling incoming stimuli and producing outputs, contribute to general information processing speed. Specifically, we computed (i) RT differences reflecting the efficiency of the exogenous (automatic) and endogenous (intentional) control of attentional orienting (Lauwereyns, 1998) and (ii) switch costs reflecting how well an individual copes with the cognitive flexibility requirements of the attentional task (Rogers and Monsell, 1995).
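The preliminary computation can be sketched in Python. The sketch keeps only correct responses inside the 200–2,500 ms window and takes the median RT per trial type. The record layout (trial type, RT, correctness flag) is an assumption for illustration, and the trial-type codes reuse the Table 2 abbreviations. The last line shows why medians were preferred: one slow response shifts the mean far more than the median.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, Iterable, Tuple

Record = Tuple[str, float, bool]  # (trial type, RT in ms, response correct?)


def median_rts(records: Iterable[Record],
               lo_ms: float = 200.0, hi_ms: float = 2500.0) -> Dict[str, float]:
    """Median RT per trial type, using only correct responses inside the RT window."""
    kept = defaultdict(list)
    for trial_type, rt_ms, correct in records:
        if correct and lo_ms <= rt_ms <= hi_ms:
            kept[trial_type].append(rt_ms)
    return {t: median(rts) for t, rts in kept.items()}


if __name__ == "__main__":
    data = [("LG", 400, True), ("LG", 420, True), ("LG", 1900, True), ("LG", 150, True),
            ("SL", 500, True), ("SL", 520, False), ("SL", 560, True)]
    print(median_rts(data))  # {'LG': 420, 'SL': 530.0}
    # A single slow response pulls the mean far more than the median:
    print(round(mean([400, 420, 1900]), 1))  # 906.7
```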

#### Attentional Orienting Effects

As is common in spatial cueing paradigms (Chica et al., 2014), attentional orienting effects were generated by manipulating the validity of the spatial cue: cue and target size matched in most trials and mismatched only rarely (80 and 20% probability, respectively). Consequently, participants were expected to react faster on validly cued trials, with targets matching the size of the preceding cue (**Figure 2**, left), and slower on invalidly cued trials, with cue-target mismatch (**Figure 2**, right). To estimate the time needed to refocus attention when a misleading cue leads attention to be focused at the wrong spatial scale, RT differences between invalidly and validly cued trials were computed as follows (**Figure 2**):


The attentional task was originally designed to tap the exogenous and endogenous control of attentional orienting jointly within the same task (Pesce et al., 2003). The abrupt onset of the direct cue was expected to elicit an automatic, short-lasting orienting of attention toward the cued area, affecting performance at short SOA (Stoffer, 1993; Lamb et al., 2000). The informative value of the direct cues as to where the upcoming target would occur was expected to generate a slower-rising spatial expectancy, affecting performance especially at long SOA. Traditional views attributed the exogenous, stimulus-driven and the endogenous, intentional control of attention allocation to the ventral and dorsal networks, respectively. In recent years, this dichotomy has been replaced by a more interactive view attributing to the dorsal and ventral networks a joint role in both exogenous and endogenous control of attentional orienting to locations and features (Macaluso and Doricchi, 2013; Vossel et al., 2014). Thus, to obtain an overall estimate of the joint activity of the two networks responsible for attentional orienting, the above RT differences were computed by merging short- and long-SOA trials. Means and standard deviations of the median RTs calculated for the four types of trials used to compute attentional orienting effects, and the RT differences that reflect zooming in/out effects, are presented in **Table 2**.
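The orienting (zooming) effects can be sketched in Python. The subtraction scheme is an assumption. The Discussion names validly cued small cue-local target (SL) trials as the subtrahend of the zooming out effect, which suggests that each effect contrasts invalid and valid trials sharing the same cue. On that reading, zooming out = SG − SL and, by symmetry, zooming in = LL − LG (codes as in Table 2). Short- and long-SOA trials are pooled before the medians are taken, as described above. Input records are assumed to contain correct responses only, and the function name is illustrative.

```python
from statistics import median
from typing import Dict, Iterable, List, Tuple

# (trial type, SOA in ms, RT in ms); trial types follow the Table 2 abbreviations.
CuedTrial = Tuple[str, int, float]


def zooming_effects(trials: Iterable[CuedTrial]) -> Dict[str, float]:
    """Zooming in/out effects (ms) from correct, validly/invalidly cued go trials."""
    rts: Dict[str, List[float]] = {"LG": [], "SL": [], "LL": [], "SG": []}
    for trial_type, _soa_ms, rt_ms in trials:
        rts[trial_type].append(rt_ms)  # SOA ignored: short- and long-SOA trials merged
    med = {k: median(v) for k, v in rts.items()}
    return {
        # large cue: invalid local target minus valid global target
        "zoom_in": med["LL"] - med["LG"],
        # small cue: invalid global target minus valid local target
        "zoom_out": med["SG"] - med["SL"],
    }


if __name__ == "__main__":
    demo = [("LG", 150, 500), ("LG", 500, 520), ("LL", 150, 600), ("LL", 500, 640),
            ("SL", 150, 560), ("SL", 500, 580), ("SG", 150, 575), ("SG", 500, 585)]
    print(zooming_effects(demo))  # {'zoom_in': 110.0, 'zoom_out': 10.0}
```

A positive value means that the individual paid an RT cost when the cue was misleading, that is, attention had been focused at the cued scale.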

#### Switch Costs

The structure of the present attentional task allowed executive function to be assessed by computing a classical index of cognitive flexibility and executive control of cognitive processes, labeled specific (or local) switch cost (Rogers and Monsell, 1995; Kiesel et al., 2010). Since the attentional task was composed of equally frequent trials with global or local target stimulus dimensions, presented in random order within heterogeneous blocks, participants had to switch between global and local attending. In general terms, in paradigms involving switching between two tasks A and B within heterogeneous trial blocks, trial n + 1 may be a repetition of task A or B (A–A or B–B) or an alternation between tasks A and B (A–B or B–A). Specific switch costs are computed as the RT for switch trials minus the RT for repetition trials. Especially when task switching is explicitly cued, switch costs have been shown to index the duration of a true executive control process of task set reconfiguration, which must suppress proactive interference from the previous, no longer appropriate stimulus-response mapping and activate a new relevant task set (Jost et al., 2008).

In the present experiment, switches between global and local target features of complex visual stimuli were explicitly cued by the preceding spatial cue. To isolate the switch costs from the differential attentional orienting effects of validly vs. invalidly cued targets, only trials with cue-target matching (80% of go trials, comprising equally frequent large cue-global target and small cue-local target trials) were used for the switch cost computation. Each trial was coded as a "switch trial" or a "non-switch trial" according to whether it was preceded by a trial with a target at a different or at the same object level, respectively. Thus, four types of trials were identified (**Figure 3**):


Median RTs were computed separately for each type of trial. Means and standard deviations of median RTs calculated for the four types of trials are presented in **Table 3**. Switch costs were calculated as RT differences between switch trials and non-switch trials, representing an estimate of the time required to switch from attending to the global level of a visual object on trial n to attending to the local level on trial n + 1, or vice versa:


TABLE 2 | Means ± SD of median Reaction Times (ms) of community-dwelling non-diabetic older adults (n = 130) and co-aged diabetics (n = 22, in brackets) calculated for the four types of validly/invalidly cued trials and reaction time (RT) differences computed to estimate spatial cueing (zooming) effects.


SG, small cue-global target; LL, large cue-local target; SL, small cue-local target; LG, large cue-global target. Data are collapsed across short and long Stimulus-Onset Asynchronies (SOAs).
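The switch-cost computation described in the Switch Costs subsection can be sketched in Python. Following the text, each analyzed trial is a correct, validly cued go trial, coded by the target level of the trial that immediately preceded it. The cost is the median RT of switch trials minus that of non-switch trials with the same current target level: local-to-global = RT(global after local) − RT(global after global), and global-to-local = RT(local after global) − RT(local after local). The Discussion confirms that consecutive local-target trials serve as the subtrahend of the latter. How trials preceded by a no-go trial were handled is not stated; skipping them is an assumption, and `Trial` and `switch_costs` are hypothetical names.

```python
from statistics import median
from typing import Dict, List, NamedTuple, Optional, Sequence, Tuple


class Trial(NamedTuple):
    level: Optional[str]   # "G" (global) or "L" (local) target; None on no-go trials
    valid: bool            # cue size matched the target level
    rt_ms: float
    correct: bool = True


def switch_costs(trials: Sequence[Trial]) -> Dict[str, float]:
    """Local-to-global and global-to-local switch costs (ms), in presentation order."""
    groups: Dict[Tuple[str, str], List[float]] = {
        ("L", "G"): [], ("G", "G"): [], ("G", "L"): [], ("L", "L"): []}
    for prev, cur in zip(trials, trials[1:]):
        # Analyze only correct, validly cued go trials; the preceding trial must
        # contain a target so that its level is defined (assumption for no-go trials).
        if cur.level is None or not cur.valid or not cur.correct or prev.level is None:
            continue
        groups[(prev.level, cur.level)].append(cur.rt_ms)
    med = {k: median(v) for k, v in groups.items()}  # each group must be non-empty
    return {
        "local_to_global": med[("L", "G")] - med[("G", "G")],
        "global_to_local": med[("G", "L")] - med[("L", "L")],
    }


if __name__ == "__main__":
    seq = [Trial("G", True, 500), Trial("G", True, 480), Trial("L", True, 560),
           Trial("L", True, 540), Trial("G", True, 530), Trial("G", True, 490),
           Trial(None, False, 0.0), Trial("L", True, 600), Trial("L", True, 520),
           Trial("G", False, 700), Trial("L", True, 580)]
    print(switch_costs(seq))  # {'local_to_global': 45.0, 'global_to_local': 40.0}
```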

#### Error Rates

Three types of error rates were calculated: true response errors (responses to no-go trials), anticipated responses (RTs shorter than 200 ms), and delayed responses (RTs longer than 2,500 ms). They were computed both as overall error rates and separately for the different experimental conditions used to obtain attentional orienting effects and switch costs (**Table 4**). Since anticipated responses were very rare overall (2%), they were not analyzed further.
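The three error rates can be sketched in Python as percentages. The text does not give the denominators. This sketch assumes that anticipations and delayed responses are expressed relative to go trials and response errors relative to no-go trials. The outcome labels and the function name are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def error_rates(trials: Iterable[Tuple[bool, str]]) -> Dict[str, float]:
    """Percent error rates from (is_go, outcome) pairs.

    Outcome labels (hypothetical): "correct", "anticipation", "delayed" on go trials;
    "correct_rejection", "response_error" on no-go trials.
    Assumption: go-trial errors use go trials as denominator, response errors use no-go trials.
    """
    trials = list(trials)  # allow any iterable to be scanned twice
    go = [outcome for is_go, outcome in trials if is_go]
    nogo = [outcome for is_go, outcome in trials if not is_go]

    def pct(count: int, total: int) -> float:
        return 100.0 * count / total if total else float("nan")

    return {
        "anticipation": pct(go.count("anticipation"), len(go)),
        "delayed": pct(go.count("delayed"), len(go)),
        "response_error": pct(nogo.count("response_error"), len(nogo)),
    }


if __name__ == "__main__":
    demo = [(True, "correct")] * 8 + [(True, "anticipation"), (True, "delayed"),
                                      (False, "correct_rejection"), (False, "response_error")]
    print(error_rates(demo))  # {'anticipation': 10.0, 'delayed': 10.0, 'response_error': 50.0}
```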

## RESULTS

# Effects of Age and Physical Activity in Aging Individuals

The first question was whether the efficiency of the attentional networks responsible for attentional orienting and executive control is differentially affected by the transitions from late middle-aged to young-old and old adulthood, and whether physically active habits may counteract the hypothesized age-related deterioration. To address this question, the RT differences of community-dwelling aging individuals, computed to estimate attentional orienting effects and switch costs, were regressed on age, with daily steps as moderator and gender and BMI as covariates. BMI was included as a covariate because we aimed to disentangle the role played by habitual PA from that of weight status, whose independent or joint influence on cognition across the lifespan and in aging is still debated (Memel et al., 2016; Chang et al., 2017), although this issue goes beyond the aim of the present study.

TABLE 3 | Means ± SD of median Reaction Times (ms) of community-dwelling non-diabetic older adults (n = 130) and co-aged diabetics (n = 22, in brackets) calculated for the four types of switch and non-switch trials used to compute RT differences as estimates of switch costs.


Data are collapsed across SOAs.

This moderated regression model entailed the following steps: (1) computing the interaction variable by multiplying age and daily steps (after centering both); (2) performing a hierarchical multiple regression analysis for the prediction of RT by age, daily steps, and their interaction term, in which gender and BMI were statistically controlled for by entering them in a first block, the individual predictors (age and daily steps) were entered in a second block, and their interaction term in a third block; and (3) if the interaction term significantly predicted RT, performing post hoc simple slope tests (Aiken and West, 1991). Statistical significance was set at p < 0.05.

TABLE 4 | Average percentage errors of community-dwelling non-diabetic older adults (n = 130) and co-aged diabetics (n = 22, in brackets) calculated for the four types of validly/invalidly cued go trials and the two types of no-go trials.

SG, small cue-global target; LL, large cue-local target; SL, small cue-local target; LG, large cue-global target.
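The three-block moderated regression can be sketched with NumPy. The sketch centers age and daily steps, forms their product, fits ordinary least squares for each nested block, and reports R² per block and the F statistic for each R² change. The coefficients are unstandardized, whereas the paper reports standardized β. p-values for the F changes would additionally require the F distribution (e.g., SciPy's `scipy.stats.f.sf`), which is omitted to keep the sketch dependency-light. Function names are illustrative, and the data in the usage block are simulated, not study data.

```python
import numpy as np


def r2_and_coefs(X: np.ndarray, y: np.ndarray):
    """Ordinary least squares with an intercept; returns (R^2, coefficients)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return float(r2), beta


def hierarchical_moderation(y, age, steps, covariates):
    """Block 1: covariates; block 2: + centered age and steps; block 3: + their product.

    Returns R^2 per block, the F statistic of each R^2 change, and the
    unstandardized coefficients of the full model (intercept first).
    """
    age_c = age - age.mean()          # center before forming the product term
    steps_c = steps - steps.mean()
    block1 = np.column_stack([covariates])
    block2 = np.column_stack([block1, age_c, steps_c])
    block3 = np.column_stack([block2, age_c * steps_c])
    n = len(y)
    r2, f_change = [], []
    prev_r2, prev_k = 0.0, 0
    for X in (block1, block2, block3):
        r, _ = r2_and_coefs(X, y)
        k = X.shape[1]                # number of predictors in this model
        df1, df2 = k - prev_k, n - k - 1
        f_change.append(((r - prev_r2) / df1) / ((1.0 - r) / df2))
        r2.append(r)
        prev_r2, prev_k = r, k
    return {"r2": r2, "f_change": f_change, "coefs_full": r2_and_coefs(block3, y)[1]}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 130  # sample size of the main analysis; all values below are simulated
    age = rng.uniform(55, 85, n)
    steps = rng.normal(7000, 2000, n)
    covariates = np.column_stack([rng.integers(0, 2, n), rng.normal(26, 3, n)])  # gender, BMI
    a, s = age - age.mean(), steps - steps.mean()
    rt = 750 + 6 * a - 0.005 * s - 0.003 * a * s + rng.normal(0, 10, n)
    res = hierarchical_moderation(rt, age, steps, covariates)
    print([round(r, 3) for r in res["r2"]], [round(f, 1) for f in res["f_change"]])
```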

Overall reaction speed (absolute RTs) and accuracy (response errors and delayed responses) were also submitted to the same regression models. This made it possible to ensure that larger RT differences of interest were not merely due to longer absolute RTs, and that smaller RT differences did not merely reflect a shift in the speed-accuracy trade-off setpoint. For instance, if older adults showed longer RTs, as expected, this might lead to proportionally larger zooming effects and switch costs, which are RT differences. If such longer RTs and correspondingly larger RT differences were accompanied by lower rates of responses to no-go trials, this might simply reflect older individuals trading speed for accuracy.

Since no reference data for an a priori power analysis of multiple regression were available from aging studies using the present attentional variables, post hoc achieved power (1−β) was computed with the G*Power program (Faul et al., 2009).

#### Reaction Speed

The results of the analysis performed on overall RT are presented in **Table 5**, left. There was a gender difference in RT in favor of females (744 ± 126 vs. 782 ± 141 ms) and a direct relationship between age and RT, indicating that responses slow down with increasing age. However, the interaction between age and daily steps explained a further small but significant percentage of variance, suggesting that the effect of age on RT was moderated by activity level. Simple slope testing (**Figure 4**) showed a buffering effect of the moderator on the predictor: whereas older age predicted longer RT in low-active adults, this detrimental effect of age on RT was not present in high-active adults along the entire age range from late middle age to old adulthood. Post hoc observed power (1−β) was 0.99.
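The simple slope test (Aiken and West, 1991) can be sketched with NumPy. The slope of age is evaluated at one SD below and above the mean of daily steps, matching the "1 SD change" convention of the figure captions, as b_age + w·b_int. Its standard error is sqrt(var(b_age) + 2w·cov(b_age, b_int) + w²·var(b_int)), taken from the OLS coefficient covariance matrix. The sketch returns slopes and t values only; p-values would need the t distribution with n − p degrees of freedom. The function name is illustrative, and the data in the usage block are simulated.

```python
import numpy as np


def simple_slopes(y, age, steps, covariates, k_sd: float = 1.0):
    """Slope of age and its t value at mean -/+ k_sd SD of daily steps."""
    age_c = age - age.mean()
    steps_c = steps - steps.mean()
    X = np.column_stack([np.ones(len(y)), covariates, age_c, steps_c, age_c * steps_c])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    cov = (resid @ resid / (n - p)) * np.linalg.inv(X.T @ X)  # OLS coefficient covariance
    i_age, i_int = p - 3, p - 1  # column positions of age and of the product term
    sd = steps_c.std(ddof=1)
    out = {}
    for label, w in (("low_active", -k_sd * sd), ("high_active", k_sd * sd)):
        slope = beta[i_age] + w * beta[i_int]
        se = np.sqrt(cov[i_age, i_age] + 2 * w * cov[i_age, i_int] + w ** 2 * cov[i_int, i_int])
        out[label] = (float(slope), float(slope / se))
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 130  # simulated data with a buffering pattern, not study data
    age = rng.uniform(55, 85, n)
    steps = rng.normal(7000, 2000, n)
    covariates = np.column_stack([rng.integers(0, 2, n), rng.normal(26, 3, n)])
    a, s = age - age.mean(), steps - steps.mean()
    rt = 750 + 6 * a - 0.003 * a * s + rng.normal(0, 10, n)
    print(simple_slopes(rt, age, steps, covariates))
```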

#### Attentional Orienting Effects

The results of the analysis performed on attentional orienting effects are presented in **Table 5**, middle, separately for the two directions of attentional zooming. This distinction was deemed necessary because the two effects differed greatly in size, with zooming out effects being almost absent on average but showing very large interindividual variability (**Table 2**). Regardless of zooming direction, age was a significant predictor. Additionally, for zooming out, the interaction between age and daily steps explained a further small but significant percentage of variance. In contrast to what was observed for overall RT, simple slope testing showed an inverse relationship between age and the size of the attentional zooming effect, and an amplifying effect of the moderator (**Figure 5A**). Whereas low-active adults showed an almost absent zooming out effect on average regardless of age, high-active adults showed this effect, but with an age-related decrement from late middle-aged to old adulthood. Visual inspection of the simple slopes showed a similar, but non-significant, pattern of results for zooming in effects (**Figure 5B**). Post hoc observed power (1−β) for the analyses of zooming out and zooming in effects was 0.98 and 0.88, respectively.

#### Switch Costs

The results of the analysis performed on switch costs are presented in **Table 5**, right, separately for the two directions of local-to-global and global-to-local switches. As for the zooming effects, this distinction was deemed necessary also for switches of attention between global and local features of visual objects, because here too the effect in one switch direction was not detectable on average (i.e., a small negative value, **Table 3**). The regression analysis revealed only a small but significant prediction of local-to-global switch costs by age, with switch costs increasing at older age. This direct relationship was not moderated by PA level (**Figure 6**). Post hoc observed power (1−β) for the analyses of local-to-global and global-to-local switch costs was 0.82 and 0.54, respectively.

#### Error Rates

The same regression model applied to delayed responses yielded a large percentage of variance explained by age (R<sup>2</sup> = 0.24, std β = 0.49, t = 5.86, p < 0.001): the older the person, the larger the number of delayed responses (**Figure 7A**). This age effect was not moderated by PA level, whereas an interactive prediction by age and PA level emerged from the analysis of response errors (R<sup>2</sup> = 0.10, std β = 0.23, t = 2.62, p = 0.010). Simple slope testing (**Figure 7B**) showed that high-active adults, compared with their low-active counterparts, had lower rates of responses to no-go trials in late middle age but higher rates in old adulthood, owing to an incremental trend as a function of age in high-active participants only. Post hoc observed power (1−β) for the analyses of delayed responses and response errors was 1.0 and 0.97, respectively.


TABLE 5 | Hierarchical regression models testing moderated prediction of overall RT, attentional orienting (zooming) effects, and switch costs in community-dwelling older adults (n = 130).

Total R<sup>2</sup> explained and standardized β coefficients with t-values and significance level are reported. out, in = spatial attentional zooming out or zooming in effects; LtG, GtL = local-to-global or global-to-local switch costs.

β values and their significance are reported for each single slope of the moderated prediction (A), or for the main slope (dotted line) of the non-moderated prediction (B).

# Effects of Diabetic Status and Physical Activity in Aging Individuals

The second question of the present study was whether diabetic status affects the cognitive functions of interest and whether PA level may buffer diabetes-related cognitive impairments. To address this question, further regression analyses were performed on the same dependent variables, contrasting the data of the 22 late middle-aged, young-old, and old adults with type 2 diabetes recruited for this study with those of a subsample of 22 non-diabetic individuals selected from the main sample. The matching criteria for selection were gender, age (± 1 year), and the mean number of daily steps closest to that of the diabetic participant (correlation between the daily steps of age- and gender-matched pairs of diabetic and non-diabetic participants: r = 0.96, p < 0.001). By matching diabetics and non-diabetics for daily steps, we aimed to isolate the hypothesized attentional differences due to diabetic status from those expected to result from the lower PA levels and related lower fitness of diabetics (Albright et al., 2000), according to the cardiovascular fitness hypothesis of chronic PA effects on cognition (Stillman et al., 2016). In the moderated regression model, BMI was statistically controlled for by entering it in a first block, while the individual predictors (diabetic/non-diabetic status and daily steps) were entered in a second block and their interaction term in a third block.
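The matching rule can be sketched in Python as a greedy nearest-neighbor search. The text states the criteria (same gender, age within ±1 year, closest mean daily steps) but not the order in which diabetic participants were matched or how ties and unmatched cases were handled. The greedy, order-dependent procedure below, including the `Person` record and the function name, is therefore an assumption; an optimal assignment method could pair participants differently.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Person(NamedTuple):
    pid: str
    gender: str
    age: float
    steps: float  # mean daily steps over the monitoring week


def match_controls(cases: Sequence[Person], pool: Sequence[Person],
                   max_age_diff: float = 1.0) -> List[Tuple[Person, Person]]:
    """Pair each case with an unused control of the same gender, age within
    +/- max_age_diff years, and the closest mean daily steps."""
    used = set()
    pairs = []
    for case in cases:
        candidates = [p for p in pool
                      if p.pid not in used
                      and p.gender == case.gender
                      and abs(p.age - case.age) <= max_age_diff]
        if not candidates:
            continue  # assumption: unmatched cases are simply skipped
        best = min(candidates, key=lambda p: abs(p.steps - case.steps))
        used.add(best.pid)
        pairs.append((case, best))
    return pairs


if __name__ == "__main__":
    cases = [Person("D1", "F", 70, 5000), Person("D2", "M", 65, 8000)]
    pool = [Person("N1", "F", 70.5, 5200), Person("N2", "F", 69.5, 9000),
            Person("N3", "M", 65.5, 7900), Person("N4", "M", 67, 8000)]
    print([(c.pid, m.pid) for c, m in match_controls(cases, pool)])  # [('D1', 'N1'), ('D2', 'N3')]
```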

#### Reaction Speed

The results of the analysis performed on overall RT showed that diabetic status and daily steps predicted RT in opposite directions (R<sup>2</sup> = 0.16; diabetic status: std β = 0.29, t = 2.15, p = 0.038; daily steps: std β = −0.27, t = −2.09, p = 0.043), after accounting for the significant prediction accrued by BMI (R<sup>2</sup> = 0.16, std β = 0.30, t = 2.21, p = 0.033). Diabetics showed longer RTs than their non-diabetic counterparts (**Figure 8A**), but a similar inverse relationship between PA level and RT, with higher PA levels predicting shorter RTs (**Figure 8B**). The relationship between BMI and RT was direct, with higher weight predicting longer RT in both diabetics and their non-diabetic counterparts, who differed marginally (p = 0.06) in BMI (diabetics: 29.4 ± 4.4; non-diabetics: 27.3 ± 4.43). Post hoc observed power (1−β) was 0.99. Moreover, to establish whether the absence of an interaction between diabetic status and daily steps reflected truly independent effects or a lack of power (Stone-Romero et al., 1994), a further post hoc power analysis for differences between slopes in moderated regression with diabetic/non-diabetic status as a dichotomous moderator was computed. Its low value (0.26) indicated a lack of power.

FIGURE 7 | (A) Prediction of delayed responses to go trials accrued by age without any significant moderation by PA level (daily steps); (B) prediction of response errors (responses to no-go trials) accrued by age and moderated by PA level. Solid lines: change in the slope of the predictor for high vs. low PA levels (1 SD change); β values and their significance are reported for the main slope (dotted line) of the non-moderated prediction (A), or for each single slope of the moderated prediction (B).

#### Attentional Orienting Effects, Switch Costs, and Error Rates

Although descriptive statistics show noticeable differences in zooming out effects and local-to-global switch costs, regression analyses did not reveal any significant prediction of the RT-difference or accuracy variables by health status and/or PA level. Post hoc observed power (1−β) for the analyses of zooming out and zooming in effects and of local-to-global and global-to-local switch costs was 0.46, 0.49, 0.61, and 0.42, respectively.

## DISCUSSION

The present study aimed to investigate the independent and interactive effects of aging and objectively measured PA levels on performance of the orienting and executive control networks in community-dwelling aging individuals and diabetics. In sum, the results show a pattern of effects suggesting a generalized detrimental impact of aging on information processing speed, attentional effects, and performance accuracy. However, being physically active seems to partially dampen this age-related deterioration, exerting a protective effect on processing speed and on the ability to orient attention toward locations and objects in the visual field, and preventing an age-related shift of the speed-accuracy trade-off toward accurate but slowed performance. Contrary to the general claim that PA is especially beneficial to executive function (Colcombe and Kramer, 2003; Gajewski and Falkenstein, 2016), physically active habits appear neither to outweigh nor to attenuate the detrimental effect of aging on the executive control processes involved in task set reconfiguration. Furthermore, diabetic status and PA level were found to affect processing speed in opposite directions, whereas neither affected the performance of the orienting and executive control networks as reflected in orienting effects and switch costs.

To our knowledge, this was the first study of PA effects on cognition in aging people to investigate the performance of the orienting and executive control networks in combination in one task. Previous research combining the investigation of different attentional networks has been performed only in the area of acute exercise, by adopting the attention network test based on Posner and Petersen's (1990) model, which combines in one task warning signals prior to targets (alerting), cues that direct attention toward potential target locations (orienting), and target stimuli surrounded by congruent or incongruent flankers (executive control; Huertas et al., 2011; Chang et al., 2015). In contrast, the attentional test developed by Pesce et al. (2003) and used in the present study merges typical features of the spatial orienting paradigm (Chica et al., 2014) with hierarchically built visual objects that contain global or local target features (Navon, 1977). The use of direct and informative cues with different cue-target SOAs and a low percentage of misleading cues allows tapping both the initially exogenous and then strategic control of attention, according to the informative value of the cue, and the attentional re-orienting following miscued targets, led by the dorsal and ventral networks, respectively (Corbetta and Shulman, 2002; Petersen and Posner, 2012). The initial blocked task instruction, followed by the random presentation of global and local targets, allows tapping true executive processes of stable set maintenance and of trial-by-trial adjustment of executive control to switch between global and local attending, led by the cingulo-opercular and fronto-parietal networks, respectively.

First, older age predicted lengthened RT, but only in low-active individuals (**Figure 4**). This is in line with evidence of generalized slowing of information processing speed in old age (Birren and Fisher, 1995) and with results of previous aging studies performed with the present attentional paradigm, which showed faster reaction speed in older athletes than in sedentary co-aged individuals (Pesce et al., 2005, 2007a, 2011). However, the results also suggest that, from late middle age to old adulthood, low-active and high-active individuals differ in how their speed-accuracy trade-off setpoint shifts. In fact, high-active late middle-aged individuals showed longer RTs on average (**Figure 4**), but lower rates of responses to no-go trials (**Figure 7B**). The pattern was reversed at older age: high-active individuals responded faster than their low-active counterparts, but made more response errors. It seems that, with increasing age, low-active older adults trade speed to maintain accuracy as a compensatory strategy (Spirduso et al., 2005), whereas high-active individuals trade accuracy to maintain speed of performance.

High PA levels also seem to dampen the age-related decline in the efficiency of the orienting system. It must be pointed out that the size of the RT differences computed to estimate orienting effects and switch costs has opposite meanings for the two measures. The orienting (zooming) effect represents the difference in RT between invalidly and validly cued trials. A large RT difference means that the individual was able to strategically orient attention toward the cued area, thus shortening the RT to validly cued targets at the price of an RT cost in the rare cases of miscued targets. This ability seems relatively scarce in low-active individuals already in late middle age, when high-active individuals instead show preserved orienting ability, reflected in a larger orienting effect (**Figure 5A**). Nevertheless, the active lifestyle no longer seems to buffer the age-related deterioration in old adulthood. The negligible size of the zooming out effect is attributable to the fact that older adults show a typical local attending deficit (Pesce et al., 2005) that lengthens RT particularly when local targets and the preceding cue are not presented foveally, as is the case for RT on valid small cue-local target trials, which was used as the subtrahend for the computation of the zooming out effect (**Figure 2**). The huge interindividual variability in the zooming out effect therefore indicates that some individuals succeeded in overcoming the typical age-related local attending deficit, thus showing a positive zooming out effect, whereas others did not. A similar, but non-significant, trend emerged for the zooming in effect (**Figure 5B**).

The graphical representation suggests that those who succeeded were high-active late middle-aged individuals. Since orienting effects are RT differences, the larger effect in high-active late middle-aged individuals would be meaningless if it were paralleled by a corresponding increment in absolute RT. This was not the case, as they showed the lowest RTs. This strengthens the interpretation that being physically active helps to overcome age-related attending deficits. Intriguing neuroimaging evidence, while confirming the hypothesis that gains in cardiovascular fitness lead to enhanced neural efficiency, also shows a unique relationship between coordination training in old age and increased activation in the visuo-spatial orienting network (Voelcker-Rehage et al., 2011). The counteracting effect of overall PA on the age-related decline of attention orienting performance found in the present study might therefore be attributable, at least in part, to the coordinative demands of being physically active, since our objective measure of overall PA tapped a variety of possible activities at the workplace, in sports, and in household chores.

However, in studies of aging effects on the attentional networks, the most pronounced deterioration has been reported for the executive control network (Mahoney et al., 2010), which is also reported to be the primary locus of the beneficial effects of PA and exercise (Etnier and Chang, 2009), particularly in old age (Colcombe and Kramer, 2003; Gajewski and Falkenstein, 2016). In the present study, we focused on cognitive flexibility, a core executive function needed to switch attention between tasks, which we measured by means of local switch costs. In contrast to the orienting effect, whose size reflected the ability to exert top-down control over attention to adhere to task requirements, local switch costs represented the inability to adjust executive control flexibly to the unpredictable need to switch between global and local attending. Thus, the higher the switch cost, the lower the efficiency of executive control. This type of cost is indeed thought to reflect the executive processes needed to deactivate a previous task set in favor of the currently relevant one. Unlike many other aspects of executive function that benefit from PA, this type of cost was not positively affected by the PA level of the participants, but only negatively affected by age (**Figure 6**). This age-related decline was observed only for local-to-global switch costs, because global-to-local switch costs were biased by interacting spatial orienting effects. The strong automatic capture of attention by small cues interfered with the allocation of attention to visual objects (Goldsmith and Yeari, 2003), outweighing the persistence of attention on the last attended object that should facilitate RT in the case of consecutive local target trials, which were used as the subtrahend for the computation of the global-to-local switch cost (**Figure 3**; Pesce and Audiffren, 2011).

In sum, our findings parallel and extend to overall PA the notion that regular participation in exercise training, regardless of exercise mode, facilitates reaction speed but does not influence local switch costs (Dai et al., 2013). In their lifespan study, Pesce and Audiffren (2011) found that age and sport expertise independently predicted lower switch costs, whereas we found no effect of PA level. Taken together, these results suggest that it may not be PA per se, but the cognitive demands inherent in many sports, that mediates PA effects on the executive control networks, particularly the fronto-parietal network (Pesce, 2012). This hypothesis draws on the "cognitive component skills approach" (Voss et al., 2010), which suggests that sport-related cognitive expertise may transfer to sport-unspecific tasks requiring fundamental cognitive abilities. Our interpretation of the absence of PA effects on switch costs is in accordance with the finding that cognitive training interventions in aging, rather than PA, seem to have the potential to positively affect the plasticity of the specific processes and underlying neural substrates responsible for task-switching ability (Gajewski and Falkenstein, 2012).

The second aim of the study was to investigate whether, in aging individuals with diabetes, who are at risk of poor cognition and especially executive dysfunction (Qiu et al., 2006; Okereke et al., 2008; Luchsinger, 2012), a physically active lifestyle counteracts the deterioration of the ability to exert executive control over attention, which is particularly relevant for this special population. The outcomes of this study did not show impairments in specific aspects of attentional network performance compared with non-diabetic participants of the same age, but only slower information processing speed (**Figure 8A**). A physically active lifestyle seems to benefit their processing speed to the same extent as it benefits the performance of non-diabetic aging individuals. This means that the vascular pathologies that characterize diabetes and may be responsible for cerebrovascular disease and cognitive dysfunction (Luchsinger, 2012; Umegaki, 2014) can be counteracted by physically active habits, at least in terms of the efficiency of the processes responsible for perceiving and responding. By contrast, whether an active lifestyle counteracts the deterioration of attentional network performance in diabetics needs further exploration, since this study was underpowered for that type of variable.

The study has further limitations that must be addressed. Merging the spatial orienting paradigm with task switching between global and local stimulus features has the advantage of tapping different attention networks with one task, but it also introduced some biases. One reason for the absence of PA effects on switch costs might be the relatively small size of such costs, probably due to the presence of a spatial cue to switch. Such advance information, typical of cueing paradigms, generally results in smaller switch costs (Wasylyshyn et al., 2011). This reflects an influence of the orienting network on the executive network, with the latter taking advantage of the information provided by the former to resolve a conflict and switch sooner (Callejas et al., 2005). Furthermore, spatial cueing with small cues narrowed and captured attention, outweighing the global-to-local switch effect (Goldsmith and Yeari, 2003; Pesce and Audiffren, 2011). It is therefore possible that, in our study, because of the influence of the advance cues on local switch costs, benefits of PA level could be detected only for attention orienting.

Thus, one direction for future research is to use an interactionist approach to study PA effects on the attentional networks in aging (Callejas et al., 2005). The present study assessed local switch costs in heterogeneous blocks of spatially cued trials, but did not consider global switch costs based on homogeneous trial blocks. In contrast, beneficial PA effects in aging have been found for both local and global switch costs in uncued task switching (Hillman et al., 2006; Themanson et al., 2006). An interactionist approach, with and without spatial cueing, in both heterogeneous and homogeneous trial blocks might further our understanding of whether the orienting of attention, preserved by an active lifestyle at least in late middle age, can raise the efficiency of the executive control networks responsible, respectively, for the adaptability of top-down control on a trial-by-trial basis and the stability of top-down control for set maintenance (Dosenbach et al., 2008).

One reason we may have failed to detect a buffering effect of PA on the attentional performance of diabetics, as was found for the orienting performance of non-diabetics, is the relatively small sample size and the intrinsically low power of moderated multiple regression analysis, particularly when the moderator is a dichotomous variable (Stone-Romero and Anderson, 1994; Stone-Romero et al., 1994). Extending the sample of older adults with diabetes could help distinguish a true absence of PA effects on the attentional networks from the power and generalizability issues of this convenience sample.

# CONCLUSION


Adequate levels of PA may positively influence the brain processes and systems responsible for information processing speed and for the strategic control of attentional orienting, but seem not to influence the ability to exert executive control for switching attention, which instead seems to be positively influenced by participation in cognitively demanding sports (Pesce and Audiffren, 2011). The differential association of physically and/or cognitively challenging activities with different attentional functions may provide a basis for designing interventions for successful attentional aging that exploit the multifaceted nature of the concept of an enriched environment, including PA and challenging cognitive tasks (Hertzog et al., 2009; Kraft, 2012). Given the broad range of unstructured daily-life activities and structured exercise or grassroots/competitive sports composing the overall PA levels measured in this study, our results add to the evidence that both daily-life PA, such as walking (Yaffe et al., 2001; Abbott et al., 2004), and sports participation (Pesce et al., 2007a; Zhao et al., 2016) may act as protective factors against cognitive decline in older adults. These two main components of an active lifestyle (Condello et al., 2016) should be promoted through actions that make the built environment more conducive to PA (Saelens and Handy, 2008) and that support physically active habits and sport participation into old age (Baker et al., 2010).

# AUTHOR CONTRIBUTIONS

GC: Data acquisition with relevant role in data acquisition coordination, analysis and interpretation and drafting of the work, final approval of the version to be published and agreement to be accountable for all aspects of the work. RF: Data interpretation, drafting and critical revision of the work for important intellectual content with specific contribution as regards aging issues, final approval of the version to be published and agreement to be accountable for all aspects of the work. SF: Data acquisition and analysis, contribution to drafting the work, final approval of the version to be published and agreement to be accountable for all aspects of the work. JS: Interpretation of data and critical revision of the work for important intellectual content, final approval of the version to be published and agreement to be accountable for all aspects of the work. ADB: Interpretation of data and critical revision of the work for important intellectual content, final approval of the version to be published and agreement to be accountable for all aspects of the work. LC: Contribution to conception of the work with relevant role in project coordination, critical revision of the work, final approval of the version to be published and agreement to be accountable for all aspects of the work. CP: Main role in the conception and design of the work, creation of the attentional test, data analysis and interpretation, drafting of the work with specific contribution as regards the physical activity-attention relationship, final approval of the version to be published and agreement to be accountable for all aspects of the work.

# FUNDING

This research was funded by the Italian Ministry of Education, University and Research (MIUR) as part of a Project of National Interest (PRIN): Impact of Physical Activity on healthy aging: multidisciplinary analysis of mechanisms and outcomes (2010KL2Y73\_003).

# ACKNOWLEDGMENTS

We thank all the participants who took part in the study and the master's students who cooperated on this research project. We also thank the National Pensioners' Federation of the Italian National Confederation of the Labor Unions (CISL), the national secretary of the Department of Social and Health Policies, Attilio Rimoldi and Brigida Modesti for their indispensable contribution to recruitment. Moreover, we thank both reviewers, and particularly FHO, for their very detailed and valuable suggestions.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Condello, Forte, Falbo, Shea, Di Baldassarre, Capranica and Pesce. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# A 12-Week Cycling Training Regimen Improves Gait and Executive Functions Concomitantly in People with Parkinson's Disease

Alexandra Nadeau<sup>1,2,3</sup>\*, Ovidiu Lungu<sup>1,2,4,5</sup>, Catherine Duchesne<sup>1,2,3</sup>, Marie-Ève Robillard<sup>1,2</sup>, Arnaud Bore<sup>1,2</sup>, Florian Bobeuf<sup>1,6</sup>, Réjean Plamondon<sup>7</sup>, Anne-Louise Lafontaine<sup>1,2,8</sup>, Freja Gheysen<sup>9</sup>, Louis Bherer<sup>1,6,10,11</sup> and Julien Doyon<sup>1,2,3</sup>\*

<sup>1</sup> Research Center of the University Institute of Geriatrics of Montreal, Montreal, QC, Canada, <sup>2</sup> Functional Neuroimaging Unit, Montreal, QC, Canada, <sup>3</sup> Department of Psychology, University of Montreal, Montreal, QC, Canada, <sup>4</sup> Department of Psychiatry, University of Montreal, Montreal, QC, Canada, <sup>5</sup> Centre for Research in Aging, Donald Berman Maimonides Geriatric Centre, Montreal, QC, Canada, <sup>6</sup> PERFORM Centre, Concordia University, Montreal, QC, Canada, <sup>7</sup> Department of Electrical Engineering, Polytechnique Montreal, Montreal, QC, Canada, <sup>8</sup> McGill Movement Disorder Clinic, McGill University, Montreal, QC, Canada, <sup>9</sup> Department of Movement and Sport Sciences, Ghent University, Ghent, Belgium, <sup>10</sup> Department of Medicine, University of Montreal, Montreal, QC, Canada, <sup>11</sup> Montreal Heart Institute, Montreal, QC, Canada

#### Edited by:

Stephane Perrey, University of Montpellier, France

#### Reviewed by:

Johanna Wagner, University of California, San Diego, USA; João Manuel R. S. Tavares, University of Porto, Portugal; Wei Peng Teo, Deakin University, Australia

\*Correspondence:

Alexandra Nadeau, alexandra.nadeau.3@umontreal.ca; Julien Doyon, julien.doyon@umontreal.ca

Received: 12 September 2016 Accepted: 26 December 2016 Published: 12 January 2017

#### Citation:

Nadeau A, Lungu O, Duchesne C, Robillard M-È, Bore A, Bobeuf F, Plamondon R, Lafontaine A-L, Gheysen F, Bherer L and Doyon J (2017) A 12-Week Cycling Training Regimen Improves Gait and Executive Functions Concomitantly in People with Parkinson's Disease. Front. Hum. Neurosci. 10:690. doi: 10.3389/fnhum.2016.00690

Background: There is increasing evidence that executive functions and attention are associated with gait and balance, and that this link is especially prominent in older individuals or those who are afflicted by neurodegenerative diseases that affect cognition and/or motor functions. People with Parkinson's disease (PD) often present gait disturbances, which can be reduced when PD patients engage in different types of physical exercise (PE), such as walking on a treadmill. Similarly, PE has also been found to improve executive functions in this population. Yet, no exercise intervention has simultaneously investigated gait and non-motor symptoms (executive functions, motor learning) in PD patients.

Objective: To assess the impact of aerobic exercise training (AET) using a stationary bicycle on a set of gait parameters (walking speed, cadence, step length, step width, single and double support time, as well as variability of step length, step width and double support time) and executive functions (cognitive inhibition and flexibility) in sedentary PD patients and healthy controls.

Methods: Two groups, 19 PD patients (Hoehn and Yahr ≤2) and 20 healthy adults, matched on age and sedentary level, followed a 3-month stationary bicycle AET regimen.

Results: Aerobic capacity, as well as motor learning and cognitive inhibition performance, increased significantly in both groups after the training regimen, but only PD patients improved their walking speed and cadence (all p < 0.05), with no change in step length. Moreover, in PD patients, training-related improvements in aerobic capacity correlated positively with improvements in walking speed (r = 0.461, p < 0.05).

Conclusion: AET using a stationary bicycle can independently improve gait and cognitive inhibition in sedentary PD patients. Given that increases in walking speed were obtained through increases in cadence, with no change in step length, our findings suggest that gait improvements are specific to the type of motor activity practiced during exercise (i.e., pedaling). In contrast, the improvements seen in cognitive inhibition were most likely not specific to the type of training and could be due to indirect mechanisms of action (i.e., improvement of cardiovascular capacity). These results are also relevant for the development of targeted AET interventions to improve functional autonomy in PD patients.

#### Keywords: Parkinson's disease, exercise, gait, aerobic, stationary bicycle

# INTRODUCTION

Parkinson's disease (PD) is a neurodegenerative pathology characterized by progressive motor symptoms, including gait modifications leading to balance instability (Bello et al., 2010). Patients can also develop several non-motor complications, such as depression, sleep disturbances and cognitive impairments like executive dysfunctions (Speelman et al., 2011) and deficits in procedural learning (Clark et al., 2014; Ruitenberg et al., 2015). Despite advances in pharmacological agents and surgical procedures that can be used in PD patients to alleviate the primary motor signs of the disease, these treatment options often fail to improve the whole range of symptoms observed in PD, and side effects are common (Bloem et al., 2004). Recently, exercise has been proposed as an adjuvant therapy that may help alleviate multiple symptoms, but very little is known about the impact and mechanisms of such alternatives.

Although the link between gait and cognitive functions, especially executive functions, is well documented in aging research (Springer et al., 2006; Yogev-Seligmann et al., 2008; Liu-Ambrose et al., 2010; Martin et al., 2013), to date only a few studies have investigated the same relationship in PD patients. In contrast, there is growing evidence documenting the association of standing balance and gait initiation with cognition in the PD population, but in a dual-tasking context (Fernandes et al., 2015, 2016). The latter studies have shown that some cognitive processes, such as executive functions, processing speed and semantic fluency (Smulders et al., 2013; Stegemoller et al., 2014), are associated with some gait parameters and functional mobility in PD. In addition, several studies have shown independently that non-pharmacological treatment approaches, such as physical exercise (PE), improve various gait parameters in PD on the one hand (Mehrholz et al., 2010; Li et al., 2012; Shu et al., 2014; Arcolin et al., 2015), and executive functions on the other (Tanaka et al., 2009). Yet, there is no evidence that this type of intervention can simultaneously improve motor (such as gait) and non-motor symptoms (executive functions, motor learning) in PD, and the mechanisms by which training produces changes in both components remain unknown.

We have recently reported that an aerobic exercise training (AET) regimen using a stationary bicycle is not only safe for PD patients in the early stages of the disease, but also improved aerobic capacity as well as cognitive inhibition and motor sequence learning (MSL; Duchesne et al., 2015). In the current study, our main objective was to assess the effects of an AET regimen using stationary bicycling on gait parameters in sedentary people with PD (not reported in the previous study). As a second objective, we set out to compare these effects with those observed in healthy adults (HA) to determine whether this type of intervention has a different impact depending on participants' health status. As a third objective, we intended to evaluate the associations between exercise-related changes in gait (not reported in the previous study) and those seen in cardiovascular capacity, executive functions and motor sequence learning capacity (Duchesne et al., 2015). We hypothesized that: (1) bicycle training would improve gait parameters in all participants (particularly speed and cadence, given the specificity of bicycle training), but especially in those diagnosed with PD; (2) such improvements in gait parameters would correlate with other AET-related improvements, such as cardiovascular capacity, cognitive inhibition and the capacity to learn a new sequence of movements; and (3) these relations would be moderated by disease.

# MATERIALS AND METHODS

# Participants

To be eligible for the study, all participants (HA and those with PD) had to be right-handed, sedentary, and aged between 40 and 80 years. They were screened for possible dementia, with a score between 24 and 30 required on the Mini-Mental State Examination (Folstein et al., 1975) or on the Montreal Cognitive Assessment (Marinus et al., 2011), and for suitability for testing in an MRI environment (e.g., no metallic implants that could interfere with testing, no claustrophobia). The Physical Activity Readiness Questionnaire (PAR-Q) was used to verify that it was safe for each participant to take part in an exercise program. Exclusion criteria included other neurological disorders, comorbidities likely to affect gait, smoking, and heart disease. Importantly, HA were matched with PD patients with respect to sex distribution, age, education, and cognitive and fitness levels. PD patients had to be classified as stage 1 or 2 on the Hoehn and Yahr scale, based on evaluation by a certified neurologist (A-LL). Participants who were taking medication continued their treatment throughout the study (testing and training). This study was carried out in accordance with the recommendations of the research ethics committee guidelines of the Research Center of the University Institute of Geriatrics of Montreal, which approved the protocol. Written informed consent was obtained from each participant. Demographic characteristics of the samples are presented in **Table 1**.

#### Exercise Intervention Protocol


Prior to engaging in the training regimen, all participants were cleared by a physician, who analyzed a resting electrocardiogram (ECG) to rule out cardiac anomalies. The aerobic exercise intervention was designed to improve cardiorespiratory fitness, with exercise intensity prescribed on the basis of each participant's maximal aerobic power output achieved at peak oxygen uptake (VO<sub>2</sub> peak), assessed on the pre-test day (ACSM, 2006). Participants trained on recumbent bicycles. Sessions started at 20 min and 60% intensity, and were then lengthened by 5 min and intensified by 5% every week until participants reached 40 min of training at 80% intensity. Pedaling rate was maintained at 60 revolutions per minute (RPM). To achieve the desired bike resistance and adjust the intensity level (if needed), work intensity was set in terms of power output (W) while participants' heart rate was monitored. In addition, the rating of perceived exertion (Borg scale; Borg, 2012) was assessed during each training session. The program lasted 12 weeks, with three training sessions per week. Each participant included in the data analyses attended at least 75% of the sessions. Trained kinesiologists supervised all sessions.
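The week-by-week progression described above can be made explicit with a short sketch. It assumes that each weekly step applies from the first session of the following week and that "intensity" means a percentage of the power output reached at VO<sub>2</sub> peak on the pre-test; the helper names and the 120 W peak power used in the example loop are hypothetical.

```python
import math


def training_schedule(weeks=12, start_min=20, start_pct=60,
                      step_min=5, step_pct=5, max_min=40, max_pct=80):
    """Return (week, session_minutes, intensity_pct) for each training week.

    Duration and intensity start at the given values, rise by one step per
    week, and are capped at their maxima for the remaining weeks.
    """
    schedule = []
    for week in range(1, weeks + 1):
        minutes = min(start_min + step_min * (week - 1), max_min)
        pct = min(start_pct + step_pct * (week - 1), max_pct)
        schedule.append((week, minutes, pct))
    return schedule


def target_watts(peak_watts, intensity_pct):
    """Target power output (W) as a percentage of pre-test peak power."""
    return peak_watts * intensity_pct / 100


def min_sessions(weeks=12, per_week=3, adherence=0.75):
    """Smallest number of attended sessions that meets the adherence threshold."""
    return math.ceil(weeks * per_week * adherence)


# Hypothetical participant with a 120 W peak power output
for week, minutes, pct in training_schedule():
    print(f"week {week:2d}: {minutes} min at {pct}% -> {target_watts(120, pct):.0f} W")
print("sessions needed for 75% adherence:", min_sessions())
```

Under these assumptions, duration and intensity both reach their ceilings in week 5, so weeks 5 to 12 are performed at 40 min and 80%; with 36 scheduled sessions, the 75% participation criterion corresponds to 27 sessions.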

## Assessments

Participants were evaluated on a variety of outcome measures before the intervention and immediately after completion of the 3-month exercise program.

Lower limb capacities were assessed with the GaitMat II (E.Q. Inc., Chalfont, PA, USA; Barker et al., 2006). The GaitMat II consists of a 7.8-m walkway and computer software that controls the mat sensors and calculates various gait metrics. The mat also has 1-m inactive sections at the start and end, which allow the participant to accelerate and decelerate. In each trial, participants walked the full length of the mat at their self-selected walking speed. After two practice trials, participants completed four more trials for data collection and measurement (walking speed, cadence, step length, step width, single and double support time, as well as variability of step length, step width and double support time).
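To make some of the listed metrics concrete, the sketch below derives them from a sequence of heel strikes on the active section of the walkway. It is an assumption-laden illustration, not the GaitMat II software: the input format `(time_s, x_m, y_m)`, the definition of step width as the lateral distance between consecutive contralateral heel strikes, and the expression of variability as a coefficient of variation are hypothetical choices, and the footfall values are invented.

```python
from statistics import mean, stdev


def gait_metrics(strikes):
    """Spatiotemporal gait metrics from consecutive heel strikes.

    strikes: list of (time_s, x_m, y_m) for alternating left/right heel
    strikes on the active part of the walkway, with x along the walking
    direction and y the lateral position.
    """
    steps = list(zip(strikes, strikes[1:]))
    step_lengths = [s1[1] - s0[1] for s0, s1 in steps]
    step_widths = [abs(s1[2] - s0[2]) for s0, s1 in steps]
    duration = strikes[-1][0] - strikes[0][0]
    distance = strikes[-1][1] - strikes[0][1]
    return {
        "speed_m_s": distance / duration,
        "cadence_steps_min": 60 * len(steps) / duration,
        "step_length_m": mean(step_lengths),
        "step_width_m": mean(step_widths),
        # variability expressed as a coefficient of variation (SD / mean, in %)
        "step_length_cv_pct": 100 * stdev(step_lengths) / mean(step_lengths),
    }


strikes = [(0.0, 0.0, 0.0), (0.5, 0.6, 0.1), (1.0, 1.2, 0.0),
           (1.5, 1.8, 0.1), (2.0, 2.4, 0.0)]  # invented footfalls
print(gait_metrics(strikes))
```

Single and double support times would additionally require toe-off events, which this sketch does not model.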

Table 1 notes: Means ± SD. HA, healthy adults; PD, individuals with Parkinson's disease; N/A, not applicable; s, seconds; m, meters; min, minute.

To evaluate participants' mood, the Beck Depression Inventory (BDI; Beck et al., 1961) and the Beck Anxiety Inventory (BAI; Beck et al., 1988) were used. Executive functions were also assessed, specifically cognitive inhibition and flexibility. Inhibitory ability was assessed using a version of the Stroop test with three conditions (naming, reading, and interference). Each condition contained 100 stimuli (i.e., words, colored rectangles, or color words printed in colored ink) on a 21.5 cm × 28 cm sheet of paper. In the reading condition, participants had to read color words (red, green, blue, and yellow) printed in black. In the naming condition, they had to name the color of the rectangles. In the third (interference) condition, they had to name the color of the ink in which the words were written. In this condition, the meaning of each word had to be ignored, as it was incongruent with the color to be named (e.g., the word "blue" written in yellow). The Trail Making Test (TMT) was used to assess flexibility. The first part of the test (TMT A) included the numbers 1 to 25, circled and printed on a 21.5 cm × 28 cm sheet of paper. Participants were asked to connect the numbers in numerical order with a pencil, as fast as possible. The second part (TMT B) included the numbers 1 to 13 and the letters A to L. Participants were asked to connect, as fast as possible, numbers and letters in alternation, in numerical and alphabetical order, respectively (i.e., 1-A-2-B, etc.). Participants' MSL capacity was evaluated during a functional magnetic resonance imaging session, in which they performed an implicit serial reaction time task (Nissen and Bullemer, 1987). More details on the complete fitness, psychological, neuropsychological and motor learning evaluations can be found in our previous study (Duchesne et al., 2015).
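The text does not specify how the Stroop and TMT scores were derived. One common convention, shown here as a hypothetical sketch rather than the study's scoring (which is described in Duchesne et al., 2015), expresses inhibition as the extra completion time of the interference condition over color naming, and flexibility as the TMT B minus TMT A difference (or B/A ratio), so that lower values mean better performance. All completion times below are invented.

```python
def stroop_interference(naming_s, interference_s):
    """Inhibition cost: extra completion time (s) of interference over naming."""
    return interference_s - naming_s


def tmt_flexibility(tmt_a_s, tmt_b_s):
    """Flexibility indices: B - A difference (s) and B / A ratio."""
    return tmt_b_s - tmt_a_s, tmt_b_s / tmt_a_s


def improvement(pre, post):
    """Pre-post change for a cost score where lower is better (positive = gain)."""
    return pre - post


pre = stroop_interference(naming_s=70.0, interference_s=125.0)   # 55.0 s
post = stroop_interference(naming_s=68.0, interference_s=113.0)  # 45.0 s
print(pre, post, improvement(pre, post))
print(tmt_flexibility(tmt_a_s=40.0, tmt_b_s=100.0))
```

Subtracting the naming time removes basic processing speed from the interference score, which is why a difference (or ratio) score is usually preferred over the raw interference time when the question is inhibition specifically.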

#### Statistical Analysis

A repeated-measures ANOVA was used to test the effect of AET on primary and secondary outcomes in PD participants. In addition, a mixed-model ANCOVA was carried out to assess group differences pre- and post-AET, as well as the effect of training within each group and group differences at baseline and after AET, for all gait parameters. BDI scores and age were used as covariates in all analyses to account for group differences in depressive symptoms and age, two factors that may affect gait parameters such as walking speed (Rochester et al., 2008). To account for multiple comparisons, statistical significance was adjusted using the Bonferroni method. Descriptive statistics are expressed as means ± standard deviations. Pearson linear correlations of walking speed, cadence and step length with aerobic capacity, executive functions (inhibition and flexibility) and MSL (performance and learning scores) were tested to determine whether gait was linked to these factors among PD participants only. We then used Hayes's (2009) free add-on SPSS macro to test whether the disease (present/absent) moderated the relationship between gait parameters and other variables of interest, i.e., whether the relation between variables in PD goes in the same direction as in the reference group. Training-related changes in cognitive scores, MSL, and aerobic capacity were the independent variables in the moderation model, while gait parameters that changed significantly as a result of training were the outcome variables. Analyses were conducted using SPSS 21.0 (IBM, Armonk, NY, USA). The level of statistical significance for all tests was set at p < 0.05.
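The Bonferroni adjustment mentioned above multiplies each p-value by the number of tests in the family (capping the result at 1), which is equivalent to comparing raw p-values against α/m. The minimal sketch below uses hypothetical p-values; how the study grouped comparisons into families is not stated here.

```python
def bonferroni(p_values, alpha=0.05):
    """Return Bonferroni-adjusted p-values and reject/retain decisions."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return adjusted, [p_adj < alpha for p_adj in adjusted]


# Three hypothetical comparisons in one family
adjusted, reject = bonferroni([0.01, 0.02, 0.04])
print(adjusted, reject)
```

With three tests, only the first p-value survives (0.01 × 3 = 0.03 < 0.05); the correction is simple and conservative, which matters in small samples such as this one.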

#### RESULTS

Forty-four participants (21 PD patients and 23 HA) were deemed eligible to enroll in the study after completing the first evaluation. However, between the evaluation and the beginning of the exercise program, two HA withdrew from the project for personal reasons. One PD participant and one HA were excluded after the beginning of the training regimen for health and safety reasons. One further PD patient was excluded from the analysis after completing the AET regimen because of extreme results on several outcomes, even though this person met all inclusion criteria. In the end, a total of 39 participants (19 PD patients and 20 HA) were analyzed. All demographic characteristics and baseline values of the study participants are described in **Table 1**. There was no difference between groups in any of the gait parameters at baseline.

Following the 12-week AET, repeated-measures ANOVA indicated that PD participants showed significant improvements in walking speed (F<sub>1,18</sub> = 6.154, p < 0.05), step length (F<sub>1,18</sub> = 5.828, p < 0.05) and single support time (F<sub>1,18</sub> = 4.771, p < 0.05), with a trend for cadence (F<sub>1,18</sub> = 4.211, p = 0.055). With the mixed ANCOVA model, the effects observed in PD for walking speed and single support time remained, whereas the effect on step length disappeared (**Figure 1**). However, the trend observed for cadence became significant when age and BDI were included as covariates (p < 0.05). In addition, the groups did not differ significantly in either pre- or post-training comparisons, or in AET-related changes. No other gait parameter changed significantly following the aerobic training (ps > 0.05) (**Table 2**).
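For a single within-subject factor with two levels (pre vs. post), the repeated-measures F with (1, n − 1) degrees of freedom equals the square of the paired t statistic; for example, F<sub>1,18</sub> = 6.154 for walking speed corresponds to |t<sub>18</sub>| ≈ 2.48. The sketch below checks this identity on invented walking-speed data. It does not reproduce the covariate-adjusted mixed ANCOVA, which the authors ran in SPSS.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(pre, post):
    """Paired t statistic (df = n - 1) for post - pre differences."""
    diffs = [b - a for a, b in zip(pre, post)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def rm_anova_f(pre, post):
    """F(1, n - 1) for a one-way repeated-measures ANOVA with two time points."""
    n = len(pre)
    scores = pre + post
    grand = mean(scores)
    ss_total = sum((x - grand) ** 2 for x in scores)
    ss_time = n * ((mean(pre) - grand) ** 2 + (mean(post) - grand) ** 2)
    ss_subjects = 2 * sum(((a + b) / 2 - grand) ** 2 for a, b in zip(pre, post))
    ss_error = ss_total - ss_time - ss_subjects  # time-by-subject residual
    return (ss_time / 1) / (ss_error / (n - 1))


pre = [1.00, 1.10, 0.95, 1.20, 1.05]   # hypothetical walking speed (m/s)
post = [1.08, 1.15, 1.00, 1.22, 1.15]
t = paired_t(pre, post)
f = rm_anova_f(pre, post)
print(round(t ** 2, 6), round(f, 6))
```

Both printed values agree, because with two levels the time-by-subject error term is exactly half the sum of squared deviations of the individual difference scores.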

Significant between-session differences were found in both groups for outcomes related to aerobic capacity (VO<sub>2</sub> peak), MSL capacity and cognitive inhibition (all p < 0.05), indicating that training improved participants' fitness, procedural learning and cognitive inhibition, regardless of health status. Because these results were analyzed in detail and reported elsewhere (Duchesne et al., 2015), they are presented here only for the reader's convenience in the Supplementary Material (Supplementary Figure S1). However, these data were used to test correlations with gait parameters among PD participants. We observed a significant association only between walking speed at post-test and aerobic capacity after AET (r = 0.461, p < 0.05, N = 19). No correlation was observed between other gait parameters and cognition or MSL.
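As a quick consistency check on the reported correlation, a Pearson r can be converted to a t statistic with n − 2 degrees of freedom. The sketch below implements r from raw data and the r-to-t conversion, then applies the latter to the reported r = 0.461 with N = 19. The two-tailed critical value t<sub>17</sub> = 2.110 at α = 0.05 is taken from standard t tables and hard-coded as an assumption rather than computed.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic (df = n - 2) for testing H0: rho = 0."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


T_CRIT_17 = 2.110  # two-tailed critical t, alpha = 0.05, df = 17 (from tables)

t = r_to_t(0.461, 19)
print(round(t, 3), t > T_CRIT_17)
```

This gives t ≈ 2.14, just above the critical value, which is consistent with the reported p < 0.05 and also shows how close a correlation of this size sits to the significance threshold with only 19 participants.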

#### TABLE 2 | Spatiotemporal gait parameters during self-selected speed condition.

No significant differences between groups at baseline were found. Means ± SD. †A significant within-group difference from baseline. HA, Healthy adults; PD, Parkinson's disease patients; m, meters; s, second(s); min, minute; AET, aerobic exercise training. Bolded terms emphasize statistical differences.

A multiple regression model was then used to investigate whether the association between pre-post change in walking speed and change in fitness depended on the presence of disease (i.e., moderation). The relationship between change in fitness and change in walking speed was significantly moderated by the presence of the disease (R<sup>2</sup> increase due to the interaction: F<sub>1,32</sub> = 4.34, p < 0.05; conditional effect of change in fitness on change in walking speed: HA t = −1.08, p = 0.29, PD t = 1.79, p = 0.08). Specifically, patients with PD who increased their cardiorespiratory capacity the most also showed the greatest improvement in walking speed (**Figure 2**). By contrast, there was no significant relationship between these variables in HA (HA group: β = −0.162, p = 0.505; PD group: β = 0.596, p = 0.027). Also, no relationship between gait parameters and motor skill learning or executive functions was found, either directly or when testing moderation of these relationships by the presence of the disease.
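The moderation test reported above can be sketched as two nested least-squares regressions: a main-effects model, and a model that adds the product of the predictor and a 0/1 disease indicator, with the R<sup>2</sup> increase tested by an F statistic. This is a minimal NumPy illustration of the standard technique, not Hayes's SPSS macro. The study's own model may differ (for example, by including covariates), which would change the degrees of freedom, and the data below are invented.

```python
import numpy as np


def _ols_r2(X, y):
    """Ordinary least-squares fit; returns coefficients and R^2."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    centered = y - y.mean()
    return b, 1.0 - (resid @ resid) / (centered @ centered)


def moderation(x, y, group):
    """Test whether a 0/1 group variable moderates the slope of y on x."""
    x, y, g = (np.asarray(a, dtype=float) for a in (x, y, group))
    ones = np.ones_like(x)
    X_red = np.column_stack([ones, x, g])          # main effects only
    X_full = np.column_stack([ones, x, g, x * g])  # adds the x-by-group interaction
    _, r2_red = _ols_r2(X_red, y)
    b, r2_full = _ols_r2(X_full, y)
    df2 = len(y) - X_full.shape[1]
    f_change = (r2_full - r2_red) / ((1.0 - r2_full) / df2)
    slopes = {"group0": b[1], "group1": b[1] + b[3]}  # conditional effects of x
    return b, slopes, f_change, (1, df2)


x = [0, 1, 2, 3, 0, 1, 2, 3]          # hypothetical change in fitness
group = [0, 0, 0, 0, 1, 1, 1, 1]      # hypothetical coding: 0 = HA, 1 = PD
noise = [0.05, -0.05, 0.05, -0.05, -0.05, 0.05, -0.05, 0.05]
y = [1 + 0.5 * xi + 0.2 * gi + 1.0 * xi * gi + e
     for xi, gi, e in zip(x, group, noise)]  # hypothetical change in walking speed
b, slopes, f_change, df = moderation(x, y, group)
print(slopes, round(f_change, 2), df)
```

The conditional slope in the reference group is the coefficient of `x`, while the slope in the other group adds the interaction coefficient; a significant F for the R<sup>2</sup> increase indicates that the two slopes differ.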

#### DISCUSSION

In the current study, we investigated the effects of an AET regimen using stationary bicycling on gait parameters in sedentary HA and PD patients. As reported previously by our group (Duchesne et al., 2015), this training regimen improved cardiovascular capacity, executive functions and motor learning capacities in both groups. Here, we report that AET also had a significant positive impact on cadence and walking speed in the PD group. Moreover, the presence of the disease moderated the relationship between aerobic capacity and walking speed, as the improvement in fitness correlated positively with that in walking speed in PD patients only. Contrary to our expectations, we did not find significant relationships between AET-related changes in gait parameters and cognition.

Importantly and as predicted, however, the present study yielded significant increases in walking speed and cadence in the PD group. These findings are consistent with previous studies indicating that 4–12 weeks of treadmill training improved walking speed, step length and step-to-step variability (Herman et al., 2009). They also accord with other reports that resistance training, tai chi and physical therapy lead to improvements in walking speed and step length (Pellecchia et al., 2004; Li et al., 2012). Until now, studies of stationary bicycle training used a forced-exercise paradigm and observed improvements in dexterity, tremor and bradykinesia (Burini et al., 2006; Ridgel et al., 2009; Alberts et al., 2011). Finally, the fact that we did not find significant improvements in step length in PD patients, but observed significant increases in walking speed and cadence, may be due to the nature of our AET program. Indeed, pedaling has a built-in rhythm, so training-specific effects of bicycling would be expected to be more pronounced for cadence than for other gait parameters. This is similar to the mechanism by which treadmill training affects step length and walking speed more than other gait variables (Fisher et al., 2008; Herman et al., 2009).
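The link between walking speed, cadence and step length invoked above can be written as a simple kinematic identity for mean values, which shows why a speed gain can arise from cadence alone. With $v$ the walking speed (m/s), $\bar{L}$ the mean step length (m) and $C$ the cadence (steps/min):

$$
v = \bar{L}\,\frac{C}{60}
\quad\Longrightarrow\quad
\frac{v_{\text{post}}}{v_{\text{pre}}} = \frac{\bar{L}_{\text{post}}}{\bar{L}_{\text{pre}}}\cdot\frac{C_{\text{post}}}{C_{\text{pre}}}
$$

If step length is unchanged ($\bar{L}_{\text{post}} = \bar{L}_{\text{pre}}$), the relative change in speed equals the relative change in cadence; for instance, a hypothetical cadence increase from 110 to 115 steps/min would raise speed by about 4.5% at constant step length. This identity is a descriptive decomposition, not an estimate from the study's data.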

Contrary to our expectations, there was no significant relationship between changes in PD participants' executive functions, MSL, cardiovascular capacity and gait parameters. We had hypothesized a positive and significant relation between executive functions and gait in the PD group, mostly because previous evidence suggested that some elements of cognition, such as working memory and attention, are associated with gait abnormalities in PD (Smulders et al., 2013). Our hypothesis was also based on a recent study in which Sohmiya et al. (2012) found a significant correlation between Frontal Assessment Battery scores and changes in gait following physical therapy. It is important to note, however, that in that study, PD patients with high, but not low, executive functioning scores improved walking speed, stride and step length after physical training. Furthermore, unlike in our sample, PD participants in these studies were in more advanced stages of the disease. This suggests that the relationship between gait and executive functions may become more evident as the disease progresses, which may explain why we did not observe any relation between gait parameters and cognitive or learning functions. The fact that AET improved certain gait parameters and cognitive functions, but that these changes did not correlate with each other, suggests that independent mechanisms underlie these therapeutic improvements, at least at this stage of the disease.

We used moderation analyses to investigate the extent to which age influenced the relationship between changes in participants' fitness levels, executive functioning and motor learning capacities on the one hand, and gait parameters on the other. Using this approach, we found a significant moderation effect only for the disease variable, on the relationship between AET-related improvements in aerobic capacity and those in walking speed. It has previously been suggested that motor abilities in the PD population, such as gait, could be affected by various health conditions. For example, a decrease in cardiorespiratory capacity has been shown to affect walking speed (Skidmore, 2008). Our results, especially the positive correlation between improvements in VO<sub>2</sub> peak and those in walking speed in PD patients only, seem to support this hypothesis. In addition, they suggest that motor abilities may be improved in PD via non-pharmacological means, such as aerobic PE.
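Moderation of this kind is usually tested by adding an interaction term to a regression model. The Python sketch below is illustrative only: it assumes an ordinary least-squares model with disease status coded 0 (HA) / 1 (PD) as the moderator, uses made-up numbers, and is not the authors' analysis software or exact model. A moderation effect corresponds to a non-zero interaction coefficient `b3`, which makes the slope of walking-speed change on VO<sub>2</sub> peak change differ between groups.

```python
import numpy as np


def moderation_fit(x, group, y):
    """Fit y = b0 + b1*x + b2*group + b3*(x*group) by ordinary least squares.

    x     : change in aerobic capacity (e.g. delta VO2 peak)
    group : moderator, assumed coding 0 = healthy adult, 1 = PD
    y     : change in walking speed
    Returns [b0, b1, b2, b3]; b3 is the moderation (interaction) term.
    """
    x = np.asarray(x, dtype=float)
    group = np.asarray(group, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), x, group, x * group])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def simple_slopes(coef):
    """Slope of y on x within each group implied by the interaction model."""
    b0, b1, b2, b3 = coef
    return {"group0": b1, "group1": b1 + b3}


if __name__ == "__main__":
    # Synthetic, noise-free data: no slope in group 0, slope of 2 in group 1.
    delta_fitness = [0, 1, 2, 3, 0, 1, 2, 3]
    is_pd = [0, 0, 0, 0, 1, 1, 1, 1]
    delta_speed = [1, 1, 1, 1, 1.5, 3.5, 5.5, 7.5]
    coef = moderation_fit(delta_fitness, is_pd, delta_speed)
    print(np.round(coef, 3), simple_slopes(coef))
```

With these synthetic data the fit recovers approximately b3 = 2, i.e. a slope near 0 in group 0 and near 2 in group 1. A real analysis would also need standard errors and a significance test for b3, which this sketch omits.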

The main objective of our study was to assess the effects of AET in PD patients. We used the HA group to explore the possibility that AET may have a differential impact as a function of disease. However, we found no significant differences between these two groups with regard to any of the primary and secondary outcomes, whether at pre-test, at post-test, or in AET-related changes. This contrasts with the few studies that compared gait parameters in these two populations (Frenkel-Toledo et al., 2005; Bello et al., 2008). Although conjectural, the reason why we did not observe any group differences may be that we recruited PD patients who were in the early phase of the disease (i.e., Hoehn and Yahr stage 1 or 2), whereas previous reports included patients in more advanced stages (Frenkel-Toledo et al., 2005; Bello et al., 2008).

Several mechanisms have been proposed to explain physical training-related improvements similar to those seen in the present study. Some involve direct effects on the central nervous system, such as optimised medication intake through easier absorption (Speelman et al., 2011), increased corticomotor excitability (Fisher et al., 2008) and enhanced dopaminergic neurotransmission (Petzinger et al., 2010). Other proposed mechanisms are more indirect and include increased cortical vascularisation, synaptic plasticity and neurogenesis (Speelman et al., 2011). These processes, which could be mediated by neurotrophic factors, would lead to structural and functional brain changes. Although the present study does not allow us to identify the mechanism(s) underlying the effects of AET on the gait measures reported here, it is reasonable to assume that structural and functional brain changes could be at the origin of such clinical outcomes. For example, in rodents, gray matter volume in a region equivalent to the human supplementary motor area (SMA) has been shown to correlate positively with the total distance run by the animal after 7 days of exercise (Sumiyoshi et al., 2014). Similar increases, in both white and gray matter, have been reported in older sedentary human adults after a 6-month aerobic training regimen (Colcombe et al., 2006). Furthermore, a recent study using resting-state functional magnetic resonance imaging found significant changes in activity in sensorimotor areas in young individuals after 20 min of aerobic exercise (Rajab et al., 2014), while Mehta et al. (2012) demonstrated that functional brain activity in motor areas increased in proportion to movement rate on a pedaling task executed during scanning.
Finally, increased SMA activation has been observed during motor imagery of locomotor-related tasks (Malouin et al., 2003) as well as during real locomotion, as shown by electrophysiological studies reporting SMA modulation during walking (Petersen et al., 2012; Wagner et al., 2012, 2014; Seeber et al., 2015). Thus, despite the scarcity of studies assessing cerebral structural and functional changes in relation to gait parameters and aerobic training in humans, the above-mentioned studies provide initial evidence that AET can have a direct impact on the brain. Given the lack of neuroimaging studies on the effect of PE in PD, further investigations using neuroimaging techniques are needed to assess these mechanisms directly in PD patients.

An issue that merits discussion is the role of medication. Past research has demonstrated that medication itself may have a differential effect on movement rate and amplitude (Espay et al., 2009, 2010, 2011; Stegemöller et al., 2009; Teo et al., 2013, 2014). In the current study, we believe that medication did not play a major role in the outcomes of PD patients, because patients were always on their medication during both pre- and post-training evaluations, their medication did not change and, more importantly, assessments always took place at the same time of day.

Although our findings add to our knowledge of the effects of AET on gait, the current study has some limitations. First, it lacked an additional training condition controlling for the type and intensity of exercise that PD patients performed over the 3-month period. Despite this shortcoming, we believe that the results of the current study are theoretically and clinically relevant, as they suggest that AET on a stationary bicycle can have a beneficial impact in persons with PD. Another limitation is that we assessed only two cognitive functions, inhibition and flexibility, which are the most commonly assessed in the PD literature. Therefore, one cannot exclude the possibility that other cognitive domains may show correlations with gait parameters following physical training.

From a clinical perspective, high-intensity exercise is now a common rehabilitation method in PD. It is important to note that the AET effects in the current study were observed after a moderate- to high-intensity exercise regimen (half of the program was performed at 80% of maximal intensity). Given the stable and comfortable seated posture that stationary bicycling affords, our results suggest that it is not only a viable but also a safe training procedure for PD patients in stages 1 and 2 of the disease. Further studies are still needed to assess the safety and feasibility of a stationary-bicycle training regimen in patients in more advanced stages of the disease, who show greater physical limitations such as balance impairments, and to evaluate the long-term benefits of this training method. Nevertheless, our study shows that stationary bicycling can be used successfully to improve gait in PD patients. The main contribution of the current study is thus to show AET-related improvements in a gait parameter, walking speed, that is crucial for the daily functioning of sedentary patients in the initial stages of the disease. We therefore believe that this result brings us a step closer to a reliable strategy for stimulating an active lifestyle in patients with PD, taking into account safety issues and each patient's individual capacities.

#### ETHICS STATEMENT

RNQ - Research Ethics Mixed Committee. The protocol was approved by a research committee to ensure the full safety of the participants. The consent form was read with participants before obtaining their agreement to participate.

## AUTHOR CONTRIBUTIONS

Research project conception: CD, RP, FG, LB, and JD; Research project organization: CD, M-ÈR, AB, FB, and JD; Research project execution: AN, CD, M-ÈR, FB, and A-LL. Statistical analysis: AN and OL. Manuscript writing: AN, OL, and JD; Manuscript review and critique: AN, CD, OL, M-ÈR, AB, FB, RP, A-LL, FG, LB, and JD.

## FUNDING

The current study was supported by the Parkinson Society Canada (2014-709).

## ACKNOWLEDGMENTS

The results of the present study do not constitute endorsement by ACSM. The authors wish to thank Dr. Juan Manuel Villalpando and Dr. Thien Tuong Minh Vu who kindly accepted to supervise physical assessment during testing.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fnhum.2016.00690/full#supplementary-material




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Nadeau, Lungu, Duchesne, Robillard, Bore, Bobeuf, Plamondon, Lafontaine, Gheysen, Bherer and Doyon. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Dancing or Fitness Sport? The Effects of Two Training Programs on Hippocampal Plasticity and Balance Abilities in Healthy Seniors

Kathrin Rehfeld<sup>1,2</sup>\*†, Patrick Müller<sup>1,3</sup>†, Norman Aye<sup>1,2</sup>, Marlen Schmicker<sup>1</sup>, Milos Dordevic<sup>1,2</sup>, Jörn Kaufmann<sup>4</sup>, Anita Hökelmann<sup>2</sup> and Notger G. Müller<sup>1,3,5</sup>

<sup>1</sup> German Center for Neurodegenerative Diseases, Magdeburg, Germany, <sup>2</sup> Institute for Sport Science, Otto von Guericke University Magdeburg, Magdeburg, Germany, <sup>3</sup> Medical Faculty, Otto von Guericke University Magdeburg, Magdeburg, Germany, <sup>4</sup> Department of Neurology, Otto von Guericke University Magdeburg, Magdeburg, Germany, <sup>5</sup> Center for Behavioral Brain Sciences, Magdeburg, Germany

#### Edited by:

Claudia Voelcker-Rehage, Technische Universität Chemnitz, Germany

#### Reviewed by:

Dieter J. Meyerhoff, University of California, San Francisco, United States

Marco Taubert, Otto von Guericke University Magdeburg, Germany

\*Correspondence:

Kathrin Rehfeld
kathrin.rehfeld@ovgu.de

†These authors have contributed equally to this work.

Received: 08 November 2016 Accepted: 26 May 2017

Published: 15 June 2017

#### Citation:

Rehfeld K, Müller P, Aye N, Schmicker M, Dordevic M, Kaufmann J, Hökelmann A and Müller NG (2017) Dancing or Fitness Sport? The Effects of Two Training Programs on Hippocampal Plasticity and Balance Abilities in Healthy Seniors. Front. Hum. Neurosci. 11:305. doi: 10.3389/fnhum.2017.00305

Age-related degeneration in brain structure is associated with balance disturbances and cognitive impairment. However, neuroplasticity is known to be preserved throughout the lifespan, and physical training studies with seniors have revealed volume increases in the hippocampus (HC), a region crucial for memory consolidation, learning and spatial navigation, which were related to improvements in aerobic fitness. Moreover, a positive correlation between left HC volume and balance performance has been observed. Dancing seems to be a promising intervention for improving both balance and brain structure in the elderly. It combines aerobic fitness, sensorimotor skills and cognitive demands while carrying a low risk of injury. Hence, the present investigation compared the effects of an 18-month dancing intervention and traditional health fitness training on the volumes of hippocampal subfields and on balance abilities. Before and after the intervention, balance was evaluated using the Sensory Organization Test, and HC volumes were derived from magnetic resonance images (3T, MP-RAGE). Fourteen members of the dance group (67.21 ± 3.78 years, seven females) and 12 members of the fitness group (68.67 ± 2.57 years, five females) completed the whole study. Both groups showed hippocampal volume increases, mainly in the left HC (CA1, CA2, subiculum). The dancers showed additional increases in the left dentate gyrus and the right subiculum. Moreover, only the dancers achieved a significant increase in the balance composite score. Hence, dancing constitutes a promising candidate for counteracting the age-related decline in physical and mental abilities.

#### Keywords: dancing, fitness training, balance, hippocampus, aging

# INTRODUCTION

The human hippocampus (HC) is affected not only by pathological aging, as in Alzheimer's disease, but also by the normal aging process, resulting in deficits in memory, learning, and spatial navigation in old age (Driscoll et al., 2003; Barnes et al., 2009). Magnetic resonance studies indicate an atrophy rate of the hippocampus and the nearby parahippocampal gyrus of 2–3% per decade (Raz et al., 2004, 2005), which accelerates further in very old age, with an annual loss of 1% after the age of 70 (Jack et al., 1998). On the other hand, more recent research has shown that the HC is among the few brain regions able to generate new neurons throughout the lifespan (Kempermann et al., 2010; Spalding et al., 2013). In animal models, physical activity has been identified as a key mechanism that can drive this adult neuroplasticity (van Praag et al., 1999; Kronenberg, 2003). In humans, research has focused on the effects of aerobic fitness and training on the volume and perfusion of the HC. Results show that higher cardiorespiratory fitness levels (VO<sub>2</sub>max) are associated with larger hippocampal volumes in late adulthood, and that larger hippocampal volumes may, in turn, contribute to better memory function (Erickson et al., 2011; Szabo et al., 2011; Bugg et al., 2012; Maass et al., 2015). Furthermore, some investigations also assessed possible physiological mediators of the observed neuroplasticity, such as brain-derived neurotrophic factor (BDNF), insulin-like growth factor 1 (IGF-1), and vascular endothelial growth factor (VEGF) (Flöel et al., 2010; Erickson et al., 2011; Ruscheweyh et al., 2011; Maass et al., 2016). Whereas Erickson et al. (2011) reported a positive correlation between serum BDNF levels, hippocampal volume and cardiorespiratory fitness during 1 year of aerobic training, neither Ruscheweyh et al. (2011) nor Maass et al. (2016) found fitness-related BDNF changes after 6 or 3 months of training, respectively. Moreover, other studies failed to find correlations between the volumes of the medial temporal lobe or the hippocampus and cardiovascular fitness in healthy elderly people (Honea et al., 2009; Smith et al., 2011). Therefore, the role of cardiorespiratory fitness in modulating hippocampal gray matter volume is still under debate.
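To make the contrast between the two atrophy rates concrete, the short Python sketch below converts between per-year and per-decade losses. It assumes that loss compounds at a constant yearly rate and that both figures refer to comparable volumes; neither assumption is stated in the text (the 2–3% figure covers the hippocampus together with the parahippocampal gyrus).

```python
def decade_loss_from_annual(annual_rate: float, years: int = 10) -> float:
    """Fraction of volume lost after `years` years if `annual_rate` compounds yearly."""
    return 1.0 - (1.0 - annual_rate) ** years


def annual_rate_from_decade(decade_loss: float) -> float:
    """Constant yearly rate that compounds to `decade_loss` over ten years."""
    return 1.0 - (1.0 - decade_loss) ** (1.0 / 10)


if __name__ == "__main__":
    # 1 %/year (after age 70) expressed as a loss per decade
    print(f"1%/year -> {decade_loss_from_annual(0.01):.1%} per decade")
    # 2-3 %/decade expressed as an equivalent yearly rate
    for loss in (0.02, 0.03):
        print(f"{loss:.0%}/decade -> {annual_rate_from_decade(loss):.2%} per year")
```

Under these assumptions, 1% per year corresponds to roughly 9.6% per decade, several times the 2–3% per decade cited above.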

The hippocampus is also involved in spatial navigation (O'Keefe, 1990) and in motor sequence consolidation (Albouy et al., 2008), suggesting that motor skill learning and motor fitness can affect hippocampal volume without any cardiorespiratory change. In this respect, Niemann et al. (2014) tested whether 12 months of cardiovascular or coordination training induces larger increases in hippocampal volume in healthy older participants. After training, the cardiovascular group showed a significant volume increase of 4.22% in the left HC and a non-significant increase of 2.98% in the right HC. Effects of the coordination training were more pronounced in the right HC, with an increase of 3.91%, whereas the change in the left HC (1.78%) was non-significant. Further correlation analyses between motor fitness and hippocampal volume did not reach significance. Still, there is compelling evidence that the human brain undergoes morphological alterations in response to motor skill learning (Draganski et al., 2004; Boyke et al., 2008; Taubert et al., 2010; Sehm et al., 2014). Along these lines, a recent study demonstrated structural brain changes after only two sessions of dynamic balance training, which correlated with the participants' individual motor skill learning success (Taubert et al., 2010). Sehm et al. (2014) demonstrated that 6 weeks of balance training induced gray matter increases in the left HC in healthy seniors. These findings highlight the behavioral relevance of structural brain plasticity in the HC for the learning process. Hüfner et al. (2011) reported that long-term balance training, with its extensive vestibular, visual and sensorimotor stimulation, is associated with altered hippocampal formation volumes in professional ballet dancers and slackliners. Hence, the HC seems crucial not only for long-term memory consolidation, learning and spatial navigation, but also for balance.
Intact balance is essential for social mobility and quality of life in aging (Dordevic et al., 2017). Hence, physical intervention programs should take this function into account, too.

In this respect, dancing seems to be a promising intervention, since it requires the integration of sensory information from multiple channels (auditory, vestibular, visual, somatosensory) and fine-grained motor control of the whole body. Behavioral studies have already provided evidence of better performance in balance and memory tasks in elderly dancers (Kattenstroth et al., 2010, 2013; Rehfeld et al., 2014), but the underlying neural mechanisms have not yet been addressed comprehensively. Knowing that aerobic, sensorimotor and cognitive training contribute to hippocampal volume, which also seems to be associated with balance capabilities, we initiated a prospective, randomized longitudinal trial over a period of 18 months in healthy seniors. Two interventions were compared: a specially designed dance program, during which subjects constantly had to learn new choreographies, and a traditional fitness program with mainly repetitive exercises, such as cycling on an ergometer or Nordic walking. Whole-brain analyses of the acquired data using voxel-based morphometry (VBM) had shown dance-associated volume increases mainly in the precentral and parahippocampal gyri (Müller et al., 2016). Knowing that dancing/slacklining (Hüfner et al., 2011) and endurance sport (e.g., Erickson et al., 2011) have different effects on anterior and posterior parts of the hippocampus, in the present analysis we ran a region-of-interest analysis of this specific brain region. To do so, we first computed a restricted VBM analysis with a hippocampal mask. Next, we divided the hippocampus into five subfields to allow a detailed analysis of the interventions' effects on different parts of the HC. The hippocampus is not a homogeneous structure but consists of histologically specialized subfields, such as the subiculum, cornu ammonis (CA) 1–4 and dentate gyrus (DG).
The subiculum has been implicated in working memory and spatial relations (Riegert et al., 2004; O'Mara, 2005). CA3 and DG have been suggested to be involved in memory and early retrieval, and CA1 in late retrieval, consolidation and recognition. The DG in particular is one of the few regions of the adult brain where neurogenesis takes place, which is important for the formation of new memories and for spatial memory (Saab et al., 2009). Nevertheless, all these subfields are tightly interconnected (Duvernoy, 2005). Since dancing seems to promote spatial orientation and working memory and might promote neurogenesis, we expected volume changes in more HC subfields after this intervention. Moreover, given the importance of intact balance for successful aging on the one hand and its dependence on the hippocampus on the other, we also assessed the effects of the interventions on balance capabilities and their relation to hippocampal subfield volumes.

# MATERIALS AND METHODS


# Study Design and Subjects

This investigation of hippocampal volume alterations and changes in balance abilities is part of a large prospective longitudinal study comparing the effects of dancing versus aerobic training on brain structure and function, on mediating neuroplasticity factors such as BDNF, and on cognitive and motor performance in healthy seniors. The cognitive development and BDNF changes are reported in our recent paper (see Müller et al., 2017). The intervention lasted 18 months and included three measurement time points: a baseline pre-test, a first post-test after 6 months of training and a second post-test after 18 months of training (see **Figure 1**). The temporal dynamics of gray matter plasticity have already been reported by Müller et al. (2017), showing a significant increase in gray matter volume in the parahippocampal gyrus only for the dancers. Based on that finding, we consider here only changes from baseline to the second post-test (18 months).

The approval for the study was obtained from the ethics committee of the Otto von Guericke University, Magdeburg. All subjects signed a written informed consent and received a reimbursement for their participation.

The timeline of the study is depicted in **Figure 1**. Initially, we invited 62 healthy elderly volunteers aged 63–80 years for cognitive and physical screening as well as for verification of magnetic resonance imaging suitability. Exclusion criteria were: any history of severe neurological conditions, metal implants, claustrophobia, tinnitus, intensive physical engagement (more than 1 h/week), cognitive impairment as indicated by the MMSE (Folstein et al., 1975) and depressive symptoms (BDI-II > 13) (Beck et al., 2006). Fifty-two seniors met the inclusion criteria and were randomly assigned to the experimental dance group or the control sport group. After 18 months of training, 26 complete data sets remained, including 14 dancers and 12 sportsmen. The two groups (mean age = 67.9 ± 3.3 years) did not differ in age, sex, education, or BMI. For detailed demographic data, see **Table 1**.

#### Interventions

The interventions are described in detail elsewhere (Müller et al., 2016). In brief, in the first training period both groups trained twice a week for 6 months. Each dancing or fitness class lasted 90 min. For organizational reasons, the training frequency had to be reduced from twice a week to once a week after 6 months. The second training period ran for 12 months, with one 90-min session per week in both groups. The dance classes created a constant learning situation with continually changing choreographies, which participants had to memorize accurately. The training focused on elementary longitudinal turns, head-spins, shifts of the center of gravity (COG), single-leg stances, skips and hops, and different steps such as chassé, mambo, cha cha, grapevine and jazz square to challenge the balance system. Additional arm patterns induced imbalances (moving the arms away from the center of pressure).
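The two training phases can be turned into a nominal training dose. The Python sketch below assumes that 6 months correspond to 26 weeks and 12 months to 52 weeks, with no holidays or missed sessions; these are illustrative assumptions, so the result is an upper bound on scheduled exposure, not the dose participants actually received.

```python
from dataclasses import dataclass


@dataclass
class TrainingPhase:
    weeks: int                 # assumed phase length in calendar weeks
    sessions_per_week: int
    minutes_per_session: int


def nominal_dose(phases):
    """Return (sessions, hours) if every scheduled session were attended."""
    sessions = sum(p.weeks * p.sessions_per_week for p in phases)
    minutes = sum(p.weeks * p.sessions_per_week * p.minutes_per_session for p in phases)
    return sessions, minutes / 60


# Hypothetical calendar: 6 months ~ 26 weeks, 12 months ~ 52 weeks, no breaks.
schedule = [
    TrainingPhase(weeks=26, sessions_per_week=2, minutes_per_session=90),  # months 1-6
    TrainingPhase(weeks=52, sessions_per_week=1, minutes_per_session=90),  # months 7-18
]

if __name__ == "__main__":
    print(nominal_dose(schedule))
```

Under these assumptions the schedule gives 104 sessions, or 156 h, per participant.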

The program for the sport group was designed according to the recommended guidelines for health sport (Brehm et al., 2006) and included endurance training, strength-endurance training, and flexibility training (stretching and mobility). Each of these components (endurance, strength-endurance, and flexibility) was trained for 20 min; a 10-min warm-up, a 10-min cool-down and short breaks between the exercises totaling another 10 min completed each 90-min session. Thus, both groups exercised for 90 min in each training session. In the first 6 months, endurance training was performed on bicycle ergometers, with the intensity adjusted to the individual training heart rate (HR) using the Karvonen formula:

$$\text{Target training HR} = \text{Resting HR} + 0.6\,(\text{Maximum HR} - \text{Resting HR})$$
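The Karvonen formula can be expressed as a small function. In the Python sketch below, the resting and maximum heart rates are hypothetical inputs; the text does not say how maximum HR was obtained (measured or age-predicted), and the intensity factor defaults to the 0.6 used for the fitness group.

```python
def karvonen_target_hr(resting_hr: float, max_hr: float, intensity: float = 0.6) -> float:
    """Target training heart rate via the Karvonen (heart-rate-reserve) formula.

    intensity: fraction of the heart-rate reserve; 0.6 is the factor used in the text.
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must be between 0 and 1")
    if max_hr <= resting_hr:
        raise ValueError("max_hr must exceed resting_hr")
    reserve = max_hr - resting_hr          # heart-rate reserve
    return resting_hr + intensity * reserve


if __name__ == "__main__":
    # Hypothetical participant: resting HR 70 bpm, maximum HR 160 bpm
    print(karvonen_target_hr(70, 160))
```

For this hypothetical participant the target is 70 + 0.6 × 90 = 124 bpm.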

A factor of 0.6 represents extensive aerobic training (Davis and Convertino, 1975). In the second training period (12 months), the participants completed a Nordic walking program. The strength-endurance training aimed to strengthen the major muscles of the musculoskeletal system. In this program, we avoided combined arm and leg movements in order to keep coordinative demands low.

TABLE 1 | Demographic information at baseline of analyzed participants (N = 26).

BMI, Body Mass Index; BDI-II, Beck Depression Inventory II; MMSE, Mini Mental State Examination; M, Mean; SD, Standard deviation; p ≤ 0.05 = statistical significance.

# Structural MRI Acquisition, Preprocessing, and Analysis

Magnetic resonance (MR) images were acquired on a 3 Tesla Siemens MAGNETOM Verio (Syngo MR B17) using a 32-channel head coil. T1-weighted MPRAGE images (224 sagittal slices, voxel size: 0.8 mm × 0.8 mm × 0.8 mm, TR: 2500 ms, TE: 3.47 ms, TI: 1100 ms, flip angle: 7°) were analyzed using region-of-interest (ROI)-based voxel-based morphometry with SPM12 (Wellcome Department of Cognitive Neurology, London, United Kingdom) running under MATLAB (The MathWorks). Preprocessing involved gray matter segmentation, DARTEL-based template creation, spatial normalization to MNI space and smoothing with a 5 mm Gaussian kernel, as previously described.

# Voxel-Based Morphometry with Hippocampal Mask

To incorporate our a priori hypotheses concerning hippocampal gray matter volume changes, we first conducted an ROI-VBM analysis with hippocampal masks. The longitudinal analysis of hippocampal gray matter volume changes was performed using repeated-measures ANOVAs in a full factorial design. We applied a threshold of p < 0.05 (FDR-corrected).
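The FDR criterion can be illustrated with the Benjamini–Hochberg step-up procedure. The Python sketch below is a generic implementation over a list of p-values; it is not SPM12's code, and SPM's FDR correction (voxel-wise or topological, depending on settings) operates on statistical images rather than a short list, so treat it only as an explanation of what an FDR threshold controls.

```python
import numpy as np


def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean mask of tests declared significant under BH FDR control at level q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                       # indices that sort p ascending
    ranked = p[order]
    thresholds = q * np.arange(1, m + 1) / m    # (rank / m) * q for ranks 1..m
    below = ranked <= thresholds
    significant = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])        # largest rank meeting its threshold
        significant[order[: k + 1]] = True      # reject all hypotheses up to that rank
    return significant


if __name__ == "__main__":
    # Invented p-values, for illustration only
    print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]))
```

The step-up rule rejects every hypothesis up to the largest rank whose p-value falls under its rank-scaled threshold, which keeps the expected proportion of false discoveries at or below q.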

# Hippocampal Subfield Volume Measurements

In a second step, we analyzed volume changes in five subfields of the HC. To date, there is no gold standard for analyzing HC subfield volumes, and each of the many current analytic techniques has its strengths and weaknesses (Bandettini, 2009; Kuhnt et al., 2013). Here, to obtain ROI volumes for the hippocampal subfields, we used the SPM Anatomy Toolbox v.2.2.c (Eickhoff et al., 2007) with normalized images. This segmentation included the cornu ammonis (CA1–CA3), the dentate gyrus (DG, including CA4) and the subiculum (**Figure 2**). In the SPM Anatomy Toolbox, anatomical regions are defined on the basis of maximum probability cytoarchitectonic maps.
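As a rough illustration of how a maximum probability map turns probabilistic cytoarchitectonic maps into discrete labels, and how subfield gray matter volumes might then be read out, consider the NumPy sketch below. Small arrays stand in for image volumes; file loading, the toolbox's own thresholds and the exact volume definition the authors used (for example, whether modulated gray matter maps were summed) are assumptions not stated in the text, and the function names are hypothetical.

```python
import numpy as np


def maximum_probability_labels(prob_maps, min_prob=0.0):
    """Assign each voxel to the structure with the highest probability.

    prob_maps : array of shape (n_structures, *volume_shape), values in [0, 1]
    Returns an integer label volume: 0 = unassigned, k = structure k (1-based).
    """
    prob_maps = np.asarray(prob_maps, dtype=float)
    winner = np.argmax(prob_maps, axis=0) + 1      # 1-based index of most probable structure
    best = np.max(prob_maps, axis=0)               # its probability
    return np.where(best > min_prob, winner, 0)    # leave voxels with no support unlabeled


def roi_gray_matter_volumes(labels, gm_map, voxel_volume_mm3, n_structures):
    """Sum gray-matter values inside each label and scale by voxel volume (mm^3)."""
    gm_map = np.asarray(gm_map, dtype=float)
    return np.array([
        gm_map[labels == k].sum() * voxel_volume_mm3 for k in range(1, n_structures + 1)
    ])


if __name__ == "__main__":
    # Two hypothetical structures over a 4-voxel "volume"
    probs = np.array([[0.7, 0.2, 0.0, 0.5],
                      [0.1, 0.6, 0.0, 0.4]])
    labels = maximum_probability_labels(probs)
    print(labels, roi_gray_matter_volumes(labels, [1.0, 0.5, 1.0, 0.5], 0.512, 2))
```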

#### Postural Control

Postural control was assessed with the Sensory Organization Test (SOT) implemented in the Balance Master System (Neurocom International, Inc., United States). This test provides information about the contribution of the visual, somatosensory, and vestibular systems to the maintenance of balance. The system consists of a dual force platform with force transducers that measure the angular displacement of the COG, together with a visual surround. Both the visual surround and the platform can sway in the anterior/posterior direction, so that sway can be assessed under different conditions. The six conditions are: normal vision and fixed support (condition 1), absent vision and fixed support (condition 2), sway-referenced vision and fixed support (condition 3), normal vision and sway-referenced support (condition 4), absent vision and sway-referenced support (condition 5), and sway-referenced vision and sway-referenced support (condition 6). Each condition was performed in three 20-s trials, each yielding an equilibrium score. Equilibrium scores range from 0% (loss of balance) to 100% (perfect stability). From the equilibrium scores, a sensory analysis was performed by calculating ratios of the average scores of specific pairs of SOT conditions: the participant's ability to use somatosensory input to maintain balance is reflected by the average of condition 2 divided by the average of condition 1, the contribution of the visual system by the average of condition 4 divided by the average of condition 1, and that of the vestibular system by the average of condition 5 divided by the average of condition 1.
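The sensory-analysis ratios are simple quotients of condition averages. The Python sketch below computes them from per-trial equilibrium scores; the dictionary layout and function name are hypothetical, and the example scores are invented.

```python
from statistics import mean


def sensory_ratios(eq_scores):
    """Sensory-analysis ratios from SOT equilibrium scores.

    eq_scores maps condition number (1-6) to a list of trial scores (0-100).
    Each ratio divides the average of one condition by the average of condition 1.
    """
    c1 = mean(eq_scores[1])
    if c1 == 0:
        raise ValueError("condition 1 average must be non-zero")
    return {
        "somatosensory": mean(eq_scores[2]) / c1,   # eyes closed, fixed support
        "visual": mean(eq_scores[4]) / c1,          # eyes open, sway-referenced support
        "vestibular": mean(eq_scores[5]) / c1,      # eyes closed, sway-referenced support
    }


if __name__ == "__main__":
    # Invented scores for one participant (three trials per condition)
    scores = {1: [92, 94, 93], 2: [88, 90, 89], 3: [85, 87, 86],
              4: [75, 80, 78], 5: [55, 60, 58], 6: [50, 57, 52]}
    print(sensory_ratios(scores))
```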

The composite score was calculated by averaging the scores for conditions 1 and 2 separately, adding these two averages to the equilibrium scores from each trial of sensory conditions 3, 4, 5, and 6, and dividing that sum by the total number of trials (NeuroCom Natus Medical Incorporated, 2008).
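The composite-score rule can be written out as follows. This Python sketch reads the description literally: conditions 1 and 2 each contribute one averaged value, every trial of conditions 3–6 contributes individually, and the sum is divided by the number of contributed terms (14 when each condition has three trials). The data layout is an assumption, and any scores passed in are illustrative.

```python
from statistics import mean


def sot_composite(eq_scores):
    """Composite equilibrium score from per-trial SOT scores.

    eq_scores maps condition number (1-6) to a list of trial scores (0-100).
    Conditions 1 and 2 enter as their trial averages; conditions 3-6 enter
    trial by trial; the sum is divided by the number of entered terms.
    """
    terms = [mean(eq_scores[1]), mean(eq_scores[2])]
    for cond in (3, 4, 5, 6):
        terms.extend(eq_scores[cond])
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Invented scores: 90 in conditions 1-2, 76 in every trial of conditions 3-6
    scores = {1: [90] * 3, 2: [90] * 3, 3: [76] * 3, 4: [76] * 3, 5: [76] * 3, 6: [76] * 3}
    print(sot_composite(scores))
```

Because conditions 1 and 2 are collapsed to single averages, the more challenging conditions 3–6 dominate the composite (12 of 14 terms).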

## Statistical Analysis

Statistical analyses of hippocampal volumes and balance data were performed with SPSS 22 (IBM). Intervention effects were tested using repeated-measures ANOVAs with group (dance, sport) as the between-subject factor and time (pre, post) as the within-subject factor. Age, gender, and total hippocampal volume were included as covariates. Additionally, hypothesis-driven t-tests (with Bonferroni adjustment) were performed to determine longitudinal changes in the dance and sport groups separately. When data were not normally distributed, we used the Mann–Whitney U test or the Wilcoxon signed-rank test instead of t-tests. Pearson correlation analyses were performed between percentage changes in hippocampal subfield volumes and the balance composite score.
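The correlation step relates percentage changes in subfield volume to the balance composite score. The dependency-free Python sketch below shows percentage change, Pearson's r and a Bonferroni-adjusted per-test alpha. It does not reproduce SPSS, omits the p-value for r, and the number of tests passed to the Bonferroni helper is up to the analyst, since the text does not state how many comparisons were adjusted for.

```python
import math


def percent_change(pre, post):
    """Percentage change from pre to post for paired lists of values."""
    return [100.0 * (b - a) / a for a, b in zip(pre, post)]


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under Bonferroni adjustment."""
    return alpha / n_tests


if __name__ == "__main__":
    # Invented subfield volumes (mm^3) and composite-score changes for four people
    vol_change = percent_change([400, 420, 390, 410], [410, 425, 400, 412])
    composite_change = [3.0, 1.0, 4.0, 0.5]
    print(vol_change, pearson_r(vol_change, composite_change), bonferroni_alpha(0.05, 5))
```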

# RESULTS

The presentation of the results is structured as follows. We first tested for hippocampal volume differences after intervention using both masked VBM and subfield volume measurements. In the next step we investigated balance data and finally we looked for correlations between improvements in balance and hippocampal volume.

# Voxel-Based Morphometry with Hippocampal Mask

A two-sample t-test revealed no group differences at baseline. To explore hippocampal gray matter volume changes during the intervention, we used a repeated-measures ANOVA comparing baseline and post-test. There was a significant interaction effect in the right hippocampus [MNI-coordinates: x = 28, y = −16, z = −23; p(FDR) = 0.049, F = 17.03]. Post hoc paired t-tests showed significant volume increases in the right hippocampus only in the dance group [MNI-coordinates: x = 29, y = −16, z = −27; p(FDR) = 0.001, t = 6.10] (**Figure 3**).

## Hippocampal Subfield Volume Measurements

A two-sample t-test revealed no group differences in total hippocampal volume at baseline [t(25) = −1.078, p = 0.658, d = −0.424]. The repeated-measures ANOVA of hippocampal subfield volumes showed a main effect of time for left CA1, left CA2, left and right subiculum and left CA4/dentate gyrus (**Table 2**). There were no significant interactions with group. Paired t-tests showed significant volume increases in the dancers in left CA1, left CA2, left CA4/dentate gyrus and left and right subiculum, and in the sportsmen in left CA1, left CA2, and left subiculum (**Figure 4**).

# Postural Control

Repeated-measures ANOVAs of the balance data showed a time × group interaction effect for the composite equilibrium score (see **Table 3** and **Figure 5**).

There was a main effect of time for the somatosensory and vestibular contributions but no significant time × group interaction effect after 18 months of training (see **Table 3**). Post hoc tests revealed that the dancers improved in the use of all three sensory systems to maintain balance: somatosensory [t(13) = −2.902, p = 0.004], visual [t(12) = −2.525, p = 0.027], and vestibular [t(12) = −3.271, p = 0.007]. Members of the sports group improved in the use of the somatosensory [t(9) = −3.579, p = 0.006] and vestibular systems [t(9) = −3.881, p = 0.004] but not of the visual system. **Table 3** presents an overview of the significant changes in the use of sensory information to maintain balance from baseline to post-intervention for both groups.
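
The post hoc comparisons above are reported as t-tests with n − 1 degrees of freedom, which is consistent with paired pre/post comparisons. The negative t values would follow from computing differences as pre minus post, so that an improvement yields t < 0. This sign convention is an inference and is not stated in the text. The sketch computes the statistic and its degrees of freedom from hypothetical scores. A p-value would then come from the t distribution, for example via a statistics package, because the Python standard library does not provide one.

```python
import math
import statistics


def paired_t(pre, post):
    """Paired t statistic and degrees of freedom for differences (pre - post)."""
    diffs = [a - b for a, b in zip(pre, post)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return statistics.mean(diffs) / se, n - 1


# Hypothetical sensory-ratio scores of four participants.
pre = [1, 2, 3, 4]
post = [2, 4, 4, 6]
t, df = paired_t(pre, post)  # t < 0 because post-test scores are higher
```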

# Correlation Analysis

Correlation analysis between all hippocampal subfields and balance did not yield any significant results irrespective of whether the groups were analyzed separately or jointly.

# DISCUSSION

Animal research has shown that combining aerobic training with sensory enrichment is more effective at inducing neuroplasticity in the HC than physical exercise or sensory stimulation alone (Kempermann et al., 2010). This prompted us to investigate, in elderly humans, the impact on neuroplasticity of a specially designed dance program posing sensorimotor and cognitive challenges, in comparison with a classical cardiovascular fitness program. Extending our previous work (Müller et al., 2017), the present study ran a dedicated ROI analysis focused on the subfield volumes of the HC. The HC is of special interest because this brain structure (a) is especially affected by normal and pathological aging, (b) plays a key role in major cognitive processes such as memory and learning, and (c) is also involved in keeping one's balance, a function that is crucial for well-being and quality of life.

We observed that both dancing and fitness training led to increases in hippocampal subfield volumes. Although there was no significant group × time interaction in the omnibus ANOVA, exploratory post hoc t-tests indicated two differences. First, participants of the dance group showed volume increases in more subfields of the left HC (four out of five, including the DG). Second, only dancing led to an increase in one subfield of the right HC, namely the subiculum. Regarding balance abilities, dancing was superior to standard fitness training, as expressed by a larger increase in the composite score of our balance test and improved use of all three sensory systems. However, we did not observe a correlation between changes in HC subfield volumes and changes in balance. In other words, whether the observed improvement in balance can be attributed to the HC remains an open question.

TABLE 2 | Statistical values of repeated-measures ANOVAs for hippocampal subfields.

CA, cornu ammonis; DG, dentate gyrus; p ≤ 0.05 = statistical significance. Level of significance: ∗ 0.01 ≤ p < 0.05; ∗∗ 0.001 ≤ p < 0.01; ∗∗∗ p < 0.001.

TABLE 3 | Statistical values of repeated-measures ANOVAs for sensory organization of balance.

Level of significance: ∗ 0.01 ≤ p < 0.05; ∗∗ 0.001 ≤ p < 0.01; ∗∗∗ p < 0.001.

Regarding the HC volume increases observed in both groups, our results support the assumption that HC volume can be enhanced by physical fitness training alone, because this was the feature common to both interventions. Animal studies have shown that adult neurogenesis takes place mainly in the DG of the HC (van Praag et al., 1999). Interestingly, only the dancers showed an increase in this region. Whether adult neurogenesis was indeed the basis of the volume change observed here must remain an open question, however, because there is no direct way of assessing this process in humans.

The dancers showed increases in some HC subfields in which no change was observed in the sports group. This indicates that, apart from physical fitness, other factors inherent in dancing also contribute to HC volume changes. Animal research has suggested that sensory enrichment may be such a factor, with physical fitness and enrichment having different effects on HC neurons: running in a wheel generates new neurons in the HC of mice, but these neurons survive only when sensory stimulation is also present (Kempermann et al., 2010). Again, our own data cannot differentiate between these processes. We can nevertheless conclude that the additional challenges of our dance program, namely cognitive and sensorimotor stimulation, induced HC volume changes beyond those attributable to physical fitness alone. Notably, other studies in elderly humans have also observed HC volume increases after interventions that did not boost physical fitness but were sensorimotor demanding, such as learning to juggle (Boyke et al., 2008).

Only the dancers showed an increased balance composite score, and they improved in the use of all three sensory systems involved. This indicates that dancing engages all three sensory systems and presumably also improves the integration of somatosensory, visual and vestibular information. Balancing is an important everyday function, crucial, for example, for social mobility. Impaired balance often results in falls, which are a major health risk. Falls increase morbidity (and even mortality) and health care costs (see also Dordevic et al., 2017). The ability to balance has also been linked to the HC and its connections, for example, to the vestibular system (Brandt et al., 2005). Nevertheless, we did not observe a correlation between HC subfield volumes and improvements in balance. Given our small sample, this finding must be interpreted with care. It may suggest that other brain regions, probably those described in our earlier analysis (Müller et al., 2017), were involved in these improvements. Alternatively, changes in the HC that are not expressed in measurable volume, such as changes in synaptic function or perfusion, may have contributed to this effect.

The present study has further limitations that should be mentioned. As described in Materials and Methods, we had to reduce the training frequency from twice a week to once a week after 6 months of training. Hence, it remains unclear whether more pronounced effects would have been observed had we been able to maintain the initial training intensity. Next, the ANOVAs did not yield significant group interaction effects, and only the exploratory t-tests were significant. This may be a consequence of the large number of factors and levels in the ANOVAs on the one hand and the small sample size on the other. A further limitation is the use of fully automated segmentation tools. Finally, the small sample size, the high drop-out rate, the highly selective inclusion and exclusion criteria, and the lack of an inactive control group limit the generalizability of our results.

In sum, the present results indicate that both dance and fitness training can induce hippocampal plasticity in the elderly, but only dance training improved balance capabilities.

However, larger studies with more representative samples are required. They should include additional analyses of mediating factors and should seek ways to adjust the training protocol optimally to each individual's needs and preferences. Above all, longitudinal randomized clinical trials are needed to investigate whether the proposed interventions indeed have the potential to reduce or postpone the risk of neurodegenerative diseases such as Alzheimer's disease, as suggested by large non-interventional studies (Verghese et al., 2003).

#### REFERENCES


#### AUTHOR CONTRIBUTIONS

KR was responsible for the study organization and execution, and wrote the text of the manuscript (Introduction, Discussion, and parts of the Materials and Methods: Study Design and Subjects). PM contributed equally to this work and wrote parts of the manuscript (Materials and Methods, Results, Discussion). NA assessed balance abilities, analyzed the data, and wrote parts of the Materials and Methods and Results (postural control). MS contributed to the statistical analysis and made corrections to the manuscript. MD supported the hippocampal subfield analysis and corrected the manuscript. JK contributed to the MRI measurements and the structural brain analysis, and corrected the manuscript. AH is the chief coordinator of this study, selected the motor skill tasks, and organized the framework. NM is the second chief coordinator of this study and provided the MRI measurements. He also worked on the Introduction and Discussion of this manuscript.



acquisition of spatial memory. Neuron 63, 643–656. doi: 10.1016/j.neuron.2009.08.014


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer MT declared a shared affiliation, though no other collaboration, with several of the authors KR, NA, MD, JK and AH to the handling editor, who ensured that the process met the standards of a fair and objective review.

Copyright © 2017 Rehfeld, Müller, Aye, Schmicker, Dordevic, Kaufmann, Hökelmann and Müller. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Improvements in Orientation and Balancing Abilities in Response to One Month of Intensive Slackline-Training. A Randomized Controlled Feasibility Study

Milos Dordevic<sup>1,2</sup>\*, Anita Hökelmann<sup>2</sup>, Patrick Müller<sup>1</sup>, Kathrin Rehfeld<sup>2</sup> and Notger G. Müller<sup>1,3</sup>

<sup>1</sup> Department of Neuroprotection, German Center for Neurodegenerative Diseases, Magdeburg, Germany, <sup>2</sup> Institute of Sports Science, Otto von Guericke University, Magdeburg, Germany, <sup>3</sup> Center for Behavioral Brain Sciences, Magdeburg, Germany

Background: Slackline-training has been shown to improve mainly task-specific balancing skills. Non-task-specific effects have been assessed for tandem stance and preferred one-leg stance on stable and perturbed force platforms with open eyes. It is unclear whether transfer effects exist for other balancing conditions and which component of balancing ability is affected. It is also not known whether slackline-training can improve non-visual-dependent spatial orientation abilities, a function mainly supported by the hippocampus.

Objective: To assess the effect of one month of slackline-training on different components of balancing ability and its transfer effects on non-visual-dependent spatial orientation abilities.

#### Edited by:

Claudia Voelcker-Rehage, Chemnitz University of Technology, Germany

#### Reviewed by:

Oliver Faude, University of Basel, Switzerland

Astrid Zech, Jena University, Germany

> \*Correspondence: Milos Dordevic milos.dordevic@dzne.de

Received: 15 September 2016 Accepted: 26 January 2017 Published: 10 February 2017

#### Citation:

Dordevic M, Hökelmann A, Müller P, Rehfeld K and Müller NG (2017) Improvements in Orientation and Balancing Abilities in Response to One Month of Intensive Slackline-Training. A Randomized Controlled Feasibility Study. Front. Hum. Neurosci. 11:55. doi: 10.3389/fnhum.2017.00055

Materials and Methods: Fifty subjects aged 18–30 years were randomly assigned to the training group (T) (n = 25, 23.2 ± 2.5 years; 12 females) and the control group (C) (n = 25, 24.4 ± 2.8 years; 11 females). Professional instructors taught the intervention group to slackline over four consecutive weeks, with three 60-min training sessions per week. Data acquisition was performed by blinded investigators at baseline and after the training (within 2 days). Main outcomes were improvement in the score of a 30-item clinical balance test (CBT) developed at our institute (max. score = 90 points) and in the average error distance (in centimeters) in an orientation test (OT). The OT was a triangle completion task with walking and wheelchair conditions for 60°, 90°, and 120°.

Results: The training group performed significantly better on the closed-eyes conditions of the CBT (1.6 points, 95% CI: 0.6 to 2.6 points vs. 0.1 points, 95% CI: –1 to 1.1 points; p = 0.011, η<sub>p</sub><sup>2</sup> = 0.128) and in the wheelchair (vestibular) condition of the OT (21 cm, 95% CI: 8 to 34 cm vs. 1 cm, 95% CI: –14 to 16 cm; p = 0.049, η<sub>p</sub><sup>2</sup> = 0.013).

Conclusion: Our results indicate that one month of intensive slackline-training is a novel approach for enhancing clinically relevant balancing abilities in conditions with closed eyes and for improving vestibular-dependent spatial orientation. Both benefits are likely due to a positive influence of slackline-training on vestibular function.

Keywords: balance, slackline-training, orientation, vestibular system, hippocampus

# INTRODUCTION

fnhum-11-00055 February 8, 2017 Time: 14:51 # 2

Intact balance control is required not only to maintain postural stability but also to ensure safe mobility-related activities in daily life (Mancini and Horak, 2010). Approximately 1.5% of healthcare expenditures in European countries are caused by falls, which mainly occur because of impaired balance, aging and cognitive decline (Ambrose et al., 2013). This figure does not include additional indirect costs. Prevention in the earliest stages, already at a young age, is therefore justified. Balance and strength training is considered by far the most efficient intervention for fall prevention (Karlsson et al., 2013), and it can improve postural and neuromuscular control. In addition, balance training is considered an effective intervention for improving static postural sway and dynamic balance in both athletes and non-athletes (Zech et al., 2010). Moreover, both gray and white matter alterations have been reported in young people after only six weeks of balance training (Taubert et al., 2010). Optimal interaction between the visual, vestibular and somatosensory systems is key to body stability. Visual deficits can be corrected in many different ways, whereas the other two systems are best enhanced through appropriate training interventions.

Several recent studies have demonstrated a particularly beneficial effect of slacklining on balancing abilities in both younger and older populations (Pfusterschmied et al., 2013; Thomas and Kalicinski, 2016), through enhancement of postural control and functional knee joint stability. Other studies, however, found mainly task-specific effects after six weeks of slackline training. These studies did not demonstrate larger non-task-specific effects on postural control, but they tested only a few relatively simple tasks, such as one-leg and tandem stance on a stable force platform surface (Donath et al., 2013, 2016b). Moreover, the amount of training in these studies was limited to approximately one hour per week. Slackline length in previous studies was set to between 5 and over 15 m. A longer line reduces the number of turns per training session in inverse proportion to its length, which limits vestibular stimulation mainly to the otolith organs and their output pathways. In other words, the function of the semicircular canals and related brain regions might have been underemployed, and an additional potential effect overlooked (Highstein, 1991; Cullen and Minor, 2002). Earlier research using several other types of balance-training interventions found transfer effects of training on performance in clinical tests of balance, and these tests are considered very important for both diagnostic and therapeutic purposes (Mancini and Horak, 2010). However, some questions remain unanswered: (1) can slackline-training cause such non-specific transfer effects on performance in a comprehensive clinical balance test (CBT), and (2) which component of balancing ability, as assessed by this test, is mainly affected by slackline-training? The vestibular component is of particular interest here.

The hippocampus and neighboring cortical regions are the main loci where Alzheimer's disease pathology first appears (Raskin et al., 2015), and these regions subsequently undergo progressive degeneration. Early preventive interventions (at a younger age) are therefore strongly encouraged (Brookmeyer et al., 2007). Several previous animal and human studies have pointed to a strong link between the vestibular system and the orientation centers of the brain, which are considered to be located in the hippocampus and neighboring regions (Stackman et al., 2002; Russell et al., 2003; Brandt et al., 2005; Jahn et al., 2009). These studies found serious deficits in the orientation function of the temporal lobe as a result of disturbed or lost vestibular input. Many other studies have also suggested that the vestibular system provides self-motion information that the hippocampus and related brain regions need to develop spatial memories; when this input is lost, spatial memory becomes impaired (Smith et al., 2010). Moreover, professionals who intensively use their vestibular system in their daily artistic performances, such as ballet/ice dancers (Pfusterschmied et al., 2011) and slackliners, have differently structured temporal brain regions, including the hippocampus, compared to non-professionals (Hüfner et al., 2011). Allen et al. (2004) clearly demonstrated an age-related decline in vestibular-kinesthetic orientation abilities by comparing the performance of younger and older adults on the triangle completion task. The older adults performed particularly poorly when their input was restricted to the vestibular system only (passively pushed in a wheelchair), implying a deterioration of this system with aging.
A question that remains unanswered is whether intensive slackline-training can significantly improve vestibular function, which could then benefit the spatial orientation abilities of the trained person. Therefore, we wanted to find out whether an especially challenging balance training program (learning to slackline) can also induce transfer effects on a cognitive function, namely spatial orientation. This hypothesis rested on two ideas: a) a strong connection between the vestibular system (which is important for balancing) and the hippocampus has been suggested, and b) spatial orientation is a function that is to a great extent supported by the hippocampus (Hitier et al., 2014). We chose intensive slacklining in young adults as an intervention under the assumption that, if this training cannot induce transfer effects, other less demanding regimens are unlikely to do so either. Such regimens include those typically used to enhance balancing skills in elderly, sick patients. In other words, this was a feasibility pilot study in a young population.

Thus, bearing in mind the close connection between the vestibular and orientation systems, we asked whether intensive slackline-training can improve not only one's ability to maintain balance but also one's ability to orient oneself in space. To our knowledge, no longitudinal study has yet investigated whether the vestibular-dependent orientation function of the temporal lobe can be enhanced through an intervention aimed at improving balancing skills. The goal of this study was to find out whether learning to slackline over a period of one month benefits both stability and orientation skills.

# MATERIALS AND METHODS


#### Ethics Statement

This study was carried out in accordance with the recommendations of the Medical Faculty Ethics Committee at the Otto von Guericke University, which approved the study (approval number: 156/14). Each participant signed an informed consent form before the beginning of the study.

### Subjects

Fifty healthy young subjects (18 to 30 years old) were recruited for this study and randomly assigned (without stratification) to two groups: control (12 females and 13 males; mean age = 23.2 years; SD = 2.6 years) and training (11 females and 14 males; mean age = 24.4 years; SD = 2.7 years) (**Table 1**). The two groups did not differ significantly in any of the recorded demographic or other characteristics, including age, height, weight, years of education and handedness. Physical activity was assessed by asking subjects how many hours per week they spend on sports on average. All sports were taken into account, including jogging, various team sports and cycling, but not walking. Participants of both groups were paid the same amount for their participation. Sample size and characteristics, as well as the duration of the balance training, were based on several previous slackline- and other balance-training studies (Zech et al., 2010; Pfusterschmied et al., 2013).

Eligible subjects were those aged 18 to 30 years who had no previous experience in slacklining or similar activities (i.e., highly demanding balancing activities such as ballet dancing or rhythmic gymnastics) and who had normal or corrected-to-normal vision. Exclusion criteria were injuries to the musculoskeletal system and systemic diseases (e.g., cardiovascular, metabolic or nervous system diseases). Participants were recruited through advertisements in the buildings of the Otto von Guericke University in Magdeburg, on both the main and the medical campus.



#### Study Design

The flow diagram of the study is shown in **Figure 1**. This study was planned and organized as a randomized controlled single-blinded trial with a factorial design (factors: time and group). Participants were randomly assigned to the training and control groups using a computer-based randomization procedure<sup>1</sup>. The randomization and assignment of participants to groups were performed by MD (not involved in data collection), and all other investigators were blinded to the outcome of the randomization.
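
The study used the web service at www.randomizer.org. As an illustrative stand-in only, an unstratified allocation of 50 participants into two equally sized groups can be sketched with Python's `random` module. The participant IDs and the seed are hypothetical.

```python
import random


def allocate(participant_ids, seed):
    """Shuffle IDs and split them into two equally sized groups (no stratification)."""
    rng = random.Random(seed)  # seeded so that the allocation list can be reproduced
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"training": sorted(ids[:half]), "control": sorted(ids[half:])}


groups = allocate(range(1, 51), seed=2015)  # hypothetical IDs 1..50 and seed
```

Seeding a private `random.Random` instance keeps the allocation reproducible for audit without affecting any other use of the global random generator.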

The study consisted of measurements at two time points: baseline and one month (±2 days) after baseline. All training sessions took place in the movement lab of our institute (German Center for Neurodegenerative Diseases) from February to April 2015.

## Intervention

During this one-month period, the training group underwent intensive balance training consisting of 12 sessions (three sessions per week, each lasting 1 h; max. 2 consecutive non-training days) on a 3-m-long slackline ("Power-wave 2.0" slackline rack). Meanwhile, the control group was instructed to abstain from any similar activity, and control group participants confirmed this abstinence at the post-test.
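
The scheduling constraints stated above (12 sessions, three per week, at most two consecutive non-training days) can be checked mechanically. The sketch below assumes a hypothetical Monday/Wednesday/Friday pattern starting on Monday, 2 February 2015, because the actual session dates are not reported.

```python
from datetime import date, timedelta


def mwf_schedule(first_monday, weeks=4):
    """Hypothetical Monday/Wednesday/Friday session dates."""
    return [first_monday + timedelta(weeks=w, days=offset)
            for w in range(weeks) for offset in (0, 2, 4)]


def max_rest_gap(sessions):
    """Largest run of consecutive non-training days between two sessions."""
    return max((later - earlier).days - 1
               for earlier, later in zip(sessions, sessions[1:]))


sessions = mwf_schedule(date(2015, 2, 2))
assert len(sessions) == 12          # 3 sessions/week over 4 weeks
assert max_rest_gap(sessions) <= 2  # the Friday-to-Monday weekend gap is the longest
```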

Training sessions were led and supervised by an experienced instructor, whose task was to help training group participants reach the best possible skill level; the training content is shown in **Table 2**. The minimum requirement was walking forward two slackline lengths with a turn at the end of the first length. Each participant had to meet this requirement to be included in the analysis, and all participants met it. Each session consisted of a 10-min warm-up and a 50-min training period. The maximum group size was four participants, so that the instructor could dedicate enough time to each trainee. Moreover, the sessions were highly individualized according to each participant's skill and progression level. At the end of each session, the instructor documented each participant's skill progression by noting the skill level achieved. To do so, the instructor recorded the time each participant needed to walk up to four slackline lengths forward, backward and sideways, turning in between.

The slackline tension was also individualized. When a participant stood in the middle and applied a light vertical force (as during walking), the slackline sagged to within several centimeters of the metal bar located 15 cm below it. Our goal was to increase the difficulty of training by keeping the slackline slack and thus more unstable, rather than tight and stable, which would resemble walking on a firm surface. The length of the slackline was also intentionally set to 3 m. In this way, we wanted to achieve a higher rate of turns on the slackline, and thus a higher rate of semicircular canal stimulation. Earlier studies, in contrast, used moderately to much longer slacklines (5 to over 15 m in length) (Granacher et al., 2010; Pfusterschmied et al., 2013; Donath et al., 2016b), thereby stimulating mainly the otolith organs and related central vestibular pathways.

<sup>1</sup>www.randomizer.org
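
The rationale that a shorter line yields more turns can be made concrete with simple arithmetic. For a fixed distance walked, the number of turns is one fewer than the number of complete line lengths. The sketch below assumes a hypothetical 60 m walked per exercise bout and ignores falls and restarts.

```python
def turns_for_distance(total_distance_m, line_length_m):
    """Turns made while walking back and forth along a slackline.

    One turn separates each pair of consecutive complete lengths; an
    incomplete final length is ignored.
    """
    lengths = int(total_distance_m // line_length_m)
    return max(lengths - 1, 0)


for length in (3, 5, 15):  # 3 m as in this study; 5-15 m as in earlier studies
    print(length, "m line:", turns_for_distance(60, length), "turns per 60 m")
```

Under this assumption, the 3-m line yields 19 turns per 60 m, compared with 3 turns on a 15-m line.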

# Tests

All tests were performed by two trained members of our institute and the results of the following sets of measurements were recorded before and after the training:

TABLE 2 | Contents participants were taught during slackline-trainings; the minimum difficulty level they had to achieve in order to be considered for the final analysis is also presented.


#### Clinical Balance Test

Because every CBT has its advantages and disadvantages (Mancini and Horak, 2010), experts at our institute (DZNE) developed this comprehensive test to assess different components of patients' ability to maintain equilibrium in both standing and gait conditions. Many of the test conditions are consistent with those of similar comprehensive CBTs (Mancini and Horak, 2010) used in other clinics. The interrater reliability of the test, determined with the intraclass correlation coefficient (ICC), is 0.98 ± 0.04 (SEM = 0.003); its validity has yet to be evaluated. The conditions can be broadly divided into standing and walking (**Figure 2**), both of which contain sub-conditions with open and closed eyes (for a detailed list of conditions, see **Table 3**).

Standing conditions include:

• two- and one-leg stance on both stable (floor) and unstable (soft pad) surfaces, with both open and closed eyes

Walking conditions include:

FIGURE 2 | Examples of clinical balance test (CBT) conditions: unstable surface one-leg stand (left) and balance beam walking (right) conditions.


In total, the test contains 30 assessment items, 14 of which assess standing and 16 walking; 8 of the items are performed with closed eyes. Each item is scored from 0 to 3 points, giving a maximum total score of 90 points, similar to other comprehensive CBT batteries (Horak et al., 2009). Assessment was based on the subjective judgment of a trained assessor who graded postural sway in each condition. To avoid differences in judgment between assessors, each participant was tested by the same assessor at both pre- and post-test. In each standing condition, participants were instructed to maintain the required position for 15 s. In walking conditions, there was no time requirement, and participants were asked to walk at their own pace.
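
The scoring scheme described above can be expressed as a short function. The test has 30 items scored from 0 to 3, giving a maximum of 90, and 8 closed-eyes items, giving a closed-eyes maximum of 24. The positions of the closed-eyes items within the item list are a hypothetical assumption here, because the full item list is given in Table 3.

```python
N_ITEMS = 30
MAX_ITEM_SCORE = 3
# Hypothetical positions of the 8 closed-eyes items within the 30-item list.
CLOSED_EYES_ITEMS = {2, 3, 6, 7, 10, 11, 24, 25}


def cbt_scores(item_scores):
    """Return (total score, closed-eyes subscore) for one CBT assessment."""
    if len(item_scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores, got {len(item_scores)}")
    if any(s not in range(MAX_ITEM_SCORE + 1) for s in item_scores):
        raise ValueError("each item must be scored 0, 1, 2 or 3")
    total = sum(item_scores)
    closed_eyes = sum(item_scores[i] for i in CLOSED_EYES_ITEMS)
    return total, closed_eyes


print(cbt_scores([MAX_ITEM_SCORE] * N_ITEMS))  # maximum possible: (90, 24)
```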

#### Orientation Test (OT)

The orientation test was a modified version of the test described by Allen et al. (2004). The only modification was that, due to time and space limitations, only three of its conditions (turning angles) were included. In brief, six triangular paths were marked on the floor of a room, three turning to the left and three to the right, giving three pairs of triangular paths. The segment lengths of each triangular path and the turning angles between the segments are presented in **Table 4**. Examples of the test course (polygon) and of the test conditions are shown in **Figures 3** and **4**. The test consisted of two conditions: active-walking and passive-wheelchair.

In the active-walking condition, the participant held onto a wooden bar and was guided by it on foot along two sides of the triangular path. The passive-wheelchair condition involved transport along the same routes in a standard wheelchair with attached footpads.

Each participant was walked (active) and pushed (passive) only once along each path, giving a total of 12 trials per participant (3 paths to the left and 3 to the right, in each of 2 conditions).

After being walked, or pushed in the wheelchair, along two sides of a triangle, the participant had to walk back to the starting point by the shortest possible route, which is always the third side of the respective triangle. Participants were explicitly instructed not to retrace the two sides along which they had been brought to the drop-off point.
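
The correct homing response follows from the triangle geometry. The sketch below makes assumptions about conventions, because Table 4 is not reproduced here. It treats the stated turning angle as the change in heading at the first vertex and uses hypothetical segment lengths. It returns the length of the third side together with the turn required at the drop-off point.

```python
import math


def homing_vector(a, b, turn_deg):
    """Return (distance, turn in degrees) needed to get back to the start.

    Assumes the participant starts at (0, 0) facing +x, moves `a` metres,
    turns by `turn_deg` (positive = left), and moves `b` metres.
    """
    theta = math.radians(turn_deg)
    x = a + b * math.cos(theta)  # drop-off point coordinates
    y = b * math.sin(theta)
    distance = math.hypot(x, y)  # length of the third side
    home_heading = math.atan2(-y, -x)
    # Turn from the current heading (theta) to the home heading, wrapped to [-180, 180).
    turn = math.degrees(home_heading - theta)
    turn = (turn + 180.0) % 360.0 - 180.0
    return distance, turn


d, turn = homing_vector(3.0, 4.0, 90.0)  # hypothetical 3-4-5 triangle
```

With these hypothetical lengths, the third side is 5 m and the participant would need to turn about 143° to the left. Negative turning angles mirror the path for right-hand triangles.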

The main outcome variable was the distance error on each trial. To assess it, the participant's stopping point was marked on the floor with an adhesive dot, and the distance from that point to the starting point of the respective movement was measured later. A second assessor placed the dots on the floor exactly between the participant's feet, approximating the center of pressure, so that the first assessor could focus on giving instructions and guiding participants. After each trial, participants were led or pushed back from the stopping point to the starting point so that the next trial could begin. The starting point was at the same location for the whole test.
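
In the study, the error was measured directly on the floor. Purely as an illustration, suppose each stopping point were recorded as (x, y) coordinates in centimetres relative to the starting point, which is a hypothetical data format. The per-trial error and the mean error per condition could then be computed as follows.

```python
import math
import statistics


def error_distance_cm(stop_xy_cm, start_xy_cm=(0.0, 0.0)):
    """Straight-line distance between the stopping point and the starting point."""
    return math.hypot(stop_xy_cm[0] - start_xy_cm[0], stop_xy_cm[1] - start_xy_cm[1])


def mean_error_by_condition(trials):
    """trials: list of (condition, (x_cm, y_cm)) stopping points; start at the origin."""
    by_condition = {}
    for condition, stop in trials:
        by_condition.setdefault(condition, []).append(error_distance_cm(stop))
    return {cond: statistics.mean(errors) for cond, errors in by_condition.items()}


# Hypothetical stopping points of one participant.
result = mean_error_by_condition([
    ("walking", (30.0, 40.0)),
    ("walking", (0.0, 100.0)),
    ("wheelchair", (60.0, 80.0)),
])
```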

For the whole duration of the test, participants were blindfolded in a quiet room and thus could not use any visual or auditory cues to find their way back to the starting point. It can therefore be assumed that the only cues they could use were somatosensory and vestibular in the active-walking condition, and vestibular only in the passive-wheelchair condition.

#### TABLE 3 | Test conditions of the clinical balance test (CBT).

#### Outcome Variables and Data Analysis

The pre-specified primary outcomes were the improvement in score (in points) on the CBT and the reduction in average error distance (in cm) on the orientation test (OT).

TABLE 4 | Length of segments and turning angles of triangular paths in the orientation test (OT).


Data were analyzed with MATLAB (MathWorks, USA) and SPSS (IBM, USA) software. Statistical analysis included paired t-tests for within-group analyses and repeated-measures ANOVAs with time and group as factors for between-group and interaction effects. The significance level was set to α = 0.05. Descriptive results are shown as mean ± standard deviation. In addition, effect sizes (η<sub>p</sub><sup>2</sup>) and 95% confidence intervals of change are reported. Effect sizes of ≥0.01, ≥0.059 and ≥0.138 were considered small, medium and large, respectively (Cohen, 1988; Donath et al., 2013). All datasets were checked for normal distribution and homogeneity of variance before parametric tests were run.
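
Partial eta squared relates to the F statistic of an effect as η<sub>p</sub><sup>2</sup> = F·df<sub>effect</sub> / (F·df<sub>effect</sub> + df<sub>error</sub>). The sketch below implements this standard identity, its inverse, and the thresholds stated above. As an example, it assumes hypothetical degrees of freedom of (1, 48). Under that assumption, η<sub>p</sub><sup>2</sup> = 0.128 would correspond to F ≈ 7.05; this is a back-calculation, not a reported value.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def f_from_partial_eta_squared(eta_p2, df_effect, df_error):
    """Inverse of partial_eta_squared."""
    return eta_p2 * df_error / ((1.0 - eta_p2) * df_effect)


def effect_size_label(eta_p2):
    """Thresholds as stated in the text: >=0.01 small, >=0.059 medium, >=0.138 large."""
    if eta_p2 >= 0.138:
        return "large"
    if eta_p2 >= 0.059:
        return "medium"
    if eta_p2 >= 0.01:
        return "small"
    return "below small"


f = f_from_partial_eta_squared(0.128, 1, 48)  # hypothetical df = (1, 48)
```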

# RESULTS

The final analysis included 25 participants in each group. Two participants (one from each group) were excluded from the analysis as major outliers, lying more than 2 standard deviations from the mean score of all participants. All subjects were recruited from December 2014 until March 2015, and their characteristics are shown in **Table 1**.
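The exclusion rule can be sketched in Python as follows. This is a minimal sketch under stated assumptions: it uses the sample standard deviation of all participants' scores, applies the 2-SD cut-off in a single pass (no iteration), and uses hypothetical participant IDs and scores. The text does not specify which score was screened.

```python
from statistics import mean, stdev


def exclude_outliers(scores, k=2.0):
    """Split participants into kept and excluded sets using |x - mean| > k * SD.

    scores maps a (hypothetical) participant ID to a single outcome score.
    The mean and sample SD are computed once over all participants.
    """
    values = list(scores.values())
    center = mean(values)
    sd = stdev(values)
    kept = {pid: s for pid, s in scores.items() if abs(s - center) <= k * sd}
    excluded = sorted(pid for pid in scores if pid not in kept)
    return kept, excluded


if __name__ == "__main__":
    # Hypothetical data: nine similar scores and one extreme value.
    demo = {f"p{i}": 10 for i in range(9)}
    demo["p9"] = 20
    print(exclude_outliers(demo)[1])  # ['p9']
```

Because the extreme value itself inflates the SD, a 2-SD rule can only flag an outlier once the sample is large enough; with the full sample of 52 participants this is not a practical constraint.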

### Clinical Balance Test

**Figure 5** shows the results of the overall test as well as those of the closed-eyes conditions of the CBT; the respective significance levels are summarized in **Table 5**.

When the overall results are considered, both groups improved from pre- to post-training. The training group improved by 5.1 points on average (71.8 ± 5.2 to 77.0 ± 4.5), whereas the control group improved by 2.4 points on average (71.1 ± 6.4 to 73.5 ± 4.4). The time × group interaction did not reach the preset significance level, and its effect size was small (p = 0.166, η<sup>2</sup><sub>p</sub> = 0.039) (**Figure 5**; **Table 5**).
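With two groups and two time points, the time × group interaction F test is equivalent to an independent-samples t test on pre-to-post change scores, with F = t². The Python sketch below illustrates this using the pooled-variance (Student) t test. It cannot reproduce the reported p = 0.166, because the standard deviations of the change scores are not given here; the example data are synthetic.

```python
import math
from statistics import mean, variance


def interaction_from_change_scores(pre_a, post_a, pre_b, post_b):
    """Time x group interaction for a 2 x 2 mixed design via change scores.

    Returns (t, F, partial eta squared), where F = t**2 with
    df = (1, n_a + n_b - 2).
    """
    gain_a = [post - pre for pre, post in zip(pre_a, post_a)]
    gain_b = [post - pre for pre, post in zip(pre_b, post_b)]
    n_a, n_b = len(gain_a), len(gain_b)
    df_error = n_a + n_b - 2
    # Pooled variance of the change scores (classic Student t test).
    pooled_var = ((n_a - 1) * variance(gain_a) + (n_b - 1) * variance(gain_b)) / df_error
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(gain_a) - mean(gain_b)) / se_diff
    f_value = t * t
    eta_p2 = f_value / (f_value + df_error)
    return t, f_value, eta_p2


if __name__ == "__main__":
    # Synthetic scores for three participants per group.
    print(interaction_from_change_scores([10, 12, 14], [13, 15, 19],
                                         [10, 12, 14], [11, 12, 15]))
```

Applied to participant-level pre/post CBT scores, this should return the same interaction F as the repeated-measures ANOVA, which makes explicit that the test compares the two groups' mean improvements (here 5.1 vs. 2.4 points) relative to the spread of individual changes.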

In contrast to the overall test results, when only the conditions in which participants had their eyes closed were analyzed, a significant interaction effect with a medium to large effect size was observed (p = 0.011, η<sup>2</sup><sub>p</sub> = 0.128) (**Figure 5**; **Table 5**). In these conditions, the training group improved (13.7 ± 1.8 to 15.4 ± 2.2), whereas the control group performed slightly worse at the post-test (13.7 ± 2.6 to 13.6 ± 2.4).

The results from the conditions in which participants had their eyes open did not yield a significant interaction effect (p = 0.594). A learning effect was observed here, with very similar improvements of about 3 points in both the training and the control group (**Table 5**).

#### Orientation Test

The overall OT results showed a non-significant interaction effect with a very small effect size (p = 0.063, η<sup>2</sup><sub>p</sub> = 0.006). Errors in the training group decreased by 11 cm (114 ± 68 to 103 ± 62), whereas errors in the control group increased slightly by 2 cm (111 ± 74 to 113 ± 75) (**Figure 6**; **Table 5**).

Further analysis of the wheelchair condition revealed a much larger improvement in the training group than in the control group: the training group improved by about 21 cm (131 ± 75 to 110 ± 63), compared with a very small improvement of 1 cm (121 ± 68 to 120 ± 79) in the control group. This between-group difference in improvement over time also produced a significant interaction effect with a small effect size (p = 0.049, η<sup>2</sup><sub>p</sub> = 0.013) (**Figure 6**; **Table 5**).

Lastly, the condition in which participants walked while being actively guided over the polygon did not reveal a significant time × group interaction effect (p = 0.591). In this OT condition, the error of the training group remained at about the same level, whereas the control group performed worse at the post-test by about 4 cm (**Table 5**).

#### DISCUSSION

The study yielded two main findings, both of which support the a priori hypothesis that intensive balance training improves vestibular system function.

Firstly, 1 month of intensive balance training, during which participants learned how to slackline, led to significantly better CBT performance in the training group than in the control group, but only in those measurements in which visual input was blocked, i.e., where participants had to balance with eyes closed. The magnitude of the slackline-training effect here was medium to large. In contrast, on tasks in which visual input was not blocked, both groups improved by about the same amount, suggesting a practice effect between pre- and post-test. Considering that a moving person normally receives input from three systems involved in balance maintenance (visual, vestibular and somatosensory) (Horak, 2006), our test results suggest that the vestibular and somatosensory systems were particularly affected by the slackline-training.

Secondly, and similarly to the first finding, the training group performed significantly better on the OT than the control group, but again only in one condition, namely the passive-wheelchair condition (being passively pushed along the designated routes). In this condition, the input was intentionally limited to the vestibular system, so performance depended solely on the function of the vestibular system and the related brain regions that process this input. Numerous connections have been proposed between the vestibular system and the temporal lobe, in particular the hippocampus, for processing these spatial and orientation inputs (Hitier et al., 2014). Again, the OT results allow us to speculate that vestibulo-hippocampal spatial orientation function was positively affected by the slackline-training, albeit with a small effect size.

Many earlier studies used diverse approaches to enhance balancing skills in various target groups (Zech et al., 2010; Sherrington et al., 2011). Most balance training programs were reported to improve outcome variables in healthy young (Zech et al., 2010) and elderly (Sherrington et al., 2011; Cadore et al., 2013; El-Khoury et al., 2015) participants, athletes (Hubscher et al., 2010; Boccolini et al., 2013), as well as patients suffering from Alzheimer's (Ries et al., 2015) and Parkinson's disease (Sehm et al., 2014), post-stroke patients (Lubetzky-Vilnai and Kartin, 2010) and patients with vestibular disorders (Porciuncula et al., 2012). A review of the literature pertaining to our first finding (improved stability in the closed-eyes conditions of the CBT) showed that earlier slackline-training studies reported large task-specific improvements (standing on a slackline) but only small to moderate non-task-specific improvements (for a meta-analytical review, see Donath et al., 2016a). However, these studies used different training and evaluation methodologies: the only non-task-specific transfer effects evaluated were changes in postural sway displacement and velocity while participants stood with eyes open on a firm or suddenly perturbed flat surface of a force platform, mostly in one-leg and tandem stances. In contrast, our analysis used the outcome of a comprehensive clinical balance assessment, in which participants stood on both legs and on each leg separately (not only on a self-selected leg), with eyes open and closed, on a firm flat surface as well as on a soft, unstable surface. In fact, our main finding here concerned the larger improvement in the closed-eyes conditions, which these studies did not assess; in the open-eyes conditions, we also found no significant effects.
Furthermore, our training methodology differed from that of previous studies in at least two respects: (a) it involved more time on the slackline (around 600 min vs. an average of 380 min in other studies), and (b) it was implemented on shorter slacklines (3 m vs. 5 to over 15 m in other studies). As mentioned earlier, this slackline length was chosen deliberately to stimulate semicircular canal function in addition to that of the otolith organs; this important input (Highstein, 1991; Cullen and Minor, 2002) might have been neglected in other training interventions, and its effects could hence have been overlooked. Regarding training intensity, variation in the intensity of motor training has been shown to differentially affect skill learning and brain structure (Sampaio-Baptista et al., 2014), which could also have contributed to our results. One previous study investigated improvements in balancing skills with eyes open and closed in response to 6 weeks of balance training (Strang et al., 2011). Interestingly, their postural movement results were very similar to ours: in the eyes-closed condition they observed a significant improvement, whereas in the eyes-open condition no significant change was found. The authors argued that this finding was to be expected, because only imposing a constraint during testing, such as blocking visual input, would allow the effects of training to emerge. Another study in basketball players also reported improvements in closed-eyes tests in response to 6 weeks of balance training (Zemkova and Hamar, 2010).
Whereas that study found improvements mainly in dynamic balance tests, we found them only in the static balance tests, which consisted of various conditions on stable and unstable surfaces. This might be due to methodological differences between the studies: the training methods differed, and only one dynamic condition of our CBT was performed with eyes closed, whereas all other closed-eyes conditions belonged to the static group. Since the participants improved significantly in the closed-eyes conditions, this improvement must have occurred largely within the static conditions. Had our test involved more dynamic conditions with eyes closed, however, our results suggest that a significant between-group difference in improvement could reasonably have been expected there as well.

Stimulating both the rotational (semicircular canals) and the translational (otolith organs) components of the vestibular system is also crucial for our second finding, namely that the link between the vestibular system and the central vestibular-dependent spatial-orientation brain regions, primarily the hippocampal regions (Hitier et al., 2014), can be affected by adequately designed slackline-training. No previous study has investigated this possibility, which makes our results novel in this respect. After learning how to slackline, our participants were able to return more precisely to the starting position after being moved away from it in a wheelchair along three different triangular paths. The triangle completion task has been used by many previous studies, mainly to examine differences between younger and older persons in their ability to navigate in space (Allen et al., 2004; Adamo et al., 2012) or to investigate functions of the medial temporal lobe (Wolbers et al., 2007; Wiener et al., 2011). These studies were cross-sectional, however, and no treatment was applied to improve this ability over time. To our knowledge, our study is the first to show transfer effects of slackline-training on orientation abilities in young people assessed with this task. Several authors have studied rats to demonstrate the importance of the vestibular system for successful orientation in space (Stackman and Herbert, 2002; Russell et al., 2003; Smith et al., 2005). It has been shown that peripheral vestibular deficiency impairs the functioning of the medial temporal lobe in spatial orientation tasks as well as in spatial learning, owing to alterations in electrophysiological and neurochemical signaling between the two systems.
Other studies have further investigated the importance of the vestibular system for orientation in humans (Brandt et al., 2005; Hüfner et al., 2011; Previc et al., 2014), thereby confirming the findings of animal studies. The structure of the hippocampal formation has been found to be altered in persons who suffer from vestibular deficiency, but also in persons who rely heavily on their vestibular system because of their profession, for example ballet dancers. It has even been proposed that vestibular system degeneration might contribute significantly to the development of Alzheimer's disease (Previc, 2013). Although our sample consisted of young, healthy subjects, the principles of neuroplasticity in response to motor task learning across the lifespan (Dayan and Cohen, 2011) make it legitimate to hypothesize that similar results could be expected in older populations, particularly as a preventive strategy in those at early stages of dementia. Some studies did not find significant transfer effects of slackline-training in this population (Donath et al., 2016b), but, as discussed earlier, methodological differences might have contributed to these findings; in our opinion, additional research is required to answer this question.

As far as the external validity, or generalizability, of our findings is concerned, our sample consisted of young, healthy subjects (18–30 years old), so the results apply mainly to this population. Considering, however, that many balance-training interventions in their original form can benefit both healthy and diseased populations of various ages (Taubert et al., 2010; Sehm et al., 2014), it is reasonable to assume that interventions similar to ours could also benefit healthy, or even non-healthy, older populations, with effects in the same direction as in our younger sample. It would be of particular interest to see whether similar interventions yield significant gains in patients at various stages of neurodegeneration, from mild cognitive impairment to Alzheimer's disease, whose spatial orientation capabilities are considerably reduced (Allen et al., 2004).

Our study has several limitations. First, our CBT has not yet been validated; however, many of its items resemble those of other validated CBTs, and its inter-rater reliability is very high. Another objection could be that our subjective postural sway assessment is less accurate than quantitative force-platform assessments; most CBTs, however, are subjective, rater-based tests that are comprehensive and specifically designed to assess various components of balancing ability, and they remain important and valid assessment tools for this purpose (Mancini and Horak, 2010). Our OT is derived from the widely used triangle completion test of orientation abilities; still, only a subset of its conditions was applied in our study, owing to the facilities available at our institute, and although the error distances were measured thoroughly, reliability data have yet to be provided. Second, we did not report follow-up results that would show whether the achieved effects are retained over a longer period of subsequent inactivity. Third, we have yet to show neural correlates of our behavioral improvements by analyzing the pre/post MR data. Finally, our participants trained with eyes open; it would be interesting to know whether the same training performed with eyes closed would yield different results compared with both the control group and the current training group, since visual input would be blocked in this third group. We will attempt to address these limitations in our future work.

#### CONCLUSION

Our results indicate that 1 month of intensive balance training, through learning how to slackline, is a successful novel approach for enhancing clinically relevant balancing abilities in closed-eyes conditions, with simultaneous improvements in vestibular-dependent spatial orientation capability. Both benefits are possibly caused by a positive influence of slackline-training on vestibular system function, and possibly on its connectivity with temporal lobe regions responsible for orienting in space, such as the hippocampus. We highly recommend this method, in both its intensity and type, to all young persons who need to improve the functioning of their vestibular system, whether to increase stability, to improve spatial orientation abilities, or both. A modified training protocol could also benefit healthy older adults and those at risk of neurodegeneration of the medial temporal lobe orientation system, such as in Alzheimer's disease (AD), but this remains to be shown by future studies.

#### AUTHOR CONTRIBUTIONS

MD: Study planning and organization, data collection, data analysis, paper writing, paper revision, paper submission. AH: Study planning and organization, paper revision. PM: Data collection, paper revision. KR: Study planning and organization, paper revision. NM: Study planning and organization, data analysis, paper writing, paper revision.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Dordevic, Hökelmann, Müller, Rehfeld and Müller. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Motor-Enriched Learning Activities Can Improve Mathematical Performance in Preadolescent Children

Mikkel M. Beck<sup>1,2†</sup>, Rune R. Lind<sup>1,2†</sup>, Svend S. Geertsen<sup>1,2</sup>, Christian Ritz<sup>1</sup>, Jesper Lundbye-Jensen<sup>1,2</sup> and Jacob Wienecke<sup>1,2</sup>\*

*<sup>1</sup> Department of Nutrition, Exercise and Sports, University of Copenhagen, Copenhagen, Denmark, <sup>2</sup> Department of Neuroscience and Pharmacology, University of Copenhagen, Copenhagen, Denmark*

Objective: An emerging field of research indicates that physical activity can benefit cognitive functions and academic achievements in children. However, less is known about how academic achievements can benefit from specific types of motor activities (e.g., fine and gross) integrated into learning activities. Thus, the aim of this study was to investigate whether fine or gross motor activity integrated into math lessons (i.e., motor-enrichment) could improve children's mathematical performance.

#### Edited by:

*Claudia Voelcker-Rehage, Chemnitz University of Technology, Germany*

#### Reviewed by:

*Jascha Ruesseler, University of Bamberg, Germany*

*Caterina Pesce, Italian University Sport and Movement "Foro Italico", Italy*

> \*Correspondence: *Jacob Wienecke wienecke@nexs.ku.dk*

*†These authors have contributed equally to this work.*

Received: *15 September 2016* Accepted: *05 December 2016* Published: *23 December 2016*

#### Citation:

*Beck MM, Lind RR, Geertsen SS, Ritz C, Lundbye-Jensen J and Wienecke J (2016) Motor-Enriched Learning Activities Can Improve Mathematical Performance in Preadolescent Children. Front. Hum. Neurosci. 10:645. doi: 10.3389/fnhum.2016.00645*

Methods: A 6-week within-school cluster-randomized intervention study investigated the effects of motor-enriched mathematical teaching in Danish preadolescent children (*n* = 165, age = 7.5 ± 0.02 years, mean ± SEM). Three groups were included: a control group (CON), which received conventional mathematical teaching without motor enrichment, a fine motor math group (FMM) and a gross motor math group (GMM); the latter two received mathematical teaching enriched with fine and gross motor activity, respectively. The children were tested before (T0), immediately after (T1) and 8 weeks after the intervention (T2). A standardized mathematical test (50 tasks) was used to evaluate mathematical performance. Furthermore, it was investigated whether motor-enriched math had different effects in low and normal math performers. Additionally, the study investigated the potential contribution of cognitive functions and motor skills to mathematical performance.

Results: All groups improved their mathematical performance from T0 to T1. However, the improvement from T0 to T1 was significantly greater in GMM than in FMM (1.87 ± 0.71 correct answers; *p* = 0.02). At T2, no significant differences in mathematical performance were observed. A subgroup analysis revealed that normal math performers benefitted more from GMM than from both CON (1.78 ± 0.73 correct answers; *p* = 0.04) and FMM (2.14 ± 0.72 correct answers; *p* = 0.008). These effects were not observed in low math performers. The effects were partly accounted for by visuo-spatial short-term memory and gross motor skills.

Conclusion: The study demonstrates that motor-enriched learning activities can improve mathematical performance. In normal math performers, GMM led to larger improvements than FMM and CON. This was not the case for the low math performers. Future studies should further elucidate the neurophysiological mechanisms underlying the observed behavioral effects.

Keywords: children, motor skills, exercise, integrated physical activity, academic achievement, cognition, learning

#### INTRODUCTION

The acquisition and development of mathematical skills can be seen as a central cognitive attribute in a modern technological society. Successful acquisition of basic mathematical skills early in life provides a foundation for later academic achievement (Duncan et al., 2007) and is a predictor of future academic and professional success (Butterworth, 2005; Parsons and Bynner, 2005). Consequently, identifying strategies to improve mathematical skill acquisition in children, and exploring the mechanisms involved in the acquisition of academic skills, is an important goal for researchers in the field of behavioral neuroscience.

An emerging line of research has investigated the relationships between physical activity, cognitive functions and academic achievement in children (Hillman et al., 2008; Diamond and Ling, 2016; Donnelly et al., 2016; Pesce and Ben-Soussan, 2016; Vazou et al., 2016; Tomporowski et al., 2015). Physical activity is a broad concept covering various activities. These include cardiovascular exercise, which focuses on the quantitative characteristics of the activity (e.g., intensity and duration) and aims to improve cardiovascular fitness, as well as activities concerned with the qualitative characteristics of physical activity (e.g., coordinative demands and cognitive engagement), which lead, for instance, to improved motor skills (Pesce, 2012; Diamond, 2015). To date, most studies have linked cognitive functions (Hillman et al., 2005; Voss et al., 2011) and academic achievement (Castelli et al., 2007; Chaddock-Heyman et al., 2015) to cardiovascular fitness in cross-sectional designs. Recent cross-sectional studies have positively linked motor skills to cognitive and academic measures (Kantomaa et al., 2013; Lopes et al., 2013; Haapala et al., 2014; Geertsen et al., 2016), and recent reviews have stressed the importance of the qualitative characteristics of physical activity relative to its quantitative characteristics (Best, 2010; Pesce, 2012; Diamond, 2015). In general, however, less attention has been paid to the qualitative characteristics of physical activity and their relation to cognitive functions and academic achievement.

In addition to cross-sectional findings, interventional studies have also focused on the potential of cardiovascular exercise to facilitate cognitive performance and academic achievement (for reviews, see Hillman et al., 2008; Pesce and Ben-Soussan, 2016; Vazou et al., 2016; Donnelly et al., 2016). The theoretical framework for these effects relates to the structural and functional differences and adaptations associated with, or resulting from, increased cardiovascular fitness or exercise (Hillman et al., 2008). Indeed, higher-fit preadolescent children display greater hippocampal gray matter volumes (Chaddock et al., 2010), lower gray matter thickness in the frontal cortex (Chaddock-Heyman et al., 2015) and greater white matter integrity (Chaddock-Heyman et al., 2014), which together might translate into superior cognitive and academic performance (Chaddock et al., 2010; Chaddock-Heyman et al., 2015). Moreover, interventions focusing on various cardiovascular engaging activities result in brain electrophysiological adaptations, including an increased amplitude of the P3 component of the event-related potential (ERP), indicating a more efficient allocation of attentional resources (Polich, 2007). These effects are speculated to stem from neurobiological and molecular events related to cardiovascular exercise that positively affect neuroplastic processes within the central nervous system (Gomez-Pinilla and Hillman, 2013). Again, less attention has been paid to the qualitative characteristics of physical activity. However, Schmidt et al. (2016) highlighted the promising effects of a cognitively engaging 6-week physical activity intervention in promoting executive functions independently of exercise intensity. Furthermore, Chang et al. (2013) found a positive effect of an 8-week intervention focusing on coordinatively demanding, intensity-independent physical activity on measures of cognitive functioning. This positive effect was related to brain electrophysiological measures, including both increased amplitudes and shorter latencies of the P3 ERP component, reflecting more efficient and faster cognitive processing (Polich, 2007).

Altogether, these studies highlight two different approaches by which physical activity may facilitate cognitive functions, and suggest differential, but beneficial, mechanisms for the observed behavioral effects (Voelcker-Rehage and Niemann, 2013). This has prompted research investigating the effects of both quantitative and qualitative physical activity interventions in ecologically valid school settings to promote cognition and academic achievement. One way of investigating the impact of physical activity on cognition in school settings is to include classroom-based physical activity, prescribed either as physical activity breaks ("energizers") or as physical activity integrated into the academic curriculum. Ahamed et al. (2007) implemented quantitative, cardiovascular "energizers" of 15 min in a 16-month intervention, but did not find any positive effects for the intervention group. Conversely, Mullender-Wijnsma et al. (2016) found that 15 min of classroom-based integrated cardiovascular exercise improved mathematical and spelling performance to a greater extent than conventional teaching. Few studies have investigated classroom-based, qualitatively focused physical activity. However, a recently published study by Vazou et al. (2016) integrated multifaceted physical activity into the taught academic content in the classroom and found greater improvements in mathematical performance in the intervention group than in a control group.

Collectively, studies evaluating the effects of classroom-based physical activity (prescribed as breaks or integrated into the curriculum), whether focusing on quantitative or qualitative characteristics, yield inconsistent results. In line with this, Donnelly et al. (2016) recently called for further research in this specific field to delineate the potential effects. Furthermore, no previous studies have investigated the effects of different qualitatively focused motor-enrichment interventions (fine vs. gross motor skill) on cognitive and academic performance in children. Recently published cross-sectional studies have indeed indicated that different coordinative motor skills, including both gross and fine motor skills, are associated with objective measures of cognitive functions and academic achievement, including mathematical performance, in children (Kantomaa et al., 2013; Haapala et al., 2014; Geertsen et al., 2016). While these associations are intriguing, they are correlational in nature. Therefore, longitudinal interventional studies are needed to investigate the potential causal effects of integrating different types of qualitative motor activities, including both gross and fine classroom-based motor activities, on measures of academic achievement.

Furthermore, a number of studies have investigated the differential effects of physical activity on cognitive functioning and academic achievement in relation to the children's baseline performance (high and low performers). These studies have primarily employed interventions focusing on the quantitative characteristics of physical activity. Acute cardiovascular exercise interventions have yielded the strongest effects in individuals with the lowest cognitive baseline performance (Mahar et al., 2006; Pontifex et al., 2013; Drollette et al., 2014), highlighting the interesting possibility of using physical activity to support the individuals who need it most. Yet the literature on the effects of chronic physical activity interventions is sparse. One study that investigated this found that typically developing children benefitted the most from a physical education-based intervention with additional cognitive demands (cognitive enrichment), whereas children with coordinative impairments did not (Pesce et al., 2013). This points to the need to challenge every individual optimally. Whether this is the case for classroom-based physical activity interventions is currently unknown; indeed, studies evaluating classroom-based integrated qualitative physical activity in relation to the children's cognitive and academic baseline performance are lacking.

Additionally, previous studies evaluating the effects of classroom-based physical activity interventions have not accounted for cognitive and motor covariates related to academic achievement, including executive functions (St. Clair-Thompson and Gathercole, 2006; Bull et al., 2008), short-term memory (Raghubar et al., 2010) and motor skills (e.g., Geertsen et al., 2016). Importantly, theoretical models have been proposed that suggest a mediating role for cognitive efficiency, particularly executive and metacognitive functions, in the relationship between physical activity and academic achievement (Howie and Pate, 2012; Tomporowski et al., 2015). In line with this, Pesce et al. (2016) recently proposed that improvements in children's ball skills mediated the effects of a coordinatively and cognitively demanding physical activity intervention on measures of executive functioning. Moreover, as noted in a review by Tomporowski et al. (2015), few studies have investigated the stability and maintenance of the effects of physical activity interventions on measures of cognitive functioning and academic achievement.

Taken together, the present study has three primary aims: (1) to investigate the immediate and maintained causal effects of classroom-based integrated gross and fine motor activity on mathematical performance in preadolescent children, (2) to investigate whether differences exist in the intervention effects between children characterized as normal and low mathematical performers, and (3) to investigate whether, and to what extent, different cognitive functions and motor skills contribute to the physical activity-academic achievement relationship.

We hypothesize that classroom-based gross and fine motor activity will increase mathematical performance following a 6-week intervention compared with a conventional teaching strategy. We further hypothesize that the additional coordinative demands will primarily benefit normal-performing children, in line with the optimal challenge point theory (Pesce et al., 2013). Finally, based on prior studies highlighting cognitive and motor skills as potential mediators of the effects of physical activity, we hypothesize that both cognitive and motor performance will account for some of the intervention effects on mathematical performance.

## MATERIALS AND METHODS

#### Participants

We invited 186 first-grade children from 9 school classes at three Danish public schools in the Copenhagen area. The schools were selected based on similar demographic profiles (determined by the location of the schools) and on grade-based graduation performance (the included schools ranked 48th, 56th and 57th of 59 public schools in Copenhagen). After written consent was obtained, 165 children (77 girls; mean age = 7.5 years, SEM = 0.02) were included in the study, corresponding to 89% of the invited children (see **Table 1** for demographic characteristics within each intervention group). School classes were stratified based on baseline mathematical performance and cluster-randomly allocated to one of three groups, explained in detail below (**Figure 1**). The study was approved by the Ethical Committee of Copenhagen, Denmark (protocol: H-15009418), and was carried out in accordance with the Helsinki Declaration II.
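The exact stratification procedure is not described. One common way to stratify and cluster-randomize nine classes into three arms is to rank the classes by baseline score, form strata of three adjacent classes, and shuffle the group labels within each stratum. The Python sketch below illustrates that scheme under those assumptions. The function name, class IDs and scores are hypothetical and are not study data.

```python
import random


def stratified_cluster_allocation(class_scores, groups=("GMM", "FMM", "CON"), seed=None):
    """Allocate whole classes (clusters) to groups within strata of similar baseline scores.

    class_scores maps a class identifier to its mean baseline math score. Classes are
    ranked by score and cut into consecutive strata of len(groups) classes. Within each
    stratum the group labels are shuffled, so every stratum sends one class to each group.
    """
    rng = random.Random(seed)
    k = len(groups)
    if len(class_scores) % k != 0:
        raise ValueError("number of classes must be a multiple of the number of groups")
    ranked = sorted(class_scores, key=class_scores.get, reverse=True)
    allocation = {}
    for start in range(0, len(ranked), k):
        labels = list(groups)
        rng.shuffle(labels)  # random order of group labels within this stratum
        for class_id, label in zip(ranked[start:start + k], labels):
            allocation[class_id] = label
    return allocation


# Hypothetical class IDs and baseline means (correct answers out of 50); not study data.
baseline = {"A": 40.1, "B": 38.7, "C": 41.3, "D": 36.9, "E": 39.5,
            "F": 37.8, "G": 42.0, "H": 35.4, "I": 38.2}
allocation = stratified_cluster_allocation(baseline, seed=1)
print(allocation)
```

Randomizing whole classes rather than individual children keeps each class's lessons uniform. Stratifying on baseline scores keeps the three arms comparable in starting mathematical level.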

#### Intervention Groups

The three groups were gross motor math (GMM), fine motor math (FMM) and control (CON). The groups differed mainly in the teaching methods applied. GMM focused on integrating gross motor movements into learning activities covering the mathematical curriculum (gross motor enrichment), using gross motor movements that supported the mathematical principles and procedures to be acquired. Desks and chairs were moved to the sides of the classroom to ensure adequate space for performing gross motor movements. Children in the GMM group performed inter-limb gross motor movements that alternated between dynamic and static movements and involved a large range of motion (e.g., skipping, crawling, hopscotching, throwing, one-legged balance). The gross motor movements were performed while solving mathematical problems throughout all lessons (approximately 60 min each). FMM focused on integrating fine motor movements into learning activities covering the mathematical curriculum (fine motor enrichment). The learning activities involved a modified version of the LEGO MoreToMath® concept, in which the children used fine motor activity to manipulate LEGO® bricks that supported the mathematical principles and procedures to be acquired. The children were primarily seated at their desks throughout the lesson and bimanually selected, moved and modeled the bricks, using both hands and fingers, while solving mathematical problems throughout all lessons (approximately 60 min each). CON employed conventional math teaching and was not permitted to use additional motor activity or cardiovascular exercise during the math lessons. In all three groups (i.e., CON, FMM, GMM), the children worked individually or in small groups during the lessons. During the intervention, the learning activities in the three groups were matched on content (i.e., the mathematical principles and procedures to be acquired) and on the time of day of the math lessons.

*Data reported as mean ± SEM. Estimate of cardiovascular fitness obtained using the Andersen test (Andersen et al., 2008). No significant between-group differences were observed for any of the measures. BMI, body mass index.*
Standardization within and between the intervention groups was ensured through three 3-h workshops for the involved teachers, hosted by the experimental staff. The workshops were conducted before the classes were allocated to groups. In addition to thorough instructions, the teachers received detailed teaching manuals describing what, when and how to teach during the intervention period. Furthermore, the experimental supervisors were in ongoing dialogue with the teachers of all three intervention groups to ensure that the interventions were delivered as intended; however, no quantitative measures of this were obtained.

#### Procedure

The children's mathematical achievement, cognitive functions and motor skills were tested at three time points: before (T0), immediately after (T1), and 8 weeks after (T2) the intervention. Substantial effort was made to ensure that each child was tested at the same time of day (±1 h), in the same test sequence, and by the same experimental supervisor at all three time points. At T0, the children's cardiovascular fitness was also tested. The intervention lasted 6 consecutive weeks and included three weekly math lessons of approximately 60 min each. During the intervention, the amount and intensity of physical activity (i.e., physical load) during the math lessons was evaluated through spot tests using accelerometers and heart rate monitoring.

#### Measures

#### Mathematical Test

The children's mathematical achievement was tested with a standardized, diagnostic paper-and-pencil test developed by Danish experts in the neuropsychological assessment of mathematics (Hogrefe Psykologiske Forlag A/S, Virum, Denmark). The test consisted of 50 mathematical tasks (39 1st-grade tasks and 11 2nd-grade tasks) covering different age-appropriate mathematical themes, including arithmetic and geometry. Each child completed the test individually, in a classroom where 15–20 children were tested simultaneously under the supervision of two experimental supervisors and the class teacher. The children were thoroughly instructed before the test. During the test, an experimental supervisor presented each problem verbally in a standardized format, and the children then solved it. Once the whole class had answered, the next problem was presented in the same way. Halfway through the test, the children had a mandatory 10-min break. The entire testing session lasted between 60 and 90 min. The tests were scored offline by a single experimental supervisor using standardized guidelines provided by the test authors.

#### Cognitive Tests

Three standardized cognitive tests were used to estimate the children's capacity in different cognitive domains, including executive functions and short-term memory. The cognitive tests were completed on a computer in a one-to-one session between a child and an experimental supervisor.

#### **Executive functions**

The children's executive functions were assessed using a computer-based modified Eriksen Flanker Task (Eriksen and Eriksen, 1974). Because the participants were young children, the stimuli consisted of fish requiring feeding (e.g., Hillman et al., 2014; Schonert-Reichl et al., 2015; Vazou and Smiley-Oyen, 2014). The children were comfortably seated in front of a 15.4-inch laptop placed at a distance that allowed them to press the response buttons with their index fingers while resting their elbows on the edge of the table. The stimuli were presented using Presentation (Neurobehavioral Systems Inc., California, USA). Stimuli (90 × 10 mm) were presented in the center of the screen on a white background. The children were presented with congruent (compatible) stimuli (i.e., > > > > >) and incongruent (incompatible) stimuli (i.e., > > < > >) in a pseudo-randomized sequence with equal probability (0.5) of congruent and incongruent stimuli. The children completed a single block of 60 trials, preceded by four familiarization trials to ensure task compliance. The children were instructed to respond to the central stimulus while ignoring the flanking stimuli, as quickly and accurately as possible. Response latency and accuracy were logged for both congruent and incongruent trials. Responses faster than 200 ms were considered anticipatory and were excluded from the analyses. Interference effects (i.e., flanker effects) were computed as the difference between incongruent and congruent trials. These measures were used as an estimate of the children's inhibitory control (e.g., Hillman et al., 2009).
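The interference computation described above can be sketched in Python as follows. The trial format, the function name and the choice to average response latency over correct trials only are assumptions made for illustration, because the text does not specify them. The sign convention (incongruent minus congruent) follows the description above.

```python
from statistics import mean

ANTICIPATION_CUTOFF_MS = 200  # responses faster than this are treated as anticipatory


def flanker_interference(trials):
    """Return flanker interference effects (incongruent minus congruent).

    Each trial is assumed to be a dict such as
    {"condition": "congruent", "rt_ms": 512.0, "correct": True}.
    Anticipatory responses (< 200 ms) are dropped first. Mean response latency is
    taken over correct trials only (an assumption); accuracy is the proportion correct.
    """
    kept = [t for t in trials if t["rt_ms"] >= ANTICIPATION_CUTOFF_MS]
    stats = {}
    for cond in ("congruent", "incongruent"):
        cond_trials = [t for t in kept if t["condition"] == cond]
        stats[cond] = {
            "rt": mean(t["rt_ms"] for t in cond_trials if t["correct"]),
            "acc": mean(1.0 if t["correct"] else 0.0 for t in cond_trials),
        }
    return {
        "rt_interference_ms": stats["incongruent"]["rt"] - stats["congruent"]["rt"],
        "acc_interference": stats["incongruent"]["acc"] - stats["congruent"]["acc"],
    }


# Toy trials, not study data.
trials = [
    {"condition": "congruent", "rt_ms": 500, "correct": True},
    {"condition": "congruent", "rt_ms": 600, "correct": True},
    {"condition": "congruent", "rt_ms": 150, "correct": True},  # anticipatory, excluded
    {"condition": "incongruent", "rt_ms": 700, "correct": True},
    {"condition": "incongruent", "rt_ms": 650, "correct": False},
]
result = flanker_interference(trials)
print(result)  # latency interference of 150 ms, accuracy interference of -0.5
```

With this convention, a positive latency interference and a negative accuracy interference both indicate that the incongruent flankers made the task harder.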

#### **Visuo-spatial short-term memory**

The children's visuo-spatial short-term memory was assessed using the spatial span test from the Cambridge Neuropsychological Test Automated Battery (CANTAB) (Cambridge Cognition Ltd, Cambridge, UK). The spatial span test specifically assesses memory for sequentially presented visuo-spatial information, and it has previously been used to assess visuo-spatial short-term memory in children aged 4–12 years (Luciana and Nelson, 2002). The children were comfortably seated in front of a 23-inch touchscreen and wore headphones. The touchscreen was placed approximately 30 cm from the edge of the table. During the test, nine white squares (43 × 43 mm each) were presented on a black background, and the squares changed color one by one. After the presentation, the children had to replicate the sequence by touching the squares with the index finger of their dominant hand. When a sequence was reproduced correctly, one additional color-changing square was added to the next sequence (progressing from 2 up to a maximum of 9 squares). The longest correctly reproduced sequence (the span length) was logged and used as a measure of visuo-spatial short-term memory.

#### **Phonological short-term memory**

The children's phonological (semantic) short-term memory was assessed using a free-recall word-list memory task inspired by Pesce et al. (2009), which evaluated the children's ability to remember as many as possible of 20 words. All words were age-appropriate nouns. The children were comfortably seated in front of a 13.3-inch laptop and wore headphones. During the test, the 20 words were displayed visually and presented orally in a standardized, timed sequence for 5 s each using a Microsoft PowerPoint presentation, yielding a total presentation time of 100 s. After the presentation, the children sat with their eyes closed for 120 s and tried to remember the words. They then had an additional 120 s to verbally recall as many words as possible in any order (free recall). The number of correctly recalled words was logged and used as a measure of phonological short-term memory.

#### Motor Tests

Two motor tasks were used to estimate the children's gross and fine motor skills. The tasks were completed in a one-to-one session between a child and an experimental supervisor.

#### **Gross motor skills**

To evaluate gross motor skills, the children completed a coordination wall task, which has previously been used to assess gross motor coordination in preadolescent children (Geertsen et al., 2016; Larsen et al., 2016). The children stood facing the coordination wall, an upright rectangular 9 × 8 grid with the numbers 1–10 distributed across it. Half of the numbers were blue and half were red, and numbers of both colors appeared on both sides of the vertical midline. A horizontal dividing line split the wall into an upper section (top seven rows) and a lower section (bottom two rows). The children wore a red dot on their right hand and a blue dot on the left hand and foot. They were instructed to touch the numbers in order, from one to ten, with their hands (upper section) or feet (lower section), according to the color of each number. The movements required crossing the vertical midline. The children had to complete the task as quickly and accurately as possible. If a mistake was made, the children were immediately instructed to correct it and proceed. The children were thoroughly instructed before the test and then had three attempts on the coordination wall. The shortest completion time (best time, in s) was used as a measure of the children's bimanual, inter-limb coordinative gross motor skills.

#### **Fine motor skills**

The children's fine motor skills were evaluated using the Purdue Pegboard test (US Neurologicals LLC, Washington, USA). The Purdue Pegboard (Tiffin and Asher, 1948) assesses manual dexterity and bimanual fine motor coordination and has previously been used to test preadolescent children (Gardner and Broman, 1979). The children were comfortably seated in front of the Purdue Pegboard. Before the test, the children's handedness was assessed using a name-writing test supplemented with self-report (Scharoun and Bryden, 2014), and the children were thoroughly instructed in the test. The test consisted of four subtests: 30-s unimanual placement of pins using (i) the dominant hand and (ii) the non-dominant hand, (iii) 30-s bimanual placement of pins using both hands, and (iv) 60-s bimanual assembly of three parts in a specific sequence. The last subtest served as the measure of the children's fine motor skills, expressed as the number of correctly assembled parts.

#### Cardiovascular Fitness

The children's cardiovascular fitness was estimated using the Andersen test (Andersen et al., 2008), a validated measure of cardiovascular fitness in 6- to 9-year-old children (Ahler et al., 2012). The test was modified to fit the available space in the gyms of the participating schools. Before the test, the children completed a 5-min warm-up and were thoroughly instructed in the test protocol. During the test, the children ran between cones placed diagonally opposite each other, 17 m apart, in 15-s intervals separated by mandatory 15-s breaks, for a total of 10 min. During the running intervals, the children ran as fast as they could to cover the greatest possible distance. Heart rate was monitored (Polar Team 2 System, Polar, Finland), and the test was videotaped using a GoPro HERO4 camera (GoPro Inc., California, USA) to ensure that each child's performance was logged. The videos were inspected offline, and the total running distance was registered as the measure of cardiovascular fitness.

#### Physical Load during the Intervention

In a subsample of the children covering all three intervention groups (n = 49), the physical load during the math lessons was estimated on 6 occasions by combining time-synced heart rate monitoring (Polar Team 2 System, Polar, Finland) and accelerometers (MinimaxX S4, Catapult Innovations, Canberra, Australia). Each child's heart rate during the lessons was expressed relative to the maximum heart rate (HRmax) recorded during the Andersen test, yielding an individual percentage of HRmax. Time spent in the low (0–60% of HRmax), moderate-to-vigorous (60–90% of HRmax) and high (90–100% of HRmax) heart rate zones during the lessons was used as an outcome measure. The accelerometers sampled tri-axially [forward (fwd), sideways (side), upwards (up)] at a sampling rate of 100 Hz. Physical load (i.e., player load) was computed with the proprietary MinimaxX software (Sprint, Catapult Sports, Canberra, Australia) using the following formula:

$$\text{Player load} = \sqrt{\left(\mathit{fwd}_{y_1} - \mathit{fwd}_{y_{-1}}\right)^2 + \left(\mathit{side}_{x_1} - \mathit{side}_{x_{-1}}\right)^2 + \left(\mathit{up}_{z_1} - \mathit{up}_{z_{-1}}\right)^2}$$

where $\mathit{fwd}_y$, $\mathit{side}_x$ and $\mathit{up}_z$ denote the accelerations along the forward, sideways and upward axes, respectively. Player load is expressed in arbitrary units and correlates with subjective ratings of perceived exertion (Casamichana et al., 2013). The measure has previously been used to assess the intensity of various physical activities in preadolescent children (Larsen et al., 2016).
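The two load measures can be sketched in Python as follows. The sketch assumes that the subscripts 1 and −1 denote the current and the preceding accelerometer sample, so the formula is accumulated over consecutive samples. No vendor-specific scaling is applied, which means the values are not directly comparable with MinimaxX output. The heart rate part bins samples into the three zones defined above; how exactly 60% and 90% of HRmax are assigned is an assumption. Function names are hypothetical.

```python
import math


def player_load(samples):
    """Accumulate the player-load formula over consecutive tri-axial samples.

    samples: list of (fwd, side, up) accelerations recorded at a fixed rate
    (100 Hz in the study). Assumes the subscripts 1 and -1 in the formula denote the
    current and the preceding sample, so each step adds the magnitude of the change
    in acceleration. No vendor-specific scaling is applied.
    """
    total = 0.0
    for (f0, s0, u0), (f1, s1, u1) in zip(samples, samples[1:]):
        total += math.sqrt((f1 - f0) ** 2 + (s1 - s0) ** 2 + (u1 - u0) ** 2)
    return total


def hr_zone_shares(heart_rates, hr_max):
    """Fraction of heart rate samples in each zone, relative to the individual HRmax.

    Zones follow the text: low 0-60%, moderate-to-vigorous 60-90%, high 90-100% of HRmax.
    Assigning exactly 60% and 90% to the higher zone is an assumption.
    """
    counts = {"low": 0, "mvpa": 0, "high": 0}
    for hr in heart_rates:
        pct = 100.0 * hr / hr_max
        if pct < 60:
            counts["low"] += 1
        elif pct < 90:
            counts["mvpa"] += 1
        else:
            counts["high"] += 1
    return {zone: n / len(heart_rates) for zone, n in counts.items()}


print(player_load([(0, 0, 0), (3, 4, 0), (3, 4, 0)]))  # 5.0
print(hr_zone_shares([100, 130, 150, 190], hr_max=200))  # {'low': 0.25, 'mvpa': 0.5, 'high': 0.25}
```

With equally spaced heart rate samples, the fraction of samples in a zone equals the fraction of lesson time spent in that zone.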

#### Statistical Analysis

The statistical analyses were performed in the open-source software R (R Core Team, Vienna, Austria) using RStudio. The analyses were carried out on complete datasets (complete-case analysis). Baseline characteristics were compared between groups using one-way analysis of variance for continuous measures (age, BMI, cardiovascular fitness) and chi-square tests for categorical measures (gender, bilingualism). Data from the mathematical, cognitive and motor tests were analyzed using linear mixed models with group-time interactions as fixed effects, using the R package lme4 (Bates et al., 2014). Random effects were included in the models to account for dependencies among measurements from the same child, school class and school. Model validation was based on visual inspection of residual plots and normal probability plots. To address the specific hypotheses of this study, specific sets of contrasts between intervention groups across the included time points were evaluated using global F-tests. Subsequently, model-based t-tests were used to identify significant differences, using the R package multcomp (Hothorn et al., 2008). These pairwise comparisons were adjusted for multiplicity using the 'single-step' adjustment, a recently developed procedure that exploits the correlations between tests to provide a less conservative adjustment of p-values than the Bonferroni and related adjustments. Additionally, between-group and within-group differences at, and between, specific time points were compared using model-based t-tests. An exploratory analysis examined whether differences between intervention groups were related to the children's baseline mathematical performance level. According to the authors of the standardized and validated math test, a score of 75% correct answers or lower may reflect difficulties acquiring mathematical content.
This subgroup was termed low performers (n = 49, based on mathematical performance at T0). The other subgroup (i.e., >75% correct answers) was termed normal performers (n = 116) and was characterized by not having such difficulties. Moreover, to gauge the tentative contribution or mediating effect of cognitive and motor performance on mathematical achievement, each cognitive and motor measure was added separately (a univariate approach) to the overall linear mixed model as an additional covariate, and the differences in estimates between models with and without the covariate were reported. Data are reported as mean ± SEM unless otherwise stated. A significance level of 0.05 was applied.
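Assuming that the 75% criterion is applied to the total score on all 50 tasks, the cut-off falls at 37.5 correct answers. A T0 score of 37 or fewer therefore places a child among the low performers, and a score of 38 or more places the child among the normal performers. The following Python sketch implements that rule. The function name is hypothetical.

```python
N_TASKS = 50  # 39 1st-grade + 11 2nd-grade tasks


def performer_group(correct_answers, n_tasks=N_TASKS):
    """Label a child from the T0 score: 'low' if <= 75% correct answers, else 'normal'.

    Uses integer arithmetic (4 * score <= 3 * n_tasks) to avoid floating-point edge cases.
    """
    if not 0 <= correct_answers <= n_tasks:
        raise ValueError("score must lie between 0 and n_tasks")
    return "low" if 4 * correct_answers <= 3 * n_tasks else "normal"


print([performer_group(s) for s in (36, 37, 38, 39)])  # ['low', 'low', 'normal', 'normal']
```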

## RESULTS

## Baseline Mathematical, Cognitive, and Motor Performance

At T0, the groups performed similarly in the measures of mathematical performance, visuo-spatial short-term memory, and gross and fine motor performance (**Table 2**). Model-based t-tests revealed no significant between-group differences in these measures (p > 0.05). However, significant between-group differences at T0 were found in measures of inhibitory control and phonological short-term memory (**Table 2**). FMM performed significantly better than GMM in the accuracy interference score (p = 0.01), and CON performed significantly worse than both FMM (p = 0.049) and GMM (p = 0.03) in phonological short-term memory.

## Physical Load and Attendance during the Intervention

To evaluate the physical load of the math lessons during the intervention, we applied combined accelerometer and heart rate measures in 6 randomly selected math lessons. We also recorded attendance in the math lessons during the intervention. Group means of the physical load during the math lessons and of attendance during the intervention are presented in **Table 3**. No between-group difference was found in attendance [F(2, 161) = 0.51; p = 0.60], indicating similar attendance across groups during the intervention. An overall significant between-group difference was found for player load [F(2, 48) = 27.2, p < 0.001]: GMM displayed a higher player load during the math lessons than both FMM (t = 6.33, p < 0.001) and CON (t = 6.13, p < 0.001). A significant group-zone interaction was found for time spent in the different heart rate zones [F(2, 138) = 4.58, p = 0.002]: GMM spent more time in the moderate-to-vigorous heart rate zone (60–90% of HRmax) than both FMM (difference 10.0 ± 4.0%, p = 0.03) and CON (difference 10.6 ± 4.1%, p = 0.03). These results indicate that GMM experienced a significantly higher accelerometer-derived physical load during the math lessons and spent more time in the moderate-to-vigorous heart rate zone than both FMM and CON.

## Performance in the Mathematical Test

All groups improved their performance in the mathematical test from T0 to T1 and from T0 to T2 (**Table 2**). A significant group-time interaction was found from T0 to T1 [F(2, 288) = 3.49, p = 0.03]. As summarized in **Figure 2A**, the improvement in mean mathematical performance from T0 to T1 was significantly greater for GMM than for FMM, by 1.87 ± 0.71 correct answers (c.a.) (p = 0.02). This effect was not evident from T0 to T2 [F(2, 434) = 1.89, p = 0.15]. This indicates that GMM transiently improved children's performance in the mathematical test more than FMM did.

## Performance in the Mathematical Test in Normal and Low Performers

To assess whether the intervention effects differed between children characterized as normal and low mathematical performers, we performed a subgroup analysis. Within the normal performers, the improvement in mean mathematical performance from T0 to T1 was significantly greater for GMM than for both CON (by 1.78 ± 0.73 c.a., p = 0.04) and FMM (by 2.14 ± 0.72 c.a., p = 0.008) (**Figure 2B**). Additionally, the improvement from T0 to T2 was significantly greater for GMM than for CON (by 2.67 ± 0.71 c.a., p < 0.001) (**Figure 2B**). No significant differences between groups were observed within the low performers from T0 to T1 or from T0 to T2 (all p > 0.05) (**Figure 2C**). This indicates that the normal performers benefited from GMM compared to both FMM and CON, whereas the low performers did not.

TABLE 2 | Mathematical, cognitive and motor performance at T0, T1 and T2 for the two intervention groups (FMM, Fine motor math; GMM, Gross motor math) and the control (CON) group.


*Data reported as means ± SEM. Mathematical performance was estimated using a standardized, diagnostic Danish test. Visuo-spatial and phonological memory were estimated using a spatial span task and a word-recall task, respectively. Executive functions were estimated using a modified Eriksen Flanker Task; the interference effect is the difference between incongruent and congruent trials. Gross and fine motor performance were evaluated through a coordination wall task and the Purdue Pegboard, respectively.* \**Indicates a significant within-group difference from T0. §Indicates a significant within-group difference from T1.*

*<sup>a</sup>Indicates a significant between-group difference from CON at T0.*

*<sup>b</sup>Indicates a significant between-group difference from FMM at T0.*

*<sup>c</sup>Indicates a significant between-group difference from GMM at T0 (p < 0.05).*

## Contribution of Cognitive and Motor Covariates to Changes in Mathematical Performance

Additionally, we used a univariate analysis to investigate whether, and to what extent, potential covariates contributed to the results observed for mathematical performance. Group means of the cognitive and motor performance measures are shown in **Table 2**, and the results of the univariate analysis are presented in **Table 4**. When controlling for visuo-spatial short-term memory, the difference in mathematical performance between GMM and FMM was reduced to 1.22 ± 0.8 c.a. (p = 0.40). Changes in visuo-spatial short-term memory accounted for approximately 35% of the effect of the intervention on mathematical performance. Similar results were not found for any other measure of cognitive performance (see **Table 4**). When controlling for gross motor skills, the difference in mathematical performance between GMM and FMM was reduced to 1.41 ± 0.77 c.a. (p = 0.16). Changes in gross motor skill performance accounted for approximately 25% of the effect of the intervention on mathematical performance (see **Table 4**).
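The reported percentages match a definition of each covariate's contribution as the relative reduction in the GMM–FMM difference when the covariate is added: (1.87 − 1.22)/1.87 ≈ 35% and (1.87 − 1.41)/1.87 ≈ 25%. The Python sketch below reproduces this arithmetic from the estimates given in the text. Whether Table 4 uses exactly this definition is an assumption inferred from the numbers, and the function name is hypothetical.

```python
def percent_contribution(effect_without, effect_with):
    """Relative reduction (%) of a group-difference estimate after adding a covariate."""
    return 100.0 * (effect_without - effect_with) / effect_without


# GMM vs FMM, T0 to T1, in correct answers (estimates reported in the text)
vs_stm = percent_contribution(1.87, 1.22)  # controlling for visuo-spatial short-term memory
gross = percent_contribution(1.87, 1.41)   # controlling for gross motor skills
print(round(vs_stm), round(gross))  # 35 25
```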

## DISCUSSION

The aims of this study were threefold. First, we sought to investigate the immediate and longer-term effects of classroom-based integrated gross and fine motor activities on mathematical achievement in preadolescent children. Second, we investigated whether the intervention elicited different effects in normal and low math performers. Third, we sought to investigate the potential role of cognitive functions and motor skills in the physical activity-academic achievement relationship.

The main findings of the study were that motor-enriched learning activities can improve mathematical achievement. Indeed, in normal performers, gross motor enriched math lessons resulted in a greater improvement in mathematical performance than fine motor enriched and conventional math lessons after a 6-week intervention. In the normal performers, the advantage of gross motor enriched over conventional teaching was maintained 8 weeks after the end of the intervention. In the full sample, a positive effect was observed for gross motor enriched compared to fine motor enriched learning immediately after the intervention. These effects seem to be partly accounted for by changes in visuo-spatial short-term memory, gross motor skills and, to a minor degree, fine motor skills. These results add to the emerging literature consolidating the positive effects of integrating classroom-based motor activity to improve cognitive and academic performance in children (Donnelly et al., 2016). Interestingly, our results suggest that only normal-achieving children benefit from adding motor activity to the classroom curriculum, and that cognitive and motor abilities could contribute to the observed positive effects of the motor-enriched learning activities.

TABLE 3 | Attendance and physical load during the intervention for the two intervention groups (FMM, Fine motor math; GMM, Gross motor math) and the control (CON) group.

*Data reported as means ± SEM. Physical load obtained through six spot tests using time-synced heart rate measures and accelerometers. MVPA, moderate-to-vigorous physical activity; HR, heart rate. Low HR zone corresponds to 0–60% of HRmax, MVPA HR zone corresponds to 60–90% of HRmax and high HR zone corresponds to 90–100% of HRmax.*

*<sup>a</sup>Indicates a significant between-group difference from CON at T0.*

*<sup>b</sup>Indicates a significant between-group difference from FMM at T0.*

*<sup>c</sup>Indicates a significant between-group difference from GMM (p < 0.05).*

## Effects of Gross Motor-Enriched Learning Activities on Mathematical Performance

The observed behavioral effects might result from a combination of different mechanisms. Indeed, a recently published review proposed that classroom-based integrated physical activity could influence learning through various processes (Chandler and Tricot, 2015), including acute and chronic effects of the physiological response to exercise on the central nervous system (CNS) (Hillman et al., 2008). For example, physical exercise causes the release of a plethora of neurobiological substances, including brain-derived neurotrophic factor (BDNF), lactate and insulin-like growth factor 1 (IGF-1) (Skriver et al., 2014). Animal studies suggest that these substances might benefit neuroplastic processes acutely related to memory formation (e.g., Cotman and Berchtold, 2002; Cotman et al., 2007). However, previous studies have demonstrated an intensity-dependent dose-response relationship between exercise and the neurotrophic response (Ferris et al., 2007; Winter et al., 2007), with the greatest physiological response at intensities higher than those experienced during the gross motor enriched learning activities in the current study (22% of lesson time in the moderate-to-vigorous heart rate zone). Additionally, directly transferring animal-based molecular findings to human behavioral outcomes is inherently challenging (Voss et al., 2013). Indeed, while some human studies have found associations between blood concentrations of a number of neurotrophic agents and behavioral outcomes (Winter et al., 2007; Skriver et al., 2014), others have not (Ferris et al., 2007; Schmidt-Kassow et al., 2013).

At a functional and behavioral level in children, single bouts of moderate-to-vigorous physical activity have been related to efficient and rapid allocation of attentional resources (Hillman et al., 2009) and improved classroom behavior (Mullender-Wijnsma et al., 2015). Results from chronic intervention studies focusing on moderate-to-vigorous physical activity have also demonstrated these positive effects (e.g., Hillman et al., 2014). Moreover, single bouts of coordinatively demanding physical activity have been found to improve attention in adolescents (Budde et al., 2008), and chronic interventions employing coordinatively demanding physical activity have successfully improved children's attention (Chang et al., 2013; Gallotta et al., 2015). Attention is a key mediator of hippocampal-related declarative memory formation (Aly and Turk-Browne, 2016), which is probably related to schema-dependent academic learning and performance (van Kesteren et al., 2014).

We hypothesized a general effect of motor-enriched learning strategies. Yet, the gross motor learning activities were the only effective strategy for promoting mathematical performance. Arguably, the greater time spent in moderate-to-vigorous physical activity during the gross motor enriched learning activities, in conjunction with greater coordinative demands, could have favored brain processes that contributed to the effects of gross motor enriched learning activities on both an acute (during the lessons) and a chronic (throughout the entire intervention) time scale.

Additionally, based on theories of embodied cognition (i.e., the view that cognitive knowledge is grounded in bodily experience) (Barsalou, 2008), learning could be influenced by integrating task-related motor activity that links the content to be acquired to the performed motor activity. Indeed, procedural sensorimotor experiences might contribute to the acquisition of declarative knowledge (Koziol et al., 2012). Moreover, neuroanatomical structures previously thought to be primarily motor-related, including the cerebellum, might also be critically involved in controlling higher-order cognitive functions (Diamond, 2000; Koziol et al., 2012). Previous studies have indeed found a positive effect of performing movements related to the content to be acquired in academic domains, in both adults (Macedonia et al., 2011; Mayer et al., 2015) and young children (Mavilidi et al., 2015). Collectively, these studies demonstrate the positive effects of performing congruent motor activity on learning. Theoretically, it could be speculated that the gross motor enriched learning activities involved motor activity that the children were more likely to perceive as figuratively meaningful and congruent with the content to be acquired than the motor activity performed during the fine motor enriched learning activities. This was, however, not assessed during the study.

*Data reported as means ± SEM. Different covariates added to the statistical model explaining intervention effects in mathematical performance. Percent contribution displays the percentage accounted for by a single covariate. The intervention effects are primarily accounted for by visuo-spatial short-term memory and gross motor skills.*

Taken together, the quantitative (e.g., exercise intensity), qualitative (e.g., coordinative demands) and embodied (e.g., congruency) characteristics of the physical activity might all explain the behavioral differences observed, but more research is needed to pinpoint the exact mechanisms underlying the effects.

## Differential Effects of Interventions on Mathematical Performance Related to the Baseline Level of the Children

The results of our subgroup analyses showed that the benefits of gross motor enriched learning activities were confined to the normal performing children. The low-performing children generally showed larger improvements in mathematical performance; however, these improvements did not differ between groups. This contrasted with the normal performing children, in whom the benefits of gross motor enriched learning activities were present. Previous research has, on the other hand, shown that the effects of physical activity on cognition and academic involvement were greatest for low-performing children (Mahar et al., 2006; Drollette et al., 2014). However, whereas the present study measured mathematical performance in a longitudinal intervention design, these previous studies used measures of executive functioning (Drollette et al., 2014) and classroom behavior (Mahar et al., 2006) in acute study designs. Furthermore, both studies employed physical activity interventions centered on the quantitative characteristics of the performed activity. Moreover, their results might be partially biased by regression toward the mean, the statistical tendency for an "extreme" measurement to lie closer to the mean when it is measured a second time (e.g., Moreau et al., 2016). In contrast, the results of the current study do not seem to be biased by this phenomenon, as the benefits of gross motor enriched learning activities were observed within the normal performing subgroup. Taken together, these differences complicate direct comparisons with the results of the current study.
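Why regression toward the mean can mimic an intervention benefit in low performers, while leaving a normal performing subgroup comparatively unaffected, can be illustrated with a short simulation. The Python sketch below assumes no intervention at all: each child has a stable true score, and each test session adds independent measurement noise. The sample size, noise level and 20% cutoff are arbitrary assumptions for illustration, not values from the study.

```python
import random
import statistics


def simulate_regression_to_mean(n=10_000, true_sd=1.0, noise_sd=0.7,
                                cutoff_quantile=0.2, seed=1):
    """Simulate two test sessions with NO intervention (hypothetical data).

    Each child has a stable true ability; each observed score is that
    ability plus independent measurement noise. Children are split by
    their session-1 score and the mean session-2 minus session-1 change
    is computed for each subgroup.
    """
    rng = random.Random(seed)
    true_ability = [rng.gauss(0.0, true_sd) for _ in range(n)]
    pre = [t + rng.gauss(0.0, noise_sd) for t in true_ability]
    post = [t + rng.gauss(0.0, noise_sd) for t in true_ability]

    # Children below the baseline cutoff form the "low performer" subgroup.
    cutoff = sorted(pre)[int(cutoff_quantile * n)]
    low = [i for i in range(n) if pre[i] < cutoff]
    rest = [i for i in range(n) if pre[i] >= cutoff]

    def mean_gain(idx):
        return statistics.fmean(post[i] - pre[i] for i in idx)

    return mean_gain(low), mean_gain(rest)


low_gain, rest_gain = simulate_regression_to_mean()
print(f"apparent gain, low performers:    {low_gain:+.2f}")
print(f"apparent gain, normal performers: {rest_gain:+.2f}")
```

Children selected for low baseline scores show an apparent gain only because their baseline scores were, on average, pulled down by negative measurement error; the remaining children show a small apparent loss for the mirror-image reason.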

Importantly, however, our results contribute novel knowledge about who benefits from classroom-based integrated motor activity. The combined cognitive and motor demands of the gross motor enriched teaching strategies seem to produce positive effects uniquely for normal performers, and not for low performers. These findings support the notion of an optimal challenge point, as discussed by Pesce and colleagues (Pesce et al., 2013). Specifically, Pesce et al. (2013) found that enriching a physical activity intervention with additional cognitive demands targeting the participants' executive functions resulted in greater improvements in measures of flexible attention for typically developing children, but not for children with developmental coordinative motor deficits. The authors argued that children with motor deficits required more cognitive control merely to perform the physical activity, leaving fewer cognitive resources available to deal with additional cognitive challenges (Pesce et al., 2013). In line with this, one could speculate that the low-performing children, owing to their initially lower mathematical proficiency, were sufficiently challenged by the mathematical content to be acquired during the lessons, leaving fewer mental resources available to benefit from the additional motor demands posed by the gross motor enrichment. Collectively, enriching mathematics lessons with gross motor activity seems to be optimal for normal performing children, but not for low-performing children.

Despite these interesting findings, the current evidence in the field does not allow clear conclusions regarding how responsive individuals performing at different levels at baseline are to physical activity interventions. Future research, using studies designed specifically to address this question, is needed to investigate potential inter-individual differences in the effectiveness of interventions aiming to improve academic achievement in children.

# Visuo-Spatial Short-Term Memory and Gross Motor Skill Performance Account for Mathematical Improvements

The univariate covariation analysis arguably provides novel perspectives on the effects of physical activity on academic achievement, by showing that changes in visuo-spatial short-term memory and in gross motor skill performance partially accounted for the effects of the intervention on mathematical performance. This was not the case, to the same extent, for the other cognitive or motor measures included, indicating specific associations between physical activity, visuo-spatial memory, gross motor skills and mathematical achievement. In support of this, visuo-spatial memory has previously been related to mathematical achievement (Bull et al., 2008), especially in young children (Holmes and Adams, 2006). Moreover, previous cross-sectional findings have supported a relationship between measures of fine and gross motor proficiency and cognitive functions in both elderly adults (Voelcker-Rehage et al., 2010) and children (Geertsen et al., 2016). A non-interventional longitudinal study also found that motor skills in kindergarten predicted academic achievement in 1st grade (Roebers et al., 2014). The results of the current study add longitudinal and interventional evidence to this knowledge, suggesting that gross motor enriched learning activities might improve mathematical performance through improved gross motor skills and visuo-spatial short-term memory. These results fit well with the only other developmental study addressing the potential mediating role of motor skills, which pointed to a mediating role of gross motor skills, but not fine motor skills, for executive functioning (Pesce et al., 2016), a domain closely related to mathematical performance in children (St. Clair-Thompson and Gathercole, 2006). However, the causal interactions between the performance measures included in the current study are difficult to infer from these results, which should be seen as an initial exploration of possible mediating effects.
Previous studies applying classroom-based interventions to improve academic achievement through physical activity have not controlled for the contribution of covariates to the same extent as the current study. The novelty of the current findings therefore highlights the need to investigate the contribution of cognitive and motor covariates when evaluating the effects of physical activity on academic achievement in future studies.
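One plausible way to express how much a single covariate "accounts for" an intervention effect is to compare the group coefficient from a model without and with that covariate. The Python sketch below does this with ordinary least squares on simulated data. The variable names, effect sizes and simple two-group model are hypothetical; the sketch ignores the cluster (class) structure and repeated measures of an actual trial, so it should not be read as the statistical model used in this study.

```python
import numpy as np


def group_effect(y, group, covariate=None):
    """OLS coefficient of the group indicator, optionally adjusting for one covariate."""
    cols = [np.ones_like(y), group]
    if covariate is not None:
        cols.append(covariate)
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


def percent_contribution(y, group, covariate):
    """Share (%) of the unadjusted group effect that disappears once the covariate is added."""
    b_unadj = group_effect(y, group)
    b_adj = group_effect(y, group, covariate)
    return 100.0 * (b_unadj - b_adj) / b_unadj


# Simulated, hypothetical data.
rng = np.random.default_rng(0)
n = 4000
group = rng.integers(0, 2, size=n).astype(float)    # 1 = enriched lessons
memory_change = 0.5 * group + rng.normal(size=n)    # covariate partly driven by group
unrelated = rng.normal(size=n)                      # covariate unrelated to group
math_change = 0.3 * group + 0.6 * memory_change + rng.normal(size=n)

mem_pct = percent_contribution(math_change, group, memory_change)
unrel_pct = percent_contribution(math_change, group, unrelated)
print(f"memory covariate:    {mem_pct:.0f}%")
print(f"unrelated covariate: {unrel_pct:.0f}%")
```

Because each covariate is added one at a time, this univariate approach cannot separate overlapping contributions of correlated covariates, which is the collinearity limitation noted for this kind of analysis.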

# Strengths and Limitations

This study was strengthened by the ecological validity of the design, in which school classes were the unit of randomization in a cluster-randomized controlled trial. In addition, by controlling what, when and how the participants were taught while keeping their regular teachers and framework, we ensured that these factors did not substantially influence our results. Moreover, we evaluated the long-term effects of the intervention, which provides important knowledge about the stability of its effects. However, some limitations should be kept in mind when interpreting the results. First, the combination of a relatively small sample size and substantial intra- and inter-individual variability might have reduced the statistical power of the study. A larger sample would increase statistical power and the precision of the estimated intervention effects (Gelman and Carlin, 2014; Moreau et al., 2016). Additionally, we acknowledge that the test measures included in the study are subject to practice effects (i.e., test-retest effects). However, these would be expected to affect all groups equally and hence would not bias the between-group comparisons on which our conclusions are based. Even though we included various objective measures of motor skills and cognitive functions as potential covariates of the intervention effects on mathematical performance, our study could also have included measures of motivation, social interactions during the lessons, and mental well-being, as recently pointed out in the review by Diamond and Ling (2016), to strengthen the interpretation of the results. Indeed, integrating physical activity into the classroom has been found to influence motivational aspects (e.g., Vazou and Smiley-Oyen, 2014; Vazou and Skrade, 2016). Moreover, our univariate approach did not account for collinearity between predictors; with this shortcoming in mind, the results should be seen as an initial exploration of possible mediating effects.
However, more sophisticated techniques for mediation analysis, beyond the scope of the present study, such as the one applied by Pesce et al. (2016), would allow a more comprehensive exploration while accommodating collinearity. Inferring the exact mechanisms underlying the observed behavioral effects is challenging in the current study design, and future studies should investigate them. Structural and functional imaging techniques could prove valuable tools for widening our understanding of the underlying mechanisms.
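The concern about small samples (Gelman and Carlin, 2014) is not only low power: when a study is underpowered, the estimates that do reach significance tend to overstate the true effect. The Python sketch below is an illustrative re-implementation of that "design calculation" idea, computing power, the Type S (wrong sign) error rate and the exaggeration ratio (Type M) for an assumed true effect and standard error. The effect size and standard errors are arbitrary assumptions, not values from this study, and the function assumes a positive true effect.

```python
import random
from statistics import NormalDist


def retrodesign(true_effect, se, alpha=0.05, n_sims=100_000, seed=1):
    """Power, Type S rate and exaggeration ratio for an estimate with standard
    error `se`, given an assumed (positive) true effect."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lam = true_effect / se
    p_hi = 1 - NormalDist().cdf(z - lam)   # significant with the correct sign
    p_lo = NormalDist().cdf(-z - lam)      # significant with the wrong sign
    power = p_hi + p_lo
    type_s = p_lo / power

    # Exaggeration ratio: mean |estimate| among significant results / true effect.
    rng = random.Random(seed)
    estimates = (true_effect + se * rng.gauss(0, 1) for _ in range(n_sims))
    significant = [abs(e) for e in estimates if abs(e) > z * se]
    exaggeration = sum(significant) / len(significant) / true_effect
    return power, type_s, exaggeration


# Same hypothetical true effect, small study (large SE) vs. larger study (small SE).
power_small, type_s_small, exag_small = retrodesign(0.2, se=0.15)
power_large, type_s_large, exag_large = retrodesign(0.2, se=0.05)
print(f"small study: power={power_small:.2f}, exaggeration={exag_small:.2f}")
print(f"large study: power={power_large:.2f}, exaggeration={exag_large:.2f}")
```

Under these assumptions, significant results from the small study overestimate the true effect considerably, whereas the larger study yields nearly unbiased significant estimates.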

## CONCLUSION

Participation in math lessons that integrate gross motor activity can contribute positively to mathematical achievement in preadolescent children. In normal math performers, gross motor enrichment led to larger improvements than fine motor enrichment and conventional teaching. Across all children, gross motor enrichment resulted in greater mathematical achievement than fine motor enrichment. From a practical perspective, teachers and related personnel should consider integrating gross motor activity into learning activities relevant to the academic curriculum as a promising way to engage children and improve academic achievement. The subgroup differences suggest the need for individually tailored teaching activities, specifically related to each individual's optimal challenge point, to enhance learning in academic domains in children.

## AUTHOR CONTRIBUTIONS

MB, RL, JW, SG, and JL designed the experiment. MB, RL, and JW collected the data. MB, RL, JW, SG, and CR conducted the required data analysis. All authors contributed to drafting the manuscript, and all authors approved the final version of the manuscript.

#### FUNDING

This project was supported by a grant from the LEGO Foundation. LEGO Education provided the MoreToMath® products.

#### ACKNOWLEDGMENTS

We would like to thank Lasse Rehné Jensen, Laurits Munk Højberg, Christian Lillelund Jacobsen, Stefan Madsen, Hanna Wårfors, Meaghan Spedden, Andreas Blaaholm Nielsen, Sisse Kofoed Seide, Sofie Rejkjær Elleby for their help with data collection. We would like to thank Richard Thomas for valuable feedback on a previous version of this manuscript. We would like to thank the included schools and their teachers, and most importantly the children for participating in the study.

#### REFERENCES

mathematics curricula. Educ. Psychol. 26, 339–366. doi: 10.1080/01443410500341056

development. Ment. Health Phys. Act. 6, 172–180. doi: 10.1016/j.mhpa.2013.07.001


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

LEGO Education provided the MoreToMath® products but was otherwise not involved before, during, or after the project.

Copyright © 2016 Beck, Lind, Geertsen, Ritz, Lundbye-Jensen and Wienecke. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Mediators of Physical Activity on Neurocognitive Function: A Review at Multiple Levels of Analysis

Chelsea M. Stillman<sup>1,2</sup>\*, Jamie Cohen<sup>2,3</sup>, Morgan E. Lehman<sup>3</sup> and Kirk I. Erickson<sup>1,2,3</sup>\*

<sup>1</sup> Department of Psychiatry, School of Medicine, University of Pittsburgh Medical Center, Pittsburgh, PA, USA, <sup>2</sup> Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA, USA, <sup>3</sup> Department of Psychology, University of Pittsburgh, Pittsburgh, PA, USA

Physical activity (PA) is known to maintain and improve neurocognitive health. However, there is still a poor understanding of the mechanisms by which PA exerts its effects on the brain and cognition in humans. Many of the most widely discussed mechanisms of PA are molecular and cellular and arise from animal models. While information about basic cellular and molecular mechanisms is an important foundation from which to build our understanding of how PA promotes cognitive health in humans, there are other pathways that could play a role in this relationship. For example, PA-induced changes to cellular and molecular pathways likely initiate changes to macroscopic properties of the brain and/or to behavior that in turn influence cognition. The present review uses a more macroscopic lens to identify potential brain and behavioral/socioemotional mediators of the association between PA and cognitive function. We first summarize what is known regarding cellular and molecular mechanisms, and then devote the remainder of the review to discussing evidence for brain systems and behavioral/socioemotional pathways by which PA influences cognition. It is our hope that discussing mechanisms at multiple levels of analysis will stimulate the field to examine both brain and behavioral mediators. Doing so is important, as it could lead to a more complete characterization of the processes by which PA influences neurocognitive function, as well as a greater variety of targets for modifying neurocognitive function in clinical contexts.

Edited by: Louis Bherer, Université de Montréal, Canada

#### Reviewed by:

Arun Bokde, Trinity College, Dublin, Ireland

Cédric T. Albinet, Jean-François Champollion University Center for Teaching and Research, France

#### \*Correspondence:

Chelsea M. Stillman, cms289@pitt.edu

Kirk I. Erickson, kiericks@pitt.edu

Received: 11 August 2016

Accepted: 24 November 2016

Published: 08 December 2016

#### Citation:

Stillman CM, Cohen J, Lehman ME and Erickson KI (2016) Mediators of Physical Activity on Neurocognitive Function: A Review at Multiple Levels of Analysis. Front. Hum. Neurosci. 10:626. doi: 10.3389/fnhum.2016.00626

Keywords: brain, cognition, exercise, physical activity, mechanisms, mediation

# INTRODUCTION

Physical activity (PA) is important for maintaining physical health. PA has also been shown to maintain and improve neurocognitive health, but much less is known about the mechanisms by which it exerts its salutary effects on the brain and cognition in humans (for reviews, see e.g., Tomporowski and Ellis, 1986; Hillman et al., 2008; Gomez-Pinilla and Hillman, 2013). At the same time, research has found that physical inactivity is a risk factor for cognitive impairment, which has stimulated interest in examining whether PA can improve neurocognitive function, as well as the mechanisms by which it might do so.

What are the cellular and molecular, brain systems, and behavioral mechanisms by which PA influences cognitive function, and how can they be identified? Two frameworks are typically used for causal inference: one relates to the design of the study, and the other to the statistical approach used for analysis (Imai et al., 2011).

The gold standard for assessing causality is an **experimental manipulation**. A study is "experimental" when an independent variable (here, PA) is manipulated to examine its influence on a dependent variable (here, cognition). In the context of exercise studies, human or animal subjects are randomly assigned to an experimental condition (e.g., exercise treatment) or a control group (e.g., standard care), and the outcome of interest is assessed in each group while all other factors are held constant. In the context of PA, this type of causal evidence comes almost exclusively from exercise training studies in humans, or from animal models of exercise in which one group of animals is permitted to exercise while another group is treated as a control. Causality is established if the outcome variable (e.g., cognition) changes to a greater extent in the treatment (e.g., exercise training) group than in the control group.
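Under random assignment, the criterion that the outcome "changes to a greater extent in the treatment group" can be made concrete as a test on change scores. The Python sketch below uses a one-sided permutation test, which is one of several reasonable choices (a t-test, or an ANCOVA on post-test scores adjusting for baseline, would be common alternatives). The change scores are invented for illustration.

```python
import random
import statistics


def permutation_test(change_treat, change_ctrl, n_perm=10_000, seed=1):
    """One-sided permutation test: is the mean change larger in the treatment group?

    Random assignment justifies shuffling group labels under the null
    hypothesis that the treatment has no effect on change scores.
    """
    observed = statistics.fmean(change_treat) - statistics.fmean(change_ctrl)
    pooled = list(change_treat) + list(change_ctrl)
    n_t = len(change_treat)
    rng = random.Random(seed)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:n_t]) - statistics.fmean(pooled[n_t:])
        if diff >= observed:
            count += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (count + 1) / (n_perm + 1)


# Hypothetical pre-to-post changes in a cognitive score.
exercise_change = [3, 5, 2, 6, 4, 5, 3, 7]
control_change = [1, 0, 2, -1, 1, 2, 0, 1]
observed, p_value = permutation_test(exercise_change, control_change)
print(f"difference in mean change = {observed:.3f}, one-sided p = {p_value:.4f}")
```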

Of course, it is not always feasible or practical to randomly assign participants to groups and experimentally manipulate treatment variables such as exercise, especially when studying humans. In this case, an alternative framework for causal inference can be used. **Statistical mediation** is used to test the plausibility of causal models not only in experimental studies, but also in observational, longitudinal, or quasi-experimental designs in which random assignment did not occur and/or the treatment variable of interest was not directly manipulated. Further, statistical mediation can be used in at least two different contexts. It is important to distinguish between these contexts as they influence the type of conclusions that can be drawn from the model's results.

In the first context (i.e., observational, longitudinal, and quasi-experimental studies), mediation models allow us to evaluate alternative causal pathways linking the treatment and outcome variables by examining the roles of several intermediate variables that lie on the causal path. The benefit of using statistical mediation models in this context is that we are able to obtain evidence of a potential causal pathway using data or designs that are inherently non-experimental. Importantly, however, it is still not possible to rule out whether some third, confounding factor is driving the pattern of statistical mediation observed in a non-experimental design.
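How an unmeasured third factor can produce apparent mediation in non-experimental data can be shown with a small simulation. In the Python sketch below, a hypothetical confounder `u` (for instance, general health) raises PA, the candidate mediator and the outcome, while neither PA nor the mediator has any causal effect on the outcome. All variable names and coefficients are illustrative assumptions.

```python
import numpy as np


def ols(y, *predictors):
    """OLS slopes (intercept dropped) of y on the given predictors."""
    X = np.column_stack([np.ones_like(y), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]


rng = np.random.default_rng(42)
n = 5000
u = rng.normal(size=n)                          # unmeasured confounder (hypothetical)
pa = 0.7 * u + rng.normal(size=n)               # physical activity
m = 0.4 * pa + 0.7 * u + rng.normal(size=n)     # candidate mediator (e.g., a brain measure)
y = 0.7 * u + rng.normal(size=n)                # outcome: NO causal effect of pa or m

a = ols(m, pa)[0]          # path a: PA -> mediator
b = ols(y, m, pa)[0]       # path b: mediator -> outcome, controlling for PA
indirect = a * b
b_adj = ols(y, m, pa, u)[0]  # path b once the confounder is (unrealistically) observed

print(f"indirect effect a*b ignoring u: {indirect:.3f}")
print(f"path b after adjusting for u:   {b_adj:.3f}")
```

The indirect effect is clearly non-zero even though the mediator does nothing; it vanishes only when the confounder is measured and adjusted for, which is rarely possible in practice.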

In the second context, a "gold standard" experimental design is used alongside a statistical mediation model. Using statistical models in experimental designs allows us to further examine intermediate factors that might covary with the treatment and outcome variables, and to test their plausibility as causal paths. Significant statistical mediation in this latter context provides more definitive evidence of a causal path because random assignment minimizes the influence of confounders and the direction of the relationships in the model can be established (i.e., it is possible to demonstrate that a manipulation caused changes in the mediator and outcome, rather than the reverse).

In both contexts described above, an intermediate variable is considered a mediator if the coefficient describing the strength of the treatment-outcome relationship through the mediating variable (i.e., the indirect effect) is statistically significant (Preacher and Hayes, 2008). In other words, testing the significance of the indirect effect examines whether the mediator is a viable mechanism by which the independent (i.e., treatment) variable influences the outcome. Although statistical mediation can, and perhaps should, routinely be used in experimental studies, it is most frequently used to infer possible causal relationships from non-experimental data.
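A common way to test the indirect effect, in the spirit of the resampling approach described by Preacher and Hayes (2008), is to estimate it as the product of path a (treatment to mediator) and path b (mediator to outcome, controlling for treatment) and to form a percentile bootstrap confidence interval. The Python sketch below applies this to simulated data; the group coding, effect sizes and number of bootstrap draws are illustrative assumptions and are not taken from any study reviewed here.

```python
import numpy as np


def slopes(y, *xs):
    """OLS slopes (intercept dropped) of y on the given predictors."""
    X = np.column_stack([np.ones_like(y), *xs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]


def indirect_effect(x, m, y):
    a = slopes(m, x)[0]        # path a: treatment -> mediator
    b = slopes(y, m, x)[0]     # path b: mediator -> outcome, controlling for treatment
    return a * b


def bootstrap_ci(x, m, y, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the indirect effect a*b."""
    rng = np.random.default_rng(seed)
    n = len(x)
    draws = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample participants with replacement
        draws[i] = indirect_effect(x[idx], m[idx], y[idx])
    tail = (1 - level) / 2 * 100
    return np.percentile(draws, [tail, 100 - tail])


# Simulated, hypothetical trial data.
rng = np.random.default_rng(1)
n = 1000
x = rng.integers(0, 2, size=n).astype(float)   # 1 = exercise group
m = 0.5 * x + rng.normal(size=n)               # e.g., change in a brain measure
y = 0.4 * m + 0.1 * x + rng.normal(size=n)     # e.g., change in cognition

ab = indirect_effect(x, m, y)
lo, hi = bootstrap_ci(x, m, y)
print(f"indirect effect = {ab:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

An interval that excludes zero is taken as evidence of mediation; in a randomized design, path a is protected from confounding, but path b still relies on the no-confounding assumption for the mediator-outcome relationship.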

For the purposes of the present review, both types of causal evidence (design-related and statistical mediation) are relevant and will be discussed in the context of understanding the mechanisms by which PA influences cognitive function. However, a critical caveat of the statistical mediation approach in quasi-experimental and observational studies is that causal relationships between a predictor and an outcome through the mediator cannot be definitively determined, even in the case of significant statistical mediation. Because random assignment did not occur, there could be pre-existing differences between the groups. Nevertheless, statistical mediation approaches can provide evidence that one mediation pattern is more plausible than another, and they provide valuable theoretical insight for the design of future experimental studies.

Based on the description of causality outlined above, mechanisms can be conceived at multiple levels. However, most studies have taken a reductionist approach to mechanisms. To date, for example, most systematic reviews focusing on the causal mechanisms by which PA affects cognition have focused on cellular and molecular mechanisms, hereafter referred to as those at Level 1 of analysis. Consequently, there is a wealth of evidence (the majority from animal models) indicating that PA likely improves cognition by promoting various cellular and molecular pathways, including those responsible for neurogenesis and angiogenesis (van Praag et al., 2005), while decreasing others, such as inflammation (e.g., Parachikova et al., 2008). While information about basic cellular and molecular mechanisms is an important foundation from which to build our understanding of how PA promotes cognitive health in humans, there are other possible ways of thinking about mechanisms.

Just as mechanisms can be assessed in multiple ways (e.g., statistically or experimentally), they can also be assessed at multiple levels, ranging from the cellular and molecular level (i.e., Level 1) up to other, more macroscopic levels of analysis (see **Figure 1** for conceptual model). We will refer to brain systems and behavioral mechanisms as those at Level 2 and Level 3 of analysis, respectively, and describe them further below. Separating mediators into different levels of analysis may be informative because PA-induced changes to cellular and molecular pathways (Level 1) likely initiate changes to macroscopic properties of the brain (Level 2) and behavior (Level 3) that in turn influence cognition. For example, changes in brain structure and function (Level 2) and behaviors, such as in socioemotional functions (e.g., mood, motivation, or sleep) (Level 3) could mediate improvements in cognitive performance following PA. Crucially, our choice to organize potential mediators into different levels is not meant to imply that the mechanisms at each level are mutually exclusive. In fact, pathways identified using Levels 2 and 3 analyses are necessarily invoked by changes at lower levels of analysis, and bidirectional effects likely exist between levels (e.g., feedback loops from higher levels influencing lower levels of analysis).

For conceptual purposes, however, we have chosen to discuss evidence for mediators at each level of analysis separately.

Prior reviews have focused almost entirely on Level 1 mechanisms at the expense of the other levels. The goal of this review is to use a more macroscopic lens that attempts to identify Levels 2 and 3 mediators of the association between PA and cognitive function. To accomplish this, we will briefly summarize what is known regarding cellular and molecular mechanisms and will devote the remainder of the review to discussing the brain systems and behavioral/socioemotional mechanisms at Levels 2 and 3 of analysis. It is our hope that discussing mechanisms at multiple levels of analysis will stimulate the field to examine systems and socioemotional mediators, leading to a more complete characterization of the processes by which PA influences neurocognitive function, and a greater variety of targets for modifying neurocognitive function.

In summary, the main aim of the present review is to summarize the evidence that macroscopic brain and behavioral/psychological changes can mediate the effects of exercise. We define high-quality evidence of mediation as evidence coming either from randomized controlled trials (RCTs) or from correlational/cross-sectional work in which a statistical mediation model is tested after a pattern of correlations consistent with mediation has been found. We are not aware of any other reviews to date that consider such macroscopic (i.e., non-molecular) brain and behavior changes as mediators of exercise effects.

#### Key Definitions

Before we discuss the evidence for mechanisms at each level of analysis, we will first define several terms related to cognition and PA that will be used throughout the following sections. Neurocognition is a broad term referring to the brain and its cognitive functions. We will use this term when referring to general observations about the effects of PA on both brain and cognitive outcomes. Cognition is a slightly more specific term. We will use it when referring to particular behavioral performance measures, such as those frequently used to assess the effects of PA in experimental studies.

Physical activity (PA) is a broad term referring to an activity that raises heart rate above resting levels (Caspersen et al., 1985). This could include anything from housework or gardening to walking or lifting weights. Importantly, the term PA has also been used to refer to coordinative activities, such as those requiring balance and higher-order cognitive processes (Voelcker-Rehage and Niemann, 2013). However, it is the former definition of PA, referring to aerobic, heart-rate-raising activity, that will be the focus of the present review, as research on PA and neurocognitive function is dominated by this definition. Further, different types of PA may have overlapping and distinct mechanisms (Voelcker-Rehage et al., 2010; Voelcker-Rehage and Niemann, 2013; Niemann et al., 2014). Aerobic PA, hereafter referred to as "PA," is often assessed subjectively in research studies, but it can also be measured objectively using devices such as accelerometers. Regular participation in PA influences aerobic fitness. Aerobic fitness (hereafter just "fitness") is a measure of cardiovascular efficiency and is often measured with a graded maximal exercise test. The main outcome measure from graded exercise tests is VO2max, a metric describing a person's maximal oxygen uptake. VO2max is widely accepted as the gold-standard measure of the functional limit of the cardiovascular system. As such, it is often used as the primary objective measure of fitness in non-experimental studies, or to assess whether a PA intervention was effective. Both PA and fitness are often used to describe the results of non-experimental designs. Exercise refers to any structured activity that is intended to improve physical fitness. This term is often used to describe the type of PA occurring in training studies. 
For the purposes of this review, we will use PA or fitness when referring to study outcomes and will use PA when discussing overall concepts or discussing mechanisms more generally. We will use exercise exclusively when referring to studies employing a randomized, controlled experimental design.

# LEVEL 1: MOLECULAR AND CELLULAR MECHANISMS

## PA and Cognitive Functioning

Much of what is known about the cellular and molecular mechanisms linking PA to cognitive functioning comes from animal models. This is because animals (most often mice or rats) can be randomly assigned to an exercise or control group while their external environments are controlled for the duration of the study. This experimental control makes it possible to examine whether differences in outcomes between groups can be attributed to exercise rather than to other, unmeasured factors. In a typical study, animals in the exercise group are given free access to a running wheel (i.e., voluntary exercise), while those in the control group are not given access to a running wheel to ensure they are comparatively inactive; all other environmental conditions (e.g., diet) are held constant. Cognitive function is assessed at the conclusion of the study (typically lasting 2–8 weeks), and physiological and brain changes are evaluated shortly afterward during autopsy.

Experimental studies employing animal models have established that PA (in particular, aerobic exercise) improves cognitive function, especially in cognitive domains dependent on the hippocampus, such as spatial or relational learning and memory, object recognition (e.g., Hopkins and Bucci, 2010; Bechara and Kelly, 2013), and avoidance learning (e.g., Baruch et al., 2004; Chen et al., 2008) (see van Praag, 2008 for review). In addition, exercise increases long-term potentiation, a cellular analog of learning and memory, in a hippocampal sub-region known as the dentate gyrus (e.g., van Praag et al., 1999). Animal models have been critical in establishing that the changes initiated by exercise extend beyond behavior into cognition, prompting further research into the mechanisms underlying exercise-induced synaptic, and downstream cognitive, changes.

#### Molecular Mechanisms

Exercise exerts its salutary effects on learning and memory by modulating key growth factor cascades responsible for energy maintenance and synaptic plasticity (**Figure 2**). Currently, the two pathways most studied in relation to the PA-neurocognition link are brain-derived neurotrophic factor (BDNF) and insulin-like growth factor-1 (IGF-1) (for reviews of mechanisms see Cotman et al., 2007; Gomez-Pinilla and Hillman, 2013). Exercise increases BDNF and IGF-1 gene expression and protein levels, both in the periphery and in several brain regions, with the most robust and long-lasting changes in the hippocampus (Voss et al., 2013b; Duzel et al., 2016). BDNF and IGF-1 signaling are considered to be causal pathways underlying exercise-related neurocognitive improvements because they are necessary to observe exercise-induced cellular effects. That is, experimentally blocking signaling in these pathways (e.g., with receptor-blocking ligands) eliminates or attenuates the beneficial effects of exercise on cellular and molecular pathways related to cognition (e.g., long-term potentiation) (Cotman et al., 2007). Most of the initial studies manipulating the action of the BDNF or IGF-1 pathways focused on the cellular consequences of this manipulation, and not on cognition itself. However, there is evidence that blocking BDNF attenuates behavioral learning and memory improvements following exercise (Vaynman et al., 2004). Therefore, exercise-related increases in at least one of these growth factors have been directly linked to both cellular and cognitive changes.

Blocking IGF-1 signaling also prevents exercise-induced increases in BDNF, suggesting that the two pathways converge at certain points in their cascades (Carro et al., 2000; Ding et al., 2006). Thus, the results of experimental animal studies have established that both BDNF and IGF-1 are mediators of exercise-induced cognitive improvements and that their relationship may be interdependent, and also involve other molecules and cascades. Therefore, while BDNF and IGF-1 are two molecular pathways affected by exercise, there are likely many others involved. Nonetheless, the heavy theoretical focus on these two molecular pathways has spurred research into other possible cellular mechanisms underlying effects of exercise on neurocognition, such as vascular endothelial growth factor (VEGF) and various neurotransmitters (e.g., serotonin) (Fabel et al., 2003; Cotman et al., 2007; Hamilton and Rhodes, 2015).

#### Cellular Mechanisms

Angiogenesis, the development of new blood vessels, and neurogenesis, the development of new neurons, are complex cellular changes that result from increased growth factor production and up-regulated molecular cascades. Both processes have emerged as viable candidates mediating the relationship between exercise and cognition (Cotman et al., 2007) (**Figure 2**). Changes in neurovasculature precede neurogenesis in rodents, particularly in the hippocampus (van Praag, 2008). Therefore, improvements in cognition following exercise may be due, in part, to increased growth of blood vessels, which in turn stimulates cell proliferation and survival.

Neurogenesis, particularly that which occurs in the dentate gyrus, is one of the most replicated cellular changes linked to exercise (van Praag, 2008). The mediating role of neurogenesis in exercise-related cognitive changes was once controversial (Leuner et al., 2006; Meshi et al., 2006), but it is now more widely accepted as a viable mechanism underlying learning and memory improvements (Clark et al., 2008; van Praag, 2008). Following 2–3 weeks of voluntary exercise in rodents, there is an increase in the dendritic length and complexity of existing neurons, as well as in neural progenitor proliferation, in the dentate gyrus compared to control animals (Eadie et al., 2005; Cotman and Berchtold, 2007). Importantly, exercise-induced increases in the number of newborn, dividing neurons occur in regions that overlap with those showing enhanced synaptic plasticity and growth factor expression following exercise (e.g., the dentate gyrus). Exercise-induced changes in vasculature are less regionally specific (Cotman et al., 2007; Voss et al., 2011; Vivar et al., 2013). Moreover, abolishing the division of new neurons during environmental enrichment manipulations that include exercise, or inhibiting the integration of these neurons into the existing hippocampal cell structure, eliminates the learning and memory improvements typically observed following such manipulations (Bruel-Jungerman et al., 2005; Vivar et al., 2012). Thus, animal studies suggest that neurogenesis and the survival and integration of new neurons into existing cellular networks are necessary to observe some cognitive improvements following exercise, particularly in the domain of learning and memory. Yet, several studies have suggested that neurogenesis alone is not enough to induce cognitive changes, and that cognitive changes arise only when the new neurons successfully integrate into an existing body of cells. 
This pattern of results establishes neurogenesis – potentially by way of angiogenesis – as another, slightly more macroscopic mechanism at Level 1 of analysis and further highlights the importance of examining mechanisms from multiple perspectives, even within our conceptualized three levels of analysis.

The bulk of evidence for the molecular and cellular mechanisms of exercise comes from animal models. One limitation of animal models is that the results cannot always be directly extrapolated to humans. Indeed, the cellular and molecular mechanisms of exercise-induced improvements in cognitive functioning in humans (not only with regard to learning and memory, but also other cognitive domains) remain largely unknown. This is, in part, because there are limited techniques available to measure cellular and molecular pathways in the human brain.

Despite these limitations, animal models have been critical in establishing that molecular and cellular changes occur in response to PA, particularly in the hippocampus. They have also provided evidence that changes in molecular and/or cellular pathways mediate cognitive changes, supporting the view that these pathways are underlying mechanisms of the PA-cognition link. We have discussed Level 1 mechanisms only briefly above because of the many reviews already published summarizing this literature (e.g., Cotman et al., 2007; Hillman et al., 2008; van Praag, 2008; Lista and Sorrentino, 2009; Gomez-Pinilla and Hillman, 2013). However, these molecular and cellular mechanisms likely induce more macroscopic changes in the brain, which leads us to Level 2 of analysis.

# LEVEL 2: MACROSCOPIC BRAIN SYSTEMS

## Statistical Mediation – Cross-Sectional Studies

The molecular and cellular mechanisms of exercise in humans have been difficult to establish because it is not possible to experimentally manipulate or measure cellular and molecular processes in humans the way we do in animals, through the use of brain tissue samples. Fortunately, advances in neuroimaging have allowed us to examine, in vivo and noninvasively, more macroscopic effects of PA on the structure and function of brain regions and circuits. But this leads to a critical question: do the effects of exercise and fitness on neuroimaging markers (e.g., volume) mediate cognitive outcomes, or are they merely a by-product of increased exercise with no bearing on cognition?

Most studies assessing mechanisms at Level 2 of analysis have examined how gray and white matter morphology are associated with PA and, in turn, whether these associations mediate differences in cognitive performance. We use the term morphology broadly to refer to changes in brain structure, most often assessed by measuring the volume of gray and/or white matter, or white matter integrity. Separate bodies of literature have demonstrated that brain morphology relates to cognitive function (e.g., Madden et al., 2012; Zhang et al., 2015) or to PA (e.g., Burzynska et al., 2014; Smith et al., 2014). However, testing brain morphology as a mechanism through which PA or fitness influences cognition is comparatively new. For example, in a cross-sectional study, Erickson et al. (2009) examined the possibility that links between fitness and memory function could be accounted for by hippocampal volume. Using statistical mediation modeling, they demonstrated that hippocampal volume significantly mediated the relationship between fitness and spatial memory. The authors used a statistical framework to test their mechanistic hypothesis because the study was not an experimental manipulation of PA (i.e., a randomized controlled trial). Nevertheless, the results provide insight into a possible causal role of the hippocampus that would be further tested in later experimental manipulations, as described below.
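The statistical mediation logic used in these cross-sectional studies can be sketched in a few lines. The example below is a minimal, hypothetical illustration of the product-of-coefficients approach: path *a* is the slope of the mediator (a simulated hippocampal volume) on the predictor (simulated fitness), path *b* is the slope of the outcome (simulated memory) on the mediator while adjusting for the predictor, and the indirect effect is *a* × *b*, with a percentile bootstrap confidence interval. The variable names, effect sizes, and sample size are assumptions chosen for illustration; the data are simulated rather than taken from Erickson et al. (2009), and the cited studies may have used different estimation procedures.

```python
import random
from statistics import fmean


def ols_slope(x, y):
    """Slope of the simple regression y ~ x (path a when y is the mediator)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def ols_two(x, m, y):
    """Slopes (c_prime, b) of y ~ x + m, solved from the centered normal equations."""
    mx, mm, my = fmean(x), fmean(m), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    smm = sum((mi - mm) ** 2 for mi in m)
    sxm = sum((xi - mx) * (mi - mm) for xi, mi in zip(x, m))
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    smy = sum((mi - mm) * (yi - my) for mi, yi in zip(m, y))
    det = sxx * smm - sxm ** 2
    c_prime = (smm * sxy - sxm * smy) / det
    b = (sxx * smy - sxm * sxy) / det
    return c_prime, b


def indirect_effect(x, m, y):
    """Product-of-coefficients indirect effect a * b."""
    a = ols_slope(x, m)        # X -> M
    _, b = ols_two(x, m, y)    # M -> Y, adjusting for X
    return a * b


def bootstrap_ci(x, m, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a * b."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample participants with replacement
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[int(alpha / 2 * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Simulated data with hypothetical effect sizes: fitness -> volume -> memory.
rng = random.Random(42)
n = 300
fitness = [rng.gauss(0, 1) for _ in range(n)]
volume = [0.5 * f + rng.gauss(0, 1) for f in fitness]          # a = 0.5
memory = [0.4 * v + 0.1 * f + rng.gauss(0, 1)                  # b = 0.4, c' = 0.1
          for v, f in zip(volume, fitness)]

ab = indirect_effect(fitness, volume, memory)
lower, upper = bootstrap_ci(fitness, volume, memory)
print(f"indirect effect a*b = {ab:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```

Under this approach, a bootstrap interval that excludes zero is the usual criterion for a significant indirect effect; in cross-sectional data it remains a statistical, not a causal, test.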

Similar statistical mediation has been reported across the lifespan, suggesting that these associations may be independent of age. For example, in a group of 49 preadolescent children, Chaddock et al. (2010) found that higher-fit children had larger hippocampal volumes than lower-fit children, and that larger hippocampal volumes were associated with superior relational memory performance. Importantly, they found that bilateral hippocampal volume mediated the relationship between fitness and memory task performance. A study of older adults with mild cognitive impairment (MCI) reported a similar pattern of results using hippocampal volume (Makizako et al., 2015). The results of these studies are consistent with animal studies demonstrating that the cognitive-enhancing effects of exercise can be traced to changes in the molecular and cellular architecture of the hippocampus. However, one limitation of human studies is that we cannot determine which molecular and cellular pathways mediate the associations with hippocampal volume (Braun and Jessberger, 2014).

Importantly, PA research in humans has also revealed that the hippocampus is not the only region mediating the link between exercise and cognition. Recent work suggests that changes to regions other than the hippocampus (e.g., prefrontal cortex) mediate some cognitive improvements in humans. For example, in a cross-sectional study, Weinstein et al. (2012) demonstrated that higher cardiorespiratory fitness levels were associated with better performance on both executive control and working memory tasks. Fitness levels were also associated with greater gray matter volume in several prefrontal brain regions. Further, the volume of these prefrontal regions statistically mediated the relationship between fitness and executive function and working memory performance. Similarly, Verstynen et al. (2012) found that caudate nucleus volume statistically mediated the relationship between fitness and cognitive flexibility, a function known to be supported by this region. Thus, in addition to the hippocampus, volumetric differences in the prefrontal cortex and caudate nucleus may mediate fitness- or exercise-related improvements in executive control and cognitive flexibility. These results demonstrate, as would be expected, that brain regions that support certain cognitive processes are the same regions that also statistically mediate associations between fitness or PA and cognitive performance in particular domains.

In addition to the cross-sectional studies showing that gray matter volume may statistically mediate the fitness-cognition relationship, the integrity of white matter tracts may also mediate the link between PA and cognitive functioning. The first line of evidence for this idea comes from studies showing that white matter integrity has a clear association with cognitive performance across a number of domains (Fjell et al., 2011; Bennett and Madden, 2014). The second line of evidence comes from studies showing that higher levels of PA are associated with greater white matter integrity (Smith et al., 2016). Given these patterns of findings, white matter integrity is another potential mediator of the link between PA and cognitive performance. This mechanism has only recently been tested. In two independent samples with a total of 267 healthy older adults, Oberlin et al. (2016) reported that white matter integrity in diffuse tracts statistically mediated the relationship between cardiorespiratory fitness (as measured by a VO2max test) and spatial working memory performance. These tracts included those connecting the medial temporal to prefrontal cortices – the same brain regions discussed above that have been found to be associated with fitness and PA in studies examining gray matter volume. Overall, these results extend the research on brain volume by demonstrating that aerobic fitness may also be associated with cognition through its associations with white matter microstructure.

The studies described above provide evidence that changes in the structure of both gray and white matter statistically mediate the relationship between PA (or fitness, in the case of cross-sectional work) and cognition. However, PA could also induce changes in the functioning of certain brain regions, most often operationalized as functional activation or connectivity, either as a result of or independently of changes in brain structure. Functional MRI (fMRI) markers have been found to differ between groups or to change in response to an intervention; the question is whether these changes mediate improvements in cognition. Several recent cross-sectional studies have examined functional activation as a statistical mediator of the effects of PA on cognition. Building on evidence of a relationship between fitness, executive control, and prefrontal functioning (Colcombe et al., 2004), Wong et al. (2015) examined the relationship between cardiorespiratory fitness (via VO2max), executive functioning (via dual-task processing), and prefrontal cortex activation. A statistical mediation model revealed that activation of a region in the anterior cingulate/prefrontal cortex significantly mediated the relationship between cardiorespiratory fitness and dual-task performance, such that those who were more fit had more activation in this region. Similarly, Hyodo et al. (2016) found that the level of activation in the left dorsolateral prefrontal cortex statistically mediated the association between higher fitness and less cognitive interference (via a Stroop task). A recent study using functional near-infrared spectroscopy (fNIRS) reported a pattern of relationships consistent with a mediating role of prefrontal cortex activation in the fitness-executive control relationship (Albinet et al., 2014).

The pattern of results reported in these cross-sectional studies supports the argument that PA influences cognition through its effects on the functional allocation of neural resources (i.e., functional activation) during cognitive tasks. The results of several other cross-sectional studies assessing links between fitness, neural functioning, and cognition support this general idea (Dupuy et al., 2015; Gauthier et al., 2015). However, these studies did not fully test for mediation because they either did not observe or did not test for the prerequisite correlations among the variables.

Cross-sectional studies using statistical mediation provide a theoretical and mechanistic foundation for understanding the relationships between cardiorespiratory fitness, brain, and cognition. However, their correlational nature leaves open the possibility that the observed behavioral and structural differences between higher- and lower-fit groups are caused by some unmeasured factor. RCTs are necessary to account for potential selection bias and to establish a direct, causal relationship in humans between aerobic fitness, brain structure, and cognitive functioning.
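The concern about unmeasured factors can be made concrete with a small simulation. In the hypothetical setup below, an unobserved variable `u` (standing in for any unmeasured factor, such as general health) drives both the mediator and the outcome, while the predictor has no causal effect on either. When the predictor is self-selected (it also depends on `u`), the estimated indirect effect *a* × *b* is clearly nonzero; when the predictor is randomly assigned, as in a two-arm trial, path *a* and therefore *a* × *b* shrink toward zero. All names and effect sizes are illustrative assumptions. Note that randomization protects only path *a*: the spurious path *b* persists in both scenarios because the mediator itself is not randomized.

```python
import random
from statistics import fmean


def slope(x, y):
    """OLS slope of y ~ x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def residualize(v, x):
    """Residuals of v after regressing it on x (with intercept)."""
    k = slope(x, v)
    mx, mv = fmean(x), fmean(v)
    return [vi - mv - k * (xi - mx) for vi, xi in zip(v, x)]


def paths(x, m, y):
    """Return (a, b): a from m ~ x; b is the m coefficient in y ~ x + m,
    obtained via the Frisch-Waugh-Lovell theorem (residualize m and y on x)."""
    a = slope(x, m)
    b = slope(residualize(m, x), residualize(y, x))
    return a, b


def simulate(n, randomized, seed):
    """Hypothetical data: u drives m and y; x has NO causal effect on m or y."""
    rng = random.Random(seed)
    u = [rng.gauss(0, 1) for _ in range(n)]
    if randomized:
        x = [rng.randint(0, 1) for _ in range(n)]   # coin-flip group assignment, independent of u
    else:
        x = [ui + rng.gauss(0, 1) for ui in u]      # "fitness" partly reflects u (self-selection)
    m = [ui + rng.gauss(0, 1) for ui in u]          # "brain" marker driven only by u
    y = [ui + rng.gauss(0, 1) for ui in u]          # "cognition" driven only by u
    return x, m, y


results = {}
for label, randomized in [("observational", False), ("randomized", True)]:
    a, b = paths(*simulate(2000, randomized, seed=1))
    results[label] = (a, b, a * b)
    print(f"{label:>13}: a = {a:.2f}, b = {b:.2f}, a*b = {a * b:.2f}")
```

In this hypothetical world the observational analysis would suggest mediation even though no causal pathway from the predictor exists, which is the selection problem that random assignment is meant to address.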

## Experimental Mediation – Randomized Controlled Trials

There have been numerous RCTs examining the effects of exercise on cognition or on brain outcomes (Kramer et al., 2006; Hillman et al., 2008; Smith et al., 2010). However, only a small subset has examined both brain and cognitive outcomes in the same study, allowing for causal inference (**Table 1**). Even fewer of the existing RCTs on this topic have included both cognitive and brain changes within a statistical model in order to formally test a mechanism of exercise at Level 2 of analysis. In fact, while many of the RCTs discussed below have shown promising patterns, only one tested for statistical mediation.

#### TABLE 1 | Evidence for mechanisms of PA at Level 2 of analysis.


### Changes in Brain Structure

As in the human cross-sectional work, most RCTs examining mechanisms of PA at Level 2 of analysis have focused on its effects on brain structure, particularly on gray matter volume. In general, this literature has shown that exercise training increases brain volume, particularly in the hippocampus, and that these volumetric changes partially account for cognitive improvements following the intervention. In a seminal study on this topic, 120 inactive older adults were randomly assigned to a 12-month aerobic walking (experimental) group or to a stretching and toning (control) group (Erickson et al., 2011). Following the intervention, the aerobic exercise group showed greater gray matter volume in the anterior hippocampus compared to the control group. These findings represent the first experimental evidence linking exercise training to changes in both gray matter volume and cognitive performance in aging humans in the context of an RCT. Further, the findings are consistent with animal models of regional specificity for the effects of exercise on the brain, such that volume changes are particularly robust in the anterior portion of the hippocampus.

One limitation of the Erickson et al. (2011) study was that they did not test whether the relation between changes in fitness levels and spatial memory could be statistically accounted for by changes in hippocampal volume. It is therefore possible that another factor associated with both changes in fitness and gray matter volume accounted for the cognitive changes. In fact, the control group in this study also showed improvements in memory performance, despite showing decreases in hippocampal volume, making the causal links between changes in fitness and changes in hippocampal volume and spatial memory in the experimental group tenuous. Further, only the volumes of subcortical regions were assessed; potential changes in cortical regions, such as the prefrontal cortex, were not examined (but see Colcombe et al., 2006; Ruscheweyh et al., 2011 for evidence of cortical volume changes). Nonetheless, these results are important, as they were the first experimental evidence to suggest that exercise training can increase the volume of the hippocampus and improve memory in older adults.

In addition to examining mechanisms of exercise in healthy older adults, RCTs have also examined whether exercise can increase gray matter volume and improve cognition in clinical samples. The patient groups studied are those with well-known hippocampal deficits, including schizophrenia (Pajonk et al., 2010), major depressive disorder (Krogh et al., 2014), and MCI (Ten Brinke et al., 2015). Each of these studies reported increases in gray matter volume in the hippocampus, but results were mixed regarding whether the exercise intervention improved cognition. For example, Pajonk et al. (2010) conducted a study with 24 participants, 16 of whom had schizophrenia. Eight of the patients and all of the healthy participants (n = 8) were enrolled in an aerobic (cycling) exercise intervention (n = 16), while the other eight patients played table tennis as a low-aerobic control activity. The authors found that hippocampal volume increased and short-term memory improved in the exercise group following the intervention, but not in the non-exercising control group. Among the patients with schizophrenia, improvements in short-term memory and symptom severity were associated with changes in hippocampal volume, suggesting a possible mediating role of hippocampal changes in behavioral outcomes. Unfortunately, these associations were not tested using a statistical mediation framework (see Firth et al., 2015 for mixed findings).

Increased cerebral perfusion has also been suggested as a possible mechanism for the cognitive-enhancing effects of exercise. For example, Maass et al. (2015) combined brain volume, perfusion, and memory change outcomes from their 3-month intervention and found that changes in gray matter volume could be accounted for by changes in cerebral perfusion. These results suggest that perfusion changes may mediate the effects of exercise on both gray matter volume and memory performance. Thus, while gray matter volume is the most widely studied Level 2 mechanism of exercise on cognitive outcomes, there are other neuroimaging modalities tapping into different components of brain health that could shed light on the mechanisms of volumetric and cognitive changes (e.g., see Zimmerman et al., 2014).

Changes in white matter microstructure may be another mechanism for the effects of PA on cognition because white matter supports communication between brain regions. Greater PA is linked to white matter preservation (Sexton et al., 2016), and decreased white matter integrity is linked to cognitive deficits (Roberts et al., 2013). This pattern raises the possibility that white matter health is a mechanism underlying the effects of PA on cognition. In the first RCT to examine this possibility, Voss et al. (2013a) evaluated the effects of a 12-month exercise intervention on white matter integrity and cognitive performance in healthy older adults. Seventy older adults were randomly assigned to either an aerobic walking or a toning/stretching group; the groups participated in their respective activities for 40 min per day, 3 days per week. Improvements in fitness were positively associated with increases in temporal and prefrontal white matter integrity, assessed via fractional anisotropy (FA), and with improvements in memory. However, the exercise-induced changes in FA were not associated with changes in memory performance. One possible explanation for this null finding is that the sample was too small, limiting the statistical power to detect a relationship between FA and memory. This limitation applies to many of the RCTs conducted to date.
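To make the power argument concrete, the sketch below uses the Fisher z approximation for a Pearson correlation to estimate two-sided power at a total sample of 70 (the size of the Voss et al., 2013a trial) and the sample needed for 80% power. The true correlation between change in FA and change in memory is unknown; the values of r here are hypothetical, and a correlation computed within a single intervention arm would have an even smaller n.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def correlation_power(r, n, alpha=0.05):
    """Approximate two-sided power to detect a true Pearson correlation r with n
    participants, using the Fisher z transformation (SE of z is 1 / sqrt(n - 3))."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = atanh(r) * sqrt(n - 3)
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)


def n_for_power(r, power=0.80, alpha=0.05):
    """Approximate sample size needed to reach the target power for correlation r."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    z_pow = Z.inv_cdf(power)
    return ceil(((z_crit + z_pow) / atanh(r)) ** 2 + 3)


for r in (0.1, 0.2, 0.3):  # hypothetical effect sizes
    print(f"r = {r}: power with n = 70 is {correlation_power(r, 70):.2f}; "
          f"n for 80% power is {n_for_power(r)}")
```

Under these assumptions, even a moderate correlation of r = 0.3 would be detected only about 70% of the time with 70 participants, and a correlation of r = 0.2 would require roughly 200 participants for 80% power.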

### Exercise-Induced Changes in Brain Function

Changes in brain function in response to exercise, as measured by fMRI, could potentially precede changes in brain structure. It is also possible that exercise induces changes in brain function that are independent of changes in structure (i.e., not simply a byproduct of structural changes). Functional imaging studies of exercise therefore offer information about another potential brain mechanism underlying cognitive changes in response to exercise. Although less studied than structural change, exercise-related changes in brain function have been examined in the context of several RCTs (e.g., Voss et al., 2010; Kamijo et al., 2011; Chaddock-Heyman et al., 2013; Hillman et al., 2014; Krafft et al., 2014). Unlike RCTs examining structural outcomes, however, most RCTs of functional outcomes have focused on prefrontal cortex functioning rather than on the hippocampus. For example, using fMRI, Chaddock-Heyman et al. (2013) observed that children participating in a 9-month exercise intervention, 5 days per week, showed improved executive control performance and increased prefrontal activation patterns following the intervention, similar to the pattern seen in a healthy adult comparison group. Krafft et al. (2014) reported a similar pattern of findings in obese children: following an 8-month intervention, they showed improved cognitive control and increased activation in a set of prefrontal brain regions comparable to those reported by Chaddock-Heyman et al. (2013). Further, in similarly aged (i.e., child) samples, Kamijo et al. (2011) and Hillman et al. (2014) observed that 9-month exercise interventions improved cognitive performance (working memory and cognitive flexibility, respectively) and increased frontal electrophysiological indices of cognitive preparation and flexibility (contingent negative variation and P3 amplitude, respectively). Thus, in the pre-pubescent, developing brain, exercise has been shown to improve executive functioning, and these improvements have been associated with increased activation or neural responsiveness in prefrontal brain regions.

At least two RCTs suggest that the functional changes induced by exercise extend to older adults, although perhaps in a less regionally specific manner. Using fMRI, Voss et al. (2010) showed that a 12-month walking intervention increased functional connectivity among regions within two large-scale brain networks: the default mode network and the frontal executive network (FEN). The increased functional connectivity in the FEN, a network that includes several prefrontal brain regions, was associated with improvements in executive control performance. A seminal study by Colcombe et al. (2004) reported similar changes in the functioning and recruitment of the FEN following a shorter, 6-month exercise intervention in older adults, supporting the claim that changes to large-scale brain networks may occur relatively soon after the start of exercise training. Because large-scale brain networks are known to become less efficient and less flexible with age, these results suggest that exercise may exert more global effects on the efficiency and flexibility with which networks of brain regions interact in older adults, leading to preserved cognitive performance.

There have been a number of RCTs examining the effects of exercise on both brain outcomes and cognition. While many have demonstrated brain or cognitive changes following an exercise intervention, few have gone on to test for statistical mediation after finding a pattern of results consistent with a causal mechanism. Doing so, however, is important for establishing whether changes in neuroimaging metrics are behaviorally relevant to PA-induced improvements in cognitive functioning (**Figure 3**). Further, the many differences in study design, measurement techniques, analytic approach, and study samples across the existing RCTs limit the mechanistic conclusions that can be drawn and highlight the need for more RCTs in this area. In particular, RCTs need larger samples and multiple imaging modalities to tease apart mechanistic questions related to temporal precedence. There is also a need to examine the activation of networks of brain regions, as well as the connectivity between them, as potential mediators. Given the recent shift in the field toward functional brain networks, it seems unlikely that any single brain region works in isolation to mediate cognitive improvements. More complex mediation models, for example with multiple parallel or sequential mediators, may prove useful for more fully capturing the mechanisms underlying the effects of PA on cognition. Nonetheless, the existing body of literature is promising in that it indicates that exercise has multi-modal effects on the brain that likely underlie improvements in cognition. Further, the convergence of results, despite various differences across the studies, speaks to the robustness of the effects of exercise on both brain and cognition, and points to PA as an effective tool to preserve and promote neurocognitive functioning across the lifespan.

# LEVEL 3: BEHAVIORAL AND SOCIOEMOTIONAL MECHANISMS

The evidence described above implicates a number of molecular, cellular, and brain processes in the effects of PA, but PA likely also changes other behaviors that contribute to cognitive improvements (**Figure 4**). Unfortunately, few studies have examined how mechanisms at Level 3 of analysis might underlie the effects of exercise on cognitive performance. From a clinical standpoint, however, changes in human behavior are much easier to observe and may offer a cost-effective approach to understanding the behavioral mechanisms by which exercise improves cognitive function. Thus, there is added practical value in understanding potential mechanisms at the behavioral and socioemotional level.

## Sleep as a Mediator

Sleep quantity and quality are important for healthy cognitive function (Yaffe et al., 2014). For example, both the amount and quality of sleep are considered to be important in memory consolidation and learning processes (Walker, 2009). There is also a wealth of cross-sectional and experimental evidence that supports the idea that high quality sleep leads to better performance on a variety of cognitive tasks (see Ellenbogen, 2005 for review). Sleep is therefore a critical contributor to cognitive performance.

There is also evidence that increased PA improves sleep quality. The first RCTs to examine this demonstrated that relatively brief (e.g., 10-week) exercise interventions boosted self-reported sleep quality in older adults compared to non-exercise control groups (King et al., 1997; Singh et al., 1997). These studies did not include measures of cognitive performance, so the effects of exercise-induced sleep improvements on cognition could not be determined. However, given the connections between sleep and cognition, and between sleep and PA, it is possible that PA improves cognitive outcomes by influencing sleep quality and efficiency.

The only study to date that included PA, sleep, and cognition in one model tested the hypothesis that sleep mediates the relationship between PA and executive functioning in 109 younger (n = 59) and older (n = 50) adults (Wilckens et al., in press). Both PA and sleep were measured objectively with accelerometry. Wilckens et al. (in press) found that PA energy expenditure was positively associated with sleep efficiency, as well as with executive functioning and processing speed. Further, sleep efficiency statistically mediated the relationship between PA and several measures of cognitive performance. Although limited by the cross-sectional nature of the design, these findings provide evidence that sleep may be a behavioral mechanism by which PA influences cognitive performance. Future RCTs in which participants are randomly assigned to multiple sleep and exercise conditions are needed to test this hypothesis directly.

## Mood as a Mediator

Low mood, often assessed using measures of depressive symptomatology, is associated with poorer performance on a variety of cognitive tests (see Lichtenberg et al., 1995; Austin et al., 2001 for reviews). These cognitive deficits typically manifest in the domains of executive functioning, attention, and memory, the same domains that are most affected by PA (McClintock et al., 2010).

Increased PA is associated with improved mood and is an efficacious approach to reducing symptoms of depression and anxiety (Byrne and Byrne, 1993; Fox, 1999; Penedo and Dahn, 2005; Ströhle, 2009; see Bridle et al., 2012 for review). In fact, PA is increasingly being used as an adjunct treatment for clinically significant depression (Mead et al., 2008). Interestingly, there is also evidence that the relationship between PA and mood is bidirectional, such that poorer mood may lead to decreased PA (Roshanaei-Moghaddam et al., 2009).

Despite the fact that mood has been linked both to PA and to cognitive function, only a handful of studies have considered mood as a mediating pathway through which PA influences cognition. For example, Vance et al. (2005) and Robitaille et al. (2014) both used statistical mediation in cross-sectional studies to examine whether several socioemotional factors (i.e., social support, depressive symptoms, cognitive activity) mediated the relationship between PA and cognitive performance. Consistent with their predictions and with results from previous work, there were positive relationships between PA and tests of cognition, specifically in the domains of memory, processing speed, and visuospatial functioning. However, the two studies found disparate results for the candidate mediators. In particular, Vance et al. (2005) found that physical inactivity had significant indirect associations with cognitive functioning through depressive symptomatology and social support, whereas Robitaille et al. (2014) did not find that depression scores mediated the associations between PA and cognitive performance. Instead, they found that social support and cognitive engagement mediated the effects of PA on cognition. However, the indirect effects of these factors varied by cognitive outcome, suggesting that the mechanisms may vary across cognitive domains.
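When several candidate mediators are tested at once, a common approach is a parallel mediation model: each mediator gets its own path *a* (mediator regressed on PA), all mediators enter a single outcome model together to give their *b* paths, and the indirect effect through each mediator is its *a* × *b*. The sketch below is a hypothetical illustration with simulated data in which social support and cognitive engagement carry indirect effects while depressive symptoms do not; the effect sizes are assumptions and do not reproduce the data or methods of Vance et al. (2005) or Robitaille et al. (2014).

```python
import random


def solve(A, rhs):
    """Solve the linear system A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ols(predictors, y):
    """OLS coefficients [intercept, beta_1, ..., beta_k] via the normal equations."""
    rows = [[1.0] + list(obs) for obs in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def parallel_indirect_effects(x, mediators, y):
    """Indirect effect a_j * b_j through each mediator in a parallel mediation model."""
    a = [ols([x], m)[1] for m in mediators.values()]   # X -> M_j, one model per mediator
    coefs = ols([x] + list(mediators.values()), y)     # Y ~ X + M_1 + ... + M_k
    b = coefs[2:]                                      # skip intercept and c'
    return {name: aj * bj for name, aj, bj in zip(mediators, a, b)}


# Simulated data with hypothetical effect sizes.
rng = random.Random(7)
n = 1000
pa = [rng.gauss(0, 1) for _ in range(n)]
support = [0.4 * p + rng.gauss(0, 1) for p in pa]        # a = 0.4
engagement = [0.3 * p + rng.gauss(0, 1) for p in pa]     # a = 0.3
depressive = [-0.3 * p + rng.gauss(0, 1) for p in pa]    # a = -0.3
# Cognition depends on support and engagement (b = 0.3 each) and directly on PA (c' = 0.1),
# but not on depressive symptoms, so that indirect effect should be near zero.
cognition = [0.3 * s + 0.3 * e + 0.1 * p + rng.gauss(0, 1)
             for s, e, p in zip(support, engagement, pa)]

effects = parallel_indirect_effects(
    pa,
    {"social_support": support,
     "cognitive_engagement": engagement,
     "depressive_symptoms": depressive},
    cognition,
)
for name, value in effects.items():
    print(f"{name:>21}: indirect effect = {value:.3f}")
```

Because each mediator's *b* path is estimated while holding the others constant, a model like this can separate mediators that share variance with one another, which is one reason the studies above could reach different conclusions about the same candidate factors.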

Unfortunately, the results of RCTs have not provided any more clarity regarding the role of factors such as mood in the PA-cognition relationship. One RCT examined mood as a mediating pathway in 64 younger adults randomly assigned to an exercise or control group (Lichtman and Poser, 1983). Following the intervention, the exercise group performed significantly better on an executive control test (i.e., a Stroop task) than the control group. The exercise group also showed a significant decrease in depression and anxiety symptoms compared to baseline. However, the control group also exhibited decreases in depression and anxiety. Albinet et al. (2016) found similar results in a sample of 36 older adults who participated in either an aerobic exercise or a stretching control intervention for 21 weeks. While both groups showed improvement in self-reported depressive symptomatology, only the exercise group showed neurocognitive improvements. This pattern of results indicates that the mood improvements were not specific to exercise and are therefore unlikely to be the primary mechanism for the exercise-related cognitive improvements.

There are also studies in which mood changes following an exercise intervention but no cognitive effects are observed. Another early RCT examined changes in psychological (including mood) and neuropsychological functioning following a 4-month aerobic exercise intervention in older adults (Blumenthal et al., 1989). Participants across three randomized groups (i.e., aerobic exercise, yoga and flexibility control, or waitlist control) did not show any clear pattern of differences on neuropsychological tests following the intervention. However, males in the aerobic exercise group showed a significant decline in depression and anxiety scores compared with the control groups. There was still no clear pattern of cognitive improvements following exercise, even after training was extended for up to 10 months (Madden et al., 1989; Blumenthal et al., 1991).

Conversely, there are RCTs in which neurocognition changes but mood does not. For example, Williams and Lord (1997) conducted an RCT examining the effects of a 42-week exercise intervention on mood and cognitive functioning in a group of older, community-dwelling women. Following the intervention, the exercise group performed better than the control group on measures of memory and processing speed. In addition, within the exercise group, individuals who reported higher baseline levels of anxiety and depression showed greater improvements in cognitive performance than individuals in the exercise group who reported lower baseline levels. However, as in Lichtman and Poser (1983), there were no group differences in mood symptoms following the intervention, so changes in mood are unlikely to explain the cognitive changes observed. Although none of these studies was conclusive regarding whether mood mediates exercise-induced cognitive improvements, the results provide evidence that the relationship between exercise, cognition, and mood merits further exploration.

Tests of the influence of exercise on cognition in clinical populations have been less common in the literature. In an RCT of 73 older adults with Chronic Obstructive Pulmonary Disease (COPD), Emery et al. (1998) examined the effects of exercise, education about COPD, and stress management on psychological and cognitive functioning. Participants were randomized to three groups, only one of which had an exercise component (in addition to stress management and education). Post-intervention analyses revealed that the group receiving exercise (hereafter referred to as the "exercise group") showed lower depressive symptomatology compared to their baseline scores and to the post-intervention scores of the education-only group. Participants in the exercise group also showed improved verbal fluency scores, while the other two groups' scores did not change from their pre-intervention levels. While Emery et al. (1998) did not directly test a mediation model in this RCT, the fact that only the exercise group showed improvements in both mood and cognitive performance suggests a possible mediating link between exercise, changes in mood, and changes in cognition. However, it is difficult to interpret the exact nature of these effects because there was no group that engaged only in exercise. Thus, it is unclear whether the mood and cognitive improvements were due to exercise alone or to some synergistic effect of exercise, education, and stress management. Further, given that these results (the only experimental evidence to date to show both mood and cognitive changes in an exercising group) were reported in a clinical sample with known disturbances in mood (Maurer et al., 2008) and cognitive functioning (Liesker et al., 2004), it remains an open question whether the same mechanisms would underlie exercise-related changes to cognition in healthy populations.

Taken together, the studies of PA and mood reviewed here indicate that we do not yet understand whether, to what extent, and how changes in mood mediate the effects of PA on cognition.

# DISCUSSION

The main motivation for writing the present review was to highlight the basic idea that mechanisms of PA on cognitive outcomes can be conceptualized on multiple levels and examined using a variety of study designs. Historically, discussion of the mechanisms by which PA or exercise affects cognitive outcomes has been limited to Level 1 of analysis.

Numerous reviews in the past decade have described in detail the molecular and cellular mechanisms of PA. Because excellent evidence for these mechanisms already exists, we did not explain them in detail here. Instead, we focused on evidence for mechanisms of PA on cognition at Level 2 and took stock of the (limited) evidence for mechanisms of PA on cognition at Level 3 of analysis.

Mechanisms at Level 2 (i.e., structural and functional brain changes) are only beginning to be discussed in the scientific literature. Our review suggests that regional gray matter volume statistically mediates the relationship between cardiorespiratory fitness or PA and cognitive functioning, but most of these studies have been limited to cross-sectional designs. In addition, white matter microstructure and functional brain activity may also mediate associations between fitness or PA and cognition (e.g., Wong et al., 2015; Oberlin et al., 2016). Across studies, brain changes do not occur uniformly throughout the brain; rather, they appear specific to particular regions, namely the hippocampus and prefrontal cortex. This regional specificity of PA-related structural and functional brain changes is important because it mirrors some of the regional specificity observed in animal models (i.e., in the hippocampus).

Consistent with the cross-sectional work, RCTs also support the argument that changes in brain structure and function may be mechanisms underlying the relationship between PA and cognitive performance. Specifically, the majority of RCTs have reported changes in brain structure or function, as well as in cognition, following the exercise intervention. However, of the 13 RCTs conducted to date that included both cognitive and neuroimaging measures, only one has used a statistical mediation model. Thus, in the majority of RCTs it has not been possible to rule out the possibility that another, unmeasured factor that covaries with both the treatment and the outcome underlies the intervention effects observed in the brain and/or cognitive performance.
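The difference between observing changes in both brain and cognition and formally testing mediation can be made concrete with the product-of-coefficients approach, one common way of estimating a single-mediator model. The Python sketch below is illustrative only: the variable names (`pa`, `gm`, `cog`) and the simulated effect sizes are hypothetical and are not data from any study reviewed here. It estimates path a (X to M), path b (M to Y, adjusting for X), the direct effect c', and the total effect c. With ordinary least squares on the same sample, c equals c' + a·b exactly, so a·b quantifies the part of the PA-cognition association carried by the mediator.

```python
import random


def ols(y, predictors):
    """OLS slopes (intercept fitted but not returned) for one or two predictors."""
    n = len(y)
    my = sum(y) / n
    yc = [v - my for v in y]
    cols = []
    for p in predictors:
        mp = sum(p) / n
        cols.append([v - mp for v in p])  # centering absorbs the intercept
    if len(cols) == 1:
        x = cols[0]
        return [sum(a * b for a, b in zip(x, yc)) / sum(a * a for a in x)]
    x1, x2 = cols
    s11 = sum(a * a for a in x1)
    s22 = sum(a * a for a in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * b for a, b in zip(x1, yc))
    s2y = sum(a * b for a, b in zip(x2, yc))
    det = s11 * s22 - s12 * s12
    # Solve the 2x2 normal equations by Cramer's rule.
    return [(s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det]


def mediation(x, m, y):
    """Single-mediator model X -> M -> Y estimated by the product of coefficients."""
    a = ols(m, [x])[0]            # path a: X -> M
    b, c_prime = ols(y, [m, x])   # path b: M -> Y adjusting for X; c': direct effect
    c = ols(y, [x])[0]            # total effect of X on Y
    return {"a": a, "b": b, "c_prime": c_prime, "c": c, "indirect": a * b}


# Hypothetical simulated data (not from any study reviewed here).
rng = random.Random(2016)
n = 200
pa = [rng.gauss(0, 1) for _ in range(n)]                              # PA or fitness score
gm = [0.5 * x + rng.gauss(0, 1) for x in pa]                          # gray matter (mediator)
cog = [0.4 * m + 0.1 * x + rng.gauss(0, 1) for x, m in zip(pa, gm)]   # cognitive score

est = mediation(pa, gm, cog)
print({k: round(v, 3) for k, v in est.items()})
```

A real analysis would also need an inferential test of a·b (for example, a bootstrap confidence interval) and, in an RCT, X would be the randomized group assignment rather than a continuous score.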

The search for mechanisms at Level 2 is further complicated for several reasons. First, while the volume or function of specific brain regions (again, mostly the hippocampus and prefrontal cortex) consistently changes following exercise interventions, the evidence has been less consistent for cognitive performance. For example, although the majority of RCTs discussed above report cognitive changes that are exclusive to the exercise group following training, several reported cognitive improvements in both the exercise and control groups, i.e., a lack of group-by-time interaction (Erickson et al., 2011; Ruscheweyh et al., 2011; Voss et al., 2013a; Krafft et al., 2014). This makes it difficult to link exercise and brain changes exclusively to the cognitive changes observed. It also highlights a key limitation of RCTs in humans: it is extremely difficult to control the behavior of participants outside the context of the RCT. Thus, even with this gold-standard design, behaviors outside the training sessions (e.g., participants in the "control" condition might inadvertently increase their PA) could lead to unexpected effects. Second, the existing RCTs vary in design, for example in the activity level/engagement of the control group, intervention length, type and frequency of PA, adherence, exclusionary criteria, neurocognitive outcomes assessed, and analytic techniques. It is therefore difficult to know whether null findings result from this inter-study variability or reflect a true lack of effect. Finally, many factors may moderate the effects of PA on neurocognition. Despite the favorable effects of PA and cardiorespiratory fitness on brain health and cognitive function reviewed above, there is significant inter-individual variability within studies in the extent to which any one individual will reap the physical and cognitive benefits of PA.
Thus, the mediators are likely moderated by other factors, such as the presence of pathology, age, genotype, gender, and diet (see Leckie et al., 2012 for review). However, the convergence of the effects of PA on brain health despite this wide range of variability speaks to the robustness of the effects of PA on both brain health and cognitive function.

Along these lines, if PA enhances cognition by improving brain structure and function, then eliminating PA should have the opposite effect. The effects of PA cessation have been comparatively unexplored to date. However, two recent studies support this idea (Alfini et al., 2016; Thomas et al., 2016). Alfini et al. (2016) showed that cortical and hippocampal resting brain perfusion decreased in older adult athletes after just 10 days of PA cessation. In addition, Thomas et al. (2016) found that hippocampal volume gains following an exercise intervention in young to middle-aged adults were abolished after 2 weeks of exercise cessation. These results are interesting and important for the field because they support PA as the causal variable in mechanistic models (i.e., removing PA reverses the brain effects attributed to this behavior). However, Alfini et al. (2016) did not administer a full cognitive battery, limiting interpretation of their results with regard to cognition. Thomas et al. (2016) administered a brief cognitive battery but found no change in cognition following their 6-week intervention. It was therefore not possible to evaluate thoroughly whether cognition (our outcome variable of interest) also declined following PA cessation. Such evidence is needed, as it would further strengthen the case for a causal role of Level 2 mediators in PA-related cognitive effects.

Mechanisms of PA on cognition at Level 3 of analysis have rarely been considered or assessed. However, a handful of studies suggest that this level may be important to consider in future work (**Table 2**). Changes in sleep quality, for example, are linked to both cognition and PA. However, only one study to date (Wilckens et al., in press) has combined all three variables in a statistical model to test whether sleep can account for the relationship between PA and cognitive performance, and the results of this initial study suggest that it can. Similarly, mood is linked to both cognitive performance and PA. While several studies have considered mood along with other behavioral or socioemotional factors in statistical models assessing mechanisms of PA, virtually none have considered the unique or independent contribution of mood to the PA-cognition relationship.

We highlighted sleep and mood as two examples of potential mechanisms at Level 3 because there are literatures linking these factors to both cognition and PA, thus making them candidate mediators. However, there are likely many other possible behavioral mechanisms (e.g., self-efficacy, motivation, fatigue, and pain) that should be examined in future work (e.g., see the relevant reviews by McAuley and Blissmer, 2000; Mullen et al., 2012; Teixeira et al., 2012). Identifying such mechanisms is important, as they would provide additional outcome targets for assessing the effectiveness of PA interventions. These additional behavioral mechanisms were not addressed in the current review because, to our knowledge, no studies to date have examined them as potential mediators of the PA-cognition relationship. Additional studies including both psychological and neurocognitive functioning as outcome variables are needed to enhance our understanding of this level of analysis.

#### TABLE 2 | Evidence for mechanisms of PA at Level 3 of analysis.

The studies reviewed above suggest that the individual pieces of a model linking PA to cognitive functioning through behavioral changes, such as sleep or mood, exist, but they have not yet been combined into one cohesive model. Future work should address the gaps in our understanding of how behavioral mechanisms modulate the effects of PA on cognition by conducting RCTs with participants across the lifespan and measuring both behavioral and cognitive outcomes at multiple time points. Designs with these components would allow us to test not only for experimental mediation but also for statistical mediation following exercise interventions.

# CONCLUSION AND LIMITATIONS

One important consideration and limitation for studies examining the mechanisms of PA relates to the sample size needed to test for mediation. A number of publications address this topic (Cerin et al., 2006; Fritz and MacKinnon, 2007; MacKinnon et al., 2007). These articles suggest that the sample size needed to detect mediation effects depends on the size of the expected effect and on the statistical method used to test for mediation. Effect sizes generally decrease as the variability of the outcome increases, so larger samples, although not always practical, are likely needed to detect mediation effects when outcome measures are highly variable, as is often the case with neuroimaging data (Preacher and Kelley, 2011). Designing RCTs in which groups differ more extremely on the manipulated independent variable is one possible way to increase the between-group difference relative to this variability and therefore increase power to detect mediation. Thus, RCTs might be the best context in which to test for Level 2 mediators.

In terms of analytical method, bootstrapping methods of testing mediation generally require smaller samples, whereas causal-steps approaches require the largest (Cerin et al., 2006). Recommended sample sizes for detecting a small-to-moderate mediation effect range from as few as N = 50–100 using bootstrapping methods (e.g., Cerin et al., 2006) to anywhere from 400 to >20,000 using more conservative causal-steps approaches (e.g., Fritz and MacKinnon, 2007). Many of the studies of PA cited above do not meet these recommended sample sizes, increasing the likelihood of Type I and Type II errors. Future studies should keep these guidelines in mind when budgeting and planning for recruitment.
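To illustrate the bootstrapping approach, the sketch below computes a percentile bootstrap confidence interval for the indirect effect a·b; mediation is judged significant when the interval excludes zero. This is a minimal sketch assuming simulated, hypothetical data and the simple percentile method (Fritz and MacKinnon, 2007, also consider a bias-corrected variant, which is not implemented here); it is not the procedure of any specific study cited above.

```python
import random


def _slope(x, y):
    """OLS slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def _partial_slope(y, m, x):
    """OLS coefficient of m when y is regressed on m and x (with intercept)."""
    n = len(y)
    mm, mx, my = sum(m) / n, sum(x) / n, sum(y) / n
    mc = [v - mm for v in m]
    xc = [v - mx for v in x]
    yc = [v - my for v in y]
    smm = sum(v * v for v in mc)
    sxx = sum(v * v for v in xc)
    smx = sum(a * b for a, b in zip(mc, xc))
    smy = sum(a * b for a, b in zip(mc, yc))
    sxy = sum(a * b for a, b in zip(xc, yc))
    return (sxx * smy - smx * sxy) / (smm * sxx - smx * smx)


def indirect_effect(x, m, y):
    """Product of coefficients a*b for the model X -> M -> Y."""
    return _slope(x, m) * _partial_slope(y, m, x)


def bootstrap_ci(x, m, y, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for a*b, resampling participants with replacement."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(indirect_effect([x[i] for i in idx],
                                     [m[i] for i in idx],
                                     [y[i] for i in idx]))
    draws.sort()
    lo = draws[int(round((1 - level) / 2 * n_boot))]
    hi = draws[int(round((1 + level) / 2 * n_boot)) - 1]
    return lo, hi


# Hypothetical simulated data: X -> M -> Y with no direct effect.
rng = random.Random(7)
n = 200
x = [rng.gauss(0, 1) for _ in range(n)]
m = [0.6 * v + rng.gauss(0, 1) for v in x]
y = [0.6 * v + rng.gauss(0, 1) for v in m]

point = indirect_effect(x, m, y)
lo, hi = bootstrap_ci(x, m, y, n_boot=1000)
print(f"indirect = {point:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Because the interval is built from the empirical distribution of a·b rather than a normal approximation, it accommodates the skewed sampling distribution of a product of coefficients, which is one reason bootstrap tests tend to need smaller samples than causal-steps tests.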

Despite these considerations, many studies implicate mechanisms of PA on cognition at Levels 2 and 3 of analysis and have given us the groundwork to construct our proposed mechanistic models (**Figures 2** and **3**), but very few have actually tested for statistical mediation. Doing so is critical to rule out the possibility that the more macroscopic brain and socioemotional changes linked to PA are meaningless byproducts of PA. While we currently have the foundation to think about more macroscopic mechanisms that may mediate the relationship between PA and cognition, and potentially provide new, clinically relevant targets, these components need to be combined in testable models in future studies.

#### REFERENCES


# AUTHOR CONTRIBUTIONS

CS, JC, ML, and KE have seen and approved this manuscript for submission and are accountable for all aspects of the work. CS, JC, ML, and KE made substantial contributions to conceptualizing, drafting, and revising the manuscript.

#### FUNDING

CS is supported by NIH/NIMH T32 MH109986. JC was supported by the Dietrich Arts and Sciences Fellowship, Department of Psychology, University of Pittsburgh. KE was supported by National Institutes of Health grants R01 DK095172, R01 CA196762, R01 AG053952, P30 MH90333, and P30 AG024827.

## ACKNOWLEDGMENT

The authors would like to thank BACH Lab undergraduate, Melanie Cieciuch, for assistance with the literature search for this review.


down-regulated serotonin system in the limbic system. Neurobiol. Learn. Mem. 89, 489–496. doi: 10.1016/j.nlm.2007.08.004




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Stillman, Cohen, Lehman and Erickson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Morphological and Functional Differences between Athletes and Novices in Cortical Neuronal Networks

Xiao-Ying Tan<sup>1</sup>†, Yan-Ling Pi<sup>2</sup>†, Jue Wang<sup>3</sup>, Xue-Pei Li<sup>4</sup>, Lan-Lan Zhang<sup>4</sup>, Wen Dai<sup>4</sup>, Hua Zhu<sup>4</sup>, Zhen Ni<sup>5</sup>, Jian Zhang<sup>4</sup> and Yin Wu<sup>6</sup>\*

<sup>1</sup> School of Physical Education and Coaching, Shanghai University of Sport, Shanghai, China, <sup>2</sup> Shanghai Punan Hospital of Pudong New District, Shanghai, China, <sup>3</sup> Institutes of Psychological Sciences, HangZhou Normal University, Hangzhou, China, <sup>4</sup> School of Kinesiology, Shanghai University of Sport, Shanghai, China, <sup>5</sup> Division of Neurology, Krembil Neuroscience Centre and Toronto Western Research Institute, University Health Network, University of Toronto, Toronto, ON, Canada, <sup>6</sup> School of Economics and Management, Shanghai University of Sport, Shanghai, China

#### Edited by:

Louis Bherer, Université de Montréal, Canada

#### Reviewed by:

Arun Bokde, Trinity College, Dublin, Ireland Betty M. Tijms, VU University Medical Center, Netherlands Teresa Liu-Ambrose, University of British Columbia, Canada

#### \*Correspondence: Yin Wu

wuyin@sus.edu.cn

†These authors have contributed equally to this work.

Received: 10 July 2016 Accepted: 12 December 2016 Published: 04 January 2017

#### Citation:

Tan X-Y, Pi Y-L, Wang J, Li X-P, Zhang L-L, Dai W, Zhu H, Ni Z, Zhang J and Wu Y (2017) Morphological and Functional Differences between Athletes and Novices in Cortical Neuronal Networks. Front. Hum. Neurosci. 10:660. doi: 10.3389/fnhum.2016.00660

The cortical structural and functional differences between athletes and novices were investigated with a cross-sectional paradigm. We measured gray matter volumes and resting-state functional connectivity in 21 basketball players and 21 novices using magnetic resonance imaging (MRI). Gray matter volume was greater in basketball players than in novices in the left anterior insula (AI), inferior frontal gyrus (IFG), and inferior parietal lobule (IPL), and in the right anterior cingulate cortex (ACC) and precuneus. These five brain regions were selected as seed regions for testing resting-state functional connectivity in the second experiment. We found higher functional connectivity in the default mode network, salience network, and executive control network in basketball players than in novices. We conclude that the morphology and functional connectivity of cortical neuronal networks differ between athletes and novices.

Keywords: basketball player, motor expertise, magnetic resonance imaging, plasticity, resting state functional connectivity

# INTRODUCTION

Cortical plasticity is an intrinsic property of the human brain and occurs after long-term training under various conditions (Blakemore and Frith, 2005; Pascual-Leone et al., 2005). Structural differences in regional cortical morphology have been found between musicians and non-musicians (Gaser and Schlaug, 2003). Interestingly, London taxi drivers have larger gray matter volume than healthy controls who do not drive taxis in the posterior hippocampi, where spatial representations of the environment are stored (Maguire et al., 2000). However, it is not clear whether long-term training affects the morphology of motor-related cortical areas and whether such plasticity contributes to improved motor function. Elite athletes in confrontational sports (e.g., basketball) typically start training in childhood. The sophisticated skills required in confrontational sports likely depend on the involvement of different brain areas belonging to various cortical networks (di Pellegrino et al., 1992; Gallese et al., 1996). These elite athletes therefore offer a special model for studying long-term training-related cortical plasticity involving multiple brain areas. Because longitudinal studies of such training require very long periods (and are often nearly impossible to carry out), cross-sectional (cohort) designs comparing highly skilled elite athletes with novices have been widely used in recent work (Imfeld et al., 2009; Jäncke et al., 2009; Wei et al., 2011; Fauvel et al., 2014). Although a cross-sectional design provides weaker evidence than a longitudinal design, it is more practicable, and its results still provide an important indication of the presence of plasticity effects with long-term training.
Voxel-based morphometry (VBM) is a neuroimaging analysis technique that allows investigation of focal differences in brain anatomy (Ashburner and Friston, 2000). In the present study, we compared gray matter volumes across brain areas between basketball players and novices in a cross-sectional paradigm by measuring structural variation in cortical morphology with VBM. We hypothesize that gray matter volumes in motor-related cortical areas differ between basketball players and novices. In addition, since long-term training leads to refined cognitive functions such as visual search (Williams et al., 1999; McRobert et al., 2007) and sensory perception (Aglioti et al., 2008; Wu et al., 2013) in elite athletes, we also expect that morphological differences between basketball players and novices may be present in cortical areas responsible for cognitive functions.

Motor expertise involves several internal processes requiring the organization and integration of sensory and motor information across different cortical areas (Lisberger, 1988). Neuroimaging studies have shown that the human brain is intrinsically organized into a set of spatially distributed, functionally specific networks (Damoiseaux et al., 2006; Bressler and Menon, 2010). Cortical plasticity with long-term training to gain motor expertise is complex, and interactions of cortical activation among brain areas at the network level may be associated with this multifactorial process (Dosenbach et al., 2008). In particular, the default mode network, salience network, and executive control network are major functional networks relevant to motor and cognitive functions (Bressler and Menon, 2010; Cocchi et al., 2014). The default mode network is activated during motor-related spontaneous cognition (Buckner et al., 2008; Mantini and Vanduffel, 2013). The salience network plays an important role in orienting attention and monitoring errors during internally and externally driven events (Eckert et al., 2009). The executive control network is responsible for high-level cognitive functions during motor behaviors (Alvarez and Emory, 2006; Fox et al., 2006). Because relatively little evidence exists on functional brain networks in top athletes, our second hypothesis is that resting-state functional connectivity differs between basketball players and novices. We selected the cortical areas with larger gray matter volumes in basketball players than in novices (as defined in the VBM analysis) as seed regions and used a seed-based approach to compare functional connectivity between the two groups.
We predicted that differences in resting-state functional connectivity between basketball players and novices would involve cortical areas located in the default mode network, salience network, and executive control network, which are relevant to motor and cognitive functions in basketball players.

# MATERIALS AND METHODS

#### Participants

Twenty-one basketball players (mean age 21.3 ± 1.3 years, age range 18–23 years) and 21 novices (mean age 21.9 ± 0.8 years, age range 19–24 years) were studied. All subjects were male (Shanghai University of Sport is one of the major training centers for men's basketball in China). The basketball players were national first-class athletes who trained five sessions per week (about 3 h per daily session) for 10–15 years (mean duration, 11.4 ± 2.3 years). The novices were university students without professional training in basketball or any other sport. Basketball players were taller (190.6 ± 3.4 cm) than novices (176.8 ± 2.9 cm; t = 14.1, p < 0.001). The experimental protocol was approved by the regional ethics committee of the Shanghai University of Sport. All subjects gave written informed consent in accordance with the Declaration of Helsinki.
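As a quick consistency check on the reported height difference, the group t statistic can be recomputed from the summary statistics. The sketch below assumes a Student's (pooled-variance) two-sample t-test and that the ± values are standard deviations; the text does not state which test was used. Under these assumptions it gives t ≈ 14.15 on 40 degrees of freedom, close to the reported t = 14.1; a small discrepancy is expected because the published means and SDs are rounded.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic (pooled variance) from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# Heights (cm) reported above for basketball players and novices.
t, df = pooled_t(190.6, 3.4, 21, 176.8, 2.9, 21)
print(f"t({df}) = {t:.2f}")
```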

#### Magnetic Resonance Imaging

Imaging was performed using a 3T Siemens scanner at the functional magnetic resonance imaging (fMRI) center of East China Normal University. Anatomical differences in gray matter volume between basketball players and novices were tested using T1-weighted structural images. A high-resolution image with 192 slices was acquired using a three-dimensional fast-field echo sequence (echo time (TE) = 2.34 ms, repetition time (TR) = 2530 ms, flip angle (FA) = 7°, field of view (FOV) = 256 mm × 256 mm, slice thickness = 1 mm, inversion time (TI) = 1100 ms). Resting-state fMRI scanning was performed with a gradient-echo echo planar imaging sequence (TE = 30 ms, TR = 2000 ms, FA = 90°, FOV = 240 mm × 240 mm, slice thickness = 3 mm). A total of 210 volumes were acquired from each subject. Subjects were instructed to relax and lie still in the scanner with their eyes closed.
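The resting-state acquisition parameters determine the frequency content available to the 0.01–0.08 Hz band-pass filter used in the functional connectivity analysis. With TR = 2 s, the Nyquist frequency is 1/(2·TR) = 0.25 Hz, and 210 volumes span 420 s, giving a frequency resolution of 1/420 Hz. The sketch below applies an ideal (brick-wall) band-pass filter through a naive discrete Fourier transform. It assumes all 210 volumes are retained, and the source does not specify the filter design actually used, so this illustrates the idea rather than reproducing the authors' pipeline.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform, O(N^2)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(), returning the real part."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def ideal_bandpass(x, tr, low_hz, high_hz):
    """Zero every DFT bin whose frequency lies outside [low_hz, high_hz] (DC included)."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        freq = min(k, n - k) / (n * tr)  # bins above N/2 are the negative frequencies
        if not low_hz <= freq <= high_hz:
            spectrum[k] = 0
    return idft(spectrum)


tr, n_vols = 2.0, 210
print("duration (s):", n_vols * tr, "Nyquist (Hz):", 1 / (2 * tr),
      "resolution (Hz):", 1 / (n_vols * tr))

# Hypothetical test signal: a 0.05 Hz component (kept) plus a 0.2 Hz component (removed).
slow = [math.sin(2 * math.pi * 0.05 * t * tr) for t in range(n_vols)]
fast = [0.5 * math.sin(2 * math.pi * 0.2 * t * tr) for t in range(n_vols)]
filtered = ideal_bandpass([s + f for s, f in zip(slow, fast)], tr, 0.01, 0.08)
print("max deviation from slow component:",
      max(abs(a - b) for a, b in zip(filtered, slow)))
```

Both test frequencies fall exactly on DFT bins (21 and 84 of 210), so the slow component passes unchanged and the fast one is removed; real BOLD signals are not bin-aligned, and practical pipelines often use smoother filter designs to limit ringing.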

#### Optimized Voxel-Based Morphometry Analysis

The imaging data analysis was performed using Statistical Parametric Mapping version 8 software<sup>1</sup> implemented in MATLAB 7.4. Because the basketball players were taller than the novices and brain volume varies with height across individuals, we applied the optimized VBM approach (Good et al., 2001) and compared gray matter volumes between the two groups using a study-specific template. In the pre-processing step, each reoriented image was segmented (unified segmentation) into gray matter, white matter, and cerebrospinal fluid (Ashburner and Friston, 2005). Segmented gray matter images of all subjects were rigidly transformed and averaged to create the study-specific template. The aligned gray matter images of each subject were normalized to the study-specific template. Modulation with the Jacobian determinant was used to correct for volume changes caused by spatial normalization. The modulated images were finally transformed into Montreal Neurological Institute space and smoothed with a 6-mm full-width at half maximum Gaussian kernel.

<sup>1</sup>http://www.fil.ion.ucl.ac.uk/spm/
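Two of the pre-processing steps described above can be made concrete. Modulation multiplies each warped gray matter value by the Jacobian determinant (the local relative volume change), so the total amount of tissue is preserved after normalization. A Gaussian kernel specified by its full-width at half maximum (FWHM) has standard deviation σ = FWHM / (2√(2 ln 2)), about 2.55 mm for 6 mm. The Python sketch below uses a toy one-dimensional image and assumed voxel sizes; it is not SPM code. An isotropic 3-D Gaussian kernel is separable into the product of such 1-D kernels along each axis.

```python
import math


def fwhm_to_sigma(fwhm):
    """Standard deviation of a Gaussian with the given full-width at half maximum."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_kernel_1d(fwhm_mm, voxel_mm):
    """Normalized 1-D Gaussian weights sampled at voxel centers, truncated at +/-3 sigma."""
    sigma_vox = fwhm_to_sigma(fwhm_mm) / voxel_mm
    radius = max(1, math.ceil(3 * sigma_vox))
    weights = [math.exp(-(k * k) / (2 * sigma_vox ** 2)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def modulate(warped, jacobian):
    """Multiply warped gray matter values by the local Jacobian determinant."""
    return [g * j for g, j in zip(warped, jacobian)]


# Toy example: normalization stretched each native voxel into two template voxels,
# so each template voxel represents half a native voxel (Jacobian = 0.5).
native = [0.9, 0.8, 0.7]
warped = [v for v in native for _ in range(2)]
jacobian = [0.5] * len(warped)
modulated = modulate(warped, jacobian)
print(sum(native), sum(warped), sum(modulated))  # modulation restores the native total

sigma = fwhm_to_sigma(6.0)
print(f"sigma for 6 mm FWHM: {sigma:.3f} mm")
print("kernel on an assumed 3-mm grid:", [round(w, 3) for w in gaussian_kernel_1d(6.0, 3.0)])
```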

Pre-processed gray matter images of basketball players and novices were compared with a two-sample t-test (second level) at the whole-brain level. Subject height was entered as a covariate to exclude potential contamination caused by different brain sizes in the two groups. The t-map was thresholded at a corrected significance level of p < 0.05. AlphaSim correction (REST\_V1.8) with Monte Carlo simulation was used to correct for multiple comparisons (Poline et al., 1997; Song et al., 2011), taking into account both the individual voxel probability and the cluster size threshold. Based on the results of the optimized VBM analysis, brain areas with larger gray matter volume in basketball players were identified (**Figure 1**). We also tested the correlation between gray matter volumes in these areas and training time in basketball players using the Pearson correlation coefficient. These areas were further selected as seeds for the subsequent functional connectivity analyses, with each seed defined as a 6-mm radius sphere.
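The logic of a Monte Carlo cluster-size correction such as AlphaSim can be sketched with a toy two-dimensional analogue: simulate many smooth null noise fields, threshold each at the voxel-level p value, record the largest supra-threshold cluster, and choose the smallest cluster size reached in at most 5% of the null fields. This is a minimal Python sketch assuming a 32 × 32 grid, a smoothness of 2 voxels FWHM, 4-connected clusters, and 100 simulations, all chosen only for illustration. REST's AlphaSim works on 3-D brain masks with smoothness estimated from the data and its own connectivity rules, so the numbers produced here say nothing about the thresholds used in this study.

```python
import math
import random
from collections import deque
from statistics import NormalDist


def gaussian_weights(fwhm_vox):
    """Normalized 1-D Gaussian weights for a kernel with the given FWHM (in voxels)."""
    sigma = fwhm_vox / (2 * math.sqrt(2 * math.log(2)))
    radius = max(1, math.ceil(3 * sigma))
    w = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]


def smoothed_noise(size, weights, rng):
    """White Gaussian noise smoothed separably (wrap-around edges), rescaled to unit variance."""
    r = len(weights) // 2
    field = [[rng.gauss(0, 1) for _ in range(size)] for _ in range(size)]
    rows = [[sum(weights[k + r] * field[i][(j + k) % size] for k in range(-r, r + 1))
             for j in range(size)] for i in range(size)]
    cols = [[sum(weights[k + r] * rows[(i + k) % size][j] for k in range(-r, r + 1))
             for j in range(size)] for i in range(size)]
    sd = sum(v * v for v in weights)  # s.d. after smoothing unit-variance noise on both axes
    return [[v / sd for v in row] for row in cols]


def max_cluster_size(field, z_thr):
    """Size of the largest 4-connected cluster of voxels with value > z_thr."""
    size = len(field)
    seen = [[False] * size for _ in range(size)]
    best = 0
    for i in range(size):
        for j in range(size):
            if seen[i][j] or field[i][j] <= z_thr:
                continue
            seen[i][j] = True
            queue, count = deque([(i, j)]), 0
            while queue:  # breadth-first flood fill of one cluster
                a, b = queue.popleft()
                count += 1
                for na, nb in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                    if (0 <= na < size and 0 <= nb < size
                            and not seen[na][nb] and field[na][nb] > z_thr):
                        seen[na][nb] = True
                        queue.append((na, nb))
            best = max(best, count)
    return best


def cluster_threshold(fields, p_voxel, alpha=0.05):
    """Smallest k such that the fraction of null fields with a cluster >= k is <= alpha."""
    z_thr = NormalDist().inv_cdf(1 - p_voxel)  # one-tailed voxel-level threshold
    maxima = [max_cluster_size(f, z_thr) for f in fields]
    k = 1
    while sum(m >= k for m in maxima) > alpha * len(maxima):
        k += 1
    return k


rng = random.Random(1)
weights = gaussian_weights(2.0)  # assumed smoothness: 2 voxels FWHM (e.g., 6 mm on a 3-mm grid)
null_fields = [smoothed_noise(32, weights, rng) for _ in range(100)]
k_lax = cluster_threshold(null_fields, p_voxel=0.01)
k_strict = cluster_threshold(null_fields, p_voxel=0.001)
print("cluster-size thresholds (voxels):", k_lax, k_strict)
```

The trade-off the correction encodes is visible in the output: a stricter voxel-level p value yields a smaller required cluster size, because supra-threshold clusters in noise become smaller.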

#### Functional Connectivity Analysis

We defined five seeds (**Figure 1**) for the resting-state functional imaging analysis. Pre-processing of the resting-state functional images included slice timing correction, rigid-body motion correction, normalization by direct registration to the Montreal Neurological Institute echo planar imaging template (interpolated spatial resolution 3 mm × 3 mm × 3 mm), and spatial smoothing (6 mm full-width at half maximum). Voxel-wise correlation analysis was conducted after the data were temporally band-pass filtered (0.01–0.08 Hz). The resting-state time series of the five selected seed regions were extracted using the MarsBaR toolbox (Brett et al., 2002). The correlation coefficient (r-value) between each seed region (6-mm radius sphere) and every other voxel of the whole brain (excluding those in the seed region) was computed. The correlation coefficients were converted into z scores using Fisher's r-to-z transformation to generate a contrast matrix for each seed region in each subject. For the group analysis, we used two-sample t-tests to compare basketball players and novices (**Figure 2**), with one t-test for each of the five seed regions. AlphaSim correction with Monte Carlo simulation was used to correct for multiple comparisons, and the t-maps were thresholded at a corrected significance level of p < 0.05.
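The seed-based steps above (spherical seed, averaged seed time series, Pearson correlation, and Fisher's r-to-z transformation z = arctanh(r) = ½ ln((1 + r)/(1 − r))) can be sketched as follows. On the 3-mm normalized grid, a 6-mm radius sphere contains 33 voxel centers. The toy data, the dictionary-based voxel representation, and the function names are hypothetical; MarsBaR's actual extraction may differ in detail, and the sketch simply averages the seed voxels.

```python
import math
import random


def sphere_offsets(radius_mm, voxel_mm):
    """Integer voxel offsets whose centers lie within radius_mm of the seed center."""
    r = int(radius_mm // voxel_mm)
    return [(i, j, k)
            for i in range(-r, r + 1)
            for j in range(-r, r + 1)
            for k in range(-r, r + 1)
            if (i * i + j * j + k * k) * voxel_mm ** 2 <= radius_mm ** 2]


def seed_time_series(data, center, offsets):
    """Mean time series over seed voxels; data maps (x, y, z) voxel index -> list of values."""
    cx, cy, cz = center
    members = [data[(cx + i, cy + j, cz + k)] for i, j, k in offsets
               if (cx + i, cy + j, cz + k) in data]
    return [sum(values) / len(members) for values in zip(*members)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r):
    """Fisher's r-to-z transformation: 0.5 * ln((1 + r) / (1 - r)) = atanh(r)."""
    return math.atanh(r)


# Hypothetical toy data: seed voxels share a common signal; one target voxel is partly coupled.
rng = random.Random(0)
n_t = 210
shared = [rng.gauss(0, 1) for _ in range(n_t)]
center = (10, 10, 10)
offsets = sphere_offsets(6.0, 3.0)
data = {(center[0] + i, center[1] + j, center[2] + k): [s + rng.gauss(0, 1) for s in shared]
        for i, j, k in offsets}
target = [0.5 * s + rng.gauss(0, 1) for s in shared]

seed = seed_time_series(data, center, offsets)
r = pearson(seed, target)
print(len(offsets), round(r, 3), round(fisher_z(r), 3))
```

The Fisher transformation is applied before group statistics because z is approximately normally distributed with variance that does not depend on the true correlation, which makes subject-level connectivity values suitable for the two-sample t-tests described above.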

One purpose of our study was to investigate group differences in functional connectivity within cortical networks of potential interest. We predicted that differences in functional connectivity between the two groups would reflect their different motor and cognitive functions, which are likely related to distinct cortical networks. We specifically focused on three cortical networks: the default mode network, salience network, and executive control network. We therefore illustrated the functional connectivity data from the five seed regions (superimposed results of the five t-tests) and projected them onto a network brain template (BrainNet Viewer<sup>2</sup>; Xia et al., 2013; **Figure 3**). The clusters of involved brain areas were extracted separately for each seed region, and an ICBM152 brain template was used (REST\_V1.8; Song et al., 2011). This three-dimensional volume-to-surface mapping provided more intuitive information about the seed regions and other functionally connected brain areas within the three cortical networks and showed the spatial distribution of these networks in the brain (Margulies et al., 2013).

# RESULTS

#### Gray Matter Volume

We used an optimized VBM technique to analyze the T1-weighted anatomical scans and set height as a covariate to correct for potential contamination caused by different brain sizes in the two groups. Basketball players had larger gray matter volumes than novices in multiple brain areas, including the right precuneus, left anterior insula (AI), right anterior cingulate cortex (ACC), left inferior frontal gyrus (IFG), and left inferior parietal lobule (IPL; **Figure 1**; **Table 1**). These areas were selected as seeds for the functional connectivity analyses. The inverse contrast did not show larger volumes in novices than in players in any brain area. No significant correlation was found between gray matter volume and training time in any of the five seed regions in basketball players.

<sup>2</sup>http://www.nitrc.org/projects/bnv/
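With only 21 players contributing to the correlations between gray matter volume and training time, only fairly strong correlations could reach significance. As a rough guide (not a reanalysis; the individual r values are not reported here), the sketch below computes the two-tailed critical |r| at α = 0.05 for n = 21, once with the Fisher z approximation and once from the tabled critical t of 2.093 for 19 degrees of freedom. Both give roughly 0.43, and any correction across the five seed regions would raise the threshold further.

```python
import math
from statistics import NormalDist


def critical_r_fisher(n, alpha=0.05):
    """Approximate two-tailed critical |r| using Fisher's z (standard error 1/sqrt(n - 3))."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(z_crit / math.sqrt(n - 3))


def r_from_t(t, df):
    """Convert a t statistic to a correlation: r = t / sqrt(t^2 + df)."""
    return t / math.sqrt(t * t + df)


approx = critical_r_fisher(21)
exact = r_from_t(2.093, 19)  # 2.093: two-tailed 5% critical t for df = 19 (from a t table)
print(f"critical |r| for n = 21: {approx:.3f} (Fisher z), {exact:.3f} (t-based)")
```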

#### Resting-State Functional Connectivity

Five seed regions were connected with multiple brain areas at rest in both basketball players and novices. Importantly, resting-state functional connectivity differed between basketball players and novices in functional networks involving various brain areas (**Figure 2**; **Table 2**). Specifically, the right precuneus showed greater resting connectivity with the right inferior orbitofrontal gyrus (IOG), the left pars opercularis (POP) of the IFG, and the right middle frontal gyrus (MFG) in the basketball player group. Connectivity between the left AI and the right MFG, right IFG, and left superior temporal pole (STP) was stronger in the player group than in the novice group. The right ACC was more functionally connected with the left medial superior frontal gyrus (MSFG) in the basketball player group. The left IFG was more functionally connected with the left IPL, and the left IPL was more functionally connected with the left MFG, in the basketball player group than in the novice group. Conversely, the reverse comparison did not reveal stronger connectivity between any seed region and other cortical areas in novices than in basketball players.

Furthermore, we projected the data onto a network brain template. This confirmed that, within the default mode network, the right precuneus had stronger connectivity with the left POP of the IFG and with the right IOG and MFG in basketball players than in novices (**Figure 3A**). For the salience network (**Figure 3B**), the left AI had stronger connectivity with the left STP and right IFG, while the right ACC was more functionally connected with the left MSFG in basketball players than in novices. For the executive control network (**Figure 3C**), functional connectivity between the left IFG and IPL and between the IPL and MFG was stronger in basketball players than in novices.

# DISCUSSION

We investigated differences in brain structure and resting-state functional connectivity between basketball players and novices. The novel finding was that basketball players showed greater gray matter volume in five brain areas (**Figure 1**). Furthermore, basketball players displayed higher resting-state functional connectivity between these five seed regions and other cortical areas than novices. These cortical areas are located in the default mode network, salience network, and executive control network, which are related to motor and cognitive functions in basketball players (**Figures 2**, **3**).

#### Gray Matter Volume

We found larger gray matter volume in the right precuneus, left AI, right ACC, left IFG, and left IPL in basketball players than in novices. The precuneus is associated with processing spatial information during motor execution and preparation (Kawashima et al., 1995; Cavanna and Trimble, 2006). In particular, the precuneus is involved in target-tracking tasks that place special demands on attention to spatial information (Wenderoth et al., 2005; Cavanna and Trimble, 2006). Regular practice at tracking frequently moving targets in basketball may explain the larger precuneus volume in basketball players than in novices. The insula shows high activation during complex behavioral tasks (Craig, 2009) and plays an important role in making rapid decisions in risky situations (Craig, 2002; Singer et al., 2009). Such ability, together with superior motor-related perceptual functions, is required in basketball because players often need to perceive their own position on the court and make decisions about offensive and defensive strategy (Bar-Eli and Tractinsky, 2000; Llorca-Miralles et al., 2013; Kinrade et al., 2015). Our results are also consistent with previous findings linking physical exercise to increased ACC volume (Flöel et al., 2010; Prakash et al., 2010) and support the view that the ACC is a major neuronal substrate for attention (Osaka et al., 2007) and action selection (Rushworth, 2008). The IFG is related to action observation and imitation (Buccino et al., 2004; Calvo-Merino et al., 2005, 2006; Iseki et al., 2008; Caspers et al., 2010), which may be essential in basketball because the programming and execution of motor plans in basketball rely heavily on observing the actions of opposing players (Fujii et al., 2014a,b). The IPL is important for complex cognitive functions, including visual perception, spatial perception, and visuomotor integration (Anderson, 2011). The present finding of greater gray matter volume in the IPL in basketball players is consistent with previous neuroimaging evidence that the IPL is activated during action observation (Grèzes and Decety, 2001; Buccino et al., 2004; Hamilton and Grafton, 2006; Chong et al., 2008) and during anticipation based on correct understanding of the movement (Rizzolatti et al., 2006).

BA, Brodmann's area; L, left; R, right. Coordinates refer to Talairach space. Brain areas with corrected p < 0.05 were listed.

The basketball players in the present study are top athletes in China. Although our participants had a similar duration of basketball training, expertise in basketball skill varies widely depending on the specific role an athlete plays on the court (e.g., point guards and centers have completely different playing styles). This diversity in playing style despite similar training duration might explain why we found no significant correlation between gray matter volume and training duration in basketball players. This differs from our recent transcranial magnetic stimulation study in badminton players, in whom motor cortical excitation and inhibition were correlated with years of training (Dai et al., 2016).

## Functional Connectivity in Cortical Neuronal Networks

Independent component analysis and seed-based correlation analysis are the two most common techniques used to analyze functional connectivity data (Biswal et al., 1995; Fox et al., 2005). Although a network map obtained from independent component analysis may be used as a reference to interpret the results of seed-based correlation analysis, the two techniques are based on different mathematical models (Calhoun et al., 2001; van de Ven et al., 2004; Joel et al., 2011). Independent component analysis makes it possible to test several spatially separated cortical networks at once. However, the value of a voxel tested with independent component analysis represents the correlation between the time series of that voxel and the mean time series of a particular cortical network. The interpretation of these data-driven networks depends largely on the predetermined number of components, which changes the spatial patterns of the separated networks. Interpretation is further complicated by the noise-identification process, which often relies on the user's own selection of components. Seed-based correlation analysis, on the other hand, requires the selection of seed regions. The voxel value from seed-based correlation analysis reflects the degree to which the time series of a tested voxel is correlated with the time series of the seed region. Owing to its simplicity, high sensitivity and ease of interpretation, seed-based correlation analysis is widely used to test functional connectivity between a given seed region and other cortical areas. In the first step of our study, we used VBM analysis to define five seed regions with larger gray matter volumes in basketball players. We then specifically tested whether the correlations between the time series of these seed regions and those of other cortical areas differed between basketball players and novices. A seed-based correlation approach is therefore likely to be better suited and more practical for identifying group differences in our study. In addition, previous studies have reported similar results when the same resting-state fMRI data set was processed with independent component analysis and with seed-based correlation analysis (Damoiseaux et al., 2006).
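The seed-based approach described above can be illustrated with a minimal sketch. It assumes preprocessed, equal-length BOLD time series stored as plain Python lists (a hypothetical data layout), averages the seed voxels into one reference time series, and correlates that series with every other voxel. The Fisher r-to-z transform is a common convention in such analyses but is an assumption here, not a step stated in the text, and the sketch is not the authors' pipeline.

```python
import math
from typing import Dict, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length time series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("time series must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant time series has undefined correlation")
    return sxy / math.sqrt(sxx * syy)


def seed_time_series(seed_voxels: List[Sequence[float]]) -> List[float]:
    """Mean time series over all voxels in the seed region."""
    n_t = len(seed_voxels[0])
    return [sum(v[t] for v in seed_voxels) / len(seed_voxels) for t in range(n_t)]


def seed_connectivity_map(
    seed_voxels: List[Sequence[float]],
    brain_voxels: Dict[str, Sequence[float]],
    clip: float = 0.999999,
) -> Dict[str, float]:
    """Fisher z-transformed correlation of each voxel with the seed.

    r is clipped just inside (-1, 1) so that atanh stays finite.
    """
    seed = seed_time_series(seed_voxels)
    zmap = {}
    for name, ts in brain_voxels.items():
        r = max(-clip, min(clip, pearson_r(seed, ts)))
        zmap[name] = math.atanh(r)
    return zmap
```

Group differences would then be tested voxel by voxel on such z maps, for example with a two-sample test and a correction for multiple comparisons.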

The default mode network includes the precuneus, posterior cingulate, medial prefrontal cortex and inferior parietal cortex (Raichle et al., 2001; Fox et al., 2005). We found greater connectivity between the precuneus and medial prefrontal cortex in basketball players, consistent with the roles of the precuneus and medial prefrontal cortex as core nodes of the default mode network (Martinelli et al., 2013). Our results agree with a previous study in musicians showing that long-term motor learning and expertise lead to changes in resting-state functional connectivity in the default mode network (Fauvel et al., 2014). Because the precuneus and medial prefrontal cortex are strongly involved in self-related episodic memory (Dörfel et al., 2009), the higher activity in the default mode network may reflect the frequent processing of self-related episodic memory during basketball play.

The salience network is composed of the AI, dorsal ACC (dACC) and ventrolateral prefrontal cortex (Seeley et al., 2007; Chan et al., 2008). We found higher connectivity between the AI, frontal cortex and STP in basketball players. This result is consistent with the notion that the AI is highly involved in extracting key salient stimuli from multiple inputs (Menon and Uddin, 2010). Our previous study also reported greater AI activity when basketball players noticed incorrect anticipation while observing a basketball free throw (Wu et al., 2013). The frontal cortex is responsible for episodic memory retrieval (McDermott et al., 1999; Wagner, 1999; Lepage et al., 2000; Cabeza et al., 2002), while the STP is related to the storage of semantic memory (Markowitsch, 1995; Simmons and Martin, 2009). Our results support the idea that memory retrieval and storage are essential in basketball. We also found strong connectivity between the ACC and MSFG (part of the frontal cortex) in basketball players. As the ACC is related to the detection and processing of salient information and to error monitoring (Kiehl et al., 2000; Hester et al., 2005; Etkin et al., 2011), our results may suggest that semantic memory processing involving interaction between the ACC and frontal cortex is important for maintaining high performance in basketball players.

The executive control network is distributed across a frontoparietal system comprising the dorsolateral prefrontal cortex and posterior parietal cortex. In particular, the frontal cortex is a primary region for modulating the regular allocation of spatial attention (Schafer and Moore, 2011), and the parietal cortex is heavily involved in spatial awareness (Behrmann et al., 2004). Our finding that connectivity between the IFG, MFG and IPL is stronger in basketball players than in novices supports the idea that the executive control network is a key structure for shifting selective and spatial attention (Wu et al., 2007) in athletes with high motor expertise.

## Motor and Cognitive Functions in Basketball Players

Basketball is a confrontational, open-skill sport in which the movements and actions of players depend largely on their understanding of the environment and of the actions of other players (both teammates and opponents; Schmidt and Wrisberg, 2008). Our finding of larger gray matter volume and higher functional connectivity across multiple cortical areas and several cortical networks in basketball players suggests that the development of motor expertise requires complex motor and cognitive functions combining visual search, perceptual anticipation and action execution (Abernethy, 1996; Vickers, 2004). It is not surprising that motor-related cortical areas show larger gray matter volumes and enhanced connectivity with other cortical areas, as frequent engagement of these areas during long-term training induces plasticity in the underlying neuronal components and facilitates communication among these components within the networks (Fries, 2005; Lewis et al., 2009; Duan et al., 2012). Interestingly, we found gray matter volume increases and enhanced functional connectivity in a wide range of cortical areas across three different functional networks in basketball players. Structural and functional changes in these areas contribute substantially to improved cognitive functions, including temporal and spatial attention (Wright et al., 2013; Wu et al., 2013), memory processing (Wan et al., 2011; Wang et al., 2013), and decision making and error correction (Koelewijn et al., 2008; Cocchi et al., 2013), in people with professional experience. Our results are consistent with previous studies in players of other sports (Di et al., 2012; Wang et al., 2013) and support the view that the development of motor expertise relies on improvements in both motor and cognitive functions (Aglioti et al., 2008). Future studies that consider the interaction and mutual benefits between motor and cognitive components may help elucidate the mechanisms of cortical plasticity during the acquisition of high-level motor expertise.

We found no increase in gray matter volume or functional connectivity related to the primary motor cortex in basketball players compared to novices. This is consistent with previous studies in cohorts with other types of expertise, such as musicians (Fauvel et al., 2014), taxi drivers (Maguire et al., 2000) and athletes (Wei et al., 2011; Di et al., 2012; Wang et al., 2013). Interestingly, our previous transcranial magnetic stimulation studies found increased motor cortical excitability during different motor tasks in athletes (Wang et al., 2014; Dai et al., 2016). It may be inferred that changes (in both gray matter volume and functional connectivity) in other motor- and cognition-related brain areas alter cortico-cortical projections to the primary motor cortex and eventually increase motor cortical output in athletes. However, we cannot exclude the possibility that functional or even structural changes occur in the primary motor cortex itself after different courses of long-term motor training (Gaser and Schlaug, 2003; Draganski et al., 2004).

## Limitations

We investigated the morphological and functional differences between athletes and novices with a cross-sectional design. It may be argued that the larger gray matter volume and stronger functional connectivity observed in basketball players are not induced by long-term training but simply reflect innate characteristics of this cohort, which could lead to an "expert" brain with better structure and function during development. Although our optimized VBM analysis partly ruled out effects caused by inherent differences in brain size between the two groups, the question of how cortical plasticity with long-term training relates to structural and functional changes in the brain should be addressed further by longitudinal studies spanning the athletes' whole careers.

In addition, stronger connectivity in three functional networks was identified in the basketball players. It has long been debated whether and how resting-state functional connectivity reflects anatomical and biological connections in the brain (Raichle et al., 2001; Fox et al., 2005, 2006). Our study does not directly address how the stronger functional connectivity in basketball players relates to biological changes after long-term training. However, the connectivity differences involved seed regions in which structural (gray matter volume) differences were found between the two groups, and we did not preselect these seed regions; together, these findings might suggest biological changes reflecting cortical plasticity after long-term training in athletes. This view is also consistent with evidence from neuroimaging studies in musicians that changes in functional maps are often accompanied by structural changes during the acquisition of motor expertise (Schlaug, 2001).

A converse question is whether the results of the functional connectivity analysis are affected by differences in the seed regions determined with the VBM approach. Two findings partly argue against the possibility that the increased functional connectivity simply reflects differences in seed size: the higher resting-state functional connectivity in basketball players compared to novices was based on a whole-brain correlation analysis, and the cortical areas showing higher functional connectivity (other than the seed regions) did not show greater gray matter volumes. Similar increases in both gray matter volume in seed regions and functional connectivity with those seed regions have also been found in musicians (Fauvel et al., 2014). In addition, when a parametric approach is used in time-domain analysis (Friston et al., 1994), the reference time series in the two groups are likely to differ slightly because basketball players have more gray matter volume than novices. The comparison of functional connectivity between the two groups may be further confounded by partial volume effects around the seed regions (Müller-Gärtner et al., 1992). Although we applied additional masks to the seed regions, excluding voxels with low gray matter density (values below 0.3), to minimize partial volume effects, it may still be argued that functional connectivity in novices is underestimated because the selected seeds in novices are contaminated by gray matter from adjacent cortical areas or even by white matter. The interaction between structural and functional changes over the long course of motor training is complex, and answering this question again requires future longitudinal studies conducted throughout the training course.
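The gray matter masking step mentioned above can be sketched as follows, assuming each seed voxel has a gray matter density value from tissue segmentation and a preprocessed time series. The dictionary layout and function names are hypothetical. Voxels with density below 0.3 are dropped before the seed's mean time series is computed, which limits contamination from adjacent tissue.

```python
from typing import Dict, List, Sequence


def mask_seed_voxels(
    seed_voxels: Dict[str, Sequence[float]],
    gm_density: Dict[str, float],
    threshold: float = 0.3,
) -> Dict[str, Sequence[float]]:
    """Keep only seed voxels whose gray matter density is >= threshold."""
    return {v: ts for v, ts in seed_voxels.items() if gm_density[v] >= threshold}


def masked_seed_mean(
    seed_voxels: Dict[str, Sequence[float]],
    gm_density: Dict[str, float],
    threshold: float = 0.3,
) -> List[float]:
    """Mean time series of the seed after gray matter masking."""
    kept = mask_seed_voxels(seed_voxels, gm_density, threshold)
    if not kept:
        raise ValueError("no seed voxels survive the gray matter mask")
    series = list(kept.values())
    n_t = len(series[0])
    return [sum(ts[t] for ts in series) / len(series) for t in range(n_t)]
```

In this sketch a voxel whose density is exactly 0.3 is kept, matching the text's wording that only values below 0.3 were excluded.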



# CONCLUSION

Using structural and resting-state functional imaging techniques, the present study revealed larger volumes of gray matter in five seed regions and higher functional connectivity in default mode network, salience network and executive control network in basketball players compared to novices. We conclude that the morphology and functional connectivity in cortical neuronal networks in athletes and novices are different, and the differences may be related to higher level of motor expertise in athletes with better motor and cognitive functions.

# AUTHOR CONTRIBUTIONS

X-YT and Y-LP conceived, designed and conducted experiments. YW analyzed data, interpreted results and wrote the manuscript. X-PL, L-LZ, WD and HZ helped conduct experiments and edited the manuscript. ZN, JW and JZ revised the manuscript. All authors approved the submitted version.

# FUNDING

The present study was funded by the National Natural Science Foundation of China (No. 31470051, No. 31371056), Shanghai Pudong New Area Health Bureau (No. PWZxkq 2011-02) and Shanghai Key Lab of Human Performance (Shanghai University of Sport, No. 11DZ2261100).

# ACKNOWLEDGMENTS

We thank collaborators in East China Normal University for data acquisition.




**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Tan, Pi, Wang, Li, Zhang, Dai, Zhu, Ni, Zhang and Wu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Effects of Modified Constraint-Induced Movement Therapy in Acute Subcortical Cerebral Infarction

Changshen Yu<sup>1†</sup>, Wanjun Wang<sup>1†</sup>, Yue Zhang<sup>2</sup>, Yizhao Wang<sup>2</sup>, Weijia Hou<sup>2</sup>, Shoufeng Liu<sup>1</sup>, Chunlin Gao<sup>1</sup>, Chen Wang<sup>3</sup>, Lidong Mo<sup>3</sup> and Jialing Wu<sup>1</sup>\*

<sup>1</sup>Department of Neurorehabilitation, Department of Neurology, Tianjin Huanhu Hospital, Tianjin Key Laboratory of Cerebrovascular and Neurodegenerative Diseases, Tianjin, China, <sup>2</sup>Department of Rehabilitation Medicine, Tianjin Huanhu Hospital, Tianjin Key Laboratory of Cerebrovascular and Neurodegenerative Diseases, Tianjin, China, <sup>3</sup>Neurological Disease Biobank, Tianjin Neurosurgical Institute, Tianjin Huanhu Hospital, Tianjin Key Laboratory of Cerebrovascular and Neurodegenerative Diseases, Tianjin, China

Background: Constraint-induced movement therapy (CIMT) promotes upper extremity recovery after stroke; however, it is difficult to implement clinically because of its high resource demands and concerns about the safety of the restraint. Therefore, we propose that modified CIMT (mCIMT) be used to treat individuals with acute subcortical infarction.

Objective: To evaluate the therapeutic effects of mCIMT in patients with acute subcortical infarction, and investigate the possible mechanisms underlying the effect.

#### Edited by:

Srikantan S. Nagarajan, University of California, San Francisco, United States

#### Reviewed by:

Sandro M. Krieg, Technische Universität München, Germany Filippo Brighina, University of Palermo, Italy

\*Correspondence: Jialing Wu wywjl2009@hotmail.com †These authors have contributed equally to this work.

> Received: 13 July 2016 Accepted: 04 May 2017 Published: 18 May 2017

#### Citation:

Yu C, Wang W, Zhang Y, Wang Y, Hou W, Liu S, Gao C, Wang C, Mo L and Wu J (2017) The Effects of Modified Constraint-Induced Movement Therapy in Acute Subcortical Cerebral Infarction. Front. Hum. Neurosci. 11:265. doi: 10.3389/fnhum.2017.00265

Methods: The role of mCIMT was investigated in 26 individuals who had experienced subcortical infarction within the preceding 14 days. Patients were randomly assigned to either mCIMT or standard therapy. The mCIMT group was treated for 3 h daily over 10 consecutive working days and wore a mitt on the unaffected arm for up to 30% of waking hours. The control group received an equal dose of occupational therapy and physical therapy. Motor function of the affected limb was assessed with the Wolf Motor Function Test (WMFT) and the Motor Activity Log (MAL) through a 3-month follow-up. Changes in cortical excitability were assessed with transcranial magnetic stimulation (TMS).

Results: Treatment significantly improved movement in the mCIMT group compared with the control group. The mean WMFT score was significantly higher in the mCIMT group than in the control group. Further, motor-evoked potentials (MEPs) were present significantly more often in the mCIMT group than at baseline. A significant change in the ipsilesional silent period (SP) occurred in the mCIMT group compared with the control group. However, we found no difference between the two groups in motor function or electrophysiological parameters after 3 months of follow-up.

Conclusions: mCIMT resulted in significant functional changes in timed movement immediately following treatment in patients with acute subcortical infarction. Further, early mCIMT improved ipsilesional cortical excitability. However, no long-term effects were seen.

Keywords: constraint-induced movement therapy, rehabilitation, motor evoked potentials, cortical reorganization, acute subcortical stroke

# INTRODUCTION

Stroke is a major cause of mortality and morbidity in both the developed and the developing world (Sudlow and Warlow, 1997; Terént, 2003; Truelsen et al., 2003; Mehndiratta et al., 2015). Despite varying levels of functional recovery, substantial sensorimotor and cognitive deficits persist in more than 50% of survivors, resulting in a significant socioeconomic burden (Hendricks et al., 2002; Kim, 2014). Approximately 80% of stroke survivors have motor impairments of the upper limb (Langhorne et al., 2009; Momosaki et al., 2016). The degree of upper limb paresis is correlated with performance of basic activities of daily living (ADL) after stroke (Veerbeek et al., 2011; van Mierlo et al., 2016).

Constraint-induced movement therapy (CIMT) promotes movement of upper extremities affected by paralytic stroke. The major components of CIMT are intensive, repetitive task-oriented training and behavioral shaping of the impaired limb, combined with immobilization of the unimpaired arm. Animal studies suggest that increased use of the affected limb overcame the reduced motor activity associated with cortical lesions (Nudo et al., 1996; Kleim et al., 1998). Evidence supports the effectiveness of CIMT in improving dexterity and motor function in individuals with chronic hemiplegia (Wolf et al., 1989; van der Lee et al., 1999; Taub, 2000). However, several factors limit the widespread use of CIMT in stroke rehabilitation. First, the original CIMT protocol requires constant supervision and is therefore more expensive than customary care. Second, the original protocol requires constraint of the unaffected hand for approximately 90% of waking hours; some individuals with hemiplegia cannot tolerate such prolonged restraint, and it also raises safety concerns, especially in patients with acute stroke. Compared with the original CIMT protocol, modified CIMT (mCIMT) protocols are feasible and well tolerated in patients with acute stroke (Souza et al., 2015).

Although stroke damage can be devastating, many patients survive the initial event and undergo some spontaneous recovery, which can be further augmented by rehabilitative therapy. The first few weeks after stroke are vital for neuroplasticity and relearning of impaired activities (Dobkin, 2004; Kwakkel et al., 2006). Randomized controlled studies have demonstrated that mCIMT can improve the use and function of the more affected limb in acute or subacute stroke (Page et al., 2005; Singh and Pradhan, 2013).

Recovery of motor function after cortical injury differs from that after subcortical injury (Liu et al., 2015), and the effects of mCIMT in early subcortical ischemic stroke are not established. This study was undertaken to determine whether mCIMT is effective in the early phase of subcortical ischemic stroke and to investigate the possible mechanisms underlying its effect. We hypothesized that, compared with conventional occupational and physical therapy, mCIMT would improve functional outcomes of the hemiplegic upper limb in patients with acute subcortical ischemic stroke and increase ipsilateral cortical excitability.

# MATERIALS AND METHODS

#### Study Setting and Trial Registration

In this single-center randomized controlled clinical trial, patients were recruited from November 2013 to January 2016. All procedures involving human participants were approved by the ethics committee of Tianjin Huanhu Hospital and were in accordance with the 1964 Helsinki Declaration and its later amendments, or comparable ethical standards. All participants provided informed consent. We compared upper extremity function between the group receiving mCIMT and a dose-equivalent control group immediately after the intervention and 3 months later. Transcranial magnetic stimulation (TMS) was used to assess changes in cortical excitability after treatment and at follow-up. The study was registered with the Chinese Clinical Trial Registry (registration number: ChiCTR-IOR-15005770).

#### Design and Participants

The inclusion criteria were: (1) stroke within 2 weeks of onset; (2) MRI showing subcortical ischemic stroke; (3) ability to raise two fingers with the forearm pronated on the table or to lift the wrist 10° or more from a fully bent position; (4) ability to respond to a 2-step command; and (5) a Mini-Mental State Examination score above 20. The exclusion criteria were: (1) inability to provide informed consent; (2) a history of stroke; (3) deviation greater than 2 cm on the line bisection test; (4) morbidity of the affected upper extremity causing functional limitation before the stroke; (5) life expectancy less than 1 year; or (6) other neurological conditions affecting motor function or assessment (Thrane et al., 2015). After giving informed consent, patients were assigned to mCIMT or the control group using random odd- and even-numbered tickets in sealed envelopes. Each patient selected one of 60 sealed envelopes; patients who drew even-numbered tickets were allocated to the control group, and those who drew odd-numbered tickets were allocated to mCIMT.
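The envelope-based allocation can be simulated with a short sketch. It assumes, hypothetically, that the 60 envelopes hold tickets numbered 1 to 60 and that a patient's choice of envelope is equivalent to a random draw without replacement; the text does not state how the tickets were numbered. Odd tickets map to mCIMT and even tickets to control, as described above.

```python
import random
from typing import Dict, List, Optional, Tuple


def make_envelopes(n: int = 60, seed: Optional[int] = None) -> List[int]:
    """Shuffle hypothetical tickets numbered 1..n into sealed envelopes."""
    rng = random.Random(seed)
    tickets = list(range(1, n + 1))
    rng.shuffle(tickets)
    return tickets


def allocate(patients: List[str], envelopes: List[int]) -> Dict[str, Tuple[int, str]]:
    """Each patient draws one envelope; ticket parity decides the group.

    Envelopes are removed from the list as they are drawn (no replacement).
    """
    allocation = {}
    for pid in patients:
        if not envelopes:
            raise RuntimeError("no sealed envelopes left")
        ticket = envelopes.pop()
        group = "mCIMT" if ticket % 2 == 1 else "control"
        allocation[pid] = (ticket, group)
    return allocation
```

Because each draw is random, this scheme does not force equal group sizes; the trial reports 15 patients assigned to mCIMT and 14 to standard therapy.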

#### Interventions

The hemiplegic upper extremities in the mCIMT group were trained for 10 days by a licensed occupational therapist. All participants in this group underwent 3 h per day of adaptive task practice and task training of the paretic limb (Wolf et al., 1989; van der Lee et al., 1999; Taub, 2000). Behavioral therapy comprised basic ADL together with skilled functional activities under supervision to improve motor performance. Positive feedback was given, and tasks of increasing difficulty were assigned in graded steps. Feedback on errors was provided after task training. In addition, patients wore a constraining mitt on the unaffected arm for nearly a third of their waking hours.

The control group received an equal dose of traditional occupational therapy and physical therapy using a combination of neurodevelopmental techniques: bimanual tasks for the upper limbs, compensatory techniques for ADL, strength and range-of-motion exercises, and positioning and mobility training.

#### Outcome Measurements

The primary outcomes were upper extremity motor function, tested with the Wolf Motor Function Test (WMFT), and real-world arm use, assessed with a structured interview, the Motor Activity Log (MAL). The WMFT comprises 15 timed tasks and two strength tasks (lifting the weighted limb and grip strength). The maximum time allowed to complete a task was 120 s; if a trial was incomplete, the result was recorded as 121 s. The median time across all 15 tasks was used for analysis (Morris et al., 2001). The validity and reliability of the test have been demonstrated in stroke populations (Wolf et al., 2001, 2005; Nijland et al., 2010). The MAL is a structured interview comprising 30 standardized questions covering various ADL and is used to assess the participant's subjective report of 30 common daily tasks. It includes two subscales that rate the more affected upper extremity: an amount of use (AOU) scale and a quality of movement (QOM) scale (Bonifer et al., 2005). The MAL is stable over a 2-week period and has high internal consistency and high inter-rater and test-retest reliability (Taub, 2000). It has been used extensively in CIMT studies. All participants were assessed after inclusion but before randomization, after 2 weeks and after 3 months.
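The WMFT timing rule described above (120 s ceiling, incomplete trials scored as 121 s, median over the 15 timed tasks) can be written as a small helper. Representing an incomplete trial as `None`, and treating any recorded time above 120 s as incomplete, are assumptions of this sketch.

```python
import statistics
from typing import Optional, Sequence

MAX_TIME_S = 120.0     # longest time allowed per task
INCOMPLETE_S = 121.0   # score recorded when a task is not completed
N_TIMED_TASKS = 15


def wmft_task_time(t: Optional[float]) -> float:
    """Score one timed task; None or > 120 s counts as incomplete."""
    if t is None or t > MAX_TIME_S:
        return INCOMPLETE_S
    return float(t)


def wmft_median_time(times: Sequence[Optional[float]]) -> float:
    """Median performance time across the 15 timed WMFT tasks."""
    if len(times) != N_TIMED_TASKS:
        raise ValueError(f"expected {N_TIMED_TASKS} task times, got {len(times)}")
    return statistics.median(wmft_task_time(t) for t in times)
```

Using the median rather than the mean keeps a few 121 s ceiling scores from dominating the summary time.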

The secondary outcome was the change in cortical excitability. Motor-evoked potentials (MEPs) and the cortical silent period (SP) were recorded with a Dantec Keypoint 4c electromyography (EMG) system (Medtronic A/S, Skovlunde, Denmark), a MagPro R30 magnetic stimulator (Medtronic A/S, Skovlunde, Denmark) and a focal figure-eight-shaped coil (outer diameter 4.5 cm). The maximum intensity of the magnetic field was 2.5 tesla. All patients were seated comfortably in the supine position. Surface EMG electrodes (filter bandpass: 20 Hz–10 kHz) were attached 3 cm apart over the muscle belly of the abductor pollicis brevis. The International 10–20 electrode system, which locates electrode positions on the scalp from standard cranial landmarks, was used to position the TMS coil. Five stimulation positions in this system, C4/C3, FC4/FC3, C5/C6, CP4/CP3 and C2/C1, were used to measure cortical excitability. These positions were marked on an EEG cap and stimulated at 90% of maximum stimulator output. The position at which stimuli of slightly suprathreshold intensity consistently yielded maximal MEPs in the contralateral abductor pollicis brevis was defined as the "hot spot" (Bergmann et al., 2012). The resting motor threshold (RMT) was then defined, following the guidelines of the International Federation of Clinical Neurophysiology (IFCN) Committee, as the minimum stimulus intensity eliciting MEPs of 0.50 mV in the resting muscle in at least 5 of 10 consecutive trials (Chen et al., 2008). These procedures were performed before the intervention and repeated in the same way after the intervention and at follow-up. Central motor conduction time (CMCT) is a neurophysiological measure that reflects conduction between the primary motor cortex and the spinal cord. CMCT was calculated by subtracting the conduction time from the spinal roots to the muscle from the latency of MEPs evoked by transcranial magnetic stimulation of the cortex (Heald et al., 1993). In our previous study, the cortical SP was a useful predictor of outcome in patients with acute stroke (Zhang et al., 2016), and the SP has been proposed as an additional factor to the MEP for predicting motor recovery (van Kuijk et al., 2005). SP length was measured from MEP onset until the return of uninterrupted voluntary EMG activity (Uozumi et al., 1991; Trompetto et al., 2000; **Figure 1**). When TMS is applied during isometric muscle contraction, a cortical SP lasting up to 100–300 ms can follow the MEP (Braune and Fritz, 1995). The stimulation intensity was 120% of RMT. Changes in ipsilesional and contralesional MEPs and SPs were expressed as a change ratio (Δ):

$$
\Delta = \frac{\text{result after treatment or at follow-up}}{\text{baseline result}}
$$
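The TMS measures defined in this section (RMT, CMCT, SP duration and the change ratio Δ) reduce to simple arithmetic once the raw values are available. The sketch below assumes MEP peak-to-peak amplitudes in mV grouped by stimulus intensity (% of maximum stimulator output) and latencies in ms. The data layout and function names are hypothetical, and the 0.50 mV criterion is taken from the text as written.

```python
from typing import Dict, Optional, Sequence


def resting_motor_threshold(
    trials: Dict[int, Sequence[float]],
    mep_criterion_mv: float = 0.50,
    min_hits: int = 5,
    n_trials: int = 10,
) -> Optional[int]:
    """Lowest intensity with >= min_hits of n_trials MEPs reaching the criterion."""
    for intensity in sorted(trials):
        amplitudes = list(trials[intensity])[:n_trials]
        if len(amplitudes) < n_trials:
            continue  # not enough consecutive trials recorded at this intensity
        hits = sum(1 for a in amplitudes if a >= mep_criterion_mv)
        if hits >= min_hits:
            return intensity
    return None  # no intensity met the criterion (e.g. MEP absent)


def cmct_ms(cortical_mep_latency_ms: float, peripheral_conduction_ms: float) -> float:
    """CMCT = cortical MEP latency minus spinal-root-to-muscle conduction time."""
    return cortical_mep_latency_ms - peripheral_conduction_ms


def silent_period_ms(mep_onset_ms: float, emg_return_ms: float) -> float:
    """SP length from MEP onset to return of uninterrupted voluntary EMG."""
    return emg_return_ms - mep_onset_ms


def change_ratio(follow_up: float, baseline: float) -> float:
    """Delta = post-treatment (or follow-up) value / baseline value."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return follow_up / baseline
```

For example, a hypothetical ipsilesional SP of 100 ms at baseline and 79 ms after treatment gives Δ = 0.79, that is, a 21% decline from baseline.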

#### Statistical Analysis

SPSS version 21.0 for Windows was used for all statistical analyses. Categorical variables are reported as proportions, and continuous variables as medians (interquartile range) or means ± standard deviations (SD). Baseline demographic variables were compared using the independent t-test or the Chi-square test. Between-group and within-group differences were assessed with the Mann-Whitney U-test or one-way analysis of variance (ANOVA) followed by Bonferroni multiple comparisons. The level of statistical significance was set at 0.05.
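For readers unfamiliar with the nonparametric comparison named above, the sketch below computes the Mann-Whitney U statistic from pooled ranks (tied values receive average ranks) and applies a Bonferroni adjustment to a set of p-values. It is plain Python for illustration, does not reproduce SPSS, and returns only U rather than a p-value; in practice `scipy.stats.mannwhitneyu` returns both.

```python
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U_x, U_y) for two independent samples."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:n1])          # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2
    return u1, n1 * n2 - u1


def bonferroni(p_values: Sequence[float]) -> List[float]:
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

The smaller of the two U values is the one conventionally reported.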

# RESULTS

A total of 297 patients were screened and 29 eligible participants were selected between November 2013 and January 2016. Fifteen patients were assigned to mCIMT and 14 were enrolled in standard therapy. All the participants were inpatients, and no participant dropped out of the post-treatment assessments. One patient refused the 3-month follow-up, and another patient was lost to follow-up in the mCIMT group. Another patient was also lost to follow-up in the standard therapy group after 3 months. The flowchart outlining patient selection is presented in **Figure 2**.

#### Demographic Data

A total of 26 patients (22 men, 4 women) were enrolled and successfully followed up. The groups did not differ significantly in medical comorbidities: 22 (88.5%) patients had hypertension, 15 (57.7%) diabetes, three (11.5%) atrial fibrillation, six (23.1%) high homocysteine and seven (27.9%) cerebral artery stenosis. Disease duration in the standard therapy group ranged from 2 to 14 days (mean 6.15 ± 3.98 days), compared with 2 to 14 days (mean 7.31 ± 3.86 days) in the mCIMT group. Patient demographic and baseline data are described in **Table 1**.

#### Clinical Assessment

No significant differences in baseline (pretreatment) data were observed (**Table 2**). After 2 weeks of intervention, both groups showed improved motor function of the affected upper limb on the WMFT and MAL compared with baseline. The mCIMT group showed a greater improvement than the standard therapy group in WMFT scores (P < 0.001) and in the amount of arm use (P = 0.038), whereas the other assessment scales did not differ between the two groups. At 3-month follow-up, MAL-QOM and MAL-AOU scores did not differ between the mCIMT and standard therapy groups, and the WMFT analysis yielded no between-group differences in functional ability.

<sup>∗</sup>Represents continuous variable with normal distribution, expressed as mean ± SD; other values are expressed as n (%); mCIMT, modified constraint-induced movement therapy; NIHSS, NIH Stroke Scale.

Normally distributed variables are expressed as mean ± SD and non-normally distributed variables as median (inter-quartile range). Normally distributed variables were compared by analysis of variance and non-normally distributed variables by the Mann-Whitney U-test. mCIMT, modified constraint-induced movement therapy; WMFT time, Wolf Motor Function Test of Performance Time; WMFT score, Wolf Motor Function Test of Functional Ability; MAL-AOU, Motor Activity Log of Amount Of Arm Usage; MAL-QOM, Motor Activity Log of Quality of Movement.

#### Electrophysiology

TMS parameters were similar between the two groups at baseline (**Table 3**). After 2 weeks of intervention, MEPs were present in 10 (76.9%) patients in the mCIMT group, a significant improvement compared with baseline (P = 0.047). In the standard therapy group, MEPs were observed in seven (53.8%) patients, with no difference compared with baseline (P = 0.695). Thus, although the two groups did not differ significantly, MEP presence in the mCIMT group was significantly higher than at pre-treatment. In addition, ipsilesional SP in the mCIMT group declined by 21% from baseline, a significant difference compared with the standard therapy group (P = 0.029). Other TMS parameters, including contralesional SP and CMCT, did not differ significantly from the standard therapy group. At 3-month follow-up, both groups showed significant changes in ipsilesional SP compared with baseline (mCIMT, P < 0.001; standard therapy, P = 0.047), but the groups did not differ from each other.

#### DISCUSSION

CIMT and mCIMT are among the most effective interventions for improving functional outcomes of the paretic upper limb (Kwakkel et al., 2015). An intermediate level of evidence supports mCIMT as an effective intervention for upper extremity hemiparesis after stroke (Uswatte et al., 2005). A single clinical trial involving hospitalized patients demonstrated significantly higher Action Research Arm Test total scores and pinch subscale scores in the CIMT group immediately after therapy, without a follow-up assessment (Dromerick et al., 2000).

#### TABLE 3 | Effect of mCIMT on cortical excitability.

<sup>1</sup>Represents the change ratio of TMS motor evoked potentials (MEPs). Normally distributed variables are expressed as mean ± SD and non-normally distributed variables as median (interquartile range). Normally distributed variables were compared by one-way analysis of variance and non-normally distributed variables by the Mann-Whitney U-test.

In this study, we used two standard clinical tests to assess upper limb motor function in patients with acute subcortical infarction. We observed a significant increase of 1.18 in mean WMFT score in the mCIMT group after intervention compared with standard therapy (P < 0.001), indicating that mCIMT promoted faster recovery. Although mean WMFT time did not improve significantly in the mCIMT group after treatment, it showed a downward trend. Pre-intervention MAL scores below 0.27 indicated only occasional use of the more affected arm for activities of daily living (ADL). Following intervention, patients in the mCIMT group showed changes exceeding 2.0 points in the amount of use (MAL-AOU), suggesting increased use of the affected upper limb for ADL tasks. These AOU scores were comparable to the results of previous mCIMT studies (Page et al., 2005; Wu et al., 2007). This study demonstrated that mCIMT improved immediate motor function in patients with acute subcortical infarction.

After 3 months of follow-up, WMFT and MAL scores did not differ between patients receiving mCIMT and those receiving standard therapy. This result is consistent with a randomized controlled trial that found no favorable effect of CIMT during a 6-month follow-up (Thrane et al., 2015). In contrast, a recent study of home-based CIMT in patients with upper limb dysfunction after stroke found that, although both groups improved in QOM, the home CIMT group outscored the standard therapy group at 3 months. However, the patients in that study were recruited at least 6 months after stroke and received 5 h of professional therapy over 4 weeks, whereas our study recruited patients at an earlier phase of stroke and all patients underwent shorter therapy.


Normally distributed variables are expressed as mean ± SD and non-normally distributed variables as median (interquartile range). Normally distributed variables were compared by analysis of variance followed by Bonferroni multiple comparisons tests. P1 represents the statistical difference between pre-treatment and post-treatment; P2, between pre-treatment and follow-up; P3, between post-treatment and follow-up.

Yu et al. Modified CIMT Improves Outcome

A series of studies demonstrated increased cortical neuroplasticity in the subacute and chronic phases after stroke (Liepert et al., 1998; Ro et al., 2006; Boake et al., 2007; Sawaki et al., 2008; Laible et al., 2012). However, studies in patients with subcortical lesions are rare (Jang, 2007). In our patients with subcortical infarction, MEPs and ipsilesional SP improved in the mCIMT group after treatment, suggesting a significant enhancement of cortical excitability on the lesioned side. Previous studies showed that a reduced ipsilesional SP was a prognostic factor for spasticity in chronic stroke (Uozumi et al., 1992; Cruz Martínez et al., 1998); however, SP may play a different role in acute stroke. Our earlier study showed that ipsilesional SP was decreased during the first few days after acute cerebral infarction and significantly predicted outcome within 3 months (Zhang et al., 2016): patients with an SP value greater than 217.05 had a 7.69-fold increased risk of unfavorable outcomes compared with patients with an SP value below 217.05. In the present study, ipsilesional SP was significantly reduced immediately after mCIMT compared with the control group. We speculate that reduced SP may reflect increased muscle tone and improved functional recovery in the acute phase of cerebral infarction.

Previous studies showed that recruitment of supplementary motor areas on the ipsilesional side enhanced recovery. However, persistent activation of the contralesional cortex was associated with slower and less complete recovery (Murphy and Corbett, 2009; Xerri, 2012). The present study found no changes in contralesional SP or CMCT in either group. Our results suggest that the functional improvement of the affected upper limb after treatment was associated with enhanced ipsilesional cortical excitability. At 3-month follow-up, ipsilesional SP in the mCIMT group had changed significantly from pre-treatment, and a similar change was observed in the standard therapy group. The reason for these changes may be that patients were not required to adhere to similar training at home (**Table 4**).

Our study has several limitations. First, the present study was a single-center clinical trial, and it was difficult to recruit eligible patients; multicenter trials are needed to increase the sample size. Second, the use of non-navigated TMS was a major limitation of the present study. Navigational systems are known to allow TMS targeting of a desired cortical region within a spatial deviation of a few millimeters (Rossini and Rossi, 2007; Sparing et al., 2008). Positioning the TMS coil with the International 10–20 electrode system is less accurate than navigation, but it is easy to apply in practice. Moreover, the 10–20 system enabled us to locate the cortical region of interest quickly, shorten pre-evaluation time, and avoid some safety concerns associated with navigated TMS in acute stroke patients. Third, we assessed changes in cortical excitability using TMS. However, single TMS pulses are unlikely to distinguish between stimulation of cortico-spinal, intra-cortical and trans-cortical elements; instead, they target all three to varying degrees (Bestmann and Krakauer, 2015). We therefore could not determine the contribution of other brain areas to the resulting output. Combining TMS with functional brain imaging, or using specific stimulation protocols such as paired-pulse stimulation, could improve our understanding of the neurophysiological mechanisms of stroke recovery.

#### CONCLUSION

Compared with standard therapy, mCIMT induced significant short-term functional improvement in patients with acute subcortical ischemic stroke. Early intervention with mCIMT promotes ipsilesional cortical reorganization, although no long-term benefit over standard therapy was observed at 3-month follow-up.

# AUTHOR CONTRIBUTIONS

JW contributed to the conception and design of the work and revised it for important intellectual content. CY, WW, YZ, YW, WH, SL, CG, CW and LM contributed to data acquisition. CY, WW and JW contributed to the analysis and interpretation of data for the work. CY and WW drafted the work. All authors approved the final version to be published, and agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

#### ACKNOWLEDGMENTS

This study was generously supported by grants from the Tianjin Public Health Bureau (2013KG122, to JW), the General Administration of Sport of China (2015B098, to JW), and the Program to Establish Scientific Research Resources by the Tianjin Municipal Health Bureau (Establishment and Quality Control for Biobanking of Neurological Diseases).


with ischemic hemispheric lesion. Stroke 26, 550–553. doi: 10.1161/01.str.26.4.550


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Yu, Wang, Zhang, Wang, Hou, Liu, Gao, Wang, Mo and Wu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Cognitive Resources Necessary for Motor Control in Older Adults Are Reduced by Walking and Coordination Training

Ben Godde<sup>1,2,3</sup>\* and Claudia Voelcker-Rehage<sup>2,3,4</sup>

<sup>1</sup>Department of Psychology and Methods, Jacobs University Bremen, Bremen, Germany, <sup>2</sup>Jacobs Center on Lifelong Learning and Institutional Development, Jacobs University Bremen, Bremen, Germany, <sup>3</sup>Center for Cognitive Science, Bremen University, Bremen, Germany, <sup>4</sup> Institute of Human Movement Science and Health, Technische Universität Chemnitz, Chemnitz, Germany

We examined whether physical exercise interventions reduce the cognitive brain resources recruited while older adults perform motor control tasks. Forty-three older adults (63–79 years of age) participated in either a walking (n = 17) or a motor coordination (n = 15) intervention (1 year, 3 times per week) or were assigned to a control group (n = 11) doing relaxation and stretching exercises. Before and after the intervention period, we used functional MRI to assess brain activation during imagery of forward and backward walking and during counting backwards from 100 as a control task. In both experimental groups, activation in the right dorsolateral prefrontal cortex (DLPFC) during imagery of forward walking decreased from pre- to post-test (effect sizes, Cohen's d: −1.55 and −1.16 for coordination and walking training, respectively). Regression analysis revealed a significant positive association between initial motor status and activation change in the right DLPFC (R<sup>2</sup> = 0.243, F(3,39) = 4.18, p = 0.012). Participants with the lowest motor status at pretest profited most from the interventions. The data suggest that physical training in older adults effectively frees up cognitive resources otherwise needed for the control of locomotion. Training benefits may become particularly apparent in so-called dual-task situations, in which subjects must perform motor and cognitive tasks concurrently.

#### Edited by:

Klaus Gramann, Technische Universität Berlin, Germany

#### Reviewed by:

Brenda Malcolm, The Graduate Center (CUNY), USA

Karen Zown-Hua Li, Concordia University, Canada

Eling D. De Bruin, ETH Zurich, Switzerland

#### \*Correspondence:

Ben Godde b.godde@jacobs-university.de

Received: 29 September 2016; Accepted: 16 March 2017; Published: 11 April 2017

#### Citation:

Godde B and Voelcker-Rehage C (2017) Cognitive Resources Necessary for Motor Control in Older Adults Are Reduced by Walking and Coordination Training. Front. Hum. Neurosci. 11:156. doi: 10.3389/fnhum.2017.00156

Keywords: motor imagery, functional MRI, motor status, cognitive aging, physical fitness, locomotion

# INTRODUCTION

It has been demonstrated that, with advancing age, gait and balance increasingly require cognitive control and supervision (Hausdorff et al., 2005; Yogev-Seligmann et al., 2008; Berchicci et al., 2014; Van Swearingen and Studenski, 2014). This need for cognitive resources for movement coordination and control in older adults (Loewenstein and Acevedo, 2010) is indicated by expanded brain networks and increased brain activation during performance of a single motor task compared with young adults (for reviews see Seidler et al., 2010; Papegaaij et al., 2014; Hamacher et al., 2015).

Because larger movements cannot be performed in an MR or PET scanner, motor imagery has in recent years been established as a method to investigate cortical activation during locomotion (Miyai et al., 2001; Malouin et al., 2003; Jahn et al., 2004; la Fougère et al., 2010; Peterson et al., 2014). Numerous studies have confirmed that imagining a movement activates the same or similar brain areas as actually performing it (Stephan et al., 1995; Jeannerod and Frak, 1999; Lotze et al., 1999; Sahyoun et al., 2004; Solodkin et al., 2004; la Fougère et al., 2010; for review see Lafleur et al., 2002; Allali et al., 2014). The locomotor network, as revealed by motor imagery, includes the supplementary and primary motor areas, the right prefrontal cortex, the basal ganglia, brainstem, tegmentum and cerebellum (Miyai et al., 2001; Jahn et al., 2004; la Fougère et al., 2010; Allali et al., 2014).

With respect to age differences, Allali et al. (2014) observed an age-related increase in brain activity in the right supplementary motor area (BA6), the right orbitofrontal cortex (BA11), and the left dorsolateral frontal cortex (BA10). Higher activation in older than in young adults was also observed in the middle temporal visual area MT/V5 (Wai et al., 2012; Zwergal et al., 2012) and in subcortical regions including the putamen and substantia nigra (Allali et al., 2014). This increased recruitment of frontal cortical resources also leads to lower cognitive and motor performance in dual-task situations in older adults (Kahneman, 1973; Lindenberger et al., 2000; Huxhold et al., 2008; Malcolm et al., 2015).

The amount of cognitive control required for performing a motor task is affected not only by a person's age but also by her or his motor fitness status (Godde and Voelcker-Rehage, 2010; Berchicci et al., 2014). Using electroencephalography, Berchicci et al. (2014) showed that older adults who exercise regularly rely less on extra cognitive control resources during basic visuo-motor functions. In a previous cross-sectional study, we used functional MRI and motor imagery to investigate brain activation during simple and complex walking tasks (walking forward and backward on a treadmill) and analyzed whether the motor status of older adults influenced these activation patterns. Individuals with high motor fitness showed more activation and larger BOLD signals in motor-related areas than low-fit participants, but lower activity in the dorsolateral prefrontal cortex (DLPFC). Moreover, parietal activation in high-fit participants remained stable throughout the movement period, whereas low-fit participants showed an early drop in activity in this area accompanied by increasing activity in frontal brain regions (Godde and Voelcker-Rehage, 2010).

Based on these findings, one could assume that interventions designed to improve motor fitness are a reasonable approach to free up prefrontal (cognitive) brain resources otherwise used for the cognitive control of locomotion in older adults. To test this assumption, we examined the effects of 1 year of physical exercise interventions on brain activation during the same simple and complex walking tasks (walking forward and backward on a treadmill) as in our previous cross-sectional study. Because we had also shown previously that different dimensions of physical fitness (cardiovascular and motor fitness) and different types of physical exercise interventions (cardiovascular and motor coordination training) have different positive effects on brain functioning during cognitive task performance (Voelcker-Rehage et al., 2010, 2011), we also examined whether such interventions differ in their effect on the cognitive control of imagined walking movements. As in the previous study, we used motor imagery of walking forward and backward to assess brain activation patterns during locomotion with functional MRI.

#### MATERIALS AND METHODS

This study was part of the Old Age on the Move intervention study at Jacobs University Bremen (see Voelcker-Rehage et al., 2011) that examined effects of different kinds of physical exercise on cognitive, motor, and emotional functioning. Motor status and brain processing during motor imagery were assessed before the start of the intervention (t1) and after 12 months (t2).

#### Participants

In total, 91 older adults between 63 and 79 years of age from the Bremen (Germany) area were recruited for the Old Age on the Move study through the member registry of a German health insurance company (DAK) or through newspaper articles. All participants took part voluntarily and provided written informed consent to the procedures of the study. At the end of the 1-year study, they received EUR 100 as compensation for their travel expenses. The study conformed to the Code of Ethics of the World Medical Association (Declaration of Helsinki) and was approved by the ethics committee of the German Psychological Society (DGPs; Voelcker-Rehage\_072006).

Participants had medical clearance and were screened for health restrictions by telephone interview before inclusion in the study. They were excluded from participation if they had a history of cardiovascular disease, any neurological disorder (e.g., self-reported neurological diseases such as a brain tumor, Parkinson's disease or stroke), any other motor or cognitive restriction (e.g., a score below 27 on the Mini Mental Status Examination, MMSE; Folstein and Van Petten, 2008), or metal devices in the body. Further, participants were screened for the number of falls in the year before study participation (no falls: n = 38; one fall: n = 4; two falls: n = 1). Participants who were absent for more than one test day or for more than 25% of the training sessions (calculated independently for each half year of the study) were excluded from data analysis (n = 47). One participant had to be excluded due to incomplete brain imaging data. None of the included participants experienced a change in health status during the 1-year study interval. To assess the subjective ability to perform the requested imagery tasks, the Movement Imagery Questionnaire-Revised (MIQ-R; Hall and Martin, 1997) was administered during debriefing directly after the scanning session. The MIQ-R is a rating scale that assesses the capacity to elicit mental images. It asks for the clarity of the image (scale from 1, very hard to see/feel, to 7, very easy to see/feel) and the intensity with which participants could feel themselves making the movements (Hall and Martin, 1997). No additional participant had to be excluded for failing to answer or for scoring below 4 on the 7-point vividness scale of the questionnaire.



Age (average age in years), Education (years of education), IQ, Health (number of diseases), Activity Index (kcal expended per week in leisure time and physical activities; see Huy et al., 2008), Body Mass Index (BMI), hypertension (proportion of participants diagnosed with a hypertensive disorder), estrogen replacement therapy (ERT; proportion of participants receiving estrogen replacement therapy), and positive affect (affect questionnaire encompassing high and low arousal; Kessler and Staudinger, 2009). There was no significant group effect for any of the measures.

The final sample consisted of 43 participants between 63 and 79 years of age (28 women and 15 men, mean age = 69.6, SD = 3.8). Detailed demographic information as well as information about the cognitive and fitness status of the participants is summarized in **Table 1**. Participants of the experimental and control groups did not differ statistically in age, years of formal education, intelligence index, health, physical activity index, BMI, hypertension, estrogen replacement therapy (for women only) or positive affect (always p > 0.10).

As described in Voelcker-Rehage et al. (2011), only small sample selectivity effects were found for age (remaining participants were older), health (remaining participants were healthier), and positive affect (remaining participants were more positive). Given the small size of these effects, it seems reasonable to conclude that findings obtained with the post-test sample may be generalized to the pre-test parent sample.

#### Interventions

Participants were assigned to two experimental groups and one control group. Because not all interventions could be offered at all training facilities, randomization of group assignment was restricted by the participants' place of residence. Training sessions were led by an experienced exercise leader and took place three times a week for 1 h each over 12 months (Voelcker-Rehage et al., 2011). Participants of the cardiovascular training group (N = 17, 12 women, 5 men, mean age = 69.3, SD = 3.3) took part in a walking intervention designed to improve cardiorespiratory fitness (aerobic endurance). Training intensity prescriptions were based on heart rate (HR) responses to spiroergometry exercise testing and aimed at a moderate intensity level. Participants of the second intervention group (N = 15, 10 women, 5 men, mean age = 71.3, SD = 4.7) received coordination training designed to improve fine and gross motor body coordination. This program focused on improving complex whole-body movements, such as balance, eye-hand coordination and leg-arm coordination, as well as spatial orientation and reaction to moving objects or persons. The active control group (N = 11, 6 women, 5 men, mean age = 68.5, SD = 3.1) performed a program of relaxation techniques, stretching and limbering for the whole body, especially designed for older adults. This group served to control for the potential effects of being involved in a guided group activity for 12 months as well as for retest effects. For details of the intervention programs, see Voelcker-Rehage et al. (2011).

#### Assessment of Motor Status

The motor status of the participants was assessed at t1 and t2 with a heterogeneous motor test battery comprising tests of five dimensions: movement speed, balance, fine coordination, flexibility and strength (Godde and Voelcker-Rehage, 2010). Movement speed was assessed with four tests: hand tapping (Oja and Tuxworth, 1995; Cronbach's α = 0.88), feet tapping (Voelcker-Rehage and Wiertz, 2003; Cronbach's α = 0.97), the 30-s chair stand test (Rikli and Jones, 1999; single trial), and an agility test (Adrian, 1981; Cronbach's α = 0.95). Balance was assessed by backward beam walking (Kiphard and Schilling, 1974; Cronbach's α = 0.90) and one-leg stand with eyes open and closed (Ekdahl et al., 1989; Cronbach's α = 0.88). Further, we assessed fine coordination with the Purdue Pegboard test (Tiffin and Asher, 1948; Cronbach's α = 0.93), flexibility with the shoulder flexibility test (Rikli and Jones, 1999; Cronbach's α = 0.95) and strength by measuring grip force (Igbokwe, 1992; Cronbach's α = 0.97). An overall motor status index was calculated as a z-transformed sum score of the five fitness dimensions, each dimension being represented by the mean of the z-transformed individual performances within that domain. This index was normally distributed at t1 (Shapiro-Wilk test: W(43) = 0.978, p = 0.564).
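The construction of the motor status index can be sketched in Python. The sketch assumes that all test scores are already oriented so that higher values mean better performance (completion times, for example, would need their sign flipped first), that z-scores use the sample standard deviation, and that the domain-to-test mapping and the data are hypothetical; it is not the authors' scoring script.

```python
from statistics import mean, stdev
from typing import Dict, List


def zscores(values: List[float]) -> List[float]:
    """z-transform one test's scores across participants (sample SD)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def motor_status_index(domains: Dict[str, Dict[str, List[float]]]) -> List[float]:
    """Overall motor status index, one value per participant.

    `domains` maps each fitness dimension to its tests, and each test to one
    score per participant. Within a dimension, the test z-scores are averaged;
    the five dimension scores are summed and the sum is z-transformed again.
    """
    domain_scores = []
    for tests in domains.values():
        test_z = [zscores(scores) for scores in tests.values()]
        # Mean of the z-transformed tests for each participant in this dimension.
        domain_scores.append([mean(col) for col in zip(*test_z)])
    sums = [sum(col) for col in zip(*domain_scores)]
    return zscores(sums)


# Hypothetical data for four participants.
data = {
    "speed": {"hand_tapping": [40, 45, 38, 50], "feet_tapping": [30, 34, 29, 36]},
    "balance": {"beam_walk": [12, 15, 10, 18]},
    "fine_coordination": {"pegboard": [13, 14, 11, 16]},
    "flexibility": {"shoulder": [-5, 2, -8, 4]},
    "strength": {"grip": [28, 35, 26, 40]},
}
index = motor_status_index(data)
print([round(z, 2) for z in index])
```

Because the final step is itself a z-transformation, the index has mean 0 and SD 1 in the sample, so individual values read directly as distances from the group average.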

#### Movement Imagery

At t1 and t2, participants performed three imagery tasks with eyes closed and from a first-person perspective: (i) walking forward at an individual moderate speed (2.5–3.5 km/h); (ii) walking backward in tandem walk (1 km/h); and (iii) standing still and relaxed (baseline condition). In addition, (iv) counting backward from 100 was chosen as a non-movement control condition. Before the test sessions, outside the MR scanner, participants received a standardized description of the imagery tasks and completed a task familiarization exercise, in which they were trained in the two experimental motor tasks (walking forward and backward) and the two control tasks (standing and counting backward). First, participants performed the real tasks and imagined them on a treadmill (Lode Valiant, Groningen, Netherlands). Walking forward was trained at an individual moderate speed (2.5–3.5 km/h) with an easy swinging of the arms, and walking backward was trained in tandem walk (1 km/h). All participants were trained as long as they needed to feel comfortable on the treadmill, which took between 10 and 20 min in total. We used a treadmill instead of real-world walking to provide constant visual input and ground. After executing the real and imagined movements on the treadmill, participants practiced imagining these movements (including gait initiation) in a horizontal position in 20-s periods until they felt well experienced with the tasks, again with eyes closed and from a first-person perspective. On another day, participants first repeated the movement imagination outside the MRI scanner until they felt confident again and then performed the tasks within the scanner (first-person perspective, eyes closed).

#### Functional MRI

Functional MRI scans were acquired at pre- and post-test using a block design with six 20-s blocks for each of the four conditions, presented in randomized order without breaks between blocks, resulting in a total of 24 blocks lasting 480 s.
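The timing of this block design can be sketched in Python. The sketch assumes a simple unconstrained shuffle of the 24 blocks (the text does not state how the order was randomized) and uses the TR of 2.5 s reported in the next paragraph to express each condition as a 0/1 boxcar sampled once per volume; the condition labels, the seed and the function name are hypothetical.

```python
import random
from typing import Dict, List

CONDITIONS = ["walk_forward", "walk_backward", "stand", "count_backward"]
BLOCKS_PER_CONDITION = 6
BLOCK_S = 20.0  # block duration in seconds
TR_S = 2.5      # repetition time in seconds (from the EPI parameters)


def block_design(seed: int = 0) -> Dict[str, List[int]]:
    """Return one 0/1 boxcar per condition, sampled at every TR."""
    rng = random.Random(seed)
    order = [c for c in CONDITIONS for _ in range(BLOCKS_PER_CONDITION)]
    rng.shuffle(order)  # randomized block order, no rest periods in between
    vols_per_block = int(BLOCK_S / TR_S)  # 8 volumes per 20-s block
    boxcars: Dict[str, List[int]] = {c: [] for c in CONDITIONS}
    for block_condition in order:
        for c in CONDITIONS:
            boxcars[c].extend([1 if c == block_condition else 0] * vols_per_block)
    return boxcars


design = block_design()
n_vols = len(design["stand"])
print(n_vols, n_vols * TR_S)  # prints 192 480.0
```

In a GLM such as the one described below, boxcars like these would be convolved with a hemodynamic response function; standing still served as the baseline condition.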

We used a 3T head scanner (Siemens Magnetom Allegra, Erlangen, Germany). A T2<sup>∗</sup>-weighted gradient echo multislice sequence (EPI, TR 2500 ms, TE 60 ms, voxel size 3 × 3 × 3 mm, matrix 64 × 64) was used to acquire 48 slices covering the whole brain and the cerebellum. Additionally, a high-resolution T1-weighted anatomical 3D dataset containing 172 sagittal slices (1 × 1 × 1 mm<sup>3</sup>) was acquired for each subject.

Analysis of fMRI data was performed using BrainVoyager (Brain Innovation B.V., Maastricht, Netherlands). fMRI data were first corrected for motion artifacts and linear trends, smoothed in the temporal (2.8 s) and spatial (6 mm) domains, and normalized to Talairach space. BOLD responses were modeled with a delayed box-car function convolved with a canonical hemodynamic response, and a general linear model (GLM) was applied to the time course of each voxel. A random-effects analysis was performed to account for inter-subject variability, so that the results can be generalized to other samples. On the first level, weighted beta images were computed for each condition (forward walking, backward walking, and counting backward from 100) relative to baseline (standing still). On the second level, these individual beta values were entered into a 3 (INTERVENTION group) × 2 (SESSION: t1 vs. t2) × 3 (CONDITION) random-effects analysis of variance. P-values were corrected for multiple comparisons by false discovery rate (FDR, P < 0.05) and cluster threshold estimation using Monte Carlo simulations (alpha level < 0.05; Forman et al., 1995; Goebel et al., 2006). Effect sizes of group differences (intervention groups vs. control group) in cortical activation changes (t2–t1) were calculated as Cohen's d (based on sample size; Hedges' adjustment and weighted average).
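The effect sizes can be approximated with the conventional formula for Hedges' g: the mean difference divided by the pooled standard deviation weighted by sample size, multiplied by the small-sample correction J = 1 - 3/(4(n1 + n2) - 9). The Python sketch below assumes this standard formulation, which may differ in detail from the procedure the authors used, and the activation-change values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def hedges_g(intervention: Sequence[float], control: Sequence[float]) -> float:
    """Bias-corrected standardized mean difference (Hedges' g).

    Inputs are per-participant activation changes (t2 - t1 beta estimates).
    """
    n1, n2 = len(intervention), len(control)
    s1, s2 = stdev(intervention), stdev(control)
    # Pooled SD: variances weighted by their degrees of freedom.
    pooled_sd = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    d = (mean(intervention) - mean(control)) / pooled_sd
    correction = 1 - 3 / (4 * (n1 + n2) - 9)  # Hedges' small-sample adjustment
    return d * correction


# Hypothetical beta changes: a decrease in a training group, none in controls.
training = [-0.6, -0.4, -0.5, -0.3, -0.7]
controls = [0.0, 0.1, -0.1, 0.2, -0.2]
print(round(hedges_g(training, controls), 2))
```

A negative value indicates a larger activation decrease in the intervention group than in the control group, matching the sign convention of the effect sizes reported in the abstract.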

#### Further Statistical Analysis

Statistical analyses were performed using SPSS for Windows version 20 (IBM Corp., Armonk, NY, USA). From the regions showing a significant INTERVENTION × SESSION × CONDITION interaction effect, we selected those that had also been related to fitness in our previous cross-sectional study (Godde and Voelcker-Rehage, 2010). BOLD values and beta estimates of the individual peak voxels in these regions were extracted and subjected to linear regression analysis with the following regressors: group (experimental or control, dummy coded as 1 or −1, respectively; because the two intervention groups did not differ in their effect on brain activation change in these regions, we combined them in this analysis), the interaction term of group and initial motor status at t1, and the interaction term of group and change in motor status from t1 to t2 (**Table 2**). For that purpose, motor status indices at t1 and t2 were z-transformed; t2 values were transformed relative to t1, and change in motor status was defined as the difference t2−t1 of these z-transformed indices. For calculating the interaction terms with the factor group, both indices were centered. The level of significance was set to p < 0.05.
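The regressor coding and the overall model test can be sketched in Python. The sketch assumes that "transformed relative to t1" means scaling the t2 motor index with the t1 mean and SD, and that centering means subtracting the sample mean; the function names and the four-participant data set are hypothetical. The helper `f_from_r2` uses the standard identity F = (R²/k) / ((1 - R²)/(n - k - 1)); with R² = 0.243, k = 3 regressors and n = 43 it gives F ≈ 4.17, consistent with the F(3,39) = 4.18 reported in the abstract once rounding of R² is taken into account.

```python
from statistics import mean, stdev
from typing import List, Tuple


def regressors(group: List[int], motor_t1: List[float],
               motor_t2: List[float]) -> List[Tuple[float, float, float]]:
    """Rows of (group, group x centered t1 status, group x centered change).

    `group` is +1 for intervention and -1 for control participants.
    """
    m1, s1 = mean(motor_t1), stdev(motor_t1)
    z1 = [(v - m1) / s1 for v in motor_t1]
    z2 = [(v - m1) / s1 for v in motor_t2]  # t2 scaled with t1 mean/SD (assumption)
    change = [b - a for a, b in zip(z1, z2)]
    c1 = [v - mean(z1) for v in z1]  # centered initial status
    cc = [v - mean(change) for v in change]  # centered change
    return [(float(g), g * a, g * b) for g, a, b in zip(group, c1, cc)]


def f_from_r2(r2: float, n: int, k: int) -> float:
    """Overall F statistic of a linear regression with k predictors and n cases."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


rows = regressors([1, 1, -1, -1], [10.0, 12.0, 11.0, 13.0], [11.0, 14.0, 11.0, 13.5])
print(rows[0])
print(round(f_from_r2(0.243, n=43, k=3), 2))  # prints 4.17
```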

# RESULTS

To answer our research questions, fMRI data obtained during motor imagery were analyzed in a two-step procedure. First, we identified regions that showed significant INTERVENTION × SESSION × CONDITION interaction effects. This interaction effect was found in a variety of frontal, parietal, and subcortical brain regions. Frontally, these regions included the right DLPFC and middle frontal cortex; bilaterally, the superior and medial frontal gyrus (MeFG) and the precentral gyrus (PrCG); and the left anterior cingulate. Further, the postcentral gyrus (PoCG) and the left caudate showed such interaction effects (**Table 3**).

In the second step, from those regions, we selected only regions that had also been activated more strongly in less- than in higher-motor-fit participants in our previous cross-sectional study (Godde and Voelcker-Rehage, 2010), thus indicating an increased need for cognitive control of motor imagery in low-fit older adults. Only the right DLPFC (Brodmann area 9) met this second criterion.

TABLE 3 | Regions of interest (ROI) with significant SESSION × INTERVENTION × CONDITION interaction effects (P < 0.05, cluster threshold: 37 voxels).

Listed are the anatomical descriptions (DLPFC, dorsolateral prefrontal cortex; SFG, superior frontal gyrus; MFG, middle frontal gyrus; MeFG, medial frontal gyrus; PrCG, precentral gyrus; ACC, anterior cingulate cortex; PoCG, postcentral gyrus) and Brodmann areas (BA), numbers of voxels (n Vox), Talairach coordinates (Tal X, Tal Y, Tal Z), and F and p-values for the peaks of the respective ROI. The first ROI is marked in gray as it is the only ROI that also revealed significant effects of motor fitness status on brain activation during walking imagery at baseline (Godde and Voelcker-Rehage, 2010).

Follow-up analyses revealed significant reductions in right DLPFC activation from t1 to t2 for both intervention groups, compared to the control group, for imagery of walking forward. Activation for backward walking was also reduced in both intervention groups, but not significantly. Interestingly, for counting backward from 100, DLPFC activation increased in the walking group but not in the coordination group relative to the control group (**Figure 1**). A direct comparison of the two intervention groups revealed no differences in effect size (pairwise two-tailed paired samples t-test, p > 0.13).

FIGURE 1 | Effect sizes for group differences in activation change (change in beta estimates) in the right dorsolateral prefrontal cortex (DLPFC; left panel). Effect sizes of intervention groups (coordination and walking group) relative to the control group were calculated as Cohen's d (based on sample size; Hedges' adjustment and weighted average). Stars indicate significant effects of the intervention groups.

FIGURE 2 | Activation change in the right DLPFC as a function of baseline motor fitness, as indicated by the motor index at t1. Data were centered and z-transformed. Since we did not find differential effects of the two intervention types (walking and coordination), data from both groups were pooled and compared to the control group. Participants with a low motor index at t1 in particular showed the strongest reduction in DLPFC activation after the intervention (negative change values).

Regression analysis with initial motor status at t1 and change in motor status from t1 to t2 as regressors revealed a significant positive association between initial motor status and activation change in the right DLPFC for the pooled intervention groups. The overall linear regression model was significant (R<sup>2</sup> = 0.243, F(3,39) = 4.18, p = 0.012). Besides the factor group (standardized beta coefficient = −0.39, T = −2.76, p = 0.009), the interaction of group and baseline motor index (standardized beta coefficient = 0.33, T = 2.30, p = 0.027) emerged as a significant predictor of change in right DLPFC activation. As illustrated in **Figure 2**, participants with low motor status at t1 profited most from the intervention. Because of the small sample size, however, these results must be interpreted with caution.
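As a quick consistency check on the reported model statistics, the overall F statistic of a linear regression can be recovered from R² via F = (R²/k) / ((1 − R²)/(n − k − 1)). With k = 3 regressors (group and the two interaction terms) and 39 residual degrees of freedom, this implies n = 43; the small difference from the reported F = 4.18 is consistent with rounding of R². The sketch below is illustrative only and is not part of the original analysis.

```python
def f_from_r2(r2, k, df_resid):
    """Overall F statistic of an OLS model with k predictors (plus intercept)."""
    return (r2 / k) / ((1 - r2) / df_resid)


k, df_resid = 3, 39
n = df_resid + k + 1  # sample size implied by F(3, 39)
f_stat = f_from_r2(0.243, k, df_resid)
print(n, round(f_stat, 2))
```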

#### DISCUSSION

Our study addressed the question of whether physical training interventions are effective in reducing the need for cognitive control of locomotion in older adults. Results confirm that both walking and coordination training reduced frontal brain activation during imagery of walking forward (and, though not significantly, backward). Moreover, participants with lower baseline motor status profited most from the intervention. Our data suggest that physical interventions not only have direct effects on cognitive and brain function in older adults, as reported earlier (Colcombe and Kramer, 2003; Colcombe et al., 2004; Hillman et al., 2008; Lustig et al., 2009; Voelcker-Rehage et al., 2011; Hayes et al., 2013; Voelcker-Rehage and Niemann, 2013; Schättin et al., 2016), but also indirect effects by freeing up frontal brain resources otherwise needed for the control of motor actions. Our findings are thus also in line with a recent near-infrared spectroscopy study revealing that video game dancing training and balance training reduce left and right PFC oxygenation during fast walking (Eggenberger et al., 2016). With decreasing reserve capacity in older adults, these effects may become increasingly important, especially in so-called dual-task situations in which subjects have to perform motor and cognitive tasks concurrently, for example, when crossing a street while observing the traffic flow or walking while talking (Lindenberger et al., 2000; Yogev-Seligmann et al., 2010; Al-Yahya et al., 2011; Neider et al., 2011).

We found INTERVENTION × SESSION × CONDITION interaction effects for a variety of frontal cortical areas belonging to the motor imagery network described earlier (Allali et al., 2014; Hamacher et al., 2015), indicating an altered use of cognitive resources for motor control after the training interventions. Interestingly, these effects were also found for the right, but not the left, DLPFC, which has been shown to be involved in motor imagery in previous studies (e.g., Malouin et al., 2003; Jahn et al., 2004), particularly in older adults (Allali et al., 2014). These studies, however, did not consider the motor fitness status of the participants. When motor fitness status was considered, as in our previous cross-sectional study (Godde and Voelcker-Rehage, 2010), more activity in the right DLPFC was revealed in low-fit as compared to high-fit older adults. We interpreted this finding as follows: the control condition (standing still) also requires some attentional control, and thus the right DLPFC activity seen particularly in low-fit participants mirrors only additional activation that can be interpreted as compensatory. Such additional activity in homologous contralateral frontal areas in older adults has repeatedly been linked to compensatory mechanisms for age-related changes (e.g., the "Hemispheric asymmetry reduction in older adults (HAROLD)" hypothesis; Cabeza, 2002). However, it is not possible to measure motor performance using a motor imagery paradigm, and it must therefore remain open whether this additional frontal activation reflects compensation, dedifferentiation or simply the higher task complexity for low- as compared to high-fit participants. Increased involvement of the prefrontal cortex in older adults during complex gait tasks or in imagined walking conditions with high cognitive load has also been confirmed by recent reviews (Holtzer et al., 2014; Hamacher et al., 2015).

Interestingly, the two interventions did not differ from each other (although both differed from the control group) with respect to their effects on cognitive resources allocated in the DLPFC for the control of walking movements. One explanation might be that the effect in the walking group, like that in the coordination group, was due to better motor control abilities rather than enhanced cardiovascular fitness, i.e., extensive walking experience may have made walking more automated (Ross et al., 2003; Wei and Luo, 2010). This is supported by the finding that effects were stronger for imagery of walking forward, which was specifically trained in the walking group, than for walking backward, which is the more complex task. Indeed, recent MRI studies revealed motor training-induced gray and white matter changes in motor-related areas such as the supplementary and pre-supplementary motor area (SMA/pre-SMA), as well as increased functional connectivity to prefrontal and parietal brain regions, even in older adults (for review see Taubert et al., 2012). Further, coordinative exercise as applied here leads to increased basal ganglia volume in older adults (Niemann et al., 2014).

An increase in cardiovascular fitness might additionally have had some positive effects on cognitive processing through more efficient use of frontal brain resources (Voelcker-Rehage et al., 2011). With the paradigms tested here, however, this does not seem to play a role. This might be different under dual-task conditions, but must remain speculative here.

We aimed to ensure clarity and intensity of motor imagery during scanning. For that purpose, we applied only the MIQ-R. A chronometric test, in which the times needed to complete real walking and imagined walking are compared for the two conditions, could have provided supporting evidence for better forward walking abilities in the walking group.

The motor fitness status of the control group improved greatly from t1 to t2, even more so than that of the cardiovascular training group (**Table 2**), and one might wonder why this improvement in actual motor fitness was not reflected by a change in brain activity (specifically in the right DLPFC) for imagined movements in the control group as well. It could well be that stretching and relaxation, as exercised in the control group, improve proprioceptive function and self-perception. However, the control group did not explicitly train walking or actively controlled movements. This might explain why they generally performed better in the motor test battery but did not reveal activation changes related to specific motor control in frontal brain regions.

The reader might also wonder about the positive effect sizes for counting (**Figure 1**), which seem to indicate increased activation after the intervention and thus a greater need for cognitive resources. However, all effects were calculated relative to the standing condition as baseline. Thus, this positive effect for counting might be due to a negative effect for standing (though much smaller than for walking backward or, even more so, walking forward).

Further experiments using electroencephalography or near-infrared spectroscopy during real movements might provide additional evidence that frontal brain resources used for cognitive control can be reduced by specific motor training in older adults (Eggenberger et al., 2016; Schättin et al., 2016). Overall, physical activity that stresses the motor system (here either regular walking or coordination training) might be beneficial not only for preserving or enhancing cognitive resources (see Voelcker-Rehage et al., 2011) but also for preserving motor functioning, at least in the practiced tasks (here walking forward). This would reduce the cognitive resources needed to perform a motor task and leave resources available for complex situations of daily life.

Based on our results, it is difficult to favor one intervention (walking vs. coordination training) over the other, and it might be advisable to combine both exercise dimensions in training programs for older adults.

# AUTHOR CONTRIBUTIONS

BG and CVR designed and performed the study and acquired, analyzed and interpreted the data. They together drafted the manuscript and agreed to be accountable for all aspects of the work in terms of accuracy and integrity.

#### REFERENCES


#### ACKNOWLEDGMENTS

Our work was supported by the Robert Bosch Foundation (12.5.1366.0005.0) and the German health insurance company DAK. We thank Peter Erhard, Ekkehard Küstermann and Melanie Löbe (Center for Advanced Imaging, University of Bremen) for support with functional MR imaging.


tasks: a PET study. Hum. Brain Mapp. 19, 47–62. doi: 10.1002/hbm.10103


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Godde and Voelcker-Rehage. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Acute Exercise Improves Motor Memory Consolidation in Preadolescent Children

Jesper Lundbye-Jensen<sup>1,2,3</sup>\*, Kasper Skriver<sup>1,2,3</sup>, Jens B. Nielsen<sup>2</sup> and Marc Roig<sup>4,5</sup>

<sup>1</sup>Department of Nutrition, Exercise and Sports, University of Copenhagen, Copenhagen, Denmark, <sup>2</sup>Department of Neuroscience and Pharmacology, University of Copenhagen, Copenhagen, Denmark, <sup>3</sup>Copenhagen Centre for Team Sport and Health, University of Copenhagen, Copenhagen, Denmark, <sup>4</sup>Memory and Motor Rehabilitation Laboratory (MEMORY-LAB), Feil and Oberfeld Research Centre, Jewish Rehabilitation Hospital, Montreal Center for Interdisciplinary Research in Rehabilitation (CRIR), Laval, QC, Canada, <sup>5</sup>School of Physical and Occupational Therapy, Faculty of Medicine, McGill University, Montreal, QC, Canada

Objective: The ability to acquire new motor skills is essential both during childhood and later in life. Recent studies have demonstrated that an acute bout of exercise can improve motor memory consolidation in adults. The objective of the present study was to investigate whether acute exercise protocols following motor skill practice in a school setting can also improve long-term retention of motor memory in preadolescent children.

Methods: Seventy-seven preadolescent children (age 10.5 ± 0.75 years, mean ± SD) participated in the study. Prior to the main experiment, age, BMI, fitness status and general physical activity level were assessed in all children, who were then randomly allocated to three groups. All children practiced a visuomotor tracking task followed by 20 min of rest (CON), high-intensity intermittent floorball (FLB) or running (RUN), with comparable exercise intensity and duration for the two exercise groups. Delayed retention of motor memory was assessed 1 h, 24 h and 7 days after motor skill acquisition.

#### Edited by:

Claudia Voelcker-Rehage, Technische Universität Chemnitz, Germany

#### Reviewed by:

Keita Kamijo, Waseda University, Japan; David L. Wright, Texas A&M University, USA; Darla M. Castelli, University of Texas at Austin, USA

\*Correspondence:

Jesper Lundbye-Jensen jlundbye@nexs.ku.dk

Received: 15 September 2016 Accepted: 28 March 2017 Published: 20 April 2017

#### Citation:

Lundbye-Jensen J, Skriver K, Nielsen JB and Roig M (2017) Acute Exercise Improves Motor Memory Consolidation in Preadolescent Children. Front. Hum. Neurosci. 11:182. doi: 10.3389/fnhum.2017.00182

Results: During skill acquisition, motor performance improved significantly from baseline to the immediate retention test, with no differences between groups. One hour after skill acquisition, motor performance had decreased significantly for RUN. Twenty-four hours after skill acquisition, there was a tendency towards improved performance for FLB, but no significant effects. Seven days after motor practice, however, both FLB and RUN performed better than in their immediate retention test, indicating significant offline gains. This effect was not observed for CON. Moreover, 7 days after motor practice, retention of motor memory was significantly better for FLB and RUN than for CON. No differences were observed when comparing FLB and RUN.

Conclusions: Acute intense intermittent exercise performed immediately after motor skill acquisition facilitates long-term motor memory in preadolescent children, presumably by promoting memory consolidation. The results also demonstrate that these effects can be achieved in a school setting. The positive effects of both a team game (i.e., FLB) and running indicate that the observed memory improvements are determined more by physiological factors than by the types of movement performed during the exercise protocol.

Keywords: motor memory, consolidation, retention, exercise, learning, children

# INTRODUCTION

In recent years, there has been an increasing focus on the potential effects of exercise on health, cognitive functions and learning. The quality of public education has rarely been more debated than it is currently, with the discussion focusing on measures that can optimize the academic performance of school children (Danish Ministry of Education, 2014). One of the means by which authorities are presently looking to facilitate academic performance in school children is by incorporating more physical activity in and outside physical education (PE; Alvang, 2010). This initiative is supported by research indicating a positive relation between academic achievement and both high levels of aerobic fitness (Åberg et al., 2009; Lambourne et al., 2013) and participation in vigorous physical activity (Coe et al., 2006), even when time is taken from the normal curriculum and dedicated to exercise (Sallis et al., 1999; Ahamed et al., 2007).

These findings are part of a larger body of research showing that exercise can be beneficial for a variety of brain functions (for reviews see Hillman et al., 2008; Taubert et al., 2015). Both chronic and acute exercise can have beneficial effects on cognitive functions (Hillman et al., 2009, 2014) and facilitate the formation and retention of several types of memory (Roig et al., 2013). In children, chronic exercise is associated with significant memory-related benefits such as better academic performance (Coe et al., 2006; Singh et al., 2012; Booth et al., 2014) and cognitive function (Sibley and Etnier, 2003; Hillman et al., 2014). Similarly, studies investigating the effects of an acute bout of exercise have documented positive effects on cognitive functions, including memory, in adults (Winter et al., 2007; Kamijo et al., 2009; Labban and Etnier, 2011) and children (Hillman et al., 2009; Pesce et al., 2009; Ellemberg and St-Louis-Deschênes, 2010). Additionally, a meta-analysis found acute exercise to have small but consistent effects on simultaneous or subsequent performance of a memory-related task (Chang et al., 2012). While both chronic and acute exercise can benefit memory functions, and their effects may be interrelated, the distinction is important since the underlying mechanisms may differ and the effects may relate to different aspects of memory.

Memory can crudely be divided into declarative and nondeclarative memory, and the educational system places major emphasis on the formation and retention of declarative memory, e.g., geographical knowledge or recall of names (Squire, 1992). Declarative memory has also been the main focus of most of the studies mentioned in the previous paragraph. Acquisition and retention of skills do, however, also play a major role both in the educational curriculum and in the lives of children and adults in general, and skill learning requires the formation and retention of procedural or motor memory. It is therefore also relevant to investigate how exercise may influence motor memory functions.

Recent studies have demonstrated that motor skills in preadolescent children are related to objective measures of cognitive functions and academic performance (Geertsen et al., 2016) and to academic achievement in adolescence (Kantomaa et al., 2013), and that daily PE and an increased focus on motor skill training during compulsory school years can improve both motor skills and academic performance in adolescence (Ericsson and Karlsson, 2014). Finally, since motor (or procedural) memory is not only associated with motor skills but also supports language (Ullman, 2004) and certain social skills (Lieberman, 2000), motor skills may additionally subserve academic learning. Since motor skills are important in everyday life, and we need to acquire and retain a multitude of skills throughout life, these findings justify an increased focus on the principles and mechanisms involved in motor skill learning and motor memory in children.

Considering the effects of acute exercise on skill learning and motor memory, Roig et al. (2012) demonstrated in adults that an acute bout of intense exercise can facilitate long-term memory of a novel motor skill when performed either before or after initial motor practice. The study also revealed that the effect was larger when exercise was performed after practice, indicating that exercise benefits the consolidation processes subserving retention of motor memory. Recent studies in adults have also demonstrated that motor memory consolidation is influenced by the timing and intensity of exercise following skill learning (Thomas et al., 2016a,c). Thus, an acute bout of high-intensity exercise performed in close temporal proximity to skill acquisition, i.e., in the early consolidation phase, is preferable for influencing motor memory consolidation positively. However, since these studies were conducted in a lab setting with able-bodied adult male subjects, it remains unknown whether acute exercise can benefit motor memory consolidation in preadolescent children. This question is important given the potential for promoting skill learning in children, and for elucidating the neuroplastic mechanisms underlying memory functions in children. Furthermore, it remains unknown whether the type of exercise protocol employed plays a role in eliciting positive effects of exercise on motor memory.

The aim of this study was therefore to investigate whether a single bout of intense exercise performed after motor skill learning promotes long-term motor memory in preadolescent children, a question not addressed in previous studies. Our main hypothesis was that long-term motor memory would be improved in children performing an intense physical activity after skill acquisition, compared to a passive control group. Additionally, we wanted to elucidate whether different types of intense exercise affect long-term memory differentially.

## MATERIALS AND METHODS

#### Participating Children

Seventy-eight ethnically diverse children from 3rd and 4th grade were recruited to participate in the study (see **Table 1**). They were all naïve to the visuomotor accuracy-tracking task (MT) used to assess skill learning and retention of motor memory. Exclusion criteria for participation were a history of neurological or psychiatric diseases and current intake of medications affecting the central nervous system. The legal guardians of all participating children gave informed written consent on behalf of their child prior to participation. One child withdrew consent before completing the experiment. Block randomization was used to assign children to the running (RUN), floorball (FLB) or control (CON) group. The groups were matched on gender, age, BMI and fitness level, since these factors may influence the effect of acute exercise on performance in cognitive tests and possibly on motor performance (Kamijo et al., 2009; Stroth et al., 2009). The study was performed in accordance with the Declaration of Helsinki II. The ethics committee for the Greater Copenhagen area approved the study (protocol: H-2-2012-169).

#### TABLE 1 | Characteristics of the children in the control (CON), floorball (FLB) and running (RUN) groups.

Data are reported as mean ± SD. BMI, body mass index; Cardiovascular fitness is the estimated peak VO<sub>2</sub> consumption from the Yo-Yo test; PAQ-C, Physical Activity Questionnaire for Children (general physical activity level); METs, metabolic equivalents; HR, heart rate. <sup>∗</sup>Denotes significantly different from other groups (p < 0.05). <sup>+</sup>Denotes significantly different from RUN (p < 0.05).

# Study Design

#### Design Overview

The experiment was designed as a randomized controlled trial and included four separate sessions (**Figure 1**). The pre-examination consisted of an aerobic fitness test to assess fitness level. The following session was the main experiment during which the children practiced the motor task. After motor practice the children rested (CON), played FLB or ran (RUN) for 20 min. Retention tests were performed 1 h, 24 h and 7 days after initial practice of the motor task.

#### Pre-Examination

At least 1 week prior to the main experiment, all children completed a Yo-Yo Interval Restitution Children's test to assess their aerobic fitness. This test provides an estimate of maximal oxygen consumption (VO<sub>2peak</sub>; Bendiksen et al., 2013). During the test, heart rate was monitored (Polar Team System, Kempele, Finland).

#### Main Experiment and Retention Tests

Children were tested in groups of four, all of whom were assigned to the same experimental group. They were seated at independent workstations along with an experimenter who followed them throughout the experiment. Initially, children were given a brief description of the experiment. Afterwards, the weight, height and body composition (Innerscan, Tanita, Tokyo, Japan) of each child were measured. Additionally, the children completed an Edinburgh Handedness Questionnaire to assess handedness and the Physical Activity Questionnaire for Children (PAQ-C) to estimate daily physical activity level (Kowalski et al., 2004).

FIGURE 1 | Schematic overview of the experimental design. FB, augmented feedback; CON, control group; FLB, floorball group; RUN, running group.

Following this, the children practiced the MT (see detailed information below). Immediately after motor practice, the children engaged in 20 min of rest (CON), floorball (FLB) or running (RUN). After the interventions, all children rested for 40 min, during which they were allowed to read magazines or watch cartoons.

The motor task employed in this study was a modified version of the visuomotor accuracy tracking task previously applied in other experiments (Roig et al., 2012; Thomas et al., 2016a,b,c). In short, the children were comfortably seated at a table, with their forearm resting on the table. Immediately in front of the child was a computer screen. The children were instructed to control a computer mouse with their preferred hand.

The motor task setup was established using a customized software application (MATLAB R2013b, MathWorks), allowing the child to control the vertical position (y-axis) of a cursor that moved from left to right across the screen at constant speed, taking 8 s per trial. From the cursor's automatic movement along the x-axis and the child's movement of the cursor along the y-axis, a trajectory was created, representing the trace drawn by the cursor during the respective trial.

Children were allowed 1 min of familiarization, during which they could freely move the cursor on an empty screen, i.e., without a target. Following familiarization, actual motor practice (acquisition) began. For each trial, the child was required to match the cursor as accurately as possible to a preset target (Roig et al., 2012). Eight different targets with different but predictable trajectories were displayed in random order (see Thomas et al., 2016b). All eight target trajectories started and ended at the target mean, positioned in the middle of the visual display, and all targets required the child to move the cursor both upwards and downwards from the target mean.

For each trial, motor performance was calculated based on the root mean square error (RMSE) between the preset target and the trace produced by the child. To provide intuitive augmented feedback on motor performance to the children, performance was transformed to a 0–100 score. The score was defined as the mean of absolute vertical errors between the cursor and target in relation to the target mean. If the error exceeded two times the distance from the target to the target mean, the score for this data point was set to zero. An RMSE of 0 corresponded to a score of 100 (see Thomas et al., 2016b).
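The scoring rule above leaves some details open. The following Python sketch shows one possible reading under stated assumptions: target and cursor are sampled at the same time points; the per-sample score falls linearly from 100 (no error) to 0 when the error reaches twice the target's distance from the target mean, and stays at 0 beyond that; and samples lying exactly on the target mean are skipped to avoid division by zero. The function names are hypothetical, and this is not the authors' MATLAB implementation.

```python
import math


def rmse(target, trace):
    """Root mean square error between equally sampled target and cursor traces."""
    if len(target) != len(trace) or not target:
        raise ValueError("traces must be non-empty and of equal length")
    return math.sqrt(sum((t - c) ** 2 for t, c in zip(target, trace)) / len(target))


def trial_score(target, trace, target_mean):
    """Hypothetical 0-100 score averaged over samples.

    Each sample scores 100 * (1 - |error| / (2 * |target - target_mean|)),
    floored at 0, so an error of twice the distance or more scores zero.
    """
    points = []
    for t, c in zip(target, trace):
        span = 2 * abs(t - target_mean)
        if span == 0:
            continue  # assumption: samples on the target mean are skipped
        points.append(max(0.0, 100 * (1 - abs(c - t) / span)))
    return sum(points) / len(points) if points else 0.0


target = [0, 1, 2, 1, 0]   # toy target trajectory around a target mean of 0
trace = [0, 2, 2, 1, 0]    # toy cursor trace with one erroneous sample
print(rmse(target, trace), trial_score(target, trace, 0))
```

Under this reading, a perfect trace (RMSE of 0) scores exactly 100, matching the description above.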

The motor task was performed in blocks of 24 trials, arranged so that each block contained three rounds of the eight different targets. Each trial was followed by a 2 s pause during which augmented feedback on motor performance could be provided. To promote learning, the following feedback was provided: (1) knowledge of results (KR), presented as the score ranging from 0 to 100; (2) knowledge of performance, represented by a picture displaying the target with the child's trace superimposed; (3) since the difficulty, and thereby the scores, of individual traces varied, the child was provided with the average KR for every full eight-target cycle completed as a measure of average performance.
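The block structure can be illustrated with a short Python sketch. We assume, since the text does not specify it, that each of the three rounds presents the eight targets as an independent random permutation; the helper names are hypothetical. The second function computes the cycle-averaged knowledge of results described in item (3).

```python
import random

N_TARGETS = 8          # distinct target trajectories
ROUNDS_PER_BLOCK = 3   # 3 rounds x 8 targets = 24 trials per block


def block_order(rng):
    """Trial order for one block: each round is a fresh random permutation of the targets."""
    order = []
    for _ in range(ROUNDS_PER_BLOCK):
        round_targets = list(range(N_TARGETS))
        rng.shuffle(round_targets)
        order.extend(round_targets)
    return order


def cycle_averages(scores):
    """Mean KR score for every completed eight-target cycle (partial cycles ignored)."""
    return [sum(scores[i:i + N_TARGETS]) / N_TARGETS
            for i in range(0, len(scores) - N_TARGETS + 1, N_TARGETS)]


order = block_order(random.Random(2013))  # seeded only for reproducibility
print(order)
print(cycle_averages(list(range(24))))
```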

Augmented feedback was omitted from baseline and retention blocks to test memory and to minimize the effects of feedback dependency and (re)learning (Salmoni et al., 1984). The children performed one block of baseline MT performance (A), followed by three blocks of acquisition (B1–B3). A 1-min break was allowed between adjoining blocks. Subsequently, the children completed four blocks of MT to assess retention: immediately after (C), 1 h (D), 24 h (E) and 7 days (F) after motor practice, respectively (see **Figure 1**). One additional block of motor practice (block G) with augmented feedback was performed after the retention test on day seven to assess potential ceiling effects in motor performance.
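For reference, the sequence of test and practice blocks can be written down as a small Python data structure. It makes explicit why nine data points per child result (blocks A to G, counting B1 to B3 separately) and which blocks carried augmented feedback. The class and field names are illustrative only.

```python
from dataclasses import dataclass

TRIALS_PER_BLOCK = 24


@dataclass(frozen=True)
class Block:
    label: str
    phase: str
    timing: str      # relative to the end of motor practice
    feedback: bool   # augmented feedback shown after each trial


SCHEDULE = [
    Block("A", "baseline", "before practice", False),
    Block("B1", "acquisition", "practice", True),
    Block("B2", "acquisition", "practice", True),
    Block("B3", "acquisition", "practice", True),
    Block("C", "retention", "immediately after", False),
    Block("D", "retention", "1 h", False),
    Block("E", "retention", "24 h", False),
    Block("F", "retention", "7 days", False),
    Block("G", "practice (ceiling check)", "7 days, after F", True),
]

feedback_blocks = [b.label for b in SCHEDULE if b.feedback]
print(len(SCHEDULE), feedback_blocks)
```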

#### Intervention Protocols

The intermittent exercise protocol used in both RUN and FLB was based on the exercise intervention applied in Roig et al. (2012). The intensity of the physical activity was relatively high, since some studies, including our own (Thomas et al., 2016c), have indicated that high work intensity during consolidation leads to larger effects of the intervention (Angevaren et al., 2007; Winter et al., 2007).

Both FLB and RUN are activities frequently employed in school settings and were thus chosen for their ecological validity. While the physiological requirements of running can in some respects be matched to those of FLB (e.g., average heart rate), FLB additionally involves more complex motor skills, decision making, teamwork and competition. The three-group design thus included a passive control group; if the results demonstrated differential effects of FLB and RUN, the design would suggest that such differences were likely related to the additional requirements of FLB in the aforementioned domains.

Children in FLB played indoor floorball on a court measuring 6.6 × 14 m. The two teams each consisted of two children and one adult experimenter. The experimenters participated in the game to ensure that the flow and intensity of the game were maintained throughout the activity. The intensity of the game varied in the same intermittent pattern for all children in FLB: 2 min of instructions, 3 min of low-intensity warm-up, 3 min of high intensity, 2 min of low intensity, 3 min of high intensity, 2 min of low intensity, 3 min of high intensity and 2 min of low intensity. Consequently, the children performed a total of 9 min of high-intensity exercise. The running exercise took place on an indoor rectangular track measuring 6.6 × 14 m. The children in RUN were required to exercise following the same intermittent protocol as described for FLB, thus totaling 9 min of high-intensity running.
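The intermittent protocol can be checked with a few lines of Python: encoding the segments as (intensity, minutes) pairs shows that they sum to the 20-min intervention period and yield 9 min of high-intensity exercise. The encoding and helper names are illustrative; treating the warm-up as a low-intensity segment follows the description above.

```python
PROTOCOL = [
    ("instructions", 2),
    ("low", 3),    # low-intensity warm-up
    ("high", 3),
    ("low", 2),
    ("high", 3),
    ("low", 2),
    ("high", 3),
    ("low", 2),
]


def minutes_at(intensity):
    """Total minutes spent in segments of the given intensity."""
    return sum(minutes for level, minutes in PROTOCOL if level == intensity)


def timeline():
    """(start_min, end_min, intensity) for each segment of the protocol."""
    start, segments = 0, []
    for level, minutes in PROTOCOL:
        segments.append((start, start + minutes, level))
        start += minutes
    return segments


total = sum(minutes for _, minutes in PROTOCOL)
print(total, minutes_at("high"))
for segment in timeline():
    print(segment)
```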

While the children performed high-intensity exercise, their heart rates were monitored online and stored for offline analysis (Polar Team 2 System, Polar, Finland). This was also the case in three pilot experiments preceding the main experiment, whose purpose was to determine the heart rates of children following the exercise protocol by playing FLB. During the main experiment, the aim was to reach heart rates similar to those in the pilot experiments. This was ensured by experimenters monitoring heart rates online and verbally encouraging the children in both exercise groups. The children in the CON group rested, seated comfortably, with the opportunity to watch cartoons for 20 min.

## Data Analysis

#### Children's Characteristics and Exercise Data

Average heart rate (HRavg) and peak heart rate (HRpeak) were determined for each of the three high-intensity intervals. HRpeak represented the highest heart rate observed during the exercise protocol, and HRavg the average of the peak heart rates observed in the three high-intensity intervals. Children's characteristics were compared between experimental groups using one-way analysis of variance (ANOVA). A significant difference between groups was assumed if p < 0.05.
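The two heart rate measures can be expressed in a few lines. This is a sketch under the assumption that only the samples from the three high-intensity intervals are supplied, so the overall highest value is taken from those intervals; the function name and sample values are hypothetical, not study data.

```python
from statistics import mean


def heart_rate_summary(interval_samples):
    """Return (HRpeak, HRavg) from heart-rate samples in beats/min.

    `interval_samples` holds one list of samples per high-intensity interval.
    HRpeak: highest value observed; HRavg: mean of the per-interval peaks.
    """
    interval_peaks = [max(samples) for samples in interval_samples]
    return max(interval_peaks), mean(interval_peaks)


# Hypothetical samples for one child over the three high-intensity intervals.
hr_samples = [
    [150, 172, 185],
    [160, 181, 190],
    [158, 179, 188],
]
hr_peak, hr_avg = heart_rate_summary(hr_samples)
print(hr_peak, round(hr_avg, 1))  # 190 187.7
```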

#### Motor Learning and Memory

Motor performance scores obtained in each 24-trial block were averaged, providing a total of nine data points for each child (blocks A to G). Motor learning and memory were assessed by measuring acquisition (blocks A to C) and retention (blocks C to G) separately. A one-way ANOVA was applied to test for differences between groups at baseline (block A), to rule out the possibility that the groups differed in baseline motor skill performance. Motor learning was analyzed with a two-way repeated measures (RM) ANOVA with TIME (blocks A to C) and GROUP as factors. Motor memory was analyzed with a two-way RM ANOVA with TIME (blocks C to G) and GROUP as factors.
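The baseline check is a standard one-way between-groups ANOVA. The sketch below computes the F statistic and its degrees of freedom (k − 1 between groups, N − k within groups, which is how values such as F(2,72) are reported). It is a pure-Python illustration with hypothetical scores; the authors used SPSS, and in Python `scipy.stats.f_oneway` returns both F and the p-value.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """F statistic and degrees of freedom for a one-way (between-groups) ANOVA."""
    values = [x for g in groups for x in g]
    k, n = len(groups), len(values)
    grand_mean = mean(values)
    # Between-group sum of squares: group size times squared deviation of the group mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: deviations of scores from their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical block A scores for three small groups; not study data.
flb, run, con = [1, 2, 3], [2, 3, 4], [3, 4, 5]
print(one_way_anova_f(flb, run, con))  # (3.0, 2, 6)
```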

In the second ANOVA model, the motor performance scores in the delayed retention tests (i.e., retention at 1 h, 24 h and 7 days) were normalized to the scores at immediate retention (block C). This was done to ensure that the analysis of motor memory accounted for differences among groups in skill performance at the end of acquisition.
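The normalization step can be sketched as a per-child division by the block C score. Expressing the result as a percentage of block C is an assumption about the convention (the text only says "normalized"); the function name and the scores are hypothetical.

```python
DELAYED_BLOCKS = ("D", "E", "F", "G")


def normalize_to_block_c(scores):
    """Express delayed-retention scores as a percentage of the block C score.

    `scores` maps block labels to one child's mean score per block.
    Reporting in percent is an assumed convention, not taken from the source.
    """
    reference = scores["C"]
    if reference == 0:
        raise ValueError("block C score must be non-zero")
    return {block: 100.0 * scores[block] / reference for block in DELAYED_BLOCKS}


# Hypothetical scores for one child; not study data.
child = {"C": 60.0, "D": 57.0, "E": 61.5, "F": 63.0, "G": 66.0}
print(normalize_to_block_c(child))
# {'D': 95.0, 'E': 102.5, 'F': 105.0, 'G': 110.0}
```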

If sphericity was violated when applying an ANOVA (as determined using Mauchly's test), the Greenhouse-Geisser correction was used. Furthermore, if a significant main or interaction effect was established in any ANOVA model, post hoc pairwise comparisons were performed using Student's t-test. To reduce the risk of type I errors, the α-level for the post hoc tests was adjusted to p ≤ 0.017 using a modified Bonferroni correction (0.05/3, i.e., 0.05 divided by the number of comparisons within blocks; Roig et al., 2012). In addition, to determine whether statistically significant offline effects on memory could be documented within each experimental group, paired t-tests were carried out comparing block C with blocks D, E, F and G, respectively. To reduce the risk of type I errors, the α-level was here adjusted to p ≤ 0.0125 (i.e., 0.05 divided by the number of comparisons within groups; Roig et al., 2012). To further investigate the time course of potential offline effects on motor skill retention, a time-weighted regression analysis was performed within single subjects over the follow-up period, including data from the immediate retention test and delayed retention at 24 h and 7 days (blocks C, E and F). A time-weighted slope measure was extracted for each individual and entered into a one-way ANOVA to test for differences in offline mechanisms between groups. This procedure has previously been applied by Reis et al. (2009) to assess retention effects.
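The text does not spell out how the time-weighted slope was computed. One plausible reading, which is an assumption here, is an ordinary least-squares slope of each child's scores against the actual elapsed time since the immediate retention test (0 h for block C, 24 h for block E, 168 h for block F), so that the 7-day test carries its true temporal distance. Per-child slopes obtained this way would then be entered into the one-way ANOVA described above. All names below are hypothetical.

```python
# Assumed elapsed time (hours) since the immediate retention test for each block.
HOURS = {"C": 0.0, "E": 24.0, "F": 168.0}


def ols_slope(times, values):
    """Ordinary least-squares slope of `values` regressed on `times`."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    cov = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    var = sum((t - t_mean) ** 2 for t in times)
    return cov / var


def time_weighted_slope(scores):
    """Slope (score units per hour) across blocks C, E and F for one child."""
    blocks = ("C", "E", "F")
    return ols_slope([HOURS[b] for b in blocks], [scores[b] for b in blocks])


# Hypothetical child whose scores rise by 0.05 points per hour.
print(time_weighted_slope({"C": 60.0, "E": 61.2, "F": 68.4}))
```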

All statistical analyses were performed using IBM SPSS Statistics 22 for PC employing two-tailed probability tests. All p-values for t-tests are reported uncorrected. The results are provided as mean ± SEM unless otherwise reported.

#### RESULTS

#### Description of Children and Groups

The baseline characteristics of the children participating in the three groups are summarized in **Table 1**. There were no significant differences between groups with regard to VO2-peak (F(2,69) = 1.33; p = 0.272), BMI (F(2,74) = 1.56; p = 0.218), body fat percentage (F(2,74) = 0.34; p = 0.717) or PAQ-C score (F(2,74) = 0.33; p = 0.743). There was a difference in age (F(2,74) = 3.46; p = 0.037), with RUN being older than CON (t = 2.33; p = 0.024) and FLB (t = 2.02; p = 0.048). For heart rate measurements obtained during exercise, HRpeak did not differ between exercise groups (F(1,49) = 0.37; p = 0.374). In contrast, HRavg was significantly higher during FLB than during RUN (t = 2.86; p = 0.006).

#### Motor Skill Acquisition

There were no differences in baseline motor performance between groups (F(2,72) = 0.61; p = 0.549; see **Figure 2**). A two-way RM ANOVA assessing motor skill acquisition revealed neither an effect of GROUP (F(2,72) = 0.43; p = 0.654) nor a GROUP-TIME interaction (F(2,72) = 1.15; p = 0.322). There was, however, an effect of TIME on motor performance from baseline to immediate retention across groups (F(1,72) = 314; p < 0.001). These findings suggest similar baseline motor performance and skill acquisition in all groups (see **Figure 2**). The average motor performance score at the immediate retention test following motor practice (59.61 ± 0.92) was significantly higher than the average performance at baseline (43.67 ± 1.11; t = 18.42; p < 0.001).
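The baseline-versus-immediate-retention comparison is a paired t-test on within-child differences. A minimal pure-Python sketch with hypothetical scores follows; the authors used SPSS, and `scipy.stats.ttest_rel` computes the same statistic together with a p-value.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired (dependent-samples) t statistic and degrees of freedom."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    sem = stdev(diffs) / sqrt(n)  # standard error of the mean difference
    return mean(diffs) / sem, n - 1


# Hypothetical baseline (block A) and immediate-retention (block C) scores.
block_a = [40.0, 42.0, 45.0, 44.0]
block_c = [55.0, 60.0, 58.0, 61.0]
t, df = paired_t(block_a, block_c)
print(round(t, 2), df)
```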

#### Delayed Retention

Motor performance scores obtained in delayed retention tests were normalized to immediate retention for the experimental groups to depict relative changes (**Figure 3**). Visual inspection of
these results indicated that the exercise groups performed better in the delayed retention tests compared to CON, and the two-way RM ANOVA showed a statistically significant GROUP-TIME interaction (F(5.972,215) = 3.30; p = 0.004). In addition, there was a main effect of TIME on retention (F(2.986,215) = 15.83; p < 0.001), thus suggesting offline effects. Post hoc tests revealed that FLB performed significantly better than CON in block F (no augmented feedback on task performance) 7 days after motor practice (t = 3.12; p = 0.003). RUN also tended to perform better than CON (t = 2.09; p = 0.041, α–level adjusted p ≤ 0.017). There was no difference between FLB and RUN (t = 1.22; p = 0.228).
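The adjusted threshold can be applied mechanically to the uncorrected p-values reported for block F. The sketch below reproduces that decision with the values given in the text; `bonferroni_alpha` and `classify` are illustrative helper names. Following the Methods, 0.05/3 is rounded to 0.017 for between-group tests, while 0.05/4 = 0.0125 applies to the within-group comparisons.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison alpha under a Bonferroni correction."""
    return alpha / n_comparisons


def classify(p_values, threshold):
    """Map each comparison to True if its uncorrected p-value passes the threshold."""
    return {name: p <= threshold for name, p in p_values.items()}


# Block F post hoc comparisons, p-values as reported in the text (uncorrected).
BLOCK_F = {"FLB vs CON": 0.003, "RUN vs CON": 0.041, "FLB vs RUN": 0.228}

between_groups = round(bonferroni_alpha(0.05, 3), 3)  # 0.017
within_groups = bonferroni_alpha(0.05, 4)             # 0.0125

print(classify(BLOCK_F, between_groups))
# {'FLB vs CON': True, 'RUN vs CON': False, 'FLB vs RUN': False}
```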

Block G was also administered 7 days after motor practice, but here the task was performed with augmented feedback on motor performance to investigate the effects of continued practice. Within-block comparisons of motor performance for block G showed that RUN performed significantly better than CON (t = 2.54; p = 0.014), and FLB still tended to perform better than CON (t = 1.97; p = 0.05, α–level adjusted p ≤ 0.017). There were no differences between FLB and RUN (t = 0.513; p = 0.611). The findings thus demonstrate that 7 days after initial motor practice, both FLB and RUN performed better than CON. Visual inspection of **Figure 3** confirms that there was no ceiling effect for motor performance in the tracking task.

#### Offline Effects within and between Intervention Groups

The assessment of offline effects in the three intervention groups revealed that children in RUN displayed a significant drop in motor performance from the immediate retention test (C) to block D, 1 h after the end of practice and 40 min after exercise (t = 2.76; p = 0.011). This decrease in motor performance was not observed for FLB or CON.

Assessment of offline effects across 24 h and 7 days showed that children in the FLB group displayed an offline gain in motor performance from the immediate retention test (block C) to block F (t = 3.11; p = 0.003) and block G (t = 3.23; p = 0.002) 7 days later. RUN also performed significantly better in block G after 7 days compared to block C (t = 4.0; p = 0.001), while there were no offline gains in motor performance for children in the CON group.

Offline changes in motor skill were also assessed by means of a time-weighted slope measure calculated within single subjects for changes in motor performance between the immediate retention test and delayed retention at 24 h and 7 days (blocks C, E and F; **Figure 4**). The one-way ANOVA revealed a significant effect of GROUP (F(2,75) = 5.0; p = 0.009), and post hoc comparisons revealed that the slope parameter was significantly higher in FLB than in CON (t = 3.32; p = 0.0017; **Figure 4B**). The slope parameter also tended to be higher for RUN than for CON (t = 2.23; p = 0.032, α–level adjusted p ≤ 0.017). There was no difference between FLB and RUN (t = 0.863; p = 0.39). These findings indicate offline gains in motor memory for both exercise groups compared to the control group.

#### DISCUSSION

The present study is to our knowledge the first to investigate: (1) whether a single bout of acute intense intermittent exercise following skill acquisition can improve motor memory consolidation in preadolescent children; and (2) whether different types of acute exercise employed in a school setting are accompanied by differential effects on motor memory and skill learning.

The results demonstrate that an acute bout of intense exercise performed after practicing a novel motor task improves long-term motor memory. In accordance with Roig et al. (2012), we have thus demonstrated that exercise can facilitate retention following skill learning through an effect on motor memory consolidation, also in pre-adolescent children. Furthermore, we investigated the effects of different exercise interventions, including running and FLB, a team-oriented exercise intervention, demonstrating that team sports can be effective in reinforcing the consolidation and retention of long-term motor memory. The finding that long-term motor memory was enhanced in both exercise groups indicates that the type of exercise does not seem to be essential for achieving a memory-facilitating effect. Timing, intensity and duration of exercise are most likely more important parameters for facilitating memory consolidation processes (Angevaren et al., 2007; Chang et al., 2012; Roig et al., 2013; Thomas et al., 2016a,c).

While, to our knowledge, no previous study has assessed the effects of acute exercise on motor memory consolidation in preadolescents, Pesce et al. (2009) demonstrated a positive effect of acute exercise on declarative memory in this group when submaximal exercise preceded memory encoding and retrieval. In this case, exercise may have influenced both declarative memory formation and storage. Furthermore, Pesce et al. (2009) did not assess long-term retention, thereby potentially missing further offline effects of the exercise. Delayed retention tests are highly important for assessing interventional effects on consolidation (Kantak and Winstein, 2012), since the formation of both long-term declarative memory (McGaugh, 2000) and motor memory, as studied here (Brashers-Krug et al., 1996), can persist for many hours after acquisition. In the present study, skill acquisition preceded exercise, and this design, including delayed retention tests, enabled us to demonstrate that the acute bout of exercise enhanced memory through an effect on consolidation processes.

The timing of both the exercise relative to skill acquisition and the delayed retention tests are important in order to elucidate the effects of exercise on motor memory (Statton et al., 2015; Roig et al., 2016; Thomas et al., 2016a). In the current study, exercise followed motor practice with a short delay, which may be important for the positive effect on consolidation (Roig et al., 2016; Thomas et al., 2016a). Indeed, Thomas et al. (2016a) recently demonstrated that the temporal proximity between the exercise bout and motor practice has a positive influence on motor memory consolidation. This may relate to the potential temporal gradient of the consolidation processes following motor practice.

It is noteworthy that significant effects of exercise on motor memory improvements may appear long after the performance of the exercise bout and not immediately after. Indeed in the present study, memory and thus motor performance was significantly enhanced in the exercise groups 7 days after encoding, even though there was a detrimental effect in the running group 1 h after motor practice. Other studies have found non-significant or even detrimental effects of acute exercise on memory when retention tests were employed during or shortly after a moderate to high intensity exercise bout (Roig et al., 2013). This approach is not appropriate to assess effects on memory because the proximity of the retention test to the exercise stimulus can easily mask potential gains in memory due to exercise-induced fatigue and/or arousal, particularly when exercise is performed at a high intensity (Roig et al., 2016). Thus although intense running was accompanied by an acute detrimental effect on motor performance, this did not preclude delayed gains in the RUN group.

The finding of improved motor memory for the exercise groups 7 days after motor practice is consistent with previous studies (Roig et al., 2012; Skriver et al., 2014; Thomas et al., 2016a,c). While previous studies have also found significant effects of exercise 24 h after motor practice, this was only a tendency for FLB in the current study. Both groups did however display offline improvements from immediate retention to the 7-day retention test (see **Figures 3**, **4**). It is possible that the effects of exercise evolve long after synaptic consolidation processes (Dudai, 2012) and that several days are required to see the effects of exercise on memory consolidation. Sleep-dependent processes may also be involved in the exercise-induced improvements in motor memory consolidation and this could contribute to the delayed effects (Dudai, 2012). In addition, the retrieval and motor practice inherent in the 24 h retention test and the following reconsolidation may contribute to the effects on delayed motor memory observed at 7 days.

Although there were no significant differences between the two exercise groups, and both groups displayed offline gains in motor performance to different degrees, it is noteworthy that the two types of exercise appeared to have differential effects. Whereas FLB demonstrated significant offline improvements from immediate retention to 7-day retention and performed better than the control group, this effect was less pronounced for RUN. The running group, however, displayed a marked gain in performance with continued motor practice on day 7. In line with this, Rhee et al. (2016) also recently noted a latent effect of exercise on the development of motor performance with continued practice, and this may represent an alternative way in which exercise-mediated consolidation effects can influence motor memory and the development of motor skills. Thus, different types of acute exercise may differentially promote consolidation and reconsolidation processes and facilitate learning with continued motor practice. In essence, however, both FLB and RUN ultimately led to improvements in motor memory in the current study.

Moderate to vigorous acute exercise preceding encoding has previously been demonstrated to promote declarative memory in children (Pesce et al., 2009) and adults (Winter et al., 2007) as well as motor skill acquisition (Statton et al., 2015). In a recent study by Thomas et al. (2016c) it was found that higher intensity exercise following skill acquisition was accompanied by more pronounced effects on motor memory compared to moderate intensity exercise. In agreement with previous studies in adults (Roig et al., 2012; Skriver et al., 2014) the present results demonstrate that high-intensity exercise following skill learning can promote consolidation of memory in children.

Indeed, children in both exercise groups displayed high heart rates and thus performed the intermittent intervention at a high intensity. The groups differed slightly in age and in heart rate during exercise. The age difference for RUN arose because an unexpected delay caused by national labor disputes led a relatively larger proportion of children in RUN to participate in the main experiment 2 months later than planned. Since the groups had been matched prior to the main experiment, this delay produced the small but significant age difference for RUN. However, since neither age nor HRavg was related to acquisition (age: r = −0.111; p = 0.338; HRavg: r = 0.171; p = 0.229) or retention (age: r = 0.109; p = 0.348; HRavg: r = 0.003; p = 0.99), these differences were considered to be of minor importance for the interpretation and validity of the results.
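The age and HRavg checks above report correlation coefficients; assuming these are Pearson's r (the text does not name the coefficient), the computation can be sketched as follows. The data are hypothetical, and `scipy.stats.pearsonr` would additionally return the p-value.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists.

    Raises ZeroDivisionError if either list is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical ages (years) and acquisition gains for five children; not study data.
ages = [10.1, 10.4, 10.9, 11.2, 11.5]
gains = [15.0, 17.5, 14.0, 16.5, 15.5]
print(round(pearson_r(ages, gains), 3))
```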

While mean HR was high and comparable between exercise groups, FLB and running naturally differ in several respects. Running is a continuous activity, whereas FLB is inherently more intermittent, and although the exercise protocols were matched on time, the intermittent nature of FLB may have taxed anaerobic metabolism differently than running. FLB also involves decision making and more varied movements, including movements of the upper body, in addition to a team element and competition. Based on the design of the present study, it is not possible to determine to what extent these differences may have influenced the results, but further studies may elucidate whether these factors influence motor memory processes.

The finding in the present study that motor memory consolidation and long-term motor memory can be enhanced by both FLB and running is consistent with the findings of Thomas et al. (2016b) in adults. This does not, however, mean that exercise type is irrelevant. The finding that team sports, in addition to running, can boost long-term memory in children is important, since the classic experimental exercise regimes on treadmills or bike ergometers are far from the regular activities that pre-adolescent children experience in school settings, e.g., playing during recess or physical education (PE). The applicability of team sports allows flexibility in selecting a more motivating and manageable activity depending on individual and group preferences.

A recent study has documented that the fitness levels of the younger population are declining (Tomkinson and Olds, 2007), making any effort that stimulates participation in regular physical activity ever more relevant. Furthermore, the level of fundamental movement skills in childhood is correlated with the level of physical activity and with obesity (Morgan et al., 2013), and physical activity can mediate the association between childhood motor functions and adolescents' academic achievements (Kantomaa et al., 2013). Increased focus on motor skills combined with the incorporation of exercise in educational institutions can benefit both skill learning and academic achievement, as demonstrated by Ericsson and Karlsson (2014), which underlines the importance of early-life motor skill training.

Skill learning is important and the mechanisms subserving formation and retention of declarative and nondeclarative memory (Censor et al., 2012) are to a large degree similar. Although memory systems are distinct, recent studies have made it conceivable that declarative and procedural memories can and do indeed interact (Robertson, 2012), and memories are not necessarily confined to independent systems. The functional connection between memory processes may contribute to the observed long-term effects of motor skill training on academic performance (Ericsson and Karlsson, 2014), and it suggests that consolidation of declarative memory might also benefit from acute exercise (Kandel et al., 2014). Caution should however be taken when extrapolating the current results to other types of memory or exercise, and studies assessing the long-term effects of acute exercise on declarative and nondeclarative memory are needed to conclusively document such effects.

The present results demonstrate that acute exercise in temporal proximity to a skill learning session can facilitate memory consolidation following acquisition and thus retention of motor memory. Based on the obtained measurements, it is difficult to infer in detail which mechanisms could be involved in the observed effects of exercise on motor memory. Given that exercise was placed immediately after motor practice, the improvement in long-term motor memory most likely reflects positive effects of exercise on consolidation processes involving neuroplastic changes in the central nervous system (Brashers-Krug et al., 1996; Lundbye-Jensen et al., 2011; Taubert et al., 2015). In a previous study in adults, we found relations between exercise-induced effects on motor memory and changes in the concentrations of specific biomarkers in peripheral blood samples (Skriver et al., 2014). This analysis demonstrated that higher concentrations of brain-derived neurotrophic factor (BDNF), norepinephrine and lactate correlated with better retention of motor memory. Recently, a similar relationship was found between delayed motor memory and changes in corticospinal excitability following exercise (Ostadan et al., 2016). Whether these associations also hold for preadolescent children remains to be elucidated. Future studies could thus investigate the mechanisms underlying the observed effects by applying, e.g., electrophysiological or neuroimaging techniques in children.

In addition to investigating the mechanisms underlying the behavioral effects observed in the current study, future studies could investigate the potential effects of strategically implementing acute exercise in longer interventions thus combining acute and chronic exercise with the aim of influencing e.g., skill learning and motor memory. In addition to the direct transferability of the behavioral findings to school settings, the results of the present study could also be relevant for rehabilitation training with the purpose of improving long-term outcome of the rehabilitative treatment (Vaynman and Gomez-Pinilla, 2005), with acquisition and retention of motor skills being particularly important for physical rehabilitation. However, studies targeting clinical populations with mobility impairments are required before finally making such recommendations.

#### CONCLUSION

The present study is, to our knowledge, the first to demonstrate that acute intense intermittent exercise performed immediately after motor skill acquisition facilitates long-term motor memory in pre-adolescent children, presumably by promoting memory consolidation. The results also demonstrate that the effects can be accomplished in a school setting. The positive effect of exercise both as a team game (i.e., FLB) and as running indicates that the observed memory improvements are determined to a larger extent by physiological factors such as intensity and timing of exercise than by the types of movements performed.

#### AUTHOR CONTRIBUTIONS

JL-J, KS, JBN and MR designed the experiment. JL-J and KS collected the data. JL-J, KS and MR conducted the required data analysis. All authors contributed to drafting the manuscript, and approved the final version of the manuscript.

#### FUNDING

The study was supported by The Ludvig & Sara Elsass Foundation and Nordea-fonden.

#### ACKNOWLEDGMENTS

We would like to thank Lene Norbert Henriksen, Alexander Mogensen, Jonas Roland Knudsen, Caroline Borup Andersen, Cecilie Karlsson, Camilla Linde and Sidse Nikoline Stavad for their assistance with data collection. We would like to thank the school, teachers, and most importantly the children for participating in the study.


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Lundbye-Jensen, Skriver, Nielsen and Roig. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Movement-Related Cortical Potential Amplitude Reduction after Cycling Exercise Relates to the Extent of Neuromuscular Fatigue

Jérôme Nicolas Spring<sup>1</sup>\*, Nicolas Place<sup>2</sup>, Fabio Borrani<sup>3</sup>, Bengt Kayser<sup>2</sup> and Jérôme Barral<sup>1</sup>

<sup>1</sup> Institute of Sport Sciences, Faculty of Social and Political Sciences, University of Lausanne, Lausanne, Switzerland, <sup>2</sup> Institute of Sport Sciences and Department of Physiology, Faculty of Biology and Medicine, University of Lausanne, Lausanne, Switzerland, <sup>3</sup> Institute of Sport Sciences, Faculty of Biology and Medicine, University of Lausanne, Lausanne, Switzerland

#### Edited by:

Claudia Voelcker-Rehage, Technische Universität Chemnitz, Germany

#### Reviewed by:

Marika Berchicci, University of Rome "Foro Italico", Rome, Italy

Sahil Bajaj, The Houston Methodist Research Institute, USA

> \*Correspondence: Jérôme Nicolas Spring jerome.spring@unil.ch

Received: 04 March 2016 Accepted: 17 May 2016 Published: 01 June 2016

#### Citation:

Spring JN, Place N, Borrani F, Kayser B and Barral J (2016) Movement-Related Cortical Potential Amplitude Reduction after Cycling Exercise Relates to the Extent of Neuromuscular Fatigue. Front. Hum. Neurosci. 10:257. doi: 10.3389/fnhum.2016.00257

Exercise-induced fatigue affects the motor control and the ability to generate a given force or power. Surface electroencephalography allows researchers to investigate movement-related cortical potentials (MRCP), which reflect preparatory brain activity 1.5 s before movement onset. Although the MRCP amplitude appears to increase after repetitive single-joint contractions, the effects of large-muscle group dynamic exercise on such pre-motor potential remain to be described. Sixteen volunteers exercised 30 min at 60% of the maximal aerobic power on a cycle ergometer, followed by a 10-km all-out time trial. Before and after each of these tasks, knee extensor neuromuscular function was investigated using maximal voluntary contractions (MVC) combined with electrical stimulations of the femoral nerve. MRCP was recorded during 60 knee extensions after each neuromuscular sequence. The exercise resulted in a significant decrease in the knee extensor MVC force after the 30-min exercise (−10 ± 8%) and the time trial (−21 ± 9%). The voluntary activation level (VAL; −6 ± 8 and −12 ± 10%), peak twitch (Pt; −21 ± 16 and −32 ± 17%), and paired stimuli (P100 Hz; −7 ± 11 and −12 ± 13%) were also significantly reduced after the 30-min exercise and the time trial. The first exercise was followed by a decrease in the MRCP, mainly above the mean activity measured at electrodes FC1-FC2, whereas the reduction observed after the time trial was related to the FC1-FC2 and C2 electrodes. After both exercises, the reduction in the late MRCP component above FC1-FC2 was significantly correlated with the reduction in P100 Hz (r = 0.61), and the reduction in the same component above C2 was significantly correlated with the reduction in VAL (r = 0.64). In conclusion, large-muscle group exercise induced a reduction in pre-motor potential, which was related to muscle alterations and resulted in the inability to produce a maximal voluntary contraction.

Keywords: EEG, fatigue, Bereitschaftspotential, peripheral nerve stimulation, maximal voluntary contraction

# INTRODUCTION

Prolonged exercise increases the difficulty of performing voluntary motor actions by altering motor control or the capacity to sustain an ongoing effort or to generate maximal force or power (Allen et al., 1995; Jaric et al., 1997, 1999; Gandevia, 2001; Bottas et al., 2004); in other words, fatigue develops. Exercise-induced muscle fatigue can be defined as a reduction in maximal voluntary contraction force (MVC; Gandevia, 2001) and can be related to alterations occurring at different sites along the motor pathway, from the cortex to the muscle fiber. It is typical to distinguish between central factors located upstream of the neuromuscular junction (central fatigue), which concern the neural activity that drives the muscle, and peripheral factors located distal to the motor end plate, at the muscle level (peripheral fatigue).

Standardized investigative methods for examining neuromuscular function, such as peripheral nerve stimulation or transcranial magnetic stimulation, have been extensively used to explore the complex relationship between exercise and fatigue (Gandevia, 2001; Taylor et al., 2006; Barry and Enoka, 2007). Peripheral nerve stimulation provides relevant information about muscular properties such as excitation-contraction coupling. This method also allows suboptimal motor drive for force production to be quantified with the twitch interpolation technique (Merton, 1954). However, peripheral nerve stimulation alone cannot differentiate between supraspinal and spinal adaptations. Transcranial magnetic stimulation is an alternative method that consists of applying an electromagnetic field above the motor cortex and/or the cervicomedullary junction to elicit motor evoked potentials; as such, it provides information about changes in corticospinal excitability/inhibition (Goodall et al., 2012). By combining transcranial magnetic stimulation and peripheral nerve stimulation, it is possible to assess the neuromuscular pathway and to gain insight into the potential site(s) of impairment during or following exercise (Gruet et al., 2014).
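The twitch interpolation technique is usually summarized by a simple ratio: when voluntary drive is incomplete, a stimulus superimposed on an MVC evokes a small extra twitch, which is compared with the twitch evoked in the relaxed (potentiated) muscle. The sketch below uses the simplest form, VA = (1 − superimposed/resting) × 100, together with a percent-change helper of the kind used to express pre-to-post losses such as the MVC decrease reported in the abstract. Function names and force values are hypothetical, and the authors may have used a corrected variant of the formula.

```python
def voluntary_activation(superimposed_twitch, resting_twitch):
    """Voluntary activation (%) by twitch interpolation, simplest form:
    VA = (1 - superimposed / resting) * 100.
    """
    if resting_twitch <= 0:
        raise ValueError("resting (potentiated) twitch must be positive")
    return (1.0 - superimposed_twitch / resting_twitch) * 100.0


def percent_change(pre, post):
    """Relative change from pre to post in percent (negative means a decrease)."""
    return (post - pre) / pre * 100.0


# Hypothetical force values (N) for one participant; not study data.
va_pre = voluntary_activation(superimposed_twitch=6.0, resting_twitch=60.0)
va_post = voluntary_activation(superimposed_twitch=12.0, resting_twitch=50.0)
print(round(va_pre, 1), round(va_post, 1))     # 90.0 76.0
print(round(percent_change(600.0, 510.0), 1))  # -15.0
```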

Because any voluntary physical effort begins and ends in the brain (Kayser, 2003), voluntary contraction is limited not only by the motor command per se but also by processes upstream of the motor cortex that might limit motor drive and thus contribute to central fatigue (Gruet et al., 2013). Indeed, during voluntary movement, cortical and subcortical regions are involved in the final motor output (Ball et al., 1999; Shibasaki, 2012; Tanaka and Watanabe, 2012). For example, peripheral afferents send projections to the anterior cingulate cortex, the premotor area, the lateral prefrontal cortex and the orbitofrontal cortex (Liu et al., 2002; Liu, 2003; Hilty et al., 2011a; Robertson and Marino, 2016) and thereby participate in modulating motivational and executive processes. Hence, the brain integrates the internal state and perceptual information to modulate the final motor output.

Electroencephalography (EEG) appears to be a relevant method for investigating exercise-induced changes in brain activity, particularly because it reflects the spontaneous and immediate activity of neural networks from a wide range of brain systems. During voluntary muscle contraction tasks, EEG allows investigation of the spontaneous cortical activity related to movement production. The movement-related cortical potential (MRCP) is an event-related potential locked to the onset of movement (Shibasaki and Hallett, 2006). Because it is time-locked to movement onset, it captures the time course of preparatory brain activity. First described by Kornhuber and Deecke (1965a,b) as the Bereitschaftspotential, this EEG pattern is characterized by a slow negative shift starting ∼2 s before movement and reflects neural processes involved in preparing the motor command. The MRCP is composed of two main components, distinguished by a change in slope occurring ∼500 ms before movement onset (Deecke, 1996; Shibasaki and Hallett, 2006). The first component (BP: Bereitschaftspotential) has a moderate steepness and is bilaterally distributed at the frontocentral midline above the supplementary motor area (SMA); it occurs between 1500 and 500 ms before movement onset. The second component (NS′: negative slope) is contralaterally dominant, more pronounced above the primary motor cortex (M1), and occurs between 500 ms before movement onset and movement onset. Within NS′, the motor potential (MP) is observed at movement onset and corresponds to the MRCP peak amplitude.
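The component definitions above translate naturally into time windows applied to an onset-locked average. The sketch averages epochs sample by sample and takes the mean amplitude in the BP (−1500 to −500 ms) and NS′ (−500 to 0 ms) windows. It is a simplified illustration: the sampling rate and toy data are hypothetical, and baseline correction, filtering, artifact rejection and electrode selection (e.g., pooling FC1-FC2) are deliberately omitted.

```python
from statistics import mean

# Component windows from the text, in ms relative to movement onset (0 ms).
WINDOWS = {"BP": (-1500, -500), "NS'": (-500, 0)}


def average_epochs(epochs):
    """Average equally long, movement-onset-locked epochs sample by sample."""
    return [mean(samples) for samples in zip(*epochs)]


def window_mean(erp, fs_hz, epoch_start_ms, start_ms, end_ms):
    """Mean amplitude over [start_ms, end_ms); sample 0 lies at epoch_start_ms."""
    i0 = round((start_ms - epoch_start_ms) * fs_hz / 1000)
    i1 = round((end_ms - epoch_start_ms) * fs_hz / 1000)
    if not 0 <= i0 < i1 <= len(erp):
        raise ValueError("window falls outside the epoch")
    return mean(erp[i0:i1])


def component_amplitudes(epochs, fs_hz, epoch_start_ms):
    """Mean BP and NS' amplitudes of the averaged waveform."""
    erp = average_epochs(epochs)
    return {name: window_mean(erp, fs_hz, epoch_start_ms, lo, hi)
            for name, (lo, hi) in WINDOWS.items()}


# Toy data: 10 Hz sampling, epochs from -2000 ms to onset (20 samples), in µV.
toy = [0.0] * 5 + [-2.0] * 10 + [-6.0] * 5
print(component_amplitudes([toy, toy], fs_hz=10, epoch_start_ms=-2000))
# {'BP': -2.0, "NS'": -6.0}
```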

Performing voluntary movement involves dynamic processes across multiple brain areas, likely with overlapping activities. At present, opinions diverge regarding whether the MRCP components reflect different processes (Jahanshahi and Hallett, 2012). Nevertheless, the identification of the generators of the MRCP and their intracortical connections suggests different stages in the movement-generating process. The MRCP is not generated solely by the primary motor cortex; the pre-SMA, SMA and cingulate cortex make major contributions (Jahanshahi and Hallett, 2012). Contributions of subcortical structures, such as the thalamus, caudate, putamen and pallidum, to the scalp-recorded activity also cannot be excluded. By manipulating motor preparation with specific tasks, neuroimaging studies have shown increased SMA activity during self-generated movements compared with movements directed by external cues (Deiber et al., 1991, 1996; Jenkins et al., 2000). Some evidence supports sequential activation within this neural network, in which the SMA has a driving effect on M1 (Herz et al., 2012). Thus, we can assume that the early MRCP generated above the SMA represents the cognitive processes related to the decision to move and to movement preparation, whereas the late MRCP recorded above the primary motor cortex more likely represents the motor part of movement production (Arai et al., 2012; Hoffstaedter et al., 2013). Although the functional distinction between components is still unclear, segmenting them helps to differentiate processes related to movement planning from those related to motor execution.

The MRCP is typically obtained with protocols of repeated single-joint contractions. Some studies have reported that repetitions of the same movement are accompanied by an increase in MRCP amplitude (Johnston et al., 2001; Schillings et al., 2006; Morree et al., 2012), which has been interpreted as a way to compensate for peripheral fatigue, whereas others have reported no change (Siemionow et al., 2004; Liu et al., 2005). Schillings et al. (2006) observed an increase in MRCP after handgrip contractions but found no correlation between MRCP modulations and either the reduction in force or any peripheral fatigue parameter. The authors postulated that the increase in motor cortical activity compensated for a reduction in central efficiency. Freude and Ullsperger (1987) asked participants to perform self-paced contractions for 30 min (i.e., 150–250 contractions) at 20, 50, and 80% of their maximal voluntary contraction (MVC). MRCP increased when fatigue developed during exercise at 20 and 80% of MVC but decreased when the contractions were performed in the absence of fatigue at 50% of MVC. The increase in MRCP amplitude was interpreted differently according to effort intensity. At 80% of MVC, the results suggested an increase in cortical activation to compensate for peripheral fatigue, whereas at 20% of MVC, they were taken to indicate the high degree of concentration and attention required to perform the task properly. Conversely, at 50% of MVC, the reduction in MRCP was interpreted as a decrease in intentional involvement due to task monotony. More recently, Berchicci et al. (2013) simultaneously investigated MRCP, perception of effort, muscle twitch force, and EMG activity. Eighteen subjects performed four blocks of isometric knee extensions at 40% of MVC (240 contractions, each lasting 2 s).
After averaging the early (blocks 1–2) and late (blocks 3–4) blocks, the authors performed a cluster analysis to create two groups based on the rating of perceived exertion (RPE) and on peripheral fatigue (muscle twitch force loss). MRCP increased in the group with higher RPE and in the group with greater peripheral fatigue. They also observed greater positive activity over the prefrontal cortex in the group with higher RPE. According to the authors, the protocol required high cognitive effort to perform the task properly, which could explain the frontal positivity. In summary, MRCP modulations appear to be related to global exercise-induced fatigue, not only to peripheral fatigue. Other factors, such as cognitive load and perceived effort, may also influence the pre-motor potential (Freude and Ullsperger, 1987; Slobounov et al., 2004).

The present study is the first to investigate the effects of acute endurance exercise on a motor task (voluntary knee extensions) by combining pre-motor brain activity and neuromuscular measurements. Our strategy was to use a fatigue-generating procedure different from the task used to quantify MRCP changes, to avoid bias from additional and unwanted mental weariness. Subjects performed a large-muscle-group exercise (cycling) and a specific MRCP task (repeated knee extensions, before and after cycling) to quantify the changes induced by the fatiguing exercise. The aim of the study was two-fold: (1) to assess the effect of cycling exercise intensity (heavy and severe) on MRCP modulations and (2) to relate the exercise-induced MRCP modulations to the extent of central and peripheral fatigue. We hypothesized that large-muscle-group exercise would induce neuromuscular fatigue and an increase in MRCP amplitude above the premotor and motor areas.

# MATERIALS AND METHODS

## Participants

Twenty well-trained male athletes were enrolled in the study after having been informed of the experimental procedure. All the participants completed the Baecke questionnaire to ensure that they were physically active (Baecke et al., 1982). The protocol was approved by the local ethics committee (CERVD: protocol 153/14) and was in agreement with the Declaration of Helsinki. Each subject provided written consent before participation.

## Experimental Protocol

The volunteers visited the laboratory twice, once for a pre-participation session and once for the experimental session. During the pre-participation session, preliminary medical screening confirmed that participants were in good health and had no disorders that could interfere with the experimental procedure. Upon inclusion, they performed a maximal ramp exercise protocol on a cycle ergometer (Lode Excalibur Sport, Groningen, the Netherlands) to measure the first ventilatory threshold (SV1), peak oxygen consumption (V̇O<sub>2</sub> peak), and maximal aerobic power (MAP). The participants warmed up for 6 min at 60 W, after which power was increased by 30 W/min until voluntary exhaustion. Maximality was considered to have been reached when at least three of the following criteria were met: V̇O<sub>2</sub> plateau; respiratory quotient (RQ) >1.1; maximal heart rate (HRmax) >90% of the theoretical HRmax (i.e., 220 − age); or a pedaling rate below 60 revolutions per minute despite strong verbal encouragement.
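The maximality rule above is a simple counting rule. The Python sketch below shows one way it could be encoded; the function names are hypothetical, the VO2 plateau is assumed to have been judged beforehand (e.g., by visual inspection), and HRmax is compared with the 220 − age prediction stated in the text.

```python
# Sketch of the ramp-test maximality rule: at least 3 of the 4 listed criteria.
# Function names are hypothetical; the VO2-plateau judgement is assumed to be made upstream.

def predicted_hr_max(age_years: float) -> float:
    """Theoretical maximal heart rate (220 - age), as used in the protocol."""
    return 220.0 - age_years


def is_maximal_test(vo2_plateau: bool, rq: float, hr_max_bpm: float,
                    age_years: float, final_cadence_rpm: float) -> bool:
    """Return True when at least three of the four maximality criteria are met."""
    criteria = [
        vo2_plateau,                                      # VO2 plateau observed
        rq > 1.1,                                         # respiratory quotient above 1.1
        hr_max_bpm > 0.90 * predicted_hr_max(age_years),  # HRmax above 90% of 220 - age
        final_cadence_rpm < 60,                           # cadence dropped below 60 rpm
    ]
    return sum(criteria) >= 3  # True counts as 1, False as 0


# Hypothetical 29-year-old: no clear plateau, RQ 1.15, HRmax 185 bpm, final cadence 55 rpm
print(is_maximal_test(False, 1.15, 185, 29, 55))  # three criteria met -> True
```

Counting criteria rather than requiring all of them mirrors the "at least three of the following" wording and keeps each threshold visible in one place.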

The experimental session occurred within 4 weeks after the pre-participation session. The volunteers were instructed to maintain their usual diet and to avoid severe exercise the day before. They had to avoid alcohol and caffeine consumption over the 12 h preceding the session. The last meal had to be taken at least 2.5 h before the beginning of the test, and water was provided ad libitum during the experimental session.

The protocol consisted of heavy exercise followed 15 min later by severe exercise (**Figure 1**). The heavy exercise consisted of pedaling at a freely chosen cadence on a cycle ergometer (Lode Excalibur Sport, Groningen, the Netherlands) for 30 min at an intensity of ∼60% of MAP. The severe exercise was a 10-km all-out time trial (TT) performed on a road bike with the rear wheel mounted on a home trainer (CycleOps, Madison, USA). The objective was to complete the distance as fast as possible, with the distance displayed on a bike computer fixed to the handlebar. The resistance of the roller increased automatically with the force exerted by the cyclist to reproduce field-like sensations. A fan was placed in front of the subject to limit excessive sweating, and the wind speed was adjusted upon request. A power meter in the rear hub (PowerTap, CycleOps, Madison, USA) recorded the power output. Perceived exertion was assessed with the 6–20 Borg scale at the end of the heavy exercise and of the TT.

Knee extensor neuromuscular function was investigated by quantifying several parameters. The MVC represents the maximal force that a subject could produce during an isometric knee extension. The voluntary activation level (VAL) was chosen as an index of central fatigue and is believed to reflect the ability of the motor cortex to drive the muscle. The M-wave was recorded to explore neuromuscular transmission/propagation. The force evoked by paired stimuli at 100 Hz (P100 Hz) and the muscle twitch force (Pt) reflect muscle contractile properties and were both chosen as indices of peripheral fatigue. Neuromuscular data were collected before the fatiguing task (PRE), immediately after the 30-min exercise (POST1), and immediately after the 10-km TT (POST2). The MRCP data were recorded after each neuromuscular assessment at these three time points.

## Neuromuscular Data Collection

#### Neuromuscular Assessment

The session began with a warm-up of 10 submaximal voluntary isometric knee extensor contractions (4–5 s) at 20 to 90% of the estimated MVC. After a short recovery period, the participants performed two MVCs (4-s duration) with the right leg, separated by a 1-min rest period. In both trials, a 100 Hz doublet was superimposed at maximal force and was followed, at 2-s intervals, by a resting potentiated 100 Hz doublet (P100 Hz) and a single twitch (Pt).

#### Evoked Contractions

A constant-current stimulator (DS7AH, Digitimer, Hertfordshire, UK) was used to deliver electrical pulses. The cathode (5 cm diameter) was placed over the femoral nerve in the femoral triangle below the inguinal ligament, and the anode (5 × 10 cm, Dermatrode, American Imex, Irvine, CA) was placed on the lower part of the gluteal fold opposite the cathode. The optimal stimulation intensity was determined after the warm-up by increasing the stimulus intensity in 10-mA increments until there was no further increase in the amplitude of the mechanical or electrical (M-wave) responses. The intensity was then increased by a further 20% to ensure supramaximal stimulation (Neyroud et al., 2013).
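The intensity search can be written as a short loop. This is a minimal sketch, assuming a single response series (twitch force or M-wave amplitude) recorded at each 10-mA step and defining the plateau as the first step at which the response no longer increases. The function name and plateau rule are illustrative assumptions, not the laboratory's software.

```python
def supramaximal_intensity(intensities_ma, responses, extra_fraction=0.20):
    """Return the stimulation intensity (mA) to use for testing (hypothetical helper).

    intensities_ma: ascending intensities tested in 10-mA steps.
    responses: evoked response at each intensity (e.g., twitch force or M-wave amplitude).
    The plateau is taken as the last intensity before the response stops increasing;
    that intensity is then raised by `extra_fraction` (20%) to ensure supramaximality.
    """
    if len(intensities_ma) != len(responses) or len(responses) < 2:
        raise ValueError("need at least two intensities, one response each")
    for i in range(1, len(responses)):
        if responses[i] <= responses[i - 1]:  # no further increase: plateau reached
            return intensities_ma[i - 1] * (1.0 + extra_fraction)
    raise ValueError("no plateau reached; keep increasing the intensity")


# Hypothetical ramp: the twitch stops increasing after 50 mA, so about 60 mA is returned
print(supramaximal_intensity([30, 40, 50, 60, 70], [52.0, 81.0, 96.0, 95.5, 96.0]))
```

Raising an error when no plateau is found reflects the procedure's logic: the ramp continues until the response stops growing, so a series without a plateau is incomplete.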

#### Force Recording

Voluntary and evoked forces exerted by the right knee extensors were recorded with an isometric ergometer consisting of a custom-built chair equipped with a strain gauge (Universal Load Cell, model 9363-C3, linear range 0–250 N·m, output sensitivity 2.0 mV·V<sup>−1</sup>, Vishay, Malvern, US). The calibrated strain gauge was fixed to the chair and strapped to the ankle with a custom-made mold. Subjects were seated with a 90° knee angle and the trunk strapped at a 100° angle to the chair back with a harness belt, and they crossed their arms on the chest to minimize upper-body movement. The force signal was recorded at 1 kHz with an AD converter system (MP 150, BIOPAC Systems, Goleta, CA).

#### EMG Recording

The EMG activity of the right vastus lateralis (VL) was recorded with a pair of circular (1 cm) silver chloride (Ag/AgCl) surface electrodes (MediTrace 100, Kendall, Canada) positioned lengthwise over the middle of the muscle belly according to SENIAM recommendations (Hermens et al., 2000), with an interelectrode distance of 2 cm. The reference electrode was placed over the patella. The VL was chosen as representative of quadriceps muscle activity (Place et al., 2007). Low skin impedance (<5 kΩ) was obtained by shaving, abrading, and cleaning the skin. EMG signals were amplified (gain = 1000) with a bandwidth of 10–500 Hz and digitized at 2 kHz with the AD converter system. Force and EMG data were analyzed offline with AcqKnowledge software (BIOPAC Systems, Santa Barbara, CA, USA).

## MRCP Data Collection

#### EEG Recording

Continuous EEG was recorded at a sampling rate of 2048 Hz with a 64-channel BioSemi ActiveTwo amplifier system (BioSemi, Amsterdam, the Netherlands), with electrodes positioned according to the 10–20 International System. All channels were referenced to the CMS-DRL ground, which functions as a feedback loop driving the average potential across the montage as close as possible to amplifier zero. Impedance was kept below 5 kΩ with conductive gel. Participants wore the EEG cap throughout the protocol. Post-exercise recordings started 5.5 ± 0.5 min after the end of the heavy exercise and of the TT, and lasted 10 min. Offline analyses were performed with BrainVision Analyzer software (Brain Products GmbH, Munich, Germany).

The MRCP data were collected with the same ergometer used for the neuromuscular assessment. To avoid any unknown disturbance induced by the neuromuscular stimulation, the other (left) leg was used for the MRCP task. A string attached to the left ankle ran over a pulley to a free-hanging weighted platform. The subjects were instructed to lift this weight, equivalent to 20% of their MVC force, by ∼10 cm, 60 times. Contraction duration was not strictly controlled, but participants were instructed to produce a 2-s contraction paced by a metronome: a 1-s concentric contraction to lift the weight and a 1-s eccentric contraction to lower it. Movement onset was automatically marked on the EEG recording by the release of a trigger placed behind the subject's heel. The contractions were self-generated, but to ensure that the duration of the task was identical for each participant in each condition, a beep sounded every 10 s, and participants were instructed to perform one contraction spontaneously between two beeps. The subjects were also instructed to keep their eyes closed during the task to avoid excessive attentional load and artifacts generated by visual feedback.
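As a quick consistency check on the task parameters, the sketch below converts 20% of MVC force into a load mass and computes the task duration implied by one lift per 10-s beep interval. It assumes MVC is expressed in newtons at the ankle and ignores pulley friction; both helpers are hypothetical.

```python
G = 9.81  # gravitational acceleration (m/s^2)


def task_load_kg(mvc_force_n: float, fraction: float = 0.20) -> float:
    """Mass (kg) whose weight equals `fraction` of the MVC force (pulley friction ignored)."""
    return fraction * mvc_force_n / G


def task_duration_min(n_lifts: int = 60, beep_interval_s: float = 10.0) -> float:
    """Task duration (min) when one self-paced lift is made between consecutive beeps."""
    return n_lifts * beep_interval_s / 60.0


print(round(task_load_kg(700.0), 1))  # hypothetical 700 N MVC -> about 14.3 kg
print(task_duration_min())            # 60 lifts x 10 s
```

The 10-min duration obtained this way matches the length of each post-exercise EEG recording.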

## Data Analysis

#### Gas Exchange

Breath-by-breath pulmonary gas-exchange data were collected during the maximal ramp test with a metabolic cart (OxyconPro, Jaeger, Germany) and averaged over consecutive 10-s periods. V̇O<sub>2</sub> peak was taken as the highest value attained during the last 30 s before the subject's volitional exhaustion. SV1 was determined from a combination of criteria: the first disproportionate increase in V̇CO<sub>2</sub> on visual inspection of individual plots of V̇CO<sub>2</sub> vs. V̇O<sub>2</sub>, an increase in the ventilatory equivalent for O<sub>2</sub> (V̇E/V̇O<sub>2</sub>) with no increase in the ventilatory equivalent for CO<sub>2</sub> (V̇E/V̇CO<sub>2</sub>), and an increase in end-tidal O<sub>2</sub> tension with no fall in end-tidal CO<sub>2</sub> tension. The intensity of the 30-min exercise was obtained by adding 20% of the difference between the power at SV1 and MAP to the power reached at SV1.
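The sketch below illustrates the 10-s averaging, the V̇O<sub>2</sub> peak rule, and the heavy-exercise target power as described. Function names and example numbers are hypothetical; breath times are assumed to be in seconds from the start of the test, and each 10-s bin is labeled by its end time.

```python
def heavy_exercise_power(p_sv1_w: float, map_w: float, fraction: float = 0.20) -> float:
    """Target power (W): power at SV1 plus 20% of the gap between SV1 power and MAP."""
    if map_w <= p_sv1_w:
        raise ValueError("MAP must exceed the power at SV1")
    return p_sv1_w + fraction * (map_w - p_sv1_w)


def ten_second_means(breath_times_s, breath_vo2, bin_s=10.0):
    """Average breath-by-breath values over consecutive 10-s bins.

    Returns a list of (bin end time in s, mean value) sorted by time.
    """
    bins = {}
    for t, v in zip(breath_times_s, breath_vo2):
        bins.setdefault(int(t // bin_s), []).append(v)
    return [((k + 1) * bin_s, sum(vals) / len(vals)) for k, vals in sorted(bins.items())]


def vo2_peak(binned, window_s=30.0):
    """Highest 10-s mean among bins ending within the last `window_s` seconds."""
    t_end = binned[-1][0]
    return max(v for t, v in binned if t > t_end - window_s)


# Hypothetical rider with SV1 at 190 W and MAP of 385 W: target of about 229 W
print(heavy_exercise_power(190.0, 385.0))
```

With these illustrative numbers the target power is roughly 59% of MAP, in line with the ∼60% of MAP reported for the heavy exercise.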

#### Force Data

MVC force of the knee extensors was taken as the force produced during the maximal voluntary contractions (i.e., peak to peak). Resting P100 Hz and Pt amplitudes were analyzed for the trial yielding the highest MVC. VAL during MVCs was estimated from the superimposed and potentiated doublets with the formula proposed by Strojnik and Komi (1998):

$$\mathrm{VAL} = \left(1 - \frac{\text{superimposed 100 Hz doublet force} \times \dfrac{\text{force level at stimulation}}{\text{MVC force}}}{\text{potentiated 100 Hz doublet force}}\right) \times 100$$
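For readers computing VAL themselves, a minimal Python version of the Strojnik and Komi (1998) correction is sketched below. It assumes forces in newtons and uses the resting potentiated 100 Hz doublet as the reference in the denominator, as the text and the Table 1 footnote indicate; the function name is hypothetical.

```python
def voluntary_activation(superimposed_doublet_n: float, force_at_stim_n: float,
                         mvc_force_n: float, potentiated_doublet_n: float) -> float:
    """Voluntary activation level (%) with the Strojnik and Komi (1998) correction.

    superimposed_doublet_n: force increment evoked by the doublet delivered during the MVC
    force_at_stim_n: voluntary force at the instant the superimposed doublet was delivered
    mvc_force_n: MVC force
    potentiated_doublet_n: force of the resting potentiated 100 Hz doublet
    """
    if mvc_force_n <= 0 or potentiated_doublet_n <= 0:
        raise ValueError("MVC and potentiated doublet forces must be positive")
    # Scales the superimposed response when stimulation fell below the MVC peak
    correction = force_at_stim_n / mvc_force_n
    return (1.0 - superimposed_doublet_n * correction / potentiated_doublet_n) * 100.0


# Hypothetical values: 20 N superimposed doublet at 950 N, MVC 1000 N, resting doublet 200 N
print(round(voluntary_activation(20.0, 950.0, 1000.0, 200.0), 1))  # 90.5
```

A superimposed doublet of zero yields 100%, i.e., no additional force could be evoked on top of the voluntary effort.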

#### EMG Data

The EMG signals recorded during the highest MVC were used for analysis. M-wave peak-to-peak amplitude was measured from the EMG response after the single stimulation.
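Peak-to-peak amplitude is the difference between the largest positive and negative deflections of the M-wave. The sketch assumes an EMG trace in microvolts sampled at 2 kHz and a hypothetical 2–40 ms post-stimulus window chosen to skip the stimulus artifact; the window is not specified in the text and would need adjusting to each recording.

```python
def m_wave_peak_to_peak(emg_uv, fs_hz=2000.0, stim_index=0, start_ms=2.0, end_ms=40.0):
    """Peak-to-peak M-wave amplitude in a post-stimulus window (hypothetical helper).

    The 2-40 ms window is an illustrative assumption; it skips the stimulus artifact
    and should be adjusted to where the M-wave actually appears in each recording.
    """
    first = stim_index + round(start_ms * fs_hz / 1000.0)
    last = stim_index + round(end_ms * fs_hz / 1000.0)
    segment = emg_uv[first:last]
    if not segment:
        raise ValueError("window lies outside the recording")
    return max(segment) - min(segment)
```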

#### MRCP Data

The raw EEG data were first down-sampled from 2048 to 512 Hz to reduce computational load and band-pass filtered from 0.1 to 5 Hz. The low-pass cutoff was set at 5 Hz (Thacker et al., 2014) to avoid contamination by the alpha rhythm enhanced by the eyes-closed procedure and by spontaneous physiological and rolandic mu rhythms. EEG signals were segmented into 60 epochs of 3000 ms each (from 2500 ms before to 500 ms after movement onset). All trials were baseline-corrected using the −2500 to −2000 ms interval and averaged after a semi-automatic artifact rejection procedure with a ±80 µV criterion. Artifacted electrodes were interpolated when necessary with a spherical 3D spline, and trials containing muscular artifacts were also rejected. On average, 53 ± 9, 52 ± 11, and 52 ± 9 of the 60 trials were available for analysis in the PRE, POST1, and POST2 conditions, respectively. The MRCPs were segmented into four sequential components (Shibasaki and Hallett, 2006; Jahanshahi and Hallett, 2012). The Bereitschaftspotential was divided into BP1 and BP2: BP1 corresponds to the mean amplitude between −1500 and −1000 ms, and BP2 to the mean amplitude between −1000 and −500 ms. The third component was the negative slope (NS′), which corresponds to the mean amplitude from −500 ms to movement onset. The last component was the motor potential (MP), taken as the maximal peak amplitude recorded between −500 ms and movement onset. These components were calculated for two regions of interest: the first corresponded to the mean activity at the FC1 and FC2 electrodes, and the second to the activity at the C2 electrode.
These electrodes were chosen because they lie over the areas known to generate the MRCP, namely the SMA (represented by FC1-FC2 mean activity) for the early part of the MRCP and the primary motor cortex (M1; represented by C2 activity) for the late part (Shibasaki and Hallett, 2006).
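The epoching, baseline correction, rejection, and component definitions above can be sketched in plain Python. This minimal version assumes data already down-sampled to 512 Hz and band-pass filtered, epochs stored as lists of microvolt values from −2500 to +500 ms, rejection applied to the baseline-corrected epoch, and MP taken as the most negative value between −500 ms and onset (the MRCP being a negative shift). Function names are hypothetical, and the sketch does not reproduce the BrainVision Analyzer workflow, electrode interpolation, or muscle-artifact screening.

```python
from statistics import fmean

FS_HZ = 512             # sampling rate after down-sampling
EPOCH_START_MS = -2500  # epochs span -2500 ms to +500 ms around movement onset
REJECT_UV = 80.0        # +/-80 microvolt rejection threshold


def idx(t_ms: float) -> int:
    """Sample index within an epoch for a latency (ms) relative to movement onset."""
    return round((t_ms - EPOCH_START_MS) * FS_HZ / 1000)


def baseline_correct(epoch):
    """Subtract the mean of the -2500 to -2000 ms baseline."""
    base = fmean(epoch[idx(-2500):idx(-2000)])
    return [v - base for v in epoch]


def average_mrcp(epochs):
    """Baseline-correct, drop epochs exceeding +/-80 uV, and average the remaining ones."""
    kept = [e for e in map(baseline_correct, epochs)
            if max(abs(v) for v in e) <= REJECT_UV]
    if not kept:
        raise ValueError("all epochs were rejected")
    return [fmean(samples) for samples in zip(*kept)], len(kept)


def roi_mean(*channels):
    """Point-by-point mean of several channels, e.g. FC1 and FC2."""
    return [fmean(samples) for samples in zip(*channels)]


def mrcp_components(avg):
    """Mean amplitudes of BP1, BP2 and NS', and MP as the most negative value before onset."""
    return {
        "BP1": fmean(avg[idx(-1500):idx(-1000)]),
        "BP2": fmean(avg[idx(-1000):idx(-500)]),
        "NS'": fmean(avg[idx(-500):idx(0)]),
        "MP": min(avg[idx(-500):idx(0) + 1]),
    }
```

In this sketch, `roi_mean(fc1, fc2)` would be applied to the FC1 and FC2 averages before calling `mrcp_components`, whereas the C2 average is used directly.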

## Statistical Analysis

One-way repeated-measures ANOVAs with the factor Time were used to compare the neuromuscular and MRCP variables across the three measurement times (PRE, POST1, and POST2). When ANOVA revealed a significant main effect, pairwise contrasts were performed with the Bonferroni correction. Friedman ANOVAs with follow-up Wilcoxon signed-rank tests were used in the few cases in which the assumptions of parametric tests were not met. To better understand the mechanisms responsible for the MVC loss, we performed Pearson correlation analyses on the PRE-POST1 and PRE-POST2 deltas, relating the change in MVC to the changes in VAL and P100 Hz.
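A minimal sketch of the quantities used here: per-participant changes from PRE, the Pearson coefficient, and Bonferroni adjustment of the three pairwise contrasts (PRE vs. POST1, PRE vs. POST2, POST1 vs. POST2). Expressing deltas as percentages of PRE is an assumption, borrowed from how Table 1 reports ΔPRE; the code computes coefficients only, not the p values reported by Statistica, and all names and data are hypothetical.

```python
from math import sqrt


def percent_change(pre, post):
    """Change from PRE in percent for each participant (negative = decrease)."""
    return [100.0 * (b - a) / a for a, b in zip(pre, post)]


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two paired samples of length >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def bonferroni(p_values):
    """Bonferroni-adjusted p values: multiply by the number of comparisons, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical data for three participants (not the study's data)
d_mvc = percent_change([900.0, 1000.0, 950.0], [820.0, 880.0, 870.0])
d_p100 = percent_change([300.0, 320.0, 310.0], [270.0, 280.0, 285.0])
print(pearson_r(d_mvc, d_p100))
print(bonferroni([0.004, 0.020, 0.300]))  # PRE-POST1, PRE-POST2, POST1-POST2
```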

Because the EEG variables were not normally distributed, Spearman correlations were used to explore the relationship between neuromuscular and MRCP modulations. The three neuromuscular parameters used for these correlation analyses were the reductions in MVC, VAL, and P100 Hz, chosen as global, central, and peripheral indices of fatigue, respectively. These parameters were correlated with the modulations of the four MRCP components (i.e., BP1, BP2, NS′, MP) at the FC1-FC2 and C2 electrodes. All statistical analyses were performed with Statistica 12.6 (StatSoft, Tulsa, USA). The level of significance was set at p < 0.05. Results are presented as mean ± standard deviation.
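Spearman's rho is the Pearson correlation computed on ranks, with tied values given their average rank. The stdlib-only sketch below illustrates that definition; it is not the Statistica implementation, returns no p value, and its function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the group of tied values
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation as the Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two paired samples of length >= 3")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined when all values are tied")
    return sxy / sqrt(sxx * syy)
```

Because only ranks enter the computation, a single extreme MRCP value shifts rho far less than it would shift Pearson's r, which is the usual motivation for this choice with non-normal EEG data.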

# RESULTS

## Participants

Data from four of the original 20 recruited participants had to be excluded because of heavily artifacted EEG signals. The mean age of the 16 remaining participants was 29 ± 7 years, and their body mass index was 22.9 ± 1.6 kg·m<sup>−2</sup>. All were active road cyclists and/or triathletes, had a total score of 9.2 ± 0.7 on the Baecke questionnaire, and reached a MAP of 385 ± 47 W at a V̇O<sub>2</sub> peak of 63.8 ± 5.9 ml·min<sup>−1</sup>·kg<sup>−1</sup> during the incremental cycling test.

## Exercise Data

The mean power output during the 30-min exercise was 231 ± 97 W. The mean duration for the TT was 15.7 ± 1.6 min and the average power was 279 ± 31 W. The RPE was 14.9 ± 1.7 at the end of the heavy exercise, whereas the RPE reached 19.7 ± 0.5 at the end of the TT.

## Neuromuscular Data

#### Force

The sequence of the two cycling exercises caused a significant reduction in MVC force at POST1 and POST2 [F(2, 30) = 55.34, p < 0.001; **Table 1**]. **Figure 2** shows two representative recordings of a superimposed MVC with a 100 Hz potentiated doublet in the PRE (A) and POST2 (B) conditions.

#### Peripheral and Central Fatigue

The peripheral indices of fatigue, Pt force and P100 Hz, were significantly reduced at POST1 and POST2 [Pt: F(2, 30) = 36.95, p < 0.001; P100 Hz: χ<sup>2</sup>(N = 16, df = 2) = 6.5, p = 0.039] (**Table 1**). For the M-wave amplitude, Friedman ANOVA revealed a significant effect of Time [χ<sup>2</sup>(N = 16, df = 2) = 9.5, p = 0.008]: no reduction was observed between PRE and POST1 (p = 0.3), whereas the difference between PRE and POST2 was significant (p = 0.007). Concerning central fatigue, a Time effect [F(2, 30) = 14.21, p < 0.001] was observed for VAL. For all significant main effects of Time, post-hoc tests revealed a significant difference between PRE and POST1 (except for the M-wave) and between PRE and POST2 (all p < 0.033; **Table 1**).

#### Correlations

At POST1, Pearson correlation analysis showed a trend toward a relationship between the reduction in MVC force and the reduction in P100 Hz (r = 0.44, p = 0.08) that did not reach significance. No relationship was found between the reductions in MVC and VAL (r = 0.32, p = 0.22). At POST2, there was no significant relationship between the decreases in MVC and P100 Hz (r = 0.27, p = 0.3), whereas a positive relationship was found between the reductions in MVC and VAL (r = 0.5, p = 0.047).

## MRCP Data

The MRCP grand averages for the PRE, POST1, and POST2 conditions at FC1-FC2 and C2 are shown in **Figures 3A,B**. The typical MRCP shape can be observed, with increases in slope at ∼1000 and ∼500 ms before movement onset.

#### Mean Activity at FC1-FC2

ANOVA revealed an effect of Time on FC1-FC2 mean activity for the components BP1 [F(2, 30) = 3.93, p = 0.03], BP2 [F(2, 30) = 7.73, p = 0.019], NS′ [F(2, 30) = 7.17, p = 0.003], and MP [F(2, 30) = 7.39, p = 0.002; **Figure 3C**]. The post-hoc tests indicated a significant reduction between PRE and POST1 for BP2 (p = 0.007), NS′ (p = 0.012), and MP (p = 0.016). At POST2, the reduction was significant for all four MRCP components (all p < 0.04; **Figure 3C**).

TABLE 1 | Neuromuscular indices of central and peripheral fatigue measured before the fatiguing task (PRE), after the heavy exercise (POST1), and after the 10-km time trial (POST2).

MVC, maximal voluntary contraction; VAL, voluntary activation level with reference to the 100 Hz resting peak doublet; Pt, muscle twitch; P100 Hz, resting peak doublet at 100 Hz; M-wave, peak-to-peak M-wave amplitude; ΔPRE, percentage of difference from PRE. Significant differences from PRE: \*p < 0.05; \*\*p < 0.01.

FIGURE 3 | (A) Grand MRCP average recorded above the FC1-FC2 electrodes (i.e., above the supplementary motor area) and (B) above the C2 electrode (i.e., above the primary motor cortex) in the pre-exercise condition (dark blue curve), after the heavy exercise (orange curve), and after the 10-km time trial (light blue curve). X-axis: time (ms) relative to movement onset. Y-axis: amplitude (µV). (C) Mean activity of movement-related cortical potential components measured before the cycling task (PRE), after the heavy exercise (POST1), and after the 10-km time trial (POST2) at the FC1-FC2 electrodes (shaded columns) and the C2 electrode (filled columns). BP1, mean activity from −1500 to −1000 ms; BP2, mean activity from −1000 to −500 ms; NS′, mean activity from −500 ms to movement onset; MP, peak amplitude. Significant differences from PRE: \*p < 0.05; \*\*p < 0.01.

#### Mean Activity at C2

ANOVA revealed a Time effect on C2 mean activity for the components BP1 [F(2, 30) = 4.91, p = 0.014], BP2 [F(2, 30) = 8.83, p < 0.001], NS′ [F(2, 30) = 9.14, p < 0.001], and MP [F(2, 30) = 8.43, p = 0.001]. The post-hoc tests indicated a significant decrease between PRE and POST1 only for BP2 (p = 0.01), whereas the reduction was significant for all four components between PRE and POST2 (all p < 0.012; **Figure 3C**).

#### Neuromuscular and MRCP Correlations

Bonferroni post-hoc tests revealed significant changes in the MRCP components between PRE and POST1 and between PRE and POST2, whereas no changes were observed between POST1 and POST2 (see **Figure 3**). Therefore, the correlation analyses between MRCP and neuromuscular modulations were based solely on the PRE-POST1 and PRE-POST2 differences.

No significant correlation was found for the PRE-POST1 changes. In contrast, between PRE and POST2, the reduction in P100 Hz was correlated with the decrease in NS′ (r = 0.61) and MP (r = 0.61) amplitude above FC1-FC2 (all p < 0.05), whereas the reduction in VAL was correlated with the decrease in BP1 (r = 0.57), BP2 (r = 0.65), NS′ (r = 0.72), and MP (r = 0.64) above C2 (all p < 0.05). The correlations between the reduction in P100 Hz and MP amplitude above FC1-FC2 and between the reduction in VAL and MP amplitude above C2 are illustrated in **Figure 4**. Note that the pre-motor potential is negative; thus, a reduction in its amplitude represents a change toward zero (i.e., the value becomes less negative).
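The sign convention can be made explicit with two small helpers, under the assumption that reductions are expressed so that positive values mean a loss in the neuromuscular index or a less negative MRCP. With this convention, a positive r means that participants with larger neuromuscular losses also showed larger shifts of the MRCP toward zero. The helper names are hypothetical.

```python
def amplitude_reduction(pre_uv: float, post_uv: float) -> float:
    """Reduction of a negative MRCP component: positive when POST is less negative than PRE."""
    return post_uv - pre_uv


def index_reduction_pct(pre: float, post: float) -> float:
    """Reduction of a neuromuscular index in percent of PRE (positive = decrease)."""
    return 100.0 * (pre - post) / pre


# Hypothetical participant: MP shifts from -8 to -5 uV while P100 Hz drops from 300 to 270 N
print(amplitude_reduction(-8.0, -5.0), index_reduction_pct(300.0, 270.0))
```

Using `post_uv - pre_uv` rather than a difference of absolute values keeps the measure meaningful even if a component approaches or crosses zero.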

# DISCUSSION

The current study was designed to induce neuromuscular fatigue with two successive cycling exercises, at heavy and severe intensities, and to assess the related effects on MRCP. The second aim was to relate the exercise-induced MRCP modulations to neuromuscular alterations. Although the large-muscle-group exercise induced neuromuscular fatigue, MRCP amplitude decreased rather than increasing as expected. The cycling exercise induced both peripheral alterations (as indicated by the decreases in P100 Hz and Pt) and central impairments (as indicated by the reduction in VAL). The difference in intensity between the heavy exercise and the TT was reflected by a higher perception of effort and a greater strength loss after the second exercise. After heavy exercise, MRCP was reduced mainly above the FC1-FC2 electrodes, whereas after the TT, the MRCP reduction was observed above both the FC1-FC2 and C2 electrodes. The correlations between the reduction in the late MRCP components (i.e., NS′, MP) above FC1-FC2 and the reduction in P100 Hz, and between the reduction in the MRCP components above C2 and the reduction in VAL, suggest a close interaction between neuromuscular fatigue and pre-motor brain activity.

## Physical Exercise and Neuromuscular Fatigue

An MVC loss of 10% was observed after the heavy exercise. As expected, this reduction was associated both with peripheral alterations, characterized by a 20% decrease in Pt force without changes in VL M-wave amplitude, and with central fatigue, as reflected by the decrease in VAL. These neuromuscular changes are very similar to those reported by Lepers et al. (2001), who found a 13% decrease in knee extension force after 30 min of cycling at 80% of MAP in trained athletes, accompanied by a 20% reduction in Pt without changes in M-wave properties, suggesting alteration of processes located beyond action potential propagation/transmission. Such a reduction in Pt force may be related to intracellular disturbances, such as reduced Ca<sup>2+</sup> release from the sarcoplasmic reticulum, decreased myofilament sensitivity to Ca<sup>2+</sup>, changes in metabolite (H<sup>+</sup>, inorganic phosphate) concentrations within the muscle, and/or reduced force produced by each active cross-bridge (Allen et al., 2008). The absence of significant correlations between MVC force loss and central or peripheral markers of fatigue between PRE and POST1 does not allow a clear determination of the origin of fatigue. However, given other studies showing that peripheral fatigue develops early during such exercise (Decorte et al., 2010), we believe that the trend observed between the MVC force loss and the reduction in P100 Hz at POST1 favors a major role for peripheral alterations.

After the severe-intensity exercise, knee extensor MVC force decreased by 21%. This strength loss is comparable to the results of Lepers et al. (2001), who reported a reduction of 16% after 30 min of cycling at 80% of MAP. As at POST1, peripheral and central alterations contributed to knee extensor force impairment at POST2. The additional peripheral fatigue (−32% in Pt at POST2 vs. −21% at POST1) can be attributed to altered action potential transmission/propagation, as indirectly indicated by the 10% reduction in VL M-wave amplitude at POST2. The finding that VL M-wave amplitude decreased at POST2 but not at POST1 is consistent with the results of Lepers et al. (2002, 2004), suggesting that cycling exercise must be of sufficient intensity and duration to affect muscle excitability. Performing the 10-km time trial after a heavy exercise lasting 30 min at 60% of MAP (i.e., 67% V̇O<sub>2</sub> peak), with only a 15-min break (i.e., for data collection), was also intended to generate greater central fatigue. Indeed, VAL was 91% in the PRE condition and decreased to 80% at POST2, in accordance with the literature (Lepers et al., 2001; Millet and Lepers, 2004). This finding suggests altered CNS function leading to a limitation of descending motor drive. Overall, our results indicate that peripheral mechanisms were mainly involved in the development of fatigue at POST1, whereas central fatigue played a major role in force reduction at POST2.

## MRCP Data

Before, between, and after the exercises, we observed MRCPs with a shape and amplitude similar to those reported in the literature (Shibasaki and Hallett, 2006). The MRCPs were characterized by a slow negative shift starting between 1500 and 2000 ms before movement onset. The change in slope typically reported at approximately −500 ms occurred slightly earlier in our data (around −750 ms; **Figure 3**), likely because the trigger was set at the onset of movement rather than at the onset of EMG activity. Both fatiguing cycling tasks resulted in a decrease in MRCP amplitude recorded during 60 spontaneous knee extensions at 20% of MVC. A lasting effect on brain activity was previously reported by Thacker et al. (2014), who showed MRCP modulations 1 h after the end of an endurance exercise. Our results extend these findings by showing that large-muscle-group exercise led to distinctive changes in the pre-motor potential during the two 15-min post-exercise periods (POST1 and POST2) compared with the fresh condition (PRE).

The present findings do not confirm that pre-motor brain activity increases with muscle fatigue, as expected and as previously reported in repeated single-joint contraction protocols. Our study used a dynamic large-muscle-group fatiguing task quite different from the task used to quantify changes in MRCP, which makes comparison with previous single-joint contraction protocols difficult. Several authors have asked their subjects to perform repeated blocks of contractions and have compared pre-motor potential amplitudes between the first and last blocks. In such designs, fatigue is not experimentally manipulated through a specific fatiguing task. Another difference from our design concerns mental load. In most studies, participants had to achieve a given force level during contraction by using visual feedback, which requires a high degree of concentration throughout the block of repeated contractions. It has been suggested that increased attentional load and the mobilization of cognitive resources could be a confounding factor resulting in an increase in MRCP amplitude (Freude and Ullsperger, 1987; Berchicci et al., 2013). Another factor to consider is the contraction force level used during the MRCP task. In the studies of Johnston et al. (2001) and Schillings et al. (2006), the MRCP task consisted of contractions at 70% of MVC. By comparing premotor potential amplitude between the first and last blocks of contractions, the authors reported an increase in MRCP, which they interpreted as cortical activity compensating for the reduction in strength capacity (i.e., to provide the same level of force, the brain had to mobilize more resources).
In the present study, the load lifted during the MRCP task was kept low (20% of MVC force), so additional cortical activation to compensate for peripheral fatigue did not appear to be required. Similarly, Morree et al. (2012) did not observe any increase in premotor potential after repeated contractions at 20% of MVC force despite a final maximal force loss of 35%.

The opposite effects on pre-motor potential observed in this study and in others (i.e., a reduction here vs. an increase in previous studies) may arise from differences in the fatiguing tasks. The effect of single-joint contraction tasks on motor cortex excitability, measured by transcranial magnetic stimulation, has been reported to differ from that of locomotor exercise: motor cortex excitability increases after a fatiguing single-joint contraction task, but Sidhu et al. (2012) reported no increase in motor cortex responsiveness after 30 min of steady-state cycling. Modulation of cortical excitability is likely task-specific and may be related to the systemic physiological consequences of large-muscle-group exercise. The reduction in MRCP could be related to input from group III and IV muscle afferents to the brain. Indeed, fatiguing voluntary contraction lengthens the cortical silent period measured by transcranial magnetic stimulation, which is believed to reflect intracortical inhibitory activity (Gruet et al., 2013). However, when group III and IV muscle afferent activity is pharmacologically blocked by a fentanyl injection, the cortical silent period is not prolonged, indicating that these afferents modulate intracortical inhibition (Hilty et al., 2011b). Similarly, maintaining muscle afferent activity at the end of a fatiguing task with ischemia reduces the capacity of motor cortical output to activate the muscle maximally (Gandevia et al., 1996). More generally, endurance exercise modulates several additional parameters that might contribute to the reduction in pre-motor potential, such as cerebral oxygenation (Ide and Secher, 2000), brain catecholamines (Nybo and Secher, 2004), or hyperthermia (Périard et al., 2011).

To the best of our knowledge, only one study has investigated MRCP modulations after large-muscle-group endurance exercise (Thacker et al., 2014). The exercise consisted of pedaling for 20 min at 70% of the age-predicted maximum heart rate on a cycle ergometer. To remove confounding fatigue factors, the authors used an MRCP task involving the upper limbs (i.e., wrist extension). The results indicated no changes in MRCP amplitude or onset immediately after the end of exercise. We suggest that this absence of modification was caused by insufficient exercise intensity or duration and/or by the fact that the muscle group used for the MRCP task differed from the one mobilized during the fatiguing task.

In our protocol, MRCP was reduced during movement preparation, initiation, and onset, as reflected by the reductions in the BP, NS′, and MP components, respectively. The heavy exercise induced a decrease in MRCP amplitude for the BP2, NS′, and MP components above FC1-FC2, whereas only BP2 decreased significantly above C2. After the TT, all MRCP components decreased significantly from the PRE condition above both the FC1-FC2 and C2 electrodes. Our results indicate that heavy exercise affects the MRCP differently depending on whether it is recorded above the SMA or M1. The SMA is connected to subcortical regions, relays sensory feedback received from the muscle, and sends direct signals to M1 (Dum and Strick, 1991; Cadoret and Smith, 1997; Colebatch, 2007), such that SMA neurons are activated several milliseconds before activity within the pyramidal tract (Eccles, 1982). Thus, the modulations observed above the SMA appear more closely related to the integration of peripheral mechanisms, whereas those above the primary motor cortex more likely reflect the cortico-motor command. The positive correlation between the increase in peripheral fatigue measured by P100 Hz and the reduction in MRCP (i.e., NS′ and MP) above the SMA strengthens the assumption that this area may be under the modulatory influence of muscle activity, likely via group III and IV afferents. This assumption is also supported by Gandevia (2001) and Amann et al. (2011), who showed that group III/IV muscle afferents are likely to exert an inhibitory effect on central motor drive during whole-body exercise.
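The correlational argument above rests on a per-participant correlation between two change scores. As a minimal sketch, assuming a Pearson product-moment correlation (the text does not name the coefficient) and purely hypothetical values, both changes can be expressed as positive "drops" so that a positive r means that participants with more peripheral fatigue also show larger MRCP reductions:

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Co-deviation and squared deviations around each mean
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-participant values (PRE minus POST), so larger = bigger drop.
p100hz_drop = [12.0, 18.5, 9.0, 22.0, 15.5]  # drop in P100 Hz (hypothetical units)
mp_drop = [0.8, 1.4, 0.5, 1.9, 1.1]          # drop in MP amplitude above the SMA (hypothetical units)

r = pearson_r(p100hz_drop, mp_drop)
print(f"r = {r:.2f}")
```

Expressing both variables as drops keeps the sign of r directly interpretable; with raw POST minus PRE differences, the same relationship would still give a positive r, since both quantities decrease together.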

At POST1, the components related to movement preparation are altered first, without affecting the execution period of movement production, as reflected by the absence of differences in NS′ and MP amplitude above M1. When neuromuscular fatigue is more pronounced, as observed at POST2, the components related to both movement preparation and execution are modulated, showing a growing impact of fatigue on all the components involved in movement production. The activation of the corticospinal tract via the primary motor cortex is the final step of movement production. Thus, when exercise-induced fatigue decreases the MP above the SMA and M1, it is possible that the motor cortex does not reach the level of activity required to produce the same level of voluntary activation, resulting in a greater decrease in MVC force, as observed at POST2. The positive correlation between the decrease in VAL and the reduction in MRCP above M1 supports this assumption, as does the positive correlation between the drop in MP amplitude above M1 and the reduction in MVC force. Interestingly, some authors have reported neural plasticity between the SMA and M1. Using transcranial magnetic stimulation, Arai et al. (2012) showed that the motor-evoked potential generated by stimulating M1 can be modulated by an SMA-conditioning stimulation procedure. Recently, Bajaj et al. (2015) also showed that the neural connectivity between the SMA and M1 could be modulated by therapy in stroke survivors. It is plausible that the acute fatiguing task performed in our study reorganized the connectivity between the SMA and M1 and that the neural impulses sent from the SMA to M1 were reduced under the influence of afferent projections.

The limitations of our study concern factors related to movement characteristics, such as strength, accuracy, or rate of force development, all of which are known to modulate the MRCP (Shibasaki and Hallett, 2006). In our protocol, the range of motion was the same across trials. However, movement duration was not controlled, and we cannot exclude the possibility that the delay between the onset of muscle activity and the onset of movement changed with fatigue. In addition, the MRCP modulations reported in this study are specific to well-trained athletes and cannot be generalized to the general population. Because neuroelectric activity could be related to physical fitness (Kamijo et al., 2010), further studies in different populations are needed to determine the impact of fitness on MRCP, especially after endurance exercise.

In conclusion, the MRCP reflects the intention to move and the preparatory period for the intended movement. The results of our study indicate that cycling exercise induces peripheral and central fatigue and reduces MRCP amplitude. The MRCP components related to movement planning and initiation above the SMA are the first to be altered by heavy-exercise-related fatigue, likely via peripheral muscle activity. When neuromuscular fatigue is substantial, as observed after the TT, the overall reduction in MRCP, especially the reduction in the component related to movement execution above the primary motor cortex (i.e., MP), is associated with the reduction in the voluntary activation level, resulting in a decrease in maximal voluntary force.

Finally, large-muscle-group exercise induces neuromuscular fatigue, resulting in an alteration of the corticospinal command. Because this study indicates that this command is modulated by pre-motor cortical activity, we suggest taking a step back to investigate post-exercise resting-state electrocortical dynamics, from which the intention to move emerges.

#### AUTHOR CONTRIBUTIONS

JNS, FB, JB: Contributions to the conception of the work. JNS, FB, NP, BK, JB: Contributions to the acquisition and interpretation of data. JNS, FB, NP, BK, JB: Revising the content. JNS, FB, NP, BK, JB: Final approval of the version.

## REFERENCES

for high-intensity endurance exercise performance in humans. J. Physiol. 589, 5299–5309. doi: 10.1113/jphysiol.2011.213769


rehabilitation. Neuroimage Clin. 8, 572–582. doi: 10.1016/j.nicl.2015.06.006


mid/anterior insular and motor cortex during cycling exercise. Eur. J. Neurosci. 34, 2035–2042. doi: 10.1111/j.1460-9568.2011.07909.x


peripheral fatigue quantification. Eur. J. Appl. Physiol. 114, 205–215. doi: 10.1007/s00421-013-2760-2


chronic fatigue syndrome. Clin. Neurophysiol. 115, 2372–2381. doi: 10.1016/j.clinph.2004.05.012


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Spring, Place, Borrani, Kayser and Barral. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Does Combined Physical and Cognitive Training Improve Dual-Task Balance and Gait Outcomes in Sedentary Older Adults?

#### Sarah A. Fraser<sup>1</sup>\*, Karen Z.-H. Li<sup>2</sup>, Nicolas Berryman<sup>3,4</sup>, Laurence Desjardins-Crépeau<sup>4,5</sup>, Maxime Lussier<sup>4</sup>, Kiran Vadaga<sup>2</sup>, Lora Lehr<sup>4,5</sup>, Thien Tuong Minh Vu<sup>4,6</sup>, Laurent Bosquet<sup>7</sup> and Louis Bherer<sup>4,6</sup>

<sup>1</sup>Interdisciplinary School of Health Sciences, University of Ottawa, Ottawa, ON, Canada, <sup>2</sup>Department of Psychology, Concordia University, Montréal, QC, Canada, <sup>3</sup>Sports Studies, Bishop's University, Sherbrooke, QC, Canada, <sup>4</sup>Centre de Recherche de l'Institut Universitaire de Gériatrie de Montréal, Montréal, QC, Canada, <sup>5</sup>Department of Psychology, Université du Québec à Montréal, Montréal, QC, Canada, <sup>6</sup>Medicine, Université de Montréal, Montréal, QC, Canada, <sup>7</sup>Laboratoire MOVE (EA6314), Faculté des Sciences du Sport, Université de Poitiers, Poitiers, France

#### Edited by:

Claudia Voelcker-Rehage, Chemnitz University of Technology, Germany

#### Reviewed by:

Sean Commins, Maynooth University, Ireland
Wei Peng Teo, Deakin University, Australia

> \*Correspondence: Sarah A. Fraser sarah.fraser@uottawa.ca

Received: 16 November 2016 Accepted: 23 December 2016 Published: 18 January 2017

#### Citation:

Fraser SA, Li KZ-H, Berryman N, Desjardins-Crépeau L, Lussier M, Vadaga K, Lehr L, Minh Vu TT, Bosquet L and Bherer L (2017) Does Combined Physical and Cognitive Training Improve Dual-Task Balance and Gait Outcomes in Sedentary Older Adults? Front. Hum. Neurosci. 10:688. doi: 10.3389/fnhum.2016.00688

Everyday activities like walking and talking can put an older adult at risk for a fall if they have difficulty dividing their attention between motor and cognitive tasks. Training studies have demonstrated that both cognitive and physical training regimens can improve motor and cognitive task performance. Few studies have examined the benefits of combined (cognitive and physical) training and whether this type of training transfers to walking or balance dual-tasks. This study examines the dual-task benefits of combined training in a sample of sedentary older adults. Seventy-two older adults (≥60 years) were randomly assigned to one of four training groups: Aerobic + Cognitive training (CT), Aerobic + Computer lessons (CL), Stretch + CT and Stretch + CL. It was expected that the Aerobic + CT group would demonstrate the largest benefits and that the active placebo control (Stretch + CL) would show the least benefit after training. Walking and standing balance were paired with an auditory n-back with two levels of difficulty (0- and 1-back). Dual-task walking and balance were assessed for pre- to post-test improvements in walk speed (m/s), cognitive accuracy (% correct) and several mediolateral sway measures. All groups improved their walk speed from pre- (M = 1.33 m/s) to post-test (M = 1.42 m/s, p < 0.001) and their accuracy from pre- (M = 97.57%) to post-test (M = 98.57%, p = 0.005). They also increased their walk speed in the more difficult 1-back (M = 1.38 m/s) in comparison to the 0-back (M = 1.36 m/s, p < 0.001) but reduced their accuracy in the 1-back (M = 96.39%) in comparison to the 0-back (M = 99.92%, p < 0.001).
Three of the five mediolateral sway variables (Peak, SD, RMS) showed significant reductions in sway from pre- to post-test (p-values < 0.05). With the exception of a group difference in accuracy between Aerobic + CT and Stretch + CT, there were no significant group differences after training. Results suggest that training can yield dual-task benefits, but that in this sedentary sample Aerobic + CT training was not more beneficial than other types of combined training.

Keywords: dual-task gait, dual-task balance, combined training (physical and cognitive dual-task training), transfer effects

# INTRODUCTION

Standing balance and walking are two motor activities that are assumed to be automatic and to require little conscious thought. However, in real-life situations, we are rarely simply standing still or walking without attending to some secondary task (e.g., walking and talking). Research examining the ability to attend to two tasks simultaneously during gait and balance suggests that both younger and older adults recruit some executive processes (planning, inhibiting, and switching/coordinating) to manage these dual-task situations (Woollacott and Shumway-Cook, 2002; Allali et al., 2008; Hausdorff et al., 2008). A large literature associates these executive processes with the prefrontal cortex (PFC; Miyake et al., 2000; Stuss and Alexander, 2000; Gunning-Dixon and Raz, 2003). The PFC has been shown to be sensitive to age-associated decline (Buckner, 2004; Hedden and Gabrieli, 2004). Moreover, studies suggest that cognitive functions largely subserved by the PFC can be improved through physical exercise and cognitive training interventions (Colcombe and Kramer, 2003; Erickson et al., 2007; Kramer and Erickson, 2007; Braver et al., 2009; Klingberg, 2010; Bherer et al., 2013). Given the potential plasticity of the PFC and its association with executive functions, the goal of the current study was to assess the benefits of combined physical and cognitive training on dual-task gait and balance in a sample of sedentary older adults.

Although both younger and older adults may recruit executive processes to manage dual-task gait and balance, the literature suggests that older adults' ability to manage such situations is reduced compared with that of younger adults (Woollacott and Shumway-Cook, 2002; Fraser et al., 2007; Li et al., 2012; Fraser and Bherer, 2013). Further, when active and sedentary older adults are compared, sedentary older adults show a greater risk of cognitive and physical declines that can affect executive functions and ultimately increase fall risk (Thibaud et al., 2011). This risk increases in dual-task situations (Montero-Odasso et al., 2012). In addition, certain types of dual-task combinations can increase dual-task interference (Fraser and Bherer, 2013). In particular, cognitive tasks that are more executive in nature seem to interfere more with gait (Al-Yahya et al., 2011; Buracchio et al., 2011; Walshe et al., 2015). A systematic review and meta-analysis found that cognitive tasks involving "mental tracking" (p. 725) increased interference during walking to a greater degree than simple reaction time tasks (Al-Yahya et al., 2011). The literature on motor-cognitive dual-tasks has demonstrated that both cognitive and motor factors may contribute to these age differences in dual-task abilities (Woollacott and Shumway-Cook, 2002; Hausdorff et al., 2008; Schaefer and Schumacher, 2010). In general, older adults with mild cognitive impairment or dementia tend to walk more slowly than healthy older adults (Montero-Odasso et al., 2012). In addition, gait measurement during dual-tasks has revealed differences in gait parameters between healthy and cognitively impaired seniors (Sheridan et al., 2003; Springer et al., 2006; Muir et al., 2012) and between fallers and non-fallers (Springer et al., 2006; Montero-Odasso et al., 2012).
Such findings suggest that dual-task gait might be an important clinical marker of cognitive decline and falls risk (Montero-Odasso et al., 2012).

Similarly, maintaining standing balance is considered a complex skill that requires several elements, and cognitive processing has been identified as one of them (Horak, 2006). Compared with the gait literature, the literature on dual-task balance assessments shows that this approach has been successful in differentiating fallers from non-fallers (Brauer et al., 2001; Condron et al., 2002; Hauer et al., 2003) and somewhat successful in differentiating healthy older adults from those with cognitive impairment (Zijlstra et al., 2008; Muir-Hunter et al., 2014). More research using static balance dual-tasks is required to determine whether this type of assessment can identify different forms of cognitive impairment. Nevertheless, gait and balance are both postural control tasks, and both share an association with executive functions (van Iersel et al., 2008) and the potential for cognitive plasticity.

Cognitive plasticity and enhanced motor function have been targeted in intervention studies. Several intervention studies have demonstrated that exercise training can improve cognitive outcomes (Predovan et al., 2012; Bherer et al., 2013; Langlois et al., 2013; Berryman et al., 2014; Bherer, 2015) and physical outcomes (Shumway-Cook et al., 1997; Cadore et al., 2013; de Labra et al., 2015). Regarding specific types of exercise training, aerobic training in particular has been associated with enhanced executive functions (Kramer et al., 1999; Hall et al., 2001; Colcombe and Kramer, 2003) and increased activations in the frontal and parietal cortices (Colcombe et al., 2004, 2006). However, most reviews and meta-analyses suggest that exercise interventions combining aerobic and strength training components lead to greater improvement in cognition (Colcombe and Kramer, 2003). Cognitive training (CT) can also lead to improvement (Segev-Jacubovski et al., 2011; Belleville and Bherer, 2012) in several cognitive domains, including working memory, attention, and executive functions (Li et al., 2008; Kueider et al., 2012; Leung et al., 2015). There is, however, some debate about the transfer of cognitive training to untrained tasks or everyday activities (Lee et al., 2012; Lussier et al., 2015). Despite this, some cognitive training interventions have shown transfer to motor outcomes in older adults, improving balance (Li et al., 2010), gait (Verghese et al., 2010), balance and gait (Smith-Ray et al., 2013), and activities of daily living (Willis et al., 2006).
More specifically, in the first three studies mentioned, participants received computerized cognitive training (seated at a computer), and outcome measures such as mediolateral sway (Li et al., 2010), gait speed in single- and dual-task walking (Verghese et al., 2010), and timed-up-and-go and distracted walking (Smith-Ray et al., 2013) improved in comparison to control groups that did not receive this training.

When exploring training specific to motor-cognitive dual-task situations, a systematic review of dual-task outcomes revealed that motor-cognitive dual-task training was beneficial to standing balance performance and that walking outcomes were improved by both single-task and dual-task training (Wollesen and Voelcker-Rehage, 2013). It is important to note that most of the studies in the review trained the component tasks either separately (cognitive task alone or motor task alone) or concurrently (a dual-task condition in which the cognitive and motor tasks were trained simultaneously), and then assessed single- and dual-task performance with the same combination of tasks. None of the studies specifically examined combined cognitive and physical training and its transfer to untrained dual-task balance and gait tasks.

A recent systematic review of combined cognitive and physical training interventions in older adults with or without cognitive impairment showed that this type of combined training was beneficial in two of the three studies on older adults without cognitive impairment and in four of the five studies examining individuals with cognitive impairment (Law et al., 2014). Only one of the studies in the review examined dual-task walking as an outcome measure and included an active placebo control group that received low-intensity exercise training (Schwenk et al., 2010). This study found that training older individuals with cognitive impairment in dual-task walking at an adaptive level of difficulty (challenging the patients) led to improvements in dual-task walking outcomes compared with the placebo control. Although the review explicitly highlights this study as a well-designed randomized controlled trial, it also raises a few critiques. Specifically, the components of the dual-task assessed at pre- and post-intervention were similar to the tasks being trained, and the placebo control did not receive the same number of training hours.

Taken together, few studies have explicitly examined transfer from a combined cognitive and physical training program to dual-task gait and balance outcomes in a sample of sedentary older adults. This large mobility study included several outcome measures (see Desjardins-Crépeau et al., 2016); only the results specific to balance and gait dual-task outcomes are reported here. The goal of the current study was to assess whether the benefits of combined cognitive and physical training transferred to dual-task walking and balance outcomes, and whether there would be synergistic effects when combining cognitive dual-task training and aerobic training, both of which have demonstrated specific benefits to executive function. In the present study, there were four training groups including aerobic, cognitive, or active control conditions. The first three groups all contained components designed to improve executive function, and the last group was considered our active placebo control. All groups were expected to benefit to some degree from training, but we expected the Aerobic + CT group to demonstrate greater improvements in dual-task walking and balance outcomes than the other three groups after training. Further, we expected the active placebo control to show the least benefit to dual-task walking and balance outcomes compared with the three other groups, which all included conditions designed to improve executive function.

# MATERIALS AND METHODS

# Participants

Sedentary older adults (≥60 years old) who performed less than 150 min of physical activity per week were recruited through public advertisements (flyers, newspapers) and from the research center's participant pool. One hundred and thirty-six individuals passed our phone screening, which had the following exclusion criteria: history of neurological disease or major surgery in the year preceding the study, uncorrected auditory or visual impairments, smoking, severe mobility limitations or any other contraindication to physical activity, and current engagement in any structured physical activity. These 136 individuals were randomly assigned to one of the four training groups: Aerobic + Cognitive Training (CT); Aerobic + Computer lessons (CL); Stretch + CT; and Stretch + CL. Our Aerobic + CT group contained both physical and cognitive components to improve executive function, and our active placebo control, containing no aerobic or cognitive dual-task training, was the Stretch + CL group. Of those 136, 11 abandoned the study before beginning any training (for personal reasons, time constraints, or recent injuries that restricted their ability to participate). Of the 125 who entered the study and provided written informed consent in accordance with the Montreal Geriatric Institute ethics committee and the Declaration of Helsinki, 22 abandoned during training and 31 were excluded at post-test for the following reasons: wearing a hearing aid (which affected performance on the auditory task delivered through a headset), having a Geriatric Depression Scale score greater than 11 (suggestive of mild/moderate depression), or insufficient training (less than 75% of training sessions completed). Reasons for abandoning typically related to an injury outside the lab that impeded participation, a death in the family, or other commitments. Those who abandoned were equally distributed across the training groups.
This study was carried out in accordance with the recommendations for human research of the ethics committee of the Montreal Geriatric Institute, which approved the protocol.
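The participant-flow numbers above can be checked with simple arithmetic. The sketch below only restates the counts given in the text (the function name is hypothetical) and shows that they reconcile with the final sample of 72:

```python
def participant_flow() -> dict:
    """Reconcile the participant counts reported in the Participants section."""
    randomized = 136            # passed phone screening and were randomized
    left_before_training = 11   # abandoned before any training session
    entered = randomized - left_before_training
    abandoned_during = 22       # abandoned during the 12-week training
    excluded_at_post = 31       # hearing aid, GDS > 11, or < 75% of sessions
    final_sample = entered - abandoned_during - excluded_at_post
    return {"entered": entered, "final_sample": final_sample}


flow = participant_flow()
print(flow)  # {'entered': 125, 'final_sample': 72}
```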

Descriptive characteristics (e.g., age, education) of the participants included in the final sample (n = 72; 51 women/21 men) are presented in **Table 1** at the beginning of the "Results" Section. It is important to note that we began balance assessments with the third cohort recruited for this study; as a result, fewer participants in each training group completed the dual-task balance assessments. For dual-task balance, n = 15 (Aerobic + CT); n = 12 (Aerobic + CL); n = 16 (Stretch + CT); and n = 11 (Stretch + CL).

Note: 6MWT, 6 min walk test (total meters completed in 6 min); PPT, modified Physical Performance Test (max score 36); ABC, Activities-Specific Balance Confidence questionnaire (max score 100% confidence in balance abilities); MMSE, Mini Mental State Exam (a global measure of cognition, max score 30); GDS, Geriatric Depression Scale (scores equal to or above 11 are indicative of mild/moderate depression).

#### Protocol

All participants underwent pre-testing across 3 days. On Day 1, pre-testing involved a comprehensive medical exam with a geriatrician. On Day 2, participants completed a full neuropsychological battery and a dual-task cognitive training baseline on the computer (see Desjardins-Crépeau et al., 2016 for a detailed description of the protocol). Finally, on Day 3, participants completed a battery of physical tests (including the 6 min walk test, the timed-up-and-go, and the short physical performance battery) and the dual-task walking and balance assessments. Upon completion of pre-testing, groups of 4–8 individuals began their respective training protocols, which involved 12 weeks of training, 3 times a week. Each week, all participants had two 60 min sessions of physical exercise (aerobic or stretch) and one 60 min session of cognitive stimulation (dual-task training or computer lessons). The mixed aerobic training involved a 5 min warm-up, 15 min of lower body resistance training, 30 min of cardiovascular exercise on a treadmill, and a cool-down. Exercise intensity increased over the sessions based on each individual's ratings of perceived exertion on the Borg (1998) scale. The stretching and toning group also had a 5 min warm-up and cool-down but spent the majority of the training time (50 min) performing whole-body stretching exercises, mainly in a seated position. The computer dual-task training involved two visual discrimination tasks (number and shape discrimination) based on previous cognitive training paradigms (Bherer et al., 2005, 2006, 2008). Participants were encouraged to respond as accurately and rapidly as possible to the tasks performed alone (single-task) and simultaneously (dual-task). Participants received continuous feedback on their response time during the session, and feedback on both response time and accuracy at the end of each session.
In contrast to the dual-task training, those who received computer lessons were given demonstrations and practice trials with different computer applications (e.g., Word and Excel) and learned how to search the internet. At the end of the 12 weeks, all participants returned for 2 days of post-testing. The first day of post-testing involved the same neuropsychological assessments completed at pre-test, and the second day involved the same battery of physical tests and dual-task walk and balance assessments.
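The training dose implied by this protocol, and the 75% adherence criterion used for exclusion, can be made concrete with a short calculation. This is a sketch that assumes every scheduled session counts equally toward adherence (the text does not say how missed physical versus cognitive sessions were weighted); the constant and function names are hypothetical:

```python
import math

WEEKS = 12
PHYSICAL_SESSIONS_PER_WEEK = 2    # aerobic or stretch, 60 min each
COGNITIVE_SESSIONS_PER_WEEK = 1   # dual-task training or computer lessons, 60 min
SESSION_MINUTES = 60
ADHERENCE_CUTOFF = 0.75           # fewer than 75% of sessions -> excluded


def scheduled_sessions() -> int:
    return WEEKS * (PHYSICAL_SESSIONS_PER_WEEK + COGNITIVE_SESSIONS_PER_WEEK)


def min_sessions_to_stay_in_sample() -> int:
    # Smallest whole number of sessions that is not below the cutoff
    return math.ceil(ADHERENCE_CUTOFF * scheduled_sessions())


def meets_adherence(completed: int) -> bool:
    return completed / scheduled_sessions() >= ADHERENCE_CUTOFF


total = scheduled_sessions()              # 36 sessions
hours = total * SESSION_MINUTES / 60      # 36.0 h of supervised training
print(total, hours, min_sessions_to_stay_in_sample())  # 36 36.0 27
```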

#### Main Outcome Measures

#### Cognitive Accuracy: N-back Task (With and Without Concurrent Walking/Balance)

The n-back task is a working memory task in which memory load can be parametrically manipulated (Jaeggi et al., 2003; Doumas et al., 2008). Typically, a series of stimuli is presented and the participant is asked to remember and respond with the stimulus heard "n" items back (0-back, 1-back, etc.). In the lowest-load version (0-back), individuals simply have to report the stimulus they just heard. As the number of items back increases, the working memory load increases, placing greater attentional demands on the individual (Jaeggi et al., 2003; Doumas et al., 2008). The auditory n-back used in this study involves working memory and mental tracking, as the participant has to remember the numbers that are continuously presented and, as each new number is presented, say out loud the number heard one item back. During the balance task, to minimize muscle fatigue from repeated balance assessments (Helbostad et al., 2010), we completed the dual-task balance with the 1-back only. During the walk portion, we used two levels of difficulty, the 0-back and the 1-back, with and without concurrent walking. In all conditions, single-task (n-back only) and dual-task (n-back + motor task), a mean accuracy score was computed as the percentage of correct responses out of the total possible responses.
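A minimal sketch of the accuracy score described above, assuming that each spoken response is aligned with the stimulus it follows and that the first n stimuli of a list have no scorable target (the text does not specify how list-initial items were scored). The function name is hypothetical:

```python
from typing import Optional, Sequence


def nback_accuracy(stimuli: Sequence[int],
                   responses: Sequence[Optional[int]],
                   n: int) -> float:
    """Percent correct out of all possible responses for one n-back list.

    responses[i] is the number said aloud after stimulus i was heard
    (None = no response). For i >= n, the correct answer is stimuli[i - n];
    positions i < n have no target and are not scored.
    """
    if len(stimuli) != len(responses):
        raise ValueError("expected one response slot per stimulus")
    if n < 0:
        raise ValueError("n must be non-negative")
    possible = len(stimuli) - n
    if possible <= 0:
        raise ValueError("list is too short for this n")
    correct = sum(1 for i in range(n, len(stimuli))
                  if responses[i] == stimuli[i - n])
    return 100.0 * correct / possible


stimuli = [9, 1, 4, 7]
print(nback_accuracy(stimuli, [9, 1, 4, 7], n=0))     # 100.0
print(nback_accuracy(stimuli, [None, 9, 1, 5], n=1))  # 2 of 3 correct
```

Under this scoring, a 1-back list of length L offers L − 1 possible responses, so the denominator shrinks by one relative to the 0-back.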

The n-back task used in the current study has already successfully demonstrated differences in cognitive performance during walking in a sample of older women who completed a combined training intervention (Fraser et al., 2014). In this auditory n-back, the to-be-remembered stimuli were numbers (0–9). A visual depiction of the 0-back and 1-back conditions is presented in **Figure 1**. The numbers were recorded in a female voice, and sound files (WAV files) of each number were used in pseudo-randomly ordered lists of numbers that were presented using E-Prime 2 software (Psychology Software Tools Inc., Sharpsburg, PA, USA). The numbers were pseudo-randomly ordered to ensure that there were no repeats (9-9) and no ordered series (1-2-3). The numbers were presented through wireless headphones (Sennheiser Canada, Pointe-Claire, QC, Canada). For DT-balance, six lists of eight numbers were created; two were used for practice and the remaining four for the test phase. For DT-walk, 10 lists of 12 numbers were created; two lists were used for practice and the remaining eight for the test phase. Because there were two levels of n-back in the DT-walk, these lists were used twice.

**Figure 1.** … immediately responds "9". In the 1-back condition, the participant hears the first stimulus "9" and has to keep it in mind until they hear the second stimulus "1", at which point they respond with the first item they heard, "9"; they then have to keep "1" in mind until presented with the next stimulus, and so on.
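The list constraints described above (no immediate repeats such as 9-9, no ordered series such as 1-2-3) can be met by drawing each digit only from the candidates that keep the list valid. This is a hedged sketch, not the authors' E-Prime procedure: it assumes that an "ordered series" means three consecutive digits stepping by +1 or −1, and it uses Python's `random` module with a fixed seed for reproducibility. Function names are hypothetical:

```python
import random
from typing import List


def is_valid(seq: List[int]) -> bool:
    """True if seq has no immediate repeats and no run of 3 stepping by +/-1."""
    for i in range(1, len(seq)):
        if seq[i] == seq[i - 1]:
            return False          # repeat such as 9-9
    for i in range(2, len(seq)):
        step1 = seq[i - 1] - seq[i - 2]
        step2 = seq[i] - seq[i - 1]
        if step1 == step2 and abs(step1) == 1:
            return False          # ordered series such as 1-2-3 (or 3-2-1)
    return True


def make_list(length: int, rng: random.Random) -> List[int]:
    seq: List[int] = []
    while len(seq) < length:
        # At most two digits are ever ruled out, so candidates is never empty
        candidates = [d for d in range(10) if is_valid(seq + [d])]
        seq.append(rng.choice(candidates))
    return seq


def make_lists(n_lists: int, length: int, seed: int) -> List[List[int]]:
    rng = random.Random(seed)
    return [make_list(length, rng) for _ in range(n_lists)]


balance_lists = make_lists(6, 8, seed=1)   # 2 practice + 4 test lists
walk_lists = make_lists(10, 12, seed=2)    # 2 practice + 8 test lists
practice_lists, test_lists = walk_lists[:2], walk_lists[2:]
```

Building each list incrementally avoids generate-and-reject loops: because the existing prefix is already valid, only the new digit's relation to the last one or two digits needs checking.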

#### Mobility Measures: Posture and Gait

In both the balance and walk portions of the physical assessment, participants wore headphones for the practice and test phases after instructions were provided. Prior to any testing, the participant's dominant leg was assessed by having the participant begin walking from a standing position three times (Fraser et al., 2007); the leg most often used for gait initiation was considered the dominant leg. For the balance portion, participants had to maintain balance on their dominant leg with their eyes open and arms at their sides for 20 s, with and without the 1-back task. Postural control was assessed with a MatScan floor mat that captures the plantar pressures and forces of the foot (Tekscan, Boston, MA, USA). The MatScan raw data were converted to center of pressure measures. In the current study, we measured area (cm<sup>2</sup>), velocity, peak-to-peak dispersion, and standard deviation of sway specific to the mediolateral (ML) and anterior-posterior (AP) directions. Based on the training benefits in ML sway reported by Li et al. (2010), we focused on area and ML sway variables measured during single- and dual-task balance (Peak ML (cm), SD ML (cm), RMS ML (cm<sup>2</sup>), Velocity ML (cm/s)). Mean scores for each ML sway variable were computed. In addition, at the end of the pre- and post-test sessions, we asked each participant to complete the Activities-Specific Balance Confidence (ABC) scale (Powell and Myers, 1995) as a standardized questionnaire of balance confidence in everyday activities. The scale has 16 items, each rated on a 0%–100% confidence scale; higher scores (closer to 100%) indicate greater confidence in one's balance during the activity in question. A mean confidence value across all 16 items was calculated for pre- and post-test.
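For readers unfamiliar with these center-of-pressure (COP) summaries, the sketch below computes common textbook definitions of the ML measures from a single ML COP trace. It is an assumption-laden illustration: MatScan's own exported definitions may differ, "Peak" is taken to be the peak-to-peak range mentioned above, RMS is taken as the root mean square of the mean-removed trace, and the sampling rate is a hypothetical parameter. Under these definitions, displacement measures come out in the units of the input trace (here cm) and velocity in cm/s:

```python
import math
from typing import Dict, Sequence


def ml_sway_measures(ml_cm: Sequence[float], fs_hz: float) -> Dict[str, float]:
    """Summary measures of a mediolateral COP trace sampled at fs_hz (assumed)."""
    n = len(ml_cm)
    if n < 2 or fs_hz <= 0:
        raise ValueError("need at least 2 samples and a positive sampling rate")
    mean = sum(ml_cm) / n
    centered = [x - mean for x in ml_cm]
    sum_sq = sum(c * c for c in centered)
    # Total ML path length and trial duration for the mean velocity
    path_cm = sum(abs(ml_cm[i] - ml_cm[i - 1]) for i in range(1, n))
    duration_s = (n - 1) / fs_hz
    return {
        "peak_to_peak_cm": max(ml_cm) - min(ml_cm),
        "sd_cm": math.sqrt(sum_sq / (n - 1)),   # sample SD
        "rms_cm": math.sqrt(sum_sq / n),        # RMS of mean-removed trace
        "mean_velocity_cm_s": path_cm / duration_s,
    }


# Toy trace: 5 samples at a hypothetical 1 Hz
print(ml_sway_measures([0.0, 1.0, 0.0, -1.0, 0.0], fs_hz=1.0))
```

With this definition the RMS differs from the sample SD only by the n versus n − 1 denominator, so the two measures are nearly redundant for long trials.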

For the walk portion, participants walked down a 37-m hallway at a comfortable, self-selected pace for 30 s, with and without the n-back task. Each meter of the hallway was marked on the floor, and an experimenter remained with the participant during each walk trial and recorded the number of steps taken. At the end of each trial, a beep in the headset cued the participant to stop and remain still, and the experimenter measured the distance from the dominant heel to the nearest meter mark on the floor to obtain an accurate measure of distance. Because each walk trial had a fixed duration (30 s), walk speed (m/s) was calculated as the distance covered divided by 30 s. Mean walk speed values (m/s) were calculated for each participant for single- and dual-task walking.
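Because every trial lasted exactly 30 s, walk speed reduces to distance divided by trial duration. A minimal sketch with hypothetical distances (the function names are illustrative, not from the study):

```python
from statistics import mean
from typing import Iterable

TRIAL_DURATION_S = 30.0  # fixed duration of every walk trial


def walk_speed(distance_m: float, duration_s: float = TRIAL_DURATION_S) -> float:
    """Walk speed in m/s for one fixed-duration trial."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return distance_m / duration_s


def mean_walk_speed(distances_m: Iterable[float]) -> float:
    """Mean speed over the trials of one condition (e.g., single-task walking)."""
    return mean(walk_speed(d) for d in distances_m)


# Hypothetical distances (m) for two walk trials of one participant
print(round(mean_walk_speed([40.2, 39.6]), 2))  # 1.33
```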

#### Procedure: Pre-Test and Post-Test Dual-Task Assessments

Each participant included in the final sample completed pre- and post-training physical assessments. At the beginning of each physical assessment session, participants were weighed (kg), and their height (cm) and abdominal circumference (cm) were recorded. While seated, participants received an explanation of the n-back task and were told that they would complete this task while seated and while balancing on one foot. They then completed single-task cognitive (SC; 1-back only) practice. Once their dominant leg was determined, participants practiced single-task balance (SB; balancing only) and dual-task balance (DTBal; balance and 1-back). All trials (SC, SB, DTBal) were 20 s in duration, started with three warning beeps, and finished with a single beep. After practice, participants completed the test phase, in which the condition order was ABCCBA: SC, SB, DTBal, DTBal, SB, SC. This ABCCBA order distributed the different conditions throughout the test phase and minimized the influence of fatigue on any one condition.

Once the balance portion of the physical assessment was complete, the experimenter explained the walk portion of the experiment. As in the balance portion, participants first practiced each condition (SC, single walk (SW), and DT-walk), followed by a test phase in an ABCCBA order similar to that of the balance test. All conditions were 30 s in duration, began with three beeps, and finished with one beep. Compared with the balance portion, the walk portion included additional trials, so that each participant had the following test order: SC, SC, SW, DT-walk, DT-walk, DT-walk, DT-walk, SW, SC, SC. Each participant completed this order first with the 0-back task and then with the 1-back task.
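Both test orders described above are mirrored (ABCCBA) sequences, with the walk order doubling the SC and DT-walk trials in each half. The sketch below generates them; the helper `mirrored_order` and its `repeats` parameter are hypothetical names used only for illustration.

```python
def mirrored_order(conditions, repeats=None):
    """Return an ABC...CBA order; repeats[i] copies of conditions[i] in each half."""
    if repeats is None:
        repeats = [1] * len(conditions)
    if len(repeats) != len(conditions):
        raise ValueError("need one repeat count per condition")
    first_half = [c for c, n in zip(conditions, repeats) for _ in range(n)]
    # Second half is the first half reversed, so early and late trials balance out
    return first_half + first_half[::-1]


# Balance test phase: SC, SB, DTBal, DTBal, SB, SC
balance = mirrored_order(["SC", "SB", "DTBal"])

# Walk test phase: SC and DT-walk appear twice in each half
walk = mirrored_order(["SC", "SW", "DT-walk"], repeats=[2, 1, 2])

# The walk order was run first with the 0-back task, then with the 1-back task
walk_schedule = [(level, cond) for level in ("0-back", "1-back") for cond in walk]
print(balance)
print(walk)
```

Mirroring places each condition equally early and late in the session, which is what lets the ABCCBA design spread fatigue across conditions.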

## Statistical Analyses

#### Training Effects

Before presenting any transfer from training to our single- and dual-task walk outcomes, it is important to establish whether the four groups improved after training. With respect to physical training, we expected the groups that received aerobic training to show greater physical benefits than the groups that received stretch training. Improvements from physical training were assessed with the pre- and post-test 6-min walk test (6MWT), and a one-way ANOVA on mean post-test scores for this variable was conducted to assess group differences. With respect to cognitive dual-task computer training, we examined whether the groups that received cognitive training reduced their dual-task costs on a visual dual-task to a greater extent than those that did not receive cognitive dual-task training. Change scores were computed as (dual-task costs (PRE) - dual-task costs (POST))/dual-task costs (POST); higher change scores indicate a greater reduction in dual-task costs after training. These change scores were also subjected to a one-way ANOVA to test group differences in training effects.
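The change score and the group comparison can be sketched as follows. The change score follows the formula in the text (note that it divides by the post-test cost, so it can exceed 1 when costs more than halve). The one-way ANOVA is a plain textbook implementation returning F, its degrees of freedom, and η²; it is an illustration, not the SPSS procedure the authors used, and the function names are hypothetical.

```python
def dtc_change_score(dtc_pre, dtc_post):
    """Change score as defined in the text: (DTC_pre - DTC_post) / DTC_post."""
    if dtc_post == 0:
        raise ZeroDivisionError("change score is undefined when the post-test cost is 0")
    return (dtc_pre - dtc_post) / dtc_post


def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for a list of samples."""
    means = [sum(g) / len(g) for g in groups]
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    # Between-group variability: group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variability: scores around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_squared


# A cost that halves from pre- to post-test gives a change score of 1.0
print(dtc_change_score(2.0, 1.0))
```

With four groups, `df_between` is 3. A p-value could then be obtained from the F distribution, for example with `scipy.stats.f.sf(F, df_between, df_within)`; `scipy.stats.f_oneway` runs the whole test in one call.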

#### Single and Dual-Task Walk Data

A full model was used to test within- and between-subjects main effects and interactions for our variable of interest (walk speed). To assess difficulty effects, task effects, time effects, and training group differences in walk speed, we conducted a 2 × 2 × 2 × 4 mixed ANOVA on mean walk speed with the within-subjects factors Task (Single vs. Dual), Time (Pre- vs. Post-test), and Difficulty (0-back vs. 1-back) and the between-subjects factor Group (Aerobic + CT, Aerobic + CL, Stretch + CT, Stretch + CL). The same ANOVA was used to test mean differences in accuracy (% correct).
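The main-effect means reported for this design (e.g., pre- vs. post-test walk speed) are marginal means: averages over all levels of the other factors. The sketch below computes simple marginal means from long-format cell data; the data are hypothetical, and averaging all rows equally is a simplification (SPSS estimated marginal means weight the four groups equally, so with unequal group sizes the two can differ).

```python
from collections import defaultdict
from itertools import product
from statistics import mean


def marginal_means(rows, factor, value="speed"):
    """Mean of `value` for each level of `factor`, pooling over all other factors."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[factor]].append(row[value])
    return {level: mean(values) for level, values in buckets.items()}


# Hypothetical cell means for the 2 x 2 x 2 within-subjects part of the design:
# +0.10 m/s at post-test, -0.04 m/s under dual-task
rows = [
    {"time": t, "task": k, "difficulty": d,
     "speed": 1.30 + (0.10 if t == "post" else 0.0) - (0.04 if k == "dual" else 0.0)}
    for t, k, d in product(["pre", "post"], ["single", "dual"], ["0-back", "1-back"])
]
by_time = marginal_means(rows, "time")
by_task = marginal_means(rows, "task")
print(by_time, by_task)
```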

#### Single and Dual-Task Balance Data

The analysis of the dual-task balance data differed from that of the walk data in that it involved only one level of the n-back task (1-back). For each ML sway variable and the cognitive accuracy score (means only), 2 × 2 × 4 mixed ANOVAs were conducted with the within-subjects factors Time (Pre vs. Post) and Task (Single vs. Dual) and the between-subjects factor Group (Aerobic + CT, Aerobic + CL, Stretch + CT, Stretch + CL). For all statistical analyses, IBM SPSS Statistics version 21 was used, alpha was set at 0.05, and all post hoc comparisons were Bonferroni corrected.
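A Bonferroni correction multiplies each raw p-value by the number of comparisons (capping at 1), which is equivalent to testing each comparison at alpha divided by that number. The sketch below applies it to six hypothetical pairwise group comparisons; the p-values are invented for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni-adjust p-values and flag which remain significant at alpha."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return adjusted, [p_adj < alpha for p_adj in adjusted]


# With four groups there are 4 * 3 / 2 = 6 pairwise comparisons
n_pairs = 4 * 3 // 2
raw = [0.0004, 0.003, 0.02, 0.2, 0.01, 0.5]  # hypothetical unadjusted p-values
adjusted, significant = bonferroni(raw)
print(adjusted, significant)
```

Note that a raw p of 0.01 is no longer significant after adjustment across six comparisons (0.01 × 6 = 0.06).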

#### RESULTS

#### Training Effects

The one-way ANOVA on physical improvements after training revealed that all groups improved on the 6MWT and that there were no significant differences between the groups (p = 0.21). The ANOVA on the dual-task cost change scores revealed significant differences between the groups, F(3,67) = 4.63, p = 0.005, η<sup>2</sup> = 0.17. The groups that received cognitive training had higher change scores than those that did not, and there was no difference between the two groups that received cognitive training (Aerobic + CT and Stretch + CT; p = 0.22). The Stretch + CT dual-task cost change (M = 1.19) was greater than that of Aerobic + CL (M = 0.02; p = 0.002) and Stretch + CL (M = 0.05; p = 0.005). The Aerobic + CT dual-task cost change (M = 0.74) was greater than that of Aerobic + CL (M = 0.02; p = 0.04) and marginally greater than that of Stretch + CL (M = 0.05; p = 0.07).
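The reported effect sizes behave like partial eta squared, which can be recovered from an F ratio and its degrees of freedom as η²ₚ = F·df₁ / (F·df₁ + df₂). Treating the reported values as partial η² is an assumption here. The check below also shows why a four-group comparison needs an effect df of 3: only df₁ = 3 reproduces the reported η² = 0.17 for the change-score ANOVA.

```python
def partial_eta_squared(f_stat, df_effect, df_error):
    """Partial eta squared implied by an F ratio: F*df1 / (F*df1 + df2)."""
    return (f_stat * df_effect) / (f_stat * df_effect + df_error)


# Dual-task cost change scores across four groups: df_effect = 4 - 1 = 3
print(round(partial_eta_squared(4.63, 3, 67), 2))   # 0.17
# The same F with df_effect = 1 would imply a much smaller effect
print(round(partial_eta_squared(4.63, 1, 67), 2))   # 0.06
# Main effect of time on walk speed (two levels): df_effect = 1
print(round(partial_eta_squared(27.83, 1, 68), 2))  # 0.29
```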

#### Single and Dual-Task Walk Data

The walk speed means were subjected to a 2 × 2 × 2 × 4 ANOVA to assess the within-subjects effects of difficulty, task, and time and the between-subjects effect of group. The ANOVA revealed a main effect of time, F(1,68) = 27.83, p < 0.001, η<sup>2</sup> = 0.29, in which walk speed was greater at post-test (M = 1.42 m/s, SE = 0.03 m/s) than at pre-test (M = 1.33 m/s, SE = 0.03 m/s). The ANOVA also revealed a main effect of task, F(1,68) = 193.92, p < 0.001, η<sup>2</sup> = 0.74, in which participants' walk speeds were greater in single-task (M = 1.39 m/s, SE = 0.02 m/s) than in dual-task (M = 1.35 m/s, SE = 0.02 m/s) conditions. There was also a main effect of difficulty, F(1,68) = 37.80, p < 0.001, η<sup>2</sup> = 0.36, in which participants walked faster in the more difficult 1-back condition (M = 1.38 m/s, SE = 0.02 m/s) than in the 0-back condition (M = 1.36 m/s, SE = 0.02 m/s).

The ANOVA also revealed two significant interactions. The first, a task by difficulty interaction, F(1,68) = 4.56, p = 0.036, η<sup>2</sup> = 0.06, supported the main effect findings: single-task (ST) walk speed was greater than dual-task (DT) walk speed in both the 0-back condition (ST: M = 1.38 m/s, SE = 0.02 m/s; DT: M = 1.34 m/s, SE = 0.02 m/s) and the 1-back condition (ST: M = 1.41 m/s, SE = 0.02 m/s; DT: M = 1.36 m/s, SE = 0.02 m/s). Further, although ST and DT differed significantly at both difficulty levels, the mean difference was larger in the 1-back contrast (0.05 m/s) than in the 0-back contrast (0.04 m/s). In addition, a time by task by difficulty interaction, F(1,68) = 5.83, p = 0.02, η<sup>2</sup> = 0.08, showed that there were differences between single-task and dual-task at each difficulty level, such that walk speed was greater in single-task than in dual-task and greater in 1-back than in 0-back, but that these differences were smaller at post-test (p-values for contrasts < 0.02) than at pre-test (p-values for contrasts < 0.003; see **Figure 2** for mean single- and dual-task values across time and difficulty level). Please see **Table 2** for the mean dual-task walk speed values across time, difficulty level and group.
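The interaction above is reported as raw single- minus dual-task differences (0.04 and 0.05 m/s). A common way to express the same contrast is as a relative dual-task cost, 100 × (ST − DT) / ST; the study does not state that it used this formula, so the sketch below is an assumption applied to the reported group-level means. A cost computed from group means is not identical to the mean of individual participants' costs.

```python
def dual_task_cost(single, dual):
    """Relative dual-task cost (%) for a measure where higher values are better."""
    if single == 0:
        raise ZeroDivisionError("single-task value must be non-zero")
    return 100.0 * (single - dual) / single


# Walk speed means (m/s) reported for the task by difficulty interaction: (ST, DT)
means = {"0-back": (1.38, 1.34), "1-back": (1.41, 1.36)}
costs = {level: dual_task_cost(st, dt) for level, (st, dt) in means.items()}
print({level: round(c, 2) for level, c in costs.items()})
```

On this scale the 1-back cost (about 3.5%) is larger than the 0-back cost (about 2.9%), in line with the larger raw difference in the 1-back contrast.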



TABLE 2 | Dual task mean values for walk speed and accuracy for each group, time and difficulty level.

Note: Standard error of the mean (SE) reported in the brackets beside the mean value.

#### Single and Dual-Task Accuracy Data (% Correct)

Similar to the walk speed data, the mean accuracy scores (% correct) were subjected to a 2 × 2 × 2 × 4 ANOVA to assess the within-subjects effects of difficulty, task, and time and the between-subjects effect of group. There were four significant main effects and three interactions involving difficulty. There was a main effect of time, F(1,68) = 8.38, p = 0.005, η<sup>2</sup> = 0.11, in which accuracy was higher at post-test (M = 98.57%, SE = 0.20%) than at pre-test (M = 97.57%, SE = 0.36%). There was also a main effect of task, F(1,68) = 7.80, p = 0.007, η<sup>2</sup> = 0.10, in which accuracy was higher in single-task (M = 98.60%, SE = 0.25%) than in dual-task (M = 97.72%, SE = 0.27%) conditions. The main effect of difficulty, F(1,68) = 72.64, p < 0.001, η<sup>2</sup> = 0.52, went in the expected direction, with participants responding more accurately in the easier 0-back condition (M = 99.92%, SE = 0.05%) than in the more difficult 1-back condition (M = 96.39%, SE = 0.41%). There was also a main effect of group, F(3,68) = 2.96, p = 0.038, η<sup>2</sup> = 0.12, in which the Aerobic + CT group had lower accuracy scores (M = 97.18%, SE = 0.38%) than the three other groups (Aerobic + CL: M = 98.29%, SE = 0.42%; Stretch + CT: M = 98.70%, SE = 0.41%; Stretch + CL: M = 98.46%, SE = 0.44%). This group effect was further qualified by a significant difficulty by group interaction, F(3,68) = 3.24, p = 0.027, η<sup>2</sup> = 0.13, in which the Aerobic + CT group had lower 1-back accuracy scores (M = 94.38%, SE = 0.76%) than the Stretch + CT group (M = 97.50%, SE = 0.82%) only. In addition to this interaction, there was a difficulty by time interaction, F(1,68) = 7.08, p = 0.01, η<sup>2</sup> = 0.09, which showed no difference from pre- to post-test (p = 0.49) in the 0-back condition but a significant improvement from pre-test (M = 95.25%, SE = 0.71%) to post-test (M = 97.54%, SE = 0.41%) in the 1-back condition. Finally, there was a difficulty by task interaction, F(1,68) = 6.27, p = 0.015, η<sup>2</sup> = 0.08, in which single-task (M = 97.21%, SE = 0.50%) and dual-task (M = 95.58%, SE = 0.53%) accuracy differed significantly only in the 1-back condition. Please see **Table 2** for the mean dual-task accuracy values across time, difficulty level and group, and **Figure 3** for mean single- and dual-task accuracy across time and difficulty level.

#### Single and Dual-Task Mediolateral Balance Data

The balance dual-task involved only one level of difficulty, the 1-back. We conducted 2 × 2 × 4 mixed ANOVAs (time by task by group) on each sway variable and on mean accuracy (% correct). For velocity, there were no significant effects. For peak sway, there was a main effect of time, F(1,50) = 3.99, p = 0.05, η<sup>2</sup> = 0.07, in which peak sway was higher at pre-test (M = 11.68 cm, SE = 1.02 cm) than at post-test (M = 9.80 cm, SE = 0.95 cm). The standard deviation (SD) sway variable also showed a significant main effect of time, F(1,50) = 5.73, p = 0.02, η<sup>2</sup> = 0.10, in which SD sway was higher at pre-test (M = 2.31 cm, SE = 0.22 cm) than at post-test (M = 1.84 cm, SE = 0.21 cm). A significant time by task by group interaction, F(3,50) = 3.40, p = 0.025, η<sup>2</sup> = 0.17, was also found for the SD variable. Post hoc analyses revealed that the Stretch + CL group showed a significant change (p = 0.001) in single-task SD (balancing alone) from pre-test (M = 2.86 cm, SE = 0.49 cm) to post-test (M = 1.42 cm, SE = 0.48 cm), whereas none of the other groups showed a significant change. The root mean square (RMS) variable also showed a main effect of time, F(1,50) = 4.26, p = 0.04, η<sup>2</sup> = 0.08, in which RMS was higher at pre-test (M = 25.40 cm<sup>2</sup>, SE = 0.89 cm<sup>2</sup>) than at post-test (M = 23.47 cm<sup>2</sup>, SE = 0.92 cm<sup>2</sup>). In addition, there was a significant time by task interaction, F(1,50) = 7.55, p = 0.008, η<sup>2</sup> = 0.13, in which single-task RMS did not differ significantly from pre- to post-test (p = 0.14), whereas dual-task RMS was higher at pre-test (M = 25.50 cm<sup>2</sup>, SE = 0.90 cm<sup>2</sup>) than at post-test (M = 23.04 cm<sup>2</sup>, SE = 0.91 cm<sup>2</sup>; p = 0.015).
For the 1-back mean accuracy during balancing, the 2 × 2 × 4 mixed ANOVA revealed a main effect of time, F(1,50) = 16.61, p < 0.001, η<sup>2</sup> = 0.24, in which post-test accuracy (M = 95.97%, SE = 0.75%) was higher than pre-test accuracy (M = 91.72%, SE = 1.09%). There was also a main effect of task, F(1,50) = 4.36, p = 0.042, η<sup>2</sup> = 0.08, in which single-task 1-back accuracy (M = 95.10%, SE = 0.89%) was higher than dual-task 1-back accuracy (M = 92.59%, SE = 1.07%). There were no other significant effects or interactions. **Table 3** presents the mean dual-task mediolateral sway values for each group at pre- and post-test.

# DISCUSSION

In the current study, age-associated changes in physical and cognitive function were targeted with a combined training (physical and cognitive) protocol. Participants were randomly assigned to one of four different training protocols (Aerobic + CT, Aerobic + CL, Stretch + CT, Stretch + CL). The goal of the study was to assess whether or not the benefits of a combined training protocol would transfer to untrained gait and balance dual-tasks. Based on the literature, it was hypothesized that the Aerobic + CT group would demonstrate the greatest improvements in dual-task gait and balance in comparison to other groups. In addition, we predicted that the Stretch + CL group, our active control, would demonstrate the least improvements in dual-task gait and balance after training. Our findings only partially support our hypotheses, as we did find improvements in dual-task walk and balance outcomes from pre to post-test, but we did not find that our Aerobic + CT group improved to a greater degree than the other groups, or that our placebo-placebo control (Stretch + CL) improved the least.

# Training Effects

All groups demonstrated improvements in our outcome measures for physical (6MWT) and cognitive dual-task training (dual-task cost change scores). In terms of physical training, the 6MWT did not reveal any differences between our groups in performance gains, such that the groups containing aerobic training did not benefit to a greater degree than the groups containing stretch training. For the cognitive training, a majority of the group contrasts suggest that the cognitive dual-task training was beneficial in reducing dual-task costs post-test for the groups that received this specific type of training compared to the other groups who had computer lessons.

# Single and Dual-Task Walk Findings

Consistent across all groups, and confirming the dual-task effect, walk speed and accuracy were higher in single-task than in dual-task conditions. Interestingly, when examining the main effects of difficulty, all groups walked faster at the harder difficulty level (1-back) than at the easier difficulty level (0-back), but all were less accurate in the 1-back than in the 0-back. This increase in the speed of the motor response under dual-task conditions is similar to the facilitation we previously reported for dual-task treadmill walking (Fraser et al., 2007): when walk speed was fixed, participants responded more rapidly to the cognitive task while walking than while seated. Perhaps the ability to speed up one's response, whether cognitively or, as in the present study, by increasing walk speed during a more challenging dual-task, allows for better management of the walking dual-task. The task by difficulty effect in walk speed supports the view that the 1-back was more difficult than the 0-back, as the difference between single- and dual-task walk speed was greatest in the 1-back condition. Further, although accuracy levels were high overall, 1-back accuracy was clearly lower than 0-back accuracy, which also supports the manipulation of difficulty.

#### TABLE 3 | Mean dual-task mediolateral (ML) sway values for each group and time.

Note: Standard error of the mean (SE) reported in the brackets beside the mean value.

The full model also revealed that all participants in our sample benefited from training. Post-test walk speed and accuracy values were greater than pre-test values, and the differences between single- and dual-task walk speeds were diminished at post-test. With the exception of a slight accuracy difference between our Aerobic + CT and Stretch + CT groups, our walk results do not reveal specific group differences but rather show that all groups improved in accuracy and walk speed from pre- to post-test. Despite the lack of significant group differences in the full statistical model, from a clinical standpoint a 0.05 m/s increase in walk speed represents a small meaningful change and a 0.10 m/s increase represents a substantial meaningful change (Perera et al., 2006; Kwon et al., 2009). In the current study, the only group that did not demonstrate a clinically meaningful change in walk speed from pre- to post-test was our active placebo control. Groups with an aerobic and/or cognitive training component demonstrated an increase in walk speed of 0.09 m/s or greater. As such, from a clinical point of view, it may be more beneficial to include an aerobic and/or cognitive training component in a combined training protocol.
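The clinical thresholds cited above can be applied mechanically to each group's pre-post gain. The sketch below uses the dual-task 1-back walk speed gains reported in the Limitations section for the four groups (the placebo-placebo group being Stretch + CL); treating the thresholds as inclusive (≥) and rounding gains to 0.01 m/s are assumptions, and the constant and function names are hypothetical.

```python
SMALL_CHANGE_M_S = 0.05        # small meaningful change (Perera et al., 2006; Kwon et al., 2009)
SUBSTANTIAL_CHANGE_M_S = 0.10  # substantial meaningful change


def classify_speed_gain(gain_m_s):
    """Label a pre-post gain in walk speed against the two thresholds."""
    gain = round(gain_m_s, 2)  # compare at the 0.01 m/s precision reported in the text
    if gain >= SUBSTANTIAL_CHANGE_M_S:
        return "substantial"
    if gain >= SMALL_CHANGE_M_S:
        return "small"
    return "below threshold"


# Pre-post gains (m/s) in dual-task 1-back walk speed per group
gains = {"Aerobic + CT": 0.09, "Aerobic + CL": 0.12, "Stretch + CT": 0.10, "Stretch + CL": 0.04}
labels = {group: classify_speed_gain(g) for group, g in gains.items()}
print(labels)
```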

The results of the full model differ from the study by Schwenk et al. (2010) that targeted dual-task walk speed as a primary outcome and found significant group differences in their most difficult dual-task gait condition.

There are several possible reasons for these contrasting findings. First, Schwenk et al. (2010) chose a type of exercise training that was very close to their outcome measure; as such, they successfully demonstrated near transfer (training effects in a task similar to the trained task). Our training protocol did not explicitly train dual-task walking but rather focused on aerobic exercise and cognitive dual-task training at a computer. Our training was therefore distinct from our outcome measure, and we were assessing far transfer effects to dual-task walking. Our results suggest that any type of active intervention in a sample of sedentary older adults can improve dual-task gait. In addition, while both studies included a 12-week training protocol, in Schwenk et al.'s (2010) study the groups did not receive the same number of intervention hours (the control group had half the time of the experimental group), whereas all participants in our 12-week intervention were exposed to the same number of intervention hours and always trained in groups. It is unknown how this influences the differences in our findings, but it is possible that the differences between our groups were reduced because all groups had the same number of intervention hours. Another important difference between the studies is that Schwenk et al. (2010) trained older adults with cognitive impairment (dementia), whereas we trained individuals without cognitive impairment. A meta-analysis examining the effects of aerobic exercise on cognitive performance suggests that greater improvements in memory may be seen in those with cognitive impairment than in those without (Smith et al., 2010). Although sedentary, our sample was relatively healthy, without cognitive impairment or functional limitations, and this may have minimized the gains seen after training. One interesting similarity between the two studies is that the most consistent findings emerge in the more difficult condition.
In our study, although we found robust differences between 0-back and 1-back performance supporting a difficulty effect, it is possible that if we had challenged our groups with an additional difficulty level (for example, a 2-back), we would see differences between our training groups in a more challenging dual-task walk situation. Also, in the current study we chose to focus on walk speed; in a relatively high-functioning sample, other gait parameters (e.g., stride time variability) might have been more sensitive to training-induced changes (Lamoth et al., 2011).

While it was important in the current study to control for the number of visits and interactions with training groups by including active control groups, another possible reason for the improvements across all groups may have been our choice of alternate physical training (stretch training). Although the stretch training in the current study did not target aerobic capacity, it did have a resistance training component aimed at improving lower body strength. The improvement in lower body strength may have transferred to improvements in walk speed. Indeed, there is some evidence that certain types of stretching/resistance training exercises can influence gait speed (Stanziano et al., 2009; van Abbema et al., 2015).

# Single and Dual-Task Balance Findings

The main finding from the mediolateral sway measures was that all participants improved from pre- to post-test in peak sway, SD, and RMS. In the SD variable, there was a time by task by group interaction in which the active placebo-placebo group (Stretch + CL) showed a significant improvement in single-task performance from pre- to post-test, whereas the other groups did not. This finding should be interpreted with caution, as direct comparisons of the groups on the single-task sway measures by means of a one-way ANOVA did not find any significant differences between the groups. The RMS sway value also showed a task by time interaction in which there was no improvement from pre- to post-test in single-task performance but there was improvement (reduced sway) in dual-task performance. It is also important to note that our standardized paper-and-pencil measure of balance confidence, the ABC scale, did not reveal any differences between our groups, and all groups had high scores, suggesting that our findings are not influenced by a lack of confidence in balance abilities.

In terms of accuracy on the 1-back task, all groups showed the typical task effect, in which single-task cognitive performance (seated in a chair) was more accurate than dual-task cognitive performance (while balancing). In addition, all groups improved their accuracy significantly from pre- to post-test. Similar to the dual-task gait findings, all participants improved in several (but not all) mediolateral postural sway variables and in their cognitive accuracy during balance. Our results also suggest that sedentary adults can show reduced sway (improved balance) after an active intervention protocol. There were limited task effects in the balance data. There was one interaction involving time, task, and group, in which single-task SD sway improved in the Stretch + CL group but not in the other groups. This finding is difficult to explain, as the groups did not differ in single-task SD when a one-way ANOVA comparing groups was conducted. It may be that the limited sample size for the balance data influenced the outcome of the larger 2 × 2 × 4 mixed ANOVA. The RMS variable seems to be sensitive to task effects, showing dual-task balance improvements over time across groups, but additional research with larger samples is needed to evaluate the importance of these variables as outcome measures of dual-task balance.

# Limitations

Limitations specific to our training protocols and active control groups are discussed elsewhere (see Desjardins-Crépeau et al., 2016). Specific limitations for this portion of the study relate to the small sample size for the balance outcome measures, which limits the interpretation and generalizability of the results. Nonetheless, there is very limited literature on dual-task static balance as an outcome measure for combined cognitive and physical training programs that do not train the components of the outcome measure being assessed (i.e., single- and dual-task balance training). The results of this study can be used to inform the selection of variables of interest in future combined intervention studies with static sway variables as outcomes. Regarding the dual-task gait outcomes, the sample size is larger, and the changes in dual-task 1-back walk speed (see **Table 1**) argue against simple test-retest effects, as the three groups that had cognitive and/or aerobic training had larger pre-post gains in walk speed (Aerobic + CT: 0.09 m/s, Aerobic + CL: 0.12 m/s, and Stretch + CT: 0.10 m/s) than the placebo-placebo group (Stretch + CL: 0.04 m/s). Despite this, given the findings, it is unclear what specifically led to the improved dual-task gait performance. Because we recruited sedentary older adults, who may have fewer outings and less social contact, it is possible that the social interaction produced by the regular group meetings contributed to the improved outcomes, and this factor was not measured in the current study. It would be important in future combined cognitive and physical training studies to control for the social factors of regular group sessions.

Further, it is important to note that our methodological choice of a combined training protocol limited the number of training sessions we were able to provide to our participants each week. Although our group has demonstrated cognitive improvements from a 12-week training protocol (Predovan et al., 2012) and balance benefits from cognitive training (Li et al., 2010), those protocols were able to devote all weekly sessions to a single training type (either aerobic or cognitive), which might have boosted the benefits of training. In our protocol, we had to reduce cognitive training to once per week and physical training to twice per week in order to provide both cognitive and physical training in the same protocol. Wollesen and Voelcker-Rehage (2013) indicated in their review that not only the type but also the amount of training can influence dual-task outcomes. As such, the combined protocol might have reduced the training effects, as the amount of cognitive and physical training per week might not have been sufficient to demonstrate the potential synergistic effects of combined training.

#### CONCLUSION

Given the clinical importance of dual-task gait and balance assessments in identifying individuals at risk for falls and cognitive decline, and the potential for training to improve dual-task balance and gait, future combined training studies should include dual-task outcome measures in order to tease out what kind of training (perhaps adaptive), how often, and at what intensity would be most beneficial for improving dual-task balance and gait. Future studies using portable imaging technologies such as functional near-infrared spectroscopy will complement the behavioral findings of this study and provide more clarity on cognitive plasticity in the prefrontal cortex (PFC). Certainly, the training gains seen across all groups in walk and certain balance measures in the current study suggest that encouraging sedentary older adults to participate actively in a training protocol may have transferable benefits to dual-task gait and balance that could potentially reduce fall risk and cognitive decline in this population.

#### AUTHOR CONTRIBUTIONS

SAF: design, protocol, training of graduate students, data analysis, manuscript write-up. KZ-HL: design, principal investigator, secured funding for the project, trained a graduate student for balance data analysis, provided feedback. NB: design for physical assessments (6MWT, etc.), training of all trainers for the physical portion of the study, and data analysis of the physical data. LD-C: responsible for all the testing, data entry, and analysis of the neuropsychological data. ML: responsible for the cognitive training and computer lesson component, and all the data related to this component. KV: responsible for converting the raw balance data to center of pressure scores, provided feedback on a preliminary version of the manuscript. LL: responsible for the physical training with participants, data entry for physical and neuropsychological data. TTMV: medical doctor responsible for all the clinical assessments of participants, provided feedback on a preliminary version of the manuscript. LBosquet: design, physical assessment, one of the principal investigators who secured funding for the project. LBherer: design, cognitive training component, principal investigator who secured funding for the project; provided feedback on several versions of the manuscript.

#### ACKNOWLEDGMENTS

This study was supported by a Canadian Institutes of Health Research grant (#187596) to LBherer, LBosquet and KZ-HL. SAF was supported by a "Fonds Québec Recherche en Nature et Technologies" Postdoctoral Fellowship, and LBherer was supported by the Canadian Research Chair Program. We would like to thank all the research assistants, two study co-ordinators (Mélanie Renaud and Chantal Mongeau), and all the participants. The authors would like to thank the Canadian Institutes of Health Research, which supported this large mobility in aging trial.

#### REFERENCES

van Abbema, R., De Greef, M., Crajé, C., Krijnen, W., Hobbelen, H., and Van Der Schans, C. (2015). What type, or combination of exercise can improve preferred gait speed in older adults? A meta-analysis. BMC Geriatr. 15:72. doi: 10.1186/s12877-015-0061-9

meta-analysis. Neurosci. Biobehav. Rev. 35, 715–728. doi: 10.1016/j.neubiorev.2010.08.008

fitness and neuropsychological outcomes in healthy older adults. Clin. Interv. Aging 11, 1287–1299. doi: 10.2147/cia.s115711


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Fraser, Li, Berryman, Desjardins-Crépeau, Lussier, Vadaga, Lehr, Minh Vu, Bosquet and Bherer. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Transfer Effects to a Multimodal Dual-Task after Working Memory Training and Associated Neural Correlates in Older Adults – A Pilot Study

#### Stephan Heinzel<sup>1,2,3</sup>, Jérôme Rimpel<sup>4</sup>, Christine Stelzel<sup>2,5,6</sup> and Michael A. Rapp<sup>2</sup>\*

<sup>1</sup> Clinical Psychology and Psychotherapy, Freie Universität Berlin, Berlin, Germany, <sup>2</sup> Social and Preventive Medicine, University of Potsdam, Potsdam, Germany, <sup>3</sup> Department of Psychology, Humboldt-Universität zu Berlin, Berlin, Germany, <sup>4</sup> Clinical Psychology and Neuropsychology, Johannes Gutenberg University Mainz, Mainz, Germany, <sup>5</sup> International Psychoanalytic University, Berlin, Germany, <sup>6</sup> Berlin School of Mind and Brain, Berlin, Germany

Working memory (WM) performance declines with age. However, several studies have shown that WM training may lead to performance increases not only in the trained task, but also in untrained cognitive transfer tasks. It has been suggested that transfer effects occur if training task and transfer task share specific processing components that are supposedly processed in the same brain areas. In the current study, we investigated whether single-task WM training and training-related alterations in neural activity might support performance in a dual-task setting, thus assessing transfer effects to higher-order control processes in the context of dual-task coordination. A sample of older adults (age 60–72) was assigned to either a training or control group. The training group participated in 12 sessions of an adaptive n-back training. At pre and post-measurement, a multimodal dual-task was performed in all participants to assess transfer effects. This task consisted of two simultaneous delayed match to sample WM tasks using two different stimulus modalities (visual and auditory) that were performed either in isolation (single-task) or in conjunction (dual-task). A subgroup also participated in functional magnetic resonance imaging (fMRI) during the performance of the n-back task before and after training. While no transfer to single-task performance was found, dual-task costs in both the visual modality (p < 0.05) and the auditory modality (p < 0.05) decreased at post-measurement in the training but not in the control group. In the fMRI subgroup of the training participants, neural activity changes in left dorsolateral prefrontal cortex (DLPFC) during one-back predicted post-training auditory dual-task costs, while neural activity changes in right DLPFC during three-back predicted visual dual-task costs. Results might indicate an improvement in central executive processing that could facilitate both WM and dual-task coordination.

#### Edited by:

Claudia Voelcker-Rehage, Chemnitz University of Technology, Germany

#### Reviewed by:

Mario Bonato, Ghent University, Belgium

Daniel Barulli, Columbia University, USA

\*Correspondence: Michael A. Rapp michael.rapp@uni-potsdam.de

Received: 15 September 2016 Accepted: 13 February 2017 Published: 24 February 2017

#### Citation:

Heinzel S, Rimpel J, Stelzel C and Rapp MA (2017) Transfer Effects to a Multimodal Dual-Task after Working Memory Training and Associated Neural Correlates in Older Adults – A Pilot Study. Front. Hum. Neurosci. 11:85. doi: 10.3389/fnhum.2017.00085

Keywords: working memory, cognitive training, modality, dual-task, aging, transfer, fMRI, neuroimaging

# INTRODUCTION

fnhum-11-00085 February 23, 2017 Time: 17:51 # 2

Aging is associated with neurochemical, structural, and functional brain changes (Grady, 2012) that affect various cognitive functions. One central cognitive function that is affected by these changes and known to decline in older age is working memory (WM) (Bopp and Verhaeghen, 2005). Neuroimaging studies have identified brain areas that play a key role in WM processing, including the lateral prefrontal cortex, the inferior parietal lobule, and medial frontal regions (Miller and Cohen, 2001; Owen et al., 2005; D'Esposito, 2007). It has been suggested that the efficiency of such a frontoparietal WM network is reduced in older adults, as indicated by relatively higher activation at low WM load and relatively lower activation at high WM load compared with younger adults (Schneider-Garces et al., 2010; Nagel et al., 2011; Heinzel et al., 2014a). These age-related changes in WM load-dependent activation patterns have been described within the framework of the compensation-related utilization of neural circuits hypothesis (CRUNCH, Reuter-Lorenz and Cappell, 2008). Specifically, an over-recruitment of neural resources at low WM load has been associated with inefficient neural processing (Barulli and Stern, 2013).

With respect to training effects, several studies have indicated that WM training leads to an increase in WM performance (Klingberg et al., 2002, 2005; Westerberg et al., 2007; Holmes et al., 2009, 2010; Thorell et al., 2009; Strobach et al., 2014). More importantly, WM training has been shown to improve performance in a broad range of other cognitive domains, such as executive control (Olesen et al., 2004; Klingberg et al., 2005; Westerberg et al., 2007; Thorell et al., 2009; Chein and Morrison, 2010; Brehmer et al., 2012; Strobach et al., 2014), episodic memory (Dahlin et al., 2008b; Lövdén et al., 2010; Richmond et al., 2011), and fluid intelligence (Klingberg et al., 2002; Olesen et al., 2004; Jaeggi et al., 2008, 2010; Rudebeck et al., 2012; Stephenson and Halpern, 2013; Au et al., 2015). Moreover, WM training is effective in older adults and has the potential to reduce age-related WM decline (Li et al., 2008; Schmiedek et al., 2010; Richmond et al., 2011; Brehmer et al., 2012; Buschkuehl et al., 2012; Heinzel et al., 2014b, 2016). Likewise, training of specific executive control processes, so-called process-based interventions, shows similar beneficial effects in young and older adults (e.g., Li et al., 2008; Karbach and Kray, 2009; Brehmer et al., 2012; Zinke et al., 2012; see Karbach and Verhaeghen, 2014, for a review). Dual-task training is an important area within this research domain, with promising effects on both training and transfer tasks (Liepelt et al., 2011; Strobach et al., 2012, 2015). Cognitive training research assumes that similar mechanisms underlie training and transfer effects. Most WM training studies suppose that training improves executive control processes that are involved in the transformation and coordination of WM contents (Baddeley, 2003; Curtis and D'Esposito, 2003; Fuster, 2004).
Likewise, many dual-task training studies assume that the executive control processes involved in coordinating the two component tasks (Meyer and Kieras, 1997; Szameitat et al., 2002; Stelzel et al., 2007) are optimized via training (Liepelt et al., 2011; Strobach et al., 2012, 2015), and that learning such stimulus-independent processes, rather than the mere improvement in the specific component tasks, forms the basis for transfer effects.

Importantly, it has been suggested that age-related deficits in dual-task performance (Lindenberger et al., 2000; Hartley, 2001; Verhaeghen and Cerella, 2002; Dubost et al., 2006; Göthe et al., 2007; Granacher et al., 2011) and corresponding changes in neural activation (Hartley et al., 2011; Chmielewski et al., 2014) may result from an underlying WM dysfunction (Awh et al., 2006; Gazzaley and Nobre, 2012).

Considering the effectiveness of WM and dual-task training programs in achieving both training and transfer effects in older adults, as well as the overlapping constructs of WM and dual-task performance (Sala et al., 1995; Baddeley, 1996; Hegarty et al., 2000), we assume that WM training can improve dual-task performance, as reflected by a decrease in dual-task costs. According to notions of "neural transfer" (Dahlin et al., 2008a; Buschkuehl et al., 2012; Heinzel et al., 2014a), a training-related increase in the neural efficiency of WM processing may facilitate executively demanding dual-task coordination, because more neural resources become available for dual-task coordination over and above modality-specific improvements in the component single-tasks.

To our knowledge, it has not been studied systematically whether dual-task costs in older adults can be reduced by training on a single n-back WM task. The present study therefore asked whether such WM training leads to a transfer effect on dual-task performance. We trained older adults in a single n-back task with visually presented numerical stimuli (Cohen et al., 1997). The transfer dual-task consisted of a novel multimodal delayed match-to-sample paradigm that included visual and auditory stimulus modalities. We hypothesized that transfer of WM training effects to executive dual-task processes would improve performance in both stimulus modalities in the dual-task context, as measured by reduced dual-task costs for both tasks.

The current study included older participants from a previously published training group that performed the n-back task during functional magnetic resonance imaging (fMRI) measurement pre- and post-training (Heinzel et al., 2014a, 2016), as well as an unpublished control group. As reported by Heinzel et al. (2014a, 2016), the blood-oxygen-level-dependent (BOLD) signal in WM-related fronto-parietal regions decreased at low WM load after training in the training group, indicating a training-related increase in WM processing efficiency (Lustig et al., 2009).

We investigated the hypothesis that training-related changes in the BOLD response during one-back (low WM load) in literature-based, WM-related regions of interest (ROIs) predict dual-task costs after training. Based on previous research on the neural correlates of both the central executive components of WM (D'Esposito et al., 2000; Collette and Van der Linden, 2002; Baddeley, 2003; Mohr et al., 2006) and dual-task coordination (Goldberg et al., 1998; Herath, 2001; Loose et al., 2003; Schubert and Szameitat, 2003; Nebel et al., 2005; Yildiz and Beste, 2015), we expected changes in the BOLD response in the dorsolateral prefrontal cortex (DLPFC) to be related to behavioral transfer effects on the dual-task.

# MATERIALS AND METHODS


#### Participants

Altogether, 38 older adults (range: 60–72 years) were recruited through announcements in local newspapers and on the internet. For four participants, the dual-task data were not correctly recorded due to technical failures during data acquisition. Therefore, the total sample consisted of 34 participants (see **Table 1**). Eighteen participants (11 females; mean ± SD age = 65.78 ± 3.04) were included in the training group and 16 participants (11 females; mean ± SD age = 65.00 ± 3.67) in a no-contact control group, which was matched one by one to the training group according to age, sex, and education to ensure that the two groups were comparable. Thirteen participants of the training group in the current study also participated in fMRI sessions before and after the training program. Detailed fMRI results have been reported previously (Heinzel et al., 2014a, 2016). The fMRI analyses presented in the current paper specifically test the hypothesis that pre–post activation changes in a DLPFC ROI predict performance in a behavioral dual-task at T2. All participants were native German speakers and right-handed (Oldfield, 1971). All had normal or corrected-to-normal vision, took no psychopharmacological medication, had no history of any psychiatric disease, and achieved 27 or more points in the Mini Mental Status Examination (MMSE, Folstein et al., 1975). Written informed consent was obtained from each participant after the procedures had been explained. The study was approved by the local Ethics Committee of the Charité Universitätsmedizin Berlin, Germany, and was conducted in accordance with the Declaration of Helsinki.

#### Design and Procedure

At the beginning and the end of the training/waiting period, all participants completed both an n-back task and a dual-task. Note that other results from the training group, including neuropsychological tests and a Sternberg transfer task, are reported elsewhere (Heinzel et al., 2016). The WM training took place over a period of 4 weeks and comprised 12 sessions of adaptive n-back training (approximately 45 min each). Training sessions took place on Mondays, Wednesdays, and Fridays in a quiet room at St Hedwig Hospital, Berlin, Germany. The control group was not contacted during this time. All tasks were presented with the software Presentation (Version 14.9; Neurobehavioral Systems).

#### n-Back Task and Adaptive Training

The n-back task comprised two runs, each consisting of 16 blocks, which were counterbalanced across subjects and presented in four pseudorandomized orders. Between blocks, a white fixation cross was presented for 12 s. During the n-back paradigm, a randomly assigned sequence of 16 numerical stimuli (0–9) was presented (Cohen et al., 1997) (**Figure 1**). Each stimulus was presented individually in the center of a black screen for 500 ms. Two difficulty levels were induced by two different interstimulus intervals (ISIs) of 500 or 1500 ms (pseudorandomized between blocks). Participants were required to indicate by a button press whenever a number re-occurred that had been presented one, two, or three trials before (1-, 2-, 3-back). In the zero-back condition, participants had to detect the number '0.' The respective WM load condition (0-, 1-, 2-, and 3-back) was indicated by a cue 2 s before each block began. The n-back task lasted approximately 22 min.
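To make the target rule concrete, the following Python sketch marks which positions in a digit stream would require a button press in each load condition. It encodes only the rule stated above: in zero-back, the target is the digit '0'; for n ≥ 1, the target is a digit that matches the one shown n trials earlier. The function name `nback_targets` is hypothetical and is not part of the Presentation scripts used in the study.

```python
def nback_targets(digits: list[int], n: int) -> list[bool]:
    """Return, for each position in the digit stream, whether it is a target.

    n == 0: the target is the digit 0 (zero-back condition).
    n >= 1: the target is a digit identical to the one shown n trials earlier.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return [d == 0 for d in digits]
    # The first n positions can never be targets: nothing was shown n trials before them.
    return [i >= n and d == digits[i - n] for i, d in enumerate(digits)]


# Example with a hypothetical stream of six digits.
stream = [3, 5, 3, 0, 3, 3]
print(nback_targets(stream, 0))  # [False, False, False, True, False, False]
print(nback_targets(stream, 2))  # [False, False, True, False, True, False]
```

The same stream therefore yields different target positions depending on n, which is how the load manipulation changes the task without changing the stimuli.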

The training group participated in an n-back training program over a period of 4 weeks with three sessions per week, resulting in 12 training sessions. In each training session, participants completed three runs of the n-back task, lasting approximately 45 min in total. Each run consisted of 12 blocks. In run 1 of session 1, all participants began the training at difficulty level 1 (four blocks each of zero-back, one-back, and two-back, at an ISI of 1500 ms). The difficulty level was individually adapted throughout all 12 training sessions in order to keep the task challenging during the entire training program (Doumas et al., 2009), and it varied across training runs according to individual performance. Task difficulty was increased by introducing higher WM load levels and by shortening the ISI (Heinzel et al., 2014c). If a participant completed a run with a hit rate of 80% or above in each block and a false alarm rate below 15%, the next difficulty level was introduced in the following run. From level 1 to level 3, the ISI was gradually decreased from 1500 to 500 ms in steps of 500 ms. At level 4, the next n-level (three-back) was introduced and zero-back was removed, i.e., participants completed 1-, 2-, and 3-back tasks; in addition, the ISI was reset to 1500 ms. At level 7, four-back was introduced and one-back was removed.
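The adaptive rule described above can be summarized as a small state machine. The Python sketch below illustrates the rule under stated assumptions and is not the training software used in the study. It assumes that the ISI is also reset to 1500 ms at level 7 (stated explicitly above only for level 4), that the three-level pattern continues beyond level 7, that the false-alarm criterion applies to every block, and that a failed run leaves the level unchanged. All names are hypothetical.

```python
from dataclasses import dataclass

# ISI sequence within each set of three levels (levels 1-3, 4-6, 7-9, ...).
ISI_CYCLE_MS = (1500, 1000, 500)
HIT_THRESHOLD = 0.80       # hit rate of 80% or above required in every block
FALSE_ALARM_LIMIT = 0.15   # false-alarm rate must stay below 15% (assumed per block)


@dataclass(frozen=True)
class LevelSettings:
    n_levels: tuple[int, ...]  # n-back conditions included in a run
    isi_ms: int                # interstimulus interval in ms


def level_settings(level: int) -> LevelSettings:
    """Map a difficulty level (1, 2, 3, ...) to its n-back conditions and ISI.

    Every three levels, the lowest n is dropped, the next higher n is added,
    and the ISI restarts at 1500 ms. Levels beyond 7 are an extrapolation.
    """
    if level < 1:
        raise ValueError("difficulty levels start at 1")
    stage, step = divmod(level - 1, 3)
    return LevelSettings(tuple(range(stage, stage + 3)), ISI_CYCLE_MS[step])


def next_level(level: int, hit_rates: list[float], false_alarm_rates: list[float]) -> int:
    """Advance by one level only if every block of the run meets both criteria."""
    passed = all(h >= HIT_THRESHOLD for h in hit_rates) and all(
        fa < FALSE_ALARM_LIMIT for fa in false_alarm_rates
    )
    return level + 1 if passed else level


print(level_settings(4))  # LevelSettings(n_levels=(1, 2, 3), isi_ms=1500)
print(next_level(3, [0.85, 0.90, 0.80], [0.05, 0.10, 0.00]))  # 4
```

Encoding each level as a (conditions, ISI) pair keeps the two difficulty dimensions described in the text, WM load and presentation rate, separate and easy to inspect.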

#### Transfer Task

The transfer dual-task consisted of a delayed match-to-sample paradigm in which participants had to remember previously presented visual and/or auditory target stimuli. During the probe phase of the experiment, 16 visual and 16 auditory stimuli were presented successively, with one visual and one auditory stimulus always presented simultaneously. In the single-task conditions, participants were instructed to attend either to the visual stimuli (visual task; **Figure 2A**) or to the auditory stimuli (auditory task; **Figure 2B**). In the dual-task condition, participants had to attend to both visual and auditory stimuli (**Figure 2C**). There were two difficulty conditions for each task (memory load 1 and memory load 2), and each condition was presented twice in a pseudorandomized order.

#### **Visual and auditory single-tasks**

The visual stimulus set consisted of 12 meaningless white shapes (**Figure 3**), and the auditory stimulus set consisted of 12 different pairs of digits ranging from 0 to 9 (e.g., '4; 1'), which were spoken by a female voice and presented via speakers. In the encoding phase at the beginning of each visual single-task block, 1 (load 1: 'easy') or 2 (load 2: 'difficult') visual target stimuli were presented for 4000 ms (load 1) or 5000 ms (load 2), respectively. In the auditory single-task blocks, 1 (load 1) or 2 (load 2) number pairs were presented vocally as target stimuli. During this encoding phase, participants were required to encode the target(s). Subsequently, in the probe phase of all conditions, 16 visual and 16 auditory stimuli were presented in random order for 1000 ms each with ISIs of 1000 ms, with one visual and one auditory stimulus always presented simultaneously. Each block included six target stimuli. Participants were instructed to press the right mouse button of the laptop with the right index finger each time a target stimulus appeared. Thus, the same manual response was used for all types of targets.
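The probe-phase timing follows directly from the parameters above. With 1000 ms presentations and 1000 ms ISIs, probe onsets are spaced 2000 ms apart, so the 16th probe starts at 30 s and ends at 31 s. The Python sketch below computes these onsets and builds one hypothetical single-modality probe list with six targets. The random placement of targets and the drawing of distractors from the rest of the stimulus set are assumptions, because the exact randomization procedure is not described, and all names are hypothetical.

```python
import random

N_PROBES = 16    # probe stimuli per modality in one block
N_TARGETS = 6    # target stimuli per block
STIM_MS = 1000   # presentation duration of each probe
ISI_MS = 1000    # interstimulus interval


def probe_onsets_ms(n_probes: int = N_PROBES) -> list[int]:
    """Onset of each probe (ms) relative to the start of the probe phase."""
    return [i * (STIM_MS + ISI_MS) for i in range(n_probes)]


def build_probe_sequence(targets: list[str], distractors: list[str],
                         rng: random.Random) -> list[str]:
    """Hypothetical probe list for one modality with N_TARGETS target positions.

    `distractors` must not contain any of the memorized targets.
    """
    target_positions = set(rng.sample(range(N_PROBES), N_TARGETS))
    return [rng.choice(targets) if i in target_positions else rng.choice(distractors)
            for i in range(N_PROBES)]


onsets = probe_onsets_ms()
print(onsets[-1] + STIM_MS)  # 31000: the last probe ends 31 s into the probe phase
sequence = build_probe_sequence(["shape_A"], ["shape_B", "shape_C"], random.Random(1))
print(sequence.count("shape_A"))  # 6
```

Passing an explicit `random.Random` instance makes a given block order reproducible, which is convenient when the same pseudorandomized sequences must be reused across participants.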

#### **Dual-task**

In the dual-task condition, 1 (load 1) or 2 (load 2) visual target stimuli and 1 or 2 auditory target stimuli were presented during the encoding phase of each block for 5000 ms (load 1) or 6000 ms (load 2). The probe phase in the dual-task conditions was identical to that in the single-task conditions; however, participants were required to attend to both visual and auditory stimuli at the same time. Each time a presented stimulus matched one of the memorized targets, participants had to indicate the match by a button press.

#### Performance and Dual-Task-Costs

Absolute task performance (percent correct) was calculated as the mean of hits minus the mean of false alarms (Eq. 1):

$$\text{Hits} - \text{False alarms} = \text{Absolute performance} \tag{1}$$

The dual-task costs (relative dual-task performance) were calculated as the mean single-task performance minus the mean dual-task performance, divided by the mean single-task performance and multiplied by 100 (Eq. 2):

$$\frac{\text{Performance}_{\text{Single Task}} - \text{Performance}_{\text{Dual Task}}}{\text{Performance}_{\text{Single Task}}} \cdot 100 = \text{Dual-Task Costs} \tag{2}$$
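Equations 1 and 2 translate directly into code. The Python sketch below assumes that hits and false alarms are expressed as percentages, a normalization the text does not spell out. It also rejects non-positive single-task performance, because a relative cost is undefined in that case. Function names are hypothetical.

```python
def absolute_performance(hits_pct: float, false_alarms_pct: float) -> float:
    """Eq. 1: hits minus false alarms (both in percent)."""
    return hits_pct - false_alarms_pct


def dual_task_cost(single_pct: float, dual_pct: float) -> float:
    """Eq. 2: performance loss from single- to dual-task, relative to single-task, in %."""
    if single_pct <= 0:
        raise ValueError("relative costs require positive single-task performance")
    return (single_pct - dual_pct) / single_pct * 100


# Two hypothetical participants with the same 20-point absolute drop:
a_single, a_dual = absolute_performance(90, 10), absolute_performance(70, 10)  # 80, 60
b_single, b_dual = absolute_performance(50, 10), absolute_performance(30, 10)  # 40, 20
print(dual_task_cost(a_single, a_dual))  # 25.0
print(dual_task_cost(b_single, b_dual))  # 50.0
```

Both hypothetical participants lose 20 points under dual-task conditions, yet participant B's relative cost is twice as large. In this sense, dual-task costs express dual-task performance relative to each individual's single-task level.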

Analyses of transfer effects to the dual-task focused on dual-task costs because this measure expresses dual-task performance relative to, and thus controls for, individual differences in single-task performance. Therefore, dual-task costs are a more specific measure of the executive control functions required to perform two tasks simultaneously and might be particularly sensitive to age-related changes (Göthe et al., 2007).

#### Analyses in fMRI Subgroup

#### MR Image Acquisition and Processing

A detailed description of MR image acquisition and processing can be found in Heinzel et al. (2014a). Thirteen participants of the older training group in the current dual-task study also took part in pre- and post-training fMRI measurements during n-back, as reported by Heinzel et al. (2014a, 2016). At the beginning of each scanning session, a T1-weighted 3D image was acquired. Functional data were obtained using a gradient-echo echo-planar imaging (GE-EPI) pulse sequence (TR = 2000 ms, TE = 35 ms, flip angle = 80°, matrix size = 64 × 64, voxel size = 3.1 mm × 3.1 mm × 3.8 mm). Thirty-one axial slices were acquired approximately parallel to the bicommissural plane.

#### TABLE 1 | Demographic variables.


MMSE, Mini Mental Status Examination.

#### Estimation of BOLD Effect Sizes in n-Back

The WM experiment was analyzed within the framework of the general linear model (GLM). To this end, at the single-subject level, we created design matrices with the experimental conditions 0-, 1-, 2-, and 3-back as separate regressors of interest. All other experimental events (cue and button presses) and the six rigid-body realignment parameters were modeled as regressors of no interest. The GLM was fitted voxel-wise to the filtered time series using the restricted maximum likelihood algorithm as implemented in SPM8. We computed the differential contrasts 1-back vs. 0-back, 2-back vs. 0-back, and 3-back vs. 0-back. Parameter estimates of the BOLD response were extracted for seven literature-based ROIs (see Heinzel et al., 2014a, for the procedure), which comprised the bilateral DLPFC, the rostral cingulate zone (RCZ), the lateral premotor cortex (LPMC), and the intraparietal sulcus (IPS). All ROIs combined define the WM network here. Change scores of n-back activity were calculated as the difference between parameter estimates at T2 and T1 (T2 minus T1), so that negative values indicate a pre–post reduction in activity.
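The contrast and change-score computation can be illustrated with condition-wise parameter estimates from a single ROI. This Python sketch is not SPM8 code and rests on three assumptions. First, each ROI yields one parameter estimate per n-back condition and session. Second, each contrast is the simple difference 'n-back minus 0-back'. Third, change scores are computed as T2 minus T1, so a negative value marks a training-related reduction, which matches the signs of the correlations reported below. The estimate values in the example are hypothetical.

```python
def contrast(betas: dict[str, float], load: str, baseline: str = "0-back") -> float:
    """Differential contrast for one ROI, e.g. '1-back' vs. '0-back'."""
    return betas[load] - betas[baseline]


def change_score(betas_t1: dict[str, float], betas_t2: dict[str, float],
                 load: str = "1-back") -> float:
    """Pre-post change of a contrast; T2 minus T1, so negative = reduced activity."""
    return contrast(betas_t2, load) - contrast(betas_t1, load)


# Hypothetical parameter estimates for one ROI (e.g., left DLPFC) at T1 and T2.
t1 = {"0-back": 1.0, "1-back": 1.75, "2-back": 2.5, "3-back": 2.75}
t2 = {"0-back": 1.0, "1-back": 1.25, "2-back": 2.25, "3-back": 3.25}
print(change_score(t1, t2, "1-back"))  # -0.5 (reduction at low load)
print(change_score(t1, t2, "3-back"))  # 0.5 (increase at high load)
```

Taking the contrast before the difference means that a global shift of all estimates between sessions, for example in the 0-back baseline, does not by itself produce a change score.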

# RESULTS

#### WM Training

To assess WM training success, a 2 (group) by 2 (time) by 4 (WM load) repeated-measures ANOVA was conducted (**Figure 4**). The ANOVA revealed significant interactions of group by time by WM load (F(3,30) = 7.309, p = 0.001, η<sub>p</sub><sup>2</sup> = 0.422) and group by time (F(1,32) = 25.602, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.444), a significant main effect of time (F(1,32) = 41.431, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.564), and a non-significant main effect of group (F(1,32) = 3.158, p = 0.085, η<sub>p</sub><sup>2</sup> = 0.090). Post hoc two-sample t-tests revealed that the groups did not differ in any condition (0-, 1-, 2-, 3-back) at T1 (**Table 2**). At T2, the groups did not differ in the zero-back condition (t(32) = 0.810, p = 0.424) but differed significantly in the one-back (t(32) = 2.159, p = 0.038), two-back (t(32) = 3.203, p = 0.003), and three-back conditions (t(32) = 2.578, p = 0.015). The control group did not improve from T1 to T2 (all p's > 0.11), whereas the training group improved significantly in WM performance from T1 to T2 for one-back (t(15) = 3.400, p = 0.003), two-back (t(15) = 7.368, p < 0.001), and three-back (t(15) = 4.568, p < 0.001, see **Table 2**).
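Partial eta squared can be recovered from each reported F statistic and its degrees of freedom via the standard identity η<sub>p</sub><sup>2</sup> = F·df<sub>effect</sub> / (F·df<sub>effect</sub> + df<sub>error</sub>). The Python sketch below applies this identity. As a consistency check, it reproduces the effect sizes reported above, for example 0.444 for F(1,32) = 25.602 and 0.422 for F(3,30) = 7.309. The function name is hypothetical.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F statistic: F*df1 / (F*df1 + df2)."""
    return (f * df_effect) / (f * df_effect + df_error)


# Reported statistics from the WM training ANOVA above.
for f, df1, df2 in [(7.309, 3, 30), (25.602, 1, 32), (41.431, 1, 32)]:
    print(round(partial_eta_squared(f, df1, df2), 3))  # prints 0.422, 0.444, 0.564
```

The same check can be applied to any ANOVA reported in this section when reading or proofing the statistics.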

#### Single-Task Performance (Percent Correct)

Performance in the visual and auditory single-tasks is reported in **Table 3**.

#### Visual Single-Task Performance (Percent Correct)

A 2 (group) by 2 (time) by 2 (load) repeated-measures ANOVA showed no significant group by time by load interaction (F(1,32) = 0.600, p = 0.444, η<sub>p</sub><sup>2</sup> = 0.018). The group by time interaction (F(1,32) = 0.500, p = 0.485, η<sub>p</sub><sup>2</sup> = 0.015) and all other interactions were also not significant (all p's > 0.32). A significant main effect of load (F(1,32) = 69.595, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.685) shows that performance decreased with increasing load. A significant main effect of time (F(1,32) = 5.537, p = 0.025, η<sub>p</sub><sup>2</sup> = 0.147) indicates a general improvement in visual task performance from T1 to T2. There was no significant effect of group (F(1,32) = 0.030, p = 0.864, η<sub>p</sub><sup>2</sup> = 0.001).

#### Auditory Single-Task Performance (Percent Correct)

Consistent with the findings for visual single-task performance, a 2 (group) by 2 (time) by 2 (load) repeated-measures ANOVA on auditory task performance showed no significant group by time by load interaction (F(1,32) = 0.196, p = 0.661, η<sub>p</sub><sup>2</sup> = 0.006). The group by time interaction (F(1,32) = 0.508, p = 0.481, η<sub>p</sub><sup>2</sup> = 0.016) and all other interactions were not significant (all p's > 0.44). A significant main effect of load (F(1,32) = 42.558, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.571) shows that performance decreased with increasing load. The main effect of time (F(1,32) = 3.938, p = 0.056, η<sub>p</sub><sup>2</sup> = 0.110) was not significant; at trend level, this may suggest a general improvement in auditory task performance from T1 to T2. There was no significant effect of group (F(1,32) = 0.250, p = 0.620, η<sub>p</sub><sup>2</sup> = 0.008).

Taken together, the single-task analyses show no transfer effect on any measure of single-task performance, as indicated by non-significant group by time interactions. This is also reflected by non-significant post hoc two-sample t-tests (all p's > 0.19).

#### Absolute Dual-Task Performance (Percent Correct)

Means and standard deviations of absolute dual-task performance are reported in **Table 3**. A 2 (group) by 2 (time) by 2 (load) by 2 (modality) repeated-measures ANOVA showed no significant four-way interaction (F(1,32) = 2.435, p = 0.129, η<sub>p</sub><sup>2</sup> = 0.071), and none of the three-way interactions was significant (all p's > 0.22). A significant group by time interaction (F(1,32) = 4.686, p = 0.038, η<sub>p</sub><sup>2</sup> = 0.128) indicated a training-related improvement in dual-task performance in the training group but not in the control group, independent of load or modality. A significant main effect of time (F(1,32) = 14.507, p = 0.001, η<sub>p</sub><sup>2</sup> = 0.312) shows a general improvement in dual-task performance from pre- to post-test. A significant main effect of load (F(1,32) = 262.019, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.891) reflects generally lower performance at high load. No main effect of group was found (F(1,32) = 0.031, p = 0.861, η<sub>p</sub><sup>2</sup> = 0.001). Post hoc two-sample t-tests showed lower performance in the training group for visual targets during the dual-task in the high load condition at T1 (t(32) = 2.163, p = 0.038) and higher performance in the training group for auditory targets during the dual-task in the low load condition at T2 (t(32) = 2.074, p = 0.046). All other two-sample t-tests were non-significant (all p's > 0.46).

#### Relative Dual-Task Performance (Dual-Task Costs)

Mean dual-task costs and standard deviations, as well as the results of the post hoc analyses, are reported in **Table 4**. A 2 (group) by 2 (time) by 2 (load) by 2 (modality) repeated-measures ANOVA revealed a significant group by time by load by modality interaction (F(1,32) = 4.559, p = 0.041, η<sub>p</sub><sup>2</sup> = 0.125) and non-significant interactions of group by time by modality (F(1,32) = 0.587, p = 0.449, η<sub>p</sub><sup>2</sup> = 0.018), group by time by load (F(1,32) = 1.000, p = 0.325, η<sub>p</sub><sup>2</sup> = 0.030), and group by time (F(1,32) = 3.066, p = 0.090, η<sub>p</sub><sup>2</sup> = 0.087). A significant main effect of time (F(1,32) = 5.648, p = 0.024, η<sub>p</sub><sup>2</sup> = 0.150) indicates changes in dual-task costs from T1 to T2. There was no significant effect of group (F(1,32) = 0.007, p = 0.937, η<sub>p</sub><sup>2</sup> < 0.001). For the visual modality, post hoc two-sample t-tests showed that the groups differed significantly at T1 for load 2 (t(32) = 2.564, p = 0.015). From T1 to T2, dual-task costs in the training group decreased from 83 to 54% (t(17) = 3.531, p = 0.003) but did not change significantly in the control group (T1: 50%; T2: 59%, t(15) = −0.541, p = 0.596; see **Figure 5A**). The groups did not differ at T2 for load 2 (t(32) = −0.302, p = 0.764). Within the auditory condition, post hoc t-tests revealed that, for load 1, dual-task costs decreased in the training group from T1 (20%) to T2 (5%, t(17) = 3.324, p = 0.004) but did not change in the control group (T1: 20%; T2: 17%, t(17) = 0.405, p = 0.691; see **Figure 5B**). The groups did not differ in load 1 at T1 (t(32) = −0.035, p = 0.972) but did differ at T2 (t(32) = −2.415, p = 0.022). Within-subject comparisons can be found in **Table 4**. Dual-task costs for each condition are illustrated in **Figure 5**.

#### Analyses in fMRI Subgroup

As reported in Heinzel et al. (2014a), there was a training-related reduction in the BOLD response during n-back in the WM network. This reduction was particularly strong in the low WM load condition one-back, indicating an increase in neural efficiency. Here, we tested the hypothesis that this training-related reduction in the BOLD response in the WM network, and more specifically in the DLPFC, predicts dual-task costs at post-test.

Mean (SD); T1, pre-test; T2, post-test.

#### Individual Differences in the Training Group: Correlations with Dual-Task Costs

While no correlations were found between the entire WM network ROI (see **Figure 6C**) and dual-task costs (p's > 0.24), analysis of the left DLPFC revealed a significant correlation between the training-related change in one-back activity and auditory dual-task costs at post-test (r = 0.625, p = 0.022), indicating lower dual-task costs in participants who showed a stronger reduction in one-back activity (**Figure 6A**). No correlation was found between right DLPFC activity and auditory dual-task costs (r = 0.297, p = 0.325). No significant correlations were found between DLPFC activation changes during one-back and visual dual-task costs (p's > 0.35). However, exploratory analyses of 2- and 3-back revealed a significant negative correlation between changes in right DLPFC activity during three-back and visual dual-task costs at post-test (r = −0.711, p = 0.006; see **Figure 6B**), indicating that an increase in three-back activity could have been beneficial for visual dual-task performance.
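For a Pearson correlation computed on n pairs, the test against zero uses t = r·√((n − 2)/(1 − r²)) with n − 2 degrees of freedom. With n = 13 training participants in the fMRI subgroup, the reported r = 0.625 corresponds to t(11) ≈ 2.66. The Python sketch below computes r from paired values, such as change scores and post-test dual-task costs, and converts r to t. The two-tailed p-value can then be obtained from a t distribution, for example with `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available. Function names are hypothetical.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between paired observations."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r: float, n: int) -> tuple[float, int]:
    """t statistic and degrees of freedom for testing H0: rho = 0."""
    df = n - 2
    return r * math.sqrt(df / (1 - r * r)), df


t, df = r_to_t(0.625, 13)  # left DLPFC one-back change vs. auditory dual-task costs
print(df, round(t, 2))     # 11 2.66
```

With only 11 degrees of freedom, a single participant can shift r considerably. This is one reason the fMRI results are framed as preliminary in the Discussion.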

# DISCUSSION

In the present study, we aimed to investigate the influence of WM training on dual-task costs in a novel delayed match-to-sample dual-task. The results indicate that 12 sessions of numerical n-back training can improve performance in the trained task. Moreover, we found transfer to dual-task performance. This transfer was reflected by a reduction of dual-task costs in the 'easy' auditory condition in the training group but not in the control group. Further, dual-task costs in the training group were reduced in the 'difficult' visual condition. No training-associated changes in single-task performance were found. This is in line with previous research (e.g., Liepelt et al., 2011; Bonato et al., 2013) indicating that measures of dual-task coordination seem to be more sensitive to both subtle cognitive deficits and training-related changes. An additional analysis was conducted in a subsample of the training group that had also participated in a previously published fMRI study (Heinzel et al., 2014a, 2016). It revealed that a reduction in left DLPFC one-back activity from pre- to post-test predicted auditory dual-task costs at post-test. Additional exploratory analyses indicated that changes in three-back activity in the right DLPFC were associated with visual dual-task costs at post-test. Taken together, the fMRI results may suggest that a training-related reduction in DLPFC activity during low WM load, as well as an increase during high WM load, might support dual-task performance.

#### Transfer Effects

The assumption that numerical n-back training can reduce dual-task costs in an untrained transfer task has been partly confirmed. Unexpectedly, modality-specific transfer effects depended on task demand: while transfer to the auditory task occurred in the 'easy' condition, transfer to the visual task was found in the 'difficult' condition. Especially in the training group, performance in the high load condition of the visual task was lower than in the auditory task at T1, possibly indicating task prioritization (Stelzel et al., 2009) biased toward the auditory task. The diminished difference between visual and auditory performance in the dual-task after training supports the notion of a shift toward more efficient dual-task coordination, accompanied by the ability to focus on both component tasks.

Since transfer effects to both modalities of the dual-task, but no transfer to the single-tasks, were found, the WM training in this study might have led to improved modality-independent executive control that was not restricted to the modality of the internal stimulus representation of the trained n-back task (verbal representation of numbers). Therefore, n-back training may facilitate the coordination of two simultaneous tasks, as suggested by models of, and empirical work on, central executive processes (Morris and Jones, 1990; Baddeley, 2000; Collette and Van der Linden, 2002). These central executive processes are thought to comprise an attentional control system that governs other WM subsystems, including information storage and rehearsal (Baddeley, 2000), and can be divided into separate subfunctions [e.g., updating, inhibition, shifting, and dual-task coordination (Miyake et al., 2000; Collette and Van der Linden, 2002)].

The results of the current study add an important piece of information to the cognitive training literature, as we were able to show that improvements in dual-task performance do not necessarily require explicit dual-task training if the applied single-task training includes executive control processes that continuously place demands on the WM system. This notion may be derived from studies comparing different training and transfer paradigms, including single choice reaction tasks (Strobach et al., 2012, 2015), and from studies comparing adaptive with non-adaptive training regimes (Brehmer et al., 2011, 2012). In fact, improvements in dual-task performance in older adults may be larger if crucial components are specifically trained (Strobach et al., 2012) and the demand of the training task is adaptively increased according to individual performance (Brehmer et al., 2012).

Please note that the term "transfer effects" as used in the current study refers to a relatively narrow concept of transfer ["near transfer," see the taxonomy by Noack et al. (2009)], as training and transfer tasks share several process components. On the other hand, the transfer task (dual-task) also differed from the training task in crucial aspects (e.g., attending to two modalities instead of one, use of different stimuli, and a complex delayed match-to-sample task instead of an updating task). Furthermore, the control group showed no improvements in either the training or the transfer task. Thus, we are confident that improvements in the transfer task were not merely due to familiarity effects in the training task. However, since we only used a no-contact control group in the current study, we cannot rule out an additional influence of familiarity effects.

The investigation of neural mechanisms underlying "far transfer," e.g., transfer to tasks that include motor coordination such as postural control tasks could be an interesting focus for future research in older adults. In fact, recent work on cognitive-motor dual-tasking in older adults (for review see Boisgontier et al., 2013) has indicated that training-related cognitive improvements might also facilitate postural control performance, a strong predictor for risk of falls (Beauchet et al., 2009).

#### Analyses within the fMRI Subgroup

As an additional analysis in a subsample (N = 13 training participants), we investigated whether a WM training-related increase in neural efficiency in n-back can predict dual-task costs at post-test. The current data supported our hypothesis of a DLPFC-modulated transfer to auditory dual-task costs. A reduction in left DLPFC activation during one-back from pre- to post-test in the training group may indicate an increase in neural efficiency in WM, which could have facilitated auditory task performance in a multimodal dual-task. However, as re-test effects cannot be excluded, this conclusion cannot be drawn directly from the current investigation and requires confirmation in further studies.

Contrary to our hypothesis, the magnitude of visual dual-task costs was not predicted by a reduction in DLPFC activity during one-back. However, exploratory analyses revealed that the participants who showed a training-related increase in three-back activity in right DLPFC also showed the lowest visual dual-task costs at post-test. Previous research on DLPFC functions suggests its role in higher-order executive control such as chunking (Bor et al., 2003, 2004; Owen et al., 2005), maintenance (Courtney, 1998; Nee et al., 2013), updating, and manipulation of information (Owen, 1997; Roth et al., 2006; Barbey et al., 2013). Thus, more efficient processing, e.g., in chunking, may have benefited dual-task performance in the present study by reducing auditory dual-task demands through an improved conjunction of stimulus information. Potentially, an overlapping internal stimulus representation (a verbal "code," Wickens, 2002) of the training task and the auditory transfer task may have additionally supported transfer effects within this modality domain.

Hypothetically, the modality-specific findings in DLPFC may relate to a predominantly left lateralized processing of verbal information in WM as compared to predominantly right lateralized processing of visuospatial information (Smith and Jonides, 1999). More specifically, previous research indicates that right hemispheric DLPFC might be predominantly involved in controlling visuospatial WM, whereas left hemispheric DLPFC would mainly control verbal WM (Baddeley, 2003). The association between right-hemispheric DLPFC activity changes during three-back and transfer to visual dual-task could relate to a training-related improvement in neural capacity (Barulli and Stern, 2013) in some subjects as discussed in the model of training-related neural adaptations by Lustig et al. (2009). This potential capacity adaptation may have facilitated the visuospatial task performance within the dual-task condition. However, due to the small sample size of the current pilot study and limitations in study design, these interpretations are only preliminary.

#### General Limitations and Perspectives

There are several limitations that need to be taken into account when interpreting the results of this study. First, we did not measure fMRI during the dual-task. Thus, we cannot make reliable statements about neural effects during dual-task processing in this study. Second, the pre–post fMRI results reported here are based on a relatively small sample, and no pre–post fMRI data from the dual-task control group were available. Therefore, any conclusions should be drawn with caution, as re-test effects cannot be excluded. Further limitations are the lower dual-task performance for visual targets in the high-load condition in the training group compared to the control group at pre-test, and the very high performance of both groups in single tasks at low load. Thus, the potential for further improvement was restricted in the easy single-task conditions. Future studies should include larger samples and compare different age groups such as children and young adults, which would allow broader conclusions about cognitive plasticity across the life span. Also, for reasons of feasibility, we only included a no-contact control group. Therefore, social interaction or other unspecific effects associated with the training procedure might have influenced results in the training group, and the influence of familiarity/practice effects cannot be ruled out. In future studies, an active control group should be included in the study design. Another limitation concerns the sustainability of the results. The current experimental design does not allow any assumptions about long-lasting training or transfer effects; future studies should therefore implement follow-up measures.

#### CONCLUSION

In the current study, a 4-week WM training intervention (single n-back task) in a sample of older adults was associated with an increase in WM performance in the training group but not in an untrained control group. Furthermore, a transfer effect from single-task n-back training to dual-task performance in older adults was reported here for the first time. Additional analyses of training-related changes in BOLD response during WM processing, in relation to post-training dual-task performance, provide preliminary evidence for neural underpinnings of this transfer effect. The findings support the notion of a training-related increase in neural efficiency, as indicated by reduced DLPFC activity during one-back performance, which may have facilitated the reduction of auditory dual-task costs at posttest. Additional analyses in right DLPFC during three-back performance suggest that increased WM capacity might support performance in a visuospatial dual-task condition. These findings may indicate a training-related improvement of central executive functioning.

#### AUTHOR CONTRIBUTIONS

SH and MR designed the study. SH programmed the experiments. SH and JR supervised and conducted the experiments. SH, JR, CS, and MR analyzed the data. SH, JR, CS, and MR wrote the paper.

#### FUNDING

This work was supported by a German National Academic Foundation scholarship to SH, in part by the German Ministry for Education and Research (BMBF 01QG87164 and 01GS08195 and 01GQ0914) and in part by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG, FOR 1617: grant RA1047/2-1; DFG SPP 1772 "Human performance under multiple cognitive task requirements": grants RA1047/4-1 and HE7464/1-1), and by a MaxNetAging award to MR.

#### ACKNOWLEDGMENTS

The authors would like to acknowledge that the Open Access Publishing Fund of the University of Potsdam supported publication of this work. Parts of the current work are included in an unpublished thesis. We thank Laura Oliveras Puig, Wolf-Rüdiger Brockhaus, Sabrina Saase, and Sven Breitmeyer for assistance during data acquisition and Robert Lorenz for support during fMRI analyses.

#### REFERENCES

Au, J., Sheehan, E., Tsai, N., Duncan, G. J., Buschkuehl, M., and Jaeggi, S. M. (2015). Improving fluid intelligence with training on working memory: a meta-analysis. Psychon. Bull. Rev. 22, 366–377. doi: 10.3758/s13423-014-0699-x

Awh, E., Vogel, E. K., and Oh, S.-H. (2006). Interactions between attention and working memory. Neuroscience 139, 201–208. doi: 10.1016/j.neuroscience.2005.08.023

Baddeley, A. (1996). Exploring the central executive. Q. J. Exp. Psychol. A 49, 5–28. doi: 10.1080/713755608



neuroimaging. Eur. J. Neurosci. 9, 1329–1339. doi: 10.1111/j.1460-9568.1997.tb01487.x


Zinke, K., Zeintl, M., Eschen, A., Herzog, C., and Kliegel, M. (2012). Potentials and limits of plasticity induced by working memory training in old-old age. Gerontology 58, 79–87. doi: 10.1159/000324240

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Heinzel, Rimpel, Stelzel and Rapp. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# A Randomized Controlled ERP Study on the Effects of Multi-Domain Cognitive Training and Task Difficulty on Task Switching Performance in Older Adults

#### Kristina Küper\*, Patrick D. Gajewski, Claudia Frieg and Michael Falkenstein

Aging Research Group, Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany

Executive functions are subject to a marked age-related decline, but have been shown to benefit from cognitive training interventions. It is, however, still relatively unclear which neural mechanisms mediate training-related performance gains. In the present electrophysiological study, we examined the effects of multi-domain cognitive training on performance in an untrained cue-based task-switching paradigm featuring Stroop color words: participants either had to indicate the word meaning of Stroop stimuli (word task) or perform the more difficult task of color naming (color task). One hundred and three older adults (>65 years old) were randomly assigned to a training group receiving a 4-month multi-domain cognitive training, a passive no-contact control group, or an active (social) control group receiving a 4-month relaxation training. For all groups, we recorded performance and EEG measures before and after the intervention. For the cognitive training group, but not for the two control groups, we observed an increase in response accuracy at posttest, irrespective of task and trial type. No training-related effects on reaction times were found. Cognitive training was also associated with an overall increase in N2 amplitude and a decrease in P2 latency on single trials. Training-related performance gains were thus likely mediated by an enhancement of response selection and improved access to relevant stimulus-response mappings. Additionally, cognitive training was associated with an amplitude decrease in the time window of the target-locked P3 at fronto-central electrodes. An increase in the switch positivity during advance task preparation emerged after both cognitive and relaxation training. Training-related behavioral and event-related potential (ERP) effects were not modulated by task difficulty. The data suggest that cognitive training increased slow negative potentials during target processing, which enhanced the N2 and reduced a subsequent P3-like component on both switch and non-switch trials, irrespective of task difficulty. Our findings further corroborate the effectiveness of multi-domain cognitive training in older adults and indicate that ERPs can be instrumental in uncovering the neural processes underlying training-related performance gains.

#### Keywords: cognitive training, task difficulty, aging, Stroop switch task, switch positivity, P3, P2, N2

Edited by:

Louis Bherer, Université de Montréal, Canada

Reviewed by:

Tilo Strobach, Medical School Hamburg, Germany

Bruno Kopp, Hannover Medical School, Germany

> \*Correspondence: Kristina Küper kueper@ifado.de

Received: 19 October 2016 Accepted: 28 March 2017 Published: 12 April 2017

#### Citation:

Küper K, Gajewski PD, Frieg C and Falkenstein M (2017) A Randomized Controlled ERP Study on the Effects of Multi-Domain Cognitive Training and Task Difficulty on Task Switching Performance in Older Adults. Front. Hum. Neurosci. 11:184. doi: 10.3389/fnhum.2017.00184

# INTRODUCTION

All planned goal-directed behavior is mediated by executive control functions, such as selective attention, working memory, the inhibition of irrelevant information or the selection and coordination of relevant task sets. Previous research has indicated that these functions are subject to a marked age-related decline beginning as early as in midlife (Li et al., 2004; Sander et al., 2012). Given the crucial role of executive control functions for activities of daily living, age-related deficits in this domain can be particularly detrimental to the well-being and autonomy of older adults (Burgess et al., 1998; Jonides et al., 2008). Cognitive functions show a remarkable degree of plasticity across the lifespan, however, and can hence benefit from different types of training interventions up to a very old age (Hertzog et al., 2008; Karbach and Schubert, 2013; for reviews see Kueider et al., 2012; Ballesteros et al., 2015).

Cognitive training regimens that focus on a single domain or task, such as working memory or task switching, have been shown to consistently improve performance in the trained task (for meta-analyses, see Karbach and Verhaeghen, 2014; Lampit et al., 2014b; Au et al., 2015). Transfer of such training gains to untrained tasks or everyday functioning appears to be more limited, however, and has been reported in some cases (Karbach and Verhaeghen, 2014; Au et al., 2015), but not in others (Ball et al., 2002; Melby-Lervåg and Hulme, 2013; Melby-Lervåg et al., 2016).

It has been hypothesized that a substantial overlap between the processes underlying performance in the training task and those underlying performance in the transfer task is necessary for successful transfer to occur (Jonides, 2004; Dahlin et al., 2008; Lustig et al., 2009; Buschkuehl et al., 2012). In light of this, cognitive training interventions which focus not only on a single function but on multiple cognitive functions have recently been discussed as a more effective training measure which may potentially yield broader transfer effects (Gates and Valenzuela, 2010; Karbach, 2014). In keeping with this, cognitive training programs integrating multiple tasks have shown both near and far transfer effects to measures of perceptual processing, working memory updating, memory accuracy and reasoning (Mahncke et al., 2006; Wild-Wall et al., 2012; Walton et al., 2014; Baniqued et al., 2015). Moreover, Lampit et al. (2014a) reported transfer gains from a multi-domain training aimed at reasoning, memory, attention and visuo-spatial abilities to a bookkeeping task closely mirroring a real-world work scenario. In direct comparisons to single-domain interventions, multi-domain cognitive training has additionally been associated with more pronounced benefits in far transfer tasks measuring executive attentional control (Binder et al., 2016) and increased longevity of training-related performance benefits (Cheng et al., 2012).

Recent resting-state fMRI studies have been able to offer some insights into the neural processes which may mediate performance gains associated with multi-domain cognitive training: older adults who have undergone multi-domain cognitive training show increased neural lateralization and functional connectivity (Cao et al., 2016; Li et al., 2016; Luo et al., 2016; see also Binder et al., 2017, for similar electrophysiological data). Such brain activation patterns are commonly found in much younger adults, suggesting that multi-domain cognitive training may be able to compensate for age-related changes in neural connectivity, at least to some degree. In keeping with this, multi-domain cognitive training has also been associated with a reduction in age-related cortical thinning in fronto-temporal areas (Kim et al., 2015; Jiang et al., 2016).

A drawback of these imaging studies is that they employed predominantly passive control groups. It is thus possible that the structural differences observed between training and control groups reflect differences in general activity rather than training-specific benefits (see Redick et al., 2013). Moreover, it is still relatively unclear what functional consequences the observed structural changes may have, especially for the crucial domain of executive control. In two previous studies, we thus compared middle-aged and older adults who had undergone multi-domain cognitive training to both active and passive control groups and examined event-related potential (ERP) indices of performance in a task switching paradigm (Gajewski and Falkenstein, 2012; Gajewski et al., 2017).

Task switching paradigms have the advantage of yielding indices of multiple distinct subcomponents of executive control. The paradigm requires participants to attend to two or more different tasks in distinct experimental blocks. In single blocks, participants always have to perform only one of the tasks, whereas in the mixed block they have to flexibly switch between different tasks on a trial-by-trial basis. Mixed blocks thus feature stay trials, on which the same task as in the preceding trial has to be performed, and switch trials, on which a different task has to be performed. In memory-based switch tasks, participants have to memorize a fixed task order for these mixed blocks. In cue-based paradigms, the task order is instead random and a cue preceding each target stimulus indicates which task is to be performed.
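As a minimal sketch of the trial taxonomy just described, the Python function below labels each trial of a block by comparing its task with the task of the preceding trial. The list-of-task-labels input format and the function name `classify_trials` are hypothetical choices made for illustration, not the software used in the study. The first trial of a mixed block has no predecessor and is left unlabelled.

```python
from typing import List, Optional


def classify_trials(tasks: List[str], mixed: bool) -> List[Optional[str]]:
    """Label each trial of a block as "single", "stay" or "switch".

    tasks -- task performed on each trial in order (e.g. "color", "word")
    mixed -- True for a mixed block, False for a single-task block
    """
    if not mixed:
        # every trial in a single block repeats the only task of that block
        return ["single"] * len(tasks)
    labels: List[Optional[str]] = []
    for i, task in enumerate(tasks):
        if i == 0:
            labels.append(None)  # no preceding trial to compare against
        elif task == tasks[i - 1]:
            labels.append("stay")  # same task as on the previous trial
        else:
            labels.append("switch")  # task changed from the previous trial
    return labels
```

For example, the hypothetical mixed sequence `["color", "color", "word", "word", "color"]` yields `[None, "stay", "switch", "stay", "switch"]`.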

Although both single and stay trials constitute task repetitions, performance on trials in the single block is usually better than on stay trials in the mixed block. These general switch costs, or mixing costs, have been interpreted as indexing the ability to maintain a task set in the context of a different, interfering task set. The ability to flexibly switch between tasks on a trial-by-trial basis is instead reflected in specific or local switch costs, i.e., performance decrements in switch trials relative to stay trials (Allport et al., 1994; Rogers and Monsell, 1995; Meiran, 1996; see Kiesel et al., 2010, for a review). Age-comparative research has indicated that older adults show increased general switch costs relative to younger adults, but similar specific switch costs (Kramer et al., 1999; Kray and Lindenberger, 2000; Mayr, 2001). It thus appears that aging negatively affects the ability to simultaneously maintain and coordinate distinct task sets, but leaves task switching abilities relatively intact.
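Both cost measures are simple differences between condition means. The sketch below follows the definitions used in this study (general costs = stay − single; specific costs = switch − stay). The function name and the example values are hypothetical. For reaction times, a positive value indicates a cost. For accuracy, a cost shows up as a negative value.

```python
from typing import Dict


def switch_costs(means: Dict[str, float]) -> Dict[str, float]:
    """General (mixing) and specific (local) switch costs from condition means.

    means -- mean performance per trial type with keys "single", "stay", "switch"
             (e.g. mean reaction times in ms)
    """
    return {
        # general/mixing costs: stay trials in the mixed block vs. single-block trials
        "general": means["stay"] - means["single"],
        # specific/local costs: switch trials vs. stay trials within the mixed block
        "specific": means["switch"] - means["stay"],
    }
```

With made-up mean RTs of 700 ms (single), 850 ms (stay) and 900 ms (switch), this gives general costs of 150 ms and specific costs of 50 ms. That pattern of large general and small specific costs is the one attributed to older adults above.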

Our previous training studies have indicated that multi-domain cognitive training has the potential to compensate for at least some of this age-related deficit and can lead to a reduction in general switch costs (Gajewski and Falkenstein, 2012; Gajewski et al., 2017). In our studies, these performance gains were accompanied by amplitude increases in three ERPs: the N2, the P3 and the error negativity (Ne). The target-locked N2 is a negative deflection with a fronto-central maximum, which has been linked to both the detection of stimulus novelty and cognitive control, in terms of response inhibition and the resolution of response conflict during responding (Folstein and Van Petten, 2008). It is thus thought to reflect the implementation of stimulus-response associations, i.e., response selection, which is hampered on interference or conflict trials (Gajewski et al., 2008). In the context of task switching paradigms, the N2 has been found to be decreased in latency and amplitude for task repetitions (Gajewski et al., 2010a). A training-related increase in N2 amplitude may thus reflect improved response selection. The subsequent target-locked P3, a positive deflection with a parietal focus, has been associated with the allocation of processing resources, specifically memory operations (e.g., Polich, 2007). As such, its amplitude is largest for single trials, intermediate for stay trials and lowest for switch trials (Kieffaber and Hetrick, 2005; Jost et al., 2008; Gajewski et al., 2010b; Gajewski and Falkenstein, 2012). A training-related increase in P3 amplitude can thus be interpreted in terms of improved resource allocation. Finally, the error negativity (Ne, or error-related negativity, ERN) is an early negative deflection which is elicited by the detection of a response error (Falkenstein et al., 1991). A training-related increase in Ne amplitude thus reflects improvements in error monitoring.

In the present study, we wanted to corroborate and extend our previous findings on the functional neural processes mediating gains in executive functioning associated with multi-domain cognitive training in older adults. To this end, we employed a different task-switching transfer task than in our earlier study: the new task featured two rather than three distinct tasks and introduced different levels of task difficulty. In order to gain more thorough insights into neural processes involved in task preparation and response selection, we examined not only the ERP components described in our earlier study, but also the switch positivity during advance task preparation (Karayanidis et al., 2010) and the target-locked P2 (Kieffaber and Hetrick, 2005).

One hundred and three healthy older adults were randomly assigned to either a multi-domain cognitive training group, an active control group receiving relaxation training or a passive no-contact control group. At pretest and posttest, we used a binary cue-based switch paradigm featuring two tasks with asymmetric difficulty levels as the transfer task. Participants had to indicate either the font color or the word meaning of Stroop stimuli, that is, color words printed in colored fonts which could either be congruent with the word meaning (i.e., the word "yellow" presented in yellow font) or incongruent with it (i.e., the word "red" presented in yellow font). Word reading is the dominant behavior in this context, rendering the color task much more difficult than the word task (Stroop, 1935). Previous research has indicated that age-related cognitive deficits can be exacerbated with increasing task difficulty (e.g., Bierre et al., 2016). So far, however, relatively little is known about the impact of task difficulty on training and transfer gains in older adults. An fMRI study by Brehmer et al. (2011) has indicated that training-related benefits to neural efficiency may come to bear mainly under more difficult task conditions for this age group. The present study aimed to further examine this issue and pinpoint the specific neural processes which may benefit from multi-domain cognitive training under difficult as compared to easy task conditions. To do this, we examined two additional ERP components which were omitted in our earlier study on older adults (Gajewski and Falkenstein, 2012): the switch positivity and the target-locked P2. The cue-locked switch positivity has been linked to anticipatory processes associated with task-set reconfiguration. Its amplitude is thus highest for switch trials, intermediate for stay trials and lowest for single trials (Eppinger et al., 2007; Wylie et al., 2009; Jamadar et al., 2010; Karayanidis et al., 2010, 2011). The target-locked P2, an early positive deflection with a fronto-central focus, is reduced on switch compared to stay trials and has thus been related to the retrieval of stimulus-response bindings (Kieffaber and Hetrick, 2005; Schapkin et al., 2014).

Irrespective of task difficulty, we expected to observe performance gains from pretest to posttest in the multi-domain cognitive training group but not in the two control groups. In keeping with our earlier study (Gajewski and Falkenstein, 2012), we further expected that these performance gains would be accompanied by modulations of ERP components indexing the resolution of response conflict during response selection (N2) and the allocation of cognitive resources (target-locked P3) as well as the retrieval of stimulus-response bindings (P2) and anticipatory task-set reconfiguration (switch positivity). Based on previous research (Brehmer et al., 2011), we expected between-group differences to be more pronounced under difficult task conditions.

## MATERIALS AND METHODS

#### Participants

Participants were independently living, healthy older adults who were screened for sufficient visual and auditory acuity. Other exclusion criteria were a history of cardiovascular, motor, oncological, psychiatric or neurological diseases. Participants were also excluded from the study if their self-reported cognitive training activity exceeded 1.5 h per week. As a result of this screening procedure, 32.5% of applicants were included in the study. After completing the pretest, the 114 participants were randomly assigned to a training group receiving a 4-month multi-domain cognitive training (remaining N = 32; 20 female, 65–82 years old, mean age: 70.5 years; seven drop-outs due to illness, relocation or technical malfunctions), a passive no-contact control group (remaining N = 37; 21 female, 65–88 years old, mean age: 70 years; two drop-outs due to technical malfunctions) and an active (social) control group receiving a 4-month relaxation training (remaining N = 34; 21 female, 65–87 years old, mean age: 70.9 years; two drop-outs due to illness). All groups were comparable with respect to age, education and cognitive status as assessed by the Mini Mental State Examination (MMSE German version, Kessler et al., 2000), verbal IQ (MWT-B, Lehrl, 2005), forward and backward digit span and versions A and B of the Trail-Making Test (see Wild-Wall et al., 2012, for details). All participants were included in the behavioral data analyses. Due to malfunctions of the EEG equipment, data from six participants (one of the cognitive training group, one of the active control group and three of the passive control group) could not be included in the ERP analyses. The study was carried out in accordance with the Declaration of Helsinki and with the recommendations of the local ethics committee of the Leibniz association. All participants gave written informed consent and received 100 Euro to compensate them for travel expenses.

#### Multi-Domain Cognitive Training and Relaxation Training

Participants in the cognitive training and the active (social) control group completed two 90-min training sessions per week across a period of 4 months. Both trainings were conducted by paid professional trainers in small groups of no more than 12 participants. Participants who had missed regular sessions had the opportunity to take part in two additional sessions after the regular training had been completed. Participants were otherwise not encouraged to train outside of the regular training sessions.

Participants in the cognitive training group were first given basic information on cognitive functions, their relevance for activities of daily living and the impact of aging on these functions. Participants additionally learned memory strategies, such as the method of loci. Subsequently, participants completed 4 weeks of paper-and-pencil-based exercises focused on improving processing speed, selective attention, short-term memory span, verbal fluency and arithmetic and reasoning skills (sudokus; MAT, Lehrl et al., 1994; Klauer, 2008). Simultaneously, participants without prior computer experience were familiarized with the use of a computer mouse and keyboard. In the final cognitive training phase, participants completed computer-based cognitive exercises focused on perceptual speed, selective attention and memory (peds Braintrainer<sup>1</sup>; mentaga GYM<sup>2</sup>; Mental Aktiv<sup>3</sup>). This multi-domain cognitive training regimen did not include a task-switching task, a Stroop task or a combination thereof. A more thorough description of the multi-domain cognitive training has already been published elsewhere (see appendix of Gajewski and Falkenstein, 2012).

The relaxation training of the active (social) control group consisted of gymnastics, back therapy, muscle relaxation and stretching exercises as well as techniques from autogenic training, progressive muscle relaxation, Qigong and massage therapy. The training also included elements of health education, giving basic information about healthy nutrition, the negative effects of addictive substances such as alcohol and nicotine, and the benefits of physical exercise.

#### Pre- and Posttest Procedure

Before and after the interventions, participants completed pretest and posttest sessions, respectively. These included sociodemographic questionnaires (pretest only), paper-and-pencil-based neuropsychological tests and computer-based cognitive tests with concurrent EEG recording. While the present study focuses on the Stroop switch task, data from other cognitive tasks have been reported elsewhere (Gajewski and Falkenstein, 2012; Wild-Wall et al., 2012). Cognitive testing and EEG recording were conducted in a dimly lit, electrically shielded and sound-attenuated room. All participants were tested individually and were seated 90 cm from a 15-inch CRT monitor with a resolution of 640 × 480 pixels. Stimulus presentation and response acquisition were controlled by an IBM-compatible computer running MS-DOS.

In the Stroop switch task, participants had to indicate either the font color or the word meaning of Stroop stimuli. Stroop stimuli (10 × 5–7 mm) were the German words "rot", "grün", "blau" and "gelb" (red, green, blue and yellow, respectively), which were presented on a black background in one of four font colors (red, green, blue and yellow). Font colors could either be congruent with the word meaning (i.e., the word "yellow" presented in yellow font) or incongruent with it (i.e., the word "red" presented in yellow font). Prior to the Stroop stimulus, participants were presented with a cue indicating which task had to be performed on the current trial. A white square (37 × 37 mm) indicated that font color was the relevant stimulus dimension, whereas a white diamond (37 × 37 mm) indicated that participants had to respond to the word meaning. Responses were given by pressing one of four response keys mounted on a response box, each corresponding to a specific color.

A given trial thus began with the presentation of a fixation cross for 300 ms, followed by the presentation of the cue (diamond/square). After 1000 ms, the Stroop stimulus appeared within the cue and remained on screen until the participant responded by pressing one of the four response keys. Five hundred milliseconds after the participant's response, positive (plus sign) or negative (minus sign) feedback appeared. When the reaction time exceeded 2500 ms, the word "schneller" (faster) was presented in addition to the feedback to encourage participants to respond more quickly on subsequent trials.
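The trial timeline can be summarized as a list of event onsets. This is a minimal sketch that assumes timing is measured from fixation onset. The constant and function names are hypothetical. Because the feedback duration and the inter-trial interval are not specified in the text, the sketch stops at feedback onset.

```python
from typing import List, Tuple

FIXATION_MS = 300        # fixation cross duration
CUE_TARGET_MS = 1000     # cue onset to Stroop stimulus onset
FEEDBACK_DELAY_MS = 500  # response to feedback onset
SLOW_RT_MS = 2500        # RTs above this add the "schneller" (faster) prompt


def trial_events(rt_ms: float, correct: bool) -> List[Tuple[float, str]]:
    """Onsets (ms from fixation onset) of the events in one Stroop switch trial."""
    cue_onset = float(FIXATION_MS)
    target_onset = cue_onset + CUE_TARGET_MS  # target appears inside the cue
    response = target_onset + rt_ms           # target stays on screen until the response
    feedback_onset = response + FEEDBACK_DELAY_MS
    events = [
        (0.0, "fixation"),
        (cue_onset, "cue"),
        (target_onset, "target"),
        (response, "response"),
        (feedback_onset, "feedback +" if correct else "feedback -"),
    ]
    if rt_ms > SLOW_RT_MS:
        # speed prompt shown together with the feedback
        events.append((feedback_onset, "schneller"))
    return events
```

For a correct response with a hypothetical RT of 800 ms, the target appears at 1300 ms and the feedback at 2600 ms after fixation onset.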

Participants completed a total of 250 trials (50% congruent and 50% incongruent) in three distinct experimental blocks. The first two blocks were single blocks in which participants always had to perform either the word (52 trials) or the color task (52 trials). In the subsequent mixed block (146 trials), participants instead had to perform the color task (73 trials) or the word task (73 trials) in random order, as signaled by the cue. Thirty-six trials of each task type were stay trials in which the same task as in the preceding trial had to be performed. The other 37 trials were switch trials in which a different task than in the preceding trial had to be completed. Across all conditions, half of the trials featured congruent Stroop stimuli whereas the other half featured incongruent Stroop stimuli.

<sup>1</sup>www.ahano.de

<sup>2</sup>www.mentage.com

<sup>3</sup>www.mental-aktiv.de

#### Electrophysiological Recording and Analyses

EEG activity was recorded continuously from 32 active BioSemi Pin-Type electrodes arranged according to the extended 10–20 system in a preconfigured cap (Easy Cap, Easycap GmbH, Herrsching-Breitbrunn, Germany). Electrodes were placed at positions Fp1, Fpz, Fp2, F7, F3, Fz, F4, F8, FC3, FCz, FC4, T7, C3, Cz, C4, T8, CP3, CPz, CP4, P7, P3, Pz, P4, P8, PO3, POz, PO4, O1, Oz and O2. Eight additional electrodes were used to record the EOG and activity at the left and right mastoids. In the BioSemi system, ground and reference electrodes are replaced by a feedback loop between an active and a passive electrode at positions C1 and C2, respectively. Impedances for all electrodes were kept below 10 kΩ. Signals were digitized with a BioSemi Active Two amplifier at a sampling rate of 2048 Hz and a bandpass of 0.01–140 Hz.

For off-line analysis, data were down-sampled to 1000 Hz and digitally band-pass filtered at 0.05–17 Hz. The first trial of each experimental block and trials with an incorrect, very fast (<100 ms) or very slow (>2500 ms) response were excluded from further analyses. The EEG was segmented into cue-locked and target-locked epochs and baseline-corrected with respect to the 100 ms pre-stimulus interval. Vertical and horizontal ocular artifacts were corrected off-line (Gratton et al., 1983), while trials with other artifacts (maximum amplitude in the segment, ±150 µV; maximum voltage step between two successive sampling points, 50 µV; maximum difference between two sampling points within the segment, 300 µV; lowest activity in a 100 ms interval, 0.5 µV) were excluded from averaging. Data were re-referenced to linked mastoids. ERPs were averaged separately for each of the two tasks (color, word) and the three trial types (single, switch, stay).
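Below is a simplified single-channel sketch of the baseline correction, artifact rejection and re-referencing steps, in pure Python. It assumes that each epoch is a list of voltages in µV sampled at 1000 Hz and that the first 100 samples form the pre-stimulus baseline. It also interprets the "lowest activity" criterion as a sliding 100-sample window whose max-min range must reach 0.5 µV. These interpretations and all function names are assumptions, since the analysis software used in the study is not described here. Ocular correction (Gratton et al., 1983) and filtering are omitted.

```python
from typing import List, Sequence


def baseline_correct(epoch: Sequence[float], n_baseline: int = 100) -> List[float]:
    """Subtract the mean of the first n_baseline samples (100 ms at 1000 Hz)."""
    base = sum(epoch[:n_baseline]) / n_baseline
    return [v - base for v in epoch]


def is_artifact(epoch: Sequence[float],
                max_abs_uv: float = 150.0,
                max_step_uv: float = 50.0,
                max_range_uv: float = 300.0,
                low_activity_uv: float = 0.5,
                low_activity_win: int = 100) -> bool:
    """Return True if a single-channel epoch (in µV) violates any rejection criterion."""
    # 1. absolute amplitude anywhere in the segment exceeds +/-150 µV
    if any(abs(v) > max_abs_uv for v in epoch):
        return True
    # 2. voltage step between two successive samples exceeds 50 µV
    if any(abs(b - a) > max_step_uv for a, b in zip(epoch, epoch[1:])):
        return True
    # 3. max-min difference within the segment exceeds 300 µV
    #    (after criterion 1 passes, the range is at most 300 µV; kept for completeness)
    if max(epoch) - min(epoch) > max_range_uv:
        return True
    # 4. low activity: some 100-sample (100 ms) window has a range below 0.5 µV
    for start in range(len(epoch) - low_activity_win + 1):
        win = epoch[start:start + low_activity_win]
        if max(win) - min(win) < low_activity_uv:
            return True
    return False


def rereference_linked_mastoids(channel: Sequence[float],
                                m1: Sequence[float],
                                m2: Sequence[float]) -> List[float]:
    """Re-reference one channel to the average of the left and right mastoids."""
    return [c - (a + b) / 2.0 for c, a, b in zip(channel, m1, m2)]
```

A flat epoch is caught by the low-activity check, and a single 200 µV sample is caught by the amplitude check.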

The switch positivity and the target-locked P3 were quantified as mean amplitudes between 300 ms and 600 ms post-cue and post-stimulus, respectively, at electrode positions Fz, Cz and Pz. The target-locked P2 was quantified as the most positive local amplitude between 150 ms and 300 ms after stimulus onset at FCz. The subsequent target-locked N2 was measured as the most negative local amplitude between 200 ms and 400 ms post-stimulus at Cz. Electrode positions and time windows were selected on the basis of previous research (e.g., Polich, 2007; Folstein and Van Petten, 2008; Gajewski and Falkenstein, 2012; Schapkin et al., 2014).
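The two measurement types above (mean amplitude in a fixed window, and most positive or most negative local peak in a search window) can be sketched as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, and "local" peak is assumed to mean a sample more extreme than both of its neighbours, with a fallback to the window extremum if no such sample exists.

```python
import numpy as np


def mean_amplitude(erp, times_ms, start_ms, end_ms):
    """Mean amplitude of a single-channel ERP within [start_ms, end_ms]."""
    erp = np.asarray(erp, dtype=float)
    times_ms = np.asarray(times_ms)
    mask = (times_ms >= start_ms) & (times_ms <= end_ms)
    return float(erp[mask].mean())


def local_peak(erp, times_ms, start_ms, end_ms, polarity=1):
    """Return (amplitude, latency_ms) of the most positive (polarity=1) or most
    negative (polarity=-1) local peak within [start_ms, end_ms]."""
    erp = np.asarray(erp, dtype=float)
    times_ms = np.asarray(times_ms)
    x = polarity * erp  # flip sign so that we always search for a maximum
    idx = np.flatnonzero((times_ms >= start_ms) & (times_ms <= end_ms))
    idx = idx[(idx > 0) & (idx < len(x) - 1)]  # both neighbours must exist
    is_peak = (x[idx] > x[idx - 1]) & (x[idx] > x[idx + 1])
    candidates = idx[is_peak] if is_peak.any() else idx  # fallback: window extremum
    best = candidates[np.argmax(x[candidates])]
    return float(erp[best]), float(times_ms[best])
```

With the windows given in the text, the P3 and switch positivity would correspond to `mean_amplitude(erp, times, 300, 600)` at Fz, Cz and Pz, the P2 to `local_peak(erp_fcz, times, 150, 300, polarity=1)` and the N2 to `local_peak(erp_cz, times, 200, 400, polarity=-1)`.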

# RESULTS

# Behavioral Data

The first trial of each experimental block was excluded from further analyses. Trials with an incorrect, very fast (<100 ms) or very slow (>2500 ms) response were not included in the reaction time analyses. Mean accuracy and reaction times as well as the linear integrated speed-accuracy score ($\mathrm{LISAS} = \overline{RT}_{\mathrm{condition}} + \frac{S_{RT,\mathrm{total}}}{S_{PE,\mathrm{total}}} \times PE_{\mathrm{condition}}$, where $\overline{RT}$ is the mean reaction time, $PE$ the proportion of errors and $S$ the standard deviation; Vandierendonck, 2016) were computed (see **Figure 1**). Due to lack of errors, LISAS could not be computed for one participant from the cognitive training group. All behavioral parameters were analyzed in separate analyses of variance (ANOVAs) with the within-subject factors Test Time (pretest, posttest), Trial Type (single, switch, stay) and Task (color task, word task) and the between-subject factor Group (cognitive training, active control, passive control). For all parameters, we additionally computed mixing/general switch costs (stay trials − single trials) and specific/local switch costs (switch trials − stay trials), which were submitted to separate ANOVAs with the within-subject factors Test Time and Task and the between-subject factor Group. Results were Greenhouse-Geisser corrected, where appropriate. For the sake of brevity, we only list significant effects involving the factor Test Time. In order to specify training-related changes, these effects were further analyzed with Bonferroni post hoc tests.
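A minimal Python sketch of the LISAS computation for one participant is shown below. It is not the authors' code, and several details are our assumptions: condition RTs come from correct trials only, error proportions come from all included trials, and the "total" standard deviations are population standard deviations pooled over all of that participant's trials. The function name is hypothetical. The division by the error standard deviation also shows why LISAS cannot be computed for a participant who makes no errors.

```python
import statistics


def lisas(rts_by_condition, errors_by_condition):
    """Linear integrated speed-accuracy score per condition for one participant.

    rts_by_condition:    {condition: [correct-trial RTs in ms]}
    errors_by_condition: {condition: [0/1 error indicators for all trials]}
    """
    all_rts = [rt for rts in rts_by_condition.values() for rt in rts]
    all_errors = [e for errs in errors_by_condition.values() for e in errs]
    s_rt = statistics.pstdev(all_rts)      # SD of RTs over all conditions (assumed population SD)
    s_pe = statistics.pstdev(all_errors)   # SD of error indicators over all conditions
    if s_pe == 0:
        raise ValueError("LISAS is undefined when a participant makes no errors")
    return {
        cond: statistics.fmean(rts_by_condition[cond])
        + (s_rt / s_pe) * statistics.fmean(errors_by_condition[cond])
        for cond in rts_by_condition
    }
```

Lower LISAS values indicate better performance, because both slower responses and more errors increase the score.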

In the reaction time data, we observed a significant main effect of Test Time (F(1,100) = 8.21, p < 0.01, $\eta_p^2$ = 0.08) and a significant interaction of Test Time × Trial Type × Task (F(2,200) = 5.48, p < 0.01, $\eta_p^2$ = 0.05). Post hoc tests indicated that for the color task, reaction times decreased from pretest to posttest on single and stay trials (both ps < 0.01), but not on switch trials (p = 0.18). In the word task, reaction time benefits were limited to switch trials (p < 0.01) and stay trials (p < 0.05), but did not emerge on single trials (p = 0.68). None of the effects involving interactions of the factors Test Time and Group were significant.

The accuracy data yielded a significant main effect of Test Time (F(1,100) = 4.1, p < 0.05, $\eta_p^2$ = 0.04) and a significant interaction of Test Time × Group (F(2,100) = 3.4, p < 0.05, $\eta_p^2$ = 0.06). Bonferroni post hoc tests showed reduced error rates at posttest compared to pretest only in the cognitive training group (p < 0.01), but not in the two control groups (both ps > 0.84). Baseline accuracy at pretest was equivalent for the three groups (all ps > 0.99).

For the LISAS data, we similarly found a significant main effect of Test Time (F(1,99) = 12.21, p < 0.01, $\eta_p^2$ = 0.11) as well as a significant interaction of Test Time × Group (F(2,99) = 3, p = 0.05, $\eta_p^2$ = 0.06). In keeping with the accuracy data, only the cognitive training group showed decreased LISAS, i.e., better performance, at posttest compared to pretest (p < 0.001; both control groups, ps > 0.18). At pretest, we observed no significant differences in baseline performance between the three groups (all ps > 0.89). The interaction of Test Time × Task (F(1,99) = 6.01, p < 0.05, $\eta_p^2$ = 0.06) was also significant. Post hoc tests nevertheless showed significant performance benefits from pretest to posttest in both the color task (p < 0.001) and the word task (p < 0.05).

Our analysis of mixing/general switch costs (stay trials − single trials) and specific/local switch costs (switch trials − stay trials) yielded no significant effects involving the factor Test Time for either accuracy, reaction times or LISAS.
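The two cost measures are simple differences between condition means and can be sketched per task as below. This is an illustration only; the function name and the nested-dictionary layout are hypothetical. The same differences apply to reaction times, error rates or LISAS.

```python
def switch_costs(means):
    """Mixing/general and specific/local switch costs per task for one participant.

    means: {task: {"single": x, "stay": y, "switch": z}} with condition means
    of any performance measure (RT, error rate or LISAS).
    """
    return {
        task: {
            "mixing": m["stay"] - m["single"],   # mixing/general costs: stay - single
            "local": m["switch"] - m["stay"],    # specific/local costs: switch - stay
        }
        for task, m in means.items()
    }
```

For instance, condition means of 650, 720 and 780 ms for single, stay and switch trials yield mixing costs of 70 ms and local switch costs of 60 ms.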

#### Behavioral Data Summary

Error rates and LISAS indicated performance gains from pretest to posttest only for the cognitive training group, but not for the two control groups. Training-related performance gains emerged for all trial types and thus could not be attributed to reductions of mixing/general switch costs or specific/local switch costs. All groups showed decreased reaction times at posttest compared to pretest. These unspecific practice effects depended on the task participants had to perform: for the easier word task, reaction time benefits emerged under mixing conditions. For the more difficult color task, they were instead limited to single and stay trials, which did not require task set reconfiguration.

*Figure caption (beginning missing in the source): … Cz and Pz as a function of session (pretest, posttest) and trial type (single, switch, stay). Time scaling ranges from −100 ms to 1100 ms around cue onset, and positive deflections are displayed downward. Cognitive training was associated with increased posttest amplitudes of the highlighted cue-locked P3 on stay trials (green lines).*

## ERP Data

The switch positivity and the target-locked P3 were analyzed in separate ANOVAs with the within-subject factors Electrode Position (Fz, Cz, Pz), Test Time (pretest, posttest), Trial Type (single, switch, stay) and Task (color task, word task) and the between-subject factor Group (cognitive training, active control, passive control). Target-locked P2 amplitude at FCz and N2 amplitude at Cz were analyzed in separate ANOVAs with the within-subject factors Test Time (pretest, posttest), Trial Type (single, switch, stay) and Task (color task, word task) and the between-subject factor Group (cognitive training, active control, passive control). Results were Greenhouse-Geisser corrected, where appropriate. Significant effects involving the crucial factor Test Time are listed and were further analyzed with Bonferroni post hoc tests.
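The two corrections named above can be sketched with the standard textbook formulas. This is an illustration only, not the statistics software the authors used, and the function names are hypothetical. The Greenhouse-Geisser epsilon is computed from a k × k covariance matrix of the levels of one within-subject factor; in a mixed design this would typically be the pooled within-group covariance, which is an assumption here. The Bonferroni adjustment multiplies each post hoc p-value by the number of comparisons, capped at 1.

```python
import numpy as np


def gg_epsilon(cov):
    """Greenhouse-Geisser epsilon for one within-subject factor.

    `cov` is the k x k covariance matrix of the factor's k levels. Double-centring
    it with H = I - J/k yields a matrix with the same non-zero eigenvalues as the
    covariance of k - 1 orthonormal contrasts, so
    epsilon = tr(S)^2 / ((k - 1) * tr(S @ S)).
    """
    cov = np.asarray(cov, dtype=float)
    k = cov.shape[0]
    h = np.eye(k) - np.ones((k, k)) / k   # centring matrix
    s = h @ cov @ h                       # double-centred covariance
    return float(np.trace(s) ** 2 / ((k - 1) * np.trace(s @ s)))


def gg_corrected_df(df_effect, df_error, epsilon):
    """Scale both degrees of freedom of an F test by epsilon."""
    return df_effect * epsilon, df_error * epsilon


def bonferroni(p_values):
    """Bonferroni-adjust post hoc p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

Epsilon ranges from 1/(k − 1) (maximal violation of sphericity) to 1 (sphericity holds). For a two-level factor such as Test Time it is necessarily 1, so the correction can only change the degrees of freedom of effects involving factors with three or more levels, such as Trial Type or Electrode Position.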

## Cue-Locked Switch Positivity Amplitude

Analyses of the switch positivity amplitude yielded a significant main effect of Test Time (F(1,98) = 5.79, p < 0.05, $\eta_p^2$ = 0.06) as well as interactions of Test Time × Electrode Position × Trial Type × Task (F(4,392) = 3.96, p < 0.01, $\eta_p^2$ = 0.04) and, crucially, Test Time × Group × Electrode Position × Trial Type (F(8,392) = 4.83, p = 0.05, $\eta_p^2$ = 0.04). As illustrated in **Figure 2**, post hoc tests indicated that in the cognitive training group, stay trials elicited higher switch positivities at posttest compared to pretest at all sites (Fz and Cz, ps < 0.05; Pz, p = 0.05; all other ps > 0.14). In the active control group, higher posttest amplitudes emerged at Cz for stay trials (p < 0.05), and Pz showed a trend towards pre-post differences for single trials (p = 0.07; all other ps > 0.15). For the passive control group, on the other hand, we found no reliable pre-post differences at any electrode position (all ps > 0.29).

#### Target-Locked P2 Amplitude and Latency

For P2 amplitudes at FCz, we observed a significant interaction of Test Time × Task (F(1,98) = 4.93, p < 0.05, $\eta_p^2$ = 0.05), yet post hoc tests indicated no significant changes from pretest to posttest for either the color or the word task (both ps > 0.11). None of the effects involving the factor Group reached significance. The P2 latency analysis yielded a significant main effect of Test Time (F(2,196) = 4.6, p < 0.05, $\eta_p^2$ = 0.05) and a significant interaction of Test Time × Group × Trial Type (F(4,196) = 2.42, p = 0.05, $\eta_p^2$ = 0.05). Post hoc tests showed reduced P2 latencies at posttest compared to pretest, but only on single trials of the cognitive training group (p < 0.01; all other ps > 0.14, see **Figure 3**).

#### Target-Locked N2 Amplitude and Latency

The analysis of N2 amplitudes at Cz yielded a significant interaction of Test Time × Group (F(2,98) = 3.36, p < 0.05, $\eta_p^2$ = 0.06). Post hoc tests showed an increase in N2 amplitude from pretest to posttest for the cognitive training group (p < 0.01), but not for the two control groups (both ps > 0.34, see **Figure 3**). The N2 latency analysis showed a significant interaction of Test Time × Trial Type × Task (F(2,196) = 6.25, p < 0.01, $\eta_p^2$ = 0.06). According to post hoc tests, N2 latencies were shortened from pretest to posttest only on stay trials of the color task (p < 0.01; all other ps > 0.13).

#### Target-Locked P3 Amplitude

For the target-locked mean P3 amplitude, we observed significant interactions of Test Time × Electrode Position (F(2,196) = 4.67, p < 0.05, $\eta_p^2$ = 0.05) and, importantly, Test Time × Group × Electrode Position × Trial Type (F(8,392) = 2.72, p < 0.01, $\eta_p^2$ = 0.05). For both control groups, post hoc tests showed no significant amplitude differences between pre- and posttest, irrespective of trial type and electrode position (all ps > 0.21; see **Figure 3**). For the cognitive training group, on the other hand, P3 amplitudes decreased from pretest to posttest: at Fz, amplitude decreases were significant for single and switch trials (both ps < 0.01). At Cz, we observed significant decreases for switch and stay trials (both ps < 0.05) and a trend towards a decrease for single trials (p = 0.07; Pz, all ps > 0.39).

#### ERP Data Summary

For all trial types, cognitive training was associated with an increase in the target-locked N2 and a subsequent amplitude decrease at fronto-central locations in the time range of the target-locked P3. On single trials, we additionally observed a training-related reduction of target-locked P2 latencies. In the cue-target interval, the switch positivity on stay trials was increased at posttest for both the cognitive training group and the active control group. The passive control group showed no significant ERP differences between pretest and posttest.

# DISCUSSION

The present study was aimed at examining the effects of multidomain cognitive training and task difficulty on executive functions in older adults. To this end, healthy older adults were randomly assigned to a passive control group, an active control group receiving 4 months of relaxation training, or a cognitive training group that completed 4 months of multidomain cognitive training. In pre- and posttest sessions, we recorded behavioral and ERP indices of performance in an untrained task switching paradigm featuring Stroop stimuli. Participants had to perform two tasks with different difficulty levels, an easier word reading task and a more difficult color naming task (see Stroop, 1935).

At the behavioral level, we observed reduced reaction times at posttest compared to pretest. Although this effect appeared to be driven mainly by reaction time reductions in the cognitive training group and the active control group, it was not significantly modulated by the factor Group. It should thus be considered a practice effect induced by repeated testing rather than a performance gain related to the increased activity associated with either training regimen. Interestingly, this practice effect was modulated by task difficulty: in the easier word task, reaction time benefits emerged on stay and switch trials, but not in single task blocks. In the more difficult color task, reaction times instead decreased for single and stay trials, but not for switch trials. In light of the low difficulty level of the word task, it is plausible that only the more complex mixed block offered room for improvement because performance on the less demanding single trials was already at ceiling at pretest. For the more difficult color task, on the other hand, the task set reconfiguration associated with switch trials may have further elevated the difficulty level to such a degree that practice was not sufficient to generate reliable reaction time benefits.

In the context of the present study, the accuracy and LISAS results were more noteworthy than the reaction time data, as they indicated performance benefits associated exclusively with multi-domain cognitive training: both parameters were reduced at posttest compared to pretest only for the cognitive training group, but not for the two control groups. This result pattern corroborates and extends previous research on multi-domain cognitive training, which has shown beneficial transfer effects to executive functions in younger, middle-aged and older adults (e.g., Gajewski and Falkenstein, 2012; Baniqued et al., 2015; Gajewski et al., 2017). In keeping with these studies, training-related performance gains became manifest in increased response accuracy. Contrary to our initial hypothesis, however, these accuracy improvements were equivalent in the color and in the word task.

Cognitive training was not only associated with gains in response accuracy, but also with modulations of cue- and target-locked ERPs from pretest to posttest. Whereas the switch positivity in the task preparation period was larger at posttest both in the cognitive training group and the active control group, changes in target-N2 and target-P3 amplitude as well as target-P2 latency were limited to the cognitive training group. Like the training-related benefits in accuracy, these ERP modulations did not depend on the task participants had to perform.

ERPs in the cue-target interval showed modulations from pretest to posttest after either type of training regimen: on posttest stay trials, we observed an increase in switch positivity which was widespread for the cognitive training group, limited to central sites for the active control group, and absent in the passive control group. Modulations of the switch positivity are thought to reflect the degree of task set updating necessary to prepare for the upcoming task (for a review see Karayanidis et al., 2010). An enhancement of switch positivity amplitude from pretest to posttest, notably in stay trials, may thus indicate a training-related boost to the efficiency of maintaining a task set from one trial to the next under mixing conditions. As both the cognitive training group and the active control group showed an increase in switch positivity amplitude, this efficiency gain may be due to an unspecific vitalization associated with participation in a training regimen in general (see Gajewski and Falkenstein, 2015a).

Regarding the target-locked ERPs, the present target-P2 data indicate that multi-domain cognitive training has the potential to accelerate the processing operations underlying the P2, at least under single task conditions. Previous research has linked the P2 to the retrieval of stimulus-response bindings (Kieffaber and Hetrick, 2005; Gajewski et al., 2008; Schapkin et al., 2014). The subsequent target-locked N2 has been associated with cognitive control processes, such as response inhibition, the resolution of response conflict and response selection (for a review see Folstein and Van Petten, 2008). The training-related enhancement of the target-locked N2 we observed in the present study is consistent with our previous reports (Gajewski and Falkenstein, 2012; Gajewski et al., 2017): multi-domain cognitive training was previously associated with an increase in target-locked N2 amplitude in cue-based and memory-based versions of a task switching paradigm featuring three tasks with comparable difficulty levels. Whereas the earlier study showed reliable N2 enhancements mainly for switch trials, the present N2 data as well as data from the later study indicate that cognitive training can also lead to amplitude increases on single and stay trials (see Gajewski and Falkenstein, 2015a,b, for similar N2 enhancements due to habitual physical activity).

Taken together with our previous findings (Gajewski and Falkenstein, 2012; Gajewski et al., 2017), the present ERP data thus corroborate the idea that multi-domain cognitive training can benefit processes involved in response selection, especially in older adults. At the behavioral level, such a training-related improvement of response selection consistently appears to translate into an improvement of response accuracy not only in older adults (the present study, Gajewski and Falkenstein, 2012) but also in younger participants (Gajewski et al., 2017). To be more specific, the present and previous data suggest that target-P2 and target-N2 are related to the retrieval or activation of stimulus-response mappings or task sets (target-P2) and the implementation of these sets (target-N2). The mechanisms reflected in these ERP components are thus essential for successfully executing a task-appropriate reaction, i.e., for pressing the correct response key. When this process is enhanced as indicated, for example, by a negative shift in the target-locked N2, participants are less likely to make an error. In other words, cognitive training enhances the ability to make a correct decision. For the present study, this was the case irrespective of task difficulty and on both switch and non-switch trials.

In contrast to our earlier report on older adults (Gajewski and Falkenstein, 2012), the present data show an amplitude decrease in the time window of the target-locked P3 at frontocentral electrodes following cognitive training. In the earlier study, participants had instead shown a training-related increase in target-locked P3 amplitudes at posterior sites which we interpreted in terms of improved cognitive resource allocation. In younger adults, the P3 usually has a clear-cut parietal focus (for a review see Polich, 2007). For the present older participants, we instead observed cue-locked and target-locked P3s with a more widespread distribution featuring parietal and frontal foci. This is in line with previous age-comparative research which has indicated that older adults may show a broader distribution of the P3 which extends to anterior sites as well and which likely reflects the compensatory increased recruitment of prefrontal brain areas involved in cognitive control (Kray et al., 2005; Eppinger et al., 2007; Adrover-Roig and Barceló, 2010; Kopp et al., 2014). As the target-P3 amplitude decrease observed in the present study was limited to frontocentral electrodes, it could potentially reflect a training-related reduction in this compensatory over-recruitment of frontal areas. Alternatively, the target-P3 amplitude decrease could be related to the enhancement of the preceding fronto-central target-N2 which extends into the P3 peak latency range, or to an even broader negative shift in the time range of both target-locked N2 and P3 (see Gajewski and Falkenstein, 2015a,b). Further ERP research on cognitive training in older adults is needed to clarify this issue.

Previous research on the impact of transfer task difficulty on training-related performance gains is scarce. In one of the few studies on the subject, Brehmer et al. (2011) examined the performance of a cognitive training group and an active control group in a working memory transfer task featuring high and low load conditions. They found that neither group showed performance gains from pretest to posttest in either condition, suggesting that performance was already at ceiling at pretest. The present study instead featured a more difficult transfer task which offered room for improvement from pretest to posttest. Under these conditions, practice effects that were not directly associated with cognitive training depended on task difficulty, whereas genuine training-related performance benefits did not. Likewise, cognitive training benefited performance on single, stay and switch trials to a similar degree. As a result, we were unable to link training-related performance gains to reductions of mixing/general switch costs or specific/local switch costs as previous studies have done (e.g., Gajewski and Falkenstein, 2012). Note, however, that specific/local switch costs in particular were minimal at baseline in the cognitive training group. Any potential impact of multi-domain cognitive training on these costs thus would have been difficult to detect in the present study.

# Conclusions

The present study corroborates and extends our understanding of the neural underpinnings of performance gains associated with multi-domain cognitive training in older adults. A 4-month multi-domain cognitive training had beneficial effects on response accuracy in an untrained binary switch paradigm featuring two tasks with distinct difficulty levels. These training-related performance gains were likely mediated by an increase in target-locked N2 amplitude, an amplitude reduction in the time range of the target-locked P3 and a decrease in target-P2 latency. These ERP modulations indicate benefits to neural processes involved in response selection which resulted in reduced error rates on both switch and non-switch trials. Our findings suggest that multi-domain cognitive training increases slow negative potentials during target processing which enhance the N2 and may additionally reduce the amplitude of a subsequent P3-like component on both switch and non-switch trials and irrespective of task difficulty.

# AUTHOR CONTRIBUTIONS

KK conducted the data analysis and wrote the manuscript. PDG and MF were involved in study conception and data acquisition and revised and approved the manuscript. CF was involved in study conception as well as the acquisition and analysis of the data and revised and approved the manuscript.

# ACKNOWLEDGMENTS

This work was funded by a grant from the German Insurance Association (Gesamtverband der Deutschen Versicherungswirtschaft, GDV). We thank Ludger Blanke for programming the task and technical support and Christiane Westedt, Brita Rietdorf and Pia Deltenre for their help in organizing and conducting the study. This article is based on a reanalysis of data which were first analyzed by CF as part of her dissertation project. The publication of this article was supported by the Open Access Fund of the Leibniz Association and by the Open Access Fund of the Technical University of Dortmund.

## REFERENCES


effects of cognitive, physical, and relaxation training. Front. Hum. Neurosci. 6:130. doi: 10.3389/fnhum.2012.00130


Klauer, K. J. (2008). Denksport Für Ältere—Geistig Fit Bleiben. Bern: Hans Huber.


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Küper, Gajewski, Frieg and Falkenstein. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# No Overt Effects of a 6-Week Exergame Training on Sensorimotor and Cognitive Function in Older Adults. A Preliminary Investigation

#### Madeleine Ordnung<sup>1</sup>, Maike Hoff<sup>1</sup>, Elisabeth Kaminski<sup>1</sup>, Arno Villringer<sup>1,2</sup> and Patrick Ragert<sup>1,3</sup>\*

<sup>1</sup> Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany, <sup>2</sup> Mind and Brain Institute, Charité and Humboldt University, Berlin, Germany, <sup>3</sup> Institute for General Kinesiology and Exercise Science, University of Leipzig, Leipzig, Germany

Several studies investigating the relationship between physical activity and cognition showed that exercise interventions might have beneficial effects on working memory, executive functions as well as motor fitness in older adults. Recently, movement-based video games (exergames) have been proposed as a means to improve cognitive function in older adults. Healthy aging is associated with a loss of cognitive as well as sensorimotor functions. During exergaming, participants are required to perform physical activities while being simultaneously surrounded by a cognitively challenging environment. However, little is known about the impact of exergame training interventions on a broad range of motor, sensory, and cognitive skills. Therefore, the present study aims to investigate the effects of a 6-week exergame training on cognitive, motor, and sensory functions in healthy older participants. For this purpose, 30 neurologically healthy older adults were randomly assigned to either an experimental group (ETG, n = 15, 1 h of training, twice a week) or a control group (NTG, n = 15, no training). Several cognitive tests were performed before and after exergaming in order to capture potential training-induced effects on processing speed as well as on executive functions. To measure the impact of exergaming on sensorimotor performance, a test battery assessing pinch and grip force of the hand, tactile acuity, eye-hand coordination, flexibility, reaction time, coordination, and static balance was additionally administered. While we observed significant improvements in the trained exergame (mainly in tasks that required a high load of coordinative abilities), these gains did not result in differential performance improvements when comparing ETG and NTG. The only exergaming-induced difference was a superior behavioral gain in fine motor skills of the left hand in ETG compared to NTG. In an exploratory analysis, within-group comparisons revealed improvements in sensorimotor and cognitive tasks in ETG, whereas NTG only showed an improvement in a static balance test. Taken together, the present study indicates that even though exergames might improve gaming performance, our behavioral assessment was probably not sensitive enough to capture exergaming-induced improvements. Hence, we suggest using more tailored outcome measures in future studies to assess potential exergaming-induced changes.

#### Edited by:

Claudia Voelcker-Rehage, Technische Universität Chemnitz, Germany

#### Reviewed by:

Tongjian You, University of Massachusetts, USA

Eling D. De Bruin, ETH Zurich, Switzerland

> \*Correspondence: Patrick Ragert patrick.ragert@uni-leipzig.de

Received: 28 October 2016 Accepted: 17 March 2017 Published: 04 April 2017

#### Citation:

Ordnung M, Hoff M, Kaminski E, Villringer A and Ragert P (2017) No Overt Effects of a 6-Week Exergame Training on Sensorimotor and Cognitive Function in Older Adults. A Preliminary Investigation. Front. Hum. Neurosci. 11:160. doi: 10.3389/fnhum.2017.00160

Keywords: exergames, healthy aging, sensorimotor, cognition, video games, aerobic fitness, motor skills

#### INTRODUCTION

In recent decades, great effort has been put into understanding how to promote healthy and successful aging, since a prolonged lifespan has major implications for our health care and social systems.

Typically, aging is accompanied by a decline in cognitive (Park et al., 2002; Verhaeghen and Cerella, 2002; Salthouse, 2004) and sensorimotor functions (Stevens and Choo, 1996; Konrad et al., 1999; Smith et al., 1999; Krampe, 2002; Li and Lindenberger, 2002). However, there are substantial individual differences in how much people are affected by age-related decline, and some people seem to be relatively spared from age-related alterations and disabilities (Nelson and Dannefer, 1992; Morse, 1993; Christensen et al., 1994; Fozard et al., 1994; Stewart et al., 2014). In this regard, successful aging has become a familiar term and is defined by three components: active engagement in life, the absence of disease, and physical and mental fitness (Rowe and Kahn, 1997). Importantly, these components do not act independently but contribute in an interrelated fashion. Accordingly, many studies have emphasized the key role of physical activity in preventing seemingly inevitable age-related chronic diseases as well as neurodegenerative and psychiatric disorders (Sacco et al., 1998; Franklin et al., 2001; Strawbridge et al., 2002; Ravaglia et al., 2008; Ciolac, 2013). Being engaged in physical activity can further contribute significantly to life satisfaction and happiness in later life (Menec, 2003; Liffiton et al., 2012). Moreover, there is compelling evidence that exercise is an effective approach to preventing cognitive decline associated with increasing age (Hertzog et al., 2008; Bherer et al., 2013). For instance, it is well established that aerobic training not only improves motor fitness but also positively influences cognitive performance, such as working memory and executive functions.
Furthermore, these behavioral benefits are accompanied by structural and functional brain adaptations (Kramer et al., 1999; Colcombe et al., 2006; Angevaren et al., 2008; Erickson et al., 2011; Voelcker-Rehage et al., 2011; Voss et al., 2013; Bamidis et al., 2014), such as volume changes in cortical motor and frontal areas as well as in subcortical structures. Interestingly, these changes are not exclusively induced by cardiovascular training. Coordination training combining balance, limb coordination, spatial orientation and reaction time tasks is also effective in modulating brain structure and function in ways associated with improvements in cognition and motor performance (Voelcker-Rehage et al., 2010, 2011). Comparative studies revealed that combining motor and cognitive demands during exercise can even lead to greater enhancements in cognition than training the domains separately (Fabre et al., 2002; Oswald et al., 2006; Lauenroth et al., 2016). It has been argued that enhanced neuronal metabolic processes induced by physical activity can only be exploited efficiently when the brain is simultaneously challenged by cognitive demands (Oswald et al., 2006; Bamidis et al., 2014). Moreover, multi-modal training interventions are thought to resemble real-life demands and therefore to yield higher chances of successful transfer to other tasks and everyday life situations (Lustig et al., 2009). In line with this, recent studies have highlighted the capacity of lifestyle interventions, such as dancing (Kattenstroth et al., 2013; Coubard et al., 2014) or playing video games (Maillot et al., 2012; Pichierri et al., 2012), to enhance sensorimotor and cognitive functions in older adults. Investigating the potential of video games to prevent age-related cognitive decline has attracted considerable research interest in recent years (Green and Seitz, 2015).
It has frequently been shown that cognitive video game training can have beneficial effects on several cognitive functions in older adults, including memory, attention, and reaction time (Lampit et al., 2014; Toril et al., 2014). Whereas classical video games are played sedentarily, movement-based video games (exergames) require the performance of physical activities while the player is simultaneously surrounded by a cognitively challenging environment. Studies have shown, for instance, that exergaming can promote executive functions and cognitive processing speed in older adults (Maillot et al., 2012; Schoene et al., 2013). Furthermore, a recent review reported that various exergame training studies showed beneficial effects on cognitive as well as dual-task performance, which potentially reduces the risk of falls in older adults (Ogawa et al., 2016). Nevertheless, exergame intervention studies examining effects on motor performance have mainly investigated balance, with rather mixed results (Bisson et al., 2007; Nicholson et al., 2015). The same holds true for improvements in muscle strength of the lower and upper limbs (Nitz et al., 2010; Jorgensen et al., 2013). Moreover, benefits have mostly been shown for trained tasks, with only limited transfer to other tests (Bisson et al., 2007; Pichierri et al., 2012; Baniqued et al., 2014; Sato et al., 2014). Video game characteristics might contribute to this, since most of the video games used are tailor-made, representing predominantly highly controlled laboratory conditions and therefore lacking a multi-modal environment that promotes transfer (Lustig et al., 2009; Baniqued et al., 2014). In contrast, commercial video games, which, for instance, include different sport disciplines, require fast reactions and the ability to shift the attentional focus in order to perform appropriate and well-timed movements.
To date, studies examining commercial video games have mostly used different applications of Nintendo®, for instance the Nintendo Wii™ (Nintendo Co. Ltd., Kyoto, Japan). This system is equipped with a hand-held remote controller, and the game is played by performing physical gestures with the upper limbs. More recent consoles use a camera system, for instance the Xbox™ 360 Kinect™ (Microsoft Corp., Redmond, WA), which recognizes gestures and therefore requires whole-body movements for playing (van Diest et al., 2013). Up to now, only a few studies have examined the effects of whole-body exergame training, and they have indicated improvements in balance (van Diest et al., 2016) and lower extremity muscle strength (Sato et al., 2014). In fact, little is known about the impact of a whole-body, multimodal exergame training intervention on a broad range of sensorimotor and cognitive skills, such as aerobic fitness, fine motor skills, tactile acuity, and working memory. Furthermore, sample characteristics differ greatly between studies, ranging from hospitalized and community-dwelling (Jorgensen et al., 2013; Schoene et al., 2013; Sato et al., 2014; Nicholson et al., 2015; van Diest et al., 2016) to independently living sedentary participants (Maillot et al., 2012). It is not yet known whether healthy, active older adults would also benefit from exergame training. Likewise, the designs of previous studies differ considerably in the duration and frequency of the training program and in the standardization of the setting (home-based or supervised in the lab; Maillot et al., 2012; Schoene et al., 2013; van Diest et al., 2016). Typically, studies so far have used training periods of up to 12 weeks, with a frequency of 2–3 sessions per week (Hall et al., 2012; Marston and Smith, 2012).
Interestingly, Colcombe and Kramer (2003) stated that effects of physical training on cognition can already be found after 1 month of training, while longer training periods show substantially larger effects. Therefore, the current study aimed to investigate the effect of a whole-body exergame training intervention using the Xbox™ 360 Kinect™ console over a rather short training period of 6 weeks. Based on the findings of the aforementioned studies, we expected that a multi-domain exergame training combining endurance, coordination and strength with demands on cognitive processing would translate into overall, and therefore generalized, enhancements of a broad range of sensorimotor and cognitive functions. We further wanted to examine whether exergame training benefits a healthy, active sample of older adults. We therefore hypothesized that exergaming would induce significant improvements in sensorimotor and cognitive performance compared with a group receiving no training. We further expected that the exergame training group would show significant practice effects during exergaming and that these online improvements would be associated with baseline performance in the sensorimotor and cognitive tests.

#### MATERIALS AND METHODS

#### Participants

A total of 30 healthy participants were enrolled in the present study. To exclude neurological disease and contraindications with respect to the study procedures, all participants were neurologically examined by a physician. None of the participants were taking centrally acting drugs, and all completed the Mini-Mental State Examination (MMSE; Folstein et al., 1983). We did not include highly skilled musicians, typists, or athletes. Nevertheless, some participants were experienced in playing a musical instrument or did sports on a regular basis, as assessed by a questionnaire (for group characteristics see also **Table 1**). Handedness was assessed using the Edinburgh Handedness Inventory (Oldfield, 1971). According to this questionnaire, three participants were classified as ambidextrous (mean score: 2; range −10 to 26) and were therefore excluded from the analyses of hand function, grip strength, pinch force, and touch sensitivity. The remaining participants were all right-handed (mean score: 91.81; range 70–100). None of the participants reported any experience with exergames. The participants were randomly assigned to a passive no-training group (n = 15, 8 female, mean age: 68.6 ± 4.67 years) or the exergame training group (n = 14, 7 female, mean age: 69.79 ± 6.34 years). All participants gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the local ethics committee of the University of Leipzig (ref no. 376-15-24082015).

#### Experimental Procedure

We used a between-groups pretest-training-posttest design in which participants were randomly assigned to either the exergame training group (ETG) or a passive no-training group (NTG). All participants completed cognitive and sensorimotor tests before and after the respective intervention (see also **Figure 1**). At each time point, the tests were administered in random order over 2 days. Before the pre- and post-tests for both groups, and before each training session for the ETG, attention, fatigue and discomfort were rated on a visual analog scale (VAS) ranging from 1 to 10 (1 = very inattentive to 10 = very attentive; 1 = high fatigue to 10 = low fatigue; 1 = no discomfort to 10 = high discomfort). For the ETG, motivation to train was also rated at the beginning of each training session (1 = very unmotivated to 10 = very motivated). Between the pre- and post-assessments, participants in the ETG performed a total of 12 exergame training sessions over 6 weeks, with 2 sessions per week. Each session lasted 60 min, resulting in a total training time of 12 h. To avoid potential influences of social interaction, each participant trained individually. At the beginning of each session, resting pulse was measured with a pulse oximeter placed on the index finger. To reduce the risk of injury, a standardized 5-min warm-up was performed before the exergame training.

Education: years of school; total score range 1–3 (1 = at least 12 years; 2 = 10 years; 3 = 8 years). MMSE, Mini-Mental State Examination; total score range 0–30; cut-off score for exclusion: <26. BMI, Body Mass Index. Physical activity: average hours of sport activity per week. Music training: average hours of playing a musical instrument per week. All values are given as mean ± standard deviation. ETG, Exergame Training Group; NTG, No Training Group.

#### Outcome Variables

#### Aerobic Fitness and Sensorimotor Performance

#### **Aerobic fitness**

To characterize aerobic fitness, we used the 3-Min Step Test (ACSM, 2010), which measures pulse recovery after a functional activity (step climbing). The test was performed on an aerobic platform (height: 24 cm), and step cadence was controlled with a metronome set to 84 beats per minute (bpm). Prior to the test, resting pulse (RP) was measured on the index finger with a fingertip pulse oximeter (PULOX® PO-200, Novidion GmbH, Germany). Following a demonstration by an instructor, the participants performed the test for 3 min. Strained pulse (ST) was measured immediately after the test and recovery pulse (ReP) 1 min later. For the analysis, we computed the Ruffier performance index (Rodríguez Cabrero, 2015) using the formula [(RP + ST + ReP) − 200]/10. Participants were excluded from the analysis if they could not keep the given cadence or had to stop the test due to exhaustion. For these reasons, data of three participants had to be excluded from the analysis.
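As a worked illustration of the index defined above, the following minimal Python sketch computes the Ruffier score from the three pulse readings and returns no score when a participant meets one of the stated exclusion criteria. The function names and the example pulse values are hypothetical and are not taken from the study data.

```python
def ruffier_index(resting_pulse, strained_pulse, recovery_pulse):
    """Ruffier performance index as defined in the text: [(RP + ST + ReP) - 200] / 10.

    Pulses are in beats per minute. The index grows with every pulse value, so
    lower scores correspond to a smaller overall pulse response.
    """
    return (resting_pulse + strained_pulse + recovery_pulse - 200) / 10


def step_test_score(rp, st, rep, kept_cadence=True, completed=True):
    """Return the index, or None if the participant meets an exclusion criterion."""
    if not (kept_cadence and completed):
        return None  # lost the metronome cadence or stopped due to exhaustion
    return ruffier_index(rp, st, rep)


# Hypothetical readings: RP = 70, ST = 120, ReP = 90 bpm.
print(step_test_score(70, 120, 90))                   # 8.0
print(step_test_score(70, 120, 90, completed=False))  # None
```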

#### **Upper body muscular endurance**

To measure muscular endurance of the upper body, participants lay prone on an exercise mattress with the forehead resting on a wedge pillow. They held a dumbbell in each hand and were instructed to perform a rowing motion with both arms simultaneously (extending and flexing the arms) between two markers. The movement frequency was set by a metronome at 60 bpm. Upper arms and elbows had to form an angle of 90°, lightly touching a marker at the back. During the extension movement, the hands holding the dumbbells had to touch the front marker. The number of repetitions served as the outcome measure. The test was stopped if participants touched the mattress with the dumbbells, could not keep the movement frequency, or were no longer able to touch either marker (front or back). Female participants used 1 kg dumbbells and male participants 2 kg dumbbells. Data of two participants could not be collected due to non-attendance.

#### **Grip strength**

The maximum strength of the hand and forearm muscles was assessed using a hydraulic dynamometer (SEAHAN®, SEAHAN Corporation, S. Korea). Participants sat upright and held the dynamometer with the elbow at the side of the body and the arm flexed at a right angle. They squeezed the dynamometer for 5 s with maximum isometric effort. The left and right hands were each measured three times in randomized order. Grip strength was measured in kg, and the mean of each side was used for the final analysis. Due to non-attendance, data from one participant could not be collected.

#### **Pinch force**

Maximum pinch force of the thumb was evaluated using a hydraulic pinch gauge (SEAHAN®, SEAHAN Corporation, S. Korea). Participants assumed an upright position with the arm forming an angle of 90° and the elbow at the side. They held the pinch gauge between the thumb and the proximal interphalangeal joint of the index finger and squeezed it with maximum force three times with each hand. Each trial lasted 5 s, with a 10-s break between trials. The dependent variable was the mean force of each side in kg.

#### **Motor reaction time**

Motor reaction time was measured using the Ruler Drop Test (Del Rossi et al., 2014). Participants sat sideways on a chair with the forearm resting over the edge of the chair. The bottom of the measuring stick was placed perpendicularly between the participant's thumb and index finger. After an acoustic warning signal, the stick was dropped and the participant had to catch it as fast as possible. The value on the stick directly above the thumb was recorded as the reaction time. The test was performed with the dominant hand. Due to non-attendance, data from one participant could not be collected.
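The text describes reading the reaction time directly from the stick. If a plain ruler marked in centimetres were used instead, the catch distance could be converted to time with the free-fall relation d = g·t²/2, assuming the stick falls from rest and air resistance is negligible. The sketch below illustrates that conversion; it is an assumption for illustration, not the procedure reported by the authors, and the function name is hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def catch_time_ms(drop_distance_cm):
    """Free-fall time for a stick released from rest to fall drop_distance_cm.

    From d = g * t**2 / 2 it follows that t = sqrt(2 * d / g); air resistance is ignored.
    """
    distance_m = drop_distance_cm / 100
    return math.sqrt(2 * distance_m / G) * 1000


# A hypothetical catch 20 cm below the release point.
print(round(catch_time_ms(20.0), 1))  # 201.9
```

Because time grows with the square root of distance, equal distance intervals on such a ruler do not correspond to equal time intervals, which is why calibrated reaction sticks are marked in time units directly.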

#### **Hand motor skills**

Fine motor skills of the upper extremities were evaluated using the Jebsen-Taylor Hand Function Test (JTT; Jebsen et al., 1969). The JTT originally consists of seven subtests, six of which were included in this study: turning over cards, picking up small objects and placing them in a can, picking up small objects with a teaspoon and placing them in a can, stacking checkers, moving large lightweight cans, and moving heavy cans. Participants were instructed to perform the tasks as quickly and accurately as possible. All subtests were performed separately with the left and right hand, with the left hand always tested first. For the analysis, we recorded the completion time for each task and each hand.

#### **Flexibility**

Flexibility was characterized using the Back Scratch Test (Konopack et al., 2008), performed in a sitting position. Participants placed one hand behind the head and over the shoulder and reached down the back as far as possible, with the palm touching the body and the fingers pointing downward. The other arm was placed behind the back below the shoulder, palm facing outward and fingers pointing upward, reaching toward the fingers of the other hand. The arm movements had to be performed without any momentum. The dependent variable was the distance or overlap (in cm) between the two middle fingers: a gap between the fingertips received a negative score, overlapping fingers a positive score, and fingertips exactly touching a score of zero. Due to non-attendance, data from one participant could not be collected.

#### **Static balance**

Static balance was assessed using the Wii Balance Board© (Nintendo Co. Ltd., Kyoto, Japan), which has been shown to measure balance reliably in older adults (Chang et al., 2013). Static stance was assessed under four conditions: standing on both legs with eyes open and with eyes closed, and standing on the right leg and on the left leg with eyes open. Each condition lasted 15 s. Participants were instructed to stand as still as possible on the platform. Prior to testing, each participant's center of pressure was calibrated. For all conditions, average body sway in the anterior-posterior (AP) and mediolateral (ML) directions was analyzed using the STABLE software from pro-WISS© (Bochum, Germany).

#### **Tactile performance**

Tactile performance was measured using Touch-Test™ Sensory Evaluators (Semmes-Weinstein Monofilaments, North Coast Medical, Inc., CA). The test kit contains 20 filaments differing in length and diameter, each of which buckles at a specific force; the applied forces ranged from 0.008 to 300 mN. Each filament was pressed at a 90° angle against the tip of the index finger until it bowed and was held in place for approximately 2 s. Participants closed their eyes and responded as soon as they felt the stimulus. The test followed a staircase procedure: starting with the filament of the highest force, forces were decreased until the participant could no longer perceive the stimulus (lower boundary) and then increased until the participant again reported an indentation (upper boundary). To obtain the absolute touch threshold, this procedure was repeated three times and the resulting six values were averaged.
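The staircase procedure can be sketched as follows. This is a minimal illustration under stated assumptions: each of the three repetitions restarts from the strongest filament, the six boundary values are averaged arithmetically in force units, and both the filament forces and the simulated participant (who feels any force of at least 1.0 mN) are hypothetical.

```python
def staircase_threshold(forces, perceives, repetitions=3):
    """Mean of the lower and upper boundaries from repeated descending/ascending runs.

    forces: filament forces sorted in ascending order.
    perceives: callable(force) -> bool giving the participant's yes/no response.
    """
    boundaries = []
    for _ in range(repetitions):
        idx = len(forces) - 1                       # each run starts at the strongest filament
        while idx > 0 and perceives(forces[idx]):   # descend while the touch is still felt
            idx -= 1
        boundaries.append(forces[idx])              # lower boundary (weakest filament if all are felt)
        while idx < len(forces) - 1 and not perceives(forces[idx]):
            idx += 1                                # ascend until the touch is felt again
        boundaries.append(forces[idx])              # upper boundary
    return sum(boundaries) / len(boundaries)        # 2 * repetitions values averaged


# Hypothetical filament forces (mN) and a simulated participant who feels >= 1.0 mN.
filaments = [0.08, 0.2, 0.4, 0.7, 1.6, 3.9, 5.9]
print(staircase_threshold(filaments, lambda force: force >= 1.0))
```

With this deterministic participant every run yields the same pair of boundaries (0.7 and 1.6 mN), so the estimate is their mean; with real, noisy responses the three repetitions reduce the variability of the threshold.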

#### Cognition

Cognitive performance was measured using subtests from the TAP 2.3 (Test of Attentional Performance, PsyTest, Herzogenrath), a computer-based neuropsychological test battery for analyzing different aspects of attention (Zimmermann and Fimm, 1995). Participants were seated in front of a computer screen with their hands placed in front of two response keys (left and right). To familiarize participants with each subtest, a short practice session followed the instructions. Subtests were presented in randomized order.

#### **Alertness and simple reaction time**

Reaction time was measured under two conditions. The first condition assessed simple reaction time to a visual stimulus (a Greek cross) appearing at random intervals on the screen; participants had to respond as quickly as possible by pressing a key. In the second condition, the visual stimulus was preceded by a warning tone, and the cross followed the tone after a random interval. Mean reaction times (ms) were assessed for both conditions.

#### **Working memory**

Working memory capacity and information flow were evaluated using an N-back task (N = 2), in which a sequence of numbers was presented on the screen. Participants were instructed to press the key whenever the current number was the same as the one presented two positions earlier. We assessed the number of correct trials (out of 15) within 5 min. Due to technical problems, data of one participant were excluded from the final analysis.
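The 2-back rule can be made concrete with a short sketch that marks target positions and classifies key presses. The exact scoring used by the TAP software is not described in the text; here we assume that a correct trial is a key press on a target, and the digit sequence and press positions are hypothetical.

```python
def two_back_targets(sequence):
    """Positions at which the current number matches the number shown two positions earlier."""
    return [i for i in range(2, len(sequence)) if sequence[i] == sequence[i - 2]]


def score_two_back(sequence, key_presses):
    """Compare key-press positions with 2-back targets; return (hits, false_alarms, misses)."""
    targets = set(two_back_targets(sequence))
    presses = set(key_presses)
    return len(targets & presses), len(presses - targets), len(targets - presses)


digits = [3, 7, 3, 5, 3, 5, 1]            # hypothetical sequence
print(two_back_targets(digits))           # [2, 4, 5]
print(score_two_back(digits, [2, 5, 6]))  # (2, 1, 1)
```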

#### **Response inhibition**

This paradigm measured the participants' ability to inhibit a response triggered by an external stimulus. Either an upright cross (Go stimulus) or a diagonal cross (NoGo stimulus) was presented on the screen, and participants were instructed to press a key as fast as possible only when the Go stimulus appeared. The average number of correct responses (out of 20) within 2 min served as the dependent variable.

#### Exergame Training

For the training we used the Microsoft Xbox 360™, a commercial video game console, in combination with the Microsoft Kinect Sensor™. The Kinect sensor is a horizontal bar connected to a motorized base. Equipped with an RGB camera, a depth sensor and a multi-array microphone, it provides full-body 3D motion capture and thus enables exergaming via whole-body movements without any game controller. The Kinect sensor, connected to a screen, was placed 2 m in front of the participants. During training, the sensor tracked the participants' movements and thereby allowed them to control their avatars (graphical representations of the user) in the game. Every training session included different sport games from the commercial video game Summer Stars 2012 (DEEP SILVER GmbH, Munich, Germany). Based on the motor proficiency they primarily required, the games were categorized as endurance (freestyle and butterfly swimming, hammer throwing), strength (hurdles, javelin throwing, 100 meter running), or coordination (trampoline, high diving, archery, mountain biking) disciplines. Each participant played eight disciplines twice, in a fixed order. Gaming scores for each discipline and session were recorded to examine practice effects. To promote adherence and enjoyment, 100 meter running and butterfly swimming were replaced by hurdles and freestyle swimming after 6 training sessions, because they require similar skills. To prevent overexertion, pulse was monitored throughout each training session, being measured before and immediately after every discipline. Each training session was supervised by an instructor who ensured the participants' safety and the correct performance of the required movements. For this purpose, participants practiced only half of the training program during the first week of training.
The participants in the NTG were instructed to keep their usual lifestyle over the entire time of the study.

#### Data Analysis

Statistical analyses were performed using the Statistical Package for the Social Sciences (IBM SPSS v22); figures were created using RStudio 3.3.1 (RStudio Team, 2015). As an initial step, datasets were checked for normality using the Kolmogorov-Smirnov test. Because behavioral data were not normally distributed for some outcome measures, we used non-parametric tests to evaluate within- and between-group performance improvements, with effect sizes expressed as r. VAS ratings were compared using a repeated-measures ANOVA (rmANOVA) with the factors TIME and GROUP, and sample demographics were compared using two-sample t-tests. Effect sizes are reported as partial eta squared (η²p) for the ANOVA and as Cohen's d for the t-tests.
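The parametric effect sizes can be related to their test statistics with standard conversions: d = t·√(1/n₁ + 1/n₂) for two independent groups, and η²p = F·df₁/(F·df₁ + df₂). The text does not state how the values were computed, so the sketch below is an assumption. With n₁ = 14 and n₂ = 15, however, it reproduces the d reported for age and the η²p reported for the attention interaction in the Results. The function names are hypothetical.

```python
import math


def cohens_d_from_t(t, n1, n2):
    """Cohen's d for two independent groups, recovered from the t statistic."""
    return t * math.sqrt(1 / n1 + 1 / n2)


def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# Values taken from the demographics and VAS results (ETG n = 14, NTG n = 15).
print(round(cohens_d_from_t(-0.57, 14, 15), 2))    # age: -0.21
print(round(partial_eta_squared(7.17, 1, 25), 3))  # attention, TIME x GROUP: 0.223
```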

One participant was excluded from the final analysis for attending fewer than 80% of the training sessions. Therefore, the data of 29 participants (mean age: 69.17 ± 5.47 years; range 60–78 years; 15 females) were included in the final analysis.

#### Group Comparisons of Sensorimotor and Cognitive Performance Measures

To identify potential pre-training differences, we compared the baseline scores of both groups in the sensorimotor and cognitive performance measures using the Mann-Whitney U-test for two independent samples. Because of significant pre-training differences between the groups, we then computed for each subtest a gain score representing the percentage performance change, using the following formula:

$$\text{gain} = \frac{\text{post} - \text{pre}}{\text{pre}} \times 100$$

where pre is the raw test score before the training period and post is the raw test score after the training period.

This ensured that performance gains were evaluated independently of potential baseline differences between groups. To examine offline effects induced by the exergame training, gain scores were subsequently compared between groups using the Mann-Whitney U-test for two independent samples.
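The gain score and the between-group comparison can be sketched as follows. This minimal Python version uses the large-sample normal approximation of the Mann-Whitney test without tie or continuity correction. Statistical packages typically apply a tie correction, so z may differ slightly when ties are present. It also computes r = |z|/√N. For timed tests such as the JTT, a faster post-test yields a negative gain under this formula, and the sign must be read accordingly. Function names and example values are hypothetical.

```python
import math


def gain_score(pre, post):
    """Percentage change from pre to post: (post - pre) / pre * 100."""
    return (post - pre) / pre * 100


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """U for sample x, normal-approximation z (no tie correction) and r = |z| / sqrt(N)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, z, abs(z) / math.sqrt(n1 + n2)


# Hypothetical gain scores (%) for two small groups, not study data.
u, z, r = mann_whitney([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(u, round(z, 3), round(r, 2))
```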

#### Practice Effects and Baseline Dependency of Exergaming Performance

To rule out confounding influences on gaming performance, we analyzed the average VAS scores before and after each training session using paired t-tests, as well as motivation to train. To characterize online training effects, gain scores (see formula above) were calculated from the recorded raw game scores of the first (pre) and last (post) session for each sport discipline. Statistical evaluation was carried out using the one-sample Wilcoxon signed-rank test. Bonferroni correction was applied where necessary to account for multiple comparisons. Within- and between-group comparisons were assessed at the 5% significance level.
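A minimal sketch of the one-sample Wilcoxon signed-rank test on gain scores (H₀: median gain = 0) follows. It uses the normal approximation without tie or continuity correction, and adds a Bonferroni check that compares each p value with α/m for m tests. It illustrates the technique rather than reproducing the SPSS computation, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_one_sample(gains, mu=0.0):
    """Signed-rank test of H0: median == mu. Returns (z, two-sided p, r = |z| / sqrt(n))."""
    diffs = [g - mu for g in gains if g != mu]  # zero differences are discarded
    n = len(diffs)
    if n == 0:
        raise ValueError("all values equal mu; the test is undefined")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    z = (w_plus - n * (n + 1) / 4) / math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the standard normal
    return z, p, abs(z) / math.sqrt(n)


def bonferroni_significant(p_values, alpha=0.05):
    """Flag p values that remain significant after Bonferroni correction (alpha / m)."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]
```

As a usage example, 14 distinct, strictly positive gains give the largest statistic attainable for n = 14 under this approximation, z ≈ 3.30 with r ≈ 0.88.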

#### RESULTS

#### Demographics

According to independent-samples t-tests, there were no significant between-group differences in age [t(27) = −0.57, p = 0.569, d = −0.21], years of education [t(27) = −1.58, p = 0.125, d = −0.59], MMSE scores [t(27) = −1.45, p = 0.158, d = −0.54], body mass index [t(20) = 0.44, p = 0.665, d = 0.19], hours of physical activity [t(27) = 0.14, p = 0.894, d = 0.05] or hours of playing a musical instrument per week [t(27) = 0.70, p = 0.491, d = 0.27] (see also **Table 1** for mean values). The rmANOVA revealed no group differences in fatigue [F(1, 25) = 1.23, p = 0.279, η²p = 0.047] or discomfort [F(1, 25) = 0, p = 0.946, η²p = 0.001] but a significant TIME × GROUP interaction for attention [F(1, 25) = 7.17, p = 0.013, η²p = 0.223]. Nevertheless, given the mean values (see also **Table 2**), we do not believe that attention affected the post-test performance of the NTG.

#### Group Comparisons of Sensorimotor and Cognitive Performance Measures

We found significant between-group baseline differences in JTT performance of the left hand (Mann-Whitney U-test for two independent samples, ETG Mdn: 5.91 s, NTG Mdn: 5.34 s, Z = −3.275, p = 0.001, r = 0.61) and in the touch test of the right hand (ETG Mdn: 3.32 mN, NTG Mdn: 3.03 mN, Z = −2.389, p = 0.016, r = 0.44). Please see **Table 3** for details on all variables tested. When comparing the gain scores of both groups, we found significantly greater improvements for the ETG in JTT performance of the left hand (ETG Mdn: 14.747%, NTG Mdn: 2.538%, Z = −3.230, p = 0.001; see also **Figure 2**). None of the other variables differed significantly between groups. Descriptive and inferential statistics for all variables are given in **Table 4**.

#### TABLE 2 | Visual Analogue Scale (VAS).


A visual analog scale was used to assess attention, fatigue and discomfort before the pre-training and post-training measurements. Attention scale ranging from 1 to 10 (1 = very inattentive to 10 = very attentive); fatigue scale ranging from 1 to 10 (1 = high fatigue to 10 = low fatigue); discomfort scale ranging from 1 to 10 (1 = no discomfort to 10 = high discomfort). All values are expressed as mean ± standard deviation. ETG, Exergame Training Group; NTG, No Training Group.

#### Within-Group Comparisons of Sensorimotor and Cognitive Performance Measures

Within-group comparisons were performed in an exploratory fashion to investigate potential training effects and their effect sizes in the ETG and NTG separately. The ETG showed significant performance improvements from baseline to post-measurement in pinch force of both hands (Wilcoxon test for two related samples; Pinch Force Test right: Mdn pre: 6.16 kg, Mdn post: 7.40 kg, Z = −2.542, p = 0.011, r = 0.68; Pinch Force Test left: Mdn pre: 6.61 kg, Mdn post: 6.95 kg, Z = −2.982, p = 0.003, r = 0.80), in fine motor function of both hands as assessed with the JTT (JTT right hand: Mdn pre: 4.91 s, Mdn post: 4.62 s, Z = −3.296, p < 0.001, r = 0.88; JTT left hand: Mdn pre: 5.91 s, Mdn post: 5.27 s, Z = −3.107, p = 0.002, r = 0.83), in static balance with eyes closed (COP AP: Mdn pre: 7.05 mps, Mdn post: 6.00 mps, Z = −2.010, p = 0.044, r = 0.54; COP ML: Mdn pre: 2.90 mps, Mdn post: 2.10 mps, Z = −1.992, p = 0.046, r = 0.53) and in two cognitive assessments (alertness: Mdn pre: 273.75 ms, Mdn post: 241.10 ms, Z = −2.291, p = 0.022, r = 0.61; simple reaction time: Mdn pre: 274.55 ms, Mdn post: 251.60 ms, Z = −2.040, p = 0.041, r = 0.55). In contrast, the NTG showed a significant improvement in only one assessment of static balance (COP AP: Mdn pre: 6.60 mps, Mdn post: 6.00 mps, Z = −2.786, p = 0.005, r = 0.72). Please see **Table 5** for details.

#### Practice Effects and Baseline Dependency of Exergaming Performance

Overall adherence to the training was 97% (163 of 168 sessions; 14 participants × 2 sessions per week × 6 weeks). Paired t-tests showed no significant differences in the ETG's level of attention [t(13) = −1.94, p = 0.075, d = 0.47], fatigue [t(13) = −2.01, p = 0.065, d = 0.56] or discomfort [t(13) = 0.66, p = 0.520, d = 0.09] before vs. after each training session. On average, participants reported high motivation to train at the beginning of each session (mean: 8.77 ± 0.84). Participants of the ETG showed significant online practice improvements in six of the disciplines. Two of the strength-dominated and all of the coordination-dominated disciplines already showed significant performance improvements after the first half of the training period (Wilcoxon signed-rank test for one sample; hurdles: Mdn middle session: 6.00% improvement, Z = 2.551, p = 0.011, r = 0.68; javelin throwing: Mdn middle session: 14.69% improvement, Z = 2.794, p = 0.005, r = 0.75; archery: Mdn middle session: 23.29% improvement, Z = 3.110, p = 0.002, r = 0.83; high diving: Mdn middle session: 24.68% improvement, Z = 2.605, p = 0.009, r = 0.69; trampoline: Mdn middle session: 7.30% improvement, Z = 3.269, p = 0.001, r = 0.87; mountain biking: Mdn middle session: 19.08% improvement, Z = 3.233, p = 0.001, r = 0.86), whereas there were no significant in-game improvements in the endurance-based disciplines. For a detailed description see **Table 6**.

#### TABLE 3 | Baseline characteristics of both groups.

ETG, Exergame Training Group; NTG, No Training Group; Min, Minimum; Max, Maximum; COP, Center of pressure deviation; AP, Anterior-Posterior; ML, medio-lateral; EO, eyes open; EC, eyes closed; U, test statistic of the Mann-Whitney U nonparametric test for the comparison of two independent samples; effect size r = |z|/√n; r = 0.10: small effect, r = 0.30: medium effect, r = 0.50: large effect. \*Significant finding.

#### DISCUSSION

The main objective of the current study was to investigate whether exergame training can induce changes in cognitive as well as sensorimotor performance in healthy older adults. Recent exergame studies addressed this question using rather long training periods and specific tailor-made training regimes. Hence, our main aim was to examine the effects of a multi-modal video game combining endurance, coordination, strength and cognitive demands after a short training period of 6 weeks. When comparing performance gains between ETG and NTG, we observed significant exergaming-induced improvements in fine motor skills of the left hand. Participants in the intervention group showed significant improvements in online gaming performance across the training sessions for most of the strength-based and all of the coordination-based disciplines. However, apart from left-hand fine motor performance, these online gains, which were most pronounced in disciplines with high coordinative demands, did not translate into differential improvements in the assessed cognitive and sensorimotor functions when comparing ETG and NTG.

In an exploratory analysis, within-group comparisons revealed improvements in several sensorimotor and cognitive tasks in the ETG, whereas the NTG improved only in one static balance test. Taken together, the present study indicates that although exergames may improve gaming performance, our behavioral assessment was probably not sensitive enough to capture exergaming-induced improvements. The extent to which training parameters such as duration and frequency, or characteristics of the study population, contribute to the absence of effects on cognition and motor parameters cannot be tested with the given study design. However, these aspects are discussed in the following sections.

score. When comparing the relative gain scores between both groups, we observed significantly greater improvements for the ETG compared with the NTG.

To date, the beneficial effects of exergames on fine motor skills have primarily been shown in clinical studies (Sin and Lee, 2013; Pietrzak et al., 2014; da Silva Ribeiro et al., 2015; Wittmann et al., 2016). For example, Paquin et al. (2015) showed that a 15-min Wii-based training over 8 weeks enhanced upper limb fine motor performance in stroke patients, as assessed by the JTT, which was also used in the present study. In a study by McNulty et al. (2011), stroke patients performed Wii-based training over 10 consecutive days and improved significantly in the Wolf Motor Function Test (WMFT). The authors concluded that improvements in Wii gaming skills might have generalized to more functional tasks as assessed by the WMFT. The WMFT includes several functional, timed tasks that quantify upper extremity motor ability and is therefore similar to the JTT (Jebsen et al., 1969; Wolf et al., 2001). In the present study, the ETG completed the JTT significantly faster after 6 weeks of exergame training compared to the NTG, at least with the left hand. Apart from video game training, other lifestyle interventions have also been shown to enhance motor performance in healthy older adults (Van Roie et al., 2010; Wu et al., 2010; Kattenstroth et al., 2013; van het Reve and de Bruin, 2014; Eggenberger et al., 2015, 2016). For example, Kattenstroth et al. (2013) conducted a 6-month dance intervention in healthy older adults and found beneficial effects on hand-arm steadiness, control precision, and wrist movements compared to a passive control group. Like dancing, playing exergames involves complex upper-body movements, so it is plausible that exergaming also translates into enhanced fine motor skills.
The exergame training used in the present study required intensive use of the upper limbs, both for the strength- and endurance-dominated disciplines and for finely adjusted inter-limb coordination of the upper extremities. Hence, the improved fine motor performance after 6 weeks of exergame training could be interpreted as a transfer of newly acquired motor skills to functional tasks that were not directly practiced during exergaming. Interestingly, exergame-induced effects were only visible as significant performance improvements of the non-dominant left hand. Transfer from bilateral training to unilateral improvements has been documented in previous studies (Schulze et al., 2002; Burgess et al., 2007; McCombe Waller and Whitall, 2008; Hinder et al., 2013). Schulze et al. (2002) found that bimanual training on a pegboard task led to improvements in unimanual performance, although with no difference between the left and right hands. Middleton et al. (2013) demonstrated that after 2 weeks of Wii-based training, young participants performed significantly better with their non-dominant left hand in a bimanual, eye-hand coordination task simulating surgical procedures. One could speculate that task characteristics explain this effect, since surgery-like movements require complex functional skills and are therefore more demanding, both visuomotor and cognitive, than a simple motor task. In older adults, unimanual and bimanual tasks have been shown to require more neural resources than in young adults, a phenomenon known as neural "overactivation" (Mattay et al., 2002; Ward, 2006; Heuninckx et al., 2008; Goble et al., 2010). Goble et al. (2010) showed that older adults exhibited more extensive brain activity in several motor and frontal regions during a wrist coordination task than younger adults.
Moreover, this greater activation in the supplementary motor area and the somatosensory cortex was positively correlated with higher coordination demands during antiphasic movements and can therefore be considered a marker of increased task complexity. Hinder et al. (2013) demonstrated strong transfer between bimanual and unimanual contexts in both young and older adults, but a release of intracortical inhibition only in older participants. During the exergame training, participants had to perform various bimanual movements. Thus, one could argue that greater bihemispheric activation while performing these movements during exergaming might explain the significant JTT improvements for the left hand. Nevertheless, the absence of effects in the dominant hand might also reflect its already high skill level compared with the less-used hand.

Moreover, Kattenstroth et al. (2013) attributed the enhanced fine motor skills not only to improved sensorimotor coordination but also to muscle strength. Our results corroborate this assumption: the within-group comparison of the ETG showed significant improvements after 6 weeks of exergame training not only in fine motor skills of the hand but also in pinch force of both hands.

ETG, Exergame Training Group; NTG, No Training Group; Min, Minimum; Max, Maximum; COP, Center of pressure deviation; AP, Anterior-Posterior; ML, medio-lateral; EO, eyes open; EC, eyes closed; Effect size r = |z|/√n; r = 0.10: small effect; r = 0.30: medium effect; r = 0.50: large effect. \*Indicates a significant finding.

However, since the direct comparison between ETG and NTG did not reach significance, our within-group findings need to be interpreted with caution, as the sample size might have been too small to detect significant between-group effects. Effects of exergame training on strength have been reported mainly in clinical studies, with mixed results (Yavuzer et al., 2008; McNulty et al., 2011; Lee, 2013; Sin and Lee, 2013; Laver et al., 2015). For instance, Lee (2013) reported that after 6 weeks of exergame training stroke patients improved in upper body muscle strength, which was, however, not significantly different from a control group that received occupational therapy. To date, only a few studies have examined exergame training in healthy populations, with preliminary results indicating that video game training can promote lower limb muscle strength in middle-aged women (Nitz et al., 2010) and older adults (Jorgensen et al., 2013). However, since we assessed only upper limb muscle strength, this assessment might not have been specific enough to capture differential training effects between the ETG and NTG.

The investigated exergame training had no differential effects on aerobic fitness when comparing ETG with NTG. This contrasts with Maillot et al. (2012), whose older participants improved in aerobic fitness, as indicated by reduced maximum and mean heart rate in a 6-min walking test after 12 weeks of Wii-based training. Whereas Maillot et al. (2012) used mostly stationary games that did not require lifting a foot off the ground, the present study included sport disciplines that demanded pronounced effort in running (100 meter running) or walking on the spot while performing additional arm movements (javelin throwing, freestyle swimming, hammer throwing).

The effects of classical aerobic training regimens have been investigated intensively in recent years, with results indicating strong associations between cardiorespiratory fitness and cognitive performance (Kramer et al., 1999; Colcombe et al., 2006; Erickson et al., 2011; Leckie et al., 2014). Only a few studies have examined this relationship using exergame training interventions. These studies showed that exergames benefit executive functions, processing speed, and dual-task performance, consistent with findings on aerobic training (Maillot et al., 2012; Schoene et al., 2013). Interestingly, in our study we found no significant between-group evidence for a transfer of exergaming to response inhibition (in a Go/NoGo task) or to reaction time.

However, the within-group comparison in the ETG alone revealed significant performance improvements in alertness and simple reaction time from before to after the exergame training; this effect was absent in the NTG. Because these effects did not reach significance in the between-group comparison, one could speculate that a training period of 6 weeks might have been too short to induce distinct changes in cognition, or that the selected cognitive tests were not sensitive enough to capture them. Sample characteristics might also have contributed substantially: whereas participants in the aforementioned studies led a sedentary lifestyle, our sample consisted of older adults who were healthy and physically active for their age.

#### TABLE 5 | Within-group comparison from pre to post.

Wilcoxon test for two dependent samples; W, test statistic of the Wilcoxon test; ETG, Exergame Training Group; NTG, No Training Group; COP, Center of pressure deviation; AP, Anterior-Posterior; ML, medio-lateral; EO, eyes open; EC, eyes closed; Effect size r = |z|/√n; r = 0.10: small effect; r = 0.30: medium effect; r = 0.50: large effect. \*Indicates a significant finding.
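The Table 5 footnote defines the effect size of the Wilcoxon tests as r = |z|/√n. The sketch below shows one way to obtain W, z, and r from paired pre/post scores in plain Python. It is an illustration under stated assumptions, not the authors' analysis code: z comes from the large-sample normal approximation without tie or continuity correction, zero differences are dropped, W is taken as the smaller of the two signed-rank sums, n is taken as the number of non-zero paired differences (conventions for n differ between sources), and the labels follow Cohen's benchmarks of 0.10, 0.30, and 0.50. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks of values, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(pre, post):
    """Return (W, z, r) for paired samples via the normal approximation.

    Assumptions: zero differences are dropped, no tie or continuity
    correction, W = min(W+, W-), and n = number of non-zero differences.
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(rank for rank, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean_w) / sd_w
    r = abs(z) / math.sqrt(n)  # effect size as defined in the table footnote
    return w, z, r


def effect_label(r):
    """Classify r with Cohen's benchmarks (0.10 small, 0.30 medium, 0.50 large)."""
    if r >= 0.5:
        return "large"
    if r >= 0.3:
        return "medium"
    if r >= 0.1:
        return "small"
    return "negligible"
```

For small samples, an exact test (as offered, for example, by SciPy's `scipy.stats.wilcoxon`) is usually preferable to the normal approximation used here; the approximation is kept because the footnote's formula is expressed in terms of z.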

Poor balance and the risk of falling are known to be associated with executive functions and attention, since keeping balance requires the integration of somatosensory information and involves the control and shifting of attentional resources (Woollacott and Shumway-Cook, 2002; Granacher et al., 2008; Yogev-Seligmann et al., 2008; Mirelman et al., 2012; Kearney et al., 2013). Consequently, recent research suggests that effective balance training should include more complex exercises as well as cognitive demands (Halvarsson et al., 2015). The exergame sport disciplines used here required highly demanding movements covering multiple domains within a cognitively challenging environment. Interestingly, after 6 weeks we found significant performance improvements in static balance tasks in both the ETG and the NTG. However, this effect was visible only in the within-group pre-post comparisons and did not reach significance in the between-group comparisons.

Studies suggest that static (e.g., quiet stance) and dynamic postural control (e.g., perturbation or locomotion) are governed by different neuromuscular mechanisms. Kang and Dingwell (2006) found significant differences, and no correlation, between the dynamic stability properties of walking and standing. Interestingly, while participants in the study of Bisson et al. (2007) improved in functional balance, as assessed by the Community Balance and Mobility Scale, no significant changes in postural sway during quiet stance were found. A meta-analysis on exergaming for balance training pointed out that instrumental balance assessments often fail to detect small changes, whereas more functional clinical tests are too global. van Diest et al. (2013) therefore concluded that only the combination of functional and objective-instrumental balance measures reliably captures the effects of exergame interventions on balance. In the present study, static balance was assessed objectively and instrumentally using a force plate. Consequently, the absence of specific training-induced effects on balance performance could be due to the applied test, which might not have been able to detect slight improvements in our training participants.

… disciplines, the scores for the MS therefore represent the performance from the 4th session. Scores from the LS represent the performance from the 6th session. For the remaining disciplines, MS refers to the score from the 6th session and LS to the 12th session. All values are given in percent. After Bonferroni correction, a p-value of p < 0.017 was considered significant.
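The footnote's threshold of p < 0.017 is consistent with a Bonferroni correction of α = 0.05 over three comparisons (0.05/3 ≈ 0.0167). Because the beginning of that footnote is missing, the number of comparisons is an assumption here: the sketch below takes them to be the three pairwise contrasts among a first session (the label "FS" is hypothetical) and the middle (MS) and last (LS) sessions named in the footnote.

```python
from itertools import combinations


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold under a Bonferroni correction."""
    return alpha / n_comparisons


# Assumed design: all pairwise contrasts among three sessions.
# "FS" (first session) is a hypothetical label; MS and LS appear in the footnote.
sessions = ["FS", "MS", "LS"]
comparisons = list(combinations(sessions, 2))  # FS-MS, FS-LS, MS-LS
threshold = bonferroni_alpha(0.05, len(comparisons))
print(len(comparisons), round(threshold, 4))  # prints: 3 0.0167
```

Rounded to three decimals, 0.0167 gives the 0.017 reported in the footnote.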

The sense of touch is essential for daily life, since sensorimotor performance such as keeping balance or manual motor function depends on afferent tactile information (Lord et al., 1991; Tremblay et al., 2003). To date, the trainability of tactile performance has mostly been investigated with passive stimulation paradigms such as tactile coactivation, in which small skin portions of the finger are directly stimulated for a few hours. While studies using this procedure demonstrated that tactile acuity can be restored in older adults (Dinse et al., 2006; Kalisch et al., 2008), the potential of exercise interventions remains largely unexplored. One study reported that an enriched training environment in the form of dancing activities over 6 months can induce changes in tactile performance in healthy elderly people (Kattenstroth et al., 2013). Likewise, exergames impose simultaneous sensorimotor and cognitive demands and can therefore be considered a form of environmental enrichment. Nevertheless, after 6 weeks of exergame training, the ETG in the present study did not show improvements in tactile performance. This result is in line with the study of Nitz et al. (2010), which found no significant changes after 10 weeks of Wii-based training in middle-aged women. Interestingly, long-term practice of Tai Chi, a whole-body sensory-attentional exercise, has been proposed to be associated with superior tactile skills (Kerr et al., 2008, 2016). Hence, exercise interventions may well benefit tactile performance, but a training duration of 6 weeks might have been too short to induce effects, or the assessment procedure used was not specific enough to capture training-induced improvements in tactile performance in older adults.

# Contextual Interference and Timing of Post-Assessments Crucial for Capturing Training-Induced Effects

In the present study, we were particularly interested in whether a multi-domain exergame training could foster transfer to untrained tasks, resulting in significant improvements across a variety of sensorimotor and cognitive assessments. Lustig et al. (2009) reviewed different training types with respect to their effectiveness in improving cognitive functioning in older adults. They pointed out that multi-modal approaches often yield broad benefits, albeit with relatively small effect sizes. Most of the reviewed multi-modal studies did not include longitudinal follow-up assessments and therefore cannot exclude the possibility that results were influenced by interference. The contextual interference effect states that learning multiple tasks leads to poorer performance during initial practice but to superior gains in subsequent retention tests and in transfer to untrained tasks (Magill and Hall, 1990). Numerous studies investigating simple motor paradigms (Lee and Magill, 1983; Albaret and Thon, 1998; Giuffrida et al., 2002; Sidaway et al., 2016) or more complex real-life motor skills (Bortoli et al., 1992; Hall et al., 1994; Sherwood, 1996; Tsutsui et al., 1998; Babo et al., 2008) concluded that random, compared with blocked, practice of multiple tasks seems to promote learning and transfer more efficiently. In the exergame training used in the present study, participants played different sport disciplines, each of which combined sensorimotor and cognitive demands in different proportions. Hence, the training procedure resembled a random practice design, and we therefore expected the exergame training to translate into pronounced gains in the post-tests. After 6 weeks of exergame training, however, we found improvements only in fine motor skills of the left hand in the ETG, with no broader enhancements. 
In general, during exercise various peripheral and central physiological processes occur and unfold on different time scales (Adkins et al., 2006). The timing of post-test assessments therefore seems crucial for capturing training-induced changes. Results regarding the interval between acquisition and retention tests are conflicting. John dos Santos et al. (2014) confirmed, 24 h after training, superior performance of a group that practiced a dart-throwing task in random order compared with a blocked-practice group. However, no significant differences in radial distance from the dart to the inner bull were found 7 and 30 days after training. In contrast, in a study by Pauwels et al. (2014), the interleaved practice group still outperformed the blocked group 1 week after training in a bimanual tracking task. Moreover, superior retention performance has been found even 2 weeks after training of a throwing task (Granda Vera and Montilla, 2003). According to Lee and Magill (1983), better retention following interleaved practice can be explained by higher cognitive demands during the acquisition phase, since action plans have to be reconstructed constantly. In the present study, post-tests of motor performance were administered 7 days after the end of the exergame training intervention. Although we had no control group for comparing contextual practice effects, one could speculate that interference between the single tasks occurred, mediated by an information-processing overload. Accordingly, the post-test measurements may have been administered too late or too early to capture overall changes in the sensorimotor and cognitive variables.
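The distinction between blocked and random practice discussed above can be made concrete by generating both schedules from the same set of trials. The sketch below is illustrative only: it uses the four disciplines named in the text, but the number of repetitions, the helper names, and the switch count (a crude, hypothetical proxy for contextual interference, not a standard measure) are assumptions, and the actual order of disciplines within the exergame sessions is not reported.

```python
import random
from collections import Counter

# Disciplines named in the text; the order actually used in the study is not reported.
DISCIPLINES = ["100 m running", "javelin throwing", "freestyle swimming", "hammer throwing"]


def blocked_schedule(tasks, reps):
    """All repetitions of one task before moving to the next (low interference)."""
    return [task for task in tasks for _ in range(reps)]


def random_schedule(tasks, reps, seed=None):
    """The same trials as the blocked schedule, shuffled (high interference)."""
    trials = blocked_schedule(tasks, reps)
    random.Random(seed).shuffle(trials)
    return trials


def task_switches(schedule):
    """Number of consecutive trial pairs that differ (hypothetical interference proxy)."""
    return sum(a != b for a, b in zip(schedule, schedule[1:]))


if __name__ == "__main__":
    blocked = blocked_schedule(DISCIPLINES, reps=3)
    shuffled = random_schedule(DISCIPLINES, reps=3, seed=1)
    print("blocked switches:", task_switches(blocked))  # prints: blocked switches: 3
    print("random switches:", task_switches(shuffled))
```

Both schedules contain identical trials, so any difference in learning can be attributed to order alone; this simple shuffle does not cap consecutive repetitions, which some random-practice protocols do.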

# Limitations

As a consequence, one limitation of the present study is that we did not examine performance changes in retention tests 2 weeks after training completion. Therefore, no conclusions can be drawn about the persistence or the dynamics of the behavioral benefits induced by the exergame training; to draw a comprehensive picture, future studies should include long-term retention tests. Furthermore, we did not enroll additional comparison groups to distinguish which domains potentially drove the improvements. With the passive control group, we accounted for test-retest effects. When investigating the effects of exergaming on motor and cognitive outcome variables, future studies should include a physical activity group and a cognitive training group to control for the contribution of each of those domains. Hall et al. (2012) reviewed studies on exergaming in the elderly and emphasized its high potential for physical and mental health outcomes such as enhanced physical mobility, balance, attention, and information processing. Nevertheless, concerns have been raised about methodological issues, especially differences in the frequency and duration of interventions and the low number of participants. Regarding the duration of physical training interventions, Colcombe and Kramer (2003) reported that short training periods of 1 to 3 months already produced moderate effects on cognitive function, while longer training periods produced substantially larger effects. Accordingly, studies applying exergame training within this period were able to show significant effects on cognition as well as on motor performance (Nitz et al., 2010; Maillot et al., 2012). However, these studies included participants with a predominantly sedentary lifestyle. In the present study, the sample consisted of active older adults, and it is therefore conceivable that the 6-week training period was too short to induce overall effects in this population.

However, since we were primarily interested in an exploratory investigation of a commercially available exergame that is not specifically designed to enhance sensorimotor and cognitive functions, it is possible that the training benefits were confined to significant in-game improvements and did not translate into improvements on the specific assessments of sensorimotor and cognitive functions. Finally, for some of our outcome measures (e.g., endurance tests, back scratch test, COP measures), no information on test reliability is available. Hence, our results should be interpreted with caution, especially because our study might in fact be underpowered.

#### CONCLUSIONS

In summary, the present study was the first to address whether a whole-body exergame training can promote a broad range of sensorimotor and cognitive functions in healthy and active older adults. Interestingly, we observed significant in-game improvements during the exergame training, mainly in disciplines requiring high coordination skills.

The only exergaming-induced between-group difference, however, was a superior gain in fine motor skills of the left hand. When assessing the ETG alone, we saw significant improvements in pinch force, fine motor skills, static balance, and cognitive function, whereas the NTG alone showed significant improvements only in static balance. In conclusion, we found evidence that 6 weeks of exergame training results in improved gaming performance, but our behavioral assessment was probably not sensitive enough to capture broader improvements in sensorimotor and cognitive function in older adults.

Therefore, future studies should control for potential driving variables while carefully considering sample characteristics. Additionally, more knowledge is needed about the neuronal adaptations induced by exergaming, which could be highly relevant for preventing pathological age-related brain alterations. Future studies should therefore include neuroimaging assessments to identify the key regions and mental processes engaged in multi-domain exergames.

# AUTHOR CONTRIBUTIONS

MO, AV, and PR designed the study. MH and EK helped during data acquisition. MO analyzed the data. MO, MH, EK, and PR wrote the manuscript. All authors were involved in discussing data.

## FUNDING

MO received a PhD student stipend from the Max Planck International Research Network on Aging (MaxNetAging).

# ACKNOWLEDGMENTS

We wish to thank Sophia Rose, Pauline Bassler, Claudia Predel, Rouven Kenville as well as Tom Maudrich for helpful support in conducting the study.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Ordnung, Hoff, Kaminski, Villringer and Ragert. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Older Adults with Mild Cognitive Impairments Show Less Driving Errors after a Multiple Sessions Simulator Training Program but Do Not Exhibit Long Term Retention

Normand Teasdale<sup>1,2</sup>\*, Martin Simoneau<sup>1,2</sup>, Lisa Hudon<sup>1</sup>, Mathieu Germain Robitaille<sup>1</sup>, Thierry Moszkowicz<sup>3</sup>, Denis Laurendeau<sup>3</sup>, Louis Bherer<sup>4,5</sup>, Simon Duchesne<sup>6,7</sup> and Carol Hudon<sup>6,8</sup>

<sup>1</sup> Department of Kinesiology, Faculty of Medicine, Université Laval, Quebec City, QC, Canada, <sup>2</sup> Centre intégré universitaire de santé et de services sociaux de la Capitale-Nationale et Centre d'excellence sur le vieillissement de Québec, Quebec City, QC, Canada, <sup>3</sup> Computer Vision and Systems Laboratory, Department of Electrical Engineering, Université Laval, Quebec City, QC, Canada, <sup>4</sup> PERFORM Centre, Concordia University, Montreal, QC, Canada, <sup>5</sup> Department of Medicine, University of Montreal and Montreal Heart Institute, Montreal, QC, Canada, <sup>6</sup> Centre de recherche de l'Institut universitaire en santé mentale de Québec, Quebec City, QC, Canada, <sup>7</sup> Département de Radiologie, Faculté de Médecine, Université Laval, Quebec City, QC, Canada, <sup>8</sup> École de psychologie, Université Laval, Quebec City, QC, Canada

#### Edited by:

Stephane Perrey, University of Montpellier, France

#### Reviewed by:

Arun Bokde, Trinity College, Dublin, Ireland

Anouk Vermeij, Radboud University Nijmegen, Netherlands

#### \*Correspondence:

Normand Teasdale normand.teasdale@kin.ulaval.ca

Received: 26 September 2016 Accepted: 08 December 2016 Published: 27 December 2016

#### Citation:

Teasdale N, Simoneau M, Hudon L, Germain Robitaille M, Moszkowicz T, Laurendeau D, Bherer L, Duchesne S and Hudon C (2016) Older Adults with Mild Cognitive Impairments Show Less Driving Errors after a Multiple Sessions Simulator Training Program but Do Not Exhibit Long Term Retention. Front. Hum. Neurosci. 10:653. doi: 10.3389/fnhum.2016.00653

The driving performance of individuals with mild cognitive impairment (MCI) is suboptimal compared with that of healthy older adults. Their driving is expected to worsen as cognitive decline progresses; thus, whether these individuals should continue to drive is a matter of debate. The aim of the study was to support the claim that individuals with MCI can benefit from a training program and improve their overall driving performance in a driving simulator. Fifteen older drivers with MCI participated in five training sessions in a simulator (over a 21-day period) and in a 6-month recall session. During training, they received automated auditory feedback on their performance whenever an error was detected in maneuvers known to be suboptimal in individuals with MCI (for instance, weaving, omitting to signal a lane change, failing to check a blind spot, or failing to engage in a visual search before crossing an intersection). The number of errors was compiled for eight different maneuvers for all sessions. Over the initial five sessions, a gradual and significant decrease in the number of errors was observed, indicating learning and safer driving. This level of performance, however, was not maintained at the 6-month recall session. Nevertheless, the initial learning observed opens up possibilities for more regular interventions to maintain driving skills and safe driving in individuals with MCI.

Keywords: MCI, driving, learning, training, retention

# INTRODUCTION

Mild cognitive impairment (MCI) is characterized by objective memory impairments with or without other cognitive deficits, such as language, attention, or executive function disorders (Petersen, 2004). The autonomy of many individuals with MCI is relatively preserved, but a gradual functional decline can be observed when the syndrome is caused by a neurodegenerative disease. One of the most frequent causes of MCI is Alzheimer's disease (AD; Elias et al., 2000; Albert et al., 2011). Although prevalence estimates vary between studies, MCI may affect between 3% and 32% of the elderly population (65 years and older; Ward et al., 2012).

Compared to older adults with normal cognition, individuals with MCI and those with clinical mild dementia show similar rates of driving cessation and frequency (O'Connor et al., 2010). Also, there are several questionnaire studies showing that a significant proportion of older individuals with a diagnosis of MCI or dementia are active drivers and will continue to drive for several years after having received their clinical diagnosis (Silverstein, 2008; Betz and Lowenstein, 2010; Turcotte, 2012; O'Connor et al., 2013). Although it is well known that older drivers tend to underestimate the number of trips they take and provide inaccurate estimates of their traveled distance (Blanchard et al., 2010; Porter et al., 2015), these data are important because they suggest that older individuals with MCI are active drivers.

So far, all studies that have examined driving among individuals with MCI have reported that most of these persons can drive safely (Wadley et al., 2009; Frittelli et al., 2009). However, maneuvers involving executive functions, such as left turns (i.e., crossing a lane with traffic going in the opposite direction), changing lanes, and maintaining the vehicle in the center of the lane, are considered suboptimal in participants with MCI (Grace et al., 2005; Dawson et al., 2009; Wadley et al., 2009). In some driving simulation studies, drivers with MCI have shown poor control of speed (driving at low speed and with greater speed variability than healthy individuals) and of lateral position (weaving, driving off the road), as well as improper distance from a lead vehicle (Freund et al., 2002; Pavlou et al., 2016).

As cognitive function declines and dementia progresses, driving can become a serious traffic safety problem (Hunt et al., 1993, 1997; Dubinsky et al., 2000; Rizzo et al., 2001; Uc et al., 2005). Because the driving abilities of individuals with MCI are expected to worsen with cognitive decline, there is currently a debate about whether individuals diagnosed with MCI should also be allowed to continue driving (Olsen et al., 2014). However, a recent Cochrane review (Martin et al., 2013) clearly highlights the lack of construct validity of current approaches to assessing driving performance and identifying at-risk drivers. Using data from a large-scale prospective cohort study, the Maryland Prospective Older Driver Study (Staplin et al., 2003), the authors estimated that the cognitive test that most strongly predicted future crashes would, if used as a screening tool, potentially prevent six crashes per 1000 people over 65 years of age screened. This, however, would come at the price of stopping 121 people from driving who would not have had a crash. Martin et al. (2013) concluded that, although declining driving skills raise understandable concerns about crash risk, these data suggest that screening currently discriminates unfairly against older drivers. A similar conclusion arises from the work of Jeong et al. (2012), who found no differences in the history of crashes and traffic citations over a 3-year period between healthy elderly drivers and older drivers with MCI.
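The trade-off reported by Martin et al. (2013) can be made concrete with simple arithmetic: per 1000 screened drivers over 65, about six crashes prevented versus 121 drivers stopped who would not have crashed. The short sketch below (the function name is hypothetical) computes the number of unnecessary driving cessations per crash prevented from those two figures.

```python
def cessations_per_crash_prevented(crashes_prevented, unnecessary_cessations):
    """Crash-free drivers stopped from driving for each crash the screen prevents."""
    return unnecessary_cessations / crashes_prevented


# Figures per 1000 screened drivers aged 65+, as cited from Martin et al. (2013).
ratio = cessations_per_crash_prevented(6, 121)
print(f"{ratio:.1f} drivers stopped unnecessarily per crash prevented")
# prints: 20.2 drivers stopped unnecessarily per crash prevented
```

Roughly 20 drivers who would not have crashed would thus stop driving for every crash averted, which is the basis of the conclusion that current screening discriminates unfairly against older drivers.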

Clearly, the identification of at-risk drivers is problematic (Bédard et al., 2008; Gamache et al., 2010; Bédard and Dickerson, 2014). For older drivers with MCI, transitioning to alternative transportation needs to be considered, as driving cessation will eventually occur (Carr and Ott, 2010; Wheatley et al., 2014). Before this potential transition occurs, one approach could be to examine whether these individuals can be retrained to maintain or even improve their current level of driving performance. Interestingly, procedural memory (implicit learning) is preserved in individuals with MCI, as well as in people with clinically probable AD (McEvoy and Patterson, 1986; van Halteren-van Tilborg et al., 2007; Gobel et al., 2013). Implicit learning, unlike explicit learning, takes place without awareness, often through repetition, and without reference to previously learned explicit knowledge (Vinter and Perruchet, 2002; van Halteren-van Tilborg et al., 2007). According to Willingham (1998) and Willingham and Goedert-Eschmann (1999), implicit and explicit learning may interact, and implicit motor-skill learning could take place in parallel with explicit learning when a movement response is made. Complex sequences of actions can be learned implicitly (Witt and Willingham, 2006). In driving, knowledge of road safety rules is explicit, but several maneuvers involve implicit learning. For example, people explicitly know they should brake at an intersection with a stop sign, but releasing the accelerator and modulating the pressure on the brake pedal is an implicit learning task that takes place during practice. In healthy individuals, several studies have reported that acquired skills are maintained several months after implicit learning (Albouy et al., 2008; Gheysen et al., 2010; Doyon et al., 2011; Rose et al., 2011; Simon et al., 2012). 
Furthermore, two recent studies have demonstrated that this type of intervention produces measurable functional changes in the brain of healthy adults (Oosterman et al., 2008; Gheysen et al., 2010).

There is currently a lack of evidence as to whether MCI patients can benefit from a driving training program. In a recent pilot study (Teasdale et al., 2016), we examined whether individuals with MCI could benefit from a training program in a driving simulator (five sessions over a 3-week period). None of these individuals participated in the current study. For several maneuvers (speeding, not using the turn signal, verification of the blind spot, tailgating), a gradual and significant decrease in the number of errors was noted. Individuals with MCI also showed implicit learning, with their braking becoming shorter and more uniform in deceleration with training. These data are important as they suggest that individuals with MCI can be trained to drive more safely. In this new research project, we wanted to replicate these initial results. In addition, we were particularly interested in testing whether this initial learning decays when there is no rehearsal. To test this hypothesis, we recruited a group of drivers with MCI. They first participated in a 5-session training program within a 21-day period. Then, a 6-month recall session without any feedback was given. This last session was a transfer test intended to examine whether participants could transfer the improved performance observed within the first 21 days (5 sessions) to a new context approximating real-world conditions (i.e., driving alone without any additional verbal feedback on performance; Lee, 1988; Schmidt and Bjork, 1992).
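The question of whether learning decays without rehearsal can be expressed as the share of the training gain still present at recall. The sketch below is an illustration only: it assumes that errors are logged as (session, maneuver) records, the function names are hypothetical, and the retention index (first − recall)/(first − last training session) is a common convention rather than the measure reported by the authors.

```python
from collections import Counter


def errors_per_session(records):
    """Count error records per session; each record is a (session, maneuver) pair."""
    return Counter(session for session, _maneuver in records)


def errors_per_maneuver(records, session):
    """Count errors by maneuver type within one session."""
    return Counter(maneuver for s, maneuver in records if s == session)


def retained_gain(first, last_training, recall):
    """Fraction of the first-to-last training reduction in errors still present at recall.

    1.0 means recall is as good as the last training session;
    0.0 means errors are back at the first-session level;
    negative values mean recall is worse than the first session.
    """
    gain = first - last_training
    if gain == 0:
        raise ValueError("no change between first and last training session")
    return (first - recall) / gain
```

Under this convention, an index near zero at the 6-month recall would correspond to the loss of retention described in the abstract.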

# MATERIALS AND METHODS

# Participants

Fifteen elderly individuals with amnestic MCI (eight single-domain; seven multiple-domain) were recruited from memory clinics. Participants had a valid driving license and normal or corrected-to-normal vision (6/15 or better on the Snellen test) and reported driving regularly (>3 times a week). This experimental group included 13 men and 2 women (age range: 60–89 years; mean age ± SD: 72.0 ± 8.8 years; education ± SD: 14.6 ± 2.7 years). None of the participants had a significant decrease in functional autonomy, but all showed objective cognitive problems, including at least memory impairment. This decline in cognitive functioning was first detected through participants' complaints, which were in turn confirmed by a close relative. To confirm the presence of MCI, procedures similar to those adopted by Gaudreau et al. (2015) were followed. Briefly, MCI was confirmed with a battery of clinical and neuropsychological tests, and each case was discussed by a team of clinicians to reach a consensus on the participant's status. All participants met the core clinical criteria for MCI as defined by Albert et al. (2011). On their first visit to the laboratory, participants were briefed about the requirements of the experiment and invited to read and sign an informed consent form approved by the Ethics Committee of the Institut Universitaire en Santé Mentale de Québec.

# Questionnaire and General Driving Assessment

All participants completed a general verbal questionnaire (driving habit questionnaire, DHQ) that included items on driving (driving frequency, average km per week, and any accidents in recent years; Owsley et al., 1999). These self-reports of driving exposure were used only to verify that participants were active drivers. The DHQ also includes several questions about avoidance behaviors during the past 3 months (driving outside the immediate neighborhood, left turns (crossing a lane with traffic going in the opposite direction), night driving, bad weather, rush hours, highways). For each of these questions, a secondary question asked about confidence in their driving ability (5-point scale from 0 = no difficulty to 5 = great difficulty). A summary of the responses is provided in **Table 1**.

Before each session, participants were asked whether they were in their usual state of fitness (that is, not suffering from a cold, flu, or hangover) and were made aware that the simulator could make them feel uncomfortable (nausea, dizziness, general discomfort, and headache). They were instructed to inform the experimenter if this happened and to stop the simulation session before they felt discomfort or illness that could lead to emetic responses. They were told that the experiment would stop immediately without any prejudice to them. To prevent simulator sickness, the room temperature was maintained at around 19 °C, with proper airflow from a ceiling vent positioned just above the driver.

## Simulator

A fixed-base, open-cab simulator powered by STISIM Drive 3.0 (System Technology Inc., Hawthorne, CA, USA) was used for training. Images were projected by a projector (Hitachi CPX8) onto a screen (1.45 m high × 2.0 m wide) located 2.2 m from the steering wheel, displaying a 40° horizontal by 30° vertical field of view, with the center of the screen at eye level on the participant's midline. The simulator had an automatic transmission. Steering movements and displacements of the accelerator and brake pedals were recorded during driving (Computer Measurement PCI DAS08, 12-bit A/D). A digital input/output board (Computer Measurement PCI-DIO24) recorded the activation and deactivation of the turn signals. The cab had genuine vehicle parts and a fully instrumented dashboard (Tessier et al., 2009), leaving the entire screen for presenting the road environment. Engine noise was provided as audio feedback through two speakers positioned in front of the vehicle. Two USB video cameras (Webcam C905, Logitech, Silicon Valley, CA, USA) were used: one was mounted on the cab facing the participant and zoomed to capture head and eye movements, while the other captured the scenario displayed on the screen. A magnetic tracker (Flock of Birds, Ascension Technology Corporation, Burlington, VT, USA) secured on the driver's head recorded head movements while driving. Because of the 40° field of view of the simulator, the scenario included no right or left turns at intersections (i.e., crossing a lane at a right angle with traffic going in the opposite direction), and only moderate curves were presented (smallest radius of curvature: 120 m).

To detect driving errors, custom software was developed using the STISIM 3 Open Module. The Open Module sent all information about the scenario and the simulation to a second computer through an Ethernet TCP/IP connection, where it was processed in real time to evaluate driving performance. The software also included a module that detected head and eye movements when a lane change was performed (Metari et al., 2013). The driving feedback that was provided is described below.

## Procedure

At the first driving session, the study was explained to participants, who then completed the DHQ and were familiarized with the simulator. They then completed five simulator sessions on five different days within a 21-day period. A 6-month recall session, without any feedback, was held afterward. At each session, participants drove a 6-km practice run (with less graphical information than the experimental scenario) to familiarize themselves with the simulator and the recorded instructions. They were asked to comply with local traffic regulations throughout the experiment. Lane widths and markings were implemented according to governmental rules, and speed-limit and advisory signs appeared throughout the scenario. Intersections with a stop sign or a traffic light were presented. No emergency braking response was necessary unless a driving error was made. During the familiarization run (in the first session as well as in all training sessions), general explanations were provided whenever a driver requested specific information about the auditory feedback, and the experimenter made sure that drivers understood the message conveyed by each feedback and responded with the appropriate corrective action. The feedback messages were developed from previous reports of errors typical of drivers with cognitive problems (Grace et al., 2005; Dawson et al., 2009; Wadley et al., 2009; Pavlou et al., 2016; Teasdale et al., 2016). They targeted maneuvers involving executive functions, such as changing lanes (signaling the intent to change lanes and checking the blind spot), and proper control of the vehicle (speed, variability of the lateral position, and control of the vehicle at intersections). After familiarization, participants rested for 5 min before driving a continuous 27.48-km scenario with urban and semi-urban two-way and four-way roads with minor grade changes. The scenario included recorded instructions informing the driver of requested maneuvers (for example, to safely overtake a slower-moving vehicle ahead) and conditional feedback on specific maneuvers whenever the simulator detected a driving error. No additional information was provided. For the 6-month recall session, participants drove the familiarization run followed by the 27.48-km scenario. As mentioned above, no feedback was given during this recall session. A brief description of the feedback provided follows.

#### TABLE 1 | Summary of the self-reported driving habits.

#### Speeding

The scenario included urban and semi-urban sections with speed limits of 35, 50, and 70 km/h. The distances traveled within these zones were 650 m, 12,570 m, and 14,260 m, respectively (27.48 km in total). Subsections were arranged within the scenario to represent naturalistic driving conditions. Throughout the drive, a tolerance of 10 km/h above the speed limit was accepted (the actual speed was always shown on an analog speedometer in the simulator dashboard). Consequently, in each speed zone, exceeding the speed limit by more than 10 km/h triggered immediate auditory feedback ("Your current speed exceeds the speed limit. You should slow down"). To avoid an additional warning for the same speeding event, the driver had to bring their speed back within the 10-km/h tolerance within the following 10 s.
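The speeding rule can be written as a small event detector. The sketch below is a hypothetical reconstruction, not the authors' STISIM-based software. It assumes speed arrives as `(time_s, speed_kmh)` samples within one speed zone. A warning is due when speed exceeds the limit by more than 10 km/h, and it is repeated if the driver is still over the tolerance 10 s after the previous warning. It deliberately ignores the global 7-s lockout between messages described later in this section.

```python
TOLERANCE_KMH = 10.0  # accepted margin above the posted limit
GRACE_S = 10.0        # time allowed to slow down before a repeat warning


def speeding_warnings(samples, limit_kmh):
    """Return the times (s) at which a speeding warning would be issued.

    samples: iterable of (time_s, speed_kmh) pairs, sorted by time,
    recorded within a single speed zone (hypothetical data format).
    """
    warnings = []
    last_warning = None  # time of the last warning for the current event
    for t, speed in samples:
        if speed <= limit_kmh + TOLERANCE_KMH:
            last_warning = None  # back within tolerance: the event is over
            continue
        # First warning of a new event, or driver still too fast after 10 s
        if last_warning is None or t - last_warning >= GRACE_S:
            warnings.append(t)
            last_warning = t
    return warnings


# 1-Hz samples in a 50 km/h zone: too fast from t = 1 s to t = 12 s,
# back within tolerance at t = 13 s, then too fast again at t = 14 s.
speeds = [55] + [65] * 12 + [55, 65]
samples = list(enumerate(speeds))
print(speeding_warnings(samples, 50))  # prints [1, 11, 14]
```

Treating the event as finished as soon as speed returns within the tolerance is an assumption; the text only states that slowing down within 10 s avoids a repeat warning.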

#### Tailgating

Tailgating consists of driving so close to the vehicle ahead that a collision cannot be guaranteed to be avoided if stopping is required. The threshold was adapted to the driver's speed using a time-to-contact measure and was set at 2 s. For example, at 50 km/h (about 13.9 m/s), the distance from the vehicle ahead had to exceed approximately 27.8 m. A shorter distance triggered feedback ("Keep a safe distance from the vehicle preceding you"). Reducing speed and/or increasing the distance from the vehicle ahead so that the time to contact rose above the 2-s threshold within the next 10 s prevented additional feedback for the same tailgating event.
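The worked example above (50 km/h, 2 s) implies that the criterion scales the required gap by the driver's own speed, that is, a time headway. The hypothetical sketch below assumes that interpretation. A strict time-to-contact measure would instead divide the gap by the closing speed between the two vehicles. The function names are illustrative only.

```python
HEADWAY_THRESHOLD_S = 2.0  # threshold reported in the text


def min_safe_gap_m(speed_kmh, threshold_s=HEADWAY_THRESHOLD_S):
    """Distance covered at the driver's own speed during threshold_s seconds."""
    return speed_kmh / 3.6 * threshold_s


def is_tailgating(gap_m, speed_kmh, threshold_s=HEADWAY_THRESHOLD_S):
    """True if the gap corresponds to less than threshold_s of travel time."""
    if speed_kmh <= 0:
        return False  # stationary: no time-based criterion applies
    headway_s = gap_m / (speed_kmh / 3.6)
    return headway_s < threshold_s


print(round(min_safe_gap_m(50), 1))  # 27.8 (metres at 50 km/h)
print(is_tailgating(25.0, 50))       # True: 25 m is about 1.8 s at 50 km/h
```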

#### Weaving

Failure to control the lateral position of the vehicle is defined as weaving. In this study, we identified drivers' difficulties in keeping the vehicle near the center of the lane. A lateral positioning error was defined as keeping the vehicle more than 17.5% of the lane width away from the center of the lane for 10 s. In other words, if the tires were less than 15 cm from the nearest lane line for more than 10 s, feedback was given ("You should maintain your vehicle in the center of the road").
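A minimal sketch of the weaving criterion, assuming the lateral offset of the vehicle center from the lane center is available as `(time_s, offset_m)` samples. The data format, the 3.5-m lane width in the example, and the choice to give one feedback per excursion are all assumptions made for illustration; the source does not specify them.

```python
LATERAL_FRACTION = 0.175  # 17.5% of the lane width, as in the text
MIN_DURATION_S = 10.0     # the offset must persist for more than 10 s


def weaving_feedback_times(samples, lane_width_m):
    """Return times (s) at which weaving feedback would be triggered.

    samples: (time_s, offset_m) pairs sorted by time, where offset_m is the
    signed distance of the vehicle center from the lane center
    (hypothetical data format).
    """
    limit_m = LATERAL_FRACTION * lane_width_m
    triggers = []
    start = None     # time the current excursion began
    flagged = False  # whether this excursion has already triggered feedback
    for t, offset in samples:
        if abs(offset) > limit_m:
            if start is None:
                start, flagged = t, False
            if not flagged and t - start > MIN_DURATION_S:
                triggers.append(t)
                flagged = True
        else:
            start = None  # back near the center: excursion over
    return triggers


# Assumed 3.5-m lane: the limit is 0.6125 m. A 0.7-m offset held from
# t = 0 to t = 12 s triggers once, at the first sample beyond 10 s.
drift = [(t, 0.7) for t in range(13)] + [(t, 0.0) for t in range(13, 16)]
print(weaving_feedback_times(drift, 3.5))  # [11]
```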

#### Lane Changing

Fifteen lane-change maneuvers were included in the scenario. Some were requested through a recorded command to safely overtake a slower vehicle and then move back into the rightmost lane. Others were imposed by the road design (for instance, merging lanes). A few additional lane changes could occur depending on the driver's strategies. Whenever a lane change occurred, the system verified that the driver had activated the correct turn signal and had checked the blind spot before initiating the lane change. When either check was missing, feedback ("Verify your blind spot before changing lanes", "Activate your turn signal before changing lanes", or "Activate your turn signal and verify your blind spot before changing lanes") was provided as soon as the vehicle's midline crossed the line separating the two lanes.
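The choice among the three lane-change messages follows directly from two yes/no checks. The sketch below is illustrative only: the function names and the signed-offset representation of the midline crossing are assumptions, while the message strings are the ones quoted in the text.

```python
def midline_crossed(prev_offset_m, curr_offset_m):
    """True when the vehicle midline changes side of the lane line.

    Offsets are signed lateral distances from the lane line (hypothetical).
    """
    return (prev_offset_m < 0) != (curr_offset_m < 0)


def lane_change_feedback(signaled, checked_blind_spot):
    """Select the message for a lane change, or None if both checks were done."""
    if signaled and checked_blind_spot:
        return None
    if not signaled and not checked_blind_spot:
        return ("Activate your turn signal and verify your blind spot "
                "before changing lanes")
    if not signaled:
        return "Activate your turn signal before changing lanes"
    return "Verify your blind spot before changing lanes"


# Feedback is issued at the sample where the midline crosses the lane line.
if midline_crossed(-0.2, 0.1):
    print(lane_change_feedback(signaled=True, checked_blind_spot=False))
```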

#### Vehicle Control at Intersections with a Stop Sign

Failing to stop completely at an intersection (a complete stop being defined as a speed < 1 km/h for at least 1 s) or stopping beyond the stop line triggered feedback as soon as the driver crossed the intersection ("You should stop your vehicle properly at the intersection").
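A minimal sketch of the complete-stop criterion (speed below 1 km/h for at least 1 s, before the stop line). It assumes time-stamped speed and along-road position samples; the data format, sampling rate, and function name are hypothetical, since the source does not describe them.

```python
STOP_SPEED_KMH = 1.0   # "stopped" means speed below 1 km/h ...
STOP_DURATION_S = 1.0  # ... held for at least 1 s


def proper_stop(samples, stop_line_m):
    """True if the vehicle came to a complete stop at or before the stop line.

    samples: (time_s, speed_kmh, position_m) tuples sorted by time, with
    position measured along the road (hypothetical data format).
    """
    run_start = None
    for t, speed, position in samples:
        if speed < STOP_SPEED_KMH and position <= stop_line_m:
            if run_start is None:
                run_start = t
            if t - run_start >= STOP_DURATION_S:
                return True
        else:
            run_start = None  # moving again, or stopped past the line
    return False


full_stop = [(0.0, 20, 90), (0.5, 8, 96), (1.0, 0.5, 98), (1.5, 0.2, 98),
             (2.0, 0.3, 98), (2.5, 12, 101)]
rolling = [(0.0, 20, 90), (0.5, 0.8, 98), (1.0, 0.6, 98), (1.5, 10, 101)]
print(proper_stop(full_stop, 100), proper_stop(rolling, 100))  # True False
```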

#### Visual Search at Intersections with a Stop Sign

Drivers were instructed to look ahead, left, and right to verify that the way was clear before entering the intersection. Omitting this visual search triggered feedback as soon as the driver crossed the intersection ("Just before entering the intersection, look left, ahead and right to check that the way is clear").

#### Vehicle Control at Red-Light Intersections

A permissive yellow rule was adopted: the driver could enter the intersection at any time during the yellow interval, but feedback was provided if the light turned red before the vehicle crossed the midpoint of the intersection ("You should stop at the intersection whenever the light is red"). In addition, stopping beyond the stop line triggered feedback as soon as the driver crossed the line ("You should stop your vehicle properly at the intersection").
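Under the permissive yellow rule, what matters is the signal phase at the moment the vehicle reaches the midpoint of the intersection. The sketch below assumes the signal timeline is known as a sorted list of `(start_time_s, phase)` pairs; this format and the function names are hypothetical. Checking the phase at the crossing time, rather than only the red onset, also means that a driver who stopped at the red light and crossed after it turned green is not flagged.

```python
from bisect import bisect_right


def phase_at(schedule, t):
    """Return the signal phase active at time t.

    schedule: list of (start_time_s, phase) sorted by start time,
    e.g. [(0, "green"), (10, "yellow"), (13, "red")] (hypothetical format).
    """
    starts = [start for start, _ in schedule]
    i = bisect_right(starts, t) - 1
    if i < 0:
        raise ValueError("t precedes the first phase in the schedule")
    return schedule[i][1]


def red_light_violation(schedule, t_midpoint_s):
    """Permissive yellow rule: only a red phase at the midpoint is an error."""
    return phase_at(schedule, t_midpoint_s) == "red"


cycle = [(0, "green"), (10, "yellow"), (13, "red"), (40, "green")]
print(red_light_violation(cycle, 12.5))  # False: crossed on yellow
print(red_light_violation(cycle, 13.5))  # True: light was red at the midpoint
```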

Whenever feedback was provided, a 7-s delay was automatically imposed before any other feedback could be given. A driving error occurring within this period did not trigger delayed feedback, but it was recorded. This prevented consecutive messages from overloading the driver. No other feedback was provided, and no account of the driver's performance was given before or at the end of any session. For the 6-month recall session, feedback was turned off and participants were simply asked to drive the same 27.48-km scenario as safely as they normally would.
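The 7-s lockout separates two streams: every detected error is logged, but only some are voiced. The sketch below illustrates that separation under stated assumptions. The `(time_s, error_type)` format is hypothetical, and whether an error exactly 7 s after a message is voiced is not specified in the text; this sketch voices it.

```python
LOCKOUT_S = 7.0  # minimum interval between two spoken feedback messages


def deliver_feedback(errors):
    """Split detected errors into spoken feedback and the full error log.

    errors: list of (time_s, error_type) sorted by time (hypothetical format).
    Every error is logged; an error is voiced only if at least LOCKOUT_S
    seconds have passed since the last voiced message. Suppressed errors
    are never voiced later.
    """
    voiced = []
    last_voiced = None
    for t, kind in errors:
        if last_voiced is None or t - last_voiced >= LOCKOUT_S:
            voiced.append((t, kind))
            last_voiced = t
    return voiced, list(errors)


detected = [(0, "speeding"), (3, "weaving"), (7, "tailgating"),
            (9, "blind_spot"), (20, "speeding")]
voiced, logged = deliver_feedback(detected)
print([t for t, _ in voiced], len(logged))  # [0, 7, 20] 5
```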

## Data Analysis

For each session, we first analyzed the time taken to drive the scenario. These durations were submitted to a one-way repeated-measures ANOVA (first five sessions). We then counted the number of errors made by each driver for each of the eight types of driving errors (speeding, tailgating, weaving, omitting to signal a lane change, omitting to verify the blind spot, failing to stop properly at an intersection with a stop sign, failing to engage in visual search before entering an intersection with a stop sign, and failing to stop properly at an intersection with a traffic light).

Because of the small sample and the distribution of the error data, nonparametric statistical tests were used. First, for each variable, we tested whether learning occurred across the five training sessions using Friedman's nonparametric analysis of variance (ANOVA) by ranks. Significant effects were further examined with a Wilcoxon rank-sum test to determine more specifically whether driving performance improved between sessions 1 and 5. Additional comparisons between pairs of sessions (session 6 vs. session 5, and session 6 vs. session 1) also used the Wilcoxon rank-sum test. All analyses were conducted using Statistica 13.0 (Dell Statsoft). The significance level was set at 0.05.
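The Friedman statistic is simple enough to compute by hand. The sketch below is a plain-Python illustration, not the Statistica procedure. It omits the tie correction that statistical packages typically apply, so with many tied counts it can differ slightly from their output. For k = 5 sessions (df = 4), the upper-tail chi-square probability has a closed form; for example, χ² = 16.56 gives p ≈ 0.002, consistent with the value reported for speeding in the Results. The error matrix in the example is invented for illustration. For the pairwise follow-ups, sessions are repeated measures on the same drivers. In SciPy, for instance, `scipy.stats.wilcoxon` is the paired (signed-rank) form, while `scipy.stats.ranksums` is the form for independent samples.

```python
import math


def average_ranks(values):
    """Rank values 1..k within one row, giving ties the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for m in range(i, j + 1):
            ranks[order[m]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(data):
    """Friedman chi-square (no tie correction); one row per driver, one column per session."""
    n, k = len(data), len(data[0])
    rank_sums = [0.0] * k
    for row in data:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Invented error counts: 3 drivers x 5 sessions.
errors = [
    [6, 5, 3, 2, 1],
    [4, 4, 2, 1, 0],
    [5, 3, 3, 2, 2],
]
q = friedman_statistic(errors)
print(round(q, 2), round(chi2_sf_even_df(q, df=4), 3))  # 10.93 0.027
```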

# RESULTS

## Characteristics of the Participants

Overall, four participants (two men and two women) elected to stop their participation because of simulator sickness during the first session; their data are not included in the 15 participants analyzed in this study. The 6-month recall session included 13 participants, as 2 participants declined to return for personal reasons. **Table 2** shows the sociodemographic and clinical/cognitive characteristics of the 15 participants. The sample included 2 women and 13 men. The mean age was 71 years and the mean education level was 14 years. No participant had clinical depression, but all had mild episodic memory impairment. Other cognitive functions were preserved at the group level, but, as indicated in the "Materials and Methods" Section, some participants with amnestic MCI also had non-memory impairments. The most frequent non-memory impairments involved language (i.e., naming or fluency) and/or executive functions (i.e., inhibition). We compared participants with and without executive deficits on the total number of errors at each session (Kolmogorov-Smirnov two-sample tests). None of these comparisons was significant, so data for all 15 participants are pooled hereafter.

## Sessions 1 to 5: Training

The time taken to drive the scenario did not vary across sessions (F(4,56) = 0.33, p = 0.855); on average, driving the 27.48-km scenario took 43 min 10 s. The number of errors for each session and each dependent variable is presented in **Figure 1**. Across the first five sessions, the number of errors generally decreased for nearly all variables. The main effect of session was significant for speeding (χ²(N = 15, df = 4) = 16.56, p = 0.002), weaving (χ²(N = 15, df = 4) = 18.11, p = 0.001), omitting to verify the blind spot (χ²(N = 15, df = 4) = 19.03, p = 0.0007), visual search at intersections with a stop sign (χ²(N = 15, df = 4) = 24.39, p < 0.0001), and vehicle control at intersections with a stop sign (χ²(N = 15, df = 4) = 10.16, p = 0.037). For this last variable, we never observed a driver failing to stop altogether at a stop sign: all errors were incomplete stops or stops made too far from, or beyond, the stop line. The number of omissions to signal a lane change was small (on average, 1.7 omissions per session over the five sessions), and the small decrease observed was not significant (χ²(N = 15, df = 4) = 5.72, p = 0.22). Similar results were found for the number of tailgating events (on average, 1.1 events per session; χ²(N = 15, df = 4) = 6.86, p = 0.143) and for vehicle control at intersections with a traffic light (χ²(N = 15, df = 4) = 4.69, p = 0.319). As for intersections with a stop sign, we looked for major errors and found two events in which drivers stopped at a green light and two events in which drivers did not stop at a red light.

Note. Z scores and percentiles were obtained from age- and education-adjusted normative data. BNT, Boston Naming Test; BORB, Birmingham Object Recognition Battery; D-KEFS, Delis-Kaplan Executive Function System; DRS, Dementia Rating Scale; GDS, Geriatric Depression Scale; MoCA, Montreal Cognitive Assessment; PPTT, Pyramids and Palm Trees Test; RL/RI, Rappel libre/rappel indicé; ROCFT, Rey-Osterrieth Complex Figure Test; WAIS, Wechsler Adult Intelligence Scale.

## Session 6: 6-month Recall

**Table 3** summarizes the comparisons between the 6-month recall session and the last training session (session 5), and between the 6-month recall session and the first training session. Mean values (and standard errors) are shown in **Figure 1**. Overall, the analyses suggest that performance declined between the last training session and the 6-month recall session (reflected in a significantly larger number of errors), and this was observed for nearly all variables. The increase in errors was significant for speeding, blind-spot verification, omitting to engage in visual search before crossing an intersection with a stop sign, omitting to signal a lane change, and tailgating. All other comparisons were not significant. Although performance decreased at the 6-month recall, none of the comparisons with the first training session were significant.

Although the number of errors observed for each variable could be considered small, the total number of errors is not negligible. **Figure 2** presents the mean number of errors per driver. At session 1, drivers made 21.6 errors on average; this number decreased to 8.2 at session 5. The main effect of Session was significant (χ²(N = 15, df = 4) = 39.36, p < 0.0001). At the 6-month recall session, the number of errors returned to the level observed at Session 1 (21.8). The comparison between session 6 and session 5 was significant (Z = 3.17, p = 0.001), whereas that between session 6 and session 1 was not (Z = 1.13, p = 0.255). These errors were observed over a relatively short drive (27.48 km). Although few of these errors were critical (two stops at green lights and two red-light crossings), the large numbers at session 1 and at the recall suggest that driving is not optimal in MCI individuals.

# DISCUSSION

The main goal of this study was to examine whether MCI individuals could benefit from a driving training program that provided automated, real-time auditory feedback on aspects of driving performance known to be affected in cognitively impaired drivers (speeding, tailgating, weaving, omitting to signal a lane change, omitting to verify the blind spot, vehicle control at intersections with a stop sign or a traffic light, and omitting to engage in visual search before crossing an intersection with a stop sign). Another important objective was to determine whether any benefits of the initial training would be maintained at a 6-month recall session.

FIGURE 1 | …change, tailgating and vehicle control at intersections with a traffic light. Boxes and whiskers indicate ±1.0 and ±1.96 standard errors of the mean, respectively. <sup>∗</sup> Indicates a main effect of Session (Session 1 to Session 5). † Indicates a significant difference between Session 6 and Session 5. None of the comparisons between Session 6 and Session 1 were significant.

TABLE 3 | Summary of the comparisons between the 6-month recall session and the first and last training sessions (Wilcoxon rank-sum tests).

Over the five training sessions (21-day period), MCI individuals showed short-term improvements for speeding, weaving, omitting to verify the blind spot, vehicle control at intersections with a stop sign, and visual search at intersections with a stop sign. There was also a general trend toward fewer errors for the other variables analyzed (tailgating, omitting to signal a lane change, and vehicle control at red-light intersections). These results corroborate our previous observations in another group of MCI participants and in a group of healthy older drivers (Teasdale et al., 2016). Contrary to a recent observation by Pavlou et al. (2016), also in a simulator study, none of our participants drove at an excessively low speed and none kept a large distance from the preceding vehicle. On the contrary, our participants exceeded the speed limit on several occasions (on average, four errors per driver in the first session, with every driver showing at least one speeding event; all but one driver showed more speeding events at the recall session), and several drivers kept a short time headway (tailgating; 11 of 15 drivers showed at least one tailgating event in the first session). This was also observed in our previous study (Teasdale et al., 2016). Compared with the study of Pavlou et al. (2016), participants tested in our studies were at an earlier stage of MCI; indeed, their participants included drivers with AD, Parkinson's disease, and MCI. Unfortunately, the lack of more specific information about how they diagnosed MCI makes direct comparisons with their study difficult. This difference may suggest, however, that as MCI progresses, more severe driving errors emerge (Wheatley et al., 2014; Hird et al., 2016).

The validity of simulator studies is sometimes questioned. Simulator and on-road driving performance have been compared in different populations, and these studies have confirmed the relative validity of driving simulators for assessing on-road driving performance (Lee et al., 2003; Romoser and Fisher, 2009; Shechtman et al., 2009; Bédard et al., 2010; Mayhew et al., 2011; Lavallière et al., 2012). For example, Bédard et al. (2010) found a correlation of 0.74 between simulator and on-road demerit points in older drivers. More importantly, some studies show that simulator training not only improves driving in the simulator but also transfers to better on-road driving performance. This was shown with healthy older drivers by Romoser and Fisher (2009) and Lavallière et al. (2012) and, more recently, by Casutt et al. (2014). The training offered to older drivers in the studies by Romoser and Fisher (2009) and Lavallière et al. (2012) was individualized: in Romoser and Fisher (2009), it emphasized visual scanning at intersections, while in Lavallière et al. (2012), it emphasized lane-change behaviors (signaling a lane change and checking the mirrors and blind spot before changing lanes). In both studies, drivers who received passive training (simulator driving without feedback plus classroom-like training) showed no improvement in their driving performance. These results (a beneficial effect of active simulator training on on-road performance) were replicated by Casutt et al. (2014) in a study in which training increased mental workload by gradually raising traffic density, the number of virtual drivers ignoring traffic rules, and the number of hazardous traffic situations, and also included specific vigilance training. The key result of these three studies is that improvements observed in a simulator transferred to improved on-road performance. 
Furthermore, Lavallière et al. (2011) and Romoser (2013) both reported long-lasting effects (2 years post-training) on on-road performance for most drivers who received the active training. These studies are important because they clearly support the suggestion that active simulator training can benefit on-road driving performance. In the current study, we showed clear improvements within the first five sessions, and it is plausible that this training also led to safer on-road driving, at least in the short term. Future studies are needed to examine specifically whether the improved simulator performance translates into safer on-road driving for MCI individuals.

An important feature of this study was the 6-month recall session. As mentioned in the "Introduction" Section, this last session served to examine whether participants could carry the improved performance observed over the first five sessions into a new context approximating real-world conditions (i.e., driving alone without any verbal feedback on performance; Lee, 1988; Schmidt and Bjork, 1992). Our results show that the improvements observed after the first five sessions were not long lasting: the number of errors increased significantly for four measures (speeding, omitting to verify the blind spot, visual search at intersections with a stop sign, and omitting to signal a lane change). The number of tailgating events, which was small and did not vary significantly across the first five sessions, also increased significantly at the recall session. For instance, only one of the 13 participants maintained the speeding performance observed at the fifth session (six did so for weaving, and seven for omitting to signal a lane change). Performance at the 6-month recall, however, did not differ from that observed at the first training session. Given the large number of errors at session 1 and at the recall session, but the near absence of critical errors, this could indicate that the performance of MCI participants was less than optimal. As mentioned in the "Introduction" Section, this fits the general description of driving in MCI individuals, both in the simulator (Frittelli et al., 2009; Devlin et al., 2012) and on the road (Wadley et al., 2009). For instance, Devlin et al. (2012) reported no significant differences in simulator driving performance between drivers with MCI and age-matched healthy drivers. In a previous simulator study, we also observed limited differences between MCI and healthy control drivers (Teasdale et al., 2016). The progression of MCI could well lead to decreased performance. The present study raises the possibility, however, that proper training could help preserve, and perhaps enhance, driving competencies. This is an important result, and future studies should aim to define the optimal training conditions and regimen for inducing safer driving in MCI individuals.

A large randomized controlled clinical trial examining the long-term effectiveness of cognitive training for enhancing mental abilities (ACTIVE study; Unverzagt et al., 2009) showed that, at a 2-year follow-up, MCI individuals did not benefit from interventions focused on declarative memory. Our study did not involve any training on aspects such as trip planning/scheduling or navigation. These features of naturalistic driving are clearly associated with declarative memory (and with executive functioning), and individuals with MCI may experience more difficulty when driving involves these additional tasks. On the other hand, in the ACTIVE study, MCI individuals, like healthy older adults, did benefit from training in reasoning and speed of processing, two abilities that are fundamental to driving. This is important because it indicates that future studies should consider establishing which driving skills are amenable to training and which are not. Our study shows that several important driving skills can be improved in MCI individuals. This opens up the possibility of offering regular training to these individuals to maintain safe driving. At the same time, regular follow-ups may offer a window into the progression of cognitive decline and help to better identify the limitations MCI drivers face.

The present study involved only 15 individuals with MCI. Although this small sample is a limitation, the study replicates previous findings for the first five sessions (Teasdale et al., 2016). More importantly, it shows that at a 6-month recall session, the MCI drivers were unable to maintain the level of performance they had attained at the end of training (**Figure 2**). Our results also show large individual differences, which could be related to variability in the progression of cognitive decline and to intrinsic differences in driving style. Only three drivers made fewer than 10 errors in total at the recall session. Two of them drove safely throughout training and at the recall (that is, fewer than 10 errors in every session). The third, who made 20 errors at session 1, maintained at the recall the improved level of performance observed at the end of training. All other drivers made a large number of errors at the recall session (two drivers made more than 50). Because a key question about the driving of MCI drivers concerns the impact of progressing cognitive decline, a longitudinal study of how performance degrades as cognitive decline progresses is much needed. A large randomized controlled trial with an MCI control group exposed to the simulator without any feedback, and healthy control groups (with and without feedback), would provide additional and important details about the limitations and capabilities of MCI individuals compared with healthy older drivers. Finally, participants in this study were a convenience sample of individuals with MCI. It would be important to determine whether the small number of women who volunteered reflects a fear of being tested in this population.

In conclusion, this study shows that MCI individuals can be trained in a simulator to improve their driving. This improvement, however, appears to be labile for most of the MCI individuals who participated, suggesting that regular rehearsal may be needed to maintain it. The decline at recall, however, did not fall below the performance observed at the first training session, indicating that the MCI individuals tested maintained safe, but not optimal, driving performance over this period. Simulator training could be an important means not only of maintaining safe driving in MCI individuals, but also of evaluating, safely and cost-effectively, how the progression of cognitive decline affects driving until it becomes unsafe.

# AUTHOR CONTRIBUTIONS

All authors have fulfilled the criteria for authorship and have approved the final article.

# ACKNOWLEDGMENTS

This research has received support from the Alzheimer Society of Canada. CH and SD are both Research Scholars from the Fonds de la Recherche du Québec-Santé.

# REFERENCES

…on driving ability: a controlled clinical study by simulated driving test. Int. J. Geriatr. Psychiatry 24, 232–238. doi: 10.1002/gps.2095


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Teasdale, Simoneau, Hudon, Germain Robitaille, Moszkowicz, Laurendeau, Bherer, Duchesne and Hudon. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution and reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Cognitive Flexibility Training: A Large-Scale Multimodal Adaptive Active-Control Intervention Study in Healthy Older Adults

Jessika I. V. Buitenweg<sup>1</sup>\*, Renate M. van de Ven<sup>1</sup>, Sam Prinssen<sup>1</sup>, Jaap M. J. Murre<sup>1</sup> and K. Richard Ridderinkhof<sup>1,2</sup>

<sup>1</sup> Department of Psychology, University of Amsterdam, Amsterdam, Netherlands, <sup>2</sup> Amsterdam Brain and Cognition, University of Amsterdam, Amsterdam, Netherlands

As aging is associated with cognitive decline, particularly in the executive functions, it is essential to find effective ways of improving cognition in older adults. Online cognitive training is currently a popular, though controversial, method. Although some changes seem possible in older adults through training, far transfer and longitudinal maintenance are rarely seen. Based on previous literature, we created a unique, state-of-the-art intervention study incorporating frequent sessions and flexible, novel, adaptive training tasks, along with an active control group. We created a program called TAPASS (Training Project Amsterdam Seniors and Stroke), a randomized controlled trial. Healthy older adults (aged 60–80 years) were assigned to a frequent-switching (FS) or infrequent-switching (IS) experimental condition or to the active control group, and performed 58 half-hour sessions over 12 weeks. Effects on executive functioning, processing and psychomotor speed, planning, verbal long-term memory, verbal fluency, and reasoning were measured at four time points before, during, and after the training. Additionally, we explored which individual characteristics contributed to training benefit. Besides improvements on the training tasks, we found significant time effects on multiple transfer tasks in all three groups that likely reflected retest effects. No training-specific improvements were detected, and we found no evidence that individual characteristics conferred additional benefits. Judging from these results, the therapeutic value of using commercially available training games to train the aging brain is modest, though any apparent effects should be ascribed more to expectancy and motivation than to the elements of our training protocol. Our results emphasize the importance of using parallel tests as outcome measures for transfer and of including both active and passive control conditions. 
Further investigation into different training methods is advised, including stimulating social interaction and the use of more variable, novel, group-based yet individual-adjusted exercises.

Keywords: aging, cognitive training, executive functions, cognitive flexibility, videogames

#### *Edited by:*

Soledad Ballesteros, Universidad Nacional de Educación a Distancia, Spain

#### *Reviewed by:*

Ana M. Daugherty, University of Illinois at Urbana–Champaign, United States

Julia Karbach, Goethe University Frankfurt, Germany

> *\*Correspondence:* Jessika I. V. Buitenweg j.i.v.buitenweg@uva.nl

*Received:* 12 May 2017 *Accepted:* 18 October 2017 *Published:* 01 November 2017

#### *Citation:*

Buitenweg JIV, van de Ven RM, Prinssen S, Murre JMJ and Ridderinkhof KR (2017) Cognitive Flexibility Training: A Large-Scale Multimodal Adaptive Active-Control Intervention Study in Healthy Older Adults. Front. Hum. Neurosci. 11:529. doi: 10.3389/fnhum.2017.00529

# INTRODUCTION

We live in a time of great societal changes in the Western world. Due in part to dramatic improvements in medical science, our aging population is expanding rapidly. As aging is associated with decreased cognitive functioning, the prevalence of age-related cognitive decline is an increasingly important issue. Decline of cognitive control, memory, and decision-making, among other functions, leads to greater dependence on family members and society. With recent increases in the retirement age in many countries, growing numbers of older workers are expected to contribute to the workforce, but they may fall behind cognitively. To ensure that older adults can live and work independently for as long as possible, research into ways of reducing this age-related decline in functioning is a pressing matter.

Enhancing cognitive functions or limiting their decline using cognitive training is currently a popular topic. The effectiveness of such training has been investigated in numerous intervention studies, for instance of working memory training (Buschkuehl et al., 2008; Richmond et al., 2011; Rose et al., 2015), virtual reality training (Optale et al., 2010; Lövdén et al., 2012), and game training (Basak et al., 2008; Nouchi et al., 2012; van Muijden et al., 2012; Baniqued et al., 2013; Toril et al., 2016). The benefits of at-home computer-based training programs are evident: they require no face-to-face contact, are easy to administer, and do not require traveling, which is especially advantageous for more physically impaired individuals. Furthermore, they are cost-efficient and can be personalized to keep motivation optimal. In addition to the possible benefits for cognition, young and older adults also enjoy playing computer games to challenge themselves, for entertainment, and, for certain games, for social rewards (Allaire et al., 2013; Whitbourne et al., 2013). The gaming industry has conveniently caught on to this trend. As a result, countless commercial training websites and stand-alone applications offer a whole range of games that promise to contribute to cognitive reserve and slowed decline.

Research indicates, however, that not all types of games are enjoyed equally by the older population. Realistic first-person shooter games, though cognitively challenging, are perceived negatively by many older adults (Nap et al., 2009; McKay and Maki, 2010). Generally, casual games or games composed of short mathematical or memory activities are rated as most enjoyable and lead to higher compliance and stronger beliefs about enhancement (Nap et al., 2009; Boot et al., 2013a).

Despite its popularity and market potential, the effectiveness of brain training remains a controversial topic. Results are inconsistent (Au et al., 2015; Dougherty et al., 2016), with some studies finding no transfer effects at all (Ackerman et al., 2010; Lee et al., 2012). Near transfer is often reported, especially after multitasking or task-switching designs (Karbach and Kray, 2009; Wang et al., 2011; Anguera et al., 2013), though far transfer is scarcely found (Green and Bavelier, 2008; Park and Bischof, 2013). Furthermore, large individual variability in the response to cognitive training is often observed (Langbaum et al., 2009; Melby-Lervag et al., 2016). For instance, general training benefit is often found to depend on lower age and higher baseline cognitive abilities, and in some cases on training gain and education (Verhaeghen et al., 1992; Bissig and Lustig, 2007; Langbaum et al., 2009; Zinke et al., 2014), although there is some evidence of greater benefit with lower baseline scores (Ball et al., 2007; Whitlock et al., 2012).

We and others (e.g., Buitenweg et al., 2012; Slagter, 2012; van de Ven et al., 2016) have raised a number of problematic issues often encountered in the training research literature. Among them were brief training periods (limited numbers of sessions, days, or weeks), small sample sizes, absence of active control conditions, inapt competitive motivational incentives, and use of unimodal training tasks (incurring task-specific and even stimulus-specific rather than process-specific training benefits). On the basis of our review of optimal study design, training efficacy, and neurocognitive profiles of successful aging (Buitenweg et al., 2012), we suggested adding the elements of flexibility, novelty (Noice and Noice, 2008), and adaptiveness (Kelly et al., 2014) to training protocols to increase the chances of finding positive effects on cognitive functioning.

Because of these issues, the training literature remains inconclusive about the generalizability of training. We therefore created a unique, state-of-the-art, 12-week intervention study incorporating multimodal, novel, adaptive training games and frequent sessions. To induce flexibility, we adapted the idea of task-switching training, which led to far transfer in Karbach and Kray (2009). We integrated switching between training games to create a more ecologically valid intervention, while using a number of switching tasks as our transfer measures. In addition, we employed a number of measures with alternate (parallel) forms to minimize retest effects. Besides including a number of essential elements, ours is the first study to add flexibility as a key ingredient of training. We were especially interested in whether shifting attention between multiple functions during the training would transfer to reduced switch costs. For this purpose, we required a task that presented both alternating and repeating cues, which was possible using the switching paradigm previously used by Rogers and Monsell (1995). However, to evaluate effects on the entire construct of task switching, we combined additional measures in our secondary analysis. We included the clinically validated Delis-Kaplan Executive Function System—Trail Making Test and the online version of this task, in which (unlike in our main switch task) every response requires a switch, but participants still access basic knowledge of number and letter systems. To incorporate a more ecologically valid measure of task switching, we also added the switch condition of the semantic fluency test, which requires switching between activations of more covert representations.

We investigated whether an online training incorporating these crucial components can lead to transfer in an elderly population. Our training program consisted of two experimental conditions and an active control condition in a program called TAPASS (Training Project Amsterdam Seniors and Stroke). The TAPASS program has been used to determine the effects of cognitive flexibility training in stroke survivors as an addition to usual rehabilitation care (van de Ven et al., 2015). Here, we focus on the effectiveness of this program in the healthy aging population. Experimental groups differed in flexibility, novelty, and adaptiveness. Higher flexibility was created by having the subject switch between cognitive domains from game to game more often within a session. High novelty implied exposure to more different cognitive domains within one session. Adaptiveness refers to the extent to which game difficulty can be adapted dynamically to an individual's performance. The frequent switching (FS) group scored high on flexibility, novelty, and adaptiveness. The infrequent switching (IS) group scored high on novelty and adaptiveness but low on flexibility, and the mock training (MT) scored low on all three features. We investigated whether the experimental training benefits cognitive functioning and, if so, whether the switching component adds extra value to these effects. In addition, we explored whether training efficacy is modulated by individual characteristics, such as age, baseline functioning, or education.

As the current intervention is especially focused on inducing flexibility, we expected transfer to occur in functions of executive control (Buchler et al., 2008; Karbach and Kray, 2009; Buitenweg et al., 2012). Based on a classification model by Miyake et al. (2000), these are often separated into updating, inhibition, and shifting (dual tasking and task switching). Therefore, our main analysis was centered on measures of these constructs. To keep administration comparable, for the primary analysis we selected tasks that were all administered by computer in the lab.

In our secondary analysis, we included additional assessments of working memory and task switching as well as tasks from other domains. Because they depend on the frontal lobe, planning and verbal fluency are often counted among the executive functions as well (Fisk and Sharp, 2004; Phillips et al., 2006; Lewis and Miller, 2007) and can be subject to decline in older adults (Auriacombe et al., 2001; Sullivan et al., 2009; Kim et al., 2013). For this reason, we chose to include measures within these domains. In addition to the executive functions, processing speed often declines in later life (Salthouse, 2000), though training with a similar intervention has been found to improve this domain (Nouchi et al., 2012). As most of our training tasks included fast-paced, timed games, we were interested to see whether the training would generalize to measures of processing speed. Additionally, because participants used the computer mouse to complete both the training tasks and the transfer measures, we also included tasks of psychomotor speed. Finally, two more functions often found to diminish in older adults are reasoning ability and verbal long term memory (Davis et al., 2003; Harada et al., 2013), both of which have shown improvements after similar interventions (Au et al., 2015; Barban et al., 2015). Measures of these constructs were added to our battery of transfer tasks.

The purpose of this study was to test the hypothesis that a 12-week cognitive flexibility training would improve cognitive functions in healthy older adults. We expected to see the largest transfer effects on executive control performance after the frequent switch training, smaller effects after the infrequent switch training, and little to no effects after the MT. For other domains, we expected smaller differences between conditions, but in the same direction.

# METHODS

#### Subjects

Our study used a randomized controlled double-blind design. Participants were recruited via media campaigns (pitch talks on regional radio stations and articles in local newspapers) and from a database of healthy older adults interested in participating in psychological research (www.seniorlab.nl). A total of 249 healthy participants signed up online on www.tapass.nl and were assessed for eligibility. Inclusion criteria were age above 60, willingness and cognitive ability to finish the 12-week training program, and daily access to a computer with an internet connection. Exclusion criteria were a history of neuropsychiatric disorders, TIA or stroke, strongly impairing visual deficits, and colorblindness. Additionally, cognitive status was screened with the Telephone Interview for Cognitive Status (TICS; Brandt et al., 1988): individuals scoring below 26 on this test were excluded. Eleven individuals did not fit the inclusion criteria and were excluded. Twenty-nine individuals withdrew before randomization, and another 51 before the first test session, due to health and technical issues or lack of time. The remaining 158 subjects were included in the final sample.

Subjects were randomly assigned to one of three conditions, with the exception of partners/spouses, who were always assigned to the same group. We minimized imbalance between the three conditions using a minimization program (Minimpy; Saghaei and Saghaei, 2011) over the factors age, computer experience, TICS score, gender, and education. The minimization procedure was carried out by the principal investigators only. All subjects were given the same information regarding the intention of the experiment. They were told they would be placed in one of three different conditions, without explicit mention of a control condition. A schematic overview of the study design can be found in **Figure 1**.
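Minimpy's internal settings (imbalance measure, factor weights, assignment probability) are not reported here, so the following is only a minimal sketch of a generic Pocock–Simon-style minimization, not the program's actual algorithm. It assumes the range (max minus min) as the imbalance measure, equal factor weights, deterministic assignment to the least-imbalanced group with random tie-breaking, and hypothetical categorical bands for continuous factors such as age and TICS score. Partners are handled by forcing the group already given to the other partner.

```python
import random

GROUPS = ("FS", "IS", "MT")

def _zero_counts():
    return {g: 0 for g in GROUPS}

def imbalance_if_assigned(counts, participant, group):
    """Total imbalance across factors if `participant` were added to `group`.

    counts maps factor -> level -> group -> number already assigned.
    Per factor, imbalance is the range (max - min) of the group counts
    at this participant's level of that factor (an assumed measure).
    """
    total = 0
    for factor, level in participant.items():
        level_counts = dict(counts.get(factor, {}).get(level, _zero_counts()))
        level_counts[group] += 1
        total += max(level_counts.values()) - min(level_counts.values())
    return total

def assign(counts, participant, rng, partner_group=None):
    """Assign one participant and update the running counts in place."""
    if partner_group is not None:
        group = partner_group  # partners/spouses always share a condition
    else:
        scores = {g: imbalance_if_assigned(counts, participant, g) for g in GROUPS}
        best = min(scores.values())
        group = rng.choice([g for g in GROUPS if scores[g] == best])
    for factor, level in participant.items():
        counts.setdefault(factor, {}).setdefault(level, _zero_counts())[group] += 1
    return group

# Usage with hypothetical factor bands:
counts = {}
rng = random.Random(2012)
person = {"age_band": "60-69", "gender": "f", "education": "high",
          "tics_band": "30+", "computer_experience": "daily"}
first = assign(counts, person, rng)
spouse = assign(counts, {**person, "gender": "m"}, rng, partner_group=first)
```

With identical factor profiles, three consecutive participants end up in three different groups, which is the balancing behavior minimization is meant to produce.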

Participants were compensated for travel costs and received free unlimited access to all games on www.braingymmer.com. Full written informed consent was given by all subjects prior to participation. The study was approved by the local Ethics Committee of the University of Amsterdam and registered under number 2012-BC-2566. All procedures were conducted in compliance with the Declaration of Helsinki, relevant laws, and institutional guidelines.

#### Study Protocol

A battery of online tests (Neurotask BV, 2012) was devised to measure effects of the training at four points in time: at baseline (T0), after 6 weeks of training (T1), after 12 weeks of training (T2), and 4 weeks post-training (T3). On T0 and T2, subjects also visited the university for a series of neuropsychological tests and computer tasks, and a small set of cognitive tests was administered via a link in the email. Additionally, subjective effects were measured using a series of questionnaires at all time points, and a subgroup of participants underwent Magnetic Resonance Imaging (MRI) scanning at T0 and T2. Results of these subjective and MRI measures will be reported separately. Testing on T0 and T2 was spread out over three different days, and on T1 and T3 over 2 days. Both the order of the test days for T0 and T2 and order of testing within the neuropsychological test battery were counterbalanced between subjects.

Neuropsychological assessments were conducted by a trained junior psychologist who was blind to the training condition. As a check on blinding, neuropsychological assessors were asked to guess the condition of each subject. A separate test assessor administered four computer tasks and introduced the training to subjects using instruction videos and a demonstration of the training platform and games. After their first visit to the university, subjects received a personalized instruction booklet with illustrations reminding them how to log on to the testing and training platforms, how to play each game, and how to report technical problems. It also gave useful information beyond the training, for example, how to download a new browser and the importance of good posture during computer use. Each subject was assigned to a member of the research team who called them weekly or biweekly with standardized questions, offered motivation and feedback, and helped solve (technical) problems. Subjects were encouraged to email or call their contact with more urgent problems.

Subjects were requested to train five times a week for half an hour, on days and times of their choosing. Training activity was monitored: if no login was recorded for 2 days, an automatic email was sent to the subject. Subjects were encouraged to finish the training in 12 consecutive weeks. If training had to be interrupted for more than 2 days, such as during a holiday, the missed sessions were added to the end of the 12 weeks.

After T3, participants filled out an exit questionnaire in which they were asked to rate the training and their own motivation. To verify blinding, we asked subjects to guess which condition they had been assigned to, in case one condition turned out to be less effective than the others. Subsequently, all subjects received login information for a lifetime account on the training website.

#### Intervention

All three training programs were based on the brain training website www.braingymmer.com. The games were originally programmed for the general population, but after running a pilot study we adapted the selected games to the needs of older participants. For example, many games started at too high a speed and difficulty level. This was adjusted in the research-dedicated "dashboard" version of the platform. In this platform, game presentation order was fully preprogrammed to prevent individuals from selecting their own tasks. Once the time allotted to a game (e.g., 3 or 10 min) had elapsed, subjects were given some extra time to finish, so that games did not end too abruptly.

All groups received the same amount and type of feedback after finishing a game or training session (see Supplementary Material 1). Additionally, all participants received standardized weekly to bi-weekly feedback and support from research team members who supervised them from baseline until 4 weeks post-training.

#### Cognitive Training

We designed a cognitive training based on nine games in three domains: reasoning, working memory, and attention (see Supplementary Material 2). In designing our intervention, we chose not to include training games which too closely resembled any of our transfer tasks. Each game consisted of 20 levels, increasing in difficulty. The order of games was selected in such a way that two games following one another were never from the same domain, to optimize variability and flexibility.

Subject performance was rated with up to three stars at the end of each game block. Adaptiveness was implemented by asking subjects to continue to the next difficulty level when reaching two or three stars. In case a subject reached the highest level (20), he or she was asked to improve performance on previous levels with two stars.

Within the cognitive training we created two groups: frequent switching (FS) and infrequent switching (IS). In the FS group, one training session consisted of 10 games of 3 min each, thus requiring subjects to frequently switch to a task aimed to train a different cognitive function than the one before. In the IS group, three games of 10 min each were played so that switching between game domains occurred less frequently. In the first week only, in order for subjects to become familiar with the games, both groups played the games for 10 min each. By the end of the intervention, the time spent on each game was similar across participants in the FS and IS groups.
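The preprogrammed game order is not given in this section (the games themselves are listed in Supplementary Material 2), so the sketch below uses hypothetical game labels. It assumes each session cycles through the three domains in a shuffled order, which guarantees the stated constraint that consecutive games never share a domain, and counts domain switches to show how FS and IS differ at equal session length (30 min).

```python
import random

# Hypothetical labels standing in for the nine games (three per domain).
GAMES = {
    "reasoning": ["R1", "R2", "R3"],
    "working_memory": ["W1", "W2", "W3"],
    "attention": ["A1", "A2", "A3"],
}

def build_session(n_games, minutes_per_game, rng):
    """List of (domain, game, minutes). Domains cycle in a shuffled order,
    so two consecutive games never come from the same domain."""
    domains = list(GAMES)
    rng.shuffle(domains)
    session = []
    for i in range(n_games):
        domain = domains[i % len(domains)]
        session.append((domain, rng.choice(GAMES[domain]), minutes_per_game))
    return session

def session_for(group, week, rng):
    """FS: 10 games x 3 min; IS (and both groups in week 1): 3 games x 10 min."""
    if group not in ("FS", "IS"):
        raise ValueError(f"not an experimental group: {group}")
    if group == "IS" or week == 1:
        return build_session(3, 10, rng)
    return build_session(10, 3, rng)

def domain_switches(session):
    """Number of transitions between games from different domains."""
    return sum(1 for a, b in zip(session, session[1:]) if a[0] != b[0])

rng = random.Random(1)
for group in ("FS", "IS"):
    s = session_for(group, week=2, rng=rng)
    print(group, len(s), "games,", domain_switches(s), "domain switches")
# FS 10 games, 9 domain switches
# IS 3 games, 2 domain switches
```

Both schedules fill the same half hour and expose subjects to all three domains, so in this sketch they differ only in how often the domain changes, which is the flexibility manipulation described above.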

#### Mock Training

For the MT, we selected games that provided equal visual stimulation and feedback and put equal demands on computer ability, but that were reduced in variability, flexibility, and adaptiveness, compared with the experimental conditions (see Supplementary Material 3). We selected four games that all put minimal demands on executive functions. Per session, subjects played three games of 10 min each, thus minimizing the need for flexibility. Unlike the FS and IS conditions, the MT was not adaptive. Although higher levels could be unlocked in the same manner, participants in the MT were instructed to remain on the same level for a week before continuing to the next level, regardless of the number of stars they received on a game.
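The two level-progression rules can be contrasted in a short, hedged sketch: the adaptive rule of the experimental conditions (advance after two or three stars; after reaching level 20, revisit earlier levels that earned only two stars) and the fixed weekly rule of the MT. Treating "a week" as seven days on a level, and letting MT participants advance after that week regardless of stars, are assumptions; the function names are hypothetical.

```python
MAX_LEVEL = 20

def next_level(level, stars, adaptive, days_on_level=0):
    """Level to play next under the experimental (adaptive) or MT rule."""
    if adaptive:
        # Experimental conditions: advance after two or three stars.
        if stars >= 2 and level < MAX_LEVEL:
            return level + 1
        return level
    # Mock training: stay on a level for a week regardless of stars
    # (assumed here to mean seven days, after which the next level is played).
    if days_on_level >= 7 and level < MAX_LEVEL:
        return level + 1
    return level

def levels_to_improve(best_stars_by_level):
    """After level 20 is reached: earlier levels whose best result is two stars."""
    return sorted(lvl for lvl, s in best_stars_by_level.items()
                  if s == 2 and lvl < MAX_LEVEL)
```

In this sketch the only input the MT rule ignores is `stars`, which captures the sense in which the MT was not adaptive even though its levels unlocked in the same way.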

#### Assessment Tasks

The effects of the flexibility training were estimated using pre- and post-measures on an extensive battery of computer tasks, neuropsychological paper-and-pen tests and computerized versions of these tests. For detailed task descriptions, see van de Ven et al. (2015).

#### Principal Analysis

For the principal analysis we used the executive functions as distinguished by Miyake et al. (2000): shifting (task switching and dual tasking), updating, and response inhibition. These were assessed with four computerized tasks. Task switching and dual tasking performance were measured using modified versions of a commonly used switch task (Rogers and Monsell, 1995) and dual task (Stablum et al., 2007). The two tasks were combined to save time. Switch cost was calculated as the difference between reaction time on switch trials and no-switch trials in milliseconds, with higher switch cost signifying lower cognitive flexibility (Rogers and Monsell, 1995). Dual task performance was assessed by the reaction time on speeded responses of the dual trials (Stablum et al., 2007). Updating performance was measured using the N-back task as used by de Vries and Geurts (2014), including 0-back, 1-back, and 2-back blocks. Performance on this task was calculated as the difference between the percentage correct on 2-back and on 0-back items (Kirchner, 1958). The stop-signal task (Logan et al., 1984) was used to measure inhibition. Stop-signal reaction time (SSRT) was calculated by sorting all correct Go-trial reaction times, taking the time corresponding to the percentage of correct stop trials, and subtracting the mean stop-signal delay (SSD) from this number (Logan et al., 1984).
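The scoring scripts are not published here, so the functions below are a minimal sketch of the three principal scores as defined above, with illustrative argument names. For SSRT the sketch follows the standard integration method, in which the nth go RT is selected at the proportion of stop trials on which the participant *failed* to stop, p(respond | signal). When a tracking procedure keeps successful stopping near 50%, this equals the proportion of correct stop trials named in the text; whether such tracking was used is an assumption here.

```python
from statistics import mean

def switch_cost(switch_rts, repeat_rts):
    """Mean RT on switch trials minus mean RT on no-switch trials (ms).

    Higher values indicate lower cognitive flexibility.
    """
    return mean(switch_rts) - mean(repeat_rts)

def nback_difference(pct_correct_2back, pct_correct_0back):
    """Percentage correct on 2-back minus percentage correct on 0-back."""
    return pct_correct_2back - pct_correct_0back

def ssrt_integration(correct_go_rts, p_respond_given_stop, ssds):
    """Stop-signal reaction time (ms) via the integration method.

    Sort correct go RTs, take the nth RT with n = p(respond | stop) * N,
    and subtract the mean stop-signal delay.
    """
    rts = sorted(correct_go_rts)
    n = round(p_respond_given_stop * len(rts))
    n = min(max(n, 1), len(rts))  # keep the rank inside the distribution
    return rts[n - 1] - mean(ssds)
```

Example: with ten correct go RTs, half of the stop trials failed, and a mean SSD of 250 ms, the fifth-fastest go RT minus 250 ms is the SSRT estimate.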

#### Secondary Analysis

Broader effects of the training were assessed using neuropsychological tests from eight cognitive domains: task switching, psychomotor speed, processing speed, planning, reasoning, working memory, long term memory, and verbal fluency. In most domains we included multiple tests. For the RAVLT, letter fluency, category fluency with and without switch condition, and Raven Progressive Matrices, we used alternate assessment forms. Where necessary, raw scores were recoded such that higher scores always represent better performance.

In the domain of task switching, we included the Delis-Kaplan Executive Function System—Trail Making Test (D-KEFS TMT; Condition 4), the Trail Making Test-B (TMT-B), and a separate switch condition of the semantic fluency task. The D-KEFS TMT concerned the number-letter switching subtask, with the performance score calculated as the total time in seconds to connect numbers and letters in alternating order (i.e., 1, A, 2, B, etc.; Delis et al., 2001). The TMT-B was the online version of this task, with performance assessed by the total time in seconds to connect numbers and letters in alternating order (NeuroTask BV). In the switch condition of the semantic fluency task, participants alternately listed as many words as possible from two separate categories (male names and supermarket items, or female names and cities, counterbalanced over participants) in 1 min. The outcome measure was the average number of correct words produced in the same categories without switching minus the number of correct words in the switch condition (Troyer et al., 1997).

For psychomotor speed, we used five tasks, four of which were assessed online. In the drag-and-drop task, participants used their computer mouse to drag round or square shapes into an empty border. The outcome measure was the total time in milliseconds to complete the task (Neurotask BV). In the drag-to-grid task, participants dragged 25 squares into a 5 × 5 grid using the mouse. Performance was assessed by the total time in milliseconds to complete the task (Neurotask BV). The click task required participants to click a spiral of circles of decreasing size using the mouse, with the total time in milliseconds to complete the task as the outcome measure (Neurotask BV). The D-KEFS TMT (Condition 5) concerned the motor speed condition, with the performance score calculated as the total time in seconds to trace a dotted line between a number of circles (Delis et al., 2001). The TMT-A was the online version of this task, with performance assessed by the total time in seconds to connect numbers (NeuroTask BV).

Processing speed was measured using the Digit Symbol Coding test (DSC; Wechsler, 2000) and an online version of this task (Neurotask BV). In this task, participants are required to pair a series of numbers with the correct symbols according to a given rule. The outcome measure is the number of correct items completed in 2 min.

In the planning domain, we used the Tower of London (ToL). This was the online version (Neurotask BV) based on the original task by Culbertson and Zillmer (2005), in which participants move colored beads from a starting position into a target position in as few moves as possible. Performance was assessed by the sum of the number of additional moves needed to solve the ToL, using a maximum score of 20.

In the reasoning domain, we included Raven's Progressive Matrices (Raven et al., 1998) as well as the Shipley Institute of Living Scale-2 (Zachary, 1991). For both reasoning tasks, the outcome measure was the total number correct on 20 items.

Working memory was assessed using two online tasks and three face-to-face tasks. A modified version of the Corsi block tapping task (Milner, 1971) was constructed for online assessment. The outcome measure was the longest correctly reproduced array of blocks (Neurotask BV). In the Paced Auditory Serial Addition Task (PASAT), participants needed to update the addition of numbers presented auditorily. We administered two versions, in which numbers were delivered at intervals of 3.4 and 2.8 s, respectively. As an outcome measure, we calculated the mean percentage correct across both versions (Gronwall, 1977). The Operation Span consisted of a series of letters presented sequentially that needed to be remembered while solving mathematical equations (Unsworth et al., 2005). The outcome measure for the Operation Span was the total number of correctly remembered letters. In Rey's Auditory Verbal Learning Test (RAVLT)–direct, participants were presented with a series of words auditorily for five trials and recalled as many words as possible after each trial. We used the total number of words remembered after five trials as an outcome measure (Saan and Deelman, 1986). Lastly, in Letter Number Sequencing (LNS), participants were required to recall a series of numbers and letters in increasing or alphabetical order (Wechsler, 2000). For this measure, we used the total number of correct items.

For verbal long term memory, the delayed recall trial of the RAVLT was used, with the total number of words recalled after a delay of 20 min as the outcome measure (Saan and Deelman, 1986).

In the domain of verbal fluency, we used a semantic fluency task and a letter fluency task. In the semantic fluency task, participants produced as many words as possible in two different categories (male names and supermarket items, or female names and cities, counterbalanced over participants), each in 1 min (Thurstone, 1938). In the letter fluency task, participants produced as many words as possible starting with one of three different letters (P, G, and R on one time point, K, O, and M on the other time point, counterbalanced over participants), each in 1 min (Benton et al., 1989). For both tests, the outcome measure was the mean number of correct words.

To control for possible differences in fatigue and depression, we also examined baseline scores of the Checklist Individual Strength—Fatigue subscale (CIS-F) and the Hospital Anxiety and Depression Scale—Depression subscale (HADS-D). The HADS-D (Zigmond and Snaith, 1983) measures subjective severity of depression and includes seven items on a four-point scale, with a maximum score of 21. The CIS-F (Vercoulen et al., 1997) measures subjective fatigue and the behavioral characteristics related to this concept. The scale consists of eight items, with scores ranging from 8 to 56. A score of 35 is regularly used as a cut-off to denote severe fatigue (Worm-Smeitink et al., 2017).

We designed an exit scale with four separate questions assessing perceived difficulty and enjoyment of the games, self-rated general cognitive enhancement, and whether participants would continue using the training. Although the scale is not validated, it served as a necessary tool to gauge participants' present and future views of the training. Participants rated these questions at T2 and T3 on a five-point Likert scale.

#### Training Performance

Training performance in all three groups was measured using a mean training z-score as well as a mean gain score between T0 and T2. High scores on each level were calculated as a percentage of the maximal score on that level and then summed into a total game score for each training game. For the experimental conditions, domain scores were also computed by averaging the three total game scores within each domain, and a final score by averaging the three domain scores. For the MT, a final score was calculated by averaging over the four games. Subsequently, we computed a mean training score for each of the three training groups separately and transformed these to z-scores so that the MT and experimental training groups could be compared with each other. The gain score was calculated by subtracting the mean score attained after the first 10 min of playing from the mean score attained at the end of training.
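A minimal sketch of the aggregation steps just described, under stated assumptions: each level's maximum score is known, z-scores are computed against the mean and sample SD of whatever set of scores is passed in (for example, one training group; the text does not spell out the reference distribution), and all function names are hypothetical.

```python
from statistics import mean, stdev

def game_total(level_high_scores, level_max_scores):
    """Sum over levels of each level's high score as a percentage of its maximum."""
    return sum(100.0 * s / m for s, m in zip(level_high_scores, level_max_scores))

def experimental_final_score(game_totals_by_domain):
    """Average game totals within each domain, then average the domain scores."""
    return mean(mean(totals) for totals in game_totals_by_domain.values())

def mock_final_score(game_totals):
    """MT: average over its four game totals."""
    return mean(game_totals)

def z_scores(scores):
    """Standardize against the scores' own mean and sample SD (an assumption)."""
    mu, sd = mean(scores), stdev(scores)
    return [(x - mu) / sd for x in scores]

def gain_score(first_10_min_mean, end_of_training_mean):
    """End-of-training mean score minus mean score after the first 10 min."""
    return end_of_training_mean - first_10_min_mean
```

Averaging within domains before averaging across them gives each domain equal weight, so a domain with unusually high raw totals cannot dominate the final score.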

#### Statistical Analysis

A first set of repeated-measures ANOVAs focused on the executive functions in the principal analysis, using time points T0 (baseline) and T2 (post-training). Scores on task switching and dual tasking, updating, and inhibition were used as dependent variables, with group (FS, IS, or MT) as the independent variable. A second set of repeated-measures ANOVAs was carried out for the secondary measures. PASAT, ToL, TMT-A, and TMT-B scores were transformed due to non-normality: PASAT scores were raised to the 3rd power, a square root transformation was applied to ToL data, and TMT-A and TMT-B scores were transformed using the formula 1/x<sup>0.14</sup>. When necessary, outcome measures were rescored so that a positive value indicated improvement. We computed correlations between significant transfer tasks and age, TICS score, and workouts completed to determine whether to add them as covariates to the primary and secondary ANOVAs. Education level required non-parametric correlation analysis (Spearman's rho); all other measures used Pearson's correlation coefficient. To explore the extent to which individual characteristics influenced training benefits, significantly correlated covariates were added to a repeated-measures ANCOVA of the primary and secondary measures.
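The preprocessing named above can be illustrated with a minimal sketch: the three normalizing transformations and the two correlation coefficients used to screen candidate covariates. The analyses themselves were run in SPSS; this plain-Python version is only for transparency, and its function names are hypothetical. Note that 1/x<sup>0.14</sup> also reverses the direction of completion times, so faster (better) TMT performance yields larger transformed values.

```python
import math

def normalize(measure, x):
    """Normalizing transformations listed in the text; other measures pass through."""
    if measure == "PASAT":
        return x ** 3                 # raised to the 3rd power
    if measure == "ToL":
        return math.sqrt(x)           # square root
    if measure in ("TMT-A", "TMT-B"):
        return 1.0 / x ** 0.14        # 1 / x^0.14; slower times give smaller values
    return x

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as Pearson's r on average ranks (suited to ordinal education)."""
    return pearson_r(average_ranks(x), average_ranks(y))
```

Spearman's rho only uses the ordering of education levels, which is why it fits an ordinal variable where Pearson's r would treat the level codes as equally spaced numbers.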

When a significant improvement was detected at T2 on one of the dependent variables that were also measured at T1 (after 6 weeks of training) and T3 (4-week post-training follow-up), these time points were added to the model to establish whether training effects were already visible after 6 weeks of training and whether they persisted after training had ceased.

Grubbs' extreme studentized deviate test was used to detect outliers (Grubbs, 1950). We ran analyses with and without outliers. All reported results are without outliers, unless otherwise specified. IBM SPSS Statistics for Windows, version 22 (IBM Corp., Armonk, N.Y., USA) was used for all statistical analyses. Normality was checked using the Shapiro-Wilk test and by evaluating skewness and kurtosis. A p-value of 0.05 (two-tailed unless mentioned otherwise) was considered significant. For all analyses, Bonferroni corrections for multiple testing were used. Greenhouse-Geisser corrected degrees of freedom were used whenever sphericity was violated, though for legibility the original degrees of freedom are reported.
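As a hedged illustration of the outlier and correction rules (not the SPSS procedure actually used), the sketch below computes the two-sided Grubbs statistic and its critical value, plus a Bonferroni-adjusted threshold. The critical value needs a quantile of Student's t, which the Python standard library does not provide, so it is passed in as `t_crit`. Grubbs' test assumes approximate normality and is often applied iteratively (remove the flagged value and re-test); whether iteration was used here is not stated.

```python
import math
from statistics import mean, stdev

def grubbs_statistic(values):
    """Return (G, index) for the value farthest from the mean: G = max|x - mean| / s."""
    m, s = mean(values), stdev(values)
    idx = max(range(len(values)), key=lambda i: abs(values[i] - m))
    return abs(values[idx] - m) / s, idx

def grubbs_critical(n, t_crit):
    """Two-sided critical G for sample size n.

    t_crit must be the upper alpha/(2n) quantile of Student's t with n - 2 df,
    e.g. scipy.stats.t.ppf(1 - alpha / (2 * n), n - 2) if SciPy is available.
    """
    return (n - 1) / math.sqrt(n) * math.sqrt(t_crit ** 2 / (n - 2 + t_crit ** 2))

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests
```

A value is flagged when its G exceeds `grubbs_critical(n, t_crit)`; G can never exceed (n - 1)/sqrt(n), which is why very small samples cannot reveal outliers with this test.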

# RESULTS

Of the 158 subjects tested at baseline, 1 person was excluded before starting the training due to difficulty understanding the transfer tasks, 5 experienced substantial health problems, 3 reported lack of time, 5 did not enjoy the training, and another 5 experienced technical issues. There was no difference in gender, TICS score, education level, or training group between the final sample and dropouts (all p's > 0.19), though there was a significant difference in age [sustainers M = 67.77, SD = 5.0; dropouts M = 72.3, SD = 7.8; t(20.134) = −2.454, p = 0.023]. The subsequent results are based on the remaining 139 subjects (age 60–80, M = 67.8, 60.4% female, mean years of education 13.7).
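The age comparison between sustainers and dropouts reports a t statistic of about −2.45 with roughly 20 degrees of freedom, which is consistent with Welch's unequal-variance t-test. That identification is our inference, not stated by the authors. The sketch below recomputes Welch's t and the Welch-Satterthwaite degrees of freedom from the rounded summary statistics, taking the dropout count as 158 − 139 = 19; the function name is hypothetical.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite df from summary statistics."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


if __name__ == "__main__":
    # Sustainers (n = 139) vs. dropouts (n = 19), rounded values from the text.
    t, df = welch_t(67.77, 5.0, 139, 72.3, 7.8, 19)
    print(f"t({df:.1f}) = {t:.2f}")
```

With these rounded inputs the sketch should print approximately t(20.1) = −2.46; the small gap from the reported values is expected from rounding of the means and SDs.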

Because participants receiving MRI scans were only assigned to either the frequent switch training or the MT, these groups contain a higher number of participants than the infrequent switch condition. Fifty-six subjects were allocated to the frequent switch training, 33 subjects to the infrequent switch training, and 50 subjects to the MT. Before training, the three training groups did not differ in gender, level of education, TICS score, age, or computer experience (all p's > 0.26), as expected after minimization (see **Table 1**). There was also no difference in fatigue or depression (all p's > 0.48). Of the 14 participants whose fatigue scores exceeded the cut-off of 35, 5 were in the frequent switch condition, 5 in the active control and 4 in the infrequent switch condition. On the exit questionnaire, an equal number of people in each group reported having started new activities or training other than ours [χ²(2, N = 139) = 0.561, p = 0.77].

#### Intervention

The average number of completed training sessions was 57.1 (28.6 h), and this did not differ between training groups [F(2, 135) = 0.438, p = 0.65]. All three groups improved equally on all training tasks, judging from the z-scores [F(2, 135) = 0.192, p = 0.826], though in terms of total gain, the experimental conditions improved significantly more than did the active control [F(2, 135) = 6.698, p = 0.002].

TABLE 1 | Subject demographics.

Values are Mean ± SD; p-values are based on ANOVA unless otherwise specified; Prior computer use based on seven-point scale (from 1= less than once a month to 7= more than 4 h a day); Level of education based on seven-point scale (from 1= unfinished primary school to 7= university); FS, frequent switching experimental group; IS, infrequent switching experimental group; MT, mock training group; TICS, Telephone Interview for Cognitive Status; CIS-F, Checklist Individual Strength–Fatigue subscale; HADS-D, Hospital Anxiety and Depression Scale–Depression subscale. <sup>a</sup>p-value based on χ².

Although the active control condition was asked to remain at a single game level for a given week, we discovered that many active control participants continued playing beyond this level, thus diminishing differences in adaptiveness between our training groups: 42% of control participants played more than 10% of their training time beyond the highest allowed level (level 9), and 26% played more than 30% of their time beyond this level. In addition, 25 subjects (39% of IS subjects, 21% of FS subjects) scored the maximal number of points at the highest possible level on one or two of the nine games, thereby compromising adaptiveness in both experimental groups. Although many of these cases occurred only in the last few weeks of the training period, these events may have reduced the intended contrast between the MT and the experimental conditions.

Subjects were told after participation that we had used two conditions, one of which we expected to be less effective than the other. When asked whether they believed they had been in the more or the less effective condition, participants were more likely to assume they had been in the experimental condition: 71% of FS and IS participants and 59% of MT participants expected they had received our more effective training. Neuropsychological assessors did not guess subjects' training group above chance level, either before training [39%; χ²(4, N = 105) = 2.73, p = 0.60] or after training [33%; χ²(4, N = 105) = 4.07, p = 0.39]. We can therefore assume that both neuropsychological assessors and participants themselves remained blind to the training conditions.
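A reported χ² p-value with an even number of degrees of freedom, such as the assessor check with 4 degrees of freedom, can be recomputed by hand: for df = 2k the upper tail has the closed form p = e<sup>−x/2</sup> Σ<sub>i=0</sub><sup>k−1</sup> (x/2)<sup>i</sup>/i!. The sketch below is a hypothetical helper for that check only; it does not reconstruct the underlying contingency table, which is not reported here.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail p-value of a chi-square statistic, valid only for EVEN df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0  # i = 0 term of the series
    for i in range(1, df // 2):
        term *= half / i  # (x/2)**i / i!
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # Assessors' guesses before training: chi2(4) = 2.73.
    print(round(chi2_sf_even_df(2.73, 4), 2))
```

For the pre-training assessor check this gives about 0.60, matching the reported value.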

Moreover, there was no difference between the three conditions in the perceived difficulty or enjoyment of the games, self-rated cognitive enhancement, number of completed training sessions, or the degree to which participants would like to continue playing the games (all p's > 0.28), suggesting that all interventions were enjoyed equally.

#### Transfer Effects

The statistics of the primary and secondary analyses reported below are detailed in **Table 2**. Main effects of Group were absent throughout and will not be discussed further. ANCOVA outcomes are reported where appropriate.

#### Principal Analysis: Executive Functions

On task switching, all three groups significantly improved their scores over Time, but the Time × Group interaction did not reach significance. A similar pattern was seen on the dual task, with a main effect of Time that was not modulated by Group. This Time effect disappeared when correcting for the number of workouts [F(1, 128) = 0.721, p = 0.397, ηp² = 0.006]. Time effects for the N-back and Stop-signal tasks did not survive Bonferroni correction, and no modulation by Group was found. Equivalent results appeared when outliers were included.
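"Did not survive Bonferroni correction" means that a p-value multiplied by the number of tests m was no longer below α. A minimal sketch follows; the four p-values are purely hypothetical placeholders for the four primary outcomes (task switching, dual task, N-back, stop-signal) and are not the study's values, and the function names are our own.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests m, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def survives(p_values, alpha=0.05):
    """True for each test whose Bonferroni-adjusted p-value is below alpha."""
    return [p_adj < alpha for p_adj in bonferroni_adjust(p_values)]


if __name__ == "__main__":
    # Hypothetical raw p-values for four primary outcomes.
    raw = [0.001, 0.004, 0.03, 0.04]
    print(survives(raw))
```

Here the last two effects would be nominally significant at 0.05 but would not survive correction (adjusted p = 0.12 and 0.16).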

#### Secondary Analysis

Most of the secondary measures improved with Time, as described below, but none of these effects was modulated by Group. Performance on two out of three cognitive flexibility tasks improved over time for all three groups: both the DKEFS TMT and the online version of the TMT-B showed decreased switching latency, whereas no significant effects were found for performance on the switch condition of the semantic fluency task. Psychomotor speed improved on the online TMT-A and the three mouse ability tasks (Click, Drag-and-drop, and Drag-to-grid); the motor speed condition of the DKEFS TMT did not show a significant Time effect. Processing speed improved on both the original DSC and the online version of the task. A significant Time effect was found for the ToL. In the reasoning domain, a significant Time effect was observed for the SILS, whereas the score on the RPM did not change significantly. On tasks of working memory, both PASAT and RAVLT-direct improved over Time, whereas no change appeared on the online Corsi or the Operation Span; the Time effect on the LNS did not survive Bonferroni correction. No Time effect was found on long-term memory measured with the RAVLT-delay. Finally, neither semantic nor letter fluency improved over Time in any of the groups. With outliers included in the data, the results showed the same pattern.

All Time effects disappeared when correcting for age, baseline TICS score, number of completed training sessions, or education level, regardless of whether one or multiple covariates were used.

#### Follow-Up Effects

As specific training effects were lacking, T1 measurements were not examined. An exploratory repeated-measures ANOVA was run for tasks that showed a significant Time effect and had also been administered at T3. All of these measures improved further at T3, with larger effect sizes for all tasks [Switch task: F(1, 125) = 59.167, p < 0.001, ηp² = 0.32; ToL: F(1, 115) = 37.237, p < 0.001, ηp² = 0.25; online DSC: F(1, 100) = 101.421, p < 0.001, ηp² = 0.50; TMT-A: F(1, 112) = 43.755, p < 0.001, ηp² = 0.28; TMT-B: F(1, 112) = 53.234, p < 0.001, ηp² = 0.32; Click task: F(1, 112) = 16.933, p < 0.001, ηp² = 0.13; Drag-and-drop: F(1, 109) = 39.465, p < 0.001, ηp² = 0.27; Drag-to-grid: F(1, 113) = 60.085, p < 0.001, ηp² = 0.35]. However, these Time effects did not interact with Group for any of these measures, and no Time effects remained when correcting for age, education level, baseline TICS score, and number of completed training sessions, regardless of whether one or multiple covariates were used.

TABLE 2 | Transfer results. N-back in % correct.

#### Extra Analyses

We added an extra analysis examining a possible interaction between Time, Group, and switch task trial type (switch and non-switch trials). This three-way interaction was not significant [F(6, 360) = 0.233, p = 0.943, ηp² = 0.004].
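The partial eta squared values reported in these results can be recovered from each F statistic and its degrees of freedom, since ηp² = SS<sub>effect</sub>/(SS<sub>effect</sub> + SS<sub>error</sub>) = F·df<sub>effect</sub>/(F·df<sub>effect</sub> + df<sub>error</sub>). The helper name below is hypothetical; the inputs are F values reported in this section.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f * df_effect) / (f * df_effect + df_error)


if __name__ == "__main__":
    reported = {
        "Switch task, T3": (59.167, 1, 125),
        "Drag-to-grid, T3": (60.085, 1, 113),
        "Time x Group x trial type": (0.233, 6, 360),
    }
    for label, (f, d1, d2) in reported.items():
        print(label, round(partial_eta_squared(f, d1, d2), 3))
```

Because a Greenhouse-Geisser correction multiplies both degrees of freedom by the same ε, that factor cancels in the ratio, so ηp² is the same whether the original or the corrected degrees of freedom are used.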

To examine whether training benefit differed between groups after adjusting for baseline performance, we ran a series of separate ANCOVAs using difference scores on all measures, with baseline scores as a covariate. When adjusting for baseline RPM score, there was a difference in score on the Raven's Progressive Matrices between the frequent and infrequent switch conditions [F(2, 128) = 4.111, p = 0.019, ηp² = 0.060]. However, this effect did not survive Bonferroni correction.

# DISCUSSION

We investigated whether cognitive functioning in older adults can be trained using a computerized cognitive training. For this purpose, we designed an intervention with multimodal, novel, adaptive training tasks, a built-in element of flexibility, and frequent training sessions to optimize transfer, and selected a number of transfer tests with parallel forms to minimize retest effects. Based on previous literature (Mahncke et al., 2006; Karbach and Kray, 2009; Düzel et al., 2010) we expected far transfer to several executive functions. Improvement over time was found on training tasks as well as on multiple transfer tasks covering all domains. Our primary analyses showed that older adults benefited from training across the main domains of executive function (updating, shifting, inhibition; Miyake et al., 2000). Our secondary analyses partially confirmed these findings: improvements were seen in planning, reasoning, two out of three cognitive flexibility tests, two out of five working memory tests, and two out of three psychomotor speed tests, whereas no improvement was observed for IQ, long-term memory, or fluency. Improvements were further amplified 4 weeks after training completion.

Most importantly, however, the experimental training that capitalized on flexibility, novelty, and adaptiveness as central features did not lead to more progress than the training without these elements. This suggests that these key ingredients conferred no additional advantage, and that the improvements were induced mainly by other factors.

On outcome measures where a covariate was included, all Time effects disappeared. Although covariates were added only if they correlated significantly with a measure, plots of the covariate data showed that different values of each covariate affected the various measures differently. This suggests that the covariation effects were not systematic across covariates and therefore did not help the model explain training effects.

Our study had a number of limitations, some of which it shares with similar studies in the literature.

#### Generic Factors: Motivation, Expectation, and Placebo Effects

Training-nonspecific factors such as attention, motivation, expectancy, and placebo may have contributed more to the observed training benefits than anticipated. Long-term intensive training interventions involve a degree of personal attention and motivation that may in itself suffice to enhance cognitive performance. Moreover, such programs may induce an expectancy to improve, which has been shown to generate powerful placebo effects across a wide range of domains and paradigms (Boot et al., 2013b; Dougherty et al., 2016; Foroughi et al., 2016).

Thus, one explanation for the present set of findings is a placebo or subject-expectancy effect. In the information booklet that prospective participants received at the beginning of the study, we stated our intention to investigate whether training could be beneficial, as well as our hope to find positive effects on cognitive functioning. Although the booklet also explained that we were not sure whether this would be the case, we might have inadvertently given subjects the notion that we expected benefit, leading them to put extra effort into post-training performance. The finding that a majority of our participants assumed they had been in the experimental condition may support this notion. A similar pattern was observed by Foroughi et al. (2016), in whose study subjects who had responded to a suggestive flyer displayed more improvement on cognitive functions than a control group responding to a nonsuggestive flyer. Elsewhere, we will report on how participants perceived their progress subjectively. If a subject-expectancy or placebo effect has indeed influenced the current results, it might appear in these subjective reports, shedding more light on the question of overall time-based improvement.

A similar, but slightly different interpretation is that of the Hawthorne effect (Green and Bavelier, 2008), referring to subjects' tendency to perform better on tasks when they are working toward a common goal or when a need for attention is satisfied by participating in research. In our case, as all three conditions received a certain amount of social stimulation, this might have led to increased motivation to perform well on the post-training measurements.

#### Potential Limitations: Challenge Levels, Group Composition, and Social Cohesion

Based on previous studies using computerized training (Basak et al., 2008; van Muijden et al., 2012; Ballesteros et al., 2014; Kühn et al., 2014), we assumed that the current set of nine games, with 20 levels each, would provide ample variation and challenge for 12 weeks. Nonetheless, some evidence suggests that this challenge was not always achieved in our experiment. First, a considerable number of participants reached the maximum score for at least one of the games several weeks before the end of the training, diminishing adaptiveness in these groups. In addition, on the exit questionnaire, some participants in the frequent switch training commented that the training was too simple or repetitive, or specifically criticized its adaptiveness, reporting that progression through the first half of the levels was too gradual and through the second half too steep; the training did not meet every participant's need for challenge. An improvement for future studies would be to implement more variable and novel activities tailored to individual demands to further optimize performance gains. Yet, by and large, most participants experienced the tasks as aptly challenging, with the levels of variability and adaptiveness contributing to that experience.

Training in our active control group, on the other hand, might have been more challenging than intended. We had meant to generate novelty only in the FS and IS conditions, and assumed that by selecting only four, less multimodal games for the active control condition, novelty would be minimal. However, for many participants (across groups), playing games seemed in itself to be a sufficiently novel activity to produce small cognitive effects. Many participants had not previously used a mouse at the relatively fast pace required by our games and computer tasks. The strong transfer effects to mouse ability tasks in all three groups support this assumption. Another point to consider is that the restriction on adaptiveness in the control condition was compromised by the fact that many participants continued past the maximally allowed weekly game level, making the control condition more challenging than intended. Thus, our control condition may have unintentionally targeted functions similar to those in the experimental conditions. This becomes especially evident when comparing our design with those of other studies. Many training studies that find clearer transfer than the current study have employed active control conditions that appear distinctly less active than the experimental conditions, spending fewer hours on assigned tasks and having markedly less interaction with the researchers. Some use only a passive control condition, or none at all. Our results stress the value of including an active control condition that receives equal attention and training time, yet does not overlap with the experimental conditions in the functions it engages.

Furthermore, for many of our subjects, participating in the training involved more than just playing the games and may have included aspects such as following a link in an email to get to the online test batteries, downloading a new browser, (later) starting up the correct browser, and navigating to the right page. Although all of our subjects used their computer regularly and knew about basic internet use, such actions beyond the training itself often exceeded those of their usual activities and, thus, may have constituted a type of unintended cognitive training.

However, results from our stroke sample, described elsewhere (van de Ven et al., 2017), suggest that this had only a minor effect. In this sample, we investigated effects of the TAPASS training in recovering stroke patients, including a no-contact waiting list condition. This waiting list group showed improvements equal to those of the experimental intervention and the active control, including on the mouse ability tasks. This suggests that playing games or increased use of a mouse could not have been the main factor behind the transfer effects that we found. Instead, the improvements in this study, appearing in all three training groups, are more likely to have been caused by retest effects. Almost none of the tests with parallel forms reached significance in any of the groups. Also, there was no indication that improvement was limited to specific cognitive processes, as transfer effects were not exclusive to specific domains. Testing frequency thus seems to be the most important factor underlying these time effects. Although we included parallel tests where available, future studies might benefit from using parallel tests only, to minimize these retest effects. Furthermore, with the statistical analyses used at present, we have not been able to fully uncover individual differences in training benefit. More thorough analyses are necessary to provide additional insight into individual learning processes and to inform future interventions.

Our design largely lacked social interaction with other participants, which might have provided additional stimulation (Ybarra et al., 2008; Charles and Carstensen, 2010). Also, although the focus of this project was on the effectiveness of popular home-based training tasks, some recent evidence reveals that for non-impaired older adults, individual at-home training might not be as effective as group training (Kelly et al., 2014; Lampit et al., 2014) or training sessions provided in the lab (Basak et al., 2008; Lövdén et al., 2012; Ballesteros et al., 2014). Reasons given include optimization of adherence and compliance, as well as providing motivation to master more difficult training tasks. Participants in these studies may have benefited more from the training due to this procedure, producing more conspicuous results than the home-based training method presented here. Yet, as we contacted participants frequently with motivational telephone calls, it is unclear whether increased transfer in the experimental conditions would have occurred if subjects had received face-to-face support from a trainer instead.

In the current study, we used a set of commercially available games targeted at the general population to train cognitive functions. Naturally, commercial and scientific games are created with different intentions in mind; in this case, however, we expect this to be less of a concern, as we took care to adapt each game, as well as the design of the intervention, to fit our scientific objectives, with the additional benefit of generalizing more to other functions than the frequently used commercial games.

A potential methodological limitation of this study was the homogeneity of our elderly sample, with a high educational level and relatively few cognitive complaints. This is characteristic of participants interested in volunteering for research experiments, and was reinforced by the inevitable self-selection due to our inclusion criteria, such as the requirement to own a modern computer and to be willing to spend 12 weeks on our training. This raises the question of the degree to which cognitive improvement could have been attained in a sample with ample daily cognitive stimulation and minimal need to improve functions. A sample of older, less fit individuals might better display the potential benefit for the population. However, logistically it is difficult to encourage lower-educated, more cognitively impaired individuals to participate in research, let alone spend a sufficient amount of time on such an intervention. Despite subjects' demographic homogeneity, we noticed a large test score variability within groups, overshadowing any differences between them.

# CONCLUSION

Our cognitive flexibility training, using elements based on previously effective cognitive interventions, did not produce the expected near and far transfer. Although training benefits were observed almost across the board, equal effects appeared in the active control group. Taken at face value, our results with commercially available training games suggest that this type of training may yield cognitive benefits among older adults. In our experimental design, however, we could not disentangle training effects from those attributable to test practice, expectancy, and motivation. Our parallel study with recovering stroke patients (van de Ven et al., 2017), which included a wait list condition, suggests that such factors may well have overshadowed the beneficial effect of the training itself and that training effects on cognition could be rather small. Additional investigation into different training methods is advised, including stimulation of social interaction and the use of more variable, novel, group-based yet individually adjusted activities. Our results further emphasize the importance of using parallel forms as outcome measures for transfer and including both passive and active control conditions.

As a future direction, we may observe that a thus far underexplored territory pertains to individual differences in "trainability" or the susceptibility to benefits from particular aspects of a training. For instance, it may prove fruitful to explore which cognitive or neural connectivity profiles are predictive of who will improve in what domain. If brain training is to be successful in a meaningful way, we first have to learn more about which determinants are key in tailor-made interventions to maximize far transfer.

#### ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the guidelines formulated by the Ethics Review Board of the Faculty of Social and Behavioral Sciences, University of Amsterdam, The Netherlands, with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the aforementioned Ethics Review Board of the University of Amsterdam.

#### REFERENCES

#### AUTHOR CONTRIBUTIONS

JB and RvdV contributed to conception and design of the study and were responsible for data collection. JB and SP analyzed the data. JB interpreted the data and wrote the manuscript. JB, RvdV, SP, JM, and KR critically revised the article and approved this version to be published.

# FUNDING

This project is part of the research program "Evidence-based adaptive brain training in seniors: Effects of brain structure and dopaminergic system on individual differences in trainability" funded by the Netherlands Initiative Brain and Cognition (NIHC), a part of the Netherlands Organization for Scientific Research (NWO) under grant number 056-12-010. This quick-results project is embedded in the pillar "Healthy cognitive aging."

#### ACKNOWLEDGMENTS

The authors would like to thank all participants and their relatives for participating in the study; all students for assisting in recruitment, testing and coaching of participants, and Dezzel media for making Braingymmer available for our study.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnhum.2017.00529/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Buitenweg, van de Ven, Prinssen, Murre and Ridderinkhof. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.