# MUSIC TRAINING, NEURAL PLASTICITY, AND EXECUTIVE FUNCTION

EDITED BY: Claude Alain, Assal Habibi and Paul J. Colombo

PUBLISHED IN: Frontiers in Neuroscience, Frontiers in Integrative Neuroscience and Frontiers in Psychology

#### Frontiers eBook Copyright Statement

The copyright in the text of individual articles in this eBook is the property of their respective authors or their respective institutions or funders. The copyright in graphics and images within each article may be subject to copyright of other parties. In both cases this is subject to a license granted to Frontiers. The compilation of articles constituting this eBook is the property of Frontiers.

Each article within this eBook, and the eBook itself, are published under the most recent version of the Creative Commons CC-BY licence. The version current at the date of publication of this eBook is CC-BY 4.0. If the CC-BY licence is updated, the licence granted by Frontiers is automatically updated to the new version.

When exercising any right under the CC-BY licence, Frontiers must be attributed as the original publisher of the article or eBook, as applicable.

Authors have the responsibility of ensuring that any graphics or other materials which are the property of others may be included in the CC-BY licence, but this should be checked before relying on the CC-BY licence to reproduce those materials. Any copyright notices relating to those materials must be complied with.

Copyright and source acknowledgement notices may not be removed and must be displayed in any copy, derivative work or partial copy which includes the elements in question.

All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For further information please read Frontiers' Conditions for Website Use and Copyright Statement, and the applicable CC-BY licence.

ISSN 1664-8714 ISBN 978-2-88966-047-6 DOI 10.3389/978-2-88966-047-6

#### About Frontiers

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

#### Frontiers Journal Series

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

#### Dedication to Quality

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

#### What are Frontiers Research Topics?

Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

# MUSIC TRAINING, NEURAL PLASTICITY, AND EXECUTIVE FUNCTION

Topic Editors:

Claude Alain, Rotman Research Institute (RRI), Canada

Assal Habibi, University of Southern California, Los Angeles, United States

Paul J. Colombo, Tulane University, United States

Citation: Alain, C., Habibi, A., Colombo, P. J., eds. (2020). Music Training, Neural Plasticity, and Executive Function. Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88966-047-6

# Table of Contents


*Electrical Neuroimaging of Music Processing in Pianists With and Without True Absolute Pitch*

Sélim Yahia Coll, Noémi Vuichoud, Didier Grandjean and Clara Eline James


Andréanne Sharp, Marie-Soleil Houde, Benoit-Antoine Bacon and François Champoux

*Dynamic Orchestration of Brains and Instruments During Free Guitar Improvisation*

Viktor Müller and Ulman Lindenberger


Francis A. M. Manno, Raul R. Cruces, Condon Lau and Fernando A. Barrios


Katri Annukka Saarikivi, Minna Huotilainen, Mari Tervaniemi and Vesa Putkinen

*New Perspectives on Music in Rehabilitation of Executive and Attention Functions*

Yuko Koshimori and Michael H. Thaut


## Editorial: Music Training, Neural Plasticity, and Executive Function

Paul J. Colombo<sup>1</sup>\*, Assal Habibi<sup>2</sup> and Claude Alain<sup>3</sup>

*<sup>1</sup> Department of Psychology, School of Science and Engineering, Tulane University, New Orleans, LA, United States,*

*<sup>2</sup> Brain and Creativity Institute, Department of Psychology, University of Southern California, Los Angeles, CA, United States, <sup>3</sup> Rotman Research Institute, Department of Psychology, University of Toronto, Toronto, ON, Canada*

Keywords: executive function, cognitive control, neural plasticity, music training, prefrontal cortex, neural oscillation, absolute pitch, longitudinal study

**Editorial on the Research Topic**

#### **Music Training, Neural Plasticity, and Executive Function**

Music training is rapidly emerging as an important model system for investigating experience-dependent brain plasticity, and also as the basis for robust sensory, motor, and cognitive therapeutic interventions. This special topic contains articles describing theoretical and methodological advancements that further our understanding of relationships between musicianship, music training and executive function, and many of the articles delve into the neural mechanisms that drive these relationships. It is well-established that music training is associated with sensory, motor, and cognitive benefits. As described in the articles in this collection, investigators are characterizing how elements of musical abilities (e.g., absolute pitch and relative pitch processors) and training (e.g., amateur vs. professional), are related to components of executive function including inhibitory control, working memory, and cognitive flexibility. In several reports, new analytical methods are being implemented to probe neural mechanisms of plasticity associated with music training and performance.

## MUSIC TRAINING INTERVENTIONS IMPROVE EXECUTIVE FUNCTIONS IN RELATION TO BRAIN PLASTICITY: CHILDHOOD DEVELOPMENT AND OLDER ADULTS

#### Edited and reviewed by:

*Elizabeth B. Torres, Rutgers, The State University of New Jersey, United States*

> \*Correspondence: *Paul J. Colombo pcolomb@tulane.edu*

Received: *02 June 2020* Accepted: *30 June 2020* Published: *07 August 2020*

#### Citation:

*Colombo PJ, Habibi A and Alain C (2020) Editorial: Music Training, Neural Plasticity, and Executive Function. Front. Integr. Neurosci. 14:41. doi: 10.3389/fnint.2020.00041*

Systematic manipulations of musical experience are the key to making causal statements about their effects on sensory, motor, and cognitive outcomes, and several investigators used this approach. For example, Dubinsky et al. report that short-term choir singing improves speech-in-noise perception and pitch discrimination among older adults with hearing loss, and that improvement is related to the strength of the frequency following response, a neural representation of auditory stimuli. In another study, older adults received different types of musical experience, including music listening, piano training, or percussion training. Both of the active music groups outperformed the listening group in bimanual synchronization and visual scanning/working memory, and piano training significantly improved motor synchronization skills in comparison with the percussion training or listening groups (Bugos). Frischen et al. also tested effects of different types of music training by assessing several measures of executive functions, then randomly assigning preschoolers to rhythm-based or pitch-based music training. Inhibition improved from pre- to post-test among children who received rhythm-based training, but not pitch-based training or a sports control, and similar numeric differences were found for measures of set shifting and visuospatial working memory. Taken together, these studies demonstrate that music training causes improvement of sensory, motor, and cognitive control processes among children and older adults, and that these improvements are not merely reflections of preexisting neurocognitive differences. Moreover, they begin to dissociate components of musical training that may lead to improvement of specific cognitive processes.

With regard to very early childhood development, Loewy and Jaschke identify how parameters of music such as timing, timbre, and repetition may influence cognitive development and neural plasticity in therapeutic interventions with neonates. The advancements noted above are bolstered by results of longitudinal studies of musically trained children in which working memory (Saarikivi et al.) and inhibitory control (Hennessy et al.) were assessed in relation to years of training. Musically trained children outperformed controls on the Trails A and B tests and on forward digit span, but not backward digit span, suggesting that music training may selectively influence working memory capacity and maintenance more than updating (Saarikivi et al.). Hennessy et al. report that children with 3 years of music training chose a larger, delayed reward in place of a smaller, immediate reward when compared to children without music training, indicating enhancement of delayed-gratification measures of inhibition. In the flanker task, children in the music group improved their performance accuracy in parallel with increasing years of training, while such improvements were not observed in the groups without music training. The groups were matched at the onset of the study to have no differences among them in cognitive capacities, providing evidence that systematic music-based training accelerates development of inhibitory control in children.

## COMPARISONS BETWEEN ADULT MUSICIANS AND NON-MUSICIANS REVEAL MECHANISMS OF ENHANCED EXECUTIVE FUNCTIONS

The articles cited above provide new evidence for behavioral and neural mechanisms of cognitive benefits caused by music training during early childhood development, and among older adults. A parallel approach, taken by several contributors, compared behavioral performance on control processing tasks and neural measures of activity among adult musicians and non-musicians (Manno et al.; Sharp et al.; Sharma et al.; Coll et al.; Criscuolo et al.). This approach yielded insights into the mechanisms by which musical experience may enhance cognitive processes. For example, musician advantages in emotion processing are related to greater use of temporal fine structure information (Manno et al.), and to better identification of complex emotional content in both the auditory and tactile sensory modalities (Sharp et al.). Sharma et al. presented non-musicians, and musicians with either relative or absolute pitch, with three different versions of an auditory Stroop task to measure conflict resolution. The pitch-label association ranged from simple semantic associations (i.e., "Low" or "High") to intermediate verbal encodings with no obvious semantic properties (i.e., "Doh" or "Soh") to more abstract semiotic associations (i.e., "C" and "G"). The neural activity indexing conflict detection for abstract pitch labels (i.e., musical notation) was present only in musicians with absolute pitch, consistent with a strong automaticity in retrieving the pitch-label association. Coll et al. also showed greater brain electrical source activity in left temporal junctions in musicians with absolute pitch, which could play a part in the automatic retrieval of pitch-label associations. In addition, Criscuolo et al. show that associations between musical experience and enhanced cognitive abilities, which are frequently reported among children, are also evident among adults after controlling for potential confounding variables including age, education, socio-economic status, and personality variables. Musicians show higher general intelligence, verbal intelligence, working memory, and attention than non-musicians, while amateur musicians score in between. It is notable that the reported correlations between years of musical playing and cognitive abilities support the hypothesis that musical practice is associated with intelligence and executive functions.

## NEURAL MECHANISMS OF SENSORY AND COGNITIVE PROCESSING AMONG MUSICIANS

In addition to innovative behavioral and cognitive assessments, several investigators reported neurophysiological effects of engagement in auditory and musical processes among musicians. In one striking example, Müller and Lindenberger developed a method to study intra- and inter-brain synchronization, or so-called extended hyper-brain networks, by measuring phase synchronization between transformed acoustic recordings of guitar signals and raw EEG signals of guitarists freely improvising in duet. Of importance, this form of time-frequency analysis incorporates the dynamic interaction of both musicians' production and responses to music. As reviewed by Yurgil et al., the study of neural oscillations is ideal for investigating neural processes occurring over durations of time spanning seconds to minutes, and particularly suitable for investigating components of executive function such as temporal stages of working memory. In addition to studies of neural oscillations in relation to musical experience and executive functions, event-related potentials yield more temporally discrete information about cortical functions. As an example, Matsuda et al. report a dissociation of auditory cortical potentials in relation to music training and absolute pitch, indicating distinct stages of pitch processing in addition to hemispheric specialization of auditory cortical functions. Salvari et al. examined the functional connectivity among regions of the auditory pathway in processing natural, musical, and artificial non-speech sounds, by means of MEG. The different categories of sounds produced differences in activation and interconnection in prefrontal areas, anterior-superior temporal gyrus, posterior cingulate cortex, and supramarginal gyrus, demonstrating an extended brain network for human-made musical and artificial sounds when compared to natural sounds. A summary of behavioral, neuroimaging, and neurophysiological studies in musicians, non-musicians, and clinical populations is taken up in the Perspective article by Koshimori and Thaut, in which they discuss the potential of music for stimulating executive function and attentional processing and for neurorehabilitation.

## SUMMARY

Taken together, the articles in this Research Topic represent significant advances in our understanding of relationships between musical training and control processes that comprise executive function. They illustrate the range of research questions that are being addressed by application of neuroscientific methods and technology to the universal cultural phenomenon of musical experience. They represent development of a model system for investigating long-term experience-dependent neuronal plasticity, and they bring into focus components of musical experience that may be targeted for specific therapeutic interventions aiming to benefit individuals across their lifespan.

## AUTHOR CONTRIBUTIONS

All authors drafted, revised, and approved the final version of the manuscript for submission.

## FUNDING

This work was supported by the Phyllis M. Taylor Center for Social Innovation and Design Thinking at Tulane University.

**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Colombo, Habibi and Alain. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Electrical Neuroimaging of Music Processing in Pianists With and Without True Absolute Pitch

Sélim Yahia Coll<sup>1</sup>, Noémi Vuichoud<sup>1</sup>, Didier Grandjean<sup>1</sup> and Clara Eline James<sup>1,2,3</sup>\*

<sup>1</sup> Neuroscience of Emotion and Affective Dynamics Laboratory, Faculty of Psychology and Educational Sciences and Swiss Centre for Affective Sciences, University of Geneva, Geneva, Switzerland, <sup>2</sup> School of Health Sciences Geneva, HES-SO University of Applied Sciences and Arts Western Switzerland, Geneva, Switzerland, <sup>3</sup> Geneva Neuroscience Center, University of Geneva, Geneva, Switzerland

True absolute pitch (AP), the labeling of pitches with semitone precision without a reference, is classically studied using isolated tones. However, AP is acquired and has its function within complex dynamic musical contexts. Here we examined event-related brain responses and underlying cerebral sources to endings of short expressive string quartets, investigating a homogeneous population of young highly trained pianists, half of them possessing true-AP. The pieces ended regularly or contained harmonic transgressions at closure that participants appraised. Given the millisecond precision of ERP analyses, this experimental plan allowed us to examine whether AP alters music processing at an early perceptual level, a later cognitive level, or both, and which cerebral sources underlie differences with non-AP musicians. We also investigated the impact of AP on general auditory cognition. Remarkably, harmonic transgression sensitivity did not differ between AP and non-AP participants, and differences for auditory cognition were only marginal. The key finding of this study is the involvement of a microstate peaking around 60 ms after musical closure, characterizing AP participants. Concurring sources were estimated in secondary auditory areas, comprising the planum temporale, with all transgression conditions collapsed. These results suggest that AP is not a panacea for becoming a proficient musician, but a rare perceptual feature.

Keywords: complex music, electrical source imaging, ERP microstate analysis, trained pianists, true absolute pitch

### INTRODUCTION

The present study applied electrical neuroimaging to investigate the influence of true absolute pitch on processing of complex classical music in pianists trained in the western classical repertoire. The research is a follow-up of 3 previous studies that showed progressive processing changes with musical training intensity and proficiency using the same musical stimuli (Oechslin et al., 2013; James et al., 2017; Jenni et al., 2017), and it addresses the question of whether true absolute pitch offers additional advantages for music processing.

Absolute pitch (AP) perception is the rare faculty to identify or produce instantaneously any musical tone or other sound according to the 12-tone equal temperament, in the strictest sense with semitone precision, without external reference (Miyazaki, 1988; Takeuchi and Hulse, 1993; Levitin and Rogers, 2005; Elmer et al., 2015). In AP possessors, this tone identification is performed automatically and cannot be suppressed (Akiva-Kabiri and Henik, 2012).

#### Edited by:

Claude Alain, Rotman Research Institute (RRI), Canada

#### Reviewed by:

Peter Schneider, Universität Heidelberg, Germany

Lutz Jäncke, University of Zurich, Switzerland

> \*Correspondence: Clara Eline James clara.james@hesge.ch

#### Specialty section:

This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience

Received: 06 December 2018 Accepted: 07 February 2019 Published: 13 March 2019

#### Citation:

Coll SY, Vuichoud N, Grandjean D and James CE (2019) Electrical Neuroimaging of Music Processing in Pianists With and Without True Absolute Pitch. Front. Neurosci. 13:142. doi: 10.3389/fnins.2019.00142

Levitin and Rogers (2005) propose an alternative, more lenient model of "latent" or quasi-absolute pitch (QAP). These authors showed that almost half of the non-musician Western population is able to sing pitches of familiar pop songs from memory with at least whole tone precision (Levitin, 1994). Although this finding is highly interesting in the context of musicality as a human universal, the current research focused exclusively on semitone precision AP with very high accuracy. This extreme or "true" form of AP, also called AP-1 (Baharloo et al., 1998, 2000) or "genuine" AP (Bachem, 1937, 1955), constitutes a distinct and truly exceptional capacity, with an "all-or-nothing" quality (Profita and Bidder, 1988) that should be distinguished from QAP. Moreover, true-AP could be anatomically dissociated from both Relative Pitch (RP) and QAP, with larger left-right planum temporale asymmetry in true-AP possessors (Wilson et al., 2009).
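To make the notion of labeling "according to the 12-tone equal temperament" concrete, the following minimal Python sketch maps a frequency to its nearest equal-tempered pitch class and measures how far a given label lies from a target in semitones. It is an illustration only and not part of the study's AP test: the A4 = 440 Hz reference, the sharp-only note names, and the function names are assumptions.

```python
import math

# Assumed conventions for illustration: A4 = 440 Hz, sharps only.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
A4_HZ = 440.0


def semitone_number(freq_hz):
    """Continuous MIDI-style pitch number (A4 = 69) in 12-tone equal temperament."""
    return 69.0 + 12.0 * math.log2(freq_hz / A4_HZ)


def pitch_class(freq_hz):
    """Index (0-11) of the nearest equal-tempered pitch class."""
    return round(semitone_number(freq_hz)) % 12


def pitch_label(freq_hz):
    """Name of the nearest equal-tempered pitch class, e.g. 'A' for 440 Hz."""
    return NOTE_NAMES[pitch_class(freq_hz)]


def label_error(response_label, target_hz):
    """Smallest circular distance, in semitones, between a label and the target."""
    d = abs(NOTE_NAMES.index(response_label) - pitch_class(target_hz)) % 12
    return min(d, 12 - d)


def is_correct(response_label, target_hz, max_error_semitones=0):
    """Exact-label scoring by default; a larger tolerance gives a looser criterion."""
    return label_error(response_label, target_hz) <= max_error_semitones


if __name__ == "__main__":
    print(pitch_label(440.0))              # nearest label for A4
    print(label_error("A#", 440.0))        # one semitone away from the target
    print(is_correct("A#", 440.0))         # fails the strict (exact-label) criterion
    print(is_correct("A#", 440.0, 1))      # passes a one-semitone tolerance
```

With the default tolerance of zero, only the exact label counts as correct, which corresponds to the strict criterion adopted in this study; the tolerance parameter is included only to show how a more lenient scoring scheme would differ.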

Except for tone-deaf amusics (Ayotte et al., 2002; Tillmann et al., 2015), the rest of the population rather relies on RP and processes musical idiom contextually, comparing tones with surrounding ones. However, according to recent observations (Wengenroth et al., 2014; Ziv and Radin, 2014), both AP and RP capacities can be considered as continuous distributions, with a distinctive independent position in both distributions for each individual. So, in the end, any normal individual possesses both AP and RP to some extent (Zatorre, 2003; Wengenroth et al., 2014; Ziv and Radin, 2014).

Whether possessing AP is an advantage, or even a panacea for becoming an exceptional musician, is a matter of debate. AP can represent an advantage for developing musical abilities by strengthening internal representations, facilitating musical memory, and aiding score reading, musical dictation or composing (Dooley and Deutsch, 2010). However, the faculty can also hamper proper functioning, as the automatic AP response may cause interference in the context of special tunings, ensemble playing or transpositions (Miyazaki and Rakowski, 2002; Zatorre, 2003; Wilson et al., 2009). According to Ziv and Radin (2014), AP may increase reaction times in the context of global processing of music as opposed to local processing.

AP only develops in those who begin musical tonal training early, before a critical age (Gregersen et al., 2001; Vitouch, 2003). As a consequence, classically reported percentages of AP in the general population (0.01%; Bachem, 1955; Ward, 1999) are probably underestimated, as no consensus exists on how to appraise latent AP in people lacking musical education. Finally, training is not sufficient to develop AP, which is also a function of genetic predisposition, often occurring in families (Baharloo et al., 2000; Zatorre, 2003). The estimated proportion is larger in highly trained Western musicians (up to 15% and over; Baharloo et al., 1998; Gregersen et al., 1999), and much larger in Chinese musicians speaking Mandarin, a tone language (up to 60% with semitone precision; Deutsch et al., 2006). These observations probably reflect the influence of intensive tonal experience from birth on AP acquisition. However, some influence of genetic factors cannot be excluded in Asian populations (Gregersen et al., 1999).

Some authors propose a 2-level model of AP functioning (Levitin, 1994; Zatorre, 2003; Levitin and Rogers, 2005). The first level involves the representation of pitches in long-term memory and is essentially perceptual in nature. The second level, principally cognitive, involves the categorization or labeling of pitch according to the 12-tone temperament. These 2 levels may rely on distinct cerebral networks (Elmer et al., 2015). The left planum temporale might foster long-term pitch representation, allowing incoming spectro-temporal patterns to be matched with a template (Ohnishi et al., 2001; Griffiths and Warren, 2002). Indeed, the degree of left planum temporale activity correlated significantly with absolute pitch ability (Ohnishi et al., 2001). The left posterior dorsolateral prefrontal cortex would then perform the pitch-verbal associations (Zatorre et al., 1998). The bridge between these networks may be provided by functional connectivity in the left hemisphere, shown by theta band synchronization, likely via the arcuate fasciculus (Elmer et al., 2015). Wengenroth et al. (2014), using a unique AP test excluding the intervention of RP during the AP task, instead found an association between AP and the volume of the right Heschl's gyrus and proposed that a right-hemispheric network mediates AP perception. However, this study also accepted whole tone precision (albeit with weighted scoring), and thus did not focus exclusively on true-AP. A recent publication explained observed differences concerning AP by the diversity of its definitions, and the lack of consensus concerning the tasks used to measure this faculty (Hou et al., 2016). In the current experiment we focused exclusively on true-AP possessors as compared to all other forms of pitch processing.

Most research on auditory processing in AP and non-AP (NAP) populations focused on the processing of isolated tones, chords or on oddball paradigms (McLachlan et al., 2013; Wengenroth et al., 2014; Ziv and Radin, 2014; Rogenmoser et al., 2015; Greber et al., 2018). However, it is within complex dynamic musical contexts that AP is acquired and has its function. In the current study, we examined event-related brain responses to expressive polyphonic musical stimuli, investigating a homogeneous population of young highly trained pianists, half of them possessing true-AP.

The musical stimuli were short expressive pieces for string quartet with 3 levels of harmonic transgression at closure that participants appraised, presented while high-density EEG was recorded. We previously used these stimuli successfully to show the impact of different levels of musical training intensity on brain and behavioral responses using fMRI (Oechslin et al., 2013) and EEG (James et al., 2017), also exclusively in pianists. Both the fMRI and the EEG study results disclosed progressive changes in music processing as a function of musical training intensity and proficiency. The present study used EEG neuroimaging to investigate whether differences in processing of these pieces would also occur as a function of absolute pitch possession within a homogeneous population of highly trained pianists. EEG neuroimaging allowed us to compare AP and NAP participants for Event Related Potentials (ERPs), functional ERP microstates and underlying brain sources (Michel and Murray, 2012), providing millisecond precision information on the time course of brain processing.

Microstates are stable voltage topographies over time, lasting tens to hundreds of milliseconds, reflecting time windows of coherent synchronized activation of large-scale neuronal networks that represent basic physiological units of cognition (Lehmann et al., 1987; Murray et al., 2008; Brunet et al., 2011; James et al., 2017). Microstate analysis or spatio-temporal ERP analysis, compared to the isolated amplitude analysis of an array of single electrode sites, has the advantage of conserving the major part of the variance in the data by extracting the microstates in a multivariate way including all electrodes, groups and conditions in one unified analysis (Brunet et al., 2011; Michel and Murray, 2012; James et al., 2017). Classical ERP analysis, studying an array of isolated ERPs, although most effective in differentiating experimental conditions and groups at specific electrode sites (Kutas and Federmeier, 2011; Koelsch et al., 2013), only explores a small part of the variance in the data and does not explore the spatial dimensions of high-density EEG (Michel and Murray, 2012). Especially in the context of intrinsically different populations like in the current study, studying all electrodes and their underlying sources provides a more comprehensive view of the functioning of the brain. Moreover, changes in the spatial configuration of the microstates indicate a change in the underlying cerebral sources (Vaughan, 1982; Michel and Murray, 2012), making them reliable precursors for EEG source imaging, thus allowing comparisons with other neuroimaging results. Although source localizations via EEG only allow centimeter precision, the combined temporal and spatial analyses may shed new light on the time course of cerebral processing. Rigorous statistical thresholds may ensure reliability of these computations.
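Two quantities commonly used in this literature to describe voltage topographies across all electrodes are the global field power (the spatial standard deviation of the average-referenced map) and the global map dissimilarity (the root-mean-square difference between two strength-normalized maps). The sketch below, in Python with NumPy, illustrates both on toy data. It is a generic illustration under stated assumptions and not the analysis pipeline of the present study; the function names and the example values are hypothetical.

```python
import numpy as np


def average_reference(maps):
    """Subtract the mean across electrodes (last axis) from each map."""
    maps = np.asarray(maps, dtype=float)
    return maps - maps.mean(axis=-1, keepdims=True)


def global_field_power(maps):
    """Spatial standard deviation across electrodes of the average-referenced map."""
    return average_reference(maps).std(axis=-1)


def global_dissimilarity(map_a, map_b):
    """Dissimilarity between two single topographies (1-D electrode vectors).

    Each map is average-referenced and divided by its global field power, so
    the result reflects the spatial configuration only: 0 for identical
    configurations, 2 for polarity-inverted ones.
    Note: a flat map (zero global field power) would cause a division by zero.
    """
    a = average_reference(map_a) / global_field_power(map_a)
    b = average_reference(map_b) / global_field_power(map_b)
    return float(np.sqrt(np.mean((a - b) ** 2)))


if __name__ == "__main__":
    # Toy 4-electrode maps (hypothetical values, arbitrary units).
    topo = np.array([1.0, -1.0, 2.0, -2.0])
    print(global_field_power(topo))
    print(global_dissimilarity(topo, 3.0 * topo))   # same configuration, stronger field
    print(global_dissimilarity(topo, -topo))        # inverted configuration
```

Because both maps are normalized by their field strength before comparison, a dissimilarity above zero points to a change in spatial configuration rather than in response strength, which is the property that links topographic changes to changes in the underlying sources.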

Some indications exist for possible cognitive advantages of AP outside the music domain, particularly for auditory span or short-term memory (STM; Deutsch and Dooley, 2013). In order to verify the possible influence of true-AP on general cognition, we also compared measures of auditory STM and working memory (WM) as well as the results of an auditory figure-ground perception ("hearing in noise") test between the groups.

This experimental plan allowed us to examine the following hypotheses.

We anticipated that AP musicians would outperform NAP musicians for harmonic transgression sensitivity, following their automatic and infallible pitch analysis (Hypothesis 1), and that specific ERP patterns and microstates would occur in response to the harmonic transgressions. As this has never been investigated before in this population in a dynamic ongoing musical context, we could not foresee the type of ERPs that might occur.

We expected both early perceptual differences in the ERPs and later, more cognitive differences (Hypothesis 2), reflecting the 2-level model of AP functioning (Elmer et al., 2015).

We predicted an influence of AP on general auditory cognition, with advantages for short-term and working memory, confirming the results of Deutsch and Dooley (2013), and possibly disadvantages for hearing in noise, as AP musicians automatically process all pitches (Hypothesis 3).

Finally, we expected to find distinct brain sources in AP musicians, possibly comprising the left planum temporale early in time and the prefrontal cortex later on (Hypothesis 4; Ohnishi et al., 2001).

#### MATERIALS AND METHODS

#### Participants

Twenty-four right-handed pianists agreed to participate in this study on a voluntary basis. They gave written informed consent and received financial compensation after participation. The pianists were recruited from the Conservatoires of Geneva, Lausanne, and Neuchâtel. Among them, 12 possessed true-AP (3 men; M = 24.18 years, SD = 4.91) and 12 did not (6 men; M = 24 years, SD = 4.81; see the Results section "Absolute Pitch Test" below for mean scores and confidence intervals of both groups). One AP participant was eliminated due to EEG artifacts. Final analyses were made on 11 true-AP and 12 NAP participants. Most of them also practiced other instruments or sang, 9 in the AP group and 7 in the NAP group. Secondary practices concerned mainly wind instruments and singing, but not string instruments. As tested by Student's independent-samples t-tests and chi-squared tests for independence (**Table 1**), true-AP and NAP participants did not statistically differ for age, gender, socio-economic status, age at onset of musical practice, total number of years of piano training, instrumental level on the piano, practice of secondary instruments and years of practice on the secondary instruments.

Concerning the instrumental level (see **Table 1**), 13 pianists (5 AP, 8 NAP) had obtained a certificate from the amateur sections of the above-mentioned Conservatoires, thus reaching the highest amateur level. The certificate implies up to 14 years of piano instruction, public final exams, 9 years of music theory concluded by examinations, and 3 years of supplementary music instruction such as composition or chamber music. The 8 pre-professionals (4 AP and 4 NAP) had also obtained the certificate, had been admitted to the preparatory classes of the Conservatoires, and were expected to enter the professional section within at most 3 years. Finally, 2 AP participants had already entered the professional section of one of the Conservatoires. No significant difference existed between the instrumental levels of AP and NAP participants [see **Table 1**; χ<sup>2</sup>(2) = 4.23, P = 0.16].

In the current experiment, the musical proficiency of the participants was confirmed by their elevated d-prime scores for detecting both apparent and subtle transgressions in the musical test (see section "Musical Test").

All participants reported normal hearing and presented no history of neurological illnesses. The experiment was conducted at the Brain and Behavior Laboratory in the "Centre Médical Universitaire" in Geneva and approved by the local ethics committee.

#### Materials and Trial Sequence

Participants completed 6 tasks, described below in order of execution.

#### TABLE 1 | Biographical data.


Listed are the means of the variables of both groups and the p-values (t-tests for the quantitative variables and chi-square test for independence for the qualitative variables). SDs are reported in brackets.

#### Edinburgh Handedness Inventory

Developed by Oldfield (1971), this questionnaire assessed the laterality of each participant, providing a score between −100 (100% left-handed) and 100 (100% right-handed). The inventory comprises questions concerning hand preference for 10 tasks, such as writing or drawing.

#### Absolute Pitch Test

Created by Oechslin et al. (2010), this test is composed of 36 piano tones (A = 440 Hz), each repeated 3 times. Sine tones are reported to be harder to identify than piano tones (Baharloo et al., 1998). However, Baharloo et al. found very similar scores in true-AP people for pure tones vs. piano tones. This can be explained by the "all-or-nothing" quality of pitch labeling in true-AP possessors (Profita and Bidder, 1988), for whom the timbre of the stimuli does not affect pitch perception. Moreover, we consider piano tones to possess higher ecological validity (Elmer et al., 2013) and therefore to be more appropriate in this study on music processing. The resulting 108 tones were presented in pseudo-random order, with 18 Klipsch Image S4 earphones. The tones were situated in the third, fourth and fifth piano octaves, ranging from C3 to B5 (H5 in German note naming). All 12 semitones of each octave were presented. Tones lasted 1 s and were separated by a 4 s inter-stimulus interval of brown noise. Tones and brown noise were generated using the "Cool Edit Pro" software (http://cooledit-pro.soft32.com/). After each tone, participants immediately wrote down its exact label (with semitone precision) during the following 4 s window of brown noise. Each correct answer counted for 1 point, allowing a maximum score of 108 points.
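The structure of this test (3 octaves × 12 semitones = 36 tones, each repeated 3 times for 108 trials, 1 point per correct label) can be illustrated with a minimal sketch. It assumes scientific pitch notation with A4 = 440 Hz and equal temperament, and it uses a plain seeded shuffle as a stand-in for the pseudo-randomization, whose exact constraints are not described here. All function names are hypothetical.

```python
import random

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def build_tones(first_octave=3, last_octave=5, a4_hz=440.0):
    """Return (label, frequency in Hz) for every semitone from C3 to B5."""
    tones = []
    for octave in range(first_octave, last_octave + 1):
        for index, name in enumerate(NOTE_NAMES):
            midi = 12 * (octave + 1) + index  # MIDI convention: A4 = 69
            freq = a4_hz * 2 ** ((midi - 69) / 12)  # equal temperament (assumed)
            tones.append((f"{name}{octave}", freq))
    return tones


def build_trials(tones, repetitions=3, seed=0):
    """Repeat every tone and shuffle; a stand-in for the pseudo-random order."""
    trials = list(tones) * repetitions
    random.Random(seed).shuffle(trials)
    return trials


def score(responses, trials):
    """One point per label that matches the presented tone exactly."""
    return sum(resp == label for resp, (label, _) in zip(responses, trials))


TONES = build_tones()
TRIALS = build_trials(TONES)
```

A participant who labels every tone correctly obtains `score(...) == 108`, the maximum stated above.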

We first recruited participants based on self-reports of semitone-precision AP or its absence. For the final composition of the groups, after testing for AP, we retained the NAP group participants only if their score respected an upper limit of 65% correct answers for the 95% CI (Confidence Interval). This threshold was chosen because it corresponds to the level that adult individuals without AP may temporarily reach in semitone-precision tone identification, or when learning to identify one single tone, following intensive training (Brady, 1970; Ward, 1999; Van Hedger et al., 2015). For the true-AP group participants, the lower limit of the 95% CI was set to 90% (Baharloo et al., 1998). Based on previous studies, AP participants who scored more than 90% with semitone precision on an AP test appeared to be a special group (Miyazaki, 1988; Baharloo et al., 1998; Itoh et al., 2005; Schulze et al., 2009). The other AP possessors, called "mid AP" (Itoh et al., 2005), had scores around 60% on an AP test and never attained more than 90% correct answers. Therefore, with the lower limit of the 95% CI fixed at 90% correct answers for our AP group, we can be confident that it represents true-AP possessors (for the sake of simplicity, referred to as AP in the remainder of the text).
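As an illustration of how such CI-based inclusion limits could be applied, the sketch below computes a 95% interval for a proportion of correct answers and compares its bounds with the 90% and 65% limits. Two points are assumptions and not taken from the text: the interval is computed per participant, and the Wilson score interval is used (the CI method is not specified). The function names are hypothetical.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (assumed CI method)."""
    p = correct / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def classify(correct, total=108):
    """Apply the 90% lower limit (AP) and the 65% upper limit (NAP)."""
    lower, upper = wilson_interval(correct, total)
    if lower >= 0.90:
        return "AP"
    if upper <= 0.65:
        return "NAP"
    return "unclassified"
```

Under these assumptions, a score of 108/108 is classified as AP, 20/108 as NAP, and 75/108 falls between the two limits.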

#### Musical Test

While recording high-density EEG (electroencephalogram), 90 different pieces for string quartet (2 violins, 1 viola and 1 cello), with a 10 s mean duration, were presented via EEG-compatible earphones (Etymotic Research, Elk Grove Village, IL, USA) to all participants. We used string quartets in order to cancel out the "own instrument effect", which consists of enhanced cortical responses to the timbre of a well-mastered instrument (Pantev et al., 2001; Margulis et al., 2009). A professional composer (see acknowledgments) developed the musical pieces specifically for our experiments, equally distributed over all 24 minor and major tonalities. The "Sibelius" (http://www.sibelius.com/) and "Logic Pro" (http://www.apple.com/logicpro/) software served to create the compositions; natural instrument timbres were implemented with the "Garritan Personal Orchestra" plug-in (http://www.garritan.com/). Musical styles of the pieces ranged from the Baroque to the Romantic period. Each of the 90 original pieces was presented with 3 distinct versions of its ending (see **Figure 1**), resulting in 270 stimuli. These 3 versions represent the 3 levels of our experimental condition "Transgression": (1) tonic chord (I; regular ending, no transgression = R), (2) first inversion of the tonic chord (I<sup>6</sup>; subtle transgression = Tsub) and (3) first inversion of the sub-dominant chord (IV<sup>6</sup>; apparent transgression = Tapp). Participants judged whether the closure of each piece was appropriate, using a left mouse click if this was the case and a right mouse click otherwise. The stimuli were presented along with a fixation cross. All terminal chords of the pieces were cut off at 1,400 ms after onset and faded linearly over the last 150 ms (using Adobe Audition 3.0, Adobe Systems Corp., California, USA). The response screen appeared 1,900 ms after the onset of the terminal chord (stimulus onset) and 500 ms after the disappearance of sound, to avoid EEG contamination by the motor response. The order of presentation was fixed, but there were 2 versions of the task: a direct order with pseudo-randomized presentation, and the reverse of that order (to control for possible order effects). Half of the participants of each pitch group completed the direct order and half the reverse order. The task consisted of 5 blocks of 54 pieces, separated by breaks in order to prevent fatigue.
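The truncation of the terminal chord (cut at 1,400 ms, linear fade over the last 150 ms) can be sketched as follows. This is only an illustration on a plain list of samples; the stimuli themselves were edited in Adobe Audition, and the function name, the sample-rate argument and the exact shape of the gain ramp are assumptions.

```python
def cut_and_fade(samples, sample_rate, cut_ms=1400, fade_ms=150):
    """Keep the first cut_ms of a chord and fade the last fade_ms linearly to 0."""
    n_keep = int(sample_rate * cut_ms / 1000)
    n_fade = int(sample_rate * fade_ms / 1000)
    out = list(samples[:n_keep])
    if n_fade <= 0:
        return out
    fade_start = max(len(out) - n_fade, 0)
    for i in range(fade_start, len(out)):
        # Gain decreases linearly and reaches 0 on the final sample.
        gain = (len(out) - 1 - i) / n_fade
        out[i] *= gain
    return out
```

With a 1,000 Hz sample rate, a constant signal of 2,000 samples is shortened to 1,400 samples, of which the first 1,250 are unchanged and the last one is 0.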

The harmonic transgressions used here involve in-key chords that would make perfect sense if the musical stimuli continued, and therefore clearly involve cognitive processing. They represent refined grammatical errors in the musical idiom and are by no means dissonances or out-of-key elements, which would principally involve perceptual processing. The 2 chords used for transgression are just as consonant as a tonic chord (or as dissonant, consonance and dissonance being 2 extremes of a continuum), all being composed of thirds, sixths, fifths and fourths.

For more comprehensive information on the stimuli and more musical examples we refer to 3 previous publications (Oechslin et al., 2013; James et al., 2017; Jenni et al., 2017).

#### Auditory Selective Attention Test

We created this test based on the figure-ground principle. The participants' task consisted of comparing 2 series of sounds ("melodies") lasting 500 ms, separated by 1 s of brown noise, and reporting whether they were identical by means of a left ("identical") or right ("different") mouse click. The series of sounds, the "figures," consisted of 7 tones, drawn at random (sampling without replacement) from the chosen range: the 12 semitones of the fourth octave of the piano. Together with each tone of the figure, a cluster composed of all 12 semitones was played simultaneously (the "ground"), with slightly less amplitude than the figure tones. Therefore, the participants had to compare the series in the presence of background noise, which demands auditory selective attention, even more so as the series were not arranged within a tonality, but randomly drawn. This random arrangement limits the grouping of several notes, hampering memorization of the series. In total, the test comprised 50 different series of 2 times 7 tones, each of which appeared in an "identical" and a "different" version, resulting in 100 stimuli. In a "different" version, 1 tone among the 7 differed between the first and the second series, being a semitone higher (23 trials) or lower (27 trials). The modification was applied to one of the middle 5 tones, excluding the first and last tones of the series (to avoid facilitating serial position effects). The series of piano-like tones were generated with the "Cool Edit Pro" software (http://cool-edit-pro.soft32.com). Participants received 1 point for each correct answer, for a total of 100 points. A perfect score can be achieved by means of relative pitch alone, as the participant has to compare interval series.
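The construction of a figure and of its "different" version can be sketched as below. This is a hypothetical illustration, not the stimulus-generation procedure actually used: semitones are coded 0 to 11 within the octave, and the sketch does not handle the cases where the shifted tone leaves the octave or coincides with another tone of the series, since the text does not say how these were treated.

```python
import random

SEMITONES = list(range(12))  # 0 = C ... 11 = B within one octave (assumed coding)


def make_figure(rng, length=7):
    """Draw 7 distinct semitones at random (sampling without replacement)."""
    return rng.sample(SEMITONES, length)


def make_different(figure, rng):
    """Shift one of the middle tones (not the first or last) by one semitone."""
    position = rng.randrange(1, len(figure) - 1)
    step = rng.choice([-1, 1])  # a semitone lower or higher
    changed = list(figure)
    changed[position] = figure[position] + step
    return changed, position


rng = random.Random(1)
figure = make_figure(rng)
different, changed_position = make_different(figure, rng)
```

The pair `(figure, figure)` then serves as an "identical" trial and `(figure, different)` as a "different" trial.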

#### WAIS-IV Digit Span

In order to test auditory short-term memory and working memory, participants completed the "digit span" subtest of the Wechsler Adult Intelligence Scale, 4th edition (Wechsler, 2011), in direct and reverse order. During these tests, participants listened to spoken series of digits of increasing length (from 2 to 8 digits), and their task consisted of repeating them orally, first in direct order (testing primarily auditory short-term memory), then in reverse order (testing primarily auditory working memory). Two series of digits (1 for each task), progressively increasing by level of difficulty (length), were presented. Spoken digits were recorded beforehand at a rate of 1 digit per second to guarantee identical conditions for all participants. Participants completed first the direct order, then the reverse order task. They proceeded through the trials until they made 2 successive mistakes at the same level of difficulty. Each correct answer was given 1 point, for a total of 16 points per task.

#### Loudness/Intensity of the Auditory Stimuli

We normalized all auditory stimuli to an identical amplitude envelope (−5 dB) with the "Cool Edit Pro" software (http://cool-edit-pro.soft32.com). Participants determined the optimal loudness individually during stimulus presentation; therefore, we do not report intensity/dB values for the presentation of the stimuli. As the size and shape of the ear cup and canal vary strongly between individuals, inserting the earphones more or less deeply in the ear canal affects the distance to the eardrum and thus influences perceived loudness. As the participants were all trained musicians, they are used to optimizing their hearing with earphones.

#### EEG Acquisition and Raw Data Processing

Participants were comfortably seated in the EEG room in front of a computer screen at a distance of approximately 1 m from the eyes. Continuous EEG was acquired using the "Biosemi" system (Amsterdam, The Netherlands) comprising an ActiveTwo amplifier system AD-Box with 64 active Ag/AgCl electrodes, sampled at 1,024 Hz with a bandwidth of 0–268 Hz. The average of all electrodes was used as an online and offline reference. We simultaneously recorded the electro-oculogram (EOG), using the voltage difference between 2 horizontal (HEOG) electrodes fixed at the outer canthi of both eyes to detect horizontal eye movements. Two additional electrodes were placed above and under the right eye to measure the voltage difference (VEOG) caused by blinks.

Raw data pre-processing was performed using the "BrainVision Analyzer 2.0" software (Brain Products GmbH., Munich, Germany). Offline, data were band-pass filtered between 0.25 and 30 Hz with a zero-phase Butterworth filter. An additional 50 Hz notch filter removed power-line noise. Given that the low-pass filter was set to 30 Hz, data were then down-sampled to 256 Hz to reduce data size and thus processing time. Artifact rejection was applied to exclude voltages above 100 µV and under −100 µV. Independent component analysis was used to remove ocular artifacts and reduce deformation at the edge of the epochs. Finally, a baseline correction from −200 ms to stimulus onset was performed. On average, 8.82% (SD = 11.38%) of the trials per experimental condition were removed. Data of one AP participant were excluded due to excessive artifacts.
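Two of the steps above, amplitude-based artifact rejection (±100 µV) and baseline correction over the 200 ms preceding stimulus onset, are simple enough to sketch in a few lines. The sketch below is illustrative only: the processing itself was done in BrainVision Analyzer, filtering and independent component analysis are omitted, and the data layout (one list of samples per channel, epochs starting 200 ms before onset) and function names are assumptions.

```python
def is_clean(epoch, limit_uv=100.0):
    """True if no sample on any channel exceeds +/- limit_uv microvolts."""
    return all(abs(v) <= limit_uv for channel in epoch for v in channel)


def baseline_correct(epoch, n_baseline):
    """Subtract each channel's mean over the first n_baseline samples."""
    corrected = []
    for channel in epoch:
        mean = sum(channel[:n_baseline]) / n_baseline
        corrected.append([v - mean for v in channel])
    return corrected


def preprocess(epochs, sample_rate=256, baseline_ms=200):
    """Reject epochs with extreme voltages, then baseline-correct the rest."""
    n_baseline = round(sample_rate * baseline_ms / 1000)  # 51 samples at 256 Hz
    kept = [epoch for epoch in epochs if is_clean(epoch)]
    return [baseline_correct(epoch, n_baseline) for epoch in kept]
```

Each epoch must contain at least `n_baseline` samples per channel for the baseline mean to be meaningful.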

Grand-Averages for the AP group were based on 80.00 ± 13.97 SD epochs for R, 81.00 ± 12.55 SD epochs for Tsub, and 77.73 ± 15.82 SD epochs for Tapp. For the NAP group, 84.42 ± 5.23 SD epochs were kept for the Grand-Averages for R, 84.25 ± 3.49 SD epochs for Tsub, and 84.33 ± 4.42 SD epochs for Tapp.

Noisy channels were interpolated, by means of a 3-D spherical spline, for on average 0.34% (SD = 1.05%) of the electrodes.

## ERP Analyses

The evoked potentials were analyzed in 3 stages described in detail below.

#### Classical Waveform Analysis

Our main interest lies in the microstate and source analyses that provide more comprehensive information, as they comprise all electrodes. Therefore, as multiple testing over our period of analysis (0–1,000 ms) only showed marginal differences for individual electrodes between AP and NAP for all 3 transgression conditions, we limited our waveform statistics to an analysis of Global Field Power (GFP), all 3 transgression conditions collapsed. GFP provides one single, always positive, reference free value representing the neural response strength throughout the brain, incorporating absolute voltages of all electrodes (Murray et al., 2008). This choice was made because during a very early time period, a generalized effect of pitch group manifested at almost all electrodes in all conditions (**Figure 2**).
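GFP at a single time point is commonly computed as the spatial standard deviation of the voltages across all electrodes (Murray et al., 2008). A minimal sketch of that definition follows; the function name and the plain-list input are assumptions made for illustration.

```python
import math


def gfp(voltages):
    """Global Field Power of one scalp map: spatial SD across electrodes."""
    mean = sum(voltages) / len(voltages)  # re-reference to the average
    return math.sqrt(sum((v - mean) ** 2 for v in voltages) / len(voltages))
```

Because the mean across electrodes is subtracted first, adding a constant to every electrode leaves the value unchanged, which is what makes the measure reference free: `gfp([1, -1, 1, -1])` and `gfp([3, 1, 3, 1])` both equal 1.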

We did not compute difference waves because they do not allow the computation of microstates and source estimations. Moreover, as only marginal differences occurred between AP and NAP for all 3 transgression conditions, we collapsed all conditions to study the above-mentioned early generalized brain response.

#### Microstate Analysis

Evoked potentials and spontaneous EEG unfold over time as a series of scalp voltage topographies that remain relatively stable for tens to hundreds of milliseconds and that represent microstates of information processing, or "atoms of thought" (Pascual-Marqui et al., 1995; Murray et al., 2008; Brunet et al., 2011).

Microstate analysis or spatio-temporal ERP analysis consists of 2 stages. During the first stage, a k-means cluster analysis applied to the grand-means of both conditions and groups defined the most dominant scalp topographies or microstates (Murray et al., 2008). In order to study robust effects, we chose to group microstates sharing over 92% of variance and lasting more than 20 ms. To define the optimal number of microstates we relied on the cross-validation index (Pascual-Marqui et al., 1995), which minimizes the residual variance, and on a modified "Krzanowski–Lai criterion," a quality measure for clustering (Michel et al., 2004; Tibshirani and Walther, 2005; Murray et al., 2008; Michel and Brandeis, 2009; Brunet et al., 2011). The resulting microstate series represent an a priori hypothesis, to be statistically verified in the second stage. This second stage consists in "fitting" the microstates back over time across all individual subjects, by means of a spatial correlation analysis, resulting in a parameter of presence, Duration (in ms) and of Global Explained Variance (GEV; in %) of each microstate for a period of interest.

Statistical testing of these parameters then verifies differential microstate presence over time comparing groups or conditions.
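The fitting stage can be illustrated with a minimal sketch: each time frame of an individual ERP is assigned to the template map with which it correlates best spatially, and Duration and GEV are then accumulated per microstate. The sketch makes several assumptions that are not taken from the text: a winner-take-all assignment based on the signed spatial correlation (polarity is respected), no temporal smoothing, and the commonly used definition of GEV as the GFP-weighted squared correlation normalized by total GFP power. It is not the Cartool implementation, and it expects at least one frame with non-zero GFP.

```python
import math


def _centered(scalp_map):
    mean = sum(scalp_map) / len(scalp_map)
    return [v - mean for v in scalp_map]


def gfp(scalp_map):
    c = _centered(scalp_map)
    return math.sqrt(sum(v * v for v in c) / len(c))


def spatial_corr(u, v):
    """Pearson correlation between two average-referenced scalp maps."""
    u, v = _centered(u), _centered(v)
    den = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / den if den else 0.0


def fit_microstates(frames, templates, sample_rate=256):
    """Return Duration (ms) and GEV (fraction) per template for one subject."""
    ms_per_frame = 1000 / sample_rate
    total_power = sum(gfp(f) ** 2 for f in frames)
    result = {k: {"duration_ms": 0.0, "gev": 0.0} for k in range(len(templates))}
    for frame in frames:
        corrs = [spatial_corr(frame, t) for t in templates]
        best = max(range(len(templates)), key=lambda k: corrs[k])
        result[best]["duration_ms"] += ms_per_frame
        result[best]["gev"] += (gfp(frame) * corrs[best]) ** 2 / total_power
    return result
```

For example, with two orthogonal templates and three frames of which two match the first template, the first microstate receives a Duration of two frames (7.8125 ms at 256 Hz) and most of the explained variance.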

We started by comparing microstates between the 2 groups (AP vs. NAP) for all 3 conditions of musical transgression (R, Tsub, and Tapp). One single precocious distinct microstate occurred in all 3 conditions principally in the AP group. No other statistically significant differences occurred as a function of one of the transgression conditions over the further time period of analysis (up to 1,000 ms after stimulus onset). Therefore, we focused on this early microstate dissociating AP and NAP groups, averaging the data over all 3 transgression conditions, and recomputing a microstate analysis only on the Grand-Average ERPs of the 2 groups (R, Tsub, and Tapp collapsed). As the aim of this experiment was to study ERP differences as a function of pitch group, we will only focus on the latter analysis in the present paper.

The microstate analysis was performed with the Cartool software, developed by Denis Brunet (http://brainmapping.unige.ch/cartool).

#### Source Analysis

Unlike the direct solution that consists in calculating voltage configurations on the scalp from known underlying sources, the inverse problem of computing the underlying hypothetical sources from the measured voltages on the scalp is not unique or rather "ill-posed." To deal with this "ill-posed" inverse problem with surface EEG, distributed source analysis should be based on well-defined a priori constraints (Michel et al., 2004; Grech et al., 2008). Our a priori constraints derive from the microstate analysis that allowed determining a period during which distinct microstates appeared in both groups. Differences in scalp topographies over time indicate a change in the underlying generators (Lehmann et al., 1987; Murray et al., 2008), which relates to changes of the brain's functional state (Michel et al., 1999).

As our main interest lies in exploring differences between the AP and NAP group for auditory processing of music, we focused on ERP sources regarding the primary outcome of the microstate analysis, i.e., the early time period during which 2 spatially different microstates appeared in each group. These microstates appeared during the first 100 ms of processing of musical closure, in all experimental conditions. Microstates reflect stable scalp topographies and underlying brain networks, and maps' stability is maximum during the global field power ("GFP") peak (Koenig et al., 2014). In addition, the signal-to-noise ratio increases with the strength of the GFP (Lehmann et al., 2005). Therefore, we limited the time window of analysis around the GFP peak of the microstate characterizing the AP group (40–80 ms), the NAP group showing no marked GFP peak. This approach of centering source analyses on the GFP peak has been successfully used in different settings (Plomp et al., 2013; James et al., 2015).

We estimated the intracranial distribution (microampere per cubic millimeter, µA/mm<sup>3</sup>) of the averaged ERP for each subject across all experimental conditions over the 40–80 ms period of interest, using a linear inverse solution based on the depth-weighted minimum norm ("WMN"; Hamalainen and Ilmoniemi, 1994; Michel et al., 2004). The WMN algorithm compensates for the tendency of standard minimum norm algorithms to favor weak and superficial sources by using a weighting matrix (Grech et al., 2008). In combination with a statistical parametric mapping (SPM) approach, consisting of a statistical comparison of the underlying sources between groups, a considerable amount of noise is eliminated. In addition, the SPM method avoids the domination of source maxima that are identical in both groups (Michel et al., 2004). The intracranial distribution was calculated within a discrete 3-D grid of 3,005 solution points, evenly distributed through the volume of gray matter of the Montreal Neurological Institute ("MNI") standard brain. By means of a homogeneous transformation that adapted the volume to the most appropriate sphere ("SMAC model"; Spinelli et al., 2000), a 3-shell spherical model was applied to calculate the lead field for the 64 electrodes and the inverse solution based on the constraints of the WMN. Then, to eliminate noise and to compare the groups, we applied the SPM model using 2-sided t-tests at each node, by means of the "Statistical Toolbox for Electrical Neuroimaging" (STEN; https://doi.org/10.5281/zenodo.1164037; courtesy of Jean-François Knebel). Results of the SPM consist of stronger current density in certain brain voxels for either group (positive vs. negative t-values).

In order to cope with multiple comparisons, we used conservative statistical thresholds and only considered as significant globular clusters of at least 20 contiguous nodes, each with a p-value of ≤0.01 (Knebel et al., 2011; De Meo et al., 2015). The spatial criterion of 20 nodes was determined through the "AlphaSim" software (http://afni.nimh.nih.gov) with a 10 mm FWHM smoothing and 10 mm radius of connected poles, together with the node p-value of <0.005 that we applied. Following a bootstrap procedure (10,000 Monte Carlo iterations), a 21-node cluster appeared, with a 0.001 cluster-level probability and a corresponding p-value of 0.00001 at the node level. The spatial accuracy of these source estimations has been thoroughly investigated in the past, showing that these approaches provide results similar to those of fMRI studies, only with a lower precision, in the range of 1–2 cm (Martuzzi et al., 2009; Plomp et al., 2010; Birot et al., 2014). The WMN source analysis was performed with the Cartool software (brainmapping.unige.ch/cartool).
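The spatial extent criterion (clusters of at least 20 contiguous nodes, each meeting the node-level p-value threshold) amounts to a connected-component search on the solution grid. The sketch below illustrates the idea on integer grid coordinates; it is not the STEN or AlphaSim implementation, and the definition of contiguity as face-sharing neighbors on a regular grid is an assumption.

```python
from collections import deque

NEIGHBOR_OFFSETS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def significant_clusters(p_values, p_max=0.01, min_size=20):
    """Find clusters of contiguous nodes with p <= p_max and size >= min_size.

    p_values maps (x, y, z) grid coordinates to node-level p-values.
    """
    candidates = {node for node, p in p_values.items() if p <= p_max}
    seen, clusters = set(), []
    for start in sorted(candidates):
        if start in seen:
            continue
        seen.add(start)
        queue, cluster = deque([start]), []
        while queue:  # breadth-first search over neighboring candidates
            x, y, z = queue.popleft()
            cluster.append((x, y, z))
            for dx, dy, dz in NEIGHBOR_OFFSETS:
                neighbor = (x + dx, y + dy, z + dz)
                if neighbor in candidates and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        if len(cluster) >= min_size:
            clusters.append(sorted(cluster))
    return clusters
```

Isolated nodes or small groups of nodes that pass the node-level threshold are discarded, which is how the extent criterion limits false positives.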

A common misunderstanding in interpreting these analysis techniques is that the source computations are applied to the microstates. The microstate analysis is a separate analysis stage, which also serves to determine the appropriate time periods for the source analysis; the latter is then applied to the ERPs. The ERPs are genuine neurophysiological data, whereas the microstate data are the product of a statistical cluster analysis.

## RESULTS

For the sake of brevity and transparency, statistical results can be found mainly in the tables, and as little as possible in the text. Note that in order to compute effect sizes for our generalized linear mixed model (GLMM; McCulloch, 2003; Bolker et al., 2009) analyses, we used the method of Nakagawa and Schielzeth (2013) as implemented in the "MuMIn" R package. Their method is based on 2 indicators: a marginal (R<sup>2</sup><sub>m</sub>) and a conditional (R<sup>2</sup><sub>c</sub>) R<sup>2</sup>. R<sup>2</sup><sub>m</sub> is the variance explained by the fixed factors, whereas R<sup>2</sup><sub>c</sub> is the variance explained by the entire model (both fixed and random effects).
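The two indicators can be written as ratios of variance components. The sketch below assumes the components have already been estimated from a fitted model: the variance of the fixed-effect predictions, the summed random-effect variances, and the residual variance (which, for a binomial model with logit link, includes the distribution-specific term π²/3 in Nakagawa and Schielzeth's formulation). The function name is hypothetical and the code does not reproduce the MuMIn implementation.

```python
def r2_glmm(var_fixed, var_random, var_residual):
    """Return (marginal R2, conditional R2) from variance components."""
    total = var_fixed + var_random + var_residual
    r2_marginal = var_fixed / total  # fixed effects only
    r2_conditional = (var_fixed + var_random) / total  # fixed + random effects
    return r2_marginal, r2_conditional
```

For instance, `r2_glmm(2.0, 1.0, 1.0)` returns `(0.5, 0.75)`: half of the variance is attributed to the fixed factor, and three quarters to the model as a whole.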

#### Behavioral Results

#### Edinburgh Handedness Inventory

The Edinburgh Handedness Inventory confirmed that all participants were right-handed. Moreover, as verified by a Student independent sample t-test comparing the 2 groups (AP vs. NAP) there was no significant difference between the AP (M = 79.43; SD = 23.45) and NAP (M = 86.75, SD = 10.80) group concerning handedness level [t(21) = 0.98, P = 0.34, d = 0.40].
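The reported t and d values can be approximately recovered from the group means, SDs and sample sizes (11 AP and 12 NAP participants). The sketch below assumes a pooled-variance Student t-test and Cohen's d based on the pooled SD; small deviations from the reported values are expected because the published summary statistics are rounded.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Absolute standardized mean difference using the pooled SD."""
    return abs(m1 - m2) / pooled_sd(sd1, n1, sd2, n2)


def student_t(m1, sd1, n1, m2, sd2, n2):
    """Absolute t statistic of the pooled-variance independent-samples test."""
    se = pooled_sd(sd1, n1, sd2, n2) * math.sqrt(1 / n1 + 1 / n2)
    return abs(m1 - m2) / se


# Handedness scores: AP (n = 11) vs. NAP (n = 12)
t_value = student_t(79.43, 23.45, 11, 86.75, 10.80, 12)
d_value = cohens_d(79.43, 23.45, 11, 86.75, 10.80, 12)
```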

#### Absolute Pitch Test

Concerning the absolute pitch test, we applied a GLMM specifying a binomial model with Group (AP vs. NAP) as fixed factor, and participants' performance as random factor. These analyses demonstrated that the AP group was significantly more accurate than the NAP group (see **Figure 3**, **Table 2**). The test yielded a large effect size [R<sup>2</sup><sub>m</sub> = 0.69; (Cohen, 1988; Calin-Jageman, 2018)]. The NAP group showed a less homogeneous performance than the AP group. Indeed, 9 NAP participants scored <20% on average and 3 between 30 and 60%. As we intended to contrast true-AP possessors with all other types of pitch perception, the inclusion of 3 quasi-absolute pitch (QAP) possessors seems legitimate (Wilson et al., 2009).

#### Musical Test

In order to investigate differences in performance at the musical test between AP and NAP participants, we calculated the rater sensitivity index "d-prime" for Tsub and Tapp. The d-prime index is a statistic used in signal detection theory (Macmillan and Creelman, 1997) that limits the impact of response biases by including both "hits" (correctly detecting that Tsub and Tapp endings are not appropriate) and "false alarms" (incorrectly judging that R endings are not appropriate) in the calculation of task performance. The d-prime index thus incorporates the responses to the R (regular) endings. A higher d-prime value reflects better discrimination between transgressed and regular endings. Comparison of the d-prime parameter for AP (M = 2.28; SD = 1.19) and NAP (M = 2.22; SD = 1.31) participants for Tsub, using a Student independent-sample t-test, revealed no significant difference between the groups [t(21) = −0.11, P = 0.91, d = 0.05]. The same analysis showed no significant difference between AP (M = 3.77; SD = 1.26) and NAP (M = 3.82; SD = 1.07) participants for Tapp [t(21) = 0.11, P = 0.91, d = 0.04].
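In signal detection theory, d-prime is the difference between the z-transformed hit rate and the z-transformed false-alarm rate. A minimal sketch follows; the log-linear correction (adding 0.5 to the counts and 1 to the number of trials), which keeps the z-transform finite when a rate equals 0 or 1, is an assumption, as the correction applied in the study is not stated.

```python
from statistics import NormalDist


def d_prime(hits, n_signal, false_alarms, n_noise):
    """d' = z(hit rate) - z(false-alarm rate), with a log-linear correction."""
    hit_rate = (hits + 0.5) / (n_signal + 1)
    fa_rate = (false_alarms + 0.5) / (n_noise + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)
```

Here a "hit" is a transgressed ending (Tsub or Tapp) judged inappropriate and a "false alarm" is a regular ending (R) judged inappropriate, with 90 endings per condition. A participant responding at random obtains a value near 0, e.g., `d_prime(45, 90, 45, 90)`.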

The high d-prime scores in both groups confirm the musical and specifically harmonic skills of our participants. Compared to our previous study comparing expert pianists, amateur pianists and non-musicians (James et al., 2017) mean d-prime scores of the population in the current study for the subtle (difficult to detect) transgressions for the AP (M = 2.28; SD = 1.19) and NAP group (M = 2.22; SD = 1.31) are situated in between those of amateur (M = 1.54; SD = 1.00) and professional pianists (M = 4.19; SD = 0.78) in James et al. (2017).

#### Auditory Selective Attention Test

GLMMs with the group (AP vs. NAP) as fixed factor and participants' performance as random factor were run. Accuracy at the auditory selective attention test was not significantly different between the AP and NAP groups (see **Table 3**). Two Student one-sample t-tests demonstrated that the scores of the AP group [t(10) = 6.25, P < 0.001, d = 6.26] and of the NAP group [t(11) = 5.36, P < 0.001, d = 1.55] were significantly above chance level (50%). Before analyzing the reaction time data (ms), we applied a base-10 logarithmic transformation to correct for a floor effect and to equalize the variances. Results showed that NAP participants were marginally faster than AP participants (see **Table 3**).
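For a one-sample t-test against a fixed value, Cohen's d (mean difference from the test value divided by the sample SD) is related to the t statistic by d = t / √n. The sketch below applies this relation, under the assumption that d was defined in this standard way, and also shows the base-10 logarithmic transformation of reaction times. Function names are hypothetical.

```python
import math


def one_sample_d(t_value, n):
    """Cohen's d implied by a one-sample t statistic and sample size n."""
    return t_value / math.sqrt(n)


def log10_rt(rts_ms):
    """Base-10 logarithm of reaction times given in milliseconds."""
    return [math.log10(rt) for rt in rts_ms]


d_nap = one_sample_d(5.36, 12)  # NAP group, t(11) = 5.36
d_ap = one_sample_d(6.25, 11)   # AP group, t(10) = 6.25
```

Under this relation, the NAP statistic yields d ≈ 1.55, matching the reported value, whereas the AP statistic yields d ≈ 1.88.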

#### WAIS-IV Digit Span

A GLMM with the group (AP vs. NAP) as fixed factor and participants' performance as random factor was performed on the scores of the direct order digit span subtest. The analysis revealed that AP participants' scores were not significantly different from those of the NAP participants (see **Table 3**). There was again no significant difference between the AP and NAP group concerning the indirect order digit span subtest (see **Table 3**).

#### EEG Results

#### Classical Waveform Analysis

A 2-tailed unpaired t-test comparing mean GFP over the time window 40–80 ms between AP and NAP groups, all 3 transgression conditions collapsed (i.e., R, Tsub, and Tapp), yielded a significant result, with greater strength for AP (M = 0.71 µV, SD = 0.18) than for NAP (M = 0.49 µV, SD = 0.23) participants [t(21) = 2.47, P = 0.02, d = 1.07]. With d = 1.07, the effect size is large (Cohen, 1988; Calin-Jageman, 2018).

As we collapsed all 3 conditions (R, Tsub and Tapp), the GFP Grand-Averages of AP and NAP were based on 239.18 ± 40.19 SD epochs for AP participants, and 253.75 ± 13.20 SD epochs for NAP participants.

#### Microstate Analysis

A spatio-temporal analysis between 0 and 100 ms on the Grand-Average ERPs of the 2 groups (stage 1, see "Classical Waveform Analysis") yielded a solution with 5 stable microstate map configurations over time, independently of experimental condition (i.e., R, Tsub, and Tapp collapsed). The series of microstates appearing over time are depicted in **Figure 4A**. These 5 microstates explained 96% of the ERP data variance. Visual inspection of this microstate series at the grand mean level showed a similar sequence of microstates for both groups except from 0 to 156 ms and from 740 to 1,000 ms. Only for the early period significant statistical differences could be found. Between 0 and 100 ms after stimulus onset, a frontal positivity (microstate 1) characterized the AP participants, whereas for NAP participants, both microstate 1 and microstate 2, the latter showing posterior positivity, occurred. Interestingly, microstates 1 and 2 manifested almost opposite voltage configurations. Statistical analyses (stage 2, "fitting", see "Microstate Analysis") on the microstate parameters Duration and GEV between 0 and 100 ms, a period of difference also observed in our ERP data (see **Figure 2**), reached significance. Because the microstate Durations were not normally distributed, we opted for nonparametric testing for both parameters, GEV and Duration. A Kruskal-Wallis statistical comparison based on the fitting of these maps into the individual data showed a significant difference of the parameter GEV between the groups (AP vs. NAP) for microstate 1 [H(1, N = 23) = 4.91, P < 0.05, d = 0.95] and microstate 2 [H(1, N = 23) = 4.46, P < 0.05, d = 0.89; see **Figure 4B**]. A significant difference between groups was also found for Duration in ms for microstate 1 [H(1, N = 23) = 4.34, P < 0.05, d = 0.87] and microstate 2 [H(1, N = 23) = 4.34, P < 0.05, d = 0.87; see **Figure 4C**]. These results disclosed that, in the 0–100 ms window of interest, microstate 1 principally characterized AP participants, whereas microstate 2 characterized NAP participants. Effect sizes, varying between d = 0.87 and d = 0.95, are large (Cohen, 1988; Calin-Jageman, 2018).

#### TABLE 3 | Summary of the Group results (AP vs. NAP) at the auditory selective attention test and WAIS IV direct and indirect order subtests.

FIGURE 4 | Results of the microstate analysis on Grand Average ERP responses to the musical test (0–100 ms), all transgression conditions collapsed, for AP and NAP participants. (A) Top panel: voltage configurations of the 5 microstates resulting from the microstate analysis; Middle and bottom panels: microstates marked in color-code on the superimposed Grand-average ERP waveforms of all 64 electrodes for AP (middle panel) and NAP (bottom panel). (B,C) Fitting results (0–100 ms). (B) Global explained variance (GEV) by group for microstates 1 and 2. (C) Duration by group for the same microstates. Asterisks indicate significant differences between the groups at P < 0.05.

#### Source Analysis

The a priori constraint for determining the time period for comparing AP and NAP participants for underlying brain sources of the surface EEG was the simultaneous occurrence of 2 distinct microstates (1 and 2; see **Figure 4A**) in both groups in all experimental conditions in the first 100 ms after stimulus onset. These microstates derive from the microstate analysis comparing the 2 groups, over all experimental conditions (R, Tsub, and Tapp collapsed). The scalp configurations of these 2 microstates, each representing essentially 1 pitch group, displayed quasi-opposite voltage configurations at the scalp level, with a clear GFP peak for AP possessors for microstate 1. Therefore, this time period provides a strong precursor for computing inverse solutions. To cancel out as much noise as possible, we centered our analysis around the GFP peak of microstate 1 in the AP group (40–80 ms), and applied SPM, comparing the groups with 2-tailed t-tests for all 3,005 nodes of the 3-D grid, using the STEN software.

One single 21-node globular cluster passed our statistical thresholds (see section "Source Analysis") characterizing the AP group (positive t-values), in left secondary auditory areas located around the temporo-parietal junction, encompassing part of the planum temporale (**Figure 5**). The peak t-value was found in the gray matter of the left posterior superior temporal gyrus [t(21) = 3.65, P < 0.002, d = 0.76; **Figure 5**, middle upper panel], the other areas composing the cluster comprised parts of the supramarginal gyrus and the rolandic operculum. With d = 0.76 we report a close to large effect size here (Cohen, 1988; Calin-Jageman, 2018).

## DISCUSSION

The key and novel finding of this study is that a precocious ERP microstate, peaking as early as 60 ms after closure onset in expressive music, dissociated AP pianists from NAP pianists, in response to all musical stimuli, independently of the harmonic congruence of the musical closure (partially confirming Hypothesis 2). Later, more cognitive cerebral processing of harmonic transgressions in music turned out to be alike in AP and NAP pianists. Identical rater sensitivity (d-prime) for musical transgressions and equal STM and WM scores underpin this observation.

Despite the absence of a behavioral difference for the harmonic transgression test, the robust behavioral difference for the AP test evidences that we are dealing with 2 intrinsically distinct populations that process the musical stimuli differently, resulting, however, in a similar level of transgression detection proficiency.

According to the 2-level model of AP functioning (Levitin, 1994; Zatorre, 2003; Levitin and Rogers, 2005), this precocious microstate involves the perceptual representation of pitches in long-term memory and not the semantic labeling. Given its very early appearance, this stage of information processing apparently consists of a highly automatized pitch analysis capacity, which can be dissociated from later, more cognitive processing of music. Our source analyses revealed that this microstate concurred in AP musicians with involvement of left secondary auditory areas at the temporo-parietal junction, comprising the planum temporale, corresponding to those described in the literature distinguishing AP possessors (Schlaug et al., 1995; Ohnishi et al., 2001; Jancke et al., 2012; partially confirming Hypothesis 4). Moreover, Elmer et al. (2015) hypothesized that specifically the left planum temporale (PT), which is part of the cluster localized in the current study, fosters the first perceptual component of AP according to the 2-level model, matching incoming spectrotemporal patterns with a template (Ohnishi et al., 2001). These left auditory-related areas may function as a "hub of information flow" toward inferofrontal areas (Elmer et al., 2015, p. 369).

These results also constitute another confirmation of the validity of source analysis from scalp EEG (Plomp et al., 2010; Birot et al., 2014), while adding a temporal precision that other imaging techniques cannot provide (Michel et al., 2004; Meyer-Lindenberg, 2010; Michel and Murray, 2012).

To the best of our knowledge, our study is the first to highlight such an early difference between AP and NAP participants. The earliest differences reported in previous studies concerned the N150 ERP component (Itoh et al., 2005), and the MEG M100 (Hirata et al., 1999). Given the component's time of occurrence, and the fact that it appears much earlier than the MMN (mismatch negativity) which reflects pitch memorizing, and the P3a, the latter probably reflecting the labeling process (Rogenmoser et al., 2015), we suppose that it highlights an automatic and unconscious sensorial process.

The fact that we were able to identify specific sources underlying AP function in left secondary auditory areas, including the PT at this early point in time, is noteworthy. The assumed high degree of automation of this special pitch processing and identification capacity is strongly supported by this result. This processing stage thus precedes cognitive processing of complex music.

Previous work showed that pruning in the right PT, rather than development of the left PT, explains the observed left-right PT asymmetry in AP possessors (Keenan et al., 2001). This underdevelopment of the right PT then favors functional dominance of the left PT during early development in AP individuals. The pitch processing observed here, likely performed by the left PT and surrounding areas, undoubtedly represents only part of the AP faculty, i.e., the acoustical part. Left frontal areas, more precisely the left posterior dorsolateral prefrontal areas, may carry the "linguistic labeling" of the acoustically identified pitches (Zatorre et al., 1998; Ohnishi et al., 2001). Our study did not highlight any frontal sources, which can be explained by the SPM analysis technique, which cancels out all activations common to both groups and only shows differences. The NAP musicians in the current study, who are also highly trained musicians, likely also rely on left dorsolateral prefrontal areas when making relative pitch judgments of chords, relevant for the task in the current experiment (Zatorre et al., 1998; James et al., 2017). Likewise, differences in right auditory cortex activation did not reach statistical significance, as AP and NAP participants probably both relied on these brain areas for chord processing.

AP (in red) and NAP (in black) with significant difference in strength between 40 and 80 ms.

Two studies investigated brain connectivity in AP possessors. Enhanced structural brain connectivity in AP people was described by Loui et al. (2011) and involved tracts connecting more strongly the left superior temporal gyrus with the left middle temporal gyrus. Volumes of these tracts predicted AP performance. Furthermore, these authors observed hyperconnectivity in bilateral superior temporal lobe structures in AP possessors. A recent study by Brauchli et al. (2019) observed enhanced fine-grained functional connectivity in the left auditory cortex in AP musicians using multi-voxel pattern analysis. These studies do not refute our findings.

Our study deviates, however, from 2 important publications in the domain of comparison between AP and non-AP musicians. First, we did not find right Heschl's involvement in AP processing as Wengenroth et al. (2014) did. This may be explained in the first place by the different time window studied with fMRI, which does not allow detection of a precocious effect like the one found in the current study. Secondly, the AP definition slightly differed, as the authors applied an innovative AP test, excluding any RP involvement and also accepting whole tone precision, albeit with weighted scoring. In the test used here, relative pitch perception is involved. Additionally, in Wengenroth et al. (2014), tones and not expressive music served as stimuli, and the participants played different instruments. As stated before, observed differences concerning AP may be explained by the diversity of its definitions, and by the absence of consensus concerning its measurement (Hou et al., 2016).

Within a passive listening Oddball paradigm, Rogenmoser et al. (2015) found identical MMN responses in AP and NAP musicians but different P3a responses, concluding that early auditory processing did not differ between the groups, but later, more cognitive processing did, with AP showing smaller P3a amplitudes. We report opposite results, with a very early difference arising from secondary auditory areas at 60 ms and no later differences in dorsolateral prefrontal areas. This difference in observations may be explained by the different stimuli used, and also by the criteria for AP. Rogenmoser et al. (2015) tested their participants on out-of-tune tones in a passive listening Oddball paradigm. In that case the automatic searching for the correct label seems triggered in AP possessors, because out-of-tune notes may induce doubt on the exact pitch of the note, despite the passive listening, as AP people process pitch automatically and involuntarily. In our active harmonic transgression task, searching for the correct label is not prominent; searching for the correct syntactical function of the chord is, implicating also relative pitch processing. Moreover, in Rogenmoser et al. (2015), AP possessors provided at least 40% of correct responses in the AP test, vs. 90% in our study. So several of our NAP participants would have been categorized as AP in their study, and their results were not restricted to true absolute pitch possessors. Finally, in a replication study of Rogenmoser et al. (2015), performed by Greber et al. (2018) on a larger sample of musicians, the findings on a weaker P3a in AP possessors were not confirmed.
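The effect of the inclusion criterion discussed above can be made concrete with a small sketch. The scores below are hypothetical, and treating both criteria as "at least" thresholds on the proportion of correct responses is an assumption made for illustration.

```python
def classify(proportion_correct, ap_threshold):
    """Label a participant AP if the AP-test score reaches the threshold."""
    return "AP" if proportion_correct >= ap_threshold else "NAP"


# Hypothetical AP-test scores (proportion of correct responses).
scores = [0.35, 0.55, 0.80, 0.95]

lenient = [classify(s, 0.40) for s in scores]  # 40% criterion
strict = [classify(s, 0.90) for s in scores]   # 90% criterion
```

Under the 40% criterion the two intermediate scorers count as AP, whereas under the 90% criterion they fall into the NAP group, which is the reclassification the comparison between the two studies turns on.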

An earlier EEG study by our research group compared participants with 3 levels of training intensity for the 3 transgression conditions (James et al., 2017). In that study, non-musicians, amateur pianists and expert pianists showed pronounced differences in ERP responses and source estimations between the 3 transgression conditions. Interestingly, in the current experiment with 2 groups not differing in training intensity, no such differences were found, although we expected this initially. As, moreover, no differences manifested for transgression detection at the behavioral level, these results suggest that no important differences exist between equally trained AP and NAP pianists for cognitive music processing as involved in our task (infirming Hypothesis 1).

We initially hypothesized that AP participants would perform better than NAP participants at auditory STM and WM tasks. Concerning STM, we expected to replicate Deutsch and Dooley's results (2013), who used similar stimuli and task: Series of digits that participants had to repeat in direct order, for which they observed an advantage for AP participants. In contrast, we observed similar scores in both groups for STM and WM (partially infirming Hypothesis 3). These contradictory results may be explained by differential recruiting of participants and the tests used. Deutsch and Dooley tested 7 APs against 20 NAPs. The instruments played were not specified, thus probably varied. Handedness was not reported. In the current experiment groups were of similar size and composed of right-handed pianists only. Finally, we used the WAIS-IV normed subtests to assess STM and WM, whereas Deutsch and Dooley used a test that they created for their experiment. Interestingly, Benassi-Werke et al. (2012) using the direct and indirect order digit span subtests of the WAIS-III like in the current experiment, did not report any significant differences either between two groups of professional singers with and without AP and a group of amateur NAP singers. However, when testing on tones instead of digits, the AP group performed significantly better. STM and WM advantages possibly only manifest clearly when the test material consists of tones.

As already explained in the introduction, possessing AP may also yield disadvantages in musical contexts. Although not present at the level of correct responses, AP participants responded marginally later than NAP participants in our in-house auditory selective attention test (partially confirming Hypothesis 3). This may be explained by the fact that because AP possessors automatically process all pitches they hear (Akiva-Kabiri and Henik, 2012), they are unable to ignore the "ground," which slows down the processing of information relevant to the task (comparing the tone series). Defining and confirming more precisely the domains of disadvantage of the AP skill represents an interesting direction for further research.

## LIMITATIONS

As the study population consisted exclusively of pianists, all conclusions are also restricted to this particular population. Pianists perform highly complex polyphonic scores as compared to melody instrumentalists like string or wind players. Therefore, NAP pianists are also very proficient in the analysis of chords, which may partially explain why no differences occurred for music syntactic processing as a function of AP.

Concerning source computations, since we did not use digitized electrode positions nor individual MRIs, our source localizations are approximately 10% less sensitive (Brodbeck et al., 2011; James et al., 2017) relative to studies that used such individualized techniques (Birot et al., 2014; Megevand et al., 2014; Plomp et al., 2015).

Finally, the sample size of this study is relatively small. From our point of view there is no reason to suspect that the current study is underpowered due to a lack of sensitivity. As we report large effect sizes for all results (absolute pitch test, classical waveform analysis on GFP, microstate analysis and source analysis), we fulfill the recommendations of Whitehead et al. (2016), with at least 10 participants per group for studies with large effect sizes, in order to obtain 90% power and two-sided 5% significance.

## CONCLUSIONS

To summarize, the key and novel finding of this study is the revelation of a precocious microstate (peaking around 60 ms after stimulus onset) characterizing AP participants in response to musical closure in complex polyphonic music. Given its very early occurrence and underlying brain sources at this time point in secondary auditory areas, this stage of information processing clearly involves the acoustical features of AP functioning. Specific cerebral sources for AP possessors contributing to this microstate were localized in secondary auditory regions at the temporo-parietal junction, including parts of the PT. These brain areas correspond to those described in the literature to distinguish AP possessors at the anatomical and functional level (Schlaug et al., 1995; Keenan et al., 2001; Itoh et al., 2005). In contrast, AP did not facilitate later musical cognition or general cognition. These findings therefore demystify to some extent the entrenched beliefs about the major impact of AP on musical skills.

We consider our findings particularly interesting, because this is the first study to compare such homogeneous populations for investigating AP. Indeed, as AP and NAP participants in our study did not differ in age, age of onset of musical practice, total number of years of music training, gender, instrumental level and socio-economic status, they differed exclusively in AP skill, making our results stand out from previous research in the domain.

## AUTHOR CONTRIBUTIONS

SYC, NV, DG, and CEJ conceived and designed the study. NV recruited the participants. SYC and NV welcomed the participants and administered the EEG recordings. SYC, NV, and CEJ analyzed the data, designed the figures and wrote the main manuscript text. All authors were involved in the discussions about the interpretation of the results and the proofreading of the manuscript.

## FUNDING

This work was supported by the Swiss National Science Foundation (grant number 100014–125050 to CEJ) and is part of a multimodal brain imaging project entitled Behavioral, neuro-functional, and neuro-anatomical correlates of experience dependent music perception.

## ACKNOWLEDGMENTS

The Cartool software (brainmapping.unige.ch/cartool) has been programmed by Denis Brunet, from the Functional Brain Mapping Laboratory, Geneva, Switzerland, and is supported by the Center for Biomedical Imaging (CIBM) of Geneva and Lausanne. The STEN toolbox (https://doi.org/10.5281/zenodo.1164037) has been programmed by Jean-François Knebel, from the Laboratory for Investigative Neurophysiology (the LINE), Lausanne, Switzerland, and is supported by the Center for Biomedical Imaging (CIBM) of Geneva and Lausanne and by the National Center of Competence in Research, project SYNAPSY – The Synaptic Bases of Mental Disease; project no. 51AU40\_125759. We would like to thank Bruno Pietri for the beautiful compositions.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnins.2019.00142/full#supplementary-material

Audio 1 | Regular closure (degree I).

Audio 2 | Subtle transgression of closure (Tsub, degree I<sup>6</sup> ).

Audio 3 | Apparent transgression of closure (Tapp, degree IV<sup>6</sup> ).

## REFERENCES

localises the seizure onset zone. J. Neurol. Neurosurg. Psychiatr. 85, 38–43. doi: 10.1136/jnnp-2013-305515

spherical head models. Brain Topogr. 13, 115–125. doi: 10.1023/A:1026607118642

the overall trial sample size for the external pilot and main trial for a continuous outcome variable. Stat. Methods Med. Res. 25, 1057–1073. doi: 10.1177/0962280215588241


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Coll, Vuichoud, Grandjean and James. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# On the Association Between Musical Training, Intelligence and Executive Functions in Adulthood

#### Antonio Criscuolo<sup>1</sup> , Leonardo Bonetti<sup>2</sup> , Teppo Särkämö<sup>3</sup> , Marina Kliuchko<sup>2</sup> and Elvira Brattico<sup>2</sup> \*

<sup>1</sup> Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, Netherlands, <sup>2</sup> Center for Music in the Brain, Department of Clinical Medicine, Aarhus University – The Royal Academy of Music, Aarhus/Aalborg, Aarhus, Denmark, <sup>3</sup> Cognitive Brain Research Unit, Department of Psychology and Logopedics, Faculty of Medicine, University of Helsinki, Helsinki, Finland

#### Edited by:

Claude Alain, Rotman Research Institute (RRI), Canada

#### Reviewed by:

Alexandra Parbery-Clark, Swedish Neuroscience Institute (SNI), Swedish Medical Center, United States Cyrille Magne, Middle Tennessee State University, United States

> \*Correspondence: Elvira Brattico elvira.brattico@clin.au.dk

#### Specialty section:

This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Psychology

Received: 13 December 2018 Accepted: 08 July 2019 Published: 30 July 2019

#### Citation:

Criscuolo A, Bonetti L, Särkämö T, Kliuchko M and Brattico E (2019) On the Association Between Musical Training, Intelligence and Executive Functions in Adulthood. Front. Psychol. 10:1704. doi: 10.3389/fpsyg.2019.01704

Converging evidence has demonstrated that musical training is associated with improved perceptual and cognitive skills, including executive functions and general intelligence, particularly in childhood. In contrast, in adults the relationship between cognitive performance and musicianship is less clear and seems to be modulated by a number of background factors, such as personality and socio-economic status. Aiming to shed new light on this topic, we administered the Wechsler Adult Intelligence Scale III (WAIS-III), the Wechsler Memory Scale III (WMS-III), and the Stroop Test to 101 Finnish healthy adults grouped according to their musical expertise (non-musicians, amateurs, and musicians). After being matched for socio-economic status, personality traits and other demographic variables, adult musicians exhibited higher cognitive performance than non-musicians in all the mentioned measures. Moreover, linear regression models showed significant positive relationships between executive functions (working memory and attention) and the duration of musical practice, even after controlling for intelligence and background variables, such as personality traits. Hence, our study offers further support for the association between cognitive abilities and musical training, even in adulthood.

#### HIGHLIGHTS


Keywords: musical training, cognition, intelligence quotient, working memory, attention, executive functions

## INTRODUCTION

fpsyg-10-01704 July 26, 2019 Time: 12:8 # 2

## Musical Training Relies on Executive Functions

Musical training is a multisensory experience engaging multiple cognitive functions and underlying neural networks. Indeed, reading, listening, understanding and performing polyphonic music require the simultaneous processing of sounds and rhythms, higher order perceptual processing and fine sensorimotor coordination (Münte et al., 2002). Long-term musical training engages and trains all those functions on a daily basis and, as a result, musicians seem to improve not only music-related abilities, but also domain-general skills. Hence, musicians show increased auditory perception and production abilities, such as enhanced capacity to detect deviations in complex regularities and tone patterns (Tervaniemi, 2001, 2009; Fujioka et al., 2004; Zuijen et al., 2004; Van Zuijen et al., 2005; Bangert and Schlaug, 2006; Herholz et al., 2009) as well as fine motor control (Krings et al., 2000; Koeneke et al., 2004; Vuust et al., 2005; Kleber et al., 2013; Burunat et al., 2015).

Besides improving listening and sensorimotor abilities closely linked to the musical practice (Schellenberg, 2011), there is also evidence in favor of the far transfer effect to non-musical functions. In the literature, far transfer effects relate to the influence of musical training on general (not confined to the auditory domain) cognitive functions, such as spatial (Gromko and Poorman, 1998; Rauscher, 2002; Brochard et al., 2004; Sluming et al., 2007), mathematical (Cheek and Smith, 1999), and non-verbal (Forgeard et al., 2008) abilities. Among these, working memory (WM) refers to the ability to retrieve, monitor, analyze, integrate, chunk and recall within a short time span both auditory and non-auditory information (Herholz et al., 2009; Hansen et al., 2013; for reviews, see, e.g., Kraus and Chandrasekaran, 2010; Reybrouck and Brattico, 2015; Schlaug, 2015). In music processing, WM integrates sound events, recollects information from memory systems, links sounds to meaning and to memories, and supports the generation of emotional reactions (Burunat et al., 2014).

Along with cognitive flexibility, response inhibition and interference control, WM is considered one of the fundamental executive functions in humans (Diamond, 2013). Executive functions designate a set of abilities related to updating and manipulating relevant information (WM), inhibiting automatic responses, shifting attention to mental tasks (selective attention), planning, reasoning and decision making (Guare and Dawson, 2004; Perrotin et al., 2008; Garner, 2009; Collins and Koechlin, 2012). Improvements in executive functions and cognitive flexibility by musical training have been observed in Finnish school-age children and these improvements positively correlated to enhanced neural sound discrimination (Saarikivi et al., 2016).

## Effects of Musical Training on General Intelligence

Previous evidence suggests that long-term engagement in musical activities modulates not only executive functions but also general intelligence or g (Hansen et al., 2013; see Schellenberg and Weiss, 2013 for a review). In psychological science, g has been defined in several ways and assessed using a variety of behavioral tests. Gottfredson (1997, p. 13) described it as a "very general mental capability that involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and from experience." G seems to rely on neural substrates similar to those of WM, comprising a network of prefrontal and subcortical regions, along with the anterior cingulate, parietal and premotor regions (Duncan et al., 2000). Because of similarities at both functional and anatomical levels, Salthouse and Pink (2008) suggested that WM is closely related to g. Moreover, the benefits of far transfer effects of musical training extend to the domain of g and its quantitative measure, namely intelligence quotient (IQ). For instance, musically trained children show higher IQ as compared to non-trained children (Schellenberg, 2004, 2011).

## Nature or Nurture?

Findings in favor of the link between musical training and cognitive functions led to the notorious debate of nature vs. nurture: i.e., are the observed differences in cognitive abilities only associated with pre-existing neurocognitive differences which predispose people to engage in musical activities (nature), or is the cognitive training promoted by musical activities able to influence cognitive abilities (nurture)? On one side, numerous experimental studies have shown a link between musical training and cognitive development. For instance, in Schellenberg (2004) the IQ of a sample of 144 6-year-old children was assessed before and after 1 year of music or drama classes with the Wechsler Intelligence Scale for Children–Third Edition (WISC-III). The author demonstrated that, despite no differences in the pre-test for WISC scores, there were greater improvements in WISC scores for children of the music group as compared to the control groups, along with improvements in some of its subscales, such as Verbal Comprehension and Perceptual Organization Indices. Similarly, in Schellenberg (2011) 106 children aged 9 to 12 (half musically trained and half untrained) were tested with the Wechsler Abbreviated Scale of Intelligence (WASI): trained children showed higher IQ scores than their untrained counterparts. Lastly, beyond IQ assessed with psychological tests, music training in childhood is associated with positive academic achievement (Schellenberg, 2006) and improvements in language-related abilities (Moreno, 2009; François et al., 2013).

Together, these findings highlight an association between musical training and general cognitive abilities in childhood. However, when taking into consideration background variables other than musical training, some authors have shown that pre-existing differences in cognitive abilities, together with differences in children's and their parents' personality traits, may contribute to the choice of engaging in musical training and to the duration of such training. In turn, this may ultimately account for differences in cognitive performances in adulthood (Schellenberg, 2006; Corrigall et al., 2013; Corrigall and Schellenberg, 2015). Furthermore, Bonetti and Costa (2017) showed associations between fluid intelligence and music tasks in children aged 4–6 years old with no previous musical training, suggesting a possible innate connection between some musical skills and intelligence that could potentially lead to a higher probability of engaging in musical studies for children with higher IQ. Lastly, by showing that genetic and environmental factors interact in determining music behaviors, such as musical practice and music enjoyment, Butkovic et al. (2015) have further highlighted the need for new investigations to clarify the complex association between music and cognitive development.

### The Current Study

While a consistent corpus of research has focused on child populations, findings in adults are sparse (Brandler and Rammsayer, 2003; Helmbold et al., 2005; Schellenberg and Moreno, 2010). Existing evidence suggests that far transfer effects of musical training to general cognitive skills might be related to confounding variables that are usually neglected (Sala and Gobet, 2017), such as personality traits (Corrigall et al., 2013; Corrigall and Schellenberg, 2015).

Aiming to test the hypothesis that long-term musical practice is associated with improved cognitive abilities in adulthood, we assessed intelligence and executive functions of adults with different levels of musical expertise while controlling for background variables such as socio-economic status (SES), age, years of education and personality traits. Unlike previous studies (Brandler and Rammsayer, 2003; Helmbold et al., 2005; Schellenberg and Moreno, 2010; Swaminathan et al., 2017), we adopted the Wechsler Adult Intelligence Scale III (WAIS-III) as intelligence test. WAIS belongs to the family of Wechsler tests, the most widely used to assess intelligence in the psychological literature (Weiss et al., 2016). To investigate executive functions, we used the Wechsler Memory Scale III (WMS-III) for WM and the Stroop test for selective attention, cognitive flexibility and processing speed. Lastly, personality was assessed by administering the Big Five Inventory questionnaire (BFI) to participants, as previously done in Corrigall et al. (2013) and in Corrigall and Schellenberg (2015). Our sample includes 101 highly educated Finnish adults (representative of the high education level in Finland; oecd.org) with a mean IQ higher than the average Finnish population [comparing the individual scores with the WAIS norms (Wechsler, 1997a)].

In line with previous studies (Schellenberg, 2006; Swaminathan et al., 2017, 2018), we expected (i) expert musicians to report higher intelligence and executive functions than non-musicians and (ii) to verify that the positive relationship between the duration of musical practice and cognitive performances would hold even after controlling for potential confounding variables.

## MATERIALS AND METHODS

#### Participants

The participants were part of the broad "Tunteet" research protocol involving a multi-dimensional dataset of brain measures, psychological tests and behavioral data on audition, emotion and musical behavior. The dataset was obtained from 140 participants recruited among university students or qualified professionals. Further details on this protocol can be found in Burunat et al. (2015, 2016, 2018), Kliuchko et al. (2015, 2016, 2018), Alluri et al. (2017), Bonetti et al. (2017, 2018), and Saari et al. (2018), where some of the participants involved in this study were included. All experimental procedures for this protocol were approved by the Coordinating Ethics Committee of the Hospital District of Helsinki and Uusimaa (approval number: 315/13/03/00/11, obtained on March the 11th, 2012). Furthermore, all procedures were conducted in agreement with the ethical principles of the Declaration of Helsinki.

For the purpose of the current study, we selected only participants who completed psychological and cognitive testing with a trained psychologist (N = 114). The other participants did not take part in the testing because their native language was not Finnish, or they did not have enough time to dedicate to the study, and thus other measurements were prioritized. We obtained information on their musical expertise by cross-referencing details derived from both a paper and pencil questionnaire (used in previous studies: e.g., Brattico et al., 2009) and an online survey called Helsinki Inventory for Music and Affect Behavior or HIMAB (Gold et al., 2013). Based on those details, subjects were divided into three groups according to levels of musical expertise (or "musicianship" from now on): non-musicians, amateurs, and musicians. Participants were considered musicians when they reported more than 5 years of music practice and considered themselves as musicians. In addition to this, for entering the musicians' group two criteria had to be matched: a final degree at a music academy or monetary compensation for their music performance or teaching activities. Participants not matching these parameters, though involved in music activities, were classified as amateurs, and all participants with less than 3 years of musical training entered the group of non-musicians. For the scope of this study, we decided to combine the duration of musical training and the years of musical practice in a comprehensive variable named "years of music playing." Out of 114 participants, 13 were further excluded because they deviated from the normal distribution in one or more background variables (age, years of music playing, years of education, SES). Therefore, only 101 participants were included in this research: 45 were males (44.6%) and 56 were females (55.4%) within the age range of 18–55 years (mean age 28.44 ± 8.26 SD).

The SES of participants was assessed by means of the Hollingshead Four-Factor Index (Hollingshead, 1975) included in HIMAB. Details on the participants' SES, age, gender, years of education, and years of music playing are reported in **Table 1**, together with personality indices of neuroticism, extraversion, openness to experience, agreeableness, and conscientiousness, as measured by the BFI. Musicians' and amateurs' musical background information is provided in **Table 2**, which includes the starting age of musical training and musical practice, together with the average of weekly hours spent in practicing and in listening to music and years of music playing.
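The grouping rules for musicianship described above can be sketched as a small decision function. The field names are hypothetical, and two points are assumptions rather than statements from the text: the degree and compensation criteria are read as alternatives (either one suffices), and the musician rule is checked before the non-musician rule.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # All field names are illustrative; they do not come from the questionnaires.
    years_practice: float
    years_training: float
    self_identifies_as_musician: bool
    has_academy_degree: bool
    is_paid_for_music: bool


def musicianship(p):
    """Assign one of the three expertise groups."""
    # Assumption: a degree OR paid activity satisfies the professional criterion.
    professional = p.has_academy_degree or p.is_paid_for_music
    if p.years_practice > 5 and p.self_identifies_as_musician and professional:
        return "musician"
    if p.years_training < 3:
        return "non-musician"
    # Everyone else is involved in music without meeting the musician criteria.
    return "amateur"


example = Participant(
    years_practice=12, years_training=10,
    self_identifies_as_musician=True,
    has_academy_degree=False, is_paid_for_music=True,
)
group = musicianship(example)
```

Writing the rules this way makes the boundary cases explicit, for example a long-practicing player without a degree or paid activity, who ends up in the amateur group.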

All participants took part in the experiment on a voluntary basis and they were compensated with vouchers to use for culture and sport purposes (e.g., museums, concerts, or swimming pools). All of them were healthy and declared to have no history of neurological or psychiatric disorders. All participants signed an informed consent before the beginning of the experiment and a researcher was present and available for assistance at any time.

In order, listed are participants' gender, handedness, mean and SD for age, duration of musical playing, years of general education, SES (socio-economic status) and personality indices of neuroticism, extraversion, openness to experience, agreeableness and conscientiousness, as measured by the Big Five Inventory questionnaire (BFI). These variables were used as predictors of musicianship in a regression model; associated p-value, standardized Beta, 95% confidence interval (CI; lower bound, LB, and upper bound, UB) and partial correlation for each variable are provided. Years of musical playing was the only significant predictor of musicianship and its statistical values are reported in bold.

TABLE 2 | Musical background information for amateurs and musicians.

In order, means and standard errors of the starting age (in years) of musical training and practice, weekly hours spent in practicing and in listening to music, years of musical practice. p-Value, standardized Beta, 95% confidence interval (CI; lower bound, LB, and upper bound, UB) and partial correlation for each variable as reported from a regression model are also provided. Significant statistical values are reported in bold.

#### Psychological Tests

#### Wechsler Adult Intelligence Scale III

The WAIS-III is a widely used test for the assessment of adults' and older adolescents' intelligence (Wechsler, 1997a). In this study, we used the following eight WAIS-III subtests: Vocabulary, Similarities, Information, Picture Completion, Block Design, Matrix Reasoning, Digit–Symbol Coding, and Symbol Search. The Vocabulary, Similarities, and Information subtests were used to calculate the Verbal Comprehension Index (VCI). The Picture Completion, Block Design, and Matrix Reasoning subtests were used to calculate the Perceptual Organization Index (POI). The Digit–Symbol Coding and Symbol Search subtests were used to calculate the Processing Speed Index (PSI). In addition, these subtests and the Letter-Number Sequencing subtest from WMS-III (which is the same as in WAIS-III) were adopted to estimate the Verbal Intelligence Quotient (VIQ), Performance IQ (PIQ), and full-scale IQ (FSIQ). More details on the tests can be found in **Table 3**.

#### Stroop Test

The Stroop effect is measured with the Stroop test and refers to the interference in reaction time in a task that provides conflicting cues. The Stroop effect is used to assess cognitive abilities, such as selective attention, cognitive flexibility and processing speed and, in general, executive functions (Strauss et al., 2006, pp. 477–499). Stroop test scores are calculated based on performances in word reading and color naming tasks. Word reading and color naming are control tasks in which the subject is asked simply to (i) read the color words printed in black ink and (ii) name the colors of given bars printed in different inks. In the third task, the subject is shown the color words printed in a different ink (conflicting with the semantic meaning of the word) and is asked to name the colors in which the words are printed. Since word reading is an automatic process, performance in this task requires the subject to inhibit the reading while focusing on the color naming.

#### TABLE 3 | Description of the task and the required abilities for the psychological tests administered in the study.


Typically, the Stroop effect is derived by comparing the correct responses and performance times of the third task to either one of the control tasks by subtracting the control task from the third task. In effect, this subtraction leaves the actual cognitive process we are interested in (the "cost" of inhibiting the response to the automatic word reading process in the third task). Thereby, the Stroop variable used for this study corresponds to the subtraction of the reaction time obtained in the interference task minus the reaction time obtained in the color naming task. The higher the value, the higher the effort needed to selectively filter out unattended information and focus on attended ones.
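As a minimal sketch of the subtraction just described, the Stroop variable can be computed as the interference-task time minus the color-naming time. The function name and the example values are hypothetical and chosen only for illustration; the unit (seconds) is also an assumption, since the text does not state it.

```python
def stroop_interference(rt_interference, rt_color_naming):
    """Stroop variable as described in the text: interference-task time
    minus color-naming (control) time. Higher values mean a larger cost
    of inhibiting the automatic word-reading response."""
    return rt_interference - rt_color_naming


# Hypothetical completion times in seconds (illustrative values only).
score = stroop_interference(rt_interference=95.0, rt_color_naming=60.0)
print(score)  # 35.0
```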

#### Wechsler Memory Scale III

The WMS-III is a neuropsychological test designed to measure different memory functions (Wechsler, 1997b). In the present study, we administered the following four WMS-III subtests: Letter-Number Sequencing, Spatial Span (forward and backward), Wordlists I, and Wordlists II, which measure verbal, spatial, and episodic memory components, respectively. Letter-Number Sequencing and Spatial Span were used to calculate the Working Memory Index (WMI). More details on the tests can be found in **Table 3**.

#### Big Five Inventory

The BFI contains 44 items designed to measure an individual on five main dimensions of personality: Openness to experience, Conscientiousness, Extraversion, Agreeableness, and Neuroticism (John and Srivastava, 1999). Items are rated on a five-point scale (where 1 corresponds to strongly disagree and 5 to strongly agree), and the score for each personality dimension corresponds to the average rating for the relevant items. Examples of the multiple possible choices for the item: "I see myself as someone who. . ." are: "Is talkative," "Is reserved," "Is full of energy," and "Can be tense."

#### Procedure and Statistical Analysis

The participants were invited to the Advanced Magnetic Imaging (AMI) laboratory for the neuroimaging session of the broad Tunteet project (coordinated by EB). There, before and after the brain scanning session, they were administered the following tests by a graduate student of psychology under supervision of a licensed and expert psychologist (TS): Stroop test, WMS-III, and WAIS-III. In another session taking place at the Biomag laboratory at Helsinki Central University Hospital, the same participants were invited for the second part of the brain scanning and personality data were collected by administering the complete BFI. The total duration of each experimental session was around 3 h. The psychological tests, in total, did not last longer than 2 h. For the purposes of the present study we only used the results of the psychological tests.

Before testing for group differences in cognitive abilities across musicianship groups, we verified that there were no significant group differences in background variables. To this end, two regression models were performed: the first included the background variables of age, years of general education, SES, personality trait variables and years of music playing as predictors of musicianship (classification into non-musicians, amateurs, and musicians); the second model was performed for amateurs and musicians only and included music-background variables such as onset of musical training and musical practice and average weekly hours spent practicing and listening to music, together with years of music playing, as predictors of musicianship. By doing so, we obtained the relative contribution of each variable in predicting group differences when holding the others constant.

Age, years of general education, SES and personality traits variables were normally distributed across participants. Because years of music playing was not normally distributed, we decided to square-root the variable and use its transformation in the analyses. Results and means values for each variable of the first regression model are provided in **Table 1**, whereas the others are provided in **Table 2**. The means displayed for years of music playing are the original values (not the square-root transformed).

In order to test for group differences in cognitive abilities, we performed a multivariate analysis of variance (MANOVA) with musicianship as the between-subjects factor and the main indices of the cognitive test scores (Stroop, WMS-WMI, and WAIS-FSIQ) as dependent variables. Because there were more than two dependent variables and they correlated significantly with each other (FSIQ-WMI: r = 0.628, p < 0.001; FSIQ-Stroop: r = −0.337, p = 0.001; WMI-Stroop: r = −0.327, p = 0.001), we opted for MANOVA. Indeed, this statistical test is able to take the relationship between dependent variables into account (Warne, 2014). Assumptions of linearity and absence of collinearity were tested before proceeding with the analysis. A separate ANOVA was then performed to examine the differences among groups along subtests of the WAIS-FSIQ, namely WAIS-VIQ and WAIS-PIQ. Post hoc tests with Bonferroni correction were performed for both the MANOVA and ANOVA models to avoid false positive discoveries while calculating group comparisons. The Bonferroni-adjusted alpha level for post hoc tests was obtained by dividing the standard alpha of 0.05 by the number of comparisons [N(N−1)/2 pairwise comparisons for N groups, for each dependent variable]. In our case, with 3 groups and 3 variables, there were 9 comparisons (3 pairwise comparisons × 3 variables); thus, the alpha level was reduced to 0.0056. Note that the p-values reported in the Results section are Bonferroni-corrected p-values, so that a corrected p < 0.05 corresponds to a non-corrected p < 0.0056 and hence is interpreted as a significant effect.
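The arithmetic behind the adjusted alpha level can be checked with a short sketch. The function names are hypothetical; the only inputs are the values given in the text (3 groups, 3 dependent variables, alpha of 0.05).

```python
from math import comb


def bonferroni_alpha(n_groups, n_variables, alpha=0.05):
    """Return (number of comparisons, Bonferroni-adjusted alpha).

    Pairwise group comparisons are N(N-1)/2 per dependent variable.
    """
    n_comparisons = comb(n_groups, 2) * n_variables
    return n_comparisons, alpha / n_comparisons


def bonferroni_corrected_p(p_uncorrected, n_comparisons):
    """Corrected p-value: uncorrected p times the number of comparisons,
    capped at 1."""
    return min(1.0, p_uncorrected * n_comparisons)


n_comparisons, adjusted_alpha = bonferroni_alpha(n_groups=3, n_variables=3)
print(n_comparisons, round(adjusted_alpha, 4))  # 9 0.0056
```

Multiplying an uncorrected p-value by the number of comparisons and testing it against 0.05 is equivalent to testing the uncorrected p-value against the adjusted alpha.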

To deepen the exploration of the relationship between musical practice and cognitive abilities, and to control for the influence of potential confounding variables, we performed three backward stepwise linear regression analyses inserting background variables of age, years of education, years of music playing and the five personality trait indices as predictors of FSIQ, WMI, and Stroop, respectively. By doing so, we would obtain the unique contribution of each of the variables, and of music practice, in predicting the variance observed in the cognitive test scores. Backward stepwise regression starts with a saturated model and gradually eliminates (stepwise) variables from the regression model in order to find the predictors that best explain variance in the dependent variable. Therefore, multiple models are generated until model fit cannot be further improved.
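The elimination loop described above can be illustrated with a self-contained sketch. It is not the authors' analysis: the removal criterion used here is adjusted R<sup>2</sup> (a predictor is dropped whenever doing so does not lower it), which is an assumption, since the text does not state the criterion and statistical packages typically use an F or p-value threshold. All function names and the toy data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def adjusted_r2(columns, y):
    """Adjusted R^2 of an OLS fit (with intercept) of y on the columns."""
    n, k = len(y), len(columns)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k + 1)]
           for i in range(k + 1)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k + 1)]
    beta = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - (ss_res / ss_tot) * (n - 1) / (n - k - 1)


def backward_eliminate(predictors, y):
    """Start from the full model and drop one predictor per step while
    adjusted R^2 does not decrease (assumed criterion)."""
    kept = dict(predictors)
    best = adjusted_r2(list(kept.values()), y)
    while len(kept) > 1:
        trials = {
            name: adjusted_r2([c for other, c in kept.items() if other != name], y)
            for name in kept
        }
        candidate = max(trials, key=trials.get)
        if trials[candidate] < best:
            break  # every removal makes the fit worse: stop
        best = trials[candidate]
        del kept[candidate]
    return list(kept), best


# Toy data: y depends on x1 only; x2 and x3 are unrelated to the noise.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
noise = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
y = [2 * a + e for a, e in zip(x1, noise)]
predictors = {
    "x1": x1,
    "x2": [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "x3": [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
}
kept, fit = backward_eliminate(predictors, y)
print(kept)  # ['x1']
```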

Because of missing data along some of the demographic variables, these models only included 60 participants from our sample (20 subjects per group). Therefore, to estimate curve fit along our whole sample we performed three further independent linear regression models by inserting years of music playing as the only predictor of, respectively, FSIQ, WMI, and Stroop. Lastly, to estimate the partial correlation of each of the cognitive measures to musical practice, FSIQ, WMI, and Stroop were included in the same model and regressed against years of music playing. Models' effect sizes are always reported as adjusted R<sup>2</sup>.

## RESULTS

## Musicians, Amateurs and Non-musicians

As compared to amateurs, musicians had spent more years and hours practicing an instrument. Musical background information for amateurs and musicians is provided in **Table 2**, along with mean, SD and the associated p, B, partial correlation values and 95% confidence interval (CI) for the group comparisons.

Despite the absence of differences in background variables, musicians performed better in all cognitive tests as compared to the other groups, as shown in the histogram in **Figure 1** and in **Table 4**. The MANOVA performed to compare participants' FSIQ, WMI, and Stroop cognitive scores showed a significant group effect: Pillai's Trace [F(2,98) = 2.889, p = 0.01]. The tests of between-subjects effects showed significant group differences in WAIS-FSIQ [F(2,98) = 4.00, p = 0.021, adjusted R<sup>2</sup> = 0.057], WMS-WMI [F(2,98) = 4.11, p = 0.019, adjusted R<sup>2</sup> = 0.059], and Stroop [F(2,98) = 6.68, p = 0.002, adjusted R<sup>2</sup> = 0.102], as provided in **Table 4** (on the left side). These adjusted R<sup>2</sup> effect sizes are moderate (Cohen, 1988). A separate ANOVA model was performed to assess group differences along VIQ and PIQ and showed significant differences for the former only: VIQ [F(2,98) = 3.46, p = 0.035]; PIQ [F(2,98) = 1.95, p = 0.148]. Results are provided on the right side of **Table 4**.

Post hoc tests with Bonferroni correction showed significantly better scores for musicians than for non-musicians in all the different tests: FSIQ (p = 0.018), WMI (p = 0.018), Stroop (p = 0.001), and VIQ (p = 0.030), as provided in **Table 4**. In turn, amateurs did not differ significantly from musicians or non-musicians in any of the tests.

To deepen our understanding of the relationship between musical training and cognitive abilities, we performed stepwise backward linear regression modeling by inserting 8 demographic variables (age, years of music playing, years of general education and personality indices of neuroticism, extraversion, openness to experience, agreeableness and conscientiousness) as predictors of FSIQ, WMI, and Stroop (independently). In all cases, the backward regressions generated 7 models in which all the variables mentioned above were excluded one by one except for years of musical practice, which was, in all cases, the only significant factor associated with FSIQ [F(1,59) = 6.76, p = 0.012, partial correlation = 0.321, β = 0.321], WMI [F(1,59) = 7.23, p = 0.009, partial correlation = 0.330, β = 0.330], and Stroop [F(1,59) = 6.81, p = 0.012, partial correlation = −0.324, β = −0.324].

This approach showed that (i) when holding the other background variables constant, years of music playing was the only factor significantly associated with the three cognitive measures (FSIQ, WMI, and Stroop). In addition, by excluding the other factors from the model, we found that (ii) years of music playing was the only predictor necessary to significantly explain the variance in the dependent variables.

Because of missing values in some background variables, not all of the participants were included in these regression models. When regressed independently against years of music playing, WMI and Stroop showed a significant association, whereas FSIQ did not reach significance: WMI [F(1,99) = 7.80, p = 0.006, adjusted R<sup>2</sup> = 0.064, β = 0.27], Stroop [F(1,99) = 8.46, p = 0.004, adjusted R<sup>2</sup> = 0.069, β = −0.28], and FSIQ [F(1,99) = 3.37, p = 0.069, adjusted R<sup>2</sup> = 0.023, β = 0.18]. **Figure 2** represents the curve fit for years of music playing with WMI and Stroop: WMI showed a positive association with music playing duration, whereas Stroop showed a negative relation.
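As a consistency check on the values above, adjusted R<sup>2</sup> can be recovered from an F statistic and its degrees of freedom using the standard identities for an ordinary least squares model with an intercept: R<sup>2</sup> = df<sub>1</sub>F / (df<sub>1</sub>F + df<sub>2</sub>), and adjusted R<sup>2</sup> = 1 − (1 − R<sup>2</sup>)(df<sub>1</sub> + df<sub>2</sub>) / df<sub>2</sub>. The function name below is hypothetical; the inputs are the F values reported in the text.

```python
def adjusted_r2_from_f(f_value, df_model, df_resid):
    """Adjusted R^2 of an OLS model (with intercept) from its overall F test."""
    r2 = (df_model * f_value) / (df_model * f_value + df_resid)
    # n - 1 = df_model + df_resid and n - k - 1 = df_resid
    return 1 - (1 - r2) * (df_model + df_resid) / df_resid


for label, f_value in [("WMI", 7.80), ("Stroop", 8.46), ("FSIQ", 3.37)]:
    print(label, round(adjusted_r2_from_f(f_value, 1, 99), 3))
# WMI 0.064
# Stroop 0.069
# FSIQ 0.023
```

The recovered values match the adjusted R<sup>2</sup> figures reported for the three single-predictor models.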

To test for the unique association of each of the cognitive variables with years of music playing, an additional model included the three cognitive variables (FSIQ, WMI, and Stroop) and regressed them against years of music playing. This model was significant [F(3,97) = 4.55, p = 0.005, adjusted R<sup>2</sup> = 0.096]. When holding the other cognitive variables constant, WMI and Stroop were still significantly associated with years of musical playing, whereas FSIQ was not: WMI (p = 0.041, β = 0.24), Stroop (p = 0.022, β = −0.24), and FSIQ (p = 0.70, β = −0.05). Standardized coefficients, significance levels, 95% CI and partial correlation values of the regression model for the three cognitive variables (FSIQ, WMI, and Stroop) are provided in **Table 5**. Furthermore, we repeated the regression excluding the zero values for the variable Years of Music Playing. This operation resulted in the exclusion of 36 non-musicians scoring less than 1 in the square-root transformed variable years of music playing, and hence in a substantial decrease of statistical power. In spite of this, we could still observe tendencies for an association between the cognitive measures of FSIQ, WMI, and Stroop and Years of Music Playing: WMI [F(1,64) = 2.90, p = 0.094, adjusted R<sup>2</sup> = 0.028, β = 0.208], Stroop [F(1,64) = 4.40, p = 0.04, adjusted R<sup>2</sup> = 0.05, β = −0.25], and FSIQ [F(1,64) = 3.55, p = 0.064, adjusted R<sup>2</sup> = 0.038, β = 0.23].

### DISCUSSION

The aim of the present study was to investigate the relationship between musical training and higher-order cognitive functions. Although several studies have highlighted anatomical and functional differences between the brains of expert musicians and non-musicians, only a few studies have investigated intelligence and executive functions in adults with long-term musical training while controlling for possible confounding variables. Our results contribute to the literature by showing that adults exposed to professional long-term musical training outperform adult non-musicians in standardized cognitive tasks designed to measure general intelligence (g), WM and attentive abilities and that these group differences are not associated with any of the examined background variables except for duration of musical playing.

#### TABLE 4 | Mean and statistical group comparisons along cognitive abilities.

Top: means, standard errors and mean difference for the scores of Verbal Intelligence Quotient (VIQ), Performance Intelligence Quotient (PIQ), Full-Scale Intelligence Quotient (FSIQ), Working Memory Index (WMI), and Stroop for participants grouped into non-musicians, amateurs and musicians. Note that Stroop values refer to reaction times: smaller reaction times indicate better performance. Bottom: on the left side, results of the multivariate analysis of variance (MANOVA) performed to compare musicians (M), amateurs (A) and non-musicians (NM), reporting significant differences along Full-Scale IQ (WAIS-FSIQ), Working Memory Index (WMS-WMI), and Stroop test. On the right side, results of a further ANOVA showing the differences between VIQ and PIQ scores. Post hoc comparisons and Bonferroni-corrected p-values are provided at the end of the table for both models. Significant statistical values are reported in bold.

FIGURE 2 | Curve fit showing, on the left, the positive relationship between the variable Years of Music Playing (as square root transformed variable) and Working Memory Index score from the Wechsler Memory Scale III (WMS-III) test. On the right, the negative relationship between Years of Music Playing and Stroop Index score. Y-axis corresponds to the individual WMI and Stroop scores, respectively.

TABLE 5 | Regression model for cognitive abilities. Standardized coefficients for the regression model where Stroop, FSIQ and WMI were included and regressed against the variable "years of music playing."


Beta and t-values, as well as significance levels and 95% confidence intervals (CI) are provided for each of the variables when holding constant the others. Significant statistical values are reported in bold.

Specifically, when testing group differences with analysis of variance we found significantly better performance in musicians compared to non-musicians in the cognitive tests' general indices of WMI, FSIQ, and Stroop. Slightly weaker effects were obtained in one subscale of the WAIS-III, the VIQ. Moreover, by using regression models we noticed an association between all participants' cognitive abilities and years of music playing, which, however, did not explain all of the observed variance. Notably, this association might not necessarily be linear because when removing the participants completely lacking any musical background, the significance threshold of the association was not reached. Nevertheless, overall these findings converge to demonstrate a positive relationship between musical training and cognitive functions. As proposed by Schellenberg (2006), this relationship could be seen as a continuum dependent on the duration and intensity of training. These findings are in line with previous research (both correlational and experimental) showing associations between musical training and intelligence measures (Chan et al., 1998; Gromko and Poorman, 1998; Cheek and Smith, 1999; Hetland, 2000; Brandler and Rammsayer, 2003; Brochard et al., 2004; Schellenberg, 2004, 2006; Swaminathan et al., 2017, 2018). For instance, a previous study with children conducted by Schellenberg (2006) reported an association between musical training and cognitive abilities (VIQ, FSIQ). Lastly, the increased VIQ we found when comparing musicians to non-musicians might be in line with previous findings, which associated music training with improvements in verbal abilities, such as reading (Anvari et al., 2002; Kraus et al., 2014) and phonology (Francois et al., 2015; see Moreno, 2009 for a review).
However, unlike previous studies, we show that these effects are not affected by potential confounding variables: indeed, by controlling for age, education years, SES and personality variables, we demonstrate that the relationship between executive functions and years of music playing is statistically significant. In particular, although not explaining all of the variance, we found a positive effect for WM: the longer the musical practice, the higher the WM functions. In turn, Stroop attentive scores show a negative slope: the longer the musical practice, the shorter the interference time and the greater the attentive abilities.

Together with previous studies, our results allow us to argue that cognitive benefits associated with music practice might be evident along the lifespan. In accordance with previous evidence, we argue that 'widespread effects of musical training on cognitive processing might occur because music lessons train attentional and executive functioning, which benefit almost all cognitive tasks' (Hannon and Trainor, 2007). It is important to consider, though, that additional variables not considered in the present study, such as genetic factors and parental personality traits, might have had a relevant influence on the choice of our participants to engage in and persist with musical training (Mosing et al., 2014; Mosing and Ullen, 2018), as well as in the development of their cognitive abilities, as pointed out by previous studies (Butkovic et al., 2015; Corrigall and Schellenberg, 2015). Moreover, cognitive advantages might be evident particularly for those individuals who take music lessons and/or play music in addition to other academic and studying activities. Indeed, in a previous study the cognitive effects of musical training were not visible in participants who studied only music (Schellenberg, 2011). In contrast, all participants in our sample were recruited among university students and qualified professionals and reported a mean IQ higher than the average Finnish population. This long academic background might be the key difference between the present and previous studies. Because musicians differed from the other participants only regarding their musical expertise and given the positive relation between executive functions (WM, Stroop-derived attentive score) and length of musical training, our results suggest that cognitive abilities might be influenced by musical practice.

We suggest that the observed differences in cognitive performance might represent the behavioral manifestations of brain differences identified when comparing musicians with non-musicians. Indeed, neuroimaging and neurophysiological studies showed stronger and faster neural responses (especially to sounds) and enlarged neuroanatomical structures in musicians as compared with non-musicians, particularly in (pre)motor, auditory, prefrontal and visual regions (Münte et al., 2002; Gaser and Schlaug, 2003; Pantev et al., 2003; Zatorre and McGill, 2005; Hyde et al., 2009; Zuk et al., 2014; Baer et al., 2015; Bonetti et al., 2017, 2018). These modifications of brain functionality and anatomy have been associated with use-dependent regional growth of neuronal cells engaged throughout the training and their structural adaptation in response to the intense environmental demands of music practice (for reviews, see Kolb and Muhammad, 2014; Reybrouck and Brattico, 2015).

To conclude, our study highlights an association between musical training and cognitive abilities. We showed that adult participants with similar educational background but varying in their musical expertise exhibited differences in intelligence, working memory and attentive abilities and that executive functions are significantly associated with the duration of music practice.

#### ETHICS STATEMENT

Coordinating Ethics Committee of the Hospital District of Helsinki and Uusimaa (approval number: 315/13/03/00/11, obtained on March the 11th, 2012).

#### AUTHOR CONTRIBUTIONS

EB conceived the study, prepared the ethics permission, coordinated the data collection, designed the initial statistical analyses, and fully edited the final version of the manuscript. AC conducted the final statistical analyses together with LB and wrote the initial version of the manuscript. MK contributed to the data collection and experimental design, coordinated the data scoring, and ensured the data quality. TS selected the psychological tests, and supervised the data collection and scoring. All authors contributed to writing and approved the final version of the manuscript.

#### FUNDING

The study was supported by various funds from the University of Helsinki, Aalto University, and the Academy of Finland. The Center for Music in the Brain (MIB) is supported by the Danish National Research Foundation (DNRF 117).

#### ACKNOWLEDGMENTS

We wish to thank the entire team who contributed to the "Tunteet" data collection: Brigitte Bogert, Benjamin Gold, Johanna Normström, Taru Numminen-Kontti, Mikko Heimölä, Toni Auranen, Marita Kattelus, Jyrki Mäkelä, Mikko Sams, Petri Toivainen, Mari Tervaniemi, Anja Thiede, and Alessio Falco.

#### REFERENCES

impact of community music lessons in at-risk children. Front. Neurosci. 8:351. doi: 10.3389/fnins.2014.00351



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Criscuolo, Bonetti, Särkämö, Kliuchko and Brattico. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Auditory T-Complex Reveals Reduced Neural Activities in the Right Auditory Cortex in Musicians With Absolute Pitch

#### Masato Matsuda, Hironaka Igarashi and Kosuke Itoh\*

Center for Integrated Human Brain Science, Brain Research Institute, Niigata University, Niigata, Japan

Absolute pitch (AP) is the ability to identify the pitch names of arbitrary musical tones without being given a reference pitch. The acquisition of AP typically requires early musical training, the critical time window for which is similar to that for the acquisition of a first language. This study investigated the left–right asymmetry of the auditory cortical functions responsible for AP by focusing on the T-complex of auditory evoked potentials (AEPs), which shows morphological changes during the critical period for language acquisition. AEPs evoked by a pure-tone stimulus were recorded in high-AP musicians, low-AP musicians, and non-musicians (n = 19 each). A balanced non-cephalic electrode (BNE) reference was used to examine the left–right asymmetry of the N1a and N1c components of the T-complex. As a result, a left-dominant N1c was observed only in the high-AP musician group, indicating "AP negativity," which has previously been described as an electrophysiological marker of AP. Notably, this hemispheric asymmetry was due to a diminution of the right N1c rather than enhancement of the left N1c. A left-dominant N1a was found in both musician groups, irrespective of AP. N1c and N1a exhibited no left–right asymmetry in non-musicians. Hence, music training and the acquisition of AP are both accompanied by a left-dominant hemispheric specialization of auditory cortical functions, as indexed by N1a and N1c, respectively, but the N1c asymmetry in AP possessors was due to reduced neural activities in the right hemisphere. The use of a BNE is recommended for evaluating these radially oriented components of the T-complex.

Keywords: music training, language, auditory evoked cortical potentials, brain maturation and development, balanced non-cephalic electrode

#### Edited by:

Claude Alain, Rotman Research Institute (RRI), Canada

#### Reviewed by:

Peter Schneider, Heidelberg University, Germany

Gavin M. Bidelman, The University of Memphis, United States

> \*Correspondence: Kosuke Itoh itoh@bri.niigata-u.ac.jp

#### Specialty section:

This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience

> Received: 11 March 2019 Accepted: 22 July 2019 Published: 06 August 2019

#### Citation:

Matsuda M, Igarashi H and Itoh K (2019) Auditory T-Complex Reveals Reduced Neural Activities in the Right Auditory Cortex in Musicians With Absolute Pitch. Front. Neurosci. 13:809. doi: 10.3389/fnins.2019.00809

## INTRODUCTION

Absolute pitch (AP) is the ability to identify pitch names of arbitrary musical tones without being given a reference pitch (Miyazaki, 1988, 1990; Takeuchi and Hulse, 1993). In addition to a possible genetic predisposition (Baharloo et al., 2000), the acquisition of AP usually requires early musical training (<7 years of age; Miyazaki, 1988; Takeuchi and Hulse, 1993; Miyazaki and Ogawa, 2006), a critical period similar to that for the acquisition of a first language (Newport, 1990; Kuhl, 2000; Russo et al., 2003; Vitouch, 2003; Deutsch et al., 2006). Specialized neural circuitries that underpin AP are hypothesized to be established through early musical training (Pantev et al., 1998; Ohnishi et al., 2001), the timing of which is coincident with the maturation of language-related brain functions (Krashen, 1973; Newport, 1990; Kuhl, 2000). These parallels between AP and language are consistent with the view that AP is essentially a verbal function for labeling pitches with their names (Itoh et al., 2005).

Cerebral hemispheric asymmetry is a hallmark of language, and also of AP. AP possessors show left-dominant neural responses to both speech and music stimuli (Ohnishi et al., 2001; Itoh et al., 2005; Schulze et al., 2009, 2013; Wilson et al., 2009; Oechslin et al., 2010b), whereas non-possessors typically show left-dominant neural activities for speech stimuli alone. The electromagnetic sources of neural responses to musical pitches are asymmetrically localized in AP possessors but symmetrically localized in non-possessors (Hirata et al., 1999). Functional connectivity in the left superior temporal gyrus is enhanced in AP possessors (Loui et al., 2012). The fractional anisotropy of the superior longitudinal fasciculus, which is a major fiber tract that is crucial for language and music functions, is left-dominant in musicians with AP, whereas it is symmetrical in musicians with relative pitch (Oechslin et al., 2010a). Volumetric studies have also shown that AP is associated with greater left-dominant asymmetry of gray matter volume of the perisylvian brain areas (Schlaug et al., 1995; Keenan et al., 2001; Ohnishi et al., 2001; Wilson et al., 2009), which was due to either a reduction of the gray matter volume in the right hemisphere (Schlaug et al., 1995; Keenan et al., 2001; Wilson et al., 2009), or its increase in the left hemisphere (Zatorre et al., 1998). However, other studies have reported right-dominant neural activities (Hirose et al., 2005; Wengenroth et al., 2014; Burkhard et al., 2019), increased right auditory cortical volume (Wengenroth et al., 2014), or higher myelination associated with greater functional connectivity in the right auditory cortex (Kim and Knösche, 2016, 2017) in AP possessors. Integrating all these findings into a coherent hypothesis remains difficult.
Nevertheless, the available evidence is consistent with the view that the typical hemispheric specialization of cerebral cortical functions for auditory processing is altered in AP possessors, the details of which remain to be elucidated.

This study analyzed the left–right asymmetry of the auditory cortical functions in AP possessors by focusing on the T-complex of auditory evoked potentials (AEPs). The T-complex is a subcomponent of the auditory N1 response, which has multiple generators in the auditory and other brain regions (Näätänen and Picton, 1987). Regarding the auditory cortical generators, two categories of sources can be distinguished on their dipole orientation, tangential or radial (Scherg et al., 1989; Woods, 1995). The tangential sources are located on the superior temporal plane, and the generated electrical potentials project vertically to the fronto-central scalp: the N1b (∼100 ms), or the vertex N1, is the most representative example. The radial sources are located on the lateral surface of the temporal lobe, and therefore, the electrical potentials project laterally to the temporal scalp. The T-complex, which is the focus of this study, is a radial component that consists of three peaks that are recorded over the temporal scalp: N1a (75–95 ms), Ta (100–115 ms), and N1c (130–170 ms) (Näätänen and Picton, 1987; Woods, 1995). The sources of T-complex have been estimated in the secondary or higher auditory cortex of the superior temporal gyrus (Scherg et al., 1989; Woods, 1995; Shahin et al., 2003).

The T-complex is a potentially promising index for investigating the hemispheric asymmetry of auditory cortical functions in AP possessors. First, as these potentials originate from radially oriented sources in the temporal lobe, the waves recorded over the left and right temporal scalp reflect neural activities in the left and right auditory cortices, respectively (Knight et al., 1988). Clear separation of left and right neural activities would be difficult with the tangential components of AEP that are distributed maximally along the midline, such as the N1b. Second, and more importantly, the waveform of the T-complex shows morphological changes during the critical period of language acquisition (Pang and Taylor, 2000; Ponton et al., 2002; Tonnquist-Uhlen et al., 2003; Rinker et al., 2017), which likely reflect maturational changes in auditory cortical functions, and it is also affected by music training in childhood (Shahin et al., 2003). Hence, the functional hemispheric asymmetry for AP may manifest as an altered morphology of the T-complex.

In fact, a previously identified electrophysiological marker of AP, or "AP negativity" (Itoh et al., 2005), closely resembles the N1c component of the T-complex in terms of polarity, amplitude, latency, and scalp distribution. AP negativity is a negative AEP of approximately 1 µV in amplitude that peaks at approximately 150 ms in latency over the posterior temporal scalp (when recorded with a linked earlobe reference). Its amplitude is greater (i.e., more negative) in the left hemisphere, in accordance with the reported left-dominant pitch processing in AP (Brancucci et al., 2009; Wilson et al., 2009). However, the original study by Itoh et al. (2005) had several issues. First, a linked earlobe reference was used, which was not optimal for evaluating the T-complex; electric potentials from radially oriented sources could have "activated" the earlobe electrodes and thus might have altered the wave morphology. Second, the number of trials for averaging was relatively small (90 trials) for evaluating the small potential of approximately 1 µV. Furthermore, no subsequent study has yet replicated the elicitation of AP negativity in possessors of AP, to the best of our knowledge.

Therefore, we examined the morphology of the T-complex in AP possessors with a particular focus on the left–right asymmetry of the N1c component with an appropriate study design. Many trials (n = 300) were used to record the T-complex and evaluate its amplitudes with high reliability. To obviate the confounding effects of general music training, three groups of participants were used: musicians with high levels of AP (high-AP musicians), musicians with low-levels of AP (low-AP musicians), and non-musicians. Comparisons between the high-AP and low-AP musicians would delineate the effects of AP, whereas comparisons of the two musician groups with the nonmusicians would identify the general effects of music training that are not specific to AP.

Furthermore, and importantly, a balanced non-cephalic electrode (BNE; Stephenson and Gibbs, 1951) was employed as the reference for evaluating the AEP amplitudes. Since no electrodes placed on the head (including those on the earlobes) can be assumed to be 'quiet' with respect to cortical potentials, measurements of AEP amplitude are always confounded by neural activities recorded at the cephalic reference channels. This poses a serious problem when the mastoid or earlobe electrodes are used as a reference (or a part of the reference) to evaluate the T-complex, because the radial sources of the T-complex are expected to "activate" these reference channels. The BNE method ameliorates this problem by using a non-cephalic reference that is placed outside the head. Contamination of the non-cephalic channel by electrocardiograms (ECGs) is eliminated by pairing two electrodes that are placed anteriorly and posteriorly at the base of the neck. As the ECGs recorded from these electrodes have approximately the same magnitude but opposite polarities (with respect to the scalp), the cardiac signals can be canceled by balancing the electrical resistance between these two non-cephalic electrodes. To our knowledge, this is the first experiment to use the BNE reference to record electroencephalograms (EEGs) in AP possessors.

## MATERIALS AND METHODS

### Participants

Fifty-seven healthy undergraduate students (age: 18–26 years, 15 males) participated in this study, after providing written informed consent. The participants were predominantly female because female AP possessors were more readily available to us. This was a limitation of the study, as there is some evidence for gender differences in hemispheric lateralization related to music and AP (Luders et al., 2004; Merrett et al., 2013). Nevertheless, our results were apparently not affected by the gender of the participants (see "Results"). All participants were right-handed, as confirmed using the Edinburgh Handedness Inventory (Oldfield, 1971).

The participants were categorized into three groups according to their levels of music training and AP ability: high-AP musicians (n = 19, 3 males), low-AP musicians (n = 19, 5 males), and non-musicians (n = 19, 7 males). All participants had received basic-level music education as one of the curriculum requirements in Japan. Therefore, the level of music training in this study was defined in terms of the number of years in music training given by professional music teachers outside standard school education. With this definition, the average (± standard deviation) years in music training was 15.2 (± 2.0) in the high-AP group and 13.8 (± 3.2) in the low-AP group; the difference was not significant (t = 1.5, degrees of freedom (df) = 36, P > 0.05). The age of commencement of music training was also comparable between the high-AP musicians (4.3 ± 1.2 years old) and the low-AP musicians (4.9 ± 1.5 years old; t = 1.4, df = 36, P > 0.05). The non-musicians had received less than 5 years of music training, and the average was 0.9 (± 1.5) years.

The participant's AP ability was evaluated using a previously described pitch naming test (Itoh et al., 2005), in which they were instructed to identify pitch classes (e.g., C, C#, and D, without distinguishing octaves) of 60 random piano tones covering five octaves. The AP test was essentially identical to the AP test developed by Miyazaki (1988), which has been validated by many experiments (e.g., Miyazaki, 1990; Itoh et al., 2003, 2005; Miyazaki and Ogawa, 2006; Miyazaki et al., 2012; Itoh and Nakada, 2018). Critical steps were taken to prevent the use of relative pitch. First, no reference tone was presented at any point of the test, and feedback on the accuracy of the participant's responses was not provided. Therefore, the participants had to use their own internal long-term memory to correctly identify the pitch class. Second, the sequence of the test tones was randomized to make the use of relative pitch difficult. Finally, the inter-trial interval was set relatively short. The test sounds were presented every 5 s and the duration of sounds was approximately 1 s long; therefore, the notes had to be identified within a response time window of approximately 4 s. The criterion for high-AP musicians was 90% correct or higher, and for low-AP musicians, it was 40% or lower.
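The grouping criteria above can be written down as a small scoring routine. The following is a minimal sketch, not the authors' scoring code: the function names are hypothetical, and it assumes that a missing response counts as an error and that musicians scoring between the two criteria fall into neither group.

```python
from typing import List, Optional

HIGH_AP_MIN_PERCENT = 90.0  # criterion for high-AP musicians (90% or higher)
LOW_AP_MAX_PERCENT = 40.0   # criterion for low-AP musicians (40% or lower)


def percent_correct(responses: List[Optional[str]], answers: List[str]) -> float:
    """Percentage of tones whose named pitch class matches the presented one.

    A response of None (no answer within the response window) is scored
    as incorrect; this is an assumption, not stated in the text.
    """
    if not answers or len(responses) != len(answers):
        raise ValueError("responses and answers must be non-empty and equal in length")
    hits = sum(1 for r, a in zip(responses, answers) if r == a)
    return 100.0 * hits / len(answers)


def classify_musician(score_percent: float) -> Optional[str]:
    """Map a pitch-naming score to the musician group used in the study."""
    if score_percent >= HIGH_AP_MIN_PERCENT:
        return "high-AP"
    if score_percent <= LOW_AP_MAX_PERCENT:
        return "low-AP"
    return None  # intermediate scores meet neither criterion
```

With 60 test tones, the 90% criterion corresponds to at least 54 correct answers and the 40% criterion to at most 24.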

This study conformed to The Code of Ethics of the World Medical Association (Declaration of Helsinki), and was conducted in accordance with the human research guidelines of the Internal Review Board of the University of Niigata.

### Stimuli and Procedure

The stimulus was a sinusoidal tone of 1046.5 Hz, which corresponded to the frequency of C6 (American notation). A single, fixed frequency was used, because the use of multiple frequencies introduces pitch changes, neural responses to which might confound the results by the mechanisms of neural adaptation and/or mismatch negativity (Tervaniemi et al., 1993; Elmer et al., 2015; Rogenmoser et al., 2015; Greber et al., 2018). With a repeated presentation of the note, the participants could perceive it as the keynote of the stimulus sequence, but it required AP to identify the specific note as we did not provide any information about the stimulus. The sound had a duration of 350 ms (10-ms rise-time and 50-ms fall-time). The stimulus was presented monaurally (left-ear and right-ear) or binaurally in a random order via air-conduction insert earphones (Eartone 3A, Etymotic Research, Elk Grove Village, IL, United States) with a stimulus intensity of 65 dB SL. A total of 900 tones (300 tones for each ear condition) were presented using randomly varying stimulus onset asynchrony in the range of 1700–1900 ms (mean: 1800 ms). Only binaural AEPs were analyzed in this study, as it is more natural than monaural listening situations; we seldom receive completely lateralized monaural stimulations while listening to music. The monaural data will be analyzed in a subsequent paper.
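For readers who wish to reproduce a comparable stimulus, the parameters above can be sketched in code. This is an illustration under stated assumptions rather than the original stimulus script: the 44.1 kHz sampling rate, the linear shape of the rise and fall ramps, and the uniform distribution of the onset asynchronies are all assumptions, and intensity calibration (65 dB SL) is not modeled.

```python
import math
import random


def c6_frequency(a4_hz: float = 440.0) -> float:
    """Equal-tempered C6, which lies 15 semitones above A4."""
    return a4_hz * 2 ** (15 / 12)


def make_tone(freq_hz=1046.5, dur_s=0.350, rise_s=0.010, fall_s=0.050, fs=44100):
    """Sinusoid with linear onset and offset ramps (ramp shape is assumed)."""
    n = round(dur_s * fs)
    n_rise = round(rise_s * fs)
    n_fall = round(fall_s * fs)
    samples = []
    for i in range(n):
        if i < n_rise:
            gain = i / n_rise              # ramp up from 0
        elif i >= n - n_fall:
            gain = (n - 1 - i) / n_fall    # ramp down to 0 at the last sample
        else:
            gain = 1.0
        samples.append(gain * math.sin(2 * math.pi * freq_hz * i / fs))
    return samples


def draw_soas(n_trials, lo_ms=1700.0, hi_ms=1900.0, seed=0):
    """Stimulus onset asynchronies drawn uniformly from [lo_ms, hi_ms]."""
    rng = random.Random(seed)
    return [rng.uniform(lo_ms, hi_ms) for _ in range(n_trials)]
```

The helper `c6_frequency` only confirms that 1046.5 Hz is the equal-tempered C6 when A4 is tuned to 440 Hz.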

The participants sat in a comfortable chair in a temperature-controlled and sound-attenuated room during the EEG recordings. They listened to the sounds passively while playing a video game (Nintendo DS, Nintendo Co., Ltd., Kyoto, Japan) to maintain wakefulness and prevent the participants from explicitly labeling the tones.

### EEG Recording and Analysis

Twenty-two Ag electrodes were applied to the scalp of each participant according to the international 10–20 system (Jasper, 1958), and the electrodes were positioned at Fp1, Fp2, Fz, F3, F4, F7, F8, Cz, C3, C4, T3, T4, CPz, Pz, P3, P4, T5, T6, O1, O2, and the left and right earlobes. Horizontal and vertical electro-oculograms (EOGs) were also recorded. In addition, two electrodes were placed on the right sternoclavicular junction and the tip of the seventh cervical spine for recording BNE data (Stephenson and Gibbs, 1951). All electrodes were referenced to CPz while collecting data. The EEGs and EOGs were amplified by a SynAmp amplifier (Neuroscan Labs, El Paso, TX, United States) with 16-bit resolution, a gain of 5000, and an analog–digital conversion rate of 10 kHz, band-passed between 0.05 and 2000 Hz. Electrode impedance was kept below 5 kΩ.

After data acquisition, the EEG data were downsampled to 1 kHz, re-referenced to the BNE, segmented (from −100 to 200 ms relative to the stimulus onset), and baseline-corrected to the pre-stimulus period average. The segmented data were checked for artifacts using a threshold criterion of ± 100 µV applied to the Fp1, Fp2, F7, and F8 channels and to the horizontal and vertical EOG channels to remove ocular artifacts. The non-rejected data were averaged and time-locked to the stimulus onset to obtain AEPs for each participant, which were low-pass filtered at 50 Hz (48 dB/oct). The number of non-rejected data segments was in the range of 224–288 (mean 262) in the high-AP musicians, 223–296 (mean 267) in the low-AP musicians, and 220–300 (mean 267) in the non-musicians. Finally, the AEPs were averaged across participants to obtain grand average waveforms.
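The epoch-level steps described above (baseline correction, threshold-based rejection, and averaging) can be sketched as follows. This is a simplified illustration and not the software pipeline used in the study: the data layout (a list of trials, each holding one channel of interest plus the frontal and EOG check channels) and all function names are hypothetical, and downsampling, re-referencing, and the 50 Hz low-pass filter are omitted.

```python
def baseline_correct(epoch, n_baseline):
    """Subtract the mean of the first n_baseline (pre-stimulus) samples."""
    base = sum(epoch[:n_baseline]) / n_baseline
    return [v - base for v in epoch]


def is_clean(check_channels, threshold_uv=100.0):
    """True if no sample in any check channel exceeds +/- threshold_uv."""
    return all(abs(v) <= threshold_uv for ch in check_channels for v in ch)


def average_epochs(epochs):
    """Sample-by-sample mean across epochs of equal length."""
    n = len(epochs)
    return [sum(col) / n for col in zip(*epochs)]


def compute_aep(trials, n_baseline=100):
    """Average the artifact-free epochs of one channel.

    trials: list of dicts with keys
      'signal' - epoch of the channel of interest (e.g., T3), in microvolts
      'checks' - epochs of the channels used for rejection (frontal, EOG)
    n_baseline=100 corresponds to 100 ms of pre-stimulus data at 1 kHz.
    Returns (averaged waveform, number of epochs kept).
    """
    kept = []
    for trial in trials:
        checks = [baseline_correct(ch, n_baseline) for ch in trial["checks"]]
        if is_clean(checks):
            kept.append(baseline_correct(trial["signal"], n_baseline))
    if not kept:
        raise ValueError("all epochs were rejected")
    return average_epochs(kept), len(kept)
```

The second return value corresponds to the count of non-rejected segments reported per participant.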

The peak amplitudes of N1a and N1c at the bilateral temporal electrode sites (T3 and T4) were analyzed. The N1b measured at the central electrode site (Cz) was also analyzed for comparison. The Ta component was not analyzed because it was small in amplitude and difficult to measure. The grand average waveforms were used to define the time windows for N1a (70–80 ms at T3, 73–83 ms at T4), N1b (77–97 ms) and N1c (112–132 ms at T3, 119–139 ms at T4), and the peaks were defined as the most negative point in these time slots. These time windows were centered at the peak latency of the components as identified in the grand average waveforms and had a time width of 10 ms for N1a, and 20 ms for N1b and N1c.
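The windowed peak measurement described above amounts to finding the most negative sample within a component-specific latency range. A minimal sketch is given below; the function name is hypothetical, and it assumes a 1 kHz waveform starting at −100 ms and treats both window edges as inclusive, which the text does not specify.

```python
# Time windows (ms) taken from the text, keyed by (component, electrode).
WINDOWS_MS = {
    ("N1a", "T3"): (70, 80),
    ("N1a", "T4"): (73, 83),
    ("N1b", "Cz"): (77, 97),
    ("N1c", "T3"): (112, 132),
    ("N1c", "T4"): (119, 139),
}


def peak_in_window(waveform, window_ms, epoch_start_ms=-100, fs_hz=1000):
    """Return (amplitude, latency_ms) of the most negative sample in the window."""
    start_ms, end_ms = window_ms
    i0 = round((start_ms - epoch_start_ms) * fs_hz / 1000)
    i1 = round((end_ms - epoch_start_ms) * fs_hz / 1000)
    segment = waveform[i0:i1 + 1]  # inclusive of the window's last sample
    k = min(range(len(segment)), key=segment.__getitem__)
    latency_ms = epoch_start_ms + (i0 + k) * 1000 / fs_hz
    return segment[k], latency_ms
```

For example, `peak_in_window(aep_t3, WINDOWS_MS[("N1c", "T3")])` would return the N1c amplitude and latency for a participant's T3 waveform, where `aep_t3` is a placeholder for that waveform.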

The peak amplitudes of N1a and N1c were analyzed using a mixed-design two-way analysis of variance (ANOVA) with the group (high-AP musicians, low-AP musicians, and non-musicians) as the between-subjects factor and the hemisphere (T3 and T4) as the within-subjects factor. For the N1b amplitude, a one-way ANOVA with the group as the between-subjects factor was performed. An alpha level of P = 0.05 was used as the significance criterion. P-values were corrected for multiple comparisons by using the Bonferroni method wherever appropriate.
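As a reminder of what the Bonferroni method does, each raw P-value is multiplied by the number of comparisons in the family and capped at 1. The sketch below is illustrative only: the function name is hypothetical, and the text does not state how many comparisons were included in each family.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted P-values: p * m, capped at 1.0, for m comparisons."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

An adjusted value is then compared against the unadjusted alpha level of 0.05.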

## RESULTS

We first evaluated how the use of BNE reference affected the AEP waveforms. **Figure 1** compares the group-averaged AEPs for all channels, which were obtained using the BNE reference and the linked-earlobes reference. Substantial differences in the waveforms were observed at the temporal electrode sites. This was an expected result, because the radially oriented dipoles of the T-complex would produce electrical potentials at the earlobe electrodes (Wolpaw and Wood, 1982), which was also confirmed in our BNE-referenced data. When the linked-earlobes reference was used, the potentials at the earlobes were subtracted from the AEPs in the other channels to alter the wave morphologies. Specifically, the T-complex obtained with the linked-earlobes reference was artifactually reduced in amplitude when compared to that obtained with the BNE reference.

FIGURE 1 | Group-averaged AEP waveforms for all EEG channels, obtained using a BNE reference (red lines) or a linked-earlobes reference (black lines). The effect of reference was evident at the temporal electrode sites where the T-complex was recorded.
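The effect of an "active" reference described above can be illustrated with a toy calculation. The sketch assumes waveforms that are already expressed relative to a quiet non-cephalic reference; the function names and the numerical values are hypothetical and chosen only to show the direction of the effect.

```python
def linked_earlobes(left, right):
    """Mean of the left and right earlobe waveforms."""
    return [(l + r) / 2 for l, r in zip(left, right)]


def rereference(channels, ref):
    """Subtract the reference waveform from every channel."""
    return {name: [v - r for v, r in zip(wave, ref)]
            for name, wave in channels.items()}


# Hypothetical three-sample waveforms (microvolts); the middle sample is the peak.
bne_referenced = {"T3": [0.0, -4.0, 0.0], "Cz": [0.0, -1.0, 0.0]}
earlobe_left = [0.0, -2.0, 0.0]   # earlobes pick up part of the temporal negativity
earlobe_right = [0.0, -2.0, 0.0]

ear_referenced = rereference(
    bne_referenced, linked_earlobes(earlobe_left, earlobe_right)
)
```

In this toy case the T3 peak shrinks from −4 µV to −2 µV after re-referencing to the earlobes, and the waveform at Cz changes sign, which is the kind of distortion a non-cephalic reference is meant to avoid.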

Accordingly, we used the BNE reference to evaluate the left–right asymmetry of N1a and N1c amplitudes in high-AP musicians, low-AP musicians, and non-musicians. **Figure 2** plots the T-complex waveforms at the temporal electrode sites, where they were maximal. **Figure 3** shows the N1a and N1c amplitudes for all individual subjects. Four main findings were obtained.

First, only the high-AP musicians showed a left-dominant asymmetry in the N1c amplitude (**Figures 2**, **3**). The two-way ANOVA revealed a significant group × hemisphere interaction [F(2,54) = 3.4, P = 0.042] and follow-up one-way ANOVAs indicated a significant hemispheric asymmetry in high-AP musicians [F(1,54) = 6.4, P = 0.014] but not in low-AP musicians [F(1,54) = 0.3, P = 0.604] or in non-musicians [F(1,54) = 1.3, P = 0.264].

Second, the left-dominant asymmetry of N1c in high-AP musicians was due to a diminution of the right N1c rather than enhancement of the left N1c (**Figure 4**). When the N1c amplitude was analyzed using a one-way ANOVA with the group as a factor, the main effect was significant at the right temporal electrode [T4, F(2,54) = 5.1, P = 0.009] but not at the left temporal electrode [T3, F(2,54) = 0.8, P = 0.469]. Post hoc analyses at T4 indicated that the right N1c amplitude was significantly smaller in the high-AP musicians than in the low-AP musicians (P = 0.033, Bonferroni corrected) and non-musicians (P = 0.017, Bonferroni corrected).

Third, both high-AP musicians and low-AP musicians showed a left-dominant asymmetry in the N1a amplitude (**Figures 2**, **3**). The two-way ANOVA revealed a significant group × hemisphere interaction [F(2,54) = 3.3, P = 0.043], indicating that the effect of the hemisphere varied between the groups. Follow-up one-way ANOVAs indicated a significant main effect of the hemisphere in high-AP musicians [F(1,54) = 18.4, P < 0.001] and low-AP musicians [F(1,54) = 19.0, P < 0.001] but not in non-musicians [F(1,54) = 1.3, P = 0.251].

The above analyses were conducted by treating AP as a categorical variable. Considering that AP is a graded trait (Miyazaki, 1990; Bermudez and Zatorre, 2009), we additionally performed multiple regression analyses in which the N1c amplitudes were evaluated with respect to the continuous independent variables of AP test score (AP) and years in music training (Years). As a result, the model was a significant predictor of N1c at the right temporal electrode site T4 [F(2,35) = 7.1, P = 0.003]: The variable AP contributed significantly to the model (β = 0.562, t(35) = 3.8, P = 0.001), but Years did not (β = −0.123, t(35) = 0.8, P = 0.417). Regarding the left N1c at T3, the model was not a significant predictor of its amplitude [F(2,35) = 1.7, P = 0.206], with an R<sup>2</sup> of 0.086: The contribution of AP was not significant (β = 0.308, t(35) = 1.8, P = 0.078), and the contribution of Years was also not significant (β = −0.103, t(35) = 0.6, P = 0.549). These results were consistent with the above ANOVA findings regarding the N1c.
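As a consistency check on the regression statistics above, the overall F-statistic of a linear model with k predictors and n observations follows from R<sup>2</sup>. The calculation below assumes n = 38 (the two musician groups), which is inferred from the reported degrees of freedom rather than stated in the text.

$$
F(k,\, n-k-1) = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)}, \qquad
F(2, 35) = \frac{0.086 / 2}{(1 - 0.086)/35} \approx 1.65
$$

This agrees with the reported F(2,35) = 1.7 for the left N1c to within the rounding of R<sup>2</sup>. Inverting the same relation for the right N1c gives an implied, not reported, value:

$$
R^2 = \frac{k F}{k F + (n - k - 1)} = \frac{2 \times 7.1}{2 \times 7.1 + 35} \approx 0.29
$$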

No statistically significant effect was observed for the N1b amplitude. The one-way ANOVA with the group as a factor revealed that the main effect was not significant [Cz, F(2,54) = 1.7, P = 0.195].

One limitation of the experiment was that there were fewer male participants than female participants. Nevertheless, there were no apparent gender effects on the above findings, as could be appreciated in **Figure 3**.

## DISCUSSION

The present study focused on the left–right asymmetry of the T-complex to investigate hemispheric asymmetry of the auditory cortical functions underpinning AP. Compared to musicians who had low levels of AP, musicians with high levels of AP showed a greater left-dominant asymmetry of N1c amplitude. This represented AP negativity, which has previously been described as an electrophysiological marker of AP (Itoh et al., 2005). Additionally, the use of the BNE reference in this experiment revealed that the left-dominance of N1c was caused by a reduction of N1c amplitude in the right hemisphere rather than enhancement in the left hemisphere. This AP-specific effect on N1c was distinguishable from the more general effect of music training, which manifested as a left-dominant asymmetry of N1a.

FIGURE 3 | The N1a and N1c amplitudes for all individual participants. Within-subject data are connected with lines. An asterisk (<sup>∗</sup>) denotes a statistically significant difference (P < 0.05).

The N1c response to pure-tone stimuli is typically recorded at temporal electrode sites with no left-right asymmetry or with right-dominance in a normal population (Woods, 1995; Pang and Taylor, 2000). This well-known property of N1c was confirmed in non-musicians and musicians with low levels of AP. By contrast, only musicians with high levels of AP showed a left-dominant N1c, which closely resembled AP negativity (Itoh et al., 2005) in terms of polarity (negative), amplitude (approximately 2 µV), scalp distribution (left temporal), and latency (100–200 ms). The novel finding of this experiment was that the left-dominant asymmetry of N1c (or AP negativity) in the high-AP musician group was caused by a diminution of the right N1c, rather than enhancement of the left N1c. A distinct methodological advantage of the current experiment is that a non-cephalic reference was used. The present recording (**Figure 1**) showed that the earlobe electrodes were clearly activated by the T-complex as predicted, as the sources were radially oriented in the temporal lobes (Wolpaw and Wood, 1982). Therefore, it was likely that the use of the linked-earlobes reference in Itoh et al. (2005) altered the N1c amplitudes measured on the temporal scalp electrodes. We therefore recommend the use of non-cephalic references for future recordings of AP negativity.

Notably, the right N1c was diminished in musicians with high levels of AP. The sources of N1c have been estimated in the secondary or higher auditory cortex of the superior temporal gyrus (Scherg et al., 1989; Woods, 1995; Shahin et al., 2003). Thus, although scalp AEP amplitudes are affected by many factors including source location and orientation, one possible interpretation of our finding is that the pitch stimulus activated a smaller number of right auditory cortical neurons in the high-AP musician group than in the low-AP musician group. This hypothesis is in line with previous findings that the volume of the right planum temporale is smaller in AP possessors than in non-possessors (Schlaug et al., 1995; Keenan et al., 2001; Wilson et al., 2009), although Wengenroth et al. (2014) have reported an increased right auditory cortical volume in AP possessors, and Zatorre et al. (1998) have found an increased left planum polare volume in AP possessors. Considering that AP is typically acquired by early musical training, the functional and anatomical features of the right temporal lobe in AP possessors might be established in the course of brain maturation during which AP is acquired. In typical brain maturation, the N1c evoked by speech sounds gradually decreases in amplitude over the right hemisphere but not over the left hemisphere, commencing around the age of 7 years (Pang and Taylor, 2000). Moreover, the decrease in right N1c amplitude might be correlated with normal language development because the right N1c amplitude for speech sounds is apparently larger in children with language difficulties than in normal children (Groen et al., 2008). Because the core function of AP, or pitch labeling, is essentially verbal (Itoh et al., 2003), a common AEP component (i.e., right N1c) can reasonably be assumed to index both AP and language.

Two major hypotheses have been proposed regarding the neural mechanisms for AP: the early categorical perception hypothesis (Siegel, 1974) and the late labeling hypothesis (Levitin and Rogers, 2005). The early categorical perception hypothesis posits that the pitch labeling function of AP is subserved by some specialized neural circuitries in the auditory areas (Hirata et al., 1999; Itoh et al., 2005; Schulze et al., 2013), which is supported by anatomical findings that the auditory cortical structures in musicians with AP are organized differently from musicians without AP (Schlaug et al., 1995; Keenan et al., 2001; Wilson et al., 2009; Wengenroth et al., 2014; Kim and Knösche, 2016, 2017). By contrast, the late labeling hypothesis (also called the two-component model of AP) proposes that the labeling function of AP is subserved by late stages of cortical processing that occur outside the auditory areas (Zatorre et al., 1998; Levitin and Rogers, 2005). Specifically, the dorsolateral prefrontal cortex has been identified as the core brain structure that associates pitches with their names (Zatorre et al., 1998). This hypothesis is supported by studies that have found enhanced anatomical and functional connectivity between the auditory areas and the frontal lobe (Oechslin et al., 2010a; Elmer et al., 2015).

Previous AEP and event-related potential (ERP) experiments on this topic have yielded mixed results. An observation of the effect of AP on the late components of AEP/ERP (e.g., >200 ms), together with an absence of such effect on earlier components (e.g., <200 ms), is sometimes taken as evidence for the late labeling hypothesis (Tervaniemi et al., 1993; Rogenmoser et al., 2015). However, several AEP/ERP experiments have demonstrated AP-related differences in the early stages of auditory cortical processing (Hirata et al., 1999; Itoh et al., 2005; Wu et al., 2008; Coll et al., 2019), and our present results corroborate these findings.

In addition to the main findings regarding the neural correlates of AP, a left-dominant asymmetry of N1a was fortuitously identified as a novel electrophysiological marker of music training. Music training is known to enhance the tangentially oriented component of N1 (or its magnetic counterpart N1m) elicited by tone stimuli, which typically has a left-dominant distribution at a peak latency of approximately 100 ms (Pantev et al., 1998; Kuriki et al., 2006; Baumann et al., 2008; Itoh et al., 2012). Our result adds to these findings by revealing that the well-known enhancement of N1 (N1m) in musicians is preceded by left-dominant neural activities at an earlier stage of auditory cortical processing at approximately 80 ms. This is in line with the findings that plastic changes due to music training occur throughout the auditory pathway, including the subcortical (Musacchia et al., 2007) and cochlear (Bidelman et al., 2016) levels of processing.

## CONCLUSION

In conclusion, two main findings were obtained: (1) a left-dominant N1c (at approximately 130 ms) indexes AP, and (2) a left-dominant N1a (at approximately 80 ms) indexes music training. We conclude that the faculties for music and AP are both accompanied by a left-dominant hemispheric specialization of auditory cortical functions, but that they affect distinct stages of pitch processing in the human auditory cortex.

## DATA AVAILABILITY

The raw data supporting the conclusions of this manuscript will be made available by the authors, without undue reservation, to any qualified researcher.

## ETHICS STATEMENT

This human study was carried out in accordance with the recommendations of the Ethical Guidelines for Medical and Health Research Involving Human Subjects (Ministry of Education, Culture, Sports, Science, and Technology; Ministry of Health, Labor, and Welfare) with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Internal Review Board of the University of Niigata.

## AUTHOR CONTRIBUTIONS

MM and KI conceived and designed the study. MM and KI conducted the experiments. MM analyzed the data. All authors wrote the manuscript.

## FUNDING

This work was supported by JSPS KAKENHI (19H01770, 17K00202, and 19H05309).

## REFERENCES

vertex components of the human AEP. Electroencephalogr. Clin. Neurophysiol. 70, 499–509. doi: 10.1016/0013-4694(88)90148-90144



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Matsuda, Igarashi and Itoh. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## Comparing the Effects of Rhythm-Based Music Training and Pitch-Based Music Training on Executive Functions in Preschoolers

#### Ulrike Frischen\*, Gudrun Schwarzer and Franziska Degé

Department of Developmental Psychology, Faculty of Psychology and Sports Science, Justus-Liebig-University, Giessen, Germany

Previous research has indicated the beneficial effects of music training on executive functions (EFs) in children. However, researchers have not clearly determined which component of music training produces these beneficial effects or whether different components exert different effects on EFs. In the present study, we examined the impact of rhythm-based music training compared to pitch-based music training and sports training as a control on EFs in preschoolers. Children aged between 5 and 6 years (N = 76) were randomly assigned to one of the three training groups and received training in small groups three times a week for 20 min in German kindergartens. Before and after training, children completed tests designed to assess inhibition, set-shifting, and visuospatial working memory. Parental education, family income, personality, and IQ served as control variables. We observed a significant training group × time interaction for the measure of inhibition. Children from the rhythm group exhibited significant improvements in inhibition from pre- to post-tests (d<sub>RM</sub> = 0.56), whereas children from the other groups did not. Furthermore, children from the rhythm group significantly differed from the sports control group at post-test (d<sub>corr</sub> = 0.82). Concerning the measures of set-shifting and visuospatial working memory, the descriptive data revealed similar results; however, we did not observe significant training group × time interactions. Based on our findings, rhythm-based music training specifically enhances inhibition in preschoolers and might affect other EFs, such as set-shifting and visuospatial working memory.

Keywords: music training, rhythm, pitch, executive functions, inhibition, preschoolers

#### Edited by:

Assal Habibi, University of Southern California, United States

#### Reviewed by:

Adam Winsler, George Mason University, United States Heiner Rindermann, Technische Universität Chemnitz, Germany

#### \*Correspondence:

Ulrike Frischen ulrike.frischen@psychol.uni-giessen.de

Received: 15 May 2019 Accepted: 02 August 2019 Published: 27 August 2019

#### Citation:

Frischen U, Schwarzer G and Degé F (2019) Comparing the Effects of Rhythm-Based Music Training and Pitch-Based Music Training on Executive Functions in Preschoolers. Front. Integr. Neurosci. 13:41. doi: 10.3389/fnint.2019.00041

## INTRODUCTION

Playing music is one of the most challenging tasks for the human brain because it involves many cognitive processes simultaneously. Most of these processes require cognitive control, which is summarized into a concept known as executive functions (EFs). EFs are described as a family of top-down mental processes including goal-directed behavior, planning and problem solving, such as inhibition, set-shifting, and working memory (Diamond, 2013). Since EFs are associated with academic achievement, intelligence, health, and wellbeing, EFs should be promoted, even early in childhood (Diamond, 2013). Music training might be an effective intervention to improve EFs, because making music, such as playing a musical instrument or singing a song, involves several EFs at the same time without focusing on a particular EF. In the last decade, some studies have reported an association between music training and EFs in children (e.g., Degé et al., 2011a). Additionally, musical training has been shown to enhance EFs in children (e.g., Jaschke et al., 2018). In preschoolers, musical training improves inhibition (e.g., Bugos and DeMarie, 2017). However, researchers have not clearly determined which specific components of music training programs are responsible for these beneficial effects on EFs. Components of musical training, which might be crucial for EFs include rhythmic entrainment (Miendlarzewska and Trost, 2014) or melodic encoding. Therefore, in the present study, we aimed to investigate whether different types of music training improved EFs such as inhibition, set-shifting, and working memory in preschoolers compared to a control sports training program. In particular, we were interested in determining whether rhythm-based music training would produce different beneficial effects on EFs than pitch-based music training.

Many cognitive processes are activated in parallel when an individual plays a musical instrument. The musician must read the music and translate the notes into movements (e.g., finger movements when playing the piano). At the same time, the individual is listening to the music and for mistakes. Additionally, automatic responses must be inhibited when the key changes and different chromatic signs are played (inhibition). A shift between different dynamics, rhythms, and tempos within one piece of music is often required (set-shifting). Working memory capacities are involved while remembering musical excerpts. Furthermore, the processes become more complex when an individual plays with others, because in addition to controlling their own playing with all of these processes, musicians must adapt their playing to the sound and music of the other players. For example, a musician must shift his/her attention between parts (set-shifting) and inhibit the impulse to play another part while playing the assigned part (inhibition). These processes are only examples of the variety of processes the human brain must perform while an individual plays music. However, all these processes share the requirement for voluntary cognitive control, which involves EFs. According to Diamond (2013) and Miyake et al. (2000), the three core EFs are inhibition (not acting impulsively and suppressing prepotent responses), set-shifting (switching between tasks or mental sets), and working memory (monitoring and updating representations). EFs are associated with health, wealth, academic achievement and quality of life (Diamond, 2013), and studies have shown that EFs are trainable in children (for a review see Diamond and Lee, 2011) and adults (e.g., Richmond et al., 2011; Voss et al., 2011). Therefore, an opportunity to foster EFs in early childhood would be important to reduce disparities in EFs with the aim of affecting later academic achievement and well-being.
As Diamond and Lee (2011) stated, repeated practice and an increasing level of difficulty are crucial for the successful training of EFs. Hence, musical training, which is associated with practicing regularly and a continuously increasing level of difficulty, might represent a well-suited intervention to enhance EFs. Interestingly, and as described above, musical training affects several EFs at the same time.

Correlational research has observed positive associations between music lessons and EFs, showing that music lessons are associated with inhibition (Degé et al., 2011a; Joret et al., 2016), set-shifting, selective attention, planning (Degé et al., 2011a) and fluency (Zuk et al., 2014) in children. Regarding preschoolers, Winsler et al. (2011) observed better inhibitory control in children aged 4 years or older who were enrolled in early music classes than in their age-matched peers who had never taken early music classes (d = 0.28). Moreover, regardless of age, children who were currently taking early music classes outperformed children who were not taking early music classes (d = 0.41). The authors postulated that early music classes promote children's motor behaviors via guiding their movements with the aid of music. However, since these studies used a correlational design, they do not allow for inferences of causality. If we consider longitudinal studies of the association between music training and EFs, we also find some evidence that music training has the potential to enhance EFs in children. A study by Holochwost et al. (2017) indicated that intensive school-based music training improves performance on tasks assessing set-shifting (ds from 0.18 to 0.35), short-term memory (d = 0.25), and inhibition (ds from 0.40 to 0.57) in children from grades 1 to 8. Music training occurred daily and consisted of 40 min of instrumental music classes administered in a small group and 40 min of ensemble rehearsal. In this study, the music training group was compared to an untrained control group. Thus, the authors were unable to exclude the possibility that a Hawthorne or a schooling effect influenced their results. Jaschke et al. (2018) conducted another study using school-based instrumental music training. The authors compared the effects of weekly school-based music classes to school-based visual arts classes on 6-year-olds.
The music classes comprised music theory, joint singing, and playing instruments for 1 or 2 h per week for 2 years. In contrast to the study performed by Holochwost et al. (2017), the lessons were held with the entire class instead of small groups. Children from the music program were divided into two different groups, one group including children who did not have prior music knowledge and another group including children who were receiving private music lessons in addition to the school curriculum. The school-based music classes enhanced some, but not all, measures of EF. Both music groups exhibited improved performance on tests of inhibition and planning and outperformed the visual arts and no-arts control groups after 2 years of training. Regarding visual working memory, the visual arts group outperformed both music groups and the no-arts control group. However, as the children were not randomly assigned to the groups, the findings might have been affected by pre-existing differences between groups that were not controlled. According to Roden et al. (2012), school-based music training enhances verbal (ds from 0.65 to 1.27), but not visual memory in primary school children compared to natural science training. Children from the music group received 45 min of instrumental music training per week, conducted in a small group with a maximum of five children. In addition, they participated in singing, rhythm and pitch exercises. Another study investigating the effect of music training on working memory showed that 2 years of an extended school-based music curriculum enhanced visual (η<sup>2</sup> = 0.12) and auditory working memory (η<sup>2</sup> = 0.23) in children aged between 9 and 11 years (Degé et al., 2011b). Similar to the study by Holochwost et al.
(2017), children participating in the extended music curriculum received intensive training in five to seven music classes per week, including music theory, instrumental instruction and participating in the choir and/or the orchestra. Since the measure for visual working memory also relied on verbal components, the authors were unable to clearly determine if the music training actually affected the visual working memory or whether the training also enhanced the verbal working memory that was also involved in the task. Accordingly, researchers have not clearly determined whether music training has the potential to improve visual working memory. Another reason for this uncertainty is that the studies by Roden et al. (2012) and Degé et al. (2011b) did not randomize the participants; therefore, the results might have been influenced by pre-existing differences in children. In summary, previous studies suggest that school-based music lessons enhance EFs, such as inhibition (Holochwost et al., 2017; Jaschke et al., 2018), set-shifting (Holochwost et al., 2017), planning (Jaschke et al., 2018) and working memory (Degé et al., 2011b; Roden et al., 2012).

Regarding longitudinal studies with younger children, two studies have shown that music training enhances inhibition in preschoolers. In a study by Moreno et al. (2011), children aged 4–6 years were randomly assigned either to a group receiving computerized music listening training or to a group receiving computerized visual arts training five times a week for 45 min over a period of 20 days. Before and after training, all children completed a Go-Nogo task measuring inhibition. Children from the music group exhibited significant improvements from the pre-test to the post-test and outperformed children from the visual arts group (η<sub>p</sub><sup>2</sup> = 0.12). Therefore, the authors concluded that a short-term intense computerized hearing-based music training program improves inhibition in preschoolers. Another study compared preschoolers receiving 45 min music classes twice weekly for a period of 6 weeks in kindergarten to a control group of preschoolers receiving the same amount of LEGO training (Bugos and DeMarie, 2017). In contrast to the program used by Moreno et al. (2011), the music training program was a practical music training program involving tasks such as vocal development, improvisation and bimanual gross motor coordination. Children were administered the Matching Familiar Figures Test (MFFT) and the Day/Night Stroop Task before and after training to test inhibition. Based on the results, the scores of all children on both tests improved from before to after training. Moreover, the music group outperformed the LEGO group on the MFFT after training (d = 0.99). Hence, the authors concluded that short-term music classes enhance performance on complex inhibition tasks in preschoolers. Since inhibition appears to be the key EF during early childhood (Wiebe et al., 2008), it is possible that music training has no impact on other EFs, such as set-shifting and working memory, in preschoolers. However, the reported studies did not investigate the effect of music training on other EFs.
Therefore, researchers have not yet clearly determined whether music training improves inhibition alone or if it has the ability to influence other EFs in preschoolers. Taken together, the current state of knowledge indicates that music training enhances several EFs in children. In preschoolers, the reported studies showed that a short-term, computerized music listening training (Moreno et al., 2011) and a comprehensive practical music training program including vocal development, improvisation and gross motor coordination (Bugos and DeMarie, 2017) enhance inhibition. However, based on these studies, we are unable to conclude whether a specific component of musical training or a combination of different music components improves inhibition in preschoolers.

Most of the studies reported above used instrumental music training programs (e.g., Roden et al., 2012; Holochwost et al., 2017) or a comprehensive music training program (e.g., Bugos and DeMarie, 2017) including various components of music, such as pitch, rhythm, improvisation and bimanual coordination. The instrumental training programs consisted of general instrumental music instruction conducted in a small group (e.g., Roden et al., 2012) or in a class (Jaschke et al., 2018). In some studies, children also received lessons in ensemble playing (Holochwost et al., 2017) or music theory (Jaschke et al., 2018). The music training taught in kindergarten consisted of vocal development exercises, gross motor coordination using different instruments such as drums and xylophones, and improvisation tasks (Bugos and DeMarie, 2017). Moreno et al. (2011) did not administer practical music training in their study, but instead provided computerized hearing-based music training that comprised tasks related to pitch and rhythm. The music training programs varied in the length of lessons, ranging from 45 min per week (Roden et al., 2012) to 80 min per day (Holochwost et al., 2017), and in the duration of the training, ranging from a few weeks (e.g., Moreno et al., 2011; Bugos and DeMarie, 2017) to 2 years (e.g., Jaschke et al., 2018). In summary, substantial variety exists in the music training programs administered in current research. However, all music training programs include a mixture of components related to pitch and rhythm. Therefore, none of these studies provides information about possible differential effects of the single components on EFs. Only one study by Patscheke et al. (2018) examined the effects of a pitch-based music training compared to a rhythm-based music training on phonological awareness, but not EFs, in preschoolers. The pitch-based music training, but not the rhythm-based music training, influenced phonological awareness skills. 
This study proposed the hypothesis that different components of music training exert different effects on cognitive abilities in general. Therefore, single components of a music training program, such as rhythm and pitch training, might also exert different effects on EFs. Furthermore, Miendlarzewska and Trost (2014) suggested that rhythmic entrainment in particular might be one important component of a musical training program that would affect cognitive processes, such as EFs. The term entrainment is defined as the synchronization of at least two independent rhythmic processes (Clayton et al., 2004). In music, a simple example is clapping or tapping to the beat of a song. The ability to synchronize clapping or tapping to a specific rhythm requires the auditory perception of the rhythm, coordination of movements and sensorimotor integration. According to a study by Krause et al. (2010), musicians appear to exhibit increased connectivity in a brain network involving the premotor cortex, posterior parietal cortex and thalamus. These areas are also associated with attentional processes and motor planning (Coull, 2004), prompting the hypothesis that rhythmic entrainment plays a key role in the effect of music training on the development of EFs (Miendlarzewska and Trost, 2014). Thus, musical rhythm training in a group setting potentially represents an appropriate intervention to promote rhythmic entrainment abilities, because the players must automatically synchronize to each other in a natural setting.

We conducted the present study to investigate the differential effects of pitch-based music training compared to a rhythm-based music training on EFs in German preschoolers. Previous studies have already reported an effect of music training on performance on different tests of inhibition (Moreno et al., 2011; Bugos and DeMarie, 2017). However, researchers have not determined whether a special aspect of music training leads to the reported effects or if different musical aspects train various EFs. Miendlarzewska and Trost (2014) proposed that rhythmic entrainment is one important factor leading to cognitive enhancement. Moreover, the study by Patscheke et al. (2018) indicated different effects of various components of a music training program, such as pitch training and rhythm training, on phonological awareness. Therefore, we decided to use the same training program as Patscheke et al. (2018) to explore possible differences in the effects of a pitch-based music training compared to a rhythm-based music training on EFs in preschoolers. In addition to the music training groups, we added a third group receiving sports training as a no-music training control group. Based on the findings reported by Miendlarzewska and Trost (2014), we predicted a greater improvement in inhibition in children from the rhythm group, because the rhythm training focused on rhythmic perception and production, which are strongly associated with rhythmic entrainment. Inhibition appears to be the most important EF during early childhood (Wiebe et al., 2008). Since previous studies showed an improvement in inhibition in preschoolers receiving music training, we were specifically interested in investigating possible differences in the effects of pitch-based music training compared to rhythm-based music training on inhibition.
Nevertheless, we decided to test the effects of the training programs on the other core EFs, including set-shifting and working memory, because studies with older children suggest that music training affects several EFs (e.g., Holochwost et al., 2017). The reported studies did not examine whether music training affected other EFs or inhibition alone in preschoolers.

#### MATERIALS AND METHODS

#### Participants

At the pre-test, the sample consisted of 95 preschoolers (57 females) aged 5–6 years (M = 5.7 years, SD = 0.3 years). Participants were recruited from different kindergartens<sup>1</sup> in the City of Giessen, Germany and the surrounding area. Participants were randomly assigned to a music group receiving pitch training (n = 33), a music group receiving rhythm training (n = 33), or a control group receiving sports training (n = 30). We set the inclusion criterion for our analyses at a training participation rate of at least 66%. The remaining sample included in the analyses comprised 76 children (see below for details). None of these children received music lessons or participated in a music group such as a choir or an ensemble. Some children (16%) had taken early music education for an average of M = 14.4 months (SD = 7.6 months). The socioeconomic status was assessed based on the parents' education levels and the monthly family income. The parents of most children (65%) did not have a university degree, one parent of 16% of the children had a university degree, and both parents of 11% of the children held a university degree. Some parents did not provide information about their educational level (11.6%). The monthly family income in the sample ranged from less than 1,000 € (8%) to more than 5,000 € (3%). Most families reported a monthly income between 1,000 € and 2,000 € (28%) or between 2,000 € and 3,000 € (20%). Some parents (18%) did not provide details about their monthly income. We did not detect group differences in demographic variables. For additional details, please see **Table 1**.

#### Materials

#### Training Programs

Preschoolers were trained for 20 min three times a week for a period of 20 weeks. Trained research assistants conducted all sessions. Training programs were based on a manual. Every week, the research assistants and the supervisor of the study met to prepare and practice the training session for the following week to ensure that every training session was implemented in the same way. A typical training session on 1 day consisted of two to four different tasks lasting for a total of 20 min. The training sessions were implemented as described in the study by Patscheke et al. (2018) and based on a well-established early music education program designed by Nykrin et al. (2007).
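The training dose implied by this schedule can be worked out with a few lines of arithmetic. The sketch below is only illustrative: it assumes that the participation rate is computed as attended sessions divided by scheduled sessions, which the text does not state explicitly, and the function name is hypothetical.

```python
# Schedule values taken from the text.
MINUTES_PER_SESSION = 20
SESSIONS_PER_WEEK = 3
WEEKS = 20
# Inclusion criterion reported under Participants (at least 66%).
MIN_PARTICIPATION_PERCENT = 66


def training_dose():
    """Return (scheduled sessions, scheduled minutes, minimum sessions).

    Assumption: participation rate = attended sessions / scheduled sessions.
    """
    total_sessions = SESSIONS_PER_WEEK * WEEKS
    total_minutes = total_sessions * MINUTES_PER_SESSION
    # Integer ceiling division avoids floating-point rounding issues.
    min_sessions = -(-total_sessions * MIN_PARTICIPATION_PERCENT // 100)
    return total_sessions, total_minutes, min_sessions


if __name__ == "__main__":
    sessions, minutes, minimum = training_dose()
    print(sessions, minutes, minimum)
```

Under this assumption, a child would need to attend at least 40 of the 60 scheduled sessions to be included in the analyses.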

Rhythm training focused on rhythmic exercises, including meter execution, perception, imitation and production of different rhythms. The exercises were conducted using sound gestures (e.g., clapping and stomping) and different Orff instruments (e.g., taborets, claves, and maracas). Exercises in meter execution included the synchronization of body movements to a given meter, dancing and playing rhythms with percussion instruments. Typical perception and imitation tasks consisted of imitating rhythms using rhythm language (ta-a-a-a, ta-a, ta, and titi) or percussion instruments.

Pitch training focused on discriminating sounds, intonation, sound production, and joint singing. Typical exercises for discriminating sounds were listening to different musical instruments or sounds from a CD and then naming or pointing to the instrument on a picture. Intonation was trained with the call and response method, as the teacher sang intervals or short melodies and the children subsequently repeated the intervals or melodies. Pitch discrimination was trained by listening to different tones from the mallet instrument and indicating which tones were higher or lower or by listening to two different voices from a CD and indicating which was higher or lower. In pitch training, we did not use any percussion instruments or sound gestures for rhythmic accompaniment. Conversely, in rhythm training, we did not use melodic instruments or sing any melodies. Nevertheless, some overlap occurred between the training programs, because rhythm is to some extent also connected to prosodic features and songs cannot be sung without using a certain rhythm. However, we reduced the overlap between trainings as much as possible.

<sup>1</sup> In Germany, children aged between 3 and 6 years attend kindergarten. Kindergarten is not compulsory and kindergartens do not have a consistent curriculum. In the last year of kindergarten, when children are aged between 5 and 6 years, they receive some preschool classes. However, children are playing, drawing or doing handicrafts for most of the time in kindergarten.

TABLE 1 | Inferential statistics for group comparisons of control variables.

<sup>1</sup> In months, <sup>2</sup> in %, <sup>a</sup> post hoc analyses did not reach significance. \*p < 0.05.

The manual for sports training was the same as that used by Patscheke et al. (2018) and comprised different exercises to practice body perception, motor skills and body coordination by supporting balance, physical strength, endurance and relaxation. Similar to the music training programs, a typical session included two to four different tasks for a total of 20 min. The children were asked to perform different exercises for body perception (e.g., exercises from yoga), balance (e.g., balancing objects on different parts of the body), motor skills (e.g., throwing balls into a box from a near or far distance) and different coordination and cooperation games in a group (e.g., walking all together as a caterpillar). The sports training program was based on Yoga and active games for kids by Dunemann-Gulde (2005).

All training programs involved the same level of engagement of the children, and each trainer conducted classes in each training program to exclude trainer bias.

#### Measures

#### Control Variables

Demographic variables such as socioeconomic status (assessed based on parental education and family income) and the musical background of the children (e.g., if the child had participated in courses for early music education) were assessed using a questionnaire. The parental education level was coded as follows: 0 = no parents holding a university degree, 1 = one parent holding a university degree, and 2 = two parents holding a university degree. Family income was assessed with a six-point scale ranging from less than 1,000 € per month to more than 5,000 € per month. Early music education was assessed in months. Furthermore, parents completed the German version of the Big Five Inventory (BFI; Rammstedt and Danner, 2017) to assess the personality of each of their participating children. The BFI is a questionnaire comprising 45 items designed to measure the Big Five factors of personality (openness, agreeableness, extraversion, conscientiousness and neuroticism). Parents were asked to rate the extent to which they would agree with specific statements (e.g., "I see my child as someone who is talkative") on a 5-point Likert scale. As described in the study by Rammstedt and Danner (2017), we calculated the means for each of the personality factors. Reversed items were recoded. Higher scores indicate a stronger expression of the personality factor. The five scales of the German BFI show an internal consistency of α = 0.74 to α = 0.86. The test-retest reliability for the scales ranges from r = 0.78 to r = 0.93. We administered the revised version of the Culture Fair Test (CFT 1-R) created by Weiß and Osterland (2012) to assess fluid intelligence. The CFT 1-R is divided into two parts, each including three subtests. The first part measures figural perception and processing speed with the subtests substitutions, mazes and similarities. The second part includes the subtests continuing series, classifications and matrices to measure figural reasoning.
All subtests contain 15 items that must be processed in a certain time. Research assistants explained examples of each subtest according to the manual before the subtests began. The test provides age-corrected standard values for children aged from 5.3 to 9.11 years. The test-retest reliability is r = 0.88 for the first part, r = 0.94 for the second part and r = 0.95 for the whole test. We assessed children's enjoyment of participating in the trainings to control for potential biases concerning their motivation and willingness. Every 5 weeks of the trainings, children rated on a 5-point Likert scale how much they enjoyed participating in the training. The scale ranged from 1 = not a bit to 5 = very much and was presented to the children via smileys (1 = very sad smiley; 3 = neutral smiley; 5 = very happy smiley). For each child, we generated an average score of the four measurements.
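The questionnaire scoring described in this section (recoding reversed BFI items, averaging the items of each personality factor, and averaging the four enjoyment ratings) can be sketched as follows. This is a minimal illustration, not the authors' scoring script: the item identifiers and the choice of which items are reversed are hypothetical, and the recoding rule (6 minus the rating on a 1 to 5 scale) is the conventional one assumed here.

```python
from statistics import mean

LIKERT_MIN, LIKERT_MAX = 1, 5  # 5-point Likert scale, as in the text


def recode_reversed(rating):
    """Mirror a rating on the scale (1 <-> 5, 2 <-> 4, 3 stays 3)."""
    return LIKERT_MIN + LIKERT_MAX - rating


def scale_mean(ratings, reversed_items=frozenset()):
    """Mean of one scale after recoding reversed items.

    ratings: dict mapping a (hypothetical) item id to a rating from 1 to 5.
    reversed_items: ids of the items that are reverse-keyed (assumed).
    """
    values = []
    for item, rating in ratings.items():
        if not LIKERT_MIN <= rating <= LIKERT_MAX:
            raise ValueError(f"rating out of range for item {item!r}")
        values.append(recode_reversed(rating) if item in reversed_items else rating)
    return mean(values)


if __name__ == "__main__":
    # Hypothetical extraversion items; "e2" is treated as reverse-keyed.
    extraversion = {"e1": 4, "e2": 2, "e3": 5}
    print(scale_mean(extraversion, reversed_items={"e2"}))
    # Enjoyment: average of the four ratings collected every 5 weeks.
    print(mean([4, 5, 3, 4]))
```

Higher means then correspond to a stronger expression of the respective factor, or to greater enjoyment of the training.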

#### Dependent Variables

The EF measures inhibition, set-shifting and working memory served as our dependent variables and were assessed before and after training. Inhibition was measured using the subtest "statue" from the NEPSY-II (Korkman et al., 2007), a developmental neuropsychological assessment for children. The test is designed to assess motor persistence and inhibition. During the test, the child was asked to stay in a particular position with his/her eyes closed for 75 s. The child was asked not to respond to any sound distractor during this time. The experimenter made different noises as distractors at certain times (e.g., after 20 s the experimenter drops a pencil onto the table). For every 5-s interval, the experimenter recorded whether the child committed any error by opening his/her eyes, moving the body or responding verbally to the sound. For every 5-s interval, the child received two points if no errors were recorded, one point if one error was recorded or zero points if more than one error was recorded during that interval. We added the points for all time intervals as the outcome measure and compared them with age-corrected norms, which are provided for children aged from 3 to 6 years. The test-retest reliability of this subtest is r = 0.88 for children aged from 5 to 6 years.
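The interval-based scoring of the statue test can be illustrated with a short sketch. It assumes that a single error in an interval earns one point and that two or more errors earn none, and it covers only the raw score: the conversion to age-corrected norms depends on the test manual and is not reproduced. All function names are hypothetical.

```python
TEST_SECONDS = 75      # duration of the statue test, from the text
INTERVAL_SECONDS = 5   # scoring interval, from the text


def interval_points(errors):
    """Points for one 5-s interval (assumed rubric: 0 errors -> 2, 1 -> 1, else 0)."""
    if errors < 0:
        raise ValueError("error count cannot be negative")
    if errors == 0:
        return 2
    if errors == 1:
        return 1
    return 0


def statue_raw_score(errors_per_interval):
    """Sum the interval points over all fifteen 5-s intervals."""
    expected = TEST_SECONDS // INTERVAL_SECONDS
    if len(errors_per_interval) != expected:
        raise ValueError(f"expected {expected} intervals")
    return sum(interval_points(e) for e in errors_per_interval)


if __name__ == "__main__":
    # A child who makes one error in one interval and two errors in another.
    errors = [0] * 13 + [1, 2]
    print(statue_raw_score(errors))
```

With 15 intervals and a maximum of two points each, the raw score under this rubric ranges from 0 to 30.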

Set-shifting was assessed with the Dimensional Change Card Sort (DCCS; Zelazo, 2006). We administered both the standard version and the more challenging border version, referred to below as the extended version. In the standard version, children were presented two target cards (a blue bunny and a red boat) and invited to sort the test cards (red bunnies and blue boats) according to one dimension. In the pre-shift phase, children were asked to sort all cards according to the dimension "color," and were required to place a red card to the side of the red boat and a blue card to the side of the blue bunny. After 6 trials, the experimenter interrupted the game and changed the dimension to "shape," and thus the child sorted cards with the bunny shape to the blue bunny and cards with the boat shape to the red boat (post-shift). Both the pre-shift and post-shift phases comprised 6 trials, with a total of 12 trials for the standard version. After completing the standard version of the test, we administered the more challenging extended version, which combined both dimensions from the standard version. In the extension, the child was asked to sort cards with a black frame according to the dimension "color" and cards without a frame according to the dimension "shape." Similar to the standard version, the extended version also consisted of 12 trials. In both versions, we repeated the instructions after every trial to keep the demands on working memory as low as possible. In the pre-shift and post-shift phases, children were required to sort at least five of six cards correctly to pass. In the extended version, children were required to sort at least 9 of 12 cards correctly to pass.
We used the scoring system suggested by Zelazo (2006): children who failed the pre-shift phase received a "0," children who passed the pre-shift phase but failed the post-shift phase received a "1," children who passed the post-shift phase but failed the extended version received a "2" and children who passed both the standard and extended versions received a "3." The DCCS shows an intra-class correlation (ICC) of 0.94 for the standard version and ICC = 0.90 for the extended version (Beck et al., 2011).
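The pass criteria and the 0 to 3 scoring described above can be written as a small function. This is a sketch of the rule as stated in the text, with hypothetical names; it assumes that a child who passes the pre-shift phase but not the post-shift phase receives a score of 1.

```python
PHASE_TRIALS = 6       # trials in the pre-shift and in the post-shift phase
PHASE_PASS = 5         # at least five of six cards correct
EXTENDED_TRIALS = 12   # trials in the extended version
EXTENDED_PASS = 9      # at least 9 of 12 cards correct


def dccs_score(pre_correct, post_correct, extended_correct):
    """Return the 0-3 DCCS score from the number of correctly sorted cards."""
    for n, limit in ((pre_correct, PHASE_TRIALS),
                     (post_correct, PHASE_TRIALS),
                     (extended_correct, EXTENDED_TRIALS)):
        if not 0 <= n <= limit:
            raise ValueError("number of correct cards out of range")
    if pre_correct < PHASE_PASS:
        return 0   # failed the pre-shift phase
    if post_correct < PHASE_PASS:
        return 1   # passed pre-shift, failed post-shift (assumed reading)
    if extended_correct < EXTENDED_PASS:
        return 2   # passed the standard version, failed the extended version
    return 3       # passed both versions


if __name__ == "__main__":
    print(dccs_score(6, 5, 8))
```

Because each level presupposes the previous one, the score can be read as the number of consecutive phases a child passed.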

We measured visual-spatial working memory by administering the subtests Matrix Span Test (for visual cache) and Corsi Block Test (for inner scribe) from the Working Memory Test Battery for 5- to 12-year-old children (''Arbeitsgedächtnistestbatterie 5–12,'' AGTB 5–12) described by Hasselhorn et al. (2012). According to Logie (1995), visual working memory is divided into two subcompartments: the visual cache and the inner scribe. The visual cache is responsible for storing information about shape and color, whereas the inner scribe analyses information about location and movement. Both tests consisted of a standardized computerized introduction with two practice trials and 10 test trials. The difficulty level at the beginning of the test trials depended on the child's age. For example, 6-year-old children started at a more difficult level than 5-year-old children. Both subtests used an adaptive procedure that increased the difficulty for correct responses and decreased the difficulty for incorrect responses (described in detail below). During the test trials, children were not informed about their test performance. The AGTB provides age-corrected standard values for children aged from 5 to 12 years. The Matrix Span Test measures the memory of static visual patterns. Children were presented a 4 × 4 matrix composed of white and black squares on a touchscreen for 4 s. After the matrix had disappeared, children were shown a new matrix with white squares only and asked to tag the squares that were black in the previous image. If the child responded correctly in two consecutive trials, the number of the black squares increased by one (to a maximum of nine black squares). On the other hand, if the child responded incorrectly in two consecutive trials, the number of black squares was reduced by one (to a minimum of two black squares). The Corsi Block Test assessed the memory of dynamic changes in spatial locations. 
In this test, children were asked to memorize the path of a yellow smiley face that moved between nine white squares displayed on a gray background on the touchscreen. After the full path of the smiley face was shown, children recorded their response by touching the squares in the order in which the smiley face had moved. If the child responded correctly in two trials, the path of the smiley face was increased by one square (to a maximum of nine squares). If the child responded incorrectly in two trials, the path became one square shorter (to a minimum of two squares). The test-retest reliabilities for both subtests are r = 0.66 for children aged from 5 to 8 years.
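The adaptive procedure shared by both subtests can be sketched as a simple two-up/two-down staircase. The code below is an illustration under stated assumptions, not the AGTB implementation: it assumes that the two responses must be consecutive (as stated for the Matrix Span Test), that the streak counter restarts after each level change, and it uses a hypothetical starting level, since the age-dependent starting levels are not given in the text.

```python
MIN_LEVEL = 2  # minimum number of black squares / path length, from the text
MAX_LEVEL = 9  # maximum number of black squares / path length, from the text


def run_staircase(start_level, responses):
    """Return the difficulty level presented on each trial.

    responses: sequence of booleans (True = correct response).
    Assumption: streak counters reset after every level change.
    """
    level = start_level
    correct_streak = 0
    wrong_streak = 0
    presented = []
    for correct in responses:
        presented.append(level)
        if correct:
            correct_streak += 1
            wrong_streak = 0
        else:
            wrong_streak += 1
            correct_streak = 0
        if correct_streak == 2:
            level = min(level + 1, MAX_LEVEL)  # harder after two correct trials
            correct_streak = 0
        elif wrong_streak == 2:
            level = max(level - 1, MIN_LEVEL)  # easier after two incorrect trials
            wrong_streak = 0
    return presented


if __name__ == "__main__":
    # Ten test trials with a hypothetical starting level of 3.
    answers = [True, True, True, True, False, False, False, True, True, True]
    print(run_staircase(3, answers))
```

Such a procedure keeps the task near each child's capacity limit, which is why the starting level can differ by age without changing the rule itself.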

#### Procedure

The experiments were conducted in accordance with ethical guidelines of the ethics committee of the Faculty of Psychology and Sports Science at Giessen University (application number 2015-0001). Prior to pre-tests, parents were provided information sheets about the aims and contents of the study and informed consent was obtained for each participant. All test sessions were conducted as single sessions in a separate room in kindergarten. Trained research assistants from the department, who were at all times blinded to the conditions and the hypotheses, administered test sessions. We started test sessions with the DCCS, followed by the statue test and the tests of visual working memory. The CFT 1-R was administered in an extra session on the same day after an appropriate break or on a consecutive day. After the pre-tests, children were randomly assigned to one of the music groups or to the sports group. All training programs were conducted in small groups of 5–8 children for 20 weeks. Groups participated in the designated training program three times a week for 20 min each. After the training phase was complete, post-tests of the dependent measures were immediately administered using the same method described for the pre-test. When post-tests were complete, each child received a small present and a certificate as a reward for participating in the study.

TABLE 2 | Means and standard deviations of dependent variables for treatment groups and the control group at the pre-test (T0) and post-test (T1).

### RESULTS

All analyses were computed using IBM® SPSS® Statistics 25. First, we analyzed the dropout rate to ensure that the children who dropped out did not differ significantly from the remaining sample. Children with a training participation rate of less than 66% were excluded and counted as dropouts. Of the 95 children enrolled, 19 did not meet this inclusion criterion, and thus the dropout rate in our sample was 18.1%. Major reasons for the low training participation rate of these children were illness and winter holidays. Based on our analyses, these children did not significantly differ from the included sample in terms of gender, χ<sup>2</sup>(1) = 0.04, p = 0.834, age, t(93) = 0.31, p = 0.757, parental education, t(82) = −0.86, p = 0.392, family income, t(75) = −1.18, p = 0.243, and IQ, t(93) = −0.05, p = 0.960. A power analysis using G\*Power (Erdfelder et al., 1996) suggested that a total sample size of 66 participants was sufficient to detect small to medium effects (f = 0.25) in a repeated-measures within-between design (α = 0.05, power (1 − β) = 0.95, correlation between repeated measures: r = 0.50).

#### Preliminary Analyses

We compared differences in control variables between the training groups. As shown in **Table 1**, no significant differences in gender, age, IQ, early music education, training participation, enjoyment of the training, parental education or personality were observed between the training groups. A significant difference in family income was identified, indicating that families from the pitch group had the highest income (M = 3.48, SD = 1.41), followed by the sports group (M = 2.68, SD = 0.89) and the rhythm group (M = 2.60, SD = 1.27). However, Bonferroni-adjusted post hoc analyses did not reveal significant (p < 0.05) differences between the pitch and rhythm groups [95% CI (−0.05, 1.8)], between the pitch and sports groups [95% CI (−0.14, 1.73)] or between the rhythm and sports groups [95% CI (−1.05, 0.88)]. We performed ANOVAs for the EFs measured in the pre-test to exclude the possibility of any pre-existing differences in EFs between groups. No differences in inhibition, F(2,73) = 0.45, p = 0.64, set-shifting, F(2,73) = 1.26, p = 0.29, or working memory, F(2,73) = 1.36, p = 0.26 (Matrix Span), F(2,73) = 0.43, p = 0.65 (Corsi Block), were observed between groups. Taken together, the preliminary analyses did not reveal significant systematic differences in control variables or dependent measures at the pre-test.
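The family income comparison can be checked with simple arithmetic. The sketch below assumes that the post hoc intervals are symmetric around the mean difference and that the rhythm versus sports lower bound reads −1.05. Under those assumptions, it tests whether each interval is centered on the difference between the reported group means and whether it contains zero. The function name and tolerance are illustrative choices, not part of the original analysis.

```python
# Reported group means for family income (from the text).
MEANS = {"pitch": 3.48, "sports": 2.68, "rhythm": 2.60}

# Reported 95% CIs for pairwise mean differences (first group minus second).
# Assumption: the rhythm-sports lower bound is read as -1.05.
INTERVALS = {
    ("pitch", "rhythm"): (-0.05, 1.80),
    ("pitch", "sports"): (-0.14, 1.73),
    ("rhythm", "sports"): (-1.05, 0.88),
}


def check_intervals(means, intervals, tol=0.02):
    """Compare each CI midpoint with the difference of the reported means."""
    rows = []
    for (first, second), (lower, upper) in intervals.items():
        diff = means[first] - means[second]
        midpoint = (lower + upper) / 2
        rows.append({
            "pair": (first, second),
            "mean_difference": round(diff, 2),
            "ci_midpoint": round(midpoint, 3),
            # A CI that contains zero corresponds to a nonsignificant contrast.
            "includes_zero": lower < 0 < upper,
            # Tolerance absorbs rounding in the reported values.
            "consistent": abs(diff - midpoint) <= tol,
        })
    return rows


if __name__ == "__main__":
    for row in check_intervals(MEANS, INTERVALS):
        print(row)
```

Under these assumptions, all three intervals are centered on the corresponding mean difference and all contain zero, which agrees with the statement that none of the pairwise contrasts was significant.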

#### Main Analyses

We performed repeated measures ANOVAs with the training group as the between-subjects factor and time as the within-subjects factor for the tests of inhibition, set-shifting, and working memory. Mean values and standard deviations for the dependent variables are presented in **Table 2**.

For inhibition, the repeated measures ANOVA indicated a significant main effect of time, F(1,73) = 9.03, p = 0.004, dRM = 0.33, showing an overall improvement from the pre-test (M = 10.86, SD = 2.28) to the post-test (M = 11.64, SD = 2.02), as well as a significant training group × time interaction, F(2,71) = 4.18, p = 0.019, d = 0.70. The main effect of group was nonsignificant, F(1,73) = 0.52, p = 0.598. We analyzed the training group × time interaction by performing a 2 (time) × 2 (training group) ANOVA and observed a significant training group × time interaction when comparing the rhythm and sports groups, F(1,42) = 7.14, p = 0.011, d = 0.84. We did not observe significant interactions with time when the pitch and sports groups were compared, F(1,47) = 4.28, p = 0.124, or when the pitch and rhythm groups were compared, F(1,49) = 6.41, p = 0.12. Paired t-tests showed a significant improvement from the pre-test to post-test for the rhythm group, t(23) = −3.21, p = 0.004, dRM = 0.56, whereas the results for the pitch group, t(27) = −1.94, p = 0.062, and the sports group, t(21) = 0.34, p = 0.734, were nonsignificant. Independent t-tests did not reveal differences between the rhythm and the sports group at the pre-test, t(47) = −0.27, p = 0.79, but indicated a significant difference between the rhythm and the sports group at the post-test, t(42) = 2.34, p = 0.02, dcorr = 0.82 (see **Figure 1**).

Regarding set-shifting, a repeated measures ANOVA revealed a significant main effect of time, F(1,73) = 21.58, p < 0.001, dRM = 0.55, showing an overall improvement from the pre-test (M = 1.96, SD = 0.55) to the post-test (M = 2.35, SD = 0.56), as well as a significant main effect of the training group, F(2,73) = 3.23, p = 0.045, d = 0.59. However, Bonferroni-adjusted post hoc group comparisons between the rhythm and sports groups [95% CI (−0.03, 0.54)], the rhythm and pitch groups [95% CI (−0.28, 0.26)] and the pitch and sports groups [95% CI (−0.02, 0.55)] were nonsignificant (all p-values > 0.05). Furthermore, the training group × time interaction was nonsignificant, F(2,73) = 0.77, p = 0.47 (see **Figure 2**).

Repeated measures ANOVAs for working memory did not reveal a significant main effect of time on the Matrix Span result, F(1,73) = 3.55, p = 0.064, or a significant main effect of the training group, F(2,71) = 0.88, p = 0.421. The training group × time interaction was also nonsignificant, F(2,71) = 2.31, p = 0.11. For the Corsi Block Test, we observed a significant main effect of time, F(1,73) = 10.67, p = 0.002, dRM = 0.35, showing an overall improvement from the pre-test (M = 46.13, SD = 8.42) to the post-test (M = 49.49, SD = 6.80), but a significant main effect of the training group was not detected, F(2,71) = 0.23, p = 0.795. The training group × time interaction was also nonsignificant, F(2,71) = 2.02, p = 0.14 (see **Figure 3**).

### DISCUSSION

We explored differences in the effects of rhythm-based music training and pitch-based music training on EFs in preschoolers. Based on our results, 20 weeks of rhythm training enhanced motor inhibition, whereas pitch-based music training and sports training did not. The effect of the rhythm training program corroborates the hypothesis that rhythmic entrainment represents an important underlying mechanism of music training that supports cognitive function (Miendlarzewska and Trost, 2014). Moreover, since the rhythm training program required a high level of motor coordination, we postulated that the motor component of the rhythm training program specifically influenced preschoolers' inhibition skills.

Our study confirms results from previous studies by showing an effect of music lessons on inhibition in school-children (Holochwost et al., 2017; Jaschke et al., 2018) and preschoolers (Bugos and DeMarie, 2017). However, most of these studies had some methodological issues, such as the lack of an active control group (Holochwost et al., 2017) or a nonrandomized sample (Jaschke et al., 2018), and thus these results might also be attributed to a general training effect or to some noncontrolled influencing factors. Accordingly, only our study and the study by Bugos and DeMarie (2017) actually allow for interpretations of causality because the children were randomly assigned to the different training programs. We expand the results of the study by Bugos and DeMarie (2017) by showing that rhythm training particularly appeared to improve inhibition in preschoolers. The rhythmic synchronization while drumming and the precise timing required while producing rhythms and rhythmical movements to music might have improved the inhibition abilities. Since all these rhythmic activities required children's motor control, it is likely that rhythm-guided motor control, in general, improves motor inhibition in children. Therefore, our findings support the idea of Winsler et al. (2011), who suggested that modulating children's movements with the aid of music may be beneficial for motor behavior and self-regulation. As shown in the present study, rhythm-guided motor control was implemented in the rhythm training program at a high level and improved motor inhibition, whereas tasks focusing on pitch with less motor involvement did not affect motor inhibition skills in preschoolers. Moreover, the sports training program, which also required a high level of motor control, did not show an effect on motor inhibition. Thus, we can conclude that rhythm-guided motor control in particular and not motor control alone improved inhibition skills in preschoolers.

Regarding set-shifting, the results showed an overall improvement from the pre-test to the post-test, but the training programs did not exert significantly different effects, since the training group × time interaction was nonsignificant. Hence, we were unable to generate statistically verified conclusions about a special effect of a single training program. However, the descriptive data (see **Figure 2**) appear to show greater improvements in the music groups than in the sports group, with the rhythm group attaining the greatest benefit. Therefore, our results are consistent with the findings from the study by Holochwost et al. (2017), which showed that music training enhanced set-shifting abilities in older children. However, since we did not identify a significant group by time interaction and all groups exhibited significant improvements in set-shifting from the pre-test to the post-test, further studies are required to corroborate this finding.

Regarding visuospatial working memory, we did not find any significant effect of the training programs on the Matrix Span Test, which assesses the visual cache (memory of static visual patterns), but an overall improvement was observed on the Corsi Block Test, which measures the inner scribe (memory of a dynamic series of spatial locations), from the pre-test to the post-test. Since the training group × time interaction was nonsignificant, statistical confirmation of an effect of a single training program was not obtained. However, based on the descriptive data (see **Figure 3**), the music groups exhibited improvements in the inner scribe component, with the rhythm group showing the greatest improvement from the pre-test to the post-test, whereas the sports group recorded similar performances on the pre- and post-tests. A potential explanation is an effect of the rhythm-based music training on the inner scribe component, but due to less power and high variances, we did not observe a statistically significant training group × time interaction. Interpreting the descriptive data, our study also confirms the findings reported by Holochwost et al. (2017) of an association between music training and visuospatial working memory in older children to some extent. The improvement on the Corsi Block Test but not the Matrix Span Test might indicate that the music training program did not improve the memory capacity, but improved the processing of visual information. This hypothesis is consistent with the findings from studies assessing verbal memory, showing that music lessons do not enhance verbal storage itself but improve the articulatory rehearsal (Franklin et al., 2008; Degé and Schwarzer, 2017). Therefore, a similar situation might occur for the visual working memory such that the music training program did not affect the visual storage but the processing of information in the visual memory. 
However, as stated above, we did not find a significant group by time interaction and must interpret these results with caution.

Our results confirm the hypothesis that different components of a music training program exert different effects on cognitive abilities in general. Furthermore, the study by Patscheke et al. (2018) revealed that pitch training, but not rhythm training, affected phonological awareness. Since we administered the same program for the pitch and the rhythm training, a comparison of the results suggests that the training programs exert different effects on cognitive abilities. Consequently, the rhythmic motor aspect of a music training program might improve general cognitive abilities, such as EFs, and intonation and pitch are strongly correlated with verbal abilities such as phonological awareness.

In summary, a rhythm-based music training program enhanced inhibition in preschoolers. As mentioned above, the motor component of the rhythm training program and rhythmic synchronization might be the components of a music training program that specifically affect EF. Furthermore, our results show a descriptive tendency that music training affects other EFs, such as set-shifting and visual-spatial working memory, in preschoolers as well. However, since we were unable to statistically confirm this hypothesis, further studies are required to investigate these effects. As inhibition appears to be a key factor in the development of EFs in early childhood, the effect of the rhythm-based music training program on inhibition is an important finding. Since EFs exert far-reaching effects on crucial developmental functions, such as intelligence, academic achievement, wellbeing and health (Diamond, 2013), EFs should be enhanced, even early in childhood. As the present study suggests that regular music training, particularly training focusing on rhythmic elements, improves inhibition, one recommendation is to implement elements of a rhythm-based music training program in already established EF training programs. Furthermore, the development of a music-based EF training program that can be integrated into the daily lives of kindergarteners might produce promising effects. The implementation of a music-based EF training program in kindergarten might allow all children to improve their EFs in a manner independent of social contexts to ensure equal opportunities to benefit from the program. Nevertheless, further studies are needed to examine the applicability and the effectiveness of these training programs.

### LIMITATIONS AND FUTURE DIRECTIONS

We explored differences in the effects of music training programs on EFs in preschoolers. The results showed an effect of rhythm-based music training on motor inhibition. One limitation of the present study is that our music training programs had some overlapping aspects, as rhythm training cannot be completely separated from pitch training, since melodies and songs are always based on rhythm. However, the main difference between the training programs was that the rhythm training program focused on the music-guided motor component by perceiving rhythms with the whole body and producing rhythms using sound gestures, percussion instruments or rhythmical body movements. In contrast, the pitch training program concentrated on pitch and intonation without any rhythmic motor action. Regarding the sample size, the present study already exhibited an acceptable power, particularly if we consider the costs and efforts associated with the high training intensity and the organization required to work with many different kindergartens. Nevertheless, future studies with larger sample sizes should be conducted to support the generalizability of our findings. Although the power analysis revealed a sufficient sample size in the present study, a larger sample is preferred and might even reveal smaller effects that could have been overlooked in our study. As rhythm training appears to be an important factor influencing inhibition in preschoolers, an investigation of the effect of dancing on EF would also be interesting. Similar to music training, dancing relies on rhythm and meter and includes the musical component as the dancer executes movements to the rhythm of music. Furthermore, similar to our rhythm training program, dancing concentrates on music-guided motor control and rhythmic entrainment, and thus dancing might exert a similar effect on EFs.

### CONCLUSION

Taken together, music training influences inhibition in preschoolers and might affect other EFs, such as set-shifting and working memory. This study is the first to investigate the different effects of distinct music training programs on several EFs in this age group. In particular, a rhythm-based music training program affects inhibition in preschoolers. In contrast to some previous studies, our results allow for causal interpretations, since we used a randomized controlled design. Therefore, the motor component of rhythm training and rhythmic entrainment represent important components of a music training program that is designed to improve EFs. Further studies examining different age groups and larger sample sizes are required to confirm our findings. Moreover, we recommend a study examining the effect of dancing on EFs, because dancing relies on rhythmic motor movements as well. Therefore, dancing might exert a similar effect on EF. Finally, music-based rhythm training administered in a small group appears to be a suitable intervention to improve inhibition in preschoolers.

## DATA AVAILABILITY

The datasets generated for this study are available on request to the corresponding author.

## ETHICS STATEMENT

#### Human Subject Research

The studies involving human participants were reviewed and approved by Lokale-Ethikkommission Fachbereich 06 (LEK-FB06), Justus-Liebig-University of Giessen, Germany. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.

## AUTHOR CONTRIBUTIONS

FD and GS conceived the study. FD designed the study and obtained funding through a DFG grant. UF conducted the experiments and analyzed the data. UF, GS and FD wrote the manuscript.

## FUNDING

This study was funded by a grant from the Deutsche Forschungsgemeinschaft (DFG, No. DE2198/2-1).

## ACKNOWLEDGMENTS

We thank our student assistants who were involved in implementing the training sessions and collecting data. We would like to specifically thank all children and their parents for their participation, as well as the kindergarten teachers for their support.

## REFERENCES

Concept\_of\_Entrainment\_and\_Its\_Significance\_for\_Ethnomusicology. Accessed May 13, 2019.

executive function. Psychol. Sci. 22, 1425–1433. doi: 10.1177/0956797611416999


**Conflict of Interest Statement**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Frischen, Schwarzer and Degé. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## Musicians Show Better Auditory and Tactile Identification of Emotions in Music

Andréanne Sharp<sup>1</sup> \*, Marie-Soleil Houde<sup>1</sup> , Benoit-Antoine Bacon<sup>2</sup> and François Champoux<sup>1</sup>

<sup>1</sup> École d'Orthophonie et d'Audiologie, Université de Montréal, Montreal, QC, Canada, <sup>2</sup> Department of Psychology, Carleton University, Ottawa, ON, Canada

Musicians are better at processing sensory information and at integrating multisensory information in detection and discrimination tasks, but whether these enhanced abilities extend to more complex processes is still unknown. Emotional appeal is a crucial part of musical experience, but whether musicians can better identify emotions in music throughout different sensory modalities has yet to be determined. The goal of the present study was to investigate the auditory, tactile and audiotactile identification of emotions in musicians. Melodies expressing happiness, sadness, fear/threat, and peacefulness were played and participants had to rate each excerpt on a 10-point scale for each of the four emotions. Stimuli were presented through headphones and/or a glove with haptic audio exciters. The data suggest that musicians and controls are comparable in the identification of the most basic emotions (happiness and sadness). However, in the most difficult unisensory identification conditions (fear/threat and peacefulness), significant differences emerge between groups, suggesting that musical training enhances the identification of emotions in both the auditory and tactile domains. These results support the hypothesis that musical training has an impact at all hierarchical levels of sensory and cognitive processing.

Keywords: emotion, music, auditory perception, tactile perception, brain plasticity

## INTRODUCTION

It is well established that musical training can lead to functional and structural changes in the brain, and that these changes correlate with improved music processing as measured by pitch, timing and timbre discriminations (for a review see Kraus and Chandrasekaran, 2010). Of particular importance to the present study, a number of studies have revealed that long-term musical training promotes brain plasticity and generates reorganization in regions related to audiotactile processing (e.g., Pantev et al., 2003; Baumann et al., 2007; Zimmerman and Lahav, 2012).

At the behavioral level, it has been shown that in detection tasks, musicians react faster to auditory and tactile stimuli (Landry and Champoux, 2017) and are also better at integrating auditory and tactile information (Landry et al., 2017). In auditory frequency discrimination tasks, musicians have lower thresholds compared to controls (Spiegel and Watson, 1984), and this effect appears to be correlated with years of musical expertise (Kishon-Rabin et al., 2001).

#### Edited by:

Assal Habibi, University of Southern California, United States

#### Reviewed by:

Patrick Bruns, Universität Hamburg, Germany Vesa Putkinen, Turku PET Centre, Finland

\*Correspondence:

Andréanne Sharp andreanne.sharp@umontreal.ca; sharp\_andreanne@hotmail.com

#### Specialty section:

This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Psychology

> Received: 23 April 2019 Accepted: 13 August 2019 Published: 28 August 2019

#### Citation:

Sharp A, Houde M-S, Bacon B-A and Champoux F (2019) Musicians Show Better Auditory and Tactile Identification of Emotions in Music. Front. Psychol. 10:1976. doi: 10.3389/fpsyg.2019.01976


To examine whether such discrimination enhancements extended to multisensory processing, Young et al. (2017) used a two-alternative forced choice task in which participants had to determine whether a pair of stimuli were the same or different. Participants could hear the stimuli, combined or not with a corresponding tactile stimulation transmitted through a glove. The results revealed that, compared to controls, musicians' frequency discrimination threshold was improved significantly by the addition of tactile stimulation.

Recent results from our laboratory have confirmed such frequency discrimination enhancements in the auditory and audiotactile domains and have extended the latter by demonstrating that musicians were also better at discriminating tactile-only stimuli applied to the hand (Sharp et al., 2019). Taken together, these results suggest that musical training can have an impact on sensory processing, at least in detection or discrimination tasks. Whether such enhanced abilities can extend to more complex processes remains a matter of debate.

During the last decades, the study of emotions in music has become an increasingly popular research field. It is known that the ability to identify emotion in music starts early in life and that young children base their judgments on basic psychoacoustic cues such as tempo, loudness and pitch (Adachi et al., 2004). At 3 years of age, children are sensitive to the positive and negative connotations of music but their analysis is not yet sufficiently nuanced to distinguish between more specific emotions (Kastner and Crowder, 1990). It is only around 5 years of age that children begin to discriminate happiness and sadness (Terwogt and Van Grinsven, 1991).

Around 11 years of age, children are able to identify emotions at the adult level (Hunter et al., 2011). Since the identification of emotions in music is based on psychoacoustic cues and musical features, the possibility that musical training might enhance this ability has long been surmised. Indeed it appears that musicians are more accurate than non-musicians in the identification of emotions in music (Vieillard et al., 2008). Decline due to age in the identification of emotion in music is also less marked in musicians (Castro and Lima, 2014). Emotion identification abilities in musicians have not been examined further and the capacity of musicians to better identify emotion in music throughout different sensory modalities also remains to be determined.

The present study aims to investigate the auditory, tactile and audiotactile identification of various emotions in musicians using the stimuli of Vieillard et al. (2008) and tactile stimulation technology developed by Young et al. (2017). This study is the first to examine tactile and auditory-tactile emotion identification abilities in musicians versus controls.

## METHODS

#### Participants

Seventeen professional musicians (7 women, 10 men, average age = 28.9 years) and 17 matched non-musicians (8 women, 9 men, average age = 34.4 years) participated in the study. Non-musicians and musicians were matched for age, sex, handedness, educational level, and hearing thresholds. Only participants with less than 1 year of musical training were recruited for the non-musician (control) group. The sample size of this study is justified by the restrictive criteria used for inclusion in the musicians' group. All musicians were working in the music field or studying music at the university level. The musicians specialized in piano (n = 9), guitar (n = 2), trumpet (n = 2), violin (n = 1), percussion (n = 1), flute (n = 1) and oboe (n = 1). They reported playing only one instrument (n = 4), playing two instruments (n = 2) or playing more than two instruments (n = 11). The average age of beginning to learn their first instrument was 7 years old. The average number of years of active practice of music was 20 years. Hearing thresholds were determined with an audiometer (Astera, GN Otometrics, Denmark). For both groups, pure-tone detection thresholds at octave frequencies ranging from 250 to 4000 Hz were within normal limits in both ears. The Research Committee for Health Sciences of the University of Montreal and the Center for Interdisciplinary Research in Rehabilitation of Greater Montreal approved all procedures, and each participant provided written informed consent. All experiments were performed in accordance with relevant guidelines and regulations.

#### Stimuli and Procedure

The stimuli used in this study were developed by Vieillard et al. (2008). They are 56 melodies produced by a digital synthesizer in piano timbre. These instrumental stimuli were composed in the tonal musical tradition to express four emotions: happiness, sadness, fear/threat and peacefulness. The stimuli vary in mode, dissonance, pitch range, tone density, rhythmic regularity, and tempo but do not vary in performance-related expressive features (e.g., vibrato or variations of articulation/phrasing). Therefore, the identification of emotions was based exclusively on the compositional structure. The mean duration of each stimulus was 12.4 s. All stimuli were originally validated by Vieillard et al. (2008) and were also cross-culturally validated by Fritz et al. (2009). These stimuli have been designed to elicit specific emotions that can be universally recognized.

The battery of Vieillard et al. (2008) was selected for this experiment because the four emotions evoked by the melodies are easily recognized and discriminated. Furthermore, all stimuli were validated cross-culturally by Fritz et al. (2009), and across age groups by Lima and Castro (2011). Finally, peacefulness is the emotion in this experiment most likely to avoid a ceiling effect in musicians, who show near-perfect identification of better-known emotions such as happiness and sadness.

Each of the 56 melodies was presented in a randomized order in three stimulation conditions: auditory-only, tactile-only and auditory-tactile. There were 14 stimuli for each type of emotion. For each stimulus, participants had to rate how much the melody expressed each of the four emotions on a 10-point intensity scale ranging from 0 (absent) to 9 (present). The four scales were presented immediately after each stimulus, and always in the same order (happy/sad/scary/peaceful). Each melody was presented only once in random order during each block (auditory-only, tactile-only and auditory-tactile) and no feedback was given. All conditions for stimulation and emotion were randomized. For example, one participant started in the tactile condition with a peaceful stimulus, while another started in the auditory condition with a sad stimulus. To exactly replicate the standardized task of Vieillard et al. (2008), the order of the scales presented after each stimulus was not counterbalanced.
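The randomization described above can be sketched as follows. This is a minimal illustration, not the Psyscope script used in the study. It assumes that block order is shuffled per participant and that the 56 melodies (14 per emotion) are shuffled independently within each block. The function name, melody identifiers, and seed handling are hypothetical.

```python
import random

EMOTIONS = ["happiness", "sadness", "fear/threat", "peacefulness"]
CONDITIONS = ["auditory-only", "tactile-only", "auditory-tactile"]
MELODIES_PER_EMOTION = 14  # 4 emotions x 14 melodies = 56 stimuli


def build_session(seed=None):
    """Return a list of (condition, melody_order) blocks for one participant.

    Assumption: block order and within-block melody order are both random.
    """
    rng = random.Random(seed)

    # Hypothetical melody identifiers: (emotion, index within emotion).
    melodies = [
        (emotion, index)
        for emotion in EMOTIONS
        for index in range(1, MELODIES_PER_EMOTION + 1)
    ]

    blocks = CONDITIONS[:]
    rng.shuffle(blocks)  # randomize which stimulation condition comes first

    session = []
    for condition in blocks:
        order = melodies[:]
        rng.shuffle(order)  # each melody appears exactly once per block
        session.append((condition, order))
    return session


if __name__ == "__main__":
    for condition, order in build_session(seed=1):
        print(condition, "starts with", order[0])
```

Passing a per-participant seed keeps each presentation order reproducible for later auditing, which is a design choice of this sketch rather than something reported in the study.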

Participants were seated in a soundproof room and stimuli were presented via headphones (TDH-39, Diatec, Canada) for the auditory-only condition, via a vibrating glove device for the tactile-only condition, and via both headphones and a vibrating glove for the auditory-tactile condition. During the tactile-only condition, white noise was presented via headphones and the participant wore earplugs. The participant had to adjust the volume during practice trials so as not to hear the vibrating glove.

The vibrating glove was a replication of the glove used by Young et al. (2017) and was equipped with six independent audio-haptic voice-coil exciters. The voice-coil transducers (TEAX14C02-8 Compact Audio Exciter) had a diameter of 14 mm and were designed to deliver vibrotactile output. The frequency range of these speakers is 300 to 20,000 Hz. Stimuli were sent via a Dayton Audio DTA3116S Class D Micro Mini Amplifier (2 × 15 W), linked via an audio cable to the software Psyscope 1.2.5 (Cohen et al., 1993) on a Mac computer.

#### Analysis

The percentage of accurate responses, defined as the highest rating score for a melody corresponding to the intended emotion, was calculated for each participant for each emotion. For example, given a happy melody and a rating of Happy = 7, Sad = 3, Fear = 2, Peaceful = 6, the response would be counted as correct, whereas Happy = 6, Sad = 3, Fear = 2, Peaceful = 7 would be counted as incorrect. The same rating could never be used twice for any of the melody ratings.
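The scoring rule can be expressed as a short sketch. The function names and the dictionary layout are illustrative assumptions. Treating tied top ratings (for example, all zeros) as incorrect is also an assumption, since the text does not say how such trials were scored for accuracy.

```python
def is_correct(ratings, intended):
    """Return True if the intended emotion received the single highest rating.

    ratings: dict mapping emotion label -> rating on the 0-9 scale.
    Assumption: a tie for the highest rating counts as incorrect.
    """
    top = max(ratings.values())
    winners = [emotion for emotion, rating in ratings.items() if rating == top]
    return winners == [intended]


def percent_accurate(trials):
    """Percentage of correct trials; trials is a list of (ratings, intended)."""
    if not trials:
        raise ValueError("at least one trial is required")
    hits = sum(is_correct(ratings, intended) for ratings, intended in trials)
    return 100.0 * hits / len(trials)


if __name__ == "__main__":
    # The two example ratings given in the text for a happy melody.
    first = {"happy": 7, "sad": 3, "fear": 2, "peaceful": 6}
    second = {"happy": 6, "sad": 3, "fear": 2, "peaceful": 7}
    print(is_correct(first, "happy"))   # True
    print(is_correct(second, "happy"))  # False
    print(percent_accurate([(first, "happy"), (second, "happy")]))  # 50.0
```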

An ANOVA was used as an omnibus test to compare the percentage of accurate responses for stimulation conditions and emotions as within-subject factors and groups as a between-subject factor. A multivariate analysis of variance was used to compare the percentage of accurate responses between groups. To provide an estimation of multisensory benefits compared to unimodal stimulation, the increase in performance was measured by subtracting the score in the auditory-only condition from the score in the auditory-tactile condition. The results provide an estimation of the contribution of tactile stimulation.
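The multisensory benefit described above is a per-emotion difference score. The sketch below uses made-up percentages purely for illustration; they are not the study's data. The function name is hypothetical.

```python
def multisensory_gain(auditory_only, auditory_tactile):
    """Per-emotion gain: auditory-tactile score minus auditory-only score.

    Both arguments map emotion label -> percentage of accurate responses.
    A positive value suggests that adding tactile stimulation helped.
    """
    return {
        emotion: auditory_tactile[emotion] - auditory_only[emotion]
        for emotion in auditory_only
    }


if __name__ == "__main__":
    # Illustrative values only (not taken from the study).
    auditory_only = {"sadness": 70.0, "peacefulness": 40.0}
    auditory_tactile = {"sadness": 75.0, "peacefulness": 38.0}
    print(multisensory_gain(auditory_only, auditory_tactile))
```

A negative gain, as for the second illustrative emotion, corresponds to what the text reports as a "mean under 0%".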

## RESULTS

**Figure 1** displays the percentage of accurate responses for auditory, tactile and auditory-tactile conditions for each of the emotions. An ANOVA for stimulation conditions, emotions and groups was used as an omnibus test. There was a significant difference between groups (F(1,32) = 10.834, p = 0.002). There was also a significant interaction between the condition and emotion variables (p < 0.0001).

The multivariate analysis of variance used to compare the percentage of accurate responses revealed a statistically significant difference in conditions based on Group (F(12, 21) = 2.585, p = 0.027; Wilks' Λ = 0.404, partial η² = 0.596). **Table 1** shows that there were significant differences between groups for fear/threat auditory, peacefulness auditory and peacefulness tactile whereas no significant differences between groups were found in the other conditions.
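The reported multivariate statistics can be cross-checked against the standard two-group relations between Wilks' Λ, partial η², and the exact F statistic, reading the reported multivariate value as Λ = 0.404. Two assumptions are not stated explicitly in the text: that the p = 12 dependent variables are the 3 stimulation conditions × 4 emotions, and that all N = 34 participants entered the analysis.

```latex
% Partial eta squared from Wilks' Lambda, with s = min(p, df_h) = min(12, 1) = 1
\eta_p^2 = 1 - \Lambda^{1/s} = 1 - 0.404 = 0.596

% Exact F for two groups, with df = (p, N - p - 1) = (12, 21)
F = \frac{1 - \Lambda}{\Lambda} \cdot \frac{N - p - 1}{p}
  = \frac{0.596}{0.404} \cdot \frac{21}{12} \approx 2.58
```

Under these assumptions, the degrees of freedom and effect size match the reported values, and the F value agrees with the reported 2.585 up to rounding of Λ.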

Uncorrected t-tests revealed that for both groups, the mean percentage of accurate responses was above chance for auditory and auditory-tactile stimulation conditions for all types of emotions (p < 0.001). For tactile stimulation, uncorrected t-tests revealed that the mean percentage of accurate responses was above chance for both groups for happy (p < 0.001) and fear/threat emotions (controls: p = 0.002, musicians: p < 0.001), but not for sad (controls: p = 0.153, musicians: p = 0.747). Finally, an uncorrected t-test showed that musicians performed above chance for tactile stimulation for the peaceful emotion (t(16) = 2.170, p = 0.045), whereas another uncorrected t-test showed that controls performed below chance for tactile stimulation for the peaceful emotion (t(16) = −4.629, p < 0.001).

For the happiness and sadness conditions, no increases in performance were observed in the auditory-tactile compared to the auditory-only condition in either group (mean under 0%). For sadness and peacefulness, there were increases measured for controls (Sadness: 4% and Peacefulness: 12%), but not for musicians (mean under 0%). After correcting for multiple comparisons, the increase in performance between auditory and auditory-tactile stimulation was not significant for either musicians or controls (see **Table 2** for more details).

ANOVAs were used as an omnibus test to compare the number of errors between groups for each expected emotion (4). The dependent variable was the number of errors and the independent variables were groups, stimulation conditions and categories of the emotion scale. The ANOVAs for happiness (F(1,32) = 0.141, p = 0.710), sadness (F(1,32) = 0.196, p = 0.661) and fear/threat (F(1,31) = 3.061, p = 0.090) revealed no differences between groups. There was a significant difference between groups for peacefulness (F(1,32) = 10.691, p = 0.003). t-Test analysis revealed differences between groups for sadness when the expected emotion was peacefulness. This emotion had the highest error rate for both groups. The groups differed in the number of errors, but not in the type of emotion wrongly associated with peacefulness. In all conditions, both groups made the same kinds of errors for each type of emotion, as shown in **Table 3**. In the auditory stimulation condition, for both groups, the emotion with which happiness and sadness were most often confused was peacefulness. Similarly, for both groups, the emotion with which fear/threat and peacefulness were most often confused was sadness. Results were identical in the auditory-tactile stimulation condition. In the tactile stimulation condition, for both groups, the emotion with which happiness was most often confused was fear/threat. For all other emotions in the tactile stimulation condition, errors were distributed across the other three types of emotions. The missing values in **Table 3** arise because some errors could not be categorized: some participants gave a score of 0 to all types of emotions in the scale for a few trials.

TABLE 1 | Statistical results from the multivariate analysis of variance used to compare percentage of accurate responses between groups (auditory, tactile, auditory-tactile) for all four emotions.

Bold and <sup>∗</sup>p < 0.05.

## DISCUSSION

The main objective of the present study was to investigate auditory, tactile and auditory-tactile identification of emotion in musicians versus non-musicians. A significant difference between groups was found, with musicians showing better emotion identification for fear/threat in the auditory condition and for peacefulness in both the auditory and tactile conditions. Additionally, although the difference does not remain significant after correcting for multiple comparisons, the trend indicates a possible gain from adding tactile stimulation to the auditory stimuli in the peacefulness condition for controls (12%), but not for musicians (under 0%).

The significant differences found between controls and musicians can be linked to the complexity of the emotions displayed. It is well-known that happiness and sadness are the easiest emotions to identify because they are mainly based on tempo (see Terwogt and Van Grinsven, 1991). As such, it is not surprising that results revealed no difference between controls and musicians for happy (auditory, auditory-tactile and tactile) and sad (auditory and auditory-tactile) conditions, as there were ceiling effects. The average performance for sad for tactile stimulation did not differ between groups, nor did it differ from chance in either group. A more sensitive task would be needed to determine whether musical expertise can lead to more accurate identification of these emotions via auditory, tactile and auditory-tactile stimulation. Fear/threat is a musically less straightforward emotion than happiness and sadness (Vieillard et al., 2008; Tan et al., 2017). Hence, compared to controls, musicians more accurately identified that emotion in the auditory condition. In the same vein, the most complex and ambiguous emotion displayed in the sample melodies, namely peacefulness (Vieillard et al., 2008; Tan et al., 2017), was more accurately identified by musicians than by controls in both the auditory and the tactile conditions.

TABLE 3 | Mean percentage of correct responses and mean percentage of errors per emotion classified by the type of emotion wrongly identified.

Bold and <sup>∗</sup>p < 0.05.

Results from the auditory condition are consistent with the extensive literature demonstrating that musical training leads to brain plasticity and can improve music processing as measured by pitch, timing and timbre discriminations (for a review see Kraus and Chandrasekaran, 2010). Since the identification of emotions in music is based on psychoacoustic cues and musical features, the enhanced performance of musicians in the auditory condition was also to be expected. Furthermore, an important component of musical training is aimed at understanding and experiencing the full range of emotional meaning and expressiveness, however faint, of a musical performance (Castro and Lima, 2014). As such, it is not surprising that improved performance was only found in conditions where musicians had to identify subtle emotional qualities.

One recent study has investigated recognition of emotions in an auditory-only stimulation condition. The authors suggest a correlation between years of musical training and accuracy at identifying emotion in music and revealed a significant difference between groups for older musicians with respect to sad and fear emotions (Castro and Lima, 2014). It should be noted that a major limitation of this study was that the range of musical expertise of participants as measured in years was large (8–18 years), and that the average age of training onset was over 7 years of age, the known threshold beyond which music-induced structural changes and learning effects become less pronounced (for a review see Habib and Besson, 2009). As such, the lesser musical expertise of their younger participants may explain why they could not find any significant differences between groups. In contrast, results from the present study were obtained with participants whose average age of learning onset was 7 years of age, and whose average number of years of active practice of music was 20.2 years. All participants were working or studying full-time in the field of music and can be considered professional musicians. In addition, the average age of the participants was 34.4 years for controls and 28.9 years for musicians, which corresponds to the younger group of Castro and Lima (2014).

The present study was the first to investigate the tactile identification of emotions in music. Results revealed that both musicians and controls were able to identify emotions via tactile stimulation only, which is in itself a new and major finding. No study to date has investigated purely tactile identification of emotion in music. The only existing study along these lines was performed by Branje et al. (2013) and suggests that multisensory stimulation can increase emotion perception in film. By using the Emoti-Chair, a device that induces vibration in the back of normal-hearing participants, they found increases in skin conductance levels when vibrotactile stimuli were added to audio/visual film content. They also observed that not only the intensity of vibration but also the frequency of the vibrotactile stimuli played a role in the observed reactions. The present study results are consistent with Branje et al. (2013) and further support the hypothesis that both controls and musicians are able to extract meaningful information from the frequency characteristics of a signal presented through vibrations only. Furthermore, for the emotion of peacefulness, results revealed a significant difference between musicians and controls for tactile stimulation. These results are consistent with a previous study from our laboratory, the first to demonstrate that musicians were better at discriminating frequencies via tactile stimulation applied to the hand (Sharp et al., 2019). The enhanced ability to identify peaceful emotions in music via tactile stimulation suggests that more complex processes are improved following long-term musical training. This hypothesis should be verified using other types of complex emotions that are easier to identify via tactile stimulation than peacefulness.
Indeed, results in the peacefulness condition are above chance for musicians, but not for controls and the comparison of performance would be easier to interpret if both groups were above chance.

It is well-known that the frequency spectrum processed through the tactile modality (1 to 1000 Hz) is more limited than that of the hair cells of the cochlea (Rovan and Hayward, 2000). Which musical components are perceived through the tactile modality remains a matter of debate. Some studies suggest that non-musicians can detect different musical notes via the tactile modality (Hopkins et al., 2016) and that they can discriminate timbre (Russo et al., 2012). Furthermore, low frequencies in music are important to understanding beat and can be transmitted via vibrotactile devices (Van Dyck et al., 2013; Tranchant et al., 2017). All these psychoacoustic cues are known to be transmitted via the tactile modality and are all important for emotion identification in music. Further study should investigate whether other cues are used in the identification of emotion in the tactile domain or whether some of these cues are more important than others. All these studies support our results suggesting that non-musicians and musicians are able to identify emotion via tactile stimulation only.

Finally, the lack of significant difference between musicians and non-musicians in the auditory-tactile condition can be explained by the trend for controls toward exhibiting gain from tactile stimulation compared to musicians, as the latter were already too skilled in the auditory domain to benefit from tactile stimulation. Further studies should use more complex emotional stimuli to assess whether there could be a tactile gain for musicians, and investigate whether non-musicians' performance could become similar to that of musicians with training and feedback.

## DATA AVAILABILITY

The raw data supporting the conclusions of this manuscript will be made available by the authors, without undue reservation, to any qualified researcher.

## ETHICS STATEMENT

This study was carried out in accordance with the recommendations of Research Committee for Health Sciences of the University of Montreal and the Center for Interdisciplinary Research in Rehabilitation of Greater Montreal with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Research Committee for Health Sciences of the University of Montreal and the Center for Interdisciplinary Research in Rehabilitation of Greater Montreal.

## AUTHOR CONTRIBUTIONS

AS and FC designed and performed the experiment. All authors wrote the manuscript, discussed the results and implications, and commented on the manuscript at all stages.

## FUNDING

This work was supported by Natural Sciences and Engineering Research Council of Canada (RGPIN-2016-05211).

#### REFERENCES



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Sharp, Houde, Bacon and Champoux. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Dynamic Orchestration of Brains and Instruments During Free Guitar Improvisation

Viktor Müller<sup>1</sup>\* and Ulman Lindenberger<sup>1,2,3</sup>

<sup>1</sup> Center for Lifespan Psychology, Max Planck Institute for Human Development, Berlin, Germany, <sup>2</sup> Max Planck UCL Centre for Computational Psychiatry and Ageing Research, Berlin, Germany, <sup>3</sup> Max Planck UCL Centre for Computational Psychiatry and Ageing Research, London, United Kingdom

Playing music in ensemble requires enhanced sensorimotor coordination and the non-verbal communication of musicians that need to coordinate their actions precisely with those of others. As shown in our previous studies on guitar duets, and also on a guitar quartet, intra- and inter-brain synchronization plays an essential role during such interaction. At the same time, sensorimotor coordination as an essential part of this interaction requires being in sync with the auditory signals coming from the played instruments. In this study, using acoustic recordings of guitar playing and electroencephalographic (EEG) recordings of brain activity from guitarists playing in duet, we aimed to explore whether the musicians' brain activity synchronized with instrument sounds produced during guitar playing. To do so, we established an analytical method based on phase synchronization between time-frequency transformed guitar signals and raw EEG signals. Given phase synchronization, or coupling between guitar and brain signals, we constructed so-called extended hyper-brain networks comprising all possible interactions between two guitars and two brains. Applying a graph-theoretical approach to these networks assessed across time, we present dynamic changes of coupling strengths or dynamic orchestration of brains and instruments during free guitar improvisation for the first time. We also show that these dynamic network topology changes are oscillatory in nature and are characterized by specific spectral peaks, indicating the temporal structure in the synchronization patterns between guitars and brains. Moreover, extended hyper-brain networks exhibit specific modular organization varying in time, and binding each time, different parts of the network into the modules, which were mostly heterogeneous (i.e., comprising signals from different instruments and brains or parts of them). 
This suggests that the method capturing synchronization between instruments and brains when playing music provides crucial information about the underlying mechanisms. We conclude that this method may be an indispensable tool in the investigation of social interaction, music therapy, and rehabilitation dynamics.

Keywords: intra- and inter-brain coupling, brain-instrument coupling, graph-theoretical approach, EEG hyperscanning, phase synchronization, extended hyper-brain networks, social interaction

#### Edited by:

Assal Habibi, University of Southern California, United States

#### Reviewed by:

Donald Glowinski, Université de Genève, Switzerland; Shoji Tanaka, Sophia University, Japan

> \*Correspondence: Viktor Müller vmueller@mpib-berlin.mpg.de

Received: 15 May 2019 Accepted: 20 August 2019 Published: 04 September 2019

#### Citation:

Müller V and Lindenberger U (2019) Dynamic Orchestration of Brains and Instruments During Free Guitar Improvisation. Front. Integr. Neurosci. 13:50. doi: 10.3389/fnint.2019.00050

## INTRODUCTION

As noted by D'Ausilio et al. (2015): "Group-level musical coordination can be considered as a microcosm of social interaction." A recently emerging view in music neuroscience with regard to hyperscanning methods holds that playing music in groups is not only social and interactive (Keller et al., 2014; Acquadro et al., 2016; Chang et al., 2017), but that it requires strong inter-brain synchronization and specific hyper-brain network activity supporting interpersonal action coordination (Lindenberger et al., 2009; Sänger et al., 2011, 2012, 2013; Müller et al., 2013, 2018b). This hyper-brain network activity including both intra- and inter-brain synchronization is enhanced during periods of high demands on musical coordination and is accompanied by the emergence of so-called hyper-brain modules composed of nodes from two or more brains. It has also been shown that the topology of hyper-brain networks involving two (Sänger et al., 2012; Müller et al., 2013) or even four (Müller et al., 2018b) brains revealed small-world properties with high segregation and integration of brain function, and had a tendency to become more random at lower frequencies and more regular at higher frequencies. Moreover, this topology was characterized by a higher number of hub-connectors at the delta and theta frequency of brain signals during joint, as compared to solo guitar playing (Müller et al., 2013). Furthermore, different types of information flow—intra- vs. inter-modular—were found when playing guitar in quartet (Müller et al., 2018b). Nevertheless, little, if anything, is known about the interaction between brain processes or mechanisms implementing interpersonally coordinated behavior when playing music and the instruments used for music production.

In the current study, we attempt to close the conceptual gap between these important elements of musically coordinated behavior—music production and its neuronal implementation. Using acoustic recordings of guitar playing and electroencephalographic (EEG) recordings of the brain activity of guitarists playing guitar in duet, we tried to establish a method to investigate the couplings within and between all components of duet playing, i.e., the guitars and brains. All these couplings were then used to construct and to analyze a so-called extended hyper-brain network including two guitars and two brains and all connections within and between them. We describe different situations of connections between guitars and brains here, which together exhibit complex networks with network topology changing in time and reflecting the dynamic orchestration of brains and instruments in guitar duos.

## MATERIALS AND METHODS

#### Participants

Two pairs of professional guitarists participated in the study. In both pairs, the lead guitarist was always the same individual. The guitarists in the duo were not known to each other. All participants were right-handed and had played the guitar professionally for more than 5 years. The study was approved by the ethics committee of Max Planck Institute for Human Development (Berlin), and therefore performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki. All subjects volunteered for this experiment and gave their written informed consent prior to their inclusion in the study.

## Experimental Procedure and Data Acquisition

During the experiment, the guitarists sat facing each other and freely improvised in duet for about 5–6 min. Before playing, the guitarists had the opportunity to briefly discuss the theme of their improvisation. Typically, one guitarist played a single-line melody or solo, while the other accompanied with chords or in another way. They also switched roles several times during the improvisation. Participants were instructed to avoid unnecessary movement in order to reduce movement artifacts.

Acoustic and EEG measurement took place in an acoustically and electromagnetically shielded cabin. EEG was simultaneously recorded from both members of the pair using two electrode caps with 64 Ag/AgCl electrodes placed according to the international 10–10 system, with the reference electrode placed at the right mastoid. For further analysis, we used 40 EEG channels for each subject. These channels or electrodes were distributed across the entire cortex, so that the information of the remaining electrodes would be rather redundant. Separate amplifiers with separate grounds were used for each individual, optically coupled to a computer. The vertical and horizontal electrooculogram (EOG) was also recorded to control for eye blinks and eye movements. Sampling rate was 5,000 Hz. The anti-aliasing filter was set to 1,000 Hz. A notch filter was used to suppress 50 Hz noise. EEG recordings were re-referenced offline to an average of the left and right mastoids and then filtered with a band pass ranging from 0.5 to 70 Hz. Eye movement correction was accomplished by independent component analysis (Vigário, 1997). Thereafter, artifacts from head and body movements were rejected by visual inspection. Spontaneous EEG activity was resampled at 250 Hz and divided into artifact-free 10-s epochs. In all, 10 artifact-free 10-s epochs were used for coupling and network analyses.

The sounds of the guitars were recorded through two microphones (i.e., one for each guitar) on two EEG channels, simultaneously with the EEG recordings. These two sound signals were divided into corresponding 10-s epochs without resampling. In addition, video and sound were recorded using Video Recorder Software (Brain Products, Munich, Germany) synchronized with EEG data acquisition.

#### Data Analysis

To investigate phase synchronization or coupling between the signals, we first normalized the high-frequency auditory signals and applied an analytic Morlet wavelet transform (**Figures 1A,B**) to calculate the power or amplitude within the four different frequency ranges: low (50–250 Hz), middle (250–500 Hz), high (500–2,000 Hz), and whole range (50–2,000 Hz). By averaging the amplitude within these four frequency ranges, we generated low-frequency time series that varied in a frequency range comparable to the EEG time series (see **Figure 1** for details). To investigate phase coupling between the given signals in a directed and frequency-resolved manner, we calculated the Integrative Coupling Index (ICI) described elsewhere (Müller and Lindenberger, 2011; Müller et al., 2013). To do so, we applied an analytic complex-valued Morlet wavelet transform to both the generated auditory (**Figures 1C,D**) and EEG time series (**Figures 1E,F**), and computed the instantaneous phases for four frequencies of interest or frequency components (FC): 1.25, 2.5, 5, and 10 Hz (FC1, FC2, FC3, and FC4, respectively).

FIGURE 1 | … (H) Coding of the phase difference (–π/4 < Δϕ < 0: blue stripes; 0 < Δϕ < +π/4: red stripes; Δϕ < –π/4 or Δϕ > +π/4: green stripes = non-synchronization).

The complex mother Morlet wavelet, also called Gabor wavelet, has a Gaussian shape around its central frequency f:

$$w(t, f) = \frac{1}{\sqrt[4]{\sigma^2 \pi}} \exp\left(-\frac{t^2}{2\sigma^2} + 2\pi j f t\right), \quad j = \sqrt{-1} \tag{1}$$

where σ is the standard deviation of the Gaussian envelope of the mother wavelet. The frequency resolution of the wavelet transform was fixed at 0.125 Hz, and time resolution was fixed at 4 ms.
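The wavelet definition and the phase extraction it supports can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the wavelet is written in its standard form with the carrier exp(2πjft), the width σ = n_cycles / (2πf) with `n_cycles = 7` is an assumption (the text does not state σ), the kernel is truncated at ±3σ, and the function names are hypothetical.

```python
import cmath
import math


def morlet(t, f, sigma):
    """Complex Morlet (Gabor) wavelet at time t (s) for centre frequency f (Hz)."""
    norm = (sigma ** 2 * math.pi) ** -0.25
    return norm * cmath.exp(-t ** 2 / (2 * sigma ** 2) + 2j * math.pi * f * t)


def instantaneous_phase(signal, fs, f, n_cycles=7.0):
    """Instantaneous phase (rad) of `signal` at frequency f, one value per sample.

    ASSUMPTION: sigma = n_cycles / (2*pi*f); the source does not give sigma.
    ASSUMPTION: the wavelet is truncated at +/- 3 sigma.
    """
    sigma = n_cycles / (2 * math.pi * f)
    half = int(round(3 * sigma * fs))
    offsets = range(-half, half + 1)
    # Conjugated kernel, so the sum below is the wavelet coefficient z(t, f).
    kernel = [morlet(k / fs, f, sigma).conjugate() for k in offsets]
    phases = []
    for n in range(len(signal)):
        z = 0j
        for k, w in zip(offsets, kernel):
            if 0 <= n + k < len(signal):
                z += signal[n + k] * w
        phases.append(cmath.phase(z))  # arg{z}, in (-pi, pi]
    return phases
```

The direct summation is slow for long recordings; an FFT-based convolution would be the usual choice in practice, but the explicit loop keeps the link to the definition visible.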

In order to identify the phase relations between any two channels (X and Y), the instantaneous phase difference ΔΦ<sub>XY</sub>(t, f) was computed from the wavelet coefficients for all possible electrode and transformed acoustic signal pairs:

$$
\Delta \Phi_{XY}^k(t, f) = \text{mod}\left(\Phi_X^k\left(t, f\right) - \Phi_Y^k\left(t, f\right), 2\pi\right) \tag{2}
$$

with instantaneous phases of the two signals across k data points in the segment: $\Phi_X^k(t, f) = \arg\{z_X^k(t, f)\}$ and $\Phi_Y^k(t, f) = \arg\{z_Y^k(t, f)\}$.

Given the estimates of the phase difference between the two signals, it is possible to ascertain how long the phase difference remains stable in defined phase angle boundaries by counting the number of points that are phase-locked within a defined time window. In accordance with the procedure described by Müller and Lindenberger (2011) (cf. Müller et al., 2013), we divided the range between –π/4 and +π/4 into two ranges and distinguished between positive and negative deviations from phase zero (**Figure 1G**). We marked negative deviations in the range between –π/4 and 0 in blue (coded with "−1") and the positive deviations in the range between 0 and +π/4 in red (coded with "+1"). Phase difference values beyond these ranges were marked in green (coded with "0") and represent non-synchronization (**Figure 1H**). For two channels X and Y, a blue stripe in the diagram would mean that the phase of channel Y precedes the phase of channel X, while a red stripe would mean that the phase of channel X precedes the phase of channel Y. We then counted the number of phase-locked data points, for both ranges separately. Before counting, successive points in the defined range (between –π/4 and +π/4) with a time interval shorter than a period (T<sub>i</sub> = 1/f<sub>i</sub>) of the corresponding oscillation at the given frequency f<sub>i</sub> were discarded from the analysis. This cleaning procedure effectively eliminated instances of accidental synchronization. On the basis of this counting, we obtained four synchronization indices: (1) the Positive Coupling Index, PCI, or the relative number of phase-locked points in the positive range (between 0 and π/4); (2) the Negative Coupling Index, NCI, or the relative number of phase-locked points in the negative range (between –π/4 and 0); (3) the Absolute Coupling Index, ACI, or the relative number of phase-locked points in the positive and negative range (i.e., between –π/4 and +π/4); and (4) the Integrative Coupling Index, ICI, calculated by the formula:

$$ICI = \frac{PCI + ACI}{2 \cdot ACI} \cdot \sqrt{PCI} \tag{3}$$

The ICI is equal to 1 when all points are phase-locked and positive; if all phase-locked points are negative or are out of range, the ICI will approach 0. Thus, the ICI measure ranges from 0 to 1 and is asymmetric (ICI<sub>XY</sub> ≠ ICI<sub>YX</sub>), indicating the relative extent of the positive shift in phase difference between two signals. We restrict the description of our study results to the ICI measure, which is the most informative index due to its directionality. For dynamic representation of coupling within the 10-s segments, we calculated phase coupling using moving time windows of 2,000 ms width and 80 ms time delay. Overall, within a segment of 10-s duration, coupling measures across 101 time windows were collected by this shifting procedure. The Matlab code to calculate the ICI measure from the phase difference between the two signals can be found in the **Supplementary Material**.
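The coding, counting and windowing steps can be illustrated with a short Python sketch. It is not the authors' Matlab code, and it rests on several assumptions: phase differences are wrapped to [−π, π) before coding (Eq. 2 by itself yields values in [0, 2π)), a difference of exactly zero is counted as positive, and the cleaning step is read as discarding runs of phase-locked samples shorter than `min_run` samples (one period would be `fs / f`). All function names are hypothetical.

```python
import math


def wrap(phi):
    """Wrap an angle to [-pi, pi)."""
    return (phi + math.pi) % (2 * math.pi) - math.pi


def code_phase_differences(dphi, min_run=1):
    """Code samples as +1 (0..+pi/4), -1 (-pi/4..0) or 0 (not phase-locked).

    ASSUMPTION: runs of phase-locked samples shorter than `min_run` are reset
    to 0; this is one reading of the cleaning procedure described in the text.
    """
    codes = []
    for phi in dphi:
        d = wrap(phi)
        if 0 <= d < math.pi / 4:
            codes.append(1)
        elif -math.pi / 4 < d < 0:
            codes.append(-1)
        else:
            codes.append(0)
    start = None
    for n in range(len(codes) + 1):
        locked = n < len(codes) and codes[n] != 0
        if locked and start is None:
            start = n
        elif not locked and start is not None:
            if n - start < min_run:
                codes[start:n] = [0] * (n - start)
            start = None
    return codes


def coupling_indices(codes):
    """Return (PCI, NCI, ACI, ICI) for one window of coded samples."""
    n = len(codes)
    pci = sum(c == 1 for c in codes) / n
    nci = sum(c == -1 for c in codes) / n
    aci = pci + nci
    # Eq. 3; ICI is set to 0 when no sample is phase-locked (ACI = 0).
    ici = 0.0 if aci == 0 else (pci + aci) / (2 * aci) * math.sqrt(pci)
    return pci, nci, aci, ici


def window_starts(segment_ms=10_000, width_ms=2_000, step_ms=80):
    """Start times (ms) of the moving windows within one segment."""
    return list(range(0, segment_ms - width_ms + 1, step_ms))
```

With the default arguments, `window_starts()` yields 101 windows, matching the count given above: (10,000 − 2,000) / 80 + 1 = 101.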

#### Network Construction and Calculation of Strengths

For construction of the extended hyper-brain network, we calculated the ICI between all electrode pairs within and between the brains as well as within and between the guitars using the four different frequency components corresponding to the four ranges described above (low, middle, high, and whole range). In addition, we calculated the ICI between the brains and the guitars. Given all these couplings, we finally constructed an extended hyper-brain network comprising 88 nodes (40 + 40 + 4 + 4) and 7,656 edges (all possible couplings between the nodes) for each FC (1.25, 2.5, 5, and 10 Hz) and each time window. **Figure 2** shows an example of an extended hyper-brain network in the form of the connectivity matrix (**Figure 2A**) and connectivity maps (**Figure 2B**).
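As a quick check of the stated network size, assuming that each ordered pair of distinct nodes carries one directed edge (the ICI being asymmetric):

$$N = 40 + 40 + 4 + 4 = 88, \qquad E = N(N - 1) = 88 \times 87 = 7{,}656$$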

In order to determine the network properties, we set the cost level (ratio of the number of actual connections divided by the maximum possible number of connections in the network) to 25%, which makes it possible to investigate sparse networks with the same number of edges at different FCs and time windows. The connectivity threshold was always higher than the significance level determined by the surrogate data procedure (i.e., networks at this cost or sparsity level always included significant connections). This allowed more accurate examination of the network topology in the different duos and playing conditions. Surrogate data were created in two ways: (1) by random permutations of the time series under consideration, and (2) by phase permutation of the given time series. The phase permutation procedure involved: (a) computing the amplitude and phase spectrum of a real signal using a Fourier transformation; (b) phase shuffling, whereby the phase values of the original spectrum are used in random order and the sorted values of the surrogate sequence are replaced by the corresponding sorted values of the reference sequence; and (c) inverse Fourier transformation back to the time domain. In this way, the real and the surrogate data retain the same power spectrum but a different time course due to phase shuffling. For both surrogate data procedures, 10,000 permutations were used.
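Fixing the cost level amounts to keeping a fixed fraction of the strongest connections. A minimal Python sketch of this step follows; the function name, the `weights[i][j]` layout (coupling from node i to node j) and the tie-breaking rule are assumptions made for illustration, and the comparison with the surrogate-based significance level is not reproduced here.

```python
def threshold_to_cost(weights, cost=0.25):
    """Keep the strongest directed edges so that the retained fraction is `cost`.

    ASSUMPTION: weights[i][j] holds the coupling from node i to node j.
    The diagonal is ignored. Returns a binary adjacency matrix.
    Ties at the cut-off are broken by node order (arbitrary for this sketch).
    """
    n = len(weights)
    edges = [(weights[i][j], i, j) for i in range(n) for j in range(n) if i != j]
    n_keep = round(cost * len(edges))
    edges.sort(key=lambda e: -e[0])  # strongest first; stable for ties
    adjacency = [[0] * n for _ in range(n)]
    for _, i, j in edges[:n_keep]:
        adjacency[i][j] = 1
    return adjacency
```

For the 88-node network described above, a cost of 25% retains 0.25 × 7,656 = 1,914 edges in every frequency component and time window.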

As ICI is a directed weighted measure, we obtained the nodes' in- and out-strengths, with the in-strength defined as the sum of weights of all incoming connections ($w_{ji}$), $S_{in} = \sum_{j \in N} w_{ji}$, and the out-strength as the sum of weights of all outgoing connections ($w_{ij}$), $S_{out} = \sum_{j \in N} w_{ij}$. For representation of network dynamics, we used out-strengths ($S_{out}$) that were first determined for each node separately, and then grouped and summed for: (a) the out-strength going from each node of the guitar (A and B, separately) to both brains of the guitarists, (b) the out-strength going from guitar A to guitar B and vice versa, (c) the coupling within the brains for each guitarist (A and B) separately, (d) the coupling between the brains with the out-strength going from guitarist A's brain to guitarist B's brain and vice versa, (e) the hyper-brain network comprising electrodes or nodes from two guitarists' brains.
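The strength definitions and the grouping by node sets can be sketched as follows. The matrix layout (`weights[i][j]` is the weight of the connection from node i to node j) and both function names are assumptions for illustration; any mapping of node indices to brains and guitars would likewise be hypothetical.

```python
def node_strengths(weights):
    """Return (in-strengths, out-strengths), one value per node.

    ASSUMPTION: weights[i][j] is the weight of the connection from i to j.
    """
    n = len(weights)
    s_in = [sum(weights[j][i] for j in range(n) if j != i) for i in range(n)]
    s_out = [sum(weights[i][j] for j in range(n) if j != i) for i in range(n)]
    return s_in, s_out


def group_out_strength(weights, sources, targets):
    """Summed weight of all connections going from `sources` to `targets`."""
    return sum(weights[i][j] for i in sources for j in targets if i != j)
```

For example, if the four nodes of one guitar were stored at indices 80 to 83 and the 80 EEG nodes at indices 0 to 79 (a hypothetical layout), `group_out_strength(weights, [80], range(80))` would give the out-strength from that guitar's first node to both brains.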

As shown in **Figure 2**, the out-strengths are visualized in two different ways. First, in the connectivity maps (**Figure 2B**), the strengths are coded by the size of the nodes. Second, we present the topological distribution of strength, as displayed in **Figure 2C**.

### Modularity Analysis and Modular Organization of the Extended Hyper-Brain Network

To further investigate the modular organization of the networks, community structures and the modularity index (Q) were determined. For this calculation, we used the modularity optimization method for directed graphs that is implemented in the Brain Connectivity Toolbox (Rubinov and Sporns, 2010). The optimal community structure is a subdivision of the network into non-overlapping groups of nodes or communities in a way that maximizes the number of within-module edges and minimizes the number of between-module edges. Q is a statistic that quantifies the degree to which the network may be subdivided into such clearly delineated groups or modules. For directed networks, this is given by the formula (Leicht and Newman, 2008):

$$Q^{\rightarrow} = \frac{1}{l} \sum_{i,j \in N} \left[ a_{ij} - \frac{k_i^{in} k_j^{out}}{l} \right] \cdot \delta_{m_i, m_j}, \tag{4}$$

where $l = \sum_{ij} a_{ij}$ is the number of edges in the graph, $a_{ij}$ is defined to be 1 if there is an edge from j to i and 0 otherwise, $k_i^{in}$ and $k_j^{out}$ are the in-degree of node i and the out-degree of node j, and $\delta_{m_i, m_j}$ is the Kronecker delta, where $\delta_{m_i, m_j} = 1$ if $m_i = m_j$, and 0 otherwise. High modularity values indicate strong separation of the nodes into modules. For this analysis, the extended hyper-brain network including all the nodes of the instruments and brains was used. Because within-module edges are maximized by this partition procedure, the module or network community comprises those nodes with the strongest connections, which can represent different parts of the brains or instruments. One can assume that these modules or communities must have a specific functional meaning (Müller et al., 2018b). The community structures are presented as connectivity maps, where different modules are coded by color (see **Figure 2D** for an example).
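To make the modularity statistic concrete, the following Python sketch evaluates Q for a partition that is already given, using the directed formulation of Leicht and Newman (2008) with the edge convention stated above (an entry is 1 if there is an edge from j to i). It does not perform the optimization over partitions, for which the study used the Brain Connectivity Toolbox; the function name is hypothetical.

```python
def directed_modularity(adjacency, modules):
    """Directed modularity Q for a fixed partition.

    adjacency[i][j] == 1 if there is an edge from j to i, else 0.
    modules[i] is the module label of node i.
    """
    n = len(adjacency)
    n_edges = sum(sum(row) for row in adjacency)
    # Row sums count edges arriving at a node; column sums count edges leaving it.
    k_in = [sum(adjacency[i]) for i in range(n)]
    k_out = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    q = 0.0
    for i in range(n):
        for j in range(n):
            if modules[i] == modules[j]:  # Kronecker delta
                q += adjacency[i][j] - k_in[i] * k_out[j] / n_edges
    return q / n_edges
```

As a small worked case, two disconnected directed three-node cycles split into their natural two modules give Q = 0.5, whereas placing all six nodes in a single module gives Q = 0.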

#### RESULTS

Here, we present exemplary data on the coupling between the instrument sounds and the brains of two guitarists improvising freely in duets. **Figures 3**, **4** display the traces of guitars A and B (panel A), dynamic changes of coupling strengths going from guitar A (panel B) and guitar B (panel C) to the brains of both guitarists for each of the four frequency components of the guitar signals, dynamic changes of coupling strengths going from guitar A to guitar B and vice versa (panel D), dynamic changes of coupling strengths within the brains of each of the two guitarists (panel E), dynamic changes of coupling strengths going from guitarist A's brain to guitarist B's brain and vice versa (panel F), and brain connectivity maps (left column) of the coupling within (upper map) and between (lower map) the brains, with extended hyper-brain community structures (middle column) as well as the topological distribution of coupling strengths (right column) also within (upper maps) and between (lower maps) the brains of the two guitarists at the three representative time points (panel G). **Figures 3**, **4** display the coupling strengths of the 10-s segment for FC1 (i.e., 1.25 Hz) in the two guitar duos, respectively. Results of coupling strengths for the other FCs (i.e., 2.5, 5, and 10 Hz) of these musical sequences of the two duos can be found in **Supplementary Figures 1–6**. The auditory representation of the 10-s segments analyzed here and the corresponding visualization of coupling strengths in real time can also be found in **Supplementary Movies 1, 2**. Please note that the soundtracks used for the coupling analyses described here were reconstructed from microphone records stored on the EEG computer with a sampling rate of 5,000 Hz. Notwithstanding this low sampling frequency, the auditory signals reproduce the guitar tones well (**Supplementary Movies 1**, **2**).

## Coupling Dynamics During a 10-s Segment in Duo 1

As can be seen in **Figure 3**, during the first time period (3.0 s) the communication between the guitars was realized through the connections coming from guitar A, which also sent strong connections to the right-temporal regions of guitarist A's brain. The brain of guitarist A also communicated with both guitars, while guitarist B's brain was connected to both guitars as well as guitarist A's brain. The strengths within the brain of guitarist A were predominantly strongest at parieto-occipital and frontal regions, and at parieto-occipital and left-temporal regions within the brain of guitarist B. The strongest between-brain strengths predominantly came from right-temporal regions of guitarist B's brain. Modularity analysis of the extended hyper-brain network including both guitars and both brains revealed two modules: one module (blue) comprised guitarist A's brain (except for two left fronto-temporal nodes) and both guitars (except for the low-frequency node of guitar B); the second module (red) correspondingly comprised guitarist B's brain together with the two nodes from guitarist A's brain and the one node from guitar B mentioned above. During the next time period (6.0 s), both guitars and both brains were synchronized with each other to some extent. Guitar A exhibited bidirectional connections with guitarist B's brain, while guitar B communicated with both brains, but especially guitarist B's. The within-brain connection strengths were asymmetric in the two brains with right-temporal (guitarist A) and left-temporal (guitarist B) location predominance, also in frontal and parieto-occipital regions. The between-brain connections were rather scarce and predominantly located in right-temporal regions in both brains. The modularity analysis also partitioned the extended hyper-brain network into two modules, with a blue module comprising guitarist A's brain (except for four right fronto-temporal nodes) and three nodes of guitar B, and a red module comprising all remaining nodes (see **Figure 3** for details). Finally, at the end of the 10-s segment (9.5 s), both brains were under the strong influence of both guitars, and they also strongly communicated with their own guitars and each other. Within the brains, the strongest strengths of guitarist A were located predominantly in the parieto-occipital regions, while those of guitarist B were strongest in the fronto-central regions. The strongest between-brain connection strengths were densely located in parieto-occipital and left-temporal regions in both brains. This time modularity analysis revealed three modules. The biggest (red) comprised three nodes from guitar A, all nodes from guitar B, central and parieto-occipital nodes from guitarist A's brain, and occipito-parietal and left-temporal nodes from guitarist B's brain. The second module (blue) was restricted to guitarist B and comprised the remaining nodes from guitarist B's brain. The third module (green) comprised the fronto-temporal nodes from guitarist A's brain, and also one node from guitar A.

FIGURE 3 | Dynamic changes of strengths during a 10-s improvisation period for FC1 (1.25 Hz) in duo 1. (A) Guitar traces obtained by microphone recording: guitar A, blue; guitar B, red. (B) Dynamic changes of coupling strengths going from guitar A to the brains of both guitarists for each of the four frequency ranges of the guitar signal, which are indicated by color: low range, brown; middle range, cyan; high range, purple; whole range, yellow. (C) Dynamic changes of coupling strengths going from guitar B to the brains of both guitarists for each of the four frequency ranges of the guitar signal, which are indicated by the same colors as in (B). (D) Dynamic changes of coupling strengths going from guitar A to guitar B (blue) and vice versa (red). (E) Dynamic changes of coupling strengths within the brains of each of the two guitarists. (F) Dynamic changes of coupling strengths going from guitarist A's brain to guitarist B's brain (blue) and vice versa (red). (G) Brain connectivity maps, community structures, and topological distribution of coupling strengths. See Figure 2 for explanations.

FIGURE 4 | Dynamic changes of strengths during a 10-s improvisation period for FC1 (1.25 Hz) in duo 2. (A) Guitar traces obtained by microphone recording: guitar A, blue; guitar B, red. (B) Dynamic changes of coupling strengths going from guitar A to the brains of both guitarists for each of the four frequency ranges of the guitar signal, which are indicated by color: low range, brown; middle range, cyan; high range, purple; whole range, yellow. (C) Dynamic changes of coupling strengths going from guitar B to the brains of both guitarists for each of the four frequency ranges of the guitar signal, which are indicated by the same colors as in (B). (D) Dynamic changes of coupling strengths going from guitar A to guitar B (blue) and vice versa (red). (E) Dynamic changes of coupling strengths within the brains of each of the two guitarists. (F) Dynamic changes of coupling strengths going from guitarist A's brain to guitarist B's brain (blue) and vice versa (red). (G) Brain connectivity maps, community structures, and topological distribution of coupling strengths. See Figure 2 for explanations.

## Coupling Dynamics During a 10-s Segment in Duo 2

As can be seen in **Figure 4**, during the first time period (3.0 s), both guitars showed strong connections to both brains (especially to guitarist B's brain) and to each other. These connections were mostly bidirectional, going from and to the brains/guitars. The within-brain connections were strongest in guitarist A's brain, with the strongest strengths predominantly located in fronto-temporal regions, while those in guitarist B's brain were located predominantly right-parietally. The between-brain strengths were strongest at the right and left temporal regions, for guitarist A and B, respectively. Modularity analysis of the extended hyper-brain network revealed two modules: one module (blue) comprised guitarist B's brain (except for three left fronto-temporal nodes) and both guitars; the second module (red) correspondingly comprised guitarist A's brain together with the three nodes from guitarist B's brain mentioned above. During the next time period (6.0 s), both guitars also showed strong connections to both brains and to each other, with links going from guitar A to both brains and links from guitar B predominantly to guitarist B's brain. These connections were mostly unidirectional, going from guitars to brains (especially from guitar B), indicating strong influence coming from the guitars. The within-brain connections were strongest in both guitarists at the fronto-temporal regions, while the between-brain strengths were densely located at frontal regions in guitarist A's brain and at occipital and right-temporal regions in guitarist B's brain. The modularity analysis partitioned the extended hyper-brain network into three modules, with two modules (blue and red) comprising both guitarists' brains (with the blue module additionally comprising three nodes from guitar A), and a third module (green) comprising the remaining node from guitar A, all nodes from guitar B, and frontal, central as well as right-temporal nodes from guitarist B's brain (see **Figure 4** for details). Finally, at the end of the 10-s segment (9.8 s), both guitars and both brains were synchronized with each other to some extent. Within the brains, the strengths of guitarist A were strongest predominantly in the frontal and temporal regions, while those of guitarist B were considerably reduced and occupied the frontal and occipital regions. The between-brain connection strengths were strongest in the right parieto-occipital regions in guitarist A's brain and in right-temporal and frontal regions in guitarist B's brain. The modularity analysis revealed three modules: one module (blue) comprised all nodes from guitarist A's brain and one fronto-temporal node from guitarist B's brain; the second module (red) comprised the left part and some right parieto-occipital nodes of guitarist B's brain, together with one node from guitar A and all nodes from guitar B; and the third module (green) comprised the remaining part of guitarist B's brain and the three remaining nodes from guitar A.

The inspection of other frequency components in both segments (see **Supplementary Figures 1**–**6**) showed (i) different temporal dynamic changes of network strengths for these frequency components, and (ii) that the networks within the same time periods exhibit different connectivity and community structures. All this indicates a complex interplay between different frequency components and underlying networks of the guitarists and instruments when improvising freely. Furthermore, it can be seen that dynamic changes of strengths themselves are oscillatory in nature and can be considered as second-order oscillations (Müller et al., 2018b).

## Power Spectral Density (PSD) of Second-Order Oscillations

Further, we investigated the PSD of the second-order oscillations emerging through dynamic changes of coupling strengths between the instruments and brains as shown previously. To do so, we calculated the PSD across all 10 trials and 4 frequency components within the two duos for all the coupling strengths presented before. The PSD values were then averaged for the four frequency components separately. As can be seen in **Figure 5**, there are several PSD peaks in the frequency range between 0 and 1.5 Hz. Closer inspection of spectral peaks and the PSD course indicates that these peaks or frequency components can be divided into harmonics with a fixed frequency ratio (e.g., 1:2, 1:3, 1:4, etc.). This suggests that the coupling strengths between the instruments and the brains have a specific temporal structure, which apparently facilitates the free guitar improvisation and the underlying musical performance.
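The idea of locating spectral peaks and checking their frequency ratios can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis used in the study: the PSD estimator is not specified in this section, so a plain periodogram (direct DFT) stands in for it, and the coupling-strength series is synthetic, built from three sinusoids at hypothetical frequencies of 0.25, 0.5, and 0.75 Hz. The function names are likewise hypothetical.

```python
import cmath
import math


def periodogram(x, fs, freqs):
    """Power of x at each frequency in freqs (Hz), direct DFT after mean removal."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    power = []
    for f in freqs:
        acc = sum(v * cmath.exp(-2j * math.pi * f * t / fs)
                  for t, v in enumerate(centered))
        power.append(abs(acc) ** 2 / (fs * n))
    return power


def spectral_peaks(freqs, power, rel_height=0.1):
    """Frequencies of local maxima exceeding rel_height times the largest value."""
    floor = rel_height * max(power)
    return [freqs[i] for i in range(1, len(power) - 1)
            if power[i] > power[i - 1]
            and power[i] >= power[i + 1]
            and power[i] > floor]


# Synthetic coupling-strength series (assumption): 40 s sampled at 10 Hz.
fs, n = 10.0, 400
components = [(0.25, 1.0), (0.5, 0.6), (0.75, 0.4)]   # (frequency in Hz, amplitude)
strength = [sum(a * math.sin(2 * math.pi * f * t / fs) for f, a in components)
            for t in range(n)]

freqs = [k / 40 for k in range(1, 61)]                # 0.025 ... 1.5 Hz
power = periodogram(strength, fs, freqs)
peaks = spectral_peaks(freqs, power)
ratios = [round(p / peaks[0], 2) for p in peaks]      # ratio to the lowest peak
print(peaks, ratios)
```

Peaks whose ratios to the lowest peak are close to integers would correspond to the harmonic structure described above; with real data, averaging across trials and a tolerance on the ratios would be needed.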

## DISCUSSION

The primary objective of this study was to investigate the dynamic orchestration of brains and instruments during free guitar improvisation based on phase synchronization and extended hyper-brain network architecture including all the coupling types within and between the musicians' brains and guitars. The main findings are that such extended hyper-brain networks when playing guitar in duet (1) exhibit different temporal dynamic changes of network strengths, which are oscillatory in nature and have specific temporal structure for different EEG and instrument frequency components, and (2) feature different connectivity and community structures combining different brain regions and frequency components of instrument sounds, which share different, mostly heterogeneous modules (i.e., comprising different brains and instruments). It should be noted that the instrument's sound is a result of the musician's behavior, which is based on sensorimotor synchronization and action. At the same time, this sound influences the behavior of musicians through auditory sensory pathways and is in this sense an actor. In our view, music improvisation and interaction can be understood only when considering both bidirectional influences.

Previously, we have shown that both coupling strengths and community structures change their patterns depending on the oscillation frequency and musical situation (Müller et al., 2013, 2018b). We also pointed out that coupling strengths oscillate and exhibit so-called second-order oscillations, and that "the most important characteristic of the hyperbrain network organization is the existence of so-called hyperbrain modules sharing electrodes from two, three, or even four brains and characterized by strong connections or information flow within the modules and weak connections or information flow between the modules" (Müller et al., 2018b, p. 207). Here, we expand this observation to extended hyper-brain networks including instruments and their players' brains. It has been shown that modules or community structures can comprise one or two instruments as well as one or two brains, or different parts of them. Such attuned modular organization of extended hyper-brain networks provides efficient information flow within and between the brains and instruments (through behavioral control), and apparently supports free improvisation when playing guitar in duet. The fact that modularity structure and coupling strengths dynamically change during playing indicates that brain-brain and brain-instrument interactions are never controlled by the same brain regions but rather alter regional dominances in accordance with the current musical situation and/or expectations. Even if it can be assumed that sensorimotor brain regions play an important role in music production, it is to be expected that fronto-parietal brain regions, which have to do with touch, theory of mind or intentions of others, together with the visual, auditory, and somatosensory cortices, are important teammates in music improvisation (Zatorre et al., 2007; Keller et al., 2014). It has been suggested that adaptive timing, motor sequencing, spatial organization of movements as well as anticipatory and attentional processes that enable rhythmic joint action and interaction are supported by distributed networks of cortical and subcortical brain regions (Zatorre et al., 2007; Keller et al., 2014). How all these functions and underlying cortical regions or brain structures are related to the modular organization of extended hyper-brain networks remains to be seen.

Recently, it has been reported that vocalizing patterns of singers are coupled to their respiratory and cardiac oscillations during choir singing (Müller et al., 2018a, 2019). Furthermore, the coupling between an instrument (piano assessed by MIDI tone onsets) and the brains of pianists performing a musical duet was investigated by using amplitude envelope correlations between EEG and MIDI signals (Zamm et al., 2018). In contrast, the method described here is based on the phase coupling between frequency components of amplitude variation in acoustic signals measured directly from the guitar and frequency components of raw EEG signals. In our view, this method provides more options to investigate the coupling between the instruments and the brains and also offers more information about behavioral and brain synchrony, especially when integrated into extended hyper-brain networks. We also used a directed coupling measure, which indicates the direction of the phase difference shift between two signals or refers to the preceding phase of one of the two signals. The preceding phase of one signal related to another can be understood twofold: (i) one signal influences the other or (ii) anticipation is at work. Both these processes are obviously very important during music improvisation (Biasutti and Frezza, 2009; Pecenka and Keller, 2011; Badino et al., 2014). In sum, this method provides high flexibility and accuracy in investigating different synchronization patterns between different instruments and musicians' brains when playing music in groups or assemblies.
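The notion of a preceding phase can be illustrated with a generic sketch. It is not the directed coupling measure used in the study, whose definition is not given in this section; instead it computes a standard phase-locking value together with the mean phase difference, whose sign shows which signal's phase is ahead. The function name `phase_locking` and the two synthetic phase series are hypothetical.

```python
import cmath
import math


def phase_locking(phases_a, phases_b):
    """Return (phase-locking value, mean phase difference a - b in radians).

    A locking value near 1 means a stable phase relation; a positive mean
    difference means the phase of signal a is ahead of the phase of signal b.
    """
    diffs = [a - b for a, b in zip(phases_a, phases_b)]
    mean_vec = sum(cmath.exp(1j * d) for d in diffs) / len(diffs)
    return abs(mean_vec), cmath.phase(mean_vec)


# Synthetic example (assumption): two 1.25-Hz oscillations, 10 s at 100 Hz,
# with signal a leading signal b by a constant 0.5 rad.
fs, f = 100.0, 1.25
phases_a = [2 * math.pi * f * t / fs for t in range(1000)]
phases_b = [p - 0.5 for p in phases_a]

plv, lag = phase_locking(phases_a, phases_b)
print(round(plv, 3), round(lag, 3))
```

A stable lead of this kind is compatible with either reading given above, influence or anticipation, so the sign of the phase difference alone does not distinguish between them.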

In a number of studies, it has been shown that hyperscanning as a neuroimaging technique to investigate dynamic social interaction nowadays plays a crucial role in understanding the neural and physiological mechanisms of interacting or collective behavior (Dumas et al., 2011, 2014; Sänger et al., 2011; Keller et al., 2014; Acquadro et al., 2016; Müller et al., 2018a,b, 2019). Music performance has been considered a powerful catalyst for social interaction (Keller et al., 2014; Acquadro et al., 2016). Extended hyper-brain networks including instrument-brain coupling would provide further insights into the interplay of music performance and underlying brain-body interactions. Moreover, it has been suggested that hyperscanning has great potential for music therapy (Hunt, 2015). Fachner et al. (2019) measured dual-EEG of an experienced therapist ("Guide") and client ("Traveler") in a real music therapy session, which was combined with audiovisual recordings. They identified and quantitatively investigated therapeutically important moments of interest (MOI) and no-interest (MONI). The authors suggested that combining dual-EEG (hyperscanning) with detailed audiovisual and qualitative data can provide pivotal information for further research into music therapy (Fachner et al., 2019). There is no doubt that registration of instrument-brain coupling would further improve this interesting and therapeutically significant approach.

In conclusion, this study shows that data acquisition and analysis methods for simultaneous EEG and instrument sound recordings from multiple persons are important for discovery of extended hyper-brain synchrony during interpersonal interactions. Synchronization patterns during free guitar improvisation assessed in terms of phase alignment for instrument–instrument, brain–brain, and instrument–brain interactions seem to reflect the complex interplay of different functions and underlying temporal dynamics of interpersonal coordination. These functions may include prediction of changed group behavior and executive control during group interaction. Future research needs to explore how these different functions assessed on the neuronal, behavioral, and group levels interact with each other and constitute mechanisms that establish and sustain interbrain oscillatory couplings in communication, voluntary action coordination, and social cognitive development. This method opens interesting perspectives for research on music interaction and may be an indispensable tool in the investigation of social interaction, music therapy, and rehabilitation dynamics.

## DATA AVAILABILITY

The datasets for this study will not be made publicly available because restrictions included in the consent statement that the participants of the study signed only allow the present data to be used for research purposes within the Max Planck Institute for Human Development in Berlin.

## AUTHOR CONTRIBUTIONS

VM and UL designed the study and discussed the results, wrote the article, and read and approved the final version of the manuscript. VM acquired and analyzed the data.

## FUNDING

This research was supported by the Max Planck Society.

## ACKNOWLEDGMENTS

We thank Thomas Holzhausen and the other guitarists for participation in the study. Technical assistance of Nadine Pecenka, Fanny Schuster, Carolin Schneidratus, Marlen Grießing, Veneta Mircheva, and Zurab Schera during data acquisition is greatly appreciated. The authors thank Julia Delius for language assistance and Dionysios Perdikis for providing the code to calculate the ICI measure.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnint.2019.00050/full#supplementary-material

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Müller and Lindenberger. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Absolute Pitch and Musical Expertise Modulate Neuro-Electric and Behavioral Responses in an Auditory Stroop Paradigm

#### Vivek V. Sharma<sup>1,2</sup>, Michael Thaut<sup>1</sup>, Frank Russo<sup>1,3</sup> and Claude Alain<sup>1,2,4</sup>\*

<sup>1</sup> Music and Health Research Collaboratory, University of Toronto, Toronto, ON, Canada, <sup>2</sup> Rotman Research Institute, Baycrest Centre, Toronto, ON, Canada, <sup>3</sup> Department of Psychology, Ryerson University, Toronto, ON, Canada, <sup>4</sup> Department of Psychology, University of Toronto, Toronto, ON, Canada

#### Edited by:

Marc Schönwiesner, Leipzig University, Germany

#### Reviewed by:

Mickael L. D. Deroche, Concordia University, Canada

Stefanie Andrea Hutka, University of Toronto, Canada

\*Correspondence: Claude Alain calain@research.baycrest.org

#### Specialty section:

This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience

Received: 27 May 2019 Accepted: 20 August 2019 Published: 06 September 2019

#### Citation:

Sharma VV, Thaut M, Russo F and Alain C (2019) Absolute Pitch and Musical Expertise Modulate Neuro-Electric and Behavioral Responses in an Auditory Stroop Paradigm. Front. Neurosci. 13:932. doi: 10.3389/fnins.2019.00932

Musicians have considerable experience naming pitch-classes with verbal (e.g., Doh, Ré, and Mi) and semiotic tags (e.g., musical notation). On the one end of the spectrum, musicians can identify the pitch of a piano tone or quality of a chord without a reference tone [i.e., absolute pitch (AP) or relative pitch], which suggests strong associations between the perceived pitch information and verbal labels. Here, we examined the strength of this association using auditory versions of the Stroop task while neuro-electric brain activity was measured using high-density electroencephalography. In separate blocks of trials, participants were presented with congruent or incongruent auditory words from the English language (standard auditory Stroop), Romanic solmization, or German key lexicons (the latter two versions require some knowledge of music notation). We hypothesized that musically trained groups would show greater Stroop interference effects when presented with incongruent musical notations than non-musicians. Analyses of behavioral data revealed small or even non-existent congruency effects in musicians for solfège and keycodes versions of the Stroop task. This finding was unexpected and appears inconsistent with the hypothesis that musical training and AP are associated with high-strength response-level associations between a perceived pitch and verbal label. The analyses of event-related potentials revealed three temporally distinct modulations associated with conflict processing. All three modulations were larger in the auditory word Stroop than in the other two versions of the Stroop task. Only AP musicians showed significant congruity effects around 450 and 750 ms post-stimulus when stimuli were presented as Germanic keycodes (i.e., C or G). This finding suggests that AP possessors may process alphanumeric encodings as word forms with a semantic value, unlike their RP-possessing counterparts and non-musically trained individuals. However, the strength of musical conditional associations may not exceed that of standard language in speech.

Keywords: music, EEG, conflict resolution, event-related potentials, absolute pitch

## INTRODUCTION

Musicians have considerable experience naming pitch-classes with verbal and semiotic tags. A small subset of musicians possesses linguistic tags for decoding pitch-classes into verbal codes or symbols without having to resort to an external reference pitch. This ability to name pitches without a referent is called absolute pitch (AP). The phenomenon of AP is of particular interest because it links measurable sensory inputs to neural representations (Zatorre, 2003). AP also demonstrates highly stable category judgments using verbal codes that the general population does not seem to have for isolated sound objects (Levitin and Rogers, 2005). Most musicians do not have AP and instead require an external pitch-class to be presented and maintained in working memory as a reference against which to compare a subsequent pitch-class in order to label it. This more common ability is referred to as relative pitch (RP), which is the ability to label the intervals between pitches. Thus, while AP musicians can name pitch categories without any external referent, RP musicians can name pitch categories once a reference pitch is provided.

Absolute pitch is thought to be "automatic" whereas almost no RP users report automatic responses when identifying a pitch-class on the basis of knowledge of a preceding one, possibly relying on focused attention and working memory (Zatorre et al., 1998). AP possessors can accurately name a pitch-class within less than a second of hearing it (Miyazaki, 1990). Evidence also shows that AP is likely acquired during a critical period of 5–7 years of age (Russo et al., 2003; Miyazaki and Ogawa, 2006). There is also a negative correlation between mean response time (RT) and accuracy across participants in AP groups (Takeuchi and Hulse, 1993). Musicians with RP are faster at naming the intervals between two pitch-class categories, most likely because AP possessors are thought to name individual elements first and then extrapolate distance, adding processing stages to their interval judgments (Miyazaki, 1992). In contrast, RP can be developed by most individuals at any age and can become quite rapid and accurate with training and rehearsal (Miyazaki, 1989). Musicians with RP cannot consistently associate abstract entities such as verbal codes with individual pitches over the long term, but only with intervals or chord structures. AP possessors are quickest and most accurate when responding to piano tones, the piano being the instrument of acquisition for most musicians with AP; some can only label piano tones (Miyazaki, 1989). However, AP possessors have more difficulty naming individual pitches when presented with "black-key" pitch classes or simple tones (Takeuchi and Hulse, 1993). Interestingly, musicians with AP may also be susceptible to imperceptibility of "key signature", the macroscopic organizational schema for all pitches in a piece of music, where a single pitch is perceived as the 1st element of the musical scale sequence (Terhardt and Seewann, 1983). Baharloo et al. (1998) delineated four categories of AP, where certain categories had high accuracy and low RTs to sine-tone tests, while other categories did not, which informed the level of automaticity expected in individual AP possessors. Currently, it is thought that non-AP requires working memory to keep reference tones and schema of tonal intervals in echoic memory and consciousness while AP relies more heavily on categorization networks during pitch naming (Schulze et al., 2009). Thus, RP uses working memory and conscious control while AP seems to be automatic and working-memory free. In this sense, much of RP ability is schematic while AP may employ more stimulus-driven responses.

In the present study, we used auditory versions of the Stroop task to explore how pitch may be internally represented in musicians with RP and AP. In the original version of the Stroop task, participants are presented with a colored word, which could either be congruent (e.g., the word RED in red ink), incongruent (e.g., the word RED in blue ink), or neutral (e.g., word printed in black or a series of the letter "X" printed in colors). Participants are slower and make more errors for incongruent than congruent or neutral stimuli (MacLeod, 1991). The Stroop task has been used to study conflict resolution in children (Prevor and Diamond, 2005) and older adults (West and Alain, 2000) as well as in neurological (Rafal and Henik, 1994) and psychiatric (McNeely et al., 2003) populations. In the auditory versions of the Stroop task, participants are presented with spoken words that could either be congruent (e.g., the word Low in low pitch) or incongruent (e.g., the word Low in high pitch) (Hamers and Lambert, 1972). As with the visual Stroop task, participants are slower and make more errors for incongruent than congruent stimuli (Itoh et al., 2005; Donohue et al., 2012). If musicians process musical annotations similarly to how the general population process words, then we should observe interference effects when they are presented with incongruent musical annotations. However, it is unclear how musicians might process incongruence between semantic meaning and linguistic information during category recognition of musical pitch-classes and notations. Musical experience may allow for behavioral flexibility in an auditory Stroop paradigm involving pitch information. For example, musicians with AP may attenuate or bypass conflicting linguistic information during pitch perception, biasing their attention toward their own internal words. Thus, developing associative bonds between language and pitch may generate reflexive, rapid, and accurate encoding of sensory features irrespective of conflicting verbal codes. That is, audible musical percepts with strongly associated codewords may be less susceptible to interference than associated verbal codes outside of the domain of music. This type of transfer effect has been discussed in the context of musical expertise by some researchers, who suggest that scalp recording of event-related potentials (ERPs) may be effective to disentangle neural time-course events to study these effects (Besson et al., 2011).

Earlier ERP studies of the Stroop effect have identified several neural correlates thought to play an important role in information conflict resolution. For example, Duncan-Johnson and Kopell (1981) found that the P300 response at midline scalp locations (i.e., Fz, Cz, and Pz) was not significantly modulated by semantic incongruence or neutrality of color and word and yet the classic Stroop stimuli did indeed elicit a typical Stroop effect, which suggested that based on their resulting RTs, the neural signature of Stroop interference should more likely rely on activity later than the P300 response. This was also supported by other investigators (Ilan and Polich, 1999; Atkinson et al., 2003).

Rebai et al. (1997) demonstrated that a late negative N400 wave is evoked when participants are asked to internally verbalize responses of incongruent color-word conditions from recordings of the Fz, Cz, and Oz plus left and right parietal electrodes. West and Alain (1999) noted four spatially and temporally distinct modulations during the visual Stroop task, which included a phasic polar positivity that peaked at about 500 ms after stimulus onset over lateral fronto-polar scalp areas, a fronto-central slow wave that began at about 500 ms after the stimuli and persisted over the remainder of trial, a left-parietal modulation peaking at 522 ms and a left temporo-parietal positivity beginning at about 600 ms after stimulus onset. Ergen et al. (2014) showed that the N450 wave was larger for incongruent stimuli in comparison to congruent stimuli and maximal over frontal electrodes. These investigators also found a slow late potential at around 600 ms after stimuli onset over the parietal electrodes of their extended 10/20 system (Ergen et al., 2014).

In the present study, non-musicians (NM) and musicians with AP or RP completed three different versions of an auditory Stroop task while we measured neuro-electric brain activity using high-density electroencephalography. In the auditory word Stroop task, participants were presented with sung English words (i.e., low or high) in one of two pitches corresponding semantically to the words. They were told to ignore the word and press a button indicating the pitch of the stimulus. In the solfège version of the auditory Stroop task, participants were instructed to do the same as in the word Stroop task. However, they were presented with the English solmizations corresponding to the two pitches (i.e., Doh or Soh). Finally, in the keycodes task, participants were presented with phonemic equivalents of key notation that corresponded to the two pitches (i.e., "C" and "G") and instructed to do the same as in the other two tasks. For the word Stroop task, we anticipated that musicians would be faster and more accurate than NMs, but that the Stroop interference effects should be similar in all three groups. For the solfège and keycodes versions of the Stroop task, we predicted greater Stroop interference in musicians with RP and AP due to an over-learned semantic response and prolonged exposure to congruencies from pitch naming training. Prior research using auditory word Stroop (Itoh et al., 2005; Donohue et al., 2012) found analogous neuro-electric modulations to those found in the visual domain in terms of increased negativity for incongruent compared to congruent conditions over midline central scalp area between 400 and 600 ms after stimulus onset (West and Alain, 1999). We anticipated a similar modulation in NMs but a possible attenuation of such responses in musically trained individuals for pitch naming-based Stroop effects that use non-musical standard English words. Finally, we hypothesized that musicians with RP and AP would show enhanced negativity for incongruent trials in the solfège and keycodes tasks because of strong internal pitch representations and prolonged exposure to pitch naming with these lexicons. Understanding the strength of pitch information to verbal code mappings may demonstrate that creating abstract representations of perceptual features may allow confirmation of what is heard with greater certainty. This may help explain how words tag onto object features in order for categories to emerge from the environment.

## MATERIALS AND METHODS

## Participants

Sixty participants were recruited for this study. One NM was excluded, and one musician with AP and one musician with RP were excluded because of excessive eye movements during EEG recording. The final sample comprised 19 NM (mean age = 24.9 years, range = 20–39, SD = 4.3, five males) with <4 years exposure to any music training. It also included 19 musicians who possess AP (mean age = 25.1 years, range = 19–41, SD = 5.9, five males) and 19 who possess RP (mean age = 26.4 years, range = 21–39, SD = 4.0, five males), all of whom reported >7 years of formal musical training. There were no significant differences between the groups for age and years of education.

## Screening Test

#### Stimuli and Apparatus

Stimuli consisted of 24 pure tones (44.1 kHz sampling rate). They were generated using the digital audio editing software, ProTools LE (Version 7.3.1, Daly City, CA, United States). All stimuli were 1000 ms in duration, including 5-ms onset and 30-ms linear offset ramps. Pitch-classes were equally spaced, ranging from C3 to B5 inclusively and tuned to standard A4 = 440 Hz piano tunings in equal temperament. The stimuli were presented binaurally in an order that was individually randomized for each participant.
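As a rough illustration of the tone parameters described above, the following sketch computes equal-tempered frequencies relative to A4 = 440 Hz and synthesizes one 1000-ms pure tone at 44.1 kHz with a 5-ms onset ramp and a 30-ms offset ramp. The function names and the use of MIDI note numbers are assumptions made for illustration only; the screening stimuli were generated in ProTools LE, not with this code.

```python
import math

SAMPLE_RATE = 44100  # Hz, sampling rate stated in the text


def equal_tempered_freq(midi_note, a4_hz=440.0):
    """Frequency of a MIDI note in equal temperament (A4 = MIDI note 69)."""
    return a4_hz * 2.0 ** ((midi_note - 69) / 12.0)


def pure_tone(freq_hz, dur_ms=1000, onset_ms=5, offset_ms=30, rate=SAMPLE_RATE):
    """Sine tone with linear onset/offset ramps, as a list of floats in [-1, 1]."""
    n = int(round(rate * dur_ms / 1000))
    n_on = int(round(rate * onset_ms / 1000))
    n_off = int(round(rate * offset_ms / 1000))
    samples = []
    for i in range(n):
        gain = 1.0
        if i < n_on:                 # linear rise over the onset ramp
            gain = i / n_on
        elif i >= n - n_off:         # linear fall over the offset ramp
            gain = (n - 1 - i) / n_off
        samples.append(gain * math.sin(2.0 * math.pi * freq_hz * i / rate))
    return samples


# C3 is MIDI note 48 and B5 is MIDI note 83; C4 (MIDI 60) is used as an example.
c4 = equal_tempered_freq(60)
tone = pure_tone(c4)
```

Under this tuning, MIDI notes 60 and 67 come out at approximately 261.63 Hz and 392.00 Hz, which agrees with the C4 and G4 values reported for the experimental stimuli.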

A computer interface presented the stimuli and collected response data from the participants using customized software in MATLAB by Mathworks (Cambridge, MA, United States, 2015b). Participants were tested in a quiet room in front of the computer interface, which was equipped with a mouse that allowed the participants to classify the tones using 12 circular response regions. Each of these regions displayed a complete set of chromatic pitch-class labels for AP and diatonic for RP, including enharmonic equivalents. The response-regions were placed in a circular shape, forming a clock of pitch-class labels. A small circle was displayed around the center of this wheel for the participants to click within before hearing a tone. Category response regions were positioned such that each response was equidistant from the pointing cursor at the onset of every tone.

For RP musicians, the stimuli and apparatus were the same, except the pitch C4 was always presented preceding the pitch meant to be named. Because RP is considered to be more effortful and less acute than AP, black-key pitches were not included nor were their pitch-class labels presented on screen, which decreased the response set to ease working memory load. The RP test consisted of naming 12 stimuli ranging based on their distance from C4.

#### Screening Procedure

We used the screening procedure developed by Bermudez and Zatorre (2009). The screening tests employed the same computer and digital audio system as the experiment. For the purposes of this study, the criterion for AP was defined as the ability to score ≥85% or an absolute mean deviation of ≤1 under these screening test conditions. RP was defined as the ability to score ≥80%, and participants whose mean RTs were three standard deviations from the group mean were eliminated. A set of structured questionnaires and follow-up interviews was then administered after testing to index musical experience and health-related data.

#### Experimental Test Stimuli

The auditory stimuli were seven sung monosyllabic phonemes: /low/, /high/, /do/, /so/, /see/, /jee/, and /baw/. A professional vocalist sang the phonemes, which were recorded at a sample rate of 44,100 Hz using a Shure SM58 microphone and ProTools LE (Version 7.3.1, Daly City, CA, United States). The fundamental frequency of each tone was transformed digitally to be tuned to 261.63 Hz for the low pitch and 392 Hz for the high pitch using Adobe Audition 1.5 (San Jose, 2004). Sinusoid tones were generated with the same frequencies as the sung speech tones (i.e., C4 = 261.63 Hz and G4 = 392 Hz). The /baw/ syllable was recorded in both pitches and used as a neutral phoneme for all lexicons. Average root mean square power of the tones was then normalized (M = −16.13 dB, SD = 1.72 dB). The stimuli were edited to have a 500-ms duration including 5-ms rise/fall time. All phonemes were standardized to the Carnegie Mellon University Pronouncing Dictionary. Stimuli were presented binaurally at an intensity of 75 decibels sound pressure level (SPL) through insert earphones model ER-3A by Etymotic Research (Elk Grove Village, 1985).

#### Procedure

Participants sat in a comfortable chair in an acoustically and electrically shielded room. The stimuli were presented using the computer software Presentation 16.5 by Neurobehavioral Systems Inc. Trials consisted of 500-ms fixation cross, followed by 500-ms auditory stimulation. The fixation cross was maintained on screen throughout the stimulus, and for 500-ms after a response was made, which initiated the next trial. The inter-trial interval was 1000-ms. The fixation cross was a white cross with a 58 point font size and located in the center of a black screen. Each block of trials comprised 100 congruent, 100 incongruent, and 67 neutral trials, which were presented in completely random order within each block. Participants completed three blocks of trials with breaks offered between blocks. They responded using a computer keyboard and were instructed to press the left arrow key for "Low" pitch and the right arrow key for "High" pitch, regardless of sung word. The time between the onset of a tone and a key press was recorded as the RT for that trial. RTs that were greater than 1000-ms or less than 200-ms were excluded from further analyses.
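The RT exclusion rule stated above (discarding responses faster than 200 ms or slower than 1000 ms) can be sketched as follows. This is a minimal illustration, not the authors' analysis code; the function names are hypothetical, and treating the two boundary values as valid is an assumption based on the wording "greater than" and "less than".

```python
def trim_rts(rts_ms, lower_ms=200, upper_ms=1000):
    """Keep RTs in [lower_ms, upper_ms]; faster or slower responses are dropped."""
    return [rt for rt in rts_ms if lower_ms <= rt <= upper_ms]


def mean_rt(rts_ms):
    """Mean of the retained RTs, or NaN if nothing survives trimming."""
    kept = trim_rts(rts_ms)
    return sum(kept) / len(kept) if kept else float("nan")


# Hypothetical RTs (ms) for one participant in one condition.
example = [150, 200, 430, 1000, 1200]
kept = trim_rts(example)
```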

#### EEG Acquisition and Preprocessing

Electroencephalogram (EEG) was recorded from 66 scalp electrodes using a BioSemi Active Two acquisition system (BioSemi V.O.F., Amsterdam, Netherlands). The electrode montage was set according to the BioSemi electrode cap based on the 10/20 system and included a common mode sense active electrode and driven right leg passive electrode serving as ground. Ten additional electrodes were placed below the hair line (both mastoid, both pre-auricular points, outer canthus of each eye, inferior orbit of each eye, two facial electrodes) to monitor eye movements and to cover the whole scalp evenly. The latter is important because we used an average reference (i.e., the average of all scalp EEG channels as the reference for each EEG channel) for ERP analyses. Neuro-electric activity was digitized continuously at a rate of 512 Hz with a bandpass of DC-100 Hz, and stored for offline analysis. All off-line analyses were performed using Brain Electrical Source Analysis software (BESA, version 6.1; MEGIS GmbH, Gräfelfing, Germany).

The continuous EEG data were first digitally filtered with 0.3 Hz high-pass (forward, 6 dB/octave) and 40 Hz low-pass filters (zero phase, 24 dB/octave). For each participant, a set of ocular movements was identified from the continuous EEG recording and then used to generate spatial components that best account for eye movements. The spatial topographies were then subtracted from the continuous EEG to correct for lateral and vertical eye movements as well as for eye-blinks. The analysis epoch consisted of 200 ms of pre-stimulus activity and 1000 ms of post-stimulus activity time-locked to the stimulus onset. After correcting for eye movements, data for each participant were then scanned for artifacts; epochs including deflections exceeding 120 µV were marked and excluded from the analysis. The remaining epochs were averaged according to electrode position, stimulus type (i.e., neutral, congruent, incongruent), experimental condition (i.e., auditory word Stroop, Solfège, and Keycodes), and response accuracy (correct vs. incorrect response). Each average was baseline-corrected with respect to the pre-stimulus interval (i.e., mean amplitude over the 200 ms prior to stimulus onset).
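The last three steps of this pipeline (artifact rejection, averaging, and baseline correction) can be illustrated with a short sketch for a single electrode. It is a simplified stand-in for the BESA processing and makes two assumptions that the text does not specify: the 120 µV criterion is applied to the absolute amplitude of each sample, and each epoch is a plain list of microvolt values whose first `n_baseline` samples are the pre-stimulus interval. All function names are hypothetical.

```python
def exceeds_threshold(epoch_uv, threshold_uv=120.0):
    """True if any sample exceeds the threshold in absolute value (assumed criterion)."""
    return any(abs(v) > threshold_uv for v in epoch_uv)


def baseline_correct(waveform_uv, n_baseline):
    """Subtract the mean of the first n_baseline (pre-stimulus) samples."""
    base = sum(waveform_uv[:n_baseline]) / n_baseline
    return [v - base for v in waveform_uv]


def average_clean_epochs(epochs_uv, n_baseline, threshold_uv=120.0):
    """Reject artifact epochs, average the rest, then baseline-correct the average."""
    kept = [e for e in epochs_uv if not exceeds_threshold(e, threshold_uv)]
    if not kept:
        return None  # no usable epochs for this condition
    n_samples = len(kept[0])
    avg = [sum(e[i] for e in kept) / len(kept) for i in range(n_samples)]
    return baseline_correct(avg, n_baseline)


# Toy epochs (µV): two samples of baseline followed by two post-stimulus samples.
epochs = [
    [1.0, 1.0, 3.0, 5.0],
    [3.0, 3.0, 5.0, 7.0],
    [0.0, 0.0, 200.0, 0.0],  # rejected: contains a deflection above 120 µV
]
erp = average_clean_epochs(epochs, n_baseline=2)
```

In the study itself, the epochs would additionally be sorted by stimulus type, task, and response accuracy before averaging.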

The ERP analyses involved only correct trials and focused on mean amplitude for three pre-defined time windows and electrode clusters motivated by prior auditory (i.e., Itoh et al., 2005; Donohue et al., 2012) and visual (West and Alain, 1999) Stroop studies. The first window (150–250 ms) comprised the C1, Cz, C2, FC1, FCz, and FC2 electrodes over the frontocentral scalp area to measure the P200 response. The N450 (325–525 ms) was quantified using the same electrodes. A late positive component (LPC) modulation (600–900 ms) was quantified over the P1, P3, Pz, P2, P4, PO3, POz, and PO4 electrode sites.
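To make the window-and-cluster measure concrete, the sketch below converts each time window to sample indices, assuming the 512 Hz sampling rate and the epoch starting 200 ms before stimulus onset described earlier, and then averages the mean amplitude across the electrodes of a cluster. The data layout (a dictionary mapping electrode names to averaged waveforms) and all function names are assumptions for illustration; the original measurements were made in BESA.

```python
EPOCH_START_MS = -200.0   # epoch begins 200 ms before stimulus onset
SAMPLE_RATE_HZ = 512.0    # digitization rate stated in the text

WINDOWS_MS = {"P200": (150, 250), "N450": (325, 525), "LPC": (600, 900)}
CLUSTERS = {
    "P200": ["C1", "Cz", "C2", "FC1", "FCz", "FC2"],
    "N450": ["C1", "Cz", "C2", "FC1", "FCz", "FC2"],
    "LPC": ["P1", "P3", "Pz", "P2", "P4", "PO3", "POz", "PO4"],
}


def window_indices(start_ms, end_ms):
    """Map a latency window (ms relative to stimulus onset) to sample indices."""
    first = int(round((start_ms - EPOCH_START_MS) * SAMPLE_RATE_HZ / 1000.0))
    last = int(round((end_ms - EPOCH_START_MS) * SAMPLE_RATE_HZ / 1000.0))
    return first, last


def cluster_mean_amplitude(erp_by_electrode, component):
    """Mean amplitude over the component's window, averaged across its cluster."""
    first, last = window_indices(*WINDOWS_MS[component])
    per_electrode = []
    for name in CLUSTERS[component]:
        segment = erp_by_electrode[name][first:last + 1]
        per_electrode.append(sum(segment) / len(segment))
    return sum(per_electrode) / len(per_electrode)


# Toy data: a flat 1.5 µV waveform (615 samples, about 1200 ms) at each LPC site.
toy_erp = {name: [1.5] * 615 for name in CLUSTERS["LPC"]}
lpc = cluster_mean_amplitude(toy_erp, "LPC")
```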

## RESULTS

In a preliminary analysis we examined whether RTs and error rates differed for the two neutral stimuli (i.e., phonemes and sinetones). For RTs, the ANOVA with Neutral Type and Task as within subject factors and Group as between-subject factors did not reveal significant differences in RT as a function of Neutral Type [F(1,54) = 3.16, p = 0.081], nor did this factor interact with Musicianship [F(2,54) = 2.23, p = 0.118] or Task (F < 1). The three way interaction between Neutral Type, Task, and Musicianship was not significant [F(4,108) = 1.04, p = 0.391]. For error rates, the ANOVA with Neutral Type and Task as within-subject factors and Group as between-subject factors did not reveal significant differences in error as a function of Neutral Type [F(1,54) = 2.96, p = 0.091], nor did this factor interact with Musicianship [F(2,54) = 1.18, p = 0.313] or Task (F < 1).

The three way interaction between Neutral Type, Task, and Musicianship was not significant [F(4,108) = 1.18, p = 0.322]. Thus, the RTs and error rates for neutral phonemes and sinetones were averaged together into a single neutral condition.

## Musicianship and Congruency Effect on Response Time

The group mean RTs for each task and condition are shown in **Figure 1**. A mixed-design ANOVA was performed on RTs with Musicianship (NM, RP, and AP) as a between-subject factor and Task (Words, Solfège, and Keycodes) and Congruence (Congruent, Incongruent, and Neutral) as within-subject factors; post hoc comparisons used the Bonferroni procedure. The ANOVA yielded a three-way interaction between Musicianship, Task, and Condition [F(8,216) = 2.81, p < 0.01, η<sub>p</sub><sup>2</sup> = 0.094]. To better understand this interaction, we examined the effect of musicianship and condition in each task separately.
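For readers who want to relate the reported effect sizes to the F statistics, partial eta squared can be recovered from an F value and its degrees of freedom through the standard identity η<sub>p</sub><sup>2</sup> = (F × df<sub>effect</sub>) / (F × df<sub>effect</sub> + df<sub>error</sub>). The short sketch below applies it to the three-way interaction reported above; the function name is hypothetical and the snippet is offered only as a consistency check, not as part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Three-way interaction on RTs: F(8,216) = 2.81
eta_rt = round(partial_eta_squared(2.81, 8, 216), 3)
```

Here `eta_rt` evaluates to 0.094, matching the reported value.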

For the Words task, we anticipated that musicians would be faster than non-musicians given the mounting evidence that musical training enhances executive function and cognitive control (Moussard et al., 2016; Alain et al., 2019). The ANOVA on RTs revealed a significant group × stimulus type interaction [F(4,108) = 5.56, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.171], implying that interference effects differed between the AP, RP, and NM groups. Nonetheless, it should be noted that the main effect of musicianship was significant [F(2,54) = 20.97, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.437]. Post hoc comparisons show that there was no difference between AP and RP (p = 1), though both groups were significantly faster than NMs (p < 0.001 for both differences), reflecting an overall RT advantage for this task irrespective of congruency effects. As for the interaction between congruence and musicianship, NMs were slowest for incongruent trials, intermediate for neutral, and fastest for congruent trials (all pairwise comparisons were significant, p < 0.01 in all cases). For musicians with AP, the congruent condition was significantly faster than incongruent (p < 0.001) but not neutral (p = 1), and the RTs to neutral words were significantly faster than those generated for incongruent stimuli (p < 0.01). Finally, musicians with RP showed slower RTs to incongruent than both congruent (p < 0.001) and neutral (p < 0.001) stimuli, but no difference was seen between congruent and neutral.

For the Solfège task, we expected NMs to show little interference since they do not associate verbal codes with pitch representation using this lexicon, while musicians should show interference, particularly AP who have learned to associate pitch representations with labels automatically. Surprisingly, the interaction between musicianship and congruency was not significant. This suggests comparable interference effects in the AP, RP, and NM groups. Nonetheless, the main effect of musicianship was significant [F(2,54) = 21.05, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.438], with AP and RP musicians not being significantly different in RT and both musician groups being faster than NMs (p < 0.001 for both differences). There was also a main effect of congruence [F(2,54) = 6.68, p < 0.01, η<sub>p</sub><sup>2</sup> = 0.110]. Pairwise comparisons show that the congruent condition was significantly faster than both incongruent (p < 0.001) and neutral (p = 0.029), while no difference was found between incongruent and neutral.

For the Keycodes task, we expected NMs to also show little to no interference, unlike musicians. The interaction between congruence and musicianship was significant [F(4,108) = 2.97, p = 0.023, η<sub>p</sub><sup>2</sup> = 0.099]. Again, the main effect of musicianship was significant [F(2,54) = 21.04, p = 0.001, η<sub>p</sub><sup>2</sup> = 0.438] in a similar manner as the other two tasks. The main effect of congruence was not significant. Surprisingly, the NM group was significantly faster for the incongruent condition when compared to the congruent condition (p = 0.012).

## Musicianship and Congruency Effect on Error Rates

The group mean error rates for each task and condition are shown in **Figure 2**. A mixed-design ANOVA with the same factors as the RTs was performed for error rates. Similar to RTs, the three-way interaction between Musicianship, Task, and Congruency was significant [F(8,216) = 9.16, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.253]. To better understand this interaction, we examined the effect of musicianship and condition in each task separately.

For English words, the interaction between musicianship and condition was significant [F(4,108) = 11.31, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.295]. NMs were more accurate for congruent than incongruent or neutral stimuli (p < 0.001, in both cases). They were also more accurate for neutral than incongruent stimuli (p < 0.001). Musicians with RP were more accurate for congruent than incongruent stimuli (p < 0.01). They were also more accurate for neutral than incongruent stimuli (p < 0.01). There was no difference between the neutral and congruent words condition. Musicians with AP did not show significant differences in accuracy for congruent, incongruent, and neutral stimuli for this task.

For the Solfège task, there was a significant interaction between musicianship and condition [F(4,108) = 6.36, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.191]. NMs were more accurate for congruent than incongruent or neutral stimuli (p < 0.001 in both cases). They were also more accurate for neutral than incongruent stimuli (p = 0.024). Musicians with RP and AP showed similar error rates in this task for congruent, incongruent, and neutral stimuli with no significant differences.

For the keycodes task, there was a significant musicianship by condition interaction [F(4,108) = 12.89, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.323]. NMs were more accurate for incongruent than congruent (p < 0.01). They were also more accurate for incongruent and congruent than neutral stimuli (p < 0.001 in both cases). Musicians with RP and AP showed comparable error rates in the keycodes task with no significant accuracy differences between congruent, incongruent, and neutral stimuli.

## Effects of Musicianship on Neural Correlates of Stroop Effect

**Figure 3** shows group mean ERPs elicited as a function of task and congruency. In all three groups, stimuli generated an N1 and P2 deflection at fronto-central sites, which inverted in polarity at inferior parietal sites and mastoids, consistent with generators in superior temporal gyrus. These sensory evoked responses were followed by a broad negativity that peaked at about 525 ms after sound onset at central scalp sites. This negativity was followed by a positive wave at parietal sites (**Figure 4**). The analyses of ERP amplitude focus on congruent and incongruent trials. The ERPs generated by the neutral trials were excluded because the neutral stimuli had distinctive acoustic features making it difficult to dissociate congruency effects from acoustically driven factors. Three different spatio-temporal clusters were examined: Central P200, Central N450, and Parietal LPC (**Figures 5**–**7**).

#### Central P200

**Figures 5**–**7** show the group mean amplitude as a function of task and congruency. The three-way interaction between musicianship, task, and congruency was significant [F(4,108) = 3.00, p = 0.045, η<sub>p</sub><sup>2</sup> = 0.085]. To further elucidate this interaction, the tasks were analyzed separately in a similar manner as the behavioral results. The main effect of musicianship was not significant.

For English words, the interaction between musicianship and congruency was not significant. However, there was a main effect of congruency [F(1,54) = 42.93, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.443], where the congruent condition was significantly more positive than the incongruent condition (p = 0.017 for NM, p < 0.001 for RP and AP). For the solfège and keycodes tasks, there were no significant main effects or interactions between group and congruence.

#### Central N450

**Figures 5**–**7** display the differences in amplitudes for the N450 modulation. The three-way interaction between group, task, and congruency trended toward significance [F(4,108) = 2.12, p = 0.084], whereas the two-way interaction between task and congruency was significant [F(1,54) = 8.59, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.141]. To further elucidate the three-way interaction, the tasks were analyzed separately. The ANOVA also yielded a main effect of congruency [F(1,54) = 20.29, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.273], with greater negativity in ERPs for incongruent than congruent stimuli (p < 0.001) across the tasks. A significant main effect of musicianship was also found [F(2,54) = 5.18, p < 0.01, η<sub>p</sub><sup>2</sup> = 0.161]. Pairwise comparisons showed larger ERP amplitude in NMs compared to AP (p = 0.048) and RP (p = 0.010) across the tasks and congruencies.

For the English word Stroop task, the group by congruency interaction was not significant [F(2,54) = 0.371, p = 0.692, η<sub>p</sub><sup>2</sup> = 0.014]. In all three groups, the incongruent stimuli generated significantly greater negative ERP amplitude than the congruent stimuli [F(1,54) = 22.77, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.297]. The main effect of musicianship trended toward significance [F(2,54) = 3.11, p = 0.053, η<sub>p</sub><sup>2</sup> = 0.103].

For solfège, the group by congruency interaction was not significant [F(2,54) = 1.60, p = 0.212, η<sub>p</sub><sup>2</sup> = 0.056]. However, there was a main effect of musicianship [F(2,54) = 6.85, p < 0.01, η<sub>p</sub><sup>2</sup> = 0.202]. Pairwise comparisons revealed larger ERP amplitude in NMs than musicians with RP (p < 0.01) or AP (p = 0.021). The ERP amplitude was comparable for the RP and AP groups and not significantly different.

In the keycodes task, there was a significant interaction between musicianship and congruency [F(2,54) = 3.67, p = 0.032, η<sub>p</sub><sup>2</sup> = 0.120]. Musicians with AP showed a significant congruency effect (i.e., N450, p < 0.01), while no differences in ERP amplitude were found between congruent and incongruent stimuli in NM and musicians with RP.

#### Parietal LPC (600–900)

**Figures 5**–**7** show the differences in ERP waveform amplitudes for the parietal LPC. The three-way interaction between musicianship, task, and congruency was significant [F(4,108) = 2.67, p = 0.036, η<sub>p</sub><sup>2</sup> = 0.090]. Once again, to further elucidate this interaction, the tasks were analyzed separately. There was also a main effect of task [F(2,108) = 2.67, p = 0.036, η<sub>p</sub><sup>2</sup> = 0.090] and a main effect of musicianship [F(2,54) = 6.15, p < 0.01, η<sub>p</sub><sup>2</sup> = 0.186].

For the English words, the group by congruency interaction was not significant [F(2,54) = 1.64, p = 0.204, η<sub>p</sub><sup>2</sup> = 0.057]. Overall, the ERP amplitude was smaller for congruent than incongruent stimuli [F(1,54) = 45.53, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.457]. The main effect of musicianship was significant [F(2,54) = 7.36, p < 0.01, η<sub>p</sub><sup>2</sup> = 0.214]. Pairwise comparisons reveal that the musicians with AP generated more negative amplitude than the NM group (p < 0.01). There was no difference in ERP amplitude between the NM and RP group (p = 0.074).

FIGURE 4 | Event-related potential waveforms at the midline parietal (Pz) electrode site showing a posterior late positive complex (LPC) wave that significantly differs in amplitude at late stages of processing for congruent and incongruent word form conditions, which may reflect a post-trial motor response awareness. English words elicit significant congruence effects but only AP shows significant LPC differences for keycodes.

In the solfège task, the group by congruency interaction was not significant [F(2,54) = 1.86, p = 0.166, η<sub>p</sub><sup>2</sup> = 0.064]. The ERP was significantly more positive in amplitude for congruent than incongruent stimuli [F(1,54) = 6.21, p = 0.016, η<sub>p</sub><sup>2</sup> = 0.103]. There was a main effect of musicianship [F(2,54) = 4.66, p = 0.014, η<sub>p</sub><sup>2</sup> = 0.147]. Pairwise comparisons showed greater positive ERP amplitude in NMs than musicians with AP (p = 0.013) but not RP (p = 0.086). There was no difference in LPC amplitude for the solfège task between AP and RP groups.

For keycodes LPC, there was a significant interaction between musicianship and congruency [F(2,54) = 3.41, p = 0.040, η<sub>p</sub><sup>2</sup> = 0.112]. In musicians with AP, the LPC amplitude was significantly more positive for incongruent than congruent stimuli (p < 0.01) while no such differences in ERP amplitude were observed in musicians with RP or in the NM group. Overall, the ERP amplitude was smaller for congruent than incongruent stimuli [F(1,54) = 5.74, p = 0.022, η<sub>p</sub><sup>2</sup> = 0.094]. The main effect of musicianship was also significant [F(2,54) = 5.44, p < 0.01, η<sub>p</sub><sup>2</sup> = 0.168], with more positivity of ERP amplitude in NMs than musicians with AP (p = 0.012) or RP (p = 0.024). There was no significant difference in ERP amplitude between musicians with AP or RP.

### DISCUSSION

In the present study, we used three different auditory versions of the Stroop task to assess the strength of association between the perceived pitch-classes and verbal labels in musicians with RP and AP. We hypothesized that musically trained groups would show proportionately greater Stroop interference effects than non-musicians when presented with incongruent musical notations. We also anticipated that musicians may perform better in absolute terms than non-musicians given the mounting evidence suggesting that musical training enhances executive functions and attentional control (Moreno et al., 2014).

## Behavioral Data: Response Times and Error Rates

Overall, participants were slower and made more errors for incongruent than congruent stimuli. This congruency effect was modulated by lexicon, being stronger for word than for solfège or Germanic keycodes stimuli. Overall, musicians with RP and AP were faster and more accurate than non-musicians. This pattern of results is consistent with many studies showing that musicians often outperform non-musicians in a wide range of auditory tasks involving executive functions and attentional control (Moussard et al., 2016; Alain et al., 2018). However, the analyses of RT data show small or even non-existent congruency effects in musicians for solfège and keycodes versions of the Stroop task. This finding was unexpected and appears inconsistent with the hypothesis that musical training and AP are associated with a strong association between a perceived pitch and a verbal label. One possible explanation for these small or absent congruency effects in the solfège and keycodes tasks would be that the musicians were successful in suppressing task-irrelevant information. The pattern of RTs during the word Stroop task is consistent with this possibility, with AP showing the smallest congruency effect, even in terms of accuracy as will be discussed. Further, the reverse Stroop effect findings for keycodes in NMs may occur due to a lack of long-term learning, which makes affirmative association processing or evaluation of unfamiliar label to pitch mappings slower and less accurate than simply ignoring the unfamiliar verbal information and responding to pitch directly in the incongruent keycodes condition.

displaying temporal windows of interest. (B) Topographic maps show difference dipoles at 190 and 400 ms along with a posterior positivity at 700 ms post-stimulation, which occur within the temporal windows of interest. (C) Bar graphs showing group mean event-related potential amplitude for the P200 (top panel), N450 (middle panel) and LPC (bottom panel).

The analyses of error rate are also consistent with a flexibility of task-relevant attentional control account, with greater accuracy on incongruent trials across the tasks among musicians with RP and AP than non-musicians. Indeed, musicians with AP had virtually no errors in all three versions of the Stroop task, suggesting that they were able to successfully suppress task-irrelevant information and/or competing responses. While musician groups were not immune to incongruence, the findings seem consistent with several studies showing superior performance in musicians compared to non-musicians in a wide range of tasks involving auditory discrimination, executive functions, and attentional control (e.g., Kaiser and Lutzenberger, 2003; Marques et al., 2007; Putkinen et al., 2012; Alain et al., 2018). At the same time, it should be noted that interference effects for words were observed in AP RTs rather than error rate. Interestingly, neutral word conditions were responded to more rapidly and accurately than incongruent stimuli but less rapidly than congruent stimuli by NMs only, while musician RTs did not differ between congruent and neutral English words. This may imply an RT facilitation effect in NMs for congruent trials that has reached fluency in AP and RP, allowing the musicians to name neutral trials with the same fluency NMs seem to name learned percepts such as colors in the color-word Stroop task (Stroop, 1935).

## Neural Correlates: ERP Responses to Congruent and Incongruent Stimuli

The processing of congruent and incongruent stimuli was associated with three temporally and spatially distinct neural signatures. The first modulation peak of interest was directionally positive, occurred at about 200 ms after sound onset, and was larger over midline central and fronto-central areas. In a prior spoken words version of the "low/high" auditory Stroop task (Donohue et al., 2012), a similar neural response was observed around 200 ms. This neural response was found to be significantly attenuated for incongruent compared to congruent sung words but not musical annotations, suggesting that conflicting information in the musical domain may be less salient than in standard language. In the present study, our behavioral data showed reduced congruency effects in the solfège and keycodes tasks. At the same time, the amplitude of the early modulation at 200 ms, or P200, remained comparably unaffected by congruence across both of these tasks. This appears inconsistent with current models positing a strong conditional association between musical notation and pitch, unlike language, which is consistent with the behavioral findings. If both AP and RP musicians were processing musical notation at the same processing strength as words, then one would expect to observe larger and regular congruency effects at that latency for both groups across all tasks as was observed for words across all groups. However, despite a lack of distinct congruency effects on P200 in the music tasks and lack of statistical significance in group main effects, the overall amplitude was nevertheless slightly larger in musicians across tasks and conditions, which seems consistent with studies showing that this modulation may be an important physiological correlate for longer timescale auditory learning (Reinke et al., 2003; Ross and Tremblay, 2009; Alain et al., 2015). The somewhat greater P200 positivity in the RP and AP groups may also imply an advantage in early processing, which might allow the stimulus conflict to be more easily responded to. Certainly, the idea that musical mappings do not produce strong response conflict but musicianship may still allow for greater efficiency in stimulus conflict detection seems to be consistent with both the behavioral and merely observed P200 results. Perhaps, early sensory stream information processing of direction of change or frequency detection from prolonged familiarity to pitch naming causes P200 to be minutely greater in musicians such that the statistically significant congruence effects on P200 are measurably greater in musicians for at least words, which do indeed show evidence of strong associational bonding to pitch information.

The modulation peaking at about 200 ms was followed by another one that peaked at about 450 ms after sound onset, which was characterized by an increased negativity elicited by the incongruent relative to congruent stimuli. This modulation, referred to as the N450, had a broader and more sustained time course than the early P200, and peaked over the left central and fronto-central scalp area. This long lasting negativity could reflect both congruence detection as well as conflict resolution as it remained until a response was made (see also West and Alain, 1999). The N450 amplitude varied as a function of task, being greater for the English word Stroop task than the solfège or keycodes Stroop tasks. This may be due to greater cognitive load related to the semantic effects of the standard language Stroop tasks and greater flexibility between verbal code and musical pitch in terms of associative processing, particularly in the more common RP ability. The Stroop interference effect also decreased when participants were presented with less familiar material such as in the solfège and keycodes tasks. That is, Romanic solmization or Germanic key lexicons may not possess as strong stimulus to response mappings as the English language (standard auditory Stroop), even for AP. In other words, there is less interference of response in general for pitch naming with musical lexicons than there is for the English language lexicon. Thus, it may also be easier to suppress abstract musical lexicons than concrete referential pitch labels such as low or high. The latter is likely to have a stronger stimulus to response mapping than more abstract musical notations despite prolonged exposure to musical lexicons. However, it may be possible that the significant differences in N450 and also LPC from congruency effects in the keycodes task suggest that only AP seem to experience stimulus conflict though not necessarily response conflict as measured with RTs and accuracy.
At the same time, the N450 for both musician groups was significantly lower in amplitude than the NMs, especially in the auditory word Stroop task. This finding may reflect modified thresholds for information conflict detection in the auditory domain for musically trained populations.

The LPC is thought to index conceptual level of representation (West and Alain, 1999). In the present study, its amplitude was modulated by task and musical training, with greater difference between congruent and incongruent stimuli during the auditory word Stroop than musical annotations. Given that the LPC often occurred at the time of or even after a button-press response was made, we reasoned that the conceptual level of representation is a post-response or decision evaluation process, described in previous studies as a semantic re-activation of the lingual item preceding a response (Ergen et al., 2014). The effects of task on this modulation provide some support for such an interpretation because the conceptual level of representation differed between the three tasks, with a stronger representation for words than solfège or keycodes. Interestingly, the modulation was observed in the keycodes task for AP musicians with differences between congruent and incongruent stimuli, while no such modulation was seen in NMs or musicians with RP. This provides converging evidence supporting the notion that AP might yield a conceptual level post-response process for alpha-numeric encodings, which in turn may be modulated by incongruence when there is conflict in information. AP possessors were also significantly different in LPC amplitudes for English words, which may mean they process word forms and alpha-numeric musical encodings similarly but with less response conflict for the musical codes, contrary to our hypotheses. The interaction between musicianship, task, and congruency for the LPC also seems to imply less processing power requirements for the musical tasks, which may show a transfer effect to non-musical tasks in musicians, though both the exogenous and LPC modulations may demonstrate differential use of cognitive resources and possibly behavioral strategy for each group.

## CONCLUSION

Music can be viewed as another form of language or complex communication medium. Those proficient in music (i.e., musicians) likely develop a lexicon of musical annotations that may share similarities with speech representation. That is, a familiar pitch and musical annotation may automatically trigger a verbal label with a specific meaning. If this is the case, then musicians should experience difficulty in processing incongruent musical stimuli. The present study shows that musicians, even those with AP, exhibited little interference effect at the behavioral level in the two musical versions of the Stroop task. However, the analyses of ERP data show a significant difference between congruent and incongruent stimuli in the Germanic keycodes task for AP, unlike in the RP or NM groups for this task. This finding suggests that AP possessors, at least at the neural level, might process phonemic equivalents of alpha-numeric encodings of pitch information with associative processing similar to that of word forms. Yet, while this processing appears to have semantic properties, it is not as powerful as that of standard language. Thus, keycodes appear to have unique processing in music for their mappings to pitch classes in AP. We hypothesize that greater executive function and attentional control associated with playing a musical instrument may help musicians process conflicting information. For this reason, the detection of stimulus conflict may generate attenuated response conflict in musicians compared to NMs, which would help to explain certain discrepancies between the neural and behavioral data in particular parts of our study, along with the more rapid and accurate musician responses. Future research is needed to better understand the apparent contradiction between behavioral and electrophysiological measures in AP. At the same time, the lack of P200 congruency effects in the musical tasks should also be examined by increasing task difficulty and attentional load. Finally, cross-modal transfer of musical training on semantic information may be studied by applying conceptual level processing to domains outside of pitch perception, such as spatialization.

## DATA AVAILABILITY

The datasets generated for this study are available on request to the corresponding author.

## ETHICS STATEMENT

The studies involving human participants were reviewed and approved by the Human Research and Ethics Committee at the Baycrest Centre, Toronto, ON, Canada. The participants provided their written informed consent to participate in this study.

## AUTHOR CONTRIBUTIONS

VS, MT, FR, and CA designed the experiments and interpreted the results of experiments. VS performed the experiments and collected the data. CA and VS analyzed the data. VS and CA drafted the manuscript. All authors edited, revised, and approved the final version of the manuscript.

## FUNDING

This work was supported by a Natural Sciences and Engineering Research Council of Canada grant to CA (RGPIN-2016-05523).

## ACKNOWLEDGMENTS

We would like to thank all the participants for their involvement in this study.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Sharma, Thaut, Russo and Alain. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## Uncertain Emotion Discrimination Differences Between Musicians and Non-musicians Is Determined by Fine Structure Association: Hilbert Transform Psychophysics

Francis A. M. Manno<sup>1,2</sup>\*, Raul R. Cruces<sup>3</sup>, Condon Lau<sup>2</sup>\* and Fernando A. Barrios<sup>3</sup>\*

<sup>1</sup> School of Biomedical Engineering, Faculty of Engineering, University of Sydney, Sydney, NSW, Australia, <sup>2</sup> Department of Physics, City University of Hong Kong, Hong Kong, China, <sup>3</sup> Instituto de Neurobiología, Universidad Nacional Autónoma de México, Querétaro, Mexico

#### Edited by:

Paul J. Colombo, Tulane University, United States

#### Reviewed by:

Bruno Gingras, University of Vienna, Austria

Graham Frederick Welch, UCL Institute of Education, United Kingdom

#### \*Correspondence:

Francis A. M. Manno, Francis.Manno@Sydney.edu.au

Condon Lau, condon.lau@cityu.edu.hk

Fernando A. Barrios, fbarrios@unam.mx

#### Specialty section:

This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience

Received: 17 June 2019; Accepted: 13 August 2019; Published: 18 September 2019

#### Citation:

Manno FAM, Cruces RR, Lau C and Barrios FA (2019) Uncertain Emotion Discrimination Differences Between Musicians and Non-musicians Is Determined by Fine Structure Association: Hilbert Transform Psychophysics. Front. Neurosci. 13:902. doi: 10.3389/fnins.2019.00902

Humans perceive musical sound as a complex phenomenon, which is known to induce an emotional response. The cues used to perceive emotion in music have not been unequivocally elucidated. Here, we sought to identify the attributes of sound that confer an emotion to music and determine if professional musicians have different musical emotion perception than non-musicians. The objective was to determine which sound cues are used to resolve emotional signals. Happy or sad classical music excerpts modified in fine structure or envelope conveying different degrees of emotional certainty were presented. Certainty was determined by identification of the emotional characteristic presented during a forced-choice discrimination task. Participants were categorized as good or poor performers (n = 32, age 21.16 ± 2.59 SD) and in a separate group as musicians in the first or last year of music education at a conservatory (n = 32, age 21.97 ± 2.42). We found that temporal fine structure information is essential for correct emotional identification. Non-musicians used less fine structure information to discriminate emotion in music compared with musicians. The present psychophysical experiments revealed what cues are used to resolve emotional signals and how they differ between non-musicians and musically educated individuals.

Keywords: emotion, psychophysics, modulation, fine structure, envelope, frequency, amplitude

## INTRODUCTION

The process of resolving emotions has been described as the optimization of an economic choice (Seymour and Dolan, 2008). Experimentally, emotions are selected due to their perceived certainty and robust differentiation (Russell, 1980; Barrett, 1998; LeDoux, 2000; Phan et al., 2002; Koelsch, 2014). Few studies have analyzed uncertain emotions or the psychoacoustic cues that endow sound with emotion. Here, we sought to identify the sound attributes involved in musical emotion discrimination and to determine if non-musicians and musicians perceive emotional sound cues differently. We were interested in emotional uncertainty, where the perception of sound attributes is difficult to distinguish.

## Cues in Emotion

fnins-13-00902 September 14, 2019 Time: 12:26 # 2

Music is transmitted through temporal fine structure (TFS) and envelope (ENV) modulations, which are the perceived changes of the sound in frequency and amplitude, respectively. However, the exact contributions of TFS and ENV used to resolve emotion in music are unknown. Although the influence of pitch on emotional content of speech is well known (Lieberman and Michaels, 1962; Wildgruber et al., 2005), its contribution to music is less clear (Scherer, 1995; Coutinho and Dibben, 2013). Most studies have concentrated on emotion in speech (e.g., since very early on: Fairbanks and Pronovost, 1938; Fairbanks, 1940; Lieberman et al., 1964) or the cues (TFS or ENV) that confer intelligibility to speech (Licklider, 1946; Lieberman and Michaels, 1962; Williams and Stevens, 1972; Frick, 1985; Drullman et al., 1994, 1996; Drullman, 1995; Shannon et al., 1995; Leinonen et al., 1997). Using a Hilbert transform, which allows the signal to be deconstructed and reconstructed into its individual frequency and amplitude time components (King, 2009), researchers found that ENV was most important for speech reception, whereas TFS was most important for pitch perception (Smith et al., 2002). Several follow-up studies have corroborated the importance of ENV for speech intelligibility, in addition to the importance of TFS features (Zeng et al., 2004, 2005; Davidson et al., 2009; Fogerty, 2011; Swaminathan and Heinz, 2012; Apoux et al., 2013; Shamma and Lorenzi, 2013; Moon et al., 2014; Swaminathan et al., 2014). Unfortunately, the sound properties (ENV and TFS) that confer emotion to music have been less studied than the sound properties conveying emotion in speech, i.e., prosody (Hodges, 2010; Coutinho and Dibben, 2013). Therefore, a primary aim of the present experiment was to discern which attributes of sound endow music with emotion, and how.

## Differing Ability to Discriminate Emotions in Auditory Cues: Emotional Resolvability Differences

Musicians have enhanced auditory perception for several acoustic features, such as the ability to learn lexical tones (Wong et al., 2007), enhanced audiovisual processing (Musacchia et al., 2007), better speech-in-noise perception (Başkent and Gaudrain, 2016; Coffey et al., 2017), better pitch discrimination thresholds (Micheyl et al., 2006), and superior frequency discrimination (Liang et al., 2016; Mandikal Vasuki et al., 2016; Madsen et al., 2017) compared with non-musicians. Musical training and experience shape linguistic patterns (Wong et al., 2007) and enhance speech-in-noise discrimination (Parbery-Clark et al., 2009b), altering brainstem and cortical responses to musical and non-musical acoustic features (Parbery-Clark et al., 2009a; Kraus and Chandrasekaran, 2010; Strait et al., 2010). Musicians possess different auditory perceptual abilities than non-musicians; hence, a musician's ability to discriminate emotion in sound when linked to TFS or ENV changes should also differ from that of non-musicians.

## Speech and Music Relations

There are several dimensions, models, and psychoacoustic features which are used to categorize emotion in music (Russell, 1980; Schubert, 2004, 2013; Gabrielsson and Lindstrom, 2010; Eerola and Vuoskoski, 2011; Eerola, 2012). Psychophysical studies of emotion (i.e., of frequency and amplitude features) make up less of the literature than studies of other aspects (e.g., anxiety, arousal; Hodges, 2010). For example, Gingras et al. (2014) found that intensity and arousal ratings in music-induced emotion were largely unaffected by amplitude normalization, suggesting that additional acoustic features besides intensity could account for the variance in subjective arousal ratings. Further, some of the same neurobiological mechanisms underlying emotion in music also subserve emotion in speech (Fitch, 2006; Juslin and Västfjäll, 2008; Kotz et al., 2018). This is important as several studies give clues as to how uncertain emotion in music might be perceived. For example, Shannon et al. (1995) found that speech recognition primarily utilized temporal cues with a few spectral channels. As noted above, Smith et al. (2002) found that ENV is most important for speech reception, and TFS most important for pitch perception. Several follow-up studies have corroborated the importance of ENV for speech intelligibility up to a certain number of bands including aspects of TFS (Zeng et al., 2004, 2005; Davidson et al., 2009; Fogerty, 2011; Swaminathan and Heinz, 2012; Apoux et al., 2013; Shamma and Lorenzi, 2013; Moon et al., 2014; Swaminathan et al., 2014). Although emotion carried in speech is perceived in a similar but not identical way to emotion in music, resolving emotion in music will aid our understanding of how emotions are identified in sound.

## Present Experimental Aims

Our aim was to conduct a psychoacoustic experiment to investigate certain and uncertain emotion in musical sound. We sought to determine which cues are used to resolve emotional signals (Pfeifer, 1988). Moreover, we studied the differences between musicians and non-musicians. To accomplish these aims, we decomposed happy and sad musical stimuli in TFS or ENV using a Hilbert transform (Smith et al., 2002). The process yielded stimuli with increasing decomposition in TFS and ENV, and then we explored the different degrees of emotional certainty they conveyed. Certainty was defined as the ability to identify the decomposed stimuli based on their unaltered forms. Happy and sad stimuli with varying degrees of decomposition were presented in a forced-choice discrimination task. First, we expected that decomposing TFS or ENV information essential to determining emotionality in sound would reveal which cue was more important for emotion discrimination. Second, we expected that segregating participants by their identification with the original excerpt into good and poor performers, based on the reported classification (Peretz et al., 1998, 2001; Gosselin et al., 2007; Khalfa et al., 2008; Hopyan et al., 2016), would result in different emotional resolvability curves. Third, we expected that assessing musicians in their first year of study in the conservatory compared to those in their last year would reveal differences in emotional resolvability based on their musical education. Lastly, comparing non-musicians to musicians would reveal differences in emotional resolvability based on musical experience. Our main aim was to understand the cues used to resolve emotional signals.

## MATERIALS AND METHODS


The experiment included both non-musicians and musicians who were studying music at a conservatory as participants. Participants with musical experience were from a musical conservatory located in Querétaro, Mexico. All data, sound files, and scripts are available at www.fmanno.com, the Open Science Framework (Manno et al., 2018, 2019) and GitHub<sup>1</sup>. **Supplementary Data Sheet S1** contains extended analyses and **Supplementary Data Sheet S2** contains the original data. The research protocol was approved by the Internal Review Board of the Instituto de Neurobiología, Universidad Nacional Autónoma de México in accordance with the Declaration of Helsinki, 2013. Informed consent (verbal and written) was obtained prior to undertaking the experiment and was abided by as set forth in the Ethical Principles of the Acoustical Society of America for Research Involving Human and Non-human Animals in Research and Publishing and Presentations.

#### Study Participants

Participants were recruited from a local university (Natural Sciences and Musical Conservatory), and final participants were randomly selected from approximately 320 individuals (with the sample not differing from the population group). All individuals were native Spanish speakers. The present study included 64 individuals divided equally into 8 groups (**Table 1**). Non-musician participants were selected and classified as poor and good (n = 32, age 21.17 ± 2.63 SD), based on their performance in the Montreal Emotional Identification Task using greatest mean spread between the groups as the separation metric (Peretz et al., 1998, 2001; Gosselin et al., 2007, 2011; Khalfa et al., 2008; Hopyan et al., 2016). Poor and good participants did not have musical training or education, nor did this cohort play instruments. Poor and good participants were a separate sample from musicians. Musicians were separated based on musical education as first-year (low education) and last-year (high education) students at the conservatory (n = 32, age 21.97 ± 2.42). Musically educated participants were recruited from the wind and string sections of the conservatory. All volunteers were free of contraindications for psychoacoustic testing. Prior to undertaking the emotional resolvability experiment, participants had audiometric testing to confirm hearing within normal limits.

Audiometric testing consisted of the examiner presenting to the participants a series of pure tones from 400 to 8,000 Hz, in addition to linear sweeps, log sweeps, and white noise in the same frequency range. The sound pressure level (SPL) decibels (dB) was modulated from 20 to 60 SPL dB. The participants included in the study confirmed hearing the series of audiometric presentations. Participants who did not confirm hearing these tones were excluded.

#### Acoustic Stimuli

The 32 original acoustic stimuli classified as sad or happy were taken from a previous study [Montreal Emotional Identification Task<sup>2</sup>; **Supplementary Table S1** (Peretz et al., 1998, 2001; Gosselin et al., 2007, 2011; Khalfa et al., 2008; Hopyan et al., 2016)]. Half of the stimuli in the repertoire evoked a sense of happiness (major mode with a median tempo of 138 beats per min, bpm), and the other half evoked a sense of sadness (minor mode with a median tempo of 53 bpm) (Peretz et al., 2001). The altered excerpts had the same frequency and amplitude values in terms of pitch and duration as was found in the original stimulus. The sound files of classical music were processed in MATLAB to curtail length to 3-s (in order to present the entire battery of stimuli within an hour period), restricted in frequency/amplitude range for presentation (to reduce noise in the Musical Instrument Digital Interface (MIDI) file), and analyzed spectrally for differences in TFS or ENV (see **Figure 1**). Original and altered stimuli were presented with MATLAB (Statistics Toolbox Release 2012b, The MathWorks Inc., Natick, MA) using the Psychophysics Toolbox extension<sup>3</sup> (Manno et al., 2018, 2019). Sound level was adjusted before psychoacoustic testing as described above.

<sup>1</sup>https://github.com/rcruces/2019\_uncertain\_emotion\_discrimination

<sup>2</sup>www.brams.umontreal.ca/peretz

<sup>3</sup>http://psychtoolbox.org

TABLE 1 | Participant data.

Group (good, poor, first year music education or last year music education) by sex (male and female). Years of music education ± SD years. Correct identification – indicating the percent identification based on the original classification of the Montreal Emotional Identification Task. Total – indicating the total number of stimuli presented with the column representing the number out of the total.

magnitude on the y-axis and time on the x-axis. Lower panel, fast Fourier transform plot of complex 440 Hz 4th order harmonic tones with magnitude (dB) on y-axis and frequency (Hz) on x-axis. (g) Hilbert decomposition of single 440 Hz tone fine structure with complex 4th order harmonics envelope. (h) Hilbert decomposition of complex 4th order harmonics fine structure with single 440 Hz tone envelope. Progression from left to right for both panels (g,h) represents the Hilbert transformation for this simple example with 2nb, 4nb, 8nb, 16nb, 32nb, and 64nb decompositions. The simple example demonstrates how a complex acoustic stimulus that is categorized as emotional can be decomposed.

## Acoustic Stimuli Decomposition


Happy and sad stimuli were decomposed by a Hilbert transform in order to derive the altered excerpts (Smith et al., 2002; Moon et al., 2014). The decomposition process associated the acoustic aspects of emotion (happy or sad) with TFS or ENV. The ENV is represented as the magnitude of the Hilbert transform and TFS is the phase (Smith et al., 2002; King, 2009; Moon et al., 2014). Here, we created band-decomposed hybrid stimuli as mixtures of the happy and sad sounds by equal bandwidth steps. Cutting frequencies to 80, 260, 600, 1240, 2420, 4650, and 8820 Hz created six bands of decomposition: 2nb, 4nb, 8nb, 16nb, 32nb, and 64nb. Here, "nb" means number band decomposition as in the original description (Smith et al., 2002). An increase in band decomposition results in decreasing emotional resolvability for the original stimuli (Smith et al., 2002). The entire set of 224 decomposed stimuli and a script demonstrating the Hilbert transform process (Hilbert Explanation) can be found at https://osf.io/8ws7a (Manno et al., 2019). In our case, the Hilbert decomposition resulted in hybrid acoustic sounds and emotional resolvability was effectively tied to TFS or ENV in equally spaced decreasing bandwidths (Smith et al., 2002; Moon et al., 2014). For signal decomposition in the present study, the Hilbert transform $y(t)$ in the time domain is related to the real function $x(t)$ by the analytic signal $A(t) = x(t) + i\,y(t)$, with $i = (-1)^{1/2}$. The Hilbert ENV is the magnitude of the analytic signal, $|A(t)| = \left(s_r(t)^2 + s_i(t)^2\right)^{1/2}$, and the Hilbert TFS is $\cos\phi(t)$, where $\phi(t) = \arctan\left(s_i(t)/s_r(t)\right)$ is the phase of the analytic signal and $s_r(t) = x(t)$ and $s_i(t) = y(t)$ are its real and imaginary parts (Smith et al., 2002; King, 2009). If the real (r) part pertains to the cosine of the frequency contained within the signal and the imaginary (i) part pertains to the sine of the frequency contained within the signal, the magnitude of the amplitude is related by the value of the cosine and sine of the signal (King, 2009). The decomposition process has been previously elaborated on for various signal-processing purposes (Oswald, 1956; Smith et al., 2002; Binder et al., 2004; King, 2009; Moon et al., 2014).
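As an illustration of the ENV and TFS definitions above, the following is a minimal single-band sketch in Python and not the authors' MATLAB script (that script is the one referenced at https://osf.io/8ws7a). The function names `analytic_signal`, `envelope_and_tfs`, and `hybrid` are hypothetical, the naive discrete Fourier transform is used only to keep the example dependency-free, and the multi-band filtering that produces the 2nb to 64nb stimuli is deliberately left out.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal A = x + i*y of a real sequence, via a naive O(n^2) DFT.

    Illustrative only: a library routine (e.g. an FFT-based Hilbert function)
    would normally be used for real audio.
    """
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            continue  # keep DC (and Nyquist for even n) unchanged
        if k < (n + 1) // 2:
            spectrum[k] *= 2  # double positive frequencies
        else:
            spectrum[k] = 0  # remove negative frequencies
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def envelope_and_tfs(x):
    """Return (ENV, TFS): |A(t)| and cos(phi(t)) of the analytic signal."""
    a = analytic_signal(x)
    env = [abs(z) for z in a]
    tfs = [math.cos(cmath.phase(z)) for z in a]  # phase = atan2(imag, real)
    return env, tfs


def hybrid(env_source, tfs_source):
    """Single-band hybrid: ENV of one signal imposed on the TFS of another."""
    env, _ = envelope_and_tfs(env_source)
    _, tfs = envelope_and_tfs(tfs_source)
    return [e * f for e, f in zip(env, tfs)]
```

For a cosine with a whole number of cycles in the window, the envelope returned by `envelope_and_tfs` is flat and the TFS reproduces the cosine itself, which is a quick sanity check on the definitions.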

## Hybrid Stimuli Example

Recombined hybrid stimuli with differing combinations of ENV and TFS from either emotion category were presented in a happy/sad descending two-interval forced-choice discrimination task (**Figures 1a,b**). For visualization of the Hilbert process, we created two stimuli, one pure tone (440 Hz, **Figure 1c**), and a harmonic series of the pure tone (440, 880, 1320, and 1760 Hz, **Figure 1e**). Both stimuli underwent TFS and ENV extraction by the Hilbert transformation and decomposed into hybrid stimuli for each band of decomposition. **Figure 1g** presents the Hilbert transformation of the pure tone in TFS combined with the harmonic series stimuli in ENV. **Figure 1h** presents the Hilbert transformation of the harmonic series stimuli in TFS combined with the pure tone in ENV. From left to right for **Figures 1g,h**, band decomposition increases from 2nb to 64nb. The pure tone and the harmonics series were given an amplitude double their predecessor (t), starting with 10 dB SPL (decibels sound pressure level), a phase (π/2)/t change from its lower harmonic, and separate durations. For the representation, spectrogram plots contained magnitude (dB) on the z-axis, normalized frequency (× π rad/sample) on the x-axis, and time in milliseconds (ms) on the y-axis. Spectrograms and acoustic stimuli were normalized across power/frequency (dB/Hz) and amplitude (dB SPL) by fine structure components (Hz).

## Happy/Sad Forced-Choice Discrimination Task

A forced-choice discrimination task was conducted where participants were required to respond to the acoustic stimuli indicating if they perceived them as happy or sad. The entire repertoire of original and band-decomposed hybrid stimuli was utilized for the present experiments (see Supplementary Music Files at https://osf.io/8ws7a). The participants were presented with all the stimuli and asked to classify them as happy or sad (Smith et al., 2002; Moon et al., 2014). If an original sound was happy, decompositions were categorized as happy and the identification was deemed correct for happy and incorrect for sad. Participants were handed a sheet consisting of 32 rows and 7 columns (see Supplementary Music Matrix at https://osf.io/8ws7a). The columns in the Supplementary Music Matrix are organized as original stimuli presentation, followed from left to right by stimuli decompositions (2nb, 4nb, etc.). The trial was organized by random presentation followed by increasing decompositions. Decompositions in TFS were presented starting with their original unaltered form and continuing through 2nb to 64nb decompositions (**Figure 1a**).
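The scoring rule described above (a response counts as correct when it matches the category of the original excerpt) can be sketched as follows. This is a hypothetical illustration in Python, not the authors' analysis code; the tuple layout and the function name `percent_identification` are assumptions made for the example.

```python
from collections import defaultdict


def percent_identification(trials):
    """Percent of responses matching the original category, per decomposition.

    `trials` is an iterable of (original_emotion, band, response) tuples,
    e.g. ("happy", "2nb", "sad"). The layout is assumed for illustration.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for original, band, response in trials:
        total[band] += 1
        if response == original:
            correct[band] += 1
    return {band: 100.0 * correct[band] / total[band] for band in total}
```

Applied to one participant's sheet of 32 rows by 7 columns, this would yield one percentage per column (original, 2nb, ..., 64nb), which is the kind of value plotted in a discrimination curve.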

## Statistical Analysis

#### Average Discrimination Curves by TFS

Average response calculations were derived for TFS for happy and sad musical stimuli. The average discrimination curves were percent identification of the response based on original categorization (found in TFS for all original stimuli). Curves were analyzed independently for non-musician poor and good performers separated by male and female, in addition to first-year (low education) and last-year (high education) musicians separated by male and female.

#### Group Differences

Discrimination curves between groups were tested for significance with an ANOVA and follow-up t-test. The discrimination normalized ratio for identification of the stimuli as happy or sad was calculated by determining the percent identification of happy or sad over its opposite stimuli discrimination. Measures of discriminability (D'), and the corrected non-parametric measure of discriminability (A'), were utilized for determining differences in emotional resolvability (Stanislaw and Todorov, 1999; Verde et al., 2006). These measures provide an estimation of signal from noise and determination of the threshold response for the acoustic emotional stimuli (Green, 1960; Swets, 1961, 1986; Swets and Sewall, 1961; Swets et al., 1961). The interest in determining a threshold response for an acoustic emotional stimulus is to ascertain when an emotional stimulus becomes non-emotional (i.e., threshold). Here, we group averaged identifications; however, individual values were calculated by stimuli.
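For readers unfamiliar with the two sensitivity measures, the sketch below computes d' and A' from a hit rate and a false-alarm rate using the standard formulas summarized by Stanislaw and Todorov (1999). It is an assumption that these are the exact variants used here, and treating "happy" responses to happy stimuli as hits and "happy" responses to sad stimuli as false alarms is likewise an illustrative convention, not one stated in the text.

```python
from statistics import NormalDist


def d_prime(hit_rate, fa_rate):
    """Parametric sensitivity: z(H) - z(F).

    Rates must lie strictly between 0 and 1; inv_cdf raises an error
    otherwise, so extreme rates need a correction before calling this.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


def a_prime(hit_rate, fa_rate):
    """Non-parametric sensitivity A' (0.5 = chance, 1.0 = perfect)."""
    if hit_rate == fa_rate:
        return 0.5
    diff = hit_rate - fa_rate
    sign = 1.0 if diff > 0 else -1.0
    denom = 4.0 * max(hit_rate, fa_rate) - 4.0 * hit_rate * fa_rate
    return 0.5 + sign * (diff * diff + abs(diff)) / denom
```

With a hit rate of 0.8 and a false-alarm rate of 0.2, `a_prime` gives approximately 0.875 and `d_prime` approximately 1.68.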

#### Normalized Benefit of TFS and ENV


The difference between TFS or ENV emotion discrimination was examined as the ratio between the perception of one emotion (using TFS or ENV) versus the perception of the other emotion (using TFS or ENV). The ratio was calculated by the normalized benefit of TFS or ENV to the emotion discrimination curve (originally calculated for the visual contribution to speech in noise; Sumby and Pollack, 1954; Meister et al., 2016). The adopted formula was: TFS benefit = (TFS – ENV)/(1 – ENV) or ENV benefit = (ENV – TFS)/(1 – TFS), to compare both the TFS and ENV contributions, respectively, on the scale of +1 to –1. If the percent difference, normalized by each contribution result, was a positive value, this represented benefit to perception. Conversely, a negative value indicated lack of contribution to perception.
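The two benefit formulas can be written as one helper, shown here as a minimal Python sketch. The function name `normalized_benefit` is hypothetical, and the inputs are assumed to be identification scores expressed as proportions between 0 and 1.

```python
def normalized_benefit(primary, other):
    """Normalized benefit of one cue over the other: (primary - other) / (1 - other).

    Call as normalized_benefit(tfs, env) for the TFS benefit and
    normalized_benefit(env, tfs) for the ENV benefit.
    """
    if other >= 1.0:
        raise ValueError("benefit is undefined when the other cue is at ceiling")
    return (primary - other) / (1.0 - other)


# Illustrative values only, not data from the study.
tfs_score, env_score = 0.8, 0.5
tfs_benefit = normalized_benefit(tfs_score, env_score)
env_benefit = normalized_benefit(env_score, tfs_score)
```

With these illustrative scores the TFS benefit is positive and the ENV benefit is negative, matching the sign convention described above.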

#### Discriminant Analysis

A canonical discriminant analysis was used to determine the weight of our variables (nb0, nb2, nb4, etc.), which best separated our different groups (poor performer, good performer, first year musician, last year musician). The two generalized canonical discriminant analyses (one for happy and one for sad) were computed using the multivariate linear model Group ∼ nb0 + nb2 + nb4 + nb8 + nb16 + nb32 + nb64 to obtain the weights associated with each variable. This model represents a transformation of the original variables in the space of maximal difference for the group.

## RESULTS

## Poor and Good Performers

#### Group Comparison Differences for Poor and Good Performers

Percent identification for poor (df = 6, 18, F = 30.76, p < 0.0001) and good (df = 6, 18, F = 79.04, p < 0.0001) performers was significantly different by decomposition, indicating a decrease in certainty (**Figures 2a,b**). Both poor and good performers used TFS for emotion identification, and discrimination performance decreased with increasing band (**Supplementary Figures S1a–d**). **Supplementary Figures S1a**, **S2b** show the average male poor and good identification curves, and **Supplementary Figures S1c,d** show the average female poor and good identification curves. For visualization, happy and sad discriminability was represented by identification accuracy with the unaltered excerpt in 20th percentile bands (**Figures S2c,d**).

#### Fine Structure and Envelope Identification Differences for Happy or Sad Emotion for Poor and Good Performers

There were no apparent differences for poor performers' TFS- or ENV-based identification of happy or sad emotion. For example, good performers did not show differences for happy TFS (df = 1, 6, F = 2.545, p = 0.1617), sad TFS (df = 1, 6, F = 1.494, p = 0.2674), happy ENV (df = 1, 6, F = 1.897, p = 0.2176), and sad ENV (df = 1, 6, F = 2.885, p = 0.1403). For good performers, happy in TFS (df = 1, 6, F = 7.749, p = 0.0318) and sad in ENV (df = 1, 6, F = 7.591, p = 0.0331) were different between males and females. Sad in TFS was significantly different between poor and good performers (df = 6, 30, F = 3.773, p = 0.0091), but happy in TFS did not reach significance (df = 6, 30, F = 1.912, p = 0.1218). Happy in ENV was significantly different between poor and good performers (df = 1, 6, F = 3.630, p = 0.0110), but sad in ENV did not reach significance (df = 1, 6, F = 1.881, p = 0.1273).

#### Discrimination Curves for Poor and Good Performers

Non-musician good and poor performers were significantly different for uncertain emotion discrimination (df = 15, 90, F = 1.814, p = 0.0445). Good and poor performers used TFS to discriminate happy and sad uncertain emotions differently, by approximately 4.01 ± 3.33% SD and 9.20 ± 6.82% SD, respectively, depending on the increasing uncertainty of stimuli (**Supplementary Figures S1a–d** and Discrimination curves). Males and females, irrespective of poor or good performers, used TFS to discriminate happy and sad uncertain emotions differently, by approximately 2.67 ± 1.62% SD and 2.06 ± 1.67% SD, respectively, depending on the increasing uncertainty of stimuli (**Supplementary Figures S1a–d**, Discrimination curves).

#### Average Discriminability A' and Benefit of TFS and ENV for Poor and Good Performers

The corrected non-parametric measure of discriminability (A') was used to determine differences in emotional resolvability. No significant difference was found for poor performers for sad (p = 0.5957) or happy (p = 0.6612) stimuli. Similarly, no difference was found for good performers for sad (p = 0.6712) or happy (p = 0.6644) stimuli. **Figure 3a** shows grouped A' for poor and good performers by uncertain emotion. **Figure 3c** depicts the averaged normalized benefit of TFS or ENV to the emotion discrimination curve for poor and good performers. The TFS was beneficial for emotion discriminability for both poor and good performers, for sad and happy stimuli. Poor performers had a benefit of 0.2928 and 0.2377 for happy and sad, respectively. Good performers had a benefit of 0.3295 and 0.3355 for happy and sad, respectively. The ENV had a negative benefit for emotion discrimination for all stimuli in both poor and good performers.
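For readers unfamiliar with A', the following is a minimal sketch of the widely used computing formula for the non-parametric sensitivity index (commonly attributed to Grier, 1971). It is not necessarily the "corrected" variant used in this study, and the mapping of happy/sad responses onto hits and false alarms is an assumption made for illustration only; the function name `a_prime` is hypothetical.

```python
def a_prime(hit_rate: float, fa_rate: float) -> float:
    """Non-parametric discriminability A' from a hit rate and a false-alarm rate.

    Illustrative only: the study reports a 'corrected' A', whose exact
    correction is not described in this section.
    """
    h, f = hit_rate, fa_rate
    if not (0.0 <= h <= 1.0 and 0.0 <= f <= 1.0):
        raise ValueError("rates must lie in [0, 1]")
    if h == f:
        return 0.5  # chance-level discriminability
    if h > f:
        return 0.5 + ((h - f) * (1.0 + h - f)) / (4.0 * h * (1.0 - f))
    # Below-chance case: mirror image of the formula above
    return 0.5 - ((f - h) * (1.0 + f - h)) / (4.0 * f * (1.0 - h))


# Hypothetical rates, not taken from the study's data
example = a_prime(0.8, 0.2)
```

A' ranges from 0 to 1, with 0.5 corresponding to chance and 1.0 to perfect discrimination, which is how the group values reported in this section can be read.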

### First- and Last-Year Musicians

#### Group Comparison Differences for First- and Last-Year Musicians

Percent identification for first-year musicians (df = 6, 18, F = 71.69, p < 0.0001) and last-year musicians (df = 6, 18, F = 45.37, p < 0.0001) differed significantly by decomposition, indicating a decrease in certainty (**Figures 2a,b**). Both first-year and last-year musicians used TFS for emotion identification, and discrimination performance decreased with increasing band (**Supplementary Figures S2a–f**). First-year (df = 6, 18, F = 28.29, p < 0.0001) and last-year (df = 6, 18, F = 29.04, p < 0.0001) musicians resolved happy and sad emotion differently. **Supplementary Figures S2a,d** show the average first- or last-year musician discrimination curves for happy and sad stimuli. **Supplementary Figures S2b,c** show male and female first-year musician discrimination curves and **Supplementary Figures S2e,f** show male and female last-year musician averaged discrimination curves.

FIGURE 2 | Heatmap for accuracy of response by group and uncertainty. Discrimination heatmap for accuracy concerning uncertain (a) happy fine structure accuracy and (b) sad fine structure accuracy. The darker red color represents greater discriminability of emotion and similarity with the original un-altered excerpt. On the x-axis, band number is represented with Org, unaltered original excerpt, to nb64. Discriminability is represented by identification accuracy with the unaltered excerpt. (c) Happy discriminability and (d) sad discriminability. The color bar represents discriminability 20th percentile bands.

FIGURE 3 | Discriminability A' and TFS/ENV benefit to stimuli perception of uncertain emotion. (a) Non-musician discriminability A' and (b) musician discriminability A'. (c) Non-musician benefit and (d) musician benefit. For all figures, the x-axis represents band number from Org to nb64. The band decompositions were associated in ranges to represent the discriminability. The z-axis represents sex (male or female), type of group (G – good, P – poor, L – first-year musician, H – last-year musician), and emotion type (happy or sad). Here, groups are connected to visualize the trend and pattern more readily. For example panels (a,b), all groups interpret stimuli with greater certainty (i.e., greater discriminability A' shows more blue coloring) for happy more than sad. The dashed line represents the grouping of happy and sad. Note that happy is fuller than sad, indicating high A'. For benefit panels (c,d), note that for TFS there is a fuller line graph occupying more area, indicating that the majority of individuals use fine structure to discriminate uncertain emotion. Note the subtle difference between happy and sad, represented by the dashed line. Musicians benefit from TFS more than non-musicians.

#### Fine Structure and Envelope Identification Differences for Happy or Sad Emotion for First- and Last-Year Musicians

There were no apparent identification differences for first-year musicians related to TFS happy (df = 1, 6, F = 3.364, p = 0.1163), TFS sad (df = 1, 6, F = 0.1967, p = 0.6729) or ENV happy (df = 1, 6, F = 0.1047, p = 0.7573). Interestingly, first-year musicians used ENV sad significantly more for emotional resolvability (df = 1, 6, F = 6.058, p = 0.0490). Last-year musicians were significantly different for emotional resolvability for TFS happy (df = 1, 6, F = 14.88, p = 0.0084), TFS sad (df = 1, 6, F = 11.91, p = 0.0136), ENV happy (df = 1, 6, F = 13.66, p = 0.0101), and ENV sad (df = 1, 6, F = 14.49, p = 0.0089). There were significant differences in emotional resolvability between first- and last-year musicians for TFS happy (df = 6, 18, F = 7.585, p = 0.0017), TFS sad (df = 6, 18, F = 4.574, p = 0.0150), ENV happy (df = 6, 18, F = 4.966, p = 0.0110), and ENV sad (df = 6, 18, F = 7.816, p = 0.0015).

#### Discrimination Curves for First- and Last-Year Musicians

When comparing first- and last-year musicians, we found a significant difference in discriminating uncertain emotion (df = 15, 90, F = 4.377, p < 0.0001). First- and last-year musicians used TFS to discriminate happy and sad uncertain emotion differently, by approximately 2.51 ± 1.68% SD and 3.90 ± 2.30% SD, respectively, depending on the stimuli uncertainty (**Supplementary Figures S1a–f, S2a–f** – Discrimination curves). Males and females, irrespective of first- or last-year musical education, used TFS to discriminate happy and sad uncertain emotion differently, by approximately 12.10 ± 4.08% SD and 4.24 ± 3.16% SD, respectively, depending on the increasing uncertainty of stimuli (**Supplementary Figures S2a–f** – Discrimination curves).

#### Average Discriminability A' and Benefit of TFS and ENV for First- and Last-Year Musicians

The corrected non-parametric measure of discriminability (A'), used for determining differences in emotional resolvability, revealed a significant difference between first-year and last-year musicians (**Figure 3b**). First-year musicians' discriminability A' for sad was 0.5794 and last-year musicians' discriminability A' for sad was 0.5632. However, first- and last-year musicians' discriminability A' for happy was 0.7725 and 0.7805, respectively. This represents a discriminability A' difference of 24.99% for first-year and 27.83% for last-year musicians. **Figure 3b** shows discriminability A' for first-year and last-year musicians by uncertain emotion. Note that the blue portion of the curve for sad is missing, indicating a lack of discriminability. **Figure 3d** depicts the averaged normalized benefit of TFS or ENV to the emotion discrimination curve for musicians. The TFS was beneficial for emotion discriminability for first- and last-year musicians, for sad and happy stimuli. First-year musicians had a benefit of 0.4813 and 0.3013 for happy and sad, respectively. Last-year musicians had a benefit of 0.4652 and 0.3094 for happy and sad, respectively. The ENV had a negative benefit for emotion discrimination for all stimuli in musically educated participants.
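The percentage differences quoted above appear to be the happy-minus-sad gap in A' expressed relative to the happy A'. That reading is an inference from the reported numbers, not a formula stated in the text. The short sketch below applies it to the rounded values given in this paragraph; the helper name `relative_difference_pct` is hypothetical.

```python
def relative_difference_pct(happy: float, sad: float) -> float:
    """Happy-minus-sad A' gap as a percentage of the happy A' (assumed definition)."""
    return 100.0 * (happy - sad) / happy


# A' values as reported (rounded) in the text
first_year = relative_difference_pct(happy=0.7725, sad=0.5794)
last_year = relative_difference_pct(happy=0.7805, sad=0.5632)

print(f"first-year: {first_year:.1f}%  last-year: {last_year:.1f}%")
```

Under this assumption the rounded inputs give roughly 25.0% and 27.8%, close to the reported 24.99% and 27.83%; the small residual is presumably due to rounding of the published A' values.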

### Poor and Good Performers Compared With First- and Last-Year Musicians

#### Group Comparison Differences and Discriminant Analysis for Non-musicians and Musicians

Identification of happy differed significantly for non-musicians and musicians (df = 9, 54, F = 15.68, p < 0.0001) and by emotional resolvability (df = 6, 54, F = 315.5, p < 0.0001; **Figure 4a**). Identification of sad differed significantly between non-musicians and musicians (df = 9, 54, F = 3.526, p = 0.0017) and by emotional resolvability (df = 6, 54, F = 112.1, p < 0.0001; **Figure 4b**). **Figure 4** demonstrates the separation between the different groups.

Group separation and differences were based on emotional resolvability, and **Figure 4** shows the spread of separation weighted by the band decomposition as a result of the canonical discriminant analysis. For happy, the greatest discriminant variables between groups were the original stimuli (Can1 = 0.6264) and nb64 (Can1 = –0.7641). For sad, the greatest discriminant variables were the original stimuli (Can1 = –1.011) and nb64 (Can1 = 0.4193). For both sad and happy, the greatest discriminant variables were the original stimuli and the hybrid stimulus nb64, indicating that the original and most uncertain stimuli were most discriminable between our groups (**Figure 4**).

#### Average Discriminability A' of TFS and ENV for Non-musicians and Musicians

The differences in discriminability (A') between non-musicians and musicians (**Figures 3a,b**) for happy were statistically significant (df = 1.518, 9.109, F = 8.796, p = 0.0101), as were differences in A' as a function of emotional resolvability (df = 6, 42, F = 191.7, p < 0.0001). Differences in A' between non-musicians and musicians (**Figures 3a,b**) for sad were statistically significant (df = 2.934, 17.61, F = 5.086, p = 0.0107), as were differences in A' as a function of emotional resolvability (df = 6, 42, F = 156.1, p < 0.0001).

#### Benefit of TFS and ENV for Non-musicians and Musicians

The averaged normalized benefit of happy TFS to the emotion discrimination curve (**Figures 3c,d**) was statistically different between non-musicians and musicians (df = 2.383, 14.30, F = 6.922, p = 0.0060), as was the difference in benefit of happy TFS to emotional resolvability (df = 6, 42, F = 298.5, p < 0.0001). The averaged normalized benefit of sad TFS to the emotion discrimination curve (**Figures 3c,d**) was not statistically different between non-musicians and musicians (df = 2.643, 15.86, F = 1.826, p = 0.1870), although the contribution to emotional resolvability was statistically significant (df = 6, 42, F = 333.0, p < 0.0001). The averaged normalized benefit of happy ENV to the emotion discrimination curve (**Figures 3c,d**) was not statistically different between non-musicians and musicians (df = 2.638, 15.83, F = 1.708, p = 0.2087), although the contribution to emotional resolvability was statistically significant (df = 6, 42, F = 342.1, p < 0.0001). The averaged normalized benefit of sad ENV to the emotion discrimination curve (**Figures 3c,d**) was statistically different between non-musicians and musicians (df = 2.909, 17.45, F = 10.05, p = 0.0005), as was the difference in benefit of sad ENV to emotional resolvability (df = 6, 42, F = 450.0, p < 0.0001).

FIGURE 4 | Canonical discriminant analysis for (a) happy and (b) sad uncertain emotion discrimination. Group is sorted by poor performer, good performer, first year music education and last year music education with band number from Org (nb0) to nb64 in red. The plot shows the canonical scores for the groups. Scores are indicated by points and the canonical structure coefficients are indicated by vectors from the origin. Standardized beta coefficients are given for each variable in each discriminant (canonical) function, and the larger the standardized coefficient, the greater the contribution of the respective variable to the discrimination between groups. Here, the discriminant function coefficients denote the unique contribution of each variable to the discriminant function, while the structure coefficients denote the simple correlations between the variables and the functions. For happy, the greatest standardized beta coefficients were for org (Can1 = 0.6264) and nb64 (Can1 = –0.7641). For sad, the greatest standardized beta coefficients were for org (Can1 = –1.011) and nb64 (Can1 = 0.4193).

## DISCUSSION

The objective of the present experiment was to investigate certain and uncertain emotion in musical sounds, and determine if non-musicians and musicians resolve emotion differently. Here, stimuli that varied in emotional certainty were presented in a happy/sad interval forced-choice discrimination psychophysical task. There are three results of considerable interest: First, TFS information was essential to discriminating emotion in sound. Second, different emotional resolvability curves were found to depend on whether participants were poor or good performers and on year of musical education. Third, non-musicians used less TFS and had reduced emotional resolvability curves compared to musicians. The aim of the present experiment was to understand the cues used to resolve emotional signals at threshold and how they differ between non-musicians and musicians.

## Resolving Emotion Using Psychoacoustic Cues

Emotion in sound is transmitted through TFS and ENV modulations. In a groundbreaking study, Lieberman and Michaels (1962) found that pitch aided in the identification of emotional content by 44% while amplitude cues added only 3% more. The present study found TFS cues essential and beneficial to discriminating emotion in musical excerpts, whether individuals were good performers, poor performers, or had musical training. The results indicate that TFS cues are required for resolving emotion in sound and that individuals differ in their perceptive ability to discriminate these cues. Furthermore, happy emotion was discriminated with higher accuracy than sad emotion for all groups (**Figures 2a**, **3** and **Supplementary Figures S1**, **S2**). This is most likely due to individuals using major mode and the fast tempo of tones for discriminating emotion in sound, which are prominent in happy stimuli (Peretz et al., 2001; Hopyan et al., 2016). However, how these cues determine specific emotions and are perceived by individuals is not completely understood. For example, individuals differ in their tendency to report the co-occurrence of discrete emotions of the same valence (Barrett, 1998). Individuals vary in the extent to which they distinguish, or do not distinguish, between like-valence discrete emotions when reporting on their subjective experience (Barrett, 1998). The results indicate that individuals are reporting several affective states together, or it may indicate they are not distinguishing between distinct emotional states. The aforementioned manuscript (Barrett, 1998) bolstered support for both the theory of discrete emotion, where individuals label emotions based on determining a subjective level of arousal, and the dimensional theory of emotion, where individuals focus on the subjective emotional experience dimensionalized by valence, arousal, and intensity of the affective state (Russell, 1980; Barrett, 1998). The present study found emotional resolvability changed as a function of altering the TFS content of the musical excerpt, revealing that an essential cue to discriminating emotion is fine structure. Recently, a study investigating the similarities/dissimilarities of emotion in music and speech prosody found that the psychoacoustic features implicated were loudness, tempo and speech rate, melodic and prosodic contour, spectral centroid, and sharpness (Coutinho and Dibben, 2013). In contrast, the features distinct to music and speech were spectral flux and roughness, respectively. Here, the authors indicated that emotional cues in sound are encoded as psychoacoustic spatiotemporal patterns, which for music and speech rely heavily on their "shared acoustic profiles" (Coutinho and Dibben, 2013). We encourage research into determining what distinguishes an emotional from a non-emotional sound (Pfeifer, 1988; Barrett, 1998; LeDoux, 2000; Phan et al., 2002; Wildgruber et al., 2005; Koelsch, 2014), which will enable a more thorough classification of the neurobiology of emotion. Future studies should further explore the psychoacoustic foundations of emotion.

### Musicians Compared With Non-musicians

Musicians discriminate emotion differently, likely due to their unique education. For example, in the speech-in-noise and hearing-in-noise test, musicians perform significantly better than non-musicians, derived in part from musicians' enhanced working memory and frequency discrimination (Parbery-Clark et al., 2009a,b). Musicians in the present study discriminated emotion in sound differently than non-musicians by using more TFS through each of the nb decompositions. Within the group of musicians, last-year musically educated individuals discriminated happy or sad excerpts somewhat differently than first-year musically educated individuals. Although the greatest difference was between male and female musicians, the difference between musicians and non-musicians reveals the most, as musicians benefited more from TFS components. For example, musicians discriminating happy or sad excerpts utilized more TFS irrespective of whether individuals were in the first or last year of their music education. Recent discrimination tasks bolster these results. In a study where participants were tasked to detect frequency changes in quiet and noisy conditions, the acoustic change complex, a type of late auditory evoked potential, showed a larger P2' amplitude in musicians than in non-musicians (Liang et al., 2016). Moreover, in a task where target speech and competing speech were presented with either their natural F0 contours or on a monotone F0, and with the F0 difference between the target and masker systematically varied, F0 discrimination was significantly better for musicians (Madsen et al., 2017). Most of these frequency discrimination tasks indicate that musicians have an enhanced ability to perceive or discriminate fine structure components. Future studies should expand the range and variety of emotion discrimination paradigms to explore differences between musicians and non-musicians.

#### Study Limitations and Future Directions

The present study concentrated on analyzing two emotions elicited by classical music excerpts. Constraining the variety of emotion to a dichotomous task is artificial, but aids in discerning how the basic components of sound cue emotion. Future studies should analyze the diverse emotional repertoire that exists in humans. The present study analyzed non-musicians and individuals with musical education. We chose these groups based on previous literature (Micheyl et al., 2006; Musacchia et al., 2007; Wong et al., 2007; Parbery-Clark et al., 2009a,b; Kraus and Chandrasekaran, 2010; Strait et al., 2010; Başkent and Gaudrain, 2016; Liang et al., 2016; Mandikal Vasuki et al., 2016; Coffey et al., 2017; Madsen et al., 2017), suggesting music training would influence acoustic perception in the emotional resolvability task. This manuscript found that years in music education significantly affected emotional resolvability (df = 15, 90, F = 4.377, p < 0.0001), with advanced musicians using more fine structure to discriminate happy uncertain emotion by 2.51 ± 1.68%. Future studies could analyze different musicians (piano versus string) to determine if emotional resolvability differences are associated with the type of instrument training. The present manuscript assessed males and females separately, as it is known that gender differences in the perception of non-target emotions (incorrect) are greater for men than women (Fischer et al., 2018). Further, our entire cohort of subjects (n = 64) was derived from a Spanish-speaking population. In this regard, identifying emotion is easier for listeners with similar cultural and language backgrounds (Waaramaa and Leisiö, 2013), and a second language is known to interfere with emotion recognition from speech prosody (Bhatara et al., 2016). The repertoire of music used in the present study was classical, which was familiar to all participants; therefore, we believe the effect due to cultural background should be minimal. Future studies could further assess these confounding variables to determine their effect on uncertain musical emotion.

### DATA AVAILABILITY

Data and sound files used in this work can be downloaded in an anonymized format from the Open Science Framework: Manno, Francis A. M. 2018. "Music Psychophysics." OSF. November 20. https://osf.io/8ws7a.

## ETHICS STATEMENT

The studies involving human participants were reviewed and approved by the Internal Review Board of the Instituto de Neurobiología, Universidad Nacional Autónoma de México. The participants provided their written informed consent to participate in this study.

### AUTHOR CONTRIBUTIONS

FM designed the experiments, interviewed participants, executed experiments, analyzed the data, and wrote the manuscript. FM and RC assisted in experimental design and analysis, and revised the document. CL and FB supervised the experiments, assisted with data curation and analysis, and assisted in writing the manuscript.

#### FUNDING

We thank the Consejo Nacional de Ciencia y Tecnología (CONACyT) for the funding received via the grant CB255462 to FB.

#### ACKNOWLEDGMENTS


The authors are grateful to L. González-Santos, M.Sc., Dr. S. Hernández-Cortés Manno, and Chair Professor S. H. Cheng for support, and Z. Gracia-Tabuenca, Jorge Gámez, and Prof. F. King for their comments, coding, statistical expertise, and mathematical assistance. The authors thank J. G. Norris for editing the manuscript. The authors are also grateful to the students of the Facultad de Bellas Artes and the Facultad de Ciencias Naturales of the Universidad Autónoma de Querétaro for their participation in the study.


#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnins.2019.00902/full#supplementary-material

FIGURE S1 | Discrimination curves and discrimination normalized ratio indices for non-musicians. Discrimination curves of percent identification of happy or sad stimuli. (a) Average male poor performers, (b) average male good performers, (c) average female poor performers, and (d) average female good performers. Sad curve represented with blue line and happy curve represented with red line. The discrimination normalized ratio of stimuli identification of happy or sad stimuli. (e) Male poor performers, (f) male good performers, (g) female poor performers, and (h) female good performers. Sad discrimination ratios are represented in red and happy discrimination ratios are represented in blue.

FIGURE S2 | Discrimination curves and discrimination normalized ratio indices for musicians in their first or last year of study. Discrimination curves of percent identification of happy or sad stimuli. (a) Average first year (low) and (d) average last year music education (high). Male (b) and female (c) first year and male (e) and female (f) last year music education. The discrimination normalized ratio of stimuli identification of happy or sad stimuli (g) through (j). Sad curve represented with blue line and happy curve represented with red line. (g) Male first year and (h) male last year. (i) Female first year and (j) female last year.

TABLE S1 | Montreal emotional identification task acoustic stimuli.

DATA SHEET S1 | Contains extended analyses.

DATA SHEET S2 | Contains the original data used for all analyses in a csv file.





**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Manno, Cruces, Lau and Barrios. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Auditory Categorization of Man-Made Sounds Versus Natural Sounds by Means of MEG Functional Brain Connectivity

Vasiliki Salvari<sup>1</sup>, Evangelos Paraskevopoulos<sup>1,2</sup>, Nikolas Chalas<sup>2</sup>, Kilian Müller<sup>1</sup>, Andreas Wollbrink<sup>1</sup>, Christian Dobel<sup>3</sup>, Daniela Korth<sup>3</sup> and Christo Pantev<sup>1</sup>\*

<sup>1</sup> Institute for Biomagnetism and Biosignalanalysis, University of Münster, Münster, Germany, <sup>2</sup> School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, Thessaloniki, Greece, <sup>3</sup> Department of Otorhinolaryngology, Friedrich-Schiller University of Jena, Jena, Germany

#### Edited by:

Claude Alain, Rotman Research Institute (RRI), Canada

#### Reviewed by:

Jochen Kaiser, Goethe University Frankfurt, Germany Fernando R. Nodal, University of Oxford, United Kingdom

> \*Correspondence: Christo Pantev pantev@uni-muenster.de

#### Specialty section:

This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience

Received: 10 June 2019 Accepted: 19 September 2019 Published: 04 October 2019

#### Citation:

Salvari V, Paraskevopoulos E, Chalas N, Müller K, Wollbrink A, Dobel C, Korth D and Pantev C (2019) Auditory Categorization of Man-Made Sounds Versus Natural Sounds by Means of MEG Functional Brain Connectivity. Front. Neurosci. 13:1052. doi: 10.3389/fnins.2019.01052

Previous neuroimaging studies have shown that sounds can be discriminated due to living-related or man-made-related characteristics and involve different brain regions. However, these studies have mainly provided source space analyses, which offer simple maps of activated brain regions but do not explain how regions of a distributed system are functionally organized under a specific task. In the present study, we aimed to further examine the functional connectivity of the auditory processing pathway across different categories of non-speech sounds in healthy adults, by means of MEG. Our analyses demonstrated significant activation and interconnection differences between living and man-made object sounds in the prefrontal areas, anterior-superior temporal gyrus (aSTG), posterior cingulate cortex (PCC), and supramarginal gyrus (SMG), occurring within the 80–120 ms post-stimulus interval. The current findings replicated previous ones, in that regions beyond the auditory cortex are involved during auditory processing. According to the functional connectivity analysis, differential brain networks exist across the categories, which suggests that sound category discrimination relies on distinct cortical networks, a notion that has been strongly argued in the literature also in relation to the visual system.

Keywords: auditory perception, functional connectivity, magnetoencephalography, sound discrimination, man-made sound, natural sound

## INTRODUCTION

Current knowledge on neuronal networks underlying auditory perception remains fragmentary, despite the fact that audition has been extensively studied (Zatorre et al., 2002). The basic network properties that have been suggested for the auditory modality resemble the structure of the visual one, dividing the information processing pathways into dorsal and ventral networks, corresponding to the processing of "where" and "what" information, respectively (Romanski et al., 2000; Kubovy and Van Valkenburg, 2001; Arnott et al., 2004; Huddleston et al., 2008; Asplund et al., 2010). Following the "what" pathway, the physical characteristics of the sound stimulus are initially encoded in the primary and secondary auditory cortex, along with their associative areas, prior to their integration into a more abstract representation (Griffiths and Warren, 2004; Bregman, 2017). Within this pathway, the processing of auditory information seems to be performed in sound category specific channels (Caramazza and Mahon, 2003). Along these lines, functional specificity has been proposed for processing different types of sounds (Belin et al., 2000; Zatorre and Belin, 2001; Patterson et al., 2002; Lewis et al., 2004; Zatorre et al., 2004; Hunter et al., 2010).

Attempts to dissociate the processing of different sound categories at the cortical level have been made in brain-lesion case studies (Clarke et al., 2000; Taniwaki et al., 2000; Mendez, 2001; Steinke et al., 2001). Cases like auditory agnosia represent the impaired ability to recognize sounds when peripheral hearing is intact. However, this impairment does not necessarily apply to all sound categories; it may rather be category-specific depending on the brain damage. For instance, a patient with focal damage in the right fronto-parietal area was able to identify environmental sounds and name musical instruments, but could not recognize music (Steinke et al., 2001). On the other hand, left temporal lesion or left fronto-temporal ischemia have caused agnosia restricted to environmental sounds (Clarke et al., 2000). It should be emphasized, though, that single-case studies of brain-lesioned patients are very heterogeneous and therefore cannot provide a detailed model of cortical sound processing.

Several functional neuroimaging studies with healthy participants indicate bilateral auditory cortex activation for speech sounds (Belin et al., 2000; Zatorre and Belin, 2001) and right-lateralized activation for non-speech sounds during sound discrimination tasks (for review see Tervaniemi and Hugdahl, 2003). Other regions of the brain, such as the inferior frontal cortex, have also been reported to be involved, indicating a network that involves further cognitive functions beyond the auditory ones (for review see Price, 2012). So far, the majority of studies have focused on the differential processing of speech versus non-speech sound categories, and we still have poor knowledge about differential processing within the non-speech sound category. The existing studies have shown that sound category discrimination more likely depends on its associated manipulative characteristics (Lewis et al., 2004, 2005; Murray et al., 2006; De Lucia et al., 2009, 2012). For instance, in the context of playing a guitar, we listen to the sound while we perceive the motor and visual actions of the guitar playing. Thus, at a higher cognitive level, information from all sensory modalities that receive input from a stimulus is integrated in order to construct the percept of the sound. A similar model (Heekeren et al., 2008) has been proposed in the past for the visual and somatosensory systems, indicating multisensory integration already at low-level, early stages of cognitive processing.

Evidence on the functional organization of auditory perception shows that sounds can be categorized into "living" and "man-made" stimuli (Lewis et al., 2005), suggesting differential brain activation. In particular, the sound of a man-made object, in comparison to the sound of an animal, might require a top-down mechanism which integrates semantic and multisensory features associated more with action. Similarly, Murray et al. (2006) demonstrated by means of EEG that "man-made" sounds display stronger brain activation in the auditory "what" pathway compared to the "living" objects, and that regions of the right hemisphere and premotor cortices were mainly involved. Other differentiations have also been reported within the category of man-made objects. The main idea is that sounds of daily used objects, such as a phone ringing, might trigger more response of action than a typical tone of a musical instrument (in non-musicians) and thus elicit stronger brain activation (De Lucia et al., 2009). Interestingly, EEG studies focusing on the temporal dynamics have shown that the category discrimination process occurs around the N1 component, already 70 ms after the stimulus onset (Murray et al., 2006; De Lucia et al., 2012).

Nevertheless, the way distinct networks operate across different categories of sounds is still poorly understood. Although the aforementioned studies have given some insights into when and where this differentiation appears, the source space analyses offer simple maps of activated brain regions, rather than indicating how these regions of a distributed system are functionally connected to execute a specific task. To date, the investigation of complex networks has advanced methodologically, offering the opportunity to study cortical reorganization underpinning associated cognitive processes (Rogers et al., 2007; Bullmore and Sporns, 2009). Therefore, in the current study we aimed to further investigate the functional connectivity of the auditory processing pathway across different non-speech sound categories. The cortical responses to three different categories of sounds were compared: the Musical and the Artificial category (sounds of daily used/heard objects), representing the man-made object sound categories, and the Natural category (mainly animal vocalizations). To our knowledge, this is the first study to investigate functional connectivity across different non-speech sound categories by means of magnetoencephalography (MEG), which has high spatial resolution and excellent temporal resolution. Taking into consideration the existing literature about living versus man-made-related sounds, we expected that the Musical and the Artificial sounds would demonstrate stronger cortical responses that would involve motor-related regions and significant interconnections among these regions in comparison to the Natural sounds. Further, it should be noted that although the Musical and the Artificial sounds belong to the man-made category, evidence for differential activation between these groups has been previously reported (De Lucia et al., 2009) as a function of daily use. The N1 auditory evoked field was a priori set as the time interval of interest based on previous electrophysiological findings that report early responses in the sound category discrimination processing (Murray et al., 2006; De Lucia et al., 2009, 2012).

### MATERIALS AND METHODS

#### Subjects

The current study was conducted with a sample of 20 young adults (mean age = 27.19, SD = 5.59, 8 males). They were recruited from the pool of subjects of our institute among those who had normal hearing, according to a clinical audiometric evaluation. All subjects were right-handed, according to the Edinburgh Handedness Inventory (Oldfield, 1971). The participants were informed about the aim of the study, and those willing to participate were provided with a consent form that ensured the confidentiality of their identity. The study was performed according to the Declaration of Helsinki and approved by the ethics committee of the Medical faculty of the University of Münster.

#### Stimuli

The stimuli consisted of three different categories of sounds: Natural, Musical, and Artificial. The Natural and Artificial sounds were recordings obtained from online sound databases (Free Sounds Effects<sup>1</sup> ; SoundBible<sup>2</sup> ; ZapSplat<sup>3</sup> ). The Musical sounds were obtained from "McGill University Master samples" sound bank that have been created for perceptual research related to the psychology of music. The Audacity software<sup>4</sup> was used to resample all sounds at 44,100 Hz and to implement onset/offset linear slopes of 20 ms. Then, the mono sounds were converted into stereo sounds. By means of the WavePad Sound Editor<sup>5</sup> they were normalized by −10 dB RMS based on the Average Loudness normalization method.
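The ramping and level-normalization steps described above can be illustrated with a minimal NumPy sketch. It is not the Audacity or WavePad implementation: the function names are hypothetical, the signal is assumed to be a mono array scaled to a full-scale amplitude of 1.0, and "−10 dB RMS" is interpreted here as an RMS level of −10 dB relative to full scale.

```python
import numpy as np


def apply_linear_ramps(signal, fs_hz=44100, ramp_s=0.020):
    """Apply linear onset/offset ramps (20 ms by default) to a mono signal."""
    out = np.asarray(signal, dtype=float).copy()
    n = int(round(ramp_s * fs_hz))
    ramp = np.linspace(0.0, 1.0, n)
    out[:n] *= ramp            # fade in
    out[-n:] *= ramp[::-1]     # fade out
    return out


def rms_db(signal):
    """RMS level in dB relative to a full-scale amplitude of 1.0 (assumption)."""
    rms = np.sqrt(np.mean(np.square(signal)))
    return 20.0 * np.log10(rms)


def normalize_rms(signal, target_db=-10.0):
    """Scale the signal so that its RMS level equals target_db."""
    signal = np.asarray(signal, dtype=float)
    rms = np.sqrt(np.mean(np.square(signal)))
    if rms == 0.0:
        raise ValueError("cannot normalize a silent signal")
    return signal * (10.0 ** (target_db / 20.0) / rms)
```

Scaling to a fixed RMS level can push peaks above full scale for sounds with a high crest factor, so a clipping check would be a sensible addition in practice.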

The stimulus paradigm was performed via Presentation software (Version 18.0, Neurobehavioral systems, Inc., Berkeley, CA, United States)<sup>6</sup>. It consisted of two blocks with a short break in between. Each block included the presentation of the three different categories of sounds, whose order was pseudorandomized across blocks and across subjects, whereas the sounds within each category were always presented in the same order: A = Artificial, M = Musical and N = Natural; Block1: A1-A2-. . .-An-M1-M2-. . .-Mn-N1-N2-. . .-Nn; Block2: M1-M2- . . .-Mn-N1-N2-. . .-Nn-A1-A2-. . .-An. Each block contained 144 stimuli, 48 for each category, which makes a total of 288 stimuli for the whole experiment: 48 (sounds per category) × 3 (categories) × 2 (blocks). Each stimulus lasted for 1 s, with a randomized Inter-Stimulus Interval (ISI) between 0.7 and 1.3 s in order to avoid expectancy and rhythmicity. The Natural sounds contained sounds of living objects. The Musical sounds contained notes of different musical instruments, whereas the Artificial sounds were daily object-like sounds. Examples of the sound files used in the study can be found in the **Supplementary Material**.
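As a rough illustration of the block design, the following sketch builds a trial list with the structure described above. It is not the Presentation script used in the study: the function names are hypothetical, the ISI is assumed to be drawn from a uniform distribution (the text only states that it was randomized between 0.7 and 1.3 s), and the category order is simply shuffled per block.

```python
import random

CATEGORIES = ("Artificial", "Musical", "Natural")
SOUNDS_PER_CATEGORY = 48
N_BLOCKS = 2
ISI_RANGE_S = (0.7, 1.3)


def build_block(category_order, rng):
    """Return a list of (category, sound_index, isi_seconds) for one block."""
    trials = []
    for category in category_order:
        # Sounds within a category always keep the same order (1..n).
        for index in range(1, SOUNDS_PER_CATEGORY + 1):
            isi = rng.uniform(*ISI_RANGE_S)  # assumed uniform jitter
            trials.append((category, index, isi))
    return trials


def build_experiment(seed=0):
    """Build all blocks; the category order is shuffled for each block."""
    rng = random.Random(seed)
    blocks = []
    for _ in range(N_BLOCKS):
        order = list(CATEGORIES)
        rng.shuffle(order)
        blocks.append(build_block(order, rng))
    return blocks


blocks = build_experiment()
total_stimuli = sum(len(block) for block in blocks)
print(total_stimuli)  # 48 * 3 * 2 = 288
```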

#### MEG Recordings

Participants were examined in a magnetically shielded and acoustically quiet room by means of a 275-channel whole-head system (OMEGA 275, CTF, VSM Medtech Ltd., Vancouver, BC, Canada). Data were continuously recorded with a sampling frequency of 600 Hz, resulting in an off-line cut-off frequency of 150 Hz. Participants were seated in an upright position and their head was stabilized with cotton pads inside the MEG helmet. A silent movie was presented on a projector screen mounted on the MEG system gantry, placed according to each participant's best viewing angle, in order to keep them vigilant during the experiment, as has been done in previous auditory experiments (Pantev et al., 2004; Okamoto et al., 2008; Paraskevopoulos et al., 2018). After passing through electrostatic transducers, the auditory stimuli were delivered via silicon tubes of 60 cm length and an inner diameter of 5 mm, ending in a silicon earpiece fitted individually to each subject's ear. Prior to the stimulation, an audiological hearing threshold determination test with 5 dB accuracy at a frequency of 1 kHz was conducted. Stimulus sound pressure levels were set to 60 dB SL above the individual hearing threshold. The whole experiment lasted for approximately 30 min.

#### MRI Protocol

A T1-weighted MR image was acquired for all participants in a 3 Tesla scanner (Gyroscan Intera T30, Philips), in order to obtain each individual's Finite Element Model (FEM) of the head. The files gave images of 400 one layer-slices with 0.5 mm thickness in the sagittal plane (TR = 7.33.64 ms, TE = 3.31 ms). The matrix size of each slice was 512 × 512 with voxel size of 0.5 × 0.58 × 0.58 mm<sup>3</sup>. To ensure the reliability of the investigation of brain structure within and across subjects, we used SPM12 (Statistical Parametric Mapping)<sup>7</sup> to correct for intensity inhomogeneity (Ganzetti et al., 2016), and the images were then resliced to isotropic voxels of 2 × 2 × 2 mm.

#### MEG Data Analysis

The analysis of the MEG data followed a previously developed procedure applied to functional connectivity networks under different auditory paradigms (Paraskevopoulos et al., 2015, 2018). The Brain Electrical Source Analysis software (BESA MRI, version 2.0, Megis Software, Heidelberg, Germany) was used to compute each individual's head model by segmenting four different head tissues (scalp, skull, CSF, and brain), based on the FEM. The four-layer FEM model gives more precise results than other models, since it includes the CSF (Ramon et al., 2004; Wendel et al., 2008), which is a highly conductive layer and important for MEG source reconstruction (Wolters et al., 2006). The MEG sensors were co-registered and adjusted to the individual's structural MRI using the nasion and the left and right entries of the ear canals as landmarks. By means of 3D spline interpolation, the MRIs were transformed to AC-PC (anterior commissure-posterior commissure) and to Talairach space. A predefined option for conductivity values (cf. Wolters et al., 2006) was set for the skin compartment to 0.33 S/m, for the skull to 0.0042 S/m, for the CSF to 0.79 S/m and for the brain tissue to 0.33 S/m.

For the pre-processing of the MEG data, the BESA research software (version 6.0, Megis Software, Heidelberg, Germany) was used. For artifact rejection, the automated electrocardiogram (ECG) and eye-blink artifact detection and correction provided by BESA (Ille et al., 2002) was applied. Data were filtered offline with a 50 Hz notch filter, a zero-phase low-pass filter of 45 Hz and a zero-phase high-pass filter of 0.5 Hz. The data were divided into epochs of 1000 ms post- and 500 ms pre-stimulus onset. A baseline correction based on a 100 ms pre-stimulus interval was applied. During the averaging of the stimulus-related epochs, trials with amplitudes larger than 3 pT were rejected, and data in which more than 15% of trials were rejected were excluded from the analysis. The two measurement blocks were then averaged for each participant in order to improve the signal-to-noise ratio.

<sup>1</sup>www.freesounds.org

<sup>2</sup>www.soundbible.com

<sup>3</sup>www.zapsplat.com

<sup>4</sup>www.audacityteam.org

<sup>5</sup>www.nch.com.au

<sup>6</sup>www.neurobs.com

<sup>7</sup>www.fil.ion.ucl.ac.uk
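The epoching, baseline correction and amplitude-based rejection described in the pre-processing paragraph can be sketched as follows. This is an illustrative NumPy sketch, not the BESA procedure: the function name and array layout are assumptions, filtering and ECG/blink correction are omitted, and the 15% criterion is interpreted as excluding a dataset when more than 15% of its trials are rejected.

```python
import numpy as np

FS_HZ = 600          # sampling frequency reported in the text
PRE_S = 0.5          # epoch start: 500 ms before stimulus onset
POST_S = 1.0         # epoch end: 1000 ms after stimulus onset
BASELINE_S = 0.1     # 100 ms pre-stimulus baseline
REJECT_T = 3e-12     # 3 pT amplitude threshold, expressed in tesla
MAX_REJECTED = 0.15  # assumed meaning of the 15% criterion


def epoch_and_average(data, onsets):
    """Average baseline-corrected epochs.

    data   : array of shape (n_channels, n_samples), in tesla
    onsets : iterable of stimulus-onset sample indices
    Returns (evoked, rejected_fraction).
    """
    data = np.asarray(data, dtype=float)
    pre = int(PRE_S * FS_HZ)
    post = int(POST_S * FS_HZ)
    base = int(BASELINE_S * FS_HZ)
    kept, n_total = [], 0
    for onset in onsets:
        if onset - pre < 0 or onset + post > data.shape[1]:
            continue  # skip epochs that run off the recording
        n_total += 1
        epoch = data[:, onset - pre:onset + post].copy()
        # Subtract each channel's mean over the 100 ms before onset.
        epoch -= epoch[:, pre - base:pre].mean(axis=1, keepdims=True)
        if np.abs(epoch).max() > REJECT_T:
            continue  # reject trials exceeding 3 pT
        kept.append(epoch)
    if n_total == 0:
        raise ValueError("no complete epochs available")
    rejected_fraction = 1.0 - len(kept) / n_total
    if rejected_fraction > MAX_REJECTED:
        raise ValueError("more than 15% of trials rejected")
    return np.mean(kept, axis=0), rejected_fraction
```

With these settings each epoch spans 900 samples, and stimulus onset falls at sample index 300 within the epoch.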

For the current density reconstruction, we used a time window around the N1 major component of the slow auditory evoked field (Pantev et al., 1993; c.f. **Figure 2**), which according to the global field power of the grand average data was between 80 ms and 120 ms after stimulus onset, including the rising slope of the N1. Low Resolution Electromagnetic Tomography (LORETA) provided by BESA was applied for the source reconstruction, for each subject and each category of sounds as it provides smooth distribution of sources as inverse solution (Pascual-Marqui et al., 1994). It is based on the weighted minimum norm method (Grech et al., 2008) and it does not rely on an a priori determination of activated sources.
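To make the window selection more concrete, the sketch below computes a root-mean-square time course across sensors and locates its maximum within a search range. It is only an illustration under stated assumptions: the function names and the 50–150 ms search range are hypothetical, and the authors' global field power computation in BESA may differ in detail.

```python
import numpy as np


def rms_across_sensors(evoked):
    """Root-mean-square over sensors for each time sample.

    evoked : array of shape (n_sensors, n_samples)
    """
    return np.sqrt(np.mean(np.square(evoked), axis=0))


def peak_latency_ms(evoked, times_ms, search=(50.0, 150.0)):
    """Latency (ms) of the RMS maximum inside an assumed search range."""
    rms = rms_across_sensors(evoked)
    mask = (times_ms >= search[0]) & (times_ms <= search[1])
    candidates = np.flatnonzero(mask)
    return float(times_ms[candidates[np.argmax(rms[mask])]])
```

A window such as 80–120 ms could then be placed around the returned latency so that it includes the rising slope of the component.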

#### Statistical Analysis

For the statistical analysis of the LORETA reconstruction, we used SPM12 running on Matlab (R2016b version; MathWorks Inc., Natick, MA, United States). An explicit mask was set to include results only for the gray matter, thus decreasing the search volume. A one-way ANOVA was run with the three different categories of sounds (Natural, Artificial, and Musical) as the within-subjects factor. F- and t-contrast matrices were then designed (based on the general linear model) to test for statistical differences across the three categories and between categories, respectively. To control for multiple comparisons, Family Wise Error (FWE) correction was applied.

#### Connectivity Analysis

In order to examine the cortical network across the significant sources derived from the SPM12 analysis, we further implemented a connectivity analysis. Having defined the activated regions in source space via the analysis described above, we employed an equivalent current dipole model by setting one dipole at the peak of each significant cluster derived from the F-contrast. This resulted in five equivalent current dipoles in total. Because SPM expresses coordinates based on the standardized brains of the Montreal Neurological Institute (MNI coordinates), the coordinates were transferred to Talairach space to fit the brain coordinates of the BESA software, where the dipole model was run. For the conversion, the "MNI2TAL" applet of the Yale BioImage Suite Package was used (sprout022.sprout.yale.edu), which is based on the coordinate mapping of Lacadie et al. (2008). The orientation of the dipoles was fitted based on the individual's FEM volume conductor, whereas the coordinates were fixed across all subjects and conditions, as defined above. The results contained five source waveforms, one for each dipole, covering the 80–120 ms interval.

The HERMES toolbox (Niso et al., 2013) for Matlab was used to construct a 5 × 5 adjacency matrix for each subject and each condition based on the Mutual Information (MI) algorithm, which measures the mutual dependence between variables and can detect nonlinear dependence between random variables (Zeng, 2015). The results were then transferred to the Network Based Statistic Toolbox (NBS; Zalesky et al., 2010) to identify statistically significant connections. A one-way within-subjects ANOVA was run with the three conditions as the within-subjects factor. NBS was used to correct for multiple comparisons at a significance level of p < 0.05 (see **Figure 1** for the analysis pipeline tools). This resulted in a functional connectivity graph with nodes and edges representing the significantly activated regions and their significant interconnections, respectively.
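For readers unfamiliar with MI-based adjacency matrices, a minimal sketch follows. It uses a simple histogram (plug-in) estimator, which is an assumption made for illustration and is not claimed to be the estimator implemented in HERMES; the function names and the number of bins are likewise hypothetical.

```python
import numpy as np


def mutual_information(x, y, bins=8):
    """Histogram (plug-in) estimate of mutual information, in bits."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    independent = px @ py                 # product of the marginals
    nonzero = pxy > 0
    return float(np.sum(pxy[nonzero] * np.log2(pxy[nonzero] / independent[nonzero])))


def mi_adjacency(waveforms, bins=8):
    """Symmetric adjacency matrix of pairwise MI values.

    waveforms : array of shape (n_sources, n_samples)
    """
    n = waveforms.shape[0]
    adjacency = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            value = mutual_information(waveforms[i], waveforms[j], bins)
            adjacency[i, j] = adjacency[j, i] = value
    return adjacency
```

For five source waveforms this yields a 5 × 5 matrix with a zero diagonal. Histogram estimators are biased for short time series, so an 80–120 ms window sampled at 600 Hz would call for a more careful estimator than this sketch.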

### RESULTS

#### Source Space

The N1 auditory evoked field was set a priori as the time window of interest. The root-mean-square time series of the grand average across subjects was computed in sensor space to depict the time window in ms around the a priori set N1 auditory field maximum. **Figure 2** illustrates the mean of the root-mean-square values of each sound category as well as the maximum and minimum values of their confidence intervals. The 80–120 ms interval was chosen for the subsequent source reconstructions, on which we performed F- and t-contrast statistics by means of the SPM12 software. **Figure 3** and **Table 1** illustrate the significant clusters obtained by the Musical ≠ Artificial ≠ Natural contrast. The largest cluster involved parts of the right and left frontal cortex, as well as parts of the temporal lobe. The peak of this cluster was located in the anterior part of the right temporal cortex, in the most dorsal area of the superior temporal gyrus (STG) (x = 43, y = 14, z = −29; F(1, 20) = 13.1, cluster size = 3128, p < 0.001 FWE corrected at cluster level). A second cluster was located in the right inferior parietal lobe, with the peak in the right supramarginal gyrus (SMG) (coordinates: x = 56, y = −27, z = 27; F(1, 20) = 12.80, cluster size = 513, p < 0.001 FWE corrected at cluster level). The third cluster was located in the posterior cingulate cortex (PCC; overlapping with cluster two in the figure), which involved the posterior cingulate gyrus and the medial part of the parietal lobe (coordinates: x = 10, y = −52, z = 39, F(1, 20) = 12.42, cluster size = 650, p < 0.001 FWE corrected at cluster level).

For the between-category comparisons, the results revealed significant differences for both the Musical > Natural and the Artificial > Natural comparisons, as shown in **Figure 4** and **Table 2**, which give the coordinates and the source mapping results for both comparisons. In more detail, two significant clusters were obtained for Musical > Natural sounds. The largest cluster involved regions of the left and right frontal cortex, as well as the temporal cortex and the SMG. The peak of this cluster was located at the SMG (x = 56, y = −27, z = 27, t(20) = 5.05, cluster size: 109960, p < 0.001 FWE corrected at cluster level). The second cluster was located in the PCC (coordinates: x = 8, y = −54, z = 40, t(20) = 4.61, cluster size = 3142, p < 0.001 FWE corrected at cluster level). For the Artificial > Natural contrast we found two clusters revealing significant differential activation; one significant peak was located at the PCC (coordinates: x = 10, y = −37, z = 39, t(20) = 4.09, cluster size = 675, p = 0.001 FWE corrected at cluster level) and the other at the medial prefrontal cortex (mPFC; coordinates: x = 13, y = 39, z = 18, t(20) = 4.08, cluster size = 1007, p < 0.001 FWE corrected at cluster level).

... connectivity was performed in the NBS toolbox and the data were transferred to the BrainNet toolbox to visualize the statistically significant connectivity networks.

#### Connectivity Results

For the connectivity analysis, five equivalent current dipoles were set at the peaks of the significantly differentially activated areas derived from the F-contrast. These are the PCC, the aSTG, the SMG and the bilateral prefrontal cortex. Three connectivity analyses were conducted based on the significant interaction found in the source space analysis, p < 0.05, NBS corrected (Network-Based Statistics; Zalesky et al., 2010).

For the Musical ≠ Artificial ≠ Natural contrast, the results revealed connections among all the nodes, with six edges in total. As **Figure 5** demonstrates, the edges between the PCC and the right anterior superior temporal gyrus (aSTG), as well as between the left ventral mPFC and the right medial dorsal prefrontal cortex, were the strongest (as indicated by the F-value). The left mPFC yielded significant differential connections with all the nodes located in the right hemisphere, and it was the only node connected to the SMG.

In the Musical-versus-Natural comparison, a significant differential network of 10 edges was obtained, with all the nodes in the network being interconnected (cf. **Figure 6**). According to the t-value, increased functional connectivity was demonstrated between the right mPFC and the PCC, as well as between the PCC and the aSTG. With a smaller t-value, the left mPFC yielded significant interconnections with the right PFC, the aSTG, the SMG and the PCC. Weaker interconnections were obtained between the SMG and the aSTG, the PCC, as well as the left mPFC.

With regard to the Artificial-versus-Natural comparison (cf. **Figure 6**), the results showed significant differential interconnections among all the nodes, with nine edges in total. In detail, the inter-hemispheric prefrontal edge and the edge between the aSTG and PCC nodes were the most pronounced according to the t-value. Weaker edges were found for the remaining interconnections, and no significant interconnection was found between the PCC and the SMG.
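The edge counts reported above can be checked against the number of possible connections in a five-node network. In the short sketch below, the node labels are shorthand chosen for illustration (the text refers to the prefrontal dipoles with slightly varying names).

```python
from itertools import combinations

# Shorthand labels for the five dipole locations (illustrative only).
NODES = ("PCC", "aSTG", "SMG", "left mPFC", "right mPFC")

# Every unordered pair of nodes is a possible undirected edge.
all_edges = {frozenset(pair) for pair in combinations(NODES, 2)}

# Artificial-versus-Natural: all edges except PCC-SMG, as reported in the text.
artificial_vs_natural = all_edges - {frozenset(("PCC", "SMG"))}

print(len(all_edges))              # 10
print(len(artificial_vs_natural))  # 9
```

A network of 10 edges therefore corresponds to a fully connected graph of the five nodes, and the nine-edge network lacks only the PCC-SMG connection.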

### DISCUSSION

The current study examined neural responses in the processing of different categories of sounds by means of MEG. Further, functional connectivity analysis based on the MI algorithm was used to depict the differential connectome of the regions involved in the discrimination of the Natural, Musical, and Artificial sound categories. The obtained results demonstrated statistically significant differences across the different sound conditions in the superior temporal cortex, the posterior cingulate, the inferior parietal and the bilateral prefrontal cortex, between 80 and 120 ms after stimulus onset. Direct comparisons across the different categories showed that both Musical and Artificial sounds demonstrated statistically significant differences and enhanced brain activation in the source space and connectivity analyses when compared to the Natural sound category.

Our finding that regions of the parietal cortex showed significant category modulation of activation, as derived from the contrast analysis, is consistent with several neuroimaging studies investigating the organization of cortical auditory perception (Lewis et al., 2005; Murray et al., 2006; Staeren et al., 2009). In our study, we found two regions in the parietal lobe to be involved in the differential processing of the assessed sound categories, namely the PCC and the right inferior parietal cortex (IPC). The peak of the IPC cluster was located in the SMG, which is part of the somatosensory cortex (Buccino et al., 2001). In fact, it is suggested to be involved in action representation and specifically in the mental representation of movement (Chong et al., 2008; Tunik et al., 2008). Within the framework of this interpretation, any event or stimulus containing actions would engage motor-related regions even in the absence of tactile stimulation (Nilsson et al., 2000). Additional evidence associates the SMG with the mirror neuron system, a system involved in imitating and identifying the actions of other persons (Carlson, 2012). On this basis, the right SMG has been suggested to be especially involved in motor planning, even when the actions are just observed and not necessarily executed (Chong et al., 2008). It seems that, in the mental representation, information about how a tool is manipulated or visualized, or how the sound is produced, is retrieved and integrated.

Apart from the parietal cortex, our connectivity results further stressed the involvement of the prefrontal cortex and the STG in the processing of different sound categories. Significant functional connectivity between prefrontal cortex and parietal cortex has been demonstrated in the past, suggesting that this network is part of the working memory network, linking perception and higher order cognition. This network is multimodal and is involved in both auditory and visual object processing (Husain et al., 2004; Husain and Horwitz, 2006). With respect to the STG, most neuroimaging studies have shown the significant involvement of this region in auditory processing. However, according to the meta-analysis of Arnott et al. (2004; including 38 studies of different neuroimaging techniques), it seems that the anterior area of the STG, as demonstrated here, is associated to a greater extent with the discrimination of sounds. In line with our connectivity results, the STG region has been found to interact significantly with the prefrontal cortex during a category discrimination task (Husain and Horwitz, 2006). The fact that our connectivity pattern was lateralized to the right hemisphere is consistent with the suggestion that this hemisphere is more strongly activated when non-speech or non-living stimuli are represented (see Zatorre et al., 2002; Tervaniemi and Hugdahl, 2003). This derives not only from brain lesion studies, but also from neuroimaging studies with healthy participants (Belin and Zatorre, 2003; Murray et al., 2006).

#### TABLE 1 | Significant clusters based on the F-contrast.

The locations of the clusters, with the corresponding MNI coordinates of the significantly different cortical regions as derived from the source space analysis for the Musical ≠ Artificial ≠ Natural contrast, are given. Cluster size in voxels, F- and p-values are also given, FWE corrected at the p < 0.001 level of significance.

Another possible interpretation of our results is that the processing of Musical and Artificial sounds involves the default mode network (DMN) to a greater degree. The PCC area, as derived from our results, has been strongly suggested to have a central role in the DMN (for review see Leech and Sharp, 2013), with the inferior parietal lobe, prefrontal cortex and temporal cortex (however, more medial structures) as the main nodes for intrinsic connections (Raichle et al., 2001). Activation of the DMN has mainly been observed under task-free conditions when attention to external stimuli is not required (for review see Buckner et al., 2008). However, increased activity is also found during cognitive processing that requires internal attention, such as memory retrieval and planning (Spreng, 2012). From this view, the processing of man-made object sounds might require more top-down processing of information than the processing of living-object sounds does. Moreover, the PCC has been strongly associated with emotional processing and arousal state, with higher responses and large-scale functional connectivity during arousal and decreased responses during sleep (Leech and Sharp, 2013). As such, stimulation with Musical and Artificial sounds might evoke high-arousal emotions relative to natural sounds. Nevertheless, the lack of relevant behavioral measurements does not allow for further interpretation of this assumption, which is beyond the scope of this paper. In the future, a behavioral test assessing the levels of emotional arousal to different sounds would give us some insight into whether this factor significantly affects the differential brain activation.

#### TABLE 2 | Significant clusters of the between-subject comparisons.

The locations of the clusters, with the corresponding MNI coordinates of the significantly different cortical regions as derived from the source space analysis for the Musical-versus-Natural and Artificial-versus-Natural t-tests, are shown. Cluster size, t- and p-values are also given, FWE corrected at the p < 0.001 level of significance.

Our main hypothesis was that the man-made sounds would differ from the Natural sounds due to the mental representation of motor characteristics, and this difference was indeed pronounced. We further aimed to investigate whether statistical differences also apply to the comparison between Musical and Artificial sounds, as has been previously reported, based on the assumption that the sounds of daily-used objects might trigger a stronger action response than the sounds of musical instruments (in non-musicians). Our results did not yield any significant differences in the source space analysis. In the connectivity analysis, however, even though the Musical-versus-Natural and Artificial-versus-Natural comparisons demonstrated similar connectivity patterns, the strength of the interconnections (as given by the thickness of the edges) was different. For instance, in the Artificial-versus-Natural network, the inter-hemispheric prefrontal edge and the STG-PCC edge were stronger than the remaining interconnections, whereas the Musical-versus-Natural network yielded a more evenly distributed intensity across the interconnections, with smaller values. A previous study (De Lucia et al., 2009) has shown differential discrimination between musical and tool sounds only later than 300 ms after stimulus onset. On this basis, it might be that the discrimination of broader categories of sounds, such as between living and man-made sounds, already occurs in early responses, whereas discrimination of subcategories follows in later responses. It would be interesting in the future to investigate later time intervals in which higher-level conceptual processes might be needed for the discrimination of man-made subcategories. It should be mentioned, though, that the Artificial category in our study contained sounds that generally come from man-made objects, but some of them were less manipulable (e.g., an ambulance siren). The absence of a stricter categorization based on a behavioral assessment that would divide the objects by familiarity and frequency of use might have limited our interpretation regarding the significant differences between sound categories. The examination of the cortical regions in response to sound familiarity, recognizability and attention might give insights into the role that the "what" auditory neuronal pathway has in sound processing. Previous studies have shown the importance of sound novelty, task demands and attention in sound discrimination (Levy et al., 2001, 2003). However, others argue that the discrimination of living and man-made objects is present independently of behavioral proficiency, and hence familiarity (De Lucia et al., 2012), when correctly categorized sounds were compared with incorrectly categorized sounds, as well as independently of consciousness (Cossy et al., 2014), when comatose patients were examined. This is still under debate in the literature and should be addressed in future studies.

... current dipoles. The edges are weighted and colored according to their connectivity strength, as indicated by the F-value of the color scale. The networks are significant at p < 0.05, NBS corrected. The upper-left figure depicts the frontal coronal view, the upper-right one displays the axial plane viewed from the top and the figure at the bottom illustrates the right sagittal plane.

A possible limitation of the current study is that the categorization of complex sounds could be confounded by the physical differences between stimuli (Belin et al., 2000; Staeren et al., 2009), something that was deliberately not controlled here in order to avoid any distortion in the quality of the sounds. However, natural sounds differ by nature from human-made and more synthetic sounds, which is something we cannot manipulate. According to the review by Theunissen and Elie (2014), natural sounds have statistical properties that follow a power law relationship and are optimally encoded in the ascending auditory processing system, in contrast to sounds with more random and flat envelope power spectra (such as the sounds in the Artificial category). Nevertheless, a similar categorization of sounds has been used in the past, and our results are consistent with it (Murray et al., 2006).

The results of our study suggest that the dissociation between living and man-made objects is based on distinct neuronal processing. However, the reason for this phenomenon is still an open question. From another perspective, it might be that the brain has become conceptually specialized in the processing of different categories of sounds due to evolutionary adaptation, a theory that has been strongly argued for the visual system as well (Caramazza and Mahon, 2003). According to this view, distinct cortical pathways corresponding to different categories of sounds have evolved in response to the environmental sounds that humans have experienced over time. This would be in agreement with the previously mentioned review, which suggests that natural sounds are encoded more optimally than the sounds of human-made machines (Theunissen and Elie, 2014). From this perspective, listening to the sounds of nature would elicit weaker brain activation in comparison to the more "modern" tool sounds (e.g., phone ringing), since we are evolutionarily more adapted to natural sounds, and this in turn would require less cognitive effort (Buckner et al., 2008). This would offer some explanation for the abovementioned findings, where the man-made sounds showed stronger brain activation compared to the "living" objects. Furthermore, the functional connectivity maps showed that within the man-made category there might be "key" connections for Artificial-versus-Natural relative to Musical-versus-Natural, even though the nodes remained the same for both comparisons. This could also explain cases of semantic impairment, where patients are impaired in a very specific category while the remaining categories within the same domain are spared (McCarthy and Warrington, 2016; Muhammed et al., 2018). Therefore, it seems very probable that the perception of auditory objects relies on a large-scale distributed system, which follows distinct neuronal pathways, dissociated on the basis of the weight that each node has in the network. Our findings cannot settle this assumption, though they motivate further examination. A connectivity analysis that also involves the commonly activated regions might provide a better picture; however, because this requires a different analysis, it is recommended for future study.

In general, the fact that the processing of auditory stimuli engages regions beyond the auditory cortex, such as the anterior temporal and frontal lobe (Maeder et al., 2001; Alain et al., 2008), is well documented. Similar to our cortical network analysis results, recent neuroimaging studies suggest that a network of fronto-temporo-parietal regions contributes to semantic processing (for review see Thompson-Schill, 2003; Patterson et al., 2002). This network has been proposed to be associated with both auditory and visual object identification (Goll et al., 2010; Hunter et al., 2010; Brefczynski-Lewis and Lewis, 2017). However, what is not yet well documented is how these brain responses are functionally connected. Our connectivity study characterizes the connectivity of brain regions during sound discrimination. In order to obtain a better understanding of auditory categorization in the brain, it is not sufficient to investigate the activated regions in isolation; one must also understand how these regions interact. In this respect, we believe that our results are valuable for a better understanding of the human brain in sound discrimination.

### CONCLUSION

The present study demonstrated an enhanced brain network of man-made related sounds (Musical and Artificial) when compared to Natural sounds. So far the literature has provided simple brain activation maps. We additionally showed how these differentially activated brain regions are functionally connected and linked to the respective cognitive processes. We replicated previous findings supporting the engagement of other modalities beyond the auditory, to be involved in the processing of sound stimuli, as soon as this reaches the level of object representation. This in turn seems to be based on semantic categorization of the stimulus, following distinct neuronal pathways for living versus man-made objects. In addition to previous studies that investigated only the cortical activation to different sound categories, we demonstrated significant differences in the functional connectivity between the cortical sources involved in the processing of the different sound categories.

## DATA AVAILABILITY STATEMENT

The datasets generated for this study are available on request to the corresponding author.

## ETHICS STATEMENT

The studies involving human participants were reviewed and approved by the Medical faculty of the University of Münster. The patients/participants provided their written informed consent to participate in this study.

## AUTHOR CONTRIBUTIONS

VS contributed to the data recruitment, data analysis, and wrote the first draft of the manuscript. EP and NC were involved in the statistical analysis. KM and AW were involved in the informatic support. CD, DK, and CP contributed to the manuscript revision. All the authors read and approved the submitted version of the manuscript.

### FUNDING

This work was supported by the Deutsche Forschungsgemeinschaft (DFG; PA 392/16-1, DO-711/10-1).

## ACKNOWLEDGMENTS

The authors would like to thank the subjects for their cooperation and the technicians of the Institute for Biomagnetism and Biosignalanalysis for supporting the data acquisition.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnins.2019.01052/full#supplementary-material



**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Salvari, Paraskevopoulos, Chalas, Müller, Wollbrink, Dobel, Korth and Pantev. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Effects of Music Training on Inhibitory Control and Associated Neural Networks in School-Aged Children: A Longitudinal Study

Sarah L. Hennessy<sup>1</sup>, Matthew E. Sachs<sup>1</sup>, Beatriz Ilari<sup>1,2</sup> and Assal Habibi<sup>1</sup>\*

<sup>1</sup> Brain and Creativity Institute, University of Southern California, Los Angeles, CA, United States, <sup>2</sup> Thornton School of Music, University of Southern California, Los Angeles, CA, United States

Inhibitory control, the ability to suppress an immediate dominant response, has been shown to predict academic and career success, socioemotional wellbeing, wealth, and physical health. Learning to play a musical instrument engages various sensorimotor processes and draws on cognitive capacities including inhibition and task switching. While music training has been shown to benefit cognitive and language skills, its impact on inhibitory control remains inconclusive. As part of an ongoing 5-year longitudinal study, we investigated the effects of music training on the development of inhibitory control and its neural underpinnings with a population of children (starting at age 6) from underserved communities. Children involved in music were compared with children involved in sports and children not involved in any systematic after-school program. Inhibition was measured using a delayed gratification task, a flanker task, and a Color-Word Stroop task, the last of which was performed both inside and outside of an MRI scanner. We established that there were no pre-existing differences in cognitive capacities among the groups at the onset. In the delayed gratification task, beginning after 3 years of training, children with music training chose a larger, delayed reward in place of a smaller, immediate reward more often than the control group. In the flanker task, children in the music group significantly improved their accuracy after 3 and 4 years of training, whereas such improvement in the sports and control groups did not reach significance. There were no differences among the groups on behavioral measures of the Color-Word Stroop task at any time point. As for differences in brain function, we have previously reported that after 2 years, children with music training showed significantly greater bilateral activation in the pre-SMA/SMA, ACC, IFG, and insula during the Color-Word Stroop task compared to the control group, but not compared to the sports group (Sachs et al., 2017). However, after 4 years, we report here that differences in brain activity related to the Color-Word Stroop task between musicians and the other groups are only observed in the right IFG. The results suggest that systematic extracurricular programs, particularly music-based training, may accelerate development of inhibitory control and related brain networks earlier in childhood.

Keywords: music, inhibitory control, executive function, longitudinal research, music training, inhibition, neuroplasticity

Edited by:

Virginia Penhune, Concordia University, Canada

#### Reviewed by:

Vesa Putkinen, Turku PET Centre, Finland

Lutz Jäncke, University of Zurich, Switzerland

> \*Correspondence: Assal Habibi ahabibi@usc.edu

#### Specialty section:

This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience

Received: 06 July 2019 Accepted: 24 September 2019 Published: 16 October 2019

#### Citation:

Hennessy SL, Sachs ME, Ilari B and Habibi A (2019) Effects of Music Training on Inhibitory Control and Associated Neural Networks in School-Aged Children: A Longitudinal Study. Front. Neurosci. 13:1080. doi: 10.3389/fnins.2019.01080

Executive functions are broadly defined as top–down processes related to goal acquisition and attention that primarily recruit the brain's prefrontal network (Miller and Cohen, 2001). Inhibitory control, a sub-construct of executive function (Miyake et al., 2000), refers to the ability to suppress a primary response. It is correlated with the activation in the dorsolateral prefrontal cortex (DLPFC), anterior cingulate cortex (ACC), supplementary and pre-supplementary motor cortex (SMA/pre-SMA), insula, and inferior frontal gyrus (IFG) (Niendam et al., 2012). Inhibitory control has been shown to be predictive of academic success (Mischel et al., 1989; Alloway et al., 2005; Duckworth and Seligman, 2005; Kirby et al., 2005), career success (Bailey, 2007), positive socioemotional wellbeing (Hughes and Dunn, 1998; Lengua, 2003; Eisenberg et al., 2005), wealth (Moffitt et al., 2011), reduced substance abuse risk and incarceration (Moffitt et al., 2011), and physical health (Seeyave et al., 2009; Miller et al., 2011).

The evidence supporting the association between inhibitory control and positive life outcomes has encouraged the development of educational programs aimed at improving these skills in childhood. Several studies observed enhancement of inhibitory control after short-term exercise programs among children who were overweight (Davis et al., 2011), who were diagnosed with attention deficit hyperactivity disorder (Chang et al., 2012), as well as among typically developing, nonoverweight children (Chen et al., 2014). Similar improvements have been reported in children involved in martial arts (Lakes and Hoyt, 2004), mindfulness training (Flook et al., 2010), and classroom-based programs such as Tools of the Mind (Diamond et al., 2007). In addition to behavioral improvements, these interventions produced training-related activation increases in regions of the cognitive control network in both children (Davis et al., 2011; Voss et al., 2011) and adults (Allen et al., 2012; Berkman et al., 2014). Others cite neural activation decreases associated with increased efficiency of inhibitory control after an intervention (Chaddock-Heyman et al., 2013; Nishiguchi et al., 2015).

Recently, there has been increasing interest in the impact of music interventions on developing inhibitory control. Playing a musical instrument requires a variety of functions, including coordination of fine motor skills and different streams of auditory input (Zatorre et al., 2007) and the rapid adjustment of one's motor behavior in response to mistakes (Jentzsch et al., 2014). To meet the technical demands of playing their instrument, musicians must continue to play as they inhibit attention to one hand in order to focus on a different movement in the other hand. Like any practice-dependent activity or skill acquisition, playing a musical instrument requires focused attention and self-discipline, prioritizing practice over other, more instantly gratifying, activities. This is particularly important in a group setting, where there are many social distractors, such as side conversations or others playing a piece incorrectly, that musicians must ignore for the benefit of the performance. For these reasons, musicians have often been cited as a model from which to investigate neuroplasticity (Münte et al., 2002).

In spite of this, the contribution of musical abilities to inhibition in adults is not clear. Some studies have found that professional adult musicians show faster reaction times on a Color Stroop Task than amateur adult musicians (Travis et al., 2011) and untrained individuals (Bialystok and Depape, 2009), and faster reaction times than untrained individuals on a visual Simon task (Bialystok and Depape, 2009). In contrast, Zuk et al. (2014) did not find differences in response times between adult musicians and untrained individuals on a Color Stroop Task, and Slevc et al. (2016) found music training to be unrelated to scores in an Auditory Stroop and Simon Arrows task.

Studies with children are similarly inconclusive. In a Simon task, children aged 10–11 who were musically trained showed reduced reaction time differences between congruent and incongruent trials when compared to children without music training (Joret et al., 2017). When asked to name shapes and arrows in an opposite manner (e.g., 'up' when presented with a 'down' arrow), musically trained children aged 3–9 performed significantly faster than musically untrained children (Saarikivi et al., 2016). In another study, Holochwost et al. (2017) reported improvements in go-no-go task accuracy in 7–13-year-old children participating in an El Sistema orchestral program after 2 years and again after 3 years of training. Children aged 5–7 involved in school music education had greater increases in scores on a go-no-go task over two and a half years when compared to a visual arts and passive control group (Jaschke et al., 2018). Similarly, 4–6-year-old children involved in a music listening program performed better than those in an arts training program on go-no-go accuracy at post-test (Moreno et al., 2011). In an fMRI investigation of cognitive control networks, Zuk et al. (2014) demonstrated that musically trained children aged 9–12 had greater activation in the pre-SMA/SMA and right VLFC during a set shifting task as compared to nonmusician children. Our group recently reported that children aged 8–9 with 2 years of music training, as compared to children without any systematic training, had greater BOLD signal during incongruent trials of a color-word Stroop task in inhibition-related neural regions, including the pre-SMA/SMA, precentral sulcus, ACC and IFG (Sachs et al., 2017). However, no significant differences were observed between musically trained children and children involved in sports training. Additionally, neither our group (Sachs et al., 2017) nor Zuk et al. (2014) observed behavioral differences between groups, and others have reported no significant benefits of music training on inhibition (Schellenberg, 2011), leaving the findings inconclusive.

Several factors possibly contribute to the lack of consistency in these findings, including the absence of an active comparison group in some studies, and a great degree of variation in the music training programs studied. For example, Moreno et al. (2011) used a computerized curriculum with a focus on listening rather than learning to play a musical instrument, which involves sensory-motor learning and participation in a group setting (e.g., Holochwost et al., 2017). Even among studies that agree on the definition of "music training" as instrumental instruction, many differ on the length of training required to label participants as "musicians" or "non-musicians." In the Schellenberg (2011) study, child musicians were described as having at least 2 years of music lessons outside of school, whereas children in the Zuk et al. (2014) study had been playing for an average of 5 years, and Degé and Schwarzer (2011) defined music training as a continuous variable. These differences are of particular importance due to the rapid development of inhibitory control during early to middle childhood, in which 1 or 2 years of regular participation in a music program could provide significant improvements in skill performance. Differences in individual and group music training may also affect comparability between studies, since the social aspect of music making requires more integration of cognitive and social functions than solo playing (Keller, 2008). Playing in an ensemble, where one is required to follow a conductor's command, to attend to other members of the ensemble, and to adjust behaviors in response to other players, could improve inhibition skills at a faster rate than individual musical practice.

Finally, a limitation of current research in the field is the difficulty in recruiting a study sample that reflects society's growing diversity in socio-economic status (SES) and culture. While a few groups (e.g., Holochwost et al., 2017) have investigated underrepresented populations, many studies are conducted with participants from what Henrich et al. (2010) called WEIRD societies (Western, educated, industrialized, rich, and democratic), and may be generalizable only to a narrow portion of society at large. To draw broader conclusions about the impact of music training throughout society, more research involving participants from diverse backgrounds is needed.

In the present study, we investigate the effects of music training on inhibitory control using behavioral and neuroimaging methods, aiming to extend the previously reported findings (Sachs et al., 2017). Over the course of 4 years, we compare children involved in music training with children involved either in sports training or no systematic enrichment training. By conducting this investigation longitudinally, and implementing a pre-training baseline assessment, we attempt to differentiate effects of training from pre-existing biological contributions. To our knowledge, this is the first longitudinal study using neuroimaging to specifically assess inhibitory control in children involved in music training. Additionally, by comparing children involved in music training to children involved in sports training, we assess whether any effects observed in measures of inhibition are related to music training specifically or are associated with any type of extra-curricular activity that is socially engaging and motivating.

We hypothesize: (a) that children involved in music training will show greater improvements on behavioral measurements of inhibition than will children involved in no systematic training and children involved in sports training, evidenced by reduced reaction time and improved accuracy; (b) that, during an fMRI inhibition task, children involved in music training as compared to children with no systematic training will show greater activation of brain regions associated with inhibitory control, including the IFG, SMA/pre-SMA, ACC, and insula, continuing the pattern reported after 2 years of training (Sachs et al., 2017).

## MATERIALS AND METHODS

## Participants

Data for this report were collected as part of an ongoing longitudinal study investigating the effects of music training on child brain, cognitive, and socioemotional development (Habibi et al., 2014). Eighty-eight participants (36 female, mean age = 6.81 years, SD = 0.69) were recruited from community music and sports programs, and public elementary schools in the Greater Los Angeles Area. Participants were from three groups. Twenty-eight children (11 female) who had enrolled and were about to begin participation in the Youth Orchestra of Los Angeles at the Heart of Los Angeles program formed the music group (hereafter called "music group"). Twenty-nine children (12 female) who had enrolled and were about to begin participation in community-based soccer or swimming training formed the first comparison group (hereafter called "sports group"). Thirty-one children (13 female) recruited from public elementary schools in the same Los Angeles area who, at the time of recruitment, were not engaged in any organized and systematic after-school music or sports program formed the second comparison group (hereafter called "control group").

#### The Music Training Program

The Youth Orchestra of Los Angeles at Heart of Los Angeles (YOLA at HOLA) is inspired by the Venezuelan approach known as "El Sistema," offering free group-based music instruction 4–5 days a week to children from underserved communities of Los Angeles. The program emphasizes systematic, high-intensity group music training, focusing on rhythm, melody, harmony, and ensemble practice with a goal of promoting social inclusion. Children (up to 20 per year) are selected by lottery from a list of interested families and, after selection, provided with a violin or viola. The curriculum includes group stringed instrument practice, group singing, Orff, and musicianship (ear training and theory skills), totaling 6–7 h of music instruction per week. Individual practice at home is left to the discretion of the families.

#### The Sports Training Program

The soccer and swimming programs offer free or low-cost training in a community setting to all children whose parents choose to enroll. The soccer program consisted of a 2-h practice three times a week, and a 1-h game each weekend. Soccer practices included warmups, team cheers, skill training (dribbling, passing, etc.), and simulated games. The swimming program consisted of a 1-h practice, two times a week, with additional recreational sessions each weekend. Swimming practices included fitness, endurance, water safety, and stroke development. Both programs were taught by professional coaches.

The sports training group was selected as a comparison group to control for aspects of musical training that would likely be shared by those in a regular, extra-curricular activity such as social engagement, discipline, and sustained effort. Additionally, sports training was chosen due to its sensory motor learning, a component shared with music training. These aspects alone may have beneficial effects on development of both cognitive and social skills, and thus including an active comparison group is essential.

## Exclusion Criteria

Children were excluded from the study if they had a history of psychiatric or neurological disease. At all assessment times, participants' parents were interviewed to ensure that their children had not been diagnosed with a developmental or neurological disorder in the previous year.

During the first visit of each year, children were asked to report any activities they had begun to participate in outside of school. Participants in the control group were excluded from analyses if they had been involved in an extra-curricular music or athletic activity three times a week for at least 6 months. Participants in the sports group were excluded from analyses if they had been involved in a music training program three times a week for at least 6 months. Participants in the music group were excluded from analyses if they had discontinued participation in the music program.

Five time points were used to assess the development of inhibition (**Figure 1**). Children were assessed annually, at approximately the same time each year. Hereafter, "baseline" will be used to refer to the initial assessment, conducted prior to the start of any training for the music and sports groups (the control group received no training), and subsequent testing times will be referred to as "Year 1" through "Year 4."

## Socio-Economic Status

All participants came from equally underprivileged communities in Los Angeles (Habibi et al., 2014). Socioeconomic status (SES) was assessed through parental interviews conducted by research assistants who were native speakers of the parents' preferred language (i.e., English, Spanish, or Korean). Parent interviews included questions ascertaining maternal and paternal education and occupation and annual family household income and size. An SES score was calculated as the mean of each parent's education and annual income. Education level was scored on a 5 point scale: (1) elementary/middle school; (2) high school; (3) college education; (4) master's degree (MA, MS, MBA); (5) professional degree (Ph.D., MD, JD). Annual income was also scored on a 5-point scale: (0) <\$10,000; (1) \$10,000–\$19,999; (2) \$20,000–\$29,000; (3) \$30,000–\$39,999; (4) \$40,000–\$49,999; (5) >\$50,000.

The ethnic distribution included children of Latino, Korean, and African-American backgrounds. 97.7% of participants were bilingual, raised in English–Spanish (93.2%) or English–Korean (4.5%) speaking households while attending English-speaking schools that did not offer systematic music programs for their students.

#### Procedures

Recruitment and induction protocols were approved by the University of Southern California Institutional Review Board. Informed consent was obtained in writing from the parents/guardians in the preferred language on behalf of the child participants, and verbal assent, at each year, was obtained from each child. Either the guardians or the children could end their participation at any time. Participants (parents/guardians) received monetary compensation (\$15 per hour) for their child's participation and children were awarded small prizes (e.g., toys or stickers). To recognize continuous participation, participants were given a \$50 bonus after each year's testing and sent small holiday and birthday gifts. All children were tested individually at the Brain and Creativity Institute at the University of Southern California, or at Heart of Los Angeles in a designated private room.

#### Behavioral Assessment

Cognitive development was assessed using the Wechsler Abbreviated Scale of Intelligence (WASI-II) for children (Wechsler, 2011). Cognitive inhibition was measured with a child-friendly version of the Flanker task (Davidson et al., 2006; Diamond et al., 2007) (**Figure 2**). An Animal Stroop task was used at Year 1 (Wright et al., 2003), and a computerized Color-Word Stroop task was used starting Year 2. Additionally, we measured inhibitory control using a version of the delayed gratification task (Mischel et al., 1989), in which children were presented with rewards of increasing size and could elect either to take a reward immediately or to wait for a larger reward at a future time.

See **Supplementary Material** for a detailed description of each task.

#### Neuroimaging Assessment

Children underwent anatomical, diffusion, and functional MR imaging of their brain. A child-friendly protocol was implemented that included a training session prior to scanning, where children learned about the scanner and familiarized themselves with the environment in a mock scanner. Results from structural and diffusion scans are discussed in a previous report (Habibi et al., 2017).

The fMRI consisted of a modified version of the Color-Word Stroop task, designed for performance inside the fMRI scanner (**Figure 3**), and performed at Years 2 and 4 (see **Supplementary Material** for detail). Children completed two functional runs, of six blocks each. Blocks were divided between word and color congruent, and incongruent conditions. Each block was followed by a 16 s rest period for a total scan time of 240 s (120 TRs).

A 3T MAGNETOM Prisma System was used to acquire high-resolution T1-weighted structural MRI images, using a 20-channel head coil (1 mm × 1 mm × 1 mm resolution over a 256 mm × 256 mm × 256 mm FOV; TI/TE/TR = 850/32.05/2300 ms; flip angle = 8°; GRAPPA acceleration factor R = 2). A gradient echo, echo-planar, T2<sup>∗</sup>-weighted pulse sequence was used to acquire functional images (TR = 2000 ms, one shot per repetition, TE = 25 ms, flip angle = 90°, 64 × 64 in-plane resolution). Forty-one slices covering the entire brain were acquired (3 mm × 3 mm × 3 mm voxel resolution). Each run of the task consisted of 120 volumes.

#### Analysis

Statistical analyses were performed using R (R Core Team, 2018). Only participants who had completed all time-point assessments of a given task were included in the final analysis. Outliers, defined as scores two or more standard deviations above or below the mean, were removed. Not all tasks were performed every year (**Table 1**). When sphericity assumptions were violated, degrees of freedom were corrected using Greenhouse–Geisser epsilon adjustments. The alpha level used in all analyses is 0.05.
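The outlier rule stated above can be illustrated with a minimal sketch. It assumes a single pass over one set of scores and the sample standard deviation (n − 1 denominator); the text does not specify either choice, nor whether the rule was applied per group, per year, or per condition. The function name `remove_outliers` and the reaction times are hypothetical and are not study data.

```python
from statistics import mean, stdev

def remove_outliers(scores, n_sd=2.0):
    """Keep only scores lying less than n_sd sample SDs from the mean."""
    m = mean(scores)
    sd = stdev(scores)  # sample SD (n - 1 denominator); an assumption
    return [x for x in scores if abs(x - m) < n_sd * sd]

# Hypothetical reaction times in ms (illustrative only, not study data)
rts = [610, 595, 640, 620, 605, 630, 615, 1200]
kept = remove_outliers(rts)
print(kept)
```

Because the mean and SD are themselves computed with the extreme value included, a 2 SD rule of this kind is conservative in small samples; only clearly deviant scores are dropped.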

#### Behavioral Analysis

Separate repeated measures ANOVAs were conducted for each task, with group as the between-groups factor and year as the within-groups factor. For tasks that included multiple conditions (e.g., reward size or congruency), condition was included as an additional within-groups factor. For tasks that included multiple variables (e.g., accuracy and reaction time), separate repeated measures ANOVAs were used for each variable.

The animal Stroop task, completed at Year 1, was analyzed with separate one-way ANOVAs for reaction time and errors, each obtained by subtracting congruent trials from incongruent trials. For the color-word Stroop task, participants responded verbally in Years 2 and 3, and through keypresses in Year 4; thus, analyses for Years 2 and 3 were conducted separately from Year 4. For Years 2 and 3, a repeated measures ANOVA was conducted for accuracy and reaction time, with group as the between-groups factor, and year and congruency as within-groups factors. For Year 4, one-way ANOVAs were conducted for accuracy and reaction time. Stroop reaction time during fMRI was analyzed with a repeated measures ANOVA, with congruency and year as the within-subjects factors and group as the between-subjects factor. Post hoc tests for all tasks were computed using Tukey's HSD, and Cohen's d effect size was calculated and reported.
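As an illustration of the effect size mentioned above, the sketch below computes Cohen's d from summary statistics. The text does not state which variant of d was used; the form here, which standardizes by the root mean square of the two standard deviations, is an assumption. The function name `cohens_d` is hypothetical. The inputs are the incongruent-trial flanker accuracy means and SDs reported for the music group in the Results.

```python
from math import sqrt

def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d standardized by the root mean square of the two SDs (assumed variant)."""
    pooled_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd

# Music group, incongruent flanker trials (values as reported in the Results)
d_year4_vs_year2 = cohens_d(0.97, 0.07, 0.83, 0.19)
d_year3_vs_year2 = cohens_d(0.95, 0.08, 0.83, 0.19)
print(round(d_year4_vs_year2, 2), round(d_year3_vs_year2, 2))
```

Under this assumption the two values round to 0.98 and 0.82, matching the effect sizes reported for those comparisons, which suggests (but does not confirm) that a formula of this kind was used.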

#### fMRI Analysis

#### **Whole brain**

TABLE 1 | Task completion per year.

All analyses of functional MR data were completed using the FMRIB Software Library (FSL). Pre-processing included removal of non-brain tissue (BET), motion correction (MCFLIRT), slice-timing correction, spatial smoothing (5.0 mm FWHM Gaussian kernel), and high-pass temporal filtering (140 s). Motion scrubbing was conducted for each functional run to correct for additional head motion, using root-mean-square intensity differences (DVARS) to determine which volumes should be regressed out during GLM analysis (Power et al., 2012). Volumes with DVARS values greater than the 75th percentile + 1.5 × the interquartile range were included in a confound matrix added to the GLM. A fixed effects analysis was used to combine the two functional runs for each participant, and FLIRT was used to register images to a high-resolution structural image and to standard space with 12 DOF and a 2-mm MNI template. The task was modeled with a regressor for each congruency condition using a boxcar convolved with a double-gamma hemodynamic response function. BOLD signals in the task conditions (congruent, incongruent) were contrasted using a general linear model. Models were combined into a mixed-effects analysis, and independent two-sample t-tests were used to determine brain activation differences during these contrasts between groups. A repeated measures ANOVA was used to compare contrasts between groups across years. Age at the time of scan was used as a covariate of non-interest. Statistical inference was completed using Z images and FSL's cluster thresholding, using a cutoff of Z > 2.3, and a cluster size probability of p = 0.05.
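The motion-scrubbing threshold described above (75th percentile plus 1.5 times the interquartile range) can be sketched as follows. This is a minimal illustration under stated assumptions, not a reproduction of the FSL pipeline: the DVARS values are hypothetical, the quartiles use the inclusive interpolation of Python's `statistics.quantiles` (the study's percentile method is not stated), and the helper names `dvars_outlier_volumes` and `confound_matrix` are invented for this example.

```python
from statistics import quantiles

def dvars_outlier_volumes(dvars):
    """Return (indices of flagged volumes, threshold) using Q3 + 1.5 * IQR."""
    q1, _, q3 = quantiles(dvars, n=4, method="inclusive")
    threshold = q3 + 1.5 * (q3 - q1)
    flagged = [i for i, value in enumerate(dvars) if value > threshold]
    return flagged, threshold

def confound_matrix(n_volumes, flagged):
    """Build one column per flagged volume, with a single 1 at that volume."""
    return [[1 if row == vol else 0 for vol in flagged] for row in range(n_volumes)]

# Hypothetical DVARS values for a 10-volume run (illustrative only)
dvars = [10.0, 11.0, 9.5, 10.5, 30.0, 10.2, 9.8, 10.7, 11.1, 10.3]
flagged, threshold = dvars_outlier_volumes(dvars)
matrix = confound_matrix(len(dvars), flagged)
print(flagged, threshold)
```

Each column of the resulting matrix is a regressor that is 1 at one flagged volume and 0 elsewhere, so adding it to the GLM effectively removes that volume's influence on the parameter estimates.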

#### **Region of interest**

Region of interest (ROI) analysis was conducted to assess percent signal change in regions selected a priori due to their known involvement in cognitive control mechanisms (pre-SMA/SMA, ACC, and IFG). 8-voxel spheres were drawn with center coordinates located at the peak voxel, based on significant clusters found in the group-level, all-subject, incongruent > congruent contrast from Year 2 results (Sachs et al., 2017). Percent signal change was calculated from beta values using Featquery in FSL. A repeated measures ANOVA was conducted for each ROI, with year as the within-subjects factor, and group as the between-subjects factor.

## RESULTS

A chi-squared test revealed no significant group differences in sex [χ<sup>2</sup>(2, N = 53) = 2.54, p > 0.05]. Age at baseline was higher in the Control group (M = 7.04 years, SD = 0.49 years) than in the Music group (M = 6.48 years, SD = 0.42 years) [F(2,52) = 5.34, p < 0.01]. There were no differences between the Sport group age (M = 6.69 years, SD = 0.75 years) and any other group (p > 0.05), and there were no differences between groups in age at any other year. The number of included subjects differed between tasks; thus, age at baseline was used as a covariate only in cases in which there was a significant group difference at onset in subjects that completed all years of a given task. There were no differences between the groups in gender, SES, or cognitive abilities at baseline assessment (Habibi et al., 2014) and thus these factors were not included in the analysis.

## Behavioral Results

#### WASI-II

The WASI-II was completed at all five assessment times by 52 participants (Music n = 17, Sport n = 17, Control n = 18). For FSIQ-4, no main effect of group (p > 0.05) or year (p > 0.05) was observed. No year by group interaction effect was found (p > 0.05). For PRI, no main effect of group was observed (p > 0.05). A main effect of year was observed [F(4,196) = 2.65, p < 0.05, η<sup>2</sup> = 0.01], but post hoc comparisons did not reach significance (p > 0.05). No year by group interaction was observed (p > 0.05). No effects of group, year, or year by group interactions were observed in VCI (all p > 0.05), or FSIQ-2 (all p > 0.05).

#### Stroop

The animal Stroop task was completed at Year 1 by 54 participants (Music N = 19, Sport N = 14, Control N = 21).

No significant group differences were observed (p > 0.05). The color-word Stroop task was assessed at Year 2, Year 3, and Year 4, and completed by 43 participants (Music N = 17, Sport N = 12, Control N = 14). For Year 2 and Year 3, a significant main effect of year on accuracy was found [F(1,39) = 11.96, p < 0.01, η<sup>2</sup> = 0.24], where accuracy was greater in Year 3 (M = 0.96, SD = 0.06) than in Year 2 (M = 0.94, SD = 0.08) (p < 0.01, d = 0.40). A significant main effect of condition was also observed [F(1,39) = 36.74, p < 0.001, η<sup>2</sup> = 0.49], where accuracy was greater on the congruent condition (M = 0.99, SD = 0.03) than the incongruent condition (M = 0.92, SD = 0.08) (p < 0.001, d = 1.12). No main effect of group was observed (p > 0.05). No year by group (p > 0.05) or condition by group (p > 0.05) interaction was observed. No significant year by condition by group interaction effect was found (p > 0.05). For reaction time (incongruent minus congruent trials), no main effect of year (p > 0.05) or group (p > 0.05) or year by group interaction (p > 0.05) was observed.

For Year 4, a main effect of condition on accuracy was observed [F(1,40) = 22.27, p < 0.001, η<sup>2</sup> = 0.36], where participants responded with greater accuracy on congruent trials (M = 0.86, SD = 0.15) than on incongruent trials (M = 0.80, SD = 0.16) (p < 0.001, d = 0.37). There was no main effect of group (p > 0.05) and no interaction effect between condition and group (p > 0.05). For reaction time (incongruent minus congruent trials), no main effect of group was found (p > 0.05).

#### Flanker Fish Task

The flanker task was completed by 48 participants at Year 2, Year 3, and Year 4 (Music N = 17, Sport N = 15, Control N = 16). There was a significant main effect of year on accuracy [F(1.17,52.68) = 7.02, p < 0.01, η<sup>2</sup> = 0.14], in which participants improved their accuracy each year. There was a significant main effect of condition [F(1.54,69.45) = 13.58, p < 0.001, η<sup>2</sup> = 0.23], where participants responded more accurately on congruent trials (M = 0.97, SD = 0.12) than incongruent trials (M = 0.91, SD = 0.15) (p < 0.001, d = 0.42) and more accurately on neutral trials (M = 0.96, SD = 0.13) than incongruent trials (p < 0.001, d = 0.30). There was no main effect of group (p > 0.05), and no interaction effect of condition by group (p > 0.05), or year by group (p > 0.05). There was no significant year by condition interaction effect (p > 0.05). There was a significant year by condition by group interaction effect [F(8,180) = 2.061, p < 0.05, η<sup>2</sup> = 0.08]. Post hoc analysis revealed that the condition by year interaction effect was significant in the music group only [F(4,64) = 4.67, p < 0.01, η<sup>2</sup> = 0.06], where during incongruent trials participants responded with higher accuracy at Year 4 (M = 0.97, SD = 0.07) (p < 0.01, d = 0.98) and Year 3 (M = 0.95, SD = 0.08) (p < 0.05, d = 0.82) compared to Year 2 (M = 0.83, SD = 0.19). No such year by condition effect was observed in the sport (p > 0.05) or control groups (p > 0.05) (see **Figure 4**).
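The fractional degrees of freedom above reflect the Greenhouse–Geisser adjustment described in the Analysis section, in which both the effect and error degrees of freedom are multiplied by an estimate of epsilon. The sketch below back-calculates epsilon for the main effect of year from the reported values. It assumes nominal degrees of freedom of (2, 90), derived from 3 years, 48 participants, and 3 groups with no covariate; the function name `greenhouse_geisser_df` is hypothetical, and because the reported values are rounded the result is approximate.

```python
def greenhouse_geisser_df(levels, n_subjects, n_groups, epsilon):
    """Corrected (df_effect, df_error) for a within-subjects factor in a mixed design."""
    df_effect = levels - 1
    df_error = (n_subjects - n_groups) * (levels - 1)
    return epsilon * df_effect, epsilon * df_error

# Flanker accuracy, main effect of year: 3 years, 48 participants, 3 groups
nominal = greenhouse_geisser_df(3, 48, 3, 1.0)        # uncorrected df
epsilon = 1.17 / nominal[0]                           # back-calculated from the reported effect df
corrected = greenhouse_geisser_df(3, 48, 3, epsilon)  # compare with the reported error df
print(nominal, round(epsilon, 3), corrected)
```

The recovered epsilon is roughly 0.585, and the implied error degrees of freedom fall within rounding distance of the reported 52.68, consistent with a strong sphericity violation for the year factor.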

For reaction time, there was a significant main effect of year [F(2,90) = 41.43, p < 0.001, η <sup>2</sup> = 0.48]. A main effect of group approached significance [F(2,45) = 2.98, p = 0.06, η <sup>2</sup> = 0.12], indicating that the Sport group (M = 680.64 ms, SD = 151.02 ms) trended toward longer reaction times than the Music (M = 619.83 ms, SD = 138.76 ms) (p < 0.001, d = 0.42) and Control groups (M = 602.77 ms, SD = 137.37 ms) (p < 0.001, d = 0.54). There was a significant main effect of condition on reaction time [F(1.77,79.47) = 189.40, p < 0.001, η <sup>2</sup> = 0.81]. No year by group interaction effect was observed (p > 0.05). A condition by group interaction effect approached significance [F(4,90) = 2.41, p = 0.06, η <sup>2</sup> = 0.10], indicating a trend that group differences were significant in the incongruent and congruent conditions, and not in the neutral condition. No year by condition by group interaction effect was observed (p > 0.05) (see **Figure 5**).
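The effect sizes for the group comparisons in reaction time can be approximately reproduced from the reported means and standard deviations. The sketch below is a minimal illustration, assuming Cohen's d was computed with an equally weighted pooled standard deviation. The source does not state the exact formula, and the helper name `cohens_d` is hypothetical.

```python
import math


def cohens_d(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Cohen's d using the root mean square of the two SDs as the pooled SD.

    Assumption: both groups are weighted equally when pooling the SDs;
    the source does not specify how d was derived.
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Reported flanker reaction times in ms as (mean, SD)
sport = (680.64, 151.02)
music = (619.83, 138.76)
control = (602.77, 137.37)

print(round(cohens_d(*sport, *music), 2))    # 0.42
print(round(cohens_d(*sport, *control), 2))  # 0.54
```

Both values round to the reported d = 0.42 and d = 0.54, which is consistent with, but does not confirm, this pooling choice.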

#### Delayed Gratification


The delayed gratification task was assessed at Year 3 and Year 4, completed by 59 participants (Music n = 18, Sport n = 21, Control n = 20). For quarters, no main effect of year (p > 0.05) or group (p > 0.05) was observed. There was a significant main effect of reward size [F(1.72,96.2) = 4.88, p < 0.05, η <sup>2</sup> = 0.08], where large rewards (M = 0.89, SD = 0.32) were saved more than small rewards (M = 0.77, SD = 0.42) (p < 0.05, d = 0.32). No year by group (p > 0.05) or reward size by group (p > 0.05) interaction was observed. No year by reward size by group interaction was observed (p > 0.05). Although group differences were non-significant, the Music group consistently saved 100% of quarters in the large condition at both Year 3 and Year 4, while the Sport and Control groups saw a decline in large reward delay from Year 3 to Year 4.

For M&Ms, no main effects of year (p > 0.05) or group (p > 0.05) were observed. A main effect of reward size was observed [F(2,112) = 15.42, p < 0.001, η <sup>2</sup> = 0.22], where large (M = 0.81, SD = 0.39) (p < 0.001, d = 0.62) and medium rewards (M = 0.76, SD = 0.43) (p < 0.01, d = 0.49) were saved more than small rewards (M = 0.54, SD = 0.50). No year by group (p > 0.05) or reward size by group (p > 0.05) interaction was found. No year by reward size interaction effect was observed (p > 0.05). A year by reward size by group interaction was found [F(3.57,99.81) = 2.80, p < 0.05, η <sup>2</sup> = 0.09]. Follow-up analyses indicated that the year by group interaction was significant only in the large reward condition [F(2,56) = 3.49, p < 0.05, η <sup>2</sup> = 0.11], where the Music group (M = 1.00, SD = 0.00) saved more M&Ms than the Control group (M = 0.56, SD = 0.51) (p < 0.01, d = 1.19) at Year 3 (see **Figure 6**). A non-significant trend indicated a stepwise increase in reward delay from small to large rewards that was present only in the Music group, but not the Sport or Control groups, where the music participants saved more large rewards than medium rewards and more medium rewards than small rewards. At Year 4, the group by reward size interaction was no longer significant.
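The reported means and standard deviations for the M&M rewards behave like proportions of a binary outcome. As a hedged illustration, the sketch below assumes each trial was scored 1 (reward saved) or 0 (not saved), which this section does not state explicitly, and uses the population formula sqrt(p(1 - p)). The function name `binary_sd` is hypothetical.

```python
import math


def binary_sd(proportion: float) -> float:
    """Population SD of a 0/1 variable whose mean is `proportion`.

    Assumption: each trial is scored 1 if the reward was saved and 0
    otherwise, so the reported mean is a proportion.
    """
    return math.sqrt(proportion * (1 - proportion))


# Reported M&M means for large, medium, and small rewards
for label, mean in (("large", 0.81), ("medium", 0.76), ("small", 0.54)):
    print(label, round(binary_sd(mean), 2))
```

Under this assumption the three values round to 0.39, 0.43, and 0.50, matching the reported standard deviations. It also explains why a group mean of 1.00 carries SD = 0.00.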

## fMRI Results

The fMRI protocol was completed by 40 participants at Year 2 and Year 4 (Music N = 14, Sport N = 11, Control N = 15).

#### Reaction Time

During the fMRI Stroop task, a main effect of condition on reaction time was observed [F(1,26) = 99.26, p < 0.001, η <sup>2</sup> = 0.22], where participants performed faster at congruent trials (M = 654.72 ms, SD = 129.21 ms) than incongruent trials (M = 810.51 ms, SD = 169.56 ms) across years. There was no main effect of group or year (p > 0.05), and no interaction between group, year, or condition (p > 0.05).

#### Whole Brain

Whole brain analysis of the incongruent > congruent condition between Year 2 and Year 4 revealed no effect of year and no interaction between year and group on BOLD signal. Due to the small number of participants (N = 25) who were scanned in both Year 2 and Year 4, we additionally analyzed Year 4 (N = 40) separately. Whole-brain analysis for the incongruent > congruent condition contrast revealed significant signal differences across groups in the left IFG, bilateral ACC, bilateral insula, and the left anterior intra-parietal sulcus at Year 4 (see **Figure 7**). There were no significant differences in BOLD signal found between groups. Analysis of the incongruent > rest condition revealed significant signal differences across groups in the bilateral premotor cortex, right frontal operculum cortex (Broca's area), bilateral occipital pole, left anterior intra-parietal sulcus, and left angular gyrus (see **Figure 8**). Musicians compared to the control group showed significantly greater activation of the right inferior frontal gyrus in this contrast (see **Figure 9**). No other group differences in BOLD signal were observed at Year 4 (see **Table 2**).

#### Region of Interest

Region of interest analysis of the incongruent > congruent contrast (see **Figure 10**) in the SMA revealed an effect of group that approached significance (p = 0.06), where the music group trended toward greater signal change (M = 0.19%) than the sport group (M = 0.06%) across years. In the left IFG, a year by group interaction effect approached significance (p = 0.09), indicating that the music group (M = 0.25%) trended toward greater signal change than the control group (M = 0.19%) at Year 2, but not at Year 4 (music M = 0.12%, sport M = 0.19%). In the right ACC, a significant main effect of group was observed [F(2,28) = 3.66, p < 0.05, η <sup>2</sup> = 0.07], where the music group (M = 0.18%) had greater percent signal change than the control group (M = 0.06%), but this difference was not significant after correcting for multiple comparisons (p = 0.07, d = 0.61). No other significant main or interaction effects were observed for this contrast (p > 0.05).

The incongruent > rest contrast (see **Figure 11**) revealed a main effect of year in the SMA [F(1,28) = 11.29, p < 0.001, η <sup>2</sup> = 0.19], where participants had greater signal change at Year 4 (M = 0.55%) than Year 2 (M = 0.28%). In the left IFG, a significant main effect of group was observed [F(2,28) = 3.87, p < 0.05, η <sup>2</sup> = 0.10]. This finding indicated the music group (M = 0.54%) had greater percent signal change than the control group (0.34%) but was not significant after correcting for multiple comparisons (p = 0.08, d = 0.59). A main effect of year was also observed [F(1,28) = 10.10, p < 0.01, η <sup>2</sup> = 0.28], indicating that percent signal change was greater at year 2 (M = 0.62%) than year 4 (M = 0.28%). No other significant main or interaction effects were observed for this contrast (p > 0.05).

FIGURE 8 | All-subject, whole brain activation for incongruent > rest contrast of fMRI Color-Word Stroop task at Year 4.

Neither response time nor accuracy of incongruent trials of the Color-Word Stroop task performed outside the scanner, on a separate day, was correlated with Incongruent - Congruent percent signal change in any ROI (p > 0.05).

### DISCUSSION

The present study examined the effects of group-based music training on the development of inhibition skills in children from under-resourced communities. Using a longitudinal design, children involved in music training were compared with active and passive comparison groups (children involved in sports training, and no systematic after-school program, respectively). We assessed changes in children's inhibition skills and their neural correlates over the course of 4 years, using a delayed gratification task, a flanker task, and a Color-Word Stroop task that was performed both inside and outside of an MRI scanner.

We observed gradually improved performance associated with music training in incongruent trials of the flanker task, in which participants were required to inhibit a dominant response. Music participants improved their accuracy significantly in incongruent trials after 3 and 4 years of training, whereas the improvement observed in the sports and control groups did not reach significance. Overall accuracy was not different among groups after 4 years, however. This may be explained by the music group's slightly, though non-significantly, lower accuracy at Year 2 relative to the sports and control groups, which may have left them more to gain from an intervention. While no group differences were observed at any time point, the development observed in the music group may suggest an effect of training unique to music instruction. This is similar to Holochwost et al.'s (2017) finding that children involved in a community-based orchestra showed improvement in go-no-go and flanker task accuracy after 2 and 3 years of training, and to Jaschke et al.'s (2018) finding of improved go-no-go scores in music participants after 2.5 years of training. However, here, the music group's improvement becomes evident slightly later, after 3 years of training; as we did not obtain a baseline measure of this task, we do not know how participants would have performed at an earlier stage of training. Reaction time in the music group was similar to that of the control group, while sports participants showed a non-significant trend of slower response times. This was most apparent at the last year of participation, suggesting an effect related to the length of training on development of inhibition skills. These findings are consistent with previous work citing faster reaction times of child (Joret et al., 2017) and adult (Amer et al., 2013) musicians in comparison to controls in a Simon Arrows Task, where the effect is more pronounced in incongruent trials.
These findings are additionally in line with research indicating adult musicians perform with faster reaction times overall than controls on inhibition measures such as Auditory and Color-Word Stroop tasks (Bialystok and Depape, 2009; Travis et al., 2011; Amer et al., 2013).

In the Delayed Gratification task, we observed an accelerated ability to reject a small, immediate reward in favor of a delayed, large reward in children who had been involved in music training in comparison to children who had not been involved in any training. This finding is not explained by other differences among the groups, as all groups were similar with regard to SES, age, and other cognitive tasks. Discounting of future rewards in favor of smaller, immediate rewards declines with age, where children increasingly make rational reward choices as they get older (Bettinger and Slonim, 2007; Steinberg et al., 2009). While the sport and control groups followed this pattern by delaying large rewards by the last year of assessment, the music group did so earlier and more often, demonstrating an ability to rationally delay rewards that was better than expected for their age group. Music-trained children were also less likely to delay discount small rewards as they got older, while sports-trained and control children were more likely to do so. While no other studies to our knowledge have investigated the effect of music training on delayed gratification skills, it is a possibility that attention to detail and orientation to subtle musical cues had a role in extending inhibition to smaller rewards.

Coordinates represent the peak voxel of the cluster in MNI space.

Group differences in the delayed gratification and flanker fish tasks, albeit not always statistically significant, indicate a trend toward enhanced inhibition skills in participants involved in music training and are consistent with other longitudinal findings citing overall greater inhibitory control improvements in musically trained children (Moreno et al., 2011; Holochwost et al., 2017; Jaschke et al., 2018). Musicians' advantages in inhibition skills may relate to paralleled processes in disciplined instrumental practice; learning to play a musical instrument involves frequent stopping to correct mistakes and rehearsing small passages in isolation, which both delays the reward of playing a piece in its entirety and requires one to put aside immediate non-musical distractors for the larger reward of musical proficiency. These skills may additionally relate to musicians' regular practice of error monitoring, where children playing music must quickly adjust motor behavior in response to unanticipated musical demands and mistakes.

Results from the fMRI Stroop task support the trend observed in the Delayed Gratification task, where children involved in music training demonstrated an accelerated maturity that was not achieved until a later age by the Sports and Control groups. We previously reported that, at Year 2 (after 2 years of music training), music participants as compared to controls showed greater difference in BOLD signal in the cognitive control network in the incongruent versus congruent blocks, specifically in the IFG, pre-SMA/SMA, ACC, precentral gyrus, and insula (Sachs et al., 2017). In the current analysis, after 4 years, we found no differences between groups in this contrast, suggesting that the music group had matured on the task at an earlier age than the control group. In the incongruent > rest contrast, the music group showed greater BOLD signal in the right IFG, a finding not observed at Year 2. While the color-word Stroop paradigm has been associated with only the left IFG (Taylor et al., 1997; Fan et al., 2003; Bernal and Altman, 2009), some studies have reported bilateral activation (Peterson et al., 1999; Banich et al., 2000; Adleman et al., 2002), often driven by differences in design (event-related vs. block; Leung et al., 2000). The right IFG has instead been linked to the stop-signal task and, specifically, cue detection and attentional monitoring, as opposed to the initiation of a motor response (Aron et al., 2003; Hampshire et al., 2010; Sharp et al., 2010). Given the conflicting literature on the right IFG implicated in the Stroop task, and the relatively small sample size here, we do not draw strong conclusions from this finding. Yet, we note that this may indicate improved inhibitory processing associated with length of music training, where music participants are more strongly marking incongruent stimuli as cues to stop a prepotent response.

Our ROI analysis similarly indicated that significant group differences in percent signal change, when present, occurred at Year 2 but not at Year 4. Such differences at Year 4 were present only as non-significant trends, or as significant effects that did not survive post hoc analysis. At Year 2, we found that the music group as compared to both control groups had significantly greater percent signal change in the left IFG, pre-SMA/SMA, and right ACC (Sachs et al., 2017). Here, with a notably smaller sample, we find the group by year interaction term approached significance only for the left IFG. This finding is notable due to the left IFG's role in inhibiting prepotent motor responses (Swick et al., 2008), which is a skill that is highly practiced by musicians when monitoring their performance. However, the reduced number of participants from Year 2 to 4 may have contributed to the statistically non-significant trends.

It should be noted that several studies report increased engagement of the cognitive control network related to decreased efficiency in inhibition tasks (Tamm et al., 2002; Luna et al., 2010). Others have additionally reported activation decreases in these regions after interventions aimed at improving inhibition skills (Chaddock-Heyman et al., 2013; Nishiguchi et al., 2015). We acknowledge that, without a measure of accuracy inside the scanner, we cannot definitively interpret our BOLD signal findings as indicative of better inhibitory processing. Neural activation observed in the music group at Year 2 thus may have been due to greater cognitive effort required to complete the task, and, by Year 4, all groups may have been engaging the task with similar levels of effort. However, as accuracy in the Stroop task outside of the scanner trended toward a positive correlation with percent signal change in the pre-SMA/SMA and left IFG at Year 2 (Sachs et al., 2017), our interpretation of these findings aligns more with improved inhibitory control processing.

We note an important and possibly confounding factor in this study, the high percentage of bilingualism among participants: 96% of participants retained through Year 4 were Spanish-English bilingual (100% in the music group, 86.67% in the sports group, and 100% in the control group). Although formal bilingualism measures were not obtained, interview data at baseline indicated that all bilingual participants were fluent in both Spanish and English, primarily speaking Spanish at home and English in the classroom. All bilingual participants indicated that they watched television, read books, and listened to music in both English and Spanish. At baseline, 66.23% of participants were enrolled in English as a Second Language classes at their school.

Bilingual individuals have demonstrated advanced inhibitory control skills, as evidenced by performance on the flanker task (Costa et al., 2008; Yang et al., 2011; Barac et al., 2016), stop-signal task (Bialystok et al., 2005; Colzato et al., 2008), Stroop task (Bialystok et al., 2008; Hernández et al., 2010), and Simon task (Martin-Rhee and Bialystok, 2008) when compared to monolingual counterparts, despite lower verbal abilities. These differences have been reported to emerge as early as 7 months of age (Kovács and Mehler, 2009), indicating that bilingualism produces enhancements of inhibitory skills before the onset of speech production. Explanations of these findings cite evidence indicating that bilingual individuals activate both languages simultaneously, regardless of context (Hernandez et al., 1996; Costa et al., 1999; Kroll et al., 2008). It has thus been proposed that individuals must constantly suppress the context-dependent irrelevant language in order to communicate effectively, leading to an adaptation in neural attentional networks (Bialystok et al., 2009; Green and Abutalebi, 2013). Studies investigating functional brain organization indicate that bilinguals recruit the frontal regions typically associated with executive control, such as the DLPFC, when switching between languages (Hernandez et al., 2000; Hernandez, 2009; Luk et al., 2011). Despite these findings, a recent meta-analysis reported no evidence for a bilingual advantage in executive functioning in adults (Lehtonen et al., 2018), yet did not investigate studies involving children.

Music training and bilingualism may share similar mechanisms in relation to developing inhibitory control skills. While bilingualism and music playing both require the suppression of irrelevant stimuli, additional similarities exist at a neural level. Musicians, in comparison to musically untrained individuals, activate the DLPFC when passively listening to music (Ohnishi et al., 2001), as bilinguals do when switching between languages (e.g., Hernandez et al., 2000). Koelsch et al. (2005) observed that, when listening to irregular versus regular musical chords, adults and children engage frontal regions, including the inferior frontal gyrus, and that this activity was positively correlated with music training in both age groups. Thus, both musicians and bilinguals appear to recruit regions associated with executive control when engaging in their respective domains, suggesting that intensive experience in either activity would contribute to enhanced inhibition skills.

Given the above evidence, and the fact that almost all of the participants in this study are bilingual, one possible explanation for discrepancies between our results and what we expected to be the effect of music training on inhibition is the contribution of bilingualism. While music training may have an effect, it is likely that multiple factors, including bilingualism, benefit inhibitory control skills. In studies comparing the effects of music training and bilingualism, bilinguals have been reported to perform similarly to (Bialystok and Depape, 2009; Schroeder et al., 2016) or better than (Janus et al., 2016) musically trained individuals on measures of inhibition. Other research has indicated no interaction between music training and bilingualism on measures of inhibition, suggesting that both factors may independently contribute to the development of inhibitory control, but do not produce additional combined effects (Moradzadeh et al., 2015; Schroeder et al., 2016). A proposed explanation for the lack of additive benefits of music training and bilingualism in these studies was that each factor alone produced ceiling effects, restricting further benefit of the combination of experiences. Here, we observe differences, while not always significant, between groups in which nearly all participants are bilingual, suggesting that music training and bilingualism may have a small additive effect early in development. We do, however, acknowledge that, if the findings of Lehtonen et al. (2018) generalize to children, the possible explanation related to bilingualism may not be relevant to our findings.

A limitation of the present study is the inevitable decrease of sample size from baseline to Year 4. While we took preventive measures to reduce attrition, it is typical for longitudinal studies, in particular with a population from underrepresented communities, to experience a reduction in participant number over time. However, attrition was similar in all groups. Additionally, as this is not a randomized controlled trial, it is possible that our participants had pre-existing differences for which we could not account, such as home environment and parental motivation between groups. A strength of this study is that participants were assessed at baseline and demonstrated no differences between groups in measures of musical pitch and rhythm discrimination (Ilari et al., 2016); it is nonetheless possible that children with inherent musical motivation persisted in the music training program. However, the absence of group differences at baseline in other key variables strongly suggests that any observed enhancements of inhibitory control are due to the intervention in contrast to pre-existing factors.

Despite these limitations, we provide evidence that 4 years of group-based music training leads to modest positive effects on inhibition skills and their neural correlates in children from underserved communities, evidenced by a greater rate of accuracy improvement on a Flanker task and increased rational decisions on a Delayed Gratification task. The absence of consistent group differences across all inhibition measures points to other possible explanations, such as bilingualism. When present, advantages associated with music training manifest early, as musicians demonstrate accelerated development of inhibition, but later diminish as non-musically trained children catch up in these skills.

#### DATA AVAILABILITY STATEMENT

The datasets generated for this study are available on request to the corresponding author.

#### ETHICS STATEMENT

The studies involving human participants were reviewed and approved by University of Southern California Institutional Review Board. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.


#### AUTHOR CONTRIBUTIONS

AH and BI: conceptualization and supervision. SH, BI, and AH: data curation. SH and AH: behavioral analysis and wrote the manuscript. SH and MS: fMRI analysis and visualization. AH: funding acquisition, investigation, resources, and software. SH: project administration.

#### FUNDING

This project was supported by the University of Southern California's Brain and Creativity Institute Research Fund.

#### ACKNOWLEDGMENTS

We thank all participating families and children. We additionally thank our team at the Brain and Creativity Institute who have helped make this project possible, including Priscilla Perez, Alison Wood, and Katrina Heine. We also acknowledge the Los Angeles Philharmonic, Youth Orchestra of Los Angeles, Heart of Los Angeles, Brotherhood Crusade's Soccer for Success Program, Vermont Avenue Elementary School, Saint Vincent School, and MacArthur Park Recreation Center for their ongoing support.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnins.2019.01080/full#supplementary-material





**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Hennessy, Sachs, Ilari and Habibi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Selectively Enhanced Development of Working Memory in Musically Trained Children and Adolescents

Katri Annukka Saarikivi<sup>1,2</sup>\*, Minna Huotilainen<sup>2,3</sup>, Mari Tervaniemi<sup>2,4</sup> and Vesa Putkinen<sup>2,5</sup>

<sup>1</sup>Cognitive Brain Research Unit, University of Helsinki, Helsinki, Finland, <sup>2</sup>Department of Psychology and Logopedics, Medicum, Faculty of Medicine, University of Helsinki, Helsinki, Finland, <sup>3</sup>Faculty of Educational Sciences, University of Helsinki, Helsinki, Finland, <sup>4</sup>CICERO Learning, Faculty of Educational Sciences, University of Helsinki, Helsinki, Finland, <sup>5</sup>Turku PET Centre, Turku, Finland

In the current longitudinal study, we investigated the development of working memory in musically trained and nontrained children and adolescents, aged 9–20. We measured working memory with the Digit Span (DS) forwards and backwards tests (N = 106) and the Trail-Making A and B (TMT-A and B; N = 104) tests three times, in 2011, 2013, and 2016. We expected that musically trained participants would outperform peers with no musical training. Indeed, we found that the younger musically trained participants, in particular, outperformed their nontrained peers in the TMT-A, TMT-B and DS forwards tests. These tests all primarily require active maintenance of a rule in memory or immediate recall. In contrast, we found no group differences in the backwards test that requires manipulation and updating of information in working memory. These results suggest that musical training is more strongly associated with heightened working memory capacity and maintenance than enhanced working memory updating, especially in late childhood and early adolescence.

#### Edited by:

Paul J. Colombo, Tulane University, United States

#### Reviewed by:

Fabiana Silva Ribeiro, Portuguese Catholic University, Portugal

Lutz Jäncke, University of Zurich, Switzerland

\*Correspondence:

Katri Annukka Saarikivi katri.saarikivi@helsinki.fi

Received: 17 June 2019 Accepted: 07 October 2019 Published: 06 November 2019

#### Citation:

Saarikivi KA, Huotilainen M, Tervaniemi M and Putkinen V (2019) Selectively Enhanced Development of Working Memory in Musically Trained Children and Adolescents. Front. Integr. Neurosci. 13:62. doi: 10.3389/fnint.2019.00062

Keywords: musical training, longitudinal, working memory, updating, maintenance, development, trail-making test, Digit Span

## INTRODUCTION

Musically trained individuals have been reported to outperform musically nontrained peers in various kinds of cognitive tests not directly related to music-making, including ones measuring long-term verbal and visual memory (Chan et al., 1998; Ho et al., 2003), executive functions (Bialystok and Depape, 2009; Degé et al., 2011; Moreno et al., 2011; Zuk et al., 2014; Saarikivi et al., 2016; however, see Schellenberg, 2011), and even intelligence (Schellenberg, 2004, 2006; Moreno et al., 2011). Executive functions (Stuss and Alexander, 2000; Jurado and Rosselli, 2007; Diamond, 2013) are cognitive processes typically divided into three related components: working memory, inhibition, and cognitive flexibility (Miyake et al., 2000; Lehto et al., 2003; Miyake and Friedman, 2012; Diamond, 2013). These three processes allow individuals to acquire, maintain, manipulate, and update representations of information of the environment, and monitor, direct and alter behavior according to these representations. Multicomponent models of working memory propose subprocesses for maintaining representations of information in memory and for manipulating this information. For instance, the influential model of Baddeley and Hitch (1974) divided working memory into two components for storage and manipulation of verbal and visual material, and a cognitive control unit (for other models, see e.g., Cowan, 1988, 1999; Unsworth and Engle, 2007). Neuroimaging and lesion studies have found separate neural functions for memory representations and attention processes that govern manipulation of this information, supporting these modular views of working memory (Postle et al., 1999; Gerton et al., 2004; Owen et al., 2005; reviews: D'Esposito et al., 1995; Miller and Cohen, 2001; Linden, 2007; Nee et al., 2012; Rottschy et al., 2012; Eriksson et al., 2015; for a discussion on differences between short-term memory and working memory, see e.g., Unsworth and Engle, 2007; Cowan, 2008; Aben et al., 2012). 
Multicomponent models of working memory have been validated in child studies (Gathercole et al., 2004; Gray et al., 2017), and separate brain mechanisms for encoding, maintenance and retrieval of verbal information have also been found in neuroimaging studies of children, from the age 6 onwards, and in adolescents (Gathercole et al., 2004; Siffredi et al., 2017).

Working memory and other executive functions develop from early childhood until adolescence (Cepeda et al., 2001; De Luca et al., 2003; Vuontela et al., 2003; Luna et al., 2004; Zelazo et al., 2004; Huizinga et al., 2006), following the maturation of prefrontal areas (Casey et al., 2000; Fuster, 2002; Kwon et al., 2002; Bunge and Wright, 2007; Kharitonova et al., 2013). Different executive functions, however, mature at slightly different rates. Development of shifting ability, which is related to cognitive flexibility, has been found to continue until adolescence (Huizinga et al., 2006; Best and Miller, 2010; Huizinga and van der Molen, 2011), and development of working memory even further, until early adulthood (Kwon et al., 2002; Huizinga et al., 2006; Satterthwaite et al., 2013).

Several cross-sectional studies have reported varying musician advantage in working memory tasks. For example, in a study by George and Coch (2011), years of musical training correlated positively with scores in both verbal and visual span tests for memory in college-aged individuals. Similarly, in another study (Talamini et al., 2016), musically trained adults outperformed nontrained peers in auditory as well as visual span tests for working memory. Finally, Zuk et al. (2014) found better performance in the Digit Span backwards test in adult musicians compared to nonmusicians, but not in musically trained children compared to nontrained peers.

Longitudinal studies with children suggest that the putative musician advantage in memory tasks may be caused by training and does not solely reflect pre-existing differences (for a discussion on problems of inferring causation from these kinds of studies, see Schellenberg, 2015). In the study by Ho et al. (2003), verbal long-term memory improved in children who continued musical training during a year-long follow-up, but not in those who did not. Similarly, in a study following the development of musically trained and nontrained children (Bergman Nutley et al., 2014), musical training was associated with improvement of verbal working memory as measured by the backwards Digit Span test, but also visual working memory as measured by a visuo-spatial working memory task. Another longitudinal study (Fujioka et al., 2006), comparing the development of children who undertook music lessons for 1 year to the development of children in a Control group, found significant improvement of working memory as measured by the Digit Span test only in the Music group. In another study (Roden et al., 2014), improvement of working memory was observed in preschool-aged children after 18 months of musical training, but not in an active Control group. In the study, effects were found specifically in tests measuring the phonological loop and the central executive subcomponents of working memory. The phonological loop was measured with the One Syllable Word Span Test, requiring participants to memorize and recite a sequence of words in the order they were presented, and the Nonword recall test, requiring participants to recite a nonword immediately after hearing it. The central executive was measured by complex span tasks requiring processing and storing information at the same time or requiring reversal of the order of a memorized sequence of information units. 
Last, in a recent study with quasi-random assignment of children into musical training and Control groups (Guo et al., 2018), it was found that 6 weeks of musical training improved working memory. Auditory working memory was assessed with the Digit Span forward and backward tests and with the Letter-Number Sequencing test. Both require working memory maintenance of aurally acquired information and updating and manipulation of that information in memory.

The notion that musical training might influence memory skills is further supported by findings of training-related changes in brain structures important for working memory. In their seminal study on structural differences, Gaser and Schlaug (2003) found that musicians had greater gray matter density in areas important for motor and auditory processes, as well as in a region of the cerebellum connected to working memory (Stoodley et al., 2012). Similarly, in a study by James et al. (2014), musical training correlated positively with gray matter density in cerebellar areas and basal ganglia important for working memory. Another MRI study found increased thickness of frontal areas related to working memory in musicians compared to nonmusicians (Bermudez et al., 2008).

Musical training has also been connected to changes in brain functions related to working memory. In the study by George and Coch (2011), musicians had shorter latencies of electrical brain responses (P3) to changes in visual as well as auditory stimuli, as well as larger P3 amplitudes to tonal changes. The P3 response is thought to reflect updating of working memory. In a recent study (Cheung et al., 2017), musically trained individuals outperformed nontrained peers in tasks for verbal memory and also differed in electrical brain activity measured during a verbal memory task. Specifically, musically trained individuals showed more intrahemispheric coherence in the theta band. In an fMRI study (Pallesen et al., 2010), musicians showed greater activation of brain networks for attention and working memory, including frontal, parietal, and subcortical areas than nonmusicians. In another study (Schulze et al., 2011), musicians showed different patterns of activation of brain areas during memory encoding and rehearsal of structured and unstructured tonal sequences. Nonmusicians did not show differences in activation patterns during these tasks. Musicians also outperformed nonmusicians in learning the tonal sequences.

A recent meta-analysis on cross-sectional and longitudinal studies on music-related enhancement of working memory in children and adults (Talamini et al., 2017) concluded that musicians and musically trained individuals have a clear advantage across different memory tasks when compared to nontrained peers. However, according to the results, the effect size depended on the type of information that was processed and on the memory processes required by the task. In general, the musician advantage was stronger for working memory than long-term memory, and for auditory rather than visual stimuli.

Another recent meta-analysis (Sala and Gobet, 2017a) found only a weak musician advantage in memory tasks. In this meta-analysis, however, no distinction was made between working memory and long-term memory, which may have obscured the effects of musical training on working memory reported in several cross-sectional and longitudinal studies.

In sum, there is evidence of an association between musical training and specifically verbal working memory. However, longitudinal studies have focused on school- or preschool-aged children even though executive functions are known to develop long into adolescence. As a result, it is still unclear how musical training specifically augments the development of working memory, and for how long into adulthood the possible advantage persists.

In this longitudinal study, we compared the working memory skills of 114 musically trained and nontrained children and adolescents aged 9–20. During this age range, executive functions including working memory undergo significant development, owing to the protracted development of brain areas such as the frontal lobes that are important for these skills, but also begin to reach maturity (Taylor et al., 2013, 2015). The sample allows for investigating the effects of musical training on working memory, and the persistence of these effects during a developmentally highly interesting window of time. The study aims at answering questions that remain unresolved in research examining the effects musical training may have on cognitive development: does musical training augment the development of working memory, does it produce an advantage in working memory tasks, and does this advantage persist into adulthood?

To investigate working memory, we employed two broadly studied and well-established tests: the Digit Span forwards and backwards tests and the Trail-Making Test (TMT) A and B. Data on performance in the Digit Span tests were collected during a 3-year follow-up and in the TMT-A and B during a 2-year follow-up. Based on previous literature, we expected musically trained children and adolescents to perform better than nontrained peers in all tests.

#### MATERIALS AND METHODS

#### Participants

Altogether 106 children and adolescents aged 9–20 years participated in the study (**Tables 1**, **2**). The musically trained participants (N = 54, 32 females) had started training on a musical instrument at approximately age 7. They had attended or were currently attending a public elementary school that emphasizes music in the curriculum. In addition to weekly classes in classical instrumental training, school days contained music lessons such as choir and ensemble training and performances. Thus, at age 9, participants had a total of approximately 2 years of musical training and participation in the musical curriculum, at age 11, 4 years and so on. The nontrained participants (N = 52, 26 females) had no formal training on a musical instrument. They attended or had attended a standard elementary school with weekly group-based music lessons until the age of 13, but no instrumental tuition. No children reported hearing deficits or neurological impairments. The Music and Control groups were matched in SES and IQ (Putkinen et al., 2014).

Written informed consent for participation was obtained before the experiment from the guardians of underage participants or, for participants over 18 years old, from the participants themselves. All participants also gave verbal consent for their participation. Participants were rewarded with three movie tickets for taking part in the measurement. The experiment protocol was approved by the Ethical Committees of the Department of Psychology and of the Faculty of Behavioural Sciences, both at the University of Helsinki, Finland.

#### Working Memory Tests

The Digit Span forwards and backwards (DS forwards, DS backwards) tests (WISC-IV, Wechsler, 2010) as well as the Trail-Making Test A and B (TMT-A, TMT-B; Poutiainen et al., 2010) were used to measure verbal working memory. In the DS forwards test, participants are aurally presented with a series of digits and immediately recite them from memory. In the DS backwards test, participants are required to recite the presented digits in reverse order. There are multiple presentation rounds, with the experimenter adding one to-be-memorized digit in each round. The forwards test requires active maintenance of information in mind, and the backwards test additionally requires manipulation of this information. Performance is evaluated by the total number of digits that the participant is able to recite correctly.
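
The difference between the two response rules can be made concrete with a minimal sketch. The function names, the example responses, and the scoring rule (span taken as the longest sequence recited correctly) are illustrative assumptions, not the WISC-IV administration or scoring procedure.

```python
def expected_response(digits, backwards=False):
    """Return the digit sequence a participant should recite."""
    return list(reversed(digits)) if backwards else list(digits)


def span(trials, backwards=False):
    """Longest presented sequence recited correctly (hypothetical scoring rule).

    `trials` is a list of (presented, recited) pairs of digit lists.
    """
    correct_lengths = [
        len(presented)
        for presented, recited in trials
        if list(recited) == expected_response(presented, backwards)
    ]
    return max(correct_lengths, default=0)


# Hypothetical responses: the sequence grows by one digit per round.
forward_trials = [
    ([3, 8], [3, 8]),
    ([5, 1, 7], [5, 1, 7]),
    ([9, 2, 6, 4], [9, 6, 2, 4]),  # order error on the four-digit round
]
print(span(forward_trials))  # 3
```

The same `span` function scores the backwards test when called with `backwards=True`, which makes explicit that the only difference between the two tests is the reordering step performed in memory.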

The TMT-A requires the participant to connect digits printed randomly on paper by drawing a line from number to number in a sequential order. The Trail-Making Test B requires participants to alternate between connecting numbers and letters printed on the paper in order (1-A-2-B-3-C. . .). Both tests require maintenance of the rule of the task in mind and also maintaining awareness of where one is progressing on the sequence of digits (A) and both digits and letters (B). Performance is measured by the time taken to complete the test.

#### Procedure

This study is part of a longitudinal study that started in 2003 investigating the maturation of auditory processes and executive functions in children undergoing musical training and a control group. The study also entailed EEG measurements and other tests for various cognitive skills (these data are reported elsewhere). Measurements were conducted every 2 years, with a new group of 7-year-olds recruited every 2 years. The data, therefore, contain measurements from the same participants but from different years. The data reported here include measurements conducted in the years 2011, 2013, and 2016 for the DS forwards and backwards tests and in the years 2011 and 2013 for the TMT-A and B (the TMT was not conducted in 2016). Not all children took part in every measurement. For the DS tests, 25 children participated in only one measurement, 43 in two, and 38 in all three measurements. For the TMT-A and B, 43 children participated in one measurement and 61 in both.

The cognitive tests were conducted before the EEG experiments and took a maximum of 1 h altogether. Upon arrival at the laboratory, written informed consent as well as oral consent was received from the participants. After this, the participants accompanied the experimenter to a room to complete the tests. Experimenters were graduate students, trained to work with children and adolescents and to administer the tests. The space was a comfortably lit sound-proofed room, previously used as an EEG lab, converted for testing use. The experimenter and the subject were orthogonally seated at a table. After the tests were completed, the subject was escorted to the EEG lab, where the EEG cap was attached, and the subject informed more closely about the EEG experiment. EEG measurement ensued. Participants were offered bathroom breaks when needed, and cookies and juice before the EEG measurement as well as half way through it.

## Statistical Analysis

Completion times in the TMT-A and B, and span (number of correctly recited digits) in DS forwards and backwards were included separately in analyses of test performance. The effect of age and group membership on test performance was modeled with linear mixed modeling using the lmer function [Test Score ~ Age * Group + (1 | Subject)] of the lme4 package in R (Bates, 2005; Bates et al., 2007). Age was mean-centered so that a significant effect of Group indicates a group difference in the test score at the average age of the participants (mean ages for the DS and TMT data were 14.39 and 13.44 years, respectively). Linear mixed modeling was selected as the analysis approach since it allows a different number of data points across subjects and takes into account the correlated nature of the data within a subject. Values below Q1 − 1.5 × IQR (interquartile range) or above Q3 + 1.5 × IQR were classified as outliers and replaced by the lower or upper cutoff value of this range, respectively. This procedure was applied twice for the DS backwards and TMT-A data and five times for the TMT-B data.
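
The outlier rule described above can be sketched as follows. This is a minimal illustration in Python rather than the R code used for the analysis; the function name, the `passes` parameter, the example data, and the quartile definition (the text does not state which one was used) are all assumptions.

```python
import statistics


def clip_outliers(values, passes=1):
    """Replace values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] with the nearest cutoff.

    Quartiles come from statistics.quantiles with the 'inclusive' method;
    the quartile definition used in the original analysis is not stated,
    so this choice is an assumption.
    """
    clipped = list(values)
    for _ in range(passes):
        q1, _median, q3 = statistics.quantiles(clipped, n=4, method="inclusive")
        iqr = q3 - q1
        lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        clipped = [min(max(v, lower), upper) for v in clipped]
    return clipped


# Hypothetical completion times with one extreme value.
times = [20, 22, 23, 25, 26, 28, 30, 90]
print(clip_outliers(times))  # the value 90 is pulled in to the upper cutoff, 37.125
```

Replacing extreme values with the cutoff, rather than deleting them, keeps every measurement in the data set, so the number of observations per participant entering the mixed model is unchanged. The `passes` argument mirrors the repeated application mentioned in the text.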

## RESULTS

Performance of participants in all tests except for the DS backwards test improved with age (**Figures 1**, **2**). Musically trained participants outperformed nontrained participants in the DS forwards test, but not in the backwards test. The musically trained individuals also outperformed nontrained peers in the TMT-A. However, the group difference depended on age. The difference between performance in the Music and Control groups decreased with age. A similar age-dependent effect was also found for performance in Trail-Making Test B. The results are described in more detail below.

## Digit Span Forwards and Backwards

Performance in the DS forwards test improved with age (estimated increase in span per year: 0.22, p < 0.001). The Music group outperformed the Control group (estimate for the Music > Control difference in span: 0.56, p < 0.05). The model revealed no evidence that this group difference was dependent on age (Group × Age interaction, ns).

For the DS backwards test, there were no significant effects of Age or Group and no significant interaction between these predictors.

## Trail-Making A and B Tests

Subjects' performance in the TMT-A improved with age (estimate for the decrease in completion time per year: −2.87, p < 0.001). There was a trend-level effect of group suggesting that the Music group outperformed the Control group in this test (estimate for the Music < Control difference: −2.50, p < 0.07). However, there was also a significant interaction between Age and Group indicating that the group difference was more pronounced in the younger children and decreased with age (estimate for the Music < Control differences in the change in completion time per year: 1.11, p < 0.05).

The performance in the Trail-Making Test B also improved with age (Estimate for the decrease in completion time per year: −9.25, p < 0.001). For this test, there was a significant effect of group indicating that the Music group outperformed the Control group (estimate for the Music < Control difference: −12.06, p < 0.05) as well as a significant interaction between Age and Group indicating that this group difference decreased with age (estimate for the Music < Control differences in the change in completion time per year: 4.69, p < 0.05).
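
Because Age was mean-centered, the Group estimate is the model-implied group difference at the mean age, and the Age × Group estimate is the change in that difference per year. The sketch below works through this arithmetic with the TMT estimates reported above. It assumes that the Group estimate is coded as Music minus Control and that the linear trend holds across the sampled age range. The function names are illustrative, and the computed values are a reading of the reported coefficients, not additional results.

```python
MEAN_AGE_TMT = 13.44  # mean age used for centering in the TMT models


def group_difference(age, group_est, interaction_est, mean_age=MEAN_AGE_TMT):
    """Model-implied Music minus Control difference in completion time at `age`."""
    return group_est + interaction_est * (age - mean_age)


def age_of_zero_difference(group_est, interaction_est, mean_age=MEAN_AGE_TMT):
    """Age at which the linear model implies no group difference."""
    return mean_age - group_est / interaction_est


# Reported fixed-effect estimates (Group, Age x Group) for each test.
estimates = {"TMT-A": (-2.50, 1.11), "TMT-B": (-12.06, 4.69)}

for name, (group_est, interaction_est) in estimates.items():
    diff_at_10 = group_difference(10, group_est, interaction_est)
    crossover = age_of_zero_difference(group_est, interaction_est)
    print(name, round(diff_at_10, 2), round(crossover, 1))
```

Under these assumptions, the implied Music group advantage is larger at age 10 than at the mean age and shrinks to zero at roughly 16 years for both tests, which is one way to read the statement that the group difference decreased with age.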

## DISCUSSION

In this study, we investigated the development of working memory in musically trained and nontrained children and adolescents. Musically trained participants outperformed nontrained peers in the DS forwards test as well as the Trail-Making A and B tests. Furthermore, the group difference in the two latter tests decreased with age. We did not find a significant difference between the Music and the Control groups in the DS backwards test.

## A Musician Advantage in the DS Forwards and Trail-Making A and B Tests

The better performance of the Music group in the DS forwards test concurs with previous research showing a musician advantage in tests for memory (Chan et al., 1998; Fujioka et al., 2006; George and Coch, 2011; Bergman Nutley et al., 2014; Roden et al., 2014; Zuk et al., 2014; Talamini et al., 2016; Guo et al., 2018; review: Talamini et al., 2017). It is noteworthy, however, that studies reporting memory enhancement in musicians have conceptual and methodological differences. In the study by Cheung et al. (2017), verbal memory enhancement was reported based on performance in a task requiring immediate as well as delayed recall of a word list, i.e., working memory as well as long-term memory. In contrast, the current study focused on working memory and employed the classical DS measure. Therefore, this study adds to the evidence for enhanced working memory in musically trained children along with earlier longitudinal studies that have used similar span tests (Fujioka et al., 2006; Bergman Nutley et al., 2014; Guo et al., 2018).

FIGURE 1 | Performance of participants in the Digit Span forwards and backwards, and the Trail-Making A and B tests across all age groups. Music and Control groups represented with different colors.

TABLE 1 | Ages of participants in the Music and Control groups per measurement year for the Digit Span Test.

TABLE 2 | Ages of participants in the Music and Control groups per measurement year for the Trail-Making Test.

Interestingly, we found enhanced performance only in the DS forwards and not in the DS backwards test. Similar results were obtained by Hansen et al. (2013), who found that musical training was associated with better performance in the DS forwards but not the DS backwards test. Furthermore, in their study, DS forwards performance was connected to performance in musical ability tests. Along the same lines, Lee et al. (2007) found that musically trained adults outperformed nontrained peers in DS forwards but not DS backwards. In their study, however, musically trained children, aged 12 on average, outperformed nontrained peers in both the DS forwards and DS backwards tests. Guo et al. (2018), in turn, found enhancement of DS backwards but not DS forwards performance after a short-term instrumental training program. Likewise, Bergman Nutley et al. (2014) found enhancement of DS backwards performance in musically trained adults and children, but unfortunately they did not include DS forwards to allow for comparison. Thus, the literature is mixed as to whether musical training is associated with enhancement of DS forwards, DS backwards, or both.

In any case, the current study found longitudinal evidence in a large sample of subjects in favor of selective enhancement of DS forwards in musically trained children and adolescents. Although negative results cannot be taken as evidence for the null hypothesis that there is no difference between the groups in DS backwards, the substantial statistical power of the current study indicates that a putative undetected group difference in DS backwards would have to be very small and of little practical importance.

In this study, we also found that musically trained participants outperformed nontrained peers in both of the Trail-Making Tests. Previously, adult musicians have been found to outperform nonmusicians in TMT-A and B (Bugos and Mostafa, 2011), or in TMT-B alone (Strong and Mast, 2019). However, Bialystok and Depape (2009) and Virtala et al. (2014), for instance, found no differences between adult musicians and nonmusicians in span tests or the TMT-A or B. Our results concur with previous findings of enhanced performance in TMT-A and B in musically trained individuals but extend these findings to children and adolescents.

## Working Memory Subcomponents Measured by the Digit Span Test and the Trail-Making Test

The Digit Span test has usually been categorized as a simple span test, requiring maintenance of information in memory. Complex span tests in turn require memory maintenance of information during another, unrelated cognitive operation (Wilhelm et al., 2013). However, in a meta-analysis conducted by Redick and Lindsey (2013), the correlation between DS backward performance and performance in n-back tasks as well as verbal complex span tests was greater than the correlation between DS forward and these tests. Because the DS backward test requires subjects to reverse the order of the presented strings in mind, it also requires working memory updating. Furthermore, the DS forward and backward tests have been found to recruit partly separate brain networks (Manan et al., 2014). Both activate areas connected to working memory, but the backward test more strongly activates brain areas related to cognitive control and phonological processing (Gerton et al., 2004; Yang et al., 2015).

As the DS forwards and backwards tests have been found to recruit partly separate working memory processes, it is possible that performance in one but not the other could be enhanced through training. Indeed, selective enhancement of working memory updating (Linares et al., 2018, 2019) and maintenance (Carretti et al., 2007) skills has been found as a result of working memory training in adults. Another study achieved selective impairment of working memory maintenance, but not updating, with tDCS (Wang et al., 2018). Our finding of a musician advantage in DS forwards but not backwards points towards selective enhancement of working memory maintenance but not updating.

The Trail-Making Test is usually used to measure executive functions, and neuroimaging and lesion studies have identified that TMT recruits large-scale fronto-parietal brain networks related to these functions (Varjacic et al., 2018). However, there is evidence that performance in the TMT is related primarily to processing speed and working memory ability, as well as fluid intelligence (Sánchez-Cubillo et al., 2009; Satterthwaite et al., 2013). These findings are supported by evidence of genetic correlations between trail-making performance, reasoning ability and general cognitive ability, processing speed, and memory (Hagenaars et al., 2018). Research has also found differences between the cognitive processes underlying TMT-A and B performance. TMT-A is thought to rely mainly upon processing speed, and TMT-B to additionally require working memory and switching ability (Arbuthnott and Frank, 2000; Sánchez-Cubillo et al., 2009). According to a validation study of a computerized version of the TMT, TMT-B performance was explained to a large degree by inhibition and visual working memory skills (Fellows et al., 2017). Similar results were obtained in a factor analysis of TMT performance and several other neurocognitive measures in older individuals, where TMT-B performance was connected to measures of working memory and inhibition, and TMT-A to processing speed (Llinàs-Reglà et al., 2017).

There is also significant overlap between the cognitive processes that the TMT recruits. For instance, working memory skills and working memory capacity are tightly related to fluid intelligence (Kane et al., 2005; Kail, 2007; Demetriou et al., 2014; Salthouse, 2014; Heinzel et al., 2016). It has also been found that working memory predicts switching (Blackwell et al., 2009), presumably through supporting the maintenance of switching rules. Inhibitory control, in turn, may have a role in supporting working memory maintenance (Jonides et al., 1998; Zanto and Gazzaley, 2009; Getzmann et al., 2018).

In behavioral studies, DS backwards performance has been found to predict TMT-B performance, suggesting a partial overlap between the cognitive requirements of these tasks (Sánchez-Cubillo et al., 2009). Both DS backwards and TMT-B engage cognitive control more than DS forwards and TMT-A, but there are also obvious differences between test requirements. TMT-B requires switching attention from one rule and sequence of information in memory to another (letters or numbers). It also requires continuous updating of information about the respondent's position along the series of letters or numbers they are connecting. DS backwards requires recoding a string of digits in mind into reverse order, or updating the representation of the acquired information, but does not require switching between rules or response patterns during responding. The TMT-A and B also engage specifically working memory maintenance, by requiring the participant to keep the response rule and progression along the sequence of letters and numbers in mind. Because we observed a musician advantage in DS forwards but not backwards, and in both the TMT-A and B, our results point towards enhancement of skills that are required by these tests but not by the DS backwards test. These include working memory maintenance for DS forwards as well as for TMT-A and B. TMT-B also requires switching ability, which is not required by the DS backwards test. In addition to working memory maintenance, the musician advantage in TMT-B can therefore also be explained by enhancement of switching ability.

In sum, while the Digit Span and the Trail-Making Test are routinely used to assess working memory ability and have been connected to it, the task impurity problem complicates reaching conclusions about specifically which cognitive functions are measured and to what extent. Our results are best explained by enhancement of working memory maintenance, required by the TMT-A, the TMT-B, and the DS forward test. In addition, enhancement of switching ability may explain the musician advantage in TMT-B.

## How Musical Training Could Exert Selective Effects on the Development of Working Memory

Learning to play a musical instrument or sing requires working memory in a multitude of ways. For example, memorizing and producing sequences of tones when learning music by heart, and responding to changes in music when playing together with others both require working memory. It is possible that musical training during childhood could enhance working memory to the extent that this could be seen as faster development of these skills.

Augmentation of memory skills has been obtained by working memory programs (Melby-Lervåg and Hulme, 2013; Sala and Gobet, 2017b). It has been suggested that programs focusing on core working memory skills are most effective (Morrison and Chein, 2011). These programs are characterized by tasks that contain stimuli in more than one modality, require working memory maintenance and interference control, quick memory encoding and retrieval, change according to the individual's skill level and require high engagement and focus (Morrison and Chein, 2011). Musical training matches these characteristics of core working memory training programs well. For instance, learning to play sheet music requires transformation of visual stimuli into motor actions, which produce sound stimuli. Playing from notes requires concurrent working memory maintenance and updating of visual information from notation and auditory information produced by the musician. Playing from memory adds to working memory updating and maintenance demands through requiring monitoring of the sounds and movements produced and matching them to the model of the musical piece in memory. In ensemble playing, interference control is needed in order to be able to segregate the stream of sound produced by the individuals from those produced by others. Ensemble playing also requires rapid working memory encoding and retrieval, as musicians need to follow not only their own stream of sound but also that of others, and respond to changes in others' output. In joint improvisation, these rapid working memory encoding and retrieval requirements are accentuated. Musical training increases in challenge according to the proficiency of the individual, and successful learning and playing of music requires great engagement and focus. It is therefore feasible that musical training might influence working memory processes.

The results on selective enhancement of the participants' working memory maintenance, but not working memory updating skills, would mean that musical training selectively engages these mechanisms and perhaps selectively supports development of one more than the other. This explanation resonates with findings of different patterns of brain activation during memory encoding and rehearsal, reflecting differences in memory processes in musicians compared to nonmusicians (Schulze et al., 2011). It is feasible that musical training might exert powerful effects specifically on working memory maintenance. Learning to play by ear relies heavily on an individual's capability of acquiring and storing auditory information, melodies, and then reproducing this information immediately. Learning to play from notes, in turn, hones working memory maintenance in the visual domain. Conversely, classical musical training may not as much emphasize the ability to augment the presented information in mind, but rather reproduce it exactly as presented.

An alternative explanation for selective enhancement of working memory maintenance is that musical training improves selective attention. Indeed, there is evidence that selective attention underlies working memory maintenance (Sreenivasan and Jha, 2007; Berry et al., 2009; Gazzaley and Nobre, 2012). Selective attention seems to support encoding and maintenance of information in memory by shielding it from distracting information. This notion is supported by neuroimaging evidence of attenuated processing of distracting information during a working memory maintenance task (Sreenivasan and Jha, 2007). There is also tentative evidence of a musician advantage in selective attention, indexed by decreased variability of frontal brain responses to attended stimuli (Strait and Kraus, 2011; Strait et al., 2015). It is possible that music training, for instance through playing in ensembles and learning to focus on only the sound produced by one's instrument, could develop selective attention, which is of benefit in tasks requiring memorization of aurally presented information. Selective attention may also be required and therefore trained in learning to play sheet music. For instance, in learning to play piano, there are two notations to follow, one for the right and one for the left hand. Selectively attending to this visual information is required for successful production of sound. Trail-Making Test performance has been connected to selective attention skills, as measured by the ability to recognize speech in noise (Ellis et al., 2016), but to the knowledge of the authors, similar results on a connection between specifically selective attention and Digit Span performance have not been obtained. In future studies investigating working memory in musically trained individuals, including measures for selective attention would help further elucidate this possible connection.

#### Augmented Developmental Trajectories in Trained and Nontrained Participants

The difference in performance between the Music and Control groups observed in this study diminished over time. It is possible that musical training enhances the development of working memory maintenance or selective attention, which can be seen as faster maturation in the Music group, but with time the Control group children attain the same level of performance. This explanation contrasts with studies that have found enhanced working memory still in musically trained adults (Chan et al., 1998; Bialystok and Depape, 2009; George and Coch, 2011; Zuk et al., 2014; Talamini et al., 2016; Ding et al., 2018). There are, however, also contrary findings. In one study, adult nonmusicians were found to outperform musicians in tests requiring immediate as well as delayed recall of newly acquired information, with no significant group differences in performance in the TMT-A or B or DS (Virtala et al., 2014). Thus, the existence of working memory benefits associated with musical expertise in adulthood should be considered with caution.

As stated before, the task impurity problem complicates understanding of which cognitive functions are putatively most affected by musical training. Furthermore, the maturation of other executive functions may influence the maturation of subprocesses of working memory. For instance, the protracted development of inhibitory control and shifting ability influences performance in complex working memory span tasks that require these skills in addition to maintaining information in working memory (Jonides et al., 1998; Schleepen and Jonkman, 2009). Future longitudinal studies investigating working memory development should include measures that allow for disentangling the unique contributions of development in these cognitive skills to the development of working memory.

Since our study lacks a baseline measurement of working memory skills prior to musical training, our results may also be explained by pre-existing differences between the two groups rather than by a developmental causal explanation (for a study pointing towards pre-existing differences in intelligence, which may explain better performance in executive functions, see Schellenberg, 2015). The lack of the baseline measurement is caused by our choice to minimize the length of the experimental session when the children were only 7 years old and about to start their instrumental training. We added more behavioral and ERP paradigms gradually when the children became older and could then better cope with longer sessions. By this arrangement, we were able to minimize the number of drop-out participants, a serious problem in all longitudinal studies (for discussion, see Tervaniemi et al., 2018; Barbaroux et al., 2019).

One might also consider the lack of random group allocation a caveat of our study. However, in our view, it is not feasible to plan a longitudinal study with random allocation spanning several years on children and adolescents, at least if a control group is included. If the participants are not motivated, they either quit the training, do not participate in the investigations, or both. Even in shorter longitudinal studies, it has been a challenge to maintain the motivation of the participants unless the study is conducted in special circumstances, such as the summer-camp-like environment in the innovative study by Moreno et al. (2011). Thus, the current choice of having a longitudinal study on children who chose their music training based on their own and their family's initiative gives solid evidence about the development of cognitive functions of music-oriented and control children obtained in an ecologically valid context.

## SUMMARY AND CONCLUSIONS

In this study, we investigated the maturation of working memory in musically trained and nontrained children and adolescents. We found different patterns of development for different subcomponents of working memory in the trained and nontrained participants. Musically trained individuals had better performance in tests tapping working memory maintenance, but not updating, than musically nontrained individuals. However, the difference lessened over time, as nontrained participants attained a similar level of performance as trained participants. Our results extend previous findings of a musician advantage in tests for working memory by specifying which subcomponents of working memory may be most affected, and by clarifying the trajectory of enhancement from childhood into adolescence.

## DATA AVAILABILITY STATEMENT

The datasets for this manuscript are not publicly available because the consent form signed by participants did not include permission for distribution of data outside the research group. Requests to access the datasets should be directed to Katri Saarikivi, katri.saarikivi@helsinki.fi.

#### ETHICS STATEMENT

The studies involving human participants were reviewed and approved by The University of Helsinki Ethical Review Board in the Humanities and Social and Behavioural Sciences. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.

#### AUTHOR CONTRIBUTIONS

KS and VP participated in planning the experiment, conducting the measurements, analyzing the data and writing the manuscript. MT and MH were responsible for establishing the longitudinal study, planning the experiments and measurement paradigms included, and reviewing and writing the manuscript.

#### FUNDING

This research has been funded by The Jenny and Antti Wihuri Foundation, The Finnish Cultural Foundation and The Academy of Finland.


**Conflict of Interest**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Saarikivi, Huotilainen, Tervaniemi and Putkinen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# New Perspectives on Music in Rehabilitation of Executive and Attention Functions

#### Yuko Koshimori\* and Michael H. Thaut

Music and Health Research Collaboratory, Faculty of Music, University of Toronto, Toronto, ON, Canada

Modern music therapy, starting around the middle of the twentieth century, was primarily conceived to promote emotional well-being and to facilitate social group association and integration. Therefore, it was rooted mostly in social science concepts. More recently, music as therapy began to move decidedly toward perspectives of neuroscience. This has been facilitated by the advent of neuroimaging techniques that help uncover the therapeutic mechanisms for non-musical goals in the brain processes underlying music perception, cognition, and production. In this paper, we focus on executive function (EF) and attentional processes (AP) that are central for cognitive rehabilitation efforts. To this end, we summarize existing behavioral as well as neuroimaging and neurophysiological studies in musicians, non-musicians, and clinical populations. Musical improvisation and instrumental playing may have some potential for EF/AP stimulation and neurorehabilitation. However, more neuroimaging studies are needed to investigate the neural mechanisms for active musical performance. Furthermore, more randomized clinical trials combined with neuroimaging techniques are warranted to demonstrate the specific efficacy and neuroplasticity induced by music-based interventions.

#### Edited by:

Paul J. Colombo, Tulane University, United States

#### Reviewed by:

Aline Moussard, Centre de Recherche de l'Institut Universitaire de Gériatrie de Montréal (CRIUGM), Canada Alfredo Raglio, Scientific Clinical Institutes Maugeri IRCCS (ICS Maugeri), Italy

\*Correspondence:

Yuko Koshimori yuko.koshimori@mail.utoronto.ca

#### Specialty section:

This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience

Received: 03 June 2019 Accepted: 05 November 2019 Published: 19 November 2019

#### Citation:

Koshimori Y and Thaut MH (2019) New Perspectives on Music in Rehabilitation of Executive and Attention Functions. Front. Neurosci. 13:1245. doi: 10.3389/fnins.2019.01245

#### Keywords: executive function, attention processes, music neuroscience, music improvisation, Neurologic Music Therapy

## INTRODUCTION

Brain and biomedical research involving music has shown that music is a highly structured auditory language engaging complex perception, cognition, and motor control in the brain (Peretz and Zatorre, 2005; Stewart et al., 2006; Alluri et al., 2011; Lee et al., 2011; Thaut et al., 2014) and has a distinct influence on the brain by stimulating complex cognitive (Leggieri et al., 2019), affective (Koelsch, 2014), and sensorimotor processes (Thaut et al., 2015; Crasta et al., 2018; Schaffert et al., 2019).

As musical sound patterns are dynamic, unfolding and changing continuously in time, external auditory input may serve well for cognitive rehabilitation, in particular for the domains of executive function (EF) and attentional processes (AP), by helping to adapt to a changing environment. Music by its nature drives attention exogenously (Klein and Lawrence, 2011; Thaut and Gardiner, 2014). Furthermore, music is a multidimensional stimulus consisting of multiple acoustical elements (e.g., pitch, loudness, tempo, rhythms, timbre, melody, harmony) that exist in temporally patterned simultaneity and sequentiality to drive AP (Thaut and Gardiner, 2014). In addition, active music performance involves moment-to-moment feedback systems, which facilitate monitoring, adjusting, and updating (Gardiner and Thaut, 2014). Further, there is some evidence that musicians perform better on EF/AP tasks (e.g., Hanna-Pladdy and Mackay, 2011) and that there are prefrontal structural and functional differences between musicians and non-musicians (e.g., Fauvel et al., 2014; Groussard et al., 2014). Furthermore, research shows that active musical performance extensively modulates the prefrontal activity in musicians (Erkkinen and Berkowitz, 2018).

Therefore, this sensory language may effectively be used as a therapy to induce neuroplasticity in the brain affected by disorders, diseases, and injuries (Särkämö et al., 2014). With this neuroscientific knowledge, music may be able to yield more specific therapeutic and stimulation outcomes for targeted functions. In this paper, we first briefly summarize existing studies investigating the effects of formal musical training on the EF/AP performance as well as on brain structure and function. Second, we discuss the neuroimaging/neurophysiological studies demonstrating the prefrontal neural correlates and neuroplasticity of active musical performance. Third, we present the body of literature in music-based interventions in healthy and clinical populations, focusing on the EF/AP findings. Finally, we present the future directions of research in this field to move forward the neuroscientific approach, which facilitates further advancement of music-based interventions for the EF/AP stimulation and rehabilitation.

## MUSIC TRAINING ON EF/AP AND BRAIN STRUCTURE AND FUNCTION IN MUSICIANS

Some correlational and cross-sectional studies have supported the beneficial effects of music training on EF/AP in older musicians (Hanna-Pladdy and Mackay, 2011; Hanna-Pladdy and Gajewski, 2012; Liu et al., 2012; Amer et al., 2013; Moussard et al., 2016; Strong and Mast, 2019), younger musicians (Moradzadeh and Blumenthal, 2015; Okada and Slevc, 2018; Medina and Barraza, 2019), and musically trained children (Zuk et al., 2014). In addition, several studies demonstrated structural and functional changes associated with formal musical training in the prefrontal regions. For example, greater gray matter volume was associated with increasing musical practice in multiple regions including supplementary motor area (SMA), superior/middle and medial frontal cortex, and insula (Groussard et al., 2014; James et al., 2014). Furthermore, resting-state functional connectivity revealed significantly greater connectivity density in the medial and lateral prefrontal regions as well as temporoparietal junction in musicians (Fauvel et al., 2014; Luo et al., 2014; Klein et al., 2016; Tanaka and Kirino, 2016).

A few studies investigated the brain activity during an EF task in musicians. One EEG study using a visual go/no-go task demonstrated that a group of 17 older musicians exhibited a significantly larger difference in the N2 amplitude (reflecting a conflict detection signal or inhibition of a prepotent response) at the central midline sites between go and no-go conditions compared to a group of 17 older non-musicians (Moussard et al., 2016). The music group also performed significantly better on the task, as shown by fewer no-go errors. Further, the music group showed a significant correlation between the N2 amplitude and the no-go task performance. However, there were no significant correlations between the N2 amplitude and any measures of musical background (age of first instruction, years of musical instruction, and hours of current practice). On the other hand, the measures of musical background, but not the task performance, were significantly associated with the P3 amplitude, which showed no significant group difference in amplitude but differential topography between groups (anterior shift in musicians). Two other studies demonstrated differential brain activities during EF tasks in younger musicians (Moreno et al., 2014) and musically trained children (Zuk et al., 2014) compared to non-musician groups without showing task performance differences between groups.

In summary, the cross-sectional or correlational studies in musicians have shed light on the potential benefit of formal musical training on EF/AP and brain changes in the prefrontal area. The task-based studies consistently demonstrated that individuals with formal musical training have differential brain activity during EF tasks relative to those without it. However, these studies do not allow the direct and causal effects of music training on EF/AP and brain changes to be determined. Furthermore, in these musician studies, the specific effects of different types of musical training on EF/AP as well as on brain structure and function in the prefrontal area are still unclear.

## NEURAL MECHANISMS FOR ACTIVE MUSICAL PERFORMANCE

Several neuroimaging/neurophysiological studies investigated the brain activity during active musical performance in musicians. One study using near-infrared spectroscopy (NIRS) reported that piano playing engaged the frontal activity and that playing a more complex musical piece activated significantly wider frontal areas in musicians (Hashimoto and Okamoto, 2006). Furthermore, the brain activity during musical improvisation was investigated in amateur and professional musicians with varying experience of improvisation. These studies suggest that this creative musical activity extensively engages the brain activity in the prefrontal regions such as dorsal premotor area, pre-SMA, SMA, medial prefrontal cortex (mPFC), dorsolateral prefrontal cortex (DLPFC), anterior cingulate cortex (ACC), inferior frontal gyrus (IFG), and anterior insula (Brown et al., 2006; Pinho et al., 2014; Beaty, 2015; Erkkinen and Berkowitz, 2018; Loui, 2018). However, the directionality of improvisation-induced brain activity varied across the studies. The enhanced prefrontal brain activity and connectivity was interpreted as a reflection of goal-directed, top-down processing including motor planning, response selection, inhibition of competing stimuli, and conscious monitoring (Bengtsson, 2007; Villarreal et al., 2013). The dissociation of decreased lateral and increased medial prefrontal brain activity was interpreted as a reflection of a disconnection or disintegration of the lateral regions to suppress the top-down conscious control to generate spontaneous ideas, engaging the medial prefrontal regions, which was observed in more experienced improvisers (Limb and Braun, 2008; Liu et al., 2012). Further, the decreased lateral prefrontal activity and functional connectivity was interpreted as indicating that experts could spare EF load for the highly automatized activity (de Manzano and Ullén, 2012; Pinho et al., 2014; Rosen et al., 2016). The observed activation or deactivation in the lateral prefrontal cortex may also depend on the degree of top-down control required by a given improvisation task (Berkowitz and Ansari, 2008; Pinho et al., 2014). For example, the deactivation in the fronto-parietal regions was greater as the freedom of the improvisation task increased (Berkowitz and Ansari, 2008). On the other hand, the enhanced lateral prefrontal activity observed in experienced improvisers may be due to the constraints imposed on the improvisation task (Bengtsson, 2007). In addition, the existing studies demonstrated that musical improvisation induced greater brain activity in the prefrontal regions such as DLPFC compared to simple musical repetitive or patterned tasks (Berkowitz and Ansari, 2008; Villarreal et al., 2013) and reproducing music (de Manzano and Ullén, 2012).

In summary, piano playing and musical improvisation extensively engage the prefrontal activity. This is likely because these musical activities involve EF/AP (Hannon and Trainor, 2007; Beaty, 2015). In addition, musical improvisation induces greater activation of the prefrontal regions and greater functional connectivity in musicians with less experience of improvisation, and relative to simple repetitive or patterned musical tasks. Therefore, these musical activities may have some potential for EF/AP stimulation and neurorehabilitation.

## MUSIC-BASED INTERVENTIONS FOR EF AND AP STIMULATION IN NON-MUSICIANS

A few studies have demonstrated benefits of learning to play a musical instrument on EF in older healthy non-musicians. In one randomized controlled trial (RCT), 16 older adults received individualized piano lessons for 6 months (Bugos et al., 2007). They exhibited improvement on the Trail Making Test-B (TMT-B) over that period. This was not observed in a control group consisting of 15 older adults. In another study, 4 months of weekly group piano lessons designed and implemented by a professional music teacher and pianist resulted in improved performance on the Color-Word Stroop test in 13 older adults. This cognitive improvement was not observed in the 16 older adults of the control group (Seinfeld et al., 2013). Another study further supported the benefit of group piano lessons on EF such as verbal flexibility and inhibition control in 24 older adults (Bugos, 2010). The EF improvement was significantly greater compared to a music appreciation group consisting of 22 older adults who had learned musical elements while listening to music. However, both music groups significantly improved their EF performance.

In addition, one EEG study with a pseudo-randomized design investigated the effects of music making on inhibition control and interference (Alain et al., 2019). Sixty healthy older non-musicians received either 3-month musical, visual art, or no training. The music-based intervention included music making using body percussion, and voice and non-pitched musical instruments, as well as learning basic music theory, and melody and harmony concepts by singing simple canons. It was provided by a professional music teacher. Transient differential neural activities were observed in the fronto-central sites in both intervention groups without showing improvement on the task performance.

In summary, learning to play the piano may have beneficial effects on EF in healthy older non-musicians. This may be because it is a complex process, requiring processes for the coordination of multiple sensory modalities, motor control, monitoring, working memory, inhibition, and attentional shifting (Hannon and Trainor, 2007; Seinfeld et al., 2013). However, the sample sizes of these studies are small with additional methodological limitations in some of the studies such as no randomization, no blind assessors, no active control group, and unknown intervention compliance.

## MUSIC-BASED INTERVENTIONS FOR EF AND AP NEUROREHABILITATION IN CLINICAL POPULATIONS

Several clinical studies incorporated musical improvisation and instrument playing as their intervention techniques. A few pilot studies investigated the effects of Neurologic Music Therapy (NMT) training for EF/AP. It is guided by an NMT-certified music therapist and is based on group improvisation projects with a special emphasis on attention switching that requires participants to musically respond to specific musical cues (Gardiner and Thaut, 2014; Thaut and Gardiner, 2014). In a pseudo-experimental design (due to institutional constraints of the U.S. Veterans Administration), 31 persons with acquired brain dysfunction received one 30-min NMT training session (Thaut et al., 2009). A paired t-test analysis showed that the music group significantly improved cognitive flexibility assessed on TMT-B with a large effect size (d = 1.21), but not memory functions. A control group (N = 23) without any intervention did not show any change on the test. In addition, the music group significantly increased their confidence in the EF skill. Other NMT studies included children/adolescents with neurodevelopmental disorders such as Attention Deficit Hyperactivity Disorder (Abrahams and van Dooren, 2018) and Autism Spectrum Disorder (Pasiali et al., 2014). In both studies, the interventions consisted of a 45-min weekly training session over a period of 6 weeks. Both studies reported some improvement on AP. However, both studies included small sample sizes (2 participants in the NMT group in the former study and 9 participants with varying severity in the latter study). Due to the methodological limitations, these preliminary NMT findings need to be replicated in RCTs with large sample sizes.
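To make the reported statistics concrete, the following is a minimal sketch of how a paired t statistic and a paired-samples effect size can be computed from pre/post scores. The data are hypothetical and are not taken from Thaut et al. (2009); the function name `paired_t_and_dz` is an illustrative assumption, and the sketch assumes the d_z convention (mean difference divided by the standard deviation of the differences), whereas the text does not state which formula underlies d = 1.21.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_and_dz(pre, post):
    """Return (t, d_z) for paired samples.

    d_z is the mean of the pairwise differences divided by their sample SD
    (one of several conventions for a paired-samples Cohen's d).
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [after - before for before, after in zip(pre, post)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance")
    d_z = mean(diffs) / sd
    t = d_z * sqrt(len(diffs))  # paired t statistic with n - 1 degrees of freedom
    return t, d_z


# Hypothetical TMT-B completion times in seconds (lower is better).
pre = [120.0, 95.0, 140.0, 110.0, 130.0]
post = [100.0, 90.0, 115.0, 100.0, 105.0]
t, d_z = paired_t_and_dz(pre, post)
print(f"t = {t:.2f}, d_z = {d_z:.2f}")
```

Because a paired effect size describes change within one group only, it does not by itself show that the change exceeds what an untreated group would show.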

In addition, one RCT included 35 older adults with Mini Mental State Examination scores ≥18 and employed a 12 biweek group music cognitive training (Biasutti and Mangiacotti, 2018). The training was delivered by a specialist with music and neuropsychology background and consisted of voice and instrumental improvisation, which was created based on the framework of different music-based interventions including NMT (Thaut, 2005). Repeated ANOVA showed that the music group (N = 18) showed significant improvement in mental flexibility assessed on verbal fluency and showed a trend toward significance in selective attention assessed on Attentional Matrices test compared to an active control group (gymnastic activities; N = 17).

In another RCT, 28 chronic stroke patients without extensive musical experience received either 30 h of music-supported therapy (MST) or conventional physical training over a 10-week period (Fujioka et al., 2018). The MST protocol included mapping functional movements onto playing musical instruments, which was based on the NMT technique (Thaut, 2005), as well as music-making with a therapist. The music group showed significant improvement in cognitive flexibility measured by TMT-B at mid-intervention, whereas the active control group showed the improvement post-intervention.

In one RCT combined with functional near-infrared spectroscopy (fNIRS), 39 persons with mild cognitive impairment were assigned either to an instructor-guided 12-week movement music therapy (MMT) or a comparable active non-musical control therapy (Shimizu et al., 2018). MMT included light exercises synchronized to the background music and playing a percussion instrument accompanied with familiar songs. The MMT group showed enhanced mPFC activity and increased functional connectivity within the prefrontal area compared to the control group. ANOVA did not show a significant group effect on the Frontal Assessment Battery score. This is likely because both groups showed some improvement post-intervention in which the MMT group showed a significant improvement while the active control group showed a trend toward significance, shown using a paired t-test. A major limitation of this study was that the number of participants in each group was imbalanced (MMT: N = 30 vs. control: N = 9).

There are also RCTs that demonstrated the beneficial effects of other music-based interventions on EF/AP. For example, a 12-week singing (four different familiar songs) intervention guided by a professional choir instructor led to a significant improvement on inhibitory control in 31 persons with mild Alzheimer's disease (AD; Pongan et al., 2017). This positive effect was also observed in the active control group (painting) of 28 mild AD participants. Another RCT employed Sound Training for Attention and Memory in Dementia (STAM-Dem; Ceccato et al., 2012). STAM-Dem is a 12-week music-based manualized protocol delivered by a music therapist in which the participants perform specific movements to instructed sound stimuli, which requires selective attention. It consists of multiple phases with increasing difficulty. Test score changes on selective attention were significantly greater (improved test score) in the music group of 27 participants compared to the control group of 23 participants who received standard care. In addition, 1-h daily listening to participant-selected favorite music guided by a music therapist for 2 months significantly improved AP recovery in 17 persons with acute post-stroke compared to both active and passive control groups of 19 and 17 participants, respectively (Särkämö et al., 2008). Furthermore, the AP improvement was also associated with a significant increase in the prefrontal gray matter volume in those with left hemisphere stroke (Särkämö et al., 2014).

In summary, clinical studies showed some potential of active music-based interventions such as musical improvisation and instrument playing-based activities to enhance EF/AP. However, their specific effectiveness is still unclear as active control interventions are often effective as well. In addition, there is a paucity of research literature with various clinical conditions. With few neuroimaging studies, it is still to be determined whether the music-based interventions have engaged executive control areas or whether neuroplasticity had occurred in those areas. Further, some studies show critical limitations in the research design such as a small sample size, no randomization, no active control condition, no follow-up assessment, no analysis of a group × time interaction effect, and no trained therapist.
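The point about group × time interaction effects can be illustrated with a short sketch. In a two-group pre/post design, one common way to test this interaction is to compare change scores between groups, rather than running a separate paired t-test within each group. All scores below are hypothetical, the helper names are illustrative assumptions, and Welch's t statistic is used here as one reasonable choice, not as the method of any cited study.

```python
from math import sqrt
from statistics import mean, variance


def change_scores(pre, post):
    """Post-minus-pre difference for each participant."""
    return [after - before for before, after in zip(pre, post)]


def welch_t(x, y):
    """Welch's t statistic for two independent samples."""
    standard_error = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / standard_error


# Hypothetical test scores (higher is better) for two groups of six.
music_pre = [10, 12, 11, 9, 13, 10]
music_post = [13, 14, 13, 12, 15, 12]
control_pre = [11, 10, 12, 9, 12, 11]
control_post = [13, 12, 13, 11, 14, 12]

music_change = change_scores(music_pre, music_post)
control_change = change_scores(control_pre, control_post)

# Both groups improve on average, so within-group tests alone cannot
# attribute the gain to the music intervention.
print(mean(music_change), mean(control_change))

# The interaction asks whether the music group improved MORE than the controls.
interaction_t = welch_t(music_change, control_change)
print(f"t for difference in change scores = {interaction_t:.2f}")
```

In this toy example both groups improve, which mirrors the situation described above in which active control interventions are often effective as well; only the between-group comparison of change speaks to the specific effect of the music-based intervention.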

## FUTURE DIRECTIONS AND PERSPECTIVES FOR EF/AP STIMULATION AND NEUROREHABILITATION

The current paper summarizes the effects of formal musical training, active musical performance, and music-based interventions on EF/AP and associated brain activity in musicians, non-musicians, and clinical populations to present the future directions and perspectives of music-based interventions for EF/AP stimulation and neurorehabilitation. Active musical performance such as piano playing and musical improvisation engages the prefrontal activity, and some intervention studies showed the potential of these musical activities as a basis for EF/AP stimulation and neurorehabilitation techniques. One study showed that enhanced inhibition control was associated with the extent of involvement of the musical activity in the motor system via the cognitive control system in musicians (Slater et al., 2017), suggesting that instrument playing may be more effective in enhancing inhibition control than singing, for example. However, neuroimaging studies are needed to uncover the neural mechanisms for piano playing and musical improvisation in non-musicians and to demonstrate the direct link between these musical activities and EF/AP performance. It would also be useful for such studies to contrast images between musical activities and resting-state activity, as well as between the two musical activities, to show which areas are involved in each musical activity and how similar or different the brain activity induced by these musical activities is. Further, more RCTs with a rigorous design are warranted to replicate the findings. In addition, it is important to address whether these interventions are feasible and effective for different age groups and clinical populations with different clinical characteristics (e.g., severity, acute vs. chronic).

Furthermore, it is crucial to design music-based interventions that can tap into targeted processing components of EF/AP. Some of the clinical studies specifically targeted an enhancement of cognitive flexibility and selective attention, utilizing acoustic elements of music as cues or target stimuli (Thaut et al., 2009; Ceccato et al., 2012). However, as music is tightly connected to the attention system (Särkämö et al., 2008), it can temporarily enhance cognitive performance associated with some of the AP as well as learning and memory via increased arousal (Hommel et al., 1990; Thaut et al., 2005; Thompson et al., 2006; Särkämö et al., 2008). Therefore, future RCTs should include music listening as an active control condition, together with follow-up assessments, to determine the specific efficacy of active music-based interventions on EF/AP above the music-induced arousal effect.

Lastly, neuroimaging techniques should be combined to determine whether music-based interventions engage the brain activity of the executive control regions and networks as well as whether neuroplasticity has occurred in the expected regions and networks following the interventions. These studies are also useful to determine whether the observed patterns of hyper- or hypo-activations are due to compensation, neural efficiency, or attempted compensation (Grady, 2012). Such research studies can help growth in this domain, shedding an increasing light on the neural mechanisms of how music-based intervention techniques can tap into higher cognition to facilitate maintenance, enhancement, as well as recovery of cognitive functions.

#### AUTHOR CONTRIBUTIONS

YK conceived the concept of the manuscript. YK wrote the first draft of the manuscript and MT wrote the sections of the draft. Both authors contributed to manuscript revision, and read and approved the submitted version of the manuscript.

#### FUNDING

This research was supported by the Canada Research Chair Program.





**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Koshimori and Thaut. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Short-Term Choir Singing Supports Speech-in-Noise Perception and Neural Pitch Strength in Older Adults With Age-Related Hearing Loss

Ella Dubinsky<sup>1</sup>\*, Emily A. Wood<sup>1</sup>, Gabriel Nespoli<sup>1</sup> and Frank A. Russo<sup>1,2</sup>

<sup>1</sup> Department of Psychology, Ryerson University, Toronto, ON, Canada, <sup>2</sup> Toronto Rehabilitation Institute, Toronto, ON, Canada

Prior studies have demonstrated musicianship enhancements of various aspects of auditory and cognitive processing in older adults, but musical training has rarely been examined as an intervention for mitigating age-related declines in these abilities. The current study investigates whether 10 weeks of choir participation can improve aspects of auditory processing in older adults, particularly speech-in-noise (SIN) perception. A choir-singing group and an age- and audiometrically-matched do-nothing control group underwent pre- and post-testing over a 10-week period. Linear mixed effects modeling in a regression analysis showed that choir participants demonstrated improvements in speech-in-noise perception, pitch discrimination ability, and the strength of the neural representation of speech fundamental frequency. Choir participants' gains in SIN perception were mediated by improvements in pitch discrimination, which was in turn predicted by the strength of the neural representation of speech stimuli (FFR), suggesting improvements in pitch processing as a possible mechanism for this SIN perceptual improvement. These findings support the hypothesis that short-term choir participation is an effective intervention for mitigating age-related hearing losses.

#### Keywords: aging, musical training, speech-in-noise, frequency, hearing

#### Edited by:

Claude Alain, Rotman Research Institute (RRI), Canada

#### Reviewed by:

Jeanette Tamplin, University of Melbourne, Australia Emily B. J. Coffey, Concordia University, Canada

#### \*Correspondence:

Ella Dubinsky ella.dubinsky@ryerson.ca

#### Specialty section:

This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience

Received: 05 July 2019 Accepted: 11 October 2019 Published: 28 November 2019

#### Citation:

Dubinsky E, Wood EA, Nespoli G and Russo FA (2019) Short-Term Choir Singing Supports Speech-in-Noise Perception and Neural Pitch Strength in Older Adults With Age-Related Hearing Loss. Front. Neurosci. 13:1153. doi: 10.3389/fnins.2019.01153

## INTRODUCTION

As the population ages, and the expectation of longevity increases, a growing interest in healthcare is the promotion of healthy aging – the maintenance of mental, social, and physical wellbeing as one ages, in order to retain independence and lead a high-quality life. Aging is associated with declines in cognitive functioning (e.g., decreased working memory and attentional control; for review, see Fabiani, 2012), and deteriorating sensory-perceptual processes (e.g., Fozard, 1990). Declines in hearing can make it difficult for aging individuals to maintain personal relationships and engage socially, and have been linked to feelings of isolation and depression (Arlinger, 2003; Djernes, 2006). Although assistive technologies (e.g., hearing aids) can target aspects of peripheral hearing loss, persistent perceptual deficits are widely reported (e.g., Killion, 1997). One prevalent example is the loss of the ability to perceive speech in a noisy environment (Salomon, 1986; Chmiel and Jerger, 1996; Gomez and Madey, 2001; Ricketts and Hornsby, 2005; Betlejewski, 2006). Counseling programs may improve communication outcomes associated with age-related auditory declines, but they do not appear to influence speech-in-noise problems (Hickson et al., 2007, 2019). While some auditory rehabilitation programs have been shown to be moderately effective in mitigating speech-in-noise problems (Kricos and Holmes, 1996; Sweetow and Sabes, 2006; Song et al., 2012), they require a high level of motivation and are not appropriate for all cases (Sabes and Sweetow, 2007; Saunders et al., 2016). As such, there is presently a great demand for complementary interventions that target age-related auditory declines, particularly ones that are engaging and scalable, and that show efficacy with regard to speech-in-noise perception. Developing and evaluating an intervention – and its proposed mechanism(s) for change – involves consideration of biological and experiential contributors to these abilities, beginning with age-related hearing loss and the role it plays in speech-in-noise perception.

Hearing loss can occur at different stages in the auditory system. Peripheral hearing loss refers to the reduction in efficient sound transmission through the bones of the middle ear (conductive hearing loss), and the deterioration of the outer and inner hair cells (sensorineural hearing loss; Arlinger, 2003; Yueh et al., 2003; Wingfield et al., 2005). Central hearing loss refers to the degradation of neural mechanisms that relay sound information from the cochlea to the brain, resulting from long-term attenuation of neural input from the cochlea, as well as age-related changes in neuronal responses to sound (Syka, 2002; Frisina and Walton, 2006; Yamasoba et al., 2013). Although peripheral losses can be remediated to some degree through the use of assistive technologies such as hearing aids (or, in extreme cases, cochlear implants), central processing deficits seem to persist in spite of such interventions (Chmiel and Jerger, 1996; Killion, 1997). These central processing deficits – including age-related declines in the synchrony of neural firing (Pichora-Fuller and Schneider, 1992; Frisina and Frisina, 1997; Pichora-Fuller et al., 2007), length of recovery time (Walton et al., 1998), and numbers of neurons in auditory nuclei (Frisina and Walton, 2006) – have been associated with age-related losses in key auditory perceptual abilities, such as sound localization (Abel et al., 2000), pitch discrimination (Raz et al., 1989), duration judgments (Fitzgibbons and Gordon-Salant, 1994; Schneider et al., 1994), mistuned harmonic detection (Alain et al., 2001), and speech-in-noise perception (Pichora-Fuller et al., 1995; Russo and Pichora-Fuller, 2008; Schneider et al., 2010). Of the perceptual deficits, the loss of speech-in-noise perception seems to have the most severe impact on the aging adult's quality of life (e.g., Pichora-Fuller et al., 1995, 2007; Anderson et al., 2011).

Speech-in-noise perception refers to the ability to track a voice in a complex acoustic environment, such as a crowded room with many people talking. Vital in social settings and everyday interactions, the loss of this skill can immensely impact an individual's ability to maintain independence, emotional wellbeing, and quality of life as they age (Salomon, 1986; Gomez and Madey, 2001; Betlejewski, 2006). This age-related decline also appears to persist in spite of peripheral remediation, and can even occur in adults with normal audiometric thresholds (Cruickshanks et al., 1998; Schneider B. A. et al., 2002; Tremblay et al., 2003; Gordon-Salant, 2005; Souza et al., 2007; Vermiglio et al., 2012; Alain et al., 2014); in research studies involving older individuals, pure-tone thresholds tend to be a poor predictor of speech-in-noise perception (Dubno et al., 1984; Hargus and Gordon-Salant, 1995; Kim et al., 2006; Souza et al., 2007).

One way to elucidate the neural underpinnings of speech-in-noise perception is through the use of electroencephalography (EEG recordings) to study cortical and subcortical responses to acoustic stimuli (Tremblay et al., 2003; Musacchia et al., 2008; Anderson et al., 2011). Of particular interest here, the auditory brainstem – a collection of nuclei involved in afferent and efferent auditory processing – has been shown to encode spectral and temporal acoustic information with a high degree of precision (Clinard et al., 2010; Skoe and Kraus, 2010).

One component of the auditory brainstem response (ABR; Skoe and Kraus, 2010) that has been implicated in perceptual deficits – in particular, speech-in-noise perception – is the frequency following response (FFR; Johnson et al., 2005; Skoe and Kraus, 2010). This response consists of phase-locked neural activation, wherein the inter-spike intervals correspond to the fundamental frequency (F0) of the sound input (Hoormann et al., 1992). On the basis of animal work involving ablations, the primary source of the FFR appears to be the inferior colliculus (Smith et al., 1975); however, recent work also suggests cortical contributions (Lehmann and Schönwiesner, 2014; Coffey et al., 2016, 2017a).

The FFR provides a useful index of the auditory nervous system's representation of periodic sound – such as a vowel in speech – through sustained synchronous neural phase-locking. Spectral and temporal features of the FFR, obtained through signal analysis, are associated with different aspects of neural pitch encoding. A fast Fourier transform (FFT) of the signal yields a spectral analysis that can be used to assess the strength of the neural representation of periodic sound input (Skoe and Kraus, 2010). Another feature, the inter-trial phase coherence (ITPC), can be used to assess the extent of consistency in the neural response to periodic sound input – i.e., the extent of phase alignment (synchronization) in oscillatory responses (e.g., Delorme and Makeig, 2004).

In the perception of speech cues, the ability to discern and track changes in pitch over time gets significantly more difficult when the signal-to-noise ratio (SNR) decreases (e.g., Killion et al., 2004). By the time an acoustic signal reaches the auditory cortex of an aging adult, it is likely to have undergone both peripheral and neural distortion (due to age-related declines in sensorineural hearing, and neural noise introduced as the signal is relayed through the ascending auditory pathway, respectively), leading to diminished preservation of key temporal and spectral characteristics (e.g., Yueh et al., 2003; Clinard et al., 2010). This suggests a possible mechanism for age-related declines in speech-in-noise perception (and other auditory perceptual abilities which rely on pitch discrimination), whereby age-related central processing deficits (such as reduced FFR fidelity) result in downstream perceptual impairments which tend to persist in spite of peripheral remediation. In terms of mitigating and preventing these declines, one activity that appears to confer some benefits against certain age-related auditory losses is musical experience (e.g., Alain et al., 2014).

Musicianship is purported to have some benefits outside the musical domain, but the most convincing positive effects have been observed with regard to the auditory system (for review, see Herholz and Zatorre, 2012). Over the course of training, musicians are taught to attend to fine-grained acoustic features – including pitch, timing, and timbre – that contribute to human perception of sound (Kraus et al., 2009; Kraus and Chandrasekaran, 2010). This trained sensitivity to minute acoustic changes is thought to promote the enhancement of auditory perceptual abilities, including those which decline throughout the aging process (Musacchia et al., 2007; Parbery-Clark et al., 2012; Zendel and Alain, 2012; Alain et al., 2014). In studies comparing auditory perception in musicians and non-musicians, musical experience has been associated with a relative advantage in processing some of the same features that have been linked to age-related declines. These benefits include improved pitch discrimination (Kishon-Rabin et al., 2001; Micheyl et al., 2006; Schellenberg and Moreno, 2009; Bidelman et al., 2011b; Meha-Bettison et al., 2018), gap and duration judgments (Rammsayer and Altenmüller, 2006; Zendel and Alain, 2012; Habibi et al., 2014; Donai and Jennings, 2016), mistuned harmonic detection (Koelsch et al., 1999; Zendel and Alain, 2009), and perception of speech-in-noise (Parbery-Clark et al., 2009; for review see Coffey et al., 2017b).

Musicians also demonstrate structural and functional differences in the neural substrates of auditory, sensory-motor, and visuospatial processing (Musacchia et al., 2007; Kraus and Chandrasekaran, 2010; Schlaug, 2015). Among musicians, musical aptitude is correlated with an increase in gray matter volume in the primary auditory cortex, as well as somatosensory and motor areas, the inferior temporal gyrus, hippocampus, and corpus callosum regions (Schlaug et al., 1995; Schneider P. et al., 2002; Gaser and Schlaug, 2003; Herdener et al., 2010). In addition to structural changes in associated brain regions, musicians also demonstrate functional improvements in neural responses to sound, at cortical and subcortical levels in the auditory processing pathway. Compared with non-musicians, musicians demonstrate enhanced neural responses and activation in the auditory cortex (Koelsch et al., 1999; Schneider P. et al., 2002; Pantev et al., 2003; Shahin et al., 2003; Kuriki, 2006; Besson et al., 2007; Zendel et al., 2015; Habibi et al., 2016) and the auditory brainstem (Musacchia et al., 2007; Wong et al., 2007; Lee et al., 2009; Parbery-Clark et al., 2009; Strait et al., 2009; Bidelman and Krishnan, 2010). Notably, musicians demonstrate improvements in both FFR strength (Musacchia et al., 2007, 2008; Bidelman et al., 2011b; Slater et al., 2017) and consistency (Parbery-Clark et al., 2009; Strait et al., 2009; Bidelman et al., 2011a,b; Skoe and Kraus, 2013; Slater et al., 2017); these benefits appear largely resistant to normal age-related declines (Parbery-Clark et al., 2012; White-Schwoch et al., 2013). Because of the importance of pitch processing across auditory perceptual domains, FFR improvements have been suggested as one of the mechanisms through which musicianship enhances auditory perceptual abilities (Kraus and Chandrasekaran, 2010; Bidelman et al., 2011b; Coffey et al., 2017a).

While the aforementioned studies suggest that musical training can improve speech-in-noise perception, they are cross-sectional, which precludes causal inferences (Schellenberg, 2019); further, not all studies have found an effect (Ruggles et al., 2014; Boebinger et al., 2015; Madsen et al., 2017, 2019). One way to resolve these inconsistencies is through the evaluation of musical training outcomes in a controlled experimental design (i.e., a longitudinal context), which a handful of studies have sought to do. These studies essentially provided musical training to non-musicians (with the extent and nature of musical training varying between studies), and administered pre- and post-training assessments to determine whether changes occurred in outcomes of interest. In terms of neural changes, musical training has been found to enhance both structure (Hyde et al., 2009) and function (Fujioka et al., 2006; Lappe et al., 2008, 2011; Habibi et al., 2016) of the auditory cortex in young children and adults who received musical training, compared to those who did not (control participants). Consistency of the FFR has also been found to be enhanced in adolescents following musical training (Tierney et al., 2015). Children who took part in instrumental music training showed improvements in speech-in-noise perception following 2 years of training (Slater et al., 2015), and younger adults who participated in singing training demonstrated improvements in speech-in-noise perception after only 8 days (Jain et al., 2015). Older adults with 6 months of piano training demonstrated improved cortical responses and speech-in-noise perception following training, suggesting that neural and perceptual benefits can be conferred to aging adults in an intervention context (Zendel et al., 2019).

In addition to improvements in auditory processing, musical training has been linked to enhancements in different aspects of cognitive functioning in older adults – including improvements in working memory and executive control processes – in both cross-sectional (Parbery-Clark et al., 2011; Slevc et al., 2016; Grassi et al., 2017; Mansens et al., 2017) and longitudinal studies (Bugos et al., 2007; Särkämö et al., 2014; Biasutti and Mangiacotti, 2018; Fu et al., 2018). Musical experience has also been shown to alter neural structure and function in regions associated with cognition (Schulze et al., 2011; West et al., 2017); improvements in shared neural substrates for music and nonmusic domains have been suggested as one possible mechanism for transfer from musical training to speech-in-noise perception (e.g., Kraus et al., 2012).

Taken together, cross-sectional and longitudinal findings suggest that musical training may be able to alter brain structure and function; moreover, it appears to have the capacity to promote enhancements in the same auditory abilities that decline as we age (Solé Resano et al., 2010; Hanna-Pladdy and MacKay, 2011; Zendel and Alain, 2012; Alain et al., 2014), suggesting its use as an intervention to mitigate declines in older adults. Of the forms of music making available, singing may be particularly suited to this purpose.

Singing emerges spontaneously in the first months of life (Papoušek, 1996), and appears to be a universal form of expression (Mithen et al., 2006). Although considerable variability in accuracy exists, the vast majority of adults appear to be able to carry a tune (Dalla Bella et al., 2007; Pfordresher and Brown, 2007; Dalla Bella and Berkowska, 2009). Group singing has been shown to lead to improvements in cooperation (Good and Russo, 2016), social and emotional wellbeing (Hillman, 2002; Bailey, 2005; Hays and Minichiello, 2005; Clift and Morrison, 2011; Creech et al., 2013), and physical and creative outcomes (Beck et al., 2000; Clift and Hancox, 2001; Cohen et al., 2006). Some of these benefits may be mediated by changes in hormonal levels that occur during choral singing: after choir practice, choristers demonstrate decreased cortisol (Beck et al., 2000) and enhanced immune system functioning more generally (Kreutz et al., 2004). Singing in a group can also be highly motivating for older adults (Hillman, 2002; Creech et al., 2013), which may promote intervention adherence. This is of particular import when singing is contrasted with existing auditory rehabilitation programs, which tend to be plagued by low compliance rates and high attrition (Sweetow and Sabes, 2010; Tye-Murray et al., 2012).

In addition to the social, cognitive, and emotional benefits, singing appears better positioned to confer near-transfer benefits to speech. All forms of vocal production involve the rapid integration of auditory and vocal-motor systems (Hickok, 2001; Zatorre et al., 2007; Pfordresher and Dalla Bella, 2011; Pruitt and Pfordresher, 2015); this integration requires feedback loops along the auditory dorsal stream that allow for real-time monitoring and adjustments (Houde and Jordan, 1998; Brainard and Doupe, 2000; Zheng et al., 2010). In a recent study that compared temporal lobe activations across perception of singing, instrumental music, and speech, it was found that compared with instrumental music, singing and speech both led to greater bilateral activations of the superior temporal sulcus (STS; Whitehead and Armony, 2018), a critical node in the auditory dorsal stream (Hickok et al., 2003).

In terms of perceptual processes, there is greater reliance on the vocal-motor system in more challenging listening environments, such as understanding speech-in-noise (Du et al., 2014); older adults rely on this to an even greater degree (Du et al., 2016). Training the vocal-motor system through singing could theoretically improve the resources upon which older adults draw to perceive degraded speech signals. An emphasis on pitch training, feedback, repetition, and the rewarding nature of improvements have been implicated as key components of successful auditory training paradigms, and in the transfer of musical experience to speech perceptual benefits (Besson et al., 2011; David et al., 2012; Herholz and Zatorre, 2012; Shepard et al., 2013; Patel, 2014; Pruitt and Pfordresher, 2015).

The current study investigated whether short-term choir participation and musical training could improve speech-in-noise perception in older adults, compared to an age- and audiometrically matched control group who were not taking part in musical training. Outcomes of interest included speech-in-noise perception (SIN) and pitch discrimination (FDL), strength and consistency of the neural response to sound (as indexed by features of the frequency following response [FFR] to a repeated speech stimulus), and exploratory cognitive measures of working memory (LSpan) and inhibitory control of attention (Flanker task). We hypothesized that older adults who took part in 10 weeks of group choral practice (2 hours weekly) and individual online musical training (up to 1 hour weekly) would demonstrate improved speech-in-noise perceptual abilities following training, which may be driven in part by enhancements in pitch processing and perception, as indexed by enhanced neural responses to sound (features of the FFR) and improved pitch discrimination thresholds. Exploratory cognitive measures of working memory and attention were assessed in relation to training outcomes, as potential dependent variables. We hypothesized that choir participants would experience greater post-training gains than an age- and audiometrically matched do-nothing control group.

## MATERIALS AND METHODS

#### Participants

The process of recruitment and participation is shown in **Figure 1**. Participants were recruited from the first class in a 10-week group singing course run through the 50+ Program at Ryerson University; interested participants came into the lab to undergo eligibility testing that week. Fifty-three participants were screened (8 did not meet eligibility criteria), 45 participants enrolled (9 withdrew from the study), and 36 participants completed choral training and all test sessions. Two participants were rejected as audiometric outliers, so the final analysis included 34 choir singers.

Thirty-four choir participants (31 female), aged 54–79 (mean age = 67.6, standard deviation [SD] = 6.1 years) underwent pre-testing data collection during the first week of the choir and post-testing data collection following the final choir class. Peripheral hearing loss was measured by an in-lab audiometric assessment at standard test frequencies between 0.25 and 8 kHz; average peripheral hearing loss ranged from 9.7 to 45.3 dB HL (mean = 23.1, SD = 9.9 dB HL). Participants were prescreened to ensure that they did not have any neurological conditions and did not use assistive technology (e.g., hearing aids; see **Figure 1**). Twenty-nine age- and audiometrically matched do-nothing control participants (26 female) aged 60–76 (mean age = 67.7, SD = 4.9 years) were recruited through the Ryerson University Hearing Database. Control participants' average peripheral hearing loss ranged from 10.6 to 47.5 dB HL (mean = 24.1, SD = 10.3 dB HL); average pure-tone thresholds for both groups are shown in **Figure 2**. Groups were matched for the duration of time between test sessions, and there were no group differences in previous musical experience, as indexed by years of formal musical training. Informed consent was obtained from each volunteer prior to their participation in the study, in accordance with the Ryerson Research Ethics Board guidelines (REB 2013-128).

#### Study Design

Each choir participant (n = 34) visited the lab for a pretraining assessment that took approximately three hours, during which time they completed several questionnaires and auditory and cognitive assessments, and underwent an EEG during presentation of repeated auditory stimuli.

Choir-singing participants took part in weekly 2-hour group choral sessions over the course of 10 weeks, during which time they received pitch training and vocal direction in an open and encouraging environment. In addition to the weekly group choir sessions, participants were assigned weekly individual online musical and vocal training exercises (up to 1 hour weekly). This training consisted of pitch discrimination and vocal production exercises designed to target and improve the participants' abilities to perceive and produce small changes in pitch (Theta Music Trainer)<sup>1</sup>.

After 10 weeks of choir participation and online musical training, each choir participant returned to the lab for a post-training assessment that lasted approximately 2.5 hours. During this session, participants completed different versions of the original assessments, and underwent a post-training EEG during auditory stimulus presentation. To account for possible differences in version difficulty within the matched behavioral tasks, participants were assigned one of four possible counterbalanced configurations of assessments.

The do-nothing control group (n = 29) underwent the same battery of pre- and post-testing, with 8–10 weeks between data collection sessions, but did not receive any active training during this time. The inclusion of this control group in the analysis was intended to account for any practice effects within the repeated measures, enabling a controlled examination of the unique effects of the musical intervention on experimental outcomes.

<sup>1</sup>https://trainer.thetamusic.com

#### Experimental Procedure

Apart from the questionnaires, all assessments were completed in an Industrial Acoustics Company (IAC) double-walled sound-attenuating booth. Computerized assessments were presented using a Mac mini (Apple, 2010), with visual components of the experiment presented on a 24-inch Acer LCD display (Acer X243w, 1920 × 1200) placed at eye level approximately 0.5 m in front of the participant. Audiometric testing and FFR auditory stimuli were administered through binaural foam insert headphones (Electro-Medical Instruments, 3A) connected to a GSI 61 Clinical Audiometer (VIASYS Healthcare). All other auditory assessments were administered binaurally through Koss SB40 headphones at approximately 70 dB SPL.

Before the experiment began, participants were familiarized with task requirements and response methods for each assessment. Participants were monitored throughout the data collection session.

#### Questionnaires

After signing the consent form and going over experimental expectations and volunteer rights, participants were given background and music history questionnaires. These elicited demographics and medical history, and years of formal musical training.

#### Auditory Measures

#### **Speech-in-noise perception: signal-to-noise ratio (SNR)**

Ability to track speech in a noisy environment was assessed using the QuickSIN test (Speech-In-Noise; Etymotic Research; Killion et al., 2004). Participants were presented with four sets of six pre-recorded sentences, with five key words per sentence embedded in four-talker babble noise. In this assessment, the sentences were presented binaurally with a decreasing SNR: the first sentence was presented with an SNR of 25 dB (i.e., the target sentence was 25 dB above the background noise; very easy), and each subsequent sentence was presented with the SNR reduced by 5 dB, down to an SNR of 0 dB for the final sentence. Participants were asked to repeat back the target sentences as closely to what they heard as possible, and were awarded one point for each correctly repeated target word, for a possible total of 30 points per set. The sentences in the QuickSIN do not contain many semantic or contextual cues, despite being syntactically correct (Wilson et al., 2007). Out of the four sets of sentences presented, the first two lists were treated as practice sets, to familiarize participants with the task requirements, and the last two lists were scored as experimental data. SNR loss (dB) for each list was calculated by subtracting the total number of correct words from 25.5; SNR loss represents the increase in SNR required to correctly repeat 50% of key words on the QuickSIN test (Etymotic), above 2 dB SNR (the level required for normal-hearing individuals to achieve 50% test accuracy; Killion, 1997; Killion et al., 2004). Final scores (mean SNR loss) were calculated by averaging the scores of the two experimental lists; since this is a threshold assessment, a lower (more negative) SNR loss score indicates better performance. Participants' responses were scored online by a researcher, and were also recorded using Audacity software in case response ambiguity necessitated further review. The pre- and post-testing lists consisted of different sentence sets in order to avoid practice effects, and participants' exposure to the sets was counterbalanced across experimental conditions.
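The scoring rule described above can be sketched in a few lines. This is an illustration only, not the authors' scoring script; the function names and the example word counts are hypothetical.

```python
def snr_loss(words_correct: int) -> float:
    """SNR loss (dB) for one six-sentence list: 25.5 minus the key words repeated correctly."""
    if not 0 <= words_correct <= 30:
        raise ValueError("a list contains 30 key words (6 sentences x 5 words)")
    return 25.5 - words_correct


def mean_snr_loss(list_totals):
    """Average SNR loss across the scored (experimental) lists."""
    losses = [snr_loss(total) for total in list_totals]
    return sum(losses) / len(losses)


# Hypothetical participant: 22 and 24 key words correct on the two scored lists.
print(mean_snr_loss([22, 24]))  # 2.5
```

A perfect list (30 words) gives −4.5 dB, which is why lower values indicate better performance.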

#### **Pitch discrimination: frequency difference limens (FDL)**

Participants' ability to distinguish different frequencies was measured using a computerized assessment of FDL. In this task, participants were presented with three pure tones, each lasting 200 ms, with amplitude envelopes of 20 ms rise and decay times. A three-alternative forced choice paradigm was used, in which each presented set contained two pure tones at the standard 500 Hz frequency, and one stimulus at a randomly selected higher frequency (Schneider, 1997; Parbery-Clark et al., 2009; Russo et al., 2012). The participant was instructed to identify which tone was higher than the other two by pressing the corresponding number on a computer keyboard (i.e., 1 = first tone is higher; 2 = second tone is higher; 3 = third tone is higher). An adaptive staircase procedure was used to determine the pitch discrimination threshold, whereby the difference between standard and comparison frequencies was halved after three correct responses, or doubled after one incorrect response. After five reversals, the step was changed, so that the frequency difference was divided by 1.414 after three correct responses or multiplied by 1.414 after one incorrect response. FDL was determined from the mean of the last 10 reversals.
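A minimal sketch of such a staircase follows. Several details are assumptions made for illustration and are not stated in the text: the starting difference (50 Hz), the total number of reversals before stopping (15), the requirement that the three correct responses be consecutive, and the simulated listener.

```python
def run_staircase(respond, start_delta=50.0, n_reversals=15, n_average=10,
                  max_trials=10000):
    """Three-down/one-up staircase on the frequency difference (Hz).

    respond(delta) returns True when the listener picks the higher tone.
    start_delta, n_reversals and max_trials are illustrative assumptions.
    """
    delta = start_delta
    correct_run = 0
    direction = 0  # -1: last step made the task harder, +1: easier, 0: no step yet
    reversals = []
    for _ in range(max_trials):
        if len(reversals) >= n_reversals:
            break
        if respond(delta):
            correct_run += 1
            if correct_run < 3:
                continue  # wait for three correct responses in a row
            step = -1
        else:
            step = +1
        correct_run = 0
        if direction != 0 and step != direction:
            reversals.append(delta)  # the track changed direction at this difference
        direction = step
        # Coarse steps (x2) until five reversals have occurred, then finer steps (x1.414).
        factor = 2.0 if len(reversals) < 5 else 1.414
        delta = delta / factor if step < 0 else delta * factor
    if not reversals:
        raise ValueError("no reversals recorded")
    last = reversals[-n_average:]
    return sum(last) / len(last)


# Hypothetical listener who is correct whenever the difference is at least 4 Hz.
fdl = run_staircase(lambda delta: delta >= 4.0)
print(f"Estimated FDL: {fdl:.2f} Hz")
```

With this idealized listener the track settles into alternating values on either side of the 4 Hz threshold, and the mean of the last 10 reversals falls between them.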

#### EEG Measure: The Frequency Following Response (FFR)

#### **Stimulus**

Auditory presentation of a repeated /da/ syllable (F0 = 100 Hz) was used to elicit the FFR, following methodological conventions described by Skoe and Kraus (2010). This stimulus was selected because it is a speech sound that has been extensively used in this area of research, and robustly elicits clear FFRs (Russo et al., 2005; Parbery-Clark et al., 2009, 2012; Skoe and Kraus, 2010). Each participant heard 6000 repetitions of this 170 ms sound, presented at alternating polarities. Stimuli were presented binaurally through insert headphones; stimulus volume was set to 60 dB SPL for normal hearers. For individuals with hearing loss above 25 dB, presentation volume was set to 60 dB + (dB HL – 25 dB), controlling stimulus levels for sensory loss across all participants.
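The level adjustment above amounts to a simple piecewise rule, sketched here. The function name and the example hearing-loss values are hypothetical, and which hearing-loss summary (e.g., average pure-tone threshold) enters the formula is an assumption.

```python
def presentation_level_db_spl(hearing_loss_db_hl, base_db_spl=60.0, cutoff_db_hl=25.0):
    """Stimulus level: 60 dB SPL, plus any hearing loss in excess of 25 dB HL."""
    return base_db_spl + max(0.0, hearing_loss_db_hl - cutoff_db_hl)


for loss in (15.0, 25.0, 40.0):  # hypothetical hearing-loss values in dB HL
    print(loss, "->", presentation_level_db_spl(loss))
```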

#### **EEG administration and data collection**

EEG data were collected using a vertical one-channel montage with three electrodes, in which active and reference electrodes were placed on the mastoids, and a ground electrode was placed on the forehead. A researcher applied 1-inch square cloth solid gel electrodes (EL504, BIOPAC Systems, Inc.) to the mastoids and forehead; electrodes were connected to a BIOPAC MP150 data acquisition system and ERS100C Evoked Response Amplifier (BIOPAC Systems, Inc.). Data were recorded at a sampling rate of 20 kHz, with an online low-pass filter of 10 kHz and a high-pass filter of 1 Hz; the signal was recorded using AcqKnowledge software (version 4.1). Stimuli were presented for 25 minutes in total, during which time participants were shown a silent film<sup>2</sup> to promote relaxation and stillness during the EEG.

<sup>2</sup>http://www.openculture.com/free-silent-films

#### TABLE 1 | Summary of linear mixed effects models for choir-singing and do-nothing control groups across key auditory measures.


The Session × Group interaction accounted for significant unique variance in speech-in-noise perception (mean SNR loss; dB) and pitch discrimination (Frequency Difference Limens; FDL, log Hz), and marginally significant variance in the strength of the FFR at F0 (µV). Group analyses showed that the choir-singing class demonstrated significant improvements in SNR, FDL, and FFR strength following training, while the control group showed no changes. Bold values indicate statistical significance (p < 0.05).

TABLE 2 | Mean scores compared with post hoc pairwise t-tests and related effect sizes (Cohen's d) for the choir-singing class (n = 34) and do-nothing control group (n = 29) at pre- and post-testing sessions, across auditory measures of interest.


Bold values indicate statistical significance (p < 0.05).

#### **EEG data processing**

EEG data were processed in MATLAB, using the PHZLAB toolbox (Nespoli, 2016). A 75 Hz high-pass filter was applied, and data were segmented according to individual stimulus responses (i.e., 6000 segments), with epoch windows extending 40 ms pre- and post-stimulus, and the steady-state component extending from 60 to 170 ms post-stimulus onset (Skoe and Kraus, 2010). The 40 ms signal preceding stimulus onset was used as a baseline of ambient EEG activity, against which to compare the response activation. Peak amplitudes in the response waveform were compared to the baseline; response peaks with absolute amplitudes that did not exceed the baseline were not considered 'reliable' (Skoe and Kraus, 2010). Myogenic artifacts, which are many times larger than the neural response, were accounted for by rejecting all trials with amplitudes that exceeded a threshold of 50 µV. Responses that remained after artifact rejection were averaged, using the addition method of inverse polarity processing in order to preserve the representation of the fundamental frequency while minimizing stimulus artifact in the signal. Peak amplitude of the fundamental frequency (a measure of the strength of the pitch representation in the signal, or FFR strength) was calculated by applying an FFT to the averaged signal, and inter-trial phase coherence (ITPC; a measure of response consistency or FFR consistency) was calculated by finding the latency variations across each participant's un-averaged signal.
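The two FFR measures can be illustrated with a short standard-library sketch. It is not the PHZLAB implementation: it evaluates a single discrete Fourier component at F0 instead of a full FFT, uses the textbook ITPC definition (length of the mean unit phase vector across epochs), assumes each epoch starts at stimulus onset, and averages all accepted epochs together as a simplified stand-in for the addition method. All names and the synthetic data are hypothetical.

```python
import cmath
import math


def f0_component(epoch, fs, f0, t_start, t_end):
    """Complex amplitude at f0 (single-bin DFT) over [t_start, t_end) seconds.

    Assumes index 0 of the epoch is stimulus onset (an assumption of this sketch).
    """
    seg = epoch[int(round(t_start * fs)):int(round(t_end * fs))]
    total = sum(x * cmath.exp(-2j * math.pi * f0 * k / fs) for k, x in enumerate(seg))
    return 2 * total / len(seg)


def ffr_measures(epochs, fs=20000, f0=100.0, window=(0.060, 0.170), reject_uv=50.0):
    """Return (FFR strength in µV, ITPC in 0..1, number of epochs kept)."""
    # Artifact rejection: drop epochs whose absolute amplitude exceeds the threshold.
    kept = [e for e in epochs if max(abs(x) for x in e) <= reject_uv]
    if not kept:
        raise ValueError("all epochs were rejected")
    comps = [f0_component(e, fs, f0, *window) for e in kept]
    # The DFT is linear, so averaging the per-epoch components gives the F0
    # amplitude of the averaged response.
    strength = abs(sum(comps) / len(comps))
    # ITPC: length of the mean unit phase vector across epochs.
    itpc = abs(sum(c / abs(c) for c in comps)) / len(comps)
    return strength, itpc, len(kept)


# Synthetic demo (hypothetical data): 20 phase-locked 0.5 µV epochs plus one artifact.
fs = 20000
clean = [0.5 * math.cos(2 * math.pi * 100 * k / fs) for k in range(int(0.2 * fs))]
artifact = [120.0] * len(clean)
strength, itpc, n_kept = ffr_measures([clean] * 20 + [artifact], fs=fs)
print(round(strength, 3), round(itpc, 3), n_kept)
```

Perfectly phase-locked epochs should give an ITPC near 1 and recover the simulated amplitude, whereas epochs with uniformly scattered phases would drive both measures toward 0.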

#### Exploratory Cognitive Measures

Cognitive assessments were administered electronically on the stimulus computer (see section Experimental Procedure). Assessment scripts, coded in HTML-5, were retrieved from the Millisecond online database<sup>3</sup> (2016), adapted to have fewer blocks and a shorter runtime, and administered using Inquisit software (version 5.0.6). Working memory was assessed using a computerized version of the listening span task (LSpan), an auditory adaptation of the reading span task developed by Daneman and Carpenter (1980). Inhibitory control of attention was assessed using a computerized version of an adapted Flanker task (Ridderinkhof et al., 1997).

## Statistical Analyses

Linear mixed effects analyses in a regression format were conducted on choir and control groups to examine the effects of choir participation on speech-in-noise perception (mean SNR loss; dB), pitch discrimination ability (FDL; Hz), and aspects of the FFR which represent the strength and consistency with which the speech fundamental frequency was represented. Exploratory cognitive measures of auditory working memory (LSpan) and inhibitory control of attention (Flanker effect) were also examined in the same format. In all models, measures were regressed on Session, Group, and the interaction between them (e.g., SNR ∼ Session × Group); contrasts were assigned such that the interaction effect represented the training effect of the choir group compared to the control group. Intercepts significantly varied across participants for all auditory measures; because the Session × Group interaction was the main effect of interest for each outcome measure, individual variability in baseline scores across all dependent variables was included as a random effect in the multilevel models. Session × Group interactions are reported first in each section, and significant Session × Group interactions were plotted and broken down in separate multilevel models of the choir and control groups. In these separate regressions conducted on each group, the models specified were the same as the main model for each variable, but excluded the main effect and interaction term involving Group. This was done in order to elucidate differential effects of choir participation vs. do-nothing control participation, on all outcomes of interest. Statistical analyses were conducted in R; the nlme and lme4 packages were used to conduct linear mixed effects modeling in a regression format (Bates et al., 2015; Pinheiro et al., 2019).
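To make the interaction term concrete: when every participant contributes both sessions and the model includes only a random intercept per participant, the Session × Group coefficient (under treatment-style coding) reduces to the difference between the two groups' mean pre-to-post changes. The sketch below computes that quantity directly in Python; it is not the R analysis used in the study, it yields no standard errors or p-values, and the scores are hypothetical.

```python
from statistics import mean


def mean_change(pre, post):
    """Mean within-participant change (post minus pre)."""
    return mean(after - before for before, after in zip(pre, post))


def session_by_group_effect(choir_pre, choir_post, control_pre, control_post):
    """Return (choir change, control change, choir minus control)."""
    choir = mean_change(choir_pre, choir_post)
    control = mean_change(control_pre, control_post)
    return choir, control, choir - control


# Hypothetical SNR-loss scores (dB); lower is better.
choir, control, interaction = session_by_group_effect(
    choir_pre=[3.0, 2.0, 4.0], choir_post=[2.0, 1.0, 3.6],
    control_pre=[3.0, 2.0], control_post=[3.5, 2.4],
)
print(round(choir, 2), round(control, 2), round(interaction, 2))
```

With missing sessions or additional random effects this equivalence no longer holds exactly, which is one reason to fit the full mixed model.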

## RESULTS

Linear mixed effects models for key auditory measures are summarized in **Table 1**; pre- and post-testing group means, post hoc pairwise t-tests, and effect size calculations are reported in **Table 2**. Due to a computer error, FDL scores for 25 participants were spurious and were removed from analyses.

## Speech-in-Noise Perception

Pre-training and post-training SNR loss (dB) for choir singing and do-nothing control groups is plotted in **Figure 3**. The Session × Group interaction accounted for significant variance in dB SNR loss [b = −1.26, t(61) = −2.57, p = 0.009]. Regressions conducted on each group showed that choir participants demonstrated improvements of 0.81 dB SNR following training [b = −0.81, t(33) = −2.68, p = 0.006], while control participants showed no change [b = 0.45, t(28) = 1.14, p = 0.247; see **Table 1**].

<sup>3</sup>http://www.millisecond.com/download/library/
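The group-specific SNR estimates reported above can be related to the interaction term with a little arithmetic. The following is a minimal sketch, assuming dummy-coded predictors and complete pre/post data; the function name is hypothetical and this is not the authors' R code. Under those assumptions, the Session × Group coefficient equals the choir group's Session slope minus the control group's.

```python
def interaction_from_slopes(choir_slope: float, control_slope: float) -> float:
    """Difference-in-differences: choir change minus control change.

    Assumes a saturated 2 x 2 model with dummy coding, so the
    Session x Group coefficient is the gap between the two
    group-specific Session slopes.
    """
    return choir_slope - control_slope


# Session slopes for SNR loss (dB) as reported in the text.
snr_interaction = interaction_from_slopes(choir_slope=-0.81, control_slope=0.45)
print(round(snr_interaction, 2))
```

The result agrees with the reported interaction estimate for SNR loss (b = −1.26). The same check applied to the FDL and FFR estimates agrees with the reported interaction terms up to rounding.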

## Pitch Discrimination and Neural Representation of Frequency

**Figure 4** shows mean pitch discrimination thresholds (FDL, log Hz) before and after 10 weeks of choir singing and do-nothing control participation. The Session × Group interaction accounted for significant variance in frequency discrimination thresholds [log Hz; b = −0.11, t(36) = −2.10, p = 0.036]; regressions conducted on each group showed that choir participants demonstrated improved pitch discrimination thresholds following training [b = −0.13, t(19) = −3.21, p = 0.0013], while control participants showed no changes [b = −0.03, t(17) = −0.99, p = 0.322; see **Table 1**].

**Figure 5** shows the strength of the neural representation of the fundamental frequency (F0) of the steady-state component of a complex sound (/da/; F0 = 100 Hz), before and after choir-singing or do-nothing control participation. The Session × Group interaction accounted for marginally significant variance in the FFR strength at F0 [µV; b = 0.0037, t(48) = 1.77, p = 0.0721]; regressions conducted on each group showed that following training, choir participants demonstrated improvements in the neural representation of pitch [b = 0.0033, t(29) = 2.60, p = 0.009], while control participants did not demonstrate significant changes [b = −0.00034, t(19) = −0.21, p = 0.8346; see **Table 1**]. The Session × Group interaction did not account for significant variance in the inter-trial phase coherence of the FFR [µV; b = 0.0025, t(50) = 0.38, p = 0.7066].

#### Exploratory Cognitive Measures

There were no Session × Group interaction effects on either listening span (auditory working memory) or the Flanker effect (inhibitory control of attention).

## Possible Contributors to Choir-Driven Improvements in Speech-in-Noise Perception

**Figure 6** shows the relationship between improvements in pitch discrimination ability, FFR strength at F0, and speech-in-noise perceptual gains. SNR scores of choir participants were analyzed in a multilevel model including Session (effect of training), FDL, and FFR strength as potential predictors of variance, in order to elucidate potential mechanisms for choir-related improvements in speech-in-noise perception. Predictors were added into the model hierarchically based on hypothesized contributions to speech-in-noise perception derived from previous research, and only predictors with significant Session × Group interactions were included in the model (i.e., FDL and FFR strength at F0). The final choir model is reported in **Table 3**; with Session included in the model, the interaction between FDL and FFR strength accounted for significant unique variance in SNR scores [b = −239.74, t(21) = −3.18, p = 0.0045]; inclusion of FDL accounted for training-related variance in SNR loss previously accounted for by Session, suggesting a potential mediating effect of pitch discrimination on training-related improvements in speech-in-noise perception.

## Pitch Discrimination as a Potential Mechanism for Musicianship Improvements in Speech-in-Noise Perception: A Mediation Analysis

**Figure 7** shows the distribution of the indirect effect of choir training on SNR, which indicates that choir-related improvements in speech-in-noise perception were significantly mediated by improvements in pitch discrimination. The regression of mean SNR loss (dB) on Session, ignoring the mediator, was significant (see **Table 1**); when mean SNR loss (dB) was regressed on the mediator while controlling for Session, FDL significantly predicted SNR [b = 3.57, t(18) = 3.22, p = 0.0047], but the training-specific effect (Session) was no longer a significant predictor of SNR [b = −0.23, t(18) = −0.605, p = 0.553]. Aroian's test revealed a significant mediating effect of FDL on choir-related improvements in SNR (Aroian's test statistic = −2.20, p = 0.0280), and a Monte Carlo resampling approach (n = 20000) confirmed that FDL fully mediated the relationship between choir participation and SNR improvements, 95% CI [−0.775, −0.0894]; **Figure 7**.
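The two mediation checks named above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' analysis: the standard errors are back-calculated as |b / t| from the rounded coefficients reported in the text, the a-path is taken to be the choir group's Session effect on FDL, the two paths are drawn as independent normals, and the function names and seed are hypothetical.

```python
import math
import random


def aroian_z(a: float, se_a: float, b: float, se_b: float) -> float:
    """Aroian variant of the Sobel test for the indirect effect a * b."""
    se_ab = math.sqrt(b**2 * se_a**2 + a**2 * se_b**2 + se_a**2 * se_b**2)
    return (a * b) / se_ab


def monte_carlo_ci(a, se_a, b, se_b, n=20000, seed=1):
    """95% percentile interval for a * b from independent normal draws."""
    rng = random.Random(seed)
    products = sorted(rng.gauss(a, se_a) * rng.gauss(b, se_b) for _ in range(n))
    return products[int(0.025 * n)], products[int(0.975 * n) - 1]


# Assumption: standard errors are recovered as |b / t| from reported values,
# not taken from the fitted models.
a, se_a = -0.13, 0.13 / 3.21   # Session -> FDL (choir group)
b, se_b = 3.57, 3.57 / 3.22    # FDL -> SNR, controlling for Session

z = aroian_z(a, se_a, b, se_b)
p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal p-value
low, high = monte_carlo_ci(a, se_a, b, se_b)
print(f"z = {z:.2f}, p = {p:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

With these rounded inputs the test statistic lands close to the reported value, and the simulated interval excludes zero. The interval's endpoints should not be expected to match the published ones, since the original resampling used the fitted model estimates rather than back-calculated standard errors.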

The significant interaction effect of FDL × FFR strength on SNR (**Table 3**) suggested a potential moderating effect of FFR on the relationship between pitch discrimination and SNR, so a moderated mediation analysis was conducted to assess the statistical significance of this effect. This analysis revealed a marginally significant moderation of the mediation by changes in FFR strength [b = −277.14, t(14) = −1.89, p = 0.0803]; the model of this relationship is shown in **Figure 8**.

TABLE 3 | Hierarchical regression of possible contributors to speech-in-noise perceptual gains following choir (vs. control group) participation; marginal R<sup>2</sup> = 0.325, conditional R<sup>2</sup> = 0.691.

Continuous predictor variables included in the moderation analysis (FDL and FFR strength at F0) were centered around the grand means for each variable before groups were subsetted. Bold values indicate statistical significance (p < 0.05).

Exploratory analyses considered whether the relationship between choir-related improvements in pitch discrimination and speech-in-noise perception could be predicted by the strength of the representation of F0 in the FFR. Simple slopes analyses on low, average, and high FFR strength at F0 (centered variable ± SD) showed that the relationship between pitch discrimination and SNR was strongest in high FFR conditions [i.e., when F0 is strongly represented in the FFR; b = 4.28, t(14) = 3.29, p = 0.0054], weaker in average FFR conditions [b = 2.28, t(14) = 1.08, p = 0.0546], and non-significant in low FFR conditions [b = 0.28, t(14) = 0.16, p = 0.8714], suggesting that when controlling for session, the strength of the FFR at F0 is predictive of the strength of the relationship between pitch discrimination and speech-in-noise perception. These analyses suggest that neural and perceptual pitch processes play a role in speech-in-noise perceptual ability in older adults, and could mechanistically contribute to a potential musicianship advantage in this domain.
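The simple slopes reported above follow the usual moderation arithmetic: the slope of the predictor at a given moderator value is the predictor's coefficient plus the interaction coefficient times that value. A minimal sketch, assuming the moderator is expressed in SD units, with the slope at the mean (2.28) and the change per SD (2.00) read off the three reported slopes rather than taken from the fitted model; the function name is hypothetical.

```python
def simple_slope(b_predictor: float, b_interaction: float, moderator: float) -> float:
    """Slope of the predictor at a given (centered) moderator value."""
    return b_predictor + b_interaction * moderator


# Assumption: moderator in SD units; 2.28 is the FDL slope at mean FFR
# strength and 2.00 is the implied change in that slope per SD of FFR strength.
for label, m in [("low", -1.0), ("average", 0.0), ("high", 1.0)]:
    print(label, round(simple_slope(2.28, 2.00, m), 2))
```

Evaluating at −1, 0, and +1 SD reproduces the three reported slopes, which shows that they are equally spaced, as a linear moderation model implies.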

## Effects of Peripheral Hearing Loss on Perceptual and Neural Auditory Outcomes

**Figures 9–11** demonstrate the differential effects of peripheral hearing loss (dB HL) on the efficacy of the choir-singing intervention on: speech-in-noise perception (**Figure 9**); pitch discrimination (**Figure 10**); and FFR strength at F0 (**Figure 11**). There were significant main effects of Audiometry on SNR [b = 0.07, t(60) = 2.66, p = 0.0101] and FDL [b = 0.008, t(45) = 2.52, p = 0.0153]; across groups and sessions, worse peripheral impairments were predictive of worse performance on perceptual tasks. However, there were no significant effects of peripheral hearing loss on FFR strength at F0, indicating a potential differentiating effect of audiometry on neural vs. perceptual outcomes.

FIGURE 7 | A mediation analysis was conducted using the Monte Carlo technique (20000 samples); choir-related gains in SNR were fully mediated by improvements in pitch discrimination ability; 95% CI [–0.775, –0.0894].

## DISCUSSION

This study demonstrated experimentally that short-term choir participation can be used as an intervention to target and improve speech-in-noise perception in older adults, supporting the hypotheses that: (1) the musicianship advantage in speech-in-noise perception can be conferred to older adults through a relatively short training period, using choir singing and vocal training; and (2) enhancements in pitch processing contribute to improvements in this domain. This study lays the groundwork for a highly scalable, cost-effective, and engaging intervention that can be used to mitigate declines in speech-in-noise perception in older adults, and importantly provides insight into potential neural and perceptual mechanisms underlying these changes. In particular, the relationship between auditory processing, pitch discrimination, and speech-in-noise perception suggested by this study elucidates one way in which musical experience – and specifically, singing and vocal training – can transfer to improvements in speech processing, through enhanced representation of pitch.

FIGURE 9 | The relationship between peripheral hearing loss and pre-post changes in mean SNR loss (dB), for choir and control groups. Degree of impairment is categorized on clinical audiometric criteria, where normal to near normal hearing = 0–25 dB HL, and mild to moderate = 26–55 dB HL. Error bars are within subjects CIs (95%) plotted around Session × Group means.

FIGURE 11 | The relationship between peripheral hearing loss and pre-post changes in the strength of neural representation of fundamental frequency (FFR strength at 100 Hz) for choir and control groups. Degree of impairment is categorized on clinical audiometric criteria, where normal to near normal hearing = 0–25 dB HL, and mild to moderate = 26–55 dB HL. Error bars are within subjects CIs (95%) plotted around Session × Group means.

Compared with do-nothing control participants, choir singers demonstrated 1.26 dB improvements in mean SNR loss following training, a change that corresponds to a functional difference of 10–20% improvement in speech intelligibility (Middelweerd et al., 1990). Other forms of auditory rehabilitation for older adults yield similar improvements (1.5 dB with LACE training) but require intensive practice in the target domain (30 min per day, 5 days per week for 4 weeks), which may account for their relatively poor compliance and high rates of attrition (Sweetow and Sabes, 2006; Song et al., 2012; Tye-Murray et al., 2012). In contrast, group singing is reported to be highly engaging and motivating, provides many benefits outside of the focus of training, and promotes ongoing social involvement and activity (Hillman, 2002; Creech et al., 2013). It is important to note that in the current study, while nine participants withdrew from data collection, almost all of the original 53 participants surveyed remained in the choir class (two withdrew due to health issues), and many participants reported joining other choirs and singing groups after the study ended. As a proof of concept, this makes a strong case for the engagement and enjoyment of participants in a group singing class, and the sustainability of this type of intervention, along with its efficacy at improving speech-in-noise perception.

In terms of possible mechanisms accounting for changes in speech-in-noise perception, improvements appeared to be driven at least in part by enhancements in pitch processing. In addition to improved speech-in-noise perceptual abilities, choir singers demonstrated improved pitch discrimination thresholds (as indexed by lower FDL) and stronger neural representations of the speech fundamental frequency (F0) following training (stronger FFR representation of the F0 of the /da/ stimulus; 100 Hz). Analyses showed that training-related improvements in speech-in-noise perception were fully mediated by improvements in pitch discrimination, suggesting that the benefits afforded by choir-singing arose at least in part from enhancements in the perception of pitch. A moderated mediation analysis suggested that over the course of choir training, the strength of the neural representation of F0 was predictive of the strength of the relationship between pitch discrimination and speech-in-noise perception. This suggests that neural indices of pitch processing influence the extent to which older adults rely on pitch cues to support speech-in-noise perception. Taken together, these findings suggest that older adults who take part in 10 weeks of choir singing and vocal training demonstrate enhanced neural responses to and perception of subtle frequency cues, which lead to improved perception of speech-in-noise following training.

A number of previous studies report findings that converge with our mediation account of speech-in-noise via pitch perception. For example, musical experience has been correlated with improvements in speech-in-noise perception (Parbery-Clark et al., 2009; Zendel and Alain, 2012; Swaminathan et al., 2015; for review see Coffey et al., 2017a), pitch discrimination ability (Micheyl et al., 2006; Parbery-Clark et al., 2009; Schellenberg and Moreno, 2009; Bidelman et al., 2011a,b; Fuller et al., 2014; Boebinger et al., 2015; Yates et al., 2018), and subcortical encoding of F0 (Parbery-Clark et al., 2009, 2011; Bidelman et al., 2011a,b; Musacchia et al., 2017); relationships have also been demonstrated between pitch perception and subcortical encoding of F0 (Carcagno and Plack, 2011; Coffey et al., 2016; Bianchi et al., 2017), as well as between speech-in-noise perception and pitch processing (e.g., Coffey et al., 2017a). On the other hand, a number of correlational studies have not been able to replicate the musicianship advantage for speech-in-noise perception (e.g., Ruggles et al., 2014; Madsen et al., 2017). Some of this discrepancy may be based on methodological or sampling differences across studies. More generally, limited experimental work in this field has left the nature of these relationships somewhat uncertain (excepting some recent work by Zendel et al., 2019). Zendel et al. (2019) found that older adults (non-musicians) who took part in 6 months of piano training showed improvements in speech-in-noise perception, whereas control and video game intervention groups showed no improvements in this domain. Importantly, these individuals were randomly assigned to the interventions in this study, lending credence to the use of musical training to support auditory abilities in older adults.

As a musical intervention, choir singing may be uniquely suited to hone pitch perceptual processes, through activation of existing vocal-motor systems, rapid integration of perceptual and productive processes, and shared neural architecture activated by speech and vocal song. Choir singers have the benefit of both intrinsic auditory and sensorimotor feedback, and can harness existing feedback loops between auditory perception and vocal production – which allow humans to monitor and dynamically alter speech – to rapidly alter and hone vocal output, including production of pitch. These integrative feedback loops and fine-tuned changes may allow choir singers to undergo rapid improvements in both productive and perceptual processes, in a short period of time. Singing is also an intuitive and innate form of music-making, and may be learned (and improved upon) more quickly than learning to play an instrument. This innate quality, along with intrinsic auditory and vocal motor feedback loops, and extrinsic feedback (from the choir director and other singers) create the ideal circumstances to quickly and effectively improve pitch processing and downstream perceptual abilities such as speech-in-noise perception.

While the current study found that improvements in pitch processing fully mediated choir enhancements in speech-in-noise perception, this does not preclude the role of other neural, perceptual, and cognitive contributors to this ability. Previous work suggests that musical training also leads to enhancements in attentional processes involved with speech encoding (e.g., Zendel et al., 2019); auditory working memory has also been implicated in this ability (e.g., Kraus et al., 2012). Further experimental research is necessary to determine the unique contributions of various auditory, cognitive, and neural processes to music-related improvements in speech-in-noise perception in older adults. In addition, the contribution of productive musical training (i.e., singing practice) vs. perceptual training (i.e., learning to listen to differences in pitch) to auditory processing improvements is not clear from the current study. Notably, a recent study found that non-musicians who received targeted pitch perceptual training achieved pitch discrimination thresholds comparable to musicians in 4–8 hours (Micheyl et al., 2006); it is unknown whether these pitch improvements would be sustained over time, or transfer to speech-in-noise perceptual benefits; it is also unclear whether the mechanism by which pitch discrimination is improved – i.e., through targeted psychoacoustic training, vs. through a more naturalistic singing or music listening paradigm – would alter the degree to which pitch perception mediates speech perceptual processes. This further underscores the need for targeted experimental study of musical (and non-musical) perceptual and productive training on auditory abilities, to elucidate the roles and contributions of each. It is also unclear from the current study whether the auditory benefits of choir participation would persist after cessation of training, and whether these benefits would accumulate with additional/long-term choir or musical involvement. These are rich avenues for future research projects, especially in an experimental/longitudinal context.

In addition to the role of pitch, the degree of peripheral hearing loss appeared to influence the size of the gains choir participants experienced as a function of training, whereby participants with lower levels of peripheral hearing loss appeared to experience greater training-related improvements in speech-in-noise perception. This could be suggestive of a possible limit on the efficacy of this intervention at improving perceptual processes in individuals with levels of hearing loss approaching the need for peripheral assistance (i.e., 26–55 dB HL). One potential explanation is that these individuals may not have been able to hear well enough in the classes to pitch-match with other voices, and thus did not receive equivalent experiential benefits from this activity.

Another explanation is that greater peripheral impairments may have led to more substantial central deficits that were recalcitrant to a behavioral intervention in this capacity, at least in the current dose of 10 weeks of choir singing (2 hours/week of group singing, plus 1 hour/week of individual online exercises). However, there were no effects of peripheral hearing loss on the strength of FFR responses, as individuals within the upper range of peripheral hearing loss still showed improvements in the FFR representation of F0. This suggests that while individuals with greater peripheral hearing loss may not receive perceptual benefits from 10 weeks of choir participation, their neural responses may still be enhanced through this experience. An interesting line of inquiry for a future study would be to address whether individuals with greater hearing loss may be able to obtain similar benefits by participating in choir training in conjunction with the use of hearing aids.

Overall, group choral singing appears to be uniquely well suited for this training paradigm, as it encourages singers to produce (and hopefully, perceive) fine-tuned adjustments in pitch structure, which seem to play a major role in improving speech-in-noise perceptual outcomes in this population. The intrinsic relationship with speech, rapid sensory-motor integration, instantaneous feedback afforded by vocal production and auditory perception, and the innate nature of this ability suggest singing as an ideal candidate for improving a speech-related perceptual issue. Group singing is highly motivating, social, and emotionally fulfilling; this is of immense import in developing interventions that will encourage engagement and promote active involvement, especially with older adults. Overall, running a choir is an immensely scalable intervention, requiring minimal cost and equipment (and which could be implemented anywhere), and this study demonstrated that training-related improvements in auditory perception can appear after a very short intervention period (10 weeks of choir singing). The efficacy of this intervention can easily be assessed through experimental manipulations of dose or duration (e.g., using longer periods of choir singing, or assessing persistence of effects post-training), different study populations (e.g., hearing aid users vs. unaided individuals), and with different emphases during the class (e.g., focusing on pitch matching/singing in unison vs. attending to different melodic or harmonic lines). The ease of implementation and scalability of the choir singing paradigm, efficacy at improving auditory abilities in aging adults, and rich opportunity for further investigation suggest choir singing as an ideal framework for examining musical training as an auditory rehabilitation for aging adults.

#### CONCLUSION

Group singing is an intuitive, engaging, and motivating form of music making that has been shown in previous studies to contribute to social, emotional, cognitive, and physical well-being. The current findings suggest that choir singing can be used as an effective intervention to mitigate age-related losses in auditory perceptual abilities, in as short a time as 10 weeks. Importantly, these findings showed that this intervention improved older adults' abilities to perceive speech in noisy environments, a key concern in promoting healthy aging. This work provides an empirical basis for a highly scalable and effective intervention that could significantly improve quality of life in older adults.

## DATA AVAILABILITY STATEMENT

The datasets generated for this study are available on request to the corresponding author.

## ETHICS STATEMENT

The studies involving human participants were reviewed and approved by Ryerson University Ethics Board (REB). The patients/participants provided their written informed consent to participate in this study.

## AUTHOR CONTRIBUTIONS

ED and FR conceived and designed the study. ED recruited the participants, collected the data, analyzed the results, and wrote up the findings as a Master's thesis with FR as supervisor. EW collected the data, revised the literature review, and contributed to the theoretical interpretation of findings. GN developed the MATLAB toolbox for processing EEG data. All authors were involved in discussions about the interpretation of the results and the writing of the manuscript.

## FUNDING

This work was supported by a Research Chair sponsored by Sonova Holding AG awarded to FR, as well as a grant from the Natural Sciences and Engineering Research Council of Canada (RGPIN-2017-06969).

## ACKNOWLEDGMENTS

We would like to thank Saul Moshe-Steinberg, James McGrath, and Paolo Ammirante for their work toward establishing a subset of the protocols implemented in the current investigation. We are also indebted to Sina Fallah for leading the singing classes and to the 50+ Program of the Chang School of Continuing Education (Ryerson University) for their partnership in all aspects of this research.

## REFERENCES



click-evoked brainstem response. Hear. Res. 59, 179–188. doi: 10.1016/0378-5955(92)90114-3

in older adults. Aging Ment. Health 22, 1–8. doi: 10.1080/13607863.2017.1328481

Y. E. Cohen, A. N. Popper and R. R. Fay (Berlin: Springer), 293–327. doi: 10.1007/978-1-4614-2350-8\_10

**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Dubinsky, Wood, Nespoli and Russo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Effects of Bimanual Coordination in Music Interventions on Executive Functions in Aging Adults

#### Jennifer A. Bugos\*

School of Music, Center for Music Education Research, University of South Florida, Tampa, Tampa, FL, United States

Music training programs have been shown to enhance executive functions in aging adults; however, little is known regarding the extent to which different types of bimanual coordination (i.e., fine and gross motor) in music instruction contribute to these outcomes. The aim of this study was to examine the effects of bimanual coordination in music interventions on cognitive performance in healthy older adults (60–80 years). Participants (N = 135) completed motor measures and a battery of standardized cognitive measures, before and after a 16-week music training program with a 3 h practice requirement. All participants were matched by age, education, and estimate of intelligence to one of three training programs: piano training (fine motor), percussion instruction (gross motor), and music listening instruction (MLI) (no motor control condition). Results of a Repeated Measures ANOVA revealed significant enhancements in bimanual synchronization and visual scanning/working memory abilities for fine and gross motor training groups as compared to MLI. Pairwise comparisons revealed that piano training significantly improved motor synchronization skills as compared to percussion instruction or music listening. Results suggest that active music performance may benefit working memory; the extent of these benefits may depend upon coordination demands.

#### Edited by:

Assal Habibi, University of Southern California, United States

#### Reviewed by:

Eckart Altenmüller, Hanover University of Music Drama and Media, Germany

Nils Henrik Pixa, Chemnitz University of Technology, Germany

#### \*Correspondence:

Jennifer A. Bugos bugosj@usf.edu

Received: 05 July 2019 Accepted: 05 November 2019 Published: 05 December 2019

#### Citation:

Bugos JA (2019) The Effects of Bimanual Coordination in Music Interventions on Executive Functions in Aging Adults. Front. Integr. Neurosci. 13:68. doi: 10.3389/fnint.2019.00068

Keywords: music training, executive functions, piano training, cognitive intervention, older adult

## INTRODUCTION

Bimanual coordination is a common element of instrumental music performance that requires collaboration between the hands. Bimanual coordination is often measured based upon in-phase, antiphase, or out-of-phase movements calculated by phase angles of each limb at discrete timepoints. However, in music performance, bimanual coordination can also be accounted for in hand synchronization, pattern accuracy, and rhythmic accuracy while performing hand movements in parallel, contrasting, and oblique motion. This research employs this musical definition of bimanual coordination with respect to pattern, synchronization, and rhythmic accuracy and examines its relationship to music training and cognitive performance.

Bimanually coordinated movements can be varied by level of task complexity, task difficulty, absence or presence of feedback, and level/amount of training or task practice (Maes et al., 2017). According to information processing theory, bimanual coordination is an example of a dual-task situation that includes interference between simultaneously performed tasks, placing demands upon cognitive and neural resources (Swinnen and Gooijers, 2015). Practicing a musical instrument can offer a level of complexity for novice instrumentalists using a dual-task situation of bimanual motor movements coupled with auditory feedback, which may exercise areas of cognition. Our hypothesis is that bimanual coordination tasks, such as performance on a musical instrument, may contribute to improving and maintaining cognitive skills in aging adults.

The purpose of this study was to examine the effects of bimanual coordination in musical tasks with graded dual-task load on cognitive and motor outcomes in healthy older adults (60–80 years) as compared to a non-motor music training intervention, music listening instruction (MLI). Specifically, we evaluated the effects of music training with fine motor (i.e., group piano training), gross motor (i.e., group percussion ensemble), and no motor (i.e., music listening instruction) requirements on bimanual coordination skills and executive functions. We also examined the relationship between bimanual coordination performance as measured by pattern, synchronization, and rhythmic accuracy, and performance on measures of executive functions. Based upon the Supply and Demand framework in aging adults, fine motor training may contribute to corticocerebellar networks (Seidler et al., 2010). Therefore, we hypothesized that those allocated to group piano training who practice fine motor skills in piano training may exhibit greater enhancements in executive functions than the other two interventions.

Musicians devote many practice hours to the development of complex bimanual sensorimotor skills, which vary based upon instrumentation (Krampe and Ericsson, 1996; Krampe, 2002; Kim et al., 2017). Gross motor skills refer to movements that require coordination of arms, legs, and other large muscle groups. For example, percussionists with hand drums or mallets utilize gross motor skills. Fine motor skills, such as those used by wind players and pianists, require the coordination of small muscles such as the digits of the hand. While all musical instruments require precisely timed movements, differentiation in task requirements by instrument and types of motor coordination may impact transfer to motor skills and the degree of transfer to cognitive performance. For instance, Kim et al. (2017) showed correlations between synchronization errors committed on an electric drum task by older adults and errors committed on cognitive tasks of executive functions. This research explored relationships between motor performance and executive functions in the context of a training study.

#### Expert Musicians and Motor Control

Few studies examine the emergence of skilled bimanual coordination in novice musicians; however, studies of professional musicians can provide insight into the neural mechanisms implicated through intense musical training. In professional musicians with extensive practice, neuroplastic changes have been found that correspond to the trained instrument. For instance, in professional string players, larger cortical representations for fingers of the left hand were found as compared to non-musicians (Elbert et al., 1995). Expert musicians were shown to have a larger anterior corpus callosum as compared to non-musicians, suggesting more efficient interhemispheric communication (Schlaug et al., 1995).

Many examples of experience-dependent plasticity have been found in musicians who engage in fine motor control. For example, wind players showed greater cortical thickness in lip- and tongue-related brain areas when compared to non-musicians (Choi et al., 2015). Similar to these findings, research in professional pianists showed differential cortical activation patterns (Koeneke et al., 2004), structural and functional activity in auditory and motor areas (Krings et al., 2000), and changes to white matter integrity and gray matter density (Han et al., 2009). If neurological evidence indicates neural changes resulting from instrumental training (Palomar-García et al., 2017), what are the functional differences in cognitive outcomes from engaging in different types of bimanual coordination in adults?

## Bimanual Coordination and Cognition in Aging

Age-related cognitive decline is associated with deficits in executive functions and motor coordination (Seidler et al., 2010; Fujiyama et al., 2013; Bernard and Seidler, 2014; Rueda-Delgado et al., 2019). Motor deficits may be due to the gradual degeneration of the neuromuscular system, contributing to sensory and motor challenges. However, older adults can learn new motor skills despite the availability of fewer cognitive resources (Seidler, 2007).

Research suggests that older adults recruit additional neural resources when compared with younger adults (Heuninckx et al., 2005; Rueda-Delgado et al., 2019). For instance, neuroimaging research has shown that older adults completing coordination tasks demonstrate increased activation in the posterior cerebellum, an area associated with coordinated movements, when compared to younger adults (Heuninckx et al., 2005). Additional research by Rueda-Delgado et al. (2019) found functional reorganization after training in older adults, as compared to younger adults, who completed a novel bimanual gross motor task. Specifically, beta power was reduced after task training in older adults, suggesting higher neural activity.

The "Supply and Demand" Framework (Seidler et al., 2010) provides a potential explanation for age-related changes in the neural control of movement. This framework accounts for structural and functional declines (demand), shown to increase with age, in the motor cortices, cerebellum, and basal ganglia pathways. Thus, older adults rely more heavily upon cognitive resources for motor control. In addition, aging adults demonstrate reductions in attentional capacity and cognitive resources (supply) due to the degradation of the prefrontal cortex and anterior corpus callosum. The framework suggests that the dopaminergic system acts on the corticocerebellar neural pathways projecting from the cerebellum to the frontal cortices, areas associated with higher cognitive processes. By engaging neural pathways associated with cerebellar activity in fine motor tasks such as piano training, the corticocerebellar pathway may be strengthened. Activations in the corticocerebellar pathway can influence neural architecture such as the dorsolateral prefrontal cortex, an area associated with executive functions (Watson et al., 2014). While Seidler et al. (2010) have suggested that training bimanual coordination skills may reduce the potential for cognitive decline, few cognitive training programs consider bimanual motor coordination in the context of a music intervention.

Some interventions for aging adults include bimanual coordination tasks, such as multisensory exercise programs (Moreira et al., 2018) and juggling interventions (Draganski et al., 2004; Scholz et al., 2009). Results of a randomized controlled trial showed that multisensory exercise training with bimanual hand movements benefited cognitive and motor outcomes in institutionalized older adults (Moreira et al., 2018). In addition, neuroimaging data in adults who participated in a 90-day juggling intervention showed increased gray matter volume in the mid-temporal area bilaterally and in the left posterior intraparietal sulcus (Draganski et al., 2004). Boyke et al. (2008) replicated the juggling intervention in a sample of older adults, in which similar neuroplastic changes were found post-training. A significant increase in gray matter in the hippocampus and nucleus accumbens was found for older adult jugglers. Training on a musical instrument includes bimanual motor control in a temporal context, similar to complex motor activities such as juggling.

### Piano Training and Cognition

Piano performance requires complex fine motor control and integrates auditory feedback in a temporal context. Research studies have found that piano training enhances several executive functions such as working memory (Bugos et al., 2007; Guo et al., 2018), spatial reasoning (Rauscher and Zupan, 2000), verbal fluency (Bugos and Kochar, 2017), and cognitive control (Seinfeld et al., 2013) in adults and children. Preschool children who received piano training programs showed increased spatial-temporal reasoning (Rauscher and Zupan, 2000). A 6-week keyboard harmonica intervention improved working memory in young children as compared to no treatment controls (Guo et al., 2018). In addition, piano training was shown to significantly increase auditory word discrimination skills in young children as compared to a reading training program and no treatment controls (Nan et al., 2018). Collectively, studies in young children showed the impact of piano training on working memory, verbal skills, and reasoning abilities as compared to control conditions.

Additional research has examined the effects of piano training on cognitive performance in aging adults who are novice musicians. Bugos et al. (2007) found that older adults who were randomly assigned to a 6-month individualized piano training program outperformed those assigned to a no treatment control group on a series of standardized measures of executive functions and working memory. Data suggested that some areas of executive functions, such as perceptual speed and visual scanning, were maintained after a 3-month delay period.

In another study, Seinfeld et al. (2013) examined the impact of a four-month piano intervention on cognitive control as measured by the Spanish version of the Stroop task, visual spatial performance as measured by the Trail Making Test Card A, and wellbeing as measured by the Beck Depression Inventory, Profile of Mood States, and World Health Organization Quality of Life Brief Questionnaire. While adults in Seinfeld et al. (2013) were not randomly assigned, post-training data showed enhanced performance on measures of cognition, particularly in response inhibition, for those engaged in group piano training compared to a group of older adults engaged in recreational activities. There is a need for additional experimental research on the impact of piano training on cognitive outcomes in older adults and for comparison of these outcomes with those of percussion-based interventions.

## Gross Motor Coordination in Percussion Programs

Research in the music therapy literature showed that gross motor performance on non-pitched percussion instruments (e.g., drums, congas, or djembes) in drum circles can improve health-related outcomes such as stress and anxiety. For example, patients with dementia who received a drumming program for 6 weeks (twice weekly) demonstrated significantly reduced anxiety compared to no treatment controls (Sung et al., 2012). Other variables affecting health, such as blood pressure, have been shown to change after drumming. Researchers in Africa found that older adults with high blood pressure who participated in three 40-min djembe drumming sessions demonstrated reduced systolic blood pressure post-training (Smith et al., 2014).

Research has also found that drumming can impact psychosocial and cognitive outcomes. For example, patients with Parkinson's disease who received a 6-week West African drumming class (twice per week) self-reported higher quality of life than controls (Pantelyat et al., 2016). In another study, nine participants with Huntington's disease demonstrated increased performance on measures of executive functions after 2 months of a drumming intervention (Metzler-Baddeley et al., 2014). Neurological data further suggested that the training may have contributed to changes in the genu of the corpus callosum. While only a few studies in the music therapy literature actively implement percussion performance, we seek to evaluate a different kind of percussion training program, mallet training, in healthy older adults.

Few studies have researched the impact of percussion training on cognitive outcomes in healthy older adults. Dege and Kerkovius (2018) showed that 15 weeks of drumming (60 min, weekly) enhanced working memory in aging females compared to a literature group and a no treatment group. Another study found that beginning adult musicians who completed an 8-week mallet training program demonstrated no significant differences in cognitive outcomes, though a non-significant trend was noted for processing speed. Researchers found increased self-efficacy compared to an autobiographical writing group (Bugos and Cooper, 2019). While both studies acknowledged limitations in sample size, these studies included a progressively difficult curriculum and a social learning environment, factors positively associated with musical cognitive training programs (Bugos, 2014). Therefore, this research included training groups with a social learning environment (i.e., group-based format) and a progressively difficult curriculum (see **Supplementary Material**).

The impact of fine motor and gross motor music training on cognitive and motor performance in aging adults remains unclear. The purpose of this study was to investigate the effects of bimanual coordination in fine motor training (Group Piano Training; GPI), gross motor training (Group Percussion Ensemble; GPeI), and a no motor control regimen (Music Listening Instruction; MLI) on cognitive performance and bimanual coordination in healthy adults. It was hypothesized that older adults assigned to the instrumental interventions, GPI and GPeI, would outperform matched controls enrolled in MLI on measures of executive functions. Since fine motor activities such as piano training have been more closely linked to corticocerebellar pathways associated with bimanual coordination (Herrojo Ruiz et al., 2017; Jirenhed et al., 2017), it was hypothesized that those in GPI would demonstrate increased bimanual coordination as compared to GPeI. However, we predicted that those in GPeI would demonstrate increased rhythmic coordination.

## MATERIALS AND METHODS

#### Participants

One hundred eighty non-musicians (60–80 years) were recruited from a mid-size city in the Southeastern United States and a series of surrounding rural communities. Recruitment included flyers, presentations to local Councils on Aging and church groups, and media coverage of the research. Inclusion criteria were: age between 60 and 80 years, native English speaker, no history of colorblindness, no prior history of neurological impairment (e.g., dementia or stroke), no difficulty with the movement of the hands or persistent tremor, not currently taking any psycho-reactive medications or those affecting memory performance (e.g., sleep medications, antidepressants), less than 3 years of prior musical training, and not currently engaged in music reading or musical performance. Participants were screened for cognitive impairment with the Telephone Interview for Cognitive Status (TICS; Brandt et al., 1988), a short, reliable screening measure. TICS scores ≥ 30 suggest no cognitive impairment and correspond to those found on the Mini-Mental State Exam (Fong et al., 2009). Informed written consent was obtained from all participants in accordance with the procedures of the University Institutional Review Board.

One hundred thirty-five participants completed the research study (**Table 1**). While each music training intervention began with 60 adults, the final sample included GPI (n = 49), MLI (n = 48), and GPeI (n = 38). Attrition was due to lack of assignment to the preferred group, personal illness, family illness, caregiver responsibilities, or a financial need to return to the workplace. The attrition rate was 24%, similar to other research employing interventions for older adults (Jancey et al., 2007).

#### Procedure

All participants completed baseline measures of music aptitude and an estimate of intelligence. A series of dependent measures of executive functions (i.e., visual scanning/working memory, processing speed, verbal fluency, and cognitive control) and motor measures (i.e., finger dexterity and bimanual coordination) were repeated at post-testing. Alternate forms of measures (e.g., Form A and B for the Verbal Fluency subtest) were used whenever possible to avoid the potential for practice effects. All measures were administered by two highly trained research assistants with experience in psychological testing for aging adults. Both research assistants were blind to the participants' assigned condition. Participants were matched by age, education, and intelligence; the three individuals in each matched set were then randomly assigned, one to each of the three interventions. Sixty participants were assigned to each of three training interventions. Fifteen participants were allocated to each class session within each training intervention (i.e., music listening, group piano training, and group percussion ensemble). Trainers remained the same for the study duration and each conducted four separate classes for their prescribed intervention. All participants received 16 weeks of training sessions, which met for 45 min weekly.

VIQ, Verbal Intelligence Quotient; PIQ, Performance Intelligence Quotient; FSIQ, Estimate of Full Scale Intelligence Quotient scores; AMMA, Advanced Measures of Music Audiation.

Since success in music programs depends upon access to practice and consolidation of learning, all participants were required to practice training exercises for 30 min per day or 3 h per week. All practice homework was checked and practice times logged. The training programs were structured similarly, with 10 min of review (e.g., homework or practice log checks and review of concepts), 25 min of group performance of new skills/concepts/repertoire, and 10 min for self-practice and assessment. Training fidelity checks were conducted bi-weekly in each intervention by two separate research assistants. These checks ensured that time was allocated equivalently between interventions and that the trainer adhered to the research protocols, which included the training curriculum. A separate research assistant was solely responsible for collecting homework and checking practice logs for completion.

#### Description of Programs

Group Piano Instruction (GPI) consisted of a basic piano course which included technical exercises (e.g., major scales), finger dexterity exercises (e.g., Hanon exercises), and basic piano repertoire selected from the Alfred All-in-One text (Palmer et al., 1995).

Classes were taught in a Yamaha Clavinova electronic piano lab by a certified music educator with 20+ years of piano teaching experience. Participants either borrowed keyboards for practice from the university or completed practice sessions at a local church. Weekly practice assignments were provided that consisted of finger exercises, written theory assignments, and two practice pieces.

Group Percussion Instruction (GPeI) consisted of a basic mallet program which included technical exercises (e.g., major scales), music theory, ostinati exercises (e.g., Music For Children: Volume 1, Murray, 1976), and repertoire chosen from Get America Singing Again vol. 2 (Hal Leonard Publishing Corporation, 1997) and Hot Marimba (Hampton, 1995). Each participant borrowed a soprano or alto Peripole-Bergerault Orff rosewood xylophone that was brought to each class held at the local senior center. Classes were taught by the same music educator. Weekly homework assignments were provided that consisted of mallet exercises (e.g., scales, ostinati, and patterning), written music theory assignments, and two percussion ensemble pieces.

Music Listening Instruction (MLI) consisted of a basic music appreciation course with information about genre, composers, and forms in classical and world music based upon the Music Listening Today text (Hoffer, 2005). All music listening classes were held at a local senior center and were taught by the same music educator. Participants were loaned copies of the texts and accompanying enhanced CDs for the duration of the program. The enhanced CDs not only provided audio practice but could also be inserted into computers (either at home or at the senior center) to view a listening map containing illustrated formal elements of the pieces unfolding over time. Weekly homework assignments included listening assignments and questions that required written responses.

#### Description of Measures

#### Baseline Measures

The Wechsler Abbreviated Scale of Intelligence (WASI; Wechsler, 1999) was used to generate a Verbal Intelligence Quotient (VIQ), Performance Intelligence Quotient (PIQ), and an estimate of Full-Scale Intelligence Quotient (FSIQ) (**Table 1**). The WASI consists of four subtests: Vocabulary, Block Design, Similarities, and Matrix Reasoning. The Vocabulary subtest is an untimed measure in which participants orally define 42 words based upon visual stimuli. The Similarities subtest involves 26 sets of paired words in which participants respond as to how the two words are similar. The Block Design subtest measures visuospatial skills with 13 timed block patterns constructed from nine cubes. Items of the Matrix Reasoning subtest measure non-verbal reasoning skills with the selection of a missing piece to complete a visual pattern from five potential items. The Vocabulary and Similarities subtests contribute to the VIQ estimate, while the Block Design and Matrix Reasoning subtests contribute to the PIQ estimate. The WASI scores correspond to intelligence scores from the longer Wechsler Adult Intelligence Scale III (Axelrod, 2002) and provide norms for a broad age range (6–89 years).

To measure musical aptitude, participants completed the Advanced Measures of Music Audiation (AMMA; Gordon, 1989). The AMMA consists of 30 paired-melodic phrases and requires participants to differentiate changes in piano melodies. Melodies are either tonally altered, rhythmically altered, or the same. The AMMA was chosen for its reliability (r = 0.81) and content validity.

## Dependent Measures of Executive Functions

#### Visual Scanning and Working Memory

The Trails Test Card A and B (TMT; Reitan and Wolfson, 1985) was administered as a measure of visual scanning and working memory. The test requires sequencing, visuomotor speed, and mental flexibility. Card A requires drawing a line connecting numeric stimuli in sequential order, while Card B requires switching between numeric and alphabetical stimuli in sequential order (1, A, 2, B, etc.). TMT scores reflected the time to complete the card (in seconds) and the number of errors made on the task. In order to examine the cognitive domain without the motor element, a TMT delta score was calculated: time to complete Card A was subtracted from time to complete Card B. Construct validity for the TMT was established by correlations (0.36–0.93) with an object-finding task and a hidden pattern task (Ehrenstein et al., 1982). Reported reliability coefficients for the TMT were r = 0.94 for Part A and r = 0.90 for Part B (Spreen and Strauss, 1998).
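The delta score described above is a simple difference of completion times. As a minimal sketch (the class and function names are illustrative assumptions, and the participant values are hypothetical rather than study data), it could be computed as follows:

```python
from dataclasses import dataclass


@dataclass
class TrailsResult:
    """Completion times in seconds for one participant at one time point."""
    card_a_seconds: float
    card_b_seconds: float


def tmt_delta(result: TrailsResult) -> float:
    """Card B time minus Card A time, as described in the text."""
    return result.card_b_seconds - result.card_a_seconds


# Hypothetical participant: 35 s on Card A, 80 s on Card B.
example = TrailsResult(card_a_seconds=35.0, card_b_seconds=80.0)
print(tmt_delta(example))  # 45.0
```

A larger delta indicates a greater time cost for the switching demands of Card B relative to the simple sequencing of Card A.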

#### Processing Speed

The Paced Auditory Serial Addition Task (PASAT; Gronwall, 1977) was administered to evaluate processing speed with complex attention. PASAT includes four trials, each consisting of 25 items in which a string of single digit numbers is presented aurally with progressively faster interstimulus intervals. Respondents summed the last number with the new number and verbal responses were recorded. The PASAT was chosen for its reliability (r = 0.95; Nikravesh et al., 2017) and sensitivity in aging adults (Tombaugh, 2006).
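The scoring rule described above (each new digit is added to the digit presented immediately before it) can be sketched as follows. This is an illustrative assumption about how responses might be recorded, with `None` standing for a skipped item; it is not the scoring software used in the study, and the digit string is hypothetical.

```python
from typing import List, Optional


def pasat_targets(digits: List[int]) -> List[int]:
    """Expected answers: each digit added to the one presented just before it."""
    return [prev + cur for prev, cur in zip(digits, digits[1:])]


def pasat_number_correct(digits: List[int],
                         responses: List[Optional[int]]) -> int:
    """Count responses that match the expected sums; None marks a skipped item."""
    targets = pasat_targets(digits)
    if len(responses) != len(targets):
        raise ValueError("expected one response slot per target sum")
    return sum(1 for target, given in zip(targets, responses) if given == target)


# Hypothetical digit string and responses (third item skipped).
digits = [3, 5, 2, 7, 4]
print(pasat_targets(digits))                             # [8, 7, 9, 11]
print(pasat_number_correct(digits, [8, 7, None, 11]))    # 3
```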

#### Verbal Fluency

The Delis Kaplan Executive Function Measure Verbal Fluency subtest (Delis et al., 2001) includes two forms to minimize practice effects and measures three types of fluency conditions: letter fluency, category fluency, and category switching. Each of the three letter fluency trials consisted of naming as many words as possible that begin with a given letter (F, A, S at pretesting; B, H, R at post-testing) in 60 s. Words generated could not be names of people, places, or numbers. Category fluency consisted of naming items pertaining to a specific category, such as animals or girls' names, in 60 s. Category switching trials required switching between the generation of words belonging to two different categories, such as fruit and furniture. Reliability coefficients for the conditions of the Verbal Fluency subtest are letter fluency (r = 0.88), category fluency (r = 0.82), and category switching (r = 0.51) (Delis et al., 2001).

#### Response Inhibition

The Cued Color Word Stroop (Perlstein et al., 2006) was used to measure response inhibition in the visual domain with Color and Word cues. Each trial began with a 750 ms visual cue (Word or Color), followed by a 1000 ms delay and a visual stimulus that was presented for up to 2500 ms and terminated by the participant's response. Words were presented in one of three colors (red, blue, and green). Participants chose between the ink color and the physical word depending upon the instructions of the cue. For the color condition, participants responded to the ink color of the word. For the word condition, participants responded to the name of the physical word on the screen. A total of 360 trials were presented in 4 blocks of 90 trials in the following order: mixed, color, mixed, and word, with pseudorandomized stimuli. Visual stimuli were presented in the center of a black screen and delivered on an Apple Macintosh MacBook Pro computer using E-Prime Professional 2.0 software (Schneider et al., 2002) with a Psych Scope button box response apparatus. Participants were instructed to respond as quickly and accurately as possible. Prior to administration, participants successfully completed a practice block of each condition (color, word, or mixed) with 70% accuracy.
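To make the block structure and cue logic concrete, the following sketch builds a 360-trial session in the stated block order. It is an illustration only, not the E-Prime script used in the study: the uniform random sampling of words, ink colors, and cues, the seed, and all names (`Trial`, `build_session`) are assumptions, since the text specifies only that stimuli were pseudorandomized.

```python
import random
from dataclasses import dataclass
from typing import List

COLORS = ["red", "blue", "green"]
BLOCK_ORDER = ["mixed", "color", "mixed", "word"]  # order stated in the text
TRIALS_PER_BLOCK = 90
CUE_MS, DELAY_MS, MAX_STIMULUS_MS = 750, 1000, 2500  # timings stated in the text


@dataclass
class Trial:
    block: str   # block type the trial belongs to
    cue: str     # "Color" or "Word"
    word: str    # the printed word
    ink: str     # the ink color of the printed word
    answer: str  # the response that counts as correct under the cue


def build_session(seed: int = 0) -> List[Trial]:
    """Build one session; sampling scheme is assumed, not taken from the study."""
    rng = random.Random(seed)
    trials: List[Trial] = []
    for block in BLOCK_ORDER:
        for _ in range(TRIALS_PER_BLOCK):
            if block == "mixed":
                cue = rng.choice(["Color", "Word"])
            else:
                cue = "Color" if block == "color" else "Word"
            word, ink = rng.choice(COLORS), rng.choice(COLORS)
            answer = ink if cue == "Color" else word
            trials.append(Trial(block, cue, word, ink, answer))
    return trials


session = build_session()
print(len(session))                          # 360
print(CUE_MS + DELAY_MS + MAX_STIMULUS_MS)   # 4250 (longest possible trial, ms)
```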

## Dependent Motor Measures

#### Manual Dexterity

The Finger Tapper Test (FTT; Reitan and Wolfson, 1993) was used to evaluate manual dexterity. Participants depress a key with the index finger in 10-s trials until either a score within a five-tap range is reached or ten trials are completed per hand. Research in healthy older adults suggests that the FTT is a promising measure of upper body motor performance, and performance on the FTT was not correlated with performance on standardized measures of working memory, processing speed, or spatial organization (Ashendorf et al., 2009).

#### Bimanual Coordination

The Bimanual Coordination Task (BMCT; Bugos and Iordache, 2019) is a gross motor task that evaluates parallel motion, contrary motion, and oblique motion with 15 separate color-coded visual patterns, each containing 8 beats in two rows (numbered boxes represent beats in a measure), one for the right hand and one for the left hand (**Figure 1**). The top row represents the right-hand pattern and the bottom row represents the left-hand pattern. Boxes on the visual represent the beats in two sets of four-beat patterns. Respondents are allocated 1 min to study each visual pattern and then perform the pattern on a color-coded four-octave Yamaha synthesizer with a metronome (90 bpm). All patterns utilize black keys on the piano keyboard in sets of twos or threes. All keys in a set of two or three black keys must be pressed at the same time and held down. The first four practice items encompass responses with one hand and are followed by the inclusion of both hands. Thus, no previous knowledge of music reading or the piano keyboard is necessary to complete the task. The task was chosen for its reliability (r = 0.88) in a group of healthy older adults with no previous musical training (Bugos and Iordache, 2019). Reliability coefficients for the sub-scores of rhythm (r = 0.82), synchronization (r = 0.76), and pattern (r = 0.78) demonstrate good psychometric properties.

#### Other Variables

Mood was evaluated at baseline and at post-testing with the Beck Depression Inventory II (Beck et al., 1996) and the Geriatric Depression Scale (Yesavage, 1991), to ensure that mood had not changed significantly over the course of the intervention. While completion of the GDS has been shown to be less challenging for older adults, the sensitivity to change has been found to be more robust in the BDI. Thus, both were used as valuable instruments to evaluate changes in mood (Olin et al., 1992).

#### Statistical Analysis

Statistical analyses were carried out with IBM SPSS 24.0 (IBM Corp, 2016). Four measures were entered into a Group × Time ANOVA on executive functions: Trail Making Test Delta Scores (visual scanning/working memory), Paced Auditory Serial Addition Task mean number correct (processing speed), Category Switching mean correct (verbal fluency), and Stroop mean errors (response inhibition), with Bonferroni correction to control for Type I error. The analysis of motor outcomes included the mean number correct for the dominant and non-dominant hands for the Finger Tapper Test (manual dexterity), and each of three domains for the Bimanual Motor Coordination Task (motor coordination): Hand Synchronization, Pattern Accuracy, and Rhythmic Accuracy. Motor outcomes were entered into a separate Group × Time ANOVA to evaluate differences across each domain of motor performance with Bonferroni correction.
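For readers who wish to see the logic of a Group × Time interaction test, the sketch below relies on a standard equivalence: with only two time points, the interaction F ratio equals the F ratio of a one-way ANOVA on post-minus-pre gain scores. This is not the SPSS procedure used in the study; the function name and the data are hypothetical, and p-values and the Bonferroni correction are omitted.

```python
from statistics import mean
from typing import Dict, List, Tuple


def interaction_f(pre: Dict[str, List[float]],
                  post: Dict[str, List[float]]) -> Tuple[float, int, int]:
    """Group x Time interaction F via a one-way ANOVA on post-minus-pre gains."""
    gains = {g: [b - a for a, b in zip(pre[g], post[g])] for g in pre}
    all_gains = [x for xs in gains.values() for x in xs]
    grand = mean(all_gains)
    # Between-group and within-group sums of squares of the gain scores.
    ss_between = sum(len(xs) * (mean(xs) - grand) ** 2 for xs in gains.values())
    ss_within = sum((x - mean(xs)) ** 2 for xs in gains.values() for x in xs)
    df_between = len(gains) - 1
    df_within = len(all_gains) - len(gains)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical scores for three participants per group (not study data).
pre = {"GPI": [10.0, 12.0, 11.0], "GPeI": [10.0, 11.0, 12.0], "MLI": [11.0, 10.0, 12.0]}
post = {"GPI": [13.0, 16.0, 14.0], "GPeI": [11.0, 13.0, 13.0], "MLI": [11.0, 11.0, 12.0]}
f_value, df1, df2 = interaction_f(pre, post)
print(round(f_value, 2), df1, df2)
```

In this toy data set the groups differ in how much they gain from pretest to posttest, which is what a Group × Time interaction captures.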

## RESULTS

Demographic data for participants (N = 135) can be found in **Table 1**. Results of an ANOVA across demographic variables showed no significant (p < 0.05) differences between groups in age, F(2, 132) = 0.381, p = 0.68; education, F(2, 132) = 1.52, p = 0.22; estimate of full scale intelligence, F(2, 132) = 0.99, p = 0.37; or music aptitude, F(2, 132) = 0.36, p = 0.70.

## Executive Functions

A Group (GPI, MLI, GPeI) × Time (Pretest, Posttest) Repeated Measures ANOVA was conducted across all measures of executive functions (**Table 2**): visual scanning/working memory (TMT Delta), processing speed (PASAT), verbal fluency (Category Switching-DKEFS), and response inhibition (Stroop). Results showed a main effect of time for processing speed, F(1, 127) = 29.23, p < 0.001, $\eta_p^2$ = 0.187, $\omega_p^2$ = 0.179, and for verbal fluency, F(1, 127) = 12.04, p = 0.001, $\eta_p^2$ = 0.086, $\omega_p^2$ = 0.079. Effect sizes for processing speed and verbal fluency were found to be moderate to large (Field, 2013). However, no main effect of time was found for response inhibition, F(1, 127) = 1.66, p = 0.20, $\eta_p^2$ = 0.013, $\omega_p^2$ = 0.005, or visual scanning/working memory, F(1, 127) = 1.93, p = 0.17, $\eta_p^2$ = 0.015, $\omega_p^2$ = 0.007.
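The partial eta squared values reported above can be recovered from each F ratio and its degrees of freedom. The sketch below assumes the standard relation, partial eta squared = (F × df_effect) / (F × df_effect + df_error); the function name is illustrative, and partial omega squared is not covered.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and degrees of freedom positive")
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Main effect of time for processing speed, F(1, 127) = 29.23.
print(round(partial_eta_squared(29.23, 1, 127), 3))  # 0.187
# Main effect of time for response inhibition, F(1, 127) = 1.66.
print(round(partial_eta_squared(1.66, 1, 127), 3))   # 0.013
```

The same function can be used as a consistency check on any F ratio reported with its degrees of freedom.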

Data showed a significant Group × Time interaction for visual scanning and working memory as denoted by the TMT Delta scores, F(2, 127) = 3.47, p = 0.03, $\eta_p^2$ = 0.052, $\omega_p^2$ = 0.037 (**Figure 2**). No significant differences were found for verbal fluency in category switching, F(2, 127) = 0.84, p = 0.21, $\eta_p^2$ < 0.001, $\omega_p^2$ < 0.001, or for cognitive control as measured by errors committed on the Stroop task, F(2, 127) = 0.84, p = 0.44, $\eta_p^2$ < 0.001, $\omega_p^2$ < 0.001. However, a trend was found for processing speed on the PASAT, F(2, 127) = 2.49, p = 0.08, $\eta_p^2$ = 0.038, $\omega_p^2$ = 0.022 (**Figure 3**).

A secondary analysis of independent trials was conducted with a Group × Time ANOVA by cognitive domain to examine performance across specific areas of executive functions.

#### Visual Scanning and Working Memory

Results of an ANOVA on TMT Card A and B completion times showed a main effect of time on Card A, F(1, 132) = 3.86, p = 0.05, $\eta_p^2$ = 0.028, $\omega_p^2$ = 0.021, and Card B, F(1, 132) = 5.76, p = 0.02, $\eta_p^2$ = 0.042, $\omega_p^2$ = 0.034. No Group × Time interaction was found for Card A, F(2, 132) = 1.50, p = 0.23, $\eta_p^2$ = 0.022, $\omega_p^2$ = 0.007, and a trend was found for Card B, F(2, 132) = 2.67, p = 0.07, $\eta_p^2$ = 0.039, $\omega_p^2$ = 0.024.

#### Processing Speed

An independent trial analysis revealed a significant main effect of time for all four trials, F(1, 132) = 11.09, p = 0.001, $\eta_p^2$ = 0.078, $\omega_p^2$ = 0.070 (Trial 4, most challenging trial). However, only Trial 3 showed a Group × Time interaction, F(2, 132) = 3.86, p = 0.02, $\eta_p^2$ = 0.055, $\omega_p^2$ = 0.041. Pairwise comparisons revealed no significant differences (p = 0.538), suggesting that the variance in scores may have contributed to this finding.

Dyad scores (i.e., the number of consecutive responses) were also analyzed (**Table 2**), since it is common for adults to skip items on this measure to compensate for task difficulty. Results of a Group × Time ANOVA on dyad scores revealed a main effect of time, F(1, 132) = 35.58, p = 0.001, $\eta_p^2$ = 0.212, $\omega_p^2$ = 0.205, and no Group × Time interaction, F(2, 132) = 1.76, p = 0.18, $\eta_p^2$ = 0.026, $\omega_p^2$ = 0.011.
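A dyad score can be sketched as a count of correct responses that immediately follow another correct response. This reading of "consecutive responses" is an assumption, as the text does not give the exact scoring rule; the function name and the response sequence are hypothetical.

```python
from typing import List


def dyad_score(correct_flags: List[bool]) -> int:
    """Count correct responses that directly follow another correct response."""
    return sum(
        1
        for previous, current in zip(correct_flags, correct_flags[1:])
        if previous and current
    )


# Hypothetical sequence: the third item was skipped or answered incorrectly.
print(dyad_score([True, True, False, True, True, True]))  # 3
```

Under this rule, a participant who answers only every other item earns a dyad score of zero even with many correct responses, which is why the score is sensitive to item skipping.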


<sup>∗</sup>Denotes significant (p < 0.05) group × time differences.

#### Verbal Fluency

Results showed a significant main effect of time for category fluency, F(1, 132) = 10.19, p = 0.002, $\eta_p^2$ = 0.072, $\omega_p^2$ = 0.064, and category switching, F(1, 132) = 12.87, p < 0.001, $\eta_p^2$ = 0.089, $\omega_p^2$ = 0.081; however, no effect of time was found for letter fluency, F(1, 132) = 2.51, p = 0.12, $\eta_p^2$ = 0.019, $\omega_p^2$ = 0.011. No Group × Time interactions were found for any of the trials of the verbal fluency subtest: letter fluency, F(2, 132) = 1.86, p = 0.160, $\eta_p^2$ = 0.027, $\omega_p^2$ = 0.013; category fluency, F(2, 132) = 1.55, p = 0.217, $\eta_p^2$ = 0.023, $\omega_p^2$ = 0.008; and category switching, F(2, 132) = 0.86, p = 0.428, $\eta_p^2$ < 0.001, $\omega_p^2$ < 0.001.

#### Response Inhibition

While the pattern of results showed decreases in mean errors committed by all groups at post-testing, analysis by block (color, mixed, word) showed no main effect of time and no Group × Time interaction. Independent trial analysis of error rates by block showed no significant effects: color, F(2, 132) = 0.39, p = 0.676, $\eta_p^2$ < 0.001, $\omega_p^2$ < 0.001; word, F(2, 132) = 1.02, p = 0.364, $\eta_p^2$ = 0.015, $\omega_p^2$ < 0.001; and mixed, F(2, 132) = 2.98, p = 0.055, $\eta_p^2$ = 0.043, $\omega_p^2$ = 0.028.

TABLE 3 | Mood and Motor Measures.

#### Motor Speed Analysis

Results of the two motor tests administered, the Finger Tapper Test and the Bimanual Motor Coordination Task, can be found in **Table 3**. A separate Repeated Measures ANOVA conducted on measures of motor speed showed a main effect of time for the dominant hand of the Finger Tapper Test, F(1, 132) = 4.78, p = 0.03, $\eta_p^2$ = 0.034, $\omega_p^2$ = 0.027, and for all conditions of the Bimanual Motor Coordination Task: the Rhythm condition, F(1, 132) = 22.18, p < 0.001, $\eta_p^2$ = 0.144, $\omega_p^2$ = 0.136; the Synchronization condition, F(1, 132) = 21.60, p < 0.001, $\eta_p^2$ = 0.141, $\omega_p^2$ = 0.133; and the Pattern condition, F(1, 132) = 19.63, p < 0.001, $\eta_p^2$ = 0.129, $\omega_p^2$ = 0.122. No main effect of time was found for the non-dominant hand of the Finger Tapper Test, F(1, 132) = 0.08, p = 0.78, $\eta_p^2$ < 0.001, $\omega_p^2$ < 0.001.

A significant Group × Time interaction was found for synchronization on the Bimanual Motor Coordination Task, F(2, 132) = 3.38, p = 0.03, $\eta_p^2$ = 0.049, $\omega_p^2$ = 0.034 (**Figure 4**). No significant Group × Time interactions were found for the Finger Tapper Test for the dominant hand, F(2, 132) = 0.78, p = 0.46, $\eta_p^2$ < 0.001, $\omega_p^2$ < 0.001, or the non-dominant hand, F(2, 132) = 0.34, p = 0.71, $\eta_p^2$ < 0.001, $\omega_p^2$ < 0.001. Nor were Group × Time interactions found for the Rhythm condition, F(2, 132) = 1.56, p = 0.21, $\eta_p^2$ = 0.023, $\omega_p^2$ = 0.008, or the Pattern condition, F(2, 132) = 1.03, p = 0.36, $\eta_p^2$ = 0.015, $\omega_p^2$ < 0.001, of the Bimanual Motor Coordination Task.

#### Correlational Analysis: Cognitive and Bimanual Coordination Outcomes

Correlations were conducted between executive function composite scores and bimanual coordination scores. Composite scores for measures of executive functions included visual scanning/working memory (Trail Making Card A and Card B), processing speed (PASAT), verbal fluency (Category Switching-DKEFS), and response inhibition (Stroop errors and response times). Composite scores from the Bimanual Motor Coordination Task (Rhythm, Synchronization, and Pattern Accuracy) were also included. Results revealed significant negative correlations between response inhibition (Stroop error rates) and Pattern accuracy at pretesting (r = −0.208, p = 0.017) and at post-testing (r = −0.273, p = 0.002): as Stroop error rates increased, pattern accuracy in bimanual coordination decreased. A similar significant negative correlation was found between Stroop reaction times and Rhythmic accuracy at post-testing (r = −0.285, p = 0.001); however, this relationship was not significant at pretesting (r = −0.150, p = 0.096).

TABLE 3 notes: BDI, Beck Depression Inventory; GDS, Geriatric Depression Scale; FT, Finger Tapper; BMCT, Bimanual Motor Coordination Task. <sup>∗</sup>Denotes significant (p < 0.05) Group × Time differences.

The correlational analysis showed significant negative correlations between the time to complete the Trail Making Test (Card A) and all three composite scores from the Bimanual Motor Coordination Task at pretesting: Rhythm (r = −0.173, p = 0.045), Synchronization (r = −0.184, p = 0.033), and Pattern Accuracy (r = −0.258, p = 0.002). Longer completion times on the Trail Making Test (Card A) were associated with lower accuracy on the Bimanual Motor Coordination Task. However, no significant relationships were found at post-testing: Rhythm (r = −0.107, p = 0.218), Synchronization (r = −0.027, p = 0.752), and Pattern Accuracy (r = −0.136, p = 0.115).

The correlational analysis also showed significant negative correlations between the time to complete the Trail Making Test (Card B) and bimanual coordination scores on Pattern (r = −0.253, p = 0.003) and Synchronization (r = −0.209, p = 0.015) at pretesting. However, no significant correlation was found between the time to complete the Trail Making Test (Card B) and Rhythmic accuracy scores (r = −0.129, p = 0.136). Post-testing correlations showed a similar significant negative correlation between completion times on the Trail Making Test (Card B) and Pattern accuracy (r = −0.194, p = 0.024); however, no significant relationships were found for Rhythmic accuracy (r = −0.085, p = 0.324) or Synchronization (r = −0.091, p = 0.294).

Similar findings were present in the correlational analysis for the Total Correct on the PASAT, a measure of complex processing speed. Significant positive correlations were found between the PASAT number correct at pretesting and bimanual scores of Pattern (r = 0.193, p = 0.025) and Rhythmic accuracy (r = 0.167, p = 0.045), but not Synchronization accuracy (r = 0.103, p = 0.234). However, no significant relationships were found between bimanual coordination scores at post-testing and PASAT performance: Pattern (r = 0.150, p = 0.083), Synchronization (r = 0.001, p = 0.994), and Rhythmic accuracy (r = 0.089, p = 0.307). In addition, no significant correlations were found between composites for verbal fluency and bimanual coordination scores.
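To illustrate how the significance of a correlation of this size depends on sample size, the sketch below computes an approximate two-tailed p-value with the Fisher z transformation. This is a swapped-in approximation, not the test used in the study, and the sample size of 135 is an assumption inferred from the error degrees of freedom reported for the ANOVAs (132 with three groups). The number of participants contributing to each correlation is not stated here, so the output will not reproduce the reported p-values exactly.

```python
import math


def fisher_z_p_value(r: float, n: int) -> float:
    """Approximate two-tailed p-value for H0: rho = 0 using the Fisher z transformation."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    # atanh(r) is approximately normal with standard error 1 / sqrt(n - 3).
    z = math.atanh(r) * math.sqrt(n - 3)
    # erfc(|z| / sqrt(2)) equals the two-tailed normal tail probability.
    return math.erfc(abs(z) / math.sqrt(2.0))


ASSUMED_N = 135  # assumption: not stated in the text for these correlations

for label, r in [("Stroop errors vs. Pattern, pretest", -0.208),
                 ("Stroop RT vs. Rhythm, pretest", -0.150)]:
    print(f"{label}: r = {r:+.3f}, approximate p = {fisher_z_p_value(r, ASSUMED_N):.3f}")
```

Under this assumed sample size, the first correlation falls below the 0.05 threshold and the second does not, which matches the pattern of significance described above.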

#### DISCUSSION

The purpose of the study was (1) to examine the effects of fine motor and gross motor bimanual coordination music interventions on cognitive and motor performance in healthy older adults and (2) to examine the relationships between performance on cognitive measures and areas of bimanual coordination. The experimental design of this research is unique as it is the first study to dissect complex behavioral interventions with three music training interventions in beginning older adult musicians. Specifically, this research differentiated gross and fine motor coordination in a stringent study design to discern potential benefits to cognitive and motor performance. Two music interventions involved instrumental performance in an ensemble class that placed complex demands upon bimanual skill development, in-phase synchronization, antiphase coordination, precise timing, and sensorimotor integration (Repp and Su, 2013). Those assigned to GPI faced greater load on manual dexterity in movements that required distal musculature.

## Cognitive Outcomes


Our results showed significantly enhanced performance by the GPI and GPeI groups on visual scanning and working memory as compared to MLI. In addition, data showed a trend with a pattern of increased processing speed for the GPI group. These data are consistent with previous experimental studies examining the impact of active music participation in healthy older adults (Bugos et al., 2007; Bugos and Kochar, 2017; Dege and Kerkovius, 2018). Results may be due to the high level of sensorimotor integration involved in percussion and piano training programs. Sensorimotor synchronization in playing in an ensemble requires precise timing which involves anticipation of the beat and adaptation to stay with the ensemble (van der Steen and Keller, 2013; van der Steen et al., 2015). Synchronization to the beat was both challenging and rewarding for participants.

Results of this research showed a significant main effect of time for visual scanning/working memory, processing speed, and verbal fluency, suggesting benefits related to participation in all active music interventions. These data are consistent with previous benefits stemming from drumming programs (Dege and Kerkovius, 2018), piano training programs (Bugos et al., 2007; Bugos and Kochar, 2017), and music listening (Mammarella et al., 2007). However, given research that examined factors contributing to executive functions on the Trail Making Test (Salthouse, 2011), we believe that differences found post-training may be related to processing speed.

Salthouse (2011) examined areas of executive functions measured by the Trail Making Test, a common measure of visual scanning/working memory, and found that performance was related to spatial visualization and processing speed, with no unique contribution of working memory. Recent research found that, after controlling for a general cognitive factor in a bifactor model in an older adult sample, only processing speed accounted for completion time on the Trail Making Test (MacPherson et al., 2019). Thus, changes found in this study are likely due to enhanced processing speed, differentially measured by the Trail Making Test and PASAT. Furthermore, the correlational analysis supports the relationships between bimanual coordination and processing speed in the Trail Making Test and PASAT. Despite the low strength of the correlations found, replication of this research with alternate measures may help further differentiate the effects of music interventions on areas of executive functions.

No changes over time were found in response inhibition, a component of cognitive control. This result was in contrast to previous research by Seinfeld et al. (2013), who found enhanced cognitive control in older adults who received 16 weeks of group piano lessons. We believe these outcomes may be linked to practice quality and quantity. Practice requires cognitive control as the performer inhibits the incorrect note, in favor of the correct note or rhythm. In the Seinfeld et al. (2013) study, practice requirements were 45 min per day or 5 h weekly, and in the current study 30 min per day or 3 h weekly. Further research is necessary to evaluate the role of practice, time duration of practice, and practice strategies that most efficiently contribute to cognitive control.

## Motor Outcomes

Our results suggest that participants in all music interventions significantly improved bimanual motor outcomes of rhythmic accuracy, pattern accuracy, and hand synchronization. In addition, the GPI group significantly improved in synchronization as compared to the GPeI and MLI groups.

This result was not surprising in that piano training requires complex motor skills coupled with sensorimotor synchronization. Research has shown that even after short-term piano experiences, neurological and functional changes occur as a function of motor and auditory experiences (Lappe et al., 2008). In research comparing musicians and non-musicians, it was suggested that non-musicians showed greater auditory-to-visual transfer of information as compared to experts, who demonstrate auditory-to-motor transfer of information (Brown and Penhune, 2018). While adults in the present study were not at the expert level, they were able to demonstrate learning in the domain, as evidenced by a concluding recital (one for GPI, and GPeI) which included memorized music. Future research is necessary to examine the trajectory of music instruction and to identify the point at which older beginners begin to demonstrate auditory-to-motor transfer of information.

## Limitations and Potential Explanations

Measures in this study pertained to either cognitive or motor outcomes; the study did not measure learning in the trained domain with a measure of musical achievement or motivation. The lack of available music achievement measures with good psychometric properties, and the lack of inclusion of measurement of musical achievement in training studies, render it difficult to determine near transfer in the trained domain. In addition, the level of motivation may have influenced study outcomes. Anecdotal reports by many participants suggested an interest in enrolling for the opportunity to learn musical notation; however, future research with standardized measures of motivation is necessary to explore the influence of motivation on music learning outcomes.

Practice logs were completed for all participants. While most individuals complied with practice requirements, some participants reported practicing for more or less than 30 min per day. In addition, the quality of the practice session was not evaluated in this study. Future research is necessary to examine the quality and duration of practice sessions for novice adult musicians and the relationships to cognitive transfer. Recordings of each practice session, and perhaps a practice coach, may contribute to the ability to evaluate the impact of music interventions delivered in a way that allows researchers to capture meaningful consolidation in music learning.

While the purpose of this study was to measure the effects of bimanual coordination in the context of music interventions, it is unclear as to whether or not the benefits to executive functions are unique to music training or may be an outcome of tasks that require complex bimanual motor practice (e.g., juggling). Further research is necessary to examine the role of bimanual motor coordination interventions on executive functions.

This research included participants from a wide age range (60–80 years). To adequately answer additional research questions regarding the impact of age-group differences on cognitive and motor outcomes, there is a need for future research with larger samples of older adults. Nevertheless, this study showed that active engagement in music interventions contributes to cognitive and motor performance in healthy older adults. Interventions delivered in this research differed from many community music programs by including non-musicians and extending: a progressively difficult music curriculum, opportunities for social interaction, practice requirements, and bimanual coordination exercises. Instruments selected for training included pitched percussion instruments (i.e., pianos, xylophones, and metallophones), which were chosen to reduce any anxiety associated with tuning or embouchure to obtain a high-quality sound. Group music programs can promote fine and gross motor coordination, which may contribute to cognitive outcomes in aging.

#### DATA AVAILABILITY STATEMENT

The datasets generated for this study are available on request to the corresponding author.


## ETHICS STATEMENT

The studies involving human participants were reviewed and approved by the East Carolina University Institutional Review Board. The patients/participants provided their written informed consent to participate in this study.

### AUTHOR CONTRIBUTIONS

JB conceived of the study, obtained funding, and carried out the activities associated with this study.

## ACKNOWLEDGMENTS

I wish to thank the Retirement Research Foundation (RRF) for their generous support of this research. I wish to thank Rebecca Martin, Joy Fitzpatrick, and Wendy Mostafa for testing assistance, and Jamelle Simmons, Ayo Alexandra Gbadamosi and Miranda Rose Torres for data assistance.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnint.2019.00068/full#supplementary-material

MATERIAL S1 | Summary of the 16-week curriculum for GPI and GPeI interventions.




**Conflict of Interest:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Bugos. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.


# Music Training, Working Memory, and Neural Oscillations: A Review

Kate A. Yurgil<sup>1</sup>, Miguel A. Velasquez<sup>2</sup>, Jenna L. Winston<sup>2</sup>, Noah B. Reichman<sup>3</sup> and Paul J. Colombo<sup>2,3</sup>\*

<sup>1</sup> Department of Psychological Sciences, Loyola University, New Orleans, LA, United States, <sup>2</sup> Department of Psychology, Tulane University, New Orleans, LA, United States, <sup>3</sup> Brain Institute, Tulane University, New Orleans, LA, United States

This review focuses on reports that link music training to working memory and neural oscillations. Music training is increasingly associated with improvement in working memory, which is strongly related to both localized and distributed patterns of neural oscillations. Importantly, there is a small but growing number of reports of relationships between music training, working memory, and neural oscillations in adults. Taken together, these studies make important contributions to our understanding of the neural mechanisms that support effects of music training on behavioral measures of executive functions. In addition, they reveal gaps in our knowledge that hold promise for further investigation. The current review is divided into the main sections that follow: (1) discussion of behavioral measures of working memory, and effects of music training on working memory in adults; (2) relationships between music training and neural oscillations during temporal stages of working memory; (3) relationships between music training and working memory in children; (4) relationships between music training and working memory in older adults; and (5) effects of entrainment of neural oscillations on cognitive processing. We conclude that the study of neural oscillations is proving useful in elucidating the neural mechanisms of relationships between music training and the temporal stages of working memory. Moreover, a lifespan approach to these studies will likely reveal strategies to improve and maintain executive function during development and aging.

#### Edited by:

Bruno L. Giordano, UMR 7289 Institut de Neurosciences de la Timone (INT), France

#### Reviewed by:

Peter Cariani, Harvard Medical School, United States Anne Keitel, University of Dundee, United Kingdom

> \*Correspondence: Paul J. Colombo pcolomb@tulane.edu

#### Specialty section:

This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Psychology

> Received: 13 June 2019 Accepted: 04 February 2020 Published: 21 February 2020

#### Citation:

Yurgil KA, Velasquez MA, Winston JL, Reichman NB and Colombo PJ (2020) Music Training, Working Memory, and Neural Oscillations: A Review. Front. Psychol. 11:266. doi: 10.3389/fpsyg.2020.00266

Keywords: music training, working memory, neural oscillations, frequency bands, executive functions

## INTRODUCTION

Music training engages some of our most complex cognitive abilities (Peretz and Zatorre, 2005; Zatorre and McGill, 2005) and induces brain plasticity in widely distributed cortical regions (Altenmüller and Schlaug, 2015). Musicianship requires selective and flexible attention, as well as inhibition of irrelevant auditory and visual stimuli. Musicians must readily manipulate stored information in accordance with complex hierarchies of rules and conventions. Reading musical notation requires strong spatial associations with percepts and symbols, and performing music requires effortful self-regulation and emotional expression.

Music training is emerging as an important model system for studying experience-dependent brain plasticity, and for the development of therapeutic interventions for healthy brain development and aging. Reports over the last few decades have firmly established that music training is associated with improvements on measures of executive functions, such as inhibitory control (Moreno and Farzan, 2015; D'Souza et al., 2018), working memory (Pallesen et al., 2010; Oechslin et al., 2013; Okhrei et al., 2016; Ding et al., 2018; D'Souza et al., 2018), and cognitive flexibility [see Okada and Slevc (2016) for a review], which coincide with structural and functional changes in brain regions implicated in these cognitive processes.

In studies using electroencephalography (EEG) and magnetoencephalography (MEG) to measure brain activity, inhibitory control is the most frequently reported executive function that is enhanced by music training, and it is frequently studied by measuring event-related potentials (ERPs), which coincide with discrete events. However, the relationship between music training and other components of executive function, such as working memory, has been studied less. In addition to ERPs, measurement of neural oscillations is a complementary method to study relationships between music training, working memory, and brain plasticity. Specifically, the study of neural oscillations is well-suited for the study of events over periods of time from seconds to minutes, corresponding to the temporal stages of working memory. This review is focused on reports that link music training, working memory, and neural oscillations. The evidence that music training is associated with improvement in behavioral measures of working memory continues to increase. In addition, there is strong evidence linking behavioral measures of working memory to both localized and distributed patterns of neural oscillations (Tallon-Baudry et al., 1998; Jensen and Tesche, 2002; Kaiser et al., 2003, 2008, 2009; Herrmann et al., 2004; Grimault et al., 2009; Haegens et al., 2009; Moran et al., 2010; Van Dijk et al., 2010; Palva et al., 2011; Roux et al., 2012). Finally, there is a small but growing number of reports of relationships between music training, working memory, and neural oscillations.

Taken together, we suggest that further study of neural oscillations will lead to significant progress in understanding the neural mechanisms that support effects of music training on behavioral measures of executive functions. The current review is divided into five main sections as follows: (1) Discussion of behavioral measures of working memory, and effects of music training on working memory in adults; (2) effects of music training on neural oscillations during distinct temporal stages of working memory; (3) relationships between music training and working memory in children; (4) relationships between music training and working memory in older adults; and (5) effects of entrainment of neural oscillations on cognitive processing.

## Behavioral Measures of Working Memory in Adults

There is increasing interest in the effects of music training on enhancement of general cognitive abilities and intelligence (Moreno et al., 2011; Schellenberg, 2011; Schellenberg and Weiss, 2013; Zuk et al., 2014; Costa-Giomi, 2015), and evidence suggests that working memory is an important component of the cognitive benefits derived from music training (Bergman Nutley et al., 2014; Suárez et al., 2015). For the purposes of this review, we define working memory as time- and capacity-limited storage of task-relevant information, which generally requires one or more of the following operations: mental manipulation, flexible use, or inhibition of distractors. Although working memory is sometimes equated with short-term memory, we consider it distinct for two reasons. First, working memory, as defined above, is distinct from short-term storage in the general requirement of mental manipulations of encoded information, or the inhibition of goal-irrelevant stimuli. Second, working memory requires integrity of medial temporal lobe regions, whereas short-term memory does not.

Working memory is often studied using behavioral tasks that implement a combination of stored information, cognitive manipulation, and interference. These tasks include the N-back (Pallesen et al., 2010; Oechslin et al., 2013; Ding et al., 2018), backward digit span (George and Coch, 2011; Zuk et al., 2014; Clayton et al., 2016), reading span (Franklin et al., 2008; D'Souza et al., 2018), and operation span tasks (Franklin et al., 2008; D'Souza et al., 2018). For the N-back task, a sequence of visual or auditory stimuli is presented, and the participant maintains the information while deciding whether each subsequent stimulus item matches the stimulus that came N items previously (Owen et al., 2005). For the backward digit span, participants are presented with a series of digits, then asked to report the sequence of digits in reverse order (Hester et al., 2004). Both the N-back and the backward digit span require participants to maintain information in memory and manipulate the information in a certain order. The N-back may be a more demanding task as the participant must simultaneously keep track of current stimuli and determine whether or not they match the stimulus N turns back. For the reading span task, N sentences are presented one sentence at a time. As N increases, so does the memory load required to perform the task. For N = 2, two sentences are presented sequentially, and after each sentence is presented, the participant writes the sentence verbatim, and then the last word of each sentence in order (Daneman and Carpenter, 1980). Finally, the operation span task requires participants to memorize a sequence of unrelated words while simultaneously performing a series of math operations. After all of the operation-word strings are presented, the participant writes all of the words that were displayed in the order of presentation (Turner and Engle, 1989). Therefore, both the reading span and the operation span require participants to hold information while working on a secondary task, which may cause interference. As noted above, each of these tasks fulfills the criteria of maintenance and manipulation of information, which may occur with differing levels of interference (Aben et al., 2012). Working memory performance on these and related tasks will be discussed in the paragraphs that follow in relation to music training and neural oscillations.
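The matching rule of the N-back task described above can be made concrete with a short sketch. The code below marks which items in a stimulus sequence match the item presented N positions earlier, and scores a set of yes/no responses as hit and false-alarm rates. The function names, the example letter sequence, and the choice of scoring measures are illustrative assumptions and are not taken from any of the cited studies.

```python
from typing import List, Sequence, Tuple


def nback_targets(stimuli: Sequence[str], n: int) -> List[bool]:
    """Mark each position whose stimulus matches the one presented n items earlier."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [i >= n and stimuli[i] == stimuli[i - n] for i in range(len(stimuli))]


def score_nback(stimuli: Sequence[str], responses: Sequence[bool], n: int) -> Tuple[float, float]:
    """Return (hit rate, false-alarm rate) for yes/no 'match' responses."""
    if len(stimuli) != len(responses):
        raise ValueError("one response is required per stimulus")
    targets = nback_targets(stimuli, n)
    hits = sum(1 for t, r in zip(targets, responses) if t and r)
    false_alarms = sum(1 for t, r in zip(targets, responses) if not t and r)
    n_targets = sum(targets)
    n_nontargets = len(targets) - n_targets
    hit_rate = hits / n_targets if n_targets else 0.0
    fa_rate = false_alarms / n_nontargets if n_nontargets else 0.0
    return hit_rate, fa_rate


# Hypothetical 2-back trial list and participant responses.
letters = list("ABABCBC")
answers = [False, False, True, False, True, True, True]
print(nback_targets(letters, 2))
print(score_nback(letters, answers, 2))
```

Increasing `n` lengthens the span of items that must be held and updated, which is one way the task manipulates memory load.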

## RELATIONSHIP BETWEEN MUSIC TRAINING AND WORKING MEMORY IN ADULTS

Musicians reportedly outperform non-musicians on a variety of working memory tasks. For example, musicians showed better working memory capacity and duration than non-musicians using forward tonal discrimination tasks, whereas under atonal conditions, musicians outperformed non-musicians on working memory capacity, but not duration (Ding et al., 2018). Thus, musicians held more items in memory for both tonal and atonal auditory stimuli, but retained the items longer than non-musicians only for tonal stimuli. Furthermore, musicians have been reported to outperform non-musicians on N-back tasks using both auditory (Pallesen et al., 2010; Ding et al., 2018) and visual (Oechslin et al., 2013; Okhrei et al., 2016) stimuli. Therefore, the benefits of music training for working memory are not necessarily limited to the auditory domain. Musicians have enhanced verbal working memory compared to non-musicians, as measured by both reading and operation span tasks (Franklin et al., 2008). However, this advantage may not extend to the visuospatial domain (Hansen et al., 2013), as no significant differences were reported between musicians and non-musicians on the Colorado Assessment for visual working memory (Strait et al., 2014), a computerized visual working memory task (Okhrei et al., 2016), and a spatial span task (Hansen et al., 2013). Discrepant reports of effects of musicianship are most likely due to the type of information to be retained, and less likely due to the age of participants, the sample size, or the types of tasks used. Slevc et al. (2016) reported that music training predicts individual differences in working memory, but that effect is not as strongly related to inhibitory control, and unrelated to cognitive flexibility. 
Using a similar approach to study the relationships between music training and executive functions, Okada and Slevc (2018) used a test battery consisting of tasks related to three subcomponents of executive functions: working memory, inhibitory control, and cognitive flexibility. They reported a positive correlation between individual variation in music training and working memory updating, but no relationships between music training and inhibitory control or shifting. In these contexts, inhibitory control refers to one's ability to regulate behavior, attention, and thoughts – especially in the face of conflicting information, or when doing so requires overriding a prepotent response. Cognitive flexibility refers to one's ability to successfully switch between task demands or mental sets (Slevc et al., 2016; Okada and Slevc, 2018). Relationships between music training and inhibitory control and cognitive flexibility have been reviewed in other sources and will not be taken up here (Moreno and Farzan, 2015; Okada and Slevc, 2016). It should be noted, however, that understanding the relationships between music training and behavioral measures of executive functions will progress more rapidly if a reliable battery of tests of executive functions is used consistently among investigators. For example, the NIH Toolbox cognitive measures include measures of executive function and have the added benefit of normative data.

## Neural Oscillations and Working Memory in Adults

Neural oscillations are believed to represent synchronous changes in excitability in networks of cortical cells (Fries, 2005; Klimesch et al., 2007), and they are well-suited for the study of cognitive processes that occur over longer durations of time than a single event. This applies particularly to the study of the duration and capacity of the maintenance phase of working memory. Compared to intracranial recording techniques, EEG is a non-invasive and relatively inexpensive tool for measuring patterns of neuronal activity synchronized at the population level that may accompany sensory and cognitive processing. Electrical brain activity may be analyzed in the time-domain as ERPs, which are time-locked averages of EEG to a repeated stimulus or response event (see **Glossary**).
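The description of ERPs as time-locked averages can be sketched in a few lines. The example below cuts fixed-length epochs around event onsets in a single-channel recording, subtracts each epoch's pre-event baseline, and averages across epochs. The sampling rate, event timing, noise level, and evoked waveform are synthetic assumptions chosen only for illustration, and real pipelines typically add filtering and artifact rejection.

```python
import numpy as np


def average_erp(eeg, event_samples, pre, post):
    """Average baseline-corrected epochs time-locked to each event.

    eeg: 1-D array for one channel; event_samples: event onsets in samples;
    pre/post: number of samples kept before and after each onset.
    """
    eeg = np.asarray(eeg, dtype=float)
    epochs = [eeg[s - pre:s + post] for s in event_samples
              if s - pre >= 0 and s + post <= eeg.size]
    if not epochs:
        raise ValueError("no complete epochs available")
    epochs = np.stack(epochs)
    if pre > 0:
        # Subtract the mean of the pre-event interval from each epoch.
        epochs = epochs - epochs[:, :pre].mean(axis=1, keepdims=True)
    return epochs.mean(axis=0)


# Synthetic single-channel data: unit-variance noise plus a small evoked deflection.
rng = np.random.default_rng(0)
fs = 250                                  # assumed sampling rate in Hz
eeg = rng.normal(0.0, 1.0, fs * 60)       # 60 s of noise
events = np.arange(500, 14000, 300)       # hypothetical event onsets (samples)
template = 2.0 * np.hanning(50)           # hypothetical evoked response
for s in events:
    eeg[s:s + 50] += template

erp = average_erp(eeg, events, pre=50, post=100)
print(erp.shape, float(erp[50:100].max()))
```

Averaging reduces activity that is not time-locked to the event, which is why the evoked deflection emerges from noise that is comparable in size on single trials.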

While there is abundant research related to ERPs and working memory (Drew et al., 2006; Yurgil and Golob, 2013; Pinal et al., 2015; Getzmann et al., 2018), the current review focuses on spectral (time-frequency) methods as a complementary approach to extracting ongoing oscillatory features that correspond to cognitive operations, such as the encoding or short-term storage of information. Various time-frequency analyses can be used to determine EEG magnitude or power within a given frequency band as well as the degree of coherence between different cortical regions (see **Glossary** for description of different frequency bands and spectral measures). Although time-frequency analyses are typically more computationally intensive than time-domain ERP methods, characterizing cortical oscillations and their synchronization is advantageous in examining the brain's distributed processes. In addition, there is strong evidence that different oscillations are supported by different functional networks that underlie sensory and cognitive processes, including working memory.
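As a minimal illustration of what "power within a given frequency band" means, the sketch below estimates band power from a Hann-windowed periodogram computed with NumPy. This is a deliberately simple estimator chosen for brevity, not the method of any study reviewed here; the sampling rate, test signal, and band edges are assumptions, and the resulting values are relative rather than calibrated.

```python
import numpy as np


def band_power(signal, fs, f_low, f_high):
    """Mean windowed-periodogram power for frequencies in [f_low, f_high) Hz."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()               # remove the DC offset
    window = np.hanning(signal.size)              # taper to limit spectral leakage
    spectrum = np.fft.rfft(signal * window)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    power = np.abs(spectrum) ** 2 / np.sum(window ** 2)
    mask = (freqs >= f_low) & (freqs < f_high)
    if not mask.any():
        raise ValueError("no frequency bins fall inside the requested band")
    return float(power[mask].mean())


# Synthetic signal: a strong 6 Hz component and a weak 40 Hz component.
fs = 250                                          # assumed sampling rate in Hz
t = np.arange(0, 4, 1.0 / fs)
sig = np.sin(2 * np.pi * 6 * t) + 0.2 * np.sin(2 * np.pi * 40 * t)

theta = band_power(sig, fs, 4, 8)
alpha = band_power(sig, fs, 8, 12)
gamma = band_power(sig, fs, 30, 50)
print(theta > alpha, theta > gamma)
```

Time-frequency methods extend this idea by repeating the estimate in short sliding windows, so that band power can be tracked across the encoding, maintenance, and retrieval stages of a trial.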

There is ample evidence that neural oscillations are important for memory formation. This was first demonstrated in studies of long-term potentiation (LTP) and long-term depression (LTD), which are neuronal models for memory formation involving biphasic changes in neuronal responses to synaptic inputs. Huerta and Lisman (1995) reported that a single burst of stimulation at the peak of theta phase induced LTP, whereas stimulation at the trough of theta induced LTD. The role of hippocampal theta has been investigated extensively in rodent models of memory formation (Buzsáki et al., 1994), and extended to interactions between the hippocampus and prefrontal cortex in humans (Anderson et al., 2010). Theta-band activity is also implicated in "chunking" of perceptual auditory information, which may contribute to hierarchical control representations necessary for complex, flexible behavior (Kikumoto and Mayr, 2018; Teng et al., 2018). In addition to theta, other neural oscillations, including alpha and gamma, are implicated in memory formation and maintenance. Indeed, Roux and Uhlhaas (2014) attributed distinct functional roles to alpha, theta, and gamma neural oscillations in working memory. Specifically, they proposed that activity in the gamma-band subserves maintenance of working memory, whereas alpha-band oscillations may inhibit task-irrelevant information, and theta-band oscillations are important for temporal order of items in working memory. While significant advances have been made in understanding the roles of neural oscillations in working memory, less is known about the relationships between music training, neural oscillations, and their functional networks.

In this review, we offer a framework for examining oscillations related to different working memory components, and how music training may influence these underlying mechanisms. Specifically, we propose that oscillations may be used to: (1) dissociate temporal components of working memory, that is to distinguish between processing stages of encoding, maintenance, and retrieval; (2) investigate working memory processing demands, such as changes in memory load and inhibition; (3) measure short and long-range synchronous activity to examine the relative contributions of distributed brain regions involved during working memory tasks; and (4) examine the degree to which music training influences oscillatory activity during working memory.

#### Music Training, Neural Oscillations, and Working Memory Maintenance

Stimulus-related oscillations may fluctuate over time and are therefore advantageous when examining neural activity over memory delays. There is considerable evidence that working memory maintenance, or the process of sustaining information in the absence of sensory input, is associated with enhanced activity or coherence in theta (4–8 Hz; Jensen and Tesche, 2002; Moran et al., 2010), alpha (8–12 Hz; Herrmann et al., 2004; Grimault et al., 2009; Haegens et al., 2009; Van Dijk et al., 2010), beta (12–30 Hz) in non-human primates (Tallon-Baudry et al., 2004) and humans (Palva et al., 2011), and gamma (>30 Hz; Tallon-Baudry et al., 1998; Kaiser et al., 2003, 2008, 2009; Palva et al., 2011; Roux et al., 2012) frequency bands [see Roux and Uhlhaas (2014) for an extensive review of theta, alpha, and gamma]. See **Table 1** for a summary of results.

In addition, activity within each frequency band has been shown to vary with working memory load, or the number of items maintained over a brief delay (**Table 1**). High load conditions – that is, maintaining more information within working memory – are associated with increased power in alpha (Grimault et al., 2009; Palva et al., 2011; Samuel et al., 2018), beta (Palva et al., 2011), and gamma ranges (Howard et al., 2003; Palva et al., 2011). Increased load during memory delays is also associated with increased peak theta frequency (Moran et al., 2010); however, findings on theta power and working memory load are mixed. Some studies show increased theta power with memory load over midline frontal (Jensen and Tesche, 2002; Meltzer et al., 2007) and other dispersed brain regions (Raghavachari et al., 2001, 2006), while others show decreased power with load over lateral frontal regions (Meltzer et al., 2007; van Vugt et al., 2010; Brzezicka et al., 2019).
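As a rough illustration of how the band-limited power measures summarized above could be obtained from a single recording, the following Python sketch estimates mean power in each band from a windowed periodogram. It is not taken from any of the cited studies: the simulated trace, the sampling rate, the function name `band_power`, and the 80 Hz upper gamma limit are assumptions made for the example, and a more robust spectral estimator would likely be preferred in practice.

```python
import numpy as np

# Band edges follow the ranges quoted in the text. The 80 Hz upper gamma
# limit is an assumption; the text only specifies ">30 Hz".
BANDS = {"theta": (4, 8), "alpha": (8, 12), "beta": (12, 30), "gamma": (30, 80)}


def band_power(signal, fs, bands=BANDS):
    """Return mean periodogram power per frequency band."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()             # remove DC offset
    window = np.hanning(signal.size)            # taper to limit spectral leakage
    power = np.abs(np.fft.rfft(signal * window)) ** 2
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    return {
        name: float(power[(freqs >= lo) & (freqs < hi)].mean())
        for name, (lo, hi) in bands.items()
    }


# Hypothetical delay-period trace: a 10 Hz (alpha) rhythm plus weak noise.
fs = 250                                        # sampling rate in Hz (assumed)
t = np.arange(4 * fs) / fs                      # 4 s of samples
rng = np.random.default_rng(0)
eeg = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)

powers = band_power(eeg, fs)
dominant = max(powers, key=powers.get)          # band with the highest mean power
```

Comparing such per-band values between low-load and high-load delay periods is one simple way to express the load effects described above, although the sketch itself makes no claim about their direction.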

While less studied, music training-related differences in working memory maintenance have been shown within the theta and beta ranges (**Table 2**). For example, music training may enhance theta-related activity during delay periods, affecting subsequent memory processing. Cheung et al. (2017) reported that musicians exhibited increased intra-hemispheric theta coherence during verbal memory encoding. Coherence is a measure of consistency within a given frequency band in amplitude or phase between different brain areas, thus higher coherence indicates greater synchrony of activity across regions. Higher theta coherence in musicians also correlated with subsequent memory performance (Cheung et al., 2017). Likewise, compared to non-musicians, musicians show increased left hemispheric theta coherence when judging the semantic relatedness of a new stimulus to that of previously learned information (Dittinger et al., 2017). Although not working memory maintenance per se, these tasks required sustained activation of memory representations for later processing; thus, music training may enhance theta coherence during tasks that require maintaining stimulus representations over time. Increased theta coherence may support connections between brain areas important for memory processing, including medial temporal lobe and prefrontal cortex (Jones and Wilson, 2005; Anderson et al., 2010). In addition, Hsu et al. (2017) compared beta activity in musicians and non-musicians during an auditory N-back task. Enhanced beta activity has been shown to facilitate maintenance of information over delays in non-human primates (Tallon-Baudry et al., 2004) and humans (Lundqvist et al., 2016), and is predictive of individual visual working memory capacity (Palva et al., 2011). Group differences in beta activity indicated a processing advantage for musicians over non-musicians. As these differences were observed during the first 30 s of the 0-back condition, music training may promote more efficient encoding and maintenance of information within working memory, in the absence of interference or distraction.

DMS, delay match-to-sample task; aDMS, auditory DMS; vDMS, visual DMS; F, frontal; C, central; T, temporal; P, parietal; O, occipital; M, medial; L, left; R, right. SII, supplementary somatosensory; RT, reaction time. Frequency bands: δ, delta (1–4 Hz); θ, theta (4–8 Hz); α, alpha (8–12 Hz); β, beta (12–30 Hz); γ, gamma (>30 Hz).

There is also evidence that music training alters oscillatory coherence between different brain areas that are active during spatial working memory tasks (**Table 2**). Bhattacharya et al. (2001) examined the effects of music training on gamma coherence during mental rotation, a task that requires participants to discriminate between a 3D object and its rotated mirror image (Shepard and Metzler, 1971). Gamma coherence between frontal and right parietal cortex increased during mental rotation for all participants; however, musicians showed higher coherence overall and greater phase synchrony in left hemisphere compared to non-musicians. Consistent with visuospatial behavioral findings discussed above (Hansen et al., 2013; Strait et al., 2014; Okhrei et al., 2016), there were no behavioral group differences in mental rotation. However, differences in gamma synchrony and hemispheric recruitment suggest music training alters activity within functional networks that support spatial working memory processes, including parietal and prefrontal cortices (Alivisatos, 1992; Tagaris et al., 1996; Alivisatos and Petrides, 1997). In a more recent study, musicians demonstrated enhanced gamma coherence between frontal and temporal regions during a spatial working memory task, but reduced coherence across all other frequency bands compared to non-musicians (Boutorabi and Sheikhani, 2018). Enhanced coherence was observed across the entire trial, and thus not strictly limited to working memory maintenance. However, it appears that enhanced coherence observed in musicians mediates cortical interactions important for working memory.
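To make the notion of coherence between recording sites more concrete, the sketch below computes magnitude-squared coherence averaged over the gamma band for simulated channels. It is only an illustration under stated assumptions: the channel names, the shared-source model, the segment length, and the 80 Hz upper gamma limit are hypothetical, and it does not reproduce the analysis pipeline of any study cited here.

```python
import numpy as np


def _segment_spectra(x, seg_len):
    """FFT of Hann-windowed, mean-removed segments with 50% overlap."""
    window = np.hanning(seg_len)
    starts = range(0, x.size - seg_len + 1, seg_len // 2)
    return np.array([
        np.fft.rfft(window * (x[s:s + seg_len] - x[s:s + seg_len].mean()))
        for s in starts
    ])


def band_coherence(x, y, fs, band, seg_len):
    """Magnitude-squared coherence between x and y, averaged over `band` (Hz)."""
    X = _segment_spectra(np.asarray(x, dtype=float), seg_len)
    Y = _segment_spectra(np.asarray(y, dtype=float), seg_len)
    cross = (X * np.conj(Y)).mean(axis=0)        # averaged cross-spectrum
    pxx = (np.abs(X) ** 2).mean(axis=0)          # averaged auto-spectra
    pyy = (np.abs(Y) ** 2).mean(axis=0)
    coherence = np.abs(cross) ** 2 / (pxx * pyy)
    freqs = np.fft.rfftfreq(seg_len, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return float(coherence[in_band].mean())


# Hypothetical data: two sites driven by a common source, one unrelated site.
fs = 250                                         # sampling rate in Hz (assumed)
rng = np.random.default_rng(1)
n = 20 * fs
source = rng.standard_normal(n)                  # activity common to two sites
frontal = source + 0.5 * rng.standard_normal(n)
parietal = source + 0.5 * rng.standard_normal(n)
unrelated = rng.standard_normal(n)               # channel with no shared input

gamma = (30, 80)                                 # upper limit is an assumption
coh_shared = band_coherence(frontal, parietal, fs, gamma, seg_len=fs)
coh_unrelated = band_coherence(frontal, unrelated, fs, gamma, seg_len=fs)
```

Values near 1 indicate a consistent amplitude and phase relationship between the two channels in that band, whereas values near 0 indicate none; in this toy case `coh_shared` should clearly exceed `coh_unrelated`.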

#### Music Training, Neural Oscillations, and Working Memory Encoding and Retrieval

Fluctuations in oscillatory activity may drive not only the active maintenance of items over brief delays, but also the encoding and retrieval of such items. In a visual delayed match to sample paradigm, higher memory loads were associated with decreases in alpha frequency and power during encoding, and increases in alpha frequency during retrieval (Samuel et al., 2018). In another study by Myers et al. (2014), pre-stimulus fluctuations in alpha power predicted accuracy during working memory retrieval; specifically, alpha event-related desynchronization (ERD; reduced power) was correlated with better memory performance.

TABLE 2 | This table summarizes the findings of the effects of music training on working memory and oscillatory activity.

M, musicians; NM, non-musicians; FB, frequency band; F, frontal; C, central; T, temporal; P, parietal; O, occipital; M, medial; L, left; R, right; RT, reaction time; BDS, backward digit span.

Music training may further modulate pre-stimulus alpha activity, consequently affecting working memory retrieval (**Table 2**). In a study by Klein et al. (2016), pre-stimulus alpha activity over anterior–posterior regions was negatively associated with reaction time during a visual Sternberg task for musicians compared to non-musicians. According to findings from functional imaging studies, working memory recruits fronto-parietal brain areas (Wager and Smith, 2003; Owen et al., 2005), thus reduced anterior–posterior alpha activity may facilitate processing task-relevant information in musicians compared to non-musicians (Klein et al., 2016). Group differences in pre-stimulus alpha activity may reflect experience-dependent engagement of different neural networks involved in working memory.

Desynchronized oscillatory activity has been observed in other learning and memory paradigms. In a study by Silva et al. (2018), non-musicians learned to identify different melodic intervals, and were tested after an initial training session (baseline), and after 5 days of at-home training sessions (post-training). Improved accuracy and reaction time to learned intervals was accompanied by reductions in alpha, beta, and gamma activity from baseline to post-training. Taken together, these findings suggest that changes in oscillatory activity and their functional networks may reflect individual variability during encoding, maintenance, and retrieval of items within working memory.

#### Music Training, Neural Oscillations, and Distractor Inhibition

It is important to note that the majority of findings discussed above were derived from working memory tasks that probe temporary storage of information in the absence of processing task-irrelevant information. However, given that working memory is a capacity-limited resource, active inhibition of task-irrelevant information is an important and necessary executive function. Neural oscillations are a useful tool in dissociating between storage and processing components of working memory and their underlying functional networks.

There is substantial evidence that alpha oscillations, first observed by Berger (1929), are involved in attentional suppression of task-irrelevant information. According to the alpha inhibitory hypothesis, alpha desynchronization (reduced alpha power) is associated with active information processing while alpha synchronization (increased alpha power) is associated with active inhibition of task-irrelevant information [see Klimesch et al. (2007) and Foxe and Snyder (2011) for reviews].
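Desynchronization and synchronization are commonly expressed as a percentage change in band power relative to a baseline interval. The short sketch below shows one such convention, in which negative values indicate desynchronization (ERD) and positive values indicate synchronization (ERS). The function name and the power values are hypothetical, and individual studies may define the baseline or the sign differently.

```python
def percent_power_change(baseline_power, task_power):
    """Band-power change relative to baseline, in percent.

    Negative values correspond to ERD (power decrease) and positive values
    to ERS (power increase) under the convention assumed here.
    """
    if baseline_power <= 0:
        raise ValueError("baseline_power must be positive")
    return 100.0 * (task_power - baseline_power) / baseline_power


# Hypothetical alpha power values in arbitrary units.
erd = percent_power_change(baseline_power=10.0, task_power=7.0)    # power drops
ers = percent_power_change(baseline_power=10.0, task_power=12.5)   # power rises
```

Under the alpha inhibitory hypothesis as described above, the first case would be read as active processing and the second as active inhibition.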

Using a modified Sternberg task with weak and strong distracters, Bonnefond and Jensen (2012) found alpha activity increased with anticipation of strong distracters and was associated with faster reaction time to memory probes. Likewise, Sauseng et al. (2009) found increased alpha activity over posterior brain areas associated with distractor suppression, while maintaining relevant items was associated with increased theta-gamma interactions. Thus, high alpha activity facilitates inhibition of distracters during working memory. Furthermore, asymmetries in alpha power have been used to dissociate target selection vs. distractor inhibition functions within working memory (Schneider et al., 2019).

In tasks that require distractor suppression, individual differences in alpha activity may underlie behavioral performance during working memory tasks. Alpha activity has been shown to vary with working memory capacity, defined as the ability to actively maintain items in working memory in the face of distraction (Engle et al., 1999). Dong et al. (2015) found that independent of task difficulty, individuals with low working memory capacity exhibit greater alpha ERD (reduced alpha power), which may reflect the involvement of additional neural resources including those that are irrelevant to the task at hand.

To our knowledge, no literature exists on whether music training modulates oscillatory activity (alpha or otherwise) related to distractor inhibition. However, behavioral and ERP studies indicate better or more efficient inhibitory processing in musicians compared to non-musicians [see Moreno and Farzan (2015) for review]. While ERPs are useful in examining inhibition that is time-locked to a particular distracter event, measuring changes in oscillatory activity and coherence over longer periods of time may be useful to determine whether music training modulates distracter inhibition and associated functional networks over longer delays, as in working memory maintenance.

#### Music Training and Cross-Frequency Coupling During Working Memory

Neurons that oscillate at different frequencies may interact with each other, forming nested hierarchies in a time-dependent manner spanning multiple brain regions. Cross-frequency coupling, or the interaction of different frequency bands, is a useful tool in examining underlying generators and functional connectivity related to working memory.

It has been well established that working memory is associated with cross-frequency coupling between theta and gamma oscillations [see Lisman and Jensen (2013) for review]. Studies show that the number of items maintained in working memory is related to the number of gamma rhythms nested within one theta cycle (Lisman and Idiart, 1995; Jensen and Lisman, 1998). This nesting predicts behavioral performance on working memory tasks, including response time (Axmacher et al., 2010) and individual differences in span (Kamiński et al., 2011). Theta-gamma coupling during working memory has been associated with activity in fronto-hippocampal networks in both rodents (Belluscio et al., 2012) and humans (Axmacher et al., 2010). Furthermore, modulation of theta-gamma rhythms in prefrontal cortex improves working memory performance in adults (Alekseichuk et al., 2016).
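One way to quantify the nesting of gamma activity within theta cycles is phase-amplitude coupling, in which gamma amplitude is related to theta phase. The sketch below computes a normalized mean vector length on simulated signals. It is a minimal illustration and not the method of any study cited above: the FFT-based band-limiting, the normalization by mean gamma amplitude, the band limits, and all signal parameters are assumptions chosen for the example.

```python
import numpy as np


def analytic_band(signal, fs, lo, hi):
    """Band-limit `signal` to [lo, hi] Hz and return its analytic signal."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.fft(signal)
    freqs = np.fft.fftfreq(signal.size, d=1.0 / fs)
    keep = (freqs >= lo) & (freqs <= hi)         # positive frequencies only
    return np.fft.ifft(2.0 * spectrum * keep)


def theta_gamma_coupling(signal, fs, theta=(4, 8), gamma=(30, 80)):
    """Normalized mean vector length of gamma amplitude over theta phase."""
    phase = np.angle(analytic_band(signal, fs, *theta))
    amplitude = np.abs(analytic_band(signal, fs, *gamma))
    # Near 0 when gamma amplitude is unrelated to theta phase; larger when
    # gamma amplitude is concentrated at a preferred theta phase.
    return float(np.abs(np.mean(amplitude * np.exp(1j * phase))) / amplitude.mean())


# Hypothetical signals: 40 Hz gamma whose amplitude does or does not follow
# the phase of a 6 Hz theta rhythm.
fs = 500                                         # sampling rate in Hz (assumed)
t = np.arange(10 * fs) / fs
rng = np.random.default_rng(2)
theta_wave = np.cos(2 * np.pi * 6 * t)
gamma_wave = np.sin(2 * np.pi * 40 * t)
noise = 0.1 * rng.standard_normal(t.size)

coupled = theta_wave + 0.5 * (1 + theta_wave) * gamma_wave + noise
uncoupled = theta_wave + 0.5 * gamma_wave + noise

mvl_coupled = theta_gamma_coupling(coupled, fs)
mvl_uncoupled = theta_gamma_coupling(uncoupled, fs)
```

In the coupled signal, gamma bursts are largest at the theta peak, so `mvl_coupled` should be well above `mvl_uncoupled`. Real recordings would additionally call for surrogate testing, which is omitted here.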

Besides theta–gamma interactions, other cross-frequency couplings have been shown to support working memory processes. In monkeys, alpha–gamma coupling in the fronto-parietal network is sensitive to changes in working memory load (Pinotsis et al., 2019). In humans, interactions between visual and fronto-parietal theta and alpha/gamma, and between alpha and beta/gamma oscillations are enhanced during working memory maintenance, and reflect interactions between sensory and fronto-parietal networks (Siebenhühner et al., 2016). Additionally, cross-frequency coupling increased with working memory load and predicted individual differences in working memory capacity.

Previous sections of this review discussed differences in activity within frequency bands related to musical expertise. However, to our knowledge, there are no reports of whether cross-frequency coupling during working memory varies as a function of music training. While music training-dependent interactions between frequency bands remain to be investigated, we provided evidence that music training may alter neural connectivity in functional networks underlying working memory. During spatial working memory tasks, musicians show enhanced gamma coherence in fronto-parietal (Bhattacharya et al., 2001) and fronto-temporal (Boutorabi and Sheikhani, 2018) networks. During verbal memory tasks, musicians show enhanced intra-hemispheric theta coherence (Cheung et al., 2017; Dittinger et al., 2017). Group differences in spectral coherence complement fMRI findings that musicians engage different but overlapping brain regions during verbal vs. tonal working memory tasks, while non-musicians recruit the same regions regardless of task (Schulze et al., 2011). Thus, musicians may rely on stronger or alternative functional networks when engaged in different working memory tasks. These findings may be supported by future investigations of the effects of music training on cross-frequency coupling during working memory.

## RELATIONSHIPS BETWEEN MUSIC TRAINING AND WORKING MEMORY IN CHILDREN

Several authors have reported that development of working memory may serve as a mechanism for emergence of cognitive abilities and other developmental outcomes (see Kraus et al., 2012). For example, there are reports of associations between working memory performance, music training or aptitude, and developmental outcomes such as neural encoding of speech (Strait et al., 2011, 2012; Christiner and Reiterer, 2018; Ireland et al., 2018), as well as cognitive skills related to reading abilities (Banai and Ahissar, 2013; Degé et al., 2015).

Behavioral investigations of children have shown that music training and musical aptitude are associated with enhanced auditory working memory. For example, in a longitudinal study of 6–8-year-old children, half of the sample was randomly assigned to biweekly keyboard training for 6 weeks, while the other half received no training. Afterward, only the training group demonstrated a significant improvement in working memory capacity, measured with the backward digit span task (Guo et al., 2018). Advantages in working memory capacity have also been observed in school-aged children who received at least 6 months of music lessons (Degé and Schwarzer, 2017), as well as pre-schoolers who received 1 year of violin instruction (Fujioka et al., 2006). Bergman Nutley et al. (2014) measured working memory capacity every 2 years on two or three occasions among participants aged 6–25 years. They reported that musical practice was associated with better working memory capacity at each timepoint, and the increase was proportional to the hours of weekly musical practice, suggesting a dose–response relation between musical practice and working memory capacity.

The findings of cross-sectional studies of musical training and working memory in children are mixed. One report showed a musician advantage for auditory but not visual working memory (Strait et al., 2012), while other reports showed no effect of music training on working memory (Banai and Ahissar, 2013; Sachs et al., 2017). It should be noted, however, that while Sachs et al. (2017) reported no effect of music training on working memory, they did find differences between children with music training and those without training in measures of neural activity in brain regions associated with cognitive processes. In addition, cross-sectional studies have examined the relationship between musical aptitude and working memory capacity. Such investigations have shown that higher working memory capacity is associated with better scores on rhythmic (Strait et al., 2011; Degé et al., 2015; Ireland et al., 2018) and tonal (Christiner and Reiterer, 2018) subtests of musical aptitude in children as young as 5 years.

One issue in studies of relationships between musical training and cognitive outcomes is the direction of causality. For example, those who are predisposed to succeed in music may also be predisposed to demonstrate enhanced cognitive abilities relative to age-matched peers (see Schellenberg, 2011). Alternatively, music training may cause enhancement of cognitive abilities. Longitudinal studies in which participants were randomly assigned to music training or control conditions clearly indicate a causal effect of training on working memory capacity (Fujioka et al., 2006; Degé and Schwarzer, 2017; Guo et al., 2018). Furthermore, evidence for a dose–response relationship between musical practice and working memory supports the view that training, rather than any predisposition, produces changes in working memory capacity (Bergman Nutley et al., 2014). However, associations between working memory capacity and measures of musical aptitude (Strait et al., 2011; Degé et al., 2015; Christiner and Reiterer, 2018; Ireland et al., 2018) suggest that training-related advantages are not independent of innate abilities. Thus, the evidence presented here supports the view that musical training and musical aptitude both contribute to working memory performance among children.

### Working Memory and Neural Oscillations in Children

As indicated previously in this review, working memory is associated with neural oscillations in adults, and this relationship has also been reported in children (**Table 3**). The majority of research on neural oscillations in children has used the delayed match-to-sample test to measure oscillations during memory maintenance. As with work in adults, alpha oscillations have been implicated in working memory maintenance in children. For example, one study of 10–13-year-olds, in which alpha activity was measured during working memory maintenance, showed that the lateralization of alpha power changed with the number of items in working memory (Sander et al., 2012). Specifically, when the number of items increased from low to medium, alpha power became more lateralized, whereas when the number of items increased from medium to high, alpha power became less lateralized. In contrast to the pattern seen in children, the adults showed an increase in alpha power lateralization as the number of items in working memory increased. In another study, Sato et al. (2018) reported higher alpha phase synchrony in 6-year-olds during the retention period of a delayed match to sample task for correct trials than for incorrect trials. Taken together, these reports indicate that the alpha network is engaged in memory maintenance during childhood.

TABLE 3 | This table shows the results of studies on development, working memory, and oscillatory activity.

YA, young adults; OA, older adults; Ado, adolescents; Chil, children; F, frontal; C, central; T, temporal; P, parietal; O, occipital; M, medial; L, left; R, right.

In contrast to the view that alpha oscillations are important for memory maintenance in children, another study showed no difference in alpha coherence, but an increase in theta coherence, during maintenance among 7–8-year-old children (Machinskaya and Kurgansky, 2012). Notably, adults in the latter study showed an increase in alpha coherence, but no change in theta coherence during memory maintenance. Rodriguez-Martinez et al. (2013) also showed that theta oscillations play a role in working memory in children, and that this role diminishes throughout development. Specifically, the authors showed correlations between resting state theta oscillations and composite scores on a nine-subtest measure of working memory, conducted across an age range of 6–26 years. Power spectral density of the theta range negatively correlated with age across the sample, and this relationship, when included in a bivariate model with reaction time scores on the oddball task, accounted for 90% of variability in working memory due to age (Rodriguez-Martinez et al., 2013).

Taken together, these results indicate that alpha and theta oscillations are involved in working memory during childhood, and that oscillatory patterns during working memory processing differ between children and adults. Alpha peak frequency (APF) increases from infancy through adulthood (Klimesch, 1999), and may be a useful marker of brain maturation (Valdés et al., 1990), and cognitive development (Mierau et al., 2016). However, due to divergent findings on the role of alpha (Sander et al., 2012; Sato et al., 2018) and theta (Machinskaya and Kurgansky, 2012) during working memory maintenance, further studies are needed to understand the contributions of different oscillations during working memory maintenance.

It is also possible that gamma rhythms play an important role in working memory maintenance, specifically in regard to its maturation across development. In a recent study, 10–12-year-old children were compared with 15–17-year-old adolescents on the delayed match to sample task. The older group showed increased gamma power during the delay phase, and an increased gamma response to transcranial magnetic stimulation (TMS) applied to the prefrontal cortex. It is worth noting that the TMS-elicited gamma power was positively correlated with working memory capacity (Walker et al., 2019).

Overall, the present discussion shows the utility of measuring oscillations for investigating the neural mechanisms of working memory processing at different stages of development. Because working memory capacity and duration improve over the course of development, and oscillatory patterns for the same abilities vary at different stages of development, studying how oscillations change throughout development could provide insights into the neural mechanisms of working memory. No study, to our knowledge, has been conducted in children to investigate relationships between music training, working memory, and neural oscillations. Thus, a significant gap in our knowledge is elucidation of how music training interventions and measures of musical aptitude might enhance or alter oscillatory activity throughout development.

## RELATIONSHIP BETWEEN MUSIC TRAINING AND WORKING MEMORY IN OLDER ADULTS

In the sections above, we reported that adults with music training outperformed those without music training on several behavioral tasks of working memory. Consistent with those findings, older adults with music training also outperform older adults without music training. For example, Parbery-Clark et al. (2011) compared older musicians and non-musicians on auditory and visual working memory, and the ability to perceive speech in noise. They reported that musicians were significantly better at perceiving speech in noise and performed better in auditory, but not visuospatial, working memory capacity tasks. The study also revealed a linear relationship between auditory working memory and speech in noise performance, suggesting that these two functions are related. Grassi et al. (2018) also reported that older adult musicians outperformed older adult non-musicians on auditory and visuospatial working memory tasks, as well as auditory discrimination, but the groups did not differ on tests of short-term memory. In addition, Amer et al. (2013) reported that older adult musicians outperformed older adult non-musicians on several tests of executive functions, including visuospatial working memory. Taken together, these results reveal similarities and differences between adults and older adults with regard to relationships between music training and working memory. As with young adults, the benefits of music experience on working memory among older adults are not confined to the auditory domain. However, the finding that older musicians perform better than non-musicians on visuospatial working memory is in contrast to reports that young adult musicians and non-musicians do not differ on visuospatial working memory (Hansen et al., 2013). Also, lifelong music training in older adults is not always associated with stronger working memory, as Hanna-Pladdy and MacKay (2011) reported strong correlations between music training and other executive functions, including cognitive flexibility, but not working memory. Some of the discrepancies in findings between young and older adults may be due to music training compensating for age-associated decline in some executive functions.

Recently, several studies have tested the potential of music intervention programs in reducing the deleterious effects of aging on cognition (Bugos et al., 2007; Hars et al., 2013; De Oliveira et al., 2014). In a study conducted by Bugos et al. (2007), older adults participated in individualized piano instruction consisting of motor dexterity exercises and learning music theory. Participants had lessons each week for 6 months and were tested on cognitive and working memory measures across three time points: pre-training, post-training, and following a delay period of 3 months. The experimental group obtained significantly higher scores post-training on the Trail Making Test and Digit Symbols than the untrained controls, indicating an improvement in visual scanning, perceptual speed, and working memory.

### Effects of Music Training and Interventions on Dementia


In addition to the benefits of music training on working memory in older adults, music training across the lifespan may serve as a protective factor against dementia (Verghese et al., 2003; Balbag et al., 2014). In a population-based twin study, Balbag et al. (2014) showed that older twins who played an instrument were 64% less likely to develop dementia than their co-twins. Contrary to these findings, a recent study (Kuusi et al., 2019) tracked the causes of death of classical musicians in Finland and found that musicians were just as likely to suffer from a neurodegenerative disorder as the general population. Taken together, these discrepant findings suggest that further studies are needed to determine whether music training is protective against neurodegenerative disorders.

Some studies have shown improvements in cognition and working memory in patients with dementia after active singing interventions (Särkämö et al., 2014; Maguire et al., 2015; Pongan et al., 2017). In a randomized controlled study, Särkämö et al. (2014) compared the effects of three different interventions on working memory in a group of patients with dementia. Each participant was assigned to one of three 10-week group-based interventions: singing, listening to music, or a usual care control group with physical or social activities. They reported that participants in the singing group showed a temporary improvement in working memory as measured by backward digit span. In contrast, Narme et al. (2014) found no cognitive improvements in patients with dementia after a 4-week music intervention. The discrepant results may be due to differences in intensity or duration of the interventions, suggesting that important parameters for interventions, such as the effective "dose" of training, are not yet well established.

Overall, these studies show that music training may protect against age-related decline in working memory, as well as improve performance among older adults who show decline in working memory. Importantly, music training may also be useful in the prevention and treatment of dementia.

### Neural Oscillations in Older Adults

In comparison with young adults, the neural oscillations of older adults undergo several changes. As described in the paragraphs that follow, these changes may be related to age-related decline in working memory and cognition. However, there is a significant gap in our knowledge of the extent to which music training and music interventions can prevent or restore patterns of oscillatory activity and working memory among older adults. Some of the most often reported age-related changes in brain wave activity are in: (1) activity during resting state recording (slowing of alpha frequency and theta changes); (2) changes in functional connectivity, or coherence; and (3) changes in power (or ERS/ERD) across different frequency bands and electrode sites during active recording. We discuss age-related changes in each of these measures below.

#### Resting State Alpha Frequency Slows Among Older Adults

Some of the most well studied age-related changes occur during resting-state recordings. Resting state alpha frequency increases throughout development (Lindsley, 1939; Stroganova et al., 1999; Marshall et al., 2002) reaching a peak in young adulthood (Chiang et al., 2011) and then slowing in older adults. The slowing of dominant alpha in older adults is a well-known phenomenon that may indicate various cognitive deficits (Obrist, 1954; Duffy et al., 1984; Köpruner et al., 1984; Aurlien et al., 2004; Chiang et al., 2011; Scally et al., 2018). Clark et al. (2004) found that age-related slowing of alpha, as measured by APF, was negatively correlated with working memory capacity. In the elderly, APF slowing is more dramatic in anterior than in posterior recording sites compared to young adults. In addition, variability in APF is negatively correlated with working memory across the lifespan, indicating the effectiveness of APF as a biomarker for working memory, a finding that has been replicated (Grandy et al., 2013). It is possible that normalizing the APF could restore some age-related decline in cognition. Training older individuals to increase APF through neurofeedback may improve processing speed, inhibitory control, and working memory (Angelakis et al., 2007).
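As an illustration of how an alpha peak frequency might be read off a resting-state recording, the sketch below takes the frequency with maximal spectral power inside an alpha search window. The simulated "younger" and "older" traces, their peak frequencies, the sampling rate, and the 8–12 Hz search window are hypothetical choices for the example, not values from the studies cited above.

```python
import numpy as np


def alpha_peak_frequency(signal, fs, alpha_range=(8.0, 12.0)):
    """Return the frequency (Hz) of maximal power within `alpha_range`."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()              # remove DC offset
    window = np.hanning(signal.size)             # taper to limit spectral leakage
    power = np.abs(np.fft.rfft(signal * window)) ** 2
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    in_alpha = (freqs >= alpha_range[0]) & (freqs <= alpha_range[1])
    return float(freqs[in_alpha][np.argmax(power[in_alpha])])


# Hypothetical resting-state traces with different dominant alpha rhythms.
fs = 250                                         # sampling rate in Hz (assumed)
t = np.arange(8 * fs) / fs                       # 8 s gives 0.125 Hz resolution
rng = np.random.default_rng(3)
younger = np.sin(2 * np.pi * 10.5 * t) + 0.5 * rng.standard_normal(t.size)
older = np.sin(2 * np.pi * 9.0 * t) + 0.5 * rng.standard_normal(t.size)

apf_younger = alpha_peak_frequency(younger, fs)
apf_older = alpha_peak_frequency(older, fs)
```

The frequency resolution of this estimate is set by the recording length, so short recordings would limit how small a shift in APF could be detected with this simple approach.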

Alpha peak frequency could also serve as a potential biomarker for mild cognitive impairment (MCI) (Babiloni et al., 2018) as it is lower for the MCI population than for normal older adults, as well as being correlated with MMSE scores. Furthermore, APF in posterior sites is positively correlated with hippocampal volume, which is reduced in the mild cognitively impaired population (Garcés et al., 2013). Likewise, a negative correlation has been found between frontal theta power and hippocampal volume (Grunwald et al., 2001). Overall, studies seeking to investigate the effects of music training on cognition and working memory in older adults should consider the relationship between aging, cognition, and changes in oscillatory activity during resting state, like APF.

Recording of neural oscillations during working memory may elucidate neural changes associated with aging (**Table 4**). Young and elderly adults show different patterns of ERS/ERD in alpha (Sander et al., 2012) and theta (Karrasch et al., 2004) during active recording in working memory tasks, even though the two groups may perform similarly. Older adults also tend to show lower theta power during working memory maintenance (Cummins and Finnigan, 2007; Kardos et al., 2014; Tóth et al., 2014) in comparison with young adults. Older adults also have lower theta (Kardos et al., 2014) and alpha power (McEvoy et al., 2001; Sander et al., 2012) in high working memory load tasks. Moreover, Sander et al. (2012) reported that older adults and children demonstrate similar activity during working memory tasks. On average, both older adults and children have higher alpha power in comparison to young adults for low working memory loads. In children, higher alpha power is likely due to less well-developed neural mechanisms of working memory. In contrast, higher alpha power, and lower theta, in older adults is likely due to compensatory mechanisms.

YA, young adults; OA, older adults; F, frontal; C, central; T, temporal; P, parietal; O, occipital; M, medial; L, left; R, right.

#### Connectivity Decreases Among Older Adults

Age-related reductions of interhemispheric coherence have been observed in the delta, theta, alpha, and beta oscillations (Duffy et al., 1996; Kikuchi et al., 2000) during resting state recording. Global connectivity is also affected by age. Global alpha connectivity during resting state decreases with old age (Scally et al., 2018). In a large sample of 17,722 participants, Vysata et al. (2014) reported an age-related decrease in global theta and alpha coherence. In a longitudinal study, Fjell et al. (2016) found that reduction in connectivity is related to decline in inhibitory control. Although few studies have explored the relationship between age-related declines in functional connectivity and working memory, it has been shown that working memory may improve by restoring theta synchronization in frontotemporal regions through transcranial alternating current stimulation (tACS; Reinhart and Nguyen, 2019). As mentioned previously, young adult musicians show higher theta synchrony during working memory tasks as well as better performance in those tasks (Cheung et al., 2017; Dittinger et al., 2017). Improved coherence as a result of music training may underlie the cognitive and working memory benefits seen in older musicians. Music training may help strengthen some connections that would otherwise become weaker due to aging.
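For readers unfamiliar with coherence as a connectivity measure, the sketch below estimates magnitude-squared coherence between two channels at a single frequency by averaging cross- and auto-spectra over consecutive segments. It is an illustration under stated assumptions, not the analysis pipeline of any study cited above: the 6 Hz theta frequency, 200 Hz sampling rate, segment length, noise level, and synthetic "left", "right", and "unrelated" channels are all hypothetical.

```python
import cmath
import math
import random

def fourier_coeff(segment, fs, freq):
    """Complex Fourier coefficient of one segment at `freq` (Hz)."""
    return sum(s * cmath.exp(-2j * math.pi * freq * i / fs)
               for i, s in enumerate(segment))

def coherence(x, y, fs, freq, seg_len):
    """Magnitude-squared coherence of x and y at `freq`.

    Cross- and auto-spectra are summed over non-overlapping segments;
    the result lies between 0 (no consistent phase relation) and 1.
    """
    sxy, sxx, syy = 0j, 0.0, 0.0
    for start in range(0, len(x) - seg_len + 1, seg_len):
        cx = fourier_coeff(x[start:start + seg_len], fs, freq)
        cy = fourier_coeff(y[start:start + seg_len], fs, freq)
        sxy += cx * cy.conjugate()
        sxx += abs(cx) ** 2
        syy += abs(cy) ** 2
    return abs(sxy) ** 2 / (sxx * syy)

# Synthetic data (assumed): "left" and "right" share a 6 Hz rhythm with a
# fixed phase lag; "unrelated" has an independent phase in every segment.
random.seed(0)
fs, seg_len, n_seg = 200, 200, 20
left, right, unrelated = [], [], []
for _ in range(n_seg):
    shared = random.uniform(0, 2 * math.pi)
    other = random.uniform(0, 2 * math.pi)
    for i in range(seg_len):
        t = i / fs
        left.append(math.sin(2 * math.pi * 6 * t + shared) + random.gauss(0, 0.5))
        right.append(math.sin(2 * math.pi * 6 * t + shared + 0.8) + random.gauss(0, 0.5))
        unrelated.append(math.sin(2 * math.pi * 6 * t + other) + random.gauss(0, 0.5))

coupled = coherence(left, right, fs, 6.0, seg_len)
uncoupled = coherence(left, unrelated, fs, 6.0, seg_len)
print(f"coupled: {coupled:.2f}, uncoupled: {uncoupled:.2f}")
```

A constant phase lag between channels leaves coherence high, because the measure reflects the consistency of the phase relation across segments rather than its value.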

Consistent with the findings reported for young adults, older adults with music training outperform older adults without music training on working memory tasks. These findings indicate that cognitive benefits that occur as a result of music training persist throughout the lifespan. Musical interventions in older non-musicians may also help in ameliorating age-related cognitive decline. Both non-pathological and pathological aging are accompanied by disruptions in brain wave activity, and these disruptions may also reflect working memory decline. While there are no studies showing the effects of music training on oscillatory activity in older adults, studies in young adults show the potential of music training to alter neural oscillations in ways consistent with enhanced working memory. It remains to be determined whether or not music training may improve some of the functional connectivity that is lost during aging. Studies with mice have shown that entrainment of oscillations in the gamma range through visual (Iaccarino et al., 2016) and auditory (Martorell et al., 2019) stimulation improves spatial and recognition memory and reduces the neuropathology associated with Alzheimer's disease. The entrainment of neural oscillations is taken up in the following section.

## ENTRAINMENT OF NEURAL OSCILLATIONS

As discussed throughout this review, music training may alter oscillatory activity and functional networks associated with working memory. These findings suggest that oscillations may be targeted selectively to modulate working memory performance. Some studies have used stimulation techniques to disrupt oscillatory activity underlying WM processes; however, the effects on behavior may differ across individuals. For example, when repetitive TMS is applied over parietal brain regions during memory delays, alpha-tuned stimulation impairs performance for individuals with high baseline WM capacity (Li et al., 2017), or for those who show increased alpha power during the delay period (Hamidi et al., 2009). Transcranial electrical stimulation applied over frontal and parietal regions during consecutive days of WM training has negative effects on WM performance and resting state connectivity (Möller et al., 2017). These findings seem to suggest a causal link between oscillatory rhythms and WM, such that perturbation of these rhythms, particularly during memory delays, negatively impacts maintenance of information over time.

If disruption of oscillatory activity impairs WM processes, can entrainment of specific rhythms enhance behavior? Indeed, a recent study using TMS examined whether tuning TMS to a specific frequency could enhance entrainment of neural networks associated with working memory (Albouy et al., 2017). TMS tuned to theta frequency enhanced performance during an auditory memory manipulation task; furthermore, theta power in correct trials positively correlated with musical experience. Similarly, theta-tuned tACS improved performance on an N-back task (Pahor and Jaušovec, 2018).

Stimulation techniques such as TMS and tACS, while non-invasive, require specialized equipment and expert administration. However, studies have shown that simply listening to or tapping along with the rhythmic structure of music entrains the brain's low-frequency oscillations (Lakatos et al., 2008; Besle et al., 2011). For example, passive listening to musical sequences induces changes within alpha (Bridwell et al., 2017) and increases coupling between delta and high beta frequency ranges (Adamos et al., 2018). Word lists that are sung rather than spoken increase alpha coherence in bilateral frontal areas (Thaut et al., 2005) known to support learning-related processes (Sato et al., 2018). Furthermore, enhanced coherence to rhythmic musical stimuli correlates with improved memory for subsequent speech stimuli (Falk et al., 2017).

Tapping along with music rhythms induces beat-related entrainment (Nozaradan et al., 2015) as well as changes in functional somatosensory networks (Daly et al., 2014) and behavior (Nozaradan et al., 2016; Crasta et al., 2018). Importantly, rhythm entrainment is not limited to the auditory modality (Okawa et al., 2017) and can occur in the absence of an external stimulus. For example, entrained oscillatory activity persists during brief pauses in a rhythmic pattern (Stupacher et al., 2016), and varies not only with rhythm-based predictions of directly observed sequences, but also with memory-based predictions of imagined sequences (Breska and Deouell, 2017; Okawa et al., 2017). As discussed previously in this review, musicians show enhanced oscillatory activity and coherence during working memory delays, which may facilitate improved music-related error processing and pattern predictions shown in musicians compared to non-musicians (Doelling and Poeppel, 2015; Stupacher et al., 2017; Harding et al., 2019).

In addition to rhythm entrainment, oscillatory activity associated with working memory may be targeted by presenting two tones of different frequencies, one frequency to each ear; the resulting percept is of a binaural beat equal to the difference between the two frequencies (Licklider et al., 1950). Recent studies show that listening to binaural beats induces changes in cortical networks associated with information processing. For example, increased gamma and beta power over frontal and central regions in response to binaural beats improved short-term memory of middle-list items on a serial recall task (Jirakittayakorn and Wongsawat, 2017). In addition, accuracy on visuospatial (Beauchene et al., 2016) and verbal N-back (Beauchene et al., 2017) working memory tasks increased in response to binaural beats, which were thought to strengthen cortical networks involved in maintenance and retrieval of task-related information. Even passive listening to binaural beats induced changes in alpha power over frontal, temporal, and parietal lobes, with greater power increases in participants with musical experience (Ioannou et al., 2015). Thus, binaural beat stimulation may be a promising tool to entrain cortical networks involved in working memory.
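The relationship described above, in which the perceived beat equals the difference between the two tone frequencies, can be sketched as follows. The carrier frequencies (400 Hz and 406 Hz), the 8 kHz sampling rate, and the function name are assumptions chosen only to yield a theta-range beat; they do not reproduce the stimuli of any cited study.

```python
import math

def binaural_beat(left_hz, right_hz, duration_s, fs=8000):
    """Build one pure tone per ear and report the expected beat frequency.

    Returns (left_samples, right_samples, beat_hz), where beat_hz is the
    absolute difference between the two tone frequencies.
    """
    n = int(duration_s * fs)
    left = [math.sin(2 * math.pi * left_hz * i / fs) for i in range(n)]
    right = [math.sin(2 * math.pi * right_hz * i / fs) for i in range(n)]
    beat_hz = abs(left_hz - right_hz)
    return left, right, beat_hz

# Assumed example: 400 Hz to the left ear, 406 Hz to the right ear.
left, right, beat_hz = binaural_beat(400.0, 406.0, duration_s=1.0)
print(f"Expected binaural beat: {beat_hz} Hz")
```

Note that neither channel contains energy at the beat frequency; each ear receives a single steady tone, and the beat arises only when the two are combined by the listener's auditory system.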

## SUMMARY AND FUTURE DIRECTIONS

Music training requires storage, manipulation, and integration of complex pitch and temporal sequences. In this way, it shares several features with commonly used measures of working memory. Therefore, it is not surprising that music training is related to enhancements in executive functions, including working memory. This review provides evidence that the study of neural oscillations is important for understanding the neural mechanisms underlying relationships between music training and working memory. Specifically, we provide evidence that measurement of neural oscillations is particularly useful for studying temporal stages of memory and cognition that may occur over spans of seconds to minutes. In addition, we show that a lifespan approach to the study of relationships between music training, working memory, and neural oscillations reveals similarities and differences in working memory and underlying neural events at different stages of life. The following are some of the important specific points raised in this review, as well as suggestions for further investigation.

First, behavioral studies in adults show that benefits of music training are not restricted to auditory working memory, but may extend to the visual or other sensory modalities. In addition, music training may influence working memory capacity more selectively than working memory duration.

Second, music training is related to distinct patterns of modulation of oscillations that are related to encoding, maintenance, and retrieval phases of working memory in adults. **Table 1** lists studies in which measures of neural oscillations were used to: (1) dissociate temporal components of working memory such as encoding, maintenance, and retrieval; (2) investigate working memory processing demands, such as changes in memory load and inhibition; and (3) measure short- and long-range synchronous activity to examine the relative contributions of distributed brain regions involved during working memory tasks. **Table 2** lists studies that examine the degree to which music training influences oscillatory activity during working memory. While we have discussed differences in activity within frequency bands related to musical expertise, to our knowledge there are no reports of whether cross-frequency coupling during working memory varies as a function of music training. Thus, while music training-dependent interactions between frequency bands remain to be investigated, there is evidence that music training may alter neural connectivity in functional networks underlying working memory.

Third, neural oscillations that support working memory and related cognitive processes change across the lifespan, and may serve as targets for music training or selective entrainment. Investigators have not yet studied relationships between music training and neural oscillations in children or older adults. Thus it remains to be determined whether or not relationships between music training and neural oscillations observed in adults will be the same as those observed in children or older adults. It is notable that alpha and theta oscillations are involved in working memory during childhood, and that oscillatory patterns during working memory processing differ between children and adults (**Table 3**). In addition, theta coherence declines with age but is enhanced during working memory maintenance in young musicians compared to non-musicians. Alpha activity, which is associated with inhibition in young adults, is also shown to decline with age, as does inhibitory processing. Accordingly, selective modulation of theta and alpha may show age-related changes in working memory maintenance and distractor suppression, respectively (**Table 4**). Future studies could test this hypothesis by comparing the effects of music training or targeted entrainment on theta maintenance or alpha inhibition in young and older adults.

Finally, recent findings that entrainment of neural oscillations can enhance memory and cognition, and may reverse markers of neuropathology in models of Alzheimer's disease, suggest that further study of the role of neural oscillations in these processes will be necessary to guide development of therapeutic interventions for enhancement of cognition and for treatment of neural disorders across the lifespan. Of importance, music training, or targeted applications of musical stimuli, may serve as natural and non-invasive interventions for altering or entraining neural oscillations.

## AUTHOR CONTRIBUTIONS

All authors contributed to the conception, organization, drafting, and revision of this manuscript.

## FUNDING

This study was supported by the Phyllis M. Taylor Center for Social Innovation and Design Thinking, the Louise & Leonard Riggio Professorship in Social Innovation and Social Entrepreneurship, and Carnegie Professor I Endowments.

## REFERENCES

peak alpha frequency training for cognitive enhancement in the elderly. Clin. Neuropsychol. 21, 110–129. doi: 10.1080/13854040600744839

factors of general cognitive abilities. Neuroimage 79, 10–18. doi: 10.1016/J.NEUROIMAGE.2013.04.059

Training: Perspectives from Psychology, Neuroscience, and Human Development, eds M. Bunting, J. Novick, M. Dougherty, and R. W. Engle, (New York, NY: Oxford University Press), doi: 10.13016/M2GM81P70



**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Yurgil, Velasquez, Winston, Reichman and Colombo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## Mechanisms of Timing, Timbre, Repertoire, and Entrainment in Neuroplasticity: Mutual Interplay in Neonatal Development

Joanne Loewy<sup>1</sup>\* and Artur C. Jaschke<sup>2,3</sup>

<sup>1</sup>The Louis Armstrong Center for Music and Medicine, Mount Sinai Beth Israel, Icahn School of Medicine, New York, NY, United States, <sup>2</sup>Department of Music Therapy, Beatrix Children's Hospital—University Medical Centre, ArtEZ University of the Arts, Groningen, Netherlands, <sup>3</sup>Department of Neonatology and Clinical Neuropsychology, Amsterdam, Netherlands

Neonatal brain development relies on a combination of critical factors, including genetic predisposition, attachment, and the conditions of the pre- and postneonatal environment. Physiological elements of music, silences, and sounds may affect the infant's developing brain in its most vulnerable state, and in the earliest stages of brain development they can enhance vitality. However, little attention has been focused on the integral aspects of the music itself. This article will support research that has hypothesized conditions of music therapeutic applications in an effort to further validate models of neurobehavioral care that have optimized conditions for growth, inclusive of recommendations leading toward the enhancement of self-regulatory behaviors.

#### Edited by:

Paul J. Colombo, Tulane University, United States

#### Reviewed by:

Nadia Justel, Interdisciplinary Laboratory of Cognitive Neuroscience, Argentina Marcela Pena, Pontifical Catholic University of Chile, Chile

#### \*Correspondence:

Joanne Loewy joanne.loewy@mountsinai.org

Received: 28 September 2019 Accepted: 04 February 2020 Published: 02 March 2020

#### Citation:

Loewy J and Jaschke AC (2020) Mechanisms of Timing, Timbre, Repertoire, and Entrainment in Neuroplasticity: Mutual Interplay in Neonatal Development. Front. Integr. Neurosci. 14:8. doi: 10.3389/fnint.2020.00008

Keywords: music plasticity, neural music mapping, neonatal music mapping, music brain, neonatal music therapy

### INTRODUCTION

Recent research evaluating the effects of music training has identified music learning as a significant contributor to the improvement of cognitive function (Barrett et al., 2013). In particular, music's implicit audio-structural capacity (Lordier et al., 2019) seems to exert notable influence on both developmental outcomes and restorative capabilities that stretch across the lifespan. While numerous studies have focused on outcomes of young children (Trainor et al., 2012; Moreno et al., 2011) and aging adults (Johnson et al., 2011), few have considered the impact of the pre-natal 'sound' environment in terms of ritual and context, or the culturally relevant conditions and applications that organized purposeful sounds and music may stimulate. Elements of sound in and out of music, and their structured or disorganized contexts, can be considered either as disruptive or as viable enhancements to the Neonatal Intensive Care Unit (NICU) experience (Stewart and Schneider, 2000; Rossetti and Canga, 2013), particularly as the conditions of the infant's brain in its most vulnerable state are at stake. The impact that the physiological elements of music, silences, and sounds can make in the earliest stages of brain development is worth consideration. This article will support research that has hypothesized conditions of music therapeutic applications, validating models of neurobehavioral care that have provided optimal conditions for growth, inclusive of recommendations leading toward the enhancement of self-regulatory behaviors. Developmental markers based on research and clinical experiences related to current quantitative data of musical elements have been sparse. Clinical experiences substantiated through the First Sounds: Rhythm, Breath and Lullaby research (Loewy et al., 2013) that led to a model of practice and training will be explicated within this article (First Sounds, 2016).
Specifically, we will address the optimal contexts for the essential foundational relationships between infants and caregivers based on music's neural plasticity factors. These can be enhanced through attachment conditions, which are fortified through music, to address its role in developing executive functions (EFs) across the lifespan (Sala and Gobet, 2017). Elements contributing to this context include live vital sign entrainment, social interaction via attunement, rhythmic assessment with the institution of indicated repertoire, and sleep/wake patterns inclusive of aural transition capacity.

## FACTORS OF INFLUENCE

#### Genetic Predisposition

Multiple influences, such as auditory, visual, or somatosensory factors, show a trend toward influencing brain dynamics in young clinical populations, especially in prematurely born infants in the NICU. These factors tend to translate into later childhood and even adulthood expression (Galvan, 2018). Some of these trends can be seen as factors of influence in cognitive and overall neural development. Neural differences in social cognition, which can be triggered by hormonal changes or by the infant's experiences of an "unnatural" sensory environment, have been described (Galvan, 2018). Their influence requires more research, as larger cohort studies are still scarce.

Neonatal growth relies on a combination of critical factors that contribute to brain development. In our experience of studying neonates' responses to music therapy stimuli, and in working within dyads and triads involving father, mother, staff and neonate, the factors of influence where music's potential might enhance developmental prowess are worth defining. This is because the neonate's sensitivity to sound and music is distinctly and definably related to neurologic outcomes. These factors include genetic predisposition (Montirosso, 2015) and the conditions of the pre- and postneonatal environment. There are distinct periods of infant development during which a genetically determined microcircuitry of key limbic-hypothalamic-midbrain structures is particularly sensitive to early environmental contexts. These influences directly shape an individual's responsiveness to psychosocial stressors as well as resiliency or susceptibility to difficult circumstances in later life (Leckman et al., 2004).

#### Caregiver Capacity for Secure Attachment and Aural Sound Conditions

Developmental factors are typically attributed to a caregiver's knowledge and capacity for creating opportunities for secure attachment (Lomanowska et al., 2017). Research undertaken first by Bowlby (Bretherton, 1992) and subsequently by his student, Ainsworth, emphasized the significance of secure attachment as a most critical interplay between the biological and environmental conditions of two systems, that of the mother (parent) and that of the infant (Bretherton, 1992; Zimmer, 2019).

This interplay, coupled with an aural sensitivity (Lahav and Skoe, 2014) within a context conducive to optimal sound conditions, can foster neural organization. Aural sensitivity is the intricate perception and processing of sounds. These sounds cover a variety of frequencies, and their processing can be traced back, though not exclusively, to the tonotopic map of the primary auditory cortex. Moreover, sensitivity to sound reaches further than the primary auditory cortex and can have implications beyond the temporal lobe, resulting in changes in behavior (avoidance, interest) or emotional responses (e.g., pain).

One study used a repeated-measures research design to examine variables of neonates recorded every 2 min through two 60-min observation periods on each research day (1 h in the morning and 1 h in the afternoon) over 2 days. Thirty-seven preterm infants provided a total of 4,164 observations, which yielded a statistically significant (P < 0.05) relationship between environmental stressors and changes in physiological signals. There were also statistically significant (P < 0.05) relationships between environmental stress and specific stress behaviors. Physiologic stress responses to acoustic events, such as changes in heart rate, intracranial pressure, and oxygen saturation, may have a significant impact on the preterm infant's future neurologic development due to altered perfusion and oxygenation of the brain tissue (Peng et al., 2009). These changes have been shown to influence neurologic development. This study has shown a significant correlation between environmental stressors and physiological signals of bio-dynamical changes.

#### Trauma and Pain

Another factor that has moved in and out of the literature on the aforementioned conditions is trauma. There is a suspected neurobiological mechanism by which maternal unresolved trauma can modify maternal caregiving so as to disrupt the normative development of an infant. This was highlighted in a recent review synthesizing a plethora of research on the impact of childhood adversity and its effects on parenting, which may be perpetuated inter-generationally (Kim et al., 2018).

Music therapist Kristen Stewart developed a model for treating trauma with premature infants and parents, entitled Preventative Approach to Traumatic Experience by Resourcing the Nervous System (PATTERNS). Instituting music therapy in a preventative treatment model that is based on latent human resiliency and trauma renegotiation principles, Stewart's model (Stewart, 2009a) involves integrative mechanisms that influence perception, nurturance, and opportunities for building resilience at all levels within infant, parent, and staff contexts.

An additional factor that affects neonatal neural activity involves the distinct formation of neuronal connections, including dendritic spine development, synaptogenesis, axon myelination, and crucial cortical folding of the neocortex (Dubois et al., 2008). Importantly, developmental changes are determined not wholly by intrinsic prenatal biological factors but by postnatal experiences as well. While axonal guidance, myelination, and axonal elements are thought to be influenced by genetics, the distinct pathways of connections that are retained rather than pruned are part of the axon myelination process as well and are sensitive to the sound environment (Benasich and Ribery, 2018). This brings to the forefront the realization that neonatal neural activity is affected by pain (Anderson et al., 2011). The younger the gestational age of a premature infant, the more vulnerable the infant is to the effects of pain, which can result in hypersensitivity (Fitzgerald and Walker, 2009). Since it is known that sensory afferents are not "hard-wired" but plastic (Ren and Dubner, 2007), and given the wider receptive fields and hypersensitivity of preterm neonates, the pain they experience in this vulnerable time may have profound long-lasting effects. Primary hypersensitivity is more easily provoked in preterm neonates (Johnson et al., 2011). Attention to the neonates' early experience of pain has ramifications for plasticity and neural development. A recent study examined, for the first time, the effect of live lullaby singing (song of kin, selected according to parental preference) on behavioral and psychological pain responses during venipuncture in 38 preterm and full-term infants, with some interesting findings pointing toward higher oxygen saturation and calmer heart rates in the intervention groups as compared to the control group. This could be an important development for future neonatal and young infant pain protocols (Ullsten et al., 2017).

#### Premature Brain Function

Our knowledge of brain functions during the perception and processing of music has increased in the past decade (for review, see Jaschke, 2019). Modern technological advances such as functional Magnetic Resonance Imaging, Diffusion Tensor Imaging, and functional Near Infrared Spectroscopy have made it possible to track and analyze music as a multisensory stimulus as it activates several brain regions in its path (Jaschke, 2019). Music comprises complex stimuli that activate regions beyond the auditory cortex. Among others, it activates the thalamus, hippocampus, and temporal as well as frontal regions, all of which are known to have additional primary functions such as language processing, EFs, and associative memory.

Recent developments in music and brain research support the view of music as a multisensory stimulus and highlight the influence of its multisensory integrative properties on dynamic brain networks (Barrett et al., 2013). More research is still needed to investigate these properties in neurotypical pediatric as well as adult brain function. Multisensory properties are even less studied in the neonatal/preterm brain, and studying them will assist in developing formative hypotheses that may well have bearing on populations of all ages. Neonatal brain access allows for the study of neural plasticity in light of multisensory integration in real time.

The multiple brain areas involved in the perception and processing of music make it a potentially interesting clinical tool in the NICU (Sa de Almeida et al., 2020). Throughout the development of the brain, coupling and synchronization of different brain areas are thought to mediate the assembly and development of neural networks, which in turn contribute to the coordination and plasticity of the brain and play supporting roles in sensorimotor and cognitive functions, including perception, language, learning, executive functioning, and memory (Benasich and Ribery, 2018; Sa de Almeida et al., 2020).

Concerning sensorial influences in the NICU, a specific interrupted process weighs upon auditory brain development, which starts early in gestation. By 23–25 weeks, all structures necessary for hearing, including the cochlea, have developed (McMahon et al., 2012). As such, most infants admitted to the NICU are able to hear, unless they have a congenital anomaly.

From approximately 26 weeks' gestation, fetuses or preterm infants have the capacity to react to auditory stimuli. Sounds a fetus hears within the womb include the mother's heartbeat, respiration, and voice; fetuses show recognition of the mother's voice and, in some circumstances, the father's voice as well (Lee and Kisilevsky, 2014). From 30 weeks' gestation onward, the infant is able to distinguish between varying speech tones and timbres and is also able to process complex auditory sounds. This point likely marks the start of speech and language development (McMahon et al., 2012).

In preterm birth, this process is interrupted. On the one hand, this means that infants are exposed to noise in the NICU that they may not yet be able to process, which could be harmful and has been shown to alter respiratory and cardiac functions (Sa de Almeida et al., 2020). On the other hand, the deprivation of sounds heard in utero could have consequences for auditory brain maturation, affecting speech and language development. A recent study of 136 infants (<30 weeks) hospitalized in either a quiet private-room NICU environment or an open active ward found that infants in private rooms had lower language and motor scores and a trend toward lower cerebral maturation at follow-up at 2 years of age (Pineda et al., 2014). A second study, a literature review of 88 studies that included measurements of auditory environments in the NICU, sought to determine medical, environmental, and sociodemographic factors that predict and further define influential protagonists and antagonists related to infant auditory exposure in the NICU.

This review stratified succinct elements of the sounds in both conditions, the open and private-room NICU, such as noise, mechanical equipment, meaningful words, electronic noise, adult words, distant words, et cetera (Pineda et al., 2017). It was found that there was an average of 3 h more silence in a 16-h period in the private room, which equated to roughly 30% more silence than in the open ward. This study, among other reports (Jobe, 2014; Webb et al., 2015), makes a case for the importance of low-frequency sound exposure for neonates. At the same time, recommendations on NICU sound environments, whether in open or private spaces, should define the specific rationale for implementing any specific modality, inclusive of the timing and frequency considerations of sensory-based interventions. The goal should include strategies that optimize and inform any environmental modifications.

Examining infant sound exposure and seeking opportune experiences for premature infants, inclusive of music, can help ensure that they are exposed to meaningful words (McGrath, 2013), including language or singing that is infant-directed. Sound making and holding can also provide optimal conditions for development (Teckenberg-Jansson et al., 2011).

This delicate balance of activity, inhibition and multimodal integration underpins our capacity to make sense of the world around us as we perceive it through multiple signals. These dynamic processes develop over time through learning, exposure, and experience, continuing through adulthood and beyond (Freeman, 1997). Such mechanisms are part of a complex and demanding process, which can place a fair amount of strain on any neurotypically developing infant, child or adult. Neonates, and especially preterm-born children, face additional challenges aside from the aforementioned difficulties, as their brains and bodies encounter the complexity of these developing networks prematurely (Arnon et al., 2006).

After birth, the brain does not develop as a result of internally driven mechanisms alone. Neural circuitry is shaped by external factors and biogenetic mechanisms as well. External factors acting on a not-yet-ready brain can, therefore, have a significant impact on the development of a child during both childhood and adulthood (Bieleninik et al., 2016; Provenzi et al., 2018). The balance between over- and under-stimulation in understanding and supporting the development of the prematurely born infant brain has been scrutinized both historically and currently. To stimulate the brain and provide the infant with the best possible start, sounds and voices can be used as therapeutic stimuli (Loewy, 2015). The signature sound of the womb is one that early pioneers Van de Carr and Lehrer (1988) examined and famously named a ''prenatal university.'' These sounds are thought to be the most critical aspect of co-regulation for the neonate and mother, and subsequently of family and community adaptation (Van de Carr and Lehrer, 1988). The mother's heart rhythm and her voice pitch, tone and rhythm are hallmarks of an infant's first auditory ''memory'' prior to birth (Webb et al., 2015). Taking this a step further, enhanced responsiveness develops in pro-active music activity with infants that uses exaggerated speech, such as singing, where the tones, pitches and intervals are purposefully arranged and orchestrated to meet the expressions of the neonate. The inclusion of fathers is advisable. Breathing rhythms and tonal/vocal expressions become an active means of co-regulation and form the basis of expressive language and emotional attachment.

Stimulating the premature brain at this stage with entrained sounds, language, or music has to be implemented and coordinated with conscious effort and rationale, as cortical folding is still in flux and a great number of neurons are still migrating to their final cortical destinations (Provenzi et al., 2018). Applied cautiously and consciously, parental and therapeutically trained voices and music can enhance neural development and stimulate neural plasticity, which in turn has the potential to influence cognitive function, brain oscillations and network formation in short- as well as long-term development (Bieleninik et al., 2016; Provenzi et al., 2018; Sa de Almeida et al., 2020).

#### Music and Music Therapy as Neural Stimulation

Early in gestation, at around 16 weeks, the fetus's auditory system is formed. Between the 26th and 30th weeks, the fetus is able to detect and react to sound stimuli. This period in the womb is considered to be a critical period for neurodevelopment (Benasich and Ribery, 2018). Vandormael et al. (2019) discuss how the ''fine-tuning process takes place in the uterus where both internal (e.g., respiration, heart rhythm, and digestion) and external sounds (e.g., voices and music) can be perceived.'' They also cite significant recent research in numerous NICU conditions showing that too much chaotic noise, or too little sound (largely people's voices), may create detrimental or deprived conditions, which can show up as delayed language capacity in toddlerhood.

In a recent study of 272 infants born prematurely, who served as their own controls within a randomized 2-week intervention period in 11 hospitals, vital signs improved when the infants were exposed to entrained music therapy conditions with either parental or music therapist singing. With the song of kin (parent-selected familiar melodies, or melodies improvised to create in-the-moment, meaningful lyrics within a simple melodic construct), or with womb sounds simulated through a quietly entrained Remo ocean disc, sleep patterns were enhanced, and heart rate patterns were more even (regulated) and soothed (promoting sleep; Loewy et al., 2013).

The implementation of entrainment conditions confirms and substantiates that live music in a music therapy context is best indicated. Live music ensures feedback that makes for reliable patterning of a sound-based relationship and a safe environment, as the music is regulated in real time, in a shared moment. In a recent review, results from 512 preterm infants across 15 recent clinical trials using live and recorded maternal voice interventions showed fewer cardiorespiratory episodes and significant improvement of physiologic and behavioral conditions (Filippa et al., 2017).

Exposure to the mother's voice and recordings of her heartbeat sounds has shown auditory cortex improvement (Webb et al., 2015) at 1 month in 40 infants born extremely prematurely (between 25 and 32 weeks' gestation). Newborns were randomized to receive auditory enrichment in the form of audio recordings of maternal sounds (including their mother's voice and heartbeat) or routine exposure to hospital environmental noise. Infants who received the maternal sounds showed strengthened auditory plasticity at 1 month compared with controls. This study again used recorded sounds that were static and did not meet the meter or condition of the premature infant's signals in the here and now. However, the organized sounds (music) and familiarity rendered better outcomes when compared with noxious hospital noise.

A recent study of five infants with congenital heart disease explored the effects of entrainment, delivered through live singing and guitar accompaniment, on physiologic measurements. This single-case withdrawal pilot study examined the effects of music therapy ''entrainment'' on heart rate, respiratory rate, blood pressure and oxygen saturation rate in five infants with congenital heart disease in the cardiac intensive care unit. The infants received music therapy ''entrainment sessions'' before and after heart surgery, consistently 3–5 times a week for up to 3 weeks, and their physiological measures were recorded every 15 s from the start of the music therapy intervention until 20 min after the intervention was complete. Although the outcomes showed improvements from baseline to follow-up when ''entrainment'' was used, the songs implemented, when not informed by parents, did not necessarily consider range, melodic content, or musical elements. The entrainment was matched with the guitar accompaniment (many strings) rather than the singing (single-toned phrase). A metronome was used, which produces a static rhythm and likely does not provide sensitivity to the actual shifts of heart rhythm that one would advantageously follow to match the infants' heart rate changes moment by moment (Yurkovich et al., 2018). Even so, the study had an 80% success rate, whereby four of the five infants experienced decreases in average heart rate and respiratory rate as well as improvements in the derivative of the heart rate signal.
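The contrast drawn above between a static metronome and moment-by-moment entrainment can be illustrated with a minimal sketch. It is not the procedure used by Yurkovich et al. (2018): the heart-rate values, the function names, and the `follow` parameter are all hypothetical and chosen only to show why a fixed tempo drifts away from a changing heart rate while a following tempo stays close to it.

```python
# Illustrative only: the heart-rate values and the "follow" rule are invented
# for this sketch and are not taken from any cited study.

def fixed_metronome(heart_rates, bpm):
    """Return the same tempo for every observation, as a metronome would."""
    return [bpm for _ in heart_rates]


def entrained_tempo(heart_rates, start_bpm, follow=0.5):
    """Move the tempo a fraction `follow` of the way toward each new reading."""
    tempo = start_bpm
    tempos = []
    for hr in heart_rates:
        tempo += follow * (hr - tempo)
        tempos.append(tempo)
    return tempos


def mean_abs_gap(tempos, heart_rates):
    """Average absolute distance between the musical tempo and the heart rate."""
    return sum(abs(t - hr) for t, hr in zip(tempos, heart_rates)) / len(heart_rates)


# Hypothetical heart rate (beats per minute) slowing over a session.
heart_rates = [160, 156, 150, 146, 140, 138]

static = fixed_metronome(heart_rates, bpm=160)
adaptive = entrained_tempo(heart_rates, start_bpm=160)

print("static gap:", mean_abs_gap(static, heart_rates))
print("entrained gap:", mean_abs_gap(adaptive, heart_rates))
```

In this toy series the entrained tempo stays closer to the heart rate than the fixed tempo does. In practice the following is done by a trained parent or therapist reading the infant's signals in real time, not by a numerical rule.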

#### Music and Executive Functioning

Studies addressing very early exposure to music in (extremely) preterm born infants and a possible outcome-transfer to long-term effects on cognitive functions such as EF or learning are rare. Therefore, it is crucial to understand music itself and how it affects, touches and possibly influences brain and cognitive development, in a broader context. Being able to trace musical stimuli through the brain and linking this activity to a possible cognitive effect provides a means toward understanding neural development across the life span. Research on the development of the human brain has indicated that the brain reaches its full level of maturity at an age of around 30 years (Tomlinson, 2015).

Within these crucial years of development, the frontal lobe is one of the last to reach its full maturity (Tomlinson, 2015). With the prefrontal cortex being the primary seat of EFs, correlated with a possible influence of music (a combination of listening, playing and improvising), this area has the potential to influence the developing neural structures through the very complexity of the music itself and the conditions within which it is provided.

Processing music engages long fiber tracts in the brain and overlaps with regions responsible for EFs, and we expect a stimulation of EF when perceiving and performing music (Klein et al., 2016). Understanding the influence of music on EF can, in turn, provide insight into pathologies later in life that manifest as a form of executive dysfunction. Music therapeutically informed interventions hold the potential to stimulate the brain and support processes of neural plasticity, and can offer an enhanced start in life for (extremely) prematurely born infants, who so often enter this world with physical and cognitive disadvantages stemming from birth trauma and/or deprivation.

### Music Learning and Brain Function

Early exposure to music can have strong influences on cognitive reserve and development later in childhood (Jaschke, 2019). As there is limited research on the development of EFs in preterm-born children (given their age for neuropsychological testing), we have reverse-engineered our arguments from the literature, which indicates improvements in executive functioning in relation to music training and exposure (Moreno et al., 2011; Degé et al., 2011; Jaschke et al., 2018a,b). This approach allows us to trace back the steps potentially contributing to neural development as informed by neural plasticity, utilizing large published data sets, a method that borrows from historical and anthropological research. We can, therefore, build the argument that by understanding EFs in ''neurotypical'' or so-called healthy populations we can trace this development back to influences shaped in an NICU environment (Sa de Almeida et al., 2020).

Frontal brain regions mainly process anticipation and expectation and the execution of musical thought during listening and performance. Furthermore, they relate to music improvisation, which also involves the amygdala and the hippocampus. Of note is the dorsolateral prefrontal cortex (dlPFC), which plays a crucial role in improvisation as well as in learning and memory, connecting deeper brain regions with frontal regions via the thalamocortical-thalamic loop. When it comes to music learning and its likely associated executive functioning, it is crucial to examine the difference between music listening and music playing. Both listening to and, especially, playing music activate a wide array of brain areas. These neural activity networks, however, should be linked in kind to cognitive processes. Mere exposure to music and playing music within a collaborative signal-reading environment can have significantly different outcomes, and understanding the differences matters especially as studies propose causality between music and far transfer domains (Jaschke et al., 2018a).

Linking back to prematurely born infants, to have a deeper understanding of the impact of musical learning and its capacity to affect learning in neurologic development, it may be useful to distinguish four modalities of music participation: (1) passive listening; (2) active listening; (3) music-making; and (4) improvisation. All four, even though consecutively building upon each other, translate uniquely and distinctly to a possible effect on cognition and should, therefore, be considered as influential when analyzing the effects of music on brain function, emotion and behavior.

In premature infancy, the listening conditions might translate well within a quiet or active alert state context, and the musical vocalizing might translate to a contingent singing (Malloch et al., 2012; Shoemark, 2017) or infant-directed singing (Mehr and Krasnow, 2016) exploration. In such contexts, the in-the-moment improvisatory experience might be indicative of a developing repartee and one whereby the music therapist is singing back the premature infant's vocalization, strategically on the exact pitch of the infant's tone. Eventually, the premature infant might absorb the vibration percept and create a new tone or an interval of two tones outside of the tone of the formerly set condition. This may unfold without prompt and occur seemingly suddenly, such as a crying sound with accented phrasing, or a comforting sound wherein the rhythmic meter of the plosive mouthing sounds indicates exploration (Loewy, 1995).

## MUSICAL PARAMETERS

#### Timing

A recent study of 35 neonates born at <32 weeks GA revealed that listening to 8 min of pre-composed music five times per week led to functional brain architectures (shown through fMRI) that were more similar to those of full-term newborns (Lordier et al., 2019). The music was offered on the basis of a nursing assessment of developmental need (assessed on a neonatal behavioral assessment scale) and was timed to the moment: related to the need for sleep, to interactive wakefulness, or to a time of alertness, such as prior to feeding. This provided the authors' evidence for a ''beneficial effect of music on the preterm brain.'' Identifying the timing of a music intervention in this way shows sensitivity to best practice needs and might provide a unique platform for the inclusion of the infant's state and readiness for music, suggesting the importance of dynamic signal reading when providing a music intervention.

However, the music intervention, and its meter and instruments ''composed of a soothing background, bells, harp, and punji (charming snake flute) five times per week from a GA (gestational age) of 33 weeks until the MRI'' may not represent the most efficacious provision of the music itself. Unfortunately, readers are not provided with the music applied, and when one seeks to gather the evidence and arrives at the website<sup>1</sup>, the original music provided for this referenced study is not easily found.

Even if it were easily accessed as an online link, recorded music is an obstructive variable, one that does not rely on the infants' vital signs, and is therefore likely not best indicated. With a recording, the conditions of timing so central to a preterm infant's neural plasticity cannot be sensitively provided, as recorded music does not provide a platform for the entrainment features most critical to the infant's neurologic brain effect. The safest and most nurturing environment might include the mechanism of entrainment, whereby the parent and/or music therapist (whose training in NICU music therapy is inclusive of the assessment of the infants' vital signs) times the intervention to include the rhythm and melody relevant to the parents' cultural background/preference, or adapts it to include musical elements related to these vital conditions. Such cultural aspects necessarily determine whether the music that may be best indicated is meter-less, such as an Indian raga, or a classically formatted work, such as a baroque piece, or a soft rock set, or a folk song, perhaps phrased in 3/4 or 6/8 as a ballad or in lullaby style. Knowing the music and composers that the infant heard in utero, presenting this music live, and eventually constructing it within a suitable framework that parents can utilize as their ''song of kin'' (Loewy, 2015) may ensure continuity of care (by parents, in the NICU, and at home post-discharge), with enhanced reliability that might influence the potential for bonding as an incentive for what some researchers have studied as a ''neurologic brain effect'' in premature infants. The timing of when music is provided, the timing of the music itself (time signature), and how it is applied (entrained) within its natural cultural meter may be the best-indicated elements to consider when designing music therapeutic interventions that seek to strengthen neurological development and plasticity in premature infants.

#### Timbre

The timbre element of sound and music is rarely discussed in health music applications (such as neurology, music medicine and music therapy). Two innovative NICU studies using recorded music did include actual womb sounds of the placenta, combined with female voices. With attention to timbre, these studies incorporated a music program inclusive of the sounds of an intrauterine maternal pulse along with synthesized female voices at 65 dB. Created by Fred Schwartz, an anesthesiologist, the program was named ''Transitions'' (Placenta Music, Inc., Atlanta, Georgia). The first study of this music's effects included four premature infants with bronchopulmonary dysplasia (BPD) on continuous ventilator support. Each infant was exposed to three different 15-min conditions: (1) music played via a Somatron mattress; (2) music played via a recorder at the foot of the infant's crib; and (3) a silent isolation room condition. Music was provided immediately following suctioning. The conditions were counterbalanced and repeated so that each infant received 18 interventions in the order ABC-BCA-CAB (etc.). The investigators observed heart rate, oxygen saturation, arousal state, facial expressions, limb movements, and autonomic states in 13-s windows (with 7 s to recover before starting again) throughout each 15-min period, for 18 trials. Data collection spanned 8 to 21 days. In total, they collected 4,860 pieces of data per infant. Oxygen saturation and sleeping time improved significantly during the interventions (Burke et al., 1995).
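The reported total of 4,860 data points per infant can be reproduced from the sampling scheme described above. The short calculation below is one reading of that scheme, not a breakdown given by Burke et al. (1995): it assumes a single value for each of the six listed measures in every 20-s cycle (13 s of observation plus 7 s of recovery) across 18 trials of 15 min each. The variable names are illustrative.

```python
# Assumption: one recorded value per measure per 20-s cycle; the source reports
# only the total, so this breakdown is an interpretation.

MEASURES = [
    "heart rate",
    "oxygen saturation",
    "arousal state",
    "facial expressions",
    "limb movements",
    "autonomic state",
]
OBSERVE_S = 13   # seconds of observation per cycle
RECOVER_S = 7    # seconds of recovery before the next observation
TRIAL_MIN = 15   # length of each condition in minutes
TRIALS = 18      # interventions per infant

cycle_s = OBSERVE_S + RECOVER_S                 # 20 s per cycle
cycles_per_trial = (TRIAL_MIN * 60) // cycle_s  # 45 cycles in 15 min
data_points = cycles_per_trial * len(MEASURES) * TRIALS

print(data_points)  # 4860
```

Under these assumptions the product 45 × 6 × 18 matches the figure given in the text.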

The second study utilizing Schwartz's placenta soundscape program also used an infants-as-their-own-controls measurement design (Chou et al., 2003). In 30 premature infants with respiratory distress requiring endotracheal intubation, ventilation and endotracheal suctioning, this trial showed statistically significantly higher oxygen saturation during recovery periods compared with the control condition, in which the ''Transitions'' soundscape was not provided. Statistical significance was also reflected in the time taken for oxygen saturation to recover, with quicker recovery when this music was applied. Endotracheal suctioning has potentially disturbing side effects such as dysrhythmias, oxygen desaturation, cerebral blood flow fluctuations, and laryngospasms. Familiar timbres can increase tolerance and build resilience. The potential of watery timbres, reminiscent of womb sounds, and their effect on vital sign enhancement should not be overlooked.

Studies of older infants have shown that the mother's voice is recognizable to the infant who has heard it distinctly and repetitively in the womb before birth (Thompson and Trevathan, 2009). With careful attention, conditions of timbre can be achieved that replicate a womb-like experience inclusive of water or breathing sounds. These are effectively metered to the premature infant's rate of breathing. This can occur during kangaroo or skin-to-skin care, whereby a parent holds their premature infant over their heart (on the left side, Salk, 1973), and can readily pay attention to their infant's breathing patterns.

<sup>1</sup>http://vollenweider.com/en

The perception of timbre recruits multiple brain areas, including regions in the frontal lobe, the thalamus (as central relay and multisensory integration region), the hippocampus, the amygdala and especially the planum temporale, or temporal plane (Jaschke, 2019). The neural function of perceiving timbre is a combination of pitch decoding, pitch identification, the spatial location of sound and the ''color'' (warm, cold, squeaky, round) of the perceived sound.

An audible ''ah'' sound will not only provide a timbral effect of vibration but will additionally offer a means of breath connection, particularly if the parent or music therapist entrains their own breathing to the meter of the infant's breathing, as in ''tonal-vocal holding'' (Loewy et al., 2013). One early study inclusive of entrainment (Ingersoll and Thoman, 1994) used a specially devised ''breathing bear'' that could reflect the breathing rate of the infant in proximity. The study included 36 premature infants; half were randomly assigned the breathing bear device and half a control, a bear device that did not entrain breathing sounds. At 35 weeks, the intervention infants showed slower and more regulated respiration during quiet sleep. At 45 weeks, these same infants showed more quiet sleep and less active sleep. At both ages, only the intervention infants showed a correlation between respiratory regulation and the amount of quiet sleep (Ingersoll and Thoman, 1994). Sleep regulation is a critical mechanism of strong neurologic function that can be influenced by music (Loewy, 2020).

The findings show an effect of positive neurobehavioral development. The authors credit ''entrainment'' effects related to optional stimulation. Central to this effect was the reflection of the infant's own biological rhythms, activated and directed by the infant. Of note, this timbre sound was a water sound set by a machine. It is likely that when a live human being is brought into the response, the essential interplay lays a foundation not only for neurological function but for attachment as well.

#### Repetition: Predictability as Enhancement in Neonatal Music Therapy

The human brain, as it forms its structure in the womb, is reliant on the rhythmic conductor of another: the mother's heartbeat. This ''conductor'' is variable, and the fetus necessarily regulates growth in accordance with a wide range of meters based on the mother's activity level.

The practice of ritual in patterning music and rhythms of daily life inclusive of the sleep-wake cycles of developing infants is critical (Loewy, 2020). It is known that infants recognize familiar voices and melodies, and language patterns. Repetition, in particular as part of a music gestalt, provides space for interaction and safety within the music's spaced cue, as in a breath before a downbeat, or within a final repetitive cadence.

It is best practice to engage premature infants with music that has predictability. One study showed that young infants engage more rhythmically with music than with speech and that the cued timing related to the affective mood of songs vs. speech was linked to greater rhythmic coordination (Zentner and Eerola, 2010). An earlier study (Loewy et al., 2005) comparing chloral hydrate with parents' songs of kin arranged in lullaby format, incorporating repetition, found that the songs led to a quicker, more profound sedation effect than pharmacologic sedation for infants and toddlers requiring EEG.

#### Melody: Simplicity of Range and Recognizable Intervals

It is important when instituting music to seek the melodic organization and sequences of songs that are familiar to the neonate. If the parents are unavailable, it is good to render the range of the mother's and father's voices when providing interventions (e.g., soprano, alto, tenor, baritone, bass). The ''song of kin'' (Loewy et al., 2013; Loewy, 2015) includes a model that assists parents in creating a natural, easily accessed sung melody, most particularly one that has meaning for the parent. The song's significance and relevance can be part of a music psychotherapeutic process. It does not have to be a melody that the neonate heard in utero. It does not have to be related to spiritual or historical aspects of the infant's family. The song should aim to be a favored song, culturally relevant to parental preference, and the melody can be entrained and best applied with a simplified single-line matching of sung vocal phrasing. Accompaniment, if and when applied, can be minimal. A cappella is well suited for neonates.

The primary auditory cortex and secondary auditory cortex are the primary perceivers of incoming auditory information. Wernicke's area and Heschl's gyrus process mimicking and associations, together with pitch intervals and melody (Jaschke, 2019). Music therapy literature has advised on important considerations to keep in mind when implementing best practices of musical melodies for neonates. Notwithstanding cultural nuances, it is useful to consider aspects of caution and simplicity when developing melodies with infants and caregivers: ''Slow tempo, simplicity, minimal number of instruments and harmonics, quiet and stable dynamics: decibel levels not greater than 60–65 dB (A-weighted scale), repetition and consistency; rocking meters, one-octave tonal range maximum, unidirectional melodic contours, with limited changes in pitch direction; emphasis on descending tones to engage relaxation response'' (Stewart, 2009b, p. 127).

#### Song of Kin: Affective Culture as Attachment Incentive

Perceiving, processing and executing music recruit numerous neural areas. In combination, the varying facets of music such as rhythm, melody, and timbre are projected to distinct areas of the brain, which decode the stimuli to create what we understand as music. These areas, however, are not exclusive to music. The temporal and frontal lobes, which are equally involved in language and arithmetic, overlap with the music-related regions (Jaschke, 2019). Each region contributes its own interpretation of the task at hand; stimulating these regions therefore does more than improve understanding of the task itself, as the gains can transfer to other mental functions that share overlapping regions, such as empathy or working memory.

Importantly, the ''song of kin'' is a song set to a simple construct, whereby a holding condition or lull (such as 3/4 or 6/8) is implied through a form that is easily accessible for the parents. It is provided as a developmental tool inclusive of attunement (Bowlby, 1998; Ainsworth, 1989) strategies in the growing relationship between parent and infant. It can be used in an improvisatory fashion when an infant is seeking connection, so as to build attachment, for instance in the key the infant is cooing within (melodic entrainment). It can be used to calm when an infant is fussy (breath entrainment). It can be used as a ritual, providing the safe, assuring conditions of sleep (looping and repeating the cadence to sustain a ''cadential effect''; Loewy, 2009, 2020). In the instance of enhancing sleep, the lyrics can be removed, and vowels will elongate melodies and suggest a slowing down of the mind-body-attention.

With so many play songs and activating musical products for infants and parents on the current market, it is perhaps hard to believe that the most nurturing source and use of music can emanate uniquely from the parent, and be formulated to fit the conditions of the infant's ever-changing physiological state. This was highlighted in a recent study on vocalizations of parents and their influence on eliciting sound-making in preterm infants (Caskey et al., 2011).

As infants do not speak, touch and emotional closeness expressed through vocalizing can forge the seeds of trust and musical repartee (Goulet et al., 1998). The absence of such opportunities within the critical first days of life can increase parental anxiety in the NICU (Franck et al., 2005). As musical expression enhances participatory and opportune moments for inter-relating, enhancing a parent's confidence is essential for bonding. Maternal sensitivity in mothers with preterm infants is less optimal when compared with full-term controls (Forcada-Guex et al., 2006), and early separation from an infant at birth was related in one study to an increase in parents' NICU-related stress (Franck et al., 2005).

Attachment can be the most intimate and personalized condition of mutual interplay (Ainsworth, 1989) when it is enriched with musical conditions that are personalized. Using the infant's name and creating circumstances whereby a parent's singing becomes strengthened and more intense within a response (Loewy, 1995, 2015), such as a silent moment and singing a tone at the same pitch as the infant, or hiding the face and finding it when the infant creates a sound are part of a personalized connecting music soliloquy that prolongs interplay and sustains attention and expectancy factors which are the seeds of brain development.

This is not an instantaneous condition. The experienced clinician or therapist with advanced training deemed fit to work with parents understands that fragile infants have fragile parents (Zimmer, 2019). It is therefore imperative that the music is not simply implemented as a condition that a parent is told is good for the infant. Work in music therapy with mothers (Arnon et al., 2014), fathers and families can lend integral inclusion factors that will take a critical community role within developing families (Ettenberger, 2017).

Als et al. (2012) were among the first to follow preterm infants who participated in an individualized developmental approach to neonatal care, currently named the Newborn Individualized Developmental Care and Assessment Program (NIDCAP). The infants provided with NIDCAP showed improved electroencephalogram coherence and more mature frontal brain structural development as evidenced by magnetic resonance imaging at term-corrected age. There were improved neurobehavioral outcomes at 2 weeks and 9 months corrected age (Als et al., 2012).

Moreover, the RBL model includes the parent prong as one of the most essential treatment areas in working with families in the NICU. In playing for and playing music with parents who experience the circumstances of premature birth, opportunities for soothing and nurturance encourage easeful feelings of release and eventually of empowerment (Haslbeck and Costes, 2011). Acknowledgment of the difficult circumstances can circumvent the rage, shame, doubt and self-mistrust that are common with premature birth. The provision of music therapy for parents will likely lead to better outcomes for their capacity to use music in their new relationship with their infant (Loewy, 2015). The effects of trauma on the premature infant, in terms of its implications within the developing relationship with the parent/s (Als, 2009), are an understudied aspect of arrested or delayed capacity for neuroplasticity (Stewart, 2009a). Stewart addressed the potentially traumatic experiences of infants and parents and recommended the utilization of a preventative music therapy treatment, inclusive of staff, and based on resiliency and renegotiation principles. The six phases of the model are: (a) stabilization; (b) self-regulation; (c) integration of experience/resolution of traumatic memories; deconditioning; (d) establishment of secure social connections: repair and/or development of effective attachment and reciprocity; (e) accumulation of restorative emotional experience; (f) future planning: development of self-care plans and goals (Stewart, 2009a).

Working with trauma in the NICU requires advanced training, such as the incorporation of Peter Levine's Somatic Experiencing, or of Stephen Porges' Polyvagal theory. Aspects of these orientations have been incorporated into the three prong (First Sounds, 2016) music therapy advanced training for NICU music therapists.

## CONCLUSION

The first 1,000 days, from conception to the child's second birthday, are among the most important periods in the development of an individual. During this time, the infant brain is rapidly developing and is extremely sensitive to environmental influences. For preterm infants, born before a gestational age of 37 weeks, and specifically those born very preterm (i.e., before 30 weeks' gestation), the period that should have been spent in the womb is interrupted, often suddenly. Preterm infants often undergo trauma, which may also be experienced by their parents (Aagard and Hall, 2009), as they are admitted to the NICU, where they face many challenges. Among these challenges is the stress of physical and sensorial influences coupled with maternal separation.

Music, as an often complex and cognitively demanding modality, can be over-stimulating, particularly given the developmental fragility of the neonatal brain. Therefore, a conservative, minimalistic "less is better" approach to the music instituted is recommended. Opportunities for music therapy assessment that implement one stimulus at a time, with sensitivity to rhythm, decibel level, and proximity to the infant (at mid-line to encourage fetal positioning), are indicated. Entrainment conditions, where the parent/therapist is trained to follow the infant's cues and clues with breath (air/water timbre qualities) and melody, using small, repetitive short phrases composed of the cultural sequences and sung by familiar voices, have been substantiated (Loewy, 2015; Mondanaro et al., 2016).

Opportunities to treat fragile parents directly with music therapy relaxation techniques may provide a wealth of physiological sensorial conditions that, in turn, may be sensitively and naturally translated to their infants. Adherence to reading the signals of the neonates' readiness or disengagement, and their inactive alert, quiet-alert, or sleep state preferences, will provide key information about the most opportune times for live music that can stimulate interactive activity or, alternatively, induce restful transitions to sleep that will ultimately enhance neonatal brain function. The clinical applications of music are best integrated when decision-making about musical choices is substantiated by cultural, psychological and neurological mechanisms. In this way, clinicians, caregivers and researchers alike can incorporate elements of music that will assist in regulating, enhancing, adapting and fostering the music to meet the cues and clues of the ultimate conductors: the premature infant and parent/s we serve.

#### AUTHOR CONTRIBUTIONS

JL conceived the manuscript concept and created the outline, authoring the introductions and body-support of the hypotheses, based on research on neonatal music therapy at Mount Sinai Beth Israel. AJ co-contributed to the neuroscience theory, enhancing aspects related to music mechanisms in neural plasticity.

#### FUNDING

The Louis Armstrong Center for Music and Medicine, Mount Sinai Health System.


**Conflict of Interest**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Loewy and Jaschke. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
