Can You Hear Me Now? Musical Training Shapes Functional Brain Networks for Selective Auditory Attention and Hearing Speech in Noise

Strait, Dana L; Kraus, Nina

doi:10.3389/fpsyg.2011.00113

ORIGINAL RESEARCH article

Front. Psychol., 13 June 2011

Sec. Auditory Cognitive Neuroscience

volume 2 - 2011 | https://doi.org/10.3389/fpsyg.2011.00113

This article is part of the Research Topic: The relationship between music and language

Dana L. Strait^1,2

Nina Kraus^1,2,3,4,5*

¹ Auditory Neuroscience Laboratory, Northwestern University, Evanston, IL, USA
² Institute for Neuroscience, Northwestern University, Evanston, IL, USA
³ Department of Communication Sciences, Northwestern University, Evanston, IL, USA
⁴ Department of Neurobiology and Physiology, Northwestern University, Evanston, IL, USA
⁵ Department of Otolaryngology, Northwestern University, Evanston, IL, USA

Even in the quietest of rooms, our senses are perpetually inundated by a barrage of sounds, requiring the auditory system to adapt to a variety of listening conditions in order to extract signals of interest (e.g., one speaker’s voice amidst others). Brain networks that promote selective attention are thought to sharpen the neural encoding of a target signal, suppressing competing sounds and enhancing perceptual performance. Here, we ask: does musical training benefit cortical mechanisms that underlie selective attention to speech? To answer this question, we assessed the impact of selective auditory attention on cortical auditory-evoked response variability in musicians and non-musicians. Outcomes indicate strengthened brain networks for selective auditory attention in musicians in that musicians but not non-musicians demonstrate decreased prefrontal response variability with auditory attention. Results are interpreted in the context of previous work documenting perceptual and subcortical advantages in musicians for the hearing and neural encoding of speech in background noise. Musicians’ neural proficiency for selectively engaging and sustaining auditory attention to language indicates a potential benefit of music for auditory training. Given the importance of auditory attention for the development and maintenance of language-related skills, musical training may aid in the prevention, habilitation, and remediation of individuals with a wide range of attention-based language, listening and learning impairments.

“Attention is the holy grail. Everything that you’re conscious of, everything you let in, everything you remember and you forget, depends on it.”

D. Strayer¹

Introduction

The human nervous system is constantly faced with an astounding amount of sensory input. Despite the fact that our brains house over 10 billion neurons with more than 10 trillion synapses, accurate encoding of a complete environmental scene is a functional impossibility. Fortunately, the brain has evolved in ways that permit the modulation of neural activity according to environmental and systemic demands, permitting the selection, efficient encoding, and appropriate behavioral response to the stimuli of greatest biological interest. Selective attention makes this modulation possible, directing the allocation of neural resources to selectively encode one aspect of the environment while excluding competing aspects. Selective attention resolves the competition imposed by a mass of incoming signals through the activation of executive control regions (e.g., prefrontal cortex) to promote increased spiking in neurons that represent the attended object (Desimone and Duncan, 1995). Variability in the activation of brain networks that underlie selective attention can bring about significant behavioral disadvantages, such as attention lapses (Weissman et al., 2006) and symptoms of an attention impairment (Depue et al., 2010).

Recently, scientific pursuits concerning attention have invoked a hotbed of discussion. With the dawn of attention-fracturing devices such as portable music players, texting, and the internet, the act of sustaining attention on a single task may be rapidly fading into oblivion. More than ever, scientists need to determine the neural mechanisms that underlie attention, their behavioral outcomes, and how we might strengthen them with training and life experience. Here, we emphasize the discussion of auditory attention given its importance for language processing and the development and maintenance of language-related skills, such as hearing speech in background noise.

Event-related potentials (ERPs) have provided striking insights into the neuronal underpinnings of selective auditory attention (Hillyard et al., 1973; Woldorff et al., 1993; Coch et al., 2005), especially as it relates to everyday auditory function in noise. Standard experimental procedures imitate listening requirements in noisy environments by presenting separate sound streams to the right and left ears, asking participants to pay attention to one side while ignoring the other. When ERPs to the attended relative to the ignored sounds are compared, the negative deflection occurring at ~100 ms post-stimulus (i.e., the N100) is deeper in amplitude (i.e., more negative). Although more information is needed to precisely define the neuronal mechanisms that drive such outcomes and their malleability with training, development, and life experiences, these findings reveal that attention has the power to modulate early sensory processing.

The brains of musicians may provide insight into neural attention mechanisms and their potential for experience-dependent plasticity (Kraus and Chandrasekaran, 2010). Musical practice and performance require sustained attentional control for the delicate online manipulation of sound and, for ensemble players, to permit coordination with other instrumentalists. Given that musicians traditionally initiate training during early developmental years, attention to sound is regularly practiced during pivotal periods of brain development. Recent evidence from our laboratory indicates enhanced auditory but not visual attention ability in musicians relative to non-musicians, with musicians demonstrating faster reaction times to a target sound than non-musicians, but not to the task’s visual analog (Figure 1A; Strait et al., 2010; see Materials and Methods for task details). This finding may reflect decreased variability in musicians’ sustained auditory attention task performance. Surprisingly little is known about the impact of musical training on the neural correlates of attention. We do know, however, that cortical networks that promote attention to music share considerable overlap with those that underlie general attention in other auditory domains, such as language. In addition to the primary auditory cortex, these sites include the fronto-parietal attention and working memory networks that comprise the prefrontal cortex, the intraparietal sulcus, the supplementary and presupplementary motor areas, and the precentral gyrus (Janata et al., 2002; Kane and Engle, 2002). This functional overlap between attention to language and music corroborates previous results suggesting that a combination of modality-specific and general attention and working memory mechanisms (e.g., the fronto-parietal attention network) contribute to sustained auditory attention (Zatorre et al., 1999; Petkov et al., 2004). 
The prefrontal cortex has been particularly emphasized for its role in sustaining attention: it provides access to recently presented stimuli and directs sensory processing according to behavioral goals – especially in challenging perceptual environments (Kane and Engle, 2002). Although we lack direct evidence for how musical training shapes brain mechanisms that underlie auditory attention performance, it would not be surprising if musical training tuned the brain’s executive control network for auditory processing beyond the music domain – particularly for sustaining attention with minimal variability.

Figure 1. Musicians, auditory attention, and processing speech in noise. We assessed auditory attention, speech-in-noise perception and auditory brainstem function in musicians and non-musicians (Parbery-Clark et al., 2009; Strait et al., 2010). Musicians demonstrated enhanced auditory attention as measured by reaction time (A) and were better able to accurately repeat sentences presented in noise at poorer signal-to-noise ratios than non-musicians (B). Auditory attention performance correlated with speech-in-noise perception across all subjects, with individuals having faster reaction times during a sustained attention task demonstrating better hearing in noise (C). Although both musicians and non-musicians demonstrated robust neural responses to a speech sound when presented in a quiet background, non-musicians’ responses were particularly degraded with the addition of a six-talker babble noise to the background. In both groups, the brainstem response waveform is positively correlated with the acoustic waveform of the stimulus. However, when the stimulus is presented in the presence of background noise musicians’ brainstem responses represent the stimulus more faithfully than non-musicians’ (D). *p < 0.05; **p < 0.01.

While little is known about the neural correlates of attention ability in musicians, it is well established that musical training strengthens cortical and subcortical mechanisms for auditory processing. Despite the fact that neural specializations for music and speech have been established (Zatorre et al., 2002; Brown et al., 2006; Abrams et al., 2010; Rogalsky et al., 2011), there is no doubt that the human brain also recruits shared mechanisms for processing sound in both domains (Koelsch et al., 2002; Patel, 2003; Zatorre and Gandour, 2008; Fedorenko et al., 2009; Slevc et al., 2009). Such shared mechanisms may account, at least in part, for musicians’ structural (Schmithorst and Wilke, 2002; Schneider et al., 2002; Gaser and Schlaug, 2003; Hutchinson et al., 2003; Schlaug et al., 2009) and functional enhancements for general auditory processing that are not constrained to the domain of music but that extend to language (Schon et al., 2004; Marques et al., 2007; Musacchia et al., 2007; Wong et al., 2007; Moreno et al., 2009) and emotional communication sounds (Strait et al., 2009). Neural enhancements are particularly evident in musicians in the context of challenging listening environments, such as in the presence of background noise (Parbery-Clark et al., 2009a) and reverberation (Bidelman and Krishnan, 2010), with musically trained adults demonstrating less noise-induced degradation in the subcortical encoding of speech than non-musicians (Figure 1D). The degree of noise-induced subcortical response degradation is functionally correlated with speech-in-noise perceptual ability, in that individuals with increased subcortical resilience to background noise demonstrate better speech-in-noise perception (Parbery-Clark et al., 2009a).
These findings imply that musicians’ nervous systems are fine-tuned for the extraction of meaning from a complex soundscape, being shaped through their extensive and consistent interactions with organized sound to better exclude competing noise and more accurately encode signals of interest.

The precise neurobiological mechanisms that bring about musical training-induced neuronal enhancements remain undetermined, although strengthened cognitive control over auditory processing, as would be directed by attention, provides a plausible agency (Strait and Kraus, in press; Strait et al., 2010). Increasing evidence has accrued to indicate that musicians more heavily recruit extra-sensory cortical areas associated with attention and working memory, such as the prefrontal, superior parietal, and inferior frontal cortices, during challenging auditory tasks that demand discriminatory alertness (e.g., when subjects are instructed to listen for certain auditory targets) compared to non-musicians (Stewart et al., 2003; Haslinger et al., 2005; Baumann et al., 2008). The prefrontal cortex has been attributed particular importance, being associated with goal-directed behavior and the top-down guiding of sensory processing according to internal states or intentions (Miller and Cohen, 2001). Whereas musically trained and non-trained adults demonstrate equivalent auditory cortex activation for the completion of pitch discrimination and sound recall tasks, musicians more extensively activate parietal and prefrontal extra-sensory networks – indicating more extensive involvement of attention and working memory networks that could facilitate heightened control over sensory processing (Gaab and Schlaug, 2003; Pallesen et al., 2010). Musicians’ recruitment of extra-sensory networks involved in attention and working memory may account for their enhanced performance on auditory tasks such as pitch discrimination (Kishon-Rabin et al., 2001; Parbery-Clark et al., 2009a; Strait et al., 2010), sound recall and hearing speech in noise (Figure 1; Parbery-Clark et al., 2009a, 2011).

That strengthened cognitive control mechanisms guide general auditory processing enhancements in musicians in a top-down manner is particularly viable given recent observations to this effect in animal models, in which auditory training leads to modifications in spectrotemporal tuning curves in the primary auditory cortex that appear to be facilitated by functional connections with the prefrontal cortex (cf. Bajo and King, 2010; Fritz et al., 2010). Even with regard to subcortical auditory plasticity, a primary role has been established for the reciprocal corticocollicular pathway, with training-induced changes in inferior collicular response properties being ablated with the targeted cooling of the cortex (Bajo et al., 2010). This is not surprising given the noted strength of cortical descending pathways in modulating subcortical (i.e., collicular) neuronal response properties (Suga et al., 2002). The resiliency of musicians’ nervous systems for encoding signals of interest in the presence of background noise (Figure 1D) may indicate increased executive control over auditory function, or, in other words, strengthened top-down attentional mechanisms within the primary auditory cortex that guide the resolution of competition imposed by a mass of incoming signals.

As noted, when multiple auditory streams are present in a scene, they compete for cortical representation. Selective auditory attention provides a mechanism for determining which sounds will be most thoroughly processed and brought to awareness, to the exclusion of others. It is unlikely, however, that the human brain is able to invariably maintain attention on a specific sound stream of interest over a sustained period of time. Consistently sustaining attention – with minimal attention lapses – is particularly difficult in input-rich sensory environments, such as when tracking a single individual’s voice amidst other conversations. Accordingly, Weissman and colleagues have demonstrated that brain regions associated with attention routinely demonstrate performance variability during the execution of sustained attention-demanding tasks, decreasing in activity while other brain regions increase in activity. Specifically, variability in the activation of the attention network during task performance has been linked to momentary lapses in attention, with continued activation of prefrontal and parietal regions underlying successful sustained attention performance (Weissman et al., 2006). Variability in the activation of prefrontal control regions is interpreted as the failure to accomplish attention’s goal, being to maximally and consistently enhance the sensory processing of behaviorally relevant stimuli. Decreases in the fronto-parietal network’s activation reduce its suppression of a default – or “daydreaming” – network, which corresponds with poorer attention task performance.

Here, we aimed to define the impact of musical training on neural networks underlying selective auditory attention performance in a natural language-listening environment. In light of the functional importance of sustained fronto-parietal attention network activation combined with musicians’ enhanced reliance on this network compared to non-musicians for the execution of auditory tasks, we asked two questions. First, we asked whether the act of sustaining auditory attention on a target speech stream leads to decreases in auditory-evoked response variability across all participants, especially within the primary auditory cortex and fronto-parietal attention areas. Second, we asked whether this decrease is larger in musicians. We hypothesized that musicians demonstrate less variability in neural responses to speech with auditory attention compared to non-musicians, particularly in prefrontal and parietal cortices, and that this decrease in variability correlates with musicians’ training backgrounds.

Materials and Methods

Participants

All experimental procedures were approved by the Northwestern University Institutional Review Board. Twenty-three normal hearing adults (≤20 dB pure tone thresholds at octave frequencies from 125 to 8000 Hz) between the ages of 18 and 35 participated in this study, for which they provided informed consent. All participants completed an extensive questionnaire addressing family history, musical experience and educational history. Musicians (Mus, N = 11) were self-categorized, began instrumental musical training before age 7 and had consistently practiced for ≥11 years (consistently defined as practicing at least 3 days weekly; M = 16.5, SD = 5.8). Non-musicians (NonMus, N = 12) were self-categorized, had <5 years of formal musical experience throughout their lifespans (M = 1.2, SD = 1.8), began musical training after age 11 and had not played a musical instrument in the 5 years leading up to the experiment. Nine of the 12 NonMus participants had no musical experience at all. Mus and NonMus groups did not differ according to age (F_(1,22) = 0.20, p = 0.66), sex (χ² = 0.35, p = 0.68), non-verbal I.Q. as measured by the Wechsler abbreviated matrix reasoning subtest (F_(1,22) = 0.37, p = 0.55; Harcourt Assessment, San Antonio, TX, USA), or performance on the attention task (as measured by quiz scores, described below) (F_(1,22) = 0.32, p = 0.58).

Speech-in-Noise and Auditory Attention Performance

In order to clarify the relationship between speech-in-noise perception and auditory attention, we assessed these skills in 22 musician and non-musician participants aged 18 to 35 years (NonMus = 14; Mus = 8) using data collected for two separate experiments, the isolated results of which have since been published and are discussed above (Parbery-Clark et al., 2009a,b; Strait et al., 2010). Five of these participants also participated in the electrophysiological paradigm, described below. Speech-in-noise (SIN) perception was measured using the Hearing in Noise Test (Nilsson et al., 1994), in which participants are asked to repeat short sentences presented in speech-shaped background noise from a speaker placed one meter directly ahead. The noise presentation level was fixed at 65 dB SPL and the program adjusted perceptual difficulty by increasing or decreasing the intensity level of the target sentences until the threshold signal-to-noise ratio was determined. Perceptual SIN thresholds were defined as the level difference (in dB) between the speech and the noise presentation levels at which 50% of sentences are correctly repeated.
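The threshold definition above can be illustrated with a simple one-up/one-down adaptive track, which settles near the 50%-correct point. This is only a sketch under stated assumptions: the 65 dB SPL noise level and the 50% criterion come from the text, whereas the step size, starting level, number of trials, the averaging rule, and the function name `run_adaptive_track` are hypothetical and are not taken from the Hearing in Noise Test protocol.

```python
NOISE_LEVEL_DB = 65.0  # fixed noise level stated in the text (dB SPL)


def run_adaptive_track(respond, start_speech_db=65.0, step_db=2.0, n_trials=30):
    """Estimate a speech-in-noise threshold (dB SNR) with a 1-up/1-down track.

    `respond(snr_db)` must return True when the sentence was repeated
    correctly. start_speech_db, step_db and n_trials are illustrative
    assumptions, not values from the published test.
    """
    speech_db = start_speech_db
    visited_snrs = []
    for _ in range(n_trials):
        snr_db = speech_db - NOISE_LEVEL_DB  # level difference: speech minus noise
        visited_snrs.append(snr_db)
        # Correct response: lower the speech level (harder).
        # Incorrect response: raise the speech level (easier).
        speech_db += -step_db if respond(snr_db) else step_db
    # Assumed estimator: mean SNR over the second half of the track.
    tail = visited_snrs[n_trials // 2:]
    return sum(tail) / len(tail)
```

A lower (more negative) returned value corresponds to better speech-in-noise perception, since the listener tolerates a poorer signal-to-noise ratio.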

Auditory attention was assessed using the IHR Multicentre Battery of Auditory Processing’s Auditory Attention subtest (Barry et al., 2010), which measures phasic alertness via reaction times induced by the presence or absence of a cue that occurred with a variable delay (0.5–1.0 s) before a target stimulus. We have previously reported between-group differences using this measure in musicians and non-musicians, with musicians demonstrating enhanced performance compared to non-musicians (Figure 1A; Strait et al., 2010). Participants were instructed to listen for a “beep” (presented at 80 dB SPL) and to press a button on a response box as soon as they heard it. Participants were cued by a second sound (a “siren,” presented at 70 dB SPL) on some trials and were asked not to respond to that cue. Reaction time was measured in milliseconds. Results reported here (Figure 1C) reflect subjects’ average reaction time to the cued stimulus.

Electrophysiology

We employed a paradigm designed by Helen Neville and colleagues that has proven enormously successful for studying neural mechanisms of selective auditory attention in children and adults (Coch et al., 2005).

Stimulus

The evoking stimulus was a six-formant, 170 ms speech syllable synthesized using the Klatt (1980) synthesizer with a 5 ms voice onset time and a level fundamental frequency (100 Hz). The first, second and third formants were dynamic over the first 50 ms (F₁, 400–720 Hz; F₂, 1700–1240 Hz; F₃, 2580–2500 Hz) and then maintained frequency for the rest of the duration. The fourth, fifth and sixth formants were constant throughout the entire duration of the stimulus (F₄, 3300 Hz; F₅, 3750 Hz; F₆, 4900 Hz). The stimulus was presented using NeuroScan Stim2 (Compumedics, Charlotte, NC, USA).

Electrophysiologic recording parameters and procedure

Auditory-evoked potentials were recorded to the speech sound /da/ using a 31-channel tin-electrode cap (Electrocap International, Eaton, OH, USA) in NeuroScan Acquire 4.3 (Compumedics) while participants were seated in a sound-attenuated booth. Single electrodes were placed on the earlobes and on the superior and outer canthi of the left eye, thereby acting as reference and eye-blink monitors, respectively. Contact impedance for all electrodes was under 5 kΩ with less than 3 kΩ difference across channels. Neural recordings were on-line filtered from 0.05 to 100 Hz and digitally sampled at a rate of 500 Hz.

The evoking stimulus was presented in the context of short story recitations through two wall-mounted speakers located 1 m to the left and right of the participant. Participants were asked to attend to one of the two simultaneously presented stories, which differed in direction (left/right speaker), presentation voice (male/female), and story content. Instructions described both the direction of the attended story and its speaker’s sex (listen to the story on your right/left, which will be told by a male/female, and ignore the story presented from the other side by a speaker of the opposite sex). The initial direction of the attended voice was randomized across participants to control for potential advantages or disadvantages of attending to one voice over the other. The evoking stimulus was presented randomly to the left or right (i.e., attended or ignored) sides of the head with a randomized inter-stimulus interval (ISI) that was either 600, 900, or 1200 ms. The stories and the evoking stimulus were presented with a 10 dB difference between the stories (65 dB SPL) and the stimulus (75 dB SPL). The recording was paused every 8 min, during which participants were given one minute to complete a five-question multiple choice quiz regarding the attended story content and one minute to stretch. An average score of ≥4/5 correct answers was required for study inclusion. After each break, the attended story changed directions and participants were asked to change their attended side (left/right) in order to continue with the same voice. The entire recording session lasted 40 min and yielded 600 simultaneously recorded responses in both attended and ignored conditions.
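The randomized probe presentation described above might be scheduled as in the following sketch. The three ISI values, the 170 ms syllable duration, and the left/right assignment come from the text. The function name `make_probe_schedule`, the seed handling, and the reading of the ISI as the gap between one stimulus offset and the next onset are assumptions made for illustration.

```python
import random

ISI_CHOICES_MS = (600, 900, 1200)  # inter-stimulus intervals stated in the text
STIMULUS_MS = 170                  # syllable duration stated in the text


def make_probe_schedule(n_probes, attended_side, seed=None):
    """Return a list of (onset_ms, side, condition) tuples.

    Each probe is sent at random to the left or right speaker and labeled
    "attended" or "ignored" relative to the side the listener is tracking.
    Assumption: the ISI is measured from stimulus offset to the next onset.
    """
    rng = random.Random(seed)
    schedule = []
    onset_ms = 0
    for _ in range(n_probes):
        side = rng.choice(("left", "right"))
        condition = "attended" if side == attended_side else "ignored"
        schedule.append((onset_ms, side, condition))
        onset_ms += STIMULUS_MS + rng.choice(ISI_CHOICES_MS)
    return schedule
```

When the attended story switches sides after a break, the same function can be called again with the new `attended_side`, so that condition labels keep following the attended voice.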

Data processing and analysis

Continuous neural data for attended and ignored conditions were baseline corrected and the removal of eye-blink artifacts was accomplished using the spatial filtering algorithm in NeuroScan Edit 4.3 (Compumedics). Response variability was computed through generation of a variability index (VI) for each subject in each condition, following a procedure described in Smith and Goffman (1998), who applied it to assess variability in speech movements. Continuous files were epoched from −100 to 500 ms, referenced to the presentation of the stimulus (0 ms); epochs demonstrating amplitudes beyond ±100 μV were rejected as muscular artifact and the first 500 artifact-free responses from each participant were subjected to analysis. Epochs were grouped into twenty subsets of 25 individual responses; these 25 individual responses in each subset were then averaged, resulting in 20 averaged waveforms (i.e., subaverages). The VI was determined through calculation of amplitude variances across these subaverages. Specifically, amplitudes were determined for each of the 300 points that made up the evoked response subaverages. Rather than comparing amplitudes across subaverages on a point-by-point basis, we averaged point-by-point amplitudes across 50 equally spaced increments (comprised of six points each), computed the variances in these increments across the subaverages and summed them. This generated a single VI for each subject in both attended and ignored conditions. Although evoked response variability has been previously assessed in humans (Anderson et al., 1991), our method is unique in that it enables the assessment of variance over the entire evoked response, including early evoked potentials that are not observable in individual evoked responses (P1/N1). Because these early components are small, we performed our analysis on small subaverages. All data processing was executed with scripts generated in Matlab 7.5.0 (The Mathworks, Natick, MA, USA).
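The variability index computation described above can be sketched as follows. This is not the authors' Matlab script. It is a minimal Python illustration that assumes epochs are assigned to subsets in consecutive order and that the sample (n − 1) variance is used, neither of which is specified in the text. The function name `variability_index` and its parameter names are hypothetical.

```python
from statistics import mean, variance


def variability_index(epochs, subset_size=25, n_bins=50):
    """Compute a variability index (VI) for one subject in one condition.

    `epochs` is a list of artifact-free single-trial responses, each a list
    of amplitudes of equal length (e.g., 500 epochs of 300 points).
    """
    n_points = len(epochs[0])
    points_per_bin = n_points // n_bins  # 300 // 50 = 6; leftover points are dropped

    # 1. Average consecutive groups of `subset_size` epochs into subaverages
    #    (assumed grouping; any incomplete trailing group is ignored).
    subaverages = []
    for start in range(0, len(epochs) - subset_size + 1, subset_size):
        subset = epochs[start:start + subset_size]
        subaverages.append([mean(column) for column in zip(*subset)])

    # 2. Within each subaverage, average amplitudes over equally spaced bins.
    binned = [
        [mean(sub[b * points_per_bin:(b + 1) * points_per_bin])
         for b in range(n_bins)]
        for sub in subaverages
    ]

    # 3. Take the variance of each bin across subaverages, then sum the bins.
    return sum(variance(bin_values) for bin_values in zip(*binned))
```

With the parameters reported in the text (500 epochs, subsets of 25, 300 points, 50 bins), step 1 yields 20 subaverages and step 3 sums 50 variances, each computed over 20 values. Running the function once on attended epochs and once on ignored epochs gives the two VIs per subject.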

Differences in response variability between attend and ignore conditions were compared for all of the 31 electrode sites using a Repeated Measures ANOVA. Effects at individual electrode sites (Figure 2) were subsequently explored using post-hoc paired and independent samples t-tests for all electrode sites except for F7, F8, O1, O2, and OZ, which did not demonstrate clear responses characteristic of cortical auditory-evoked activity (i.e., the P1–N1–P2–N2 complex; all other electrode sites demonstrated clear responses characteristic of cortical auditory-evoked activity). Relationships among musical practice histories (i.e., age of onset of musical practice, years of musical practice) and cortical variability were examined with Pearson’s correlations (SPSS Inc., Chicago, IL, USA). All results reported herein reflect two-tailed values and normality for all data was confirmed using the Kolmogorov–Smirnov test for equality.
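As an illustration of the paired comparison between conditions at a single electrode site, the sketch below computes a paired-samples t statistic from per-subject variability indices. It is a hand-rolled stand-in for the SPSS procedure named above, the function name `paired_t` is hypothetical, and the p-value lookup (which requires the t distribution) is deliberately left out.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(ignore_vi, attend_vi):
    """Return (t, degrees_of_freedom) for a paired-samples t-test.

    Differences are taken as ignore minus attend, so a positive t indicates
    lower response variability in the attend condition.
    """
    diffs = [ign - att for ign, att in zip(ignore_vi, attend_vi)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # sample SD of the differences
    return mean(diffs) / standard_error, n - 1
```

Each list holds one value per subject in matching order; the resulting t would then be compared against the two-tailed critical value for the returned degrees of freedom.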

Figure 2. Impact of attention on cortical auditory-evoked response variability at individual electrode sites. Electrode sites demonstrating a significant decrease in response variability in the attend relative to the ignore condition are in bold black font. Gray italics denote sites that were not subjected to individual analysis. Auditory-evoked activity recorded from FP1/FP2 demonstrated a decrease in variability with attention in musicians only (see Figure 3). ~p < 0.10; *p < 0.05; **p < 0.01; ***p < 0.001.

Results

Summary of Results

Within the subset of participants who had both measures (as described in Materials and Methods), auditory attention performance correlated with speech-in-noise perceptual ability, with better auditory attention relating to the ability to accurately perceive speech in higher levels of background noise (i.e., at lower signal-to-noise ratios; Figure 1C).

All participants demonstrated less cortical auditory-evoked response variability over a majority of electrode sites in responses to the stimulus when it was presented in the attended compared to the ignored story (Figure 2). Only musicians, however, demonstrated decreased cortical response variability with auditory attention over the prefrontal cortex (Figure 3), a region of particular importance for sustaining attention in challenging perceptual environments (Kane and Engle, 2002). The degree to which attention decreased prefrontal response variability correlated with musical practice histories and is interpreted in the context of musical training’s impact on cortical mechanisms of selective auditory attention.

Figure 3. Cortical auditory-evoked response variability in musicians and non-musicians. (A) 31-Channel headplots for musicians (left) and non-musicians (right) demonstrate the difference in cortical auditory-evoked response variability between ignore and attention conditions, plotting variability across the scalp as a function of attention. Because the difference was calculated by subtracting attend from ignore variability, positive values (red) indicate a decrease in response variability in the attend relative to the ignore condition. Negative values (blue) indicate an increase in response variability in the attend relative to the ignore condition. (B) Musicians demonstrate an increased impact of attention on prefrontal response variability compared to non-musicians. Whereas musicians demonstrate a decrease in prefrontal response variability in the attend relative to the ignore condition, non-musicians do not. **p < 0.01.

Cortical Response Variability Decreases as a Function of Selective Auditory Attention

A 2 × 31 × 2 RMANOVA with condition (attend/ignore) and electrode site as within-subject variables and group (Mus/NonMus) as between-subject variable revealed a main effect of condition on response variability, indicating that cortical response variability varied as a function of attention (F_(1,22) = 10.49, p < 0.005). We also observed an interaction between condition and electrode site, indicating that attention impacted response variability differentially across the scalp (F_(1,22) = 17.08, p < 0.0001), and an anticipated main effect of electrode site (F_(1,22) = 17.49, p < 0.0001). Post hoc paired t-tests demonstrated that, across all participants, response variability decreased to stimuli in the attended compared to the ignored story at all analyzed electrode sites except for seven (for these seven sites, all t₍₂₂₎ < 1.4, all p ≥ 0.2; Figure 2). An overall effect of attention across all participants was not observed for the three prefrontal sites (F_(1,22) = 0.31, p = 0.40).

Effect of Musical Training on Cortical Response Variability

Musicians and non-musicians did not differ based on overall response variability at any electrode site. That is, neither group was more or less variable in auditory-evoked responses to the ignored or to the attended stories individually, indicating no group difference in general auditory-evoked variability. Rather, differences between musicians and non-musicians were observed with regard to the extent to which attention decreased auditory-evoked response variability at prefrontal electrode sites. Specifically, a 2 × 2 RMANOVA with condition (attend/ignore) as within subject variable and group (Mus/NonMus) as between subject variable revealed a significant condition–group interaction at prefrontal electrode sites FP1 and FP2 (F_(1,22) = 10.21, p < 0.005). Post hoc within-group paired t-tests demonstrated that whereas musicians demonstrated decreased response variability over the prefrontal cortex with auditory attention (t₍₁₀₎ = 3.0, p < 0.01), non-musicians did not (t₍₁₁₎ = 1.6, p = 0.2; Figure 3B). Differences in response variability with attention between musicians and non-musicians were not observed for any of the other electrode sites (all t < 1.4, all p > 0.12).

Across all participants with some degree of musical training (N = 14; Mus = 11, NonMus = 3), the age of onset of musical training correlated with the extent to which response variability decreased in responses to the attended relative to the ignored story (r = −0.54, p < 0.05).

Discussion

Here, we substantiate a relationship between auditory attention performance and speech-in-noise perception (Figure 1C) and reveal a novel neural index for selective auditory attention in musician and non-musician adults, consisting of decreased auditory-evoked response variability in attended relative to ignored speech. Across all participants, attention decreased cortical auditory-evoked response variability at central, temporal, and parietal sites (Figure 2), and this effect was equivalent in musicians and non-musicians. Only musicians, however, demonstrated an impact of selective auditory attention on prefrontal evoked activity (Figure 3). These results provide evidence for the power of musical training to shape prefrontal neural activity involved in sustaining auditory attention and may contribute to the definition of a biological mechanism that would facilitate musicians’ advantages in auditory tasks (Kraus and Chandrasekaran, 2010; Strait et al., 2010).

Cortical Auditory-Evoked Response Variability Underlies Selective Auditory Attention

The goal of sustaining attention on a specific task is to reduce moment-to-moment variability in one’s performance. Sustaining attention becomes particularly difficult in the presence of competing stimuli, such as when tracking a single voice amidst a noisy background. In this situation, it is the listener’s goal to absorb the entirety of the attended speaker’s content in order to adequately respond, and lapses in attention result in comprehension gaps that can lead to conversational confusion. Variability in attention performance (i.e., lapses in attention) can also have more drastic consequences: it is responsible for accidents during the operation of mechanical equipment (e.g., cars) and, in educational scenarios, has the potential to diminish the quality of learning that takes place in young brains (Vaurio et al., 2009).

Moment-to-moment behavioral variability has been directly linked with variability in the brain’s extra-sensory evoked activity during task performance (i.e., prefrontal, frontal, and parietal cortices; Carmena et al., 2005; Fox et al., 2005; Weissman et al., 2006). As tasks become more difficult, cortical response variability increases, concurrent with poorer task performance (Vogels et al., 1989). Given that the most frequent analysis technique for electrophysiologic data involves averaging, scientists might regularly overlook a crucially informative neural index for attention and human behavior. This is because evoked potentials traditionally necessitate the averaging of many individual responses to a repeated stimulus in order to maximize that which is consistent across trials (i.e., the average evoked response), effectively minimizing that which is inconsistent and discarding it as noise. This disregarded “noise,” or response variability, is often as large as or even larger than the average response itself (Figure 4; Vogels et al., 1989; Softky and Koch, 1993; Arieli et al., 1996). Arieli et al. (1996) encouraged the revision of what we regard as noise in the nervous system, proposing that in doing so we may discover that “response variability… provide[s] the neuronal substrate for the dependence of sensory information processing on behavioral and conscious states”. The data we present here corroborate Arieli’s suggestion by demonstrating a functional relevance for variability in cortical evoked potentials in humans, serving as an index for selective auditory attention. Further work comparing cortical response variability with more commonly employed techniques for analyzing electrophysiologic recordings is likely to reveal relationships between response variability and both spontaneous and averaged evoked activity, such as average peak amplitudes/latencies and oscillatory activity within different frequency bands.
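
The contrast between averaging and variability can be illustrated with a small simulation. The sketch below is not the variability measure used in this study; it assumes, as a simplification, that each single-trial response is a fixed evoked waveform plus independent Gaussian "ongoing" activity, and every parameter (trial count, waveform shape, noise level) is hypothetical. It computes the conventional across-trial average alongside the across-trial standard deviation that averaging discards.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the simulation is repeatable

N_TRIALS = 200     # hypothetical number of stimulus repetitions
N_SAMPLES = 50     # hypothetical number of time points per response
ONGOING_SD = 2.0   # hypothetical: ongoing activity exceeds the evoked peak

# Hypothetical evoked waveform: a single half-sine peak of amplitude 1.0.
template = [math.sin(math.pi * k / (N_SAMPLES - 1)) for k in range(N_SAMPLES)]

# Each simulated trial = evoked waveform + trial-specific ongoing activity.
trials = [
    [template[k] + random.gauss(0.0, ONGOING_SD) for k in range(N_SAMPLES)]
    for _ in range(N_TRIALS)
]

# Conventional analysis: average across trials at each time point.
average_response = [
    statistics.mean([trial[k] for trial in trials]) for k in range(N_SAMPLES)
]

# Response variability: standard deviation across trials at each time point.
response_sd = [
    statistics.stdev([trial[k] for trial in trials]) for k in range(N_SAMPLES)
]

peak_average = max(average_response)
mean_variability = statistics.mean(response_sd)

print(f"peak of averaged response: {peak_average:.2f}")
print(f"mean across-trial SD:      {mean_variability:.2f}")
```

With these settings the averaged waveform recovers the evoked peak, while the across-trial standard deviation remains larger than that peak, mirroring the observation that the discarded variability can exceed the average response itself.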

Figure 4. Variability in evoked neural activity from intracranial recordings in the cat visual cortex (areas 17 and 18). The local field potential (LFP) and spike discharges of two isolated neurons were simultaneously recorded from a microelectrode in response to repetitive visual stimulation that occurred every 3.5 ms (see Arieli et al., 1996 for further information). Variability in neuronal activity can be seen within (A) trial-by-trial LFPs as well as (B) the spike trains of individual neurons contributing to the LFP.

Although it is possible that cortical evoked response variability stems, at least in part, from stochastic noise (Faisal et al., 2008), evoked response variability can be predicted by deterministic interactions of sensory responses with ongoing spontaneous activity (Arieli et al., 1996; Curto et al., 2009) that can be modulated by an individual’s brain state (Steriade et al., 2001) and cognitive capacity (Benasich et al., 2008). The decreased evoked response variability with selective auditory attention demonstrated here may indicate general changes in ongoing spontaneous activity between attended and ignored states, revealing a novel neural metric for selective auditory attention in behaving humans. Furthermore, group differences as a function of musical training reflect more consistent prefrontal activity in musicians with auditory attention, which may translate into increased control over the sensory competition imposed by competing auditory signals. This implication is particularly relevant given the role of the prefrontal cortex in directing goal-oriented behavior and the top-down shaping of sensory processing according to internal states or intentions (Miller and Cohen, 2001). Further studies coupling behavioral and neural indices of selective auditory attention are needed in order to better define the functional advantage of decreased prefrontal response variability in musicians. Furthermore, simultaneous recording of cortical and subcortical evoked activity may shed light on relationships between prefrontal response variability and subcortical response properties, such as to speech in background noise (Figure 1D).

Musical Training Hones Cortical Mechanisms of Executive Control that are Implicated in Selective Auditory Attention

Our results demonstrate a selective impact of musical training on response variability with attention at prefrontal electrode sites. This outcome contributes to a growing literature suggesting that musical training shapes auditory function by training the brain to more extensively recruit extra-sensory mechanisms affiliated with cognitive control, such as working memory and attention, for the completion of general auditory tasks (Gaab and Schlaug, 2003; Stewart et al., 2003; Haslinger et al., 2005; Baumann et al., 2008; Pallesen et al., 2010). Previous experiments, however, have not explicitly investigated neural mechanisms of auditory attention in musicians but, rather, studied musicians’ brain function during the execution of psychophysical auditory discrimination and memory tasks. Our data provide the first direct evidence for differential brain activation in musicians and non-musicians during selective auditory attention to speech. That these data are observed in an ecologically valid language-listening environment strengthens arguments for musical training’s impact on functional brain networks that underlie language processing.

Our findings may indicate that musicians demonstrate more consistent ongoing (i.e., spontaneous) prefrontal activity during selective auditory attention, compared to non-musicians. As described above, the dynamics of ongoing neural activity convincingly predict variability in cortical evoked responses. Specifically, evoked activity is low when spontaneous activity is low and evoked activity is high when spontaneous activity is high, with spontaneous and evoked activity positively correlating at an impressive r = 0.9 and p < 10⁻¹² (Arieli et al., 1996). Decreased variability in musicians’ responses would imply increased consistency in ongoing prefrontal activity and, given the importance of consistency for sustaining attention (Weissman et al., 2006), provides a biological mechanism that could account for our previously reported advantage for sustained attention task performance in musicians (Figure 1A; Strait et al., 2010).

Distinctive neural activity during selective auditory attention in musicians and non-musicians may be attributed to the musicians’ rehearsal of auditory cognitive mechanisms required for focused musical practice and performance, strengthening top-down contributors to auditory processing (Tervaniemi et al., 2009; Kraus and Chandrasekaran, 2010; Strait et al., 2010). Although the argument can be made for a genetic contributor to structural and functional neural differences between musicians and non-musicians, repeated evidence substantiates that these differences can be modulated, at least in part, by one’s method of musical practice (Seppanen et al., 2007) or instrument of specialization (Pantev et al., 2001; Shahin et al., 2008; Margulis et al., 2009; Strait et al., 2011). Furthermore, data consistently reveal correlations between the extent of neural enhancement observed in musicians and their years of musical practice or age of practice onset (Gaser and Schlaug, 2003; Hutchinson et al., 2003; Wong et al., 2007; Parbery-Clark et al., 2009a,b; Strait et al., 2009). These data, together with the correlation reported here between prefrontal response variability and age of onset of musical practice, suggest a contribution of experience-induced neuroplasticity to musicians’ auditory processing characteristics.

Clinical and Educational Implications

That musical training has the power to shape neural mechanisms underlying selective attention to speech carries substantial implications for educators and clinicians involved in the remediation of attention-based listening and learning impairments. The ability to attend to a target signal and suppress competing noise is a primary concern for child educators and clinicians given its primacy in everyday learning and communication. It is also of concern to those involved in the treatment of aging-induced listening impairment, which may be prevented through the strengthening of auditory cognitive abilities, such as attention (Parbery-Clark et al., 2011). Accordingly, interest in learning to attend has increased in recent years (Tang and Posner, 2009); within the visual domain, outcomes reveal that task-specific training can improve the temporal allocation of attention (Makovski et al., 2008) and, as required by our paradigm here, increase the neural capacity to filter out competing irrelevant input (Dixon et al., 2009; Kelley and Yantis, 2009). Musical training may provide a naturalistic and entertaining means for strengthening auditory cognitive processing through increasing the consistency of prefrontal control over auditory function.

Although improving attention and the ability to tune in to a signal of interest would benefit the general population, the topic of behavioral and neural variability during selective attention has particular relevance for attention deficit/hyperactivity disorder (ADHD). This is because ADHD is characterized by moment-to-moment variability in behavioral performance (Mullins et al., 2005; Vaurio et al., 2009) and neuronal activity (Depue et al., 2010). Furthermore, structural and functional prefrontal anomalies have been associated with the disorder (Hynd et al., 1990; Casey et al., 1997; Filipek et al., 1997; Gilliam et al., 2011) and are reflected in decreased prefrontal activity during attention task performance (Bush et al., 1999; Rubia et al., 1999). Children with ADHD are particularly noted for an inability to suppress the neural processing of competing sensory input (Suskauer et al., 2008), contributing to frequent distraction. Our association between musical training and decreased prefrontal variability in a neuronal mechanism that underlies selective attention to language during the simultaneous suppression of a competing sound stream may suggest musical training as a viable remediation strategy in children with attention impairment. Still, more work should be done to test the efficacy of music as a remedial approach for ADHD. Population studies investigating the prevalence of ADHD in children and adults with musical training, particularly those with a family history of ADHD, could yield interesting insights.

Conclusion

Increasing effort is being expended to define activities that strengthen what might be considered the cornerstone of human perception: attention. While musical training is known to bolster auditory-specific cognitive skills, such as auditory short-term memory and the ability to pull out speech signals from competing background noise, little is known about how musical training strengthens attention; even less is known about how music shapes the neural mechanisms that underlie it. Here, we present the first biological evidence for musical training’s impact on neural mechanisms of selective auditory attention within a language context. Given the high prevalence of developmental attention disorders and their detrimental impacts on educational performance, musical training’s power to shape neural mechanisms that underlie selective attention to speech may be of interest to individuals involved in the habilitation and remediation of attention and attention-based learning impairment.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

The authors wish to thank V. Abecassis for his assistance with data collection, T. Nicol for his contributions to the variability analyses and T. Nicol, A. Parbery-Clark, and K. Chan for their constructive comments on a previous version of this manuscript. This work was supported by NIH F31DC011457-01, NSF 0921275, and a Grammy Foundation Research Grant.

Footnote

Quoted from Matt Richtel, “Your Brain on Computers: Outdoors and Out of Reach, Studying the Brain,” New York Times Aug 15, 2010, p. A1.

References

Abrams, D. A., Bhatara, A., Ryali, S., Balaban, E., Levitin, D. J., and Menon, V. (2010). Decoding temporal structure in music and speech relies on shared brain resources but elicits different fine-scale spatial patterns. Cereb. Cortex. doi: 10.1093/cercor/bhq198. [Epub ahead of print].

Anderson, J., Rennie, C., Gordon, E., Howson, A., and Meares, R. (1991). Measurement of maximum variability within event related potentials in schizophrenia. Psychiatry Res. 39, 33–44.

Arieli, A., Sterkin, A., Grinvald, A., and Aertsen, A. (1996). Dynamics of ongoing activity: explanation of the large variability in evoked cortical responses. Science 273, 1868–1871.

Bajo, V. M., and King, A. J. (2010). Focusing attention on sound. Nat. Neurosci. 13, 913–914.