ECoG high gamma activity reveals distinct cortical representations of lyrics passages, harmonic and timbre-related changes in a rock song

Listening to music moves our minds and moods, stirring interest in its neural underpinnings. A multitude of compositional features drives the appeal of natural music. How such original music, where a composer's opus is not manipulated for experimental purposes, engages a listener's brain has not been studied until recently. Here, we report an in-depth analysis of two electrocorticographic (ECoG) data sets obtained over the left hemisphere in ten patients during presentation of either a rock song or a read-out narrative. First, the time courses of five acoustic features (intensity, presence/absence of vocals with lyrics, spectral centroid, harmonic change, and pulse clarity) were extracted from the audio tracks and found to be correlated with each other to varying degrees. In a second step, we uncovered the specific impact of each musical feature on ECoG high-gamma power (70–170 Hz) by calculating partial correlations to remove the influence of the other four features. In the music condition, the onset and offset of vocal lyrics in ongoing instrumental music was consistently identified within the group as the dominant driver for ECoG high-gamma power changes over temporal auditory areas, while concurrently subject-individual activation spots were identified for sound intensity, timbral, and harmonic features. The distinct cortical activations to vocal speech-related content embedded in instrumental music directly demonstrate that song integrated in instrumental music represents a distinct dimension in complex music. In contrast, in the speech condition, the full sound envelope was reflected in the high gamma response rather than the onset or offset of the vocal lyrics. This demonstrates how the contributions of stimulus features that modulate the brain response differ across the two examples of a full-length natural stimulus, which suggests a context-dependent feature selection in the processing of complex auditory stimuli.

INTRODUCTION
The appreciation for music is a universal human capacity that plays an inspiring role in our individual and social lives. Music is processed in a cascade of steps that lead from the segregation within the auditory stream, the extraction and integration of a variety of acoustic features, to cognitive, memory-related processes that induce personal, often emotional, experiences. Critical structural components of music have been analyzed in studies addressing the processing of pitch (Hyde et al., 2008; Kumar et al., 2011; Nan and Friederici, 2013; Plack et al., 2014), sensory dissonance (Regnault et al., 2001; Perani et al., 2010; Daikoku et al., 2012), timbre (Deike et al., 2004; Goydke et al., 2004; Caclin et al., 2006, 2007), melodic contour (Trainor et al., 2002), key (Janata et al., 2002), mode (Halpern et al., 2008), scale properties, music-syntactic congruity (Koelsch et al., 2002; Sammler et al., 2011, 2013; Jentschke et al., 2014; Kim et al., 2014) and rhythmic aspects (Jongsma et al., 2004; Snyder and Large, 2005; Grahn and Rowe, 2009; Abrams et al., 2011; Schaefer et al., 2011a). Typically, these approaches rely on carefully selected or specifically designed stimulus material that allows one aspect of music to be examined in isolation while controlling for other influences. This approach has provided a large corpus of evidence about associations between specific aspects of music and brain areas. Notably, by design, it does not directly address the confluence of the multitude of musical features and their intrinsic relations. Including this integrative aspect of musical compositions could contribute to a comprehensive and veridical picture of brain responses to music. Brain responses to naturalistic stimulation may differ from those related to controlled stimulation with simplified stimuli, as suggested by evidence from both the visual and the auditory domain (Hasson et al., 2010 or Abrams et al., 2013). Abrams et al. (2013) provided (to our knowledge) the first direct evidence that the between-subject synchronization of a large-scale distributed network including auditory midbrain and thalamus, auditory cortex, parts of frontal and parietal cortex, and motor planning regions was significantly higher when listeners were presented with complex musical stimuli lasting minutes than when they listened to shorter pseudo-musical contexts. Accordingly, a stimulus should be sufficiently long to represent a complex musical context, and the music material should be naturalistic and free of manipulations to approximate ecological validity.
These requirements imply a complex, often unbalanced, stimulus material and the single presentation of one (long) stimulus without a-priori defined chances for repetition and signal averaging. In spite of this challenge for data analysis, the interest in the processing of natural music has recently grown considerably. A number of studies using naturalistic music stimuli examine relations between brain signals and behavioral measures, such as autobiographic salience (Janata, 2009), expressive performance (Chapin et al., 2010), emotion ratings (Mikutta et al., 2012, 2013), or ratings of perceived tension (Lehne et al., 2014). Several approaches that combine neuroimaging and acoustic feature extraction directly investigate the relation between brain signals and the multi-dimensional structure of music (Alluri et al., 2013; Toiviainen et al., 2014) or investigate the intersubject synchronization of brain responses to naturalistic music (Abrams et al., 2013; Potes et al., 2014). Only a few of these studies used the electroencephalogram (EEG), which due to its high temporal resolution is suitable for investigating the dynamics of music on a fine-grained time scale but typically relies on averaging. Apart from Mikutta et al. (2012, 2013), where EEG recordings are related to behavioral measures, one novel approach to analyzing ongoing EEG elicited by natural music stimuli has been proposed in Cong et al. (2012). It identifies EEG components that are common to the majority of subjects and subsequently compares the time course of these components to music features. Common to all approaches mentioned above is that they are only sensitive to effects that occur with a certain degree of (spatial) consistency within the group of subjects.
Electrocorticographic recordings (ECoG) from the brain surface provide additional benefits since their superior signal-to-noise ratio is advantageous for the analysis of single stimulus presentations at the level of single subjects. They combine high temporal resolution with high spatial resolution. Thus, they offer a much higher level of spatial specificity and an extended frequency range compared to scalp-recorded EEG. In the field of speech perception research, ECoG has emerged as a new technique to study the functional cortical organization of speech processing (Pasley et al., 2012; Kubanek et al., 2013; Leonard and Chang, 2014; Martin et al., 2014), while studies on music perception are still rare. A first example of how the time course of sound intensity of a naturalistic music stimulus can be tracked in ECoG features was provided by Potes et al. (2012). Specifically, this study revealed that high-gamma band (70-170 Hz) ECoG activity in the superior temporal gyrus as well as on the dorsal precentral gyrus is highly correlated with the time course of sound intensity in a continuous stream of natural music. A subsequent study by Kubanek et al. (2013) found that high-gamma ECoG activity also tracks the temporal envelope of speech and compared it to the activations related to music, identifying different levels of specificity in an auditory network constituted by the auditory belt areas, the superior temporal gyrus (STG) and Broca's area. Very recently, a new analysis of the same data set identified spatial and causal relationships between alpha and gamma ECoG activity related to the processing of sound intensity (Potes et al., 2014). Considering that sound intensity (a technical proxy for perceived loudness) was tracked in ECoG features with significant robustness, the same data set appears highly promising for a further investigation that takes into account the variety of features available in this natural music stimulus, a rock song.
The goal of the present follow-up analysis was to explore whether music-related variables other than sound intensity can be tracked in ECoG and, if so, how the respective areas of cortical activation compare to those associated with the processing of sound intensity in Potes et al. (2012). Because a naturalistic music stimulus contains different perceptual dimensions that are intrinsically related, it was a critical challenge to differentiate these in the brain response. In addition to the feature of sound intensity that was investigated in the previous studies, we chose four features that relate to different aspects of music. These include the moment-to-moment distinction vocals on/off, a continuous measure of harmonic change probability, a measure related to timbral aspects (spectral centroid), and a rhythm-related measure (pulse clarity) (for details see Materials and Methods).

SUBJECTS AND DATA COLLECTION
We analyzed data from ten subjects (for patients' clinical profiles, see Table 1 in the Supplemental Data). These 10 subjects included seven of the eight subjects who were analyzed in our previous study (Potes et al., 2012), where patients with epilepsy (4 women, 4 men) were instructed to listen attentively (without any other task) to a single presentation of the rock song "Another Brick in the Wall - Part 1" (Pink Floyd, Columbia Records, 1979) while ECoG activity was recorded. We added to this dataset data from three additional subjects who followed the same protocol. In all patients in the present analysis the electrode grid was in the left hemisphere. All subjects gave informed consent to participate in the study, which was approved by the Institutional Review Board of Albany Medical College. None of the subjects had a history of hearing impairment. The total numbers of implanted electrodes were 96, 83, 109, 58, 120, 58, 59, 112, 134, and 98 for subjects S1 to S10, respectively. After removal of channels containing environmental or other artifacts, 86, 82, 104, 56, 108, 57, 53, 93, 110, and 92 channels were left for analysis. Grid placement and duration of ECoG monitoring were based solely on the requirements of the clinical evaluation without any consideration of this study. Each subject had postoperative anterior-posterior and lateral radiographs, as well as computed tomography (CT) scans to verify grid locations. The song was 3:10 min long, digitized at 44.1 kHz in waveform audio file format, and binaurally presented to each subject using in-ear monitoring earphones (12-23.5 kHz audio bandwidth, 20 dB isolation from environmental noise). ECoG signals were referenced to an electrocorticographically silent electrode (i.e., a location that was not identified as eloquent cortex by electrocortical stimulation mapping), digitized at 1200 Hz, synchronized with stimulus presentation, and stored with BCI2000 (Schalk et al., 2004; Schalk and Mellinger, 2010).
In addition, we analyzed data from the same subjects recorded while they listened to four narrated stories that are part of the Boston Aphasia Battery (Goodglass et al., 1983) (for details, see Kubanek et al., 2013).

EXTRACTION OF ECoG FEATURES
Our analysis focused on the high-gamma band. ECoG activity in the high-gamma band has generally been associated with functional activation of the cortex in different domains (Crone et al., 2006). For auditory and speech perception, numerous studies have shown that ECoG high-gamma power modulations over auditory areas provide important information about the spatiotemporal dynamics of sound processing (Edwards et al., 2005; Towle et al., 2008; Pei et al., 2011; Pasley et al., 2012; Potes et al., 2014). We extracted ECoG high-gamma power using the same method as in Potes et al. (2012): high-gamma (70-170 Hz) amplitudes were extracted by first applying a 0.1 Hz high-pass filter and then a common average reference (CAR) spatial filter to the ECoG signals. For every 50 ms window, we estimated a power spectrum from the time-series ECoG signal using an autoregressive (AR) model. Spectral magnitudes were averaged for all frequency bins between 70 and 115 Hz and between 130 and 170 Hz (omitting line noise at 120 Hz).
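The extraction chain described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `high_gamma_power` is hypothetical, the 0.1 Hz high-pass filter is omitted, and a Hann-tapered periodogram is swapped in for the autoregressive spectral estimator purely to keep the example short. Only the common average reference, the 50 ms windows, and the two frequency ranges are taken from the text.

```python
import numpy as np

def high_gamma_power(ecog, fs=1200.0, win_s=0.05):
    """Illustrative high-gamma feature; ecog has shape (n_samples, n_channels).

    Assumptions (not from the paper): periodogram instead of an AR model,
    no 0.1 Hz high-pass filter, non-overlapping windows.
    """
    # Common average reference: subtract the mean over channels at each sample.
    car = ecog - ecog.mean(axis=1, keepdims=True)
    win = int(round(fs * win_s))               # 60 samples at 1200 Hz
    n_win = car.shape[0] // win
    # Cut into consecutive 50 ms windows, i.e., one feature value every 50 ms (20 Hz).
    segs = car[: n_win * win].reshape(n_win, win, -1)
    taper = np.hanning(win)[None, :, None]
    spec = np.abs(np.fft.rfft(segs * taper, axis=1))   # spectral magnitudes
    freqs = np.fft.rfftfreq(win, d=1.0 / fs)
    # Average 70-115 Hz and 130-170 Hz, skipping the 120 Hz line-noise region.
    band = ((freqs >= 70) & (freqs <= 115)) | ((freqs >= 130) & (freqs <= 170))
    return spec[:, band, :].mean(axis=1)       # shape (n_windows, n_channels)
```

With 60-sample windows the frequency resolution is only 20 Hz, which is one practical reason to prefer a parametric (AR) estimator for such short segments.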

SELECTION OF MUSIC FEATURES
From the large number of potential features that characterize a music audio signal, we chose a set of five features that capture salient dynamic features of the stimulus and cover a broad spectrum of structural categories of music. Since the results of Potes et al. (2012) revealed a strong correlation of ECoG high-gamma power fluctuations with the sound intensity of the continuous music stimulus, sound intensity was chosen as the first feature. It is a temporal feature that can be extracted directly from the raw audio signal and can be considered an approximate measure of loudness. The second feature was the logarithmic spectral centroid, which is perceptually related to the complex property of timbre. More specifically, it has been related to perceived brightness of sound in Schubert et al. (2004) and to perceived pitch level in Coutinho and Cangelosi (2011). The third feature was probability of harmonic change, which relates to higher-level musical structure, i.e., to harmonic progression and musical syntax. Pulse clarity, the fourth feature, indicates how easily listeners perceive the underlying rhythmic or metrical pulsation of a piece of music. This feature has been introduced and perceptually validated in Lartillot et al. (2008b) and since then has been used in numerous studies (Eerola et al., 2009; Zentner, 2010; Higuchi et al., 2011; Alluri et al., 2012; Burger et al., 2013). Since an essential characteristic of the music stimulus is the presence of song (lyrics), the fifth feature, vocals on/off, captures the change between purely instrumental passages and passages with vocal lyrics content.
In summary, we chose a description of the audio signal that relates to important basic variables of the perception of music: loudness, timbre, and rhythm. With harmonic change, it also encompasses an abstract high-level property related to the rules of Western major-minor harmony. Finally, with vocals on/off, it also allows us to address the impact of vocals with lyrics in music. For comparison, the identical analysis was applied to the sound files of the speech stimuli in a complementary step.

EXTRACTION OF MUSIC FEATURES
Sound intensity was calculated in Matlab (The MathWorks Inc., Natick, Massachusetts). Vocals on/off was determined manually. All other features were extracted using freely available software (see below). We used the first 125 s of Pink Floyd's "Another Brick in the Wall - Part 1" in the analysis since the last minute of the song is an instrumental afterlude passage with considerably less variation, in particular without any vocal parts. The five features were calculated as described in the following sections.

Sound intensity
The sound intensity of the audio signal was calculated as the average power derived from 50 ms segments of the audio waveform overlapping by 50%. The resulting time course was downsampled to match the sampling rate of 20 Hz of the extracted ECoG high gamma power.
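As a rough illustration of this computation, the following Python sketch derives the average power in 50 ms segments with 50% overlap and then keeps every second frame to arrive at 20 Hz. The function name `sound_intensity` is hypothetical, and plain decimation without an anti-aliasing filter is an assumption; the text does not state how the downsampling was performed.

```python
import numpy as np

def sound_intensity(audio, fs=44100, win_s=0.05, overlap=0.5, target_hz=20.0):
    """Average power per 50 ms frame (50% overlap), decimated to target_hz.

    Hypothetical helper; the decimation step is an assumption.
    """
    audio = np.asarray(audio, dtype=float)
    win = int(round(fs * win_s))                 # 2205 samples at 44.1 kHz
    hop = fs * win_s * (1.0 - overlap)           # may be fractional (1102.5 samples)
    n_frames = int((len(audio) - win) // hop) + 1
    # Round frame starts individually so that timing does not drift.
    starts = np.round(np.arange(n_frames) * hop).astype(int)
    power = np.array([np.mean(audio[s:s + win] ** 2) for s in starts])
    frame_rate = fs / hop                        # 40 frames per second
    step = int(round(frame_rate / target_hz))    # keep every 2nd frame
    return power[::step]
```

With these settings one value is produced every 50 ms, so the result can be aligned sample by sample with a 20 Hz ECoG feature.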

Vocals on/off
The presence of vocals was annotated manually in the audio file. This annotation resulted in a binary function that contained the value 1 for passages with lyrics and 0 otherwise. In the music stimulus there are seven passages with vocal lyrics with an average duration of 4.22 s (±0.77) that are separated by at least 5 s of purely instrumental music. In a complementary analysis, we applied a similar procedure to the speech stimuli. Here, 0 was assigned to passages of silence within the story that exceeded a duration of 400 ms, such as pauses between sentences or phrases, while 1 denoted ongoing speech. In the speech stimulus the duration of speech passages was shorter (mean duration 1.65 s ±0.55) and vocals on/off changes occurred more frequently (30 changes in 100 s). In both stimuli the analyzed data start with the first tone of the song or with the first sentence of the narration, respectively, not including a silent pre-stimulus period.
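A binary function of this kind can be built from a list of annotated intervals, as in the short Python sketch below. The function name `vocals_on_off` and the 20 Hz output rate are illustrative assumptions, and the interval in the usage note is invented rather than taken from the actual annotation.

```python
import numpy as np

def vocals_on_off(intervals, duration_s, fs=20.0):
    """Binary time course: 1 inside annotated vocal passages, 0 elsewhere.

    intervals: list of (start_s, stop_s) pairs from a manual annotation.
    """
    t = np.arange(int(round(duration_s * fs))) / fs
    out = np.zeros(t.shape, dtype=int)
    for start, stop in intervals:
        out[(t >= start) & (t < stop)] = 1    # half-open interval [start, stop)
    return out
```

For example, `vocals_on_off([(1.0, 2.0)], duration_s=3.0)` yields 60 samples, of which the 20 samples between 1 s and 2 s are set to 1.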

Spectral centroid
The centroid of the log-scaled frequency spectrum was calculated for 50% overlapping windows of 50 ms using the implementation in the MIRtoolbox (Lartillot et al., 2008b). The spectral centroid is the amplitude-weighted mean frequency in a window of 50 ms. It is an acoustic measure that indicates where the "mass" of the spectrum is located. The log-scaled centroid was downsampled to match the sampling rate of 20 Hz of the extracted ECoG high gamma power.
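The amplitude-weighted mean frequency of a single window can be sketched in Python as follows. This is a generic illustration and not the MIRtoolbox implementation: the function name `log_spectral_centroid`, the Hann taper, and the choice to take the logarithm of the centroid (rather than computing the centroid on a log-frequency axis) are all assumptions.

```python
import numpy as np

def log_spectral_centroid(frame, fs):
    """Log of the amplitude-weighted mean frequency of one audio frame.

    Generic sketch; MIRtoolbox may differ in windowing and log scaling.
    """
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))
    mag = np.abs(np.fft.rfft(windowed))               # spectral magnitudes
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)   # bin frequencies in Hz
    # "Center of mass" of the spectrum; the small constant avoids division by zero.
    centroid_hz = np.sum(freqs * mag) / (np.sum(mag) + 1e-12)
    return np.log(centroid_hz + 1e-12)
```

Applied to successive 50 ms windows with 50% overlap, this gives a time course that can be downsampled to 20 Hz in the same way as the other features.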

Pulse clarity
Pulse clarity was calculated for windows of 3 s with a 33% overlap using the MIRtoolbox (Lartillot et al., 2008b), then interpolated to match the ECoG sampling frequency of 20 Hz. Pulse clarity is a measure of how strongly rhythmic pulses and their periodicities can be perceived by the listener. It is based on the relative Shannon entropy of the fluctuation spectrum (Pampalk et al., 2002) and has been perceptually validated in Lartillot et al. (2008a) as being strongly related to listeners' perception of the degree of rhythmicity in a piece of music.

Harmonic change
The harmonic change function measures the probability of a harmonic change and detects chord changes. We derived this metric using the Queen Mary plugin for Sonic Visualiser (del Bimbo et al., 2010), which implements an algorithm that was proposed and validated on a selection of rock songs in Harte et al. (2006). The algorithm comprises a segmentation of the audio signal into 50 ms windows, a spectral decomposition of each window, and the assignment of chroma and a tonal centroid to each window. After that, the tonal distance between consecutive frames is calculated based on a hypertoroid model of tonal space proposed by Chew (2000). Figure 2 gives a visual representation of each stimulus' spectrogram, an annotation of lyrics and chords or text, and the time courses of the five extracted music features for a 12 s segment.

ANALYSIS
The five features that we used to describe the music stimulus are not independent of each other, but are correlated with each other to variable degrees (see Figure 1). Only by accounting for this correlation can one attribute a particular ECoG signal to one particular music feature (Kendall et al., 1973). Partial correlation, the post-hoc approach used here, is a way to exert statistical control over variables in a setting where experimental control over the different aspects under investigation is ruled out by design. For two variables of interest x and y and a single interfering variable z, the partial correlation coefficient is given by Equation (1):

$$ r_{xy \cdot z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{\left(1 - r_{xz}^{2}\right)\left(1 - r_{yz}^{2}\right)}} \tag{1} $$

where $r_{xy}$, $r_{xz}$, and $r_{yz}$ denote the pairwise correlation coefficients. Within the framework of linear regression analysis, the partial correlation coefficient can be derived as the correlation of the residuals that are produced if the interfering variable z (that is to be eliminated) is used as a regressor to predict each of the two variables of interest x and y (Abdi, 2007). The partial correlation coefficient is related to multiple linear regression analysis (MLR), which was applied in Schaefer et al. (2009) in a similar setting to decompose EEG responses into evoked response components that relate to specific aspects of music stimuli. Furthermore, regression-based models have been applied in a natural speech context in Power et al. (2012), where the auditory evoked spread spectrum analysis (AESPA) method gives a precise account of the temporal dynamics of the transformation of the speech envelope into an EEG signal in single trials. In Ding and Simon (2012) this is extended to an approach that differentiates further between a range of modulation frequencies of the stimulus signal and subsequently identifies the slow temporal modulations of speech in a broad spectral region (below 1 Hz) as the features that are represented best in the brain response.
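The residual-based formulation mentioned above extends directly to several interfering variables, which is the situation here (one music feature of interest, four to be partialed out). A minimal Python sketch, assuming ordinary least squares with an intercept and using the hypothetical function name `partial_corr`:

```python
import numpy as np

def partial_corr(x, y, Z):
    """Correlation between x and y after regressing Z out of both.

    x, y: arrays of shape (n_samples,)
    Z:    array of shape (n_samples, k) holding the k interfering variables.
    """
    Z1 = np.column_stack([np.ones(len(x)), Z])             # add an intercept column
    res_x = x - Z1 @ np.linalg.lstsq(Z1, x, rcond=None)[0]  # residuals of x given Z
    res_y = y - Z1 @ np.linalg.lstsq(Z1, y, rcond=None)[0]  # residuals of y given Z
    return np.corrcoef(res_x, res_y)[0, 1]
```

In this setting, x would be the high-gamma power of one electrode, y one music feature, and the columns of Z the remaining four features, all sampled at 20 Hz. For a single interfering variable the result coincides with the closed-form first-order partial correlation.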
Importantly, the partial correlation coefficient differs in one respect from the semi-partial correlation/regression coefficient of the multiple linear regression framework: the partial correlation coefficient eliminates the influence of the interfering factor from both variables of interest, not only from one (in the framework of MLR: from the regressor). As a consequence, using the partial correlation coefficient, shared variance that does not cover a large proportion of the total variance, but may still reflect specific relations, is also detected. In a different context, partial correlation has been applied previously in connectivity analysis of EEG recordings: in Marrelec et al. (2006) it was used as a simple but effective method to identify connections between brain areas while accounting for the effects of volume conduction between electrodes. In contrast, here we examine how much each of the five features of music contributes to the sensor-level ECoG recordings in a manner that is independent of the remaining four features. It is important to recognize that both the ECoG features and the extracted music features are autocorrelated, i.e., subsequent samples are not independent of each other. This fact violates the assumptions that underlie the standard tests for significance of correlation. To account for this issue, we assessed the significance of the partial correlation coefficients by applying randomized permutation tests with surrogate data as proposed in Theiler et al. (1992). For each music feature, we generated a surrogate target function by transforming the time domain signal into the frequency domain, randomly permuting its phase spectrum, and reconstructing the time domain signal using the original spectral amplitudes and the permuted phases. After that, we calculated the correlation coefficient between the ECoG feature and this surrogate target function.
We repeated this process 1000 times, which resulted in a distribution of correlation coefficients for the surrogate data. We then asked how likely the observed correlation coefficient was to be produced by this surrogate distribution of correlation coefficients.
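The surrogate procedure can be sketched in Python as below. The function names `phase_surrogate` and `surrogate_p_value` are hypothetical. Several details are assumptions rather than statements from the text: the DC (and, for even lengths, Nyquist) components are left untouched so that the reconstructed signal stays real with the same mean, the test is one-sided, and the p-value uses the common "+1" correction. For brevity the statistic here is the plain correlation coefficient; the partial correlation would be substituted in the actual analysis.

```python
import numpy as np

def phase_surrogate(signal, rng):
    """Surrogate with the original amplitude spectrum and permuted phases."""
    n = len(signal)
    spec = np.fft.rfft(signal)
    amp, phase = np.abs(spec), np.angle(spec)
    # Leave DC (and the Nyquist bin for even n) untouched.
    last = len(spec) - 1 if n % 2 == 0 else len(spec)
    idx = np.arange(1, last)
    new_phase = phase.copy()
    new_phase[idx] = rng.permutation(phase[idx])
    return np.fft.irfft(amp * np.exp(1j * new_phase), n=n)

def surrogate_p_value(ecog_feature, music_feature, n_surr=1000, seed=0):
    """One-sided p-value of the observed correlation under the surrogate null."""
    rng = np.random.default_rng(seed)
    r_obs = np.corrcoef(ecog_feature, music_feature)[0, 1]
    r_null = np.array([
        np.corrcoef(ecog_feature, phase_surrogate(music_feature, rng))[0, 1]
        for _ in range(n_surr)
    ])
    # Fraction of surrogates that reach or exceed the observed value.
    return (1 + np.sum(r_null >= r_obs)) / (1 + n_surr)
```

Because the surrogates keep the amplitude spectrum, they preserve the autocorrelation of the music feature while destroying its temporal alignment with the ECoG signal, which is what makes the resulting null distribution appropriate for autocorrelated data.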
The resulting p-values were corrected for multiple comparisons within all electrodes [false discovery rate (FDR), q < 0.05]. We then plotted the negative logarithm of the corrected p-values for each electrode on each subject's brain model as an indicator of how much brain activity at a particular site was related to a specific acoustic feature. Since we did not observe negative correlation coefficients, there was no need to distinguish between negative and positive correlation.
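The text does not name the specific FDR procedure. Assuming the widely used Benjamini-Hochberg step-up rule, the correction across the electrodes of one subject could look like the following Python sketch (the function name `fdr_bh` is hypothetical):

```python
import numpy as np

def fdr_bh(p_values, q=0.05):
    """Benjamini-Hochberg step-up rule; returns a boolean mask of rejections.

    Assumption: the paper states only "FDR, q < 0.05", not this exact variant.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                        # indices of p-values, ascending
    thresholds = q * np.arange(1, m + 1) / m     # q * rank / m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])        # largest rank meeting its threshold
        reject[order[: k + 1]] = True            # reject all hypotheses up to that rank
    return reject
```

Each entry of `p_values` would correspond to one electrode; electrodes flagged `True` are those treated as significantly correlated with the feature in question.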
Naturally, one would expect that a cortical brain response that tracks features of an auditory stimulus will not respond instantaneously, but with a delay. Accordingly, we examined the channel-wise partial correlation coefficients with time lags up to 300 ms. However, this resulted in cross-correlation sequences that varied only on a very small scale over time and were not conclusive with respect to an optimal time lag, suggesting that a time lag between stimulus and brain response may be evened out by our sampling rate of 20 Hz. For instance, selecting a biologically plausible time lag of 100 ms, based on Kubanek et al. (2013), where the optimal (averaged) time lag for tracking the speech envelope ranged between 86.7 and 89.9 ms, had only a marginal effect on the significance of correlation coefficients, although the magnitude of correlation coefficients varied slightly (but not systematically). An overview of the group-level results for different time lags is depicted in Figure S3. On these grounds it would have been arbitrary to define a fixed time lag for the analysis and, moreover, a chosen time lag would not have been informative. Therefore, we decided to calculate instantaneous correlation coefficients in the present analysis, using this as a neutral or 'null' hypothesis given that no significant estimate of a biologically plausible time lag was obtainable for this data set. For a detailed analysis of latencies, in particular with respect to differences in the processing of different aspects of music, as suggested in Schaefer et al. (2011b), our approach is not appropriate since the dependencies between the five features play a role in calculating the partial correlation coefficients for one music feature and the ECoG signal. This could be a topic for a future investigation, for instance applying methods proposed in Bießmann et al. (2010) or Power et al. (2012).
Since these measures of significance cannot be directly averaged across subjects, to examine the topographical distribution of significant correlations at the group level, we visualized the results as follows: for each subject, we determined electrodes with significant correlation and projected their positions onto the MNI brain. To detect activated electrodes in similar regions, each of these electrodes was represented by a round patch of activation with radius 10 mm centered around its position. These representations were added up for the 10 subjects, resulting in a map showing the topographical overlap of the presence of significant correlation within the group of subjects. Values range from zero (no significant correlation in any of the ten subjects) to ten (significant correlation in all ten subjects). The degree of overlap is determined by the radius around an electrode (10 mm). Since grid placement was determined by clinical requirements and, consequently, varied between patients, we needed to account for the fact that the maximal number of subjects who can contribute to the group-level overlap of activation also varies between brain regions. Therefore, we determined the group-level overlap of grid coverage on the MNI brain, referred to as grid coverage index in the following, for all electrodes. Using the grid coverage index, a normalized group-level overlap in a specific cortical area can be obtained by dividing the (unnormalized) group-level overlap by the grid coverage index for each vertex. However, even the normalized group-level overlap values cannot be used for inferring group-level statistics, for instance to assess differences between brain areas. Nonetheless, this does not affect the primary goal of the present analysis, which is to explore potential differences in one location between features and also between the conditions music and pure speech.
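The overlap and normalization steps can be made concrete with a small Python sketch. All names (`subject_patch_map`, `group_overlap`) are hypothetical, and the patch is approximated by Euclidean distance between electrode position and brain-model vertex, which is an assumption; the original visualization may define patches on the cortical surface differently.

```python
import numpy as np

def subject_patch_map(vertices, electrodes, radius_mm=10.0):
    """True for every vertex within radius_mm of at least one electrode."""
    electrodes = np.asarray(electrodes, dtype=float)
    if len(electrodes) == 0:
        return np.zeros(len(vertices), dtype=bool)
    dist = np.linalg.norm(vertices[:, None, :] - electrodes[None, :, :], axis=2)
    return (dist <= radius_mm).any(axis=1)

def group_overlap(vertices, all_electrodes, sig_electrodes, radius_mm=10.0):
    """Per-vertex subject counts and their ratio.

    all_electrodes / sig_electrodes: one (n, 3) coordinate array per subject,
    holding all analyzed electrodes and the significant ones, respectively.
    """
    coverage = sum(subject_patch_map(vertices, e, radius_mm).astype(int)
                   for e in all_electrodes)      # grid coverage index
    overlap = sum(subject_patch_map(vertices, e, radius_mm).astype(int)
                  for e in sig_electrodes)       # unnormalized group-level overlap
    normalized = np.divide(overlap, coverage,
                           out=np.zeros(len(vertices)), where=coverage > 0)
    return overlap, coverage, normalized
```

Vertices that no subject's grid covers receive a normalized value of zero here; an alternative would be to mark them as undefined so that they are not confused with covered but inactive regions.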
For distinct foci of high degree of group-level overlap, we determined representative coordinates on the MNI brain manually, and derived the corresponding Brodmann areas using the Talairach Atlas daemon. Owing to the variance introduced by the projection of each subject's individual brain onto the MNI brain and to the blurring effect that the above-mentioned procedure of determining group-level overlap may cause, this procedure yields only an approximate localization of cortical activation. Nevertheless, on the scale of the Brodmann area, this level of exactness appears appropriate for comparing the present results with the literature. Figure 1 shows a correlation matrix. For each element in this matrix, the brightness gives the correlation between two particular music features. In the music stimulus vocals on/off is strongly correlated with spectral centroid (r = 0.69) and intensity (r = 0.37), which confirms the necessity for calculating partial correlations. Figure 2 gives a visual representation of each stimulus' spectrogram, an annotation of lyrics and chords or text, and the time courses of the five extracted music features for a 12 s segment, as well as the time course of ECoG high-gamma power measured at one representative electrode of subject S5. The location of the electrode is indicated on the brain model in the right panel.

FIGURE 3 | Grid coverage index: Overlap of grid coverage on MNI brain.
superior temporal gyrus and the ventral parts of the precentral and postcentral gyri. Figure 4 shows the significance values of partial correlation of ECoG high-gamma features with each of the five music features for each individual patient. Significant high-gamma correlations with vocals on/off are present in 9/10 of the subjects, and exceed in spatial extent those of all other acoustic features. In all of these nine patients, significant positive correlations are present in auditory areas around the Sylvian fissure, either confined to a region on the posterior superior temporal gyrus (pSTG) (S1, S2, S4, S6, and S8), or extending also to the anterior part of the STG and dorsally from the Sylvian fissure (S3, S5, S9, and S10). In addition, significant correlation in an isolated area at the dorsal precentral cortex is present in three subjects (S3, S5, and S9). Compared to the effect related to vocals on/off, correlation with sound intensity (after calculating the partial correlation and thereby rendering it independent of fluctuations in the other four acoustic parameters, including vocals on/off) is low, reaching significance only in subjects S4, S5, S7, and S10, and is detected only in a smaller region on the posterior STG. Correlation with spectral centroid is significant only in subjects S5 and S10 and is distributed similarly to the feature vocals on/off, but is spatially less extended. For harmonic change, significant correlation is present in four subjects (S3, S5, S9, and S10) on the posterior STG and in subject S3 in frontal areas. The correlation with pulse clarity reaches significance in only one subject (S6) in a small region on the precentral cortex. Figure 5 depicts the cortical distribution of significant partial correlation of ECoG high-gamma features with each of the five acoustic features for the natural speech stimuli at the level of each individual patient.
Differing from the music condition, the feature that is reflected most consistently within the group is sound intensity, with significant correlation in 7/10 subjects (S1, S2, S3, S4, S5, S9, and S10). In all of them, the focus of correlation is located on the pSTG. Beyond that, significant correlation is present on the inferior/medial temporal gyrus (S1, S2), on the inferior frontal gyrus (S3) and on the precentral cortex (S5). For the feature spectral centroid, significant correlations are present only in three subjects on the superior and medial temporal gyrus. Of these, subject S10 is the only subject with significant correlation for spectral centroid in both the music and the speech condition. For harmonic change, significant correlation is present only in subject S4 on the inferior frontal gyrus (IFG). For pulse clarity, no significant correlation with ECoG high gamma features is present.
The top row of Figure 6 shows the group-level overlap of significant "standard" correlation (Pearson's correlation coefficient without partialing out the other features) of high-gamma ECoG features with each of the five music features, i.e., including influences of the other features, on the MNI brain. Common to all patterns except pulse clarity is a focus of significant correlation in peri-Sylvian areas that is present in all ten subjects for the features sound intensity, vocals on/off, and spectral centroid, and in six subjects for harmonic change. The pattern for pulse clarity is most extended, and shows a large spatial variability of activations that do not overlap in more than three patients. In general, at a descriptive level, the similarity between cortical overlap patterns mirrors the correlation matrix of the music features in that they mainly document the interdependence of musical features rather than allowing differentiation between the processing of specific dimensions of music. The middle row of Figure 6 gives the group-level overlap of significant correlation of high-gamma ECoG features with each of the five music features after the influence of the remaining four features has been removed by calculating partial correlations (see Materials and Methods). The highest degree of overlap is present in the feature vocals on/off, with significant correlation of high-gamma power with vocals on/off in more than seven subjects around the Sylvian fissure, covering the posterior and middle part of the superior temporal gyrus and of the middle temporal gyrus. The point of most consistently detected activations in the present group of subjects is the posterior part of the superior temporal gyrus (9/10 subjects). Furthermore, overlap of significant correlation is present in the precentral gyrus in three subjects.
For all other features, the group-level overlap is considerably lower: for sound intensity, there is a common focus of activation in the anterior peri-Sylvian area in three patients. Locations of significant correlation for harmonic change vary along the STG, with a maximum of three overlapping subjects. Significant correlation with spectral centroid is distributed around the Sylvian fissure, but with minimal inter-individual overlap.
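The contrast between the "standard" and the partial correlation patterns described above can be illustrated with a minimal sketch. It assumes that the partial correlation is obtained by regressing the remaining features out of both the feature of interest and the brain signal and correlating the residuals; the function name `partial_corr` and the synthetic data are illustrative assumptions, not the study's implementation or recordings.

```python
import numpy as np

def partial_corr(y, X, j):
    """Correlation between y and column j of X after regressing the
    remaining columns of X (plus an intercept) out of both."""
    others = np.delete(X, j, axis=1)
    Z = np.column_stack([np.ones(len(y)), others])
    # Residuals of the least-squares fits on the other features.
    res_y = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    res_x = X[:, j] - Z @ np.linalg.lstsq(Z, X[:, j], rcond=None)[0]
    return np.corrcoef(res_x, res_y)[0, 1]

# Synthetic demo (hypothetical data): feature 1 co-fluctuates with
# feature 0, but only feature 0 drives the simulated response y.
rng = np.random.default_rng(0)
n = 5000
f0 = rng.standard_normal(n)
f1 = 0.7 * f0 + 0.7 * rng.standard_normal(n)
X = np.column_stack([f0, f1])
y = f0 + rng.standard_normal(n)

standard_r = np.corrcoef(f1, y)[0, 1]   # inflated by the shared variance
partial_r = partial_corr(y, X, 1)       # close to zero once f0 is removed
partial_r0 = partial_corr(y, X, 0)      # the driving feature survives
```

In this toy case the standard correlation of the non-driving feature is clearly non-zero, whereas its partial correlation is not, which mirrors why the top and middle rows of Figure 6 can differ so markedly.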
The bottom row of Figure 6 shows the group-level overlap of significant correlation for the complementary analysis of speech-only stimuli. The overlap of significant correlation with sound intensity is distributed around the Sylvian fissure with highest values on the middle part of the STG, corresponding to the respective location in the music condition, but with five contributing subjects, compared to three subjects in the music condition. However, for all other features the degree of overlap does not exceed two subjects in any location. Figure S2 shows the group-level overlap depicted in Figure 6, normalized with respect to the grid coverage index depicted in Figure 3. We included only cortical sites with a minimum grid coverage of two subjects. This representation demonstrates that the characteristic patterns of the group-level overlap representation (Figure 6) do not merely reflect the distribution of the grid coverage index, but that the distribution of significant correlation has features that are consistently present in a large proportion of the subjects in which grid coverage is given. Figure S3 shows the group-level overlap of significant correlation for delays of 0, 50, 100, 150, 200, 250, and 300 ms between the time course of the ECoG high gamma power and the music features.
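A normalization of this kind can be sketched as follows. The sketch assumes that the normalized value at a cortical site is the number of subjects with a significant effect divided by the number of subjects whose grid covers that site, and that sites below the minimum coverage are excluded; the exact formula used for Figure S2 is not stated in this section, and the function name `normalized_overlap` and the toy arrays are hypothetical.

```python
import numpy as np

def normalized_overlap(significant, covered, min_coverage=2):
    """significant, covered: boolean arrays of shape (n_subjects, n_sites).
    Returns, per site, the fraction of covering subjects with a significant
    effect; sites covered by fewer than min_coverage subjects are NaN."""
    significant = np.asarray(significant, dtype=bool)
    covered = np.asarray(covered, dtype=bool)
    coverage = covered.sum(axis=0)
    # Count a significant effect only where the subject actually has coverage.
    overlap = (significant & covered).sum(axis=0)
    out = np.full(coverage.shape, np.nan)
    keep = coverage >= min_coverage
    out[keep] = overlap[keep] / coverage[keep]
    return out

# Toy example: four subjects, three cortical sites.
covered = np.array([[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 0]], dtype=bool)
significant = np.array([[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 0]], dtype=bool)
result = normalized_overlap(significant, covered)
```

Here the third site is covered by a single subject and is therefore excluded, while the first two sites both yield a normalized overlap of 0.5 despite different raw counts.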

DISCUSSION
The present study explored the processing of complex natural music by examining relations between ECoG high-gamma band power and five features of music and thereby extends findings by Potes et al. (2012). To varying degrees, these features co-fluctuate in original (unmodified) music and were found to produce similar cortical distributions of significant correlation in the high-gamma band. To address this issue, we calculated partial correlation coefficients to assess the unique impact of each of the five features (for comparison of standard and partial correlation see Figure 6, top and middle). Significant correlation of the high-gamma band ECoG features with the feature vocals on/off (indicating the change between purely instrumental passages and those with lyrics content) exceeded by far any of the other features in spatial extent and was consistently present within the group of subjects. Furthermore, distinct cortical patterns of significant correlation with the features sound intensity, spectral centroid and harmonic change were present in single subjects.

REFLECTION OF ASPECTS OF MUSIC IN ECoG HIGH GAMMA POWER
These results demonstrate that in this example of a rock song, the change between purely instrumental passages and those with vocal lyrics content is the compositional feature that exerts the most marked effect on the electrocorticographic brain response in the high-gamma frequency band. In nine of ten patients, significant correlation of the high-gamma ECoG features with vocals on/off was present on the superior temporal gyrus. This led to a core region of high group-level overlap of significant correlation along the middle and posterior superior temporal gyrus (including Brodmann areas 22, 41, and 42, see Figure 6, second column), extending in particular posteriorly toward Wernicke's area (posterior BA 22 and BA 40) and also dorsally from the Sylvian fissure. In three subjects, significant correlation was also present on the dorsal precentral cortex (BA 4) and in two subjects on the inferior frontal gyrus near Broca's area (BA 45). Thus, the very onset of speech-related vocal content within a stream of music has a major impact on the brain response, as it affects stimulus-related neural activity distributed across several distinct brain regions. Considering that the partial correlation approach has removed the influence of the co-fluctuating four other factors that relate to sound intensity, to timbral (spectral centroid) or rhythmic characteristics (pulse clarity), and to harmonic structure, the remaining significant correlation could be related to speech-related aspects not addressed specifically in the analysis, such as linguistic-semantic aspects or the presence of the human voice, which has been found to affect ECoG gamma activity even for (unintelligible) reversed speech in Brown et al. (2014). A more specific interpretation of this effect cannot be derived in the present context of a naturalistic complex music stimulus. The topography of the speech-related neural activity during listening to music is in line with Merrill et al. (2012), where the left superior temporal gyrus was found to code for the difference between (normal) speech and speech without words (hummed speech prosody) as well as for the difference between song with lyrics and song without lyrics (hummed). Furthermore, a differential BOLD response of the STG (bilaterally) for music with lyrics vs. instrumental music was observed in Brattico et al. (2011). The left mid-superior temporal sulcus was found to reflect (variable degrees of) integrated processing of lyrics and tunes (Sammler et al., 2010). An activation of the dorsal precentral cortex in auditory perception of singing/speaking as well as in covert production has been observed before (Callan et al., 2006; Sammler et al., 2010) and was associated either with internal singing, or, more generally, with an activation of (pre)motor codes upon perception of song or speech. In particular, since the presentation of a full-length well-known rock song resembles a natural listening experience at least with respect to the stimulus material, one may speculate that patients might have silently sung along.
Beyond the impact of vocals on/off, a specific reflection of the features spectral centroid and harmonic change is present on the STG. For the spectral centroid, foci of significant correlation are present in 2/10 subjects (Figure 4). In both of them, they are located on the posterior part of the STG, which includes a part of Heschl's gyrus (BA 41 and 42). This particular area has been related to auditory processing in general, but specifically also to frequency (Liebenthal et al., 2003), pitch (Patterson et al., 2002), and harmonic tones (Zatorre and Krumhansl, 2002). The reflection of the spectral centroid on the STG is in line with Alluri et al. (2012) where the fluctuation of brightness (a component of timbre) in a natural music stimulus correlated with the BOLD response in the STG/MTG. For harmonic change, the focus of significant correlation is distributed similarly to spectral centroid extending also to Wernicke's area (posterior part of BA 22).
For sound intensity, significant correlation is present in 4/10 subjects. The maximal group-level overlap of significant correlation is located on the anterior STG.
Since pulse clarity was found to be reflected in the listener's BOLD response very clearly in the bilateral STG, insula and supplementary motor areas in Alluri et al. (2012), detecting a related reflection in the ECoG high gamma response would in principle have been possible with the present setup. Even though pulse clarity is (relatively within our feature set) uncorrelated with the other four features, significant correlation was present in only one subject and was confined to a small area. This suggests that changes in pulse clarity did not have a specific reflection in the ECoG high gamma response. A speculative explanation for the absence of rhythm-related effects in the brain response might be that acoustic features that have been found to be typical for "high-groove" music, such as clear pulses, high energy in low frequency bands, and high beat salience (Madison et al., 2011), may not be very salient in Pink Floyd's The Wall, part 1 (not to be confused with Pink Floyd's The Wall, part 2, which has strong drums).

COMPLEMENTARY ANALYSIS OF SPEECH STIMULI
Considering that speech and music can both be characterized by similar concepts relating to temporal structure (onset structure and syllable structure), pitch-related structure (melody and prosody) and timbral aspects, it is an interesting question whether an identical description of a speech stimulus leads to similar reflection of extracted acoustic features in the high-gamma band. Therefore, we applied the same analysis to ECoG recordings of stimulation with natural speech [four narratives which are part of the Boston Aphasia Battery (Goodglass et al., 1983), for details see Kubanek et al., 2013 where this data set was analyzed previously].
In the speech condition, the feature vocals on/off was reflected much less than in the music condition, in two subjects on the superior temporal gyrus (STG) and in one subject in the posterior part of the middle lateral temporal lobe. In contrast, reflections of sound intensity were present in seven subjects with a common focus on the middle part of the STG and with individually differing effects on the posterior part of the STG, on the MTG, on the dorsal precentral cortex, and on the inferior frontal gyrus.
On the one hand, the present distinct cortical reflection of temporal information in speech agrees with the essential role of the sound envelope in speech understanding that has been established by clinical results (Rosen, 1992; Drullman et al., 1994; Zeng et al., 1999; Lorenzi et al., 2006). On the other hand, the considerably weaker reflection in music suggests that, if speech-related content is embedded as song in music, the impact of this feature may be overruled by the change from instrumental to vocal/lyrics sound. Together with the fact that the contrast between silence and speech in the speech condition effected less change-related high gamma activity, this may indicate that the pSTG (where vocals on/off in the music condition was reflected most consistently) is responsive to vocals/lyrics in the context of music. Thus, it may be associated with identifying vocal speech-related content within the context of other complex sounds, reflecting a specific aspect of auditory scene analysis.
In the speech condition, for spectral centroid the group-level overlap of significant correlation is located on the pSTG, similarly to that in the music condition, but (apart from subject S10) different subjects contribute. This suggests that fluctuation of the spectral centroid is reflected on the posterior STG consistently in both conditions in single subjects, but, at the same time, that it is individually different for which stimulus and in whom these reflections reach significance.
In the speech condition, a reflection of the feature harmonic change is present in ECoG high-gamma power in one subject only, while in the music condition 4/10 subjects showed a focus of significant correlation on the pSTG. The presence of an effect in the music condition demonstrates that the model-based extraction of harmonic change in music indeed has a physiological reflection. Notably, the harmonic change algorithm is tailored to measure the distance of chords within a tonal space, a metric that is not necessarily applicable to speech sounds. However, since speech also contains harmonic content, we hypothesized that, e.g., changes in the voice fundamental (F0) are extracted by the algorithm to some extent and may lead to a similar representation in the ECoG high-gamma power. Accordingly, the application of the harmonic change algorithm to the speech stimulus resulted in a time course showing a variance comparable to that in the music condition. The absence of such a representation in the ECoG high gamma response (with the exception of one subject) suggests that the music-specific extraction of harmonic features from the speech signal does not convey information triggering co-varying cortical processing.
To summarize, in this differential analysis of five features of a natural music stimulus, we found the on/offset of vocal lyrics to be the dominant driver of ECoG high-gamma power on the STG (mostly in the posterior part) and in peri-Sylvian areas consistently within the group of subjects. In parallel, in single subjects, sound intensity, harmonic change, and spectral centroid produced specific high gamma reflections in the same brain area. In the speech condition, topographically similar effects for sound intensity were present most consistently, and in single subjects for vocals on/off, harmonic change and spectral centroid. In general, these findings are in line with the assumed involvement of the pSTG in the intermediate stage of auditory processing (Pasley et al., 2012), more specifically in the selective extraction of spectro-temporal features relevant for auditory object recognition. Thus, the observed different activations between music and speech may demonstrate differences in the relative importance of the features in both stimuli. Tentatively, one may explain the interindividual differences with respect to the presence of effects for harmonic change, spectral centroid and sound intensity with different contributions of the five features to an individual's listening experience.
The present results further differentiate the pioneering work of Potes et al. (2012), where ECoG high-gamma features were found to trace sound intensity in two distinct regions, in the posterior STG and in the inferior frontal gyrus. The present follow-up analysis helps to attribute this effect mainly to the presence of vocal speech in the stimulus, while the effect of sound intensity proper is found to be much weaker and confined to a smaller region on the STG. In addition, with spectral centroid and harmonic change, we identified two further aspects specific for music that have an impact on the high-gamma ECoG response in some subjects. Notwithstanding, in these single subjects, these effects are highly significant and derived from one presentation of the stimulus.
The present results complement those of Kubanek et al. (2013), where high-gamma ECoG activity was reported to track the temporal envelope of natural speech stimuli in non-primary areas of the auditory cortex: in the superior temporal gyrus (STG) and on the inferior frontal gyrus (IFG) near Broca's area. On the other hand, the temporal envelope of music including song (condition "lyrics," represented by the concatenated periods of singing) and purely instrumental music (condition "melody," represented by the purely instrumental periods) was tracked considerably more weakly by high-gamma ECoG activity and only in the auditory belt area. This suggested a specificity of STG and IFG for speech-related processing and confirmed the importance of temporal information in speech. The present results help to further elucidate the previous results insofar as they demonstrate that not only is the sound envelope encoded more weakly in the music condition, but that the alternating presence/absence of vocals is represented predominantly.
Since research on neural processing of natural music is heterogeneous with respect to data-recording techniques, stimulus material, music features, and methods of data analysis, it is difficult to directly compare results of different studies. With respect to physiology, the present data reveal that in addition to the alpha and theta frequency bands, which have been found to reflect dynamic rhythmic features of music (Cong et al., 2012), the high-gamma band carries information about the music stimulus.
Our analysis was focused on the high gamma frequency range, based on initial results that were not informative in lower frequency bands. Negative correlation between ECoG power in lower frequency bands (8-12 and 18-24 Hz) and sound intensity was reported in Potes et al. (2012). Recently, Potes et al. (2014) showed in a detailed analysis that envelope-related high gamma activity in areas close to primary auditory cortex, in peri-Sylvian and superior pre-motor areas precedes and predicts envelope-related alpha band activity (8-12 Hz) near primary auditory areas that have been found to be the target of afferent auditory projections from the thalamus. In light of the hypothesis of Kumar et al. (2007) and Zatorre et al. (2007), the alpha band activity has been associated with relay mechanisms that govern the transfer of auditory information from the thalamus to core auditory areas. The high gamma activity has been related to cortical extraction of complex auditory features in non-primary auditory areas. According to this model, the high gamma frequency range would be a suitable starting point for a differential analysis of the processing of higher-level auditory features. However, extending the present analysis to lower frequency bands in the future could provide additional information. In a natural speech context, phase information in low-frequency bands of the brain signal in particular has been found to be informative in Zion Golumbic et al. (2013) and Ding and Simon (2012).

METHODOLOGY
With respect to methodology, we approach a typical problem that arises when assessing the relationship between brain recordings and natural auditory stimuli. We address the problem of interdependent features of natural complex stimuli that complicate the correlation-based analysis of the relation between brain signals and stimulus features. Recently, in Alluri et al. (2013), this challenge has been faced by applying principal component regression modeling, where the interrelated multi-dimensional description of a music signal is transformed into a lower-dimensional space of uncorrelated components that are subsequently perceptually evaluated. Here, operating on the original features, we demonstrate that a partial correlation approach as an extension of multiple linear regression analysis (Schaefer et al., 2009) can be used to differentiate between the processing of aspects of natural music. Typically, in a naturalistic setting, multi-channel measurements are related (e.g., by correlation measures) to a multidimensional description of music, a situation which is prone to produce false positive effects such as spurious correlations. One way of constraining the solution is to assume an inter-individually consistent spatial distribution of neural activity, e.g., by averaging the EEG time course across subjects (Schaefer et al., 2009) or by selecting components that are common to the majority of subjects (Cong et al., 2012). The present results are an important complement to previous studies, as they were obtained at the single-subject level and for one single stimulus presentation owing to the ECoG's characteristics of offering both high temporal and spatial resolution. This sensitivity helped to reveal a considerable variability between subjects with respect to reflected features of music, an insight that suggests that assuming within-group consistency might neglect some of these individual effects.

CURRENT EXPERIMENTAL LIMITATIONS
Obviously, there are limitations to what can be achieved with this approach. As is typical for ECoG recordings, the data were recorded from epilepsy patients whose physical and cognitive conditions were impaired to different degrees and whose brains may not be comparable to those of the healthy population in function and neuroanatomy. Furthermore, grid coverage was determined by clinical reasons and thus varied between subjects. Important issues, such as hemispheric specialization for speech and music, cannot be addressed with the present data set of left-hemispheric recordings. Another important issue is that information about the patients' music preference, cultural background and musical training, which could give valuable clues for interpreting interpersonal differences, is not available in this follow-up analysis.

CONCLUSION
However, our analysis is an example of what can be achieved within these limits and contributes to the growing body of methodological approaches for research on the processing of natural music. Partial correlation, proposed here as one solution for inter-dependence of stimulus features, has detected specific reflections of music features in the ECoG high-gamma response. However, it has to be kept in mind that this method gives a differential picture of each music feature's impact on the brain response showing cortical reflections that are unique to this feature beyond all others in the feature set. Thus, for a given feature, the portion of independent variance from the other features is crucial for the detectability of its reflection in the brain response. It should be kept in mind that the present approach provides a differential view on brain responses to aspects of a natural music stimulus, not a comprehensive decomposition of the brain signal.
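The dependence of detectability on a feature's independent variance can be made concrete with a small sketch. It assumes that the "portion of independent variance" is quantified as 1 − R² of a least-squares regression of one feature on all others; this measure, the function name `unique_variance_fraction`, and the synthetic features are illustrative assumptions rather than quantities reported in the study.

```python
import numpy as np

def unique_variance_fraction(X):
    """For each column of X, return 1 - R^2 of its least-squares regression
    on all other columns (with intercept), i.e. the share of its variance
    that the other features cannot explain."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        resid = X[:, j] - Z @ np.linalg.lstsq(Z, X[:, j], rcond=None)[0]
        out[j] = resid.var() / X[:, j].var()
    return out

# Synthetic features (hypothetical): b is nearly redundant with a,
# c is independent of both.
rng = np.random.default_rng(1)
a = rng.standard_normal(2000)
b = 0.9 * a + 0.1 * rng.standard_normal(2000)
c = rng.standard_normal(2000)
fractions = unique_variance_fraction(np.column_stack([a, b, c]))
```

A feature with a fraction near zero, such as `a` or `b` here, leaves almost no residual variance for a partial correlation to work with, so even a genuine cortical reflection of it would be hard to detect.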
Naturally, when comparing two different stimuli, such as in our case in the speech and music condition, the individual interdependence of stimulus features is not the same, nor can the stimulus features themselves be balanced between both stimuli. Our results, therefore, have to be regarded as highly specific cortical imprints of two different, naturally unbalanced examples of natural auditory stimulation from two sound categories, not as general findings on the processing of music or speech. Nonetheless, the present differentiated picture of brain responses at the level of single subjects and a single presentation is a valuable complement of the recent series of investigations in natural music processing research.

ACKNOWLEDGMENT
This work was supported by grants from the US Army Research Office (W911NF-07-1-0415, W911NF-08-1-0216, W911NF-12-1-0109) and the NIH/NIBIB (EB006356 and EB000856). We gratefully acknowledge Dr. Ritaccio for his help with patient interactions, Drs. Peter Brunner and Aysegul Gunduz for their help with data collection, and the cooperation of the subjects in this study. Irene Sturm's work was funded by the Berlin School of Mind and Brain and also by the Christiane Nüsslein-Volhard foundation.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnhum.2014.00798/abstract