Including Measures of High Gamma Power Can Improve the Decoding of Natural Speech From EEG

The human auditory system is highly skilled at extracting and processing information from speech in both single-speaker and multi-speaker situations. A commonly studied speech feature is the amplitude envelope which can also be used to determine which speaker a listener is attending to in those multi-speaker situations. Non-invasive brain imaging (electro-/magnetoencephalography [EEG/MEG]) has shown that the phase of neural activity below 16 Hz tracks the dynamics of speech, whereas invasive brain imaging (electrocorticography [ECoG]) has shown that such processing is strongly reflected in the power of high frequency neural activity (around 70-150 Hz; known as high gamma). The first aim of this study was to determine if high gamma power scalp recorded EEG carries useful stimulus-related information, despite its reputation for having a poor signal to noise ratio. Specifically, linear regression was used to investigate speech envelope and attention decoding in low frequency EEG, high gamma power EEG, and in both EEG signals combined. The second aim was to assess whether the information reflected in high gamma power EEG may be complementary to that reflected in well-established low frequency EEG indices of speech processing. Exploratory analyses were also completed to examine how low frequency and high gamma power EEG may be sensitive to different features of the speech envelope. While low frequency speech tracking was evident for almost all subjects as expected, high gamma power also showed robust speech tracking in some subjects. This same pattern was true for attention decoding using a separate group of subjects who participated in a cocktail party attention experiment. For the subjects who showed speech tracking in high gamma power EEG, the spatiotemporal characteristics of that high gamma tracking differed from that of low-frequency EEG. Furthermore, combining the two neural measures led to improved measures of speech tracking for several subjects. 
Our results indicated that high gamma power EEG can carry useful information regarding speech processing and attentional selection in some subjects. Combining high gamma power and low frequency EEG can improve the mapping between natural speech and the resulting neural responses.


INTRODUCTION
Scalp-recorded electroencephalography (EEG) provides a non-invasive means of investigating cortical activity with high temporal resolution. This makes it particularly suited for studying neural processes such as speech perception, in which humans rapidly convert mechanical vibrations of the air into meaning. In terms of speech, the slowly varying acoustic envelope of continuous natural speech has been found to be reflected in EEG (Luo and Poeppel, 2007; Lalor and Foxe, 2010), which is valuable because speech modulations in the 4-16 Hz range have been shown to contain the most important information regarding intelligibility (Drullman et al., 1994). As a result, many studies focus their analysis on this frequency range when using EEG or magnetoencephalography (MEG) to investigate cortical tracking of the speech envelope (Ahissar et al., 2001; Aiken and Picton, 2008; Peelle and Davis, 2012; Di Liberto et al., 2015).
In contrast to the emphasis on lower-frequency bands in EEG speech research, studies that employ electrocorticography (ECoG) often look at signals in the high gamma range (∼70-150 Hz). High gamma ECoG has also been shown to track the speech envelope (Pasley et al., 2012; Kubanek et al., 2013), even though high gamma and low frequency (LF) activity are thought to result from distinct physiological mechanisms (Edwards et al., 2009). The fidelity of speech tracking in high gamma power (HGP) ECoG data is so high that most ECoG studies focus exclusively on that frequency range and ignore the data at lower frequencies. Meanwhile, the high-frequency content of scalp-recorded EEG is typically disregarded because it is low pass filtered by the skull (Pfurtscheller and Cooper, 1975) and smeared by the dura and cerebrospinal fluid (Light et al., 2010), thus resulting in a low signal-to-noise ratio. Nevertheless, we questioned whether there is still useful stimulus-related information dissociable from low-frequency data that could be retrieved from high gamma EEG. If so, high gamma EEG could serve as a useful measure for studying speech and language processing in various populations.
As this study represented, to our knowledge, the first effort to do this, we restricted our focus to examining the aforementioned tracking of the temporal speech envelope in HGP EEG. EEG data were recorded as subjects listened to continuous natural speech, and we mapped the EEG (filtered into LF, HGP, and both signals combined) to the temporal speech envelope using linear regression. In a second data set, we investigated whether the inclusion of HGP can improve auditory attention decoding. It is established that envelope tracking in LF EEG is modulated by attention (Kerlin et al., 2010; Power et al., 2012), and cortical HGP has also been shown to be strongly modulated by attention (Mesgarani and Chang, 2012; Zion Golumbic et al., 2013; Dijkstra et al., 2015). Here, we employed a framework that has been successful in ascertaining attentional selection within the context of a task in which subjects attend to one of two concurrent talkers. We compared how well attentional selection can be decoded from EEG when using LF, HGP, and a combination of the two. In doing so, we found that for a minority of subjects, the speech envelope and attention are reflected in HGP EEG in a way that may be complementary to the information available in LF EEG.

Subjects
Two experimental paradigms were explored in the present study using data from three previously published studies. The first paradigm involved subjects listening to a single speaker and the second involved subjects attending to one of two concurrently presented speakers. Data used in the single speaker paradigm originated from two previous studies examining semantic dissimilarity and phoneme level processing (Broderick et al., 2018; Di Liberto et al., 2018). In total, data from seventeen subjects (min = 19 years, max = 31 years, 12 males) were used. These studies were approved by the Ethics Committees of the School of Psychology at Trinity College Dublin and the Health Sciences Faculty at Trinity College Dublin. Data used in the attentional selection or cocktail party paradigm were from the control condition of a study which investigated the decoding of auditory attention. Fourteen subjects (min = 19 years, max = 30 years, 5 male) took part in the experiment. This study was approved by the Research Subjects Review Board at the University of Rochester. All subjects were native English speakers, provided written informed consent, and reported no history of hearing impairment or neurological disorders.

Stimuli and Procedure
The single speaker experiment consisted of subjects listening to 20-29 trials (approximately 180 s in length) of a mid-20th century audiobook read by one American male speaker (Hemingway, 1952). The subjects were tested in a dark, sound-attenuated room and were instructed to attend to a fixation cross in the center of a screen. The storyline was preserved in the trials, with no repetitions or discontinuities present. In the multi-speaker experiment, subjects undertook 20 trials (approximately 60 s in length). They were presented with two stories (Doyle, 1892, 1902) simultaneously, narrated by a male and a female speaker. The two audio streams were filtered using head-related transfer functions to simulate spatial separation of the speakers (one speaker at 90 degrees to the left and the other at 90 degrees to the right). The attended speaker was on the left in 50% of the trials and on the right in the other 50%. The subjects were instructed to attend to the male speaker in all trials and to minimize motor movements by fixating on a cross at the center of a screen. Subjects then answered four multiple-choice questions on the attended and unattended stories after each trial (which were not analyzed in this work). In all experiments, the stimuli were presented through Sennheiser HD650 headphones at a 44.1 kHz sampling rate using Presentation software from Neurobehavioral Systems.

Data Acquisition and Preprocessing
For both experiments, EEG data were acquired at a 512 Hz sampling rate with the BioSemi Active Two system using 128 scalp electrodes (plus two mastoid channels that were not analyzed in this work). Each subject's scalp data were re-referenced to the common average. Noisy channels were determined based on three of EEGLAB's artifact rejection methods (kurtosis, spectral estimates, and probability distribution; Delorme and Makeig, 2004), and spline interpolation was used to reject and recalculate the data in those channels. The EEG was then filtered into two separate bands. The first band (low-frequency band) was high pass filtered at 1 Hz and then low pass filtered at 15 Hz using a zero-phase type 2 Chebyshev filter. For the second band, the raw data were bandpass filtered in the high gamma range from 70 to 150 Hz using a 200th order zero-phase FIR filter with a Hamming window. The absolute value of the Hilbert transform was taken from the high gamma EEG to extract the power (as is typically done in ECoG studies). The resulting data were then low pass filtered at 15 Hz to match the power spectrum of the LF EEG and to ensure we did not artificially render the LF EEG and HGP EEG differentially sensitive to different features in the speech envelope; this way, decoding using both frequency bands could be more directly compared. Finally, all data were downsampled to 128 Hz.
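The two-band filtering described above can be sketched roughly as follows. This is a minimal illustration in Python with SciPy, not the MATLAB code used in the study. The Chebyshev filter order and stopband attenuation are not given in the text, so the values below are assumptions, as are the function names and the use of forward-backward filtering to obtain zero phase.

```python
import numpy as np
from scipy import signal

FS_RAW = 512  # acquisition rate in Hz (from the text)
FS_OUT = 128  # final sampling rate in Hz (from the text)


def _cheby2_zero_phase(x, edge_hz, btype, fs, order=4, stop_atten_db=40):
    """Zero-phase Chebyshev type II filter along the time axis (axis 0).

    order and stop_atten_db are assumed values. Note that for a type II
    design SciPy treats edge_hz as the stopband edge, so the passband
    edge sits slightly inside the nominal band.
    """
    sos = signal.cheby2(order, stop_atten_db, edge_hz, btype=btype,
                        fs=fs, output="sos")
    return signal.sosfiltfilt(sos, x, axis=0)


def low_frequency_band(eeg, fs=FS_RAW):
    """LF EEG: 1 Hz high pass, then 15 Hz low pass, then downsample."""
    x = _cheby2_zero_phase(eeg, 1.0, "highpass", fs)
    x = _cheby2_zero_phase(x, 15.0, "lowpass", fs)
    return signal.resample_poly(x, FS_OUT, fs, axis=0)


def high_gamma_power(eeg, fs=FS_RAW):
    """HGP EEG: 70-150 Hz band pass, Hilbert amplitude, 15 Hz low pass."""
    # A 200th order FIR filter has 201 taps.
    taps = signal.firwin(201, [70.0, 150.0], pass_zero=False,
                         window="hamming", fs=fs)
    x = signal.filtfilt(taps, [1.0], eeg, axis=0)
    # scipy.signal.hilbert returns the analytic signal; its magnitude
    # is the amplitude envelope of the band-limited EEG.
    amplitude = np.abs(signal.hilbert(x, axis=0))
    amplitude = _cheby2_zero_phase(amplitude, 15.0, "lowpass", fs)
    return signal.resample_poly(amplitude, FS_OUT, fs, axis=0)
```

Both functions expect an array of shape (samples, channels) and return data at 128 Hz, so the LF and HGP outputs of one recording can be compared or concatenated channel-wise.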

Data Analysis
Our analyses for both experiments were based on assessing how strongly the speech signal was represented in our different EEG bands by reconstructing an estimate of the speech envelope from the neural data (Crosse et al., 2016). Speech envelopes were extracted from the stimuli using a gammachirp auditory filterbank, which mimics the filtering properties of the human cochlea (Irino and Patterson, 1997). Afterward, the envelopes were normalized between 0 and 1, and the EEG data were z-scored. A backward model (decoder) was employed to reconstruct the speech envelope, s(t), from the neural response, r(t, n), with the decoder, g(τ, n), acting as a linear map between the two. The transformation can be expressed as:
$$\hat{s}(t) = \sum_{n} \sum_{\tau} r(t + \tau, n)\, g(\tau, n)$$
where ŝ(t) is the reconstructed speech envelope. The decoder integrates the EEG (with n electrodes) over a range of time lags, τ, from 0 to 250 ms, the range where low-level speech features (e.g., envelope, spectrogram, and phonemes) cause notable EEG responses to occur (Di Liberto et al., 2015). The decoder (Figure 1A) was computed by the following operation:
$$g = \left(R^{T} R + \lambda I\right)^{-1} R^{T} s$$
where R is the lagged time series of the EEG data, T denotes the matrix transpose, λ is the regularization parameter, I is the identity matrix, and s is the speech envelope. Model performance was assessed according to the accuracy with which the speech envelope could be reconstructed using leave-one-out cross-validation. This procedure allowed for an optimal regularization parameter to be chosen without overfitting to the training data. The regularization parameter that produced the highest Pearson's correlation coefficient between the reconstructed envelope and the actual speech envelope across trials was chosen as the optimal value. Separate decoders were created for the LF and HGP conditions. A model was also calculated for the combination of LF and HGP (LF+HGP) signals by concatenating the two signals (each 128-channels by delays) to form one matrix of 256-channels by delays.
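As an illustration of the backward model, the sketch below builds the lagged EEG matrix R and solves the regularized least-squares problem, which in the framework of Crosse et al. (2016) takes the form g = (RᵀR + λI)⁻¹Rᵀs. It is a simplified NumPy version written for this purpose and not the toolbox code used in the study. The function names are hypothetical, lags are given in samples (0 to 250 ms corresponds to 0 to 32 samples at 128 Hz), and samples shifted past the end of a trial are zero-padded, which is an assumption.

```python
import numpy as np


def lag_matrix(eeg, lags):
    """Build R from eeg of shape (samples, channels).

    Column block k holds r(t + lags[k], n); lags are non-negative
    sample offsets, and values past the end of the trial are zero.
    """
    n_samples, n_channels = eeg.shape
    R = np.zeros((n_samples, n_channels * len(lags)))
    for k, lag in enumerate(lags):
        block = R[:, k * n_channels:(k + 1) * n_channels]
        block[:n_samples - lag] = eeg[lag:]
    return R


def train_decoder(eeg, envelope, lags, lam):
    """Ridge solution g = (R^T R + lam * I)^-1 R^T s."""
    R = lag_matrix(eeg, lags)
    regularizer = lam * np.eye(R.shape[1])
    return np.linalg.solve(R.T @ R + regularizer, R.T @ envelope)


def reconstruct(eeg, g, lags):
    """Estimate the envelope as the lagged EEG weighted by g."""
    return lag_matrix(eeg, lags) @ g
```

Under these assumptions, an LF+HGP decoder would be obtained by passing `np.hstack([lf_eeg, hgp_eeg])` as `eeg`, and λ would be selected by repeating the training over a grid of values inside the leave-one-out loop.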
To decode attention, we employed a framework introduced by O'Sullivan et al. (2015; Figure 1B). Decoder models that mapped from the EEG data to the speech envelope of the attended speaker were computed for each subject and each trial. The regularization parameter was once again determined based on leave-one-out cross-validation. We could then reconstruct the stimulus envelope of a particular trial, n, using the average attended decoder of n − 1 trials. The Pearson's correlation coefficient, r, was computed between the reconstructed envelope and both the actual attended and unattended stimulus envelopes. A trial was deemed correctly classified if the reconstructed envelope was more correlated with the attended envelope than the unattended envelope (r_attended > r_unattended).
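The classification rule can be sketched in a few lines. The helper names below are hypothetical, and the sketch assumes that per-trial decoders have already been trained and are stored as equally shaped arrays.

```python
import numpy as np


def average_decoder_excluding(decoders, held_out):
    """Average the per-trial decoders, leaving out trial `held_out`."""
    decoders = np.asarray(decoders)
    keep = np.arange(len(decoders)) != held_out
    return decoders[keep].mean(axis=0)


def classify_trial(reconstruction, attended_env, unattended_env):
    """Return (correct, r_attended, r_unattended) for one trial."""
    r_attended = np.corrcoef(reconstruction, attended_env)[0, 1]
    r_unattended = np.corrcoef(reconstruction, unattended_env)[0, 1]
    return r_attended > r_unattended, r_attended, r_unattended
```

Decoding accuracy for a subject is then the fraction of trials for which the first returned value is true.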

Statistical Analysis
We compared envelope reconstructions between LF and HGP EEG conditions using paired t-tests. One-way repeated measure analysis of variance (ANOVA) tests were used to compare stimulus reconstructions between the LF, HGP, and LF+HGP EEG conditions. The resulting statistics from the ANOVAs were Bonferroni corrected to determine significance within subjects and between groups. Lastly, two-way repeated-measures ANOVA tests were conducted to examine the effect of multiple factors on envelope reconstruction accuracy.
Testing against chance was completed using permutation tests. In the envelope reconstruction analysis, a null distribution of 10,000 Pearson's r values was created by finding the correlation between randomly permuted trials of predicted audio envelopes and actual audio envelopes. The true mean correlation coefficient served as the observed value of the test statistic. In the decoding attention analysis, we performed 10,000 permutations to create the null distribution, where for each trial of each permutation we randomly selected a correlation value from either r_attended or r_unattended to be assigned to bin A, and the other to bin B. The observed value of the test statistic was the percentage of trials where r_attended > r_unattended. The threshold for significance and above chance performance was p = 0.05 for each test. All analyses were conducted in MATLAB (The MathWorks, Inc.) and SPSS (IBM SPSS Statistics).
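A minimal sketch of the attention-decoding permutation test is given below. The text does not spell out the null statistic or how the p-value was computed, so two assumptions are made here: the null statistic is the fraction of trials in which bin A exceeds bin B, and the p-value uses the common add-one correction. The function name is hypothetical.

```python
import numpy as np


def attention_permutation_test(r_attended, r_unattended,
                               n_perm=10_000, seed=0):
    """Return (observed accuracy, p-value) for one subject."""
    rng = np.random.default_rng(seed)
    r_att = np.asarray(r_attended, dtype=float)
    r_unatt = np.asarray(r_unattended, dtype=float)
    observed = np.mean(r_att > r_unatt)

    # For every permutation and trial, randomly decide which of the
    # two correlations goes to bin A; the other goes to bin B.
    swap = rng.random((n_perm, r_att.size)) < 0.5
    bin_a = np.where(swap, r_unatt, r_att)
    bin_b = np.where(swap, r_att, r_unatt)
    null = np.mean(bin_a > bin_b, axis=1)

    # Add-one correction (an assumption) keeps p strictly above zero.
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p_value
```

A subject's decoder would be counted as above chance when the returned p-value falls below the 0.05 threshold stated above.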

Though Generally Weaker Than LF EEG, HGP EEG Consistently Tracks the Speech Envelope
We first tested how well the speech envelope is reflected in LF (1-15 Hz) and HGP (power in the 70-150 Hz range) EEG. To do so, a decoder model was calculated for both conditions. Pearson's r was used to quantify the relationship between the actual speech envelope and the reconstructed speech envelope. The grand average reconstruction accuracies (Pearson's r) for the LF and HGP conditions were significantly larger than chance (p < 0.001, permutation tests, Figure 2A). Thus, on a group level, the speech envelope appeared to be encoded in the HGP EEG. Paired t-tests were used to assess the differences between the two decoders and showed that the LF decoder reconstructed the speech envelope significantly better than the HGP decoder (p = 0.006).
Since recorded brain activity may vary across individuals due to anatomical differences, we wanted to examine how the LF and HGP decoders performed on a single-subject level. When tested against chance, LF and HGP were significant for all subjects (p < 0.05, permutation tests). As expected, the LF decoder worked best for most participants (N = 12, p ≤ 0.05, paired t-tests, Figure 2B). Surprisingly, there was no significant difference in reconstruction accuracy between the two decoders for subjects 2, 3, 7, and 11 (p ≥ 0.05). Subject 10's LF decoder displayed a typical reconstruction accuracy, but the HGP decoder performed much better (p = 6.120e-08). Thus, though uncommon in EEG studies, HGP was able to track the speech envelope comparably to or better than LF in some of our subjects.
To characterize factors contributing to the interindividual differences in high gamma tracking of the speech envelope, we tested whether decoding performance could be predicted from the raw power of each subject's high gamma-band EEG. Our rationale was that perhaps subjects with stronger raw gamma power on the scalp would show better HGP tracking of the speech envelope. We calculated the average high gamma-band power of the raw EEG and averaged the power across channels and trials for each subject. Afterward, we correlated each subject's power with their mean reconstruction accuracy using Pearson's r. We found no correlation between raw high gamma-band power and HGP stimulus reconstruction accuracy (r = −0.043, p = 0.870). That said, our raw gamma power measures were acquired during stimulus presentation, which was not ideal. Unfortunately, we did not have a sufficient amount of baseline (i.e., no stimulus) EEG data to determine raw gamma power in the absence of speech. Future work might consider collecting prestimulus EEG data to test if changes in raw HGP correlate with reconstruction accuracy during stimulus presentation (relative to the prestimulus baseline).

FIGURE 2 | (A) Grand average reconstruction accuracies across trials and subjects for the LF and HGP conditions. The black crosshairs are the mean reconstruction accuracies and the red crosshairs are outliers. Significance was calculated using paired t-tests (**p ≤ 0.01). (B) Mean (±SEM) reconstruction accuracy for each subject across trials for the LF and HGP conditions. Significance was calculated using paired t-tests (n.s., no significance, **p ≤ 0.01, and ***p ≤ 0.001).

Stimulus Reconstruction Suggests That LF and HGP EEG May Carry Complementary Information
Next, we tested if LF and HGP EEG carry complementary information regarding the speech envelope in subjects with comparable HGP measures. To do so, we created a combination model (LF+HGP) using 256 channels in total (128 LF EEG channels + 128 HGP EEG channels). The LF+HGP decoder had a significantly higher mean reconstruction accuracy than the LF decoder alone for subjects 2, 3, 10, and 11 (p < 0.05, one-way repeated measures ANOVA, Figure 3A); this suggests that HGP and LF EEG may carry complementary information. On the other hand, this could mean that LF and HGP EEG carry the same information: combining the two signals may simply have increased the signal-to-noise ratio of the EEG, in turn aiding the decoders' performance.
To further examine the complementary or analogous nature of the two signals, we investigated which components of the speech envelope were being decoded from the LF and HGP EEG. The first component we examined was syllable onset. It has been suggested that syllables contain important information regarding sound segments and prosodic features of speech (Hertrich et al., 2012). Syllable onsets are approximated by computing the first derivative of the speech envelope and setting the negative values to zero (Hertrich et al., 2012). New LF and HGP decoders were trained using the derivative of the speech envelope. Single-subject results are shown in Figure 3C.
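The onset approximation described above (first derivative of the envelope with negative values set to zero) can be written compactly. This is a minimal sketch; the function name is hypothetical, and the derivative is approximated by a first difference scaled by the sampling rate, which is an assumption about the discretization.

```python
import numpy as np


def onset_envelope(envelope, fs):
    """Half-wave rectified first difference of a 1-D speech envelope."""
    # prepend keeps the output the same length as the input.
    derivative = np.diff(envelope, prepend=envelope[0]) * fs
    return np.maximum(derivative, 0.0)
```

For example, an envelope of `[0, 1, 3, 2, 2, 5]` sampled at 1 Hz has first differences `[0, 1, 2, -1, 0, 3]`, so only the rising segments survive the rectification.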
We examined the effects of speech envelope type (full envelope vs. onset envelope) and EEG band (LF EEG vs. HGP EEG) on reconstruction accuracy. We did not find a significant interaction between speech envelope type and EEG band (p = 0.109, two-way repeated-measures ANOVA), but we did find significant main effects of EEG band (p = 6.622e-06) and envelope type (p = 3.919e-05) on reconstruction accuracy. More specifically, utilizing LF EEG led to a higher reconstruction accuracy than HGP EEG, and the full speech envelope was reconstructed better than syllable onsets. Compared to the full envelope, decoding syllable onsets caused reconstruction accuracy to decrease by 14.88% using LF EEG and by 51.94% using HGP EEG. The decline in decoding performance suggests that HGP EEG supplies more information about the broadband speech envelope than syllable onsets, but these changes in performance between EEG band and speech envelope type were not substantial enough to result in a significant interaction effect.
Previous literature suggests that different EEG bands may reflect different functional aspects of speech processing (Cogan and Poeppel, 2011; Ding and Simon, 2014; Molinaro and Lizarazu, 2018). We reasoned that LF and HGP EEG (both in the 1-15 Hz band) may be differentially sensitive to dynamics in the speech envelope that map on to delta, theta, and alpha frequency ranges. As such, we filtered the speech envelope into 1-4 Hz, 4-8 Hz, and 8-15 Hz bands and trained new decoders on these features. The result of a two-way repeated-measures ANOVA indicated a significant interaction between EEG band and speech envelope band on reconstruction accuracy (p = 1.031e-04). We explored this interaction further by examining the simple main effects. The simple main effects analysis showed that the 1-4 Hz, 4-8 Hz, and 8-15 Hz speech bands were reconstructed best with LF EEG (p < 0.001). LF EEG reconstructed the 4-8 Hz band better than the 1-4 and 8-15 Hz bands (p = 0.039, p = 4.679e-08, Figure 3B) and the 1-4 Hz band better than the 8-15 Hz band (p = 2.299e-04). In contrast, there was no difference in how well the HGP EEG reconstructed the 1-4 Hz and 4-8 Hz bands (p = 0.523) or the 1-4 Hz and 8-15 Hz bands (p = 0.470). As with LF EEG, however, the 4-8 Hz band was reconstructed better than the 8-15 Hz band (p = 0.026). While this pattern of results again suggests the possibility that LF and HGP EEG are differentially sensitive to different aspects of the envelope, we are reluctant to overinterpret this, particularly in light of the generally low reconstruction scores for HGP EEG across most subjects.

FIGURE 3 | (A) The mean (± SEM) reconstruction accuracies for subjects 2, 3, 10, and 11 using the LF, HGP, and LF+HGP models. Significance was calculated using Bonferroni corrected one-way repeated measures analysis of variance (ANOVA) tests (n.s., no significance, *p ≤ 0.05, **p ≤ 0.01, and ***p ≤ 0.001). (B) Average reconstruction accuracy of filtered speech (1-4 Hz, 4-8 Hz, and 8-15 Hz bands) using the LF and HGP EEG. The black crosshairs are mean reconstruction accuracies and the red crosshairs are outliers. Significance was found using a two-way repeated-measures ANOVA test. (C) Syllable onset correlations of the actual speech envelope in comparison to the reconstructed speech envelope for the LF and HGP conditions. Paired t-tests were used to calculate significance.

LF and HGP Responses Exhibit Different Spatiotemporal Characteristics to Speech
Our stimulus reconstruction analysis leads us to tentatively suggest that LF and HGP EEG may carry complementary information. To investigate this further, we wanted to examine if there was any evidence that our HGP and LF responses may be derived from different neural generators. To do this, we focused on subjects who showed robust HGP responses and the distribution of decoder weights across the scalp for their LF and HGP decoders. However, as decoder channel weights cannot be interpreted neurophysiologically (Haufe et al., 2014), we transformed the weights into the forward modeling space using Haufe et al.'s (2014) inversion procedure.
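For a decoder with a single output, the transformation of Haufe et al. (2014) reduces to multiplying the data covariance by the decoder weights and dividing by the variance of the reconstructed signal. The sketch below illustrates that reduced form under the assumption that the lagged EEG matrix is available as a (samples, features) array; it is not the code used in the study, and the function name is hypothetical.

```python
import numpy as np


def forward_pattern(features, weights):
    """Convert backward-model weights into a forward-model pattern.

    features: (samples, n_features) lagged EEG matrix.
    weights:  (n_features,) decoder weights for one output.
    Returns the activation pattern cov(features) @ weights / var(s_hat).
    """
    centered = features - features.mean(axis=0)
    s_hat = centered @ weights
    covariance = centered.T @ centered / (len(centered) - 1)
    return covariance @ weights / s_hat.var(ddof=1)
```

Reshaping the returned vector to (lags, channels) would give one scalp map per time lag, which is the form in which the topographies and time courses are examined.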
With this information, we compared the spatiotemporal profile of the LF and HGP EEG activity for subjects 2, 3, 10, and 11. The left panel of Figures 4A-D depicts the spatial dynamics of the forward transformed models (temporal response function or TRF) at various time lags. The LF TRF topographies appeared fairly typical (Crosse et al., 2016) as they alternated in positivity and negativity across time on the frontocentral area of the scalp (the differences in subject 2's topographies may be due to noise and/or a DC shift in this person's EEG data). On the other hand, their HGP activity displayed a strikingly different distribution with no prominent focus over the frontocentral scalp. These results suggest the possibility that the LF and HGP signals we see may have non-identical neural generators and further support the idea that they may carry complementary information.
TRFs can also provide an example of how the speech envelope is transformed into neural responses over time at specific electrodes. The right panel of Figure 4 shows the forward transformed TRFs for subjects 2, 3, 10, and 11 at the electrode indicated in their topographies. The TRFs at the given electrodes are highly correlated for subject 10 (r = 0.872, Pearson's correlation), moderately correlated for subjects 3 and 11 (r = 0.649, r = −0.649), and weakly correlated for subject 2 (r = −0.222). Once again, this weak correlation in subject 2 may be due to noisy EEG, especially since this person had very weak reconstruction accuracies. While there are differences in the time course between LF and HGP TRFs (again supporting the notion that the neural generators might differ), the general timing of the two TRFs is similar for these subjects (except subject 2), supporting the notion that the HGP TRFs we see in these subjects are capturing real responses to our speech stimuli.

HGP Also Improves the Decoding of Auditory Attention in Some Subjects
It has previously been shown that the LF envelope tracking response is modulated by attention (Ding and Simon, 2012) and that single-trial data from a task in which subjects attend to one of two concurrent talkers can be decoded to ascertain attentional selection. Given our finding that HGP contains informative temporal envelope information for a subset of subjects, we tested whether this signal is similarly modulated by attention and if it could be exploited to improve our ability to decode attention in multi-speaker situations. This was tested on a separate group of subjects (N = 14) from those used in the single speaker paradigm. The speakers were separated in space; the subjects attended to the speaker on the left in 50% of the trials and attended to the speaker on the right in 50% of the trials.
Here, we examine decoding accuracy, which represents the percentage of trials in which the reconstructed stimuli were more correlated with the attended stream than with the unattended stream (r_attended > r_unattended) for LF, HGP, and LF+HGP EEG signals. Exploring this at the group level showed a similar trend to our initial envelope tracking results from Figure 2A. The decoding accuracy was significantly larger than chance for each model (p = 0.001, permutation test; Figure 5A). The HGP decoder was able to decode auditory attention, but it did not perform as well as the LF and LF+HGP decoders (p = 1.314e-04, p = 1.789e-03, one-way repeated measures ANOVA with Bonferroni corrections). The LF and LF+HGP decoders, however, performed similarly (p = 0.216).
We also examined how well we were able to decode auditory attention for individual subjects and found two whose decoding improved with HGP or LF+HGP EEG signals. Figure 5B shows the decoders that were able to track the temporal dynamics of the attended speaker significantly better than chance (p ≤ 0.05, permutation tests). The best attention decoding accuracy for subject 13 was achieved when using LF+HGP EEG signals (95% decoding accuracy), whereas subject 10's best decoding accuracy was achieved using HGP EEG (70%). Interestingly, most subjects saw no improvement in decoding accuracy when LF and HGP EEG were combined, and some performed worse. The decrease in decoding performance may be due to the limited amount of EEG data available for each subject. Intuitively, given enough information, the models should learn to up-weight informative EEG channels and down-weight uninformative channels so that decoding does not worsen overall. In our case, we have included LF EEG and much noisier HGP EEG (resulting in 256 channels rather than 128) and must consider competing speech streams. The models may not have been given enough stimulus and response data to learn to down-weight the uninformative HGP EEG channels.

DISCUSSION
Our present research investigated the extent to which the speech envelope and attentional selection are reflected in LF phase and HGP EEG signals. Given the success of using HGP to track the dynamics of speech in ECoG studies, we wanted to examine if any useful high-frequency EEG activity remained even after being smeared by brain tissue and filtered by the skull. In this study, linear regression techniques were used to map between neural responses and the acoustic envelope of speech. Our results demonstrate that HGP EEG activity is capable of carrying information regarding the speech envelope and attention and that this information may be complementary to that of LF EEG, as shown by some of our subjects.
Previous studies using EEG techniques to study gamma band activity in the human auditory system typically examined activity around 40 Hz (Jokeit and Makeig, 1994; Krause et al., 1998; Gurtubay et al., 2001; Hald et al., 2006). A more recent study examined the high gamma range and found no significant speech envelope tracking (Viswanathan et al., 2019). This lack of success in finding/using higher frequency EEG could be because HGP is generated by highly focal sources (Jerbi et al., 2009) which in turn produce lower amplitudes on the scalp (Nunez and Srinivasan, 2006; Jerbi et al., 2009), making it more difficult to detect using EEG. This poor signal-to-noise ratio (Crone et al., 2001), along with EEG's low spatial resolution and high sensitivity to muscle artifacts (Llorens et al., 2011), has meant that relatively few studies have focused on HGP.
In our study, we have shown that HGP, though typically weaker than LF EEG, can consistently track the temporal dynamics of speech. In some subjects, HGP EEG alone could decode the speech envelope and auditory attention best, and in others, combining LF and HGP neural signals further improved decoding. Akbari et al. (2019) also found that combining LF and HGP ECoG signals better reconstructed the speech envelope for their BCI system. Similarly, a cocktail party attention study showed that both LF phase and HGP ECoG signals can track the envelope of the attended speaker. The authors also suggested that combining the two may optimize attention encoding (Zion Golumbic et al., 2013).
The improvements in decoding/reconstruction for some subjects when LF and HGP EEG were combined, along with their notably differing topographies, suggested that LF and HGP EEG may carry complementary information. To investigate this further, we examined whether LF and HGP EEG may be sensitive to different aspects of the speech envelope. The specific features we examined were a proxy measure of syllable onsets and various frequency bands of the speech envelope. Neither analysis provided clear evidence of a dissociation, although both analyses hinted at possible differential sensitivities, with HGP being relatively less sensitive to syllable onsets and differences in 1-4 Hz vs. 4-8 Hz envelope features. Given the generally low fidelity of HGP tracking across subjects, however, we do not wish to overstate these exploratory results. Indeed, an alternative might be that LF and HGP EEG carry similar information, and their combination may improve decoding/reconstruction simply because having access to the two measures improves the signal-to-noise ratio of speech tracking. However, using measures of synergy, Belitski et al. showed that there is very little redundant information carried between LF and high-frequency neural signals (Belitski et al., 2010). Previous studies have also provided further support for interactions between HGP and LF phase neurophysiological signals during sensory processing (Bruns and Eckhorn, 2004; Canolty et al., 2006; Osipova et al., 2008; Voytek et al., 2010) and for the notion that speech encoding may involve the combination of the two (Nourski et al., 2009; Zion Golumbic et al., 2013). Future work either with more subjects or with both EEG and ECoG may be needed to resolve this issue.

FIGURE 5 | (A) Grand average decoding accuracies across trials and subjects for the LF, HGP, and LF+HGP decoders. The black crosshairs are the mean accuracies. Significance was calculated using a Bonferroni corrected one-way repeated measures ANOVA test (n.s., no significance, *p ≤ 0.05, **p ≤ 0.01, and ***p ≤ 0.001). (B) The mean (±SEM) decoding accuracies for all subjects using the LF, HGP, and LF+HGP models. Significance was calculated using permutation tests.

The TRF weightings of subjects in whom we were able to detect high gamma electrophysiology displayed different spatial patterns for LF and HGP responses. HGP and LF signals are said to originate from different locations in the brain (Crone et al., 2001;Edwards et al., 2009), supporting our findings in the left panel of Figure 4. Studies also suggest that HGP is mainly localized to the superior temporal gyrus (Crone et al., 2001;Towle et al., 2008;Sinai et al., 2009) in contrast to LF activity which is more spatially distributed across temporal and some frontal and parietal regions of the brain (Crone et al., 2006;Canolty et al., 2007;Zion Golumbic et al., 2013). Although neural signals are generally spatially smeared in EEG measures, we still saw differences in the scalp patterns which may be indicative of different sources of LF and HGP activity. Of course, we need to be somewhat circumspect here, because our contrasting scalp patterns may have only arisen due to differences in the biophysics of how signals at high and low frequencies project to the scalp (Buzsáki et al., 2012). Indeed, the broadly similar timing of our LF and HGP TRFs (Figure 4) might support that notion. The abovementioned ECoG work, however, ultimately supports the idea of different generators.
We saw robust stimulus reconstruction and attention decoding using HGP EEG, but only in certain subjects. This could be attributed to hard and soft tissues filtering the source potentials, making them more difficult to measure from the scalp (Nunez and Srinivasan, 2006). Cortical folding patterns also differ between subjects, which can cause certain scalp projections to vary in strength and location (Onton and Makeig, 2006). Our attempt to predict which subjects would show good HGP speech tracking based on their raw HGP EEG power was unsuccessful (although, as mentioned above, our measure of raw HGP EEG power could only be calculated during the presentation of the speech stimulus). A recent study found significant low gamma power EEG tracking of an attended speaker across subjects (in addition to delta and theta band tracking) but did not find the same in HGP EEG (Viswanathan et al., 2019). While the authors showed substantial individual differences in their measures for delta, theta, and low gamma tracking, they did not report these results for HGP. That study utilized a 32-electrode cap and suggested that future studies use high-density recordings to elucidate the between-subject differences. Here, we more fully explored individual differences in high gamma responses using high-density recordings (128-channel EEG). Future work should consider using MRI/fMRI and modeling to investigate the idea that robust HGP tracking might be related to how auditory cortex projects to the scalp in individual subjects.
In terms of the variability of HGP performance across subjects, we also noticed a few unusual patterns for specific subjects. For example, the HGP reconstruction for subject 10 was remarkably high alongside fairly typical LF tracking. This result caused us to worry initially about electrical leakage from the headphones, but there are several reasons why we are confident that this is not the case. First, we do not see high HGP in every subject, so it is not a systematic issue with our setup. Second, if the effect was coming from electrode leakage it would peak at the 0 ms delay in the TRF, which is not the case (Figure 4); the fact that the TRF peaks after 0 ms suggests it is cortical. Third, if the effect was driven by electrode leakage in just a handful of subjects (say, because of the way the cap and electrode gel were applied), it would be surprising for it to be bilateral in all of those subjects. That would suggest that the misapplication of the gel was bilateral for each subject that showed the effect, but this seems implausible. Fourth, if the effect was driven by leakage from the headphones, we might expect to see lateralization in the cocktail party data given the use of head-related transfer functions for the cocktail party attention study. We do not see this. Finally, if the effect was leakage from the electrodes, we should not see successful HGP attention decoding for any subject.
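The timing argument above (an instantaneous electrical artifact should produce a TRF peak at 0 ms, whereas a neural response should peak at a positive lag) can be sketched with a simple simulation. This is a minimal illustration under stated assumptions, not the estimation procedure used in this study: the 128 Hz sampling rate, the white-noise stimulus, the 13-sample delay, the ridge parameter, and the helper `estimate_trf` are all hypothetical.

```python
import numpy as np


def estimate_trf(stimulus, response, n_lags, ridge=1.0):
    """Forward model: ridge regression of the response onto time-lagged
    copies of the stimulus (lags 0 .. n_lags - 1 samples)."""
    n = len(stimulus)
    lagged = np.zeros((n, n_lags))
    for lag in range(n_lags):
        # Column `lag` holds the stimulus delayed by `lag` samples.
        lagged[lag:, lag] = stimulus[:n - lag]
    gram = lagged.T @ lagged + ridge * np.eye(n_lags)
    return np.linalg.solve(gram, lagged.T @ response)


rng = np.random.default_rng(1)
fs = 128              # assumed sampling rate in Hz
n = fs * 60           # one minute of simulated data
stimulus = rng.standard_normal(n)

# Simulated neural-like response: stimulus delayed by 13 samples (~100 ms) plus noise.
delay = 13
delayed_response = np.zeros(n)
delayed_response[delay:] = stimulus[:-delay]
delayed_response += rng.standard_normal(n)

# Simulated leakage: an instantaneous copy of the stimulus plus noise.
leakage_response = stimulus + rng.standard_normal(n)

n_lags = 32
peak_lag_delayed = int(np.argmax(np.abs(estimate_trf(stimulus, delayed_response, n_lags))))
peak_lag_leakage = int(np.argmax(np.abs(estimate_trf(stimulus, leakage_response, n_lags))))

print(f"delayed response peaks at {1000 * peak_lag_delayed / fs:.1f} ms")
print(f"leakage peaks at {1000 * peak_lag_leakage / fs:.1f} ms")
```

Under these assumptions, the lag of the largest TRF weight separates the two cases, which mirrors the reasoning that a TRF peaking after 0 ms is unlikely to reflect direct electrical pickup.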
Gamma activity assessed through non-invasive means has been shown to play a role in a variety of neural processes such as working memory (Tallon-Baudry et al., 1998;Howard et al., 2003;Mainy et al., 2007;Roux and Uhlhaas, 2014), motor and sensorimotor function (Medendorp et al., 2007;Ball et al., 2008;Cheyne et al., 2008), and visual processing (Adjamian et al., 2004;Hoogenboom et al., 2006;Fründ et al., 2007). Here, we show that HGP also has value when studying speech processing and auditory selective attention, albeit in a minority of subjects. In these subjects, high gamma activity supplemented lower frequencies to increase the sensitivity to speech and attention-related processes. Therefore, it is worth investigating HGP in all subjects as this increase in sensitivity could be beneficial, for instance, for the use of future individually tuned EEG-enabled hearing devices.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation, to any qualified researcher.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Ethics Committees of the School of Psychology at Trinity College Dublin, Health Sciences Faculty at Trinity College Dublin, and the Research Subjects Review Board at the University of Rochester. The participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
ET collected the cocktail party data. SS and ET analyzed the data. SS, ET, and EL interpreted the data and wrote the article.

ACKNOWLEDGMENTS
This article has been released as a preprint on bioRxiv (Synigal et al., 2019). We would also like to thank Dr. Gregory Cogan for his advice and useful discussions.