Widespread Brain Areas Engaged during a Classical Auditory Streaming Task Revealed by Intracranial EEG

The auditory system must constantly decompose the complex mixture of sound arriving at the ear into perceptually independent streams constituting accurate representations of individual sources in the acoustic environment. How the brain accomplishes this task is not well understood. The present study combined a classic behavioral paradigm with direct cortical recordings from neurosurgical patients with epilepsy in order to further describe the neural correlates of auditory streaming. Participants listened to sequences of pure tones alternating in frequency and indicated whether they heard one or two “streams.” The intracranial EEG was simultaneously recorded from sub-dural electrodes placed over temporal, frontal, and parietal cortex. Like healthy subjects, patients heard one stream when the frequency separation between tones was small and two when it was large. Robust evoked-potential correlates of frequency separation were observed over widespread brain areas. Waveform morphology was highly variable across individual electrode sites both within and across gross brain regions. Surprisingly, few evoked-potential correlates of perceptual organization were observed after controlling for physical stimulus differences. The results indicate that the cortical areas engaged during the streaming task are more complex and widespread than has been demonstrated by previous work, and that, by-and-large, correlates of bistability during streaming are probably located on a spatial scale not assessed – or in a brain area not examined – by the present study.

Introduction

The auditory system is constantly faced with the challenge of decomposing the complex mixture of sound arriving at the eardrums into an accurate representation of the acoustic environment. This decomposition, termed auditory scene analysis (ASA; Bregman, 1994), is critical for survival and communication and its failure is a common symptom reported by elderly individuals and those with sensorineural hearing loss. Despite its importance in daily life, the neural mechanisms of auditory scene analysis remain unclear (Carlyon, 2004; Micheyl et al., 2007; Snyder and Alain, 2007b; Elhilali and Shamma, 2008; Nelken and Bar-Yosef, 2008; Bidet-Caulet and Bertrand, 2009; Winkler et al., 2009). One aspect of ASA, auditory streaming (the segregation of time-varying acoustic energy into distinct perceptual objects), can be studied in a controlled setting using sequences of pure-tone triplets of the form ABA-ABA- (Miller and Heise, 1950; van Noorden, 1975; Bregman, 1994), where A and B denote tones of different frequencies separated by a silent gap (Figure 1A).

Many psychophysical studies dating back to the 1950s have shown that when the frequency separation (∆F) between the A and B tones is small, listeners hear the sequence as a single stream comprised of both A and B tones and that when ∆F is large, they hear the sequence as two isochronous streams, one of A tones and one of B tones (Miller and Heise, 1950; van Noorden, 1975; see http://web.mit.edu/~adykstra/Public/streaming_demo.wav for a demo). Interestingly, percepts evoked by sequences with intermediate ∆F are bistable (i.e., can be heard as either one stream or two) and can switch between two stable states, either spontaneously or with effort (van Noorden, 1975; Anstis and Saida, 1985; Carlyon et al., 2001).

Recent interest in the neural underpinnings of auditory streaming has produced several studies using ABA tone sequences while recording from the auditory cortex in a variety of species including insects (Schul and Sheridan, 2006), fish (Fay, 1998, 2000), bats (Kanwal et al., 2003), songbirds (Klump, 2004, 2005; Klump, 2009, 2010; Bee et al., 2010), ferrets (Elhilali et al., 2009), non-human primates (Fishman et al., 2001, 2004; Micheyl et al., 2005), and humans (Sussman et al., 1999; Deike et al., 2004, 2010; Cusack, 2005; Gutschalk et al., 2005, 2007; Snyder et al., 2006; Snyder and Alain, 2007a; Wilson et al., 2007; Kondo and Kashino, 2009; Schadwinkel and Gutschalk, 2010a,b). A prevailing model from these studies posits that a two-stream percept will be evoked whenever the A and B tones excite non-overlapping populations of neurons (but see Elhilali et al., 2009). However, inherent limitations in previous work related to spatiotemporal resolution, sparsity of coverage, and lack of direct behavioral measures in experimental animals preclude straightforward interpretation. A general extension of this model is schematized in Figure 1B. Specifically, a parametric variation of a given stimulus or stimulus feature could produce neural activity patterns which vary linearly or categorically as shown by the blue and red curves, respectively. Noise in the response of a population showing a linear relationship with the stimulus, when fed to a population showing a more categorical relationship, could engender sufficient trial-to-trial variability for bistable perception. While such activity patterns have been widely reported in vision (for reviews see Logothetis, 1998; Leopold and Logothetis, 1999; Sterzer et al., 2009), only limited evidence for such a mechanism exists in the auditory system (Cusack, 2005; Gutschalk et al., 2005, 2008; Kondo and Kashino, 2009).
Here, we report the results from experiments in which direct cortical recordings were made from widespread brain areas of neurosurgical patients with epilepsy (Engel et al., 2005) while they participated in a classical auditory streaming paradigm. Our aims were to better characterize the neurophysiological correlates of auditory streaming, extend them into brain areas outside the auditory cortex and frequency regions less observable with noninvasive measures (Crone et al., 2001), and test the idea of neuronal variability as a mechanism for perceptual bistability in the auditory modality (Almonte et al., 2005; Moreno-Bote et al., 2007; Deco and Romo, 2008; Gigante et al., 2009; Shpiro et al., 2009) by comparing evoked responses to physically identical stimuli when they were perceived as one vs. two streams. Our participants listened to ABA tone sequences and, once each sequence had finished, indicated whether they had been hearing one or two streams at its end. For each electrode sampled in a given patient, we compared responses across ∆F conditions as well as across perceptual reports in an attempt to identify correlates of both during a classical auditory streaming task. We hypothesized that when a participant perceived one (two) stream(s), the evoked response would be similar to those from conditions which consistently engender a one-stream (two-stream) percept. Responses from widespread brain areas showed robust correlates with ∆F but, surprisingly, rarely differed based on percept per se.

Materials and Methods

Ethics Statement
All procedures were approved by the Institutional Review Boards at Partners Healthcare (MGH and BWH), the New York University (NYU) Langone Medical Center, and the Massachusetts Institute of Technology (MIT) in accordance with NIH guidelines. Written informed consent was obtained from all patients prior to their participation.

Listeners
Twelve patients with intractable epilepsy underwent invasive monitoring in order to localize the epileptogenic zone prior to its surgical removal. Each patient was implanted with an array of sub-dural platinum-iridium electrodes embedded in silastic sheets (2.3 mm exposed diameter, 10 mm center-to-center spacing; Ad-tech Medical, Racine, WI, USA) placed directly on the cortical surface. Prior to implantation, each patient underwent high-resolution T1-weighted MRI. Subsequent to implantation, patients implanted at Massachusetts General Hospital (MGH) and Brigham and Women's Hospital (BWH) underwent post-operative computerized tomography (CT); patients implanted at NYU underwent post-operative MRI. Electrode coordinates obtained from post-operative scans were co-registered with preoperative MRI and overlaid onto the patient's reconstructed cortical surface using FreeSurfer (Dale et al., 1999;Fischl et al., 1999a) and custom MATLAB (The MathWorks, Framingham, MA, USA) scripts (Dykstra et al., under review; Wang et al., personal communication, Comprehensive Epilepsy Center, NYU School of Medicine). Electrode coordinates were then projected onto the FreeSurfer average brain using a spherical registration between the individual's cortical surface and that of the FreeSurfer average (Fischl et al., 1999b). The data from three patients were excluded from analysis due to excessive noise caused by technical malfunction; the data reported here were from the remaining nine patients (Table 1).

Figure 1 | Behavioral paradigm and conceptual model. (A) Schematic illustration of the alternating-tone stimuli used in the experiment and how those stimuli are perceptually organized by the listener. The frequency of the B-tone was held constant at 1000 Hz and the frequency separation between the A- and B-tone varied between 0 and 12 semitones, resulting in A-tone frequencies between 500 and 1000 Hz. (B) Conceptual model of varying neural responses to parametric manipulation of the acoustic parameter (frequency separation). A linear variation of the neural response is to be expected if that response is coding the stimulus parameter, whereas a sigmoidal (i.e., categorical) response is to be expected if the response is coding the percept directly.

Intracranial EEG data were bandpass filtered offline between 1 and 190 Hz and notch filtered at 60 Hz and its harmonics using zero-phase-shift FIR filters. Independent component analysis using the runica algorithm (Bell and Sejnowski, 1995) in EEGLAB (Delorme and Makeig, 2004) was performed on the "raw" data. Components dominated by large artifacts were identified and removed by inspection. The component data were then back-projected in order to remove the artifacts from the original data. The iEEG was epoched relative to the onset of sound sequences (yielding long epochs encompassing the entire sequence) as well as to the onset of individual ABA triplets (yielding short epochs of 0.5 s) and binned with respect to either ∆F or perceptual report within a given ∆F. For triplet-locked epochs, the first triplet in each sequence was discarded. Epochs were baseline corrected with respect to either the 500 ms preceding sequence onset (for sequence-locked epochs) or the 50 ms preceding triplet onset (for triplet-locked epochs). Epochs containing large artifacts were rejected automatically using joint probability and kurtosis algorithms in EEGLAB (Delorme et al., 2007). Specifically, trials with joint probabilities or kurtosis values more than four and five SDs from the normalized mean of these measures, respectively, were rejected as artifact. Additional epochs found to contain large epileptiform activity were rejected by visual inspection.
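The kurtosis criterion described above can be illustrated with a minimal sketch. This is not EEGLAB's implementation: the function names `excess_kurtosis` and `reject_by_kurtosis` are hypothetical, the data are assumed to be a single channel arranged as epochs by samples, and the z-scoring across epochs is one plausible reading of "SDs from the normalized mean".

```python
import numpy as np

def excess_kurtosis(x, axis=-1):
    """Excess kurtosis (0 for a Gaussian) along one axis."""
    centered = x - x.mean(axis=axis, keepdims=True)
    m2 = (centered ** 2).mean(axis=axis)
    m4 = (centered ** 4).mean(axis=axis)
    return m4 / m2 ** 2 - 3.0

def reject_by_kurtosis(epochs, n_sd=5.0):
    """Flag epochs whose kurtosis is an outlier across epochs.

    epochs: array of shape (n_epochs, n_samples) for one channel (assumed layout).
    Returns a boolean mask, True where the epoch would be rejected.
    """
    k = excess_kurtosis(epochs, axis=1)
    z = (k - k.mean()) / k.std()  # normalize the measure across epochs
    return np.abs(z) > n_sd

# Hypothetical data: 200 noise epochs, one of which contains a large spike.
rng = np.random.default_rng(0)
epochs = rng.standard_normal((200, 250))
epochs[17, 100] += 100.0
mask = reject_by_kurtosis(epochs)
```

In this toy case only the epoch with the spike is flagged; the joint-probability criterion would be applied in the same way with its own threshold of four SDs.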

Statistical Analysis
A modified version of the cluster-based, non-parametric statistical procedure outlined by Maris and Oostenveld (2007) was used to test for effects of ∆F and bistability on triplet-locked EP amplitude. Spearman (non-parametric) rank correlation (in the case of a multiple-level factor, e.g., ∆F) and unpaired t-test (in the case of two-level factors, e.g., percept) were used as the sample-level (i.e., individual time point within a single channel) statistics in order

Stimuli and Procedure
Stimuli were long sequences of pure-tone triplets of the form ABA-ABA-..., where A and B represent individual tones and the dash represents a silent gap (Figure 1A). Each tone was 100 ms in duration with 10 ms raised-cosine on- and off-ramps. The inter-stimulus interval (ISI) between the first A-tone and B-tone, as well as between the B-tone and second A-tone, was 25 ms; the ISI between the second A-tone and subsequent triplet was 150 ms. Stimulus onset asynchrony (SOA) between successive A tones was 250 ms; SOA between successive B tones was 500 ms; triplet onset asynchrony was also 500 ms. Total duration of each sequence varied between 6.5 and 10 s (13 and 20 triplets, respectively) depending on the listener (for P1-P5, duration varied between 6.5 and 7.5 s; for P6-P9, duration was 10 s). The B-tone frequency was fixed at 1 kHz. The A-tone frequency varied between 0 and 12 semitones below the B-tone. Listeners P1, P2, P3, P4, and P5 participated in conditions in which the frequency separation was 0, 5, 6, 7, or 12 semitones, where 1 semitone is an approximately 6% frequency difference. Listeners P6, P7, P8, and P9 participated in conditions in which the frequency separation was 0, 2, 4, 6, 8, 10, or 12 semitones. Each patient listened to between 200 and 378 triplets for a given frequency separation.
All sounds were generated digitally in MATLAB, stored as .wav files, and converted to analog waveforms by the on-board soundcard of a laptop equipped with Presentation software (Neurobehavioral Systems, Albany, CA, USA). Stimuli were presented at a comfortable listening level via Etymotic ER-2 insert earphones (Etymotic Research, Inc., Elk Grove Village, IL, USA), diotically (when possible) or monaurally contralateral to the hemisphere of implantation. Patients were instructed to listen to the sounds and to indicate at the end of each sequence whether, at the end of the sequence, they were hearing a single "stream" comprised of all tones or two "streams," one comprised of A tones and the other of B tones. Responses were made by button press with a response box (Cedrus Corporation, San Pedro, CA, USA) interfaced with Presentation via USB. Response windows were unconstrained, and the subsequent stimulus began 1 s after a response to the previous stimulus was entered.
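The stimulus timing described above can be checked with a short sketch. The original sounds were generated in MATLAB; the Python below is only an illustration, and the sampling rate (`FS`), the function names, and the unit amplitude are assumptions that the text does not specify.

```python
import numpy as np

FS = 48000  # assumed audio sampling rate in Hz; not stated in the text

def a_tone_frequency(delta_f_semitones, b_freq=1000.0):
    """A-tone frequency lying delta_f semitones below the fixed B-tone."""
    return b_freq * 2.0 ** (-delta_f_semitones / 12.0)

def tone(freq, dur=0.100, ramp=0.010, fs=FS):
    """Pure tone with raised-cosine onset and offset ramps."""
    n = int(round(dur * fs))
    n_ramp = int(round(ramp * fs))
    t = np.arange(n) / fs
    env = np.ones(n)
    rise = 0.5 * (1.0 - np.cos(np.pi * np.arange(n_ramp) / n_ramp))
    env[:n_ramp] = rise
    env[-n_ramp:] = rise[::-1]
    return np.sin(2.0 * np.pi * freq * t) * env

def aba_triplet(delta_f_semitones, fs=FS):
    """One ABA- triplet: A, 25 ms gap, B, 25 ms gap, A, 150 ms gap (0.5 s total)."""
    a = tone(a_tone_frequency(delta_f_semitones), fs=fs)
    b = tone(1000.0, fs=fs)
    short_gap = np.zeros(int(round(0.025 * fs)))
    long_gap = np.zeros(int(round(0.150 * fs)))
    return np.concatenate([a, short_gap, b, short_gap, a, long_gap])

def aba_sequence(delta_f_semitones, n_triplets, fs=FS):
    """Repeat the triplet, e.g. 13 triplets for a 6.5 s sequence."""
    return np.tile(aba_triplet(delta_f_semitones, fs), n_triplets)
```

With these durations each triplet lasts exactly 0.5 s, so 13 and 20 triplets give the 6.5 s and 10 s sequences, and a 12-semitone separation places the A-tone at 500 Hz.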

Data Acquisition
Intracranial EEG (iEEG) data at MGH and BWH were acquired with standard clinical EEG monitoring equipment (XLTEK, Natus Medical Inc., San Carlos, CA, USA) at a sampling rate of 250 Hz (P1) or 500 Hz (P2, P3, P6, P8). At NYU, iEEG data were acquired with a customized system at a sampling rate of 30 kHz

Except for trials from the 0-semitone condition, the template was defined as the average EP for the 0-semitone condition. The template to which individual trials from the 0-semitone condition were compared was the average EP from the 0-semitone condition including all waveforms but the one from trial i ("leave one out"). This index provides a measure of how dissimilar two waveforms are from each other. Although this index is biased to show a significant correlation with ∆F, it provides a means to (i) collapse waveforms across individual electrode sites and patients into a single quantitative metric and (ii) quantitatively compare responses to one- vs. two-stream percepts in a way that circumvents variable latencies and durations of percept- or ∆F-based effects across sites.

High-Gamma Power
Waveforms of high-gamma-power were constructed using the wavelet transforms built into EEGLAB (specifically, the newtimef function). Sequence-length (between 6.5 and 10 s) epochs were used to compute the event-related spectral perturbation (ERSP) which was baseline corrected to the 500-ms preceding stimulus onset. The number of wavelet cycles used varied logarithmically with respect to frequency from three cycles at the lowest frequency tested (5 Hz) to 10 at the highest (190 Hz), yielding approximate temporal resolution of <500 ms at 8 Hz and <125 ms in the gamma-band. High-gamma-power waveforms were constructed by summing the power in frequencies from 80-190 Hz for each time point in the full time-frequency representation. These waveforms were then baseline corrected by subtracting the mean power in each trial computed across the 500-ms preceding stimulus onset. Triplet-locked gamma-power epochs were constructed by time-locking with respect to each triplet onset and subsequently binned across the various ∆F and percept conditions in the same way as the evoked potentials. The same statistical procedures described above were applied to the high-gamma waveforms.
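The band-summing and baseline steps described above can be sketched as follows. This assumes a time-frequency power array has already been computed (for example by a wavelet transform); the array layout, the function name `high_gamma_waveform`, and the half-open baseline window are illustrative assumptions rather than details taken from the EEGLAB pipeline.

```python
import numpy as np

def high_gamma_waveform(power, freqs, times, band=(80.0, 190.0), baseline=(-0.5, 0.0)):
    """Collapse a time-frequency representation into a high-gamma power waveform.

    power: array (n_trials, n_freqs, n_times) of spectral power (assumed layout)
    freqs: array (n_freqs,) in Hz
    times: array (n_times,) in seconds relative to stimulus onset
    Returns an array (n_trials, n_times), baseline corrected per trial.
    """
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    summed = power[:, in_band, :].sum(axis=1)  # sum power over 80-190 Hz
    in_baseline = (times >= baseline[0]) & (times < baseline[1])
    # Subtract each trial's mean power in the 500 ms before stimulus onset.
    return summed - summed[:, in_baseline].mean(axis=1, keepdims=True)

# Hypothetical input: power doubles in every frequency bin after onset.
freqs = np.array([40.0, 80.0, 120.0, 190.0, 200.0])
times = np.arange(-5, 11) / 10.0
power = np.ones((2, freqs.size, times.size))
power[:, :, times >= 0] += 1.0
hg = high_gamma_waveform(power, freqs, times)
```

The resulting waveforms could then be cut into triplet-locked epochs and binned by ∆F or percept in the same way as the evoked potentials.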

Results
Twelve patients with intractable epilepsy listened to sequences of alternating pure tones (Figure 1A) and indicated at the end of each sequence whether, at the end of the sequence, they were hearing the tones as grouped ("1 stream") or segregated ("2 streams") while we simultaneously recorded the intracranial EEG (Figure 2). Three patients were excluded from analysis for technical reasons (see Materials and Methods). Summed across the remaining nine patients (Table 1), we recorded from nearly 700 electrodes in the left hemisphere and 250 electrodes in the right hemisphere, mostly on lateral cortex of the temporal, frontal, and parietal lobes (Figure 2E).

Behavior
Figure 3 shows the probability of hearing two streams as a function of ∆F averaged across all nine patients included in the analysis. Patients reported hearing a single stream when the ∆F was small and two streams when ∆F was large. At intermediate ∆F, the percept was bistable, i.e., patients sometimes reported hearing one stream and sometimes reported hearing two streams. A Kruskal-Wallis test confirmed a main effect of ∆F (χ²(1,8) = 34.1; p < 0.0001).
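As an illustration of the behavioral summary, the sketch below computes the probability of a two-stream report for each ∆F and flags conditions as bistable using the criterion stated elsewhere in the text, 0.3 ≤ P(2-stream percept) ≤ 0.7. The function names and the example reports are hypothetical and are not data from the study.

```python
def two_stream_probability(reports):
    """Fraction of sequences reported as two streams (reports are 1 or 2)."""
    reports = list(reports)
    return sum(1 for r in reports if r == 2) / len(reports)

def classify_conditions(reports_by_delta_f, lo=0.3, hi=0.7):
    """Map each delta-F (semitones) to (P(two streams), is_bistable)."""
    summary = {}
    for delta_f, reports in sorted(reports_by_delta_f.items()):
        p = two_stream_probability(reports)
        summary[delta_f] = (p, lo <= p <= hi)
    return summary

# Hypothetical reports for one listener (1 = one stream, 2 = two streams).
example = {0: [1, 1, 1, 1], 6: [1, 2, 1, 2], 12: [2, 2, 2, 2]}
summary = classify_conditions(example)
```

Only conditions flagged as bistable contain enough one-stream and two-stream trials to support a percept-based comparison of responses to physically identical stimuli.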
to evaluate possible effects of ∆F (five levels for P1-P5 and seven levels for P6-P9) and bistability (always two levels), respectively. Contiguous, statistically significant samples (defined as p < 0.05) within a single electrode were used to define the cluster-level statistic, which was computed by summing the sample-level statistics within a cluster. Statistical significance at the cluster-level was determined by computing a Monte Carlo estimate of the permutation distribution of cluster statistics using 1000 re-samples of the original data (Ernst, 2004). For multiple-level factors (∆F), the estimate of the permutation distribution was performed by 1000 re-samples of the condition labels associated with each level in the factor. Within a single electrode, a cluster was taken to be significant if it fell outside the 95% confidence interval of the permutation distribution for that electrode. The determination of significant clusters was performed independently for each electrode. This method controls the overall false alarm rate within an electrode across time points; no correction for multiple comparisons was performed across electrodes.
Due to the known buildup effects of auditory streaming (i.e., 2-stream percepts become more likely as time since sequence onset increases) and the fact that listeners only reported what they heard at the end of each stimulus sequence, two independent analyses were carried out. The first used only the data from the second half of each sequence while the second used all data after removing the onset response (0-0.5 s after stimulus onset). The method of analysis did not affect the results, and only the results from the second analysis are shown.
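The cluster-based permutation logic for a two-level factor (e.g., percept) within one electrode can be sketched as below. This is a simplified illustration in the spirit of Maris and Oostenveld (2007), not the authors' modified procedure: the fixed cluster-forming threshold `thresh` stands in for the sample-level p < 0.05 criterion, the null distribution is built from the largest cluster in each permutation, and all function names are hypothetical.

```python
import numpy as np

def t_values(a, b):
    """Unpaired (Welch) t statistic per time point; a, b: (n_trials, n_times)."""
    va = a.var(axis=0, ddof=1) / a.shape[0]
    vb = b.var(axis=0, ddof=1) / b.shape[0]
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(va + vb)

def cluster_sums(stat, thresh):
    """Sum the statistic over each run of contiguous same-sign supra-threshold samples."""
    sums, current, sign = [], 0.0, 0
    for s in stat:
        sg = int(np.sign(s)) if abs(s) > thresh else 0
        if sg != 0 and sg == sign:
            current += s          # extend the open cluster
        else:
            if sign != 0:
                sums.append(current)  # close the previous cluster
            current, sign = (s, sg) if sg != 0 else (0.0, 0)
    if sign != 0:
        sums.append(current)
    return sums

def cluster_permutation_test(a, b, thresh=2.0, n_perm=1000, seed=0):
    """Return observed cluster statistics and their Monte Carlo p-values."""
    rng = np.random.default_rng(seed)
    observed = cluster_sums(t_values(a, b), thresh)
    pooled = np.concatenate([a, b])
    n_a = a.shape[0]
    null_max = np.zeros(n_perm)
    for p in range(n_perm):
        idx = rng.permutation(pooled.shape[0])  # shuffle condition labels
        sums = cluster_sums(t_values(pooled[idx[:n_a]], pooled[idx[n_a:]]), thresh)
        null_max[p] = max((abs(s) for s in sums), default=0.0)
    p_values = [float((null_max >= abs(s)).mean()) for s in observed]
    return observed, p_values
```

For a multiple-level factor such as ∆F, the same scheme would use a rank correlation as the sample-level statistic and shuffle the condition labels; as in the text, running the test separately per electrode leaves comparisons across electrodes uncorrected.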

Dissimilarity Index
In order to further evaluate possible effects of perceptual bistability on the evoked waveforms, we computed a dissimilarity index between waveforms from individual trials and a template waveform within individual channels in which significant EP-∆F correlations were found. Qualitatively, this index is defined as the difference between the sum-squared error (SSE) computed for the condition of interest (i.e., a specific ∆F or percept) and the minimum SSE computed across all conditions, normalized by the difference between the maximum SSE and minimum SSE computed across all conditions. The index was computed by normalizing the average SSE between the trial and the template. The SSE for a single trial was computed between the template waveform X_0 and the individual-trial waveform X_ij for trial i in condition j, across time points t, where T is the overall number of significant time points in condition j. The average SSE for condition j was then computed across the N trials in that condition.
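One reading of the verbal definition in this section is sketched below. It rests on assumptions: the per-trial SSE is taken as a plain sum of squared differences over the T time points (without dividing by T), the template is the average waveform of a designated condition, and trials from that condition are compared against a leave-one-out average, as described for the 0-semitone condition. The function name and data layout are hypothetical.

```python
import numpy as np

def dissimilarity_index(trials_by_condition, template_condition):
    """Min-max normalized average SSE between single trials and a template.

    trials_by_condition: dict mapping condition -> array (n_trials, n_times),
        restricted to the significant time points of one channel (assumed layout)
    template_condition: key of the condition whose average defines the template
    Returns a dict mapping condition -> index in [0, 1].
    """
    template_trials = trials_by_condition[template_condition]
    template = template_trials.mean(axis=0)
    mean_sse = {}
    for cond, trials in trials_by_condition.items():
        if cond == template_condition:
            # Leave-one-out template: average of all other trials in the condition.
            n = trials.shape[0]
            loo = (trials.sum(axis=0) - trials) / (n - 1)
            sse = ((trials - loo) ** 2).sum(axis=1)
        else:
            sse = ((trials - template) ** 2).sum(axis=1)
        mean_sse[cond] = sse.mean()  # average SSE over the N trials
    lo, hi = min(mean_sse.values()), max(mean_sse.values())
    return {cond: (v - lo) / (hi - lo) for cond, v in mean_sse.items()}

# Hypothetical waveforms: flat responses at 0, 1, and 2 units for three delta-F values.
data = {0: np.zeros((5, 10)), 6: np.ones((5, 10)), 12: np.full((5, 10), 2.0)}
index = dissimilarity_index(data, template_condition=0)
```

Because the index is min-max normalized across conditions, it is 0 for the condition most similar to the template and 1 for the least similar one; percept-based comparisons would treat the one-stream and two-stream trials of a given ∆F as separate conditions.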
The index was then defined as the normalized difference (average SSE for condition j − minimum) / (maximum − minimum), with the minimum and maximum taken over the average SSEs of all conditions.

(Figures A3 and A4). As can be seen, waveform morphology was complex and highly variable between different electrode sites, yet evoked responses in varying time windows and spatial positions robustly correlated with ∆F. The majority of sites which showed strong correlations with ∆F were over or adjacent to the posterior superior temporal gyrus (pSTG). However, several other sites also showed responses which correlated with ∆F. The sites which showed significant ∆F correlations across all nine patients included in the analysis are summarized in Figure 5, where electrode sites from each individual have been overlaid onto a template brain by spherical surface registration of each patient's pial surface with that of the FreeSurfer average (see Materials and Methods). Across patients, a widespread set of brain areas showing significant correlations with ∆F included pSTG (as was expected), middle temporal gyrus, pre- and post-central gyri (mainly ventrally), inferior and middle frontal gyri, and the supra-marginal gyrus.

Evoked Potentials: Bistable Perception
After having established significant correlations with a physical stimulus parameter (∆F) known to produce changes in perceptual organization, we explicitly tested whether the same electrode sites showed significant triplet-locked EP differences based solely on how the sequences were perceptually organized (i.e., we compared EPs between sequences perceived as one stream vs. two streams within a given ∆F condition). For a given ∆F, responses were binned and averaged according to whether the listener reported hearing one or two streams. As for the analysis testing for effects of ∆F, two analyses were carried out: one using only the responses from the second half of each sequence and the other using responses from the entire sequence, except for the first triplet. Only the results from the second analysis are presented here. The results of this analysis for individual peri-STG sites across all nine patients are shown in Figure 6. The sites, overlaid onto each individual's pial surface as shown in the top row, were chosen based on the fact that each showed a significant

Evoked Potentials: ∆F
In order to assess putative correlates of streaming, we tested for correlations between triplet-locked evoked-potential (EP) amplitude and ∆F which, when parametrically varied, produced changes in how the sequences were perceptually organized. In light of the known effects of perceptual buildup in streaming tasks, two analyses were carried out: one using only the triplet-locked responses from the second half of each sequence and another using the responses to all triplets save for the first (see Materials and Methods). The results did not differ based on which analysis was used, thus only the second analysis is reported here. Significant correlations were determined by cluster-based non-parametric permutation statistics (Materials and Methods).
Figure 4 shows the average triplet-locked evoked responses across an 8 × 8 grid of electrodes for the different ∆F conditions from a single patient (P4). The positions

Electrode coordinates from all nine participants in the study were co-registered and overlaid onto the FreeSurfer average surface. In total, we sampled from nearly 1000 sites, mostly over lateral cortex.

Figure 3 | Behavioral results. Subjects heard one stream when the frequency separation was small and two when the frequency separation was large. Intermediate frequency separations were perceptually bistable, i.e., perceived either as one or two streams. Error bars represent the SE of the mean across participants. 1 semitone ≈ 6% frequency separation.

Dykstra et al. Direct cortical recordings during streaming
Frontiers in Human Neuroscience www.frontiersin.org

but this effect was inconsistent across the multiple ∆F conditions for which a bistable percept was evoked. In summary, several brain areas both within and outside of the auditory cortex showed evoked responses that significantly correlated with ∆F but not conscious perceptual organization.

Dissimilarity Analysis
In order to further evaluate whether sites showing significant EP-∆F correlations also showed correlates of perceptual bistability, we carried out a dissimilarity analysis using the grand-average triplet-locked response to the 0-semitone condition as the template. Responses from each ∆F condition were binned according to percept as well as collapsed across them and compared to the template by SSE (see Materials and Methods). Our hypothesis was that responses from conditions with greater ∆F, as well as responses from trials in which the subject reported hearing two streams, would show a larger "dissimilarity index" computed from the SSE between the response of interest and the template. Figure 7 shows the results of this analysis. The value of the dissimilarity index increased as ∆F increased (Spearman's rho = 0.46, p < 0.0001) and, across all channels from all patients, showed a marginally significant difference based on percept alone (W+ = 3427, p = 0.097) in the expected direction (i.e., greater dissimilarity indices for 2- vs. 1-stream percepts), suggesting a propensity for activity during

correlation with ∆F and was the site with largest triplet-locked RMS power in the vicinity of the pSTG. Responses to sequences that were perceptually bistable [defined as: 0.3 ≤ P(2-stream percept) ≤ 0.7] are shown by the blue (1-stream percepts) and red (2-stream percepts) traces; otherwise, traces are black. As can be seen, EP morphology was highly variable across individual subjects. Waveforms changed significantly as a function of ∆F as determined by Monte Carlo permutations using Spearman rank correlation as the sample-level statistic (see Materials and Methods), but, surprisingly, did not show significant differences based on percept per se. Across all the channels in the study, there were individual channels which showed significant differences based on percept,

post-central gyrus in S1 (not shown). None of the four channels which showed significant percept-based differences in the dissimilarity index showed significant differences in the waveforms when evaluated directly. A complementary analysis was carried out using the grand-average triplet-locked response collapsed across all conditions as the template (Figure A1 in Appendix). Averaged evoked responses from each ∆F condition were binned according to percept as well as collapsed across them and compared to the template by SSE. Using this analysis, the dissimilarity index increased as ∆F increased [χ²(8,46) = 200.47, p < 0.0001] but, across all channels from all patients, did not differ based on percept alone (W+ = 1076, p = 0.33), confirming a significant main effect of ∆F and lack of a significant main effect of percept.

Gamma-Power Analysis
Two sets of triplet-locked high-gamma (80-190 Hz) power waveforms were constructed using either (i) wavelet transforms or (ii) analytic signal methods (see Materials and Methods). These waveforms were subjected to the same Monte Carlo permutation statistics as the triplet-locked evoked potentials to test for effects of either ∆F or percept. No significant effects were found (Figure A2 in Appendix).

Discussion
Combining a classical behavioral paradigm using long sequences of tones alternating in frequency and direct cortical recordings in humans, the present results demonstrate a widespread set of brain areas, mainly in posterosuperior temporal and peri-rolandic

2-stream percepts to be more similar to activity evoked by large ∆F conditions. However, a sufficient number of channels (23%) showed the opposite pattern so as to limit the statistical significance of the effect. Individually, across all sites which showed a significant correlation with ∆F (N = 44), four channels showed significant effects of percept on the dissimilarity index in the expected direction, while none showed a significant effect in the opposite direction. Three of those channels were from S4 [G30, G37, and a site over the left posterior STG (not shown)] whose data are shown in Figure 4, and the fourth was from a site over the inferior

Figure 6 | Evoked potentials from individual peri-Sylvian electrode sites in each of the nine subjects. Blue and red traces for a given frequency separation and subject indicate that the percept for that condition was bistable (*, this patient did not understand the task). Waveforms traced in black indicate that the percept for that condition was stable. Electrode sites, shown in the top row over each subject's cortical reconstruction, were chosen based on their having the largest RMS power in the grand-average triplet-locked evoked response in the vicinity of the superior temporal gyrus. The frequency separation (∆F, semitones) for each set of waveforms is indicated in the left-most column. The timing of individual tones in the triplet is shown in the bottom row.

MEG, represents a spatially smoothed version of the true cortical source configuration (Halgren, 2004; Ahlfors et al., 2010), and tends not to capture brain activity whose response variability has high spatial frequency, in contrast to the locally generated signals measured by intracranial EEG.

The Role of Extra-Auditory Areas in Streaming
The present study is the first to report brain activity from extra-auditory cortical areas with high temporal resolution during the streaming paradigm. As shown in Figures 5 and 6, evoked potentials from several widespread brain areas correlated with ∆F. Waveform morphology was spatially variable both across and within macroscopic brain areas (though consistent across trials), even within individual participants, suggesting that (1) areas outside the auditory cortex may play an as-yet undetermined role in streaming and (2) the role of a given macroscopic brain area may not be uniform, known issues of ERP variability notwithstanding (e.g., Edwards et al., 2009). While several authors have posited a role for areas outside the classically defined auditory pathway in streaming (Snyder and Alain, 2007b; Bidet-Caulet and Bertrand, 2009; Elhilali et al., 2009), nearly all neurophysiological studies of streaming have focused exclusively on the auditory cortex (but see Cusack, 2005; Pressnitzer et al., 2008; Kondo and Kashino, 2009). Only two previous studies examined whole-brain activity during the streaming paradigm (Cusack, 2005; Kondo and Kashino, 2009). Cusack (2005), using a perceptually bistable sequence of tones similar to those used in the present study, reported increased BOLD activity in the intraparietal sulcus during 2-stream vs. 1-stream percepts, but did not report percept- or ∆F-based differences in the auditory cortex. The present study could not assess the intraparietal sulcus given that (i) the sub-dural electrodes used were confined to superficial gyri and (ii) the lead field of sub-dural electrodes is unlikely to measure activity from as deep in the sulcus as the foci reported by Cusack.
Studies utilizing methods with high temporal resolution (e.g., MEG, iEEG, or microelectrodes in experimental animals) focusing on this region could elucidate its precise role in streaming and auditory perceptual organization more generally (e.g., Rauschecker and Scott, 2009; Teki et al., 2011). Given the results of the present study as well as previous work (Fishman et al., 2001, 2004; Klump, 2004, 2005; Gutschalk et al., 2005; Micheyl et al., 2005; Snyder et al., 2006; Wilson et al., 2007; Bee et al., 2010), it is unclear why Cusack did not observe a neurophysiological correlate of ∆F in the auditory cortex, though an account based on subtle paradigmatic differences cannot be ruled out. Kondo and Kashino (2009) used an event-related fMRI paradigm in order to measure brain activity during perceptual switching. Their subjects listened to tone sequences nearly identical to those used in the present study and indicated when the percept switched from one to two streams and vice versa. In addition to the auditory cortex, significant switch-related activations were found in the posterior insula, medial geniculate body, and supra-marginal gyrus. No explicit contrasts were carried out to test for effects of perceptual organization or ∆F, but the results do highlight the need for further examination of the involvement of areas outside the auditory cortex in streaming.
cortex, but also extending to the middle temporal gyrus as well as inferior and middle frontal gyri -putatively involved in auditory streaming. EP amplitude tightly correlated with ∆F, but did not consistently differ based on perceptual organization alone. Waveform morphology was highly variable within and across brain areas, suggestive of their having different roles in auditory stream formation.
Complex Meso-scale Activity in the Auditory Cortex during Streaming
Results from previous M/EEG (Gutschalk et al., 2005; Snyder et al., 2006) and fMRI (Wilson et al., 2007) studies of streaming have suggested either a uniform role for the whole of the auditory cortex in stream formation or that the majority of activity in response to stimuli similar to those used in the present study is localized on the superior temporal plane (either on Heschl's gyrus or just posterior to it). The results from the present study demonstrate that, in addition to there being responses in higher auditory areas (i.e., lateral STG), the activity within a given macroscopic brain area is not uniform, a result that has also been noted by other investigators using evoked responses from iEEG with other classic auditory paradigms (Howard et al., 2000; Crone et al., 2001; Brugge et al., 2003, 2008; Edwards et al., 2005, 2009). This can be seen in the single-subject data shown in Figure 4, where the responses in adjacent electrode sites (e.g., G14 and G15 on the pSTG) indicate intra-areal variability in the response to the ABA-triplets.
This discrepancy may be due to several factors. First, the lead fields of the electrodes used to measure brain activity in the present study are more likely to measure responses from gyral crowns than from sulcal sources such as those located on the superior temporal plane (the area to which non-invasive studies have localized dipoles during streaming), although others have reported iEEG potentials interpreted to arise from sulci (Edwards et al., 2005; Acar et al., 2009; Whitmer et al., 2010). We observed little evidence for sources on the STP in that (i) there were very rarely clear polarity reversals across the lateral fissure and (ii) the earliest peak in the average response to sequence onset was >50 ms, later than the earliest response in the medial portion of the transverse gyrus of Heschl, which occurs at <25 ms (Liegeois-Chauvel et al., 1991). This last point does not preclude the possibility that some of the responses we measured arose from lateral portions of the STP, particularly in the N1-latency range (Gutschalk et al., 2005; Snyder et al., 2006). However, this seems unlikely to us given point (i). Second, the responses we observed from the lateral STG could have radial source orientations, which would not be identified with MEG but could be with EEG. Indeed, Snyder et al. (2006) reported radially oriented sources which could have been localized to the STG. Third, although both aforementioned fMRI studies of streaming, as well as others (Deike et al., 2010), reported activation maps with multiple foci of activation, the complex relationship between auditory-evoked responses and the fMRI BOLD signal (Mukamel et al., 2005; Gutschalk et al., 2010; Mayhew et al., 2010; Mulert et al., 2005, 2010; Steinmann and Gutschalk, 2011) as well as BOLD-fMRI's low temporal resolution preclude a detailed characterization of areal sub-specialization.
Fourth, and perhaps most likely, the activity recorded by EEG and, to a lesser extent,
of auditory perceptual organization and that we simply were unable to examine activity from these areas. Second, although the possibility that the known issue of trial-to-trial variability in the evoked potentials caused the lack of a significant percept-based finding cannot be ruled out, we find this explanation unlikely given the robust effects of ∆F as well as the relatively flat waveforms in the pre-sequence baseline period we observed. Finally, the neural correlates of auditory streaming could be found (i) in another cortical area not sampled, (ii) in a distributed network of brain areas which could not be determined based on the uni-variate analyses used, (iii) on a finer spatial scale than was assessed by the present study, or (iv) in an aspect of neural activity not examined such as sustained potentials or sustained gamma-band activity, though our analysis of evoked gamma-band power showed neither ∆F- nor percept-based effects. This is perhaps due to the relatively constant acoustic stimulation used in our paradigm vs. the less frequent stimuli used in previous reports demonstrating large gamma-band effects (Crone et al., 2001, 2006; Edwards et al., 2005

Acknowledgments
The authors wish to thank the patients and their families for their participation. The authors also wish to thank hospital staff, particularly Kristy Trip, Kara Houghton, Amy Trongnetrpunya, Olga Felsovalyi, and members of the Cortical Neurophysiology Laboratory at MGH including Alex Chan, Justine Cormier, Corey Keller, and Rodrigo Zepeda. Finally, the authors would like to thank Jennifer Melcher, Peter Cariani, and Barbara Shinn-Cunningham for helpful comments. Work supported by NIDCD grant T32 DC00038 to Andrew R. Dykstra, NIBIB grant T32 EB001680 to Andrew R. Dykstra, an Amelia Peabody Charitable Trust grant to Andrew R. Dykstra, NIH grant NS18741 to Eric Halgren, and NINDS grant NS062092 to Sydney S. Cash.
Our results demonstrate that the cortical areas engaged during the streaming paradigm are much more complex and widespread than has been shown by previous work, and highlight the need for detailed neurophysiological examinations of the streaming paradigm in behavioral animal models.

Failure to Observe Correlates of Bistability
In contrast to the study of the visual system, in which there are many reports of brain activity covarying directly with perception (Logothetis, 1998; Leopold and Logothetis, 1999; Sterzer et al., 2009), such observations are scarce in the auditory system (Hillyard et al., 1971; Cusack, 2005; Gutschalk et al., 2005, 2008). By recording brain activity with high spatiotemporal precision from widespread areas of the human cortex, the present study attempted to identify neural correlates of streaming, per se, in the absence of physical stimulus differences. As mentioned above, Cusack (2005) reported increased BOLD activity in the anterior intraparietal sulcus during 2- vs. 1-stream percepts but did not find percept- or ∆F-based differences in the auditory cortex. The latter finding is contrary to what Gutschalk et al. (2005) reported using magnetoencephalography, namely that the amplitudes of the P1m and N1m components evoked by the B-tone in a sequence of ABA-triplets co-varied with both ∆F and perceptual organization, per se. No evidence for activity in the intraparietal sulcus was found in that study, though this could be due to activity in the Cusack study not being precisely time-locked to the stimuli, a condition necessary for the measurement of evoked responses with EEG or MEG. Neither finding (increased activity in the intraparietal sulcus or planum temporale during 2- vs. 1-stream percepts) was replicated by the present study, possibly due to lack of coverage in the areas of activity reported by Cusack and Gutschalk et al. (intraparietal sulcus, transverse gyrus on the superior temporal plane) or, again, because the electrical activity responsible for the generation of the BOLD effects reported by Cusack was not time-locked to the stimuli.
Possible explanations for why we did not observe robust correlates of perceptual bistability despite widespread cortical sampling (see Figure 3) are many. First, although it seems unlikely to us given the large amount of data suggesting a role for frontal areas in conscious visual perception (Libedinsky and Livingstone, 2011), it could be that the areas reported by Cusack (2005) and Gutschalk et al. (2005) are unique in maintaining representations
Appendix
Figure A1 | Complementary dissimilarity index. Dissimilarity index computed using the grand-average EP collapsed across all conditions as the template and the per-condition average EPs as the test waveforms. As in Figure 7, the left panel shows the dissimilarity index as a function of frequency separation collapsed across percept. The right panel shows the dissimilarity index as a function of percept collapsed across ∆F conditions in which the percept was bistable.
Figure A2 | High-gamma-power waveforms. High-gamma power waveforms for the same subject as in Figure 4. The legend is as in Figure 4 except for ordinate units, which are now in dB with respect to the pre-stimulus baseline period.
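The ordinate units of Figure A2 (dB with respect to the pre-stimulus baseline) can be illustrated with a minimal sketch. It assumes the conventional definition 10·log10(P(t)/mean baseline power); the function name `power_db_re_baseline`, the baseline window, and the example values are hypothetical and are not taken from the study's analysis pipeline.

```python
import math

def power_db_re_baseline(power, times, baseline=(-0.5, 0.0)):
    """Express a power time course in dB relative to its mean baseline power.

    power    : non-negative power values in linear units
    times    : sample times in seconds, same length as power
    baseline : (start, stop) of the baseline window in seconds, start inclusive,
               stop exclusive; the default is an assumed, illustrative window
    """
    if len(power) != len(times):
        raise ValueError("power and times must have the same length")
    start, stop = baseline
    base = [p for p, t in zip(power, times) if start <= t < stop]
    if not base:
        raise ValueError("no samples fall inside the baseline window")
    ref = sum(base) / len(base)  # mean linear power in the baseline window
    if ref <= 0:
        raise ValueError("mean baseline power must be positive")
    # 10*log10 of the power ratio; zero power maps to -inf dB
    return [10.0 * math.log10(p / ref) if p > 0 else float("-inf") for p in power]

# Hypothetical example: two baseline samples followed by two post-onset samples
times = [-0.2, -0.1, 0.0, 0.1]
power = [2.0, 2.0, 20.0, 0.2]
db = power_db_re_baseline(power, times)
```

Under this definition a tenfold increase over baseline corresponds to +10 dB and a tenfold decrease to -10 dB, so samples at the mean baseline level sit at 0 dB.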