Original Research ARTICLE
Front. Hum. Neurosci., 28 March 2008 | https://doi.org/10.3389/neuro.09.005.2007
Neural substrate of concurrent sound perception: direct electrophysiological recordings from human auditory cortex
INSERM, U821, Lyon, France
Institut Fédératif des Neurosciences, Lyon, France
Université de Lyon, Lyon, France
Functional Neurology and Epileptology Department, Neurological Hospital, Lyon, France
MEG Department, CERMEP, Lyon, France
In everyday life, consciously or not, we are constantly disentangling the multiple auditory sources contributing to our acoustical environment. To better understand the neural mechanisms involved in concurrent sound processing, we manipulated sound onset asynchrony to induce the segregation or grouping of two concurrent sounds. Each sound consisted of amplitude-modulated tones at different carrier and modulation frequencies, allowing a cortical tagging of each sound. Electrophysiological recordings were carried out in epileptic patients with pharmacologically resistant partial epilepsy, implanted with depth electrodes in the temporal cortex. Patients were presented with the stimuli while they performed an auditory distracting task. We found that transient and steady-state evoked responses, and induced gamma oscillatory activities were enhanced in the case of onset synchrony. These effects were mainly located in the Heschl’s gyrus for steady-state responses whereas they were found in the lateral superior temporal gyrus for evoked transient responses and induced gamma oscillations. They can be related to distinct neural mechanisms such as frequency selectivity and habituation. These results in the auditory cortex provide an anatomically refined description of the neurophysiological components which might be involved in the perception of concurrent sounds.
In ecological situations, we are often confronted with a mixture of sounds produced by several simultaneously active sources, and it is crucial to be able to parse and identify these multiple sources. In fact, within this mixture, independently of the orientation of our attention, we can suddenly detect one acoustic sequence or stream that can be clearly isolated from other sounds and attributed to a source, like a bird song in the forest or a motorbike acceleration in a noisy street. This acoustic segregation relies on the ability of our auditory system to group together or to dissociate acoustic components at a given instant and over time, leading to the perception of several simultaneous auditory streams.
Many psychoacoustical studies (Bregman, 1990) have shown the contribution of various acoustic features (e.g., pitch, timbre, location, onset synchrony) in automatically segregating or grouping acoustic events. In particular, the acoustical context in which a sound is presented can strongly influence its perception, i.e., either as belonging to a pre-existing stream or constituting a new one. This has been confirmed by electrophysiological recordings in humans (Alain and Izenberg, 2003; Nager et al., 2003; Sussman et al., 1999, 2005; Takegata et al., 2005; Winkler et al., 2003). Nevertheless, few studies have investigated the neural mechanisms involved in the perceptual organization of sounds, and most of those have investigated the segregation of alternating sounds (see Micheyl et al., 2007 for a review). The present study focuses on the neural mechanisms by which the auditory system dissociates or groups overlapping auditory components, as a function of the acoustical context. These neural mechanisms could be reflected in different electrophysiological responses such as evoked transient and sustained responses, steady-state responses (elicited by amplitude-modulated sounds) and induced gamma oscillations. Indeed, different evoked responses were found for harmonic (grouped percept) and mistuned (segregated percept) stimuli, in the chinchilla inferior colliculus (Sinex et al., 2002) and in human auditory cortex (Alain et al., 2001, 2002; Dyson and Alain, 2004). Moreover, the steady-state responses can be used for tagging the electrophysiological activity corresponding to each concurrent component independently, when they have distinct amplitude modulation frequencies (Bidet-Caulet et al., 2007; Draganova et al., 2002; Lins and Picton, 1995).
Finally, induced gamma oscillatory activities (20–100 Hz) are assumed to be involved in the construction of coherent objects (Bertrand and Tallon-Baudry, 2000) and could thus play an important role in auditory grouping. It is thus hypothesized that these different electrophysiological responses could be differentially modulated according to the percept, in particular that gamma oscillations could be increased in the case of auditory grouping.
In the present electrophysiological study, we manipulated sound onset asynchrony to induce two perceptual situations, the segregation or grouping of two concurrent sounds (Bregman, 1990; Darwin et al., 1995; Turgeon et al., 2002). Stimuli were made of two partially overlapping components having different carrier frequencies and amplitude modulation frequencies (21 and 29 Hz): the 21-Hz and 29-Hz components (Figure 1). To have a similar acoustical structure across conditions, the 21-Hz component always started first (part 1), and then, at the onset of the 29-Hz component (part 2), its carrier frequency could be kept constant (pitch continuity) or shifted (pitch discontinuity). To reduce electrophysiological differences due to the acoustical properties of the sounds, stimuli were identical in the last part, in the two conditions. By behavioral testing, we assessed that, in case of pitch continuity of the 21-Hz component (i.e., onset asynchrony), the two components were perceived as two distinct streams during part 2 (2-stream condition), whereas in case of pitch discontinuity (i.e., onset synchrony), they were perceived as one complex stream (1-stream condition). In other words, in the 2-stream condition, the second sound (29-Hz component) was perceived as an additional acoustic stream superimposed on and distinct from the first one (21-Hz component), while in the 1-stream condition, the second sound merged with the first one to form a single, new complex stream. We used intracranial electrophysiological (EEG) recordings from epileptic patients with pharmacologically resistant epilepsy, implanted with multicontact depth electrodes in the temporal cortex. This constitutes a unique approach for providing a precise time-course and localization of the neurophysiological correlates of concurrent sound perception in humans.
During the electrophysiological recordings, the patients were presented with both types of stimuli and performed an auditory distracting task which required focusing their attention away from the streams. Analyzing part 2 of the stimulus, when both components are present, allowed us to compare the electrophysiological responses to acoustically identical stimuli, corresponding to distinct percepts (one or two streams) because of different history in the part 1, i.e., different acoustical contexts. We investigated different electrophysiological components (transient and sustained evoked responses, steady-state responses and gamma oscillatory activities) to have a comprehensive insight into the neural mechanisms that could be involved at distinct processing levels, in different cortical auditory areas (Heschl’s gyrus, planum temporale and polare). From the analysis of these complementary electrophysiological components, directly recorded from the human auditory cortex, we suggest that the implicit segregation or grouping of two concurrent sounds could rely on the interaction of several neurophysiological mechanisms, highly related to the acoustic context, occurring at different latencies and locations in the human auditory system.
Figure 1. Stimuli. Stimuli were composed of two sounds which could be perceived as 1 complex stream or 2 distinct streams, corresponding to the 1-stream or 2-stream condition, respectively. All stimuli were composed of two parts lasting at least 0.810 second (see details in the text). During part 2, all stimuli consisted of two components. The 21-Hz component (gray bars) was composed of two tones, separated by two octaves, and amplitude modulated at a frequency of 21 Hz. The 29-Hz component (black bar) consisted of one tone, separated by one octave from each tone of the first component, and amplitude modulated at a frequency of 29 Hz. During part 2, all stimuli were thus acoustically identical. To have a similar acoustical structure across stimulus conditions, the 21-Hz component was always present during part 1. For 2-stream stimuli, the carrier frequency of the 21-Hz component (gray bars) was maintained constant (pitch continuity) whereas for 1-stream stimuli the 21-Hz component (white bars) was shifted by ±3 semitones (pitch discontinuity). Therefore, in the case of 2-stream stimuli, the 21-Hz component started before the 29-Hz component, which corresponded to an onset asynchrony and induced auditory segregation and the perception of two distinct streams. Conversely, in the case of 1-stream stimuli, the two acoustic components started at the same time (onset of part 2) inducing a grouping of the two sounds into one single complex stream.
Patients. Five patients (4 females and 1 male, age ranging from 22 to 36 years) suffered from pharmacologically resistant partial epilepsy and were candidates for surgery. Because the location of the epileptic focus could not be identified using non-invasive methods, they were stereotactically implanted with multicontact depth probes. Electrophysiological recording is part of the brain functional evaluation that is performed routinely before epilepsy surgery in patients implanted with depth electrodes. According to the French regulations concerning invasive investigations with a direct individual benefit, patients were fully informed about the electrode implantation, stereotactic EEG and evoked potential recordings and the cortical stimulation procedures used to localize the epileptogenic and functional brain areas. All patients gave their informed consent to participate in the experiment. The signals described here were recorded away from the seizure focus. When the recording sessions took place, antiepileptic drugs administered to the patients had been either discontinued or drastically reduced for the purpose of recording seizures. No patient was administered benzodiazepines. None of the patients reported any auditory complaint.
Stimuli and task. Stimuli were composed of two sounds which could be perceived as 1 complex stream or 2 distinct streams, corresponding to the 1-stream or 2-stream condition, respectively. They were randomly presented to patients in 4 blocks of 80 trials (resulting in 160 repetitions in each condition). Patients had to perform a diverted attention task which consisted of detecting rare noise bursts superimposed on the stimuli, thus orienting their attention away from the stimulus content. They were instructed to answer as soon as they heard a noise burst, by pressing a button. The task presented no major difficulty but was demanding enough to keep the patients alert: all the patients detected all the noise bursts, except one who missed two of them. After the recording session, patients reported that they did not pay attention to the stimuli on which the noise bursts were superimposed and did not notice the presence of two types of stimuli.
All stimuli were composed of two successive parts (Figure 1 ). Part 1 lasted 0.810 second (1 second for the first patient) and the duration of part 2 was equiprobably chosen between 0.810, 0.905, 1 and 1.095 seconds (1.095, 1.190, 1.286 and 1.381 seconds for the first patient). During part 2, all stimuli are made of two components: the 21-Hz component and the 29-Hz component. The 21-Hz component was composed of two amplitude-modulated tones separated by two octaves, the carrier frequency of the lower one being equiprobably chosen between 659, 698, 740 and 784 Hz. These two tones were both amplitude modulated in phase at a frequency of 21 Hz. The 29-Hz component consisted of one tone separated by one octave from each tone of the first sound. This tone was amplitude modulated at a frequency of 29 Hz. Thus, during part 2, all stimuli were acoustically identical. To have a similar acoustical structure across stimulus conditions, the 21-Hz component was always present during part 1. For 2-stream stimuli, the carrier frequency of the 21-Hz component was maintained constant (pitch continuity) whereas for 1-stream stimuli it was shifted by ±3 semitones (pitch discontinuity). Therefore, in the case of 2-stream stimuli, the 21-Hz component started before the 29-Hz component, which corresponded to an onset asynchrony and induced auditory segregation and the perception of two distinct streams. Conversely, in the case of 1-stream stimuli, the two acoustic components started at the same time (onset of part 2) inducing a grouping of the two sounds into one single complex stream.
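The carrier-frequency relationships described above can be sketched as simple arithmetic. This is a minimal illustration, not the authors' stimulus-generation code: the function names are hypothetical, an octave is taken as a factor of 2 and a semitone as a factor of 2^(1/12) (equal temperament), and the sketch assumes that both tones of the 21-Hz component are shifted together in the 1-stream condition.

```python
# Lower carrier frequencies of the 21-Hz component, as listed in the text (Hz).
LOWER_CARRIERS_HZ = (659.0, 698.0, 740.0, 784.0)


def semitone_shift(freq_hz, semitones):
    """Shift a frequency by a number of equal-tempered semitones."""
    return freq_hz * 2.0 ** (semitones / 12.0)


def part2_carriers(lower_hz):
    """Carriers present during part 2 (identical in both conditions).

    The 21-Hz component has two tones two octaves apart; the 29-Hz
    component is a single tone one octave away from each of them.
    """
    return {
        "am21": (lower_hz, 4.0 * lower_hz),
        "am29": (2.0 * lower_hz,),
    }


def part1_carriers(lower_hz, condition, direction=1):
    """Carriers of the 21-Hz component during part 1.

    '2-stream': same carriers as in part 2 (pitch continuity).
    '1-stream': carriers shifted by +3 or -3 semitones (pitch discontinuity).
    Assumption: both tones of the component are shifted by the same amount.
    """
    low, high = part2_carriers(lower_hz)["am21"]
    if condition == "2-stream":
        return (low, high)
    if condition == "1-stream":
        shift = 3 * direction
        return (semitone_shift(low, shift), semitone_shift(high, shift))
    raise ValueError("condition must be '1-stream' or '2-stream'")


if __name__ == "__main__":
    for lower in LOWER_CARRIERS_HZ:
        print(lower, part2_carriers(lower), part1_carriers(lower, "1-stream", -1))
```

For a lower carrier of 659 Hz, for instance, the 21-Hz component would sit at 659 and 2636 Hz and the 29-Hz tone at 1318 Hz, so the three carriers are equally spaced on a logarithmic frequency axis.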
The superimposed target sounds were 150-ms band-pass filtered noise bursts (20-semitone wide starting at 784 Hz with 10 ms rise/fall time). The targets were delivered in 15% of the stimuli and randomly occurred during the stimulus, 0.2, 0.5, 1.2 or 1.5 seconds before the end of the stimulus. When a target was present in a trial, the next stimulus started between 0.7 and 1.0 second after the patient’s response, otherwise the inter-trial interval was randomized between 0.9 and 1.4 seconds.
The intensities of all tones were corrected according to their carrier frequency (Botte et al., 1989, Figure 1.2, p. 17) and then, 21-Hz and 29-Hz components were matched in intensity. Stimuli were delivered at an intensity level judged comfortable by the patient at the beginning of the experiment (about 70 dB SPL). Noise bursts were 5 dB above the stimulus intensity level. All component onsets and offsets were linearly ramped over 10 ms. In case of 1-stream stimuli, the 21-Hz component was also ramped on both sides of the pitch discontinuity.
Stimulus duration, carrier frequency and noise burst occurrence were randomized to limit habituation and predictability.
Recording and signal analysis. Intracranial recordings were performed at the Functional Neurology and Epilepsy Department (Lyon Neurological Hospital). EEG was recorded from 64 intracranial electrode contacts referenced to an intracranial contact away from the superior temporal cortex (ground electrode being at forehead). Signals were amplified, filtered (0.1–200 Hz bandwidth) and sampled at 1000 Hz (Synamps, Neuroscan Labs, Sterling, VA, USA).
The analysis was restricted to the electrodes located in the temporal cortex and its immediate vicinity. Raw data were visually inspected and trials showing epileptic spikes were discarded. Because of excessive epileptic spikes, patient #3 was excluded from evoked response analysis but kept for steady-state and gamma oscillation analysis because their frequency bands were less contaminated by the spectral content of epileptic spikes. Trials with a noise burst occurring before 1.620 seconds (about 12% of all trials) and incorrect trials were rejected from further analysis. The mean numbers of correct and non-artifacted trials were 134 and 131.6 out of 160 for 1-stream and 2-stream conditions, respectively.
ERPs were averaged during the shortest duration of each part (i.e., 0.810 second for each part), separately for the two conditions. The ERP of the two periods (parts) were corrected with respect to the same baseline defined between –150 and 0 ms before part 1 stimulus onset.
Both the steady-state periodic response, evoked by the amplitude modulation of the sound components at 21 and 29 Hz (phase-locked to the stimulus content) and the beta/gamma oscillations, in the 20–100 Hz range, induced by the stimulus (jittering in latency from trial to trial) were analyzed by means of a wavelet decomposition which provides a good compromise between time and frequency resolutions. Each single-trial signal was transformed in the time–frequency domain by convolution with complex Gaussian Morlet’s wavelets with a ratio f/σf of 10, with f being the central frequency of the wavelet and σf its standard deviation (see Tallon-Baudry and Bertrand, 1999, for details). The resulting time–frequency powers were then averaged across trials. This method led to a power estimate of both evoked and induced activities in the time–frequency (TF) domain. A baseline correction was applied on TF plots by subtracting the prestimulus power between –250 and –150 ms before stimulus onset, in each frequency band. To distinguish induced oscillatory activities from phase-locked evoked activities (i.e., transient and steady-state responses), we computed, at each point of the TF domain, the stimulus phase-locking factor (PLF) from the single-trial TF analysis (Tallon-Baudry et al., 1996). This factor ranges from 0 (uniform phase distribution, i.e., high latency jitter) to 1 (strict phase-locking to the stimulus). The Rayleigh statistic was used to test for the non-uniformity of phase distribution (Jervis et al., 1983). Therefore, with the number of trials used in this experiment, when the PLF was less than 0.17, oscillations were considered to be non-phase-locked to the stimulus.
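The quantities in this paragraph can be illustrated with a short sketch. It is not the ELAN-Pack implementation: the function names are hypothetical, the time-domain width uses the standard Gaussian relation σt = 1/(2πσf), and the Rayleigh p-value uses the first-order large-sample approximation p ≈ exp(−n·PLF²), which is an assumption made here rather than the exact procedure used in the study.

```python
import cmath
import math


def morlet_widths(f_hz, ratio=10.0):
    """Frequency and time standard deviations of a Morlet wavelet.

    ratio is f / sigma_f (10 in the text); sigma_t follows from the
    Gaussian time-frequency relation sigma_t = 1 / (2 * pi * sigma_f).
    """
    sigma_f = f_hz / ratio
    sigma_t = 1.0 / (2.0 * math.pi * sigma_f)
    return sigma_f, sigma_t


def phase_locking_factor(coeffs):
    """PLF at one time-frequency point.

    coeffs holds one complex wavelet coefficient per trial. Each is
    normalized to unit modulus so that only its phase contributes; the
    PLF is the modulus of the across-trial mean of these unit vectors.
    """
    unit = [c / abs(c) for c in coeffs if abs(c) > 0.0]
    if not unit:
        return 0.0
    return abs(sum(unit) / len(unit))


def rayleigh_p(plf, n_trials):
    """Approximate Rayleigh p-value for non-uniformity of phase.

    First-order approximation, reasonable for large n_trials only.
    """
    return math.exp(-n_trials * plf ** 2)


if __name__ == "__main__":
    print(morlet_widths(21.0))
    # Hypothetical trials: identical phase, varying amplitude (strict locking).
    locked = [amp * cmath.exp(1j * 0.3) for amp in (0.5, 1.0, 2.0, 4.0)]
    print(phase_locking_factor(locked))
    print(rayleigh_p(0.17, 130))
```

Under this approximation, a PLF of 0.17 with roughly 130 trials corresponds to a p-value between 0.02 and 0.03, which is in line with the use of 0.17 as the cut-off between phase-locked and non-phase-locked activity.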
Statistical analysis. The statistical analysis in the two conditions, during both parts, focused on different electrophysiological components: the transient evoked responses to the onsets of parts 1 and 2, the slow wave responses elicited during the whole duration of the stimulus (sustained responses), the steady-state activities at 21, 29 and 42 Hz (first harmonic of the 21 Hz), and the induced gamma oscillations. As the data were not normally distributed, only non-parametric tests (Wilcoxon or Mann–Whitney) were used, with a number of trials greater than 100.
To identify, in each patient, the electrode contacts where a transient or sustained response was emerging, we computed a time varying Wilcoxon test from the single trials. It was applied on the mean amplitude of successive 20-ms windows between 0 and 200 ms and successive 100-ms windows between 200 and 800 ms after the onset of each part, compared to a prestimulus baseline (defined between –100 and 0 ms before part 1 onset). For contacts that showed a significant response, condition differences were estimated by a Mann–Whitney test applied to the same windows as described for the response identification test.
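The windowing scheme for the evoked responses can be made concrete as follows. This is only a sketch: the function names and the `onset_index` argument are hypothetical, and the 1000 Hz default is the sampling rate reported in the recording section. The statistical tests themselves are not reproduced here.

```python
def response_windows_ms():
    """Analysis windows after the onset of a part, in ms.

    Ten successive 20-ms windows cover 0-200 ms and six successive
    100-ms windows cover 200-800 ms, as described in the text.
    """
    early = [(start, start + 20) for start in range(0, 200, 20)]
    late = [(start, start + 100) for start in range(200, 800, 100)]
    return early + late


def window_mean(samples, onset_index, window_ms, fs_hz=1000):
    """Mean amplitude of one single-trial signal inside one window.

    samples: sequence of amplitudes for one trial and one contact.
    onset_index: sample index of the part onset (hypothetical argument).
    """
    start = onset_index + window_ms[0] * fs_hz // 1000
    stop = onset_index + window_ms[1] * fs_hz // 1000
    segment = samples[start:stop]
    return sum(segment) / len(segment)


if __name__ == "__main__":
    windows = response_windows_ms()
    print(len(windows), windows[0], windows[-1])
```

Each contact is therefore tested over 16 windows per part, a count that matters when the significance threshold is corrected for multiple comparisons.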
The emergence of steady-state activities was assessed by a Wilcoxon test on the mean power of each frequency (21, 29 and 42 Hz) over successive 100-ms windows between 0 and 800 ms after the onset of each part, compared to the prestimulus baseline power (between –250 and –150 ms before part 1 onset) of each frequency (note that the prestimulus baseline is shifted away from 0 ms because wavelet analysis tends to stretch out the early poststimulus low frequency components). For contacts that showed a significant steady-state response, condition differences were estimated by a Mann–Whitney test applied to the same windows as described for the response identification test. As the carrier frequencies of the 21-Hz component were the same during parts 1 and 2 of the 2-stream condition and part 2 of the 1-stream condition, we compared, with a Mann–Whitney test, the mean power between 200 and 800 ms of part 2 for each condition to the mean power between 200 and 800 ms of part 1 in the 2-stream condition.
The emergence of oscillatory beta/gamma activities was detected in the time–frequency domain with a Wilcoxon test on the mean power of 100 ms × 10 Hz windows (from 0 to 800 ms and from 20 to 100 Hz) compared to the prestimulus power (between –250 and –150 ms before part 1 onset) in the same frequency band. To circumscribe the components which are not phase-locked to the stimulus, i.e., induced oscillations, the combination of two criteria in the time–frequency domain was used: an emergence assessed by the Wilcoxon test and an absence of phase-locking to the stimulus assessed by the Rayleigh test (PLF < 0.17), as defined above. The 1- and 2-stream conditions were then compared with a Mann–Whitney test on the mean power of 50 ms × 6 Hz windows where induced gamma activities were previously found to emerge.
For each component, only those effects are discussed that met Bonferroni-corrected p-value criteria. In each patient, the probability threshold of 0.05 was thus divided by the number of tested windows over all investigated electrode contacts. The statistical analysis will mainly focus on the responses observed in part 2, during which the acoustical content of the stimuli was strictly the same but perception differed.
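As a rough illustration of this correction, the per-patient threshold is the nominal 0.05 divided by the total number of tests. The function names and the contact count below are hypothetical; the actual number of contacts tested in each patient is not given in this section.

```python
def bonferroni_threshold(alpha, n_windows, n_contacts):
    """Per-test threshold after Bonferroni correction.

    The number of tests is the number of windows tested on each contact
    multiplied by the number of investigated contacts.
    """
    n_tests = n_windows * n_contacts
    if n_tests <= 0:
        raise ValueError("at least one test is required")
    return alpha / n_tests


def significant(p_values, alpha, n_windows, n_contacts):
    """Flag the p-values that pass the corrected threshold."""
    threshold = bonferroni_threshold(alpha, n_windows, n_contacts)
    return [p < threshold for p in p_values]


if __name__ == "__main__":
    # Hypothetical example: 16 windows per contact, 20 contacts.
    print(bonferroni_threshold(0.05, 16, 20))
    print(significant([1e-5, 1e-3], 0.05, 16, 20))
```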
All signal analysis was performed with the ELAN-Pack software developed at INSERM U821.
Anatomical registration. Electrode contacts were 2 mm long and spaced every 3.5 mm (center-to-center). Depth probes (diameter 0.8 mm) with 10 or 15 contacts each were inserted perpendicularly to the sagittal plane using Talairach’s stereotactic grid (Talairach and Tournoux, 1988). Contact numbers increase from medial to lateral along an electrode track. Electrode locations were measured on X-ray images obtained in the stereotactic frame. The depth of penetration of each contact was measured on the frontal X-ray image from the tip of the electrode to the midline, which was visualized angiographically by the sagittal sinus. The co-registration of the lateral X-ray image and a midsagittal MRI scan, both having the same scale of 1, allowed us to measure the electrode coordinates in the individual Talairach’s space defined by the median sagittal plane, the AC–PC (anterior commissure – posterior commissure) horizontal plane and the vertical AC frontal plane, these anatomical landmarks being identified on the 3D MRI scans. This procedure led to the superposition of each electrode contact onto the patients’ structural MRIs. The accuracy of the registration procedure was 2 mm, estimated on another patient’s MR images obtained just after electrode explantation and in which electrode tracks were still visible.
Finally, electrode contacts and experimental effects were visualized on individual 3D renderings of isolated temporal cortices to facilitate the identification of auditory brain structures and the comparison of activated sites across subjects. The cortical surface was individually segmented by using FreeSurfer software (http://surfer.nmr.mgh.harvard.edu).
In a group of healthy subjects, a psychophysical experiment was used to ensure that the two types of stimuli implicitly induced two distinct perceptions. We considered that if perceptions during part 2 were different (whereas stimuli are acoustically identical during this part), behavioral performances in a stream-irrelevant auditory task could be differentially influenced. We chose to compare the detection thresholds of offset asynchrony for 1-stream and 2-stream stimuli.
Seventeen paid subjects participated in the behavioral study. All were free of neurological diseases and had normal hearing. Written informed consent was obtained from each subject.
A constant Yes/No procedure tracking 50% correct responses was used to define the detection threshold of offset asynchrony. The 1-stream and 2-stream stimuli described above were randomly presented to the subjects. In this experiment, the 29-Hz component could end either at the same time as the 21-Hz component (as in the electrophysiological experiment) or slightly before (time difference: DT). Subjects had to indicate whether the two components were ending at the same time, or not, by pressing, with the same hand, a left or right button, respectively.
Only those subjects who had more than 65% of correct responses for both DT = 0 ms and their maximal DT value were kept in the analysis. Only 10 subjects (6 men, age ranging from 22 to 34 years) fulfilled these criteria. Each subject’s psychometric function was fitted by a sigmoid function and the offset asynchrony detection threshold was defined as the DT value leading to 50% correct responses. The detection thresholds obtained in each condition were compared with a Wilcoxon test.
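The threshold estimation can be sketched as below. Several choices here are assumptions, not details from the study: the sigmoid is taken to be a two-parameter logistic function, the fit is a plain least-squares grid search (chosen only to keep the example dependency-free), and the data points are synthetic. With this parameterization the 50% point of the curve is simply its midpoint `x0`.

```python
import math


def logistic(dt_ms, x0, slope):
    """Two-parameter logistic psychometric function (assumed form)."""
    return 1.0 / (1.0 + math.exp(-(dt_ms - x0) / slope))


def fit_threshold(dts, proportions, x0_grid, slope_grid):
    """Least-squares grid search for the logistic midpoint and slope.

    Returns (x0, slope); x0 is the DT value at which the fitted curve
    crosses 50%, i.e. the detection threshold in this sketch.
    """
    best = None
    for x0 in x0_grid:
        for slope in slope_grid:
            sse = sum(
                (logistic(dt, x0, slope) - p) ** 2
                for dt, p in zip(dts, proportions)
            )
            if best is None or sse < best[0]:
                best = (sse, x0, slope)
    return best[1], best[2]


if __name__ == "__main__":
    # Synthetic, noise-free data generated from a known curve.
    dts = [0, 20, 40, 60, 80, 100, 120]
    proportions = [logistic(dt, 60, 15) for dt in dts]
    print(fit_threshold(dts, proportions, range(0, 121), range(5, 41)))
```

In practice a maximum-likelihood fit on the raw response counts would usually be preferred over a least-squares grid search; the sketch only shows how a threshold is read off a fitted sigmoid.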
Behavioral Assessment of the Stream Perception Induced in the Two Stimuli
The aim of the psychophysical experiment was to ensure, by an indirect measure, that the two kinds of stimuli automatically induced two different perceptions (one stream or two streams) even when the subject is paying attention to aspects of the stimulus (e.g., offset asynchrony) that are not related to stream segregation. We chose to compare the detection thresholds of offset asynchrony for 1-stream and 2-stream stimuli. We found that the mean detection thresholds of offset asynchrony were significantly different in 1-stream and 2-stream conditions (52.1 and 71.5 ms, respectively, p = 0.017). Subjects were better at detecting offset asynchrony in the 1-stream rather than in the 2-stream condition.
In addition to the psychophysical experiment, we played the sounds to the subjects and asked them whether they heard one or two sounds during the second part. They always responded ‘one’ and ‘two’ for 1-stream and 2-stream conditions, respectively.
Moreover, four of the five patients also performed an active stream segregation task with the same 2-stream stimuli (Bidet-Caulet et al., 2007). They had to detect a spatial shift of the 21-Hz component, which was impossible to perform without segregating the two streams. Their hit rate ranged between 85 and 99%.
Thus we have shown, with these stimuli, that onset asynchrony explicitly induces stream segregation in patients and implicitly induces auditory segregation in normal subjects, as it has already been shown with other types of sounds (Bregman, 1990; Darwin et al., 1995; Turgeon et al., 2002; Turgeon et al., 2005). The electrophysiological experiment was performed on alert patients engaged in detecting stream-irrelevant sounds, and therefore, the observed brain responses could be related to implicit concurrent sound segregation/fusion.
Three patients were implanted in the right hemisphere only, and two in both hemispheres. In all implanted hemispheres, at least one electrode track was located in the superior temporal plane. Electrodes H or H′ (for right or left hemisphere, respectively) were positioned posteriorly, passing through the Heschl’s gyrus (HG), the planum temporale (PT) and the superior temporal gyrus (STG), and electrodes T, T′ and W were positioned anteriorly, passing through the HG and the anterior PT or the planum polare (PP). Electrode A, A′, B, B′, and C were penetrating through the middle temporal gyrus (MTG). Electrode N was located just above the superior temporal plane, in the parietal operculum.
Even though intracranial recordings in epileptic patients provide a sparse spatial sampling of the auditory cortex, we could access not only primary auditory areas (posteromedial part of the HG) but also posterior and anterior secondary auditory regions.
Electrophysiological Components Modulated as a Function of Perception
Distinct electrophysiological components were identified in different parts of the auditory cortex. Figure 2 illustrates all these components, in the 1-stream condition, on the 3D rendering of patient #4’s temporal cortex. The 21, 29 and 42 Hz steady-state evoked activities were found in the posteromedial part of the HG and more anteriorly in the HG. Transient and sustained evoked responses were observed at most of the electrode contacts located in the superior temporal plane: in the HG, in the posterior part of the STG and in the planum temporale. Induced gamma oscillations, not phase-locked to stimulus onset, were found in the anterior part of the HG and in the lateral STG.
Figure 2. Illustration of the typical electrophysiological responses and their location in the 3D rendering of the temporal cortex (1-stream condition in patient #4). Evoked responses (obtained by single-trial averaging) are plotted in green. Transient and sustained evoked responses were observed at most of the electrode contacts located in the superior temporal plane: in the Heschl’s gyrus, HG (electrode contacts H7, T5 and T6), in the superior temporal gyrus, STG (T9) and in the planum temporale, PT (H12). The time–frequency plot of these activities (time–frequency power averaged after a wavelet-based analysis on each single trial) permits a good visualization of both evoked steady-state and gamma induced oscillations. Twenty-one hertz, 29 and 42 Hz steady-state evoked activities were found in the posteromedial part of the HG (H7) and more anteriorly in the HG (T6). Induced gamma oscillations were found in the anterior part of the HG (T5) and on the lateral STG (T9). The time profiles of induced activities at 80 Hz are depicted in red. When evoked and induced activities were present at the same place, in the same frequency band (see T5 as an example), the phase-locking factor indicated which activities were evoked (phase-locked to the stimulus: PLF > 0.17). The time profiles of the phase-locking factor at 80 Hz are depicted in blue. All these responses are baseline corrected with respect to the prestimulus period preceding part 1 onset.
All these components were present during both parts 1 and 2. Since part 1 is acoustically different in the 1- and 2-stream conditions, we concentrated on the electrophysiological activities of part 2, during which, inputs were acoustically identical whereas perceptions, induced by the preceding part, were different. All 1- vs. 2-stream effects on the different electrophysiological components following part 2 onset will be presented, in all patients, on the 3D rendering of their individual temporal cortex. To illustrate the time-course of the responses, typical waveforms will be plotted from electrode contacts showing such effects.
Transient and Sustained Evoked Responses (4 Patients)
Most contacts of electrodes H, H′, T, T′ and W presented transient evoked responses. Three main transient waves were found after the onset of part 2. A first one was maximal between 50 and 80 ms, a second one around 100 ms (between 80 and 150 ms) and a third one starting around 150 ms. The two first waves were significantly more prominent in the 1-stream condition, between 60 and 80 ms and between 80 and 150 ms, on several contacts of the four patients kept for this analysis. The third one presented a steeper slope in the 1-stream condition, between 120 and 200 ms. These effects were mainly located in the anterolateral HG and in the lateral part of the STG (Figure 3 ).
Figure 3. Modulations of evoked transient and sustained responses on the 3D rendering of individual temporal cortices. After the onset of part 2 (0 ms), three main transient waves were found more prominent in the 1-stream condition, on several contacts of the four patients kept for this analysis. A first one was modulated between 60 and 80 ms (orange), a second one between 80 and 160 ms (yellow) and a third one between 120 and 200 ms (green). These effects were mainly located on the lateral STG and their time-courses are depicted. In patient #4, effects on the sustained response were found between 200 and 700 ms: mean amplitude was greater in the 2-stream condition (red circles). Yellow-green hatchings correspond to electrode contacts where both second and third effects are present. These evoked responses are baseline corrected with respect to the (−100, 0 ms) period preceding part 1. Each number corresponds to a patient, patients #2 and 5 being implanted in both hemispheres.
At most of the electrode contacts presenting transient responses, a sustained slow wave was present until the end of the stimulus, with generally no condition effect. Effects were found only in patient #4, between 200 and 700 ms after the onset of part 2. In this case, the mean amplitude was significantly greater in the 2-stream condition. It should be noted that in the 2-stream condition very small transient evoked responses were generally elicited so that the sustained wave could begin earlier than in the 1-stream condition.
Steady-State Evoked Activities
Steady-state evoked responses emerged at 21, 29 and 42 Hz on several contacts for all patients (Figure 4). Activities at 21, 29 and 42 Hz could be observed at the same electrode contacts but, in most cases, their respective maxima were at different adjacent contacts. A first focus was found bilaterally in the posteromedial part of the HG and a second one, in the right hemisphere only, in a more anterior part of the HG. These activities emerged over several hundred ms and, in some cases, over the whole duration of part 2.
Figure 4. Emergence and modulations of evoked steady-state activities on the 3D rendering of individual temporal cortices. Emergence after the onset of part 2 (0 ms) is depicted on the temporal cortices, in green for 21 Hz and yellow for 29 Hz steady-state activities. A first focus of steady-state activities was bilateral and located in the posterior part of the HG and a second one, in the right hemisphere only, in a more anterior part of the HG. Twenty-one hertz activities were found to be enhanced in the 1-stream condition, in four patients (‘1-stream > 2-stream’, red circles). Time profiles at 21 Hz of these effects are plotted (significant differences are indicated by light-red shaded areas). The power of these oscillatory activities is baseline corrected with respect to the (−250, −150 ms) period preceding part 1. The emergence and effects of the 42 Hz steady-state activities presented topographies quite similar to those of the 21 Hz steady-state activities. Each number corresponds to a patient, patients #2 and 5 being implanted in both hemispheres.
Twenty-one hertz and 42 Hz activities were significantly greater in the 1-stream condition, in four and three patients, respectively. These effects were mainly concentrated between 400 and 700 ms for 21 Hz activities, and between 300 and 700 ms for 42 Hz activities. No condition effect was found for the 29 Hz steady-state activity.
The evolution of the 21 Hz power over the whole stimulus duration was also analyzed. When comparing the 21 Hz power between part 2 of each condition and part 1 of the 2-stream condition (comparison of 200–800 ms time-windows after part 1 and 2 onset), we found that, among the electrode contacts presenting a 21 Hz steady-state activity, 67 and 81% of them presented a greater activity during part 1, and only 5 and 2% during part 2, in the 1-stream and 2-stream conditions, respectively.
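The window comparison described above can be illustrated with a minimal Python sketch. It is not the authors' analysis pipeline: the signal is synthetic, the sampling rate, part duration and amplitudes are hypothetical values chosen for illustration, and the power at the tagging frequency is estimated by a simple projection onto a complex sinusoid rather than by the time–frequency method used in the study. The helper name `power_at` is likewise an assumption.

```python
import numpy as np

def power_at(signal, fs, freq, t_start, t_end):
    """Estimate power at `freq` (Hz) in the window [t_start, t_end) seconds
    by projecting the windowed signal onto a complex sinusoid."""
    i0, i1 = int(round(t_start * fs)), int(round(t_end * fs))
    seg = signal[i0:i1]
    t = np.arange(seg.size) / fs
    # Factor 2/N scales the projection to the amplitude of a real sinusoid.
    coef = np.sum(seg * np.exp(-2j * np.pi * freq * t)) * 2.0 / seg.size
    return np.abs(coef) ** 2

# Synthetic single-contact signal (all values hypothetical).
fs = 1000.0          # assumed sampling rate, Hz
part_len = 1.0       # assumed duration of each part, s
t = np.arange(int(2 * part_len * fs)) / fs
# 21 Hz activity that is weaker in part 2 than in part 1.
amp = np.where(t < part_len, 1.0, 0.5)
sig = amp * np.sin(2 * np.pi * 21.0 * t)

# Same 200-800 ms window, taken after the onset of part 1 and of part 2.
p_part1 = power_at(sig, fs, 21.0, 0.2, 0.8)
p_part2 = power_at(sig, fs, 21.0, part_len + 0.2, part_len + 0.8)
print(f"21 Hz power, part 1: {p_part1:.3f}; part 2: {p_part2:.3f}")
```

Repeating such a comparison at every contact showing a 21 Hz steady-state activity, with an appropriate statistical test in place of the raw point estimates used here, would yield proportions of the kind reported above.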
Induced Gamma Oscillations
Induced gamma oscillations emerged at a few contacts for each patient (Figure 5). They could be observed quite laterally in the HG and the STG. These activities most frequently emerged between 100 and 350 ms in the 50–90 Hz frequency band (Figure 6). They were significantly more prominent in the 1-stream condition, in four patients. These effects were quite distributed in the time–frequency (TF) domain (peaking between 150 and 300 ms) and lasted 100 ms on average.
Figure 5. Emergence and modulations of induced gamma oscillations on the 3D rendering of individual temporal cortices. Emergence of induced gamma oscillations after the onset of part 2 (0 ms) is depicted in green. Induced gamma oscillations were emerging quite laterally in the HG and STG. They were found more prominent in the 1-stream condition, in four patients (‘1-stream > 2-stream’, red circles). At each electrode contact, significant effects were found in a specific frequency band. The gamma power time profiles of the corresponding frequency bands are plotted (significant differences are indicated by light-red shaded areas). The power of these oscillatory activities is baseline corrected with respect to the (−250, −150 ms) period preceding part 1. Each number corresponds to a patient, patients #2 and 5 being implanted in both hemispheres.
Figure 6. Emergence of induced gamma oscillations after the onset of part 2 (0 ms). Representation, in the time–frequency domain, of the mean number of electrode contacts across patients which presented significantly emerging induced gamma oscillations. By means of a time–frequency criterion based on the stimulus phase-locking factor (PLF), only non-phase-locked induced activities are represented here. These activities were most frequently present between 100 and 350 ms in the 50–90 Hz frequency band.
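The distinction between phase-locked (evoked) and non-phase-locked (induced) activity drawn with the PLF can be sketched as follows. This is a minimal illustration under stated assumptions, not the criterion actually applied in the study, whose details are not given here. The PLF is computed in its usual form, as the modulus of the across-trial mean of unit-normalized complex wavelet coefficients; the function names, the Morlet wavelet width (`n_cycles=7.0`), the 60 Hz test frequency and the synthetic trials are all hypothetical choices.

```python
import numpy as np

def morlet_coeffs(trials, fs, freq, n_cycles=7.0):
    """Complex Morlet coefficients at one frequency for each trial.
    trials: array of shape (n_trials, n_samples)."""
    sigma_t = n_cycles / (2.0 * np.pi * freq)      # temporal std of the wavelet
    half = int(np.ceil(3 * sigma_t * fs))
    tw = np.arange(-half, half + 1) / fs
    wavelet = np.exp(2j * np.pi * freq * tw) * np.exp(-tw**2 / (2 * sigma_t**2))
    wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))  # unit energy
    return np.array([np.convolve(tr, wavelet, mode="same") for tr in trials])

def phase_locking_factor(coeffs):
    """Modulus of the across-trial mean of unit-normalized coefficients.
    Values near 1 indicate phase locking; values near 0, random phase."""
    unit = coeffs / np.maximum(np.abs(coeffs), 1e-12)
    return np.abs(unit.mean(axis=0))

# Synthetic trials (all values hypothetical).
rng = np.random.default_rng(0)
fs, n_trials, n_samples, f_gamma = 1000.0, 100, 1000, 60.0
t = np.arange(n_samples) / fs

# Evoked: same phase on every trial. Induced: random phase on every trial.
evoked = np.sin(2 * np.pi * f_gamma * t)[None, :] \
    + 0.5 * rng.standard_normal((n_trials, n_samples))
phases = rng.uniform(0, 2 * np.pi, size=(n_trials, 1))
induced = np.sin(2 * np.pi * f_gamma * t[None, :] + phases) \
    + 0.5 * rng.standard_normal((n_trials, n_samples))

plf_evoked = phase_locking_factor(morlet_coeffs(evoked, fs, f_gamma))
plf_induced = phase_locking_factor(morlet_coeffs(induced, fs, f_gamma))
print(f"PLF mid-trial, evoked: {plf_evoked[500]:.2f}; induced: {plf_induced[500]:.2f}")
```

In this toy case both sets of trials carry the same single-trial 60 Hz power, but only the evoked set has a high PLF. A criterion that discards time–frequency points with a high PLF therefore retains induced activity only.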
Topographies of the Different Electrophysiological Activities
The effects found for each component were combined on the individual 3D cortical renderings (Figure 7) to compare their topographies. Effects on induced gamma oscillations and transient evoked responses were mainly found in the lateral STG with a tendency to overlap. Effects on steady-state activities were more medial in the HG and had little overlap with other activities.
Figure 7. Topographies of the different electrophysiological activities on the 3D rendering of individual temporal cortices. The effects ‘1-stream > 2-stream’ after the onset of part 2 are depicted in yellow for transient evoked responses, green for steady-state activities and red for induced gamma oscillations. Effects for induced gamma oscillations and transient evoked responses were mainly found in the lateral STG with a tendency to overlap. Effects for steady-state activities were more medial in the HG and had little overlap with other activities. Each number corresponds to a patient, patients #2 and 5 being implanted in both hemispheres.
In the present experiment, we manipulated the onsets of two concurrent sounds to generate different percepts (segregation or fusion). We found that transient and steady-state evoked responses, and induced gamma oscillations, directly recorded in the human superior temporal cortex, are more pronounced in the case of onset synchrony of the two concurrent sounds than in the case of onset asynchrony. These findings are noteworthy because they provide both a precise time-course and a detailed localization of the neurophysiological correlates of concurrent sound perception in humans. Indeed, investigating epileptic patients with intracranial recordings represents a rare opportunity to combine high spatio-temporal resolution of brain activities with well-controlled behavioral measures. The earliest effect was found around 60 ms on the transient evoked responses elicited in secondary auditory areas (posterior lateral STG). This effect subsequently spread over the PT and the lateral STG, until 200 ms. Induced gamma oscillations were then modulated in the same or nearby regions, until 300 ms. Steady-state evoked responses were in turn affected for several hundred ms in the primary auditory cortex and in the anterolateral part of the HG, in the right hemisphere.
Spatio-Temporal Characterization of the Auditory Responses
Three transient evoked responses were found significantly enhanced in the 1-stream condition. The first wave, between 50 and 80 ms, was more pronounced in the lateral HG, in one patient only. The two later waves around 100 and 150–200 ms were more pronounced at two foci, one in the most anterior and lateral part of the PT and one more posteriorly. These effects are thus mainly located in associative areas, and not in the primary auditory cortex (PAC), located in the posteromedial part of the HG (Liegeois-Chauvel et al., 1991; Rivier and Clarke, 1997). The topographical distribution of those waves is similar to that found for tone bursts in previous intracranial studies (Howard et al., 2000; Liegeois-Chauvel et al., 1994).
The 21, 29 and 42 Hz steady-state activities were found bilaterally in the posteromedial part of the HG, corresponding to the PAC, and in the right hemisphere only, in the anterolateral part of the HG, considered as a secondary auditory area. The origin of steady-state activities in the PAC is consistent with previous studies (Gutschalk et al., 1999; Liegeois-Chauvel et al., 2004). Moreover, Gutschalk et al. (1999) found an additional generator of 40 Hz steady-state activity in the lateral HG, bilaterally, and Liegeois-Chauvel et al. (2004) found generators in different auditory associative areas depending on the frequency of the amplitude modulation.
Induced gamma oscillations emerged in the lateral HG and STG, in the 50–90 Hz frequency band, between 100 and 350 ms after the onset of part 2. With electrocorticographic (ECoG) recordings, similar activities were found in primary and secondary auditory areas in rats (Franowicz and Barth, 1995) and anesthetized monkeys (Brosch et al., 2002), and in the human STG (Crone et al., 2001; Edwards et al., 2005). The time–frequency characteristics of the observed induced gamma are consistent with previous MEG findings (Pantev, 1995). To our knowledge, the present study constitutes the first attempt to precisely localize these gamma oscillations in the human superior temporal plane, since ECoG does not allow the contributions of different auditory regions (HG, PT, PP) to be disentangled. Our results show that induced gamma oscillations can be elicited in several non-primary auditory areas.
Role of the Frequency Selectivity of Auditory Areas
In the present study, we manipulated the acoustical context during part 1 (sound onset asynchrony/synchrony) to induce the segregation/fusion of the two concurrent sounds during part 2. The two kinds of stimuli were thus acoustically different during part 1 only. For the 2-stream stimuli, the carrier frequency of the 21-Hz component was maintained throughout parts 1 and 2, whereas, for the 1-stream stimuli, it presented a discontinuity at part 2 onset. Therefore, the only acoustic difference between the two conditions was the pitch continuity or discontinuity of the 21-Hz component between parts 1 and 2.
As several human auditory areas, including the lateral HG and the PT, are known to be tonotopically organized (Formisano et al., 2003; Howard et al., 1996; Pantev et al., 1995; Romani et al., 1982; Talavage et al., 2004), a pitch discontinuity most likely activates new neural populations contributing to additional electrophysiological components. Thus, in the 1-stream condition, the enhancement of transient evoked responses may be due to an additional ON-response induced by the pitch discontinuity. Although an OFF-response to the offset of the 21-Hz component of part 1 cannot be ruled out, its contribution is probably minor (Makela et al., 1988). Even though this effect is found in the lateral STG, a contribution, through parallel thalamocortical pathways (Kaas et al., 1999), of subcortical relay nuclei, also known to be tonotopically organized, may also be considered. Because of the relatively limited signal-to-noise ratio, we cannot rule out the existence of an effect on earlier components in the PAC.
As for the transient evoked responses, the enhancement of induced gamma oscillations in the 1-stream condition could be due to the frequency selectivity of neurons in the lateral STG. However, although in monkeys the occurrence of induced gamma oscillations has been found to be related to frequency-selective neurons in the primary auditory area and in the caudomedial field, probably homologous to the human medial PT (Brosch et al., 2002), no such frequency selectivity of induced oscillations has been reported in the lateral STG.
Therefore, the enhancement of transient evoked responses and induced gamma oscillations, in the 1-stream condition, could be related to the frequency selectivity of the tonotopically organized auditory cortical fields. Transient evoked responses and induced gamma oscillations could reflect the encoding of low-level information, like sound changes (e.g., sound onset and pitch step), and also the simultaneity of these sound changes.
The Perceptual Binding Hypothesis in the Context of Concurrent Sound Perception
An alternative interpretation of the induced gamma oscillation enhancement could be based on the ‘perceptual binding hypothesis’ (von der Malsburg and Schneider, 1986). This hypothesis proposes that induced gamma oscillations could correspond to the neural mechanism by which the different areas, involved in the processing of distinct components of the same object, could interact together to lead to the construction of a coherent object (Tallon-Baudry and Bertrand, 1999). Our results are consistent with this hypothesis, since induced gamma oscillations are more pronounced when the two sounds are grouped together into one complex stream.
This grouping process seems to involve high-frequency gamma oscillations (i.e., above 50 Hz). This observation is consistent with recent MEG findings in the visual modality (Vidal et al., 2006) showing that induced high gamma oscillations can be modulated by the grouping properties of visual stimuli, high gamma being larger in a one-group condition than in two-group conditions irrespective of attention orientation. Therefore, as in the visual system, high-frequency gamma oscillations can be automatically induced in the auditory cortex by an acoustic event, and seem to be modulated in amplitude according to the number of perceptual groups formed, even when attention is diverted.
Habituation of Steady-State Responses
The 21 and 42 Hz steady-state activities were found significantly greater in the 1-stream than in the 2-stream condition, between 300 and 700 ms during part 2. In fact, this effect does not correspond to an enhancement of those activities in the 1-stream condition, but rather to a greater reduction in the 2-stream condition. Indeed, the power of steady-state activities tended to decrease with repetition, as observed from the onset to the end of the stimulus in both conditions. This diminution is most likely related to habituation mechanisms (Thompson and Spencer, 1966). The reason why these activities were found less reduced in the 1-stream condition (whereas the acoustic inputs were identical) is certainly the change in pitch of the 21-Hz component. Since steady-state activities have been shown to be generated in the PAC and the anterolateral part of the HG, known to be tonotopically organized regions (Pantev et al., 1996; Romani et al., 1982), they should be sensitive to pitch changes and to frequency-selective habituation mechanisms. Thus, in the 1-stream condition, a habituation process developed during part 1, leading to a progressive reduction of the 21 Hz activity. This decrease was interrupted by the pitch discontinuity, which elicited an enhanced 21 Hz activity, selective for the new pitch of the 21-Hz component. In the 2-stream condition, the habituation process also developed and produced a diminution of the continuous 21 Hz activity, progressively until the end of the stimulus. Thus the 21 Hz response is more reduced in the 2-stream than in the 1-stream condition, whereas the 29-Hz response is similar.
Therefore, effects on steady-state responses can be explained by the combination of two mechanisms, frequency selectivity and habituation of the auditory cortical fields. By modulating the weight of the 21-Hz response, these mechanisms lead to different amplitude ratios between 29 and 21 Hz activities in the two conditions. This ratio is greater in the 2-stream than in the 1-stream condition. In the case of pitch continuity (onset asynchrony), the 21-Hz response being highly reduced, the 29-Hz response becomes relatively more important. This could contribute to the increased saliency of the new-coming 29-Hz component, leading to the segregation into two streams. Conversely, in the case of pitch discontinuity (onset synchrony), the 21-Hz response being weakly reduced, the 29-Hz component tends to merge into the acoustic mixture and the two components are grouped into one complex stream.
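The reasoning above can be made concrete with a toy model, given here purely as an illustration and not as a fit to the recorded data. It assumes that the response to each amplitude-modulated component decays exponentially with the time its carrier has been continuously present (habituation), and that a change of carrier recruits a fresh, non-habituated neural population (frequency selectivity). The decay law, the function names and all parameter values (`a0`, `a_inf`, `tau_s`, `part1_s`) are hypothetical.

```python
import math

def habituated_amplitude(exposure_s, a0=1.0, a_inf=0.4, tau_s=0.8):
    """Toy habituation law: amplitude decays from a0 toward a_inf
    with time constant tau_s (all values hypothetical)."""
    return a_inf + (a0 - a_inf) * math.exp(-exposure_s / tau_s)

def ratio_29_over_21(condition, t_in_part2_s, part1_s=1.0):
    """Ratio of 29 Hz to 21 Hz response amplitude at a given time in part 2."""
    # The 29-Hz component starts at part 2 onset in both conditions.
    a29 = habituated_amplitude(t_in_part2_s)
    if condition == "2-stream":
        # Carrier of the 21-Hz component is continuous: habituation
        # accumulated during part 1 carries over into part 2.
        exposure_21 = part1_s + t_in_part2_s
    elif condition == "1-stream":
        # Pitch discontinuity at part 2 onset: a new frequency-selective
        # population responds, so habituation restarts from zero.
        exposure_21 = t_in_part2_s
    else:
        raise ValueError(f"unknown condition: {condition}")
    return a29 / habituated_amplitude(exposure_21)

ratio_1_stream = ratio_29_over_21("1-stream", 0.5)
ratio_2_stream = ratio_29_over_21("2-stream", 0.5)
print(f"29/21 ratio, 1-stream: {ratio_1_stream:.2f}; 2-stream: {ratio_2_stream:.2f}")
```

Under these assumptions the ratio is larger in the 2-stream condition for any positive part 1 duration, which reproduces the direction of the effect described above. The equal starting amplitudes and the shared time constant are simplifications and carry no empirical weight.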
Relationship with Electrophysiological Studies of Stream Segregation
Many studies have explored sequential stream segregation, but very few have investigated the neural mechanisms involved in the perception of sounds that overlap in time. With the exception of binaural cues, studies of concurrent sound segregation have focused on the mistuning of a component from a harmonic complex. Our results are partly consistent with previous human findings (Alain et al., 2001, 2002; Dyson and Alain, 2004) showing that different electrophysiological waves originating in the auditory cortex (between 30 and 200 ms) were modulated according to the stimulus type, tuned or mistuned. The authors further proposed that their peak enhancement at 180 ms in the case of mistuned stimuli could be due to an additional component, the ‘object-related-negativity’, associated with the segregation of simultaneous auditory objects. This effect could also be explained, as in the present experiment, by low-level processing of the acoustic contents which differed between tuned and mistuned stimuli.
In the present experiment, the electrophysiological effects can be directly related to acoustic differences between the two conditions (onset asynchrony or synchrony, i.e., pitch continuity or discontinuity), probably involving frequency selectivity and habituation mechanisms. In this concurrent sound context, the manipulation of onset synchrony induces the perception of one or two streams. The observed differences in the electrophysiological responses (due to acoustic differences between the two conditions) could thus participate in the construction of one or two stream percepts.
Therefore, the present study suggests that frequency selectivity and habituation mechanisms could be involved in the grouping or segregation of concurrent sounds. This is highly consistent with recent findings in the macaque PAC during sequential stream segregation (Micheyl et al., 2005). It has also been reported that sequential stream segregation could be based on low-level mechanisms, such as frequency selectivity and selective adaptation or suppression, in songbirds (Bee and Klump, 2004), in mustached bats (Kanwal et al., 2003), in monkeys (Fishman et al., 2001, 2004) and in humans (Gutschalk et al., 2005; Snyder et al., 2006).
From the present neurophysiological observations, a possible interpretation of the dynamics of the neural interactions underlying concurrent sound perception can be proposed. Firstly, as can be observed on transient evoked responses, basic acoustic features are encoded in frequency-selective regions, without excluding a possible contribution of subcortical relays to this selective encoding. Secondly, at a cortical level, gamma oscillations could mediate the binding of these active regions for the construction of a single coherent auditory stream. Lastly, steady-state activities in the primary auditory cortex, modulated by habituation mechanisms, could contribute to the maintenance of the 1- or 2-stream percept.
We cannot definitively conclude whether the observed electrophysiological differences are related to the perception of two streams or to acoustic changes. However, we describe the neurophysiological correlates of the impact of the acoustic context on concurrent sound perception, specifically in terms of frequency selectivity and habituation mechanisms. The remaining questions are where and when in the brain these neural patterns could be integrated to build up a stream percept. This study suggests that further investigations of gamma oscillations in auditory grouping could bring new insights into this phenomenon.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
We thank Nicolas Grimault (CNRS-UMR5020, Lyon) for helpful discussions.