Neurophysiological Mechanisms Involved in Auditory Perceptual Organization

In our complex acoustic environment, we are confronted with a mixture of sounds produced by several simultaneous sources. However, we rarely perceive these sounds as incomprehensible noise. Our brain uses perceptual organization processes to independently follow the emission of each sound source over time. Although the acoustic properties exploited in these processes are well established, the neurophysiological mechanisms involved in auditory scene analysis remain unclear and have only recently attracted more interest. Here, we review the studies investigating these mechanisms using electrophysiological recordings from the cochlear nucleus to the auditory cortex, in animals and humans. Their findings reveal that basic mechanisms such as frequency selectivity, forward suppression and multi-second habituation shape the automatic brain responses to sounds in a way that can account for several important characteristics of perceptual organization of both simultaneous and successive sounds. One challenging question remains unresolved: how are the resulting activity patterns integrated to yield the corresponding conscious percepts?


INTRODUCTION
In our complex acoustic environment, we are confronted with a mixture of acoustic waves produced by several simultaneously active sources. However, we rarely perceive these sounds as incomprehensible noise. We are actually able to distinguish each sound source and to follow its emission independently over time. This capacity relies on perceptual organization, or auditory scene analysis: our auditory system groups together or dissociates acoustic components (or events) at a given instant and over time, leading to the perception of several simultaneous auditory streams.

PSYCHOLOGICAL FINDINGS ON AUDITORY SCENE ANALYSIS
The auditory scene analysis (ASA) has been largely investigated in psychoacoustical studies (reviewed in Bregman, 1990; Carlyon, 2004; Moore and Gockel, 2002). According to these experiments, several acoustic properties can influence sequential or simultaneous perceptual organization of sounds, following Gestalt laws: similarity, proximity, common fate and good continuity. Sequential organization has been mainly addressed with the "auditory streaming" phenomenon which corresponds to the perception of one or two streams from successive alternating acoustic events. Streaming is usually studied using simple sounds (Figure 1), such as sequences of pure tones alternating between two frequencies, A and B, according to a repeating ABAB or ABA_ABA pattern ('_' represents a silent gap). The percept induced by the tone sequence was found to depend on the frequency separation between the A and B tones and on the presentation rate (tempo) of these sounds. When the frequency difference is small and/or the tempo is slow, the sequence is heard as one coherent stream of tones alternating in pitch. When the frequency difference is large and/or the tempo is fast, the sequence is perceived as two distinct streams, one with the frequency A and one with the frequency B. For intermediate separations and tempos, the percept can spontaneously flip from "one stream" to "two streams", and vice-versa, with an increased probability to hear two streams as the sequence progresses (streaming buildup). Other parameters, such as intensity, spatial location or timbre, can also influence the perception of sound sequences if the separation is large enough.
Simultaneous organization has been far less explored and is highly influenced by the synchronization of onsets or offsets of overlapping acoustic events (Figure 1F). Acoustic events with an onset asynchrony inferior to 30 ms are more likely to be grouped into one auditory stream. Harmonicity and pitch also play an important role in the simultaneous organization. Harmonic acoustic events share the same temporal periodicity and are usually heard as one sound, whereas a mistuned partial would rather be perceived as a pure tone separated from the other harmonic components (Figure 1G).
Figure 1 | Schematic representation of stimuli traditionally used to investigate auditory perceptual organization. Sequential segregation can be explored using alternating pure tones following an ABAB (A,C) or an ABA_ (B,D) pattern ("_" corresponds to a silence). When the frequency separation between A and B tones is small, one stream is perceived (A,B). When this frequency separation is large, two streams are perceived (C,D). Simultaneous organization can be explored using multiple synchronous and harmonic acoustic components (E). If some of these components are asynchronous (F) or mistuned (G), two sounds are perceived.

NEUROPHYSIOLOGICAL MECHANISMS UNDERLYING SIMULTANEOUS ORGANIZATION
Neurophysiological mechanisms involved in auditory simultaneous organization remain equivocal. The main reason is the difficulty to dissociate the neural activity specifically corresponding to either sound. This issue can be solved using overlapping sounds modulated at different frequencies, each sound eliciting a steady-state response at the same frequency as its amplitude modulation. In a recent study, the neurophysiological mechanisms involved in the perceptual organization of concurrent sounds were investigated using this electrophysiological tagging (21 and 29 Hz amplitude-modulated sounds) and intracortical signal directly recorded from the auditory cortex of epileptic patients (Bidet-Caulet et al., 2007A). This work revealed that different mechanisms, described in the following, are involved in the segregation or grouping of overlapping components as a function of the acoustic context.
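The principle of electrophysiological tagging can be illustrated with a minimal sketch: when two overlapping sounds are amplitude-modulated at 21 and 29 Hz, the contribution of each sound to a recorded signal can be read out at its own modulation frequency. The code below is only an illustration under stated assumptions, not the analysis pipeline of the cited study: the recorded response is simulated as a sum of two sinusoids, and the function name `ssr_amplitude`, the sampling rate and the amplitudes are hypothetical.

```python
import cmath
import math


def ssr_amplitude(signal, fs, freq):
    """Amplitude of the sinusoidal component of `signal` at `freq` (Hz).

    Projects the signal onto a complex exponential at the tagging
    frequency (a single-bin discrete Fourier transform).
    """
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq * k / fs)
              for k, x in enumerate(signal))
    return 2.0 * abs(acc) / n


# Hypothetical recording: 1 s sampled at 1000 Hz, containing a 21-Hz
# steady-state response (amplitude 1.0) and a 29-Hz one (amplitude 0.5).
fs = 1000
t = [k / fs for k in range(fs)]
response = [1.0 * math.sin(2 * math.pi * 21 * tk)
            + 0.5 * math.sin(2 * math.pi * 29 * tk + 0.7)
            for tk in t]

amp_21 = ssr_amplitude(response, fs, 21.0)  # response tagged to sound 1
amp_29 = ssr_amplitude(response, fs, 29.0)  # response tagged to sound 2
```

Because the analysis window holds a whole number of cycles of both modulation frequencies, the two tagged responses do not leak into each other in this idealized case; real recordings would additionally require noise handling and windowing choices that are not shown here.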

FREQUENCY SELECTIVITY AND ONSET RESPONSE SYNCHRONIZATION
In this study, sound onset asynchrony was manipulated to induce the segregation or grouping of two concurrent components: either the 21-Hz component was starting 800 ms before the 29-Hz component (pitch continuity of the 21-Hz component), resulting in the perception of two streams; or the 21 and 29 Hz components were synchronous, preceded by a 21-Hz component at a distinct pitch (pitch discontinuity of the 21-Hz component), leading to the percept of one sound (Figure 2A). Transient evoked responses in secondary auditory areas (Figure 3) were found larger for pitch discontinuity than for pitch continuity. This can be explained by the frequency selectivity of auditory areas: the pitch discontinuity of the 21-Hz component activates a new neural population at the same time as the 29-Hz component onset, resulting in larger transient responses (Figures 2B,C). This finding suggests that synchronization of transient responses could account for grouping of overlapping auditory components.

FREQUENCY-SELECTIVE HABITUATION
During the overlap of the 21 and 29 Hz components (sound competition), the 21-Hz steady-state response (SSR), generated in the primary auditory cortex, PAC (Figure 3), was found larger for pitch discontinuity than for pitch continuity. A decrease of the 21-Hz SSR was observed over the course of the sound, suggesting the involvement of habituation mechanisms. In the case of onset asynchrony, 21-Hz SSR is continuously reduced by habituation mechanisms, resulting in a small 21-Hz SSR during sound competition; whereas in the case of onset synchrony, 21-Hz SSR reduction by habituation mechanisms is interrupted by the pitch discontinuity leading to a larger 21-Hz SSR during sound competition (Figure 2D). By varying the weight of the 21-Hz response (the 29-Hz SSR being unaffected), frequency-selective habituation mechanisms modulate amplitude ratios between 29 and 21 Hz activities. In the case of pitch continuity, the 21-Hz response being highly reduced, the 29-Hz response becomes relatively more important. This could contribute to the increased saliency of the new-coming 29-Hz component, leading to the segregation into two streams. Conversely, in the case of pitch discontinuity, the 21-Hz response being slightly reduced, the 29-Hz component tends to merge into the acoustic mixture and the two components are grouped into one complex stream. Selective attention has been shown to modulate the cortical representation of concurrent sounds by increasing the SSR to relevant sound and decreasing it to irrelevant ones, resulting also in a modification of the amplitude ratio between the cortical representations of each sound (Bidet-Caulet et al., 2007B). One can imagine that when the ratio is largely in favor of one component, two distinct sounds would be perceived with one being more salient; whereas when the ratio is close to one, the components would be grouped and perceived as one complex sound.
Thus, the interplay between habituation, attention and other mechanisms could influence the cortical representation of sounds and be involved in maintaining one percept and/or in percept shifting.
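The amplitude-ratio argument can be made concrete with a small numerical sketch. It assumes, purely for illustration, that the 21-Hz SSR decays exponentially toward a floor from the last pitch onset of the 21-Hz component, and that the 29-Hz SSR is constant; the function names, the decay constant and all amplitudes are hypothetical and are not estimates from the recordings.

```python
import math


def habituated_amplitude(t_since_onset, a0=1.0, a_floor=0.4, tau=1.0):
    """Assumed exponential habituation of the 21-Hz SSR (arbitrary units).

    t_since_onset: seconds since the 21-Hz component last started or
    changed pitch (a pitch change is assumed to reset habituation).
    """
    return a_floor + (a0 - a_floor) * math.exp(-t_since_onset / tau)


def ratio_29_to_21(t_in_competition, lead_time, a29=0.8):
    """29-Hz / 21-Hz SSR amplitude ratio during sound competition.

    lead_time: how long the 21-Hz component had been playing at the same
    pitch before the 29-Hz component started (0.8 s for pitch
    continuity, 0 s for pitch discontinuity).
    """
    a21 = habituated_amplitude(lead_time + t_in_competition)
    return a29 / a21


# Ratio 0.5 s into sound competition, in the two conditions.
continuity = ratio_29_to_21(0.5, lead_time=0.8)     # 21-Hz SSR already habituated
discontinuity = ratio_29_to_21(0.5, lead_time=0.0)  # habituation reset by pitch change
```

Under these assumptions the ratio is further from one for pitch continuity than for pitch discontinuity, which is the direction of the effect described above; the actual time course and magnitudes would have to be taken from the data.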

GAMMA OSCILLATORY ACTIVITIES AND PERCEPTUAL BINDING
In secondary auditory areas, induced oscillatory activities in the gamma range (50-90 Hz) were found more pronounced when one stream, rather than two, was perceived. This effect could be explained by the pitch discontinuity producing, in addition to the 29-Hz onset, a gamma response. However, oscillatory activities have been proposed as the neural mechanism promoting the interaction between different neural populations which are involved in processing distinct components of the same object (Tallon-Baudry and Bertrand, 1999), and thus could play an important role in the construction of coherent percepts (Vidal et al., 2006). The findings of Bidet-Caulet et al. (2007A) are consistent with this hypothesis, since induced gamma oscillations were larger when the two components were grouped together into one complex and coherent sound. Therefore, gamma oscillations could integrate and bind acoustic processing of the different components, which is performed by distinct groups of neurons, and directly reflect the auditory percept.

OTHER MECHANISMS
Concurrent sound perception has also been explored using scalp EEG and harmonic complexes (see Alain, 2007 for a review). When all components are tuned, one sound is perceived; whereas when one component is mistuned, two sounds are heard. A negative temporal wave, named "object related negativity" (ORN), has been observed around 180 ms in response to mistuned complexes. Even if the amplitude of this wave was correlated with the probability of hearing two distinct sounds, we cannot infer whether this response reflects acoustic or perceptual changes. However, this ORN has also been observed when segregation is induced by inter-aural differences or pitch differences between two vowels. Thus, the ORN does not seem to be specifically related to mistuned harmonics and could more generally index the perception of two different sounds (McDonald and Alain, 2005). However, a similar component was not observed in the human auditory cortex when sound segregation was induced by onset asynchrony (Bidet-Caulet et al., 2007A).

Sound source
Physical object emitting sounds.

Acoustic component or event
Sound emitted by a sound source.

Auditory stream
Mental representation of successive or overlapping acoustic events emitted by the same sound source.

Sequential organization
Segregation or integration of successive acoustic events over time into streams.

Simultaneous organization
Segregation or integration of overlapping acoustic events at a given instant into streams.

Frequency selectivity
Frequency selectivity refers to neurons in the auditory system that are responding to specific frequency band(s). In addition, at many levels of the auditory system, neurons are spatially organized so that neighboring neurons are selective to tones close to each other in frequency, resulting in a topographic organization called tonotopy.

Forward suppression
A powerful reduction of neuron responsiveness elicited by brief stimuli and that can persist for hundreds of milliseconds; the shorter the delay between two successive stimuli, the stronger the reduction.

Multi-second adaptation/habituation
Mechanism inducing a response decrement following stimulus repetition over hundreds of milliseconds and characterized by response recovery to a change stimulus and dishabituation to a previously habituated stimulus after a change stimulus.

September 2009 | Volume 3 | Issue 2 | 185
Bidet-Caulet and Bertrand Brain mechanisms of auditory organization

[...] Hz. 29-Hz SSR (purple) is observed only during sound competition with similar amplitudes in both conditions. 21-Hz SSR (blue) is observed during the whole stimulus duration, its amplitude is decreasing with time, probably because of habituation mechanisms. In the case of pitch continuity, habituation mechanisms are not interrupted, resulting in a 21-Hz SSR of small amplitude during sound competition. When habituation mechanisms are interrupted by the pitch discontinuity, the 21-Hz SSR is of larger amplitude during sound competition. Thus, frequency-selective habituation mechanisms modulate amplitude ratios between 29 and 21 Hz SSR. In the case of pitch continuity, the 21-Hz SSR being highly reduced, the 29-Hz response becomes relatively more important, resulting in an increased saliency of the new-coming 29-Hz component. Conversely, in the case of pitch discontinuity, the 21-Hz response being slightly reduced, the 29-Hz component tends to merge into the acoustic mixture and the two components are grouped into one complex stream. Importantly, this interpretation of the results is valid if the responses are generated in auditory areas with frequency-selective neurons, but not necessarily tonotopically organized.
Steady-state responses were found larger when one sound rather than two was perceived in Heschl's gyri. Transient evoked responses and induced gamma oscillations were found larger when one sound rather than two was perceived in secondary auditory areas in the superior temporal gyrus. PAC: primary auditory cortex (red circles), HG: Heschl's gyrus, STG: superior temporal gyrus.

NEUROPHYSIOLOGICAL MECHANISMS UNDERLYING SEQUENTIAL ORGANIZATION
Neurophysiological mechanisms involved in auditory sequential organization have been mostly investigated with the "streaming" protocol (reviewed in Micheyl et al., 2007; Snyder and Alain, 2007). Animal single-unit, multi-unit, local-field potential (LFP) and human scalp EEG recordings have suggested the involvement of frequency selectivity, forward suppression and habituation in sequential organization, in a very similar way as for simultaneous organization.

Frequency-selectivity and forward suppression
Fishman et al. (2001) investigated, for the first time, the neural responses to "streaming" sequences. They recorded both multi-unit activity and LFP from the PAC of awake macaques. The A-tone frequency was adjusted to be close to the best frequency of the recording site, whereas the B-tone frequency differed from the A-tone frequency by 10-50%. They found that (1) the faster the tempo, the more the neural response to non-best frequency B tones was attenuated, i.e. the neural activity of the recorded site was mainly composed of responses to best frequency A tones at half the tone presentation rate, and (2) the more the B-tone frequency was different from the best frequency of the recording site, the stronger the suppression of responses to B tones (see Figure 4A for similar results in a more recent study). Thus, the responses found in the PAC were influenced by the tempo and frequency separation of A and B tones in the same way as the percept induced by these sequences in humans. The authors suggested a forward suppression mechanism which reduces the neural response to a stimulus because of the preceding one, especially when the sounds are close in time. This suppression mechanism is frequency-selective since it is more pronounced for non-best frequency tones than for best frequency tones.
They proposed a physiological model of stream segregation, based on the PAC tonotopic organization:
• for large frequency separation, A and B tones activate different neural populations, producing the perception of two streams (Figure 5C);
• for small frequency separation, A and B tones activate nearly the same neural population, inducing the perception of one stream (Figure 5A);
• for intermediate frequency separation (Figure 5B), A and B tones activate overlapping neural populations; if the tempo is slow the overlap is large enough to induce a one-stream percept; whereas if it gets faster, the suppression mechanism differentially reduces responses to best and non-best frequency tones and, consequently, the overlap between neural populations activated by A and B tones, resulting in the perception of two streams.
Therefore, a frequency-selective suppression mechanism amplifies the spatial separation between neural populations activated by A and B tones by narrowing the neuron receptive fields. The more separated these populations, the more likely the sequence would be perceived as two distinct streams.

Neural adaptation or habituation
To investigate the buildup of stream segregation (increased probability to hear two streams as the sequence progresses), Micheyl et al. (2005) recorded single unit activity in the PAC of the awake macaque in response to ABA triplets with different frequency separations. They replicated the results of Fishman et al. (2001) by showing a reduction of firing rate to the B tone with increasing frequency separation (Figure 4A). More interestingly, they observed that responses to all tones were decreasing from the first to the last triplet embedded in the same sequence, irrespective of the frequency separation. The responses were decreasing strongly during the first 2 s of the sequence and then slowly until the end of the sequence (Figure 4B), suggesting the involvement of multi-second adaptation or habituation mechanisms. Micheyl et al. (2005) computed the probability that the response to B tones (recorded from neurons tuned to A-tone frequency) exceeds a specific threshold and used this value as an estimate of the probability that the same sequence is perceived as two streams in the macaque. They compared this estimate with the probability that the sequence is perceived as two streams in humans (behavioral measure). They found that both probabilities were showing the same trend as a function of frequency separation and time (Figure 4C). This model extends the one by Fishman et al. (2001) by adding habituation mechanisms which reduce responses to A and B tones over time: responses exceeding the threshold at the beginning of the sequence can be reduced below the threshold, leading to a percept switch without any concomitant change in the acoustic content (Figures 5B,D). Interestingly, these results were replicated using recordings from the cochlear nucleus of anaesthetized guinea pig: neural responses displayed frequency selectivity, forward suppression and multi-second adaptation and predicted the perception of a sequence. These results were also true for the bushy neurons of the cochlear nucleus receiving direct input from the auditory nerve, raising the possibility that these mechanisms are already present at the level of the auditory nerve.
These studies strongly suggest that multi-second adaptation or habituation plays an important role in the switch from one-stream to two-stream percept corresponding to the buildup of streaming. This is consistent with behavioral results in humans showing that the percept is influenced, over several seconds, by previous sounds perceived as one or two streams (Snyder et al., 2008). However, these results do not explain why the perception of the same sequence can spontaneously alternate between one and two streams and vice-versa (Pressnitzer and Hupe, 2006). Although perceptual bistability could be explained by bottom-up processes (Hupe et al., 2008; Noest et al., 2007), top-down mechanisms, such as attention, intention or knowledge, might also be involved in these perceptual switches.

Relation between neural and psychoacoustical responses
To build models of stream segregation, different variables were used to predict the percept: the ratio between multi-unit activity to B and A tones (Fishman et al., 2001, 2004), the difference between firing rate to A and B tones (Bee and Klump, 2004) and a threshold (Micheyl et al., 2005; Pressnitzer et al., 2008). To some extent, all these variables depend on each other. They were used to correlate with different psychophysical measures. Indeed, Bee's variable is well correlated with the fission boundary (when subjects cannot avoid hearing one stream even if they try to perceive two streams), but not with the temporal coherence boundary (when subjects cannot avoid hearing two streams even if they try to perceive one stream); whereas Fishman's variable predicts well both boundaries. Finally, Micheyl's threshold is good to estimate the buildup of streaming. It is also noteworthy that all these models share the assumption that neural activation patterns evoked in A1 are "read out" by other neurons which behave as binary classifiers according to a specific variable.
From these previous models, we propose a model of stream segregation (Figure 5) based on the separation between the neural populations activated by each tone and on a specific threshold: (1) if responses to A and B tones exceed this threshold within the same neural population, one stream is perceived, whereas (2) if responses to A and B tones exceed this threshold in non-overlapping neural populations, two streams are heard. Our model seems to be based on frequency selectivity and to only account for the segregation of pure tones with distinct frequencies. However, psychophysical studies have demonstrated that perceptual organization does not only depend on frequency parameters, but also on intensity, location, variations over time… Forward suppression and habituation mechanisms could also be selective to other acoustic properties than frequency and might be a general neural mechanism subserving perceptual organization, amplifying the separation between populations activated by the sounds to segregate. Our model takes into account these possibilities since it is based on the overlap and separation between the neural populations activated by each sound: the more different the sounds, the less overlap between the two neural populations, the more likely two streams are perceived. Therefore, our model extends previous models (mainly based on the responses to A and B sounds of the best-frequency neuron) to the population level.

RECORDINGS IN HUMAN
Human studies have the advantage of comparing, in the same subjects, brain responses and percepts induced by the sequences. Two studies in MEG (Gutschalk et al., 2005) and EEG (Snyder et al., 2006) found that evoked responses (P1, N1 and P2) to B tones of the ABA triplet were increasing with larger frequency separation and longer interval between tones (Figure 5, right panel), in agreement with previous findings showing that N1 amplitude depends on the frequency separation and time interval between two successive sounds (e.g. Picton et al., 1978). Moreover, for a given tempo, the evoked response enhancement was found to be correlated with the percept: the larger the P1, N1 and P2 responses, the more likely two streams are perceived. However, as both the percept and evoked response amplitude depend on the frequency separation, this correlation does not indicate whether response modulation reflects acoustic or perceptual changes.
At fast tempo, the evoked response reduction with decreasing frequency separation in humans could be directly related to the neural response suppression observed in animal PAC (Figure 5). Conversely, at slow tempo, no reduction was observed in animals; whereas human brain responses are reduced with decreasing frequency separation (Gutschalk et al., 2005). This suggests that the modulations of these human brain responses do not only reflect differential encoding in the PAC, and that processing in higher level auditory areas could also account for properties of auditory percept construction.
Interestingly, Gutschalk et al. (2005) measured P1 and N1 amplitudes in response to A and B tones embedded in the same sequence inducing the percept of one or two streams. The N1 and P1 amplitudes in response to B and second A tones were found larger when the subjects reported the percept of two streams rather than one. This result is in the same direction as the finding with acoustic manipulations (frequency separation) and suggests that the relationship between the evoked response amplitudes and the percept is not only driven by physical stimulus changes. However, an influence of attention mechanisms on these responses cannot be ruled out. Along this line, Snyder et al. (2006) could find, only in an attentive condition, a component of temporal origin paralleling the buildup of stream segregation over time, observed psychologically.
These enhanced transient responses to two-stream percept for sequential organization could appear in contradiction with the stronger transient responses to one-stream percept for simultaneous organization reported above. However, one should keep in mind that, in the case of concurrent sounds (see Figure 2), we are comparing responses to one sound (onset asynchrony) with responses to two sounds (onset synchrony), whereas in the case of alternating sounds we are comparing responses to each sound, separately.

CONCLUSION
It is noteworthy that similar mechanisms, namely frequency selectivity, forward suppression and multi-second habituation, have been found involved in both simultaneous and sequential organizations, with multi-second habituation most likely participating in the maintenance and evolution of the percept over time for both organizations. These basic mechanisms, known to be involved in the processing of acoustic properties, have been observed in the auditory cortex, and at a stage of auditory processing as early as the cochlear nucleus. These findings only demonstrate that neural responses recorded from the auditory cortex or the cochlear nucleus can account for several important characteristics of auditory organization, but they do not prove that the conscious percepts are actually determined at one of these levels. It is more likely that percepts are actually formed at a higher level.
A possible scenario would be that basic mechanisms, such as frequency selectivity, suppression, habituation and others, would shape the automatic responses to acoustic properties in subcortical and cortical auditory areas. Then, the resulting patterns of neural activation would be interpreted in higher level areas and/or at a later processing stage to construct the percept. The most challenging question remains: How are these activity patterns transformed, integrated, read out to yield different percepts? Oscillatory activities constitute a good candidate to achieve this binding and underlie the conscious percept. Some cortical evoked brain responses, most likely to be generated in secondary auditory areas, were found to be related to the percept rather than the acoustic content, and could also index the percept (Gutschalk et al., 2005; Snyder et al., 2006). As multiple mechanisms seem to interact to shape neural responses, several mechanisms could also be involved in reading and interpreting the resulting patterns.
Taken together, these results suggest that various mechanisms at different levels of the auditory pathway are involved in auditory perceptual organization. Further research is needed to elucidate which mechanisms actually underlie conscious percepts. In particular, further investigations of oscillatory activities could bring new insights in auditory perceptual organization mechanisms.

ACKNOWLEDGMENTS
The authors wish to acknowledge the contributions of Catherine Fischer, Françoise Lecaignard and Pierre-Emmanuel Aguera for data recording and analysis on the research presented in the focused review.
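The population-level threshold account of streaming discussed in this review (one stream when responses to A and B tones exceed a threshold within a shared neural population, two streams when they do so only in non-overlapping populations) can be sketched as a toy simulation. Everything numerical here is an assumption chosen for illustration: Gaussian tuning on a semitone axis, forward suppression modeled as a narrowing of tuning width at short inter-tone intervals, habituation modeled as an exponentially decaying gain, and the hypothetical function names `tuning_width`, `habituation_gain`, `active_population` and `predicted_percept`. It is not fitted to any of the recordings cited.

```python
import math


def tuning_width(ioi_s, base_sigma=3.0, max_suppression=0.5, tau_s=0.2):
    """Assumed receptive-field width (semitones) after forward suppression.

    Shorter inter-onset intervals (faster tempo) give stronger suppression,
    modeled here as a narrower tuning curve.
    """
    suppression = max_suppression * math.exp(-ioi_s / tau_s)
    return base_sigma * (1.0 - suppression)


def habituation_gain(t_s, floor=0.6, tau_h=2.0):
    """Assumed multi-second habituation: response gain decays toward a floor."""
    return floor + (1.0 - floor) * math.exp(-t_s / tau_h)


def active_population(tone_st, ioi_s, t_s, threshold=0.5):
    """Best frequencies (semitones) of neurons responding above threshold."""
    sigma = tuning_width(ioi_s)
    gain = habituation_gain(t_s)
    best_freqs = [k * 0.25 for k in range(-48, 97)]  # -12 to +24 semitones
    return {bf for bf in best_freqs
            if gain * math.exp(-((tone_st - bf) ** 2) / (2 * sigma ** 2))
            > threshold}


def predicted_percept(delta_f_st, ioi_s, t_s=0.0):
    """One stream if the A and B populations share neurons, else two streams."""
    pop_a = active_population(0.0, ioi_s, t_s)
    pop_b = active_population(delta_f_st, ioi_s, t_s)
    return "one stream" if pop_a & pop_b else "two streams"


# Small, large and intermediate separations at slow (0.5 s) and fast (0.1 s)
# inter-onset intervals, and an intermediate case late in the sequence.
examples = {
    "small, fast": predicted_percept(1, 0.1),
    "large, slow": predicted_percept(12, 0.5),
    "intermediate, slow": predicted_percept(6, 0.5),
    "intermediate, fast": predicted_percept(6, 0.1),
    "intermediate, slow, after 10 s": predicted_percept(6, 0.5, t_s=10.0),
}
```

With these assumed parameters, the intermediate separation flips from one stream to two streams either when the tempo gets faster or when habituation has lowered the responses later in the sequence, which mirrors the qualitative dependence on tempo and the buildup of streaming described in the text. How such a read-out is actually implemented in the brain is precisely the open question raised in the conclusion.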