The Functional Role of Neural Oscillations in Non-Verbal Emotional Communication

Effective interpersonal communication depends on the ability to perceive and interpret nonverbal emotional expressions from multiple sensory modalities. Current theoretical models propose that visual and auditory emotion perception involves a network of brain regions including the primary sensory cortices, the superior temporal sulcus (STS), and orbitofrontal cortex (OFC). However, relatively little is known about how the dynamic interplay between these regions gives rise to the perception of emotions. In recent years, there has been increasing recognition of the importance of neural oscillations in mediating neural communication within and between functional neural networks. Here we review studies investigating changes in oscillatory activity during the perception of visual, auditory, and audiovisual emotional expressions, and aim to characterize the functional role of neural oscillations in nonverbal emotion perception. Findings from the reviewed literature suggest that theta band oscillations most consistently differentiate between emotional and neutral expressions. While early theta synchronization appears to reflect the initial encoding of emotionally salient sensory information, later fronto-central theta synchronization may reflect the further integration of sensory information with internal representations. Additionally, gamma synchronization reflects facilitated sensory binding of emotional expressions within regions such as the OFC, STS, and, potentially, the amygdala. However, the evidence is more ambiguous when it comes to the role of oscillations within the alpha and beta frequencies, which vary as a function of modality (or modalities), presence or absence of predictive information, and attentional or task demands. Thus, the synchronization of neural oscillations within specific frequency bands mediates the rapid detection, integration, and evaluation of emotional expressions. 
Moreover, the functional coupling of oscillatory activity across multiple frequency bands supports a predictive coding model of multisensory emotion perception in which emotional facial and body expressions facilitate the processing of emotional vocalizations.


INTRODUCTION
Effective communication is crucial for the formation and maintenance of social relationships in complex societies. Emotional communication is a complex process in which the expression and perception of emotional signals exchange information about internal affective states. While some of these signals can be expressed through verbal means, much of our emotional communication occurs nonverbally through changes in facial, body, and vocal expressions. Therefore, our ability to perceive and interpret nonverbal expressions of emotion can have a profound impact on the quality of our social interactions, affecting our mental health and wellbeing. To this end, deficits in emotion perception have been observed in a number of neurological and psychiatric conditions (Phillips et al., 2003; Garrido-Vásquez et al., 2011) and may negatively correlate with subjective quality of life in a number of these conditions (e.g., Phillips et al., 2010, 2011; Fulford et al., 2014). Despite this importance, the neural mechanisms and dynamics underpinning the perception of emotional cues within and between sensory modalities are poorly understood. This review explores the functional role of neural oscillations in mediating neural communication within and between sensory modalities in order to facilitate the detection, integration, and evaluation of emotional expressions.
Emotions are commonly defined as brief, coordinated neural, physiological, and behavioral responses to relevant events (Scherer, 2000). These responses can manifest behaviorally as changes in facial expression, body language, tone of voice (prosody), or any combination thereof. Thus, emotion perception can be described as the process of detecting salient signals, integrating those signals with prior knowledge of emotional meaning, and evaluating the integrated representation within the context of the current environment. According to current models, emotion perception of visual (Adolphs, 2002a; De Gelder, 2006), auditory (Schirmer and Kotz, 2006; Wildgruber et al., 2009; Kotz and Paulmann, 2011), and audiovisual (Brück et al., 2011) signals unfolds in three fast yet distinct stages: detection, integration, and evaluation.
The first stage consists of early perceptual processing in what are traditionally considered modality-specific cortices. For visual expressions of emotion, this includes regions of the occipito-temporal cortex, most notably the fusiform gyrus (Adolphs, 2002a,b; Vuilleumier and Pourtois, 2007), with distinct subregions of the fusiform gyrus responding preferentially to facial and body expressions (Schwarzlose et al., 2005), and the extrastriate body area (Grèzes et al., 2007; Kret et al., 2011; Meeren et al., 2013). Early detection of complex acoustic cues occurs in the belt region of the primary auditory cortex (Woods and Alain, 2009) and later in multiple voice-sensitive areas in the temporal lobe (Belin et al., 2004; Wiethoff et al., 2008; Ethofer et al., 2009, 2011; Pernet et al., 2015).
Following the extraction of low-level visual and acoustic features, a more detailed representation of the emotional expression is generated in the superior temporal sulcus (STS). Evidence from neuroimaging research suggests a functional subdivision within the STS with face-sensitive regions in the posterior terminal ascending branch and voice-sensitive regions in the trunk section. Functional differentiation between middle and anterior regions of the STS has also been noted during the perception of emotional vocal expressions (Kotz and Paulmann, 2011). Receiving input from both visual and auditory cortices, the STS also plays a key role in audiovisual integration (e.g., Calvert et al., 2000, 2001; Beauchamp et al., 2004a,b; Stevenson et al., 2007; Stevenson and James, 2009). To this end, facial and vocal expressions activate overlapping face- and voice-sensitive regions within the STS, suggesting that the STS is essential for the integration of audiovisual emotional information (Robins et al., 2009; Watson et al., 2013, 2014). In the final stage, the behavioral and motivational significance of the expression is interpreted and evaluated within the inferior frontal gyrus (IFG; Frühholz and Grandjean, 2013a) and orbitofrontal cortex (OFC; Adolphs, 2002b; Kotz et al., 2012). Involved in the processing of reward and punishment (Kringelbach and Rolls, 2004; Rolls, 2004), the OFC is thought to represent stimulus value across sensory modalities. Thus, during emotion perception, the OFC may be responsible for evaluating the emotional value of the expression within the context of the current environment.
In addition to these cortical regions, many studies also support a key role for subcortical structures such as the amygdala and basal ganglia in the perception of emotions. For example, the amygdala has been implicated in the processing of facial (e.g., Phillips et al., 1997; Blair et al., 1999; Whalen et al., 2001, 2013; Williams et al., 2004), body (Hadjikhani and de Gelder, 2003; Grèzes et al., 2007), and vocal (Fecteau et al., 2007; Frühholz and Grandjean, 2013b) expressions in both early and late stages of emotion perception. Studies also support a key role for the basal ganglia in the processing of facial (Adolphs, 2002b) and vocal (Kotz et al., 2003) expressions. Furthermore, deep brain stimulation of the basal ganglia, specifically the subthalamic nucleus, can impair emotion perception from facial and vocal expressions (Péron et al., 2010a,b). Given the importance of the basal ganglia in other aspects of emotion processing (i.e., subjective feeling and production of emotional expressions), it has been proposed that the basal ganglia coordinate the synchronization of different components of emotion processing (Péron et al., 2013). While the amygdala appears to be involved in both the early and late stages of emotion perception, consistent with dual-pathway models of emotion processing (LeDoux, 1996), basal ganglia activity is more often observed in the later stages as a function of attention.
In sum, the perception of emotion from facial, body and vocal expressions involves a distributed neural network of cortical and subcortical structures. The question then becomes, how does the brain selectively attend to and integrate these signals across space and time in order to give rise to a unified representation of an emotional expression?
The investigation of such rapid online processing of dynamic changes in sensory input requires adequate methods to capture neural information processing in real time. Electroencephalography (EEG) and magnetoencephalography (MEG) are particularly well suited for the study of emotion perception due to their millisecond temporal resolution.
Results from event-related potentials (ERPs) suggest differentiation between emotional and neutral facial expressions within 120 ms of stimulus onset (Eimer and Holmes, 2002; Eimer et al., 2003). The time course of emotion-related effects on evoked responses to vocal expressions depends on the stimulus type, with earlier effects for affective bursts such as laughs and screams (Liu et al., 2012) compared to changes in emotion prosody (Paulmann and Kotz, 2008; Paulmann et al., 2012, 2013; Pell et al., 2015). These early ERP effects are thought to reflect rapid detection of salience. Visual (Stefanics et al., 2012) and auditory (Schirmer et al., 2005) deviance detection, in the form of the mismatch negativity (MMN), is observed at approximately 200 ms, supporting the idea that integration of emotional signals occurs at this stage. Across domains, emotional expressions also elicit a sustained positivity (the late positive component or LPC) beginning between 300-400 ms post-stimulus, reflecting the interpretation and evaluation of emotional significance (visual, Eimer and Holmes, 2007; auditory, Paulmann et al., 2013).
Taken together, findings from ERP studies largely support the staged models of visual and auditory emotion perception and further establish a time course for the stages of detection, integration, and evaluation of emotional expressions. While studies of ERPs have undoubtedly advanced our understanding of the time course and neural bases of emotion perception, they can only provide limited insight into the dynamic interaction within and between nodes of functional neural networks. That is, we know substantially more about when and where certain processes may occur, than about how these processes arise and unfold within the human brain.
Neural oscillations, which reflect rhythmic fluctuations in the synchronization of neuronal populations, provide a measure of the dynamic interactions within and between regions involved in the different stages of emotion perception. Changes in oscillatory activity are commonly analyzed by treating the on-going EEG (or MEG) signal as the sum of pure sinusoids, which are separated into characteristic frequency bands, each associated with distinct cognitive and computational operations. Decomposing the EEG/MEG signal into its constituent sinusoids allows for the measurement of changes in power (amplitude) and phase within and between each frequency band, at different time points and in different brain regions. While increases or decreases in power, referred to as event-related synchronization (ERS) or desynchronization (ERD) respectively, indicate changes in neural synchronization within a specific node or region, phase coherence across brain regions reflects synchrony between brain regions that make up a functional neural network (Bastiaansen et al., 2012). According to one hypothesis, phase coherence enables effective communication between neuronal populations (Fries, 2005). Moreover, cross-frequency coupling may facilitate the integration of information across different spatial and temporal scales (Canolty and Knight, 2010). Thus, neural oscillations can provide an index of the dynamic interaction between brain regions involved in emotion perception as well as a plausible mechanism by which the brain can integrate rapidly changing emotional information from facial, body, and vocal expressions. To date, the majority of studies investigating emotion perception have focused solely on power changes. Thus, this review will primarily focus on ERS and ERD, while noting the critical importance of phase coherence and cross-frequency coupling in elucidating the functional dynamics of emotion perception within and between sensory modalities.
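The two measures described above, power change relative to baseline (ERS/ERD) and phase coherence between regions, can be illustrated with a minimal sketch on synthetic sinusoids. Everything specific here is an assumption made for illustration: the sampling rate, the 6 Hz theta frequency, the amplitudes, and the use of a single-window Fourier projection and a phase-locking value. The studies reviewed typically use wavelet or filter-based time-frequency methods, and their exact coherence measures may differ.

```python
import cmath
import math


def make_sinusoid(freq_hz, amplitude, phase, fs, n_samples):
    """Synthetic stand-in for one oscillatory component of an EEG/MEG epoch."""
    return [amplitude * math.cos(2 * math.pi * freq_hz * t / fs + phase)
            for t in range(n_samples)]


def fourier_coefficient(signal, freq_hz, fs):
    """Project the signal onto a complex sinusoid at freq_hz.

    For a pure cosine with a whole number of cycles in the window, the
    magnitude equals its amplitude and the angle equals its phase.
    """
    n = len(signal)
    total = sum(x * cmath.exp(-2j * math.pi * freq_hz * t / fs)
                for t, x in enumerate(signal))
    return 2 * total / n


def power_change_percent(baseline_power, event_power):
    """Relative power change; positive values read as ERS, negative as ERD."""
    return 100.0 * (event_power - baseline_power) / baseline_power


def phase_locking_value(phase_differences):
    """Length of the mean unit vector of phase differences (0 = none, 1 = perfect)."""
    vectors = [cmath.exp(1j * d) for d in phase_differences]
    return abs(sum(vectors) / len(vectors))


FS = 250          # assumed sampling rate in Hz
N = 250           # one-second analysis window
THETA_HZ = 6.0    # assumed theta frequency

# Hypothetical baseline and post-stimulus epochs with different theta amplitude.
baseline = make_sinusoid(THETA_HZ, 1.0, 0.0, FS, N)
post_stimulus = make_sinusoid(THETA_HZ, 1.5, 0.0, FS, N)

baseline_power = abs(fourier_coefficient(baseline, THETA_HZ, FS)) ** 2
event_power = abs(fourier_coefficient(post_stimulus, THETA_HZ, FS)) ** 2
theta_change = power_change_percent(baseline_power, event_power)

# Two hypothetical regions whose theta phases differ by a fixed lag on every trial.
lag = 0.4
phase_diffs = []
for k in range(20):
    phi = 0.1 * k
    region_a = make_sinusoid(THETA_HZ, 1.0, phi, FS, N)
    region_b = make_sinusoid(THETA_HZ, 1.0, phi + lag, FS, N)
    diff = (cmath.phase(fourier_coefficient(region_b, THETA_HZ, FS))
            - cmath.phase(fourier_coefficient(region_a, THETA_HZ, FS)))
    phase_diffs.append(diff)

plv_locked = phase_locking_value(phase_diffs)
# Phase differences spread evenly around the circle give no consistent relation.
plv_unlocked = phase_locking_value([2 * math.pi * k / 8 for k in range(8)])
```

In this toy case the post-stimulus epoch has more theta power than baseline, so `theta_change` is positive (an ERS), while a constant phase lag across trials yields a phase-locking value near 1 and evenly scattered phase differences yield a value near 0.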

PERCEPTION OF EMOTION FROM FACIAL EXPRESSIONS
Facial expressions are by far the most commonly studied means of emotional communication. In a typical study, participants are presented with images of facial expressions and asked to respond to the emotion (explicit) or identity/gender (implicit) of the face. Using this type of paradigm, studies have found changes in oscillatory activity across multiple frequency bands during the perception of emotion from facial expressions.

Delta
Delta oscillations have been implicated in a wide range of processes including the perception of faces and facial expressions (Knyazev, 2012; Güntekin and Başar, 2015). While frontal delta synchronization is characteristic of many more "cognitive" tasks, face processing is associated with delta synchronization over more posterior regions (Güntekin and Başar, 2009). Moreover, emotional expressions appear to induce stronger delta synchronization than neutral expressions over occipitoparietal regions, which is suggested to reflect stimulus updating (Balconi and Lucchiari, 2006; Balconi and Pozzoli, 2007). Effects of emotion on delta oscillations have also been observed over fronto-central regions, correlating with behavioral measures of emotional involvement (Knyazev et al., 2009b). Of note here is that the studies observing occipitoparietal delta synchronization have typically used passive viewing paradigms while Knyazev et al. (2009a) used both implicit (gender identification) and explicit (emotion categorization) tasks. Thus, emotion may differentially affect delta responses to facial expressions depending on task demands. Together these findings suggest a role for delta oscillations in the perception of emotional facial expressions; yet the functional significance of delta synchronization in this context remains unclear, and further research is needed to determine it more precisely.

Theta (4-7 Hz)
Most commonly associated with memory encoding and retrieval (Klimesch, 1999), theta band oscillations are thought to play a key role in the processing of emotion (Knyazev, 2007). To this end, recent studies have shown enhanced theta synchronization for emotional compared to neutral facial expressions, suggesting that theta oscillations may facilitate the rapid encoding of emotionally salient sensory information. For instance, Balconi and colleagues have observed enhanced theta synchronization over predominantly right frontal regions of the scalp between 150-250 ms, extending into the later time window of 250-350 ms, which they suggest reflects the orienting of attention toward the emotional significance of the stimulus during the early stages of conceptual processing (Balconi and Lucchiari, 2006; Balconi and Pozzoli, 2009). Similar results were reported by Knyazev et al. (2009b, 2010), who found increased early theta ERS over right frontal regions during the implicit processing of emotional facial expressions, that is, when participants performed a gender categorization task in which attention was directed away from the emotional content of the stimulus. Furthermore, these authors observed a second distinct theta ERS between 230-350 ms that was greater when the emotional content of the stimulus was processed explicitly during an emotion categorization task. Source localization revealed differential activation in the right parietal cortex (angry) and insula (happy) in the early time window and left temporal lobe (angry) and bilateral PFC (happy) in the later time window. Interestingly, some studies have also observed theta synchronization over more posterior (occipital and occipitoparietal) regions within a similar (early) time window (e.g., Başar et al., 2006; Balconi and Pozzoli, 2007, 2008), an effect that increases as a function of visual awareness (Zhang et al., 2012).
However, the extent to which this theta synchronization is emotion-specific may be called into question on the basis of a study by González-Roldan et al. (2011) showing no effect of emotion on theta synchronization during an explicit task. Instead, the authors observed a main effect of intensity on theta ERS between 200-400 ms over frontal, central, and parietal regions. This suggests that theta synchronization in response to emotional facial expressions may reflect facilitated encoding of the biological or motivational significance rather than the emotional quality of the expression per se. That is, emotional expressions (relative to neutral expressions) contain more behaviorally relevant sensory information, which reduces uncertainty, resulting in stronger neural synchronization in the theta frequency. This enhanced theta synchronization facilitates the dynamic interaction between brain regions involved in the early detection and integration of static emotional facial expressions.

Alpha (8-12 Hz)
As first noted by Berger (1929), neural oscillations in the alpha frequency band show strong synchronization over occipital regions in the absence of visual stimulation (i.e., with eyes closed). Based on further evidence showing alpha ERS over cortical regions not necessary for a given task, alpha synchronization was initially taken as an indicator of cortical idling (Pfurtscheller et al., 1996). However, more recent hypotheses suggest that alpha synchronization serves an active role in the inhibition of task-irrelevant brain regions (Klimesch et al., 2007; Jensen and Mazaheri, 2010). The rhythmic fluctuation of alpha oscillations thus produces temporal windows in which neurons are more or less likely to fire. Larger amplitudes (reflecting stronger inhibition) result in smaller temporal windows and thus more precise timing of neuronal firing. Smaller amplitudes, associated with release of inhibition, result in greater cortical excitability over longer temporal intervals. Within the context of emotion perception, alpha oscillations may be involved in the selective attention to emotionally salient social cues through active inhibition of task-irrelevant regions and pathways. It is notable, however, that many studies using static faces have found no difference in alpha synchronization between emotional and neutral expressions (Balconi and Lucchiari, 2006; Balconi and Pozzoli, 2007). Differences in alpha power emerge more reliably when comparing expressions of positive and negative valence. While perception of negative emotional expressions was associated with right-lateralized alpha ERD, perception of positive emotional expressions was associated with left-lateralized alpha ERD (Balconi and Ferrari, 2012). Although greater when facial expressions were presented supraliminally, these valence-specific differences were also observed when expressions were presented subliminally. Further support for these findings comes from a study by Del Zotto et al. (2013) showing valence-specific lateralization of frontal alpha power in a patient with cortical blindness. Results from this study showed that alpha ERD was greater for fearful compared to happy expressions over right frontal regions even though the patient could not report seeing the stimuli. Other evidence suggests that alpha synchronization over posterior regions may also differentiate between stimuli of negative and positive valence (Başar et al., 2006), though this effect was only observed when selecting the stimuli with the most extreme valence ratings for analysis.
While studies using static stimuli highlight the role of valence in alpha responses to facial expressions, they have two important limitations. Firstly, in naturalistic human communication, facial expressions are inherently dynamic, and therefore the extent to which these findings would be valid in naturalistic settings is unclear. Secondly, although providing a rough estimate of the topographical distribution of alpha ERD, these studies provide only limited insight into the patterns of functional connectivity underpinning the perception of emotion from facial expressions. Addressing these issues, a recent MEG study used dynamic facial expressions to explore changes in spatial connectivity during emotion perception (Popov et al., 2013). Findings from this study provide evidence for two stages of upper alpha desynchronization during facial emotion perception: a pre-recognition stage associated with increased alpha power over frontal and sensorimotor regions and decreased alpha power over occipital regions, followed by a post-recognition stage associated with the reversed pattern. Moreover, these power changes were associated with inverse patterns of functional connectivity, suggesting that alpha synchronization and desynchronization may regulate the exchange of information between visual and sensorimotor regions. That these effects were stronger in response to emotional compared to neutral expressions implies that emotion may enhance the functional coupling, facilitating recognition of facial expressions of emotion.

Beta (13-30 Hz)
Oscillatory activity in the beta frequency is typically associated with sensorimotor processing (Brovelli et al., 2004). However, recent evidence suggests a broader role for beta synchronization in the maintenance of current sensory, motor, and cognitive sets (Engel and Fries, 2010). Beta band oscillations have also been implicated in the perception of emotion from facial expressions. However, the direction, time course, and topography of beta modulation vary considerably between studies. For example, Güntekin and Başar (2007a) found increased beta power for angry compared to happy expressions over frontal and central regions. In a similar study including occipital electrodes, however, the authors found no main effect of emotion on beta band activity (Güntekin and Başar, 2007b). Thus, it seems that only fronto-central beta synchronization reflects differentiation between emotional and neutral facial expressions. Other evidence suggests that such differences in beta synchronization may also be modulated by attention. To this end, asymmetry in resting-state parietal beta band activity has been negatively correlated with attentional bias towards angry facial expressions (Schutter et al., 2001).
Given the importance of beta oscillations in the perception of biological motion, which is thought to be critical for social cognition in naturalistic environments (Pavlova, 2012), Jabbi et al. (2015) used MEG to compare evoked beta band activity in response to dynamic and static facial expressions. Perhaps unsurprisingly, greater beta power was observed for dynamic compared to static facial expressions in occipital, superior temporal, and sensorimotor cortices. When comparing dynamic emotional to neutral expressions, the authors found stronger beta power in regions such as the amygdala, STS, and OFC. Furthermore, beta power in the left STS was negatively correlated with the time course of fearful facial expressions but positively correlated with the time course of happy facial expressions. These emotion-specific differences suggest that the observed changes in beta power were not solely due to the processing of biological motion. Although this study investigated evoked rather than induced oscillatory activity, its findings support a putative role for beta oscillations, particularly within the STS, in tracking the temporal dynamics of facial expressions of emotion.

Gamma (>30 Hz)
Reflecting neuronal communication on a more local scale, gamma oscillations have been implicated in a number of cognitive processes including feature integration (Singer and Gray, 1995) and sensory selection (Fries et al., 2002). Within the context of emotion perception, event-related gamma synchronization has been commonly used to explore the functional dynamics underpinning the conscious and unconscious processing of emotional facial expressions. Studies investigating the spatial and temporal dynamics of emotion perception support a dual-pathway model of emotion perception consisting of a cortical and a subcortical pathway (e.g., LeDoux, 1996). Accordingly, using MEG, Luo et al. (2007, 2009) have reported that fearful expressions elicit early gamma band activity in the amygdala followed by later responses in the occipital, parietal, and prefrontal cortices. These authors have also observed a later attention-dependent gamma response localized to the amygdala, presumably due to feedback from prefrontal regions (Luo et al., 2010). However, these studies, as with any EEG or MEG study reporting activation from deep, subcortical structures, should be considered with respect to the current limitations in source analysis techniques. Although greater for supraliminally presented facial expressions, gamma synchronization is also observed in response to facial expressions processed subliminally (Balconi and Lucchiari, 2008; Luo et al., 2009), suggesting that gamma synchronization can be influenced by emotion even in the absence of visual awareness.
These findings are supported by intracranial studies showing localized gamma synchronization in brain regions implicated in emotion processing, most notably the amygdala and OFC. Recording intracranial field potentials from the amygdala of pre-surgical epileptic patients, Sato et al. (2011) found increased gamma synchronization in the amygdala for fearful compared to neutral facial expressions. The early time course of gamma synchronization (50-150 ms) supports the presence of a subcortical pathway involved in the rapid detection of emotionally salient facial features. Gamma synchronization has also been observed over prefrontal cortices during the later stage of emotional face perception. Consistent with findings from functional neuroimaging studies demonstrating functional subdivisions between medial and lateral regions of the OFC (e.g., Kringelbach and Rolls, 2004), gamma responses in the lateral OFC are greater in response to negative emotions (Jung et al., 2011). However, this effect only occurred when attention was explicitly directed to the emotional quality of the expressions; during an implicit processing task, no responses in the lateral OFC were observed. Moreover, Jung et al. (2011) observed increased gamma band activity in the medial OFC only in response to target stimuli, regardless of emotional valence. These findings suggest that the medial-lateral distinction between subregions of the OFC cannot be explained simply in terms of valence but may instead reflect the processing of relative value within the context of the current environment. Recent studies have also observed differential effects of attention on gamma band activity in a network of brain regions including the amygdala and OFC during the perception of emotional facial expressions (Müsch et al., 2014). Thus, gamma synchronization in the OFC may reflect the attention-dependent binding of emotionally salient stimuli with internal representations of their motivational significance.

Summary
Taken together, the current evidence supports the idea that the perception of emotional facial expressions is mediated by the synchronization of neural oscillations across multiple frequency bands (Güntekin and Başar, 2014). Overall, it appears that lower frequency bands may coordinate patterns of long-range connectivity necessary for the encoding and selection of emotionally salient facial features while higher frequency bands may be associated with the integration of these features at multiple stages of emotion processing.

PERCEPTION OF EMOTION FROM VOCAL EXPRESSIONS
Within the auditory domain, emotion can be communicated via affective bursts (laughs, screams, cries, etc.) or more subtle changes in tone of voice, or emotion prosody. While both convey important affective information, perception of emotion from these two types of vocal expressions occurs along different time scales and may rely on different patterns of neural activity and connectivity. Although very few studies have investigated the role of neural oscillations in perception of emotion from either type of vocal expression, current evidence suggests that theta synchronization may play a particularly important role in facilitating the detection of emotionally salient vocal cues.

Detection of Prosodic Change
A considerable body of research suggests that theta band oscillations drive the processing of slow acoustic changes in speech perception (Peelle and Davis, 2012). To explore the role of oscillatory activity in the detection of emotional prosodic change, Chen et al. (2012) used a cross-splicing procedure to artificially combine vocalizations spoken in angry and neutral prosodies. Thus, vocalizations could change from neutral to angry, angry to neutral, or remain constant. In this paradigm, detection of prosodic change was associated with an increase in fronto-central theta synchronization between 100-600 ms. Furthermore, for angry prosodies only, theta synchronization was modulated by intensity, with greater power for high compared to low intensity vocalizations. Subsequent research by the same group has extended these findings, showing increased theta synchronization for neutral to angry change compared to no change for both implicit and explicit tasks, suggesting that the emotional content of the stimulus may facilitate the detection of acoustic change. In this study, significant beta desynchronization was also observed between 400-750 ms, but only when the task required explicit processing of emotional change, which the authors interpret as re-integration of the cross-spliced portion of the sentence with its preceding context. Although these findings provide preliminary support for the role of theta synchronization and beta desynchronization in the detection of emotion prosody, the precise temporal and spatial dynamics of these effects need to be addressed in order to better characterize the function of these frequency bands in vocal emotion perception.

Oscillatory Response to Affective Bursts
With regard to affective bursts, what little evidence there is suggests that gender differences may also influence theta band activity. Bekkedal et al. (2011) found no main effect of emotion on frontal theta synchronization. Instead, they found an interaction between emotion and gender such that women showed increased theta synchronization for angry expressions over bilateral anterior regions while men showed increased theta synchronization for expressions of pleasure over right anterior regions. As noted by the authors, this gender difference in theta synchronization may be due to differences in arousal, although behavioral measures would certainly be needed to support this claim. Moreover, the wide time intervals used for analysis (500 ms) make the functional interpretation of these gender differences in theta synchronization difficult and may partially account for the absence of any statistically significant differences in other frequency bands.

Summary
Though few in number, the existing studies suggest that theta synchronization may facilitate the perception of emotion from vocal expressions. Consistent with findings from the speech literature, theta synchronization appears to mediate the detection of acoustic change, an effect which is modulated by emotion. Additionally, beta desynchronization may also play a role in vocal emotion perception, but only when explicitly attending to the change in prosody. Thus, theta synchronization may be involved in the detection of emotionally significant acoustic features during vocal emotion perception while beta desynchronization may facilitate the integration of these features with contextual information.

INTEGRATION OF FACIAL, BODY, AND VOCAL EXPRESSIONS OF EMOTION
In natural environments, emotion perception requires the integration of emotional cues from both visual and auditory modalities. Based on current models of visual and auditory emotion perception, it could be hypothesized that multisensory emotional expressions are integrated in a convergent manner such that visual and auditory cues are processed separately in modality-specific cortices, integrated into a coherent multisensory percept in the STS, and evaluated in the PFC. However, it is important to note that facial and vocal expressions occur along different temporal scales, with changes in facial expression often preceding changes in vocal expressions. Therefore, based on dynamic changes in facial and body expressions, the brain can generate predictions about the timing and content of forthcoming vocal expressions. Evidence from ERP studies suggests that emotional facial expressions elicit stronger (i.e., more reliable) predictions than neutral expressions (Jessen et al., 2012; Ho et al., 2015; Kokinous et al., 2015), resulting in facilitated processing of predicted emotional vocalizations. Together with recent proposals suggesting that neural oscillations play an important role in multisensory processing (Schroeder et al., 2008; Senkowski et al., 2008; Arnal and Giraud, 2012), this suggests that neural synchronization may facilitate the processing of multisensory emotional expressions through: (i) the selective binding of emotionally-salient sensory input from different modalities; and (ii) the formation and modification of sensory predictions.

Multisensory Integration of Facial and Vocal Expressions
Many earlier studies of multisensory emotion perception relied on the use of static facial expressions paired with words or phrases spoken in emotional or neutral prosody. In one such study, Chen et al. (2010) sought to determine whether multisensory integration effects could be observed in the primary sensory cortices during emotional face-voice processing. Using MEG, the authors recorded changes in oscillatory activity during visual, auditory, and audiovisual processing of emotional expressions. However, no integration effects were observed in either visual or auditory cortices. While this finding is interpreted as absence of audiovisual integration in primary sensory cortices, it could also be explained by the absence of predictive visual information since visual and auditory cues were presented simultaneously (see Vroomen and Stekelenburg, 2010). Interestingly, however, the authors observed alpha synchronization over superior frontal and cingulate cortices, which may suggest that increasing the amount of information available to the sensory systems via multiple modalities reduces the cognitive demand on prefrontal regions (Schelenz et al., 2013). Other studies using static facial expressions have found cross-modal interactions in other frequency bands and brain regions. For instance, by presenting participants with static fearful and neutral facial expressions paired with congruent vocal expressions, Hagan et al. (2009) demonstrated supra-additive increases in oscillatory activity in the STS, with theta and gamma bands contributing most to the increase in broadband activity. Subsequent research by the same group showed that supra-additive increases in the STS occurred in both congruent and incongruent conditions (albeit later in the incongruent condition), suggesting automatic integration of emotional facial and vocal expressions (Hagan et al., 2013). 
Consistent with these findings, other studies have observed theta synchronization during the integration of facial and prosodic change (Chen et al., 2015). Together, these findings suggest that oscillatory activity in the alpha and theta frequency bands drives the integration of facial and vocal expressions. Thus, without predictive visual information, theta synchronization in the STS may facilitate the feedforward integration of visual and auditory input into a coherent percept, reducing the processing demands on prefrontal regions involved in the interpretation and evaluation of the expression.
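The supra-additive criterion referred to above compares the audiovisual response with the sum of the two unimodal responses. A minimal sketch of that comparison is given below, assuming baseline-corrected power changes as input. The function names, the tolerance parameter, and the numerical values are hypothetical illustrations and are not taken from any of the reviewed studies, which rely on statistical testing across trials and participants rather than on a single-value comparison.

```python
def additive_contrast(av_power, a_power, v_power):
    """Return AV - (A + V) for baseline-corrected power changes.

    All three inputs are assumed to be expressed as change relative to a
    pre-stimulus baseline, so that summing A and V is meaningful.
    """
    return av_power - (a_power + v_power)


def classify_interaction(av_power, a_power, v_power, tolerance=0.0):
    """Label the audiovisual response relative to the additive model.

    `tolerance` is a hypothetical margin; in practice the decision would
    come from a statistical test, not a fixed threshold.
    """
    diff = additive_contrast(av_power, a_power, v_power)
    if diff > tolerance:
        return "supra-additive"
    if diff < -tolerance:
        return "sub-additive"
    return "additive"


# Hypothetical baseline-corrected power changes (arbitrary units).
print(classify_interaction(av_power=1.8, a_power=0.6, v_power=0.7))
```

The same contrast can be applied to power decreases, in which case an audiovisual response that is more negative than the unimodal sum is labeled sub-additive.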

Cross-Modal Predictive Coding of Emotional Expressions
Although these studies using static facial expressions have undoubtedly contributed to our understanding of audiovisual integration of emotional expression, their findings could be challenged on the grounds of ecological validity. Therefore, more recent studies have moved towards the use of dynamic facial, body, and vocal expressions in order to explore the oscillatory correlates of emotion perception in more naturalistic environments. Among the first to do so, Jessen and Kotz (2011) presented participants with video clips of dynamic facial, body, and vocal expressions. Using EEG, the authors found significant decreases in both alpha and beta power for audiovisual compared to the sum of auditory-only and visual-only conditions, with additional suppression for emotional compared to neutral expressions. These findings were replicated in a subsequent study, which also showed that while beta suppression for the contrast between multimodal and unimodal conditions was localized to the premotor cortex, suppression for the contrast between emotional and neutral conditions was localized to the posterior parietal cortex. Since previous studies have demonstrated beta suppression in these regions during the processing of biological motion (Muthukumaraswamy et al., 2006; Muthukumaraswamy and Singh, 2008), it could be argued that the observed differences in beta power are due to differences in the motion content between emotional and neutral expressions. However, for the stimuli used in these studies, there was no difference in the motion content before the onset of the vocal expression (see Jessen and Kotz, 2011), making it unlikely that beta suppression was an artifact of differences in motion content. Instead, beta oscillations may play a broader role in the predictive coding of audiovisual information (e.g., Arnal and Giraud, 2012).
Furthermore, the observed differences in beta ERD between emotional and neutral expressions provide support for the hypothesis that emotional expressions generate stronger cross-modal predictions compared to neutral expressions (Jessen and Kotz, 2013).
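For readers less familiar with how ERD is quantified, the sketch below illustrates one common convention, in which band power in an active interval is expressed as a percentage change from a reference (baseline) interval, so that negative values indicate desynchronization. The function name and the example values are hypothetical, and the sign convention is an assumption: some authors invert it so that ERD is reported as a positive number.

```python
def erd_percent(active_power, reference_power):
    """Percentage change in band power relative to a reference interval.

    Negative values indicate a power decrease (desynchronization, ERD);
    positive values indicate a power increase (synchronization, ERS).
    """
    if reference_power <= 0:
        raise ValueError("reference_power must be positive")
    return (active_power - reference_power) / reference_power * 100.0


# Hypothetical beta band power (arbitrary units): 8.0 at baseline,
# 6.0 after stimulus onset.
print(erd_percent(active_power=6.0, reference_power=8.0))
```

Under this convention, a stronger beta ERD for emotional than for neutral expressions corresponds to a more negative percentage in the emotional condition.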

Summary
Taken together, these studies support previous research suggesting that neural oscillations play an important role in multisensory processing. Furthermore, these findings show that the emotional content of the stimulus may facilitate flexible integration of facial, body, and vocal expressions. The simultaneous presentation of visual and auditory expressions results in synchronization of theta oscillations in the STS and alpha oscillations over prefrontal regions, suggesting that theta synchronization mediates the integration of audiovisual emotional expressions. Since previous evidence suggests that multimodal expressions are generally more easily recognizable than unimodal expressions (Collignon et al., 2008; Tanaka et al., 2010; Föcker et al., 2011), frontal alpha synchronization may reflect relative inhibition of regions needed to resolve any remaining uncertainty with regard to the emotional content of the stimulus. Since this effect was observed in emotion categorization tasks, it is possible that different task demands will induce different spatial and temporal patterns of alpha synchronization. In contrast, the natural temporal delay between visual and auditory expressions enables the brain to use changes in facial and body expression to generate predictions about the timing and content of forthcoming vocal expressions. Thus, cross-modal prediction results in ERD, particularly in the alpha and beta frequencies. These findings support the idea that multisensory integration and cross-modal prediction are distinct yet interactive mechanisms underpinning multisensory emotion perception (Jessen and Kotz, 2013, 2015).

DISCUSSION
Nonverbal emotion perception is driven by dynamic, context-dependent interactions within and between brain regions involved in the detection, integration, and evaluation of emotional expressions. Where and when such interactions occur depends on the sensory modality (or modalities) through which the emotion is expressed as well as the emotional quality of the stimulus itself. However, emotional expressions are dynamic events that continuously evolve over time. Therefore, the neural system(s) supporting emotion perception must be able to flexibly adapt to and integrate rapidly changing sensory input from multiple modalities. Based on the reviewed evidence, we propose that neural synchronization underpins the selective attention to and the flexible binding of emotionally salient sensory input across different spatial and temporal scales. Furthermore, neural oscillations provide a mechanism through which emotional facial and body expressions can predictively modulate the processing of subsequent vocal expressions.
The recognition of an expression as "emotional" requires the selective binding of emotionally relevant sensory information. However, individual features of an emotional expression can occur at different points in time and are processed in spatially distinct regions of the brain. Thus, the brain is challenged with the task of binding only those features belonging to the same event across both space and time. One mechanism through which this may occur is the synchronization of neural oscillations, which creates temporal windows in which information belonging to the same event can be selected and integrated (Singer and Gray, 1995). Moreover, coherence between distinct neuronal populations may enable flexible neuronal communication across different regions of the brain (Fries, 2005). Consistent with this idea, current evidence suggests that the synchronization of neural oscillations supports the selection and integration of sensory information within and between modalities (Senkowski et al., 2008; van Atteveldt et al., 2014). Gamma band oscillations, in particular, are thought to be important for sensory binding and feature integration on a local scale (Tallon-Baudry and Bertrand, 1999). As previously discussed, perception of emotion from facial expressions results in increased gamma band synchronization, suggesting that gamma band oscillations may mediate the rapid integration of emotionally salient sensory input. However, gamma band synchronization may be modulated by lower-frequency oscillations. Since lower frequency bands represent the activity of larger neuronal populations and longer temporal windows, such cross-frequency coupling between low and high frequency oscillations may enable the integration of information across different spatial and temporal scales (Canolty and Knight, 2010).
Natural communicative signals exhibit strong regularities that enable the brain to generate predictions about forthcoming sensory information within and between sensory modalities. This process may be mediated by the functional coupling of neural oscillations, which can facilitate the efficient allocation of processing resources to the predicted sensory input. For instance, synchronization of low-frequency oscillations may coordinate the allocation of processing resources, via high-frequency oscillations, at the phase in which the predicted sensory input occurs (Hyafil et al., 2015). As an example, the natural temporal delay between visual and acoustic speech signals provides a means through which the visual signal can alter the phase of ongoing neural oscillations such that the expected acoustic signal occurs at the phase of optimal neuronal excitability (Schroeder et al., 2008). While the phase of low-frequency oscillations may create temporal windows for the selection of relevant sensory information, higher-frequency beta and gamma oscillations may be involved in the transmission of top-down predictions (both formal and temporal) and bottom-up prediction errors, respectively (Arnal et al., 2011; Arnal and Giraud, 2012). If this is indeed the case, then it follows that neural oscillations, particularly within these frequency bands, may facilitate the predictive coding of nonverbal communicative signals such as dynamic facial, body, and vocal expressions. In this respect, emotion perception is similar to other forms of perception, with emotion acting as a highly salient source of relevant information that must be encoded and integrated with other sources of sensory information.
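The coupling between the phase of a low-frequency oscillation and the amplitude of a high-frequency oscillation, as discussed above, can be illustrated with a simple mean-vector-length style index. The sketch below is an illustration under stated assumptions rather than the analysis used in any of the cited studies: the phase and amplitude series are generated synthetically (in recorded data they would typically be obtained by band-pass filtering and an analytic-signal transform), and the function names, sampling rate, frequencies, and coupling strength are all hypothetical.

```python
import cmath
import math


def mean_vector_length(phases, amplitudes):
    """Phase-amplitude coupling index: |mean(amplitude * exp(i * phase))|.

    Large values indicate that the high-frequency amplitude is
    systematically larger at a particular low-frequency phase.
    """
    if not phases or len(phases) != len(amplitudes):
        raise ValueError("phases and amplitudes must be non-empty and equal in length")
    total = sum(a * cmath.exp(1j * p) for a, p in zip(amplitudes, phases))
    return abs(total) / len(phases)


def synthetic_series(n_samples=2000, sfreq=500.0, slow_hz=6.0, coupling=0.8):
    """Generate a hypothetical theta phase series and a gamma envelope.

    The envelope is 1 + coupling * cos(phase), so it peaks at phase 0
    when coupling > 0 and is flat when coupling == 0.
    """
    phases, amplitudes = [], []
    for n in range(n_samples):
        t = n / sfreq
        phase = (2.0 * math.pi * slow_hz * t) % (2.0 * math.pi)
        phases.append(phase)
        amplitudes.append(1.0 + coupling * math.cos(phase))
    return phases, amplitudes


coupled = mean_vector_length(*synthetic_series(coupling=0.8))
uncoupled = mean_vector_length(*synthetic_series(coupling=0.0))
print(f"coupled: {coupled:.3f}, uncoupled: {uncoupled:.3f}")
```

With these synthetic settings the index approaches half the coupling strength when the envelope is phase-locked and approaches zero when it is flat. In recorded data the raw value depends on overall amplitude, so it would normally be compared against surrogate distributions before any interpretation.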

Effect of Modality
Although Charles Darwin recognized early on the equal importance of facial, body, and vocal expressions in emotional communication, research over the past 50 years has focused predominantly on the perception of emotion from facial expressions. Thus, the role of neural oscillations in emotion perception has primarily been studied by presenting participants with images of static facial expressions. While this approach has yielded some valuable results, it does not necessarily reflect how emotions are expressed and perceived in natural human communication.
In everyday life, emotional expressions are dynamic, characterized by changes in facial expression, body language, and prosody unfolding over time. In line with this, previous functional neuroimaging research has shown distinct neural pathways involved in the perception of emotion from static and dynamic facial expressions (e.g., Kilts et al., 2003). Consistent with these findings, results from Jabbi et al. (2015) suggest that oscillatory activity in the beta frequency band may track dynamic changes in sensory input, facilitating the differentiation of emotional expressions. Although the use of dynamic facial expressions adds an additional level of stimulus complexity, it also affords greater ecological validity, which can improve our understanding of the neural dynamics underpinning naturalistic emotion perception. Moreover, the dynamic nature of emotional expressions enables the brain to use incoming sensory input to generate predictions about future events. Future studies could use methods such as dynamic causal modeling (DCM) to compare convergent and predictive coding models of multisensory emotion perception.
A second issue relates to the fact that facial expressions are also not the only means of emotional communication. Changes in emotional body language (De Gelder, 2006) and prosody (Schirmer and Kotz, 2006) also provide important information about one's emotional state. Compared to facial expressions, however, little is known about the oscillatory dynamics underpinning the perception of emotion from body and vocal expressions. Therefore, further research into: (i) the perception of emotion from dynamic body and vocal expressions; and (ii) the integration of emotional expressions from multiple modalities is needed if we are to understand the neural bases of emotion perception in human social interactions.

Emotional Differentiation
Each emotion is associated with a unique physiological, cognitive, and behavioral profile that serves an adaptive and, in social species, a communicative function. Therefore, it is likely that distinct patterns of neural activity and connectivity drive the expression and perception of different emotions.
One of the broadest distinctions between emotions is that of valence, which categorizes emotions as positive (pleasant) or negative (unpleasant). Within the brain, some have proposed that the right hemisphere is dominant for the processing of negative emotions while the left hemisphere is dominant for positive emotions (Schwartz, 1979, 1985; Silberman and Weingartner, 1986). Although valence-specific asymmetry has primarily been discussed within the context of emotional experience, studies in healthy individuals and in patients with unilateral brain damage suggest that there may also be hemispheric asymmetry in the perception of emotion (e.g., Jansari et al., 2000; Adolphs et al., 2001), though this may be influenced by task demands (Kotz et al., 2003). Consistent with this hypothesis, there is preliminary support for valence-specific hemispheric asymmetry of alpha desynchronization during emotion perception (e.g., Balconi and Ferrari, 2012). However, given support for alternative hypotheses such as the approach-withdrawal model of hemispheric lateralization (Davidson, 1992), future studies examining patterns of coherence across brain regions during the perception of positive and negative emotions are needed in order to elucidate the functional dynamics underpinning the differentiation of emotional valence.
Since each emotion serves a distinct function, it has been hypothesized that there may be different, yet partially overlapping, neural pathways specialized for the processing of different emotions (e.g., LeDoux, 2000). Thus, we may expect specific patterns of neural synchronization during the perception of different emotions. In support of this idea, distinct spatial and temporal patterns of theta (Knyazev et al., 2009b) and gamma (Luo et al., 2007) band activity have been observed in response to different emotions. So although perception of different emotions may rely on partially overlapping networks, further investigations into patterns of neural synchronization and coherence may reveal subtle changes in functional dynamics that enable us to differentiate between emotions.

Individual Differences
Due to the interaction between neurophysiological and environmental factors, individual differences can have a profound effect on how we perceive and interpret nonverbal expressions of emotion. Underlying these individual differences are changes in functional coupling that can be investigated by examining patterns of neural synchrony. To this end, gender differences are reflected in beta (Güntekin and Başar, 2007b) and theta (Knyazev et al., 2010) synchronization in response to emotional facial expressions. Furthermore, alpha desynchronization has been negatively associated with extraversion (Fink, 2005) and hostility (Knyazev et al., 2009b) and positively associated with anxiety (Knyazev et al., 2008) and depression (Knyazev et al., 2015). Individual differences have also been observed in the theta band, with reduced frontal theta synchronization in individuals with high levels of anxiety (Knyazev et al., 2008) and depression (Knyazev et al., 2015) and increased theta synchronization in those scoring high on measures of emotional intelligence (Knyazev et al., 2013). Additionally, hostility has been associated with gender differences in alpha and theta synchronization over posterior regions (Knyazev et al., 2009b) while dominance motivation is associated with delta/beta asymmetry (Hofman et al., 2013). Taken together, these findings suggest that changes in patterns of neural synchronization may mediate individual differences in the perception of emotional expressions.

Clinical Implications
Deficits in the ability to accurately perceive and interpret emotions have been observed in a number of neurological and psychiatric conditions, the neural bases of which remain poorly understood. By enabling us to look beyond the activity of specific brain regions into the dynamics of functional neural networks, investigations into changes in neural synchronization and coherence can advance our understanding of the specific impairments associated with different clinical conditions. Work in this area has already begun with studies showing reduced theta synchronization during perception of emotional facial expressions in individuals with schizophrenia (Ramos-Loyo et al., 2009; Csukly et al., 2014). Schizophrenia has also been associated with abnormal patterns of alpha synchronization (Ramos-Loyo et al., 2009), though this may be improved through targeted training in facial affect recognition. Other studies have found that oscillatory responses to facial expressions in the gamma band differentiate between unipolar and bipolar depression; while individuals with unipolar depression show reduced gamma power in response to sad facial expressions, those with bipolar depression show increased gamma band activity in response to highly arousing emotions (Liu et al., 2014). Finally, adolescents with Autism Spectrum Disorder show reduced interregional beta synchronization in response to angry facial expressions, suggesting that impairments in functional connectivity within networks involved in emotion processing may contribute to the deficits in facial emotion perception observed in autism (Leung et al., 2014). Thus, a better characterization of oscillatory responses to emotional expressions may aid in the diagnosis and treatment of a number of clinical conditions.

CONCLUSION
From the reviewed studies, it is clear that the perception of facial, body, and vocal expressions of emotion is mediated by oscillatory activity in multiple frequency bands. Although research on delta synchronization has been primarily restricted to the visual domain, the importance of delta oscillations, and of their functional coupling with higher (beta/gamma) frequency bands, in basic biological, cognitive, and emotional processes highlights the need for further research into the functional role of delta oscillations in emotion perception within and between sensory modalities. Across modalities, theta synchronization most consistently differentiates between emotional and neutral expressions and may reflect the initial encoding and derivation of emotional significance. Changes in alpha power have been primarily observed in studies with a visual component, with some evidence of valence-specific lateralization over frontal regions. Based on the hypothesis that alpha synchronization reflects active inhibition of task-irrelevant brain regions (Klimesch et al., 2007), modulation of alpha power may reflect sensory selection and inhibition of behaviorally relevant sensory information. Although evidence is still inconclusive as to the role of beta oscillations in emotion perception, changes in beta power are more likely to be observed in studies using dynamic stimuli or in those involving shifts in attention, consistent with the idea that beta band activity reflects the maintenance of the current cognitive or sensorimotor set (Engel and Fries, 2010). Gamma synchronization has been observed in emotion processing regions such as the amygdala, STS, and OFC, suggesting that oscillatory activity in this frequency band is associated with the binding of emotionally salient sensory input. The modulation of specific frequency bands by emotion enables the selective detection, integration, and evaluation of emotional signals through coordinated changes in effective connectivity.
From a predictive coding perspective, the emotional quality of the expression may act as a particularly salient source of information, strengthening the precision of sensory predictions through enhanced neural synchronization. However, further research, particularly in the auditory and audiovisual domains, is clearly necessary to gain a deeper understanding of the neural dynamics underpinning the perception of emotion within and between sensory modalities.

AUTHOR CONTRIBUTIONS
Main contribution by first author (AES). All the other authors contributed equally to this work.