- 1Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
- 2Institute of Child and Adolescent Psychiatry, University of Kiel, Kiel, Germany
- 3Core-Facility Brainimaging, Faculty of Medicine, University of Marburg, Marburg, Germany
Introduction: Understanding how emotions are encoded at the neural level remains a central challenge in human neuroscience. Facial expressions are among the most powerful and frequently used stimuli to study emotion processing. Face perception itself is a complex function supported by a core network—including bilateral occipito-fusiform and superior temporal regions—and an extended network involving anterior structures such as the bilateral amygdalae. However, previous findings on how emotional content modulates these networks have been inconsistent.
Methods: To disentangle perceptual and affective components of face emotion processing, we combined high-frequency pupillometry with functional magnetic resonance imaging (fMRI). Pupillary dilation serves as a sensitive index of two distinct processes: perceptual load, reflecting the informational complexity of a face, and arousal, indicating its immediate sensory impact. In our study, 25 participants (13 female) viewed faces expressing anger, fear, happiness, or neutrality as well as luminance-matched houses serving as control stimuli. A one-back task unrelated to emotion masked the true experimental purpose.
Results: Relative to houses, faces elicited stronger pupillary dilations and enhanced blood-oxygen-level-dependent (BOLD) activity in bilateral occipital and fusiform cortices as well as in both amygdalae. Among facial expressions, angry faces evoked the largest pupillary dilations, while fearful faces elicited the strongest neural responses within a right-lateralized network centered on the superior temporal sulcus (rSTS). Across all face > house contrasts (conjunction minimum-statistic inference), pupil size correlated positively with BOLD activity in the right fusiform gyrus (rFFG), left inferior occipital gyrus (lIOG), bilateral calcarine cortex, and bilateral lingual gyrus.
Discussion: These findings indicate that emotional faces impose a higher perceptual load than matched control stimuli, engaging a distributed network spanning early visual and attention-related areas. In conclusion, our results suggest that emotional quality is specified early in the perceptual process, with divergent pupillary and neural signatures separating arousal-driven threat responses (anger) from socially complex alarm cues (fear).
1 Introduction
The quality of an emotion constitutes a complex set of informational features. In the visual domain, the human face represents the most salient and universally used medium for expressing emotions. Across cultures, observers reliably categorize a limited set of basic emotions from facial expressions (Ekman, 1992). Yet it remains an open question whether the human brain performs a similar classification of facial emotions—and which neural regions integrate emotional quality into perceptual and experiential representations. FMRI studies have established that face processing engages a core network, encompassing the bilateral occipital and fusiform gyri as well as the superior temporal cortex, and an extended network, including anterior structures such as the bilateral amygdalae (Haxby et al., 2000, 2002; Fairhall and Ishai, 2007). Recent meta-analytical work supports the notion of the core and extended face processing networks even for dynamic stimuli (Zinchenko et al., 2018).
How exactly emotional expressions are represented within this network remains an open question in both computationally inspired and fundamental neuroscience. It is still debated whether emotional information is embedded within the holistic (Gestalt) representation of a face or constitutes a distinct perceptual cue. Lesion evidence particularly implicates the right fusiform gyrus in processing individual facial identity (Rossion, 2008), whereas emotion processing has been more broadly linked to amygdala function—extending beyond facial stimuli (Davis and Whalen, 2001).
In fMRI research, the focus has traditionally been on the type of information being processed—the “what”—rather than on the dynamics of processing—the “how” (e.g., Fusar-Poli et al., 2009a, 2009b). In contrast, evidence from a recent event-related potential study comparing bodily and facial expressions suggests that the automatic processing of emotional signals from the body influences face recognition, whereas the influence in the opposite direction is weaker (Puffet and Rigoulot, 2025). Phenomenologically, the faster processing of body compared to facial emotions may reflect the fact that bodily cues more directly indicate possible actions and are therefore, as sensory data, more immediately relevant to the perceiver (de Gelder, 2006). In short, whereas engineered, action-unit–based models infer emotional categories from discrete cues such as facial muscle configurations, the human brain implements a more complex, valence-guided, and feedback-sensitive coding scheme that continuously biases activity in early visual areas (Murphy et al., 2011; Deen et al., 2015).
Moreover, the processing of movement and continuity—the temporal integration of dynamic changes in facial and bodily signals—plays a crucial role in social perception. Within the core face network, the superior temporal cortex has been identified as the key site for integrating social and motion-related cues, thereby supporting both social-cognitive interpretation (Blakemore, 2008) and visuomotor processing (Grosbras et al., 2012).
Conceptually, the term emotion quality remains under-specified in face perception research. Most paradigms in the field still follow Ekman’s framework of basic emotions (Ekman, 1992; see Vytal and Hamann, 2010). This model distinguishes classes of negative (e.g., fear, anger) and positive (e.g., happiness) valence; yet, what constitutes successful processing of such emotions remains unclear. For instance, autism research frequently operationalizes social cognition through Ekman-based facial emotion recognition tasks (Nagy et al., 2021). Paradoxically, the sheer abundance of studies using Ekman faces has produced rather inconsistent behavioral and neural results. A central conceptual debate concerns whether a baseline emotion truly exists—that is, whether any facial expression can be regarded as genuinely neutral and emotionally unloaded (Uljarevic and Hamilton, 2013). At the neurophysiological level, emotional valence interacts with visual salience (or arousal) (Corbetta and Shulman, 2002). In subjective experience, both implicit salience and explicit valence jointly shape the perceived Gestalt of a face.
Pupil dilation offers a valuable window into the neural mechanisms underlying emotion perception. Two complementary concepts are particularly informative:
i. Perceptual load, which reflects the cognitive demands of stimulus processing and has been linked to emotional valence (Kahneman and Beatty, 1966; Kahneman and Wright, 1971; Kahneman et al., 1969).
ii. Arousal-based neural responses, which precede non-arousal-related processes and correspond to sensory salience (Honma et al., 2012; Murphy et al., 2011; Tamietto et al., 2009).
In neuroimaging studies of emotional face perception, high-frequency pupillometry can help disentangle these processes by distinguishing rapid, arousal-driven responses from slower, cognitively mediated perceptual load effects. Traditional pupillometry shows that the time course of dilation encodes stimulus salience (Kret et al., 2013), whereas the slower component of the pupil response—emerging later and reflecting the processing demands of complex visual stimuli—can be captured with high temporal precision using high-frequency eye-tracking (Wierda et al., 2012). The simultaneous acquisition of high-frequency pupillometry and fMRI is methodologically critical for a holistic neurophysiological account of face perception, as it bridges a fundamental resolution gap. fMRI’s low temporal resolution, on the order of seconds, is ill-suited to capture the rapid, sub-second dynamics of the subcortical visual pathways, including the superior colliculus and the pulvinar, that are intimately engaged in the initial, arousal-related components of processing socially salient faces. Pupillary oscillations, controlled by these same autonomic brainstem circuits, provide a continuous, millisecond-scale readout of this rapid arousal response. By correlating this high-fidelity temporal trace of arousal with the spatially precise hemodynamic signal from fMRI, researchers can disambiguate the distinct yet temporally intertwined contributions of the fast, subcortical arousal network and the slower, higher-order cortical regions involved in detailed face analysis, thereby providing a more complete, directionally linked model from initial orienting to full cognitive appraisal.
We therefore combined fMRI (the MRI data also entered a connectivity analysis whose results are already published; see Kessler et al., 2021) with high-frequency pupillometry to identify distinct neural mechanisms underlying face perception and the processing of emotional quality in response to Ekman (1992) expressions. We focused on anger and fear, as previous work suggests that these emotions differ in their underlying quality and functional significance (Davis et al., 2011). We hypothesized that angry faces, as direct threat signals, would evoke greater pupil dilation, reflecting heightened arousal, whereas fearful faces, which indicate environmental alarm, would preferentially engage the right superior temporal sulcus (rSTS)—a region implicated in the integration of social and motion cues. To ensure balanced emotional valence, happy and neutral expressions were included as comparison conditions (Uljarevic and Hamilton, 2013). Luminance-matched house images served as non-social control stimuli.
2 Methods
2.1 Subjects
Twenty-five healthy volunteers (13 female; age range 21–29 years, mean = 24.3, SD = 2.1), recruited from students and staff at the University of Marburg, participated in the study. All participants were right-handed (Oldfield, 1971), had normal or corrected-to-normal vision, and reported no history of neurological or psychiatric disorders. Written informed consent was obtained from all participants. Experimental procedures were conducted in accordance with the Declaration of Helsinki and approved by the local Ethics Committee (proposal #30/16).
2.2 Experimental design
Five stimulus conditions were presented: faces displaying neutral (NF), happy (HF), angry (AF), or fearful (FF) expressions from the Radboud Faces Database (Langner et al., 2010), and houses (H) as a control condition. All images were converted to grayscale and cropped to 500 × 400 px using ImageMagick (version 6.8.9–9, Q16 x86_64; ©1999–2014 ImageMagick Studio LLC). Mean luminance was equated across stimuli using the SHINE toolbox for MATLAB (Willenbockel et al., 2010). Spatial-frequency matching was deliberately omitted in order to preserve the natural frequency content that is critical for rapid face and emotion processing via subcortical pathways (Vuilleumier et al., 2003). Because pupil responses are highly sensitive to even subtle changes in low- and mid-frequency structure, any artificial SF equalization would have compromised the ecological validity of the stimuli as well as the perceptual mechanisms underlying the pupillary signal. In addition, all stimuli were already luminance-matched and have been validated in previous work; further SF manipulation would likely have introduced distortions that run counter to the aim of presenting perceptually natural emotional stimuli. Example stimuli are shown in Figure 1. The experimental procedure is shown in Figure 2.
Figure 1. Example stimuli. Faces displaying neutral (top left), happy (top middle), angry (top right), and fearful (bottom left) expressions, alongside luminance-matched houses as a control condition (bottom middle). Faces reproduced with permission from Langner et al. (2010).
Figure 2. Procedure of the experiment. Each stimulus is presented for 350 ms, with an inter-stimulus interval of 150 ms. Twenty-four images form a block with a duration of 12 s. Between blocks, there is a break lasting between 4 and 7 s. Twenty blocks from each category (neutral, happy, angry, fearful, houses) are presented in a pseudo-randomized order. If the same stimulus is repeated, the participant should indicate the repetition by pressing a key. Faces reproduced with permission from Langner et al. (2010).
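For illustration, the following minimal Python sketch shows the mean-luminance equalization step described above in principle. It is not the authors' pipeline (which used the MATLAB SHINE toolbox), and the file names are hypothetical; numpy and Pillow are assumed to be available.

```python
# Sketch only: equate mean luminance across grayscale stimuli,
# analogous in spirit to the SHINE toolbox's luminance-matching step.
import numpy as np
from PIL import Image

def equate_mean_luminance(paths, target=None):
    imgs = [np.asarray(Image.open(p).convert("L"), dtype=float) for p in paths]
    if target is None:
        target = np.mean([im.mean() for im in imgs])   # grand mean luminance
    out = []
    for im in imgs:
        shifted = im + (target - im.mean())            # shift each image's mean to the target
        out.append(np.clip(shifted, 0, 255).astype(np.uint8))
    return out

# e.g., equate_mean_luminance(["face_angry_01.png", "house_01.png"])  # hypothetical files
```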
Stimuli were presented in alternating block conditions (NF, HF, AF, FF, H) on a rear-projected 16:9 monitor, viewed via a mirror positioned approximately 15 cm above the participant’s eyes in the MRI scanner (Presentation v14.1, Neurobehavioral Systems). Participants were naive to the experimental purpose and performed a cover one-back task (button presses with both index fingers) to maintain attention.
Each block contained 24 stimuli, each displayed for 350 ms with 150 ms inter-stimulus intervals. The block order was identical for all participants. In total, 100 blocks (~12 s each) were presented, yielding an experimental duration of ~30 min. Each block was preceded by a fixation cross presented for a jittered interval of 4,000–7,000 ms.
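The timing structure above can be summarized in a short Python sketch. This is not the original Presentation (v14.1) script; the random condition shuffle is a placeholder for the fixed pseudo-randomized order used for all participants.

```python
# Illustrative schedule: 100 blocks of 24 stimuli (350 ms on, 150 ms off),
# each preceded by a 4-7 s jittered fixation cross.
import random

CONDITIONS = ["NF", "HF", "AF", "FF", "H"]
STIM_ON, ISI, N_STIM, N_BLOCKS_PER_COND = 0.350, 0.150, 24, 20

def build_schedule(seed=0):
    rng = random.Random(seed)
    blocks = CONDITIONS * N_BLOCKS_PER_COND
    rng.shuffle(blocks)                      # placeholder for the fixed pseudo-random order
    t, schedule = 0.0, []
    for cond in blocks:
        t += rng.uniform(4.0, 7.0)           # jittered fixation interval
        onsets = [t + i * (STIM_ON + ISI) for i in range(N_STIM)]
        schedule.append({"condition": cond, "block_onset": t, "stimulus_onsets": onsets})
        t = onsets[-1] + STIM_ON + ISI       # block duration = 24 * 0.5 s = 12 s
    return schedule

schedule = build_schedule()
print(len(schedule), "blocks; last block starts at",
      round(schedule[-1]["block_onset"] / 60, 1), "min")   # ~30 min total
```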
2.3 Data acquisition
2.3.1 Pupillometry data
Left-eye pupil diameter was recorded continuously at 1 kHz during each ~12 s block using an MRI-compatible EyeLink 1000 infrared camera (SR Research). A 5-point calibration was performed prior to recording. Blinks were identified using the standard EyeLink detection routines.
2.3.2 MRI data
MRI data were acquired on a 3 T Siemens scanner (TIM Trio, Siemens, Erlangen, Germany). High-resolution T1-weighted anatomical images were acquired for each participant using a magnetization-prepared rapid gradient-echo (3D MP-RAGE) sequence in sagittal orientation (TR = 1,900 ms, TE = 2.54 ms, voxel size = 1 × 1 × 1 mm³, 176 slices, 1 mm thickness, flip angle 9°, matrix size = 384 × 384, FoV = 384 × 384 mm). Functional data were collected using a T2*-weighted EPI sequence sensitive to the BOLD contrast (TR = 1,550 ms; TE = 36 ms; flip angle = 70°) with 20 transverse slices (slice thickness = 2.7 mm; interslice gap = 0.4 mm; FoV = 200 mm; voxel size = 2.8 × 2.8 × 3.1 mm, including gap). This sequence was chosen based on pilot data to provide robust single-subject amygdala activation.
2.4 Data analysis
2.4.1 Pupillometry data
Blinks and saccades were detected using EyeLink routines with standard thresholds (saccade acceleration ≥ 500°/s²; velocity ≥ 50°/s). Microsaccades were treated as saccades. Segments containing blinks within the first 1,500 ms of a block were excluded due to presumed reduced attention at block onset. For the included trials/blocks, blink periods were linearly interpolated (Frässle et al., 2016). To reduce sequence effects, pupil traces were normalized per block to the average pupil size during the first 200 ms following the first stimulus onset (Wierda et al., 2012). Preprocessing and temporal analyses were performed in MATLAB (R2014a).
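A minimal Python sketch of this preprocessing logic is given below. The original analyses were run in MATLAB; here we assume a 1 kHz trace in which blink samples have been marked as NaN.

```python
# Sketch of per-block pupil preprocessing: reject blocks with blinks in the
# first 1,500 ms, linearly interpolate remaining blink gaps, and normalize
# to the mean of the first 200 ms after the first stimulus onset.
import numpy as np

FS = 1000  # sampling rate in Hz

def preprocess_block(pupil):
    """pupil: 1-D array at 1 kHz, NaN during blinks; returns normalized trace or None."""
    if np.isnan(pupil[: int(1.5 * FS)]).any():
        return None                                       # discard block (blink at onset)
    idx = np.arange(pupil.size)
    good = ~np.isnan(pupil)
    trace = np.interp(idx, idx[good], pupil[good])        # linear blink interpolation
    baseline = trace[: int(0.2 * FS)].mean()              # first 200 ms
    return trace / baseline                               # normalize per block
```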
Pupil traces from 0–5 s relative to the first stimulus onset were extracted for each condition. This window captures both fast (initial) and slow (later) responses, while remaining within the first half of the block, which is assumed to be less affected by blink-related artifacts. The remaining ~7,000 ms of each block were excluded due to increased noise from blinks.
Mean pupil dilation within the 0–5 s window was compared between conditions using Wilcoxon–Mann–Whitney tests with sequential Bonferroni correction (α = 0.05). In addition, an ANOVA (SPSS 21, IBM) was conducted to assess the effect of condition on mean pupil dilation.
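The pairwise testing scheme can be sketched as follows in Python (the original analyses used MATLAB and SPSS). The sketch assumes per-subject mean dilations per condition and applies sequential (Holm–)Bonferroni correction to Wilcoxon–Mann–Whitney tests.

```python
# Illustrative pairwise comparisons with sequential Bonferroni correction.
from itertools import combinations
import numpy as np
from scipy.stats import mannwhitneyu

def pairwise_holm(dilation_by_cond, alpha=0.05):
    """dilation_by_cond: dict mapping condition label -> array of per-subject mean dilations."""
    pairs = list(combinations(dilation_by_cond, 2))
    pvals = [mannwhitneyu(dilation_by_cond[a], dilation_by_cond[b],
                          alternative="two-sided").pvalue for a, b in pairs]
    order = np.argsort(pvals)                 # step-down: smallest p first
    results, still_rejecting = {}, True
    for rank, i in enumerate(order):
        # compare the k-th smallest p-value against alpha / (m - k)
        passed = still_rejecting and pvals[i] < alpha / (len(pvals) - rank)
        results[pairs[i]] = passed
        still_rejecting = passed
    return results
```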
Parametric modulation of fMRI by pupil size: Parametric regressors were derived from the initial 5 s of each block. Pupil data were normalized to baseline (0–200 ms) as percentage change, downsampled to match the MR micro-time resolution, and demeaned for SPM compatibility. Regressors were then convolved with the canonical hemodynamic response function and resampled at the TR (1.55 s). These parametric regressors were included per condition as effects of interest in a second first-level fMRI model.
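The construction of such a pupil-based parametric regressor is sketched below in Python, assuming numpy/scipy. The HRF parameters approximate SPM's canonical double-gamma response; this is an illustration of the steps listed above, not the SPM code itself.

```python
# Sketch: percent-change normalization, downsampling to micro-time, demeaning,
# convolution with a canonical HRF, and resampling at the TR of 1.55 s.
import numpy as np
from scipy.stats import gamma

TR, DT = 1.55, 0.1   # repetition time and micro-time resolution (s)

def canonical_hrf(dt=DT, length=32.0):
    t = np.arange(0, length, dt)
    hrf = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6    # peak ~6 s, undershoot ~16 s
    return hrf / hrf.sum()

def pupil_regressor(pupil_1khz, n_scans, block_onset_scan):
    baseline = pupil_1khz[:200].mean()                       # first 200 ms
    pct = 100.0 * (pupil_1khz - baseline) / baseline         # percent signal change
    micro = pct[:: int(DT * 1000)]                           # 1 kHz -> 10 Hz micro-time
    micro = micro - micro.mean()                             # demean for SPM-style modeling
    conv = np.convolve(micro, canonical_hrf())[: micro.size]
    reg = np.zeros(n_scans)
    scan_times = np.arange(0.0, micro.size * DT, TR)         # sample the convolved trace at the TR
    for i, t in enumerate(scan_times):
        if block_onset_scan + i < n_scans:
            reg[block_onset_scan + i] = conv[int(t / DT)]
    return reg
```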
2.4.2 MRI data
MRI data were preprocessed using SPM12 (r6685; Wellcome Centre for Human Neuroimaging; MATLAB). The first three functional volumes were discarded to allow for T1 signal stabilization. Field maps were computed from phase and magnitude images, converted to voxel displacement maps, and used to unwarp EPI images. A combined realign-and-unwarp procedure corrected for static and motion-related susceptibility distortions, while within-subject motion was further corrected using 6-parameter rigid-body transformations. Functional images were then normalized to Montreal Neurological Institute (MNI) space and smoothed with a 6 mm full-width-at-half-maximum (FWHM) Gaussian kernel.
A general linear model (GLM) block design was specified for the five conditions (NF, HF, AF, FF, H) using the canonical HRF without temporal or dispersion derivatives. Onset vectors were generated for each participant from Presentation logs. The six motion parameters were included as nuisance regressors. A high-pass filter at 1/256 Hz was applied (extended from the standard 1/128 Hz). For each participant, condition-specific effects produced five t-contrasts corresponding to NF, HF, AF, FF, and H.
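As a conceptual illustration of this first-level model, the Python sketch below builds a block-design matrix with one HRF-convolved boxcar per condition plus the six motion parameters (the 1/256 Hz high-pass filter and SPM's exact micro-time handling are omitted; this is not the SPM12 batch).

```python
# Sketch of a block-design GLM design matrix (conditions + motion + constant).
import numpy as np
from scipy.stats import gamma

def canonical_hrf(dt):
    t = np.arange(0, 32, dt)
    h = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6
    return h / h.sum()

def build_design_matrix(onsets_by_cond, duration, n_scans, motion, tr=1.55, dt=0.1):
    """onsets_by_cond: {'NF': [...], 'HF': [...], ...} block onsets in seconds;
    duration: block length in seconds (12 s here);
    motion: (n_scans, 6) realignment parameters used as nuisance regressors."""
    n_micro = int(n_scans * tr / dt)
    scan_idx = (np.arange(n_scans) * tr / dt).astype(int)
    cols = []
    for onsets in onsets_by_cond.values():
        box = np.zeros(n_micro)
        for on in onsets:                                    # boxcar per block
            box[int(on / dt): int((on + duration) / dt)] = 1.0
        conv = np.convolve(box, canonical_hrf(dt))[:n_micro]
        cols.append(conv[scan_idx])                          # sample at scan times
    X = np.column_stack(cols + [motion, np.ones(n_scans)])   # conditions + motion + constant
    return X
```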
For the pupil-covariation GLM, conditions were modeled as regressors of no interest, while t-contrasts targeted the parametric modulators derived from the initial-phase pupil data. To isolate face-selective correlations, each face condition was contrasted against houses at the first level, resulting in four t-contrasts per participant (pupilmod_NF: NF > H, etc.).
At the group level, a flexible factorial model was used to combine single-subject contrasts. Unless otherwise noted, second-level contrasts were evaluated using t-statistics with voxel-wise family-wise error (FWE) correction (p < 0.05) and a cluster-extent threshold of k ≥ 10 voxels to reduce false positives in small ROIs such as the amygdalae. Anatomical labelling of the resulting cluster peak voxels was performed with the SPM Anatomy Toolbox.
2.4.2.1 Contrasts of interest
Commonalities and differences in emotional face processing: a group conjunction of all face > house contrasts (NF > H ∩ HF > H ∩ AF > H ∩ FF > H) was used to assess shared face-related BOLD responses. Differences between the two negative emotions were examined using pairwise contrasts (i.e., FF > AF, AF > FF). To assess the common negative valence of fear and anger, we computed, for each emotion, conjunction contrasts against the two non-negative conditions: (FF > HF) ∩ (FF > NF) and (AF > HF) ∩ (AF > NF).
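For clarity, the minimum-statistic conjunction can be expressed as the following short Python sketch, assuming voxel-wise t-maps as numpy arrays and a precomputed FWE-corrected critical t-value (this mirrors the logic of SPM's conjunction inference rather than reproducing its implementation).

```python
# Conjunction (minimum statistic): significant wherever the smallest of the
# individual face > house t-values exceeds the corrected threshold.
import numpy as np

def conjunction_min_stat(t_maps, t_crit):
    """t_maps: list of same-shaped arrays (NF>H, HF>H, AF>H, FF>H); returns a boolean mask."""
    min_t = np.min(np.stack(t_maps), axis=0)   # minimum statistic per voxel
    return min_t > t_crit
```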
Parametric modulation by pupil dilation: a main-effect t-contrast tested the average modulation of face-related BOLD activity by pupil dilation over 0–5 s (pupilmod_NF, pupilmod_HF, pupilmod_AF, pupilmod_FF), with family-wise error correction applied.
3 Results
3.1 fMRI data
3.1.1 Activation for faces across all emotions
The conjunction of all face > house contrasts (NF > H ∩ HF > H ∩ AF > H ∩ FF > H) revealed increased BOLD activity in the bilateral inferior occipital (IOG) and fusiform (FFG) gyri, corresponding to the core face perception network, as well as in the bilateral amygdalae (AMY) (Figure 3, turquoise; Table 1).
Figure 3. Turquoise: face-related brain activation, irrespective of emotional content [i.e., conjunction contrast (neutral faces > houses) ∩ (happy faces > houses) ∩ (angry faces > houses) ∩ (fearful faces > houses)], was observed in the bilateral IOG and FFG—the core face perception network—as well as in the bilateral AMY. Green: associations between pupil dilation and BOLD activity were found in multiple regions of the occipito-temporal cortex, including the core system of face perception as well as more posteriorly located regions in the early visual cortex. Statistical threshold: p < 0.05, FWE-corrected, with a cluster-extent threshold of 10 voxels. IOG = inferior occipital gyrus, FFG = fusiform gyrus, AMY = amygdala.
Table 1. fMRI results for the conjunction contrast (neutral faces > houses) ∩ (happy faces > houses) ∩ (angry faces > houses) ∩ (fearful faces > houses).
3.1.2 Differential activations of negative emotions
Significant differences were observed only for fearful faces (Table 2). The conjunction FF > HF ∩ FF > NF revealed clusters in the right superior temporal sulcus and gyrus (STS/STG), right IOG and left IOG. The contrast FF > AF showed increased responses in the right IOG, right STS, and right amygdala (AMY).
3.2 Pupillometry data
Initial pupil constriction peaked at approximately 600 ms after the first stimulus onset, followed by redilation, which reached its maximum from around 2,500 ms onward. After the initial constriction, all face conditions exhibited larger pupil sizes over time compared with houses (Figure 4, light stars). A repeated-measures ANOVA revealed a significant main effect of condition on pupil dilation, F = 14.8, p < 0.001, partial η² = 0.059. A sensitivity power analysis conducted in G*Power 3.1 for a within-subjects ANOVA with five measurements, using a sample size of 25, α = 0.05, and 80% power, determined that this design could detect effects of size f ≥ 0.22, which is below our observed effect size. Notably, rank-sum tests revealed that angry faces elicited a significantly greater increase over time than all other face conditions (Figure 4, dark stars). Bonferroni post-hoc tests revealed that pupil dilation was significantly greater in AF compared to NF (mean difference = 0.0081, p < 0.001), HF (mean difference = 0.0055, p = 0.011), FF (mean difference = 0.0065, p = 0.001) and H (mean difference = 0.0168, p < 0.001), while H elicited significantly smaller dilation than all other conditions (mean differences H−NF = −0.0087, H−HF = −0.0113, H−FF = −0.0103, all p < 0.001).
Figure 4. Averaged pupil dilations over time (normalized to the first 200 ms of each block; 0–5,000 ms after first stimulus onset). Following stimulus onset, the pupil initially constricted, peaking at approximately 600 ms, and subsequently redilated. Redilation was significantly larger for faces than for houses (light stars). Among face conditions, angry faces elicited a significantly greater increase over time compared with the other expressions (dark stars).
3.3 Combination of pupillometry and fMRI data
Parametric analyses revealed a significant main effect of parametric modulation by pupil dilation on face-emotion-specific BOLD activity in the left inferior and middle occipital gyri (IOG, MOG), the right fusiform (FFG), as well as in more posterior regions such as the left occipital pole (OCP), the right calcarine gyrus (CAL), and the bilateral lingual gyrus (LG) (Figure 3, green; Table 3).
Table 3. Main effect of pupil dilation on BOLD responses for neutral, happy, angry, and fearful faces.
4 Discussion
The perception of emotion from faces integrates both sensory input and cognitive appraisal. In this study, we examined how the perceptual load of emotional faces, indexed by pupillometry, relates to their neural processing, with a particular focus on dissociating pathways for the negative emotions of anger and fear.
4.1 The perceptual load of faces
Consistent with the established core face-perception network (Haxby et al., 2000, 2002), faces evoked greater BOLD responses in bilateral occipital and fusiform cortices and larger pupillary dilations than luminance-matched houses. This is in line with evidence for face-selective attentional modulation under high perceptual load (Neumann et al., 2011).
Critically, the parametric modulation of the BOLD signal by pupil dilation provides direct evidence for this link, showing that face-evoked pupil time courses correlated with activity in a distributed occipital network including the right fusiform gyrus (rFFG), left inferior (lIOG) and middle occipital gyrus (lMOG), the left occipital pole (lOCP), and crucially, early visual areas such as the bilateral calcarine (CAL) and lingual gyri (LG).
The pupil-linked modulation in the calcarine cortex is particularly informative. As a site of primary visual processing (Klein et al., 2000) that is modulated by attention and behavioral relevance (Han et al., 2005), its correlation with pupil size—a known index of arousal and processing demand (Kahneman and Beatty, 1966; Bradley et al., 2008)—strongly suggests that faces impose a higher perceptual load than inanimate objects.
This load-related activity extends into the lingual gyrus, a region associated with internally directed attention and known for early face-selective responses (Benedek et al., 2016; Paré et al., 2023), supporting its role in forming the abstract representation of the face category (Watson et al., 2016). These findings indicate that the core face network, particularly the right fusiform gyrus, is not only engaged for face processing per se, but that its activity level is tuned to occipital-lingual representations of overall perceptual load, as reflected in pupil diameter.
4.2 Anger dilates: a threat-triggered arousal response
Our key finding reveals a clear dissociation between anger and fear: while anger specifically enhanced pupil dilation, fear preferentially engaged distinct neural regions. This suggests that anger processing is characterized by a broad, arousal-dominated response.
Anger likely drives this heightened perceptual effort due to its direct threatening nature, whereas fear signals an indirect, environmental threat. This aligns with findings that angry faces are better remembered, suggesting they draw attention to the threatening agent itself, whereas fear directs attention outward to the environment (Davis et al., 2011). Our pupillometric data indicate that this anger-specific response is rapid, with a stronger pupil response emerging between 1,800 and 2,900 ms—a timeframe compatible with late affective appraisal in event-related potential (ERP) studies (Klein et al., 2015). This rapid arousal response likely biases early visual processing (Vinck et al., 2015), priming the system for immediate action.
From a Gestalt perspective, visual systems prioritize cues with immediate behavioral relevance. The direct threat of potential violence conveyed by anger is thus prioritized, triggering a global arousal state reflected in the pupil. This dovetails with work showing angry faces modulate frontal empathy networks (Enzi et al., 2016). The fact that the amygdala was more engaged by fear than anger further underscores this dissociation; the amygdala’s role in vigilance for ambiguous threats (Davis and Whalen, 2001) makes it more critical for processing the alarm signal of fear than the clear, direct threat of anger.
4.3 Fear engages: a neural signature for social alarm
In contrast to the broader arousal response elicited by anger, fearful faces recruited a circumscribed and right-lateralized network encompassing the superior temporal sulcus (STS), inferior occipital gyrus (IOG), and the amygdala. Fearful expressions selectively increased activation in the right STS and IOG relative to happy and neutral faces, and—critically—engaged the right STS and amygdala more strongly than anger.
This pattern suggests that fear processing extends beyond basic threat detection, engaging circuits specialized for decoding socially informative cues. The STS is a well-established hub for integrating dynamic facial features, biological motion, and gaze direction (Deen et al., 2015; Grosbras and Paus, 2006), all of which are essential for identifying both the source and direction of potential danger. Recent evidence further indicates that rapid visual pathways supporting fear detection may already encode high-level social information rather than merely low-level threat signals (e.g., Lanzilotto et al., 2025). This interpretation aligns with contemporary work emphasizing that the amygdala contributes not only to vigilance but also to the evaluation of ambiguous or context-dependent social stimuli (Davis and Whalen, 2001).
In essence, while anger tends to trigger a direct “body alarm” reflected in peripheral autonomic responses such as pupil dilation, fear preferentially engages a “social-cognitive alarm” that mobilizes the STS and amygdala to search for the source of threat in the environment.
5 Conclusion and synthesis
In summary, our multimodal approach dissociates the neural and psychophysiological pathways for processing angry and fearful faces. We demonstrate that anger is predominantly associated with a threat-triggered arousal response, indexed by pupil dilation, which reflects a global state of preparedness. In contrast, fear is characterized by the specific engagement of a right-lateralized network—including the STS and amygdala—specialized in processing social cues and environmental alarm. This “anger dilates, fear engages” dichotomy provides a parsimonious framework for understanding how the brain efficiently processes distinct negative emotional qualities to guide adaptive behavior. We particularly consider the role of a fast, subcortical pathway (involving the superior colliculus, pulvinar, and amygdala) in the rapid processing of fear. This “low road” provides a mechanistic foundation for the amygdala’s rapid, automatic response to fearful faces, which then initiates a vigilant state and guides subsequent cortical analysis (de Gelder et al., 2011). In this framework, the direct threat of anger may be less dependent on this rapid subcortical alert. Instead, anger processing might engage cortical pathways more directly from the outset, supporting the detailed appraisal of hostile intent and coordinating the broad, sustained cortical arousal reflected in the pupil dilation.
5.1 Limitations
The interpretability of our findings is subject to several design constraints. Conceptually, the idea that the higher perceptual load of faces arises because no facial expression is truly emotion-neutral is an interpretation of the pupillary modulation findings that still needs to be verified. A replication of the combined high-frequency pupillometric and fMRI study using only neutral face stimuli would serve this purpose. Methodologically, our strategic choice to optimize for robust subcortical and ventral temporal coverage resulted in a limited field of view (20 slices), potentially omitting activity in higher-order regions such as the prefrontal cortex. Additionally, the fixed block design, while powerful, precludes the disentanglement of transient neural responses from sustained emotional adaptation and may be susceptible to order effects. Leaving spatial frequencies unmanipulated preserves ecological validity but might have confounded the results. Future studies manipulating spatial frequencies, in particular in relation to the amygdala response, would help address this question. Finally, the use of a one-back cover task, though effective for controlling attention, may have inadvertently modulated emotional processing through its added cognitive load.
5.2 Outlook
The distinct “arousal-for-threat” versus “engagement-for-alarm” model we propose provides a clear, testable framework for future research. Crucially, these findings underscore the necessity for replication in independent cohorts, particularly to confirm the robustness of the right STS in fear processing. Our study also highlights the advantage of a multimodal approach. Relying solely on fMRI might have led to the simplistic conclusion that fear is “more processed” than anger in temporal regions, whereas pupillometry alone would have suggested anger is the more potent stimulus. It was only by combining these measures that we could dissociate the broad, arousal-based impact of anger from the specific, socially-informative neural engagement elicited by fear. Future studies should leverage this multimodal strategy to investigate whether this dichotomy generalizes to other stimuli, such as dynamic faces or full-body expressions, and to explore its potential alterations in clinical populations with deficits in threat or social cue processing.
Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
Ethics statement
The study procedure involving humans conformed to the Declaration of Helsinki and was approved by the local ethics committee of the Medical Faculty of the University of Marburg (file ref. 39–17 BO). The participants provided their written informed consent to participate in this study.
Author contributions
KW: Methodology, Formal analysis, Data curation, Supervision, Software, Conceptualization, Investigation, Writing – original draft, Writing – review & editing. RK: Writing – original draft, Visualization, Formal analysis, Methodology, Data curation, Writing – review & editing, Validation, Investigation. KR: Writing – original draft, Conceptualization, Supervision, Methodology, Writing – review & editing. JS: Software, Supervision, Writing – review & editing, Writing – original draft. AJ: Methodology, Supervision, Writing – original draft, Resources, Conceptualization, Project administration, Writing – review & editing.
Funding
The author(s) declared that financial support was received for this work and/or its publication. This work was supported by the Deutsche Forschungsgemeinschaft (German Research Foundation, DFG) under Germany’s Excellence Strategy (EXC 3066/1 “The Adaptive Mind”, Project No. 533717223), by the DFG – Project-ID 521379614 (projects B01 and INF) – TRR 393 and by the DYNAMIC center, funded by the LOEWE program of the HMWK (grant number: LOEWE1/16/519/03/09.001(0009)/98).
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The author AJ declared that they were an editorial board member of Frontiers, at the time of submission. This had no impact on the peer review process and the final decision.
Generative AI statement
The author(s) declared that Generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Benedek, M., Jauk, E., Beaty, R., Fink, A., Koschutnig, K., and Neubauer, A. C. (2016). Brain mechanisms associated with internally directed attention and self-generated thought. Sci. Rep. 6:22959. doi: 10.1038/srep22959
Blakemore, S. J. (2008). The social brain in adolescence. Nat. Rev. Neurosci. 9, 267–277. doi: 10.1038/nrn2353
Bradley, M. M., Miccoli, L., Escrig, M. A., and Lang, P. J. (2008). The pupil as a measure of emotional arousal and autonomic activation. Psychophysiology 45, 602–607. doi: 10.1111/j.1469-8986.2008.00654.x
Corbetta, M., and Shulman, G. L. (2002). Control of goal-directed and stimulus-driven attention in the brain. Nat. Rev. Neurosci. 3, 201–215. doi: 10.1038/nrn755
Davis, F. C., Somerville, L. H., Ruberry, E. J., Berry, A. B., Shin, L. M., and Whalen, P. J. (2011). A tale of two negatives: differential memory modulation by threat-related facial expressions. Emotion 11, 647–655. doi: 10.1037/a0021625
Davis, M., and Whalen, P. J. (2001). The amygdala: vigilance and emotion. Mol. Psychiatry 6, 13–34. doi: 10.1038/sj.mp.4000812
de Gelder, B. (2006). Towards the neurobiology of emotional body language. Nat. Rev. Neurosci. 7, 242–249. doi: 10.1038/nrn1872
de Gelder, B., van Honk, J., and Tamietto, M. (2011). Emotion in the brain: of low roads, high roads and roads less travelled. Nat. Rev. Neurosci. 12:425. doi: 10.1038/nrn2920-c1
Deen, B., Koldewyn, K., Kanwisher, N., and Saxe, R. (2015). Functional organization of social perception and cognition in the superior temporal sulcus. Cereb. Cortex 25, 4596–4609. doi: 10.1093/cercor/bhv111
Ekman, P. (1992). An argument for basic emotions. Cogn. Emot. 6, 169–200. doi: 10.1080/02699939208411068
Enzi, B., Amirie, S., and Brüne, M. (2016). Empathy for pain-related dorsolateral prefrontal activity is modulated by angry face perception. Exp. Brain Res. 234, 3335–3345. doi: 10.1007/s00221-016-4731-4
Fairhall, S. L., and Ishai, A. (2007). Effective connectivity within the distributed cortical network for face perception. Cereb. Cortex 17, 2400–2406. doi: 10.1093/cercor/bhl148
Frässle, S., Paulus, F. M., Krach, S., Schweinberger, S. R., Stephan, K. E., and Jansen, A. (2016). Mechanisms of hemispheric lateralization: asymmetric interhemispheric recruitment in the face perception network. NeuroImage 124, 977–988. doi: 10.1016/j.neuroimage.2015.09.055
Fusar-Poli, P., Placentino, A., Carletti, F., Allen, P., Landi, P., Abbamonte, M., et al. (2009a). Laterality effect on emotional faces processing: ALE meta-analysis of evidence. Neurosci. Lett. 452, 262–267. doi: 10.1016/j.neulet.2009.01.065
Fusar-Poli, P., Placentino, A., Carletti, F., Landi, P., Allen, P., Surguladze, S., et al. (2009b). Functional atlas of emotional faces processing: a voxel-based meta-analysis of 105 functional magnetic resonance imaging studies. J. Psychiatry Neurosci. 34, 418–432. doi: 10.1139/jpn.0953
Grosbras, M. H., Beaton, S., and Eickhoff, S. B. (2012). Brain regions involved in human movement perception: a quantitative voxel-based meta-analysis. Hum. Brain Mapp. 33, 431–454. doi: 10.1002/hbm.21222
Grosbras, M. H., and Paus, T. (2006). Brain networks involved in viewing angry hands or faces. Cereb. Cortex 16, 1087–1096. doi: 10.1093/cercor/bhj050
Han, S., Jiang, Y., Mao, L., Humphreys, G. W., and Gu, H. (2005). Attentional modulation of perceptual grouping in human visual cortex: functional MRI studies. Hum. Brain Mapp. 25, 424–432. doi: 10.1002/hbm.20119
Haxby, J. V., Hoffman, E. A., and Gobbini, M. I. (2000). The distributed human neural system for face perception. Trends Cogn. Sci. 4, 223–233. doi: 10.1016/S1364-6613(00)01482-0
Haxby, J. V., Hoffman, E. A., and Gobbini, M. I. (2002). Human neural systems for face recognition and social communication. Biol. Psychiatry 51, 59–67. doi: 10.1016/S0006-3223(01)01330-0
Honma, M., Tanaka, Y., Osada, Y., and Kuriyama, K. (2012). Perceptual—and not physical—eye contact elicits pupillary dilation. Biol. Psychol. 89, 112–116. doi: 10.1016/j.biopsycho.2011.09.015
Kahneman, D., and Beatty, J. (1966). Pupil diameter and load on memory. Science 154, 1583–1585. doi: 10.1126/science.154.3756.1583
Kahneman, D., and Wright, P. (1971). Changes of pupil size and rehearsal strategies in a short-term memory task. Q. J. Exp. Psychol. 23, 187–196. doi: 10.1080/14640747108400239
Kahneman, D., Tursky, B., Shapiro, D., and Crider, A. (1969). Pupillary, heart rate, and skin resistance changes during a mental task. J. Exp. Psychol. 79, 164–167. doi: 10.1037/h0026952
Kessler, R., Rusch, K. M., Wende, K. C., Schuster, V., and Jansen, A. (2021). Revisiting the effective connectivity within the distributed cortical network for face perception. NeuroImage 1:100045. doi: 10.1016/j.ynirp.2021.100045
Klein, F., Iffland, B., Schindler, S., Wabnitz, P., and Neuner, F. (2015). This person is saying bad things about you: the influence of physically and socially threatening context information on the processing of inherently neutral faces. Cogn. Affect. Behav. Neurosci. 15, 736–748. doi: 10.3758/s13415-015-0361-8
Klein, I., Paradis, A. L., Poline, J. B., Kosslyn, S. M., and Le Bihan, D. (2000). Transient activity in the human calcarine cortex during visual-mental imagery: an event-related fMRI study. J. Cogn. Neurosci. 12, 15–23. doi: 10.1162/089892900564037
Kret, M. E., Stekelenburg, J. J., Roelofs, K., and de Gelder, B. (2013). Perception of face and body expressions using electromyography, pupillometry and gaze measures. Front. Psychol. 4:28. doi: 10.3389/fpsyg.2013.00028
Langner, O., Dotsch, R., Bijlstra, G., Wigboldus, D. H. J., Hawk, S. T., and van Knippenberg, A. (2010). Presentation and validation of the Radboud faces database. Cogn. Emot. 24, 1377–1388. doi: 10.1080/02699930903485076
Lanzilotto, M., Dal Monte, O., Diano, M., Panormita, M., Battaglia, S., Celeghin, A., et al. (2025). Learning to fear novel stimuli by observing others in the social affordance framework. Neurosci. Biobehav. Rev. 169:106006. doi: 10.1016/j.neubiorev.2025.106006
Murphy, P. R., Robertson, I. H., Balsters, J. H., and O’Connell, R. G. (2011). Pupillometry and P3 index the locus coeruleus-noradrenergic arousal function in humans. Psychophysiology 48, 1532–1543. doi: 10.1111/j.1469-8986.2011.01226.x
Nagy, E., Prentice, L., and Wakeling, T. (2021). Atypical facial emotion recognition in children with ASD: exploratory analysis on task demands. Perception 50, 819–833. doi: 10.1177/03010066211038154
Neumann, M. F., Mohamed, T. N., and Schweinberger, S. R. (2011). Face and object encoding under perceptual load: ERP evidence. NeuroImage 54, 3021–3027. doi: 10.1016/j.neuroimage.2010.10.075
Oldfield, R. C. (1971). The assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia 9, 97–113. doi: 10.1016/0028-3932(71)90067-4
Paré, S., Bleau, M., Dricot, L., Ptito, M., and Kupers, R. (2023). Brain structural changes in blindness: a systematic review and an anatomical likelihood estimation (ALE) meta-analysis. Neurosci. Biobehav. Rev. 150:105165. doi: 10.1016/j.neubiorev.2023.105165
Puffet, A. S., and Rigoulot, S. (2025). The role of cognitive load in automatic integration of emotional information from face and body. Sci. Rep. 15:28184. doi: 10.1038/s41598-025-12511-8
Rossion, B. (2008). Constraining the cortical face network by neuroimaging studies of acquired prosopagnosia. NeuroImage 40, 423–426. doi: 10.1016/j.neuroimage.2007.10.047
Tamietto, M., Castelli, L., Vighetti, S., Perozzo, P., Geminiani, G., Weiskrantz, L., et al. (2009). Unseen facial and bodily expressions trigger fast emotional reactions. Proc. Natl. Acad. Sci. 106, 17661–17666. doi: 10.1073/pnas.0908994106
Uljarevic, M., and Hamilton, A. (2013). Recognition of emotions in autism: a meta-analysis. J. Autism Dev. Disord. 43, 1517–1526. doi: 10.1007/s10803-012-1695-5
Vinck, M., Batista-Brito, R., Knoblich, U., and Cardin, J. A. (2015). Arousal and locomotion make distinct contributions to cortical activity patterns and visual encoding. Neuron 86, 740–754. doi: 10.1016/j.neuron.2015.03.028
Vuilleumier, P., Armony, J. L., Driver, J., and Dolan, R. J. (2003). Distinct spatial frequency sensitivities for processing faces and emotional expressions. Nat. Neurosci. 6, 624–631. doi: 10.1038/nn1057
Vytal, K., and Hamann, S. (2010). Neuroimaging support for discrete neural correlates of basic emotions: a meta-analysis. J. Cogn. Neurosci. 22, 2864–2885. doi: 10.1162/jocn.2009.21366
Watson, R., Huis in 't Veld, E. M., and de Gelder, B. (2016). The neural basis of individual face and object perception. Front. Hum. Neurosci. 10:66. doi: 10.3389/fnhum.2016.00066
Wierda, S. M., van Rijn, H., Taatgen, N. A., and Martens, S. (2012). Pupil dilation deconvolution reveals the dynamics of attention at high temporal resolution. Proc. Natl. Acad. Sci. 109, 8456–8460. doi: 10.1073/pnas.1201858109
Willenbockel, V., Sadr, J., Fiset, D., Horne, G. O., Gosselin, F., and Tanaka, J. W. (2010). Controlling low-level image properties: the SHINE toolbox. Behav. Res. Methods 42, 671–684. doi: 10.3758/BRM.42.3.671
Keywords: emotion quality, face processing, fMRI, gestalt, occipital cortex, perceptual load, pupillometry, superior temporal cortex
Citation: Wende KC, Kessler R, Rusch KM, Sommer J and Jansen A (2026) Differential arousal and neural engagement for angry and fearful faces: a combined pupillometric and fMRI study. Front. Hum. Neurosci. 19:1739802. doi: 10.3389/fnhum.2025.1739802
Edited by:
Matteo Toscani, Bournemouth University, United Kingdom
Copyright © 2026 Wende, Kessler, Rusch, Sommer and Jansen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Kim C. Wende, kim.wende@staff.uni-marburg.de
Kristin M. Rusch