ORIGINAL RESEARCH ARTICLE

Front. Psychol.

Sec. Emotion Science

Volume 16 - 2025 | doi: 10.3389/fpsyg.2025.1548975

Exploring the Impact of Noise, Language Familiarity, and Experimental Settings on Emotion Recognition

Provisionally accepted
  • ¹ University of Campania Luigi Vanvitelli, Caserta, Campania, Italy
  • ² Delft University of Technology, Delft, Netherlands

The final, formatted version of the article will be published soon.

Introduction: This work aims to understand the contextual factors that affect speech emotion recognition (SER). More specifically, it investigates whether the identification of vocal expressions of anger, fear, sadness, joy, and neutrality is affected by three factors: (a) the experimental setting, with vocal emotion recognition tested both in a controlled, soundproof laboratory and in a more natural listening environment; (b) background noise in the stimuli, with sentences presented at three noise levels of increasing difficulty: one clear (no-noise) condition and two noise conditions; and (c) language familiarity, since the stimuli were Italian sentences and the participants were either native Italian speakers or Dutch speakers with no knowledge of Italian.

Method: Dutch and Italian participants completed a vocal emotion recognition task in two experimental settings (realistic vs. laboratory). The stimuli were utterances from the Italian EMOVO dataset conveying anger, fear, sadness, joy, and neutrality, presented in three noise conditions.

Results: Regarding the experimental setting, listeners remained able to discern the emotional nuances conveyed by the voice even at the higher background noise levels. Regarding language familiarity, the Italian and Dutch listeners differed in emotion recognition performance, but the magnitude of the errors depended on the emotion category. Higher noise levels reduced accuracy, but listeners could still discern emotions, particularly from prosody. The study highlights that emotion recognition is influenced by the listening context, background noise, and language familiarity. These results could inform the development of robust SER systems and help improve human-computer interaction.
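The abstract does not state how the two noise conditions were constructed. A common way to grade background-noise difficulty in listening experiments is to mix a noise signal with each utterance at a fixed signal-to-noise ratio (SNR), where SNR in dB is 20·log10 of the ratio between the RMS amplitude of the speech and that of the noise. The Python sketch below assumes that approach purely for illustration: the function names, the 10 dB and 0 dB levels, and the synthetic tone standing in for an EMOVO sentence are hypothetical and are not taken from the authors' procedure.

```python
import math
import random


def rms(signal):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so that 20*log10(rms(speech) / rms(scaled noise)) == snr_db.

    Hypothetical helper: assumes equal-length lists of float samples and non-silent noise.
    Only the noise is rescaled, so the speech level stays constant across conditions.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / rms(noise)
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixture):
    """Recover the SNR by treating (mixture - speech) as the added noise."""
    added = [m - s for m, s in zip(mixture, speech)]
    return 20 * math.log10(rms(speech) / rms(added))


# Hypothetical conditions: None = clear (no noise added), then two noise levels.
CONDITIONS_DB = {"clear": None, "moderate noise": 10.0, "high noise": 0.0}

if __name__ == "__main__":
    rng = random.Random(0)
    sample_rate = 16000
    # Toy "speech": one second of a 220 Hz tone standing in for a real utterance.
    speech = [0.5 * math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(sample_rate)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]
    for name, snr in CONDITIONS_DB.items():
        if snr is None:
            print(f"{name}: no noise added")
            continue
        stimulus = mix_at_snr(speech, noise, snr)
        print(f"{name}: target {snr:.1f} dB, measured {measured_snr_db(speech, stimulus):.1f} dB")
```

Lower SNR values mean louder noise relative to the speech, so in this sketch the 0 dB condition is the harder one. In practice the speech and noise would be loaded from audio files rather than synthesized.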

Keywords: speech recognition, vocal emotion recognition, noise, language proficiency, language understanding

Received: 20 Dec 2024; Accepted: 03 Jun 2025.

Copyright: © 2025 Amorese, Cuciniello, Alterio, Pepe, Scharenborg, Cordasco and Esposito. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Terry Amorese, University of Campania Luigi Vanvitelli, Caserta, 81100, Campania, Italy

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.