
ORIGINAL RESEARCH article

Front. Artif. Intell.

Sec. Natural Language Processing

Volume 8 - 2025 | doi: 10.3389/frai.2025.1618698

This article is part of the Research Topic: The Convergence of Cognitive Neuroscience and Artificial Intelligence: Unraveling the Mysteries of Emotion, Perception, and Human Cognition

EmoShiftNet: A Shift-Aware Multi-Task Learning Framework with Fusion Strategies for Emotion Recognition in Multi-Party Conversations

Provisionally accepted
  • Informatics Institute of Technology, Colombo, Sri Lanka

The final, formatted version of the article will be published soon.

Emotion Recognition in Conversations (ERC) is vital for applications such as mental health monitoring, virtual assistants, and human-computer interaction. However, existing ERC models often neglect emotion shifts: transitions between emotional states that occur across dialogue turns in multi-party conversations (MPCs). These shifts are subtle, context-dependent, and further complicated by class imbalance in datasets such as the Multimodal EmotionLines Dataset (MELD). To address these challenges, this study proposes EmoShiftNet, a shift-aware multi-task learning (MTL) framework that jointly performs emotion classification and emotion shift detection. The model fuses multimodal features: contextualized text embeddings from BERT; acoustic features such as Mel-Frequency Cepstral Coefficients (MFCCs), pitch, and loudness; and temporal cues such as pause duration, speaker overlap, and utterance length. Together, these capture both static and dynamic emotional signals. Emotion shift detection is included as an auxiliary task, trained with a composite loss function that combines focal loss, binary cross-entropy, and triplet margin loss. Evaluated on the MELD dataset, EmoShiftNet achieves a higher emotion recognition F1-score than both traditional and graph-based ERC models, and it improves the recognition of minority emotions under class imbalance. These results highlight the value of modeling emotional transitions and show that MTL enhances contextual awareness in ERC systems.
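The composite objective described in the abstract can be sketched as a weighted sum of its three terms. The sketch below is illustrative only: the function signatures, the one-vs-rest binary framing of the classification head, and the loss weights `w_focal`, `w_bce`, and `w_triplet` are assumptions, not the authors' actual implementation.

```python
import numpy as np

def focal_loss(probs, targets, gamma=2.0, eps=1e-8):
    """Focal loss for the emotion head: down-weights easy examples so
    that minority emotion classes contribute more to the gradient."""
    p_t = np.where(targets == 1, probs, 1.0 - probs)
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t + eps)))

def bce_loss(probs, targets, eps=1e-8):
    """Binary cross-entropy for the auxiliary shift / no-shift head."""
    return float(np.mean(-(targets * np.log(probs + eps)
                           + (1 - targets) * np.log(1.0 - probs + eps))))

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Pulls embeddings of same-emotion utterances together and pushes
    embeddings across an emotion shift apart by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return float(max(d_pos - d_neg + margin, 0.0))

def composite_loss(emo_probs, emo_targets, shift_probs, shift_targets,
                   anchor, positive, negative,
                   w_focal=1.0, w_bce=0.5, w_triplet=0.5):
    """Weighted sum of the three terms; the weights are assumptions."""
    return (w_focal * focal_loss(emo_probs, emo_targets)
            + w_bce * bce_loss(shift_probs, shift_targets)
            + w_triplet * triplet_margin_loss(anchor, positive, negative))

# Toy example: two utterances with binary (one-vs-rest) emotion targets.
emo_probs = np.array([0.9, 0.2])
emo_targets = np.array([1, 0])
shift_probs = np.array([0.8, 0.1])
shift_targets = np.array([1, 0])
anchor = np.array([0.0, 0.0])
positive = np.array([0.0, 1.0])   # same emotion as the anchor utterance
negative = np.array([3.0, 0.0])   # utterance after an emotion shift

loss = composite_loss(emo_probs, emo_targets, shift_probs, shift_targets,
                      anchor, positive, negative)
```

Because the focal term multiplies the log-likelihood by `(1 - p_t) ** gamma`, confidently classified utterances are suppressed, which is how the framework keeps rare emotions from being drowned out by the majority classes in MELD.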

Keywords: emotion recognition, emotion shift detection, deep learning, speech emotion analysis, multi-party conversations

Received: 26 Apr 2025; Accepted: 19 Aug 2025.

Copyright: © 2025 Nirujan and Yapa. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Hinduja Nirujan, Informatics Institute of Technology, Colombo, Sri Lanka

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.