ORIGINAL RESEARCH article
Front. Neurosci.
Sec. Perception Science
Volume 19 - 2025 | doi: 10.3389/fnins.2025.1657562
This article is part of the Research Topic: The Convergence of Cognitive Neuroscience and Artificial Intelligence: Unraveling the Mysteries of Emotion, Perception, and Human Cognition.
Object-Scene Semantics Correlation Analysis for Image Emotion Classification
Provisionally accepted
Northwestern Polytechnical University, Xi'an, China
Image emotion classification (IEC), which predicts human emotional perception from images, has attracted growing attention owing to its wide applications. Most existing methods predict emotions by mining semantic information. However, the "affective gap" between low-level pixels and high-level emotions constrains semantic representation and degrades model performance. Psychological studies have demonstrated that emotions can be triggered by the interaction between meaningful objects and their rich surroundings within an image. Inspired by this, we propose an Object-Scene Attention Fusion Network (OSAFN) that leverages object-level concepts and scene-level reasoning as auxiliary information for enhanced emotion recognition. Specifically, concepts are selected with an external concept extraction tool, and an Appraisal-based Chain-of-Thought (Appraisal-CoT) prompting strategy is introduced to guide large language models in generating scene information. Next, two attention-based modules are developed to align semantic features with visual features, enhancing the visual representations. An adaptive fusion strategy then integrates the outputs of the object-semantic and scene-semantic streams. Additionally, a polarity-aware contrastive loss is proposed to model the hierarchical structure of emotions, improving the discrimination of fine-grained emotional categories. Experiments on four affective datasets demonstrate that OSAFN achieves superior performance, marking a notable contribution to the field of IEC.
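The abstract does not specify the exact form of the polarity-aware contrastive loss, but the idea of modeling the emotion hierarchy (coarse polarity above fine-grained categories) can be sketched as a supervised contrastive loss in which negatives from the opposite polarity are up-weighted in the denominator. The function name, the `cross_polarity_weight` parameter, and the weighting scheme below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def polarity_aware_contrastive_loss(features, labels, polarities,
                                    tau=0.1, cross_polarity_weight=2.0):
    """Hypothetical sketch of a polarity-aware supervised contrastive loss.

    features   : (N, D) sample embeddings (L2-normalized inside)
    labels     : (N,) fine-grained emotion class ids
    polarities : (N,) coarse polarity ids (e.g. 0 = negative, 1 = positive)

    Negatives drawn from the opposite polarity receive a larger weight,
    so the embedding space separates polarities before fine classes.
    """
    features = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = features @ features.T / tau          # temperature-scaled similarities
    n = len(labels)
    loss, count = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # up-weight denominator terms whose polarity disagrees with anchor i
        w = np.where(polarities != polarities[i], cross_polarity_weight, 1.0)
        exp_sim = np.exp(sim[i])
        exp_sim[i] = 0.0                       # exclude self-similarity
        denom = np.sum(w * exp_sim)
        for j in positives:
            loss += -np.log(np.exp(sim[i, j]) / denom)
            count += 1
    return loss / max(count, 1)
```

With `cross_polarity_weight > 1`, pulling an anchor away from opposite-polarity samples reduces the loss more than separating same-polarity classes, which encodes the coarse-to-fine hierarchy described in the abstract.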
Keywords: human cognition, semantic attention, adaptive fusion, polarity-aware contrastive loss, image emotion classification
Received: 01 Jul 2025; Accepted: 21 Aug 2025.
Copyright: © 2025 Zhou, Zhai, Chen and Lu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Zhengjun Zhai, Northwestern Polytechnical University, Xi'an, China
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.