
ORIGINAL RESEARCH article

Front. Neurosci.

Sec. Perception Science

Volume 19 - 2025 | doi: 10.3389/fnins.2025.1657562

This article is part of the Research Topic: The Convergence of Cognitive Neuroscience and Artificial Intelligence: Unraveling the Mysteries of Emotion, Perception, and Human Cognition.

Object-Scene Semantics Correlation Analysis for Image Emotion Classification

Provisionally accepted
Zibo Zhou, Zhengjun Zhai*, Huimin Chen, Sheng Lu
  • Northwestern Polytechnical University, Xi'an, China

The final, formatted version of the article will be published soon.

Image emotion classification (IEC), which predicts the emotions that images evoke in human viewers, has attracted considerable attention owing to its wide range of applications. Most existing methods predict emotions by mining semantic information; however, the "affective gap" between low-level pixels and high-level emotions limits semantic representation and degrades model performance. Psychological studies have demonstrated that emotions can be triggered by the interaction between meaningful objects and their rich surroundings within an image. Inspired by this, we propose an Object-Scene Attention Fusion Network (OSAFN) that leverages object-level concepts and scene-level reasoning as auxiliary information for enhanced emotion recognition. Specifically, concepts are selected with an external concept extraction tool, and an Appraisal-based Chain-of-Thought (Appraisal-CoT) prompting strategy is introduced to guide large language models in generating scene information. Two attention-based modules then align these semantic features with visual features to enhance the visual representations, and an adaptive fusion strategy integrates the outputs of the object-semantic and scene-semantic streams. In addition, a polarity-aware contrastive loss is proposed to model the hierarchical structure of emotions, improving the discrimination of fine-grained emotional categories. Experiments on four affective datasets demonstrate that OSAFN achieves superior performance, confirming its effectiveness for IEC.
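The abstract does not give the form of the polarity-aware contrastive loss, so the following is only an illustrative sketch of the general idea it names: an InfoNCE-style objective with two tiers of positives, pulling embeddings of the same fine-grained emotion together at full weight and embeddings that merely share polarity together at a reduced weight `alpha`. The polarity map, the temperature `tau`, and `alpha` are all assumptions for illustration (the eight-category label set shown is a common IEC choice), not the authors' actual formulation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hypothetical polarity map over a common eight-emotion IEC label set
# (1 = positive polarity, 0 = negative polarity).
POLARITY = {"amusement": 1, "awe": 1, "contentment": 1, "excitement": 1,
            "anger": 0, "disgust": 0, "fear": 0, "sadness": 0}

def polarity_aware_contrastive_loss(embs, labels, tau=0.1, alpha=0.5):
    """InfoNCE-style loss with two positive tiers: same fine-grained
    emotion (full weight) and same polarity only (weight alpha).
    Opposite-polarity samples act purely as negatives in the denominator."""
    n = len(embs)
    total, count = 0.0, 0
    for i in range(n):
        # Softmax over similarities to all other samples in the batch.
        logits = [math.exp(cosine(embs[i], embs[j]) / tau)
                  for j in range(n) if j != i]
        denom = sum(logits)
        k = 0
        for j in range(n):
            if j == i:
                continue
            p = logits[k] / denom
            k += 1
            if labels[j] == labels[i]:
                total += -math.log(p)          # same emotion: strong pull
                count += 1
            elif POLARITY[labels[j]] == POLARITY[labels[i]]:
                total += -alpha * math.log(p)  # same polarity: weaker pull
                count += 1
    return total / max(count, 1)
```

With this shape, a batch whose same-emotion samples are clustered in embedding space yields a lower loss than one where same-emotion samples are scattered among opposite-polarity ones, which is the hierarchical discrimination the abstract describes.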

Keywords: human cognition, semantic attention, adaptive fusion, polarity-aware contrastive loss, image emotion classification

Received: 01 Jul 2025; Accepted: 21 Aug 2025.

Copyright: © 2025 Zhou, Zhai, Chen and Lu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Zhengjun Zhai, Northwestern Polytechnical University, Xi'an, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.