ORIGINAL RESEARCH article

Front. Med.

Sec. Ophthalmology

Volume 12 - 2025 | doi: 10.3389/fmed.2025.1614046

This article is part of the Research Topic: Innovative Advancements in Eye Image Processing for Improved Ophthalmic Diagnosis.

A Dual Attention and Multi-Scale Fusion Network for Diabetic Retinopathy Image Analysis

Provisionally accepted
Menglin Zhang1, Qi Liu2, Jialei Zhan3, Jinwen Gao4, Dong Xie1*, Jialang Liu3
  • 1Changchun University of Chinese Medicine, Changchun, China
  • 2Jiangxi Academy of Sciences, Nanchang, Jiangxi Province, China
  • 3National University of Defense Technology, Changsha, Hunan Province, China
  • 4Wenzhou Medical University, Wenzhou, Zhejiang Province, China

The final, formatted version of the article will be published soon.

Robust classification of medical images is crucial for reliable automated diagnosis, yet it remains challenging due to heterogeneous lesion appearances and imaging inconsistencies. We introduce DWAM-MSFINET (Dual Weighted Attention and Multi-Scale Feature Integration Network), a novel deep neural architecture designed to address these complexities through a dual-pathway integration of attention and resolution-aware representation learning. Specifically, the Multi-Scale Feature Integration (MSFI) module hierarchically aggregates semantic cues across spatial resolutions, enhancing the network's capacity to identify both fine-grained and coarse pathological patterns. Complementarily, the Dual Weighted Attention Mechanism (DWAM) adaptively modulates feature responses along both spatial and channel dimensions, enabling selective focus on clinically salient structures. This unified framework combines localized sensitivity with global semantic coherence, effectively mitigating intra-class variability and improving diagnostic generalization. DWAM-MSFINET surpasses state-of-the-art CNN- and Transformer-based models, achieving a Top-1 accuracy of 82.59%, outperforming ResNet50 (81.68%) and Swin Transformer (80.26%), with an inference latency of 16.0 ms per image (62.5 images per second) when processing batches of 16 images on an NVIDIA RTX 3090. On the standalone Messidor dataset, it achieved 78.6% Top-1 accuracy (Table 2), demonstrating robustness against domain shift. These results validate the efficacy of our approach for scalable, real-time medical image analysis in clinical workflows. Our code and datasets are available at https://github.com/eleen7/data.
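To make the dual-attention idea concrete, the following is a minimal NumPy sketch of how channel and spatial re-weighting can be combined, in the general spirit of the DWAM described above. The function name, the use of global-average-pooled sigmoid gates, and the tensor layout `(C, H, W)` are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def dual_weighted_attention(x):
    """Illustrative dual (channel + spatial) attention step.

    x: feature map of shape (C, H, W).
    Channel weights are derived from global average pooling followed
    by a sigmoid; spatial weights from the cross-channel mean map
    followed by a sigmoid. Both are hypothetical stand-ins for the
    learned gating functions a real network would use.
    """
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    # Channel attention: one scalar weight per channel from its global mean.
    channel_w = sigmoid(x.mean(axis=(1, 2)))       # shape (C,)
    x = x * channel_w[:, None, None]

    # Spatial attention: one weight per pixel from the cross-channel mean.
    spatial_w = sigmoid(x.mean(axis=0))            # shape (H, W)
    return x * spatial_w[None, :, :]

feat = np.random.rand(8, 16, 16).astype(np.float32)
out = dual_weighted_attention(feat)
print(out.shape)  # (8, 16, 16)
```

Because both gates lie in (0, 1), the output is an attenuated copy of the input in which strongly activated channels and regions are suppressed least; a trained network would replace the fixed pooling-plus-sigmoid gates with learned projections.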

Keywords: Medical image classification, Multi-scale feature fusion, Dual attention mechanism, Adaptive Feature Representation, deep neural networks, Lesion recognition

Received: 18 Apr 2025; Accepted: 25 Aug 2025.

Copyright: © 2025 Zhang, Liu, Zhan, Gao, Xie and Liu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Dong Xie, Changchun University of Chinese Medicine, Changchun, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.