ORIGINAL RESEARCH article
Front. Med.
Sec. Ophthalmology
Volume 12 - 2025 | doi: 10.3389/fmed.2025.1614046
This article is part of the Research Topic "Innovative Advancements in Eye Image Processing for Improved Ophthalmic Diagnosis".
A Dual Attention and Multi-Scale Fusion Network for Diabetic Retinopathy Image Analysis
Provisionally accepted
1 Changchun University of Chinese Medicine, Changchun, China
2 Jiangxi Academy of Sciences, Nanchang, Jiangxi Province, China
3 National University of Defense Technology, Changsha, Hunan Province, China
4 Wenzhou Medical University, Wenzhou, Zhejiang Province, China
Robust classification of medical images is crucial for reliable automated diagnosis, yet it remains challenging because lesion appearance is heterogeneous and imaging conditions are inconsistent. We introduce DWAM-MSFINET (Dual Window Adaptation and Multi-Scale Feature Integration Network), a novel deep neural architecture that addresses these challenges by integrating attention-based and resolution-aware representation learning in two complementary pathways. The Multi-Scale Feature Integration (MSFI) module hierarchically aggregates semantic cues across spatial resolutions, improving the network's ability to recognize both fine-grained and coarse pathological patterns. The Dual Weighted Attention Mechanism (DWAM) adaptively reweights feature responses along both the spatial and channel dimensions, allowing the network to focus on clinically salient structures. Together, the two modules combine local sensitivity with global semantic coherence, reducing the impact of intra-class variability and improving diagnostic generalization. DWAM-MSFINET surpasses state-of-the-art CNN- and Transformer-based models, achieving a Top-1 accuracy of 82.59% compared with 81.68% for ResNet50 and 80.26% for Swin Transformer. On the standalone Messidor dataset, it achieves 78.6% Top-1 accuracy (Table 2), demonstrating robustness against domain shift. Inference latency is 16.0 ms per image when processing batches of 16 images on an NVIDIA RTX 3090 GPU, equivalent to 62.5 images per second. These results validate the efficacy of our approach for scalable, real-time medical image analysis in clinical workflows. We have released our code and datasets at https://github.com/eleen7/data.
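The abstract does not specify the internal layers of DWAM or MSFI, so the sketch below is only an illustration of the two general ideas it names, not the authors' implementation (their code is in the repository linked above). It assumes a CBAM-style design for the dual attention: channel attention from average- and max-pooled descriptors passed through a shared two-layer MLP, followed by spatial attention from channel-pooled maps. The spatial branch is simplified to a per-pixel weighted sum of two scalars instead of CBAM's 7×7 convolution. Multi-scale fusion is approximated by average pooling at several scales, nearest-neighbour upsampling back to the input size, and averaging. All function names, weights, and scale choices are hypothetical, and the pooling scales must divide the feature-map height and width.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """CBAM-style channel attention on a (C, H, W) map (assumed design).

    w1: (C // r, C) and w2: (C, C // r) form a shared MLP with a ReLU.
    """
    def mlp(v: np.ndarray) -> np.ndarray:
        return w2 @ np.maximum(w1 @ v, 0.0)

    avg = x.mean(axis=(1, 2))                 # (C,) global average descriptor
    mx = x.max(axis=(1, 2))                   # (C,) global max descriptor
    weights = sigmoid(mlp(avg) + mlp(mx))     # (C,) per-channel weights in (0, 1)
    return x * weights[:, None, None]


def spatial_attention(x: np.ndarray, w_avg: float, w_max: float, bias: float = 0.0) -> np.ndarray:
    """Simplified spatial attention: 1x1 mixing instead of CBAM's 7x7 conv."""
    avg = x.mean(axis=0)                      # (H, W) channel-averaged map
    mx = x.max(axis=0)                        # (H, W) channel-max map
    weights = sigmoid(w_avg * avg + w_max * mx + bias)  # (H, W) in (0, 1)
    return x * weights[None, :, :]


def dual_attention(x, w1, w2, w_avg=1.0, w_max=1.0):
    """Channel attention followed by spatial attention (CBAM ordering)."""
    return spatial_attention(channel_attention(x, w1, w2), w_avg, w_max)


def avg_pool(x: np.ndarray, k: int) -> np.ndarray:
    """Non-overlapping k x k average pooling; H and W must be divisible by k."""
    c, h, w = x.shape
    return x.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))


def upsample_nearest(x: np.ndarray, k: int) -> np.ndarray:
    """Nearest-neighbour upsampling by an integer factor k."""
    return x.repeat(k, axis=1).repeat(k, axis=2)


def multi_scale_fusion(x: np.ndarray, scales=(1, 2, 4)) -> np.ndarray:
    """Pool at several resolutions, restore the input size, and average.

    Coarse scales summarise large structures; scale 1 keeps fine detail.
    """
    branches = [upsample_nearest(avg_pool(x, k), k) if k > 1 else x for k in scales]
    return np.mean(branches, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, H, W, r = 8, 16, 16, 2                 # hypothetical sizes and reduction ratio
    feat = rng.standard_normal((C, H, W))
    w1 = rng.standard_normal((C // r, C)) * 0.1
    w2 = rng.standard_normal((C, C // r)) * 0.1
    out = dual_attention(multi_scale_fusion(feat), w1, w2)
    print(out.shape)  # (8, 16, 16)
```

Because every attention weight comes from a sigmoid, both branches can only rescale features by factors between 0 and 1; this is what lets the module suppress uninformative channels and regions without changing the feature-map shape.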
Keywords: Medical image classification, Multi-scale feature fusion, Dual attention mechanism, Adaptive feature representation, Deep neural networks, Lesion recognition
Received: 18 Apr 2025; Accepted: 25 Aug 2025.
Copyright: © 2025 Zhang, Liu, Zhan, Gao, Xie and Liu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Dong Xie, Changchun University of Chinese Medicine, Changchun, China
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.