ORIGINAL RESEARCH article
Front. Cell Dev. Biol.
Sec. Molecular and Cellular Pathology
This article is part of the Research Topic: Integrating Cutting-edge Technologies in Ophthalmology: From AI Breakthroughs to Cellular Research Discoveries
Image-Text Guided Fundus Vessel Segmentation via Attention Mechanism and Gated Residual Learning
Provisionally accepted
- 1Nanjing University of Aeronautics and Astronautics, Nanjing, China
- 2Peking University School and Hospital of Stomatology & National Engineering Research Center of Oral Biomaterials and Digital Medical Devices & NHC Key Laboratory of Digital Stomatology, Beijing, China
- 3Shenzhen Eye Hospital, Shenzhen Eye Medical Center, Southern Medical University, Shenzhen, China
Background: Fundus vessel segmentation is crucial for the early diagnosis of ocular diseases. However, existing deep learning-based methods, although effective at detecting coarse vessels, still struggle to segment fine vessels and rely heavily on time-consuming, labor-intensive pixel-level annotations.

Methods: To alleviate these limitations, this study proposes an image-text guided segmentation model enhanced with the Squeeze-and-Excitation (SE) module and gated residual learning. Concretely, multimodal fundus vessel datasets with text labels are first constructed, supporting our pioneering introduction of an image-text model into fundus vessel segmentation. Second, an improved image-text model is designed around two components: (1) an SE module embedded in the CNN backbone to adaptively recalibrate channel weights and enhance vessel feature representation; (2) gated residual learning integrated into the ViT backbone to dynamically regulate the information flow between image and text features.

Results: Extensive quantitative and qualitative experiments on two publicly available datasets, DRIVE and ROSE-1, demonstrate that the proposed model achieves superior segmentation performance. Specifically, on the DRIVE dataset the model attains an F1-score of 82.01%, an accuracy of 95.72%, a sensitivity of 83.25%, and a specificity of 97.43%. On the ROSE-1 dataset it records an F1-score of 86.34%, an accuracy of 94.61%, a sensitivity of 90.14%, and a specificity of 95.80%. Compared with most deep learning methods, these results show that the improved model is competitive, indicating its feasibility and potential value for fundus vessel segmentation and suggesting a new research direction in this field.
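The abstract does not give implementation details of the two components, so the following is only a minimal NumPy sketch of the generic mechanisms they name: SE-style channel recalibration (squeeze by global average pooling, excitation by a two-layer bottleneck, then channel rescaling) and a gated residual fusion of image and text tokens. The function names, weight shapes, and the particular gating formula (sigmoid gate on concatenated tokens modulating a residual path) are illustrative assumptions, not the authors' architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_recalibrate(feat, w1, w2):
    """Squeeze-and-Excitation channel recalibration (generic sketch).

    feat: (C, H, W) feature map
    w1:   (C // r, C) squeeze FC weights (r = reduction ratio)
    w2:   (C, C // r) excitation FC weights
    """
    z = feat.mean(axis=(1, 2))                   # squeeze: global average pool -> (C,)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))    # excitation: FC -> ReLU -> FC -> sigmoid
    return feat * s[:, None, None]               # rescale each channel by its weight

def gated_residual_fuse(img_tok, txt_tok, wg):
    """Gated residual fusion of image and text tokens (illustrative assumption).

    img_tok, txt_tok: (N, D) token sequences from the two modalities
    wg:               (2 * D, D) gate projection weights
    """
    g = sigmoid(np.concatenate([img_tok, txt_tok], axis=-1) @ wg)  # (N, D) gate
    return img_tok + g * txt_tok                 # residual path modulated by the gate
```

Because the excitation output lies in (0, 1), each channel of the recalibrated map is attenuated rather than amplified; the gate plays an analogous role, controlling how much text information is injected into the image stream.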
Keywords: Fundus vessel segmentation, deep learning, Image-text, Squeeze-and-excitation, gated residual learning
Received: 22 Sep 2025; Accepted: 30 Nov 2025.
Copyright: © 2025 Xu, Liu, Shen, Tan, Tian, Chi and Yang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Jianguo Xu
Wei Chi
Weihua Yang
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
