ORIGINAL RESEARCH article
Front. Cell Dev. Biol.
Sec. Molecular and Cellular Pathology
Volume 13 - 2025 | doi: 10.3389/fcell.2025.1642539
This article is part of the Research Topic: Artificial Intelligence Applications in Chronic Ocular Diseases, Volume II
Multimodal Reasoning Agent for Enhanced Ophthalmic Decision-Making: A Preliminary Real-World Clinical Validation
Provisionally accepted
Shenzhen Eye Hospital, Shenzhen, China
Although large language models (LLMs) show significant potential in clinical practice, accurate diagnosis and treatment planning in ophthalmology require multimodal integration of imaging, clinical history, and guideline-based knowledge. Current LLMs predominantly focus on unimodal language tasks and face limitations in specialized ophthalmic diagnosis due to domain knowledge gaps, hallucination risks, and inadequate alignment with clinical workflows. This study introduces a structured reasoning agent (ReasonAgent) that integrates a multimodal visual analysis module, a knowledge retrieval module, and a diagnostic reasoning module to address the limitations of current AI systems in ophthalmic decision-making. Validated on 30 real-world ophthalmic cases (27 common and 3 rare diseases), ReasonAgent demonstrated diagnostic accuracy comparable to that of ophthalmology residents (β=-0.07, p=0.65). In treatment planning, however, it significantly outperformed both GPT-4o (p=0.01) and residents (β=1.71, p<0.001), particularly excelling in rare disease scenarios (all p<0.05). While GPT-4o showed vulnerabilities in rare cases (90.48% low diagnostic scores), ReasonAgent's hybrid design mitigated errors through structured reasoning. Statistical analysis identified significant case-level heterogeneity (diagnosis ICC=0.28), highlighting the need for domain-specific AI solutions in complex clinical contexts. This framework establishes a novel paradigm for domain-specific AI in real-world clinical practice, demonstrating the potential of modularized architectures to advance decision fidelity through human-aligned reasoning pathways.
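For readers unfamiliar with the intraclass correlation coefficient (ICC) reported above, the following is a minimal sketch of how an ICC is commonly derived from the variance components of a random-intercept model. The abstract does not state how the ICC was computed, so the model form (case as the grouping factor), the function name `icc_from_variances`, and the variance values are all assumptions chosen for illustration only; they are not taken from the study.

```python
def icc_from_variances(between_case_var: float, residual_var: float) -> float:
    """ICC for a random-intercept model: the share of total variance
    attributable to differences between groups (here, assumed: cases)."""
    total = between_case_var + residual_var
    if total <= 0:
        raise ValueError("total variance must be positive")
    return between_case_var / total


# Hypothetical variance components, picked only so that the ratio
# reproduces the reported diagnosis ICC of 0.28; not study data.
icc = icc_from_variances(between_case_var=0.28, residual_var=0.72)
print(round(icc, 2))
```

Under this reading, an ICC of 0.28 would mean that roughly 28% of the variance in diagnostic scores is attributable to differences between cases rather than to the rater (ReasonAgent, GPT-4o, or resident) or to residual noise.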
Keywords: artificial intelligence, large language models, reasoning agent, GPT-4o, ocular diseases
Received: 06 Jun 2025; Accepted: 10 Jul 2025.
Copyright: © 2025 Zhuang, Fang, Li, Bai, Hei, Feng, Li and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Shaochong Zhang, Shenzhen Eye Hospital, Shenzhen, China
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.