AUTHOR=Gu Xingwang, Zhou Yang, Zhao Jianchun, Zhang Hongzhe, Pan Xinlei, Li Bing, Zhang Bilei, Wang Yuelin, Xia Song, Lin Hailan, Wang Jie, Ding Dayong, Li Xirong, Wu Shan, Yang Jingyuan, Chen Youxin

TITLE=Diagnostic performance and generalizability of deep learning for multiple retinal diseases using bimodal imaging of fundus photography and optical coherence tomography

JOURNAL=Frontiers in Cell and Developmental Biology

VOLUME=Volume 13 - 2025

YEAR=2025

URL=https://www.frontiersin.org/journals/cell-and-developmental-biology/articles/10.3389/fcell.2025.1665173

DOI=10.3389/fcell.2025.1665173

ISSN=2296-634X

ABSTRACT=Purpose: To develop and evaluate deep learning (DL) models for detecting multiple retinal diseases using bimodal imaging of color fundus photography (CFP) and optical coherence tomography (OCT), and to assess their diagnostic performance and generalizability. Methods: This cross-sectional study used 1,445 CFP-OCT pairs from 1,029 patients across three hospitals. Five bimodal models were developed, and the best-performing model (Fusion-MIL) was tested and compared with the single-modality CFP-MIL and OCT-MIL models. Models were trained on 710 pairs (Maestro device), validated on 241, and tested on 255 (dataset 1). Additional tests used different devices and scanning patterns: 88 pairs (dataset 2, DRI-OCT), 91 pairs (dataset 3, DRI-OCT), and 60 pairs (dataset 4, Visucam/VG200 OCT). Seven retinal conditions were assessed: normal, diabetic retinopathy, dry and wet age-related macular degeneration, pathologic myopia (PM), epiretinal membrane, and macular edema. PM ATN (atrophy, traction, neovascularization) classification was trained and tested on another 1,184 pairs. The area under the receiver operating characteristic curve (AUC) was calculated to evaluate performance. Results: Fusion-MIL achieved a mean AUC of 0.985 (95% CI 0.971–0.999) in dataset 2, outperforming CFP-MIL (0.876, P < 0.001) and OCT-MIL (0.982, P = 0.337), as well as in dataset 3 (0.978 vs. 0.913, P < 0.001 and 0.962, P = 0.025) and dataset 4 (0.962 vs. 0.962, P < 0.001 and 0.962, P = 0.079). Fusion-MIL also achieved superior accuracy. In ATN classification, AUC ranged from 0.902 to 0.997 for atrophy, 0.869 to 0.982 for traction, and 0.742 to 0.976 for neovascularization. Conclusion: Bimodal Fusion-MIL improved diagnosis over single-modal models and showed strong generalizability across devices and detailed grading ability, making it valuable for various scenarios.