AUTHOR=Gu Xingwang, Zhou Yang, Zhao Jianchun, Zhang Hongzhe, Pan Xinlei, Li Bing, Zhang Bilei, Wang Yuelin, Xia Song, Lin Hailan, Wang Jie, Ding Dayong, Li Xirong, Wu Shan, Yang Jingyuan, Chen Youxin TITLE=Diagnostic performance and generalizability of deep learning for multiple retinal diseases using bimodal imaging of fundus photography and optical coherence tomography JOURNAL=Frontiers in Cell and Developmental Biology VOLUME=Volume 13 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/cell-and-developmental-biology/articles/10.3389/fcell.2025.1665173 DOI=10.3389/fcell.2025.1665173 ISSN=2296-634X ABSTRACT=Purpose: To develop and evaluate deep learning (DL) models for detecting multiple retinal diseases using bimodal imaging of color fundus photography (CFP) and optical coherence tomography (OCT), and to assess their diagnostic performance and generalizability. Methods: This cross-sectional study used 1,445 CFP-OCT pairs from 1,029 patients across three hospitals. Five bimodal models were developed, and the best-performing model (Fusion-MIL) was tested and compared with CFP-MIL and OCT-MIL. Models were trained on 710 pairs (Maestro device), validated on 241, and tested on 255 (dataset 1). Additional tests used different devices and scanning patterns: 88 pairs (dataset 2, DRI-OCT), 91 pairs (dataset 3, DRI-OCT), and 60 pairs (dataset 4, Visucam/VG200 OCT). Seven retinal conditions were assessed: normal, diabetic retinopathy, dry and wet age-related macular degeneration, pathologic myopia (PM), epiretinal membrane, and macular edema. PM ATN (atrophy, traction, neovascularization) classification was trained and tested on another 1,184 pairs. The area under the receiver operating characteristic curve (AUC) was calculated to evaluate performance. Results: Fusion-MIL achieved a mean AUC of 0.985 (95% CI 0.971–0.999) in dataset 2, compared with 0.876 for CFP-MIL (P < 0.001) and 0.982 for OCT-MIL (P = 0.337). The corresponding comparisons were 0.978 vs. 0.913 (P < 0.001) and 0.962 (P = 0.025) in dataset 3, and 0.962 vs. 0.962 (P < 0.001) and 0.962 (P = 0.079) in dataset 4. Fusion-MIL also achieved superior accuracy. In ATN classification, AUCs ranged from 0.902 to 0.997 for atrophy, 0.869 to 0.982 for traction, and 0.742 to 0.976 for neovascularization. Conclusion: Bimodal Fusion-MIL improved diagnosis over single-modal models, showing strong generalizability across devices and detailed grading ability, which makes it valuable for various clinical scenarios.
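
To illustrate the mean AUC metric reported in the abstract, here is a minimal sketch in Python. It assumes each condition is scored as a separate binary task and that the per-condition AUCs are averaged without weighting; the abstract does not state the averaging scheme, so this is an assumption and not the authors' evaluation code. The function names `binary_auc` and `mean_auc` and the toy data are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(per_condition: Dict[str, Tuple[Sequence[int], Sequence[float]]]) -> float:
    """Unweighted mean of per-condition AUCs (assumed averaging scheme)."""
    aucs = [binary_auc(y, s) for y, s in per_condition.values()]
    return sum(aucs) / len(aucs)


# Hypothetical toy data: (ground-truth labels, model scores) per condition.
toy = {
    "diabetic retinopathy": ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]),
    "macular edema": ([1, 0, 0, 1], [0.8, 0.2, 0.3, 0.7]),
}
toy_mean = mean_auc(toy)
```

The pairwise comparison is quadratic in the number of cases, which is acceptable for test sets of the size described here (60 to 255 pairs); a rank-based implementation would be preferable for larger sets.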