SYSTEMATIC REVIEW article

Front. Med., 07 May 2025

Sec. Ophthalmology

Volume 12 - 2025 | https://doi.org/10.3389/fmed.2025.1519768

Artificial intelligence versus manual screening for the detection of diabetic retinopathy: a comparative systematic review and meta-analysis

    Hasan Nawaz Tahir 1*

    Naseer Ullah 2*

    Mursala Tahir 3

    Inbaraj Susai Domnic 4

    Ramaprabha Prabhakar 5

    Semmal Syed Meerasa 5

    Ahmed Ibrahim AbdElneam 6,7

    Shahnawaz Tahir 8

    Yousaf Ali 1

  • 1. Department of Community Medicine, College of Medicine, Dwadimi, Shaqra University, Shaqra, Saudi Arabia

  • 2. Department of Community Medicine, Khyber Medical College Peshawar, Peshawar, Pakistan

  • 3. Department of Community Medicine, Liaquat National Hospital and Medical College, Jinnah Sindh Medical University, Karachi, Pakistan

  • 4. Department of Pharmacology, College of Medicine, Dwadimi, Shaqra University, Shaqra, Saudi Arabia

  • 5. Department of Physiology, College of Medicine, Shaqra University, Shaqra, Saudi Arabia

  • 6. Departments of Clinical Biochemistry and Basic Medical Sciences, College of Medicine, Dwadimi, Shaqra University, Shaqra, Saudi Arabia

  • 7. Molecular Genetics and Enzymology Department, Human Genetics and Genome Research Institute, National Research Center, Dokki, Cairo, Egypt

  • 8. Department of Gastroenterology, Dow University of Health Sciences, Karachi, Pakistan

Abstract

Background:

Diabetic retinopathy is one of the leading causes of blindness globally among individuals with diabetes mellitus. Early detection through screening can help prevent disease progression. Artificial intelligence (AI)-assisted screening has recently emerged as an alternative to traditional manual screening methods. This diagnostic test accuracy (DTA) review compares the sensitivity and specificity of AI versus manual screening for detecting diabetic retinopathy in both dilated and un-dilated eyes.

Methods:

A systematic review and meta-analysis was conducted to compare AI and manual screening for diabetic retinopathy. It included 25 observational studies (cross-sectional, validation, and cohort) published between January 2015 and December 2024, covering a total of 613,690 images used for screening. The outcomes were sensitivity and specificity. Risk of bias was assessed using the QUADAS-2 tool for validation studies, the AXIS tool for cross-sectional studies, and the Newcastle-Ottawa Scale for cohort studies.

Results:

For un-dilated eyes, AI screening showed a pooled sensitivity of 0.90 [95% CI: 0.85–0.94] and a pooled specificity of 0.94 [95% CI: 0.91–0.96], while manual screening showed a pooled sensitivity of 0.79 [95% CI: 0.60–0.91] and a pooled specificity of 0.99 [95% CI: 0.98–0.99]. For dilated eyes, the pooled sensitivity of AI screening was 0.95 [95% CI: 0.91–0.97] and the pooled specificity was 0.87 [95% CI: 0.79–0.92], while manual screening had a sensitivity of 0.90 [95% CI: 0.87–0.92] and a specificity of 0.99 [95% CI: 0.99–1.00]. These data show comparable sensitivities and specificities for AI and manual screening, with AI achieving higher sensitivity.

Conclusion:

AI-assisted screening for diabetic retinopathy shows sensitivity and specificity comparable to manual screening. These results suggest that AI can be a reliable alternative in clinical settings, increasing early detection rates and reducing the burden on ophthalmologists. Further research is needed to validate these findings.

Systematic review registration:

https://www.crd.york.ac.uk/PROSPERO/home, CRD42024596611.

Introduction

Diabetic retinopathy (DR) is one of the most prevalent microvascular complications of diabetes, characterized by damage to the retina due to prolonged hyperglycemia. It remains a leading cause of blindness globally, particularly among working-age adults. The World Health Organization (WHO) estimates that over 422 million people worldwide have diabetes (1); approximately 103.12 million adults are affected by diabetic retinopathy, a number projected to reach 160.50 million by 2045 (2). In advanced stages, untreated DR can lead to severe vision impairment and blindness. According to a 2023 WHO report, 3.9 million people globally have distance vision impairment or blindness from diabetic retinopathy (3). Early detection and timely treatment can significantly reduce the risk of vision loss, but widespread screening remains a challenge, particularly in low-resource settings.

Screening for diabetic retinopathy has traditionally been performed through manual methods, including fundus photography, direct ophthalmoscopy, mydriatic and non-mydriatic retinal photography, slit-lamp microscopy, and retinal video recording, conducted by trained ophthalmologists. However, these methods are often time-consuming and require specialized equipment and personnel, limiting their availability in certain regions (4). Recent technological advancements have led to the development of automated screening methods using artificial intelligence (AI). AI-based algorithms, particularly deep learning models, can analyze retinal images and detect signs of DR with sensitivity and specificity comparable to human graders. These systems have the potential to increase screening efficiency, reduce costs, and provide access to screening in underserved populations. AI has been recognized for its ability to identify DR and classify its severity, making it a valuable tool in large-scale screening programs.

Few systematic reviews and meta-analyses have evaluated the performance of AI-based systems for DR screening. These meta-analyses reported high sensitivity and specificity for AI algorithms (5–8). Another review (9) supported these findings but highlighted variability in performance. However, no review has directly compared AI with manual methods to clarify the role of AI in different screening contexts.

This review aims to evaluate the performance of AI versus manual screening in DR detection. We systematically review the sensitivity and specificity of AI and manual methods in both dilated and un-dilated eyes.

Methods

Search strategy

We searched PubMed and Google Scholar for studies of AI and manual screening methods for diabetic retinopathy published between January 2015 and September 2024. A second search in February 2025 added 13 studies, bringing the total to 25 included studies. The search strategy combined MeSH terms and keywords, including “diabetic retinopathy,” “artificial intelligence,” “deep learning,” “manual screening,” and “automated detection.” Only English-language articles were included, and only if they evaluated AI-based or manual screening methods for DR detection and reported sensitivity and specificity outcomes.

Inclusion criteria

Studies were included if they were observational or validation studies, evaluated AI algorithms or manual screening for DR in patients aged 15 to 90 years diagnosed with DR, and reported sensitivity and specificity outcomes for either dilated or un-dilated eyes. Studies were excluded if they did not report the outcomes of interest (sensitivity and specificity), if the study authors did not respond, or if the full text was not available.

Study selection

Two independent reviewers first screened the articles by title and abstract. Full texts were then obtained for articles that met the inclusion criteria or whose eligibility was uncertain. The same reviewers independently assessed the full texts. Discrepancies were resolved through discussion or, if needed, consultation with a third reviewer. The selection process was documented in a PRISMA flow diagram (Figure 1).

Figure 1

PRISMA flow diagram for included studies.

Quality assessment

Two independent reviewers assessed the quality of each study, evaluating selection bias, outcome/exposure assessment bias, follow-up bias, measurement bias, sample representativeness, reporting bias, index test bias, reference standard bias, flow and timing bias, and ethical considerations. Three tools were used according to study type: QUADAS-2 for validation studies, the AXIS tool for cross-sectional studies, and the Newcastle-Ottawa Scale for cohort studies. The resulting risk-of-bias judgments informed the strength of evidence of the meta-analysis results.

Data extraction

Sensitivity and specificity data for AI and manual screening methods were extracted for dilated or un-dilated eyes using a standardized data collection form. Extracted information included study characteristics such as first author, country, number of participants, number of images, age of participants, comparison to human graders, photographic protocol, and reference standard, together with the outcomes of interest (sensitivity and specificity). Two reviewers independently extracted data to minimize bias; disagreements were resolved by consensus or by consulting a third reviewer. The information was initially entered into Excel tables and then transferred to Review Manager 5.4 and R for analysis. The risk of bias was assessed using the Newcastle-Ottawa Scale for cohort studies, the AXIS tool for cross-sectional studies, and the QUADAS-2 tool for validation studies.

Results

Study characteristics

A total of 25 studies evaluating AI-based and manual screening for diabetic retinopathy met the inclusion criteria of this review. Twelve studies reported images of un-dilated eyes screened by AI-based or manual methods, while 14 studies reported images of dilated eyes screened by AI-based and manual methods. Twelve of the 25 studies were prospective (10–21), and 13 were retrospective (22–34).

Sample sizes ranged from 54 to 5,738 participants in the 19 studies that reported them, for a total of 29,358 participants; six studies reported only the number of images. Across all 25 studies, 613,690 images were used for screening, covering a broad range of geographic settings (outpatient, hospital, community-based, and nationwide survey) and populations. Details are given in Table 1.

Table 1

| Study | Country | Study setting | No. of images | No. of participants | Prospective | Compared to human graders | Photographic protocol | Reference standard |
|---|---|---|---|---|---|---|---|---|
| Ting et al. 2017 (25) | Singapore | Community-based and clinic-based populations | 225,302 | Not mentioned | No | Yes | 2 fields images, mydriasis | Grading by a retinal specialist (>5 years’ experience in conducting diabetic retinopathy assessment) |
| Sosale et al. 2020 (15) | India | Outpatient | 618 | 297 | Yes | Yes | 3-fields dilated retinal imaging, mydriasis | Adjudicated diagnosis of the two fellowship-trained vitreoretinal specialists |
| Surya et al. 2023 (16) | India | Outpatient | 1,234 | 1,085 | Yes | Yes | 5 fields imaging, no mydriasis | Diagnosis made by the specialist ophthalmologists |
| Piatti et al. 2024 (13) | Italy | Outpatient | 602 | 598 | Yes | Yes | 2 field imaging, mydriasis | Classification of the retinal images by the human ophthalmologist grader |
| Sedova et al. 2022 (14) | Austria | Outpatient | 113 | 54 | Yes | Yes | 45-degree, 2 fields imaging, no mydriasis | Manual grading of images by retina specialists |
| Ipp 2021 (10) | United States | Outpatient | 4,004 | 893 | Yes | Yes | 4-wide field imaging for no mydriasis and 2 fields imaging, no mydriasis | Grading of 4-wide-field stereoscopic dilated fundus photographs by the WFPRC |
| Tokuda et al. 2022 (17) | Japan | Inpatient | 69 | 70 | Yes | No | 45-degree, no mydriasis | Grading of the fundus images by three retinal experts according to the ICDRS scale |
| Acharyya et al. 2024 (22) | India | Outpatient | 1,783 | Not mentioned | No | Yes | 45-degree, no mydriasis | Consensus of three blinded vitreoretinal specialists, with an arbitrator resolving any disagreements |
| Arenas-Cavalli et al. 2022 (23) | Chile | Outpatient | 1,142 | 1,123 | No | Yes | 45-degree, 2 fields, variable for case to case | Assessment performed remotely by a clinical ophthalmologist |
| Li et al. 2022 (11) | China | Hospital-based study | 1,464 | 1,147 | Yes | Yes | 45-degree, no mydriasis | Grading of the retinal fundus images by a certified retinal specialist with more than 12 years of experience, who used the 5-point (ICDRS) scale to assign grades |
| Limwattanayingyong et al. 2020 (24) | Thailand | Nationwide screening program | 11,148 | 5,738 | No | Yes | 45-degree, 1 field, no mydriasis | Grading of the retinal photographs by a panel of three IRS |
| Lupidi et al. 2023 (12) | Italy | Outpatient | 831 | 251 | Yes | Yes | 50-degree, 1 field, no mydriasis | Fundus biomicroscopic examination by an experienced retina specialist |
| González-Gonzalo et al. 2020 (26) | Sweden | Dataset | 600 | 288 | No | Yes | 45-degree field, no mydriasis | Certified ophthalmologist with over 12 years of experience |
| Lin et al. 2018 (27) | United States | Dataset | 33,000 |  | No | No | Not mentioned | Well-trained clinicians according to the International Clinical Diabetic Retinopathy scale |
| Li et al. 2019 (28) | China | Hospital-based study | 19,233 | 5,278 | No | Yes | Inner circle of retina | Expert committee of three senior ophthalmologists |
| Soto-Pedre et al. 2015 (18) | Spain | Dataset | 10,556 | 5,278 | Yes | Yes | 45-degree field, mydriasis | One retinal specialist |
| Hansen et al. 2015 (29) | Kenya | Community-based | 6,788 | 3,460 | No | Yes | 2 field, mydriasis | Moorfields Eye Hospitals Reading Centre in the UK |
| Rajalakshmi et al. 2018 (19) | India | Hospital-based study | 2,408 | 301 | Yes | Yes | 45-degree field, mydriasis | Ophthalmologists (retina specialists) |
| Gargeya and Leng 2017 (30) | United States | Dataset | 75,137 | Not mentioned | No | Yes | Inner retinal circle | Panel of human retinal specialists |
| Wang et al. 2018 (20) | India | Outpatient | 1,661 | 383 | Yes | Yes | Non-steered central image, mydriasis | Certified diabetic retinopathy (DR) graders at the Doheny Image Reading Center (DIRC) |
| Abràmoff et al. 2016 (31) | United States | Dataset | 1,748 | 874 | No | Yes | 45-degree field, mydriasis | Three US Board certified retinal specialists |
| Zhang et al. 2019 (32) | China | Hospital-based study | 13,767 | 1,872 | No | Yes | 45-degree field, mydriasis | One retinal specialist with over 27 years of experience and two ophthalmologists with over 5 years of experience |
| Li et al. 2018 (21) | China and Australia | Hospital-based study | 106,244 | Not mentioned | Yes | Yes | 45-degree field, mydriasis and non-mydriasis | Panel of ophthalmologists |
| Zhang et al. 2022 (33) | China | Dataset | 92,894 | Not mentioned | No | Yes | Fundus images | Ophthalmologist used international grading system for diabetic retinopathy |
| Kumar et al. 2016 (34) | India | Hospital-based study | 1,344 | 368 | No | Yes | 50-degree field, mydriasis | Panel of expert ophthalmologists at the Regional Institute of Ophthalmology |

Characteristics of included studies.

WFPRC, Wisconsin Fundus Photograph Reading Center; IRS, international retina specialists; ICDRS, International Clinical Diabetic Retinopathy Severity scale. aThese are the external datasets for which accuracy estimates were included in the meta-analysis; datasets used for training and internal validation were not included. b. “Compared to human graders” refers to whether retinal images were graded and compared with the results provided by AI with human graders. c. Where specified the mydriatic or non-mydriatic imaging protocols were followed depending on the study setting, with multiple fields captured. d. For certain studies, the primary reference standard was provided by expert ophthalmologists or retinal specialists with a minimum of 5 years’ experience in diabetic retinopathy assessment, though in some cases, decisions were made through consensus from multiple specialists or reading centers. e. External validation of these studies was conducted in clinical settings such as hospital-based, outpatient, or community-based screening programs, as specified.

Test accuracy

In the comparison of AI-based diabetic retinopathy (DR) screening with manual methods, the SROC curves for dilated eyes show wider confidence intervals for specificity across the included studies, indicating variability in diagnostic performance.

Un-dilated eye screening tends to achieve high sensitivity and specificity, with most studies reporting values above 0.90 for both. This suggests a reliable ability of AI algorithms to correctly identify DR in un-dilated eye examinations. The studies generally cluster around the upper-left corner of the plot, indicating strong diagnostic performance with low rates of false positives and false negatives.

Overall, the SROC plots indicate that AI models demonstrate robust diagnostic accuracy for detecting diabetic retinopathy in both dilated and un-dilated settings, with higher sensitivity than, and specificity close to, manual screening in most studies (Figures 2, 3).
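As an illustration of what clustering near the upper-left corner means, the following minimal sketch places a study in ROC space from its sensitivity and specificity. It is not the procedure used to draw the SROC curves in this review; the function name `roc_summary` is hypothetical, and the three input pairs are the sensitivity and specificity values reported for AI screening of un-dilated eyes in Table 2.

```python
import math


def roc_summary(sensitivity, specificity):
    """Locate one study in ROC space and summarize how close it sits to the ideal corner."""
    fpr = 1.0 - specificity  # x-axis of the ROC plot (false positive rate)
    # Euclidean distance to the ideal point (FPR = 0, TPR = 1); smaller is better.
    distance = math.hypot(fpr, 1.0 - sensitivity)
    # Youden's J: 0 means no better than chance, 1 means a perfect test.
    youden_j = sensitivity + specificity - 1.0
    return {"fpr": fpr, "tpr": sensitivity,
            "distance_to_corner": distance, "youden_j": youden_j}


# Reported (sensitivity, specificity) pairs for AI screening, un-dilated eyes (Table 2).
reported = {
    "Lupidi et al. 2023": (0.97, 0.97),
    "Sedova et al. 2022": (0.96, 0.80),
    "Surya et al. 2023": (0.81, 0.91),
}

summaries = {name: roc_summary(se, sp) for name, (se, sp) in reported.items()}
for name, s in summaries.items():
    print(f"{name}: FPR={s['fpr']:.2f}, TPR={s['tpr']:.2f}, "
          f"distance={s['distance_to_corner']:.3f}, J={s['youden_j']:.2f}")
```

A study with a small distance and a Youden's J close to 1 lies near the upper-left corner, combining few false positives with few false negatives.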

Figure 2

SROC plot for un-dilated eyes screening.

Figure 3

SROC plot for dilated eyes screening.

Sensitivity

AI-based screening of dilated eyes showed consistent sensitivity across studies, with a pooled sensitivity of 0.95 (95% CI: 0.91, 0.97). For manual screening of dilated eyes, the pooled sensitivity was 0.90 (95% CI: 0.87, 0.92), lower than that of AI (Table 2 and Figure 4). For un-dilated eyes, AI screening achieved a pooled sensitivity of 0.92 (95% CI: 0.87, 0.95), whereas manual screening of un-dilated eye images achieved a pooled sensitivity of 0.79 (95% CI: 0.60, 0.91) (Table 2 and Figure 5). In both settings, AI-based screening showed higher sensitivity than manual screening.
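To make the idea of a pooled sensitivity concrete, the sketch below pools per-study sensitivities with a univariate DerSimonian-Laird random-effects model on the logit scale. This is an assumption for illustration only: the review does not state which model Review Manager 5.4 and R were configured to fit, so the output is not expected to reproduce the pooled values above. The function name `pool_proportions` is hypothetical; the counts are the TP and FN values of the manual screening rows for un-dilated eyes in Table 2.

```python
import math


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def pool_proportions(events_totals, z=1.96):
    """Random-effects (DerSimonian-Laird) pooling of proportions on the logit scale.

    events_totals: list of (events, total), e.g. (TP, TP + FN) for sensitivity.
    Returns (pooled estimate, lower CI bound, upper CI bound).
    """
    effects, variances = [], []
    for events, total in events_totals:
        # A 0.5 continuity correction is applied to every study (a simplifying
        # assumption) so the logit stays finite when a cell is zero.
        a = events + 0.5
        b = total - events + 0.5
        effects.append(math.log(a / b))
        variances.append(1.0 / a + 1.0 / b)

    # Fixed-effect weights and Cochran's Q, needed to estimate tau^2.
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add the between-study variance to each study.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return inv_logit(pooled), inv_logit(pooled - z * se), inv_logit(pooled + z * se)


# (TP, TP + FN) for manual screening of un-dilated eyes, from Table 2.
manual_undilated = [(165, 165 + 59), (519, 519 + 71), (21, 21 + 1), (22, 22 + 1)]
estimate, lower, upper = pool_proportions(manual_undilated)
print(f"pooled sensitivity {estimate:.2f} (95% CI: {lower:.2f}, {upper:.2f})")
```

Pooling on the logit scale keeps the estimate and its confidence limits inside the 0 to 1 range, which matters when several studies report sensitivities close to 1.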

Table 2

| Study | Outcome | Dilated/Un-dilated eye | TP | FP | FN | TN | Sensitivity (95% CI) | Specificity (95% CI) |
|---|---|---|---|---|---|---|---|---|
| Piatti et al. 2024 (13) | Mild DR with AI | Dilated | 70 | 102 | 102 | 399 | 0.41 [0.33, 0.48] | 0.93 [0.90, 0.95] |
| Piatti et al. 2024 (13) | Moderate and beyond with AI | Dilated | 35 | 0 | 0 | 0 | 1.00 [0.90, 1.00] | Not estimable |
| Sosale et al. 2020 (15) | AI for referable DR | Dilated | 120 | 23 | 23 | 153 | 0.84 [0.77, 0.90] | 0.99 [0.96, 1.00] |
| Sosale et al. 2020 (15) | AI for any DR | Dilated | 105 | 8 | 8 | 168 | 0.93 [0.87, 0.97] | 0.91 [0.86, 0.95] |
| Ting et al. 2017 (25) | AI for referable DR | Dilated | 3,057 | 9,172 | 9,172 | 100,097 | 0.25 [0.24, 0.26] | 1.00 [1.00, 1.00] |
| Ting et al. 2017 (25) | Moderate and beyond with AI | Dilated | 676 | 9,969 | 9,969 | 102,003 | 0.06 [0.06, 0.07] | 1.00 [1.00, 1.00] |
| Ipp 2021 (10) | AI for Mod and beyond | Dilated | 356 | 375 | 375 | 2,630 | 0.49 [0.45, 0.52] | 0.99 [0.99, 1.00] |
| Soto-Pedre et al. 2015 (18) | AI screening for DR | Dilated | 535 | 1,034 | 1,034 | 2,277 | 0.34 [0.32, 0.37] | 0.69 [0.67, 0.70] |
| Wang et al. 2018 (20) | AI screening for DR | Dilated | 213 | 205 | 205 | 206 | 0.51 [0.46, 0.56] | 0.50 [0.45, 0.55] |
| Abràmoff et al. 2016 (31) | AI screening for DR | Dilated | 182 | 88 | 88 | 598 | 0.67 [0.61, 0.73] | 0.87 [0.84, 0.90] |
| Hansen et al. 2015 (29) | AI screening for DR | Dilated | 91 | 900 | 900 | 2,093 | 0.09 [0.07, 0.11] | 0.70 [0.68, 0.72] |
| Rajalakshmi et al. 2018 (19) | AI screening for DR | Dilated | 184 | 21 | 21 | 84 | 0.90 [0.85, 0.94] | 0.80 [0.71, 0.87] |
| Kumar et al. 2016 (34) | AI screening for DR | Dilated | 722 | 176 | 176 | 176 | 0.80 [0.78, 0.83] | 0.50 [0.45, 0.55] |
| Zhang et al. 2019 (32) | AI screening for DR (Grading system) | Dilated | 414 | 4 | 4 | 344 | 0.99 [0.98, 1.00] | 0.99 [0.97, 1.00] |
| Zhang et al. 2019 (32) | AI screening for DR (identification system) | Dilated | 412 | 8 | 8 | 340 | 0.98 [0.96, 0.99] | 0.98 [0.96, 0.99] |
| Zhang et al. 2022 (33) | AI screening for DR (InceptionV3_299) | Dilated | 12,440 | 3,580 | 3,580 | 35,953 | 0.78 [0.77, 0.78] | 0.91 [0.91, 0.91] |
| Zhang et al. 2022 (33) | AI screening for DR (InceptionV3_896) | Dilated | 12,984 | 3,676 | 3,676 | 35,857 | 0.78 [0.77, 0.79] | 0.91 [0.90, 0.91] |
| Sedova et al. 2022 (14) | AI screening for DR | Un-dilated | 27 | 1 | 1 | 16 | 0.96 [0.82, 1.00] | 0.80 [0.56, 0.94] |
| Ipp 2021 (10) | AI for Mod to Severe | Un-dilated | 331 | 345 | 345 | 2,342 | 0.49 [0.45, 0.53] | 0.99 [0.99, 1.00] |
| Surya et al. 2023 (16) | AI screening for DR | Un-dilated | 42 | 10 | 10 | 283 | 0.81 [0.67, 0.90] | 0.91 [0.88, 0.94] |
| Limwattanayingyong et al. 2020 (24) | 1st screening DL for DR | Un-dilated | 669 | 102 | 102 | 4,932 | 0.87 [0.84, 0.89] | 0.99 [0.99, 1.00] |
| Limwattanayingyong et al. 2020 (24) | 2nd screening DL for DR | Un-dilated | 190 | 84 | 84 | 3,853 | 0.69 [0.64, 0.75] | 0.99 [0.99, 1.00] |
| Arenas-Cavalli et al. 2022 (23) | AI screening for DR | Un-dilated | 226 | 227 | 227 | 657 | 0.50 [0.45, 0.55] | 0.98 [0.97, 0.99] |
| Lupidi et al. 2023 (12) | AI screening for DR (Selena +) | Un-dilated | 121 | 4 | 4 | 122 | 0.97 [0.92, 0.99] | 0.97 [0.92, 0.99] |
| Acharyya et al. 2024 (22) | AI screening for DR | Un-dilated | 848 | 128 | 128 | 732 | 0.87 [0.85, 0.89] | 0.91 [0.88, 0.93] |
| Li et al. 2022 (11) | AI screening for DR | Un-dilated | 86 | 25 | 25 | 1,323 | 0.77 [0.69, 0.85] | 0.99 [0.99, 1.00] |
| Tokuda et al. 2022 (17) | AI screening for DR | Un-dilated | 13 | 5 | 5 | 49 | 0.72 [0.47, 0.90] | 0.96 [0.87, 1.00] |
| Li et al. 2019 (28) | AI screening for DR | Un-dilated | 519 | 16 | 16 | 256 | 0.98 [0.97, 0.99] | 0.94 [0.91, 0.97] |
| Lin et al. 2018 (27) | AI screening for DR | Un-dilated | 10,254 | 1,519 | 1,519 | 13,481 | 0.68 [0.68, 0.69] | 0.90 [0.89, 0.90] |
| González-Gonzalo et al. 2020 (26) | AI screening for DR | Un-dilated | 132 | 30 | 30 | 295 | 0.92 [0.86, 0.96] | 0.91 [0.87, 0.94] |
| Gargeya and Leng 2017 (30) | AI screening for DR | Un-dilated | 813 | 113 | 113 | 761 | 0.93 [0.91, 0.95] | 0.87 [0.85, 0.89] |
| Li et al. 2018 (21) | AI screening for DR | Un-dilated | 371 | 199 | 199 | 13,057 | 0.93 [0.89, 0.95] | 0.98 [0.98, 0.99] |
| Limwattanayingyong et al. 2020 (24) | 1st screening Manual for DR | Un-dilated | 165 | 124 | 59 | 3,915 | 0.74 [0.67, 0.79] | 0.97 [0.96, 0.97] |
| Limwattanayingyong et al. 2020 (24) | 2nd screening Manual for DR | Un-dilated | 519 | 185 | 71 | 4,963 | 0.88 [0.85, 0.90] | 0.96 [0.96, 0.97] |
| Sedova et al. 2022 (14) | Manual screening for DR | Un-dilated | 21 | 2 | 1 | 32 | 0.95 [0.77, 1.00] | 0.94 [0.80, 0.99] |
| Sedova et al. 2022 (14) | Manual screening for DR | Un-dilated | 22 | 2 | 1 | 32 | 0.96 [0.78, 1.00] | 0.94 [0.80, 0.99] |
| Ting et al. 2017 (25) | Manual for referable DR | Dilated | 3,077 | 302 | 768 | 108,501 | 0.80 [0.79, 0.81] | 1.00 [1.00, 1.00] |
| Ting et al. 2017 (25) | Moderate and beyond with Manual | Dilated | 558 | 78 | 447 | 111,525 | 0.56 [0.52, 0.59] | 1.00 [1.00, 1.00] |

Results for outcomes.

CI, Confidence Interval; DR, Diabetic Retinopathy; Referable DR, severity grade 2 and above; DL, Deep Learning; DLA, Deep Learning Algorithm; FN, False Negative; FP, False Positive; Mod, Moderate; RDR, Referable Diabetic Retinopathy; SVM, Support Vector Machine; TP, True Positive; TN, True Negative; UWF, Ultra-Wide Field Grading.
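For readers who want to trace a row of Table 2 back to its 2×2 counts, the following sketch computes sensitivity as TP / (TP + FN) and specificity as TN / (TN + FP). The confidence interval uses the Wilson score method, which is an assumption made for this illustration; the review does not state which interval method its software applied, so the limits may differ slightly from the tabulated ones. The function names are hypothetical, and the counts are those of the first manual screening row of Limwattanayingyong et al. 2020.

```python
import math


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def accuracy_from_counts(tp, fp, fn, tn):
    """Sensitivity and specificity, each with a Wilson interval, from 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "sensitivity_ci": wilson_ci(tp, tp + fn),
        "specificity": tn / (tn + fp),
        "specificity_ci": wilson_ci(tn, tn + fp),
    }


# Limwattanayingyong et al. 2020, 1st screening, manual, un-dilated (Table 2).
result = accuracy_from_counts(tp=165, fp=124, fn=59, tn=3915)
print(round(result["sensitivity"], 2), round(result["specificity"], 2))  # 0.74 0.97
```

The point estimates agree with the tabulated 0.74 and 0.97 for that row.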

Figure 4

Specificity forest plot for un-dilated eyes.

Figure 5

Sensitivity forest plot for un-dilated eyes.

Specificity

The pooled specificity of AI screening for dilated eyes was 0.87 (95% CI: 0.79, 0.92), and manual screening for dilated eyes showed a high pooled specificity of 0.99 (95% CI: 0.99, 1.00), indicating good performance of both AI-based and manual screening methods (Figure 6). For un-dilated eyes, AI screening demonstrated a pooled specificity of 0.94 (95% CI: 0.91, 0.96), and manual screening similarly showed a robust specificity of 0.99 (95% CI: 0.98, 0.99) (Figure 7). These results suggest that AI is a comparable alternative to manual screening.

Figure 6

Specificity forest plot for dilated eyes.

Figure 7

Sensitivity forest plot for dilated eyes.

Multi-test analysis

For dilated eyes, the combined pooled sensitivity and specificity were 0.94 [95% CI: 0.90; 0.97] and 0.91 [95% CI: 0.83; 0.95], with heterogeneity of 95.2% and 99.9% and p values of 0.0386 and 0.0001, respectively, showing comparable results in the outcomes with high variability among studies (Figures 4, 6). For un-dilated eyes, the combined pooled sensitivity and specificity were 0.90 [95% CI: 0.85; 0.94] and 0.95 [95% CI: 0.93; 0.97], with heterogeneity of 98.1% and 99.1% and p values of 0.0437 and 0.0001, respectively, showing results with no statistically significant difference (Figures 5, 7).
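Heterogeneity percentages of this kind are conventionally reported as the I² statistic, derived from Cochran's Q. The sketch below shows that conventional calculation on logit-transformed sensitivities; it assumes the review's heterogeneity values are I², which the text does not state explicitly, and it is not expected to reproduce them because it uses only four studies. The function names are hypothetical; the (TP, FN) counts are four AI rows for un-dilated eyes from Table 2.

```python
import math


def logit_sensitivity(tp, fn):
    """Logit of sensitivity and its approximate variance (0.5 continuity correction)."""
    a, b = tp + 0.5, fn + 0.5
    return math.log(a / b), 1.0 / a + 1.0 / b


def cochran_q_and_i2(effects, variances):
    """Cochran's Q and the I^2 statistic (percent) for a set of study effects."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2 is the share of total variation beyond what chance alone would
    # produce; it is truncated at zero when Q falls below its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


# (TP, FN) for AI screening of un-dilated eyes: Sedova, Surya, Lupidi, Tokuda (Table 2).
counts = [(27, 1), (42, 10), (121, 4), (13, 5)]
effects, variances = zip(*(logit_sensitivity(tp, fn) for tp, fn in counts))
q, i2 = cochran_q_and_i2(effects, variances)
print(f"Q = {q:.2f}, I^2 = {i2:.1f}%")
```

Values of I² above roughly 75% are commonly read as considerable heterogeneity, which is consistent with the high variability among studies described above.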

Risk of bias

Risk of bias was assessed systematically using tools appropriate to each study design. For the 16 validation studies (13, 16, 18–21, 23, 26–34), the QUADAS-2 tool was used. Thirteen of these studies demonstrated a low risk of bias, while three studies showed some concerns, particularly in domains 3 and 4 (Figures 8, 9).

Figure 8

Risk of bias assessment traffic light plot for QUADAS-2 tool.

Figure 9

Risk of bias assessment summary plot for QUADAS-2 tool.

For the five cross-sectional studies (10, 12, 15, 22, 25), the AXIS tool was used to assess the risk of bias. All five studies showed a moderate risk of bias, mainly related to the results and conclusions items. These findings are summarized in Table 3.

Table 3

| Author | Intro | Methods | Results | Conclusions | Other | Risk |
|---|---|---|---|---|---|---|
| Ting et al. 2017 (25) | 100% | 100% | 50% | 75% | 50% | Moderate |
| Sosale et al. 2020 (15) | 100% | 100% | 50% | 75% | 0% | Moderate |
| Ipp 2021 (10) | 100% | 100% | 50% | 50% | 100% | Moderate |
| Acharyya et al. 2024 (22) | 100% | 90% | 50% | 75% | 0% | Moderate |
| Lupidi et al. 2023 (12) | 100% | 100% | 50% | 75% | 0% | Moderate |

AXIS risk of bias assessment summary-percentages of items satisfied.

AXIS, Appraisal tool for Cross-Sectional Studies; %, percentage of the bias.

The Newcastle-Ottawa Scale was applied to the four cohort studies (11, 14, 17, 24). All four demonstrated a low risk of bias across the selection, comparability, and outcome assessment domains. These results are detailed in Table 4, supporting the reliability of the included cohort studies.

Table 4

Study Adequacy of selection Comparability Outcome assessment Asterisk rating Overall
Representative of the exposed cohorts Selection of the exposed cohorts ascertainment of exposure Demonstration that Outcome of Interest was Not Present at Start of Study Assessment of outcomes Follow-up period long enough for outcome to occur Adequacy of follow-up period among cohorts
Sedova et al. 2022 (14) * * * ** * * * 8.0/9.0 Low
Tokuda et al. 2022 (17) * * * ** * * 7.0/9.0 Low
Li et al. 2022 (11) * * * ** * * 7.0/9.0 Low
Limwattanayingyong et al. 2020 (24) * * ** * * * 7.0/9.0 Low

Asterisk rating of observational studies according to the Newcastle-Ottawa Scale (NOS) tool.

NOS, Newcastle-Ottawa Scale; **Indicates two stars in NOS for comparability domain; *Indicates one star for selection, comparability, or outcome assessment based on NOS guidelines for cohort studies; Adequacy of follow-up period: evaluated based on sufficient follow-up time.

Discussion

The development of AI-based screening systems has opened up their potential use as diagnostic tools in health care. Evaluating the accuracy of AI is essential before it is implemented in clinical settings. Diabetic retinopathy screening is important in preventing vision loss. In this meta-analysis, we assessed the diagnostic accuracy of AI-based systems versus manual screening methods for detecting DR in both dilated and un-dilated eyes. The aim was to determine whether AI systems could offer a comparable or superior alternative to manual methods in clinical practice.

Our results showed that AI systems demonstrated high sensitivity across most studies. AI screening performed well in sensitivity for both dilated and un-dilated eyes, and the specificity of AI and manual screening was generally comparable in dilated as well as un-dilated eyes.

These results highlight that AI systems, especially in un-dilated eye conditions, show promise for clinical use with reliable sensitivity and specificity, but variations exist depending on the system and clinical setting.

Most of the studies exhibited a low risk of bias, indicating robust methodologies and reliable findings. However, some validation studies showed a moderate risk of bias, especially in the index test and reference standard domains, suggesting possible inconsistencies in diagnostic criteria or a lack of blinding. In addition, all studies assessed with the AXIS tool showed a moderate risk of bias, especially in the results and conclusions domains, which indicates potential selective reporting that could bias the interpretation of outcomes.

Limitations and implications

Despite the promising outcomes, several limitations must be acknowledged. First, there is considerable heterogeneity across the included studies in terms of study settings, photographic protocols, and reference standards. The studies vary from community-based to outpatient settings, and the imaging techniques range from two-field to five-field photography with or without mydriasis. These differences may have influenced the diagnostic performance of AI-based screening, limiting the generalizability of the findings. Additionally, the reference standards used for manual grading differ across studies, with some relying on a single specialist and others on diagnoses by multiple experts, potentially affecting the accuracy of comparisons. Second, not all studies report the number of participants, making it difficult to assess the true sample size, which could impact diagnostic validity. Third, there is significant variability among the studies of AI-based screening. Variability in AI performance can arise from differences in study methodologies, dataset quality, and model training conditions. The findings highlight the need for standardized evaluation metrics and more transparent reporting to resolve inconsistencies. Addressing these issues will enhance the reliability of AI applications in clinical settings and ensure robust decision-making.

Moreover, some of the studies had a moderate risk of bias, which could lead to overestimation or underestimation of accuracy. To ensure that AI systems are safe and effective for real-world use, evaluations need to be conducted in representative clinical settings. Systems should be tested on a wide range of image qualities and medical settings.

Conclusion

The findings from this meta-analysis suggest that AI systems are promising for DR screening, especially in settings where high sensitivity is critical. However, further independent studies, particularly those assessing screening of dilated eyes, are required to establish the efficacy of AI in broader clinical practice. Factors such as technical failures of the system and operational settings should also be considered before full implementation. In conclusion, while AI-based systems offer a valuable tool for reducing the workload on human graders, their clinical utility depends on continued rigorous evaluation and refinement.

Future research

Future work should focus on refining AI algorithms for dilated eye conditions and exploring the integration of AI screening into routine ophthalmic practice. Large-scale, prospective validation studies will be essential to confirm these findings and guide the adoption of AI in DR screening protocols.

Statements

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Author contributions

HT: Conceptualization, Methodology, Project administration, Supervision, Writing – original draft, Writing – review & editing. NU: Conceptualization, Data curation, Formal analysis, Investigation, Writing – review & editing, Writing – original draft. MT: Formal analysis, Validation, Writing – review & editing. ID: Data curation, Formal analysis, Writing – review & editing. RP: Data curation, Validation, Writing – review & editing. SM: Formal analysis, Writing – review & editing. AA: Methodology, Writing – review & editing. ST: Formal analysis, Writing – review & editing. YA: Methodology, Writing – review & editing.

Funding

The author(s) declare that no financial support was received for the research and/or publication of this article.

Acknowledgments

The authors would like to thank the Deanship of Scientific Research at Shaqra University for supporting this work. This research was supported by the authors; there are no sponsors or funds for the research.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The authors declare that no Gen AI was used in the creation of this manuscript.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

  • 1.

    World Health Organization . Global report on diabetes. Geneva: World Health Organization (2016).

  • 2.

    Teo ZL Tham YC Yu M Chee ML Rim TH Cheung N et al . Global prevalence of diabetic retinopathy and projection of burden through 2045: systematic review and meta-analysis. Ophthalmology. (2021) 128:158091. doi: 10.1016/j.ophtha.2021.04.027

  • 3.

    World Health Organization . (2023) Blindness and visual impairment fact sheet. Available online at: https://www.who.int/news-room/fact-sheets/detail/blindness-and-visual-impairment

  • 4.

    Piyasena M Murthy GVS Yip JLY Gilbert C Zuurmond M Peto T et al . Systematic review on barriers and enablers for access to diabetic retinopathy screening services in different income settings. PLoS One. (2019) 14:e0198979. doi: 10.1371/journal.pone.0198979

  • 5.

    Hasan SU Siddiqui MAR . Diagnostic accuracy of smartphone-based artificial intelligence systems for detecting diabetic retinopathy: a systematic review and meta-analysis. Diabetes Res Clin Pract. (2023) 205:110943. doi: 10.1016/j.diabres.2023.110943

  • 6.

    Uy H Fielding C Hohlfeld A Ochodo E Opare A Mukonda E et al . Diagnostic test accuracy of artificial intelligence in screening for referable diabetic retinopathy in real-world settings: a systematic review and meta-analysis. PLOS Glob Public Health. (2023) 3:e0002160. doi: 10.1371/journal.pgph.0002160

  • 7.

    Wang S Zhang Y Lei S Zhu H Li J Wang Q et al . Performance of deep neural network-based artificial intelligence method in diabetic retinopathy screening: a systematic review and meta-analysis of diagnostic test accuracy. Eur J Endocrinol. (2020) 183:419. doi: 10.1530/EJE-19-0968

  • 8.

    Zhelev Z Peters J Rogers M Allen M Kijauskaite G Seedat F et al . Test accuracy of artificial intelligence-based grading of fundus images in diabetic retinopathy screening: a systematic review. J Med Screen. (2023) 30:97112. doi: 10.1177/09691413221144382

  • 9.

    Tan CH Kyaw BM Smith H Tan CS Tudor Car L . Use of smartphones to detect diabetic retinopathy: scoping review and Meta-analysis of diagnostic test accuracy studies. J Med Internet Res. (2020) 22:e16658. doi: 10.2196/16658

  • 10.

    Ipp E Liljenquist D Bode B Shah VN Silverstein S Regillo CD et al . Pivotal evaluation of an artificial intelligence system for autonomous detection of referrable and vision-threatening diabetic retinopathy. JAMA Netw Open. (2021) 4:e2134254. doi: 10.1001/jamanetworkopen.2021.34254

  • 11.

    Li N Ma M Lai M Gu L Kang M Wang Z et al . A stratified analysis of a deep learning algorithm in the diagnosis of diabetic retinopathy in a real-world study. J Diabetes. (2022) 14:11120. doi: 10.1111/1753-0407.13241

  • 12.

    Lupidi M Danieli L Fruttini D Nicolai M Lassandro N Chhablani J et al . Artificial intelligence in diabetic retinopathy screening: clinical assessment using handheld fundus camera in a real-life setting. Acta Diabetol. (2023) 60:10838. doi: 10.1007/s00592-023-02104-0

  • 13.

    Piatti A Romeo F Manti R Doglio M Tartaglino B Nada E et al . Feasibility and accuracy of the screening for diabetic retinopathy using a fundus camera and an artificial intelligence pre-evaluation application. Acta Diabetol. (2024) 61:638. doi: 10.1007/s00592-023-02172-2

  • 14.

    Sedova A Hajdu D Datlinger F Steiner I Neschi M Aschauer J et al . Comparison of early diabetic retinopathy staging in asymptomatic patients between autonomous AI-based screening and human-graded ultra-widefield colour fundus images. Eye. (2022) 36:5106. doi: 10.1038/s41433-021-01912-4

  • 15.

    Sosale B Sosale AR Murthy H Sengupta S Naveenam M . Medios-an offline, smartphone-based artificial intelligence algorithm for the diagnosis of diabetic retinopathy. Indian J Ophthalmol. (2020) 68:3915. doi: 10.4103/ijo.IJO_1203_19

  • 16.

    Surya J Garima Pandy N Hyungtaek Rim T Lee G Priya MNS et al . Efficacy of deep learning-based artificial intelligence models in screening and referring patients with diabetic retinopathy and glaucoma. Indian J Ophthalmol. (2023) 71:303945. doi: 10.4103/IJO.IJO_11_23

  • 17.

    Tokuda Y Tabuchi H Nagasawa T Tanabe M Deguchi H Yoshizumi Y et al . Automatic diagnosis of diabetic retinopathy stage focusing exclusively on retinal hemorrhage. Medicina. (2022) 58:1681. doi: 10.3390/medicina58111681

  • 18.

    Soto-Pedre E Navea A Millan S Hernaez-Ortega MC Morales J Desco MC et al . Evaluation of automated image analysis software for the detection of diabetic retinopathy to reduce the ophthalmologists' workload. Acta Ophthalmol. (2015) 93:e526. doi: 10.1111/aos.12481

  • 19.

    Rajalakshmi R Subashini R Anjana RM Mohan V . Automated diabetic retinopathy detection in smartphone-based fundus photography using artificial intelligence. Eye (Lond). (2018) 32:113844. doi: 10.1038/s41433-018-0064-9

  • 20.

    Wang K Jayadev C Nittala MG Velaga SB Ramachandra CA Bhaskaranand M et al . Automated detection of diabetic retinopathy lesions on ultrawidefield pseudocolour images. Acta Ophthalmol. (2018) 96:e16873. doi: 10.1111/aos.13528

  • 21.

    Li Z Keel S Liu C He Y Meng W Scheetz J et al . An automated grading system for detection of vision-threatening referable diabetic retinopathy on the basis of color fundus photographs. Diabetes Care. (2018) 41:250916. doi: 10.2337/dc18-0147

  • 22.

    Acharyya M Moharana B Jain S Tandon M . A double-blinded study for quantifiable assessment of the diagnostic accuracy of AI tool "ADVEN-i" in identifying diseased fundus images including diabetic retinopathy on a retrospective data. Indian J Ophthalmol. (2024) 72:S46s52. doi: 10.4103/IJO.IJO_3342_22

  • 23.

    Arenas-Cavalli JT Abarca I Rojas-Contreras M Bernuy F Donoso R . Clinical validation of an artificial intelligence-based diabetic retinopathy screening tool for a national health system. Eye (Lond). (2022) 36:7885. doi: 10.1038/s41433-020-01366-0

  • 24.

    Limwattanayingyong J Nganthavee V Seresirikachorn K Singalavanija T Soonthornworasiri N Ruamviboonsuk V et al . Longitudinal screening for diabetic retinopathy in a Nationwide screening program: comparing deep learning and human graders. J Diabetes Res. (2020) 2020:8839376. doi: 10.1155/2020/8839376

  • 25.

    Ting DSW Cheung CYL Lim G Tan GSW Quang ND Gan A et al . Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA. (2017) 318:221123. doi: 10.1001/jama.2017.18152

  • 26.

    González-Gonzalo C Sánchez-Gutiérrez V Hernández-Martínez P Contreras I Lechanteur YT Domanian A et al . Evaluation of a deep learning system for the joint automated detection of diabetic retinopathy and age-related macular degeneration. Acta Ophthalmol. (2020) 98:36877. doi: 10.1111/aos.14306

  • 27.

    Lin GM Chen M-J Yeh C-H Lin Y-Y Kuo H-Y Lin M-H et al . Transforming retinal photographs to entropy images in deep learning to improve automated detection for diabetic retinopathy. J Ophthalmol. (2018) 2018:2159702. doi: 10.1155/2018/2159702

  • 28.

    Li F Liu Z Chen H Jiang M Zhang X Wu Z . Automatic detection of diabetic retinopathy in retinal fundus photographs based on deep learning algorithm. Transl Vis Sci Technol. (2019) 8:4. doi: 10.1167/tvst.8.6.4

  • 29.

    Hansen MB Abràmoff MD Folk JC Mathenge W Bastawrous A Peto T . Results of automated retinal image analysis for detection of diabetic retinopathy from the Nakuru study, Kenya. PLoS One. (2015) 10:e0139148. doi: 10.1371/journal.pone.0139148

  • 30.

    Gargeya R Leng T . Automated identification of diabetic retinopathy using deep learning. Ophthalmology. (2017) 124:9629. doi: 10.1016/j.ophtha.2017.02.008

  • 31.

    Abràmoff MD Lou Y Erginay A Clarida W Amelon R Folk JC et al . Improved automated detection of diabetic retinopathy on a publicly available dataset through integration of deep learning. Invest Ophthalmol Vis Sci. (2016) 57:52006. doi: 10.1167/iovs.16-19964

  • 32.

    Zhang W Zhong J Yang S Gao Z Hu J Chen Y et al . Automated identification and grading system of diabetic retinopathy using deep neural networks. Knowl-Based Syst. (2019) 175:1225. doi: 10.1016/j.knosys.2019.03.016

  • 33.

    Zhang X Li F Li D Wei Q Han X Zhang B et al . Automated detection of severe diabetic retinopathy using deep learning method. Graefes Arch Clin Exp Ophthalmol. (2022) 260:849856. doi: 10.1007/s00417-021-05402-x

  • 34.

    Kumar PS Deepak RU Sathar A Sahasranamam V Rajesh Kumar R . Automated detection system for diabetic retinopathy using two field fundus photography. Procedia Comput Sci. (2016) 93:48694. doi: 10.1016/j.procs.2016.07.237

Summary

Keywords

diabetic retinopathy, screening, artificial intelligence, deep learning, manual screening, automated detection

Citation

Tahir HN, Ullah N, Tahir M, Domnic IS, Prabhakar R, Meerasa SS, AbdElneam AI, Tahir S and Ali Y (2025) Artificial intelligence versus manual screening for the detection of diabetic retinopathy: a comparative systematic review and meta-analysis. Front. Med. 12:1519768. doi: 10.3389/fmed.2025.1519768

Received

07 November 2024

Accepted

14 April 2025

Published

07 May 2025

Volume

12 - 2025

Edited by

Yanwu Xu, Baidu, China

Reviewed by

Xiuju Chen, Xiamen University, China

Rajalakshmi R., Madras Diabetes Research Foundation, India

Copyright

*Correspondence: Hasan Nawaz Tahir, Naseer Ullah,
