SYSTEMATIC REVIEW article

Front. Digit. Health, 13 January 2026

Sec. Health Informatics

Volume 7 - 2025 | https://doi.org/10.3389/fdgth.2025.1685771

Autonomous chest x-ray image classification, capabilities and prospects: rapid evidence assessment

Yuriy Vasilev 1,2

Alexander Bazhin 1,2

Roman Reshetnikov 1

Olga Nanova 1*

Anton Vladzymyrskyy 1,3

Kirill Arzamasov 1,4

Pavel Gelezhe 1

Olga Omelyanskaya 1

  • 1. Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow Health Care Department, Moscow, Russia

  • 2. National Medical and Surgical Center Named After N.I. Pirogov of the Ministry of Health of the Russian Federation, Moscow, Russia

  • 3. I.M. Sechenov First Moscow State Medical University of the Ministry of Health of the Russian Federation, Sechenov University, Moscow, Russia

  • 4. Ministry of Science and Higher Education, Moscow Technical University—MIREA, Moscow, Russia


Abstract

Background:

Screening methods are essential for the detection of numerous pathologies. Chest x-ray radiography (CXR) is the most widely used screening modality. During screening, radiologists primarily examine normal radiographs, which results in a substantial workload and an increased risk of errors. There is a growing need to automate radiological screening in order to facilitate the autonomous sorting of normal studies.

Objective:

We aimed to evaluate the capabilities of artificial intelligence (AI) techniques for autonomous CXR triage and to assess their potential for integration into routine clinical workflow.

Methods:

A rapid evidence assessment methodology was employed to conduct this review. Literature searches were performed using relevant keywords across PubMed, arXiv, medRxiv, Elibrary, and Google Scholar covering the period from 2019 to 2025. Inclusion criteria comprised large-scale studies addressing multiple pathologies and providing abstracts in English. Meta-analysis was conducted using confusion matrices derived from reported diagnostic performance metrics in the selected studies. Methodological quality and the overall quality of evidence were assessed using a combination of QUADAS-2, QUADAS-CAD, and GRADE frameworks.

Results:

Out of 327 records, 11 studies met the inclusion criteria. Among these, three studies analyzed datasets reflecting the real-world prevalence of pathologies. Three studies included very large cohorts exceeding 500,000 CXRs, whereas the remaining studies used considerably smaller samples. The proportion of autonomously triaged CXRs ranged from 15.0% to 99.8%, with a weighted average of 42.3% across all publications. Notably, in a study conducted under real-world clinical conditions on continuous data flow, this proportion was 54.8%. Sensitivity was 97.8% (95% CI: 94.8%–99.1%), and specificity was 94.8% (95% CI: 53.0%–99.7%). Fifty-five percent of the studies were classified as having a low risk of bias. Primarily, elevated risk of bias and heterogeneity of results were attributed to variability in sample selection criteria and reference standard evaluation.

Conclusions:

Modern AI systems for autonomous triage of CXRs are ready to be implemented in clinical practice. AI-driven screening can reduce radiologists' workload, decrease sorting errors and lower the costs associated with screening programs. However, implementation is often hindered by regulatory and legislative barriers. Consequently, comprehensive clinical trials conducted under real-world conditions remain scarce.

1 Introduction

1.1 Rationale

Chest radiography is one of the most frequently performed preventive radiological examinations. However, the ongoing shortage of radiologists (1–4), combined with the continued adoption of new x-ray equipment, makes the optimization of radiological workflows for screening studies increasingly critical. The vast majority of these preventive diagnostic imaging studies reveal an absence of pathological findings. The reported prevalence of pathologies in chest radiographs ranges from 1% to 20% (5–8). This low prevalence implies that radiologists spend most of their time interpreting “normal” examinations, which may inadvertently increase the risk of missing abnormalities. This phenomenon is an example of cognitive bias, where a radiologist, upon detecting an initial finding or confirming the absence of pathology, prematurely terminates the image analysis. Several factors contribute to such diagnostic errors, including “satisfaction of search”, “localisation errors”, “lack of thorough structural analysis”, and “data shortage” (9).

In order to mitigate the risk of missed pathologies and reduce the workload of radiologists, clinical decision support systems—particularly those employing artificial intelligence (AI)—have gained widespread adoption. In the context of screening examinations, a notable approach is the autonomous triage of chest radiographs (CXRs), whereby AI algorithms differentiate normal from abnormal images without direct radiologist involvement. In practice, this means the automated exclusion of routine CXRs, enabling radiologists to concentrate on more complex cases. The primary objective of autonomous triage is to alleviate the burden on radiologists and reduce reporting delays while maintaining patient safety (10, 11). Over the past five years, numerous studies have investigated AI systems for autonomous chest radiograph triage.

1.2 Objectives

The present study aimed to evaluate the capabilities of contemporary AI methods for the autonomous triage of CXRs and assess their prospects for integration into routine clinical practice. Although numerous publications have explored AI for the detection and screening of individual diseases, such narrowly focused strategies have limited applicability in practical screening workflows.

In this study, we conducted a selective review of scientific literature to determine the performance and applicability of modern AI for autonomous chest radiograph triage. We employed a Rapid Evidence Assessment (REA) (12) methodology to enable a rapid yet comprehensive evaluation of the feasibility and clinical utility of autonomous triage systems in current real-world practice.

2 Methods

The prospective, unregistered study protocol is provided in the Supplementary Material.

2.1 Literature search

The article search was conducted for the period from 2019 to 2025. All articles and preprints with an English abstract were considered. Searches were performed across multiple databases: PubMed, arXiv, medRxiv, Elibrary. An additional literature search was conducted using the Google Scholar search engine.

The search query used in PubMed was as follows:

(“Radiography, Thoracic”[Mesh] OR Thoracic Radiograph*[tiab] OR (x-ray[tiab] AND (chest[tiab] OR thora*[tiab])) OR CXR) AND (“Radiographic Image Interpretation, Computer-Assisted”[Mesh] OR Artificial Intelligence[tiab] OR Computer vision[tiab] OR (learning[tiab] AND (machine[tiab] OR deep[tiab]))) AND (Autonom*[tiab] OR standalone[tiab] OR unaided[tiab] OR unassisted[tiab]).

The search query for arXiv was: “Autonomous AI AND chest x-ray OR CXR AND triage”. The search query for medRxiv, Elibrary, and Google Scholar was: “Autonomous AI chest x-ray triage”.
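For transparency, the PubMed query above can also be run programmatically. The sketch below is illustrative and was not part of the original search workflow; it assumes the rentrez R package and expresses the 2019–2025 window as a publication-date filter within the query itself.

```r
# Illustrative reproduction of the PubMed search (not part of the original
# study workflow). Assumes the rentrez package is installed.
library(rentrez)

query <- paste(
  '("Radiography, Thoracic"[Mesh] OR Thoracic Radiograph*[tiab]',
  'OR (x-ray[tiab] AND (chest[tiab] OR thora*[tiab])) OR CXR)',
  'AND ("Radiographic Image Interpretation, Computer-Assisted"[Mesh]',
  'OR Artificial Intelligence[tiab] OR Computer vision[tiab]',
  'OR (learning[tiab] AND (machine[tiab] OR deep[tiab])))',
  'AND (Autonom*[tiab] OR standalone[tiab] OR unaided[tiab] OR unassisted[tiab])',
  'AND ("2019"[PDAT] : "2025"[PDAT])'
)

res <- entrez_search(db = "pubmed", term = query, retmax = 100)
res$count  # number of matching records
res$ids    # PubMed IDs of retrieved records, ready for title/abstract screening
```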

2.2 Study selection

The present study incorporated all screening studies that used CXRs to analyze multiple pathologies. A two-stage approach was employed for study selection. In the initial phase, two independent reviewers screened titles and abstracts to identify the studies most relevant to our research question. In the subsequent stage, the full texts of articles that passed the initial screening were independently reviewed by two researchers, who made inclusion decisions.

The search and selection of articles were carried out by two researchers, each with over 10 years of experience in medical informatics.

2.3 Data extraction

The following information was extracted from the selected studies: bibliometric data, country of study, research objectives, study design (prospective/retrospective, multicenter/single-center, binary classification/prioritization), number of abnormalities, employed AI algorithm, inclusion and exclusion criteria, reference standard, assessment of time benefits and economic efficiency, all values of AI diagnostic accuracy, values of true positives (TP), false negatives (FN), false positives (FP), true negatives (TN), and comparison of AI diagnostic accuracy with that of physicians.

Information extraction was conducted independently by two researchers, each with over 10 years of experience in medical informatics.

2.4 Meta-analysis

The number of included and autonomously triaged CXRs for each study was reported.

Components of the confusion matrix that were not reported in the publications were independently calculated based on the reported diagnostic performance metrics, following the methodology suggested by (13). In studies employing prioritization, the category designated as “normal” was classified as normal, whereas all other categories (“non-urgent”, “urgent”, and “critical”) were classified as abnormal. Subsequently, sensitivities and specificities were calculated for each study with corresponding 95% confidence intervals (CIs). Paired forest plots displayed individual study estimates alongside pooled meta-analysis results for sensitivity and specificity. Additionally, a receiver operating characteristic (ROC) plot of sensitivity vs. 1-specificity summarized diagnostic performance based on meta-analysis data.
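The back-calculation can be illustrated with a short sketch. This is a minimal illustration assuming that a study reports sensitivity, specificity, the number of abnormal cases, and the total sample size; it is not the exact procedure of (13), and the function name is ours.

```r
# Minimal sketch: reconstruct TP/FN/FP/TN from reported sensitivity,
# specificity, number of abnormal cases, and total sample size.
# Illustrative only; see (13) for the methodology actually followed.
reconstruct_cm <- function(sens, spec, n_abnormal, n_total) {
  tp <- round(sens * n_abnormal)   # abnormal CXRs correctly flagged
  fn <- n_abnormal - tp            # abnormal CXRs missed
  n_normal <- n_total - n_abnormal
  tn <- round(spec * n_normal)     # normal CXRs correctly triaged
  fp <- n_normal - tn              # normal CXRs flagged as abnormal
  c(TP = tp, FN = fn, FP = fp, TN = tn)
}

# Check against Dyer et al. (27) in Table 2 (sensitivity 99.6%, specificity
# 98.3%, 3,307 abnormal of 3,887 CXRs): returns TP 3294, FN 13, FP 10, TN 570.
reconstruct_cm(sens = 0.996, spec = 0.983, n_abnormal = 3307, n_total = 3887)
```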

The meta-analysis was conducted using the MetaDTA tool (14).

The expected prevalence level of abnormalities was calculated per 10,000 cases based on minimal, maximal, and real-world findings in the selected studies.

A sensitivity analysis was conducted across two scenarios. In the first, two studies employing prioritization were excluded from the overall sample. The second scenario excluded the two studies with the highest sensitivity and specificity values (occupying the upper left corner on the ROC plot) from the overall sample.

Subgroup and random effects analyses were conducted to investigate potential sources of heterogeneity. The analyses were performed using the metafor package for R, employing the REML (Restricted Maximum Likelihood) model for small samples. The factors used were sample size (as a continuous variable), year of study publication (as a continuous variable), and population (as a discrete binary variable: general or hospital).
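As an illustration of the modeling described above, the sketch below fits a REML random-effects model to logit-transformed per-study sensitivities with metafor, using a subset of the counts from Table 2. The data frame layout, the restriction to four studies, and the focus on sensitivity alone (the published analysis also covered specificity and further moderators) are our simplifications, not the full analysis.

```r
# Illustrative random-effects meta-analysis of sensitivity with metafor (REML),
# using a subset of the per-study counts from Table 2.
library(metafor)

studies <- data.frame(
  study      = c("Vasilev 2025", "Yoo 2022", "Plesner 2023", "Dyer 2021"),
  tp         = c(260058, 632, 1090, 3294),
  fn         = c(290, 18, 10, 13),
  population = c("general", "general", "hospital", "hospital")
)

# Logit-transformed sensitivity (PLO) and its sampling variance per study.
dat <- escalc(measure = "PLO", xi = tp, ni = tp + fn, data = studies)

res <- rma(yi, vi, data = dat, method = "REML")  # random-effects model
predict(res, transf = transf.ilogit)             # back-transform to pooled sensitivity

# Meta-regression with population (general vs. hospital) as a moderator.
res_mod <- rma(yi, vi, mods = ~ population, data = dat, method = "REML")
summary(res_mod)
```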

2.5 Assessment of methodological quality

Two review authors independently assessed the methodological quality of included studies using a combination of two instruments: QUADAS-2 (15) and the AI-adapted QUADAS-CAD (16, 17). We assessed each of the four domains (Patient Selection, Index Test, Reference Standard, Flow and Timing) in terms of risk of bias using QUADAS-CAD, and the first three domains in terms of concerns regarding applicability to the review question using QUADAS-2.

Any disagreements were resolved through discussion to reach consensus. If needed, a third author facilitated consensus.

In the Patient Selection domain, we assessed whether the screening samples reflected the distribution of normal and pathological cases in the real-world population. In the Index Test domain, we analyzed how the index test was applied. In the Reference Standard domain, we evaluated the types of reference standards used, taking into account the presence of multiple pathologies in the samples. In the Flow and Timing domain, we assessed whether the reference standards were consistently applied across all cases and examined the overall transparency of the results.

We visually examined forest plots and ROC plots for heterogeneity and discussed potential data heterogeneity sources. These estimations were further used in the assessment of certainty of evidence. We assessed the certainty of evidence using the GRADE methodology (18, 19).

3 Results

3.1 Results of the search and study selection

A total of 327 records were identified (Figure 1): 40 in PubMed, 1 in arXiv, 14 in medRxiv, 7 in Elibrary, and 265 in Google Scholar.

Figure 1. Study flow diagram.

A total of 311 records were excluded as irrelevant to the classification of CXRs. After the initial screening, 16 articles underwent full-text analysis: 5 from PubMed, 1 from arXiv, 0 from medRxiv, 1 from Elibrary, and 9 from Google Scholar.

During the full-text review, 5 articles were excluded because they focused on the detection of a single pathology (Appendix 1). Consequently, 11 articles were included in the systematic review: 3 from PubMed, 1 from arXiv, 0 from medRxiv, 1 from Elibrary, and 6 from Google Scholar.

In the study by (20), several distinct datasets were employed. For inclusion in our review, we selected two datasets relevant to our research focus (DS-1 and CXR-14), which involved multiple pathologies, and excluded model datasets containing only a single target pathology (e.g., tuberculosis or COVID-19).

Confusion matrices were reported in two publications (10, 21). For the remaining studies, we independently calculated confusion matrices based on the reported diagnostic performance metrics, following the methodology described in (13).

3.2 Main characteristics of the studies

The main characteristics of the selected publications are presented in Table 1 (general study information) and Table 2 (quantitative characteristics).

Table 1

First author/year Title Journal Location
1 Vasilev et al. (22) Autonomous artificial intelligence for sorting results of preventive radiological examinations of chest organs: medical and economic efficiency Digital Diagnostics Russian Federation
2 Blake et al. (11) Using Artificial Intelligence to stratify normal versus abnormal Chest x-rays: external validation of a deep learning algorithm at East Kent Hospitals University NHS Foundation Trust Diagnostics United Kingdom
3 AlJasmi et al. (28) Post-deployment performance of a deep learning algorithm for normal and abnormal chest x-ray classification: A study at visa screening centers in the United Arab Emirates European Journal of Radiology Open United Arab Emirates
4 Plesner et al. (10) Autonomous chest radiograph reporting using AI: estimation of clinical impact Radiology Denmark
5 Yoo et al. (24) Artificial Intelligence-Based Identification of Normal Chest Radiographs: A Simulation Study in a Multicenter Health Screening Cohort Korean J Radiology Republic of Korea
6 Schalekamp et al. (26) Performance of AI to exclude normal chest radiographs to reduce radiologists' workload European Radiology The Netherlands
7 Subramanian et al. (25) Autonomous AI for multi-pathology detection in Chest x-Rays: a multi-site study in Indian healthcare system arXiv Republic of India
8 Sridharan et al. (23) Real-World evaluation of an AI triaging system for chest x-rays: A prospective clinical study European Journal of Radiology Singapore
9 Annarumma et al. (21) Automated triaging of adult chest radiographs with deep artificial neural networks Radiology United Kingdom
10 Dyer et al. (27) Diagnosis of normal chest radiographs using an autonomous deep-learning algorithm Clinical Radiology United Kingdom
11 Nabulsi et al. (20) Deep learning for distinguishing normal versus abnormal chest radiographs and generalization to two unseen diseases tuberculosis and COVID-19 Scientific Reports USA
Objective Design Abnormalities number AI algorithms
1 To evaluate the feasibility, effectiveness, and efficiency of autonomous sorting of results from preventive radiological examinations of chest organs Prospective, multicenter, binary classification 9 The experiment used medical products based on AI technologies to analyse the results of CXRs and fluorography:
  • – PhthisisBioMed Artificial Medical Intelligence: Software for Automated Analysis of Digital Chest x-ray/Fluorograms (medical device registration certificate No. RZN 2022/17406 dated May 31, 2022) [10.17691/stm2023.15.4.01];

  • – “Third Opinion. Fluorograms/Chest x-ray” (OOO “Third Opinion Platform”; No. RZN 2021/14506);

  • – “CELSUS” (OOO “Medical screening systems”; No. RZN 2021/14449)

2 To evaluate the performance and clinical utility of a fully automated computer-aided detection system in stratifying CXRs as normal or abnormal as well as its potential impact on optimizing the workload of reporting radiologists while ensuring patient safety Retrospective, multicenter, binary classification Unknown: various abnormal findings qXR (CE-marked)
3 To assess the agreement between AI and reporting radiologists in classifying normal versus abnormal CXRs, and to gather insights from clinical and technical users into the perceived implementation challenges and benefits of integrating AI into clinical workflows Retrospective, multicenter, binary classification Unknown: multiple abnormalities qXR version 2.1 (CE-marked)
4 To perform an external evaluation of a commercially available AI tool for (a) the number of chest radiographs autonomously reported, (b) the sensitivity for AI detection of abnormal chest radiographs, and (c) the performance of AI compared with that of the clinical radiology reports
Retrospective, multicenter, binary classification 38 ChestLink, version 2.6 (CE-marked); Oxipit
5 To investigate the feasibility of using AI to identify normal CXR from the worklist of radiologists in a health-screening environment Retrospective, multicenter, binary classification 10 Lunit INSIGHT CXR3, version 3.5.8.8 (CE mark, FDA)
6 To investigate the performance of a commercially available AI to identify normal CXRs and its potential to reduce radiologist workload Retrospective, multicenter, binary classification 10 Lunit INSIGHT CXR3 (CE mark, FDA)
7 To study the capabilities of an AI-based approach to CXR pathology detection in the Indian healthcare system Retrospective, multicenter, binary classification 75 This AI system integrates advanced architectures, including Vision Transformers, Faster R-CNN, and various U-Net models (such as Attention U-Net, U-Net++, and Dense U-Net)
8 To evaluate AI triaging software performance in a prospective, real-world setting Prospective, single center, prioritization Unknown LUNIT INSIGHT CXR Triage (CE mark, FDA)
9 To develop and test an AI system, based on deep convolutional neural networks, for automated real-time triaging of adult CXRs on the basis of the urgency of imaging appearances
Retrospective, multicenter, prioritization 15 Ensemble of two CNNs
10 To evaluate the suitability of a deep-learning algorithm for identifying normality as a rule-out test for fully automated diagnosis in frontal adult CXRs in an active clinical pathway Retrospective, multicenter, binary classification Unknown The ensemble contained a mix of two different types of state-of-the-art architectures, DenseNet121 and EfficientNet B4
11 To develop deep learning system that classifies CXRs as normal or abnormal using data containing a diverse array of CXR abnormalities from 5 clusters of hospitals from 5 cities in India, to evaluate the deep learning system for its generalization to unseen data sources and unseen diseases using 6 independent datasets Retrospective, multicenter, binary classification Unknown CNN
Inclusion criteria Exclusion criteria Reference test Time benefits assessment Economic efficiency assessment
1 Patients ≥ 18 y. o.
Appointment of a preventive examination of the chest organs
Radiography or fluorography of the chest organs
The signed voluntary informed consent of the patient
Time period: 01.05.2024–30.09.2024
Absence of results of preventive examination in Unified medical information and analytical system of the city of Moscow
Technical defects in performing preventive examination
Reference test No. 1: protocol prepared by a radiologist of the Russian Medical Academy of Continuous Professional Education of the Ministry of Health of the Russian Federation
Reference test No. 2: expert review by a radiologist with subspecialisation in thoracic radiology of the Scientific and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow Department of Health
no yes
2 Patients ≥ 18 y. o.
PA/AP view
Minimum image resolution of 1440 × 1440
Minimum of 10 gray levels
Lateral CXRs
Incomplete view of the chest
CXR images containing excessive motion artifacts
The agreement between two senior radiologists with experience of 10 and 12 years no no
3 All consecutive frontal (PA/AP) CXRs acquired from patients aged ≥ 18 y. o.
Time period: January 2021–June 2022
Any CXRs with a non-available AI result and CXRs where the radiologist suggested a repeat CXR due to lack of image quality 17 radiologists and 3 health professionals (PACS/IT managers) who had prior experience in using the AI software no no
4 All patients ≥ 18 y. o. who underwent chest radiography in the hospitals, including emergency department patients, in-hospital patients, and outpatients. Only the first CXR was included, but follow-up CXRs were allowed if a primary CXR had been acquired before the inclusion period
Time period: 1–12 January 2020
Patients with duplicate records, missing Digital Imaging and Communications in Medicine (DICOM) images, CXRs from a nontarget hospital, and/or insufficiently visualized chest
The reference standard was provided independently by two board-certified thoracic radiologists, with a third radiologist in cases of disagreement no no
5 Patients who visited a health screening center and underwent both CXR and chest CT between January and December 2018 Interval between CXR and chest CT was more than 1 month Two stages: (1) three board-certified thoracic radiologists (with 19, 13, and 12 years of experience in thoracic imaging, respectively); (2) relevant diagnosis from the electronic medical record
no no
6 All consecutive CXRs of inpatients, outpatients, and emergency patients collected from an academic hospital (Radboudumc, Nijmegen, The Netherlands) and a large community hospital (Jeroen Bosch Hospital, 's Hertogenbosch, The Netherlands); only the first exam in the inclusion period for each patient
Time period: 1–14 October 2016
Bedside radiographs, pediatric and incompletely visualized chests, as well as radiographs imported from other hospitals 3 chest radiologists (9, 12, and 20 years of experience) no no
7 Unclear Unclear Unclear no no
8 Underwent a CXR during the study period: August–December 2023
Above 10 years of age, given that pediatric radiology involves different considerations
Had no prior CXRs taken within the same hospitalization to ensure independent assessments
Unclear Unclear yes no
9 Frontal CXRs obtained from January 2005 to May 2017 at a publicly funded university hospital network consisting of three hospitals Pediatric CXRs obtained in patients <16 y. o. Unclear no no
10 All frontal CXRs performed on patients ≥ 18 y. o.
Computed or digital radiography
Patient type as one of accident and emergency, general practitioner or outpatient
Time period: 22.12.2017–14.05.2020
Non-CXR examinations including lateral CXRs and chest–abdomen radiographs Labelling by two independent FRCR-trained radiologists, with a minimum of 10 years NHS clinical experience at consultant level. In case of a discrepancy, images and conflicting radiologists' labels were sent for arbitration by a third FRCR radiologist no no
11 The first dataset, DS-1, was from five clusters of hospitals across five different cities in India (Bangalore, Bhubaneswar, Chennai, Hyderabad, and New Delhi). DS-1 consisted of images from consecutive inpatient and outpatient encounters between November 2010 and January 2018, and reflected the natural population incidence of the abnormalities in the populations
The second dataset, CXR-14, contains 2,000 randomly selected CXRs from the publicly specified test set (25,596 CXRs from 2,797 patients) of CXR-14 from the National Institutes of Health
Unclear Normal/abnormal based on majority vote of 3 radiologists
no no

The studied literature.

Table 2

Study ID Population Total number of CXRs Proportion of autonomously sorted CXRs (%) TP FN FP TN Sensitivity % Specificity %
Vasilev et al. (22) General population 575,549 54.8 260,058 290 0 315,201 99.9 100.0
Blake et al. (11) Multiple clinical patients (general practices, accident and emergency departments, and inpatient and outpatient) 993 39.3 497 3 105 387 99.4 78.7
AlJasmi et al. (28) Migrants at visa screening centers 1,309,443 71.5 18,883 749 354,308 935,503 96.2 72.5
Plesner et al. (10) Multiple hospitals patients including isolated areas of interstitial lung disease and lung cancer 1,529 28.0 1,090 10 309 120 99.1 28.0
Yoo et al. (24) General population 5,887 42.9 632 18 2,729 2,508 97.2 47.9
Schalekamp et al. (26) Multiple hospitals patients (inpatient, outpatient, and emergency chest radiographs) 1,670 53.0 207 18 578 867 92.0 60.0
Subramanian et al. (25) General population 1,012,851 99.8 2,022 1,011 4 1,009,814 66.7 100.0
Sridharan et al. (23) Patients of a general hospital 20,944 28.6 12,412 299 2,542 5,690 97.6 69.1
Annarumma et al. (21) Multiple hospitals patients 3,229 20.3 2,524 147 160 398 94.5 71.3
Dyer et al. (27) Multiple hospitals patients (accident and emergency, general practitioner or outpatient) 3,887 15.0 3,294 13 10 570 99.6 98.3
Nabulsi et al. (20)(a) (DS-1) Multiple hospitals patients (inpatients and outpatients) 7,747 29.9 1,792 46 3,639 2,270 97.5 38.4
Nabulsi et al. (20)(b) CXR-14 Multiple hospitals patients 810 24.0 548 29 68 165 95.0 70.8

Quantitative characteristics of the studies.

TP, true positive; FN, false negative; FP, false positive; TN, true negative.

Only two of the 11 studies were prospective and conducted on streaming data (22, 23). Among these, only one large-scale study was performed under real-world conditions (22). Of the 11 publications, 10 were multicenter studies, and one was a single-center study (23).

Real-world population data were employed in three studies (22, 24, 25). In the other cases, the normal/abnormal distribution in the employed samples could differ from that of the general population: patients from hospitals (10, 11, 20, 21, 23, 26, 27), including those specializing in pulmonary pathologies (10), as well as migrant populations (28).

Most studies (9 out of 11) employed binary classification distinguishing between normal and abnormal categories. The abnormal category predominantly encompassed multiple pathologies (ranging from 9 to 75 categories) (10, 11, 22, 24–26, 28), while in some studies, the number of pathologies was unspecified (20, 27).

Two studies utilized prioritization schemes: normal, non-urgent, and urgent (23), or normal, non-urgent, urgent, and critical (21).

Most studies included CXRs of adult patients; however, two studies (23, 25) also included CXRs of patients under 18 years of age.

The largest datasets, each comprising over half a million samples, were employed in three publications: (28) with 1,309,443 CXRs (accounting for 44.5% of the total review sample), (25) with 1,012,851 CXRs (34.4%), and (22) with 575,549 CXRs (19.5%).

The sample sizes of the remaining publications (10, 11, 20, 21, 23, 24, 26, 27) were considerably smaller, ranging from 810 to 20,944 CXRs (each contributing between 0.03% and 0.71% of the total review sample).

3.3 Meta-analysis

The proportion of autonomously triaged CXRs varied substantially across studies, ranging from 15.0% to 99.8% (Table 2). In the largest screening studies, the number of autonomously triaged CXRs was notably high. For instance, in the real-world data stream study by (22), the proportion of autonomously triaged CXRs was 54.8%. In retrospective studies with sample sizes exceeding one million (25, 28), the proportion reached 99.8% and 71.5%, respectively.

Estimates of sensitivity and specificity are presented and summarized in Table 2. Sensitivity was consistently high across all studies, with a median value of 97.8% (95% CI: 94.8%–99.1%). Specificity was also generally high on average but exhibited considerable variability, with a median of 94.8% (95% CI: 53.0%–99.7%). In the first sensitivity analysis scenario, the median sensitivity was 97.8% (95% CI: 94.8–99.1%), and the median specificity was 94.8% (95% CI: 53.0%–99.7%). In the second scenario, the median sensitivity was 97.6% (95% CI: 93.5%–99.1%), and the median specificity was 97.5% (95% CI: 61.7%–99.9%).

Summary results regarding the variability of key diagnostic metrics—sensitivity and specificity—are illustrated in the forest plots (Figure 2) and the ROC curve plot (Figure 3). Notably, substantial heterogeneity was observed in sensitivity estimates and especially in specificity estimates.

Figure 2. Forest plot of sensitivity and specificity across studies; 95% CIs provided in parentheses.

Figure 3. Summary ROC plot (hierarchical summary receiver operating characteristic). The plot shows summary estimates, 95% confidence intervals (dotted lines), and 95% prediction intervals (dashed lines). Designations: (1) Vasilev et al. (22); (2) Blake et al. (11); (3) AlJasmi et al. (28); (4) Plesner et al. (10); (5) Yoo et al. (24); (6) Schalekamp et al. (26); (7) Subramanian et al. (25); (8) Sridharan et al. (23); (9) Annarumma et al. (21); (10) Dyer et al. (27); (11) Nabulsi et al. (20)(a) (DS-1); (12) Nabulsi et al. (20)(b) CXR-14.

In most cases, relatively narrow confidence intervals were reported, which is typical for studies with large sample sizes. However, wide confidence intervals for diagnostic performance were observed in (26) and (20).

The highest simultaneous sensitivity and specificity values were reported in the population-based real-time study by (22) and the hospital-based dataset by (27).

Additionally, sensitivity values exceeding 95.0% were found in the studies by (10, 11, 20, 23, 24, 28), although these were accompanied by a substantial reduction in specificity, ranging from 28.0% to 79.0%.

In (26), sensitivity estimates demonstrated a relatively wide confidence interval and low specificity (60%). Conversely, a high specificity of 100.0% was observed in (25), albeit with a relatively low sensitivity of 67.0%.

Analysis of variability without factors revealed substantial between-study heterogeneity: τ2 (estimated amount of residual heterogeneity) = 0.0005 and I2 (residual heterogeneity/unaccounted variability) = 100.0%. The Q test for heterogeneity also indicated significant between-study heterogeneity (p < 0.01).

Incorporating sample size as a factor into the model left the unexplained between-study heterogeneity high: τ2 = 0.0006, I2 = 100.0%, and R2 (amount of heterogeneity accounted for) = 0.0%. The intercept was statistically significant in the model (0.97, p < 0.01), while the coefficient for the factor was not (0.0041, p = 0.93).

Including year of publication as a factor yielded a nonsignificant effect: τ2 = 0.0004, I2 = 22.5%, R2 = 20.2%. The intercept (−15.0617, p = 0.54) and factor coefficient (0.0079, p = 0.51) were both nonsignificant.

The population factor effectively explained the differences between studies: τ2 = 0.0, I2 = 0.0%, R2 = 100.0%. In the model, the intercept (0.96, p < 0.01) and factor coefficient (0.04, p < 0.01) were both statistically significant. The subgroup effect estimates for population were as follows: “general” subgroup — sensitivity 0.999 ± 0.0, specificity 1.0 ± 0.0; “hospital” subgroup — sensitivity 0.962 ± 0.0, specificity 0.7 ± 0.06.

3.4 Quality assessment

The risk of bias assessment results according to QUADAS-CAD and QUADAS-2 is presented in Figure 4. A low risk of bias was assigned to six (55%) publications (11, 20, 22, 24, 26, 27). A high risk of bias was assigned to five (45%) publications (10, 21, 23, 25, 28).

Figure 4. Risk of bias and applicability concerns summary: review authors' judgements about each domain for each included study (green: low risk; yellow: some concerns; red: high risk).

The risk of bias in the Patient Selection domain stemmed from the fact that only a few screening studies used samples representative of the real-world population (22, 24, 25). In the other cases, the sample composition was either biased (10, 26, 28) or carried a potential risk of bias (11, 20, 21, 23, 27).

A high risk of bias in the Flow and Timing domain was identified in three (27%) publications (21, 23, 25) due to concerns that not all cases employed the same reference standard and, correspondingly, due to unclear timing and transparency in obtaining results.

We calculated prevalence using the GRADE tool, based on the expected rate of abnormal CXRs observed in a real-world population (22) (45%), as well as the minimum and maximum values reported in the selected studies.
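The arithmetic behind the "effect per 10,000 patients tested" columns in Table 3 (below) can be made explicit. The sketch reproduces the point estimates from the summary sensitivity (97.8%) and specificity (94.8%); the function name is ours.

```r
# How the point estimates in Table 3's "effect per 10,000" columns follow from
# the summary sensitivity (97.8%) and specificity (94.8%).
effects_per_10k <- function(prevalence, sens = 0.978, spec = 0.948, n = 10000) {
  abnormal <- n * prevalence
  normal   <- n - abnormal
  round(c(TP = abnormal * sens,        # abnormal CXRs correctly flagged
          FN = abnormal * (1 - sens),  # abnormal CXRs missed
          TN = normal * spec,          # normal CXRs correctly triaged
          FP = normal * (1 - spec)))   # normal CXRs flagged as abnormal
}

effects_per_10k(0.45)  # pre-test probability 45%: TP 4401, FN 99, TN 5214, FP 286
```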

The overall certainty of the evidence was rated as low owing to a high risk of bias affecting 45% of the included studies and the substantial heterogeneity of diagnostic estimates (Table 3), which was caused by significant differences in the analyzed populations.

Table 3

Outcome № of studies (№ of patients) Study design Factors that may decrease certainty of evidence Effect per 10 000 patients tested Test accuracy CoE
Risk of bias Indirectness Inconsistency Imprecision Publication bias pre-test probability of 0.2% pre-test probability of 45% pre-test probability of 85%
True positives (patients with abnormal CXR) 11 studies (306,592 patients) cross-sectional (cohort type accuracy study) serious (a) not serious serious (b) not serious none 20 (19 to 20) 4,401 (4,266 to 4,460) 8,313 (8,058 to 8,424) ⊕⊕○○ Low (a,b)
False negatives (patients incorrectly classified as not having abnormal CXR) 0 (0 to 1) 99 (40 to 234) 187 (76 to 442)
True negatives (patients without abnormal CXR) 11 studies (2,637,945 patients) cross-sectional (cohort type accuracy study) serious (a) not serious serious (c) not serious none 9,461 (5,289 to 9,950) 5,214 (2,915 to 5,484) 1,422 (795 to 1,496) ⊕⊕○○ Low (a,c)
False positives (patients incorrectly classified as having abnormal CXR) 519 (30 to 4,691) 286 (16 to 2,585) 78 (4 to 705)

GRADEpro estimation: any CXR abnormalities. Review question: what is the accuracy of any CXR abnormality as a screening test for detecting abnormalities in a general population? Role of index test: to estimate the number of autonomously sorted CXRs without abnormalities. Reference standards: radiologists' reports. Study design: cross-sectional studies. Any CXR abnormality summary sensitivity (95% CI): 97.8% (94.8 to 99.1); summary specificity (95% CI): 94.8% (53.0 to 99.7).

Explanations.

a

Downgraded by one because 5 of 11 studies (45%) have a high risk of bias: in the Patient Selection domain (3 studies at high risk of bias and 5 studies with some concerns), the Reference Standard domain (3 studies with some concerns), and the Flow and Timing domain (3 studies at high risk of bias and 1 study with some concerns).

b

Downgraded by one because of the high heterogeneity of the sensitivity estimates and their CIs, the variability in reference standards between studies, and the lack of clearly defined levels in the reference standards.

c

Downgraded by one because of the wide range of autonomously sorted results (15.0%–99.8%), which cannot be fully explained by inter-population variability.

3.5 Assessment of temporal and economic benefits, comparison of diagnostic accuracy results with physicians

The assessment of temporal benefits of automated triage compared to triage performed by physicians was conducted in one study (23). The study (23) reports that the mean difference between AI and radiologist turnaround time (defined as the interval from CXR upload to the system until approval of the radiology report by the radiologist) was 818.9 min, which was statistically significant (p < 0.05, Wilcoxon signed-rank test).

Economic efficiency was evaluated in a single study (22). The study (22) demonstrated economic viability, with costs reduced by 43.7% over a 5-month experimental period.

Diagnostic accuracy between AI and radiologists was compared in three studies (10, 20, 27). Overall, AI demonstrated superior diagnostic performance compared to radiologists.

In the study by (10), AI had significantly higher sensitivity for detecting abnormal CXRs than clinical radiology reports, with sensitivities of 99.1% vs. 72.3%, respectively (p < 0.001).

The study (20) reported a sensitivity of 98.0% for AI and 48.0% for radiologists on the DS-1 dataset, with specificities of 38.0% and 96.0%, respectively. For the CXR-14 dataset, sensitivity was 95.0% for AI and 54.0% for radiologists, with specificities of 71.0% and 94.0%, respectively.

In the study by (27), the false negative (FN) rate for AI was 0.33%, which was substantially lower than the 13.5% FN rate observed for radiologists.

4 Discussion

The proportion of studies that are autonomously triaged varies widely across different publications (15.0%–99.8%) and is largely determined by whether the sample was obtained through screening a natural population or represents a biased cohort with an increased risk of pathology detection.

For studies conducted on samples reflecting the natural population (22, 24, 25), the proportion of autonomously triaged studies exceeds 40%, which closely aligns with the overall average of 42.3% across all publications. In the recent study by (22) involving real-world data and a large sample of over half a million cases, the proportion of autonomously triaged CXRs was found to be 54.8%. Consequently, this value can be considered a natural and practically expected benchmark. A subgroup analysis revealed that the differences between studies conducted in general and hospital populations accounted for a significant amount of the observed heterogeneity.

On average, AI algorithms demonstrated high sensitivity across studies: 97.8% (95% CI: 94.8–99.1%). The sensitivity level of radiologists when analyzing CXRs, which serves as a benchmark, ranges from 53.6% to 95.5% and is highly dependent on the radiologists' experience and the type of pathology (29, 30). In contrast to radiologists, AI algorithms can be finely tuned by adjusting the threshold to balance sensitivity and specificity. Theoretical research on a limited dataset has demonstrated the feasibility of tuning AI services to achieve 100% sensitivity while maintaining 77.4% specificity (31). In a prospective experimental study of 209,497 screening CXRs, the false-negative rate of the AI was significantly lower (0.04%) than that of radiologists (8.6%) (32). The AI also autonomously triaged 55.9% (117,041) of CXRs in this study. Maximizing sensitivity inevitably results in decreased specificity. Nevertheless, across the majority of analyzed studies, AI also exhibited an acceptable specificity level of 94.8% (95% CI: 53.0–99.7%). Direct comparisons of the sensitivity and specificity of AI vs. radiologists in the analyzed studies (10, 20, 27) show that the performance of AI screening was not inferior to that of radiologists.
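The threshold-tuning idea mentioned above can be sketched as follows. This is a hypothetical illustration on synthetic scores, not the procedure of (31): the operating point is set at the lowest AI score observed among abnormal cases in a validation set, which guarantees 100% sensitivity on that set, and the achievable specificity is then read off at that threshold.

```r
# Hypothetical sketch of tuning an operating threshold for 100% sensitivity on
# a validation set (synthetic data; not the procedure of the cited studies).
tune_threshold <- function(scores, labels) {
  thr <- min(scores[labels == 1])                 # lowest score among abnormal CXRs
  specificity <- mean(scores[labels == 0] < thr)  # normals falling below threshold
  list(threshold = thr, sensitivity = 1.0, specificity = specificity)
}

set.seed(1)
scores <- c(rbeta(900, 2, 8), rbeta(100, 8, 2))  # synthetic normal/abnormal AI scores
labels <- c(rep(0, 900), rep(1, 100))
tune_threshold(scores, labels)
```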

The sensitivity analysis revealed no distortion of the estimated sensitivity and specificity values when extreme values were excluded from the overall sample or when studies that used prioritization instead of binarization were excluded.

Most studies used either small datasets (72.7%) or retrospective designs (81.8%), which contributed to heterogeneity and limited the quality of the evidence. Consequently, considerable interest is directed towards studies conducted in real-world settings (22, 23) with large datasets (22, 25, 28), especially those that satisfy both criteria (22). The high complexity of large-scale screening experiments under real-world conditions stems from the stringent regulatory environments in many countries (33).

Currently, only AI systems that are registered as medical devices can be integrated into clinical practice. Prospective studies include only a limited number of AI solutions, such as the ChestEye algorithm by Oxipit (34), Lunit INSIGHT CXR by Lunit (23), and qXR by Qure.ai (28). However, large-scale studies addressing financing models and the integration of autonomous triage into national healthcare systems are lacking for these solutions.

The temporal benefits of autonomous triage are often evaluated indirectly. Many researchers report a reduced workload for radiologists, primarily based on the theoretical savings of resources derived from the proportion of CXRs that are autonomously triaged. A prospective study involving 20,944 CXRs assessed time savings, approximated as potentially freed radiologist time (77%) (23).

The actual economic impact has only been evaluated in one study (22), largely due to the legislative and infrastructural environment established by the Moscow Experiment — the largest global, prospective, multicenter study on the applicability, safety and quality of AI in real clinical settings (3540). Economic efficiency calculations incorporated the costs of reporting and interpreting each CXR with and without autonomous triage, as well as the expenses involved in acquiring and maintaining the AI system. The proposed autonomous triage model yielded a 43.7% reduction in costs. Additionally, a protocol was proposed to eliminate clinically significant errors by implementing dual automated reviews, whereby two independent AI medical devices analyze each diagnostic image simultaneously, with the final decision favoring the patient. While this approach would double the interpretation fee, it is projected to maintain a 40.4% economic benefit.

This selective literature review indicates that AI is largely ready for the autonomous triage of screening CXRs, owing to its high diagnostic performance. This would enable a reduction in radiologists' workload, optimization of workflows, and budgetary savings. The proposed methodology can be scaled up to various AI solutions for CXR interpretation. The real-world experiment conducted on large datasets was successful (22). However, more studies in real clinical settings are needed to robustly assess the potential benefits and challenges of implementing this technology in practice.

Our review focuses on a specific aspect: the ability of modern AI to autonomously identify normal examinations during screening. The next step in determining the readiness of AI for clinical implementation in CXR analysis is to evaluate its ability to classify chest pathologies (41) and to detect single pathologies (42, 43) and foreign objects (44). AI has already demonstrated impressive diagnostic performance in these areas.

This review has several limitations. We selected only the most comprehensive studies that included analyses of multiple pathologies with English abstracts. The number of such studies is rather limited. Furthermore, there is a lack of research conducted in real-world conditions.

5 Conclusions and practical recommendations

A systematic review of studies has demonstrated that modern AI systems enable safe autonomous triage of CXRs, identifying on average over 40% of normal examinations with a summary sensitivity of 97.8%. In conclusion, the diagnostic performance of AI in the preventive screening of thoracic diseases and the autonomous sorting of CXRs is highly promising.

However, there is a need for large-scale studies based on real-world data, which to date have been reported only in limited publications. This limitation is associated with multiple evident legislative and infrastructural barriers that have been overcome in only a few countries.

The implementation of AI for autonomous triage has been shown to optimize workflow processes and reduce the routine workload of radiologists. Within a public healthcare system, an autonomous triage approach has been shown to substantially reduce financial costs, with a reported decrease of 43.7%. The proposed double parallel autonomous triage model is expected to eliminate clinically significant errors.

Statements

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.

Author contributions

YV: Writing – review & editing, Conceptualization, Funding acquisition, Supervision, Project administration. AB: Conceptualization, Supervision, Project administration, Writing – review & editing. RR: Supervision, Writing – review & editing, Writing – original draft, Software, Investigation, Formal analysis, Project administration, Conceptualization, Methodology. ON: Writing – original draft, Visualization, Investigation, Formal analysis, Software, Writing – review & editing. AV: Methodology, Validation, Supervision, Conceptualization, Writing – review & editing, Resources. KA: Writing – original draft, Methodology, Investigation, Writing – review & editing, Conceptualization. PG: Writing – review & editing. OO: Funding acquisition, Project administration, Resources, Writing – review & editing, Supervision.

Funding

The author(s) declared that financial support was received for this work and/or its publication. This article was prepared by a team of authors within the framework of a scientific and practical project in the field of medicine (No. EGISU: 125051305989-8) “A promising automated workplace of a radiologist based on generative artificial intelligence” in accordance with ordinance no. 656-PP dated April 1, 2025: on approval of state assignments funded from the Moscow city budget to state budgetary (autonomous) institutions subordinated to the Moscow Healthcare Department.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fdgth.2025.1685771/full#supplementary-material

References

  • 1.

    Rimmer A . Radiologist shortage leaves patient care at risk, warns royal college. Br Med J. (2017) 359:j4683. 10.1136/bmj.j4683

  • 2.

Golubev Ogryzko EV Tyurina EM Shelepova EA Shelekhov PV . Features of the development of the radiation diagnostics service in the Russian federation for 2014–2019. Curr Prob Health Care Med Stats. (2021) 2:356–76. 10.24412/2312-2935-2021-2-356-376

  • 3.

Shelekhov PV . Personnel situation in radiative diagnostics. Curr Prob Health Care Med Stats. (2019) 1:265–75. 10.24411/2312-2935-2019-10018

  • 4.

Chew C O'Dwyer PJ Young D . Radiology and the medical student: do increased hours of teaching translate to more radiologists? BJR Open. (2021) 3(1):20210074. 10.1259/bjro.20210074

  • 5.

Bouck Z Mecredy G Ivers NM Pendrith C Fine B Martin D et al Routine use of chest x-ray for low-risk patients undergoing a periodic health examination: a retrospective cohort study. CMAJ Open. (2018) 6(3):E322–9. 10.9778/cmajo.20170138

  • 6.

Twabi HH Semphere R Mukoka M Chiume L Nzawa R Feasey HRA et al Pattern of abnormalities amongst chest x-rays of adults undergoing computer-assisted digital chest x-ray screening for tuberculosis in Peri-Urban Blantyre, Malawi: a cross-sectional study. Trop Med Int Health. (2021) 26(11):1427–37. 10.1111/tmi.13658

  • 7.

Markelov Y Shchegoleva LV . Clinical and economic aspects of tuberculosis detection during mass fluorographic examinations of the population. J Radiol Nuclear Med. (2021) 102(3):148–54. 10.20862/0042-4676-2021-102-3-148-154

  • 8.

Markelov Y Schegoleva LV . Evaluation of clinical and economic efficiency and impact of mass fluorography screening on tuberculosis epidemiological rates in four federal districts of the Russian federation with different levels of population coverage with mass fluorography screening. Tuberculosis Lung Dis. (2023) 101(1):8–16. 10.58838/2075-1230-2023-101-1-8-16

  • 9.

Nechaev VA Vasilev A . Risk factors for development of perceptual errors in radiology. Vestnik SurGU Meditsina. (2024) 17(4):14–22. 10.35266/2949-3447-2024-4-2

  • 10.

Plesner LL Müller FC Nybing JD Laustrup LC Rasmussen F Nielsen OW et al Autonomous chest radiograph reporting using AI: estimation of clinical impact. Radiology. (2023) 307(3):e222268. 10.1148/radiol.222268

  • 11.

Blake SR Das N Tadepalli M Reddy B Singh A Agrawal R et al Using artificial intelligence to stratify normal versus abnormal chest x-rays: external validation of a deep learning algorithm at east Kent hospitals university NHS foundation trust. Diagnostics. (2023) 13(22):3408. 10.3390/diagnostics13223408

  • 12.

Barends E Rousseau DM Briner RB (Eds). CEBMA Guideline for Rapid Evidence Assessments in Management and Organizations, Version 1.0. Amsterdam: Center for Evidence Based Management (2017). p. 39. Available online at: https://cebma.org/resources/frequently-asked-questions/what-is-a-rapid-evidence-assessment-rea/ (Accessed May 16, 2025).

  • 13.

Vasilev YuA Vladzymyrskyy AV Omelyanskaya OV Reshetnikov RV Shumskaya YuF . Guidelines for writing a systematic review. Part 2: studies screening, data extraction. Moscow: State Budget-Funded Health Care Institution of the City of Moscow “Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow Health Care Department”. (2023). p. 32. Available online at: https://telemedai.ru/biblioteka-dokumentov/metodicheskie-rekomendacii-po-podgotovke-sistematicheskogo-obzora-chast-2-skrining-issledovanij-izvlechenie-dannyh (Accessed May 16, 2025).

  • 14.

Patel A Cooper N Freeman S Sutton A . Graphical enhancements to summary receiver operating characteristic plots to facilitate the analysis and reporting of meta-analysis of diagnostic test accuracy data. Res Syn Meth. (2021) 12:34–44. 10.1002/jrsm.1439

  • 15.

Whiting PF Rutjes AW Westwood ME Mallett S Deeks JJ Reitsma JB et al QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. (2011) 155(8):529–36. 10.7326/0003-4819-155-8-201110180-00009

  • 16.

    Kodenko MR Vasilev Y Vladzymyrskyy AV Omelyanskaya OV Leonov DV Blokhin IA et al Diagnostic accuracy of AI for opportunistic screening of abdominal aortic aneurysm in CT: a systematic review and narrative synthesis. Diagnostics. (2022) 12:3197. 10.3390/diagnostics12123197

  • 17.

    Kodenko MR Reshetnikov RV Makarova TA . Modification of quality assessment tool for artificial intelligence diagnostic test accuracy studies (QUADAS-CAD). Digit Diagnostics. (2022) 3(S1):45. 10.17816/DD105567

  • 18.

Schünemann HJ Mustafa RA Brozek J Steingart KR Leeflang M Murad MH et al GRADE Guidelines: 21 part 1. Study design, risk of bias, and indirectness in rating the certainty across a body of evidence for test accuracy. J Clin Epidemiol. (2020) 122:129–41. 10.1016/j.jclinepi.2019.12.020

  • 19.

Schünemann HJ Mustafa RA Brozek J Steingart KR Leeflang M Murad MH et al GRADE Guidelines: 21 part 2. Test accuracy: inconsistency, imprecision, publication bias, and other domains for rating the certainty of evidence and presenting it in evidence profiles and summary of findings tables. J Clin Epidemiol. (2020) 122:142–52. 10.1016/j.jclinepi.2019.12.021

  • 20.

    Nabulsi Z Sellergren A Jamshy S Lau C Santos E Kiraly AP et al Deep learning for distinguishing normal versus abnormal chest radiographs and generalization to two unseen diseases tuberculosis and COVID-19. Sci Rep. (2021) 11:15523. 10.1038/s41598-021-93967-2

  • 21.

Annarumma M Withey SJ Bakewell RJ Pesce E Goh V Montana G . Automated triaging of adult chest radiographs with deep artificial neural networks. Radiology. (2019) 291(1):196–202. 10.1148/radiol.2018180921

  • 22.

Vasilev Y Sychev DA Bazhin AV Shulkin IM Vladzymyrskyy AV Golikova A et al Autonomous artificial intelligence for sorting results of preventive radiological examinations of chest organs: medical and economic efficiency. Digit Diagnostics. (2025) 6(1):5–22. 10.17816/DD641703

  • 23.

    Sridharan S Xin Hui AS Venkataraman N Tirukonda PS Jeyaratnam RP John S et al Real-world evaluation of an AI triaging system for chest x-rays: a prospective clinical study. Eur J Radiol. (2024) 181:111783. 10.1016/j.ejrad.2024.111783

  • 24.

Yoo H Kim EY Kim H Choi YR Kim MY Hwang SH et al Artificial intelligence-based identification of normal chest radiographs: a simulation study in a multicenter health screening cohort. Korean J Radiol. (2022) 23(10):1009–18. 10.3348/kjr.2022.0189

  • 25.

Subramanian B Jaikumar S Kumarasami PSN Sivasailam K Keerthana ADR Mounigasri M et al Autonomous AI for multi-pathology detection in chest x-rays: a multi-site study in Indian healthcare system. arXiv [preprint]. (2025). Available online at: https://arxiv.org/html/2504.00022v1 (Accessed August 04, 2025)

  • 26.

Schalekamp S van Leeuwen K Calli E Murphy K Rutten M Geurts B et al Performance of AI to exclude normal chest radiographs to reduce radiologists' workload. Eur Radiol. (2024) 34:7255–63. 10.1007/s00330-024-10794-5

  • 27.

Dyer T Dillard L Harrison M Morgan TN Tappouni R Malik Q et al Diagnosis of normal chest radiographs using an autonomous deep-learning algorithm. Clin Radiol. (2021) 76(6):473.e9–e15. 10.1016/j.crad.2021.01.015

  • 28.

    AlJasmi AAM Ghonim H Fahmy ME Nair A Kumar S Robert D et al Post-deployment performance of a deep learning algorithm for normal and abnormal chest x-ray classification: a study at visa screening centers in the United Arab Emirates. Eur J Radiol Open. (2024) 13:100606. 10.1016/j.ejro.2024.100606

  • 29.

Bradley SH Bhartia BSK Callister MEJ Hamilton WT Hatton NLF Kennedy MPT et al Chest x-ray sensitivity and lung cancer outcomes: a retrospective observational study. Br J Gen Pract. (2021) 71(712):e862–8. 10.3399/BJGP.2020.1099

  • 30.

Hossain R Wu CC de Groot PM Carter BW Gilman MD Abbott GF . Missed lung cancer. Radiol Clin North Am. (2018) 56(3):365–75. 10.1016/j.rcl.2018.01.004

  • 31.

Vasilev Y Tyrov IA Vladzymyrskyy AV Arzamasov KM Pestrenin LD Shulkin IM . A new model of organizing mass screening based on stand-alone artificial intelligence used for fluorography image triage. Public Health Life Environ. (2023) 31(11):23–32. 10.35627/2219-5238/2023-31-11-23-32

  • 32.

Vasilev Y Tyrov IA Vladzymirskyy AV Shulkin IM Arzamasov KM . Autonomous artificial intelligence for sorting the preventive imaging studies' results. Russian J Prev Med. (2024) 27(7):23–9. 10.17116/profmed20242707123

  • 33.

    Mennella C Maniscalco U De Pietro G Esposito M . Ethical and regulatory challenges of AI technologies in healthcare: a narrative review. Heliyon. (2024) 10(4):e26297. 10.1016/j.heliyon.2024.e26297

  • 34.

Miró Catalina Q Vidal-Alaball J Fuster-Casanovas A Escalé-Besa A Ruiz Comellas A Solé-Casals J . Real-world testing of an artificial intelligence algorithm for the analysis of chest x-rays in primary care settings. Sci Rep. (2024) 14(1):5199. 10.1038/s41598-024-55792-1

  • 35.

Bobrovskaya TM Vasilev Y Nikitin N Arzamasov KM . Approaches to building radiology datasets. Med Doctor IT. (2023) 4:14–23. 10.25881/18110193_2023_4_14

  • 36.

Vasilev Y Vladzymyrskyy AV Omelyanskaya OV Arzamasov KM Chetverikov SF Rumyantsev DA et al Methodology for testing and monitoring artificial intelligence-based software for medical diagnostics. Digit Diagnostics. (2023) 4(3):252–67. 10.17816/DD321971

  • 37.

Vasilev Y Vladzymyrskyy AV Arzamasov KM Shulkin IM Aksenova LE Pestrenin LD et al The first 10,000 mammography exams performed as part of the “description and interpretation of mammography data using artificial intelligence” service. Manager Zdravookhranenia. (2023) 8:54–67. 10.21045/1811-0185-2023-8-54-67

  • 38.

Vasilev Y Arzamasov KM Kolsanov AV Vladzymyrskyy AV Omelyanskaya OV Pestrenin LD et al Experience of application artificial intelligence software on 800 thousand fluorographic studies. Medi Doctor Info Technol. (2023) 4:54–65. 10.25881/18110193_2023_4_54

  • 39.

Vasilev Y Vladzimirsky AV Arzamasov KM Andreichenko AE Gombolevskiy VA Kulberg NS et al Computer Vision in Radiology: Stage one of the Moscow Experiment. 2nd ed. Moscow: Publishing solutions (2023). p. 376. (In Russ).

  • 40.

    Vasilev Y Vladzimirsky AV . Artificial intelligence in radiology: per Aspera ad astra. Moscow: Publishing Solutions (2025). p. 491. (In Russ).

  • 41.

    Kufel J Bielówka M Rojek M Mitręga A Lewandowski P Cebula M et al Multi-label classification of chest x-ray abnormalities using transfer learning techniques. J Pers Med. (2023) 13:1426. 10.3390/jpm13101426

  • 42.

    DuPont M Castro R Kik SV Palmer M Seddon JA Jaganath D . Computer-aided reading of chest radiographs for paediatric tuberculosis: current status and future directions. Lancet Digit Health. (2025) 7:100884. 10.1016/j.landig.2025.100884

  • 43.

    Linsen S Kamoun A Gunda A Mwenifumbo T Chavula C Nchimunya L et al A comparison of CXR-CAD software to radiologists in identifying COVID-19 in individuals evaluated for Sars CoV-2 infection in Malawi and Zambia. PLoS Digit Health. (2025) 4:e0000535. 10.1371/journal.pdig.0000535

  • 44.

    Kufel J Bargieł-Łączek K Koźlik M Czogalik Ł Dudek P Magiera M et al Chest x-ray foreign objects detection using artificial intelligence. J Clin Med. (2023) 12:5841. 10.3390/jcm12185841

Appendix 1 List of articles excluded during full-text screening

First Author/Year Title/Journal DOI Reason for exclusion
1 Yoon 2024 Use of artificial intelligence in triaging of chest radiographs to reduce radiologists' workload/European Radiology 10.1007/s00330-023-10124-1 Comparison of radiologists' readings with and without AI assistance.
2 Pillai 2022 Multi-Label Chest X-Ray Classification via Deep Learning/Journal of Intelligent Learning Systems and Applications 10.4236/jilsa.2022.144004 Classification of individual pathologies by different AI systems.
3 Albahli 2021 AI-driven deep CNN approach for multi-label pathology classification using chest X-Rays/PeerJ Computer Science 10.7717/peerj-cs.495 Classification of individual pathologies by AI.
4 Karthik 2021 Performance of a deep learning CNN model for the automated detection of 13 common conditions on Chest x-rays/TechRxiv 10.36227/techrxiv.16725727.v1 Classification of individual pathologies by AI.
5 Govindarajan 2022 Role of an Automated Deep Learning Algorithm for Reliable Screening of Abnormality in Chest Radiographs: A Prospective Multicenter Quality Improvement Study/Diagnostics 10.3390/diagnostics12112724 Classification of individual pathologies by AI.


Keywords

AI, autonomous triage, chest x-ray, radiology, screening

Citation

Vasilev Y, Bazhin A, Reshetnikov R, Nanova O, Vladzymyrskyy A, Arzamasov K, Gelezhe P and Omelyanskaya O (2026) Autonomous chest x-ray image classification, capabilities and prospects: rapid evidence assessment. Front. Digit. Health 7:1685771. doi: 10.3389/fdgth.2025.1685771

Received

14 August 2025

Revised

02 December 2025

Accepted

18 December 2025

Published

13 January 2026

Volume

7 - 2025

Edited by

Xiaofeng Li, Heilongjiang International University, China

Reviewed by

Mansoor Fatehi, National Brain Mapping Laboratory (NBML), Iran

Jakub Kufel, Medical University of Silesia, Poland


* Correspondence: Olga Nanova

