
SYSTEMATIC REVIEW article

Front. Neurol., 01 December 2025

Sec. Sleep Disorders

Volume 16 - 2025 | https://doi.org/10.3389/fneur.2025.1663851

Accuracy of deep learning in diagnosis of apnea syndrome: a systematic review and meta-analysis

  • 1. Cardiac Function Department, Heart Center, The First Affiliated Hospital of Xinjiang Medical University, Ürümqi, China

  • 2. Department of Science and Technology, Xinjiang Medical University, Ürümqi, China

  • 3. The Third General Department, The First Affiliated Hospital of Xinjiang Medical University, Ürümqi, China

Abstract

Objectives:

This systematic review and meta-analysis was carried out to elucidate the accuracy of image-based deep learning (DL) methods in the real-time detection of obstructive sleep apnea syndrome (OSAS).

Methods:

A systematic search was conducted for studies published from database inception to September 25, 2025, across PubMed, Embase, Web of Science, and the Cochrane Library. The included studies were assessed for risk of bias using the QUADAS-2 tool. A bivariate mixed-effects model was employed for the meta-analysis, and only results from the validation sets were synthesized. Subgroup analyses were conducted according to how the validation sets were generated.

Results:

A total of 39 original studies were ultimately included, all of which built DL models on images derived from electrocardiograms (ECGs). For the comprehensive validation set, the sensitivity, specificity, diagnostic odds ratio (DOR), and area under the summary receiver operating characteristic (SROC) curve were 0.93 (95% CI: 0.90–0.96), 0.95 (95% CI: 0.92–0.96), 252 (95% CI: 116–549), and 0.98 (95% CI: 0.42–1.00), respectively. For the independent validation set, the sensitivity, specificity, and area under the SROC curve were 0.93 (95% CI: 0.88–0.96), 0.95 (95% CI: 0.92–0.97), and 0.98 (95% CI: 0.42–1.00), respectively. For the K-fold cross-validation set, the sensitivity, specificity, positive likelihood ratio (PLR), and area under the SROC curve were 0.94 (95% CI: 0.88–0.97), 0.94 (95% CI: 0.89–0.96), 15.0 (95% CI: 8.1–27.6), and 0.98 (95% CI: 0.65–1.00), respectively.

Conclusion:

ECG-based DL models demonstrate high accuracy for the detection of OSAS and appear to be a viable method for real-time detection. During our research, we found that the models were built on segments extracted from ECG recordings, and that the duration of these segments varied between studies. Since segment duration was not examined in a subgroup analysis in our study, we plan to explore and validate this aspect in subsequent research.

Systematic review registration:

CRD42023465176, https://www.crd.york.ac.uk/PROSPERO/home.

Introduction

OSAS is a clinical syndrome characterized by recurrent upper respiratory tract obstruction during sleep due to various causes, which leads to fragmented sleep and intermittent hypoxia during sleep periods (1). Studies have found that in patients with hypertension, coronary artery disease, heart failure, pulmonary hypertension, atrial fibrillation, and stroke, the prevalence of OSAS is as high as 40 to 80% (2). Its characteristics include repeated partial or complete respiratory pauses due to upper respiratory tract obstruction, affecting ventilation during sleep (3). Obstructive sleep apnea is very common among patients with cardiovascular diseases and is associated with the incidence and prevalence of hypertension, arrhythmias, coronary heart disease, heart failure, and stroke (4). It was found that the number of individuals affected by OSAS remains high, with an estimated 1 billion people suffering from OSAS worldwide (5).

As of now, the diagnostic process for OSAS is quite laborious. The gold standard method for diagnosing OSAS requires a polysomnography (PSG) system, a hospital setting, technical personnel, and a specialist physician (6). PSG involves recording, analyzing, and simultaneously collecting changes in multiple physiological signals, typically including but not limited to ECG and respiratory signals (6). However, a PSG device involves nearly 60 electrodes, making it difficult for a patient to have a normal sleep experience with so many connecting cables. Therefore, it becomes especially important to employ a method that not only offers high diagnostic reliability but is also simpler and more comfortable for patients (7).

In recent years, AI has begun to emerge and develop rapidly. In 2022, the highest investment was directed toward the healthcare and medical care sectors (8). Many anticipate that AI will achieve similar successes in the health sector, especially in diagnostics. Some believe that AI applications might even replace entire medical disciplines or create new roles to assist physicians (9). Against this background, medical image-based DL methods have gradually garnered extensive attention from researchers in clinical practice. For instance, in oncology, DL methods are used to identify whether tumors are benign or malignant from pathological images, with applications including early cancer detection, diagnosis, tumor classification and grading, molecular characterization, prognosis prediction, treatment response prediction, personalized treatment, automated radiation therapy workflows, discovery of novel anticancer drugs, and support in clinical trials (10, 11). Furthermore, we have noted other studies on DL and its application in diagnosing various types of diseases (12, 13).

In this context, some researchers have attempted to develop ECG-based DL models for detecting OSAS. However, the results from these DL approaches remain controversial, presenting certain challenges for the development of AI in this field. Therefore, this systematic review and meta-analysis was carried out to investigate the effectiveness and safety of DL methods based on medical images for real-time monitoring of OSAS, and to provide evidence-based opinions for the development and update of future real-time monitoring tools, such as wearable devices.

Methods

Study registration

This systematic review and meta-analysis adhered to the PRISMA, PRISMA NMA, and DMA guidelines and was prospectively registered on PROSPERO (ID: CRD42023465176).

Eligibility criteria

Inclusion criteria

  • Studies such as case–control, cohort, case-cohort, nested case–control, and cross-sectional studies.

  • Studies that have fully constructed DL models for diagnosing OSA.

  • Some studies did not use other datasets for validation of the constructed models, and only performed cross-validation. These studies were also included in this systematic review.

  • Different DL studies were based on the same dataset. These studies were also included in our systematic review.

  • Included studies were original research reported in English.

Exclusion criteria

  • Studies such as meta-analyses, literature reviews, conference papers, guidelines, and expert opinions.

  • Given that this is a study on DL, which places a greater emphasis on the performance in the validation set, studies that did not perform any form of validation were excluded from our research.

  • Studies lacking an assessment of DL model accuracy with the following outcome measures: ROC curve, C-index, sensitivity, specificity, accuracy, precision, confusion matrix, F1 score, and calibration curve.

  • Studies focused on image segmentation.

  • Studies with populations being neonates and children.

Data sources and search strategy

A systematic search was conducted for relevant literature published up to September 25, 2025, in the PubMed, Embase, Cochrane, and Web of Science databases, using MeSH and free-text terms, without restrictions on publication region or year (see Table S1 for the search strategy). To minimize the risk of missing newly published literature, we conducted a supplementary search of the databases on September 25, 2025.

Study selection and data extraction

Obtained articles were imported into EndNote. After excluding duplicates, original studies that initially met the criteria were reviewed by titles and abstracts. Full texts of preliminary eligible articles were downloaded for further screening to select the final original studies for this review.

Prior to extracting data, a standardized data extraction spreadsheet was developed, including title, the first author, year of publication, author’s country, study type, patient source, image source, number of patients with sleep apnea syndrome, the total number of cases involved in the generation of the validation set, number of cases of sleep apnea syndrome in the validation set, model type used, and whether a comparison with clinical physicians was made.

The literature selection and data extraction were independently done by two researchers (Alimila Saiyitijiang; Zhihui Nai) and cross-checked. In the event of any disputes, a third reviewer was engaged for resolution.

Risk of bias in studies

The risk of bias and applicability of the included studies were evaluated by adopting the QUADAS-2 tool (14). This tool evaluates four domains: patient selection, index test, reference standard, and flow and timing. Each domain contains specific questions answered as “Yes,” “No,” or “Uncertain,” corresponding to a judgment of bias risk as “Low,” “High,” or “Uncertain,” respectively. Studies are considered at low risk of bias if all signaling questions in each domain are answered with “Yes.” Any “No” among the answers to the signaling questions indicates potential bias, and the evaluator must then judge the risk of bias according to the guidelines provided. An “Uncertain” rating refers to situations with insufficient information for a definitive judgment.

Synthesis methods

Data analysis was performed using Stata 15.0. A bivariate mixed-effects model was employed for the meta-analysis. Pooled estimates of sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), and diagnostic odds ratio (DOR), with corresponding 95% CIs, were calculated using this model. The area under the SROC curve was also estimated. Deeks’ funnel plot was adopted to assess publication bias. Additionally, a nomogram was employed to evaluate the clinical applicability of DL, with the prevalence of OSA in the included studies used as the prior probability. Furthermore, we conducted subgroup analyses by method of validation set generation (independent validation and k-fold cross-validation). p < 0.05 was deemed statistically significant.
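The accuracy measures named in this section are all functions of a study's 2×2 table. As a minimal illustration only (this is not the bivariate mixed-effects model fitted in Stata, which pools studies jointly), the Python sketch below computes the per-study measures from raw counts. The function name, the `Accuracy` container, and the example counts are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass
class Accuracy:
    sensitivity: float
    specificity: float
    plr: float  # positive likelihood ratio
    nlr: float  # negative likelihood ratio
    dor: float  # diagnostic odds ratio


def accuracy_from_counts(tp: int, fp: int, fn: int, tn: int) -> Accuracy:
    """Per-study diagnostic accuracy measures from a 2x2 table."""
    sens = tp / (tp + fn)        # true positives among all diseased
    spec = tn / (tn + fp)        # true negatives among all non-diseased
    plr = sens / (1 - spec)      # how much a positive result raises the odds
    nlr = (1 - sens) / spec      # how much a negative result lowers the odds
    dor = plr / nlr              # equals (tp * tn) / (fp * fn)
    return Accuracy(sens, spec, plr, nlr, dor)


# Hypothetical counts for illustration, not from any included study
result = accuracy_from_counts(tp=930, fp=50, fn=70, tn=950)
print(result)
```

Because DOR is the ratio of PLR to NLR, a single study with sensitivity 0.93 and specificity 0.95 has a DOR of roughly 250, which is the same order of magnitude as the pooled DOR reported in this review.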

Results

Study selection

From the various databases, we retrieved 3,697 publications. After removing 2,102 duplicates, we further excluded articles for the following reasons: not related to the subject of this study or not involving DL (1,435 articles); not written in English (18 articles); and systematic reviews, guidelines, registries, or studies using other research methods (99 articles). Additionally, 4 articles were excluded because the full text could not be downloaded or the data were incomplete. Ultimately, 39 publications (15–53) were included in the study (Figure 1).

Figure 1

Flowchart illustrating study selection via databases and registers. Identification includes records from Embase (1367), Cochrane (45), Web of Science (1927), PubMed (358), with 3636 total. 3636 records are screened; 2102 duplicates removed, 1435 ineligible by automation, 99 for other reasons. After title and abstract screening, 61 records remain, 18 excluded. 43 reports sought for retrieval, none not retrieved. Eligibility assessment of 43 reports leads to 3 excluded for repetitive data, 1 for lacking outcome measures. 39 studies included in review.

Literature screening flowchart.

Study characteristics

These articles were published primarily between 2018 and 2023, and only one (48) was published in 2011; 21 of these articles were from China (16, 17, 20–23, 25, 26, 29, 32–34, 36–39, 44–46, 50, 53), 5 from India (27, 29–31, 52), 3 from the United States (24, 40, 51), 3 from Iran (15, 19, 35), 2 from Korea (18, 41), 1 from Australia (42), 1 from Bangladesh (43), 1 from Jordan (48), and 1 from Turkey (49). Data from these articles were derived primarily from the 2000 Cardiology Association Physical Apnea-ECG Database, the PhysioNet publicly available Apnea-ECG Database, the UCDDB, and the Philips University Physical Apnea-ECG Database. In these studies, only 2 articles (33, 53) used 5-min segments, 1 (28) used 3-min segments, and 1 (26) used 6-min segments, while the rest of the articles used 1-min segments. Fourteen articles (15, 16, 22, 26, 28, 29, 31–33, 38, 44, 47–49) employed internal random sampling for validation, 13 (18, 19, 23, 24, 27, 30, 35, 36, 39, 40, 50–52) used K-fold cross-validation, and 13 (17, 19, 21, 25, 34, 37, 41–43, 45, 46, 51, 52) utilized external multicenter validation (Table 1).

Table 1

No. First author Year of publication Author’s country Patient source Image source Segment duration (min) Type of deep learning models Total cases of OSA Total number of cases
1 Bahrami, Mahsa 2022 Iran Single-center Single lead 1 CNN P: 19; I: 13,066 P: 32; I: 34,428
2 Chang, H. Y 2020 China, Taiwan Single-center Single lead 1 CNN 13,174 Segments: 34,2,131
3 Chen, J. 2022 China Multicenter Single lead 1 CNN 34,347 segments
4 Ullah, N 2023 Korea Single-center Single lead 1 DCDA P: Unknown; I: 13,060 P: 70; I: 33,060
5 Zarei, A 2022 Iran Multicenter Single lead/dynamic 3-lead 1 CNN Holdout set OSA segments: 6,550 P: 70; Segments: 34,313
6 Yang, Q. 2022 China Internal Single lead 1 ResNet Total apnea segments: 13,048 P: 32 individuals; Segments: 34,129
7 Wang, Z. 2022 China Multicenter Single lead 1 URNet. Database A contained 12,963 OSA segments and database B contained 10,337 OSA segments, with 8,328 in the training set and 2,009 in the test set. Database A contained a total of 33,645 segments and database B contained 27,488 training and testing segments, involving 62 individuals
8 Fei Teng 2022 China Single-center Single lead 1 DCNN Unknown 70 recordings
9 Febryan Setiawan 2022 China, Taiwan Internal Single lead 1 CNN 20 recordings 35 recordings
10 Tanmoy Pau 2022 USA Single-center Single lead 1 DFNN A total of 1,072 ECG signals for OSA, with 271 test sequences being OSA Total ECG sequences: 2,606, with 652 sequences in the test set
11 Qin, H 2022 China Multicenter Single lead 1 DCNN Training set OSA sequences: 15,637; test set OSA sequences: 13,023 The training set consisted of 105 single-lead ECG recordings from overnight sessions, totaling 52,004 RR sequences; the test set included 70 ECG recordings, totaling 33,992 RR sequences.
12 Shuaicong Hu 2022 China Internal Single lead 6 CNN A total of 13,066 OSA cases, with the test set comprising 6,552 OSA segments 70 recordings, totaling 34,428 segments (with 17,303 segments in the test set)
13 Kapil Gupta 2022 India Single-center Single lead 1 CNN The database contained 70 overnight ECG records, which included a total of 11,620 segments after segmentation. 70 recordings
14 Liu, H 2023 China Single-center Single lead 3 CNN Unknown 70 recordings (totaling 34,313 segments, of which 17,045 were in the release set, and 17,268 were in the holdout set)
15 Kumar Tyagi, P 2023 India Single-center Single lead 1 CNN A total of 13,174 segments, with 6,657 OSA cases in the training group and 6,517 OSA cases in the validation group 70 recordings, totaling 34,212 min, 16,979 min in the training group, and 17,233 min in the validation group
16 Kumar, Chandra Bhushan 2023 India Internal 3-lead 1 CNN 70 (35 for the training set and 35 for the test set)
17 Hemrajani, P 2023 India Single-center Single lead 1 CNN 70 individuals, 70 recordings (35 for the holdout set and 35 for the released set) The holdout set was used to train the model, and the release set was used for validation.
18 Chen, X 2023 China Single-center Single lead 1 BAF-Net 70 individuals, 70 recordings (half for the training set and half for the validation set)
19 Xianhui Chen 2022 China Single-center Single lead 5 CNN 46 (23 patients each in the training set and test set) 70 recordings, 70 individuals (34,039 min segments, with 16,945 segments in the test set)
20 Keyan Cao 2022 China Multicenter Single lead 1 CNN In database 1, the training set contained 6,538 OSA segments, and the validation set contained 6,490 OSA segments; database 2 contained 2,633 OSA segments. Database 1 consisted of 70 ECG signals totaling 33,715 segments, with 35 recordings making up a holdout set of 16,833 segments and 35 recordings in the release set totaling 16,882 segments. The other database included 25 ECG signals totaling 10,217 segments.
21 Mahsa Bahrami 2022 Iran Single-center Single lead 1 CNN 32 individuals with 70 ECG records
22 Yuankai YU 2021 China Single-center Single lead 1 CNN 32 individuals with 70 ECG records
23 Kunyang Li 2018 China Multicenter Single lead 1 CNN Among the 70 recordings, the release set was utilized for training classifiers, while the holdout set was used for validation. In total, there were 34,313 segments included in the two groups, with the release set comprising 17,045 segments and the holdout group comprising 17,268 segments.
24 Lei Wang 2019 China Single-center Single lead 1 CNN 35 individuals, 35 recordings totaling 16,988 min of segments, with apnea types including 6,496 min and non-apnea types including 10,492 min.
25 Feng, K. C. 2021 China Single-center Single lead 1 CNN 32 individuals, 70 overnight ECG records (4 individuals each with 1 recording, 22 individuals each with 2 recordings, 2 individuals each with 3 recordings, and 4 individuals each with 4 recordings). The dataset was evenly divided into a release set and a holdout set.
26 Faust, O 2021 USA Single-center Single lead 1 CNN 35 recordings
27 Urtnasan, E. 2018 South Korea Multicenter Single lead 1 CNN The training and testing datasets consisted of data from events involving 63 patients (34,281 events) and 19 patients (8,571 events), respectively. A total of 82 subjects were randomized into two groups to form the training and testing datasets. The training dataset group included 17 cases of mild OSA, 23 cases of moderate OSA, and 23 cases of severe OSA. The testing dataset group comprised 5 cases of mild OSA, 7 cases of moderate OSA, and 7 cases of severe OSA.
28 Sharan, R. V 2020 Australia Multicenter Single lead 1 CNN 70 overnight ECG recordings, 35 recordings were used for training the model, and the other 35 were designated for testing
29 Mashrur, F. R 2021 Bangladesh Multicenter Single lead 1 CNN 70 subjects, 35 in the release set, while another 35 retained. The release dataset included 6,514 min of apnea events and 10,496 min of non-apnea events.
30 Junming Zhang 2021 China Single-center Single lead 1 CNN 70 recordings
31 Fang, H 2022 China Multicenter Single lead 1 CNN 32 subjects, a total of 33,752 segments were retained, with 16,743 segments allocated to the training set and 17,009 segments designated for the testing set.
32 Shen, Q 2021 China Multicenter Single lead 1 CNN 70 overnight ECG recordings, 35 recordings were used for training the model, and the other 35 were designated for testing
33 Thompson, S 2020 UK Single-center Single lead 1 CNN 70 individuals, 35 recordings, totaling 17,125 min (or 285 h and 25 min) of sleep time; of which 6,514 min (or 108 h and 34 min) were apnea, and 10,611 min (or 176 h and 51 min) were non-apnea.
34 Lweesy, K 2011 Jordan Single-center 12-lead 1 CNN 25 individuals (1,500 data columns)
35 Nasifoglu, H 2021 Turkey Single-center Single lead CNN 152 recordings
36 Niroshana, S. M. I. 2021 China, Taiwan Single-center Single lead 1 CNN 70 recordings, divided into two groups (release group and holdout group), each with 35 subjects
37 Sheta, A. 2021 USA Single-center Single lead CNN 70 primary records, evenly divided into a learning set and a test set containing 35 records
38 Singh, H 2020 India Multicenter Single lead 1 CNN From the Apnea-ECG database, a total of 6,509 cases of apnea events and 10,442 instances of normal events were obtained. 70 recordings
39 Wang, T 2019 China Multicenter Single lead 5 CNN 70 recordings
Table 1 (continued; one row per study, in the same order as No. 1–39 above): Validation set generation method; Number of OSA cases in validation set; Total cases in validation set
Internal validation
Internal validation 6,517 min Contents in the holdout set were all used for validation (17,234 min)
External validation
10-fold cross validation 2,351 images (20% of the validation set) 5,951 images
External validation (multicenter) 35 recordings
External validation 17,164 segments were used for the test set, among which 6,536 segments indicated apnea 17,164 segments were used for the test set, among which 6,536 segments indicated apnea
External validation (multicenter) 2009 10 individuals, involving a total of 4,574 segments
Internal validation (random sampling) 35 (a total of 70 records, divided into groups of 35 each; 70% of one group was used for training, and the remaining 30% for testing)
10-fold cross validation
10-fold cross validation The test set included 271 OSA sequences The test set included a total of 652 sequences
External validation (all 105 recordings were used for training, and all 70 recordings were used for testing). OSA sequence: 13,023 70 recordings with a total of 33,992 RR sequences
Internal validation Training set (35 recordings) and test set (35 recordings), with a total of 17,125 marked segments for both the training and test sets (6,514 marked as OSA and 10,611 marked as normal), and 17,303 segments (6,552 marked OSA and 10,751 marked as normal)
A 10-fold cross validation strategy was utilized to construct the model. The entire dataset was divided into 10 parts, with eight parts used for training, one part for validation, and the remaining for testing. This process was repeated 10 times, each time with changes to the testing, validation, and training data. 80% was used as the training set, 10% as the test set.
Internal validation (80% of the release set was used for training, 20% for validation; the holdout set was used for testing) The training and validation sets had a ratio of 8:2
Internal validation 6,517 min 35 recordings, 17,233 min
10-fold cross validation
Internal validation Unknown 35 recordings
Internal validation Specifically, the training set was further divided into training and validation sets with a stratification ratio of 70%:30%
Internal validation 23 35 recordings
External validation (multicenter) In dataset 1, the release set consisted of 16,833 segments, with 6,538 being OSA segments and 10,295 being normal segments. The holdout set comprised 16,882 RRIs segments, with 6,490 for OSA and 10,392 for normal segments. The release set was used for training classifiers, while the holdout set was utilized to validate the performance of OSA detection algorithms. In the UCDDB dataset 2, 10,217 segments were retained, including 2,633 OSA segments and 7,584 normal segments, all of which were used for model testing
10-fold stratified cross-validation 10% of the training data was used as a validation set
10-fold cross validation
Internal validation
Internal validation 5 recordings
10-fold cross validation The holdout set was used for testing, where OSA segments numbered 6,502, and normal segments 10,611 The total processed segments were 33,976 (release set: 16,863, holdout set: 17,113). The released collection was randomly divided into two parts: Part A (6,699 segments/14 subjects) and Part B (10,164 segments/21 subjects)
10-fold cross-validation + leave-one-out Validation segments: 935,462
External validation (multicenter)
External validation (multicenter) The training dataset comprised 16,817 segments, while the testing dataset contained 16,996 segments.
External validation (multicenter) The training, validation, and test sets included 11,367, 2,435, and 2,438 segments, respectively. UCDDB dataset was subjected to training (7,025 segments), validation (1,505 segments), and testing (1,505 segments)
Internal validation Training set (N210,680 + 130,050); validation set (N213,830 + 13,102)
External validation (multicenter) 32 subjects, 70 ECG signal recordings, a total of 33,752 segments were retained. Of these, 16,743 segments were allocated to the training set, and 17,009 segments were designated for the testing set.
External validation (multicenter) For the training sequence segments, there were 6,488 segments with OSA and 10,236 normal segments. The validation set retained a total of 16,988 sequence segments, comprising 6,489 OSA segments and 10,499 normal segments.
Internal validation Group A: OSA events: 6,250 Each experiment’s data were divided into 72% for training, 20% for testing, and 8% for validation
Internal validation 70% (1,052 data columns) were randomly selected for artificial neural network training, and the remaining 30% (448 data columns) were divided into two groups: one for validation (224 data columns) and the other for testing (224 data columns)
Internal validation
10-fold cross validation 10-fold cross validation
10-fold cross validation After preprocessing steps on the training set, the generated dataset contained 14,775 samples, of which 10,078 were marked as normal, and 4,679 were marked as affected by OSA The test data generated after 10-fold cross-validation contained 4,935 samples, with 3,197 being marked as normal, and 1,738 marked as affected by OSA
10 fold cross validation
External validation (multicenter)

Essential details of the included literature.

Risk of bias in studies

Utilizing the QUADAS-2 tool, we primarily assessed the overall risk of bias and concerns regarding applicability. In terms of overall bias risk, regarding case selection, none of the studies avoided case–control designs. However, since these studies primarily utilized ECG-based DL, where the modeling variables do not involve manual coding, we believed this introduced only a minimal risk of bias. Moreover, we considered all exclusions of cases to be appropriate; thus, from the perspective of case selection, those studies were judged to have a low risk of bias. As for the index test, most studies did not describe or provide information on whether the interpretation of outcomes was done without knowledge of the reference standard results, and there was no specific threshold established. Since both factors are unlikely to affect the results of DL, we considered the risk of bias from the index test perspective as low. All included studies were able to accurately distinguish the target disease states. The use of blinding in interpreting the reference standard results was not mentioned, but considering its minimal impact on DL models, we viewed the implementation and interpretation of the reference standard as having a low risk of bias. Most studies did not explain the time interval between the index test and the reference standard, but given that OSAS is a chronic condition, we assessed this as having a low risk of bias. Since all patients underwent the same reference standard and all cases meeting the inclusion criteria were analyzed, we considered the risk of bias in terms of the flow and timing of patients as low. Since the relevant included patients and backgrounds were well matched with the evaluation questions, and the reference standard is highly applicable, we assessed the applicability concerns related to patient selection and the reference standard as low. 
For independent validation sets, the index test’s implementation and interpretation matched the evaluation questions well, suggesting a low risk. However, for K-fold cross-validation, the implementation and interpretation of the index test presented a high risk of bias in matching the evaluation questions (Figures 2, 3).

Figure 2

Bar graph displaying the risk of bias and applicability concerns across four categories: Patient Selection, Index Test, Reference Standard, and Flow and Timing. All categories show low risk and low applicability concerns, except Index Test, which has some high risk indicated in red. A legend identifies colors: red for high, yellow for unclear, and green for low.

Methodological quality graph.

Figure 3

Chart showing risk of bias and applicability concerns across various studies. Columns cover patient selection, index test, reference standard, and flow and timing. Green circles indicate low risk, yellow for unclear, and red for high. Most studies show low risk, with some red circles indicating high risk in specific categories.

Methodological quality summary.

Meta-analysis

Synthesized results

The outcomes of our meta-analysis employing a bivariate mixed-effects model showed that the comprehensive validation set had a sensitivity of 0.93 (95% CI: 0.90–0.96), specificity of 0.95 (95% CI: 0.92–0.96), PLR of 17.7 (95% CI: 11.8–26.7), NLR of 0.07 (95% CI: 0.05–0.11), DOR of 252 (95% CI: 116–549), I² = 99.76 (95% CI: 99.74–99.77), and an area under the SROC curve of 0.98 (95% CI: 0.42–1.00) (Figures 4, 5).

Figure 4

Forest plot comparing sensitivity and specificity of various studies. The left side shows sensitivity with confidence intervals and a combined sensitivity of 0.94. The right side shows specificity with confidence intervals and a combined specificity of 0.95. Data points are distributed around a central vertical line representing the average. Each side lists specific studies with corresponding values.

Forest plot of the meta-analysis results of the sensitivity and specificity of OSA detection by ECG segment-based DL models.

Figure 5

A Summary Receiver Operating Characteristic (SROC) plot shows sensitivity versus specificity. It includes observed data, a summary operating point, and SROC curve. The area under the curve (AUC) is 0.98. Confidence and prediction contours are marked. A legend explains symbols, with a summary sensitivity of 0.94 and specificity of 0.95.

SROC curve of the meta-analysis results of OSA detection by ECG segment-based DL models.

In the included studies, about 30% of the ECG image segments were associated with OSAS. Taking this proportion as the prior probability of OSAS, a positive DL result corresponded to an 88% probability that OSAS was truly present, and a negative DL result corresponded to a 97% probability that OSAS was truly absent (Figure 6). Deeks’ funnel plot indicated no significant publication bias across the studies (Figure 7).
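The post-test probabilities read from the nomogram follow from Bayes' rule in odds form: post-test odds equal pre-test odds multiplied by the likelihood ratio. The Python sketch below is an illustrative calculation using the 30% prior and the pooled PLR (17.7) and NLR (0.07) reported above; the function name is hypothetical.

```python
def post_test_probability(prior: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    if not 0 < prior < 1:
        raise ValueError("prior must lie strictly between 0 and 1")
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


prior = 0.30                                          # share of OSAS segments
p_osas_if_positive = post_test_probability(prior, 17.7)   # pooled PLR
p_osas_if_negative = post_test_probability(prior, 0.07)   # pooled NLR

print(f"P(OSAS | positive)    = {p_osas_if_positive:.2f}")
print(f"P(no OSAS | negative) = {1 - p_osas_if_negative:.2f}")
```

With these inputs the calculation gives about 0.88 and 0.97, which agrees with the 88% and 97% values shown in the nomogram.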

Figure 6

Funnel plot showing the relationship between diagnostic odds ratio and one over the square root of effective sample size (ESS). Dots represent individual studies, with a regression line through the points. The plot is used to assess asymmetry, and the p-value is 0.85. A legend indicates symbols for studies and the regression line.

Nomogram of the meta-analysis results of OSA detection by ECG segment-based DL models.

Figure 7

Funnel plot showing diagnostic odds ratio on the x-axis and one over the square root of the effective sample size on the y-axis. Most studies cluster near the top-left, indicating low asymmetry. A regression line is present. The Deeks' Funnel Plot Asymmetry Test p-value is 0.81, suggesting no significant asymmetry. A legend indicates circles for studies and a dashed line for regression.

Deeks’ funnel plot of meta-analysis results of OSA detection by ECG segment-based DL models.
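For readers unfamiliar with how a Deeks' funnel plot is built, each study is plotted as its log diagnostic odds ratio against 1/√ESS, where ESS is the effective sample size, and asymmetry is judged from the slope of a regression weighted by ESS. The Python sketch below illustrates only the slope calculation, under stated assumptions: a 0.5 continuity correction is added to every cell, the counts are hypothetical, and the p-value that the formal test derives from the slope's standard error is omitted. It is not the Stata routine used in this review.

```python
import math


def effective_sample_size(n_diseased: int, n_healthy: int) -> float:
    """ESS = 4 * n1 * n2 / (n1 + n2)."""
    return 4 * n_diseased * n_healthy / (n_diseased + n_healthy)


def log_dor(tp: int, fp: int, fn: int, tn: int) -> float:
    """ln(DOR) with a 0.5 continuity correction on every cell (an assumption)."""
    return math.log(((tp + 0.5) * (tn + 0.5)) / ((fp + 0.5) * (fn + 0.5)))


def deeks_slope(studies: list[tuple[int, int, int, int]]) -> float:
    """Weighted least-squares slope of ln(DOR) on 1/sqrt(ESS), weights = ESS."""
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in studies:
        ess = effective_sample_size(tp + fn, fp + tn)
        xs.append(1 / math.sqrt(ess))
        ys.append(log_dor(tp, fp, fn, tn))
        ws.append(ess)
    w_sum = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_sum
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    return sxy / sxx


# Hypothetical (tp, fp, fn, tn) counts, not taken from the included studies
example_studies = [(90, 10, 10, 90), (450, 40, 50, 460), (45, 2, 5, 48)]
slope = deeks_slope(example_studies)
print(f"slope = {slope:.3f}")
```

A slope close to zero indicates that small and large studies report similar diagnostic odds ratios, which is the pattern expected when publication bias is absent.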

Subgroup analysis

For the independent validation set, the sensitivity, specificity, PLR, NLR, DOR, and area under the SROC curve were 0.93 (95% CI: 0.88–0.96), 0.95 (95% CI: 0.92–0.97), 19.5 (95% CI: 11.3–33.7), 0.07 (95% CI: 0.04–0.12), 274 (95% CI: 101–743), and 0.98 (95% CI: 0.42–1.00), respectively, with I² = 99.76 (95% CI: 99.74–99.77) (Supplementary Figures S1, S2). When the DL result was positive, the probability that sleep apnea syndrome was truly present was 88%; when the DL result was negative, the probability that sleep apnea syndrome was truly absent was 97% (Supplementary Figure S3). Deeks’ funnel plot indicated no significant publication bias across the studies (Supplementary Figure S4).

For the K-fold cross-validation set, the sensitivity, specificity, PLR, NLR, DOR, and area under the SROC curve were 0.94 (95% CI: 0.88–0.97), 0.94 (95% CI: 0.89–0.96), 15.0 (95% CI: 8.1–27.6), 0.07 (95% CI: 0.03–0.13), 227 (95% CI: 64–808), and 0.98 (95% CI: 0.65–1.00), respectively (Supplementary Figures S5, S6), with I² = 99.76 (95% CI: 99.74–99.77). A positive DL result corresponded to a 90% probability that sleep apnea syndrome was truly present, and a negative DL result corresponded to a 96% probability that it was truly absent (Supplementary Figure S7). Deeks' funnel plot revealed no significant publication bias across the studies (Supplementary Figure S8).
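
The likelihood ratios and diagnostic odds ratio reported above are functions of sensitivity and specificity. As a rough check, the Python sketch below derives them from the pooled point estimates. This is an approximation: the bivariate model pools these quantities jointly, so the reported values (for example, a PLR of 19.5 for the independent validation set) need not match the simple ratios exactly. The function name is illustrative.

```python
def likelihood_ratios(sensitivity: float, specificity: float):
    """Return (PLR, NLR, DOR) computed from sensitivity and specificity."""
    plr = sensitivity / (1.0 - specificity)   # positive likelihood ratio
    nlr = (1.0 - sensitivity) / specificity   # negative likelihood ratio
    dor = plr / nlr                           # diagnostic odds ratio
    return plr, nlr, dor


# Pooled point estimates taken from the text above.
subgroups = {
    "independent validation": (0.93, 0.95),
    "K-fold cross-validation": (0.94, 0.94),
}

for name, (sens, spec) in subgroups.items():
    plr, nlr, dor = likelihood_ratios(sens, spec)
    print(f"{name}: PLR = {plr:.1f}, NLR = {nlr:.2f}, DOR = {dor:.0f}")
```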

Discussion

In this systematic review and meta-analysis, 39 original studies were analyzed using a bivariate mixed-effects model, and only results from the validation sets were synthesized. The pooled sensitivity and specificity for real-time OSAS detection using ECG image-based DL were 0.93 (95% CI: 0.90–0.96) and 0.95 (95% CI: 0.92–0.96), respectively. For the independent validation sets, the sensitivity and specificity were 0.93 (95% CI: 0.88–0.96) and 0.95 (95% CI: 0.92–0.97), respectively. For the K-fold cross-validation sets, the sensitivity and specificity were 0.94 (95% CI: 0.88–0.97) and 0.94 (95% CI: 0.89–0.96), respectively.

We also noted that some researchers have concentrated on detecting OSAS with other methods, such as polysomnography (PSG), biomarkers, and imaging. For instance, the ballistocardiography method used by Huysmans et al. (54) involved installing motion sensors under the bed of sleeping individuals to record coarse body movements, respiratory-related movements, and even cardiac motion during PSG. The combination of these three signals can provide a relatively good assessment of sleep state and sleep-related breathing disorders. However, this detection method is more complex and places higher demands on the monitoring environment. Moreover, the study reported a screening sensitivity of 0.77 and specificity of 0.62 for patients with severe apnea, and a screening sensitivity of 0.72 and specificity of 0.70 for general apnea patients, all lower than those found in our study.

Zorlu et al. (55) investigated the use of complete blood count parameters to predict the diagnosis of OSAS and to grade its severity. A lymphocyte cutoff of 737.14 yielded a sensitivity of 90.7% and a specificity of 92.6%, indicating that the lymphocyte variable has some diagnostic value for mild OSAS. No significant cutoff values could be identified for the moderate and severe OSAS groups, as the area under the ROC curve was not significant (p > 0.05) for these groups. Blood cell data are influenced by various factors. Although eliminating the impact of patient-related factors on the NLR, PLR, and WMR parameters (in that study, NLR and PLR denote the neutrophil-to-lymphocyte and platelet-to-lymphocyte ratios rather than likelihood ratios) was a notable strength, the authors did not evaluate the impact of comorbidities, a history of corticosteroid treatment, or other inflammatory factors such as calcitonin on complete blood count parameters.

Nagappa et al. (56) also conducted a meta-analysis on diagnosing OSAS with the STOP-Bang questionnaire, validated against polysomnography. In the sleep clinic population, the sensitivities for detecting any OSAS (AHI > 5), moderate to severe OSAS (AHI > 15), and severe OSAS (AHI > 30) were 90, 94, and 96%, respectively. A similar pattern was noted in the surgical patient population. As the STOP-Bang score rises, the likelihood of moderate and severe OSAS rises correspondingly. However, that meta-analysis showed moderate to high heterogeneity, one source of which was the variability of target populations across studies and potential differences in OSAS prevalence among them. Additionally, there is a dearth of confirmatory studies in surgical patients. Moreover, the questionnaire does not achieve real-time detection, and the detection process is time-consuming.

In our study, ECG-based DL methods achieved excellent diagnostic accuracy for OSA. However, significant challenges remain in the detection of OSA. OSAS is a sleep disorder characterized by repetitive cessation of airflow lasting at least 10 s per event, with each apnea episode accompanied by cardiovascular changes. During apnea, heart rate often decreases, and when breathing resumes, a relative tachycardia can be observed. Blood pressure drops during apnea episodes and rises at the end of apnea as sympathetic nerve activity increases. Oxygen saturation declines with the cessation of breathing, reaching its lowest point after several cycles of resumed breathing. These characteristic fluctuations in blood oxygen saturation and heart rate are used by portable diagnostic devices for the early detection of sleep apnea (57).

Hypoxemia and hypercapnia episodes during sleep apnea can trigger physiological responses, including activation of the sympathetic nervous system, oxidative stress, inflammation, and endothelial dysfunction. Excessive daytime sleepiness may be described by patients as fatigue, low energy, or difficulty concentrating, reflecting an inability to maintain full wakefulness or alertness during the wakeful portion of the sleep–wake cycle (58). Studies have also indicated a certain association between OSA and traffic accidents. However, the relationship between OSA and traffic incidents is often complex and multifactorial in etiology, necessitating further investigation into the potential causes of these events. The process of screening for OSA solely through ECG measurements remains limited (59).

In our study of ECG image-based DL models, the validation approach reflected the true detection performance of AI for OSAS. Of the included studies, 26 reported results on an independent validation set and 13 reported results from K-fold cross-validation. The sensitivity and specificity for real-time OSAS detection by ECG image-based DL were 0.93 (95% CI: 0.90–0.96) and 0.95 (95% CI: 0.92–0.96), respectively. For the independent validation sets, the sensitivity and specificity were 0.93 (95% CI: 0.88–0.96) and 0.95 (95% CI: 0.92–0.97), respectively. For the K-fold cross-validation sets, the sensitivity and specificity were 0.94 (95% CI: 0.88–0.97) and 0.94 (95% CI: 0.89–0.96), respectively. The performance on independent validation sets and under K-fold cross-validation showed no significant differences, suggesting that DL methods for OSAS detection are highly stable. Moreover, among the included studies, Teng et al. (22) designed a user-friendly OSAS monitoring system equipped with multimedia devices for accurate and efficient OSAS detection in intelligent healthcare management. Hemrajani et al. (31) developed Sleepify, a compact, accurate, and portable wearable device, to address the cumbersome and time-consuming nature of PSG; patients can comfortably wear the device at home, record their ECG signals, and detect sleep apnea events, with the device alerting them to any incidents during the night.
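
The two validation schemes compared here differ only in how data are partitioned. The following Python sketch contrasts a single hold-out (independent validation) split with K-fold splits. It is a generic illustration, not the procedure of any included study; the function names, the 30% test fraction, and the choice to treat each item as one recording are assumptions.

```python
import random


def holdout_split(n_items: int, test_fraction: float, seed: int = 0):
    """Return (train, test) index lists for a single independent hold-out split."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_items * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold_splits(n_items: int, k: int, seed: int = 0):
    """Yield (train, validation) index lists; each item is validated exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


# Hypothetical example: 10 recordings, a 30% hold-out set versus 5-fold cross-validation.
train, test = holdout_split(10, test_fraction=0.3)
print("hold-out test indices:", sorted(test))
for fold_no, (tr, va) in enumerate(k_fold_splits(10, k=5), start=1):
    print(f"fold {fold_no}: validate on {sorted(va)}")
```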

Advantages and limitations of the study

The DL models we investigated analyze long-duration single-lead ECG records and use one-dimensional ECG signals as input to detect apnea events, with training and validation performed on the collected data. This approach does not involve QRS complex detection or analysis of RR intervals or cardiac function, and it requires less labor than traditional diagnostic methods, offering greater convenience and high accuracy.
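
As an illustration of how a long single-lead recording becomes model input, the Python sketch below cuts a one-dimensional signal into fixed-length, non-overlapping windows. The 100 Hz sampling rate, the 60 s default window, and the function name are assumptions made for this example and do not describe the preprocessing of any particular included study.

```python
def segment_ecg(signal, sampling_rate_hz: int, window_seconds: int = 60):
    """Split a 1-D signal into non-overlapping windows; a trailing partial window is dropped."""
    window = sampling_rate_hz * window_seconds
    n_segments = len(signal) // window
    return [signal[i * window:(i + 1) * window] for i in range(n_segments)]


SAMPLING_RATE_HZ = 100  # assumed sampling rate for the synthetic example
recording = [0.0] * int(5.5 * 60 * SAMPLING_RATE_HZ)  # synthetic 5.5-minute flat signal

segments = segment_ecg(recording, SAMPLING_RATE_HZ)
print(len(segments), "segments of", len(segments[0]), "samples each")
```

Each resulting segment would then be labeled as apnea or non-apnea and passed to the classifier.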

Compared with other detection methods, such as polysomnography (PSG) and blood biomarkers, our method is more straightforward to operate. Nevertheless, it is influenced by numerous factors, including the instruments employed and the proficiency of the testing personnel. Additionally, because the data were sourced from public databases, certain constraints apply in this respect.

As this is a retrospective study, the explanation of blinding is seriously lacking. In these DL models, the decision threshold is determined by inherent rules; we believe that the models applied this threshold rather than a separately set threshold probability. We therefore consider that this does not introduce an excessively high risk of bias into the training process.

Nevertheless, the specificity of our study population remains a challenge, mainly because our modeling data came primarily from four public databases: the 2000 Cardiology Association Physical Apnea-ECG Database, the PhysioNet publicly available Apnea-ECG Database, the UCDDB, and the Philips University Physical Apnea-ECG Database. ECG recordings were divided into segments of different lengths, such as 1 min, 3 min (28), 5 min (33, 53), and 6 min (26). The majority of studies developed models on 1-min segments; only a small number of models were constructed using 3-min, 5-min, or 6-min segments. Moreover, some of these studies used cross-validation. As a result, further subgroup analysis was challenging, which represents a limitation of the present study.

In addition, since the diagnosis of OSA was not clearly marked in most of the studies we included, we were unable to further present precise information about the diagnostic process in the original literature.

Additionally, the data included in our manuscript is not comprehensive; for instance, data from regions such as South America and New England are not represented. While there is a noted association between OSA and traffic accidents, the etiological relationship is often complex and multifactorial, requiring further investigation into the potential causes of these incidents. The screening process based solely on ECG measurements remains insufficient. Furthermore, some studies have highlighted the role of nocturnal blood oxygen saturation in the diagnosis of OSA. However, due to the limitations of the included literature, our study did not incorporate nocturnal blood pressure or nocturnal blood oxygen saturation measurements.

We also endeavored to use external validation to explore how the reuse of the same dataset across different studies influenced our results. Nevertheless, the scarcity of external validation datasets impeded a more in-depth analysis in this regard.

Among the 39 studies in our review, 33 used a 1-min window, one used a 3-min window, two used a 5-min window, one used a 6-min window, and two did not explicitly report the segment duration. For the meta-analysis, we employed a bivariate random-effects model, which requires at least four studies providing 2 × 2 contingency tables for diagnostic accuracy. Due to the limited number of studies in each subgroup defined by segment duration, we were unable to perform stratified subgroup analyses. This represents a limitation of our study.
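
The feasibility check described above can be written out explicitly. The short Python sketch below uses the study counts from the preceding paragraph and the stated minimum of four studies per subgroup; the variable names are illustrative.

```python
MIN_STUDIES = 4  # minimum number of studies stated for the bivariate model

# Number of included studies per ECG segment duration, as reported above.
studies_per_window = {
    "1 min": 33,
    "3 min": 1,
    "5 min": 2,
    "6 min": 1,
    "not reported": 2,
}

total = sum(studies_per_window.values())
feasible = {window: n >= MIN_STUDIES for window, n in studies_per_window.items()}

print("total studies:", total)
for window, ok in feasible.items():
    status = "can be pooled" if ok else "too few studies"
    print(f"{window}: {studies_per_window[window]} studies, {status}")
```

Only the 1-min group meets the threshold, so no comparison across segment durations is possible.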

Deep learning frameworks are diverse, and the bivariate mixed-effects model requires at least four studies per group. The included studies differed substantially in their deep learning frameworks, and some even conducted partial ablation experiments. We were therefore unable to compare the detection performance of different deep learning methods and instead summarized them directly as diagnostic tools. This is also a common approach for discussing DL-based disease detection within a single systematic review.

Conclusion

The results from this study suggest that medical image-based DL methods have demonstrated marked efficacy and safety profiles in real-time monitoring of OSAS, and the use of AI for the diagnosis of OSAS is a novel and effective diagnostic approach, which can provide reliable evidence for future development and design of real-time monitoring tools for OSAS.

Statements

Data availability statement

The original contributions presented in the study are included in the article/Supplementary material; further inquiries can be directed to the corresponding author.

Author contributions

AS: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Writing – original draft, Writing – review & editing. ZN: Data curation, Formal analysis, Investigation, Methodology, Software, Writing – original draft. YG: Conceptualization, Funding acquisition, Supervision, Validation, Writing – review & editing. PF: Data curation, Formal analysis, Methodology, Software, Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This work was supported by the Special Project for Research and Development of Key Public Health Technologies and Construction of Epidemic Prevention System in Xinjiang (Grant No: 2020A03004–1(B)), and the Xinjiang Uygur Autonomous Region “Tianshan Elite” High-Level Medical and Health Talent Program (Grant No: TSYC202401A069).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The authors declare that no Gen AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fneur.2025.1663851/full#supplementary-material

References

  • 1.

    Drager LF Togeiro SM Polotsky VY Lorenzi-Filho G . Obstructive sleep apnea: a cardiometabolic risk in obesity and the metabolic syndrome. J Am Coll Cardiol. (2013) 62:569–76. doi: 10.1016/j.jacc.2013.05.045

  • 2.

    Yeghiazarians Y Jneid H Tietjens JR Redline S Brown DL El-Sherif N et al . Obstructive sleep Apnea and cardiovascular disease: a scientific statement from the American Heart Association. Circulation. (2021) 144:e56–67. doi: 10.1161/CIR.0000000000000988

  • 3.

    Mannarino MR Di Filippo F Pirro M . Obstructive sleep apnea syndrome. Eur J Intern Med. (2012) 23:586–93. doi: 10.1016/j.ejim.2012.05.013

  • 4.

    Javaheri S Barbe F Campos-Rodriguez F Dempsey JA Khayat R Javaheri S et al . Sleep Apnea: types, mechanisms, and clinical cardiovascular consequences. J Am Coll Cardiol. (2017) 69:841–58. doi: 10.1016/j.jacc.2016.11.069

  • 5.

    Lyons MM Bhatt NY Pack AI Magalang UJ . Global burden of sleep-disordered breathing and its implications. Respirology. (2020) 25:690–702. doi: 10.1111/resp.13838

  • 6.

    Patil SP . What every clinician should know about polysomnography. Respir Care. (2010) 55:1179–95.

  • 7.

    Bilgin C Erkorkmaz U Ucar MK Akin N Nalbant A Annakkaya AN . Use of a portable monitoring device (Somnocheck Micro) for the investigation and diagnosis of obstructive sleep apnoea in comparison with polysomnography. Pak J Med Sci. (2016) 32:471–5. doi: 10.12669/pjms.322.9561

  • 8.

    Maslej N Fattorini L Brynjolfsson E Etchemendy J Ligett K Lyons T et al . The AI index 2023 annual report. Stanford, CA: Stanford University (2023).

  • 9.

    Coiera E . The fate of medicine in the time of AI. Lancet. (2018) 392:2331–2. doi: 10.1016/S0140-6736(18)31925-1

  • 10.

    Chen ZH Lin L Wu CF Li CF Xu RH Sun Y . Artificial intelligence for assisting cancer diagnosis and treatment in the era of precision medicine. Cancer Commun. (2021) 41:1100–15. doi: 10.1002/cac2.12215

  • 11.

    Bhinder B Gilvary C Madhukar NS Elemento O . Artificial intelligence in Cancer research and precision medicine. Cancer Discov. (2021) 11:900–15. doi: 10.1158/2159-8290.CD-21-0090

  • 12.

    Shen D Wu G Suk HI . Deep learning in medical image analysis. Annu Rev Biomed Eng. (2017) 19:221–48. doi: 10.1146/annurev-bioeng-071516-044442

  • 13.

    Qiu S Miller MI Joshi PS Lee JC Xue C Ni Y et al . Multimodal deep learning for Alzheimer's disease dementia assessment. Nat Commun. (2022) 13:3404. doi: 10.1038/s41467-022-31037-5

  • 14.

    Whiting PF Rutjes AW Westwood ME Mallett S Deeks JJ Reitsma JB et al . QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. (2011) 155:529–36. doi: 10.7326/0003-4819-155-8-201110180-00009

  • 15.

    Bahrami M Forouzanfar M . Detection of sleep Apnea from single-Lead ECG: comparison of deep learning algorithms. 2021 IEEE international symposium on medical measurements and applications (MeMeA). Switzerland: IEEE (2021).

  • 16.

    Chang HY Yeh CY Lee CT Lin CC . A sleep Apnea detection system based on a one-dimensional deep convolution neural network model using single-Lead electrocardiogram. Sensors. (2020) 20:4157. doi: 10.3390/s20154157

  • 17.

    Chen J Shen M Ma W Zheng W . A spatio-temporal learning-based model for sleep apnea detection using single-lead ECG signals. Front Neurosci. (2022) 16:972581. doi: 10.3389/fnins.2022.972581

  • 18.

    Ullah N Mahmood T Kim SG Nam SH Sultan H Park KR . DCDA-net: dual-convolutional dual-attention network for obstructive sleep apnea diagnosis from single-lead electrocardiograms. Eng Appl Artif Intell. (2023) 123:106451. doi: 10.1016/j.engappai.2023.106451

  • 19.

    Zarei A Beheshti H Asl BM . Detection of sleep apnea using deep neural networks and single-lead ECG signals. Biomed Signal Process Control. (2022) 71:103125. doi: 10.1016/j.bspc.2021.103125

  • 20.

    Yang Q Zou L Wei K Liu G . Obstructive sleep apnea detection from single-lead electrocardiogram signals using one-dimensional squeeze-and-excitation residual group network. Comput Biol Med. (2022) 140:105124. doi: 10.1016/j.compbiomed.2021.105124

  • 21.

    Wang Z Peng C Li B Penzel T Liu R Zhang Y et al . Single-lead ECG based multiscale neural network for obstructive sleep apnea detection. Inter Things. (2022) 20:100613. doi: 10.1016/j.iot.2022.100613

  • 22.

    Teng F Wang D Yuan Y Zhang H Singh AK Lv Z . Multimedia monitoring system of obstructive sleep apnea via a deep active learning model. IEEE Multi. (2022) 29:48–56. doi: 10.1109/MMUL.2022.3146141

  • 23.

    Setiawan F Lin CW . A deep learning framework for automatic sleep Apnea classification based on empirical mode decomposition derived from single-Lead electrocardiogram. Life. (2022) 12:1509. doi: 10.3390/life12101509

  • 24.

    Paul T Hassan O Alaboud K Islam H Rana MKZ Islam SK et al . ECG and SpO(2) signal-based real-time sleep apnea detection using feed-forward artificial neural network. AMIA Jt Summits Transl Sci Proc. (2022) 2022:379–85.

  • 25.

    Qin H Liu G . A dual-model deep learning method for sleep apnea detection based on representation learning and temporal dependence. Neurocomputing. (2022) 473:24–36. doi: 10.1016/j.neucom.2021.12.001

  • 26.

    Hu S Cai W Gao T Wang M . A hybrid transformer model for obstructive sleep Apnea detection based on self-attention mechanism using single-Lead ECG. IEEE Trans Instrum Meas. (2022) 71:1–11. doi: 10.1109/TIM.2022.3193169

  • 27.

    Gupta K Bajaj V Ansari IA . Osacn-net: automated classification of sleep apnea using deep learning model and smoothed Gabor spectrograms of ECG signal. IEEE Trans Instrum Meas. (2022) 71:1–9. doi: 10.1109/TIM.2021.3132072

  • 28.

    Liu H Cui S Zhao X Cong F . Detection of obstructive sleep apnea from single-channel ECG signals using a CNN-transformer architecture. Biomed Signal Process Control. (2023) 82:104581. doi: 10.1016/j.bspc.2023.104581

  • 29.

    Kumar Tyagi P Agrawal D . Automatic detection of sleep apnea from single-lead ECG signal using enhanced-deep belief network model. Biomed Signal Process Control. (2023) 80:104401. doi: 10.1016/j.bspc.2022.104401

  • 30.

    Kumar CB Mondal AK Bhatia M Panigrahi BK Gandhi TK . Self-supervised representation learning-based OSA detection method using Single-Channel ECG signals. IEEE Trans Instrum Meas. (2023) 72:1–15. doi: 10.1109/TIM.2023.3261931

  • 31.

    Hemrajani P Dhaka VS Rani G Shukla P Bavirisetti DP . Efficient deep learning based hybrid model to detect obstructive sleep apnea. Sensors. (2023) 23:4692. doi: 10.3390/s23104692

  • 32.

    Chen X Ma W Gao W Fan X . BAFNet: bottleneck attention based fusion network for sleep apnea detection. IEEE J Biomed Health Inform. (2023) 28:112. doi: 10.1109/JBHI.2023.3278657

  • 33.

    Chen X Chen Y Ma W Fan X Li Y . Toward sleep apnea detection with lightweight multi-scaled fusion network. Knowl-Based Syst. (2022) 247:108783. doi: 10.1016/j.knosys.2022.108783

  • 34.

    Cao K Lv X . Multi-task feature fusion network for obstructive sleep apnea detection using single-lead ECG signal. Measurement. (2022) 202:111787. doi: 10.1016/j.measurement.2022.111787

  • 35.

    Bahrami M Forouzanfar M . Deep learning forecasts the occurrence of sleep Apnea from single-Lead ECG. Cardiovasc Eng Technol. (2022) 13:809–15. doi: 10.1007/s13239-022-00615-5

  • 36.

    Yu Y Yang Z You Y Shan W . FASSNet: fast apnea syndrome screening neural network based on single-lead electrocardiogram for wearable devices. Physiol Meas. (2021) 42:085005. doi: 10.1088/1361-6579/ac184e

  • 37.

    Li K Pan W Li Y Jiang Q Liu G . A method to detect sleep apnea based on deep neural network and hidden Markov model using single-lead ECG signal. Neurocomputing. (2018) 294:94–101. doi: 10.1016/j.neucom.2018.03.011

  • 38.

    Wang L Lin Y Wang J . A RR interval based automated apnea detection approach using residual network. Comput Methods Prog Biomed. (2019) 176:93–104. doi: 10.1016/j.cmpb.2019.05.002

  • 39.

    Feng K Qin H Wu S Pan W Liu G . A sleep Apnea detection method based on unsupervised feature learning and single-Lead electrocardiogram. IEEE Trans Instrum Meas. (2021) 70:1–12. doi: 10.1109/TIM.2020.3017246

  • 40.

    Faust O Barika R Shenfield A Ciaccio EJ Acharya UR . Accurate detection of sleep apnea with long short-term memory network based on RR interval signals. Knowl-Based Syst. (2021) 212:106591. doi: 10.1016/j.knosys.2020.106591

  • 41.

    Urtnasan E Park JU Joo EY Lee KJ . Automated detection of obstructive sleep Apnea events from a single-Lead electrocardiogram using a convolutional neural network. J Med Syst. (2018) 42:104. doi: 10.1007/s10916-018-0963-0

  • 42.

    Sharan RV Berkovsky S Xiong H Coiera E . ECG-derived heart rate variability interpolation and 1-D convolutional neural networks for detecting sleep Apnea. Annu Int Conf IEEE Eng Med Biol Soc. (2020) 2020:637–40. doi: 10.1109/EMBC44109.2020.9175998

  • 43.

    Mashrur FR Islam MS Saha DK Islam SMR Moni MA . SCNN: scalogram-based convolutional neural network to detect obstructive sleep apnea using single-lead electrocardiogram signals. Comput Biol Med. (2021) 134:104532. doi: 10.1016/j.compbiomed.2021.104532

  • 44.

    Zhang J Tang Z Gao J Lin L Liu Z Wu H et al . Automatic detection of obstructive sleep Apnea events using a deep CNN-LSTM model. Comput Intell Neurosci. (2021) 2021:5594733. doi: 10.1155/2021/5594733

  • 45.

    Fang H Lu C Hong F Jiang W Wang T . Sleep apnea detection based on multi-scale residual network. Life. (2022) 12:119. doi: 10.3390/life12010119

  • 46.

    Shen Q Qin H Wei K Liu G . Multiscale deep neural network for obstructive sleep Apnea detection using RR interval from single-Lead ECG signal. IEEE Trans Instrum Meas. (2021) 70:1–13. doi: 10.1109/TIM.2021.3062414

  • 47.

    Thompson S Fergus P Chalmers C Reilly D . Detection of obstructive sleep Apnoea using features extracted from segmented time-series ECG signals using a one dimensional convolutional neural network. IEEE Access. (2020) 99:1. doi: 10.1109/ACCESS.2023.3346689

  • 48.

    Lweesy K Fraiwan L Khasawneh N Dickhaus H . New automated detection method of OSA based on artificial neural networks using P-wave shape and time changes. J Med Syst. (2011) 35:723–34. doi: 10.1007/s10916-009-9409-z

  • 49.

    Nasifoglu H Erogul O . Obstructive sleep apnea prediction from electrocardiogram scalograms and spectrograms using convolutional neural networks. Physiol Meas. (2021) 42:065010. doi: 10.1088/1361-6579/ac0a9c

  • 50.

    Niroshana SMI Zhu X Nakamura K Chen W . A fused-image-based approach to detect obstructive sleep apnea using a single-lead ECG and a 2D convolutional neural network. PLoS One. (2021) 16:e0250618. doi: 10.1371/journal.pone.0250618

  • 51.

    Sheta AF Turabieh H Thaher T Too J Mafarja MM Hossain MS et al . Diagnosis of obstructive sleep apnea from ECG signals using machine learning and deep learning classifiers. Appl Sci. (2021) 11:6622. doi: 10.3390/app11146622

  • 52.

    Singh H Tripathy RK Pachori RB . Detection of sleep apnea from heart beat interval and ECG derived respiration signals using sliding mode singular spectrum analysis. Digit Signal Process. (2020) 104:102796. doi: 10.1016/j.dsp.2020.102796

  • 53.

    Wang T Lu C Shen G Hong F . Sleep apnea detection from a single-lead ECG signal with automatic feature-extraction through a modified LeNet-5 convolutional neural network. PeerJ. (2019) 7:e7731. doi: 10.7717/peerj.7731

  • 54.

    Huysmans D Borzée P Testelmans D Buyse B Willemen T Huffel SV et al . Evaluation of a commercial ballistocardiography sensor for sleep apnea screening and sleep monitoring. Sensors. (2019) 19:2133. doi: 10.3390/s19092133

  • 55.

    Zorlu D Ozyurt S Bırcan HA Erturk A . Do complete blood count parameters predict diagnosis and disease severity in obstructive sleep apnea syndrome? Eur Rev Med Pharmacol Sci. (2021) 25:4027–36. doi: 10.26355/eurrev_202106_26044

  • 56.

    Nagappa M Liao P Wong J Auckley D Ramachandran SK Memtsoudis S et al . Validation of the STOP-bang questionnaire as a screening tool for obstructive sleep Apnea among different populations: a systematic review and Meta-analysis. PLoS One. (2015) 10:e0143697. doi: 10.1371/journal.pone.0143697

  • 57.

    Roos M Althaus W Rhiel C Penzel T Peter JH von Wichert P . Comparative use of MESAM IV and polysomnography in sleep-related respiratory disorders. Pneumologie. (1993) 47:112–8.

  • 58.

    Slater G Steier J . Excessive daytime sleepiness in sleep disorders. J Thorac Dis. (2012) 4:608–16. doi: 10.3978/j.issn.2072-1439.2012.10.07

  • 59.

    Felix M Intriago Alvarez MB Vanegas E Farfán Bajaña MJ Sarfraz Z Sarfraz A et al . Risk of obstructive sleep apnea and traffic accidents among male bus drivers in Ecuador: is there a significant relationship? Ann Med Surg. (2022) 74:103296. doi: 10.1016/j.amsu.2022.103296

Summary

Keywords

deep learning, diagnosis, ECG, meta-analysis, OSAS

Citation

Saiyitijiang A, Nai Z, Gao Y and Fan P (2025) Accuracy of deep learning in diagnosis of apnea syndrome: a systematic review and meta-analysis. Front. Neurol. 16:1663851. doi: 10.3389/fneur.2025.1663851

Received

11 July 2025

Revised

27 October 2025

Accepted

14 November 2025

Published

01 December 2025

Volume

16 - 2025

Edited by

Gian Luigi Gigli, University of Udine, Italy

Reviewed by

Huu Hoang Nguyen, University of Medicine and Pharmacy at Ho Chi Minh City, Vietnam

Ruchi Patel, Marwadi University, India

Copyright

*Correspondence: Ping Fan,
