- Department of Emergency Medicine, Affiliated Kunshan Hospital of Jiangsu University, Kunshan, China
Background: Accurate mortality prediction in emergency departments (ED) is crucial for timely intervention and resource allocation. This study developed and compared multiple machine learning models to predict in-hospital mortality among ED patients.
Methods: We retrospectively analyzed 1,389 ED patients admitted to Affiliated Kunshan Hospital of Jiangsu University between January and December 2021. After excluding patients under 16 years and those transferred or discharged against medical advice, we collected demographic data, vital signs, and laboratory results within 30 min of ED arrival. Nine machine learning models (Logistic Regression, Random Forest, XGBoost, LightGBM, Gradient Boosting, Support Vector Machine [SVM], Neural Network, AdaBoost, and an ensemble voting classifier) were developed and compared using metrics including the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, and calibration.
Results: Among 1,389 patients (mean age 67.72 ± 19.28 years, 63.1% male), the mortality rate was 11.59%. LightGBM demonstrated the best performance with an AUROC of 0.9605 (95% CI: 0.94–0.98), sensitivity of 78.12%, and specificity of 93.90%. The ensemble voting classifier achieved comparable performance (AUROC: 0.9599). SHAP analysis identified serum lactate (importance: 0.252), Glasgow Coma Scale (GCS) (0.085), albumin (0.075), base excess (BE) (0.061), and systolic blood pressure (SBP) (0.049) as the top five predictive features. Calibration curves demonstrated excellent agreement between predicted and observed mortality rates, and decision curve analysis confirmed clinical utility across various threshold probabilities. Risk stratification based on predicted mortality probabilities effectively separated patients into prognostically distinct groups.
Conclusion: Machine learning models, particularly LightGBM, provide highly accurate mortality prediction for ED patients. The integration of readily available clinical and laboratory parameters enables early risk stratification and may facilitate targeted interventions to improve patient outcomes.
1 Introduction
Emergency departments serve as critical entry points for acutely ill patients, where rapid assessment and appropriate triage decisions can significantly impact patient outcomes. Accurate early prediction of mortality risk is essential for optimal resource allocation, treatment prioritization, and informed clinical decision-making (1, 2). Traditional scoring systems such as the Modified Early Warning Score (MEWS) and the National Early Warning Score (NEWS) have been widely used in emergency settings; however, these tools often rely on limited variables and may not capture the complex, non-linear relationships between multiple clinical parameters and patient outcomes (3–7).
The advent of machine learning (ML) techniques has revolutionized predictive modeling in healthcare by enabling the analysis of high-dimensional data and the identification of complex patterns that may be imperceptible to traditional statistical methods (8–10). Recent studies have demonstrated the superiority of ML algorithms over conventional scoring systems in predicting various clinical outcomes, including mortality, sepsis, and acute kidney injury (11–15). Tree-based ensemble methods, such as Random Forest, XGBoost, and LightGBM, have shown particular promise due to their ability to handle non-linear relationships, interactions between variables, and missing data while maintaining interpretability through feature importance analysis (13, 14, 16–18).
Despite the growing body of literature on ML applications in emergency medicine, several gaps remain. First, most existing studies focus on specific patient populations or disease conditions, limiting their generalizability to the heterogeneous ED setting. Second, there is a lack of comprehensive comparison of multiple state-of-the-art ML algorithms using standardized evaluation metrics. Third, the interpretability of ML models—a critical requirement for clinical adoption—has not been adequately addressed through modern explainable AI techniques such as SHapley Additive exPlanations (SHAP) (19–21).
Motivated by these gaps, we developed a reproducible modeling pipeline using early ED variables available within 30 min of arrival (demographics, vital signs, GCS, and arterial blood gas–related laboratory markers) and performed a head-to-head comparison of nine common ML approaches, including gradient-boosting frameworks (XGBoost, LightGBM) and an ensemble soft-voting classifier. We evaluated each model across discrimination, calibration, and decision-analytic utility, and used SHAP to improve interpretability of the top-performing model.
Accordingly, our objectives were to: (1) identify the optimal ML algorithm for in-hospital mortality prediction in an unselected ED cohort; (2) determine the most important predictive features; (3) evaluate model calibration and clinical utility; and (4) provide an interpretable and reproducible framework that can support future external validation and implementation.
2 Methods
2.1 Study design and setting
This retrospective cohort study was conducted at the Emergency Department of Affiliated Kunshan Hospital of Jiangsu University, a tertiary care hospital in Jiangsu Province, China. The study was approved by the Ethics Review Committee of Affiliated Kunshan Hospital of Jiangsu University (No. 2024-03-002-K01) and conducted in accordance with the Declaration of Helsinki. The requirement for informed consent was waived due to the retrospective nature of the study.
2.2 Study population and data collection
We included all patients aged 16 years or older who presented to the ED between January 1, 2021, and December 31, 2021. Exclusion criteria comprised: (1) patients under 16 years of age; (2) patients transferred to other institutions; and (3) patients who left against medical advice. The primary outcome was in-hospital mortality (all-cause death during hospitalization).
We collected comprehensive clinical data including demographic parameters (age and gender), vital signs measured within 30 min of ED arrival (systolic blood pressure [SBP], heart rate, respiratory rate, body temperature, and peripheral oxygen saturation [SpO2]), Glasgow Coma Scale (GCS) score, and laboratory results from arterial blood gas analysis. Blood samples were obtained within 30 min of ED arrival and analyzed using the ABL90 FLEX blood gas analyzer (Radiometer Medical ApS, Denmark). Laboratory parameters included pH, serum lactate, base excess (BE), hemoglobin (HB), white blood cell count (WBC), lymphocyte count, platelet count, glucose, albumin, prothrombin time (PT), and activated partial thromboplastin time (APTT).
2.3 Data preprocessing
All records with missing values in any variable were excluded from the analysis to ensure data completeness and model reliability. The final cleaned dataset comprised 1,389 patients with complete data for all variables.
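A minimal sketch of this complete-case filtering step is shown below, assuming the raw extract has been loaded into a pandas DataFrame; the file name and column names are illustrative, not the study's actual field names.

```python
import pandas as pd

# Hypothetical raw export of ED visits; file and column names are illustrative.
ed_raw = pd.read_csv("ed_visits_2021.csv")

# Analysis variables described in Section 2.2 (names are placeholders).
features = [
    "age", "gender", "sbp", "heart_rate", "resp_rate", "temperature", "spo2",
    "gcs", "ph", "lactate", "base_excess", "hemoglobin", "wbc", "lymphocytes",
    "platelets", "glucose", "albumin", "pt", "aptt",
]
outcome = "in_hospital_death"

# Complete-case analysis: drop any visit with a missing predictor or outcome,
# leaving only fully observed records (1,389 patients in this study).
ed = ed_raw[features + [outcome]].dropna().reset_index(drop=True)
print(f"{len(ed)} complete cases retained")
```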
2.4 Model development and validation
We randomly split the dataset into training (80%, n = 1,111) and testing (20%, n = 278) sets using stratified sampling to maintain the original outcome distribution. For models requiring scaled inputs (Logistic Regression, SVM, and Neural Network), we applied standardization using StandardScaler to normalize features to zero mean and unit variance, fitted on the training set and applied to the testing set to prevent data leakage.
To enable reproducibility, all splits and model training were performed using a fixed random seed (random_state = 42) in scikit-learn and equivalent seed settings in XGBoost/LightGBM where applicable.
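The split, scaling, and seeding steps described above could look like the following sketch; variable names follow the hypothetical `ed` DataFrame from the previous example rather than the study's actual code.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = ed[features]
y = ed[outcome]

# 80/20 stratified split with a fixed seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Standardization is fitted on the training set only and then applied to the
# test set, so no information from the test set leaks into preprocessing.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```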
Nine ML algorithms were developed and compared: (1) Logistic Regression with L2 regularization; (2) Random Forest with 200 trees; (3) eXtreme Gradient Boosting (XGBoost); (4) Light Gradient Boosting Machine (LightGBM); (5) Gradient Boosting; (6) Support Vector Machine with radial basis function kernel; (7) Multi-layer Perceptron Neural Network with three hidden layers (100, 50, 25 neurons); (8) Adaptive Boosting (AdaBoost); and (9) an ensemble voting classifier combining the top-performing models using soft voting. All tree-based models incorporated class balancing to address the imbalanced nature of mortality outcomes. Hyperparameters were optimized through grid search with 5-fold stratified cross-validation on the training set.
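A condensed sketch of how such a model zoo and tuning loop might be assembled in scikit-learn is shown below; the hyperparameter grid is illustrative only and does not reproduce the grids used by the authors.

```python
from sklearn.ensemble import (RandomForestClassifier, GradientBoostingClassifier,
                              AdaBoostClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# Class balancing for boosting models on the imbalanced mortality outcome.
pos_weight = (y_train == 0).sum() / (y_train == 1).sum()

models = {
    "logreg": LogisticRegression(penalty="l2", class_weight="balanced", max_iter=1000),
    "rf": RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=42),
    "xgb": XGBClassifier(scale_pos_weight=pos_weight, eval_metric="logloss", random_state=42),
    "lgbm": LGBMClassifier(class_weight="balanced", random_state=42),
    "gb": GradientBoostingClassifier(random_state=42),
    "svm": SVC(kernel="rbf", class_weight="balanced", probability=True, random_state=42),
    "mlp": MLPClassifier(hidden_layer_sizes=(100, 50, 25), max_iter=1000, random_state=42),
    "ada": AdaBoostClassifier(random_state=42),
}
# Logistic Regression, SVM, and the MLP would be trained on the standardized
# inputs (X_train_scaled); tree-based models can use the raw features.

# Illustrative grid search for LightGBM with 5-fold stratified CV on the training set.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
grid = GridSearchCV(
    models["lgbm"],
    param_grid={"n_estimators": [100, 300], "num_leaves": [15, 31], "learning_rate": [0.05, 0.1]},
    scoring="roc_auc", cv=cv, n_jobs=-1,
)
grid.fit(X_train, y_train)

# Soft-voting ensemble over a subset of top-performing models.
voting = VotingClassifier(
    estimators=[("lgbm", grid.best_estimator_), ("rf", models["rf"]), ("xgb", models["xgb"])],
    voting="soft",
).fit(X_train, y_train)
```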
2.5 Model evaluation
Model performance was assessed using multiple metrics. Discrimination was evaluated using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC); threshold-dependent performance was summarized using sensitivity, specificity, positive predictive value (precision), and F1-score. Calibration was assessed using the Brier score and calibration curves, and clinical utility was evaluated with decision curve analysis across a range of threshold probabilities.
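These metrics can be computed directly with scikit-learn; a sketch for one fitted model (here the tuned LightGBM from the earlier illustrative sketch, evaluated on the held-out test set) might look like the following.

```python
from sklearn.metrics import (roc_auc_score, average_precision_score, confusion_matrix,
                             precision_score, recall_score, f1_score, brier_score_loss)

proba = grid.best_estimator_.predict_proba(X_test)[:, 1]
pred = (proba >= 0.5).astype(int)  # default threshold; the study's operating point may differ

tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
metrics = {
    "AUROC": roc_auc_score(y_test, proba),
    "AUPRC": average_precision_score(y_test, proba),
    "sensitivity": recall_score(y_test, pred),   # tp / (tp + fn)
    "specificity": tn / (tn + fp),
    "precision (PPV)": precision_score(y_test, pred),
    "F1": f1_score(y_test, pred),
    "Brier": brier_score_loss(y_test, proba),
}
print(metrics)
```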
Cross-validation performance was assessed using 5-fold stratified cross-validation to evaluate model stability and generalizability. Learning curves were generated for the best-performing model to assess the relationship between training set size and model performance.
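The cross-validated AUROC and the learning curve for the best model could be produced as in the sketch below, again reusing the hypothetical variable names from the earlier examples.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, learning_curve

# 5-fold stratified cross-validation of AUROC on the training set.
cv_auc = cross_val_score(grid.best_estimator_, X_train, y_train, scoring="roc_auc", cv=cv)
print(f"CV AUROC: {cv_auc.mean():.4f} ± {cv_auc.std():.4f}")

# Learning curve: training vs. validation AUROC as the training set grows.
sizes, train_scores, val_scores = learning_curve(
    grid.best_estimator_, X_train, y_train,
    train_sizes=np.linspace(0.1, 1.0, 8), scoring="roc_auc", cv=cv, n_jobs=-1,
)
for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={int(n):4d}  train AUROC={tr:.3f}  validation AUROC={va:.3f}")
```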
Risk stratification was performed by categorizing patients into four risk groups based on predicted mortality probability: low risk (0–25%), medium risk (25–50%), high risk (50–75%), and very high risk (75–100%). We compared actual mortality rates across these risk strata.
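Risk groups can be formed by binning the predicted probabilities and tabulating observed mortality per bin; a minimal sketch under the same assumptions as the earlier examples:

```python
import pandas as pd

risk = pd.DataFrame({"proba": proba, "died": y_test.values})
risk["group"] = pd.cut(
    risk["proba"],
    bins=[0, 0.25, 0.50, 0.75, 1.0],
    labels=["low", "medium", "high", "very high"],
    include_lowest=True,
)

# Observed mortality and patient counts per predicted-risk stratum.
print(risk.groupby("group", observed=False)["died"].agg(["count", "mean"]))
```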
2.6 Feature importance and model interpretability
Feature importance was quantified for all models supporting this functionality. For tree-based models, we used built-in feature importance scores based on information gain. We calculated mean feature importance across all models to identify the most consistently important predictors.
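Averaging the built-in importance scores across models could be done roughly as follows; only models exposing `feature_importances_` are included, and importances are normalized to sum to one per model before averaging, an assumption on our part since the exact aggregation used by the authors is not specified.

```python
import numpy as np
import pandas as pd

fitted = {
    "rf": models["rf"].fit(X_train, y_train),
    "lgbm": grid.best_estimator_,
    "xgb": models["xgb"].fit(X_train, y_train),
}

importance = {}
for name, model in fitted.items():
    imp = np.asarray(model.feature_importances_, dtype=float)
    importance[name] = imp / imp.sum()  # normalize so models contribute equally

mean_importance = pd.DataFrame(importance, index=features).mean(axis=1)
print(mean_importance.sort_values(ascending=False).head(10))
```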
To enhance model interpretability, we applied SHAP analysis to the best-performing model. SHAP values quantify each feature’s contribution to individual predictions using game-theoretic principles, providing both global feature importance and local explanations for individual predictions. This allows clinicians to understand not only which features are globally important, but also how specific values of key variables (e.g., a particular lactate level or GCS score) contribute to a given patient’s predicted mortality risk.
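For a tree-based model such as LightGBM, SHAP analysis typically uses `shap.TreeExplainer`; the sketch below shows a global summary plot and a single-patient explanation, with the usual guard for SHAP/LightGBM versions that return one array per class.

```python
import numpy as np
import shap

explainer = shap.TreeExplainer(grid.best_estimator_)
shap_values = explainer.shap_values(X_test)

# Some SHAP/LightGBM versions return one array per class for binary models;
# keep the positive (mortality) class if a list is returned.
if isinstance(shap_values, list):
    shap_values = shap_values[1]
expected = explainer.expected_value
if isinstance(expected, (list, np.ndarray)):
    expected = np.ravel(expected)[-1]

# Global importance: beeswarm-style summary over the test set.
shap.summary_plot(shap_values, X_test, feature_names=features)

# Local explanation for a single patient (first row of the test set).
shap.force_plot(expected, shap_values[0], X_test.iloc[0], matplotlib=True)
```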
2.7 Statistical analysis
Descriptive statistics are presented as mean ± standard deviation and median [minimum-maximum] for continuous variables, and as frequency (percentage) for categorical variables. All analyses were performed using Python 3.10 with scikit-learn 1.3, XGBoost 2.0, LightGBM 4.1, CatBoost 1.2, and SHAP 0.43 libraries. Statistical significance was defined as p < 0.05.
3 Results
3.1 Patient characteristics
After applying exclusion criteria and removing cases with missing data, the final cohort comprised 1,389 patients. The mean age was 67.72 ± 19.28 years (median: 73 years, range: 16–104 years), with males comprising 63.1% (n = 876) of the cohort. The overall in-hospital mortality rate was 11.59% (161 deaths). Baseline characteristics are summarized in Table 1. The mean GCS score was 13.72 ± 2.67, indicating generally preserved consciousness across the cohort. Mean vital signs included SBP 141.28 ± 32.47 mmHg, heart rate 100.16 ± 25.28 bpm, respiratory rate 23.40 ± 5.59 breaths/min, temperature 37.21 ± 0.94 °C, and SpO2 90.96 ± 10.24%. Laboratory results showed mean serum lactate of 2.64 ± 3.15 mmol/L, pH 7.36 ± 0.13, BE −1.21 ± 8.18 mmol/L, hemoglobin 131.43 ± 27.77 g/L, WBC 11.16 ± 7.34 × 10⁹/L, lymphocyte count 1.67 ± 3.25 × 10⁹/L, platelet count 202.96 ± 86.16 × 10⁹/L, glucose 11.14 ± 7.80 mmol/L, albumin 39.97 ± 11.00 g/L, PT 12.75 ± 4.64 s, and APTT 31.11 ± 7.20 s. To further characterize the heterogeneity of the study population, baseline variables stratified by survival status (survivors vs. non-survivors) are provided in Supplementary Table S1.
3.2 Distribution analysis
Kernel density estimation plots revealed distinct distributional differences between survivors and non-survivors across multiple variables. Non-survivors exhibited higher serum lactate levels, lower GCS scores, more negative BE values, and lower SBP compared to survivors (Figure 1).
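Overlaid density plots of this kind can be reproduced with seaborn; the sketch below uses the four highlighted variables and the hypothetical column names introduced in the methods examples.

```python
import matplotlib.pyplot as plt
import seaborn as sns

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
for ax, var in zip(axes.ravel(), ["lactate", "gcs", "base_excess", "sbp"]):
    # Separate kernel density estimates for survivors and non-survivors.
    sns.kdeplot(data=ed, x=var, hue=outcome, common_norm=False, fill=True, ax=ax)
    ax.set_title(var)
plt.tight_layout()
plt.show()
```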
3.3 Model performance comparison
All nine models demonstrated excellent discrimination ability with AUROC values exceeding 0.87 (Table 2). LightGBM achieved the highest AUROC of 0.9605 (95% CI: 0.94–0.98), followed closely by the ensemble voting classifier (0.9599) and Random Forest (0.9512). LightGBM also demonstrated optimal balance across multiple metrics with accuracy of 92.09%, precision of 62.50%, sensitivity of 78.12%, specificity of 93.90%, F1-score of 69.44%, AUPRC of 0.6769, and Brier score of 0.0596, indicating excellent calibration. The ROC curves demonstrated superior performance of tree-based ensemble methods compared to traditional approaches. LightGBM, ensemble voting, and Random Forest showed nearly identical ROC curves with minimal separation from the upper-left corner (Figure 2). From a clinical perspective, this level of discrimination suggests that these models can reliably distinguish patients at low versus high risk of in-hospital mortality based on information available very early in the ED stay.
3.4 Cross-validation results
Five-fold stratified cross-validation confirmed the robustness of the models, with mean cross-validation AUROC values ranging from 0.8627 (AdaBoost) to 0.9241 (Random Forest). LightGBM achieved a mean cross-validation AUROC of 0.9070 ± 0.0382, demonstrating stable performance across different data subsets with minimal overfitting (Table 3). The learning curve for LightGBM showed convergence of training and validation scores, indicating adequate sample size and minimal overfitting.
3.5 Feature importance analysis
Feature importance analysis identified serum lactate as the most important predictor across all models (mean importance: 0.252), followed by GCS (0.085), albumin (0.075), BE (0.061), and SBP (0.049) (Table 4; Figure 3). These findings were consistent across different model architectures, lending credibility to their biological and clinical significance. Notably, gender showed minimal predictive importance (0.011), suggesting that mortality risk in the ED is primarily driven by acute physiological derangements rather than demographic factors.
3.6 SHAP analysis and model interpretability
SHAP analysis of the best-performing LightGBM model provided detailed insights into feature contributions. The SHAP summary plot demonstrated that higher serum lactate values consistently increased mortality risk, while higher GCS scores decreased risk (Figures 4 and 5). The SHAP feature importance plot confirmed serum lactate, GCS, and albumin as the three most influential features.
3.7 Risk stratification
Risk stratification analysis demonstrated effective separation of patients into distinct prognostic groups. The very high-risk group (predicted probability >75%, n = 34) had an observed mortality rate of 70.59%, substantially exceeding the cohort average of 11.59% (Table 5). The high-risk group (50–75%, n = 6) showed 16.67% mortality, the medium-risk group (25–50%, n = 3) showed 66.67% mortality, and the low-risk group (<25%, n = 235) showed only 2.13% mortality. However, the medium- and high-risk strata contained very few patients in the test set (n = 3 and n = 6, respectively). In contrast, the low- and very high-risk groups were larger and showed a clear and clinically meaningful gradient in observed mortality, suggesting that the model is most robust for distinguishing patients at the extremes of predicted risk.
4 Discussion
This comprehensive study compared nine machine learning algorithms for predicting in-hospital mortality among ED patients and demonstrated that tree-based ensemble methods, particularly LightGBM, achieve excellent discrimination and clinical utility. Our findings provide a framework for implementing interpretable, clinically useful prediction models.
The AUROC of 0.9605 achieved by LightGBM substantially exceeds the performance of traditional severity scoring systems. A meta-analysis by Patel et al. reported pooled AUROCs of 0.5–0.89 for early warning scores (EWS) in predicting in-hospital mortality (22). Our results align with recent ML studies showing superior performance: Klug et al. reported an AUROC of 0.962 using gradient boosting on a large Israeli cohort (23). The exceptional performance in our study may reflect the comprehensive feature set including laboratory parameters, which are often unavailable in studies relying solely on vital signs and demographics.
The identification of serum lactate as the most important predictor aligns with extensive literature establishing lactate as a marker of tissue hypoperfusion and anaerobic metabolism. Haas et al. (24) demonstrated that initial lactate levels >10 mmol/L are associated with markedly increased mortality in unselected critically ill patients. The importance of GCS as the second-ranked predictor underscores the prognostic significance of altered consciousness. Reduced GCS reflects both primary neurological injury and secondary effects of systemic illness, serving as an integrated measure of illness severity (25, 26). The relatively low importance of demographic factors (age ranked 7th, gender 19th) suggests that acute physiological derangements dominate mortality risk in the ED setting.
Our risk stratification analysis illustrates how the model might be used in practice. Patients categorized as very high risk (>75% predicted probability) experienced markedly elevated observed mortality, suggesting that they may benefit from early intensive monitoring, expedited diagnostic work-up, and prompt consideration of ICU admission when appropriate. Conversely, patients in the low-risk group (<25% predicted probability) had very low observed mortality, which might help support safe de-escalation of monitoring intensity or ED disposition decisions when consistent with the overall clinical picture. Importantly, the model is intended as a decision-support tool rather than a replacement for clinical judgment. In some cases, repeated measurements and close observation may be more informative than a single early lactate value, even though the model recognizes lactate as a strong predictor at the population level.
This study has several limitations. First, it was a retrospective, single-center analysis conducted in a tertiary hospital in China. As such, the case mix, practice patterns, and resource availability may differ from those in other institutions or countries, and external validation in independent cohorts is required before broad implementation. Second, we excluded patients with any missing values to facilitate model development on complete cases. While this approach simplifies analysis and may improve model stability, it can introduce selection bias if patients with missing data differ systematically from those with complete information. Third, we did not include comorbidity indices, medication use, or additional laboratory tests beyond those obtained from arterial blood gas analysis, which may have limited the ability of the models to capture the full spectrum of risk factors. Finally, the numbers of patients in the medium- and high-risk strata in the test set were small, leading to unstable mortality estimates in these intermediate categories; the risk stratification is most robust at the extremes (low vs. very high risk).
5 Conclusion
Machine learning models, particularly LightGBM, demonstrate excellent performance for predicting in-hospital mortality among emergency department patients. Serum lactate, GCS, albumin, base excess, and systolic blood pressure emerge as the most important predictors, all readily available within 30 min of ED arrival. These findings support the potential for implementing ML-based decision support systems to enhance early risk assessment in emergency departments.
Data availability statement
The original contributions presented in the study are included in the article/Supplementary material; further inquiries can be directed to the corresponding author/s.
Ethics statement
The studies involving humans were approved by the Ethics Review Committee of the Affiliated Kunshan Hospital of Jiangsu University. The studies were conducted in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required from the participants or the participants’ legal guardians/next of kin because, given the retrospective nature of the study, the requirement for informed consent was waived. All patient data were anonymized to ensure confidentiality and to protect the privacy of the individuals included in the study.
Author contributions
ZJ: Conceptualization, Writing – review & editing, Software, Resources, Writing – original draft, Supervision, Data curation, Project administration. JM: Data curation, Visualization, Conceptualization, Resources, Formal analysis, Writing – review & editing, Writing – original draft, Project administration. ZG: Investigation, Validation, Formal analysis, Visualization, Data curation, Writing – original draft. QF: Writing – original draft, Supervision, Investigation, Resources, Formal analysis, Methodology. HY: Writing – review & editing, Project administration, Writing – original draft, Conceptualization, Investigation, Funding acquisition.
Funding
The author(s) declared that financial support was received for this work and/or its publication. This work was supported by the Affiliated Kunshan Hospital of Jiangsu University Clinical Project (No. KETDCX202422).
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that Generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmed.2026.1721101/full#supplementary-material
References
1. Guan, G, Lee, CMY, Begg, S, Crombie, A, and Mnatzaganian, G. The use of early warning system scores in prehospital and emergency department settings to predict clinical deterioration: a systematic review and meta-analysis. PLoS One. (2022) 17:e0265559. doi: 10.1371/journal.pone.0265559
2. Jones, S, Moulton, C, Swift, S, Molyneux, P, Black, S, Mason, N, et al. Association between delays to patient admission from the emergency department and all-cause 30-day mortality. Emerg Med J. (2022) 39:168–73. doi: 10.1136/emermed-2021-211572
3. Hester, J, Youn, TS, Trifilio, E, Robinson, CP, Babi, MA, Ameli, P, et al. The modified early warning score: a useful marker of neurological worsening but unreliable predictor of sepsis in the neurocritically ill - a retrospective cohort study. Crit Care Explor. (2021) 3:e0386. doi: 10.1097/CCE.0000000000000386
4. Monzon, LDR, and Boniatti, MM. Use of the modified early warning score in intrahospital transfer of patients. Rev Bras Ter Intensiva. (2020) 32:439–43. doi: 10.5935/0103-507X.20200074
5. Edelson, DP, Churpek, MM, Carey, KA, Lin, Z, Huang, C, Siner, JM, et al. Early warning scores with and without artificial intelligence. JAMA Netw Open. (2024) 7:e2438986. doi: 10.1001/jamanetworkopen.2024.38986
6. Williams, B. The National Early Warning Score: from concept to NHS implementation. Clin Med (Lond). (2022) 22:499–505. doi: 10.7861/clinmed.2022-news-concept
7. Zhang, K, Zhang, X, Ding, W, Xuan, N, Tian, B, Huang, T, et al. National Early Warning Score does not accurately predict mortality for patients with infection outside the intensive care unit: a systematic review and meta-analysis. Front Med (Lausanne). (2021) 8:704358. doi: 10.3389/fmed.2021.704358
8. Raita, Y, Goto, T, Faridi, MK, Brown, DFM, Camargo, CA, and Hasegawa, K. Emergency department triage prediction of clinical outcomes using machine learning models. Crit Care. (2019) 23:64. doi: 10.1186/s13054-019-2351-7
9. Almulihi, QA, Alquraini, AA, Almulihi, FAA, Alzahid, AA, al Qahtani, SSAJ, Almulhim, M, et al. Applications of artificial intelligence and machine learning in emergency medicine triage - a systematic review. Med Arch. (2024) 78:198–206. doi: 10.5455/medarh.2024.78.198-206
10. Bomrah, S, Uddin, M, Upadhyay, U, Komorowski, M, Priya, J, Dhar, E, et al. A scoping review of machine learning for sepsis prediction - feature engineering strategies and model performance: a step towards explainability. Crit Care. (2024) 28:180. doi: 10.1186/s13054-024-04948-6
11. Naemi, A, Schmidt, T, Mansourvar, M, Naghavi-Behzad, M, Ebrahimi, A, and Wiil, UK. Machine learning techniques for mortality prediction in emergency departments: a systematic review. BMJ Open. (2021) 11:e052663. doi: 10.1136/bmjopen-2021-052663
12. Son, B, Myung, J, Shin, Y, Kim, S, Kim, SH, Chung, JM, et al. Improved patient mortality predictions in emergency departments with deep learning data-synthesis and ensemble models. Sci Rep. (2023) 13:15031. doi: 10.1038/s41598-023-41544-0
13. Yu, M, Wang, S, He, K, Teng, F, Deng, J, Guo, S, et al. Predicting the complexity and mortality of polytrauma patients with machine learning models. Sci Rep. (2024) 14:8302. doi: 10.1038/s41598-024-58830-0
14. Nikouline, A, Feng, J, Rudzicz, F, Nathens, A, and Nolan, B. Machine learning in the prediction of massive transfusion in trauma: a retrospective analysis as a proof-of-concept. Eur J Trauma Emerg Surg. (2024) 50:1073–81. doi: 10.1007/s00068-023-02423-5
15. Taylor, RA, Pare, JR, Venkatesh, AK, Mowafi, H, Melnick, ER, Fleischman, W, et al. Prediction of in-hospital mortality in emergency department patients with sepsis: a local big data-driven, machine learning approach. Acad Emerg Med. (2016) 23:269–78. doi: 10.1111/acem.12876
16. Campbell, BR, Rooney, AS, Krzyzaniak, A, Calvo, RY, Checchi, KD, Carroll, AN, et al. Machine learning differentiates extracorporeal membrane oxygenation mortality risk profiles among trauma patients. Am Surg. (2024) 90:2640–8. doi: 10.1177/00031348241256068
17. Deleon, A, Murala, A, Decker, I, Rajasekaran, K, and Moreira, A. Machine learning-based prediction of mortality in pediatric trauma patients. Front Pediatr. (2025) 13:1522845. doi: 10.3389/fped.2025.1522845
18. Holtenius, J, Mosfeldt, M, Enocson, A, and Berg, HE. Prediction of mortality among severely injured trauma patients: a comparison between TRISS and machine learning-based predictive models. Injury. (2024) 55:111702. doi: 10.1016/j.injury.2024.111702
19. Guo, J, Cheng, H, Wang, Z, Qiao, M, Li, J, and Lyu, J. Factor analysis based on SHapley additive exPlanations for sepsis-associated encephalopathy in ICU mortality prediction using XGBoost - a retrospective study based on two large database. Front Neurol. (2023) 14:1290117. doi: 10.3389/fneur.2023.1290117
20. Cao, S, and Hu, Y. Creating machine learning models that interpretably link systemic inflammatory index, sex steroid hormones, and dietary antioxidants to identify gout using the SHAP (SHapley additive exPlanations) method. Front Immunol. (2024) 15:1367340. doi: 10.3389/fimmu.2024.1367340
21. Chowdhury, SU, Sayeed, S, Rashid, I, Alam, MGR, Masum, AKM, and Dewan, MAA. Shapley-additive-explanations-based factor analysis for dengue severity prediction using machine learning. J Imaging. (2022) 8:229. doi: 10.3390/jimaging8090229
22. Patel, R, Nugawela, MD, Edwards, HB, Richards, A, le Roux, H, Pullyblank, A, et al. Can early warning scores identify deteriorating patients in pre-hospital settings? A systematic review. Resuscitation. (2018) 132:101–11. doi: 10.1016/j.resuscitation.2018.08.028
23. Klug, M, Barash, Y, Bechler, S, Resheff, YS, Tron, T, Ironi, A, et al. A gradient boosting machine learning model for predicting early mortality in the emergency department triage: devising a nine-point triage score. J Gen Intern Med. (2020) 35:220–7. doi: 10.1007/s11606-019-05512-7
24. Haas, SA, Lange, T, Saugel, B, Petzoldt, M, Fuhrmann, V, Metschke, M, et al. Severe hyperlactatemia, lactate clearance and mortality in unselected critically ill patients. Intensive Care Med. (2016) 42:202–10. doi: 10.1007/s00134-015-4127-0
25. Gao, T, Nong, Z, Luo, Y, Mo, M, Chen, Z, Yang, Z, et al. Machine learning-based prediction of in-hospital mortality for critically ill patients with sepsis-associated acute kidney injury. Ren Fail. (2024) 46:2316267. doi: 10.1080/0886022X.2024.2316267
26. Meral, G, Ardıç, Ş, Günay, S, Güzel, K, Köse, A, Durmuş, HG, et al. Comparative analysis of Glasgow coma scale, quick sepsis-related organ failure assessment, base excess, and lactate for mortality prediction in critically ill emergency department patients. Turk J Emerg Med. (2024) 24:231–7. doi: 10.4103/tjem.tjem_45_24
Keywords: emergency department, LightGBM, machine learning, mortality prediction, risk stratification, SHAP analysis
Citation: Jiang Z, Ma J, Guo Z, Feng Q and Yuan H (2026) Machine learning-based mortality prediction models for emergency department patients: a comparative analysis. Front. Med. 13:1721101. doi: 10.3389/fmed.2026.1721101
Edited by:
Miodrag Zivkovic, Singidunum University, Serbia
Reviewed by:
Nebojsa Bacanin, Singidunum University, Serbia
Oral Menteş, Gülhane Askerî Tıp Akademisi, Türkiye
Copyright © 2026 Jiang, Ma, Guo, Feng and Yuan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Hua Yuan, eWVraW5nM0Bob3RtYWlsLmNvbQ==
†These authors have contributed equally to this work
Zhen Jiang†