ORIGINAL RESEARCH article

Front. Public Health

Sec. Disaster and Emergency Medicine

Volume 13 - 2025 | doi: 10.3389/fpubh.2025.1635708

Prediction of in-hospital death among patients admitted to a tertiary care hospital over the first ten years: A machine learning approach

Provisionally accepted
Edel Rafael Rodea-Montero1,2, Brenda Jesús Rodríguez-Alcántar2, Dagoberto Armenta-Medina1*
  • 1Centro de Investigación e Innovación en Tecnologías de la Información y Comunicación, Aguascalientes, Mexico
  • 2Hospital Regional de Alta Especialidad del Bajío, Guanajuato, Mexico

The final, formatted version of the article will be published soon.

Purpose: To describe the pre- and postadmission characteristics of patients hospitalized in a tertiary care hospital, and to fit machine learning models that predict in-hospital death and identify the associated factors with the greatest prognostic value.

Materials and methods: This was a retrospective study based on data from patients who were discharged from a Mexican tertiary care hospital during its first ten years of operation (2007-2016). Preadmission characteristics were analyzed using descriptive statistics. Patients with and without in-hospital death were compared using the Mann-Whitney U test and the chi-square test of association. Multivariate models (logistic regression, random forest and XGBoost) were fitted. Their ROC curves were compared using the DeLong test, and performance metrics were evaluated.

Results: Of the 55,253 hospital discharges considered, 45,011 (age range 0-101 years) had complete data; the rate of in-hospital death was 4.17%. In total, 70% of the data were used for training and 30% for testing. Pairwise comparisons of the areas under the curve (AUCs) showed that XGBoost (AUC = 0.9162) outperformed both logistic regression (AUC = 0.9036) and random forest (AUC = 0.8978) (p value < 0.0001 in both cases). XGBoost had a sensitivity of 87%, a specificity of 81.3% and a balanced accuracy of 84.2%. The most relevant predictive factors were the medical service that performed the admission, the number of conditions, origin of the outpatient consultation of the hospital, the main condition diagnosed at admission according to the ICD-10, age, month of admission, and day of the week of admission.

Conclusions: Owing to its ability to capture complex patterns, the XGBoost model makes it possible to identify patients with a relatively high risk of in-hospital death using the data available at hospital admission. It therefore constitutes a decision-support tool that helps determine which patients require closer monitoring and follow-up during their hospital stay, with the aim of improving the quality of medical care.
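As a reading aid for the metrics reported above, the following is a minimal Python sketch, not the authors' code. It assumes the usual definition of the balanced measure as the mean of sensitivity and specificity, and it illustrates the AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The function names and the toy scores are hypothetical and do not come from the study.

```python
from typing import Sequence, Tuple


def rates_from_counts(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of deaths flagged by the model
    specificity = tn / (tn + fp)  # share of survivors correctly not flagged
    return sensitivity, specificity


def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of sensitivity and specificity (assumed definition)."""
    return (sensitivity + specificity) / 2.0


def auc_pairwise(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This is the O(n*m) textbook form,
    adequate for illustration only.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Sensitivity and specificity as reported for XGBoost in the abstract.
reported_ba = balanced_accuracy(0.87, 0.813)
print(f"balanced accuracy from reported rates: {reported_ba:.4f}")

# Hypothetical toy scores, invented purely to exercise auc_pairwise.
toy_pos = [0.9, 0.8, 0.4]
toy_neg = [0.7, 0.3, 0.2, 0.1]
toy_auc = auc_pairwise(toy_pos, toy_neg)
print(f"toy AUC: {toy_auc:.4f}")
```

With the rounded rates from the abstract, the mean is 0.8415, which is consistent with the reported 84.2% once rounding of the underlying values is taken into account.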

Keywords: In-hospital death, Logistic regression, prediction, random forest, XGBoost

Received: 30 May 2025; Accepted: 05 Sep 2025.

Copyright: © 2025 Rodea-Montero, Rodríguez-Alcántar and Armenta-Medina. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Dagoberto Armenta-Medina, Centro de Investigación e Innovación en Tecnologías de la Información y Comunicación, Aguascalientes, Mexico

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.