ORIGINAL RESEARCH article
Front. Med. Technol.
Sec. Medtech Data Analytics
Volume 7 - 2025 | doi: 10.3389/fmedt.2025.1621158
Using machine learning methods to investigate the impact of comorbidities and clinical indicators on the mortality rate of COVID-19
Provisionally accepted
- 1 National Taiwan University Hospital, Taipei, Taiwan
- 2 National Taiwan University, Taipei, Taiwan
- 3 Cedars-Sinai Medical Center, Los Angeles, United States
- 4 Harvard T.H. Chan School of Public Health, Boston, United States
- 5 Department of Information Management, Ministry of Health and Welfare, Taipei, Taiwan
ABSTRACT

Background: This study aims to develop a machine learning model to predict the 30-day mortality risk of hospitalized COVID-19 patients while leveraging federated learning to enhance data privacy and expand the model's applicability. Additionally, SHapley Additive exPlanations (SHAP) values were used to assess the impact of comorbidities on mortality.

Methods: A retrospective analysis was conducted on 6,321 clinical records of COVID-19 patients hospitalized between January 2021 and October 2022. After excluding patients under 18 years of age and non-Omicron infections, a total of 4,081 records were analyzed. Features comprised three demographic variables, six vital signs at admission, and 79 underlying comorbidities. Four machine learning models were compared (Lasso, Random Forest, XGBoost, and TabNet), with XGBoost demonstrating superior performance. Federated learning was implemented to enable collaborative model training across multiple medical institutions while maintaining data security. SHAP values were applied to interpret the contribution of each comorbidity to the model's predictions.

Results: A subset of 2,156 records from the Taipei branch was used to evaluate model performance. XGBoost achieved the highest AUC, 0.96, with a sensitivity of 0.94. Two versions of the XGBoost model were trained: one incorporating vital signs, suited to emergency room use, where patients present with unstable vital signs, and another excluding vital signs, optimized for outpatient settings, where patients often have multiple comorbidities. Federated learning was then applied to refine the final models using data from three hospital branches. SHAP analysis identified key comorbidities influencing 30-day mortality, providing insights into potential risk factors.

Conclusion: XGBoost outperformed the other models, making it a viable tool for both emergency and outpatient settings. The study underscores the importance of chronic disease assessment in predicting COVID-19 mortality, revealing that some comorbidities, such as diabetes mellitus, cerebrovascular disease, and chronic lung disease, had a protective association with 30-day mortality. These findings suggest potential refinements to current treatment guidelines, particularly concerning high-risk conditions. The integration of federated learning further enhances the model's clinical applicability while preserving patient privacy.
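For readers less familiar with the two metrics reported in the Results (AUC and sensitivity), the following is a minimal sketch of how they can be computed from predicted mortality probabilities. It is not the authors' evaluation code. The function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical and chosen only for illustration; the abstract does not state the threshold used.

```python
from typing import Sequence


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case
    (death within 30 days) receives a higher score than a randomly
    chosen negative case; ties count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def sensitivity(labels: Sequence[int], scores: Sequence[float],
                threshold: float = 0.5) -> float:
    """Fraction of true positive cases whose score reaches the threshold.
    The default threshold of 0.5 is an assumption, not taken from the study."""
    true_pos = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    false_neg = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    if true_pos + false_neg == 0:
        raise ValueError("sensitivity needs at least one positive case")
    return true_pos / (true_pos + false_neg)


# Hypothetical toy data: 1 = died within 30 days, 0 = survived.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]

print(auc_score(labels, scores))    # 0.9375
print(sensitivity(labels, scores))  # 0.75
```

In this toy example, 15 of the 16 positive-negative pairs are ranked correctly, giving an AUC of 0.9375, and three of the four deaths score at or above the threshold, giving a sensitivity of 0.75. The pairwise loop is quadratic in the number of records; for a dataset of several thousand records, a rank-based implementation from an established library would be the more practical choice.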
Keywords: COVID-19, mortality, survival analysis, machine learning, federated learning
Received: 06 May 2025; Accepted: 01 Sep 2025.
Copyright: © 2025 Hsieh, Chen, Tsao, Hu, Hsu and Lee. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Chien-Chang Lee, Department of Information Management, Ministry of Health and Welfare, Taipei, Taiwan
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.