AUTHOR=Xiong Yuanguo, Cai Xu, Lai Xin, Wang Yuwen, Xin Hao, Song Wei, Lv Feng, Guo Xianxi, Yang Ge, Wu Yue

TITLE=Real-world data-driven early warning system for risk-stratified liver injury in hospitalized COVID-19 patients—Machine learning models for clinical decision support

JOURNAL=Frontiers in Public Health

VOLUME=Volume 13 - 2025

YEAR=2025

URL=https://www.frontiersin.org/journals/public-health/articles/10.3389/fpubh.2025.1566260

DOI=10.3389/fpubh.2025.1566260

ISSN=2296-2565

ABSTRACT=Objective: To develop and validate a real-world evidence-driven early warning system for the risk-stratified prediction of coronavirus disease 2019 (COVID-19)-associated hepatic dysfunction in hospitalized patients, leveraging interpretable machine learning models to provide clinically actionable decision support for timely intervention.
Methods: A retrospective single-center cohort study was conducted using high-resolution electronic health records (EHRs) from 983 hospitalized COVID-19 patients. Clinical features (e.g., laboratory results, medication exposures, and disease progression markers) were systematically analyzed. To mitigate class imbalance, we applied the Synthetic Minority Oversampling TEchnique (SMOTE) prior to model development. Thirteen distinct machine learning (ML) algorithms were trained and benchmarked to construct an optimal risk stratification framework. Model performance was evaluated using accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC). SHapley Additive exPlanations (SHAP) analysis was used to enhance clinical interpretability and provide transparent insights for decision-making.
Results: The SMOTE-edited nearest neighbors (SMOTE-ENN) resampling strategy, combined with random forest (RF) and extra trees (ET) models, demonstrated superior predictive performance, achieving AUC values of 0.998 ± 0.002 (RF) and 0.997 ± 0.002 (ET). The SHAP-based interpretability analysis identified glutathione administration and hepatic enzymes (e.g., gamma-glutamyltransferase [GGT] and alanine aminotransferase [ALT]) as the most influential predictors. Online prediction platforms were developed for early warning of liver injury, stratifying patients into low- and high-risk groups based on predicted probabilities.
Conclusion: This research established a machine learning-powered early warning system capable of real-time risk stratification for COVID-19-associated liver injury through dynamic integration of clinical data. The ensemble RF/ET-based models demonstrated significant clinical utility as decision support tools, particularly through their ability to identify high-risk patients requiring intensified monitoring and to optimize hepatoprotective therapy. By emphasizing drug-induced injury markers and the disease progression process, the ML models establish a personalized monitoring framework that could potentially transform clinical management for target patients.
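
The evaluation metrics and the low-/high-risk split described in the abstract can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' pipeline: the function names, the toy labels and probabilities, and the 0.5 probability cut-off are all assumptions made for illustration, since the abstract does not state the threshold used by the online platforms.

```python
from typing import Dict, List, Sequence


def stratify(prob: float, threshold: float = 0.5) -> str:
    """Map a predicted probability to a risk group.

    The 0.5 cut-off is an assumption; the source does not report one.
    """
    return "high-risk" if prob >= threshold else "low-risk"


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = liver injury)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not taken from the study).
y_true: List[int] = [0, 0, 1, 1, 0, 1]
probs: List[float] = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]

labels = [stratify(p) for p in probs]
y_pred = [1 if label == "high-risk" else 0 for label in labels]
metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, probs)

print(labels)
print(metrics)
print(round(auc, 3))
```

In practice, the threshold would be chosen on validation data according to the clinical cost of missed cases versus false alarms, and the metrics would be computed on held-out patients rather than on resampled training data.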