AUTHOR=Xiong Yuanguo, Cai Xu, Lai Xin, Wang Yuwen, Xin Hao, Song Wei, Lv Feng, Guo Xianxi, Yang Ge, Wu Yue TITLE=Real-world data-driven early warning system for risk-stratified liver injury in hospitalized COVID-19 patients—Machine learning models for clinical decision support JOURNAL=Frontiers in Public Health VOLUME=Volume 13 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/public-health/articles/10.3389/fpubh.2025.1566260 DOI=10.3389/fpubh.2025.1566260 ISSN=2296-2565 ABSTRACT=Objective: To develop and validate a real-world evidence-driven early warning system for the risk-stratified prediction of coronavirus disease 2019 (COVID-19)-associated hepatic dysfunction in hospitalized patients, leveraging interpretable machine learning models to provide clinically actionable decision support for timely intervention. Methods: A retrospective single-center cohort study was conducted using high-resolution electronic health records (EHRs) from 983 hospitalized COVID-19 patients. Clinical features (e.g., laboratory results, medication exposures, and disease progression markers) were systematically analyzed. To mitigate class imbalance, we applied the Synthetic Minority Oversampling TEchnique (SMOTE) prior to model development. Thirteen distinct machine learning (ML) algorithms were trained and benchmarked to construct an optimal risk stratification framework. Model performance was evaluated using accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC). SHapley Additive exPlanations (SHAP) analysis was used to enhance clinical interpretability and provide transparent insights for decision-making. Results: The SMOTE with edited nearest neighbors (SMOTE-ENN) resampling strategy, combined with random forest (RF) and extra trees (ET) models, demonstrated superior predictive performance, achieving AUC values of 0.998 ± 0.002 (RF) and 0.997 ± 0.002 (ET), respectively. 
The SHAP-based interpretability analysis identified glutathione administration and hepatic enzymes (e.g., gamma-glutamyltransferase [GGT] and alanine aminotransferase [ALT]) as the most influential predictors. Online prediction platforms were developed for liver injury early warning, stratifying patients into low- and high-risk groups based on their predicted probabilities. Conclusion: This research established a machine learning-powered early warning system capable of real-time risk stratification for COVID-19-associated liver injury through dynamic integration of clinical data. The ensemble RF/ET-based models demonstrated significant clinical utility as decision support tools, particularly through their ability to identify high-risk patients requiring intensified monitoring and to optimize hepatoprotective therapy. By emphasizing drug-induced injury markers and disease progression, ML models establish a personalized monitoring framework that could potentially transform clinical management for target patients.
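The oversampling step named in the Methods can be illustrated with a minimal sketch of the SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. The function name smote_sketch, the parameter k, and the toy feature vectors below are hypothetical and chosen for illustration only; this is not the authors' implementation, and it omits the edited nearest neighbors (ENN) cleaning step that SMOTE-ENN applies afterwards.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Hypothetical helper for illustration; not the study's code.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        # New point lies on the segment between base and the chosen neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, made-up feature vectors (e.g. two scaled laboratory values); not study data.
minority_cases = [[1.0, 2.0], [1.5, 2.5], [2.0, 2.0], [1.2, 3.0]]
new_cases = smote_sketch(minority_cases, n_new=6)
```

Because every synthetic point is a convex combination of two real minority samples, the new cases stay inside the range already spanned by the minority class. In practice a maintained library implementation would likely be preferable to a hand-written version like this one.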