AUTHOR=Yang Jijun, Peng Hongbing, Luo Youhong, Zhu Tao, Xie Li TITLE=Explainable ensemble machine learning model for prediction of 28-day mortality risk in patients with sepsis-associated acute kidney injury JOURNAL=Frontiers in Medicine VOLUME=Volume 10 - 2023 YEAR=2023 URL=https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2023.1165129 DOI=10.3389/fmed.2023.1165129 ISSN=2296-858X ABSTRACT=Background: Sepsis-associated acute kidney injury (S-AKI) is a significant cause of death in intensive care units (ICUs). Early prediction of mortality risk to optimize clinical decision-making is critical for improving prognosis. This study applied an explainable ensemble machine learning (ML) algorithm to develop a model predicting 28-day mortality risk in patients with S-AKI. Methods: Data on patients with S-AKI were extracted from the Medical Information Mart for Intensive Care IV (MIMIC-IV 2.0) database. The Boruta algorithm was used for feature selection, and the Synthetic Minority Oversampling Technique (SMOTE) was used to correct class imbalance in the data. Four ML models were constructed: Random Forest (RF), Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGBoost), and Logistic Regression (LR); hyperparameters were tuned using random search and five-fold cross-validation. ROC, K-S, and LIFT curves were used to evaluate the performance of all models. The area under the receiver operating characteristic curve (AUC) was used to compare the discrimination of the ML models with that of traditional scoring systems. SHapley Additive exPlanations (SHAP) were used to interpret the ML model and rank the most important variables. The association between the nine most important continuous variables and 28-day mortality risk was examined with Cox regression using restricted cubic splines, adjusted for age and comorbidities. Results: A total of 9158 patients with S-AKI were included, with 1940 in the 28-day mortality group and 7578 in the survival group.
The XGBoost model had the highest AUC (0.873, 95% CI 0.860-0.886), while LR, RF, and GBM had AUCs of 0.850 (95% CI 0.836-0.864), 0.849 (95% CI 0.834-0.863), and 0.865 (95% CI 0.860-0.886), respectively; all were better than APS-III (0.713, 95% CI 0.694-0.733) and SAPS-II (0.681, 95% CI 0.661-0.701). The K-S and LIFT curves suggested that XGBoost had the best predictive ability for 28-day mortality risk. Precision-recall (PR) curves, calibration curves, accuracy, precision, and F1 scores were also used to evaluate the performance of the XGBoost model. In addition, SHAP force plots were used to interpret and visualize the model's individualized predictions of 28-day mortality risk. Conclusion: Ensemble ML models for the early prediction of 28-day mortality risk in S-AKI patients outperformed the LR model and conventional scoring systems. Visualizing the XGBoost model, which had the best predictive performance, helps clinicians identify S-AKI patients at high risk of 28-day mortality early and improve their prognosis.
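The abstract reports discrimination as AUC and summarizes risk separation with K-S and LIFT curves. As a minimal illustration of what these quantities measure, the following pure-Python sketch computes the AUC (Mann-Whitney formulation), the K-S statistic (maximum of TPR minus FPR over all thresholds), and the lift for the highest-risk fraction of patients from predicted probabilities. The function names, the toy labels and risk scores, and the 25% cutoff are hypothetical; this is not the authors' code and does not reproduce their results.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random
    negative (Mann-Whitney formulation); ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ks_statistic(labels: Sequence[int], scores: Sequence[float]) -> float:
    """K-S statistic: the largest value of TPR - FPR over all thresholds,
    i.e. the maximum gap between the cumulative score distributions of
    positives and negatives. Both classes must be present."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 0.0
    for t in sorted(set(scores)):
        # Classify as positive (died within 28 days) when score >= t.
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        best = max(best, tp / n_pos - fp / n_neg)
    return best


def lift_at(labels: Sequence[int], scores: Sequence[float], fraction: float) -> float:
    """Lift for the top `fraction` of patients ranked by predicted risk:
    the event rate in that group divided by the overall event rate."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


# Hypothetical toy data: 1 = died within 28 days, 0 = survived.
labels = [1, 1, 0, 1, 0, 0, 0, 0]
risk = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

print(round(auc(labels, risk), 3))            # 0.933
print(round(ks_statistic(labels, risk), 3))   # 0.8
print(round(lift_at(labels, risk, 0.25), 3))  # 2.667
```

Evaluating `lift_at` across a range of fractions traces a lift curve, and plotting TPR and FPR against the threshold shows where the K-S maximum occurs. The quadratic loops keep the logic explicit; for a cohort of several thousand patients, a sort-based implementation or an established library routine would be the more practical choice.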