AUTHOR=Qi Jing, Lei Jingchao, Li Nanyi, Huang Dan, Liu Huaizheng, Zhou Kefu, Dai Zheren, Sun Chuanzheng TITLE=Machine learning models to predict in-hospital mortality in septic patients with diabetes JOURNAL=Frontiers in Endocrinology VOLUME=Volume 13 - 2022 YEAR=2022 URL=https://www.frontiersin.org/journals/endocrinology/articles/10.3389/fendo.2022.1034251 DOI=10.3389/fendo.2022.1034251 ISSN=1664-2392 ABSTRACT=Background: Sepsis is a leading cause of morbidity and mortality in hospitalized patients. To date, there are no well-established longitudinal networks linking molecular mechanisms to clinical phenotypes in sepsis. Adding to the problem, about one in five patients presents with diabetes. For this subgroup, management is difficult, and prognosis is difficult to evaluate. Methods: A total of 7001 patients from three databases were enrolled based on the Sepsis-3 standard and a diagnosis of diabetes. Input variables were hand-picked based on the results of correlation analysis, leaving 53 variables. A total of 5727 records were collected from the Medical Information Mart for Intensive Care database and randomly split into a training set and an internal validation set at a ratio of 7:3. Logistic regression with lasso regularization, Bayes logistic regression, decision tree, random forest, and XGBoost were then used to build predictive models on the training set, and the models were tested on the internal validation set. Data from the eICU Collaborative Research Database (n=815) and the dtChina critical care database (n=459) were used as external validation sets to test model performance. Results: In the internal validation set, the accuracies of logistic regression with lasso regularization, Bayes logistic regression, decision tree, random forest, and XGBoost were 0.878, 0.883, 0.865, 0.883, and 0.882, respectively. 
Likewise, in external validation set 1, the accuracies were 0.879 for logistic regression with lasso regularization, 0.877 for Bayes logistic regression, 0.865 for decision tree, 0.886 for random forest, and 0.875 for XGBoost. In external validation set 2, the accuracies were 0.715 for logistic regression with lasso regularization, 0.745 for Bayes logistic regression, 0.763 for decision tree, 0.760 for random forest, and 0.699 for XGBoost. Conclusion: The top three models for the internal validation set were Bayes logistic regression, random forest, and XGBoost, while the top three for external validation set 1 were random forest, logistic regression with lasso regularization, and Bayes logistic regression. The top three models for external validation set 2 were decision tree, random forest, and Bayes logistic regression. The random forest model performed well on the training set and all three validation sets. The most important features are age, albumin, and lactate.
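The cohort arithmetic and the model rankings stated in the abstract can be checked with a short script. The following is a minimal sketch in standard-library Python, not the authors' pipeline: the function names `split_7_3` and `top_models`, the random seed, and the rounding rule used for the 7:3 cut are all assumptions made for illustration. The accuracy values are copied from the abstract.

```python
import random

# Cohort sizes as reported in the abstract.
COHORT_SIZES = {"MIMIC": 5727, "eICU": 815, "dtChina": 459}

# Accuracies as reported in the abstract, per validation set and model.
REPORTED_ACCURACY = {
    "internal": {
        "lasso logistic regression": 0.878,
        "Bayes logistic regression": 0.883,
        "decision tree": 0.865,
        "random forest": 0.883,
        "XGBoost": 0.882,
    },
    "external set 1": {
        "lasso logistic regression": 0.879,
        "Bayes logistic regression": 0.877,
        "decision tree": 0.865,
        "random forest": 0.886,
        "XGBoost": 0.875,
    },
    "external set 2": {
        "lasso logistic regression": 0.715,
        "Bayes logistic regression": 0.745,
        "decision tree": 0.763,
        "random forest": 0.760,
        "XGBoost": 0.699,
    },
}


def split_7_3(records, seed=0):
    """Randomly split records into training and validation parts at 7:3.

    Hypothetical helper: the seed and the rounding of the cut point are
    assumptions, since the abstract does not state them.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * 0.7)
    return shuffled[:cut], shuffled[cut:]


def top_models(accuracies, k=3):
    """Return the k model names with the highest accuracy.

    Python's sort is stable, so tied models keep their listed order.
    """
    return sorted(accuracies, key=accuracies.get, reverse=True)[:k]


if __name__ == "__main__":
    print("total patients:", sum(COHORT_SIZES.values()))
    train, valid = split_7_3(range(COHORT_SIZES["MIMIC"]))
    print("train/validation sizes:", len(train), len(valid))
    for name, scores in REPORTED_ACCURACY.items():
        print(name, "->", top_models(scores))
```

Ranking by accuracy alone reproduces the top-three lists given in the conclusion. Note that Bayes logistic regression and random forest tie at 0.883 on the internal validation set, so their relative order there is determined only by the listing order.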