AUTHOR=Hong Wandong, Lu Yajing, Zhou Xiaoying, Jin Shengchun, Pan Jingyi, Lin Qingyi, Yang Shaopeng, Basharat Zarrin, Zippi Maddalena, Goyal Hemant

TITLE=Usefulness of Random Forest Algorithm in Predicting Severe Acute Pancreatitis

JOURNAL=Frontiers in Cellular and Infection Microbiology

VOLUME=Volume 12 - 2022

YEAR=2022

URL=https://www.frontiersin.org/journals/cellular-and-infection-microbiology/articles/10.3389/fcimb.2022.893294

DOI=10.3389/fcimb.2022.893294

ISSN=2235-2988

ABSTRACT=Background & Aims: This study aimed to develop an interpretable random forest model for predicting severe acute pancreatitis (SAP).
Methods: Clinical and laboratory data of 648 patients with acute pancreatitis were retrospectively reviewed, and the patients were randomly assigned to a training set and a test set in a 3:1 ratio. Univariate analysis was used to select candidate predictors of SAP. Random forest (RF) and logistic regression (LR) models were developed in the training set and then applied to the test set. Model performance was measured by the area under the receiver operating characteristic (ROC) curve (AUC) and the area under the precision-recall curve. Individual predictions were interpreted visually using local interpretable model-agnostic explanations (LIME).
Results: The LR model predicted SAP with the following function: -1.10 - 0.13 × albumin (g/L) + 0.016 × serum creatinine (μmol/L) + 0.14 × glucose (mmol/L) + 1.63 × pleural effusion (0 = No, 1 = Yes). The coefficients of this formula were used to build a nomogram. The RF model, consisting of sixteen variables identified by univariate analysis, was developed and validated by tenfold cross-validation in the training set. Variable importance analysis suggested that blood urea nitrogen, serum creatinine, albumin, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, calcium, and glucose were the seven most important predictors of SAP. The AUCs of the RF model in tenfold cross-validation of the training set and in the test set were 0.89 and 0.96, respectively. Both the area under the precision-recall curve and the diagnostic accuracy of the RF model were higher than those of the LR model and the BISAP score. LIME plots were used to explain individualized predictions of the RF model.
Conclusions: An interpretable random forest model exhibited the highest discriminatory performance in predicting SAP. Interpretation with LIME plots would be useful for individualized prediction in clinical settings. A nomogram consisting of albumin, serum creatinine, glucose, and pleural effusion was useful for predicting SAP.
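
Illustration: the LR function reported in the Results can be evaluated directly. The sketch below assumes that the reported function is the linear predictor (log-odds) of the logistic regression model, so that a probability is obtained by applying the logistic function; the abstract does not state this explicitly. The function names and the example patient values are hypothetical and are not taken from the study. This is an illustration only, not a clinical tool.

```python
import math


def sap_linear_predictor(albumin_g_l, creatinine_umol_l, glucose_mmol_l, pleural_effusion):
    """Evaluate the LR function from the abstract.

    Assumption: the result is the log-odds of SAP (not stated in the abstract).
    pleural_effusion is coded 0 (No) or 1 (Yes).
    """
    effusion = 1 if pleural_effusion else 0
    return (
        -1.10
        - 0.13 * albumin_g_l
        + 0.016 * creatinine_umol_l
        + 0.14 * glucose_mmol_l
        + 1.63 * effusion
    )


def sap_probability(albumin_g_l, creatinine_umol_l, glucose_mmol_l, pleural_effusion):
    """Map the linear predictor to a probability with the logistic function."""
    z = sap_linear_predictor(albumin_g_l, creatinine_umol_l, glucose_mmol_l, pleural_effusion)
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Hypothetical patient: albumin 30 g/L, creatinine 150 umol/L,
    # glucose 10 mmol/L, pleural effusion present.
    z = sap_linear_predictor(30, 150, 10, 1)
    p = sap_probability(30, 150, 10, 1)
    print(f"linear predictor = {z:.2f}, predicted probability = {p:.2f}")
```

Under this reading, lower albumin, higher serum creatinine, higher glucose, and the presence of pleural effusion each raise the predicted probability, which matches the signs of the reported coefficients.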