Machine learning prediction model for post-hepatectomy liver failure in hepatocellular carcinoma: A multicenter study

Introduction Post-hepatectomy liver failure (PHLF) is one of the most serious complications and causes of death in patients with hepatocellular carcinoma (HCC) after hepatectomy. This study aimed to develop a novel machine learning (ML) model based on the light gradient boosting machines (LightGBM) algorithm for predicting PHLF. Methods A total of 875 patients with HCC who underwent hepatectomy were randomized into a training cohort (n=612), a validation cohort (n=88), and a testing cohort (n=175). Shapley additive explanation (SHAP) was performed to determine the importance of individual variables. By combining these independent risk factors, an ML model for predicting PHLF was established. The area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value, negative predictive value, and decision curve analyses (DCA) were used to evaluate the accuracy of the ML model and compare it to that of other noninvasive models. Results The AUCs of the ML model for predicting PHLF in the training cohort, validation cohort, and testing cohort were 0.944, 0.870, and 0.822, respectively. The ML model had a higher AUC for predicting PHLF than did other non-invasive models. The ML model for predicting PHLF was found to be more valuable than other noninvasive models. Conclusion A novel ML model for the prediction of PHLF using common clinical parameters was constructed and validated. The novel ML model performed better than did existing noninvasive models for the prediction of PHLF.


Introduction
In 2020, primary liver cancer was the sixth most commonly diagnosed cancer and the third leading cause of cancer-related deaths worldwide, with approximately 906,000 new cases and 830,000 deaths (1). More than 50% of the world's new cases of liver cancer each year are attributed to hepatitis B, which has a high incidence in China (2). Radical liver resection remains the first choice of treatment for hepatocellular carcinoma (HCC) (3). Post-hepatectomy liver failure (PHLF) is the most common cause of postoperative death among patients who undergo hepatectomy for HCC (4). The reported incidence of PHLF ranges from 1.2% to 32%, depending on the etiology and surgical procedure (5,6), and PHLF is the most common cause of early death after liver surgery (7).
A variety of comprehensive scoring systems and nomogram prediction models can be used to help predict PHLF in patients with HCC (8)(9)(10). However, no universally recognized method for the prediction of PHLF has been established. Machine learning (ML), one of the most important branches of artificial intelligence (AI), has undergone rapid development and is being widely used in the field of disease prediction, where it has achieved remarkable results in clinical practice (11). ML is widely used in cancer research, where it is applied to clinical data, radiomics, and genomics to develop predictive models for efficient and accurate decision making (12)(13)(14). ML uses computational algorithms to learn from and analyze large amounts of data in a short period of time. Therefore, ML may outperform traditional risk stratification tools via the integration of different algorithms such as decision trees, artificial neural networks, random forests, support vector machines, extreme gradient boosting, and light gradient boosting machines (LightGBM) (15). LightGBM uses a histogram-based decision tree algorithm. Compared with other ML models, the LightGBM model is characterized by fast training speed and low memory usage. ML based on LightGBM has only recently been introduced in research involving liver disease (16)(17)(18), and the LightGBM model has not yet been used to predict PHLF.
In this study, a novel ML model based on the LightGBM algorithm, namely ML PHLF, was constructed. This novel model may replace traditional scoring systems and facilitate the assessment of liver function and reduction of the incidence of PHLF and postoperative mortality after radical hepatectomy.

Study population
This retrospective study was performed using a multicenter database of patients who underwent radical hepatectomy for HCC at the following hospitals: The Xingtai People's Hospital, The Second Affiliated Hospital of Nanjing Medical University, Fifth Medical Center of People's Liberation Army (PLA) General Hospital, The First Affiliated Hospital of Dalian Medical University, and Tongji Hospital Affiliated to Huazhong University of Science and Technology. Two independent investigators (JW and JL) reviewed the baseline data, laboratory parameters, treatment records, and pathological findings. All patients were randomly divided into the training, validation, and testing cohorts at a ratio of 7:1:2. This study was performed in accordance with the ethical guidelines of the Declaration of Helsinki and approved by the Institutional Review Board (2022-006).
Patients aged > 18 years with confirmed HCC based on histopathological examination of the tumor specimen and no history of anticancer therapy, including transarterial chemoembolization, ablation, or targeted drugs, were included in this study. Patients who underwent other surgical procedures at the time of hepatectomy and those with insufficient data on important indicators, such as total bilirubin (TBIL) and international normalized ratio (INR) on or after the fifth postoperative day, were excluded from the study.

Data collection
Patient demographic data, including age, weight, body mass index (BMI), sex, presence of hypertension, etiology of liver disease, and cirrhosis, were retrieved from the medical records. Data on the surgical method (open or minimally invasive), extent of liver resection (major resection: ≥ 3 segments; minor resection: < 3 segments), requirement of intraoperative blood transfusion, number of tumors, maximum tumor diameter, and intraoperative blood loss were extracted from the preoperative and surgical records. Laboratory indicators included red blood cell (RBC) count, white blood cell (WBC) count, platelet (PLT) count, TBIL, direct bilirubin (DBIL), albumin (ALB), alanine aminotransferase (ALT), aspartate aminotransferase (AST), serum alpha-fetoprotein (AFP), carcinoembryonic antigen (CEA), creatinine (Cr), prothrombin time (PT), and INR. Portal hypertension was defined as the presence of varicose veins or a PLT count < 100 × 10^9/L and a spleen diameter > 12 cm. According to previous literature, the model for end-stage liver disease (MELD) score was calculated. The albumin-bilirubin (ALBI) score (21) was calculated as:
$$\mathrm{ALBI} = 0.66 \times \log_{10}(\mathrm{TBIL},\ \mu\mathrm{mol/L}) - 0.085 \times (\mathrm{ALB},\ \mathrm{g/L})$$
The aminotransferase-to-platelet ratio index (APRI) score (22) was calculated as:
$$\mathrm{APRI} = \frac{\mathrm{AST\ level}\ (/\mathrm{ULN})}{\mathrm{platelet\ count}\ (10^{9}/\mathrm{L})} \times 100$$
Finally, the Child-Turcotte-Pugh (CTP) score (23) was calculated. Based on the Chinese Society of Hepatology guidelines for the diagnosis and treatment of liver cirrhosis, the diagnosis of liver cirrhosis was made using preoperative clinical variables such as the etiology, history, clinical manifestations, complications, laboratory results, imaging examinations, or liver biopsy histology (24).
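The ALBI and APRI definitions above translate directly into code. The following is a minimal sketch, not the authors' implementation; the function names, the default AST upper limit of normal (40 U/L), and the patient values are assumptions made for illustration only.

```python
import math


def albi_score(tbil_umol_l, alb_g_l):
    """ALBI = 0.66 * log10(TBIL in umol/L) - 0.085 * (ALB in g/L)."""
    return 0.66 * math.log10(tbil_umol_l) - 0.085 * alb_g_l


def apri_score(ast_u_l, plt_10e9_l, ast_uln_u_l=40.0):
    """APRI = (AST / ULN) / platelet count (10^9/L) * 100.

    ast_uln_u_l is an assumed placeholder; use the local laboratory's ULN.
    """
    return (ast_u_l / ast_uln_u_l) / plt_10e9_l * 100.0


# Hypothetical patient values, for illustration only.
print(round(albi_score(tbil_umol_l=10.0, alb_g_l=40.0), 2))  # -2.74
print(round(apri_score(ast_u_l=80.0, plt_10e9_l=100.0), 2))  # 2.0
```

Note that the ALBI formula expects TBIL in μmol/L and ALB in g/L; values recorded in mg/dL or g/dL would need to be converted first.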

Definition of PHLF
The International Study Group of Liver Surgery (ISGLS) diagnostic criteria for PHLF were used in this study (5). PHLF was defined as an increase in TBIL and INR on or after the fifth postoperative day when compared to preoperative levels, after the exclusion of biliary obstruction as a cause for increased TBIL or INR.

Development of the ML PHLF model
A total of 875 patients, including 192 with PHLF and 683 without PHLF, were included in this study. Twenty-five clinical variables, including sex, age, weight, liver disease etiology, cirrhosis, portal hypertension, PLT count, RBC count, WBC count, TBIL, ALB, AST, ALT, DBIL, Cr, PT, INR, AFP, CEA, tumor size, tumor number, surgical approach, extent of resection, intraoperative blood loss, and intraoperative blood transfusion, were used in this study. To verify the performance of the model, 70% of the dataset was used as the training set, 10% was used as the validation set, and 20% was used as the testing set. Data from the training and validation sets were applied to LightGBM, which computed the value of each variable using a decision tree to generate a prediction model for PHLF (Figure 1).
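The 70%/10%/20% partition can be sketched as a shuffled index split. This is an illustrative sketch only: the study does not report whether the split was stratified by PHLF status or which random seed was used, so the plain shuffle, the `seed` argument, and the function name are assumptions.

```python
import random


def split_indices(n, seed=0):
    """Shuffle 0..n-1 and split into train/validation/test at 70%/10%/20%."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seed is a hypothetical choice
    n_train = n * 7 // 10             # integer arithmetic avoids float rounding
    n_test = n * 2 // 10
    n_val = n - n_train - n_test      # validation takes the remainder
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test


train, val, test = split_indices(875)
print(len(train), len(val), len(test))  # 612 88 175
```

With 875 patients, this rounding scheme gives cohort sizes of 612, 88, and 175.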
The Shapley additive explanation (SHAP), a game-theoretic approach to interpreting the output of the ML PHLF model (25,26), was used to quantitatively measure the importance of each variable and describe the overall relationship between PHLF and all variables. To obtain the best ML model for PHLF, the LightGBM algorithm was optimized by adjusting the number of iterations, number of leaves, and maximum depth of the tree. The optimal number of trees, maximum tree depth, and number of leaves were combined with the other hyperparameters tuned on the validation set to construct the optimal LightGBM model. In addition, the LightGBM algorithm can speed up the training process without affecting the performance of the model. This overall increase in speed is the result of a combination of gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB). Subsequently, the LightGBM model was used to establish a PHLF prediction model with a favorable area under the receiver operating characteristic curve (AUROC). The AUROCs of the training, validation, and testing cohorts were determined.
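To make the game-theoretic idea behind SHAP concrete, the sketch below computes exact Shapley values for a single sample by enumerating all feature subsets. It is a simplified illustration, not the procedure used in the study: "absent" features are replaced by a single baseline value, whereas tree-based SHAP implementations use a more efficient algorithm over background data. The `toy_risk` function, its coefficients, and the patient and baseline values are invented for the example.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature of x relative to a baseline."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the sample's value, the rest the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # subset sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                s = set(s)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z):
    """Hypothetical risk score over (PLT, age, INR); not a fitted model."""
    plt_count, age, inr = z
    return -0.01 * plt_count + 0.02 * age + 1.5 * inr + 0.01 * age * inr


x = [80.0, 70.0, 1.3]          # hypothetical patient
baseline = [200.0, 55.0, 1.0]  # hypothetical reference patient
phi = shapley_values(toy_risk, x, baseline)
print([round(p, 3) for p in phi])
```

The attributions sum to the difference between the prediction for the patient and the prediction for the baseline, which is the additivity property that SHAP summary plots rely on.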

Statistical analysis
Continuous variables are presented as mean and standard deviation when normally distributed or as median and interquartile range otherwise. Normally distributed variables were compared using Student's t-test, and non-normally distributed variables were compared using the Mann-Whitney rank sum test. Categorical variables are presented as numbers and frequencies (%). The chi-squared test or Fisher's exact test was used to analyze categorical variables. The predictive performance of the ML PHLF model was assessed using AUROC, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Decision curve analyses (DCA) were used to measure the clinical utility of each model by calculating the net benefit at various threshold probabilities. R software (version 4.1.2) and Python software (version 3.7.9) were used for data analysis and model building. Statistical significance was set at P < 0.05.
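The performance measures named above can be computed from predicted probabilities as in the following sketch. It assumes binary labels (1 = PHLF) and a user-chosen probability cutoff; the function names and the toy data are hypothetical and are not taken from the study.

```python
def auroc(y_true, scores):
    """AUROC as the probability that a positive case outranks a negative one."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def diagnostic_metrics(y_true, scores, threshold):
    """Sensitivity, specificity, PPV, and NPV at a given probability cutoff.

    Assumes each denominator is non-zero for the data supplied.
    """
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Toy labels and predicted probabilities, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
print(round(auroc(y_true, scores), 3))  # 0.933
print(diagnostic_metrics(y_true, scores, threshold=0.5))
```

Sensitivity, specificity, PPV, and NPV depend on the chosen cutoff, whereas the AUROC summarizes discrimination across all cutoffs.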

Study population
A total of 875 patients were enrolled in this study and randomly assigned to the training (n=612), validation (n=88), and testing (n=175) cohorts at a ratio of 7:1:2. The baseline characteristics of the three groups were not significantly different (Table 1).

Interpretation of the model using the SHAP algorithm
The top five factors associated with PHLF were PLT count, age, Cr, INR, and AFP (Figure 2A). The top 20 variables and the correlation between high or low SHAP values and the predicted PHLF are presented in Figure 2B. The AUCs of the ML PHLF model for predicting PHLF in the training, validation, and testing cohorts were 0.944, 0.870, and 0.822, respectively (Figure 3). The ML PHLF model identified PHLF in the training cohort with a sensitivity, specificity, and NPV of 87.6%, 85.9%, and 96.0%, respectively. The sensitivity, specificity, and NPV of the ML PHLF model were 100%, 64.4%, and 100%, respectively, in the validation cohort and 87.5%, 64.4%, and 94.6%, respectively, in the testing cohort (Tables 2-4).

Comparison of the ML PHLF model and other noninvasive models
We further compared the diagnostic performance of the ML PHLF model with that of routine clinical models, such as ALBI, FIB-4, APRI, MELD, and CTP. The ML PHLF model had the highest AUC for the prediction of PHLF among the noninvasive models (Figure 3). The AUCs for the ALBI, FIB-4, APRI, MELD, and CTP scores were 0.570, 0.595, 0.568, 0.512, and 0.512, respectively, in the training cohort (Figure 3A); 0.615, 0.632, 0.554, 0.539, and 0.500, respectively, in the validation cohort (Figure 3B); and 0.703, 0.619, 0.613, 0.574, and 0.549, respectively, in the testing cohort (Figure 3C). The diagnostic performances of routine clinical models in the training, validation, and testing cohorts are summarized in Tables 2, 3, and 4, respectively. In the decision curve analysis, the ML PHLF model added more value than did the FIB-4, APRI, ALBI, MELD, or CTP score for predicting PHLF in the training cohort (Figure 4A). The results were similar in the validation and testing cohorts (Figure 4B and Figure 4C), indicating that the novel ML PHLF model was more reliable than the traditional models.
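Decision curve comparisons of this kind rest on the net benefit of each model at a given threshold probability, conventionally defined as TP/n − FP/n × pt/(1 − pt). The sketch below applies that standard definition to toy data; the function names and the data are hypothetical and do not reproduce the study's curves.

```python
def net_benefit(y_true, scores, pt):
    """Net benefit of treating patients whose predicted risk is >= pt."""
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, scores) if s >= pt and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= pt and y == 0)
    return tp / n - fp / n * pt / (1.0 - pt)


def net_benefit_treat_all(y_true, pt):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * pt / (1.0 - pt)


# Toy labels and predicted probabilities, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
for pt in (0.2, 0.5):
    print(pt,
          round(net_benefit(y_true, scores, pt), 3),
          round(net_benefit_treat_all(y_true, pt), 3))
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the treat-all reference and the treat-none reference, which is zero by definition.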

Online calculator application
The ML model is composed of 29 decision trees based on the LightGBM algorithm. Owing to the large number of trees and the complex structure of each tree, only the first three and last two decision trees are shown in Supplementary Figure S1. To increase the clinical utility, a web calculator application based on the ML PHLF model was developed (Supplementary Figure S2).
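A calculator built on a boosted tree ensemble typically routes the patient through every tree, sums the leaf outputs, and converts the sum to a probability with the logistic function. The sketch below illustrates that mechanism under stated assumptions: the two trees, their split thresholds, and their leaf values are invented for the example and are not the 29 trees of the ML PHLF model.

```python
import math

# Each tree is a nested dict of splits; a bare float is a leaf output.
# All features, thresholds, and leaf values below are hypothetical.
TREES = [
    {"feature": "plt", "threshold": 100.0, "left": 0.4, "right": -0.3},
    {"feature": "age", "threshold": 60.0, "left": -0.2,
     "right": {"feature": "inr", "threshold": 1.2, "left": 0.1, "right": 0.3}},
]


def tree_output(node, patient):
    """Follow the splits until a leaf value is reached."""
    while isinstance(node, dict):
        go_left = patient[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node


def predict_probability(trees, patient):
    """Sum the leaf outputs of all trees and apply the logistic function."""
    raw_score = sum(tree_output(tree, patient) for tree in trees)
    return 1.0 / (1.0 + math.exp(-raw_score))


patient = {"plt": 80.0, "age": 70.0, "inr": 1.3}  # hypothetical input
print(round(predict_probability(TREES, patient), 3))  # 0.668
```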

Discussion
In this study, a novel ML PHLF model for predicting the risk of PHLF was developed based on the LightGBM algorithm using the data of 875 patients with HCC who underwent liver resection. This novel model exhibited the highest AUC among the existing noninvasive prediction models and showed good clinical utility in the decision curve analysis in the training, validation, and testing cohorts in this study. This predictive model for PHLF may be effective in optimizing personalized treatment options for patients with HCC, allowing for early identification of patients with HCC at high risk for PHLF.
Radical liver resection remains the first-choice treatment for HCC. With the update of new surgical techniques, optimization of innovative surgical instruments, and advancement of surgical intensive care medicine, the safety of liver resection has significantly improved. Therefore, perioperative mortality after hepatectomy has decreased (27). However, PHLF remains a serious complication for patients with HCC after hepatectomy. Accurate identification of patients with a high risk of PHLF is critical. Therefore, development of a predictive model for PHLF is crucial for clinical decision making.
Previously reported, noninvasive models for the prediction of PHLF are mainly based on laboratory indicators. The FIB-4, APRI, CTP score, MELD, and ALBI are widely used scoring systems for the evaluation of liver function and have been confirmed to predict the occurrence of PHLF (28)(29)(30)(31)(32). However, the predictive efficacy of these traditional noninvasive models that are based on simple laboratory indicators is relatively poor, whereas AI-based combined models using multiple clinical parameters have greater predictive potential.
ML is a field of AI that uses data-driven mining of complex datasets to predict future outcomes (33)(34)(35). The use of various ML algorithms to perform disease risk prediction has become a research hotspot in the field of medical big data. Various complex algorithms can be used to deeply mine the relationships between disease variables. The ML model has two advantages over other models, including the use of nonlinear functions and the consideration of the possible effects between all variables. ML algorithms have been increasingly applied to pertinent issues in the field of liver surgery (16). Mai et al. (36) developed an artificial neural network-based model to predict the risk of PHLF in patients with HCC undergoing partial hepatectomy. The predictive performance of the model exceeded that of traditional logistic regression models and commonly used scoring systems. However, no research regarding ML models developed based on LightGBM that predict PHLF has been reported.

Twenty-five clinically meaningful variables were used to develop the ML PHLF model according to the SHAP analysis in this study. Specifically, both the importance matrix plot and the SHAP results indicate that PLT count, age, Cr, INR, and AFP are the five most important contributors to the final model. The preoperative PLT count was identified as the most important factor. A meta-analysis of 13 studies (37) assessed the effects of
FIGURE 3 ROC curves. The ROC curves of the FIB-4 score, APRI score, CTP score, MELD score, and ALBI score are compared with that of the ML model in the training (A), validation (B), and testing (C) cohorts. ROC, receiver operating characteristic; ML, machine learning; FIB-4, fibrosis-4; APRI, aminotransferase-to-platelet ratio index; CTP, Child-Turcotte-Pugh; MELD, model for end-stage liver disease; ALBI, albumin-bilirubin.
This is the first multicenter study to explore the development and validation of a LightGBM-based model for the prediction of PHLF in patients with HCC. The ML PHLF model is based on routine clinical parameters obtained in patients with HCC. With the advantages of convenient data collection, availability, and objectiveness, the novel model is suitable for the prediction of PHLF in most clinical situations, showing good interpretability and consistency with clinical experience and demonstrating good reliability.
However, this study has some limitations. First, selection bias was unavoidable; however, this bias was minimized via the multicenter design. Second, as a black-box model, the ML PHLF model has limited interpretability and is prone to overfitting. Therefore, more interpretable ML algorithms will be assessed in follow-up studies. Last, the novel ML model predicts the overall risk of PHLF as defined by the ISGLS criteria. Prospective multicenter studies are required to determine the predictive value of ML PHLF models in CTP class B and C subgroups and under other PHLF diagnostic criteria, such as the 50-50 criteria.

Conclusion
In conclusion, an ML PHLF model using common clinical parameters was constructed and validated based on the LightGBM algorithm. Compared to other noninvasive models, this novel model has the best PHLF-predictive ability. This model can be used to help accurately predict the risk of PHLF, screen high-risk PHLF subgroups, and help surgeons determine personalized treatment options.
FIGURE 4 DCA curves. The DCA curves of the FIB-4 score, APRI score, CTP score, MELD score, and ALBI score are compared with that of the ML model in the training (A), validation (B), and testing (C) cohorts. DCA, decision curve analysis; ML, machine learning; FIB-4, fibrosis-4; APRI, aminotransferase-to-platelet ratio index; CTP, Child-Turcotte-Pugh; MELD, model for end-stage liver disease; ALBI, albumin-bilirubin.

Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement
The studies involving human participants were reviewed and approved by the Ethics Committee of Xingtai People's Hospital of Hebei Province. The patients/participants provided their written informed consent to participate in this study.