A Novel Prognostic Scoring System of Intrahepatic Cholangiocarcinoma With Machine Learning Basing on Real-World Data

Background and Objectives: Currently, the prognostic performance of the staging systems proposed by the 8th edition of the American Joint Committee on Cancer (AJCC 8th) and the Liver Cancer Study Group of Japan (LCSGJ) in resectable intrahepatic cholangiocarcinoma (ICC) remains controversial. The aim of this study was to use machine learning techniques to modify existing ICC staging strategies based on clinical data and to demonstrate the accuracy and discrimination capacity in prognostic prediction.
Patients and Methods: This is a retrospective study based on 1,390 patients who underwent surgical resection for ICC at Eastern Hepatobiliary Surgery Hospital from 2007 to 2015. External validation was performed for patients from 2015 to 2017. An ensemble of three machine learning algorithms was used to select the most important prognostic factors, and stepwise Cox regression was employed to derive a modified scoring system. The discriminative ability and predictive accuracy were assessed using the Concordance Index (C-index) and Brier Score (BS). The results were externally validated in a cohort of 42 patients operated on at the same institution.
Results: Six independent prognostic factors were selected and incorporated in the modified scoring system: carcinoembryonic antigen, carbohydrate antigen 19-9, alpha-fetoprotein, prealbumin, and the T and N categories of the AJCC 8th edition ICC staging. The proposed scoring system showed more favorable discriminatory ability and model performance than the AJCC 8th and LCSGJ staging systems, with a higher C-index of 0.693 (95% CI, 0.663-0.723) in the internal validation cohort and 0.671 (95% CI, 0.602-0.740) in the external validation cohort, which was confirmed by a lower BS (0.103 in the internal validation cohort and 0.169 in the external validation cohort). Moreover, machine learning techniques for variable selection combined with stepwise Cox regression for survival analysis showed better prognostic accuracy than stepwise Cox regression alone.
Conclusions: This study put forward a modified ICC scoring system, based on prognostic factor selection incorporating machine learning, for individualized prognosis evaluation in patients with ICC.

INTRODUCTION
Intrahepatic cholangiocarcinoma (ICC) is a malignant neoplasm originating from the epithelial cells of bile ducts located above the secondary bile duct branch (1). It is the second most common primary malignancy of the liver, and its incidence has been increasing in recent years (2)(3)(4). Surgical resection is the main potentially curative treatment for ICC; the 5-year overall survival (OS) rate after hepatectomy and lymphadenectomy is 15 to 35% (5-9). Appropriate staging describes the severity and extent of tumor involvement and thus helps clinicians understand the prognosis of the disease. At present, the eighth edition of the American Joint Committee on Cancer (AJCC 8th) staging system and the Liver Cancer Study Group of Japan (LCSGJ) staging system are widely used in clinical practice (10)(11)(12)(13). Although studies have demonstrated that the modified AJCC staging system improves stratifying ability, it remains controversial (14,15). The LCSGJ staging system focuses on hepatocellular carcinoma (HCC), which differs distinctly from ICC in biological behavior and postoperative outcomes (16). Some new stratification strategies have begun to incorporate readily available clinical parameters, such as carbohydrate antigen 19-9 (CA19-9), alkaline phosphatase (ALP) and alpha-fetoprotein (AFP) (17)(18)(19). To use these clinical parameters more effectively, rather than relying on surgical-pathological factors alone, we applied robust machine learning methods to analyze the high-dimensional data from clinical practice.
The selection of variables involved in outcome estimation is also important for staging performance. In similar studies, multivariate Cox regression has commonly been used to identify independent prognostic factors for survival, for example in the ICC prognostic staging system of Zhou et al. (19), the modified staging system for mass-forming ICC (16), the Fudan score (17), and nomogram-based prediction strategies (18). In the present study, we attempted to improve on conventional survival analysis by combining it with machine learning algorithms for variable selection, because in real-world studies variables are not always independent of each other and are often related in a non-linear way. Commonly used multivariate analysis methods and linear models cannot capture such complex relationships between variables, whereas machine learning methods are well suited to this task. Specifically, we used decision tree-based ensemble methods, i.e., eXtreme Gradient Boosting (XGBoost), random forest (RF), and gradient boosted decision tree (GBDT). These three methods split and re-aggregate the variables to minimize prediction error when growing sub-trees, so that non-linear relationships between variables can be captured. In addition, all of them can learn directly from data with missing values, which suits real-world data. To confirm their effectiveness, we compared three variable selection approaches, and our proposed approach outperformed the others. Moreover, our study also incorporated the prognostic factors used for TNM staging as an improvement on the traditional strategy.
The objective of the current study was to integrate pathological factors and clinical parameters to construct a useful and personalized scoring system with machine learning methods that can accurately predict the survival outcomes of ICC patients undergoing surgical resection.

Patients Cohort
The cohort comprised 1,390 pathologically confirmed ICC patients who underwent hepatectomy between January 2007 and October 2015 at the Eastern Hepatobiliary Surgery Hospital (EHBH) in Shanghai, China, a high-volume medical center. Data collection was cut off in November 2018. Patients diagnosed with perihilar (Klatskin) tumors or with tumors mixed with hepatocellular carcinoma were excluded. All deaths were confirmed to have occurred after ICC recurrence to avoid the interference of competing mortality. The data collection and tumor staging processes were supervised and examined by two pathologists. The patients in the external validation cohort (n=42, January 2016 to June 2017) were screened with the same criteria as the internal cohort; data collection for this cohort was cut off in June 2020. Variable characteristics of the training cohort and external validation cohort are summarized in Supplemental Table and Supplementary Data of Entire Cohort. The protocol of this study was approved by the Ethics Committee of the EHBH, and informed consent was waived in the ethical approval documents.
We collected data on 27 clinical independent variables, including basic clinical information (age, gender, jaundice, history of stone, history of tumor, and smoking), laboratory results [blood type, hepatitis B virus (HBV), CA19-9, γ-glutamyltranspeptidase (γ-GT), albumin (Alb), alanine aminotransferase (ALT), ALP, prealbumin (PA), aspartate aminotransferase (AST), carcinoembryonic antigen (CEA), AFP, direct bilirubin (DBIL), and total bilirubin (TBIL)], and perioperative data (T/N/M or TNM stage in AJCC 8th, T or TNM stage in LCSGJ, resection type, and tumor size). All laboratory examinations were performed within 1 week before resection or intervention. To make the data suitable for machine learning, all relevant variables were cleaned and converted into numerical codes.

Study Design
The aim of this research was to construct a simpler and more accurate ICC scoring system for predicting prognosis after resection based on clinical factors and stages. The end point of the study was overall survival at 3 years after resection. The initial cohort included a broad range of variable types, and variable selection was implemented via three machine learning methods, i.e., XGBoost, RF, and GBDT. Each algorithm calculated the contribution of each independent variable to the target variable, yielding an importance score (IS). The variables with the highest IS in all three algorithms (the intersection variables) were retained for further analysis.
Cox proportional hazard models with backward stepwise regression were used to evaluate the impact of the intersection variables on survival, and the prognostic scoring equation was obtained. The predictive accuracy and discrimination ability of the models were then compared. In addition, to validate the advantages of the research methods, we compared survival predictions with and without machine learning screening. Since the data collection and research were carried out at the Eastern Hepatobiliary Surgery Hospital (Shanghai, China), the proposed scoring strategy is referred to as EHBH-ICC hereafter. The overall study process is illustrated in Figure 1.
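As a minimal sketch of how a Cox model yields a scoring equation: the score is the linear predictor (the sum of each coefficient times the patient's variable value), and the hazard ratio of a variable is the exponential of its coefficient. The coefficients, variable coding, and function names below are hypothetical placeholders chosen for illustration; they are not the EHBH-ICC coefficients, which are not reproduced in this text.

```python
import math

def hazard_ratio(beta):
    """Hazard ratio for a one-unit increase in a variable with Cox coefficient beta."""
    return math.exp(beta)

def prognostic_score(coefficients, patient):
    """Linear predictor of a Cox model: sum of coefficient * variable value."""
    return sum(beta * patient[name] for name, beta in coefficients.items())

# HYPOTHETICAL coefficients and coding, for illustration only (not from the study).
coefficients = {"T": 0.4, "N": 0.7, "PA": -0.5}
patient = {"T": 2, "N": 1, "PA": 1}

score = prognostic_score(coefficients, patient)
# Hazard relative to a reference patient whose variables are all coded 0.
relative_hazard = math.exp(score)
```

A negative coefficient, as for the placeholder PA entry, lowers the score, so higher values of that variable correspond to lower predicted risk.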

Tumor, Node, Metastasis Stage
The 8th edition of AJCC and the LCSGJ staging manual in patients who underwent operations were adopted as baseline models for performance comparison (1,20).

Machine Learning
For machine learning modeling, we chose XGBoost, RF, and GBDT for variable selection; these methods can handle missing values under certain assumptions and do not require data imputation. Since our data were derived from real-world settings and contained a small number of missing values, machine learning methods able to learn from incomplete data were necessary. We ran the three algorithms using Scikit-learn, a machine learning framework (https://www.scikit-learn.org/stable/), in Python 3.6.8. To achieve their best performance, the AutoML method (https://github.com/ClimbsRocks/auto_ml) was adopted to select their hyperparameters automatically.

Statistical and Survival Analysis
Descriptive statistics are reported as number (%) or median (interquartile range, IQR). The Mann-Whitney U test and the chi-square test were used for continuous and categorical variables, respectively, and p<0.05 was considered statistically significant. Relevant prognostic predictors were evaluated by the Cox proportional hazard model using backward stepwise regression (Wald test, with p<0.05 representing a significant difference). To ensure comparability of the training and internal validation cohorts, patients were randomly allocated to the two cohorts in a ratio of 8:2. To estimate the influence of prognostic factors, the hazard ratio (HR) was calculated. Kaplan-Meier analysis was used for survival analysis, and the log-rank test was adopted to compare differences between groups. The Concordance Index (C-index) and Brier Score (BS) were used to evaluate the discrimination ability and predictive performance of the staging methods. A higher C-index indicates better discrimination ability. BS is a measure of model calibration, i.e., the mean squared difference between the predicted probability and the actual outcome; a lower BS indicates higher prediction accuracy. Statistical analysis and modeling were performed using Python (version 3.6.8) and RStudio (version 1.1.463).
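The two evaluation metrics described above can be sketched in plain Python. This is an illustrative simplification, not the study's implementation: the C-index follows Harrell's pairwise definition, and the Brier score is computed at a single time horizon assuming every patient's status at that horizon is known (no censoring weights). All function names and toy values are assumptions made for the example.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable pairs in which the patient
    with the higher risk score has the shorter observed survival."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i has an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

def brier_score(predicted_death_prob, died_by_horizon):
    """Mean squared difference between predicted probability and outcome (0/1)."""
    pairs = list(zip(predicted_death_prob, died_by_horizon))
    return sum((p - y) ** 2 for p, y in pairs) / len(pairs)

# Toy data (hypothetical): survival in months, event flag (1 = death), risk score.
c_index = concordance_index([5, 12, 20, 30], [1, 1, 0, 1], [0.9, 0.4, 0.6, 0.1])
bs = brier_score([0.9, 0.7, 0.2, 0.4], [1, 1, 0, 0])
```

In the toy data, four of the five comparable pairs are ordered correctly, giving a C-index of 0.8; a value of 0.5 would correspond to random ordering.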

Clinicopathologic Characteristics of Patients
A total of 1,390 patients underwent surgical resection for ICC during the study period. Twenty-seven variables from the entire primary cohort were collated and input into the models; they covered patients' demographic information, medical history, tumor information, and examination results, and are reported in Table 1. The median survival time was 15.5 months (IQR 7.7 to 27.7 months). Of all ICC patients in this study, 560 (40.3%) survived less than 1 year, 576 (41.4%) died between 1 and 3 years after surgery, and 254 (18.2%) died after 3 years. There were 939 females (67.6%) and 451 males (32.4%) enrolled in the study, a male-to-female ratio of 1:2.1. Among the study population, 316 patients
The nodal and metastasis categories were similar between the two staging systems, so we counted them together. Because only one patient was diagnosed with T1b, that is, a tumor larger than 5 cm without vascular invasion, T1a and T1b tumors were combined in the following analysis.

Selection and Comparison of Prognostic Factors
The IS of the variables most relevant to 3-year OS were calculated by XGBoost, RF, and GBDT, and the top 20 variables from each algorithm are listed in Table 2. We then took the intersection of these variable sets; the 15 retained variables were ALP, γ-GT, N, T, Alb, tumor size, AST, DBIL, TBIL, PA, ALT, AFP, CEA, CA19-9, and age. The IS of the AJCC 8th T stage was higher than that of the LCSGJ T stage; therefore T (AJCC 8th) was adopted and is referred to simply as "T" in the following analysis. The variables screened by machine learning were then used to develop the Cox proportional hazard regression model.
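The intersection step described above can be sketched as follows: rank the variables of each model by importance score, keep the top k per model, and retain only the variables present in every top-k set. The importance values, the value of k, and the function names are hypothetical and chosen only to keep the example small; the study used the top 20 variables per model.

```python
def top_k(importance, k):
    """Names of the k variables with the largest importance score."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    return set(ranked[:k])

def intersect_top_k(importance_by_model, k):
    """Variables that appear in the top k of every model."""
    sets = [top_k(scores, k) for scores in importance_by_model.values()]
    return set.intersection(*sets)

# HYPOTHETICAL importance scores for three tree-based models (toy values).
importance_by_model = {
    "xgboost": {"CA19-9": 0.30, "CEA": 0.25, "age": 0.20, "ALT": 0.15, "smoking": 0.10},
    "rf": {"CEA": 0.35, "CA19-9": 0.25, "ALT": 0.20, "smoking": 0.12, "age": 0.08},
    "gbdt": {"CA19-9": 0.40, "age": 0.22, "CEA": 0.18, "smoking": 0.11, "ALT": 0.09},
}

selected = intersect_top_k(importance_by_model, k=3)
```

Requiring agreement across all three models is a conservative choice: a variable ranked highly by only one algorithm is dropped before the Cox regression step.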

Variable Selection Methods Comparison
Cox regression models with stepwise selection are commonly used in similar studies to select variables significantly associated with prognosis after ICC resection. To verify whether variable selection incorporating machine learning algorithms improves model accuracy, we compared three approaches: the Cox proportional hazards model with backward stepwise regression only (SR), machine learning only (ML), and the combination of both methods (SR+ML) (Figure 2). Survival prediction models were established for each approach, and their C-index (Figure 2A) and BS (Figure 2B) were obtained. The results demonstrated that SR+ML (C-index, 0.693; BS, 0.115) performed better over most of the survival time than ML only or SR only. Machine learning therefore captured the prognostic predictors of postoperative outcome more accurately during variable processing, consequently improving the prediction performance of the model. The factors selected by SR only were sex, age, history of stone, smoking habit, HBV, T, N, M, CA19-9, PA, CEA, DBIL, TBIL, excision, and blood type A. The variable screening results of SR via Cox analysis are summarized in Supplemental Table 2.

Comparison of Predictive Accuracy for Overall Survival in Eastern Hepatobiliary Surgery Hospital-Intrahepatic Cholangiocarcinoma, American Joint Committee on Cancer 8th and the Liver Cancer Study Group of Japan Staging System
Further, we compared the EHBH-ICC staging system with the AJCC 8th and LCSGJ staging systems. Since time-to-mortality and time-to-event were crucial to interpreting the results, Figures 4A-C depict the Kaplan-Meier curves of the three staging systems. All three systems showed a progressive decrease in OS during the study period.
The log-rank test gave p<0.001 for all three staging methods. The EHBH-ICC score model showed better discrimination ability and prediction performance than the AJCC 8th and LCSGJ staging systems, with a higher C-index in both the internal validation cohort (0.693; 95% CI, 0.663-0.723) and the external validation cohort (0.671; 95% CI, 0.602-0.740), which was confirmed by a lower BS (0.103 in the internal validation cohort and 0.169 in the external validation cohort). Detailed C-index and BS results are presented in Table 5 and Figures 4D, E. These model evaluation results show that the EHBH-ICC score was the most precise in predicting survival after resection in this study.
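The Kaplan-Meier curves used in this comparison rest on the product-limit estimator, sketched below in plain Python. This is a generic illustration with hypothetical toy data, not the code used to produce Figures 4A-C; the function name and input layout are assumptions.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times  : follow-up time per patient
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after time t.
        at_risk -= removed
    return curve

# Toy data (hypothetical): months of follow-up and event flags.
curve = kaplan_meier([6, 6, 10, 15, 20], [1, 0, 1, 1, 0])
```

Computing one such curve per stage or risk grade and comparing them with a log-rank test is the usual way to check whether a staging system separates patients with different outcomes.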

DISCUSSION
ICC is the second most common primary hepatic malignancy after HCC, with increasing incidence and mortality worldwide (21,22). Hepatectomy is considered the mainstay of curative treatment for ICC (23). Accurate tumor staging provides prognostic detail, allows appropriate evaluation of risk, and assists the choice of adjuvant therapy.
At present, the most commonly used staging systems for ICC are TNM classification systems, among which the AJCC 8th and LCSGJ are widely accepted. Despite the continued efforts of the AJCC to improve the prognostic staging of ICC, there is still evidence that it is inadequate. T1b in the AJCC system, a single lesion larger than 5 cm without vascular invasion, is rare in clinical practice. Some recent studies indicated that stage II and stage IIIA in the AJCC edition failed to show significant prognostic differentiation for ICC patients. Survival of patients with intrahepatic metastases was sometimes shorter than that of patients with tumors penetrating the serosa, although the former were classified only as T2. Some recent studies assessed the prognostic performance of the 7th and 8th editions of the AJCC staging system and found no remarkable improvement in overall prognostic discrimination, especially in the staging of the T3 category (14,24,25). The LCSGJ, in turn, focuses on HCC, which differs distinctly from ICC in biological behavior and postoperative outcomes. Some modified staging systems for resectable ICC retained the prognostic factors of the TNM classification or combined the two systems as one of the predictors (19,26). In our investigation, we analyzed the diagnoses of both staging systems as separate independent variables. We hypothesized that pathological factors are important prognostic factors for postoperative ICC patients but are only partially relevant. Our study was based on multidimensional real-world clinical data from a relatively large population, which allowed us to look for factors affecting postoperative survival of ICC patients from a wider perspective.
We derived 15 important factors from the three algorithms concurrently (Table 2), and further identified the T (AJCC 8th) and N classifications, CEA, CA19-9, AFP, and PA as the prognostic predictive factors. Multiple potential tumor biomarkers have been used in evaluating the prognosis of ICC (27)(28)(29). Many studies have constructed new assessment systems that use diagnostic biomarkers such as CA19-9, AFP, CEA, ALP, and PA to predict patient survival (17,19,30). These factors were confirmed by our results and were included in the outcome scoring of ICC patients. Serum CA 19-9 and CEA are the most investigated markers in the prognosis of ICC (17,18,31). Jaklitsch et al. showed that the inclusion of preoperative CA 19-9 and CEA in the AJCC and LCSGJ staging systems improved survival prediction after resection for ICC (32). Serum AFP is a widely used tumor marker of HCC (33), and positive serum AFP (>20 ng/ml) is seen in approximately 19% of ICC patients (34). Zhou et al. showed that the lymph node metastasis rate was low in ICC patients with positive AFP (35). PA, which is produced by the liver, is commonly regarded as a sensitive marker of nutritional status. A study reported that patients with lower PA have poorer outcomes in ICC (19), which is consistent with our result that PA level is negatively associated with the score. Compared with pathological factors, clinical parameters are easier to obtain and can also provide a valuable reference. In our EHBH-ICC scoring system, the T and N diagnoses and the laboratory results can be substituted directly into the calculation to obtain the corresponding risk score.
To our knowledge, our report is the first ICC staging method developed based on machine learning models. In recent years, machine learning-based methods have been widely used in diagnosis, treatment, and outcome prediction, for example in prostate cancer (36), renal cancer (37), non-small cell lung cancer (38), and cardiovascular event prediction (39). Compared with traditional statistics, machine learning can deal with different data types even if the data are incomplete or inconsistent. Many studies have demonstrated the advantages of machine learning algorithms over traditional statistical methods (40). According to the EHBH-ICC scoring system, patients are divided into four survival risk grades (low to extremely high). This is a scoring approach to predict the outcome of resectable ICC in a Chinese population. Another scoring approach, the Fudan scoring system, was developed from only 344 patients with multivariate Cox regression. Compared with the Fudan scoring system, the EHBH-ICC uses different calculation methods and key prognostic factors. What the two systems share is the discovery and application of the prognostic value of readily available clinical parameters. We validated discrimination ability and performance using the C-index and BS. The EHBH-ICC scoring system (C-index, 0.693; BS, 0.103) predicted the prognosis of ICC patients more accurately than the AJCC 8th and LCSGJ systems (Figures 4D, E).
In our study, the diversity of patients' tumors was well reflected. As the sample size continues to increase, the evaluation system can be further optimized to predict patient prognosis more accurately and to support treatment decisions. With machine learning, we can not only obtain the contribution of each risk factor to patient prognosis, but also predict how prognosis changes as the score increases.
However, there are limitations in our study. It is a retrospective study conducted at a single center; including more medical centers and samples could help optimize our evaluation system and address this limitation. In conclusion, the EHBH-ICC scoring system shows good predictive ability for ICC patients who underwent surgical resection, as evaluated by comparison with existing staging systems (the AJCC 8th and LCSGJ). The machine learning-based EHBH-ICC scoring system can effectively evaluate ICC prognosis after resection and may be used in clinical practice.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding authors.

ETHICS STATEMENT
The protocol of this study has been approved by the Ethics Committee of the Eastern Hepatobiliary Surgery Hospital, and the informed consent has been exempted in the Ethical approval documents.

AUTHOR CONTRIBUTIONS
ZL and LY conceptualized the study. JS and ZW contributed to the methodology. JS and CZ conducted the formal analysis and investigation. ZL, YW, and XH wrote and prepared the original draft. FG and XJ provided the resources and supervised the study. All authors contributed to the article and approved the submitted version.