A Machine Learning-Based Model to Predict Survival After Transarterial Chemoembolization for BCLC Stage B Hepatocellular Carcinoma

Objective: We sought to develop and validate a novel prognostic model for predicting survival of patients with Barcelona Clinic Liver Cancer (BCLC) stage B hepatocellular carcinoma (HCC) using a machine learning approach based on random survival forests (RSF). Methods: We retrospectively analyzed overall survival of patients with BCLC stage B HCC in training (n = 602), internal validation (n = 301), and external validation (n = 343) groups. We extracted twenty-one clinical and biochemical parameters using established preprocessing strategies, then adopted the RSF classifier for variable selection and model development. We evaluated model performance using the concordance index (c-index) and the area under the receiver operating characteristic curve (AUROC). Results: RSF identified five parameters, namely tumor size, BCLC-B sub-classification, alpha-fetoprotein (AFP) level, albumin (ALB) level, and number of lesions, as strong predictors of survival, and these were used for model development. The established model had a c-index of 0.69, and its AUROC for predicting survival at one, two, and three years reached 0.72, 0.71, and 0.73, respectively. Additionally, the model outperformed eight other Cox proportional-hazards models and performed well in the subgroup of patients in BCLC-B sub-classification stages B I and B II. Conclusion: The RSF-based model established herein can effectively predict survival of patients with BCLC stage B HCC, with better performance than previous Cox proportional-hazards models.


INTRODUCTION
Hepatocellular carcinoma (HCC) is the second leading cause of cancer-related deaths in the world (1-3). Its prognosis remains poor, owing to a relatively high proportion of unresectable disease at the time of diagnosis, even though the Barcelona Clinic Liver Cancer (BCLC) staging system, endorsed by the European Association for the Study of the Liver (EASL) and the American Association for the Study of Liver Diseases (AASLD), has been extensively used in clinical practice (4). Patients with BCLC stage B disease are considered unsuitable for curative treatment, and their overall survival varies widely, mainly because of heterogeneity in liver function and tumor burden (5). Consequently, several subclassification systems and risk prediction models for patients with BCLC stage B HCC have been proposed.
The subclassification system proposed in 2012 categorized patients with intermediate HCC into four substages, namely B1 to B4 (6). The following year, Kadalayil et al. developed a simple prognostic score, the HAP score, based on albumin, bilirubin, α-fetoprotein (AFP), and tumor size (7). More recently, inflammation biomarkers have been shown to be prognostic predictors in cancer patients, and Chon et al. developed and validated a nomogram that includes the neutrophil-to-lymphocyte ratio for predicting survival of patients with intermediate HCC (8). Despite these advancements, all of the aforementioned models were based on the traditional Cox proportional-hazards approach.
Although several prognostic models have been established, no tool exists that can effectively estimate survival outcomes after transarterial chemoembolization (TACE) for BCLC stage B HCC. Previous studies have reported the potential of machine learning algorithms for developing effective models to predict risk factors associated with survival outcomes (9). In particular, this approach can reveal patterns and hidden relationships between factors that could be missed when traditional biostatistical methods are used (10,11). Among known machine learning classifiers, the random forest classifier offers excellent modeling performance and has been extended to handle right-censored survival data. The resulting random survival forest (RSF) is a non-parametric classifier that provides variable importance values for all candidate predictors (12). In the present work, we evaluated whether RSF could predict survival outcomes of patients with BCLC stage B HCC. Additionally, we assessed the importance and predictive value of clinical variables for prognosis and compared the RSF-derived results with those obtained using previously published Cox proportional-hazards models.

Study Population and Selection Criteria
We retrospectively enrolled 979 consecutive patients with BCLC stage B HCC from a database (13), covering the period between January 2007 and December 2016. The inclusion criteria were: (1) adult patients diagnosed with HCC according to the AASLD guidelines; (2) patients with liver function of Child-Pugh class A or B; (3) patients with an Eastern Cooperative Oncology Group (ECOG) performance status of 0; (4) patients with multiple tumors and no vascular invasion or lymphatic/extrahepatic metastasis; and (5) patients who had complete follow-up by magnetic resonance imaging or computed tomography and routine biochemical tests. The exclusion criteria were: (1) patients with a history of malignancies other than HCC; (2) those with recurrent HCC or HCC with vascular invasion or lymphatic/extrahepatic metastasis; (3) patients with liver function of Child-Pugh class C; (4) those with hepatic encephalopathy, refractory ascites, or gastrointestinal hemorrhage; (5) patients with immunodeficiency or autoimmune disease; and (6) those whose follow-up duration was less than three months. Patients were divided into training and internal validation groups at a ratio of two to one. An independent cohort comprising 414 patients from the same database was used for external validation; all patients in this cohort came from hospitals other than those contributing to the primary cohort.
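The two-to-one division described above can be illustrated with a minimal Python sketch. The text does not state whether the split was purely random or stratified, so simple random shuffling with a fixed seed is an assumption here, and the function name `split_two_to_one` is hypothetical. The count of 903 is the number of patients reported later as meeting the inclusion criteria.

```python
import random

def split_two_to_one(patient_ids, seed=0):
    """Randomly split IDs into training and validation sets at a 2:1 ratio.

    Assumption: simple random shuffling; the paper does not describe
    the exact allocation procedure.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)      # fixed seed keeps the split reproducible
    rng.shuffle(ids)
    n_train = (2 * len(ids)) // 3  # two thirds go to the training group
    return ids[:n_train], ids[n_train:]

train_ids, valid_ids = split_two_to_one(range(903))
print(len(train_ids), len(valid_ids))  # 602 301
```

With 903 eligible patients this reproduces the reported group sizes of 602 and 301.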

Establishment of the Prognostic Model
We collected demographic and biochemical parameters from all patients for analysis. These included age, gender, virus infection status, hemoglobin level, white blood cell count, platelet count (PLT), aspartate aminotransferase (AST), albumin, total bilirubin, C-reactive protein (CRP), prothrombin time (PT), ascites, alpha-fetoprotein, tumor number and size, tumor vascular invasion, distant or lymph node metastasis, and performance status score. We evaluated the Child-Pugh grade using laboratory data (albumin, PT, and total bilirubin) together with clinical data on hepatic encephalopathy and ascites. Ascites was defined as radiologically detected ascites. The AST to platelet ratio index (APRI) was calculated using the following formula: APRI = ([AST / upper limit of normal] / platelet count [10⁹/L]) × 100. The ALBI score was calculated as follows: linear predictor = (log10 bilirubin × 0.66) + (albumin × −0.085), where bilirubin is in μmol/L and albumin in g/L. The BCLC-B sub-classification was as previously described by Bolondi et al. (6). Overall survival was the primary outcome and was defined as the time from HCC diagnosis to last follow-up. Patients were followed up monthly during the period of initial treatment, then every 2 to 3 months for the first 2 years if complete remission was achieved. The frequency of follow-up gradually decreased to every 3 to 6 months after 2 years of remission. Overall survival was estimated using the Kaplan-Meier method, with the log-rank test used to compare survival curves.
Thereafter, we selected prognostic factors using the RSF classifier, with permutation-based selection conducted using the variable importance (VIMP) metric of the RSF. For VIMP, the values of a predictor variable were randomly permuted, and the difference in prediction error between the observed and the permuted variable was calculated as previously described (14,15). Summarily, a high VIMP suggests that misspecifying the variable worsens predictive accuracy in the forest, a VIMP near zero suggests that the variable contributes little, and a negative VIMP suggests that noise is more informative than the observed variable. The five risk factors with the highest VIMP were chosen for model development by the RSF classifier. We validated the selected variables using the minimal depth and their selection frequency from the 10-fold cross-validation.
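As a worked illustration of the two formulas above, the following Python sketch computes APRI and the ALBI linear predictor for a single hypothetical patient. The function and parameter names are ours, and the example values are invented for illustration, not taken from the study data.

```python
import math

def apri(ast_u_per_l, ast_upper_limit_u_per_l, platelets_1e9_per_l):
    """AST to platelet ratio index: ([AST / ULN] / platelets [10^9/L]) x 100."""
    return (ast_u_per_l / ast_upper_limit_u_per_l) / platelets_1e9_per_l * 100

def albi_score(bilirubin_umol_per_l, albumin_g_per_l):
    """ALBI linear predictor: 0.66 x log10(bilirubin) - 0.085 x albumin."""
    return 0.66 * math.log10(bilirubin_umol_per_l) - 0.085 * albumin_g_per_l

# Hypothetical patient: AST 80 U/L (upper limit 40 U/L), platelets 100 x 10^9/L,
# bilirubin 10 umol/L, albumin 40 g/L.
example_apri = apri(80, 40, 100)
example_albi = albi_score(10, 40)
print(round(example_apri, 2), round(example_albi, 2))  # 2.0 -2.74
```

Lower (more negative) ALBI values correspond to better liver function, which is why albumin enters the predictor with a negative coefficient.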
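The permutation idea behind VIMP can be sketched in a few lines of Python. This is a simplified illustration, not the forest implementation used in the study: it scores an arbitrary, hypothetical risk function (`toy_risk`) with Harrell's c-index on toy data, whereas RSF computes VIMP on the out-of-bag samples of each tree. All names and data below are assumptions made for the example.

```python
import random

def concordance_index(times, events, risk):
    """Harrell's c-index: share of usable pairs in which the patient with the
    earlier observed death also has the higher predicted risk (ties count 0.5)."""
    concordant, usable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when patient i died strictly before time j.
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable

def permutation_vimp(predict_risk, rows, times, events, column, seed=0):
    """Increase in prediction error (1 - c-index) after shuffling one column."""
    rng = random.Random(seed)
    base_error = 1 - concordance_index(
        times, events, [predict_risk(r) for r in rows])
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    permuted_rows = [dict(r, **{column: v}) for r, v in zip(rows, shuffled)]
    permuted_error = 1 - concordance_index(
        times, events, [predict_risk(r) for r in permuted_rows])
    return permuted_error - base_error

# Toy cohort: larger tumors die earlier; "noise" is unrelated to outcome.
rows = [{"tumor_size": s, "noise": z}
        for s, z in zip([2, 3, 4, 5, 6, 7, 8, 9], [5, 1, 4, 2, 8, 3, 7, 6])]
times = [40, 36, 30, 25, 20, 14, 9, 5]   # months
events = [0, 1, 1, 0, 1, 1, 1, 1]        # 1 = death, 0 = censored

def toy_risk(row):
    # Hypothetical stand-in for a fitted model: risk grows with tumor size.
    return row["tumor_size"]

vimp_size = permutation_vimp(toy_risk, rows, times, events, "tumor_size")
vimp_noise = permutation_vimp(toy_risk, rows, times, events, "noise")
print(vimp_size, vimp_noise)
```

In this toy setting, permuting the noise column leaves the error unchanged (VIMP of zero), while permuting tumor size cannot improve on the unpermuted predictions.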

Statistical Analysis
Continuous variables were presented as means with standard deviations (SD) or medians with interquartile ranges (IQR), whereas categorical variables were presented as percentages. We adopted the multiple imputation method for missing data, and trained the RSF by growing a large number of individual trees, each trained on a random bootstrap sample from the original cohort, followed by 10-fold cross-validation. Starting with the entire sample at the root of the tree, we chose a random set of variables as candidates for splitting the branch into two subbranches, with the aim of maximizing the difference in survival between the subbranches. We determined the optimal splitting threshold for each candidate variable, then split on the variable that gave the maximum log-rank statistic between the two resulting groups. This process was repeated until a predetermined terminal node size was reached. A trained RSF predicts a mortality value for each individual that is calibrated to the number of events: it can be interpreted as the number of deaths expected if all patients in the cohort shared that individual's characteristics. To evaluate the predictive performance of the RSF, we calculated the concordance index (c-index) of the final forest, then evaluated the accuracy of the predicted outcome using AUROC. Additionally, we compared our model's performance with that of previously established models, namely the HAP score, the mHAP II score, the ALBI-TAE model, the up-to-seven, four-and-seven, and six-and-twelve scores, and the BCLC-B sub-staging and New BCLC B sub-staging systems. All statistical analyses were performed using packages implemented in R software (version 3.5), with statistical significance set at p < 0.05.
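The node-splitting rule described above (for each candidate variable, find the threshold that maximizes the log-rank statistic between the two daughter nodes) can be sketched as follows. This is an illustrative Python version on toy data, not the R implementation used in the study; the function names, the `min_node_size` default, and the data are assumptions. A full RSF would additionally draw a random subset of candidate variables at each node and repeat the search recursively on bootstrap samples.

```python
def logrank_statistic(times, events, in_left):
    """Two-sample log-rank chi-square statistic for a left/right partition."""
    observed_minus_expected, variance = 0.0, 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = [k for k, tk in enumerate(times) if tk >= t]
        n = len(at_risk)
        n_left = sum(1 for k in at_risk if in_left[k])
        deaths = [k for k in at_risk if times[k] == t and events[k] == 1]
        d = len(deaths)
        d_left = sum(1 for k in deaths if in_left[k])
        # Observed deaths in the left node minus those expected under no difference.
        observed_minus_expected += d_left - d * n_left / n
        if n > 1:
            variance += d * (n_left / n) * (1 - n_left / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance if variance > 0 else 0.0

def best_split(values, times, events, min_node_size=2):
    """Return (threshold, statistic) maximizing the log-rank statistic."""
    best = (None, 0.0)
    for threshold in sorted(set(values))[:-1]:
        in_left = [v <= threshold for v in values]
        n_left = sum(in_left)
        if min(n_left, len(values) - n_left) < min_node_size:
            continue  # respect the minimum daughter-node size
        stat = logrank_statistic(times, events, in_left)
        if stat > best[1]:
            best = (threshold, stat)
    return best

# Toy node: small values of the variable go with long survival.
values = [1, 2, 3, 4, 5, 6, 7, 8]
times = [30, 28, 27, 25, 6, 5, 4, 3]   # months
events = [1, 1, 1, 1, 1, 1, 1, 1]      # all deaths observed
threshold, stat = best_split(values, times, events)
print(threshold, round(stat, 2))
```

The exhaustive threshold scan is quadratic in the node size, which is acceptable for a sketch but would be replaced by sorted cumulative counts in a production implementation.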

Patient Characteristics
A total of 903 out of 979 patients met the inclusion criteria and were therefore used for model development and validation. Of these, 602 and 301 patients were assigned to the training and internal validation cohorts, respectively. Their baseline characteristics are presented in Table 1. Summarily, the median follow-up periods for the training and validation cohorts were 17.6 and 17.0 months, respectively. Most of the patients were infected with HBV, with only a few infected with HCV, possibly because all included patients were from Asia. Almost all clinical parameters, except Child-Pugh and ALBI grades, were well balanced between the training and validation groups. The percentage of patients with Child-Pugh class A was higher in the training group than in the validation group, whereas ALBI grade I patients were more frequent in the validation group than in the training group. A total of 343 patients were used for external validation. Their baseline characteristics are summarized in Supplementary Table 1.
Patients in BCLC-B sub-classification stage B I had significantly better overall survival than the others (Figure 1). However, the Child-Pugh score discriminated poorly between patients with different prognoses (Supplementary Figure 1).

RSF Models
A total of 21 covariates, including clinical variables and laboratory data, were collected at baseline and considered as candidates for analysis and modeling. The statistical analysis procedures used in this study are outlined in Figure 2. Data transformation, indexing, and imputation were performed to generate data points for predicting overall survival during the follow-up period. Summarily, all variables were ranked ... (Figure 4A).

Model Validation and Comparison
We validated model performance using the validation groups. Specifically, the AUROC for predicting survival at one, two, and three years reached 0.70, 0.71, and 0.68, respectively, in the internal validation cohort, and 0.69, 0.76, and 0.70, respectively, in the external validation cohort (Figures 4B, C). A comparison of our model with eight others (6,7,16-21), namely the HAP and mHAP II scores, the ALBI-TAE model, the up-to-seven, four-and-seven, and six-and-twelve scores, and the BCLC-B sub-staging and New BCLC B sub-staging systems, indicated that ours had the highest c-index (Table 2).
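To make the year-specific AUROC values concrete, the sketch below computes a simplified AUROC at a fixed time horizon in Python. It is an assumption-laden illustration rather than the method used in the study: patients who died by the horizon are treated as cases, patients followed beyond the horizon as controls, and patients censored before the horizon are simply excluded, whereas dedicated time-dependent ROC estimators weight censored observations. The function name and toy data are hypothetical.

```python
def auroc_at_horizon(times, events, risk, horizon):
    """Simplified AUROC at a fixed horizon.

    Cases: death observed at or before the horizon.
    Controls: still under follow-up after the horizon.
    Patients censored before the horizon are excluded (a simplification).
    """
    cases = [r for t, e, r in zip(times, events, risk) if e == 1 and t <= horizon]
    controls = [r for t, r in zip(times, risk) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # Probability that a random case is ranked above a random control.
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

# Toy cohort (months of follow-up, death indicator, predicted risk).
times = [5, 9, 14, 20, 25, 30, 36, 40]
events = [1, 1, 1, 1, 0, 1, 1, 0]
risk = [9, 6, 7, 8, 5, 4, 3, 2]

auc_1y = auroc_at_horizon(times, events, risk, horizon=12)
print(round(auc_1y, 3))  # 0.833
```

In this toy example, two patients die within 12 months and six remain at risk; ten of the twelve case-control pairs are ranked correctly.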

DISCUSSION
In the present study, we used RSF, a machine learning-based algorithm, to establish a model for predicting survival outcomes of patients with BCLC stage B HCC. Based on VIMP, we identified five parameters, namely tumor size, BCLC-B sub-classification, AFP level, ALB level, and number of lesions, as strong predictors. These were subsequently used to establish the model. A comparison between our model and traditional Cox proportional-hazards models revealed that the present model is an effective tool for estimating survival outcomes after TACE for patients with BCLC stage B HCC. Previously developed predictive models for patients with intermediate HCC are all based on the traditional Cox proportional-hazards method, which is limited by the possibility of over-fitting, by correlation between variables, and by non-linearity of variables (including potentially complex interactions among them) (4,22). Recently, RSF, a machine learning-based statistical model, has emerged as an intuitive technique for predicting individual risk in cancer patients. This method has potential for establishing predictive models, especially in cases where the response variables are censored survival data and the relationship between response and predictors is complex. In fact, recent studies have demonstrated its efficacy in predicting treatment responses and survival outcomes in several types of cancer (14).
Because it aggregates evidence from numerous individual decision trees grown on bootstrap samples, RSF offers the following advantages: 1) it allows for an intuitive assessment of variable importance; 2) it can deal with correlated parameters, variable interactions, and non-linear effects; and 3) it requires little input from the analyst. Additionally, RSF does not rely on restrictive assumptions, in contrast with traditional Cox proportional-hazards models (23). In the present study, our model identified tumor size, AFP level, and number of lesions as strong predictors, consistent with previous studies. ALB level has also been shown to be an effective tool for assessing liver function and has subsequently been adopted as a prognostic marker for HCC (23-25). Several traditional prognostic factors, such as ALBI, were not ranked high in the present model, possibly because those factors are fundamental to the development, maintenance, and progression of HCC. Additionally, they are intrinsic components of other risk factors, particularly sub-clinical ones that are more distal to disease initiation but closer to adverse outcomes.
This study had several limitations. Firstly, it carries the inherent limitations of a retrospective design. Secondly, the AUROC was low, and the model should be validated in other cohorts. Thirdly, all participants were from Asian centers, so these findings need to be validated in Western populations. Fourthly, although the included patients received TACE as first-line treatment, additional treatments during the follow-up period, such as radioembolization, targeted therapy, or ablation therapy, may have influenced survival, and these were not controlled for. Fifthly, we included only 21 clinical parameters in our analysis, although other parameters, such as genetic and imaging features, could also be informative for modeling. Lastly, the database used did not provide definitions for multiple lesions; data on the distance between lesions could be included in future studies.
In conclusion, we used an RSF-based approach to successfully develop a model for predicting survival of patients with BCLC stage B HCC. This model showed better performance than previously published Cox proportional-hazards models.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
All authors collected, extracted, and analyzed the data and wrote the article. HL and YZ conceived and designed this study. All authors contributed to the article and approved the submitted version.