Development and Validation of a Machine Learning Prognostic Model for Hepatocellular Carcinoma Recurrence After Surgical Resection

Surgical resection remains the primary curative treatment for patients with hepatocellular carcinoma (HCC), yet over 50% of patients experience recurrence, which calls for individualized recurrence prediction and early surveillance. This study aimed to develop a machine learning prognostic model to identify high-risk patients after surgical resection and to examine the importance of variables in different time intervals. Patients were drawn from two centers: Eastern Hepatobiliary Surgery Hospital (EHSH) and Mengchao Hepatobiliary Hospital (MHH). The best-performing model was determined, validated, and applied to each time interval (0–1 year, 1–2 years, 2–3 years, and 3–5 years). Importance scores were used to illustrate feature importance in different time intervals. In addition, a risk heat map was constructed to visually depict the risk of recurrence in different years. A total of 7,919 patients from two centers were included, of whom 3,359 and 230 experienced recurrence or metastasis or died during follow-up in the EHSH and MHH datasets, respectively. The XGBoost model achieved the best discrimination, with a c-index of 0.713 in the internal validation cohort. Kaplan-Meier curves successfully stratified the external validation cohort into different risk groups (p < 0.05 in all comparisons). Tumor characteristics contributed more to HCC relapse in 0 to 1 year, while HBV infection and smoking largely affected patients’ outcomes in 3 to 5 years. Based on the machine learning prediction model, the peak of recurrence can be predicted for individual HCC patients. Clinicians can therefore apply it to personalize the postoperative management of patients.


INTRODUCTION
Hepatocellular carcinoma (HCC) is the most common primary liver cancer and ranks as the fourth leading cause of cancer-related mortality (8.2%) worldwide (1). Surgical resection remains the primary curative treatment for patients with adequate liver function (2). However, 50% to 70% of patients who undergo complete tumor resection still experience recurrence and disease progression, ultimately leading to unfavorable prognoses (3). Therefore, identifying patients at high risk of recurrence after surgical resection is essential for clinicians to provide appropriate surveillance and therapy.
During the past decade, researchers have primarily focused on prognosis-predictive models based on biological, demographic, and clinical factors. The American Joint Committee on Cancer (AJCC) tumor-node-metastasis (TNM) system is the most widely acknowledged and is commonly used to stage liver cancer. However, its prognostic value in predicting tumor recurrence is widely debated (4). Recent models, including the Singapore Liver Cancer Recurrence (SLICER) score, Surgery-Specific Cancer of the Liver Italian Program (SS-CLIP), and the Korean model, were designed to detect tumor recurrence in specific groups of patients. Because of their inaccuracy and heterogeneity, these models have not been widely implemented (5)(6)(7). In addition, the Early Recurrence After Surgery for Liver tumor (ERASL) model, which is based on Cox regression analysis, has been established to predict early tumor recurrence after liver resection. Although it discriminates better than other tools, its limited set of clinical parameters and its restriction to 2-year recurrence limit its application to full HCC survivorship management (8).
Machine learning, a field of computer science in which machines mimic, recognize, and learn cognitive functions of the human mind to make empirical predictions, has gained increasing attention in recent years (9). In oncology, machine learning offers advantages in image recognition and feature selection compared with traditional methods (10,11). Recently, automated machine learning algorithms were developed to detect metastasis in sentinel lymph nodes of women with breast cancer and showed better diagnostic performance than pathologists (12). In patients with bladder cancer, a novel predictive model based on machine learning algorithms predicted disease recurrence after cystectomy with more than 70% sensitivity and specificity (13). However, few studies have applied a machine learning framework to identify HCC patients at risk of recurrence after curative treatment.
Briefly, we aimed to utilize machine learning algorithms to develop a risk prediction model to predict HCC recurrence among patients who underwent surgical resection. We also explored feature importance in this process, verifying the important prognostic factors for tumor relapse. In addition, a risk heat map covering five years that visually depicts the risk of recurrence was constructed. In this way, we hope to improve the performance of HCC recurrence predictive models using big data and to provide evidential support for individualized management.

MATERIALS AND METHODS
This analysis was reported according to the TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis) guidelines (14).

Patients
The database was retrospectively derived from patients with HCC who underwent hepatic resection at Eastern Hepatobiliary Surgery Hospital, Second Military Medical University (EHSH) (n = 7,411, from May 2008 to Sept. 2018) or Mengchao Hepatobiliary Hospital, Fujian Medical University (MHH) (n = 508, from Nov. 2014 to Nov. 2018). Patients met the following inclusion criteria: (1) pathological confirmation of HCC, (2) Child-Pugh A/B before surgery, and (3) R0 surgical resection of the tumor with curative intent. Patients were excluded if they (1) died within 30 days after surgery or were lost to follow-up, (2) received preoperative neoadjuvant treatment, (3) were diagnosed with extrahepatic cancers, HCC relapse, or metastasis, or (4) were younger than 18 years old. The inclusion and exclusion of patients and the subsequent analysis are shown in Supplementary Figure 1.
Different models were constructed on the EHSH dataset, which was randomly divided into derivation and internal validation cohorts at a ratio of 8:2. The models were validated externally using the dataset from MHH. The study was approved by the Ethics Committees of the two centers, and the requirement of written informed consent was waived. All procedures were performed in accordance with the Declaration of Helsinki.
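The 8:2 random division described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name, the use of integer patient indices, and the fixed seed are assumptions made for the example.

```python
import random


def split_derivation_validation(patient_ids, derivation_fraction=0.8, seed=0):
    """Randomly split patient IDs into derivation and internal validation sets.

    The seed is an assumption added for reproducibility; the paper does not
    report one.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    # Round down so that the derivation set holds at most 80% of patients.
    n_derivation = int(len(ids) * derivation_fraction)
    return ids[:n_derivation], ids[n_derivation:]


# 7,411 is the EHSH cohort size reported in the text.
derivation, validation = split_derivation_validation(range(7411))
print(len(derivation), len(validation))
```

With 7,411 patients, rounding down gives 5,928 derivation and 1,483 internal validation patients, which matches the cohort sizes reported in this article.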

Clinical Variables
The demographics, laboratory tests, and HCC etiologies were collected from the database. The laboratory tests included various parameters of blood examination, liver and coagulation function, and hepatitis virus markers. Tumor characteristics included, but were not limited to, the number of tumors, the diameter of the largest nodule, differentiation, capsule, cirrhosis in non-cancerous tissues, and vascular invasion. Macrovascular invasion was defined as tumor invasion of large vessels, which can be detected by Computed Tomography/Magnetic Resonance Imaging (CT/MRI) (8). Microvascular invasion refers to the histologically microscopic presence of cancer cell clusters in the blood vessels lined with endothelial cells (15). Thirty-five variables were selected by health professionals based on literature review and clinical expertise.

Follow-up and Outcome
During follow-up, serum alpha-fetoprotein (AFP) measurement and ultrasonography, CT, or MRI of the chest and abdomen were performed once every two months for the first six months, and then once every three months for the next 1.5 years. For patients who were free of cancer recurrence two years after surgery, surveillance was carried out at 6-month intervals. The outcome of this study, recurrence-free survival (RFS), was defined as the time from surgery to the detection of recurrence, metastasis, or death.

General Statistical Principle
After preliminary data cleaning, multiple imputation was performed in R (v3.6.2) with the Multivariate Imputation by Chained Equations package (MICE, v3.8.0). Continuous variables were tested for normality with Anderson-Darling tests and were found not to be normally distributed. They were therefore summarized as median (IQR), and Wilcoxon rank-sum tests were used for between-group comparisons. Categorical variables were expressed as frequency (%), and Chi-squared tests or Fisher's exact tests were applied, as appropriate. All statistical tests above were two-sided, with p < 0.05 considered statistically significant, and were conducted in Python (v3.7) with the SciPy (v1.4.0) package.

Cox Proportional Hazards Model (CPH)
The clinicopathologic parameters of HCC recurrence were fitted by Cox regression using the survival package (v3.1) in R. Univariable Cox regression was first conducted to identify potential predictors (p < 0.1). Variables identified in the univariable Cox model were then entered into a multivariable Cox regression with stepwise selection.

Machine Learning Models
Three machine learning models, namely the Deep Learning-based Survival Model (DeepSurv), Extreme Gradient Boosting (XGBoost), and Random Survival Forest (RSF), were applied to predict HCC recurrence using all 35 preselected variables. DeepSurv is a multi-layer feed-forward neural network that predicts the effects of diverse variables on the hazard rate, parameterized by the weights of the network (16). Based on its algorithmic principle, we reimplemented DeepSurv in Python under the PyTorch deep learning framework (version 1.3.1, CPU version) and optimized the hyper-parameter search. XGBoost is an improved supervised learning algorithm based on the Gradient Boosting Decision Tree algorithm, which can handle survival problems by using the partial likelihood as the optimization objective and log-rank tests as node split criteria (17). Our XGBoost model was implemented in Python using the XGBoost (v0.9) package. RSF is another machine learning approach for survival analysis that does not require the proportional hazards assumption and can fit a more general spectrum of survival problems; it was conducted in R (randomForestSRC v2.9.3) (18).
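To make the partial likelihood objective mentioned above concrete, the following is a minimal pure-Python sketch of the negative log partial likelihood of the Cox model. It is not the internal implementation of XGBoost or DeepSurv; the function name is hypothetical, ties are handled in the Breslow manner as an assumption, and the three-patient data are invented purely for illustration.

```python
import math


def cox_negative_log_partial_likelihood(times, events, scores):
    """Negative log partial likelihood (Breslow handling of tied times).

    times  : follow-up time per patient
    events : 1 if recurrence, metastasis, or death was observed, 0 if censored
    scores : model output per patient, interpreted as a log hazard ratio
    """
    loss = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored patients contribute only through risk sets
        # Risk set: every patient still under observation at time t_i.
        risk_sum = sum(math.exp(s) for t, s in zip(times, scores) if t >= t_i)
        loss -= scores[i] - math.log(risk_sum)
    return loss


# Toy data (illustrative only): the first two patients relapse, the third is censored.
times = [1.0, 2.0, 3.0]
events = [1, 1, 0]
well_ordered = cox_negative_log_partial_likelihood(times, events, [2.0, 1.0, 0.0])
badly_ordered = cox_negative_log_partial_likelihood(times, events, [0.0, 1.0, 2.0])
print(well_ordered < badly_ordered)
```

Scores that rank earlier events as higher risk give a smaller loss, which is the behavior a model trained on this objective is pushed towards.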

Model Discrimination and Calibration
The discrimination performance of the four models in both derivation and validation sets was measured by Harrell's c-index. The c-index was then compared among the different models in each cohort (19).
As suggested by a previous study, Kaplan-Meier survival curves for various risk groups were used as informal evidence of discriminative ability (20). Kaplan-Meier curves for the external validation cohort after calibration allow a visual comparison of discrimination among risk groups defined by cut-offs at the 50th and 84th centiles.
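For readers unfamiliar with Harrell's c-index, a minimal sketch of its computation follows. The function name and the toy data are illustrative assumptions; pairs with tied follow-up times are simply skipped here, which is a simplification compared with full-featured implementations.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs ranked correctly by risk score.

    A pair (i, j) is comparable when patient i had an observed event before
    patient j's follow-up time. Tied risk scores count as half concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example (illustrative only): 4 of the 5 comparable pairs are concordant.
c_index = harrell_c_index(
    times=[1, 2, 3, 4],
    events=[1, 1, 0, 1],
    risk_scores=[0.9, 0.6, 0.7, 0.1],
)
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.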
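The risk stratification and Kaplan-Meier estimation described above can be sketched as follows. This is an illustration under stated assumptions: the nearest-rank centile definition, the group labels "low", "intermediate", and "high", and the toy data are not taken from the paper, which specifies only the 50th and 84th centile cut-offs.

```python
import math


def centile(values, q):
    """Nearest-rank centile (q between 0 and 100) of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def assign_risk_group(score, cut50, cut84):
    """Map a predicted risk score to a group (labels are hypothetical)."""
    if score <= cut50:
        return "low"
    if score <= cut84:
        return "intermediate"
    return "high"


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        failures = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - failures / at_risk
        curve.append((t, survival))
    return curve


# Toy scores 1..100 (illustrative only) to show the resulting group sizes.
scores = list(range(1, 101))
cut50, cut84 = centile(scores, 50), centile(scores, 84)
groups = [assign_risk_group(s, cut50, cut84) for s in scores]
print({g: groups.count(g) for g in ("low", "intermediate", "high")})
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

In practice one Kaplan-Meier curve would be estimated per risk group, and the curves compared pairwise, for example with log-rank tests.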
Calibration plots of XGBoost were applied to the derivation and validation sets to determine whether each patient's predicted risk was consistent with the actual outcome. We followed the practice of Chan et al. to draw the calibration plots (8) at 1, 2, 3, and 5 years.

Models in Different Time Intervals and Predictive Heat Map
Inspired by life table methodology, we applied XGBoost to different time intervals, including 0 to 1 year, 1 to 2 years, 2 to 3 years, and 3 to 5 years, with the same software. Importance scores were exported, and Harrell's c-index of each interval was reported at the same time. Furthermore, fifty patients from the external validation cohort were randomly selected to create a heat map that visually illustrates the risk of recurrence within five years after surgery, with the aim of providing guidance and support in clinical practice.
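One way to prepare interval-specific data in the spirit of a life table is sketched below. The paper does not describe the exact construction, so the rule used here is an assumption: a patient enters an interval only if still event-free and under follow-up at its start, and is censored at the end of the interval if nothing has happened by then. The function name and toy data are hypothetical.

```python
# Interval boundaries in years after resection, as listed in the text.
INTERVALS = [(0, 1), (1, 2), (2, 3), (3, 5)]


def restrict_to_interval(times, events, start, end):
    """Return (patient index, time within interval, event flag) tuples.

    Assumed rule: patients whose event or censoring time is at or before
    `start` are dropped; patients followed beyond `end` are censored at `end`.
    """
    rows = []
    for i, (t, e) in enumerate(zip(times, events)):
        if t <= start:
            continue  # no longer at risk when the interval begins
        if t <= end:
            rows.append((i, t - start, e))
        else:
            rows.append((i, end - start, 0))  # censored at interval end
    return rows


# Toy follow-up data in years (illustrative only).
times = [0.5, 1.5, 2.5, 4.0, 6.0]
events = [1, 1, 0, 1, 0]
for start, end in INTERVALS:
    print((start, end), restrict_to_interval(times, events, start, end))
```

A separate model can then be fitted to each interval's rows, so that feature importance reflects the risk factors acting in that period only.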

RESULTS

Clinicopathologic Features and Outcome
A total of 7,919 patients who underwent surgical resection at two centers were included in the study. Eighty percent of the EHSH cohort was assigned to the derivation set (n = 5,928), and the rest was designated as the internal validation set (n = 1,483). By the time of data analysis, 3,359 and 230 patients had experienced recurrence or metastasis or had died during follow-up in the EHSH and MHH datasets, respectively. The median follow-up periods for the two datasets were 3.51 (IQR: 0.41-8.32) and 2.04 (IQR: 0.23-3.88) years. Detailed outcome descriptions are provided in Supplementary Table 1.
Thirty-five predictors were included in the final analysis. Preoperative clinical and postoperative pathologic characteristics of the three cohorts are shown in Table 1.

Predictive Performance
The discriminatory performance of the four models was assessed with Harrell's c-index (Table 2). As shown in Figure 2, the calibration plots demonstrated satisfactory agreement between predictions made by XGBoost and actual patient outcomes in all datasets.

Models and Feature Importance in Different Time Intervals
We established the XGBoost model in different time intervals, including 0 to 1 year, 1 to 2 years, 2 to 3 years, and 3 to 5 years, to examine the dynamics of feature importance in HCC patients. The specific predictive performance measurements using c-index and 95% CI for each time slot are listed in Table 3.
The variables with the top 10 importance scores are shown in Table 4. During 0 to 1 year after resection, the importance score of tumor thrombus (defined as the tumor extending into a vessel, typically the portal vein) was 103.01, substantially higher than the scores of other factors, such as tumor diameter (33.94), gamma-glutamyl transpeptidase (GGT) (20.25), and tumor capsule (19.22). For 1 to 2 years, tumor number (13.39) was the most important variable related to patient outcomes, followed by resection type (major resection, 13.22), tumor thrombus (13.04), and tumor diameter (12.36). In the latter two intervals, apart from tumor number, HBV infection was found to be a relatively important variable. HBV-DNA load had the third highest importance score for 2 to 3 years, and HBsAg ranked first in the last period. Furthermore, smoking, an unhealthy lifestyle factor, was also associated with late recurrence.

The Pattern of Recurrence Risk
Using the XGBoost model in different time intervals, a risk heat map covering four time intervals was developed that visually depicts a patient's risk of tumor recurrence, metastasis, or death after undergoing curative liver resection. In general, the individual heat maps indicated a trend of relatively high recurrence risk in 0 to 1 year and 3 to 5 years after surgical resection (Figure 3).

DISCUSSION
HCC is one of the most common malignancies worldwide. Though curative resection offers the best prognosis for patients, disease recurrence remains a major obstacle to long-term survival (21). Moreover, little is known about the potential risk and peak time periods of HCC recurrence after curative surgery (22,23). We therefore conducted this research to address this gap. In this study, the risk prediction model based on the XGBoost algorithm showed the best c-index in the EHSH validation set. To observe the recurrence risk of individual patients at different time intervals after surgery, a heat map was constructed based on the XGBoost model for 50 randomly selected HCC patients. The majority of patients showed a similar trend of postoperative recurrence, in which the risks in 0 to 1 and 3 to 5 years after surgery were higher than those in 1 to 2 and 2 to 3 years.
In the past few years, several scoring systems have been developed for estimating HCC recurrence risk and stratifying patients. These systems have primarily selected significant clinical parameters through multivariate analyses and constructed conventional Cox proportional hazards models based on a limited set of risk factors (24)(25)(26). One of the important assumptions of Cox proportional hazards regression is that each variable contributes linearly to the log hazard. However, in clinical studies, and especially in cancer studies, multiple risk factors usually have non-linear relationships with recurrence-free survival (16,27,28). For this reason, the previous models might fit poorly and fail to make accurate predictions. Machine learning algorithms are probably superior to conventional CPH because they can fit more sophisticated non-linear relationships. Among the different models we built, the XGBoost model predicted HCC recurrence best.
Apart from an individualized heat map for illustrating recurrence risk, a feature importance analysis was conducted based on the XGBoost model and was used to evaluate the dynamics of variables contributing to the outcome of interest. Specifically, tumor characteristics, such as tumor thrombus, tumor number, tumor size, and tumor differentiation, contributed more to the model's predictive performance in our study. In addition, macrovascular invasion (MaVI), microvascular invasion (MVI), gamma-glutamyl transpeptidase (GGT), intraoperative blood transfusion, and major resection also contributed substantially to the predictive performance of the model. Furthermore, smoking, an unhealthy lifestyle factor, also worsened the prognosis of HCC patients. These findings are supported by previous research as follows.
Firstly, previous studies found that patients with portal vein tumor thrombosis (PVTT) usually have decreased liver function reserve, which is a high-risk factor for disease progression and recurrence (29,30). In addition to tumor thrombus, tumor volume is also associated with HCC recurrence. In another study, tumor volume was shown to be a predictor of HCC recurrence after liver transplantation (31). A clinical study in Korea confirmed that the maximal size of HCC and the number of tumors were significantly correlated with the recurrence of HCC after liver transplantation (32). In line with our results, MVI was also a parameter assessed in the ERASL, SLICER, SS-CLIP, and Korean models (5)(6)(7)(8). The dissemination and spread of tumors through micro-vessels may explain the advanced tumor stage, tumor progression, and worse outcomes (33)(34)(35).
Secondly, perioperative blood transfusions were independently associated with survival and cancer recurrence after surgical resection (36). A meta-analysis found that allogeneic blood transfusions were associated with poor clinical prognoses in patients with HCC who underwent radical hepatectomy (37). The association between major resection and blood loss, as well as RFS of HCC patients, has been examined: the more complicated the hepatectomy, the more likely patients are to suffer intraoperative blood loss, leading to a shorter time to recurrence (38).
Thirdly, liver function, as represented by GGT, was another crucial prognostic factor for tumor recurrence (39). GGT was first found to modulate the metabolism of glutathione (GSH) and facilitate amino-acid recovery for GSH synthesis (40). More recently, GGT was reported to be involved in tumor initiation, progression, and invasion. GGT may induce the production of endogenous reactive oxygen species (ROS), leaving cells exposed to persistent oxidative stress and leading to DNA damage and tumor growth (41,42).
Moreover, smoking was associated with an increased risk of HCC (43,44) and with disease-free survival of patients who underwent resection (45). In the current study, we found that smoking was associated with recurrence risk at 2 to 3 and 3 to 5 years after HCC resection. The underlying mechanism might be that nicotine increases the expression of α7-nicotinic acetylcholine receptor (α7-nAChR), leading to recurrence through the JAK2/STAT3 signaling pathway (46). A previous study found that the history and amount of smoking were both risk factors for the progressive recurrence of HBV-related HCC (47).
Finally, early disease recurrence (0-1 year) is often thought to result from intrahepatic metastases, while late recurrence is more likely to result from new-onset tumors of multicentric origin (48,49). In accordance with this theory, HBV-DNA load and HBsAg contributed significantly to HCC recurrence from two to five years in our study; HBV infection likely induces genomic alterations and pro-oncogenic signaling that lead to de novo HCC in the long term (50).
Our results suggest that clinicians can provide personalized management of recurrence risk after surgical resection in HCC patients based on information provided by heat maps and feature importance, which may improve postoperative survival outcomes. The risk heat map allows clinical teams to detect the patients most at risk of HCC recurrence, schedule appointments for them in the "heat zones" where recurrence is most likely, and intervene as needed. For example, clinicians may give greater attention to malignant characteristics of tumors, including the presence of tumor thrombus, larger tumor sizes, multiple tumor nodules, and micro- or macrovascular invasion, if the heat map indicates a high risk within one year after surgery.
Our study has certain limitations. Firstly, our model is primarily based on patients with HCC from two Chinese institutions in hepatitis B virus-endemic areas. It is necessary to validate our model in international cohorts to extend our results to patients with HCC of various etiologies. Secondly, some other variables that may be associated with the prognosis of HCC patients, such as postoperative adjunctive therapies and serum inflammatory markers, were not evaluated in this study. In addition, prospective studies with longer follow-up are essential to further validate the performance of our model.
In summary, we have developed a model based on a machine learning algorithm that better predicts the risk of disease recurrence in individual patients following hepatic resection in a large population. We further applied this model to four time periods to describe patterns of HCC relapse, and to explore important risk factors. The heat map offers clinicians a decision support tool to identify individuals prone to recurrence, while also allowing clinicians to identify the prognostic factors, which are clinically useful in terms of individualized patient monitoring, surveillance, and management. Future prospective studies are needed to verify our conclusions.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, upon reasonable request.

AUTHOR CONTRIBUTIONS
YH and HC contributed equally to the manuscript, as they both took charge of study design and implementation, as well as drafting the manuscript. YZ also participated in study design and literature review. ZL and HM conducted the statistical and machine learning analyses. The corresponding author, JL, contributed substantially by revising the draft and reviewing the final version. All authors contributed to the article and approved the submitted version.

ACKNOWLEDGMENTS
Thanks to all the staff of Mengchao Hepatobiliary Hospital who contributed to the study.