Artificial Intelligent Multi-Modal Point-of-Care System for Predicting Response of Transarterial Chemoembolization in Hepatocellular Carcinoma

Hepatocellular carcinoma (HCC) is the second most lethal tumor globally and the fourth leading cause of cancer-related death worldwide. Unfortunately, HCC is commonly diagnosed at an intermediate or advanced tumor stage, at which only palliative treatments offering a limited overall survival are available. The high heterogeneity of HCC at the genetic, molecular, and histological levels makes the preoperative prediction of transarterial chemoembolization (TACE) efficacy and the development of personalized regimens challenging. In this study, a new multi-modal point-of-care system is employed to predict the response to TACE in HCC by integrating multi-modal large-scale data from clinical indexes and computed tomography (CT) images. This multi-modal point-of-care predicting system opens new possibilities for predicting the response to TACE treatment and can help clinicians select the patients with HCC who are most likely to benefit from the interventional therapy.


INTRODUCTION
Liver cancer is the second most lethal tumor after pancreatic cancer and ranks as the fourth leading cause of cancer-related death worldwide (Craig et al., 2020;Villanueva et al., 2019;Tao et al., 2020). In China, the 5-year survival rate has been reported to be 12% (Zheng et al., 2018). Hepatocellular carcinoma (HCC), the most common form of liver cancer (∼90% of cases), remains a global health challenge (Llovet et al., 2021;Yu et al., 2020). To predict the prognosis of patients with HCC, the Barcelona Clinic Liver Cancer (BCLC) staging classification, which is approved by the European Association for the Study of the Liver (EASL) and the American Association for the Study of Liver Diseases (AASLD), has emerged as the standard classification in recent years (Llovet et al., 2008;Vitale et al., 2011;Yang et al., 2012). However, HCC is commonly diagnosed at an intermediate tumor stage (BCLC stage B) or an advanced tumor stage (BCLC stage C), at which only palliative treatments offering a limited overall survival (∼11-20 months) are available (Llovet et al., 2002;Hucke et al., 2011;Sieghart et al., 2015). According to international guidelines, transarterial chemoembolization (TACE) is the recommended treatment for BCLC stage B patients with localized liver disease and good liver function (Camma et al., 2002;Otto et al., 2006;Takayasu et al., 2006). However, HCC is highly heterogeneous at the genetic, molecular, and histological levels, which makes the preoperative prediction of TACE efficacy and the development of personalized regimens challenging. Therefore, there is a growing demand for a method that accurately predicts the response to TACE in HCC. Imaging modalities, including ultrasound, computed tomography (CT), and magnetic resonance imaging (MRI), are promising tools for the detection, staging, and risk assessment of HCC (Woodall et al., 2007;Banerjee et al., 2015).
Owing to its high sensitivity, worldwide availability, and easy interpretability, CT remains the most commonly used modality for assessing the response to TACE therapy. The best response of TACE cannot always be achieved after one session of CT imaging, especially for patients with large tumors. However, multiple CT examinations can easily damage the liver function of patients. Therefore, other clinical evaluation indexes should be added to build a point-of-care predicting system that improves the accuracy of TACE response prediction. Crucially, inflammation has been recognized as playing a major role in the tumorigenic process of HCC. Recent studies confirm that inflammation also plays a prognostic role throughout the clinical course of malignancy (Sanghera et al., 2019;Chan et al., 2020;Wang et al., 2021). A number of inflammation-based indexes (IBIs) are derived from peripheral blood counts for prognostic purposes, including the neutrophil-to-lymphocyte ratio (NLR), platelet-to-lymphocyte ratio (PLR), monocyte-to-lymphocyte ratio (MLR), systemic immune-inflammation index (SII), and systemic inflammation response index (SIRI) (Pinato et al., 2012;Yang et al., 2020). Therefore, combining CT images with inflammation-based indexes to predict post-treatment responses and accurately identify patients who respond to TACE is of important clinical significance. Recently, artificial intelligence (AI), which is capable of maximizing predictive accuracy from static or dynamic data sources using analytic or probabilistic models, has markedly extended the reach of human beings in biomedical tasks (Esteva et al., 2017;Li et al., 2017;Kermany et al., 2018;Li et al., 2018;Chang et al., 2019;Zhang et al., 2020). Deep learning in particular has demonstrated good performance in assessing radiological images and in image recognition.
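As a minimal sketch of how the inflammation-based indexes named above are commonly derived from a peripheral blood count, the Python snippet below uses their standard published definitions: NLR = neutrophils/lymphocytes, PLR = platelets/lymphocytes, MLR = monocytes/lymphocytes, SII = platelets × neutrophils/lymphocytes, and SIRI = neutrophils × monocytes/lymphocytes. The function name, the argument names, and the example counts are illustrative assumptions and are not taken from the study data.

```python
def inflammation_indexes(neutrophils, lymphocytes, monocytes, platelets):
    """Return common inflammation-based indexes from absolute blood counts.

    All counts are assumed to share one unit (e.g., 10^9 cells/L).
    """
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return {
        "NLR": neutrophils / lymphocytes,
        "PLR": platelets / lymphocytes,
        "MLR": monocytes / lymphocytes,
        "SII": platelets * neutrophils / lymphocytes,
        "SIRI": neutrophils * monocytes / lymphocytes,
    }


# Hypothetical blood count, for illustration only.
example = inflammation_indexes(
    neutrophils=4.0, lymphocytes=2.0, monocytes=0.5, platelets=200.0
)
print(example)
```

Because every index shares the lymphocyte count as its denominator, a single guard against a zero count is enough to keep all five ratios well defined.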
Because of the multifactorial and complex nature of HCC, convolutional neural networks, a class of deep learning algorithms, have shown great potential for fully mining image information. This approach does not require manual screening of image features, and it trains well on high-dimensional data (Gulshan et al., 2016;Peng et al., 2019;Liu et al., 2020). Texture analysis based on contrast-enhanced pretherapeutic dynamic CT may provide imaging biomarkers for predicting the response of HCC. A higher gray-level co-occurrence matrix value and a smaller tumor size are significant signs. However, the highest AUC was only 0.72 (Park et al., 2017;Kermany et al., 2018). A new method is therefore needed to increase the accuracy of TACE response prediction. Given this complexity, building a deep learning point-of-care predicting system that integrates multiple factors (e.g., CT images and inflammation-based indexes) appears to be a highly effective way to autonomously predict the response to TACE therapy. In this paper, we aim to develop a point-of-care system for predicting the response to TACE in HCC by combining clinical indexes with CT images as multi-modal large-scale data. This multi-modal point-of-care predicting system opens new possibilities for predicting the response to TACE treatment and can help clinicians select the patients with HCC who are most likely to benefit from the interventional therapy.

Patients
This study included patients from the Second Affiliated Hospital of Harbin Medical University. A total of 1,890 patients who underwent TACE were recruited from January 2011 to September 2020, of whom 399 were finally enrolled. The inclusion criteria were as follows:
(1) Patients diagnosed with HCC via biopsy or radiological examination.
(2) Patients with no history of TACE for HCC before the CT examination.
(3) Patients who had hepatic-arterial CT imaging within 7 days before and 1 month after treatment.
(4) Patients with BCLC stage B.
The exclusion criteria were as follows:
(1) A history of previous TACE, liver transplantation, targeted therapy, radiotherapy, or palliative care treatment.
(2) Major thrombosis in the portal vein, abdominal lymph node metastases, or distant metastases.
(3) Other liver tumors confirmed by pathology or imaging.
The response on hepatic-arterial CT images was classified as objective response [comprising complete response (CR) and partial response (PR)] or nonresponse [comprising progressive disease (PD) and stable disease (SD)] according to the modified Response Evaluation Criteria in Solid Tumors (mRECIST).

CT Scan Protocols and Region-of-Interest Segmentation
Contrast-enhanced computed tomography (CECT) was performed with a 64-detector row CT scanner (GE Healthcare, United States). The scanning parameters were as follows: tube current, 250 mA; tube voltage, 120 kV; and slice thickness, 5 mm. The contrast agent (Ultravist, Bayer, Germany) for CECT was injected into the antecubital vein with a pump injector at a rate of 3.0 ml/s. Hepatic-arterial phase CT images were obtained at 35 s. All CT images were input into the Dr. wise AI software (Deepwise Inc., China). The regions of interest (ROIs) were delineated manually by two senior radiologists with 15 years of experience (reader 1, Prof. Huijie Jiang) and 13 years of experience (reader 2, Prof. Jinling Zhang). The entire cohort of 399 patients was randomly divided into a training dataset (319 cases) and a validation dataset (80 cases) at a ratio of 8:2. The validation dataset was used to evaluate the accuracy of the model fitted on the training dataset. The ROIs of the CT images from the training and validation cohorts were manually segmented by the two readers, who were blinded to the therapy outcome of the patients.
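The random 8:2 division described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the fixed seed, and the use of integer patient identifiers are hypothetical, and the source does not say which tool or seed was actually used.

```python
import random


def split_cohort(patient_ids, train_fraction=0.8, seed=0):
    """Shuffle patient identifiers and split them into training/validation lists."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_train = int(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# Hypothetical identifiers 0..398 standing in for the 399 enrolled patients.
train_ids, val_ids = split_cohort(range(399))
print(len(train_ids), len(val_ids))  # prints: 319 80
```

Splitting by patient rather than by image keeps all images of one patient on the same side of the split, which matters once augmentation multiplies the number of training images.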

Image Analysis and Preprocessing
All original CT images were reconstructed on a postprocessing workstation to achieve a uniform slice thickness, and the reconstructed images were input into the Deepwise software to delineate the ROIs. For each patient, we saved one CT image and the corresponding ROI mask from the largest tumor area in the hepatic-arterial phase CT images. The ROI was delineated around the largest tumor area, selected by transverse and sagittal observation, and was outlined close to the edge of the tumor. A total of 319 patients were used as the training set and 80 patients were used as the validation set. Random image cropping and patching (RICAP) was employed for data augmentation in deep convolutional neural network training (Takahashi et al., 2018). In detail, RICAP randomly cropped regions from the original CT images and patched them together to compose new training CT images. Using this method, 5,460 patches were used to construct the new training set. To enhance the generalization ability of the model, the RICAP-based data augmentation was applied in real time.
Frontiers in Bioengineering and Biotechnology | www.frontiersin.org | November 2021 | Volume 9 | Article 761548
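The cropping-and-patching step of RICAP can be sketched in NumPy as below. This follows the general scheme of Takahashi et al. (2018): a boundary point is drawn from a Beta distribution, four regions are cropped from four randomly chosen images and tiled into one image, and the labels are mixed in proportion to the patch areas. The function name, the `beta=0.3` default, and the single-channel image layout are assumptions for illustration; the study does not report its RICAP settings.

```python
import numpy as np


def ricap_batch(images, labels, beta=0.3, rng=None):
    """Apply RICAP to a batch.

    images: array of shape (N, H, W); labels: one-hot array of shape (N, K).
    Returns the patched images and the area-weighted soft labels.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, height, width = images.shape
    # Boundary position that divides the output image into four regions.
    w = int(round(width * rng.beta(beta, beta)))
    h = int(round(height * rng.beta(beta, beta)))
    patch_sizes = [(h, w), (h, width - w), (height - h, w), (height - h, width - w)]
    corners = [(0, 0), (0, w), (h, 0), (h, w)]  # top-left corner of each region

    patched = np.zeros_like(images)
    mixed = np.zeros(labels.shape, dtype=float)
    for (top, left), (ph, pw) in zip(corners, patch_sizes):
        if ph == 0 or pw == 0:
            continue  # an empty region contributes nothing
        order = rng.permutation(n)  # source image for each output image
        y = rng.integers(0, height - ph + 1)  # random crop origin
        x = rng.integers(0, width - pw + 1)
        patched[:, top:top + ph, left:left + pw] = images[order, y:y + ph, x:x + pw]
        mixed += labels[order] * (ph * pw) / (height * width)
    return patched, mixed
```

Calling the function on each mini-batch during training, rather than once beforehand, corresponds to applying the augmentation in real time.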

Deep Learning Convolutional Neural Network
GhostNet is an improved deep convolutional neural network developed by Huawei Noah's Ark Lab (Han et al., 2020;Paoletti et al., 2021). It is built from Ghost modules, which aim to generate more features with fewer parameters and thereby improve efficiency. The architecture of GhostNet and the flowchart of deep learning for CT images are shown in Figure 1B. GhostNet rests on two constituent concepts. The first is the Ghost module, which generates more feature maps from cheap operations. Through a series of linear transformations, the Ghost module generates many ghost feature maps that reveal the information behind the intrinsic features at a low computational cost. The second is the Ghost bottleneck, which is designed to stack Ghost modules.
The Ghost bottleneck is the basic block of GhostNet, in which several convolutional layers and shortcuts are integrated. In general, the Ghost bottleneck consists of two stacked Ghost modules. The first Ghost module expands the number of channels, and the second reduces the number of channels to match the shortcut path. A shortcut then connects the input of the first Ghost module to the output of the second. After each layer, batch normalization (BN) and the ReLU nonlinearity are applied, except that ReLU is not used after the second Ghost module. GhostNet mainly consists of a stack of Ghost bottlenecks, which use Ghost modules as their building block. The notation "G-bneck a, b, c, d" in the figure is read as follows: G-bneck denotes a Ghost bottleneck, "a" is the expansion size, "b" is the number of output channels, "c" indicates whether the SE module is used, and "d" is the stride. The first layer is a standard convolutional layer with 16 filters, followed by a series of Ghost bottlenecks with gradually increasing channels. According to the sizes of their input feature maps, these Ghost bottlenecks are grouped into stages; all Ghost bottlenecks in a stage use stride 1 except the last one, which uses stride 2. At the end of GhostNet, global average pooling (7 × 7) and a convolutional layer transform the feature maps into a 1,280-dimensional feature vector for final classification. Some Ghost bottlenecks in GhostNet also contain the squeeze-and-excitation (SE) module. Unlike MobileNetV3, however, GhostNet does not use the hard-swish nonlinearity because of its large latency.
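To make the two concepts concrete, the following NumPy sketch implements a simplified Ghost module and a stride-1 Ghost bottleneck. It is an illustration under several assumptions, not the network used in the study: the primary operation is taken to be a 1 × 1 convolution, the cheap operation a 3 × 3 depthwise convolution producing one ghost map per intrinsic map, batch normalization and the SE module are omitted, and all function and parameter names are hypothetical.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def pointwise_conv(x, weight):
    """1x1 convolution. x: (C_in, H, W); weight: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


def depthwise_conv3x3(x, kernels):
    """Per-channel 3x3 convolution with zero padding. kernels: (C, 3, 3)."""
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += kernels[:, dy, dx, None, None] * padded[:, dy:dy + h, dx:dx + w]
    return out


def ghost_module(x, primary_w, cheap_k, use_relu=True):
    """Intrinsic maps from a 1x1 conv, ghost maps from a cheap depthwise conv."""
    act = relu if use_relu else (lambda t: t)
    intrinsic = act(pointwise_conv(x, primary_w))
    ghost = act(depthwise_conv3x3(intrinsic, cheap_k))
    return np.concatenate([intrinsic, ghost], axis=0)  # doubles the channels


def ghost_bottleneck(x, params):
    """Expand, then reduce (no ReLU after the second module), then add the shortcut."""
    expanded = ghost_module(x, params["w1"], params["k1"], use_relu=True)
    reduced = ghost_module(expanded, params["w2"], params["k2"], use_relu=False)
    return x + reduced  # identity shortcut; needs matching channel counts


# Hypothetical sizes: 4 input channels, expanded to 8, reduced back to 4.
rng = np.random.default_rng(0)
params = {
    "w1": rng.normal(size=(4, 4)), "k1": rng.normal(size=(4, 3, 3)),
    "w2": rng.normal(size=(2, 8)), "k2": rng.normal(size=(2, 3, 3)),
}
x = rng.normal(size=(4, 7, 7))
y = ghost_bottleneck(x, params)
print(y.shape)
```

Only half of the output channels of each module come from a full convolution; the other half are derived from them by the depthwise filter, which is where the savings arise.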

Implementation Details
Our implementation was based on the GhostNet package in Python (version 3.7.1). Our training experiments were performed in a Windows 10 environment on a computer server with the following specifications: CPU Intel Xeon Processor Platinum 8124M at 3.00 GHz, GPU NVIDIA RTX 3060, and 128 GB RAM.

Statistical Analysis
Statistical analyses were performed with R statistical software (R Core Team, 2018) and Origin 9.1 (OriginLab Corporation, United States). Categorical variables were described as frequency (percentage). GhostNet was run for 50 iterations on the data, and the AUC value (with 95% confidence interval) was then calculated. The performance of the prediction model was evaluated with the area under the receiver operating characteristic (ROC) curve, and confusion matrices were plotted for the validation cohort to calculate the accuracy of estimating the response to TACE therapy.
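For readers who want to reproduce this kind of evaluation, the sketch below computes an AUC and a percentile bootstrap 95% confidence interval in Python. It is an assumption-laden illustration: the study performed its statistics in R and does not state how the confidence interval was obtained, so the bootstrap, the function names, and the example scores are all hypothetical.

```python
import numpy as np


def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    # Ties between a positive and a negative score count as half correct.
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        if labels[idx].min() == labels[idx].max():
            continue  # skip resamples that contain a single class
        aucs.append(auc_score(labels[idx], scores[idx]))
    low, high = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(low), float(high)


# Hypothetical labels (1 = objective response) and model scores.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(auc_score(y_true, y_score))  # prints: 0.75
print(bootstrap_auc_ci(y_true, y_score, n_boot=200))
```

The pairwise definition used here is equivalent to the area under the ROC curve and avoids building the curve explicitly, which keeps the example short.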

Multi-Modal Point-of-Care Predicting System
To predict the response to TACE in HCC therapy, we developed a point-of-care system that integrates multi-modal large-scale data from CT imaging and clinical indexes (Figure 1A). This artificial intelligent predicting system can be divided into two parts: CT image-based prediction of the TACE response and clinical index-based evaluation of HCC therapy. GhostNet, which was developed by Huawei Noah's Ark Lab, was employed as the deep learning model for predicting the response of patients with HCC after treatment. The architecture of GhostNet and the flowchart of deep learning for CT images are shown in Figure 1B. GhostNet rests on two constituent concepts. First, the Ghost module generates more feature maps from cheap operations. Second, the Ghost bottleneck is designed to stack Ghost modules. More details can be found in Materials and Methods.
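The saving from generating feature maps with cheap operations can be illustrated with a parameter count. The sketch below follows the accounting given for the Ghost module by Han et al. (2020): an ordinary convolution needs c_in × c_out × k × k weights, whereas a Ghost module with ratio s computes only c_out/s intrinsic maps by convolution and derives the rest with d × d depthwise filters. Biases are ignored, c_out is assumed to be divisible by s, and the layer sizes are hypothetical.

```python
def conv_params(c_in, c_out, k):
    """Weights of an ordinary k x k convolution (biases ignored)."""
    return c_in * c_out * k * k


def ghost_params(c_in, c_out, k, ratio=2, d=3):
    """Weights of a Ghost module producing the same number of output maps."""
    intrinsic = c_out // ratio                # maps from the primary convolution
    primary = c_in * intrinsic * k * k
    cheap = (ratio - 1) * intrinsic * d * d   # depthwise filters for ghost maps
    return primary + cheap


# Hypothetical layer: 64 input channels, 128 output channels, 3 x 3 kernels.
plain = conv_params(64, 128, 3)
ghost = ghost_params(64, 128, 3)
print(plain, ghost, round(plain / ghost, 2))  # prints: 73728 37440 1.97
```

For this layer the Ghost module needs roughly half the weights, and the compression approaches the ratio s as the number of input channels grows.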

Patients Clinical Characteristics
Our retrospective study was approved by the institutional review board and Ethical Committee (KY 2019-217). In total, 399 patients with HCC were enrolled in this study: 319 patients were allocated to the training cohort and 80 patients to the validation cohort.

Classification of the TACE Therapy Response
Four typical CT images with different TACE responses from the validation cohort are shown in Figure 2. According to the mRECIST standards, the responses of patients after TACE treatment were divided into two groups: objective response and non-response. Objective response was defined as disappearance of the tumor or a decrease of at least 30% in the tumor area and the corresponding cross-sectional diameter. After TACE therapy, the reductions in cross-sectional tumor diameter for patients 1 and 2 were 43.9% (from 38.2 to 21.3 mm) and 40.7% (from 31.6 to 12.8 mm), respectively. Therefore, patients 1 and 2 were typical objective responders. Non-response was defined as a decrease of less than 30% in the tumor area and the corresponding cross-sectional diameter, or tumor progression. Patients 3 and 4 belonged to the non-response group. As can be seen, the cross-sectional tumor diameter of patient 3 decreased from 32.5 to 30.7 mm (∼5.5%). For patient 4 in particular, the cross-sectional diameter showed an increasing trend.
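The 30% rule described above can be written as a short function. This is a simplified sketch of the grouping used in this section only; a full mRECIST assessment involves more than a single diameter measurement, and the function name and threshold argument are illustrative assumptions. The diameters of patient 3 are used as the example.

```python
def classify_response(baseline_mm, followup_mm, threshold=0.30):
    """Group a tumor as objective response or non-response by diameter change."""
    if baseline_mm <= 0:
        raise ValueError("baseline diameter must be positive")
    reduction = (baseline_mm - followup_mm) / baseline_mm
    # Disappearance gives a reduction of 1.0; growth gives a negative reduction.
    label = "objective response" if reduction >= threshold else "non-response"
    return label, reduction


# Patient 3 from the text: 32.5 mm before and 30.7 mm after TACE.
label, reduction = classify_response(32.5, 30.7)
print(label, f"{reduction:.1%}")  # prints: non-response 5.5%
```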

Training and Validation of the Point-of-Care Predicting System
The training cohort was augmented with RICAP to reduce overfitting. To increase the robustness of the model, an improved deep convolutional neural network (GhostNet) was used for training. As shown in Figures 3A,B, the accuracy of the improved deep learning model was approximately 98% and the cross-entropy loss was close to 0.4 after training (∼50 training epochs). These results indicated that the improved deep learning model performed well in distinguishing the response to TACE therapy in these cohorts. To evaluate the training effect of the GhostNet-based improved deep learning model, the AUC of the receiver operating characteristic curve was calculated. As shown in Figure 4A, the AUC of the deep learning model was about 0.98. The predictive accuracy of the deep learning model for each patch was also examined with a confusion matrix after 50 training epochs. The numbers of true-positive (TP), false-positive (FP), false-negative (FN), and true-negative (TN) patches were 30, 2, 0, and 48, respectively, as shown in Figure 4B. Hence, the precision, F1 score, and accuracy were 0.94, 0.97, and 0.98, respectively. These results indicated that the improved deep learning model could increase the accuracy and robustness of TACE response prediction.
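The reported precision, F1 score, and accuracy follow directly from the four confusion-matrix counts. The short Python sketch below recomputes them with the standard definitions; the function name is illustrative. With TP = 30, FP = 2, FN = 0, and TN = 48, it gives a precision of 0.9375, a recall of 1.0, an F1 score of about 0.968, and an accuracy of 0.975, which match the reported values after rounding.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard metrics derived from binary confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Counts reported for the validation cohort (Figure 4B).
metrics = classification_metrics(tp=30, fp=2, fn=0, tn=48)
print(metrics)
```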

IBI-Based Predicting of TACE Response
The best response of TACE could not always be achieved after one session of CT imaging, especially for patients with large tumors. In addition, CT imaging could not be repeated frequently because of the harm of ionizing radiation to patients. Therefore, other clinical evaluation indexes should be added to this model to predict the response to TACE therapy efficiently. Recent studies confirmed that IBIs also play a prognostic role throughout the clinical course of malignancies. Figure 5 shows boxplots of the clinical evaluation indexes. As can be seen, the box and median of NLR for objective response were markedly larger than those for non-response. Hence, SIRI had a significant association with the TACE response. The other clinical evaluation indexes were also correlated with the response to some degree. All p values of the factors were less than 0.05, indicating significant differences in the values of the six factors between the response and non-response groups.
To further investigate the correlation of the clinical evaluation indexes with TACE therapy, a correlation heat map was produced by statistical analysis. Figure 6 shows the association between the IBIs and the objective response and non-response groups. A dummy variable was created for the response variable, with 1 representing objective response, and the correlation was then calculated. According to the correlation coefficient heat map, all six factors were correlated with the response to some degree. Among them, SIRI had a significant association, with a correlation of 0.53, whereas AFP had no significant association with the response, with a correlation of 0.16. PLR also had a significant association. Hence, SIRI and PLR could be used to predict the response to TACE therapy.
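The dummy-variable correlation described above can be sketched as follows. The response is coded as 1 for objective response and 0 for non-response, and the Pearson correlation between that coding and an index is computed. The choice of Pearson correlation, the function name, and the example values are assumptions for illustration; the values are not study data.

```python
import numpy as np


def response_correlation(index_values, responses):
    """Pearson correlation between a clinical index and a 0/1 response dummy."""
    dummy = np.array(
        [1.0 if r == "objective response" else 0.0 for r in responses]
    )
    values = np.asarray(index_values, dtype=float)
    return float(np.corrcoef(values, dummy)[0, 1])


# Hypothetical SIRI values for six patients, for illustration only.
siri = [2.1, 1.8, 2.5, 0.9, 1.1, 0.7]
groups = ["objective response"] * 3 + ["non-response"] * 3
r = response_correlation(siri, groups)
print(round(r, 2))
```

Repeating the call for each index yields one row of a correlation matrix, which is the quantity a correlation heat map displays.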

DISCUSSION
In this study, we demonstrated a new artificial intelligent point-of-care multi-modal system for predicting the response to TACE therapy in patients with HCC. By integrating the multi-modal data of CT images and IBIs, an improved deep learning model was employed to formulate precisely the interventional treatment plan for HCC (especially for patients at an intermediate or advanced tumor stage). According to international guidelines, TACE is recommended as the optimal treatment for patients with HCC at the intermediate tumor stage. Predicting the response to TACE therapy is crucial for clinicians to accurately identify the patients who respond. Recent studies showed that CT imaging can be a promising tool for the detection, staging, and risk assessment of HCC (Woodall et al., 2007;Banerjee et al., 2015). In our study, 399 patients with HCC who underwent TACE and had preoperative and postoperative contrast-enhanced CT images and clinical information were enrolled to train a new deep learning model for predicting the response to TACE before the operation. Using the random image cropping and patching method, 5,460 patches, which were cropped and patched from the original CT images, were used to construct a new training set. The accuracy of the improved deep learning model was approximately 98.0%, and the cross-entropy loss was close to 0.4. These results indicated that the improved deep learning model performed well in distinguishing the response to TACE therapy in these cohorts.
However, CT was not applicable to suspicious recurrence or atypical images. In addition, CT imaging could not be repeated frequently because of the harm of ionizing radiation to patients. Hence, it was necessary to find other clinical indexes that could be combined with CT images for TACE response prediction. The included inflammation indexes (such as NLR, MLR, PLR, SII, and SIRI) and AFP, which were derived from peripheral blood counts (e.g., neutrophils, lymphocytes, and platelets) and acute-phase proteins [e.g., C-reactive protein (CRP) and albumin], represented enabling characteristics of tumors. Hence, a challenge remained in how best to integrate multi-modal data such as clinical indexes and CT images into an artificial intelligent system that enables the outlook of patients to be predicted accurately. Our results showed that SIRI and PLR were significantly correlated with the response to TACE therapy.
We showed that this artificial intelligent point-of-care system integrating the multi-modal data of CT images, SIRI, and PLR could be used to precisely predict the response of patients with HCC. However, our study has several limitations. First, this was a retrospective study. Second, external validation was lacking; multi-center prospective data will be collected for external verification of GhostNet performance in a follow-up study. In the future, we will apply other feature selection techniques and clinical indexes (such as genes and proteins) to further improve the accuracy of diagnosis of HCC.

CONCLUSION
In summary, a new artificial intelligent point-of-care multi-modal system based on CT images and clinical evaluation indexes could potentially serve as a new tool for predicting the response to TACE therapy in patients with HCC. The accuracy of this artificial intelligent system was approximately 98.0%, and the cross-entropy loss was close to 0.4. In addition, SIRI and PLR had a significant association with the response to TACE therapy. These results indicated that this system performed well in distinguishing the response to TACE therapy. This multi-modal point-of-care predicting system opens new possibilities to help clinicians select the patients with HCC who are most likely to benefit from the interventional therapy.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusion of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by The Second Affiliated Hospital of Harbin Medical University. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
ZSu and HJ conceived the project, performed the experiments, collected and analyzed the data, performed the analysis, and wrote the manuscript. ZSh, YX, DW, HJ, and ZW performed experiments and/or collected the data. SZ, HJ, LZ, and YD revised the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING

FIGURE 6 | Correlation matrix between clinical evaluation indexes. The closer the absolute value of the correlation coefficient was to 1, the stronger the correlation between variables was. The closer the absolute value of the correlation coefficient was to 0, the weaker the correlation between variables was. The blue circle denotes a positive correlation, while red indicates a negative correlation.