Clinical Value of Machine Learning-Based Ultrasomics in Preoperative Differentiation Between Hepatocellular Carcinoma and Intrahepatic Cholangiocarcinoma: A Multicenter Study

Objective This study aims to explore the clinical value of machine learning-based ultrasomics in the preoperative noninvasive differentiation between hepatocellular carcinoma (HCC) and intrahepatic cholangiocarcinoma (ICC). Methods The clinical data and ultrasonic images of 226 patients from three hospitals were retrospectively collected and divided into training set (n = 149), test set (n = 38), and independent validation set (n = 39). Manual segmentation of tumor lesion was performed with ITK-SNAP, the ultrasomics features were extracted by the pyradiomics, and ultrasomics signatures were generated using variance filtering and lasso regression. The prediction models for preoperative differentiation between HCC and ICC were established by using support vector machine (SVM). The performance of the three models was evaluated by the area under curve (AUC), sensitivity, specificity, and accuracy. Results The ultrasomics signatures extracted from the grayscale ultrasound images could successfully differentiate between HCC and ICC (p < 0.05). The combined model had a better performance than either the clinical model or the ultrasomics model. In addition to stability, the combined model also had a stronger generalization ability (p < 0.05). The AUC (along with 95% CI), sensitivity, specificity, and accuracy of the combined model on the test set and the independent validation set were 0.936 (0.806–0.989), 0.900, 0.857, 0.868, and 0.874 (0.733–0.961), 0.889, 0.867, and 0.872, respectively. Conclusion The ultrasomics signatures could facilitate the preoperative noninvasive differentiation between HCC and ICC. The combined model integrating ultrasomics signatures and clinical features had a higher clinical value and a stronger generalization ability.


INTRODUCTION
Primary liver cancer (PLC) is the second most common cause of cancer-related death worldwide (1,2). The incidence and mortality of PLC are steadily increasing (3), which is a great threat to global public health. Histologically, PLC is divided into hepatocellular carcinoma (HCC), intrahepatic cholangiocarcinoma (ICC), and rare types (less than 1%), such as mixed liver cancer (4). Although HCC and ICC share some similar risk factors and clinical manifestations, they differ in molecular features and carcinogenic mechanism (5). Therefore, the therapeutic decision-making and the prognosis also differ between the two (6). For HCC patients, surgical resection remains the first-line treatment (7). Early ICC is usually asymptomatic and the appearance of clinical symptoms may indicate the spread and metastasis of cancer. ICC generally remains undetected until the late stage. These features of ICC have limited the choices of surgery or liver transplantation for ICC patients. According to international guidelines, accurate differentiation between HCC and ICC is a prerequisite for sufficient first-line therapy of patients (8)(9)(10). Also, the survival and the prognosis of ICC patients are usually worse than those of HCC patients (11). It is critical to differentiate between HCC and ICC before surgery to make correct clinical decisions and prognostic predictions.
Generally, HCC and ICC are diagnosed based on imaging, serological, and pathological evaluations (12). However, the naked eye can identify only limited information, and conventional preoperative imaging evaluation can be highly subjective, varying with the radiologist's experience. It may fail to detect hidden metastases or determine the infiltration scope of the tumor lesions (13). Also, in patients with liver cirrhosis, conventional imaging techniques can hardly differentiate between small lesions of ICC and HCC, because most ICC and HCC lesions share a similar enhancement pattern (14). Given these facts, conventional imaging techniques have only limited application value in these patients. Alpha-fetoprotein (AFP) and carbohydrate antigen 19-9 (CA19-9) are considered the ideal serum tumor markers for HCC and ICC, respectively. However, these two tumor markers are generally unsatisfactory in diagnostic sensitivity or specificity, and a diagnosis based on them alone may be unreliable (15,16). As the risk of cancers increases at the late stage, tumor biopsy does not apply in most situations (17). A preoperative noninvasive method to differentiate between HCC and ICC is therefore urgently needed.
Radiomics is an emerging technology that extracts a large amount of information from medical images with high throughput, for example, shape, grayscale, texture, and wavelet features. Radiomics involves deeper mining, prediction, and analysis of the extracted information to make more accurate diagnoses and tap into the full potential of medical imaging. In recent years, radiomics has been widely applied to tumor diagnosis (18)(19)(20), pathological grading (21,22), vascular invasion and therapeutic evaluation (23,24), and prognostic prediction (25,26). Compared with other imaging techniques, ultrasound has the advantages of low cost, easy operation, immediate result interpretation after examination, and no radiation exposure (27,28). Due to these advantages, the clinical application of ultrasomics is worthy of further investigation. Ultrasomics has been proven useful in the early diagnosis, preoperative grading prediction, efficacy evaluation, and prognosis evaluation of liver tumor, breast tumor, thyroid tumor, gastrointestinal tumor, glioma, and other common tumor diseases (29)(30)(31)(32). However, there are few reports on the preoperative differentiation between HCC and ICC based on ultrasomics. Peng et al. applied ultrasomics to preoperative noninvasive differentiation between the histopathological subtypes of PLC (33). However, their study was confined to a single center and lacked further validation of the findings. The present study was intended to investigate the clinical value of ultrasomics signatures in preoperative differentiation between HCC and ICC. The model performance was also tested on an independent validation set.

Study Population
A multicenter retrospective study involving three hospitals was performed, which was approved by the ethics committee. Informed consent was waived given the retrospective nature of the study. Clinical data and ultrasound images were collected from 2,137 patients pathologically confirmed as HCC or ICC at three hospitals from January 2019 to March 2021. Among them, 226 patients (HCC = 176, ICC = 50) were included in the final analysis. The inclusion criteria were as follows: (1) being pathologically confirmed as HCC or ICC; (2) having received liver ultrasound within 1 month before surgery and the ultrasound images information being intact; (3) having not received antitumor treatments before, including liver transplantation (LT), microwave ablation (MWA), radiofrequency ablation (RFA), and transcatheter arterial chemoembolization (TACE); (4) the ultrasound images satisfying the analytical requirements and the target lesions being totally visible on the ultrasound images; and (5) no history of concurrent malignancies. The flow chart of subject inclusion and exclusion is shown in Figure 1.

Clinicopathological Characteristics of Patients
The clinical data were acquired from the electronic health records, including: demographics (gender, age, history of hepatitis), laboratory tests (AFP, ALT, AST, TB, CB, and UCB) and ultrasound features (size of lesion). The laboratory tests and ultrasound imaging were examined within 1 month before surgery. The patients' pathology information (the pathological diagnosis of HCC or ICC) was obtained from the pathology information system.

Imaging Acquisition and Segmentation
Ultrasound images of liver tumors were collected using the Color Doppler Ultrasound System with a convex array transducer (frequency range 2.5-6 MHz), including GE Logiq E9, GE Vivid E9, HI VISION Ascendus, HI ALOK ProSound A5, Philips EPIQ 5, and Aloka EZU-MT28-S1. All ultrasound scans were performed by ultrasound physicians who had over 5 years of experience in liver ultrasound. At least one original ultrasound image showing the lesion and the same image containing the measurement parameters were stored in the DICOM format.
The open-source software ITK-SNAP v.3.6.0 was used to manually delineate the region of interest (ROI) (34). First, an ultrasound physician with over 9 years of experience loaded the images into the ITK-SNAP software and manually annotated the entire lesion. Another ultrasound physician with 30 years of experience then delineated ROI in the lesions for all ultrasound images. The reproducibility of feature extraction from ROI was evaluated according to the delineation results. Both ultrasound physicians had 4 years of working experience concerning ITK-SNAP software. They were blinded to clinical history and pathology results but were aware of the purpose and design of the study. The ROI segmentation results for the representative liver lesions are shown in Figure 2.

Feature Extraction and Selection
A researcher with 5 years of experience performed image preprocessing to eliminate variability of the ultrasound images arising from the use of different ultrasound equipment at different hospitals and to improve the reproducibility of feature extraction. First, the ultrasound images were normalized based on the mean and standard deviation. Second, the images were resampled by B-spline interpolation to 1 mm × 1 mm pixel. Finally, gray-level discretization was performed for the histogram with the bin width fixed at 25 (35).
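The normalization and gray-level discretization steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function names and the `scale` multiplier are hypothetical, the fixed-bin-width formula follows the common convention of numbering bins from 1 at the lowest occupied bin, and the B-spline resampling step is omitted.

```python
import numpy as np


def normalize_image(image, scale=1.0):
    """Center the image on its mean and divide by its standard deviation.

    `scale` is a hypothetical multiplier (not stated in the text) that
    stretches the normalized range before binning.
    """
    image = np.asarray(image, dtype=float)
    std = image.std()
    if std == 0:
        # A flat image carries no contrast; return zeros instead of dividing by 0.
        return np.zeros_like(image)
    return scale * (image - image.mean()) / std


def discretize_fixed_width(image, bin_width=25.0):
    """Map gray levels to integer bins of equal width, numbered from 1."""
    image = np.asarray(image, dtype=float)
    lowest_bin = np.floor(image.min() / bin_width)
    return (np.floor(image / bin_width) - lowest_bin + 1).astype(int)


# Toy 2 x 2 "image" with gray levels chosen to straddle the bin edges.
raw = np.array([[0.0, 24.0], [25.0, 60.0]])
binned = discretize_fixed_width(raw, bin_width=25.0)
normalized = normalize_image(raw, scale=100.0)
```

With a bin width of 25, gray levels 0 and 24 fall into the same bin, while 25 starts the next one. A fixed bin width keeps the meaning of each bin comparable across images with different intensity ranges.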
The open-source Python package Pyradiomics v.2.1.2 was used to extract ultrasound features from each patient. Since the unit and value range varied for different extracted features, the feature values were of varying scales. To cope with this problem, we performed Z-score normalization before feature selection to ensure a relatively uniform distribution of the image features. However, the extracted feature set was high dimensional, which might cause low computational efficiency and overfitting (37). First, the features with zero variance were excluded by using the variance filtering method. Next, the lasso method was applied for further dimensionality reduction, and the most valuable features were selected. The 10-fold cross-validation process was repeated 1,000,000 times to obtain the optimal value of the parameter λ, which was introduced into the lasso method to calculate the regression coefficient of each feature. Finally, the features with nonzero coefficients were selected. The study workflow is shown in Figure 3.
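The selection chain described above (Z-score normalization, zero-variance filtering, then lasso with a cross-validated penalty) might look roughly like the following scikit-learn sketch. The data are synthetic and all variable names are hypothetical. The sketch runs a single 10-fold cross-validation rather than the repeated procedure reported in the text, and scikit-learn calls the lasso penalty `alpha`.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a patients x features matrix (not study data).
rng = np.random.default_rng(0)
n_patients, n_features = 120, 20
X = rng.normal(size=(n_patients, n_features))
X[:, 5] = 3.0  # a constant feature that the variance filter should drop
# Synthetic labels driven by features 0 and 1 only.
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Step 1: Z-score normalization so all features share a common scale.
X_scaled = StandardScaler().fit_transform(X)

# Step 2: remove features whose variance is exactly zero.
variance_filter = VarianceThreshold(threshold=0.0)
X_kept = variance_filter.fit_transform(X_scaled)
kept_idx = variance_filter.get_support(indices=True)

# Step 3: lasso with the penalty chosen by 10-fold cross-validation.
lasso = LassoCV(cv=10, random_state=0).fit(X_kept, y)

# Step 4: keep only features with a nonzero lasso coefficient,
# reported as column indices of the original matrix.
selected_idx = kept_idx[lasso.coef_ != 0]
```

Mapping the surviving coefficients back through `kept_idx` keeps the selected indices aligned with the original feature table, which matters once earlier filters have removed columns.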

Machine Learning Model Construction and Evaluation
We invoked the Python scikit-learn 0.23.2 package for SVM model training and performance evaluation. The patients from two hospitals were randomly divided into training set and test set by stratified sampling at a ratio of 8:2. The patients from a third hospital were used as an independent validation set. The learning curve and the grid search were used concomitantly to select the optimized parameter combination consisting of the kernel function, coefficient of kernel function, penalty coefficient and class_weight. The specific process of parameter tuning is available in the Supplementary Material 3.
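As an illustration of the split and tuning procedure described above, the following sketch uses scikit-learn with synthetic features. The class counts (41 ICC and 146 HCC among 187 patients) follow the set sizes reported in the Results. The grid values, the inner 5-fold cross-validation, and the `roc_auc` scoring are assumptions, and the learning-curve step is not shown.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the two-hospital cohort: 1 = ICC, 0 = HCC.
rng = np.random.default_rng(0)
y = np.array([1] * 41 + [0] * 146)
# Six hypothetical features; ICC cases are shifted so the classes are separable.
X = rng.normal(size=(y.size, 6)) + 0.8 * y[:, None]

# Stratified 8:2 split keeps the ICC proportion similar in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical grid over the four tuned hyperparameters named in the text.
param_grid = {
    "kernel": ["linear", "rbf"],
    "gamma": ["scale", 0.1],
    "C": [0.1, 1.0, 10.0],
    "class_weight": [None, "balanced"],
}
search = GridSearchCV(
    SVC(),
    param_grid,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)

# Continuous scores on the held-out test set, suitable for ROC analysis.
test_scores = search.decision_function(X_test)
```

With 187 patients and `test_size=0.2`, the split yields 149 training and 38 test cases, matching the set sizes in the study. Including `class_weight` in the grid is one way to handle the imbalance between HCC and ICC cases.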
Three models were constructed in this paper. First, the clinical model was constructed using the patients' clinical data, including gender, age, history of hepatitis, AFP, ALT, AST, TB, CB, UCB, and the size of lesion. Second, an ultrasomics model was constructed using the ultrasomics signatures extracted and selected from the ROI delineated on the ultrasound images of the HCC or ICC patients. Finally, the combined model was built by integrating the clinical features and the ultrasomics signatures. The details of the model construction process can be found in the Supplementary Material 4.
The three models built upon the training set were evaluated using the test set and the independent validation set. The predictive performance of the three models was evaluated by plotting the ROC and estimating the performance indicators, including AUC (along with 95% CI), accuracy, sensitivity, and specificity. An overview of the entire process is shown in Figure 3.
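The threshold-based indicators can be written out as a short worked example. The confusion-matrix counts below are not reported in the paper. They are hypothetical counts, chosen with ICC as the positive class, that are arithmetically consistent with the set sizes (10 ICC and 28 HCC in the test set, 9 ICC and 30 HCC in the independent validation set) and with the sensitivity, specificity, and accuracy reported for the combined model in the abstract.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of positive (ICC) cases detected
    specificity = tn / (tn + fp)  # share of negative (HCC) cases correctly ruled out
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts consistent with the reported values (ICC = positive class).
test_set = binary_metrics(tp=9, fn=1, tn=24, fp=4)        # 10 ICC, 28 HCC
validation_set = binary_metrics(tp=8, fn=1, tn=26, fp=4)  # 9 ICC, 30 HCC

test_rounded = tuple(round(value, 3) for value in test_set)
validation_rounded = tuple(round(value, 3) for value in validation_set)
```

Because only 9 or 10 ICC cases are present in each evaluation set, a single misclassified ICC case changes the sensitivity by about 10 percentage points, which is worth keeping in mind when comparing models.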

Statistical Analysis
SPSS 25.0 software was used for statistical analysis. The normality of continuous variables was tested using the Kolmogorov-Smirnov test. Continuous variables obeying a normal distribution were analyzed by the independent-samples t-test. Otherwise, they were analyzed by Wilcoxon's rank-sum test. The relationships between the categorical variables were tested by using the chi-square test. The continuous variables obeying a normal distribution were expressed as mean ± standard deviation. Otherwise, the continuous variables were expressed by medians [interquartile range (IQR)]. Categorical variables were expressed as n (%). p < 0.05 indicated a significant difference. The Delong test was employed for a quantitative comparison of the ROC among the three models (38).
The reproducibility of feature extraction was evaluated using the intraclass correlation coefficient. A coefficient greater than 0.8 indicated high consistency, 0.5 to 0.79 moderate consistency, and less than 0.5 low consistency (39).
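The text does not state which form of the intraclass correlation coefficient was used. As an illustration only, the sketch below computes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1), for one feature measured on the same lesions by two readers. The function names are hypothetical, and treating a value of exactly 0.8 as high consistency is an assumption.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1) for an (n_subjects x k_raters) array of one feature's values."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand_mean = ratings.mean()
    # Two-way ANOVA decomposition without replication.
    ss_rows = k * np.sum((ratings.mean(axis=1) - grand_mean) ** 2)
    ss_cols = n * np.sum((ratings.mean(axis=0) - grand_mean) ** 2)
    ss_error = np.sum((ratings - grand_mean) ** 2) - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def consistency_level(icc):
    """Apply the thresholds from the text (0.8 itself counted as high: assumption)."""
    if icc >= 0.8:
        return "high"
    if icc >= 0.5:
        return "moderate"
    return "low"


# Toy feature values: the second reader is systematically 1 unit higher.
offset_readers = icc_2_1([[1, 2], [2, 3], [3, 4]])
identical_readers = icc_2_1([[1, 1], [2, 2], [4, 4]])
```

The absolute-agreement form penalizes a systematic offset between readers, so the toy example with a constant shift scores below 1 even though the two readers rank the lesions identically.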

Clinicopathological Characteristics of Patients
The clinicopathological features in the training set, test set, and independent validation set are shown in Table 1. The percentages of ICC patients in the training set, test set, and independent validation set were 20.8% (31/149), 26.3% (10/38), and 23.1% (9/39), respectively. The percentages of patients with a history of hepatitis were 63.8% (95/149), 73.7% (28/38), and 74.4% (29/39), respectively. The average age of patients was 57.2 ± 11.1, 58.7 ± 9.3, and 59.1 ± 11.2 years in the training set, test set, and independent validation set, respectively. The three sets did not differ significantly in demographics, laboratory test results, or ultrasound features (p > 0.05).
Among the extracted features, 330 features with an intraclass correlation coefficient below 0.8 were first excluded. Then 16 features with zero variance were excluded using the variance filtering method. Lasso was used to reduce the dimensionality, which finally resulted in 14 features. The process of lasso feature selection is illustrated in Figure 4, with detailed information shown in the Supplementary Figures S1-S3.

Predictive Performance of the Clinical Model and the Ultrasomics Model
The ROC curves of the clinical model and the ultrasomics model on the training set, test set, and independent validation set are shown in Figures 5A, B and Table 2. The combined model integrating ultrasomics signatures and clinical features had a better performance in differentiation between HCC and ICC than the other two models. The combined model had more stable performance and higher generalization ability (p < 0.05).

DISCUSSION
In clinical practice, physicians depend heavily on clinical symptoms, tumor serum markers, and imaging examination to differentiate between PLC subtypes before surgery. Since HCC and ICC share similar risk factors and clinical manifestations, the routine examination methods may lead to diagnostic mistakes. In the present study, ultrasomics signatures were generated using normalization, variance filtering, and lasso regression, and the prediction models were established by using SVM. The results showed that the ultrasomics signatures were successfully used to differentiate between HCC and ICC on the training set, test set, and the independent validation set (p < 0.05). On the test set, the combined model performed best, followed by the ultrasomics model and then the clinical model, with AUCs of 0.936, 0.843, and 0.711, respectively. On the independent validation set, the combined model still performed best (p < 0.05), but the ultrasomics model performed worse than the clinical model (p < 0.05); the AUCs of the combined, ultrasomics, and clinical models were 0.874, 0.730, and 0.800, respectively. This was probably due to the differences in the type of equipment at diverse hospitals and different habits of using the ultrasound equipment among the physicians.
Medical imaging is an important diagnostic tool and plays an increasingly vital role as precision medicine continues to develop (40). It has been shown that imaging methods based on multimodal imaging techniques can preoperatively differentiate between HCC and ICC to varying degrees. Ichikawa et al. determined the imaging hallmarks for distinguishing intrahepatic mass-forming biliary carcinomas (IMBCs) from HCC, and the diagnostic value was further verified by Bayesian statistics (AUC = 0.960) (41). However, only the radiographic manifestations of patients with good liver function who received surgical treatment were investigated, and such a selection bias might influence the results. Lewis et al. evaluated the ability of quantitative apparent diffusion coefficient (ADC) histogram analysis parameters and LI-RADS category in differentiating between HCC and other subtypes of PLC (42). For the two independent observers, the AUC of the combination of sex, LI-RADS category, and ADC at the fifth percentile for the diagnosis of liver cancer was 0.90 and 0.89, respectively. The result showed that HCC can be better distinguished from ICC and cHCC-ICC by combining the ADC histogram parameters with LI-RADS categorization. However, both the sample size and the number of extracted features in their study were small. None of the studies above proceeded to deep mining and utilization of the radiographic images. As a result, a large number of tumor features and heterogeneity information of the tumor went unheeded.
As a branch of radiomics, ultrasomics has been proven helpful for liver fibrosis evaluation (43), differential diagnosis of liver tumors, and microvascular invasion assessment of HCC (44,45). However, there have been few reports on the use of ultrasomics signatures for the differentiation between HCC and ICC. Peng et al. applied ultrasomics analysis for noninvasive differentiation between the histopathological subtypes of PLC (33). The features were selected by using the Spearman correlation and lasso regression. Then the HCC-vs-non-HCC radiomics model was constructed using a logistic regression algorithm, with an AUC of 0.775 on the test set. However, their findings were not subjected to multicenter validation. In our study, the AUC of the combined model for preoperative differentiation between HCC and ICC was 0.936 and 0.874 on the test set and the independent validation set, respectively, which were higher than those reported in the existing literature. These results suggest that ultrasomics has potential for future clinical use.
However, there were also certain limitations in our study. Firstly, different grayscale ultrasound imaging systems were used to acquire the ultrasound images. Although the images were preprocessed before feature extraction, the use of different equipment for the imaging might affect the feature extraction results. Therefore, whether the established models are robust and universal remains to be further verified by incorporating more data. Secondly, all of the data were collected from consecutive cases in a retrospective manner, leading to inevitable selection bias. Therefore, it is necessary to increase the sample size in future studies. Moreover, the differentiation performance of the ultrasomics signatures and the established models remains to be further verified by a prospective study. Thirdly, only two subtypes of PLC, namely, HCC and ICC, were covered in our study, but the rare subtypes, such as the mixed liver cancer, were not. The data of other subtypes of liver cancer should be included in future studies for optimized universality and clinical value of the models.
Taken together, the clinical value of machine learning-based ultrasomics was confirmed for the preoperative noninvasive differentiation between HCC and ICC. The combined model not only had a better performance in differentiation between HCC and ICC than either the clinical model or the ultrasomics model alone but also had a higher generalization ability.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
The study involving human participants was reviewed and approved by the ethics committees, and written informed consent was waived.

AUTHOR CONTRIBUTIONS
LZ and SR conceived and designed the study. SR, QL, QQ, SD, BM, XL, and YW collected the data. LZ, SR, and SL analyzed the data. SR and LZ wrote the paper. All authors contributed to the article and approved the submitted version.