Can a proposed double-branch multimodality-contribution-aware TripNet improve the prediction performance for microvascular invasion of hepatocellular carcinoma based on small samples?

Objectives To evaluate whether a proposed double-branch multimodality-contribution-aware TripNet (MCAT) can improve the prediction of microvascular invasion (MVI) of hepatocellular carcinoma (HCC) based on a small sample. Methods In this retrospective study, 121 HCCs from 103 consecutive patients were included: 44 MVI positive and 77 MVI negative. An MCAT model was proposed to improve the accuracy of deep neural networks and alleviate the negative effect of small sample size, and its improvement was verified by comparing MCAT with other commonly used deep neural networks, including 2DCNN (two-dimensional convolutional neural network), ResNet (residual neural network) and SENet (squeeze-and-excitation network). Results On validation, the AUC value of MCAT was significantly higher than that of 2DCNN based on CT, MRI, and both imaging modalities (P < 0.001 for all). The AUC value of the model with single-branch pretraining based on small samples was significantly higher than that of the model with end-to-end training in the CT branch and the double branch (0.69 vs 0.62, P = 0.016; 0.83 vs 0.65, P = 0.010, respectively). The AUC value of the double-branch MCAT based on both CT and MRI imaging (0.83) was significantly higher than that of the CT branch MCAT (0.69) and the MRI branch MCAT (0.73) (P < 0.001 and P = 0.03, respectively), and also significantly higher than those of the commonly used ResNet (0.67) and SENet (0.70) models (P < 0.001 and P = 0.005, respectively). Conclusion The proposed double-branch MCAT model based on a small sample can improve effectiveness compared with other deep neural networks or single-branch MCAT models, providing a potential solution for scenarios such as small-sample deep learning and the fusion of multiple imaging modalities.


Introduction
As one of the most common primary liver malignancies, hepatocellular carcinoma (HCC) is the third leading cause of tumour-related deaths worldwide (1,2). Treatment options for HCC, such as surgical resection and transplantation, have improved consistently in recent decades. However, owing to high recurrence rates, both early recurrence and long-term prognosis remain unsatisfactory (3). Among several factors, such as histological grade and tumour size, previous studies (4,5) have confirmed that microvascular invasion (MVI) is a key factor for early recurrence and poor long-term prognosis in HCC patients treated by resection or transplantation (6,7). However, MVI is difficult to evaluate preoperatively, so a noninvasive, highly accurate tool for assessing the presence or absence of MVI in HCC patients is needed to inform preoperative treatment decisions.
Previous studies (8)(9)(10)(11) have attempted to predict the presence of MVI based on either computed tomography (CT) or MRI imaging features alone. Although their results were promising, several limitations still negatively affected diagnostic performance and clinical applicability to some degree. For example, considerable interobserver variability was found in the MRI assessment of MVI in HCC (12), even among experienced radiologists, owing to the inevitable subjective bias of individual image analysis. In addition, few studies have attempted to predict MVI in HCC using multi-phase CT and multi-sequence MRI together. Previous studies have separately found that CT-based features such as tumour margin (13) and MRI-based features such as the ADC value (8) and peritumoral hypointensity in the hepatobiliary phase (HBP) (14) correlate moderately to highly with the presence of MVI in HCC. Because MVI in HCC co-exists with diverse radiologic features, it is necessary to investigate whether the noninvasive prediction of MVI in HCC can be improved by combining CT and MRI imaging features rather than relying on a single imaging modality.
With the rapid development of machine learning, recent studies have explored the potential of machine learning, including deep learning (15,16) and radiomics (17,18), for predicting MVI in HCC. Using various deep learning models, several studies found that deep learning could achieve moderate diagnostic accuracy, ranging from 0.66 to 0.76, on CT or MRI imaging alone (19-21). However, further improvement in the prediction accuracy of deep learning models for MVI in HCC is hampered by several factors, chief among them the limited amount of well-annotated medical imaging data. It is widely held that at least ten thousand samples are required for a deep learning model to achieve relatively optimal training and validation results. Yet even the largest reported sample size, 750 cases in a published multi-centre study (19), falls far short of this requirement. Moreover, substantially increasing the amount of well-annotated medical imaging data to meet the requirements of deep learning is genuinely difficult, given the shortage of qualified imaging data and the time-consuming nature of careful annotation. In addition, common deep learning models give the same weight to every sequence channel when addressing medical diagnostic problems, rather than weighting sequences by their importance in the way radiologists do. In principle, incorporating radiologists' diagnostic experience into the construction of deep neural networks could improve their performance and largely alleviate the problems caused by insufficient training samples.
To address these problems, we propose a new double-branch multimodality-contribution-aware TripNet (MCAT) model. In this model, data augmentation and metric learning techniques are applied to overcome the negative effect of small sample size, and radiologists' diagnostic experience is incorporated through modal (sequence) attention schemes to improve the accuracy of the deep neural network and further alleviate the effect of insufficient training samples. We hypothesized that the MCAT model could improve the prediction accuracy of MVI in HCC compared with other commonly used deep neural networks on small samples. Therefore, this study aimed to investigate whether the MCAT model improves the prediction of MVI in HCC compared with other deep neural networks, based on multi-modality CT and MRI data with small samples.

Patients
This retrospective study was approved by the institutional Human Ethics Committee, and the requirement for written informed consent was waived. From January 2015 to December 2020, 302 consecutive patients underwent dynamic contrast-enhanced CT (CE-CT) and/or dynamic contrast-enhanced MRI (DCE-MRI) with other conventional MRI sequences to evaluate HCC in the Department of Radiology, Beijing Friendship Hospital. The inclusion criteria were as follows: (1) four-phase liver DCE-MRI images were available, including precontrast images and those of the arterial, portal venous, and equilibrium phases, or three-phase CE-CT images were available, including precontrast images and those of the arterial and portal venous phases (PVP); (2) the pathologic MVI status of the HCC was obtained from surgical resection; and (3) no previous treatment, such as percutaneous ethanol injection, radiofrequency ablation, or transcatheter arterial chemoembolization, had been given. The exclusion criteria were as follows: (1) inaccurate phase timing; (2) use of a hepatobiliary contrast agent for MRI; (3) an interval longer than 4 weeks between the CT or MRI examination and resection; and (4) prominent artifacts that affected the observation of HCCs. See the flow chart in Figure 1 for details.

Image acquisition
Liver CE-CT scans were acquired on various multidetector CT scanners. CT images were obtained before and after administration of contrast agent, during the arterial phase (AP), portal venous phase (PVP) and/or equilibrium phase; the plain (precontrast), AP and PVP images were used for analysis. All DCE-MRI examinations were performed on 3.0-T scanners. Three MRI sequences were used for analysis: T2-weighted imaging (T2WI), diffusion-weighted imaging (DWI) and T1-weighted imaging (T1WI), the last including imaging before and after intravenous injection of diethylenetriaminepentaacetic acid (DTPA) at the AP, PVP, and equilibrium phase (EP). Details of the image acquisition protocols are given in Supplementary File 1.

Histopathologic MVI diagnosis of HCC
Histopathologic examination of surgical specimens was performed at each site by two experienced pathologists who were unaware of the patients' radiologic findings and clinical history. MVI of HCC was defined as the presence of tumour thrombus in small peritumoral vessels (portal vein, hepatic vein or large capsular vessels lined with surrounding liver tissue) that was detectable only under microscopy. Any disagreements were resolved by consensus. Details of the histopathologic diagnosis of MVI are provided in Supplementary File 2.
Double-branch multimodality-contribution-aware TripNet based on small samples
Because the data comprise three forms (CT only, MRI only, and mixed CT and MRI), a double-branch multimodality-contribution-aware TripNet based on small samples was adopted. For each modality of the CT and MRI images, the 2D slice with the largest lesion area was used for ROI (region of interest) extraction and greyscale normalization. The segmentation boundaries were drawn with ITK-SNAP software (https://www.radiantviewer.com) slice by slice for each volume along the visible borders of the lesion. For clarity, the whole process is divided into three parts. The first part is the establishment of a multimodality-channel contribution-aware single-branch TripNet, which consists of a feature embedding module and an evaluation module and is trained separately on pure CT data and pure MRI data. In the feature embedding module, multimodal channel adaptive weighted modules (MAWM) are added to weight the features of each modal channel in the final classification, analogous to the prior knowledge radiologists use when judging the importance of different modalities in clinical diagnosis. For example, radiologists consider arterial phase CT images to be more important in diagnosing MVI in hepatocellular carcinoma, so the arterial phase channels are given more weight in the MAWM. In the second part, single-branch pretraining based on small samples is added to each multimodality-channel contribution-aware single-branch TripNet to form the single-branch networks, namely the CT branch network and the MRI branch network. This pretraining consists of two stages: feature embedding pretraining and model fine-tuning. Data augmentation and metric learning (22,23) are added in the feature embedding pretraining stage to address the small-sample problem.
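The MAWM is described above only by its purpose (its exact formulation is in Supplementary File 3), and the Discussion notes that MCAT builds on the SENet channel-attention idea. The following is a minimal sketch, assuming an SE-style design: each modality channel is squeezed by global average pooling, scored by a learned gate, shifted by a log prior that encodes radiologist-assigned importance, and normalised with a softmax. The names `modality_weights`, `reweight`, `gate` and `prior`, and the prior values in the usage note, are hypothetical illustrations rather than the authors' implementation.

```python
import math
from typing import Dict, List

FeatureMap = List[List[float]]  # one 2D feature map (H x W) per modality channel


def squeeze(fmap: FeatureMap) -> float:
    """Global average pooling: summarise one channel by its mean activation."""
    values = [v for row in fmap for v in row]
    return sum(values) / len(values)


def modality_weights(
    channels: Dict[str, FeatureMap],
    gate: Dict[str, float],
    prior: Dict[str, float],
) -> Dict[str, float]:
    """Hypothetical MAWM-style weighting (an assumption, not the paper's formula).

    score_c = gate_c * squeeze(x_c) + log(prior_c); weights = softmax(scores).
    `gate` stands in for learned excitation parameters; `prior` encodes
    radiologist-assigned importance (e.g. a larger value for the arterial phase).
    """
    scores = {c: gate[c] * squeeze(x) + math.log(prior[c]) for c, x in channels.items()}
    top = max(scores.values())  # subtract the max before exp for numerical stability
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exp.values())
    return {c: e / total for c, e in exp.items()}


def reweight(channels: Dict[str, FeatureMap], weights: Dict[str, float]) -> Dict[str, FeatureMap]:
    """Scale every activation of each modality channel by that channel's weight."""
    return {c: [[v * weights[c] for v in row] for row in x] for c, x in channels.items()}
```

For example, with three identical CT phase channels (`plain`, `arterial`, `portal`), unit gates and a prior of 2.0 for the arterial phase and 1.0 elsewhere, the arterial channel receives a weight of 0.5 and the other two 0.25 each. Adding the prior as a log-bias before the softmax lets it act as a starting preference that the learned gates can still override.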
In the third part, the CT branch network and the MRI branch network are weighted and fused in a set proportion, the parameters of both branches are further updated with mixed CT and MRI data, and the double-branch multimodality-contribution-aware TripNet based on small samples is finally obtained. See Figure 2, and Supplementary File 3 for the detailed mechanism and formulas.
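The text specifies only that the two branches are fused according to a set proportion. A minimal sketch, assuming late fusion of the branch output probabilities with a hypothetical CT weight `alpha`, shows how a single model can still score CT-only and MRI-only lesions by falling back to whichever branch is available. The actual MCAT fusion may instead operate on features, and the subsequent joint fine-tuning on mixed data is not shown; `fuse_branches` is an illustrative name.

```python
from typing import Optional


def fuse_branches(p_ct: Optional[float], p_mri: Optional[float], alpha: float = 0.5) -> float:
    """Hypothetical late fusion of CT and MRI branch outputs.

    alpha is an assumed weight for the CT branch; the MRI branch gets 1 - alpha.
    A lesion imaged with only one modality falls back to that branch alone,
    mirroring the CT-only, MRI-only and mixed subsets of the data.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if p_ct is None and p_mri is None:
        raise ValueError("at least one modality is required")
    if p_mri is None:
        return p_ct  # CT-only lesion
    if p_ct is None:
        return p_mri  # MRI-only lesion
    return alpha * p_ct + (1.0 - alpha) * p_mri  # mixed CT and MRI lesion
```

With `alpha=0.5`, branch probabilities of 0.2 (CT) and 0.6 (MRI) fuse to 0.4. Keeping `alpha` as an explicit parameter makes the "set proportion" easy to tune or to learn during the mixed-data update.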

Statistical analysis
Comparison method
To evaluate the effectiveness of the proposed method in the diagnosis and evaluation of MVI in hepatocellular carcinoma (HCC), three groups of comparative experiments were conducted: comparison with other deep neural networks, comparison with an end-to-end training model, and comparison between the double-branch network and the single-branch networks. All three comparisons used the same data and the same conditions.

Evaluation indicators
The evaluation indexes adopted in this paper are accuracy, sensitivity, precision and F1 score, computed with a default probability threshold of 0.5. The receiver operating characteristic (ROC) curve was drawn for the prediction results of each model, and the area under the curve (AUC) was used to evaluate diagnostic quality for MVI. AUC values were compared with the Z test, and P values less than 0.05 were considered statistically significant. The formulas for some of these indexes are given in Supplementary File 4.
FIGURE 1 Flow chart of patient selection. HCC, hepatocellular carcinoma; MVI, microvascular invasion; TACE, transcatheter arterial chemoembolization; RFA, radiofrequency ablation; PEI, percutaneous ethanol injection.
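The evaluation indexes above can be computed as in the following minimal sketch. The paper's own formulas are in Supplementary File 4, so these standard definitions, and the use of the Mann-Whitney form of the AUC (the probability that a randomly chosen MVI-positive lesion scores higher than a randomly chosen MVI-negative one), are assumptions about what that file contains. A case is called positive when its probability exceeds the threshold, consistent with the decision rule reported in the Results.

```python
from typing import Dict, Sequence


def threshold_metrics(y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, sensitivity, precision and F1 at a fixed probability threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p > threshold else 0  # MVI positive if probability > threshold
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on MVI-positive cases
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * sensitivity / (precision + sensitivity) if precision + sensitivity else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney statistic; tied positive/negative pairs count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

For labels `[1, 1, 0, 0, 0]` and probabilities `[0.8, 0.4, 0.6, 0.2, 0.1]`, the sketch gives an accuracy of 0.6, sensitivity, precision and F1 of 0.5 each, and an AUC of 5/6, since five of the six positive-negative pairs are ranked correctly.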

Results
A total of 103 patients with 121 pathologically confirmed HCCs were included in this study: 77 negative and 44 positive for MVI. The data consisted of 49 CT images (18 positive/31 negative), 32 MR images (12 positive/20 negative) and 40 mixed CT and MRI images (14 positive/26 negative) of HCCs. The demographic and pathological information of the patients is summarized in Table 1. For each network, the data were split into training and validation sets at a ratio of 4:1. The training set was expanded by data augmentation during single-branch pretraining based on small samples, whereas the validation set used the original, unaugmented samples. Each network was evaluated with 5-fold cross-validation.
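The split described above can be sketched as follows, assuming stratification by MVI status so that each validation fold keeps roughly the 44:77 positive-to-negative ratio, and assuming a hypothetical augmentation set of horizontal and vertical flips plus a 90-degree rotation (the paper does not list its transforms). Augmentation is applied only to the training folds, matching the statement that the validation set used the original samples. In practice the CT-only, MRI-only and mixed subsets would presumably each be split this way; images are plain nested lists here to keep the example dependency-free, and all function names are illustrative.

```python
import random
from typing import Iterator, List, Tuple

Image = List[List[float]]
Sample = Tuple[Image, int]


def stratified_folds(labels: List[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Assign sample indices to k folds, dealing out each class round-robin."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def hflip(img: Image) -> Image:
    return [row[::-1] for row in img]


def vflip(img: Image) -> Image:
    return [row[:] for row in img[::-1]]


def rot90(img: Image) -> Image:
    """Rotate a 2D slice 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def augment(img: Image) -> List[Image]:
    """Hypothetical augmentation set: the original slice plus three geometric variants."""
    return [img, hflip(img), vflip(img), rot90(img)]


def cross_validation_sets(
    images: List[Image], labels: List[int], k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[Sample], List[Sample]]]:
    """Yield (augmented training set, original validation set) for each of k folds."""
    folds = stratified_folds(labels, k, seed)
    for v, val_idx in enumerate(folds):
        train_idx = [i for f, fold in enumerate(folds) if f != v for i in fold]
        # Only the training folds are expanded; validation keeps original samples.
        train = [(aug, labels[i]) for i in train_idx for aug in augment(images[i])]
        val = [(images[i], labels[i]) for i in val_idx]
        yield train, val
```

With 44 positive and 77 negative lesions, each validation fold holds 8 or 9 positives, and each training fold is four times larger after augmentation than before it.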

Comparison with other deep neural networks
The results are shown in

Comparison with end-to-end training model
In the same experimental setting, end-to-end training was compared with single-branch pretraining based on small samples in each branch. The results are shown in Table 3. The AUC value of the model with single-branch pretraining based on small samples was higher than that of the model with end-to-end training in the CT branch, the MRI branch and the double branch (0.62 vs 0.69, 0.68 vs 0.73 and 0.65 vs 0.83, respectively). Of these, the differences in the CT branch and the double branch were statistically significant (Z=2.41, p=0.016 and Z=2.54, p=0.010, respectively). To demonstrate the effectiveness of single-branch pretraining based on small samples intuitively, we also visualized the feature embedding spaces of the CT branch and MRI branch obtained by the two training methods, as shown in Figure 4.
FIGURE 2 Flow chart of CT&MRI double-branch multimodality-contribution-aware TripNet. MAWM, multimodal channel adaptive weighted modules. The whole model consists of three steps. The first step establishes the multimodality-channel contribution-aware single-branch TripNet; its key feature is the integration of radiologists' prior knowledge through the MAWM. Taking into account the importance of different modalities in clinical diagnosis, the network can adaptively amplify features from important modal channels and reduce attention to features from unimportant ones. In the second step, a single-branch two-stage training strategy, including data augmentation and metric learning, is added to address the small-sample problem, yielding the CT branch network and the MRI branch network. In the third step, the CT branch and MRI branch networks are fused and updated with mixed CT and MRI data to obtain the final double-branch multimodality-contribution-aware TripNet.

Comparison between double branch network and single branch networks
The comparison results between the double-branch network and the single-branch networks are shown in Table 2. ROC curves and AUC values of the CT branch network, MRI branch network and double-branch network in the 5-fold cross-validation experiment were plotted and calculated; the results are shown in Figure 5. The average AUC value of the double-branch network was significantly higher than that of the CT branch network (Z=3.39, p<0.001) and that of the MRI branch network (Z=2.18, p=0.029). However, there was no significant difference in average AUC between the CT branch network and the MRI branch network (Z=0.934, p=0.350).
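The reported Z statistics and P values are related by the two-sided standard-normal tail, p = 2(1 - Φ(|Z|)), which can be checked with the following sketch. The paper does not state which Z test for AUCs it used; the `auc_z` helper shows one common form (Hanley-McNeil style), Z = (AUC1 - AUC2) / sqrt(SE1² + SE2² - 2r·SE1·SE2), as an assumption, and its standard errors and correlation `r` are inputs the paper does not report.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value of a standard-normal statistic: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def auc_z(auc1: float, se1: float, auc2: float, se2: float, r: float = 0.0) -> float:
    """Z statistic for the difference between two AUC estimates (assumed test form).

    r is the correlation between the two AUC estimates; r = 0 treats them as
    independent, while models evaluated on the same lesions are usually correlated.
    """
    return (auc1 - auc2) / math.sqrt(se1 ** 2 + se2 ** 2 - 2.0 * r * se1 * se2)
```

For instance, `two_sided_p(2.18)` is about 0.029 and `two_sided_p(0.934)` about 0.350, matching the values reported for the double-branch versus MRI branch and CT versus MRI branch comparisons.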
The diagnostic results of two MVI-positive lesions (No. 108 and No. 12) in the three networks (CT branch, MRI branch and double-branch networks) are presented. A lesion was classified as MVI positive when the predicted probability of positive MVI was greater than 0.5. Lesion No. 108 was correctly diagnosed by the double-branch and MRI branch networks but misdiagnosed by the CT branch network (predicted probabilities of positive MVI 0.7, 0.7 and 0.1, respectively). Lesion No. 12 was correctly diagnosed by the double-branch and CT branch networks but misdiagnosed by the MRI branch network (predicted probabilities 0.6, 0.8 and 0.4, respectively). CT and MRI images and pathological results of these two lesions are shown in Figure 6.

Discussion
Compared with other deep neural networks, the proposed double-branch MCAT model significantly improved the prediction accuracy of MVI in HCC on small samples, through modal (sequence) attention schemes and the fusion of CT and MRI images.
SENet (25) was proposed in 2017 as a representative channel attention mechanism built on ResNet and CNN architectures. It introduces a novel architectural unit, termed the "Squeeze-and-Excitation" (SE) block, which reweights features according to the importance of the different channels of the feature maps. Because different channel-wise features contribute differently to different clinical problems, such as the diagnosis of HCC or the evaluation of MVI in HCC, some studies have applied the SE block to medical image analysis. In the study by Chen et al. (27), an SE block introduced into a traditional CNN achieved a good performance improvement in classifying benign and malignant pulmonary nodules, with a maximum AUC value of 0.930. In the study by Jia et al. (28) on HCC pathologic grades, an SE block was combined with a residual block to calculate the contribution of multiple MRI sequences, such as T2WI and T1WI, with the aim of suppressing the influence of unimportant MRI sequences while attending to important ones. However, although several studies (19-21) have focused on predicting MVI in HCC with traditional CNNs or ResNet, few studies have applied the SE block to improve prediction accuracy. ECANet (26) and MCAT are both optimized on the basis of SENet: ECANet is an efficient channel attention mechanism, whereas MCAT is a multi-modal (sequence) contribution-aware mechanism based on modal (sequence) attention. ECANet is currently used mainly in computer vision tasks such as image classification and segmentation, and has also been applied to electrocardiogram classification (29) and image reconstruction (30).
Histograms of accuracy, sensitivity, precision and F1 score of different networks in the CT branch, MRI branch and double-branch models. Bar heights represent the mean accuracy, sensitivity, precision and F1 score of each deep neural network in each branch; the short black line at the top of each bar represents the standard deviation of the corresponding mean.
AUC, area under the receiver operating characteristic curve; a, comparison of the AUC value of the CT branch between end-to-end training and single-branch pretraining; b, the same comparison for the MRI branch; c, the same comparison for the double branch; end-to-end training, training without single-branch pretraining; single-branch pretraining, the strategy for the small-sample problem, comprising feature embedding pretraining (data augmentation and metric learning) and model fine-tuning.
Comparison between the feature space obtained by single-branch pretraining based on small samples and that obtained by end-to-end training. Each point represents a lesion; blue represents MVI-negative lesions and orange represents MVI-positive lesions. Panels (A, D) show the original feature spaces of the CT branch and MRI branch, in which blue and orange points are irregularly mixed. Panels (B, E) show the feature spaces of the CT branch and MRI branch after end-to-end training: the points begin to cluster but remain mixed. Panels (C, F) show the feature spaces of the CT branch and MRI branch after single-branch pretraining based on small samples: points of the same colour cluster together and blue and orange points are largely separated, improving discrimination between MVI-negative and MVI-positive lesions.
As popular approaches to the small-sample problem, both data augmentation and metric learning were used in the proposed MCAT model to improve prediction accuracy (28,31). In principle, data augmentation enriches sample diversity and addresses the small-sample problem at the data level, while a metric embedding loss makes the learned feature space more discriminative and addresses the problem at the feature level (22,23). In our study, combining data augmentation and metric learning to address the small-sample problem yielded remarkable results. Wei et al. (19) included 750 cases and used a ResNet18 deep learning model without data augmentation or metric learning; the AUC values for CE-CT and EOB-MRI were 0.734 and 0.802, respectively. Our study included a total of 121 samples, with only 40 cases available for the double branch, yet achieved an AUC value of 0.83, which is higher than the previously reported results obtained with a large sample size. This suggests that data augmentation and metric learning could help overcome a small sample size without compromising performance. Moreover, our study shows that, compared with end-to-end training, single-branch pretraining visibly improves the diagnostic performance of the model, effectively reduces the standard deviation of each evaluation index, and makes the trained model both better and more stable. For deep learning, sample sizes on the order of 100 or 1000 are small, yet in medical imaging even more than 100 well-annotated images are difficult to obtain.
Therefore, data augmentation and metric learning can substantially alleviate the problem of small sample sizes in the medical field.
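The name TripNet and the cited metric learning work (22,23) suggest a triplet-based embedding loss, although the exact loss is given only in Supplementary File 3. As an assumption, the sketch below uses the standard triplet margin loss, max(0, d(a, p) - d(a, n) + margin), which pulls a lesion's embedding towards lesions of the same MVI status and pushes it away from lesions of the opposite status; this is how a metric embedding loss makes the feature space more discriminable. The function names and the default margin of 1.0 are hypothetical, and the batch version simply enumerates every valid triplet, which is only practical for small batches.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float], negative: Sequence[float], margin: float = 1.0) -> float:
    """Standard triplet margin loss: zero once the negative is `margin` farther than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def batch_triplet_loss(embeddings: List[Sequence[float]], labels: List[int], margin: float = 1.0) -> float:
    """Average loss over all (anchor, positive, negative) triplets in a small batch."""
    losses = []
    n = len(labels)
    for a in range(n):
        for p in range(n):
            if p == a or labels[p] != labels[a]:
                continue  # positive must be a different lesion with the same MVI status
            for q in range(n):
                if labels[q] == labels[a]:
                    continue  # negative must have the opposite MVI status
                losses.append(triplet_loss(embeddings[a], embeddings[p], embeddings[q], margin))
    return sum(losses) / len(losses) if losses else 0.0
```

An anchor at (0, 0) with a positive at (0, 1) incurs no loss against a negative at (3, 0), but a loss of 1.0 against a negative at (1, 0), because that negative is no farther away than the positive. Well-separated MVI-negative and MVI-positive clusters therefore give a batch loss of zero.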
Additionally, in our study, the double branch using CT and MRI images was more effective than the single CT or MRI branch alone, with AUCs of 0.83, 0.69 and 0.73, respectively. A possible explanation is that CT and MRI are different imaging techniques with different advantages in evaluating MVI. Wei et al. (19) built ResNet18 models based on CE-CT and EOB-MRI images and found that the EOB-MRI-based model outperformed the CE-CT-based model, with AUC values of 0.802 and 0.734, respectively. In addition, Hu et al. showed that CT is superior to MRI in evaluating tumour margins (13). CT and MRI each have their own imaging characteristics: CT has higher spatial resolution, whereas MRI has higher soft-tissue contrast. Therefore, CT and MRI images may provide complementary texture features to deep learning models, and using these features together may improve diagnostic efficiency. Moreover, our model can accept CT, MRI, or both as input to predict MVI of HCC, which may give it broader clinical applicability, because in practice patients cannot always undergo the imaging modalities that a model requires.
Our study also has some limitations. First, the number of HCCs in the present study is limited, which may affect the generalization of the deep learning model; however, we adopted data augmentation and metric learning to mitigate this problem. Second, MVI grade was not considered in the MVI-positive group. In addition, MRI images acquired with other contrast agents, including Gd-EOB-DTPA-enhanced images in the hepatobiliary phase, were not compared or evaluated in this study; we will address them in future work.
FIGURE 5 ROC curves of the different diagnostic branch models in the 5-fold cross-validation. (A) AUC of the CT branch; (B) AUC of the MRI branch; (C) AUC of the MIX branch. ROC, receiver operating characteristic curve; AUC, area under the receiver operating characteristic curve.
In conclusion, our study indicates that the double-branch MCAT based on small samples can improve prediction effectiveness compared with other deep neural networks, providing a potential solution for scenarios such as small-sample deep learning and the fusion of different imaging technologies.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.

Author contributions
Study concept and design: DY, ZY. Acquisition of data: YD, XJ. Analysis and interpretation of data: DY, ZY, YD, XJ. Drafting