Differentiation of benign and malignant parotid gland tumors based on the fusion of radiomics and deep learning features on ultrasound images

Objective The pathological classification and imaging manifestations of parotid gland tumors (PGTs) are complex, while accurate preoperative identification plays a crucial role in clinical management and prognosis assessment. This study aims to construct and compare the performance of clinical models, traditional radiomics models, deep learning (DL) models, and deep learning radiomics (DLR) models based on ultrasound (US) images in differentiating between benign parotid gland tumors (BPGTs) and malignant parotid gland tumors (MPGTs). Methods A retrospective analysis was conducted on 526 patients with surgically confirmed PGTs, who were randomly divided into a training set and a testing set at a ratio of 7:3. Traditional radiomics and three DL models (DenseNet121, VGG19, ResNet50) were employed to extract handcrafted radiomics (HCR) features and DL features, followed by feature fusion. Seven machine learning classifiers, including logistic regression (LR), support vector machine (SVM), RandomForest, ExtraTrees, XGBoost, LightGBM and multi-layer perceptron (MLP), were combined to construct predictive models. The optimal model was integrated with clinical and US features to develop a nomogram. Receiver operating characteristic (ROC) curves were employed to assess the performance of the various models, while clinical utility was assessed by decision curve analysis (DCA). Results The DLR model based on ExtraTrees demonstrated superior performance, with AUC values of 0.943 (95% CI: 0.918-0.969) and 0.916 (95% CI: 0.861-0.971) for the training and testing sets, respectively. The combined DLR nomogram (DLRN) further enhanced performance, with AUC values of 0.960 (95% CI: 0.940-0.979) and 0.934 (95% CI: 0.876-0.991) for the training and testing sets, respectively. DCA indicated that the DLRN provided greater clinical benefits than the other models.
Conclusion The DLRN based on US images shows exceptional performance in distinguishing BPGTs from MPGTs, providing more reliable information for personalized diagnosis and treatment planning in clinical practice.


Introduction
The parotid gland is a vital exocrine organ and the primary site of salivary gland tumors. Parotid gland tumors (PGTs) account for approximately 3-12% of head and neck neoplasms, and 80% of all salivary gland tumors occur in this location (1,2). The majority of these tumors are benign, comprising around 75% to 80%, with pleomorphic adenomas (PA) and Warthin tumors being the most common types, followed by basal cell adenomas (BCA). Mucoepidermoid carcinoma (MEC) is the most frequent of the malignant parotid gland tumors (MPGTs), followed by adenoid cystic carcinoma (ACC) and acinic cell carcinoma (3,4). The pathological subtypes of PGTs are complex, and accurate discrimination between benign and malignant PGTs is crucial for clinical management and prognosis assessment. For most benign parotid gland tumors (BPGTs), partial gland resection or simple tumor resection suffices (5). However, MPGTs often require more aggressive interventions such as total parotidectomy with potential lymph node dissection, complemented by radiotherapy if deemed necessary (6).
Preoperative auxiliary diagnosis of PGTs primarily involves two methods: fine-needle aspiration cytology (FNAC) and imaging examination. FNAC is currently widely utilized as an adjunctive diagnostic tool, exhibiting an accuracy rate ranging from 85% to 97% in distinguishing between BPGTs and MPGTs (7). However, due to the limited sample size, it may not fully represent the overall characteristics of the tumor, leading to inconclusive diagnoses (8). Furthermore, FNAC is an invasive procedure that carries risks of tumor cell seeding (implantation metastasis) and of inducing parotitis (9). Currently employed imaging techniques for parotid examination include ultrasound (US), computed tomography (CT), and magnetic resonance imaging (MRI). CT can effectively illustrate the relationship between the tumor and surrounding tissue structures. MRI offers high soft-tissue resolution, enabling assessment of nerve invasion by tumors. Nevertheless, their clinical application is restricted by ionizing radiation exposure, high costs, and various contraindications (10,11). In comparison, US is noninvasive, provides real-time imaging, and costs less. It provides comprehensive information regarding the location, size, shape, margin, and blood supply of tumors; hence, it is considered the preferred preoperative imaging method for evaluating PGTs (12). Nonetheless, the US features of PGTs partially overlap, and interpretation of US findings may vary depending on operator experience, resulting in discrepancies in diagnostic outcomes (13).
Radiomics is a field that emerged from the convergence of artificial intelligence (AI) and medical imaging. It enables the high-throughput extraction of potential features from medical images that are imperceptible to the human eye, which can be transformed into visual data for quantitative analysis (14). By utilizing machine learning models, radiomics facilitates non-invasive assessment of various biological behaviors associated with tumors, making it widely applicable in early diagnosis, prognosis prediction, and treatment evaluation (15-17). While several scholars have conducted radiomics research on PGTs using CT and MRI images (18-20), there is limited literature based on US images (21).
In recent years, the rapid development of AI has led to the widespread application of deep learning (DL) in various medical fields. Among the different types of DL architectures, convolutional neural networks (CNNs) have emerged as the most commonly used approach (22). Compared to traditional radiomics, DL neural networks, with their multi-layer structure, can automatically learn semantic and spatial features from hidden layers, enabling end-to-end mapping from input to output. This capability has shown promise in improving tumor classification performance (23-25). Yu et al. (26) developed multiple DL models based on multi-center CT images to assist in diagnosing BPGTs and MPGTs, and found that MobileNet V3 exhibited the best predictive performance. When compared to the traditional radiomics SVM model, MobileNet V3 demonstrated a significant increase in sensitivity of 0.111 and 0.207 for the internal and external test sets, respectively (P < 0.05). The utilization of these models resulted in notable improvements in clinical benefit and overall efficiency for less experienced radiologists.
Traditional radiomics methods have complex workflows and rely primarily on manually defined features, which may not fully capture the inherent heterogeneity within lesions. Although DL can automatically learn more comprehensive features, its algorithms are abstract and less interpretable. Since radiomics and DL features have their own distinct advantages and limitations, their integration offers complementary information, making it a prominent research direction in recent years. To our knowledge, no existing research has utilized fusion models of radiomics and DL features to differentiate BPGTs from MPGTs on any modality, including US, CT, and MRI. We hypothesize that fused features can offer additional valuable information to enhance the efficacy of US radiomics in distinguishing between BPGTs and MPGTs. In this study, we compared the diagnostic performance of multiple radiomics classifier models with various DL models. Additionally, we developed a feature fusion model and integrated clinical and US features to construct a nomogram, aiming to enhance the visual classification of preoperative diagnosis for PGTs and facilitate personalized precision diagnosis and treatment for patients.

Patients
The present study was approved by the hospital ethics committee (protocol code 2024KS002). Given its retrospective nature, the requirement for patient informed consent was waived.
A retrospective analysis was conducted on US images obtained from January 2017 to December 2023, involving a consecutive cohort of 608 patients with PGTs who received treatment at our hospital. The inclusion criteria were: (1) BPGTs or MPGTs confirmed by postoperative pathology, (2) preoperative US examination, and (3) complete clinical data. The exclusion criteria were: (1) previous history of surgery or treatment in the parotid gland region, (2) maximum tumor diameter less than 0.5 cm, and (3) poor image quality, including blurred images or incomplete visualization of lesions. In cases with multiple lesions, the largest or most representative malignant lesion was selected for analysis. Detailed recruitment methods can be found in Figure 1.
Relevant clinical information including age, gender, smoking and drinking history, along with postoperative pathological results were retrieved from the Electronic Health Records (EHR) system.
The study enrolled a total of 526 patients, including 427 cases of BPGTs and 99 cases of MPGTs. These patients were randomly allocated to training and testing sets at a ratio of 7:3. The study design and workflow are illustrated in Figure 2.

Image acquisition and analysis
Preoperative US examination of the parotid gland region was performed using iU22 (PHILIPS), EPIQ7 (PHILIPS), S2000 (SIEMENS), and ACUSON Sequoia (SIEMENS) ultrasound diagnostic devices, equipped with corresponding high-frequency linear array probes. Two-dimensional US images of PGTs were acquired from the Picture Archiving and Communication System (PACS), capturing essential characteristics including maximum diameter, shape (regular/irregular), margin (well/poorly-defined), echogenicity (homogeneous/heterogeneous), cystic component (absent/present), calcification (absent/present), and posterior acoustic enhancement (absent/present). The US images were independently analyzed in a blinded manner by two experienced ultrasound physicians, A and B (with over 5 and 10 years of experience in superficial organ diagnosis, respectively), without access to clinical information or pathological results. In case of discrepancies, consensus was reached through discussion.

Image segmentation
The ITK-SNAP software (version 3.8.0) was used for manual delineation of the region of interest (ROI) along the tumor periphery on the image displaying the maximum lesion diameter. Ultrasound physician A first performed the ROI delineation for all patients. After a two-week interval, a subset of 100 patients was randomly selected for independent re-delineation by both ultrasound physicians A and B, in order to assess intra-observer and inter-observer agreement and to retain features with high reproducibility and robustness.

HCR feature extraction
Handcrafted radiomics (HCR) feature extraction was performed with Pyradiomics (version 3.0.1), adhering to the Image Biomarker Standardization Initiative (IBSI) guidelines. The documentation for this package is available at https://pyradiomics.readthedocs.io. HCR features fall into three primary groups: (1) geometry, (2) intensity, and (3) texture. Geometry features characterize the spatial structure and contour of the lesion; intensity features analyze voxel intensity-related information using first-order statistics; and texture features capture subtle variations in lesions through more intricate second- and higher-order analyses. Several techniques were utilized to extract texture features, including the gray-level co-occurrence matrix (GLCM), gray-level dependence matrix (GLDM), gray-level run length matrix (GLRLM), gray-level size zone matrix (GLSZM), and neighborhood gray-tone difference matrix (NGTDM).

DL feature extraction
To ascertain the most suitable algorithm for our specific research requirements, we explored the performance of prominent networks including DenseNet121, VGG19, and ResNet50. To improve generalization across diverse datasets, transfer learning was implemented by initializing the models with weights pre-trained on the ImageNet database and fine-tuning with a cosine decay learning rate schedule. Further details regarding the specific definitions and methodology can be found in Supplementary Material 1.
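The cosine decay schedule mentioned above can be sketched in a few lines. This is a minimal illustration of the general technique, not the study's actual training configuration; the base rate, step count, and function name here are illustrative assumptions.

```python
import math

def cosine_decay_lr(step, total_steps, base_lr=1e-3, min_lr=0.0):
    """Cosine-annealed learning rate: starts at base_lr, decays smoothly to min_lr."""
    cos = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return min_lr + (base_lr - min_lr) * cos

# The rate starts at base_lr, halves at the midpoint, and reaches min_lr at the end.
schedule = [cosine_decay_lr(s, total_steps=100) for s in range(101)]
print(round(schedule[0], 6), round(schedule[50], 6), round(schedule[100], 6))
# 0.001 0.0005 0.0
```

Compared with step decay, the smooth annealing avoids abrupt rate drops during fine-tuning of the pre-trained weights.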
Prior to training, the input images underwent cropping and Z-score normalization, retaining only the minimum bounding rectangle encompassing the ROI. This simplifies the analysis and reduces background noise. During training, we employed real-time data augmentation techniques such as random cropping, horizontal flipping, and vertical flipping. Testing set images underwent normalization only.
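The cropping and normalization steps can be sketched with plain NumPy, assuming a binary ROI mask; the study's actual MONAI/PyTorch transform pipeline is not reproduced here, and the toy image below is purely illustrative.

```python
import numpy as np

def crop_to_roi_and_normalize(image, mask):
    """Crop to the ROI's minimum bounding rectangle, then Z-score the patch."""
    rows = np.any(mask, axis=1)
    cols = np.any(mask, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    patch = image[r0:r1 + 1, c0:c1 + 1].astype(np.float64)
    return (patch - patch.mean()) / (patch.std() + 1e-8)  # epsilon avoids /0

# Toy example: a 10x10 "US image" with a 4x3 rectangular ROI mask.
img = np.arange(100, dtype=np.float64).reshape(10, 10)
msk = np.zeros((10, 10), dtype=bool)
msk[2:6, 3:6] = True
patch = crop_to_roi_and_normalize(img, msk)
print(patch.shape)  # (4, 3)
```

After this step the patch has approximately zero mean and unit variance, which keeps intensity scales comparable across the different ultrasound devices listed above.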
The classification performance of three DL models was compared, and DL features were extracted from the penultimate layer (average pooling layer) of the most effective model for subsequent analysis.

Feature selection and fusion
For the HCR features, the first step was to calculate the intraclass correlation coefficient (ICC) between the repeated delineations and retain features with an ICC ≥ 0.85, indicating high stability. Features were then standardized with Z-scores and compared between groups using t-tests; features with p-values < 0.05 were selected for further analysis. Next, redundant features were examined using Pearson's correlation coefficient, and only one feature was retained from any pair whose correlation exceeded 0.9. To further reduce redundancy, a greedy recursive deletion strategy was employed for feature filtering. Finally, least absolute shrinkage and selection operator (LASSO) regression with 10-fold cross-validation under the minimum criterion was applied to tune the penalty parameter (λ) and identify the HCR features with non-zero coefficients possessing the greatest predictive value.
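The standardization, t-test, correlation-pruning, and LASSO steps can be sketched as follows on synthetic data. This is a simplified stand-in (the ICC and greedy recursive deletion steps are omitted), and the feature matrix, sample size, and signal strength are invented for illustration.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for the HCR matrix: 200 patients x 30 features;
# only the first 3 features carry signal about the benign/malignant label.
y = rng.integers(0, 2, 200)
X = rng.normal(size=(200, 30))
X[:, :3] += y[:, None] * 1.5

# 1) Z-score standardization.
X = StandardScaler().fit_transform(X)

# 2) Inter-group t-test filter: keep features with p < 0.05.
pvals = np.array([ttest_ind(X[y == 0, j], X[y == 1, j]).pvalue
                  for j in range(X.shape[1])])
keep = np.where(pvals < 0.05)[0]

# 3) Pearson redundancy filter: from any pair with |r| > 0.9, keep one.
corr = np.corrcoef(X[:, keep], rowvar=False)
chosen = []
for j in range(len(keep)):
    if all(abs(corr[j, i]) <= 0.9 for i in chosen):
        chosen.append(j)
keep = keep[chosen]

# 4) LASSO with 10-fold CV; features with non-zero coefficients survive.
lasso = LassoCV(cv=10, random_state=0).fit(X[:, keep], y)
final = keep[lasso.coef_ != 0]
print("selected feature indices:", final.tolist())
```

On this toy data the three informative features survive every stage, while most pure-noise features are filtered out by the t-test or shrunk to zero by LASSO.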
For DL features, we applied principal component analysis (PCA) to reduce the dimensionality of these transfer learning features from 50,176 to 512, in order to enhance the model's generalization ability and mitigate the risks of overfitting.
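The PCA reduction step maps directly onto scikit-learn. The matrix below is a random stand-in at a smaller toy scale (PCA components cannot exceed the sample count, so the paper's 512 target is scaled down here).

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Stand-in for flattened transfer-learning activations (patients x DL features).
# The study reduces 50,176 dimensions to 512; a smaller toy scale is used here.
dl_features = rng.normal(size=(120, 2048))

pca = PCA(n_components=64, random_state=0)
reduced = pca.fit_transform(dl_features)
print(reduced.shape)  # (120, 64)
```

Fitting PCA on the training set only (and applying `transform` to the test set) is the usual way to keep the dimensionality reduction free of test-set leakage.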
In the feature fusion stage, we employed a pre-fusion (feature-level) strategy that concatenated the HCR features with the DL features to form a comprehensive feature set. The fused features then underwent the same selection process as the HCR features.

Model construction and validation
The HCR features and fused features obtained through feature selection were combined with several machine learning classifiers to construct traditional radiomics models and deep learning radiomics (DLR) models for discriminating BPGTs and MPGTs. Seven mainstream classifiers were selected: linear models (logistic regression (LR) and support vector machine (SVM)), tree-based models (RandomForest, ExtraTrees, XGBoost, and LightGBM), and a deep learning-based multi-layer perceptron (MLP). For hyperparameter tuning, we applied 5-fold cross-validation on the training set using a grid search algorithm, and the parameters with the best median performance were chosen for final model training.
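A minimal grid-search sketch with the best-performing classifier family (ExtraTrees) is shown below on synthetic data; the parameter grid, class imbalance, and dataset are illustrative assumptions, not the study's actual search space.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the fused feature matrix, with a benign-heavy
# class balance loosely mirroring the cohort.
X, y = make_classification(n_samples=300, n_features=40, n_informative=8,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y,
                                          random_state=0)

# 5-fold cross-validated grid search, scored by AUC as in the paper.
grid = GridSearchCV(
    ExtraTreesClassifier(random_state=0),
    param_grid={"n_estimators": [100, 200], "max_depth": [None, 5]},
    scoring="roc_auc", cv=5)
grid.fit(X_tr, y_tr)

auc = roc_auc_score(y_te, grid.predict_proba(X_te)[:, 1])
print(f"best params: {grid.best_params_}, test AUC = {auc:.3f}")
```

Stratified splitting preserves the benign/malignant ratio in both sets, which matters when the malignant class is this much smaller.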
Through a comprehensive analysis of the relevant clinical data and US characteristics, we conducted univariate analysis followed by multivariate logistic regression to identify significant features for constructing the clinical models. Furthermore, these selected features were integrated with the optimal predictive machine learning model to develop a nomogram.
Receiver operating characteristic (ROC) curves were employed to assess the diagnostic performance of the various models, and the DeLong test was used to compare the areas under the curves (AUCs). Calibration curves with Hosmer-Lemeshow (HL) tests were used to evaluate the concordance between predicted probabilities and actual outcomes. Decision curve analysis (DCA) was applied to assess the clinical utility of the models.
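As an illustration of AUC evaluation with a confidence interval, the sketch below uses bootstrap resampling. Note this is a simple stand-in, not the DeLong test the paper uses for model comparison; the labels and scores are synthetic.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_ci(y_true, y_prob, n_boot=2000, seed=0):
    """95% CI for the AUC via bootstrap resampling (a simple alternative
    to analytic variance estimates such as DeLong's)."""
    rng = np.random.default_rng(seed)
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))
        if len(np.unique(y_true[idx])) < 2:  # a resample needs both classes
            continue
        aucs.append(roc_auc_score(y_true[idx], y_prob[idx]))
    return np.percentile(aucs, [2.5, 97.5])

# Synthetic imbalanced test set: 40 benign (0), 10 malignant (1).
y = np.array([0] * 40 + [1] * 10)
p = np.clip(y * 0.6 + np.random.default_rng(1).normal(0.3, 0.2, 50), 0, 1)
lo, hi = bootstrap_auc_ci(y, p)
print(f"AUC = {roc_auc_score(y, p):.3f}, 95% CI: {lo:.3f}-{hi:.3f}")
```

The wide interval typical of small malignant subsets is one reason the paper's testing-set CIs (e.g. 0.861-0.971) are broader than the training-set ones.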

Statistical analysis
The analyses were performed using Python (version 3.7.12) and statsmodels (version 0.13.2). The machine learning models were developed with the scikit-learn (version 1.0.2) interface. DL training was conducted on an NVIDIA 4090 GPU with the MONAI 0.8.1 and PyTorch 1.8.1 frameworks.
For quantitative data, normality and homogeneity of variance tests were conducted. Normally distributed data were expressed as mean ± standard deviation and compared with an independent samples t-test; non-normally distributed data were expressed as median and interquartile range (IQR) and compared with the non-parametric Mann-Whitney U test. Categorical data were compared with the chi-square test. A significance level of P < 0.05 indicated statistical significance.
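The decision rule for quantitative variables can be sketched with SciPy. This is a simplified illustration (the homogeneity-of-variance check is omitted), and the age samples are hypothetical, not the cohort's data.

```python
import numpy as np
from scipy import stats

def compare_groups(a, b, alpha=0.05):
    """Choose t-test vs Mann-Whitney U from Shapiro-Wilk normality tests,
    mirroring the paper's rule for quantitative variables."""
    normal = (stats.shapiro(a).pvalue > alpha) and (stats.shapiro(b).pvalue > alpha)
    if normal:
        return "t-test", stats.ttest_ind(a, b).pvalue
    return "mann-whitney", stats.mannwhitneyu(a, b).pvalue

rng = np.random.default_rng(0)
ages_benign = rng.normal(50, 15, 80)      # hypothetical age samples
ages_malignant = rng.normal(58, 15, 40)
test_name, p = compare_groups(ages_benign, ages_malignant)
print(test_name, round(p, 4))
```

In practice Levene's test (`scipy.stats.levene`) would also be run to decide whether the equal-variance form of the t-test applies.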

Clinical and US characteristics
Ultimately, a total of 526 patients were enrolled in the study, including 283 males and 243 females, with ages ranging from 12 to 87 years (mean age: 51.73 ± 15.17 years). Among the BPGTs (n=427), PA was the most prevalent subtype (207 cases; 48.48%), followed by Warthin tumor (133 cases; 31.15%). Among the MPGTs (n=99), MEC had the highest proportion (28 cases; 28.28%). The distribution of tumors is presented in Table 1.
The baseline characteristics of the training and testing sets are compared in Table 2; no statistically significant differences (P>0.05) were observed between the clinical and US characteristics of the two groups, ensuring an unbiased data partition. Univariate and multivariate analyses were conducted on the baseline characteristics of BPGTs and MPGTs to determine odds ratios (ORs) for each feature along with their corresponding p-values (Supplementary Table 1). Univariate analysis revealed significant differences (P < 0.05) between the two groups regarding smoking history, maximum diameter, shape, margin, calcification, and posterior acoustic enhancement. Multivariate analysis identified only irregular shape (OR=1.257), poorly-defined margin (OR=1.323), and absence of posterior acoustic enhancement (OR=0.807) as independent risk factors for MPGTs.
We performed numerical mapping on these features and subsequently modeled them with machine learning algorithms. The diagnostic performance of the various clinical models is compared in Table 3 and Supplementary Figure 1. Among all the models, ExtraTrees exhibited superior performance in the testing set, with an AUC of 0.886 (95% CI: 0.807-0.965).

Radiomics models
In this study, a total of 1562 HCR features were extracted; their distribution is presented in Supplementary Figure 2. After feature selection, 16 HCR features were ultimately chosen for further analysis and construction of the traditional radiomics models (Supplementary Figure 3). The predictive performance of the different classifiers is summarized in Table 4. Among these models, the ExtraTrees model demonstrated superior predictive performance in the testing set, achieving an AUC of 0.853 (95% CI: 0.770-0.936). The ROC curves can be found in Supplementary Figure 4.

DL models
The performance of the three DL models is presented in Table 5 and Supplementary Figure 5. The DenseNet121 model demonstrated superior performance compared to the ExtraTrees models based on clinical and traditional radiomics features, achieving an AUC of 0.883 (95% CI: 0.817-0.947) in the testing set.
To investigate the recognition ability of the DenseNet121 model across different samples, we utilized the Gradient-weighted Class Activation Mapping (Grad-CAM) technique for visualization. Figure 3 demonstrates the application of Grad-CAM, highlighting the activation status of the final convolutional layer relevant to the predicted class. This approach facilitates identification of the image regions that most influence model decisions and provides valuable insights into interpretability.

Feature fusion models
After feature selection, a total of 31 HCR features and 24 DL features were retained from the fused feature set comprising 2,074 dimensions (Figures 4, 5). Subsequently, DLR feature fusion models were constructed by combining multiple classifiers; the performance comparison is presented in Table 6 and Supplementary Figure 6. The ExtraTrees model achieved an AUC of 0.916 (95% CI: 0.861-0.971) in the testing set, a further enhancement over the DenseNet121 model (AUC = 0.916 vs. 0.891).

Construction of nomogram and comparison of all models
As the DLR model demonstrated superior performance compared to the alternative models, we integrated the significant clinical features with the DLR model's predictions to construct the final combined model, which was visualized as a nomogram (DLRN). The nomogram illustrated that the DLR factor played a significant role in predicting the risk level of PGTs (Figure 6).
The performance of the clinical model, radiomics model, DL model, DLR model, and DLRN is summarized in Table 7. Among all models evaluated (Figure 7), the DLRN exhibited superior performance, with an AUC of 0.960 (95% CI: 0.940-0.979) for the training set and 0.934 (95% CI: 0.876-0.991) for the testing set. The DeLong test (Supplementary Figure 7) revealed statistically significant differences between the DLR and DLRN models and the others in the training set (P < 0.05); however, no statistically significant difference was observed among the models in the testing set (P > 0.05). The calibration curves (Supplementary Figure 8) demonstrated an excellent fit for the DLRN, with HL test values of 0.327 for the training set and 0.793 for the testing set. Furthermore, the DCA curves (Figure 8) indicated that the DLRN provided superior clinical benefits compared to the other models.

Discussion
Our research findings demonstrated that DL models outperformed traditional radiomics models in the classification of PGTs based on US images (AUC = 0.883 vs. 0.853). The feature fusion DLR model further enhanced performance (AUC = 0.916). Clinical and US characteristics also provided valuable information for model construction, and the DLRN model that integrated all available data demonstrated superior performance (AUC = 0.934). The DCA curves illustrated that adoption of the DLRN would yield enhanced benefits for patients.
Controversy surrounds the diagnostic value of clinical data and US characteristics for PGTs. BPGTs typically exhibit a well-defined margin, homogeneous echogenicity, and posterior acoustic enhancement on US images. In contrast, high-grade malignant tumors often display heterogeneous echogenicity, a poorly-defined margin, and internal calcifications (27). However, PGTs encompass a wide range of histological types with diverse cellular origins or differentiations. Additionally, tumor cells can undergo various forms of metaplasia, resulting in variations or overlaps in the pathological and corresponding radiological manifestations. In this study, univariate and multivariate logistic regression analyses identified irregular shape, poorly-defined margin, and absence of posterior acoustic enhancement as independent risk factors for MPGTs.
Radiomics is the process that converts digital medical images into high-dimensional, mineable data. Numerous domestic and international studies have investigated its application in distinguishing PGTs (18-21). Qi et al. (19) conducted a study to differentiate between BPGTs and MPGTs, as well as different subtypes of benign tumors. The results demonstrated that the multi-sequence radiomics model based on conventional MRI exhibited excellent performance in classifying BPGTs and MPGTs, with further improvement when combined with clinical features (AUC = 0.863). Li et al. (21) validated the effectiveness of radiomics analysis using conventional ultrasound (CUS) images for preoperative prediction of the malignant potential of parotid lesions. By combining radiomic features, CUS features, and clinical information in a nomogram, the ability to differentiate between benign and malignant parotid lesions was enhanced (AUC = 0.91). The traditional radiomics models, combined with diverse classifiers, showed satisfactory diagnostic performance in our study: the training set had an AUC ranging from 0.768 to 0.960, while the testing set ranged from 0.738 to 0.853.
Feature extraction plays a crucial role in radiomics, but conventional radiomics often generates numerous low-level, predefined features that may not fully capture the heterogeneity of images, which limits the potential of radiomics models. In recent years, the integration of DL and radiomics has gained momentum owing to the unique advantages of DL in computer vision and image recognition tasks. DL networks autonomously learn high-level features specific to the research problem, enabling a more comprehensive reflection of the information within lesions. However, their performance relies heavily on data volume and entails significant computational costs. Transfer learning addresses this by utilizing DL networks pre-trained on large-scale datasets such as ImageNet and fine-tuning them to extract DL features from smaller datasets for radiomics analysis. This approach helps mitigate overfitting caused by limited data availability and opens new avenues for advancing radiomics (28). Existing studies have demonstrated that models combining DL features with radiomics features outperform those using either feature type alone in various clinical problems such as breast tumors (29), renal cystic lesions (30), meningiomas (31), and tuberculosis (32). In our study, while each individual model demonstrated satisfactory performance in isolation, the integration of DL with clinical and radiomics data yielded a more robust predictive tool, effectively capitalizing on the unique strengths of each component.
In a recent study examining the application of deep learning to parotid gland tumors, Liu et al. (33) evaluated five DL models (ResNet50, MobileNetV2, InceptionV1, DenseNet121 and VGG16) based on US images to differentiate PA and Warthin tumor. The DL models were superior to ultrasound and FNAC: the AUC values of these DL models in the test set ranged from 0.828 to 0.908, with ResNet50 demonstrating the optimal performance. In our study, we evaluated several CNNs, including DenseNet121, VGG19, and ResNet50. The disparities in performance among different DL models can be attributed to variations in their internal network architectures. Specifically, DenseNet121 (34) utilizes a dense connection structure wherein the output of each layer is directly connected to the input of all subsequent layers. This architectural design enhances scalability and parameter efficiency while mitigating gradient vanishing and expediting model training. Visualization using Grad-CAM demonstrated that model decision-making focused predominantly on the edge areas of tumors, which aligned with clinical factors and contributed to the interpretability of the models.
Selecting an appropriate and efficient classifier is crucial for developing robust models. In the discrimination of BPGTs and MPGTs, Yu et al. (35) utilized SVM and LR paired with three feature selection methods to construct distinct radiomics models based on multi-phase CT images. The results demonstrated that the SVM model utilizing a combination of three phases exhibited superior predictive performance, achieving an AUC of 0.936 in the testing set. Lu et al. (20) conducted radiomics analysis of PGTs employing five common machine learning classifiers based on plain CT images and observed variations in optimal classification efficacy among different subtypes of PGTs across these classifiers. Notably, the RandomForest model achieved the highest AUC (0.834) in distinguishing between BPGTs and MPGTs, indicating that model performance may be influenced by key tumor features as well as algorithmic characteristics inherent to each classifier. In our study, the ExtraTrees classifier demonstrated superior performance in the testing sets of the clinical, radiomics, and DLR models. By incorporating additional randomness beyond RandomForest, ExtraTrees effectively reduces model variance and enhances generalization, making it highly efficient for handling extensive datasets (36).
The rapid advancement of deep learning in computer vision has led to highly competitive approaches in tumor-related domains through the integration of multi-modal and multi-omics features. Wang et al. developed a DyAM model that combines histology, radiology, and genomics to accurately predict immunotherapy response in NSCLC patients. The model (AUC = 0.80, 95% CI 0.74-0.86) outperformed unimodal measures, including tumor mutation burden and programmed death-ligand-1 immunohistochemistry score. These findings suggest that machine learning techniques combining multiple modalities have complementary and synergistic effects, facilitating oncology decision-making.
The present study is subject to certain limitations. Firstly, the retrospective design may introduce potential selection bias. Secondly, patients were recruited from a single-center medical institution and the models lacked external validation. Future research should involve multi-center participation to expand the sample size and enhance model generalizability. Lastly, our feature extraction and model construction relied solely on conventional two-dimensional US images with manually delineated ROIs, without incorporating other modalities such as elastography or contrast-enhanced imaging. Utilizing standardized single-modality images allows for easier acquisition and wider applicability and dissemination of the model. In future studies, we will concentrate on constructing models using multi-modal imaging to extract comprehensive information and integrating deep learning automatic segmentation algorithms to improve delineation accuracy and repeatability, thereby enhancing diagnostic performance.

TABLE 1
Distribution of tumors confirmed by histologic results in the whole cohort.

TABLE 2
Baseline clinical and US characteristics of patients in training and testing sets.

TABLE 3
Performance comparison of different clinical models.

TABLE 4
Performance comparison of different radiomics models.

TABLE 5
Performance comparison of DL models.

TABLE 6
Performance comparison of different DLR models.

TABLE 7
Performance comparison of all models.