The Role of Preoperative Computed Tomography Radiomics in Distinguishing Benign and Malignant Tumors of the Parotid Gland

Objective This study aimed to develop and validate an integrated prediction model based on clinicoradiological data and computed tomography (CT)-radiomics for differentiating between benign and malignant parotid gland (PG) tumors via multicentre cohorts.
Materials and Methods A cohort of 87 PG tumor patients from hospital #1 who were diagnosed between January 2017 and January 2020 was used for prediction model training. A total of 378 radiomic features were extracted from a single tumor region of interest (ROI) of each patient on each phase of CT images. Imaging features were extracted from plain CT and contrast-enhanced CT (CECT) images. After dimensionality reduction, a radiomics signature was constructed, and the rad-score was calculated based on the selected imaging features from plain CT and CECT images. A combination model was constructed by incorporating the rad-score and CT radiological features. An independent group of 38 patients from hospital #2 was used to validate the prediction models. The model performances were evaluated by receiver operating characteristic (ROC) curve analysis, and decision curve analysis (DCA) was used to evaluate the clinical effectiveness of the models.
Results Analysis of variance and multivariable logistic regression analysis showed that location, lymph node metastases, and rad-score were independent predictors of tumor malignant status. The ROC curves showed that the area under the curve (AUC) of the support vector machine (SVM)-based prediction model, radiomics signature, location and lymph node status in the training set was 0.854, 0.772, 0.679, and 0.632, respectively; specificity was 0.869, 0.878, 0.734, and 0.773; and sensitivity was 0.731, 0.808, 0.723, and 0.742. In the test set, the AUC was 0.835, 0.771, 0.653, and 0.608, respectively; the specificity was 0.741, 0.889, 0.852, and 0.812; and the sensitivity was 0.818, 0.790, 0.731, and 0.716.
Conclusions The combination model based on the radiomics signature and CT radiological features is capable of evaluating the malignancy of PG tumors and can help clinicians guide clinical tumor management.


INTRODUCTION Background
Parotid gland (PG) tumors are rare and account for approximately 1%-3% of all head and neck tumors (1). Parotid tumors are a clinically, morphologically, and radiologically diverse group of neoplasms that may present significant diagnostic and management challenges. Radical tumor resection with lymph node dissection remains the mainstay treatment for malignant parotid tumors, followed by adjuvant chemotherapy and radiotherapy (2). Knowledge of the clinical information and imaging characteristics before surgery would be of great importance for evaluating these tumors, tailoring treatment decisions and optimizing individualized surgical plans. Additionally, for malignancies, preoperative knowledge of the tumor type would also be of paramount importance.
Currently, multiple imaging techniques are available to study the parotid region, such as ultrasound, computed tomography (CT), and magnetic resonance imaging (MRI). Although CT is not a first-line method for parotid gland tumor evaluation, it can help clinicians confirm the presence of a parotid mass, assess the extent of the tumor, especially in the deep lobe, and detect enlarged lymph nodes, which facilitates determining whether the tumor is benign or malignant so that appropriate treatment can be chosen. However, CT involves radiation, and various neoplasms may have similar imaging features on CT (3,4). Furthermore, contrast-enhanced CT may cause contrast-induced adverse reactions, though it is generally considered safe, with an overall prevalence of adverse reactions around 0.7%, among which most of the events (more than 80%) are mild (5). Ultrasound is a cheap and effective tool for delineating cystic versus solid tumors, tumor borders, and cervical lymph nodes; however, it poorly visualizes the deep lobe and is dependent on operator expertise. MRI has sufficiently high resolution to detect and evaluate parotid tumors noninvasively. As a functional MRI technique, diffusion-weighted imaging (DWI) can be used to explore diffusion changes in tissue composition, which is helpful for parotid tumor detection and differential diagnosis (6). Nevertheless, its disadvantages include limited availability for patients with metal prostheses, high cost and long waiting times. Fine-needle aspiration biopsy (FNAB) is an accurate method for identifying the nature of these tumors; however, FNAB is invasive and may cause hemorrhage, facial nerve injury, and acute sialadenitis at the needle puncture site (7,8).
In addition, the performance of FNAB varies significantly between practice settings, which is associated with inadequate diagnoses and missed malignancies: the sensitivity for detecting malignancy is between 70% and 80%, and non-diagnostic rates average 14%-18% (9). Thus, challenges remain in non-invasively and accurately distinguishing benign from malignant lesions on pre-operative CT images.
Radiomics is a new method of mining objective and quantitative features, such as the shape, intensity, and energy of regions of interest, from medical images (e.g., gray-level co-occurrence matrix (GLCM) and run length matrix (RLM) features). These features describe the relationships between image voxels far beyond the traditional visual features we can obtain and may thus reflect the underlying genetic and biological variability of the analyzed tissue, which can promote accurate diagnosis and individualized cancer treatment (10). Recent studies describe the use of radiomic analysis for head and neck tumors (11), glioblastoma (12), breast cancer (13), rectal cancer (14), hepatocellular carcinoma (15), etc. These studies demonstrated that radiomic features are closely associated with the histopathological types, grading and prognosis of tumors and can help solve many clinical problems and optimize patient treatment. A previous study (16) used energy spectrum CT, which is not commonly used, in the tissue classification of benign parotid tumors and demonstrated good discrimination ability.
As mentioned above, it is difficult to distinguish benign from malignant PG tumors with conventional CT images. However, given the novelty of radiomics and its strong performance in tumor differential diagnosis in other fields, we hypothesized that radiomics analysis based on conventional CT images can be used to distinguish between benign and malignant PG tumors. The purpose of this study was to use conventional CT images to extract PG tumor-related radiomic features and combine them with conventional radiological features to build a classification model for distinguishing benign and malignant PG tumors.

MATERIAL AND METHODS
This retrospective study was approved by the Ethics Committee of Zhejiang Provincial People's Hospital. The requirement for informed consent was waived owing to the retrospective nature of this study.
The research method was carried out in accordance with the relevant guidelines and regulations. Patients' clinical and image data were obtained from routine clinical records and the picture archiving and communication systems (PACS) of the hospital.

Patient Population
A retrospective review of clinical and radiological databases was performed from January 1, 2017, to January 30, 2020, in hospital #1 and from January 1, 2019, to January 30, 2020, in hospital #2. The inclusion criteria were as follows: (1) patients with PG-related symptoms or masses; (2) confirmation of the PG tumor by surgery and postoperative pathological diagnosis; and (3) noncontrast computed tomography (NCCT) and contrast-enhanced computed tomography (CECT) images of the head and neck containing the PG obtained within 2 weeks before the operation. The exclusion criteria were as follows: (1) CECT images with obvious artifacts, such as artifacts from false teeth, motion artifacts, etc.; (2) fine needle aspiration performed before imaging of the PG; or (3) patients with parotid lesions less than 1.0 cm in diameter. Medical records from 208 patients were initially analyzed, and 125 patients were finally included in this study (see Figure 1 for details). To assess possible selection bias, the included and excluded datasets were compared.
CT scans were performed with a multi-slice CT scanner (Siemens 40) or a 64-multidetector scanner (LightSpeed VCT; GE Healthcare, Waukesha, WI, USA) with the following parameters: tube voltage of 120 kVp; tube current of 150 mA; section thickness of 3 mm; and section interval of 3 mm. The scanning ranged from the base of the skull to the inlet of the thorax. The CECT images were obtained after an intravenous injection of 80-100 ml of nonionic iodinated iopamidol containing 370 mg iodine per ml (Isovue 370, Bracco Healthcare, Princeton, NJ) at 3-4 ml/s. Arterial-phase CECT images were obtained 35 seconds after contrast material injection.

Clinical and Radiological Data Analysis
Clinical parameters, including age, sex, disease duration, and smoking status, were collected from the hospital medical record system. Two experienced head and neck radiologists who were blinded to the clinical data reviewed all original CT images and assessed the following radiological features: tumor location (in the deep or shallow parotid), maximum diameter, distribution (single or bilateral), shape (round or not), capsule (with or without), regularity (regular or irregular), margin (clear or unclear), density (hypo-, iso-, or hyperdense), enhancement (enhanced or nonenhanced), cystic degeneration (with or without), lymph node metastasis (with or without), hemorrhage (with or without), calcification (with or without), and enhancement type (slight-moderate or obvious). The definitions of some radiological features can be found in the Supplementary Material. Discordant interobserver interpretations were resolved by consensus. If there were multiple lesions in the parotid gland, the largest lesion with confirmed pathology was chosen for the analysis.

Image Preprocessing
All NCCT and CECT images were stored in Digital Imaging and Communications in Medicine (DICOM) format and imported to ITK-SNAP software for three-dimensional manual segmentation of the region of interest (ROI). The ROI of each case was manually drawn on the CECT images by two independent head and neck radiologists (Radiologist X with 11 years of experience and Radiologist W with 6 years of experience) who were blinded to the clinical information, carefully avoiding the vessels, bones and lymph nodes. All ROIs were then replicated to the NCCT images, and manual correction was performed to adjust small deviations in the ROI boundaries. All ROIs from the NCCT and CECT images were uploaded into AK analysis software (Artificial Intelligence Kit V3.0.0.R, GE Healthcare) for feature extraction. To eliminate the potential impact of different imaging parameters on the extracted features, we preprocessed the segmented images by resampling them to a voxel size of 1 × 1 × 1 mm³, normalizing the intensity, and standardizing the gray levels to range from 1 to 32 (17).
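The gray-level standardization step can be illustrated with a minimal sketch. It assumes equal-width binning between the minimum and maximum intensity inside the ROI; the binning scheme actually used by the AK software is not stated in the text, and the function name and the sample values are hypothetical.

```python
def discretize_gray_levels(intensities, n_levels=32):
    """Map raw ROI intensities onto integer gray levels 1..n_levels.

    Assumption: equal-width bins spanning the ROI minimum and maximum.
    """
    lo, hi = min(intensities), max(intensities)
    if hi == lo:
        # A perfectly uniform ROI has no texture; place every voxel in level 1.
        return [1 for _ in intensities]
    width = (hi - lo) / n_levels
    levels = []
    for value in intensities:
        level = int((value - lo) / width) + 1
        # The ROI maximum would fall just past the last bin, so clamp it.
        levels.append(min(level, n_levels))
    return levels


# Hypothetical attenuation values (HU) sampled from one ROI.
roi_hu = [-20.0, 0.0, 35.5, 60.0, 108.0]
print(discretize_gray_levels(roi_hu))  # [1, 6, 14, 21, 32]
```

Mapping every ROI onto the same 32 levels is what makes texture matrices such as the GLCM comparable between scanners with different intensity ranges.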

Radiomics Feature Selection
The preprocessed images were used to extract the radiomics features, including the histogram, Haralick, FormFactor, gray level co-occurrence matrix (GLCM), run length matrix (RLM) and gray level size zone matrix (GLSZM) features. In this study, a joint feature set was obtained from both the NCCT and CECT images. The most robust features of the two separate ROI datasets from the two radiologists were used to ensure the reproducibility and repeatability of the radiomics features (18). Spearman's rank test was utilized to evaluate the correlation coefficients between the features of the datasets segmented by Radiologist X and Radiologist W. Any features that had correlation coefficients greater than 0.8 were defined as "robust" features (19). A large number of features combined with a limited sample size may hinder the predictive ability of the model, especially in a high-dimensional feature space, owing to the "curse of dimensionality" (20); therefore, the dimensions of the extracted features were reduced to address this issue. Analysis of variance was first performed on the extracted features to select those that were statistically significant. Subsequently, the minimum redundancy maximum relevance (mRMR) algorithm was used to reduce the dimensions of the selected features by keeping the features that had the highest correlation with the tumor classification and the smallest redundancy between one another. After that, the gradient boosting decision tree (GBDT) algorithm was used to further reduce the dimensionality of the preselected features. Feature selection was performed on both the NCCT and CECT images of all cases, which finally yielded a joint feature set containing NCCT and CECT image features. Based on these selected features, logistic regression was used to construct the radiomics signature.
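The inter-reader robustness filter described above can be sketched as follows. This is an illustrative standard-library implementation of Spearman's rank correlation with the 0.8 threshold from the text; the feature names and values are invented for the example, and the study itself performed this step in statistical software rather than with this code.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def robust_features(reader_x, reader_w, threshold=0.8):
    """Keep features whose values agree between the two segmentations."""
    return [name for name in reader_x
            if spearman(reader_x[name], reader_w[name]) > threshold]


# Hypothetical feature values for five patients, one dict per reader.
reader_x = {"kurtosis": [1.0, 2.0, 3.0, 4.0, 5.0], "inertia": [1.0, 2.0, 3.0, 4.0, 5.0]}
reader_w = {"kurtosis": [1.1, 2.3, 2.9, 4.2, 5.5], "inertia": [5.0, 1.0, 4.0, 2.0, 3.0]}
print(robust_features(reader_x, reader_w))  # ['kurtosis']
```

In this toy example "kurtosis" keeps the same patient ordering under both segmentations (rho = 1), whereas "inertia" does not (rho = -0.3) and is discarded.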
To determine the correlation between the radiomics signature and tumor classification, a logistic regression (LR)-based signature model based on the combined feature set was used to calculate a score to reflect the actual tumor classification in the training set, defined as the rad-score, which was then used to determine the effectiveness of the signature models in differentiating between patients with benign and malignant parotid tumors. The formula of the model used in the training set was then employed to calculate the scores for the test set.
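As a rough illustration of how a logistic regression signature turns selected features into a rad-score, the sketch below computes the linear predictor and its logistic transform. The intercept, the coefficients, and the two-feature subset are hypothetical placeholders (the fitted values are not given in the text), and whether the reported rad-score is the linear predictor or the probability is likewise an assumption.

```python
import math


def rad_score(features, coefficients, intercept):
    """Linear predictor of a logistic regression: intercept + sum(beta_i * x_i)."""
    return intercept + sum(coefficients[name] * features[name]
                           for name in coefficients)


def to_probability(score):
    """Logistic transform of the linear predictor into a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical coefficients "fitted" on the training set.
intercept = -1.0
coefficients = {"kurtosis_A": 0.8, "uniformity_P": -0.5}

# Hypothetical standardized feature values for one test-set patient.
patient = {"kurtosis_A": 2.0, "uniformity_P": 1.2}

score = rad_score(patient, coefficients, intercept)
print(round(score, 3), round(to_probability(score), 3))
```

Because the coefficients are frozen after training, applying the same function to test-set patients corresponds to employing the training formula to score the test set.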
The area under the curve (AUC) of the receiver operating characteristic (ROC) curve was used to evaluate the accuracy of the radiomics signature in both the training and validation sets. The calibration performance was assessed with the calibration curve for the continuous variables. Furthermore, decision curve analysis (DCA) was used to assess the clinical efficiency of the radiomics signature in classifying the tumor by calculating the net benefit.
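The net benefit used in decision curve analysis can be written out explicitly. The sketch below uses the standard definition, net benefit = TP/n - (FP/n) × pt/(1 - pt), where pt is the threshold probability; the labels and predicted probabilities are invented for illustration, and this is not the "dca.R" code used in the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    threshold must lie strictly between 0 and 1.
    """
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return true_pos / n - false_pos / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy that treats every patient as malignant."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical labels (1 = malignant) and model outputs for five patients.
y_true = [1, 1, 0, 0, 0]
y_prob = [0.9, 0.6, 0.7, 0.2, 0.1]

for pt in (0.3, 0.5):
    print(pt, round(net_benefit(y_true, y_prob, pt), 3),
          round(net_benefit_treat_all(y_true, pt), 3))
```

A decision curve plots these values over a range of thresholds; a model is considered clinically useful where its curve lies above both the treat-all curve and the treat-none line (net benefit of zero).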

Prediction Model Construction and Validation
One-way analysis of variance was performed for each potential predictor variable including clinical characteristics, radiological characteristics, and the radiomics signature in the training group, and then multivariable logistic regression was used on the preselected features with significant differences to obtain the predictors that were ultimately employed for model construction. As machine learning can provide highly accurate and reliable models to improve clinical oncology decisions, in this study, we chose a support vector machine (SVM) to build a combined prediction model based on the selected predictors. The performance of the model was validated with the training set and the test set separately, including calibration performance assessed with the calibration curve, diagnostic accuracy using ROC analysis and net benefit evaluated by DCA. In addition, the tumor prediction value of each patient in the training set and test set was calculated according to the model, the cut-off value of the ROC curve was used to divide parotid tumors into low-risk and high-risk groups, and the clinical effectiveness was determined by the actual tumor classification in the different groups. Figure 2 shows the workflow of this study.
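The risk stratification step can be sketched as below. The text does not state how the ROC cut-off was chosen, so this example assumes the Youden index (sensitivity + specificity - 1), which is one common choice; the labels and prediction values are hypothetical.

```python
def sensitivity_specificity(y_true, y_prob, cutoff):
    """Sensitivity and specificity when predictions >= cutoff count as malignant.

    Requires at least one malignant (1) and one benign (0) case.
    """
    tp = fn = tn = fp = 0
    for label, prob in zip(y_true, y_prob):
        if label == 1:
            if prob >= cutoff:
                tp += 1
            else:
                fn += 1
        else:
            if prob >= cutoff:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def youden_cutoff(y_true, y_prob):
    """Return the observed prediction value that maximizes the Youden index."""
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(y_prob)):
        sens, spec = sensitivity_specificity(y_true, y_prob, cutoff)
        j = sens + spec - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff


def risk_groups(y_prob, cutoff):
    """Assign each patient to the high-risk or low-risk group."""
    return ["high-risk" if p >= cutoff else "low-risk" for p in y_prob]


# Hypothetical training-set labels (1 = malignant) and model prediction values.
y_true = [0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.20, 0.35, 0.40, 0.70, 0.90]

cutoff = youden_cutoff(y_true, y_prob)
print(cutoff, risk_groups(y_prob, cutoff))
```

The cut-off is derived from the training set only and then applied unchanged to the test set, so that the test-set grouping remains an independent check.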

Statistical Analysis
SPSS 17.0 software (IBM, Chicago, IL, USA) was used to evaluate the normality of the distribution of the dataset using the Kolmogorov-Smirnov test and to perform the chi-square test for the categorical data. The t test was used for normally distributed variables, and the Mann-Whitney test was used for non-normally distributed variables. The conditional forward stepwise selection method was applied for the multivariable logistic regression model. MedCalc 15.8 software (MedCalc, Ostend, Belgium) was used to assess the ROC curves for the diagnostic performance of the models, and differences between the various AUCs were compared with the DeLong test. R statistical software was used for all other statistical analyses. The "mRMRe" and "gbm" packages were used for mRMR and GBDT analyses, respectively. DCA plots were generated with the "dca.R" package. Two-tailed p-values less than 0.05 were considered statistically significant.

RESULTS

Clinical Features
A total of 125 patients (63 male and 62 female) with PG tumors were included from two medical centers. The results showed that there was no significant difference between the included and excluded datasets (see Table 1 in the Supplementary Materials for details). Seventeen different tumor types were represented, most of which were benign tumors, especially pleomorphic adenomas and Warthin tumors. The details of the PG tumors are shown in Table 1. The clinical and radiological features of the patients in the training set and test set are shown in Table 2. No statistically significant difference was found for any of the data between the two groups. In the training set, the tumor location and borders and lymph node status were significantly different between benign and malignant PG tumors. In the test group, there were significant differences only in tumor location (Table 3). Figure 3 shows an example of radiological feature analysis of two cases.

Radiomics Signature Construction and Validation
A total of 378 radiomics features were extracted from a single ROI; thus, 756 radiomics features were extracted from each patient in the two scan phases. Among these features, 14 were retained after feature dimensionality reduction, including 5 features from NCCT images (HaralickCorrelation_angle45_offset7_P, Inertia_angle45_offset7_P, LongRunEmphasis_angle135_offset1_P, uniformity_P, Percentile5_P) and 9 features from CECT images (Correlation_AllDirection_offset7_SD_A, Correlation_angle90_offset1_A, GLCMEnergy_AllDirection_offset7_SD_A, GreyLevelNonuniformity_AllDirection_offset1_SD_A, HaralickCorrelation_AllDirection_offset1_SD_A, HighGreyLevelRunEmphasis_AllDirection_offset7_SD_A, HighIntensitySmallAreaEmphasis_A, kurtosis_A, ShortRunEmphasis_angle0_offset1_A). Logistic regression was used to construct the radiomics signature. The rad-scores were significantly different between the training group and the test group. The predictive performance was favorable in both groups, with AUCs of 0.772 and 0.771, specificities of 0.878 and 0.889, and sensitivities of 0.808 and 0.790, respectively (see Figure 4 for details). Detailed information about the dimensionality reduction procedures and results can be found in the Supplementary Materials.

Classification Model Construction and Validation
Analysis of variance and multivariable logistic regression analysis showed that location, lymph node status, and rad-score were independent predictors of benign and malignant tumors. See Table 4 for details. ROC curve analysis showed that the AUC of the SVM-based prediction model, radiomics signature, location and lymph node status in the training set was 0.854, 0.772, 0.679, and 0.632, respectively; the specificity was 0.869, 0.878, 0.734, and 0.773; and the sensitivity was 0.731, 0.808, 0.723 and 0.742. In the test set, the AUC was 0.835, 0.771, 0.653, and 0.608, respectively; the specificity was 0.741, 0.889, 0.852, and 0.812; and the sensitivity was 0.818, 0.790, 0.731, and 0.716 (Figure 5). We performed DCA for the SVM model, radiomics signature, location, and lymphadenopathy in the training and test sets. The results show that the SVM model has the greatest net benefit in both datasets. Additionally, we conducted calibration curve analysis of these continuous variables in both datasets, and they all showed good consistency, as shown in Figure 6. According to the optimal diagnostic cut-off value of the model (0.323), patients were divided into a low-risk group and a high-risk group. There were significant differences in the number of malignant PG tumors between the low-risk group and the high-risk group in both the training set and test set (P<0.0001), as shown in Figure 7.

DISCUSSION
We developed and validated a combined prediction model based on radiological data and CT-radiomics for differentiating benign and malignant PG tumors in two independent clinical cohorts. The combined model was constructed by incorporating the rad-score from the radiomics signature and two radiological features. The rad-score was calculated using the LR model, which was developed with 14 selected features, including five features from NCCT and nine features from CECT images of PG tumors. The combined SVM model outperformed the radiomics signature and individual radiological features in both the training and test groups. Thus, the proposed noninvasive combined prediction model is a potential preoperative evaluation tool in clinical practice.
Medical imaging is one of the major factors in clinical evaluation and treatment. However, traditional medical imaging is primarily a subjective or qualitative science. Radiomics, a relatively newly developed set of techniques, allows the high-throughput extraction of imaging features to quantify the different characteristics that oncologic tissues exhibit in medical imaging (20). Recently, there have been several studies on the application of radiomics to PG disease. Ajmi et al. (16) used dual-energy CT to investigate the classification of two benign parotid tumors, Warthin tumors and pleomorphic adenomas. Pallamar et al. (21) utilized standard MRI sequence-based textures to discriminate PG masses; however, only a rather small cohort of 38 patients with various pathological entities was enrolled, and regions of interest derived from only three slices rather than from the whole tumor were used for extracting a limited number of features. Another study (22) used only the arterial phase to distinguish pleomorphic adenoma from Warthin tumor, both of which are benign tumors and have similar clinical management. In addition, several studies have investigated the changes in the parotid morphology and secretory function induced by radiotherapy for head and neck cancers (23,24). Unlike the above-mentioned literature, we performed 3D whole tumor analysis to differentiate benign and malignant PG tumors with larger, multicentre datasets of both plain CT and contrast-enhanced CT images. Furthermore, machine learning methods were employed to ensure robustness and reproducibility, which makes this study more clinically practical.
In this study, 14 texture features were selected with a machine learning method from both NCCT and CECT images to develop a radiomics signature, including first-order and high-order features. Previous literature has demonstrated that radiomics features may reflect relevant and potentially important phenotypic information, such as intra-tumor heterogeneity, subsequently providing valuable information for diagnosis, prognosis and individualized therapy (25). Our results showed that several GLCM features survived as robust in the radiomics signature and participated in the construction of the prediction model. GLCM features describe the relationship between two neighboring pixels, which could potentially reflect local intratumor heterogeneity and is associated with tumor malignancy. It is hypothesized that intra-tumor heterogeneity can be exhibited at several spatial levels (macroscopic, cellular and molecular [genetic]), all leading to radiological differences; thus, radiological tumor phenotype characteristics may be useful for investigating the underlying evolving biology and have been reported to be associated with worse survival in tumor patients (26). Our radiomics result was consistent with that of Zhang (27), who found that texture features, mainly consisting of GLCM features, in malignant PG tumors were significantly different from those in benign tumors. It can be surmised that malignant tumors grow rapidly and are mostly infiltrating, resulting in insufficient blood supply, which can easily cause microbleeds and necrosis of the tumor. Therefore, the heterogeneity of malignant lesions is higher than that of benign lesions. Our study employed both plain CT and CECT to extract features to construct the radiomics signature, as PG tumors have different CT densities for different tumor histologies and are mainly supplied by arterial blood; this yields more texture information than the arterial phase alone, as used in a previous study (22).
More features were selected from CECT images than from NCCT images; thus, CECT may reflect more important information, and it may be speculated that unlike plain CT, CECT might also reflect some heterogeneous features associated with the tumor blood supply (3). Our results may further suggest that blood supply information is different between benign and malignant PG tumors, which may be because tumor capillaries generally have wider inter-endothelial junctions and a large number of discontinuous or absent basement membranes in malignant tumors, which result in different haemodynamic conditions in the arterial stage between benign and malignant tumors (28). Our integrated model showed the best performance, followed by the radiomics signature alone and then individual radiological features, which indicates that although the radiomics features of the tumor itself had better predictive ability than the radiological features themselves, extra-tumor radiological information such as lymphadenopathy is also important; only by combining these two complementary features could the model provide a precise evaluation of the entire tumor for management.
Our study still has several limitations. First, the sample size was relatively small, with 87 patients in the training group. However, 38 patients were enrolled from another independent medical center as the test group to investigate the models' reproducibility, and our results showed that the prediction model based on the training set was also stable for the test set. In the future, we will carry out further multicentre studies with a larger sample size. Second, the manual process of tumor segmentation and the reproducibility of radiomics features are debatable aspects of radiomics analysis, as there is some subjectivity involved in the delineation of tumor boundaries. However, a recent study on the robustness and reproducibility of radiomics features suggested that only reproducible features should be selected in building a radiomics model (29); this approach was employed in this study to ascertain the robustness of the features extracted from the tumors segmented independently by the two radiologists. In addition, ultrasound is sufficient for the primary diagnosis of most benign tumors. However, CT can be used as a complementary assessment tool in some cases, such as deep tissue involvement, recurrence, suspected malignancy or large tumors.

CONCLUSION
This study developed and validated a combined prediction model based on radiological data and CT radiomics features to distinguish benign and malignant PG tumors in two independent clinical cohorts; this model showed better prediction accuracy than the radiomics signature and radiological features alone. Thus, the proposed model could be used as a noninvasive prognostic or predictive biomarker for individualized evaluation and could help clinicians guide surgical decisions. Multicentre and prospective validation studies with larger datasets should be further implemented prior to practical application of the model in the clinic.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Zhejiang Provincial People's Hospital. Written informed consent for participation was not required for this study, in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
The study concept and design were carried out by YX, ZS, and XW. The literature search was performed by YX and YL. Clinical studies were conducted by YX, GS, and YL. Data and statistical analyses were performed by YX, PP, and ZS. XG and XW guarantee the integrity of the entire research study. All authors contributed to the article and approved the submitted version.

FUNDING
This work was supported by the Fund of Health Commission of Zhejiang Province (2017KY230, 2020KY402).