Application of Radiomics Analysis Based on CT Combined With Machine Learning in Diagnostic of Pancreatic Neuroendocrine Tumors Patient’s Pathological Grades

Purpose To evaluate the value of multiple machine learning methods in classifying the pathological grades (G1, G2, and G3) of pancreatic neuroendocrine tumors (PNETs) based on radiomics, and to identify the best machine learning method for this task. Materials and Methods A retrospective study was conducted on 82 patients with PNETs, all of whom had a definite pathological diagnosis and grade. Regions of interest were delineated manually on CT images, and radiomics features were extracted with LIFEx software. Sensitivity, specificity, area under the curve (AUC), and accuracy were used to evaluate the performance of the classification models. Results CT-based radiomics features combined with multi-algorithm machine learning showed a strong ability to identify the pathological grades of PNETs. The DC + AdaBoost, DC + GBDT, and Xgboost + RF combinations were particularly valuable for the differential diagnosis of the three pathological grades. The validation set AUCs of DC + AdaBoost were 0.82 (G1 vs G2), 0.70 (G2 vs G3), and 0.85 (G1 vs G3). Conclusion Radiomics features based on enhanced CT could differentiate between the pathological grades of PNETs. The combination of the Distance Correlation (DC) feature selection method and the Adaptive Boosting (AdaBoost) classifier shows good application prospects.


INTRODUCTION
Pancreatic neuroendocrine tumors (PNETs) originate from the neuroendocrine system of the pancreas and account for 2%-10% of pancreatic tumors (1,2). In recent years, research on PNETs has received increasing attention. Under the World Health Organization classification system, PNETs are divided into three pathological grades, G1, G2, and G3, based on the mitotic count and Ki-67 index. G1 tumors have a lower degree of malignancy, whereas G2 and G3 tumors have a higher degree of malignancy (3,4). Treatment options differ for tumors of different grades. Surgical resection is currently the most widely accepted treatment; for unresectable or metastatic PNETs, local treatment, chemotherapy, and targeted therapy are options (5)(6)(7). The malignancy grade of the tumor therefore also influences treatment planning.
Enhanced CT is currently a common imaging examination for pancreatic tumors and an important auxiliary tool for clinicians in evaluating tumor staging (8)(9)(10). However, enhanced CT cannot directly determine the malignancy grade of PNETs. Definitive grading still relies on pathological diagnosis, which adds to the difficulty of diagnosis and the burden on patients. With the development of medical imaging and post-processing, new methods are being explored to grade tumors non-invasively from image data. Image feature analysis has been used to predict the Gleason score of prostate cancer and the malignancy stage of colorectal cancer (11,12), which suggests that radiomics information can, to a certain extent, predict pathological information. In addition to pathological information, images contain a large amount of hidden, potentially valuable information. Notably, as an emerging non-invasive approach, radiomic analysis converts medical images into quantitative data that can be combined with other clinical information of patients, and it plays a crucial role in the diagnosis of multiple tumors (13).
Texture analysis, one type of radiomics analysis, quantifies texture parameters by post-processing conventional images, mathematically analyzing and calculating the intensity and spatial distribution of image pixels (14). The main image sources are CT, MRI, PET, and some post-processed images. Enhanced CT texture analysis can reflect the uneven distribution of contrast agent inside and outside blood vessels (15). In recent years, CT texture analysis has also been applied to the diagnosis, grading, and prognostic evaluation of various tumors, such as colorectal cancer (16)(17)(18).
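To illustrate how a second-order texture feature turns the spatial arrangement of pixel intensities into numbers, the minimal Python sketch below builds a symmetric gray-level co-occurrence matrix (GLCM) for a single 2D pixel offset and computes a few common GLCM features. It assumes the image has already been discretized into integer gray levels; the function names are hypothetical, and LIFEx, which works in 3D over several directions, may use different conventions (for example, the logarithm base of entropy or the exact homogeneity formula).

```python
import numpy as np


def glcm(image, levels, offset=(0, 1)):
    """Symmetric, normalised GLCM of a 2D integer image for one (row, col) offset."""
    image = np.asarray(image)
    dr, dc = offset
    rows, cols = image.shape
    P = np.zeros((levels, levels), dtype=float)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:  # neighbour must lie inside the image
                i, j = image[r, c], image[r2, c2]
                P[i, j] += 1
                P[j, i] += 1  # count each pair in both directions (symmetric GLCM)
    return P / P.sum()  # convert counts to joint probabilities


def glcm_features(P):
    """A few common GLCM features computed from a normalised co-occurrence matrix."""
    i, j = np.indices(P.shape)
    nz = P[P > 0]  # skip empty cells so log2 is defined
    return {
        "contrast": float(np.sum(P * (i - j) ** 2)),
        "energy": float(np.sum(P ** 2)),  # angular second moment
        "homogeneity": float(np.sum(P / (1.0 + np.abs(i - j)))),
        "entropy": float(-np.sum(nz * np.log2(nz))),
    }


# Usage: a 2x2 checkerboard of gray levels 0 and 1, horizontal neighbours.
feats = glcm_features(glcm([[0, 1], [1, 0]], levels=2))
print(feats)  # {'contrast': 1.0, 'energy': 0.5, 'homogeneity': 0.5, 'entropy': 1.0}
```

A perfectly uniform region gives zero contrast and entropy and maximal energy, while a checkerboard gives high contrast, which is the sense in which these parameters describe heterogeneity.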
Preoperative texture analysis has been reported to provide a novel and feasible method for the diagnosis and prognosis of PNET patients (19)(20)(21)(22)(23). However, although some studies have found that several texture parameters have potential value in PNET diagnosis and grading, the parameters explored so far are incomplete and need further investigation. To present a fuller picture of the ability of enhanced CT texture parameters to differentiate the pathological grades of PNETs, we conducted this retrospective study. To our knowledge, this is the first study to use five feature selection methods and nine classifiers to help identify PNETs of different pathological grades. The purpose of this study was to evaluate the ability of CT-based radiomics combined with machine learning to identify the pathological grade of PNETs, and to compare the performance of the five feature selection methods and nine classifiers.

Study Population
We retrospectively reviewed a computer database of PNET patients treated at our hospital from March 2011 to November 2019, which yielded 201 patients treated for PNETs with available clinical records and CT images. Clinical data, including age, sex, tumor location, tumor size, pathological type, date of baseline CT, secretory function, and date of surgery, were recorded in our computerized medical system. All CT images were exported from the hospital's PACS (Picture Archiving and Communication System). After initial evaluation of the images and patient profiles, 119 patients were excluded for the following reasons: no enhanced abdominal CT scan within 2 months before surgery (n=38); no definite pathological result (n=36); relevant tumor treatment history at other hospitals (n=41); and image quality not meeting the requirements (n=4). Finally, 82 patients were included in our study. This study was approved by the Ethics Administration Office of the West China Hospital, Sichuan University, and the requirement for informed consent was waived.

Texture Features Extraction
We retrieved the enhanced CT images of the study patients from the image archiving system as Digital Imaging and Communications in Medicine (DICOM) data (400-bit gray scale). These data were loaded into the personal-computer-based image feature extraction software LIFEx (v3.74, CEA-SHFJ, Orsay, France) for lesion segmentation and automated texture analysis. Regions of interest (ROIs) were drawn by hand around the tumor lesions on the enhanced CT fusion images (27,28). Arterial-phase images were used for all patients. ROIs were drawn independently by two radiologists, each of whom contoured the tumor tissue slice by slice; three-dimensional texture features were then generated automatically with the default settings (29).
Cystic, calcified, and vascular components of the tumors were excluded from the ROIs. To ensure objectivity, the two radiologists were blinded to each other's segmentations. A third radiologist reviewed the ROIs drawn by the first two radiologists and selected the more accurate ROI for texture feature generation. The texture features of all image data were then calculated and extracted automatically by LIFEx. A total of 40 texture features were extracted, including first-order features (minimum, maximum, mean, and standard deviation values, as well as histogram-based and shape-based features) and second- or higher-order features [gray-level co-occurrence matrix (GLCM), gray-level zone length matrix (GLZLM), neighborhood gray-level dependence matrix (NGLDM), and gray-level run length matrix (GLRLM)].
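The first-order and histogram-based features listed above depend only on the distribution of voxel intensities inside the ROI, not on their spatial arrangement. The sketch below is a minimal illustration, assuming a NumPy CT volume and a boolean ROI mask of the same shape; the fixed 10 HU bin width used for the histogram features is an assumption chosen for illustration, and the exact definitions in LIFEx (for example, whether kurtosis is reported as excess kurtosis) may differ.

```python
import numpy as np


def first_order_features(volume, mask, bin_width=10.0):
    """First-order intensity statistics of the voxels inside a boolean ROI mask."""
    vox = np.asarray(volume, dtype=float)[np.asarray(mask, dtype=bool)]
    mean, std = vox.mean(), vox.std()  # population standard deviation
    if std > 0:
        z = (vox - mean) / std
        skewness = float(np.mean(z ** 3))
        kurtosis = float(np.mean(z ** 4))  # Pearson kurtosis; subtract 3 for excess kurtosis
    else:  # constant ROI: higher moments are undefined
        skewness = kurtosis = float("nan")
    # Histogram features after discretising intensities into fixed-width bins (assumed width).
    bins = np.floor((vox - vox.min()) / bin_width).astype(int)
    _, counts = np.unique(bins, return_counts=True)
    p = counts / counts.sum()
    return {
        "min": float(vox.min()),
        "max": float(vox.max()),
        "mean": float(mean),
        "std": float(std),
        "skewness": skewness,
        "kurtosis": kurtosis,
        "entropy": float(-np.sum(p * np.log2(p))),
        "energy": float(np.sum(p ** 2)),
    }


# Usage on a synthetic 3D "CT" volume (values in HU) with a cubic ROI; not study data.
rng = np.random.default_rng(0)
ct = rng.normal(loc=40.0, scale=15.0, size=(20, 20, 20))
roi = np.zeros(ct.shape, dtype=bool)
roi[5:15, 5:15, 5:15] = True
print(first_order_features(ct, roi))
```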

Machine Learning
Building the machine learning models involved two key steps: algorithmic feature selection and model construction. The patients were randomly divided into a training set and a validation set at a ratio of 3:1. Because the number of features is large relative to the number of patients, the models are prone to overfitting, which degrades their predictive performance (30).
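The 3:1 split and the overfitting risk described above can be illustrated with synthetic data. The sketch below assumes scikit-learn; it draws 82 "patients" with 40 pure-noise features (not the study data), splits them 3:1 with `train_test_split` (stratifying by label is an assumption, not something the study states), and fits an unpruned decision tree. Because the features carry no signal, a perfect training AUC alongside a near-chance validation AUC is a symptom of overfitting.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n_patients, n_features = 82, 40
X = rng.normal(size=(n_patients, n_features))  # synthetic noise features (no real signal)
y = rng.integers(0, 2, size=n_patients)        # random binary labels, e.g. G1 = 0, G2 = 1

# 3:1 random split into training and validation sets; stratifying by label is assumed.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# An unpruned tree can memorise the training set even when the features are noise.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
train_auc = roc_auc_score(y_train, tree.predict_proba(X_train)[:, 1])
val_auc = roc_auc_score(y_val, tree.predict_proba(X_val)[:, 1])

print(len(y_train), len(y_val))  # 61 21
print(f"training AUC = {train_auc:.2f}, validation AUC = {val_auc:.2f}")
```

With `test_size=0.25`, scikit-learn rounds the validation set size up, so 82 patients split into 61 training and 21 validation cases.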
The purpose of feature selection is to reduce overfitting. Because many selection methods are currently available, we evaluated five: Distance Correlation (DC), Random Forest (RF), Least Absolute Shrinkage and Selection Operator (LASSO), Extreme Gradient Boosting (Xgboost), and Gradient Boosting Decision Tree (GBDT). The radiomics features selected by each method were then passed to each classification algorithm, yielding a discrimination model for the pathological grading of PNETs for every combination of selection method and classifier. The nine machine learning classifiers were linear discriminant analysis (LDA), Support Vector Machines (SVM), Random Forest (RF), Adaptive Boosting (AdaBoost), K-nearest neighbors (KNN), Gaussian Naive Bayes (GaussianNB), Logistic Regression (LR), Gradient Boosting Decision Tree (GBDT), and Decision Tree (DT). For each model, the machine learning process was repeated 10 times to estimate the distribution of classification performance.
Receiver operating characteristic (ROC) curves were plotted for every diagnostic model, and the discriminating power of each model was measured by the area under the ROC curve (AUC). The prediction target was the pathological grade (G1, G2, or G3). Sensitivity was defined as the proportion of positive samples correctly classified as positive, specificity as the proportion of negative samples correctly classified as negative, and accuracy as the proportion of all subjects correctly classified (true positives plus true negatives). The association between texture parameters was evaluated using the Pearson correlation coefficient.
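One way to implement the best-performing combination, Distance Correlation (DC) feature selection followed by an AdaBoost classifier repeated over 10 random splits, is sketched below. This is not the authors' code: the number of retained features (`k=5`), drawing a new stratified 3:1 split in each repetition, performing selection on the training set only, and the default AdaBoost hyperparameters are all assumptions. Distance correlation is computed from double-centered pairwise distance matrices (the sample statistic of Székely et al.), and scikit-learn supplies the classifier and the AUC.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def distance_correlation(x, y):
    """Sample distance correlation between two 1-D arrays (0 means independence)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    a = np.abs(x - x.T)  # pairwise distances within x
    b = np.abs(y - y.T)  # pairwise distances within y
    # Double-centre each distance matrix: subtract row and column means, add the grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = max((A * B).mean(), 0.0)  # squared distance covariance (clip rounding noise)
    dvar = np.sqrt((A * A).mean() * (B * B).mean())
    return 0.0 if dvar == 0 else float(np.sqrt(dcov2 / dvar))


def select_by_dc(X, y, k):
    """Indices of the k features with the highest distance correlation to the labels."""
    scores = np.array([distance_correlation(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]


def repeated_dc_adaboost(X, y, k=5, n_repeats=10, seed=0):
    """Validation AUCs of DC + AdaBoost over n_repeats random 3:1 splits."""
    aucs = []
    for r in range(n_repeats):
        X_tr, X_va, y_tr, y_va = train_test_split(
            X, y, test_size=0.25, stratify=y, random_state=seed + r)
        cols = select_by_dc(X_tr, y_tr, k)  # select features on training data only
        clf = AdaBoostClassifier(random_state=seed + r).fit(X_tr[:, cols], y_tr)
        aucs.append(roc_auc_score(y_va, clf.predict_proba(X_va[:, cols])[:, 1]))
    return np.array(aucs)


# Usage on synthetic data (not study data): only feature 0 carries signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(82, 40))
y = (X[:, 0] + 0.5 * rng.normal(size=82) > 0).astype(int)
aucs = repeated_dc_adaboost(X, y)
print(f"validation AUC: {aucs.mean():.2f} +/- {aucs.std():.2f}")
```

Selecting features inside each training split, rather than on all patients, keeps the validation set untouched by the selection step; swapping `select_by_dc` or `AdaBoostClassifier` for another selector or classifier gives the other combinations.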
A P value < 0.05 was considered statistically significant, and all P values were two-sided. All conventional statistical analyses were performed using SPSS software (Version 20.0, IBM Corporation, Armonk, NY, USA). The machine learning algorithms were implemented in Python using the scikit-learn (sklearn) package. The study process diagram is shown in Figure 1.
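The sensitivity, specificity, and accuracy defined in the Machine Learning section follow directly from confusion-matrix counts, and the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (the Mann-Whitney formulation). The library-free sketch below assumes binary labels for each pairwise comparison, with the higher grade encoded as the positive class 1; that encoding and the function names are hypothetical. For scored predictions, `auc_rank` should agree with scikit-learn's `roc_auc_score`.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) for binary labels."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity_specificity_accuracy(y_true, y_pred, positive=1):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn)  # positives judged positive
    specificity = tn / (tn + fp)  # negatives judged negative
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # all correct judgements
    return sensitivity, specificity, accuracy


def auc_rank(y_true, scores, positive=1):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(sensitivity_specificity_accuracy([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
# (0.6666666666666666, 0.5, 0.6)
print(auc_rank([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```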

RESULTS
Patient Population
Among the 82 patients (mean age, 52.6 years; range, 24-77 years) included in the present study, 52 were male and 30 were female. All patients had a pathological diagnosis of PNET. The median overall survival (OS) of this cohort was 58.2 months. Pathologically, 20 patients had G1, 33 had G2, and 29 had G3 tumors. Twenty-one patients had tumors with secretory function. Seven patients died, and the remaining 75 patients survived.
The sex, age, and survival status of the patients had no significant relationship with the pathological grade of the tumor (P > 0.05). Likewise, tumor location, tumor secretory function, vascular invasion, peripancreatic permeation, pancreatic duct dilatation, boundary form, calcification, pancreatic atrophy, and maximum tumor diameter were not related to the pathological grade of the tumor.
The baseline characteristics of all patients and lesions are summarized in Table 1.

Texture Features
According to the Pearson correlation coefficients of the extracted features, most texture features were independent or only weakly correlated, although a few features showed strong positive or strong negative correlations. The correlations among all texture features are shown as a heatmap in Figure 2. Each selection method selected different radiomics features in different comparisons. RF and LASSO selected the largest number of radiomics features, whereas Xgboost selected the fewest. The feature selection results of each method are shown in Table 2.
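The correlation screening summarized in the heatmap can be sketched with NumPy: `np.corrcoef` with `rowvar=False` treats each column as one feature, and the resulting matrix is what such a heatmap displays. The |r| ≥ 0.8 cut-off for a "strong" correlation and the feature names in the usage example are hypothetical; the study does not state a threshold.

```python
import numpy as np


def strongly_correlated_pairs(X, names, threshold=0.8):
    """Pearson correlation matrix of the columns of X and the pairs with |r| >= threshold."""
    r = np.corrcoef(X, rowvar=False)  # one row/column per feature
    pairs = [
        (names[i], names[j], float(r[i, j]))
        for i in range(len(names))
        for j in range(i + 1, len(names))  # upper triangle only: each pair once
        if abs(r[i, j]) >= threshold
    ]
    return r, pairs


# Usage with four hypothetical features measured on four lesions.
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([1.0, -1.0, -1.0, 1.0])  # uncorrelated with a
X = np.column_stack([a, 2 * a, -a, b])
names = ["feat_a", "feat_2a", "feat_neg_a", "feat_b"]
r, pairs = strongly_correlated_pairs(X, names)
for p, q, value in pairs:
    print(f"{p} vs {q}: r = {value:+.2f}")
```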

Model Performance
We made three comparisons: G1 vs G2, G2 vs G3, and G1 vs G3. For each comparison, 45 diagnostic models were built (five feature selection methods × nine classifiers), and the best model was the one with the highest validation set AUC. A combination of algorithms was eligible only if its models showed neither overfitting nor underfitting in any of the three comparisons; once a model built with a given combination overfitted or underfitted, that combination was not considered for the best model.
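The model-selection rule described above (45 selector-classifier combinations per comparison, exclusion of any combination that over- or underfits in any comparison, and choice of the remaining model with the highest validation AUC) can be written as a short, library-free Python sketch. The numeric thresholds are hypothetical: the study describes overfitting only qualitatively, as a validation AUC much lower than the training AUC, and the underfitting criterion used here is likewise an assumption.

```python
from itertools import product

SELECTORS = ["DC", "RF", "LASSO", "Xgboost", "GBDT"]
CLASSIFIERS = ["LDA", "SVM", "RF", "AdaBoost", "KNN", "GaussianNB", "LR", "GBDT", "DT"]
COMBINATIONS = list(product(SELECTORS, CLASSIFIERS))  # 5 x 9 = 45 models per comparison

# Hypothetical thresholds: the study gives no numeric cut-offs.
MAX_TRAIN_VAL_GAP = 0.15  # overfitting: validation AUC far below training AUC
MIN_TRAIN_AUC = 0.60      # underfitting: model cannot fit even the training data


def is_well_fitted(train_auc, val_auc):
    overfit = train_auc - val_auc > MAX_TRAIN_VAL_GAP
    underfit = train_auc < MIN_TRAIN_AUC
    return not (overfit or underfit)


def best_models(results):
    """results[comparison][(selector, classifier)] = (train_auc, val_auc).

    A combination is eligible only if it is well fitted in every comparison;
    the best model per comparison is the eligible one with the highest validation AUC.
    """
    eligible = [
        combo for combo in COMBINATIONS
        if all(combo in res and is_well_fitted(*res[combo]) for res in results.values())
    ]
    if not eligible:
        return {}
    return {
        comparison: max(eligible, key=lambda combo: res[combo][1])
        for comparison, res in results.items()
    }
```

Requiring eligibility across all three comparisons is what excludes a combination that looks strong for G1 vs G2 but overfits for G2 vs G3.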
For the G1 vs G2 comparison, the AUC values of most models were between 0.60 and 0.82. A few models had higher AUC values for G1 vs G2 but overfitted in the G2 vs G3 comparison (validation set AUC much lower than training set AUC). For example, RF + AdaBoost, Xgboost + AdaBoost, Xgboost + GBDT, GBDT + RF, GBDT + AdaBoost, and GBDT + GBDT all had high AUC values for G1 vs G2 but overfitted for G2 vs G3 (Figure 3). The models built with DC + AdaBoost, DC + GBDT, and Xgboost + RF were very valuable for differentiating the three pathological grades of PNET, and none of them showed overfitting or underfitting. The performance of these three models is shown in Table 3. DC + AdaBoost performed best in all three comparisons (G1 vs G2, G2 vs G3, and G1 vs G3). The ROC curves of the DC + AdaBoost models are shown in Figure 4.
The detailed AUC values for all models were shown in Supplementary Material 1.

DISCUSSION
In this study, we constructed diagnostic models from radiomics features by cross-combining five feature selection methods and nine classification methods. We evaluated the ability of these models to identify the pathological grade of PNETs and explored the potential value of machine learning combined with radiomics in the diagnosis of PNETs.
One of the main findings of this study is that the model constructed with the Distance Correlation feature selection method and the Adaptive Boosting classifier had a strong ability to predict the pathological grade of PNETs. The validation set AUCs of DC + AdaBoost were 0.82 (G1 vs G2), 0.70 (G2 vs G3), and 0.85 (G1 vs G3). These results indicate that the model performed satisfactorily in indicating the pathological grade of PNETs (G1, G2, and G3). The DC + GBDT and Xgboost + RF models also showed good diagnostic performance. At present, CT is widely used in the diagnosis, monitoring, and prognostic evaluation of PNETs. By using contrast media, enhanced CT can further differentiate tumors from normal tissues, especially for detecting vascular proliferation and small lesions. It can improve the accuracy of clinical staging of cancer patients and help formulate treatment strategies. Enhanced CT is particularly suitable for breast and abdominal tumors and has certain diagnostic advantages (31)(32)(33). Texture analysis uses mathematically defined parameters to estimate the distribution of gray levels, roughness, and regularity within a lesion, so analyzing these parameters allows some image features of the tumor to be quantified. The value of texture analysis for diagnosis and prognosis has been confirmed in combination with ultrasound, CT, MRI, and PET/CT. Previous studies have reported the value of texture analysis for the diagnosis and prognosis of various cancers, including lung, gastric, breast, and rectal tumors (34)(35)(36)(37). Heterogeneity is recognized as a characteristic of malignant tumors (38) and may be related to genetic changes, the tumor microenvironment, and other factors that distinguish tumors from normal tissues. Previous studies have also shown that CT texture features can reflect the microenvironment of tumor vessels (39).
A negative finding is that indicators such as pancreatic duct dilatation, peripancreatic infiltration, vascular invasion, pancreatic atrophy, and clear pancreatic boundaries did not indicate the pathological grade of the tumors, which contradicts some previous studies (20,21,40). We do not dispute that, because of their higher malignancy, high-grade PNETs are expected to grow faster and be more invasive than low-grade PNETs. Although many studies have found some correlation between the pathological grade and imaging features of PNETs, there is not enough clear evidence that tumor morphology, such as pancreatic duct dilatation and vascular invasion, can clearly indicate the pathological grade of PNETs. However, previous studies have found that G1/G2 tumors have clearer boundaries than G3 tumors, and that larger tumor diameter, pancreatic duct dilatation, vascular invasion, and other manifestations are more common in G2/G3 patients, with statistically significant correlations (19,41). The discrepancy may be due to the limited sample size of this study, and more evidence from similar studies is needed to support these views. At present, the pathological grade of PNETs still has to be determined from pathological sections. Finding the optimal radiomics features for a machine learning algorithm is difficult but necessary, and an appropriate selection method plays an important role in classifier performance. Previous radiomics studies used many feature selection methods, such as the Mann-Whitney U test with the AUC of the ROC curve, random forest, and Student's t-test with recursive feature elimination (42,43). In our study, the number of extracted features was large, which increases the chance of selecting the optimal features but also makes selection more difficult. We therefore used five different artificial intelligence methods for feature selection, rather than the single selection method used in previous studies.
Three of these feature selection methods, LASSO, DC, and GBDT, have been used in previous studies (44); on this basis, we added the RF and Xgboost methods.
Xgboost is an improved implementation of the GBDT algorithm and can be used for both classification and regression problems (45).
Our feature selection results show that different selection algorithms did not select identical features. Some algorithms, such as LASSO and RF, selected many features, whereas others, such as Xgboost, selected few; however, some features were selected by most algorithms. Maxvalue describes the maximum intensity value within the tumor image and is a parameter based on an overall evaluation of the lesion. In theory, high-grade tumors have more angiogenesis and relatively more complex overall characteristics, which partly explains why maxvalue can describe high-grade PNETs (23). Indeed, maxvalue was chosen by almost all the selection algorithms. Many previous studies have found that the pathological grade of PNETs is closely related to the HISTO parameters (skewness, kurtosis, entropy, and energy) (21,23,40,46). However, in our study, only skewness and kurtosis were selected by the algorithms, suggesting that they are related to the pathological grade. We also found that SHAPE_Volume (# ml), which is likewise a first-order parameter, has an indicative     of spatial rate of change in intensity (27). These parameters reflect differences in gray scale and voxel manifestations among tumors of different pathological grades, allowing the imaging manifestations of malignant tumors to be analyzed in more detail.
In addition to finding the best diagnostic model, we found that some models performed poorly. A previous study used LDA and SVM classifiers to differentiate glioblastoma (GBM) from anaplastic oligodendroglioma (AO) and reported testing set AUCs all above 0.90 (47). In our study, however, these two classification algorithms did not show good diagnostic performance. In particular, models using the SVM algorithm often showed overfitting or underfitting. The models using the other classification algorithms performed better than all SVM-based models, and changing the selection algorithm brought only limited improvement. The SVM algorithm is usually used for machine learning problems with small samples, which seems a good fit for the small sample size of this study, but in practice it showed poor diagnostic performance here.
Our research used three-dimensional texture analysis, which can provide more information than two-dimensional analysis. Compared with previous studies based on conventional image and imaging parameters, this study introduced more comprehensive parameters such as GLCM, GLRLM, and GLZLM features, whereas many previous studies examined only some HISTO parameters and the morphological characteristics of tumors. We used almost all the machine learning methods involved in current research, which allows the performance of the various algorithm combinations to be compared directly and the best combination to be identified. Another advantage of our study is that complete preoperative imaging, clinical, and pathological data were available for reference. Moreover, the software used in this study to extract texture parameters and to build machine learning prediction models is freely available, which makes it easier for other researchers to replicate our work.
This study has some limitations. First, the retrospective design may lead to selection bias. Second, the sample size was small, only patients who underwent abdominal enhanced CT were included, and the numbers of patients in each pathological grade were unbalanced, which may introduce further selection bias; future research with a larger sample size is needed to evaluate the value of machine learning and radiomics in describing tumor pathological grade. Third, we defined only a rough time window for pre-treatment enhanced CT, so examinations were performed at different time points, which may have introduced deviations in the texture features. Fourth, only texture features extracted from arterial-phase CT images were used to build the prediction models; images from other phases were not explored. Finally, owing to the lack of external validation, we cannot ensure that our models will achieve the same diagnostic performance on external data sets.

CONCLUSION
Preoperative enhanced CT texture analysis has potential application in predicting the pathological grade of PNETs. Radiomics analysis is expected to help radiologists obtain more information from images.

DATA AVAILABILITY STATEMENT
The dataset generated for this study can be obtained from the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Administration Office of the West China Hospital, Sichuan University. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.

AUTHOR CONTRIBUTIONS
TZ and XM are responsible for research design and project management. HX is responsible for data collection. YZ and XL are responsible for image processing and feature extraction. CC is responsible for article writing. YL and XZ are responsible for statistical analysis. All authors contributed to the article and approved the submitted version.