MRI-Based Radiomics Models for Predicting Risk Classification of Gastrointestinal Stromal Tumors

Background We developed and validated four MRI-based radiomics models to preoperatively predict the risk classification of gastrointestinal stromal tumors (GISTs). Methods In this retrospective study, forty-one patients (low-risk = 17, intermediate-risk = 13, high-risk = 11) underwent MRI before surgery between September 2013 and March 2019. The Kruskal–Wallis test with Bonferroni correction and a variance threshold were used to select appropriate features, and a Random Forest model (three-class classification) was used to select features that discriminate among high-risk, intermediate-risk, and low-risk GISTs. The predictive performance of the Random Forest models was estimated by 5-fold cross-validation (5FCV) using the receiver operating characteristic (ROC) curve, summarized as the area under the ROC curve (AUC). The AUC, accuracy, sensitivity, and specificity for risk classification were reported. Linear discriminant analysis (LDA) was used to assess the discriminative ability of these radiomics models. Results High-risk, intermediate-risk, and low-risk GISTs were well classified by the radiomics models: the micro-average of the ROC curves (AUC) was 0.85, 0.81, 0.87, and 0.94 for T1WI, T2WI, ADC, and the combined three MR sequences, respectively. The AUCs for the diagnosis of high-risk, intermediate-risk, and low-risk GISTs were 0.85, 0.75, and 0.82 for T1WI; 0.69, 0.78, and 0.78 for T2WI; 0.85, 0.77, and 0.80 for ADC; and 0.96, 0.92, and 0.81 for the combined three MR sequences, respectively. In addition, LDA showed that the different risk classes of GISTs were correctly classified by radiomics analysis (61.0% for T1WI, 70.7% for T2WI, 83.3% for ADC, and 78.9% for the combined three MR sequences).
Conclusions Radiomics models based on a single sequence and on the combined three MR sequences can serve as a noninvasive method to evaluate the risk classification of GISTs, which may help guide the treatment of patients with GISTs in the future.


INTRODUCTION
Gastrointestinal stromal tumors (GISTs) are rare soft-tissue sarcomas that can occur anywhere in the gastrointestinal tract, affecting 6-20 people per million per year in Western and Asian countries (1,2). GISTs originate from the interstitial cells of Cajal (ICC) or common precursor cells (3). Surgical resection is the gold standard for the treatment of GISTs, but as the risk class of the tumor increases, the risk of postoperative recurrence also increases (4,5). At present, the recognized standard for risk classification of GISTs is the National Institutes of Health (NIH) criteria (revised in 2008), which classify tumors as high-risk, intermediate-risk, low-risk, or very low-risk according to tumor size, mitotic index, and primary tumor site (6). Studies have shown that the NIH classification has important prognostic value (5). The survival rate of high-risk GISTs patients is significantly worse than that of intermediate-risk or low/very low-risk GISTs patients (7). However, pathological evaluation is performed postoperatively on surgical specimens because it is difficult to determine the mitotic count before surgery, so it remains difficult to classify the risk of GISTs preoperatively. For high-risk GISTs patients, previous studies have shown that preoperative targeted drug therapy, such as imatinib, can shrink the tumor, limit the scope of surgical resection, and improve the prognosis of patients with GISTs (8,9). Therefore, accurate preoperative assessment of the risk of GISTs has high clinical value, as it can provide important clues for predicting the prognosis of the disease and for the use of adjuvant chemotherapy.
In recent years, with the development and application of radiomics, hundreds of standardized and quantifiable imaging features can be extracted from CT/MRI images to comprehensively assess the biological behavior of a tumor, which may improve the accuracy of diagnosis, prognosis, and prediction (10). Previous studies used the subjective manifestations of lesions on CT images (tumor size, shape, CT density, enhancement mode, etc.), CT functional parameters, fractal analysis, and CT-based radiomics to assess the risk classification of GISTs (11)(12)(13)(14). Subjective imaging signs show large interobserver differences in judging GISTs risk classification, owing to the differing experience of radiologists and the poor repeatability of subjective signs. In addition, CT fractal analysis can be influenced by various factors such as noise, window width and level, and software settings (15). CT-based radiomics has obtained good results in the risk classification of GISTs; however, compared with multi-sequence MR imaging, it provides relatively limited texture features. Studies have used SUVmax in PET/CT to assess the risk of GISTs, but its clinical application value is relatively low because of the high price and long examination time (16). MR imaging, which unlike CT involves no ionizing radiation, can provide more lesion information through multi-sequence imaging when evaluating the biological behavior of abdominal tumors (17). Diffusion-weighted imaging (DWI) can reflect the diffusion and restricted movement of water molecules. Some studies have shown that DWI texture features can be used as a biological indicator to evaluate the heterogeneity and prognosis of metastatic GISTs (18). Therefore, this study used DWI texture analysis to study the heterogeneity of GISTs in water molecular diffusion. For comparison, we also studied the effectiveness of the T1 and T2 sequences and of the combined three MR sequences for the risk classification of GISTs.
The purpose of this study was to establish MRI-based radiomics models for noninvasive assessment of GISTs risk classification.

Study Participants
The institutional review board of our hospital approved this retrospective study and waived the requirement to obtain patient approval or written informed consent for the review of medical records or images.
We enrolled 47 patients with gastrointestinal stromal tumors (GISTs) from our center between September 2013 and March 2019. The inclusion criteria were as follows: (1) patients who underwent surgery for GISTs with curative intent; (2) patients who underwent MR imaging less than 15 days before surgical resection; (3) patients with complete clinicopathologic data. The exclusion criteria were as follows: (1) patients who received imatinib or another tyrosine kinase inhibitor as neoadjuvant therapy before surgery (n=4); (2) patients whose ADC sequence images were missing (n=2). Finally, 41 patients were included in our study.
Demographic and clinicopathologic data, including age, gender, primary tumor site, tumor size (maximum diameter), and mitotic count, were derived from medical records. The NIH modified criteria (Table 1) were used to stratify the malignant potential of GISTs on the basis of the clinical and postoperative histological indices, as the reference for verifying our model. Studies have shown that, according to the NIH standards, there is no significant prognostic difference between the very low-risk group and the low-risk group (7). Therefore, we combined the very low-risk and low-risk groups into one group (low-risk group). Finally, our study included three groups (low-risk, intermediate-risk, and high-risk).

MR Radiomics Analysis
After the patients' MR images were collected, the regions of interest (ROIs) were contoured manually on each of the three MR sequences (T1WI, T2WI, and ADC). Two experienced abdominal radiologists, one with 10 years of experience (who delineated the ROI) and one with 15 years of experience (who checked the ROI), outlined each slice of the lesion to form a 3D ROI, which was saved in a 3D format using ITK-SNAP (version 3.8.0, www.itksnap.org, Figure 1) (19). The imaging features were then extracted using AK (Artificial Intelligent Kit, GE Healthcare, China). In total, 396 features were extracted from the analyzed volumes. These parameters included histogram parameters (Energy, Entropy, MaxIntensity, MinIntensity, MeanValue, FrequencySize, VolumeCount, etc.); texture features (Skewness, Kurtosis, Correlation, Cluster Shade, Cluster Prominence); form factor parameters (Sphericity, Surface area, Compactness, Maximum 3D diameter, Spherical disproportion); and second-order features (gray level co-occurrence matrix, GLCM; gray level size zone matrix, GLSZM; gray level run-length matrix, GLRLM). All these radiomics features were further analyzed within the entire cohort of 41 patients. Figure 2 describes the texture parameter extraction process.

Statistical Analysis
Continuous variables are summarized with medians and ranges; categorical variables are described with frequencies and percentages. The patients' clinical characteristics among the three groups were analyzed with the chi-square test.
Differences in radiomics features among the three groups were assessed with the Kruskal-Wallis test in the T1WI, T2WI, and ADC sequences; p values were adjusted using Bonferroni correction, and p values less than 0.017 (0.05/3) were considered statistically significant. The variance threshold algorithm was then used to remove radiomics features with low variance. Thus, an appropriate feature set was obtained by two feature selection methods (Kruskal-Wallis test and variance threshold). Using the selected radiomics features, a Random Forest model (three-class classification) was used to determine whether the selected features could distinguish the different risk classifications. The predictive performance of the Random Forest models was estimated by 5-fold cross-validation (5FCV): the cohort was randomly split into five subsamples, one of which formed the test dataset for verifying the effectiveness of the model while the others formed the training dataset for fitting the risk classification model. The cross-validation process was repeated five times, with each of the five subsamples used as the validation data exactly once. Performance was estimated using the receiver operating characteristic (ROC) curve, summarized as the area under the ROC curve (AUC). To select the best features, the features were ranked by AUC, and the 30 most important features were used to train the classifier. In addition, the AUC, accuracy, specificity, and sensitivity at the best cut-off point, together with the 95% confidence interval, are reported. Because the outcome in our study has multiple categories, we used micro-averaging of the ROC, which aggregates the contributions of every sample in the dataset regardless of category, to evaluate the overall effectiveness of the model.
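The two filtering steps described above (Kruskal-Wallis test at the Bonferroni-adjusted threshold of 0.05/3, followed by a variance threshold) can be illustrated with a minimal sketch. This is not the authors' R code: the function names, the toy feature values, and the variance cut-off `min_variance` are hypothetical, since the study does not report the cut-off it used. The p value relies on the chi-square approximation, which for three groups (2 degrees of freedom) reduces to exp(-H/2) and is only approximate for small groups.

```python
import math
from statistics import pvariance


def kruskal_wallis_p(groups):
    """Kruskal-Wallis p value for three groups (chi-square approximation, df = 2)."""
    if len(groups) != 3:
        raise ValueError("this sketch hard-codes df = 2, i.e. exactly three groups")
    values = sorted(v for g in groups for v in g)
    n = len(values)
    rank, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        rank[values[i]] = (i + 1 + j) / 2  # mean of the 1-based ranks i+1..j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:  # every value identical: no evidence of a difference
        return 1.0
    rank_sums = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_sums - 3 * (n + 1)
    h = max(h / correction, 0.0)  # tie-corrected H statistic
    return math.exp(-h / 2)  # chi-square survival function when df = 2


def select_features(features, labels, alpha=0.05 / 3, min_variance=1e-3):
    """Keep features that pass both the Kruskal-Wallis and the variance filter.

    min_variance is a hypothetical cut-off chosen for this illustration.
    """
    classes = sorted(set(labels))
    kept = []
    for name, values in features.items():
        groups = [[v for v, y in zip(values, labels) if y == c] for c in classes]
        if kruskal_wallis_p(groups) < alpha and pvariance(values) > min_variance:
            kept.append(name)
    return kept


# Toy data (not study data): 5 patients per risk group, 3 made-up features.
labels = ["low"] * 5 + ["intermediate"] * 5 + ["high"] * 5
features = {
    "separating": [float(k) for k in range(1, 16)],
    "unrelated": [1, 6, 11, 2, 7, 12, 3, 8, 13, 4, 9, 14, 5, 10, 15],
    "near_constant": [0.0001 * k for k in range(1, 16)],
}
selected = select_features(features, labels)
print(selected)
```

In this toy example only the first feature survives: the second does not differ among the groups at the adjusted threshold, and the third separates the groups but has almost no variance.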
At the same time, LDA (multi-class classification model) with leave-one-out cross-validation (LOOCV) was performed to evaluate and verify the discriminative ability of the single-sequence and combined-sequence models on the basis of the radiomics features selected by the above method.
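To show why an analysis of this kind reports both an accuracy for the originally grouped cases and a cross-validated accuracy, the sketch below compares resubstitution accuracy with leave-one-out accuracy. It is an illustration under stated assumptions, not the analysis performed in this study: a nearest-centroid classifier is swapped in for LDA (the two coincide only when the classes share a spherical covariance and equal priors), and the one-dimensional feature values and all function names are invented for the example.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Predict the class whose mean feature vector is closest to x (Euclidean)."""
    sums, counts = {}, {}
    for row, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(row))
        for k, v in enumerate(row):
            acc[k] += v
        counts[y] = counts.get(y, 0) + 1

    def squared_distance(y):
        return sum((s / counts[y] - v) ** 2 for s, v in zip(sums[y], x))

    return min(sorted(sums), key=squared_distance)


def resubstitution_accuracy(xs, ys):
    """Accuracy when each case is classified by a model fitted on all cases."""
    hits = sum(nearest_centroid_predict(xs, ys, x) == y for x, y in zip(xs, ys))
    return hits / len(ys)


def loocv_accuracy(xs, ys):
    """Accuracy when each case is classified by a model fitted without it."""
    hits = 0
    for i in range(len(ys)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]
    return hits / len(ys)


# Toy data (not study data): one feature, three cases per risk group.
xs = [[0.0], [1.0], [3.2], [2.8], [4.0], [5.0], [6.0], [7.0], [8.0]]
ys = ["low"] * 3 + ["intermediate"] * 3 + ["high"] * 3

resub = resubstitution_accuracy(xs, ys)
loo = loocv_accuracy(xs, ys)
print(f"originally grouped: {resub:.3f}, leave-one-out: {loo:.3f}")
```

Because the held-out case no longer contributes to its own class centroid, the leave-one-out estimate is lower than the resubstitution estimate on this toy data (7/9 versus 8/9), which is the same direction of difference seen between originally grouped and cross-validated accuracies.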
Statistical analyses for the present study were performed with R (version 3.5.1). A two-sided p value < 0.05 indicated statistical significance.

Patient Characteristics
The forty-one patients comprised 19 men and 22 women, with 32 gastric and 9 non-gastric tumors; 17 were low-risk (66.4 years, range 49-84 years), 13 were intermediate-risk (71.2 years, range 59-85 years), and 11 were high-risk (65.0 years, range 47-87 years). Tumor size and mitotic count differed significantly among the three groups (P < 0.001 and P = 0.002, respectively), but no significant differences were found in age (P = 0.249), gender (P = 0.360), or primary tumor site (P = 0.252). The clinicopathologic characteristics of the three groups, including gender, age, primary tumor site, risk classification, and mitotic count, are summarized in Table 2.

Selection of Extracted Radiomics Features and Performance of Risk Classification
For the MR images of the three sequences (T1WI, T2WI, and ADC), the thirty most important parameters for each sequence, ranked by their contribution to classification in the Random Forest, are shown in the Supplement. Radiomics features such as Grey Level Nonuniformity, Run Length Nonuniformity, and Volume were significantly different among the three GISTs risk groups on all three sequences. The effectiveness of the features selected after 5FCV for the risk classification, as assessed by ROC curves, is shown in Tables 3, 4 and Figure 3. LDA was then used to assess the discriminative ability of these selected radiomics features, and LOOCV was used to correct the result (Figure 4). For the T1WI sequence, 61.0% of the originally grouped cases (three GISTs risk groups) were correctly classified, and 58.5% of the cross-validated grouped cases were correctly classified. For the T2WI sequence, 70.7% of the originally grouped cases and 58.5% of the cross-validated grouped cases were correctly classified. For the ADC sequence, 83.3% of the originally grouped cases and 66.2% of the cross-validated grouped cases were correctly classified. When the three MR sequences were combined, 78.9% of the originally grouped cases and 65.0% of the cross-validated grouped cases were correctly classified.
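The micro-averaged AUC used to summarize the three-class ROC results can be illustrated with a small sketch. It treats every (patient, class) pair as one binary decision, pools all pairs, and computes a single AUC as the probability that a true pair is scored above a false pair. The predicted probabilities below are invented for illustration and the function names are hypothetical; the study's own values come from its Random Forest models.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def micro_average_auc(y_true, proba, classes):
    """Pool every (sample, class) pair into one binary problem, then take its AUC."""
    labels, scores = [], []
    for y, row in zip(y_true, proba):
        for c, p in zip(classes, row):
            labels.append(1 if y == c else 0)  # 1 only for the sample's true class
            scores.append(p)
    return binary_auc(labels, scores)


# Toy predictions (not study data): one row of class probabilities per patient.
classes = ["low", "intermediate", "high"]
y_true = ["low", "intermediate", "high", "high"]
proba = [
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.3, 0.6],
    [0.5, 0.1, 0.4],
]
micro_auc = micro_average_auc(y_true, proba, classes)
print(micro_auc)
```

Pooling in this way weights every patient equally, so a class with more patients contributes more pairs to the micro-average than a smaller class does.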

DISCUSSION
In this study, we evaluated the diagnostic value of radiomics features extracted from MR images in identifying the risk of GISTs (low-risk, intermediate-risk, and high-risk). Previous studies have mostly used imaging findings, such as necrotic cysts, to assess the risk of GISTs, but precise staging of stromal tumor aggressiveness in this way varies greatly between observers and has limited accuracy (20)(21)(22). We found that some radiomics features were significantly different among the three risk classifications of GISTs. Based on these radiomics features, the ROC curves yielded good AUCs, indicating that our MR-based radiomics method performed well in distinguishing low-risk, intermediate-risk, and high-risk GISTs. Considering that all three sequences in our study are routinely used in our center, our results have good prospects for clinical application. We found several interesting radiomics features that differed significantly among the three levels of risk of GISTs. For example, on the T1WI, T2WI, and ADC sequences, Grey Level Nonuniformity, Volume, Run Length Nonuniformity, and Frequency Size differed significantly among the three levels of risk of GISTs. These findings might indicate greater textural heterogeneity in high-risk GISTs, which is consistent with previous studies (23,24). Ren et al. aimed to predict the malignant potential of GISTs before surgery through CT texture features and found that GISTs with high malignant potential demonstrated markedly higher heterogeneity than GISTs with low malignant potential (23). Furthermore, Yang et al. constructed a nomogram based on MR radiomics features such as RunLengthNonUniformity, ShortRunHighGrayLevelEmphasis, and OriginalFirstorderMinimum. The calculated scores demonstrated that GISTs with high malignant potential were significantly more heterogeneous than those with low malignant potential (24).
In addition, among the features we selected, the parameters reflecting the shape of the lesion on MRI, such as volume and maximum 3D diameter, have value for guiding GISTs risk classification, which is consistent with tumor size being an important factor in assessing the malignant potential and prognosis of GISTs (see Table 1).
We found that the ADC sequence outperformed the T1WI and T2WI sequences in evaluating the risk classification of GISTs. For the ADC sequence, the AUC of the high-risk group was 0.85, which indicated that the ADC sequence was effective for identifying the high-risk group. To date, some studies have evaluated the discriminative ability of different MRI sequences on the basis of radiomics, and some of them have reported a favorable predictive value of ADC in radiomics analyses for discriminating benign from malignant tumors (25,26). Radiomics models based on the ADC sequence have been applied successfully in the classification of meningioma, cholangiocarcinoma, and glioma (27)(28)(29)(30).
In our study, we showed that the radiomics model combining three MR sequences performed well in diagnosing the different risk classifications of GISTs (micro-average = 0.94), especially in identifying high-risk GISTs (AUC = 0.96). Wang et al. (31) used CT images to establish a predictive model to distinguish high from low malignant potential of GISTs, and the AUC of that model was 0.882. These results suggest that feature extraction from multi-sequence MR images can provide more texture information about lesions, which helps improve the ability of the model to evaluate the risk of GISTs (32)(33)(34). In addition, our study employed a statistical method that is uncommon in this setting (LDA) to assess the diagnostic ability of the radiomics models as a supplement to the ROC curve, which provides a new perspective for evaluating radiomics data.
However, although the results were encouraging, our study had several limitations. First, the sample size of our study was relatively small compared with the large number of extracted radiomics features. We therefore used Random Forest to limit overfitting in the model derivation process (35), and this can be improved in the future by increasing the sample size; large-scale, prospective, multi-center studies are needed to validate our results. Second, because of the large slice thickness and interslice gap in MR imaging, small tumors are prone to the partial volume effect. Therefore, tumors smaller than 2 cm were excluded from our study. In the future, we will consider reducing the slice thickness to facilitate the inclusion of small tumors with a maximum diameter of 1.0-2.0 cm. Third, we did not consider gene mutations in this study, such as KIT and PDGFRA mutations (36,37), which are essential for diagnosing some difficult cases, predicting the therapeutic effect of targeted drugs, and guiding medical decision-making. Therefore, we will consider genomic characteristics to build a more comprehensive radiogenomics model in the future.
FIGURE 4 | Discriminant function analysis based on radiomics features extracted from three MRI sequences for the GISTs risk classification. (A) T1WI, 61.0% of the originally grouped cases (three GISTs risk classes) were correctly classified; (B) T2WI, 70.7% of the originally grouped cases were correctly classified; (C) ADC, 83.3% of the originally grouped cases were correctly classified; (D) combined three MR sequences, 78.9% of the originally grouped cases were correctly classified.

CONCLUSION
In conclusion, our research suggests that radiomics models based on a single sequence and on the combination of multiple sequences can help classify the risk of GISTs. As a noninvasive and reproducible method, radiomics analysis may provide a potential biomarker for GISTs. If eventually put into practice, it may substantially change the diagnosis and clinical treatment of GISTs, although there is still a long way to go.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Committee of Shaoxing People's Hospital. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements. Written informed consent was not obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.