Machine Learning-Based Analysis of Magnetic Resonance Radiomics for the Classification of Gliosarcoma and Glioblastoma

Objective: To identify optimal machine-learning methods for the radiomics-based differentiation of gliosarcoma (GSM) from glioblastoma (GBM). Materials and Methods: This retrospective study analyzed cerebral magnetic resonance imaging (MRI) data of 83 patients with pathologically diagnosed GSM (58 men, 25 women; mean age, 50.5 ± 12.9 years; range, 16-77 years) and 100 patients with GBM (58 men, 42 women; mean age, 53.4 ± 14.1 years; range, 12-77 years), who were randomly divided into training and validation sets. Radiomics features were extracted from the tumor mass and peritumoral edema. Three feature selection methods (the least absolute shrinkage and selection operator [LASSO], Relief, and random forest [RF]) and three classifiers (adaboost classifier [Ada], support vector machine [SVM], and RF) were combined and evaluated for their performance in distinguishing GSM from GBM. The area under the receiver operating characteristic curve (AUC) and accuracy (ACC) of each combination were analyzed. Results: Based on tumor mass features, LASSO feature selection combined with the SVM classifier achieved the highest AUC (0.85) and ACC (0.77) in the validation set, followed by Relief + RF (AUC = 0.84, ACC = 0.72) and LASSO + RF (AUC = 0.82, ACC = 0.75). Based on peritumoral edema features, Relief + SVM had the highest AUC (0.78) and ACC (0.73) in the validation set. Regardless of the method, tumor mass features significantly outperformed peritumoral edema features in differentiating GSM from GBM (P < 0.05). Furthermore, the sensitivity, specificity, and accuracy of the best radiomics model were superior to those of the neuroradiologists. Conclusion: Our radiomics study identified LASSO feature selection combined with the SVM classifier as the optimal method for differentiating GSM from GBM based on tumor mass features.

While the similarity in the clinical presentation of the two types of tumors underscores the importance of their radiological differentiation, most of their radiological signs overlap (2,4). Prior imaging research has therefore sought a method to reliably distinguish the two: peritumoral edema seen on routine magnetic resonance imaging (MRI) is more severe in patients with GSM (1,2), and other imaging modalities, including diffusion weighted imaging (DWI), perfusion weighted imaging (PWI), and magnetic resonance spectroscopy (MRS), have also proven helpful in identifying the tumors (7,12). However, these imaging methods have not been robust enough to guide clinical practice because of two limitations. First, qualitative radiological features are susceptible to intra- and interobserver variability and lack reproducibility among evaluators. Second, these radiological modalities focus only on the tumor masses of GSM and GBM, even though the peritumoral edema also requires attention.
Radiomics, a new method of imaging data analysis, has been successfully used to differentiate central nervous system tumors: e.g., primary central nervous system lymphoma from atypical GBM (13), GBM from metastasis (14)(15)(16), and GBM from anaplastic oligodendroglioma (17). As in any high-throughput data-mining field, the curse of dimensionality presents a challenge for radiomics analysis. Feature selection removes irrelevant features, which reduces the difficulty of the learning task and minimizes the risk of overfitting. This study extracted a large panel of radiomic features from the tumor masses and peritumoral edema of GSM and GBM to identify an optimal machine learning-based algorithm for differentiating GSM from GBM.

Patient Enrollment
The ethics committee of our hospital approved this retrospective study. This study enrolled 83 patients with GSM (58 men, 25 women; mean age, 50.5 ± 12.9 years; range, 16-77 years) between July 2009 and August 2018 and 100 consecutive patients with GBM (58 men, 42 women; mean age, 53.4 ± 14.1 years; range, 12-77 years) between December 2016 and February 2017.
The inclusion criteria for this study were as follows: (I) pathologically confirmed GBM or GSM, as defined by the World Health Organization (WHO) criteria; (II) available preoperative multi-parametric MRI data, including T2-weighted imaging (T2WI) and contrast-enhanced (CE) data; (III) no treatment for the tumor before the MRI examination; and (IV) available clinical data. Patients were excluded if (I) preoperative MR images were not available at our institute; (II) the images were inadequate for image analysis (for example, they featured obvious artifacts); (III) the lesion showed no enhancement on post-contrast images; or (IV) the lesion was recurrent or had received previous treatment. The clinical and imaging characteristics of all patients were retrospectively assessed, including age, gender, tumor location, intra-tumoral necrosis and cystic changes, and peritumoral edema. The patient selection flowchart for the 83 patients with GSM and 100 patients with GBM is presented in Supplementary Figure 1. The patients were randomly assigned to either the training group (n = 93) or the validation group (n = 90).
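The random allocation described above can be sketched with scikit-learn's `train_test_split`. This is a minimal illustration, not the authors' code: the patient indices, the seed, and the absence of stratification are assumptions, since the text reports only the group sizes (93 training, 90 validation).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Class labels reconstructed from the cohort sizes in the text:
# 1 = GSM (n = 83), 0 = GBM (n = 100).
labels = np.array([1] * 83 + [0] * 100)
patient_ids = np.arange(labels.size)  # hypothetical patient indices 0..182

# Unstratified random split: 90 patients go to validation, the remaining 93 to training.
# The seed is arbitrary; the authors' seed (if any) is not reported.
train_ids, val_ids, y_train, y_val = train_test_split(
    patient_ids, labels, test_size=90, random_state=42
)

print(len(train_ids), len(val_ids))
print("GSM in training:", y_train.sum(), "| GSM in validation:", y_val.sum())
```

Passing `stratify=labels` would keep the GSM/GBM ratio roughly equal across the two sets; whether the original split was stratified is not stated.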

MRI Data Acquisition and Region of Interest Segmentation
MRI data included pre- and post-contrast scans. The detailed scanning parameters are shown in Supplementary Table 1. The presence of intra-tumoral necrosis and cystic changes and of peritumoral edema was determined for each case. Intra-tumoral necrosis and cystic changes were defined as low signal intensity without enhancement on post-contrast images and high signal on T2WI. Peritumoral edema was defined as low signal intensity around the enhancing tumor and high signal on T2WI. Intra-tumoral necrosis, cystic changes, and peritumoral edema were identified by two of the co-authors; disagreements were resolved by discussion.
Several postprocessing steps were performed after MR image acquisition to reduce bias from data heterogeneity. First, every MRI image was resampled to a voxel size of 3.00 × 3.00 × 3.00 mm³ with no gaps between consecutive slices. Image intensity normalization then transformed MR intensities into a standardized range (0-1). The contour of the tumor on axial images of the CE sequence and the high signal around the tumor on the T2 sequence (the tumor itself plus the peritumoral edema) were manually segmented as regions of interest (ROIs) on multiple slices with the open-source software MRIcro (http://www.mccauslandcenter.sc.edu/mricro/). The ROI of the peritumoral edema on CE images was generated by voxelwise subtraction of the enhancing region on the CE sequence from the high-signal region on T2WI using FSL (http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FSL).
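The three preprocessing operations (isotropic resampling, intensity normalization to 0-1, and voxelwise mask subtraction for the edema ROI) can be sketched with NumPy and SciPy as below. This assumes the volumes are already co-registered NumPy arrays, uses min-max scaling as one plausible reading of "normalization to 0-1", and stands in for, rather than reproduces, the MRIcro and FSL steps named in the text; the toy volumes and helper names are hypothetical.

```python
import numpy as np
from scipy import ndimage

TARGET_SPACING_MM = (3.0, 3.0, 3.0)  # isotropic voxel size stated in the text

def resample(volume, spacing_mm, is_mask=False):
    """Resample a 3-D array from its native spacing to 3 x 3 x 3 mm^3."""
    factors = [s / t for s, t in zip(spacing_mm, TARGET_SPACING_MM)]
    # Linear interpolation for intensities; nearest neighbour keeps masks binary.
    return ndimage.zoom(volume, factors, order=0 if is_mask else 1)

def minmax_normalize(volume):
    """Rescale intensities linearly to the range [0, 1] (assumed normalization scheme)."""
    v = volume.astype(float)
    lo, hi = v.min(), v.max()
    return np.zeros_like(v) if hi == lo else (v - lo) / (hi - lo)

def edema_mask(t2_mask, ce_mask):
    """Voxelwise subtraction: T2 high-signal region minus the CE enhancing region."""
    return np.logical_and(t2_mask.astype(bool), np.logical_not(ce_mask.astype(bool)))

# Toy example: a 12 x 12 x 12 volume acquired at 1.5 mm isotropic spacing (hypothetical).
vol = np.arange(12 ** 3, dtype=float).reshape(12, 12, 12)
vol_3mm = minmax_normalize(resample(vol, (1.5, 1.5, 1.5)))

t2 = np.zeros((6, 6, 6), dtype=np.uint8)
t2[1:5, 1:5, 1:5] = 1  # tumor plus surrounding edema on T2WI
ce = np.zeros((6, 6, 6), dtype=np.uint8)
ce[2:4, 2:4, 2:4] = 1  # enhancing tumor on the CE sequence
edema = edema_mask(t2, ce)
print(vol_3mm.shape, vol_3mm.min(), vol_3mm.max(), int(edema.sum()))
```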

Radiomic Feature Extraction and Stability Evaluation
PyRadiomics (http://readthedocs.org/projects/pyradiomics/) computed a total of nine feature categories: first-order statistics, shape descriptors, texture classes (gray level co-occurrence matrix [GLCM], gray level run length matrix [GLRLM], and gray level size zone matrix [GLSZM]), and six built-in filters (wavelet, Laplacian of Gaussian [LoG], square, square root, logarithm, and exponential), resulting in a total of 1,303 radiomic features (13 shape features, 18 first-order intensity statistics features, 68 texture features, 86 square features, 86 square root features, 86 logarithm features, 86 exponential features, 172 LoG features, and 688 wavelet features). First-order features are intensity-based statistical features describing the distribution of voxel intensities. Shape features describe the size and shape of the ROIs. GLCM, GLRLM, and GLSZM features are texture-related features defined by different computations based on the gray levels of the image. All features were defined in compliance with the Imaging Biomarker Standardization Initiative (IBSI). All radiomic features are listed in Supplementary Table 2.
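As a quick consistency check, the per-category counts above add up to the stated 1,303 features. The arithmetic below assumes that each filtered image contributes the same 18 first-order and 68 texture features (86 in total) and that shape features are computed once on the original image; the numbers of LoG and wavelet images (2 and 8) are inferred from 172/86 and 688/86 rather than stated in the text.

```python
# Feature counts taken from the text.
SHAPE = 13          # computed once, on the original image only
FIRST_ORDER = 18
TEXTURE = 68        # GLCM + GLRLM + GLSZM features
PER_IMAGE = FIRST_ORDER + TEXTURE  # 86 intensity/texture features per image

# Number of derived images per filter. LoG = 2 and wavelet = 8 are inferred
# from 172 / 86 and 688 / 86; they are not stated explicitly in the text.
DERIVED_IMAGES = {
    "square": 1,
    "squareroot": 1,
    "logarithm": 1,
    "exponential": 1,
    "LoG": 2,       # assumed: two sigma values
    "wavelet": 8,   # assumed: eight 3-D sub-band decompositions
}

per_filter = {name: n * PER_IMAGE for name, n in DERIVED_IMAGES.items()}
# Original image: shape + first-order + texture; then all filtered images.
total = SHAPE + PER_IMAGE + sum(per_filter.values())
print(per_filter)
print("total features:", total)
```

Eight wavelet images is consistent with a single-level 3-D wavelet decomposition, which yields one sub-band for every low/high-pass combination along the three axes.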

Feature Selection and Classification
Three feature selection methods based on statistical approaches were applied in this study: least absolute shrinkage and selection operator (LASSO), Relief, and random forest (RF). LASSO and RF are embedded methods, whereas Relief is a filter method; both types are commonly and effectively used for feature selection. In terms of final-model performance, wrapper feature selection generally outperforms filter feature selection, but it requires the model to be trained many times and is therefore computationally expensive. We chose these methods mainly because of their efficiency and their popularity in previous studies. In the LASSO algorithm, the shrinkage parameter lambda was chosen as the value that minimized the misclassification error in 10-fold cross-validation. The LASSO, Relief, and RF feature selection analyses were conducted with the "glmnet", "CORElearn", and "VSURF" packages, respectively, in R software (version 3.4.0, R Foundation for Statistical Computing). Three machine-learning classifiers were then applied for classification: adaboost classifier (Ada), support vector machine (SVM), and RF. These widely used pattern recognition tools were imported from the Python (version 3.6.4) machine learning library scikit-learn (version 0.19.0).
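For readers working in Python rather than R, the three selection strategies can be approximated as below. This is a hedged sketch on synthetic data, not the authors' pipeline: L1-penalized logistic regression via scikit-learn's `LogisticRegressionCV` (where `C` is the inverse of lambda) stands in for glmnet, a basic two-class Relief written from scratch stands in for the R Relief implementation, and impurity-based random-forest importances stand in for the R random-forest selection procedure. The helper names `lasso_select`, `relief_scores`, and `rf_scores` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler

def lasso_select(X, y, seed=0):
    """L1-penalised logistic regression; penalty strength chosen by 10-fold CV on
    accuracy (lowest misclassification error). Returns indices of non-zero coefficients."""
    Xs = StandardScaler().fit_transform(X)
    model = LogisticRegressionCV(Cs=20, cv=10, penalty="l1", solver="liblinear",
                                 scoring="accuracy", random_state=seed).fit(Xs, y)
    return np.flatnonzero(model.coef_.ravel())

def relief_scores(X, y):
    """Basic two-class Relief: reward features that differ on the nearest miss and
    penalise features that differ on the nearest hit (Manhattan distance)."""
    span = X.max(axis=0) - X.min(axis=0)
    Xn = (X - X.min(axis=0)) / np.where(span == 0, 1.0, span)  # scale features to [0, 1]
    weights = np.zeros(X.shape[1])
    for i in range(len(Xn)):
        dist = np.abs(Xn - Xn[i]).sum(axis=1)
        dist[i] = np.inf  # exclude the sample itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(same, np.inf, dist))
        weights += np.abs(Xn[i] - Xn[miss]) - np.abs(Xn[i] - Xn[hit])
    return weights / len(Xn)

def rf_scores(X, y, seed=0):
    """Impurity-based random-forest importances (normalised to sum to 1)."""
    rf = RandomForestClassifier(n_estimators=300, random_state=seed).fit(X, y)
    return rf.feature_importances_

# Synthetic stand-in for a training cohort: 93 samples, 40 features, 5 informative.
X, y = make_classification(n_samples=93, n_features=40, n_informative=5,
                           n_redundant=0, random_state=0)
selected = lasso_select(X, y)
top_relief = np.argsort(relief_scores(X, y))[::-1][:10]
top_rf = np.argsort(rf_scores(X, y))[::-1][:10]
print("LASSO keeps", selected.size, "features")
print("Top Relief features:", top_relief)
print("Top RF features:", top_rf)
```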

Differentiation Performance of the Radiomics Models
The three subsets of selected features were then used as input to each of the three machine-learning classifiers, generating nine (3 × 3 = 9) radiomics models. Each of the nine models was assessed with 5-fold cross-validation in the training cohort, and its differentiation performance was then evaluated in the validation cohort. The area under the curve (AUC) and accuracy (ACC) from receiver operating characteristic (ROC) curve analysis were calculated to evaluate the differentiation performance of the radiomics models. The optimal threshold of each ROC curve was determined by maximizing the sum of the sensitivity and specificity for the differentiation of GBM from GSM.
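The evaluation protocol above (5-fold cross-validation in the training cohort, validation-set AUC, and a cut-off that maximizes sensitivity + specificity, i.e. Youden's J) can be sketched for one of the nine combinations. Everything here is illustrative: the data are synthetic, the L1-logistic selector uses a fixed, hypothetical `C` rather than a cross-validated lambda, and `youden_threshold` is a hypothetical helper name.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def youden_threshold(y_true, scores):
    """Cut-off maximising sensitivity + specificity (Youden's J = TPR - FPR)."""
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    best = np.argmax(tpr - fpr)
    return thresholds[best], tpr[best], 1.0 - fpr[best]

# One of the nine combinations: L1-logistic (LASSO-style) selection followed by an SVM.
# C=1.0 is a hypothetical fixed penalty, not the CV-tuned lambda described in the text.
pipe = make_pipeline(
    StandardScaler(),
    SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear", C=1.0)),
    SVC(kernel="rbf"),
)

# Synthetic stand-in for the cohort, split 93 training / 90 validation.
X, y = make_classification(n_samples=183, n_features=40, n_informative=5,
                           n_redundant=0, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=90, random_state=0)

# 5-fold cross-validation within the training cohort.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_auc = cross_val_score(pipe, X_tr, y_tr, cv=cv, scoring="roc_auc")

# Fit on the full training cohort, then score the held-out validation cohort.
pipe.fit(X_tr, y_tr)
val_scores = pipe.decision_function(X_val)
val_auc = roc_auc_score(y_val, val_scores)
cutoff, sens, spec = youden_threshold(y_val, val_scores)
print(f"CV AUC {cv_auc.mean():.2f}, validation AUC {val_auc:.2f}, "
      f"sens {sens:.2f}, spec {spec:.2f} at cut-off {cutoff:.2f}")
```

Choosing the cut-off on the same validation data used to report performance is optimistic; fixing it on the training cohort and then applying it to the validation cohort avoids that bias. The text does not say which cohort the thresholds were derived from.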
To compare the radiomics models with neuroradiologists in differentiating GBM from GSM, the two aforementioned neuroradiologists, who were blinded to the clinical and pathological data, classified each tumor as GBM or GSM using all available sequences (T1WI, T2WI, and CE-T1WI) displayed in the Picture Archiving and Communication System (PACS), as in the routine diagnostic workflow, before ROI segmentation. This was the first time they were allowed to view the full MRI images used in this study. Inter-observer variation between the two neuroradiologists and their concordance with the final histopathology are shown in Supplementary Table 3. The chi-square test was used to compare the proportions of predicted GBM/GSM between the neuroradiologists and the best radiomics model. The entire analysis process is shown in Figure 1.

Statistical Analysis
Differences in the clinical and MRI characteristics between GBM and GSM were evaluated using the t-test and the chi-square test. P-values of less than 0.05 were considered statistically significant. Statistical analysis and figure plotting were performed using R (version 3.0.1; http://www.R-project.org) and SPSS (SPSS Inc.).
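As a worked example of the two tests named above, the sketch below applies them to the summary statistics reported under Patient Enrollment: a two-sample t-test on age from the reported means, SDs, and group sizes, and a chi-square test on the sex distribution. It assumes SciPy and an equal-variance t-test; because it works from rounded summary figures, its output is illustrative and may not match the values in the authors' clinical-characteristics table, which is not reproduced here.

```python
from scipy.stats import chi2_contingency, ttest_ind_from_stats

# Age: mean, SD and group size as reported in the text (GSM vs GBM).
t_age, p_age = ttest_ind_from_stats(mean1=50.5, std1=12.9, nobs1=83,
                                    mean2=53.4, std2=14.1, nobs2=100,
                                    equal_var=True)

# Sex: 2 x 2 contingency table from the text (rows: GSM, GBM; columns: men, women).
sex_table = [[58, 25],
             [58, 42]]
# For a 2 x 2 table SciPy applies Yates' continuity correction by default.
chi2, p_sex, dof, expected = chi2_contingency(sex_table)

print(f"age: t = {t_age:.2f}, P = {p_age:.3f}")
print(f"sex: chi2 = {chi2:.2f}, P = {p_sex:.3f}")
```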

Selection of Stable Features
We calculated the intraclass correlation coefficient (ICC) to assess the robustness of the radiomic features extracted from the tumor mass and peritumoral edema. For the tumor mass, 918 of the 1,303 (70.5%) extracted radiomic features showed high stability, including 13 … Unsupervised clustering of these stable features, presented as a heat map, yielded two imaging subtypes (Figure 2). However, the association between the imaging subtypes and the histological subtypes was not obvious. Figure 5.
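A minimal sketch of ICC-based stability screening is shown below. The text specifies neither the ICC form nor the cut-off, so both are assumptions: ICC(2,1) (two-way random effects, absolute agreement, single measurement) and a threshold of 0.8. The repeated feature matrices are hypothetical, standing in for features extracted from two independent segmentations; `icc_2_1` and `stable_features` are hypothetical helper names.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.
    `ratings` has shape (n_subjects, k_raters), e.g. one feature from two segmentations."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()    # between-subject
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()    # between-rater
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def stable_features(feats_a, feats_b, threshold=0.8):
    """Indices of features whose ICC between two repeated extractions reaches `threshold`.
    The 0.8 cut-off is an assumption; the text does not state the value used."""
    iccs = np.array([icc_2_1(np.column_stack([feats_a[:, j], feats_b[:, j]]))
                     for j in range(feats_a.shape[1])])
    return np.flatnonzero(iccs >= threshold), iccs

# Hypothetical repeat segmentation: feature 0 reproduces well, feature 1 is pure noise.
rng = np.random.default_rng(0)
truth = rng.normal(size=(30, 1))
feats_a = np.hstack([truth, rng.normal(size=(30, 1))])
feats_b = np.hstack([truth + 0.05 * rng.normal(size=(30, 1)), rng.normal(size=(30, 1))])
keep, iccs = stable_features(feats_a, feats_b)
print(keep, np.round(iccs, 2))
```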

Feature Selection and Radiomics Model Construction
To avoid bias and confirm the efficacy of the radiomics model, we compared the performance of the LASSO + SVM model in the 90 validation cases with that of experienced and inexperienced raters. As shown in Table 4, the LASSO + SVM radiomics model outperformed the neuroradiologists in sensitivity, specificity, and accuracy.
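The three metrics compared in Table 4 follow directly from a 2 × 2 confusion matrix. The sketch below assumes GSM is coded as the positive class (1) and GBM as 0; the text does not state which class was treated as positive, and the example labels are toy values, not study data.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy with GSM coded as the positive class (1)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "sensitivity": tp / (tp + fn),  # GSM correctly called GSM
        "specificity": tn / (tn + fp),  # GBM correctly called GBM
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

# Toy labels (hypothetical): 1 = GSM, 0 = GBM.
print(diagnostic_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```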

DISCUSSION
This retrospective study developed and validated a favorable predictive model with radiomics features extracted from tumor mass and peritumoral edema to distinguish GSM from GBM. Importantly, the diagnostic performance of this machine-learning radiomics model followed a similar trend in the training set, the validation set, and the cross-validation analysis. In our study, two neuroradiologists independently diagnosed the two kinds of tumors based on routine MRI; their accuracy was less than 50.0%, lower than that of the radiomics analysis, suggesting the superiority of radiomics over human reading in distinguishing GSM from GBM. In agreement with previous research (18,19), our study indicated that GSM usually showed enhancement of the solid component with peritumoral edema on routine MRI. These findings, however, are insufficient to distinguish GSM from GBM. Some advanced imaging modalities, such as DWI, PWI, and MRS (7,12,20), have therefore been used to better characterize GSM. On DWI, the thicker or more solid components of GSM show restricted diffusion in as many as 72.7% (8/11) of cases (7); on PWI, the tumor shows high perfusion (7); and on MRS, GSM shows a lactate peak, indicating local necrosis and hypoxia of the tumor, and a higher lipid-to-choline ratio than GBM (12,20). These indices obtained from the advanced MR modalities were all derived from analysis of the solid part of the tumor.
FIGURE 1 | A schematic figure shows the radiomic analysis process. After feature extraction, stable features are selected. Three feature selection and classification methods are combined, and favorable models are selected and cross-validated in the training cohort. In an independent validation cohort, the optimal model is identified by comparison with pathology. The performance of the optimal model is compared with that of the two neuroradiologists.
However, because GSM and GBM usually show necrosis and cystic changes, a comprehensive differentiation between the two tumors should involve both the solid and the non-solid components. The peritumoral region, which usually appears as edema, has also been neglected in differentiation. In our study, the differentiation between GSM and GBM included not only the whole lesion but also the peritumoral edema outside it. Our investigation revealed that, based on the peritumoral edema region, the two tumors can be differentiated with the Relief + SVM radiomics method (AUC, 0.78; ACC, 0.73). Appearing as high signal intensity on T2WI, this region contains both vasogenic edema and infiltrating tumor cells (21)(22)(23). However, compared with this region, analysis of the tumor mass itself allowed more effective differentiation between the tumor types. This can be explained by the far greater number of tumor cells in the tumor mass than in the peritumoral region. Moreover, the whole region of the tumor mass, including necrosis, cystic changes, and other non-enhanced components, was analyzed for its capacity to inform differentiation. Because previous studies that employed PWI, DWI, and MRS (7,12) focused only on the solid part of the two tumors, our analysis is more comprehensive and practical.
FIGURE 2 | A heat map shows the stable radiomic features. Each column and row correspond to one patient and one z-score normalized radiomic feature, respectively.
The AUC of the cross-combination methods based on tumor mass and peritumoral edema features is shown for the training set (no brackets) and the validation set (in brackets). Ada, adaboost; AUC, area under the receiver-operating characteristic curve; LASSO, least absolute shrinkage and selection operator; PEF, peritumoral edema feature; RF, random forest; SVM, support vector machine; TMF, tumor mass feature.
Radiomics is an emerging non-invasive method that extracts high-dimensional sets of imaging features to build models for survival prediction (24), distant metastasis prediction (25), and molecular characteristic classification (26). However, high dimensionality is a critical challenge in radiomics analysis and limits the potential of radiomics models. Hence, this study compared three feature selection methods and three classification methods to improve the stability and classification performance of the radiomics model. After comparing all nine cross-combinations, we found that LASSO selection with the SVM classifier best differentiated GSM from GBM. LASSO is a regularization technique that shrinks many coefficients exactly to zero, minimizing the number of non-zero elements, and makes the solution unique (27). It is therefore often used to handle large sets of radiomic features derived from relatively small samples. SVM is a powerful classification algorithm that can estimate classification probabilities and control model complexity. These properties may explain its effective application in neuroimaging and molecular biology (16,28) and its effective pairing with the LASSO selection method in our radiomics analysis.
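For reference, the LASSO objective for a binary outcome, as fitted by glmnet with family = "binomial" and alpha = 1, is written below, assuming GSM/GBM is coded as y_i in {0, 1} and x_i is the radiomic feature vector of patient i. The L1 penalty term is what sets many coefficients exactly to zero; lambda corresponds to the shrinkage parameter tuned by 10-fold cross-validation in the Methods, and the intercept beta_0 is not penalized.

```latex
\hat{\beta}_0, \hat{\beta} = \arg\min_{\beta_0,\, \beta}
\left\{ -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \left( \beta_0 + x_i^{\top} \beta \right)
- \log\left( 1 + e^{\beta_0 + x_i^{\top} \beta} \right) \right]
+ \lambda \lVert \beta \rVert_1 \right\}
```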
Our study has several limitations. First, as a retrospective study, it may be subject to selection bias. Second, the scanning parameters were not uniform, which required preprocessing of the data. Third, the sample size was relatively small compared with the large number of radiomic features; therefore, our results may be affected by overfitting. Fourth, only T2WI and axial post-contrast T1WI were used in our radiomic analysis; multimodal imaging data (such as DWI, PWI, and MRS) need to be integrated into the model in the future to improve its performance. Finally, as a single-center study, our study lacks external independent validation.
In conclusion, this retrospective study presents the machine learning-based MR radiomics model as a non-invasive tool for preoperatively differentiating GSM from GBM with favorable predictive accuracy and stability. Prospective studies are needed to further validate its classification ability.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Beijing Tiantan Hospital. Written informed consent for participation was not provided by the participants' legal guardians/next of kin because, as a retrospective study, it was approved by our institutional committee without requiring informed consent from the patients.