Characterizing Brain Tumor Regions Using Texture Analysis in Magnetic Resonance Imaging

Purpose: To extract texture features from magnetic resonance imaging (MRI) scans of patients with brain tumors and use them to train a classification model that supports early diagnosis. Methods: Two groups of regions (control and tumor) were selected from MRI scans of 40 patients with meningioma or glioma. These regions were analyzed to obtain texture features. Statistical analysis was conducted using SPSS (version 20.0), including the Shapiro–Wilk test and the Wilcoxon signed-rank test, which were used to test for significant differences in each feature between the tumor and healthy regions. T-distributed stochastic neighbor embedding (t-SNE) was used to visualize the data distribution so as to avoid tumor selection bias. The Gini impurity index in random forests (RFs) was used to select the top five of all features. Based on these five features, three classification models were built with three machine learning classifiers: RF, support vector machine (SVM), and back propagation (BP) neural network. Results: Nineteen of the 25 features were significantly different between the tumor and healthy areas. Using the Gini impurity index in RFs, standard deviation, first-order moment, variance, third-order absolute moment, and third-order central moment were selected to build the classification model. The classification model trained using the SVM classifier achieved the best performance, with sensitivity, specificity, and area under the curve of 94.04%, 92.3%, and 0.932, respectively. Conclusion: Texture analysis with an SVM classifier can help differentiate between brain tumor and healthy areas with high speed and accuracy, which would facilitate its clinical application.


INTRODUCTION
Brain cancer remains a diagnostic challenge for clinicians and radiologists because malignant brain tumor cells can invade neighboring cells in the brain and spinal cord, have fuzzy borders, and progress rapidly (Wild, 2014; Vargo, 2017; Tandel et al., 2019). Treatment of advanced brain tumors is difficult; therefore, early diagnosis is of great importance in clinical settings. The approaches currently employed for the diagnosis of brain tumors include both invasive and noninvasive methods. Although the invasive diagnostic method, biopsy, is viewed as the gold standard for the diagnosis of brain tumors, noninvasive diagnostic methods including magnetic resonance imaging (MRI) are safer and more widely used (Zhao and Jia, 2016). Accurate localization and segmentation of the brain tumor on MRI scans are essential for treatment planning (Mahaley et al., 1989). Several studies have found MRI features capable of differentiating between tumor and healthy regions (May et al., 1991; Drape et al., 1992; Mullen and Huang, 2017). However, in most cases, the diagnostic accuracy depends solely on the proficiency of the medical practitioner reading the MRI scan (Hayward et al., 2008). Many complex patterns, also called image textures, remain imperceptible to the naked eye. Texture analysis is a practical approach to image pattern recognition that extracts objective information by analyzing the spatial distribution of intensity variations in images (Haralick and Shanmugam, 1973; Haralick, 1979). Furthermore, several studies have confirmed the efficiency of texture analysis (Bayanati et al., 2015; Hodgdon et al., 2015; Skogen et al., 2016).
To increase diagnostic precision and efficiency, many computer-assisted methods have been developed and introduced, including machine learning (ML) and deep learning (DL) (Zhao and Jia, 2016; Boissoneault et al., 2017; Salvador et al., 2017). Texture analysis combined with ML methods has been widely used to evaluate medical images and has yielded promising results (Fetit et al., 2015; Li et al., 2016; Bisdas et al., 2018). However, to the best of our knowledge, there are few reports on the use of t-distributed stochastic neighbor embedding (t-SNE), a new dimensionality reduction and visualization technique that allows the data to be inspected in advance, helps prevent problems such as incorrectly marked images, and can thereby increase classification accuracy.
We hypothesized that some texture features acquired from MRI scans would serve as classification features and markedly improve classification efficiency. To test our hypothesis, the Gini impurity index in the random forests (RFs) was applied to select features, which were then used to develop classification models. Finally, the performance of the selected features and the models was assessed to test this hypothesis.

MATERIALS AND METHODS
Subjects
The data used were collected from the Affiliated Nanjing Brain Hospital of Nanjing Medical University. Patients in whom meningioma or glioma was histopathologically confirmed between January 2014 and December 2014 were selected. In all, 40 patients (average age: 50.10 years) comprising 22 men (average age: 52.36 years) and 18 women (average age: 47.33 years) were included. The exclusion criteria were as follows: (1) presence of other organic mental disorders and nervous system diseases and (2) a history of major physical illnesses. All of the patients met the above criteria. The study was approved by the medical ethics committee of Nanjing Medical University. All patients provided signed written informed consent.

MRI Acquisition
All images were acquired using a 3T Siemens MRI system. The patients were instructed to relax, keep their eyes closed, stay awake, and remain still. Patient compliance was confirmed after scanning was completed. The images were recorded axially for 6 min by using an echo-planar imaging sequence with the following parameters: TR = 1900 ms, TE = 2.49 ms, slice thickness = 1 mm, flip angle = 90°, and matrix size = 256 × 256. All patients underwent MRI without reporting discomfort during or after the procedure.

Preparation Before Classification
For the experimental preparation, the raw sample image format was changed from DICOM to JPG. In the texture analysis, the tumor region in the coronal MRI image was selected as the experimental group, and the symmetrical healthy region on the other side of the brain was selected as the control group. There were 40 tumor regions in the experimental group and 40 healthy regions in the control group. In each group, 25 texture features (belonging to three categories) were calculated, as shown in Table 1.
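As an illustration of the first-order statistics among these texture features, the following Python sketch computes a first-order moment (mean), variance, standard deviation, third-order central moment, and third-order absolute moment from the gray levels of one region. The definitions actually used in the study are those of Table 1 and are not reproduced here; the divide-by-n normalization, the choice to take the absolute moment about the mean, and the function name `first_order_features` are assumptions made for this sketch.

```python
import math


def first_order_features(pixels):
    """Return first-order statistics for a flat list of gray levels."""
    n = len(pixels)
    mean = sum(pixels) / n  # first-order moment
    variance = sum((p - mean) ** 2 for p in pixels) / n  # divide-by-n (assumed)
    return {
        "first_order_moment": mean,
        "variance": variance,
        "standard_deviation": math.sqrt(variance),
        "third_order_central_moment": sum((p - mean) ** 3 for p in pixels) / n,
        # Assumed definition: absolute deviations about the mean, cubed.
        "third_order_absolute_moment": sum(abs(p - mean) ** 3 for p in pixels) / n,
    }


# Hypothetical gray levels from one small region of interest.
roi = [1, 2, 3, 4, 5]
features = first_order_features(roi)
print(features)
```

Applying the same function to a tumor region and to its symmetrical control region would give one paired observation per feature.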
The 25 texture features were recorded as mean ± SD. Statistical analysis was performed using SPSS (version 20.0), including the Shapiro-Wilk test and the Wilcoxon signed-rank test, which were used to test for significant differences in each feature between tumor and healthy areas. Meanwhile, an RF model was employed to predict whether each sample was a tumor or a healthy area and to rank the importance of the 25 texture features according to the Gini impurity index in the RF (Menze et al., 2009; Liu et al., 2018). All texture features were used as predictors so that the results of the Wilcoxon signed-rank test and the RF prediction could be compared. In addition, t-SNE, a new dimension reduction and visualization technique for high-dimensionality data, was performed in the exploratory analysis. It was applied to all 40 pairs of samples with 25 features to identify and delete samples that would apparently have a negative effect on the subsequent classification.
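The paired comparison can be illustrated with a minimal Python sketch of the Wilcoxon signed-rank statistic for a single feature, where each tumor region is paired with its control region. The study ran this test in SPSS, and the function below is not that implementation: it only computes the rank sums W+ and W- (dropping zero differences and averaging tied ranks) and leaves the P-value to a statistics package. The function name and the toy values are hypothetical.

```python
def signed_rank_sums(tumor_values, control_values):
    """Return (W_plus, W_minus) for paired samples of one feature."""
    # Paired differences; zero differences are discarded.
    diffs = [t - c for t, c in zip(tumor_values, control_values) if t != c]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical values of one feature in five tumor/control pairs.
tumor = [5, 7, 9, 4, 6]
control = [3, 8, 5, 4, 9]
w_plus, w_minus = signed_rank_sums(tumor, control)
print(w_plus, w_minus)
```

A strongly unbalanced pair of rank sums suggests that the feature differs systematically between the two regions; the two sums always add up to n(n + 1)/2 for n nonzero differences.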

Classification
The samples were randomly divided into training (80%) and test (20%) sets. This was iterated five times to provide five unique training and testing groups. The training set was used to generate classification models with three different classifiers: RF, BP, and SVM. The RF is fast and flexible and has become a standard tool in biomedical informatics. Each classifier in the ensemble is a decision tree that is generated using a random selection of attributes at each node to determine the split. During classification, each tree votes, and the most popular class is returned. The BP neural network iteratively processes a set of training tuples and compares the network's prediction with the actual known target value. For each training tuple, the weights are modified to minimize the mean squared error between the network's prediction and the actual target value. These modifications are made in the backward direction, from the output layer toward the input layer. Training terminates when the error becomes very small.
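The repeated random partitioning can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `repeated_splits`, the fixed seed, and the use of integer sample identifiers are hypothetical, and the fraction 0.8 mirrors the 80% training set reported for the feature ranking, so any other fraction can be substituted. Independent shuffles make identical partitions extremely unlikely but do not strictly guarantee uniqueness.

```python
import random


def repeated_splits(sample_ids, train_fraction, n_repeats, seed=0):
    """Return n_repeats independent (train, test) partitions of sample_ids."""
    rng = random.Random(seed)  # fixed seed so the partitions are reproducible
    n_train = round(len(sample_ids) * train_fraction)
    splits = []
    for _ in range(n_repeats):
        shuffled = list(sample_ids)
        rng.shuffle(shuffled)
        splits.append((shuffled[:n_train], shuffled[n_train:]))
    return splits


# 80 regions (40 tumor and 40 control), identified here by hypothetical indices.
splits = repeated_splits(list(range(80)), train_fraction=0.8, n_repeats=5)
for train_ids, test_ids in splits:
    print(len(train_ids), len(test_ids))
```

Each classifier would then be fitted on the training part of a partition and scored on its held-out part.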
The SVM is a classification method for both linear and nonlinear data. It uses nonlinear mapping to transform the original training data into a higher dimension. With the new dimension, it searches for the linear optimal separating hyperplane. With an appropriate nonlinear mapping to a sufficiently high dimension, data from two classes can always be separated by a hyperplane. SVM finds this hyperplane using support vectors and margins.
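The effect of the nonlinear mapping can be seen in a toy Python example. Five one-dimensional points cannot be separated by any single threshold, but after the assumed mapping x -> (x, x^2) a hyperplane separates them. The weights below are chosen by hand for illustration and are not learned by an SVM solver; all names and data are hypothetical.

```python
def feature_map(x):
    """Map a 1-D input to 2-D: (x, x squared)."""
    return (x, x * x)


def decide(point, weights, bias):
    """Classify by the sign of the hyperplane score w . point + b."""
    score = sum(w * p for w, p in zip(weights, point)) + bias
    return 1 if score > 0 else -1


def separable_by_threshold(xs, ys):
    """Check whether any single 1-D threshold reproduces the labels."""
    for t in xs:
        for sign in (1, -1):
            if all((sign if x > t else -sign) == y for x, y in zip(xs, ys)):
                return True
    return False


xs = [-3, -1, 0, 1, 3]
ys = [1, -1, -1, -1, 1]  # outer points form one class, inner points the other

linear_in_1d = separable_by_threshold(xs, ys)
# Hand-picked hyperplane in the mapped space: x^2 - 5 = 0.
predictions = [decide(feature_map(x), weights=(0, 1), bias=-5) for x in xs]
print(linear_in_1d, predictions == ys)
```

An SVM additionally chooses, among all separating hyperplanes, the one with the largest margin to the nearest training points (the support vectors).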
Four indices were used to evaluate each model: the area under the curve (AUC), error rate, sensitivity, and specificity. In addition, a receiver operating characteristic (ROC) curve was constructed for each model.
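These four indices can be computed as in the following Python sketch. It assumes that tumor is coded as the positive class (1) and healthy as the negative class (0), and that each model outputs a score thresholded at 0.5; the function names, the threshold, and the toy scores are hypothetical. The AUC is obtained from its rank interpretation: the fraction of (tumor, healthy) pairs in which the tumor sample receives the higher score, with ties counted as one half.

```python
def evaluate(y_true, y_pred):
    """Return sensitivity, specificity, and error rate for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),  # tumors correctly detected
        "specificity": tn / (tn + fp),  # healthy regions correctly rejected
        "error_rate": (fp + fn) / len(y_true),
    }


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and classifier scores for six test regions.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
metrics = evaluate(y_true, y_pred)
area = auc(y_true, scores)
print(metrics, area)
```

Sensitivity, specificity, and error rate depend on the chosen threshold, whereas the AUC summarizes performance over all thresholds.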

RESULTS
Texture Feature Analysis
The Wilcoxon signed-rank test showed which of the 25 texture features took higher or lower values in the experimental group (tumor region) than in the control group (healthy region), as shown in Tables 2-4. We obtained the importance rankings of the 25 texture features according to the Gini impurity index in the RF with a training set (80%). The top five features were standard deviation, first-order moment, variance, third-order absolute moment, and third-order central moment, as shown in Table 5.
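The quantity behind such a ranking can be illustrated with a short Python sketch. The Gini impurity of a set of labels is 1 minus the sum of squared class proportions, and a split on a feature is scored by how much it lowers the size-weighted impurity of the resulting child nodes. In an RF, a feature's Gini importance accumulates these decreases over all splits on that feature in all trees. The single-threshold setting, the function names, and the toy values below are assumptions for illustration, not the study's data.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(values, labels, threshold):
    """Decrease in Gini impurity when splitting one feature at a threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted_children = (
        len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)
    )
    return gini_impurity(labels) - weighted_children


# Hypothetical values of one feature for two tumor and two healthy regions.
values = [1, 2, 8, 9]
labels = ["tumor", "tumor", "healthy", "healthy"]
good_split = impurity_decrease(values, labels, threshold=5)
poor_split = impurity_decrease(values, labels, threshold=1)
print(good_split, poor_split)
```

A feature whose splits repeatedly produce large decreases, as with the first threshold here, ends up near the top of the importance ranking.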
The t-SNE results are shown in Figure 1. In Figure 1A, the input features were those identified by the Wilcoxon signed-rank test (19 features in total), and in Figure 1B, the input features were the top five features in the RF's importance rankings. The data distributions after t-SNE were nevertheless similar. All samples were clearly divided into two clusters, except for 12 samples (1, 9, 10, 11, 17, 19, 23, 30, 35, 44, 73, and 79), which appeared to fall in the wrong cluster. In addition, a t-test was used to examine the 40 samples to determine whether their features differed between the tumor and healthy regions. The mean P-value was 0.2390645. The P-values of seven samples (1, 9, 10, 19, 23, 30, and 35) were greater than the mean P-value; these samples also fell in the wrong cluster in the t-SNE plot and were deleted.

Classifier Evaluation
On the basis of the results obtained above, we selected the five features (standard deviation, first-order moment, variance, third-order absolute moment, and third-order central moment) identified in the RF to set up classifiers, which helped save calculation time and resources. Three classification models (RF, SVM, and BP) were applied, and the five features were used to train each classifier. A detailed summary of the models' performance is presented in Table 6.
All three models showed satisfactory AUCs of 0.85-0.95. The RF and the BP shared a similar performance based on the AUC, error rate, sensitivity, and specificity. The model trained by the SVM classifier demonstrated the best performance among the three models, with markedly better AUC, error rate, sensitivity, and specificity, indicating that this model could correctly classify the tumor and healthy regions. Receiver operating characteristic (ROC) curves were constructed for the three models to compare their performance directly, as shown in Figure 2.

DISCUSSION
Other studies have used the same feature selection method, and its validity has been demonstrated. Wang et al. (2018) evaluated the importance of spectral lines based on RFs and then used a support vector machine (SVM) classifier to classify the laser-induced breakdown spectroscopy (LIBS) spectra of bacterial species. The primary objective of this study was to characterize tumor regions using MRI-based texture analysis. We used texture analysis to compute 25 texture features from MRI images. Using the Wilcoxon signed-rank test, we confirmed that 19 of the 25 texture features differed between the healthy and tumor regions. With t-SNE, the dataset separated into two clusters, indicating that a classification model could very likely be built with these 19 features. However, training a model with high-dimensionality data requires considerable time and memory. To facilitate faster and more accurate classification, the importance rankings of the features in the RF were calculated, and the top five features were found to show the same classification effectiveness as the 19 features selected before.
The t-SNE plots showed some seemingly noisy points. Because deleting every sample that appeared in the wrong cluster risked removing samples incorrectly, a t-test was applied to examine, for each sample, whether the healthy and tumor regions differed significantly across the 25 texture features. To decide how many samples to delete, the mean P-value was set as the deletion threshold, and seven samples were excluded on this basis. The samples were marked manually, so these samples were likely to have been marked mistakenly; this limitation has been mentioned in many previous studies.
On the basis of the five features, three classification models were built by training three ML classifiers, namely, RF, SVM, and BP. The SVM classifier was superior to the RF and BP classifiers, as shown in Table 6, since it provided better performance in terms of AUC, error rate, sensitivity, and specificity. These results were confirmed across the five training and testing groups and were consistent with the findings of previous studies (Zhang et al., 2017). The model in this article compares favorably with previous models because it depends on only five features while showing the same AUC. Since the software needed to perform texture analysis and build classification models is readily available, clinicians can easily perform such analyses in clinical settings.
This study had some limitations. First, the dataset was limited: the model was trained with only 80 samples, so its robustness needs further examination. Second, some degree of selection bias may exist. Different categories of brain tumors have different texture features. Some unique features were excluded, which may have influenced the results of our analysis. Third, a manual approach was adopted to segment tumors in this study. Although manual segmentation generally works better than automatic methods, segmentation errors could still exist. Some noisy points may have resulted from manual marking errors, negatively influencing the formation of our model.

CONCLUSION
In conclusion, we hypothesized that a few of the texture features acquired from the MRI images could serve as classification features, thereby significantly improving the classification efficiency. The Gini impurity index in the RF was applied to select features. On the basis of the five selected features, three classification models were built by training three ML classifiers: RF, SVM, and BP. The classifier model in this article compares favorably with previous models because it depends on only five features. On the basis of our initial findings, tumor regions characterized by MRI-based texture analysis may have clinical usefulness in differentiating brain tumors.

DATA AVAILABILITY STATEMENT
The datasets presented in this article are not readily available because the data may not be used outside the hospital. Requests to access the datasets should be directed to med.info@njmu.edu.cn.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Medical Ethics Committee of Nanjing Medical University. The patients/participants provided their written informed consent to participate in this study.