Original Research ARTICLE
Comparison of Radiomics-Based Machine-Learning Classifiers in Diagnosis of Glioblastoma From Primary Central Nervous System Lymphoma
- 1State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Collaborative Innovation Center for Biotherapy, Chengdu, China
- 2Department of Neurosurgery, West China Hospital, Sichuan University, Chengdu, China
- 3West China School of Medicine, West China Hospital, Sichuan University, Chengdu, China
- 4School of Computer Science, Nanjing University of Science and Technology, Nanjing, China
- 5Department of Biotherapy, Cancer Center, West China Hospital, Sichuan University, Chengdu, China
Purpose: The purpose of the current study was to evaluate the ability of magnetic resonance (MR) radiomics-based machine-learning algorithms in differentiating glioblastoma (GBM) from primary central nervous system lymphoma (PCNSL).
Method: One hundred and thirty-eight patients were enrolled in this study. Radiomics features were extracted from contrast-enhanced MR images, and the machine-learning models were established using five selection methods [distance correlation, random forest, least absolute shrinkage and selection operator (LASSO), eXtreme gradient boosting (Xgboost), and Gradient Boosting Decision Tree] and three radiomics-based machine-learning classifiers [linear discriminant analysis (LDA), support vector machine (SVM), and logistic regression (LR)]. The sensitivity, specificity, accuracy, and area under the curve (AUC) of each model were calculated and used to evaluate and compare the classifiers.
Result: All classifiers showed excellent discriminative performance when combined with a suitable selection method. Among LDA-based models, the best was Distance Correlation + LDA, with an AUC of 0.978. Among SVM-based models, Distance Correlation + SVM had the highest AUC (0.959), while among LR-based models, LASSO + LR had the highest AUC (0.966).
Conclusion: Radiomics-based machine-learning algorithms show promising performance in differentiating GBM from PCNSL.
Glioblastoma (GBM) and primary central nervous system lymphoma (PCNSL) are common primary brain tumors that share similar radiological characteristics but differ in therapeutic strategy (1–3). The standard treatment for GBM is total resection followed by daily radiation and chemotherapy (such as temozolomide) for 6.5 weeks and then a 6-month regimen of oral chemotherapy given 5 days a month, whereas the first-line treatment for PCNSL is systemic chemotherapy (such as a high-dose methotrexate regimen) (4). In most cases, the morphological appearance of the two tumor types on MRI is characteristic enough for adequate discrimination (5, 6). However, misdiagnosis can still occur in some cases because the images of atypical GBM and atypical PCNSL can mimic each other (7). Advanced MRI techniques can be useful in the differentiation, but they cannot be performed as a routine examination for every patient, so novel radiological methods based on conventional MR sequences are still needed.
Texture analysis (TA) refers to a set of mathematical methods for describing image features, with which non-visual information can be represented by analyzable pixel intensities and their spatial distributions (8, 9). It has been applied as a radiological imaging biomarker to evaluate tumor heterogeneity and has shown promise in tumor diagnosis, presurgical grading, and gene mutation prediction (10–12). Moreover, because it provides quantitative analyses of images, it has been combined with various novel computer technologies, such as machine learning (13–16).
The purpose of the present study was to discriminate GBM from PCNSL with radiomics-based machine-learning algorithms applied to contrast-enhanced T1-weighted (T1C) imaging. In addition, we evaluated different combinations of selection methods and classifiers to compare the models' performances.
The patients were selected from the neurosurgery department by reviewing electronic medical records between 2015 and 2018. The inclusion criteria were as follows: (1) pathologically confirmed GBM or PCNSL; (2) MR scan performed before any tumor biopsy or surgery; (3) newly diagnosed GBM or PCNSL. Patients with a history of intracranial surgery or irrelevant intracranial diseases were excluded. In total, 138 patients (72 men, median age 48 years; and 66 women, median age 54 years) were enrolled from the institution database, including 76 patients diagnosed with GBM and 62 diagnosed with PCNSL.
The MR images were collected from the PACS system in the radiology department. We focused on conventional MR sequences, including T1-weighted image (T1WI), contrast-enhanced T1-weighted (T1C) imaging, T2-weighted image (T2WI), and fluid-attenuated inversion recovery, because advanced MR sequences were not commonly used in our institution. After initial evaluation of the images, T1C was selected as the study sequence because it depicted the boundary between tumor tissue and normal brain tissue relatively clearly (Figure 1).
Figure 1. The magnetic resonance images (T1C) of patients with (A) primary central nervous system lymphoma (PCNSL) or (B) glioblastoma (GBM).
The preoperative MR scan was conducted with a 3-T GE MRI system with an eight-channel phased-array head coil. The protocol of the contrast-enhanced T1-weighted imaging was as follows: repetition time = 2,000 ms, field of view = 240 × 240 mm², echo time = 30 ms, 30 axial slices, slice thickness = 5 mm (no slice gap), flip angle = 90°, and 200 volumes in each run. Gadopentetate dimeglumine (0.1 mmol/kg) was used as the contrast agent. The multi-directional data of contrast-enhanced MRI were collected with a continuous interval time of 90–250 s.
All procedures involving human participants were in accordance with the ethical standards of the institutional and/or national research committee. The Ethics Committee of Sichuan University approved this retrospective study. Written informed consent was obtained from all patients before radiological examination (for patients <16 years old, it was signed by parents or guardians). Patients agreed to undergo the examination if needed and were informed that the data (including MR images) might be used for academic purposes in the future.
Texture Feature Extraction
Two neurosurgeons extracted the texture features using LIFEx software (http://www.lifexsoft.org) under the supervision of senior radiologists. After the tumor tissue was manually delineated slice by slice, the software automatically retrieved 3D texture features of two orders with default settings (17). For the first order, statistics from shape- and histogram-based matrices were retrieved. For the second order, statistics from the gray-level co-occurrence matrix (GLCM), gray-level zone length matrix (GLZLM), neighborhood gray-level dependence matrix (NGLDM), and gray-level run length matrix (GLRLM) were retrieved. Images in which the volume of interest did not reach 64 voxels were excluded to avoid interference from low image matrix resolution.
The Mann–Whitney U-test was used to assess whether the data extracted by the two researchers differed significantly. None of the features differed significantly, suggesting that the extraction was reliable and reproducible (Supplementary Material 1).
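The inter-observer comparison described above can be sketched in a few lines. The following is a minimal, standard-library-only illustration of a two-sided Mann–Whitney U-test using the normal approximation with tie correction and no continuity correction. The function name and the toy feature values are hypothetical and are not taken from the study data; in practice, a statistics package would normally be used.

```python
import math
from collections import Counter


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U-test (normal approximation, tie-corrected).

    Returns (U, p_value). Illustrative only: no continuity correction,
    and the approximation is rough for very small samples.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted(list(x) + list(y))
    n = n1 + n2

    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    rank_sum_x = sum(rank_of[v] for v in x)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u_x, n1 * n2 - u_x)

    # Variance of U under the null hypothesis, corrected for ties.
    tie_term = sum(t**3 - t for t in Counter(pooled).values()) / (n * (n - 1))
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    if sigma == 0:
        return u, 1.0
    z = (u - n1 * n2 / 2) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p_value


# Hypothetical values of one texture feature as measured by two readers.
reader_a = [0.41, 0.52, 0.47, 0.60, 0.55, 0.49]
reader_b = [0.43, 0.50, 0.48, 0.61, 0.53, 0.51]
u_stat, p = mann_whitney_u(reader_a, reader_b)
print(f"U = {u_stat:.1f}, p = {p:.3f}")
```

A large p-value for a feature, as in this toy case, is what would be read as no significant difference between the two readers.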
Classification Algorithm Application
The patients were randomly divided into a training group and a validation group at a ratio of 4:1. Before classification, the optimal texture features were selected to reduce the number of input variables, which both improves model performance and reduces computational cost. Because the optimal selection method may differ between classifiers, five methods were applied separately: distance correlation, random forest (RF), least absolute shrinkage and selection operator (LASSO), eXtreme gradient boosting (Xgboost), and Gradient Boosting Decision Tree (GBDT).
The purpose of machine learning was to establish and train the models to discriminate GBM from PCNSL with radiomics features extracted from T1C imaging. Three classifiers were tested: linear discriminant analysis (LDA), support vector machine (SVM), and logistic regression (LR). Thus, 15 diagnostic models were evaluated with different combinations of selection methods and classifiers. The models were trained with the data of the training group and tested in the validation group. The sensitivity, specificity, area under the receiver operating characteristic curve (AUC), and accuracy of each model were recorded for evaluation. For each model, the training-validation cycle was repeated 100 times to obtain a realistic distribution of classification accuracies. The flow chart of the study is shown in Figure 2.
The models were programmed in Python and established directly with the default hyperparameter settings of the scikit-learn package (https://scikit-learn.org/stable/).
The features selected by each method are listed in Table 1. Four features, GLRLM_LGRE, GLRLM_HGRE, GLRLM_SRHGE, and GLZLM_HGZE, were selected by almost every method, suggesting that they were the most significant features for discrimination. The other selected features can reasonably be considered relevant to discrimination, but it is hard to tell how much they influenced the algorithms' performances.
The performances of the models are listed in Table 2. As mentioned previously, the models were established with different combinations of selection methods and classifiers. The results indicated that all three classifiers showed impressive discriminative ability when given suitably selected features, and the LDA classifier showed better compatibility with the selection methods than the other classifiers. Over-fitting was observed in six models: RF + SVM, Xgboost + SVM, GBDT + SVM, RF + LR, Xgboost + LR, and GBDT + LR. For LDA-based models, the AUCs in the validation group were 0.978, 0.964, 0.977, 0.750, and 0.956; for the SVM-based models, the AUCs were 0.959 and 0.822; and for LR-based models, the AUCs were 0.933 and 0.975.
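To make the first of these selection methods concrete, the sketch below computes the sample distance correlation between one feature and the class label and ranks features by it. It is a plain-Python illustration under stated assumptions: the study does not report how the ranking was turned into a selection, so the `top_k` cutoff, the function names, the label encoding (1 = GBM, 0 = PCNSL), and the toy feature values are all hypothetical.

```python
import math


def _double_centered_distances(values):
    """Pairwise absolute distances, double-centered (row, column, grand mean)."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_mean = [sum(row) / n for row in dist]  # equals column means (symmetric)
    grand_mean = sum(row_mean) / n
    return [
        [dist[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation between two equal-length 1-D sequences."""
    n = len(x)
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x = sum(v * v for row in a for v in row) / n**2
    dvar_y = sum(v * v for row in b for v in row) / n**2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0  # a constant variable carries no information
    return math.sqrt(max(dcov2, 0.0) / denom)


def rank_features(feature_table, labels, top_k):
    """Return the top_k feature names ordered by distance correlation with labels."""
    scored = {
        name: distance_correlation(vals, labels)
        for name, vals in feature_table.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:top_k]


# Toy data: 1 = GBM, 0 = PCNSL (hypothetical encoding and values).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "feature_informative": [5.1, 4.8, 5.5, 5.0, 2.1, 1.9, 2.4, 2.0],
    "feature_noise": [3.0, 1.0, 4.0, 2.0, 3.1, 1.2, 3.9, 2.2],
}
print(rank_features(features, labels, top_k=1))
```

Unlike the Pearson coefficient, distance correlation also responds to non-linear dependence: for x = (-2, -1, 0, 1, 2) and y = x², the Pearson correlation is zero while the distance correlation is clearly positive.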
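The four evaluation measures named above can be written out explicitly. The sketch below is a minimal plain-Python illustration, assuming GBM is coded as the positive class (1) and PCNSL as the negative class (0); the study does not state which class was treated as positive, and the function names and toy values are hypothetical. It also enumerates the 15 selection-method and classifier combinations.

```python
from itertools import product


def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for binary labels (1 = positive)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# The 5 x 3 grid of models evaluated in the study.
selectors = ["Distance Correlation", "RF", "LASSO", "Xgboost", "GBDT"]
classifiers = ["LDA", "SVM", "LR"]
model_names = [f"{s} + {c}" for s, c in product(selectors, classifiers)]

# Toy validation fold (hypothetical labels and classifier scores).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(len(model_names), confusion_metrics(y_true, y_pred), roc_auc(y_true, scores))
```

Sensitivity, specificity, and accuracy depend on the decision threshold (0.5 here), whereas the AUC is computed from the ranking of the scores alone.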
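As an illustration of how such a workflow might look in scikit-learn, the sketch below pairs one selection method (LASSO) with the three classifiers and repeats a random 4:1 split 100 times. It is not the authors' code. The data are synthetic (generated with `make_classification`), and the feature standardization, the use of `SelectFromModel`, and the LASSO penalty `alpha=0.01` are assumptions made for the example; the classifiers themselves are left at their default hyperparameters, as described above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the radiomics table: 138 "patients", 40 "features".
X, y = make_classification(
    n_samples=138, n_features=40, n_informative=6, random_state=0
)

classifiers = {
    "LDA": LinearDiscriminantAnalysis,
    "SVM": SVC,
    "LR": LogisticRegression,
}


def mean_validation_auc(make_classifier, n_cycles=100):
    """Average validation AUC over repeated random 4:1 train/validation splits."""
    aucs = []
    for cycle in range(n_cycles):
        X_tr, X_va, y_tr, y_va = train_test_split(
            X, y, test_size=0.2, random_state=cycle
        )
        model = make_pipeline(
            StandardScaler(),                    # assumption: standardize features
            SelectFromModel(Lasso(alpha=0.01)),  # assumption: illustrative penalty
            make_classifier(),                   # default hyperparameters
        )
        model.fit(X_tr, y_tr)  # selection is fitted on the training fold only
        aucs.append(roc_auc_score(y_va, model.decision_function(X_va)))
    return float(np.mean(aucs))


results = {name: mean_validation_auc(cls) for name, cls in classifiers.items()}
for name, auc in results.items():
    print(f"LASSO + {name}: mean validation AUC = {auc:.3f}")
```

Placing the selection step inside the pipeline means that it is refitted on each training fold, so the validation fold does not influence which features are kept.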
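For readers unfamiliar with the run-length features named above, the sketch below shows what LGRE, HGRE, and SRHGE measure on a toy image. It follows the commonly used GLRLM definitions, in which each run is weighted by the inverse square of its gray level (LGRE), by the square of its gray level (HGRE), or by the squared gray level divided by the squared run length (SRHGE), and the sum is divided by the total number of runs. It is a simplification: runs are counted along rows of a 2D image only, whereas the software used in the study works in 3D, and the function names and image values are hypothetical.

```python
from collections import Counter


def horizontal_run_lengths(image):
    """Count runs of equal gray level along each row.

    Returns a Counter mapping (gray_level, run_length) to the number of runs.
    Gray levels are assumed to be positive integers after discretization.
    """
    runs = Counter()
    for row in image:
        start = 0
        for k in range(1, len(row) + 1):
            # Close the current run at the row end or when the level changes.
            if k == len(row) or row[k] != row[start]:
                runs[(row[start], k - start)] += 1
                start = k
    return runs


def run_emphasis(runs):
    """LGRE, HGRE, and SRHGE from a run-length table."""
    total = sum(runs.values())
    lgre = sum(c / g**2 for (g, _), c in runs.items()) / total
    hgre = sum(c * g**2 for (g, _), c in runs.items()) / total
    srhge = sum(c * g**2 / length**2 for (g, length), c in runs.items()) / total
    return {"LGRE": lgre, "HGRE": hgre, "SRHGE": srhge}


# Toy 2 x 3 image with gray levels 1 and 2.
toy_image = [
    [1, 1, 2],
    [2, 2, 2],
]
toy_runs = horizontal_run_lengths(toy_image)
print(dict(toy_runs))
print(run_emphasis(toy_runs))
```

In this toy image there are three runs (one of level 1 and two of level 2), so LGRE is (1 + 1/4 + 1/4) / 3 = 0.5 and HGRE is (1 + 4 + 4) / 3 = 3. Brighter runs raise HGRE and lower LGRE.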
Table 2. Results of the discriminative model in distinguishing GBM from PCNSL in the training and validation group.
In the current study, the optimal model was Distance Correlation + LDA. In the training group, the model achieved an AUC of 0.992, accuracy of 0.993, sensitivity of 0.996, and specificity of 0.990. In the validation group, the model also performed well, with an AUC of 0.978, accuracy of 0.979, sensitivity of 0.982, and specificity of 0.976. The association between discriminative functions from models is represented in Figure 3. Figure 4 shows examples of the distribution of the LDA discriminant function for GBM and PCNSL in one cycle.
In the current study, we differentiated GBM from PCNSL with radiomics-based machine-learning technology. Radiomics parameters were extracted from T1C images to detect non-visual information about the two types of tumors. The models were established with five selection methods and three classifiers and were tested to find the optimal model. The results showed that each of the three classifiers achieved an AUC above 0.900 with at least one selection method. The optimal model was the combination of Distance Correlation + LDA, with an AUC of 0.978, accuracy of 0.979, sensitivity of 0.982, and specificity of 0.976. Given that T1C imaging is a routine examination for GBM and PCNSL, our results suggest that radiomics is a feasible solution for clinical application without requiring additional fees or platforms.
Generally, contrast-enhanced T1 imaging is a routine radiological examination for patients with GBM or PCNSL. A previous study indicated that, at the time of initial presentation, routine morphological MRI is sufficient to differentiate GBM from PCNSL lesions in many cases. The image patterns are correlated with tumor characteristics, such as intratumoral hemorrhage, angiogenesis, and necrotic or cystic components. Specifically, heterogeneous enhancement was present in 98.1% of GBM cases and homogeneous enhancement in 64.8% of PCNSL cases; necrosis was observed in 88.9% of GBM lesions and 5.6% of PCNSL lesions; multiple lesions were shown in 51.9% of PCNSL cases and 35.2% of GBM cases. Signs of bleeding were uncommon in PCNSL (5.6%) and frequent in GBM (44.4%) (18). Advanced imaging techniques, such as apparent diffusion coefficient (ADC), diffusion-tensor imaging (DTI), dynamic susceptibility-weighted contrast-enhanced MRI, and perfusion-weighted imaging, have also been used to discriminate GBM from PCNSL when necessary (19–21). Surgeons can use this information on tumor characteristics to make diagnostic and treatment decisions. However, even with these techniques, the differential diagnosis between GBM and PCNSL remains a challenge in some cases, especially because conventional MR sequences offer only limited discrimination between the two tumor types and advanced imaging techniques are not available for all patients.
Compared with GBM, PCNSL is more likely to show permeable neovascularization and a higher degree of cellularity, which theoretically provides the mechanism for TA-based image discrimination (22–24). In our study, radiomics of T1C imaging was used to detect the microscopic differences between GBM and PCNSL, and the results suggested that TA is a feasible solution for discriminating GBM from PCNSL radiologically. Radiomics has been reported to distinguish GBM from PCNSL in previous studies, and machine-learning classification models were reported to improve discrimination (6, 25). Researchers compared the diagnostic accuracy of radiologists and machine-learning classifiers and suggested that the classifiers yielded better diagnostic performance than human radiologists (25). However, the sample sizes of these studies were not large enough, and only a few models were tested. Our study enrolled 138 patients with a reasonable proportion in each group and evaluated 15 combinations. In a previous study, an RF-based classifier showed excellent performance in discriminating atypical glioblastoma from PCNSL, with an AUC of 0.98 (6), and an SVM-based classifier showed performance non-inferior to that of human experts, with an AUC of 0.877 (25). In our study, all three classifiers showed excellent performance when combined with a suitable selection method. It is worth noting that the optimal SVM-based model in our study had an AUC of 0.96, a better diagnostic performance than that of the previous study.
A possible explanation for the improvement is the better selection method. Radiomics analysis involves a large number of features, but machine learning requires the most suitable parameters. Previous researchers selected parameters for the SVM classifier with an F-statistic approach, whereas we selected them with distance correlation, RF, LASSO, Xgboost, or GBDT. The combination of LASSO + SVM showed a discriminative performance similar to that of the previous study, with an AUC of 0.822. Beyond performance, the selection method was also important for model stability. Over-fitting, which happens when a model fits the noise and inaccurate values in the data, should be avoided when designing machine-learning models. Our results suggested that over-fitting probably occurred when RF, Xgboost, or GBDT was used as the selection method. Perhaps the features selected with these methods contained too much noise and led to over-fitting of the models.
As for classifier selection, three classifiers were included in order to choose the most suitable one for discriminating GBM from PCNSL. The results suggested that, with suitable features, all of them had discriminative ability. It is worth noting that although we chose Distance Correlation + LDA as the optimal model, some models (such as LASSO + LDA and LASSO + LR) showed very similar discriminative performance. Distance Correlation + LDA was chosen as the optimal model because it had the smallest difference between sensitivity and specificity compared with LASSO + LDA and LASSO + LR. However, given that all the classifier and feature selection combinations investigated seem to perform quite comparably, and that the variance in AUC may be partially attributable to the small sample size, the additional information gained by comparing machine-learning models is quite limited and should be interpreted carefully. Future investigations with larger sample sizes are required to address this problem and verify our results.
There were several limitations to our study. First, the isolated evaluation of T1C images is not representative of real clinical work, given that other sequences (such as ADC, perfusion, DTI, and T2 gradient-echo) could also be useful. Second, the diagnostic performance of radiomics-based machine learning was not compared with that of other advanced MRI technologies. Third, the study cohort was not large enough, and studies with larger populations are required to verify our results. Fourth, the machine-learning classifier was not validated on an external dataset. Considering the considerable variability in images acquired with different MR scanners at different institutions, we cannot guarantee the diagnostic ability of our machine-learning classifier on external datasets. However, the image processing and analysis protocol relied on open-source packages, meaning that it can be validated and reproduced with other datasets.
Radiomics combined with machine-learning algorithms showed promising ability in differentiating GBM from PCNSL.
Data Availability Statement
The datasets generated for this study are available on request to the corresponding author.
The studies involving human participants were reviewed and approved by the Ethics Committee of Sichuan University. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.
XM participated in conceptualization and revised some intellectual content in the manuscript. CC collected MR images, participated in MRI feature extraction, and drafted this manuscript. XO collected MR images and participated in MRI feature extraction. JW deployed the machine-learning algorithm and was responsible for statistical analysis. AZ participated in most of the revision work. All authors contributed to the article and approved the submitted version.
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fonc.2020.01151/full#supplementary-material
1. Dolecek TA, Propp JM, Stroup NE, Kruchko C. CBTRUS statistical report: primary brain and central nervous system tumors diagnosed in the United States in 2005-2009. Neuro Oncol. (2012) 14(Suppl. 5):v1–49. doi: 10.1093/neuonc/nos218
3. Kickingereder P, Wiestler B, Sahm F, Heiland S, Roethke M, Schlemmer HP, et al. Primary central nervous system lymphoma and atypical glioblastoma: multiparametric differentiation by using diffusion-, perfusion-, and susceptibility-weighted MR imaging. Radiology. (2014) 272:843–50. doi: 10.1148/radiol.14132740
4. von Baumgarten L, Illerhaus G, Korfel A, Schlegel U, Deckert M, Dreyling M. The diagnosis and treatment of primary CNS lymphoma. Dtsch Arztebl Int. (2018) 115:419–26. doi: 10.3238/arztebl.2018.0419
6. Suh HB, Choi YS, Bae S, Ahn SS, Chang JH, Kang SG, et al. Primary central nervous system lymphoma and atypical glioblastoma: differentiation using radiomics approach. Eur Radiol. (2018) 28:3832–9. doi: 10.1007/s00330-018-5368-4
7. Al-Okaili RN, Krejza J, Woo JH, Wolf RL, O'Rourke DM, Judy KD, et al. Intraaxial brain masses: MR imaging-based diagnostic strategy–initial experience. Radiology. (2007) 243:539–50. doi: 10.1148/radiol.2432060493
10. Li Y, Liu X, Qian Z, Sun Z, Xu K, Wang K, et al. Genotype prediction of ATRX mutation in lower-grade gliomas using an MRI radiomics signature. Eur Radiol. (2018) 28:2960–8. doi: 10.1007/s00330-017-5267-0
11. Ditmer A, Zhang B, Shujaat T, Pavlina A, Luibrand N, Gaskill-Shipley M, et al. Diagnostic accuracy of MRI texture analysis for grading gliomas. J Neuro Oncol. (2018) 140:583–9. doi: 10.1007/s11060-018-2984-4
12. Zhou H, Vallieres M, Bai HX, Su C, Tang H, Oldridge D, et al. MRI features predict survival and molecular markers in diffuse lower-grade gliomas. Neuro Oncol. (2017) 19:862–70. doi: 10.1093/neuonc/now256
15. van IJzendoorn DGP, Szuhai K, Briaire-de Bruijn IH, Kostine M, Kuijjer ML, Bovee JVMG. Machine learning analysis of gene expression data reveals novel diagnostic and prognostic biomarkers and identifies therapeutic targets for soft tissue sarcomas. PLoS Comput Biol. (2019) 15:e1006826. doi: 10.1371/journal.pcbi.1006826
17. Nioche C, Orlhac F, Boughdad S, Reuze S, Goya-Outi J, Robert C, et al. LIFEx: a freeware for radiomic feature calculation in multimodality imaging to accelerate advances in the characterization of tumor heterogeneity. Cancer Res. (2018) 78:4786–9. doi: 10.1158/0008-5472.CAN-18-0125
18. Malikova H, Koubska E, Weichet J, Klener J, Rulseh A, Liscak R, et al. Can morphological MRI differentiate between primary central nervous system lymphoma and glioblastoma? Cancer Imaging. (2016) 16:40. doi: 10.1186/s40644-016-0098-9
19. Toh CH, Wei KC, Chang CN, Ng SH, Wong HF. Differentiation of primary central nervous system lymphomas and glioblastomas: comparisons of diagnostic performance of dynamic susceptibility contrast-enhanced perfusion MR imaging without and with contrast-leakage correction. AJNR. (2013) 34:1145–9. doi: 10.3174/ajnr.A3383
20. Radbruch A, Wiestler B, Kramp L, Lutz K, Baumer P, Weiler M, et al. Differentiation of glioblastoma and primary CNS lymphomas using susceptibility weighted imaging. Eur J Radiol. (2013) 82:552–6. doi: 10.1016/j.ejrad.2012.11.002
21. Choi YS, Lee HJ, Ahn SS, Chang JH, Kang SG, Kim EH, et al. Primary central nervous system lymphoma and atypical glioblastoma: differentiation using the initial area under the curve derived from dynamic contrast-enhanced MR and the apparent diffusion coefficient. Eur Radiol. (2017) 27:1344–51. doi: 10.1007/s00330-016-4484-2
22. Kickingereder P, Sahm F, Wiestler B, Roethke M, Heiland S, Schlemmer HP, et al. Evaluation of microvascular permeability with dynamic contrast-enhanced MRI for the differentiation of primary CNS lymphoma and glioblastoma: radiologic-pathologic correlation. AJNR. (2014) 35:1503–8. doi: 10.3174/ajnr.A3915
23. Guo AC, Cummings TJ, Dash RC, Provenzale JM. Lymphomas and high-grade astrocytomas: comparison of water diffusibility and histologic characteristics. Radiology. (2002) 224:177–83. doi: 10.1148/radiol.2241010637
24. Toh CH, Castillo M, Wong AM, Wei KC, Wong HF, Ng SH, et al. Primary cerebral lymphoma and glioblastoma multiforme: differences in diffusion characteristics evaluated with diffusion tensor imaging. AJNR. (2008) 29:471–5. doi: 10.3174/ajnr.A0872
25. Alcaide-Leon P, Dufort P, Geraldo AF, Alshafai L, Maralani PJ, Spears J, et al. Differentiation of enhancing glioma and primary central nervous system lymphoma by texture-based machine learning. AJNR. (2017) 38:1145–50. doi: 10.3174/ajnr.A5173
Keywords: glioblastoma, primary central nervous system lymphoma, magnetic resonance imaging, radiomics, machine learning
Citation: Chen C, Zheng A, Ou X, Wang J and Ma X (2020) Comparison of Radiomics-Based Machine-Learning Classifiers in Diagnosis of Glioblastoma From Primary Central Nervous System Lymphoma. Front. Oncol. 10:1151. doi: 10.3389/fonc.2020.01151
Received: 17 July 2019; Accepted: 08 June 2020;
Published: 15 September 2020.
Edited by: Tsair-Fwu Lee, National Kaohsiung University of Science and Technology, Taiwan
Reviewed by: Remco Molenaar, Amsterdam University Medical Center, Netherlands
Yu-Jie Huang, Kaohsiung Chang Gung Memorial Hospital, Taiwan
Copyright © 2020 Chen, Zheng, Ou, Wang and Ma. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Xuelei Ma, firstname.lastname@example.org
†These authors share first authorship