- 1Department of Radiology, Aerospace Center Hospital, Beijing, China
- 2Department of Nephrology, Aerospace Center Hospital, Beijing, China
- 3Department of Medical Imaging Center, Affiliated Hospital of Inner Mongolia Minzu University, Tongliao, China
Background: Preoperative differentiation of primary central nervous system lymphoma (PCNSL) from adult-type diffuse glioma (ADG) is important for guiding neurosurgical decision-making. We aimed to develop and validate an MRI-based interpretable radiomic-clinical (Rad-Clinic) fusion model to differentiate PCNSL from ADG using seven machine learning algorithms.
Methods: In this retrospective study, we included 165 patients with PCNSL or ADG who underwent preoperative conventional MRI (CET1WI, FLAIR, DWI, ADC) at two institutions (115 in the training cohort and 50 in the external validation cohort). We selected seven machine learning algorithms to construct a framework incorporating radiomic features and clinical parameters. SHapley Additive exPlanations (SHAP) values were used to elucidate feature contributions, and a radiomic nomogram was developed for clinical translation.
Results: The CET1WI+DWI+FLAIR fusion model performed best among all single-sequence and multi-sequence radiomic models, with an AUC of 0.871 in the external validation cohort. The Rad-Clinic fusion model performed better still in differentiating PCNSL from ADG, with AUCs of 0.973 and 0.940 in the training and external validation cohorts, outperforming both the radiomic model and the clinical model. The SHAP summary plot illustrated how each feature’s value affected its impact on the output of the Rad-Clinic fusion model. The nomogram demonstrated clinical interpretability through visualised risk stratification.
Conclusion: An interpretable Rad-Clinic fusion model enables accurate preoperative differentiation of PCNSL from ADG and may help improve clinical decision-making.
Introduction
Primary central nervous system lymphoma (PCNSL) and glioma represent the two most prevalent primary malignant intracranial neoplasms (1). The fifth edition of the 2021 World Health Organization (WHO) Classification of Tumors of the Central Nervous System (CNS) introduced substantial revisions to the molecularly defined classification of glioma subtypes, with adult-type diffuse glioma (ADG) emerging as the most common histological subtype (2, 3). PCNSL is predominantly managed through non-surgical therapeutic modalities, including chemotherapeutic regimens and targeted therapies such as high-dose methotrexate combined with consolidation chemotherapy and whole-brain radiotherapy (4). In contrast, the standard treatment paradigm for ADG involves maximal safe tumor resection followed by adjuvant radiotherapy and chemotherapy. Given these fundamentally divergent therapeutic approaches between the two malignancies, accurate preoperative differential diagnosis becomes imperative for optimal clinical decision-making and treatment stratification (5).
Although the characteristic magnetic resonance imaging (MRI) features of ADG and PCNSL have been extensively documented in the literature, the majority of clinical presentations manifest atypical radiological profiles, posing significant diagnostic challenges. This diagnostic ambiguity becomes particularly pronounced and clinically consequential when neoplasms arise in the central core regions of the cerebral hemispheres, where preoperative differentiation between these entities assumes heightened clinical significance (6). Consequently, accurate preoperative differentiation of PCNSL from ADG is critical for guiding neurosurgical management strategies, avoiding unnecessary and potentially detrimental surgical interventions, and ultimately optimizing clinical outcomes, healthcare quality, and cost-effectiveness.
Radiomics extracts quantitative, reproducible features from medical images in a high-throughput manner, capturing complex patterns that are difficult to identify or quantify visually and that may be associated with a specific disease; predictive models built on these features are used for tumor diagnosis, grading, efficacy assessment, and prognosis prediction (7–9). Machine learning algorithms have been used with good success to differentiate PCNSL from ADG (10), and advanced sequences help to identify complex diseases (11), but these advanced sequences incur additional cost and are not part of routine clinical examinations. Machine learning has therefore been applied to routine MRI sequences for comprehensive tumor analysis, with the extracted radiomic features reflecting the cellular composition and heterogeneity of tumors (12, 13). Existing studies based on routine MRI sequences differ in the machine learning algorithms they use and in whether they examine individual or combined sequences, so their results are not consistent, and the problem of differentiating PCNSL from ADG remains unresolved and without consensus (14, 15). Further studies are therefore required to determine the most suitable machine learning methods and the optimal MRI sequence or combination of sequences. In addition, limited sample sizes and the lack of interpretability of machine learning-based models limit the application of radiomic studies in clinical practice. The SHapley Additive exPlanations (SHAP) approach can help to address the interpretability problem (16–18). Thus, combining SHAP with radiomics can make the models more interpretable.
This study aims to develop and validate multiple machine learning models based on multiparametric MRI to discriminate PCNSL from ADG, to identify the best-performing model by incorporating clinical factors, and to verify the generalizability of the models through external validation. The SHAP algorithm is used to explore the interpretability of the models.
Materials and methods
Patients
This retrospective multicenter study received ethical approval from the institutional review boards, and the requirement for informed consent was waived. We systematically reviewed all patients with ADG or PCNSL who underwent preoperative conventional MRI, including contrast-enhanced T1-weighted imaging (CET1WI), fluid-attenuated inversion recovery (FLAIR), diffusion-weighted imaging (DWI), and apparent diffusion coefficient (ADC) maps, between January 2015 and July 2024 at both institutions. ADG encompasses IDH-mutant astrocytoma, oligodendroglioma, and IDH-wildtype glioblastoma; details of the specific stratification of ADG are provided in Supplementary Information 1. The detailed inclusion and exclusion criteria are shown in Figure 1. Institution 1 served as the training cohort and Institution 2 as the external validation cohort. Detailed MRI acquisition parameters across different scanner platforms are provided in Supplementary Tables S1, S2. A total of 165 patients from the two centers were finally included: 115 from Institution 1 (69 with ADG and 46 with PCNSL) and 50 from Institution 2 (29 with ADG and 21 with PCNSL). Demographic and clinical characteristics were extracted from electronic medical records, including age, sex, and tumor morphological features (enhancement, number, location, edema, and cystic change).
Figure 1. Recruitment pathway of the patient selection process from the two participating medical institutions of this study.
Data preprocessing and feature extraction
To account for differences between centers and scanners, we applied ComBat during preprocessing to correct batch effects and harmonize the data, with favorable results; PCA visualizations before and after harmonization are shown in Supplementary Figures S1A-D. Prior to feature extraction, all images were resampled to a voxel size of 1 mm × 1 mm × 1 mm and gray levels were discretized. To standardize the MR images of all sequences, the mean and standard deviation of the intensities in each MRI volume were calculated, and each voxel value was standardized by the z-score method, that is, by subtracting the mean intensity and dividing by the standard deviation of the intensity (15). The images were analyzed separately and independently by two radiologists, each with 10 years of experience in neurological MRI diagnosis, in a double-blind manner. The region of interest (ROI) was manually segmented slice by slice on the MRI images using ITK-SNAP (http://www.itksnap.org). Areas of edema and necrosis were excluded so that only tumor was included. When outlining the ROI on one sequence, the image features of the other sequences were consulted to ensure the accuracy of the ROI. In particular, ROIs on FLAIR images were adjusted by identifying the solid portion of the tumor against the DWI and CET1WI images (19). Because radiomic feature values depend on the ROI outlines drawn by the radiologists, both test-retest and inter-radiologist analyses were applied to assess the robustness of all features. For the test-retest analysis, the ROIs of 20 randomly selected patients were each segmented twice by one radiologist. For the inter-radiologist analysis, the ROIs of the same 20 patients were segmented independently by the two radiologists.
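The per-volume z-score standardization described above can be sketched as follows. This is a minimal illustration rather than the study's preprocessing code: the function name `zscore_normalize`, the toy intensities, and the use of the population standard deviation over the whole volume are assumptions made for the example.

```python
from statistics import fmean, pstdev


def zscore_normalize(intensities):
    """Standardize voxel intensities to zero mean and unit standard deviation."""
    mean = fmean(intensities)
    # Assumption: population SD computed over every voxel of the volume.
    sd = pstdev(intensities)
    if sd == 0:
        raise ValueError("constant image: standard deviation is zero")
    return [(value - mean) / sd for value in intensities]


# Toy "volume" flattened to a list of voxel intensities (made-up numbers).
volume = [100.0, 120.0, 140.0, 160.0, 180.0]
normalized = zscore_normalize(volume)
```

After this step every sequence has intensities on a comparable scale, which is the stated purpose of the standardization before feature extraction.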
Radiomic features were calculated based on the specified ROIs (20). We used the PyRadiomics package (version 3.0) for Python (version 3.7.3) to extract radiomic features from the tumor lesions on CET1WI, FLAIR, DWI, and ADC. The first-order statistical features describe the distribution of voxel intensities within the tumor, and the texture features use gray-level matrices to represent the spatial heterogeneity of the tumor.
Feature selection and model construction
Feature selection was performed to reduce overfitting, redundancy, and other sources of bias. Radiomic features were calculated after each segmentation, and intra- and inter-observer reproducibility was determined for each feature. Features with low reproducibility (intraclass correlation coefficient (ICC) below 0.75) were excluded (21–23). All feature values were normalized using the z-score transformation to reduce potential differences in feature values between the two cohorts (24). The Spearman correlation coefficient was calculated between features, and when any two features had a correlation coefficient greater than 0.9, only one of them was retained (25). The least absolute shrinkage and selection operator (LASSO) algorithm with ten-fold cross-validation was then applied to select the best radiomic features (15). Finally, we selected seven machine learning classifiers to construct the radiomic model: Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbors (KNN), Extra Trees, LightGBM, and Multilayer Perceptron (MLP). The hyperparameters for each classifier are provided in Supplementary Information 2. To improve on the performance of the radiomic model, clinical baseline features were used to construct a clinical model. Univariate and multivariate analyses were performed to identify independent risk factors for differentiating PCNSL from ADG, and finally the clinical indicators were integrated into the radiomic model to construct a Rad-Clinic fusion model.
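The Spearman redundancy step can be illustrated with a small standard-library sketch. It assumes a greedy rule (a feature is kept only if its absolute Spearman correlation with every already-kept feature does not exceed 0.9); the order in which the study resolved correlated pairs is not stated, and the feature names and values below are made up.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = mean_rank
        start = end + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def drop_redundant(features, threshold=0.9):
    """Greedy filter: keep a feature only if |rho| <= threshold with all kept ones."""
    kept = []
    for name in features:
        if all(abs(spearman(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature table: one list of values per feature, one entry per patient.
features = {
    "entropy": [0.2, 0.5, 0.9, 1.4, 2.0, 2.1],
    "entropy_scaled": [2.0, 5.0, 9.0, 14.0, 20.0, 21.0],  # monotone in "entropy"
    "mean_intensity": [3.0, 1.0, 4.0, 1.5, 5.0, 2.0],
}
selected = drop_redundant(features)
```

Here `entropy_scaled` is dropped because it is perfectly rank-correlated with `entropy`, while `mean_intensity` (rho = 0.2 with `entropy`) is retained. The surviving features would then go on to the LASSO step.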
Explanation and visualization of Rad-Clinic model
The SHAP method enables the results of the model to be understood in an interpretable manner, providing visually concise graphs by representing the range and distribution of the importance of features to the model outputs and by correlating the values of features with the impact of features, thus enhancing model transparency by providing global and local interpretability (22, 23). The features are ranked in order of importance, with higher ranked features contributing more to the model. Each point representing the SHAP value of each feature for a patient is plotted horizontally and stacked vertically to show the density of the same SHAP value (25). The method elucidates the most influential variables, thereby significantly improving the interpretability of the model (26). Enhanced understanding of key predictors allows for informed assessment of their contribution to predictive outcomes in the training cohort.
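To make the idea behind SHAP values concrete, the sketch below computes exact Shapley values for a tiny hypothetical scoring function by averaging each feature's marginal contribution over all feature orderings. It is a toy illustration under stated assumptions: the function `toy_score`, its coefficients, and the feature names are invented, and absent features are fixed at a single baseline value, whereas SHAP implementations generally estimate expectations over background data.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average marginal contribution over all orderings.

    Features that have not yet been "revealed" stay at their baseline value.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for ordering in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in ordering:
            current[name] = instance[name]
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_score(x):
    """Hypothetical score with an interaction term (not the study's model)."""
    return (-0.5 * x["zone_entropy"]
            + 0.02 * x["age"]
            - 0.1 * x["zone_entropy"] * x["enhancement_flag"])


# Made-up baseline ("average patient") and individual patient.
baseline = {"zone_entropy": 0.0, "age": 50.0, "enhancement_flag": 0.0}
patient = {"zone_entropy": -0.8, "age": 65.0, "enhancement_flag": 1.0}
phi = shapley_values(toy_score, patient, baseline)
```

The values in `phi` sum to `toy_score(patient) - toy_score(baseline)`, which is the additive property that lets a force plot stack feature contributions on top of a baseline value.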
Construction of a fusion model-based nomogram
To visualise the classification assessment, we conducted a logistic regression analysis to build a nomogram based on the Rad-Clinic fusion model and demographic characteristics.
Statistical analysis
Python (version 3.7.3) and SPSS (v22.0, IBM) were used for statistical analysis. Independent-samples t-tests and Mann-Whitney tests were used to compare variables between the training cohort and the external validation cohort, and categorical variables were compared using the chi-square test or Fisher’s exact test. Univariate and multivariate logistic regression analyses were used to assess continuous and categorical variables between different groups of patients. The area under the receiver operating characteristic (ROC) curve (AUC) was used to assess the discriminatory power of the models, and accuracy, sensitivity (SEN), and specificity (SPE) were also calculated. The DeLong test was used to compare AUC values between models, and Net Reclassification Improvement (NRI) and Integrated Discrimination Improvement (IDI) were calculated to quantify the improvement in discrimination performance. To assess how well the predictions of the models matched the actual results, calibration curve analysis was performed and quantified with the Hosmer-Lemeshow (HL) test. In addition, decision curve analysis (DCA) was used to quantify the net benefit at different threshold probabilities, thus highlighting the clinical applicability of the model. The SHAP method was used to explain the influence of features on the predictive model and to identify the most influential variables.
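For readers less familiar with these metrics, the following sketch shows how AUC, sensitivity, specificity, and the decision-curve net benefit can be computed from labels and predicted probabilities. It uses the rank-based definition of AUC and the usual net-benefit formula, TP/n − (FP/n) × pt/(1 − pt); the labels (1 = PCNSL, 0 = ADG is an assumed coding) and scores are made-up numbers, not study data.

```python
def auc(labels, scores):
    """AUC as the chance a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion(labels, scores, threshold):
    """Return (TP, FP, TN, FN), calling a case positive when score >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    return tp, fp, tn, fn


def sensitivity_specificity(labels, scores, threshold):
    tp, fp, tn, fn = confusion(labels, scores, threshold)
    return tp / (tp + fn), tn / (tn + fp)


def net_benefit(labels, scores, threshold):
    """Decision-curve net benefit at threshold probability pt (0 < pt < 1)."""
    tp, fp, _, _ = confusion(labels, scores, threshold)
    n = len(labels)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Made-up example: 4 positives and 6 negatives with predicted probabilities.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.3, 0.2, 0.1, 0.05]

example_auc = auc(labels, scores)
sen, spe = sensitivity_specificity(labels, scores, 0.5)
nb = net_benefit(labels, scores, 0.5)
```

A decision curve is obtained by evaluating `net_benefit` over a range of threshold probabilities and comparing it with the treat-all and treat-none strategies.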
Results
Patient characteristics
Table 1 summarizes the baseline characteristics of all participants. In the training cohort, significant differences were observed in age, tumor enhancement pattern, number of tumors, tumor location, and tumor cystic necrosis (p<0.05) between the ADG and PCNSL groups. In the external validation cohort, age, tumor enhancement pattern, and tumor cystic necrosis demonstrated statistically significant differences (p<0.05). Comparison of baseline characteristics between the training and external validation cohorts revealed no significant differences between the two cohorts. Detailed results of the stratified analysis for glioblastoma (GBM) and PCNSL are provided in Supplementary Table S3.
Feature selection
The workflow for constructing the radiomic model is shown in Figure 2. Radiomic features were extracted from tumor lesions across four MRI sequences (CET1WI, FLAIR, DWI, and ADC). This enabled the development of four single-sequence radiomic models and eleven multi-sequence radiomic models. The multi-sequence models were created through multivariate logistic regression, integrating all possible combinations of the single-sequence models. They are composed of the linearly weighted sum of outputs from the single-sequence models.
Individual features retained within each of the CET1WI, FLAIR, DWI, and ADC sequences are detailed in Supplementary Table S4 for single-sequence and multi-sequence radiomic models.
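The count of eleven multi-sequence models follows from choosing two, three, or four of the four sequences (6 + 4 + 1). The sketch below enumerates the combinations and shows the general form of a logistic combination of single-sequence outputs; the weights, intercept, and output values are hypothetical placeholders, not fitted values from the study.

```python
import math
from itertools import combinations

SEQUENCES = ("CET1WI", "FLAIR", "DWI", "ADC")

# Four single-sequence models and every combination of two or more sequences.
single = [(name,) for name in SEQUENCES]
multi = [
    combo
    for size in range(2, len(SEQUENCES) + 1)
    for combo in combinations(SEQUENCES, size)
]


def fused_probability(outputs, weights, intercept):
    """Logistic model on a linearly weighted sum of single-sequence outputs."""
    z = intercept + sum(weights[name] * outputs[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights and single-sequence outputs for one patient.
weights = {"CET1WI": 2.0, "FLAIR": 1.0, "DWI": 1.5}
outputs = {"CET1WI": 0.8, "FLAIR": 0.4, "DWI": 0.6}
probability = fused_probability(outputs, weights, intercept=-2.0)
```

In practice the weights and intercept would be estimated by multivariate logistic regression on the training cohort for each of the eleven combinations.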
Performance of radiomic-based machine learning model
Among these machine learning algorithms, the combined CET1WI+DWI+FLAIR model using the SVM method demonstrated superior diagnostic efficiency, achieving AUC values of 0.975 for the training set and 0.871 for the validation set. The performance of other sequence models is detailed in Supplementary Table S5.
Indicators significantly associated with the differentiation of ADG from PCNSL (age, enhancement pattern, and tumor cystic change) were used to construct the clinical model, which achieved AUC values of 0.727 and 0.733 in the training and validation cohorts. The clinical indicators were then integrated into the radiomic model to construct the Rad-Clinic fusion model, which achieved AUC values of 0.973 and 0.940 in the training and validation cohorts. A nomogram (Figure 3) was created based on the independent clinical predictors and the best fusion model; its AUC values in the training and validation cohorts were 0.933 and 0.927, respectively. The Rad-Clinic fusion model therefore performed best among all models. The DeLong test, NRI, and IDI were used to compare the diagnostic effectiveness of the models. In the validation cohort, the DeLong test showed statistically significant differences (p<0.05) for the radiomic model, fusion model, and nomogram compared with the clinical model, but not for the Rad-Clinic fusion model compared with the radiomic model. This may be due to the limited sensitivity of the DeLong test in small samples; the NRI and IDI analyses further confirmed improved diagnostic efficiency with the Rad-Clinic fusion model. The Hosmer-Lemeshow goodness-of-fit test indicated a superior fit for the fusion model, corroborated by calibration curves, suggesting enhanced predictive accuracy. The fusion model thus improves on both the clinical model and the radiomic model in overall predictive performance. The ROC curves, decision curve analyses (DCA), and calibration curves for the fusion model and nomogram are shown in Figure 4. The diagnostic performance details are summarized in Table 2.
Figure 4. Comparison of the model’s performance in the training and validation cohort. (A–C) ROC curves, decision curve analysis and calibration curve of Radiomic model, Clinical model, Rad-Clinic fusion model and nomogram in training cohorts. (D–F) ROC curves, decision curve analysis and calibration curve of Radiomic model, Clinical model, Rad-Clinic fusion model and nomogram in validation cohorts.
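The NRI and IDI used in the model comparison can be written compactly. The sketch below uses the standard definitions: IDI is the gain in mean predicted probability among events minus the gain among non-events, and the two-category NRI is the net proportion of events reclassified upward minus the net proportion of non-events reclassified upward. The 0.5 cut-off, the label coding (1 = event), and all probabilities are assumptions for illustration, not values from the study.

```python
def mean(values):
    return sum(values) / len(values)


def idi(labels, p_old, p_new):
    """Integrated Discrimination Improvement of a new model over an old one."""
    event_gain = mean([n - o for y, o, n in zip(labels, p_old, p_new) if y == 1])
    nonevent_gain = mean([n - o for y, o, n in zip(labels, p_old, p_new) if y == 0])
    return event_gain - nonevent_gain


def categorical_nri(labels, p_old, p_new, threshold=0.5):
    """Two-category Net Reclassification Improvement at a single cut-off."""
    def net_upward(target):
        # +1 when a case moves above the cut-off, -1 when it moves below, else 0.
        moves = [
            (n >= threshold) - (o >= threshold)
            for y, o, n in zip(labels, p_old, p_new)
            if y == target
        ]
        return mean(moves)

    return net_upward(1) - net_upward(0)


# Made-up predictions from a reference model and an extended model.
labels = [1, 1, 1, 0, 0, 0]
p_old = [0.6, 0.4, 0.5, 0.5, 0.3, 0.6]
p_new = [0.8, 0.7, 0.6, 0.3, 0.2, 0.4]

example_idi = idi(labels, p_old, p_new)
example_nri = categorical_nri(labels, p_old, p_new)
```

Positive values of either index indicate that the new model separates events from non-events better than the reference model, which is how these indices complement a non-significant DeLong comparison.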
Explanation and visualization of Rad-Clinic model
We constructed the optimal Rad-Clinic model and used the SHAP method to visualize and analyze the differentiation between ADG and PCNSL. The feature ‘GLSZM_ZoneEntropy_CET1WI’ emerged as the most important in distinguishing between these two conditions. Among clinical features, enhancement pattern and age also played vital roles. The contribution of ‘GLSZM_ZoneEntropy_CET1WI’ to differentiating ADG from PCNSL is shown in Figure 5, where the SHAP summary plot elucidates the cumulative impact of each variable.
Figure 5. SHAP summary plot of Rad-Clinic fusion model. The plot illustrates the feature relevance and combined feature attributions to the models’ predictive performance.
Figure 6 shows that in the evaluation of Patient A, the SHAP value was higher than the baseline value, indicating that this patient was an individual with lymphoma, and the feature arrows show how each feature contributed to the quantitative assessment of PCNSL. In this patient, the GLSZM_ZoneEntropy_CET1WI feature value was low (-0.8231), which increased the assessed probability of PCNSL. Patient B was an individual with ADG, whose SHAP value was significantly lower than the baseline value; the GLSZM_ZoneEntropy_CET1WI feature value of this patient was higher (1.1663).
Figure 6. SHAP force plots explaining how the Rad-Clinic fusion model differentiates PCNSL from ADG. Patient (A) had PCNSL, and patient (B) had ADG. For instance, a low feature value of GLSZM_ZoneEntropy_CET1WI contributed to the increase in the assessment probability of PCNSL. Patient (B) had a GLSZM_ZoneEntropy_CET1WI value of 1.1663, while patient (A) had a lower GLSZM_ZoneEntropy_CET1WI value (-0.8231).
Discussion
In this study, we developed a multi-parametric MRI-based machine learning algorithm that integrates both radiomics and clinical data to preoperatively differentiate PCNSL from ADG. The combined Rad-Clinic model demonstrated superior diagnostic efficiency compared to standalone radiomic or clinical models, supported by external validation. Visualization techniques were applied to effectively present the predictive factors.
Seven machine learning classifiers (LR, SVM, RF, KNN, ExtraTrees, LightGBM, and MLP) were utilized to construct an interpretable radiomic model differentiating PCNSL from ADG. Radiomic features were extracted from conventional MRI sequences, including CET1WI, FLAIR, DWI, and ADC. The CET1WI+DWI+FLAIR combination model using the SVM method had the best diagnostic efficiency, with AUCs of 0.975 for training and 0.871 for external validation. Additionally, a clinical model was developed by integrating clinically significant indicators, achieving AUCs of 0.730 for the training and 0.735 for the validation cohort. These clinical indicators were incorporated into the radiomic model to construct the Rad-Clinic fusion model, resulting in AUCs of 0.972 for training and 0.940 for validation. A nomogram, based on independent clinical predictors and the optimal fusion model, provided AUCs of 0.933 and 0.927 for the training and validation cohorts. The Rad-Clinic fusion model delivered the highest diagnostic performance among all models, significantly enhancing diagnostic efficiency. DCA and calibration curves further confirmed that the fusion model offered considerable benefits in differentiating ADG from PCNSL.
In this study, the SVM-based CET1WI model demonstrated the highest diagnostic efficiency among single-sequence models, and the combination of CET1WI+DWI+FLAIR based on the SVM method had the best diagnostic efficiency among multi-sequence models. Notably, the feature ‘GLSZM_ZoneEntropy_CET1WI’ from the CET1WI sequence was the most significant contributor to the diagnostic model. Enhancement pattern, which is inherently related to the CET1WI sequence, was statistically significant for diagnosis, underscoring the sequence’s critical role in differentiating PCNSL from ADG. Each MRI sequence capitalizes on the distinct biophysical characteristics of brain tumors. The CET1WI sequence, in particular, effectively highlights areas of enhancement and necrosis, reflecting the degree of blood-brain barrier disruption and contrast agent aggregation, thereby offering robust diagnostic potential for distinguishing PCNSL and ADG (27). Nevertheless, although CET1WI is important for tumor diagnosis, the higher diagnostic efficiency achieved by the CET1WI+DWI+FLAIR combination suggests that incorporating multiple MR sequences is necessary for optimal diagnostic performance (28). Among baseline clinical features, age, enhancement, and tumor cystic changes were significantly associated with differential tumor diagnosis in both the training and validation cohorts. Age emerged as the most influential clinical feature, both in the nomogram and SHAP analysis, and is easily obtainable preoperatively, making it a common inclusion in joint models (29, 30). Tumor enhancement and cystic characteristics, as baseline imaging features, are derived from preoperative imaging. Typically, ADG presents with ring-shaped enhancement and frequent cystic necrosis, while PCNSL is characterized by solid enhancement and relatively rare cystic changes (31).
This difference may relate to the underlying pathophysiological mechanisms: the rapid growth of ADG can lead to ischemia and hypoxia-induced necrosis, completely destroying the blood-brain barrier in the necrotic area and resulting in non-enhancement at the center (1, 3, 32, 33). Around the periphery, viable tumor cells exhibit active proliferation and neovascularization, allowing contrast agents to leak in areas of increased vascular permeability, thus creating ring-shaped enhancement (34).
This study employed seven machine learning methods, with the SVM-based models demonstrating the highest diagnostic efficiency for both the single-sequence CET1WI model and the multi-sequence CET1WI+FLAIR+DWI fusion model. Despite these satisfactory results, visual demonstration was initially lacking. Therefore, we utilized SHAP analysis to elucidate how each feature influences model predictions by visualizing the contribution value of each feature. SHAP values quantify the contribution of each feature to the model’s predictions, with positive values suggesting an increased likelihood of the predicted outcome and negative values a decreased likelihood (35). This study aims to address challenges of model interpretability and explainability. Interpretability refers to how well humans can understand or intuitively grasp a model’s output, while explainability pertains to the underlying mechanisms and logic of a machine learning system (26). We used SHAP to identify the two most influential feature classes within the CET1WI sequence: GLSZM and first-order features. GLSZM highlights homogeneous regions and tumor heterogeneity on a regional scale, reflecting the complexity and variation within the tumor, potentially linked to tumor aggressiveness or the immune microenvironment (36). First-order features, derived from gray-level values of tumor images, encompass various first-order statistics reflecting intensity distribution within the tumor and its internal heterogeneity. Furthermore, localized interpretation of individual patient assessments can be accomplished using SHAP force plots, which are faster and easier to use than nomograms (23). Clinicians can directly compare an individual patient’s SHAP values to baseline values. If an output SHAP value exceeds the baseline, a clinician may classify the patient as having PCNSL.
SHAP force plots also visually demonstrate how features influence a patient’s assessment (25). Arrow color indicates the direction of the impact (red marks features that push the prediction higher, toward PCNSL), and arrow length signifies the magnitude of the feature’s contribution. These tools facilitate a nuanced understanding of feature roles in patient-specific assessments (23–26).
Previous research aimed at differentiating PCNSL from ADG has employed various machine learning models based on single-sequence MRI data, yielding inconsistent results (18). The emergence of multi-parametric MRI studies confirmed that models combining multiple sequences exhibit higher diagnostic efficiency than those relying on single-sequence data (15). For instance, one such study utilized a machine learning model incorporating multiple MRI sequences, achieving optimal diagnostic performance with a CET1WI+ADC combined model, displaying AUC values of 0.943 for the training and 0.935 for the validation cohort (19). Another study using cross-validation across multiple classifiers reported the highest diagnostic efficiency with an ADC+FLAIR+CET1WI model, achieving an AUC of 0.977 (37). While previous scholars have developed models using single or multiple machine learning methods to identify PCNSL and ADG through MRI data, these models often overlooked the integration and validation of clinical data alongside radiomic features (15, 19, 37). They lacked the ability to differentiate and assess the significance of clinical information in conjunction with imaging features for diagnosis. The incorporation of such features, alongside visualizing the model, enhances model completeness and provides more practical diagnostic guidance for clinicians. Our study addresses these gaps by using seven machine learning methods to develop and validate models using multi-parametric MRI sequences. We integrated clinical and imaging data to ensure comprehensive model development. Subsequently, we visualized the model using SHAP analysis, allowing for a nuanced understanding of feature influence, thus advancing the model’s applicability in clinical settings. Compared to previous studies, our approach provides a more holistic tool for clinicians seeking to differentiate PCNSL from ADG.
Despite the valuable insights gained from this study, several limitations remain. First, although data were sourced from two centers, the patient cohort size was relatively limited. Future research should focus on developing predictive models based on larger sample sizes to enhance robustness and generalizability. Second, as a retrospective study, this research may be subject to selection bias. To address this, future studies should incorporate data from multiple centers to facilitate prospective analyses, thereby improving the reliability and validity of the model. Third, the study predominantly relied on machine learning models, necessitating manual delineation of ROIs, which is time-consuming. Future efforts should prioritize the development of automatic or semi-automatic tumor segmentation techniques to streamline and expedite the process, enhancing efficiency and practicality in clinical settings. Finally, this study performed model selection and tuning within a single 10-fold cross-validation framework (without nesting), which may lead to slightly optimistic performance estimates. In future research, we will strive to adopt nested cross-validation to provide more rigorous performance estimates.
This study successfully developed and validated multiple machine learning models based on multiparametric MRI sequences, integrating radiomics and clinical indicators to construct interpretable models that can provide a reliable and noninvasive tool for preoperative differentiation of PCNSL and ADG.
Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
Ethics statement
The studies involving humans were approved by Aerospace Center Hospital, Beijing, China. The studies were conducted in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required from the participants or the participants’ legal guardians/next of kin in accordance with the national legislation and institutional requirements.
Author contributions
YL: Conceptualization, Investigation, Writing – original draft, Writing – review & editing. HJ: Methodology, Writing – review & editing. TW: Data curation, Formal Analysis, Writing – review & editing. DB: Methodology, Software, Writing – review & editing. ZW: Investigation, Supervision, Writing – review & editing.
Funding
The author(s) declared that financial support was not received for this work and/or its publication.
Conflict of interest
The authors declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that generative AI was not used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fonc.2025.1713099/full#supplementary-material
Abbreviations
PCNSL, Primary central nervous system lymphoma; ADG, Adult-type diffuse glioma; SHAP, SHapley Additive exPlanations; WHO, World Health Organization; CNS, Central Nervous System; ROI, Region of interest; ICC, Intraclass correlation coefficient; LR, Logistic Regression; SVM, Support Vector Machine; RF, Random Forest; KNN, K-Nearest Neighbors; MLP, Multilayer Perceptron; AUC, Area under the curve; NRI, Net Reclassification Improvement; IDI, Integrated Discrimination Improvement; HL, Hosmer-Lemeshow; DCA, Decision curve analysis; MRI, Magnetic Resonance Imaging; CET1WI, Contrast-Enhanced T1-Weighted Imaging; FLAIR, Fluid Attenuated Inversion Recovery; DWI, Diffusion-Weighted Imaging; ADC, Apparent Diffusion Coefficient; GLRLM, Gray Level Run Length Matrix; GLDM, Gray Level Dependence Matrix; GLSZM, Gray Level Size Zone Matrix; NGTDM, Neighbouring Gray Tone Difference Matrix; GLCM, Gray Level Co-occurrence Matrix; Rad-Clinic, Radiomic-Clinical.
References
1. Louis DN, Perry A, Wesseling P, Brat DJ, Cree IA, Figarella-Branger D, et al. The 2021 WHO classification of tumors of the central nervous system: a summary. Neuro Oncol. (2021) 23:1231–51. doi: 10.1093/neuonc/noab106
2. Wang P, He K, Ma X, Weng L, Wu Q, Zhao P, et al. Applying MAP-MRI to identify the WHO grade and main genetic features of adult-type diffuse gliomas: A comparison of three diffusion-weighted MRI models. Acad Radiol. (2023) 30:1238–46. doi: 10.1016/j.acra.2022.10.009
3. Ostrom QT, Cioffi G, Waite K, Kruchko C, and Barnholtz-Sloan JS. CBTRUS statistical report: primary brain and other central nervous system tumors diagnosed in the United States in 2014-2018. Neuro Oncol. (2021) 23:iii1–105. doi: 10.1093/neuonc/noab200
4. Jiang T, Nam DH, Ram Z, Poon WS, Wang J, Boldbaatar D, et al. Clinical practice guidelines for the management of adult diffuse gliomas. Cancer Lett. (2021) 499:60–72. doi: 10.1016/j.canlet.2020.10.050
5. Barajas RF, Politi LS, Anzalone N, Schöder H, Fox CP, Boxerman JL, et al. Consensus recommendations for MRI and PET imaging of primary central nervous system lymphoma: guideline statement from the International Primary CNS Lymphoma Collaborative Group (IPCG). Neuro Oncol. (2021) 23:1056–71. doi: 10.1093/neuonc/noab020
6. Xie X, Yang L, Zhao F, Wang D, Zhang H, He X, et al. A deep learning model combining multimodal radiomics, clinical and imaging features for differentiating ocular adnexal lymphoma from idiopathic orbital inflammation. Eur Radiol. (2022) 32:6922–32. doi: 10.1007/s00330-022-08857-6
7. Mayerhoefer ME, Umutlu L, and Schöder H. Functional imaging using radiomic features in assessment of lymphoma. Methods. (2021) 188:105–11. doi: 10.1016/j.ymeth.2020.06.020
8. Smits M. MRI biomarkers in neuro-oncology. Nat Rev Neurol. (2021) 17:486–500. doi: 10.1038/s41582-021-00510-y
9. Berger TR, Wen PY, Lang-Orsini M, and Chukwueke UN. World health organization 2021 classification of central nervous system tumors and implications for therapy for adult-type gliomas: A review. JAMA Oncol. (2022) 8:1493–501. doi: 10.1001/jamaoncol.2022.2844
10. Albano D, Cuocolo R, Patti C, Ugga L, Chianca V, Tarantino V, et al. Whole-body MRI radiomics model to predict relapsed/refractory Hodgkin Lymphoma: A preliminary study. Magn Reson Imaging. (2022) 86:55–60. doi: 10.1016/j.mri.2021.11.005
11. Xu J, Li J, Wang T, Luo X, Zhu Z, Wang Y, et al. Predicting treatment response and prognosis of immune checkpoint inhibitors-based combination therapy in advanced hepatocellular carcinoma using a longitudinal CT-based radiomics model: a multicenter study. BMC Cancer. (2025) 25:602. doi: 10.1186/s12885-025-13978-4
12. Van der Voort SR, Incekara F, Wijnenga MMJ, Kapsas G, Gahrmann R, Schouten JW, et al. Combined molecular subtyping, grading, and segmentation of glioma using multi-task deep learning. Neuro Oncol. (2023) 25:279–89. doi: 10.1093/neuonc/noac166
13. Lv K, Chen H, Cao X, Du P, Chen J, Liu X, et al. Development and validation of a machine learning algorithm for predicting diffuse midline glioma, H3 K27-altered, H3 K27 wild-type high-grade glioma, and primary CNS lymphoma of the brain midline in adults. J Neurosurg. (2022) 139:393–401. doi: 10.3171/2022.11.JNS221544
14. Lu G, Zhang Y, Wang W, Miao L, and Mou W. Machine learning and deep learning CT-based models for predicting the primary central nervous system lymphoma and glioma types: A multicenter retrospective study. Front Neurol. (2022) 13:905227. doi: 10.3389/fneur.2022.905227
15. Cassinelli Petersen GI, Shatalov J, Verma T, Brim WR, Subramanian H, Brackett A, et al. Machine learning in differentiating gliomas from primary CNS lymphomas: A systematic review, reporting quality, and risk of bias assessment. AJNR Am J Neuroradiol. (2022) 43:526–33. doi: 10.3174/ajnr.A7473
16. Mo H, Liang W, Huang Z, Li X, Xiao X, Liu H, et al. Machine learning-based multiparametric magnetic resonance imaging radiomics model for distinguishing central neurocytoma from glioma of lateral ventricle. Eur Radiol. (2023) 33:4259–69. doi: 10.1007/s00330-022-09319-9
17. He J, Ren J, Niu G, Liu A, Wu Q, Xie S, et al. Multiparametric MR radiomics in brain glioma: models comparation to predict biomarker status. BMC Med Imaging. (2022) 22:137. doi: 10.1186/s12880-022-00865-8
18. Suh HB, Choi YS, Bae S, Ahn SS, Chang JH, Kang SG, et al. Primary central nervous system lymphoma and atypical glioblastoma: Differentiation using radiomics approach. Eur Radiol. (2018) 28:3832–9. doi: 10.1007/s00330-018-5368-4
19. Zhao LM, Hu R, Xie FF, Clay Kargilis D, Imami M, Yang S, et al. Radiomic-based MRI for classification of solitary brain metastases subtypes from primary lymphoma of the central nervous system. J Magn Reson Imaging. (2023) 57:227–35. doi: 10.1002/jmri.28276
20. Gao A, Zhang H, Yan X, Wang S, Chen Q, Gao E, et al. Whole-tumor histogram analysis of multiple diffusion metrics for glioma genotyping. Radiology. (2022) 302:652–61. doi: 10.1148/radiol.210820
21. Chen H, Lundberg SM, and Lee SI. Explaining a series of models by propagating Shapley values. Nat Commun. (2022) 13:4512. doi: 10.1038/s41467-022-31384-3
22. Rodríguez-Pérez R and Bajorath J. Interpretation of compound activity predictions from complex machine learning models using local approximations and shapley values. J Med Chem. (2020) 63:8761–77. doi: 10.1021/acs.jmedchem.9b01101
23. Wang Y, Zhang L, Jiang Y, Cheng X, He W, Yu H, et al. Multiparametric magnetic resonance imaging (MRI)-based radiomics model explained by the Shapley Additive exPlanations (SHAP) method for predicting complete response to neoadjuvant chemoradiotherapy in locally advanced rectal cancer: a multicenter retrospective study. Quant Imaging Med Surg. (2024) 14:4617–34. doi: 10.21037/qims-24-7
24. Zhang R, Hong M, Cai H, Liang Y, Chen X, Liu Z, et al. Predicting the pathological invasiveness in patients with a solitary pulmonary nodule via Shapley additive explanations interpretation of a tree-based machine learning radiomics model: a multicenter study. Quant Imaging Med Surg. (2023) 13:7828–41. doi: 10.21037/qims-23-615
25. Wang Y, Lang J, Zuo JZ, Dong Y, Hu Z, Xu X, et al. The radiomic-clinical model using the SHAP method for assessing the treatment response of whole-brain radiotherapy: a multicentric study. Eur Radiol. (2022) 32:8737–47. doi: 10.1007/s00330-022-08887-0
26. Ye JY, Fang P, Peng ZP, Huang XT, Xie JZ, Yin XY, et al. A radiomics-based interpretable model to predict the pathological grade of pancreatic neuroendocrine tumors. Eur Radiol. (2024) 34:1994–2005. doi: 10.1007/s00330-023-10186-1
27. Louis DN, Perry A, Reifenberger G, Brat DJ, Cree IA, Figarella-Branger D, et al. The 2016 world health organization classification of tumors of the central nervous system: a summary. Acta Neuropathol. (2016) 131:803–20. doi: 10.1007/s00401-016-1545-1
28. Nguyen JC, Davis KW, Arkader A, Guariento A, Sze A, Hong S, et al. Pre-treatment MRI of leukaemia and lymphoma in children: are there differences in marrow replacement patterns on T1-weighted images? Eur Radiol. (2021) 31:7992–8000. doi: 10.1007/s00330-021-07814-z
29. Dolecek TA, Propp JM, Stroup NE, and Kruchko C. CBTRUS statistical report: primary brain and central nervous system tumors diagnosed in the United States in 2005-2009. Neuro Oncol. (2012) 14:v1–49. doi: 10.1093/neuonc/nos218
30. Haldorsen IS, Espeland A, and Larsson EM. Central nervous system lymphoma: characteristic findings on traditional and advanced imaging. AJNR Am J Neuroradiol. (2011) 32:984–92. doi: 10.3174/ajnr.A2171
31. Guha A, Goda JS, Dasgupta A, Mahajan A, Halder S, Gawde J, et al. Classifying primary central nervous system lymphoma from glioblastoma using deep learning and radiomics based machine learning approach - a systematic review and meta-analysis. Front Oncol. (2022) 12:884173. doi: 10.3389/fonc.2022.884173
32. Kang KM, Choi SH, Chul-Kee P, Kim TM, Park SH, Lee JH, et al. Differentiation between glioblastoma and primary CNS lymphoma: application of DCE-MRI parameters based on arterial input function obtained from DSC-MRI. Eur Radiol. (2021) 31:9098–109. doi: 10.1007/s00330-021-08044-z
33. Alcaide-Leon P, Dufort P, Geraldo AF, Alshafai L, Maralani PJ, Spears J, et al. Differentiation of enhancing glioma and primary central nervous system lymphoma by texture-based machine learning. AJNR Am J Neuroradiol. (2017) 38:1145–50. doi: 10.3174/ajnr.A5173
34. Schaff LR and Mellinghoff IK. Glioblastoma and other primary brain malignancies in adults: A review. JAMA. (2023) 329:574–87. doi: 10.1001/jama.2023.0023
35. Wang P, Xie S, Wu Q, Weng L, Hao Z, Yuan P, et al. Model incorporating multiple diffusion MRI features: development and validation of a radiomics-based model to predict adult-type diffuse gliomas grade. Eur Radiol. (2023) 33:8809–20. doi: 10.1007/s00330-023-09861-0
36. Xia W, Hu B, Li H, Geng C, Wu Q, Yang L, et al. Multiparametric-MRI-based radiomics model for differentiating primary central nervous system lymphoma from glioblastoma: development and cross-vendor validation. J Magn Reson Imaging. (2021) 53:242–50. doi: 10.1002/jmri.27344
37. Bathla G, Priya S, Liu Y, Geng C, Wu Q, Yang L, et al. Radiomics-based differentiation between glioblastoma and primary central nervous system lymphoma: a comparison of diagnostic performance across different MRI sequences and machine learning techniques. Eur Radiol. (2021) 31:8703–13. doi: 10.1007/s00330-021-07845-6
Keywords: adult-type diffuse glioma, interpretability, machine learning, MRI, primary central nervous system lymphoma
Citation: Liang Y, Hu J, Wu T, Bai D and Wang Z (2026) Development and validation of multiple machine learning algorithms for differentiating primary central nervous system lymphoma from adult-type diffuse glioma: an interpretable and multicenter study. Front. Oncol. 15:1713099. doi: 10.3389/fonc.2025.1713099
Received: 25 September 2025; Revised: 28 November 2025; Accepted: 15 December 2025;
Published: 07 January 2026.
Edited by:
Giorgio Carrabba, University of Milano-Bicocca, Italy
Reviewed by:
Zhiqiang Zhang, Nanjing University School of Medicine, China
Shailesh Nayak S, Manipal Academy of Higher Education, India
Copyright © 2026 Liang, Hu, Wu, Bai and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Zhiqun Wang, wangzhiqun@126.com
Tianhui Wu3