Predictive Role of the Apparent Diffusion Coefficient and MRI Morphologic Features on IDH Status in Patients With Diffuse Glioma: A Retrospective Cross-Sectional Study

Purpose: To evaluate whether MRI parameters can predict isocitrate dehydrogenase (IDH) status in patients with grade II~IV glioma diagnosed according to the 2016 World Health Organization (WHO) classification.

Materials and Methods: One hundred and seventy-six patients with confirmed WHO grade II~IV glioma were retrospectively investigated as the study set, including lower-grade glioma (LGG; WHO grade II, n = 64; WHO grade III, n = 38) and glioblastoma (GBM; WHO grade IV, n = 74). The minimum apparent diffusion coefficient in the tumor (ADCmin), the ADC of the contralateral normal-appearing white matter (ADCn), and their ratio (rADC, ADCmin to ADCn) were calculated. Intraclass correlation coefficient (ICC) analysis was used to evaluate interobserver and intraobserver agreement for the ADC measurements, and Cohen's kappa analysis was used to evaluate interobserver agreement for the morphologic categories. The nonparametric Kruskal-Wallis test was used to assess the relationship between ADC measurements and glioma subtypes. In univariable analysis, a variable was chosen as a predictor if its difference between groups was significant (P<0.05) or if an image feature had high consistency (ICC >0.8; κ >0.6). Model performance, measured by the area under the receiver operating characteristic curve (AUC), was compared across several machine learning models: logistic regression, support vector machine, Naive Bayes, and an ensemble model. Five evaluation indicators were used to compare the models. The optimal model was selected as the final model and used to predict IDH status in a subsequent test set of 40 patients with glioma. DeLong's test was used to compare AUCs.

Results: In the study set, six variables (rADC, age, enhancement, calcification, hemorrhage, and cystic change) were selected for the machine learning models. Logistic regression performed better than the other models. Two predictive models, model 1 (including all predictor variables) and model 2 (excluding calcification), classified IDH status with AUCs of 0.897 and 0.890, respectively. The model performed equally well in the test set, indicating the effectiveness of the trained classifier. In subgroup analysis, the model predicted the IDH status of LGG and GBM with accuracies of 84.3% (AUC = 0.873) and 85.1% (AUC = 0.862), respectively, in the study set, and 70.0% (AUC = 0.762) and 70.0% (AUC = 0.833), respectively, in the test set.

Conclusion: Using machine learning algorithms, IDH-mutant and IDH-wildtype adult diffuse gliomas were accurately distinguished on the basis of noninvasive MR imaging characteristics, including ADC values and tumor morphologic features, which are available on most clinical workstations.

INTRODUCTION
Diffusely infiltrating cerebral gliomas are the second most common type of primary central nervous system (CNS) tumor, after meningiomas. According to the 2016 World Health Organization (WHO) classification of CNS tumors, adult diffuse gliomas include astrocytic tumors, oligodendrogliomas, and glioblastomas (WHO grade II~IV) (1). These tumors account for approximately 22% of all CNS tumors. In the United States, more than 16,000 cases of adult diffuse glioma are reported each year, an incidence of approximately 5.13 per 100,000 people. Glioblastoma (GBM) is the most common malignant CNS tumor, accounting for approximately 14.6% of all CNS tumors and 48.3% of all malignant CNS tumors, with 11,833 cases reported annually in the U.S. (2,3). Because these neuroepithelial tumors are heterogeneous, they differ in clinical characteristics, biological behavior, and histopathology, as well as in treatment and prognosis.
Recently, isocitrate dehydrogenase (IDH) status and other molecular subtypes have been reported as major prognostic factors and molecular diagnostic criteria for glioma. Thus, noninvasive detection of molecular subtypes before surgery is important for predicting outcome and choosing the best therapy. Previous studies have shown that IDH-wildtype lower-grade glioma (LGG) and glioblastoma (GBM) have similar molecular profiles and prognoses, whereas IDH-mutant status confers longer overall survival than IDH-wildtype status (4). In addition, IDH-wildtype anaplastic gliomas (grade III) have a worse prognosis than IDH-mutant glioblastomas (grade IV) (5). IDH mutation status has been integrated into the 2016 WHO Classification of Tumors of the Central Nervous System, revised 4th edition (1). Furthermore, the extent of surgical resection has been reported to affect survival differently in patients with lower-grade glioma (grades II and III) depending on molecular subtype (6). Based on these findings (5-7), accurate preoperative prediction of IDH status is needed to guide the development of appropriate treatment plans.
Diffusion-weighted imaging (DWI) is a practical, widely used clinical technique that measures the diffusion of water molecules in tissue (8). A meta-analysis showed that quantitative measurement of the apparent diffusion coefficient (ADC) can grade gliomas with high accuracy (9), and our previous study demonstrated that the minimum ADC (ADCmin) can predict the grade of neuroepithelial tumors (10). Prior studies (11,12) have shown that lesion characteristics, such as location, internal structure, and enhancement pattern, differ among the genetic subtypes of glioma. In addition, machine learning has been applied in many medical fields, including medical image interpretation and the prediction of disease development and treatment response (13,14). An advantage of machine learning is that it makes few assumptions about the input variables and their relationships with the output; it is a data-driven approach that does not rely on rule-based programming. Therefore, our study, based on the 2016 WHO classification criteria, applied machine learning methods to evaluate the value of clinically obtainable MRI features in predicting the IDH status of adults with diffuse grade II~IV glioma.

Patient Cohort
This retrospective study was approved by the Institutional Ethics Committee of the Chinese PLA General Hospital, which waived the requirement for written informed consent. From August 2015 to July 2020, two radiologists (JZ and HP, with 10 and 13 years of experience, respectively) consecutively identified patients with WHO grade II~IV glioma who underwent brain MRI, using the hospital's local picture archiving and communication system (PACS). Patients collected from August 2015 to December 2019 formed the study set, and another 40 patients collected from January 2020 to June 2020 formed the test set. The inclusion criteria were (a) a confirmed histologic diagnosis of WHO grade II-IV glioma; (b) conclusive histopathological and immunohistochemical staining results; and (c) brain MRI performed before neurosurgical treatment, within 6 months for WHO grade II/III tumors and within 5 weeks for WHO grade IV tumors. The exclusion criteria were (a) substandard MRI quality, including an incomplete MRI protocol, inability to compute the ADC map, or obvious artifacts; (b) tumors other than WHO grade II~IV adult glioma; (c) incomplete or ambiguous histologic results; and (d) previous treatment for glioma, such as radiotherapy, chemotherapy, or immunotherapy. The flow chart of the enrolled patients (including the study set and test set) is provided in Figure 1.

MRI Examination
All enrolled patients underwent 3.0 T MRI. The MRI protocols included axial T2-weighted, axial or coronal T2 FLAIR, axial T1-weighted, fat-suppressed contrast-enhanced T1-weighted (axial, coronal, and sagittal), susceptibility-weighted imaging (SWI), and diffusion-weighted imaging. DWI was performed with b values of 0 and 1000 s/mm² and was used to derive the ADC maps. Our institution is a general hospital; although the MRI scans were acquired in several examination rooms, all were performed on systems from the same vendor (GE Healthcare, Milwaukee, USA). The MRI machines and protocols used are provided in Supplementary Table 1.
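With two b values, an ADC map is conventionally computed voxel by voxel from a monoexponential model, S_b = S_0·exp(−b·ADC), which gives ADC = ln(S_0/S_b)/b. The sketch below assumes that model; the vendor software's actual implementation (noise handling, averaging of diffusion directions) is not described in the text, and the function name and signal values are hypothetical.

```python
import math


def adc_two_point(s0: float, sb: float, b: float = 1000.0) -> float:
    """Monoexponential ADC (mm^2/s) from a b=0 signal s0 and a b-weighted signal sb.

    Assumes S_b = S_0 * exp(-b * ADC) with b in s/mm^2.
    """
    if s0 <= 0 or sb <= 0:
        raise ValueError("signal intensities must be positive")
    return math.log(s0 / sb) / b


# Hypothetical voxel: signal falls from 1000 to 450 between b=0 and b=1000 s/mm^2.
adc = adc_two_point(1000.0, 450.0)
print(f"{adc * 1e3:.3f} x 10^-3 mm^2/s")  # 0.799 x 10^-3 mm^2/s
```

Because the same b value is used for every patient, ADC differences between tumors reflect differences in signal attenuation rather than acquisition settings, which is one reason ratio measures such as rADC remain comparable across examination rooms.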

Histopathologic Analysis
All tumors were surgically resected, and the specimens were fixed and embedded in paraffin blocks. The neuropathology group then applied the 2016 WHO glioma classification, based on gross pathology and immunohistochemical staining, to establish the final diagnosis.

ADC Quantification
Interobserver and intraobserver agreement for ADC was assessed from measurements made by two blinded radiologists (JZ and HP, with 10 and 13 years of experience, respectively, both board-certified). To assess intraobserver reproducibility, the first observer delineated the regions of interest (ROIs) twice within 1 week, following the same procedure each time. In addition, the second observer independently delineated the ROIs once, and interobserver agreement was assessed by comparing these ADC values with those from the first observer's first delineation.
Three non-overlapping ROIs (30-40 mm²) were placed in the tumor regions with the visually lowest signal on the ADC maps, avoiding hemorrhagic, cystic, and necrotic portions and calcifications that might bias the measurements. The minimum ADC (ADCmin) was then defined as the average value of these lowest-ADC ROIs, as in Maynard et al. (11) and Xing et al. (12). Subsequently, using the same method, an ROI was placed in the contralateral centrum semiovale (8,11), and the ADC value within it was defined as ADCn. Thus, there were four ROIs per patient. Finally, the rADC (ADCmin to ADCn ratio) was calculated, yielding three ADC parameters (ADCmin, ADCn, and rADC) per patient.
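The three ADC parameters reduce to simple arithmetic on ROI mean values. The sketch below reads the definition above as "ADCmin is the mean of the three tumor ROI means"; that reading, the function name `adc_parameters`, and the input values are assumptions for illustration, not data from the study.

```python
from statistics import mean


def adc_parameters(tumor_roi_means, contralateral_roi_mean):
    """Return (ADCmin, ADCn, rADC) from per-ROI mean ADC values.

    tumor_roi_means: mean ADC of each ROI placed in the visually lowest-ADC
        tumor regions (three per patient in the protocol described above).
    contralateral_roi_mean: mean ADC in the contralateral centrum semiovale.
    Assumption: ADCmin is the average of the tumor ROI means.
    """
    if len(tumor_roi_means) == 0:
        raise ValueError("at least one tumor ROI is required")
    adc_min = mean(tumor_roi_means)
    adc_n = contralateral_roi_mean
    return adc_min, adc_n, adc_min / adc_n  # rADC is unitless


# Hypothetical ROI means in units of 10^-3 mm^2/s.
adc_min, adc_n, r_adc = adc_parameters([0.92, 0.88, 0.95], 0.75)
print(round(adc_min, 3), adc_n, round(r_adc, 3))
```

Dividing by ADCn cancels any scanner-wide scaling of ADC values, which is the rationale for using rADC as a normalized parameter.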
In the test set (n = 40), all ADC values were obtained by two board-certified radiologists (Y-YC and Y-LW, with 3 and 18 years of experience, respectively) using the method described above. Examples of ROI delineations are shown in Figure 2.

Morphologic Assessment
Two board-certified radiologists (JZ and HP, with 10 and 13 years of experience, respectively) independently evaluated the 176 MRI datasets in this study over a 1-month period while blinded to the pathologic results.
The selection and evaluation of the tumor morphology were performed according to previous publications (11,12). (a) Tumor location, which was specified by the center of the lesion, was divided into 4 groups: frontal lobe, other lobes (including parietal lobe, temporal lobe and occipital lobe), thalamus or brainstem, and cerebellum. (b) The maximum tumor diameter was measured by reference to the T2-weighted images, FLAIR images and contrast-enhanced T1-weighted images. (c) Contrast enhancement was categorized into 3 groups: nonenhancement, patchy enhancement, and rim enhancement. (d) Calcification and hemorrhage were observed and evaluated on T1-weighted imaging, susceptibility-weighted imaging, and CT, as available.
(e) Cystic change and central necrosis were defined as regions of free-liquid signal intensity without enhancement. (f) The T2-FLAIR mismatch sign, which previous studies have considered specific (15,16), was defined as nearly homogeneous hyperintensity on T2-weighted images with relatively low intensity and a peripheral hyperintense rim on FLAIR images. Figures 3 and 4 show examples of different morphologic characteristics of gliomas on MRI in the study set.

Statistical Analysis
Statistical analyses were performed using SPSS (version 26.0) and Python (version 3.8). Interobserver and intraobserver agreement for ADC measurements was evaluated by intraclass correlation coefficient analysis using a two-way random-effects model. Interobserver agreement for morphologic categories was evaluated by Cohen's kappa analysis. Agreement values were interpreted as follows: 0.20 or less, slight agreement; 0.21-0.40, fair agreement; 0.41-0.60, moderate agreement; 0.61-0.80, substantial agreement; and 0.81-1.00, almost perfect agreement.
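The agreement statistics above can be sketched in a few lines of dependency-free Python. The text does not say which ICC form SPSS was asked for, so the sketch assumes the single-measure, absolute-agreement form of the two-way random-effects model (ICC(2,1) in Shrout and Fleiss notation) and unweighted Cohen's kappa; the function names and example data are hypothetical, and the study's SPSS output remains authoritative for its reported values.

```python
from collections import Counter


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    ratings: one row per subject, each row holding the k raters' values.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be paired and non-empty")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (observed - expected) / (1 - expected)


def interpret_agreement(value):
    """Verbal scale for ICC/kappa used in this study (upper bounds inclusive)."""
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if value <= upper:
            return label
    return "almost perfect"


# Illustrative data: 6 subjects rated by 4 raters (not study data).
example = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
           [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_2_1(example), 2))  # 0.29
print(interpret_agreement(0.396), interpret_agreement(0.719))  # fair substantial
```

The absolute-agreement ICC penalizes a systematic offset between raters (the `ms_cols` term), whereas a consistency ICC would not; which form is appropriate depends on whether such offsets matter for the measurement.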
Differences in ADC values among IDH subtype groups were tested using the nonparametric Kruskal-Wallis test. The relationship between morphologic features and glioma subtypes was analyzed using the chi-squared test. P<0.05 was considered statistically significant.
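Both tests can be written without external dependencies; the stdlib-only sketch below is an illustration, not the SPSS procedure used in the study. It assumes the tie-corrected Kruskal-Wallis H statistic, a Pearson chi-squared test without continuity correction (the text does not say whether a correction was applied), and p-values from closed-form chi-squared tail probabilities for integer degrees of freedom. With two groups (IDH-mutant versus IDH-wildtype), the Kruskal-Wallis test has one degree of freedom. Function names and toy inputs are hypothetical.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc term plus a finite series in odd powers of sqrt(x).
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H and p-value (chi-squared, df = groups - 1)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n ** 3 - n)
    return h, chi2_sf(h, len(groups) - 1)


def chi2_independence(table):
    """Pearson chi-squared test of independence (no continuity correction)."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_tot) - 1)
    return stat, chi2_sf(stat, df)


# Toy values (not study data): rADC in two hypothetical IDH groups.
print(kruskal_wallis([1.1, 1.3, 1.2], [0.8, 0.9, 0.7]))
```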
In the univariable analysis, if the differences in a variable were significant (P<0.05) or an image feature had high consistency (ICC >0.8; κ >0.6), the variable was chosen as a predictor for multivariable logistic regression to predict the IDH subtype of glioma.

Model Construction
To develop prediction models, we implemented the following machine learning methods, which are among the most commonly used for classifying gliomas (17-20): logistic regression, support vector machine (SVM), Naive Bayes (NB), and an ensemble model (random forest + eXtreme Gradient Boosting). Logistic regression estimates the regression coefficients by maximum likelihood and predicts the probability of a binary outcome. SVM is a supervised learning algorithm that finds a maximum-margin decision boundary, possibly in a high-dimensional feature space, to solve binary classification problems (21). The ensemble combined random forest and eXtreme Gradient Boosting. Random forest is an ensemble algorithm that combines many decision trees by voting to classify data (22). eXtreme Gradient Boosting is an optimized implementation of gradient boosting that combines many weak classifiers into a strong classifier to improve predictive power (21,23). We also evaluated NB, an efficient classifier based on Bayes' theorem that assumes conditional independence of the features given the class (24). The construction process for each model is provided in the Supplementary Data.
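To make "estimates the regression coefficients by maximum likelihood" concrete, the dependency-free sketch below fits a logistic regression by batch gradient ascent on the mean log-likelihood. This is a swapped-in, deliberately simple optimizer (statistical packages typically use Newton-type solvers), the study's actual settings are described only in its Supplementary Data, and the function names and toy data are hypothetical. Features are assumed to be roughly standardized.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, row):
    """P(y = 1 | row) for weights w = [intercept, coef_1, ..., coef_p]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)))


def fit_logistic(X, y, lr=0.5, n_iter=5000):
    """Maximum-likelihood logistic regression via batch gradient ascent.

    X: feature rows (an intercept is added internally); y: 0/1 labels.
    Ascends the mean log-likelihood, whose gradient is mean((y - p) * x).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for row, label in zip(X, y):
            err = label - predict_proba(w, row)
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


# Toy, non-separable data with one standardized feature (not study data).
X_toy = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
y_toy = [0, 0, 1, 0, 1, 0, 1, 1]
w = fit_logistic(X_toy, y_toy)
print([round(v, 3) for v in w])
```

At convergence the score equations, sum of (y − p)·x over patients, are zero for every coefficient, which is exactly the maximum-likelihood condition; non-separable data are used because the likelihood has no finite maximum when the classes are perfectly separated.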
To evaluate the predictive accuracy of these machine learning models and select the most suitable one, we calculated and compared sensitivity, specificity, accuracy, the area under the receiver operating characteristic curve (AUC), and the F1 score (25). The best-performing model was then chosen as the final model to estimate the IDH subtype probability in the test set. Because SWI and CT, which help to detect calcification, may be unavailable in some clinical settings, an alternative model (model 2) was developed in which calcification status was excluded from the multivariable logistic regression model. Subgroup analysis was also performed to validate the final model in LGG and GBM. DeLong's test was used to compare AUCs (26).
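The five indicators have standard definitions, sketched below in stdlib Python. The text does not say which class was treated as positive or which probability threshold was used, so the sketch assumes IDH-mutant is coded as 1 and a 0.5 threshold for the classification-based metrics; the AUC uses its Mann-Whitney interpretation and needs no threshold. Function names and inputs are hypothetical.

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """Sensitivity, specificity, accuracy, and F1 at a probability threshold.

    y_true uses 1 for the positive class; y_prob holds predicted P(positive).
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": (tp + tn) / len(pairs), "f1": f1}


def roc_auc(y_true, y_prob):
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0]
y_prob = [0.9, 0.4, 0.6, 0.1]
print(classification_metrics(y_true, y_prob), roc_auc(y_true, y_prob))
```

Accuracy and F1 change with the threshold while the AUC does not, which is why a model can show a moderate accuracy alongside a higher AUC, as in the test-set subgroup results.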

Patient Demographic Characteristics
The flow chart of the enrolled patients (including the study and test sets) is provided in Figure 1. Patients were excluded for the following reasons: age under 18 years (n=11), insufficient MRI scan quality (n=54), tumors other than WHO grade II-IV glioma (n=19, including 8 WHO grade I tumors and 11 diffuse midline gliomas), ambiguous histology results (n=28), an interval from MRI to surgery longer than 6 months for WHO grade II/III or 5 weeks for WHO grade IV tumors (n=30), and previous treatment for glioma (n=7). A total of 176 patients (109 male and 67 female; mean age, 46.5 years; range, 21-74 years) with lower-grade glioma (n=102) or glioblastoma (n=74) were ultimately included in the study set. Glioma IDH subtype was not associated with sex, but patients with IDH-wildtype tumors tended to be older than those with IDH-mutant tumors, especially among GBMs. Patient information, morphologic features, and IDH subgroups are summarized in Table 1 and Supplementary Table 2. For the T2-FLAIR mismatch sign, interobserver agreement was fair (κ=0.396, P<0.01). Cohen's kappa results for the morphologic categories are provided in Supplementary Table 3.

ADC Quantification
Interobserver and intraobserver reproducibility was substantial to almost perfect for all ADC parameters (ICC=0.80-0.95), suggesting no systematic difference between the observers. The rADC differed significantly between IDH-mutant and IDH-wildtype tumors in WHO grade II~IV gliomas overall and in the LGG subgroup (P<0.05), but not in the GBM subgroup (P=0.126). The results are shown in Figure 5. Nonparametric testing (Kruskal-Wallis analysis of variance) revealed an association between ADC value and IDH status (P<0.001). The ICCs for the ADC values are provided in Supplementary Tables 4 and 5.

Predictor Selection (Univariable Analysis and Machine Learning Model)
The chi-squared tests revealed associations between morphological features, including enhancement, calcification, cysts, hemorrhage, cystic change and T2-FLAIR mismatch, and IDH status (P<0.05). The univariable analysis results are shown in Table 2.
After univariable selection, and retaining features with substantial interobserver agreement (κ >0.6), six variables were incorporated into the machine learning models: rADC, age, enhancement, calcification, hemorrhage, and cystic change. In the study set, logistic regression, SVM, NB, and the ensemble model performed similarly (AUC=0.866-0.897). Logistic regression achieved the largest AUC (0.897) and outperformed the other models, so multivariable logistic regression was chosen as the final model. Models 1 and 2 (without calcification) performed almost equivalently, with an AUC of 0.890 for model 2; DeLong's test showed no statistically significant difference between the two models (P=0.361). In the LGG and GBM subgroups, the model achieved accuracies of 84.3% (AUC = 0.873) and 85.1% (AUC = 0.862), respectively. The AUCs of the different machine learning models are presented in Figure 6, and the comparison of machine learning models is provided in Supplementary Table 6. The results of models 1 and 2 and of the LGG and GBM subgroups are shown in Table 3 and Figure 7.
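The DeLong comparison of models 1 and 2 is a paired test: both models score the same patients, so the covariance between their AUC estimates must be taken into account. The stdlib-only sketch below follows the structural-component (placement value) formulation of DeLong's method (26); it is an illustrative reimplementation with hypothetical names and toy inputs, not the software used in the study.

```python
import math


def _placements(pos, neg):
    """DeLong structural components: V10 per positive case, V01 per negative case."""
    def psi(a, b):
        return 1.0 if a > b else 0.5 if a == b else 0.0
    v10 = [sum(psi(p, q) for q in neg) / len(neg) for p in pos]
    v01 = [sum(psi(p, q) for p in pos) / len(pos) for q in neg]
    return v10, v01


def _cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (m - 1)


def delong_paired(y_true, scores_1, scores_2):
    """Two-sided DeLong test for the difference between two correlated AUCs.

    Both score lists must refer to the same patients in the same order.
    Returns (auc_1, auc_2, z, p_value).
    """
    comps = []
    for scores in (scores_1, scores_2):
        pos = [s for t, s in zip(y_true, scores) if t == 1]
        neg = [s for t, s in zip(y_true, scores) if t == 0]
        comps.append(_placements(pos, neg))
    (v10_1, v01_1), (v10_2, v01_2) = comps
    auc_1, auc_2 = sum(v10_1) / len(v10_1), sum(v10_2) / len(v10_2)
    m, n = len(v10_1), len(v01_1)
    var = ((_cov(v10_1, v10_1) + _cov(v10_2, v10_2) - 2 * _cov(v10_1, v10_2)) / m
           + (_cov(v01_1, v01_1) + _cov(v01_2, v01_2) - 2 * _cov(v01_1, v01_2)) / n)
    if var <= 0:
        raise ValueError("variance of the AUC difference is zero; test undefined")
    z = (auc_1 - auc_2) / math.sqrt(var)
    return auc_1, auc_2, z, math.erfc(abs(z) / math.sqrt(2))


# Toy scores for six hypothetical patients (not study data).
print(delong_paired([1, 1, 1, 0, 0, 0],
                    [0.9, 0.8, 0.3, 0.4, 0.2, 0.1],
                    [0.7, 0.6, 0.5, 0.65, 0.3, 0.2]))
```

Because model 2 is model 1 minus one predictor, their scores are strongly correlated; the covariance terms shrink the variance of the difference, so an unpaired comparison would be overly conservative.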

Test Set Results
To predict the IDH status of patients in the subsequent test set, the model fitted on the study set was implemented in Python. From January 2020 to June 2020, 40 patients with glioma (20 IDH-mutant and 20 IDH-wildtype) were included in the test set according to the same inclusion criteria. Two blinded observers (Y-YC and Y-LW) replicated the ADC measurements used in the study set. The ICCs for the ADC values are provided in Supplementary Table 4. The AUCs of models 1 and 2 and of the LGG and GBM subgroups in the test set are presented in Table 3 and Figure 7.

To our knowledge, no previous study has used several machine learning methods to build a model combining clinical and MRI features to predict the IDH molecular subtype of WHO grade II to IV gliomas. Previous studies have used region-derived minimum ADC measurements to estimate glioma grade or molecular status (8,11,12,27,28). As expected, receiver operating characteristic curve analysis showed that the ADC value is a useful tool for determining IDH status in diffuse gliomas, with a significant difference between IDH-mutant and IDH-wildtype gliomas (P<0.001). ROI measurements showed high interobserver and intraobserver reproducibility (ICC=0.80-0.95), similar to the repeatability of ADC measurements reported in other studies (8). The rADC (ADCmin to ADCn ratio) was used as a normalized parameter to improve vendor neutrality and reduce potential bias. When drawing ROIs, only the solid part of the tumor was included, avoiding cystic, necrotic, and hemorrhagic areas as much as possible, which is feasible on most clinical workstations.
This approach is partly consistent with the findings of G.Z. (29), who suggested that selecting the solid part of the tumor when drawing ROIs on ADC maps is necessary and optimal for differentiating GBM from metastasis. When testing ADC parameters for predicting IDH status, we found that ADCmin and rADC were higher in IDH-mutant than in IDH-wildtype gliomas among WHO grade II~IV gliomas overall and in the LGG subgroup, but not in the GBM subgroup. ADCmin has been shown to represent the area with the highest cellularity in heterogeneous tumors. In general, a lower ADC value indicates denser glioma cells and a worse prognosis, as supported by several studies comparing diffusivity with histological specimens and clinical data (8,30). Hong et al. reported that ADC was significantly lower in IDH-wildtype GBM than in IDH-mutant GBM (31); our study did not reproduce this finding. One reason may be the small sample size, with only 10 IDH-mutant tumors in our GBM subgroup. Another may be the heterogeneity of GBM and differing ROI placement biases. Glioblastomas carry different subsets of genetic abnormalities involved in tumorigenesis and transformation; IDH-mutant GBMs in particular may contain lower-grade tumor components (32). In our study, the lowest ADC value was selected for analysis, which reduces the measurement bias that arises when the whole tumor is measured.
Although quantitative, computerized methods hold substantial promise for the noninvasive prediction of the molecular characteristics of glioma, we aimed to build a model from morphologic features that can be easily evaluated on conventional MRI in daily clinical practice. Regarding age and morphologic characteristics in our population, younger age and frontal lobe location were more likely to be associated with IDH-mutant status, consistent with previous research (33,34). Arita et al. (35) found that IDH-wildtype gliomas were mainly distributed in the parietal lobe and, to some extent, the temporal lobe, but rarely involved the frontal lobe. Similarly, in our study, IDH-wildtype tumors were more likely to be located in cerebral lobes other than the frontal lobe. Moreover, thalamic or brainstem and cerebellar locations showed IDH-wildtype predominance, which concurs with a study by Maynard et al. (11).
Postcontrast enhancement patterns differed significantly between glioma subtypes in WHO grade II-IV glioma. In particular, rim enhancement was a predictor of IDH-wildtype status, suggesting a tendency toward invasive behavior. Although it is increasingly recognized that nonenhancing tumors also account for a substantial proportion of grade IV gliomas (36), such atypical glioblastomas may not be easily distinguished from lower-grade gliomas on routine MRI. Furthermore, the presence of hemorrhage was not related to a particular subgroup in our study. Previous studies have shown that the T2-FLAIR mismatch sign has high specificity for diagnosing IDH-mutant astrocytoma (16). Our results showed the same tendency, but the sign was not incorporated into the model because of its only fair interobserver agreement (κ=0.396).
Calcification and cystic components also contributed significantly to our predictive model in WHO grade II-IV glioma. The absence of calcification was strongly correlated with IDH-wildtype status in univariable analysis, consistent with previous studies that have extensively evaluated calcification in IDH-mutant gliomas (11). Interobserver agreement for calcification was substantial (κ=0.719, P<0.01). We hypothesize that with a larger sample and an optimized examination sequence, observer certainty and concordance in detecting calcification would further increase. Kanazawa et al. (37) found that both calcification and cystic components could predict IDH-mutant status with 1p/19q codeletion in lower-grade gliomas. In our study, however, cystic components were more often found in IDH-wildtype than in IDH-mutant tumors. Because IDH-wildtype tumors are more necrotic than IDH-mutant tumors (38), we speculate that subjectivity and overlap with necrotic components limit the reproducibility of this correlation.
Several limitations of this study should be noted. First, we did not include infantile gliomas, because infantile high-grade gliomas are a distinct entity whose paradoxical clinical course distinguishes them from their pediatric and adult counterparts (39). Second, simplified ADC measurements from DWI cannot fully reflect the complexity of cellular components and structural changes; more advanced MRI postprocessing (for example, semiautomatic or automatic segmentation covering the whole tumor volume) may partially overcome this limitation at the expense of more time-consuming preprocessing and postprocessing workflows. Nonetheless, the ADC measurements used here can be performed on most clinical workstations. Finally, this was a retrospective study based on data from a single institution. The stability of the morphologic features may be affected by differences in MR parameters and protocols, image postprocessing steps, and the repeatability of ADC measurements. Therefore, a multicenter study is the next step to verify our findings.
In conclusion, we demonstrated that the ADCmin to ADCn ratio, combined with tumor morphologic features, predicts IDH-mutant versus IDH-wildtype status in adult diffuse glioma with high accuracy. This combination may provide a feasible, noninvasive marker of IDH status. Further studies with larger samples are needed to establish its clinical value.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by The Institutional Ethical Committee of the Chinese PLA General Hospital. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
JZ and LM conceived the study design. JZ, HP, Y-YC, X-BB and D-KZ were responsible for patient recruitment and acquired clinical information. JZ, Y-LW and H-FX conducted the quality assurance of image quality. JZ and HP were responsible for statistical analysis. JZ wrote the first draft of this manuscript. Y-LW and LM reviewed the manuscript. All authors contributed to the article and approved the submitted version.