AUTHOR=Elsaid Ahmed F. , Fahmi Rasha M. , Shehta Nahed , Ramadan Bothina M. TITLE=Machine learning approach for hemorrhagic transformation prediction: Capturing predictors' interaction JOURNAL=Frontiers in Neurology VOLUME=Volume 13 - 2022 YEAR=2022 URL=https://www.frontiersin.org/journals/neurology/articles/10.3389/fneur.2022.951401 DOI=10.3389/fneur.2022.951401 ISSN=1664-2295 ABSTRACT=Background and purpose: Ischemic stroke patients frequently develop hemorrhagic transformation (HT), which could potentially worsen prognosis. The objectives of the current study were to determine the incidence and predictors of HT, to evaluate interactions between predictors, and to identify the optimal prediction models. Methods: This prospective study enrolled 360 ischemic stroke patients, of whom 354 completed the study. Patients underwent thorough general and neurological examination and T2-diffusion weighted MRI at admission and one week later to determine the incidence of HT. HT predictors were selected by a filter-based minimum redundancy maximum relevance (mRMR) algorithm, independent of model performance. Several machine learning algorithms, including multivariable logistic regression classifier (LRC), support vector classifier (SVC), random forest classifier (RFC), gradient boosting classifier (GBC), and multilayer perceptron classifier (MLPC), were optimized for HT prediction in a randomly selected half of the sample (training set) and tested in the other half (testing set). Model predictive performance was evaluated using receiver operating characteristic (ROC) analysis and visualized by observing case distribution relative to the models’ predicted three-dimensional (3D) hypothesis spaces within the testing dataset's true feature space. Interaction between predictors was investigated using generalized additive modeling (GAM). Results: The incidence of HT in ischemic stroke patients was 19.8%. 
Infarction size, cerebral microbleeds (CMB), and National Institutes of Health Stroke Scale (NIHSS) score were identified as the best HT predictors. RFC (AUC: 0.91, 95%CI: 0.85-0.95) and GBC (AUC: 0.91, 95%CI: 0.86-0.95) demonstrated significantly superior performance compared to LRC (AUC: 0.85, 95%CI: 0.79-0.91) and MLPC (AUC: 0.85, 95%CI: 0.78-0.92). SVC (AUC: 0.90, 95%CI: 0.85-0.94) outperformed LRC and MLPC, but the difference did not reach statistical significance. LRC and MLPC did not differ significantly. The best models' 3D hypothesis spaces demonstrated nonlinear decision boundaries, suggesting interaction between predictor variables. GAM analysis demonstrated a significant linear interaction between NIHSS and CMB and a significant nonlinear interaction between NIHSS and infarction size. Conclusion: CMB, NIHSS, and infarction size were identified as HT predictors. The best-performing models were RFC and GBC, both capable of capturing nonlinear interactions between predictors. Predictor interaction suggests a dynamic, rather than fixed, cutoff risk value for any of these predictors.
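
The evaluation design described in the abstract (a random half-and-half split, then ROC-based comparison) can be illustrated with a minimal sketch. This is not the authors' code: the helper names `split_half` and `roc_auc`, the seed, and the toy labels and scores are all hypothetical, and the AUC is computed with the standard pairwise-ranking definition rather than with whatever software the study used.

```python
import random


def split_half(n_cases, seed=0):
    """Randomly assign case indices to two halves (training, testing).

    Hypothetical helper; the seed and splitting routine are assumptions.
    """
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    half = n_cases // 2
    return sorted(indices[:half]), sorted(indices[half:])


def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise-ranking definition.

    Returns the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# 354 patients completed the study; split them into two halves.
train_idx, test_idx = split_half(354)

# Toy labels (1 = HT, 0 = no HT) and predicted risks, invented for illustration.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_auc = roc_auc(toy_labels, toy_scores)
print(len(train_idx), len(test_idx), toy_auc)
```

In practice each classifier would be fitted on the training half only, and `roc_auc` would be applied to its predicted risks on the testing half; the confidence intervals and significance tests reported above require additional procedures not shown here.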