Comparison of Conventional Logistic Regression and Machine Learning Methods for Predicting Delayed Cerebral Ischemia After Aneurysmal Subarachnoid Hemorrhage: A Multicentric Observational Cohort Study

Background Timely and accurate prediction of delayed cerebral ischemia (DCI) is critical for improving the prognosis of patients with aneurysmal subarachnoid hemorrhage (aSAH). Machine learning (ML) algorithms are increasingly regarded as having higher predictive power than conventional logistic regression (LR). This study aimed to construct LR and ML models and compare their predictive power for DCI after aSAH. Methods This was a multicenter, retrospective, observational cohort study that enrolled patients with aSAH from five hospitals in China. A total of 404 aSAH patients were enrolled. We randomly divided the patients into training (N = 303) and validation (N = 101) cohorts at a ratio of 75:25. One LR and six popular ML algorithms were used to construct models. The area under the receiver operating characteristic curve (AUC), accuracy, balanced accuracy, confusion matrix, sensitivity, specificity, calibration curve, and Hosmer–Lemeshow test were used to assess and compare model performance. Finally, we calculated the importance of each feature. Results A total of 112 (27.7%) patients developed DCI. In the validation cohort, conventional LR, with an AUC of 0.824 (95% CI: 0.73–0.91), outperformed the k-nearest neighbor, decision tree, support vector machine, and extreme gradient boosting models, which had AUCs of 0.792 (95% CI: 0.68–0.9, P = 0.46), 0.675 (95% CI: 0.56–0.79, P < 0.01), 0.677 (95% CI: 0.57–0.77, P < 0.01), and 0.78 (95% CI: 0.68–0.87, P = 0.50), respectively. However, the random forest (RF) and artificial neural network models, with the same AUC (0.858, 95% CI: 0.78–0.93, P = 0.26), performed better than LR. The accuracy and balanced accuracy of the RF were 20.8% and 11% higher than those of LR, and the RF also showed good calibration in the validation cohort (Hosmer–Lemeshow: P = 0.203). We found that the CT value of subarachnoid hemorrhage, WBC count, neutrophil count, CT value of cerebral edema, and monocyte count were the five most important features for DCI prediction in the RF model. We then developed an online prediction tool (https://dynamic-nomogram.shinyapps.io/DynNomapp-DCI/) based on the important features to calculate DCI risk precisely. Conclusions In this multicenter study, we found that several ML methods, particularly RF, outperformed conventional LR. Furthermore, an online prediction tool based on the RF model was developed to identify patients at high risk for DCI after SAH and facilitate timely interventions. Clinical Trial Registration http://www.chictr.org.cn, Unique identifier: ChiCTR2100044448.


INTRODUCTION
Aneurysmal subarachnoid hemorrhage (aSAH) is a severe acute cerebrovascular disorder resulting in high morbidity and mortality; roughly 50% of aSAH survivors have permanent neurological deficits (Molyneux et al., 2005; Fugate and Rabinstein, 2012). Delayed cerebral ischemia (DCI) is the most frequent complication after aSAH, affecting ∼30% of patients, and often causes serious damage because it is diagnosed late (Macdonald, 2014; Francoeur and Mayer, 2016). Hence, timely and accurate prediction of DCI is critical for the treatment and prognosis of patients with aSAH. A precise, reliable model for early prediction of DCI development is urgently needed.
Traditional logistic regression (LR) is the primary method used to construct models for predicting disease outcomes. However, when LR is applied to complex multivariate non-linear relationships, complex transformations are often required because of its low robustness and the multicollinearity between variables (Tu, 1996). Machine learning (ML) is valuable for analyzing clinical data because it can fully employ input features and predict outcomes more accurately (Jordan and Mitchell, 2015). Several studies have suggested that, for DCI, ML models utilizing admission clinical characteristics have better predictive power than LR (de Jong et al., 2021; Savarraj et al., 2021). However, the performance of these models is generally not high because the clinical features used were incomplete. Admission clinical characteristics include baseline information, laboratory test results, and imaging data, and the fragmented application of these data may reduce predictive performance; therefore, these features must be systematically utilized. To the best of our knowledge, no study has used relatively complete clinical features to construct ML and LR models, and some of these models were not compared in previous studies.
We selected several of the currently most popular ML algorithms to achieve the following aims. First, we constructed and validated a conventional LR model and several ML models based on relatively complete clinical features on admission. Second, we compared the predictive performance of the LR and ML models. Third, we established an online prediction tool based on the important features identified by the optimal model, which is convenient for clinicians and can precisely calculate the risk of DCI after aSAH.

Study Design and Patient Enrollment
This multicenter, retrospective, observational cohort study utilized clinical data from the electronic health record system. The study participants comprised all adult patients with aSAH admitted within 24 h of onset who were treated from April 2019 to June 2021 in the Departments of Neurosurgery of Renmin Hospital of Wuhan University, Huzhou Central Hospital, the Affiliated Hospital of Panzhihua University, the General Hospital of Northern Theater Command, and the First Hospital of Shanxi Medical University. The study eventually enrolled 404 patients (Figure 1). In accordance with SAH guidelines, aSAH was diagnosed using head computed tomography (CT), CT angiography, or digital subtraction angiography.
The exclusion criteria were: (1) admission time exceeded 24 h after onset, (2) intracerebral hemorrhage or vascular malformation, (3) acute infection, (4) postoperative state on admission, (5) bilateral mydriasis or other permanent brain injuries on admission, (6) non-surgical treatment, and (7) patients who died within 3 days after admission.
FIGURE 1 | The flowchart of study design and detailed patient enrollment.

Clinical Data Collection
Patient demographic data (sex, age), medical history (hypertension, diabetes mellitus, coronary heart disease, smoking, alcohol consumption, anticoagulant treatment, and previous diseases), and clinical state on admission [World Federation of Neurosurgical Societies (WFNS), Hunt and Hess grade (HH), and modified Fisher scale (mFS)] were collected. Aneurysmal details were also recorded, including aneurysm number, location, length, neck size, and treatment. Surgical methods and laboratory tests on admission (glucose, D-dimer, as well as white blood cell [WBC], neutrophil, lymphocyte, and monocyte counts) were also utilized in this study.

CT Value Assessment
The CT values of subarachnoid clots and cerebral edema were manually measured and collected, and measurement methods and references are provided in the Supplementary Data 1.
Regions of interest (ROIs) were manually drawn on the central area of the blood clots in representative slices by two neurosurgeons who were blinded to the patients' clinical information. The mean blood clot density in the subarachnoid space was measured in each ROI (a circle 3–8 mm across), returning the mean Hounsfield Unit (HU) value. Subarachnoid cisterns/fissures, including the lateral Sylvian fissure, anterior interhemispheric fissure, medial Sylvian fissure, suprasellar cistern, ambient cistern, and quadrigeminal cistern, were used to determine the mean HU (Woo et al., 2017; Kanazawa et al., 2020).
Regions of interest (circles 5–10 mm across) of the cerebral edema were bilaterally and symmetrically drawn on a representative CT slice. If blood clots were below the insular cortex, the ROI was drawn on the thalamus and basal ganglia. Otherwise, the ROI was drawn on the bilateral centrum semiovale (Claassen et al., 2002; Ahn et al., 2018).

Outcome Definitions
DCI was defined as meeting at least one of the following criteria: (1) a permanent or temporary focal neurological impairment (such as aphasia, apraxia, hemianopia, or neglect) occurring between 4 and 14 days after aSAH that could not be attributed to any other etiology; (2) a decrease in the Glasgow Coma Scale score of at least two points [either on one of its components (eye opening, verbal response, motor response), or on total score]; and (3) a low-density area on head CT that was not noticeable on admission or immediately after the operation and had no cause other than vasospasm, between 4 and 30 days after aSAH (Vergouwen et al., 2010).

Sample Size
We used the events per variable criterion with a value of 10 (Peduzzi et al., 1996) to estimate the effective sample size in this study. Our preliminary analysis indicated that nine variables were entered into a multivariable LR model. Hence, at least 90 patients with DCI should be included in the training cohort. Moreover, according to the risk of DCI occurrence after SAH, ∼30% worldwide, there should be at least 300 patients in the model training cohort.
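The arithmetic behind this estimate can be sketched in a few lines of Python. This is only an illustration of the events-per-variable rule as described above; the function name and its arguments are hypothetical and are not part of the original analysis.

```python
import math

def required_sample_size(n_predictors, events_per_variable, event_rate_percent):
    """Return (minimum events, minimum patients) under the events-per-variable rule.

    event_rate_percent is the expected outcome incidence in percent,
    e.g. 30 for the ~30% DCI incidence quoted in the text.
    """
    min_events = n_predictors * events_per_variable
    # Patients needed so that the expected number of events reaches min_events.
    min_patients = math.ceil(min_events * 100 / event_rate_percent)
    return min_events, min_patients

events, patients = required_sample_size(
    n_predictors=9, events_per_variable=10, event_rate_percent=30
)
print(events, patients)  # 90 300
```

With nine candidate variables, 10 events per variable, and an incidence of about 30%, the rule yields 90 events and 300 patients, matching the figures stated above.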

Processing of Missing Data
This dataset included 17 patients with missing values, which accounted for <5% of the study population, so we directly used the missing value deletion method to process the data (Eekhout et al., 2012).
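Missing value deletion (complete-case analysis) simply discards any record with at least one missing field. A minimal Python sketch follows; the field names and values are invented for illustration and do not come from the study data.

```python
def drop_incomplete(records):
    """Keep only records in which no field is missing (None)."""
    return [r for r in records if all(v is not None for v in r.values())]

# Hypothetical toy records; None marks a missing measurement.
toy_records = [
    {"age": 57, "wbc": 11.9, "dci": 1},
    {"age": 63, "wbc": None, "dci": 0},  # dropped: WBC count is missing
    {"age": 49, "wbc": 8.4, "dci": 0},
]

complete = drop_incomplete(toy_records)
print(len(complete))  # 2
```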

Model Development
A total of 404 patients with aSAH from five medical centers were enrolled. We randomly divided the patients into a training cohort (N = 303) and a validation cohort (N = 101) at a ratio of 75:25. The training cohort was utilized to develop the conventional LR, k-nearest neighbor (KNN), support vector machine (SVM), decision tree (DT), random forest (RF), extreme gradient boosting (XGBoost), and artificial neural network (ANN) models.
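A random 75:25 split of this kind can be sketched as follows. The seed, the function name, and the use of integer patient identifiers are assumptions made for illustration; the original split was not necessarily produced this way.

```python
import random

def split_cohort(patient_ids, train_fraction=0.75, seed=0):
    """Shuffle the identifiers and split them into training and validation lists."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

train_ids, valid_ids = split_cohort(range(404))
print(len(train_ids), len(valid_ids))  # 303 101
```

Fixing the seed keeps the two cohorts identical across reruns, which matters when several models are compared on the same validation set.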

LR
The LR model was trained by entering the predictor variables with P < 0.1 in univariate analysis into a multivariate logistic analysis. We used backward stepwise regression based on the Akaike information criterion to select the optimal variables and construct the final LR model. The "MASS" package in R was used to fit the model.

LASSO
Least absolute shrinkage and selection operator (LASSO) regression, which is suitable for analyzing high-dimensional data, was used to select the most informative prediction variables. We used the "glmnet, corrplot, caret" packages and 10-fold cross-validation to obtain the optimal λ and factors.
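For reference, the L1-penalized logistic regression objective minimized by "glmnet" for a binary outcome can be written as below. The notation (n patients, outcome y_i, feature vector x_i, intercept β_0, coefficients β_j) is introduced here for illustration and does not appear in the original text.

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{n}\sum_{i=1}^{n}
\left[\, y_i\,(\beta_0 + x_i^{\top}\beta)
      - \log\!\left(1 + e^{\beta_0 + x_i^{\top}\beta}\right) \right]
+ \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

Larger values of λ shrink more coefficients exactly to zero, which is why cross-validating λ also selects the retained factors.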

KNN
The KNN model predicts the outcome of a new sample from the observations that lie closest to it in feature space. For example, a KNN model with ten neighbors uses the ten closest observations in multidimensional space, as determined by a distance measure, to predict the outcome of a new sample. The optimal K value was determined by 10-fold cross-validation and the "e1071, class, kknn, kernlab, caret" packages.
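The nearest-neighbor idea can be illustrated with a minimal Python sketch. Euclidean distance and simple majority voting are assumptions made here (the text specifies only "a distance assessment"), and the toy data are invented.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    """Predict the class of `query` by majority vote among its k nearest neighbors."""
    # Rank training samples by Euclidean distance to the query point.
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    nearest_labels = [label for _, label in ranked[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical two-feature toy data: 0 = no DCI, 1 = DCI.
toy_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
toy_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(toy_x, toy_y, (0.5, 0.5), k=3))  # 0
print(knn_predict(toy_x, toy_y, (5.5, 5.5), k=3))  # 1
```

In practice the features would be standardized first, since distance-based methods are sensitive to measurement scale.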

SVM
The distinguishing feature of SVM algorithms is that, when determining the class boundary, they mainly use the data points from each outcome class that lie closest to the boundary or are misclassified. The radial basis function kernel was applied in this work, and the optimal gamma value and minimum error of the SVM model were determined by 10-fold cross-validation.
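The radial basis function kernel measures the similarity of two samples as exp(−γ‖x − z‖²), where γ is the tuned gamma value. A minimal sketch of this standard form follows; the function name and example values are illustrative only.

```python
import math

def rbf_kernel(x, z, gamma):
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

print(rbf_kernel((1.0, 2.0), (1.0, 2.0), gamma=0.5))  # 1.0 for identical points
print(rbf_kernel((0.0, 0.0), (1.0, 0.0), gamma=1.0))  # exp(-1), about 0.368
```

A larger gamma makes the similarity decay faster with distance, giving a more flexible boundary that is also more prone to overfitting.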

DT
DT algorithms partition the sample data by splitting prediction features at discrete cut-points and are usually presented in the form of a tree. In this study, the decision tree algorithm uses the Gini index to determine each split's optimal variable and location. The cost complexity parameter that penalizes more complex trees is used to control the size of the final tree. Ten-fold cross-validation and "rpart, partykit, caret" packages were used to determine the minimum error value.
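To make the Gini-based splitting concrete, the sketch below computes the Gini impurity of a set of labels and searches one continuous feature for the cut-point with the lowest weighted impurity. It is a simplified illustration with invented toy data, not the "rpart" implementation.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted Gini) of the best binary split on one feature."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut-point between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

print(gini_impurity([0, 0, 1, 1]))               # 0.5
print(best_split([1, 2, 3, 4], [0, 0, 1, 1]))    # (2.5, 0.0)
```

A full tree applies this search to every feature at every node and then prunes using the cost complexity parameter.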

RF
RF builds a predictive model by sampling objects and variables, generating multiple decision trees, and classifying objects in turn. Finally, the classification results of each decision tree are summarized, and the mode category in all prediction categories is the category of the object predicted by the RF model. The optimal number of trees was determined using 10-fold cross-validation and "randomForest" package.
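The two ingredients described above, resampling of objects and a majority vote over trees, can be sketched as follows. The helper names are hypothetical, the individual trees are represented only by their predicted labels, and the per-split sampling of variables is omitted for brevity.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, as done before growing each tree."""
    return [rng.choice(rows) for _ in rows]

def forest_predict(tree_predictions):
    """Return the mode of the per-tree class predictions."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(0)
toy_rows = list(range(10))
sample = bootstrap_sample(toy_rows, rng)
print(len(sample))  # 10

# Hypothetical votes from five trees for one patient.
print(forest_predict(["DCI", "no DCI", "DCI", "DCI", "no DCI"]))  # DCI
```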

XGBoost
XGBoost is an optimized distributed gradient boosting library designed to be efficient, flexible, and portable. It implements ML algorithms under the gradient boosting framework. The optimal parameters were determined using the "xgboost" package and 10-fold cross-validation.

ANN
ANN is an algorithmic mathematical model that imitates the behavioral characteristics of animal neural networks and performs distributed and parallel information processing. Depending on the complexity of the system, this kind of network processes information by adjusting the interconnections between a large number of internal nodes, and it is capable of self-learning and self-adaptation. Ten-fold cross-validation and the "caret, MASS, neuralnet, vcd" packages were used to determine the optimal parameters of this model.

Model Performance Evaluation
We used the area under the receiver operating characteristic curve (AUC) with 95% confidence intervals (95% CIs), accuracy, balanced accuracy, confusion matrix, sensitivity, and specificity in both the training and validation cohorts to evaluate model performance. The AUC was used to assess model discrimination, while the calibration curve with 10-fold cross-validation (1,000 resamples) and the Hosmer–Lemeshow test were used to assess model calibration.
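The relationship between these indicators and the confusion matrix can be illustrated with a short Python sketch. The counts and predicted scores below are invented toy values, not results from this study, and the function names are hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive the reported indicators from the four confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
    }

def auc_from_scores(scores_pos, scores_neg):
    """AUC as the probability that a random positive outranks a random negative."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in scores_pos
        for n in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))

metrics = classification_metrics(tp=3, fp=4, tn=4, fn=1)
print(metrics["sensitivity"], metrics["specificity"], metrics["balanced_accuracy"])
# 0.75 0.5 0.625

toy_auc = auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1])
print(round(toy_auc, 3))  # 0.917
```

Balanced accuracy averages sensitivity and specificity, so it is less affected than plain accuracy by the imbalance between DCI and non-DCI patients.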

Statistical Analysis
We applied the Kolmogorov–Smirnov test to determine the data distribution before formally analyzing the data. Continuous variables were analyzed using the independent t-test or Mann–Whitney U-test and are presented as mean ± SD or median with interquartile range. Categorical variables were analyzed using the chi-square or Fisher's exact test and are expressed as numbers (percentages). Differences between the AUCs of the models were assessed using the DeLong test. Feature importance was calculated from the Gini index using the RF algorithm.
The importance scores of all features were scaled to sum to 100. A higher importance score commonly indicated a stronger influence on the occurrence of DCI. For continuous variables that were important indicators of DCI, we used the Youden index to calculate the cut-off value distinguishing patients who were prone to DCI. All statistical tests were two-tailed, and P < 0.05 was considered statistically significant. Statistical analyses were conducted using IBM SPSS Statistics for Windows, version 26.0 (IBM Corp., Armonk, NY, USA) and R software, version 4.1.0, x64 (https://www.r-project.org/).
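Two of the steps above lend themselves to a brief illustration: choosing a cut-off by the Youden index (J = sensitivity + specificity − 1) and rescaling importance scores to a total of 100. The sketch assumes that higher values indicate higher risk; the function names and toy numbers are hypothetical and are not the study data.

```python
def youden_cutoff(values, has_event):
    """Return (cut-off, J) maximizing J for the rule 'value >= cut-off predicts the event'."""
    n_pos = sum(has_event)
    n_neg = len(has_event) - n_pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(values)):
        tp = sum(1 for v, e in zip(values, has_event) if e and v >= cut)
        tn = sum(1 for v, e in zip(values, has_event) if not e and v < cut)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # ties keep the lowest cut-off
            best_cut, best_j = cut, j
    return best_cut, best_j

def scale_importance(raw_scores):
    """Rescale raw importance scores so that they sum to 100."""
    total = sum(raw_scores.values())
    return {name: 100 * score / total for name, score in raw_scores.items()}

# Invented toy measurements and outcomes (1 = event).
toy_values = [5, 7, 9, 11, 12, 14]
toy_events = [0, 0, 1, 0, 1, 1]
cut, j = youden_cutoff(toy_values, toy_events)
print(cut, round(j, 3))  # 9 0.667

scaled = scale_importance({"feature_a": 2.0, "feature_b": 6.0})
print(scaled)  # {'feature_a': 25.0, 'feature_b': 75.0}
```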

Baseline Characteristics
DCI occurred in 85 (28%) and 27 (27%) patients in the training and validation cohorts, respectively, and women comprised 179 (59%) and 68 (67%) patients in the two cohorts. The median age in both cohorts was 57 years. In terms of other admission clinical features, more patients in the validation cohort than in the training cohort had an mFS of 3–4 points (P < 0.05), and the mean aneurysm length was larger in the validation cohort than in the training cohort (P < 0.05). A larger proportion of patients in the validation cohort underwent aneurysm clipping (P < 0.05). However, there were no significant differences in medical history, disease history, other clinical conditions, aneurysm location, aneurysm number, admission laboratory results, or admission CT values between the two cohorts (P > 0.05). Table 1 shows the detailed baseline characteristics of the datasets. We also analyzed the baseline characteristics of the DCI and non-DCI groups in the training cohort. Table 2 shows the detailed baseline data of the two groups in the training cohort.
In the validation cohort, the RF and ANN models, with the same AUC (0.858, 95% CI: 0.78–0.93, DeLong: P = 0.26), still performed better than the LR. Furthermore, the accuracy and balanced accuracy of the RF were 20.8% and 11% higher than those of the LR. Supplementary Table 1 shows the confusion matrix and balanced accuracy of the ML and LR models in the training and validation cohorts. Figure 2 and Table 3 present the performance of all models in the training and validation cohorts. In addition, Figure 3 demonstrates that the superior RF model had good calibration according to the calibration curve and Hosmer–Lemeshow test in the training (χ² = 8.78, df = 8, P = 0.36) and validation cohorts (χ² = 10.97, df = 8, P = 0.203). Table 4 and Figure 4 show the process of model development.

Individual Variable Importance
The five most important features for DCI prediction were the CT value of subarachnoid hemorrhage (15.68), WBC count (13.72), neutrophil count (12.28), CT value of cerebral edema (8.54), and monocyte count (7.54). The cut-off values of the WBC, neutrophil, and monocyte counts for predicting DCI were 11.2 × 10^9/L, 9.58 × 10^9/L, and 0.46 × 10^9/L, respectively. Moreover, the cut-off values of the CT value in subarachnoid hemorrhage and cerebral edema were 60.12 HU and 28.15 HU, respectively. Figure 5 shows the importance of all input features. An online prediction tool (https://dynamic-nomogram.shinyapps.io/DynNomapp-DCI/) was developed based on the five optimal predictors in the RF model, which could precisely calculate the risk of DCI after aSAH. A risk percentage of 50% calculated by this tool is commonly taken to indicate the occurrence of DCI in patients with aSAH. Figure 6 displays the interface of the online tool for predicting DCI. Both the decision curve analysis and the clinical impact curve for the validation cohort showed a superior overall net benefit over the entire range of threshold probabilities (Figure 7).

DISCUSSION
In this study, eligible patients with aSAH from five medical centers were randomly divided into model training and validation cohorts. One conventional LR and six popular ML methods were used to construct prediction models incorporating relatively complete admission clinical data, and the performance of all models was assessed and compared. To the best of our knowledge, this study is the first to use comprehensive clinical features to develop models and systematically compare the performance of several popular ML methods and conventional LR for DCI prediction. In addition, we are the first to develop an online prediction tool based on the most important features of the RF model to precisely calculate the risk of DCI development. A small number of admission clinical features is considered insufficient for accurate DCI prediction. However, the most commonly used multivariable prediction models are still based on LR. For instance, de Rooij et al. (2013) incorporated some features selected by LR and constructed a practical risk chart for DCI prediction. The AUC of this risk chart was 0.69 in the validation cohort. Liu et al. (2020) used six factors selected via LR to develop a nomogram for DCI, which achieved an AUC of 0.65 on the test set. Other studies have also employed the conventional LR method to identify independent factors for DCI prediction (Al-Mufti et al., 2017, 2019a; Duan et al., 2018; Hurth et al., 2020). In our study, the LR model incorporated four independent features for DCI classification and achieved an AUC of 0.837 in the validation cohort, which was higher than the AUCs reported for previous models (de Oliveira Manoel et al., 2015; Foreman et al., 2017; van der Steen et al., 2019; Liu et al., 2020). The inclusion of complete admission clinical information can enable the LR to select the optimal variables and thereby improve prediction performance, which may explain the better performance of our LR model. However, owing to its limited robustness, the LR model cannot take full advantage of the information from all clinical input features.
Machine learning models can handle high-dimensional data more robustly than the conventional LR method, making them suitable for fitting more features for prediction (Brusko et al., 2018; Buchlak et al., 2020). This capability can reduce subjectivity in statistical analysis and help ensure the objectivity of the results. ML algorithms have developed rapidly in recent years, and some studies have reported the use of ML to predict the occurrence of DCI. de Jong et al. (2021) constructed a feedforward artificial neural network model and achieved an AUC of 0.72 for DCI prediction using a database of 362 patients. Their model performed as well as the VASOGRADE model (de Oliveira Manoel et al., 2015). The ANN model in our study, with an AUC of 0.858, had better predictive power than the conventional LR model and outperformed the previous ANN model. Some researchers have compared the performance of LR and ML models for the prediction of DCI or other diseases. For instance, Savarraj et al. (2021) developed ML and LR models for DCI classification using a dataset of 399 patients. Their results showed that the ML model with the highest AUC (0.75 ± 0.07) outperformed the LR model. Ramos et al. (2019) reported that the ML model with the highest AUC (0.74) performed better than the best LR model, which had an AUC of 0.63. However, Nusinovici et al. (2020) reported that the LR model could perform equivalently to the ML models in their study, and Chen et al. (2020) showed that ML models could not outperform the conventional LR model in predicting other diseases. In our study, we constructed several popular ML models based on relatively complete clinical features, and some of these models were not compared in previous studies. The predictive ability of the LR model was inferior to that of the ANN and RF models but better than that of the KNN, support vector machine, decision tree, and extreme gradient boosting models. This indicates that the traditional LR method can still play an important role in DCI prediction. Although ML can make full use of the input characteristics, overfitting may lead to poor prediction performance.
Subarachnoid hemorrhage is a state of systemic inflammatory response syndrome, with both biochemical and cellular reactions (Parkinson and Stephensen, 1984). SAH initiates the rapid activation of the inflammatory cascade, and growing evidence suggests that an early neurovascular inflammatory response is a potential mechanism of late cerebral vasospasm and early brain injury (Helbok et al., 2015). The CT value in SAH often represents the subarachnoid clot density and can reflect the cerebral inflammatory response. At present, measurement of the CT density value of the subarachnoid clot still relies on manual drawing of the ROI. Kanazawa et al. (2020) found that an ROI CT value of ≥49.95 HU is correlated with DCI occurrence. Consistent with previous studies, our results showed that a CT value of >60.12 HU plays a prominent role in DCI prediction. Additionally, Ahn et al. (2018) constructed a scoring system for predicting DCI and clinical outcomes based on early cerebral edema after aSAH. This scoring system may become a surrogate marker of early brain injury and may predict DCI and prognosis after aSAH. Our results also illustrate that early cerebral edema has an important influence on DCI prediction. WBC and neutrophil counts are also known to play an important role in reflecting neuroinflammatory responses. Al-Mufti et al. (2019b) found that a WBC count >12.1 × 10^9/L was the strongest predictor of DCI after adjusting for confounding factors, including clinical grade and aneurysm clipping treatment. Our results showed that a WBC count >11.2 × 10^9/L, a neutrophil count >9.58 × 10^9/L, and a monocyte count >0.46 × 10^9/L were among the most important features for the prediction of DCI. A recent study showed that admission WBC, neutrophil, and monocyte counts were higher in patients with DCI and unfavorable prognosis (Gusdon et al., 2021). Encouragingly, our study confirmed this finding, which supports the view that DCI development is closely related to the inflammatory response. Future basic research should further explore the inflammatory mechanisms involved in the occurrence of DCI.
Based on the superior prediction performance of the RF, we used the most important features to construct an online prediction tool, which will aid in the early identification of patients at high risk of DCI after aSAH and allow timely interventions.
This study has several strengths. First, we systematically collected admission baseline information, laboratory test results, and admission CT imaging data, and this information is as representative as possible of the true condition of aSAH patients on admission to the hospital. Second, to avoid the shortcomings of single-center data modeling, we collected data from multiple medical centers, making the DCI prediction model more generalizable and robust. Third, this study covers several of the most popular machine learning algorithms, which have not been systematically compared with conventional models in previous studies. Fourth, we built an online version of the prediction tool, which allows clinicians to conveniently calculate the risk of DCI based on patient information at admission. However, several limitations should be noted. First, this was a retrospective study, and a larger prospective study should be considered to validate our results. Second, a possible deviation caused by manual ROI drawing is unavoidable, although the agreement between the CT values measured by an experienced neurosurgeon and a radiologist was acceptable. Third, an accuracy or AUC of 1 on the training dataset would imply a perfect model, which is clearly not the case; among the models we constructed, the random forest showed overfitting. Overfitting is a modeling error that occurs when a function is too closely aligned to the training dataset, and it may arise when a model tries to fit all the predictive features with a limited training dataset. Our future studies will collect more samples to further verify the results of the RF model.

CONCLUSIONS
In this multicenter study, we found that several ML methods, particularly random forest, outperformed conventional LR. Furthermore, an online prediction tool based on the random forest model was developed to identify patients at high risk for delayed cerebral ischemia after subarachnoid hemorrhage and facilitate timely interventions.

DATA AVAILABILITY STATEMENT
The raw data supporting the findings of this study are available from the corresponding author upon reasonable request.

ETHICS STATEMENT
The Medical Ethics Committee of Renmin Hospital of Wuhan University (our principal affiliation site) approved the study protocol (approval number WDRM2021-K022). The Ethics Committees of Huzhou Central Hospital (202108005-01), the Affiliated Hospital of Panzhihua University (202105002), General Hospital of Northern Theater Command (Y2021060), and the First Hospital of Shanxi Medical University (2021-Y6) also approved the protocol. The Medical Ethics Committee waived the need for patient consent because the data were derived from the electronic health record system. The patients/participants provided their written informed consent to participate in this study.