Characterizing Risk of In-Hospital Mortality Following Subarachnoid Hemorrhage Using Machine Learning: A Retrospective Study

Background: Subarachnoid hemorrhage has a high rate of disability and mortality, and existing disease severity scores have limited ability to estimate the risk of adverse outcomes. We aimed to develop more accurate risk prediction models from information collected during hospitalization, including biochemical data, using logistic regression (LR) and machine learning (ML) techniques.
Methods: Patient-level data were extracted from the MIMIC-IV database. The primary outcome was in-hospital mortality. The models were trained and tested on a data set (split 70:30) that included age and key past medical history. The recursive feature elimination (RFE) algorithm was used to screen candidate variables; ML algorithms were then used to build the prediction models, and the validation set was used to verify their performance.
Results: Of the 1,787 patients included from the MIMIC-IV database, 379 died during hospitalization. RFE selected 20 variables. After simplification, we retained 10 features: the Glasgow coma score (GCS), glucose, sodium, chloride, SpO2, bicarbonate, temperature, white blood cell (WBC) count, heparin use, and sepsis-related organ failure assessment (SOFA) score. In the validation set, the simplified random forest (RF) model had a high AUC of 0.949, which the DeLong test showed was not significantly different from that of the best model. Furthermore, in decision curve analysis (DCA), the simplified gradient boosting machine (GBM) model had relatively higher net benefits. In the subgroup analysis of non-traumatic subarachnoid hemorrhage, the simplified GBM model had a high AUC of 0.955 and relatively higher net benefits.
Conclusions: ML approaches significantly enhance predictive discrimination for mortality following subarachnoid hemorrhage compared to existing illness severity scores and LR. The discriminative ability of these ML models requires validation in external cohorts to establish generalizability.


INTRODUCTION
Subarachnoid hemorrhage (SAH) is a type of hemorrhagic stroke that accounts for 3% of all strokes. With the development of medicine, the global case fatality rate has decreased from 50% to 17%, but the mortality rate of subarachnoid hemorrhage remains high (1)(2)(3). In addition, survivors are often left with permanent disability, cognitive deficits (particularly in executive function and short-term memory), and mental health symptoms (depression, anxiety), leading to significant reductions in health-related quality of life. In recent years, machine learning (ML), a branch of artificial intelligence, has made it possible to learn from data through computational modeling. ML can also fit higher-order relationships between covariates and outcomes in data-rich environments (4)(5)(6).
The purpose of this study was to determine whether ML algorithms using demographics, comorbidities, laboratory tests, and other variables can predict the prognosis of SAH fairly accurately and to identify factors that contribute to predictive ability.

Data Source
This was a retrospective study based on the Medical Information Mart for Intensive Care IV (7) (MIMIC-IV version 1.0) database. Access to the database requires completing the Collaborative Institutional Training Initiative examination (certification number 43357625 for author Deng).

Participant Selection
Inclusion criteria were as follows: (1) subarachnoid hemorrhage confirmed by ICD-9 or ICD-10 codes; (2) age older than 16 years; and (3) admission to the ICU with a recorded Glasgow coma score (GCS). For patients with more than one ICU admission, only data from the first ICU admission of the first hospitalization were included in the analysis.
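The selection rule above can be illustrated with a minimal Python sketch. It is not the authors' extraction code: the record type and all field names (`hospital_seq`, `icu_seq`, `has_sah_code`, and so on) are hypothetical and are not MIMIC-IV column names. The sketch also assumes that the first ICU stay is identified before the inclusion criteria are applied, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IcuStay:
    # All field names are hypothetical; they are not MIMIC-IV column names.
    patient_id: int
    hospital_seq: int       # 1 = first hospitalization of this patient
    icu_seq: int            # 1 = first ICU stay within that hospitalization
    age: float
    gcs: Optional[int]      # None when no GCS was recorded
    has_sah_code: bool      # True when an ICD-9/ICD-10 SAH code is present


def select_cohort(stays: List[IcuStay]) -> List[IcuStay]:
    """Keep each patient's first ICU stay of the first hospitalization,
    then apply the three inclusion criteria (assumed order of operations)."""
    first_stay = {}
    for stay in stays:
        best = first_stay.get(stay.patient_id)
        key = (stay.hospital_seq, stay.icu_seq)
        if best is None or key < (best.hospital_seq, best.icu_seq):
            first_stay[stay.patient_id] = stay
    return [
        s for s in first_stay.values()
        if s.has_sah_code and s.age > 16 and s.gcs is not None
    ]
```

Applying the criteria after choosing the first stay means that a patient whose first stay fails a criterion is excluded entirely, rather than being represented by a later stay.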

Predictors
Data extracted from MIMIC-IV included age, gender, race, language, GCS, sepsis-related organ failure assessment (SOFA) score, and history of trauma. We then extracted vital signs, laboratory findings, and heparin and antibiotic treatment during hospitalization. In addition, we collected the components of the Charlson comorbidity index (CCI): myocardial infarction, congestive heart failure, peripheral vascular disease, cerebrovascular disease, dementia, chronic pulmonary disease, rheumatic disease, peptic ulcer disease, diabetes, paraplegia, renal disease, malignant cancer, severe liver disease, metastatic solid tumor, and acquired immunodeficiency syndrome (AIDS).

Outcomes
The primary outcome was in-hospital mortality among patients diagnosed with subarachnoid hemorrhage.

Statistical Analysis
Categorical variables were presented as numbers and percentages and compared using the χ² test or the Fisher exact test, while continuous variables were expressed as mean ± SD or median with interquartile range (IQR) and compared using an independent t-test or the Mann-Whitney U test.
Each feature has an importance or coefficient attribute in a fitted model, and these values determine the importance of the feature in that model. Recursive feature elimination (RFE) obtains the importance of each feature from the learner (8,9) and then removes the least important feature from the current feature set. This step is repeated recursively on the remaining features until the required number of features is reached. Feature subsets of 5 to 60 features were then evaluated, with features ordered by the ranking obtained from the feature selection method. To find the best hyperparameters, 10-fold cross-validation was used as the resampling method. In each iteration, nine folds were used as the training subset and the remaining fold was used to tune the hyperparameters. In this way, every sample takes part in both training and testing, so that the data are used to the greatest extent.
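The elimination loop described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used in the study (which relied on the "caret" R package): the learner here is a small logistic regression fitted by gradient descent, importance is taken as the absolute coefficient of each standardized feature, and the function names, step count, and learning rate are all hypothetical choices.

```python
import math
from statistics import mean, pstdev


def standardize(column):
    """Scale a feature to zero mean and unit SD so coefficients are comparable."""
    mu, sd = mean(column), pstdev(column)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in column]


def fit_logistic(columns, y, steps=500, lr=0.1):
    """Assumed learner: logistic regression fitted by plain gradient descent.
    Returns one weight per feature column."""
    n = len(y)
    w = [0.0] * len(columns)
    b = 0.0
    for _ in range(steps):
        z = [b + sum(wj * col[i] for wj, col in zip(w, columns)) for i in range(n)]
        err = [1.0 / (1.0 + math.exp(-zi)) - yi for zi, yi in zip(z, y)]
        w = [wj - lr * sum(e * col[i] for i, e in enumerate(err)) / n
             for wj, col in zip(w, columns)]
        b -= lr * sum(err) / n
    return w


def rfe(data, y, n_keep):
    """data maps feature name -> list of values.
    Refit, drop the least important feature, and repeat until n_keep remain."""
    remaining = {name: standardize(col) for name, col in data.items()}
    eliminated = []
    while len(remaining) > n_keep:
        names = list(remaining)
        weights = fit_logistic([remaining[f] for f in names], y)
        importance = {f: abs(wt) for f, wt in zip(names, weights)}
        worst = min(names, key=importance.get)
        eliminated.append(worst)
        del remaining[worst]
    return list(remaining), eliminated
```

The key point is that the learner is refitted after every removal, so the importance of each surviving feature is re-estimated in the context of the features that remain. In practice this loop would be wrapped in cross-validation to choose the number of features to keep.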
In this study, we divided the data set into training and validation sets (ratio 70:30), trained the models, and validated them. We calculated the median and 95% confidence interval of the area under the curve (AUC), where an AUC of 1.0 indicates perfect discrimination and 0.5 indicates no discrimination. Finally, the accuracy, sensitivity, specificity, negative predictive value, and positive predictive value in the validation set were calculated. Additionally, we conducted decision curve analysis (DCA) to determine the clinical usefulness of the included variables by quantifying the net benefit at different threshold probabilities. All analyses were performed with the statistical software R version 4.1.3 (http://www.R-project.org, The R Foundation), and the "caret" R package was used to implement the modeling process. P values less than 0.05 (two-sided test) were considered statistically significant.
FIGURE 3 | Area under the receiver operating characteristic curve for the different Model1 algorithms in the validation cohort. LR, logistic regression; RF, random forest; GBM, gradient boosting machine; NNET, artificial neural network; SVM, support vector machine; XGB, eXtreme gradient boosting; Ada, adaptive boosting; NB, naïve Bayes. Model1 was adjusted for GCS, glucose, sodium, chloride, SpO2, bicarbonate, temperature, white blood cell (WBC), heparin use, and SOFA score.
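As an illustration of the performance measures named in the statistical analysis, the following Python sketch computes the AUC as the probability that a randomly chosen death receives a higher predicted risk than a randomly chosen survivor (ties count one half), together with the threshold-based metrics. The function names and the default threshold of 0.5 are assumptions made for this example; the study itself used R.

```python
def auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative cases.
    1.0 means perfect discrimination; 0.5 means no discrimination."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def _ratio(a, b):
    # Undefined ratios (zero denominator) are reported as NaN.
    return a / b if b else float("nan")


def confusion_metrics(y_true, scores, threshold=0.5):
    """Accuracy, sensitivity, specificity, PPV, and NPV at one risk threshold."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }
```

The pairwise AUC is quadratic in the number of cases, which is acceptable for a cohort of this size; a rank-based formula gives the same value more efficiently.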

Baseline Characteristics
The characteristics of the SAH patients in MIMIC-IV were analyzed. A total of 1,787 cases were included in the study, of which 349 died during hospitalization. The data show that, in patients who died, infection indexes were significantly elevated and that abnormal coagulation function, thrombocytopenia, and electrolyte disorders were present. In addition, temperature and oxygen saturation fluctuated more widely in these patients, who were also more likely to have comorbid diseases (Table 1 and Figure 1).

Variable Importance
Feature screening with the RFE algorithm showed that accuracy was highest when 20 features were included (Figure 2). To simplify the model further, we also selected a smaller feature set whose accuracy was similar to that of the best feature number. We therefore established prediction models with 10 and 20 features. Model1 includes GCS, glucose, sodium, chloride, SpO2, bicarbonate, temperature, white blood cell (WBC), heparin use, and SOFA score, while Model2 includes GCS, glucose, sodium, chloride, SpO2, bicarbonate, temperature, white blood cell (WBC), heparin use, SOFA score, creatinine, blood urea nitrogen (BUN), platelets, age, marital status, trauma, lymphocytes, calcium, race, and cerebrovascular disease. These variables were used in all subsequent analyses for all models in both the training and testing sets.

Prediction Performance of Different Models
We used 10 features and 20 features to establish the traditional regression and ML models, respectively (Table 3 and Figure 4). The DeLong test showed that, for the LR, NNET, and XGB algorithms, the two models differed significantly (Table 4). Comparatively, RF-Model1 had the highest predictive performance among these models. The decision curve is suitable for comparing the net benefits of the best model and alternative methods of clinical decision-making. In both models, the net benefit of the GBM algorithm was higher than that of the other algorithms, indicating that it performs better in predicting in-hospital mortality after SAH (Figures 5, 6).
In the importance ranking from the ML algorithm, the 10 most important features were the same in the two RF models (Figure 7). Moreover, GCS had the highest importance.
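For readers unfamiliar with decision curves, the net benefit plotted at each threshold probability follows the standard definition: true positives per patient minus false positives per patient, with false positives weighted by the odds of the threshold. The Python sketch below is illustrative only; the function names are assumptions, and it is not the code used to produce Figures 5 and 6.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold.
    NB = TP/N - FP/N * threshold / (1 - threshold), for 0 < threshold < 1."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: every patient is treated as high risk."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


def decision_curve(y_true, probs, thresholds):
    """Net benefit of the model and of treat-all at each threshold.
    Treat-none always has a net benefit of 0."""
    return [
        (t, net_benefit(y_true, probs, t), net_benefit_treat_all(y_true, t))
        for t in thresholds
    ]
```

A model is clinically useful at a given threshold when its net benefit exceeds both the treat-all curve and zero (treat-none); comparing algorithms amounts to comparing these curves over the clinically relevant threshold range.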

Performance of Models in Subgroup (Non-Traumatic Subarachnoid Hemorrhage) Analysis
To verify the predictive ability of the model in non-traumatic subarachnoid hemorrhage, we took the cases without definite trauma as a new subgroup (Table 5) and divided them into a training set and a test set (ratio 70:30). After the model was established with the simplified feature set in the training set, its predictive ability was verified in the test set. The LR, RF, GBM, NNET, SVM, XGB, Ada, and NB models were established in the training set, and they achieved AUCs of 0.909, 0.951, 0.955, 0.891, 0.929, 0.956, 0.947, and 0.921, respectively, in the test set (Table 6 and Figure 8). Among the eight models, XGB and GBM had the highest AUCs (0.956 and 0.955), and NNET had the lowest (0.891). As shown in Figure 9, the net benefit of the GBM model exceeded that of the other ML models and the LR model, indicating that it performs better in predicting outcomes in this cohort.

DISCUSSION
Subarachnoid hemorrhage (SAH) has a high mortality and disability rate, and many complications may occur after onset. However, most current studies have used a single feature for prognostic research, ignoring adverse outcomes caused by other factors. Recently, a large number of studies have reported that peripheral blood, biochemical, and other routine indicators are associated with the prognosis of subarachnoid hemorrhage, so we used the indicators commonly available in the MIMIC database for model building.
In this study, we used RFE to screen important features. After simplifying the model, we used traditional logistic regression and ML algorithms for modeling. There was essentially no significant difference in predictive ability between the simplified models and the best models. At the same time, the simplified models can reduce overfitting and are more suitable for clinical use because they reduce unnecessary workload. In the subgroup analysis, the model established with the same features had higher AUCs, which suggests that the model is better able to predict the prognosis of patients with non-traumatic subarachnoid hemorrhage.
FIGURE 6 | Decision curve analysis of Model2. LR, logistic regression; RF, random forest; GBM, gradient boosting machine; NNET, artificial neural network; SVM, support vector machine; XGB, eXtreme gradient boosting; Ada, adaptive boosting; NB, naïve Bayes. Model2 was adjusted for GCS, glucose, sodium, chloride, SpO2, bicarbonate, temperature, white blood cell (WBC), heparin use, SOFA score, creatinine, BUN, platelets, age, marital status, trauma, lymphocytes, calcium, race, and cerebrovascular disease.
In this study, we found that, in addition to the traditional GCS, mortality was strongly associated with patients' electrolyte levels, glucose levels, and heparin use. The SOFA score was also a significant mortality factor; this score mainly describes impairment of multiple organ functions (10) (respiratory, neurological, cardiovascular, hepatic, coagulation, and renal). The underlying mechanism may be organ failure resulting from the patient's past medical history or coagulopathy due to bleeding. Impaired consciousness occurs in some patients after SAH. The GCS, which is assessed through eye opening, best verbal response, and best motor response, can easily and rapidly assess a patient's state of consciousness and help identify the development of complications and the potential degree of ultimate recovery (11). In our study, glucose level was also an important factor in predicting death. Pappacena et al. found that higher blood glucose was associated with higher mortality after SAH (12). Greater glycemic variability was also associated with prognosis after SAH (13).
Recently, many studies have reported that the neutrophil-to-lymphocyte ratio (NLR) correlates with the prognosis of SAH (14), so we also calculated NLR as a feature. In univariate analysis, NLR differed clearly between the two groups, but after filtering by the ML algorithms it was not retained in the model, perhaps because of inconsistent outcomes across studies. The higher importance of leukocytes is consistent with the findings of Srinivasan et al. and Chamling et al. that early elevation of peripheral leukocytes is associated with the occurrence of delayed cerebral ischemia (DCI) and poor functional outcomes (15).
Sodium and chloride are important electrolytes in humans, and 36% of SAH patients present with hyponatremia after onset, mainly as a result of cerebral salt-wasting syndrome (CSWS) and syndrome of inappropriate antidiuretic hormone secretion (SIADH). Vrsajkov et al. and Saramma et al. found better outcomes in patients who did not develop hyponatremia during ICU treatment (16,17). Hyponatremia has also been reported to be associated with an increased risk of vasospasm, which may be the main reason for the poor prognosis of these patients (18).
Low bicarbonate concentrations occur in patients with severe acute illness. Although the mechanism is currently unknown, increased systemic vascular resistance can occur after SAH, leading to transient lactic acidosis with the formation of neurogenic pulmonary edema and resulting in poor patient outcomes, as reported in a case study (19). Satoh et al. found that patients presenting with neurogenic pulmonary edema had lower bicarbonate concentrations (20). In addition, Stephan et al. found that one in five patients had abnormally low bicarbonate levels on admission and a poor prognosis (21).
FIGURE 7 | Variable importance in RF models. RF, random forest. Model1 was adjusted for GCS, glucose, sodium, chloride, SpO2, bicarbonate, temperature, white blood cell (WBC), heparin use, and SOFA score. Model2 was adjusted for GCS, glucose, sodium, chloride, SpO2, bicarbonate, temperature, white blood cell (WBC), heparin use, SOFA score, creatinine, BUN, platelets, age, marital status, trauma, lymphocytes, calcium, race, and cerebrovascular disease.
Our study found that the use of heparin in SAH patients was associated with better outcomes, which is consistent with the findings of Post et al. (22) that the use of heparin was able to reduce mortality after SAH. The concomitant use of low-dose heparin may reduce the risk of thrombosis and the poor prognosis resulting from thrombus dislodgement (23,24).
In summary, the features screened by RFE in our study have all been investigated in SAH, and all were correlated with prognosis. The strength of this study is that ML methods were used to combine the relevant factors to predict mortality after SAH, while the features are simple to obtain and can be acquired even in a smaller hospital. Patients with SAH are often critically ill, and early and accurate prediction of mortality gives clinicians more time to adjust treatment options; at the same time, in clinical work, related diseases should also receive further treatment. In addition, a validation set was used in this study to verify the reliability of the model. Finally, most of the data in this study come from publicly available databases, and these data have good reliability.
Our study has limitations, which are similar to those of most studies based on public databases. First, the MIMIC database does not provide the imaging examinations of cases. Therefore, we could not include the modified Fisher score in the model, nor could we evaluate whether patients had obvious trauma or determine the nature of their aneurysms.
Second, as with any public database, errors may arise from the database itself or from researchers when extracting data. In addition, SAH may have been misclassified. To reduce the bias caused by inaccurate coding, we adopted the widely used ICD-9 and ICD-10 codes. Third, as with all retrospective studies, there may be unmeasured confounding factors and selection bias. Finally, although our study explored the mortality of SAH in the intensive care unit, other outcomes, such as long-term prognosis and complications, also need further study.

CONCLUSION
This study suggests that several important features may be related to prognosis after SAH. ML models can handle a large number of variables and identify patients at risk of dying in hospital, which may promote timely and effective treatment. Further verification of their clinical application value will be necessary.

DATA AVAILABILITY STATEMENT
Publicly available data sets were analyzed in this study. These data can be found here: https://mimic.mit.edu/.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Massachusetts Institute of Technology (Cambridge, Massachusetts). Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.