The predictors of death within 1 year in acute ischemic stroke patients based on machine learning

Objective To explore the predictors of death in acute ischemic stroke (AIS) patients within 1 year based on machine learning (ML) algorithms. Methods This study retrospectively analyzed the clinical data of patients hospitalized and diagnosed with AIS in the Second Affiliated Hospital of Xuzhou Medical University between August 2017 and July 2019. The patients were randomly divided into training and validation sets at a ratio of 7:3, and the clinical characteristic variables of the patients were screened using univariate and multivariate logistic regression. Six ML algorithms, including logistic regression (LR), gradient boosting machine (GBM), extreme gradient boosting (XGB), random forest (RF), decision tree (DT), and naive Bayes classifier (NBC), were applied to develop models to predict death in AIS patients within 1 year. During training, a 10-fold cross-validation approach was used to validate the training set internally, and the models were interpreted using importance ranking and the SHapley Additive exPlanations (SHAP) principle. The validation set was used to externally validate the models. Ultimately, the highest-performing model was selected to build a web-based calculator. Results Multivariate logistic regression analysis revealed that C-reactive protein (CRP), homocysteine (HCY) levels, stroke severity (SS), and the number of stroke lesions (NOS) were independent risk factors for death within 1 year in patients with AIS. The area under the curve value of the XGB model was 0.846, which was the highest among the six ML algorithms. Therefore, we built an ML network calculator (https://mlmedicine-de-stroke-de-stroke-m5pijk.streamlitapp.com/) based on XGB to predict death in AIS patients within 1 year. Conclusions The network calculator based on the XGB model developed in this study can help clinicians make more personalized and rational clinical decisions.


. Introduction
Acute ischemic stroke (AIS) is a disease caused by the occlusion of cerebral arteries, accompanied by brain tissue infarction and neuronal cell damage, causing severe trauma to the body. AIS is the leading cause of disability in adults and the primary cause of human death worldwide (1,2). In 2019, there were 7,630,800 cases of AIS globally, an 87.55% increase compared to the previous 30 years. The high morbidity, mortality, and disability rates associated with AIS impose a severe economic burden on society and families (3). Several factors may have a significant impact on the pathogenesis and prognosis of patients with AIS. Among them, the immune inflammatory response during AIS development, which involves different pathways and sources of activated inflammatory factors, is an important regulator of stroke progression, post-stroke damage, cerebral function repair and death (4)(5)(6). Approximately 10% of AIS patients experience a fatal event within 1 year (7). There is an urgent need to identify early and effective predictors of death within 1 year after the onset of AIS. The construction of a model of death prediction in stroke patients within 1 year could provide clinicians with a reliable tool to assess the condition of their patients. However, there are few reports in this area.
ML-assisted clinical decision-making and analysis have been widely used in clinical settings (8)(9)(10)(11), especially in the screening phase of big data feature variables (12,13). The superior performance demonstrated by ML algorithms in medical big data makes it possible to obtain better predictive tools than traditional statistical models under certain conditions. However, few studies have been conducted to screen the risk factors of death in AIS patients within 1 year using ML algorithms.
Therefore, this study aimed to develop and validate an interpretable ML model that used clinically relevant variables to predict death within 1 year in AIS patients and to construct an easy-to-use web calculator as a convenient and practical predictive tool for clinical practitioners to provide valid information for AIS patients.
. Materials and methods

. . Subjects
Patients who were hospitalized in the Department of Neurology of the Second Affiliated Hospital of Xuzhou Medical University and diagnosed with AIS between August 2017 and July 2019 were retrospectively analyzed. A total of 677 patients with AIS were included in this study, 32 of whom died after admission and during follow-up.

. . Inclusion and exclusion criteria
The inclusion criteria were a diagnosis of AIS in accordance with the World Health Organization criteria, and the time between onset and hospital admission did not exceed 24 h. The exclusion criteria were: (1) incomplete clinical data, (2) those with severely abnormal organ function, (3) inadequate ancillary investigations, (4)

. . . Statistical methods
This study used R version 4.0.5 software for data processing and statistical analyses. Continuous variables are expressed as the median and interquartile range (IQR), while categorical variables are presented as frequencies (percentage, %). The continuous variables were compared by independent samples t-tests and the categorical variables were compared using χ²-tests. The relationship between the independent and dependent variables was also required to be clinically meaningful, and two-sided P-values of < 0.05 were considered statistically significant.

. . . Modeling of machine learning algorithms
Univariate and multivariate logistic regression analyses were used to assess the risk factors of death within 1 year in the training group study population. The odds ratio (OR) and 95% confidence interval (CI) were calculated, with an OR of > 1 indicating that the variable was a risk factor, and P < 0.05 considered to indicate a statistically significant difference. Then, the factors that were significant in both univariate and multivariate logistic regression were included and subjected to stepwise regression analysis. The factors selected by stepwise regression were used as input variables to construct ML models. The ML algorithm process was based on Python (V3.7) software and the scikit-learn (version 0.24) library. First, the original dataset was randomly divided into training and test sets at a ratio of 7:3. Then, six ML algorithms [logistic regression (LR), gradient boosting machine (GBM), extreme gradient boosting (XGB), random forest (RF), decision tree (DT), and naive Bayes classifier (NBC)] were used to analyze the data and construct the models. To validate the predictive power of the models, the 10-fold cross-validation method was used for internal validation within the training group. The random search method was used to adjust the hyperparameters of the models.
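The data partitioning described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the scikit-learn code used in the study; the function names, the seed, and the rounding of the 7:3 split to 474 training patients are assumptions made for the example.

```python
import random


def train_test_split_indices(n_patients, train_fraction=0.7, seed=0):
    """Randomly split patient indices into a training and a test part."""
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)        # reproducible shuffle (seed is arbitrary)
    n_train = round(n_patients * train_fraction)
    return indices[:n_train], indices[n_train:]


def k_fold_indices(indices, k=10):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]   # k nearly equal-sized folds
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# 677 patients as in the study; the split itself is illustrative
train_idx, test_idx = train_test_split_indices(677)
cv_splits = list(k_fold_indices(train_idx, k=10))
```

Each of the ten validation folds is held out exactly once while the model is fitted on the other nine, and the test indices are never touched during cross-validation or hyperparameter search.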
In the test group, the area under the receiver operating characteristic curve (ROC-AUC), classification accuracy, recall, specificity, and F1 score were used to evaluate the prediction models. We also plotted the precision-recall curve (PRC) as a complementary metric to evaluate the model performance.
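For reference, the evaluation metrics named above can be computed as in the following sketch. The labels and predicted probabilities are invented toy values, the 0.5 decision threshold is an assumption, and the function names are hypothetical; the study's own figures came from its scikit-learn pipeline.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, recall (sensitivity), specificity and F1 from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a death is scored above a survivor."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = died within 1 year, 0 = survived (values are made up)
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 0, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.40, 0.70, 0.15, 0.90]
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

With a rare outcome such as death within 1 year, accuracy and specificity can stay high even when few deaths are detected, which is why recall, F1 and the PRC are reported alongside the AUC.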

. . . Interpretation of the model and importance of features
To illustrate the risk factors of death within 1 year in AIS patients, SHapley Additive exPlanations (SHAP) analysis was used to interpret the predictive models and rank the features in terms of importance. SHAP analysis applies the Shapley value, a concept introduced by Lloyd Shapley in cooperative game theory, to explain the output of machine learning models. The core idea is to calculate the marginal contribution of a feature when it is added to the model, and then to interpret the "black box model" at both the global and local levels through an additive explanatory model (14,15). That is, it assigns a contribution value to each feature and thereby evaluates and visualizes the contribution of each feature to the outcome of the machine learning model (16). Ultimately, a web-based calculator based on the best-performing model was created for inputting patient data to facilitate the clinicians' assessment of death within 1 year in AIS patients.
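The marginal-contribution idea behind SHAP can be illustrated with an exact Shapley computation on a toy model. Everything in this sketch is hypothetical: the risk function, its weights, and the patient values are invented for illustration and are not the fitted XGB model. Absent features are simply replaced by a fixed baseline, whereas the SHAP library averages over background data and uses tree-specific algorithms.

```python
from itertools import permutations

FEATURES = ["SS", "CRP", "HCY", "NOS"]


def toy_risk(x):
    """Hypothetical risk score with one interaction term (not the study's model)."""
    return (0.05 + 0.30 * x["SS"] + 0.10 * x["CRP"] + 0.05 * x["HCY"]
            + 0.15 * x["NOS"] + 0.10 * x["SS"] * x["NOS"])


def shapley_values(model, patient, baseline):
    """Average each feature's marginal contribution over all feature orderings."""
    totals = {name: 0.0 for name in FEATURES}
    orderings = list(permutations(FEATURES))
    for order in orderings:
        current = dict(baseline)               # start from the reference patient
        previous_output = model(current)
        for name in order:
            current[name] = patient[name]      # "add" this feature to the model
            new_output = model(current)
            totals[name] += new_output - previous_output
            previous_output = new_output
    return {name: total / len(orderings) for name, total in totals.items()}


baseline = {"SS": 0, "CRP": 0, "HCY": 0, "NOS": 0}   # all risk factors absent
patient = {"SS": 1, "CRP": 1, "HCY": 0, "NOS": 1}    # made-up patient

phi = shapley_values(toy_risk, patient, baseline)
```

The contributions are additive: they sum to the difference between the patient's predicted risk and the baseline prediction, and the interaction between SS and NOS is shared equally between those two features.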

. . Baseline patient data characteristics
In this study, clinical information was collected on 677 AIS patients, of whom 645 survived and 32 died of AIS (Table 1).

. . Univariate and multivariate regression analysis of death within 1 year in AIS patients
In the univariate regression analysis of death within 1 year in AIS patients (Table 2), there was a statistically significant difference (P < 0.05) in the overall population for death within 1 year according to NOS, FBG, HBALC, MB, and CRP levels, as well as anticoagulation therapy, PPI treatment, and SS.

. . Machine learning model building and validation
To compare the predictive performance of the six ML algorithm models, this study performed 10-fold cross-validation within the training group. The results are shown in Figure 1. Figure 2 shows the ROC curves of the six ML algorithm models after external validation. Figure 3 shows a radar plot, which gives a clear and intuitive overview suited to comprehensive evaluation because it displays the AUC value, accuracy, recall, and F1 value of each model together (Figure 3, Table 3). The PRC curves of the mortality prediction model are shown in Supplementary Figure 1.
The results suggest that the XGB model performed best in predicting death within 1 year in AIS patients after a comprehensive evaluation. The remaining models were ranked in descending order according to their predictive performance.
In summary, we finally adopted the XGB model as the preferred predictive model.

. . Relative importance of variables in ML algorithms
A SHAP interpretability study was used to analyze the results of the ML models. Generally, the higher the SHAP value of a feature, the higher the probability of the occurrence of the target event. In SHAP analysis, red represents the feature values with positive impact on the model and blue represents the feature values with negative impact on the model (17). The results of the study suggest that SS was the most important variable, followed by CRP, HCY, and NOS in descending order of importance, as shown in Figure 4.

. . The web calculator
A web-based calculator based on the XGB model was developed in this study. By entering the clinical characteristic variables of a patient with AIS, clinicians could predict their risk of death within 1 year (https://mlmedicine-de-stroke-de-stroke-m5pijk.streamlitapp.com/; Figure 5).

. Discussion
In this study, we retrospectively analyzed the clinical data of AIS patients and developed a web-based calculator with ML algorithms to predict the risk of death within 1 year. The accuracy and rationality of the model were validated by 10-fold cross-validation, allowing the model to be used for clinical practice to help clinicians make more rational treatment decisions.
ML is an emerging field of medicine that has demonstrated an extraordinary ability to handle large, complicated, and disparate data, and is the future of biomedical research, personalized medicine, and computer-aided diagnosis. It holds the promise of significantly advancing global healthcare (18,19). Unlike traditional predictive models, ML is very good at discovering complex structures in selected variables in high-dimensional data and can easily combine a large number of variables (20,21). ML has been reported to improve the predictive accuracy of long-term prognoses for AIS patients (8,10).
In this study, six ML methods were used to analyze and construct a model of death prediction within 1 year in AIS patients, and the performance of the six ML algorithms was compared. The XGB algorithm performed best (Figure 1), with a better AUC value than the other five algorithmic models, and the highest accuracy, sensitivity, and F1 score. Therefore, the XGB algorithm model was finally chosen.
ML models are often considered to be a black box in which it is difficult to explain the predictive performance, so it becomes extremely important to study the interpretability of machine learning models. Therefore, this study introduced SHAP analysis, a method for interpreting various black-box ML models whose interpretability performance has been previously validated. It can achieve both local and global interpretability and has a solid theoretical foundation compared to other methods (22). The SHAP analysis used in this study could interpret the model prediction results well, and its intuitive visualization is more easily accepted. This study further built a web-based calculator to estimate the probability of death within 1 year in AIS patients to make better use of the model. AIS is characterized by a high morbidity rate, which increases the economic burden on society and families (23). It is important to explore the factors influencing the risk of death within 1 year for patients. In this study, the mortality rate of AIS patients within 1 year was only 4.7% (32/677), which was lower than the 10% reported in previous studies (7), probably because of the exclusion of those whose families discontinued treatment for various reasons. Previously, an 8-point scoring system was constructed to predict the risk of death within 7 days of hospitalization (24). Factors influencing death within 6 months of stroke onset were also reported, with variables such as the Barthel index and platelet/lymphocyte ratio screened by LASSO regression and multiple logistic regression (25). A 30-year stroke burden predictive model was also established (26). Unlike many previous studies, this study used machine learning algorithms to screen variables and, to our knowledge, was the first to develop a predictive model using machine learning algorithms to assess the probability of death within 1 year in patients with AIS.
There is a growing body of research on the relationship between serum inflammatory biomarkers and AIS. A number of studies showed that AIS could induce an inflammatory response, which plays a major role in late ischemic damage to the brain parenchyma, and that inflammatory responses caused by various clinical factors could lead to an increase in inflammatory factors (27,28). CRP is an inflammatory factor that, when elevated, can indirectly indicate the presence of pathogenic microorganisms in patients, which can help the physician in diagnosis and treatment. In this study, we concluded that CRP levels were an important predictor of death within 1 year in AIS patients. Elevated CRP levels were previously reported to reflect the severity of AIS, correlate with stroke subtype and risk stratification (27,28), and be an independent predictor of long-term mortality after ischemic stroke (29). Elevated CRP levels can lead to increased mortality after stroke, which may be related to inflammation-induced endothelial cell dysfunction and platelet activation (30). HCY is a sulfur-containing non-essential amino acid produced by metabolism in vivo as a derivative of methionine cycle demethylation. It is also an inflammatory substance that induces the activation of nuclear factor (NF)-κB, which is a transcription factor common to inflammation and the immune response. Elevated levels of HCY are associated with a variety of diseases and may lead to endothelial dysfunction, neurotoxicity, and the upregulation of thrombogenic factors. At the same time, monitoring HCY levels may provide a good indication of the development of related diseases (31). Previous studies also showed that elevated HCY levels were associated with AIS dysfunction and recurrent stroke (32). A multicenter study suggested that high levels of serum HCY were an independent predictor of early neurological deterioration in AIS patients (33). This study concluded that HCY levels significantly influenced the risk of death within 1 year in AIS patients. The risk of death in patients with high HCY was 1.29 times (95% CI: 1.16-1.45) that of patients with normal HCY. The NIHSS is a common scale used in neurology as a quantitative indicator of disease severity (34). The present study classified SS with the help of the NIHSS scale, and an NIHSS score of ≥9 was defined as moderate-to-severe stroke. Fischer et al. (35) suggested that patients with low NIHSS scores tended to have a better prognosis, which is consistent with the current study. The present study concluded that SS had a significant influence on death within 1 year in AIS patients. The risk of death in patients with moderate-to-severe stroke was 3.12 times (95% CI: 1.03-9.83) that of patients with mild stroke.
Neurological deficits have been associated with lesions in different brain regions (36, 37), but the relationship between the number of lesions and AIS has rarely been reported. In this study, the number of lesions was included in the analysis, and the results suggested that the number of lesions was a significant factor in death within 1 year in AIS patients. The risk of death in patients with multiple lesions was 3.44 times (95% CI: 1.41-8.36) that of patients with a single lesion.
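The odds ratios and confidence intervals quoted above can be related back to the underlying logistic regression coefficients. The following sketch is illustrative only: it assumes the intervals are Wald-type intervals of the form exp(β ± 1.96·SE), which the text does not state, and the function name is hypothetical.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def coefficient_from_or_ci(ci_low, ci_high, z=Z_95):
    """Recover (beta, standard error) from a Wald-type CI on the odds-ratio scale."""
    log_low, log_high = math.log(ci_low), math.log(ci_high)
    beta = (log_low + log_high) / 2       # interval is symmetric around beta on the log scale
    se = (log_high - log_low) / (2 * z)
    return beta, se


# Reported for multiple vs. single lesions: OR 3.44 (95% CI 1.41-8.36)
beta, se = coefficient_from_or_ci(1.41, 8.36)
implied_or = math.exp(beta)               # expected to land close to the reported 3.44
ci_low = math.exp(beta - Z_95 * se)
ci_high = math.exp(beta + Z_95 * se)
```

Under this assumption the geometric midpoint of the interval reproduces the reported odds ratio to within rounding, which is a quick consistency check when reading such tables.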
There were several limitations to this study. First, the retrospective study design may have introduced selection bias, while the data imbalance that emerged from real-world studies resulted in PRC effects without the AUC number. Secondly, although our model showed good performance, its data source was limited to one medical center, which may limit its generalizability, and we will follow up with an additional multicenter study. Thirdly, further independent external validation is needed to confirm these findings. Finally, we collected AIS-related variables as comprehensively as possible, but there were still some important variables that were not available in a timely manner, which may also limit the generalizability of the study. Future research is needed to examine this issue further.

. Conclusion
The results of this study suggest that serum inflammatory markers (CRP and HCY), SS, and NOS are independent risk factors of death within 1 year in AIS patients. The XGB algorithm showed good performance as a tool to predict death within 1 year in AIS patients. This web-based calculator may help identify patients at high risk of death and assist physicians in making treatment decisions.

Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement
The study was approved by the Ethics Committee of the Second Affiliated Hospital of Xuzhou Medical University [ethics number: [2020] 081603]. The patients/participants provided their written informed consent to participate in this study.

Author contributions
WLi, LR, and XW completed the study design. KW, WLiu, and WLi performed the study and collected and analyzed the data.
LG and WLi drafted the manuscript. LR, XW, KW, and HL provided the expert consultations and suggestions. CX and CY conceived of the study, participated in its design and coordination, and helped to polish the language. All authors reviewed the final version of the manuscript.