The predictive performance of artificial intelligence on the outcome of stroke: a systematic review and meta-analysis

Objectives This study aimed to assess the accuracy of artificial intelligence (AI) models in predicting the prognosis of stroke. Methods We searched the PubMed, Embase, and Web of Science databases from inception to February 2023 to identify studies that used AI to predict the prognosis of acute stroke. Eligible studies were cohort studies with complete data. We used the Quality Assessment of Diagnostic Accuracy Studies tool to assess the quality and risk of bias of the included studies and used a random-effects model to summarize and analyze the data. We used the area under the curve (AUC) as the indicator of the predictive accuracy of AI models. Results We retrieved a total of 1,241 publications and finally included seven studies. The seven included studies had a low risk of bias and no significant heterogeneity. The total pooled AUC under the fixed-effects model was 0.872 (95% CI 0.862–0.881). By subgroup, the pooled AUC was 0.888 (95% CI 0.872–0.904) for deep learning (DL), 0.852 (95% CI 0.835–0.869) for logistic regression (LR), 0.863 (95% CI 0.845–0.882) for random forest (RF), 0.905 (95% CI 0.857–0.952) for support vector machine (SVM), and 0.905 (95% CI 0.805–1.000) for XGBoost. Conclusion In our study, AI models predicted the outcomes of ischemic stroke with good accuracy. Such models could serve as an assistive tool for physicians in judging the outcomes of stroke patients. With updated AI algorithms and the use of big data, future AI predictive models are expected to perform better.

Frontiers in Neuroscience frontiersin.org
2015; Esteva et al., 2019). ML techniques use various methods for automated data analysis, including logistic regression (LR), random forests (RF), support vector machines (SVM), and classification trees, which allow features (data characteristics) to be combined with flexible, non-linear decision boundaries. The advent of neural networks (NN) and deep learning (DL) has changed the ML domain by enabling automatic and efficient feature recognition and processing in hidden network layers without prior feature selection. Some studies suggest that ML and DL have recently achieved substantial improvements and demonstrated performance comparable to that of trained physicians in other specialties, such as radiology and dermatology (Gulshan et al., 2016; Esteva et al., 2017; Hannun et al., 2019).
Acute stroke ranks among the leading causes of morbidity and mortality worldwide and can be divided into ischemic stroke and hemorrhagic stroke (Toyoda et al., 2022). Predicting the outcome of a stroke often depends on the physician's clinical experience, which makes it difficult for inexperienced young physicians to judge the prognosis. In clinical practice, patients are most concerned about their outcomes. Imagine that you are a young, inexperienced physician on duty in the ward: your patient asks about the expected outcome after treatment, and you cannot be sure that a judgment based on your own experience is right. If an objective tool were at hand to predict the prognosis quickly and accurately from the patient's condition, using it to corroborate your judgment would make you more confident in judging the prognosis of your patient. AI predictive models can provide exactly such objective results after learning from input features and performing countless calculations. ML models based on image feature recognition and segmentation have greatly facilitated the rapid diagnosis of stroke, but stroke prognosis depends on a large number of patient-specific and clinical factors, so accurate prognostic prediction remains challenging (Mendelson and Prabhakaran, 2021; Toyoda et al., 2022).
Although previous studies on predicting stroke prognosis have used many AI algorithms, the reported accuracy of AI models is inconsistent. Tree-based algorithms offer favorable interpretability and relative simplicity, and researchers using RF algorithms achieved high accuracy in predicting the prognosis of acute ischemic stroke patients (AUC = 0.936 ± 0.034) and primary intracerebral hemorrhage patients (AUC = 0.917) (Monteiro et al., 2018; Wang et al., 2019). However, the samples in these studies were not "big" datasets and were relatively unrepresentative. SVM is frequently used to predict stroke outcomes from neuroimaging data and shows moderate to high accuracy, with AUCs ranging from 0.788 to 0.92 (Forkert et al., 2015; Giacalone et al., 2018; Lin et al., 2018; Nishi et al., 2019; Li et al., 2022; Roh et al., 2022). The most complex algorithm at present, the deep neural network (DNN), a DL model, was applied to a larger sample and achieved high accuracy (AUC = 0.904) in minor stroke patients (Sung et al., 2020) and moderate accuracy (AUC = 0.88 ± 0.12 and 0.888 ± 0.008) in acute ischemic stroke (Nielsen et al., 2018; Heo et al., 2019), but it offered poor interpretability of input variables.
This study aimed to analyze the literature to explore the accuracy of AI models in stroke outcome prediction and compare the AUC among different algorithms.

Methods
We performed this study according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (McInnes et al., 2018).

Selection criteria
We searched PubMed, Embase, and Web of Science from database inception to February 2023 using the terms "acute stroke," "artificial intelligence," "deep learning," "machine learning," "prognosis," and "outcome," and assessed eligibility by screening titles and abstracts followed by full-text evaluation (Figure 1). We defined our inclusion criteria as follows: (1) Population: patients diagnosed with acute ischemic or hemorrhagic stroke in retrospective or prospective cohorts with available prognosis data. The prognosis of stroke included not only functional outcomes but also radiological outcomes, the likelihood of morbidity, and mortality (Toyoda et al., 2022). (2) Index test: the prognosis predicted by the AI model. (3) Reference standard: the recognized prognosis recorded in the included studies. (4) Outcomes: the area under the receiver operating characteristic curve (AUC) of AI models for stroke prognosis, with its 95% confidence interval or standard error.
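Because an AUC may be reported with either a 95% confidence interval or a standard error, the two forms need a common scale before pooling. The following is a minimal sketch of that conversion, assuming a symmetric normal-approximation interval (an assumption; individual studies may have constructed their intervals differently). The function names are illustrative and do not come from the study.

```python
from statistics import NormalDist


def se_from_ci(lower: float, upper: float, level: float = 0.95) -> float:
    """Back-calculate a standard error from a symmetric normal-approximation CI."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return (upper - lower) / (2 * z)


def ci_from_se(estimate: float, se: float, level: float = 0.95) -> tuple[float, float]:
    """Build a symmetric normal-approximation CI from an estimate and its SE."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * se, estimate + z * se


if __name__ == "__main__":
    # Illustration with the pooled interval reported in the abstract (0.862-0.881).
    se = se_from_ci(0.862, 0.881)
    print(f"implied SE: {se:.4f}")
    print("rebuilt CI:", ci_from_se(0.872, se))
```

An interval truncated at 1.000, such as an upper bound that reaches the maximum possible AUC, is no longer symmetric, so a standard error back-calculated this way would only be approximate.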

Data extraction and quality assessment
Two independent investigators extracted the following information from the included studies: first author, publication year, country, population data (age and sex), and outcomes. Data extraction forms included details on the characteristics of the included studies. Two investigators independently assessed the quality and risk of bias of the studies using the revised tool for the Quality Assessment of Diagnostic Accuracy Studies (QUADAS). Throughout data extraction and quality assessment, all disagreements were resolved through discussion with a third reviewer.

Statistical analysis
We pooled the outcome data using a fixed-effects model. Heterogeneity among studies was evaluated using the Q statistic and the I-squared (I²) statistic. Significant heterogeneity was defined as a p-value < 0.05 or I² > 50%. For sensitivity analysis, a one-by-one elimination (leave-one-out) method was used to investigate the robustness of the results.
All statistical analyses were performed using MedCalc ® statistical
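The pooling, heterogeneity, and one-by-one elimination steps described in this section can be sketched as follows. This is a generic inverse-variance fixed-effect calculation under a normal approximation, not a reproduction of the statistical software's output. The function names and the example inputs are hypothetical and are not data from the included studies. The p-value for Q is omitted because it requires a chi-square distribution, which the Python standard library does not provide.

```python
import math


def pool_fixed_effect(aucs, ses):
    """Inverse-variance fixed-effect pooling with Cochran's Q and I-squared."""
    weights = [1.0 / se ** 2 for se in ses]          # weight = 1 / variance
    total_w = sum(weights)
    pooled = sum(w * a for w, a in zip(weights, aucs)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (a - pooled) ** 2 for w, a in zip(weights, aucs))
    df = len(aucs) - 1
    # I-squared: share of total variation attributed to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "auc": pooled,
        "se": pooled_se,
        "ci": (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se),
        "q": q,
        "df": df,
        "i2": i2,
    }


def leave_one_out(aucs, ses):
    """Re-pool after dropping each model in turn (one-by-one elimination)."""
    pooled_values = []
    for i in range(len(aucs)):
        rest_aucs = aucs[:i] + aucs[i + 1:]
        rest_ses = ses[:i] + ses[i + 1:]
        pooled_values.append(pool_fixed_effect(rest_aucs, rest_ses)["auc"])
    return pooled_values


if __name__ == "__main__":
    # Hypothetical AUCs and standard errors, for illustration only.
    example_aucs = [0.85, 0.88, 0.90, 0.87]
    example_ses = [0.020, 0.015, 0.030, 0.010]
    print(pool_fixed_effect(example_aucs, example_ses))
    print(leave_one_out(example_aucs, example_ses))
```

If the pooled estimates returned by `leave_one_out` stay close to the overall estimate, no single model dominates the result, which is what the sensitivity analysis is meant to check.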

Search results and included studies characteristics
We retrieved a total of 1,241 publications. Finally, we included seven studies with 4,379 ischemic stroke participants. The detailed flow chart is shown in Figure 1. The seven included studies contained 17 predictive models, which we divided into five subgroups according to their algorithms (SVM, RF, LR, DL, and XGBoost).
The characteristics of all included studies are shown in Table 1. In our review, four articles (Monteiro et al., 2018;

Quality assessment
All included studies had low to moderate risks in QUADAS scores, and the risk of bias is shown in Figure 2. The heterogeneity test for the AUC analysis showed I² = 27.67%, suggesting that there was no significant heterogeneity among the studies.

Discussion
To our knowledge, this is the first meta-analysis of the performance of AI models in predicting stroke outcomes. All participants in the final included studies were ischemic stroke patients, because many studies on outcome prediction in hemorrhagic stroke did not meet our inclusion criteria. Thus, our results apply to ischemic stroke patients only. According to our results, the overall performance of AI models in predicting ischemic stroke outcomes is good. In the subgroup analysis, SVM models were the most accurate and LR models the least accurate in terms of AUC. Previously, many studies of stroke outcomes used some variant of linear regression models, which are generally easier to interpret but do not automatically exploit nonlinear relationships and interactions, leading to poorer prediction accuracy. The prognostic models from the Virtual International Stroke Trials Archive achieved AUCs of 0.808 for functional outcome and 0.706 for survival (Konig et al., 2008). They form a relatively accurate and simple prediction scale, but AI predictive models show better accuracy. The biomarker-based CoRisk score, whose components are copeptin level, age, NIH Stroke Scale, and recanalization therapy, achieved an AUC of 0.819 (De Marchis et al., 2019). This score showed relatively good accuracy and interpretability, but whether plasma copeptin levels need to be measured in clinical practice remains to be decided. In the studies included in our analysis, predictions were made rapidly from medical data obtained at presentation to the emergency department or from imaging data obtained after admission or interventional procedures. This suggests that AI-based prediction models have better clinical applicability. However, clinicians typically have a limited understanding of this methodology. Therefore, we conducted this systematic review and meta-analysis to clarify how AI models can provide stroke outcome prediction.
In addition, the key requirements for prediction tools in clinical practice are feasibility and acceptability, as overly complex scales will not be used in acute stroke. Although DL models have good predictive accuracy, their complexity and lack of interpretability limit their clinical application. According to our results, although SVM and XGBoost had similar AUCs, SVM seemed to perform better than XGBoost in predicting stroke outcomes, owing to discrete linear data or suitable nonlinear kernels that fit the data better through improved generalization (Noble, 2006; Roh et al., 2022). Regrettably, our literature review could not establish why researchers chose specific algorithms for their predictive models.
FIGURE 2 The risk of bias in the included studies.
In contrast to traditional predictive scores, most AI predictive models share a common set of independent demographic variables, laboratory values, and imaging features. Although some variables and characteristics are not well validated individually in clinical practice, they may add predictive value in some cases. Comparative studies of ML algorithms and traditional risk models are needed. If such studies demonstrate the advantages of ML-based prediction, optimized algorithms could be implemented through electronic health records to facilitate their application in clinical practice.
Our study has several limitations. First, we assessed the accuracy of prediction only through the AUC and did not examine sensitivity, specificity, or accuracy rate, because the data were incomplete for meta-analysis. We chose the AUC because it can remain stable even when the distribution of positive and negative samples in the test set changes. Therefore, despite their simplicity, our results are interpretable and reliable. Second, because of the large scope of our review, our goal was to be systematic rather than comprehensive. We might therefore have missed some relevant studies, but we consider it unlikely that these studies were of higher quality than those already included. Third, none of the included studies had a sample size that could be called big data, and only one study had more than 1,000 participants. Because previous studies indicate that there are a large number of candidate predictor variables to analyze, future AI predictive models must be developed in large patient populations.
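The claim that the AUC is stable under a change in the ratio of positive to negative samples can be illustrated with a small sketch. It computes the AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case (the Mann-Whitney formulation) and compares it with accuracy at a fixed threshold. The scores, the threshold, and the function names are hypothetical, and the invariance holds only under the assumption that the score distribution within each class stays the same.

```python
def auc(pos_scores, neg_scores):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def accuracy(pos_scores, neg_scores, threshold):
    """Share of cases classified correctly when scores >= threshold are called positive."""
    correct = sum(s >= threshold for s in pos_scores)
    correct += sum(s < threshold for s in neg_scores)
    return correct / (len(pos_scores) + len(neg_scores))


if __name__ == "__main__":
    # Hypothetical model scores, for illustration only.
    pos = [0.9, 0.8, 0.6, 0.4]
    neg = [0.7, 0.5, 0.3, 0.2]
    more_neg = neg * 5  # same negative score distribution, five times as many negatives

    print("balanced:   AUC", auc(pos, neg), "accuracy", accuracy(pos, neg, 0.5))
    print("imbalanced: AUC", auc(pos, more_neg), "accuracy", accuracy(pos, more_neg, 0.5))
```

In this toy setting the AUC is identical for both class ratios, whereas the accuracy at the fixed threshold shifts, which is consistent with the rationale given above for pooling AUC rather than accuracy.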

Conclusion
AI predictive models have high accuracy in predicting the outcome of stroke and can assist physicians in judging the likely outcome of a patient and adjusting the treatment plan accordingly.
FIGURE 1 Flow chart of included literature.

TABLE 1 The characteristics of included studies.

TABLE 2 The total included AUC under the random-effects model.