Machine Learning in Action: Stroke Diagnosis and Outcome Prediction

The application of machine learning in medicine has rapidly evolved over the past decade. In stroke, commercially available machine learning algorithms have already been incorporated into clinical practice for rapid diagnosis. The creation and advancement of deep learning techniques have greatly improved the clinical utility of machine learning tools, and new algorithms continue to emerge with improved accuracy in stroke diagnosis and outcome prediction. Although imaging-based feature recognition and segmentation have significantly facilitated rapid stroke diagnosis and triage, stroke prognostication depends on a multitude of patient-specific and clinical factors, and accurate outcome prediction therefore remains challenging. Despite its vital role in stroke diagnosis and prognostication, it is important to recognize that machine learning output is only as good as the input data and the appropriateness of the algorithm applied to a specific dataset. Additionally, many machine learning studies are limited by small sample sizes, and concerted efforts to collate data could improve the evaluation of future machine learning tools in stroke. In its present state, machine learning serves as a helpful and efficient tool for rapid clinical decision making, while oversight from clinical experts is still required to address specific aspects not accounted for in an automated algorithm. This article provides an overview of machine learning technology and a tabulated review of pertinent machine learning studies related to stroke diagnosis and outcome prediction.


INTRODUCTION
The term machine learning (ML) was coined by Arthur Samuel in 1959 (1). He investigated two machine learning procedures using the game of checkers and concluded that computers can be programmed quickly to play a better game of checkers than the person who wrote the program. Simply put, machine learning can be defined as a subfield of artificial intelligence (AI) that uses computerized algorithms to automatically improve performance through an iterative learning process or experience (i.e., data acquisition) (2). Of late, the field of ML has vastly evolved with the development of various computerized algorithms for pattern recognition and data assimilation to improve predictions, decisions, perceptions, and actions across various fields, and it serves as an extension to traditional statistical approaches. In our day-to-day life, a relatable example of ML is the application of spam filters to the 319 billion emails sent and received daily worldwide, of which nearly 50% can be classified as spam (3). Use of ML technology has made this process efficient and manageable. ML technology utilizes various methods for automated data analysis, including linear and logistic regression models as well as methods such as support vector machines (SVM), random forests (RF), classification trees, and discriminant analysis that allow combinations of features (data points) in a non-linear manner with flexible decision boundaries. The advent of neural networks and deep learning (DL) technology has transformed the field of ML with automatic and efficient feature identification and processing within a covert analytic network, without the need for a priori feature selection. Notably, the performance of DL is known to improve with access to larger datasets, whereas classic ML methods tend to plateau at relatively lower performance levels.
Hence, in this era of big data, where clinicians are constantly inundated with a plethora of clinical information, the use of DL technology has significantly enhanced our ability to assimilate vast amounts of clinical data to make expeditious clinical decisions.
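To make the flavor of these methods concrete, the sketch below trains the simplest of the models named above, a logistic regression classifier, by batch gradient descent on a synthetic two-cluster dataset. Everything here (the data points, learning rate, and iteration count) is an illustrative assumption, not drawn from any cited study.

```python
import math

# Synthetic 2-D dataset (an illustrative assumption): class 0 clusters
# near the origin, class 1 near (3, 3).
data = [((0.0, 0.5), 0), ((0.5, 0.0), 0), ((0.2, 0.2), 0),
        ((3.0, 2.5), 1), ((2.5, 3.0), 1), ((2.8, 2.8), 1)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Logistic regression fitted by batch gradient descent on the log-loss.
w, b, lr = [0.0, 0.0], 0.0, 0.1
for _ in range(500):
    gw, gb = [0.0, 0.0], 0.0
    for (x1, x2), y in data:
        err = sigmoid(w[0] * x1 + w[1] * x2 + b) - y   # dLoss/dz
        gw[0] += err * x1
        gw[1] += err * x2
        gb += err
    w[0] -= lr * gw[0] / len(data)
    w[1] -= lr * gw[1] / len(data)
    b -= lr * gb / len(data)

def predict(x1, x2):
    """Return the predicted class (0 or 1) for a feature pair."""
    return 1 if sigmoid(w[0] * x1 + w[1] * x2 + b) >= 0.5 else 0

accuracy = sum(predict(x1, x2) == y for (x1, x2), y in data) / len(data)
```

On linearly separable toy data like this, the learned linear boundary cleanly splits the two clusters; real clinical features rarely separate so neatly, which is where non-linear methods and DL models become valuable.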
Stroke is a leading cause of death, disability, and cognitive impairment in the United States (4). According to the 2013 policy statement from the American Heart Association, an estimated 4% of US adults will suffer from a stroke by 2030, accounting for a total annual stroke-related medical cost of $240.67 billion by 2030 (5). For ischemic stroke, acute management is highly dependent on prompt diagnosis. According to the current ischemic stroke guidelines, patients are eligible for intravenous thrombolysis up to 4.5 h from symptom onset and endovascular thrombectomy without advanced imaging within 6 h of symptom onset (6-8). For patients presenting between 6 and 24 h of symptom onset (or last known well time), advanced imaging is recommended to assess salvageable penumbra for decisions regarding endovascular therapy (9-11). Similarly, for hemorrhagic stroke, timely diagnosis utilizing imaging technology to evaluate the type and etiology of hemorrhage is important in guiding acute treatment decisions. Prompt diagnosis with emergent treatment decisions and accurate prognostication is hence the cornerstone of acute stroke management. Over recent years, a multitude of ML methodologies have been applied to stroke for various purposes, including diagnosis of stroke (12,13), prediction of stroke symptom onset (14,15), assessment of stroke severity (16,17), characterization of clot composition (18), analysis of cerebral edema (19), prediction of hematoma expansion (20), and outcome prediction (21-23). In particular, there has been a rapid increase in ML applications for imaging-based stroke diagnosis and outcome prediction. The Ischemic Stroke Lesion Segmentation Challenge (ISLES: http://www.isles-challenge.org/) provides a global competing platform encouraging teams across the world to develop advanced tools for stroke lesion analysis using ML.
In this platform, competitors train their algorithms on a standardized dataset and eventually generate benchmarks for algorithm performance.
Deciding which type of ML to use on a specific dataset depends on factors such as the size of the dataset, the need for supervision, the ability to learn, and the generalizability of the model (24). DL technologies such as deep neural networks have significantly improved image segmentation, automated featurization (e.g., conversion of a raw signal into a clinically useful parameter), and multimodal prognostication in stroke, and they are increasingly utilized in stroke-based applications (25-27). For example, DL algorithms can be applied to extract meaningful imaging features for image processing in an increasing order of hierarchical complexity to make predictions, such as the final infarct volume (27). Some commonly used ML types with their respective algorithms and practical examples are outlined in Figures 1-3. In the healthcare setting, supervised and unsupervised algorithms are both commonly used. In this review, we will specifically focus on ML strategies for stroke diagnosis and outcome prediction. Table 1 provides an overview of pertinent studies with use of ML in stroke diagnosis (Section A) and outcome prediction (Section B). A glossary of machine learning terms with brief descriptions is separately provided in Supplementary Table 1.
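To illustrate the supervised/unsupervised distinction mentioned above in code, the sketch below runs a bare-bones unsupervised method (k-means clustering, k = 2) on one-dimensional toy data: no outcome labels are supplied, and the two groups emerge from the data alone. The values and the centroid initialization are illustrative assumptions.

```python
# Unsupervised toy example (all values are illustrative assumptions):
# k-means clustering with k = 2 on six 1-D points. No outcome labels
# are supplied; the algorithm discovers the two groups by itself.
points = [0.1, 0.3, 0.2, 4.0, 4.2, 3.9]

# Naive initialization at the data extremes (real implementations use
# smarter seeding such as k-means++).
centroids = [min(points), max(points)]

for _ in range(10):
    clusters = ([], [])
    # Assignment step: each point joins its nearest centroid.
    for p in points:
        i = 0 if abs(p - centroids[0]) <= abs(p - centroids[1]) else 1
        clusters[i].append(p)
    # Update step: move each centroid to the mean of its cluster.
    centroids = [sum(c) / len(c) for c in clusters]
```

A supervised method, by contrast, would receive the group label for each point during training and learn the mapping from feature to label directly.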

METHODS
We searched PubMed, Google Scholar, Web of Science, and IEEE Xplore® for relevant articles using various combinations of the following key words: "machine learning," "artificial intelligence," "stroke," "ischemic stroke," "hemorrhagic stroke," "diagnosis," "prognosis," "outcome," "big data," and "outcome prediction." The resulting abstracts were screened by all authors, and articles were hand-picked for full review based on relevance and scientific integrity. The final article list was reviewed and approved by all authors.

Machine Learning in Stroke Diagnosis
The time-sensitive nature of stroke care underpins the need for accurate and rapid tools to assist in stroke diagnosis. Over recent years, the science of brain imaging has vastly advanced with the availability of a myriad of AI-based diagnostic imaging algorithms (77). Machine learning is particularly useful in the diagnosis of acute stroke with large vessel occlusion (LVO). Various automated methods for detection of stroke core and penumbra size, as well as mismatch quantification and detection of vascular thrombi, have recently been developed (77). Over the past decade, 13 different companies have developed automated and semi-automated commercially available software for acute stroke diagnostics (Aidoc®, Apollo Medical Imaging Technology®, Brainomix®, inferVISION®, RAPID®, JLK Inspection®, Max-Q AI®, Nico.lab®, Olea Medical®, Qure.ai®, Viz.ai®, and Zebra Medical Vision®) (78). The RapidAI® and Viz.ai® technologies have been approved under the medical device category of computer-assisted triage by the United States Food and Drug Administration (FDA). The RAPID MRI® (Rapid processing of Perfusion and Diffusion) software allows for unsupervised, fully-automated processing of perfusion and diffusion data to identify those who may benefit from thrombectomy based on the mismatch ratio (79). Such commercial platforms available for automatic detection of ischemic stroke and LVO have facilitated rapid treatment decisions. When compared to manual segmentation of lesion volume and mismatch identification from patients enrolled in DEFUSE 2, the RAPID results were found to be well-correlated (r² = 0.99 and 0.96 for diffusion and perfusion weighted imaging, respectively) with 100% sensitivity and 91% specificity for mismatch identification (80). Since 2008, the RapidAI® platform has expanded to include other products (Rapid® ICH, ASPECTS, CTA, LVO, CTP, MRI, Angio, and Aneurysm) that assist across the entire spectrum of stroke.
Viz LVO® was the first FDA-cleared software to detect and alert clinicians of LVO via the "Viz Platform" (81). In a recent single-center study with 1,167 CTAs analyzed, Viz LVO® was found to have a sensitivity of 0.81 and a negative predictive value of 0.99, with an accuracy of 0.94 (82).
Other areas of stroke diagnostics that have seen increased attention over the past decade are the identification of intracerebral hemorrhage (ICH) and of patients at risk for delayed cerebral ischemia in the setting of aneurysmal subarachnoid hemorrhage (aSAH). While most studies tend to have good accuracy in detecting an ICH, there is more variability in subclassification and in measurements of hematoma volume. A summary of recent publications on ML in stroke diagnosis is presented in Table 1 (Section A).

Machine Learning in Stroke Outcome Prediction
Despite recent advances in stroke care, stroke remains the second leading cause of death and disability worldwide (4,83). Although acute stroke diagnosis and determination of the time of stroke onset are the initial steps of comprehensive stroke management, clinicians are also often charged with the task of determining stroke outcomes. These outcomes range from discrete radiological endpoints (e.g., final infarct volume and the likelihood of hemorrhagic transformation) to the likelihood of morbidity (e.g., stroke-associated pneumonia) and mortality, and various measures of functional independence (e.g., mRS score, Barthel Index score, and cognitive and language function). Prognostication after an acute brain injury is notoriously challenging, particularly within the first 24-48 h (84). However, a clinician may be called upon to provide estimates of a patient's short-term and long-term mortality and degree of functional dependence to assist with decision-making regarding the intensity of care (e.g., use of thrombolytics or endovascular treatment, intubation, code status, etc.) (60,64,66,67,69,70,72-76). As with all medical emergencies, it is incumbent upon the stroke clinician to ensure that all care provided is concordant with an individual patient's goals (85). For example, a surrogate decision-maker may decline to reverse a patient's longstanding "do not intubate" order to facilitate mechanical thrombectomy if the clinician predicts the patient has a high likelihood of functional dependence or short-term mortality. Hence, accuracy in outcome prediction is critical in guiding management of our patients.
Determining a patient's likelihood of developing symptomatic intracranial hemorrhage (sICH) is of obvious, immediate value in acute stroke management when determining candidacy for thrombolytic therapy or endovascular treatment. Historically, clinician-based prognostication tools such as the SEDAN (Sugar, Early Infarct signs, Dense cerebral artery sign, Age, and NIHSS) and HAT (Hemorrhage After Thrombolysis) scores have been used to predict the risk of symptomatic intracranial hemorrhage after IV thrombolysis (23). Advances in ML and DL have allowed for the development of more accurate models that outperform the traditional SEDAN and HAT scores (23,54,55). Similarly, the ability to predict final infarct volume and the likelihood of developing malignant cerebral edema has important treatment implications and remains a significant focus of ML in stroke (26,51-53).
In patients with intracerebral hemorrhage (ICH), the ICH score is one of the most widely used clinical prediction scores (85-88). Although ML technology for outcome prediction has advanced most rapidly for ischemic stroke, recent ML studies predicting functional outcomes after ICH have also demonstrated high discriminating power (63,89). A recent study by Sennfält et al. tracked long-term functional dependence and mortality in more than 20,000 Swedish patients after acute ischemic stroke (90). The 30-day mortality rate was 11.1%. At 5 years,

DISCUSSION
In recent years, some DL algorithms have approached human levels of performance in object recognition (91). One of the greatest strengths of ML is its ability to endlessly process data and tirelessly perform an iterative task. Further, creation of an ML model can be performed much faster (i.e., in a matter of 5-6 days compared with 5-6 months or even years) than traditional computer-aided detection and diagnosis (CAD) (92), which makes ML an attractive field for computer experts and scientists. Several ML tools are currently in use, including the FDA-approved ML algorithms previously discussed for rapid stroke diagnosis, which have significantly enhanced the workflow for acute ischemic stroke patients. Despite the prolific advent of new and improved ML algorithms with increasing clinical applications, it is important to recognize that computer-based algorithms are only as good as the data used to train the models. For a reliable algorithm, it is important to develop well-defined training, validation, and testing sets. Testing should be done on a diverse set of data points reflective of a real-world scenario. Overfitting can be an issue in ML algorithms when the model is trained on a group of highly-selected, specific features, which, when tested on a larger dataset with varied features, fails to perform adequately. Similarly, underfitting can occur when a model is oversimplified with generalized feature selection in the training set, which then becomes unable to capture the relevant features within a complex pattern of a larger or more diverse testing set. The aphorism "garbage in, garbage out" remains true, as the use of inadequate or unvalidated data points (e.g., unverified clinical reports from the electronic health record) in the training set can lead to poor performance of the ML algorithm in the testing set.
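The overfitting described above can be caricatured in a few lines: a model that simply memorizes its training set (a 1-nearest-neighbour lookup) scores perfectly on noisy training data yet generalizes worse than a simple rule. The task, the 20% noise rate, and both models are synthetic assumptions for illustration only.

```python
import random

random.seed(0)  # deterministic synthetic data

# Synthetic 1-D screening task (an illustrative assumption): the true
# rule is "label = 1 when x > 0.5", corrupted by 20% label noise.
def make_split(n):
    pts = []
    for _ in range(n):
        x = random.random()
        y = x > 0.5
        if random.random() < 0.2:
            y = not y          # flip the label: irreducible noise
        pts.append((x, int(y)))
    return pts

train, test = make_split(50), make_split(200)

# Overfit model: memorize every training point (1-nearest-neighbour).
def memorizer(x):
    return min(train, key=lambda p: abs(p[0] - x))[1]

# Simple model: the single threshold the data was generated from.
def threshold(x):
    return int(x > 0.5)

def accuracy(model, pts):
    return sum(model(x) == y for x, y in pts) / len(pts)

# The memorizer is perfect on data it has seen (each training point is
# its own nearest neighbour) but inherits the label noise at test time.
train_acc_memo = accuracy(memorizer, train)
test_acc_memo = accuracy(memorizer, test)
test_acc_thresh = accuracy(threshold, test)
```

In this contrived setting, the memorizer's perfect training score is a symptom rather than a virtue; the gap between its training and testing performance is exactly what a held-out validation set is designed to expose.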
Hence, it is important to note that algorithmic decision-making tools do not guarantee accurate and unbiased interpretation compared to established logistic regression models (56,59,93). Comparisons to well-established models should be standard when developing new ML algorithms, given the high cost associated with ML (e.g., the time required to collect data, train the model, and perform internal and external validations, and the cost of reliable and secure data storage) (94). Specifically, as it relates to diagnostics, there are myriad considerations that must be taken into account. Not only should the algorithm provide accurate information quickly, but it should also have the ability to integrate into the electronic health record (EHR) to improve end-user experience and efficiency in workflow. Programs such as RAPID®, Viz.ai®, and Brainomix® have started to successfully integrate into the EHR, which has helped expedite the acute stroke diagnosis and triage process. One of the major technical challenges of ML is the ability to develop an algorithm with a "reasonable" detection rate of pathology without an excessive rate of false positives. For example, there are notable discrepancies among various ML studies for ICH diagnosis, with accuracy varying by the type of ICH (e.g., spontaneous ICH, SDH, aSAH, or IVH). Overfitting and underfitting of the model can lead to poor applicability, and therefore image preprocessing with meticulous feature selection is necessary. Furthermore, the "black-box" nature of ML precludes clinicians from identifying and addressing biases within the algorithms (95,96). Hence, proper external validation is necessary to ensure generalizability of the algorithm in diverse clinical scenarios.
For stroke prediction, most existing ML algorithms utilize dichotomized outcomes. By convention, functional outcome is frequently defined as "good" when the mRS score is 0-2 and "poor" when the mRS score is 3-6, and ischemic stroke studies often measure the mRS score at 90 days after stroke (64-69,97). However, the medical community is increasingly embracing patient-centered outcomes. There is growing recognition of the need for longitudinal patient follow-up, given the potential for functional improvement beyond the conventional norm of 90 days (98). Once patient-centered outcomes are clinically validated (e.g., an mRS cutoff of 0-2 vs. 3-6, 0-3 vs. 4-6, or 0-4 vs. 5-6), new ML algorithms incorporating such outcomes would be increasingly helpful to clinicians. The use of high-yield ML programs with patient-centered outcomes could ease the commonplace but challenging discussions of anticipated quality of life and the risk of long-term dependency or death before deciding on a patient's goals of care. It is, however, important to apply caution when using ML algorithms for outcome prediction, as patient demographics and clinical practice continue to evolve, and updates to ML algorithms will be necessary to remain applicable to evolving patient populations and clinical standards. Additionally, developers often retrieve data from existing datasets (e.g., clinical trial data) with inherent biases, including selection bias, observer bias, and other confounders (e.g., withdrawal of life-sustaining therapy may be more common in older patients with large hemispheric stroke than in younger patients, which could confound outcome prediction in older patients).
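The conventional dichotomization described above is straightforward to express in code. The helper below is hypothetical (its name and interface are ours, not from any cited tool), with the cut point exposed as a parameter so that alternative splits such as 0-3 vs. 4-6 can be explored.

```python
# Hypothetical helper (name and interface are ours): dichotomize a
# 90-day modified Rankin Scale (mRS) score into a binary outcome.
def dichotomize_mrs(score, cutoff=2):
    """Return 1 for a "good" outcome (mRS <= cutoff), else 0.

    mRS runs from 0 (no symptoms) to 6 (death). The default cutoff of
    2 reflects the conventional 0-2 vs. 3-6 split; alternative splits
    (0-3 vs. 4-6, 0-4 vs. 5-6) are explored by changing `cutoff`.
    """
    if not 0 <= score <= 6:
        raise ValueError("mRS score must be between 0 and 6")
    return int(score <= cutoff)

conventional = [dichotomize_mrs(s) for s in range(7)]           # 0-2 vs. 3-6
alternative = [dichotomize_mrs(s, cutoff=3) for s in range(7)]  # 0-3 vs. 4-6
```

Exposing the cutoff as a parameter mirrors the point made above: the choice of cut point is itself a modeling decision that should be revisited as patient-centered outcome definitions are validated.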
Overall, compared to other diseases such as Alzheimer's disease, there is a relative paucity of large, high-quality datasets within stroke. Limitations that have stymied the development of large, open-access stroke registries include the need for data-sharing agreements, patient privacy concerns, high costs of data storage and security, and arbitration of quality control of the input data (95). Cohesive and collaborative efforts across hospital systems, regions, and nations in data acquisition and harmonization are needed to improve future ML-based programs in stroke. With the adoption of EHR systems, healthcare data are rapidly accumulating, with an estimated more than 35 zettabytes of existing healthcare data (99). Adoption of AI and ML algorithms allows us to efficiently process the plethora of information that surrounds us every day. Nonetheless, as we continue to adapt to this evolving landscape of medical practice surrounding big data, clinicians need to remain aware of the limitations of this modern-day "black box" magic.

CONCLUSION
Emerging ML technology has rapidly integrated into multiple fields of medicine, including stroke. Deep learning has significantly enhanced practical applications of ML, and some newer algorithms are known to have accuracy comparable to humans. However, the diagnosis and prognosis of a disease, including stroke, are highly intricate and depend on various clinical and personal factors. The development of optimal ML programs requires comprehensive data collection and assimilation to improve diagnostic and prognostic accuracy. Given the "black box" or cryptic nature of these algorithms, it is extremely important for the end user (i.e., the clinician) to understand the intended use and limitations of any ML algorithm to avoid inaccurate data interpretation. Although ML algorithms have improved stroke systems of care, blind dependence on such computerized technology may lead to misdiagnosis or inaccurate prediction of prognostic trajectories. At present, ML tools are best used as "aids" for clinical decision making, while still requiring oversight to address relevant clinical aspects that are overlooked by the algorithm.

AUTHOR CONTRIBUTIONS
SM: substantial contributions including conception and design of the work, literature review, interpretation and summarization of data, drafting the complete manuscript, revising it critically for important intellectual content, and final approval of the manuscript to be published. MD and KS: contribution including conception and design of the work, literature review, interpretation and summarization of the data, drafting of critical portion of the manuscript, critical revision for important intellectual content, and final approval of the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
This article was supported by the Virginia Commonwealth University, Department of Neurology.