Validation and refinement of a predictive nomogram using artificial intelligence: assessing in-hospital mortality in patients with large hemispheric cerebral infarction

Background: Large Hemispheric Infarction (LHI) poses significant mortality and morbidity risks, necessitating predictive models for in-hospital mortality. Previous studies have explored LHI progression to malignant cerebral edema (MCE) but have not comprehensively addressed in-hospital mortality risk, especially in patients who did not undergo decompressive hemicraniectomy (DHC).
Methods: Demographic, clinical, risk factor, and laboratory data were gathered. The population was randomly divided into Development and Validation Groups at a 3:1 ratio, with no statistically significant differences observed between the groups. Variable selection utilized the Bonferroni-corrected Boruta technique (p < 0.01). Logistic Regression retained essential variables, leading to the development of a nomogram. ROC and DCA curves were generated, and calibration was conducted based on the Validation Group.
Results: This study included 314 patients with acute anterior-circulation LHI, with 29.6% in the Death group (n = 93). Significant variables, including Glasgow Coma Score, Collateral Score, NLR, Ventilation, Non-MCA territorial involvement, and Midline Shift, were identified through the Boruta algorithm. The final Logistic Regression model was used to create a nomogram, which exhibited excellent discriminative capacity. Calibration curves in the Validation Group showed a high degree of conformity with actual observations. DCA curve analysis indicated substantial clinical net benefit within the 5 to 85% threshold range.
Conclusion: We have utilized NIHSS score, Collateral Score, NLR, mechanical ventilation, non-MCA territorial involvement, and midline shift to develop a highly accurate, user-friendly nomogram for predicting in-hospital mortality in LHI patients. This nomogram serves as valuable reference material for future studies on LHI patient prognosis and mortality prevention, while addressing previous research limitations.


Introduction
Large Hemispheric Cerebral Infarction (LHI) is a severe subtype of ischemic stroke, often accompanied by complications such as cerebral edema, pneumonia, and hemorrhagic transformation of the infarct (1). LHI typically refers to a complete or partial anterior circulation infarction resulting from proximal occlusion of the internal carotid artery or the middle cerebral artery (MCA). It constitutes approximately 10% of all ischemic strokes and is among the most severe subtypes of ischemic stroke (2). It is also characterized by a heightened risk of in-hospital mortality, ranging from 30 to 80%. Given the complexity of LHI and its associated mortality risk, there is an imperative need to develop predictive models that facilitate early identification of patients at higher risk, enabling timely interventions and personalized care strategies. A previous study (3) highlighted the potential benefits of decompressive hemicraniectomy (DHC), reporting a significant improvement in survival rates ranging from 67 to 84%. However, the limited uptake of DHC can be attributed to its stringent eligibility criteria. Another clinical trial (4) demonstrated the effectiveness of intravenous glyburide in reducing midline shift and edema-associated mortality in patients with LHI, though it did not mitigate the risk of malignant brain edema or improve 3-month functional outcomes. Given the intricate nature of the causes of early mortality in LHI patients, the optimal treatment plan remains unclear and requires further research.
Given the intricate nature of LHI and its heightened mortality risk, this study aims to employ the Boruta artificial intelligence algorithm in conjunction with Logistic Regression in a retrospective investigation. The objective is to construct a predictive model for in-hospital mortality risk among LHI patients, facilitating real-time assessment during their hospitalization and offering valuable guidance for clinical decision-making. Furthermore, the study seeks to externally validate the previous research (5) and to propose more practical improvements to it.

Participants
Patients diagnosed with LHI at the Department of Neurology in the Third Affiliated Hospital of Soochow University between December 2018 and April 2023 were included in this study. The study received ethical approval from the Ethics Review Board of the Third Affiliated Hospital of Soochow University (Approval Number: 2023-S-080). Due to the retrospective nature of the study, the requirement for obtaining informed consent from the participants was waived.
The inclusion criteria utilized in this study are consistent with those of a prior investigation (6), in order to maintain uniform standards in addressing the MCE issue:
(1) Diagnosis of acute ischemic stroke.
(2) Lesion site included the blood supply area of the MCA with or without additional affected regions, and the cerebral infarction area was at least two-thirds of the blood supply area of the MCA.
(3) Age ≥ 18 years.
(4) Time from onset to admission < 72 h.
(5) Vascular recanalization therapy, such as thrombectomy or thrombolysis, was not administered.
Patients who previously experienced severe organ dysfunction or were diagnosed with major medical conditions, including cancer, were excluded from the study.

Data collection and group division
A comprehensive dataset was constructed, encompassing demographic characteristics, clinical variables, vascular risk factors, and laboratory findings. Although some information was missing, the proportion of missing data remained within an acceptable range (less than 30%). To address this limitation, Multiple Imputation by Chained Equations was applied separately to the two patient groups defined by the occurrence of in-hospital death, to prevent leakage of data features between the groups. Imputation was used to fill in the missing data and so allow a more robust and complete analysis. Subsequently, the entire population was randomly divided into the Development Group and the Validation Group at a ratio of 3:1. Categorical variables were compared between the groups with the continuity-corrected chi-square test, while continuous variables were compared via one-way ANOVA. No statistically significant differences existed in any variable or in the outcome between the Development Group and the Validation Group. Given the large sample size, continuous data are presented under the assumption of a normal distribution. Categorical data are presented as proportions; details can be found in Supplementary Table 1.
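The continuity-corrected chi-square comparison described above can be illustrated with a minimal Python sketch for a single 2x2 table. The function name `yates_chi_square` and the counts (including the 235/79 group sizes) are hypothetical and are not taken from the study data; the sketch only shows the form of the test.

```python
import math

def yates_chi_square(a, b, c, d):
    """Continuity-corrected (Yates) chi-square test for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column needs at least one observation")
    # Yates' correction subtracts n/2 from |ad - bc|, floored at zero.
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * corrected ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows = Development / Validation Group,
# columns = mechanically ventilated yes / no.
chi2, p = yates_chi_square(60, 175, 22, 57)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

A p-value above the chosen significance level for every variable would correspond to the statement that the two groups did not differ significantly.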

Imaging evaluation
Midline shift, as well as non-MCA territory involvement, which pertains to the additional inclusion of the anterior and/or posterior cerebral artery territory, were independently evaluated by two experienced physicians with over a decade of clinical practice. Precise measurements in millimeters were recorded for midline shift. In cases of discrepancies, a third senior physician provided resolution.

Variable selection
Using the Bonferroni-corrected Boruta technique (7), a wrapper-based feature selection method, variables were screened across the entire dataset to retain a subset of features correlated with the dependent variable. Throughout this process, the significance level for p-values was set at 0.01 to ensure that the selected features possess a high degree of statistical significance.
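As a rough illustration of how a Bonferroni-corrected threshold of 0.01 can enter a Boruta-style decision, the sketch below implements only the final decision step: a feature's number of "hits" (iterations in which it outperformed the best shadow feature) is tested against a Binomial(runs, 0.5) distribution, and the p-value is multiplied by the number of features. The function names, the number of runs, and the number of candidate features are assumptions for illustration; the random-forest importance computation and shadow-feature generation are not shown, and this is not the authors' code.

```python
from math import comb

def binom_upper_tail(hits, runs):
    """P(X >= hits) for X ~ Binomial(runs, 0.5)."""
    return sum(comb(runs, k) for k in range(hits, runs + 1)) / 2 ** runs

def binom_lower_tail(hits, runs):
    """P(X <= hits) for X ~ Binomial(runs, 0.5)."""
    return sum(comb(runs, k) for k in range(0, hits + 1)) / 2 ** runs

def boruta_decision(hits, runs, n_features, alpha=0.01):
    """Classify one feature using Bonferroni-adjusted binomial tests."""
    # Bonferroni adjustment: multiply by the number of tested features, cap at 1.
    p_confirm = min(1.0, binom_upper_tail(hits, runs) * n_features)
    p_reject = min(1.0, binom_lower_tail(hits, runs) * n_features)
    if p_confirm < alpha:
        return "confirmed"
    if p_reject < alpha:
        return "rejected"
    return "tentative"

# Hypothetical example: 100 iterations, 40 candidate features.
for hits in (90, 50, 10):
    print(hits, boruta_decision(hits, runs=100, n_features=40))
```

Features that are neither confirmed nor rejected remain tentative, which is why a stricter threshold tends to retain fewer variables.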

Logistic regression and nomogram development
Initially, all 15 variables retained by the Boruta algorithm were included in the Logistic Regression. Subsequently, a backward stepwise regression method was employed to iteratively refine the model and reduce its Akaike Information Criterion (AIC) value. Finally, six variables were retained: NIHSS score, Collateral Score, NLR, Ventilation, Non-MCA territorial involvement, and Midline Shift. Variance inflation factor tests were conducted to assess the presence of multicollinearity among the variables in the logistic regression, and a nomogram was then constructed.
ROC and DCA curves were drawn from the nomogram model, and calibration was performed using the data from the Validation Group.
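To make the discrimination measure concrete, the following sketch computes the area under the ROC curve as the proportion of (death, survivor) pairs in which the patient who died received the higher predicted risk, counting ties as one half. The function name `roc_auc` and the toy data are hypothetical and unrelated to the study cohort.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random event outranks a random non-event."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # concordant pair
            elif p == n:
                wins += 0.5   # tied pair counts as half
    return wins / (len(pos) * len(neg))

# Toy example: 1 = in-hospital death, 0 = survival.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))
```

This pairwise form is quadratic in the sample size, which is acceptable for a cohort of a few hundred patients.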

Comparing two models
The Net Reclassification Improvement (NRI) and Integrated Discrimination Improvement (IDI) metrics, along with DCA curves, were used to compare the performance of the two models and assess their strengths and weaknesses. The NRI and IDI provide quantitative measures of the improvement in risk prediction achieved by one model over another. Combining these indices with the DCA curves allows a comprehensive comparison of how well the two models predict outcomes and shows which model performs better.
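The two indices can be sketched in a few lines of Python under their standard definitions: the continuous (category-free) NRI sums the net proportion of events whose predicted risk rises and the net proportion of non-events whose predicted risk falls, and the IDI is the change in the difference between mean predicted risk in events and non-events. The function names and the toy probabilities are hypothetical; confidence intervals and p-values are not computed here.

```python
def continuous_nri(labels, p_old, p_new):
    """Category-free NRI; ranges from -2 to 2."""
    up_e = down_e = n_e = 0
    up_n = down_n = n_n = 0
    for y, old, new in zip(labels, p_old, p_new):
        if y == 1:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_n += 1
            up_n += new > old
            down_n += new < old
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n

def idi(labels, p_old, p_new):
    """Integrated Discrimination Improvement of the new model over the old."""
    def mean(values):
        return sum(values) / len(values)
    ev_old = [o for y, o in zip(labels, p_old) if y == 1]
    ev_new = [n for y, n in zip(labels, p_new) if y == 1]
    ne_old = [o for y, o in zip(labels, p_old) if y == 0]
    ne_new = [n for y, n in zip(labels, p_new) if y == 0]
    return (mean(ev_new) - mean(ev_old)) - (mean(ne_new) - mean(ne_old))

# Toy example: 1 = in-hospital death, 0 = survival.
labels = [1, 1, 0, 0]
p_old = [0.5, 0.6, 0.4, 0.5]
p_new = [0.8, 0.7, 0.2, 0.3]
print(continuous_nri(labels, p_old, p_new), idi(labels, p_old, p_new))
```

Positive values of both indices indicate that the new model assigns higher risks to patients who died and lower risks to survivors than the reference model does.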

Demographic characteristics and clinical information
In this study, a total of 314 patients with acute anterior circulation Large Hemispheric Infarction (LHI) were included, of whom 29.6% died in hospital and formed the Death group (n = 93). The baseline characteristics and clinical information of the patients are presented in Table 1.

Boruta variable selection and logistic regression
After applying the Boruta algorithm for variable selection, a total of 15 variables were identified as significant: Glasgow Coma Score, Collateral Score, White Blood Cell count, Neutrophil count, Lymphocyte count, NLR (Neutrophil-to-Lymphocyte Ratio), Ventilation, Non-MCA territorial involvement, Admission Diastolic Blood Pressure, Midline Shift, Basal cistern, APACHE II score, NIHSS score, ASPECTS score, and HbA1c.
All 15 variables were included in the logistic regression (Table 2), and a stepwise backward elimination approach was then employed to refine the model and reduce its AIC. The final model retained 6 variables: NIHSS score, Collateral Score, NLR, Ventilation, Non-MCA territorial involvement, and Midline Shift. The model's AIC decreased from 183.55 to 171.65.

Nomogram assessment
A nomogram was constructed from the final Logistic Regression model (Figure 1), and ROC curves were generated and AUC values calculated for both the Development Group and the Validation Group (Figure 2). The model exhibits excellent discriminative capacity and generalizability. In the Validation Group, calibration curves were generated (Figure 3), demonstrating a high degree of conformity between the calibrated model and actual observations. Lastly, DCA curves were drawn (Figure 4) for both the Development Group and the Validation Group, revealing the model's considerable clinical net benefit. Within the threshold range of 5 to 85%, the model's net benefit exceeds that of both the complete-intervention and the non-intervention scenarios.
To use the nomogram, read the points for each item from the corresponding scale, add them up, and locate the total on the lower total-points scale, which maps onto the final probability line. The resulting probability is the predicted probability of all-cause mortality during the patient's hospitalization.
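The net benefit plotted in a decision curve can be sketched with the standard formula: true positives per patient minus false positives per patient weighted by the odds of the threshold probability. In the sketch below, the "treat all" reference uses the in-hospital mortality of this cohort (93 of 314); the function names and the toy predictions are hypothetical, and the "treat none" strategy has a net benefit of zero by definition.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of intervening on patients with predicted risk >= threshold."""
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(prevalence, threshold):
    """Net benefit when every patient is treated as high risk."""
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

prevalence = 93 / 314  # in-hospital mortality reported in this study
for pt in (0.05, 0.45, 0.85):
    print(pt, round(net_benefit_treat_all(prevalence, pt), 4))

# Toy model predictions (hypothetical), evaluated at a 50% threshold.
print(net_benefit([1, 1, 0, 0], [0.9, 0.6, 0.7, 0.2], 0.5))
```

A model is clinically useful at a given threshold when its net benefit lies above both reference strategies, which is the comparison summarized for the 5 to 85% range.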
For example, a 50-year-old male patient diagnosed with large hemispheric stroke was admitted to the Emergency Room with an NIHSS score of 15, a collateral score of 2, an NLR of 20, requiring mechanical ventilation, presenting with a non-ischemic territory lesion, and a midline shift of 6 mm. Using the Nomogram, the points assigned for each parameter are as follows: NIHSS score of 15: 35 points; Collateral score of 2: 10 points; NLR of 20: 18 points; Ventilation: 45 points; Non-Ischemic Territory Lesion: 0 points; Midline Shift of 6 mm: 12 points.
Summing these points, the total is 120. According to the Nomogram, this total corresponds to an all-cause in-hospital mortality probability of approximately 60%.
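The arithmetic of this worked example can be checked with a short sketch. The point values are those quoted in the example; the dictionary keys are illustrative labels, and the final conversion from total points to a probability is read from the nomogram scale rather than computed here.

```python
# Points read from the nomogram for the example patient.
points = {
    "NIHSS score of 15": 35,
    "Collateral score of 2": 10,
    "NLR of 20": 18,
    "Mechanical ventilation": 45,
    "Non-ischemic territory lesion": 0,
    "Midline shift of 6 mm": 12,
}

total_points = sum(points.values())
print(f"Total points: {total_points}")
```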

Models' comparison
Assessed with the NRI and IDI metrics, our model outperformed the original model from the previous study (5).

Discussion
LHI imposes a substantial burden on both patients and their family members due to its high mortality. Identifying potential risk factors for in-hospital mortality at an early stage is essential for both patients and clinicians. In our study, the mortality of patients with LHI is 29.6%, which is comparatively low (1). Several factors contribute to this lower mortality rate. First, decompressive hemicraniectomy was performed in some of our patients, leading to a reduction in mortality (8). Second, the participants in our study were elderly, with an average age of 75.67 years. Elderly patients may exhibit brain atrophy, which allows more space for brain swelling.
A previous study reported that MBE, pulmonary infection, and hypoalbuminemia are independently associated with a 3-month unfavorable outcome in patients with right-sided large hemisphere infarction (RLHI) (9). Additionally, admission NIHSS > 20 and mechanical ventilation within 48 h of admission were independently associated with a poor outcome in very elderly patients with LHI who received medical management only (10). In our study, we found that MLS, ventilation, NLR, NIHSS, collateral score, and involvement of non-MCA territory could predict in-hospital mortality in LHI patients (Figure 1). We have identified MLS as a crucial independent factor associated with poor outcome in LHI patients. MLS typically indicates the presence of malignant brain edema (MBE), a significant contributor to early mortality. MBE can lead to irreversible tissue damage, inadequate cerebral blood flow, an impaired blood-brain barrier (BBB), elevated intracranial pressure, and brain herniation (11). Li et al. (9) reported that MBE is independently linked to unfavorable outcome in patients with RLHI. Thus, early detection of brain edema is imperative.
In a previous study (6) focused on predicting the risk of MCE after acute LHI involving the anterior circulation, eight independent predictors were identified: GCS score, NIHSS score, ASPECTS, monocyte count, WBC count, HbA1c level, history of hypertension, and history of atrial fibrillation. While this study shed light on the predictive mechanisms of LHI progression to MCE, it did not explore risk prediction for in-hospital mortality in patients who did not undergo DHC. To manage elevated intracranial pressure resulting from cerebral edema, hyperosmolar agents such as hypertonic saline and mannitol are commonly employed. However, these therapies require an intact BBB to exert their osmotic effects and may not be effective at the site of edema (12). DHC has been advocated as an effective treatment to reduce MBE-related mortality, but its criteria are stringent. For instance, very elderly patients may not be candidates for this procedure (10). The GAMES-RP trial (13) demonstrated that glibenclamide, when compared to a placebo, significantly reduced mortality at 30 days, although no difference in mortality rates was observed (13) between days 7 and 90. Therefore, early detection and treatment of brain edema are vital steps to reduce mortality among patients suffering from large-scale cerebral infarctions in the future.

FIGURE 1
The nomogram for predicting in-hospital mortality in patients with LHI. LHI, large hemisphere infarction.

Frontiers in Neurology, frontiersin.org, 10.3389/fneur.2024.1398142
In our study, we also found that mechanical ventilation was another independent predictor for mortality, consistent with previous research (14,15) highlighting a high mortality rate among patients requiring artificial ventilation. Zhang et al. (16) demonstrated that mechanical ventilation represented an independent risk factor for in-hospital mortality in acute stroke patients (15). On one hand, patients with LHI usually require mechanical ventilation due to severe brain damage, swallowing dysfunction, and impaired consciousness. On the other hand, serious complications such as pneumonia and sepsis may result in respiratory failure, necessitating mechanical ventilation. Although the exact mechanism remains unclear, it appears that mechanical ventilation is the underlying factor contributing to this discrepancy. This is attributed to the fact that mechanical ventilation increases susceptibility to lung infections, ventilator-associated pulmonary injuries, respiratory failure, and additional complications, all of which can substantially heighten the risk of mortality (17).
In the last few years, inflammation has emerged as a pivotal factor in predicting the prognosis of cerebral infarction. In our study, we identified that the NLR was associated with poor outcome in patients with LHI, consistent with a previous study in which NLR was a strong potential predictor of clinical outcomes (18). That study also suggested that NLR can be an indicator of brain edema and death in individuals suffering from a large-scale cerebral infarction. Ji et al. (19) also reported that NLR had exceptional predictive ability for in-hospital mortality following acute myocardial infarction. It appears that inflammatory factors can infiltrate ischemia-damaged tissues, including myocardial and brain tissues, and exert their effects. A prior study (18) postulated that cerebral edema might constitute a crucial mechanism linking systemic inflammation to secondary brain injury and stroke morbidity, a hypothesis with which we concur. Stroke initiates an early disruption of the blood-brain barrier (BBB), permitting the infiltration of peripheral immune cells into injured tissues (20). Inflammatory cells in infarcted regions rapidly aggregate within a few hours, occluding the microvascular network, reducing microvessel blood flow, and exacerbating tissue damage (21). By triggering local inflammation, this process may worsen the existing endothelial damage, thus resulting in further brain injury due to cerebral edema. An experiment revealed that when a vessel is blocked, neutrophils quickly accumulate in the downstream microcirculation veins. This phenomenon, known as downstream microvascular thromboinflammation (DMT), is intensified by neutrophil activation (22). A recent MRI study (23) in mice has lent credibility to the idea that DMT could exacerbate ischemic damage and disrupt the BBB, thereby potentially leading to hemorrhagic transformation. Both studies suggested that specific substances released by inflammatory cells, such as oxygen species, proteases, cytokines, and chemokines, could increase neuronal death. Boisseau et al. (24) found that neutrophil count predicts poor outcomes after endovascular therapy. Cui et al. (25) also demonstrated that early peripheral neutrophil count after stroke correlates with infarct size and fatal outcome in LHI patients. Thus, the control of inflammatory cell aggregation and the subsequent inflammatory reactions is imperative in reducing brain edema, preventing hemorrhagic transformation, and decreasing mortality in patients with LHI.
The NIHSS score is a valuable tool for diagnosing and treating clinical cerebral infarction. It assesses the degree of functional impairment in patients with cerebral infarction, including aspects like speech, consciousness, and limb activity. Generally, a higher score indicates a more severe condition. Previous investigations have shown that NIHSS scores can be utilized to predict the onset of brain edema (26). It has been observed that a high NIHSS score is associated with an unfavorable clinical outcome (9). Consistent with previous studies, our research reveals a correlation between the NIHSS score and mortality. Therefore, healthcare professionals should pay particular attention to patients with higher NIHSS scores.
The Collateral Score (CS) and infarctions involving Non-MCA Perfusion Territories have both demonstrated associations with the mortality of LHI patients. Collateral circulation represents an existing vascular pathway that can supply blood to target tissue in the event of blockages in the primary vascular channels (29). Consequently, collateral flow is an effective mechanism for augmenting blood supply to protect the neurons in ischemic areas. When collateral flow is insufficient, irreversible neuronal damage can occur in a matter of minutes (30). Collateral circulation is integral to the development of cerebral ischemia, albeit challenging to measure due to its intricate and narrow pathways (31). The CS serves as a fundamental yet reliable method for evaluating collateral supply and its correlation with smaller infarct volumes. Non-middle cerebral artery infarction usually implies the involvement of other blood vessels, such as the anterior or posterior cerebral artery regions. A previous study (32) established that anterior cerebral artery involvement is an independent predictor for malignant cerebral edema in LHI patients.
In recent years, the nomogram has become a widely used visual tool for Logistic regression models, enabling clear and concise risk prediction for binary classification tasks. An earlier article by Sun (5) reported a similar study and found that age, NLR, and MLS are risk factors for mortality, which aligns with our findings. However, our study offers three distinct advantages in comparison to the prior research. Firstly, our research has a larger sample size and a more comprehensive population representation than the previous investigation. The earlier study had a sample size of 158 participants, ranging in age from 53 to 71 years. In contrast, our study encompassed 314 individuals, aged between 32 and 104 years. Secondly, our research has identified a broader range of predictive factors that can be easily accessed from routine clinical practice, thus enhancing their practical utility. Previous research has highlighted the challenges in obtaining predictive factors, such as the requirement for logarithmic values in the case of NLR. Thirdly, in contrast to prior research, this study utilized a random forest-based artificial intelligence method known as Boruta for variable selection. Unlike the conventional approach employed in the study mentioned above, which involved a sequential process of univariate regression followed by multivariate regression, our method accounts for interactions between variables, resulting in more robust outcomes.
Despite our best efforts to mitigate potential constraints, for example by applying the Boruta algorithm (7), an AI technique that played a key role in variable selection, our study still has certain limitations. Firstly, it is important to acknowledge that the study is based on a single-center sample. Secondly, while our prediction tool demonstrated high performance, it has only been internally validated. Future research should prioritize the investigation of subgroups and their clinical implications to enhance the generalizability of our findings.
In conclusion, our research demonstrated that MLS, ventilation, NLR, NIHSS, collateral score, and non-MCA territory involvement are reliable indicators for predicting in-hospital mortality in LHI patients. Our prediction model can serve as a useful tool for clinicians in guiding appropriate treatment strategies for patients with ischemic stroke.
The limitations of our Nomogram and future developments include:
(1) Need for external validation: our model requires validation with larger and more diverse external datasets to assess its generalizability. This includes data from populations undergoing vascular recanalization treatments.
(2) Existing scores: there are existing EDEMA scores developed in South Korea, the USA, and China (36). Although these scores need improvement in terms of generalizability, especially in populations without access to endovascular treatment, they provide a foundation for further research.
(3) Development of a standardized score: current models, such as the MBE, EDEMA, and modified versions of EDEMA in China, focus on different aspects, like the weight of imaging indicators. There is a need to develop a standardized, globally applicable scoring system for EDEMA and its complications, suitable for acute stroke patients of all ethnicities.

Conclusion
Our study has successfully employed NIHSS score, Collateral Score, NLR, mechanical ventilation, non-MCA territorial involvement, and midline shift to construct a highly accurate, user-friendly nomogram for in-hospital mortality prediction in LHI patients. This nomogram provides valuable reference information for future investigations into the prognosis of LHI patients and death prevention. Furthermore, it addresses the shortcomings of previous research.

FIGURE 2
ROC-AUC curves of the development and validation groups. ROC, receiver operating characteristic curve; AUC, area under the curve.

FIGURE 3
Calibration curve of the nomogram in the validation set.

Jo et al. (26) demonstrated that CS was independently associated with malignant brain edema. Other research (27) similarly found that poor collateral status independently predicts malignant infarction in patients receiving endovascular therapy. In a study conducted by Elijovich et al. (28), a favorable collateral status was associated with smaller infarct volumes and improved clinical outcomes in patients who underwent endovascular recanalization.

FIGURE 4
Decision curve analysis of the nomogram model.

TABLE 1
Demographic characteristics and clinical information of the study patients (n = 314).

The previous model (5), which included solely MLS, age, and the logarithmically transformed NLR, is insufficiently robust and excessively simple. Compared with that model, our model achieved a continuous NRI of 1.189 (95% CI [0.9588-1.4191]; p-value < 0.001) and an IDI of 0.2951 (95% CI [0.225-0.3651]; p-value < 0.001).

TABLE 2
Univariate and multivariate logistic regression of the six predictors.

(34)bvash et al. (33) similarly demonstrated that anterior extension of LHI, which involves the ACA territory and ACA-MCA border zone, independently predicts poor functional outcomes in LHI patients. When the ACA territory is affected in LHI patients, it likely indicates a larger infarction or an occlusion closer to the internal carotid artery, which is often accompanied by reduced hemispheric collateral flow and the presence of edematous brain tissue (34). Nevertheless, alternative perspectives have been presented in some articles. Kürten et al. (35) suggested that patients with infarctions extending beyond the MCA territory have similar likelihoods of positive outcomes as those with solely MCA infarction. Further research is needed on these topics.