Predicting the recurrence of spontaneous intracerebral hemorrhage using a machine learning model

Background Recurrence can worsen conditions and increase mortality in ICH patients. Predicting the recurrence risk and preventing or treating these patients is a rational strategy to improve outcomes potentially. A machine learning model with improved performance is necessary to predict recurrence. Methods We collected data from ICH patients in two hospitals for our retrospective training cohort and prospective testing cohort. The outcome was the recurrence within one year. We constructed logistic regression, support vector machine (SVM), decision trees, Voting Classifier, random forest, and XGBoost models for prediction. Results The model included age, NIHSS score at discharge, hematoma volume at admission and discharge, PLT, AST, and CRP levels at admission, use of hypotensive drugs and history of stroke. In internal validation, logistic regression demonstrated an AUC of 0.89 and precision of 0.81, SVM showed an AUC of 0.93 and precision of 0.90, the random forest achieved an AUC of 0.95 and precision of 0.93, and XGBoost scored an AUC of 0.95 and precision of 0.92. In external validation, logistic regression achieved an AUC of 0.81 and precision of 0.79, SVM obtained an AUC of 0.87 and precision of 0.76, the random forest reached an AUC of 0.92 and precision of 0.86, and XGBoost recorded an AUC of 0.93 and precision of 0.91. Conclusion The machine learning models performed better in predicting ICH recurrence than traditional statistical models. The XGBoost model demonstrated the best comprehensive performance for predicting ICH recurrence in the external testing cohort.


Introduction
Patients with intracerebral hemorrhage (ICH) have higher rates of disability and mortality compared to those with ischemic stroke (1,2).Factors such as recurrence can lead to worsening conditions and increased mortality in ICH patients (3).The recurrence rate in ICH patients is approximately 2-10% (3).Consequently, predicting the recurrence risk and implementing preventive or therapeutic measures for patients at higher risk of recurrence is a rational strategy to improve outcomes potentially (3,4).Currently, there are a few studies on predicting the recurrence of ICH.One predictive model, which included 38 patients with recurrent ICH, demonstrated an AUC of 0.802 (5).However, the limited number of outcome events could lead to the model's performance instability.Additionally, the model lacked an external cohort for validation, which is crucial for assessing its generalizability.Other recurrence model had a similar condition.Therefore, a model that includes more outcome events and an external validation cohort is necessary to predict ICH recurrence more accurately.
Numerous risk factors can influence the recurrence rate (6).For example, the patient's condition, lobar cerebral hemorrhages or deep subcortical intracerebral hemorrhages, amount of bleeding, changes in the amount of bleeding, and medication status after discharge should be documented (6).These factors, with their intricate correlations, also impact the accuracy of predictions (6).All these aspects affect the performance of traditional statistical models.Machine learning models could appropriately address these challenges (7,8).Two studies constructed machine learning models to predict outcomes in ICH patients, outperforming traditional statistical models and scores (9,10).One study developed a machine learning model to predict the occurrence of seizures in ICH patients, which also performed well (11).
In this study, we aim to construct a machine-learning model to predict the recurrence of ICH.The model incorporates a larger cohort of patients and more outcome events.Furthermore, we validated the model's generalizability using an external cohort.

Study subjects
The training cohort of ICH patients was drawn from neurosurgery, neurology, and rehabilitation departments at the affiliated hospitals Youjiang Medical University for Nationalities.The inclusion period was from January 1, 2018, to January 1, 2021, with follow-up extending to January 1, 2022.This was a retrospective cohort.The testing cohort of ICH patients originated from the affiliated Baidong hospitals of Youjiang Medical University for Nationalities, from January 1, 2021, to January 1, 2022, and follow-up until January 1, 2023.This was a prospective cohort.
Inclusion criteria: 1. Aged 18 years or older who underwent head CT examinations and met the WHO's diagnostic criteria for ICH. 2. Standard medical or surgical treatment was administered to all patients.3. First ICH patients.
Exclusion criteria: 1. Traumatic cerebral hemorrhage, ICH due to venous sinus thrombosis, metastatic lesions, or underlying vascular lesions.2. Recurrence of intracerebral hemorrhage secondary to infection or hemorrhagic cerebral infarction excluded.3. The patient died during hospitalization.
The study was conducted by the Declaration of Helsinki and the ethical standards of the institutional and/or national research committee.The study received approval from the Ethics Committee of the Affiliated Hospital of Youjiang Medical University for Nationalities.Informed consent was obtained from all study participants or their surrogates.Patients who could not give consent and those without a legally appointed representative were excluded from our study.

Risk factors and outcomes
Patient data were collected from electronic medical records, including demographic information such as age, gender, and nationality.Vital signs, including heart rate and blood pressure, were recorded at admission.Comorbidities, such as renal insufficiency, epilepsy, pneumonia, and the location of ICH, were also documented.Laboratory data, including PLT (blood platelet), INR (international normalized ratio), ALT (glutamic-pyruvic transaminase), AST (glutamic oxalacetic transaminase hemoglobin), LDL-C (low-density lipoprotein), CRP (C Reactive Protein), and others, were collected.Medical histories and medication profiles were obtained using structured questionnaires completed by patients or their relatives.
Clinical assessments, such as NIHSS (National Institute of Health stroke scale) scores at admission and discharge, mRS (Modified Rankin Scale) scores at admission and discharge, GCS (Glasgow Coma Scale) scores at admission, and ADL (activity of daily living) scores at admission, were conducted.Hematoma volumes were measured using the ABC/2 method and evaluated by the Alice software (PAREXEL International, Waltham, MA, United States) (12).Two experienced neurologists blinded to the patient's conditions and outcomes performed these evaluations at admission and discharge.
The outcome was the recurrence of ICH within 1 year post-onset.We follow up with each patient until 1 year after the onset of the disease, and document any occurrences of recurrent intracerebral hemorrhage or death within the year.Head CT examinations confirmed recurrent cases according to the WHO's ICH diagnostic criteria.The neurologists evaluated these data independently of the patient's other conditions and baseline data.
The training and testing cohorts were subjected to the same evaluation and inclusion criteria, and the uniform process collected data for both cohorts.Both cohorts shared similar baseline characteristics.All laboratory data adhered to a uniform standard for normal values.

Availability of data and material
Data from the study are available from the corresponding author upon.

Statistical analysis and machine learning model
After gathering and processing data from all patients, we performed statistical and machine learning modeling.
The number of missing features constituted less than one-fifth of the total variables in the cohort.We used multiple imputations to address missing data.Continuous variables were not converted into categorical variables.All data from the final cohort were included in the model.Statistical analyses were performed using SPSS 23.0 for

Baseline characteristics between cohorts
We presented normally distributed continuous variables as mean ± SD (standard deviation) and non-normally distributed as median and frequencies.These data were compared using a t-test for normally distributed variables (such as blood pressure, heart rate, laboratory data, etc.) or a Mann-Whitney U test for non-normally distributed variables (such as NIHSS, ADL, GCS, etc.) between the two cohorts.Categorical variables (such as gender, history of disease, history of medication, etc.) or ranked variables (such as mRS, occupation, nationality, etc.) were expressed as numbers and percentages.These data were compared using the chi-square test.

Data pre-processing
To mitigate the significant impact of class imbalance on machine learning performance, we applied random under-sampling (RUS) technique to balance the data (Scikit-learn library in Python).To guarantee fairness in the under-sampling process and unbiased generalization capabilities of the model, we performed 200 repeated experiments with distinct random under-sampling in each, coupled with ten-fold cross-validation to examine the results' stability.

Feature selection
Our data included 70 variables.We conducted feature selection using various methods to identify relevant variables for the model.We selected relevant features through Lasso regression and a step-bystep recursive procedure (Boruta library in Python).When the p-value of the feature is less than 0.05, the feature is included in the model.Additional features were identified and included based on clinical guidelines and literature (4-6).

Construction and internal validation of the model
We choose commonly used and well-performing basic machine learning algorithms (logistic regression, support vector machine, and decision trees) and integrated machine learning algorithms (Voting Classifier model, random forest model, and XGBoost model) that have shown good performance in previous medical research to build the model (7-11).
We constructed three fundamental machine learning algorithms: logistic regression, support vector machine (SVM), and decision trees (sklearn library in Python).We employed a five-fold cross-validation (4:1) for randomly splitting the training cohort's data.Subsequently, we performed grid search and cross-validation to optimize parameters (sklearn library in Python).We also calculated the ROC and precision values and depicted the model performance using ROC and precision plots (sklearn library in Python).
We further constructed integrated algorithm models based on three fundamental algorithms.The integrated algorithm included a Voting Classifier model, random forest model, and XGBoost model (sklearn library in Python).We calculated the F1 score for each integrated algorithm model and comprehensively compared the performance of each model through AUC, precision values, and F1 scores.We evaluated whether these integrated algorithms improved the performance of the fundamental algorithm models and selected the model with the best performance.

External validation of the model
We addressed outliers and class imbalance in the testing cohort data similarly to the training cohort data.We then predicted the recurrence of ICH in the testing cohort data (without outcome events) using the selected model from the previous step.The actual outcome events were used to validate the model's performance.We demonstrated the model's external performance and generalization ability through AUC and prediction accuracy.

Model visualization
To further elucidate the selected model, we visualized the model using feature importance and individual prediction.We depicted the importance of each feature in the final model with bar charts (SHAP library in Python).We illustrated the effect of each feature on individual patient predictions using visual representations (LIME library in Python).

Baseline characteristics
Initially, 1,133 patients were eligible for the training cohort; however, we excluded 129 patients due to loss of follow-up, missing data, or withdrawal from the study.In the end, the training cohort included 1,024 patients, with 114 experiencing ICH recurrence, and 203 patients who passed away (176 due to neurological causes and 27 due to non-neurological causes).The training cohort had 53 patients with missing data, which we addressed through multiple imputations.The mean age of the training cohort was 63.50 ± 13.550 years, 43.3% of whom were female (443 patients).After excluding 24 patients for the same reasons as in the training cohort, the test cohort comprised 265 patients, with 31 experiencing ICH recurrence, and 41 patients who passed away (36 due to neurological causes and 5 due to non-neurological causes).Similarly, 11 patients in the test cohort had missing data, which we addressed through multiple imputations.The mean age of the test cohort was 59.82 ± 13.634 years, 40.4% of whom were female (107 patients) (Supplementary Figure S1).
In the training cohort, patients with recurrence were less likely to take hypotensive drugs post-onset and had less well-controlled blood pressure.Additionally, patients with recurrence were older and had greater hematoma volume at admission and discharge, higher NIHSS and ADL scores, and higher PLT levels at admission.Other features showed no significant differences (Table 1).
In the test cohort, patients with recurrence had higher NIHSS and ADL scores at discharge and greater hematoma volume at admission and discharge.Other features showed no significant differences (Table 2).Both cohorts had a similar distribution in baseline characteristics.
To compare the baseline data between the training and test cohorts, patients in the training cohort were older than those in the test cohort.Other baseline data showed no significant difference between the two cohorts (Table 3).

Internal validation performance of models
For the basic machine learning models, internal validation revealed that the AUC for the logistic regression model was 0.89, and the precision was 0.81, the positive predictive value (PPV) was 87.3%, and negative predictive value (NPV) was 76.0%.The SVM model had an AUC of 0.93 and a precision of 0.90, a PPV of 94.2% and a NPV of 88.5%, while the decision trees model showed an AUC of 0.91 and a precision of 0.71, a PPV of 79.0% and a NPV of 81.5%.All three models demonstrated excellent discrimination, with the logistic regression and SVM models also exhibiting excellent calibration.
For the integrated machine learning models, the score for the Logistic Regression model was 0.951, for the Decision Tree Classifier was 0.962, and for the SVC model was 0.883.In the Voting Classifier model, the score for the soft voting method was 0.928, with an AUC and precision of 0.96 and 0.86, a PPV of 94.2% and a NPV of 89.4%, respectively.The random forest model scored 0.966 and exhibited an AUC and precision of 0.95 and 0.93, a PPV of 95.0% and a NPV of 92.3%, respectively.The XGBoost model scored 0.977, with an AUC and precision of 0.95 and 0.92, a PPV of 94.6% and a NPV of 93.1%, respectively.All three integrated machine learning models exhibited excellent discrimination, and the latter two models demonstrated excellent calibration.The random forest and XGBoost algorithms improved the basic models' performance.

External validation performance of models
External validation for the basic machine learning models showed that the AUC for the logistic regression model was 0.81 (Supplementary Figure S3a-1), with a precision of 0.79 (Supplementary Figure S3a-2).The SVM model had an AUC of 0.87 (Supplementary Figure S3b-1) and a precision of 0.76 (Supplementary Figure S3b-2), and the decision trees model had an AUC of 0.71 (Supplementary Figure S3c-1) and a precision of 0.70 (Supplementary Figure S3c-2).
For the integrated machine learning models, the random forest model exhibited an AUC of 0.92 (Supplementary Figure S4a-1) and a precision of 0.86 (Supplementary Figure S4a-2), while the XGBoost model had an AUC of 0.93 (Supplementary Figure S4b-2) and a precision of 0.91 (Supplementary Figure S4b-2).The random forest model had a PPV of 90.0% and a NPV of 85.2%, and the XGBoost model had a PPV of 92.5% and a NPV of 93.2%.The integrated machine learning algorithms still demonstrated excellent discrimination and calibration in the test cohort.

Model visualization
In the feature importance figure of the XGBoost model, we found that the value of the hematoma and NIHSS scores at discharge had the most significant effect on the model.Then, the age, the value of PLT and AST, and the value of hematoma at admission also affected the model differently (Figure 1).In the feature importance figure of the random forest model, we found that the value of hematoma at discharge had the best significant effect on the model.Then, the NIHSS score at discharge, the value of hematoma at admission, age, and the value of PLT and AST also affected the model differently (Supplementary Figure S2).
Figure 2 shows the process of the XGBoost model to predict the recurrence rate in individual ICH patients.First, the XGBoost model calculated the score through the difference in volume extent of hematoma and NIHSS score, age, PLT, and AST.Then, the model calculated the possibility of ICH recurrence in the individual patients.For the selected patients, the model accurately predicts the patients would have recurrent ICH with a 100% possibility.

FIGURE 1
The importance of the feature plot of the XGBoost model.2-1: the importance of features for predicting in the total model; 2-2: the importance of features for predicting in each patient of the model.

Discussion
The study constructs machine learning models to predict the recurrence of ICH.We evaluated six machine learning models.The models included age, NIHSS score at discharge, and hematoma volume at admission and discharge.The SVM and decision trees were performed better than logistic regression model in the training cohort.The integrated machine learning models performed better than the basic models in the training cohort.In the test cohort, the AUC and precision of the basic models all decreased to less than 0.9.However, the integrated models, such as the random forest and XGBoost models, still performed better.Notably, the XGBoost model performed excellently in the testing cohort, with an AUC and precision greater than 0.9.
We performed multiple imputation on data with a small number of missing features, and data with more than 30% missing features were not included in the analysis.Less than 1% of the data was not included in the analysis, and it did not significantly affect the result analysis.We standardized all the data included in our model; thus, the data distribution did not significantly affect the results.
For selected features, the NIHSS score at discharge had a highly significant impact on the model.Their importance was ranked second and third in the final random forest and XGBoost models.The neurological deficit at discharge contained more information for predicting long-term events.The discharge score at least reflects the illness's initial state and therapy's effect.In other study, the discharge score had a more pronounced relationship with patient outcomes (5, [9][10][11].The hematoma volume also had a similar effect.Our model included hematoma volume at admission and discharge, allowing it to evaluate the changing situation throughout hospitalization.These changes could more accurately reflect the patients' conditions.This was similar to other models, including the GCS score at admission and discharge (5).The hematoma volume consistently ranked among the top three in our final random forest and XGBoost models, significantly improving our model's performance.
Our model also included age and PLT levels.It is plausible that older age and lower PLT levels could increase the risk of ICH recurrence, which is consistent with other studies (13).Our model also indicated that AST and CRP levels affect the risk of recurrence.Poor liver function and infection can affect the state of the illness (14,15).However, these effects require further investigation.Our model includes a history of stroke and the use of hypotensive drugs.Although these two features did not significantly affect our model performance, these clinically significant features could improve the model's generalizability.Therefore, our model could perform better in different region cohorts.An acute intraparenchymal hematoma in a young patient can be the presenting manifestation of a hematological disorder (16).The hematologic disorders from other stroke etiologies that have a different treatment approach and outcome (16).For less young patients in our cohort, a model included more young patients was more suit for these patients with hematological disorder.
The recurrence of ICH can be influenced by numerous risk factors that vary over time.These factors exhibit a complex interrelationship that traditional statistical models struggle to represent.Machine learning models are more adept at processing these complex characteristics.Consequently, machine learning models have the potential to identify novel predictors with enhanced predictive value.They demonstrated excellent performance in the training cohort due to the abovementioned advantages.However, machine learning models face challenges when applied to external datasets.In other words, they may exhibit poorer generalization performance.The overall performance of our model decreased in the test cohort.Nevertheless, the integrated model exhibited superior generalization capabilities, maintaining its excellent performance even within the test cohort.Consequently, the integrated model could hold significant clinical value across various regions.
Another common challenge with machine learning models is the interpretability of their outcomes.The SHAP library is widely used for global interpretation of model outcomes, revealing the significance of each model feature (17).This means the library elucidates the impact and contribution of each feature to the predictive results and displays each feature's value within the model.The LIME library, which stands Local interpretable model-agnostic explanation (LIME) with XGBoost model applied to one correctly predicted patient that ICH recurrence within 1 year.This study faced several limitations.First, the training cohort might be susceptible to bias owing to its retrospective design.Second, the training and testing cohorts, which originated from the same geographical region, had a similar feature distribution.Consequently, it becomes imperative to validate the model's performance across diverse regional patient populations further.Third, our model did not incorporate neuroimaging data; including a broader array of data types could potentially enhance model performance.

Conclusion
The machine learning model demonstrated excellent performance in predicting the recurrence of ICH compared to traditional statistical models.The XGBoost models with hematoma volume at admission and discharge outperformed the external test cohort.The change situation in hematoma volume improves the performance of model.The XGBoost model exhibited the best discrimination and calibration in predicting the recurrence of ICH.The XGBoost model offered excellent predictive value for guiding medical care in patients at a higher risk of ICH recurrence.
The use of this XGBoost model, along with other machine learning models for predicting recurrent intracerebral hemorrhage, can improve the accuracy of recurrent intracerebral hemorrhage prediction.It guides the formulation of personalized treatment and prevention plans in clinical medicine.The next step involves validating the model's applicability in other regions and ethnicities to confirm its practicality and universality.

TABLE 1
Through random under-sampling and cross-validation, the model performance has improved by approximately 10%.Through Lasso regression and a stepwise process, we identified several relevant variables for the model, including age, NIHSS score at Baseline characteristic of training cohort.* was calculated by ANOVA, Chi-square test, or Mann-Whitney U test as appropriate.NIHSS, National Institute of Health stroke scale; ADL, activity of daily living; AST, glutamic oxalacetic transaminase; CRP, C-reactive protein. p

TABLE 2
Baseline characteristic of testing cohort.
p* was calculated by ANOVA, Chi-square test, or Mann-Whitney U test as appropriate.NIHSS, National Institute of Health stroke scale; ADL, activity of daily living; AST, glutamic oxalacetic transaminase; CRP, C-reactive protein.

TABLE 3
Baseline characteristic between training and testing cohort.
p* was calculated by ANOVA, Chi-square test, or Mann-Whitney U test as appropriate.NIHSS, National Institute of Health stroke scale; ADL, activity of daily living; AST, glutamic oxalacetic transaminase; CRP, C-reactive protein.
Neurology 07 frontiersin.orgfor Local Interpretable Model-Agnostic Explanations, can demonstrate the model's prediction process for individual samples (18).Consequently, LIME visually illustrates how the model predicts the likelihood of outcome events based on the value of each feature.These two libraries enhance our model's interpretability, demonstrating practical value and superior performance in clinical settings.