ORIGINAL RESEARCH article

Front. Neurol., 07 January 2026

Sec. Artificial Intelligence in Neurology

Volume 16 - 2025 | https://doi.org/10.3389/fneur.2025.1691549

This article is part of the Research Topic: Precision Medicine in Neurocritical Care

Development of an interpretable machine learning model for predicting venous thromboembolism in intensive care unit patients with intracerebral hemorrhage

Menghui He1, Wenyan Liu1, Zhongsheng Lu2*, Yiwei Lv1, Qiang Zhang2, Xiaoqing Jin2, Pei Han2
  • 1Department of Graduate School, Qinghai University, Xining, China
  • 2Department of Neurosurgery, Qinghai Provincial People’s Hospital, Xining, China

Background: Venous thromboembolism (VTE) is a frequent and potentially life-threatening complication in patients with intracerebral hemorrhage (ICH) in intensive care units (ICU). However, the necessity of prophylactic anticoagulation therapy for these patients remains controversial. This study aims to develop an interpretable machine learning (ML) model to accurately predict the risk of VTE in critically ill ICH patients, thereby enabling timely and individualized preventive measures.

Methods: A retrospective analysis was performed on clinical data from the MIMIC-IV database and ICU patients diagnosed with ICH at Qinghai Provincial People’s Hospital. After data preprocessing, 1,545 cases from the MIMIC-IV database were randomly divided into a training set (1,097 cases) and a test set (448 cases) in a 7:3 ratio. Data from 151 ICH patients treated in the ICU of Qinghai Provincial People’s Hospital between January 2020 and December 2024 were utilized as an external validation set. The Least Absolute Shrinkage and Selection Operator (LASSO) algorithm was applied for feature selection. Model performance was assessed using metrics including the area under the curve (AUC), decision curve analysis (DCA), accuracy, positive predictive value (PPV), and negative predictive value (NPV). The optimal model was further explained using the SHapley Additive exPlanations (SHAP) method.

Results: The XGBoost model exhibited the best predictive performance, with AUC values of 0.936, 0.778, and 0.761 for the training set, test set, and external validation set, respectively. Feature importance analysis identified the top 10 influential features as follows: ICU stay duration, age, prothrombin time, triglycerides, albumin, body mass index, partial thromboplastin time, blood glucose, white blood cell count, and systolic blood pressure.

Conclusion: The XGBoost model accurately predicts VTE occurrence in ICH patients in the ICU. By employing the SHAP method, it is possible to precisely assess the impact of various pathophysiological parameters on individual patient predictions, thereby providing robust support for personalized risk stratification and preventive treatment.

1 Introduction

Patients with intracerebral hemorrhage (ICH) are frequently at risk of venous thromboembolism (VTE)—a severe complication that primarily presents as deep vein thrombosis (DVT) or pulmonary embolism (PE) (1). Studies indicate that approximately 3% of ICH patients develop DVT or PE, with the incidence of symptomatic VTE ranging between 1 and 10% (2). Beyond prolonging hospital stays and increasing healthcare costs, VTE also significantly elevates mortality risk (3). Prophylactic anticoagulation is an established effective strategy to reduce VTE incidence (4). Despite the critical need for VTE prevention in ICH patients, a significant clinical conflict exists between anticoagulation and hemostatic therapies. Clinicians must consider not only the risk of fatal VTE in the absence of anticoagulation but also the risk of intracranial hematoma expansion potentially induced by anticoagulation (5). Balancing these dual therapeutic priorities and developing a scientifically sound intervention strategy for this patient population has emerged as a critical, unmet challenge in clinical practice.

Precise, adaptable assessment tools are critical for the early identification of severe intracerebral hemorrhage (ICH) patients at risk of venous thromboembolism (VTE). Relative to traditional approaches, machine learning (ML) algorithms can discern latent associations and patterns within complex medical datasets. Through the learning and analysis of large-scale, complex datasets, such algorithms contribute substantially to clinical disease diagnosis and outcome assessment (6). In recent years, the application of ML in clinical medicine has grown increasingly prevalent; specifically, ML is employed to quantify risks, identify predictive factors, and develop high-precision diagnostic and prognostic models (7).

This study employed five machine learning models, including XGBoost, Logistic Regression (LR), Random Forest (RF), K-Nearest Neighbors (KNN), and Support Vector Machine (SVM), to construct a predictive framework for the risk of VTE in severe ICH patients. The research aims to thoroughly explore the risk factors for VTE occurrence through interpretable machine learning methods, providing a scientific basis for clinical decision-making.

2 Materials and methods

2.1 Data source

The data for this study were sourced from the Medical Information Mart for Intensive Care IV (MIMIC-IV) (Record ID: 66983242) and from patients with ICH admitted to the ICU of Qinghai Provincial People’s Hospital. Specifically, the data extracted from the MIMIC-IV database were randomly divided into a training set (70%) and a test set (30%). The data collected from the ICU of Qinghai Provincial People’s Hospital were used as an external validation set.

MIMIC-IV is a publicly accessible intensive care database that comprises clinical data from inpatients at Beth Israel Deaconess Medical Center (BIDMC) between 2008 and 2019 (8). This database systematically documents patients’ vital signs, surgical procedures, laboratory test results, comorbidities, medication records, demographic information, and follow-up survival status. Because patient records in the database are de-identified, informed consent was not required for its use. Additionally, we collected data from patients with ICH admitted to the ICU of Qinghai Provincial People’s Hospital between January 2020 and December 2024. This study was approved by the Research Ethics Committee of Qinghai Provincial People’s Hospital (reference number: 2025–400-01). All participants or their legal guardians were informed and provided consent to participate in this study. Throughout the research process, all methods and procedures were conducted in strict accordance with the principles of the Declaration of Helsinki. Patient data were anonymized, with no patient-identifiable information recorded.

2.2 Study population and target variables

Patients with ICH admitted to the ICU were included in the study cohort. Samples with missing variables exceeding 20% were excluded. The retrieval of ICH patients utilized the diagnostic codes from the International Classification of Diseases, Ninth Revision (ICD-9) and Tenth Revision (ICD-10). The screening process for patient inclusion in this study is illustrated in Figure 1. Exclusion criteria included: (1) age < 18 years; (2) patients not admitted to the ICU for the first time; (3) ICU stay duration less than 24 h; (4) presence of venous thromboembolism at admission; (5) insufficient data on the first day of admission (including triglycerides, blood glucose, prothrombin time, and partial thromboplastin time).

Figure 1
Flowchart showing patient selection for analysis. From 94,458 ICU stays, 3,089 were patients with ICH. Of those, 1,544 were excluded due to factors like age under 18, non-first ICU admission, ICU stay under 24 hours, existing VTE, or insufficient data. This left 1,545 patients for final analysis, divided into 240 with VTE and 1,305 without.

Figure 1. The flow chart of patient selection. The study enrolled 1,545 patients with severe ICH.

VTE is defined as DVT (including upper or lower extremities) and/or PE. This study extracted VTE data based on ICD-9 and ICD-10 codes from the MIMIC-IV database. Additionally, we manually reviewed bilateral limb venous ultrasound, CT venography, or CT pulmonary angiography (CTPA) reports of ICH patients in the intensive care unit of Qinghai Provincial People’s Hospital to confirm the diagnosis of VTE. Imaging examinations are performed as part of the routine screening protocol, and the date of VTE diagnosis is determined by the date of imaging confirmation.

2.3 Data extraction and preprocessing

A total of 41 predictive features were extracted through Structured Query Language (SQL). All laboratory data were collected at the time of hospital admission. The variables fall into four groups: (1) Baseline characteristics: gender, age, BMI (weight/height²), ICU stay duration; (2) Vital signs: heart rate, systolic blood pressure (SBP), diastolic blood pressure (DBP), respiratory rate, body temperature, oxygen saturation (SpO2), Glasgow Coma Scale (GCS), SOFA score; (3) Biochemical indicators (collected within 24 h of admission): blood urea nitrogen (BUN), serum potassium, serum sodium, blood glucose, creatinine level, white blood cell count (WBC), red cell distribution width (RDW), red blood cell count (RBC), platelet count, mean corpuscular volume (MCV), hemoglobin, hematocrit, mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), international normalized ratio (INR), prothrombin time (PT), partial thromboplastin time (PTT), albumin, serum calcium, triglycerides; (4) Complications: VTE (deep vein thrombosis and pulmonary embolism), congestive heart failure (CHF), chronic obstructive pulmonary disease (COPD), renal disease, liver disease, atrial fibrillation, peripheral vascular disease, paraplegia, malignant cancer.

To address the issue of missing data, this study employed Multivariate Imputation techniques for data filling. Specifically, we employed the Multiple Imputation by Chained Equations (MICE) algorithm to achieve this purpose. During the variable preprocessing stage, continuous data were transformed using standardization methods, while categorical variables were factorized. To address the issue of class imbalance, this study employed the Random Over-Sampling Examples (ROSE) method. We implemented this algorithm using the ROSE package in R language, adhering to the package’s default settings where the appropriate sampling ratio is automatically determined based on the degree of dataset imbalance. This approach helps enhance the model’s generalization capability, enabling accurate predictions for both majority and minority classes.
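The class-balancing step can be illustrated with a minimal sketch of a smoothed bootstrap, the idea behind ROSE. This is written in standard-library Python as a stand-in for the ROSE R package used in the study, not the package's code. The function name `rose_like_resample`, the toy data, and the rule-of-thumb Gaussian bandwidth are assumptions made for illustration.

```python
import random
import statistics


def rose_like_resample(rows, labels, n_out, p_minority=0.5, seed=0):
    """Draw n_out synthetic rows with a smoothed bootstrap (illustrative only).

    For each new row: pick a class (minority with probability p_minority),
    pick one observed row of that class, then sample around it with
    Gaussian noise whose scale depends on the class-wise feature spread.
    """
    rng = random.Random(seed)
    classes = sorted(set(labels))
    counts = {c: labels.count(c) for c in classes}
    minority = min(classes, key=lambda c: counts[c])
    majority = max(classes, key=lambda c: counts[c])
    by_class = {c: [r for r, y in zip(rows, labels) if y == c] for c in classes}
    d = len(rows[0])

    # Assumed bandwidth: Gaussian rule of thumb, scaled per class and feature.
    bandwidth = {}
    for c in classes:
        factor = (4.0 / ((d + 2) * counts[c])) ** (1.0 / (d + 4))
        bandwidth[c] = [factor * statistics.pstdev(col) for col in zip(*by_class[c])]

    new_rows, new_labels = [], []
    for _ in range(n_out):
        c = minority if rng.random() < p_minority else majority
        centre = rng.choice(by_class[c])
        new_rows.append([rng.gauss(x, h) for x, h in zip(centre, bandwidth[c])])
        new_labels.append(c)
    return new_rows, new_labels


# Toy data (hypothetical): 20 minority rows and 100 majority rows, 2 features.
toy_rows = [[float(i), float(i % 7)] for i in range(120)]
toy_labels = [1 if i < 20 else 0 for i in range(120)]
balanced_rows, balanced_labels = rose_like_resample(toy_rows, toy_labels, n_out=120)
print(balanced_labels.count(1), balanced_labels.count(0))
```

Because the class of each synthetic row is drawn at random, the resampled set comes out only approximately balanced. This matches the near-even, rather than exactly even, split reported for the balanced training set.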

2.4 Construction of the machine learning model

This study constructed five prediction models using the training dataset: XGBoost (implemented via the xgboost R package v1.7.5.1 in R 4.4.2, using tree-based booster [gbtree] for binary classification), logistic regression (LR), random forest (RF; with fixed random seed of 12,345 and five-fold cross-validation partitioning before bagging), K-nearest neighbors (KNN), and support vector machine (SVM). Model training and hyperparameter optimization were performed through five-fold repeated cross-validation. In this process, XGBoost employs grid search to optimize key parameters and implements an early stopping mechanism (training stops if the validation set AUC improvement is less than 0.001 for 10 consecutive iterations) to prevent overfitting. The test set and external validation set remain completely isolated during the training process, serving solely for final model validation after determination. This approach effectively reduces overfitting risks and ensures independence in model selection and evaluation.
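The early stopping rule stated above (stop once the validation AUC has improved by less than 0.001 for 10 consecutive iterations) can be sketched as follows. This is a plain-Python illustration of the rule as worded, not the xgboost package's internal logic. The function name `early_stopping_round` and the AUC sequence are hypothetical.

```python
def early_stopping_round(val_auc, min_delta=0.001, patience=10):
    """Return (stop_round, best_round) as 0-based indices.

    stop_round is None if the stopping rule never triggers.
    A round counts as an improvement only if the validation AUC
    exceeds the best value so far by at least min_delta.
    """
    best = float("-inf")
    best_round = None
    stale = 0  # consecutive rounds without a sufficient improvement
    for i, auc in enumerate(val_auc):
        if auc - best >= min_delta:
            best, best_round, stale = auc, i, 0
        else:
            stale += 1
            if stale >= patience:
                return i, best_round
    return None, best_round


# Hypothetical validation AUCs: three improving rounds, then a plateau.
history = [0.60, 0.65, 0.70] + [0.7005] * 10
print(early_stopping_round(history))
```

Returning the best round alongside the stopping round reflects a common design choice: the model is rolled back to the last iteration that improved the validation metric rather than kept at the iteration where training stopped.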

2.5 Model assessment

Based on the area under the receiver operating characteristic curve (AUC), this study systematically evaluated the predictive performance of five machine learning algorithms in the test set. By comprehensively examining the performance parameters of each model and integrating the results of decision curve analysis (DCA), the machine learning algorithm with the best overall performance was ultimately selected. Subsequently, the calibration performance of this model was validated, and patient data from Qinghai Provincial People’s Hospital were used as an external validation group to further assess the model’s stability and generalization capabilities.
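For readers less familiar with the AUC, the sketch below computes it from its probabilistic definition: the chance that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. It is a standard-library Python illustration with hypothetical labels and scores, not the routine used in the study.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical example: four patients, two with the outcome.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic in the number of cases, which is acceptable for cohorts of this size; a rank-based formula gives the same value more efficiently.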

2.6 Machine learning explainable tool

We employed the SHAP method to interpret the predictive model, which precisely quantifies the contribution and influence of each feature on the final prediction outcome. Notably, SHAP not only accounts for the independent contribution of features but also assesses interactions among these features, thereby providing a more comprehensive explanation. To assess feature SHAP values, we utilized SHAP feature importance plots, SHAP scatter plots, and SHAP waterfall plots. These visualization approaches help delineate the predictive mechanisms of the “black-box” model, thereby enhancing the model’s transparency and credibility.
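The quantity that SHAP approximates can be shown on a toy model by computing exact Shapley values, enumerating every feature subset. The sketch below is an illustration only: the three-feature model, the all-zero baseline, and the function names are hypothetical, and "missing" features are simply replaced by baseline values, which is one of several possible conventions.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline point."""
    n = len(x)

    def value(subset):
        # Features in the subset take their observed value; others the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for combo in combinations(others, k):
                s = set(combo)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    # Hypothetical model with one interaction term between features 0 and 2.
    return 0.5 * z[0] - 0.3 * z[1] + 0.2 * z[0] * z[2]


x = [2.0, 1.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi, sum(phi), toy_model(x) - toy_model(baseline))
```

In this toy case the interaction term is split equally between the two features involved, and the contributions sum to the difference between the prediction and the baseline output. Exact enumeration grows exponentially with the number of features, which is why tree-specific algorithms are used in practice.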

2.7 Statistical analysis

The statistical processing and data visualization in this study were conducted using R language (version 4.4.2). Differentiated approaches were adopted based on the characteristics of the variables when selecting the analytical methods. For categorical data, chi-square tests or Fisher’s exact tests were employed for statistical analysis, with results presented as a combination of frequencies and percentages. Different strategies were applied to continuous variables according to their distribution characteristics. Data conforming to a normal distribution were described using means combined with standard deviations, and group comparisons were performed using t-tests. For data not conforming to a normal distribution, quartiles were used for description, and the Wilcoxon rank-sum test was applied for difference testing. Regarding the significance level, p < 0.05 was set as the criterion for statistical significance.

This study employed LASSO regression to select features and to limit the influence of potential confounding factors. The method improves the predictive accuracy and interpretability of statistical models and is particularly suited to dimensionality reduction in high-dimensional data. To ensure minimal autocorrelation, we retained only the features with non-zero coefficients in the LASSO regression model for further analysis. The discriminative performance of each model was quantified by the AUC, supplemented by metrics such as sensitivity and accuracy for a comprehensive evaluation. To further assess clinical applicability, DCA was used to calculate net benefit at different risk thresholds and thereby evaluate the decision utility of the predictive model.

3 Results

3.1 Patient characteristics

This study extracted 3,089 cases of ICH from the MIMIC-IV database. After applying the exclusion criteria, 1,545 patients were ultimately included. The data from the MIMIC-IV database were randomly divided into a training set (n = 1,097) and a test set (n = 448) in a 7:3 ratio, which were used for model development and performance evaluation, respectively. The data from Qinghai Provincial People’s Hospital served as the external validation set (n = 151). Prior to applying the ROSE algorithm, the training dataset exhibited significant class imbalance, with 172 venous thromboembolism (VTE) cases (minority class) and 925 non-VTE cases (majority class). After ROSE-based balancing treatment, the training set achieved a nearly balanced distribution (545 VTE cases vs. 552 non-VTE cases), effectively alleviating the issue of insufficient minority class samples. The comparison of baseline characteristics between the training set and the test set showed no statistically significant differences in any of the indicators (Supplementary Table 1). Further analysis of intergroup differences in the training set regarding the occurrence of VTE (Table 1) revealed significant disparities in age (p < 0.001), presence of malignancy (p = 0.008), length of ICU stay (p < 0.001), and several biochemical indicators (such as hemoglobin, prothrombin time, albumin, and red cell distribution width, etc.).

Table 1

Table 1. Characteristics of ICH patients in the training set.

3.2 Selection of predictors

In the LASSO method, the regularization degree of the β coefficients is determined by the tuning parameter λ (λ = 0.02690252; lambda.1se; Figure 2). Through feature screening, 10 key variables with non-zero coefficients were ultimately identified, including ICU stay duration, age, prothrombin time, triglycerides, albumin, body mass index, partial thromboplastin time, blood glucose, white blood cell count, and systolic blood pressure.
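The lambda.1se rule named above selects the largest λ whose cross-validated error lies within one standard error of the minimum error, which favours a sparser model. The sketch below illustrates the rule in standard-library Python; the λ grid, deviance values, and standard errors are hypothetical, and the function is not the glmnet implementation.

```python
def select_lambda_1se(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validation summaries.

    lambda_min minimises the mean cross-validated error.
    lambda_1se is the largest lambda whose mean error is within one
    standard error (taken at lambda_min) of that minimum.
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    eligible = [i for i in range(len(lambdas)) if cv_mean[i] <= threshold]
    i_1se = max(eligible, key=lambda i: lambdas[i])
    return lambdas[i_min], lambdas[i_1se]


# Hypothetical cross-validation results over a small lambda grid.
grid = [0.1, 0.05, 0.02, 0.01, 0.005]
deviance = [0.90, 0.82, 0.80, 0.79, 0.795]
std_err = [0.02, 0.02, 0.02, 0.02, 0.02]
print(select_lambda_1se(grid, deviance, std_err))  # (0.01, 0.02)
```

Choosing the one-standard-error value trades a small loss in cross-validated fit for stronger shrinkage, which is why it typically leaves fewer non-zero coefficients than the minimum-error value.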

Figure 2
Panel A shows a graph of coefficient paths versus logarithm of lambda, with multiple colored lines representing different coefficients converging to zero as lambda increases. Panel B displays a plot of binomial deviance against logarithm of lambda, with red dots indicating the deviance values. A U-shaped pattern is visible, with the lowest point suggesting the optimal lambda value for prediction accuracy.

Figure 2. Results of variable selection with the Least Absolute Shrinkage and Selection Operator (LASSO) method. (A) Coefficients of all predictors gradually shrinking to zero, using 10-fold cross-validation. (B) Ten predictors with non-zero coefficients at the rightmost dashed line.

3.3 Model performance

This study constructed five predictive models, namely XGBoost, LR, RF, KNN, and SVM, based on the training set (Figure 3). In the test set, the XGBoost model demonstrated the best predictive performance, with an AUC value of 0.778 and a 95% confidence interval of (0.716, 0.838). In contrast, the SVM model showed relatively weaker predictive capability, with an AUC value of 0.686 and a 95% confidence interval of (0.607, 0.763). The ROC curves and AUC values of different models in the test set and external validation set are shown in Figures 4A,B. To comprehensively evaluate the model performance, this study employed multiple evaluation metrics, including accuracy, sensitivity, positive predictive value (PPV), negative predictive value (NPV), and F1 score (Table 2). On the external validation set, the AUC value of XGBoost remained stable at 0.761 (95% CI 0.643, 0.879). Although its confidence interval slightly widened compared to the training set, it was still significantly higher than the random prediction baseline (AUC = 0.5). These results indicate that the predictive model demonstrates stable generalization performance, and its output results are in good agreement with real clinical scenarios, providing practical technical support for quantitative disease risk assessment.
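The threshold-dependent metrics listed above all derive from a single confusion matrix. The sketch below shows the definitions in standard-library Python; the counts are toy values chosen for illustration and are not results from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common metrics from confusion-matrix counts."""

    def ratio(num, den):
        # Return None when a metric is undefined (zero denominator).
        return num / den if den else None

    sensitivity = ratio(tp, tp + fn)   # recall among true cases
    ppv = ratio(tp, tp + fp)           # precision among predicted cases
    npv = ratio(tn, tn + fn)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    f1 = ratio(2 * tp, 2 * tp + fp + fn)  # harmonic mean of PPV and sensitivity
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# Toy counts (hypothetical): 40 true cases and 100 non-cases.
print(classification_metrics(tp=30, fp=20, tn=80, fn=10))
```

Unlike the AUC, these values depend on the probability cut-off used to label a patient as high risk, and PPV and NPV also depend on outcome prevalence, so they are not directly comparable across cohorts with different VTE rates.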

Figure 3
Receiver Operating Characteristic (ROC) curve comparing five models: XGBoost (red, AUC=0.936), Random Forest (blue, AUC=0.871), K-Nearest Neighbors (purple, AUC=0.913), Support Vector Machine (green, AUC=0.858), and Logistic Regression (yellow, AUC=0.741). The y-axis is Sensitivity, and the x-axis is 1-Specificity. Each curve includes a 95% confidence interval.

Figure 3. ROC curve analysis of five machine learning algorithms in the training set for predicting VTE in ICH patients.

Figure 4
Two ROC curve graphs labeled A and B compare different machine learning models' performance. Graph A shows XGBoost, RF, KNN, SVM, and LR models with AUCs ranging from 0.686 to 0.778. Graph B displays similar comparisons with slightly different AUCs, from 0.684 to 0.761. Each graph indicates sensitivity against 1-specificity, representing classification effectiveness.

Figure 4. (A) ROC curve analysis of five machine learning algorithms in the test set for predicting VTE in ICH patients. (B) ROC curve analysis of five machine learning algorithms in the external validation set for predicting VTE in ICH patients.

Table 2

Table 2. Predictive performance of the models.

DCA was used to evaluate the clinical utility of each model, with a focus on net clinical benefit performance across different probability thresholds (Figure 5). Analytical data indicated that all five models exhibited superior clinical decision-making value compared to the two extreme strategies: “treating all patients” (orange reference line) and “treating no patients” (yellow reference line). Notably, the XGBoost model sustained the highest net benefit across most probability threshold ranges, demonstrating exceptional clinical application potential. From the calibration curve (Figure 6), it can be observed that the predicted outcomes of the XGBoost model on both the training set and test set were highly consistent with actual outcomes. Based on multi-dimensional evaluation results, XGBoost was ultimately selected as the core model for predicting VTE in patients with ICH.
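The net benefit plotted in a decision curve follows the standard formula: true positives per patient minus false positives per patient, the latter weighted by the odds of the threshold probability. The sketch below computes it for a model and for the "treat all" reference strategy ("treat none" has a net benefit of zero by definition). It is a standard-library Python illustration with hypothetical outcomes and predicted risks, not the analysis code of the study.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, risk) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, risk) if p >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * odds


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    odds = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * odds


# Hypothetical outcomes and predicted risks for five patients.
outcomes = [1, 1, 0, 0, 0]
predicted = [0.9, 0.4, 0.6, 0.2, 0.1]
for pt in (0.3, 0.5):
    print(pt, net_benefit(outcomes, predicted, pt), net_benefit_treat_all(outcomes, pt))
```

The threshold probability encodes how many false positives a clinician would accept per true positive. For VTE prophylaxis in ICH, this corresponds to the trade-off between preventing thrombosis and risking hematoma expansion.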

Figure 5
Line chart showing net benefit against threshold probability for different models: RF (light blue), KNN (purple), LR (green), SVM (blue), and XGBoost (red). Net benefit decreases with increasing threshold probability, starting above zero and converging towards zero, with slight variations among models. Horizontal axis ranges from zero percent to one hundred percent, and vertical axis ranges from negative zero point zero five to zero point fifteen.

Figure 5. Decision curve analysis of five models plotting net benefits with different threshold probabilities.

Figure 6
Side-by-side calibration plots labeled A and B. Both plots show predictions on the x-axis and observations on the y-axis, with data points and error bars. Each plot features a diagonal reference line indicating perfect agreement between prediction and observation. Plot A shows a distribution of data points slightly above and below the line, while Plot B has a similar pattern.

Figure 6. Calibration plots of the XGBoost model in the training (A) and testing sets (B).

3.4 Interpretation of personalized predictions

As shown in Figure 7, ICU stay duration is the most influential predictor of VTE occurrence, followed by Age, PT, Triglycerides, Albumin, BMI, PTT, Blood Glucose, WBC, and SBP. Figure 8 further elucidates the directional impact of each variable. For instance, Age exhibits a strong positive correlation with adverse outcomes, with higher values (right side, orange) associated with increased risk. Conversely, higher Albumin levels (left side, orange) are linked to reduced risk, as reflected by the negative SHAP values.

Figure 7
Bar graph showing the mean absolute SHAP values for various features. Features include ICU stay, age, PT, triglycerides, albumin, BMI, PTT, blood glucose, WBC, and SBP. ICU stay has the highest mean SHAP value, indicating the most significant impact.

Figure 7. Variable importance weights.

Figure 8
A SHAP summary plot displaying the impact of various features on a model's output. Features include ICU stay, age, PT, triglycerides, albumin, BMI, PTT, blood glucose, WBC, and SBP, each represented by a horizontal scatter of points colored from purple to yellow, indicating low to high feature values. The x-axis shows the SHAP value ranging from negative to positive influence, and the color gradient on the right illustrates feature value intensity.

Figure 8. Scatter plot of feature values and SHAP values.

Figure 9 displays individual SHAP waterfall plots for two patients: one who developed VTE (Figure 9A) and one who did not (Figure 9B). In the figure, red arrows indicate features that increase risk, while blue arrows denote features that suppress risk. The length of the arrows corresponds to the magnitude of the feature contributions. For Patient A (who developed VTE), high-risk features such as hypoalbuminemia (Albumin = 1.9), extreme obesity (BMI = 36.6), and advanced age (Age = 75) collectively elevated the predicted value [f(x) = 2.0] above the baseline E[f(x)] = 1.16, strongly indicating a poor outcome. Conversely, for patient B (who did not develop VTE), the protective characteristics predominated, pushing the predicted value [f(x) = 0.998] below the baseline E[f(x)] = 1.16.
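The waterfall plots rest on the additivity property of SHAP values: the prediction for a patient equals the baseline plus the sum of that patient's feature contributions. The worked arithmetic below uses only the values reported above; the symbol \phi_i for the contribution of feature i and M for the number of features in the model are notation introduced here.

```latex
\begin{align*}
f(x) &= \mathbb{E}[f(x)] + \sum_{i=1}^{M} \phi_i \\
\text{Patient A:}\quad \sum_{i=1}^{M} \phi_i &= 2.0 - 1.16 = 0.84 \\
\text{Patient B:}\quad \sum_{i=1}^{M} \phi_i &= 0.998 - 1.16 = -0.162
\end{align*}
```

The net contribution is therefore positive for Patient A and negative for Patient B, which is what the predominance of red or blue arrows in each panel conveys.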

Figure 9
Two waterfall charts labeled A and B illustrate the impact of various medical parameters on predictions. Chart A shows positive contributions from factors like BMI, blood glucose, and age, resulting in a prediction of 2. Chart B shows mostly negative contributions from factors like age and PT, leading to a prediction of 0.998. Each bar indicates the magnitude and direction of impact on the prediction value.

Figure 9. SHAP waterfall plots for two selected patients: (A) Patient with VTE occurrence. (B) Patient without VTE occurrence.

4 Discussion

Globally, the incidence of ICH is approximately 29.9 per 100,000 person-years, and this rate has not decreased significantly over the past four decades (9). Following ICH onset, the body’s coagulation cascade is activated, triggering local platelet aggregation and subsequent microthrombus formation. This process not only impairs cerebral hematoma absorption but also may induce adverse events such as VTE (10). Thus, timely identification of VTE risk factors in ICH patients and targeted interventions for effective prevention and management are critical for improving patient prognosis. Currently, commonly used thromboembolism risk assessment tools, such as the Padua Prediction Score, Improved Risk Score for VTE in Stroke, Caprini Score, and Wells Score, exhibit significant limitations when applied to ICH patients. Most of these tools were developed for mixed stroke populations (ischemic and hemorrhagic) or surgical patients and have not been adequately validated in cohorts of pure ICH (11). Traditional approaches such as univariate and multivariate logistic regression have limited accuracy in outcome prediction. For example, Ma et al. (12) developed a traditional binary logistic regression model to predict VTE in ICH patients, using spontaneous echo contrast (SEC), albumin, and age as predictors. While the model performed well, the authors noted a key limitation: the absence of artificial intelligence (AI) integration. In recent years, ML algorithms have been widely applied in medical research and frequently outperform traditional statistical models.

This study systematically evaluated the performance of multiple machine learning algorithms and, for the first time, validated the superiority of the XGBoost model in predicting VTE events in patients with severe ICH (test set AUC = 0.778; external validation set AUC = 0.761). Experimental results confirmed that the model’s strong performance originated from inherent predictive validity rather than overfitting. Through machine learning methods, the researchers achieved precise prediction of VTE risk in ICH patients, providing a scientific basis for clinical formulation of individualized prophylactic anticoagulation strategies, which helps optimize patient prognosis and improve their quality of life. Using SHAP waterfall plot analysis, the study revealed the contribution of core pathophysiological indicators to individual prediction outcomes, enhancing the model’s interpretability. Through a systematic evaluation based on the SHAP method, the research team not only quantified the clinical weights of predictive variables but also conducted an integrated analysis based on the importance ranking and clinical characteristics.

Our research findings indicate that in the model interpretability analysis, ICU length of stay, age, and BMI demonstrated high SHAP values. This suggests that these factors play a crucial role in predicting the risk of VTE in patients with severe ICH. This finding is consistent with previous research results. It is noteworthy that the length of ICU stay may have a bidirectional association with venous thromboembolism (VTE): on one hand, prolonged hospitalization increases VTE risk, which is related to immobilization, systemic inflammatory response, and invasive interventions, and is also consistent with its role as the primary predictive factor in the model; on the other hand, the occurrence of VTE may prolong ICU stay due to additional anticoagulation therapy and complication monitoring. The study by Quanhong Chu et al. (13) showed that age >60 years (OR: 2.138, 95% CI: 1.087–4.207, p = 0.028) and hospital stay duration >16 days (OR: 2.548, 95% CI: 1.381–4.701, p = 0.003) are independent risk factors for VTE in ICH patients. Moreover, our research indicates that BMI is also an independent risk factor for VTE. Existing studies have shown that the risk of VTE in obese patients (BMI ≥ 30 kg/m²) is 1.5–2.7 times higher than that in individuals with normal weight (HR/RR: 1.62–2.74) (14). The study by Kim et al. (15) further quantified the relationship between BMI and VTE risk, demonstrating that for every 5 kg/m² increase in BMI, the risk of VTE increases by 50%. Prolonged hospitalization, advanced age, and obesity all significantly increase the risk of VTE. In response to these high-risk factors, clinicians should adopt comprehensive measures, including early mobilization, mechanical prophylaxis, pharmacological prophylaxis, and lifestyle interventions, to reduce the incidence of VTE and improve patient outcomes.

It is noteworthy that abnormalities in coagulation function indicators are also closely associated with the occurrence of VTE in ICH patients. PT and APTT are key indicators for assessing the extrinsic and intrinsic coagulation pathways, respectively. The relationship between PT and the risk of VTE in ICH patients is complex and highly context-dependent, far from being a simple positive or negative correlation. One study has shown that a shortened PT is associated with the occurrence of VTE (16). In contrast, multiple studies on hospitalized COVID-19 patients consistently found that prolonged PT (e.g., exceeding baseline by 3 s) is an independent predictor of VTE (17, 18). The contradictory nature of these results may stem from the heterogeneity of the study populations (e.g., whether they had COVID-19) and the interference of potential confounding factors. Although theoretically a shortened PT may indicate a hypercoagulable state, the clinical evidence supporting it as an independent risk factor for VTE is weak and fraught with contradictions. Its predictive power and strength of evidence are far inferior to those of a shortened APTT. Multiple studies have shown that a shortened APTT is a marker of a hypercoagulable state, potentially caused by elevated levels or activation of coagulation factors (such as VIII, XI, etc.) (19, 20). APTT is currently more widely used in clinical practice and, compared to PTT, it includes specific activators that allow for a more precise measurement of the activity of certain coagulation components. Since the MIMIC-IV database does not provide data on APTT, we only analyzed PTT and found that a shortened PTT is independently associated with secondary VTE in ICH patients, which is also a limitation of our study design.

Studies have shown that the risk of VTE in patients with ICH complicated by hypertension is significantly increased (5, 21). This association may be related to vascular endothelial dysfunction, enhanced inflammatory response, and hypercoagulable state induced by hypertension (22, 23). Additionally, patients with hypertension often have other metabolic diseases (such as diabetes and hyperlipidemia), which further increase the risk of thrombosis (24). There are few studies on the impact of blood glucose levels on VTE, and the results are inconsistent. A case–control study found that fasting blood glucose levels did not increase the risk of VTE (OR = 0.98, 95% CI = 0.69–1.37) (25). However, another case–control study indicated that hyperglycemia is associated with an increased risk of VTE (OR = 2.21, 95% CI = 1.2–4.05) (26), which is consistent with our findings. Hyperglycemia activates coagulation through endothelial glycocalyx damage, upregulation of tissue factor, increased non-enzymatic glycosylation, and oxidative stress (27), thereby increasing the probability of VTE occurrence. Elevated triglyceride levels can promote VTE through multiple mechanisms. Firstly, hypertriglyceridemia can lead to increased blood viscosity, thereby slowing blood flow and increasing the risk of thrombus formation (28). Secondly, elevated triglyceride levels may further promote thrombus formation by affecting platelet activity and the expression of coagulation factors (29). Huang et al. (28) indicated that the combination of decreased high-density lipoprotein cholesterol (HDL-C) and elevated triglyceride levels significantly increases the risk of venous thromboembolism (VTE) formation. Given the close relationship between triglyceride levels and VTE formation, the management of hypertriglyceridemia should be strengthened in clinical practice (30).

In addition to the aforementioned factors, low albumin levels are an independent risk factor for VTE in severe ICH patients (31). A Mendelian randomization analysis further supported the relationship between low albumin levels and venous thrombosis (32). Albumin can modulate inflammatory responses and reduce the release of inflammatory factors, thereby lowering the risk of thrombus formation (33). Moreover, by binding to arachidonic acid, albumin inhibits its metabolism into potent platelet-aggregating substances such as thromboxane A2, thereby suppressing platelet activation and aggregation and preventing thrombus formation (34). Albumin also improves vascular endothelial function by increasing the expression of nitric oxide (NO) and endothelial nitric oxide synthase (eNOS), thereby inhibiting thrombus formation (35). For patients with severe ICH who develop hypoalbuminemia, timely albumin supplementation or other interventions may help reduce the risk of VTE (33, 35). Future research could further explore the specific mechanisms of albumin in thrombogenesis and its clinical application value.

Finally, the inflammatory response is also involved in the formation of VTE. Studies have shown that elevated WBC levels are closely associated with an increased risk of VTE in patients with ICH (36), which corroborates our findings. An increase in WBC is typically related to systemic inflammatory responses, which can promote thrombosis by activating coagulation pathways and causing endothelial injury (37). In a study of patients with neurological disorders, Nakajima et al. identified WBC ≥ 7.6 × 10⁹/L as an independent predictor of DVT formation (38). Across various diseases and clinical scenarios, monitoring WBC, alone and in combination with other biomarkers, provides important evidence for the diagnosis, prediction, and prevention of venous thrombosis.

5 Conclusion

We have developed an interpretable XGBoost prediction model that performs well in assessing the risk of VTE in critically ill ICH patients. Moreover, by using the SHAP framework to quantify the contribution of each key pathophysiological indicator to the model’s prediction for an individual patient, it enables personalized risk stratification and optimization of medical resource allocation.
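As an illustration of what quantifying per-patient contributions means, the sketch below computes exact Shapley values for a single instance by enumerating every feature coalition. It is a toy under stated assumptions: the risk function `toy_risk`, its coefficients, the patient values, and the reference values are all hypothetical, and the brute-force enumeration stands in for, and is not, the SHAP library or the tree-based algorithm applied to the XGBoost model in this study.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance by enumerating all coalitions.

    Features outside a coalition are replaced by their baseline value.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without = {f: (x[f] if f in subset else baseline[f]) for f in names}
                with_feature = dict(without)
                with_feature[name] = x[name]
                total += weight * (predict(with_feature) - predict(without))
        phi[name] = total
    return phi


def toy_risk(p):
    """Hypothetical risk score; coefficients are invented for illustration."""
    score = 0.02 * p["age"] - 0.10 * p["albumin"] + 0.03 * p["wbc"]
    if p["albumin"] < 3.0 and p["wbc"] > 10.0:  # hypothetical interaction
        score += 0.5
    return score


patient = {"age": 70.0, "albumin": 2.5, "wbc": 14.0}    # hypothetical patient
reference = {"age": 60.0, "albumin": 3.8, "wbc": 8.0}   # hypothetical baseline
phi = shapley_values(toy_risk, patient, reference)
for feature, value in phi.items():
    print(f"{feature}: {value:+.3f}")
```

The contributions sum to the difference between the patient's score and the reference score, which is the additivity property that lets each prediction be decomposed feature by feature. Exhaustive enumeration grows exponentially with the number of features, so it is practical only for very small examples like this one.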

6 Strengths

The strength of this study lies in the construction of a predictive model for VTE occurrence in severe ICH patients, addressing the limitation of traditional scoring methods that are not specifically tailored for ICH patients. In addition to conventional indicators such as length of hospital stay and age, the study also confirmed the independent predictive value of serum albumin, white blood cell count, and triglycerides for VTE occurrence in severe ICH patients. The model’s general applicability in the real world was supported by external validation from Qinghai Provincial People’s Hospital and decision curve analysis. Clinicians can utilize this model to identify high-risk patients and implement early individualized prevention. Furthermore, the application of the SHAP framework enhanced the model’s transparency, providing interpretable evidence for personalized interventions. The model can be integrated into clinical workflows (e.g., embedded in electronic medical record systems) to generate real-time risk scores, reducing reliance on subjective risk assessment tools such as the Padua score.
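For readers unfamiliar with decision curve analysis, the following minimal sketch shows the standard net-benefit calculation that underlies it: true positives are credited and false positives are penalized by the odds of the chosen threshold probability. The outcome labels and predicted risks are hypothetical, and the function names are illustrative; this is not the code used to produce the decision curves in this study.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of intervening on patients with predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy of intervening on every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical outcomes (1 = VTE) and predicted risks for ten patients.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
risk = [0.80, 0.10, 0.30, 0.60, 0.05, 0.20, 0.40, 0.15, 0.10, 0.25]

for pt in (0.1, 0.2, 0.3, 0.5):
    model_nb = net_benefit(y_true, risk, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    print(f"threshold={pt:.1f}  model={model_nb:+.3f}  treat-all={all_nb:+.3f}")
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero by definition.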

7 Limitations

This study has several limitations. Firstly, as a retrospective study based largely on publicly available data, its findings require further validation through prospective studies. Secondly, the retrospective design may introduce selection bias, which could affect the generalizability of the results. Thirdly, the limited number of cases in the external validation set may affect the reliability of the findings, and further validation in larger and more diverse cohorts is needed. Fourthly, the MIMIC-IV database does not provide imaging features of intracerebral hemorrhage, such as hematoma volume, hematoma location, and intraventricular hemorrhage. If these imaging features become available in the future, we will incorporate them into this work and explore their value in patients with ICH complicated by VTE.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary material; further inquiries can be directed to the corresponding author.

Ethics statement

The studies involving humans were approved by the Research Ethics Committee of Qinghai Provincial People’s Hospital. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

MH: Validation, Software, Writing – original draft, Methodology, Conceptualization, Investigation, Data curation, Resources, Visualization, Writing – review & editing, Project administration. WL: Conceptualization, Validation, Methodology, Formal analysis, Writing – review & editing, Writing – original draft, Data curation, Investigation. ZL: Investigation, Supervision, Funding acquisition, Methodology, Writing – review & editing, Resources, Project administration. YL: Investigation, Writing – review & editing, Data curation. QZ: Funding acquisition, Writing – review & editing, Project administration, Supervision. XJ: Data curation, Investigation, Supervision, Writing – review & editing, Methodology. PH: Validation, Supervision, Writing – review & editing, Investigation.

Funding

The author(s) declared that financial support was received for this work and/or its publication. The authors gratefully acknowledge financial support from the General Program of the Major Science and Technology Project of the Qinghai Provincial Science and Technology Department (No. 2024-SF-A2) and the “Kunlun Talents • Leading Talents in Science and Technology” project of Qinghai Province.

Acknowledgments

We sincerely thank all the authors for their joint efforts, as well as the patients and their families for their participation and support.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that Generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fneur.2025.1691549/full#supplementary-material

References

1. Becattini, C, Cimini, LA, and Carrier, M. Challenging anticoagulation cases: a case of pulmonary embolism shortly after spontaneous brain bleeding. Thromb Res. (2021) 200:41–7. doi: 10.1016/j.thromres.2021.01.016

2. Cherian, LJ, Smith, EE, Schwamm, LH, Fonarow, GC, Schulte, PJ, Xian, Y, et al. Current practice trends for use of early venous thromboembolism prophylaxis after intracerebral hemorrhage. Neurosurgery. (2018) 82:85–92. doi: 10.1093/neuros/nyx146

3. Melmed, KR, Boehme, A, Ironside, N, Murthy, S, Park, S, Agarwal, S, et al. Respiratory and blood stream infections are associated with subsequent venous thromboembolism after primary intracerebral hemorrhage. Neurocrit Care. (2021) 34:85–91. doi: 10.1007/s12028-020-00974-8

4. Nicholson, M, Chan, N, Bhagirath, V, and Ginsberg, J. Prevention of venous thromboembolism in 2020 and beyond. J Clin Med. (2020) 9:2467. doi: 10.3390/jcm9082467

5. Cai, Q, Zhang, X, and Chen, H. Patients with venous thromboembolism after spontaneous intracerebral hemorrhage: a review. Thromb J. (2021) 19:93. doi: 10.1186/s12959-021-00345-z

6. Du, C, Li, Y, Yang, M, Ma, Q, Ge, S, and Ma, C. Prediction of hematoma expansion in intracerebral hemorrhage in 24 hours by machine learning algorithm. World Neurosurg. (2024) 185:e475–83. doi: 10.1016/j.wneu.2024.02.058

7. Minardi, M, Bianconi, A, Mesin, L, Salvati, LF, Griva, F, and Narducci, A. Proposal of a machine learning based prognostic score for ruptured microsurgically treated anterior communicating artery aneurysms. J Clin Med. (2025) 14:578. doi: 10.3390/jcm14020578

8. Liu, W, Tao, G, Zhang, Y, Xiao, W, Zhang, J, Liu, Y, et al. A simple weaning model based on interpretable machine learning algorithm for patients with sepsis: a research of MIMIC-IV and eICU databases. Front Med. (2021) 8:814566. doi: 10.3389/fmed.2021.814566

9. Wang, S, Zou, XL, Wu, LX, Zhou, HF, Xiao, L, Yao, T, et al. Epidemiology of intracerebral hemorrhage: a systematic review and meta-analysis. Front Neurol. (2022) 13:915813. doi: 10.3389/fneur.2022.915813

10. Al-Kawaz, MN, Hanley, DF, and Ziai, W. Advances in therapeutic approaches for spontaneous intracerebral hemorrhage. Neurotherapeutics. (2020) 17:1757–67. doi: 10.1007/s13311-020-00902-w

11. Ji, R, Wang, L, Liu, X, Liu, Y, Wang, D, Wang, W, et al. A novel risk score to predict deep vein thrombosis after spontaneous intracerebral hemorrhage. Front Neurol. (2022) 13:930500. doi: 10.3389/fneur.2022.930500

12. Ma, B, Chen, C, Wang, Q, and Chen, X. Construction of an early warning model for venous thromboembolism risk in patients with severe cerebral hemorrhage based on ultrasound spontaneous imaging. Front Neurol. (2025) 16:1562963. doi: 10.3389/fneur.2025.1562963

13. Chu, Q, Liao, L, Wei, W, Ye, Z, Zeng, L, Qin, C, et al. Venous thromboembolism in ICU patients with intracerebral hemorrhage: risk factors and the prognosis after anticoagulation therapy. Int J Gen Med. (2021) 14:5397–404. doi: 10.2147/IJGM.S327676

14. Rahmani, J, Haghighian Roudsari, A, Bawadi, H, Thompson, J, Khalooei Fard, R, Clark, C, et al. Relationship between body mass index, risk of venous thromboembolism and pulmonary embolism: a systematic review and dose-response meta-analysis of cohort studies among four million participants. Thromb Res. (2020) 192:64–72. doi: 10.1016/j.thromres.2020.05.014

15. Kim, J, Kraft, P, Hagan, KA, Harrington, LB, Lindstroem, S, and Kabrhel, C. Interaction of a genetic risk score with physical activity, physical inactivity, and body mass index in relation to venous thromboembolism risk. Genet Epidemiol. (2018) 42:354–65. doi: 10.1002/gepi.22118

16. Yin, ZJ, Huang, YJ, and Chen, QL. Risk factor analysis and a new prediction model of venous thromboembolism after pancreaticoduodenectomy. BMC Surg. (2023) 23:25. doi: 10.1186/s12893-023-01916-9

17. Aryal, MR, Gosain, R, Donato, A, Pathak, R, Bhatt, VR, Katel, A, et al. Venous thromboembolism in COVID-19: towards an ideal approach to thromboprophylaxis, screening, and treatment. Curr Cardiol Rep. (2020) 22:52. doi: 10.1007/s11886-020-01327-9

18. Al-Dorzi, HM, Alqirnas, MQ, Hegazy, MM, Alghamdi, AS, Alotaibi, MT, Albogami, MT, et al. Prevalence and risk factors of venous thromboembolism in critically ill patients with severe COVID-19 and the association between the dose of anticoagulants and outcomes. J Crit Care Med. (2022) 8:249–58. doi: 10.2478/jccm-2022-0023

19. Tripodi, A, Chantarangkul, V, Martinelli, I, Bucciarelli, P, and Mannucci, PM. A shortened activated partial thromboplastin time is associated with the risk of venous thromboembolism. Blood. (2004) 104:3631–4. doi: 10.1182/blood-2004-03-1042

20. Jin, J, Lu, J, Su, X, Xiong, Y, Ma, S, Kong, Y, et al. Development and validation of an ICU-venous thromboembolism prediction model using machine learning approaches: a multicenter study. Int J Gen Med. (2024) 17:3279–92. doi: 10.2147/IJGM.S467374

21. Gao, H, Zhang, J, Wang, X, Shou, J, Wang, J, and Yang, P. Identification of mRNA biomarkers in extremely early hypertensive intracerebral hemorrhage (HICH). Proteome Sci. (2024) 22:12. doi: 10.1186/s12953-024-00237-w

22. Goldhaber, SZ, and Bounameaux, H. Pulmonary embolism and deep vein thrombosis. Lancet. (2012) 379:1835–46. doi: 10.1016/S0140-6736(11)61904-1

23. Wolberg, AS, Rosendaal, FR, Weitz, JI, Jaffer, IH, Agnelli, G, Baglin, T, et al. Venous thrombosis. Nat Rev Dis Primers. (2015) 1:15006. doi: 10.1038/nrdp.2015.6

24. Gil, JS, Drager, LF, Guerra-Riccio, GM, Mostarda, C, Irigoyen, MC, Costa-Hong, V, et al. The impact of metabolic syndrome on metabolic, pro-inflammatory and prothrombotic markers according to the presence of high blood pressure criterion. Clinics. (2013) 68:1495–501. doi: 10.6061/clinics/2013(12)04

25. Li-Gao, R, Morelli, VM, Lijfering, WM, Cannegieter, SC, Rosendaal, FR, and van Hylckama Vlieg, A. Glucose levels and diabetes are not associated with the risk of venous thrombosis: results from the MEGA case-control study. Br J Haematol. (2019) 184:431–5. doi: 10.1111/bjh.15599

26. Tichelaar, YI, Lijfering, WM, ter Maaten, JC, Kluin-Nelemans, JC, and Meijer, K. High levels of glucose at time of diagnosing venous thrombosis: a case-control study. J Thromb Haemost. (2011) 9:883–5. doi: 10.1111/j.1538-7836.2011.04226.x

27. Pomero, F, Di Minno, MN, Fenoglio, L, Gianni, M, Ageno, W, and Dentali, F. Is diabetes a hypercoagulable state? A critical appraisal. Acta Diabetol. (2015) 52:1007–16. doi: 10.1007/s00592-015-0746-8

28. Huang, Y, Ge, H, Wang, X, and Zhang, X. Association between blood lipid levels and lower extremity deep venous thrombosis: a population-based cohort study. Clin Appl Thromb Hemost. (2022) 28:10760296221121282. doi: 10.1177/10760296221121282

29. Winckers, K, Biguzzi, E, Thomassen, S, Heinzmann, A, Rosendaal, FR, Hackeng, TM, et al. Risk of first venous thrombosis by comparing different thrombin generation assay conditions: results from the MEGA case-control study. TH Open. (2025) 9:a25346123. doi: 10.1055/a-2534-6123

30. Liu, Z, Liu, D, Guo, ZN, Jin, H, Sun, T, Ni, C, et al. Incidence and risk factors of lower-extremity deep vein thrombosis after thrombolysis among patients with acute ischemic stroke. Pharmacogenomics Pers Med. (2021) 14:1107–14. doi: 10.2147/PGPM.S321084

31. Cao, Y, Li, Y, Zhang, W, and Lei, C. Association between serum albumin levels and risk of deep vein thrombosis in diabetic patients: a retrospective case-control study. Hormones. (2025) 24:1089–97. doi: 10.1007/s42000-025-00679-7

32. Liu, Z, and Mi, J. Serum albumin and circulating metabolites and risk of venous thromboembolism: a two-sample Mendelian randomization study. Front Nutr. (2021) 8:712600. doi: 10.3389/fnut.2021.712600

33. Xia, X, Tie, X, Hong, M, and Yin, W. Exploration of the causal relationship and mechanisms between serum albumin and venous thrombosis: a bidirectional Mendelian randomization analysis and bioinformatics study. Thromb J. (2025) 23:17. doi: 10.1186/s12959-025-00700-4

34. Sun, Y, Deng, J, Ding, Y, Luo, S, Li, S, Guan, Y, et al. Serum albumin, genetic susceptibility, and risk of venous thromboembolism. Res Pract Thromb Haemost. (2024) 8:102509. doi: 10.1016/j.rpth.2024.102509

35. Zhang, Y, Liu, J, Jia, W, Tian, X, Jiang, P, Cheng, Z, et al. AGEs/RAGE blockade downregulates Endothenin-1 (ET-1), mitigating human umbilical vein endothelial cells (HUVEC) injury in deep vein thrombosis (DVT). Bioengineered. (2021) 12:1360–8. doi: 10.1080/21655979.2021.1917980

36. Wang, G, Zhao, W, Zhao, Z, Wang, D, Wang, D, Bai, R, et al. Leukocyte as an independent predictor of lower-extremity deep venous thrombosis in elderly patients with primary intracerebral hemorrhage. Front Neurol. (2022) 13:899849. doi: 10.3389/fneur.2022.899849

37. Tort, M, Sevil, FC, Sevil, H, and Becit, N. Evaluation of systemic immune-inflammation index in acute deep vein thrombosis: a propensity-matched. J Vasc Surg Venous Lymphat Disord. (2023) 11:972–7.e1. doi: 10.1016/j.jvsv.2023.02.008

38. Nakajima, M, Watari, M, Ando, Y, and Ueda, M. Asymptomatic deep venous thrombosis identified on routine screening in patients with hospitalized neurological diseases. J Clin Neurosci. (2022) 102:13–20. doi: 10.1016/j.jocn.2022.06.002

Keywords: intracerebral hemorrhage, machine learning, prediction model, SHAP, venous thromboembolism, XGBoost

Citation: He M, Liu W, Lu Z, Lv Y, Zhang Q, Jin X and Han P (2026) Development of an interpretable machine learning model for predicting venous thromboembolism in intensive care unit patients with intracerebral hemorrhage. Front. Neurol. 16:1691549. doi: 10.3389/fneur.2025.1691549

Received: 27 August 2025; Revised: 09 December 2025; Accepted: 16 December 2025;
Published: 07 January 2026.

Edited by:

Hitoshi Fukuda, Kōchi University, Japan

Reviewed by:

Zhiming Zhou, Chongqing Medical University, China
Yukihiro Imaoka, Kumamoto University, Japan

Copyright © 2026 He, Liu, Lu, Lv, Zhang, Jin and Han. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Zhongsheng Lu

These authors have contributed equally to this work and share first authorship
