- 1Department of Intensive Care Unit, Renmin Hospital of Wuhan University, Wuhan, China
- 2Department of Radiology, Renmin Hospital of Wuhan University, Wuhan, China
- 3Department of Ultrasound, Renmin Hospital of Wuhan University, Wuhan, China
- 4Department of Pharmacy, Renmin Hospital of Wuhan University, Wuhan, China
Background: Acute upper gastrointestinal bleeding (AUGIB) is one of the most common critical diseases encountered in the intensive care unit (ICU), with a mortality rate ranging from 15 to 20%. Accurate stratification of AUGIB into acute variceal gastrointestinal bleeding (AVGIB) and acute non-variceal gastrointestinal bleeding (ANGIB) subtypes is clinically essential, as the two entities require markedly different therapeutic approaches and carry divergent prognoses. AUGIB characterized by hemorrhagic shock, hypotension, multiple organ dysfunction (MODS), or circulatory failure is life-threatening. Machine learning (ML) prediction models can be effective tools for mortality prediction, enabling the timely identification of high-risk patients and potentially improving outcomes.
Methods: A total of 3,050 AUGIB patients from the MIMIC-IV database were included, of whom 625 were classified as AVGIB and 2,425 as ANGIB. Patients’ clinical features, intervention methods, vital signs, scores, and important laboratory results were collected. The Synthetic Minority Over-sampling Technique-Edited Nearest Neighbors (SMOTE-ENN) and Adaptive Synthetic Sampling (ADASYN) were adopted to address the imbalance of the dataset. Twelve ML algorithms were evaluated: logistic regression (LR), decision tree (DT), random forest (RF), gradient boosting (GB), AdaBoost, XGBoost, Naive Bayes (NB), support vector machine (SVM), light gradient-boosting machine (LightGBM), K-nearest neighbors (KNN), extremely randomized trees (ET), and voting classifier (VC). Model performance was evaluated using accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC). Shapley Additive exPlanations (SHAP) analysis was conducted to identify the most influential features contributing to mortality prediction.
Results: For AVGIB patients, the extremely randomized trees model showed the best predictive performance among the ML models, with an AUC of 0.996 ± 0.007, accuracy of 0.966 ± 0.009, precision of 0.957 ± 0.024, recall of 0.988 ± 0.012, and F1-score of 0.972 ± 0.007. The top 10 feature variables of the ET model were acute kidney failure, transfusion of albumin, vasoactive drugs, transfusion of plasma, transfusion of platelets, the maximum international normalized ratio (INR), the maximum prothrombin time (PT), the AIMS65 score, the mean PT, and the maximum activated partial thromboplastin time (APTT). For ANGIB patients, the gradient boosting model proved to be the optimal model, with an AUC of 0.985 ± 0.002, accuracy of 0.948 ± 0.009, precision of 0.949 ± 0.009, recall of 0.968 ± 0.009, and F1-score of 0.959 ± 0.007. The top 10 feature variables of the GB model were Glasgow Coma Scale (GCS) score, vasoactive drugs, acute kidney failure, AIMS65 score, APACHE-II score, mechanical ventilation, the minimum lactate, chronic liver disease, and the minimum and maximum APTT. The SHAP visualizations show the weights of the feature variables in the two ML models and their mean SHAP values, while the SHAP waterfall plots trace the prediction process for representative positive and negative patients. Most importantly, two web-based prognostic prediction platforms were developed to enhance clinical accessibility: the ET model for AVGIB patients is available at https://10zr656do5281.vicp.fun, and the GB model for ANGIB patients is accessible at http://10zr656do5281.vicp.fun.
Conclusion: The ET model provides a reliable prognostic tool for AVGIB patients, while the GB model serves as a robust tool for ANGIB patients in predicting in-hospital mortality. By systematically integrating clinical features, risk stratification scores, vital signs, and intervention measures, the ML models may deliver comprehensive predictions that support clinical decision-making and potentially improve clinical outcomes in the near future.
1 Introduction
Acute upper gastrointestinal bleeding (AUGIB) is a life-threatening disease that frequently manifests as hemorrhagic shock, hypotension, multiple organ dysfunction (MODS), and even circulatory failure, with a mortality ranging from 15 to 20% (1). Approximately one-third of AUGIB patients require intensive care unit (ICU) admission for interventions such as central venous catheterization, rapid fluid resuscitation, emergency tracheal intubation, endoscopy, and even vasoactive drugs (2). Owing to the aging population, numerous cardiovascular comorbidities, and unhealthy lifestyles, the incidence of AUGIB has been rising significantly. According to the 2021 American College of Gastroenterology (ACG) guidelines, the estimated incidence ranges from 100 to 180 per 100,000 individuals, with mortality between 10 and 15% (3).
Risk stratification is the key priority in AUGIB management, as emphasized by major international guidelines (3–5). Existing scoring systems such as AIMS65 (6), the Glasgow-Blatchford score (GBS) (7), and the Rockall score (8) have been used for risk prediction and stratification but exhibit limited sensitivity and specificity in forecasting mortality, rebleeding, and the need for therapeutic intervention. ICU physicians are often unfamiliar with scores other than the APACHE-II and GCS scores, which might not be optimized for AUGIB patients (9). The heterogeneity of AUGIB patients, who vary in age, gender, etiology, bleeding site, and blood loss volume, further complicates accurate outcome prediction. Previous studies on gastrointestinal bleeding have primarily focused on emergency departments (10) and gastroenterology departments (11). The Ungureanu ML model is restricted to patients with non-variceal upper gastrointestinal bleeding (12). Given the high mortality risk associated with AUGIB, many patients require ICU admission for closer monitoring.
The advent of artificial intelligence (AI) has transformed medical practice, particularly in precision diagnosis, personalized therapy strategies, and prognostic prediction (13). Thus, we sought to develop dedicated ML models for mortality prediction in AUGIB patients admitted to the ICU, incorporating a comprehensive set of variables. Recognizing the critical importance of disease stratification, we divided AUGIB into variceal and non-variceal subtypes, as these conditions differ substantially in both clinical management and prognosis (14). To identify optimal predictive models, we utilized the Medical Information Mart for Intensive Care (MIMIC)-IV, a comprehensive critical care database from the United States. Our methodology involved evaluating 12 machine learning algorithms and applying data imbalance techniques to enhance model performance. This approach enabled us to develop two distinct prediction models, one for variceal and another for non-variceal AUGIB patients. These two models offer improved comprehensiveness, precision, and accuracy, representing an advance in mortality risk stratification for this high-risk patient population.
2 Materials and methods
2.1 Data resource and ethical issues
The MIMIC-IV database represents an open-access critical care repository developed and maintained by the Massachusetts Institute of Technology (MIT) laboratory for computational physiology. It provides longitudinal clinical records for inpatients at the Boston-based Beth Israel Deaconess Medical Center spanning 2008 to 2019 (15). It encompasses a wide array of clinical parameters including demographics, continuous vital signs measurements, laboratory results, diagnostic codes, medication administration records, therapeutic interventions, and additional clinically relevant data.
Access to the database was obtained through proper institutional channels, with authorized certification No. 12760266. All patients’ private information was anonymized to meet ethical regulations, and the institutional review board exempted the study from the requirement for informed consent.
2.2 Inclusion and exclusion criteria
Patients were included based on the following criteria: (1) a diagnosis of AUGIB according to the American College of Gastroenterology (ACG) guidelines (3); (2) documented ICD codes confirming an upper gastrointestinal etiology, including but not limited to esophageal varices with bleeding (4560), peptic ulcer hemorrhage (K25.0, K26.0, etc.), acute hemorrhagic gastritis (K29.01), Mallory-Weiss syndrome (K22.6), acute gastrojejunal ulcer with hemorrhage (K28.0), and angiodysplasia of stomach and duodenum with bleeding (K31.811).
Patients were excluded if they met any of the following criteria: (1) age under 18 years; (2) ICU stay shorter than 24 h; (3) primary diagnosis of lower gastrointestinal bleeding; (4) incomplete clinical records, defined as more than 20% missing values; (5) pregnancy or postpartum status.
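The exclusion criteria above can be sketched as a simple screening function. This is a minimal illustration only: the `Stay` record and its field names are hypothetical and do not correspond to MIMIC-IV column names or to the SQL used in the study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stay:
    """Hypothetical per-patient record; field names are illustrative only."""
    age: int
    icu_hours: float
    lower_gi_primary: bool        # primary diagnosis of lower GI bleeding
    missing_fraction: float       # share of variables with missing values
    pregnant_or_postpartum: bool


def exclusion_reason(stay: Stay) -> Optional[str]:
    """Return the first exclusion criterion met, or None if the stay is eligible."""
    if stay.age < 18:
        return "age under 18 years"
    if stay.icu_hours < 24:
        return "ICU stay shorter than 24 h"
    if stay.lower_gi_primary:
        return "primary diagnosis of lower gastrointestinal bleeding"
    if stay.missing_fraction > 0.20:
        return "more than 20% missing values"
    if stay.pregnant_or_postpartum:
        return "pregnancy or postpartum status"
    return None


eligible = Stay(age=63, icu_hours=52.0, lower_gi_primary=False,
                missing_fraction=0.05, pregnant_or_postpartum=False)
short_stay = Stay(age=47, icu_hours=11.5, lower_gi_primary=False,
                  missing_fraction=0.02, pregnant_or_postpartum=False)
```

Returning the reason rather than a boolean makes it straightforward to tabulate how many patients each criterion removes.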
2.3 Data extraction and processing
The study utilized data from the MIMIC-IV database, which was obtained through authorized access from PhysioNet. After access was granted, the database was downloaded, installed, and imported into PostgreSQL 12.0. All data retrieval and extraction were performed with Structured Query Language (SQL) to ensure precision and reproducibility.
Clinical risk scores, including the AIMS65 score, Rockall score, shock index, and GBS score, were calculated from variables extracted from each patient’s initial medical record. Because vital signs and laboratory parameters were monitored continuously and measured repeatedly, we summarized each of them over the first 24 h of ICU admission by its minimum, maximum, and mean values. For both AVGIB and ANGIB patients, demographics, vital signs, laboratory results, medications, diagnoses, interventions, and other clinical data were included as candidate variables.
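The 24-hour summarization can be illustrated with a short sketch. It assumes each measurement is stored as a pair of (hours since ICU admission, value); the function name and the sample INR readings are hypothetical and are not taken from the study data.

```python
from statistics import mean


def summarize_first_24h(measurements, window_hours=24.0):
    """Return min, max, and mean of the values recorded in the first window.

    `measurements` is a list of (hours_since_icu_admission, value) pairs.
    Returns None when no value falls inside the window.
    """
    values = [v for t, v in measurements if 0.0 <= t < window_hours]
    if not values:
        return None
    return {"min": min(values), "max": max(values), "mean": mean(values)}


# Hypothetical INR readings; the one at hour 30 lies outside the window.
inr_series = [(1.0, 1.4), (6.5, 2.1), (13.0, 1.8), (30.0, 1.2)]
inr_summary = summarize_first_24h(inr_series)
```

Each monitored variable then contributes three candidate features (for example INR_min, INR_max, and INR_mean), matching the naming used in the SHAP results.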
2.4 Machine learning models
The analysis employed 12 distinct machine learning algorithms, each selected for its particular strengths in predictive modeling.
(1) Logistic Regression (LR) A fundamental classification algorithm that is particularly effective for binary problems, LR provides interpretable results by quantifying the contribution of each independent variable through odds ratios (16).
(2) Decision Trees (DT) This intuitive algorithm uses a hierarchical tree structure to model decision pathways and can capture non-linear relationships without requiring feature scaling. Its visual interpretability makes it particularly valuable in medical applications (17).
(3) Random Forest (RF) An ensemble learning method that constructs multiple decision trees during training. RF aggregates predictions through either majority voting (classification) or averaging (regression), significantly improving prediction stability (18).
(4) Gradient Boosting (GBoost) This method fits a sequence of weak models (such as decision trees), each added to minimize a given loss function; some implementations also handle missing values automatically. It offers high accuracy and versatility across diverse tasks (19).
(5) Adaptive Boosting (AdaBoost) An ensemble algorithm that specifically targets misclassification by increasing the weights of misclassified instances and iteratively refining the model (20).
(6) eXtreme Gradient Boosting (XGBoost) An optimized distributed gradient boosting algorithm designed to be highly efficient, flexible, and portable, XGBoost minimizes the objective function through iterative training and applies regularization to prevent overfitting (21).
(7) Naive Bayes (NB) A probabilistic algorithm based on Bayes’ theorem that assumes feature independence; it is particularly effective for multi-class classification tasks and lower-dimensional data (22).
(8) Support Vector Machine (SVM) The SVM with an RBF kernel uses the “kernel trick” to implicitly map features into a higher-dimensional space, producing a non-linear decision boundary with high sensitivity (23).
(9) Light Gradient-Boosting Machine (LightGBM) A gradient boosting framework based on tree learners that handles large datasets and categorical features effectively through a leaf-wise growth strategy (24).
(10) K-Nearest Neighbors (KNN) A simple, instance-based learning algorithm that classifies a data point by the majority vote of its neighbors; the distance metric significantly affects performance, and features must be normalized for consistent results (25).
(11) Extra Trees (ET) An ensemble algorithm that builds multiple decision trees with randomized feature splits and dataset sampling, which reduces variance and improves model robustness (26).
(12) Voting Classifier (VC) An ensemble of multiple models that reduces the bias and variance of the individual models, resulting in more robust and reliable performance (27).
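As an illustration of how a voting classifier combines its members, the sketch below averages the predicted mortality probabilities of several models (soft voting). Whether the study used soft or hard voting is not stated, so soft voting is an assumption here, and the member probabilities are hypothetical.

```python
def soft_vote(probabilities, weights=None, threshold=0.5):
    """Combine member probabilities by (weighted) averaging.

    Returns the averaged probability and the resulting binary label.
    """
    if weights is None:
        weights = [1.0] * len(probabilities)
    total = sum(weights)
    avg = sum(w * p for w, p in zip(weights, probabilities)) / total
    return avg, int(avg >= threshold)


# Hypothetical probabilities from three member models for one patient.
member_probs = [0.62, 0.48, 0.71]
avg_prob, label = soft_vote(member_probs)
```

Averaging lets one member's borderline prediction (0.48 here) be outweighed by more confident members, which is the intuition behind the robustness attributed to this ensemble.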
2.5 Model evaluation
To mitigate the challenges posed by dataset imbalance, we implemented two resampling techniques, namely the Synthetic Minority Over-sampling Technique-Edited Nearest Neighbors (SMOTE-ENN) and Adaptive Synthetic Sampling (ADASYN), to enhance model accuracy (28, 29). For robust model validation, we employed a dual validation strategy, 5-fold cross-validation (CV) and independent validation (IV), to minimize the risk of overfitting (30).
Model evaluation incorporated two complementary groups of metrics: classification metrics and discriminatory performance. The classification metrics were accuracy, precision, recall, and F1-score. Discriminatory performance was assessed by receiver operating characteristic (ROC) curve analysis, and the area under the curve (AUC) was used to compare predictive performance across models.
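The core idea of SMOTE-style oversampling, interpolating between a minority-class sample and one of its nearest minority neighbors, can be sketched as follows. This is a simplified, standard-library illustration and not the implementation used in the study; the ENN cleaning step of SMOTE-ENN and the density weighting of ADASYN are omitted, and the function name and sample points are hypothetical.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by linear interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of `base` by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature minority (non-survivor) samples.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.6), (1.2, 2.2)]
synthetic = smote_like_oversample(minority, n_new=6)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region already occupied by the minority class.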
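The evaluation metrics can be made concrete with a small worked example. The sketch below computes accuracy, precision, recall, and F1-score from confusion-matrix counts, and the AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The labels and scores are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1-score for binary labels (1 = death)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes and predicted mortality probabilities for 8 patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
y_pred = [int(s >= 0.5) for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy, precision, recall, and F1-score depend on the chosen threshold (0.5 here), whereas the AUC summarizes ranking quality across all thresholds, which is why the two groups of metrics are reported together.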
2.6 Model explainability
The SHAP algorithm was used to analyze feature importance and to quantify the contribution of each feature variable to the ML model predictions, thereby explaining the model’s prediction process (31). To enhance model interpretability and trustworthiness, summary and waterfall plots were constructed to make the decision-making process easier to understand. One positive and one negative patient were randomly selected to illustrate the prediction process.
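The additive property that underlies SHAP waterfall plots (base value plus the sum of feature contributions equals the model output) can be demonstrated with exact Shapley values on a toy model. This is a simplified sketch: the risk function and its coefficients are hypothetical, a single all-zero reference patient stands in for the background dataset, and the brute-force enumeration shown here is not the tree-based algorithm used by the SHAP library.

```python
from itertools import permutations


def toy_risk(features):
    """Hypothetical risk model over (acute kidney failure, vasoactive drugs, high INR)."""
    akf, vaso, inr_high = features
    return 0.05 + 0.10 * akf + 0.08 * vaso + 0.04 * akf * vaso + 0.03 * inr_high


def exact_shapley(predict, x, background):
    """Exact Shapley values for one instance by enumerating feature orderings.

    Features not yet 'revealed' in an ordering take their background value.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(background)
        previous = predict(current)
        for j in order:
            current[j] = x[j]
            updated = predict(current)
            phi[j] += updated - previous  # marginal contribution of feature j
            previous = updated
    return [value / len(orderings) for value in phi]


background = [0, 0, 0]          # reference patient with no risk factors
patient = [1, 1, 1]             # patient with all three risk factors
base_value = toy_risk(background)
contributions = exact_shapley(toy_risk, patient, background)
prediction = toy_risk(patient)
```

In this toy case the interaction term between acute kidney failure and vasoactive drugs is split equally between the two features, and the base value plus all contributions reproduces the prediction, which is exactly what a waterfall plot displays step by step.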
2.7 Statistical analysis
Descriptive statistics were presented as means with standard deviations (SDs), medians with interquartile ranges, or counts with frequencies, depending on the data type and distribution. Comparisons between groups were conducted using Fisher’s exact test for categorical data and the t-test, Wilcoxon rank-sum test, analysis of variance, or Kruskal–Wallis test for continuous data, as appropriate. Patients were divided into two groups according to in-hospital outcome. Only variables with a p-value below 0.001 were ultimately included in the ML models. Python (version 3.9) was used for data processing and model evaluation.
3 Results
3.1 Study research design process
The construction of the prediction models comprised three core components: data preprocessing and standardization, ML model development and validation, and model explainability analysis (Figure 1). Initial screening of the MIMIC-IV database identified 3,519 patients. Of these, 101 were excluded for an ICU stay of less than 24 h, 97 for lower gastrointestinal bleeding, and 271 for insufficient data (more than 20% missing values). Ultimately, 3,050 patients were included in the cohort: 625 (20.5%) were classified as AVGIB and the remaining 2,425 (79.5%) as ANGIB, as described in the Supplementary Table. The cohort had an overall mortality rate of 19.15% (n = 584), in line with epidemiological data from prior studies (32). The collected variables covered demographic characteristics, medication history, special interventions, vital signs, medical history, blood transfusions, severity scores, and laboratory results (Table 1).
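The cohort flow reported above can be checked with simple arithmetic. The short sketch below uses only the counts stated in the text; the variable names are illustrative.

```python
screened = 3519
excluded = {
    "ICU stay < 24 h": 101,
    "lower gastrointestinal bleeding": 97,
    "more than 20% missing data": 271,
}
included = screened - sum(excluded.values())

avgib, angib = 625, 2425      # subgroup sizes
deaths = 584                  # in-hospital deaths in the whole cohort

avgib_pct = round(100 * avgib / included, 1)
angib_pct = round(100 * angib / included, 1)
mortality_pct = round(100 * deaths / included, 2)
```

The counts are internally consistent: the exclusions leave 3,050 patients, the two subgroups sum to the full cohort, and the percentages match those reported.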

Figure 1. Technical roadmap for constructing machine learning model. SMOTE-ENN: Synthetic Minority Over-sampling Technique-Edited Nearest Neighbors; ADASYN: Adaptive Synthetic Sampling; LR: logistic regression; DT: decision tree; RF: random forest; GBoost: gradient boosting; AdaBoost: adaptive boosting; XGBoost: eXtreme Gradient Boosting; NB: Naive Bayes; SVM: support vector machine; LightGBM: light gradient-boosting machine; KNN: K-nearest neighbors; ET: extra trees; VC: voting classifier.
The ML models were separately developed according to the AVGIB and ANGIB subgroups. To address data imbalance and enhance model generalizability, we employed advanced data balancing techniques including SMOTE-ENN and ADASYN. Twelve machine learning algorithms were systematically evaluated through 5-fold cross validation and independent validation. The optimal models were selected based on comprehensive performance metrics including AUC, accuracy, precision, recall, and F1-score.
The Shapley Additive exPlanations (SHAP) algorithm was used to identify the top 20 most influential features through SHAP summary plots and mean SHAP value analysis, and to visualize the prediction mechanism via decision plots for representative cases and force plots illustrating positive and negative predictions.
3.2 Acute variceal gastrointestinal bleeding patients machine learning model
3.2.1 Clinical characteristic and predictor screening
The study identified 625 patients with acute variceal gastrointestinal bleeding through ICD code verification. Based on clinical outcomes, 497 patients (79.5%) were assigned to the survival group and 128 patients (20.5%) to the non-survival group. Clinical characteristics covering demographics, medical history, previous history, intervention measures, vital signs, related scores, and laboratory results were collected.
The variceal bleeding analysis initially evaluated 92 candidate variables, of which 59 showed statistically significant associations (p < 0.001) on screening. The feature variable selection results are presented in Table 2. To enhance the sensitivity of the ML model, only these 59 variables with p < 0.001 were incorporated.
3.2.2 Machine learning model construction and evaluation
The SMOTE-ENN and ADASYN resampling techniques were employed to address class imbalance. Subsequently, 12 ML algorithms were evaluated to select the optimal model. The dual validation strategy, incorporating both 5-fold cross validation (Table 3) and independent validation (Table 4), serves as a robust safeguard against overfitting by providing multiple performance estimates and keeping completely unseen data for final evaluation.
Across the key parameters, the ET model exhibited excellent prediction performance, ranking first in AUC (0.996 ± 0.007), with an accuracy of 0.966 ± 0.009, precision of 0.957 ± 0.024, recall of 0.988 ± 0.012, and F1-score of 0.972 ± 0.007. The ET model was therefore selected as the final predictive model for AVGIB patients.
3.2.3 Machine learning model performance
As shown in Table 3, the SMOTE-ENN imbalance-handling technique achieved superior performance across nearly all metrics. Consequently, SMOTE-ENN was selected as the resampling technique, outperforming ADASYN. The consistent performance observed in both 5-CV and IV indicated that SMOTE-ENN effectively mitigates class imbalance without introducing overfitting, making it the more reliable resampling approach.
Regarding the ML model performance, the AUC values varied significantly across different algorithms when using the ADASYN and SMOTE-ENN techniques, as evaluated through both 5-CV and IV in Figure 2. The ET model emerged as the optimal choice due to its highest AUC values and overall excellent performance, as detailed in Section 3.2.2.

Figure 2. AUC of different machine learning models for AVGIB patients in 5 CV and IV. (A) The ROC of models by ADASYN technique in 5 CV; (B) the ROC of models by ADASYN technique in IV; (C) the ROC of models by SMOTE-ENN technique in 5 CV; (D) the ROC of models by SMOTE-ENN technique in IV. SMOTE-ENN: Synthetic Minority Over-sampling Technique-Edited Nearest Neighbors; ADASYN: Adaptive Synthetic Sampling.
3.2.4 SHAP explainability of the machine learning model
To display the weight of each feature variable and its contribution to the predictions of the ET model, the SHAP (Shapley Additive exPlanations) algorithm was applied. The SHAP plots show the mean SHAP value of each variable and the contribution of each feature to individual predictions. The top 20 feature variables were acute kidney failure, transfusion albumin, vasoactive drugs, transfusion plasma, transfusion platelet, INR_max, PT_max, AIMS65 score, PT_mean, APTT_max, APACHE-II score, bilirubin_max, diabetes, INR_mean, bilirubin_mean, GCS score, APTT_min, INR_min, APTT_mean, and bilirubin_min.
The SHAP feature importance plot shows the weights of the top 20 feature variables of the optimal ET model in mortality prediction (Figure 3A). In the SHAP summary plot, each point represents the SHAP value of the corresponding feature variable for a given sample (Figure 3B). Points trending toward red indicate higher feature values, while those approaching blue denote lower feature values.

Figure 3. SHAP explainability of the extra trees prediction model. (A) The SHAP feature importance plot; (B) the SHAP summary plot; (C) the ML explainability of a positive patient; (D) the ML explainability of a negative patient.
3.2.5 Machine learning explainability for patients
SHAP explains model predictions by quantifying feature contributions, which are visualized in waterfall plots. A randomly selected positive patient is shown in Figure 3C. The base value of the ET model was E[f(x)] = 0.05; the patient received an albumin transfusion, contributing 0.11, and had acute kidney failure, contributing 0.09. The other feature variables contributed in the same manner. As shown in Figure 3C, the final f(x) was 0.936; therefore, the patient was a representative positive case.
Similarly, a randomly selected negative patient is shown in Figure 3D. The base value of the ET model was E[f(x)] = −0.25; the patient did not receive an albumin transfusion, contributing −0.04, and had no diabetes, contributing −0.03. The other feature variables contributed in the same manner. As shown in Figure 3D, the final f(x) was 0.04; therefore, the patient was a representative negative case.
3.3 Acute non-variceal gastrointestinal bleeding patients machine learning model
3.3.1 Clinical characteristic and predictor screening
The study identified 2,425 patients with acute non-variceal gastrointestinal bleeding through ICD code verification. Based on clinical outcomes, 1,969 patients (81.2%) were assigned to the survival group and 456 patients (18.8%) to the non-survival group. Clinical characteristics covering demographics, medical history, previous history, intervention measures, vital signs, related scores, and laboratory results are compared in Table 5.
The non-variceal bleeding analysis initially evaluated 92 candidate variables, of which 70 showed statistically significant associations (p < 0.001) on screening. The feature variable selection results are presented in Table 5. To enhance the sensitivity of the ML model, only these 70 variables with p < 0.001 were incorporated.
3.3.2 Machine learning model construction and evaluation
The SMOTE-ENN and ADASYN resampling techniques were employed to address class imbalance. Subsequently, 12 ML algorithms were evaluated to select the optimal model. The dual validation strategy, incorporating both 5-fold cross validation (Table 6) and independent validation (Table 7), serves as a robust safeguard against overfitting by providing multiple performance estimates and keeping completely unseen data for final evaluation.
Across the key parameters, the gradient boosting model exhibited excellent prediction performance, with an AUC of 0.985 ± 0.002, accuracy of 0.948 ± 0.009, precision of 0.949 ± 0.009, recall of 0.968 ± 0.009, and F1-score of 0.959 ± 0.007. The GB model was therefore selected as the final predictive model for ANGIB patients.
3.3.3 Machine learning model performance
As shown in Table 6, the SMOTE-ENN imbalance-handling technique achieved superior performance across nearly all metrics. Consequently, SMOTE-ENN was selected as the resampling technique, outperforming ADASYN. The consistent performance observed in both 5-CV and IV indicated that SMOTE-ENN effectively mitigates class imbalance without introducing overfitting, making it the more reliable resampling approach.
Regarding ML model performance, the AUC values varied significantly across algorithms under the ADASYN and SMOTE-ENN techniques, as evaluated through both 5-CV and IV in Figure 4. The GB model emerged as the optimal choice owing to its highest AUC values and overall excellent performance, as detailed in Section 3.3.2.

Figure 4. AUC of different machine learning models for ANGIB patients in 5 CV and IV. (A) The ROC of models by ADASYN technique in 5 CV; (B) the ROC of models by ADASYN technique in IV; (C) the ROC of models by SMOTE-ENN technique in 5 CV; (D) the ROC of models by SMOTE-ENN technique in IV. SMOTE-ENN: Synthetic Minority Over-sampling Technique-Edited Nearest Neighbors; ADASYN: Adaptive Synthetic Sampling.
3.3.4 SHAP explainability of the machine learning model
To display the weight of each feature variable and its contribution to the predictions of the GB model, the SHAP (Shapley Additive exPlanations) algorithm was applied. The SHAP plots show the mean SHAP value of each variable and the contribution of each feature to individual predictions. The top 20 feature variables were GCS score, vasoactive drugs, acute kidney failure, AIMS65 score, APACHE-II score, mechanical ventilation, lactate_min, chronic liver disease, APTT_min, APTT_max, potassium_max, acute heart failure, anticoagulants, albumin_max, APTT_mean, BUN_min, sepsis, respiratory rate_max, WBC_min, and heart rate_max.
The SHAP feature importance plot shows the weights of the top 20 feature variables of the optimal GB model in mortality prediction (Figure 5A). In the SHAP summary plot, each point represents the SHAP value of the corresponding feature variable for a given sample (Figure 5B). Points trending toward red indicate higher feature values, while those approaching blue denote lower feature values.

Figure 5. SHAP explainability of the gradient boosting prediction model. (A) The SHAP feature importance plot; (B) the SHAP summary plot; (C) the ML explainability of a positive patient; (D) the ML explainability of a negative patient.
3.3.5 Machine learning explainability for patients
SHAP explains model predictions by quantifying feature contributions, which are visualized in waterfall plots. A randomly selected positive patient is shown in Figure 5C. The base value of the GB model was E[f(x)] = 1.36; the use of vasoactive drugs contributed 1.22, and previous acute kidney failure contributed 0.50. The other feature variables contributed in the same manner. As shown in Figure 5C, the final f(x) was 5.581; therefore, the patient was a representative positive case.
Similarly, a randomly selected negative patient is shown in Figure 5D. The base value of the GB model was E[f(x)] = −1.11; the patient did not have acute kidney failure, contributing −1.83, and had a minimum lactate of 1.0 mmol/L, contributing −0.80. The other feature variables contributed in the same manner. As shown in Figure 5D, the final f(x) was −3.78; therefore, the patient was a representative negative case.
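The final f(x) values for the GB model fall outside the range of 0 to 1, which suggests they are expressed on the log-odds scale, as is common for SHAP explanations of gradient boosting classifiers. Under that assumption, which the text does not state explicitly, the logistic function converts them to predicted probabilities, as in this minimal sketch.

```python
import math


def log_odds_to_probability(log_odds):
    """Logistic (sigmoid) transform from log-odds to probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Final f(x) values reported for the two representative ANGIB patients,
# interpreted here as log-odds (an assumption).
p_positive = log_odds_to_probability(5.581)
p_negative = log_odds_to_probability(-3.78)
```

Read this way, the positive case corresponds to a predicted mortality probability of roughly 0.996 and the negative case to roughly 0.022.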
4 Discussion
Despite significant advancements in the prevention and treatment of AUGIB, prognosis remains a great challenge, especially during ICU hospitalization (33). AUGIB patients commonly present with massive bleeding, persistent hematemesis, melena, and even ongoing active bleeding. Because of underlying circulatory failure and the possibility of MODS, high-risk patients often require intensive bundle therapy in the ICU (2). Multidisciplinary collaboration, fluid resuscitation, blood transfusion, correction of coagulopathy, and early endoscopic intervention all contribute to outcomes (34–36). However, not all patients admitted to the ICU received emergency endoscopy, pharmacological hemostasis, blood transfusion, interventional therapy, or surgical intervention; these decisions depended on the patient’s condition. Previous scores, such as AIMS65, Rockall, and GBS, can be used as risk stratification tools. However, their accuracy, sensitivity, and specificity are unsatisfactory, ranging from 70 to 80% (37). The APACHE-II score may be used for prediction in the ICU, but relevant studies are relatively limited (38).
Mortality in AUGIB patients is associated with multiple factors, including patient age, medical history, personal history, laboratory tests, and vital signs. Therefore, an integrated model incorporating a wide range of variables is needed. Variceal and non-variceal bleeding, the two major categories of acute upper gastrointestinal hemorrhage, substantially influence clinical outcomes (39). Machine learning models are emerging as powerful AI-based tools that can achieve high accuracy and support automated decision-making with strong adaptability and predictive power (40). The Zhao X model was designed specifically for non-variceal GIB patients (12), while the Agarwal S model was used for patients with esophageal varices (41). The Kou Y prediction model was designed for GIB patients with acute myocardial infarction (AMI) (42). Thus, we aimed to construct ML models for AUGIB patients based on the variceal and non-variceal subtypes. The MIMIC-IV database contains real-world clinical data from the ICU of Beth Israel Deaconess Medical Center (BIDMC). It is openly accessible and holds millions of electronic health records, including structured and unstructured data, physiological waveform data, and time series data, which makes it an ideal resource for ML modeling (43). Dynamic summaries (the initial, minimum, maximum, and mean values) provide more informative inputs for prediction than a single measurement. Thus, multiple values of vital signs and laboratory results were integrated into the ML models.
The ET model exhibited excellent performance for AVGIB patients, with an AUC of 0.996 ± 0.007, accuracy of 0.966 ± 0.009, precision of 0.957 ± 0.024, recall of 0.988 ± 0.012, and F1-score of 0.972 ± 0.007; the GB model was optimal for ANGIB patients, with an AUC of 0.985 ± 0.002, accuracy of 0.948 ± 0.009, precision of 0.949 ± 0.009, recall of 0.968 ± 0.009, and F1-score of 0.959 ± 0.007. In other words, both prediction models can accurately identify positive patients while correctly recognizing negative patients, and do so better than previous ML models. Further analysis of the top 10 risk factors revealed that vasoactive drugs, GCS score, AIMS65 score, anticoagulation, and SpO2 were the most relevant variables.
(1) Vasoactive agents: If patients present with persistent bleeding or hemodynamic instability or with comorbidities such as cirrhosis, renal insufficiency, or heart failure, vasoactive drugs should be considered to maintain blood pressure and improve tissue perfusion. AGA Clinical Practice in 2024 suggested that vasoactive drugs should be initiated as soon as the diagnosis of variceal hemorrhage should be continued for 2 to 5 days to prevent early rebleeding (44). As described in previous research, vasoactive agents might be closely related to mortality (45). The use of vasoactive agents indicates hypotension, shock, or circulatory failure, all of which are associated with poor outcomes in AUGIB patients. (2) GCS score: The GCS (Glasgow Coma Scale) score consists of three components: verbal response, eye-opening response, and motor response, which collectively reflect the level of consciousness. Patients with hemodynamic instability, advanced cirrhosis with hepatic encephalopathy (HE), or severe cardiac or renal dysfunction may present with varying degrees of consciousness, leading to diverse GCS scores. Qiu W’s research indicated that a higher GCS score was associated with an increased risk of GIB patients (45). AUGIB patients in ICU have a higher proportion of advanced cirrhosis, chronic renal insufficiency, or congestive heart failure; the lower GCS scores were independent predictive factors of mortality (46). (3) AIMS 65 score: The AIMS65 developed in 2011 by Saltzman JR was simple and easy to implement without endoscopy result (47). It focused on the evaluation of in-hospital mortality with high accuracy (6). The AIMS65 scale also had good predictability and was suitable for rapid preliminary evaluation at the outset (48). The prediction value of AIMS65 was also confirmed by the ML model, as expected. 
(4) Anticoagulants: The ACG 2020 guideline stresses the association of anticoagulants and antiplatelets with acute GIB, and the administration of fresh frozen plasma (FFP) can significantly impact prognosis (49). Patients with gastrointestinal bleeding who have coagulopathies or are on oral anticoagulants or antiplatelet agents often face a high risk of massive bleeding or rebleeding due to deficiencies in coagulation factors and prolonged hemostasis. (5) SpO2: SpO2 is a standard indicator for evaluating oxygenation and can be measured non-invasively and continuously. A low minimum SpO2 value indicates the onset of hypoxemia, which reflects inadequate tissue and organ perfusion and a decline in cardiorespiratory function (10). In addition, reduced SpO2 is a prognostic risk factor for disease progression (50). The minimum SpO2 was an independent risk factor in our study, consistent with the findings of other researchers (10, 51).
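As an illustration of how the two bedside scores discussed above are assembled, the following Python sketch computes a GCS total and an AIMS65 score. The component ranges and cut-offs follow the commonly cited definitions (GCS: eye 1 to 4, verbal 1 to 5, motor 1 to 6; AIMS65: albumin < 3.0 g/dL, INR > 1.5, altered mental status, systolic blood pressure ≤ 90 mmHg, age > 65 years) and should be checked against the original publication (47) before any use. Treating GCS < 14 as "altered mental status", along with all function and field names, is an assumption made for this sketch.

```python
from dataclasses import dataclass


def gcs_total(eye: int, verbal: int, motor: int) -> int:
    """Sum the three GCS components (eye 1-4, verbal 1-5, motor 1-6)."""
    limits = (("eye", eye, 4), ("verbal", verbal, 5), ("motor", motor, 6))
    for name, value, upper in limits:
        if not 1 <= value <= upper:
            raise ValueError(f"{name} response must be between 1 and {upper}")
    return eye + verbal + motor


@dataclass
class Aims65Inputs:
    """Hypothetical container for the five AIMS65 inputs."""
    albumin_g_dl: float
    inr: float
    gcs: int
    systolic_bp_mmhg: float
    age_years: float


def aims65(x: Aims65Inputs) -> int:
    """One point per criterion met, giving a score from 0 to 5."""
    criteria = (
        x.albumin_g_dl < 3.0,      # A: low albumin
        x.inr > 1.5,               # I: elevated INR
        x.gcs < 14,                # M: altered mental status (assumed proxy)
        x.systolic_bp_mmhg <= 90,  # S: low systolic blood pressure
        x.age_years > 65,          # 65: age (some calculators use >= 65)
    )
    return sum(criteria)


# Hypothetical patients, for illustration only.
high_risk = Aims65Inputs(albumin_g_dl=2.6, inr=1.8, gcs=gcs_total(3, 4, 6),
                         systolic_bp_mmhg=85, age_years=72)
low_risk = Aims65Inputs(albumin_g_dl=3.8, inr=1.1, gcs=gcs_total(4, 5, 6),
                        systolic_bp_mmhg=120, age_years=50)
print(aims65(high_risk), aims65(low_risk))
```

Because every input is available from routine admission data, a score of this kind can be computed before endoscopy, which is the property highlighted in the discussion above.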
While our ML models demonstrated excellent prediction performance with large-scale data, several limitations should be acknowledged. First, as a retrospective study, it was inherently subject to selection bias and systematic bias, which may have led to the omission of some crucial parameters. Second, MIMIC-IV is a single-center database; further external validation is required to confirm the predictive value of the optimal ML models. Third, the etiology of GIB is diverse. In this study, no diagnosis-related subgroup analysis was conducted, which may have affected the accuracy to some extent.
5 Conclusion
In this study, we developed prediction models specifically for AVGIB and ANGIB patients in the ICU based on 12 standard algorithms. Given the imbalanced dataset of real-world patients, the SMOTE-ENN technique was applied to improve model performance and optimize evaluation metrics. When compared on key metrics, extremely randomized trees and gradient boosting ranked first for AVGIB and ANGIB patients, respectively, with excellent performance. The SHAP plot visualization displays the feature variables by weight: vasoactive drugs, GCS score, AIMS65 score, anticoagulants, and SpO2. Most importantly, two web-based prognostic prediction platforms were developed to enhance clinical accessibility: the ET model for AVGIB patients is available at https://10zr656do5281.vicp.fun, and the GB model for ANGIB patients is accessible at http://10zr656do5281.vicp.fun.
The models provide valuable decision support for clinicians, enabling early identification of at-risk patients and timely initiation of endoscopy, correction of coagulation dysfunction, and fluid resuscitation, combined with interventional therapy or even surgery where needed, to reduce mortality and improve outcomes.
Data availability statement
The original contributions presented in the study are included in the article/Supplementary material; further inquiries can be directed to the corresponding author.
Ethics statement
The requirement for ethical approval was waived by the MIMIC-IV database at Beth Israel Deaconess Medical Center for the studies involving humans because the patients’ private information is not contained in the manuscript. The studies were conducted in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required from the participants or the participants’ legal guardians/next of kin because the patients had already been discharged from the hospital. Written informed consent was not obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article because the patients had already been discharged from the hospital and no private information is included.
Author contributions
ZL: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Writing – original draft, Writing – review & editing. GJ: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Validation, Writing – original draft. LiaZ: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Writing – original draft. PS: Conceptualization, Data curation, Formal analysis, Investigation, Supervision, Writing – original draft, Writing – review & editing. YH: Conceptualization, Data curation, Formal analysis, Methodology, Software, Writing – original draft. YZ: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Writing – original draft. GL: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Writing – original draft, Writing – review & editing. YX: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Writing – original draft, Writing – review & editing. LiyZ: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Writing – original draft, Writing – review & editing.
Funding
The author(s) declare that financial support was received for the research and/or publication of this article. This work was supported by the National Natural Science Foundation of China, No. 82272226, the Knowledge Innovation Program of Wuhan, No. 2023020201010165, and the Cross-Innovation Talent Project of Renmin Hospital of Wuhan University, No. JCRCZN-2022-017 and No. JCRCYG-2022-005.
Acknowledgments
We are profoundly grateful for the MIMIC database, which has greatly facilitated our research.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The authors declare that no Gen AI was used in the creation of this manuscript.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmed.2025.1580094/full#supplementary-material
References
1. Costable, NJ, and Greenwald, DA. Upper gastrointestinal bleeding. Clin Geriatr Med. (2021) 37:155–72. doi: 10.1016/j.cger.2020.09.001
2. Al Duhailib, Z, Dionne, JC, and Alhazzani, W. Management of severe upper gastrointestinal bleeding in the ICU. Curr Opin Crit Care. (2020) 26:212–8. doi: 10.1097/MCC.0000000000000699
3. Laine, L, Barkun, AN, Saltzman, JR, Martel, M, and Leontiadis, GI. ACG clinical guideline: upper gastrointestinal and ulcer bleeding. Am J Gastroenterol. (2021) 116:899–917. doi: 10.14309/ajg.0000000000001245
4. Wilkins, T, Wheeler, B, and Carpenter, M. Upper gastrointestinal bleeding in adults: evaluation and management. Am Fam Physician. (2020) 101:294–300.
5. Gralnek, IM, Dumonceau, JM, Kuipers, EJ, Lanas, A, Sanders, DS, Kurien, M, et al. Diagnosis and management of nonvariceal upper gastrointestinal hemorrhage: European Society of Gastrointestinal Endoscopy (ESGE) guideline. Endoscopy. (2015) 47:a1–a46. doi: 10.1055/s-0034-1393172
6. Boyapati, R, Majumdar, A, and Robertson, M. AIMS65: a promising upper gastrointestinal bleeding risk score but further validation required. World J Gastroenterol. (2014) 20:14515–6. doi: 10.3748/wjg.v20.i39.14515
7. Lee, HA, Jung, HK, Kim, TO, Byeon, JR, Jeong, ES, Cho, HJ, et al. Clinical outcomes of acute upper gastrointestinal bleeding according to the risk indicated by Glasgow-Blatchford risk score-computed tomography score in the emergency room. Korean J Intern Med. (2022) 37:1176–85. doi: 10.3904/kjim.2022.099
8. Stanley, AJ, Laine, L, Dalton, HR, Ngu, JH, Schultz, M, Abazi, R, et al. Comparison of risk scoring systems for patients presenting with upper gastrointestinal bleeding: international multicentre prospective study. BMJ. (2017) 356:i6432. doi: 10.1136/bmj.i6432
9. Sungono, V, Hariyanto, H, Soesilo, TEB, Adisasmita, AC, Syarif, S, Lukito, AA, et al. Cohort study of the APACHE II score and mortality for different types of intensive care unit patients. Postgrad Med J. (2022) 98:914–8. doi: 10.1136/postgradmedj-2021-140376
10. Yuan, L, and Yao, W. Development and validation of a risk prediction model for in-hospital mortality in patients with acute upper gastrointestinal bleeding. Clin Appl Thromb Hemost. (2023) 29:10760296231207806. doi: 10.1177/10760296231207806
11. Benedeto-Stojanov, D, Bjelaković, M, Stojanov, D, and Aleksovski, B. Prediction of in-hospital mortality after acute upper gastrointestinal bleeding: cross-validation of several risk scoring systems. J Int Med Res. (2022) 50:3000605221086442. doi: 10.1177/03000605221086442
12. Ungureanu, BS, Gheonea, DI, Florescu, DN, Iordache, S, Cazacu, SM, Iovanescu, VF, et al. Predicting mortality in patients with nonvariceal upper gastrointestinal bleeding using machine-learning. Front Med (Lausanne). (2023) 10:1134835. doi: 10.3389/fmed.2023.1134835
13. Jiang, F, Jiang, Y, Zhi, H, Dong, Y, Li, H, Ma, S, et al. Artificial intelligence in healthcare: past, present and future. Stroke Vasc Neurol. (2017) 2:230–43. doi: 10.1136/svn-2017-000101
14. Raţiu, I, Lupuşoru, R, Popescu, A, Sporea, I, Goldiş, A, Dănilă, M, et al. Acute gastrointestinal bleeding: a comparison between variceal and nonvariceal gastrointestinal bleeding. Medicine (Baltimore). (2022) 101:e31543. doi: 10.1097/MD.0000000000031543
15. Johnson, AEW, Bulgarelli, L, Shen, L, Gayles, A, Shammout, A, Horng, S, et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci Data. (2023) 10:1. doi: 10.1038/s41597-022-01899-x
16. Stoltzfus, JC. Logistic regression: a brief primer. Acad Emerg Med. (2011) 18:1099–104. doi: 10.1111/j.1553-2712.2011.01185.x
17. Podgorelec, V, Kokol, P, Stiglic, B, and Rozman, I. Decision trees: an overview and their use in medicine. J Med Syst. (2002) 26:445–63. doi: 10.1023/A:1016409317640
18. Hu, J, and Szymczak, S. A review on longitudinal data analysis with random forest. Brief Bioinform. (2023) 24:bbad002. doi: 10.1093/bib/bbad002
19. Zhang, Z, Zhao, Y, Canes, A, Steinberg, D, and Lyashevska, O, written on behalf of AME Big-Data Clinical Trial Collaborative Group. Predictive analytics with gradient boosting in clinical medicine. Ann Transl Med. (2019) 7:152. doi: 10.21037/atm.2019.03.29
20. Zheng, Z, and Yang, Y. Adaptive boosting for domain adaptation: toward robust predictions in scene segmentation. IEEE Trans Image Process. (2022) 31:5371–82. doi: 10.1109/TIP.2022.3195642
21. Liu, M, Guo, C, and Guo, S. An explainable knowledge distillation method with XGBoost for ICU mortality prediction. Comput Biol Med. (2023) 152:106466. doi: 10.1016/j.compbiomed.2022.106466
22. Sugahara, S, and Ueno, M. Exact learning augmented naive Bayes classifier. Entropy (Basel). (2021) 23:1703. doi: 10.3390/e23121703
23. Valkenborg, D, Rousseau, AJ, Geubbelmans, M, and Burzykowski, T. Support vector machines. Am J Orthod Dentofacial Orthop. (2023) 164:754–7. doi: 10.1016/j.ajodo.2023.08.003
24. Liao, H, Zhang, X, Zhao, C, Chen, Y, Zeng, X, and Li, H. LightGBM: an efficient and accurate method for predicting pregnancy diseases. J Obstet Gynaecol. (2022) 42:620–9. doi: 10.1080/01443615.2021.1945006
25. Ehsani, R, and Drabløs, F. Robust distance measures for KNN classification of cancer data. Cancer Informat. (2020) 19:1176935120965542. doi: 10.1177/1176935120965542
26. Martiello Mastelini, S, Nakano, FK, Vens, C, and de Leon Ferreira de Carvalho, ACP. Online extra trees regressor. IEEE Trans Neural Netw Learn Syst. (2023) 34:6755–67. doi: 10.1109/TNNLS.2022.3212859
27. El-Kenawy, EM, Ibrahim, A, Mirjalili, S, Eid, MM, and Hussein, SE. Novel feature selection and voting classifier algorithms for COVID-19 classification in CT images. IEEE Access. (2020) 8:179317–35. doi: 10.1109/ACCESS.2020.3028012
28. Kim, JH, Shin, JK, Lee, H, Lee, DH, Kang, JH, Cho, KH, et al. Improving the performance of machine learning models for early warning of harmful algal blooms using an adaptive synthetic sampling method. Water Res. (2021) 207:117821. doi: 10.1016/j.watres.2021.117821
29. Yang, F, Wang, K, Sun, L, Zhai, M, Song, J, and Wang, H. A hybrid sampling algorithm combining synthetic minority over-sampling technique and edited nearest neighbor for missed abortion diagnosis. BMC Med Inform Decis Mak. (2022) 22:344. doi: 10.1186/s12911-022-02075-2
30. Gao, C, Bian, X, Wu, L, Zhan, Q, Yu, F, Pan, H, et al. A nomogram predicting the histologic activity of lupus nephritis from clinical parameters. Nephrol Dial Transplant. (2024) 39:520–30. doi: 10.1093/ndt/gfad191
31. Li, J, Liu, S, Hu, Y, Zhu, L, Mao, Y, and Liu, J. Predicting mortality in intensive care unit patients with heart failure using an interpretable machine learning model: retrospective cohort study. J Med Internet Res. (2022) 24:e38082. doi: 10.2196/38082
32. Liu, Z, Zhang, L, Li, G, Bai, WH, Wang, PX, Jiang, GJ, et al. A nomogram model for prediction of mortality risk of patients with dangerous upper gastrointestinal bleeding: a two-center retrospective study. Curr Med Sci. (2023) 43:723–32. doi: 10.1007/s11596-023-2748-z
33. Redondo-Cerezo, E, Tendero-Peinado, C, López-Tobaruela, JM, Fernandez-García, R, Lancho, A, Ortega-Suazo, EJ, et al. Risk factors for massive gastrointestinal bleeding occurrence and mortality: a prospective single-center study. Am J Med Sci. (2024) 367:259–67. doi: 10.1016/j.amjms.2024.01.012
34. Zhang, Z, and Yan, J. Comparative study on clinical effect of multidisciplinary treatment model and traditional consultation model in treatment of dangerous upper gastrointestinal hemorrhage. Zhonghua Wei Zhong Bing Ji Jiu Yi Xue. (2020) 32:1107–10. doi: 10.3760/cma.j.cn121430-20200520-00395
35. Guo, CLT, Wong, SH, Lau, LHS, Lui, RNS, Mak, JWY, Tang, RSY, et al. Timing of endoscopy for acute upper gastrointestinal bleeding: a territory-wide cohort study. Gut. (2022) 71:1544–50. doi: 10.1136/gutjnl-2020-323054
36. Boros, E, Sipos, Z, Hegyi, P, Teutsch, B, Frim, L, Váncsa, S, et al. Prophylactic transcatheter arterial embolization reduces rebleeding in non-variceal upper gastrointestinal bleeding: a meta-analysis. World J Gastroenterol. (2021) 27:6985–99. doi: 10.3748/wjg.v27.i40.6985
37. Lincoln, M, Keating, N, O'Loughlin, C, Tam, A, O'Kane, M, MacCarthy, F, et al. Comparison of risk scoring systems for critical care patients with upper gastrointestinal bleeding: predicting mortality and length of stay. Anaesthesiol Intensive Ther. (2022) 54:310–4. doi: 10.5114/ait.2022.120741
38. Deng, K, Jing, W, and Yang, J. Adding the analysis or discussion to the APACHE II score may clearly explain the predictive risk associated with upper gastrointestinal bleeding via the re.Co.De score. Gastrointest Endosc. (2022) 96:385. doi: 10.1016/j.gie.2022.03.019
39. Tandon, P, Bishay, K, Fisher, S, Yelle, D, Carrigan, I, Wooller, K, et al. Comparison of clinical outcomes between variceal and non-variceal gastrointestinal bleeding in patients with cirrhosis. J Gastroenterol Hepatol. (2018) 33:1773–9. doi: 10.1111/jgh.14147
40. Margraf, JT. Science-driven atomistic machine learning. Angew Chem Int Ed Eng. (2023) 62:e202219170. doi: 10.1002/anie.202219170
41. Agarwal, S, Sharma, S, Kumar, M, Venishetty, S, Bhardwaj, A, Kaushal, K, et al. Development of a machine learning model to predict bleed in esophageal varices in compensated advanced chronic liver disease: a proof of concept. J Gastroenterol Hepatol. (2021) 36:2935–42. doi: 10.1111/jgh.15560
42. Kou, Y, Ye, S, Tian, Y, Yang, K, Qin, L, Huang, Z, et al. Risk factors for gastrointestinal bleeding in patients with acute myocardial infarction: multicenter retrospective cohort study. J Med Internet Res. (2025) 27:e67346. doi: 10.2196/67346
43. Xie, W, Li, Y, Meng, X, and Zhao, M. Machine learning prediction models and nomogram to predict the risk of in-hospital death for severe DKA: a clinical study based on MIMIC-IV, eICU databases, and a college hospital ICU. Int J Med Inform. (2023) 174:105049. doi: 10.1016/j.ijmedinf.2023.105049
44. Garcia-Tsao, G, Abraldes, JG, Rich, NE, and Wong, VWS. AGA clinical practice update on the use of vasoactive drugs and intravenous albumin in cirrhosis: expert review. Gastroenterology. (2024) 166:202–10. doi: 10.1053/j.gastro.2023.10.016
45. Long, B, and Gottlieb, M. Emergency medicine updates: upper gastrointestinal bleeding. Am J Emerg Med. (2024) 81:116–23. doi: 10.1016/j.ajem.2024.04.052
46. Kaya, E, Karaca, MA, Aldemir, D, and Ozmen, MM. Predictors of poor outcome in gastrointestinal bleeding in emergency department. World J Gastroenterol. (2016) 22:4219–25. doi: 10.3748/wjg.v22.i16.4219
47. Saltzman, JR, Tabak, YP, Hyett, BH, Sun, X, Travis, AC, and Johannes, RS. A simple risk score accurately predicts in-hospital mortality, length of stay, and cost in acute upper GI bleeding. Gastrointest Endosc. (2011) 74:1215–24. doi: 10.1016/j.gie.2011.06.024
48. Kim, MS, Choi, J, and Shin, WC. AIMS65 scoring system is comparable to Glasgow-Blatchford score or Rockall score for prediction of clinical outcomes for non-variceal upper gastrointestinal bleeding. BMC Gastroenterol. (2019) 19:136. doi: 10.1186/s12876-019-1051-8
49. Abraham, NS, Barkun, AN, Sauer, BG, Douketis, J, Laine, L, Noseworthy, PA, et al. American College of Gastroenterology-Canadian Association of Gastroenterology clinical practice guideline: management of anticoagulants and antiplatelets during acute gastrointestinal bleeding and the periendoscopic period. Am J Gastroenterol. (2022) 117:542–58. doi: 10.14309/ajg.0000000000001627
50. Yin, H, Yang, R, Xin, Y, Jiang, T, and Zhong, D. In-hospital mortality and SpO2 in critical care patients with cerebral injury: data from the MIMIC-IV database. BMC Anesthesiol. (2022) 22:386. doi: 10.1186/s12871-022-01933-w
Keywords: acute variceal gastrointestinal bleeding, acute non-variceal gastrointestinal bleeding, extremely randomized trees, gradient boosting, artificial intelligence, mortality
Citation: Liu Z, Jiang G, Zhang L, Shrestha P, Hu Y, Zhu Y, Li G, Xiong Y and Zhan L (2025) The future of critical care: AI-powered mortality prediction for acute variceal gastrointestinal bleeding and acute non-variceal gastrointestinal bleeding patients. Front. Med. 12:1580094. doi: 10.3389/fmed.2025.1580094
Edited by:
Raffaele Pellegrino, University of Campania Luigi Vanvitelli, Italy
Reviewed by:
Jonathan Soldera, University of Caxias do Sul, Brazil
Ali Taha, University Hospital Crosshouse, United Kingdom
Copyright © 2025 Liu, Jiang, Zhang, Shrestha, Hu, Zhu, Li, Xiong and Zhan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Yuanguo Xiong, RM002397@whu.edu.cn; Liying Zhan, zhanliying@whu.edu.cn