Investigating the feasibility of machine learning to guide personalized red blood cell (RBC) transfusion: analyzing the heterogeneity of RBC transfusion in septic patients with hemoglobin levels of 7–9 g/dL based on the causal forest model

Yang, Penglei; Yuan, Jun; He, Jie; Yu, Lina; Gu, Xue; Ding, Xizhen; Chen, Qihong

doi:10.3389/fphar.2025.1615618

ORIGINAL RESEARCH article

Front. Pharmacol., 28 August 2025

Sec. Translational Pharmacology

Volume 16 - 2025 | https://doi.org/10.3389/fphar.2025.1615618

Penglei Yang¹

Jun Yuan¹

Jie He²

Lina Yu¹

Xue Gu¹

Xizhen Ding¹

Qihong Chen¹*

¹Department of Critical Care Medicine, Jiangdu People’s Hospital Affiliated to Yangzhou University, Yangzhou, China
²Department of Emergency Medicine, Yangzhou Jiangdu Traditional Chinese Medicine Hospital, Yangzhou, China

Background: This study utilized the causal forest algorithm to explore the heterogeneity of treatment effects of low-dose red blood cell (RBC) transfusion on the 90-day survival rate of sepsis patients with hemoglobin (Hb) levels of 7–9 g/dL to develop personalized transfusion strategies.

Methods: Data from patients who met the Sepsis-3 criteria and had a minimum Hb level of 7–9 g/dL were obtained from the MIMIC-IV and MIMIC-III databases and divided into RBC transfusion and non-transfusion groups. Patients in the two groups were paired using propensity score matching (PSM), after which a causal forest model was constructed using MIMIC-IV data. The model’s accuracy was assessed using out-of-bag data. Individual treatment effects (ITEs) of MIMIC-III patients were predicted and categorized into four subgroups, Quantile1 to Quantile4, based on effect size. Kaplan-Meier survival curves were plotted for each quantile to compare survival rates.

Results: After PSM, the MIMIC-IV and MIMIC-III cohorts comprised 1,652 and 868 patients, with 826 (50%) and 434 (50%) in the RBC transfusion group, respectively. The mean prediction coefficient estimated by the causal forest was 1.00 with a standard error of 0.57, while the differential forest prediction coefficient was 1.64 with a standard error of 0.48, demonstrating the model’s ability to effectively identify differences in the impact of transfusion on survival rates among individuals. There was significant heterogeneity in the ITE among patients in the MIMIC-III validation cohort. The ITE values by subgroup were Quantile1: −5.4% (−8.0%, −3.9%), Quantile2: −2.1% (−2.6%, −1.7%), Quantile3: −0.5% (−0.1%, +0.1%), and Quantile4: +3.6% (+2.0%, +6.6%). The Kaplan-Meier curves and the log-rank test demonstrated that RBC transfusion decreased the survival of patients in Quantile1 (p < 0.001) and Quantile2 (p = 0.011) but increased the survival of patients in Quantile4 (p < 0.001).

Conclusion: RBC transfusion in sepsis patients with Hb levels of 7–9 g/dL exhibits heterogeneous treatment effects and may reduce mortality in patients with high ITE. Although the causal forest model may help guide personalized transfusion in these cases, randomized controlled trials are needed to validate these findings.

GRAPHICAL ABSTRACT

Flowchart illustrating a machine learning approach for guiding red blood cell transfusions in septic patients. MIMIC-IV data identifies patients with hemoglobin levels of 7-9 g/dL. A causal forest model calculates individual treatment effects (ITE) on 90-day survival, validated with MIMIC-III data. Four subgroups are stratified by ITE: Quantile 1 shows RBC transfusion is harmful, Quantile 2 might be harmful, Quantile 3 might be beneficial, and Quantile 4 shows it is beneficial. Each subgroup is represented with graphs showing survival probabilities over time for transfusion and non-transfusion strategies.

GRAPHICAL ABSTRACT | Exploring the Potential of Machine Learning to Guide Personalized Red Blood Cell (RBC) Transfusion: Analyzing the Heterogeneity of RBC Transfusion in Septic Patients with Hemoglobin Levels of 7–9 g/dL.

Introduction

Sepsis is a life-threatening organ dysfunction caused by a dysregulated immune response to infection (Evans, 2018) and is associated with high incidence and mortality rates. Approximately 19 million cases of sepsis are reported globally each year, resulting in five million deaths, and the World Health Organization has therefore made the diagnosis and treatment of sepsis a global priority (World Health Organization, 2017). Blood transfusion is an important adjunctive therapy for patients with sepsis (Evans et al., 2021). Treating sepsis requires enhancing oxygen delivery and reducing tissue hypoxia, and hemoglobin (Hb) plays a vital role in oxygen delivery. Sepsis patients are particularly susceptible to anemia because of hemolysis, hemorrhage, fluid resuscitation, inflammatory responses, and underlying conditions (Qi and Peng, 2021). Yang et al. reported that elderly septic patients with Hb levels below 10 g/dL had a significantly increased risk of mortality (Yang et al., 2023). Studies suggest that transfusion at Hb levels below 7 g/dL reduces mortality risk. However, its effects on sepsis patients with Hb levels of 7–9 g/dL remain controversial (Hébert et al., 1999; Parsons et al., 2011; Perner et al., 2012; Holst et al., 2014; Rosland et al., 2014; Evans et al., 2021).

The recommendation to adopt a restrictive transfusion strategy for sepsis patients, transfusing only when Hb levels fall below 7 g/dL, is supported by limited evidence (Hébert et al., 1999; Holst et al., 2014; Evans et al., 2021). The TRISS trial (Holst et al., 2014) found no significant difference in mortality rates between using 7 g/dL and 9 g/dL as red blood cell transfusion thresholds. These studies suggested that transfusion may not benefit patients with Hb levels of 7–9 g/dL. However, an RBC transfusion threshold of 9 g/dL was found to reduce mortality in cancer patients with sepsis in the TRICOP trial (Bergamin et al., 2017). In a recent retrospective analysis, RBC transfusion at an average Hb level of 8.50 g/dL reduced mortality in sepsis patients with chronic kidney disease (Chen et al., 2024). Together, these findings point to potential heterogeneity in the treatment effects of transfusion for sepsis patients with Hb levels of 7–9 g/dL, so a uniform transfusion threshold of Hb 7 g/dL may not be suitable for all sepsis patients. It is therefore crucial to identify the factors that influence the effectiveness of RBC transfusion in sepsis patients with Hb levels of 7–9 g/dL for effective transfusion management.

The causal forest model, developed by Susan Athey and Stefan Wager, is a machine-learning method used to estimate causal effects. It generates many diverse causal trees through subsampling, estimates individual treatment effects (ITEs) by averaging the predictions of these trees, and can thereby inform personalized treatment strategies (Athey and Imbens, 2016; Wager and Athey, 2018; Inoue et al., 2023). Using retrospective data from the JSEPTIC-DIC study, Osawa et al. showed that the causal forest model could identify patients who benefit from polymyxin B hemoperfusion therapy (Osawa et al., 2023). Elsewhere, Inoue et al. applied propensity score matching to observational data and reported that the causal forest model can guide personalized statin therapy, potentially reducing cardiovascular disease risk (Inoue et al., 2023). However, there is insufficient evidence on whether the causal forest model can identify sepsis patients with Hb levels of 7–9 g/dL who would benefit from RBC transfusion.

In this study, a causal forest model was constructed using data from septic patients with Hb levels of 7–9 g/dL in the Medical Information Mart for Intensive Care (MIMIC)-IV database. The model was validated using data from the MIMIC-III database.

Methods

Data source

Data were extracted from the MIMIC-IV and MIMIC-III databases maintained by the Massachusetts Institute of Technology. The data were extracted using Navicat and Structured Query Language (SQL) and primarily originated from the intensive care unit of Beth Israel Deaconess Medical Center. Further data processing was conducted in RStudio (version 2024.12.1, Build 563) using R version 4.4.2. The researchers had permission (License Number: 36463743) to use the data in the MIMIC databases, which are open-access, and adhered to the data use agreement.

Patient selection

Sepsis patients with a minimum Hb level of 7–9 g/dL admitted to the intensive care unit (ICU) for the first time were enrolled in the study. Sepsis was defined as an infection with a Sequential Organ Failure Assessment (SOFA) score increase of 2 or more from baseline (Levy et al., 2018). Patients were identified using the sepsis extraction code published by the official MIMIC-IV database team on GitHub (MIT-LCP, 2020), which is similar to the infection extraction methods of Seymour et al. (2016) and Hu et al. (2023). According to Nilsson et al., patients who receive ≤670 mL of red blood cells within the first 5 days of ICU admission are deemed to have received low-dose RBC transfusions (Nilsson et al., 2020). Patients with missing Hb data, as well as those with trauma, acute myocardial infarction, or gastrointestinal bleeding, and those who received >670 mL of RBC transfusions, were excluded from the study because of the significant differences in the pathological characteristics of sepsis patients with such complications (Perner et al., 2012; Dupuis et al., 2017; Nilsson et al., 2020). The detailed patient selection process is outlined in the Supplementary Material (Supplementary Appendix SA, SB).

Data extraction

The characteristics of sepsis patients were extracted from the MIMIC-IV and MIMIC-III databases, including baseline information such as age, gender, height, weight, Charlson Comorbidity Index, chronic obstructive pulmonary disease (COPD), chronic heart failure (CHF), and chronic kidney disease (CKD). Additional parameters collected within 24 h of ICU admission included pH (Pondus Hydrogenii), arterial oxygen pressure (PaO₂), arterial carbon dioxide tension (PaCO₂), peripheral capillary oxygen saturation (SpO₂), heart rate (HR), respiratory rate (RR), mean arterial pressure (MAP), white blood cell count (WBC), blood urea nitrogen (BUN), creatinine (Cr), prothrombin time (PT), activated partial thromboplastin time (APTT), lactate (Lac), blood sodium (Na), blood potassium (K), blood glucose (G), body temperature (T), Simplified Acute Physiology Score (SAPS)-II, SAPS-III, Systemic Inflammatory Response Syndrome (SIRS), and Sequential Organ Failure Assessment (SOFA) score, among others. The patients’ Hb levels from the first to the fifth day after ICU admission were also recorded. The specific indicators are detailed in Supplementary Appendix SC.

Statistical analysis

Statistical analyses were conducted using RStudio in R version 4.4.2. Multiple imputation was performed for variables with less than 40% missing values in the MIMIC-IV database (Hu et al., 2023) using the “mice” (v3.18.0) package in R. Detailed imputation methods are outlined in the Supplementary Material (Supplementary Appendix SD). Sepsis patients who received low-dose RBC transfusions formed the RBC transfusion group, while those who did not formed the non-transfusion group. Normally distributed continuous variables are presented as means ± standard deviation (X ± s) and were compared using Student’s t-test. Non-normally distributed continuous variables are presented as median (interquartile range) [M(QL, QU)] and were compared using the Mann-Whitney U test. Categorical variables are expressed as rates and were compared using the χ² test. The statistical significance level was set at P < 0.05.

Incorporating all variables into the model would increase the risk of overfitting. As such, LASSO regression and the Boruta algorithm were used to select variables associated with 90-day mortality as covariates for propensity score matching (PSM). These steps used the “glmnet” (v4.1-9), “Boruta” (v8.0.0), and “MatchIt” (v4.7.2) packages, respectively. The PSM analysis was conducted using the intersection of the variables selected by LASSO regression and the Boruta algorithm. PSM was performed using nearest neighbor matching without replacement. The matching ratio between the RBC transfusion and non-transfusion groups was 1:1, and the caliper value was 0.10.
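The matching step described above can be illustrated with a minimal sketch of greedy 1:1 nearest neighbor matching without replacement. This is not the MatchIt implementation used in the study: the function name, the toy propensity scores, and the choice to apply the 0.10 caliper on the raw propensity-score scale are all assumptions made for illustration (the text does not state whether the caliper is in raw or standardized units).

```python
def match_nearest_neighbor(treated, control, caliper=0.10):
    """Greedy 1:1 nearest neighbor matching without replacement.

    treated, control: dicts mapping a patient id to a propensity score.
    caliper: maximum allowed absolute score difference (assumed here to be
             on the raw propensity-score scale).
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(control)  # controls that have not been used yet
    pairs = []
    for t_id, t_score in treated.items():
        if not available:
            break
        # Closest remaining control by absolute score distance.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # "without replacement"
        # Otherwise the treated patient stays unmatched and is dropped.
    return pairs


# Hypothetical scores, not study data.
treated = {"t1": 0.30, "t2": 0.62, "t3": 0.95}
control = {"c1": 0.28, "c2": 0.35, "c3": 0.60, "c4": 0.70}
pairs = match_nearest_neighbor(treated, control)
print(pairs)
```

In this toy example the third treated patient has no control within the caliper and is left unmatched, which is why matched cohorts are usually smaller than the original treated group.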

The “lcmm” package (v2.2.1) was used to fit latent class mixed models to Hb levels during the first 3 days after ICU admission. Patients were assigned to latent classes (trajectory groups) based on their maximum posterior probability. Each class represents a different longitudinal pattern characterized by the estimated trajectory parameters. The Hb trajectory class was included as a covariate in the causal forest model.
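The class-assignment rule above, together with the average posterior probability of assignment (APPA) used to judge how cleanly patients separate into classes, can be sketched as follows. The posterior probabilities are made-up values and the function names are illustrative; the study obtained its posteriors from fitted latent class mixed models.

```python
def assign_classes(posteriors):
    """Assign each patient to the class with the highest posterior probability."""
    return [max(range(len(p)), key=lambda k: p[k]) for p in posteriors]


def appa(posteriors, assignments, n_classes):
    """Average posterior probability of assignment, one value per class."""
    result = []
    for k in range(n_classes):
        probs = [p[k] for p, a in zip(posteriors, assignments) if a == k]
        result.append(sum(probs) / len(probs) if probs else float("nan"))
    return result


# Hypothetical posteriors for four patients and three trajectory classes.
posteriors = [
    [0.90, 0.08, 0.02],
    [0.10, 0.85, 0.05],
    [0.20, 0.30, 0.50],
    [0.70, 0.20, 0.10],
]
classes = assign_classes(posteriors)
class_appa = appa(posteriors, classes, n_classes=3)
print(classes)
print(class_appa)
```

An APPA close to 1 for a class means that the patients assigned to it were assigned with little ambiguity.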

MIMIC-IV data were used as the derivation cohort, while MIMIC-III data were used as the validation cohort. The “grf” (v2.40) package in R was employed to construct a causal forest model to predict the treatment effect of RBC transfusion on 90-day survival. An ensemble of 5,000 causal trees was built using the honest splitting method to minimize model overfitting. In this approach, each tree drew a random 50% subsample of the training set (without replacement), which was then split in half: the first half was used to construct the tree structure, while the second half was used for prediction. The remaining hyperparameters were tuned using an automatic hyperparameter optimization procedure. Model calibration was assessed using the best linear fit of the observed association regressed on the predicted association (Inoue et al., 2023). Feature importance was displayed using bar charts.

Currently, there is no standardized method for variable selection in causal forest models. We tested different numbers of top-ranked features, from the top five to the top twenty, as well as all available variables, and found that both too many and too few variables reduced predictive performance and the ability to detect treatment heterogeneity. We therefore rebuilt the final model using the ten top-ranked features by importance as covariates and assessed the accuracy of its best linear fit using out-of-bag data. The individual treatment effect (ITE) for patients in the MIMIC-III validation set was then predicted, and the targeting operator characteristic (TOC) curve was plotted. The area under the TOC curve (AUTOC) was calculated to evaluate the model’s discriminative ability.

Patients from the MIMIC-IV derivation cohort and the MIMIC-III validation cohort were classified into four subgroups, Quantile1 to Quantile4, based on ITE values, with ITE values increasing from Quantile1 to Quantile4. Kaplan-Meier curves for 90-day survival were plotted to compare the RBC transfusion and non-transfusion groups within each of the four subgroups, and differences were assessed with the log-rank test. Figure 1 is a flow chart of the statistical analysis. Supplementary Appendix SE shows the R code for the statistical analysis.
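The honest splitting scheme described earlier in this section can be made concrete with a short sketch of how one tree's data might be partitioned. It only illustrates the index bookkeeping and is not the grf implementation. The function name and seed are hypothetical, and the two fractions are set to the 50% values stated in the text.

```python
import random


def honest_subsample(n, sample_fraction=0.5, honesty_fraction=0.5, seed=0):
    """Draw one tree's subsample and split it for honest estimation.

    Returns (split_idx, estimate_idx), two disjoint lists of row indices.
    The first is used only to choose the tree's splits; the second is used
    only to estimate effects in the resulting leaves.
    """
    rng = random.Random(seed)
    # Sampling without replacement from the n training rows.
    subsample = rng.sample(range(n), int(n * sample_fraction))
    cut = int(len(subsample) * honesty_fraction)
    return subsample[:cut], subsample[cut:]


# 1,652 is the size of the matched MIMIC-IV derivation cohort.
split_idx, estimate_idx = honest_subsample(n=1652)
print(len(split_idx), len(estimate_idx))  # 413 413
```

Because no observation is used both to place a split and to estimate the effect within a leaf, each tree's estimates are less prone to overfitting the noise that shaped its structure.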

Figure 1

Flowchart depicting the study methodology. Process starts with patient selection based on Sepsis-3 criteria, excluding certain conditions. Features are extracted within 24 hours in ICU, focusing on patient information, vital signs, organ function, blood gas analysis, and treatment information. LASSO regression and Boruta algorithm select mortality-related features. Propensity score matching (PSM) is employed with nearest neighbor matching. Causal forest model development follows, using derivation and validation cohorts from MIMIC-IV and MIMIC-III databases. The model identifies significant features, employs TOC curve and Kaplan-Meier curves for results.

Figure 1. A flow chart of the statistical analysis process.

Results

Baseline characteristics of sepsis patients

A total of 28,087 and 12,512 patients met the Sepsis-3 criteria in the MIMIC-IV and MIMIC-III databases, respectively. After applying the exclusion criteria, 6,182 patients in the MIMIC-IV and 2,340 patients in the MIMIC-III databases were included, with 857 (13.86%) and 458 (19.57%) belonging to the RBC transfusion group, respectively. Figure 2 illustrates the patient screening process. The baseline characteristics of the two groups of patients are detailed in Supplementary Table S1.

Figure 2

Flow chart comparing patient selection from MIMIC-IV 3.0 and MIMIC-III 1.4 databases. In (A), from 94,458 ICU admissions, 65,366 were first-time admissions; 28,087 met Sepsis-3 criteria. After exclusions, 6,182 were analyzed. In (B), from 61,532 admissions, 46,428 were first-time; 12,512 met Sepsis-3 criteria. After exclusions, 2,340 were analyzed. Exclusions included repeat admissions, non-Sepsis-3 patients, age restrictions, and various medical conditions or treatments.

Figure 2. Flowcharts for patient selection from the MIMIC-IV dataset (A) and the MIMIC-III dataset (B).

Feature selection and propensity score matching analysis

LASSO regression and the Boruta algorithm were used to select variables associated with 90-day mortality as covariates for PSM analysis. They identified 60 and 54 mortality-associated variables, respectively, and the intersection of the two sets yielded 48 variables. The variable selection process is detailed in Supplementary Appendix SE.

PSM analysis included 48 features as covariates. There were 1,652 matched patients in the MIMIC-IV database: 826 in the RBC transfusion group and 826 in the non-transfusion group. The MIMIC-III database had 868 matched patients: 434 in the RBC transfusion group and 434 in the non-transfusion group. In the MIMIC-IV database, the 24-h fluid intake remained significantly higher in the transfusion group than in the non-transfusion group (p < 0.001). After matching, no statistically significant differences were observed in the remaining baseline characteristics between the RBC transfusion and non-transfusion groups in either database (p > 0.05) (Table 1).

Table 1

Table 1. Baseline characteristics of patients in the RBC transfusion and non-transfusion groups after propensity score matching.

The Kaplan-Meier curves and log-rank test in both the MIMIC-IV and MIMIC-III datasets demonstrated no statistically significant differences in the 90-day survival rates between the RBC transfusion and non-transfusion groups after matching (p > 0.05) (Supplementary Figure S1). Similarly, there was no statistically significant difference in Hb levels between the transfusion and non-transfusion groups upon ICU admission in the MIMIC-IV and MIMIC-III datasets (p > 0.05). Hb levels were significantly higher in the transfusion group compared to the non-transfusion group (p < 0.05) on the third, fourth, and fifth days after ICU admission (Supplementary Figure S2; Supplementary Table S1). The trajectories of Hb were classified into three categories after matching: decreasing, stable, and increasing class (Supplementary Figures S3, S4). The relative entropy was 0.759, while the Bayesian Information Criterion (BIC) was 18,688. The Average Posterior Probability of Assignment (APPA) values of the decreasing, stable, and increasing classes were 0.890, 0.904, and 0.808, respectively.

Causal forest models and survival analysis

The propensity-matched data from the MIMIC-IV and MIMIC-III databases served as the derivation and validation cohorts, respectively. All baseline characteristics and Hb trajectories were used as covariates. SAPS-II, minimum WBC (WBC_min), maximum BUN (BUN_max), minimum Na (Na_min), mean HR (HR_mean), minimum PLT (PLT_min), minimum RDW (RDW_min), Age, maximum RR (RR_max), and minimum HCO₃ (HCO₃_min) within the first 24 h of ICU admission were the top ten features ranked by importance and were used to develop the final model. The importance of each feature in the final model is illustrated in Figure 3. The accuracy of the final causal forest model was validated using the best linear fit on out-of-bag data. The mean forest prediction coefficient was 1.00 with a standard error of 0.57, indicating good model accuracy, while the differential forest prediction coefficient was 1.64 with a standard error of 0.48, demonstrating the model’s ability to effectively identify differences in the impact of transfusion on survival rates among individuals. The ranking of individual treatment effects for patients in the MIMIC-IV and MIMIC-III datasets showed heterogeneity in treatment effects among the patients (Figure 4). Notably, the average treatment effect decreased as the proportion of patients treated increased (Supplementary Figure S5). The area under the targeting operator characteristic curve (AUTOC) was 0.08 ± 0.02, indicating that the model effectively identified the treatment effect of RBC transfusion on 90-day survival rates.

The partial dependence plots revealed nonlinear relationships between the 10 included features and the treatment effect of RBC transfusion (Figure 5). Patients with higher SAPS-II, WBC_min, BUN_max, Na_min, and RDW_min values generally experienced increased survival rates from RBC transfusion. Conversely, higher HR_mean, PLT_min, Age, and HCO₃_min values were associated with decreased survival rates following RBC transfusion. RR_max and the treatment effect exhibited an inverted U-shaped relationship, in which the survival benefit initially increased and then decreased as RR_max increased.

The ITE values for 90-day survival rates in the MIMIC-IV derivation cohort were: Quantile1: −1.1% (−2.4%, −0.3%), Quantile2: +1.0% (+0.1%, +1.4%), Quantile3: +2.6% (+2.1%, +3.3%), and Quantile4: +8.3% (+5.8%, +12.5%) (Supplementary Figure S6). The ITE values for 90-day survival rates in the MIMIC-III validation cohort were: Quantile1: −5.4% (−8.0%, −3.9%), Quantile2: −2.1% (−2.6%, −1.7%), Quantile3: −0.5% (−0.1%, +0.1%), and Quantile4: +3.6% (+2.0%, +6.6%) (Supplementary Figure S6). The Kaplan-Meier curves and log-rank test indicated that RBC transfusion decreased the patient survival rates in the Quantile1 subgroup (p < 0.01) but increased the survival rates in the Quantile4 subgroup (p < 0.01) in the MIMIC-IV derivation cohort (Supplementary Figure S7). RBC transfusion decreased the survival rates in the Quantile1 (p < 0.001) and Quantile2 subgroups (p = 0.011) but increased the survival rates in the Quantile4 subgroup (p < 0.001) in the MIMIC-III validation cohort (Figure 6). In the MIMIC-IV database, the Quantile4 subgroup exhibited the highest values of SAPS II, WBC_min, BUN_max, PLT_min, RDW_min, age, and RR_max (all p < 0.05), as well as the lowest value of HCO₃_min (p < 0.05) (Supplementary Table S3). Similarly, in the MIMIC-III database, the Quantile4 subgroup demonstrated the highest SAPS II, WBC_min, BUN_max, PLT_min, RDW_min, age, and RR_max values (all p < 0.05), and the lowest HCO₃_min value (p < 0.05) (Supplementary Table S3).
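The stratification of patients into Quantile1 to Quantile4 by predicted ITE can be sketched as below. The ITE values are invented for illustration and are not study data, and the exact quartile convention used in the study is not stated. The sketch relies on Python's `statistics.quantiles` with its default method, which is an assumption.

```python
import statistics


def stratify_by_ite(ite):
    """Label each predicted ITE with its quartile group (1 = lowest, 4 = highest)."""
    q1, q2, q3 = statistics.quantiles(ite, n=4)  # three quartile cut points

    def label(x):
        if x <= q1:
            return 1
        if x <= q2:
            return 2
        if x <= q3:
            return 3
        return 4

    return [label(x) for x in ite]


# Hypothetical predicted effects on 90-day survival (as proportions).
ite = [-0.06, -0.05, -0.03, -0.02, -0.005, 0.0, 0.03, 0.05]
groups = stratify_by_ite(ite)
for g in (1, 2, 3, 4):
    members = [x for x, lab in zip(ite, groups) if lab == g]
    print(f"Quantile{g}: n={len(members)}, median ITE={statistics.median(members):+.3f}")
```

Survival in the transfusion and non-transfusion arms would then be compared within each group, as done with the Kaplan-Meier curves and log-rank tests reported above.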

Figure 3

Horizontal bar chart depicting feature importance in descending order. SAPS_II has the highest importance, followed by WBC_min, BUN_max, Na_min, HR_mean, PLT_min, RDW_min, Age, RR_max, and HCO3_min. Bars are orange, representing the feature importance values.

Figure 3. Bar chart illustrating the importance of features in the final causal forest model. The features are the patient’s parameters measured within the first 24 h of admission to the ICU. ICU, Intensive Care Unit; SAPS-II, Simplified Acute Physiology Score II; WBC_min, Minimum white blood cell count within 24 h in ICU; BUN_max, Maximum blood urea nitrogen within 24 h in ICU; Na_min, Minimum sodium within 24 h in ICU; HR_mean, Mean heart rate within 24 h in ICU; PLT_min, Minimum platelet count within 24 h in ICU; RDW_min, Minimum red cell distribution width within 24 h in ICU; RR_max, Maximum respiratory rate within 24 h in ICU; HCO₃_min, Minimum bicarbonate within 24 h in ICU.

Figure 4

Two graphs compare the individual treatment effects on ninety-day survival. Chart A shows a range from negative twenty to forty percent, while Chart B ranges from negative thirty to thirty percent. Both charts display ranked individual treatment effects, with a purple line and shaded area indicating variability.

Figure 4. Distribution of the estimated individual treatment effects (ITEs) in the MIMIC-IV derivation cohort (A) and the MIMIC-III validation cohort (B). The solid line represents the mean of the ITEs. The shaded bands represent the 95% confidence interval of the ITEs.

Figure 5

Line graphs display the estimated treatment effect across various medical parameters: SAPS II, WBC_min, BUN_max, Na_min, HR_mean, PLT_min, RDW_min, Age, RR_max, and HCO3_min. Each graph shows different slopes and fluctuations, indicating variations in treatment effect with changes in these parameters.

Figure 5. Partial dependence plots (PDPs) for features of the final causal forest model. The PDPs demonstrate how one feature affects the predicted outcome of the final causal forest model averaged across the distribution of the other features.
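The averaging behind a partial dependence plot can be sketched as follows: the feature of interest is fixed at each grid value for every patient, the model is asked for a prediction, and the predictions are averaged. The `toy_effect` function, its coefficients, and the two patient rows are hypothetical stand-ins and do not come from the fitted causal forest.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is fixed at each grid value.

    predict: function mapping a dict of covariates to a predicted effect.
    rows: list of dicts (one per patient) supplying the other covariates.
    """
    curve = []
    for value in grid:
        # Override only the feature of interest; keep other covariates as observed.
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve


def toy_effect(row):
    """Hypothetical effect model, used only to make the sketch runnable."""
    return 0.001 * row["saps_ii"] - 0.0005 * row["age"]


rows = [{"saps_ii": 30, "age": 60}, {"saps_ii": 50, "age": 80}]
curve = partial_dependence(toy_effect, rows, "saps_ii", [20, 40, 60])
print(curve)
```

Each point on the resulting curve is a population average, so the plot describes how the predicted effect changes with one feature while the remaining covariates keep their observed distribution.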

Figure 6

Four Kaplan-Meier survival plots compare non-transfusion and RBC transfusion groups over 90 days. Plot A shows significantly lower survival in the transfusion group (p < 0.001), Plot B shows lower survival in the transfusion group (p = 0.011), Plot C shows no significant difference (p = 0.320), and Plot D shows significantly higher survival in the transfusion group (p < 0.001). Each graph displays survival probability against time with a risk table below.

Figure 6. The Kaplan-Meier curves for patients in the Quantile1 to Quantile4 subgroups of the MIMIC-III validation cohort. (A) Quantile1; (B) Quantile2; (C) Quantile3; (D) Quantile4. RBC, Red Blood Cell.

Discussion

Propensity score matching analysis revealed no significant differences in mortality risk between the RBC transfusion group and the non-RBC transfusion group of sepsis patients with Hb levels of 7–9 g/dL. Notably, the causal forest model suggested that RBC transfusion could reduce the mortality risk of patients with higher ITEs. The treatment efficacy was influenced by several parameters, including SAPS-II, WBC_min, BUN_max, Na_min, HR_mean, PLT_min, RDW_min, Age, RR_max, and HCO₃_min.

Microcirculatory dysfunction in sepsis patients results in inadequate tissue perfusion, leading to tissue and organ ischemia and hypoxia, subsequently impairing organ function. Anemia is a common complication in sepsis patients that decreases oxygen delivery, worsening organ failure (Jansma et al., 2015; Chan et al., 2017). The mortality risk in elderly sepsis patients progressively increases with a decrease in Hb levels (Yang et al., 2023). RBC transfusion enhances oxygen delivery, thereby improving tissue oxygenation in sepsis patients (Rivers et al., 2001). In contrast, Marik et al. demonstrated that increasing Hb levels does not automatically translate into improved tissue oxygenation. Their study showed that even with sustained high Hb concentrations (119 ± 9.0 g/L), patients did not experience a meaningful enhancement in oxygen supply to tissues. Notably, there was a decrease in gastric intramucosal pH (Marik and Sibbald, 1993). This finding is consistent with that of Fernandes et al., who reported that RBC transfusion did not significantly increase global or regional oxygen utilization (Fernandes et al., 2001). These reports collectively suggest that administering RBCs to patients with high Hb concentrations confers no significant therapeutic advantage or improvement in clinical outcomes.

The 2016 and 2021 Surviving Sepsis Campaign guidelines, based on the TRISS and TRICC studies, recommend a restrictive transfusion strategy for managing RBC transfusions in sepsis patients (Hébert et al., 1999; Holst et al., 2014; Rhodes et al., 2017). In both studies, mortality did not differ significantly between patients transfused at a threshold of 7 g/dL and those transfused at 9 g/dL. However, there was potentially significant heterogeneity in response across different patient subpopulations (Hébert et al., 1999; Holst et al., 2014; Rhodes et al., 2017). A retrospective study by Nilsson et al. reported that low-dose transfusion in sepsis patients potentially increases the mortality risk (Nilsson et al., 2020). Nonetheless, the patients in Nilsson’s study had a high Hb level of 9.5 g/dL, and the study did not determine whether blood transfusion benefited sepsis patients with Hb levels of 7–9 g/dL (Nilsson et al., 2020). A transfusion threshold of 9 g/dL was associated with reduced mortality risk in a trial of liberal versus restrictive transfusion strategies in critically ill oncologic patients (Bergamin et al., 2017). Chen et al. reported that blood transfusion potentially reduces the mortality risk in sepsis patients with CKD (Chen et al., 2024). These reports suggest that sepsis patients with Hb levels of 7–9 g/dL may benefit from RBC transfusion depending on the severity of their condition and underlying diseases.

Causal forest is a tree-based ensemble method designed to estimate heterogeneous treatment effects. In contrast to T-learner and S-learner methods, which require fitting separate or combined models for potential outcomes, causal forests are designed to directly estimate individual treatment effects. They inherently accommodate complex nonlinear interactions and high-order feature dependencies without the need for strong parametric assumptions. This makes causal forests particularly suitable for settings with complex and unknown effect heterogeneity (Athey and Imbens, 2016; Tibshirani et al., 2024; Xie et al., 2024).
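For contrast with the causal forest, the T-learner mentioned above can be sketched in a few lines: one outcome model is fitted per treatment arm, and the estimated effect is the difference between the two predictions. This study did not fit a T-learner. The single covariate, the linear base learner, and the toy data are assumptions chosen only to keep the example self-contained.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def t_learner(x, treated, y):
    """Fit one outcome model per arm; the effect is the difference in predictions."""
    a1, b1 = fit_line([xi for xi, t in zip(x, treated) if t],
                      [yi for yi, t in zip(y, treated) if t])
    a0, b0 = fit_line([xi for xi, t in zip(x, treated) if not t],
                      [yi for yi, t in zip(y, treated) if not t])
    return lambda x_new: (a1 + b1 * x_new) - (a0 + b0 * x_new)


# Toy data: the outcome rises with x under treatment and is flat under control.
x = [1, 2, 3, 4, 1, 2, 3, 4]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
y = [0.2, 0.4, 0.6, 0.8, 0.5, 0.5, 0.5, 0.5]
effect = t_learner(x, treated, y)
print(effect(1), effect(4))
```

The estimated effect here inherits whatever form the two outcome models impose, whereas a causal forest targets the effect directly and does not require choosing a parametric outcome model.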

The causal forest model uncovered high heterogeneity in the ITEs of RBC transfusion among sepsis patients with Hb levels of 7–9 g/dL. The treatment effect of transfusion on 90-day survival was affected by factors such as SAPS-II, WBC, BUN, Na, HR, PLT, RDW, age, RR, and HCO₃. Patients in the Quantile4 subgroup, which benefited from transfusion, showed the highest SAPS II, WBC_min, BUN_max, PLT_min, RDW_min, age, and RR_max values and the lowest HCO₃_min value. The partial dependence plot indicated that patients with high SAPS-II scores and elevated WBC, BUN, Na, and RDW levels benefited from RBC transfusions and exhibited increased survival rates. These patients represented a subgroup likely to benefit from transfusion, as they generally presented with more severe clinical conditions and a higher risk of mortality. Elevated SAPS II scores, along with increased levels of WBC, BUN, sodium, and RDW, are indicative of inadequate tissue perfusion and severe ischemia-hypoxia. Red blood cell transfusion may enhance oxygen delivery, thereby alleviating tissue ischemia and hypoxia and potentially improving clinical outcomes (Poncet et al., 2017; Cavalcante Dos Santos et al., 2020).

Although the Quantile4 subgroup had higher PLT, the partial dependence plot showed that excessively elevated platelet levels correlated with an increased risk of mortality. This discrepancy arises because the subgroup comparison is univariate, whereas the partial dependence plot describes the association between PLT and mortality after adjusting for the other covariates and therefore provides a more nuanced assessment of that relationship. In addition, patients with higher PLT counts have a higher risk of thrombosis, and RBC transfusion can exacerbate PLT activation, worsening microcirculatory thrombosis (Czubak-Prowizor et al., 2020).

In septic patients, increased tissue ischemia and hypoxia can lead to the accumulation of acidic metabolites, resulting in decreased HCO₃⁻ levels. Since RBC transfusion enhances oxygen-carrying capacity, its benefits are more likely to be observed in patients with impaired tissue oxygenation. In contrast, patients with normal HCO₃⁻ levels, who are less likely to experience significant tissue ischemia or hypoxia, may derive limited benefit from transfusion. In such cases, the potential risks of transfusion, including fluid overload, transfusion-related inflammation, acute respiratory distress syndrome (ARDS), and acute kidney injury (AKI), may outweigh any marginal benefits, potentially leading to net harm.

RR and transfusion benefit showed an inverted U-shaped relationship: the survival benefit of transfusion initially increased and then decreased as RR rose. Patients with excessively high RR often have complications such as volume overload and severe ARDS, and RBC transfusion might exacerbate these conditions, increasing mortality risk.

The TOC curves and AUTOC values observed in this study suggest meaningful individual variability in the benefit of low-dose RBC transfusion among septic patients with Hb levels of 7–9 g/dL. Patients with higher predicted benefit scores experienced a greater reduction in mortality risk following transfusion. Although the causal forest model may help guide personalized transfusion strategies, randomized controlled trials are needed to confirm these findings.
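
The TOC curve and AUTOC can be illustrated with a short Python sketch on synthetic data. It follows the usual definition: TOC(q) is the average treatment effect among the fraction q of patients with the highest predicted benefit, minus the overall average treatment effect, and AUTOC is the area under that curve. This is a simplified illustration and not the grf routine used in this study: subgroup effects are estimated here with inverse-probability-weighted scores under an assumed 1:1 randomization. The covariate `severity`, the effect size, the survival probabilities, and the sample size are all hypothetical.

```python
import numpy as np

def toc_curve(priority, scores, fractions):
    """TOC(q): mean effect score in the top-q fraction by priority, minus the overall mean."""
    order = np.argsort(-np.asarray(priority, dtype=float))  # highest predicted benefit first
    sorted_scores = np.asarray(scores, dtype=float)[order]
    overall = sorted_scores.mean()
    n = len(sorted_scores)
    toc = []
    for q in fractions:
        k = min(n, max(1, int(np.ceil(q * n))))
        toc.append(sorted_scores[:k].mean() - overall)
    return np.array(toc)

def autoc(priority, scores, n_grid=100):
    """Area under the TOC curve, approximated on an evenly spaced grid of q."""
    fractions = np.arange(1, n_grid + 1) / n_grid
    return float(toc_curve(priority, scores, fractions).mean())

# Synthetic randomized cohort (all values are assumptions for illustration).
rng = np.random.default_rng(0)
n = 40_000
severity = rng.uniform(0.0, 1.0, n)
true_effect = 0.4 * (severity - 0.5)           # benefit grows with severity
treated = rng.integers(0, 2, n)                # 1:1 randomization, propensity 0.5
p_survive = 0.6 - 0.2 * severity + treated * true_effect
survived = (rng.random(n) < p_survive).astype(float)

# Inverse-probability-weighted scores: their mean in any subgroup
# estimates that subgroup's average treatment effect.
ipw_scores = survived * (treated / 0.5 - (1 - treated) / 0.5)

autoc_informative = autoc(severity, ipw_scores)    # ranking that tracks true benefit
autoc_random = autoc(rng.random(n), ipw_scores)    # ranking unrelated to benefit

print("AUTOC, informative ranking:", autoc_informative)
print("AUTOC, random ranking:", autoc_random)
```

A ranking that tracks the true benefit should give a clearly positive AUTOC, whereas a ranking unrelated to benefit should give a value close to zero. This is the sense in which a positive AUTOC suggests that the predicted benefit scores capture real heterogeneity.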

Limitations

This study has two main limitations. First, it used retrospective data, so the findings should be validated in prospective randomized controlled trials. Second, the model did not include C-reactive protein (CRP), procalcitonin (PCT), or D-dimer because of the high proportion of missing values. The absence of these inflammatory and thrombosis-related markers could have influenced the estimated effects of transfusion.

Conclusion

The effect of RBC transfusion in sepsis patients with Hb levels of 7–9 g/dL is heterogeneous, and transfusion may reduce mortality in patients with a high ITE. The treatment effect of RBC transfusion on 90-day survival is influenced by multiple factors, including SAPS-II, WBC, BUN, Na, HR, PLT, RDW, age, RR, and HCO₃⁻. Although the causal forest model may help guide personalized transfusion, randomized controlled trials are needed to validate these results.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

Ethical approval was not required for the study involving humans in accordance with the local legislation and institutional requirements. Written informed consent to participate in this study was not required from the participants or the participants’ legal guardians/next of kin in accordance with the national legislation and the institutional requirements.

Author contributions

PY: Formal Analysis, Data curation, Supervision, Writing – original draft, Writing – review and editing, Visualization. JY: Writing – original draft, Writing – review and editing, Methodology, Supervision. JH: Methodology, Investigation, Writing – original draft, Supervision, Conceptualization. LY: Methodology, Writing – original draft, Visualization, Software, Project administration, Writing – review and editing. XG: Conceptualization, Supervision, Methodology, Project administration, Resources, Writing – original draft. XD: Funding acquisition, Writing – original draft, Conceptualization, Methodology, Visualization. QC: Investigation, Methodology, Validation, Supervision, Funding acquisition, Writing – original draft, Conceptualization.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This study was supported by the Yangzhou Municipal Health Commission Research Project (2023-2-27), and the Jiangdu People’s Hospital Affiliated to Yangzhou University (YNKT202208).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphar.2025.1615618/full#supplementary-material

Abbreviations

RBC, Red Blood Cell; ITE, Individual Treatment Effect; ATE, Average Treatment Effect; Hb, Hemoglobin; MIMIC, Medical Information Mart for Intensive Care; SMD, Standardized Mean Difference; HR, Heart Rate; MBP, Mean Blood Pressure; RR, Respiratory Rate; T, Temperature; SPO2, Oxygen Saturation; G, Glucose; WBC, White Blood Cell; BUN, Blood Urea Nitrogen; Na, Sodium; K, Potassium; PT, Prothrombin Time; APTT, Activated Partial Thromboplastin Time; VDI_24hmax, Maximum Norepinephrine Equivalent dose within 24 h in ICU; Lac, Lactate; PO2, Arterial Oxygen Partial Pressure; Fluid_6h_sum, Volume of intravenous infusion within 6 h of ICU admission; Fluid_24h_sum, Volume of intravenous infusion within 24 h of ICU admission; SAPS-II, Simplified Acute Physiology Score II; SAPS-III, Simplified Acute Physiology Score III; SIRS, Systemic Inflammatory Response Syndrome; Charlson, Charlson comorbidity index; SOFA, Sequential Organ Failure Assessment; CRD, Chronic Renal Disease; Venti, Mechanical Ventilation; TOC, Targeting Operator Characteristic; AUTOC, Area Under the Targeting Operator Characteristic.

References

Athey, S., and Imbens, G. (2016). Recursive partitioning for heterogeneous causal effects. Proc. Natl. Acad. Sci. U S A 113, 7353–7360. doi:10.1073/pnas.1510489113

Bergamin, F. S., Almeida, J. P., Landoni, G., Galas, F., Fukushima, J. T., Fominskiy, E., et al. (2017). Liberal versus restrictive transfusion strategy in critically ill oncologic patients, the transfusion requirements in critically ill oncologic patients randomized controlled trial. Crit. Care Med. 45, 766–773. doi:10.1097/CCM.0000000000002283

Cavalcante Dos Santos, E., Orbegozo, D., Mongkolpun, W., Galfo, V., Nan, W., Gouvêa Bogossian, E., et al. (2020). Systematic review and meta-analysis of effects of transfusion on hemodynamic and oxygenation variables. Crit. Care Med. 48, 241–248. doi:10.1097/CCM.0000000000004115

Chan, Y. L., Han, S. T., Li, C. H., Wu, C. C., and Chen, K. F. (2017). Transfusion of red blood cells to patients with sepsis. Int. J. Mol. Sci. 18, 1946. doi:10.3390/ijms18091946

Chen, L., Lu, H., Lv, C., Ni, H., Yu, R., Zhang, B., et al. (2024). Association between red blood cells transfusion and 28-day mortality rate in septic patients with concomitant chronic kidney disease. Sci. Rep. 14, 23769. doi:10.1038/s41598-024-75643-3

Czubak-Prowizor, K., Rywaniak, J., and Zbikowska, H. M. (2020). Red blood cell supernatant increases activation and agonist-induced reactivity of blood platelets. Thromb. Res. 196, 543–549. doi:10.1016/j.thromres.2020.10.023

Dupuis, C., Sonneville, R., Adrie, C., Gros, A., Darmon, M., Bouadma, L., et al. (2017). Impact of transfusion on patients with sepsis admitted in intensive care unit, a systematic review and meta-analysis. Ann. Intensive Care 7, 5. doi:10.1186/s13613-016-0226-5

Evans, T. (2018). Diagnosis and management of sepsis. Clin. Med. (Lond) 18, 146–149. doi:10.7861/clinmedicine.18-2-146

Evans, L., Rhodes, A., Alhazzani, W., Antonelli, M., Coopersmith, C. M., French, C., et al. (2021). Surviving sepsis campaign, international guidelines for management of sepsis and septic shock 2021. Intensive Care Med. 47, 1181–1247. doi:10.1007/s00134-021-06506-y

Fernandes, C. J., Akamine, N., De Marco, F. V. C., De Souza, J. A. M., Lagudis, S., and Knobel, E. (2001). Red blood cell transfusion does not increase oxygen consumption in critically ill septic patients. Crit. Care 5, 362–367. doi:10.1186/cc1070

Hébert, P. C., Wells, G., Blajchman, M. A., Marshall, J., Martin, C., Pagliarello, G., et al. (1999). A multicenter, randomized, controlled clinical trial of transfusion requirements in critical care. Transfusion Requirements in Critical Care Investigators, Canadian Critical Care Trials Group. N. Engl. J. Med. 340, 409–417. doi:10.1056/NEJM199902113400601

Holst, L. B., Haase, N., Wetterslev, J., Wernerman, J., Guttormsen, A. B., Karlsson, S., et al. (2014). Lower versus higher hemoglobin threshold for transfusion in septic shock. N. Engl. J. Med. 371, 1381–1391. doi:10.1056/NEJMoa1406617

Hu, W., Chen, H., Ma, C., Sun, Q., Yang, M., Wang, H., et al. (2023). Identification of indications for albumin administration in septic patients with liver cirrhosis. Crit. Care 27, 300. doi:10.1186/s13054-023-04587-3

Inoue, K., Seeman, T. E., Horwich, T., Budoff, M. J., and Watson, K. E. (2023). Heterogeneity in the association between the presence of coronary artery calcium and cardiovascular events, a machine-learning approach in the MESA study. Circulation 147, 132–141. doi:10.1161/CIRCULATIONAHA.122.062626

Jansma, G., De Lange, F., Kingma, W. P., Vellinga, N. A., Koopmans, M., Kuiper, M. A., et al. (2015). 'Sepsis-related anemia' is absent at hospital presentation; a retrospective cohort analysis. BMC Anesthesiol. 15, 55. doi:10.1186/s12871-015-0035-7

Levy, M. M., Evans, L. E., and Rhodes, A. (2018). The surviving sepsis campaign bundle, 2018 update. Intensive Care Med. 44, 925–928. doi:10.1007/s00134-018-5085-0

Marik, P. E., and Sibbald, W. J. (1993). Effect of stored-blood transfusion on oxygen delivery in patients with sepsis. JAMA 269, 3024–3029. doi:10.1001/jama.1993.03500230106037

MIT-LCP (2020). MIMIC-IV concepts. GitHub. Available online at: https://github.com/MIT-LCP/mimic-iv/blob/master/concepts/sepsis/sepsis3.sql (Accessed July 18, 2025).

Nilsson, C. U., Bentzer, P., Andersson, L. E., Björkman, S. A., Hansson, F. P., and Kander, T. (2020). Mortality and morbidity of low-grade red blood cell transfusions in septic patients, a propensity score-matched observational study of a liberal transfusion strategy. Ann. Intensive Care 10, 111. doi:10.1186/s13613-020-00727-y

Osawa, I., Goto, T., Kudo, D., Hayakawa, M., Yamakawa, K., Kushimoto, S., et al. (2023). Targeted therapy using polymyxin B hemadsorption in patients with sepsis, a post-hoc analysis of the JSEPTIC-DIC study and the EUPHRATES trial. Crit. Care 27, 245. doi:10.1186/s13054-023-04533-3

Parsons, E. C., Hough, C. L., Seymour, C. W., Cooke, C. R., Rubenfeld, G. D., Watkins, T. R., et al. (2011). Red blood cell transfusion and outcomes in patients with acute lung injury, sepsis and shock. Crit. Care 15, R221. doi:10.1186/cc10458

Perner, A., Smith, S. H., Carlsen, S., and Holst, L. B. (2012). Red blood cell transfusion during septic shock in the ICU. Acta Anaesthesiol. Scand. 56, 718–723. doi:10.1111/j.1399-6576.2012.02666.x

Poncet, A., Perneger, T. V., Merlani, P., Capuzzo, M., and Combescure, C. (2017). Determinants of the calibration of SAPS II and SAPS 3 mortality scores in intensive care, a European multicenter study. Crit. Care 21, 85. doi:10.1186/s13054-017-1673-6

Qi, D., and Peng, M. (2021). Early hemoglobin status as a predictor of long-term mortality for sepsis patients in intensive care units. Shock 55, 215–223. doi:10.1097/SHK.0000000000001612

Rhodes, A., Evans, L. E., Alhazzani, W., Levy, M. M., Antonelli, M., Ferrer, R., et al. (2017). Surviving sepsis campaign, international guidelines for management of sepsis and septic shock, 2016. Intensive Care Med. 43, 304–377. doi:10.1007/s00134-017-4683-6

Rivers, E., Nguyen, B., Havstad, S., Ressler, J., Muzzin, A., Knoblich, B., et al. (2001). Early goal-directed therapy in the treatment of severe sepsis and septic shock. N. Engl. J. Med. 345, 1368–1377. doi:10.1056/NEJMoa010307

Rosland, R. G., Hagen, M. U., Haase, N., Holst, L. B., Plambech, M., Madsen, K. R., et al. (2014). Red blood cell transfusion in septic shock - clinical characteristics and outcome of unselected patients in a prospective, multicentre cohort. Scand. J. Trauma Resusc. Emerg. Med. 22, 14. doi:10.1186/1757-7241-22-14

Seymour, C. W., Liu, V. X., Iwashyna, T. J., Brunkhorst, F. M., Rea, T. D., Scherag, A., et al. (2016). Assessment of clinical criteria for sepsis, for the third international consensus definitions for sepsis and septic shock (Sepsis-3). JAMA 315, 762–774. doi:10.1001/jama.2016.0288

Tibshirani, J., Athey, S., Friedberg, R., Hadad, V., Hirshberg, D., Miner, L., Sverdrup, E., Wager, S., and Wright, M. (2024). grf: Generalized Random Forests (Version 2.4.0). Comprehensive R Archive Network (CRAN). Available online at: https://CRAN.R-project.org/package=grf (Accessed November 1, 2024).

Wager, S., and Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. J. Am. Stat. Assoc. 113, 1228–1242. doi:10.1080/01621459.2017.1319839

World Health Organization (2017). WHA70.7, improving the prevention, diagnosis and clinical management of sepsis. Available online at: https://www.paho.org/en/documents/wha707-resolution-improving-prevention-diagnosis-and-clinical-management-sepsis-2017 (Accessed July 17, 2025).

Xie, H., Jia, Y., and Liu, S. (2024). Integration of artificial intelligence in clinical laboratory medicine, advancements and challenges. Interdiscip. Med. 2, e20230056. doi:10.1002/inmd.20230056

Yang, P., Yuan, J., Yu, L., Yu, J., Zhang, Y., Yuan, Z., et al. (2023). Clinical significance of hemoglobin level and blood transfusion therapy in elderly sepsis patients, a retrospective analysis. Am. J. Emerg. Med. 73, 27–33. doi:10.1016/j.ajem.2023.08.005

Keywords: sepsis, causal forest, red blood cell transfusion, heterogeneity analysis, personalized treatment

Citation: Yang P, Yuan J, He J, Yu L, Gu X, Ding X and Chen Q (2025) Investigating the feasibility of machine learning to guide personalized red blood cell (RBC) transfusion: analyzing the heterogeneity of RBC transfusion in septic patients with hemoglobin levels of 7–9 g/dL based on the causal forest model. Front. Pharmacol. 16:1615618. doi: 10.3389/fphar.2025.1615618

Received: 21 April 2025; Accepted: 13 August 2025;
Published: 28 August 2025.

Edited by:

Antonino S. Rubino, Kore University of Enna, Italy

Reviewed by:

Juan Francisco Morales, University of Florida, United States
Hee-Jung Kim, Korea University, Republic of Korea

Copyright © 2025 Yang, Yuan, He, Yu, Gu, Ding and Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Qihong Chen, jiangdurenyiicu@163.com