ORIGINAL RESEARCH article

Front. Med., 21 January 2026

Sec. Intensive Care Medicine and Anesthesiology

Volume 12 - 2025 | https://doi.org/10.3389/fmed.2025.1753260

Development and validation of an interpretable machine learning model for predicting cognitive impairment in patients with sepsis

  • 1. Department of Critical Care Medicine, Shenzhen Baoan Shiyan People’s Hospital, Shenzhen, China

  • 2. Department of Gynecology, Langxin Community Health Service Center, Shenzhen Baoan Shiyan People’s Hospital, Shenzhen, China

Article metrics

View details

465

Views

60

Downloads

Abstract

Objective:

Cognitive impairment is a common and debilitating complication after sepsis. This study aimed to develop and validate an interpretable machine learning (ML) model to predict post-sepsis cognitive impairment and identify key clinical risk factors.

Methods:

A retrospective cohort of 866 adult sepsis patients treated in our hospital between January 2020 and January 2025 was analyzed. Cognitive function was assessed 1–3 months after discharge using the Montreal Cognitive Assessment (MoCA), with scores < 26 indicating impairment. Key predictors were selected via least absolute shrinkage and selection operator (LASSO) regression and the Boruta algorithm. Five ML models—logistic regression, extreme gradient boosting (XGBoost), random forest (RF), k-nearest neighbors (KNN), and decision tree (DT)—were developed and evaluated using area under the curve (AUC), accuracy, F1-score. SHapley Additive exPlanations (SHAP) values were applied for interpretability.

Results:

Cognitive impairment occurred in 195 patients (22.5%). Seven variables were identified as key predictors of cognitive impairment, including age, years of education, septic shock, benzodiazepine use, acute physiology and chronic health evaluation II (APACHE II) score, sequential organ failure assessment (SOFA) score, and interleukin-10 (IL-10) level. The RF model performed best, with AUCs of 0.947 (training set) and 0.895 (validation set), showing good calibration and clinical utility. SHAP analysis showed that SOFA score had the greatest influence on cognitive impairment, followed by age, APACHE II score, IL-10, and years of education.

Conclusion:

Using SHAP analysis, the RF model provided clear insights into the key factors contributing to the model’s prediction of cognitive impairment after sepsis. The model not only achieved high predictive accuracy but also offered a transparent, data-driven tool to identify patients at elevated risk, potentially enabling timely interventions and tailored clinical management.

1 Introduction

Sepsis is a life-threatening syndrome characterized by dysregulated host responses to infection and remains a major global health challenge (1). Earlier global estimates suggested that sepsis affects approximately 31 million people annually and contributes to nearly 5 million deaths worldwide, underscoring its profound impact on population health (2). With improvements in critical care, a growing number of patients survive the acute phase of sepsis; however, many survivors experience long-term complications that substantially affect their quality of life (3). Among these sequelae, cognitive impairment has emerged as one of the most prevalent and debilitating outcomes. Studies have reported that a considerable proportion of sepsis survivors develop persistent deficits in memory, attention, executive function, or information processing, which may persist for months or even years after discharge (4, 5). These impairments not only hinder functional recovery and return to daily activities but also increase healthcare utilization, dependency, and long-term mortality. Despite its clinical significance, early recognition of cognitive impairment remains challenging because symptoms are often subtle and obscured by delirium, sedation, and other features of critical illness.

The development of cognitive impairment following sepsis is multifactorial. Proposed mechanisms include sustained systemic inflammation, blood-brain barrier disruption, microglial activation, neuronal injury induced by oxidative stress, and altered cerebral perfusion (68). At the clinical level, diverse factors such as disease severity, organ dysfunction patterns, metabolic disturbances, sedative exposure, and hemodynamic instability have all been implicated (5, 9, 10). However, the interaction among these variables is highly complex and non-linear, and traditional regression-based methods often fail to capture such intricate relationships. As a consequence, early and reliable prediction of cognitive impairment in sepsis is still limited, narrowing the potential to implement timely interventions and focused neuroprotective strategies.

Machine learning (ML) offers a powerful approach to address these limitations and has been applied to cognitive impairment following sepsis, both in clinical studies (11, 12) and animal models (13, 14). By leveraging large-scale, multidimensional clinical data, ML algorithms are capable of modeling non-linear interactions and improving predictive accuracy beyond conventional statistical techniques (15). Nevertheless, many ML models function as “black boxes,” producing predictions without revealing how input variables contribute to the outcome (16). In high-stakes clinical decision-making, this opacity raises concerns regarding accountability and interpretability, reducing clinicians’ willingness to adopt ML tools in practice. The emergence of interpretable ML provides a solution to this challenge (17, 18). Techniques such as SHapley additive exPlanations (SHAP) and model-agnostic interpretation frameworks allow visualization of feature contributions at both population and individual levels, enhancing transparency, reliability, and user acceptance (19). Applying interpretable ML to the prediction of cognitive impairment in sepsis may therefore improve early risk stratification, uncover clinically meaningful risk profiles, and support targeted strategies to prevent long-term neurological deterioration.

In this study, we aimed to develop and validate an interpretable ML model to predict the occurrence of cognitive impairment among patients with sepsis. By combining routinely available clinical variables with advanced interpretable ML techniques, we sought not only to optimize predictive performance but also to clarify the relative contribution of critical risk factors. This interpretable framework may assist clinicians in recognizing high-risk patients at an earlier stage and facilitate the implementation of strategies to mitigate long-term cognitive deterioration.

2 Materials and methods

2.1 Study cohort and participant selection

This retrospective cohort study included adult patients diagnosed with sepsis and treated at our hospital between January 2020 and January 2025. Inclusion Criteria: (1) Diagnosed with sepsis according to the Sepsis-3 international consensus definition (20); (2) Age ≥ 18 years; (3) Patients who survived until discharge; (4) Patients with complete clinical data. Exclusion Criteria: (1) Pre-existing cognitive impairment or diagnosed dementia prior to hospitalization; (2) Did not complete follow-up or died during follow-up; (3) Refusal or inability to undergo cognitive evaluation; (4) Central nervous system infections, including meningitis or encephalitis; (5) With renal, hepatic, pulmonary encephalopathy or other metabolic encephalopathies; (6) With malignancies or autoimmune diseases; (7) Pregnant or breastfeeding women; (8) Drug or toxic substance intoxication, or substance addiction. After applying the exclusion criteria, 866 eligible patients were included in the analysis. To facilitate model development and internal validation, the entire cohort was randomly divided into a training set (70%) and a validation set (30%) using computerized random allocation. All feature selection and model training procedures were performed exclusively in the training set, while the validation set was used only for independent performance evaluation. A detailed flow of patient screening and enrollment was illustrated in Figure 1.

FIGURE 1

Flowchart showing the selection process of sepsis patients for a study conducted between January 2020 and January 2025. Initially, 1053 patients were considered. Exclusions included pre-existing cognitive impairment (43), incomplete follow-up (68), refusal of cognitive evaluation (22), central nervous system infections (13), encephalopathies (15), malignancies or autoimmune diseases (16), pregnancy or breastfeeding (3), and substance issues (7). After exclusions, 866 patients were included. These were divided into a training set of 606 and a validation set of 260.

Study flowchart illustrating patient screening, enrollment, and grouping procedures.

2.2 Follow-up and cognitive assessment

After hospital discharge, patients were followed for 3 months. Cognitive function was assessed using the Beijing version of the Montreal Cognitive Assessment (MoCA) (21). Evaluations were conducted at three predefined time points: approximately 1 month (30 ± 7 days), 2 months (60 ± 7 days), and 3 months (90 ± 14 days) after discharge. In accordance with scoring recommendations, 1 point was added to the total MoCA score for individuals with ≤ 12 years of education to correct for educational effects. Patients who had a MoCA score < 26 at any of the three follow-up assessments within 3 months were classified into the cognitive impairment group, whereas those with MoCA scores ≥ 26 at all assessments were assigned to the non-cognitive impairment group.

2.3 Data collection

Clinical data were collected retrospectively from the electronic medical records using a standardized case report form. Two trained investigators independently extracted and cross-checked all variables to ensure accuracy and completeness. Baseline demographic characteristics included gender, age, body mass index (BMI), marital status, years of education, and lifestyle factors such as smoking and drinking status. Comorbidities assessed at admission included hypertension, diabetes mellitus, and hyperlipidemia. Disease-related clinical variables comprised the presence of septic shock, the primary site of infection (respiratory, intra-abdominal, urinary, or other), and the use of critical care interventions, including mechanical ventilation, benzodiazepine administration, and vasopressor support. Severity of illness was quantified using the Acute Physiology and Chronic Health Evaluation II (APACHE II) and Sequential Organ Failure Assessment (SOFA) scores, calculated from the worst physiological and biochemical values recorded within the first 24 h after admission. Laboratory parameters were obtained from the initial blood samples collected during this same 24-h window and included white blood cell count (WBC), platelet count (PLT), procalcitonin (PCT), interleukin-6 (IL-6), interleukin-10 (IL-10), tumor necrosis factor-α (TNF-α), and serum uric acid (UA) levels.

The initial set of candidate variables was selected based on a combination of biological plausibility, evidence from prior literature, and availability in the electronic health record system. Demographic characteristics and established severity scores (SOFA and APACHE II) were included because they are widely recognized predictors of sepsis outcomes and neurological complications. Laboratory markers were chosen to reflect key pathophysiological domains relevant to post-sepsis cognitive impairment. Specifically, the cytokine panel (IL-6, IL-10, and TNF-α) was selected to capture systemic inflammatory and immune regulatory responses, which have been consistently implicated in sepsis-associated brain dysfunction (5, 7). Uric acid was included as a marker related to oxidative stress and metabolic disturbance, both of which may contribute to neuronal injury during severe systemic inflammation (22). Other commonly used sepsis biomarkers, such as lactate or C-reactive protein, as well as specific neuro-injury markers (e.g., S100B or neuron-specific enolase), were not consistently measured in this retrospective cohort and therefore could not be reliably incorporated into the initial analysis.

2.4 Model development and explainable machine learning

Prior to feature selection and model development, data preprocessing was performed. Continuous variables were standardized using z-score normalization (mean = 0, standard deviation = 1), while categorical variables were encoded as binary indicators. Because cases with incomplete demographic or laboratory information were excluded during cohort selection, no missing data imputation was required. To prevent information leakage, all preprocessing procedures were fitted using the training dataset and subsequently applied to the validation dataset.

To identify the most informative predictors of post-sepsis cognitive impairment, a two-stage feature selection strategy was adopted. First, the least absolute shrinkage and selection operator (LASSO) regression was applied to reduce dimensionality and eliminate redundant variables by imposing a penalty on the absolute size of coefficients. Subsequently, the Boruta algorithm was used to further confirm variable importance through iterative comparison with shadow features. The intersection of variables identified by both methods was selected as the final feature set for model construction.

Using these selected predictors, five ML algorithms were developed to construct prediction models: logistic regression, extreme gradient boosting (XGBoost), random forest (RF), k-nearest neighbors (KNN), and decision tree (DT). Hyperparameter tuning was performed within the training dataset for all ML models. Hyperparameters for all models were optimized using grid search combined with five-fold cross-validation. The training set was partitioned into five folds, with models iteratively trained on four folds and evaluated on the remaining fold. The combination of hyperparameters yielding the best mean cross-validation performance was selected for final model training. To prevent information leakage, the validation set was not involved in hyperparameter tuning or cross-validation and was used solely for independent performance evaluation. Model performance was comprehensively evaluated using multiple quantitative metrics, including area under the receiver operating characteristic (ROC) curve (AUC), accuracy, sensitivity, specificity, precision, and F1-score, etc. AUC served as the primary indicator of discriminative ability, while F1-score and accuracy were used as secondary evaluation metrics to further characterize overall performance. The model demonstrating the most favorable and stable results across these measures was selected as the final predictive model.

To assess the clinical utility and reliability of the final model, decision curve analysis (DCA), calibration curve analysis, and confusion matrix evaluation were conducted. Finally, to enhance interpretability and elucidate the contributions of individual predictors, SHAP analysis was employed to quantify both global and patient-level feature importance, thereby improving transparency and clinical interpretability of the model.

2.5 Statistical analysis

All statistical analyses were conducted using R software (version 4.3.2). Continuous variables were examined for normality using the Shapiro-Wilk test. Normally distributed variables were presented as mean ± standard deviation (SD) and compared using independent-samples t-tests. Non-normally distributed variables were expressed as median (interquartile range, IQR) and compared using the Mann-Whitney U test. Categorical variables were summarized as frequencies and percentages and compared using chi-square. Prior to model development, multicollinearity among predictors was examined by calculating the variance inflation factor (VIF) using the “car” package, and variables with VIF > 5 were considered to exhibit severe collinearity and excluded.

The “glmnet” package was used for LASSO regression, and the “Boruta” package was applied to identify important predictors via the Boruta algorithm. All ML models were developed and validated using the “tidymodels” package. Clinical utility was evaluated through DCA using the “rmda” package. Calibration performance was examined using calibration plots generated via the “rms” package. Confusion matrices were generated using “caret” package. Model interpretability was achieved using SHAP implemented through the “fastshap” and “shapviz” packages, which enabled quantification and visualization of global and individual-level predictor contributions. Statistical significance was defined as a two-tailed P < 0.05.

3 Results

3.1 Baseline characteristics of the study population

A total of 866 patients with sepsis were included in this study. The mean age of the cohort was 63.35 ± 9.37 years, and 53.9% were male. Overall, 195 patients (22.5%) developed cognitive impairment within 3 months after discharge. The dataset was randomly divided into a training set (n = 606) and a validation set (n = 260). As shown in Table 1, there were no significant differences between the two sets in age (63.23 ± 9.50 vs. 63.63 ± 9.08 years, P = 0.565), gender distribution (male: 54.1% vs. 53.5%, P = 0.857), or the overall incidence of cognitive impairment (21.9% vs. 23.8%, P = 0.540). Other demographic factors, lifestyle habits, comorbidities, sepsis-related clinical features, severity scores, and laboratory results also showed no significant differences between the two groups (all P > 0.05). These findings indicate that the training and validation sets were well balanced across baseline characteristics prior to model development.

TABLE 1

Variables Total (n = 866) Training set (n = 606) Validation set (n = 260) χ 2/t/Z P
Gender 0.032 0.857
 Male 467 (53.9%) 328 (54.1%) 139 (53.5%)
 Female 399 (46.1%) 278 (45.9%) 121 (46.5%)
Age (years) 63.35 ± 9.37 63.23 ± 9.50 63.63 ± 9.08 0.575 0.565
BMI (kg/m2) 23.02 ± 2.28 23.09 ± 2.33 22.87 ± 2.16 1.301 0.194
Marital status 0.201 0.654
 Coupled 688 (79.4%) 479 (79.0%) 209 (80.4%)
 Not coupled 178 (20.6%) 127 (21.0%) 51 (19.6%)
Years of education 10.06 ± 2.56 10.14 ± 2.62 9.88 ± 2.41 1.371 0.171
Smoking 0.050 0.823
 No 518 (59.8%) 361 (59.6%) 157 (60.4%)
 Yes 348 (40.2%) 245 (40.4%) 103 (39.6%)
Drinking 0.041 0.840
 No 703 (81.2%) 493 (81.4%) 210 (80.8%)
 Yes 163 (18.8%) 113 (18.7%) 50 (19.2%)
Septic shock 0.240 0.624
 No 721 (83.3%) 507 (83.7%) 214 (82.3%)
 Yes 145 (16.7%) 99 (16.3%) 46 (17.7%)
Hypertension 0.142 0.706
 No 561 (64.8%) 395 (65.2%) 166 (63.8%)
 Yes 305 (35.2%) 211 (34.8%) 94 (36.2%)
Diabetes mellitus 0.235 0.628
 No 685 (79.1%) 482 (79.5%) 203 (78.1%)
 Yes 181 (20.9%) 124 (20.5%) 57 (21.9%)
Hyperlipidemia 0.107 0.743
 No 586 (67.7%) 408 (67.3%) 178 (68.5%)
 Yes 280 (32.3%) 198 (32.7%) 82 (31.5%)
Infection site 0.474 0.924
 Respiratory tract 400 (46.2%) 283 (46.7%) 117 (45.0%)
 Intra-abdominal 177 (20.4%) 125 (20.6%) 52 (20.0%)
 Urinary tract 163 (18.8%) 111 (18.3%) 52 (20.0%)
 Other 126 (14.6%) 87 (14.4%) 39 (15.0%)
Mechanical ventilation 0.022 0.882
 No 207 (23.9%) 144 (23.8%) 63 (24.2%)
 Yes 659 (76.1%) 462 (76.2%) 197 (75.8%)
Benzodiazepine use 1.278 0.258
 No 731 (84.4%) 506 (83.5%) 225 (86.5%)
 Yes 135 (15.6%) 100 (16.5%) 35 (13.5%)
Vasopressor use 0.226 0.635
 No 556 (64.2%) 386 (63.7%) 170 (65.4%)
 Yes 310 (35.8%) 220 (36.3%) 90 (34.6%)
APACHE II score (points) 17 (6, 30) 16 (6, 30) 19 (8, 29) 0.467 0.641
SOFA score (points) 7.37 ± 2.19 7.29 ± 2.20 7.55 ± 2.15 1.605 0.109
WBC (× 109/L) 14.95 ± 3.40 14.99 ± 3.29 14.87 ± 3.65 0.476 0.634
PLT (× 109/L) 158 (100, 213) 159 (101, 213) 154 (97, 211) 0.450 0.653
PCT (ng/mL) 8.19 ± 2.32 8.18 ± 2.38 8.22 ± 2.19 0.232 0.817
IL-6 (μg/L) 1.20 ± 0.40 1.19 ± 0.41 1.21 ± 0.37 0.677 0.499
IL-10 (μg/L) 0.73 ± 0.31 0.72 ± 0.31 0.75 ± 0.30 1.318 0.188
TNF-α (μg/L) 1.30 ± 0.40 1.31 ± 0.39 1.27 ± 0.42 1.352 0.177
UA (μmol/L) 355 (214, 439) 354 (215, 438) 357 (212, 441) 0.120 0.904
Cognitive impairment 0.376 0.540
 No 671 (77.5%) 473 (78.1%) 198 (76.2%)
 Yes 195 (22.5%) 133 (21.9%) 62 (23.8%)

Comparison of baseline clinical profiles between the training and validation sets.

BMI, body mass index; APACHE II, acute physiology and chronic health evaluation II; SOFA, sequential organ failure assessment; WBC, white blood cell count; PLT, platelet count; PCT, procalcitonin; IL-6, interleukin-6; IL-10, interleukin-10; TNF-α, tumor necrosis factor-alpha; UA, uric acid.

3.2 Baseline comparison by cognitive impairment status in the training set

Among the 606 patients in the training set, 133 (21.9%) developed cognitive impairment. As shown in Table 2, patients in the cognitive impairment group were significantly older than those without impairment and had fewer years of education (all P < 0.001). The incidence of septic shock was higher in the cognitive impairment group (P < 0.001), and benzodiazepine use was more common (P = 0.001). In addition, patients with cognitive impairment had markedly higher APACHE II scores and higher SOFA scores (all P < 0.001). Regarding laboratory indicators, IL-6 (P = 0.048) and IL-10 (P < 0.001) levels were significantly elevated in the cognitive impairment group, while other laboratory parameters, including WBC, PLT, PCT, TNF-α, and UA, showed no significant differences (all P > 0.05). No significant differences were found between the two groups in terms of gender, BMI, marital status, smoking or drinking habits, comorbidities, infection site, mechanical ventilation, or vasopressor use (all P > 0.05).

TABLE 2

Variables Cognitive impairment group (n = 133) Non-cognitive impairment group (n = 473) χ 2/t/Z P
Gender 0.040 0.842
 Male 73 (54.9%) 255 (53.9%)
 Female 60 (45.1%) 218 (46.1%)
Age (years) 68.21 ± 8.62 61.83 ± 9.27 7.118 < 0.001
BMI (kg/m2) 22.88 ± 2.06 23.16 ± 2.41 1.220 0.223
Marital status 0.074 0.786
 Coupled 104 (78.2%) 375 (79.3%)
 Not coupled 29 (21.8%) 98 (20.7%)
Years of education 9.14 ± 2.25 10.43 ± 2.64 5.135 < 0.001
Smoking 0.024 0.878
 No 80 (60.2%) 281 (59.6%)
 Yes 53 (39.8%) 192 (40.4%)
Drinking 0.091 0.762
 No 107 (80.5%) 386 (81.6%)
 Yes 26 (19.5%) 87 (18.4%)
Septic shock 12.415 < 0.001
 No 98 (73.7%) 409 (86.5%)
 Yes 35 (26.3%) 64 (13.5%)
Hypertension 0.073 0.787
 No 88 (66.2%) 307 (64.9%)
 Yes 45 (33.8%) 166 (35.1%)
Diabetes mellitus 0.037 0.848
 No 105 (78.9%) 377 (79.7%)
 Yes 28 (21.1%) 96 (20.3%)
Hyperlipidemia 0.523 0.470
 No 93 (69.9%) 315 (66.6%)
 Yes 40 (30.1%) 158 (33.4%)
Infection site 0.118 0.990
 Respiratory tract 63 (47.4%) 220 (46.5%)
 Intra-abdominal 28 (21.1%) 97 (20.5%)
 Urinary tract 24 (18.0%) 87 (18.4%)
 Other 18 (13.5%) 69 (14.6%)
Mechanical ventilation 0.137 0.711
 No 30 (22.6%) 114 (24.1%)
 Yes 103 (77.4%) 359 (75.9%)
Benzodiazepine use 10.156 0.001
 No 99 (74.4%) 407 (86.0%)
 Yes 34 (25.6%) 66 (14.0%)
Vasopressor use 0.123 0.726
 No 83 (62.4%) 303 (64.1%)
 Yes 50 (37.6%) 170 (35.9%)
APACHE II score (points) 23 (12, 36) 13 (5, 28) 4.652 < 0.001
SOFA score (points) 9.15 ± 2.32 6.76 ± 1.85 12.409 < 0.001
WBC (× 109/L) 15.17 ± 3.56 14.94 ± 3.21 0.712 0.477
PLT (× 109/L) 152 (90, 212) 161 (106, 214) 1.129 0.259
PCT (ng/mL) 8.43 ± 2.18 8.11 ± 2.42 1.376 0.169
IL-6 (μg/L) 1.25 ± 0.45 1.17 ± 0.40 1.981 0.048
IL-10 (μg/L) 0.84 ± 0.37 0.69 ± 0.29 4.942 < 0.001
TNF-α (μg/L) 1.35 ± 0.42 1.30 ± 0.38 1.309 0.191
UA (μmol/L) 297 (169, 455) 355 (223, 436) 0.634 0.526

Clinical and laboratory profiles of patients with and without cognitive impairment in the training set.

BMI, body mass index; APACHE II, acute physiology and chronic health evaluation II; SOFA, sequential organ failure assessment; WBC, white blood cell count; PLT, platelet count; PCT, procalcitonin; IL-6, interleukin-6; IL-10, interleukin-10; TNF-α, tumor necrosis factor-alpha; UA, uric acid.

3.3 Feature selection using a combined LASSO-Boruta approach

A two-stage feature selection strategy combining LASSO regression and the Boruta algorithm was applied to identify the most informative predictors of post-sepsis cognitive impairment. In the LASSO analysis, the coefficient profiles progressively shrank toward zero with increasing penalty values (Figure 2A). Using 10-fold cross-validation, two candidate λ values were identified (Figure 2B). The λ corresponding to the minimum binomial deviance retained 10 variables, whereas the more parsimonious λ within one standard error (λ0.1se) retained 7 variables. To improve model simplicity and generalizability, we selected the λ0.1se model, which included the following seven features: age, years of education, septic shock, benzodiazepine use, APACHE II score, SOFA score, and IL-10.

FIGURE 2

A series of data visualizations demonstrate statistical analyses. Panel A shows a line graph of coefficients versus log lambda, with curves of varying colors. Panel B depicts a binomial deviance plot against Log(λ) with a curve and error bars. Panel C is a line chart of variable importance over classifier runs, showcasing multiple colored lines. Panel D displays a bar chart of variable importance, with color coding for confirmation status. Panel E features a network diagram connecting variables like septic shock and age with the methods Bonferroni and Lasso.

Feature selection using LASSO regression and the Boruta algorithm. (A) LASSO coefficient profiles of candidate variables plotted against log (λ). As the penalty increased, coefficients shrank toward zero, and variables with minimal contribution were gradually excluded. (B) Ten-fold cross-validation for selecting the optimal λ. The left dotted line indicates the λ value that yields the minimum binomial deviance (λ_min), while the right dotted line represents the most regularized model within one standard error of the minimum (λ_1se). The red dots represent mean binomial deviance, with error bars indicating ± 1se. (C) Feature importance trajectories produced by the Boruta algorithm over multiple classifier iterations. (D) Boxplots of Boruta variable importance. Confirmed features (green) and tentative features (yellow) demonstrate higher importance compared with shadow features (blue), whereas rejected features (red) fall below the threshold. (E) Integration of LASSO and Boruta results. Blue edges represent variables selected by LASSO, and red edges indicate variables confirmed by Boruta. The overlapping variables (orange nodes) were identified as the final feature set for model development. BMI, body mass index; APACHE II, acute physiology and chronic health evaluation II; SOFA, sequential organ failure assessment; WBC, white blood cell count; PLT, platelet count; PCT, procalcitonin; IL-6, interleukin-6; IL-10, interleukin-10; TNF-α, tumor necrosis factor-alpha; UA, uric acid.

The Boruta algorithm further evaluated the robustness of each variable through iterative comparisons with shadow features (Figures 2C–D). A total of nine predictors were ultimately confirmed as important, including age, years of education, septic shock, benzodiazepine use, APACHE II score, SOFA score, IL-6, IL-10, and uric acid, while all remaining variables were rejected. To ensure stability and clinical interpretability, the intersection of the two methods was taken as the final feature set. As illustrated in Figure 2E, seven variables overlapped between LASSO and Boruta: age, years of education, septic shock, benzodiazepine use, APACHE II score, SOFA score, and IL-10, which were subsequently used for model development.

3.4 Collinearity analysis

Before model construction, multicollinearity was assessed among the seven predictors retained after feature selection. The VIF values for each variable were as follows: age (1.015), years of education (1.024), septic shock (1.022), benzodiazepine use (1.013), APACHE II score (1.010), SOFA score (1.033), and IL-10 (1.031). All VIFs were well below the conventional threshold of 5, indicating that no severe multicollinearity was present among these predictors. Therefore, all variables were retained for subsequent ML model construction. For clarity, the definitions and value assignments of the variables were summarized in Table 3.

TABLE 3

Variable type Variable name Value assignment
Independent Age Continuous
Independent Years of education Continuous
Independent Septic shock No = 0, Yes = 1
Independent Benzodiazepine use No = 0, Yes = 1
Independent APACHE II score Continuous
Independent SOFA score Continuous
Independent IL-10 Continuous
Dependent Cognitive impairment No = 0, Yes = 1

Definitions and coding schemes for variables included in the analysis.

APACHE II, acute physiology and chronic health evaluation II; SOFA, sequential organ failure assessment; IL-10, interleukin-10.

3.5 Model performance evaluation

Five ML algorithms—logistic regression, XGBoost, RF, KNN, and DT—were constructed using the selected predictors. Model performance was assessed in both the training and validation sets using multiple evaluation metrics. As shown in Figure 3A, all models demonstrated acceptable performance in the training set, with RF, XGBoost, and logistic regression showing superior overall metrics, including accuracy, F1-score, precision, recall, specificity, and Youden index. Similar patterns were observed in the validation set (Figure 3B), where RF, XGBoost, and logistic regression maintained relatively stable and balanced performance, whereas DT showed weaker discrimination across several metrics.

FIGURE 3

Radar and ROC plots comparing model performances. Panels A and B are radar charts for training and validation sets, illustrating metrics like accuracy and precision for Logistic, XGBoost, RF, KNN, and DT models. Panels C and D are ROC curves for training and validation sets, displaying AUC values: Logistic, XGBoost, RF, KNN, and DT. Colors differentiate models: purple (Logistic), gold (XGBoost), orange (RF), blue (KNN), yellow (DT).

Performance of five ML models in predicting post-sepsis cognitive impairment. (A) Radar plot summarizing model performance across multiple evaluation metrics in the training set. (B) Radar plot summarizing model performance across the same metrics in the validation set. (C) ROC curves for the five models in the training set. (D) ROC curves for the five models in the validation set. ML, machine learning; XGBoost, extreme gradient boosting; RF, random forest; KNN, k-nearest neighbor; DT, decision tree; AUC, Area under the curve; MCC, Matthews correlation coefficient; NPV, negative predictive value; PPV, positive predictive value.

ROC curve analysis further demonstrated the discriminative ability of each model. In the training set (Figure 3C), the RF model achieved the highest AUC (0.947, 95% CI: 0.930–0.964), followed by XGBoost (0.912, 95% CI: 0.886–0.937). The DT model yielded the lowest AUC (0.807, 95% CI: 0.763–0.851). In the validation set (Figure 3D), RF remained the best-performing model with an AUC of 0.895 (95% CI: 0.852–0.938), while DT again performed worst (AUC 0.711, 95% CI: 0.639–0.783). Taken together, the RF model demonstrated the most favorable and consistent predictive performance across both datasets and was therefore selected as the final model for subsequent clinical utility evaluation and explainability analysis.

3.6 Clinical utility and performance verification by calibration and confusion matrices in the RF model

The clinical usefulness of the RF model was evaluated using DCA. In both the training set (Figure 4A) and validation set (Figure 4B), the RF model demonstrated a consistently higher net benefit across a wide range of threshold probabilities compared with the “treat-all” and “treat-none” strategies. This indicates that the RF model provides meaningful clinical value for guiding early identification of patients at high risk for post-sepsis cognitive impairment. Model calibration was then assessed using calibration curves. In the training cohort (Figure 4C), the apparent and bias-corrected curves closely followed the ideal reference line, suggesting good agreement between predicted and observed probabilities. A similar calibration pattern was observed in the validation set (Figure 4D), confirming good model reliability and stability. Confusion matrices further demonstrated the model’s classification performance. In the training set (Figure 4E), the RF model correctly identified 413 non-impaired and 123 impaired patients. In the validation set (Figure 4F), 151 non-impaired and 52 impaired patients were correctly classified. These results support the RF model’s balanced performance in both sensitivity and specificity. Collectively, the DCA, calibration analysis, and confusion matrices confirm that the RF model not only predicts cognitive impairment accurately but also offers substantial clinical benefit and generalizability.

FIGURE 4

Six-panel figure showing model evaluation metrics: A and B: Net Benefit graphs for Training and Validation sets respectively, versus Threshold Probability, with “All” , “None” , and “RF” lines. C and D: Calibration plots for Training and Validation sets, showing Observed versus Predicted probability with “Apparent” , “Bias-corrected” , and “Ideal” lines. E: Confusion matrix for the Training set with 413 true negatives, 10 false positives, 60 false negatives, and 123 true positives. F: Confusion matrix for the Validation set with 181 true negatives, 10 false positives, 47 false negatives, and 52 true positives.

Clinical utility, calibration, and confusion matrix evaluation of the RF model. (A) DCA for the RF model in the training set, showing superior net benefit across a wide range of threshold probabilities compared with the treat-all and treat-none strategies. (B) DCA in the validation set, demonstrating consistent clinical usefulness of the RF model. (C) Calibration curve in the training set. The apparent and bias-corrected curves show good agreement between predicted and observed probabilities, indicating reliable model calibration. (D) Calibration curve in the validation set, confirming stable calibration performance. (E) Confusion matrix for the RF model in the training set, illustrating accurate classification of cognitive impairment and non-impairment cases. (F) Confusion matrix in the validation set, showing good generalizability of the classification performance. RF, random forest; DCA, decision curve analysis.

3.7 Model interpretability based on SHAP analysis

SHAP analysis was conducted to enhance interpretability of the RF model by quantifying both global and individual feature contributions, as well as characterizing non-linear relationships between predictors and cognitive impairment risk. At the global level, mean absolute SHAP values (Figure 5A) showed that the SOFA score was the most influential predictor, followed by age, APACHE II score, IL-10, and years of education. Septic shock and benzodiazepine use demonstrated smaller but meaningful contributions. The SHAP summary plot (Figure 5B) further illustrated the directional influence of features. Higher SOFA scores, older age, elevated APACHE II scores, and higher IL-10 levels increased the predicted probability of cognitive impairment, whereas greater years of education had a protective pattern that lowered predicted risk. For categorical variables, patients with septic shock (coded as 1) and those who received benzodiazepines (coded as 1) was associated with higher predicted probabilities within the model, compared with non-use (coded as 0). Individual SHAP force plots provided detailed interpretations for specific patients. For the first patient in the dataset (Figure 5C), lower SOFA and APACHE II scores drove the prediction downward and outweighed the modest positive influence of older age, resulting in an overall low predicted risk. For the last patient in the dataset (Figure 5D), advanced age and the presence of septic shock produced substantial positive contributions, shifting the prediction toward greater cognitive impairment risk. To illustrate extreme model behaviors, Figure 5E presents a typical low-risk case, characterized by low SOFA and APACHE II scores, higher years of education, and low IL-10 levels. In contrast, the high-risk case shown in Figure 5F demonstrates strong positive contributions from high SOFA scores, elevated IL-10, and older age, collectively pushing the prediction toward a near-certain risk.

FIGURE 5

Bar and violin plots illustrate SHAP value analyses. Panel A shows mean SHAP values highlighting the SOFA score as the top predictor. Panel B presents violin plots showing feature values with SHAP impacts. Panels C-F display waterfall plots of specific cases, detailing how features like SOFA score, APACHE II score, and age impact predictions positively or negatively. Panel F shows a significant prediction influenced by a high SOFA score.

Global and individual-level SHAP interpretation of the RF model. (A) Mean absolute SHAP values showing the global importance ranking of all predictors. (B) SHAP summary (beeswarm) plot illustrating the direction and magnitude of feature influences. Feature values are color-coded from low (purple) to high (yellow). (C) SHAP force plot for the first patient in the dataset. (D) SHAP force plot for the last patient in the dataset. (E) SHAP force plot of a representative low-risk case. (F) SHAP force plot of a representative high-risk case. Yellow bars represent features that increase the predicted risk, whereas purple bars represent features that reduce it. E[f(x)] denotes the model’s baseline expected prediction, and f(x) refers to the individualized predicted probability for a given patient. SHAP, SHapley Additive exPlanations; APACHE II, acute physiology and chronic health evaluation II; SOFA, sequential organ failure assessment; IL-10, interleukin-10.

SHAP dependence plots (Figures 6A–G) further revealed the non-linear relationships between predictors and their contributions to risk. SHAP values increased progressively with age, with a marked upward shift after approximately 65 years. Years of education showed a clear negative association, with higher educational attainment corresponding to lower predicted risk. For the binary predictors septic shock and benzodiazepine use, SHAP values remained near-zero when absent and shifted positively when present, supporting their role as risk-enhancing factors. APACHE II displayed an upward non-linear pattern, with sharply increasing contributions beyond scores of about 30. SOFA showed a sigmoidal pattern, with limited influence at low values and rapidly rising SHAP values above 7–8. IL-10 demonstrated a U-shaped pattern at lower concentrations followed by steep positive contributions once levels exceeded approximately 1.0 μg/L.

FIGURE 6

Graphs A to G show SHAP value plots illustrating relationships between different variables and their impact. Graph A (Age) and Graph E (APACHE II score) show a positive correlation. Graph B (Years of education) shows a negative correlation. Graphs C (Septic shock) and D (Benzodiazepine use) display scattered values with minimal variation. Graph F (SOFA score) shows an upward trend. Graph G (IL-10) depicts a non-linear relationship. Each graph includes blue data points and a red trendline.

SHAP dependence plots demonstrating non-linear effects of key predictors. (A) Age. (B) Years of education. (C) Septic shock. (D) Benzodiazepine use. (E) APACHE II score. (F) SOFA score. (G) IL-10. Each dot corresponds to an individual patient, with the x-axis displaying the actual feature value and the y-axis representing its associated SHAP value. SHAP, SHapley Additive exPlanations; APACHE II, acute physiology and chronic health evaluation II; SOFA, sequential organ failure assessment; IL-10, interleukin-10.

Together, these global, individual, and dependence-based SHAP analyses clarify which features exert the greatest influence, how variations in feature values shape predictions, and why individual patients are classified as high or low risk. These results substantially improve the transparency and clinical interpretability of the RF model.

4 Discussion

In this retrospective cohort study, we developed an interpretable ML framework to predict cognitive impairment among sepsis survivors and evaluated its performance using multiple complementary validation methods. The study identified seven key predictors using a combined LASSO-Boruta feature selection strategy, including age, years of education, septic shock, benzodiazepine use, APACHE II score, SOFA score, and IL-10. Among the five ML models constructed, the RF model achieved the most favorable and stable performance, with AUCs of 0.947 in the training set and 0.895 in the validation set. Calibration curves, DCA, and confusion matrices consistently confirmed the model’s stability, good clinical applicability, and balanced classification performance. SHAP-based interpretability analysis further clarified the global and individual contributions of the selected predictors. Among them, SOFA score emerged as the strongest contributor, followed by age, APACHE II score, IL-10, and years of education. These findings highlight the potential of an interpretable RF model to enhance early identification of sepsis survivors at high risk for cognitive impairment.

Although cognitive impairment is increasingly recognized as a major complication of sepsis, predictive studies remain scarce. Among the limited studies that have attempted to predict cognitive outcomes after sepsis, Zhu et al. evaluated three serum biomarkers and reported that their combined panel achieved an AUC of 0.887 in predicting persistent cognitive impairment in a cohort of 114 sepsis patients (23). Although the study offers initial insights into potential markers associated with cognitive outcomes after sepsis, its reliance on specialized biomarker assays, small sample size, and the absence of ML guided feature evaluation or interpretability limits its scalability and clinical practicality. In contrast, most other predictive efforts in the sepsis literature have focused on acute brain dysfunction rather than post-discharge cognitive outcomes. Jin et al. used MIMIC-IV data to develop a LASSO-based nomogram for predicting sepsis-associated encephalopathy (SAE), showing moderate discrimination. However, their model targets acute encephalopathy, not cognitive impairment (24). Similarly, Zhao et al. constructed a logistic-regression nomogram to predict acute SAE in elderly ICU patients, achieving an AUC of approximately 0.80, but their work likewise centers on in-hospital encephalopathy (25). Compared with these prior studies, our work provides substantial methodological and clinical advancements by focusing specifically on cognitive impairment, a clinically meaningful outcome that has been largely overlooked in previous predictive models. By leveraging routinely available clinical variables within a rigorous LASSO-Boruta framework and incorporating SHAP-based interpretability to characterize non-linear and individualized risk patterns, our model uses a RF algorithm that achieved excellent discriminative performance, with an AUC of 0.947 in the training cohort and 0.895 in the validation cohort, and offers a robust approach to anticipating cognitive impairment after sepsis.

The importance of the seven retained predictors is supported by existing biological and clinical evidence. Age is one of the strongest predictors of post-sepsis cognitive impairment. Older adults exhibit reduced cognitive reserve, diminished neuroplasticity, and greater vulnerability to inflammatory and hypoperfusion-related brain injury. Sepsis triggers systemic inflammation, microglial activation, oxidative stress, and blood-brain barrier dysfunction, all of which disproportionately impair aging neural circuits (26, 27). Overton et al. reported in a large population-based cohort that the prevalence of mild cognitive impairment rises substantially with advancing age, with older adults showing markedly higher rates of both mild and severe cognitive deficits across all cognitive impairment subtypes (28). Our findings similarly demonstrate that higher age is a key vulnerability factor for post-sepsis cognitive impairment. Individuals with higher educational attainment are believed to have greater cognitive reserve, which enables more efficient neural networks and resilience against brain injury from inflammation or hypoperfusion (29, 30). Concha-Cisternas et al. analyzed more than 2,000 Chilean adults aged ≥ 60 years and found that individuals with lower educational attainment had a markedly higher likelihood of cognitive impairment, with the risk increasing up to 4.53-fold in those with ≤ 8 years of education compared with those with higher education (31). This strong inverse association between education and cognitive performance supports the cognitive-reserve hypothesis and aligns with our finding that fewer years of education increase vulnerability to post-sepsis cognitive impairment. Patients who develop septic shock experience more pronounced circulatory and microvascular disturbances, which increase vulnerability to secondary brain injury through impaired cerebral perfusion and amplified inflammation (32). An MRI study by Götz et al. demonstrated that survivors of severe sepsis or septic shock exhibit structural alterations in key cognitive regions, including the hippocampus and amygdala (33). This evidence supports the strong association between shock physiology and cognitive impairment and aligns with our finding that septic shock significantly heightens the risk of post-sepsis cognitive decline. Benzodiazepine exposure has been reported to be associated with long-term cognitive outcomes in critically ill patients, often reflecting underlying acute brain dysfunction rather than a direct causal effect (34). Pandharipande et al. showed that lorazepam use independently increased the daily risk of transitioning into delirium in ICU patients (35). Stewart reviewed evidence from multiple peer-reviewed studies and reported that long-term benzodiazepine use is associated with impairments across several cognitive domains, with deficits that may persist even after discontinuation (36). The association between benzodiazepine use and post-sepsis cognitive impairment should be interpreted with caution. In critically ill patients, benzodiazepines are often administered in response to agitation, delirium, or severe physiological instability. As such, benzodiazepine exposure is subject to confounding by indication and may primarily reflect underlying acute brain dysfunction or illness severity rather than serving as an independent causal factor of long-term cognitive impairment. From a clinical perspective, benzodiazepine use in this context may act as a surrogate marker identifying patients who experienced more severe neurophysiological stress during the acute phase of sepsis. This interpretation is consistent with prior evidence linking sedative exposure, delirium, and subsequent cognitive decline. Therefore, our findings should not be interpreted as implying a direct neurotoxic effect of benzodiazepines per se, but rather as highlighting a high-risk clinical phenotype characterized by acute neurological vulnerability. Higher APACHE II and SOFA scores reflect more severe physiological derangement and multi-organ dysfunction during sepsis, factors that have been consistently linked to worse long-term neurological outcomes. Severe systemic inflammation, impaired perfusion, and metabolic instability captured by these scoring systems can exacerbate neuronal injury and hinder cognitive recovery (37). Iwashyna et al. reported that survivors of severe sepsis experienced a more than threefold increase in moderate-to-severe cognitive impairment compared with their presepsis baseline (4). A recent systematic review further showed that severe sepsis was associated with a significantly higher risk of dementia or cognitive impairment than milder forms of sepsis (10). These findings reinforce the concept that greater illness severity and organ failure burden markedly increase vulnerability to post-sepsis cognitive impairment, consistent with our observation that higher APACHE II and SOFA scores are key predictors of poor cognitive outcomes. It is important to interpret the contribution of severity-related scores within the context of a survivor-based cohort. In sepsis, patients with extremely high SOFA or APACHE II scores are at substantially increased risk of early mortality and are therefore less likely to survive to undergo post-discharge cognitive evaluation. As a result, the analyzed population represents a subset of sepsis survivors in whom the distribution of disease severity may be partially truncated. Within this framework, the strong influence of SOFA and APACHE II scores observed in our model suggests that, even among survivors, residual organ dysfunction and acute physiological derangement remain key determinants of subsequent cognitive impairment. Rather than reflecting mortality risk alone, these severity scores may capture the cumulative burden of systemic inflammation, hypoperfusion, and multi-organ dysfunction that predispose surviving patients to long-term neurocognitive sequelae. This distinction is clinically relevant, as it highlights that illness severity retains prognostic value beyond survival and may aid in identifying high-risk survivors who could benefit from early cognitive monitoring and intervention. IL-10 also emerged as a significant predictor in our model. Although IL-10 can exert anti-inflammatory or even neuroprotective effects in experimental models of acute neuroinflammation, elevated circulating IL-10 levels in sepsis primarily reflect severe immune dysregulation rather than direct protection (38, 39). Clinical studies have consistently shown that higher IL-10 concentrations correlate with greater organ failure, secondary infections, and increased mortality, indicating a state of sepsis-induced immunosuppression and more severe physiological derangement (40, 41). In this context, IL-10 likely serves as a biomarker of heightened systemic inflammation and immune paralysis—conditions known to amplify neuroinflammatory responses and hinder cognitive recovery. This interpretation aligns with our findings that higher IL-10 levels were associated with an increased risk of post-sepsis cognitive impairment.

Several methodological features of our study strengthen both the robustness of the model and its potential for real-world clinical application. By integrating a two-stage LASSO-Boruta feature selection strategy, we combined the strengths of penalized regression and iterative feature confirmation to minimize redundancy, reduce multicollinearity, and ensure the stability and clinical relevance of the retained predictors. The parallel evaluation of multiple ML algorithms further reduced model-specific bias, allowing an unbiased selection of the most reliable approach. Importantly, embedding SHAP-based interpretability addresses one of the major barriers to clinical adoption of ML—the opacity of prediction mechanisms—by providing transparent, patient-level explanations of how individual variables contribute to risk estimates. The methodological strengths of our model also translate into clinically actionable implications. Although septic shock, APACHE II, and SOFA scores primarily reflect underlying illness severity rather than directly adjustable interventions, the strong association of early physiological derangement with long-term cognitive impairment suggests that meticulous early resuscitation and organ-support strategies may indirectly contribute to neuroprotection. Together, these findings highlight several points in the acute-management trajectory where optimization may help mitigate the risk of post-sepsis cognitive decline. Although k-fold cross-validation is commonly used to enhance robustness in predictive modeling, a fixed training-validation split was intentionally adopted in this study. This design allowed us to retain an independent validation cohort for assessing discrimination, calibration, and clinical utility, as well as for illustrating SHAP-based interpretability on unseen data. Given that the primary objective of this work was to develop a clinically interpretable prediction framework rather than to optimize performance alone, preserving a hold-out validation set was considered appropriate. Nevertheless, we acknowledge that k-fold cross-validation may further improve generalizability, particularly in single-center settings. Future studies incorporating multicenter cohorts and cross-validation–based validation strategies are warranted to further evaluate model robustness and transportability. In addition to interpretability and predictive performance, the clinical applicability of ML models also depends on their generalizability across settings. Although the present model demonstrated strong discriminative performance and stability under internal validation, several issues related to generalizability should be considered. This study was conducted at a single center, and local clinical practices, patient characteristics, and sedation strategies may influence model behavior. Consequently, the current findings should be interpreted as reflecting performance within a similar clinical context rather than as universally generalizable estimates. Nevertheless, the use of conservative feature selection, independent validation splitting, and multiple complementary performance assessments was intended to reduce overfitting and enhance robustness. External validation using multicenter cohorts with standardized cognitive assessments will be essential to further evaluate model transportability and is a priority for future investigation. Several limitations should be acknowledged. First, this study included only sepsis survivors who were able to complete post-discharge cognitive assessments, which may introduce survivorship bias. Patients with extremely severe illness and early mortality were underrepresented, potentially limiting the generalizability of the findings to the entire sepsis population. Future multicenter studies are needed to validate the model across broader severity spectra. Second, this was a single-center retrospective study with internal validation only. Although multiple strategies were applied to reduce overfitting, including conservative feature selection and independent validation splitting, the absence of external validation limits the generalizability of the proposed model. Validation in multicenter cohorts with diverse clinical practices and patient populations is therefore warranted. Third, although MoCA is widely used for cognitive screening, it may not capture subtle deficits in specific domains such as processing speed or executive function. Incorporating comprehensive neuropsychological batteries or digital cognitive tools in future studies would yield more granular insights. Fourth, the follow-up duration in this study was limited to 3 months after hospital discharge, which may not fully capture the trajectory of post-sepsis cognitive impairment. Fifth, benzodiazepine use was treated as a binary variable due to limitations in the availability of detailed dosing and duration data. This simplified representation may not fully capture dose-dependent or time-dependent effects, and residual confounding by indication cannot be completely excluded. Finally, although a broad spectrum of clinical variables was incorporated, residual confounding cannot be completely excluded, particularly with regard to premorbid cognitive reserve, socioeconomic factors, and variations in sedation or hemodynamic management.

5 Conclusion

In summary, our study demonstrates that age, years of education, septic shock, benzodiazepine exposure, APACHE II and SOFA scores, and IL-10 levels are key features contributing to the model’s prediction of post-sepsis cognitive impairment. Among the ML models evaluated, the RF algorithm demonstrated the most robust and stable predictive performance. By integrating these variables into an interpretable RF framework, we were able to quantify the relative importance of each factor and enable fine-grained, individualized risk prediction to support personalized clinical management and early identification of high-risk patients. While the follow-up period was limited and external validation is needed, this model offers a promising tool for individualized risk stratification and may inform targeted strategies to mitigate neurocognitive sequelae in sepsis survivors. Future studies with extended follow-up and multicenter cohorts are warranted to refine predictive accuracy and elucidate the mechanisms underlying persistent cognitive impairment.

Statements

Data availability statement

The raw data supporting the conclusions of this article will be made available by the author, without undue reservation.

Ethics statement

The studies involving humans were approved by Ethics Committee of The Shenzhen Baoan Shiyan People’s Hospital. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

GL: Writing – original draft, Methodology, Conceptualization. ML: Writing – original draft, Formal analysis, Software, Data curation. YP: Writing – review & editing, Investigation, Supervision.

Funding

The author(s) declared that financial support was not received for this work and/or its publication.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

  • 1.

    Martin-Loeches I Singer M Leone M . Sepsis: key insights, future directions, and immediate goals. A review and expert opinion.Intensive Care Med. (2024) 50:20439. 10.1007/s00134-024-07694-z

  • 2.

    Fleischmann C Scherag A Adhikari N Hartog C Tsaganos T Schlattmann P et al Assessment of global incidence and mortality of hospital-treated sepsis. Current estimates and limitations. Am J Respir Crit Care Med. (2016) 193:25972. 10.1164/rccm.201504-0781OC

  • 3.

    Prescott H Costa D . Improving long-term outcomes after sepsis.Crit Care Clin. (2018) 34:17588. 10.1016/j.ccc.2017.08.013

  • 4.

    Iwashyna T Ely E Smith D Langa K . Long-term cognitive impairment and functional disability among survivors of severe sepsis.JAMA. (2010) 304:178794. 10.1001/jama.2010.1553

  • 5.

    Li Y Ji M Yang J . Current understanding of long-term cognitive impairment after sepsis.Front Immunol. (2022) 13:855006. 10.3389/fimmu.2022.855006

  • 6.

    Ji M Gao Y Shi C Wu X Yang J . Acute and long-term cognitive impairment following sepsis: mechanism and prevention.Expert Rev Neurother. (2023) 23:93143. 10.1080/14737175.2023.2250917

  • 7.

    Michels M Steckert A Quevedo J Barichello T Dal-Pizzol F . Mechanisms of long-term cognitive dysfunction of sepsis: from blood-borne leukocytes to glial cells.Intensive Care Med Exp. (2015) 3:30. 10.1186/s40635-015-0066-x

  • 8.

    Xin Y Tian M Deng S Li J Yang M Gao J et al The key drivers of brain injury by systemic inflammatory responses after sepsis: microglia and neuroinflammation. Mol Neurobiol. (2023) 60:136990. 10.1007/s12035-022-03148-z

  • 9.

    Calsavara A Nobre V Barichello T Teixeira A . Post-sepsis cognitive impairment and associated risk factors: a systematic review.Aust Crit Care. (2018) 31:24253. 10.1016/j.aucc.2017.06.001

  • 10.

    Lei S Li X Zhao H Feng Z Chun L Xie Y et al Risk of dementia or cognitive impairment in sepsis survivals: a systematic review and meta-analysis. Front Aging Neurosci. (2022) 14:839472. 10.3389/fnagi.2022.839472

  • 11.

    Lei J Zhai J Qi J Sun C . Identification of sepsis-associated encephalopathy biomarkers through machine learning and bioinformatics approaches.Sci Rep. (2024) 14:31717. 10.1038/s41598-024-82885-8

  • 12.

    Gao L Wang G Yang X Tong S Wang X Chen Y et al Development of a risk prediction model for sepsis-related delirium based on multiple machine learning approaches and an online calculator. PLoS One. (2025) 20:e0323831. 10.1371/journal.pone.0323831

  • 13.

    Strohl J Carrión J Huerta P . Brain imaging and machine learning reveal uncoupled functional network for contextual threat memory in long sepsis.Sci Rep. (2024) 14:27747. 10.1038/s41598-024-79259-5

  • 14.

    Strohl J Huerta T Robbiati S Huerta P . Dysregulated neural coding in the vagus nerve during long sepsis.Brain Behav Immun Health. (2025) 47:101043. 10.1016/j.bbih.2025.101043

  • 15.

    Jiang T Gradus J Rosellini A . Supervised machine learning: a brief primer.Behav Ther. (2020) 51:67587. 10.1016/j.beth.2020.05.002

  • 16.

    Gabel J Desaphy J Rognan D . Beware of machine learning-based scoring functions-on the danger of developing black boxes.J Chem Inf Model. (2014) 54:280715. 10.1021/ci500406k

  • 17.

    Rudin C . Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead.Nat Mach Intell. (2019) 1:20615. 10.1038/s42256-019-0048-x

  • 18.

    Azodi C Tang J Shiu S . Opening the black box: interpretable machine learning for geneticists.Trends Genet. (2020) 36:44255. 10.1016/j.tig.2020.03.005

  • 19.

    Bifarin O . Interpretable machine learning with tree-based shapley additive explanations: application to metabolomics datasets for binary classification.PLoS One. (2023) 18:e0284315. 10.1371/journal.pone.0284315

  • 20.

    Singer M Deutschman C Seymour C Shankar-Hari M Annane D Bauer M et al The third international consensus definitions for sepsis and septic shock (Sepsis-3). JAMA. (2016) 315:80110. 10.1001/jama.2016.0287

  • 21.

    Yu J Li J Huang X . The Beijing version of the Montreal Cognitive Assessment as a brief screening tool for mild cognitive impairment: a community-based study.BMC Psychiatry. (2012) 12:156. 10.1186/1471-244x-12-156

  • 22.

    Sautin Y Johnson R . Uric acid: the oxidant-antioxidant paradox.Nucleosides Nucleotides Nucleic Acids. (2008) 27:60819. 10.1080/15257770802138558

  • 23.

    Zhu Z He C Yao H Liao G Gan Y Deng L . Assessment of the combined utility of S100B, GFAP, and IL-6 in predicting long-term cognitive impairment in survivors of sepsis.Neurol Res. (2025) 47:98190. 10.1080/01616412.2025.2511084

  • 24.

    Jin J Yu L Zhou Q Zeng M . Improved prediction of sepsis-associated encephalopathy in intensive care unit sepsis patients with an innovative nomogram tool.Front Neurol. (2024) 15:1344004. 10.3389/fneur.2024.1344004

  • 25.

    Zhao Q Xiao J Liu X Liu H . The nomogram to predict the occurrence of sepsis-associated encephalopathy in elderly patients in the intensive care units: a retrospective cohort study.Front Neurol. (2023) 14:1084868. 10.3389/fneur.2023.1084868

  • 26.

    Seidel G Gaser C Götz T Günther A Hamzei F . Accelerated brain ageing in sepsis survivors with cognitive long-term impairment.Eur J Neurosci. (2020) 52:4395402. 10.1111/ejn.14850

  • 27.

    Manabe T Heneka M . Cerebral dysfunctions caused by sepsis during ageing.Nat Rev Immunol. (2022) 22:44458. 10.1038/s41577-021-00643-7

  • 28.

    Overton M Pihlsgård M Elmståhl S . Prevalence and incidence of mild cognitive impairment across subtypes, age, and sex.Dement Geriatr Cogn Disord. (2019) 47:21932. 10.1159/000499763

  • 29.

    Lövdén M Fratiglioni L Glymour M Lindenberger U Tucker-Drob E . Education and cognitive functioning across the life span.Psychol Sci Public Interest. (2020) 21:641. 10.1177/1529100620920576

  • 30.

    Wilson R Yu L Lamar M Schneider J Boyle P Bennett D . Education and cognitive reserve in old age.Neurology. (2019) 92:e104150. 10.1212/wnl.0000000000007036

  • 31.

    Concha-Cisternas Y Castro-Piñero J Petermann-Rocha F Troncoso-Pantoja C Díaz X Cigarroa I et al [Association between educational level and suspicion of cognitive imparirment in Chilean older people]. Rev Med Chil. (2022) 150:157584. 10.4067/s0034-98872022001201575

  • 32.

    Polito A Eischwald F Maho A Polito A Azabou E Annane D et al Pattern of brain injury in the acute setting of human septic shock. Crit Care. (2013) 17:R204. 10.1186/cc12899

  • 33.

    Götz T Günther A Witte O Brunkhorst F Seidel G Hamzei F . Long-term sequelae of severe sepsis: cognitive impairment and structural brain alterations - an MRI study (LossCog MRI).BMC Neurol. (2014) 14:145. 10.1186/1471-2377-14-145

  • 34.

    Goldberg T Chen C Wang Y Jung E Swanson A Ing C et al Association of delirium with long-term cognitive decline: a meta-analysis. JAMA Neurol. (2020) 77:137381. 10.1001/jamaneurol.2020.2273

  • 35.

    Pandharipande P Shintani A Peterson J Pun B Wilkinson G Dittus R et al Lorazepam is an independent risk factor for transitioning to delirium in intensive care unit patients. Anesthesiology. (2006) 104:216. 10.1097/00000542-200601000-00005

  • 36.

    Stewart S . The effects of benzodiazepines on cognition.J Clin Psychiatry. (2005) 66:913.

  • 37.

    Li D Wei Y Zhang C Yang Y Wang Z Lu Y et al Value of SOFA score, APACHE II score, and WBC count for mortality risk assessment in septic patients: a retrospective study. Medicine. (2025) 104:e42464. 10.1097/md.0000000000042464

  • 38.

    Richwine A Sparkman N Dilger R Buchanan J Johnson R . Cognitive deficits in interleukin-10-deficient mice after peripheral injection of lipopolysaccharide.Brain Behav Immun. (2009) 23:794802. 10.1016/j.bbi.2009.02.020

  • 39.

    Candelli M Sacco Fernandez M Rozzi G Sodero G Piccioni A Pignataro G et al The interleukin network in sepsis: from cytokine storm to clinical applications. Diagnostics. (2025) 15:2927. 10.3390/diagnostics15222927

  • 40.

    Li X Xu Z Pang X Huang Y Yang B Yang Y et al Interleukin-10/lymphocyte ratio predicts mortality in severe septic patients. PLoS One. (2017) 12:e0179050. 10.1371/journal.pone.0179050

  • 41.

    Gogos C Drosou E Bassaris H Skoutelis A . Pro- versus anti-inflammatory cytokine profile in patients with severe sepsis: a marker for prognosis and future therapeutic options.J Infect Dis. (2000) 181:17680. 10.1086/315214

Summary

Keywords

cognitive impairment, interpretable machine learning, random forest, sepsis, SHapley Additive exPlanations

Citation

Liang G, Lu M and Pan Y (2026) Development and validation of an interpretable machine learning model for predicting cognitive impairment in patients with sepsis. Front. Med. 12:1753260. doi: 10.3389/fmed.2025.1753260

Received

24 November 2025

Revised

23 December 2025

Accepted

23 December 2025

Published

21 January 2026

Volume

12 - 2025

Edited by

Patricio Huerta, Feinstein Institute for Medical Research, United States

Reviewed by

Joshua James Strohl, Feinstein Institute for Medical Research, United States

Zhengqiu Yu, Xiamen University, China

Updates

Copyright

*Correspondence: Guisheng Liang,

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Outline

Figures

Cite article

Copy to clipboard


Export citation file


Share article

Article metrics