
ORIGINAL RESEARCH article

Front. Med., 29 December 2025

Sec. Intensive Care Medicine and Anesthesiology

Volume 12 - 2025 | https://doi.org/10.3389/fmed.2025.1668593

This article is part of the Research Topic: Predictive Models in Non-Invasive Respiratory Support: Insights from Clinical and Machine Learning Models.

Development and validation of a risk prediction model for consciousness disorders in stroke patients in the intensive care unit (ICU): a retrospective study

Gang Fang1, Liping Wang2, Xinhua Liu1, Jinyu Liu2, Yongle Pei1, Yuxia Qi2 and Haixia Chang2*
  • 1School of Nursing, Xinjiang Medical University, Ürümqi, China
  • 2Department of Nursing, The Fifth Affiliated Hospital of Xinjiang Medical University, Ürümqi, China

Objective: Using data from stroke patients in the Medical Information Mart for Intensive Care (MIMIC) database, we developed and validated risk prediction models for consciousness disorders with 11 machine learning algorithms, aiming to provide a basis for the clinical assessment of consciousness changes in stroke patients.

Methods: Data of 2,434 stroke patients were extracted from the MIMIC-IV database and randomly split into a training set and a validation set at a 7:3 ratio. Multivariate logistic regression was employed to identify independent predictors, and 11 machine learning algorithms were used to construct predictive models for post-stroke consciousness disorders. Calibration curves were applied to validate the calibration performance of the models, while decision curve analysis (DCA) was utilized to evaluate their clinical applicability, ultimately determining the optimal predictive model.

Results: A total of 2,434 ICU stroke patients were included, with 1,706 assigned to the training set and 728 to the validation set. Logistic regression analysis identified four independent predictors (all p < 0.001): length of hospital stay (odds ratio [OR] = 1.04, 95% confidence interval [CI]: 1.02–1.06), mechanical ventilation (OR = 0.45, 95% CI: 0.29–0.72), nasogastric tube (OR = 2.47, 95% CI: 1.61–3.79), and Sequential Organ Failure Assessment (SOFA) score (OR = 1.60, 95% CI: 1.47–1.74). Among the 11 machine learning models, the Light Gradient Boosting Machine (LightGBM) model exhibited the optimal performance across three dimensions: accuracy (area under the curve [AUC] = 0.824 in the training set, AUC = 0.795 in the validation set), stability (consistency between training and validation set results), and probability calibration (Brier score = 0.132 in the training set, Brier score = 0.140 in the validation set). Calibration curves demonstrated excellent agreement between the model’s predictions and ideal values in both datasets, and DCA confirmed its favorable clinical utility.

Conclusion: Multivariate analysis revealed that length of hospital stay, mechanical ventilation, nasogastric tube, and SOFA score are independent predictors of consciousness disorders in ICU stroke patients. The model constructed using the LightGBM algorithm showed the best comprehensive performance and can serve as an intuitive, personalized clinical tool. It assists healthcare providers in the early identification and risk stratification of stroke patients at high risk of consciousness disorders, thereby supporting the timely implementation of interventions to reduce the incidence of complications.

1 Introduction

Stroke patients are highly prone to consciousness disorders, which present as lethargy, confusion, agitation, or coma. These conditions can precipitate complications including pressure ulcers, urinary tract infections, aspiration pneumonia, and hypostatic pneumonia—markedly compromising patients’ quality of life and elevating their risk of mortality (1). According to existing literature, 4–38% of stroke patients may suffer from varying degrees of coma, whereas 13–48% exhibit symptoms of confusion or delirium (2). Delayed intervention for post-stroke consciousness disorders can further result in aphasia, dysphagia, and pulmonary infections, which in turn prolong the rehabilitation period, increase medical expenditures, impede functional recovery and prognostic improvement, and even pose life-threatening hazards (3). Notably, stroke patients with altered consciousness typically require admission to the Intensive Care Unit (ICU) for treatment (4), and those with concurrent consciousness disorders have a substantially higher mortality rate compared to patients with normal consciousness (5). Therefore, the timely identification and intervention of consciousness changes in post-stroke patients are crucial for optimizing patient prognosis.

A predictive model refers to a tool that estimates the probability of prognostic events by leveraging clinicopathological parameters (6). While prior studies have examined the influencing factors of post-stroke delirium and established predictive models for this condition (7–10), a critical gap remains: there is an urgent need to utilize multicenter, large-sample datasets and integrate multiple machine learning algorithms to develop an optimal predictive model tailored specifically to post-stroke consciousness disorders.

Jointly developed by the Massachusetts Institute of Technology (MIT) and Beth Israel Deaconess Medical Center (BIDMC), the Medical Information Mart for Intensive Care IV (MIMIC-IV) database houses comprehensive clinical data from nearly 70,000 ICU patients treated at BIDMC between 2008 and 2019. Its core data domains encompass demographic characteristics, vital sign records, laboratory test results, medication regimens, medical device utilization (e.g., mechanical ventilators, nasogastric tubes), International Classification of Diseases (ICD) codes, and in-hospital outcomes (e.g., in-hospital death, discharge location) (11). Owing to its large sample size, diverse data categories, well-structured format, and accessibility for retrieval and analysis, MIMIC-IV has become a widely adopted resource in research focused on critical illnesses such as stroke and sepsis (12).

In this retrospective study, we developed a predictive model for post-stroke consciousness disorders by analyzing potential risk factors extracted from the MIMIC-IV database.

2 Materials and methods

2.1 Data source

A retrospective cohort analysis was performed using data extracted from the MIMIC-IV Version 3.1 database (13). Gang Fang (ID: 69157712), an author of this study, completed the required registration process and all mandatory training modules through the National Institutes of Health (NIH) Collaborative Agreement Online platform. Access was granted following approval by the Institutional Review Board (IRB) of MIT, and he maintains valid authorization to access, retrieve, and validate data within the MIMIC-IV database.

2.2 Study population

Inclusion Criteria: Patient data were retrieved from the MIMIC-IV database in accordance with the following criteria: (1) All participants met the diagnostic criteria for ischemic stroke as defined in the Chinese Guidelines for the Diagnosis and Treatment of Acute Ischemic Stroke (2023) (14), or the diagnostic criteria for hemorrhagic stroke as outlined in the Chinese Guidelines for the Diagnosis and Treatment of Intracerebral Hemorrhage (2019) (15); (2) Age ≥ 18 years; (3) Onset of consciousness disorders occurring within 24 h following stroke onset.

Exclusion Criteria: (1) Age < 18 years (n = 0); (2) Comorbid conditions that could interfere with the assessment of consciousness disorders (e.g., dementia, pre-stroke consciousness impairment) (n = 326); (3) A history of cardiopulmonary resuscitation (n = 20); (4) Missing data accounting for more than 20% of total variables (n = 2).

The patient selection workflow is depicted in Figure 1. Previous studies have established the Glasgow Coma Scale (GCS) as the gold standard for evaluating patients’ level of consciousness (16). In the current study, the GCS was utilized to assess patients’ consciousness status at the time of hospital admission, with consciousness categorized into two distinct groups: conscious and consciousness-impaired. The GCS score spans a range of 3 to 15 points: a score of 15 denotes a fully conscious state, whereas a score below 15 indicates the presence of consciousness disorders (17). Consistent with this classification, patients in the present study with a GCS score < 15 were allocated to the consciousness disorders group, while those with a GCS score = 15 were assigned to the non-consciousness disorders group.
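The GCS-based dichotomization described above is simple enough to state directly in code; a minimal sketch (illustrative Python, not the authors' analysis code):

```python
def classify_consciousness(gcs_score: int) -> str:
    """Dichotomize the Glasgow Coma Scale per the study:
    a score of 15 denotes full consciousness; 3-14 indicates
    consciousness disorders."""
    if not 3 <= gcs_score <= 15:
        raise ValueError("GCS scores range from 3 to 15")
    return ("non-consciousness disorders" if gcs_score == 15
            else "consciousness disorders")
```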

Figure 1. Flow diagram of patient selection: 2,782 stroke records were extracted from the MIMIC-IV v3.1 database; 348 patients were excluded (age under 18, conditions such as dementia, prior cardiopulmonary resuscitation, or excess missing data), leaving 2,434 patients, divided into a training set (n = 1,706) and a validation set (n = 728).

2.3 Statistical analysis

Statistical analyses were performed using SPSS 27.0 and R 4.4.3 software. Normally distributed quantitative data were expressed as mean ± standard deviation (SD), while non-normally distributed quantitative data were described using median and interquartile range (IQR). Categorical variables were presented as frequencies or percentages. For intergroup comparisons, the independent-samples t-test or Mann–Whitney U test was used for continuous variables, and the chi-square test or Fisher’s exact test was applied for categorical variables. Variables with a missing value rate exceeding 20% were directly excluded. For variables with a missing value rate of ≤20%—including fasting blood glucose, potassium, and sodium—simple mean imputation was applied to address missing values. The dataset was randomly divided into a training set and a validation set at a 7:3 ratio.
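Although the study used SPSS and R, the mean imputation and 7:3 split can be sketched in a few lines (illustrative Python; the function names and seed are ours, so the exact subset sizes may differ slightly from the reported 1,706/728):

```python
import random
from statistics import mean

def impute_mean(values):
    """Replace missing entries (None) with the mean of observed values,
    as applied here to fasting blood glucose, potassium, and sodium."""
    m = mean(v for v in values if v is not None)
    return [m if v is None else v for v in values]

def split_70_30(records, seed=42):
    """Randomly partition records into a 7:3 training/validation split."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]
```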

Univariate and multivariate logistic regression analyses were used to identify risk factors associated with post-stroke consciousness disorders, which informed the construction of the predictive model. Receiver operating characteristic (ROC) curves were employed to evaluate model performance, with the area under the curve (AUC) calculated to quantify the diagnostic efficacy of the models. Calibration curves were utilized to verify the calibration of the models, while decision curve analysis (DCA) was performed to assess their clinical utility. Detailed procedures for generating the DCA curves are provided in Supplementary Material S1. The level of statistical significance was set at p < 0.05.
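The net benefit plotted on a DCA curve has a standard closed form, NB(pt) = TP/N − (FP/N) · pt/(1 − pt), where pt is the risk threshold; a compact sketch (illustrative Python):

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of a model at risk threshold `threshold`:
    true positives minus false positives, the latter weighted by the
    odds of the threshold, all scaled by sample size."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)
```

Sweeping `threshold` over a range and comparing against the "treat all" and "treat none" curves reproduces the shape of a DCA plot.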

Eleven machine learning algorithms were employed for model construction and validation, encompassing Decision Tree (DT), Ridge Regression, Elastic Net (ENet), Logistic Regression (LR), Least Absolute Shrinkage and Selection Operator (LASSO) Regression, Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Gradient Boosting Machine (GBM), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP). Five-fold cross-validation and hyperparameter tuning were implemented to optimize model performance.

Evaluation metrics included the Area Under the Receiver Operating Characteristic Curve (AUC), accuracy, F1-score, and Brier score, all of which were utilized to comprehensively assess the predictive capability of each model. Calibration curves and Decision Curve Analysis (DCA) were separately applied to verify the calibration effectiveness and clinical applicability of the models, ultimately identifying the optimal one.
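Of these metrics, the Brier score is the one that directly assesses probability calibration; it is the mean squared difference between predicted probabilities and observed outcomes (a sketch in illustrative Python):

```python
def brier_score(y_true, y_prob):
    """Mean squared error of probability forecasts for binary outcomes:
    0 is perfect, and an uninformative constant 0.5 forecast scores 0.25;
    lower is better."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)
```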

SHAP (SHapley Additive exPlanations) is a game-theoretic approach designed to interpret the outputs of machine learning models. By leveraging the classic Shapley value from game theory and its relevant extensions, it integrates optimal contribution allocation with local interpretations. In essence, SHAP values can quantify the contribution of each feature to the individual prediction results of a single sample as well as the overall model output. This enables us to conduct a clear and consistent ranking of feature importance based on their marginal impacts.
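The marginal-contribution idea behind SHAP can be made concrete on a toy value function: for a handful of features, exact Shapley values can be computed by brute force over all orderings (illustrative Python; real analyses use the shap package's efficient tree-based estimators rather than this enumeration):

```python
from itertools import permutations

def shapley_values(features, value_fn):
    """Exact Shapley values for a small feature set: average each
    feature's marginal contribution to value_fn over every ordering."""
    names = list(features)
    phi = {f: 0.0 for f in names}
    orderings = list(permutations(names))
    for order in orderings:
        included = set()
        for f in order:
            before = value_fn(included)
            included.add(f)
            phi[f] += value_fn(included) - before
    return {f: total / len(orderings) for f, total in phi.items()}
```

For an additive value function, each feature's Shapley value recovers its weight exactly, and the values always sum to v(all features) − v(∅), the efficiency property that makes SHAP-based importance rankings consistent.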

2.4 Machine learning model training and hyperparameter optimization

Regarding the LightGBM model, the specific parameters employed were as follows: the number of boosting iterations (n_estimators) was set to 1,000; the learning rate was 0.05; the maximum depth (max_depth) was configured to 7; and the number of leaf nodes (num_leaves) was restricted to 31. Importantly, an early stopping criterion (tolerance of 50 rounds) was applied on a held-out validation set (20% of the training data) to mitigate overfitting and determine the optimal number of boosting rounds; the model achieving the best performance on this validation set was selected as the final model for subsequent evaluations. The complete parameters of all other models (e.g., Random Forest, Support Vector Machine, Logistic Regression) are tabulated in Table 1, enabling other researchers to replicate the model training process precisely.
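The reported configuration maps onto the Python LightGBM API roughly as follows (a sketch only; the authors' training script is not provided, and the commented fit call assumes the lightgbm package is installed):

```python
# Reported LightGBM hyperparameters, expressed with the Python API's
# parameter names (n_estimators is an upper bound; early stopping
# picks the actual number of boosting rounds).
lightgbm_params = {
    "n_estimators": 1000,
    "learning_rate": 0.05,
    "max_depth": 7,
    "num_leaves": 31,
}
early_stopping_rounds = 50  # stop when the 20% held-out split stops improving

# With lightgbm installed, training would look roughly like:
# import lightgbm
# model = lightgbm.LGBMClassifier(**lightgbm_params)
# model.fit(X_train, y_train, eval_set=[(X_val, y_val)],
#           callbacks=[lightgbm.early_stopping(early_stopping_rounds)])
```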


Table 1. Relevant parameters of various machine learning models.

3 Results

3.1 General characteristics

A total of 2,434 ICU stroke patients were included in this study, among whom 1,802 (74.0%) developed consciousness disorders within 24 h of stroke onset. Table 2 presents the demographic and clinical baseline characteristics of the patients stratified by consciousness status, while Table 3 provides detailed results of univariate analysis—identifying variables with statistically significant differences between the consciousness disorder group and non-consciousness disorder group (all p < 0.05).


Table 2. The characteristics of patients included in the training set (stratified by risk of altered consciousness).


Table 3. Univariate and multivariate logistic regression analyses in the training set.

Univariate analysis revealed distinct clinical profiles between the two groups:

Comorbidity-related variables: Diabetes was less frequent in the consciousness-disorder group, whereas cerebrovascular disease was substantially more prevalent in this group; malignant tumor was less common in patients with consciousness disorders.

Supportive intervention variables: Mechanical ventilation, antibiotic use, urinary catheterization, and nasogastric tube placement were all significantly more frequent in the consciousness-disorder group, with nasogastric tube use and mechanical ventilation showing the strongest univariate associations with altered consciousness.

Physiological and laboratory variables: Patients with consciousness disorders had longer hospital stays, higher Simplified Acute Physiology Score II (SAPS II) and Sequential Organ Failure Assessment (SOFA) scores, slightly higher body temperatures, and higher respiratory rates. Laboratory findings indicated that platelet counts and serum sodium levels were higher, while serum potassium levels were lower in the consciousness-disorder group compared to the non-consciousness-disorder group.

Specific variables with significant univariate differences (all p < 0.05) are listed in Table 3, including hospital length of stay, diabetes, cerebrovascular disease, malignant tumor, mechanical ventilation, antibiotic use, urinary catheterization, nasogastric tube, SAPS II score, SOFA score, body temperature, respiratory rate, platelet count, serum potassium level, and serum sodium level.

3.2 Multivariate logistic regression analysis

Multivariate logistic regression analysis, adjusted for potential confounders, confirmed four variables as independent predictors of consciousness disorders in ICU stroke patients (all p < 0.001; Table 3):

(1) Mechanical ventilation (OR = 0.45; 95% CI: 0.29–0.72);

(2) Hospital length of stay (OR = 1.04; 95% CI: 1.02–1.06);

(3) Nasogastric tube (OR = 2.47; 95% CI: 1.61–3.79);

(4) SOFA score (OR = 1.60; 95% CI: 1.47–1.74).
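These odds ratios act multiplicatively on a patient's odds, not directly on the probability; a small helper makes the conversion explicit (illustrative Python; the 50% baseline risk in the example is hypothetical):

```python
def update_risk(baseline_prob, odds_ratio, delta=1):
    """Multiply baseline odds by odds_ratio**delta (delta = change in the
    predictor, e.g. SOFA points) and convert back to a probability."""
    odds = baseline_prob / (1 - baseline_prob)
    new_odds = odds * odds_ratio ** delta
    return new_odds / (1 + new_odds)
```

For example, starting from a hypothetical 50% baseline risk, a 2-point SOFA increase (OR = 1.60 per point) multiplies the odds by 1.60² = 2.56, giving a risk of about 72%.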

3.3 Development and validation of machine learning prediction models

Four variables with p < 0.001 from the multivariate analysis (length of hospital stay, mechanical ventilation, nasogastric tube, and SOFA score) were used to compare the performance of 11 machine learning models in the training and validation sets. Evaluation metrics included AUC, accuracy, sensitivity, specificity, F1 score, and Brier score (Table 4).


Table 4. Comparison of performance metrics among different machine learning models.

3.3.1 Performance on the training set

AUC: The Random Forest (RF) model performed best (0.900), markedly outperforming the other models; K-Nearest Neighbors (KNN) (0.857) and Decision Tree (DT) (0.828) followed in sequence.

Accuracy: The Elastic Net (ENet) model achieved the highest accuracy (0.788), closely followed by RF (0.787) and Multi-Layer Perceptron (MLP) (0.775).

Sensitivity: The ENet model exhibited the best sensitivity (0.892), with Ridge Regression (0.861) and DT (0.863) reaching comparable levels.

Specificity: The RF model had the highest specificity (0.900), with KNN (0.845) ranking second.

F1 Score: The DT model obtained the highest F1 Score (0.865), while ENet (0.862) and XGBoost (0.843) also performed well.

Brier Score: The DT and KNN models tied for the lowest Brier Score (0.126), indicating the highest accuracy in probability prediction; SVM had the highest Brier Score (0.191).

3.3.2 Performance on the validation set

AUC: The LightGBM model performed optimally (0.795), followed by XGBoost (0.785) and DT (0.779).

Accuracy: The ENet model achieved the highest accuracy (0.786), with DT (0.772) and MLP (0.762) showing stable performance.

Sensitivity: The ENet model remained the top performer in sensitivity (0.895), while Ridge Regression (0.867) and MLP (0.821) maintained high levels.

Specificity: The KNN model had the highest specificity (0.731), followed by Logistic Regression (0.715).

F1 Score: The MLP model obtained the highest F1 Score (0.835), with LightGBM and XGBoost (both 0.827) showing stable performance.

Brier Score: The LightGBM model had the lowest Brier Score (0.140), indicating the highest accuracy in probability prediction; SVM still had the highest Brier Score (0.195).

3.3.3 Key conclusions

The RF model exhibited the best comprehensive performance on the training set, but its AUC dropped to 0.773 on the validation set—suggesting a tendency toward overfitting. The LightGBM model achieved the highest AUC (0.795) and the lowest Brier score (0.140) on the validation set, demonstrating optimal generalization ability and stability in probability prediction. The ENet model showed outstanding sensitivity (0.892 in the training set, 0.895 in the validation set) but the lowest specificity (0.490 in the training set, 0.482 in the validation set), indicating that its gain in sensitivity came at the cost of specificity. The SVM model had the highest Brier score in both sets, indicating the poorest performance in probability prediction. Details are shown in Figures 2–5 and Table 4.


Figure 2. Comparison of ROC curves of different machine learning models in the training (A) and validation (B) sets. AUC, area under the ROC curve; DT, decision tree; ENet, elastic net; KNN, K-nearest neighbors; Lightgbm, light gradient boosting machine; RF, random forest; Xgboost, eXtreme gradient boosting; SVM, support vector machine; MLP, multi-layer perceptron.


Figure 3. Calibration curves of different machine learning models in the training set (A) and validation set (B). AUC, area under the ROC curve; DT, decision tree; ENet, elastic net; KNN, K-nearest neighbors; Lightgbm, light gradient boosting machine; RF, random forest; Xgboost, eXtreme gradient boosting; SVM, support vector machine; MLP, multi-layer perceptron.


Figure 4. Decision curve analysis (DCA) curves of different machine learning models in the training set (A) and validation set (B). AUC, area under the ROC curve; DT, decision tree; ENet, elastic net; KNN, K-nearest neighbors; Lightgbm, light gradient boosting machine; RF, random forest; Xgboost, eXtreme gradient boosting; SVM, support vector machine; MLP, multi-layer perceptron.


Figure 5. Parallel coordinate plots of different machine learning models in the training set (A) and validation set (B). AUC, area under the ROC curve; DT, decision tree; ENet, elastic net; KNN, K-nearest neighbors; Lightgbm, light gradient boosting machine; RF, random forest; Xgboost, eXtreme gradient boosting; SVM, support vector machine; MLP, multi-layer perceptron.

Our Decision Curve Analysis (DCA) results demonstrated that the model yields positive net benefits within a clinically relevant risk threshold range of approximately 30 to 90%. This indicates that within this probability threshold, utilizing our model to guide clinical decision-making—such as initiating specific interventions or further diagnostic evaluations—can lead to superior patient outcomes compared to the strategies of “treating all patients” or “treating no patients,” as detailed in Figure 4.

In the present study, we specifically employed the SHAP (SHapley Additive exPlanations) framework to interpret the predictions generated by the LightGBM model. The summary plot in Figure 6 illustrates the mean absolute SHAP values of each feature across the entire dataset, providing a global-level explanation of which factors the model deems most critical for predictions. Beyond presenting the ranking of feature importance, this plot intuitively visualizes the direction of feature impacts through a color gradient—i.e., whether higher feature values drive the prediction results in a positive or negative direction.

Figure 6 comprises a bar plot (A) ranking features by mean absolute SHAP value (SOFA score highest, followed by hospital length of stay, nasogastric tube, and mechanical ventilation) and a summary plot (B) showing the SHAP value distribution for each feature, colored by feature value from low (blue) to high (red).

Figure 6. Ranking of predictive factors by weight using the SHAP method.

4 Discussion

This retrospective study analyzed data from 2,434 stroke patients in the MIMIC-IV Version 3.1 database to evaluate the risk of consciousness disorders, focusing on four key factors: length of hospital stay, mechanical ventilation, nasogastric tube, and Sequential Organ Failure Assessment (SOFA) score. The findings aim to assist healthcare providers in identifying high-risk patients and formulating personalized treatment strategies.

Consciousness disorders may induce complications including dysphagia, pulmonary infections, and dyspnea—risks particularly prominent in ICUs with extensive use of mechanical ventilation. While previous studies have confirmed a significant association between mechanical ventilation and consciousness disorders (18), the present study identifies mechanical ventilation as a protective factor against these disorders in stroke patients: the odds of consciousness disorders in ventilated patients were 0.45 times those in non-ventilated patients. This aligns with prior research showing that early ventilation rapidly corrects hypoxemia and alleviates cerebral hypoxia (19), likely by precisely regulating oxygen concentration and ventilation parameters to prevent hypoxia-induced metabolic disturbances and neuronal damage (19), ultimately reducing the risk of consciousness disorders. Notably, mechanical ventilation lasting over 24 h is classified as prolonged (20), and long-term use carries risks: it may induce ventilator-associated lung injury or even pulmonary fibrosis (21), which can further cause dyspnea and cerebral hypoxia. These outcomes lead to cerebral cell hypoxia, subsequent cell death, and impaired cerebral function—eventually contributing to consciousness disorders (22).

ICU inpatients often experience complications (e.g., hypotension, respiratory depression, hallucinations, cerebral hypoperfusion, neurological dysfunction) that prolong hospital stays and elevate the risk of consciousness disorders (23). The present study found that each additional day of hospitalization multiplied the odds of consciousness disorders by 1.04, consistent with Chen et al. (24), who reported that longer stays correlate with poorer prognoses. This may stem from the severe baseline condition of ICU patients: extended hospital stays increase their exposure to multiple pathogens, thereby raising the probability of developing consciousness disorders (25). Thus, clinical teams should prioritize optimizing hospital length of stay by streamlining diagnostic workflows, individualizing therapeutic plans, and enhancing nursing interventions to accelerate patient recovery and reduce consciousness disorder risk.

Regarding nasogastric tube use, the odds of consciousness disorders were 2.47-fold higher in ICU patients with such tubes than in those without. Nasogastric tubes are typically used in stroke patients with dysphagia to prevent aspiration and provide nutritional support (26); however, dysphagia itself is linked to brainstem or cortical dysfunction, giving these patients an inherently higher baseline risk of consciousness disorders. Even with tube placement, improper care (e.g., overly rapid feeding, inappropriate patient positioning) can cause regurgitation and aspiration of gastric contents, leading to aspiration pneumonia (27). Inflammatory responses, hypoxemia, or sepsis resulting from pulmonary infections may exacerbate consciousness disorders by impairing cerebral perfusion or inducing systemic inflammatory response syndrome (SIRS). Nursing staff should therefore closely monitor patients’ swallowing function and consider early tube removal once function recovers.

For the SOFA score, each 1-point increase multiplied the odds of consciousness disorders by 1.60. The SOFA score quantifies the severity of organ dysfunction by assessing six organ systems (respiratory, coagulation, hepatic, cardiovascular, central nervous, renal), with each system scored on a 0–4 scale and a total score ranging from 0 to 24 (28); higher scores indicate more severe organ dysfunction or failure (29), and accordingly, a greater likelihood of consciousness disorders. Clinically, it is necessary to promptly perform SOFA scoring for ICU stroke patients and formulate individualized preventive measures in advance based on the results.

In this study, 11 machine learning algorithms were employed to construct risk prediction models for post-stroke consciousness disorders. The results revealed significant differences in performance across algorithms, which reflects the unique characteristics and applicable scenarios of machine learning models in clinical prediction tasks. Among traditional algorithms, logistic regression offers the advantage of strong interpretability (30); however, its AUC values in the training and validation sets were lower than those of most ensemble learning algorithms, indicating that logistic regression has limited ability to fit and generalize complex clinical data. The decision tree model exhibited excellent performance in the training set, but its AUC decreased significantly in the validation set—exposing the limitations of single decision trees, such as susceptibility to overfitting and poor stability.

Among ensemble learning algorithms, the Light Gradient Boosting Machine (LightGBM) demonstrated the optimal comprehensive performance: it achieved the highest AUC in the validation set, with the smallest difference in Brier scores between the training and validation sets. This indicates that through gradient boosting and feature parallel optimization, LightGBM not only avoids the overfitting issue observed in the Random Forest model but also overcomes the limitation of the Elastic Net model—where high sensitivity is accompanied by low specificity. Thus, LightGBM achieves a balance between discriminative ability, stability, and probability calibration. In contrast, the Support Vector Machine (SVM) model performed poorly in both datasets. This is presumably because SVM has insufficient ability to fit the nonlinear relationships in high-dimensional clinical data and is highly sensitive to differences in sample distribution, preventing it from adapting to the complex association patterns of risk factors in this study.

As reported in the Results, the DCA showed that the model delivers positive net benefits across risk thresholds of approximately 30 to 90%. This range holds substantial clinical significance, as the threshold probability reflects the point at which clinicians or patients judge the benefits of an intervention to outweigh its potential harms. For instance, if the model-predicted risk for a given patient exceeds a threshold chosen within this range, prophylactic intervention may be recommended.

In summary, consciousness disorders in ICU stroke patients are associated with length of hospital stay, mechanical ventilation, nasogastric tube use, and SOFA score. Early assessment of these risk factors and proactive prevention of post-stroke consciousness disorders are therefore of paramount clinical importance. Machine learning algorithms provide diverse tools for predicting the risk of these disorders, and ensemble algorithms such as LightGBM offer distinct advantages in handling multi-factor interactions and improving model generalization. However, the clinical application of such models still requires simplification of operational workflows based on real-world scenarios (e.g., conversion into visual nomograms or online calculators) to enhance their utility in primary care settings.

This study has several limitations. First, its retrospective observational design may introduce selection bias; excluding patients with missing variables outright could bias the outcomes; and data collection was limited to the early stage of ICU admission, so the impact of dynamic changes in patients' conditions on the risk of consciousness disorders was not captured. Second, although the MIMIC-IV database is a large-scale multicenter resource, this study did not perform external multicenter validation. Third, the Glasgow Coma Scale (GCS) and the SOFA score share a small number of overlapping items, since the GCS contributes to the neurological component of the SOFA score; although the SOFA score is a composite indicator consisting mostly of non-overlapping items, data constraints of the MIMIC database prevented the identification of a more suitable alternative indicator. Finally, research on the subacute phase of stroke was not feasible given the data available in the MIMIC database.

To address these limitations, future studies will integrate prospectively collected clinical data, and independent multicenter studies are still needed to verify the generalizability of the established model.

5 Conclusion

This study identified length of hospital stay, mechanical ventilation, nasogastric tube use, and SOFA score as independent risk factors for consciousness disorders in ICU stroke patients. Among the 11 machine learning algorithms evaluated, LightGBM performed best across three key dimensions: accuracy (assessed by AUC), stability (consistency between training and validation set results), and probability calibration (assessed by the Brier score). Balancing performance and efficiency, this algorithm can serve as an intuitive, personalized clinical tool to help clinicians identify and risk-stratify stroke patients at high risk of consciousness disorders in a timely manner, enabling prompt clinical interventions that reduce the incidence of complications and improve patient outcomes.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

Ethical approval was not required for the study involving humans in accordance with the local legislation and institutional requirements. Written informed consent to participate in this study was not required from the participants or the participants’ legal guardians/next of kin in accordance with the national legislation and the institutional requirements.

Author contributions

GF: Data curation, Writing – review & editing, Formal analysis, Validation, Software, Writing – original draft. LW: Writing – review & editing, Funding acquisition. XL: Investigation, Writing – review & editing. JL: Writing – review & editing, Investigation. YP: Software, Writing – review & editing. YQ: Writing – review & editing, Investigation. HC: Writing – review & editing, Supervision.

Funding

The author(s) declared that financial support was received for this work and/or its publication. This study was supported by the General Project of the Xinjiang Key Laboratory of Neurological Diseases (a key laboratory of the Xinjiang Uygur Autonomous Region) (Grant no. XJDX1711-2436). The funder had no role in the design of the study, data collection and analysis, interpretation of results, or writing of the manuscript.

Acknowledgments

The authors sincerely thank the research team of the MIMIC database for establishing and maintaining this valuable public resource, which made this study possible. Additionally, the authors thank all colleagues from the School of Nursing at Xinjiang Medical University and the Department of Nursing at The Fifth Affiliated Hospital of Xinjiang Medical University for their constructive suggestions during the research process.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that Generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmed.2025.1668593/full#supplementary-material


Keywords: consciousness disorders, intensive care unit (ICU), machine learning, risk prediction model, stroke

Citation: Fang G, Wang L, Liu X, Liu J, Pei Y, Qi Y and Chang H (2025) Development and validation of a risk prediction model for consciousness disorders in stroke patients in the intensive care unit (ICU): a retrospective study. Front. Med. 12:1668593. doi: 10.3389/fmed.2025.1668593

Received: 18 July 2025; Revised: 30 November 2025; Accepted: 30 November 2025;
Published: 29 December 2025.

Edited by:

Denise Battaglini, University of Genoa, Italy

Reviewed by:

Kais Gadhoumi, Duke University, United States
José Eduardo Guimarães Pereira, Hospital Central do Exército, Brazil

Copyright © 2025 Fang, Wang, Liu, Liu, Pei, Qi and Chang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Haixia Chang, 13899991389@163.com
