Machine Learning Prediction Models for Mechanically Ventilated Patients: Analyses of the MIMIC-III Database

Background: Mechanically ventilated patients in the intensive care unit (ICU) have high mortality rates. Several prediction scores, such as the Simplified Acute Physiology Score II (SAPS II), the Oxford Acute Severity of Illness Score (OASIS), and the Sequential Organ Failure Assessment (SOFA), are widely used in the general ICU population. We aimed to develop prediction models for mechanically ventilated patients that combine these disease severity scores with other features available on the first day of ICU admission.
Methods: A retrospective administrative database study was conducted using the Medical Information Mart for Intensive Care (MIMIC-III) database. The exposures of interest were demographics, pre-ICU comorbidities, ICU diagnoses, disease severity scores, vital signs, and laboratory test results on the first day of ICU admission. The outcome was hospital mortality. Models were built with seven machine learning methods: k-nearest neighbors (KNN), logistic regression, bagging, decision tree, random forest, Extreme Gradient Boosting (XGBoost), and neural network. A random 70% of the cohort was used as the training set and the remaining 30% as the testing set. Areas under the receiver operating characteristic curves (AUCs) and calibration plots were used to evaluate and compare model performance. The importance of the risk factors was assessed within the models, and the top factors were reported.
Results: A total of 28,530 subjects were identified by screening the MIMIC-III database. After data preprocessing, 25,659 adult patients with 66 predictors were included in the model analyses. On the testing set, the KNN, logistic regression, decision tree, random forest, neural network, bagging, and XGBoost models achieved AUCs of 0.806, 0.818, 0.743, 0.819, 0.780, 0.803, and 0.821, respectively. The calibration curves of all the models except the neural network performed well. The XGBoost model performed best among the seven models. The top five predictors were age, respiratory dysfunction, SAPS II score, maximum hemoglobin, and minimum lactate.
Conclusion: The current study indicates that models based on risk factors available on the first day of ICU admission can be successfully established to predict mortality in mechanically ventilated patients. The XGBoost model performs best among the seven machine learning models.


INTRODUCTION
Mechanically ventilated patients account for more than a quarter of ICU patients (1). Invasive mechanical ventilation is associated with multiple complications and high mortality (2). The proportion of ICU patients receiving mechanical ventilation has been increasing in recent years owing to the aging population, more survivors of cancer and of other comorbidities, and advances in treatment (3,4).
Prediction models are useful tools for identifying underlying risk factors and supporting clinical practice (5). A mortality prediction model for mechanically ventilated patients built on early-stage, easily obtained, and well-generalized features might help ICU physicians with early warning and clinical judgment.
With the development of machine learning algorithms, modeling methods have become more diverse (6,7). Extreme Gradient Boosting (XGBoost) has been widely recognized and has performed strongly in a number of data mining challenges (8-10). Given these advantages, we hypothesized that the XGBoost model would perform better than the other models. We planned to develop and validate multiple machine learning models using data available in the early stage of ICU admission to predict hospital mortality and to identify risk factors in mechanically ventilated ICU patients.

Database and Study Design
The Medical Information Mart for Intensive Care (MIMIC-III) database was used as the data source (11). Access was granted after application and completion of the required course and test (record IDs: 32994435 and 32450965). We developed and validated the prediction models using data retrospectively extracted from MIMIC-III. This study followed the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guideline (12).

Subjects, Variables, and the Outcome Extraction
Adult ICU patients who received invasive mechanical ventilation during their ICU stay were included. Patients younger than 18 years or older than 90 years, or lacking information on the outcome measure, were excluded. Hospital mortality was the outcome measure.
Distinct adult patients were identified by their subject IDs. The predictors included: (a) demographic information: age and gender; (b) medical history: uncomplicated hypertension (defined as hypertension without complication), complicated hypertension (defined as hypertension with complication), … Information on renal replacement therapy (RRT) and the duration of mechanical ventilation was extracted to describe the characteristics of the included subjects; these variables were not analyzed as predictors because the model included only early-stage predictors that can be obtained on the first day of ICU admission. The lengths of hospital stay of survivors and non-survivors were reported. The target subjects, all predefined predictors, subject IDs, characteristic variables, and the outcome measure were extracted using a Structured Query Language (SQL) script. Medical conditions were defined by ICD-9 codes (13), with definitions derived from the mimic-code repository on GitHub (https://github.com/MIT-LCP/mimic-code). The severity of respiratory, coagulation, liver, cardiovascular, central nervous system, or renal failure was graded by the corresponding organ-specific SOFA score (0-4). The first day refers to the first 24 h after ICU admission, and the SOFA, SAPS II, and OASIS scores refer to the first scores recorded after ICU admission. After data extraction, subjects who met the exclusion criteria were excluded, and extreme or erroneous values that failed the logic check were censored. Variables with more than 30% missing values were excluded (14); missing values in the remaining variables were imputed with the variable mean. The resulting dataset was used for the final analyses.
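The exclusion and imputation rules above (drop any variable with more than 30% missing values, then replace the remaining missing values with the variable mean) can be sketched in Python with pandas. This is a minimal illustration under stated assumptions, not the authors' SAS/R/Python code: the function name `drop_sparse_and_impute` and the column names in the example are hypothetical, and all predictors are assumed to be numeric (binary conditions coded 0/1).

```python
import pandas as pd


def drop_sparse_and_impute(df: pd.DataFrame, max_missing: float = 0.30) -> pd.DataFrame:
    """Drop columns whose missing fraction exceeds `max_missing`, then mean-impute the rest.

    Hypothetical helper; assumes every column is numeric.
    """
    missing_frac = df.isna().mean()                 # fraction of NaN values per column
    kept = df.loc[:, missing_frac <= max_missing]   # "more than 30% missing" -> dropped
    # Replace each remaining NaN with the mean of its own column.
    return kept.fillna(kept.mean())
```

One design caveat: if the means are computed on the whole cohort before the 70/30 split, values from the testing set influence the imputation; computing the means on the training set only and reusing them for the testing set avoids that leakage.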

Statistical Analysis
The characteristics of the included patients were compared between survivors and non-survivors. Continuous variables are presented as medians and interquartile ranges (IQRs) and were compared using the t-test. Categorical variables are presented as numbers and percentages and were compared using the chi-square test. We built models with seven machine learning methods: k-nearest neighbors (KNN), logistic regression, bagging, decision tree, random forest, XGBoost, and neural network. A random 70% of the cohort, generated with a fixed random seed, was used as the training set; the remaining 30% was used for testing. Areas under the receiver operating characteristic curves (AUCs) were used to evaluate model discrimination, and calibration plots were drawn to assess agreement between predicted and observed mortality. For the best-performing model, the importance of the predictors was identified and reported, and a Shapley additive explanation (SHAP) plot was drawn. SAS software (version 9.4), R software (version 3.6.1), and Python (version 3.4.3) were used for statistical analyses.
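The split-and-evaluate protocol can be illustrated with a short NumPy sketch: a seeded random 70/30 split of patient indices, and the AUC computed from its probabilistic definition (the chance that a randomly chosen non-survivor receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half). The function names and the default seed are hypothetical, since the paper does not report its seed; in practice a library routine such as scikit-learn's `roc_auc_score` would typically be used instead.

```python
import numpy as np


def split_70_30(n: int, seed: int = 42):
    """Return (train_idx, test_idx) for a random 70/30 split, reproducible via `seed`."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)           # shuffled patient indices
    n_train = int(round(0.7 * n))
    return perm[:n_train], perm[n_train:]


def auc_score(y_true, y_score) -> float:
    """ROC AUC = P(score of a random positive > score of a random negative), ties count 1/2."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]          # e.g., non-survivors
    neg = y_score[y_true == 0]          # e.g., survivors
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    # All positive/negative pairs; memory grows with len(pos) * len(neg).
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))
```

Each model's predicted probabilities on the test indices would be passed to `auc_score` together with the observed outcomes; fixing the seed makes the same 30% subset available for comparing all seven models.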

Participants
Among the 38,597 adult patients in the MIMIC-III database, 28,530 met our selection criteria. After the logic check, 25,659 patients were included in the final analyses (Figure 1). Sixty-seven predictors were extracted from the database. During data cleaning, the predictor severe liver failure was excluded because more than 30% of its values were missing, leaving 66 predictors in the model. The mortality rate of the cohort was 45.5% (13,987 survivors and 11,672 non-survivors). The median length of hospital stay was 9.2 days (IQR = 11.1) for survivors and 11.1 days (IQR = 15.3) for non-survivors (p < 0.0001). The characteristics of survivors and non-survivors are compared in Table 1. Non-survivors were older and had higher SAPS II, SOFA, and OASIS scores; more frequent histories of hypertension with complication, diabetes with complication, malignancy, hematologic disease, peripheral vascular disease, hypothyroidism, chronic heart failure, stroke, and liver disease; more frequent diagnoses of sepsis, any organ failure, severe respiratory failure, severe coagulation failure, severe liver failure, severe cardiovascular failure, severe central nervous system failure, severe renal failure, respiratory dysfunction, cardiovascular dysfunction, renal dysfunction, hematologic dysfunction, metabolic dysfunction, and neurologic dysfunction; higher mean HR, maximum HR, maximum MAP, maximum SBP, mean lactate, minimum lactate, mean glucose, minimum glucose, maximum glucose, mean WBC, minimum WBC, maximum WBC, mean creatinine, minimum creatinine, and maximum creatinine; and a longer duration of mechanical ventilation and more frequent RRT (p < 0.05). Conversely, non-survivors had a lower proportion of males, less frequent hypertension without complication, and lower mean MAP, minimum MAP, mean SBP, minimum SBP, mean DBP, minimum DBP, mean temperature, maximum temperature, mean hemoglobin, minimum hemoglobin, and maximum hemoglobin (p < 0.05).
There were no significant differences in diabetes without complication (p = 0.0815) and maximum DBP (p = 0.0636) between the two groups.
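For categorical characteristics, each chi-square comparison in Table 1 amounts to a Pearson test on a 2 × 2 table (group by condition present/absent). The sketch below computes the statistic and its 1-degree-of-freedom p-value using only the standard library, via the identity P(χ² > x) = erfc(√(x/2)) for one degree of freedom. It omits Yates's continuity correction; the manuscript does not say whether a correction was applied, so this is an assumption, and the function name is hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    Illustrative layout: rows = non-survivors / survivors,
    columns = condition present / absent. Returns (statistic, p_value), df = 1.
    """
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n   # count expected under independence
            stat += (observed[i][j] - expected) ** 2 / expected
    # Chi-square with 1 df is the square of a standard normal variable,
    # so its upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

The function assumes every row and column total is positive; an empty row or column would make an expected count zero.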
The KNN, logistic regression, decision tree, random forest, neural network, bagging, and XGBoost models were established with the training set; their AUCs on the testing set were 0.806, 0.818, 0.743, 0.819, 0.780, 0.803, and 0.821, respectively (Figure 2). The calibration plots of the seven models are presented in Figure 3.
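A calibration plot such as those in Figure 3 compares, within groups of patients, the mean predicted probability of death with the observed death rate; a well-calibrated model lies close to the diagonal. The NumPy sketch below uses ten equal-width probability bins. The binning is an assumption because the scheme behind Figure 3 is not reported, and the function name is hypothetical; scikit-learn's `sklearn.calibration.calibration_curve` with `strategy="uniform"` computes comparable per-bin values.

```python
import numpy as np


def calibration_bins(y_true, y_prob, n_bins: int = 10):
    """Return (mean predicted probability, observed event rate, count) for each non-empty bin.

    Bins are equal-width on [0, 1]; a prediction of exactly 1.0 falls in the last bin.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Digitizing against the inner edges yields bin indices 0 .. n_bins - 1.
    idx = np.digitize(y_prob, edges[1:-1])
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((float(y_prob[mask].mean()), float(y_true[mask].mean()), int(mask.sum())))
    return rows
```

Plotting the first element of each tuple against the second, with the diagonal as reference, gives a calibration curve of the kind shown in Figure 3.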
The calibration curves of all the models except the neural network showed good agreement between predicted and observed risks. Among the seven models, XGBoost performed best, with the highest AUC and the best calibration curve. The hyperparameters of the final XGBoost model were as follows: learning rate = 0.008, number of estimators = 800, maximum tree depth = 6, α = 0, and λ = 0. The importance of the predictors in the XGBoost model is presented in Figure 4. According to the SHAP analysis, the top five predictors were age, respiratory dysfunction, SAPS II score, maximum hemoglobin, and minimum lactate (importance values of 0.410, 0.309, 0.302, 0.209, and 0.194, respectively). The confusion matrix of the XGBoost model is presented in Table 2. The SHAP plot and a decision tree from the XGBoost model are provided in the Supplementary Material.
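Two details of the final XGBoost model can be made concrete. First, assuming α and λ denote XGBoost's L1 and L2 penalties on leaf weights, the reported hyperparameters correspond to the arguments `learning_rate`, `n_estimators`, `max_depth`, `reg_alpha`, and `reg_lambda` of xgboost's scikit-learn wrapper `XGBClassifier`. Second, SHAP importance values like those in Figure 4 are commonly the mean absolute SHAP value of each feature across patients; we assume, but the text does not confirm, that the reported importances were computed this way. The ranking function is hypothetical and takes a precomputed SHAP matrix (for example, from `shap.TreeExplainer(model).shap_values(X)`), so the block itself runs with NumPy alone.

```python
import numpy as np

# Reported hyperparameters, keyed by the argument names of xgboost's scikit-learn
# wrapper (assumed mapping). If xgboost is installed, XGBClassifier(**XGB_PARAMS)
# accepts them; xgboost is deliberately not imported here.
XGB_PARAMS = {
    "learning_rate": 0.008,
    "n_estimators": 800,
    "max_depth": 6,
    "reg_alpha": 0.0,   # alpha: L1 penalty on leaf weights
    "reg_lambda": 0.0,  # lambda: L2 penalty on leaf weights
}


def rank_by_mean_abs_shap(shap_values, feature_names, top_k: int = 5):
    """Rank features by mean |SHAP value| (rows = patients, columns = features)."""
    shap_values = np.asarray(shap_values, dtype=float)
    importance = np.abs(shap_values).mean(axis=0)      # one importance value per feature
    order = np.argsort(importance)[::-1][:top_k]       # largest importance first
    return [(feature_names[i], float(importance[i])) for i in order]
```

With λ = 0, no L2 shrinkage is applied to the leaf weights (xgboost's default `reg_lambda` is 1).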

DISCUSSION
This study identified various clinical features associated with increased hospital mortality among mechanically ventilated ICU patients. Using machine learning methods, we found that age, respiratory dysfunction, SAPS II score, maximum hemoglobin, and minimum lactate were the features most strongly associated with hospital death. Among the seven models, XGBoost showed the best discrimination.
Our results showed that more than half of the ICU patients received mechanical ventilation and that the mortality of mechanically ventilated patients was high (45.5%). The requirement for mechanical ventilation has increased in recent years (1). Therefore, it is important to recognize patients at high risk of death early, using early-stage, well-generalized, and easily obtained features (15). With the development of machine learning algorithms, the number of predictors that can be processed has increased substantially. Thus, advanced machine learning techniques allow researchers to build better-performing models than conventional approaches (16). With such models, ICU physicians could be alerted early when mechanically ventilated patients develop complications or deteriorate. A previous study by Yao et al. (16) developed mortality prediction models for postoperative septic patients using the MIMIC-III database. Similar to our results, they found that the XGBoost model predicted hospital mortality better than the other models. However, because of differences in patient populations and in the features included, the feature importance rankings differed considerably (their top five predictors were fluid-electrolyte disturbance, coagulopathy, RRT, urine output, and cardiovascular surgery). Another study (5) used information from the first 24 h after ICU admission to build a 1-year mortality prediction model for septic patients based on stochastic gradient boosting (SGB). The AUC of the SGB model was 0.8039, similar to that of XGBoost in our study; both SGB and XGBoost are gradient boosting algorithms. As in our results, age ranked first in feature importance (their top five predictors were age, urine output, maximum BUN, metastatic cancer, and maximum temperature).
Our study has several strengths. First, to our knowledge, this is the first study to develop multiple advanced machine learning mortality prediction models focused on mechanically ventilated ICU patients. Second, we used MIMIC-III, a high-quality database with a large sample size and comprehensive clinical information. Third, we applied rigorous methods: seven machine learning models, a 30% subset for internal validation, and ROC curves and calibration plots to evaluate the models (17).
Our study also has limitations. First, our models were developed retrospectively from a single-center database, so further prospective studies are needed to evaluate the generalizability of our models and predictors. Second, some data were missing, and one potential confounding variable could not be assessed because its proportion of missing data exceeded the predefined limit. Third, the models were not externally validated, which limits the level of evidence. Fourth, our study focused only on hospital mortality; other important outcomes, such as ventilator-free days within 28 days and long-term mortality, require further investigation. Lastly, we did not exclude patients whose care was withdrawn, which may have introduced bias.

CONCLUSION
Our results suggest that age, respiratory dysfunction, SAPS II score, maximum hemoglobin, and minimum lactate might be closely associated with hospital mortality in mechanically ventilated ICU patients. The XGBoost model performs better than the KNN, logistic regression, bagging, decision tree, random forest, and neural network models in our study. Further external validations are needed to test the generalization of our models and predictors.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://mimic.physionet.org.

ETHICS STATEMENT
The establishment of this database was approved by the Massachusetts Institute of Technology (Cambridge, MA) and Beth Israel Deaconess Medical Center (Boston, MA), and consent was obtained for the original data collection. Therefore, the ethical approval statement and the need for informed consent were waived for this manuscript. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
YZ and HH conceptualized the research aims, planned the analyses, and guided the literature review. YL and QY extracted the data from the MIMIC-III database. JZ, GW, GC, SL, XJ, and JG participated in processing the data and doing the statistical analysis. YZ wrote the first draft of the paper. RY, CR, HZ, YC, QG, LL, BD, XX, WL, and HH provided comments and approved the final manuscript. All authors read and approved the final manuscript.