Comparison of Four Machine Learning Techniques for Prediction of Intensive Care Unit Length of Stay in Heart Transplantation Patients

Background Post-operative heart transplantation patients often require admission to an intensive care unit (ICU). Early prediction of the ICU length of stay (ICU-LOS) of these patients is of great significance: it can guide treatment and help reduce mortality. However, conventional linear models have tended to perform worse than non-linear models. Materials and Methods We collected the clinical data of 365 patients at Wuhan Union Hospital who underwent heart transplantation surgery between April 2017 and August 2020. The patients were randomly divided into a training group (N = 256) and a test group (N = 109). Eighty-four clinical features were collected for each patient. Features were selected using Least Absolute Shrinkage and Selection Operator (LASSO) regression with fivefold cross-validation. We obtained Shapley Additive explanations (SHAP) values using the package “shap” to interpret model predictions. Four machine learning models and a logistic regression model were developed. The area under the receiver operating characteristic curve (AUC-ROC) was used to compare the predictive performance of the models. Finally, for the convenience of clinicians, an online web server was established; it is freely accessible at https://wuhanunion.shinyapps.io/PredictICUStay/. Results In this study, 365 consecutive patients undergoing heart transplantation surgery for moderate (NYHA grade 3) or severe (NYHA grade 4) heart failure at Wuhan Union Hospital from 2017 to 2020 were enrolled. The median age of the recipients was 47.2 years, while the median age of the donors was 35.58 years. Of the donors, 330 (90.4%) were men, and the average surgery duration was 260.06 min. Among this cohort, 47 (12.9%) had renal complications, 25 (6.8%) had hepatic complications, 11 (3%) had undergone chest re-exploration, and 19 (5.2%) had undergone extracorporeal membrane oxygenation (ECMO).
The following six important clinical features were selected using LASSO regression; according to the SHAP results, their rank of importance was (1) the use of extracorporeal membrane oxygenation (ECMO); (2) donor age; (3) the use of an intra-aortic balloon pump (IABP); (4) length of surgery; (5) high creatinine (Cr); and (6) the use of continuous renal replacement therapy (CRRT). The eXtreme Gradient Boosting (XGBoost) algorithm showed significantly better predictive performance (AUC-ROC = 0.88) than the other models [accuracy: 0.87; sensitivity: 0.98; specificity: 0.51; positive predictive value (PPV): 0.86; negative predictive value (NPV): 0.93]. Conclusion Applying the XGBoost classifier to heart transplantation patients can provide an accurate prediction of ICU-LOS, which can not only improve the accuracy of clinical decision-making but also contribute to the allocation and management of medical resources; it is also a real-world example of precision medicine in hospitals.


INTRODUCTION
Heart transplantation is a life-saving last resort for patients with terminal heart disease (1). The number of heart failure patients is rapidly increasing in developed countries due to population ageing (2). As the largest developing country and the second-largest economy in the world, China is no exception (3). It is estimated that over 4.5 million heart failure patients in China require heart transplantation surgery annually, costing the economy 30.7 billion dollars per year (4).
Cardiac surgeries, especially heart transplantations, rank among the most challenging surgical interventions to perform and necessitate the routine admission of patients to the ICU (5). Once in the ICU, the patients' characteristics can have a considerable impact on ICU length of stay (ICU-LOS). Several studies and meta-analyses have revealed that renal failure can lead to a significantly prolonged LOS and higher rates of in-ICU mortality (6)(7)(8). Prolonged stays in the ICU contribute significantly to overall ICU costs. Meanwhile, the ICU-LOS varies greatly among patients. Prediction of the ICU-LOS based on patients' clinical characteristics is beneficial for guiding the decisions of clinicians and enables hospital beds to be used more effectively. Therefore, accurate prediction of ICU-LOS is of great significance for both heart transplantation patients and the hospital.
Previous studies on predicting ICU-LOS have been scarce, and most of them used traditional statistical methods such as logistic regression (9,10). However, these methods may be limited in the number of clinical features they can handle and by their assumption of linearity. Machine learning (ML) is now widely used to solve biostatistical problems in medicine (11,12), and ML algorithms can efficiently process large amounts of medical data. In addition, non-parametric approaches may reveal relationships that are otherwise obscured by non-linearities in the data (13). Therefore, four machine learning models were constructed to predict ICU-LOS in heart transplantation patients based on the features summarized in our study, with the aim of providing clinical decision-making support for doctors.
Abbreviations: ICU-LOS, ICU length of stay; LASSO, least absolute shrinkage and selection operator; AUC-ROC, area under the receiver operating characteristic curve; Cr, creatinine; ECMO, extracorporeal membrane oxygenation; CRRT, continuous renal replacement therapy; IABP, intra-aortic balloon pump; PPV, positive predictive value; NPV, negative predictive value; RF, Random Forest; NB, Naive Bayes; SVM, Support Vector Machines; LR, logistic regression; XGBoost, eXtreme Gradient Boosting; SHAP, Shapley Additive explanations.

Study Population and Data Source
Between 2017 and 2020, data were collected at Wuhan Union Hospital from 365 consecutive patients undergoing heart transplantation surgery for moderate (NYHA grade 3) or severe (NYHA grade 4) heart failure. First, we stratified the patients into three groups based on ICU-LOS: the lowest quartile (below the 25th percentile), the middle range (25th-75th percentiles), and the highest quartile (above the 75th percentile). We then defined the 75th percentile of ICU-LOS (9.08 days) as the cut-off between short and prolonged stays. The baseline characteristics are summarized in Table 1. Missing values were imputed with the median of the corresponding variable. Ethics committee approval was obtained with the study adhering to the principles of the Declaration of Helsinki, and the requirement for informed consent was waived (ChiCTR2200055529).
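The outcome definition and imputation step can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and we assume that stays strictly longer than the cut-off count as prolonged and that medians are taken from the training data when a reference set is supplied (to avoid leaking test-set information).

```python
import numpy as np
import pandas as pd


def label_prolonged_stay(icu_los_days, threshold=None):
    """Binarize ICU length of stay (days) into short (0) vs prolonged (1).

    If no threshold is given, the 75th percentile of the supplied values is
    used. Assumption: stays strictly longer than the cut-off are "prolonged".
    """
    los = np.asarray(icu_los_days, dtype=float)
    if threshold is None:
        threshold = float(np.percentile(los, 75))
    return (los > threshold).astype(int), threshold


def impute_median(df, reference=None):
    """Fill missing values column by column with medians.

    Medians come from `reference` (e.g. the training set) when provided,
    otherwise from `df` itself.
    """
    source = df if reference is None else reference
    return df.fillna(source.median(numeric_only=True))
```

With the cut-off reported above, `label_prolonged_stay(los_days, threshold=9.08)` would reproduce the short/prolonged split.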

Variable Definition and Collection
A total of 84 patient characteristics were collected for subsequent analysis. These clinical features have been reported, or are considered by clinicians, to be closely related to cardiovascular disease. For example, there is a direct link between serum potassium concentration and the occurrence of arrhythmia (14). Additionally, elevated ALT and AST levels indicate abnormal liver function, which correlates with prognosis (15). The collected information included the following: recipient gender; recipient age; recipient BMI; recipient blood type; donor age; donor gender; donor BMI; donor blood type; cold ischemic time; NYHA heart function classification (NYHA); history of cardiac surgery; smoking history; alcohol history; no assist device; implantable cardioverter-defibrillator (ICD); cardiac resynchronization therapy defibrillator (CRT-D); intra-aortic balloon pump (IABP); extracorporeal membrane oxygenation (ECMO); dopamine; angiotensin-converting enzyme inhibitors (ACEI); angiotensin receptor blockers (ARB); β-blockers (BB); red blood cell; platelet; white blood cell; neutrophil-to-lymphocyte ratio; whether the chest was re-explored; and ICU length of stay.
Renal/hepatic complications refer to post-operative renal or hepatic damage that may be related to the transplant, as demonstrated by a significant elevation of serum Cr or ALT/AST levels. ECMO was used either as veno-arterial (V-A) ECMO, providing both cardiac and respiratory support, or as veno-venous (V-V) ECMO, which provides oxygenation only. The intra-aortic balloon pump is an important physiological adjunct for the temporary support of failing myocardium. Continuous renal replacement therapy (CRRT) is common practice in critical care patients with acute renal failure. Chest re-exploration refers to reopening the chest in post-operative patients because of emergencies such as massive intrathoracic hemorrhage compressing the lung tissue and causing atelectasis.
In Table 1, high Cr was defined as a creatinine value above the upper limit of the normal range (108 µmol/L), and high ALT/AST as a value above 40 U/L. The normal ranges for INR, BNP, and LDL were 0.8-1.3, <100 pg/ml, and <3.4 mg/dl, respectively. Variables with p-values less than 0.05 are shown in bold.
A matching blood type was defined as the recipient and the donor having exactly the same blood type. Otherwise, the blood types differ but may still be compatible (heterotype compatible); for convenience, the former is recorded as "yes" and the latter as "no." All clinical data were obtained from a review of the patients' medical records. Whether a recipient was considered to have complications or a history of ECMO, chest re-exploration, etc. was based on whether this was mentioned in the progress notes or the discharge summary/diagnosis.

Feature Extraction
We performed feature preselection on the candidate features using the Least Absolute Shrinkage and Selection Operator (LASSO) algorithm, followed by Partial Least Squares Discriminant Analysis (PLS-DA) classification using the LASSO-selected features to single out the optimal features (16). The dataset was randomly split into a training set and a testing set at a ratio of 7:3. Common cross-validation schemes include k-fold cross-validation and leave-one-out cross-validation. It has been proposed in the literature that the leave-one-out test may overfit in small samples, whereas k-fold cross-validation performs better (17). In theory, k-fold cross-validation is also more efficient than the leave-one-out test. In k-fold cross-validation the model is fit k times, each time holding out a different fold, and the reported value is the average over the k runs, so an unusually good or poor result on any single split is averaged out and the final estimate reflects overall performance. We therefore used fivefold cross-validation to estimate out-of-sample performance. We chose fivefold rather than 10-fold cross-validation because, at small training sample sizes (e.g., 20, 25, 30), the condition "at least one different sample per class in every fold" did not hold for 10-fold cross-validation. The classification model was then built using the training set, and the testing set was used for prediction and performance evaluation. In the LASSO model, the λ value that minimized the fivefold cross-validation error was chosen.
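The split-then-select procedure can be sketched in Python with scikit-learn. This is an assumption-laden stand-in for the authors' R workflow: we use an L1-penalized logistic regression (`LogisticRegressionCV` with `penalty="l1"`), choose the penalty by lowest mean cross-validated log-loss (the analogue of picking the minimum-error λ, with λ = 1/C), and omit the PLS-DA step. The function name `lasso_select` is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.preprocessing import StandardScaler


def lasso_select(X, y, feature_names, seed=0):
    """L1-penalized logistic regression with 5-fold CV on a 70% training split.

    Returns the names of features with non-zero coefficients and the chosen
    regularization strength expressed as lambda = 1 / C.
    """
    # 7:3 stratified split; the 30% test part is held out and not used here
    X_tr, _, y_tr, _ = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    scaler = StandardScaler().fit(X_tr)  # the L1 penalty is scale-sensitive
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    model = LogisticRegressionCV(
        Cs=20, cv=cv, penalty="l1", solver="liblinear",
        scoring="neg_log_loss")  # pick C with the lowest mean CV log-loss
    model.fit(scaler.transform(X_tr), y_tr)
    coef = model.coef_.ravel()
    selected = [name for name, c in zip(feature_names, coef) if c != 0.0]
    return selected, 1.0 / model.C_[0]
```

Features whose coefficients are shrunk exactly to zero are dropped; the survivors would then go on to the downstream models.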
In addition, we used up-sampling to handle class imbalance: observations from the smallest class are randomly sampled with replacement until that class is the same size as the largest class (18). We also avoided complex models with many parameters, which tend to generalize poorly and overfit. Furthermore, we used both a linear model and a tree-based model to select features, so as to build a model with as few features as possible and reduce the risk of being misled by overfitting (19).
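The up-sampling step can be sketched as below. This is one common variant, assumed rather than taken from the paper: all original rows are kept and extra rows are drawn with replacement from each smaller class until every class matches the largest one. The function name is hypothetical, and it should be applied to the training data only.

```python
import numpy as np


def upsample_minority(X, y, seed=0):
    """Balance classes by resampling smaller classes with replacement."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    keep = []
    for cls, n in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        if n < n_max:
            # draw extra rows (with replacement) from this smaller class
            extra = rng.choice(idx, size=n_max - n, replace=True)
            idx = np.concatenate([idx, extra])
        keep.append(idx)
    order = rng.permutation(np.concatenate(keep))  # shuffle the combined rows
    return X[order], y[order]
```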

Logistic Regression Model
Logistic regression (LR) is a standard statistical method from the family of generalized linear models, used in data mining, automated disease diagnosis, economic forecasting, and many other applications. It is essentially a binary classifier that predicts an object's category from its attribute values. The model assumes that the outcome follows a Bernoulli distribution and estimates the parameters by maximizing the likelihood function (R's glm does this by iteratively reweighted least squares). In our study, a multivariable LR model was built using the glm function of the R package stats.
Odds ratios were calculated for each risk factor by exponentiating the LR coefficients. Odds ratios less than 1.0 indicate decreased risk, odds ratios greater than 1.0 indicate increased risk, and p < 0.05 was considered significant.
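To make the fitting and odds-ratio steps concrete, the sketch below implements maximum-likelihood logistic regression by iteratively reweighted least squares (the Newton-Raphson scheme used by R's `glm` for this family) in plain NumPy, then exponentiates the coefficients into odds ratios with Wald 95% intervals. It is illustrative only; the function names are hypothetical and the design matrix is assumed to already contain an intercept column.

```python
import numpy as np


def fit_logistic_irls(X, y, max_iter=50, tol=1e-8):
    """Logistic regression by IRLS. X must include an intercept column."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))      # fitted probabilities
        w = mu * (1.0 - mu)                  # IRLS weights
        z = eta + (y - mu) / w               # working response
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    # standard errors from the inverse Fisher information at the estimate
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    cov = np.linalg.inv(X.T @ ((mu * (1.0 - mu))[:, None] * X))
    return beta, np.sqrt(np.diag(cov))


def odds_ratios(beta, se, z=1.96):
    """Odds ratios exp(beta) with Wald confidence limits."""
    return np.exp(beta), np.exp(beta - z * se), np.exp(beta + z * se)
```

An odds ratio above 1.0 for a binary predictor such as ECMO use would mean higher odds of a prolonged stay when that predictor is present.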

Machine Learning Models
Four classic machine learning models, namely Naive Bayes (NB), Random Forest (RF), Support Vector Machines (SVM), and XGBoost, were developed with fivefold cross-validation to predict ICU-LOS. All machine learning models were constructed using the randomForest, xgboost, and caret packages in the R programming language (version 3.3.1). As an extra precaution, models were trained and tested by two different team members (LY and KW) to ensure that no model was ever accidentally trained on held-out test data.
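A compact way to compare the four model families under fivefold cross-validation is sketched below with scikit-learn. It is not the authors' R/caret pipeline: the hyperparameters are illustrative defaults, and scikit-learn's `GradientBoostingClassifier` is swapped in as a stand-in for XGBoost so that the example has no extra dependency.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_models(X, y, seed=0):
    """Mean fivefold cross-validated AUC for each candidate classifier."""
    models = {
        "NB": GaussianNB(),
        "RF": RandomForestClassifier(n_estimators=300, random_state=seed),
        # SVMs are scale-sensitive, so standardize inside each fold
        "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        # gradient-boosted trees from scikit-learn, standing in for XGBoost
        "GBM (XGBoost stand-in)": GradientBoostingClassifier(random_state=seed),
    }
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    return {
        name: float(np.mean(cross_val_score(model, X, y, cv=cv, scoring="roc_auc")))
        for name, model in models.items()
    }
```

Using the same `StratifiedKFold` object for every model means that all four are scored on identical folds.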

Random Forest
Random forest is a machine learning technique that combines many decision trees into a single classification model. It grows a forest of decision trees, each trained on a different random sample of the training data and using randomly selected candidate splitting features. When an unknown sample is predicted, every tree in the forest casts a vote, which substantially improves prediction accuracy compared with a single decision tree. The votes are then tallied, and the class with the most votes is taken as the final classification result.
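The bootstrap-and-vote idea can be written out by hand on top of scikit-learn decision trees. This is a teaching sketch with hypothetical function names; production implementations such as scikit-learn's `RandomForestClassifier` average class probabilities across trees rather than counting hard votes, but the principle is the same.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=50, seed=0):
    """Grow trees on bootstrap samples; each split considers a random feature subset."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), size=len(y))  # bootstrap: rows drawn with replacement
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=int(rng.integers(1_000_000)))
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def predict_forest(trees, X):
    """Hard majority vote over trees for 0/1 labels (ties go to class 1)."""
    votes = np.stack([tree.predict(np.asarray(X)) for tree in trees])  # (n_trees, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)
```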

Naive Bayes
The naive Bayes classifier is a highly scalable supervised learning technique. It applies Bayes' theorem to compute the posterior probability of each class given the observed features, under the "naive" assumption that the features are conditionally independent given the class, and assigns the class with the highest posterior probability. It has been applied successfully in many scientific fields and performs consistently well even when only a few variables are considered.
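For binary features such as ECMO, IABP, or CRRT use, the naive Bayes posterior can be computed directly. The sketch below is a Bernoulli naive Bayes with Laplace smoothing (`alpha`), written from scratch for illustration; the function name and the choice of a Bernoulli likelihood are assumptions, since the paper does not state which naive Bayes variant was used.

```python
import numpy as np


def bernoulli_nb_posterior(X_train, y_train, x_new, alpha=1.0):
    """Posterior P(class | x_new) for 0/1 features under Bernoulli naive Bayes."""
    X_train, y_train = np.asarray(X_train), np.asarray(y_train)
    x_new = np.asarray(x_new)
    classes = np.unique(y_train)
    log_post = []
    for c in classes:
        Xc = X_train[y_train == c]
        prior = len(Xc) / len(X_train)
        # smoothed P(feature = 1 | class)
        p = (Xc.sum(axis=0) + alpha) / (len(Xc) + 2 * alpha)
        # naive assumption: the joint likelihood is a product over features
        loglik = np.sum(x_new * np.log(p) + (1 - x_new) * np.log(1 - p))
        log_post.append(np.log(prior) + loglik)
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())  # normalize in a numerically stable way
    return classes, post / post.sum()
```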

Support Vector Machines
Support vector machines (SVM) are a machine learning approach based on statistical learning theory. An SVM aims to minimize generalization error by constructing a hyperplane in a high-dimensional space that separates feature vectors belonging to different classes with the maximum margin. When an SVM is used for linear classification of n-dimensional data, the separating hyperplane is (n-1)-dimensional.
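For a linear SVM, the hyperplane is w·x + b = 0 and the margin width is 2/‖w‖. The sketch below fits scikit-learn's `SVC` with a linear kernel and a large `C` (approximating a hard margin) and reads off these quantities. The helper name is hypothetical, and the paper does not state which kernel was used.

```python
import numpy as np
from sklearn.svm import SVC


def linear_svm_hyperplane(X, y, C=1e3):
    """Fit a (near) hard-margin linear SVM and return its hyperplane and margin."""
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_.ravel()             # normal vector of the (n-1)-dimensional hyperplane
    b = clf.intercept_[0]             # offset: the hyperplane is w . x + b = 0
    margin = 2.0 / np.linalg.norm(w)  # distance between the two supporting hyperplanes
    return clf, w, b, margin
```

For two-dimensional data (n = 2), the returned `w` has two components and the separating hyperplane is a line.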
eXtreme Gradient Boosting
eXtreme Gradient Boosting (XGBoost) is one of the most widely used machine learning classifiers in bioinformatics. It is a tree-based model that classifies using a boosting method; according to the literature, it can still fit well when the sample size is randomly reduced to 100 (20). Regularization terms are added to the cost function to limit model complexity and prevent overfitting. In addition, XGBoost supports parallel computation, which significantly accelerates training.
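The effect of the regularized cost function can be seen in XGBoost's closed-form leaf weight and split gain. For gradient sums G and Hessian sums H over the samples in a leaf, L2 penalty λ, and per-leaf penalty γ, the optimal leaf weight is −G/(H + λ), and a split is only worth making when its gain is positive. The sketch below computes these quantities for the logistic loss. It is a worked illustration of the published formulas, not a call into the xgboost library; the function names are hypothetical.

```python
import math


def logistic_grad_hess(y, raw_score):
    """First and second derivatives of the logistic loss w.r.t. the raw score."""
    p = 1.0 / (1.0 + math.exp(-raw_score))
    return p - y, p * (1.0 - p)


def leaf_weight(G, H, lam=1.0):
    """Optimal leaf value; a larger lambda shrinks it toward zero."""
    return -G / (H + lam)


def split_gain(G_L, H_L, G_R, H_R, lam=1.0, gamma=0.0):
    """Loss reduction from splitting a node into left/right children, minus gamma."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R) - score(G_L + G_R, H_L + H_R)) - gamma
```

Raising `gamma` directly lowers every candidate gain, so fewer splits are made and trees stay smaller, which is how the regularization term limits complexity.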

Shapley Additive Explanations
Shapley additive explanations (SHAP) is a recently developed technique for interpreting black-box machine learning models. Lundberg et al. examined several contemporary methods for determining feature importance, showed that they belong to the same class of measures, and unified them into the SHAP framework (21). Most earlier approaches provide only global feature importance, which makes individual predictions difficult to interpret. In contrast, SHAP calculates the contribution of each input variable to each individual prediction of a machine learning model. We obtained SHAP values using the package "shap" to interpret model predictions (22).
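The quantity SHAP estimates can be computed exactly for a handful of features by enumerating all coalitions. The sketch below does this under a simplifying assumption: features outside a coalition are set to a single baseline point, whereas the shap package averages over background data and uses fast model-specific algorithms for trees. The function name is hypothetical; the cost grows exponentially with the number of features.

```python
import itertools
import math

import numpy as np


def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x; absent features take their baseline value."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for S in itertools.combinations(others, size):
                # Shapley weight |S|! (n - |S| - 1)! / n!
                weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
                z = baseline.copy()
                z[list(S)] = x[list(S)]
                without_i = f(z)
                z[i] = x[i]  # add feature i to the coalition
                phi[i] += weight * (f(z) - without_i)
    return phi
```

The values are additive: they sum to `f(x) - f(baseline)`, which is what lets a single prediction be decomposed feature by feature.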

Statistical Analyses
The machine learning models included random forest (RF), naive Bayes (NB), support vector machines (SVM), and XGBoost. Logistic regression (LR) and the machine learning models were implemented using R software (4.1.1) and Python (Version 2.8).
The performance of these methods was then measured by calculating the area under the receiver operating characteristic (ROC) curve (AUC).
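The AUC has a direct probabilistic reading: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties count as one half). The sketch below computes it that way and also derives the threshold-based metrics reported in the abstract (accuracy, sensitivity, specificity, PPV, NPV) from a confusion matrix. The function names are hypothetical, and we assume label 1 denotes the positive class.

```python
import numpy as np


def auc_mann_whitney(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # every positive vs every negative
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, PPV, and NPV for 0/1 labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return {
        "accuracy": float((tp + tn) / len(y_true)),
        "sensitivity": float(tp / (tp + fn)),
        "specificity": float(tn / (tn + fp)),
        "ppv": float(tp / (tp + fp)),
        "npv": float(tn / (tn + fn)),
    }
```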

Patients' Characteristics
As shown in Figure 1, 365 patients were enrolled in this study. The baseline characteristics of these patients are shown in Table 1.
The median age of the recipient patients was 47.2 years, while the median age of the donors was 35.58 years.

Feature Selection
These patients were randomly assigned to a training set (N = 256) and a test set (N = 109), which were used to train, optimize, and evaluate the models (Table 2). All variables had p-values greater than 0.05 when the training and test sets were compared, indicating no significant differences between the two datasets. Figure 1 presents a flow diagram of the sampling strategy for this study. Thirty clinical features were selected as important variables by machine learning methods, and we performed a correlation analysis among these features (Figure 2). Apart from ALT and AST, no significant correlations were identified. We then used LASSO regression to identify six variables relevant for algorithm development (Figure 3) (23). In the SHAP summary plot, each point represents a feature value of a particular training example; the color of the point represents the feature value, and its position on the X-axis is its SHAP value. Features are ranked by the sum of SHAP value magnitudes over all samples. Figure 4 shows the detailed impact of each feature on the model output using SHAP. The ranking of feature importance obtained with SHAP differs from that given by the XGBoost feature importance (File 3). Meanwhile, the SHAP analysis of feature importance for predicting ICU-LOS (which reflects the severity of patients' conditions) was consistent with current medical knowledge. The use of IABP and the use of CRRT are both poor prognostic indicators in patients undergoing heart surgery (24). Increases in Cr (renal complications) can be caused by prolonged surgery time, so it is plausible that this would be an important predictive feature.
In order of importance, these variables were: (1) the use of extracorporeal membrane oxygenation (ECMO); (2) donor age; (3) the use of an intra-aortic balloon pump (IABP); (4) length of surgery; (5) high creatinine (Cr); and (6) the use of continuous renal replacement therapy (CRRT) (Figure 4).
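The pairwise correlation screening described above (which flagged only ALT and AST) can be sketched with pandas. The function name and the |r| ≥ 0.7 cut-off are assumptions for illustration; the paper reports significant correlations without stating a threshold or the correlation method.

```python
import numpy as np
import pandas as pd


def correlated_pairs(df, threshold=0.7, method="pearson"):
    """List feature pairs whose absolute correlation is at least `threshold`."""
    corr = df.corr(method=method)
    cols = corr.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):  # upper triangle only, no self-pairs
            r = corr.iloc[i, j]
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], float(r)))
    return pairs
```

Strongly correlated pairs carry largely redundant information, which is one reason a penalized method such as LASSO tends to keep only one member of such a pair.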

Performance of Intensive Care Unit Length of Stay Prediction Models
After the six variables were identified, receiver operating characteristic (ROC) curves for predicting ICU-LOS in heart transplantation patients were generated for each machine learning model; these are presented in Figure 5. Further results are shown in Table 3 and Figure 6.

Online Prediction Tool
The final model was incorporated into an open-access online prediction tool that allows users to calculate the ICU-LOS of patients following heart transplantation surgery¹ based on the six selected features. Figure 7 displays the results generated by the prediction tool for two case scenarios. We measured the time required to build the explainable XGBoost model using the whole training dataset: the training process was completed within 50 min to obtain the final multiclass model. During validation, the execution time for a single case was 30 ms on our platform without a graphics processing unit (GPU).

DISCUSSION
With the continuous development of societies, living standards and quality of life have improved significantly; however, most countries, especially developed countries, are experiencing population ageing (25), and heart failure is recognized as a major ageing-associated disease (26,27). Apart from heart transplantation for end-stage heart failure, no curative treatment is yet available (28). After heart transplantation, patients are routinely admitted to the ICU. Prolonged ICU-LOS brings further financial burden, as well as risks of morbidity and mortality (29). Premature ICU discharge may expose patients to unsuitable treatment, leading to preventable deaths (30). However, accurately predicting ICU-LOS remains difficult for clinicians because of the complexity of the characteristics presented by different patients (31).
As electronic health records have become more prevalent, machine learning can be harnessed to predict length of stay accurately. Machine learning models have been developed to predict ICU length of stay after heart surgery (32), and other studies have used such models to predict mortality and disease severity in the ICU (33). A previous study trained a support vector machine (SVM) model to forecast patient survival and length of stay using data from 14,480 patients (34); the AUC for predicting prolonged ICU length of stay was 0.82. Mani et al. reported that a hidden Markov model also predicted ICU length of stay with reasonable accuracy (35). Although several models exist to predict ICU-LOS, most of them have focused on patients undergoing cardiac surgery and thus may not be suitable for heart transplantation patients. To the best of our knowledge, our study is the first to predict ICU-LOS in heart transplantation patients using four machine learning methods in a single ICU. In addition, a prediction algorithm for ICU-LOS based on patients' clinical features can direct clinicians' attention toward the patients most at risk.
Intensive care unit length of stay (LOS) is a well-established measure of ICU resource use and performance. ICU physicians can adequately predict ICU-LOS in only 53% of cases (36), and the reasons for a longer ICU-LOS remain unclear. Our primary purpose in this article was to accurately predict patients' ICU-LOS in order to make the best use of medical resources and give practical advice to clinicians. In this study, we applied the LASSO algorithm to 84 basic clinical characteristics and identified the six clinical features most important for prolonged ICU length of stay: (1) the use of extracorporeal membrane oxygenation (ECMO); (2) donor age; (3) the use of an intra-aortic balloon pump (IABP); (4) length of surgery; (5) high creatinine (Cr); and (6) the use of continuous renal replacement therapy (CRRT). It is not surprising that ECMO, IABP, and CRRT are critical in prolonging ICU-LOS, and this agreement with clinical experience supports the accuracy and stability of the feature selection. Beyond these recognized risk factors, other factors may also have some influence. For example, the Cr value reflects the degree of renal dysfunction, which directly affects the speed of post-operative recovery and may increase operating time and, consequently, medical expenses. Prolonged surgery was found to increase the aortic cross-clamp time and the time needed to return to a high PaO2, damaging several organs and increasing ICU length of stay. In addition, the quality of the donor organ depends on donor-related factors such as donor age, which ultimately affects the prognosis of transplantation (37,38). A 2019 nationwide report from South Korea also revealed that donor age is closely related to conditional mortality in heart transplantation patients (39).
Furthermore, abnormal blood creatinine levels suggest a disturbed internal environment and impaired renal function, indicating a worse overall condition (40)(41)(42). These results suggest that changes in pre-operative conditions may be predictive of clinical endpoints (43).
Based on these results, prediction models were developed using four machine learning methods. Among these four methods, the XGBoost classifier showed the strongest predictive power, with an area under the ROC curve of 0.88, and its performance was significantly better than that of traditional logistic regression (LR: AUC = 0.81). This implies that using machine learning, especially XGBoost, to make predictions from pre-operative patient characteristics will enable us to determine patients' ICU-LOS earlier, allocate medical resources accordingly, and optimize treatment procedures. For the convenience of clinicians, a predictive model for calculating ICU-LOS in heart transplantation patients was established and made available as a webpage calculator for convenient clinical application (see text footnote 1).
However, this study has some limitations. A major limitation is that the research data came from a single ICU with a small number of patients, so there may be some bias caused by regional factors. In medical research, relatively small datasets are common, which often raises concerns about model stability (19). For this reason, we used simple models and selected features accordingly to reduce the risk of overfitting. This issue could also be addressed by external validation; however, because of the particular nature of clinical data, clinical details of heart transplantation patients from other centers were not readily accessible. An additional limitation is that published articles gave us access only to the researchers' derived data and not to the raw data; thus, external validation was impractical in this study. A further limitation is that we could not establish a causal relationship between factors such as increased surgery duration or impaired renal function and the observed prolonged ICU treatment. However, a goal-directed study should be able to establish such a relationship.
In summary, the overall results showed that the XGBoost model was the best classifier for predicting ICU-LOS in heart transplantation patients (AUC = 0.88), compared with logistic regression (LR) and three other machine learning models (NB, RF, and SVM). In addition, six key clinical features were identified. It is reasonable to expect that the coming era of big data and personalized medicine will see a significant increase in machine learning applications for assisting clinical decisions.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding authors.

ETHICS STATEMENT
Ethics committee approval was obtained with the study adhering to the principles of the Declaration of Helsinki, and the requirement for informed consent was waived (ChiCTR2200055529).

AUTHOR CONTRIBUTIONS
KW conceived and designed the study and wrote the manuscript. KW, LY, and WL collected and analyzed the data. CJ, NW, and QZ revised the manuscript. JS and ND reviewed and edited the manuscript. All authors participated in discussions and approved the final manuscript.