Mortality prediction and influencing factors for intensive care unit patients with acute tubular necrosis: random survival forest and Cox regression analysis

Background: Patients with acute tubular necrosis (ATN) not only have severe renal failure but also have many comorbidities, which can be life-threatening and require timely treatment. Identifying the factors that influence ATN and taking appropriate interventions can effectively shorten the duration of the disease, reduce mortality, and improve patient prognosis. Methods: Mortality prediction models were constructed using the random survival forest (RSF) algorithm and Cox regression. The performance of both models was then assessed by the out-of-bag (OOB) error rate, the integrated Brier score, the prediction error curve, and the area under the curve (AUC) at 30, 60, and 90 days. Finally, the optimal prediction model was selected, and a decision curve analysis and a nomogram were established. Results: The RSF model was constructed under the optimal combination of parameters (mtry = 10, nodesize = 88). Vasopressors, international normalized ratio (INR)_min, chloride_max, base excess_min, bicarbonate_max, anion gap_min, and metastatic solid tumor were identified as risk factors with a strong influence on mortality in ATN patients. Univariate and multivariate regression analyses were used to establish the Cox regression model. Norepinephrine, vasopressors, INR_min, severe liver disease, and metastatic solid tumor were identified as important risk factors. The discrimination and calibration ability of both predictive models were demonstrated by the OOB error rate and the integrated Brier score. However, the prediction error curve of the Cox regression model was consistently lower than that of the RSF model, indicating that the Cox regression model was more stable and reliable. The Cox regression model was also more accurate in predicting mortality of ATN patients based on the AUC at different time points (30, 60, and 90 days). Decision curve analysis showed that the net benefit range of the Cox regression model at different time points was large, indicating that the model has good clinical effectiveness. Finally, a nomogram predicting the risk of death was created based on the Cox model. Conclusion: The Cox regression model is superior to the RSF model in predicting mortality of patients with ATN. Moreover, the model has certain clinical utility: it can provide clinicians with a reference basis in the treatment of ATN and contribute to improving patient prognosis.


Background
Acute tubular necrosis (ATN) is the most common type of acute renal failure, accounting for approximately 70%-80% of cases (Zhou et al., 2010). It is a clinical syndrome caused by ischemia of renal tissue or necrosis of tubular epithelial cells due to toxic damage, resulting in a dramatic decrease in the glomerular filtration rate (An et al., 2022). It often manifests as progressive azotemia, electrolyte disturbance, acid-base imbalance, and a host of other symptoms. ATN patients not only have severe renal failure but also have many comorbidities, such as myocardial infarction, congestive heart failure, and peptic ulcer disease, which can be life-threatening and require timely treatment. ATN is associated with high mortality, especially for patients in the intensive care unit (ICU) (Rosen and Stillman, 2008). Understanding the factors that influence ATN and taking appropriate interventions can effectively shorten the duration of the disease, reduce mortality, and improve patient prognosis. Previous studies have shown that pH, base excess, creatinine, and blood urea nitrogen (BUN) are common influencing factors of ATN (Liaño et al., 1989). However, other potential risk factors that may affect the prognosis of patients with ATN have not yet been identified. In recent years, the continuous development of medical information technology and the popularization of electronic medical record systems have generated a large quantity of data for prognostic model evaluation and other clinical applications.
The Cox regression model is the most common semi-parametric regression model; it can analyze the influence of multiple factors on outcome events and handle data with censoring (Moolgavkar et al., 2018). This method improves the efficiency and reliability of survival analysis by considering multiple predictor variables simultaneously (Koletsi and Pandis, 2017). It has been widely used in medical research, for example to evaluate the survival rate of cancer patients, the risk of heart failure, and patient prognosis (Hippisley-Cox and Coupland, 2015; Tang M. et al., 2021; Wang et al., 2022). Machine learning (ML), on the other hand, is a fast-growing, data-based algorithmic technology that automatically analyzes and learns patterns and regularities in data to predict and optimize future outcomes. ML has also been applied in many aspects of medical research, which is important for improving medical care and promoting human health (Noorbakhsh-Sabet et al., 2019; Issa et al., 2021). Random survival forest (RSF) combines random forest with survival analysis and can handle right-censored data. Unlike general binary classification methods, the target variable of survival analysis is survival time. The model trains a large number of survival trees and aggregates the predictions of the individual trees into the final prediction. RSF is an ML algorithm based on decision trees (Yosefian et al., 2015); it has good prediction accuracy without over-fitting and is suitable for survival analysis of many diseases (Farhadian et al., 2021; Roshanaei et al., 2022). The most important feature of the RSF algorithm is that it can rank the importance of variables in order to filter out those that have a greater impact on the outcome (Adham et al., 2017; Wang and Zhou, 2017). Moreover, RSF can effectively deal with data imbalance: when the classes are imbalanced, RSF can balance the data error. It has been used to construct prognostic models for many different diseases, such as heart failure, arrhythmia, and multiple myeloma (Hsich et al., 2011; Miao et al., 2015; Morvan et al., 2020). In addition, several studies have compared the performance of RSF with the classical Cox regression model. Some have found that RSF is more accurate than Cox regression (Sloan et al., 2016; Ma et al., 2020; Tapak et al., 2020), whereas others have reached the opposite conclusion (Qiu et al., 2020). To date, no study has compared the two models in ATN using large-sample data. Therefore, this study aimed to investigate the prediction of mortality in ATN patients in the ICU and the associated influencing factors by using the RSF algorithm and the Cox regression method.

Data source and study population
Medical Information Mart for Intensive Care IV (MIMIC-IV) is a large-scale public database containing clinical information on patients at Beth Israel Deaconess Medical Center between 2008 and 2019. It was established by the Massachusetts Institute of Technology and Beth Israel Deaconess Medical Center. In the MIMIC-IV database, the patients' true identifying information is hidden; therefore, there is no need to obtain informed consent from patients. However, researchers are required to complete relevant training courses and receive a certificate before accessing the database. Datasets were obtained from the PhysioNet official website (http://mimic.physionet.org/).
A total of 4,031 patients were diagnosed with ATN in the database. The inclusion criteria for this study were age over 18 years and an ICU stay longer than 24 h. The exclusion criteria were death within 24 h of ICU admission and incomplete data. For patients with multiple ICU admissions, only data from the first admission were used. Ultimately, 3,220 patients were enrolled in this study.
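The selection rules above can be sketched as a small filter. This is only an illustration: the record fields (`icu_hours`, `died_within_24h`, `has_complete_data`, `admission_order`) are hypothetical names rather than MIMIC-IV columns, and the order of the steps (first admission first, then the criteria) is an assumption, since the text does not state it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IcuStay:
    # All field names are hypothetical, chosen for this sketch only.
    patient_id: int
    age: float
    icu_hours: float
    died_within_24h: bool
    has_complete_data: bool
    admission_order: int  # 1 = first ICU admission


def select_cohort(stays: List[IcuStay]) -> List[IcuStay]:
    """Keep each patient's first ICU stay, then apply the stated criteria."""
    first_stay: Dict[int, IcuStay] = {}
    for stay in stays:
        current = first_stay.get(stay.patient_id)
        if current is None or stay.admission_order < current.admission_order:
            first_stay[stay.patient_id] = stay
    return [
        s for s in first_stay.values()
        if s.age > 18                 # inclusion: over 18 years old
        and s.icu_hours > 24          # inclusion: ICU stay longer than 24 h
        and not s.died_within_24h     # exclusion: death within 24 h
        and s.has_complete_data       # exclusion: incomplete data
    ]


example_stays = [
    IcuStay(1, 70, 48, False, True, 1),
    IcuStay(1, 70, 100, False, True, 2),   # second admission, ignored
    IcuStay(2, 17, 48, False, True, 1),    # too young
    IcuStay(3, 55, 12, True, True, 1),     # short stay, early death
    IcuStay(4, 60, 72, False, False, 1),   # incomplete data
    IcuStay(5, 45, 30, False, True, 1),
]
cohort = select_cohort(example_stays)
```

In this toy input only patients 1 and 5 remain, and patient 1 is represented by the first admission.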

Data extraction
Datasets were extracted using structured query language. Basic information on ATN patients included age at admission, gender, ethnicity, weight, length of stay in the ICU, etc. Treatment measures included antibiotic use, vasopressor use, norepinephrine use, use of continuous renal replacement therapy, etc. Related comorbidities included myocardial infarction, congestive heart failure, peripheral vascular disease, chronic pulmonary disease, cerebrovascular disease, rheumatic disease, mild liver disease, peptic ulcer disease, paraplegia, malignant cancer, severe liver disease, metastatic solid tumor, etc. The first laboratory test results after ICU admission included hemoglobin, white blood cells, base excess, pH, anion gap (AG), bicarbonate, international normalized ratio (INR), prothrombin time, urine output, arterial partial pressure of oxygen, creatinine, BUN, chloride, glucose, etc. Vital signs after ICU admission included heart rate, respiratory rate, systolic blood pressure (SBP), diastolic blood pressure (DBP), body temperature, etc. Because of the high sampling frequency, the maximum, minimum, and mean values were used to represent vital signs and laboratory test results.
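As a minimal sketch of the last step, repeated measurements can be collapsed into `_min`, `_max`, and `_mean` features. The function names and input layout are assumptions made for illustration, and the readings are made-up numbers, not study data.

```python
from statistics import mean
from typing import Dict, List


def summarize(readings: List[float]) -> Dict[str, float]:
    """Return the minimum, maximum, and mean of repeated measurements."""
    if not readings:
        raise ValueError("at least one reading is required")
    return {"min": min(readings), "max": max(readings), "mean": mean(readings)}


def build_features(measurements: Dict[str, List[float]]) -> Dict[str, float]:
    """Turn {'chloride': [...]} into {'chloride_min': ..., 'chloride_max': ..., ...}."""
    features: Dict[str, float] = {}
    for name, values in measurements.items():
        for stat, value in summarize(values).items():
            features[f"{name}_{stat}"] = value
    return features


# Made-up readings for one hypothetical patient.
example_features = build_features({
    "heart_rate": [88, 102, 95],
    "chloride": [101, 108],
})
```

The suffixes mirror the variable names used later in the text, such as chloride_max or INR_min.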

Model construction
Patients with ATN were randomly divided into a training set and a validation set in an 8:2 ratio. The training set was used to construct the RSF and Cox models, and the validation set was used to evaluate the performance of the two predictive models. The RSF model was constructed on the basis of the optimal parameter combination. The out-of-bag (OOB) error rate under different parameter combinations was calculated by the grid search method and used to determine the optimal parameter combination of the model (Wang et al., 2019). The RSF algorithm can assess the importance of each variable that contributes to the outcome. In this study, the minimum depth method was used to measure and rank the importance of each variable (Peng et al., 2016). For the Cox regression model, univariate and multivariate regression analyses were carried out. All variables were first analyzed in the univariate Cox regression model, and those with p-values less than 0.05 were entered into the multivariate Cox regression analysis.
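The random 8:2 split and the grid search can be sketched as follows. This is not the authors' code: the seed, the function names, and the `oob_error` callable are assumptions. In practice `oob_error` would fit an RSF model for one (mtry, nodesize) pair and return its OOB error rate; here a toy function stands in for it.

```python
import random
from itertools import product
from typing import Callable, Iterable, List, Sequence, Tuple


def split_train_validation(
    patient_ids: Iterable[int], train_fraction: float = 0.8, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Shuffle the IDs reproducibly and cut them into training and validation sets."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    cut = int(round(len(ids) * train_fraction))
    return ids[:cut], ids[cut:]


def grid_search(
    mtry_values: Sequence[int],
    nodesize_values: Sequence[int],
    oob_error: Callable[[int, int], float],
) -> Tuple[int, int, float]:
    """Return the (mtry, nodesize, error) triple with the lowest OOB error."""
    best = None
    for mtry, nodesize in product(mtry_values, nodesize_values):
        error = oob_error(mtry, nodesize)
        if best is None or error < best[2]:
            best = (mtry, nodesize, error)
    if best is None:
        raise ValueError("the parameter grid is empty")
    return best


def toy_oob_error(mtry: int, nodesize: int) -> float:
    # Stand-in for model fitting; the shape of this surface is invented.
    return (mtry - 4) ** 2 + (nodesize - 20) ** 2


train_ids, validation_ids = split_train_validation(range(3220))
best_parameters = grid_search([2, 4, 6], [10, 20, 30], toy_oob_error)
```

Fixing the seed makes the split reproducible, which matters when two models are compared on the same validation set.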

Model comparison
The OOB error rate is equivalent to 1 minus the C-index and is used to evaluate the predictive ability of a model. The smaller the OOB error rate, the better the discrimination ability of the predictive model (Banerjee et al., 2012). The Brier score is an index for evaluating different survival models and represents the prediction accuracy of a model; it can be viewed as a measure of the "calibration" of a set of probabilistic predictions. A model is considered well calibrated when the Brier score is less than 0.25 (Mogensen et al., 2012), and the smaller the Brier score, the better the calibration. The OOB error rate and the integrated Brier score were first calculated to determine the discrimination and calibration ability of the two models. Then, the prediction error curves of the two models were plotted to judge prediction performance. To further assess the prognostic ability of the two models, time-dependent receiver operating characteristic (ROC) curves at 30, 60, and 90 days were plotted. When the area under the curve (AUC) is greater than 0.5, a larger AUC indicates a stronger predictive ability (Bansal and Heagerty, 2018). Decision curve analysis (DCA) curves for the Cox model were plotted at 30, 60, and 90 days. A large net benefit in DCA indicates high clinical application value (Vickers and Holland, 2021). The model with superior performance was used to construct a nomogram as an individual prediction tool for ATN mortality risk.
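The two headline metrics can be illustrated with a simplified sketch. The C-index below follows Harrell's pairwise definition and ignores tied event times. The Brier score is computed at a single time horizon and, as a stated simplification, skips patients censored before the horizon instead of applying inverse-probability-of-censoring weights, so it is not the integrated Brier score reported in the study. A constant predicted survival of 0.5 always scores 0.25, which is where the 0.25 benchmark comes from. All inputs are made-up values.

```python
from typing import List, Sequence


def concordance_index(
    times: Sequence[float], events: Sequence[int], risks: Sequence[float]
) -> float:
    """Harrell's C: share of comparable pairs in which the earlier death has the higher risk."""
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an observed death
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def error_rate(times, events, risks) -> float:
    """Error rate expressed as 1 minus the C-index."""
    return 1.0 - concordance_index(times, events, risks)


def brier_score(
    times: Sequence[float],
    events: Sequence[int],
    predicted_survival: Sequence[float],
    horizon: float,
) -> float:
    """Mean squared gap between survival status at `horizon` and predicted survival."""
    squared_errors: List[float] = []
    for t, e, s in zip(times, events, predicted_survival):
        if t > horizon:
            observed = 1.0      # still alive at the horizon
        elif e:
            observed = 0.0      # died before the horizon
        else:
            continue            # censored before the horizon: skipped (simplification)
        squared_errors.append((observed - s) ** 2)
    if not squared_errors:
        raise ValueError("no patient with known status at the horizon")
    return sum(squared_errors) / len(squared_errors)


toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
perfect_error = error_rate(toy_times, toy_events, [0.9, 0.6, 0.4, 0.2])
uninformative_brier = brier_score(toy_times, toy_events, [0.5] * 4, horizon=12)
```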

Statistical analysis
Several statistical guides recommend the median and quartiles over the mean and standard deviation for descriptive variables (Kattan and Vickers, 2020). Therefore, continuous variables were summarized using the median and quartiles and compared using the Mann-Whitney U test. Categorical variables were expressed as frequencies or percentages and compared using Chi-square tests or Fisher's exact tests. Indicators with more than 20% missing values were removed, and the remaining missing data were filled in using the multiple imputation method (Zhang et al., 2022). Since outliers reduce the accuracy of the RSF algorithm, outliers were first identified using box plots; indicators in which more than 10% of values were outliers were then removed, and the remaining outliers were replaced with the median (Dutta et al., 2022). The software packages used in the data analysis and processing included randomForestSRC, survival, ggRandomForests, timeROC, and ggplot2.
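The missing-data and outlier rules can be sketched as below. Several details are assumptions, because the text does not specify them: box-plot outliers are taken to be values beyond 1.5 times the interquartile range from the quartiles (the usual Tukey rule), the replacement median is computed over all observed values, and the function names are illustrative. The multiple imputation step is not shown.

```python
from statistics import median, quantiles
from typing import Dict, List, Optional, Sequence


def missing_fraction(values: Sequence[Optional[float]]) -> float:
    """Share of entries that are missing (represented here as None)."""
    return sum(v is None for v in values) / len(values)


def drop_sparse_columns(
    table: Dict[str, List[Optional[float]]], max_missing: float = 0.20
) -> Dict[str, List[Optional[float]]]:
    """Remove indicators whose missing fraction is greater than the limit."""
    return {name: col for name, col in table.items()
            if missing_fraction(col) <= max_missing}


def boxplot_outlier_flags(values: Sequence[float], whisker: float = 1.5) -> List[bool]:
    """Flag values outside [Q1 - whisker*IQR, Q3 + whisker*IQR] (assumed Tukey rule)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v < low or v > high for v in values]


def replace_outliers_with_median(
    values: Sequence[float], max_outlier_fraction: float = 0.10
) -> Optional[List[float]]:
    """Return cleaned values, or None when the indicator has too many outliers."""
    flags = boxplot_outlier_flags(values)
    if sum(flags) / len(values) > max_outlier_fraction:
        return None  # the caller removes this indicator entirely
    centre = median(values)
    return [centre if is_outlier else v for v, is_outlier in zip(values, flags)]


# Made-up values: one extreme reading among eleven.
raw_values = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 100]
cleaned_values = replace_outliers_with_median(raw_values)
kept_columns = drop_sparse_columns({
    "a": [1, None, 3, 4, 5],        # 20% missing: kept
    "b": [None, None, 3, 4, 5],     # 40% missing: removed
})
```

Returning None for an indicator with too many outliers keeps the removal decision with the caller, which mirrors the two-stage rule in the text.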

Baseline characteristics
A total of 3,220 patients with ATN were included in this study, of whom 2,457 survived and 763 died during hospitalization. Comparisons between the two groups showed significant differences in age, weight, length of stay in the ICU, vasopressors, mild liver disease, severe liver disease, metastatic solid tumors, AG_min, AG_max, bicarbonate_min, bicarbonate_max, chloride_min, chloride_max, creatinine_min, creatinine_max, base excess_min, base excess_max, temperature_min, temperature_mean, urine output, DBP_min, etc. There were no statistically significant differences in variables including gender, myocardial infarction, peripheral vascular disease, paraplegia, rheumatic disease, peptic ulcer disease, glucose_min, glucose_max, etc. Other baseline characteristics are shown in Tables 1-3.
The OOB error rates were calculated by the grid search method under different combinations of parameters. As shown in Figure 1, the RSF model achieved the lowest OOB error rate (21.4%) under the parameter combination of mtry = 10 and nodesize = 88. The color shading indicates the level of the OOB error rate, with a shift from yellow to purple indicating an increase in the OOB error rate. The lower the OOB error rate, the better the predictive ability of the model (Lines et al., 2021). As shown in Figure 2, the OOB error rate of the model stabilized once 500 survival trees were reached. As shown in Table 4, the variables were ranked in importance using the minimum depth method. Vasopressors, age, length of stay in the ICU, metastatic solid tumors, INR_min, respiratory rate_min, chloride_max, calcium_min, base excess_min, bicarbonate_max, AG_min, potassium_min, pH_max, bicarbonate_min, and DBP_min were the top 15 variables, indicating that these variables have strong predictive ability and a significant effect on the outcome.

Cox regression model
Univariate Cox regression analysis revealed that the following variables had a p-value ≤0.05: age, length of stay in the ICU, dopamine, epinephrine, norepinephrine, vasopressors, AG_min, AG_max, bicarbonate_min, bicarbonate_max, INR_min, INR_max, pH_min, pH_max, base excess_min, base excess_max, urine output, SBP_min, SBP_max, SBP_mean, temperature_min, temperature_max, temperature_mean, cerebrovascular disease, chronic pulmonary disease, mild liver disease, severe liver disease, and metastatic solid tumors. These variables were included in the multivariate Cox regression model for analysis. Dopamine, epinephrine, norepinephrine, vasopressors, INR_min, cerebrovascular disease, chronic pulmonary disease, mild liver disease, severe liver disease, and metastatic solid tumors were important risk factors that increased the risk of death in ATN patients. However, length of stay in the ICU was a protective factor. The shorter the stay, the lower the risk of death. Detailed results of the Cox regression analysis are shown in Table 5.
FIGURE 2 Curve of the OOB error rate for the RSF model. The OOB error rate drops from 0.258 to 0.214 and stabilizes at 500 survival trees.
Frontiers in Pharmacology frontiersin.org
As shown in Table 6, the OOB error rates for the RSF model and the Cox regression model were 0.214 and 0.215. The OOB error rates for the two models were small, indicating that both predictive models had good discrimination ability. The integrated Brier scores of the two models were 0.199 (RSF) and 0.154 (Cox), both less than 0.25, suggesting that the two models had good calibration ability. As shown in Figure 3, the prediction error curve of the Cox regression model was lower than that of the RSF model, indicating that the Cox regression model was more stable and reliable. In the validation set, the RSF model had AUC values of 0.788, 0.719, and 0.715 at 30, 60, and 90 days (Figure 4), whereas the Cox regression model had AUC values of 0.833, 0.736, and 0.732 at the same time points (Figure 4). Thus, the Cox regression model appeared to be more accurate in predicting mortality of ATN patients. As shown in Figure 5, the net benefit of the Cox model at different time points was large, indicating that the Cox model had high practical clinical value.
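For readers unfamiliar with decision curve analysis, the net benefit at a threshold probability p_t is conventionally defined as TP/n - (FP/n) × p_t/(1 - p_t), where TP and FP count the patients flagged as high risk who did and did not die. The sketch below applies that definition to a binary outcome at a fixed time point and ignores censoring, which is a simplification of a survival-based DCA. The data are invented for illustration.

```python
from typing import Sequence


def net_benefit(
    outcomes: Sequence[int], predicted_risks: Sequence[float], threshold: float
) -> float:
    """Net benefit of treating patients whose predicted risk is at or above `threshold`."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    true_pos = sum(1 for y, p in zip(outcomes, predicted_risks)
                   if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(outcomes, predicted_risks)
                    if p >= threshold and y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcomes: Sequence[int], threshold: float) -> float:
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Invented outcomes (1 = died) and predicted risks for four patients.
toy_outcomes = [1, 1, 0, 0]
toy_risks = [0.9, 0.6, 0.4, 0.1]
model_nb = net_benefit(toy_outcomes, toy_risks, threshold=0.2)
treat_all_nb = net_benefit_treat_all(toy_outcomes, threshold=0.2)
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all reference and zero, the net benefit of treating nobody.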

Nomogram for predicting risk of ATN mortality
Given that the Cox model outperformed the RSF model in discrimination and calibration, a nomogram was constructed on the basis of the Cox model to predict the probability of death at the individual level in ATN patients. The nomogram was constructed to predict 30-day, 60-day, and 90-day mortality risk based on 15 significant variables in the training set. In the nomogram, an individual score for each factor is obtained by projecting the value of that factor vertically onto the first row of "points". For each patient, the total score is calculated by adding the scores for all factors. Projecting the total score vertically to the bottom rows gives the estimated risk of death for the ATN patient. For example, consider a 68-year-old patient with ATN who has a score of 82 for metastatic solid tumor, 83 for severe liver disease, 87 for mild liver disease, 84 for chronic pulmonary disease, 82 for cerebrovascular disease, 83 for SBP_min, 83 for urine output, 82 for PaO2_min, 80 for INR_min, 91 for vasopressors, 86 for norepinephrine, 88 for epinephrine, 81 for dopamine, 55 for length of stay in the ICU, and 83 for age. These 15 scores sum to a total of 1,230, for which the estimated risk of death at 30, 60, and 90 days was 15%, 37.6%, and 61.6%, respectively (Figure 6).
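The arithmetic of the worked example can be checked directly: the fifteen factor scores quoted in the text add up to the total of 1,230 points. The dictionary keys are illustrative labels. The mapping from total points to risk is read off the nomogram (Figure 6) and is not reproduced here.

```python
# Factor scores quoted in the worked example for the 68-year-old patient.
nomogram_points = {
    "metastatic_solid_tumor": 82,
    "severe_liver_disease": 83,
    "mild_liver_disease": 87,
    "chronic_pulmonary_disease": 84,
    "cerebrovascular_disease": 82,
    "SBP_min": 83,
    "urine_output": 83,
    "PaO2_min": 82,
    "INR_min": 80,
    "vasopressors": 91,
    "norepinephrine": 86,
    "epinephrine": 88,
    "dopamine": 81,
    "length_of_stay_in_ICU": 55,
    "age": 83,
}

# The total score is the plain sum of the individual factor scores.
total_points = sum(nomogram_points.values())
print(f"{len(nomogram_points)} factors, total points = {total_points}")
```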

Discussion
ATN is one of the most common types of acute kidney injury; it seriously affects patients' quality of life and can even threaten their lives, and it is characterized by high morbidity, high mortality, and poor prognosis (Hoste et al., 2018). Therefore, it is important to identify the influencing factors for ATN, which can help screen patients at high risk so that they receive proper treatment.
Zeng et al. 10.3389/fphar.2024.1361923
Although ML methods represented by the RSF algorithm have performed well in several fields, this does not mean that ML algorithms have an absolute advantage over traditional methods. For example, Cuthbert et al. (2022) analyzed the prediction of 8-year revision risk following total knee and hip arthroplasty, and Tang X. et al. (2021) studied prognostic prediction in metastatic non-small cell lung cancer patients receiving treatment with the EGFR-TKI osimertinib; both studies showed that traditional statistical methods have certain strengths. Nonetheless, the RSF algorithm has its own unique advantages. It can directly and quantitatively calculate the minimum depth of each variable to reflect the magnitude of its importance, which facilitates comparison among variables (Taylor, 2011). This is a feature that has not been found in the traditional Cox regression method. Therefore, it is better to construct predictive models using several methods and compare them to identify the best one.
Since the Cox model outperformed the RSF model, we constructed a nomogram based on the Cox model. The nomogram is a visualization tool used to generate the probability of a clinical outcome (Park, 2018). Studies have demonstrated that nomograms enable more accurate prediction than traditional scoring systems (Wang et al., 2021). They are now widely used for risk prediction in many diseases (Li et al., 2021; Tan et al., 2022; Wang et al., 2023). Therefore, a mortality risk nomogram based on information about ATN patients can inform existing critical care assessment programs. Clinical staff can use the total score to predict the probability of death in ATN patients, which assists them in developing a more rational treatment plan.
In this study, two different models were constructed to explore the influencing factors of ATN. Cox regression analysis concluded that vasopressors, norepinephrine, INR_min, severe liver disease, and metastatic solid tumors were the important risk factors. The RSF model concluded that vasopressors, INR_min, and metastatic solid tumors were the important influencing factors based on the importance ranking of the variables. The influencing factors identified by the two methods are broadly similar, indicating that they are probably true factors associated with ATN.
Among the variables associated with outcome prediction in ATN patients, the most important are AG, pH, base excess, BUN, and bicarbonate, which can be used to determine whether patients have symptoms of acid-base imbalance, azotemia, and electrolyte disturbances (Bellomo, 2011). Urine output is the most common factor affecting ATN. This is because decreased urine output can cause hypovolemia, which increases the risk of death from the disease (Xu et al., 2020). Timely rehydration therapy can restore the circulating blood volume and improve impaired renal perfusion.
ATN is often associated with many comorbidities, and the presence of these comorbidities also increases the risk of death from ATN. Severe liver disease is a relatively common comorbidity in patients with ATN. The large amounts of peritoneal fluid present in patients with severe hepatitis can lead to insufficient circulating blood volume and uneven distribution of intrarenal blood flow, which ultimately increases the probability of death in ATN (Chancharoenthana and Leelahavanichkul, 2019). In addition, metastatic solid tumors are a common comorbidity that increases the risk of death from this disease (Wu et al., 2020). The main reason is that neutrophils in solid tumors can enhance cytotoxicity and lead to necrosis of renal tubular epithelial cells, thus reducing patient survival (Liao and Liaw, 2020).
Relevant studies have shown that certain drugs also increase the risk of death in ATN. For example, vasopressor drugs can increase glomerular perfusion pressure and urine output, thus affecting renal function (Shi and Wang, 2017). Norepinephrine can increase patients' blood pressure and reduce renal blood flow, resulting in renal function impairment (Kim et al., 2021). INR is a preferred monitoring indicator for oral anticoagulants. Since an overdose of anticoagulants increases the probability of death from ATN, the level of this indicator also reflects the risk of death from ATN (Lim and Campbell, 2013).
However, the association between these drugs and the disease needs further study. Machine learning is one of the main methods of drug knowledge discovery. In a follow-up study, we plan to use machine learning, text mining, and other technical methods to mine the tacit drug knowledge contained in the data, so as to explore whether there are potential associations between drugs and biomedical entities, such as drug-disease associations and associations between drugs and side effects.
The main advantage of this study is that it was the first to use the RSF algorithm and the Cox regression method to predict hospital mortality of patients with ATN from the MIMIC-IV database. The Cox regression model showed improved accuracy and precision compared with the RSF model. This study also has some limitations. First, it is a single-center study and lacks external validation. Second, it was a retrospective observational study in which the majority of patients were white, and there may have been unobserved confounding factors that influenced the outcome. Finally, although the predictive ability of the Cox regression model in this study was superior to that of the RSF model, ML algorithms are evolving rapidly and new algorithms are constantly being proposed, so further comparative studies are needed in practical applications.

Conclusion
The Cox regression model is superior to the RSF model in predicting mortality of patients with ATN. Vasopressors, norepinephrine, INR_min, and metastatic solid tumors were important factors that significantly influence prognosis. Therefore, a mortality risk nomogram based on information about ATN patients can inform existing critical care assessment programs. Moreover, the model has certain clinical utility: it can provide clinicians with a reference basis in the treatment of ATN and contribute to improving patient prognosis.
available and all patient data are de-identified. Informed consent of all subjects and/or their legal guardians was obtained when MIMIC-IV was established.

FIGURE 1
Tuning parameter of the RSF model. The black sign in the figure is the parameter combination of mtry = 10 and nodesize = 88, and the corresponding OOB error rate is 21.4%.

FIGURE 3
Prediction error curves for RSF and Cox models. The smaller the prediction error value, the more accurate the predictive power of the model.

FIGURE 5
Decision curve analysis of the Cox model.

TABLE 1
General information of the patients.
ICU, intensive care unit; CCU, coronary care unit; SICU, surgical intensive care unit; MICU, medical intensive care unit; CVICU, cardiac vascular intensive care unit; p-values less than 0.05 are shown in bold text.

TABLE 2
The treatment and comorbidity of the patients.
CRRT, continuous renal replacement therapy; p-values less than 0.05 are shown in bold text.

TABLE 3
Laboratory tests and vital signs of the patients.
WBC, white blood cells; AG, anion gap; BUN, blood urea nitrogen; INR, international normalized ratio; PT, prothrombin time; PTT, partial thromboplastin time; pH, potential of hydrogen; PaO2, partial pressure of oxygen; PaCO2, partial pressure of carbon dioxide; SBP, systolic blood pressure; DBP, diastolic blood pressure; SpO2, pulse oxygen saturation; Max, maximum; Min, minimum; p-values less than 0.05 are shown in bold text.

Table 6 reports the OOB error rates for the RSF model and the Cox regression model, which were 0.214 and 0.215.

TABLE 4
Variable importance ranking for RSF.
The 59 variables in the depth threshold and the corresponding minimum depth value; ICU, intensive care unit; INR, international normalized ratio; AG, anion gap; pH, potential of hydrogen; DBP, diastolic blood pressure; SBP, systolic blood pressure; WBC, white blood cells; PTT, partial thromboplastin time; PaO2, partial pressure of oxygen; PaCO2, partial pressure of carbon dioxide; BUN, blood urea nitrogen; SpO2, pulse oxygen saturation; Max, maximum; Min, minimum.

TABLE 5
Cox regression analysis results.
CI, confidence interval; ICU, intensive care unit; CRRT, continuous renal replacement therapy; WBC, white blood cells; AG, anion gap; BUN, blood urea nitrogen; INR, international normalized ratio; PT, prothrombin time; PTT, partial thromboplastin time; pH, potential of hydrogen; PaO2, partial pressure of oxygen; PaCO2, partial pressure of carbon dioxide; SBP, systolic blood pressure; DBP, diastolic blood pressure; SpO2, pulse oxygen saturation; Max, maximum; Min, minimum; p-values less than 0.05 are shown in bold text.

TABLE 6
Performance comparison of the two models.