Application of Extreme Learning Machine in the Survival Analysis of Chronic Heart Failure Patients With High Percentage of Censored Survival Time

Objective: To explore the application of the Cox model based on extreme learning machine in the survival analysis of patients with chronic heart failure. Methods: The medical records of 5,279 inpatients diagnosed with chronic heart failure in two tertiary grade A hospitals in Taiyuan from 2014 to 2019 were collected. With death as the outcome, and after feature selection, the Lasso Cox model, random survival forest (RSF), and the Cox model based on extreme learning machine (ELM Cox) were constructed for survival analysis and prediction. The prediction performance of the three models was also explored on simulated data with three censoring ratios of 25, 50, and 75%. Results: Simulation results showed that the prediction performance of the three models decreased with increasing censoring proportion, and the ELM Cox model performed best overall. The ELM Cox model constructed with 21 highly influential survival predictors screened from the actual chronic heart failure data showed the best performance, with a C-index and Integrated Brier Score (IBS) of 0.775 (0.755, 0.802) and 0.166 (0.150, 0.182), respectively. Conclusion: The ELM Cox model showed good discrimination performance in the survival analysis of patients with chronic heart failure and performed consistently on data with a high proportion of censored survival times. The model could therefore help physicians identify patients at high risk of poor prognosis and target therapeutic measures to them as early as possible.


INTRODUCTION
Chronic heart failure (CHF), one of the most severe cardiovascular diseases of the 21st century (1), is a complex clinical syndrome that manifests when the heart does not pump enough blood to meet tissue and metabolic needs (2). As the prevalence of heart failure in China increases year by year, it has become a major cause of hospitalization and rehospitalization among the elderly, imposing a heavy medical burden on individuals and society (3). Timely intervention with lifestyle modifications and medications can effectively slow the progression of the disease or prevent an adverse prognosis in heart failure patients (4).
Therefore, a prediction model for people with HF benefits patients, doctors, and society as a whole. Based on accurate risk prediction, doctors can prescribe more aggressive treatment plans for high-risk patients, and patients adhere better to treatment when they have confidence in the plan prescribed by their doctor (5). An accurate prediction model can also help clinical researchers design clinical trials that target high-risk patients with heterogeneous characteristics and adapt treatment interventions (6). Multiple heart failure survival prediction models, such as the Seattle heart failure prediction model (7,8), have been developed and validated in multiple cohorts and have been used successfully in routine clinical care to manage patients with different degrees of heart failure. However, these survival prediction models were derived from clinical trial data, which have small sample sizes, strict trial conditions, little heterogeneity in the patient population, and poor population representativeness (9). Because such models are not derived from real-world data, even a highly accurate one is of limited use for real-world research (10). As electronic health records (EHRs) become more common in clinical research, methods that predict the prognosis of HF from EHRs instead of clinical trial data have become necessary (11,12).
In recent years, with the rapid development of artificial intelligence, machine learning has been used increasingly widely to build cardiovascular disease prediction models (13-15). In models for elderly patients, many studies have also shown that survival models based on machine learning predict better than the traditional Cox proportional hazards model (16). Survival analysis models the time to an event (17). A major challenge in survival analysis is censoring, which makes modeling time-to-event data more complicated than traditional regression (18-21). Miao (22) used the Cox and RSF models to predict cardiovascular disease in 2015 and compared the constructed models on discrimination ability, identification of nonlinear effects, and identification of significant predictors. The results showed that the RSF model could automatically identify nonlinear effects among variables, while the Cox model could not. However, the RSF model was not as good as the Cox model at identifying some variables with a small proportional distribution in the population.
Therefore, the Cox model cannot be completely replaced by the RSF model in survival analysis.
Hong (23) applied the emerging extreme learning machine (ELM) algorithm, a single-hidden-layer feedforward neural network, to survival analysis. The resulting ELM Cox model performs well on high-dimensional and ultra-high-dimensional real data sets and shows good predictive performance. In addition, it has a clear advantage in shortening computation time (24). Wang (25) proposed an ELM survival model in 2018 that could effectively solve the above problems. Wang (26) applied the ELM algorithm to survival analysis and showed that the ELM Cox model predicts well on high- and ultra-high-dimensional datasets while reducing computation time.
In this study, we used the EHRs of inpatients with heart failure to construct three prognostic survival models: the least absolute shrinkage and selection operator Cox regression model (Lasso Cox), RSF, and ELM Cox. Predictors with a significant impact on prognosis were selected using the variable importance (VIMP) and minimal depth methods, and a model with high predictive ability was constructed, to provide a basis for patients, doctors, and clinical researchers to initiate subsequent treatment and intervention measures.

Sources of Information
Data in this study are from the complete inpatient medical records of patients diagnosed with CHF in the cardiology departments of two tertiary grade A hospitals in Taiyuan, Shanxi Province, from January 2014 to April 2019. The data were collected using the case report form for chronic heart failure (CHF-CRF), which our research group developed from the case record content and HF guidelines (27). Patients were followed up at 3, 6, and 12 months after discharge and every 6 months thereafter until July 2019. The primary outcome was CHF-related mortality. Patients were included if they were aged ≥18 years, presented with typical signs or symptoms of CHD, were in NYHA class II to IV, and were receiving heart failure medications or other therapeutic measures. Patients were excluded if they had experienced an acute cardiovascular event within the past 2 months or had a psychiatric disorder or other major non-cardiovascular chronic disease.

Statistical Analysis
SPSS (V26.0) and R 3.6.5 were used for statistical analysis. For group comparisons, we used chi-square tests for categorical variables and Student's t-test or the nonparametric Kruskal-Wallis test for continuous variables. Univariate Cox regression analysis was used to describe the influence of variables on the primary outcome. The random forest VIMP (variable importance) and minimal depth (28) methods were used to select variables. The significance threshold was α = 0.05. The R packages SurvELM (29), randomForestSRC (30), and glmnet (31) were used to build the ELM Cox, RSF, and Lasso Cox survival models, respectively.

Data Preprocessing and Feature Selection
In clinical practice, patients undergo different tests, resulting in missing indicators in the data collected. Variables with ≥30% missing were removed from the analysis (Supplementary Table 3). According to previous research (32), this paper uses the MissForest algorithm in the missForest R package (33) to impute variables with <30% missing rate. We use random forest's VIMP and minimal depth method to carry out 5-fold cross-validation to select variables for constructing predictive models. The research process is shown in Figure 1 (Details in Supplementary Materials).
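The missing-rate filter described above can be sketched as follows. This is a minimal illustration in Python rather than the R workflow used in the study; the function name `drop_sparse_columns` and the toy values are assumptions, and the imputation step (missForest) is not reproduced here.

```python
import numpy as np

def drop_sparse_columns(X, names, max_missing=0.30):
    """Keep only columns whose missing rate is strictly below max_missing."""
    missing_rate = np.isnan(X).mean(axis=0)   # fraction of NaN per column
    keep = missing_rate < max_missing         # >= 30% missing is removed
    kept_names = [n for n, k in zip(names, keep) if k]
    return X[:, keep], kept_names, missing_rate

# Toy data (invented for illustration): 10 patients, 3 variables.
X = np.array([
    [70.0, 1200.0, 1.1],
    [65.0, np.nan, 1.3],
    [80.0, np.nan, np.nan],
    [55.0, np.nan, 0.9],
    [72.0, 900.0, np.nan],
    [68.0, 450.0, 1.0],
    [77.0, 300.0, 1.4],
    [61.0, 2100.0, 1.2],
    [59.0, 800.0, 1.0],
    [83.0, 1500.0, 1.6],
])
names = ["age", "nt_probnp", "cystatin_c"]
X_kept, kept_names, rates = drop_sparse_columns(X, names)
print(kept_names, rates)
```

In this toy example the second column is missing in exactly 30% of rows and is therefore dropped, while the third column (20% missing) is retained for imputation.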

RESEARCH METHODOLOGY
The Lasso Cox Model
Lasso is a regression method that performs regularization together with variable selection to improve the prediction performance and interpretability of statistical models. Tibshirani (34) applied Lasso to the Cox proportional hazards model in 1997: an L1 penalty shrinks the absolute values of the regression coefficients, some of them to exactly zero, which performs variable selection, decreases the estimated variance of the final model, and increases its interpretability.
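As a sketch of the idea, the Lasso Cox estimate can be written as a penalized partial likelihood. The notation is an assumption introduced here for illustration: $x_i$ is the covariate vector of patient $i$, $\delta_i$ the event indicator, $R(t_i)$ the set of patients still at risk at time $t_i$, $p$ the number of covariates, and $\lambda \ge 0$ the penalty weight. The original formulation uses the equivalent constraint form $\sum_k |\beta_k| \le s$, and software packages may scale the two terms differently.

```latex
\hat{\beta} = \arg\max_{\beta} \left\{
  \sum_{i:\,\delta_i = 1} \left[ x_i^{\top}\beta
    - \log \sum_{j \in R(t_i)} \exp\!\left( x_j^{\top}\beta \right) \right]
  - \lambda \sum_{k=1}^{p} \lvert \beta_k \rvert
\right\}
```

Larger values of $\lambda$ set more coefficients to exactly zero, which is what makes the method a variable selector as well as a regularizer.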

Random Survival Forest
RSF is an algorithm that estimates risk within the random forest framework without making any assumptions about individual hazard functions. RSF randomly selects the features and samples for each tree and uses the log-rank test to split the nodes; the overall cumulative hazard function is estimated after calculating the cumulative hazard function for each tree. RSF extends Breiman's random forest method to right-censored data, with advantages such as being free from the proportional hazards assumption and being suitable for complex problems with multicollinear and high-dimensional variables (35).

There are several reasons why we chose ELM as the single-hidden-layer feedforward neural network (SLFN) Cox model instead of other popular deep neural network survival models. First, it has been proved that any continuous objective function can be approximated by an SLFN with adjustable hidden nodes. This means that complex network structures such as MLP neural networks or deep neural networks may not always be necessary (38,39). Second, most backpropagation or similar algorithms used in deep neural networks adjust the input and output weights and the hidden-layer bias values through gradient-descent-based optimization, which is likely to reduce the generalization ability of the network. In contrast, ELM hidden-node parameters do not need to be adjusted, and better model performance can be obtained without complicated parameter tuning (40). Third, the simulation study of Wang et al. (23) showed that ELM Cox can use a simple linear kernel for various types of data and remains stable under different censoring ratios. This may be because the linear kernel is not sensitive to the kernel parameter c (41).
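The point that ELM hidden-node parameters are not tuned can be illustrated with a minimal sketch of a generic ELM for regression. This is not the ELM Cox implementation in SurvELM: the function names, the tanh activation, and the toy target are assumptions made for illustration. The defaults of 100 hidden nodes and C = 1e5 mirror the values reported in Model Development.

```python
import numpy as np

def fit_elm(X, y, n_hidden=100, C=1e5, seed=0):
    """Generic ELM: random, untrained hidden layer plus ridge output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights, never updated
    b = rng.normal(size=n_hidden)                # random biases, never updated
    H = np.tanh(X @ W + b)                       # hidden-layer output matrix
    # Closed-form ridge solution: beta = (I/C + H^T H)^{-1} H^T y
    beta = np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ y)
    return W, b, beta

def predict_elm(model, X):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

# Toy nonlinear target (invented for illustration).
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2
model = fit_elm(X, y)
pred = predict_elm(model, X)
print("training MSE:", np.mean((pred - y) ** 2))
```

Only the output weights are estimated, and they come from a single linear solve, which is why no gradient-descent tuning of the hidden layer is involved.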

Model Development
Censoring can have an important influence on the results of survival analysis. A high degree of censoring can result in lower accuracy and effectiveness of a model, increasing the risk of bias (42). The censored rate of heart failure data in this study was 90.2%. To build a stable performance model, we used stratified bootstrap (43). In this study, we stratified the training sets and the testing sets in the ratio of 2:1 by the outcome. To obtain reliable model indicators, the entire process was repeated 100 times, and the performance of the model was compared.
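A minimal sketch of an outcome-stratified 2:1 split repeated 100 times is given below. It assumes a plain random split without replacement within each outcome stratum, which may differ from the stratified bootstrap procedure actually used; the function name `stratified_split` is hypothetical. The event counts (517 deaths, 4,762 censored) are those reported in this paper.

```python
import numpy as np

def stratified_split(event, train_frac=2 / 3, seed=0):
    """Split indices so that each outcome class keeps the same train fraction."""
    rng = np.random.default_rng(seed)
    event = np.asarray(event)
    train_parts = []
    for value in np.unique(event):
        idx = np.flatnonzero(event == value)   # indices of this stratum
        rng.shuffle(idx)
        n_train = int(round(train_frac * idx.size))
        train_parts.append(idx[:n_train])
    train_idx = np.sort(np.concatenate(train_parts))
    test_idx = np.setdiff1d(np.arange(event.size), train_idx)
    return train_idx, test_idx

# 1 = died, 0 = censored (alive at end of follow-up).
event = np.array([1] * 517 + [0] * 4762)
splits = [stratified_split(event, seed=s) for s in range(100)]
train_idx, test_idx = splits[0]
print(len(train_idx), len(test_idx), event[train_idx].mean(), event[test_idx].mean())
```

Stratifying by outcome keeps the roughly 10% event rate nearly identical in the training and testing sets, which matters when so few events are available.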
The parameter combination of the RSF model with the optimal prediction performance was selected through 5-fold cross-validation, i.e., ntree = 500, mtry = 7, and nodesize = 60. The ELM Cox model was constructed with the default parameters, i.e., number of hidden-layer nodes L = 100 and regularization parameter C = 1e5.

Model Evaluation Metrics
Two common survival analysis evaluation metrics, the Integrated Brier Score (IBS) (44) and Harrell's concordance index (C-index) (20), were used to assess the accuracy of the survival models in the follow-up experiments. The C-index is the proportion of correctly ranked pairs among all comparable pairs of observations; the closer it is to one, the better the model discriminates. The IBS is the Brier score of the survival model integrated over a given period; the smaller the IBS, the better the prediction. Indicators were compared between models using nonparametric rank-sum tests and Nemenyi post hoc tests.
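The pairwise definition of the C-index can be made concrete with a short sketch. This is a simplified O(n²) version written for illustration: a pair is treated as comparable when the subject with the shorter time had the event, ties in predicted risk count as one half, and ties in observed time are ignored, which may differ from the package implementation used in the study. The function name and toy values are assumptions.

```python
import numpy as np

def harrell_c_index(time, event, risk):
    """Fraction of comparable pairs in which the earlier failure has higher risk."""
    time, event, risk = map(np.asarray, (time, event, risk))
    concordant = 0.0
    comparable = 0
    for i in range(len(time)):
        if not event[i]:
            continue                  # a censored subject cannot anchor a pair
        for j in range(len(time)):
            if time[j] > time[i]:     # j outlived i, so the pair is comparable
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

# Toy example (invented): the third subject is censored at t = 6.
time = [2, 4, 6, 8]
event = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.8, 0.1]
c = harrell_c_index(time, event, risk)
print(c)  # 4 of 5 comparable pairs are ranked correctly: 0.8
```

With heavy censoring, fewer pairs are comparable, so the estimate rests on less information; this is one reason performance metrics become less stable as the censoring proportion rises.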

Simulation Analysis
In this paper, the R package simsurv (45) was used to test the applicability of the Lasso Cox, RSF, and ELM Cox algorithms to low-dimensional data. The baseline hazard function was set to follow a Weibull distribution with the scale parameter set to two, giving a simulation dataset with 1,000 samples and five normally distributed covariates (23). We generated datasets in which 25, 50, and 75% of subjects were still alive at the end of follow-up, that is, with censoring proportions of 25, 50, and 75%. The three models were then constructed with default parameters, and the process was repeated 50 times. The results are shown in Figure 2.
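The simulation design can be sketched in Python as follows. This is not the simsurv code used in the study: event times are drawn by inverse-transform sampling from a Weibull proportional hazards model, and censoring is imposed administratively by ending follow-up at the quantile that leaves the target proportion alive. The shape parameter, the coefficient values, and the function name are assumptions, since they are not reported here.

```python
import numpy as np

def simulate_weibull(n=1000, p=5, scale=2.0, shape=1.0, censor_frac=0.5, seed=0):
    """Weibull PH survival times with a fixed proportion of administrative censoring."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))           # five standard normal covariates
    beta = np.full(p, 0.5)                # hypothetical effect sizes
    u = 1.0 - rng.uniform(size=n)         # uniform on (0, 1], avoids log(0)
    # Invert S(t | x) = exp(-scale * t**shape * exp(x @ beta)) at S = u
    t_event = (-np.log(u) / (scale * np.exp(X @ beta))) ** (1.0 / shape)
    cutoff = np.quantile(t_event, 1.0 - censor_frac)   # end of follow-up
    time = np.minimum(t_event, cutoff)
    event = (t_event <= cutoff).astype(int)            # 0 = still alive at cutoff
    return X, time, event

for frac in (0.25, 0.50, 0.75):
    X, time, event = simulate_weibull(censor_frac=frac)
    print(frac, 1.0 - event.mean())       # realized censoring proportion
```

Repeating the call with different seeds gives the replicate datasets on which the three models can be compared at each censoring level.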
When the censoring ratio was 25%, the performance of the RSF and ELM Cox models was almost the same, with a C-index >0.75. The evaluation indexes of the two models fluctuated within a small range, indicating relatively good performance. The Lasso Cox model performed slightly worse, but the results were still acceptable. The IBS of all three models was below 0.1, indicating that their overall performance was stable. The ELM Cox model outperformed the other two models when the censoring ratio was 50%. At a censoring ratio of 75%, the performance of all three models decreased, with a C-index below 0.6 and an IBS over 0.15. In summary, the performance of the three prediction models gradually decreased as the censoring ratio of the survival time data increased, and the ELM Cox model performed most consistently among the three. The performance comparison of the three algorithms on low-dimensional data shows that the ELM Cox model can be applied in the survival analysis of heart failure patients.

Basic Information
According to the inclusion and exclusion criteria, at the end of follow-up, a total of 5,819 patients were included in the study, of which 444 (7.63%) were excluded due to loss to follow-up. A total of 5,279 patients were finally enrolled, of whom 4,762 (90.2%) were alive and 517 (9.8%) had died. The mean age of the enrolled patients was 70 ± 11.7 years, with 3,404 (64.5%) male and 1,875 (35.5%) female cases (Details in Supplementary Table 1).

Univariate Cox Regression
Univariate Cox analysis results are shown in Table 1. In Figure 3, we show the survival curves of patients by age and NYHA subgroups.

Feature Selection
The RSF model was used to prioritize and explain the influencing factors, with VIMP and minimal depth used to select variables. The importance of the relationship between each attribute (predictor) and the outcome was plotted with different colored dots, red for low-risk values and blue for high-risk values. The twenty-one variables selected by both methods were used for subsequent modeling (variables below the horizontal dotted line) (Figure 4, Table 2) (Details in Supplementary Figure 1).

Interpretation of Predictive Features
To explain the selected variables intuitively, we used SHAP (SHapley Additive exPlanations) (46) to illustrate how these variables affect the mortality rate in the model. Figure 5A shows the 21 risk factors ranked by their mean absolute SHAP value. Figure 5B shows the details of the features in the model. The feature ranking (y-axis) indicates each feature's importance to the predictive model. The SHAP value (x-axis) is a unified index that reflects the influence of a given feature in the model. In each feature row, colored dots show the attribution of every patient to the outcome, where a red dot represents a high-risk value and a blue dot represents a low-risk value.
Older age; a higher NYHA classification; higher uric acid, absolute neutrophil count, QRS, blood urea nitrogen, direct bilirubin, cystatin C, free thyroxine, NT-proBNP, cardiac troponin, red blood cell distribution width, serum chlorine, and creatinine; previous diabetes mellitus; and no β-blockers were associated with an increased risk of CHF-related mortality. Furthermore, lower blood pressure, BMI, albumin, left ventricular ejection fraction, and free triiodothyronine were also associated with a higher predicted probability of CHF-related mortality.
Lasso Cox, RSF, and ELM Cox were then applied to construct the survival prediction models for CHF. In 2017, Voors (47) developed and validated a mortality risk model based on the clinical data of patients with heart failure with preserved ejection fraction from 11 European countries in the BIOSTAT-CHF and showed that advanced age, higher BUN and NT-proBNP, lower hemoglobin, and no β-blocker were the five variables with the strongest prediction effect on mortality, among which age, BUN, NT-proBNP, and β-blockers were consistent with the results of this paper.

Model Prediction Performance Comparison
As shown in Figure 6, compared with the other two models, the ELM Cox model has the highest C-index, 0.775 (0.755, 0.802), and the lowest IBS, 0.166 (0.150, 0.182), showing the best overall performance. The results from the real data align with those from the simulation study in this manuscript, suggesting that the Cox proportional hazards model based on ELM can produce better predictions when applied to the survival analysis of patients with CHF.

DISCUSSION
Traditionally, the Cox proportional hazard regression algorithm is used to construct models for heart failure research, but its application conditions are subject to many restrictions (34).
In this study, the predictive performance of three survival analysis models (Lasso Cox, RSF, and ELM Cox) was compared on a simulated dataset and an actual CHF dataset. In the simulation, the three models were compared under three censoring ratios of the survival time data, and the results showed that the prediction performance of all three models gradually decreased as the censoring ratio increased. However, the ELM Cox model performed the best, with the highest stability. The simulation study laid the foundation for the study of the actual CHF data and explored the possibility of constructing chronic disease survival analysis models on survival time data with large censoring ratios.
In this paper, the Lasso Cox and RSF models consumed relatively longer training time on real data, especially when the RSF cross-validation is used to select the optimal parameters, each iteration taking 5-10 min. In addition to the short computational time, the evaluation metrics of the ELM Cox heart failure prediction model (C-index and IBS: 0.775, 0.166, respectively) were also the most ideal among the three models. Compared with the performance of the Lasso Cox and RSF models, the ELM Cox model showed stable performances on simulated and real data, which was still superior even with high censoring ratios.
The motivation for this study is that classical parametric and semiparametric survival models have serious limitations and cannot achieve good predictive performance with complex variables. For example, the Cox proportional hazards model relies on the proportional hazards and log-linearity assumptions: the hazard ratio is assumed to be constant over time (18), and nonlinear relationships between the independent variables and the outcome are difficult to capture fully. These basic assumptions are not easy to satisfy and are difficult to verify in practice. The innovation of this study is to apply a newer algorithm, ELM Cox, to the survival prediction of patients with chronic heart failure to make up for the shortcomings of the traditional algorithm from the perspective of model construction, which can improve the predictive ability of the survival model.
In this study, three survival prediction models (Lasso Cox, RSF, and ELM Cox) were constructed using the electronic medical records of patients with CHF, with the following limitations: (1) The proportion of censored survival times analyzed in this study was high, 90.2%; thus, the C-index of the models was not very high. In addition, for this real-world, highly censored heart failure data, no further comparison was made with established approaches that combine backpropagation-trained deep neural networks with Cox proportional hazards models, or with other integrated algorithms (29,48). (2) The ELM Cox model is a black box with respect to how the variables are used, a characteristic of all neural networks, and the intermediate links in building the model are not yet clear. (3) The data come only from Taiyuan city, Shanxi Province; therefore, it is necessary to expand the sample sources in future studies. (4) The models were constructed without external validation, which may be added in future studies.

CONCLUSION
Overall, this study applies a newer survival analysis algorithm, the ELM Cox model, to build a survival prediction model for patients with CHF, which has a better and more stable prediction performance compared with the Lasso Cox and RSF models. The 21 clinical variables with a significant impact on the survival of heart failure patients are of great theoretical significance and application value in assessing the mortality risk of heart failure patients, assisting physicians to carry out targeted therapeutic measures for high-risk groups with poor prognosis, and preventing and mitigating the development of poor prognosis in CHF patients.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The research program received medical and ethical approval from Shanxi Medical University (NO. 2018LL128). Written informed consent to participate in this study was provided by the participants or their legal guardian/next of kin.

AUTHOR CONTRIBUTIONS
HY conceived the study, designed the study protocol, analyzed and interpreted the data, and drafted and wrote the manuscript. JT revised and reviewed the article. BM, KW, CZ, YL, and JY were responsible for collecting the data. HY and BM participated in the data analysis. QH and YZ came up with the original concept for the study, oversaw the data analysis, and revised the paper. All authors contributed to the article and approved the submitted version.